# WHAT CAN WE MAKE OF THEORIES OF EMBODIMENT AND THE ROLE OF THE HUMAN MIRROR NEURON SYSTEM?

EDITED BY: Analía Arévalo, Juliana Baldo, Fernando González-Perilli and Agustín Ibáñez PUBLISHED IN: Frontiers in Human Neuroscience

ISSN 1664-8714 ISBN 978-2-88919-761-3 DOI 10.3389/978-2-88919-761-3

# **WHAT CAN WE MAKE OF THEORIES OF EMBODIMENT AND THE ROLE OF THE HUMAN MIRROR NEURON SYSTEM?**

Topic Editors:

**Analía Arévalo,** Universidad de la República, Uruguay **Juliana Baldo,** VA Northern California Health Care System, USA **Fernando González-Perilli,** Universidad de la República, Uruguay **Agustín Ibáñez,** Institute of Cognitive Neurology, Argentina

*Echo and Narcissus* (1903) by John William Waterhouse. Walker Art Gallery, Liverpool.

In recent years, work surrounding theories of embodiment and the role of the putative mirror neuron system (MNS) in humans has gained considerable attention. If humans have developed a network of neurons that fire in response to other beings' actions, as has been shown in macaques, this system could have vast implications for all kinds of cognitive processes unique to humans, such as language, learning, empathy and communication in general. The goal of tapping into and understanding such a system is a fascinating yet challenging one. One form of embodiment, embodied linguistics, suggests that the way we process linguistic information is linked to our physical experience of the concept conveyed by each word. The interaction between these cognitive systems (i.e., language and motor processing) may occur thanks to the firing of neurons making up the MNS. The possible interdependence between different cognitive systems has implications for healthy as well as pathological profiles, and in fact, work in recent years has also explored the role of 'embodiment' and/or the MNS in clinical populations such as stroke, Parkinson's Disease, Alzheimer's Disease, and Autism, among others.

Research on embodiment and/or the MNS has been approached with a number of different methodologies, but the results obtained with these different methodologies have not been entirely consistent, generating doubts regarding the theories. The question has been raised as to what this line of inquiry can gain from the types of evidence contributed by functional neuroimaging methods carried out with healthy volunteers versus behavioral or lesion-symptom mapping methods employed with neurologically-compromised individuals.

Of particular interest are the clinical applications of this line of research. If indeed a system exists which reflects a tight link between, for example, the human language and motor systems, then the obvious challenge is to tap into this system to create useful therapies that can provide rehabilitation where damage has occurred.

This Research Topic brought together work conducted with healthy and patient populations using several behavioral and imaging techniques, as well as insightful commentaries and opinion pieces. We believe the combined work of the participating authors is an important contribution to this intriguing line of research and an excellent point of reference for future work.

**Citation:** Arévalo, A., Baldo, J., González-Perilli, F., Ibáñez, A., eds. (2016). What Can We Make of Theories of Embodiment and the Role of the Human Mirror Neuron System? Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-761-3

# Table of Contents

*Editorial: What can we make of theories of embodiment and the role of the human mirror neuron system?*

Analía Arévalo, Juliana Baldo, Fernando González-Perilli and Agustín Ibáñez

*Motor empathy is a consequence of misattribution of sensory information in observers*

Indra T. Mahayana, Michael J. Banissy, Chiao-Yun Chen, Vincent Walsh, Chi-Hung Juan and Neil G. Muggleton

*Neuroanatomical substrates of action perception and understanding: An anatomic likelihood estimation meta-analysis of lesion-symptom mapping studies in brain injured patients*

Cosimo Urgesi, Matteo Candidi and Alessio Avenanti

*Observation and imitation of actions performed by humans, androids, and robots: An EMG study*

Galit Hofree, Burcu A. Urgen, Piotr Winkielman and Ayse P. Saygin

*Sticking your neck out and burying the hatchet: What idioms reveal about embodied simulation*

Natalie A. Kacinik


François Osiurak

*The affordance-matching hypothesis: How objects guide action understanding and prediction*

Patric Bach, Toby Nicholson and Matthew Hudson

*No need to match: A comment on Bach, Nicholson and Hudson's "Affordance-Matching Hypothesis"*

Sebo Uithol and Monica Maranesi

*Response: No need to match: a comment on Bach, Nicholson, and Hudson's "Affordance-Matching Hypothesis"*

Patric Bach, Toby Nicholson and Matthew Hudson

# Editorial: What can we make of theories of embodiment and the role of the human mirror neuron system?

Analía Arévalo<sup>1</sup>\*, Juliana Baldo<sup>1</sup>, Fernando González-Perilli<sup>2,3</sup> and Agustín Ibáñez<sup>4,5,6,7,8</sup>

<sup>1</sup> Center for Aphasia and Related Disorders, East Bay Institute for Research and Education, Martinez, CA, USA, <sup>2</sup> Center for Basic Research in Psychology and Faculty of Information and Communication, University of the Republic, Montevideo, Uruguay, <sup>3</sup> Department of Basic, Evolutionary and Educational Psychology, Universitat Autonoma de Barcelona, Barcelona, Spain, <sup>4</sup> Laboratory of Experimental Psychology and Neuroscience, Institute of Cognitive Neurology (INECO), Favaloro University, Buenos Aires, Argentina, <sup>5</sup> National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina, <sup>6</sup> UDP-INECO Foundation Core on Neuroscience, Diego Portales University, Santiago, Chile, <sup>7</sup> Department of Psychology, Universidad Autónoma del Caribe, Barranquilla, Colombia, <sup>8</sup> Centre of Excellence in Cognition and its Disorders, Sydney, NSW, Australia

Keywords: human mirror system, embodiment, grounded cognition, mirror neurons, language processing

Over the last 20 years, work surrounding theories of embodiment and the role of the putative mirror neuron system (MNS) in humans has been hotly debated. Ramachandran (2000, p. 1) suggested that mirror neurons would do for psychology what DNA did for biology, providing "a unifying framework" that would help explain a host of mental abilities. In fact, the strong evidence for action/perception coupling observed in macaque mirror neurons led several authors to implicate this system in higher order functions in humans, such as empathy, language and theory of mind (Rizzolatti and Arbib, 1998; Gallese et al., 2004; but see Hickok, 2009). Thus, embodiment is a broad area of study that suggests that motor resonance participates in several of these higher order processes. However, the exact role played by specific brain structures and/or actual mirror neurons in these processes varies greatly across theories and authors. This special issue brought together 12 studies conducted with healthy as well as brain-injured populations, behavioral as well as imaging techniques (functional and structural), and opinion pieces and responses. Through this broad landscape, we offer a fresh and frugal approach to the challenges and controversies of the translational neuroscience of embodiment and the MNS.

Two of the articles in this collection addressed how the human MNS might underlie the physiological mechanisms that give rise to human emotions. In "Motor empathy is a consequence of misattribution of sensory information in observers," Mahayana et al. (2014) used TMS to measure participants' reactions while they observed videos of painful stimuli being inflicted on another person. Their results suggest that empathy may be partially caused by a misattribution of perceptual information: pain experienced by someone else is perceived as occurring in oneself. This finding raises an interesting and novel view on embodiment that suggests that the empathy experienced through our mirror system is in fact selfish, as it mostly reflects empathy toward ourselves. In "Washing the guilt away: effects of personal versus vicarious cleansing on guilty feelings and prosociality," Xu et al. (2014) asked participants to write about a guilt-inducing past wrong and then to wash their hands, watch a video of someone washing their hands, or watch a video of someone typing. Participants were then asked whether they would help a Ph.D. student with her thesis by answering some questions. Participants who felt the least guilty were those who washed their hands, followed by those who watched the hand-washing video, and then by those who watched the typing video. Also, participants who felt most guilty were more likely to help the student with her project. The authors conclude that washing one's hands or watching someone else washing their hands can be good for feelings of guilt, but not compassion. Both studies offer new evidence for the connection between inner 'motor resonance' and emotion (e.g., Wicker et al., 2003). Also, the study by Xu et al. and that of Kacinik (see below) are classic examples of embodied language, where even the enactment of metaphorical expressions can strongly activate the mirror neuron system.

#### Edited and reviewed by:

Srikantan S. Nagarajan, University of California, San Francisco, USA

\*Correspondence: Analía Arévalo, analia@ebire.org

Received: 20 March 2015 Accepted: 28 August 2015 Published: 11 September 2015

#### Citation:

Arévalo A, Baldo J, González-Perilli F and Ibáñez A (2015) Editorial: What can we make of theories of embodiment and the role of the human mirror neuron system? Front. Hum. Neurosci. 9:500. doi: 10.3389/fnhum.2015.00500

In "Language comprehension warps the mirror neuron system," Zarr et al. (2013) asked participants to read sentences describing the transfer of objects away from or toward the reader. The adapting sentences disrupted prediction of actions in the same direction, but (a) only for videos of biological motion, and (b) only when the effector implied by the language (e.g., the hand) matched the videos. Similarly, in "Sticking your neck out and burying the hatchet: what idioms reveal about embodied simulation," Kacinik (2014) asked participants to read a story and act out the idioms presented (e.g., literally sitting on the fence, on the edge of one's seat). The study found that the process of embodying idioms simply by engaging in the corresponding actions activated their meaning enough to significantly influence subsequent processing and judgments. Finally, in "Action relevance in linguistic context drives word-induced motor activity," Aravena et al. (2014) analyzed online modulations of grip force while subjects listened to target words embedded in different linguistic contexts. They conclude that motor structure activation is part of a dynamic process that integrates the lexical meaning potential of a term and the context in the online construction of a situation model, which is a crucial process for fluent and efficient online language comprehension. Like Xu et al. (see above), these three articles support the notion that the motor resonance of language strongly influences its comprehension. The strict version of this view, which argues that semiotic coding would mostly rely on the human MNS (see Pulvermüller et al., 2014), continues to be controversial and is challenged by other articles in this topic (see below).

Two articles used neuroimaging to identify the neural correlates of embodiment. In an fMRI study entitled "Hand specific representations in language comprehension," Moody-Triantis et al. (2014) asked participants to perform right or left hand actions and then read sentences describing these same actions. They found that language-induced activity overlapped with pre-motor and parietal regions associated with action planning rather than those observed in action execution, endorsing a less strict interpretation of the MNS in humans, in which association cortices (and not primary motor cortices) are activated. In "Neuroanatomical substrates of action perception and understanding: an anatomic likelihood estimation meta-analysis of lesion-symptom mapping studies in brain injured patients," Urgesi et al. (2014) conducted a meta-analysis of 11 studies and 361 patients and reported that non-linguistic action perception and understanding are associated with the inferior frontal cortex, the inferior parietal cortex and the middle/superior temporal cortex. Again, rather than primary motor cortex, they found that surrounding regions in frontal, parietal, and temporal cortex were associated with action perception.

Two other theoretical/opinion articles also steer away from stricter MNS interpretations and suggest that the motor system influences action perception but is not its sole critical component. In "Homuncular mirrors: misunderstanding causality in embodied cognition," Mikulan et al. (2015) propose a network view of language processing in which the mirror neuron system plays an important role in priming or facilitating understanding (or even indexing action semantics) but not directly in action understanding. Similarly, Bach et al. (2014) propose an object-based view of action understanding in "The affordance-matching hypothesis: how objects guide action understanding and prediction." They suggest that object knowledge (what an object is for and how it is used) informs and constrains action interpretation and prediction.

Additionally, we included two response pieces to Bach et al.'s proposal, one by Osiurak (2014) and the other by Uithol and Maranesi (2014). The latter, in turn, received a response from Bach and colleagues (under review), which is also included in this issue. Osiurak proposes the "mechanical knowledge hypothesis," which diminishes the role of manipulation in action understanding and distances itself from traditional MN theories, while Uithol and Maranesi support an enactivist view, which criticizes the need for integrating the processes of action interpretation and action prediction. Bach et al.'s counterargument, on the other hand, suggests that the match is indeed needed to fulfill the requirements of a predictive model of action understanding.

Intriguingly, in "Observation and imitation of actions performed by humans, androids and robots: an EMG study," Hofree et al. (2015) show that these phenomena are not limited to agents with a biological appearance but extend to robotic agents, with important implications for human-robot interaction.

All of these works expand our understanding of the human MNS by extending previous work and delimiting the boundaries of how we should interpret those findings. As a group, contributing authors seem to agree on less strict interpretations of embodiment and the human MNS, suggesting these are strong contributors to various aspects of action and cognition, but do not represent the sole basis of language, learning, or comprehension. Future work should further explore the precise mechanisms underlying the links between action planning, execution, and semantic processing, as well as the relative dependence of distinct cognitive processes on mirror activity.

# Acknowledgments

We thank all the authors and reviewers who contributed to our special topic.

# References


of cognition and communication. Neuropsychologia 55, 71–84. doi: 10.1016/j.neuropsychologia.2013.12.002


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Arévalo, Baldo, González-Perilli and Ibáñez. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Action *relevance* in linguistic context drives word-induced motor activity

#### *Pia Aravena<sup>1</sup>\*, Mélody Courson<sup>1</sup>, Victor Frak<sup>2</sup>, Anne Cheylus<sup>1</sup>, Yves Paulignan<sup>1</sup>, Viviane Deprez<sup>1</sup> and Tatjana A. Nazir<sup>1</sup>*

*<sup>1</sup> L2C2 Institut des Sciences Cognitives - Marc Jeannerod, CNRS/UCBL, Université Claude Bernard Lyon1, Bron, France*

*<sup>2</sup> Département de Kinanthropologie, Faculté des Sciences, Université du Québec à Montréal, Montréal, Canada*

#### *Edited by:*

*Agustin Ibanez, Institute of Cognitive Neurology, Argentina*

#### *Reviewed by:*

*Claudia Gianelli, University of Potsdam, Germany Giovanni Mirabella, University of La Sapienza, Italy Silvia Spadacenta, Eberhard Karls University of Tübingen, Germany*

#### *\*Correspondence:*

*Pia Aravena, L2C2 Institute of Cognitive Science - Marc Jeannerod, CNRS/UCBL, Université Claude Bernard Lyon1, 67 Bd Pinel, 69675 Bron, France e-mail: pia.aravena@isc.cnrs.fr*

Many neurocognitive studies on the role of motor structures in action-language processing have implicitly adopted a "dictionary-like" framework within which lexical meaning is constructed on the basis of an invariant set of semantic features. The debate has thus been centered on the question of whether motor activation is an integral part of the lexical semantics (embodied theories) or the result of a post-lexical construction of a situation model (disembodied theories). However, research in psycholinguistics shows that lexical semantic processing and context-dependent meaning construction are tightly integrated. An understanding of the role of motor structures in action-language processing might thus be better achieved by focusing on the linguistic contexts under which such structures are recruited. Here, we therefore analyzed online modulations of grip force while subjects listened to target words embedded in different linguistic contexts. When the target word was a hand action verb and when the sentence focused on that action (John **signs** the contract) an early increase of grip force was observed. No comparable increase was detected when the same word occurred in a context that shifted the focus toward the agent's mental state (John **wants** to *sign* the contract). The mere presence of an action word is thus *not sufficient* to trigger motor activation. Moreover, when the linguistic context set up a strong expectation for a hand action, a grip force increase was observed even when the tested word was a pseudo-verb. The presence of a known action word is thus not *required* to trigger motor activation. Importantly, however, the same linguistic contexts that sufficed to trigger motor activation with pseudo-verbs failed to trigger motor activation when the target words were verbs with no motor action reference. Context is thus not by itself sufficient to supersede an "incompatible" word meaning. 
We argue that motor structure activation is part of a dynamic process that integrates the lexical meaning potential of a term and the context in the online construction of a situation model, which is a crucial process for fluent and efficient online language comprehension.

**Keywords: embodied language, context-dependency, lexical semantics, conceptual flexibility, situation models**

# **INTRODUCTION**

A growing body of evidence supports the idea that the brain's motor structures are implicated in the processing of language referring to motor actions (for a review see Hauk and Tschentscher, 2013). However, the crosstalk that the neural networks underlying motor actions entertain with language processes is not well understood. Currently, the theoretical approaches that aim at accounting for the role of motor activation during action-language processing mainly focus on the question of whether language-induced motor activity should be considered as an integral part of lexical semantics or, rather, as resulting from ensuing "higher-level" processes involved in the construction of mental representations of the described state of affairs (Hauk et al., 2008a,b; Van Elk et al., 2010; Bedny and Caramazza, 2011). Answering this question is believed to solve the issue of whether motor activation is relevant for action-language processing or merely an epiphenomenon (for reviews on the theoretical accounts in this debate, see Meteyard et al., 2012; Pulvermüller, 2013). However, determining whether language-induced motor activation is part of one of these two processes implies considering lexical meaning access and the representation of the situation described by the context as separate processes. Such a dichotomous view, however, is grounded in models of lexical meaning representation currently regarded as no longer tenable (Hoenig et al., 2008; Raposo et al., 2009; see also Egorova et al., 2013). A better understanding of language-induced motor activity may thus require a shift in theoretical perspective.

Research on the role of language-induced sensorimotor activation has generated a large body of sometimes conflicting experimental results (see e.g., Hauk et al., 2004 vs. Postle et al., 2008; Buccino et al., 2005 vs. Pulvermüller et al., 2005; for a review see Willems and Francken, 2012). While these inconsistencies could be seen as an obstacle for the understanding of the crosstalk between language and motor structures, they could alternatively be regarded as providing important insights into the nature of this phenomenon: the heterogeneity in the findings could well indicate that the recruitment of sensorimotor structures crucially depends on the linguistic and extra-linguistic context (see Hoenig et al., 2008; Sato et al., 2008; Papeo et al., 2009, 2012; Rueschemeyer et al., 2010; Mirabella et al., 2012; Tomasino and Rumiati, 2013; for a recent review, see Yang, 2013; see also van Dam et al., 2011; Willems and Casasanto, 2011). That the context a word is uttered in partially determines its meaning is well established among linguists and psycholinguists (e.g., Allwood, 2003; Elman, 2011). According to Allwood (2003) for instance, lexical meaning representations emerge from multiple interactions within a broad knowledge structure. This word knowledge, which Allwood refers to as the "meaning potential" of a word, comprises the set of all the information that the word has been used to convey either by an individual or by a language community. Within the bounds of this meaning potential, the kind of event, property, or entity a given word is taken to denote shifts according to the context the word occurs in.

In line with the above view, a vast number of psycholinguistic studies have demonstrated early effects of context on lexical semantic processing (for a review, see Spivey and Huette, 2013). For example, Federmeier et al. (2007) recorded ERPs as participants read target words in weakly constraining (e.g., "Mary went into her room to look at her gift") or strongly constraining (e.g., "The child was born with a rare gift") sentence contexts. The authors analyzed the N400 ERP-component, whose magnitude is positively correlated with interpretative problems, and found a smaller N400 for the same target words in the strongly compared to the weakly constraining contexts. The brain thus seems to use context information to generate likely upcoming stimuli and to prepare ahead of time for their processing (see also Kako and Trueswell, 2000; Kamide et al., 2003; Chambers and Juan, 2008; Bicknell et al., 2010). Note that this "lexical anticipation" phenomenon involves evaluating the contextual properties of a word and not merely its characteristics as an entity of the mental lexicon. The whole event evoked when processing a sentence within a given context restricts the set of potential word referents (Kako and Trueswell, 2000; Kamide et al., 2003; Chambers and Juan, 2008; Bicknell et al., 2010; Kukona et al., 2011). In other words, lexical meaning access profits from a representational state of the situation described by the context (e.g., Nieuwland and Van Berkum, 2006; Hagoort and van Berkum, 2007; Metusalem et al., 2012). This representational state, which can assimilate information about time, social relations, mental acts, space, objects, and events (MacWhinney, 2005; Frank and Vigliocco, 2011), has been termed a "mental model" or "situation model" by linguists and philosophers (Johnson-Laird, 1983; Van Dijk and Kintsch, 1983; Zwaan and Radvansky, 1998; Zwaan and Madden, 2004). 
As demonstrated by Nieuwland and Van Berkum (2006), situation models can even overrule constraints provided by core lexical-semantic features such as animacy, which, in classic linguistic semantics, is encoded in the mental lexicon. Hence, when participants listened to a story about a dancing peanut that had a big smile, the canonical inanimate predicate "salted" for the inanimate object "peanut" elicited a larger N400 component than the animate predicate "in love." Situation models can thus neutralize processing difficulties due to animacy violations, confirming that lexical meaning does not necessarily involve an initial context-independent semantic computation.

Despite the remarkable body of evidence regarding the context dependency of lexical meaning, these results have rarely been taken into account in the cognitive neuroscience literature that discusses the role of motor structures in action-language processing. In fact, many researchers in this domain seem to have implicitly relied on theoretical views that treat word recognition and semantic processing as form-driven, exhaustive, and bottom-up (Swinney and Love, 2002; MacDonald and Seidenberg, 2006). On this view, semantic and pragmatic context exerts its effects only after word meaning has been elaborated. What is more, it seems to be tacitly assumed that words have fixed meanings that are accessed like entries in a dictionary (cf. "conceptual stability"; Hoenig et al., 2008; see also Elman, 2011). However, within a theoretical frame that considers lexical meaning access as an interactive process, integrating information from many different sources, the question of whether language-induced motor activation is an integral part of lexical meaning or a mere effect of the ensuing construction of a situation model (Hauk et al., 2008a,b; Chatterjee, 2010; Bedny and Caramazza, 2011) does not make sense. Therefore, this issue will not satisfactorily inform the main question regarding the function of motor activation in action-language processing. We believe that an understanding of the role of motor structures in the construction of linguistic meaning requires a detailed exploration of the context under which motor structures are recruited during action-language processing.

Critical results along this line were provided by Taylor and Zwaan (2008). These authors demonstrated that in a sentence describing a manual rotation (e.g., "He placed his hand on the gas cap, which he opened slowly"), compatible motor responses (i.e., manual rotation of a knob in a direction congruent with the linguistically described activity) are facilitated while reading the verb "opened." Motor responses are also facilitated while reading the adverb that modifies the action verb (i.e., "slowly"), but not while reading adverbs that modify the agent (e.g., "He placed his hand on the gas cap, which he opened *happily*"). According to Taylor and Zwaan (2008), the difference between the two conditions is explained by the fact that the adverbs that modify the action maintain the linguistic semantic focus on the action described in the sentence. Note that these results suggest that motor structure activation is sustained beyond the lexical entity of the action term, extending to the broader linguistic event in which the word is embedded. Results from our laboratory further support this view. By analyzing online grip force variations that index cerebral motor activity in response to target words (cf. Frak et al., 2010), our study revealed an increase of grip force starting around 200 ms after the onset of a manual action word when the word occurred in an affirmative sentence (e.g., "Fiona lifts the luggage"), but not when it occurred in a negative sentential context ("Fiona does *not* lift the luggage") (Aravena et al., 2012). Our interpretation of these data is that in affirmative contexts, motor features of the target word are activated because of the *relevance* of the action within the situation model. In negative contexts the motor features remain irrelevant in spite of the actual presence of the action word in the sentence, because the sentence-induced situation model does not focus on the action.

In the present study, we report two experiments that further investigate how the sentential context modulates word-induced motor activation. As in our previous studies (Frak et al., 2010; Aravena et al., 2012), we measured grip force variations while subjects listened to words that describe manual motor actions. Note that an increase of word-induced grip force can be interpreted as an incomplete inhibition of the output of primary motor cortex activity (Jeannerod, 1994; Frak et al., 2010). No motor task associated with the linguistic process was required, as participants were asked to count how many sentences contained the name of a country. This ensured the ecological validity of the experimental environment, as it simulates a quite natural linguistic situation.

In Experiment 1 we set out to investigate the effect of linguistic focus on action-verb induced motor activity by making use of the *volition modality* ("want to do," see Morante and Sporleder, 2012). Volition is a grammatical modality that pertains to the intentions of an agent with respect to an action. It sets an action in an *irrealis mood*, indicating that the relevant situation or action has not yet happened. Indeed, wanting to do X presupposes that X is not currently being done or taking place. Hence, the situation model evoked by the volition modality does not focus on a motor action. In Experiment 2 we assessed the degree of context dependency of language-induced motor activation by measuring motor activity at the point where the target word is expected. For example, for an utterance beginning with "With his black pen, James..." the word "writes" is a far more likely continuation than the word "walks," as the former evokes a more plausible action for the use of the "black pen" (see Bicknell et al., 2010; Matsuki et al., 2011). To investigate the anticipatory effects of an action context on subsequent word processing, we used either a pseudo-verb with no associated reference or a verb whose associated reference was incompatible with the action meaning anticipated by the context. In keeping with the findings of our experiment with negative contexts, we predicted that the processing of an action word should be neither sufficient nor even necessary to activate motor structures. Hence:


#### **MATERIALS AND METHODS**

#### **EXPERIMENT 1: VOLITION**

#### *Ethics statement*

All of the participants in this study gave written informed consent. The study was approved by the Ethical Committee CPP (Comité de Protection des Personnes) Sud-Est II in Lyon, France.

#### *Participants*

All of the participants were French undergraduate students (18–35 years old; mean age = 21.7, *SD* = 1.5) and right-handed [Edinburgh handedness inventory (Oldfield, 1971)], with normal hearing and no reported history of psychiatric or neurological disorders. Twenty-five participants (including 13 females) took part in this study. Eight participants were eliminated from the analysis due to an extremely weak signal throughout the experiment, which prevented the capture of grip-force variations. We used a grip-force mean below 0.13 V in combination with the absence of signal changes throughout the experiment as criteria for discarding participants from the analyses.

#### *Stimuli*

A total of 115 French sentences served as stimuli (see Supplementary Material). Ten were distractor-sentences containing a country name. The data from the trials using the distractor-sentences were not included in the analysis. Thirty-five target-action words were embedded into action-in-focus and volition-in-focus sentences, resulting in 70 total sentences corresponding to the two conditions of the experiment: the action-in-focus and the volition-in-focus condition. All of the target action words were verbs denoting actions performed with the hand or arm (e.g., scratch or throw). Thirty-five sentences containing common nouns denoting concrete entities with no motor associations were used for comparison with earlier studies (e.g., Frak et al., 2010; Aravena et al., 2012). The target nouns and verbs were controlled for frequency, number of letters, number of syllables and bi- and trigram frequency (New et al., 2001, see Supplementary Material). Three examples of experimental stimuli are provided in **Table 1**.

All critical verbs were in the present tense and in the neutral 3rd person. Verbs always occurred in the same position in the sentence. The sentences were spoken by a French male adult. His voice was recorded using Adobe Soundbooth and the recordings were adjusted to generate similar trial lengths using the Audacity 1.2.6 software. Two pseudo-randomized sentence lists were generated from the trials; these lists contained uniform distributions of the different sentence types. The two lists were alternated between participants. The mean word duration was 459 ms (*SD* = 97 ms) for the nouns and 415 ms (*SD* = 78 ms) for the verbs. There was an interval of 2000 ms between the sentence presentations.

#### **Table 1 | Example of stimuli used in Experiment 1 and their approximate English translation.**

*Underlined words represent the target words. Words in bold type represent the linguistic focus of the sentence.*

#### *Equipment and data acquisition*

Two distinct computers were used for data recording and stimulus presentation to ensure synchronization between audio files and grip-force measurements (estimated error < 5 ms). The first computer read the play-list of the pseudo-randomized stimuli. The second computer received two triggers from the first computer, which indicated the beginning and the end of the play-list. This second computer also recorded the incoming force signals from the load cell at a sampling rate of 1 kHz. To measure the activity of the hand muscles, a standalone 6-axis load cell of 68 g was used (ATI Industrial Automation, USA, see **Figure 1**). In the present study, force torques were negligible due to the absence of voluntary movement; thus, only the three main forces were recorded: Fx, Fy, and Fz as the longitudinal, radial and compression forces, respectively (**Figure 1B**).

#### *Procedure*

Participants wore headphones and were comfortably seated behind a desk on which a pad was placed. They were asked to rest their arms on the pad, holding the grip-force sensor in a precision grip with their right hand (see **Figure 1**). The thumb, index, and middle fingers remained on the load cell throughout the experiment. Holding the sensor with the index, thumb, and middle finger implies more stability of the object (i.e., less grip force variations due to finger adjustments) than holding it with the index and thumb only.

**FIGURE 1 | Experimental material and setting. (A)** A standalone 6-axis load cell of 68 g was used (ATI Industrial Automation, USA). **(B)** The three main forces were recorded: Fx, Fy, and Fz as the longitudinal, radial and compression forces, respectively. **(C)** Participants hold the grip-force sensor in a precision grip with their right hand. Bottom panel: participants wore headphones and were comfortably seated behind a desk on which a pad was placed. They were asked to rest their arms on the pad, holding the sensor.

The experimenter demonstrated how to hold the grip sensor, and participants were requested to hold the cell without applying voluntary forces.

The cell was suspended and not in contact with the table. The participants kept their eyes closed for the duration of the experiment. They were verbally instructed to listen to the spoken sentences. Their task was to silently count how many sentences contained the name of a country. To avoid muscular fatigue, a break of 10 s was given every 3 min. The total length of the experiment was 12 min.

#### *Data analysis*

Prior to the data analysis, each signal component was preprocessed with the Brain Vision Analyzer 2.0 software (Brain Products GmbH, Munich, Germany). The data were filtered at 10 Hz with a fourth-order, zero-phase, low-pass Butterworth filter, and a notch filter (50 Hz) was applied in case artifacts caused by electrical power lines persisted. Finally, a baseline correction was performed using the mean amplitude of the interval from −400 to 0 ms relative to word onset. The baseline correction was implemented because of a possible global change in grip-force during the session (12 min), and because we are only interested in grip-force changes. Thus, we adjusted the post-stimulus values by the values present in the baseline period: the mean baseline value was simply subtracted from all of the values in the epoch. As the participants were asked to hold the grip-force sensor throughout the experiment, a "negative" grip-force refers to a lesser grip-force and not to the absence of grip-force, which is impossible in this context. Only Fz (compression force) was included in the analysis, as this parameter was determined to be the most accurate indicator of prehensile grip-force. The Fz signals were segmented offline into 1200 ms epochs spanning from 400 ms pre-stimulus onset to 800 ms post-stimulus. The segments with visually detectable artifacts (e.g., gross hand movements) and the trials that showed oscillations exceeding the participant's mean force were isolated and discarded from the analysis. A mean of 6.04 segments (17.2%) were discarded per condition. The Fz signals for action words in action-in-focus, action words in volition-in-focus and nouns were averaged for each participant and the grand mean was computed for each condition.
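The epoching and baseline-subtraction steps just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Brain Vision Analyzer pipeline: the helper names (`extract_epoch`, `baseline_correct`), the toy Fz trace, and the simplification of exactly one sample per millisecond are hypothetical, and the filtering and artifact-rejection steps are left out.

```python
from statistics import mean

SAMPLING_RATE_HZ = 1000      # 1 kHz, as stated in the text
PRE_MS, POST_MS = 400, 800   # epoch spans -400 ms to +800 ms around word onset
SAMPLES_PER_MS = SAMPLING_RATE_HZ // 1000


def extract_epoch(signal, onset_index):
    """Return the 1200 ms segment around a word onset (hypothetical helper)."""
    start = onset_index - PRE_MS * SAMPLES_PER_MS
    stop = onset_index + POST_MS * SAMPLES_PER_MS
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recorded signal")
    return signal[start:stop]


def baseline_correct(epoch):
    """Subtract the mean of the pre-onset interval from every sample."""
    n_baseline = PRE_MS * SAMPLES_PER_MS
    baseline = mean(epoch[:n_baseline])
    return [value - baseline for value in epoch]


# Toy Fz trace in volts (made-up data): flat at 0.10 V, then a 0.02 V rise
# that begins 200 ms after a word onset placed at sample 1000.
onset = 1000
fz = [0.10] * 1200 + [0.12] * 1800
corrected = baseline_correct(extract_epoch(fz, onset))
```

After correction, pre-onset samples sit at zero and later samples express the change relative to baseline, which is why a lower-than-baseline grip shows up as a negative value rather than as an absence of force.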

We selected three time windows (i.e., 100–300, 300–500, and 500–800 ms after word onset) that were identified as critical phases during the processing of words in auditory sentences in Friederici's (2002) model and that were used previously in our work for language-induced grip-force analysis (Aravena et al., 2012). Given that the conduction time between the primary motor cortex (M1) and hand muscle is approximately 18–20 ms (estimations using TMS, Rossini et al., 1999), we added 20 ms to each of these windows, resulting in 120–320 ms for the first window, 320–520 ms for the second time window and 520–800 ms for the third.
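The window arithmetic above can be written out explicitly. The sketch below is illustrative only: the function names are hypothetical, capping the last window at the 800 ms epoch end is an assumption chosen because it reproduces the 520–800 ms window given in the text, and `window_mean` assumes a baseline-corrected epoch with one sample per millisecond and 400 ms of pre-onset data.

```python
from statistics import mean

CONDUCTION_DELAY_MS = 20   # upper end of the 18-20 ms M1-to-hand estimate
EPOCH_END_MS = 800         # epochs stop 800 ms after word onset
PRE_MS = 400               # pre-onset portion of each epoch

BASE_WINDOWS = [(100, 300), (300, 500), (500, 800)]


def shift_windows(windows, delay_ms=CONDUCTION_DELAY_MS, epoch_end_ms=EPOCH_END_MS):
    """Delay each window by the conduction time, capped at the epoch end (assumption)."""
    return [(start + delay_ms, min(end + delay_ms, epoch_end_ms))
            for start, end in windows]


def window_mean(corrected_epoch, window, pre_ms=PRE_MS):
    """Average a baseline-corrected epoch over a [start, end) window in ms."""
    start, end = window
    return mean(corrected_epoch[pre_ms + start:pre_ms + end])


analysis_windows = shift_windows(BASE_WINDOWS)

# Toy corrected epoch (made-up data): zero until 200 ms post-onset, then 0.02 V.
toy_epoch = [0.0] * 600 + [0.02] * 600
first_window_value = window_mean(toy_epoch, analysis_windows[0])
```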

For each condition, the averaged grip-force values in the three time windows were compared with their proper baseline (i.e., averaged grip-force values over the segment between −400 and 0 ms before target word onset) using a one-sample *t*-test against zero; for a window that presented significant grip-force modulations with respect to the baseline, a comparison between the conditions was performed using a repeated-measures analysis of variance (ANOVA). *Post-hoc* two-by-two comparisons were performed using the Bonferroni test. Since statistical significance is heavily dependent upon sample size, and our study sample was smaller than 20, we also report "effect sizes" (Cohen's *d*; Cohen, 1988). An effect size is calculated by taking the difference of the means of two conditions and dividing this difference by the pooled standard deviation of the two conditions. This allows estimating how many standard deviations separate the conditions. According to Cohen (1988), an effect size of 0.20 (i.e., a difference of a fifth of the standard deviation) is a small effect size. A medium effect size is 0.50 and a large effect size is 0.80.
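The effect-size definition given above (difference of means divided by the pooled standard deviation) can be sketched as follows. The function names, the "negligible" label for values below 0.20, and the sample values are illustrative assumptions, not data or code from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference of the means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_variance)


def label_effect(d):
    """Label |d| with Cohen's (1988) conventional thresholds."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label for values below 0.20 is an assumption


# Made-up per-participant window means (volts) for two conditions.
condition_a = [0.10, 0.05, 0.12, 0.08, 0.15]
condition_b = [0.00, -0.02, 0.03, 0.01, -0.01]
d = cohens_d(condition_a, condition_b)
```

Because *d* is expressed in standard-deviation units, it can be compared across time windows and experiments even when the sample is too small for significance tests to be conclusive.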

#### **EXPERIMENT 2: PSEUDO-VERBS**

#### *Ethics statement*

All participants in this study gave written informed consent. The study was approved by the Ethical Committee CPP (Comité de Protection des Personnes) Sud-Est II in Lyon, France.

#### *Participants*

All of the participants were French undergraduate students (18–35 years old; mean age = 21.7, *SD* = 2.1) and right-handed [Edinburgh handedness inventory (Oldfield, 1971)], with normal hearing and no reported history of psychiatric or neurological disorders. Nineteen subjects (including 10 females) participated in this study and none had participated in Experiment 1.

#### *Stimuli*

A total of 158 French sentences served as stimuli (see Supplementary Material). Ten were distractor-sentences containing a country name. The data from the trials using the distractor-sentences were not included in the analysis.

For this experiment, 37 pseudo-verbs were created that obey French phonotactic constraints, using the Lexique Toolbox of the Lexique 3 database (New et al., 2001). The soundness of each pseudo-verb as a French verb was controlled (see Supplementary Material). Thirty-seven target non-action words were used. All non-action words were verbs that do not denote an action performed with the hand or arm (e.g., decide, think), as confirmed by the stimuli validation process (see Supplementary Material). Thirty-seven target action words were included. All action words were verbs denoting actions performed with the hand or arm (e.g., scratch or throw), as established by the stimuli validation process (see Supplementary Material).

All the target words were controlled for frequency, number of letters, number of syllables, and bi- and trigram frequency (New et al., 2001).

The 37 action verbs, the 37 pseudo-verbs, and the 37 non-action verbs were embedded into action contexts. The 37 target non-action verbs were also embedded into non-action contexts.

Action contexts were designed in such a way that the first adverbial phrase and the subject of the sentence coded a situation that anticipated a hand action. The degree of effector specificity (i.e., hand action) of action contexts and the action verb cloze probability were controlled. The "degree of effector specificity" was defined as how representative of a hand action the action encoded by the sentence was. All actions encoded by sentences were highly prototypical hand actions. Cloze probability was defined as how easy it was to anticipate a hand action verb from the preceding sentential context. Only the contexts that induced a high cloze probability of hand action verbs were considered action contexts (see Supplementary Material).

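In summary, the present study exploited four conditions:

- Action-Action: action verbs embedded in action contexts.
- Action-Pseudo-verb: pseudo-verbs embedded in action contexts.
- Action-Non-action: non-action verbs embedded in action contexts.
- Non-action-Non-action: non-action verbs embedded in non-action contexts.

Four examples of experimental stimuli are provided in **Table 2**.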

All critical verbs were in the present tense and in the neutral 3rd person. Verbs always occurred in the same sentential position (see **Table 2**). The sentences were spoken by a French female adult. Her voice was recorded using Adobe Soundbooth and the recordings were adjusted to generate similar trial lengths using the Audacity 1.2.6 software. Three lists of 37 action contexts (A, B, and C) were created to avoid context repetition across the three action-context conditions. Action words were included in A when pseudo-verbs were included in B and non-action words in C, and they were included in B when pseudo-verbs were in C and non-action words in A, etc. Therefore, three pseudo-randomized sentence lists were generated from this balanced combination (ABC, BCA, CBA), in addition to the non-action C-non-action V list and the 10 country sentences. These lists contained uniform distributions of the different sentence types. The three lists were alternated between participants. The mean word duration was 459 ms (*SD* = 97 ms). There was an interval of 2000 ms between the sentence presentations.
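One common way to obtain this kind of balance is a cyclic rotation of verb types over context lists, sketched below. The function name is hypothetical, and only the first two versions are spelled out in the text; the third version produced here is inferred from the "etc." and is an assumption that may not match the exact lists used in the experiment.

```python
def rotate_assignments(context_lists, verb_types):
    """Pair each verb type with a context list, rotating the pairing per version."""
    n = len(context_lists)
    versions = []
    for shift in range(n):
        versions.append(
            {verb_types[i]: context_lists[(i + shift) % n] for i in range(n)}
        )
    return versions


context_lists = ["A", "B", "C"]
verb_types = ["action", "pseudo-verb", "non-action"]
versions = rotate_assignments(context_lists, verb_types)

# Version 1: action in A, pseudo-verb in B, non-action in C (as in the text).
# Version 2: action in B, pseudo-verb in C, non-action in A (as in the text).
# Version 3: action in C, pseudo-verb in A, non-action in B (assumed).
```

With this rotation every context list is paired with every verb type exactly once across versions, so no participant hears the same action context twice.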

#### **Table 2 | Example of stimuli used in the Experiment 2 and their approximate English translation.**


*Underlined words represent the target words.*

#### *Equipment and data acquisition*

The equipment and data acquisition from Experiment 1 were used in Experiment 2 (see also Aravena et al., 2012).

#### *Procedure*

The procedure from Experiment 1 was repeated, with one exception: before the test began, participants were verbally instructed to apply a specific minimal force on the cell (i.e., between 0.08 and 0.13 V, which the experimenter monitored online in the signal-visualization software) and to maintain it throughout the experiment without applying other voluntary forces. This instruction ensured that the signal could be captured, since an extremely weak signal prevents the detection of grip-force variations, as shown in Experiment 1 (from which eight participants were eliminated because of weak signals). The total length of the experiment was 18 min.

#### *Data analysis*

The analysis used for Experiment 2 was the same as that used in Experiment 1.

#### **RESULTS**

#### **RESULTS EXPERIMENT 1: VOLITION**

**Figure 2** plots the variations in grip-force amplitude as a function of time after target word onset for the three experimental conditions (volition-in-focus condition, action-in-focus condition, and nouns condition). The top panel displays individual data for the three conditions and the bottom panel compares data of the three conditions averaged over all participants. As is obvious from the figure, for the action-in-focus condition a steady increase in the grip force [the compression force component of the load cell (Fz)] was observed soon after target word presentation and was maintained until the last interval. By contrast, the volition and the nouns conditions remained nearly constant at baseline.

For the action-in-focus condition, the test against the baseline revealed a significant increase in the grip-force in the three time windows [*p* = 0.013, *p* = 0.009, and *p* = 0.005 for 120–320, 320–520, and 520–800 ms, respectively]. No significant effects against baseline were observed for the volition-in-focus or for the nouns condition.

The ANOVA revealed significant effects of condition in the last two time windows [*F*(2, 32) = 3.4505, *p* = 0.043 and *F*(2, 32) = 5.6477, *p* = 0.007, respectively]. *Post-hoc* comparison (Bonferroni) for the second window showed that the Action condition (*M* = 0.08 V, *SD* = 0.1) differed significantly from the Volition condition (*M* = −0.01 V, *SD* = 0.1) [*p* = 0.05] and just failed to differ significantly from the Noun condition (*M* = −0.009 V, *SD* = 0.08) [*p* = 0.06, ns]. In the last window, *post-hoc* comparison revealed that the Action condition (*M* = 0.14 V, *SD* = 0.19) differed from the Volition condition (*M* = −0.02 V, *SD* = 0.18) [*p* = 0.02] as well as from the Noun condition (*M* = −0.03 V, *SD* = 0.8) [*p* = 0.007]. **Table 3** summarizes the effect sizes (Cohen's *d*) of the different comparisons. In all time windows large effect sizes were found for the difference between the Action vs. Nouns conditions as well as between the Action vs. Volition conditions.

Altogether, these analyses confirm that the same action words embedded in sentences whose focus is on the mental state of the agent do not increase grip force in the same way as when they are embedded within sentences that focus on the action.

#### **RESULTS EXPERIMENT 2: PSEUDO-VERBS**

**Figure 3** plots the variations in grip-force amplitude as a function of time after target word onset for the four experimental conditions (action-action condition, action-pseudo-verb condition, action-non-action condition, and non-action-non-action condition). The top panel displays individual data for the four conditions and the bottom panel compares data of the four conditions averaged over all participants. As is obvious from the figure, for the action-action condition and the action-pseudo-verb condition, a steady increase in the grip force [the compression force component of the load cell (Fz)] was observed early and was maintained until the last interval. By contrast, the action-non-action condition appeared to cause a drop in the grip-force. Finally, the non-action-non-action condition remained nearly constant at baseline.

For the Action-Action condition, the test against the baseline revealed a significant increase in the grip-force in the three time windows [*p* = 0.01, *p* = 0.02, and *p* = 0.04 for 120–320, 320–520, and 520–800 ms, respectively]. For the Action-Pseudo-verb condition, the test against the baseline also revealed a significant increase in the grip-force in the three time windows [*p* = 0.01, *p* = 0.006, and *p* = 0.01, respectively]. No significant effects against baseline were observed for the non-action verbs in the action context or for the non-action-non-action condition. The ANOVA was significant in all time windows [*F*(3, 54) = 4.558, *p* = 0.0064; *F*(3, 54) = 5.2004, *p* = 0.0032; and *F*(3, 54) = 3.251, *p* = 0.0287, for the first, second and third window, respectively]. Results of the *post-hoc* tests (Bonferroni) are reported in **Table 4**.
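As a consistency check, the degrees of freedom reported for the two ANOVAs follow from the number of analyzed participants and conditions under the standard one-way repeated-measures formula. The function name below is hypothetical, and the check assumes that all retained participants contributed data to every condition.

```python
def rm_anova_df(n_participants, n_conditions):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_participants - 1) * (n_conditions - 1)
    return df_effect, df_error


# Experiment 1: 25 participants tested, 8 excluded, 3 conditions.
df_experiment_1 = rm_anova_df(25 - 8, 3)

# Experiment 2: 19 participants, 4 conditions.
df_experiment_2 = rm_anova_df(19, 4)
```

Both results agree with the reported *F*(2, 32) and *F*(3, 54).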

The comparison of the three critical conditions (Action-Non-action vs. Action-Action and Action-Pseudo-verb) revealed significant effects in the first two time windows. First time window: the Action-Non-action condition (*M* = −0.1 V, *SD* = 0.19) differed significantly from the Action-Action condition (*M* = 0.099 V, *SD* = 0.15) [*p* = 0.01] as well as from the Action-Pseudo-verb condition (*M* = 0.08 V, *SD* = 0.13) [*p* = 0.019]. Second time window: Action-Non-action condition (*M* = −0.1 V, *SD* = 0.3) vs. Action-Action condition (*M* = 0.16 V, *SD* = 0.28) [*p* = 0.006] and vs. Action-Pseudo-verb condition (*M* = 0.12 V, *SD* = 0.16) [*p* = 0.029]. In the third time window the same tendency was also evident, but the differences with the Action-Non-action condition did not reach significance: Action-Non-action condition (*M* = −0.11 V, *SD* = 0.3) vs. Action-Action condition (*M* = 0.16 V, *SD* = 0.34) [*p* = 0.061] and vs. Action-Pseudo-verb condition (*M* = 0.13 V, *SD* = 0.23) [*p* = 0.123]. By contrast, the comparisons with the Non-action-Non-action condition did not survive the Bonferroni correction for multiple comparisons (all *p*s > 0.05).

**Table 5** summarizes the effect sizes (Cohen's *d*) of the different comparisons. In all time windows large effect sizes were found for the difference between the Action-Action vs. Action Non-action conditions as well as between the Action-Pseudoword vs. Action Non-action conditions. In the second and third time windows medium to large effect sizes were also found between the Action-Action vs. Non-action Non-action conditions and between the Action-Pseudoword vs. Non-action Non-action conditions.


#### **DISCUSSION**

Our experiments were designed to explore the impact of local linguistic context on word-induced neural activation of motor structures. There are two main results of this study. First, compatible with previous findings (Taylor and Zwaan, 2008; Zwaan et al., 2010), our work shows that linguistic focus as defined by Taylor and Zwaan (2008) modulates language-induced motor activity. The presence of an action word in an utterance is not in itself sufficient to trigger a related motor activation (see also Raposo et al., 2009; Aravena et al., 2012; Schuil et al., 2013). Second, our data further show that the linguistic surrounding and the knowledge of the situation it sets up can be sufficient to activate the motor properties of a contextually expected action verb. The actual presence of a known action word is not necessary for the activation of motor structures (for similar results in pragmatic context, see Van Ackeren et al., 2012). Importantly, however, the very same context can nonetheless fail to trigger relevant motor activation if the tested lexical item is a familiar word that has no associated motor features. Hence, contextual expectations set up by a given utterance are not in themselves sufficient to supersede a lexical meaning that does not involve a motor content. On the basis of this evidence, we argue that language-induced motor activation is neither driven by purely context-free lexical meaning access nor the result of a fully post-lexical higher order operation. Rather, the activation of motor structures results from the dynamic interactions of available lexical and contextual information that take part in the online construction of a complex mental model associated with the processing of a sentence meaning.

**Table 3 | Cohen's d for the differences between the various conditions in the three time windows.**

In Experiment 1, we used the modal operator "vouloir" (to want) to manipulate the mode of access to a described action by shifting the linguistic focus toward the agent's attitude with respect to the action. "Modality" is a grammatical category that allows relativizing the validity of sentence meaning to a set of possible situations (Perkins and Fawcett, 1983). Agent-oriented modalities focus on the internal state of an agent with respect to the action expressed by a predicate (Bybee et al., 1994). Volition thus focalizes the sentence on the agent's attitude toward the action rather than on the action itself (Morante and Sporleder, 2012). Our results show that motor structures were only recruited when the action verb was the focus of the sentence meaning and not when the sentence meaning focused on the agent's attitude toward the action. These findings are consistent with the linguistic focus hypothesis proposed by Taylor and Zwaan (2008) (see also Zwaan et al., 2010; Gilead et al., 2013). However, our study goes beyond what these authors found. Recall that Taylor and Zwaan (2008) showed that language-induced motor activation could "spill-over" from the actual action word to the linguistically adjacent post-verbal adverb, provided that the adverb modified the action. Our study goes further than these results because we show that motor activation for the *action word itself* can be switched on and off as a function of the linguistic focus. Critically, our study also provides the timing of the contextually constrained word induced motor activation: linguistic focus modulates motor activity within a temporal window that has been associated with lexical semantic retrieval (i.e., 300–500 ms after word onset, see Friederici, 2002).

The results of our first experiment thus suggest that the processing of an action verb can rapidly activate motor features of a denoted action. However, these motor features are only recruited when the denoted action is *relevant* within the currently elaborated situation model. The sensitivity of language-induced motor activation to the relationship between context and lexical semantics suggests that motor structures could serve semantic specification.

The findings of Experiment 2 show that word-induced motor activation involves an early evaluation of the context against which the relevance of the action features of the potential verbs is determined (for studies on anticipatory referential interpretation see, e.g., Kako and Trueswell, 2000; Kamide et al., 2003; Chambers and Juan, 2008; Bicknell et al., 2010). Our sentences were designed so that a fronted adverbial phrase and the subject of the sentence set up a situation in which a hand action was anticipated (i.e., the action context). Following this sentential context, the ensuing verb was either a verb denoting a hand action, a verb denoting non-action, or a pseudo-verb unknown to the subject. As expected, when the verb denoted a hand action, an increase of grip force was observed shortly after word onset. Critically, grip force also increased with a pseudo-verb unknown to the listener, but not when a known verb with no motor denotation was presented instead (e.g., "With his black pen, James **plans** to . . . "). These data clearly show that the increase of grip force was not merely an effect of context. One plausible explanation for our finding is that when a sentence contains an unknown word, the process of meaning construction fills the semantic gap with the most adequate content within the given context (in our case an action performed with the hand) until more information is available. In other words, the listener maintains the situation model elaborated from previous context and integrates the unknown word into this representation. In our experiment, the instrument described in the adverbial phrase as well as the human agent (i.e., "With his black pen, James. . . ") anticipate hand-action relevant motor features. By integrating this information the listener models a situation that foresees a particular action as a plausible thematic relation. When the ensuing verb is unknown to the listener, the elaborated situation model is maintained and motor structures are recruited. However, when the ensuing verb is a known word that does not refer to an action, the non-action verb updates the modeled situation and cancels the action representation anticipated by the context. Thus, contextual parameters might be understood as part of a representational state that is constantly restructured and revised following incoming information (see also McRae et al., 2005; Bicknell et al., 2010; Matsuki et al., 2011).

The results of our second experiment thus suggest that the construction of a situation model allows making rapid inferences and predictions for the elaboration of linguistic meaning. The brain generates a continuous stream of multi-modal predictions and pattern completion based on previous experiences (see, for example, Barsalou, 2009). This drive to predict is a powerful engine for online language comprehension (Federmeier, 2007; Elman, 2009).

In conclusion, together with our previous findings (Aravena et al., 2012), the present results indicate that the recruitment of motor structures during the processing of an action word hinges on specific conditions: (i) the context must focus on a motor action and (ii) the tested word form must not be *incompatible* with a contextually anticipated action, i.e., it has to be either compatible or neutral, as in the case of a pseudo-verb. Hence, the processing of an action word does not recruit motor structures constantly. The same action word form that provokes motor activity in one linguistic context will cease to do so in another one. Note further that in conditions in which word processing recruits motor structures, this language-induced motor activity is observed within the time frames in which lexical meanings are believed to be retrieved (Friederici, 2002; Swinney and Love, 2002).

Although an increasing number of recent studies have started to account for the context dependency of motor activity (e.g., Sato et al., 2008; Rueschemeyer et al., 2010; Mirabella et al., 2012; Papeo et al., 2012; Tomasino and Rumiati, 2013), the majority of research programs are still strongly rooted in a "dictionary-like" perspective of word meaning (see Elman, 2004, 2011; Evans, 2006; Evans and Green, 2006 for critical reviews). The novelty of our work resides in the explicit integration of a theoretical and experimental framework that could serve to link current models of sentence processing to neurobiological data on action-meaning representation. The on/off switching of motor activity observed here with a given lexical item could be interpreted as evidence against the assumption that motor activity is necessarily a relevant part of the action word meaning (see also Schuil et al., 2013). If motor semantic features were indeed accessed via a modular, exhaustive and context-independent process (cf. Swinney and Love, 2002), motor structures should be recruited in a consistent and mandatory manner. This, however, is clearly not the case. Yet, "low level" lexical semantic processes and "higher level" processes of meaning integration are not serial, discrete, and encapsulated operations (for other examples concerning semantics as well as syntax see Friston, 2003; Kamide et al., 2003; McRae et al., 2005; Chambers and Juan, 2008; Bicknell et al., 2010; Matsuki et al., 2011; Papeo et al., 2012). Context can anticipate motor semantic features of lexical items (Experiment 2) and can also switch them off when they are not relevant within the situation model (Experiment 1). Findings like these question the notion that motor semantic features are "fixed parts" of the action word meaning (Hoenig et al., 2008; Raposo et al., 2009; Egorova et al., 2013; Tomasino and Rumiati, 2013). Note that even when a verb such as "open" is processed in isolation, comprehenders are likely to represent meaning by reference to some *frequently* encountered situation, e.g., opening a door or a bottle (see the situated concept representation proposed by Barsalou, 2003).

#### **Table 4 | Results of the** *post-hoc* **tests (Bonferroni) for the different contrasts.**

**Table 5 | Cohen's d for the differences between the various conditions in the three time windows.**

The question about the functional or epiphenomenal nature of motor structures in action-language processing might therefore not be best put in terms of their participation in lexical semantic processing or in the construction of situation models. Rather, to determine the role of motor structures in language processes it is necessary to take into account the fact that language comprehension involves several sources of information that are elaborated in parallel and continuously adjusted to make sense of an utterance as it is perceived (Allwood, 2003; Cuyckens et al., 2003; Elman, 2011). Classical accounts that see language-induced sensorimotor activity either as an epiphenomenon (Mahon and Caramazza, 2008; Hickok, 2009) or as an integral part of word meaning (Glenberg, 1997; Barsalou, 1999; Pulvermuller, 1999) are both problematic in that they assume a model that endorses a fixed, dictionary-like set of lexical representations. The rapidity, flexibility, and context dependency of language-induced motor activity demonstrated here for one and the same word are not compatible with such a view. Rather, following Evans and Green (2006) and Elman (2011), we believe that words are "operators" that alter mental states (i.e., situation models) in context-dependent and *lawful* ways. If the time at which an effect occurs is indicative of its source (lexical meaning or post-lexical), the early language-driven motor effects that we observed in our experiments suggest that motor activity takes part in the construction of action word meaning in conditions in which the action is in the linguistic focus.

In short, motor knowledge is part of the *meaning potential* of action words. It participates in the construction of meaning when a currently modeled situation focuses the action and might serve *meaning-specification*. It also allows prediction and pattern completion, which are important processes for fluent and efficient online language comprehension.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Journal/10.3389/fnhum.2014.00163/abstract

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 December 2013; accepted: 04 March 2014; published online: 01 April 2014.*

*Citation: Aravena P, Courson M, Frak V, Cheylus A, Paulignan Y, Deprez V and Nazir TA (2014) Action relevance in linguistic context drives word-induced motor activity. Front. Hum. Neurosci. 8:163. doi: 10.3389/fnhum.2014.00163*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Aravena, Courson, Frak, Cheylus, Paulignan, Deprez and Nazir. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Hand specific representations in language comprehension

#### *Claire Moody-Triantis <sup>1</sup>, Gina F. Humphreys <sup>2</sup> and Silvia P. Gennari <sup>1</sup> \**

*<sup>1</sup> Department of Psychology, University of York, York, UK*

*<sup>2</sup> Neuroscience and Aphasia Research Unit, School of Psychological Sciences, University of Manchester, Manchester, UK*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Javier Gonzalez-Castillo, National Institute of Mental Health, USA Lisa Aziz-zadeh, University of Southern California, USA*

#### *\*Correspondence:*

*Silvia P. Gennari, Department of Psychology, University of York, York, YO10 5DD, UK e-mail: silvia.gennari@york.ac.uk*

Theories of embodied cognition argue that language comprehension involves sensory-motor re-enactments of the actions described. However, the degree of specificity of these re-enactments as well as the relationship between action and language remains a matter of debate. Here we investigate these issues by examining how hand-specific information (left or right hand) is recruited in language comprehension and action execution. An fMRI study tested self-reported right-handed participants in two separate tasks that were designed to be as similar as possible to increase the sensitivity of the comparison across tasks: an action execution go/no-go task where participants performed right or left hand actions, and a language task where participants read sentences describing the *same* left or right handed actions as in the execution task. We found that language-induced activity did not match the hand-specific patterns of activity found for action execution in primary somatosensory and motor cortex, but it overlapped with pre-motor and parietal regions associated with action planning. Within these pre-motor regions, both right hand actions and sentences elicited stronger activity than left hand actions and sentences, i.e., a dominant hand effect. Importantly, both dorsal and ventral sections of the left pre-central gyrus were recruited by both tasks, suggesting different action features being recruited. These results suggest that (a) language comprehension elicits motor representations that are hand-specific and akin to multimodal action plans, rather than full action re-enactments; and (b) language comprehension and action execution share schematic hand-specific representations that are richer for the dominant hand, and thus linked to previous motor experience.

**Keywords: language comprehension, action execution, action representations, premotor cortex, left hand, right hand, mirror neurons**

# **INTRODUCTION**

Theories of embodied cognition argue that language understanding implies partially simulating or re-enacting the actions being described and thus involves brain regions that are recruited in the execution of those actions (Jeannerod, 2001; Glenberg and Kaschak, 2002; Barsalou et al., 2003; Gallese and Lakoff, 2005; Barsalou, 2008). Indeed, it has been found that body part specific regions of the motor system are activated when reading language describing actions (Hauk et al., 2004; Buccino et al., 2005; Pulvermuller, 2005; Tettamanti et al., 2005) and they do so to an effort specific degree (Moody and Gennari, 2010), suggesting that language recruits detailed action representations that would also be required for the execution of the same specific action.

However, the nature of the representations that are shared between action and language remains unclear, in particular, their level of specificity, i.e., to what extent do we re-enact the execution of an action described by language? Indeed, both primary motor and pre-motor regions have been associated with language comprehension and these contrasting findings imply different levels of specificity in the representations elicited by language: if primary motor regions are recruited during language comprehension, comprehenders can be thought to more closely re-enact the action described as if they were performing it, because these regions are directly connected to the spinal cord and musculature (Dum and Strick, 1996). Alternatively, if pre-motor and parietal regions are recruited, comprehenders may activate more schematic action plans that do not involve execution aspects *per se*, since pre-motor regions are typically associated with planning (Cisek et al., 2003).

The view that language may involve highly specific action representations is consistent with fMRI language studies that have reported the recruitment of primary motor regions (Hauk et al., 2004; Rüschemeyer et al., 2007; Kemmerer et al., 2008; Kemmerer and Gonzalez-Castillo, 2010) and with TMS studies showing that stimulation of primary motor cortex during language comprehension modulates body-part specific motor evoked potentials (Oliveri et al., 2004; Buccino et al., 2005; Candidi et al., 2010). In contrast, the view that language involves more schematic action representations is supported by many language studies showing the recruitment of planning-related pre-motor and parietal regions, rather than primary motor regions (Noppeney et al., 2005; Aziz-Zadeh et al., 2006; Moody and Gennari, 2010; Willems et al., 2010; Meteyard et al., 2012).

To shed light on this issue, we conducted an fMRI study directly comparing action execution and language comprehension. The tasks were designed to be as similar as possible to increase the sensitivity of the comparison. Every participant performed an action execution and a language comprehension task. We focused on hand-specificity, i.e., whether the action is performed, or described as performed, with the left or the right hand. Importantly, the actions included in the execution task held a one-to-one correspondence with the content of the sentences read in the language task. Thus, participants executed left and right hand button presses in the execution tasks and correspondingly read sentences describing left or right hand button presses in the language task, albeit in different syntactic forms. In both tasks, participants were required to match a visual cue (e.g., L, R) referring to a left or right hand action with the execution of the action itself or the content of the sentence, thus keeping participants focused on the directionality of the stimuli.

This design has the potential of providing more homogeneous activations and more precise and sensitive comparisons across conditions than previous studies. First, the linguistic stimuli utilized refer to the same action, instead of classing together different verbs (e.g., *grasp, touch, give*), which often have different senses and syntactic properties. Second, the linguistic meanings targeted in the experiment had a one-to-one correspondence with the actions executed in the execution task, unlike previous studies comparing meaningless actions (e.g., finger movements) with semantically complex verbs (e.g., *grasp*) (Aziz-Zadeh et al., 2006). Finally, the execution task preceded the language task to encourage imagery during the language task, thus increasing the chances to detect potentially weak activity in primary motor regions.

Importantly, the focus on hand specificity provides simple ways to distinguish between primary motor and pre-motor regions, say, in comparison to body-part manipulations, because the activation patterns for left and right hand actions within primary motor and pre-motor cortices are relatively well understood. Indeed, primary motor cortex has long been thought to play an important role in the control of limbs on the contralateral side of the body (Tanji et al., 1988; Dassonville et al., 1997; Aziz-Zadeh et al., 2002; Cisek et al., 2003). Thus, executing, observing or imagining a right-hand movement would recruit more neurons and stronger activity in the left primary motor cortex, and vice versa. In contrast, activity in pre-motor regions responds to both right and left hand actions both in cell recording and fMRI studies (Tanji et al., 1988; Kermadi et al., 2000; Cisek et al., 2003; Hanakawa et al., 2006; Horenstein et al., 2009), although these regions may respond to different degrees (see below). This is due to the fact that the pre-motor cortex houses more schematic representations responsible for planning rather than executing actions, and is thus less directly linked to the spinal cord (Rizzolatti and Luppino, 2001).

Therefore, we predicted that if language comprehension involves hand-specific representations, the pattern associated with either execution or planning of left and right hand actions in primary motor or premotor areas should also be observed in language comprehension. Specifically, if language recruits schematic planning representations only, then a similar pattern of activity across language comprehension and planning should be found in pre-motor areas, but if linguistic representations are more detailed in execution content, language comprehension should match the execution-specific activity pattern in primary motor regions, i.e., a contralateral pattern.

# **MATERIALS AND METHODS**

### **PARTICIPANTS**

Eighteen participants were recruited for the experiment, all reported to be right-handed native English speakers with no known neurological disorders, and to use the right hand in daily and sport activities (14 female, 4 male; mean age 21, age range 19–23 years).

# **MATERIAL**

In the execution task, visual letter cues were used to elicit button presses that could include one or two fingers (e.g., LX, RX, LL, RR). In the language comprehension task, all sentences were written in the first person (e.g., *I am pressing...*) to encourage the activation of the participant's own motor experience during language comprehension. Each sentence described left/right hand button presses using either one or two fingers. In total, 160 action sentences were presented. To encourage participants to process the sentence meaning and to maintain their attention, the phrasing of the sentences was varied. For example, when describing one button press with the left hand, participants could read one of 4 different sentences (see **Table 1**). The length of the sentences varied from 27 to 47 characters (mean length 37.25). However, to ensure that the sentences were matched across conditions, the same structure was used in the left and right conditions, with the words *right* and *left* varying accordingly. Therefore, psycholinguistic variables such as length and frequency should not influence the results.

# **TASK PROCEDURE AND DESIGN**

Ethical approval for the study was obtained from the Ethics Committee at the York Neuroimaging Centre, where the study was carried out. Before the scanning session, participants were familiarized with the letter patterns to be used in the execution task. They practiced this task until they felt confident. They then practiced the subsequent language task, which used the same cues but required different motor responses. All participants performed the execution task before the language task.

#### **Table 1 | Sentence stimuli.**


# *Action execution task*

A go/no-go task was used to elicit button presses. During the experiment, participants held one button box in each hand resting on their lap in a comfortable position. Each button box had two buttons and participants were instructed to rest their index and middle fingers on the buttons of the boxes during the experiment. Visual stimuli were projected through a mirror fixed to the head coil. The go/no-go cues were pairs of letters in red uppercase 50 pt text. In total 200 action stimuli were presented, 160 go trials and 40 no-go trials. The go trials instructed participants to press either one (RX, LX) or two buttons (RR, LL) using either the right or the left hand as quickly as possible (there were 40 trials per cue). During practice, participants learned to match each letter of the visual cue onto each of the four buttons (and fingers), so that RX indicated one button press with the right middle finger, RR indicated pressing both buttons simultaneously (middle and index finger) and so on. The no-go trials instructed participants to refrain from pressing a button (either XR or XL, i.e., an initial X meant no response at all). Visual cues lasted for 500 ms and were then replaced by *HH*, which stayed on the screen until the next cue. Cues from different conditions (left/right) using one or two fingers were intermixed in an event-related design following optimal stimulus order (the probability of each condition following any other condition was constant) and random inter-trial times obtained by a schedule optimizing algorithm (http://surfer.nmr.mgh.harvard.edu/optseq/). Therefore participants could not predict the upcoming stimulus and had to plan each trial. Inter-trial intervals varied in duration from 2 to 26 s (average 5.8 s). The task lasted 960 s in total.

#### *Language comprehension task*

Participants remained in the scanner in the same position and holding the same button boxes as in the previous task. Participants were presented with 160 sentences in white 30 pt text (on black background) each lasting 2000 ms and were asked to read the sentences for meaning. **Table 1** exemplifies the different formats in which sentences were presented (10 cases of each example). After each sentence presentation, a sequence of 37 X's was presented (37 being the average character length of all sentential stimuli) until the next stimulus sentence appeared. To keep participants' attention on the sentential content, 34 catch trials (also lasting 2000 ms) were included in the design (21.25% of trials). As in action execution, an event-related design was used where trial types were intermixed in such a way that the probability of each trial type (sentence conditions plus catch trials) following any other type was constant, and therefore trial types could not be predicted (the order of trials and inter-trial times were calculated with the same schedule optimizing algorithm as above). Inter-trial intervals ranged from 2 to 26 s (average 4.96 s). Catch trials asked participants about the sentence content using the same cues that were used in the execution task, e.g., *RR?* Participants had to indicate whether the meaning of the previously read sentence corresponded to the cue (meaning judgment task). To respond to this question, they had to use a left hand button press (index finger for *yes* and middle finger for *no*). For example, participants may read *I'm pushing two buttons on the right*, and after a few seconds (corresponding to the variable inter-trial time), they may be presented with *RR?*, in which case the correct answer is *yes* (a left index finger button press). In order to perform well on this task participants had to read the sentences carefully for their hand-specific action meaning, and the task therefore ensured that participants maintained their attention throughout the experiment.

# **DATA COLLECTION PARAMETERS**

A 3T GE Signa Excite MRI scanner was used to collect both high-resolution structural images and functional images. Functional images were obtained using a gradient-echo EPI sequence (TR 2000 ms, TE 50 ms, flip angle 90°, matrix 64 × 64, field of view 24 cm) with 38 axial slices of thickness 3.0 mm. The resulting voxel size was 3.75 × 3.75 × 3 mm. Note that our TE specification is near those considered optimal for detecting signal in primary motor cortex (Fera et al., 2004). Functional images excluded the cerebellum and, in some participants, inferior portions of the temporal lobe. A T1 FLAIR image was also obtained in order to facilitate the registration between the high-resolution structural and functional data.
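As a quick check on the acquisition geometry, the in-plane voxel size follows from dividing the field of view by the matrix size. The sketch below is illustrative only: the function name `voxel_size_mm` is hypothetical, and it assumes a square field of view and no inter-slice gap, neither of which is stated explicitly above.

```python
def voxel_size_mm(fov_cm: float, matrix: int, slice_thickness_mm: float) -> tuple:
    """Return (x, y, z) voxel dimensions in millimetres.

    Assumes a square field of view and no inter-slice gap.
    """
    in_plane = fov_cm * 10.0 / matrix  # convert cm to mm, then divide by matrix size
    return (in_plane, in_plane, slice_thickness_mm)


# Parameters reported above: FOV 24 cm, 64 x 64 matrix, 3.0 mm slices.
voxel = voxel_size_mm(24.0, 64, 3.0)
print(voxel)
```

With the reported parameters this gives 3.75 mm in-plane and 3 mm through-plane.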

# **DATA ANALYSIS**

Both first level and higher-level analyses were carried out for the language and the action task separately using FEAT (FMRI Expert Analysis Tool) Version 5.91, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). We have followed the standard order of processes built into the FSL FEAT analysis. Pre-processing steps included brain extraction, slice-timing correction, motion correction (Jenkinson et al., 2002), spatial smoothing using a Gaussian kernel of FWHM 8 mm and high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with sigma = 25.0 s). Time-series analysis was carried out using FILM with local autocorrelation correction (Woolrich et al., 2001). A boxcar model of the timing of events was created involving the onset and length of each stimulus event, which was then convolved with a hemodynamic response (gamma) function. For both action and language data the events were modeled at the onset of the stimulus presentation with action trials lasting 500 ms and language trials lasting 2000 ms. For the language task, the catch trials were modeled separately to partial out the participant's motor responses but were excluded from any statistical average or comparison of the language data. No-go trials in the execution task were also modeled out and not analyzed further.
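To make the boxcar-plus-gamma model concrete, the following minimal sketch builds a predicted response for a single 2000 ms language trial. It is not the FEAT implementation: the gamma parameters (mean lag 6 s, standard deviation 3 s), the 0.1 s sampling step, the 10 s onset, and all function names are assumptions chosen for illustration.

```python
import math


def gamma_hrf(t: float, mean_lag: float = 6.0, sd: float = 3.0) -> float:
    """Gamma probability density evaluated at time t (seconds)."""
    if t <= 0.0:
        return 0.0
    shape = (mean_lag / sd) ** 2      # k = 4 for the assumed values
    scale = sd ** 2 / mean_lag        # theta = 1.5 s for the assumed values
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def boxcar(onsets, duration, n_samples, dt):
    """1 while a stimulus is on screen, 0 otherwise."""
    signal = [0.0] * n_samples
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_samples)):
            signal[i] = 1.0
    return signal


def convolve(signal, kernel, dt):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k * dt
    return out


dt = 0.1                                   # sampling step in seconds (assumed)
kernel = [gamma_hrf(j * dt) for j in range(int(30 / dt))]
# One hypothetical language trial: onset at 10 s, lasting 2000 ms.
stimulus = boxcar(onsets=[10.0], duration=2.0, n_samples=400, dt=dt)
regressor = convolve(stimulus, kernel, dt)
peak_time = regressor.index(max(regressor)) * dt
print(f"predicted response peaks about {peak_time - 10.0:.1f} s after onset")
```

Under these assumed parameters the predicted response stays at zero until stimulus onset and peaks several seconds later, which is why the model fit is sensitive to the event durations (500 ms vs. 2000 ms) entered for each task.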

Several contrasts were run at the individual level between the different conditions in the execution and the language task. For both the execution and the language data, all actions or sentences together (irrespective of hand) were compared to rest to identify all action or all language regions, and right and left hand actions or sentences were also compared against one another to find those areas that were significantly more involved in performing or reading about left or right hand actions (R > L, L > R). Individual level analyses were then entered into high-level mixed-effects modeling built into FSL, taking into account both variance and parameter estimates from individual-level results. All higher-level analyses reported below were carried out using FLAME (FMRIB's Local Analysis of Mixed Effects; Woolrich et al., 2004) within the right or left hemisphere to increase statistical power. Z (Gaussianised T/F) statistic images were thresholded using a Gaussian Random Field-theory (GRF)-based maximum height with a (corrected) significance threshold of *p* = 0.05 (Worsley et al., 1992). For convenience, we will refer to this correction method as *GRF-based correction*.

#### *Region of interest analyses in hand- and execution-specific regions*

To evaluate whether language activity within primary motor regions showed the same pattern as that of action execution, we used execution-specific activity to identify regions for further analyses of hand-specific language activity. To isolate hand-specific execution regions that would not include common planning regions, we used execution activity resulting from contrasting left-hand and right-hand actions, i.e., the contrasts R > L and L > R, obtained with GRF-based correction at *p* = 0.05. Subtracting left from right and right from left action performance should cancel out any general planning activity that is shared across hands, thus identifying execution-specific activity, which should show the typical contralateral pattern. Indeed, simply comparing left-hand or right-hand execution relative to rest may still include regions that are common to both hands, and thus likely to reflect common planning regions, because these general contrasts only identify voxels active for one hand irrespective of the other hand. These contrasts yielded, as expected, the contralateral pattern shown in **Figure 1** in the blue-to-cyan and red-to-yellow scales. Within these hand- and execution-specific contralateral ROI masks, we then ran a high-level analysis (GRF-based correction, *p* = 0.05) for the language data irrespective of hand, i.e., the contrast all sentences vs. rest, to establish whether language comprehension activated these hand-specific execution regions. This yielded significant language activity (irrespective of hand) shown in green in **Figure 1**. The average percent signal change within the significant cluster resulting from this high-level analysis was then extracted for each participant using FSL tools. *T*-tests (with subjects as random factor) were then used to determine whether there was any difference between left-hand and right-hand sentences.
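The final step, comparing left-hand and right-hand sentences within participants, amounts to a paired-samples *t*-test on the extracted percent signal change. The sketch below shows the computation under the assumption of one value per participant and condition; the function name `paired_t` and the four data points are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for two equal-length lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical percent-signal-change values for four participants (not study data).
right_hand = [0.42, 0.35, 0.51, 0.40]
left_hand = [0.30, 0.33, 0.41, 0.36]
t_value, df = paired_t(right_hand, left_hand)
print(f"t({df}) = {t_value:.2f}")
```

Because each participant contributes to both conditions, the test operates on within-participant differences, so the degrees of freedom equal the number of participants minus one.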

#### *Region of interest analyses in non-hand-specific regions*

**FIGURE 1 | Results from the action execution task showing the contralateral pattern of activation specifically responding to left hand actions (in blue, left hand > right hand contrast) and right hand actions (in red, right hand > left hand) (whole brain GRF-based correction, *p* = 0.05).** Significant language comprehension activity responding to all sentence types within each execution region is shown in green.

To isolate regions that were sensitive to all hand actions irrespective of hand and thus were likely to include activations in planning regions, we contrasted all actions relative to rest (GRF-based correction, *p* = 0.05). The corresponding contrast was also conducted in the language task to identify all regions involved in language comprehension irrespective of hand (GRF-based correction, *p* = 0.05). By multiplying these execution and language comprehension results, we localized several clusters that were significantly active in both tasks, and thus indicated overlapping regions across tasks. This is equivalent to a conjunction analysis as previously referred to in the literature (Nichols et al., 2005). These overlapping clusters thus acted as functional localizers for the regions targeted for further analysis of more specific contrasts (Poldrack, 2007). In particular, to establish whether there were hand-specific activations within these overlapping regions, we extracted the percent signal changes for each hand relative to rest for each participant in each of the main overlapping clusters shown in **Figure 2**. These values were then analyzed with paired *t*-tests (with subjects as random factor) to examine whether, either in action planning or in language comprehension, there was stronger activity for a specific hand, and more generally, to examine whether a similar pattern of activity was shown for planning and language, as hypothesized.
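The multiplication step can be illustrated with a small sketch: two statistic maps are binarised and multiplied voxel by voxel, so that only voxels surviving in both tasks remain. The function name `conjunction`, the threshold value, and the five-voxel maps are hypothetical; in the actual analysis the maps were already corrected with the GRF-based procedure before being combined.

```python
def conjunction(map_a, map_b, threshold):
    """Binarise two statistic maps at `threshold` and multiply them voxel-wise."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must have the same number of voxels")
    mask_a = [1 if z > threshold else 0 for z in map_a]
    mask_b = [1 if z > threshold else 0 for z in map_b]
    return [a * b for a, b in zip(mask_a, mask_b)]


# Hypothetical z-values for five voxels (not study data).
execution_z = [5.1, 0.0, 6.3, 4.9, 0.0]
language_z = [4.8, 5.2, 0.0, 5.0, 0.0]
overlap = conjunction(execution_z, language_z, threshold=4.5)
print(overlap)
```

A voxel enters the overlap mask only if it is suprathreshold in both maps, which is the sense in which the product behaves like a conjunction.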

# **RESULTS**

**BEHAVIORAL DATA**

#### *Execution task*

The time taken to perform the instructed action and the number of errors made were measured.

**FIGURE 2 | Action execution activity (in blue) and language comprehension activity (in red) in response to all actions and all sentence stimuli compared to rest (whole brain GRF-based correction, *p* = 0.05).** The regions in which language and execution activity overlapped (conjunction) are shown in green and are labeled as dorsal pre-motor (dPM), ventral pre-motor (vPM) and parietal lobe (PL).

*Reaction times.* Trials containing errors or responses longer than 3 standard deviations from the mean were excluded from the reaction times analyses. These exclusions constituted about 3.40% of the total data. We found that participants responded faster with the right hand (mean = 615.8 ms) than the left hand (mean = 630.2 ms) [*t*(18) = 2.77, *p* = 0.01], thus providing supporting evidence that our participants were indeed right-handed.
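The exclusion rule for slow responses can be sketched as follows. This is a minimal illustration that assumes the mean and standard deviation are computed over a single list of reaction times; whether the original analysis computed them per participant, per condition, or over the pooled data is not stated here. The function name `trim_slow_responses` and the data are hypothetical.

```python
from statistics import mean, stdev


def trim_slow_responses(rts, n_sd=3.0):
    """Drop reaction times more than `n_sd` sample SDs above the mean."""
    cutoff = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if rt <= cutoff]


# Hypothetical reaction times in ms (not study data): 19 typical trials and one slow one.
rts = [600.0 + 5.0 * i for i in range(19)] + [2500.0]
kept = trim_slow_responses(rts)
print(len(rts) - len(kept), "trial(s) excluded")
```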

*Accuracy.* A response was classed as an error if a participant either failed to make a response or responded using the wrong hand. On average participants made an error on 3.06% of action trials, although there was no reliable difference between left and right hand actions (Wilcoxon Signed-Rank test: *z* = −0.637, *p* > 0.05). The number of errors was also calculated for no-go trials, with errors being defined as those no-go trials where an action was incorrectly performed. On average, errors on no-go trials were relatively infrequent and were made 2.5% of the time. Furthermore, almost all errors (94%) were consistent with the directional letter in the cue (i.e., if the cue was XR the right button was most likely to be erroneously pressed, and vice versa).

#### *Language task*

Due to experimenter error, no responses were recorded from one participant. For the remaining 17 participants, on average participants responded correctly on 90.7% of the question trials and the mean reaction time for the responses was 2605 ms, as measured from the presentation of the cue (e.g., *RR?*).

### **OVERALL FUNCTIONAL ACTIVATIONS FOR ACTION EXECUTION AND LANGUAGE COMPREHENSION**

#### *Action representations in hand- and execution-specific regions*

As anticipated from previous research, hand-specific action execution (left > right and right > left) elicited stronger responses in the contralateral hemisphere (GRF-based correction, *p* = 0.05) (**Figure 1**). The strongest activity was centered around the post-central gyrus and extended into the central sulcus and pre-central gyrus (left hemisphere peak: −40, −26, 54; right hemisphere peak: 42, −30, 58). The corresponding corrected analysis for the language comprehension data contrasting one hand relative to the other, however, did not elicit any significant response. To make sure that the stringent correction level did not cause us to miss hand-specific language activity, we conducted further ROI analyses within the contralateral execution clusters, as described in Region of Interest Analyses in Hand- and Execution-specific Regions and reported below.

#### *Action representations in non-hand-specific (planning) regions*

The contrast of all actions relative to rest (GRF-based correction, *p* = 0.05) revealed several brain regions that were commonly activated by the execution task irrespective of hand. These included pre-motor and parietal regions, as well as other regions. Peak activations for the left hemisphere are listed in **Table 2**, and the overall pattern of execution activity is shown in the blue-to-cyan scale in **Figure 2**. The contrast of all sentences relative to rest also revealed several brain regions that included parietal, pre-motor, posterior temporal and inferior frontal regions (GRF-based correction, *p* = 0.05). Peak activations for the left hemisphere are listed in **Table 2**, and the overall language activity is shown in the yellow-to-red scale in **Figure 2**. The multiplication of the activity elicited by each of these tasks indicated regions that were significantly activated for both action execution/planning and language (conjunction), as shown in green in **Figure 2**. These common activations suggest that common neural representations were recruited for both execution/planning and language comprehension. The overlapping regions were located in the middle frontal gyrus/dorsal pre-central gyrus, superior parietal lobule/angular gyrus, and ventral pre-central gyrus and were larger in the left than the right hemisphere. Because these regions were associated with more than one anatomical label according to the Harvard-Oxford Cortical Structural Atlas, henceforth we refer to them as dorsal or ventral pre-motor regions (dPM, vPM) and parietal lobe regions (PL). The centers of gravity of these regions are listed in **Table 2**.

#### **Table 2 | Peak activations for each task and center of gravity for overlapping regions.**

*Coordinates are given for the left hemisphere, which were analogous to those in the right hemisphere. Cluster sizes are given for the GRF-based corrected images at a threshold of z* = *4.5.*

#### **REGIONS OF INTEREST**

#### *Hand- and execution-specific regions*

Hand-specific language activity was assessed in two steps (see section Region of Interest Analyses in Hand- and Execution-specific Regions) because the direct contrast between left- and right-hand sentences did not show any significant voxel in a high-level analysis masked by the hand- and execution-specific ROIs of **Figure 1**. We first conducted a high-level analysis within hand-specific execution ROIs to detect any language activity irrespective of hand (all sentences vs. rest). This analysis revealed significant clusters shown in green in **Figure 1**. The clusters were located in the superior portion of the pre-central gyrus (left hemisphere peak: −32, −10, 64). Within these clusters, we then evaluated hand-specific activity by extracting the percent signal change for left and right hand sentences vs. rest for each participant and for each of the left and right hemisphere clusters. *T*-tests comparing left vs. right hand sentence activity within these clusters revealed no significant difference (*p* > 0.4). The hand-specific pattern seen in action performance is therefore not seen when comprehending hand-specific action language within these execution areas.

#### *Non-hand specific (planning) regions*

To examine whether planning and language showed a similar pattern of activity within the regions that were significantly activated in both tasks, as hypothesized, we contrasted right- and left-hand actions or sentences in each hemisphere for each of the common regions of activation identified for the language and execution tasks (see above and **Figure 2**). The overall pattern of results is summarized in **Figure 3**. For all the common clusters of activation in the left hemisphere, we found a parallel pattern of activation across action execution and language comprehension. As shown in **Figure 3**, right-hand actions or sentences elicited stronger activity than left-hand actions or sentences [dPM language activity: *t*(17) = 2.71, *p* < 0.02; dPM execution activity: *t*(17) = 5.98, *p* = 0.0001; PL language activity: *t*(17) = 3.42, *p* < 0.003; PL execution activity: *t*(17) = 2.46, *p* < 0.03; vPM language activity: *t*(17) = 2.53, *p* < 0.01; vPM execution activity: *t*(17) = 2.53, *p* < 0.02]. For the common clusters of activation in the right hemisphere, the pattern of results was numerically similar to that in the left hemisphere, with right-hand actions or sentences also eliciting a stronger response than left-hand actions or sentences. However, only the vPM cluster showed statistically significant results for execution and language [language activity: *t*(17) = 3.61, *p* < 0.002; execution activity: *t*(17) = 2.96, *p* < 0.009], with all other right-hemisphere regions not reaching significance (*p* > 0.05). Note that these results, and particularly those in pre-motor regions, could not be due to eye movements during reading, which we could not control for. First, left vs. right sentences were identical except for one word, and are thus likely to elicit similar eye movements; the differences in hand-specific activity therefore cannot be due to more or less eye movement in one condition relative to the other. Second, the coordinate ranges typically associated with the frontal eye field (Paus, 1996; Swallow et al., 2003) do not correspond to those reported here, consistent with the fact that this region is anterior to the hand area. Finally, the execution task, with which language activity overlapped, involved only central fixation, so the activity it elicited cannot be due to eye movements.

Overall, these results suggest that hand-specific effects are found in regions of common activity for action planning and language comprehension in left pre-motor and parietal regions and right pre-motor regions. Because these regions were active for the execution of either hand action and were not located in primary motor regions, they reflect more schematic representations associated with planning, rather than muscle control. Therefore, action execution/planning and language comprehension appear to recruit some aspects of these more schematic representations. Interestingly, both language comprehension and action execution show hand-specific effects characterized by stronger responses for the right hand than for the left hand, suggesting a dominant hand effect, since our participants reported being right-handed. We will discuss this specific effect below.

# **DISCUSSION**

This study aimed to investigate the nature of the representations that are recruited by hand-specific information during language comprehension, and to assess the extent to which we simulate the actions that we read about by comparing language activity to motor-related activity elicited by similar tasks. Participants were asked to perform left- and right-hand button presses and read sentences that described the same left- and right-hand button presses. Hand-specific activity for the language task was then assessed within the primary motor hand-specific contralateral regions where execution and language activity overlapped. We predicted that if we re-enact the actions we read about almost exactly, hand-specific contralateral activity should occur in execution-specific areas such as primary motor cortex for language comprehension in the same way that it does in action execution. This prediction was not supported. Although there was some significant language activity in the superior portion of the pre-central gyrus, a contralateral pattern of activity for hand-specific actions or any sensitivity to hand-specificity was not seen in language as it was in action execution (section Hand- and Execution-specific Regions). This suggests that language comprehension does not show sensitivity to hand-specificity within these execution areas, and therefore that the hand-specific information that was required for language comprehension was not represented within execution areas.

We also predicted that if hand-specific information is represented in a more schematic and general way during language comprehension, then those areas that are responsible for action planning (including the premotor and parietal cortex) would display equivalent activation patterns in the action execution and language task for left- and right-hand actions. This prediction was assessed in those regions of the premotor and parietal cortex that were activated during action execution *and* language comprehension irrespective of hand, i.e., these regions were significantly active for both left- and right-hand actions or sentences [section Actions Representations in Non-hand-specific (Planning) Regions], and we further examined whether there were any hand-specific differences in the amplitude of this activation [section Non-hand Specific (Planning) Regions]. We found that there was more activity for right-hand actions and right-hand sentences than left ones in most of the pre-motor and parietal regions examined within the left hemisphere (a dominant hand effect) as well as in pre-motor areas of the right hemisphere. This indicates a similar pattern of activation across language comprehension and action execution/planning in pre-motor regions, as predicted. Together these results provide support for embodied cognition and suggest that language recruits detailed hand-specific action representations that are nevertheless one step removed from re-enacting the execution of the action itself. In other words, language comprehension does not fully activate all action components that are required for the performance of that action. Instead, only more schematic action representations that are stored in areas responsible for action planning are recruited for language.

The dominant hand effect, i.e., that right-hand actions or sentences elicited more activity than left-hand ones in pre-motor regions, is consistent with previous studies suggesting that motor representations in language comprehension and action observation are modulated by motor experience (e.g., Buccino et al., 2004; Calvo-Merino et al., 2005; Beilock et al., 2008). Indeed, language studies have shown that right handers and left handers activate pre-motor cortex to a different degree in different hemispheres (Willems et al., 2010), and pre-motor activity in the dominant hemisphere during comprehension of language describing hockey actions correlates with the degree of hockey experience (Beilock et al., 2008). In action observation studies, more activity is also seen in pre-motor areas for observing human compared to non-human actions (Buccino et al., 2004), biomechanically performable actions compared to non-performable actions (Costantini et al., 2005; Candidi et al., 2008), or actions that a participant is expert, rather than inexperienced, in performing (Calvo-Merino et al., 2005; Haslinger et al., 2005; Cross et al., 2006; Kiefer et al., 2007; Beilock et al., 2008). This suggests that increased experience results in the establishment of a more elaborate action representation, leading to stronger activations in action execution, observation, and language comprehension.

Our results are consistent with much of the literature on pre-motor cortex showing that, unlike primary motor regions, ventral and dorsal premotor regions serve a variety of cognitive functions, supporting not only action planning, e.g., via the formation of visuo-motor associations, but also perceptual analysis, serial prediction and attentional functions (Johnson et al., 1996; Boussaoud, 2001; Picard and Strick, 2001; Simon et al., 2002; Schubotz and von Cramon, 2003; Cisek and Kalaska, 2004; Chouinard et al., 2005). In particular, this research has proposed functional differentiations between dorsal and ventral portions of the premotor cortex (e.g., Schubotz and von Cramon, 2003). In this respect, our results suggest common representations for execution/planning and language comprehension in these two premotor regions, as we found a more dorsal pre-motor cluster in the left hemisphere and another cluster that was more ventral and bilateral (**Figure 2**). Although both left hemisphere clusters are located in the proximity of previously reported hand-related motor and language activity, which has indeed been reported either more dorsally or more ventrally (see summary of coordinates in Kemmerer and Gonzalez-Castillo, 2010), the fact that two distinct clusters were found here suggests different roles for these regions. More dorsal aspects of the pre-motor cortex are implicated in spatial attention and, specifically, in the use of current or expected sensory features of the environment relative to the body (Boussaoud, 2001; Schubotz and von Cramon, 2003), which is consistent with the attention to directionality relative to the body required in both our tasks. Therefore, it is possible that different aspects of the action representation are distributed across the pre-motor cortex, one cluster linked to spatial features and another to motor plans or schemas.

More importantly for the purpose of our study, our results have implications for theories of embodied cognition as applied to language. Although we cannot exclude the possibility that other more sensitive methods or more targeted designs may reveal language sensitivity in primary motor regions, the same experimental conditions that elicited significant effects in pre-motor regions were not sufficient to detect hand-sensitive activity in primary motor regions. Thus, the comprehension of hand action sentences does not seem to involve action representations that are specifically recruited for left or right hand executions in contralateral hemispheres, even when imagery was encouraged by the order and similarity of the execution and language tasks. This suggests that those regions of primary motor cortex directly linked to the spinal cord are not activated by language and language-elicited imagery in similar conditions to those that activate pre-motor regions. This contrasts with previous fMRI and TMS reports, which may have been tapping into planning components and did not distinguish between effector-specific plans and executions. In TMS studies in particular, it is very likely that stimulation of primary motor cortex will stimulate pre-motor cortex too, due to strong interconnections between the two (Chouinard et al., 2003). Therefore, language does not appear to elicit simulations of the action described *as if we were performing the action*, but rather *as if we had the intention or idea of performing the action*.

Nevertheless, we do find stronger activity for the dominant right hand bilaterally in the pre-central gyrus, and in other left pre-motor and parietal regions. According to previous findings, this suggests that action plans or schemas in these regions activate richer representations for the dominant hand, and in this respect, they are hand-specific representations, i.e., they include information as to whether the action would be executed with the left or the right hand. This is particularly revealing because previous language studies have suggested that hand-action representations are body specific, i.e., right and left handers display opposing activation patterns across the hemispheres in premotor regions, with right handers showing more activity in the right hemisphere than the left hemisphere and vice-versa (Willems et al., 2010). Here, we go a step further and show that these pre-motor representations are not only body-specific but also hand-specific. Moreover, if the rich experience associated with the dominant hand is indeed responsible for stronger activity, our results suggest that hand dominance is represented not only in the dominant hemisphere but also bilaterally in the pre-central gyrus, pointing to shared functions across the hemispheres.

These observations are consistent with the fact that mirror neurons have primarily been reported in pre-motor regions, rather than primary motor ones, and are considered multimodal, often integrating visual, somatosensory and motor information (e.g., Rizzolatti et al., 2002; Gallese and Lakoff, 2005). It is thus conceivable that language may also activate them, particularly in a task where attention to hand effector and directionality is required. However, these partial re-enactments only support or contribute to language comprehension, as other regions were also recruited for language comprehension but not action execution, most notably, the left inferior frontal gyrus and the posterior temporal lobe (see **Figure 2**). These two regions have been consistently implicated in many lesion and imaging studies of language processing (Jefferies and Lambon Ralph, 2006; Tyler and Marslen-Wilson, 2008; Humphreys and Gennari, 2014), suggesting that their role is critical to language comprehension. Therefore, our study demonstrates those aspects of the language network where action representations are shared with action planning.

Nevertheless, the cognitive role of mirror-like activity in the brain still remains to be fully understood. Recent findings suggest that mirror-like responses can also be found in primary motor cortex, and that canonical mirror responses can also be found in the hippocampus, SMA and medial frontal regions (Tkach et al., 2007; Lepage et al., 2008; Mukamel et al., 2010). These same regions also contain cells whose patterns of excitation and inhibition are opposite to those observed during action execution or observation, suggesting a role for both integration and differentiation of representations across the brain. Complex activity patterns of neural assemblies across the brain have already been studied in detail by researchers interested in the control of behavior, for which attention and working memory (the need to maintain a goal in memory through complex sequences of actions) are key cognitive processes (e.g., Fuster, 2001). This sort of systemic approach, where temporally integrated activity patterns are investigated across a large network, is likely to provide critical clues for understanding emergent cognitive processes.

# **CONCLUSION**

The present results suggest that, within the constraints and assumptions of fMRI research, we do not appear to re-enact the actions that we read about in all the same brain areas that are required for action execution. Only very particular action representations are recruited by language: those involved in more abstract stages of action planning in pre-motor cortex. Nevertheless, the representations that are stored in these planning regions are highly specific in that they contain hand-specific information. This is therefore consistent with embodied theories of language proposing that language understanding involves the partial re-enactment of the action described, including hand-specific representations, but we do not accurately re-enact the action as such throughout the motor system. Language understanding is therefore somewhat removed from action execution as it relies upon higher-level cognitive regions.

#### **REFERENCES**


premotor cortex involvement as revealed by fMRI. *J. Neurophysiol.* 88, 2047–2057. Available online at: http://jn.physiology.org/content/88/4/2047.long


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 February 2014; accepted: 11 May 2014; published online: 03 June 2014. Citation: Moody-Triantis C, Humphreys GF and Gennari SP (2014) Hand specific representations in language comprehension. Front. Hum. Neurosci. 8:360. doi: 10.3389/fnhum.2014.00360*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Moody-Triantis, Humphreys and Gennari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Language comprehension warps the mirror neuron system

# *Noah Zarr<sup>1</sup>, Ryan Ferguson<sup>1</sup> and Arthur M. Glenberg<sup>1,2</sup>\**

*<sup>1</sup> Department of Psychology, Arizona State University, Tempe, AZ, USA*

*<sup>2</sup> Department of Psychology, University of Wisconsin-Madison, Madison, WI, USA*

#### *Edited by:*

*Agustin Ibanez, Institute of Cognitive Neurology, Argentina*

#### *Reviewed by:*

*Manuel De Vega, Universidad de La Laguna, Spain*

*Susanne Kristen, LMU Munich, Germany*

*\*Correspondence:*

*Arthur M. Glenberg, Department of Psychology, Arizona State University, 950 S. McAllister, Tempe, AZ 85287, USA e-mail: glenberg@asu.edu*

Is the mirror neuron system (MNS) used in language understanding? According to embodied accounts of language comprehension, understanding sentences describing actions makes use of neural mechanisms of action control, including the MNS. Consequently, repeatedly comprehending sentences describing similar actions should induce adaptation of the MNS thereby warping its use in other cognitive processes such as action recognition and prediction. To test this prediction, participants read blocks of multiple sentences where each sentence in the block described transfer of objects in a direction away or toward the reader. Following each block, adaptation was measured by having participants predict the end-point of videotaped actions. The adapting sentences disrupted prediction of actions in the same direction, but (a) only for videos of biological motion, and (b) only when the effector implied by the language (e.g., the hand) matched the videos. These findings are signatures of the MNS.

**Keywords: language comprehension, mirror neurons, neural adaptation, motor system, embodied cognition**

# **INTRODUCTION**

Language comprehension is a simulation process: A sentence is understood by using linguistic symbols to drive neural systems of action (Rizzolatti and Arbib, 1998; Fischer and Zwaan, 2008), perception (Meteyard et al., 2008), and emotion (Havas et al., 2010) into states homologous to those created by actual experience in the described situation. For example, to understand a sentence such as "You give the pencil to Henry," a listener uses her motor system to simulate the action of giving (e.g., moving the arm away from the body while the hand is performing a precision grip), and uses her visual system to simulate the visual characteristics of a pencil.

Simulation accounts (e.g., Glenberg and Gallese, 2012) suggest that the motor system plays a constitutive role in meaning. That is, activity within the motor system is, itself, part of the meaning of the sentence. If correct, then there should be a bi-directional causal relation between motor activity and language comprehension: Changing the motor system should causally affect language comprehension, and changing language comprehension should causally affect the motor system. In both cases it is because language comprehension and motor activity are one and the same thing. Several experiments (discussed later) have demonstrated such bi-directional links using EEG. Here we focus on whether the mirror neuron system (MNS) may be playing a role in these links.<sup>1</sup>

Previous work (Glenberg et al., 2008) demonstrated half of the bi-directional link, namely that adapting the motor system through repeated literal action affects language comprehension. In those experiments, participants moved beans from one container to another for about 15 min. For half of the participants, the direction of movement was from a location close to the participant to one farther away, and for the other participants the direction of movement was from a far container to a near container, that is, toward the body. This repeated action adapts the motor system (Classen et al., 1998). But does repeated action affect language comprehension? The data suggest an affirmative answer: After repeated action in the Away direction, participants were slower to comprehend sentences describing action Away (e.g., "You give Alice the pizza"), and after repeated action Toward, participants were slower to comprehend sentences describing action Toward (e.g., "Alice gives you the pizza"). Why is there a slowing? One possibility is that the relevant action control system is fatigued. A second possibility is that the action control system becomes specialized for the repeated movement (e.g., moving a bean using a power grip). Then, when the action control system is called upon to simulate a different movement (e.g., an open-handed movement used to pass a pizza), fewer neural resources are available.

In the work reported here, we demonstrate the other half of the bi-directional link. Participants read a block of sentences all of which described action of a particular sort, for example, transfer away using the hand. On the assumption that language comprehension of action sentences requires a simulation using the motor system, then repeatedly comprehending sentences of the same sort should adapt the motor system much as does repeatedly moving beans.

But how are we to demonstrate that the motor system has been adapted by the language task? We took advantage of the putative fact that the MNS (Rizzolatti and Craighero, 2004), a component of the motor system, plays a role in both language and action perception. The MNS is active both when an animal engages in action and when the animal perceives a conspecific take similar action (Rizzolatti and Craighero, 2004). MNS activity has been linked to language on theoretical grounds (Rizzolatti and Arbib, 1998), using imaging techniques (Aziz-Zadeh et al., 2006), and using behavioral techniques (Glenberg et al., 2008).

<sup>1</sup>As will become apparent, we use the term "bi-directional" in the functional sense that the action system can affect language comprehension and language comprehension can affect the action system, and in both cases the effect is through the MNS. We do not mean that the exact same neurons or associations are themselves bi-directional.

When the MNS is engaged, it facilitates prediction of biological motion. For example, an observer's eyes anticipate the location of an actor's hand when the actor is stacking blocks. But when the actor's hand is invisible, so that the blocks appear to move on their own (i.e., non-biological motion), the eyes lag the blocks; that is, prediction is impaired (Flanagan and Johansson, 2003).

Because the MNS is multi-modal, adapting it through repeated action (Classen et al., 1998) should affect both action perception (Cattaneo et al., 2011) and language comprehension (Glenberg et al., 2008). Here we document the role of the MNS in language comprehension by using the complementary procedure. Namely, if comprehension is a simulation process that uses the MNS, then repeated comprehension of sentences should adapt the MNS. We measure the effects of adaptation using a visual prediction task.

Much like Flanagan and Johansson (2003), our experiment used a manipulation of biological and non-biological motion. But in contrast to that work, we used an explicit measure of prediction rather than tracking eye movements. We created four types of videos (see movies M1–M4) depicting cranberries moving from one container to another about 40 cm away. In a Hand-away video, a hand moved a cranberry from a container near the body to one farther away; in a Hand-toward video, a hand moved a cranberry from the far container to the near container. The No-hand videos were nearly identical except that the hand was digitally removed so that the cranberry appeared to move on its own. The participant's task was to press the down arrow key on the computer keyboard when the cranberry crossed the lip of the target container.

In the experiment, participants read blocks of 20 sentences. After each block of sentences, they viewed 20 videos, each depicting the transfer of one cranberry. The videos comprised a randomly ordered sequence of five Hand-away, five Hand-toward, five No-hand away, and five No-hand toward videos. Each set of five was a random selection from 10 videos of the same type. The reason for this random selection and random ordering was to prevent learning of the timing of particular cranberry movements.

Each participant read six blocks of 20 sentences (each followed by 20 videos). All of the sentences in a block were of the same type: sentences describing transfer away using the hand; sentences describing transfer toward using the hand; sentences describing transfer away using the leg (e.g., "You kicked the stone to Liam"); sentences describing transfer toward using the leg (e.g., "Liam kicked the stone to you"); and two blocks of 20 sentences that did not describe transfer events. The order of these six blocks was randomized for each participant.

During the sentence-reading portions of the experiment, a participant read each sentence and judged whether it was written by a native speaker of English or a non-native speaker.<sup>2</sup> The point of this judgment was to focus the reader on each sentence. In addition, a randomly selected 25% of the sentences were followed by a four-alternative comprehension question. This question was also used to motivate processing of meaning and as a check that the participant was attending to the meaning.

If the MNS is adapted by the mere understanding of sentences presented before the videos, then prediction error (the time between when the cranberry actually crossed the lip of the container and the press on the computer key) should be greatest when the implied direction of the sentences (e.g., toward the reader) and the depicted direction of the cranberry movement are the same (Glenberg et al., 2008; Cattaneo et al., 2011). However, this effect should be greatest when the MNS is actively engaged, that is, when the video depicts biological motion as in the hand videos (cf. Flanagan and Johansson, 2003). Thus, for predictions following sentences describing transfer by hand, we predict a three-way statistical interaction between the implied direction of the sentence, the direction of cranberry movement, and whether the video shows a hand or not.
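The predicted three-way interaction can be made concrete as a contrast on cell means. The sketch below is illustrative only: the function names, the dictionary layout keyed by (sentence direction, video direction, agent), and any numbers passed to it are assumptions, not the authors' analysis code or data. It computes a congruency effect (mean error when sentence and video directions match minus mean error when they mismatch) separately for hand and no-hand videos; the difference between the two is proportional to the three-way interaction contrast in a 2 × 2 × 2 design.

```python
def congruency_effect(cell_means, agent):
    """Mean prediction error for matching minus mismatching directions.

    `cell_means` maps (sentence_direction, video_direction, agent) to a mean
    prediction error; directions are "away"/"toward", agent is "hand"/"no-hand".
    """
    same = (cell_means[("away", "away", agent)]
            + cell_means[("toward", "toward", agent)]) / 2
    different = (cell_means[("away", "toward", agent)]
                 + cell_means[("toward", "away", agent)]) / 2
    return same - different


def three_way_contrast(cell_means):
    """Congruency effect for hand videos minus that for no-hand videos."""
    return (congruency_effect(cell_means, "hand")
            - congruency_effect(cell_means, "no-hand"))
```

Under the hypothesis stated above, this contrast should be positive after hand sentences: matching directions should increase error for hand videos more than for no-hand videos.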

A different prediction is made for the predictions that follow blocks of sentences describing transfer by leg. Although the repeated simulation of these sentences should adapt leg action control, these adapted systems should not play a role in perceiving hand actions. Thus, the implied direction of movement in the leg sentences should not interact with the direction in the video, nor should there be an interaction with biological or non-biological movement.

# **METHODS**<sup>3</sup>

#### **PARTICIPANTS**

The study was approved by the Arizona State University IRB. The 90 participants (54 female) were university students, and all gave informed consent. All participants were native English speakers, right handed, and had normal or corrected-to-normal vision.

<sup>2</sup>To create sentences that describe transfer, we used the double-object syntax, which strongly suggests transfer even for verbs not typically associated with transfer (Goldberg, 1995). For example, "You peddled the bike to Jace."

However, to get some of the leg sentences to strongly imply transfer, we needed to add the preposition "over to," as in, "You jogged the bottle over to Olivia." We used "over to" in half of the leg sentences. Consequently, we added "over to" to half of the Hand sentences, where the use of "over to" was not necessary but where it did not distract from the meaning, either, as in "Diane threw the pen over to you." We used the Native English judgment to (a) focus the participant on each sentence and (b) justify what may appear to be an unnecessary use of "over to."

<sup>3</sup>This reported experiment is the last in a series of three. Each experiment produced evidence consistent with an MNS explanation, but over the course of the experiments we learned better ways to test the claim. For example, in the initial experiment, we did not have a No-hand condition nor did we have leg sentences. Also, there are two important procedural details that differentiate the reported experiment from the first two experiments. First, in the initial experiments, participants judged if a sentence was sensible or not, and in fact, half of the sentences were intended to be nonsense. This procedure likely diluted any adaptation produced by simulating the sentences because half of the sentences were difficult or impossible to simulate. Second, in the initial experiments we presented the different types of cranberry actions in a continuous block, e.g., a block of 8 Hand-toward videos with no breaks in the filming or the responding. This procedure led to large effects of block order and large learning effects over the course of the experiment that then interacted with the effects of interest. By intermixing the different types of videos in the reported experiment, we avoided the learning effects and the block effects. Details regarding these initial experiments may be obtained from the corresponding author.

We used 20 triads of sentences with concrete objects that implied transfer by the hand. In addition, we constructed 20 triads of sentences in which the transfer was produced by the leg. An example of a leg triad is "Ethan bicycled the mail to you," "You bicycled the mail to Ethan," and "You read the mail with Ethan." The sentences were arranged into six blocks (Hand Away, Hand Toward, Hand no-transfer, Leg Away, Leg Toward, and Leg no-transfer) of 20 sentences each. The order of the sentences within a block was randomized for each participant. The order of the blocks was randomized with the constraints that (a) no more than two hand or two leg blocks could occur successively and (b) two successive blocks could not both be Toward or Away.
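One way to implement the constrained block randomization described above is rejection sampling: shuffle the six blocks and keep the first order that satisfies both constraints. The sketch below is a hedged illustration, not the authors' procedure; the names `BLOCKS`, `is_valid` and `random_block_order` are hypothetical, and constraint (b) is interpreted here as "two successive blocks may not share the same transfer direction" (an assumption, since the wording admits other readings).

```python
import random

# Each block is (effector, direction); labels are illustrative.
BLOCKS = [
    ("hand", "away"), ("hand", "toward"), ("hand", "no-transfer"),
    ("leg", "away"), ("leg", "toward"), ("leg", "no-transfer"),
]


def is_valid(order):
    """Check the two ordering constraints on a sequence of blocks."""
    # (a) no run of three successive blocks with the same effector
    for a, b, c in zip(order, order[1:], order[2:]):
        if a[0] == b[0] == c[0]:
            return False
    # (b) assumed reading: adjacent blocks may not both be Away or both be Toward
    for a, b in zip(order, order[1:]):
        if a[1] == b[1] and a[1] in ("away", "toward"):
            return False
    return True


def random_block_order(rng):
    """Draw shuffled block orders until one satisfies the constraints."""
    while True:
        order = rng.sample(BLOCKS, k=len(BLOCKS))
        if is_valid(order):
            return order
```

Calling `random_block_order(random.Random())` once per participant would yield an independent order each time. Rejection sampling keeps every valid order equally likely, which a greedy block-by-block construction would not guarantee.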

For 5 of the 20 hand sentence triads and 5 of the 20 leg sentence triads we composed four-alternative multiple-choice questions about the content of the sentence. For example, for the sentence triad "Chloe danced the bouquet over to you," "You danced the bouquet over to Chloe," and "You smelled the bouquet with Chloe," the multiple choice question was "What object was part of this event? (1) a car (2) a pencil (3) a flower (4) a window?" Thus, 25% of the sentences were followed by a comprehension question.

To create the videos, we began by filming 10 separate Hand-away videos and 10 separate Hand-toward videos. Each video began with a hand holding a cranberry above the start container for approximately 1 s. The hand then transferred the cranberry to the target container and dropped the cranberry. These videos were then digitally manipulated to produce 10 No-hand-away and 10 No-hand-toward videos. The manipulation used a masking procedure such that for each frame of the video everything was masked except for the location of the cranberry. These frames were then superimposed on a background similar to that in the original videos. The result was a video in which the cranberry appeared to move by itself and followed exactly the same path as in the corresponding Hand video. Following each block of sentences, participants observed a random selection of 20 videos with the constraint that there were exactly five of each type. The random selection and ordering of the videos made it difficult to use particular features (e.g., a slight pause in one video followed by a slight speeding in the next) to predict when the cranberry would cross the lip of the container. Consequently, we could collect more data from each participant without the worry that memory from previous trials was affecting the judgments.
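The video selection just described (exactly five of each of four types, drawn from pools of 10 and presented in random order) could be sketched as follows. The function name, type labels and video indices are hypothetical, and the sketch assumes sampling without replacement within a block, which the text does not specify.

```python
import random

VIDEO_TYPES = ("hand-away", "hand-toward", "no-hand-away", "no-hand-toward")


def draw_video_sequence(rng, per_type=5, pool_size=10):
    """Return a shuffled list of (video_type, video_index) pairs for one block."""
    sequence = []
    for video_type in VIDEO_TYPES:
        # Assumption: each video is shown at most once per block.
        chosen = rng.sample(range(1, pool_size + 1), k=per_type)
        sequence.extend((video_type, index) for index in chosen)
    # Intermix the four types so that timing cannot be learned from order.
    rng.shuffle(sequence)
    return sequence
```

A fresh call after each sentence block gives a new 20-video sequence with the required five videos per type.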

#### **PROCEDURE**

Participants were informed that there would be six sections to the experiment, each consisting of two tasks: a sentence comprehension task and a visual prediction task. For the sentence comprehension task, the participant rested the right index finger on the "/" key and the left index finger on the "z" key. Participants were told that upon the presentation of a sentence, they were to judge whether the sentence was written by a native ("/") or nonnative ("z") speaker of English (all were written by native English speakers). Furthermore, they were to use the 1–4 keys to answer the multiple-choice question if one occurred (after approximately every fourth sentence). For the video task, the participant was instructed to rest the right index finger on the "down arrow" key and to press the down arrow key when the cranberry crossed the lip of the target container.

Before the first block of sentences, participants practiced both tasks. For the sentence practice task, participants judged whether each of nine sentences was written by a native English speaker, and three of the nine were followed by multiple-choice comprehension questions. For the visual prediction practice task, participants watched a random selection of 12 videos.

# **RESULTS**

The dependent variable was the difference (in ms) between the time when the cranberry first crossed the lip of the target container and when the participant pressed the down arrow key. However, we subjected the data to some pre-processing before conducting the analyses described below. First, we eliminated the data from 11 participants whose mean absolute prediction errors were more than two standard deviations from the mean<sup>4</sup>. Second, we intended to eliminate participants who answered the comprehension questions with less than 60% accuracy; however, the two participants who met this criterion had already been eliminated on the basis of their mean prediction errors. Finally, we noticed that two of the videos (a Toward Hand video and its paired No-hand video) had been inappropriately edited so that they were approximately twice as long as the other videos (the initial section of the video showing the hand above the start container was not edited down to 1 s). Data associated with these two videos were eliminated.
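The first exclusion step can be sketched as below. The function name and the example values are hypothetical, and the use of the sample standard deviation across participants is an assumption; the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def flag_outlying_participants(mean_abs_errors, n_sd=2.0):
    """Return IDs whose mean absolute prediction error lies more than
    `n_sd` standard deviations from the across-participant mean.

    `mean_abs_errors` maps participant ID -> mean absolute error (ms).
    The sample standard deviation is used here as an assumption.
    """
    values = list(mean_abs_errors.values())
    grand_mean = mean(values)
    sd = stdev(values)
    return {
        pid
        for pid, value in mean_abs_errors.items()
        if abs(value - grand_mean) > n_sd * sd
    }


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the study.
    example = {"p1": 100, "p2": 110, "p3": 90, "p4": 105, "p5": 95,
               "p6": 100, "p7": 102, "p8": 98, "p9": 600}
    print(flag_outlying_participants(example))
```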

The prediction time errors were analyzed using multi-level modeling (the "mixed" procedure in SPSS). This procedure is similar to multiple regression in that regression coefficients corresponding to main effects of variables and their interactions are estimated. It is different from multiple regression in several regards, however. First, rather than using ordinary least squares to calculate the regression coefficients, they are estimated using maximum likelihood estimation (MLE). Second, the MLE procedure allows an estimate of the variance for each of the coefficients so that a *t*-test (*t* = coefficient/standard error) can be performed for each coefficient using its own error term. The calculation of the degrees of freedom in each variance makes use of the Satterthwaite estimation, and so the degrees of freedom often have a fractional component. Third, the multi-level modeling procedure allows the specification of multiple levels of dependency (and multiple random factors) that may correspond to the dependency of observations within subjects as different from the dependencies between subjects. Because separate variances are estimated for each coefficient, there is no need to ensure sphericity. Finally, the procedure has robust missing data handling so that we could use the data from a participant even if the participant did not respond to one or more cranberries. In the analyses, all predictor (independent) variables were centered.

Separate analyses were performed for predictions following Hand sentences and for predictions following Leg sentences. The predictors were the direction of the sentences read before the predictions (Toward or Away), the direction of transfer in the video (Toward or Away), and whether a hand was visible or not in the video.

<sup>4</sup>As noted by a reviewer of a previous version of this article, some participants may have had difficulty seeing the rim of the container.
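For orientation, a model of the kind described can be written out as follows. This is a sketch under assumptions, not the authors' exact specification: $S$, $V$, and $H$ are illustrative symbols for the centered sentence direction, video direction, and hand visibility predictors, and the single random intercept per participant ($u_j$) is assumed, since the random-effects structure is not stated in the text.

$$
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1 S_{ij} + \beta_2 V_{ij} + \beta_3 H_{ij}
 + \beta_4 S_{ij}V_{ij} + \beta_5 S_{ij}H_{ij} + \beta_6 V_{ij}H_{ij} \\
 &+ \beta_7 S_{ij}V_{ij}H_{ij} + u_j + \varepsilon_{ij},
\qquad
t_k = \frac{\hat{\beta}_k}{\mathrm{SE}(\hat{\beta}_k)}
\end{aligned}
$$

Here $y_{ij}$ is the prediction error on trial $i$ for participant $j$, and $\beta_7$ corresponds to the three-factor interaction. When only Hand or only No-hand videos are considered, the terms involving $H$ drop out and the interaction of interest is $\beta_4$.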

The mean prediction errors for the Hand sentences are presented in **Figure 1**. The predicted three-factor interaction was marginally significant (*p* = 0.054), *t*(2895.26) = −1.93. Perhaps more importantly, when considering the Hand videos alone, the interaction between sentence direction and video direction was significant (*p* = 0.028), *t*(1413.66) = 2.21. When considering the No-hand videos alone, the same interaction was not significant (*p* = 0.69). There were also several significant main effects, although none of theoretical interest: There were main effects of video direction (*p* < 0.004) and whether the video showed a hand or not (*p* < 0.001).

The mean prediction errors for the Leg sentences are presented in **Figure 2**. The three-factor interaction was not significant (*p* = 0.92); when considering the Hand videos alone, the interaction between sentence direction and video direction was not significant (*p* = 0.99); and when considering the No-hand videos alone, the same interaction was not significant (*p* = 0.88). There were, however, several significant main effects, although none of theoretical interest: There were main effects of sentence direction (*p* < 0.001), of video direction (*p* < 0.001), and whether the video showed a hand or not (*p* < 0.001).

#### **DISCUSSION**

Together with Glenberg et al. (2008), these data demonstrate bi-directional adaptation effects between a component of the motor system, the MNS, and language. That is, repeating literal action in one direction slows subsequent comprehension of sentences describing transfer in that same direction. And, as demonstrated here, comprehending sentences describing transfer in one direction disrupts subsequent perception of action in that same direction.

Several components of the results strongly suggest that they are caused by adapting the MNS. First, the effects are crossmodal. In Glenberg et al. (2008), adaptation using a motor task affected language comprehension. Here, adaptation using a language task (albeit conveyed through vision) affected visual predictions. Second, the effects are specific to an action goal (transfer Away or Toward) rather than a general priming or expectation effect. Third, the results reflect effector-specificity: only when the adapting linguistic stimulus implies transfer by hand is there an effect on predictions for the hand videos. Fourth, and most tellingly, the effects are only found when the prediction task involves a biological effector. That is, the MNS works through a process of motor resonance when the perceiver has goals similar to those accomplished by the perceived movements. If the perceived motion (e.g., No-hand cranberry motion) does not correspond to a motor action in the perceiver's motor repertoire, there should be little MNS involvement.

Caggiano et al. (2013) report that mirror neurons in macaque area F5 do not adapt to observation of repeated actions by changing their firing rate, thus suggesting that our results could not be due to adaptation of the MNS. However, Caggiano et al. also report that local field potentials in area F5, probably produced by input to the mirror neurons, do show adaptation. Thus, although we cannot claim that our procedure directly adapts mirror neuron activity, both our data and those of Caggiano et al. are consistent with the claim that MNS activity as a whole is affected by adaptation.

These findings also suggest a constitutive relation between language comprehension and motor activity. Note that constitution is not a hypothesis that can be demonstrated by experiment. Experiments demonstrate causal relations, such as A causes B; constitution, however, is a particular form of causality, namely that A causes B because they are the same thing. How then do these results suggest constitution? The argument is one of parsimony. Namely, Glenberg et al. (2008) demonstrated a causal relation between adapting the motor system and language comprehension. Here we demonstrate the complement: using language as an adapting stimulus warps the MNS. Instead of having to propose two separate causal mechanisms, the notion that MNS activity constitutes (at least part of) language comprehension explains the results with a minimum of causal relations and mechanisms.

Nonetheless, it is important to keep in mind several limitations of our data and design. First, our data only support the notion of "bi-directional links" in a functional sense, and they do not demonstrate that the exact same pathways are active when action adapts language and when language adapts action systems. Second, transfer accomplished by the legs may not be as common as transfer accomplished with the hands. And finally, the case for bi-directionality would be stronger if we were to demonstrate that leg sentences would adapt prediction of leg videos.

Finally, we note that these data are not the first to demonstrate bi-directional causal effects between language and the motor system. Aravena et al. (2010) had participants read sentences implying hand actions with an open hand (e.g., applauding) or a closed hand (e.g., hammering). Upon understanding the sentence, the participant pressed a button using an open or closed hand. Then, using EEG, Aravena et al. found that an incompatible hand shape generated a larger N400-like component than a compatible hand shape. This finding implies a causal effect between motor preparation (hand shape) and semantics of the sentence. Aravena et al. also report that the implied hand shape in the sentence affected the motor potential (MP) component generated shortly before literal hand movement. This finding implies a causal effect between sentence comprehension and motor processes.

Guan et al. (2013) used a similar procedure to detect bidirectional links between the motor system and comprehension of abstract language. In particular, Guan et al. had participants read sentences that included the quantifiers "more and more" and "less and less." On comprehending a sentence, the participant either moved the hand up to a response button (a direction compatible with "more and more") or down to a response button (incompatible with "more and more"). Much like Aravena et al., Guan et al. also found a larger N400 for the incompatible trials and a larger MP in the compatible trials. Again, the results imply bi-directional links between language and motor processes.

Thus, subject to the limitations noted above, the data are strong in supporting the claim that there are bi-directional causal connections between aspects of language comprehension and the motor system. Furthermore, to the extent that the parsimony argument is correct, these bi-directional links suggest that motor activity constitutes at least a component of language comprehension (e.g., the understanding of human action). And finally, the data presented here support the claim that the MNS itself contributes to constitution.

#### **ACKNOWLEDGMENTS**

This experiment was inspired by Noah Zarr's senior honors thesis supported in part by Barrett, The Honors College at ASU. Many thanks are due Devan Watson for help creating the videos and help with data collection. Arthur M. Glenberg was partially supported by NSF Grants 1020367 and 13054253. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2013.00870/abstract

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 September 2013; accepted: 28 November 2013; published online: 17 December 2013.*

*Citation: Zarr N, Ferguson R and Glenberg AM (2013) Language comprehension warps the mirror neuron system. Front. Hum. Neurosci. 7:870. doi: 10.3389/fnhum.2013.00870*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Zarr, Ferguson and Glenberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Motor empathy is a consequence of misattribution of sensory information in observers

#### **Indra T. Mahayana<sup>1</sup> , Michael J. Banissy<sup>2</sup> , Chiao-Yun Chen<sup>3</sup> , Vincent Walsh<sup>4</sup> , Chi-Hung Juan<sup>1</sup> and Neil G. Muggleton<sup>1,2,4</sup>\***

<sup>1</sup> Institute of Cognitive Neuroscience, National Central University, Jhongli, Taiwan

<sup>2</sup> Department of Psychology, Goldsmiths, University of London, London, UK

<sup>3</sup> Department and Graduate Institute of Criminology, National Chung Cheng University, Chiayi, Taiwan

<sup>4</sup> Institute of Cognitive Neuroscience, University College London, London, UK

#### **Edited by:**

Agustin Ibanez, Institute of Cognitive Neurology, Argentina

#### **Reviewed by:**

Anthony Paul Atkinson, Durham University, UK Rei Akaishi, University of Oxford, UK Julià L. Amengual, University of Barcelona, Spain

#### **\*Correspondence:**

Neil G. Muggleton, Institute of Cognitive Neuroscience, National Central University, No. 300, Jhongda Road, Jhongli City, Taoyuan County 32001, Taiwan e-mail: n.muggleton@ucl.ac.uk

Human behavior depends crucially on the ability to interact with others and empathy has a critical role in enabling this to occur effectively. This can be an unconscious process and based on natural instinct and inner imitation (Montag et al., 2008) responding to observed and executed actions (Newman-Norlund et al., 2007). Motor empathy relating to painful stimuli is argued to occur via the mirror system in motor areas (Rizzolatti and Luppino, 2001). Here we investigated the effects of the location of emotional information on the responses of this system. Motor evoked potential (MEP) amplitudes from the right first dorsal interosseus (FDI) muscle in the hand elicited by single pulses of transcranial magnetic stimulation (TMS) delivered over the left motor cortex were measured while participants observed a video of a needle entering a hand over the FDI muscle, representing a painful experience for others. To maintain subjects' internal representation across different viewing distances, we used the same size of hand stimuli both in peripersonal and extrapersonal space. We found a reduced MEP response, indicative of inhibition of the corticospinal system, only for stimuli presented in peripersonal space and not in extrapersonal space. This empathy response only occurring for near space stimuli suggests that it may be a consequence of misidentification of sensory information as being directly related to the observer. A follow up experiment confirmed that the effect was not a consequence of the size of the stimuli presented, in agreement with the importance of the near space/far space boundary for misattribution of body related information. This is consistent with the idea that empathy is, at least partially, a consequence of misattribution of perceptual information relating to another to the observer and that pain perception is modulated by the nature of perception of the pain.

**Keywords: empathy, mirror mechanism, motor evoked potential, transcranial magnetic stimulation, peripersonal space, extrapersonal space**

# **INTRODUCTION**

Empathy has a significant role in the sharing of affective states, in predicting and understanding the feelings, motivations, and actions of others, and in the showing of compassion (Gallese, 2003; Minio-Paluello et al., 2009; Bernhardt and Singer, 2012). It has been argued that, for emotional social interactions, mirror neuron mechanisms may be involved in the neural basis of the observer's empathy for the emotional state of another individual (Schulte-Ruther et al., 2007). Specifically, during observation of an action being executed, activation of mirror neurons is thought to match the observed actions with internal representations (Gallese, 2003; Iacoboni and Mazziotta, 2007). This has been extrapolated to suggest that mirror neurons may provide a simulation-based form of empathy through interactions with the limbic system or other brain areas related to emotion (Iacoboni and Mazziotta, 2007). One example of reduced effectiveness of these mirror systems can be seen in autistic disorders such as Asperger syndrome (Caggiano et al., 2009), which is associated with reduced empathy and characterized by difficulties in social interaction as well as a narrowed range of personal interests (Minio-Paluello et al., 2009).

The subjective experience of pain may comprise autonomic activity and the desire to produce behavioral responses (Rainville, 2002), the so-called pain empathy response. This response activates neural structures that are also involved in the direct experience of pain (Lamm et al., 2011). Observation of painful or non-noxious events shown on the body is said to result in functional modulation of the corticospinal system through the mirror neuron system (Avenanti et al., 2005) and lead to inhibition of corticospinal excitability. This can be observed by measurement of motor evoked potential (MEP) signals (Avenanti et al., 2009), and the MEP amplitude may be used to show the modulation of the motor system as a consequence of altered mirror system activity. Motor inhibition, as shown by a reduction in MEP amplitude specific to the muscle in which pain is observed, is found during the observation of needles penetrating body parts of a human model (Avenanti et al., 2006). Furthermore, tonic muscle pain in the hand may result in a long-lasting depression of the MEP amplitude resulting from transcranial magnetic stimulation (TMS) of the primary motor area in the hemisphere contralateral to the painful stimulation (Le Pera et al., 2001). This is therefore a good method for observation of changes, presumably modulation of corticospinal excitability, induced by pain and the mirror neuron system modulation of action. It is worth noting that similar stimuli have also been employed in conjunction with fMRI, showing responses in anterior cingulate cortex (Morrison et al., 2004).

Dynamic processes relating to peripersonal and extrapersonal space coding are important for perceiving the correct spatial position of target objects (Berti et al., 2002). Mirror functions in space have been investigated in monkey studies, and those in the premotor cortex (F5) and anterior intraparietal area (AIP) play a fundamental role in space and action perception relating to the spatial organization of movements (Rizzolatti and Matelli, 2003). These areas respond mainly to visual stimuli presented in peripersonal space (Graziano, 1997; Holmes and Spence, 2004), thus exhibiting spatial selectivity for subsequent types of behavioral responses. Examples of this include approaching behavior performed in extrapersonal space or competitive behavior in peripersonal space (Caggiano et al., 2009).

We hypothesized that different somatomotor responses might be observed in humans when "mirror-matching" occurs while observing others' feelings at different viewing distances. According to Avenanti et al. (2006), the motor reaction to observation of pain that results in suppression of MEP amplitude may be due to a *mirror-like resonance* mechanism that extracts basic sensory qualities of another person's painful experience, for example, the location of the noxious stimulus. Our primary hypothesis was that such a change is potentially a consequence of misattribution of observed stimuli as relating to the body of the observer. Consequently, this leads to the prediction that effects of observed painful stimuli will be greater if they are presented in a position where it is more plausible that they are actual representations of the observer's own body (i.e., in peripersonal or near space) than when they are in a position where this is less likely (i.e., extrapersonal or far space). We therefore manipulated the distances at which affective visual stimuli were presented to evaluate the effects on motor system excitability as an index of pain empathy responses.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Eleven right-handed subjects (5 males and 6 females, mean age: 24.2 ± 1.9 years) with no previous history of neurological problems, all with normal or corrected-to-normal vision and without colorblindness, participated in the experiment. Right handedness was determined using an adapted version of the Edinburgh Handedness Inventory. Prior to the experiment, participants were also required to verbally report any anxiety or phobia of needles or if they had any conditions involving prolonged use of drugs administered by injection (e.g., insulin-dependent diabetes mellitus). The presence of any of these would have resulted in exclusion from the study. All participants were naïve regarding the experimental task and gave informed consent prior to participation. This experiment was conducted in accordance with the Declaration of Helsinki and the protocol was approved by the local Ethics Committee.

# **ELECTROMYOGRAM AND TRANSCRANIAL MAGNETIC STIMULATION (TMS) RECORDINGS**

A Magstim 200 Super-Rapid Stimulator was used to deliver stimulation via a 70 mm figure-of-eight coil (Magstim Co., Whitland, Dyfed, UK). The left motor cortex was located initially 5 cm left of the vertex and single pulses of TMS were applied near this location to identify the best area to produce a twitch in the right first dorsal interosseus (FDI) muscle of the hand (the level of stimulation used depending on the responses in each subject). The minimum machine output intensity to produce a visually observed muscle twitch was identified using a modified binary search algorithm (Tyrrell and Owens, 1988; Thilo et al., 2004; Silvanto et al., 2007). The obtained intensity was then decreased to identify the resting motor threshold (rMT), defined as the minimum intensity to produce a peak-to-peak MEP of 50 µV in at least 5 out of 10 consecutive trials (or with 50% probability) in the relaxed FDI muscle (Rossini et al., 1994; Avenanti et al., 2005, 2006). TMS pulses during the experiment were delivered at an intensity of 120% of this resting motor threshold for each subject individually (mean intensity: 80.9 ± 14.2% of machine output). After the experiment session, none of the participants complained of or reported any discomfort related to the TMS received.
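The rMT criterion and the 120% stimulation rule can be expressed as a small sketch. All function names and the example amplitudes are hypothetical, and treating "an MEP of 50 µV" as "at least 50 µV" is an assumption about the criterion.

```python
def meets_rmt_criterion(peak_to_peak_uv, min_amplitude_uv=50.0,
                        min_hits=5, n_trials=10):
    """True if at least `min_hits` of `n_trials` consecutive MEPs reach
    `min_amplitude_uv` (the >= comparison is an assumption)."""
    if len(peak_to_peak_uv) != n_trials:
        raise ValueError(f"expected {n_trials} trials")
    hits = sum(1 for amp in peak_to_peak_uv if amp >= min_amplitude_uv)
    return hits >= min_hits


def resting_motor_threshold(responses_by_intensity):
    """Lowest tested intensity (% machine output) meeting the criterion,
    or None if no tested intensity does."""
    passing = [intensity
               for intensity, amps in responses_by_intensity.items()
               if meets_rmt_criterion(amps)]
    return min(passing) if passing else None


def stimulation_intensity(rmt, factor=1.2):
    """Experimental intensity: 120% of rMT, as stated in the text."""
    return rmt * factor


if __name__ == "__main__":
    # Made-up amplitudes (µV) for illustration only.
    tested = {
        58: [20, 35, 60, 10, 45, 30, 55, 25, 40, 15],   # 2 of 10 hits
        60: [55, 40, 70, 65, 30, 80, 20, 52, 45, 10],   # 5 of 10 hits
        62: [90, 75, 60, 85, 55, 70, 95, 65, 40, 80],   # 9 of 10 hits
    }
    rmt = resting_motor_threshold(tested)
    print(rmt, stimulation_intensity(rmt))
```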

MEPs induced by single pulse TMS over the left motor cortex were recorded simultaneously from the right FDI and abductor digiti minimi (ADM) muscles during the experiment using the Biopac MP35 system (Biopac System, Inc, CA, USA) and were band-pass filtered (20 Hz–2.5 kHz), digitized (sampling rate 5 kHz) and stored for offline analysis to measure the mean peak-to-peak (p-p) amplitudes of twitches from the FDI and ADM muscles. The MEPs recorded from the ADM muscle location served as a control for the specificity of any changes seen in the FDI muscle activity during the experiment and reliable responses from this muscle were confirmed during the localization and thresholding of the FDI muscle.

# **PROCEDURE**

Participants had to perform 8 blocks of trials, 4 blocks for each distance (near or far space) with 2 blocks each of the pain and touch conditions. There were 24 trials per block and a TMS pulse was delivered on every trial. Consequently there were 48 trials for each condition (pain or touch) at each distance. A trial started with a fixation cross for 1 s, followed by a video stimulus for 2.5 s and then a blank screen for 7.5 s (similar to the long intertrial interval used by Avenanti et al., 2005). A single TMS pulse was delivered during the clip, when the needle had penetrated the hand (pain condition) or the cotton swab had touched (touch condition) the skin, both of which were over the location equivalent to the FDI muscle. MEPs elicited were collected. These stimuli have previously been used by Avenanti et al. (2005) and Minio-Paluello et al. (2009).
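The trial counts and timing stated above can be checked with a few lines of arithmetic. The variable names are illustrative, and the total duration is a lower bound that assumes no breaks between blocks, which the text does not describe.

```python
DISTANCES = ("near", "far")
CONDITIONS = ("pain", "touch")
BLOCKS_PER_CELL = 2      # blocks per condition at each distance
TRIALS_PER_BLOCK = 24

# Trial structure (seconds): fixation cross, video clip, blank screen
FIXATION_S, VIDEO_S, BLANK_S = 1.0, 2.5, 7.5

total_blocks = len(DISTANCES) * len(CONDITIONS) * BLOCKS_PER_CELL
trials_per_cell = BLOCKS_PER_CELL * TRIALS_PER_BLOCK
total_trials = total_blocks * TRIALS_PER_BLOCK
trial_duration_s = FIXATION_S + VIDEO_S + BLANK_S
# Pure stimulus time only; breaks and setup are not included (assumption).
total_duration_min = total_trials * trial_duration_s / 60.0

if __name__ == "__main__":
    print(total_blocks)        # 8 blocks
    print(trials_per_cell)     # 48 trials per condition per distance
    print(total_trials)        # 192 TMS pulses in total
    print(trial_duration_s)    # 11.0 s per trial
    print(round(total_duration_min, 1))  # 35.2 min
```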

Participants were not given any information about the onset of TMS. They were instructed to watch the video stimuli carefully and to keep their right hand relaxed.

Presentation of the video stimuli was controlled with E-Prime (Psychology Software Tools Inc., Pittsburgh, PA) in color presentation and showed the same male right hand for all trials. The video stimuli were presented on a 19 inch cathode ray tube monitor, with 75 Hz refresh rate, either in near space or far space with the presentation order counterbalanced (see **Figures 1A, B**). The near space location was 70 cm from the observers and far space at 140 cm, fitting with the definition of near space as a distance within arm reach (Wooding and Allport, 1998; Weiss et al., 2000). The size of the video animation display was 15 × 10° of visual angle (size of the hand approx. 9.5 × 8.4° of visual angle) and was controlled in both the near and far conditions so there were no changes in the size (in terms of degrees of visual angle) of the hand pictured in the video. With this manipulation, in the dimly lit experiment room, we expected that participants would be unaware of the difference between the two viewing distances.
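Holding visual angle constant across viewing distances means scaling the physical stimulus with distance. The sketch below applies the standard relation between size, distance, and visual angle (size = 2 × distance × tan(angle / 2)) to the values given above; the function names are illustrative and the resulting sizes in centimetres are derived here, not reported in the text.

```python
import math


def physical_size_cm(angle_deg, distance_cm):
    """Physical extent that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an extent of `size_cm` at `distance_cm`."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


NEAR_CM, FAR_CM = 70.0, 140.0
DISPLAY_DEG = (15.0, 10.0)  # width x height of the video display

near_size = tuple(physical_size_cm(a, NEAR_CM) for a in DISPLAY_DEG)
far_size = tuple(physical_size_cm(a, FAR_CM) for a in DISPLAY_DEG)

if __name__ == "__main__":
    print("near (cm):", [round(s, 2) for s in near_size])
    print("far  (cm):", [round(s, 2) for s in far_size])
```

Because the viewing distance doubles while the angle is fixed, the far-space display must be exactly twice as large in each dimension as the near-space display.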

Participants were seated comfortably either 70 or 140 cm away from the display with the center of the screen at eye level for both the near and far conditions. Head position was controlled by a chinrest. The right hand, with electromyography electrodes attached, rested on a table in front of the participant.

#### **Follow up experiment**

Following the experiment described above, a second, broadly similar experiment was conducted to evaluate whether any results obtained were affected by the size of the hand displayed in the far space condition (i.e., was the fact that it was essentially a large hand presented further away important). As such, the experiment was repeated as described above with the exception that the stimuli presented in near and far space were identical in size on this occasion (see **Figure 3A**). Twelve right-handed subjects (6 males and 6 females, mean age: 22.4 ± 2.3 years, mean TMS threshold 78.3 ± 7.7%) took part in this experiment.

#### **DATA ANALYSIS**

#### **Subjective measures analysis**

Subjective measures analysis was carried out to evaluate participants' subjective perception of pain. We used the short form McGill Pain Questionnaire (SF-MPQ), a multidimensional measure of perceived pain in adults, consisting of the Pain Rating Index (PRI), a visual analog scale (VAS), and Present Pain Intensity (PPI). All of the subjects were asked to rate the observed stimuli after the TMS session in order to minimize bias. The PRI was used to rate participants' subjective pain perception and required them to imagine how the pain would feel if applied to them. This consists of 15 representative words that are rated on a 4-point *Likert*-type rating scale ranging from 0 (none) to 3 (severe), with 11 sensory and 4 affective words. Using a VAS (10 cm long) and PPI (range from 0 to 5), participants were asked about the pain intensity shown in the video animation and whether they considered the pain sensation represented in the video to be intense.
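The PRI scoring described above (15 words rated 0 to 3, of which 11 are sensory and 4 affective) can be sketched as follows. The function name is hypothetical, and the ordering of the ratings (sensory items first, then affective items) is an assumption about how responses are stored.

```python
N_SENSORY = 11
N_AFFECTIVE = 4


def score_pri(ratings):
    """Score SF-MPQ Pain Rating Index ratings.

    `ratings` holds 15 integers in 0..3, assumed to be ordered with the
    11 sensory words first and the 4 affective words last.
    """
    if len(ratings) != N_SENSORY + N_AFFECTIVE:
        raise ValueError("expected 15 ratings")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 3")
    sensory = sum(ratings[:N_SENSORY])
    affective = sum(ratings[N_SENSORY:])
    return {
        "sensory": sensory,                       # range 0..33
        "affective": affective,                   # range 0..12
        "total": sensory + affective,             # range 0..45
        "sensory_per_item": sensory / N_SENSORY,
        "affective_per_item": affective / N_AFFECTIVE,
    }


if __name__ == "__main__":
    # Made-up ratings for illustration only.
    print(score_pri([2] * 11 + [1] * 4))
```

The per-item means make the two subscales comparable despite their different numbers of words.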

### **Motor evoked potential (MEP) analysis**

The MEP data were recorded during the experiment for later analysis using Biopac BSL 4.0 software (Biopac System, Inc, CA, USA). The MEP data were processed offline and the trials with electromyogram (EMG) activity before TMS (less than 5% of trials) were excluded from analysis. The p-p MEP amplitudes outside the mean ± 2 standard deviations were also excluded.

*Correlation analysis of subjective measurements and motor evoked potential (MEP) amplitude change.* The indices of MEP amplitude change were computed as follows: amplitude during observation of the pain condition minus amplitude during observation of the touch condition, divided by the average of the same two conditions. For the correlation of subjective measurements and MEP amplitude change, *Pearson* correlation coefficients between indices of amplitude change of MEPs recorded from each muscle and subjective reports were computed in each experiment.
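The trimming rule, the amplitude change index, and the Pearson correlation described above can be sketched with the standard library. The function names are illustrative, and applying the ± 2 SD rule to one list of amplitudes at a time (for example per participant and condition) with the sample standard deviation is an assumption; the text does not specify the grouping.

```python
from math import sqrt
from statistics import mean, stdev


def trim_outliers(amplitudes, n_sd=2.0):
    """Keep amplitudes within mean +/- n_sd sample SDs of this list."""
    m, sd = mean(amplitudes), stdev(amplitudes)
    return [a for a in amplitudes if abs(a - m) <= n_sd * sd]


def amplitude_change_index(pain_mean, touch_mean):
    """(pain - touch) divided by the average of pain and touch.

    Negative values indicate lower MEP amplitude during pain observation.
    """
    return (pain_mean - touch_mean) / ((pain_mean + touch_mean) / 2.0)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Made-up mean amplitudes (mV) for illustration only.
    print(amplitude_change_index(pain_mean=0.8, touch_mean=1.0))
```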

*Motor evoked potential (MEP) amplitudes in near and far space.* Analysis of the MEP amplitudes was done with a within-subject repeated measures three-way analysis of variance (ANOVA) with distance (near and far), condition (pain and touch), and muscle (FDI and ADM) as within-subject factors. The MEP amplitudes recorded during "Needle in FDI" condition in near and far conditions were compared against the value of "Touch in FDI" condition in near and far conditions by means of paired-sample *t*-tests.
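For reference, the paired-sample *t* statistic used in such comparisons can be computed as in the sketch below, which is not the authors' analysis code. With one mean amplitude per participant and condition, the degrees of freedom equal the number of participants minus one. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for a versus b.

    `a` and `b` hold one value per participant, in the same order.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


if __name__ == "__main__":
    # Made-up values for illustration only.
    t_value, df = paired_t([1, 2, 3, 4], [0, 1, 1, 2])
    print(round(t_value, 3), df)
```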

# **RESULTS**

# **THE CORRELATION OF SUBJECTIVE MEASUREMENTS AND MOTOR EVOKED POTENTIALS (MEPs) AMPLITUDE CHANGE**

In the analysis of subjective measurements indexes, the mean of the sensory-PRI score was 19.2 ± 3.5 SD and the affective-PRI was 4.2 ± 3.0 SD. In each question, the sensory-PRI was higher than the affective-PRI (1.7 ± 0.3 vs. 1.1 ± 0.07 SD, *t*(10) = 3.361, *p* = 0.007). Sensory-PRI analysis showed a predicted negative correlation with MEP amplitude change for the near viewing distance (*r* = −0.560, *p* = 0.037). For the far distance there was also a correlation but this was not significant (*r* = −0.502, *p* = 0.070). We found the video stimuli could induce perception of moderately intense pain (VAS: 4.9 ± 2.2 cm and PPI score 2.5 ± 1.4). Moderate scores of VAS and PPI indices showed that the observation of pain scene visual stimuli triggered emotional reactions of personal distress (Avenanti et al., 2009).

#### **MOTOR EVOKED POTENTIAL (MEP) AMPLITUDES**

Analysis of the MEP amplitudes with a within-subject repeated measures three-way ANOVA with distance (near and far), condition (pain and touch), and muscle (FDI and ADM) as within-subject factors revealed a significant interaction (*F*(1,10) = 10.742, *p* = 0.008). Two-way interactions of distance vs. muscle and condition vs. muscle showed no significant results (*F*(1,10) = 1.121, *p* = 0.315 and *F*(1,10) = 0.599, *p* = 0.457, respectively). A significant main effect of muscle was found (*F*(1,10) = 6.580, *p* = 0.028), with no significant main effect of distance or condition (*F*(1,10) = 0.540, *p* = 0.479, and *F*(1,10) = 0.042, *p* = 0.841, respectively).

Separate two-way ANOVAs with factors of distance (near and far) and condition (pain and touch) were carried out for each muscle. In FDI muscle, a significant two-way interaction of distance vs. condition was found (*F*(1,10) = 7.810, *p* = 0.019), with no significant main effect of distance (*F*(1,10) = 1.617, *p* = 0.232) or condition (*F*(1,10) = 0.279, *p* = 0.609). For the ADM muscle, no significant two-way interaction was found (*F*(1,10) = 0.116, *p* = 0.740).

In *post-hoc* analyses, significantly lower FDI MEP amplitudes during the pain condition for the near distance were found when compared to amplitudes during the touch condition for the near distance (*t*(10) = 2.73, *p* = 0.021) and amplitudes during pain condition for the far distance (*t*(10) = −2.796, *p* = 0.019). This revealed that the display of actual painful stimuli delivered to the hand resulted in modulation of the motor cortex representing this area (potentially via an inhibition of corticospinal excitability) but only when presented in near space and not for far space (see **Figure 2**).

#### **FOLLOW UP EXPERIMENT**

Analysis was conducted in the same manner as the initial experiment. As before, a three-way ANOVA revealed a significant interaction (*F*(1,11) = 5.471, *p* = 0.039). A significant two-way interaction of distance vs. muscle was found (*F*(1,11) = 6.488, *p* = 0.027) with significant main effects of muscle and distance (*F*(1,11) = 42.578, *p* < 0.001 and *F*(1,11) = 6.447, *p* = 0.028, respectively). The main effect of distance may have been due to the differing visual angle of the stimuli in near and far space. Separate two-way ANOVAs were carried out for each muscle. In FDI muscle, a significant two-way interaction of distance vs. condition was found (*F*(1,11) = 5.281, *p* = 0.042), with significant main effects of distance (*F*(1,11) = 7.124, *p* = 0.022) and condition (*F*(1,11) = 5.145, *p* = 0.044). In contrast, for the ADM muscle, no significant two-way interaction was found (*F*(1,11) = 0.045, *p* = 0.836). In *post-hoc* analyses, the results were also similar to the initial experiment. In near space the FDI MEP amplitudes during the pain condition were lower compared to amplitudes during the touch condition (*t*(11) = 2.800, *p* = 0.017) and also when compared with amplitudes during the pain condition for the far condition (*t*(11) = −3.739, *p* = 0.003) (see **Figure 3B**). These results confirm that the initial findings of a lack of effect for the far pain condition were not a consequence of the size of the stimuli presented.

# **DISCUSSION**

In this study, we investigated the pain empathy response for different viewing distances, looking at both near and far space. Results were consistent with previous studies that found a reduction in amplitudes of MEPs during the observation of needles penetrating body parts of a human model (Le Pera et al., 2001; Avenanti et al., 2005, 2006). Importantly, our study showed that the empathy response indexed by MEP modulation is limited to peripersonal space. It was also in line with a study of spatial predictability of somatosensory targets by Van Damme and Legrain (2012), which suggested that spatial attention to a painful somatosensory stimulus is modulated only when the somatosensory targets are in near locations. In the present study, the reduced MEP seen only for near space pain-related stimuli is consistent with it being a consequence of misidentification of sensory information, with the MEPs being unaffected by far space stimuli. This effect was found regardless of whether the far stimuli were presented with retinal sizes similar to those of the near stimuli or with retinal sizes that decreased with distance.

When the painful stimulus is near, it may activate the detection system to facilitate the processing of behaviorally significant sensory input and to select the appropriate response (Legrain et al., 2011). As a painful sensation is unsurprisingly identified as something to be avoided, it is particularly important to monitor nearby objects in order to coordinate avoidance and defense with the aim of preventing potential physical threats, maintaining the physical integrity of the body and avoiding tissue damage (Cooke and Graziano, 2004; Van Damme and Legrain, 2012).

Empathy is the ability to appreciate the emotions and feelings of others with a minimal distinction between the two (Decety, 2011). The use of painful video stimuli was expected to result in somatic resonance in pain processing areas for others and the self, and to trigger empathic responses. The expression of pain also provides a crucial signal that can motivate comforting and caring behaviors in others. In peripersonal space, there is an emergent capacity for self-awareness that is linked to the development of more advanced forms of empathy, and social attachment serves intrinsically important regulatory functions related to security, nurturing and distress alleviation (Decety and Svetlova, 2012). Furthermore, this function in peripersonal space is important in terms of human–human interactions for prosocial behavior such as shaking hands or kissing the cheek of another (Lloyd, 2009).

The empathy system related to motor excitability was modulated by stimuli in peripersonal space but seemed to be unaffected when the stimuli were presented in extrapersonal space. An ability to disambiguate peripersonal from extrapersonal space allows the observer to evaluate interpersonal behaviors (Caggiano et al., 2009). Thus, it might be assumed that in extrapersonal space, the brain limits the ability to regulate emotions, as brain function related to extrapersonal space is more important in producing action or in movement planning (Rosenbaum et al., 2001) than in regulating responses that may relate to effects on the self.

Perception of an emotion or feeling in another individual activates neural mechanisms responsible for the generation of similar emotions (Gallese, 2003; Gallese et al., 2007). We show that the motor empathy response has a distance limitation. This suggests that empathy responses of this type may be, at least partially, a consequence of the misidentification of visual information as relating to the observer. This may explain (at least partially) findings such as the effects of race on empathy (Forgiarini et al., 2011) and also leads to the prediction that the empathy related modulation of the motor response should reflect the (perceived) similarity of the observer and the stimulus and be altered should the stimulus be presented in a manner which the observer would be unable to replicate (for example, using unusual hand positions).

# **ACKNOWLEDGMENTS**

This work was supported by the National Science Council, Taiwan (Grant numbers: NSC-100-2410-H-008-074-MY3, NSC-102-2410-H-008-021-MY3 and NSC-102-2420-H-008-001-MY3). We are grateful to Prof. Salvatore Aglioti for permission to use the video stimuli employed in the study.

# **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 November 2013; accepted: 21 January 2014; published online: 06 February 2014*.

*Citation: Mahayana IT, Banissy MJ, Chen C-Y, Walsh V, Juan C-H and Muggleton NG (2014) Motor empathy is a consequence of misattribution of sensory information in observers. Front. Hum. Neurosci. 8:47. doi: 10.3389/fnhum.2014.00047 This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Mahayana, Banissy, Chen, Walsh, Juan and Muggleton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Neuroanatomical substrates of action perception and understanding: an anatomic likelihood estimation meta-analysis of lesion-symptom mapping studies in brain injured patients

# *Cosimo Urgesi 1,2 \*, Matteo Candidi 3,4 and Alessio Avenanti 4,5*

<sup>1</sup> Laboratorio di Neuroscienze Cognitive, Dipartimento di Scienze Umane, Università di Udine, Udine, Italy

<sup>2</sup> Istituto di Ricovero e Cura a Carattere Scientifico "Eugenio Medea," Polo Friuli Venezia Giulia, San Vito al Tagliamento, Pordenone, Italy

<sup>3</sup> Dipartimento di Psicologia, Università "Sapienza" di Roma, Rome, Italy

<sup>4</sup> IRCCS Fondazione Santa Lucia, Rome, Italy

<sup>5</sup> Dipartimento di Psicologia e Centro studi e ricerche in Neuroscienze Cognitive, Alma Mater Studiorum - Università di Bologna, Campus di Cesena, Italy

#### *Edited by:*

Analia Arevalo, East Bay Institute for Research and Education, USA

#### *Reviewed by:*

Elisabetta Ladavas, University of Bologna, Italy

Peter H. Weiss, Forschungszentrum Jülich, Germany

#### *\*Correspondence:*

Cosimo Urgesi, Laboratorio di Neuroscienze Cognitive, Dipartimento di Scienze Umane, Università di Udine, Via Margreth, 3, I-33100 Udine, Italy

e-mail: cosimo.urgesi@uniud.it

Several neurophysiologic and neuroimaging studies suggested that motor and perceptual systems are tightly linked along a continuum rather than providing segregated mechanisms supporting different functions. Using correlational approaches, these studies demonstrated that action observation activates not only visual but also motor brain regions. On the other hand, brain stimulation and brain lesion evidence allows tackling the critical question of whether our action representations are necessary to perceive and understand others' actions. In particular, recent neuropsychological studies have shown that patients with temporal, parietal, and frontal lesions exhibit a number of possible deficits in the visual perception and the understanding of others' actions. The specific anatomical substrates of such neuropsychological deficits, however, are still a matter of debate. Here we review the existing literature on this issue and perform an anatomic likelihood estimation meta-analysis of studies using lesion-symptom mapping methods on the causal relation between brain lesions and non-linguistic action perception and understanding deficits. The meta-analysis encompassed data from 361 patients tested in 11 studies and identified regions in the inferior frontal cortex, the inferior parietal cortex and the middle/superior temporal cortex, whose damage is consistently associated with poor performance in action perception and understanding tasks across studies. Interestingly, these areas correspond to the three nodes of the action observation network that are strongly activated in response to visual action perception in neuroimaging research and that have been targeted in previous brain stimulation studies. Thus, brain lesion mapping research provides converging causal evidence that premotor, parietal and temporal regions play a crucial role in action recognition and understanding.

**Keywords: action perception, action simulation, action understanding, mirror neurons, brain lesion, voxel-lesion-symptom mapping, activation likelihood estimation (ALE) meta-analysis**

# **INTRODUCTION**

Ever since the revolutionary proposal that action and perception systems are tightly linked along a continuum rather than being segregated mechanisms supporting different functions, behavioral studies have shown the many ways in which activity in the motor system modulates concurrent or delayed action perception and the other way around (Prinz, 1997; Schütz-Bosbach and Prinz, 2007a). The original idea that action observation triggers a corresponding activation of similar movement in a passive observer dates back to the ideomotor theories developed by Lotze (1852) and James (1890). More recently, a number of behavioral studies have described "compatibility" (facilitatory) and "incompatibility" (inhibitory) effects between an observed movement or posture and an executed movement (see Hommel, 2010 and Heyes, 2011 for reviews), suggesting a bidirectional influence of action observation on motor performance and of action execution on action perception.

#### **NEURAL CORRELATES OF ACTION PERCEPTION**

The actions of others represent a dynamic and extremely complex visual stimulus and pose a strong challenge to the brain for their perception and understanding. In line with the old ideomotor principle, current models of action perception suggest that in order to solve this computational challenge the brain has evolved an efficient sensorimotor mechanism, namely mapping visual representations of the observed actions onto corresponding motor representations (Rizzolatti and Craighero, 2004; Wilson and Knoblich, 2005; Kilner et al., 2007; Schütz-Bosbach and Prinz, 2007b; Gazzola and Keysers, 2009; Friston et al., 2011; Press et al., 2011; Schippers and Keysers, 2011; Avenanti et al., 2013b; Pezzulo et al., 2013). The activation of motor schemata while observing similar motor schemata in others may allow an understanding of others' actions "from inside" (Rizzolatti and Sinigaglia, 2010) and this motor coding of observed actions may be used to predict incoming visual signals and refine visual perception.

Attention to the action observation–execution coupling gained strong momentum when a plausible neural underpinning of such a mechanism was first described in the form of neurons in the F5 sector of the ventral premotor cortex of awake monkeys (di Pellegrino et al., 1992). These cells have been termed "mirror neurons" (Gallese et al., 1996) for their capability to mirror online (i.e., replicate) in motor terms the observed hand–mouth actions (see Casile, 2013 for a review of 20 years of research on mirror neurons in the monkey brain). At their first description in monkeys, the activity of these cells seemed to be strictly dependent upon the actions having a clear transitive goal (i.e., grasping a piece of food), although premotor mirror neurons coding communicative mouth gestures (e.g., lip-smacking; Ferrari et al., 2003) or intransitive hand movements (Kraskov et al., 2009) have also been described. More recently, neurons coding the end-goal of a chain of actions have been described in the inferior parietal cortex of monkeys (i.e., in the cytoarchitectonic areas PF and PFG) observing grasp-to-place and grasp-to-eat actions (Fogassi et al., 2005). An important feature of these cells is that their activity seems not to be strictly linked to the precise time-deployment of the observed action; indeed, a certain proportion of parietal mirror neurons are activated in advance of achievement of the end-goal, e.g., during the initial grasping phase (Fogassi et al., 2005). This anticipatory feature was also shown in a single-cell study where monkey premotor mirror neurons fired both when directly seeing hand–food contact and when merely inferring that the observed hand was going to grasp a piece of food behind an occluder (Umiltà et al., 2001).

Transcranial magnetic stimulation (TMS) studies assessing corticospinal excitability (Fadiga et al., 1995; Urgesi et al., 2006, 2010; Candidi et al., 2010; Borgomaneri et al., 2012, 2013, 2014; Barchiesi and Cattaneo, 2013; Mattiassi et al., 2014), electro- and magnetoencephalography (Hari et al., 1998; Cochin et al., 1999; Järveläinen et al., 2004; van Schie et al., 2004; Pineda, 2005; Kessler et al., 2006; Bufalari et al., 2007), functional brain imaging (Chong et al., 2008; Etzel et al., 2008; Kilner et al., 2009; Caspers et al., 2010; Oosterhof et al., 2010; Arnstein et al., 2011; Molenberghs et al., 2012a; Azevedo et al., 2013) and single-cell recording studies in humans (Mukamel et al., 2010) suggested the presence of fronto–parietal neural networks supporting similar mirror-like mechanisms.

A supposed cortical pathway for observed actions to be translated into their motor counterparts (i.e., the action observation–execution link) involves early processing in visual regions, including the superior temporal sulcus (STS) and the surrounding middle/superior temporal gyri. Monkey studies indicate the STS region contains neurons that are activated by the observation of complex motion conveyed by biological entities (i.e., biological motion) even in the absence of a direct view of the form of the agent that performs the action (Puce and Perrett, 2003). The proposed idea is that visual information coming from lower-level visual areas is sent to temporal regions from where it is relayed to parietal regions (including the inferior parietal lobe and the anterior intraparietal area) and ultimately to premotor regions (Nishitani et al., 2004; Rizzolatti and Craighero, 2004; Caspers et al., 2010; Nelissen et al., 2011; Keysers and Gazzola, 2014). Recent work in humans also suggests that the somatosensory cortex participates in this network (Gazzola and Keysers, 2009; Caspers et al., 2010; Keysers et al., 2010; Jacquet and Avenanti, 2013); however, the pathway through which this region would receive visual signals conveying action observation has been less directly explored.

This temporal, parietal and premotor network, which is often referred to as the action observation network (AON), is suggested to be the basis for sophisticated cognitive skills such as the ability to perceive and understand others' actions and intentions. Neurophysiological and brain imaging techniques have been essential in highlighting that action observation triggers activation of not only temporal, but also fronto–parietal areas possibly coding visual representation of the observed action in motor terms. However, the correlational approach of these methods cannot establish whether neural activity in the AON is also necessary for action perception and understanding. Thus, to test the causal role of the AON in action perception, it is fundamental to resort to causal methods, i.e., to investigate the influence of altered neural activity in key nodes of the AON, introduced by brain lesions or non-invasive brain stimulation, on the ability to recognize and understand the actions of others (Avenanti and Urgesi, 2011; Urgesi and Avenanti, 2011; Avenanti et al., 2013b).

#### **BRAIN STIMULATION STUDIES OF ACTION PERCEPTION**

Based on the idea that the activation of motor regions is not only concomitant to action observation but that it plays a causal role in processing and full understanding of others' behavior, brain stimulation methods, especially repetitive TMS, have been used to highlight the causative role of premotor and motor regions in the visual perception of seen postures and movements (review in Avenanti et al., 2013b). These studies showed that interferential TMS over the inferior frontal cortex [including the posterior part of the inferior frontal gyrus (IFG) as well as the ventral premotor cortex], but not over control regions, impaired the performance of healthy participants during: (i) *biological motion perception*, in which participants are required to blend the coherent motion pattern of a series of point-lights into a unitary perception of a moving person (van Kemenade et al., 2012); (ii) *visual action discrimination*, in which participants are involved in delayed matching-to-sample of static pictures depicting hand grips (Jacquet and Avenanti, 2013), upper or lower limb actions (Urgesi et al., 2007b; Candidi et al., 2008) or whole body movements (Urgesi et al., 2007a); (iii) *weight estimation*, in which participants are presented with videos of an actor lifting and placing a box of different weights and are asked to estimate the weight of the box (Pobric and Hamilton, 2006); (iv) *goal recognition*, in which participants are required to match the end-goal of action videos (Jacquet and Avenanti, 2013); (v) *deception detection*, in which participants are required to recognize whether the actor who lifts an object is trying to provide deceiving information about its weight (Tidoni et al., 2013). 
Furthermore, repetitive TMS of the inferior frontal cortex during the observation of others' hand actions prevented healthy participants from performing proactive eye movements similar to those made by the model performing such actions (Costantini et al., 2014; see also Elsner et al., 2013). In a similar vein, stimulation of the inferior frontal cortex abolished the facilitation of motor excitability during action observation (as evidenced by perturb-and-measure TMS protocols: Avenanti et al., 2007, 2013a) as well as the effect of repeated action execution on categorization of seen actions (as shown by cross-modal TMS adaptation; Cattaneo et al., 2011).

Clearly, the functions addressed by these studies are very disparate and involve different levels of action representations, from pure visual processing (e.g., biological motion perception; discrimination of static postures) to active simulation of the actor's efforts in lifting the object (e.g., weight estimation), anticipatory coding of what the actor is doing (e.g., proactive gaze), and inference of the action goals independently of their means (e.g., goal recognition) or of the actor's ultimate intention (e.g., deception detection). It is thus unclear at which level and for which specific function the inferior frontal cortex plays a critical role. Furthermore, other studies have shown that action perception and goal recognition are affected not only by stimulation of the inferior frontal cortex, but also by stimulation of the anterior intraparietal cortex (Cattaneo et al., 2010) and of the dorsal premotor cortex (Stadler et al., 2012; Makris and Urgesi, 2014). Similarly, dual coil TMS paradigms show that stimulation of parietal (Koch et al., 2010) and dorsal premotor (Catmur et al., 2011) cortices influences motor excitability during action observation, in a way that is similar to that caused by stimulation of the inferior frontal cortex (Koch et al., 2010; Catmur et al., 2011). Finally, it is also worth noting that performance in some action perception tasks is impaired after stimulation of the temporal nodes of the AON; for example, repetitive stimulation of STS reduces the sensitivity of biological motion perception (Grossman et al., 2005; van Kemenade et al., 2012), alters the ability to detect small postural changes in neutral and angry body images (Candidi et al., 2011), and disrupts the recognition of the outcome of complex sport actions (Makris and Urgesi, 2014).
On the other hand, tasks involving the representation of abstract action goals independently of the effector are affected by stimulation of fronto–parietal but not of temporal areas (Cattaneo et al., 2010). Overall, these findings suggest that action perception and understanding rely on different regions which might provide complementary contributions to the observer's action representation along a continuum from processing of kinematic features of the observed movement to processing of action goal and intention.

The crucial role played by each node of the AON in action representation, however, cannot be fully clarified by brain stimulation studies alone, since the interference induced by single-dose TMS of a given area might cause transient functional fluctuations of network activity (Siebner et al., 2009; Avenanti et al., 2012a,b; Arfeller et al., 2013). It is likely that such transient instabilities trigger fast compensatory functional reorganization of the network (Arfeller et al., 2013; Avenanti et al., 2013a), as documented for other domains such as action selection (O'Shea et al., 2007), thus allowing task performance to recover (Sack and Linden, 2003; Siebner et al., 2009; Reithler et al., 2011). This pattern of results somewhat limits the implications of brain stimulation findings for the description of action perception and understanding deficits in chronic clinical conditions, associated with either neurodevelopmental disorders (e.g., autism spectrum disorder) or acquired brain damage (e.g., apraxia). Indeed, although plastic mechanisms are also evident after these latter forms of lesions, it is clear that these changes are completely different in both their nature and timing and imply extremely different functional effects from those consequent to brain stimulation methods. For example, while real lesions generally induce both morphological and functional long-term changes, virtual lesions induce faster functional changes that vanish within milliseconds to minutes at most.

Thus, to establish the causal role of key nodes of the AON in action perception it is fundamental to provide convergent evidence from brain stimulation and brain lesion methods. In addition, although non-invasive brain stimulation techniques allow studying the effects of transient alterations of activity in motor cortical areas on visual action perception, one important limitation of these techniques is that they cannot be applied to deep brain regions, as only superficial areas can be easily stimulated. Critically, therefore, brain lesion studies are the only way to describe any stable and causal role of superficial and non-superficial AON areas in action perception and understanding. Overall, the description of the neuropsychological deficits in brain lesion patients provides information on the functions that cannot be recovered, or are much more difficult to recover, after damage to a given gray or white matter area. This provides more compelling evidence for the comprehension of the neural bases of action perception and understanding.

# **PIONEER NEUROPSYCHOLOGICAL STUDIES ON ACTION PERCEPTION DEFICITS**

The investigation of action perception and understanding disorders in brain lesion patients started from the pioneering findings of two classical research streams documenting action perception disorders in patients suffering from aphasia and apraxia, respectively.

The notion that patients with aphasia present disturbances also in pantomime recognition dates back to the seminal clinical observations of Finkelnburg (1870; cited in Varney, 1978), Jackson (1878; cited in Varney, 1978), and Head (1926; cited in Varney, 1978) and was attributed to a general deficit in symbolic thinking (asymbolia). Further studies, however, provided contrasting evidence on whether pantomime recognition deficits in aphasia patients correlate with the severity of their linguistic deficits. Duffy and Duffy (1975, 1981) developed a pantomime recognition test that did not require processing of verbal instructions or production of a verbal response, as patients simply had to point to the correct gesture; they found that patients with aphasia were more impaired than patients with right hemisphere (RH) or subcortical damage and their pantomime recognition abilities correlated with their overall linguistic competence. On the other hand, some studies showed that pantomime recognition in aphasics was independent of general linguistic deficits (Gainotti and Lemmo, 1976) and was more associated with deficits in reading than with deficits in oral comprehension, suggesting a link of pantomime recognition deficits with visual rather than linguistic or "symbolic" processing (Varney, 1978, 1982). Furthermore, qualitative analysis of the errors made by the aphasic patients in pantomime recognition demonstrated that they most often selected the semantic distractor, suggesting a specific difficulty in extracting the correct meaning of pantomimes (Varney and Benton, 1982; Duffy and Watkins, 1984). Finally, preliminary attempts to identify the neural correlates of pantomime recognition deficits in aphasia (Varney and Damasio, 1987; Varney et al., 1989) revealed that they resulted from lesions in basal ganglia and posterior temporo–parietal cortices, although the association between lesion of these areas and pantomime recognition deficits was weak (i.e., many patients with lesions in these areas did not exhibit any deficit).

The second research stream on the links between motor dysfunctions and action perception-understanding deficits originated from the finding that patients with limb apraxia have deficits not only in imitating observed gestures, but also in distinguishing well-performed from poorly performed movements (Heilman et al., 1982) and in understanding their meaning (Rothi et al., 1985). Importantly, action perception and understanding disorders were specific to the apraxia patients with posterior lesions, while those with anterior lesions were unaffected. In a similar vein, patients with ideational apraxia (defined as the inability to demonstrate correct object-use) presented deficits in sequencing pictures of object-use actions but not of other common events not requiring object manipulation; the deficits in action sequencing were independent of the severity of aphasia or ideomotor apraxia (i.e., gesture imitation) deficits (Lehmkuhl and Poeck, 1981; see also Rapcsak et al., 1995). These findings were interpreted in the context of a dissociation between conceptual action disturbances, which follow left parietal lesions and reflect the disruption of "visuo-kinesthetic motor engrams" guiding the sequencing and timing of motor movements, and production deficits, which follow premotor lesions and reflect the disconnection between parietal centers and motor production system (Heilman et al., 1997; Goldenberg, 1999; Stamenova et al., 2012). Following the same research stream, however, Halsband et al. (2001) found that patients with lesions involving the left parietal cortex showed severe action production and imitation impairments, but only slight, if any, deficits in tasks requiring them to judge whether a given sequence was correctly or inadequately performed, to detect sequence or performance errors, or to identify the missing link in an incomplete sequence; conversely, patients with left premotor lesions or RH lesions were not affected in either action comprehension or production.

Overall, classical neuropsychological studies provided evidence that action comprehension disorders may be associated with language or imitation deficits in left hemisphere (LH) patients with aphasia and/or apraxia. All these studies highlighted a certain degree of variability among aphasia and apraxia patients in their relative performance in action comprehension tasks, suggesting that different brain lesions may induce associated or dissociated patterns of action comprehension and production disorders. The scanty documentation about lesion extent and localization notably limited the anatomical inferences that could be drawn from these findings. Recent neuropsychological studies have strengthened the investigation of the neuroanatomical correlates of action perception and understanding disorders by using lesion mapping and analysis methods that allow testing the extent of the association between lesions in a given brain region and specific behavioral deficits. Performing a systematic review of these studies in order to identify patterns of consistent associations between specific brain lesions and action perception and understanding disorders is the aim of the present study.

# **THE PRESENT STUDY**

In the present study, we aimed to perform an anatomic likelihood estimation (AnLE) meta-analysis of studies using formal lesion-symptom mapping methods to describe the causal relation between brain lesions and action perception and understanding deficits. We considered studies using any formal lesion-symptom mapping procedures spanning from statistical frequency comparison of the lesion overlaps of impaired vs. non-impaired patients (Rorden and Karnath, 2004) to voxel-lesion-symptom mapping (VLSM) according to which, for each brain voxel, the performance of damaged patients is compared to that of non-damaged patients (Bates et al., 2003; Rorden et al., 2009), and comprising also voxel-based morphometry (VBM), which correlates gray-matter density to behavioral performance (Ashburner and Friston, 2000). The quantitative approach of these methods allows investigating subtle and continuous action perception and understanding deficits and associating them with their specific neural substrate.
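The voxel-wise logic of VLSM described above can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the procedure of Bates et al. (2003) or of any of the reviewed studies: the lesion masks, behavioral scores, function names, the choice of a Welch *t* statistic and the minimum group size are all hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def vlsm_t_map(lesion_masks, scores, min_group=2):
    """Compare spared vs. lesioned patients at every voxel.

    lesion_masks[p][v] is 1 if patient p has a lesion at voxel v, else 0.
    Returns one t value per voxel (positive = worse scores when lesioned),
    or None where either group has fewer than min_group patients.
    """
    t_map = []
    for v in range(len(lesion_masks[0])):
        lesioned = [s for m, s in zip(lesion_masks, scores) if m[v]]
        spared = [s for m, s in zip(lesion_masks, scores) if not m[v]]
        if len(lesioned) < min_group or len(spared) < min_group:
            t_map.append(None)
        else:
            t_map.append(welch_t(spared, lesioned))
    return t_map

# Hypothetical data: 6 patients, 3 voxels, higher score = better performance
masks = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0], [0, 1, 0]]
scores = [10, 12, 11, 20, 22, 21]
t_map = vlsm_t_map(masks, scores)
print(t_map)
```

In this toy data set, damage to the first voxel separates low from high scorers, whereas the third voxel is lesioned in a single patient and is therefore skipped. A real analysis would additionally correct for multiple comparisons across voxels.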

A limitation of lesion mapping analyses of single studies is that their results are strictly dependent not only on the behavioral task used to probe action perception and understanding skills, but also on the patient population entered into the analysis. In fact, previous studies used different sets of tasks, which relied to different extents on motor production, visual perception and language processing, thus making it difficult to compare the results and to exclude the contribution of deficits attributable to damage to primary sensorimotor areas and/or language areas. Furthermore, the neuroanatomical inferences that can be drawn from the results of these single studies are stronger as more patients with disparate lesion localization and extent are entered into the analysis. However, having a high number of patients satisfying the inclusion criteria for reliable neuropsychological evaluation and with acceptable neuroradiological lesion documentation is one of the major issues in neuropsychological research. As a reflection of this issue, previous studies focused on subpopulations of patients selected on the basis of a specific symptom (e.g., apraxia or aphasia) or on the basis of lesion localization (left or right hemisphere). Since the number of patients in the different studies is relatively small and probably not sufficient to cover all brain areas with acceptable power, we believe that formal meta-analytic works may facilitate the emergence of a consistent pattern of association between specific brain lesions and action perception and understanding disorders.

We thus performed a systematic review of existing studies investigating the neuroanatomical substrate of action perception and understanding disorders in brain lesion patients and used Brain-Map Ginger ALE 2.3 software (http://brainmap.org) to perform an AnLE meta-analysis. Although Ginger ALE was developed for activation likelihood estimation (ALE) meta-analyses when used in conjunction with functional neuroimaging results (Turkeltaub et al., 2002; Laird et al., 2005), it also allows performing AnLE meta-analyses if used in conjunction with anatomic data such as VBM (e.g., Nickl-Jockschat et al., 2012) or VLSM (e.g., Chechlacz et al., 2012; Molenberghs et al., 2012b). This last method assesses the overlap between anatomical foci identified by different research groups using voxel-wise analyses of the foci obtained based on various lesion-symptom mapping approaches. In the present context, the results of the meta-analysis allowed us to identify consistent associations between brain damage and action perception and understanding deficits.

### **MATERIALS AND METHODS**

#### **LITERATURE SEARCH AND SELECTION CRITERIA**

For the purpose of the present study we performed a systematic search in the literature to identify all the relevant papers reporting the performance of brain lesion patients in action perception and understanding tasks. To avoid over-selecting the list on the basis of the specific lesion analysis used, an initial search identified all studies published after 2001 and investigating action perception in brain lesion patients. We searched PubMed with the following keywords: [(action OR actions OR gestures OR gesture OR pantomime OR pantomimes OR "biological motion") AND (perception OR discrimination OR prediction OR understanding OR recognition OR knowledge OR comprehension OR observation OR recognition) AND ("brain lesion" OR "brain damage" OR "brain injury" OR "brain lesioned" OR "brain damaged" OR "brain injured" OR "hemisphere lesion" OR "hemisphere damage" OR "hemisphere injury" OR "hemisphere lesioned" OR "hemisphere damaged" OR "hemisphere injured" OR "brain stroke" OR "hemisphere stroke" OR aphasia OR apraxia OR agnosia) AND (publication date > 2001) NOT (review)]. This yielded a list of 415 papers (last update 11 December 2013), which were screened to select the papers satisfying the following inclusion criteria: (1) testing the performance of focal brain lesion patients (e.g., studies on degenerative or neurodevelopmental disorders were not included); and (2) using at least one action perception and/or understanding task. We identified 34 original research articles published after 2001 that tested action perception in focal brain injured patients and administered at least one action perception and/or understanding task. The reference lists of these papers were screened to identify other papers not picked up by the previous automatic search. This allowed us to identify two other papers (Battelli et al., 2003; Tranel et al., 2003).
The resulting list of 36 papers was then screened for the following exclusion criteria: (1) not mapping and analyzing patients' lesions using one of the standard lesion-symptom mapping approaches based on VLSM, subtraction of lesion overlaps, or VBM; (2) administering tasks with strong linguistic processing demand (e.g., action naming or verb to action scene matching) and (3) cases in which the coordinates of the clusters in the Montreal Neurological Institute (MNI; Evans et al., 1993) or Talairach space (Talairach and Tournoux, 1988) could not be identified either from the information provided in the paper or directly from the authors. Based on these exclusion criteria we did not include studies that involved only single case analyses or a few patients and that selected the patient group on the basis of the presence of a specific symptom associated with the experimental task (i.e., studies where no statistical comparison with a different patient group was performed).
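
The boolean search string reported above can be assembled programmatically, which makes the three term groups easier to audit. The following Python sketch is only an illustration: the helper names (`or_group`, `lesion_terms`, `build_query`) are hypothetical, the duplicated keyword "recognition" is listed once, and the date and review filters are appended exactly as written in the text rather than translated into PubMed field tags.

```python
from itertools import product

# Term groups copied from the search string reported in the text.
ACTION_TERMS = ["action", "actions", "gestures", "gesture",
                "pantomime", "pantomimes", '"biological motion"']
PROCESS_TERMS = ["perception", "discrimination", "prediction", "understanding",
                 "recognition", "knowledge", "comprehension", "observation"]
SITES = ["brain", "hemisphere"]
KINDS = ["lesion", "damage", "injury", "lesioned", "damaged", "injured"]
# Filters kept in the paper's own notation (assumption: not valid field syntax).
FILTERS = " AND (publication date > 2001) NOT (review)"


def or_group(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def lesion_terms():
    """Quoted site/kind phrases followed by the stroke and syndrome terms."""
    phrases = [f'"{site} {kind}"' for site, kind in product(SITES, KINDS)]
    return phrases + ['"brain stroke"', '"hemisphere stroke"',
                      "aphasia", "apraxia", "agnosia"]


def build_query():
    """Combine the three OR-groups with AND and append the filters."""
    groups = [ACTION_TERMS, PROCESS_TERMS, lesion_terms()]
    return " AND ".join(or_group(g) for g in groups) + FILTERS


query = build_query()
print(query)
```

Generating the lesion phrases from two short lists keeps the "brain"/"hemisphere" variants symmetric, which is easy to get wrong when the string is typed by hand.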

Twelve papers (Sörös et al., 2003; Yoon et al., 2005; Arévalo et al., 2007, 2011, 2012; Bi et al., 2007; Negri et al., 2007; Tranel et al., 2008; Papeo et al., 2010; Pillon and d'Honincthun, 2011; Vannuscorps and Pillon, 2011; Stamenova et al., 2012) were not considered because their action understanding tasks required processing of linguistic stimuli, either naming of visually presented actions or word to picture matching that involved understanding of the word meaning. Five papers were not considered further because they reported single case analyses of action perception and understanding disorders in patients with agnosia (Huberle et al., 2012; Moro et al., 2012), apraxia (Sunderland, 2007), aphasia (Cocks et al., 2009), or frontal brain lesion (Eskenazi et al., 2009). Three papers were not included because they studied small groups of patients who were all impaired in biological motion detection (three patients in Battelli et al., 2003), in sequencing observed actions (six patients in Fazio et al., 2009) or in matching mouth action sounds (Schmid and Ziegler, 2006) and no VLSM or lesion subtraction statistical analysis could be performed. Two studies (Serino et al., 2010; van Dokkum et al., 2012) could not be included because no lesion mapping was performed and patients were recruited on the basis of specific motor symptoms (hemiplegia) whose presence was associated to performance in the experimental task (perception of biological motion). Finally, three studies (Tranel et al., 2003; Kemmerer et al., 2012; Rogalsky et al., 2013) were not included in the meta-analysis because the coordinates of the foci associated to action perception and understanding deficits were not available. From the list of 36 papers published after 2001 and testing action perception and recognition in brain lesion patients, we thus identified 11 papers that did not meet any exclusion criteria (see **Table 1**).
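
The selection arithmetic described above can be checked with a few lines of Python. This is a simple bookkeeping sketch; the dictionary labels are shorthand introduced here, while the counts are those reported in the text.

```python
# Papers identified: 34 from the database search plus 2 from reference lists.
IDENTIFIED = 34 + 2

# Papers excluded, grouped by the reason given in the text.
EXCLUDED = {
    "linguistic task demands": 12,
    "single-case analyses": 5,
    "small uniformly impaired groups, no lesion statistics": 3,
    "no lesion mapping performed": 2,
    "coordinates of foci unavailable": 3,
}


def remaining(identified, excluded):
    """Number of papers left after removing every excluded paper."""
    return identified - sum(excluded.values())


print(remaining(IDENTIFIED, EXCLUDED))  # 11
```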

#### **DATA ANALYSIS**

Based on the results of the literature search we entered all the foci whose coordinates (1) were reported by the authors in the paper, (2) could be identified from the information provided in the paper, or (3) were provided by the authors as personal communication. The center coordinates of all clusters reported in the papers were considered provided they referred to tasks involving action perception and understanding independent of linguistic coding. Thus, the coordinates of clusters associated to all tasks were included in cases in which multiple action perception tasks were administered to patients. Conversely, the coordinates of foci associated to tasks requiring linguistic coding (e.g., picture to word matching as in the semantic task in Buxbaum et al., 2005 and Kalénine et al., 2010) were not included in the analysis to rule out the spurious lesional effects of areas associated to language disorders. In cases in which multiple analyses were performed on the same data set but using different lesion analysis approaches (e.g., Pazzaglia et al., 2008b), we entered the coordinates resulting from all analyses. For each cluster, the coordinates of the voxel with maximal statistical value or of the center of mass were entered into the analysis, according to which of the two coordinates was provided by the authors.


**Table 1 | List of studies considered for the AnLE meta-analysis listed in chronological order.**

on a subset of patients and the number of patients considered here is that entered into the analysis and not of the study whole sample. Except when otherwise specified, upper limb actions were used as stimuli.

We performed all analyses in MNI space and the coordinates originally reported in Talairach space were converted into MNI space with the coordinate conversion tool implemented in Ginger ALE software, which uses the best-fit icbm2tal transform (Lancaster et al., 2007). We used the revised version of the AnLE method (Eickhoff et al., 2009), which considers random effects and incorporates variable uncertainty based on sample size. Furthermore, a modification to the AnLE method (Turkeltaub et al., 2012) was used to limit the effect of a single experiment and minimize within-group effects. In keeping with previous AnLE meta-analyses on brain lesion mapping data (e.g., Chechlacz et al., 2012; Molenberghs et al., 2012b), this modified AnLE algorithm was used to control for dependent within-group effects in studies providing different sets of coordinates based on different data analysis approaches (e.g., lesion overlap subtraction and VLSM; as in Pazzaglia et al., 2008b) or on different action perception tasks administered to the same group of patients (as in Pazzaglia et al., 2008a). This AnLE approach models the anatomical foci from different published reports as Gaussian probability density distributions centered at the reported coordinates and calculates the Modeled Anatomic maps (i.e., the 3D images of each foci group) by taking, at each voxel, the maximum across the Gaussians of the foci in that group. Then, an experimental AnLE map is created from the voxel-wise union of all Modeled Anatomic maps. Differentiation of true concurrence of foci vs. random spatial association is performed by testing the experimental AnLE map against AnLE null distribution maps that are generated utilizing a permutation test of randomly generated foci. For thresholding purposes, we followed a cluster level inference method (Eickhoff et al., 2012), which sets the cluster minimum volume such that only 5% of the simulated data's clusters exceed this size.
This way, we avoided setting *a priori* a minimum cluster size which could have removed small clusters with high convergence of studies. A cluster-forming statistical threshold of *p* < 0.05 FDR (false discovery rate) was used to correct for multiple comparisons. The resulting maps were overlaid onto the T1-weighted template MRI scan from the MNI provided with the MRIcron software (Rorden and Brett, 2000; available at http://www.mricro.com/mricron). The anatomical localization of the significant clusters identified by the meta-analyses was based on probabilistic cytoarchitectonic maps of the human brain using the SPM Anatomy Toolbox v. 1.7 (Eickhoff et al., 2005). Using a Maximum Probability Map, foci were assigned to the most probable histological area at their respective locations.
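
The two core steps of the procedure described above (a Modeled Anatomic map per foci group, then a voxel-wise union across groups) can be illustrated with a minimal Python sketch. It is not the Ginger ALE implementation: the function names, the fixed 10 mm FWHM, the 2 mm grid and the foci coordinates are all assumptions chosen for illustration, the sample-size-dependent kernel width is not reproduced, and the permutation null and cluster-level thresholding are omitted.

```python
import math
from itertools import product


def gaussian_prob(voxel, focus, fwhm_mm, voxel_mm):
    """Approximate probability mass of an isotropic 3D Gaussian at one voxel."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
    density = math.exp(-d2 / (2.0 * sigma ** 2)) / ((2.0 * math.pi) ** 1.5 * sigma ** 3)
    return density * voxel_mm ** 3  # density times voxel volume


def modeled_map(foci, voxels, fwhm_mm, voxel_mm):
    """Per-voxel maximum across the Gaussians of one foci group."""
    return [max(gaussian_prob(v, f, fwhm_mm, voxel_mm) for f in foci)
            for v in voxels]


def union_map(modeled_maps):
    """Voxel-wise probabilistic union across groups: 1 - prod(1 - MA_i)."""
    result = []
    for values in zip(*modeled_maps):
        miss = 1.0
        for p in values:
            miss *= 1.0 - p
        result.append(1.0 - miss)
    return result


# Hypothetical grid (mm) and foci groups; NOT the coordinates of Table 1.
VOXELS = list(product(range(-50, -29, 2), range(0, 21, 2), range(0, 31, 2)))
FOCI_GROUPS = [
    [(-40, 10, 10), (-40, 10, 20)],  # one study reporting two foci
    [(-42, 10, 10)],
    [(-34, 4, 24)],
]

maps = [modeled_map(g, VOXELS, fwhm_mm=10.0, voxel_mm=2.0) for g in FOCI_GROUPS]
ale = union_map(maps)
peak_value, peak_voxel = max(zip(ale, VOXELS))
print("peak voxel:", peak_voxel, "value:", round(peak_value, 4))
```

Taking the maximum within a group, rather than the union, means that two nearby foci from the same study do not add up, which is the point of the within-group correction; convergence can then only arise from foci reported by different groups.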

### **RESULTS**

The 11 studies and foci entered into the meta-analysis are reported in **Table 1**. The studies involved a total of 361 patients and reported 30 foci of significant lesion-deficit associations. Most patients had lesions in the LH (*N* = 296); only two studies (Moro et al., 2008; Han et al., 2013) reported and analyzed also patients with RH (*N* = 26) and bilateral posterior (*N* = 39) lesions; two further studies (Saygin, 2007; Weiss et al., 2008) tested both LH and RH patients but did not include RH patients in the lesion mapping analysis. Within the LH group, however, there was a good coverage of frontal, parietal, and temporal lesions.

The results of the AnLE meta-analysis are listed and detailed in **Table 2** and they are displayed in **Figure 1**. We identified three lesion clusters with significant co-occurrence of associations with action perception and understanding disorders. The largest cluster (1920 voxels) was located in the left frontal cortex (MNI coordinates of the weighted center, *x, y, z*: −44, 10, 14) and was assigned to Brodmann area (BA) 44 (30.4% of the cluster voxels) and BA 45 (3.4% of the cluster voxels). Local maxima were identified in the pars opercularis (MNI: −48, 12, 12) and pars triangularis (MNI: −38, 14, 26) of the IFG and in the rolandic operculum (MNI: −42, 6, 14). The other two clusters were much smaller. One cluster (304 voxels) was located in the left parietal cortex (MNI coordinates of the weighted center, *x*, *y*, *z*: −35, −54, 36) and was assigned mostly to human intraparietal area 1 (hIP1; 57.2% of the cluster voxels) and marginally to hIP3 (0.7% of the cluster voxels). The third cluster was located in the left middle/superior temporal cortex (MTC/STC) and centered on the lower bank of the STS (MNI coordinates of the weighted center, *x*, *y*, *z*: −43, −52, 5); local maxima were identified in the middle temporal gyrus (MNI: −42, −52, 8) and the underlying white matter (MNI: −44, −52, 2). The cluster with greatest convergence was the one in the IFG (AnLE value = 0.017), especially in the pars opercularis, while the other two clusters were less reliably identified in the studies considered here (AnLE value < 0.12).

### **DISCUSSION**

Previous neurophysiological and brain imaging techniques have been essential in demonstrating that observing others' actions activates high-order visual areas in the temporal cortex, which are involved in processing biological motion, as well as frontal and parietal somatomotor regions, which are involved in performing the observed actions (Puce and Perrett, 2003; Rizzolatti and Craighero, 2004; Caspers et al., 2010; Grosbras et al., 2012). However, these approaches only provide correlational evidence and cannot establish whether temporal, parietal, and frontal areas are necessary for visual recognition and understanding of others' actions (Avenanti and Urgesi, 2011; Avenanti et al., 2013b).

Our meta-analysis of brain lesion studies investigating the neural correlates of action perception and understanding disorders using quantitative lesion mapping analyses showed that lesions of three crucial nodes of the AON, namely the inferior frontal cortex, inferior parietal cortex, and MTC/STC, are consistently associated with deficits in perceiving and understanding the actions of other individuals. This converges with neurophysiologic, neuroimaging and brain stimulation studies in showing that the ability to understand others' behavior recruits a large network of temporal, parietal, and premotor areas that may play complementary roles in the ultimate action representation.

The probabilistic cytoarchitectonic anatomical localization of the three clusters assigned the inferior frontal cortex cluster mostly to BA 44 and only marginally, in its antero-dorsal aspect, to BA 45. This localization closely corresponds to that reported in the previous ALE meta-analysis of functional imaging studies carried out by Caspers et al. (2010) and it converges with the region we identified in a previous review of the literature of brain stimulation studies that investigated the neural substrates of action perception (Avenanti et al., 2013b). Moreover, the BA 44 region is thought to be the human homolog of the macaque ventral premotor cortex area F5 where mirror neurons were first described in the monkey


**Table 2 | Significant AnLE clusters and MNI coordinates of the corresponding local maxima identified in the inferior frontal cortex (IFC), inferior parietal cortex (IPC), and middle/superior temporal cortex (MTC/STC).**

BA, Brodmann area; hIP1, human intraparietal area 1; hIP3, human intraparietal area 3.

brain (di Pellegrino et al., 1992; Rizzolatti and Craighero, 2004). This convergence provides compelling evidence for a critical role of the inferior frontal cortex in action perception.

The inferior parietal cortex cluster was assigned to hIP1 and marginally to hIP3. Thus, our parietal cluster was located more posteriorly and medially than the rostral inferior parietal area (area PFt), which represented the most anterior part of the parietal cluster identified by Caspers et al. (2010) and might correspond to area PF of the monkey brain (Caspers et al., 2008), where parietal mirror neurons were identified (Fogassi et al., 2005). However, parietal mirror neurons have also been reported more posteriorly, in area PFG (Fogassi et al., 2005; Bonini et al., 2010) and monkey imaging studies show that action observation triggers activity not only in area PF, but also in PFG as well as in the somatosensory and intraparietal cortex (Evangeliou et al., 2009; Nelissen et al., 2011). Remarkably, our hIP1/hIP3 cluster appears to overlap, at least partially, with the most posterior aspects of the parietal cluster identified by Caspers et al. (2010), which, similarly to monkey data, extended to the somatosensory cortex and the intraparietal sulcus (IPS) and more specifically to the cytoarchitectonic area hIP3. That the convergence between our meta-analysis and the previous ALE meta-analysis of functional imaging studies (Caspers et al., 2010) is only partial may be due to technical reasons. Indeed, besides issues related to the anatomical resolution of lesion mapping methods, an additional key difference should be considered between neuroimaging and lesion studies. While functional magnetic resonance imaging (fMRI) detects activation mainly in the gray matter (at least in its typical applications), lesion studies can reveal behavioral consequences of lesions affecting both gray and white matter. Considering that our cluster was quite medial (MNI *x* = −36), it is likely that it comprised not only gray matter in the intraparietal cortex but also the underlying white matter and, thus, its connections with other brain regions.
Notably, functional and structural connectivity studies suggest that human hIP1 and hIP3 are mostly connected with the inferior frontal cortex (e.g., ventral premotor and IFG; see Uddin et al., 2010), which closely corresponds to our frontal cluster. Thus, these findings would support the notion that inferior fronto–parietal networks support action recognition and understanding.

Finally, regarding the temporal cluster, its location closely corresponded to the cluster in the superior temporal sulcus/posterior middle temporal gyrus that was identified by Caspers et al. (2010), despite being again slightly more medial (i.e., suggesting involvement of the white matter underlying the middle temporal gyrus).

An important feature of the present AnLE meta-analysis concerns the inclusion of studies that explicitly aimed to rule out linguistic demands of the action tasks that could affect performance, even when patients with aphasia were tested. Thus, our methodological choice to include only papers administering action perception tasks with low, if any, linguistic processing demands ensured that language comprehension or production abilities did not confound our results. As noted for brain stimulation studies, however, brain lesion studies used different types of tasks that demand different levels of action representation, from purely perceptual to goal and intention representation levels. Our AnLE meta-analysis allowed us to detect the clusters more consistently associated with general action perception deficits (independently from any linguistic demands). However, the small number of studies did not allow us to perform a more accurate task analysis to detect specific task-lesion associations and this should be considered a limitation of our study. Nevertheless, we believe that a qualitative description and classification of the tasks used in the different studies reported here may be very helpful in clarifying which functions were tapped and may provide a guide for the functional characterization of the tasks used to study action perception in future studies. In the following, we attempted such a task classification, although it should be kept in mind that our AnLE meta-analysis supports a general involvement of the three clusters in action perception and

**FIGURE 1 | Maps of the clusters with significant association between brain lesions and action perception and understanding disorders overlaid on axial slices (A) or 3D rendering (B) of the Montreal Neurological Institute (MNI) template.** Left hemisphere is on the left, and right hemisphere is on the right. Color scale indicates AnLE value range. IFG, inferior frontal gyrus; IPS, intraparietal sulcus; MTG, middle temporal gyrus. Note that deeper regions are projected onto the surface of the template to better highlight the extension of the cluster.

not their specific functional characterization. Inspection of the tasks used in the different studies suggests that they can be clustered into four different types: (1) biological motion perception (Saygin, 2007; Han et al., 2013); (2) discrimination of action pictures or sounds (Moro et al., 2008; Pazzaglia et al., 2008a; Kalénine et al., 2013); (3) detection of spatio–temporal errors in action sequences (Buxbaum et al., 2005; Pazzaglia et al., 2008b; Weiss et al., 2008; Kalénine et al., 2010; Nelissen et al., 2010); (4) identification of action goal (Saygin et al., 2004a).

#### **MOVEMENT PERCEPTION**

In two studies, perception of biological motion was tested by presenting point-light displays of human actions and requiring participants to discriminate them from their scrambled versions (Saygin, 2007) or to associate them with a static picture of the corresponding action (Han et al., 2013). In both studies, the task required the patients to extrapolate human actions from the coherent pattern of motion of dots and both studies found that lesions in the MTC/STC and premotor cortex affected biological motion perception. However, while Han et al. (2013) entered both left and right (and bilateral) lesions into the analysis and found that only RH areas were associated with biological motion perception deficits, Saygin (2007) entered only LH lesions in her quantitative analysis and found a role for both MTC/STC and inferior frontal cortex in the LH. Importantly, the behavioral analysis of RH damaged patients revealed that their performance was also impaired and was comparable to that of LH damaged patients, suggesting no specific lateralization effects in this task. It is possible that the choice of Han et al. (2013) to partial out the word-to-picture matching abilities of patients from the biological motion perception predictor excluded any effects of linguistic confounds, but may have also masked the deficits shown by LH damaged patients in biological motion perception with respect to RH damaged patients. Overall, the data of both studies are in keeping with neuroimaging evidence that observation of point-light displays of human actions activates not only middle/superior temporal (Grossman et al., 2000; Puce and Perrett, 2003) but also premotor areas (Saygin et al., 2004b) and with brain stimulation evidence that interference with both middle/superior temporal (Grossman et al., 2005) and premotor areas (van Kemenade et al., 2012) disrupts biological motion perception.

While our meta-analysis suggests that both temporal and premotor cortices are critical in perceiving the actions of others, studies suggest these regions may provide complementary contributions to the extrapolation of human movement information from point-light displays. Single-cell recording shows that neurons in STS and premotor areas have different response properties. Indeed, while both types of cells respond during action observation, no study has so far reported STS neurons responding to both observed and executed actions similar to what mirror neurons in the premotor and parietal areas do (Keysers and Perrett, 2004; Rizzolatti and Craighero, 2004). Rather, some STS neurons appear to decrease their activity during action execution (Keysers and Perrett, 2004). On the other hand, while both STS (Baker et al., 2001) and premotor (Umiltà et al., 2001) neurons continue responding during occlusion of the action, they show a differential pattern of temporal coupling with the action course. Indeed, STS neurons respond to the articulated static postures that correspond to the end-point of the actions but not to their start-point (Jellema and Perrett, 2003a); furthermore, the response of some STS neurons to static body postures is influenced by which action has been previously observed (Jellema and Perrett, 2003b) suggesting that their firing is influenced by the perceptual history of the action sequence in which a body posture is presented (Perrett et al., 2009). Conversely, mirror neurons in the premotor cortex show a more variegated response pattern, with some being activated in advance of goal achievement (Umiltà et al., 2001), others that stop firing when the target object has been reached and grasped, and others continuing to discharge also during the active holding phase (Gallese et al., 1996).
Taken together, these results may suggest that, while neural activity in STS and the surrounding MTC/STC uses visual information and perceptual experience to form a representation of ongoing actions (Perrett et al., 2009), activity in the premotor cortex may allow using previous motor experience with similar actions in order to simulate missing or ambiguous visual information on ongoing actions (Wilson and Knoblich, 2005; Urgesi et al., 2012; Avenanti et al., 2013a). This would suggest that the less rich the visual processing in STS, the more motor simulation processing in premotor cortex is required to construct a full action representation from ambiguous visual information. Direct evidence for this compensatory plasticity of visual and motor action representation came from a "perturb and measure" TMS study (Avenanti et al., 2013a) showing that motor facilitation during posture observation

increases after interferential stimulation of STS (see also Arfeller et al., 2013 for converging TMS-fMRI evidence).

#### **ACTION DISCRIMINATION**

The second group of studies used tasks that require matching two similar static pictures (Moro et al., 2008) or videos (means detection task in Kalénine et al., 2013) of body actions or matching an action sound to its corresponding action picture (Pazzaglia et al., 2008a). The results of these three studies were somewhat discrepant, likely depending on the type of action stimuli used (i.e., transitive vs. intransitive). Indeed, while Moro et al. (2008) used only intransitive or mimicked actions, Kalénine et al. (2013) used only transitive actions and Pazzaglia et al. (2008a) used both transitive and intransitive actions of upper limbs and mouth. In keeping with the brain stimulation studies using a similar task in healthy individuals (Urgesi et al., 2007b; Candidi et al., 2008), Moro et al. (2008) showed that damage to left or right inferior frontal cortex impaired the ability to discriminate two body part pictures on the basis of the specific intransitive action the model was performing. On the other hand, the means difference detection task used by Kalénine et al. (2013) required the comparison of the body movements of two goal-directed transitive actions having similar outcome (e.g., cleaning with a straight or circular movement) and performance in this task was associated with damage to the inferior parietal cortex but not to the inferior frontal cortex. Finally, Pazzaglia et al. (2008a) found that lesions of both inferior frontal and inferior parietal cortex impaired the ability to associate sounds with their corresponding action picture.
Overall, these studies appear in keeping with the differential involvement of inferior frontal and inferior parietal cortices in mapping intransitive and transitive actions (Buccino et al., 2001), with the inferior frontal cortex being involved in the encoding of both types of actions and the parietal cortex being more involved in the encoding of goal-directed actions (see also Grafton and Hamilton, 2007; Lestou et al., 2008; Jacquet and Avenanti, 2013).

#### **ERROR DETECTION**

The third group of studies required participants to detect errors in videos of body actions. In two of these studies, patients with apraxia (Pazzaglia et al., 2008b) and aphasia (Nelissen et al., 2010) were required to observe videos of transitive and intransitive actions that could be executed correctly or not. Although the stimuli used in the two studies were the same, Pazzaglia et al. (2008b) used an intermingled presentation of correct and incorrect actions and participants were required to decide whether each action was executed correctly or not; beyond tapping executive functions required to take a decision (see also Kalénine et al., 2013), this task requires matching the observed action to an internal representation of how that action is normally executed, thus likely calling for motor simulation. These specific task requirements were indeed associated to damage to the inferior frontal cortex. Conversely, Nelissen et al. (2010) presented three versions of the same action (two erroneous versions and one correct) and participants were required to decide which of the three versions was correctly executed; the visual presentation of correct and erroneous executions might have facilitated the identification of the correct solution

without the need to represent with simulation processes how that action should be executed. Indeed, the authors did not find any association between performance in the task and inferior frontal cortex lesion; on the other hand, performance deficits were associated to damage of the left STC, possibly reflecting the use of visual action processing to solve the task.

In the other three studies of this group (Buxbaum et al., 2005; Weiss et al., 2008; Kalénine et al., 2010), participants were presented with a linguistic description of a transitive action and then observed action videos that could or could not contain errors. While in the spatial task of Buxbaum et al. (2005) and Kalénine et al. (2010) participants had to choose the correctly executed action between two action videos that did or did not contain spatial errors, Weiss et al. (2008) required participants to decide whether each video depicted a correctly executed action or an action with spatial or sequencing errors. In both cases, patients' performance was associated with damage of the inferior parietal cortex/angular gyrus, suggesting a crucial role of this area in representing the correct spatio–temporal profile of transitive actions. Crucially, while both these tasks contained a linguistic cue (the initial description of the action verb or sentence), processing of the linguistic stimuli was almost irrelevant to task performance, since deciding which action contains a spatial or sequencing error is independent from the processing of its linguistic description. On the other hand, we decided to exclude from the Buxbaum et al. (2005) and Kalénine et al. (2010) papers the so-called semantic task, which required associating a verb with one of two different correctly executed action videos. Since this task was strictly related to the understanding of the verb meaning, it did not meet our requirement of being independent of linguistic processing. Conversely, the spatial task could be performed also without understanding of the verb meaning.

#### **ACTION GOALS**

The study of the fourth group (Saygin et al., 2004a) required matching the correct object to a schematic drawing of an action. This task does not require the discrimination of the correct action kinematics, but the access to the immediate-goal of observed transitive actions. Performance in this task showed a specific association with damage of the left inferior frontal cortex in aphasia patients, suggesting a role of this area in representing the congruence of action means and goal. It is worth noting that also the so-called outcome detection task in Kalénine et al. (2013) required the processing of action end-goal, since the participants had to discriminate two actions executed with the same body kinematics to obtain different outcomes; performance in this task was not associated with any specific lesion, albeit a marginally significant association was noted with damage to the inferior frontal cortex (Kalénine et al., 2013). Thus, the fourth study group suggests that understanding the immediate and end-goal of observed actions may involve the inferior frontal cortex. This is in keeping with two recent TMS studies showing that stimulation of the inferior frontal cortex affects the ability to match the immediate-goal (Cattaneo et al., 2010) or the end-goal (Jacquet and Avenanti, 2013) of two actions depicted in a video and in a picture (independently of the effector used to grasp/pull a ball as in the study of Cattaneo and colleagues; or independently of the type of grip being used to achieve

the end-goal of a sequence of actions as in the study of Jacquet and Avenanti, 2013). No effect was obtained after stimulation of the anterior intraparietal cortex (Jacquet and Avenanti, 2013), suggesting that processing action end-goals at an abstract level (i.e., independent of action means) relies more on the frontal node of the AON. Thus, these brain lesion and brain stimulation findings converge with neuroimaging studies of action execution (Johnson-Frey et al., 2005) and observation (Grafton and Hamilton, 2007; Bach et al., 2010) and provide causative evidence for a partial division of labor between the parietal and frontal nodes of the AON: while the inferior parietal cortex may be more involved in processing the specific way an observed transitive action is performed, i.e., the action's means of goal-oriented actions, the inferior frontal cortex appears also involved in coding action outcome and goal at a more abstract level and may use such abstract information to complete missing and ambiguous perceptual information about ongoing actions.

# **CONCLUSION**

In sum, our ALE meta-analysis of studies using lesion-symptom mapping methods to describe the causal relation between brain lesions and deficits in action perception and understanding identified three regions of the AON, namely the inferior frontal cortex, the inferior parietal cortex and the MTC/STC, whose damage was consistently associated with poor performance in action perception and understanding tasks that required extrapolating biological motion from point-light displays, matching the kinematics of transitive and intransitive actions, and inferring their end-goal. Interestingly, these areas correspond to the three nodes of the AON that are strongly activated in response to visual action perception in neuroimaging research (Caspers et al., 2010; Molenberghs et al., 2012a) and that have been targeted in previous brain stimulation studies (see Avenanti et al., 2013b for a review). Thus, brain lesion mapping provides converging evidence that premotor, parietal and temporal regions play crucial and possibly complementary roles in perceptual and cognitive action-related processes.

Here we attempted to classify the different studies on the basis of the tasks used to probe action perception and comprehension and have highlighted the importance of differentiating between transitive and intransitive actions and between processing of different types of action-specific information (i.e., action means vs. action goal). However, the limited number of studies available in the literature prevented us from drawing strong conclusions from this classification, and more empirical studies are needed in order to increase the robustness of the meta-analytic approach and to perform more specific task analyses. Furthermore, other action dimensions should be taken into account in the study of the neural bases of action perception disorders. In particular, neuroimaging studies have shown that observing upper limb, lower limb and mouth actions activates different sectors of the premotor and parietal cortices in accordance with the somatotopic organization of movement execution (Buccino et al., 2001; Grosbras et al., 2012), and a recent brain stimulation study also supports this organization, with lip and hand motor representations in the premotor cortex being critically involved in processing observed mouth and hand actions, respectively (Michael et al., 2014). Most studies considered in this meta-analysis used only upper-limb movements, thus making it difficult to evaluate the possible role of somatotopy in the precise extent and localization of the neural underpinnings of action recognition. The two studies using point-light displays (Saygin, 2007; Han et al., 2013) showed whole body movements, which involved the displacement of both upper and lower limbs, thus preventing any consideration about somatotopic organization. Moro et al. (2008) used static images that implied actions of lower or upper limbs, but no dissociation between deficits in recognizing upper or lower limb actions was noticed. Finally, Pazzaglia et al. (2008b) tested patients with buccofacial or limb apraxia and found a specific functional correspondence between deficits in imitating and matching mouth or upper limb actions. Lesion mapping analysis further confirmed that while insula damage was common to deficits in matching mouth and limb actions, deficits in matching limb actions were associated with damage to the inferior frontal cortex and inferior parietal cortex; conversely, deficits in matching mouth actions were associated with damage to the inferior frontal cortex only (Pazzaglia et al., 2008b). This last result seems in keeping with the involvement of the inferior parietal cortex in coding hand–object interactions in transitive actions (Buccino et al., 2001).

A further important factor that should be taken into account when making inferences about the neural substrate of action perception is whether the action has or does not have a known functional, symbolic, or communicative meaning for the observer. Neuropsychological (e.g., Tessari et al., 2007) and neuroimaging (Peigneux et al., 2004; Rumiati et al., 2005) studies have shown dissociation between the neural correlates of imitating meaningful and meaningless actions. In a similar vein, using positron emission tomography (PET), Decety et al. (1997) found that observing meaningful vs. meaningless actions, with the instructions to either imitate or recognize them, activated partially dissociated neural networks within and outside the classical AON. Crucially, with the exception of Moro et al. (2008), who used both meaningful and meaningless actions, all studies entered in this meta-analysis presented only meaningful actions which were familiar to the observers. This limits the implications of the results to the perception and understanding of meaningful actions; different areas may be required when observers perceive new and meaningless movements of other individuals.

Although we found that damage to all three clusters in the inferior frontal and parietal cortex and MTC/STC caused action perception deficits, the relative involvement of these areas in action perception might be related to the amount of motor simulation required to complete ambiguous perceptual information (Avenanti et al., 2013a), to the domain-specificity of the observer's motor expertise (Calvo-Merino et al., 2006; Aglioti et al., 2008; Fourkas et al., 2008; Abreu et al., 2012; Tomeo et al., 2013; Candidi et al., 2014; Makris and Urgesi, 2014) and to the level of action knowledge that needs to be inferred about others' behavior. Notably, much less evidence has been provided by brain lesion studies on the ability to infer the final intention of the observers and to decide, for example, whether others are deceiving or providing genuine information on their ultimate aims. Although neuroimaging (Grezes et al., 2004; Iacoboni et al., 2005) and brain stimulation studies (Tidoni et al., 2013) suggest that the inferior frontal cortex may play a major role in these high-level action tasks, future studies are needed in order to provide converging causative evidence on how brain lesions may affect the ability to understand the ultimate intentions of others.

# **ACKNOWLEDGMENTS**

Cosimo Urgesi and Alessio Avenanti are funded by grants from the Ministero Istruzione, Università e Ricerca (Futuro in Ricerca 2012, protocol number: RBFR12F0BD). Cosimo Urgesi is also funded by grants from the Ministero Istruzione Università e Ricerca (Progetti di Ricerca di Interesse Nazionale, PRIN 2009; Prot. 2009A8FR3Z) and the Istituto di Ricovero e Cura a Carattere Scientifico "Eugenio Medea" (Ricerca Corrente 2013, Ministero della Salute). Matteo Candidi is funded by the University of Rome "Sapienza" Ricerche Universitarie, Progetti di Ricerca 2012 (Prot. C26A122ZPS), 2013 (Prot. C26A13ZJN4). Alessio Avenanti is also funded by grants from the Ministero della Salute (Bando Ricerca Finalizzata Giovani Ricercatori 2010, protocol number: GR-2010-2319335) and Cogito Foundation (Project 2013, research grant, R-117/13). The authors wish to thank L. J. Buxbaum, S. Kalénine, P. H. Weiss, and E. Niessen for kind help in providing additional data (i.e., MNI coordinates) on the results of their lesion mapping studies.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 December 2013; accepted: 06 May 2014; published online: 30 May 2014. Citation: Urgesi C, Candidi M and Avenanti A (2014) Neuroanatomical substrates of action perception and understanding: an anatomic likelihood estimation meta-analysis of lesion-symptom mapping studies in brain injured patients. Front. Hum. Neurosci. 8:344. doi: 10.3389/fnhum.2014.00344*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Urgesi, Candidi and Avenanti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Observation and imitation of actions performed by humans, androids, and robots: an EMG study

# *Galit Hofree<sup>1†</sup>, Burcu A. Urgen<sup>2†</sup>, Piotr Winkielman<sup>1,3,4</sup> and Ayse P. Saygin<sup>2</sup>\**

*<sup>1</sup> Department of Psychology, University of California, San Diego, San Diego, CA, USA, <sup>2</sup> Department of Cognitive Science, University of California, San Diego, San Diego, CA, USA, <sup>3</sup> Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, UK, <sup>4</sup> Department of Psychology, University of Social Sciences and Humanities, Warsaw, Poland*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Alessio Avenanti, Alma Mater Studiorum – University of Bologna, Italy; Lindsay M. Oberman, Brown University, USA*

#### *\*Correspondence:*

*Ayse P. Saygin, Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, Mail Code 0515, La Jolla, San Diego, CA 92093-0515, USA*

*asaygin@ucsd.edu*

*†These authors have contributed equally to this work.*

*Received: 13 March 2014; Accepted: 08 June 2015; Published: 19 June 2015*

#### *Citation:*

*Hofree G, Urgen BA, Winkielman P and Saygin AP (2015) Observation and imitation of actions performed by humans, androids, and robots: an EMG study. Front. Hum. Neurosci. 9:364. doi: 10.3389/fnhum.2015.00364*

Understanding others' actions is essential for functioning in the physical and social world. In the past two decades research has shown that action perception involves the motor system, supporting theories that we understand others' behavior via embodied motor simulation. Recently, the empirical approach to action perception has been facilitated by using well-controlled artificial stimuli, such as robots. One broad question this approach can address is what aspects of similarity between the observer and the observed agent facilitate motor simulation. Since humans have evolved among other humans and animals, using artificial stimuli such as robots allows us to probe whether our social perceptual systems are specifically tuned to process other biological entities. In this study, we used humanoid robots with different degrees of human-likeness in appearance and motion along with electromyography (EMG) to measure muscle activity in participants' arms while they either observed or imitated videos of three agents producing actions with their right arm. The agents were a Human (biological appearance and motion), a Robot (mechanical appearance and motion), and an Android (biological appearance and mechanical motion). Right arm muscle activity increased when participants imitated all agents. Increased muscle activation was also found in the stationary arm both during imitation and observation. Furthermore, muscle activity was sensitive to motion dynamics: activity was significantly stronger for imitation of the human than of both mechanical agents. There was also a relationship between the dynamics of the muscle activity and motion dynamics in stimuli. Overall, our data indicate that motor simulation is not limited to observation and imitation of agents with a biological appearance, but is also found for robots. However, we also found sensitivity to human motion in the EMG responses. Combining data from multiple methods allows us to obtain a more complete picture of action understanding and the underlying neural computations.

Keywords: electromyography, mirror neuron system, imitative processing, action perception, body movements, human robot interaction, social robotics, social cognition

# Introduction

Understanding the movements and actions of others is critical for survival in many species. For humans, this skill supports communicative and social behaviors, such as empathy, imitation, social learning, synchronization, and mentalizing (Blakemore and Decety, 2001; Brass and Heyes, 2005; Iacoboni, 2009; Hasson et al., 2012). The neural network in the human brain that supports action processing includes multiple brain areas, including neural systems related to visual processing of body form and motion, and the fronto-parietal mirror neuron system (MNS), which supports action understanding via *analysis-by-synthesis* (Rizzolatti et al., 2001; Saygin, 2012).

Although the MNS has been studied intensively in the past few decades, much remains to be specified about the functional properties of the system and the mechanisms that support action understanding. Our research aims to contribute to these goals, specifically in relation to form and motion information in the seen action stimuli. Vision researchers often describe perceptual mechanisms of phenomena of interest and functional properties of brain areas – e.g., whether there is evidence for motion direction selectivity, contrast modulation, category sensitivity (e.g., objects, faces), or retinotopy (e.g., Felleman and Essen, 1991). Although there have been studies of action processing and the MNS that manipulated visual stimulus properties such as body form and biological motion (e.g., Buccino et al., 2004; Saygin et al., 2004b; Casile et al., 2010; van Kemenade et al., 2012; Miller and Saygin, 2013), detailed manipulation of visual stimulus parameters to specify response properties of the MNS has not been as common an approach, possibly because mirror neurons are thought to encode high-level information such as action goals regardless of the specific sensory signals that transmit such information (Rizzolatti and Craighero, 2004). From a systems neuroscience perspective, however, such properties and related neural regions are important to specify (e.g., Giese and Poggio, 2003; Saygin et al., 2004a; Jastorff and Orban, 2009; Nelissen et al., 2011; Saygin, 2012). Going forward, a thorough understanding of the functional architecture of the relevant networks will be essential as a foundation for building less simplistic and more complete neuro-computational accounts of action understanding.

One way of doing this is to explore human behavior and brain responses to artificial agents, such as robots. Artificial agents can be programmed to perform actions, but offer different degrees of human-likeness and realism, and can be systematically varied on critical variables such as appearance and motion (Chaminade and Hodgins, 2006; Chaminade and Cheng, 2009; Saygin et al., 2011). The use of robots in action observation and imitation tasks is also interesting from an evolutionary perspective given that the primate brain has, as far as we know, evolved without exposure to robots. Thus, studies with artificial agents can offer insights into psychological mechanisms in perception and action understanding as well as functional properties of underlying neural systems (Pelphrey et al., 2003; Nelissen et al., 2005; Chaminade et al., 2007; Shimada, 2010; Carter et al., 2011; Cross et al., 2012; Saygin et al., 2012; Urgen et al., 2013). Furthermore, developments in the field of robotics have led to the creation of hyper-realistic androids that invoke a future in which these kinds of robots will be deployed closer to humans than ever before (Coradeschi et al., 2006; Dautenhahn, 2007; Kahn et al., 2007). Artificial agents pose interesting questions for the psychology and neuroscience community, since it is not yet clear how we perceive and interact with such characters, especially those "almost-but-not-quite-human" agents that can evoke negative emotional responses according to the uncanny valley hypothesis (Ishiguro, 2006; MacDorman and Ishiguro, 2006; McDonnell et al., 2012; Saygin et al., 2012; Urgen et al., 2015). In turn, the robotics and animation fields are also interested in defining design parameters that will increase the acceptability and usability of the agents they develop, including in terms of appearance and motion (e.g., Chaminade and Hodgins, 2006; Kanda et al., 2008; Saygin et al., 2011; Riek, 2013).

Here, we focused on whether and how variations in an agent's human-likeness in (i) appearance and (ii) motion influence basic motor processes occurring during action observation and imitation, and the implications of such findings for mechanisms of action processing. There is evidence that similarity between self and other is important for observation and imitation of others. For example, humans spontaneously mimic android and avatar emotional facial expressions (Weyers et al., 2006; Hofree et al., 2014), but such mimicry is modulated by how humanlike the agent appears to the observers (Hofree et al., 2014). In the domain of action and body movement observation, neural activity of the human MNS appears sensitive to visual and motor similarity between the observer and actor (e.g., Calvo-Merino et al., 2006; Cross et al., 2006). Neuroimaging studies with robots or avatars as experimental stimuli have also been carried out. Many of these studies reported that robot movements engage the motor system, though there are discrepancies among studies (Kilner et al., 2003; Tai et al., 2004; Chaminade and Hodgins, 2006; Chaminade et al., 2007; Gazzola et al., 2007; Oberman et al., 2007; Press et al., 2007; Shimada, 2010; Carter et al., 2011; Cross et al., 2012; Saygin et al., 2012; Urgen et al., 2013).

The link between action production and observation has also been explored in "automatic imitation" or "visuomotor priming" paradigms, where participants perform an action that is either compatible or incompatible with an observed movement (for review, see Heyes, 2011; Gowen and Poliakoff, 2012). If action observation and action production employ shared mechanisms, performing an action that is compatible with the observed action could lead to facilitation in performance. In contrast, performing an action that is incompatible with the observed action could result in an interference effect, i.e., slowing or disruption of performance. Several studies investigated such facilitation or interference effects with human actions (Craighero et al., 1996, 1998; Brass et al., 2000; Stürmer et al., 2000), including work exploring their modulation by factors such as biological form or motion (Kilner et al., 2003; Press et al., 2005; Bouquet et al., 2007; Longo et al., 2008; Crescentini et al., 2011). With robot actions, the results have not been entirely consistent, with reports of automatic imitation for robots (Press et al., 2005), or of effects only limited to biological actions or agents (Kilner et al., 2003; Kupferberg et al., 2012; Hayes et al., 2013), or of more complex interactions (e.g., Chaminade et al., 2005; Liepelt et al., 2010). More specific manipulations of temporal and spatial parameters in these paradigms appear promising for unifying the results (Christensen et al., 2011).

In most previous studies of action observation and imitation that used robots, the stimuli were usually not systematically varied in terms of visual properties such as appearance and motion. Robots with different characteristics were used and compared with humans, which prevents us from reaching conclusions regarding specific visual aspects that may modulate the responses. To overcome these limitations, we collaborated with a robotics lab and developed a well-controlled stimulus set of upper body movements performed by three agents, and manipulated the appearance and motion of the agents (see **Figure 1**, Materials and Methods, and Saygin et al., 2012). These stimuli consist of actions of three agents: a Human, a mechanical-looking humanoid Robot, and a human-looking robot (Android). The 'Android' and 'Robot' are actually two visually different versions of the same agent (the humanoid robot Repliee Q2), while the 'Human' in the videos is the woman whose appearance the Android was modeled after. These visual differences create several comparisons of interest. The Human and Android are very similar in appearance (both biological) but differ in motion dynamics. The Android and Robot are matched in their motion dynamics, but differ in appearance. The Human and Robot, although differing from each other in both appearance and motion, share the feature of having congruence between these factors (both biological and both mechanical, respectively), whereas the Android features a mismatch (biological appearance, mechanical motion). These stimuli thus enable us to examine in a controlled fashion how these distinctions might influence action observation and imitation.

Using this special stimulus database, we recently performed behavioral, functional magnetic resonance imaging (fMRI) and EEG studies, demonstrating that both the appearance and the congruency of features (i.e., compatible appearance and motion) can influence action processing – but that this modulation varies depending on the behavioral task, across brain regions, and in different time scales (Saygin et al., 2012; Li et al., 2015; Urgen et al., 2015). These and similar studies described above demonstrate the utility of systematically manipulating the visual parameters of sensory input with artificial stimuli (e.g., robots) in studying human action processing and the MNS. In addition, they highlight the importance of using multiple complementary methods of inquiry. In the present study, we added to this work by using electromyography (EMG), and also extended the experimental paradigm to include explicit action imitation in addition to observation.

FIGURE 1 | Example stills from the videos used as stimuli in the experiment. Here, we can see Repliee Q2 in both 'Robot' and 'Android' form, and the human 'master' it was modeled on. These three types of videos enabled us to compare across *Human Appearance* and *Human Motion*.

Although much of the research on MNS in relation to action observation and imitation has focused on regions in premotor and parietal cortex (Rizzolatti and Craighero, 2004; Iacoboni, 2009; Molenberghs et al., 2009), primary motor cortex is also involved in action perception (Borroni and Baldissera, 2008; Hari et al., 2014). Action observation, imitation, and imagery have been linked to primary motor cortex in studies with TMS in combination with motor evoked potentials (MEP, e.g., Fadiga et al., 1995, 2005), fMRI (e.g., Iacoboni et al., 1999), EEG/MEG (e.g., Hari et al., 1998; Järveläinen et al., 2001; Caetano et al., 2007; Kilner and Frith, 2007; Kilner et al., 2009; Neuper et al., 2009), intracranial recordings (Mukamel et al., 2010) and occasionally with EMG (Leighton et al., 2010). There are strong reciprocal connections and modulatory influences between premotor cortex (specifically area F5, which contains mirror neurons) and primary motor cortex (Iacoboni et al., 1999; Shimazu et al., 2004). Although EMG is not a direct measure of cortical motor activity and can be susceptible to other non-cortical influences, measuring the activity of actual muscles enables us to obtain a reasonable index of primary motor cortex activity associated with the peripheral motor system (Santucci et al., 2005; Kalaska, 2009; Churchland et al., 2012). In the current study, we recorded muscle activity of the arms of human subjects during observation and imitation of arm actions. Using human and non-human agents as stimuli, we explored how features of the observed agent might modulate EMG – specifically, humanlike motion or humanlike appearance.

Besides adding a new methodology with different strengths to study the functional properties of the MNS, the use of EMG could also help bridge the work on action observation and imitation with the work on facial mimicry. EMG has long been used in studying spontaneous facial mimicry, an automatic process that occurs without explicit instruction (Dimberg, 1982; Carr and Winkielman, 2014). Although, as mentioned above, automatic imitation is thought to occur also for bodily movements and actions (Heyes, 2011), the use of EMG in this field has been rare. Berger and Hadley (1975) found increased arm and lip EMG response during observation of non-emotional actions. Moody and McIntosh (2011) replicated these findings for facial but not arm muscles. Furthermore, since EMG is a continuous measure of muscle activity, it creates the potential for linking the dynamics of the motor responses with those of the visual action stimuli, which by nature are temporally unfolding. In this way, we can assess differences in synchronization between the participant and the observed agent both for the observation and imitation conditions, as was done in a recent study that used EMG to examine synchronization of facial expressions between human participants and a robot (Hofree et al., 2014). Finally, EMG allows us to readily investigate the peripheral motor activity in both arms – left and right – as participants observe or imitate an agent performing an action with one arm. This enables us to explore whether there is muscle activity in the arm that is not directly performing an action, and the lateralization of responses during action observation and imitation (Aziz-Zadeh et al., 2006; Franz et al., 2007; Kilner et al., 2009; Cross et al., 2013).

# Materials and Methods

# Participants

Forty-three University of California, San Diego undergraduates were recruited. Participants had normal or corrected-to-normal vision and were right-handed and received course credit. The research protocol was approved by the University of California, San Diego Institutional Review Board. Written informed consent was obtained from all subjects. Unfortunately, data from ten subjects could not be used due to software or equipment errors, and data from six could not be used because they did not follow instructions (e.g., making arm movements during periods they were supposed to remain still). Note, however, that the final sample size of twenty-seven participants is typical for an EMG study (e.g., Hofree et al., 2014). Those participants were 18–22 years of age; 17 were female.

# Visual Stimuli

Stimuli were 2-s video clips of upper body actions performed by the state-of-the-art humanoid robot Repliee Q2 and by the human 'master' after whom it was modeled (**Figure 1**). Repliee Q2 performed each action in two different appearance conditions: in the Android condition, Repliee Q2 appeared as is, in a highly humanlike appearance. In the Robot condition, Repliee Q2 appeared after we stripped off or covered the elements that aimed to make the agent highly humanlike in appearance (**Figure 1**). We refer to these conditions as Android and Robot, respectively, even though they were in fact the same physical robot performing the very same pre-programmed movements.

The Robot and Android conditions differed only in their appearance, with Android featuring a humanlike appearance and the Robot featuring a non-human, mechanical appearance. Crucially, the kinematics of the movement for the Android and Robot conditions were identical, since, as mentioned, they were actually the same machine. For the Human condition, the female adult whose face was used in constructing Repliee Q2 (the 'master' of the android) was asked to watch each of the Repliee Q2's actions and then perform the same action naturally. Thus these videos were comparable in appearance and action to the Android version of Repliee Q2, but differed in the motion and timing dynamics of the actions. Due to inherent limitations of the robot we worked with, as well as human anatomy, we did not have the fourth condition that would have made our experimental design 2 (*Motion*) × 2 (*Appearance*): an agent with an appearance that is identical to our Robot condition but with human motion was simply not possible to generate with the present stimulus set. Therefore, even though there are three levels of the factor *Agent*, the omnibus analysis of variance (ANOVA) does not directly correspond to the hypotheses we are testing (which are reflected in the very design of the stimuli) concerning agent appearance and agent motion (see Saygin et al., 2012; Urgen et al., 2013 and Data Reduction and Analysis).
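The logic of this incomplete design can be made explicit with a small sketch. The snippet below is only an illustration, not part of the original analysis: the dictionary encoding and the function names are hypothetical, and the feature labels simply restate the descriptions of the three agents given in the text. It lists which feature each pairwise comparison isolates and which cell of a full 2 × 2 design is missing.

```python
# Feature coding of the three agents, restating the descriptions in the text.
# The dictionary layout and function names are illustrative assumptions.
AGENTS = {
    "Human":   {"appearance": "biological", "motion": "biological"},
    "Android": {"appearance": "biological", "motion": "mechanical"},
    "Robot":   {"appearance": "mechanical", "motion": "mechanical"},
}


def differing_features(a, b):
    """Return the set of features on which agents a and b differ."""
    return {f for f in AGENTS[a] if AGENTS[a][f] != AGENTS[b][f]}


def pairwise_contrasts():
    """Map each pair of agents to the features that distinguish them."""
    names = list(AGENTS)
    return {
        (a, b): differing_features(a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def missing_cells():
    """List (appearance, motion) cells of a full 2 x 2 design that no agent fills."""
    levels = ("biological", "mechanical")
    filled = {(v["appearance"], v["motion"]) for v in AGENTS.values()}
    return [(a, m) for a in levels for m in levels if (a, m) not in filled]


if __name__ == "__main__":
    for pair, feats in pairwise_contrasts().items():
        print(pair, "differ in", sorted(feats))
    print("missing cell(s):", missing_cells())
```

Under this coding, Human vs. Android isolates motion and Android vs. Robot isolates appearance, which is why planned pairwise comparisons, rather than the omnibus effect of *Agent*, map onto the hypotheses.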

The three agents' actions were videotaped in the same room and with the same background, lighting, position and camera, yielding a well-controlled set of stimuli. A total of eight actions per actor were used in this study: drinking water from a cup, picking up a piece of paper from a table, grasping a tube of hand lotion, wiping a table with a cloth, waving hand, nudging, turning to look at something, and introducing self (a small Japanese head bow, with the arm raised to the chest). All except the turning action were used in the EMG experiment phase; the turning action was used in the rating phase preceding and following the experiment phase. In all videos, the agent executed arm movements with the right hand. Videos were converted to grayscale and cropped at 400 × 400 pixels, with a semitransparent white fixation cross superimposed at the center. The videos were edited such that movement started right at the beginning of the video. We extended the videos' duration to 5 s by freezing the last frame for 3 s, so that we could record EMG responses for a full 5 s, since responses to dynamic actions can take up to 5 s to offset. Further details of the agents and the action videos are reported in previous publications (Saygin and Stadler, 2012; Saygin et al., 2012; Urgen et al., 2013; Li et al., 2015).

# Procedure

Participants sat comfortably 2 feet in front of a computer screen. Electrodes were affixed to the left side of their face and to their two arms. They were asked to place their arms in their lap. They were instructed to sit calmly, keep still, and follow the instructions on the computer screen.

Before beginning the EMG experiment, participants were briefed that they would be viewing videos of three agents. They were told explicitly whether each agent was human or a robot (cf. Saygin et al., 2012; Urgen et al., 2013). Participants then viewed a video of each agent making a turning movement (looking to the right while seated), and were asked to provide subjective ratings on several attributes (e.g., *Human-likeness* or *Comfort*, see Supplementary Materials 1.1). The presentation of the turning videos and acquisition of ratings were repeated again at the end of the main experiment. The rating data are included in Supplementary Materials 2.1. These, along with the facial EMG activity we measured, were intended to serve as measures of affective responses to help address alternative explanations for our results.

The main experiment was modeled after a classical imitation paradigm (Dimberg, 1982; Hofree et al., 2014). In each trial of the experimental phase, participants were presented with a 5-s blank screen with a fixation cross, followed by an action stimulus. As mentioned, in each video, the agent's movement started at the onset of the movie. Once the video clip was played, the last frame was kept visible on the screen such that there was a 5 s period of visual stimulus and EMG recording for the trial. There were two task conditions administered in different blocks: an *Observation* block and an *Imitation* block. The *Imitation* and *Observation* blocks were identical except for the instructions given at the beginning of the block (i.e., the subject's task). In the *Observation* condition, participants were instructed to simply view videos of the three agents whilst remaining still. In the *Imitation* condition they were instructed to imitate the action they saw in the video ("try and make the same action as the agent," modeled after Dimberg, 1982). As mentioned earlier, all participants were right-handed, and the actions in the stimuli were also performed right-handed. It is well-established that adults show a very strong tendency to imitate with the same effector(s) as the observed actor (anatomical imitation, e.g., Koski et al., 2003; Franz et al., 2007), even though this is more difficult and error prone (Press et al., 2009). We therefore expected participants to imitate the movements with their right hand (see EMG Results and Supplementary Materials 1.2.2 for a control analysis). The *Imitation* block always followed the *Observation* block, in order to avoid potential expectations to imitate during the *Observation* condition (Cross and Iacoboni, 2014). In each condition, participants were presented with a random order of the three agents performing each of the seven actions six times, for a total of 126 trials per block.
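As a check on the trial arithmetic (3 agents × 7 actions × 6 repetitions = 126 trials per block), the block structure can be sketched as follows. This is a minimal illustration under stated assumptions, not the presentation script used in the study: the action labels are shorthand for the seven EMG-phase actions, the function names are hypothetical, and an unconstrained shuffle is assumed because the text specifies only "a random order".

```python
import random
from collections import Counter

AGENTS = ["Human", "Android", "Robot"]
# Shorthand labels for the seven actions used in the EMG phase
# (the turning action was reserved for the rating phase).
ACTIONS = [
    "drink", "pick up paper", "grasp lotion", "wipe table",
    "wave", "nudge", "introduce self",
]
REPETITIONS = 6


def build_block(rng):
    """Return one block: every agent x action pair repeated six times, shuffled."""
    trials = [
        (agent, action)
        for agent in AGENTS
        for action in ACTIONS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)  # assumption: unconstrained random order
    return trials


def build_session(seed=None):
    """Return the two blocks in fixed order: Observation always precedes Imitation."""
    rng = random.Random(seed)
    return [(task, build_block(rng)) for task in ("Observation", "Imitation")]


if __name__ == "__main__":
    for task, trials in build_session(seed=1):
        counts = Counter(trials)
        print(task, len(trials), "trials,", len(counts), "agent-action pairs")
```

Keeping the block order fixed while randomizing trials within each block mirrors the rationale given above: the *Observation* block comes first so that participants do not anticipate having to imitate.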

# Electromyography

# Data Acquisition

A pilot study was conducted in order to determine the arm muscle best suited for recording responses for the present stimuli. Electrodes were placed over the biceps brachii, the flexor carpi radialis and the brachioradialis muscles of a participant (member of the lab). EMG was recorded while the assistant conducted the *Imitation* block of the experiment. Based on these data, we determined that the biceps brachii was the best candidate for the actions in this experiment.

Arm EMG was recorded by pairs of 1-cm (4-cm diameter) electrodes placed over the biceps brachii muscle of each arm. The first electrode was placed in the center of the muscle, and the second was placed a collar width (∼2 cm) directly below the first electrode. Facial EMG was measured by pairs of 1-cm (2.5 cm square) electrodes on the left side of the face, over the regions of zygomaticus major (cheek) and corrugator supercilii (brow), according to EMG processing standards (Tassinary and Cacioppo, 2000). For the zygomaticus major muscle, the first electrode was placed in the middle of an imaginary line between the lip corner at rest and the point where the jaws meet (approximately near the ear lobe), and the second electrode a collar width (∼1 cm) posterior to the first. For the corrugator supercilii muscle, the first electrode was placed right above the left eyebrow, on an imaginary vertical line from the inner corner of the eye up, and the second a collar width posterior to the first (following the eyebrow arch).

AcqKnowledge software (Biopac Systems, Goleta, CA, USA) along with Biopac (Biopac Systems, Goleta, CA, USA) was used to acquire the EMG signal. The amplified EMG signals were filtered online with a low-pass of 500 Hz and a high-pass of 10 Hz, sampled at a rate of 2000 Hz, and then integrated and rectified using Mindware EMG software, version 2.52 (MindWare Technologies Ltd., Gahanna, OH, USA).

### Data Reduction and Analysis

Data were analyzed using Matlab (version R2012b, The Mathworks, Natick, MA, USA), JMP (version 10, SAS Institute Inc., Cary, NC, USA), and SPSS (version 19, IBM Corporation, Armonk, NY, USA). Data were first averaged in 500 ms intervals across a trial (i.e., 10 data points for a 5 s trial). Extreme values (values greater than 3 SD away from the mean) were excluded from the analysis. Next, data were standardized within participant and within each muscle, using as baseline the minimum value in the 2000 ms interval before each trial, with a sliding window to smooth baseline values over trials (this technique helped remove any noisy EMG periods in between trials; see also Supplementary Materials 1.2.1). We calculated baseline-corrected activity for each participant and each muscle across the 5-s trial by removing the calculated baseline per trial from each data point (10 per trial). Finally, we averaged baseline corrected EMG activity within 500 ms intervals across trials for each individual, muscle, condition (observation, imitation), agent, and action.
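The core of this reduction pipeline can be illustrated with a minimal sketch. It is not the authors' Matlab code: the function names are hypothetical, the order in which extreme values are dropped is an assumption, and the within-participant standardization and the sliding-window smoothing of baselines are left out for brevity.

```python
from statistics import mean, stdev

FS = 2000            # sampling rate in Hz, as reported
BIN_MS = 500         # bin width in ms, as reported
BASELINE_MS = 2000   # pre-trial baseline window in ms, as reported


def bin_means(samples, fs=FS, bin_ms=BIN_MS):
    """Average a trial's samples in consecutive bins of bin_ms milliseconds."""
    n = fs * bin_ms // 1000
    return [mean(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


def drop_extremes(values, n_sd=3.0):
    """Keep only values within n_sd standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def baseline_correct(trial, pre_trial, fs=FS, baseline_ms=BASELINE_MS):
    """Subtract the minimum of the last baseline_ms of pre_trial from each bin mean."""
    n = fs * baseline_ms // 1000
    baseline = min(pre_trial[-n:])
    return [b - baseline for b in bin_means(trial, fs)]
```

With a 5 s trial sampled at 2000 Hz, `baseline_correct` returns ten baseline-corrected values, one per 500 ms bin, which would then be averaged across trials for each participant, muscle, condition, agent, and action.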

The main experimental factors were Condition (*Observation* and *Imitation*), Arm (*Left* and *Right*), Motion (*Human* and *Non-Human Motion*), and Appearance (*Human* and *Non-Human Appearance*). As mentioned, due to technical reasons our stimuli do not correspond to a full factorial design with respect to appearance and motion (the combination of non-human appearance with human motion is missing). The main effect/interaction structure of a conventional ANOVA thus does not correspond to the hypotheses being tested regarding these factors (cf. Saygin and Stadler, 2012; van Kemenade et al., 2012). Rather, our stimuli were designed to investigate effects of *Human* vs. *Non-Human Motion*, and *Human* vs. *Non-Human Appearance* (and the congruence of the two, see Saygin et al., 2012). The Human videos represent *Human Motion*, while the Android and Robot videos represent *Non-Human Motion*. The Human and Android videos both represent a *Human Appearance*, while the Robot video represents *Non-Human Appearance* (**Figure 1**). Therefore we conducted multivariate analyses of variance (MANOVAs) with these factors to explore how these features influenced our EMG dependent measures.
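The coding of the three agents into the Motion and Appearance factors can be made explicit with a small lookup table. The dictionary and helper names below are hypothetical and serve only to show which agents are pooled at each factor level and which cell of the 2 × 2 design is empty.

```python
# Feature coding of the three agents, as described in the text.
AGENT_FEATURES = {
    "Human":   {"motion": "Human",     "appearance": "Human"},
    "Android": {"motion": "Non-Human", "appearance": "Human"},
    "Robot":   {"motion": "Non-Human", "appearance": "Non-Human"},
}


def agents_with(factor, level):
    """List the agents pooled into one level of a factor."""
    return sorted(a for a, f in AGENT_FEATURES.items() if f[factor] == level)


def missing_cells():
    """Return the (motion, appearance) combinations that no agent fills."""
    levels = ("Human", "Non-Human")
    filled = {(f["motion"], f["appearance"]) for f in AGENT_FEATURES.values()}
    return [(m, a) for m in levels for a in levels if (m, a) not in filled]
```

Here `agents_with("motion", "Non-Human")` pools the Android and the Robot, and `missing_cells()` returns only the human-motion, non-human-appearance cell, which is why a conventional full-factorial ANOVA does not map onto the hypotheses.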

Below, we present the statistics and figures as described above to streamline the presentation. However, for the interested reader, we also provide both statistical analyses and figures that do not collapse the agent levels (i.e., three level Agent factor); but since no new findings or insights emerged, these are included in Supplementary Materials 2.2.1.

# Results

# EMG Results

Participants' EMG responses to the videos were analyzed using repeated-measures MANOVA over all time points in the trial (measured in 500 ms intervals). We examined differences across Condition (*Observation* and *Imitation*), Arm muscles (*Left* and *Right*), Motion (*Human* and *Non-Human Motion*) or Appearance (*Human* and *Non-Human Appearance*), and Time (500 ms intervals across a 5 s trial, for a total of 10 time points).

FIGURE 2 | Observation and Imitation of Human and Non-Human Motion. Top: z-scored EMG activity in the Right arm. Bottom: z-scored EMG activity in the […] during Imitation condition. Error bars represent SEM. Asterisks denote significance across Motion, at the 0.05 level.

# Human vs. Non-Human Motion Comparisons

We ran a repeated-measures MANOVA with a 2 (Condition: *Observation* vs. *Imitation*) × 2 (Arm: *Left* vs. *Right*) × 2 (Motion: *Human* vs. *Non-Human*) × 10 (Time) design. This MANOVA revealed several significant effects, as can be seen in **Figure 2**. First, as expected, across both arms we found more muscle activity in the *Imitation* condition than in the *Observation* condition. This is shown by the main effect of Condition [$F(1,26) = 53.42$, $p < 0.0001$, $\eta_p^2 = 0.67$], and a significant Condition × Time interaction [$F(9,234) = 13.39$, $p < 0.0001$, $\eta_p^2 = 0.34$]. Second, there was more overall muscle activity in the *Right* (dominant) arm than in the *Left* arm, as revealed by the main effect of Arm [$F(1,26) = 4.72$, $p = 0.04$, $\eta_p^2 = 0.15$], and the Arm × Time interaction [$F(9,234) = 11.70$, $p < 0.0001$, $\eta_p^2 = 0.31$].

However, most interestingly, we found evidence that muscles of the two arms responded differently across conditions, as revealed by the Condition × Arm, and Condition × Arm × Time interactions [*Condition × Arm*: $F(1,26) = 41.41$, $p < 0.0001$, $\eta_p^2 = 0.61$; *Condition × Arm × Time*: $F(9,234) = 11.71$, $p < 0.0001$, $\eta_p^2 = 0.31$]. Separate MANOVAs for each condition revealed that in the *Observation* condition, there was more activity in the *Left* arm than the *Right* arm [*main effect of Arm*: $F(1,26) = 21.58$, $p < 0.0001$, $\eta_p^2 = 0.45$; *significant Arm × Time interaction*: $F(9,234) = 2.29$, $p = 0.02$, $\eta_p^2 = 0.08$]. At the same time, there was significantly more muscle activity in the *Right* arm than the *Left* arm in the *Imitation* condition [*main effect of Arm*: $F(1,26) = 18.90$, $p < 0.0001$, $\eta_p^2 = 0.42$; *significant Arm × Time interaction*: $F(9,234) = 11.95$, $p < 0.0001$, $\eta_p^2 = 0.32$]. Participants appeared to respond more strongly with their *Right* arm when told to mimic the videos, but exhibited a stronger response with their *Left* arm when just observing videos.
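The partial eta squared values reported with each test can be cross-checked against the F statistic and its degrees of freedom. The sketch below uses the standard identity for univariate F tests; the function name is hypothetical, and that the reported effect sizes were computed this way is an assumption rather than something stated in the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

For example, `partial_eta_squared(53.42, 1, 26)` rounds to 0.67, matching the value reported for the main effect of Condition, and `partial_eta_squared(13.39, 9, 234)` rounds to 0.34, matching the Condition × Time interaction.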

The two arms differed in their sensitivity to *Human Motion*, as can be seen in the significant Arm × Motion interaction [$F(1,26) = 9.92$, $p = 0.004$, $\eta_p^2 = 0.28$]. MANOVAs for each arm yielded the following arm-specific effects: the *Left* arm demonstrated a significant increase in EMG amplitude in response to *Human Motion* in both conditions [*main effect of Motion*: $F(1,26) = 4.22$, $p = 0.05$, $\eta_p^2 = 0.14$], while the *Right* arm did not [no significant main effect of Motion; *Motion × Time interaction*: $F(9,234) = 3.55$, $p < 0.001$, $\eta_p^2 = 0.12$; *Condition × Motion × Time interaction*: $F(9,234) = 4.02$, $p < 0.0001$, $\eta_p^2 = 0.13$]. However, the timing of responses differed in the *Right* arm, specifically in the *Imitation* condition. As can be seen in **Figure 2** (top right panel), the EMG mimicry response of the *Right* arm was more delayed for *Human Motion* than for *Non-Human Motion*. *Post hoc* comparisons of *Human Motion* and *Non-Human Motion* in the early and late half of the trial demonstrate that differences between the types of motion exist only in the first half of the trial [$M_{\text{Human Motion}} = 0.73$, $M_{\text{Non-Human Motion}} = 0.84$, $t(26) = 4.32$, $p < 0.001$], and disappear in the second half [$M_{\text{Human Motion}} = 0.98$, $M_{\text{Non-Human Motion}} = 0.95$, $t(26) = -1.03$, $p = 0.31$]. This is likely due to the slight timing differences in the videos between Repliee Q2 and the Human. This effect was specific to the *Right* arm in the *Imitation* condition [hence a significant Condition × Arm × Motion × Time interaction, $F(9,234) = 2.02$, $p = 0.04$, $\eta_p^2 = 0.07$]. Since the EMG activity in the *Right* arm in this condition was much greater in magnitude than EMG responses in any other condition (this can be seen in **Figure 2** top right panel, where the y-axis scale is three times larger than the y-axes in the other panels), we believe it also drives the significant Motion × Time [$F(9,234) = 3.42$, $p = 0.001$, $\eta_p^2 = 0.12$] and Condition × Motion × Time [$F(9,234) = 2.81$, $p = 0.004$, $\eta_p^2 = 0.10$] interactions. We tested whether the delay in reaction to *Human Motion* was due to a particularly slow response by comparing the lags of the EMG waveform that correlated the highest with the waveform produced by the movement in the corresponding videos (see Synchronization Analyses: Are Observers' and Observed Agents' Movements Linked?). These lags did not differ significantly, suggesting that the delay most likely reflects a timing difference in the videos.

Since the Robot is Non-Human in both motion and appearance, we compared the Android and the Human, specifically testing an effect of *Human Motion* while maintaining constant *Human Appearance*. In this MANOVA, again we found a significant interaction of Condition × Arm × Motion [$F(1,26) = 9.53$, $p = 0.005$, $\eta_p^2 = 0.27$], as well as a significant Motion × Time interaction [$F(9,234) = 2.15$, $p = 0.03$, $\eta_p^2 = 0.08$], indicating that the EMG response is specifically sensitive to *Human Motion*.

### Human vs. Non-Human Appearance Comparisons

We ran analogous MANOVAs examining whether EMG responses were sensitive to *Human Appearance*. This MANOVA was a 2 (Condition) × 2 (Arm) × 2 (Appearance) × 10 (Time) design. We observed a significant Appearance × Time interaction [$F(9,234) = 3.30$, $p = 0.001$, $\eta_p^2 = 0.11$], as well as a Condition × Arm × Appearance × Time interaction [$F(9,234) = 2.26$, $p = 0.02$, $\eta_p^2 = 0.08$]. These again seem to be driven by the delay in EMG response in the *Right* arm in the *Imitation* condition for the Human videos. For a closer comparison of *Human* vs. *Non-Human Appearance*, while holding motion constant, we compared EMG responses to the Android and Robot conditions, where the *Appearance* was varied while maintaining the same *Non-Human Motion*. Here, there was no significant effect of Appearance.

# Synchronization Analyses: Are Observers' and Observed Agents' Movements Linked?

As described above and shown in **Figure 2**, we found several significant effects of Time (i.e., changes in EMG amplitude in various points of the trial), which led us to consider whether there might be a relationship between the temporal dynamics of the human EMG response and the motion dynamics over time in the visual stimuli. To explore whether people's movements were linked to the movement of the seen agents, we ran cross-correlation analyses with the EMG data and the motion dynamics of the stimuli. The movement dynamics in the visual stimuli were extracted using an object motion-tracking algorithm (Peddireddi, 2009), representing a rough aggregate measure of the motion of the arm in each video (since no other moving objects were present). The video arm movement and the arm EMG response were compared using cross-correlation, which allowed us to determine the lag at which maximal correlation occurred between the visual movement and the time-delayed, congruent EMG activity for each Action, Agent, Condition, and Arm. We aggregated the correlations found for each subject, for the different conditions using a Fisher's z transformation, and compared correlations for each subject across experimental factors.
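The lag search and the aggregation step can be sketched as follows. This is an illustration, not the analysis code used in the study: the function names are hypothetical, only non-negative lags are searched (on the assumption that the EMG response follows the stimulus), and averaging in Fisher z space is one plausible reading of how correlations were aggregated.

```python
from math import atanh, sqrt, tanh


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_lag(stimulus, emg, max_lag):
    """Return (lag, r) maximizing the correlation of stimulus[t] with emg[t + lag]."""
    best = (0, pearson(stimulus, emg))
    for lag in range(1, max_lag + 1):
        r = pearson(stimulus[:-lag], emg[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


def fisher_mean(correlations):
    """Average correlations in Fisher z space and transform back to r."""
    # Clamp to avoid the infinite z value at |r| = 1.
    zs = [atanh(max(min(r, 0.999999), -0.999999)) for r in correlations]
    return tanh(sum(zs) / len(zs))
```

In this sketch, `best_lag` would be applied to each Action, Agent, Condition, and Arm combination, with lags expressed in units of the 500 ms bins (or whatever sampling the two series share), and `fisher_mean` would then pool the resulting peak correlations within each participant.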

**Figure 3** shows average correlations across conditions. Though in all conditions the correlations were significant and positive, they varied across experimental conditions. A repeated-measures MANOVA over the z-transformed correlation coefficients conducted across Condition, Arm, and Motion (Human vs. Android and Robot) revealed a significant main effect of Motion. Participants' arm EMG was more correlated with *Human Motion* than *Non-Human Motion* [$F(1,26) = 4.45$, $p = 0.045$, $\eta_p^2 = 0.15$]. We also found a main effect of Condition, with participants' arm EMG more correlated with the observed motion in the *Imitation* than *Observation* condition [$F(1,26) = 49.16$, $p < 0.0001$, $\eta_p^2 = 0.65$]. There was also a main effect of Arm [$F(1,26) = 18.74$, $p < 0.0001$, $\eta_p^2 = 0.42$], as well as a significant Condition × Arm interaction [$F(1,26) = 6.8$, $p = 0.02$, $\eta_p^2 = 0.21$]. The *Right* arm's EMG dynamics matched the motion in the videos much more than the *Left* arm, and this difference was more pronounced in the *Imitation* condition (see **Figure 3**). A similar MANOVA run with Appearance (Human and Android vs. Robot) demonstrated no effect of human appearance on correlations between EMG activity and stimulus motion.

FIGURE 3 | Average correlations between EMG activity and agent arm movement in video across conditions. Cross-correlations were computed for each individual across experimental conditions. Participants' arm EMG activity was more strongly correlated with agent arm movement during the Imitation condition, especially for the Right arm. Arm EMG activity was also more correlated with *Human Motion* than with *Non-Human Motion*. Error bars represent SEM.

#### Supplementary Analyses

In addition to our main factors of interest, we ran further analyses that were not central to the study but may be of interest to some readers; these are reported in greater depth in the Supplementary Materials. As already mentioned, the analyses with the three-level Agent factor, as well as figures showing the three Agent conditions separately, are provided in Supplementary Materials 2.2.1. We also included a control analysis to ensure that left arm EMG in the imitation condition was not contaminated by actual left-arm imitation. Although adults overwhelmingly perform anatomical imitation, we set a criterion for rejecting possible mirror imitation. The vast majority of participants clearly used their right hand based on their data. We did find four subjects for whom left hand use could not be ruled out, and excluding these participants did not change the results (see Supplementary Materials 1.2.2). Thus, there is no clear indication of mirror imitation, nor does it appear that the pattern found for the left arm is an artifact of some individuals imitating with the left arm.

We also report in Supplementary Materials analyses that include facial EMG data and Gender (see 2.2.3 and 2.2.4, respectively). From the analyses with Gender as a factor, we observed that *Human Motion* produced a greater effect on EMG of male subjects during the *Imitation* condition as compared with females, and that females demonstrated more *Right Arm* activity during the *Observation* condition than males. Our key findings from the facial EMG analyses were that zygomaticus activity was greater in response to *Human Motion* and *Human Appearance* than to *Non-Human Motion* and *Appearance*, during the *Imitation* condition. On the other hand, corrugator activity increased in response to *Non-Human Motion* in the *Observation* condition. These analyses are provided for the interested reader, but given that our study was not designed specifically to explore these issues, they should be considered preliminary observations.

# Discussion

The initial discovery of mirror neurons in the macaque area F5 and evidence for the involvement of motor brain regions in action perception elicited great enthusiasm (Gallese et al., 1996; Rizzolatti et al., 1996; Fadiga et al., 2005). In the following years, the MNS received intense interest and focus from neuroscientists (Rizzolatti et al., 2001; Rizzolatti and Craighero, 2004), and more broadly contributed to the resurgence of the "embodied cognition" framework (Barsalou, 1999, 2008; Wilson, 2002; Niedenthal, 2007; Grafton, 2009; Winkielman et al., 2015), echoes of which were present decades earlier in the works of prominent psychologists such as James, Gibson, and Piaget (Prinz, 1987). The MNS has been proposed as a potential evolutionary and neural basis of many essential human abilities such as empathy, theory of mind, learning, and language (Rizzolatti and Craighero, 2004; Arbib, 2005; Iacoboni, 2009), and has been linked to disorders affecting social and communicative functions such as autism (Iacoboni and Dapretto, 2006). Some embraced the MNS as the basis for these functions and more (e.g., "neurons that shaped civilization," Ramachandran, 2012), whereas others were concerned that the explanatory powers of the MNS were exaggerated (e.g., "the most hyped concept in neuroscience," Jarrett, 2012). The importance or even the existence of the system, and its implications for social functioning and development, became a matter of debate (Hickok, 2009) and, more importantly, of empirical investigation (e.g., Nelissen et al., 2005; Dinstein et al., 2007; Kilner et al., 2009; Lingnau et al., 2009; Mukamel et al., 2010; Hamilton, 2013; Cook et al., 2014).

In the past few years, the vast majority of researchers in the field have rejected either extreme, viewing the MNS neither as a silver bullet, nor merely as hype. Looking at the empirical data on the MNS and embodiment, and not necessarily the interpretation of said data, it is difficult to remain unconvinced that some degree of motor processing is an important and critical part of action understanding. Two decades on, the field is moving toward a more neutral framework for thinking about the MNS and embodiment, and of course, for empirical work. This research topic is part of an increasing awareness that, despite the impressive body of work that has accumulated on the topic, much remains to be specified about the MNS and the perception and imitation of actions (Kilner and Lemon, 2013; Cook et al., 2014). Among others, topics that require further research include the response properties, origins and functions of the MNS; how MNS contributes to imitation, empathy and communication; correlational vs. causal relationships between MNS and behavior; individual differences in action processing in healthy and clinical populations; the relationship between MNS and disorders of social cognition; computational mechanisms of information processing within MNS, as well as interactions with other brain areas (Brass and Heyes, 2005; Oztop et al., 2006; Kilner et al., 2007; Engel et al., 2008; Grafton, 2009; Gilaie-Dotan et al., 2011; Mcbride et al., 2012; Sasaki et al., 2012; Avenanti et al., 2013; Fleischer et al., 2013; Hamilton, 2013; Miller and Saygin, 2013; Marshall and Meltzoff, 2014; Simpson et al., 2014).

It is worth noting that research on the functional properties of MNS has been naturally dominated by neuroimaging studies, which focus on the central nervous system. However, in the context of embodied cognition, a complete characterization of the mechanisms of action observation and imitation requires consideration of the peripheral systems as well. Here, we used EMG to examine how muscle activity might be influenced by human-likeness of the agent during action observation and imitation using stimuli of actions performed by three agents: a human agent featuring humanlike appearance and motion, an android featuring humanlike appearance and non-humanlike motion, and a robot featuring non-humanlike appearance and motion.

# Artificial Agents in Cognitive Neuroscience

In terms of our understanding of functional properties of MNS and simulation theory, which posits visually perceived actions are mapped onto the viewer's own sensorimotor neural representations, stimuli that feature artificial form or motion patterns can allow us to explore the boundary conditions for evoking motor simulation. Artificial agents such as robots can be important experimental stimuli to test such hypotheses since robots can perform recognizable actions, but can differ from biological agents in their design (Chaminade et al., 2007; Saygin and Stadler, 2012; Urgen et al., 2013).

Although there is a growing body of research that employs robots as experimental stimuli in action observation tasks, the cognitive neuroscience literature on the perception of robots has inconsistencies (Kilner et al., 2003; Chaminade and Hodgins, 2006; Chaminade et al., 2007; Gazzola et al., 2007; Oberman et al., 2007; Press et al., 2007; Saygin et al., 2012; Urgen et al., 2013). Some studies reported that perception of robot actions results in similar activity in the MNS (as compared to that for human actions), whereas others have argued that the MNS is not responsive to nonhuman actions (Tai et al., 2004). Importantly, an fMRI study found no difference between conditions in ventral premotor cortex using the same stimuli employed in the current study (Saygin et al., 2012). In addition, a subset of the same stimuli were used in an EEG study, reporting indistinguishable modulation of the power of sensorimotor mu oscillations (which have been linked to motor simulation and the MNS, e.g., Cochin et al., 1999; Arnstein et al., 2011; Press et al., 2011) for human, android and robot actions (Urgen et al., 2013). The present data, however, showed differential modulation of EMG activity for these stimuli. How can we reconcile these findings in the light of the recent experimental evidence? One possibility is that EMG activity does not directly reflect the activity of the premotor cortex, which has been the focus of most prior work. EMG instead partially reflects the activity of the primary motor cortex, and is also susceptible to other influences (see Contributions of EMG: Mechanisms of Action Observation and Imitation).

# Lateralization in Action Imitation and Action Observation

In the present study, during explicit action imitation, EMG activity in the right hand was greater than the activity in the left hand. This is unsurprising given that participants were explicitly asked to imitate the agents' actions, which were right handed, but assures that EMG can reliably pick up imitation-related activity. More interestingly, in the explicit imitation condition, we found enhanced EMG activity also in the stationary left arm. Furthermore, the EMG activity in the left arm was also present during passive observation; in fact, it was greater than the activity in the right arm.

These results are consistent with previous reports that observation of actions involving one hand can influence motor activity related to both hands of the observer (Borroni and Baldissera, 2008; Borroni et al., 2008). Why did the supposedly passive, non-dominant left arm show activity during both action imitation and action observation? One possibility is a spatial compatibility effect, whereby observing an action performed on one side of the screen (here, left) would elicit activity on the same side of the body. Such spatial compatibility effects are well documented, specifically in studies using stimulus response compatibility paradigms (for a review, see Lu and Proctor, 1995). In fact, it has been suggested that motor resonance may be linked not to the specific arm that performs the action, but to the side of space of the observed action: Kilner et al. (2009) reported that attenuation of beta oscillations during action observation, which show mirror-like properties and are thought to index the activity of primary motor cortex (see Kilner and Frith, 2007 and Contributions of EMG: Mechanisms of Action Observation and Imitation), was greater in the contralateral hemisphere. Greater motor cortex activity in the contralateral side, i.e., the right motor cortex, might then produce greater muscle activity in the left arm.

Another reason for our pattern of findings, especially in the observation condition, could be inhibitory processes that suppress activity of the dominant (right) arm when no action takes place (i.e., during action observation). The presence of inhibitory influences during action observation was recently highlighted (Cross et al., 2013; Vigneswaran et al., 2013). The left arm, on the other hand, could receive less inhibition. Lateralization of premotor and motor cortical processing and the relationship to muscle activity is a complex neuro-computational problem (Baldissera et al., 2001; Fadiga et al., 2005; Churchland et al., 2012; Shenoy et al., 2013). Future studies could examine these differences in arm EMG activity by comparing right and left handers' reactions to actions performed by right and left arms.

# Sensitivity to Human Motion

In addition to different patterns of lateralization in action observation and imitation, we found that muscle activity for action imitation and observation appeared to be sensitive to the presence of biological motion. That is, EMG responses during explicit imitation as well as observation were greater to an agent that not only looked like a human, but also moved like one. The synchronization results further showed greater linking of participants' EMG dynamics to human motion. On the one hand, this could be consistent with the idea that the MNS is specialized for biological actions (Tai et al., 2004). However, participants were able to faithfully imitate actions produced by all three agents, which, along with other studies listed in Section "Artificial Agents in Cognitive Neuroscience," challenges the notion of strong selectivity. Rather, what these results indicate may be that the nervous system preserves "temporal fidelity" between seen and performed movements even when participants are not instructed to carefully imitate motion trajectory.

The observed greater EMG response to human motion may have several possible sources. On one hand, biological movements have specific dynamics, and are more complex and familiar in the context of human actions. Within the experiment, however, human motion was presented less frequently (where non-human movement was represented by both the Android and Robot conditions and thus was seen twice as often). Thus it is possible imitation of human movements may have involved more attention or effort, which could result in overall increase in muscle tension. A related "affective" explanation may be that viewing a human elicits greater arousal (but note that participants did not rate the human as eliciting more arousal than the other agents, see Supplementary Materials 2.1), which can influence muscle tone and be detected through EMG (Hoehn-Saric et al., 1997). However, we believe such generic accounts are insufficient to account for the effect. Corrugator activity (brow furrowing), an indicator of effort (de Morree and Marcora, 2010), was greater for *Non-Human Motion*, particularly in the observation condition (see Supplementary Materials 2.2.3). If greater effort were associated with correctly imitating human motion, we would expect the opposite pattern. A delay in reaction to human motion could be another potential indicator of effort in the form of a speed-accuracy tradeoff. However, both in an action prediction study (Saygin and Stadler, 2012), and an attentional capture and cueing study (Li et al., 2015) behavioral data were instead modulated by *Non-Human Appearance* (i.e., Robot condition) indicating generic effort or arousal effects are unlikely to underlie the EMG differences in the current study. Rather, we suggest the significant interactions with Time in the data, and the comparisons of cross-correlation lags demonstrate that the results are better viewed as preserved dynamics between perceived movement and executed movement rather than a delay *per se*. 
This is a much more interesting possibility, is consistent with prior work (Bouquet et al., 2007; Watanabe, 2008), and should be a fruitful direction to explore in future studies of dynamics of imitation of human and non-human movements, ideally with motion capture along with EMG (Thoroughman and Shadmehr, 1999; Casile and Giese, 2006).

# Contributions of EMG: Mechanisms of Action Observation and Imitation

Electromyography can be an important tool for understanding mechanisms underlying action observation and imitation. It is increasingly understood that, in addition to the MNS, primary motor cortex is involved not only in imitation but also in action observation (Borroni and Baldissera, 2008; Hari et al., 2014). However, the relationship between the primary motor cortex, premotor cortex, and the peripheral motor system is not yet well understood. EMG complements methods such as EEG, fMRI and MEG, and by examining actual muscle activity during action observation and imitation, provides an important contribution to the study of action observation and imitation.

Our specific findings pose further interesting questions for the neuroscience community. The data demonstrate that there is muscle activity in the non-dominant arm while the dominant arm is imitating an action, as well as when observing an action performed by the opposite arm. Is this activity related to the dominance of the right arm, the side on which the action is observed, or to inhibitory neural processes? Further studies that can dissociate effector from spatial compatibility effects, such as that of Kilner et al. (2009), could help clarify the underlying reasons.

As for the modulation of EMG by motion dynamics in both observation and imitation, this feature of similarity between the observer and the observed agent might be especially important for imitation, even when it is not explicitly demanded of the participants. In future studies, these dynamics can be analyzed with more sophisticated methods and motion capture. Furthermore, the sensitivity of arm EMG to human motion during observation adds a new finding to our multi-modal imaging work with these stimuli, as well as the corresponding research questions regarding the role of human-likeness in action processing. Our previous work did not show any selectivity for biological motion, though there were effects of visual appearance in both behavioral (Saygin and Stadler, 2012; Li et al., 2015) and neuroimaging studies (in the extrastriate body area with fMRI and in frontal theta oscillations with EEG; Saygin et al., 2012; Urgen et al., 2013). Taken together, these studies suggest that EMG taps into processes that we were not able to measure with the brain imaging methods, and adds to efforts to get a more comprehensive picture of human action processing, the MNS and embodiment. Last but not least, since studies have explored EMG in relation to single-cell level activity in motor cortex in non-human primates (Santucci et al., 2005; Kalaska, 2009; Churchland et al., 2012), applying this method to action processing in humans has the potential to help us make better inferences about the physiological mechanisms underlying action imitation and observation, to bridge between different methods and brain areas, and to provide an opportunity for exploring cross-species similarities and differences.

# Social Robotics and Artificial Agent Design

Finally, our results have implications for social robotics. One important topic in social robotics today is the design principles of humanoid robots. Neuroscience research can inform how we should design robots that people can seamlessly interact with, as they can with human social partners. In fact, input from cognitive and neural sciences to robotics is essential in this endeavor. In the present study, we found evidence for sensitivity to human motion even during passive observation. Given that unconscious mimicry processes can influence emotional and social processes (Chartrand and Bargh, 1999; Carr et al., 2003), human-robot interaction studies that focus only on overt behaviors may miss important implicit effects that may be highly relevant to the identification of design principles for neuro-ergonomic robots.

# Author Contributions

GH, BU, PW, and AS designed the study; AS and BU provided experimental materials; GH programmed and ran the experiment; GH conducted the analyses in consultation with the other authors; GH, BU, PW, and AS wrote the paper.

# Acknowledgments

This study was supported by NSF (CAREER BCS-1151805 to AS), Kavli Institute for Brain and Mind (Innovative Research Award to AS), Qualcomm Institute (formerly Calit2, Strategic Research Opportunities Award to AS, and graduate fellowship for BU), and DARPA (AS). We thank Prof. H. Ishiguro and the Intelligent Robotics Laboratory at Osaka University for help in stimulus preparation, E. Carr, L. Kavanagh, J. Thierman, and M. Hofree for helpful suggestions, and Winkielman lab research assistants for help with data collection.

# References


# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2015.00364




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Hofree, Urgen, Winkielman and Saygin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Sticking your neck out and burying the hatchet: what idioms reveal about embodied simulation

# *Natalie A. Kacinik\**

*Cognitive Neuroscience of Language Lab, Department of Psychology, Brooklyn College and the Graduate Center, City University of New York, Brooklyn, NY, USA*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Debra Titone, McGill University, Canada Cristina Cacciari, University of Modena, Italy*

#### *\*Correspondence:*

*Natalie A. Kacinik, Department of Psychology, Brooklyn College, City University of New York, 5309 William James Hall, 2900 Bedford Ave., Brooklyn, NY 11210-2889, USA e-mail: nkacinik@brooklyn.cuny.edu*

Idioms are used in conventional language twice as frequently as metaphors, but most research, particularly recent work on embodiment, has focused on the latter. However, idioms have the potential to significantly deepen our understanding of embodiment because their meanings cannot be derived from their component words. To determine whether sensorimotor states could activate idiomatic meaning, participants were instructed to engage in postures/actions reflecting various idioms (e.g., *sticking your neck out*) relative to non-idiomatic control postures/actions while reading and responding to statements designed to assess idiomatic meaning. The results showed that statements were generally more strongly endorsed after idiom embodiment than control conditions, indicating that the meaning of idiomatic expressions may not be as disconnected from perceptual and motor experiences as previously thought. These findings are discussed in terms of the mirror neuron system and the necessity of pluralistic contributions from both sensorimotor and amodal linguistic systems to fully account for the representation and processing of idioms and other figurative expressions.

**Keywords: idiom, metaphor, embodiment, amodal symbols, perceptual symbols, mirror neurons**

# **INTRODUCTION**

Across cultures, languages are rich with figurative expressions that frequently occur when people communicate. For instance, English speakers are estimated to utter approximately 10 million novel metaphors and about 20 million idioms over their lifetime (Cooper, 1999). A metaphor involves identifying a connection between otherwise dissimilar conceptual domains (Hoffman, 1984; Lakoff, 1993; Katz, 1996; Gentner et al., 2001), as illustrated in sentences like *Mary's personality is a magnet*, or *Mary's mind was a whirlpool*; whereas idioms are expressions whose meaning cannot be derived from a systematic or literal processing of the component words (Fraser, 1970; Swinney and Cutler, 1979; Libben and Titone, 2008; Vespignani et al., 2010). For example, Mary could be *sticking her neck out* when she sneaks out of her parents' home to attend a late-night party or *sitting on the fence* about who to vote for in the upcoming class election, and the meaning of the individual words does not enable us to understand these idiomatic expressions.

Since the metaphorical link to the origin of their meaning has been lost or is no longer evident, idioms were initially thought to be dead or frozen metaphors (Katz, 1973; Gibbs, 1994; Keysar and Bly, 1999; Jackendoff, 2002; Caillies and Declerq, 2011). Indeed, the similarities and differences in how metaphors and idioms are understood have been the subject of considerable debate (Cacciari, 1993; Katz, 1996; Sanford, 2008; Caillies and Declerq, 2011), particularly since some evidence suggests that idiom comprehension is at least partly based on the activation of underlying conceptual metaphors (Gibbs and O'Brien, 1990; Gibbs et al., 1997; Sanford, 2008; but see Glucksberg et al., 1993; McGlone, 1996; Keysar and Bly, 1999). Research has also shown that idioms and how they are processed can vary according to their transparency and compositionality (Titone and Connine, 1999; Caillies and Butcher, 2007; Libben and Titone, 2008). Although it is no longer correct to consider idioms as dead metaphors (Gibbs, 1993), it is generally accepted that they are indeed a distinct type of figurative expression whose representation and processing differ from those of metaphors (Cacciari, 1993; Glucksberg et al., 1993; Giora and Fein, 1999; Caillies and Declerq, 2011).

Interestingly, even though idioms are used in conventional language twice as frequently as metaphors according to the previously provided estimate, most research has been focused on metaphors, as evidenced by a standard PsycINFO search yielding over *11 times* more hits for metaphors than idioms. However, despite the fact that metaphors have been studied to a substantially greater extent than idioms, there has still been a considerable amount of research investigating how idioms are represented and processed (e.g., Gibbs, 1986; Gibbs et al., 1989; Hamblin and Gibbs, 1999; Titone and Connine, 1999; Peterson et al., 2001; Tabossi et al., 2005, 2008; Sprenger et al., 2006; Smolka et al., 2007; Schweigert, 2009; Fanari et al., 2010; Holsinger and Kaiser, 2013). As indicated by these references and by the idiom literature in general, the vast majority of studies have been aimed at trying to specify how idiomatic expressions are processed and understood. The major research questions have generally centered around investigating the extent to which idiom comprehension is distinct from or relies on the same lexical, semantic, and syntactic processes involved in the processing of regular literal language (Burt, 1992; Peterson et al., 2001; Tabossi et al., 2005; Vespignani et al., 2010), particularly the degree to which the literal meaning of the component words and of the phrase as a whole is potentially activated and processed while the idiom is being understood (Sprenger et al., 2006; Smolka et al., 2007; Rommers et al., 2013).

Other lines of research have focused on identifying the factors or dimensions upon which idioms can vary, such as frequency, familiarity, length, decomposability, transparency, predictability, literality and their effects on how idioms are processed (Titone and Connine, 1994a,b; Libben and Titone, 2008; Tabossi et al., 2008; Skoufaki, 2009; Fanari et al., 2010). However, a review of this literature is really beyond the scope of this paper, since the goal of the present study was to see if it would be possible to activate idiomatic meaning motorically as a result of participants simply engaging in positions or actions like "sticking their neck out," "sitting on the edge of their seat," or "burying a hatchet," without being told or presented with the actual idiomatic expressions themselves. In other words, the purpose of the current study was to investigate the extent to which the representations of some idioms may be embodied and grounded in sensorimotor experiences since prior approaches and research on idioms, as mentioned above, have generally been couched in traditional views of cognition and psycholinguistics, with words and phrases presumed to be relatively abstract, arbitrary, amodal symbols.

This type of standard amodal psycholinguistic approach is also evident in the main theories proposed to explain how idioms are processed. For example, one of the earliest theories of idiom comprehension, the lexical representation hypothesis, suggested that the processing of idioms was similar to the processing of long words whose meaning is simply accessed and retrieved from the lexicon (Swinney and Cutler, 1979). More recent theories, like the decomposition account, propose that idiomatic representation varies as a function of compositionality such that some component words carry more idiomatic meaning than others (*pop the question* vs. *kick the bucket*), with highly non-decomposable idioms resulting in faster processing and greater "direct access" to meaning (Gibbs et al., 1989). Alternatively, the configuration approach posits that idioms are represented like any other expression, as a configuration of words that are processed on a word-by-word basis until enough words are configured to retrieve the stored idiomatic string and its meaning (Cacciari and Tabossi, 1988; Cacciari et al., 2007; Tabossi et al., 2009). Since the configuration hypothesis proposes that all idioms are processed like regular literal language until the phrase is identified as an idiom whose meaning is then holistically retrieved, it can thus be considered to represent a hybrid of both decompositional and more unitary non-compositional theories. A variety of hybrid accounts have also been proposed, claiming that the literal meaning and syntactic structure of idioms are always represented and processed to some extent, although they differ in the specific manner by which this occurs (Cutting and Bock, 1997; Titone and Connine, 1999; Sprenger et al., 2006; Caillies and Butcher, 2007; Libben and Titone, 2008).

It is such hybrid accounts that currently seem to have the greatest amount of experimental support (Titone and Connine, 1999; Sprenger et al., 2006; Caillies and Butcher, 2007; Libben and Titone, 2008; Tabossi et al., 2009; Caillies and Declerq, 2011; Holsinger and Kaiser, 2013). However, it should be clear that all of these theories have been focused on the processing of idiomatic expressions, particularly how their non-literal meaning is understood given that its connection to the words in the utterance is generally not clear. Across all of these accounts there is nothing to suggest that the words and phrases in these expressions consist of anything other than standard amodal symbols or lemmas (Cutting and Bock, 1997; Sprenger et al., 2006) with links to their literal and figurative conceptual representations.

In contrast to these types of conventional psycholinguistic theories, there is now considerable evidence that much of our cognitive functioning, conceptual representations, and language processes are fundamentally grounded in our sensorimotor and perceptual experiences (Lakoff and Johnson, 1980, 1999; Barsalou, 1999, 2008; Zwaan and Madden, 2005; Gibbs, 2006b). For example, Stanfield and Zwaan (2001) found that participants processed pictures faster when they had the same orientation implied in a preceding sentence, such that they responded faster to a vertical rather than horizontal picture of a pencil after reading *John put the pencil in the cup*, suggesting that implicit perceptual simulations primed participants to more quickly recognize corresponding spatial orientations. Similarly, Zwaan et al. (2002) found that after reading sentences like *The ranger saw the eagle in the sky*, participants were faster to respond by naming or deciding that spatially congruent pictures (i.e., an eagle with outstretched wings) as opposed to incongruent images like an eagle with folded wings corresponded to a word in the sentence, again showing that participants' comprehension was perceptually biased. A series of studies by Matlock and colleagues has shown that the reading latencies of sentences with implied or "fictive" motion such as *the road runs through the valley* were affected by the speed of motion, type of travel or distance conveyed in a preceding story (Matlock, 2004); and conversely, that fictive motion sentences can influence the subsequent interpretation of an ambiguous sentence like *Next Wednesday's meeting has been moved forward two days* (Matlock et al., 2005), as well as the duration and manner of an individual's eye movements (Richardson and Matlock, 2007). More recently, Ansorge et al. (2010) have further demonstrated that such embodied effects can even be obtained with masked subliminal spatial prime words like *high* that can facilitate the processing of related target words like *above*, in addition to affecting the performance of spatially congruent or incongruent responses.

Further support for the embodiment of language comes from research showing that conceptual processing can both affect and be affected by the activity of corresponding perceptual and motor brain regions. For instance, an fMRI study by Hauk et al. (2004) showed that simply reading action words referring to the face, arms, and legs (e.g., *lick, pick,* or *kick*) resulted in somatotopic motor cortex activation in regions corresponding to the body part responsible for that action. In another study of arm- and leg-related words (e.g., *fold, beat, grasp* vs. *kick, hike, step*), Pulvermüller et al. (2005) used transcranial magnetic stimulation (TMS) to disrupt neural activity in arm areas of the left language-dominant hemisphere and obtained faster responses to leg-related terms, whereas TMS applied to leg areas facilitated responses to arm terms. These and other findings described in a review by Fischer and Zwaan (2008) therefore provide considerable support that language comprehension can be facilitated or hindered by perceptual and motor processes.

Although such findings are convincing, the objection could be raised that the concrete or highly imageable nature of the stimuli biases responses in the direction of embodied effects. Figurative language provides a stronger test for embodiment because even when figurative expressions involve concrete terms their meanings typically refer to abstract concepts divorced from their embodied origins. However, there is now considerable evidence that at least some types of figurative expressions, particularly metaphors, are also embodied. One of the earliest and most comprehensive proposals along these lines was the *conceptual metaphor theory* (*CMT*) proposed in a seminal book by Lakoff and Johnson (1980), suggesting that metaphors are not simply linguistic phenomena but (1) reflect more general cognitive and experiential aspects of how concepts are represented and processed in the human mind, and (2) represent a new tool for gaining insight into the acquisition of conceptual knowledge, particularly our knowledge and understanding of abstract concepts like GOOD or BAD which become metaphorically represented as being high or low in a spatial sense. The validity of underlying conceptual metaphors like GOOD IS UP has been shown in several experiments demonstrating how different concepts and psychological states (i.e., power, affect, the divine, and even real estate) map onto the vertical axis to demonstrate that UP is indeed generally associated with GOOD (Meier and Robinson, 2004, 2006; Schubert, 2005; Giessner and Schubert, 2007; Meier et al., 2007a,b, 2011).

Other studies have examined the PERSONALITY or FRIENDLINESS IS TEMPERATURE metaphor and found that incidental experiences with physical warmth (holding hot vs. iced coffee) induced "warm" judgments about others (e.g., trust) (Williams and Bargh, 2008), that people regulate social warmth with physical warmth (i.e., lonelier people had an increased tendency to take warm baths/showers) (Bargh and Shalev, 2012), and that those who were socially ostracized felt physically colder than those who were not (Zhong and Leonardelli, 2008). A recent investigation by Gibbs (2013) examined the embodiment of the RELATIONSHIPS ARE JOURNEYS metaphor by presenting participants with brief passages describing either a smoothly developing relationship or one with difficulties that are still there and have not been overcome, in either metaphorical or non-metaphorical language. When participants were later asked to walk or imagine themselves walking to a marked spot 40 feet away, those presented with the successful relationship walked longer and further than those given the unsuccessful relationship, but only when written in language conveying a journey metaphor. Lastly, simple metaphoric expressions like *swallow your pride* or *spit out the facts* were found to be understood faster when they were preceded by either the actual or imagined corresponding action relative to mismatching action or no movement control conditions, and these findings were not simply due to lexical-semantic activation or associations (Wilson and Gibbs, 2007). A thorough review and discussion about the embodiment of metaphor is beyond the scope of this paper, but readers may consult the following references (Gibbs, 2006a; Gibbs et al., 2006; Gibbs and Matlock, 2008; Ritchie, 2008; Falck and Gibbs, 2012). Most of the aforementioned work has focused on the manner in which concrete, embodied states facilitate the activation of *pre-existing* conceptual knowledge, but it has also recently been shown that higher-order, abstract and ill-defined *processes* like creativity can be enhanced by embodying metaphors (Slepian et al., 2010; Leung et al., 2012; see Eskine and Kaufman, 2012, for a review).

All of these findings suggest that metaphors are more than linguistic expressions; they are indicative of how embodied experiences influence both the activation and processing of various conceptual representations. However, a potential criticism of this work is that the mappings between the embodied source domains and the abstract target domains in many metaphors can be quite direct and obvious (consider the physical/interpersonal warmth research). In addition, the proponents of metaphor embodiment do not claim that their findings "necessarily generalize to all kinds of metaphorical language... [and that] embodied simulation is necessarily central to all aspects of metaphor comprehension" (Gibbs, 2013, pp. 376–377). This point is potentially important with respect to the embodiment of idioms since they have, by definition, generally lost the metaphorical connection to the origin of their meaning (see the graded account and findings by Desai et al., 2013, mentioned in the next section). Another critical issue regarding the embodiment of language and metaphor is the extent to which the activation of sensorimotor systems is really a fundamental and obligatory part of conceptual representation and processing, or an epiphenomenal byproduct of contextual priming effects or other underlying amodal mechanisms (Mahon and Caramazza, 2008; Dove, 2009). Putting these issues aside, there is now strong support that the comprehension of at least some metaphors relies on embodied sensorimotor simulations, including the results of some recent neuroimaging studies (Chen et al., 2008; Desai et al., 2011, 2013; Lacey et al., 2012).

In contrast to the considerable behavioral and neuroscientific research on the embodiment of metaphors, very little work has been done to investigate the extent to which idioms may also be embodied. This is likely because, in contrast to the perceptually rich and verbally creative quality of metaphors, idioms seem like the hallmark of fixed amodal expressions whose meaning must be explicitly learned, stored, and retrieved from memory. Idioms therefore appear to present a potential challenge for embodied theories of cognition because it seems improbable that the comprehension of idiomatic meaning would involve the activation of perceptual and/or motor regions rather than simply linguistic information. However, due to the considerable amount of evidence demonstrating the embodiment of cognition across various domains such as literal and non-literal language like metaphors, there has been a small but slowly growing number of studies investigating whether idioms are embodied.

Prior to describing those findings, it is worth considering some earlier work that was not really aimed at investigating the embodiment of idioms *per se*, but consisted of behavioral investigations of the extent to which individuals seem to use mental imagery and underlying conceptual metaphors (that appear to be embodied from the evidence above) in the comprehension of idiomatic meaning. The results of this research have generally been mixed, such that some researchers obtained evidence to support that hypothesis (Gibbs and O'Brien, 1990; Nayak and Gibbs, 1990; Gibbs et al., 1997; Gibbs and Bogdonovich, 1999; Nippold and Duthie, 2003), while others have failed to support the hypothesis (Glucksberg et al., 1993; Cacciari and Glucksberg, 1995; Keysar and Bly, 1995, 1999; McGlone, 1996; Glucksberg and McGlone, 1999; Keysar et al., 2000). Recent efforts to examine the embodiment of idioms have mostly focused on using functional imaging and other neuroscientific techniques to determine whether comprehending idiomatic expressions involves the activity of perceptual and motor areas of the brain.

Similar to the behavioral studies cited above, this research has also produced contradictory results. Specifically, a couple of recent fMRI and MEG studies by Boulenger and colleagues showed that sentences with leg- and arm-related words used in an idiomatic or literal sense (e.g., *He kicked the habit* vs. *He kicked the statue*) each activated somatotopically corresponding areas of motor cortex (although the "leg effect" in the MEG only approached significance) and that the time course of this activation was generally similar and relatively rapid, within 150–250 ms, for both types of stimuli (Boulenger et al., 2009 and Boulenger et al., 2012, respectively). These findings therefore generally support the embodiment of idioms, but it is worth noting that the idiomatic sentences in the latter MEG study produced significantly stronger early activation than literal sentences in language regions like the left temporal pole, Broca's area in the left inferior frontal cortex, and the left dorsolateral prefrontal cortex. Overall, however, the brief latencies and region-specific patterns of activation in these studies suggest that word meaning is recruited from sensoriperceptual systems and that idioms are semantically grounded in the motor system.

Some further but very weak evidence for the embodiment of idioms comes from a couple of recent studies by Desai et al. (2013) and Lauro et al. (2013). Both groups of researchers obtained significant activation in sensory and motor regions for metaphorical sentences like *The congress is grasping the state of affairs* or *Matilde throws her sadness far away* (translated from Italian in Lauro et al., 2013), whereas the results for idiomatic sentences (e.g., *The congress is grasping at straws in the crisis*) only approached significance and showed trends toward the expected effects. In both cases the authors argue for a graded account of embodiment suggesting that as the meaning of an expression becomes increasingly more conventional and abstract, as in the transition from metaphoric to idiomatic meaning, the weaker and less likely it is to activate perceptual and motor brain areas. Even though their results for idioms were non-significant, these researchers claim that their overall findings generally support the embodiment of figurative language.

This is in contrast to several studies that have all failed to provide evidence to support the embodiment of idioms (Aziz-Zadeh et al., 2006; Raposo et al., 2009; Cacciari et al., 2011; Cacciari and Pesciarelli, 2013). For example, the study by Cacciari et al. (2011) involved administering TMS pulses to the leg region of motor cortex and was unable to show significant motor evoked potentials (MEPs) in leg muscles for idiomatic sentences, although MEPs were obtained in response to literal and metaphoric stimuli. Furthermore, both fMRI investigations by Aziz-Zadeh et al. (2006) and Raposo et al. (2009) failed to show significant neural activity in corresponding motor or premotor cortices for expressions like *kick the bucket* and *biting off more than you can chew*. In sum, the findings in both the behavioral and neuroscientific literature are contradictory and the extent to which sensorimotor systems contribute to idiomatic meaning remains unclear. Indeed, given that only Boulenger and colleagues have been able to find significant results in the neural domain thus far (Boulenger et al., 2009, 2012), most of the evidence suggests that idioms are not embodied.

As described, and to our knowledge, most of the work regarding the embodiment of idioms and language in general has involved presenting participants with verbal stimuli to see how they subsequently affect their behavioral responses and activate their perceptual and/or motor brain regions. In contrast, the current study took the relatively novel approach of reversing this design, to examine whether putting individuals into sensorimotor states corresponding to certain idioms would activate their meaning and affect participants' subsequent judgments. If embodying idioms solely by having people engage in the relevant actions without exposure to the actual expressions can activate their meaning, it would suggest that perceptual-motor symbols are a fundamental part of their representation and substantially increase our understanding of how idioms are processed, in addition to providing further evidence for embodiment as the foundation of cognitive and linguistic processing. Since the existing support for the embodiment of idioms was weak and it seemed questionable whether the act of "sticking one's neck out" or "sitting on a fence" could really instantiate the corresponding meanings of taking a risk or being undecided, the study was meant to be an initial exploration of this issue with a limited number of stimuli in a highly plausible experimental context. The whole premise of demonstrating idiomatic embodiment in this paradigm hinges upon participants neither being aware that the positions and actions they have to perform represent idioms nor recognizing the hypothesis.

Specifically, they were told that the purpose of the study was to investigate how reading comprehension may be affected by different positions and movements. We developed a relatively long story involving a crime and subsequent courtroom drama. The whole narrative was designed to read and flow as one coherent story, but it consisted of four parts that were each written to relate to the meaning of a particular idiom. Each portion of the story was followed by a set of four questions designed to measure the extent to which the idiomatic meaning was activated, which was the dependent variable (DV). With respect to both the stories and questions, the idioms were never actually mentioned and considerable effort was taken to avoid using words and phrases closely associated with the idiomatic expressions and their meanings, to prevent them from being simply activated by verbal means. Further information will be provided in the method section, but the idioms used were: *sticking one's neck out, sitting on the fence, sitting on the edge of one's seat,* and *burying the hatchet*. They were chosen because (1) they could plausibly be worked into the context of the story, (2) they involved sustained positions or actions that could be maintained while participants read portions of the story and responded to the questions, and (3) pretesting of potential idioms by 30 students (19 female) similar to those participating in the experiment indicated that they were familiar to and understood by at least 70% of those individuals. For each part of the story, participants were assigned to one of three conditions (embodied idiom, embodied control, or normal control), where they either performed the position or action corresponding to the idiom, a different control position or action, or were simply seated in a normal comfortable position, respectively, in a counter-balanced manner. In other words, every participant engaged in each condition at least once across the 4 portions of the story, with each condition occurring an equal number of times across participants.

To our knowledge, the present study thus appears to be the first to investigate the embodiment of idioms by seeing whether having participants simply perform an idiomatic action would be enough to induce their meaning and affect subsequent judgments. It was predicted that embodying the idioms compared to the two types of control conditions, would result in stronger responses to the questions in the direction of the idiomatic meaning. Demonstrating the embodiment of idioms through this relatively novel approach (see Wilson and Gibbs, 2007; Leung et al., 2012, for similar efforts with metaphors), would provide further support that non-literal abstract meanings can still be grounded in sensorimotor experiences, particularly with respect to idioms where the evidence has been mixed.

### **METHODS**

#### **PARTICIPANTS**

The participants consisted of 60 Brooklyn College undergraduates (35 females, 25 males) who participated in this research for course credit. Prior to participating in the experiment, they were asked to complete a survey about their language background(s). This was to ensure that they had sufficient exposure to the English language by 5 years of age, and to maximize the likelihood that they would be familiar with the idioms and able to read and understand the narrative and questions. Although every participant was exposed to English by the time they were 5 years old, the language history forms indicated that 17 individuals had more familiarity and knowledge of another language in those early years. Since many Brooklyn College students come from recently immigrated families it is almost impossible to find "pure" monolingual native English speakers. In fact, only 18 out of 60 participants reported that they were not exposed to any other language(s) by the time they were 5 years old.

As described above, participants were randomly assigned to one of three conditions for *each part of the story* in a counterbalanced order and mixed design, such that each participant experienced each condition over the course of the study, but was assigned to only one condition for the portion of the story corresponding to a particular idiom. In other words, one participant would have sat normally for the first part of the story (normal control condition), performed the idiomatic action for the next part of the story (embodied idiom condition), engaged in another control action for the third part of the story (embodied control condition), and sat normally again (normal control condition) for the last part of the story. Another participant would have embodied the idiom for the first part of the story, then engaged in a control position, then sat normally, and embodied the idiom again for the final part of the story, and so on, with each condition occurring equally often in all possible orders, resulting in 20 participants per condition for each portion of the story (i.e., idiom).
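The counterbalancing constraint described above can be illustrated with a short enumeration. This is only a sketch under stated assumptions: the condition labels are shorthand introduced here, and the text does not specify how the 60 participants were distributed over the admissible orders, so the code only lists the orders in which every condition occurs at least once across the four story portions and checks that the full set is balanced per portion.

```python
from collections import Counter
from itertools import product

# Condition labels are shorthand chosen for this sketch, not taken from the study materials.
CONDITIONS = ("embodied_idiom", "embodied_control", "normal_control")
N_PARTS = 4  # four story portions, one per idiom


def valid_orders(conditions=CONDITIONS, n_parts=N_PARTS):
    """Return every condition sequence in which each condition occurs at least once."""
    return [seq for seq in product(conditions, repeat=n_parts)
            if set(seq) == set(conditions)]


def counts_per_part(orders):
    """Count how often each condition falls on each story portion."""
    return [Counter(seq[i] for seq in orders) for i in range(len(orders[0]))]


orders = valid_orders()
print(len(orders))  # 36 admissible orders
for part, counts in enumerate(counts_per_part(orders), start=1):
    print(part, dict(counts))  # each condition appears 12 times per portion
```

Because the full set of admissible orders is symmetric across conditions, any allocation that samples those orders evenly keeps the number of participants per condition equal within each story portion, which is the property the design relies on.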

#### **MATERIALS AND PROCEDURE**

As mentioned, the story was written and designed to be administered in four sections, corresponding to different idioms and described in further detail below. Every phase of the story began with a couple of introductory sentences to set the scene, followed by 3–4 substantive paragraphs of roughly similar length, ranging from about 300-400 words. Each portion of the story was followed by a set of 4 questions created to assess the strength of the activation of the idiomatic meaning alluded to in the preceding part of the text, and responded to on a 6-point scale (1 = *strongly disagree*, 6 = *strongly agree*), such that the DV consisted of the mean rating across each set of 4 responses. It is important to note that the actual idioms were not mentioned in any of these materials, and care was taken to avoid using words and phrases strongly related to the idiomatic expressions and their meaning. The complete narrative will be described further momentarily, but an example of one section of the story and corresponding questions is provided in Appendix A/Supplementary Materials 1. The full set of materials can be obtained from the author.

The entire story described a courtroom drama starting with the suspect's account and written so that it could be interpreted as though the suspect had *stuck his neck out* (i.e., taken a risk) by getting into a friend's car, which ended up resulting in a robbery and accidental murder. This idiom was embodied by having participants literally sit and stick their necks out while reading that part of the story and responding to the questions. An example of one of the risk-related questions that participants had to answer was "Pat was exercising caution as he let Justin drive oddly silent," which was reverse coded. The next portion of the story consisted of the prosecuting and defense attorneys' explanations of the murder, which showed that both sides had good points and the case was not clear cut, to convey the idiom of *sitting on the fence* (i.e., feeling ambivalent about a decision). This idiom was embodied by participants straddling a height-adjustable sawhorse such that only the tips of their toes abutted the floor, resulting in an unbalanced state. The set of items to which participants had to respond is listed in the Appendix. The third part of the story involved the judge's comments leading up to the delivery of the jury's verdict, written in relation to the idiom of *sitting on the edge of your seat* (i.e., feeling excited or anxious about an outcome), which was embodied with participants literally sitting on the edge of their seat. An example question was "I am eager to hear what the verdict will be." Finally, the last part of the narrative dealt with the convict's life and thoughts after the guilty verdict, particularly with respect to his partner in crime, to relate to the idiom *burying the hatchet* (i.e., the willingness to forgive). In the embodiment condition, participants were presented with two large catering trays. The left tray contained a small hatchet (with its safety guard on), the right tray was filled with dirt, and participants had to bury the hatchet with dirt using a 1-cup scoop. An example item to which participants responded was "Pat will probably overlook Justin's offense by the time they meet again."
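The scoring rule described in this section (a mean over each set of 4 ratings on the 6-point scale, with some items reverse coded) can be sketched as follows. The function names and the example ratings are hypothetical and introduced only for illustration; the sketch assumes the usual convention that a reverse-coded rating on a 1 to 6 scale is obtained as 7 minus the raw rating.

```python
def reverse_code(rating, scale_min=1, scale_max=6):
    """Flip a rating on a bounded scale (on the 1-6 scale, 2 becomes 5)."""
    if not scale_min <= rating <= scale_max:
        raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - rating


def idiom_score(ratings, reversed_items=()):
    """Mean of one participant's ratings after reverse coding the flagged item positions."""
    adjusted = [reverse_code(r) if i in reversed_items else r
                for i, r in enumerate(ratings)]
    return sum(adjusted) / len(adjusted)


# Hypothetical ratings for one participant on one story portion;
# the item at position 2 is treated as reverse coded.
print(idiom_score([5, 4, 2, 6], reversed_items={2}))  # (5 + 4 + 5 + 6) / 4 = 5.0
```

Higher scores then consistently indicate stronger endorsement of the idiomatic meaning, regardless of how an individual item was worded.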

As explained before, participants were assigned to one of three conditions for each portion of the narrative: (1) the embodiment conditions described above, (2) a normal control condition where they were comfortably seated in front of a desk and read normally, or (3) an embodied control condition that involved engaging in a non-idiomatic position or action while reading the passage and answering the questions. The purpose of this latter embodied control condition was to make sure that the effects obtained were not simply due to participants being slightly distracted or uncomfortable while reading and responding to questions in the embodied idiom condition. For the more "positional" idiomatic actions ("sticking one's neck out," "sitting on the fence," and "sitting on the edge of one's seat"), participants in the embodied control conditions stood cross-legged while leaning against a wall. However, since the action of "burying the hatchet" involved moving one's hand to scoop dirt and dump it over a hatchet in a neighboring tray, the embodied control condition involved participants sitting at the same desk, but moving dominoes from one side of the table to the other. These conditions were chosen after testing various options because they were judged to be similar to the embodied idiom conditions in terms of positional awkwardness and motoric action.

Participants were told that we were investigating the effects of motor information on text comprehension and that they would be required to perform specific positions/actions while reading a story and responding to questions. It is critically important to note that, while giving participants instructions, the experimenter never uttered words associated with the idioms and their meaning, let alone the idiomatic expressions themselves, so that they would not be verbally activated. Instead, participants were asked to imitate the position or action demonstrated by the experimenter while they read each part of the story and answered the corresponding questions. Since the study was designed around one continuous and coherent narrative, the participants were all presented with the sections of the story in the same order, but engaged in different positions or actions for each phase based on the conditions to which they were assigned. After the critical idiom embodiment story task, participants completed a distracter task where they read and underlined passages from a text, followed by a questionnaire to assess their understanding of the target idioms (explaining the meaning of example sentences) and their familiarity with the expressions. The entire study took about 75 min to complete.

#### **NORMATIVE DATA**

A separate group of participants (*N* = 31, 15 females) rated the idioms for decomposability, literality, and transparency on 5-point scales, with 1 = *completely non-X* and 5 = *completely X*<sup>1</sup>. Idioms can vary on these dimensions, and these variables can affect how idioms are processed (Westbury and Titone, 2011). This information was collected to accurately assess our stimuli on these dimensions and to examine which variables are more or less important in determining *which* idioms are likely to be embodied. The ratings were obtained by embedding our critical items into a larger set of idioms, and the resulting data are presented in **Table 1**.

### **RESULTS**

All participants were included in the analyses because none of them correctly identified the real purpose of the experiment or had to be excluded for other reasons. In addition, post-testing results indicated that all four idioms were understood and familiar to the vast majority of participants (see **Table 2**). Due to the inherent differences between the idiomatic actions, segments of the story, and subsequent questions, the approach used to analyze the data was to conduct separate One-Way analyses of variance (ANOVAs) across the 3 conditions for each portion of the narrative. Specifically, responses to the 4 questions corresponding to each idiom and phase of the study were averaged into a single score to reflect the activation strength of that idiomatic meaning. One-Way ANOVAs were run for each idiom along with Tukey Honestly Significant Difference (HSD) tests.
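As a rough illustration of the One-Way ANOVA described above, the F statistic for equal-sized groups can be recomputed from group means and standard deviations alone. The sketch below is not the authors' analysis script: it assumes 20 participants per condition (as stated in the Participants section), assumes the reported standard deviations are sample standard deviations, and uses the rounded summary values reported for the first idiom, so it is only expected to approximate the published F value. The Tukey HSD follow-up tests are not covered.

```python
def one_way_anova_from_summary(means, sds, n_per_group):
    """Return (F, df_between, df_within) for groups of equal size."""
    k = len(means)
    grand_mean = sum(means) / k  # valid as the grand mean because group sizes are equal
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    ms_between = ss_between / df_between
    ms_within = sum(sd ** 2 for sd in sds) / k  # pooled variance with equal group sizes
    return ms_between / ms_within, df_between, df_within


# Rounded summaries reported for "sticking your neck out":
# embodied idiom, embodied control, normal control.
f_value, df1, df2 = one_way_anova_from_summary(
    means=[4.49, 3.71, 3.80], sds=[0.61, 0.93, 0.55], n_per_group=20
)
print(round(f_value, 2), df1, df2)  # about 7.1 with df = (2, 57)
```

The recomputed value is close to, but not identical with, the reported *F*(2, 57) = 6.981, which is unsurprising given that the inputs are rounded summaries rather than the raw ratings.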

Results revealed that actually "sticking your neck out" significantly increased risk judgments (*M* = 4.49, *SD* = 0.61) relative to the embodied control (*M* = 3.71, *SD* = 0.93) and normal control (*M* = 3.8, *SD* = 0.55) conditions, *F*(2, 57) = 6.981, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.197, with the latter two conditions not significantly differing, *p* = 0.92. "Sitting on the fence" also induced more ambivalent judgments (*M* = 4.45, *SD* = 0.58) relative to the embodied control (*M* = 3.69, *SD* = 0.96) and normal control (*M* = 3.49, *SD* = 0.95) conditions, *F*(2, 57) = 7.174, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.201, with the latter two conditions not significantly differing, *p* = 0.74. Literally "sitting on the edge of your seat" significantly increased judgments of excitement (*M* = 5.16, *SD* = 1.13) relative to the embodied control (*M* = 3.84, *SD* = 1.62) and normal control (*M* = 3.68, *SD* = 1.79) conditions, *F*(2, 57) = 5.640, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.165, with the latter two conditions not significantly differing, *p* = 0.94. However, the effect for "burying the hatchet" was not found to be significant, *F* < 1<sup>2</sup>.

**Table 1 | Normative idiom ratings on relevant dimensions.**

*Mean ratings of each dimension with standard deviations in parentheses. Higher values indicate stronger endorsement of each dimension.*

<sup>1</sup>Decomposability refers to the extent to which idioms' individual words contribute to their overall figurative meaning. For example, *save your skin* is decomposable because the word "save" relates to the overall idiom meaning (to protect or save yourself). Literality refers to whether the idiom also has a plausible literal meaning. For example, *kick the bucket* figuratively means *to die* and literally means *to strike a pail with your foot*. However, the idiom *under the weather,* which figuratively means to be ill, does not have as clear a literal meaning. Transparency refers to how clearly and directly the figurative idiomatic meaning is related to the expression's literal meaning. While all idioms have a meaningful idiomatic or figurative interpretation, they vary in the degree to which they're transparent or opaque. For example, *jump the gun*, which figuratively means *to start ahead of time,* is relatively transparent because its literal meaning clearly motivates its figurative meaning, whereas the meaning of *kick the bucket* is more opaque because there is no clear and transparent relationship between its literal and figurative meaning.

These results are displayed in **Figure 1**, which shows that the idiomatic embodiment condition generally resulted in higher mean responses across the questions designed to measure the activation of each idiom's meaning than either of the control conditions. Although the analysis for "burying the hatchet" was not significant, **Figure 1** shows that the mean ratings followed the same expected pattern. To investigate why, repeated-measures ANOVAs were conducted on the decomposability, literality, and transparency ratings. The idioms were not different in decomposability and literality, *F*s < 1, but did differ in transparency, *F*(3, 90) = 10.489, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.259 (assumptions of sphericity were met). *Post-hoc* pairwise comparisons with Bonferroni corrections showed that *burying the hatchet* was rated significantly less transparent and thus more opaque than the other three idioms, which were not significantly different.
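The partial eta squared values reported in this section can be checked against the accompanying F statistics, since for a single effect the effect size follows directly from F and its degrees of freedom. The helper below is a small sketch (the function name is ours) of the standard identity: partial eta squared equals F times the effect degrees of freedom, divided by that same product plus the error degrees of freedom.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom as reported in the Results section.
reported = {
    "sticking your neck out": (6.981, 2, 57),
    "sitting on the fence": (7.174, 2, 57),
    "sitting on the edge of your seat": (5.640, 2, 57),
    "transparency ratings": (10.489, 3, 90),
}
for label, (f_value, df1, df2) in reported.items():
    print(label, round(partial_eta_squared(f_value, df1, df2), 3))
# Rounded results: 0.197, 0.201, 0.165, 0.259
```

All four recomputed values match the reported effect sizes to three decimal places.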

### **DISCUSSION**

The current research was intended to be an exploratory study about whether embodying idioms could instantiate their meaning to subsequently affect processing and judgments. Recall that previous behavioral and neuroimaging investigations of this issue had produced mixed results (Gibbs and Bogdonovich, 1999; Glucksberg and McGlone, 1999; Keysar et al., 2000; Nippold and Duthie, 2003; Boulenger et al., 2009, 2012; Lauro et al., 2013), with a majority of studies failing to show significant activation in perceptual and motor brain areas in response to idiomatic expressions (Aziz-Zadeh et al., 2006; Raposo et al., 2009; Cacciari et al., 2011; Desai et al., 2013). In contrast to the neuroimaging approach of examining participants' brain activity after exposure to idiomatic stimuli, the present study involved placing participants into the sensorimotor states corresponding to certain idioms to determine whether that could significantly activate their meaning. The results showed that this was indeed the case, thereby providing evidence that at least some idioms have an embodied aspect of their representational structure, and suggesting that embodiment may be more fundamental to the conceptual representations and processing of idiomatic expressions than previously thought.

<sup>2</sup>The same analyses were also conducted on the subset of 43 participants (with the caveat that conditions were no longer fully counterbalanced) after excluding the 17 individuals who had reported greater exposure to and knowledge of another language besides English in their first 5 years. These results are reported in Appendix B/Supplementary Material 2 and generally followed the same patterns as the overall analyses. Additional examination of the data on a participant-by-participant basis revealed that the effects were moderately robust across participants, with 65% (39/60) of individuals showing the expected pattern of stronger ratings in the embodied vs. control conditions, whereas the other 35% (21/60) gave ratings in one or both control conditions that were higher than or equivalent to those in the embodied conditions. Interestingly, there was a slight tendency for less native speakers (i.e., those with less exposure to English by 5 years of age) to be less likely to show the expected effects. Specifically, 9 of the 21 participants who did not follow the expected pattern were less "native" individuals, whereas only 8 of the 39 participants who followed the predicted pattern were. In other words, 31 of the 39 participants who showed the expected results were individuals with more exposure to English in their early years.

The approach of the present study is somewhat similar to studies by Ackerman et al. (2010); Leung et al. (2012), and Wilson and Gibbs (2007) that investigated the effects of having participants embody metaphoric actions. Specifically, Ackerman et al. (2010) examined the metaphorical association between physical weight and concepts of severity and importance by showing that participants judged a job candidate to be better if they evaluated him on a heavy vs. light clipboard. Similarly, in another couple of studies investigating the metaphorical links between physically rough textures and harsh or difficult situations, they found that participants judged an ambiguous social situation to be less coordinated (i.e., more difficult and harsh) after working on a puzzle with rough sandpaper-covered pieces compared to those who handled smooth puzzle pieces. The purpose of Ackerman et al.'s experiments was to examine how haptic experiences can affect *interpersonal judgments*. The focus of the study by Leung et al. (2012) was different and designed to investigate whether embodying metaphoric actions is linked to *creative processes*, by seeing if literally thinking about things "on one hand and then the other," "thinking outside the box," or "putting 2 and 2 together," would actually increase measures of creativity. As expected, they found that participants who were seated outside of a box, allowed to walk freely, or who combined the halves of circles together, generally performed better on various convergent and divergent thinking tasks, compared to individuals sitting inside a box, required to walk in a fixed, rectangular path, or who didn't combine circle halves together, respectively.

Finally, in the study by Wilson and Gibbs (2007), participants were trained to perform various actions such as pushing, spitting, and grasping in response to symbols like &, :, and ", displayed on a computer prior to being presented with a phrase like *push the argument, spit out the facts, and grasp a concept*. Wilson and Gibbs found that the responses to those phrases were significantly faster when preceded by matching as opposed to non-matching actions or when preceded by no action. All of these studies thus show that the physical embodiment of metaphors can affect a variety of subsequent processes, but the present research appears to be the first investigation to have taken this approach with idioms. Nevertheless, it must be noted that some of the stimuli from these prior studies consisted of familiar conventionalized phrases like *think outside the box, put 2 and 2 together, swallow your pride, sniff out the truth, and shake off a feeling*, that are likely not that different from the idioms examined in this study. However, in contrast to Wilson and Gibbs (2007) and possibly Leung et al. (2012), participants in the current investigation were never actually presented with the figurative utterances themselves<sup>3</sup>. Wilson and Gibbs provide evidence that their participants were unaware of the connection between the metaphoric actions and expressions, and that their results were not due to lexically-based associations and activation of the figurative meaning. In order to most convincingly demonstrate the embodiment of idioms in the current investigation, we thought it best to completely avoid any sort of linguistic instantiation of the idiom or its meaning.

As mentioned, the goal of this study was to investigate idiomatic embodiment with a small number of idioms in a well-designed and plausible experimental procedure where participants would neither recognize that the actions they were asked to do corresponded to idioms nor come close to guessing the true purpose of the experiment. Although the pattern of results across the embodied idiom compared to both control conditions supports the embodiment of some idioms, the study is clearly limited by the restricted set of idioms. It will therefore be up to future research to design a study investigating the generalizability of these results to a larger set of stimuli, perhaps with a procedure more akin to that of Wilson and Gibbs (2007), because the length of the story segments and sets of questions from the present experiment resulted in a study that was already over an hour long.

This limitation aside, the present study indicates that the sensorimotor experiences of engaging in idiomatic actions instantiated their meaning to affect participants' processing of the discourse and their responses to the corresponding questions. A potential account of these findings will be presented shortly, but let us first consider other reasons why these results may have been obtained. One possibility is that the current effects could be due to motor imagery rather than embodiment *per se* (Willems et al., 2010; Cacciari et al., 2011; Schuil et al., 2013). Distinguishing these concepts can be difficult and some researchers treat them synonymously, but motor imagery has been defined as the covert or mental "simulation" of bodily movement that involves the activation and monitoring of a motor plan without the overt execution of an action (Willems et al., 2010; Tomasino et al., 2011). Since participants in the current study were actually performing the actions, it seems unlikely and counter to the definition of motor imagery that they would simultaneously be imagining those movements as well. If anything, it's more likely that they may have been imaging the content of the narrative and accompanying questions. A related issue is that the concepts underlying the idioms (e.g., taking a risk, indecision) may have simply been activated as a result of reading the story. While both of the latter phenomena could be true, all of the participants received identical materials (i.e., the same story and questions), so any concepts that could have been imaged or activated were kept constant across participants, but the embodiment conditions still resulted in higher ratings. This indicates that there must be something about the cases where participants performed the idiomatic actions relative to the control conditions that caused them to respond more strongly to the questions. 
It is also possible that the current findings resulted from a synergistic interaction between the text, questions, and idiomatic actions, rather than just the embodiment of the idioms *per se*. However, even if that was the case, it still means that embodying the idioms contributed something distinct to how the narrative was processed and understood above and beyond the other conditions.

Since participants were never explicitly told or asked about the idioms directly, we also cannot be certain that they were activating the exact intended expressions. There are many idioms conveying risk, some of which would also be compatible with the action of putting one's head and neck forward (e.g., *to put one's head/neck on the block, to put/stick one's head in a noose, put one's neck on the line*). Given human experience about the importance and vulnerability of one's neck, the existence of multiple expressions conveying risk and involving the neck and head is no coincidence and further supports the embodiment of some idioms. Of course there are other idioms like *playing with fire, playing Russian roulette, skating on thin ice, and walking into the lion's den* that also convey risk. It seems unlikely that they would have been activated by the current procedure, but that is an interesting question for future research. Specifically, would activating the concept of risk by the movement of "sticking one's neck out" generalize and facilitate the processing of other "non-neck" idioms like *skating on thin ice*? Similarly, with respect to *being on the fence*, other idioms about indecision also typically convey a similar state of unbalance and sense of going back and forth or side to side (e.g., *hem and haw, go to and fro, be of two minds, and torn/tugged/pulled in 2 directions or between 2 options*). Hence, the extent to which straddling a sawhorse activated *sitting on the fence* rather than one or more of these analogous expressions is unclear, as is the issue of whether other indecision idioms like *being in a quandary, dragging one's feet, or still up in the air* may have been activated.

<sup>3</sup>With respect to the study by Leung et al. (2012), the extent to which creativity metaphors may have been verbally induced while giving participants instructions is unclear. This issue does not appear to be problematic for Ackerman et al. (2010) because their study was not testing specific metaphoric phrases, but rather the underlying conceptual metaphors that give rise to a variety of expressions such as the "gravity of a situation," "having a rough day," and someone being "hard-hearted."

Since the current findings indicate that the meaning of certain idioms can be instantiated simply on the basis of sensorimotor experience, we will now try to provide an account for why this may be the case. Specifically, we will propose that one of the main neural mechanisms that could underlie these effects is the human mirror neuron system (HMNS). Mirror neurons were first identified in macaque monkeys as special cells in area F5 [in the inferior frontal gyrus (IFG) and analogous to Broca's area 44 in the human brain], primary and premotor cortex, inferior parietal cortex, and the superior temporal sulcus (Corballis, 2010; Molenberghs et al., 2012; Traxler, 2013). These neurons would fire action potentials both when the monkeys would observe an individual performing a certain action and also when the monkeys engaged in the same action themselves (di Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti and Craighero, 2004).

Researchers have also identified a similar mirror neuron system in humans, which has been invoked in accounts of language phenomena, particularly the perception of speech, embodiment of semantics, metaphor, interpersonal discourse, and the evolution of language itself, as well as theory of mind, schizophrenia, autism, alexithymia, and multiple sclerosis (Gibbs, 2006a; Gallese, 2008; Corballis, 2010; Molenberghs et al., 2012; Traxler, 2013). However, this research is more controversial and should be considered with caution (Hickok, 2009; Venezia and Hickok, 2009; Arevalo et al., 2012; Molenberghs et al., 2012; Traxler, 2013). The network of regions involved in the HMNS also appears to be very broad, with a recent meta-analysis finding significant levels of activation in 34 Brodmann areas (Molenberghs et al., 2012). However, a generally bilateral set of regions similar to those of the monkeys (i.e., primary motor cortex, ventral premotor cortex, IFG, superior, and inferior parietal lobules) appear to have the strongest support, in addition to the temporal-occipital junction, portions of the limbic system, particularly the amygdala, insula, and cingulate gyrus, and visual, auditory, and somatosensory cortices, depending on the sensory modalities involved (Corballis, 2010; Arevalo et al., 2012; Molenberghs et al., 2012). There has been considerable research and discussion about exactly what the neurons in these brain regions are doing, but the prevailing claim seems to be that they are involved in generating an internal representation, possibly even the "understanding," of goal-directed actions, rather than just simple imitation (Rizzolatti and Craighero, 2004; Gallese, 2008; but see Hickok, 2009; Corballis, 2010).

We are admittedly hesitant to jump onto the HMNS bandwagon, particularly since the study did not directly investigate this at a neural level. However, the current procedure must have activated the mirror neuron system because participants had to copy the actions demonstrated by the experimenter. This was also true of the control conditions and yet the embodied idiom condition still resulted in significantly greater activation of the idiomatic meaning as measured by the strength of participants' responses to the questions. Therefore, the embodiment of particular idioms is not simply due to the activation of mirror neurons themselves, but rather what those neurons potentially encode with respect to the representation of idioms. Specifically, mirror neurons are important because they interconnect the brain regions involved in the perception of behaviors and the areas responsible for the actual or simulated execution of those actions (Gallese and Lakoff, 2005; Gibbs, 2006a; Fogassi and Ferrari, 2007). As mentioned, the HMNS network is thought to be particularly important for generating an internal representation of an action or behavior and its outcomes or goals (Gibbs, 2006a; Fogassi and Ferrari, 2007; Gallese, 2008). In addition, both the human and monkey research suggests that mirror neurons may be representing actions and their intentions in a more conceptual or cognitive form such that the purpose or consequence of a behavior can be inferred and anticipated (Fogassi and Ferrari, 2007; Gallese, 2008; Corballis, 2010; Traxler, 2013). This possibility combined with the fact that an action does not actually need to be executed but can be encoded into the HMNS by imagination or simulation processes (Gallese and Lakoff, 2005; Wilson and Gibbs, 2007; Gallese, 2008) may explain the embodiment of certain idioms. 
For instance, the meaning of an idiom like *sticking one's neck out* could become embodied as a result of people encountering individuals being hanged or beheaded, and animals being slaughtered by cutting or breaking their necks<sup>4</sup>, in addition to hearing the expression while seeing people put themselves in a variety of risky and dangerous situations, such that these experiences get encoded into an individual's perceptual, motor, and mirror neuron systems.

The idiom of *burying the hatchet* reflects the means by which fighting Native American tribes would end their conflicts and declare peace<sup>5</sup> (Ammer, 2003). This item is interesting because people (at least those familiar with American history, like the participants in this study) typically know what it means. However, unlike the other idioms in this study, it is the one least likely to be experienced in an occasional movie, show, or book, particularly a visual depiction of the actual procedure, although the expression itself may be encountered more frequently in contexts where forgiveness has or has not occurred. This idiom did not show a significant effect of embodiment, although the results went in the expected direction. In accordance with this finding, "burying the hatchet" was found to be significantly less transparent than the other idioms, indicating that transparency may be a particularly important factor regarding the extent to which an idiomatic expression is embodied. Recall that transparency was defined as the strength or closeness of the connection between the literal and figurative meaning. The current results thus suggest that the weaker and more distant the connection between the literal and figurative meaning, the more likely the processing system needs to rely on amodal linguistic symbols to represent the idiomatic meaning. The present theoretical framework further suggests that transparency really corresponds to the extent to which idioms have been actually physically experienced, either by an individual directly (e.g., someone who shifts forward to the edge of their seat in anticipation of the next scene in a movie) or indirectly through the observation of other individual(s) (e.g., seeing someone shift from side to side while trying to make a decision). In other words, transparency may reflect the strength and frequency with which sensorimotor and mirror neuron systems have been activated by such idiomatic experiences or encounters over time. This would predict that, all other things being equal, idioms like *to rock the boat* or *muddy the water* should be more embodied due to the fact that most people have likely experienced the instability of being on a boat or water becoming cloudy as dirt or sand is kicked up, compared to expressions like *to have a chip on one's shoulder, shoot the breeze, or paint the town red*, which cannot be physically experienced to the same degree. Indeed, it would be interesting to compare more strongly or weakly embodied idioms matched on various other dimensions (e.g., length, frequency, decomposability, transparency, familiarity, and literality) to determine whether a greater degree of embodiment results in idioms that are more easily processed and remembered than less embodied items.

<sup>4</sup>In the past when the idiom originated (see Rogers, 1985; Ammer, 2003) such events would have been experienced in person, but nowadays they are generally encountered in various forms of media.

<sup>5</sup>Another suggestion is that it comes from the expression "hang up one's hatchet" dating back to the early 1300s before the arrival of Christopher Columbus, with the word "bury" replacing "hang up" in the 1700s (Ammer, 2003).

We thus propose that, similar to the embodiment established for other aspects of language including metaphors, the meaning of many idioms is grounded in actual or simulated experiences encoded in the HMNS<sup>6</sup>, such that activating the neurons in those corresponding perceptual and motor brain regions can in turn instantiate the idiomatic meaning, as found in the current study. Although it may seem obvious to explain the representation and processing of idioms according to the experiences upon which their meaning is based and understood, prior research has mainly focused on studying the expressions themselves and their properties rather than really considering the situations they describe and the extent to which individuals may have actually directly or indirectly experienced them. The HMNS has been claimed to be one of the fundamental mechanisms responsible for the embodiment of language in general (Barsalou, 2008, 2013; Gallese, 2008; Arevalo et al., 2012; Caligiore and Fischer, 2013; but see Hickok, 2010) with some discussion about how mirror neurons and simulation processes potentially underlie the comprehension of metaphor (Gallese and Lakoff, 2005; Gibbs, 2006a), but to our knowledge the current proposal appears to be the first to explicitly suggest that the HMNS may be important for the representation and processing of idioms. A few of the previous neuroimaging studies of idiomatic embodiment have found activity in some of the brain areas involved in the HMNS, most notably the IFG, motor and premotor cortices (Boulenger et al., 2009, 2012; Desai et al., 2013). However, besides suggesting that their results provide support for the notion that abstract figurative meanings are grounded in sensorimotor brain regions, these researchers do not discuss or propose an account of their findings in terms of the mirror neuron system, except for this brief comment by Desai et al. (2013) about the activation obtained for idioms in the IFG (BA44/6), which they describe as being "associated with tool use and thought to be part of the mirror neuron system" (p. 866).

The current findings in conjunction with some of the neuroimaging research thus appear to be indicative of a bidirectional connection between the meaning of idiomatic expressions and the actual or simulated experiences encoded into individuals' perceptual and motor regions through the HMNS. If this conceptualization is accurate, then encountering an idiomatic expression should re-activate those systems to some degree, but the only way to conclusively confirm this would be by using TMS or studying patients with damage to their mirror neuron systems to see if they show any difficulty with idiom comprehension relative to controls. Since some of the regions identified as being important aspects of the HMNS either overlap or are in close proximity to IFG language areas like BA44 and 45, the most convincing support for the importance of the HMNS would come from showing that impairment of more purely motor or premotor cortices affects idiom processing. Cacciari et al. (2011) appear to have conducted the only study that has really investigated this, but they did not obtain significant MEPs in response to idiomatic sentences after TMS pulses to the leg region of motor cortex. Although this finding fails to support that hypothesis, it should be noted that significant MEPs were shown for metaphorical and fictive motion sentences. Even though those stimuli are figurative, Cacciari et al. note that "the motor component of the verb is preserved" (p. 156) in contrast to idioms where it has vanished. In fact, when individuals were asked to "rate the extent to which the idioms referred to actions*...* the ratings were extremely low" (communication via review) suggesting that their idiomatic stimuli were generally not considered to be embodied. If preserving the motion component of the verb is critical, as Cacciari et al. 
have claimed, then this could account for the discrepancy between prior failures to support the embodiment of idioms (Raposo et al., 2009; Cacciari et al., 2011; Cacciari and Pesciarelli, 2013) and the present study, which required participants to perform and sustain the idiomatic movements themselves.

<sup>6</sup>The fact that simulated or imagined actions can still be encoded into the HMNS is particularly critical for explaining how idioms like sticking one's neck out, stabbing someone in the back, or biting someone's head off could still be embodied without individuals having to personally perform those actions. An interesting question for future research would be to examine the degree to which such idioms may be less strongly embodied than idioms like gritting one's teeth or being left out in the cold that most people have directly experienced.

Another interesting avenue for future research will be to examine how typically studied features of idiomatic expressions like decomposability and literality relate to the extent of embodiment. Some factors like imageability will undoubtedly be highly correlated with embodiment, but the nature of this relationship for other variables is less clear and should be explored. The linguistic features of idiomatic expressions are an important aspect of their representation and processing, regardless of the degree of embodiment. In addition, since some idioms are most likely either not or very weakly embodied, with their meaning mainly consisting of amodal linguistic symbols (e.g., *something that's the real McCoy, opening a Pandora's box, to get forty winks, kick the bucket, paint the town red, bury the hatchet, or sell someone down the river*), a pluralistic approach like Barsalou et al.'s (2008) Language and Situated Simulation (LASS) theory that integrates both linguistic forms and sensorimotor experiences into the human conceptual system may be the most comprehensive and accurate way to account for the wide range of phenomena in natural language, including the variability regarding the embodiment of idioms and other figurative expressions. According to this view, language processing simultaneously triggers the activation of linguistic and sensorimotor simulation systems. The activity of the linguistic system peaks first and is responsible for categorization, spreading activation, and other shallow, word-association-based processes. The simulation system peaks later and is responsible for deeper conceptual development, which is accomplished through modality-specific simulations, likely involving the HMNS. It is this deeper simulation-based processing that could result in the stronger representations and facilitated processing for more vs. less embodied idioms when other factors remain constant, as suggested earlier.

In conclusion, the present findings show that the process of embodying idioms simply by engaging in the corresponding actions can activate their meaning enough to significantly influence subsequent processing and judgments. This study therefore makes an important contribution to the mixed results in the literature by suggesting that the representation and processing of idiomatic meaning may be more grounded in sensorimotor experiences than previously thought, providing further support for the fundamental importance of embodiment in language comprehension and cognition. Since the current research was limited to a small number of stimuli, it will be up to future studies to investigate a larger and more variable set of idioms to determine the reliability and validity of these results. In spite of this limitation, it is our hope that the relatively novel approach, interesting findings, and proposed account in terms of the HMNS, will stimulate further research along these lines to more thoroughly understand how idioms are represented and processed, particularly with respect to their embodiment.

#### **ACKNOWLEDGMENTS**

This study was really conceived by and conducted in collaboration with my former doctoral student Kendall Eskine, who also wrote an initial version of the manuscript. His contributions to this paper should therefore be considered equal to mine. We are also grateful to Ben Cooley and Tom McAusland for their help in conducting this study. We also thank Ray Gibbs and 5 other reviewers for their feedback on an earlier version of this paper.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00689/abstract

### **REFERENCES**


Zwaan, R. A., and Madden, C. J. (2005). "Embodied sentence comprehension," in *The Grounding of Cognition: The Role of Perception and Action in Memory, Language, and Thinking*, eds D. Pecher and R. A. Zwaan (Cambridge, UK: Cambridge University Press).

Zwaan, R. A., Stanfield, R. A., and Yaxley, R. H. (2002). Language comprehenders mentally represent the shape of objects. *Psychol. Sci.* 13, 168–171. doi: 10.1111/1467-9280.00430

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 February 2014; accepted: 17 August 2014; published online: 24 September 2014.*

*Citation: Kacinik NA (2014) Sticking your neck out and burying the hatchet: what idioms reveal about embodied simulation. Front. Hum. Neurosci. 8:689. doi: 10.3389/fnhum.2014.00689*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Kacinik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Washing the guilt away: effects of personal versus vicarious cleansing on guilty feelings and prosocial behavior

# **Hanyi Xu<sup>1,2</sup>\*, Laurent Bègue<sup>1</sup> and Brad J. Bushman<sup>3,4</sup>**

<sup>1</sup> Laboratoire Interuniversitaire de Psychologie (LIP), University of Grenoble 2, Grenoble, France

<sup>2</sup> Department of Psychology, University of Louvain, Louvain-la-Neuve, Belgium

<sup>3</sup> School of Communication and Department of Psychology, The Ohio State University, Columbus, OH, USA

<sup>4</sup> Department of Communication Science, VU University Amsterdam, Amsterdam, Netherlands

#### **Edited by:**

Analia Arevalo, East Bay Institute for Research and Education, USA

#### **Reviewed by:**

Juan F. Cardona, Catholic University of Pereira (UCP), Colombia

Roel M. Willems, Donders Institute for Brain, Cognition and Behavior, Netherlands

#### **\*Correspondence:**

Hanyi Xu, Department of Psychology, University of Louvain, Pl. du Cardinal Mercier 10, B-1348 Louvain-la-Neuve, Belgium

e-mail: hanyi.xu@uclouvain.be

For centuries people have washed away their guilt by washing their hands. Do people need to wash their own hands, or is it enough to watch other people wash their hands? To induce guilt, we had participants write about a past wrong they had committed. Next, they washed their hands, watched a washing-hands video, or watched a typing-hands video. After the study was over, participants could help a Ph.D. student complete her dissertation by taking some questionnaires home and returning them within 3 weeks. Results showed that guilt and helping behavior were lowest among participants who washed their hands, followed by participants who watched a washing-hands video, followed by participants who watched a typing-hands video. Guilt mediated the effects of cleansing on helping. These findings suggest that washing one's own hands, or even watching someone else wash their hands, can wash away one's guilt and lead to less helpful behavior.

**Keywords: guilt, wash, cleanse, embodiment, prosocial behavior, helping**

# **INTRODUCTION**

When Jesus Christ was brought before Pontius Pilate, the Roman governor in Jerusalem at the time, Pilate offered to release a prisoner for the Passover feast, either Jesus Christ or the "notorious prisoner" Barabbas. The Jewish chief priests and elders persuaded the people to ask for the release of Barabbas. When Pilate asked what should be done with Jesus Christ, the multitude said, "Let him be crucified" (Matthew 27:22). When Pilate asked, "Why, what evil hath he done?" they cried out again, "Let him be crucified" (Matthew 27:23). Pilate then "took water, and washed his hands before the multitude, saying, 'I am innocent of the blood of this just person'" (Matthew 27:24). Likewise, in Shakespeare's play, Lady Macbeth attempted to wash away her guilt of plotting King Duncan's murder by compulsively washing her hands.

#### **GUILT**

Guilt is an unpleasant emotional feeling that helps us know we did something wrong (e.g., Baumeister et al., 1994; Ferguson and Stegge, 1998). Although guilt feels bad to the individual, it is actually quite good for society and for close relationships. You would not want to have a boss, a lover, a roommate, or a business partner who had no sense of guilt. Such people are called psychopaths, and they are often a disaster to those around them (see Hare, 1998). Psychopaths exploit and harm others, help themselves at the expense of others, and feel no remorse about those they hurt.

When people feel guilty about something they have done, they often perform prosocial actions to wash away the guilt. For example, in one study (McMillen and Austin, 1971), half the participants were induced to tell a lie to the experimenter. After the study was over, the experimenter said that participants were free to go, but added that if they had extra time they could help him fill in bubble sheets for another study. Participants who had not been induced to lie volunteered to help fill in bubble sheets for 2 min on average, whereas participants who had been induced to lie volunteered to help fill in bubble sheets for 63 min. The lying participants were apparently attempting to wash away their guilt for lying to the experimenter by being more helpful. Guilt made them more willing to engage in prosocial behavior. The opposite is also true. If people feel cleansed of guilt, they are less likely to engage in prosocial behavior (Zhong and Liljenquist, 2006; Xu et al., 2011). Previous research has not, however, measured whether guilt mediates the effect of cleansing on prosocial behavior. The present research fills this important gap in the literature.

#### **WASHING THE GUILT AWAY**

Can washing one's hands remove one's guilt? Both Pilate and Lady Macbeth thought so, and they are not alone. Research has shown that people often feel less guilty after washing their hands (e.g., Zhong and Liljenquist, 2006; Nelissen and Zeelenberg, 2009; Bastian et al., 2011). Purity is the central notion of morality (Haidt and Joseph, 2008), and cleansing makes one more pure and clean (Lee and Schwarz, 2010). In baptisms and other religious rituals, water is used to wash away sin and make the person clean and pure.

Does one have to physically wash one's own hands of guilt, or is it sufficient to watch others wash their hands? We suggest that watching others wash their hands might "wash away" at least some of the guilt. It has been suggested that embodiment plays an important role in helping the brain simulate experience, process information, form attitudes, arouse emotions, make decisions, and take actions (Niedenthal et al., 2005; Barsalou, 2008). According to embodied cognition theories (Gallese and Lakoff, 2005; Niedenthal, 2007; Barsalou, 2008; Meteyard et al., 2012), acting and simulating share the same brain substrates. When simulating an action, the brain (partially) reactivates the (original) action as well as any accompanying thoughts and feelings (Barsalou, 1999; Rubin, 2006; Niedenthal, 2007). Abstract concepts and emotions are grounded and "embodied" in our concrete experience and knowledge (Lakoff and Johnson, 1980, 1999). That is, abstract concepts and emotions can be comprehended and retrieved by concrete experience as well as by simulating the experience. It is thus plausible that the concepts of "cleanliness" and "purity" are embodied in bodily movements and everyday rituals such as erasing, rinsing, and washing.

Washing one's hands is a "bottom-up" process grounded in authentic sensory and motor experiences that activates the concepts of "cleanliness" and "purity". Watching others wash their hands is a "top-down" process in which the brain simulates comparable sensory and motor experiences. In both cases, guilt should be reduced due to either "bottom-up" reactivation of concepts of "cleanliness" and "purity" or "top-down" simulation of washing one's hands. However, we propose that physically washing one's hands should be more effective in reducing guilt than watching others wash their hands, for two reasons. First, the "bottom-up" experience of cleansing oneself is more perceptually convincing and vivid than the vicarious "top-down" simulation of cleansing oneself. Second, reliving or reenacting an experience only involves reactivation of part of the neurons engaged in the original experience (Damasio, 1989; Barsalou et al., 2003). This discrepancy in the number of neurons engaged by "bottom-up" reactivation and "top-down" simulation should cause the difference in their effect on reducing guilt. Therefore, watching others cleanse themselves might decrease one's own guilty feelings to a lesser degree than washing one's own hands. The present research therefore includes three experimental conditions: self-cleanliness, other-cleanliness, and no-cleanliness control.

#### **OVERVIEW OF THE PRESENT STUDY**

The present research expands past research in several important ways. First, it compares the effect of washing one's own hands versus watching someone else wash their hands. Second, it includes a measure of prosocial behavior to measure the behavioral effects of washing one's guilt away. Third, it tests whether guilt mediates the effect of cleanliness on prosocial behavior.

In the present study we first induced feelings of guilt by having participants recall and then write a detailed description about a past wrong they committed against a significant other (e.g., family member, close friend). Next, they were randomly assigned to one of three experimental conditions: (1) a *personal-cleanliness* condition in which they washed their own hands; (2) an *other-cleanliness* condition in which they watched a video of someone else washing their hands; or (3) a no-cleanliness *control* condition in which they watched a video of someone else typing. We measured feelings of guilt before and after the experimental manipulation. Participants were then told the study was over, and they were paid for their participation. The experimenter added, however, that if they wanted they could help a Ph.D. student complete her dissertation by taking some questionnaires home and returning them within 3 weeks in a prepaid envelope. The number of questionnaires returned was used to measure prosocial behavior. We predicted that physical self-cleansing would be more effective than a metaphorical concept of cleanness in decreasing guilt, but that watching someone else wash his or her hands should "wash away" at least some of the guilt. Thus, we predicted the lowest levels of guilt and prosocial behavior among participants who washed their own hands, followed by participants who watched someone else wash their hands, followed by participants in the control condition who watched someone type with their hands. Furthermore, we expected guilt to mediate the effects of cleanliness on prosocial behavior such that the more guilty participants felt, the more helpful they would be.

# **METHOD**

# **ETHICS STATEMENT**

Our study was approved by the ethical committee of Laboratoire Interuniversitaire de Psychologie (LIP) at the University of Grenoble, France. We also obtained consent from our participants.

# **PARTICIPANTS**

Participants were 65 adult patrons at a municipal library in France (30 women; 18–79 years old; *M*<sub>age</sub> = 41.5, *SD*<sub>age</sub> = 16.7) who were paid 10€ (\$14) in exchange for their voluntary participation.

# **PROCEDURE**

Participants were tested individually. They were told the researchers were studying the relationship between verbal memory, memory of body movements, and emotions. After giving their consent, participants were given 15 min to write a description about an event in which they had done something negative to someone important to them. This paradigm has been widely used in past research to induce guilt feelings in participants (e.g., Niedenthal et al., 1994; Smith et al., 2002; Lickel et al., 2005), and it is especially effective when the person they are writing about is someone important to them (Baumeister et al., 1994; Xu et al., 2011). Participants were told to write down the whole story, to include as many details as possible, and to describe exactly how it made them feel. Next, participants completed the 5-item (e.g., "I feel bad about something I have done") guilt subscale of the State Guilt and Shame Scale (Marschall et al., 1994; Cronbach α = 0.87; *M* = 12.80, *SD* = 5.24) to measure their current feelings of guilt.

Next, participants completed a task that ostensibly measured memory of body movements. They were randomly assigned to three conditions: personal-cleanliness (*N* = 21), other-cleanliness (*N* = 22), or control (*N* = 22). In the *personal-cleanliness* condition, participants first memorized the numbers (from 1 to 14) on a paper for 1 min. Each number was paired with a finger, the palm, or the back of the left or right hand (i.e., 1 = thumb, 2 = index finger, 3 = middle finger, 4 = ring finger, 5 = little finger, 6 = palm, 7 = back of left hand; 8 = thumb, 9 = index finger, 10 = middle finger, 11 = ring finger, 12 = little finger, 13 = palm, 14 = back of right hand). The participant typed these numbers on a computer keyboard, and then wiped each finger, the palm, or the back of the appropriate hand in the order of the numbers with a wet white wipe for about 2 min. In the *other-cleanliness* condition, participants watched a 2-min video of someone else doing the same thing as in the *personal-cleanliness* condition, and recalled the numbers in the appropriate order. In the *control* condition, participants also watched a 2-min video of a person typing numbers on a keyboard and recalled the numbers.

Next, participants again completed the State Guilt and Shame Scale (Marschall et al., 1994; Cronbach α = 0.81). Participants also completed the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988; *M*<sub>positive affect</sub> = 32.17, *SD*<sub>positive affect</sub> = 6.32; *M*<sub>negative affect</sub> = 19.09, *SD*<sub>negative affect</sub> = 5.52) to test whether the effects of the manipulation were specific to guilt. This scale contains 10 negative items (*afraid*, *ashamed*, *distressed*, *guilty*, *hostile*, *irritable*, *jittery*, *nervous*, *scared*, and *upset*; Cronbach α = 0.80), and 10 positive items (*active*, *alert*, *attentive*, *determined*, *excited*, *enthusiastic*, *inspired*, *interested*, *proud*, and *strong*; Cronbach α = 0.82).

Participants were told that the study was over, but if they were willing to help a Ph.D. student complete her dissertation they could take some questionnaires (about local public transportation) home and mail them back within 3 weeks in a prepaid envelope. The experimenter recorded the number of questionnaires they took, and also how many they mailed back within 3 weeks.

# **RESULTS**

#### **PRELIMINARY ANALYSES**

Because age has been shown to positively correlate with guilt (Orth et al., 2010), we tested whether there were any main or interactive effects for age on any of the dependent variables (i.e., guilt, number of questionnaires taken, and number of questionnaires returned). No significant effects were found. Likewise, no significant main or interactive effects were found for participants' sex, so the data from men and women were combined.

#### **PRIMARY ANALYSES**

The means and standard deviations for all dependent variables are in **Table 1**.


Note. Standard deviations are in parentheses. Negative affect was calculated without the item "guilty". Means having the same subscript are not significantly different from each other at the .05 significance level.

### **Guilt**

Standardized guilt scores were analyzed using a 3 (personal-cleanliness versus other-cleanliness versus control) × 2 (before versus after manipulation) mixed-model ANOVA. The predicted condition × time interaction was significant, *F*(2,62) = 5.85, *p* = 0.004. As expected, guilt scores did not differ between conditions before the manipulation, *F*(2,62) = 1.57, *p* = 0.22. Thus, random assignment to conditions was successful. After the manipulation, however, guilt scores differed across conditions, *F*(2,62) = 10.75, *p* < 0.001. As expected, guilt scores were lower for participants in the *personal-cleanliness* condition than for participants in either the *other-cleanliness* condition (*d* = 1.05, *p* = 0.013) or the *control* condition (*d* = 1.81, *p* < 0.001). Guilt scores were also lower for participants in the *other-cleanliness* condition than for participants in the *control* condition (*d* = 0.76, *p* = 0.041).
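The comparisons of the three conditions at a single time point take the form of a one-way ANOVA. The following is a minimal sketch of how such an *F* ratio and its degrees of freedom can be computed; the function name `one_way_anova` and the toy scores are illustrative assumptions, not the authors' analysis code or data. With three groups of 21, 22, and 22 participants, the degrees of freedom come out as (2, 62), which matches the values reported in the text.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0  # variability of group means around the grand mean
    ss_within = 0.0   # variability of scores around their own group mean
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy scores only (hypothetical, not data from the study).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova(toy_groups))  # (3.0, 2, 6)
```

A *p* value would additionally require the *F* distribution, which a statistics package normally supplies.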

#### **Positive and negative affect**

We also examined positive and negative affect to be sure our manipulation was specific to guilt. One-way ANOVA (personal-cleanliness versus other-cleanliness versus control) showed no impact of condition on positive affect, *F*(2,62) = 0.71, *p* = 0.50. In addition, there was also no significant impact of condition on any of the other nine negative emotions (i.e., *afraid*, *ashamed*, *distressed*, *hostile*, *irritable*, *jittery*, *nervous*, *scared*, *upset*; *p*s > 0.10), or on all of the other nine negative emotions combined (Cronbach α = 0.82, *p* > 0.50).

There was, however, a significant impact of condition on the single item *guilty*, *F*(2,62) = 10.75, *p* < 0.001. As expected, *guilty* scores were lower for participants in the *personal-cleanliness* condition than for participants in either the *other-cleanliness* condition (*d* = 0.69, *p* = 0.013) or the *control* condition (*d* = 1.24, *p* < 0.001). *Guilty* scores were also lower for participants in the *other-cleanliness* condition than for participants in the *control* condition (*d* = 0.55, *p* = 0.041).

#### **Prosocial behavior**

One-way ANOVA found a significant effect of condition on the number of questionnaires participants completed and returned to the researchers by post, *F*(2,62) = 15.10, *p* < 0.001. As expected, participants in the *personal-cleanliness* condition returned fewer questionnaires than did participants in either the *other-cleanliness* condition (*d* = 0.34, *p* = 0.033) or the *control* condition (*d* = 1.00, *p* < 0.001). Participants in the *other-cleanliness* condition also returned fewer questionnaires than did participants in the control condition (*d* = 1.34, *p* < 0.001).

Because there was a significant correlation between the number of questionnaires taken and the number returned (*r* = 0.74, *p* < 0.001), we also computed the proportion of questionnaires taken that were completed and returned. The effect of condition was still significant, *F*(2,62) = 6.73, *p* = 0.002. As expected, participants in the *personal-cleanliness* condition mailed back fewer questionnaires than did participants in either the *other-cleanliness* condition (*d* = 0.68, *p* = 0.019) or the *control* condition (*d* = 1.01, *p* < 0.001). The latter two conditions did not differ (*d* = 0.34, *p* > 0.24), although the effect-size estimate was not trivial. According to Cohen (1988), *d* = 0.2 is a "small" effect, *d* = 0.5 is a "medium" effect, and *d* = 0.8 is a "large" effect.
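For readers less familiar with the effect-size measure, the sketch below shows one common way to compute Cohen's *d* from two group means and standard deviations, together with Cohen's (1988) benchmarks. The article does not state which formula was used, so the pooled-standard-deviation version, the function names `cohens_d` and `cohen_label`, and the input numbers are all assumptions for illustration. Treating the benchmarks as lower bounds of ranges is also a convention adopted here.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def cohen_label(d):
    """Map |d| onto Cohen's (1988) benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical group summaries, not values from Table 1.
d = cohens_d(10.0, 2.0, 20, 8.0, 2.0, 20)
print(d, cohen_label(d))  # 1.0 large
```

Under this convention, the reported *d* = 0.34 for the difference between the latter two conditions falls in the "small" range.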

# **Mediating effect of guilt**

We also used bootstrapping procedures (Preacher and Hayes, 2004) to test the mediating effects of guilt on the effect of condition (coded +1 = *personal-cleansing*, 0 = *other-cleansing*, −1 = *control*) on the number of questionnaires completed and returned. The results are presented in **Figure 1**; the standardized regression coefficient in parentheses was obtained from a model that included both cleansing and guilt as predictors of prosocial behavior. As can be seen in **Figure 1**, cleansing decreased guilt, and guilt, in turn, was positively related to prosocial behavior. The indirect effect of cleansing on prosocial behavior was significant, 95% confidence interval = −0.37 to −0.32, which excludes the value 0. Nearly identical results were obtained for the proportion of questionnaires returned (i.e., 95% confidence interval was −0.13 to −0.079, which also excludes the value 0).
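The logic of a bootstrapped mediation test can be sketched as follows. This is a simplified percentile bootstrap of the indirect effect (the product of the condition-to-guilt slope and the guilt-to-helping slope controlling for condition), written under stated assumptions: the function names, the number of resamples, the percentile method, and the simulated scores are all hypothetical and are not the authors' procedure or data. Only the condition coding (+1, 0, −1) is taken from the text.

```python
import random


def indirect_effect(x, m, y):
    """a * b, where a is the slope of M on X and b is the slope of Y on M
    in a regression of Y on both X and M."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """95% percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Simulated scores (hypothetical): cleansing lowers guilt, guilt raises helping.
sim = random.Random(1)
condition = [1] * 20 + [0] * 20 + [-1] * 20
guilt = [-1.0 * c + sim.gauss(0, 1) for c in condition]
helping = [0.8 * g + sim.gauss(0, 1) for g in guilt]
print(bootstrap_ci(condition, guilt, helping))
```

An interval that excludes 0, as in the results reported above, is taken as evidence of a significant indirect effect.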

# **DISCUSSION**

The present study showed that one can indeed wash the guilt away by washing one's hands, replicating previous studies and supporting current embodiment theories that argue that abstract concepts (in our case cleanliness and purity) are embodied and reinstantiated by sensory and motor inputs. The present research, however, does not simply replicate previous research; it extends it in three important ways. First, it compared the effect of washing one's own hands versus watching someone else wash their hands. This comparison showed that although watching someone else wash their hands can cleanse some guilt away, it is not as effective as washing one's own hands. Thus, vicarious experience of cleanliness is not as effective as the action of cleansing (i.e., the personal embodiment of cleansing). However, watching someone else wash his or her hands did have an effect on reducing guilt compared with the control condition. Our findings suggest that while watching another person wash his or her hands, the brain simulates the comparable sensory and motor experience so that it induces vicarious feelings of "cleanliness" and primes the concepts of "cleanliness" and "purity", which counteracts and reduces feelings of guilt and its consequent effect on promoting prosociality. However, "top-down" simulation might not be as vivid and convincing as "bottom-up" reactivation, perhaps due to fewer activated neurons in visual and motor modalities. It is also plausible that the concepts of "cleanliness" and "purity" are more likely to be embodied in tactile and olfactory modalities rather than in the visual modality (e.g., Schnall et al., 2008). Our findings contribute to embodiment theories in that they showed the effect of vicarious cleansing on reducing guilt, and that vicarious cleansing may be less effective than personal embodiment of cleansing.

Second, the present research included a measure of prosocial behavior to measure the behavioral effects of washing one's guilt away. Participants could help a Ph.D. student complete her dissertation simply by completing some questionnaires, in the comfort of their own home, and within a lengthy time period (i.e., 3 weeks). As expected, participants who washed their own hands completed the fewest questionnaires within the 3-week period. It is remarkable that the effects of washing one's hands can last up to 3 weeks. The difference in proportion of questionnaires returned between the other-cleanliness and personal-cleanliness conditions suggests that "bottom-up" reactivation might have a longer-lasting effect than "top-down" simulation on activating concepts of "cleanliness" and "purity". Again, this might be attributed to fewer neurons being involved in the embodying process than in the actual experience.

Third, the present study explains why cleansing decreases prosocial behavior. Our mediation analysis showed that cleansing, especially personal-cleansing, reduced guilt. The less guilty participants felt, in turn, the less likely they were to help the Ph.D. student complete her dissertation. No previous study has included all three elements (i.e., cleansing, guilt, and prosocial behavior).

This study, like most studies, raises questions as well as answers them. It is still not clear whether and how guilt *per se* is embodied somewhere in the brain's multi-modal system. Does guilt share the same modalities with concepts such as "cleanliness" and "purity"? Does it demand more interoceptive stimuli inputs? How do self-representations fit into the framework of embodiment? Future research should address these questions. In addition, future research should apply embodied cognition theories to self-conscious emotions (pride, guilt, embarrassment, shame, etc.) whose phylogeny is generally inferred based on reasoning independent of perceptual modalities.

In summary, Pilate and Lady Macbeth probably did feel less guilty after washing their hands, much like the participants in our study. Washing one's hands can wash the guilt away. Unfortunately, washing one's hands of guilt can also reduce prosocial behavior. Although washing one's hands is good for hygiene, it is bad for social relationships.

# **AUTHOR CONTRIBUTIONS**

Hanyi Xu and Laurent Bègue designed the experiment. Hanyi Xu performed the study. Hanyi Xu and Brad J. Bushman conducted the data analyses. The experiment was carried out in the lab of Laurent Bègue. Hanyi Xu, Laurent Bègue, and Brad J. Bushman wrote the main manuscript text. All authors reviewed the manuscript.

#### **ACKNOWLEDGMENTS**

The authors disclosed receipt of the following financial support for the research of this article: University Institute of France/University of Grenoble.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 November 2013; accepted: 08 February 2014; published online: 28 February 2014.*

*Citation: Xu H, Bègue L and Bushman BJ (2014) Washing the guilt away: effects of personal versus vicarious cleansing on guilty feelings and prosocial behavior. Front. Hum. Neurosci. 8:97. doi: 10.3389/fnhum.2014.00097*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Xu, Bègue and Bushman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Homuncular mirrors: misunderstanding causality in embodied cognition

# *Ezequiel P. Mikulan1,2, Lucila Reynaldo1 and Agustín Ibáñez1,2,3,4\**

*<sup>1</sup> Laboratory of Experimental Psychology and Neuroscience, Institute of Cognitive Neurology, Favaloro University, Buenos Aires, Argentina*

*<sup>2</sup> UDP-INECO Foundation Core on Neuroscience, Diego Portales University, Santiago, Chile*

*<sup>3</sup> National Scientific and Technical Research Council, Buenos Aires, Argentina*

*<sup>4</sup> Department of Psychology, Universidad Autónoma del Caribe, Barranquilla, Colombia*

*\*Correspondence: aibanez@ineco.org.ar*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Mariella Pazzaglia, University of Rome La Sapienza, Italy*

#### **Keywords: embodied cognition, mirror neuron system, networks, language understanding, causality**

Emerging theories on embodied cognition have caused high expectations, ambitious promises, and strong controversies. Several criticisms have been explained elsewhere (Mahon and Caramazza, 2008; Cardona et al., 2014) and will not be discussed further here. In this paper, we will focus on a specific explanatory strategy frequently used by radical embodied cognition approaches: the use of homuncular explanations for the explicit (or implicit) attribution of causal roles in language understanding. We first present this criticism regarding a prototypical example: the mirror neuron system (MNS) (Rizzolatti and Craighero, 2004; Iacoboni and Dapretto, 2006) in the field of language understanding and then extend our conclusions to other programs of embodied cognition. Here we discuss the radical claims that propose the MNS as the putative mechanism for multiple cognitive and social psychology constructs (e.g., Gallese, 2008; Cattaneo and Rizzolatti, 2009; Iacoboni, 2009) and the critical role of the MNS in language understanding (Heyes, 2010a; Hickok, 2013).

### **A BIG PROBLEM: HOMUNCULARITY AND CAUSALITY OF THE MNS**

In the homuncular explanation (Clark, 1997; Kolak et al., 2006), a phenomenological description of a cognitive event attributed to a whole person (in this case, language understanding) is granted to a subset of brain regions (in this case, the MNS) by using discrete representations. This is the case for radical MNS accounts. The MNS helps in understanding observed actions by extracting and representing goals or meanings (Rizzolatti et al., 2001; Rizzolatti and Craighero, 2004). The fundamental role proposed for the MNS is that of allowing the individual to understand the goal of the action he/she is observing (Fogassi et al., 2005). Gallese (2006) proposed that the MNS allows one to directly access the understanding of others. The so-called "direct-matching hypothesis" suggests that "an action is understood when its observation causes the motor system of the observer to 'resonate'" (Rizzolatti et al., 2001). Thus, the MNS is proposed as an automatic and mandatory mechanism for understanding (Csibra, 2007).

These "homuncular" approaches to the MNS have favored a plethora of mesmerizing functional explanations, from action to higher social cognition (Heyes, 2010b). In the case of language, the intrinsically linguistic property of "understanding" becomes a property of MNS activation. Contrary to homuncular explanations, current brain network approaches to language (Turken and Dronkers, 2011; Friederici and Gierhan, 2013) have shown that language processing requires an orchestrated coordination of different brain regions indexing different processes. The MNS probably plays an important role in priming or facilitating understanding (or even perhaps in indexing action semantics), but this does not imply that the MNS plays a key role in language *understanding*. Even in action language processing, where the MNS seems to be more engaged, other non-MNS regions (such as specific sites for language processing and motor habits) seem to play an important role (Arbib, 2010; van Dam et al., 2010; Amoruso et al., 2013; Cardona et al., 2013; Ibanez et al., 2013; Sakreida et al., 2013). Thus, a single MNS process explaining the whole phenomenon of understanding seems to be a less fruitful approach when compared with a network view of language processing.

The homuncular explanation attributes a causal role to a specific region regarding a complete function. In this radical approach, instead of considering the MNS as an important hub of a network indexing language properties, the MNS itself seems to generate language understanding. Several radical claims highlighting this causal mechanism in language understanding have been proposed. For example, Pulvermüller (2005a) wrote: ". . . words that denote internal states, such as 'pain' or 'disgust,' can be understood *only* because both speaker and listener can relate them to similar motor programs..." (italics mine); and furthermore: "understanding language means relating language to one's own actions." Aziz-Zadeh et al. (2006) declare: "these results suggest a key role of mirror neuron areas in the re-enactment of sensory-motor representations during conceptual processing of actions invoked by linguistic stimuli" (see also Zarr et al., 2013).

Considering the MNS as a causal explanatory mechanism for language understanding appears to be a pseudo-explanation. The homuncular, metonymic attribution of language understanding as a causal property of the MNS involves nothing but a lack of explanation. In spite of these radical claims about the MNS, to our knowledge there is no canonical or putative mechanistic explanation for language understanding based on the MNS. By definition, the MNS contains mirror neurons and other neurons for matching the observation and execution of action (Rizzolatti and Craighero, 2004; Iacoboni and Dapretto, 2006). How does the MNS generate or produce understanding? Just by resonating when observing or executing actions? The MNS property of being activated when observing/executing actions is not an explanation of how language understanding emerges. At the very least, language understanding requires syntactic and semantic access, memory, executive functions, and other language-specific knowledge (Binder et al., 1997; Friederici, 2011; Price, 2012). A subset of neurons in an artificial system can easily be trained to respond to action observation/execution, mimicking the basic definition of the MNS. Nevertheless, this property by itself will surely not generate language understanding. The main problem with the explanation of language understanding as MNS activity is that there is no real explanation at the level of language content.
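The point about artificial systems can be made concrete with a toy sketch. It is purely illustrative and not a model taken from the literature: it assumes a single perceptron unit, a hypothetical four-feature binary input coding (seeing a grasp, performing a grasp, seeing another action, performing another action), and hypothetical function names (`fire`, `train`). The unit is trained to fire whenever a grasp is either observed or executed.

```python
from itertools import product

# Hypothetical input coding:
# [sees_grasp, performs_grasp, sees_other_action, performs_other_action]
samples = [list(bits) for bits in product([0, 1], repeat=4)]
# "Mirror" target: fire when a grasp is observed OR executed.
targets = [1 if (s[0] or s[1]) else 0 for s in samples]


def fire(weights, bias, x):
    """Threshold unit: 1 if the weighted input exceeds zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train(samples, targets, max_epochs=1000):
    """Classic perceptron learning rule, stopped after an error-free epoch."""
    weights, bias = [0.0] * 4, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(samples, targets):
            delta = t - fire(weights, bias, x)
            if delta:
                errors += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if errors == 0:
            break
    return weights, bias


weights, bias = train(samples, targets)

print("observe grasp:", fire(weights, bias, [1, 0, 0, 0]))
print("execute grasp:", fire(weights, bias, [0, 1, 0, 0]))
print("observe other:", fire(weights, bias, [0, 0, 1, 0]))
print("execute other:", fire(weights, bias, [0, 0, 0, 1]))
```

The trained unit satisfies the observation/execution matching property for the grasp, yet nothing in its weights amounts to syntactic, semantic, or any other language-related knowledge.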

Is MNS activation a cause or an accompanying effect of language understanding? There is a lack of empirical evidence for the putative causal role of MNS in language understanding. In cognitive neuroscience, there are illustrative examples of the causal role of an area in a function. For example, electrical stimulation of the anterior insula triggers the experience of disgust (Caruana et al., 2011). Similarly, electrical stimulation of the fusiform gyrus can selectively disrupt face perception (Parvizi et al., 2012). Therefore, we can conclude that the insula and the fusiform gyrus have a causal and critical role in the experience of disgust and in face perception, respectively. Those cases do not have a full causal explanation (in the Aristotelian sense of an "efficient" cause) because these regions are connected with several other brain regions whose involvement also affects the emotional and perceptual response. Nevertheless, it is still possible to suggest a critical role of these regions in the generation of the disgust experience or in facial perception.

Focal lesion studies may provide more direct answers to these questions (Rorden and Karnath, 2004). Reports on aphasic and apraxic patients fully support the embodied nature of cognition. However, these have yielded controversial results regarding the causal explanations of "understanding." Overlaps (Rothi et al., 1985; Saygin et al., 2003; Nelissen et al., 2010) and dissociations (Rothi et al., 1991; Mahon and Caramazza, 2005) between language and action networks have been reported. In any case, the overlap is not enough to assert that understanding occurs as an effect of motor resonance or to establish a unidirectional causal explanation. Experiments in which researchers are able to show a given region's critical role in a specific function are extremely scarce in MNS research regarding language understanding. To our knowledge, there is no single experiment demonstrating that MNS activity plays a causal role in language understanding instead of merely reflecting it. Thus, the strong claims about the causal role of the MNS in language understanding contrast with the scarce available evidence.

Most of the evidence regarding the MNS and action language is centered around facilitation effects, i.e., understanding is not dependent on MNS activation (measured directly or indirectly), but is only facilitated by it. For example, Pulvermüller et al. (2005b) showed that Transcranial Magnetic Stimulation (TMS) of the hand area in the left hemisphere led to faster responses to hand-related words in a lexical decision task, while stimulating the leg area in the left hemisphere had the same effect on leg-related words. This effect was not present in control conditions (stimulating the right hemisphere and sham stimulation). Similarly, Tucker and Ellis (2004) found a response compatibility effect when subjects used an input device that required either a power or a precision grip to indicate whether objects that required either type of grip were natural or man-made. Responses were faster when the presented object (picture or word) required the same grip type as the input device. Most studies show language understanding as a capacity that is facilitated by MNS involvement or attenuated by MNS disruption. Nevertheless, no studies have assessed interfered or abolished understanding, or shown a critical dependence on the MNS. Thus, evidence suggests that the MNS reflects the effect of understanding rather than causing it (Hickok, 2013). The MNS might play an important role in general associative learning (Heyes, 2010b; Cooper et al., 2013) or a specific facilitation/priming effect in language understanding, but not a causal role in understanding by itself. There is no doubt about the activation of the MNS during execution and observation, but several concerns arise when this activation is interpreted as a causal explanatory mechanism in several cognitive domains.

# **CAUSAL EXPLANATIONS IN NEUROSCIENTIFIC EMBODIED COGNITION**

The notion of a causal role for MNS in language understanding is a prototypical example of a radical claim that a single region subserves understanding. In the language domain, other similar causal explanations have been proposed. The Embodied Semantics theory claims that processing the meaning of a concept recruits the same neural networks that underlie the perceptual and motor experiences associated with it (Gallese and Lakoff, 2005). In other words, regions that are activated during action observation and action execution should also be activated during the comprehension of words referring to those actions. It has been reported that this activation follows a somatotopical pattern (Hauk et al., 2004; Pulvermüller, 2005a), that is, leg concepts ("kicking") activate the homuncular leg area in the motor cortex and mouth concepts ("eating") activate the mouth area. Even though evidence has shown this type of activation pattern, the match is not exact and the overlap is inconsistent within and across different studies (Postle et al., 2008; Turella et al., 2009; Arbib, 2010; Fernandino and Iacoboni, 2010; Arevalo et al., 2012). Other regions that have been implicated in tasks involving the processing of linguistic stimuli are the prefrontal cortex, the temporal lobe and the cerebellum (Arbib, 2010). Furthermore, lesions to the motor cortex do not necessarily cause deficits in action-word processing (Saygin et al., 2004). In sum, although there is motor and premotor activation when processing language (Glenberg et al., 2008) and linguistic comprehension might be enhanced by it, there is no conclusive empirical evidence showing that this is a sufficient mechanism for linguistic understanding (Fischer and Zwaan, 2008).

Other areas of embodied cognition, including radical MNS approaches to action understanding, imitation, emotion, and social cognition, present the same potential pitfall: the temptation to use a simplistic homuncular explanation for the phenomenon of understanding through a single resonating brain area. Current brain network views and non-MNS accounts of classical domains such as action observation/recognition (Buccino et al., 2004; Kokal et al., 2009), imitation (Molenberghs et al., 2009), language (Grodzinsky et al., 2000; Hickok and Poeppel, 2007; Friederici, 2011) and social cognition (emotion, empathy, and theory of mind; Baird et al., 2011; Decety et al., 2012; Ibanez and Manes, 2012; Kennedy and Adolphs, 2012) can be integrated with the experience-based and situated nature of cognition without appealing to a simplistic execution-observation matching system or attributing the cognitive phenomenon of interest to a single brain region. Although our experience is embodied, our emotions are embodied, and even our culture is embodied, this does not mean *ipso facto* that the activation of discrete hypothetical representations in a single region would be enough to explain the emergence of understanding. In other words, emotions, language and culture are grounded (Barsalou, 2008) in our bodily experiences, but this does not necessarily mean that there is a simple isomorphism between the actual body and the spatiotemporally-distributed activity of body signals in the brain (Berlucchi and Aglioti, 2010).

The extremely significant emergence of embodied cognition, highlighting the role of the body, emotions and culture as well as the subjective experience in shaping the human mind, can and should be detached from a simplistic and at the same time radical homuncular view that human cognitive understanding is ruled by a single discrete brain area.

#### **ACKNOWLEDGMENTS**

This work was partially supported by grants from CONICET, CONICYT/FONDECYT Regular (1130920), FONCyT-PICT 2012-0412, FONCyT-PICT 2012-1309, and the INECO Foundation.

# **REFERENCES**


evidence from aphasia. *Brain* 126(pt 4), 928–945. doi: 10.1093/brain/awg082


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 February 2014; accepted: 23 April 2014; published online: 13 May 2014.*

*Citation: Mikulan EP, Reynaldo L and Ibáñez A (2014) Homuncular mirrors: misunderstanding causality in embodied cognition. Front. Hum. Neurosci. 8:299. doi: 10.3389/fnhum.2014.00299*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Mikulan, Reynaldo and Ibáñez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Mechanical knowledge, but not manipulation knowledge, might support action prediction

# *François Osiurak1,2\**

*<sup>1</sup> Laboratoire d'Etude des Mécanismes Cognitifs (EA 3082), Université de Lyon, Bron Cedex, France*

*<sup>2</sup> Institut Universitaire de France, Maison des Universités, Paris, France*

*\*Correspondence: francois.osiurak@univ-lyon2.fr*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Laurel Buxbaum, Moss Rehabilitation Research Institute, USA*

**Keywords: affordance, apraxia, manipulation knowledge, mechanical knowledge, tool use**

### **A commentary on**

# **The affordance-matching hypothesis: how objects guide action understanding and prediction**

*by Bach. P., Nicholson, T., and Hudson, M. (2014). Front. Hum. Neurosci. 8:254. doi: 10.3389/fnhum.2014.00254*

Bach et al. (2014) proposed a novel model of action understanding, the affordance-matching hypothesis, to explain how people both interpret and predict actions of others. This model is based on two types of information. The first is function knowledge and is supposed to inform people about the goals that can be achieved with tools. The second is manipulation knowledge and is thought to provide information about the motor behaviors required to achieve these goals. In their model, function knowledge and manipulation knowledge support action interpretation and action prediction, respectively. Here, I mainly discuss the idea that manipulation knowledge might be central to action prediction.

The distinction made by Bach et al. (2014) between function knowledge and manipulation knowledge is inspired to some extent by part of the literature on apraxia (e.g., Buxbaum and Saffran, 2002; van Elk et al., 2014). In their model, function knowledge is viewed as storing information about the goals of tools, namely, their usual function<sup>1</sup>. For instance, as they wrote, people know that "a tap is for getting water." By contrast, manipulation knowledge would be useful to determine the motor behaviors required to use tools (e.g., knowing that a tap requires turning it clockwise). This way of conceptualizing the cognitive bases of human tool use has, however, been intensively debated in recent years. Particularly, a growing body of evidence indicates a strong link in left brain-damaged apraxic patients between the ability to actually use familiar tools (i.e., the use of a tool with its corresponding object, such as a hammer with a nail) and the ability to use novel tools to solve mechanical problems (Goldenberg and Hagmann, 1998; Goldenberg and Spatt, 2009; see also Osiurak et al., 2009; Jarry et al., 2013; Osiurak et al., 2013). In line with this, it has been proposed that mechanical knowledge, but not manipulation knowledge, might be central to tool use, by allowing people to reason about physical object properties (Osiurak et al., 2010, 2011; Goldenberg, 2013; Osiurak, 2014). Contrary to Bach et al. (2014), the mechanical knowledge hypothesis posits that what people learn when using a tap is not that a clockwise rotation of the hand is needed, but rather that a clockwise rotation of the tap is needed. In this framework, motor behaviors are adjusted on-line on the basis of the prediction of the tool use action to be done. Interestingly, a strong link between mechanical knowledge and the left inferior parietal lobe has also been documented, challenging the role of this cerebral region for the storage of manipulation knowledge (for reviews, see Goldenberg, 2013; Orban and Caruana, 2014; Osiurak, 2014).

Another important aspect concerns the role of function knowledge. Patients with a selective impairment of function knowledge have been shown to be still able to actually use familiar tools with their corresponding objects as well as to use novel tools to solve mechanical problems (for a review, see Osiurak et al., 2011). In other words, function knowledge is neither sufficient nor necessary for tool use (Buxbaum et al., 1997). So the intriguing issue is, what is the role of function knowledge? It has been recently proposed that function knowledge might be useful for determining the social usages associated with tools (Osiurak et al., 2010, 2011; see also Goldenberg, 2013; Osiurak, 2014). For example, function knowledge can help someone to know that a knife can be used to cut tomatoes or meat, open an envelope, peel a fruit, and so on. However, this knowledge is not viewed as supporting tool use *per se*. After all, people can know that a stethoscope can be found in a medical context and that its function is to "listen to the heart" without being able to use it properly. To do so, mechanical knowledge is required. Consequently, as Bach et al. (2014) suggested, function knowledge can indeed be of primary interest to interpret the actions of others, by determining the potential goals of the action as a function of the context and of the social usages associated with the tool.

Having said this, I propose to revise their model by modifying the idea that action prediction is supported by manipulation knowledge (see **Figure 1**). Rather, I assume that people might predict the outcomes of the actions made by others by using mechanical knowledge. To illustrate this, let us come back to an example given by Bach et al. (2014). As they stated: "Imagine, for example, the unpleasant situation of standing across from another person holding a gun. Object knowledge specifies that a gun is for shooting (function knowledge), and that, in order to achieve this goal, the gun would have to be raised, pointed at the target, and fired (manipulation knowledge)" (Bach et al., 2014; p. 3). In this example, Bach et al. (2014) implied that the position of the gun to be correctly used as well as its utilization derive from manipulation knowledge. However, it is also possible to stress that mechanical knowledge is needed to guide the user to correctly position the gun and to use it. In addition, the issue is how manipulation knowledge can help you to know that the bullet can kill you. This is entirely independent of the motor behaviors of the user. However, this prediction can vary according to whether you wear a bulletproof vest or not. In other words, to know whether the bullet will kill you or not, you need mechanical knowledge to compare the physical properties of the bullet with those of your body or of your bulletproof vest. Again, in this case, manipulation knowledge is absolutely unnecessary to predict the outcomes of the action.

<sup>1</sup>Note that, contrary to Bach et al. (2014), many authors even assume that function knowledge is the basis for predicting actions of others because it might contain information about the specific actions associated with the physical properties of tools (e.g., van Elk et al., 2014). Nevertheless, I will not discuss this aspect in more detail here.

In sum, the model proposed by Bach et al. (2014) provides an appropriate account to think about the potential sources of information at the basis of action interpretation and prediction. However, I am not convinced that manipulation knowledge is the appropriate theoretical construct that can explain how people predict the actions of others. Before concluding, I would like to emphasize that the revised model I propose can be viewed as a strong version of the mechanical knowledge hypothesis, excluding any role for manipulation knowledge in action understanding. This might appear surprising considering the significant literature supporting the importance of this knowledge for action and object representation (for recent publications, see Yee et al., 2013; Buxbaum, 2014; Buxbaum et al., 2014a,b). In a way, there is here an apparent discrepancy raising the key issue of whether the brain stores mechanical and/or manipulation knowledge. The available evidence is not sufficient to answer it, suggesting interesting perspectives for future research in the field.

# **ACKNOWLEDGMENTS**

This work was supported by grants from ANR (Agence Nationale pour la Recherche; Project Démences et Utilisation d'Outils/Dementia and Tool Use, N° ANR 2011 MALZ 006 03), and was performed within the framework of the LABEX CORTEX (ANR-11-LABX-0042) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

# **REFERENCES**

Bach, P., Nicholson, T., and Hudson, M. (2014). The affordance-matching hypothesis: how objects guide action understanding and prediction. *Front. Hum. Neurosci.* 8:254. doi: 10.3389/fnhum.2014.00254


1964–1972. doi: 10.1016/j.neuropsychologia.2013.06.017


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 June 2014; accepted: 02 September 2014; published online: 17 September 2014.*

*Citation: Osiurak F (2014) Mechanical knowledge, but not manipulation knowledge, might support action prediction. Front. Hum. Neurosci. 8:737. doi: 10.3389/fnhum.2014.00737*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Osiurak. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The affordance-matching hypothesis: how objects guide action understanding and prediction

# **Patric Bach\*, Toby Nicholson and Matthew Hudson**

School of Psychology, University of Plymouth, Drake Circus, Devon, UK

#### **Edited by:**

Analia Arevalo, East Bay Institute for Research and Education, USA

#### **Reviewed by:**

Cosimo Urgesi, University of Udine, Italy

Sebo Uithol, Università degli Studi di Parma, Italy

#### **\*Correspondence:**

Patric Bach, School of Psychology, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK

e-mail: patric.bach@plymouth.ac.uk

Action understanding lies at the heart of social interaction. Prior research has often conceptualized this capacity in terms of a motoric matching of observed actions to an action in one's motor repertoire, but has ignored the role of object information. In this manuscript, we set out an alternative conception of intention understanding, which places the role of objects as central to our observation and comprehension of the actions of others. We outline the current understanding of the interconnectedness of action and object knowledge, demonstrating how both rely heavily on the other. We then propose a novel framework, the affordance-matching hypothesis, which incorporates these findings into a simple model of action understanding, in which object knowledge—what an object is for and how it is used—can inform and constrain both action interpretation and prediction. We will review recent empirical evidence that supports such an object-based view of action understanding and we relate the affordance matching hypothesis to recent proposals that have re-conceptualized the role of mirror neurons in action understanding.

**Keywords: affordances, action understanding, action prediction, object function, object manipulation**

# **ACTION UNDERSTANDING IN AN OBJECT CONTEXT: THE AFFORDANCE-MATCHING HYPOTHESIS**

Action understanding lies at the heart of social interaction. Knowing the goal of another person's action allows one to infer their internal states, predict what they are going to do next, and to coordinate one's own actions with theirs (Hamilton, 2009; Sebanz and Knoblich, 2009; Bach et al., 2011). The ability to understand others' actions is often assumed to rely on specialized brain systems that "directly map" observed motor acts to a corresponding action in the observer's motor repertoire, allowing it to be identified and its goal to be derived (Rizzolatti et al., 2001; Gazzola and Keysers, 2009). In monkeys, mirror neurons have been discovered that fire both when the monkey executes a particular action, and when it merely observes the same actions being executed by someone else (Pellegrino et al., 1992; Gallese et al., 1996). Also for humans, there is now converging evidence that action observation engages neuronal ensembles also involved in action execution, and that these ensembles code specific actions across both domains (Fadiga et al., 1995; Chong et al., 2008; Mukamel et al., 2010; Oosterhof et al., 2010, 2012).

Yet, even though there remains little doubt that action-related representations are also activated when one observes others act, attempts to directly link these activations to goal understanding have been less successful. There is little evidence from lesion or transcranial magnetic stimulation studies that would reveal a critical role of motor-related brain areas for understanding the actions of others (Catmur et al., 2007; Negri et al., 2007; Kalénine et al., 2010; but see Avenanti et al., 2013b; Rogalsky et al., 2013). Similarly, whereas some imaging studies revealed an involvement of mirror-related areas in action understanding tasks, such as the inferior frontal gyrus or the anterior intraparietal sulcus (Iacoboni et al., 2005; Hamilton and Grafton, 2006), a growing number of studies point to areas outside the classical observation-execution matching system, such as the medial prefrontal cortex, the superior temporal sulcus, or the posterior temporal lobe (Brass et al., 2007; de Lange et al., 2008; Liepelt et al., 2008b; Kalénine et al., 2010). Others reveal that mirror-related brain activations are primarily found for meaningless actions, where kinematics is the only information available (Hétu et al., 2011), substantially limiting the theoretical reach of motoric matching accounts. Finally, there are theoretical reasons why motor or kinesthetic information, on which direct matching is assumed to be based, does not suffice to unambiguously identify the goals of complex human motor acts. For example, most human motor behaviors (e.g., picking up something) can be performed in various circumstances to achieve a variety of goals, such that a one-to-one mapping of actions to goals is not possible (e.g., Hurford, 2004; Jacob and Jeannerod, 2005; Uithol et al., 2011).

These observations have posed a challenge to motor-matching views of action understanding, and have led several theorists to suggest either that the direct-matching account has to be revised, or that motoric matching cannot be the primary driver of action understanding in humans (Bach et al., 2005, 2011; Csibra, 2008; Kilner, 2011). Here, we propose a new view, which incorporates the available data on motoric matching and mirror neurons, but places them in a model of action understanding that emphasizes the role of object knowledge, which helps predict and interpret any observed motor act. Such a combined model, we argue, can explain extant data and account for several of the observed inconsistencies. In the following, we will (1) briefly review the current understanding of action knowledge associated with objects; (2) sketch a basic model of how this knowledge could contribute to action understanding, and (3) review common findings in humans and monkeys on the use of object-related knowledge in action observation in the light of this model.

Throughout the manuscript we use the term "goal" to refer to desired states of the environment, one's own body, or mind. Following Csibra (2008), we presuppose that goals can be located at different levels, reaching from simple, low-level goals, such as completing a grasp or hammering in a nail, to distal goals such as hanging up a picture frame. We use the term "action" to refer to bodily movements that are performed with the express purpose of achieving such a goal. The terms "target objects" and "recipient objects" are used to refer to the objects affected by these actions.

# **ACTION INFORMATION PROVIDED BY OBJECTS**

The effective use of objects sets humans apart from even their closest relatives in the animal kingdom (e.g., Johnson-Frey, 2003). Most human actions involve objects, either as the recipient to be *acted upon*, or as a tool to be *acted with* (cf. Johnson-Frey et al., 2003). The capacity to use objects has unlocked a vast range of effects humans can achieve in the environment that would otherwise be outside the scope of their effector systems. They range from cutting with a knife, shooting a gun, to sending a text message with a mobile phone, and traveling the world with various types of vehicle.

The capacity for using these objects is underpinned by a specialized network in the left hemisphere, spanning frontal, parietal and temporal regions (Haaland et al., 2000; for reviews, see Johnson-Frey, 2004; Binkofski and Buxbaum, 2013; van Elk et al., 2013), some of which appear to be unique to humans (Orban et al., 2006; Peeters et al., 2009, 2013). This network supports object-directed action by coding (at least) two types of information. For every object, humans learn not only what goals they can, in principle, achieve with it ("function knowledge"), but also the motor behaviors that are required to achieve these goals ("manipulation knowledge") (Kelemen, 1999; Buxbaum et al., 2000; Buxbaum and Saffran, 2002; Casby, 2003; for a review, see van Elk et al., 2013). When growing up, one learns, for example, that a tap is for getting water, and that this requires turning it clockwise. Similarly, one learns that a knife is for cutting, and that this requires alternating forward and backward movements, with an amount of downward pressure that depends on the object one wants to cut. Objects, therefore, seem to provide one with the same links between potential action outcomes and required motor behaviors that are central to the control of voluntary action (see Hommel et al., 2001). These links allow objects to act as an interface between an actor's goals and their motor system (cf. van Elk et al., 2013). They allow actors not only to decide *whether* they want to use an object (by matching object functions to one's current goals), but also—if they do—to derive *how* to utilize the object to achieve the desired result (by using manipulation knowledge to guide one's motor behaviors with the object).

Whenever people interact with objects, at least some aspects of this knowledge are activated automatically (e.g., Bub et al., 2003, 2008). In the monkey premotor cortex, so-called *canonical neurons* have been discovered that fire not only when the monkey executes a specific grip (e.g., a precision grip), but also if it merely observes an object which requires such a grip (a small object such as a peanut), indicating a role in linking objects to actions (Murata et al., 1997). Similar evidence comes from behavioral and imaging studies in humans. Passively viewing an object, for example, has been shown to activate not only the basic movements for reaching and grasping it (e.g., Tucker and Ellis, 1998, 2001; Grèzes et al., 2003; Buccino et al., 2009), but also—under appropriate circumstances—the more idiosyncratic movements required for realizing the object's specific functions (e.g., the swinging movement required to hammer in a nail; Creem and Proffitt, 2001; Bach et al., 2005; Bub et al., 2008; van Elk et al., 2009; for a review, see van Elk et al., 2013).

Action information is such a central aspect of human object knowledge that it directly affects object identification and categorization. Already in 12-month-old infants, object function contributes to object individuation and categorization (e.g., Booth and Waxman, 2002; Kingo and Krøjgaard, 2012). In adults, several studies have shown that an object is identified more easily when preceded by an object with either a similar or complementary function (e.g., corkscrew, wine bottle) (e.g., Riddoch et al., 2003; Bach et al., 2005; McNair and Harris, 2013), or one that requires similar forms of manipulation (e.g., both a piano and a keyboard require typing, Helbig et al., 2006; McNair and Harris, 2012). These results are mirrored on a neurophysiological level by fMRI repetition suppression effects for objects associated with similar actions, even when these objects are only passively viewed (e.g., Yee et al., 2010; Valyear et al., 2012).

Other studies document the tight coupling of function and manipulation knowledge (see van Elk et al., 2013 for a review). Several imaging studies have revealed at least partially overlapping cortical representations for function and manipulation knowledge (Kellenbach et al., 2003; Boronat et al., 2005; Canessa et al., 2008). Similarly, it has been known for a long time that lesions to the left-hemispheric tool networks disrupt knowledge not only of what the objects are "for" (the goals that can be achieved with them) but also knowledge of how they have to be used, while disruptions of function knowledge alone are rare (Ochipa et al., 1989; Hodges et al., 1999; Haaland et al., 2000; Buxbaum and Saffran, 2002; Goldenberg and Spatt, 2009). In addition, there is a host of studies demonstrating that the activation of manipulation knowledge is tied to the prior activation of function/goal information, both on the behavioral (Bach et al., 2005; van Elk et al., 2009; McNair and Harris, 2013) and on the neurophysiological level (Bach et al., 2010b). For example, in a recent study based on Tucker and Ellis' (1998) classic affordance paradigm, it was shown that which of an object's manipulations was retrieved (grasping for placing or for functional object use) was determined by which goal was suggested by the surrounding context (see also Valyear et al., 2011; Kalénine et al., 2013).

# **THE AFFORDANCE-MATCHING HYPOTHESIS**

The basic assumption of the affordance-matching hypothesis is that manipulation and function knowledge about objects can be used not only during action execution, but also for predicting and understanding the actions of others. In the same way as object function and manipulation knowledge can act as the interface between one's own goal and motor systems, it can provide one with similar links between the inferred goals of others and their likely motor behaviors.

The affordance-matching hypothesis has two main features. The first feature is the assumption that whenever we see somebody else in the vicinity of objects, the associated function and manipulation knowledge is retrieved (see **Figure 1**, top panel, cf. Rochat, 1995; Stoffregen et al., 1999; Costantini et al., 2011; Cardellicchio et al., 2013; for a review, see Creem-Regehr et al., 2013), constrained by further contextual cues such as other objects or social signals (see below). As is the case for one's own actions, this provides the observer with immediate knowledge about the potential goals of the actor (through function knowledge: what the objects are *for*), as well as the bodily movements that would be required to achieve these goals (through manipulation knowledge: *how* the objects have to be used). Imagine, for example, the unpleasant situation of standing across from another person holding a gun. Object knowledge specifies that a gun is for shooting (function knowledge), and that, in order to achieve this goal, the gun would have to be raised, pointed at the target, and fired (manipulation knowledge). Thus, simply deriving function and manipulation knowledge about the objects somebody acts with—without taking into account the specific motor behavior they perform—can serve both interpretative and predictive roles. Function knowledge supports *action interpretation* because knowledge about what an object is for provides insights into the potential goals of the other person. In contrast, manipulation knowledge aids *action prediction*, because knowledge about how an object is handled highlights potentially forthcoming actions, supporting more efficient identification and interaction.

The second major feature of the affordance-matching hypothesis is the assumption that, as during action production, an object's function and manipulation knowledge are coupled, so that activating one also activates the other. This coupling substantially enhances the predictive and interpretative contributions of object knowledge, depending on whether information flows from function to manipulation knowledge or vice versa (**Figure 1**, middle and lower panel). Consider, for example, that most objects have multiple uses—even the gun could be given to someone, holstered, or harmlessly laid on a table—and there are typically multiple objects in a scene, each associated with a number of functional manipulations. We assume that these objects are not weighted equally during action observation. Instead, as is the case during one's own action planning (e.g., Valyear et al., 2011; Kalénine et al., 2013), those objects whose functions are most in line with the (inferred) goals of the actor will be highlighted. Moreover, because object knowledge ties these functions to specific manipulations, the identification of such a functionally matching object can directly activate the associated motor behaviors, leading to action predictions that are in line with the inferred goals (**Figure 1**, middle panel).

Previous research has established that additional objects in the environment, especially potential recipients of the action, are another major determinant for which action goals are preactivated. Seeing a person holding a hammer might activate hammering movements to a stronger extent when this person is also holding a nail than when they are holding a toolbox (cf. Bach et al., 2005, 2009, 2010b; Yoon et al., 2012; McNair and Harris, 2013). Social cues are another important influence, as cues such as gaze or emotional expression can directly supply action goals. In the above example, if the person shows an angry facial expression and tone of voice, their actions of raising the arm and pulling the trigger will be foremost in our mind (**Figure 2**, upper panel), while a calm voice and friendly manner might at least make us consider the other possible meaningful actions one can do with a gun.

**Figure 1** (caption, partially recovered). [...] functions. The associated manipulation knowledge predicts forthcoming movements. Bottom panel: flow of information during action interpretation. Observed behavior that matches an object's manipulation activates the corresponding function, which in turn provides information about the actor's goal.

**Figure 2** (caption, partially recovered). [...] **prediction and interpretation**. Top panel: Action prediction. Prior knowledge of an actor's goal (shooting) activates knowledge of objects with corresponding function. The associated manipulation knowledge (raising the arm, pulling the trigger) supports action prediction by biasing visual perception towards these manipulations. Lower panel: Action interpretation. Observed behavior is matched to the manipulations supported by the object. If both match, the corresponding functions are activated, providing likely goals of the actor.

Here, therefore, the flow of information from object function to manipulation aided action prediction. In contrast, the interpretation of observed motor behavior can benefit from the reverse flow of information: from manipulation to function knowledge. Note that, in many cases, an observed motor act is, by itself, devoid of meaning. The same—or at least very similar—motor act can be used for various purposes. Consider the everyday actions of inserting a credit card into a cash machine, or a train ticket into a ticket canceller. Motorically, both actions are virtually identical, but they serve very different goals (cf. Bach et al., 2005, 2009, 2010b; Jacob and Jeannerod, 2005). However, knowledge about the objects involved can directly disambiguate such alternative interpretations. Because object knowledge links the different manipulations of a tool with distinct functions, the detection of a motor behavior that matches such a manipulation can directly confirm the associated action goal (**Figure 1**, lower panel). In the above example, if the person with the gun in the hand indeed raises their arm, the interpretation is clear: with a gun in the hand, the otherwise meaningless motion of raising the arm is predicted by the goal of shooting (**Figure 2**, lower panel).

This interpretative role of object knowledge becomes particularly important if one considers that not only motor acts are ambiguous, but the functions of objects are as well. Some objects can be handled in different ways, and produce different outcomes. For example, a fork can be used to spear a carrot (in order to subsequently eat it) or to mash it. Here, the object context is identical and therefore does not allow one to anticipate one of these goals. However, a match of the actually observed motor behavior with one of the object's functional uses immediately provides such disambiguating information. As a consequence, just seeing how the fork is held may be enough to disambiguate its subsequent use.

Together, therefore, the affordance-matching hypothesis specifies the different pathways of how objects—via the associated function and manipulation knowledge—can make powerful contributions to both action interpretation and action prediction. For descriptive purposes, the flow of information through these pathways has been described mostly separately. Of course, interpretation and prediction in most cases interact strongly, with one constantly influencing the other. For example, a confirmed action prediction will verify inferred action goals, which, in turn, will trigger new action predictions, that can be either confirmed or disconfirmed by new sensory evidence.
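The two pathways summarized above can be illustrated with a toy sketch. It is not a model proposed in this article: the dictionary `OBJECT_KNOWLEDGE`, the function names, and the verbal labels for manipulations are all hypothetical and chosen only to mirror the gun and fork examples. The sketch assumes that object knowledge can be reduced to a lookup table pairing each function with one manipulation, which is a strong simplification.

```python
# Toy object knowledge: each object pairs functions (goals) with manipulations.
# All entries are illustrative assumptions based on the examples in the text.
OBJECT_KNOWLEDGE = {
    "gun": {
        "shooting": "raise arm and pull trigger",
        "handing over": "extend arm with grip offered",
        "putting away": "lower arm to holster",
    },
    "fork": {
        "spearing": "stab downward with tines",
        "mashing": "press down with flat of tines",
    },
}


def predict_manipulations(inferred_goal, visible_objects):
    """Function -> manipulation: movements predicted by an inferred goal."""
    return {
        obj: OBJECT_KNOWLEDGE[obj][inferred_goal]
        for obj in visible_objects
        if inferred_goal in OBJECT_KNOWLEDGE.get(obj, {})
    }


def interpret_action(obj, observed_manipulation):
    """Manipulation -> function: goals confirmed by an observed movement."""
    return [
        function
        for function, manipulation in OBJECT_KNOWLEDGE.get(obj, {}).items()
        if manipulation == observed_manipulation
    ]


# Prediction: a contextually inferred goal highlights the matching object
# and the movement expected with it.
print(predict_manipulations("shooting", ["gun", "fork"]))
# Interpretation: the same object context, disambiguated by the observed movement.
print(interpret_action("fork", "press down with flat of tines"))
```

In this simplified form, prediction and interpretation are two directions of lookup over the same table, which is one way to read the claim that function and manipulation knowledge are coupled.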

# **EVIDENCE FOR AFFORDANCE MATCHING IN ACTION OBSERVATION**

Several recent studies have documented the major role of object information in action understanding (e.g., Hernik and Csibra, 2009; Hunnius and Bekkering, 2010; Bach et al., 2014). They not only show that object-based modes of action understanding can complement the more motoric modes that have been the focus of most prior work (e.g., Boria et al., 2009), but also support the more specific interactions between object and motor information predicted by the affordance-matching hypothesis. In the following, we will briefly review some important findings.

# **OBJECT MANIPULATION KNOWLEDGE GUIDES ACTION PREDICTION**

The affordance-matching hypothesis posits that people derive manipulation knowledge not only for the objects relevant to their own goals, but also for the objects relevant to the goals of others (for a similar argument, see Creem-Regehr et al., 2013). This knowledge directly constrains the motor behaviors expected from the other person, allowing for efficient action prediction. Indeed, there is ample evidence from studies in children and adults that human observers do not only interpret actions *post hoc*, but actively predict how they will continue (e.g., Flanagan and Johansson, 2003; Falck-Ytter et al., 2006; Uithol and Paulus, 2013). Several studies have demonstrated that these predictions are directly informed by objects and knowledge about the movements required for their effective manipulation. Hunnius and Bekkering (2010), for example, have revealed that when children observe others interacting with objects, their gaze reflects their predictions about the actions to follow. Seeing somebody reach and grasp a cup, therefore, evokes gaze shifts towards the mouth, while seeing somebody grasp a telephone evokes gaze shifts towards the ear, providing direct evidence that an object's typical manipulation can guide action prediction.

Studies on adults similarly support the notion that observers routinely rely on object knowledge to predict forthcoming actions. A range of studies has established that when people see somebody else next to an object, the most effective grip to interact with it is activated, as if they were in the position of the observed actor (cf. Costantini et al., 2011; Cardellicchio et al., 2013). Moreover, consistent with the affordance-matching hypothesis, the activation of these actions has a predictive function and biases perceptual expectations towards these actions. In a recent study by Jacquet et al. (2012), participants identified, in a condition of visual uncertainty, complete and incomplete object-directed actions. For each object, an optimal (low biomechanical cost) and sub-optimal (high biomechanical cost) movement was presented. As predicted from affordance matching, participants more easily identified the movements optimally suited to reach a given object, in line with the idea that extracted affordances had biased visual perception towards these actions.

Other studies confirm that contextual information about the currently relevant action goals guides attention towards relevant objects (Bach et al., 2005, 2009, 2010b; van Elk et al., 2009). Social cues—particularly another person's gaze—are one such source of information (see Becchio et al., 2008 for a review). In human actors, gaze is typically directed at the target of an action, even before it is reached (Land and Furneaux, 1997; Land et al., 1999). Human observers, as well as some primates, are aware of this relationship and exploit it to predict the action's target (Phillips et al., 1992; Call and Tomasello, 1998; Santos and Hauser, 1999; Scerif et al., 2004). If this is the case, then other people's gaze should determine for which objects manipulation knowledge is retrieved. Indeed, Castiello (2003; see also Pierno et al., 2006) reported that observing object-directed gaze primes reaches towards the object, just as if one were directly observing this action. Similarly, research using fMRI has shown that observing an object-directed gaze activates similar premotor and parietal regions as when actually observing an action towards this object (Pierno et al., 2006, 2008). These findings directly support our contention that gaze implies a goal to interact with an object, which in turn activates the necessary actions (cf. "intentional imposition", Becchio et al., 2008).

Another important source of information is the other objects in a scene, which—if they complement the object the actor is wielding—can directly suggest an action goal (e.g., a key and a keyhole suggest the goal of locking/unlocking a door, but a key and the slot of a screw do not). It has been known for a while that patients with visual extinction, who are generally unable to perceive more than one object at a time, are able to perceive two objects if the objects show such a functional match (Riddoch et al., 2003). Importantly, perception was further enhanced when the spatial relationship between the objects matched the objects' required manipulation (e.g., corkscrew above rather than below a wine bottle), supporting the idea that implied goals suggested by functionally matching objects drove the retrieval of manipulation knowledge (for a similar effect in healthy adults using the attentional blink paradigm, see McNair and Harris, 2013).

In a behavioral study, we directly tested the idea that action goals implied by potential action recipients are enough to activate the required manipulation (Bach et al., 2005). Participants had to judge whether a tool (e.g., a credit card) was handled correctly according to its typical manipulation. We varied whether a recipient object was present that either matched the typical function of the tool or did not (e.g., the slot of a cash machine, or the slot of a ticket canceller), while controlling whether the action could be physically carried out (i.e., the credit card could just as easily be inserted into the slot of the ticket canceller as into the cash machine). As predicted, we found that manipulation judgments of others' actions were sped up by the presence of functionally congruent objects, in line with the idea that implied action goals pre-activate associated manipulations (for similar findings, see van Elk et al., 2009; Yoon et al., 2010; Kalénine et al., 2013).

### **OBSERVED MANIPULATIONS CONFIRM ACTION INTERPRETATIONS**

The above studies show that affordances of objects combine with contextual and social information about the actor's goals in the prediction of forthcoming actions. What happens if such a prediction is indeed confirmed? According to the affordance-matching hypothesis, each function of an object is associated with a specific manipulation that is necessary to achieve this goal. A match between an actually observed action and this predicted manipulation allows observers to infer the action's function: the object can lend the action its meaning.

On a general level, this predicts that, next to movements, objects should be a prime determinant of how actions are understood and distinguished from one another. From the developmental literature, such object-based effects of action understanding are well known. In a seminal study, Woodward (1998) habituated infants to seeing another person reach for one of two objects. After habituation, the position of the objects was switched, so that the same movement would now reach a different object, and a different movement would reach the same object. The results showed that, indeed, infants dis-habituated more to changes of the objects than to changes of the movements, even though the change of movement was more visually different from the habituated action. This suggests that infants interpret other people's reaches as attempts to reach a particular object, such that changes of these objects, but not of the movements required to reach them, change the "meaning" of the action. Indeed, the effects were absent when the object was grasped by an inanimate object with a shape similar to the human arm, suggesting that the effect indeed relates to the goals associated with the objects (but see Uithol and Paulus, 2013, for a different interpretation). Moreover, other studies show that the effects depend on the infants' prior interaction experience with the objects, in line with the idea that the effects emerge from one's own object knowledge (Sommerville et al., 2005, 2008).

Of course, this study only shows on a basic level that objects determine the inferred goal of an observed motor act. Since then, it has been demonstrated that these goal attributions indeed rely on a sophisticated matching of observed actions to the manipulations required to interact with the target object. For example, in the case of simple grasps, the volumetrics of the objects provide affordances for a specific type of grip, with larger objects affording whole hand power grips and smaller objects affording precision grips (e.g., Tucker and Ellis, 1998, 2001). There are now several studies—in children and adults—that show that inferences about a reach's goal are based on such grip-object matches. For example, Fischer et al. (2008) demonstrated that simply showing a certain type of grip triggers anticipative eye movements towards a goal object with a corresponding shape, implying an identification of the action goal based on affordance matching. This capacity is well established already in infants. Daum et al. (2009) have shown that at 6–9 months, children routinely establish such relationships between grasps and goal objects, showing dis-habituation when grasping an object that was incongruent with the initial grip. Even at this age, therefore, children "know" that different objects require different grips, and they can anticipate the goal of an action based on the matching between this affordance, and the observed grip.

Importantly, and in line with the affordance-matching hypothesis, these effects are guided by the same object manipulation knowledge that guides an individual's own actions. Infants' ability for affordance matching directly depends on their ability to exploit these affordances for their own actions. Only those children who used accurate pre-shaping of their own hand to the different object types used this match information to anticipate which object would be grasped (Daum et al., 2011). Similar evidence comes from a study tracking infants' eye movements. As in adults, congruent shapes of the hands allowed infants to anticipate (fixate) the goal object of a reach, and this ability was dependent on their own grasping ability (Ambrosini et al., 2013).

Such effects are not restricted to grasping. In tool use, the manipulations one has to perform with a given tool to realize its function are, if anything, even more distinct (e.g., the swinging motion of hammering, the repetitive finger contractions when cutting with scissors). In an early study, we therefore asked whether a tool that was applied appropriately to a goal object would help identify the goal of an action (Bach et al., 2005). Participants had to judge whether two objects could, in principle, be used together to achieve an action goal (e.g., screwdriver and slot screw vs. screwdriver and slot of a keyhole), but had to ignore whether the orientation of the tool relative to the goal object matched the associated manipulation (e.g., same orientation for screwdriver and screw, but orthogonal orientations of scissors and piece of paper). We found that incongruent manipulations slowed down judgment times, but only for object combinations that suggested a goal; for those that did not, even when otherwise physically possible (e.g., a screwdriver that would fit into a keyhole), this effect was completely eliminated. This is therefore in line with the idea that goal inferences are automatically verified by matching the actually observed action with the required manipulation, but if no potential goal is identified in the first place, such a matching does not take place. Similar findings have been provided by different labs in both adults (van Elk et al., 2009) and children (Sommerville et al., 2008).

If this conception of action understanding is correct, one would predict that object information is key to the comprehension of observed actions, and should therefore also involve strongly overlapping brain regions. We have recently tested the idea that object-related activation is the primary driver of action understanding (Nicholson et al., submitted). In an fMRI study we showed participants a sequence of different everyday actions, such as pouring a glass of wine, paying with a credit card, or making coffee, while directing their attention either towards the movements involved, the objects used or the goals of the actions. Consistent with the affordance-matching hypothesis, goal and movement tasks produced markedly different brain activations, while activations in the goal and object task were—to a large extent—identical.

### **AFFORDANCE MATCHING GUIDES IMITATION**

Evidence that affordance matching guides action interpretation comes from research on imitation. There is ample evidence that children's imitation does not reflect a faithful copying of the observed motor behavior, but is based on the goal. Unless the specific motor behavior appears crucial to goal achievement (or for fulfilling social expectations, Over and Carpenter, 2012), children try to achieve the same goal with actions that are most appropriate to their circumstances, that is, they *emulate* rather than *imitate* the observed action (Gergely et al., 2002; see Csibra, 2008, for review). If this is the case, and if affordance-matching contributes to these goal inferences, then we should find that actions are specifically imitated when matching the affordances of their goal object.

This indeed seems to be the case. When children observe others reach with their hand to either their ipsilateral or contralateral ear, they primarily attempt to reach for the same target object (i.e., the correct ear), but do so predominantly with an ipsilateral reach, thus ignoring how the actor achieved the goal, and choosing the most appropriate reach for themselves (Bekkering et al., 2000). As seen in Woodward's study, therefore, the goal object determined the interpretation of the action, and this goal served as the basis for imitation while the movement form was neglected (for further discussion on the role of goals in imitation, see Csibra, 2008; see Uithol and Paulus, 2013, for a critical look at such interpretations).

Studies on adults confirm that the actions imitated are specifically those that match the affordances of the goal objects. Humans have a general tendency to automatically imitate other people's actions (Chartrand and Bargh, 1999; Brass et al., 2000; Bach et al., 2007; Bach and Tipper, 2007). Wohlschläger and Bekkering (2002) showed that imitation of simple finger tapping movements is enhanced for the most effective movements towards the goal objects (marked spots on a table), and this effect has been linked to the inferior frontal gyrus, one of the assumed homologs of monkey area F5, where mirror neurons were first discovered (Koski et al., 2002). In a recent study, we revealed similar effects for automatic imitation of reach trajectories. Observers specifically tend to imitate the direction of observed reaches if the configuration of the hand matches the size of the goal object (Bach et al., 2011). Other studies have revealed similar findings, showing that muscle activation induced by transcranial magnetic stimulation (TMS) to the motor cortex when watching others grasp objects is higher when the observed grasps match the affordances of the goal object (e.g., Gangitano et al., 2004; Enticott et al., 2010).

# **RELATIONS TO RECENT ACCOUNTS OF MIRROR NEURONS AND ACTION UNDERSTANDING**

The above review shows that the affordance-matching hypothesis can unify a range of recent findings on action observation in children and adults. However, we believe that it is also in line with the single cell evidence, particularly with findings about mirror neurons in the macaque premotor and parietal cortices (di Pellegrino et al., 1992; Fogassi et al., 2005). Recently, several theorists have started to re-evaluate the thesis that mirror neurons are part of a bottom-up mechanism for action recognition (e.g., Rizzolatti and Craighero, 2004), and—in line with the affordance-matching hypothesis—instead highlighted their role in matching sensory input to top-down action expectations (e.g., Kilner et al., 2007a; Csibra, 2008; Liepelt et al., 2008a; Bach et al., 2010b, 2011).

Csibra (2008), for example, argues that initial inferences about the goal of an observed action are not based on motoric matching, but driven by contextual information in the scene (e.g., prior knowledge about others' intentions, eye gaze, emotional expression, etc.). Once such an initial goal has been inferred, the job of the mirror neurons is to produce an "emulation" of an action that would be suitable to achieve this goal, based on the observers' own action knowledge. Their firing signals a match between the observed action and this emulation, and therefore allows observers to confirm that the correct goal was inferred. In contrast, if there is no such match between predicted and observed action, the inferred goal is revised, and a new (hopefully better matching) emulation can be produced. As proposed by affordance matching, this emulation not only serves such an interpretative function, but also aids action prediction. Especially during visual uncertainty, the emulation can be used to "fill in" action information not obtained directly through perception (for recent evidence for such a filling in, see Avenanti et al., 2013a).

The predictive coding account of Kilner et al. (2007a; see also Grafton and de C. Hamilton, 2007) follows a similar principle. The mirror system is seen as part of a hierarchy of reciprocally connected layers, with goal information at the top and motor or kinematic information at the bottom levels. As in Csibra's model, initial goal inferences are derived from contextual information in the scene. Guided by the observers' own action knowledge, these goals are translated into predictions for forthcoming movements and fed into the lower levels. Incoming sensory stimulation is matched against this signal and elicits a prediction error in case of a mismatch. The next level up can then alter its own prediction signal to reduce this mismatch. As in Csibra's (2008) model, this sparks a chain of forward and backward projections through the interacting levels, where different goals can be "tried out", until emulation and visual input overlap and the prediction error is minimal.
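The "trying out" of goals described in these accounts can be illustrated with a toy sketch. The Python below is only an illustration under strong simplifying assumptions, not a model proposed by Kilner et al. (2007a) or Csibra (2008): the candidate goals, their predicted movement features, the squared-error measure, and the tolerance value are all hypothetical choices made for this example.

```python
# Toy illustration of goal inference by prediction-error minimization.
# All names and numbers are hypothetical; none are taken from the cited papers.

# Each candidate goal predicts a movement, coded here as
# (reach direction in degrees, grip aperture in cm).
PREDICTED_MOVEMENTS = {
    "drink_from_glass": (0.0, 8.0),
    "pick_up_peanut": (30.0, 1.5),
    "press_button": (-20.0, 0.5),
}


def prediction_error(predicted, observed):
    """Sum of squared differences between predicted and observed features."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def infer_goal(observed, initial_goal, candidates=PREDICTED_MOVEMENTS, tolerance=1.0):
    """Start from a context-based goal and revise it while the error is too large.

    Returns the accepted goal and the list of goals that were tried.
    """
    tried = [initial_goal]
    best_goal = initial_goal
    best_error = prediction_error(candidates[initial_goal], observed)
    for goal, predicted in candidates.items():
        # Stop revising as soon as prediction and observation overlap well enough.
        if best_error <= tolerance:
            break
        if goal == initial_goal:
            continue
        tried.append(goal)
        error = prediction_error(predicted, observed)
        if error < best_error:
            best_goal, best_error = goal, error
    return best_goal, tried
```

In this sketch, an observed movement of `(29.5, 1.4)` with the initial, context-based goal `"drink_from_glass"` produces a large error, so the goal is revised until `"pick_up_peanut"` brings the error below the tolerance. A real hierarchy would pass errors between several levels rather than search a flat list of goals.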

In such views, therefore, the firing of mirror neurons is interpreted not as the autonomous detection of an action goal (Rizzolatti and Craighero, 2004), but as the detection of a predicted motor act that is in line with a previously inferred action goal (cf. Bach et al., 2005, 2010b). The affordance-matching hypothesis agrees with these general ideas. Both of these prior views, however, are relatively vague about how contextual information influences prediction and interpretation. With the notion of coupled function and manipulation knowledge, the affordance-matching hypothesis introduces a specific mechanism via which such goal inferences can be made and translated into predictions of forthcoming motor acts. Indeed, in the following we will review some key pieces of evidence that suggest that response conditions of mirror neurons are not only in line with predictive accounts (see Kilner et al., 2007a; Csibra, 2008), but specifically with the notion that knowledge of how to manipulate objects drives these prediction processes.

# **MIRROR NEURONS AND AFFORDANCE MATCHING**

A classical finding is that mirror neurons fire only for actions that are directed at an object (be it physical, such as a peanut, or biological, such as a mouth), but not if the same body movement is observed in the absence of an object (i.e., mimed actions). This finding is often interpreted as showing that mirror neurons encode the goal of an action: the goal of *reaching for something* rather than the motor characteristics of the reaching act itself (Umilta et al., 2001; Rizzolatti and Craighero, 2004). However, in the light of the affordance matching hypothesis, an alternative interpretation is that the firing of the mirror neurons confirms a specific action that has been previously predicted, based on the affordances of the object (e.g., a reach path on track towards the object location with a grip that is appropriate for the object size). In the absence of an object, no specific grasp is predicted, and hence the mirror neuron remains silent even if one occurs (for a similar argument, see Csibra, 2008). Such an interpretation does not deny that the firing of mirror neurons is goal-related; however, rather than encoding the abstract goal of grasping itself, it suggests that the firing of mirror neurons might signal a movement that matches a functional object manipulation.

Another important aspect is the object specificity of mirror neuron responses, which has been reported in various studies. Consider, for example, that mirror neurons fire consistently only for motivationally relevant objects, like food items (Gallese et al., 1996; Caggiano et al., 2012). For abstract objects, such as spheres or cubes, firing subsides quickly after the initial presentations. This is directly in line with our proposal that the selection of objects for which the affordances are extracted is guided by the functional relevance of the objects towards the actor's goals. Consistent with this interpretation, it has recently been revealed that while a large number of mirror neurons respond preferentially to objects that had been previously associated with reward, a smaller number fire specifically for objects that are not associated with reward (Caggiano et al., 2012). This separate encoding of the same motor acts towards different object types reveals that mirror neuron responses are dependent on object function: they allow observers to disambiguate predicted action goals (here: to gain a reward or not) by matching them to the different movements suitable to achieve these goals.

Another important finding is that mirror neurons in the parietal cortex fire based not on the observed movement itself, but on its ultimate goal (Fogassi et al., 2005). That is, even when merely observed, the same reaching action is encoded by different mirror neurons depending on whether it is performed with the ultimate goal of placing the object somewhere else or eating it. Again, this finding is often interpreted as revealing a coding of the action goal, but it is also in line with the matching of different predictions based on object context. The reason is that, in this experiment, the different goals were not extracted from movement information (the initial grasps were identical for both goals), but from object information: grasps to place were signaled by the presence of a suitable container in reach of the model, while grasps to eat were signaled by the absence of this container (see supplementary material, Fogassi et al., 2005). The finding therefore provides direct support for affordance-matching: mirror neurons fire not because they autonomously derive the goal of the action, but because they detect an action that has been predicted from the presence of objects (for a similar argument, see Csibra, 2008).

An untested prediction of the affordance-matching hypothesis is that mirror neurons should encode the specific motor act that is expected given the object. They should therefore fire specifically, or most strongly, for a motor act afforded by the object. A mirror neuron encoding precision grips during own action execution should fire most strongly not only if a precision grip is observed, but also if the observed object is one that affords a precision grip (i.e., a small object). In contrast, a mirror neuron encoding whole hand prehension should fire most strongly if a power grip is observed towards an object that affords a power grip (a large object). Some suggestive evidence for such an object-action matching process was provided by Gallese et al. (1996). They reported, first, that some grasping and manipulation-related mirror neurons only fired for objects of specific sizes, but not for larger or smaller objects (but without providing details on whether size and grip had to match). Second, they reported that mirror neurons do not fire even if the monkey sees a grasp and an object, unless the hand's path is indeed directed towards the object. This reveals that mirror neuron responses require not only object presence, but (1) a specific type of object; and (2) a precise targeting of the action towards the object, in line with a matching of action to object affordances.

Similar evidence comes from recent studies on humans that have linked the matching of observed movements to those afforded by the objects to areas in premotor and parietal cortex, the brain regions where mirror neurons have been discovered in the macaque monkey. During grasp observation, these regions become activated when computing the match between grips and objects (Vingerhoets et al., 2013), and respond more strongly for reach errors, specifically when a reach deviates from the path predicted by the object (Malfait et al., 2010). Similarly, in the domain of tool use, they are involved in computing the match between an observed manipulation and the manipulation required to realize the object's function (Bach et al., 2010b). Of course, the conclusion that these affordance-matching related activations in humans indeed reflect mirror neurons needs to be treated with caution, as none of these studies assessed a role of these regions in motor performance. However, it is noteworthy that the response of these regions correlates with the observer's sensorimotor experience with the actions (Bach et al., 2010b), a criterion that has been proposed for identifying mirror neurons in humans (cf. Calvo-Merino et al., 2005, 2006). Moreover, the parietal activations overlap tightly with the foci identified in a recent meta-analysis on grasp execution (Konen et al., 2013), and the peak coordinates overlap with regions with mirror properties identified by a recent meta-analysis (Molenberghs et al., 2012). Activations in the premotor cortex are particularly close, with peak voxels in the Malfait et al. (2010) study and our own study (Bach et al., 2010b) falling within 5 mm of the peaks identified in the meta-analysis.

# **OPEN QUESTIONS AND FURTHER PREDICTIONS**

An open question is how these affordances, which ultimately inform mirror neuron responses, are derived. During own action execution, this role appears to be played by the canonical neurons, which fire both when the monkey executes a specific grip and when it views an object that can be manipulated with this grip. These neurons therefore appear to derive object affordances and specify how an object should be interacted with. Indeed, if the bank region of F5—the region where canonical neurons are primarily located—is inactivated, object-directed grasping is disrupted as well (Fogassi et al., 2001). In contrast, inactivation of the convexity of F5, the area where mirror neurons are primarily located, does not produce such execution impairments, merely slowing down the monkey's actions. It has therefore been argued that, while canonical neurons derive the appropriate grip, mirror neurons play a monitoring role, providing the monkey with "assurance" (p. 583) that its action is on track (Fogassi et al., 2001, see also, Bonaiuto and Arbib, 2010; Fadiga et al., 2013).

A similar division of labor, between deriving object affordances and matching the actually observed action against this prediction, might happen during action observation. A recent study (Bonini et al., 2014) has provided evidence for specialized "canonical-mirror neurons" in monkey area F5 that appear to play the role of affordance extraction for other people's actions. These neurons respond both when the monkey sees an object in extrapersonal space and when somebody else performs an action towards it. In contrast to typical canonical neurons, their responses are not constrained to the monkey's peripersonal space, and to an object orientation most suitable for grasping. In line with our hypothesis, the authors therefore argued that these neurons might provide a "predictive representation of the impending action of the observed agent" (p. 4118).

Other data points may, at first glance, show a less obvious link to the affordance-matching hypothesis. One example is the finding that a subset of mirror neurons that respond to grasping will also respond, after training, to grasps performed with a tool (Umilta et al., 2001; Ferrari et al., 2005). This finding is often taken as evidence that mirror neurons encode higher-level goals ("grasping") rather than the relevant motor behaviors themselves. A slightly different explanation is provided by the affordance-matching hypothesis. On this view, mirror neurons do not generalize across different motor acts subserving the same goals, but across different perceptual cues that are informative of action success. For example, a mirror neuron that originally tests grasp success by monitoring fingers closing around an object may learn that the same success condition is met when the ends of pliers close around the object. In other words, learning enables the tool tips to be treated like the tips of one's own fingers (cf. Iriki et al., 1996). Such an interpretation is not inconsistent with the encoding of goals by mirror neurons. However, rather than encoding the abstract goal of grasping something, mirror neurons would encode a lower-level perceptual goal state of effectors (be they part of a body or of a tool) closing around a target object.

A similar argument can be made for the finding that a large number of mirror neurons are only *broadly congruent*, typically showing a more specific tuning during action execution than observation. For example, during execution, one neuron might fire only when the monkey grasps an object with its hand, while during observation it may fire for grasps with both hands and mouth (cf. Gallese et al., 1996). If one takes a monitoring view of mirror neurons, such differences may emerge naturally from the differential availability of perceptual cues during action and perception. Note for example that during observation one has a view of other people's hands and mouths, whereas during one's own actions one can see one's hands but not one's mouth. A neuron that simply checks whether a body part is on a path towards a target object can therefore perform this test on hands and mouths during perception, but only for hands during execution, giving the impression of a stricter tuning. We believe that such and other differences in the input available during perception and action (such as prior action selection processes or different action capabilities of monkey and model) might give rise to the otherwise surprising response profiles of broadly congruent mirror neurons. However, to what extent such hypotheses can be supported by evidence is currently unclear, and a full integration into the current model will therefore be the subject of future work.

# **EXTENSION TO OTHER ACTION TYPES**

The affordance-matching hypothesis contrasts with initial views of action understanding as a bottom-up process (e.g., Rizzolatti et al., 2001; Rizzolatti and Craighero, 2004; Iacoboni, 2009), where observers simulate the outcome of actions based on their prior knowledge about motor commands and their perceptual consequences. Our contention is not that affordance matching is the only way that object-directed actions can be understood, but that it provides a fast and efficient means for action interpretation and prediction for well-known everyday object-directed actions. Actions involving unknown objects, for example, or actions with common objects used in unusual ways, might benefit from a bottom-up approach that combines a simulation of the motor actions with the mechanical properties of the objects to derive likely action outcomes. Indeed, recent work has revealed such processes of "technical reasoning" during planning of object-directed actions (e.g., Osiurak et al., 2009), and studies on action observation have shown that mirror-related brain areas become activated specifically for actions that are not known (e.g., Bach et al., 2011, 2014; Liew et al., 2013). However, even in these cases top-down processes can contribute, if one assumes that the relevant function and manipulation knowledge is tied not only to objects as a whole, but to certain object characteristics as well (e.g., any hard object can be used for hammering if it is brought down with force onto the recipient object). Future work will need to establish more closely the boundary conditions that decide which of these two pathways to action understanding and prediction is chosen.

Our discussion has so far focused on manual object-directed actions, which are often seen as the paradigmatic case of human action. However, there is no reason why similar processes should not govern the perception of actions made with other body parts. Walking, for example, one of our most frequent daily actions, happens in an object context, and the paths we take are governed by the objects (and people) surrounding us, and their relevance to our goals. Such actions should therefore be predicted and interpreted in a similar manner as manual actions. Thus, in the same way as observers can predict that a thirsty actor will grasp a glass of water in front of them, they can predict the path the actor would take to a glass on the other side of a room.

The same argument can be made for other cues that guide our social interactions, such as eye gaze and the emotional expressions that typically accompany it. Most of these actions are again object-directed, and observers implicitly understand this object-directedness (Bayliss et al., 2007; for a review, see Frischen et al., 2007; Wiese et al., 2013). People look at objects and may smile or frown in response to them. Knowing how objects relate to the actor's goals therefore allows one to predict future looking behavior and emotional expressions, which, in turn, can confirm these goal inferences. Various studies now confirm the presence of prediction or top-down effects in gaze and expression understanding. For example, Wiese et al. (2012) recently demonstrated that the classical gaze cuing effect (the extent to which an observer's attention follows another person's gaze) is driven not only by stimulus information but also by intentions attributed to the other person.

For other types of action, the link to object knowledge is less clear. Sometimes, observers do not have any information about objects used in an action, for example because the relevant objects are hidden from view (e.g., Beauchamp et al., 2003), or because the action is pantomimed (e.g., during gesturing, Hostetter and Alibali, 2008; Bach et al., 2010a). Here, therefore, the required manipulation cannot be retrieved from the visible objects, but must be retrieved from a much larger variety of possible manipulations in memory. Identifying an object that would match this movement should therefore be relatively slow and effortful, unless the observed movements are highly idiosyncratic, or likely objects have already been pre-activated by assumptions about the actor's goals or contextual cues. However, as soon as a matching object-manipulation pairing is identified, the action can be interpreted and predicted in a similar manner as for fully visible actions (for evidence for such a prediction of pantomimed actions, see Avenanti et al., 2013b, albeit without linkage to object-centered mechanisms).

Intransitive actions, such as stretching or spontaneous smiles, are another example. They produce motor activation just like the observation of object-directed actions (Costantini et al., 2005; Romani et al., 2005; Urgesi et al., 2006), but they are, by definition, excluded from the present model. As they are neither directed at an object nor involve objects as an instrument, object knowledge cannot contribute to their interpretation and prediction. We speculate, however, that their processing may follow similar principles. As is the case for object-directed actions, intransitive actions link certain kinds of movement (e.g., stretching) with a specific function, typically with reference to one's internal state (e.g., to relieve some symptoms of tiredness). If such a linkage exists, it can provide similar predictive and interpretative functions as the analogous knowledge about objects. Knowing about someone's internal state may allow one to predict forthcoming actions. Observing these actions, in turn, can then disambiguate possible interpretations about the individual's internal states. However, there is still considerable debate in the literature about how intransitive actions are processed when observed. Future research needs to disentangle these processes, and more closely describe how they interact with one's (inferred) knowledge about a person's internal states.

# **CONCLUSIONS**

Several recent proposals have challenged the idea that a motoric matching process, instantiated by the mirror neuron system, is the key driver of action understanding in humans. Yet, they have left open which alternative source of information could be used instead. The affordance-matching hypothesis posits a key role of objects. It specifies how action prediction and interpretation arise from a combination of knowledge about an object (how it is used and what it is for) and the actor's current goals and motor behaviors. Such a view can account for a variety of findings and integrates them into a common framework. Moreover, it provides an intuitive account of how the understanding of others' actions can be grounded in one's own experiences. For the perception of everyday object-directed actions, this grounding does not result from a matching of motor parameters, but is based on the identity of the objects, and one's prior experiences about their function and use.

# **ACKNOWLEDGMENTS**

We thank Nicolas McNair, as well as Kimberley Schenke and Nick Lange, for their insightful comments on an earlier draft of this paper. The work was supported by the Economic and Social Research Council (ESRC) grant number ES/J019178/1.

# **REFERENCES**


movements. *Eur. J. Neurosci.* 20, 2193–2202. doi: 10.1111/j.1460-9568.2004.03655.x


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 December 2013; accepted: 05 April 2014; published online: 12 May 2014*. *Citation: Bach P, Nicholson T and Hudson M (2014) The affordance-matching hypothesis: how objects guide action understanding and prediction. Front. Hum. Neurosci. 8:254. doi: 10.3389/fnhum.2014.00254*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Bach, Nicholson and Hudson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# No need to match: a comment on Bach, Nicholson and Hudson's "Affordance-Matching Hypothesis"

#### *Sebo Uithol <sup>1</sup> \* and Monica Maranesi <sup>2</sup>*

*<sup>1</sup> Department of Neuroscience, University of Parma, Parma, Italy*

*<sup>2</sup> Brain Center for Social and Motor Cognition, Italian Institute of Technology, Parma, Italy*

*\*Correspondence: sebouithol@gmail.com*

#### *Edited by:*

*Analia Arevalo, East Bay Institute for Research and Education, USA*

#### *Reviewed by:*

*Ricarda I. Schubotz, Westfälische Wilhelms University, Germany*

**Keywords: affordances, mirror neurons, action observation, enactive and extended cognition, social neuroscience**

Mirror neurons and canonical neurons are two classes of visuomotor neurons that are activated by different visual stimuli (Rizzolatti and Kalaska, 2012). Mirror neurons respond to a biological effector *interacting* with an object (Gallese et al., 1996), suggesting their role in action recognition, while canonical neurons respond to the presentation of a graspable object (Murata et al., 1997), and are considered crucial in visuomotor transformation for grasping (Jeannerod, 1995).

In their interesting and thought-provoking "affordance-matching hypothesis" Bach et al. (2014) argue that both types of neurons contribute to action understanding. Action hypotheses are posited to be created by means of *object affordances*. Affordances are motor possibilities an object offers (Gibson, 1979). The visual description of an object's intrinsic features is associated with possible motor acts toward that object. Canonical neurons are a possible neural implementation of this mechanism. The action hypothesis thus generated from an object affordance would then be confirmed by the mirror neuron system. When a match between a predicted action (canonical neurons) and an actually observed action (mirror neurons) is confirmed, either the action goal can be predicted based on observed behavior, or behavior can be predicted based on observed goals (see their Figure 1).

We believe, however, that the proposed separation of hypothesis generation and hypothesis matching is not in line with the empirical evidence currently available, and that the division between "interpretation" and "prediction" relies on a cognitivist assumption that is hard to defend. We suggest that enactivist approaches provide a less problematic framework for studying action understanding.

Bach and colleagues are not entirely explicit about the nature of the proposed matching mechanism between affordance and observed action, but we see two options for the proposed division of labor. In the first and admittedly unlikely option, mirror neurons play the role of a quizmaster that knows the answers. If the right hypothesis is posited, all the mirror neuron system has to do is confirm it. In this case, the contribution of the affordances is superfluous, as mirror neurons have already extracted all that is needed from the perception of an action (i.e., the quizmaster knows the answer). Counterevidence for this option exists in the form of mirror neurons that fire in the absence of an affordance to be matched. The auditory mirror neurons reported by Kohler et al. (2002) fire upon the presentation of the sound of an action alone (peanut breaking, paper tearing) without there being an affordance to match, or a prediction to confirm.

But more importantly, virtually all mirror neuron studies (except Bonini et al., 2014a and Caggiano et al., 2009) involved actions performed in the extrapersonal space, out of reach for the monkey. Canonical neurons remain generally silent when an object is in the extrapersonal space of the monkey, suggesting a mainly pragmatic (i.e., in terms of possibilities to interact with the object), rather than a metric reference frame (i.e., in terms of physical distance between the object and the observer; Maranesi et al., 2014). This means that the bulk of mirror neuron studies report mirror neuron firing in the absence of canonical neuron firing. This, in turn, means that the major part of mirror neuron activity cannot rightfully be framed as "affordance matching," at least not when canonical neurons are assumed to provide the affordances.

The second and more likely option is that affordance extraction and mirror neuron firing jointly contribute to action understanding by each generating a hypothesis; one based on the object, consisting of one or more actions the object affords, and one about the action the actor is possibly performing ("action classification"; Uithol et al., 2011). When two hypotheses match, they are combined and the action is recognized. However, this means that mirror neuron input is not dependent on the availability of a to-be-matched affordance (i.e., mirror neuron activity is expected without affordances available), which is in line with the empirical evidence as highlighted above, but not predicted by the affordance-matching hypothesis. And also here the fact that canonical neurons fire upon object presentation only in the monkey's peripersonal space would mean that canonical neuron-based affordances can only be matched within the monkey's peripersonal space. The only neurons showing canonical properties that could be activated by objects in the extrapersonal space are a recently discovered class of neurons reported by Bonini et al. (2014a). These neurons were dubbed "canonical-mirror neurons" as they show both canonical and mirror properties at the single neuron level. However, the canonical-mirror response to object presentation in the extrapersonal space cannot be considered a neural implementation of an affordance, as these neurons do not fire for the same objects in the peripersonal space. Rather, these neurons seem to be involved in an object-triggered action prediction (Bonini et al., 2014a), which is indeed in line with the affordance-matching hypothesis, but emphatically does not generalize to canonical and mirror neurons in general. Additionally, recent findings (Bonini et al., 2014b) revealed that some mirror neurons, besides discharging during action observation, are also active when an action is *not* performed by an actor. This activation can obviously not be interpreted as a match between object affordances and action kinematics, as the latter are absent.

As a solution, one might detach the hypothesis generation and confirmation processes from canonical and mirror neurons; the principle of affordance matching is after all not committed to these classes of neurons. But then we wonder what evidence remains for framing action understanding as "hypothesis generation and testing." Why is there the need to combine the (in this case two) types of information into a unified representation? We believe that this framing of action understanding as drawing unified and coherent conclusions about observed actions may have been guided by the (cognitivist) assumption that cognition is centered around retrieving information. Alternatively, the framework of enactivism (Varela et al., 1991; Hutto, 2013; Hutto and Myin, 2013) seems to be much more in line with the complexity in action understanding. Enactivism assumes that cognition is not for creating representations about external events, but for interacting with the world. In this framework, action understanding can take many guises, many of which are best understood as a form of pattern completion: the observer is faced with an incomplete percept of an action, which is then completed based on perceptual mechanisms, mirror mechanisms and even higher associations (e.g., actor-object associations; see Uithol and Paulus, 2013). Importantly, there is no need to combine the different routes into a unified representation of the observed action or inferred action goal. If both object and action information are available, perhaps the classification or prediction process is faster, easier and better, but the current evidence suggests that unifying the types of information into a single match is not necessary.

If action understanding is no longer framed as forming a conclusion about an observed action, but instead in terms of pluriform pattern completions that do not (always) amount to a unified representation, another assumption of the affordance-matching hypothesis disappears as well: the difference between interpretation and prediction. Both interpretation ("classification" in our terminology) and prediction involve completing a pattern based on an incomplete percept. This means that the information flow cannot be segmented into "interpretation," "knowledge," and "prediction." Interpretation is not a process upstream of knowledge, and prediction is not a process downstream from it, nor do they represent information flows in opposite directions; both notions refer to the process of sensorimotor action specification.

In all, we believe that the suggestion of the affordance-matching hypothesis that different sources of information can each contribute to action understanding is an important one that could open doors to new lines of research. However, the current evidence does not support the proposed division between hypothesis generation and hypothesis testing.

### **ACKNOWLEDGMENT**

The authors want to thank Luca Bonini and Katrin Heimann for valuable suggestions. Sebo Uithol was supported by the EU grant "Toward an Embodied Science of Intersubjectivity" (TESIS, FP7-PEOPLE-2010-ITN, 264828).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 June 2014; accepted: 25 August 2014; published online: 10 September 2014.*

*Citation: Uithol S and Maranesi M (2014) No need to match: a comment on Bach, Nicholson and Hudson's "Affordance-Matching Hypothesis." Front. Hum. Neurosci. 8:710. doi: 10.3389/fnhum.2014.00710 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Uithol and Maranesi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Response: No need to match: a comment on Bach, Nicholson, and Hudson's "Affordance-Matching Hypothesis"

#### Patric Bach\*, Toby Nicholson and Matthew Hudson

School of Psychology, Cognition Institute, University of Plymouth, Plymouth, UK

Keywords: affordances, action understanding, action prediction, object function, object manipulation

#### **A commentary on**

# **No need to match: a comment on Bach, Nicholson and Hudson's "Affordance-Matching Hypothesis"**

by Uithol, S., and Maranesi, M. (2014). Front. Hum. Neurosci. 8:710. doi: 10.3389/fnhum.2014.00710

We are grateful for Uithol and Maranesi's (2014) insightful comments on our article "The affordance-matching hypothesis: How objects guide action understanding and prediction" (Bach et al., 2014). There, we argued that action understanding is not well accounted for by a process in which observed actions are simply matched, based on kinematic information, to an action in one's motor repertoire. Instead, we proposed that action understanding draws heavily on object information. Humans represent objects in terms of both (1) the goals that can be achieved with them (function knowledge), and (2) the specific motor behaviors required to achieve these goals (manipulation knowledge). This knowledge can make a major contribution to action observation, allowing observers not only to infer the goals someone wants to achieve with an object (via function knowledge) but also to predict the actions that this person would need to carry out to achieve these goals (via manipulation knowledge).

A key question in such a view is what derives the affordances (the known manipulations) of objects handled by other people. As Uithol and Maranesi rightly point out, and as we conclude in our article, canonical neurons are an unlikely candidate. While canonical neurons indeed seem to encode actions one can perform with an object (grasping, tearing), their firing is restricted to the peripersonal space, coding for actions the monkey could do itself. Much better candidates are the mirror-canonical neurons discovered by Bonini et al. (2014), which fulfill a similar role in the peripersonal space of other people. They fire both when the monkey sees an object and when it sees someone else perform an appropriate action on the object. The Bonini et al. (2014) study was published shortly before our article, and we were only able to discuss it briefly in our paper. Yet, the response properties of these neurons match the predictions of affordance matching perfectly, and we are grateful to Uithol and Maranesi for further highlighting them. They nicely complement the wealth of behavioral evidence that reveals that observers extract object affordances for other people, even outside their own peripersonal space (for a review, see Creem-Regehr et al., 2013), and that mental simulation of hand-object interactions shows similarly lateralized motor activity as when actually performing such manipulations (e.g., Borghi and Scorolli, 2009; Marino et al., 2012).

#### Edited by:

Analia Arevalo, East Bay Institute for Research and Education, USA

#### Reviewed by:

Adolfo M. García, Universidad Favaloro, Argentina

#### \*Correspondence:

Patric Bach patric.bach@plymouth.ac.uk

Received: 16 June 2015 Accepted: 03 December 2015 Published: 22 December 2015

#### Citation:

Bach P, Nicholson T and Hudson M (2015) Response: No need to match: a comment on Bach, Nicholson, and Hudson's "Affordance-Matching Hypothesis". Front. Hum. Neurosci. 9:685. doi: 10.3389/fnhum.2015.00685

Next to highlighting this supportive evidence, Uithol and Maranesi provide two challenges for the affordance-matching hypothesis. First, we had argued that mirror neurons are not independent action recognizers. Instead, their purpose is confirmatory: they check whether one of the object's potential manipulations is indeed occurring (e.g., opening a peanut, grasping an apple; for similar arguments, see Kilner et al., 2007; Csibra, 2008). Support for this idea comes, among other findings reviewed in our article, from the observation that mirror neurons do not fire for a motor act in isolation, but only when it is directed to an appropriate object (Gallese et al., 1996) and that firing subsides quickly when the hand deviates from the predicted path. In contrast to this view, Uithol and Maranesi argue that mirror neurons could also recognize actions independently. They point to the audiovisual mirror neurons discovered by Kohler et al. (Kohler et al., 2002; Keysers et al., 2003). These neurons fire not only when an object-directed action is seen, but also when it is merely heard (e.g., the sound of a peanut breaking). As sound provides no object information, Uithol and Maranesi argue there is no prior affordance against which the action can be matched, arguing against an affordance matching interpretation of mirror neurons.

However, in our article, we specifically considered such cases in which action recognition occurs with little prior object information (e.g., because objects are hidden from view or actions are pantomimed). We argued that, in such cases, the action would not be matched to a seen object, but to a much greater variety of affordances of objects in memory. Identifying such a match would therefore be slow and effortful, unless the observed movements are highly idiosyncratic, or the potential objects had already been constrained by the prior context. Strikingly, all these considerations seem to apply to the original Kohler studies. They tested a very limited set of six actions, on which the monkeys were extensively trained, and which were shown repeatedly, in random order, during the experiment. Thus, while vision did not provide object information directly, the potential set of objects was nevertheless highly constrained, and the heard actions could be efficiently matched to one of these alternatives. To our knowledge this has not yet been tested in monkeys, but affordance matching would predict that these auditory mirror neuron responses would be substantially delayed or reduced if no such prior experimental object context were available.

Finally, we had proposed that function and manipulation knowledge about objects could interact, in a productive manner, during action observation. Knowing somebody's goals will predict exactly which manipulations are required with an object to achieve this goal, supporting action prediction. Conversely, recognizing a known way of manipulating an object allows one to infer which of the object's functions the actor wants to realize, supporting action interpretation. Uithol and Maranesi argue that a single process, similar to pattern completion processes in vision, could account for both. In this view, an object representation linking a goal (driving in a nail), an object (a hammer), and a required manipulation (forceful downwards movements) provides such a pattern, which is filled in if one aspect is missing (as long as the overall pattern is recognized). We are not averse to this possibility. The affordance-matching hypothesis is relatively agnostic as to how the proposed mechanism is implemented. What we would like to argue—and this was the purpose of the paper—is that as soon as an architecture linking objects, goals and body movements is established it can be used for both purposes: prediction (when likely movements are inferred from objects and goals) and interpretation (when likely goals are inferred from how the object is manipulated). Thus, rather than reflecting different processes, prediction and understanding are different processing outcomes that arise (a) from the completeness of the stimulus, and (b) from the task. For example, coordinating our own action with that of another person (e.g., handing over an object) requires efficient prediction. In contrast, longer-term predictions about others' behavior require knowledge of their goals.
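The point that a single architecture linking objects, goals and movements can serve both prediction and interpretation can be illustrated with a toy lookup. The following is only a minimal sketch under our own assumptions: the `ObjectKnowledge` record, the example entries, and the helper names are hypothetical illustrations, not part of the affordance-matching hypothesis or of any implementation by the authors.

```python
from typing import List, NamedTuple, Optional


class ObjectKnowledge(NamedTuple):
    """One stored pattern linking an object, a goal and a manipulation."""
    obj: str
    goal: str          # function knowledge (hypothetical label)
    manipulation: str  # manipulation knowledge (hypothetical label)


# Hypothetical toy entries, chosen only for illustration.
KNOWLEDGE = [
    ObjectKnowledge("hammer", "drive in a nail", "forceful downward swings"),
    ObjectKnowledge("hammer", "pull out a nail", "levering with the claw"),
    ObjectKnowledge("cup", "drink", "lift to mouth"),
]


def complete(obj: Optional[str] = None,
             goal: Optional[str] = None,
             manipulation: Optional[str] = None) -> List[ObjectKnowledge]:
    """Return every stored pattern consistent with the partial cues given."""
    cues = {"obj": obj, "goal": goal, "manipulation": manipulation}
    return [
        k for k in KNOWLEDGE
        if all(value is None or getattr(k, field) == value
               for field, value in cues.items())
    ]


def predict_manipulation(obj: str, goal: str) -> List[str]:
    """Prediction: infer likely movements from the object and the goal."""
    return [k.manipulation for k in complete(obj=obj, goal=goal)]


def interpret_goal(obj: str, manipulation: str) -> List[str]:
    """Interpretation: infer likely goals from how the object is handled."""
    return [k.goal for k in complete(obj=obj, manipulation=manipulation)]
```

In this sketch, `predict_manipulation` and `interpret_goal` call the same `complete` function and differ only in which field is left unspecified, which is one way to picture prediction and interpretation as different outcomes of one mechanism rather than as separate processes.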

Importantly, though, and this is perhaps the main point of disagreement, we do not believe that such a pattern completion process, even if it existed, would negate the "need to match." In vision, only the simplest possible patterns can be "filled in" without recourse to prior knowledge, for example, in cases of edge extensions or extrapolation of retinal motion. Completing anything more complex requires that a matching pattern at a higher cortical level is activated (Rao and Ballard, 1999). Completion is possible precisely because this matching representation can provide the missing information. This is not an out-dated "cognitivist" assumption either: Recent predictive coding models see it as the core of general brain function, across all levels of the cortical hierarchy (Barsalou, 2009; Friston and Kiebel, 2009). The brain constantly forms higher-level hypotheses about the environment, which are propagated downwards and tested against the sensory input. Prediction errors are fed back upwards so that matching hypotheses can be confirmed, and mismatching ones are revised until they match the sensory input. The affordance matching view is directly informed by these views. Objects provide both hypotheses about potential goals (the object's function), and a means for testing them against the currently observed action (the associated manipulations). Object knowledge, therefore, provides the "patterns" against which seen actions can be compared, and from which their goal can be derived.
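The loop of generating hypotheses, testing them against the input, and revising them by prediction error can be pictured with a small numerical toy. This is a sketch under strong assumptions of our own: it uses a simple error-weighted belief update rather than a hierarchical predictive coding model such as that of Rao and Ballard (1999), and the two-dimensional movement features, the `PREDICTIONS` table, and the `precision` parameter are hypothetical.

```python
import math
from typing import Dict, Tuple

Features = Tuple[float, float]  # hypothetical: (vertical hand velocity, grip aperture)

# Hypothetical top-down predictions: the movement each goal hypothesis expects.
PREDICTIONS: Dict[str, Features] = {
    "drive in a nail": (-1.0, 0.2),
    "pull out a nail": (0.5, 0.2),
}


def prediction_error(predicted: Features, observed: Features) -> float:
    """Squared mismatch between the predicted and the observed movement."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def update_beliefs(beliefs: Dict[str, float],
                   observed: Features,
                   precision: float = 1.0) -> Dict[str, float]:
    """Down-weight each hypothesis by its prediction error, then renormalize."""
    weighted = {
        h: b * math.exp(-precision * prediction_error(PREDICTIONS[h], observed))
        for h, b in beliefs.items()
    }
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


# Start with equal belief in each goal the object affords.
beliefs = {h: 1.0 / len(PREDICTIONS) for h in PREDICTIONS}

# Hypothetical sequence of observed movement samples (downward swings).
for observed in [(-0.9, 0.25), (-1.1, 0.2), (-1.0, 0.15)]:
    beliefs = update_beliefs(beliefs, observed)

best = max(beliefs, key=beliefs.get)
```

Here the object supplies the candidate goals, each goal supplies an expected manipulation, and the observed movement confirms whichever hypothesis leaves the smallest error. With the samples above, belief shifts toward the hypothesis whose predicted movement is closest to what is observed.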

We would like to end by noting that in the year since publication several studies have provided evidence for the different components of our model. For example, Thioux and Keysers (2015) demonstrated direct links between connectivity in parietal-premotor "mirror" circuits and the ability to anticipate which of two objects someone else is going to grasp, providing evidence for an encoding of object–action affordance relationships in these areas. Similarly, Schubotz et al. (2014) showed that activity in some of these regions increased parametrically with the number of actions afforded by the goal object, in line with the notion that observed actions are indeed matched against action "hypotheses" derived from objects. Finally, Maranesi et al. (2014) revealed predictive firing of mirror neurons before action initiation, if the context (a go signal) implied that the observed actor had a goal of reaching toward the object. This provided direct support for the idea that prior goal assumptions specify which action someone will carry out with an object, which can then be tested against the actual visual input. Indeed, we have recently provided direct evidence for this idea, by showing that top-down predictions directly feed into even low-level perceptual representations of observed motor acts, biasing them further toward the assumed goals than they really were (Hudson et al., 2015, 2016).

# ACKNOWLEDGMENTS

The work was supported by the Economic and Social Research Council (ESRC) grant number ES/J019178/1.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bach, Nicholson and Hudson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
