# THE METAPHORICAL BRAIN

EDITED BY: Seana Coulson and Vicky T. Lai

PUBLISHED IN: Frontiers in Human Neuroscience

### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-772-9 DOI 10.3389/978-2-88919-772-9

# **THE METAPHORICAL BRAIN**

Topic Editors: **Seana Coulson,** University of California, San Diego, USA **Vicky T. Lai,** University of South Carolina, USA

Word cloud generated based on the titles and the abstracts from the articles included in this Research Topic via the Tagul website (https://tagul.com)

Metaphor has been an issue of intense research and debate for decades (see, for example, [1]). Researchers in various disciplines, including linguistics, psychology, computer science, education, and philosophy, have developed a variety of theories, and much progress has been made [2]. For one, metaphor is no longer considered a rhetorical flourish that is found mainly in literary texts. Rather, linguists have shown that metaphor is a pervasive phenomenon in everyday language, a major force in the development of new word meanings, and the source of at least some grammatical function words [3]. Indeed, one of the most influential theories of metaphor suggests that metaphoric language is frequent because cross-domain mappings are a major determinant in the organization of semantic memory, as cognitive and neural resources for dealing with concrete domains are recruited for the conceptualization of more abstract ones [4]. Researchers in cognitive neuroscience have explored whether particular kinds of brain damage are associated with metaphor production and comprehension deficits, and whether similar brain regions are recruited when healthy adults understand the literal and metaphorical meanings of the same words (see [5] for a review). Whereas early research on this topic focused on the role of hemispheric asymmetry in the comprehension and production of metaphors [6], in recent years cognitive neuroscientists have argued that metaphor is not a monolithic category, and that metaphor processing varies as a function of numerous factors, including the novelty or conventionality of a particular metaphoric expression, its part of speech, and the extent of contextual support for the metaphoric meaning (see, e.g., [7], [8], [9]). Moreover, recent developments in cognitive neuroscience point to a sensorimotor basis for many concrete concepts, and raise the issue of whether these mechanisms are ever recruited to process more abstract concepts [10].

This Frontiers Research Topic brings together contributions from researchers in cognitive neuroscience whose work involves the study of metaphor in language and thought in order to promote the development of the neuroscientific investigation of metaphor. Adopting an interdisciplinary perspective, it synthesizes current findings on the cognitive neuroscience of metaphor, provides a forum for voicing novel perspectives, and promotes avenues for new research on the metaphorical brain.

# **REFERENCES**


**Citation:** Coulson, S., Lai, V. T., eds. (2016). The Metaphorical Brain. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-772-9

# Table of Contents


*Auditory and motion metaphors have different scalp distributions: An ERP study*

Gwenda L. Schmidt-Snoek, Ashley R. Drew, Elizabeth C. Barile and Stephen J. Agauas

*Action verbs are processed differently in metaphorical and literal sentences depending on the semantic match of visual primes*

Melissa Troyer, Lauren B. Curley, Luke E. Miller, Ayse P. Saygin and Benjamin K. Bergen

*How vertical hand movements impact brain activity elicited by literally and metaphorically related words: an ERP study of embodied metaphor*

Megan Bardolph and Seana Coulson

# Editorial: The Metaphorical Brain

Seana Coulson<sup>1</sup>\* and Vicky T. Lai<sup>2</sup>

<sup>1</sup> Cognitive Science Department, University of California, San Diego, La Jolla, CA, USA, <sup>2</sup> Department of Psychology, University of South Carolina, Columbia, SC, USA

Keywords: Alzheimer's disease, autism, embodiment, executive function, figurative language comprehension, hemispheric specialization, right hemisphere damage, schizophrenia

**The Editorial on the Research Topic**

### **The Metaphorical Brain**

Long considered a peripheral topic in linguistics, metaphor is increasingly viewed as a central feature of higher cognition and abstract thought. Investigation of the neural substrate of metaphor has, similarly, become more sophisticated, involving increasingly specific suggestions about the processes involved in its comprehension. This Frontiers Research Topic brings together contributions from a diverse array of cognitive neuroscientists to offer a snapshot of current research on the neural substrate of figurative language, and to present a number of avenues for future research. The result is an interdisciplinary perspective on the differences between literal and figurative language and how the underlying neurobiological processes can be investigated.

Indeed, most investigations into the neural substrate of metaphor ultimately concern the relationship between literal and metaphorical meanings. In their excellent review paper, Vulchanova and colleagues outline the arguments for and against the continuity thesis that literal and metaphorical language comprehension recruits essentially the same processing mechanisms. Using autism as a lens through which to consider the issue, they review data that indicate dissociations in the comprehension of literal and figurative language within individuals with ASD. Ultimately, they suggest figurative language deficits in ASD stem from the difficulty these individuals have integrating contextual information to build the situation model.

One source of support for the idea that literal and metaphorical comprehension processes recruit distinct neural substrates is the increasingly contentious claim that the right cerebral hemisphere (RH) plays a crucial role in the comprehension of metaphor, but not literal language. Ianni and colleagues note that much of the data supporting this claim comes from studies of brain-injured patients that have employed sub-optimal tasks for assessing metaphor comprehension. They present a novel test with fine-grained sensitivity to participants' ability to understand both literal and metaphorical language, along with data from three patients that demonstrate (i) comparable impairment on literal and metaphorical language, (ii) greater impairment for metaphorical than literal language, and (iii) selective impairment on metaphorical language.

Addressing the issue of hemispheric specialization in healthy adults, Lai and colleagues examine functional neuroimaging data as participants read literal and metaphorical sentences with varying degrees of familiarity. They found that decreasing familiarity (i.e., increasing novelty) of both literal and metaphorical language led to greater activation bilaterally, with more extensive recruitment of LH brain regions overall. However, the relative contribution of the RH was greater for novel metaphors, as a result of reduced LH activation for novel literal language.

Faust and colleagues utilize network theory in their discussion of hemispheric specialization for metaphor comprehension. In particular, they suggest that the LH exhibits semantic rigidity, manifested by networks in which each node is connected to a small number of other nodes. Rigid networks are well suited for the rapid retrieval of conventional meanings, but ill-suited for creating meanings needed for novel metaphors. The RH exhibits semantic chaos, manifested by highly inter-connected networks that enable fast connections between semantically distant concepts. Although inter-connectivity facilitates the comprehension of novel metaphors, its pathological extreme can be seen in schizophrenia and accompanying thought disorder.

### Edited and reviewed by:

Hauke R. Heekeren, Freie Universität Berlin, Germany

#### \*Correspondence: Seana Coulson scoulson@ucsd.edu

Received: 03 September 2015 Accepted: 11 December 2015 Published: 05 January 2016

### Citation:

Coulson S and Lai VT (2016) Editorial: The Metaphorical Brain. Front. Hum. Neurosci. 9:699. doi: 10.3389/fnhum.2015.00699

Mashal and colleagues examined brain activity as schizophrenics and age-matched controls read literal phrases, conventional metaphors, and novel metaphors. They found that novel metaphors elicited greater activity in the RH precuneus and superior parietal lobule (SPL) among schizophrenics than controls, and that greater activation in this brain region was correlated with better comprehension. In keeping with Faust and Kenett's suggestion that schizophrenia is associated with greater inter-connectivity in the semantic network, Mashal and colleagues found patients showed a greater degree of functional coupling between the precuneus/SPL and other language regions.

As is typical of studies of metaphor comprehension in schizophrenia, Mashal and colleagues found evidence for reduced comprehension in patients relative to controls. However, figurative language is diverse, and requires multiple mechanisms for its comprehension. Cognizant of this fact, Pesciarelli and colleagues investigated whether patients with schizophrenia can utilize both combinatorial mechanisms and the retrieval of stored meanings in their comprehension of idioms. They report evidence suggesting that the difficulty schizophrenic patients have understanding metaphors is less pronounced in the case of idioms for which they can rely on the retrieval of stored meanings.

Differences between the processing of metaphors and idioms are also supported by studies of healthy adults. Columbus and colleagues asked whether domain-general aspects of executive control influenced reading times for familiar and unfamiliar metaphorical sentences and idioms. They found that individuals with high executive control utilized context more efficiently than those with low executive control to commit to a metaphorical or literal reading of a target word. While executive control led to advantages for both familiar and unfamiliar metaphors, all participants read idioms efficiently, reinforcing the importance of retrieval mechanisms for idiom comprehension.

Metaphor comprehension is also impacted by individual differences in abstraction ability. Roncero and de Almeida examined figurative language processing in individuals with Alzheimer's disease (AD). In two studies they explored whether patients' abstraction ability was related to how well they interpreted metaphors, and whether saliency, as measured by aptness ratings as well as familiarity ratings, influenced patients' metaphor interpretation scores. Their findings suggest that patients with better abstraction ability produced better interpretations, and that aptness, not familiarity, predicted patients' metaphor interpretation ability.

Other investigators have examined the importance of concreteness for understanding metaphors. Forgács and colleagues compared ERPs to nouns in phrases that were metaphorical (fluffy speech), concrete literal (nasal speech), and abstract literal (ineffective speech). Whereas adjectives in literal phrases elicited typical ERP concreteness effects, nouns in metaphorical phrases did not. Paradoxically, when the novel metaphorical phrases were rated more concrete, the ERPs to the target nouns resembled those to nouns in abstract literal phrases. When the novel metaphorical phrases were rated more abstract, the ERPs to target nouns resembled those to nouns in concrete literal phrases. Results are argued to support a model in which the literal meaning from the concrete source domain in a metaphor is abstracted away from its physical sense before it is mapped to the abstract target domain.

Weiland and colleagues examined the priority of literal meaning during the processing of metaphor and metonymy. In this study, the literal meaning of the target expressions (e.g., These lawyers are hyenas) was induced via a briefly presented prime (furry) prior to the target (hyenas). At the target word they observed a reduction in the early ERP effect for both metaphors and metonymies and a delay in the late ERP effect only for metaphors. They suggested that the induced literal meaning facilitated the first stage of metaphor comprehension, which involves its literal sense.

Lakoff, in a theoretical contribution, outlined in brief his influential approach to metaphor, describing the existence of systematic metaphors in language, and arguing that they reflect regularities in the conceptual system that are in turn driven by experiential correlations encoded neurally via spike-timing-dependent plasticity. Lakoff details empirical support for his theory that stems from linguistics, psychology, and cognitive neuroscience. He dispels some unflattering and oversimplified readings of his account, and sketches the beginnings of a neural theory of metaphor.

Lakoff's call for further research is taken up by a number of other contributors to the volume. These researchers present original research testing some of the predictions of embodied metaphor theory, and its more general counterpart in embodied cognition. For example, Schmidt-Snoek and colleagues compared the event-related brain response to metaphors whose source domain evoked the auditory modality, as in "Her limousine was a privileged snort," or the motor modality, as in "The editorial was a brass-knuckle punch," and literal uses of the same words. They found that auditory words elicited a slightly different pattern of ERPs than did motor words, suggestive of non-overlapping neural generators. These findings fit with predictions from embodied metaphor theory that modality-specific activations contribute to both literal and metaphoric meanings of these words.

Troyer and colleagues examine whether videos and still images of point-light walkers impact reading times for sentences with action verbs. As one might expect from embodied models of cognition, the perception of visual motion does impact the processing of sentences with action verbs, but does so differently for literal uses, such as "The chemist was walking to his lab," and metaphorical ones, such as "The company was walking to its death."

Similarly, Bardolph and Coulson recorded ERPs as participants read words associated with different regions of vertical space as they moved marbles in an upward or a downward directed motion. Words such as "floor" and "ceiling" elicited very early congruency effects in the ERPs, consistent with the involvement of motor regions in registering the conflict between meaning and motion. Words such as "defeat" and "victory," whose vertical associations were metaphorical, elicited congruency effects that emerged much later, in keeping with a role in pragmatic inference.

By focusing on novel research on the neural basis of metaphor, this Frontiers Research Topic provides insight into the neurobiology of conceptual structure. The contributions highlight how metaphor comprehension reveals hemispheric differences in the organization of semantic memory, the importance of executive function for high-level language comprehension, and the differing roles of sensorimotor activations for concrete and abstract concepts. Beyond the individual contributions, we hope that this special focus will inspire future research on the neural underpinnings of metaphor in language, and metaphor in cognition more generally.

# AUTHOR CONTRIBUTIONS

Both authors contributed equally to the writing and editing of this review article.

# FUNDING

SC received support from the Kavli Institute for Brain & Mind, San Diego.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Coulson and Lai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

#### **Mila Vulchanova<sup>1</sup>\*, David Saldaña<sup>2</sup>, Sobh Chahboun<sup>1</sup> and Valentin Vulchanov<sup>1</sup>**

<sup>1</sup> Language Acquisition and Language Processing Lab, Department of Language and Literature, Norwegian University of Science and Technology, Trondheim, Norway

<sup>2</sup> Individual Differences, Language and Cognition Lab, Department of Developmental and Educational Psychology, University of Seville, Seville, Spain

#### **Edited by:**

Seana Coulson, University of California at San Diego, USA

#### **Reviewed by:**

Christelle Declercq, Université de Reims Champagne-Ardenne, France

Valentina Bambini, Institute for Advanced Study (IUSS), Italy

#### **\*Correspondence:**

Mila Vulchanova, Language Acquisition and Language Processing Lab, Department of Language and Literature, Norwegian University of Science and Technology, Edvard Bulls veg 1, 7491 Trondheim, Norway e-mail: mila.vulchanova@ntnu.no

This paper is intended to provide a critical overview of experimental and clinical research documenting problems in figurative language processing in atypical populations with a focus on the Autistic Spectrum. Research in the comprehension and processing of figurative language in autism invariably documents problems in this area. The greater paradox is that even at the higher end of the spectrum or in the cases of linguistically talented individuals with Asperger syndrome, where structural language competence is intact, problems with extended language persist. If we assume that figurative and extended uses of language essentially depend on the perception and processing of more concrete core concepts and phenomena, the commonly observed failure in atypical populations to understand figurative language remains a puzzle. Various accounts have been offered to explain this issue, ranging from linking potential failure directly to overall structural language competence (Norbury, 2005; Brock et al., 2008) to right-hemispheric involvement (Gold and Faust, 2010). We argue that the dissociation between structural language and figurative language competence in autism should be sought in more general cognitive mechanisms and traits in the autistic phenotype (e.g., in terms of weak central coherence, Vulchanova et al., 2012b), as well as failure at on-line semantic integration with increased complexity and diversity of the stimuli (Coulson and Van Petten, 2002). This perspective is even more compelling in light of similar problems in a number of conditions, including both acquired (e.g., Aphasia) and developmental disorders (Williams Syndrome). This dissociation argues against a simple continuity view of language interpretation.

**Keywords: figurative language, autism spectrum disorders, metaphors, idioms, impaired processing mechanisms**

### **INTRODUCTION**

Figurative language is a cover term for linguistic expressions whose interpretation is nonliteral, where the meaning of the expression as a whole cannot be computed directly from the meaning of its constituents. Figurative language can vary in type, in degree of extension from the literal, in degree of transparency, and in structure. Moreover, figurative expressions can range from a single word to a long sentence. They include a range of phenomena, such as metaphors, idioms, proverbs, humor and jokes, hyperbole, indirect requests, and clichés (Gibbs, 1999). Such expressions are characterized by interpretations which cannot be retrieved by simply knowing the basic senses of the constituent lexical items, and where the addressee needs to arrive at the intended meaning rather than what is being said literally.

It has been claimed that it is exactly the need to go beyond the literal interpretation and grasp the intended meaning that makes figurative language special and more demanding for processing (Levorato and Cacciari, 2002). Unlike literal language, such expressions depend more heavily on both linguistic and visual context, and are often difficult, if not impossible, to understand in the absence of such context. Still, in everyday communication much of the meaning is implied, and can be understood following linguistic and contextual cues (Coulson, 2005). It is this context sensitivity of natural language that has inspired the continuity claim that figurative language is not exceptional. From this perspective, all language and all its sentences are multiply ambiguous, whereby the content of all utterances largely underdetermines their interpretation (Gibbs, 1994; Sperber and Wilson, 2006). This approach suggests that figurative language is rather to be found on a continuum from literal to loose to metaphorical language and should not be considered as a departure from normal language use. While this is one of the more radical interpretations, all approaches arguing for a lack of exceptionality in figurative language maintain that it is pervasive both in language and in thought (Fauconnier, 1997, 2007; Lakoff and Johnson, 1980; Lakoff, 1987; Turner, 1991). If this is true, then figurative language is not a special form of language.

Yet, research in developmental disorders documents subtle dissociations between the ability to understand literal expressions and the comprehension of nonliteral (figurative) language. For instance, high-functioning individuals with autism with intact structural language skills often fail to understand the meaning of jokes, irony, and idiomatic language (Gold and Faust, 2010; Vulchanova et al., 2012a,b). Thus, they present a case against a simple continuum view of figurative language.

In this paper we present evidence from studies of figurative language processing in autism, arguing that this evidence calls for a revision of a simple continuum view. We first review issues of relevance to our main topic, such as how best to approach and understand the similarities and differences in the processing of literal and figurative language. For this purpose we start by discussing evidence from typically developing children and adults, and then move on to comment on the data from a population of special interest to figurative language, namely individuals with autism. We conclude by suggesting possible ways in which these data can be interpreted in the light of current cognitive accounts of autism and broader approaches to language comprehension.

### **FIGURATIVE LANGUAGE IN TYPICAL POPULATIONS**

#### **FIGURATIVE LANGUAGE ACQUISITION IN TYPICAL DEVELOPMENT**

Language development provides evidence of the somewhat different status of figurative language. It takes more time for children to begin to appreciate extended uses. According to Nippold (2006), the development of skills in processing metaphors, idioms and proverbs is an important part of semantic development. Compared to vocabulary acquisition and basic semantic skills, skills in the domain of figurative language emerge later. Thus, recent research (Levorato and Cacciari, 1995; Nippold, 1998, 2006; Kempler et al., 1999; Nippold and Duthie, 2003; Cain et al., 2009) suggests that the acquisition of idioms takes longer than vocabulary acquisition, and that it gradually takes off from age five onward.

Opinions and findings, however, diverge concerning the path of this development. Nippold (1998, 2006) and Nippold and Duthie (2003) assume that this is a gradual development, not essentially different from other lexical development, and that it continues into adulthood. However, Kempler et al. (1999) show that the understanding of idioms follows a non-linear path, very similar to the vocabulary burst between the second and the third year (Marchman and Bates, 1994; Bates and Goodman, 1997). Unlike vocabulary, however, with idioms this process takes approximately four times longer, with a peak at around 11 years (Vulchanova et al., 2011). In a study of 6- and 9-year-old children and adults, Laval and Bernicot (2002) provide evidence that only at age 9 can children start to appreciate and use context in idiom comprehension. Furthermore, only from this age on do children show sensitivity to frequency and familiarity.

The appreciation of figurative language in development requires coordination between cognitive, linguistic and pragmatic skills (Tolchinsky, 2004; Bernicot et al., 2007). Several factors play a role in the acquisition and comprehension of figurative language. Among the most salient ones for idioms, for instance, are frequency of the expression, transparency of its structure, the context in which it is encountered, and linguistic skills and competences (Nippold and Duthie, 2003). It is commonly agreed and has been demonstrated that metalinguistic awareness facilitates the understanding of figurative language, including idioms (Levorato and Cacciari, 2002; Nippold and Duthie, 2003; Nippold, 2006). It has also been shown that reading comprehension is a strong predictor of idiom comprehension (Levorato et al., 2004).

Bernicot et al. (2007) investigated the order of acquisition of different types of nonliteral language in children. They studied the relationship between the children's understanding and their meta-pragmatic competence, defined as the ability to distinguish between what is being said and what is meant in indirect language. In that study they looked at three different types of expressions: indirect requests, idioms and conversational implicatures, in a story completion task. Their results demonstrate that mastering advanced language skills and competencies, such as those required for figurative language processing, correlates with age. This may be attributed to the maturity needed for the processing of expressions offering increased complexity of the inference between the literal meaning (what we say) and the figurative one (what we mean).

#### **ACCOUNTS OF FIGURATIVE LANGUAGE PROCESSING**

##### **Metaphors and literal language**

Metaphors are by far the most "popular" paradigmatic example of figurative language. At the level of thought, conceptual metaphor is a cognitive process by which we represent an abstract concept in terms of a more concrete and tangible one. Metaphors establish (novel) links or mappings between mental domains or spaces, typically a source one and a target one (Fauconnier, 1985). As such, they are ways of thinking capturing generalizations about the world around us and our experience of it.

Theories of metaphors differ in how they assume metaphors are processed, and whether they consider them a departure from normal (literal) language or not. The standard pragmatic view assumes that metaphors are expressions processed via mechanisms different from those used for literal meanings. On this view, the literal meaning should be accessed and rejected before arriving at the intended (figurative) meaning. This implies that an inference is necessary to access the appropriate intended meaning (Grice, 1975). Many authors consider metaphors as "special" structures which are present in everyday language and change depending on time and culture (Lakoff and Johnson, 1980; Turner, 1991).

Alternatively, it is suggested that metaphorical and literal meanings are processed in parallel and also use the same mechanisms (Gibbs, 1994). Thus, for both lexical items and metaphorical language, processing interacts with information retrieved from the context (Gibbs, 1987, 1994). Gibbs et al. (1997) found that metaphors did not require more time to process than literal expressions. Furthermore, reaction times did not differ when the context was adequate. An important caveat here is that equivalent processing times need not reflect equivalent effort (Coulson and Van Petten, 2002; Bambini and Resta, 2012).

Coulson and Van Petten (2002) point to evidence from processing studies suggesting that metaphoric language places heavier demands on processing and requires additional effort for alignment and inference than literal language, not least by placing additional demands on working memory (Blasko, 1999). They further argue that the continuity claim should be distinguished from the equivalence claim, which assumes that metaphoric language is no more difficult to comprehend than literal language. They adopt a conceptual blending approach to metaphor (Fauconnier and Turner, 1998), which explains metaphor comprehension as a dynamic process that creates a blending space combining attributes from the source and target domains. Thus, interpretation arises as a result of selecting relevant properties of these domains and inhibiting aspects which are not relevant through a process of constant updating. In an ERP study that tested three types of expressions (sentences that ended with words used literally, metaphorically, and in an intermediate literal mapping condition), they document that metaphors elicit the greatest N400 effect, while literal mappings occupy a place between true metaphors and literal statements, judging by brain responses. This study thus provides evidence for the continuity claim, and, at the same time, shows that metaphors are indeed more demanding for processing, but in a gradient way. The authors suggest that metaphor "taxes" the system we use to understand figurative meaning for two basic reasons. On the one hand, one needs to establish a mapping between elements in distantly related domains (e.g., unlike metonymy), and on the other, to retrieve information from memory to integrate these elements. Other studies have shown similar results (Pynte et al., 1996).

A common problem in assessing results from research in figurative language processing, as observed in Pickering and Frisson (2001), is that word frequency, plausibility, and cloze probability have not always been controlled in studies reporting reading times for literal and figurative language. Such variables should be taken into account, since controlling for them might have produced different results when determining whether figurative language is more demanding than literal language.

Links between metaphors and other types of figurative expressions have been suggested. Gibbs (2003) argues that idioms, often considered as "dead" metaphors, in fact, offer a more dynamic metaphor-based processing. In another study, Gibbs et al. (1997) conducted a series of experiments using a priming method to investigate the role of conceptual metaphors in immediate idiom integration. The aim of the study was to establish whether conceptual metaphors were accessed faster in the context of idioms in discourse. Participants accessed conceptual metaphors more often for the purpose of understanding an idiom, and less so when they were processing literal expressions or literal paraphrases of idioms. Furthermore, this study demonstrated that people access the appropriate conceptual metaphor when they are integrating each specific idiom, and not a similar one with the same figurative meaning. This suggests that idioms with the same figurative meanings may be associated with different conceptual metaphors.

### **Processing of idiomatic expressions**

As a form of nonliteral language, idioms have attracted attention both in theoretical linguistics and in empirical psycholinguistic research, as a result of their specific nature, both in terms of structure and organization. Unlike regular phrases and expressions, idioms come largely in a "prepackaged" form, in which many, if not all, of their components cannot be freely replaced or supplemented. Idioms are expressions of varying degrees of frozenness and semantic transparency. On the one hand, they are retrieved from the lexicon because they have to be acquired and stored like lexical items, and, on the other, they are processed like structures generated by grammar (Jackendoff, 2002; Vulchanova et al., 2011). Due to this "double" nature or different levels of processing, understanding idioms may pose problems.

There are two kinds of theories regarding how idioms are processed and understood. According to the Lexical Representation hypothesis, idioms are stored as lexical items, and understanding an idiom involves two parallel processes, a retrieval process (which is faster), and a literal compositional computation process based on decomposing every element separately (Swinney and Cutler, 1979).

Hamblin and Gibbs (1999) highlight idiom decomposability and suggest that idiom interpretation depends on identifying the individual constituents, because most idioms are decomposable. It is thus suggested that the processing and understanding of idioms cannot be reduced to lexical access or lexical retrieval only (Cacciari and Tabossi, 1988; Gibbs, 1992; Vega-Moreno, 2001). This type of approach bridges over to the second type of approach, the configuration hypothesis. Authors who support it assume that idioms are represented in a distributed way and are processed as complex expressions (Cacciari and Tabossi, 1988). Tabossi et al. (2005) found that spoken idiom identification differs from word recognition. This means that the modality of presenting the idiom may affect the way we process these expressions as well.

A central question in all of the above approaches to idiom processing, but also more broadly to figurative language processing, is whether literal meanings are accessed first, or indeed whether they are accessed at all. Some authors reject the existence of literal or default meanings altogether (Sperber and Wilson, 2006), while others suggest a revision of the concept of literal meaning (Ariel, 2002). Recent experimental research, however, provides evidence of the existence of literal meaning, and support for a possible distinction between basic/literal senses and interpretations, and extended/derived ones. Foraker and Murphy (2012) investigated how polysemous senses are processed during sentence comprehension. In this study, in a condition where the context was neutral and did not bias towards a specific sense, participants read disambiguating sentences faster when these sentences were compatible with the dominant sense of target words. Rubio Fernandez (2007) provides further evidence of the "lingering" presence of literal meaning in the processing of figurative language in the domain of metaphors, where core features of word meaning remained activated even after the metaphorical meaning was retrieved.

Idioms are easier to understand in the presence of supportive context. It has been commonly established that the main role of context is to provide semantic support for decoding the target (appropriate) meaning of a sentence or an expression (Cacciari and Levorato, 1989; Gibbs, 1991; Levorato and Cacciari, 1995; Vega-Moreno, 2001; Laval, 2003).

Several factors influence idiom comprehension. Idiomatic expressions can vary in transparency, and more transparent expressions are much easier to understand than opaque ones. Another factor is familiarity: many studies establish that a higher degree of familiarity increases performance, leading to better results in different comprehension tasks (Gibbs, 1991; Levorato and Cacciari, 1995; Nippold and Taylor, 2002; Lacroix et al., 2010).

We suggest here that competence in figurative language is characterized by the ability to process language beyond the literal interpretation of individual words. This competency relies both on inferencing skills and on the ability to integrate contextual information from both verbal and nonverbal sources. We expand this idea further in the next sections.

While much of extended language use goes unnoticed by typical native speakers of a language, and, as such, may appear part of normal communication, it may pose severe problems for children and adults with developmental deficits. Such populations offer a glimpse into subtle dissociations between literal and nonliteral (figurative) language. For instance, in the autistic spectrum, even high-functioning individuals are often described as overly literal and often fail to appreciate figurative expressions. Such dissociations speak against the view that there exist no basic senses of lexical items (Sperber and Wilson, 2006), since these senses appear to be the only ones available to these individuals. This is often displayed in problems in the autistic spectrum with resolving linguistic ambiguity. We devote the rest of this paper to analyzing this issue and reviewing what data from individuals with autism can tell us about the true nature of figurative language.

### **FIGURATIVE LANGUAGE IN AUTISM SPECTRUM DISORDER**

Autism spectrum disorder (ASD) is a disorder characterized by impairments in social interaction and communication, and restricted behavior and interests. The impairment in social interaction can be manifested in marked deficiencies in the use of eye contact, reading facial expressions, emotions, body posture, and gestures. Failure to develop peer relationships, lack of spontaneous seeking to share enjoyment, interests, or achievements with other people and lack of social or emotional reciprocity, are also typical in autism.

Regarding the impairments in communication, the most common problem, even in individuals with adequate structural language, is the inability to initiate or sustain a conversation with others, including inability to maintain a topic shared with the interlocutor. In addition, ASD individuals frequently have a very stereotyped and repetitive use of language, thus leaving no room for spontaneity.

Individuals with ASD also show other restricted repetitive and stereotyped patterns of behavior, interests and activities. They may present encompassing preoccupation with one or more restricted patterns of interest. They are often characterized by lack of flexibility and adherence to specific, nonfunctional routines or rituals, repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-body movements). Often they display persistent preoccupation with parts of objects (e.g., wheels of a toy car).

### **POOR FIGURATIVE LANGUAGE IN ASD**

In typical language development, the acquisition of metalinguistic skills and the comprehension of figurative language seem to be achieved in childhood by the age of nine or ten years according to several authors. However, in ASD this process is typically delayed and depends on various factors, such as degree of language impairment, chronological age, context or social environment. Findings in research suggest that there is a delayed rate of development with regard to processing of ambiguity, idioms, metaphors and other types of figurative language in individuals with autism, and problems at more global levels of language structure, although performance may improve with age (Melogno et al., 2012a,b; Vulchanova et al., 2012a,b).

ASD is a disorder that significantly affects language and communication, and many individuals with ASD do not develop fluent language due to comorbidity with other impairments, such as intellectual disability or language disorder (LD). When LD is comorbid with autism, there are serious difficulties in understanding ambiguous linguistic information, as would be expected. In contrast, individuals with high-functioning autism are distinguished by relative preservation of linguistic and cognitive skills. They usually display a level of intelligence which is normal or even above average, and quite often have specific talents in certain areas. However, problems with pragmatic language skills have also been reported in their case, even with clear strengths in areas of grammar (Landa, 2000; Volden et al., 2009; Vulchanova et al., 2012a,b). Such dissociations in ASD between literal and figurative language argue against a simple continuity model.

One of the first studies to address figurative language comprehension and its roots in autism was the study by Happé (1995). It compared three groups of children with autism to a group of age- and VIQ-matched controls on the understanding of three types of expressions: synonyms, similes, and metaphors. In order to test the hypothesis that metaphor comprehension correlates with the ability to read minds and colocutors' intentions, participants were tested on both first-order and second-order Theory-of-Mind (ToM) tasks. This study was inspired by ideas from Relevance Theory (Sperber and Wilson, 1986) and the aim was to put the basic assumptions of this account to the test, by investigating its predictions on the well-known problems in aspects of pragmatic language in autism. The basic idea with the language stimuli used in the study, administered as a sentence completion task, was that there is a gradation of difficulty in processing language, ranging from full transparency in literal expressions to close to full transparency (similes), to nontransparency (metaphors). The findings from this study confirm that metaphor comprehension is impaired in children and adolescents with autism, against adequate processing of similes, and that the ability to process metaphors is directly linked to ToM ability. Thus, the ASD participants who passed both first- and second-order ToM tasks outperformed both participants who solved only the first-order tasks and those who did not pass either task.

The ToM account of the well-attested problems in autism with metaphor comprehension was tested further by Norbury (2005). In her study, an alternative hypothesis was put forth, namely that language competence is a better predictor of performance on metaphor tasks. For this purpose Norbury tested ASD children first grouped according to language ability and autistic symptomatology, and in a second analysis on their ToM ability. Both types of groups were compared to typically developing age-matched peers. The study included a number of tasks to establish language status: the British Picture Vocabulary scales (BPVS; Dunn et al., 1997), the Concepts and Directions subtest of CELF-III (Semel et al., 2000), and the Recalling sentences subtest of CELF-III. Semantic knowledge was tested on the Test of Word Knowledge (ToWK; Wiig and Secord, 1992), which includes synonyms, figurative language interpretation (idiomatic phrases), word definitions and word ambiguity (polysemy) testing. The results of this study demonstrate that semantic ability, which is a core language skill, is a better predictor of metaphor comprehension, whereas ToM, even though it predicts a proportion of the variance, is a weaker predictor of figurative language processing. Thus, only the children with a language impairment, with and without autism, showed impaired metaphor understanding. Furthermore, first-order ToM skills, while probably necessary, are not sufficient to ensure success with figurative language interpretation. These results, however, should be interpreted with caution, since the test used to assess semantic knowledge (ToWK) includes a number of subtests assessing figurative language comprehension, and as such, should, by definition, predict performance on metaphorical expressions (Rundblad and Annaz, 2010).

While the above two studies have only investigated metaphors, further research has targeted a broader range of figurative expressions. MacKay and Shaw (2004) report performance on six categories of expressions, including hyperbole, indirect requests, irony, metonymy, rhetorical questions and understatement. The assumption in their design is that all of these categories involve interpreting what is intended, rather than said, in each expression. For this purpose, the authors used two measures, correct understanding of the meaning of the expression, and correct understanding of the intent of the speaker. In this study language stimuli were accompanied by supporting picture material to provide visual focus for the participants. The experimental group included high-functioning ASD children compared to a control group with no communicative difficulties. In addition to the statistical analysis of results, the study also includes many examples of children's responses illustrating the specific pattern evident in the autistic spectrum in interpreting figurative expressions. This study documents a scale of difficulty in the area of figurative language, with irony standing out as (the most) challenging task, even for typical children, especially processing the intent of the speaker, and fewer problems in rhetorical questions, even for autistic children, where the meaning of such questions is accessible. Areas where a significant difference was observed between the typically developing children and the ASD group include indirect requests (intent), understatements (intent), as well as metonymy (both meaning and intent). The latter category was problematic even in the presence of visual support cues, and especially when the visual cues were less suggestive. Based on the finding that ASD children performed at the same level as the control group on understanding the meaning of certain figurative expressions (indirect requests, rhetorical questions, understatement), but failed at understanding the intent of those same expressions, the authors suggest that this result may be caused by different levels of language competencies and skills in the two groups, not evident in the results on the vocabulary scales (BPVS). Unfortunately, this study cannot be compared to the above two, since it did not address metaphors (but only metonymy), and did not ask the same question, namely the extent to which figurative language interpretation depends on ToM ability. Yet, it establishes a scale of difficulty in the processing of indirect language and compares performance by ASD children to typically developing peers in a range of figurative expressions.

More recently, Whyte et al. (2014) studied idioms in children with ASD aged 5–12 years, testing them on idiom comprehension, advanced ToM, vocabulary, and syntax. As in the other studies on figurative language, the children with ASD performed worse than children matched on chronological age. They were not, however, worse at understanding idioms than a syntax-matched control group of younger children. These results would support Norbury's (2005) view that language impairment is actually the strongest factor in predicting performance on figurative language tasks.

Beyond group studies, a couple of case studies have addressed figurative language in autism. Melogno et al. (2012a) provide a case study of two high-functioning ASD children. Participants were assessed twice, first prior to, and subsequently following, an intervention. Even though the two ASD children initially showed performance comparable to the average range of typical controls, their patterns of response were different. Assessment after intervention revealed improvement, but in a different way for each participant. Moreover, the level of performance was still below their chronological age, indicative of a "drift" in figurative language comprehension.

In our own work we have addressed the cognitive and language profiles of two high-functioning (Asperger) children with a talent for language learning (Vulchanova et al., 2012a,b). These two case studies tested, among other language competencies, idiom comprehension and metaphor comprehension. Both participants in the studies displayed a highly deviant profile in idiom knowledge compared to similarly-aged controls. In the younger participant, the gap in performance with chronological age was very large (*z* = −3.08). Moreover, the participant performed more poorly even than much younger children on the same task, suggesting a deviant developmental trajectory. Even though the gap with chronological age was somewhat smaller for the older participant (*z* = −2.22), it was still significant. The same participant showed an atypical pattern of responses to metaphorical expressions based on the design by Gold et al. (2010). In contrast to typical age-matched controls, for this participant, reaction times to novel metaphors and nonsense expressions were similar, reflecting a problem in distinguishing between these two types of expressions and assessing their plausibility.
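To give a rough sense of the size of the two *z*-scores reported above, the following minimal sketch converts them to approximate percentile ranks. It assumes, purely for illustration, that control scores are normally distributed and that *z* is the usual standardized distance from the control mean; the source does not state how the scores were computed, and the function and variable names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# z-scores reported in the text for the two case-study participants.
# Assumption: z = (participant score - control mean) / control SD.
reported = {"younger participant": -3.08, "older participant": -2.22}

# Percentile rank under the (assumed) normal distribution of control scores.
percentiles = {who: 100.0 * normal_cdf(z) for who, z in reported.items()}

for who, pct in percentiles.items():
    print(f"{who}: z = {reported[who]:+.2f}, approx. percentile = {pct:.2f}")
```

Under these assumptions, the younger participant's score would fall at roughly the lowest 0.1% of the control distribution and the older participant's at roughly the lowest 1.3%, which illustrates why both gaps are described as substantial.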

Some studies have failed to find significant differences in accuracy scores for participants with autism in studies of figurative language comprehension. For example, Colich et al. (2012) tested whether children and adolescents with autism were able to interpret the ironic intent of speakers. Although both typically developing and autistic participants showed longer response times to ironic comments, the brain activation profile was more bilateral in the ASD group, indicating a potential compensation mechanism in processing this kind of figurative language. The study by Pexman et al. (2011) also found similar responses in an irony comprehension task, with results in eye-tracking variables and judgment latencies indicating that individuals with autism might be using different mechanisms to respond.

## **DEVELOPMENTAL ASPECTS OF FIGURATIVE LANGUAGE IN ASD**

In addition to the role of structural language abilities, a great deal of research has explored the influence of other variables that could potentially explain this deficit in the interpretation of nonliteral meaning in participants with autism. In this sense, an interesting question is whether these skills develop in relation to chronological age also in this population, and further whether there is development in relation to mental age. This was first studied from a developmental perspective by Rundblad and Annaz (2010). They compared performance on metonymy to metaphor performance in ASD and typical children. While metaphor is considered to represent a mapping between two distinct conceptual domains, metonymy is a mapping within the same conceptual domain (Lakoff, 1987), and, as such, may be considered less demanding. The study included picture stories with lexicalized metaphors and lexicalized metonymies incorporated in brief stories. The authors established developmental trajectories for each group, and for each task, first assessing performance on the two tasks relative to chronological age. While for the typical group performance on both metonymy and metaphor increased reliably with chronological age, no reliable correlation was found between scores and chronological age on either task in the ASD children. In this group, in addition, children performed significantly worse on the metaphor task. These two tasks also revealed two different trajectories. While for metonymy there was a development, for metaphor, performance was constant across time and ages, indicating what the authors label a zero trajectory. Furthermore, this study shows that vocabulary scores reliably predict metonymy performance in the ASD group, with improved performance with higher verbal age, while no similar relationship was found for metaphor comprehension.
Thus, compared to typical controls, the ASD group displays a similar rate of development for their level of receptive vocabulary in the area of metonymy, whereas their metaphor comprehension differs significantly. This study adds an important developmental perspective, suggesting that metaphors are an area of specific difficulty where development is not only delayed (as with metonymy), but also highly atypical and, most likely, compromised.

Gold et al. (2010) studied metaphor comprehension in Asperger syndrome in adolescent and adult participants using four types of expressions: free (literal) expressions (*pearl necklace*), conventional metaphors (*sealed lips*), novel metaphors (*firm words*), and nonsensical expressions (*violin tiger*). Their main goal was to establish the accuracy of interpreting such expressions and the degree of cognitive load involved, as measured by reaction times and brain activation. The results of this study showed that Asperger individuals present with problems, as reflected in significantly longer reaction times compared to typical controls. Furthermore, different patterns of activation, as seen in the N400 amplitude, were found between the Asperger participants and the control group, which reached significance for the categories of conventional and novel metaphor. While in the control group conventional metaphors elicited least negativity, for the Asperger group it was literal expressions. This suggests greater effort in the processing of metaphor across the board in ASD individuals, including even conventional metaphors, which can be stored, as well as novel metaphorical expressions. Moreover, in the Asperger group reaction times were significantly longer for the processing of nonsense expressions. The latter result is indicative of a specific problem evident in other studies of figurative language in autism, namely the inability to assess the plausibility of linguistic expressions, events or facts (Tager-Flusberg, 1981; Paul et al., 1988).

An interesting study provides evidence of a specific dissociation between the processing of visually presented metaphors and verbally presented ones in autistic populations (Mashal and Kasirer, 2012). This study compared ASD and Learning Deficit children to typically developing controls. They used 11 subtests, ranging from figurative language interpretation including visual metaphors, idioms, conventional metaphors, novel metaphors, to homophones and semantic tests (synonymy, similarity). The authors analysed the data using a Principal Component Analysis to investigate for clustering of performance results. Results loaded on three different factors in all three groups in the study, and while there was a significant overlap in the loading between groups, both deficit groups displayed a clustering of all three verbal figurative language skills (idioms, conventional and novel metaphors) in the same factor, suggestive of the specific problems in that population. Moreover, in those two groups there was a dissociation between metaphors presented visually, and those presented verbally, as reflected in the results loading on two different factors. In contrast, the typical children displayed an association between idioms and conventional metaphors, which can be expected, given the nature of conventionalized expressions and idioms, and an association between visual and novel metaphors, which loaded together on a separate factor. This latter result suggests an integration in typical individuals of the processing of metaphors, irrespective of their modality (visual or verbal), most probably through a common underlying cognitive mechanism. This does not appear to be the case in the autism group and the group with learning deficits.

All of the clinical studies reviewed here document a dissociation between literal and figurative language in autism, which argues against a simple continuity model. Clearly a revision of this model is called for in the face of these data.

### **ACCOUNTS OF THE PRAGMATIC DEFICIT: A SPECIFIC DEFICIT OR NOT?**

From the studies reviewed above and earlier findings, it becomes evident that there is a pervasive problem in the autistic spectrum in the broader domain of pragmatic aspects of language. However, there is a debate concerning the causes of this problem and what aspects of the autistic profile can account for the pragmatic deficit. One assumption is that the pragmatic deficit is not special and does not dissociate from the rest of language competence in autism. The idea is that performance on pragmatic tasks and the ability to process (ambiguous) language in context correlate directly with structural language competence (Norbury, 2005; Brock et al., 2008; Gernsbacher and Pripas-Kapit, 2012).

Alternatively, the pragmatic deficit can be linked to other traits in the autistic profile. Thus, one of the most widely accepted theories of what causes the deficit in the domain of figurative language, and metaphors in particular, is based on Happé's study and the hypothesis that the deficit stems from impaired mentalising skills, that is, impaired ToM (see also Baron-Cohen et al., 1985; Happé, 1993; Baron-Cohen, 2000, 2001). Clearly, the ToM hypothesis can explain one aspect of what is necessary to be able to perceive others' intentions, including those expressed verbally. Yet, many studies reveal increased problems with decrease in the transparency of the mapping between language structure and (intended) interpretation (MacKay and Shaw, 2004). All studies document a specific problem in the area of metaphors, even compared to closely related, but less demanding, phenomena, such as metonymy. This indicates that reading intentions (mentalising) needs to be operationalized accordingly on a finer scale of gradience, explaining difficulties and/or success in all types of figurative expressions.

A host of hypotheses attempts an account in terms of more general cognitive mechanisms dedicated to information processing. Some authors attribute the deficit to more general problems in executive functioning and the inability to suppress unnecessary information (Ozonoff et al., 1991; Mashal and Kasirer, 2012). This account links to the well-observed problem in assessing event plausibility (Tager-Flusberg, 1981), but also to the Weak Central Coherence account (Frith, 1989; Frith and Happé, 1994; Happé and Frith, 2006). Happé and Frith (2006) suggest that individuals with ASD have difficulty understanding metaphors because they have a deficit in executive function and central coherence. This can be attributed to the fact that individuals with ASD display a bias for processing information locally rather than globally. Frith (1989) points out that in order to be understood, a word or an expression should be put in a concrete context. Context is even more important for figurative expressions, in order to process the intended meaning, rather than just the literal one. In fact, weak central coherence has been proposed as the source of pragmatic problems in individuals with ASD (Noens and van Berckelaer-Onnes, 2005). In addition, Norbury and Bishop (2002) found that people with ASD have difficulties in contextual integration, and the more ambiguous the expression is, the greater the problem in this population (Happé, 1997; Jolliffe and Baron-Cohen, 1999, 2000; López and Leekam, 2003; Brock et al., 2008).

Other accounts seek to explain the pragmatic deficit at the neural level in terms of a right hemisphere (RH) deficit (Gold and Faust, 2010; Gold et al., 2010). In one study, participants were asked to perform a semantic judgment task. The results indicated much less RH contribution to novel metaphor comprehension in ASD. Impaired RH activity was further documented in other studies of figurative language processing (Faust and Mashal, 2007).

Alternatively, it can be assumed that the inability to process figurative language arises from problems in information integration, especially when information is to be retrieved from multiple sources (e.g., problems with processing in context), and this can be linked to the more general deficit in global (top-down) processing at the expense of enhanced local (bottom-up) processing. Of special interest here is that the well-documented problems in processing ambiguous information arise only in the context of language as opposed to visual information (López and Leekam, 2003), and dissociate from structural language skills (Vulchanova et al., 2012a,b). Furthermore, there is evidence that visual and linguistic metaphors dissociate only in autistic participants, but not in typical children (Rundblad and Annaz, 2010). Therefore, it would be logical to conclude that the difficulties that people with autism demonstrate in figurative language are probably due to an inability either to access both modalities at the same time or to integrate information from more than one modality at the same time. While the visual context may assist interpretation in typical populations, it may create additional problems in deficit populations such as individuals with autism (Chahboun et al., in preparation).

Indeed, one of the main symptoms of ASD is the lack of information integration and absence of adaptability to the environment (Minshew et al., 1997; Brock et al., 2008). Many authors attribute this to the inability to gather together information in order to be able to distinguish between relevant and irrelevant information, in part attributable to weak central coherence (Frith, 1989; Happé and Frith, 2006; Vulchanova et al., 2012b). Selecting relevant features of the metaphor vehicle concept and suppressing the irrelevant ones has been suggested as the basic mechanism in metaphor comprehension (Rubio Fernandez, 2007).

It is widely argued that individuals with ASD are impaired in processing ambiguous linguistic information in context (López and Leekam, 2003; Brock et al., 2008). In addition, they often fail to attach context to their memories and are specifically impaired in processing social aspects of contextual information (Greimel et al., 2012).

Happé (1996) suggests that difficulties in global processing could be due to conceptual semantic deficits, but also to a failure in extracting perceptual properties from context. López and Leekam (2003) provide evidence that the ability to use context is spared in the visual domain, but reduced in the verbal one. Further, they document increased problems with increased complexity of the verbal stimuli and with higher level of ambiguity. This points to the limitations in the ability to use contextual information in individuals with ASD, but not a complete absence of this skill.

The extent to which individuals with autism can use context in disambiguation is an open question, and findings are controversial. Some authors consider that people with ASD are unable to use contextual information in sentence-processing tasks. Still others claim that success or failure depends on the nature of the context: the more general the information provided by the context is, the more difficult it is for autistic participants to disambiguate homographs (Hermelin and O'Connor, 1967; Frith and Snowling, 1983; López and Leekam, 2003).

Saldaña and Frith (2007) and Tirado (2013) document that children with ASD have a normal reduction in reading times for expressions which are congruent with previous events, suggesting a relative strength at detecting congruence. It has also been argued that the ability to use context depends on structural language skills, and only ASD participants with poor language skills fail to use visual context (Norbury, 2005; Brock et al., 2008). What is clear from these studies is that different types of context present different processing demands, and autistic performance varies accordingly.

### **SUMMARY AND CONCLUSIONS**

Language is a complex, multi-layered and multifaceted system. In order to interpret language appropriately, users need a number of skills. Figurative language can be even more demanding in terms of processing. It is acquired relatively late and has a complex nature, which makes it even more difficult for atypical populations, such as individuals with ASD, to understand. What skills are deemed necessary for language processing, and for figurative language in particular? These include adequate structural language competence; adequate semantic competencies and skills and vocabulary size (Norbury, 2004, 2005; Oakhill and Cain, 2012); inferencing skills; a developed conceptual system and a knowledge base (Schneider et al., 1989; Fuchs et al., 2012; Oakhill and Cain, 2012); information integration skills (context; evaluating plausibility and suppressing irrelevant information, Rubio Fernandez, 2007); and mentalising and understanding intentions (see Kintsch, 2000 for a computer simulation model). Needless to say, many of these skills co-vary with language (e.g., semantic skills and vocabulary, conceptual knowledge, and the knowledge base are often directly associated with linguistic labels), so studying them in isolation and controlling for their impact on figurative language depends on the kind of measure adopted. Impairment in any one of these areas is sufficient to cause problems in the comprehension of figurative language. For instance, in order to understand one of the most demanding instances of figurative language, metaphor, the user not only needs to have prior experience and knowledge of the concepts that are being associated in a metaphorical expression, but also knowledge of their respective domains and the networks they form with other concepts in these domains (Keil, 1986; Bambini et al., 2011).
This requires information integration and processing skills, beyond those required for simple concept combination (Barsalou, 1999; Wu and Barsalou, 2009), and depends on the ability to form associations, analogies and other top-down skills. If we take high-functioning autistic individuals as a test case, the cause of the persistent difficulty in figurative language becomes more evident. In this population, structural language is intact; they present with adequate semantic and conceptual skills, are good at compositional operations at the level of the sentence, perform adequately at a number of inferencing tasks (Tirado, 2013 PhD thesis), and usually pass first-order, and often second-order, ToM tasks, and have an age-appropriate knowledge base, as attested by normal IQ scores. The only area where problems persist is information integration and inability to use information from the database adequately: evaluating plausible/implausible events; assessing what is relevant; combining information arising from different modalities.

Building on the original proposal by Kintsch (1998), an influential account of (reading) comprehension suggests that success at language processing depends on creating appropriate situation models. This means that the language user needs to create a mental representation of what the message is about, not what the message says (Zwaan, 1999). Based on the evidence in research on problems in the domain of figurative language interpretation, it is highly likely that autistic individuals have problems in building and making use of appropriate situation models. The models they build could in some respects be incomplete. More importantly, they might not be able to make use of them to understand the co-locutor's intention with the message. It seems as though they possess the necessary knowledge base, but cannot use it adequately, since they cannot judge plausibility (Tager-Flusberg, 1981; Paul et al., 1988), often fail at certain types of inferences, and are not always good at exploiting contextual information. It has been shown that typical children benefit from visual support and are better at processing visually presented metaphors (Epstein and Gamlin, 1994). However, multi-modal and multi-sensory information appears to be a problem in autism, despite intact visual processing *per se* (López and Leekam, 2003; Chahboun et al., in preparation). As a consequence, individuals on the autistic spectrum fail to build a situation model that integrates the necessary information, the speaker's intent and the rest of the context in which all this must be used.

### **DIRECTIONS FOR FUTURE RESEARCH**

Most of the studies reviewed in this paper are heterogeneous and difficult to compare. They use different methodologies, test skills in different figurative language domains, and often include largely heterogeneous groups of participants. Thus, quite often the range of participants is from mid-childhood to adulthood. Since one of the intriguing questions in research in developmental deficits is whether one can expect development in the comprehension of the different types of figurative expressions, more homogeneous groups are required, in both longitudinal and cross-sectional studies (cf. Melogno et al., 2012b for a similar point). Similarly, the types of expressions selected in those studies vary tremendously, especially those that have been chosen as exponents of the target category. For instance, the degree to which expressions fall in the category of conventional metaphor needs to be tested prior to including them in an experiment. Likewise, other linguistic properties of the stimuli are important: the frequency of the expression and/or its constituent words will affect processing, as will the collocational frequency of the constituents in the expression (e.g., "buckle" and "button" are by far the most frequent complement fillers of "fasten", as in fasten a buckle/a belt, so these phrases tell us little about argument structure competence in typical and deficit populations alike).

The observed dissociation between figurative (non-literal) and literal language processing in ASD lends support to findings about the neural correlates of idiomatic language processing in typical adult populations (Lauro et al., 2008), suggesting a bilateral involvement of fronto-temporal areas for idioms against selective activation of left inferior parietal areas in the case of literal expressions. The recruitment of the prefrontal cortex may reflect an active selection between alternative meanings when idioms are processed. This offers a new perspective for future research comparing the neural and cognitive mechanisms involved in figurative language comprehension in autism and typical populations.

Another intriguing line of research is offered by recent accounts of the role of embodiment in human cognition and, specifically, in language comprehension (Barsalou, 2008). It has been suggested that the well-attested communication problems in autism could be partially driven by core (low-level) cognitive mechanisms, such as deficits in temporal coordination and sensori-motor impairment (e.g., motor movement). This type of account is consistent with models of embodied cognition in typical populations and is worth pursuing in future research (Eigsti, 2013).

An interesting, yet unexplored, perspective is offered by parallel studies of similar pragmatic deficits observed in different developmental disorders. For instance, Lacroix et al. (2010) document problems in idiom comprehension in French-speaking children and adolescents with Williams syndrome (WS). Similar results have been found while testing the ability to understand metaphors and sarcasm (Karmiloff-Smith et al., 1995; Annaz et al., 2009). Since WS is characterized by a relative strength in language and social interest, but poor conversational skills, contra impaired spatial cognition, it would be interesting to test how this population compares to the autistic spectrum, especially the higher end, where structural language is spared, too. Even more intriguingly, it has been suggested that the observed figurative language problems in WS may be attributed to poor semantic integration (Hsu, 2013).

Finally, if we are right in attributing the figurative language deficit to poor information integration and impaired situation models, appropriate tasks need to be set up to test for exactly these types of skills. Developmental deficits offer a rare glimpse into the sometimes subtle dissociations between and within cognitive domains, such as structural vs. extended (figurative) language, and as such, can shed light on how metaphors and other figurative expressions are processed in typical individuals, what kinds of demands this processing requires and at what cost. Future research should seek to provide a consistent, comprehensive account of the mechanisms involved in language comprehension at the neural and cognitive levels in both typical and deficit populations (Dilkina and Lambon Ralph, 2013).

# **ACKNOWLEDGMENTS**

This work has been supported by the EU 7th Framework Programme Marie Curie Initial Training Networks grant Nr. 316748 under the project *Language and Perception* and Grant PSI2010-17401 of the Spanish Ministry of Science and Innovation to the second author.

# **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

### *Received: 14 April 2014; accepted: 11 January 2015; published online: 17 February 2015*.

*Citation: Vulchanova M, Saldaña D, Chahboun S and Vulchanov V (2015) Figurative language processing in atypical populations: the ASD perspective. Front. Hum. Neurosci. 9:24. doi: 10.3389/fnhum.2015.00024*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2015 Vulchanova, Saldaña, Chahboun and Vulchanov. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Flying under the radar: figurative language impairments in focal lesion patients

## *Geena R. Ianni<sup>1†</sup>, Eileen R. Cardillo<sup>2\*†</sup>, Marguerite McQuire<sup>2</sup> and Anjan Chatterjee<sup>2</sup>*

<sup>1</sup> Section on Neurocircuitry, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA <sup>2</sup> Department of Neurology, Center for Cognitive Neuroscience, University of Pennsylvania, Philadelphia, PA, USA

### *Edited by:*

John J. Foxe, Albert Einstein College of Medicine, USA

### *Reviewed by:*

Krishnankutty Sathian, Emory University, USA Christelle Declercq, Université de Reims Champagne-Ardenne, France

### *\*Correspondence:*

Eileen R. Cardillo, Department of Neurology, Center for Cognitive Neuroscience, University of Pennsylvania, 3720 Walnut Street B-51, Philadelphia, PA 19104-6241, USA

e-mail: eica@mail.med.upenn.edu

†Geena R. Ianni and Eileen R. Cardillo have contributed equally to this work.

Despite the prevalent and natural use of metaphor in everyday language, the neural basis of this powerful communication device remains poorly understood. Early studies of brain-injured patients suggested the right hemisphere plays a critical role in metaphor comprehension, but more recent patient and neuroimaging studies do not consistently support this hypothesis. One explanation for this discrepancy is the challenge in designing optimal tasks for brain-injured populations. As traditional aphasia assessments do not assess figurative language comprehension, we designed a new metaphor comprehension task to consider whether impaired metaphor processing is missed by standard clinical assessments. Stimuli consisted of 60 pairs of moderately familiar metaphors and closely matched literal sentences. Sentences were presented visually in a randomized order, followed by four adjective-noun answer choices (target + three foil types). Participants were instructed to select the phrase that best matched the meaning of the sentence. We report the performance of three focal lesion patients and a group of 12 healthy, older controls. Controls performed near ceiling in both conditions, with slightly more accurate performance on literal than metaphoric sentences. While the Western Aphasia Battery (Kertesz, 1982) and the objects and actions naming battery (Druks and Masterson, 2000) indicated minimal to no language difficulty, our metaphor comprehension task indicated three different profiles of metaphor comprehension impairment in the patients' performance. Single case statistics revealed comparable impairment on metaphoric and literal sentences, disproportionately greater impairment on metaphors than literal sentences, and selective impairment on metaphors. We conclude our task reveals that patients can have selective metaphor comprehension deficits. 
These deficits are not captured by traditional neuropsychological language assessments, suggesting overlooked communication difficulties.

**Keywords: metaphor, aphasia, focal lesion patients, figurative language, case study, sentence comprehension**

# **INTRODUCTION**

Metaphor is pervasive in everyday language, and often used to communicate complex, abstract, or unfamiliar concepts. Individuals encounter metaphors on a daily basis in the classroom (*The Bohr model atom is a tiny solar system*), in their social lives (*Our first date was a train wreck*), and in the media (*Congress froze the budget*). As a communication device, metaphor is practical, allowing familiar information to sculpt and inform new concepts. Conceptualized this way, metaphor is fundamental to the flexibility of human thought, revealing novel commonalities, facilitating learning, and enabling abstraction (Lakoff and Johnson, 1980; Gentner, 1983).

Despite the ubiquity of metaphor in thought and language, its neural instantiation remains uncertain. In an early formal demonstration of metaphor deficits following brain injury, Winner and Gardner (1977) found that right-hemisphere damaged (RHD) patients, but not left-hemisphere damaged (LHD) patients or healthy controls, had difficulty matching metaphoric sentences to pictures, suggesting the right hemisphere was uniquely tuned for metaphor comprehension.

Several subsequent patient studies supported this claim (Brownell et al., 1984, 1990; Van Lancker and Kempler, 1987; Mackenzie et al., 1999; Champagne et al., 2004; Klepousniotou and Baum, 2005a,b). However, in some of these cases only RHD patients and controls were tested, providing no means of comparison between the hemispheres (Mackenzie et al., 1999; Champagne et al., 2004, 2007; Rinaldi et al., 2004) or RHD patients who performed at ceiling were excluded from analyses (Brownell et al., 1990). These studies sometimes also contained few items (e.g., as few as three or four in Brownell et al., 1990; Tompkins, 1990; Giora et al., 2000; Zaidel et al., 2002), showed that impairment depended on task (Winner and Gardner, 1977), or failed to show any hemispheric differences when task demands were accounted for statistically (Zaidel et al., 2002). Nonetheless, the first neuroimaging study of metaphor comprehension supported the right-hemisphere hypothesis (Bottini et al., 1994), bolstering the tentative claims made by the patient studies. Thus, the prevailing view became that metaphor comprehension was a lateralized, right hemisphere dominant process.

Many subsequent neuroimaging studies of metaphor comprehension, however, have failed to find the right-lateralized activations predicted by the right-hemisphere hypothesis of metaphor comprehension. Most studies report activation in both hemispheres (Eviatar and Just, 2006; Stringaris et al., 2006, 2007; Ahrens et al., 2007; Mashal et al., 2007; Chen et al., 2008; Bambini et al., 2011; Desai et al., 2011; Diaz et al., 2011; Cardillo et al., 2012; Lacey et al., 2012; Shibata et al., 2012; Uchiyama et al., 2012) and some only left-lateralized activations (Rapp et al., 2004, 2007; Lee and Dapretto, 2006; Kircher et al., 2007; Shibata et al., 2007; Mashal et al., 2009; Yang et al., 2009; Diaz and Hogstrom, 2011; Forgács et al., 2012). Recent meta-analyses confirm left-hemisphere dominance for figurative language, including metaphor. Although the right hemisphere is indeed often responsive to metaphoric stimuli, its contribution is neither equivalent to nor stronger than that of the left hemisphere; it is weaker (Rapp et al., 2012) or absent (Bohrn et al., 2012). Consistent with this conclusion, some patient studies found metaphor comprehension to be comparably impaired following left or right hemisphere injury (Tompkins, 1990; Gagnon et al., 2003), or more impaired following left than right injury (Giora et al., 2000).

Unsurprisingly, divergent lesion and neuroimaging data have not led to consensus regarding the laterality of metaphor comprehension (Schmidt et al., 2010). One explanation for these discrepancies is heterogeneity of stimuli and/or task demands. We have addressed stimulus design extensively elsewhere (Cardillo et al., 2010) and will address choice of task here. Tasks common in neuroimaging studies with healthy adults do not always extend well to patient populations. On the one hand, passive tasks like silent reading or periodic comprehension probes provide insufficient behavioral correlates for measurement. On the other hand, more demanding semantic tasks like valence or plausibility judgment may elicit poor performance because of difficulty with the decision aspect of the task or a response bias, not because of a comprehension problem *per se*. These tasks also cannot tell us anything about what a person understood the sentence to mean. Comprehension of metaphoric sentences could be assessed with yes/no questions (Gagnon et al., 2003; Eviatar and Just, 2006; Prat et al., 2012); however, this task produces a relatively insensitive measure, since random guessing alone would produce 50% accuracy. Further, poor performance can only indicate that a patient has metaphor comprehension difficulty; it provides no insight into the many possible reasons for a comprehension failure.
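The sensitivity point about yes/no questions can be made concrete with a small calculation. The following is a minimal sketch, not part of the reported analyses: it computes the accuracy expected from pure guessing and the binomial probability of reaching a given score by guessing alone, for a two-alternative and a four-alternative format. The item count and the score criterion are hypothetical values chosen only for illustration.

```python
from math import comb


def chance_accuracy(n_choices: int) -> float:
    """Expected proportion correct when every response is a random guess."""
    if n_choices < 1:
        raise ValueError("n_choices must be at least 1")
    return 1.0 / n_choices


def p_at_least(k: int, n_items: int, n_choices: int) -> float:
    """P(X >= k) for X ~ Binomial(n_items, 1 / n_choices)."""
    p = chance_accuracy(n_choices)
    return sum(
        comb(n_items, i) * p**i * (1.0 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


if __name__ == "__main__":
    n_items = 60    # hypothetical number of test items
    criterion = 40  # hypothetical score of interest
    for n_choices in (2, 4):
        print(
            f"{n_choices} choices: chance accuracy = {chance_accuracy(n_choices):.2f}, "
            f"P(score >= {criterion} by guessing) = "
            f"{p_at_least(criterion, n_items, n_choices):.3g}"
        )
```

With more answer options, the same raw score is far less likely to arise from guessing, which is one way of expressing why a forced choice among several alternatives is a more sensitive measure than a yes/no judgment.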

Experimental tasks commonly used with patients also present interpretive challenges. Evaluating metaphor comprehension with picture-matching may introduce visuospatial confounds in RHD patients, who perform better than LHD patients when asked to provide oral explanations of the same metaphors (Winner and Gardner, 1977; Mackenzie et al., 1999; Giora et al., 2000; Zaidel et al., 2002; Rinaldi et al., 2004). Oral explanations provide rich information but are difficult to quantify and necessitate fewer items than forced choice tasks (Giora et al., 2000; Zaidel et al., 2002; Champagne et al., 2004). In addition, some LHD aphasics may have difficulty conveying full comprehension in this format because of language production problems (Winner and Gardner, 1977). Semantic similarity judgments – in which a patient matches a metaphoric expression (e.g., *bright*) to its figurative sense (e.g., *clever*) – avoid many of the previously mentioned confounds. However, stimuli used in such tasks have been highly heterogeneous. Single words, dyads, and triads have all been used and studies have varied in how thoroughly or comparably they have matched answer choices and conditions on lexical confounds that are not of interest (Brownell et al., 1984, 1990; Gagnon et al., 2003).

Clinical assessments of language function following brain injury are even less discerning. Neurologists, speech pathologists, and neuropsychologists rely on diagnostic batteries to reveal compromised language skills, target speech-language rehabilitation approaches, and alert patients and their caregivers to areas of potential communication difficulty. The commonly administered Western Aphasia Battery (WAB; Kertesz, 1982), for instance, assesses spoken and written language production and comprehension, classifying patients by aphasia diagnosis and severity of impairment in different domains.

Although widely used, the WAB exclusively assesses literal language skills. Other aphasia assessments are similarly lacking. The Boston Diagnostic Aphasia Examination (Goodglass and Kaplan, 1983), the Porch Index of Communicative Ability (Porch, 1971), Minnesota Test for Differential Diagnosis of Aphasia (Schuell, 1965), and the Aphasia Diagnostic Profiles (Helm-Estabrooks, 1992) also do not contain any assessment or mention of metaphor. This clinical oversight runs contrary to common experience. Other batteries such as the Right Hemisphere Language Battery (Bryan, 1989) and Montreal Evaluation of Communications (Joanette et al., 2004) do include a figurative subtest but rely on items not motivated by current theoretical and methodological considerations relevant to metaphor comprehension (Cardillo et al., 2010; Schmidt et al., 2010). Furthermore, these batteries are rarely administered to patients with left hemisphere injury.

Given the limitations of existing metaphor comprehension tasks, we developed a new sentence-level, multiple-choice matching task to address these methodological challenges. Sentence stimuli – a staple of neuroimaging studies of metaphor – are preferable to single words, as they are metaphor's most commonly encountered form. Their complexity, however, requires careful balancing between figurative and literal conditions in terms of difficulty, a level of control that is rarely documented. Despite their naturalness and the feasibility of generating closely matched stimuli (e.g., Cardillo et al., 2010), sentence-level metaphors have not to our knowledge been used with patients. In our task, participants read a sentence and then choose from an array of four phrases the one that best matches its meaning (one correct target, three incorrect foils). This task has several advantages over other measures: (1) it avoids the visuospatial confounds of picture-matching, (2) it avoids the qualitative nature of oral explanations, (3) it avoids the low sensitivity of yes/no questions, (4) it uses naturalistic language, and (5) it explicitly acknowledges different metaphor subtypes. We demonstrate that the metaphor multiple choice task can be used to reveal unrecognized metaphor deficits in brain-injured patients by presenting three illustrative cases. We further demonstrate that this approach can identify metaphor-specific deficits, distinct from general comprehension deficits and unrecognized by traditional neuropsychological assessments of language. Finally, we show that systematically designed foils provide information about the nature of a patient's comprehension failure.

### **MATERIALS AND METHODS**

### **SUBJECTS**

Participants were three unilateral focal lesion patients enrolled in the University of Pennsylvania Focal Lesion Database. Patients with a history of other neurological disorders, psychiatric disorders, or substance abuse are excluded from the database. The patients presented here were drawn from an ongoing, large-scale group study of metaphor comprehension and specifically selected based on their observed behavioral patterns on our task. Sample size was dictated by the number of unique comprehension profiles that, when presented together, illustrate the capability of our task to detect and distinguish different kinds of metaphor impairment. Detailed demographic and neuropsychological information about the patients is provided in **Table 1** and an axial view of their injury location is provided in **Figure 1**.

Patient 444DX is an 81 year-old retired factory worker who suffered an ischemic stroke 120 months prior to testing. The Philadelphia Brief Assessment of Cognition (PBAC), a brief dementia-screening instrument, was administered to assess function in five cognitive domains: working memory/executive control, lexical retrieval/language, visuospatial/visuoconstructional operations, verbal/visual episodic memory, and behavior/social comportment (Libon et al., 2011). Performance indicated compromised visuospatial, memory, and executive functions but normal language and social skills. Object and action naming battery (OANB) scores confirmed clinically normal lexical access for common object and action names (Druks and Masterson, 2000) and administration of the Western Aphasia Battery (Kertesz, 1982) likewise indicated clinically normal language abilities. An MRI scan demonstrated a lesion damaging the posterior temporal and parietal cortex of the right hemisphere.

Patient 384BX is a 74 year-old retired butcher who suffered a hemorrhagic stroke 144 months prior to testing. Performance on the PBAC indicated compromised visuospatial, memory, and executive functions but normal language and social skills. Following injury he reported halting speech and stuttering. Administration of the WAB revealed some residual difficulty with naming and a diagnosis of mild anomia. OANB scores, however, indicated clinically normal lexical access for common object and action names. An MRI scan demonstrated a lesion undercutting the superior frontal gyrus of the left hemisphere.

Patient 642KM is a 78 year-old retired construction manager who suffered an ischemic stroke 130 months prior to testing. Performance on the PBAC indicated compromised memory and executive function but normal visuospatial, language, and social skills. OANB scores indicated clinically normal lexical access for common object and action names, and the WAB score indicated clinically normal language abilities. An MRI scan demonstrated a lesion damaging the parietal cortex of the left hemisphere.

Twelve neurologically healthy older adults recruited from the University of Pennsylvania Control Database served as a control population (Age: 64.3 ± 9.9, Education: 14.4 ± 2.6) and were paid \$15/h for their participation. All participants were native English speakers, right-handed and gave informed consent to participate in accordance with the Institutional Review Board of the University of Pennsylvania.

### **STIMULI**

### *Sentences*

Stimuli consisted of 60 metaphor-literal sentence pairs of three types. One third of the items were of the nominal-entity form, one third were of the nominal-event form, and one third were of predicate form. Nouns referring to concrete entities or objects (e.g., *bullet*, *cheetah*, *drum*) served as the metaphorical words in nominal-entity sentences, nominalized verbs in nominal-event sentences [e.g., *(a) dance, (a) limp, (a) fall*], and verbs in predicate sentences (e.g., *ran, giggled, argued*). All nominal-entity and nominal-event metaphors were of the form "*The X was a Y*" where *Y* was the word being used metaphorically. All predicate metaphors consisted of a noun phrase and an action verb followed by a prepositional phrase. In these items the verb was the word used metaphorically. It remains to be seen if different types of metaphor are also delineated at the cognitive or neural level (Cardillo et al., 2012). Given that objects and actions, as well as nouns and verbs, have been shown to differ in their semantic properties and neural instantiations (Damasio and Tranel, 1993; Martin et al., 1995; Kable et al., 2002, 2005) it is possible that their figurative extensions do as well. Although investigating the role of syntactic form and semantic properties of source terms was not the focus of this study, the possibility of encountering category-specific deficits dictated that different types of metaphor were balanced.

T, temporal; P, parietal; F, frontal; Exec, executive function; Mem, Verbal/visual episodic memory; VisSp, visuospatial/visuoconstructional operations; Lang, lexical retrieval/language; Beh, behavior/social comportment.

<sup>1</sup>Voxel size = 1 mm × 1 mm × 1 mm; <sup>2</sup>Within normal limits cut-off = 93.8.

Forty nominal-entity, 40 nominal-event, and 40 predicate sentence pairs were selected from a superset of 624 sentence pairs [80 pairs were taken from Cardillo et al. (2010) and 80 pairs were drawn from a pool of 312 items designed and normed using identical methods] using Stochastic Optimization of Stimuli software (Armstrong et al., 2012). Optimized selection ensured metaphors and literals were matched in terms of familiarity, length (number of words, number of content words, number of characters), average content word frequency, average content word concreteness, and positive valence ratio (*p*'s > 0.10). As previously observed (Cardillo et al., 2010), metaphors were judged to be significantly less imageable (*p* < 0.005) and natural (*p* < 0.01) than their literal counterparts, and significantly more figurative (*p* < 0.005). Sentences of different types (nominal-entity, nominal-event, predicate) were further matched on interpretability (metaphors only), figurativeness (metaphors only), familiarity, naturalness, imageability, length (number of words, number of content words, number of characters), frequency, concreteness, and positive valence ratio (*p*'s > 0.10). Means and standard deviations of 12 collected psycholinguistic variables are summarized below in **Table 2**.

**Table 2 | Psycholinguistic properties of literal and metaphoric sentences.**

### *Answer choices*

Four answer choices were generated to accompany each sentence: one correct target and three incorrect foils. All answer choices were composed of an adjective or adverb, followed by a noun. As shown in **Table 3**, in the metaphor condition the target was related to the figurative meaning of the sentence, Foil 1 was related to the literal sense of the sentence, Foil 2 was the opposite of the metaphorical sense of the sentence, and Foil 3 was unrelated. Foils were designed to be informative of the type of language deficit present. A Foil 1 selection indicates a literal bias in metaphor comprehension. A Foil 2 selection indicates a semantic integration impairment, as the metaphorical sense of the source word was necessarily activated but incorrectly interpreted in the context of the sentence. A Foil 3 selection indicates a more general comprehension deficit, as it is entirely unrelated to the sentence.

In the literal condition, the foils were designed to mirror the difficulty and nature of foil types in the metaphor condition as closely as possible. The target was related to the literal meaning of the sentence, Foil 1 was related to the agent of the sentence by category membership (but not implied by the sentence), Foil 2 was the opposite of the literal sense of the sentence, and Foil 3 was unrelated. It was necessarily impossible to make Foil 1 answers of the same nature as Foil 1 answers in the metaphor condition, but by presenting a strong lexical associate of one of the content words, Foil 1 answers were designed to mirror the semantic selection demands of Foil 1 answers in the metaphor condition (which presented a meaning strongly associated with the source term). Given the reversed valence necessarily entailed by the Foil 2 condition (the opposite of the target meaning), an additional constraint on all answer choices was introduced to avoid valence-related biases in selection: for both metaphor and literal items, Target and Foil 2 had opposite valences and Target and Foil 3 had the same valence.

\*SUBTLWF values from Brysbaert and New (2009).

#### **Table 3 | Sentence and answer choice examples.**

Finally, frequency values for the answer choices were collected from SUBTLEXus (Brysbaert and New, 2009). No significant differences in average frequency were found between literal and metaphor conditions, between sentence types, or between answer choices. Concreteness values were also collected from the MRC Psycholinguistic Database (Coltheart, 1981) and the University of South Florida Norms (Nelson et al., 2004). For those words that did not have published concreteness values, we collected our own using the procedures of Cardillo et al. (2010). Given the abstract nature of metaphor, Target and Foil 1 answer choices were significantly different in terms of average concreteness (*p* < 0.005). In order to avoid any concreteness-related bias in selection, an additional constraint on all answer choices was introduced: Target and Foil 3 also significantly differed in concreteness (*p* < 0.005) and the target and Foil 2 did not (*p* > 0.10). Literal answer choices also followed this pattern: Target and Foil 1 differed in concreteness (*p* < 0.001), as did Target and Foil 3 (*p* < 0.005), but Target and Foil 2 did not (*p* > 0.10). As such, answer choices were matched on frequency, concreteness and valence so none could aid blind guessing. **Table 3** provides examples of sentence and answer choice stimuli. Full materials are available upon request.

### **PROCEDURE**

### *Control procedure*

All participants made judgments on all 120 items. Subjects were told to choose the single answer choice which best matched the "meaning of the sentence," and to guess if unsure. The task was self-paced. Participants pushed the space bar once for the sentence to appear. After reading the sentence for comprehension, participants pushed the space bar again to view the answer choices. Answer choices were presented in quadrant format below the sentence. Participants were instructed to indicate an answer choice using four keys on the keyboard. Sentences were presented centrally in black, 18-point font on a white background using E-Prime 1.1 software on a Dell Inspiron laptop. Each participant received a unique, random order of items. The target and each foil had a 25% chance of appearing in any single quadrant on the screen in any given trial. Ten practice trials preceded four blocks of experimental trials.

### *Patient procedure*

The patients' task was similar to the controls' with one modification: the trials were advanced by the experimenter. The experimenter pressed the spacebar for the sentence to appear. This was followed by a 3 s delay, and then the answer choices were presented beneath the sentence. To avoid motor response and memory difficulties, patients indicated an answer by pointing to or saying the answer aloud and the experimenter recorded this answer using the keyboard.

### **BEHAVIORAL ANALYSIS**

An item analysis of healthy controls' scores revealed three items whose comprehension fell 3 SD below the average; these items were eliminated from further analysis. A subject analysis of accuracy scores revealed a single individual whose comprehension fell 3 SD below average on any given sentence-type; this individual was replaced. For controls, accuracy for literal and metaphor conditions was averaged across all participants. For patients, accuracy in the literal and metaphor conditions was calculated separately for each individual. Foil profiles were generated for each patient by dividing the number of each type of error (Foil 1, Foil 2, Foil 3) by the total number of errors in literal and metaphor conditions.
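The foil profile described above is a simple proportion, and a minimal sketch may make the computation concrete. The function name, the response labels, and the example responses below are hypothetical and chosen only for illustration; the sketch also assumes the division is carried out within one condition at a time, which the text does not state explicitly.

```python
from collections import Counter

FOILS = ("foil1", "foil2", "foil3")  # hypothetical labels for the three foil types


def foil_profile(responses):
    """Return each foil type's share of all errors in one condition.

    `responses` holds one label per trial: "target" for a correct answer,
    or "foil1" / "foil2" / "foil3" for the corresponding error.
    """
    errors = [r for r in responses if r != "target"]
    if not errors:
        # No errors at all: every share is zero by convention.
        return {foil: 0.0 for foil in FOILS}
    counts = Counter(errors)
    return {foil: counts[foil] / len(errors) for foil in FOILS}


# Made-up responses for illustration only (not data from the study).
example = ["target"] * 6 + ["foil1"] * 3 + ["foil2"]
profile = foil_profile(example)
```

Because the shares are computed over errors only, a patient with few errors and a patient with many errors can have the same profile; the profile describes the kind of error, not its frequency.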

We tested for a comprehension deficit in the metaphor condition at the level of the individual patient using "Bayesian analysis for a *simple* difference," developed by Crawford et al. (2010). The analysis was done on standardized scores and repeated for the literal condition. This test uses Bayesian Monte Carlo methods to determine if a patient's score is sufficiently below the scores of controls such that the null hypothesis, that the patient's score is an observation from the control population, can be rejected. In this case, patients with a *simple* metaphor or literal deficit exhibit significantly reduced comprehension in that condition, relative to controls.
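The logic of comparing one patient to a small control sample can be illustrated with the frequentist modified *t*-test of Crawford and Howell, which is swapped in here for simplicity and is not the Bayesian Monte Carlo procedure used in this study. The sketch below is an assumption-laden illustration: the function name is hypothetical, the control mean and SD are the metaphor values reported in the Results, the control sample size of 12 is inferred from the reported 11 degrees of freedom, and the patient score is the 66% metaphor accuracy reported for 642KM in the Discussion.

```python
import math


def crawford_howell_t(patient_score, control_mean, control_sd, n_controls):
    """Modified t statistic for one case versus a small control sample.

    The control SD is inflated by sqrt((n + 1) / n), so the controls are
    treated as a sample rather than as a population. Returns (t, df).
    """
    standard_error = control_sd * math.sqrt((n_controls + 1) / n_controls)
    t_value = (patient_score - control_mean) / standard_error
    return t_value, n_controls - 1


# Values taken from the text; n = 12 is inferred from df = 11 (an assumption).
t_stat, df = crawford_howell_t(66.0, 93.5, 4.65, 12)
```

With these rounded inputs the statistic is about −5.68 on 11 degrees of freedom. It need not equal the published value, which was computed on standardized scores with a different method.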

We also tested for a differential deficit in metaphor comprehension at the level of the individual patient using "Bayesian analysis for a *differential* difference," developed by Crawford et al. (2010). The Bayesian test for a *simple* difference can only indicate whether a patient is impaired in the metaphor, literal, or both conditions. It does not distinguish between reduced accuracy due to difficulty with metaphor specifically and reduced accuracy due to a general impairment affecting literal and metaphoric language alike. The Bayesian test for a *differential* difference however, can make this distinction by also taking into account the differential accuracy score and correlation between the two conditions, as established by the control group. Patients with a *differential* metaphor deficit exhibit proportionally greater difficulty with metaphoric than literal sentences than is observed in the control population.
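To show how the correlation between conditions enters a differential comparison, the sketch below computes a plain standardized difference score: the difference between the patient's two *z*-scores divided by the standard deviation of such a difference, √(2 − 2*r*). This is a simplified stand-in, not the Bayesian test for a differential difference, and it treats the control statistics as population values, so it will overstate significance with a sample this small. The function name is hypothetical; the inputs are the control values reported in the Results and the accuracies reported for 384BX in the Discussion.

```python
import math


def standardized_difference(score_x, score_y, mean_x, sd_x, mean_y, sd_y, r_xy):
    """Difference between two z-scores, scaled by the SD of that difference.

    For two standardized variables with correlation r, the variance of
    their difference is 2 - 2r.
    """
    z_x = (score_x - mean_x) / sd_x
    z_y = (score_y - mean_y) / sd_y
    return (z_x - z_y) / math.sqrt(2.0 - 2.0 * r_xy)


# x = literal condition, y = metaphor condition; values taken from the text.
z_diff = standardized_difference(
    score_x=88.0, score_y=52.0,   # accuracies reported for 384BX
    mean_x=96.8, sd_x=1.98,       # control literal accuracy
    mean_y=93.5, sd_y=4.65,       # control metaphor accuracy
    r_xy=0.516,                   # control correlation between conditions
)
```

A positive value means the metaphor score lies further below the control mean than the literal score does, once both are expressed in control standard deviations.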

# **RESULTS**

Overall, the control group performed near ceiling. Literal accuracy (*M* = 96.8, SD = 1.98) was significantly higher than metaphor accuracy (*M* = 93.5, SD = 4.65) [*t*(11) = 2.744; *p* = 0.019]. The correlation between literal and metaphor accuracy was *R* = 0.516 (*p* = 0.044). In the metaphor condition, Foil 1 (the literal sense of the sentence) was the most common error (66.7%), followed by Foil 2 (24.4%) and Foil 3 (8.9%). In the literal condition, Foil 1 (related to the agent of the sentence by category membership, but not implied by the sentence) was the most common error (78.3%), followed by Foil 2 (17.4%) and Foil 3 (4.3%).

### **GENERAL SENTENCE COMPREHENSION IMPAIRMENT (444DX)**

Application of the Bayesian test for a simple deficit revealed a simple metaphor comprehension deficit [*t*(11) = −3.653; *p* < 0.01] and a simple literal comprehension deficit [*t*(11) = −5.004; *p* < 0.001] in 444DX. Application of the Bayesian test for a differential deficit revealed a non-significant difference in metaphor and literal comprehension scores, indicating a general sentence comprehension impairment. 444DX made predominantly Foil 1 and Foil 2 errors in both the metaphor and literal conditions. See **Table 4** for detailed reporting of single case statistics.

### **DISPROPORTIONATE IMPAIRMENT IN METAPHOR COMPREHENSION (384BX)**

Application of the Bayesian test for a simple deficit revealed a simple metaphor comprehension deficit [*t*(11) = −8.640; *p* < 0.005] and a simple literal comprehension deficit [*t*(11) = −4.182; *p* < 0.001] in 384BX. Application of the Bayesian test for a differential deficit revealed a differential metaphor deficit [*t*(11) = 4.656; *p* < 0.02]. In the metaphor condition, 384BX's errors were overwhelmingly Foil 1, while Foil 2 accounted for the majority of errors in the literal condition. See **Table 5** for detailed reporting of single case statistics.

### **SELECTIVE IMPAIRMENT IN METAPHOR COMPREHENSION (642KM)**

Application of the Bayesian test for a simple deficit revealed a simple metaphor comprehension deficit [*t*(11) = −5.790; *p* < 0.0001] in 642KM. Literal comprehension was not significantly different from that of controls. Application of the Bayesian test for a differential deficit revealed a differential metaphor deficit [*t*(11) = 5.129; *p* < 0.001]. Like 444DX, 642KM made predominantly Foil 1 and Foil 2 errors in both the metaphor and literal conditions. See **Table 6** for detailed reporting of single case statistics.

To summarize, the three patients exhibited three distinct deficit patterns. 444DX demonstrated general sentence level impairment; she was impaired on both metaphor and literal comprehension, but not significantly more so on either condition. 384BX demonstrated a disproportionate metaphor deficit; he was impaired on both metaphor and literal comprehension, but significantly more so for metaphors. 642KM demonstrated a selective metaphor deficit; he was impaired on metaphors but displayed normal literal comprehension.

# **DISCUSSION**

Metaphors are powerful and pervasive communication devices in everyday language, yet conspicuously absent from standard clinical assessments of language. The purpose of this study was to demonstrate that a metaphor multiple-choice task can reveal profiles of impaired metaphor comprehension in brain-injured patients that go undetected by traditional aphasia assessments. Three unilateral focal lesion patients made judgments on 60 matched literal-metaphor sentence pairs by choosing the phrase that best matched the meaning of a given sentence from an array of four possible answers. Compared to a group of healthy, older adults, single-case statistics revealed three unique patterns of impaired metaphor comprehension in the three patients (444DX, 384BX, 642KM). None of these patterns was predicted by their performance on standard clinical measures of receptive and expressive language.

Although the WAB is widely used to diagnose and classify aphasia following brain injury, it is agnostic with respect to figurative language, including metaphor. Our data indicate profound, unrecognized deficits in this domain, impairments that can persist post-injury despite normal literal language comprehension, and may significantly impact daily communication and thinking. All three cases in our series were impaired in their comprehension of metaphoric sentences, but the specific pattern of performance suggests these deficits were of three different natures.

444DX was impaired in both literal and metaphoric conditions. The absence of a differential deficit suggests that her difficulty with metaphor reflects a general sentence comprehension impairment. 444DX's low performance is surprising considering her near perfect accuracy on the WAB, OANB, the language subsection of the PBAC, and casual conversation. One possibility is that her behavior reflects, at least in part, difficulty with the semantic executive demands of the task. A multiple choice problem requires the systematic consideration and rejection of competing meanings before selecting the correct one. 444DX's performance on the PBAC indicated impaired memory and executive function; such domain-general deficits would reasonably impact strategic processing in the linguistic domain as well. Consistent with a difficulty in resolving semantic competition, 444DX remarked, "Some of them were tricky. A lot of times, I thought there were two correct answers. I doubted myself several times."

384BX was also significantly impaired in both his literal and metaphoric comprehension, responding correctly to only 88% of the literal sentences, and only 52% of the metaphoric sentences. Unlike 444DX, however, the difference between his metaphoric and literal comprehension was greater than would be expected in healthy adults, indicating a disproportionate difficulty with metaphor. This pattern suggests that a milder, lexical-semantic comprehension impairment is present in addition to a metaphor-specific deficit. The severity of 384BX's diagnosed anomia, however, is mild and not suggestive of the severe metaphor impairment observed. Furthermore, anomia is classified as an expressive aphasia, in which language production is affected while comprehension is relatively preserved. Therefore 384BX's poor metaphor comprehension cannot be anticipated by the anomia diagnosis. Nor is he aware of his difficulty. In debriefing he remarked, "I started stuttering after the stroke," but "I can still read and remember," and "I did not feel like my reading was affected (by the injury)."

Most dramatic was the selective metaphor deficit demonstrated by 642KM. Consistent with his high scores on the neuropsychological tests and conversational ease, his performance in the literal condition was near ceiling, yet he responded correctly to only 66% of metaphoric sentences. This pattern indicates his comprehension failure is specific to metaphor and cannot be explained by general language comprehension problems. Like 384BX, 642KM remained unaware of his impairment even after testing, remarking, "it was easy," and "I understood ninety percent of what I was reading." As these comments suggest, this comprehension problem is not only unrecognized by traditional aphasia assessments, but is also opaque to the patient himself.

As the three cases illustrate, not all metaphor deficits are alike. Some deficits are "pure," selective for metaphor while leaving literal language intact (642KM). In other patients this metaphor-specific deficit is accompanied by a milder comprehension deficit affecting literal language as well (384BX). Still other metaphor deficits are reflective of a general deficit, impacting metaphoric and literal language comprehension similarly (444DX). The close matching of metaphoric and literal conditions on psycholinguistic variables enables confident direct comparison of metaphor and literal comprehension. By contrast, many previous studies have tested patients on only metaphoric items (Winner and Gardner, 1977; Mackenzie et al., 1999; Giora et al., 2000; Zaidel et al., 2002; Champagne et al., 2004; Rinaldi et al., 2004), designs that cannot preclude the possibility of a general comprehension deficit, rather than a metaphor-specific one.

The unique foil profiles of each patient further illustrate the diversity of metaphor deficits. 384BX's errors in the metaphor condition were overwhelmingly Foil 1 (literal interpretation). This pattern indicates his metaphor comprehension fails in a specific way, resulting in a systematic, highly implausible misinterpretation. Literal biases have been reported previously in brain-damaged patients by Brownell et al. (1984) and Rinaldi et al. (2004), using picture-matching and a single-word semantic similarity judgment task, respectively. The present study is the first demonstration of literal bias for metaphor comprehension in which metaphor and literal items were closely matched on average and in pairwise fashion. Thus, we may confidently attribute comprehension deficits to difficulty with metaphors, rather than potentially confounding sentence properties (e.g., familiarity, length, frequency, concreteness, etc.). In contrast to 384BX, 642KM and 444DX showed more mixed foil profiles, with Foil 2 errors in addition to Foil 1 errors. Foil 2 errors indicate the metaphorical meaning was at least partially accessed, but incorrectly interpreted. This error pattern suggests that the origin of comprehension failure in cases like 444DX and 642KM is more complex than for patients presenting only a systematic literal bias. Understanding the different ways metaphor comprehension breaks down in the injured brain may enable more appropriate and targeted rehabilitation strategies.

**Table 4 | Single case statistics and foil profile of 444DX.**

Metaphor deficits are of clinical interest to patients and their caregivers for many of the same reasons as general language impairments, but their effects on communication may be more insidious. For example, metaphor is an attractive option for discussing internal emotional states (*I exploded at the rude customer*), abstract concepts (*The right thing to do is a gray area*) or explaining new, complex ideas (*The brain is a computer*). In these cases, a literal bias would make comprehending the metaphoric statements as they were intended impossible. Yet, as the normal neuropsychological profiles and the patients' own reflections make plain, metaphor interpretation failures do not announce themselves immediately the way literal comprehension deficits do. The abstract nature of the concepts typically expressed by metaphor may contribute to their poor detection in casual conversation. More simply, we are imperfect listeners; if we expect successful comprehension, we are more likely to project it.

Finally, it is worth noting that both patients demonstrating a disproportionate metaphor deficit had unilateral left-hemisphere lesions (384BX, 642KM). Without overstating the importance of lesion location in such a small sample, this observation is inconsistent with the right-hemisphere hypothesis of metaphor, which predicts metaphor impairments in right- not left-hemisphere patients. In accordance with the accumulating evidence from neuroimaging, our data indicate metaphor comprehension is not solely a right-hemisphere dependent process. Left-hemisphere brain-damaged patients may be in as much need of figurative language rehabilitation as right-hemisphere injured patients. Research on the efficacy of therapies targeting metaphor comprehension is not only scarce, but also customarily only targets right-hemisphere patients because of their presumed susceptibility to these kinds of deficits (Lundgren et al., 2006, 2011).

In sum, our results from three illustrative patient cases establish the utility of a carefully designed multiple choice task as a new tool in the investigation of the neural basis of metaphor comprehension. Focal lesion patients were the focus of this investigation, but the approach is equally suitable for investigating questions of metaphor comprehension in other clinical populations or in neuroimaging studies with healthy adults. The metaphor multiple choice task uniquely avoids the methodological and interpretative pitfalls of tasks previously used with patients, while adding increased sensitivity for capturing different types of comprehension deficits. Further, although not the aim of the current study, the inclusion of metaphors of different types enables investigating current, outstanding theoretical questions about the cognitive and neural mechanisms supporting metaphor comprehension. Most importantly, we wish to highlight the clinical utility of our approach. Our task revealed that patients can have figurative language deficits neither evaluated nor predicted by traditional aphasia assessments. This observation raises the possibility that many patients who might benefit from targeted therapies are currently overlooked. We cannot see what our tools are not designed to detect.

### **AUTHOR CONTRIBUTIONS**

The experiment was conceived by Eileen R. Cardillo and Anjan Chatterjee. The stimuli were generated by Eileen R. Cardillo. The experiments were programmed and carried out by Geena R. Ianni and Marguerite McQuire. Data analysis was done by Geena R. Ianni with assistance from Eileen R. Cardillo and Marguerite McQuire. All authors were involved in data interpretation. The paper was written by Geena R. Ianni and revised by Eileen R. Cardillo, Marguerite McQuire and Anjan Chatterjee. All authors approved the final version for submission.

### **ACKNOWLEDGMENTS**

This research was supported by a National Institutes of Health grant (R01-DC012511) awarded to Anjan Chatterjee, a National Institutes of Health training grant (T32AG000255-16), and a University of Pennsylvania College Alumni Society Research Grant awarded to Geena R. Ianni. The authors are particularly grateful to 444DX, 384BX, and 642KM for their participation. We would also like to thank Christine Watson for help with stimuli norming and selection using SOS, and Jonathan Yu, Casey Gorman, and Sam Cason for their assistance with stimuli norming and data collection.

### **REFERENCES**


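Cardillo, E. R., Schmidt, G. L., Kranjec, A., and Chatterjee, A. (2010). Stimulus design is an obstacle course: 560 matched literal and metaphorical sentences for testing neural hypotheses about metaphor. *Behav. Res. Methods* 42, 651–664. doi: 10.3758/BRM.42.3.651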


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 April 2014; accepted: 08 October 2014; published online: 03 November 2014.*

*Citation: Ianni GR, Cardillo ER, McQuire M and Chatterjee A (2014) Flying under the radar: figurative language impairments in focal lesion patients. Front. Hum. Neurosci. 8:871. doi: 10.3389/fnhum.2014.00871*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Ianni, Cardillo, McQuire and Chatterjee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Familiarity differentially affects right hemisphere contributions to processing metaphors and literals

**Vicky T. Lai<sup>1</sup>\*, Wessel van Dam<sup>1</sup>, Lisa L. Conant<sup>2</sup>, Jeffrey R. Binder<sup>2</sup> and Rutvik H. Desai<sup>1</sup>\***

<sup>1</sup> Department of Psychology, University of South Carolina, Columbia, SC, USA

<sup>2</sup> Department of Neurology, Medical College of Wisconsin, Milwaukee, WI, USA

### **Edited by:**

Seana Coulson, University of California at San Diego, USA

### **Reviewed by:**

Bálint Forgács, Central European University, Hungary

Tristan S. Davenport, University of California at San Diego, USA

#### **\*Correspondence:**

Vicky T. Lai and Rutvik H. Desai, Department of Psychology, University of South Carolina, 1512 Pendleton Street, Columbia, SC 29208, USA e-mail: vicky.tzuyin.lai@gmail.com; rutvik@sc.edu

The role of the two hemispheres in processing metaphoric language is controversial. While some studies have reported a special role of the right hemisphere (RH) in processing metaphors, others indicate no difference in laterality relative to literal language. Some studies have found a role of the RH for novel/unfamiliar metaphors, but not conventional/familiar metaphors. It is not clear, however, whether the role of the RH is specific to metaphor novelty, or whether it reflects processing, reinterpretation or reanalysis of novel/unfamiliar language in general. Here we used functional magnetic resonance imaging (fMRI) to examine the effects of familiarity in both metaphoric and non-metaphoric sentences. A left lateralized network containing the middle and inferior frontal gyri, posterior temporal regions in the left hemisphere (LH), and inferior frontal regions in the RH, was engaged across both metaphoric and non-metaphoric sentences; engagement of this network increased as familiarity decreased. No region was engaged selectively for greater metaphoric unfamiliarity. An analysis of laterality, however, showed that the contribution of the RH relative to that of LH does increase in a metaphor-specific manner as familiarity decreases. These results show that RH regions, taken by themselves, including commonly reported regions such as the right inferior frontal gyrus (IFG), are responsive to increased cognitive demands of processing unfamiliar stimuli, rather than being metaphor-selective. The division of labor between the two hemispheres, however, does shift towards the right for metaphoric processing. The shift results not because the RH contributes more to metaphoric processing. Rather, relative to its contribution for processing literals, the LH contributes less.

**Keywords: metaphor, right hemisphere, novelty, familiarity, difficulty, laterality, language, imaging**

### **INTRODUCTION**

Metaphor has been intensely researched for decades, and the view on metaphor has been transformed from it being something poetic reserved for literary use, to something fundamental and generalizable in our daily language and thinking (Lakoff and Johnson, 2003). The pervasiveness of metaphors has been quantified: People use about 5 metaphors for every 100 words of text (Pollio et al., 1990), including 1.8 novel and 4.08 frozen metaphors (e.g., *leg of a table*) per minute of discourse (Pollio et al., 1977). In recent years there has been a surge of interest in studying the neural basis of metaphor, as the answers not only have implications for clinical conditions such as stroke, schizophrenia, and autism, but also have broader impact for understanding the comprehension of language meaning in general.

Perhaps the most debated issue with regard to the neural basis of metaphor is whether the right hemisphere (RH) plays a special role in non-literal language. Several well-known studies reported a special role of the RH in processing metaphors. Winner and Gardner (1977) examined the comprehension of non-literal sentences (e.g., *give me a hand*) in aphasic patients using a sentence-picture matching task. They found that RH patients were less accurate than left hemisphere (LH) patients (accuracies 43% vs. 58%), and suggested that an intact RH is needed for mapping non-literal language meaning onto situations in which it is appropriate (a picture of a person helping others as opposed to a picture of a hand). Bottini et al. (1994) examined the comprehension of new, unusual figures of speech in sentences (e.g., *The investors were squirrels collecting nuts*) in a neurologically healthy sample studied with positron emission tomography. In a semantic judgment task, participants judged whether a given sentence is a plausible metaphor. They compared metaphor and literal conditions and found strongly right-lateralized activation for the metaphor condition in the frontal, temporal, and parietal regions.

However, many functional magnetic resonance imaging (fMRI) studies using neurologically healthy participants have shown that metaphor processing is left lateralized. Rapp et al. (2004) examined novel metaphors in the form of A-is-B (e.g., *Die Worte des Liebhabers sind Harfenklaenge*, "*The lovers' words are harp sounds*") and their literal counterparts (*Die Worte des Liebhabers sind Luegen*, "*The lovers' words are lies*"). In a valence judgment task, participants judged whether a given sentence has a positive or a negative connotation. When compared with a low-level baseline, metaphors led to activation in the right inferior frontal gyrus (IFG) and temporal pole. But when compared with literal sentences, the metaphors only showed activations in the LH, in the left lateral IFG, inferior temporal gyrus, and posterior middle temporal gyrus (MTG). Schmidt and Seger (2009) also examined A-is-B metaphors (e.g., *Respect is a precious gem*). Activations for those metaphors relative to literals were found in the left precentral gyrus, temporal pole, inferior parietal lobe, and lingual gyrus. Chen et al. (2008) examined predicate metaphors embedded in a sentence (e.g., *The man fell under her spell*) in contrast with literals (*The child fell under the slide*). The metaphors led to more activation in the LH than in the RH, with the activations in the left IFG, MTG, and angular gyrus (AG), and the right anterior portion of the MTG.

What, then, determines RH involvement in metaphor processing? One of the most studied factors is metaphor novelty/unfamiliarity.<sup>1</sup> Electrophysiological studies have shown repeatedly that novel metaphors are processed differently from conventionalized ones (Arzouan et al., 2007; Lai et al., 2009; Lai and Curran, 2013). However, whether this difference is reflected in greater RH involvement is unclear, as electrophysiological metaphoricity effects were very similar between hemispheres (Coulson and Van Petten, 2007). In other studies, novelty has been found to mediate RH activations for metaphors (Mashal et al., 2005, 2007; Stringaris et al., 2006; Schmidt et al., 2007; Pobric et al., 2008). In particular, Faust (2012) proposed that the RH is involved only in novel metaphors, not in conventional metaphors. Mashal et al. (2007) contrasted 2-word conventional (*bright student*) and novel (*pearl tears*) metaphorical expressions with literal (*water drop*) and unrelated (*road shift*) expressions. In a semantic task, participants silently judged if the two words were metaphorically related. Novel metaphors, compared with literals, led to activations in bilateral IFG, right posterior superior temporal gyrus (STG), left middle frontal gyrus (MFG), and middle anterior cingulate gyrus. Conventional metaphors, compared with literals, showed activations in the right postcentral parietal lobe, left posterior STG, and left IFG. Direct comparison between novel and conventional metaphors showed that novelty led to activation in the right posterior superior temporal sulcus (STS), right IFG, and left MFG. Based on these findings, Pobric et al. (2008) conducted a repetitive transcranial magnetic stimulation (rTMS) study to examine the causal role of the right posterior superior temporal region in relation to metaphor processing. They found that rTMS to the right posterior STG impaired the processing of novel metaphors but not conventional metaphors. In contrast, rTMS to the left IFG impaired the processing of conventional but not novel metaphors.

Meta-analyses of imaging studies of non-literal language processing have come to somewhat different conclusions (Bohrn et al., 2012; Rapp et al., 2012; Yang, 2014). In Rapp et al. (2012), the overall metaphors > literal contrast based on 16 studies showed mostly LH activations, in the left parahippocampal gyrus and left IFG, but also some RH activations, such as the right IFG. The conventional metaphors > literal contrast showed activations in the LH only, including the left thalamus, left MTG, left AG, and left IFG. The novel metaphors > literal contrast showed activations in mostly the LH (IFG and MFG) but also in the RH (IFG). In Bohrn et al. (2012), the overall metaphors > literal contrast also led to bilateral activations in the IFG. The conventional metaphors > literal contrast also showed activations in the LH only, including the left IFG, left thalamus, and left STG. The results of the novel metaphors > literal contrast, different from the results of the same contrast in Rapp et al. (2012), showed activations only in the LH, in the left MFG extending into left IFG, and left inferior temporal gyrus. The novel metaphors > literal contrast difference between Rapp et al. (2012) and Bohrn et al. (2012) likely resulted from the inclusion of different studies: Rapp et al. (2012) included 5 studies whereas Bohrn et al. (2012) included 8 studies. Similarly, Yang (2014) observed bilateral activations in IFG for the overall metaphor > literal contrast. In addition, bilateral activations in MFG were also observed for this contrast. The LH activation in the IFG, MFG, inferior parietal lobule (IPL), MTG, and lingual gyrus were observed for the conventional metaphors > literal contrast. As for the novel metaphors > literal contrast, like Rapp et al. (2012) but different from Bohrn et al. (2012), activations were found in RH as well as LH regions, including bilateral IFG, bilateral MFG, left IPL, and right STG.

The present study asks whether it is metaphoricity or novelty that leads to non-specific recruitment of RH areas. Novel or unfamiliar metaphors, and unfamiliar sentences in general, are likely to require more resources involving executive processes related to reanalysis, working memory, inhibition, attention, and decision-making. Unfamiliarity is closely related to the notion of difficulty, which also has been operationalized as reaction times (RTs). If literal sentences are significantly easier to process, they likely do not engage executive processes to the same extent. Consistent with this, several studies reported longer RTs for novel metaphors than for literals: 1385 ms vs. 1261 ms in Mashal et al. (2007), 859 ms vs. 744 ms in the non-TMS group in Pobric et al. (2008), and 2300 ms vs. 2140 ms in Rapp et al. (2004). Novel metaphors also took longer to process than conventional metaphors, e.g., 1385 ms vs. 1275 ms in Mashal et al. (2007) and 859 ms vs. 742 ms in Pobric et al. (2008). Other sentence processing studies have shown that conditions that elicit longer RTs are associated with more activation bilaterally, usually stronger in the LH (e.g., Binder et al., 2005; Desai et al., 2006; Yarkoni et al., 2009; Graves et al., 2010). Thus for items that have longer RTs, it is important to take into consideration the contributions from both hemispheres. If RH contribution is measured only using the activation of the RH, ignoring the potential strong LH activations, then increasing RTs can lead to the (possibly false) conclusion of special contribution of the RH.

<sup>1</sup> Some studies use the term "novel" (e.g., Faust, 2012) whereas others use "unfamiliar" (e.g., Schmidt et al., 2007). In this paper we use the two terms interchangeably. When we review the work of others, we use the terminology that they chose to use, which potentially represents their theoretical stance. We treat "novel" not in a categorical sense (a metaphor that could not have been encountered before), but in a more continuous sense, as equivalent to "unfamiliar". That is, we treat novelty and familiarity as ends of the same continuous scale.

Some studies have investigated the role of difficulty in metaphor processing (Monetta et al., 2006; Schmidt and Seger, 2009; Yang et al., 2009; Diaz et al., 2011; Forgács et al., 2012, 2014). Conceptualizing difficulty as task difficulty, Monetta et al. (2006) proposed that metaphors are more difficult to process than literals, which is why the RH is needed for supplying additional resources. They showed that when the task demand is high, neurologically healthy participants comprehended metaphors similarly to patients with RH deficits. Consistent with this proposal, Yang et al. (2009) showed that more difficult conditions led to extensive RH activations including the right IFG, prefrontal cortex, and the temporal and parietal regions. Schmidt and Seger (2009) also examined difficulty, but conceptualized difficulty in terms of the ease of interpretation ratings based on Katz et al. (1988). Comparing difficult metaphors with easy ones, they showed activation in the left IFG. It is unclear whether this activation is due to metaphor-specific processing or general effects of difficulty, because no result on comparable literals (i.e., the difficult literals > easy literals contrast) was reported.

To separate the effects of metaphoric processing from general difficulty effects, unfamiliar metaphors must be compared to similarly unfamiliar literals. A few recent studies included the condition of unfamiliar literals (Diaz et al., 2011; Forgács et al., 2012), but examined metaphors that are of different types compared to those in the current study. Diaz et al. (2011) examined A-is-B type of metaphors (e.g., *A rumor is a disease*) and found that the overall novel > familiar contrast showed activations in the bilateral IFG, parahippocampal gyrus, and posterior MTG. The novel > familiar metaphors surprisingly showed no significant activation, and the novel > familiar literals showed activation in the left IFG. Forgács et al. (2012) examined noun-noun compound metaphors and found that, combining metaphors and literals, the novel > conventional contrast showed activations in regions including left IFG, bilateral insula, and Pre-SMA. The novel > conventional contrasts within the literals and within the metaphors were not reported.

A second potentially problematic issue is that metaphors tend to differ from literal sentences in concreteness and imageability. In predicate metaphors, a verb denoting action or motion is often applied to an abstract entity (e.g., *We have to throw out that option*.). Comparable literals require that the action be applied to concrete objects (*We have to throw out that pizza*.). This concreteness confound is difficult to remove, because it reflects inherent differences between metaphors and literals (i.e., applying concrete actions to abstract things is what makes a sentence metaphoric). In nominal metaphors, the problem can be the opposite, where metaphors are usually more concrete (*The book was a gem*.) than literals (*The book was excellent*.). Hence, in metaphor-literal comparisons, it is difficult to determine which brain activations reflect concreteness effects rather than metaphor-specific effects. A way around this problem is to compare metaphors with other metaphors that differ in their novelty or familiarity. If one assumes that relatively novel metaphors engage metaphor processing machinery to a greater extent, then the novel-familiar contrast can eliminate the concreteness confound. Unfortunately, this introduces another confound, as mentioned above: novel metaphors also use more general cognitive resources. A novel-familiar comparison in literals can be used to differentiate between metaphor-specific and general processes.

In this paper we take this latter approach, and examine the effects of decreasing familiarity of both metaphoric and non-metaphoric sentences. Rather than the dichotomous novel-familiar division, we treat familiarity as a continuous variable, which can potentially provide more power. We use fMRI data from Desai et al. (2011), who tested the role of sensory-motor systems in metaphor comprehension. Their stimuli contained a large set of metaphoric and non-metaphoric sentences that varied in familiarity, including action metaphors (*The council bashed the proposal*), abstract control (*The council criticized the proposal*), and literal action sentences (*The thief bashed the table*). The metaphoric > non-metaphoric contrasts showed activation in the bilateral anterior inferior parietal lobule (aIPL), which has been implicated as an index of (secondary) sensory-motor processing during sentence comprehension. They concluded that the understanding of metaphoric action retains a link to sensory-motor systems involved in action performance. Here we re-analyzed their data with a focus on the issue of laterality.

We also suggest that a potential cause for the divergent findings in the literature lies in the difference in methods of evaluating the role of the RH. In one approach, any activation of the RH (in a metaphor > literal or novel > conventional metaphor comparison) counts as a special role for the RH, regardless of the contribution from the LH (e.g., Schmidt and Seger, 2009). For others, *laterality* of activation is what matters, so that greater RH activation in conjunction with similar or greater LH activation does not count as a special role for the RH (e.g., Coulson and Van Petten, 2007). If the novel > conventional metaphor comparison gives rise to activations in both the RH and LH, then according to the first approach this would be evidence supporting the special role for the RH in metaphor processing. However, according to the second approach this would not, unless the novel-conventional difference is greater in the RH than in the LH. Here, we investigate familiarity-related activations in both manners: as activation in the RH and as RH activity in relation to LH activity.
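The two evaluation criteria contrasted above can be stated compactly in code. The following Python sketch is only an illustration: the function name, the criterion labels, and the effect values are hypothetical, and a single number per hemisphere stands in for what would in practice be a statistical map.

```python
def supports_special_rh_role(rh_effect, lh_effect, criterion):
    """Decide whether a novel > conventional effect counts as a special RH role.

    rh_effect, lh_effect: hypothetical effect sizes in each hemisphere.
    criterion: "activation" (any RH effect counts) or
               "laterality" (the RH effect must exceed the LH effect).
    """
    if criterion == "activation":
        return rh_effect > 0
    if criterion == "laterality":
        return rh_effect > lh_effect
    raise ValueError("criterion must be 'activation' or 'laterality'")

# Hypothetical bilateral result with a stronger LH effect.
by_activation = supports_special_rh_role(1.0, 2.0, "activation")
by_laterality = supports_special_rh_role(1.0, 2.0, "laterality")
```

With the same bilateral result, the first criterion would be read as support for a special RH role and the second would not, which is the kind of divergence described above.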

### **MATERIALS AND METHODS**

We briefly summarize the methods in Desai et al. (2011) and elaborate on the analyses we performed specifically for the current study.

### **PARTICIPANTS**

Twenty-two right-handed healthy adults (11 women, age 18–33 years, average age 24 years) participated in the imaging experiment. All had normal or corrected-to-normal vision, and none had any neurological disorder. All participants gave informed consent prior to participation. This study was approved by the Institutional Review Board at the Medical College of Wisconsin.

### **MATERIALS**

Stimuli consisted of 81 triplets of sentences, including metaphorical (*The jury grasped the concept*), abstract (*The jury understood the concept*), and literal action sentences (*The daughter grasped the flowers*). These sentences were matched in terms of average word frequency; number of phonemes, letters, and syllables; and grammatical structure. In a familiarity norming study, 28 participants rated each sentence on a scale of 1 (not at all familiar) to 7 (very familiar). Items that received lower familiarity ratings were considered more unfamiliar.<sup>2</sup> In addition, 81 nonsense sentences, 81 nonword sentences, and 54 sentences with varied syntax were included.

For the purpose of the present study, the two non-metaphoric sentences (abstract and literal action) were collapsed into a single non-metaphoric condition. The mean familiarity ratings were 5.24 (SD = 0.77) for the metaphoric and 5.17 (SD = 0.98) for the non-metaphoric conditions (*p* = 0.528). Our unfamiliar stimuli were not highly unfamiliar, but were relatively less familiar than the familiar stimuli. The familiarity rating distributions between the metaphoric and non-metaphoric conditions were similar (**Figure 1**). In a separate meaningfulness judgment task, RTs for each sentence were also collected from 24 subjects. The mean RTs for the metaphoric condition were 1277 ms (SD = 145), which were not statistically different from those for the non-metaphoric condition, 1253 ms (SD = 165; *p* = 0.278). As expected, there was a strong negative correlation between RT and familiarity ratings (*r* = −0.52, *p* < 0.001).

### **EXPERIMENT PROCEDURE AND IMAGE ACQUISITION**

The details of the procedure and image acquisition are described in Desai et al. (2011). Briefly, T2<sup>∗</sup>-weighted whole-brain images were acquired with a TR of 1.8 s and voxel dimensions 3.75 × 3.75 × 4 mm<sup>3</sup>. The sentences were presented visually using white font on a black background, in two parts: The first part was the noun phrase of the sentence (e.g., *The public*), followed by the second part consisting of the verb phrase (*grasped the idea*). The order of sentences was pseudo-randomized. Participants read each sentence and made a covert meaningfulness decision during the imaging experiment. An old/new sentence recognition test was given at the end of each run to encourage and verify subject participation.

### **ANALYSIS**

AFNI software (Cox, 1996) was used for analyses. In a multiple regression model, we used the mean-centered familiarity rating for each sentence as a condition-specific regressor, to examine areas that are modulated as a function of increasing familiarity. The main effect of familiarity across conditions (metaphoric, non-metaphoric) was computed, showing areas whose response varies with familiarity regardless of metaphoricity. Condition × familiarity interactions were also computed, showing areas that are affected differently by increasing familiarity between metaphoric and non-metaphoric sentences. Given that the right STS has been particularly associated with metaphoric processing (Mashal et al., 2007; Pobric et al., 2008), we also performed a region of interest (ROI) analysis using the right STS, defined based on a maximum probability map created with the Destrieux et al. (2010) parcellation, included with AFNI.
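As a rough illustration of the familiarity regressor described above, the following Python sketch mean-centers familiarity ratings separately for each condition. The function name, the example ratings, and the choice to center within each condition (rather than across all sentences) are assumptions made for illustration; the actual regression was carried out in AFNI and is not reproduced here.

```python
from statistics import mean

def mean_center_by_condition(ratings):
    """Return ratings with each condition's mean subtracted.

    `ratings` maps a condition label to a list of per-sentence
    familiarity ratings (1 = not at all familiar, 7 = very familiar).
    """
    centered = {}
    for condition, values in ratings.items():
        m = mean(values)
        centered[condition] = [v - m for v in values]
    return centered

# Hypothetical ratings, not taken from the norming study.
example = {
    "metaphoric": [6.0, 5.0, 4.0],
    "non_metaphoric": [6.5, 5.5, 3.0],
}
centered = mean_center_by_condition(example)
```

After centering, each condition's values sum to zero, so a regressor built from them captures variation in familiarity around the condition mean rather than the mean response to that condition.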

The individual statistical maps and the anatomical scans were projected into standard stereotaxic space (Talairach and Tournoux, 1988) and smoothed with a Gaussian filter of 6 mm FWHM. In a random effects analysis, group maps were created by comparing activations against a constant value of 0. The group maps were thresholded at voxelwise *p* < 0.01 and corrected for multiple comparisons by removing clusters below a size threshold of 1000 mm<sup>3</sup>, to achieve α < 0.05. The cluster threshold was determined through Monte Carlo simulations that estimate the chance probability of spatially contiguous voxels exceeding the voxelwise *p* threshold. The analysis was restricted to a mask that excluded areas outside the brain, as well as deep white matter areas and the ventricles.

Additionally, we examined the laterality of activation associated with the main effects and interactions calculated above. A laterality index (LI) was defined as (Q<sub>LH</sub> − Q<sub>RH</sub>) / (abs(Q<sub>LH</sub>) + abs(Q<sub>RH</sub>)), where Q<sub>LH</sub> and Q<sub>RH</sub> represent the fMRI-measured LH and RH contributions, respectively, and abs() indicates the absolute value of activation. LI was computed at the whole hemisphere level, and then for ROIs corresponding to major gyral and sulcal structures, taken from a maximum probability map of the regions in the Desikan-Killiany atlas (Desikan et al., 2006; TT\_desai\_dk\_mpm atlas, provided with AFNI). Rather than choosing a fixed arbitrary threshold to find activated voxels within each ROI, we used the method proposed by Fernández et al. (2001). First, for each participant, the mean of the 5% of the voxels with the strongest absolute value within a (bilateral) ROI was calculated. Active voxels were defined as those that fall within 50% of this mean (on both positive and negative sides) within the ROI. Jansen et al. (2006) found this method to be more robust and reproducible than using voxel counts at a fixed statistical threshold, or using unthresholded activation changes. The total activation of these voxels (defined by the sum of beta-coefficients of all above-threshold voxels) was used to calculate LIs. Both positive and negative correlations were used, as areas correlated positively as well as negatively with familiarity were considered to be relevant to processing of metaphoric or non-metaphoric language.
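The LI computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names and example values are hypothetical, "within 50% of this mean" is read as an absolute beta of at least half the mean of the strongest 5% of voxels, and an ROI with no suprathreshold activation is assigned an LI of 0.

```python
import math

def laterality_index(q_lh, q_rh):
    """LI = (Q_LH - Q_RH) / (|Q_LH| + |Q_RH|); positive = left lateralized."""
    denom = abs(q_lh) + abs(q_rh)
    if denom == 0:
        return 0.0  # assumption: no activation is treated as no lateralization
    return (q_lh - q_rh) / denom

def active_threshold(betas, top_fraction=0.05, proportion=0.5):
    """Threshold = proportion * mean |beta| of the strongest top_fraction of voxels."""
    magnitudes = sorted((abs(b) for b in betas), reverse=True)
    n_top = max(1, math.ceil(top_fraction * len(magnitudes)))
    return proportion * sum(magnitudes[:n_top]) / n_top

def roi_laterality(lh_betas, rh_betas):
    """Compute LI for one bilateral ROI from per-voxel beta coefficients."""
    # The threshold is derived from the bilateral ROI, then applied per hemisphere.
    thr = active_threshold(list(lh_betas) + list(rh_betas))
    q_lh = sum(b for b in lh_betas if abs(b) >= thr)
    q_rh = sum(b for b in rh_betas if abs(b) >= thr)
    return laterality_index(q_lh, q_rh)

# Hypothetical per-voxel betas for one participant and one ROI.
li = roi_laterality([4.0, 3.0, 0.5], [2.0, 1.0, 0.2])
```

In this toy example the strongest voxel has a beta of 4.0, so the threshold is 2.0. The suprathreshold sums are 7.0 in the LH and 2.0 in the RH, giving an LI of 5/9, that is, left lateralized.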

<sup>2</sup> All sentences are available at http://www.mccauslandcenter.sc.edu/delab/?attachment\_id=302

From the Desikan-Killiany atlas, the middle and inferior frontal gyri, superior and middle temporal gyri (both caudal and rostral divisions), and the posterior STS ("bankssts") were considered *a priori* regions of interest, as they have been associated with metaphoric processing (Faust, 2012; Rapp et al., 2012). The three divisions of the inferior frontal gyri were combined into a single IFG ROI. Nonparametric Wilcoxon rank-sum tests (Wilcoxon, 1945) were conducted to find LIs that differed from a constant (0), and correction for multiple comparisons was performed using False Discovery Rate (FDR; Genovese et al., 2002).
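To make the multiple-comparison step concrete, the following Python sketch applies a Benjamini-Hochberg step-up procedure to a list of per-ROI *p* values. It assumes that the FDR correction takes this standard form; the function name, the level q = 0.05, and the example *p* values are illustrative only and are not taken from the study.

```python
def fdr_bh(p_values, q=0.05):
    """Return a list of booleans: True where a test survives FDR at level q."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # All tests ranked at or below the cutoff are declared significant.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives

# Hypothetical p values for four ROIs.
result = fdr_bh([0.001, 0.04, 0.03, 0.5])
```

Here only the first ROI survives: 0.03 and 0.04 would each pass an uncorrected threshold of 0.05, but neither falls below its rank-scaled threshold (0.025 and 0.0375, respectively).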

# **RESULTS**

### **MAIN EFFECT OF FAMILIARITY**

Decreasing familiarity resulted in increased activation in both hemispheres with LH dominance, but with some activation in the RH (**Figure 2**, **Table 1**). These regions included bilateral IFG, IFS, MFG, insula, precentral gyrus and central sulcus, lateral orbital gyrus, medial SFG, lingual gyrus, and cuneus. The STS and MTG were activated in the LH. A positive correlation with familiarity was observed in the right posterior AG. The ROI analysis on the right STS did not reveal any activation.

### **INTERACTION WITH METAPHORICITY**

No regions showed familiarity × metaphoricity interaction in the whole brain analysis, nor was there any familiarity × metaphoricity interaction in the right STS ROI.

### **LATERALITY ANALYSIS RESULTS**

Laterality analysis of the main effect of familiarity based on predefined ROIs (as opposed to the voxelwise activations found above) showed that MTG becomes more left lateralized as familiarity is decreased across both sentence types (**Table 2**). The posterior STS and caudal MFG also showed marginal left lateralization. No region showed right lateralization.

The critical analysis involves the familiarity × metaphoricity interaction using the LIs for both conditions. Because the hypothesis predicts greater right laterality for metaphors, we examined this effect with one-tailed tests to gain more sensitivity. This analysis showed that the more unfamiliar a metaphoric item is, the more right lateralized it becomes relative to increasingly unfamiliar non-metaphoric sentences, at the whole brain level and also in the caudal MFG (**Table 3**). The interaction in both regions arose from a strong left lateralized activation for the non-metaphoric sentences and no lateralization (not significantly different from 0) for the metaphoric sentences (**Figure 3**). Interaction trends were observed in the precuneus and precentral gyrus, following the same pattern (left lateralization for non-metaphors, no lateralization for metaphors).

### **DISCUSSION**

We examined the effects of decreasing familiarity on both metaphoric and non-metaphoric sentences, to assess the extent to which RH activations for relatively novel, unfamiliar metaphors are driven by the general cognitive demands for processing unfamiliar stimuli. We found first that decreased familiarity led to increased activation in both the LH and the RH regardless of metaphoricity, with greater activation in the LH. This is consistent with the greater LH activation found in some studies that argue against a special role for RH (e.g., Rapp et al., 2004, 2012; Bohrn et al., 2012). While the controversy relates only to the RH, the LH can also be argued to play a "special role" in processing unfamiliar metaphors and literals, likely reflecting a greater use of the existing left lateralized language system.

**Table 1 | Regions showing a main effect of decreased familiarity. Cluster volume (in mm<sup>3</sup>), maximum z-score, and the coordinates in Talairach space are shown**.

L = left hemisphere, R = right hemisphere, g = gyrus, s = sulcus, sup = superior, mid = middle, inf = inferior.

**Table 2 | Laterality indices for regions showing main effect of correlation with familiarity (positive values = left lateralization; negative values = right lateralization)**.

\* indicates p < 0.05. Regions showing a trend (p < 0.1) are also shown.

**Table 3 | Laterality indices for regions showing metaphoricity × familiarity interaction (positive values = left lateralization; negative values = right lateralization)**.

Regions showing a trend (p < 0.1) are also shown. \* indicates p < 0.05.

While overall the unfamiliarity-related activations were left lateralized, some RH regions were also found to respond to decreased familiarity across both sentence types, most notably the right IFG, MFG, and insula. This pattern suggests that activation in these regions, frequently reported in metaphor studies and used as evidence for a special role of the RH in metaphor processing, is unlikely to reflect metaphor-specific processing but instead reflects increased general cognitive demands of processing unfamiliar stimuli. Past studies have implicated the IFG for processing difficulty (Yang et al., 2009), though in contrast to the right IFG activation observed in the present study, increased difficulty has been associated both with left (Schmidt and Seger, 2009) and bilateral (Diaz et al., 2011) IFG activation. The MFG was activated to a greater extent in the easier condition in Schmidt and Seger (2009) and was left lateralized. These differences might have resulted from a difference in the degree of unfamiliarity of the tested items in these studies: Our items were congregated closer to the familiar end of the scale whereas the items in Schmidt and Seger (2009) were closer to the unfamiliar end.

Our finding that a left lateralized network is engaged for unfamiliar sentences meshes well with Cardillo et al. (2012). In this study, the authors manipulated familiarity parametrically by exposing participants to novel metaphoric stimuli to different degrees. Effects of decreasing familiarity were found in the bilateral IFG, left posterior MTG, and right postero-lateral occipital gyri. This study did not include the corresponding literal conditions of varying familiarity. Nonetheless, the fact that both familiarity induced within a session (Cardillo et al., 2012) and familiarity established through lifelong experience (the current study) were associated with LH activation further supports the view that the LH, and in particular the left IFG, is involved in processing unfamiliar stimuli due to general cognitive demands.

We also examined RH contributions relative to LH contributions, by computing laterality indices. Greater left lateralization was observed in the MTG and marginally in the posterior STS. These regions are commonly associated with language processing, including semantic, combinatorial/syntactic, and phonological processing (Binder et al., 1997, 2009; Friederici, 2002; Hartwigsen et al., 2010; Price, 2012). These results are consistent with greater involvement of left-dominant language systems for dealing with more difficult or unfamiliar sentences.

Turning to the laterality analysis of the metaphor × familiarity interaction, the laterality of the unfamiliarity-related activation at the whole brain level shifted to the right for metaphors relative to non-metaphors. This interaction arose from left lateralization of non-metaphors, and no lateralization (both hemispheres being activated statistically equally, with small numerical right lateralization) for metaphors. Thus, while the RH itself is not activated more than the LH for unfamiliar metaphors relative to familiar metaphors, its contribution is relatively greater for unfamiliar metaphors than for unfamiliar non-metaphors. These results suggest that metaphoric processing alters the division of labor between the hemispheres, with more bilateral activation as opposed to left lateralized activation for non-metaphors. In other words, the RH does not contribute to a greater extent in metaphoric processing, but the LH contributes less.

The caudal MFG also showed this interaction, arising from the same pattern of left lateralization for non-metaphors and bilateral activity for metaphors. Middle frontal gyri have been associated with working memory (Leung et al., 2002), inhibitory control (Garavan et al., 1999), sustained attention and verification (Kanwisher and Wojciulik, 2000; Cabeza et al., 2003; Habib et al., 2003). While these processes are by no means metaphor specific, they appear to be engaged more for processing unfamiliar metaphoric sentences than for unfamiliar non-metaphoric sentences. The precentral gyrus and precuneus showed a marginal interaction, with more bilateral processing for metaphors. The precentral gyrus activation may be related to the semantic content of the action metaphors used in the current study. The precuneus has been implicated in mental imagery strategies and episodic memory retrieval (Cavanna and Trimble, 2006), which are relevant for metaphor processing.

A few theories predict RH involvement in processing unfamiliar or novel stimuli. One prevailing view of the RH is that it maintains a wider semantic field, and keeps alternative meanings and senses active (Beeman and Chiarello, 1998). The putative special role of the RH in metaphor processing involves enabling access to these alternative senses. Another claims that while the processing of formulaic language like idioms is primarily left lateralized, the RH can control or modulate this processing (Van Lancker Sidtis, 2012, p. 352). And yet another view suggests that the RH is involved in non-salient meaning processing (Giora, 2003). We suggest an additional possibility, namely that the RH, and especially regions such as the right IFG, come online when the resources provided by the LH are not sufficient due to difficulty of comprehension. For all types of difficult linguistic stimuli, the LH is activated more, and there is a "spill over" effect in the RH. This may also explain why in older individuals, more bilateral activity is often observed. With diminished efficiency and capacity of the aged brain, "assistance" from the RH is needed. The bilateral nature of increased activation here, and in several studies that reported regions correlated with RT cited earlier, also supports this idea. We are not aware of any studies that show increased RH activation for language processing without also showing increased LH activation.

While we have focused on the effects of decreased familiarity, increased familiarity showed more activation in the right AG. The right AG is part of the semantic system, showing greater activation for more meaningful relative to less meaningful linguistic stimuli (Binder et al., 2009). Graves et al. (2010) found activation in the same region for meaningful word combinations (*flower girl*) relative to word pairs that are difficult to combine into a whole (*girl flower*) in a semantic judgment task. An interpretation consistent with these observations is that the activation in the right AG reflects the semantic richness of familiar word combinations, and spreading activations due to greater associations with more meaningful complex stimuli.

One characteristic of the current study is that the stimulus set did not include highly novel metaphors, and mostly included somewhat familiar, comprehensible metaphors of the kind that would be expected in daily language and popular media. This also means that the "unfamiliar" sentences in the current study may be better treated as "less familiar" sentences. It is possible that RH involvement changes for highly unfamiliar metaphors requiring extensive analysis, but this can only be assessed in comparison to equally odd, unfamiliar, or difficult non-metaphoric language. The range of unfamiliarity explored in this study may also be more relevant and ecologically valid, as the majority of metaphors encountered in daily or routine language processing are likely created to be comprehensible without extensive analysis. We speculate that very novel or odd metaphors are not only rare, but may necessitate qualitatively different mechanisms involving conscious cognitive control that are usually not engaged during most language processing.

Another characteristic of the study is that the metaphors were embedded in sentences, and familiarity ratings were obtained for whole sentences (*The public grasped the idea*). The sentences were not arbitrarily complex, but had a fixed structure involving a noun phrase (an article and a noun) preceding the metaphor. The effects observed here may also represent some contributions from the noun phrase (*The public*), although those were also present in the non-metaphoric sentences (*The public understood the idea*). The studies that use two-word combinations or A-is-B metaphors have an advantage that the entire stimulus constitutes the metaphor. On the flip side, most metaphors are also encountered in sentence (and larger) contexts, and not in isolation, in routine language processing. The larger context, and the noun phrase in this case, can affect how readily a given metaphor is comprehended (e.g., the metaphor in *The student grasped the idea* may behave like a slightly more familiar metaphor than the metaphor in *The cook grasped the idea*, because students have a stronger association with understanding things). Thus results pertaining to how metaphors are processed and modulated in sentence contexts (and in the minimal noun phrase context in this case) are also relevant to metaphor processing.

### **CONCLUSIONS**

With decreased familiarity or increased novelty, there is greater activation in the whole brain across both metaphors and non-metaphors, with more extensive recruitment in the LH. Some regions in the RH, especially the IFG and insula, respond to decreased familiarity. Activation of the right IFG, a consistent finding in studies of metaphors, likely reflects a general difficulty effect and not metaphor-specific processing. These findings suggest it is important to equate the novelty/unfamiliarity of the stimuli in studies of metaphor processing. Comparisons of novel and conventional metaphors, or novel metaphors and conventional literal sentences can, and usually do, lead to confounds due to greater general cognitive demands of processing unfamiliar stimuli.

In the present study, no brain regions responded selectively to the decreasing familiarity of metaphors. Unfamiliarity-related recruitment of the RH and LH is relatively bilateral for metaphors and left lateralized for non-metaphors, suggesting a *relatively* greater role for the RH in processing unfamiliar metaphors compared to non-metaphors. Thus, the RH does not contribute to a greater extent in metaphoric processing in an absolute sense, but the LH contributes less, affecting lateralization. The answer to the question "does the RH play a special role in metaphor processing?" is both "yes" and "no". It is "yes" in the sense that relative to the LH, the RH does show greater activation compared to its relative activation for processing non-metaphoric stimuli. It is "no" in the sense that the magnitude of activation in the RH, taken by itself, is similar for both metaphoric and similarly-difficult non-metaphoric stimuli.

# **ACKNOWLEDGMENTS**

This research was supported by NIH/NIDCD grant R01 DC010783 (RHD).

### **REFERENCES**


presentation. *Brain Res.* 1146, 128–145. doi: 10.1016/j.brainres.2007.03.008


Yang, F. G., Edens, J., Simpson, C., and Krawczyk, D. C. (2009). Differences in task demands influence the hemispheric lateralization and neural correlates of metaphor. *Brain Lang.* 111, 114–124. doi: 10.1016/j.bandl.2009.08.006

Yarkoni, T., Barch, D. M., Gray, J. R., Conturo, T. E., and Braver, T. S. (2009). BOLD correlates of trial-by-trial reaction time variability in gray and white matter: a multi-study fMRI analysis. *PLoS One* 4:e4257. doi: 10.1371/journal.pone.0004257

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 July 2014; accepted: 16 January 2015; published online: 10 February 2015*. *Citation: Lai VT, van Dam W, Conant LL, Binder JR and Desai RH (2015) Familiarity differentially affects right hemisphere contributions to processing metaphors and literals. Front. Hum. Neurosci. 9:44. doi: 10.3389/fnhum.2015.00044*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2015 Lai, van Dam, Conant, Binder and Desai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Rigidity, chaos and integration: hemispheric interaction and individual differences in metaphor comprehension

# **Miriam Faust 1,2\* and Yoed N. Kenett <sup>1</sup>**

<sup>1</sup> The Leslie and Susan Gonda (Goldschmied) Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel <sup>2</sup> Department of Psychology, Bar-Ilan University, Ramat-Gan, Israel

### **Edited by:**

Seana Coulson, University of California at San Diego, USA

### **Reviewed by:**

Matjaž Perc, University of Maribor, Slovenia Fanpei G. Yang, National Tsing Hua University, Taiwan

### **\*Correspondence:**

Miriam Faust, The Leslie and Susan Gonda (Goldschmied) Multidisciplinary Brain Research Center, Bar-Ilan University, Building 901, Ramat-Gan 52900, Israel e-mail: faustm@mail.biu.ac.il

Neurotypical individuals cope flexibly with the full range of semantic relations expressed in human language, including metaphoric relations. This impressive semantic ability may be associated with distinct and flexible patterns of hemispheric interaction, including higher right hemisphere (RH) involvement for processing novel metaphors. However, this ability may be impaired in specific clinical conditions, such as Asperger syndrome (AS) and schizophrenia. The impaired semantic processing is accompanied by different patterns of hemispheric interaction during semantic processing, showing either reduced (in Asperger syndrome) or excessive (in schizophrenia) RH involvement. This paper interprets these individual differences using the terms Rigidity, Chaos and Integration, which describe patterns of semantic memory network states that either lead to semantic well-being or are disruptive of it. We argue that these semantic network states lie on a rigidity-chaos semantic continuum. We define these terms via network science terminology and provide network, cognitive and neural evidence to support our claim. This continuum includes a left hemisphere (LH) hyper-rigid semantic memory state at one end (e.g., in persons with AS), and an RH chaotic and over-flexible semantic memory state at the other end (e.g., in persons with schizophrenia). In between these two extremes lie different states of semantic memory structure which are related to individual differences in semantic creativity. We suggest that efficient semantic processing is achieved by semantic integration, a balance between semantic rigidity and semantic chaos. Such integration is achieved via inter-hemispheric communication. However, impairments to this well-balanced and integrated pattern of hemispheric interaction, e.g., when one hemisphere dominates the other, may lead to either semantic rigidity or semantic chaos, moving away from semantic integration and thus impairing the processing of metaphoric language.

**Keywords: metaphors, creative language, cerebral hemispheres, network science, chaos, rigidity, integration**

# **INTRODUCTION**

Language is complex. Part of this complexity stems from a unique characteristic of human language: it contains highly conventional as well as unconventional, more ambiguous and creative linguistic expressions, such as novel metaphors (Faust, 2012; Mirous and Beeman, 2012). In the present paper we suggest that the ability of neurologically intact persons to cope flexibly with the full range of semantic relations expressed in language, including novel metaphoric relations, depends on the pattern of interaction between multiple brain networks in the two cerebral hemispheres during semantic processing. Specifically, we suggest that language is always a whole-brain process and thus processing any type of language, including metaphors, requires integration between systemized, more rigid semantic processing associated with the left hemisphere (LH) and more flexible semantic processing associated with the right hemisphere (RH). However, when compared to more conventional types of language, processing novel metaphors may require relatively higher involvement of the RH's unique semantic coding.

The two cerebral hemispheres have been shown to code semantic information in different ways (for review see Mirous and Beeman, 2012). Much research indicates that RH mechanisms are highly sensitive to distant, unusual semantic relations, whereas LH mechanisms strongly focus on a few closely related word meanings while suppressing distant and unusual meanings (Brownell et al., 1983; Burgess and Simpson, 1988; Faust and Chiarello, 1998; Razoumnikova, 2000; Faust and Kahana, 2002; Bowden and Jung-Beeman, 2003; Faust and Lavidor, 2003; Mihov et al., 2010; Faust, 2012). The interaction between these two semantic systems can thus be described as lying on a *rigidity-chaos* semantic continuum. This continuum includes LH hyper-rigid and rule-based semantic processing on one extreme (e.g., in persons with Asperger syndrome (AS)), and RH chaotic and over-flexible semantic activation on the other extreme (e.g., in persons with schizophrenia). However, moving away from both LH *semantic rigidity* and RH *semantic chaos* leads to hemispheric well-balanced cooperation and to *semantic integration*. We suggest that this integration enables neurologically intact persons to process unconventional and ambiguous language, including both conventional and novel metaphoric expressions. Furthermore, we suggest that the level of semantic integration may be related to individual differences in creative ability.

Metaphors are considered to be part of the more creative aspects of language as they may require unusual semantic processing. Creativity is broadly defined as the creation of something which is both novel and useful, or appropriate (Mednick, 1962; Sternberg and Lubart, 1999; Runco and Jaeger, 2012). By this definition, a creative product is the combination of a flexible process, which allows generation of novel ideas, with a more systemized process which constrains novel concepts by their appropriateness (Nijstad et al., 2010). In line with this definition, creative language includes linguistic products which are both novel and appropriate, such as novel metaphors. Metaphors, including novel metaphoric expressions, are abundant in language (Lakoff and Johnson, 1980), as they allow efficient expressions of ideas that would otherwise be awkward to explain literally (Glucksberg, 2001). However, the use of metaphoric language requires the ability to activate a broader, more flexible set of semantic associations and combine weakly related concepts into a novel and appropriate linguistic product (i.e., sense creation, Bowdle and Gentner, 2005; Faust, 2012). Thus, metaphors are widely used in poetry, the ultimate expression of linguistic creativity, where with a few words, implicit and explicit emotions and associations from one context can powerfully be associated in a novel way with another, different context (Faust, 2012).

We have been working for the past two decades on the processing of novel metaphors taken from poetry compared to conventional metaphors, literal expressions, and meaningless, unrelated word pairs (reviewed in Faust, 2012). This research project used converging behavioral and neurocognitive techniques (accuracy and response times, split visual fields, Event-Related Potentials (ERPs), Transcranial Magnetic Stimulation, functional Magnetic Resonance Imaging and Magnetoencephalography) to study neurotypical as well as clinical populations, such as persons with AS or schizophrenia (Faust, 2012; Gold and Faust, 2012; Zeev-Wolf et al., 2014).

This research project has consistently shown the contribution of the RH to novel metaphor processing in neurotypical persons and how deviation from a neurotypical state affects comprehension of novel metaphors (i.e., the processing of creative language): on one extreme, persons with AS exhibit rigidity of thought and have difficulties in processing novel conceptual combinations (novel metaphors), accompanied by reduced RH involvement (Gold and Faust, 2012); on the other extreme, persons with schizophrenia, who exhibit loose associations, seem to have a different pattern of hemispheric involvement, including increased RH involvement. This different hemispheric pattern may result in unrelated word pairs being processed as if they were meaningful (Zeev-Wolf et al., 2014). Furthermore, the different patterns of hemispheric involvement in semantic processing characterizing persons with neurodevelopmental psychiatric disorders may be related to their documented deficits in the comprehension of nonliteral expressions while other language skills are relatively preserved (Martin and McDonald, 2004; Thoma and Daum, 2006; Rapp, 2012) [but see Gernsbacher and Pripas-Kapit (2012) for an alternative view on persons with AS].

The role of the RH in creative language is explained by the fine-coarse semantic processing model (FCT; Chiarello, 2003; Jung-Beeman, 2005) and is based on the notion that the RH uniquely activates and maintains a wide range of meanings and associations that enable the creation of novel conceptual combinations. This weak, broad activation may better capture semantic relations which depend on the overlap of distantly related meanings. According to this theory, both hemispheres are involved in Bilateral Activation, Integration, and Selection (BAIS; Jung-Beeman, 2005), yet with a different processing role for each hemisphere. This difference implies different hemispheric mechanisms for metaphorical and literal language. When people comprehend literal language, the LH is strongly involved because the meaning is dominant, focal, and contextually relevant. However, when people process metaphorical language, specifically novel metaphors, the RH plays a more important role because the figurative meaning of metaphors requires activation of loosely related concepts in a broader semantic field. The FCT has supporting neural evidence at both morphological and micro-anatomical levels (Mirous and Beeman, 2012). At the morphological level, there are a few distinct asymmetries between the LH and the RH, such as the LH having a larger planum temporale and a relatively higher ratio of gray to white matter and the RH having relatively more white matter and a higher degree of functional interconnectivity. At the micro-anatomical level, LH neurons have smaller input fields than RH neurons in language related brain areas. This difference in input fields may be related to more specific, fine, neural processing in the LH compared to less specific, coarser processing in the RH (Mirous and Beeman, 2012).

Several studies have used functional Magnetic Resonance Imaging (fMRI) techniques to investigate hemispheric processing of metaphors (Mashal et al., 2005; Schmidt and Seger, 2009; Yang et al., 2009; Diaz and Hogstrom, 2011; Diaz et al., 2011; Bohrn et al., 2012). Such studies show the involvement and contribution of the RH in processing novel metaphors and figurative language and how this involvement is affected by context, novelty, figurativeness, task difficulty and familiarity (Schmidt and Seger, 2009; Yang et al., 2009; Diaz and Hogstrom, 2011; Diaz et al., 2011). However, while much research has shown the role of the RH in metaphor processing, contradictory findings showing no RH role in metaphor processing and even LH dominance have also been reported (Rapp, 2012). Recent meta-analyses of several fMRI investigations of neural aspects of metaphor processing have yielded both LH and RH dominant activations (Rapp et al., 2012; Yang, 2014). Rapp et al. (2012) found more left-lateralized temporal network activation for processing non-literal language. Nevertheless, when the authors conducted subgroup analyses for different types of non-literal language, they found more general bilateral and even more RH activated foci for non-salient, novel metaphor processing. Thus, this meta-analysis further strengthens the importance of bilateral hemispheric dynamics in the processing of non-literal language. Yang (2014) conducted an fMRI meta-analysis to investigate the role of the RH and the brain mechanisms involved in metaphor comprehension. This meta-analysis revealed that the RH is involved in metaphor comprehension and is influenced by conventionality, context and task demand. These factors might explain the contradictory evidence found in regard to the role of the RH in metaphor processing (Rapp, 2012). Furthermore, this meta-analysis related each of the three semantic processing components proposed by Jung-Beeman (2005) to neural activity, mainly the temporal lobe (middle temporal gyrus (MTG)/superior temporal gyrus (STG)) to semantic activation and integration and the frontal lobe (inferior frontal gyrus (IFG)) to semantic selection. Each component seems to involve activation of bilateral brain regions, while RH regions perform coarser analysis than LH regions for the same process (Yang, 2014). Thus, while the role of the RH in metaphor processing is consistently shown, the importance of bilateral activation and hemispheric cooperation in creative and metaphoric language processing is becoming more and more apparent.

Evidence for bilateral activation in hemispheric processing of metaphors is slowly accumulating. Pobric et al. (2007) used repetitive Transcranial Magnetic Stimulation (rTMS) to study hemispheric involvement in semantic processing. They show that while RH interference only disrupted novel metaphor processing, LH interference disrupted literal and conventional metaphor processing, but facilitated novel metaphor processing (Pobric et al., 2007). These findings suggest that processing novel metaphoric relations requires dynamical, fine-tuned interaction between RH coarse and LH fine semantic processing mechanisms. Furthermore, it has been suggested that the corpus callosum mediates the processing of non-literal language such as metaphors, by the integration of relevant information between hemispheres (Thoma and Daum, 2006). Thoma and Daum (2006) show how persons suffering from agenesis of the corpus callosum (a congenital disease which results in complete or partial absence of the corpus callosum) are impaired in non-literal language processing.

These findings thus suggest a cognitive continuum which reconciles the contradictory evidence for hemispheric roles and interaction during metaphor processing and may provide a more general account for different patterns of neurocognitive processing of creative, including metaphoric, language exhibited by clinical and neurotypical persons (Gold et al., 2011; Gold and Faust, 2012; Zeev-Wolf et al., 2014). The continuum we propose here is a cognitive continuum of semantic processing states, ranging from rigidity to chaos. We borrow the notions of *rigidity*, *chaos* and *integration* from Siegel (2010), who uses these terms to describe psychological well-being, and we propose a framework of *semantic well-being*. We will define the notions of semantic *rigidity*, *chaos* and *integration* and describe how network science allows for quantitative explorations of these notions. We do so by describing neural, computational and cognitive research in order to discuss the extreme states of semantic rigidity (via research on persons with AS) and semantic chaos (via research on persons with schizophrenia). Finally, we discuss semantic integration, which we have been recently exploring through the investigation of individual differences in semantic creativity.

# **SEMANTIC WELL-BEING—RIGIDITY, CHAOS AND INTEGRATION**

In presenting an integrative explanation for the psychological state of well-being, Siegel (2010) introduces the notions of rigidity, chaos and integration. As he sees it, emotional well-being is a state of integrative balance, leading to feelings of vitality and liveliness. The claim is that this balance is easily disrupted by deviation either towards too little arousal, a state of rigidity, or excessive arousal, a state of chaos (Siegel, 2010). This deviation from mental balance occurs frequently in mentally healthy persons, but extreme deviations can result in clinical conditions. Siegel claims that the key to mental balance is *integration*: linking together different elements from different systems, which converge into a balanced synchrony, such as that of a choir singing in harmony (Siegel, 2010). Thus, emotional well-being is considered a balance of systems that integrate with each other and mental illness can be defined as a shift from a state of integration either to a rigid extreme or to a chaotic extreme. Searching for a theoretical framework to relate these notions, Siegel realized that network science allows quantitative definition and exploration of his theory of mental well-being (Siegel, 2010). In this paper, we use network science tools to quantify semantic rigidity, chaos and well-being and relate these processing modes to hemispheric involvement.

Network science is based on mathematical graph theory, providing quantitative methods to investigate complex systems as networks. A network is composed of nodes, which represent the basic units of the system, and links, which represent the relations between them. This field has greatly advanced in the past few decades due to technological and quantitative theoretical advances. This rapid development has led to investigations of both the structural properties and the dynamics (such as emotional deviation from integration) of complex systems which can be represented as networks (reviewed in Baronchelli et al., 2013). Of the various network models developed in network science theory, the network model that has been widely used to examine complex systems is the Small World Network model (SWN; Milgram, 1967; Watts and Strogatz, 1998). SWN models have successfully described a wide range of sociological, technological, biological and economical networks (Boccaletti et al., 2006; Cohen and Havlin, 2010; Kenett et al., 2010; Newman, 2010). Two main characteristics of an SWN are the network's clustering coefficient (CC) and its average shortest path length (ASPL). The CC refers to the probability that two neighbors of a node will themselves be neighbors (a neighbor is a node *j* that is connected through an edge to node *i*). The ASPL refers to the average, over all pairs of nodes, of the smallest number of steps (edges traversed) needed to get from one node to the other. An SWN is characterized by having a large CC and a short ASPL.
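The two measures can be illustrated on a toy network. The following is a minimal sketch, not taken from the studies cited above: it assumes an undirected, connected network stored as a dictionary of neighbor sets, it operationalizes the CC as the average of the per-node clustering values (one common choice), and the four-word network `toy` and both function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # a node with fewer than two neighbors contributes 0
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)


def average_shortest_path_length(adj):
    """Mean number of edges on the shortest path between pairs of distinct nodes.

    Assumes the network is connected; distances come from breadth-first search.
    """
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy network: a triangle of related words plus one pendant word.
toy = {
    "cat": {"dog", "pet"},
    "dog": {"cat", "pet"},
    "pet": {"cat", "dog", "home"},
    "home": {"pet"},
}

cc = clustering_coefficient(toy)
aspl = average_shortest_path_length(toy)
print(round(cc, 3), round(aspl, 3))  # 0.583 1.333
```

In this toy network the triangle raises the CC, while the pendant node "home" lengthens the ASPL, which shows how the two measures capture different aspects of structure.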

At the cognitive level, application of network science tools is also developing, mainly to investigate complex systems of language and memory structure (Vitevitch, 2008; Borge-Holthoefer and Arenas, 2010; Chan and Vitevitch, 2010; Vitevitch et al., 2012, 2014; Baronchelli et al., 2013). In the linguistic domain, lexicons of different languages seem to display SWN characteristics, considered to be a fundamental principle in lexical organization (Steyvers and Tenenbaum, 2005; De-Deyne and Storms, 2008a,b; Borge-Holthoefer and Arenas, 2010; Kenett et al., 2011). Investigating the complexity of semantic knowledge with network science makes it possible to examine fundamental questions such as the nature of semantic organization (what are the structural principles that characterize semantic knowledge?), process and performance (to what extent can human performance on semantic processing tasks be explained in terms of general processing in a semantic memory network?) and typical and atypical semantic lexicon development (Steyvers and Tenenbaum, 2005; Beckage et al., 2011; Kenett et al., 2013). Network research in language is slowly shifting from an interest in investigating the structure of mental lexicons to investigating cognitive processes operating on these lexicon networks (reviewed in Borge-Holthoefer and Arenas, 2010). To date, no network model has been proposed to explain differences in creative language processing and specifically metaphor processing, both in healthy and clinical populations. Such a model must be able to provide a network-based explanation of, and predictions for, the differences found in a wide range of creative language processing modes, including metaphor comprehension.

A central concept in network theory is the random graph. A random graph is generally defined as any graph in which some parameters are fixed and the others are unconstrained, and thus random (Newman, 2010). Random graphs were extensively studied by Erdös and Rényi, who defined a general model for a random graph (Erdös and Rényi, 1960). In their model, a graph consists of a fixed number of nodes (N) and a fixed probability (*L*) that a link exists between any two nodes; for large graphs, the number of links per node then approximately follows a Poisson distribution. This probability ranges from 0, where all of the nodes are disconnected from each other, to 1, where all of the nodes are connected to each other. Thus, a fixed pair (N, *L*) yields a spectrum of different networks which vary in their connectivity patterns. It is important to note that the randomness of this model is constrained by its fixed parameters and that, because of its simplicity, the model does not accurately capture real world networks (Sporns, 2011). To better account for real world network properties (namely, high CC and low ASPL), Watts and Strogatz proposed a "small-world" random graph model (SW; Watts and Strogatz, 1998). Given a fixed number of nodes (N), a fixed average degree (K; the number of nodes connected to a specific node *i*) and a probability parameter (p), a SW random graph is constructed in the following way: first, a regular network is constructed with N nodes, each connected to K neighbors. Next, every edge between a pair of nodes is rewired with a probability of p. Rewiring is defined as changing a link from connecting node *i* and node *j*, to connecting node *i* and node *k*. The rewiring process reshapes the structure of the random network such that it better represents real world networks (as described above). Such random graph models provide a quantitative framework to study different structural organizations of complex systems with a fixed number of nodes, such as brain networks or mental lexicons. In this sense, it is possible to examine how a varying degree of connectivity, which is brought about by variation in *L*, affects cognitive processing and predicts individual differences and atypical cognitive conditions.
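The small-world construction described above can be sketched in a few lines. This is an illustrative sketch only, under several assumptions that are not stated in the text: the regular network is a ring lattice, K is even, rewiring never creates self-links or duplicate links, and the function name `watts_strogatz` and the `seed` argument are hypothetical.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Build a small-world graph as a dict mapping each node to a set of neighbors.

    Step 1: a ring lattice in which each of the n nodes is linked to its
            k nearest neighbors (k/2 on each side; k is assumed to be even).
    Step 2: each lattice edge (i, j) is rewired with probability p, i.e.
            replaced by an edge from i to a randomly chosen node that is
            neither i itself nor already a neighbor of i.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                candidates = [m for m in range(n) if m != i and m not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def edge_count(adj):
    """Number of undirected edges (each edge is stored twice in the dict)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


lattice = watts_strogatz(n=20, k=4, p=0.0)           # fully regular network
rewired = watts_strogatz(n=20, k=4, p=0.2, seed=42)  # partially rewired network
print(edge_count(lattice), edge_count(rewired))  # 40 40
```

Rewiring moves links without adding or removing any, so the number of nodes and the average degree stay fixed while p alone shifts the network from fully ordered (p = 0) towards fully random (p = 1).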

We argue for a continuum of different mental lexicon states which constrains creative language processing, including metaphor processing. This semantic continuum ranges from a mental lexicon state with extremely low connectivity (resulting in more ordered, rigid organization) to a mental lexicon state with extremely high connectivity (resulting in more random, chaotic organization) (**Figure 1**). In between lies a family of lexicon network states with varying connectivity structure, such as Barabasi-Albert or scale free networks (Cohen and Havlin, 2010). We suggest that a mental lexicon state which balances between these two extremes allows for efficient processing of both conventional and creative language and also for the differentiation between these language types and meaningless linguistic expressions.

# **SEMANTIC RIGIDITY**

On one extreme of the semantic continuum are rigid networks. Such rigid networks are strongly ordered and minimally random, thus exhibiting a low CC and a high ASPL. Classical computational cognitive models which were proposed in the 1960s to represent semantic memory are one such example (e.g., Collins and Quillian, 1969). Such models were structured tree-based representations and were criticized for their inability to account for flexible categorization (Rogers, 2008; Borge-Holthoefer and Arenas, 2010). Since metaphor processing requires the activation of wide, flexible associative networks (Faust, 2012), we propose that such rigid networks are inefficient in facilitating creative language processing. A distinctive example of how rigidity of thought disrupts creative language processing is found in persons with AS.

While persons with AS display relatively intact formal linguistic processing (syntax, morphology, phonology), they exhibit extensive difficulties in higher-level aspects of linguistic processing (Gold and Faust, 2012). Studies have shown that persons with AS show difficulty in understanding non-literal language, such as semantically ambiguous (Le Sourn-Bissaoui et al., 2011) and metaphoric (Gillberg and Gillberg, 1989; Gold et al., 2010; Gold and Faust, 2010, 2012) language.

Recent neural research in autism points to white matter deficiencies leading to disrupted connectivity, as suggested by the under-connectivity theory of autism (Just et al., 2004, 2012; Williams et al., 2013). This theory postulates that connectivity of inter-regional brain circuitry is disrupted, particularly affecting cognitive processes which demand integration of frontal-posterior brain interactions. This under-connectivity in autism has been shown in various cognitive processes, specifically in language comprehension (Just et al., 2004, 2012; Williams et al., 2013). McAlonan et al. (2009) investigated white matter deficits in children with AS and found that they have predominantly right-sided white matter deficits, but also greater white matter volume than controls in LH language areas. Finally, Boger-Megiddo et al. (2006) have shown that children with autism have disproportionately smaller corpus callosum volumes than typically developing controls.

These findings were further corroborated by functional and neuro-structural studies (Koshino et al., 2005; Nordahl et al., 2007) and converge with electrophysiological evidence showing disrupted RH involvement in the processing of novel metaphors by persons with AS (Gold and Faust, 2010; Gold et al., 2010). Such electrophysiological research found no differences in the ERPs for processing novel metaphors compared to processing unrelated word pairs, in contrast to neurotypical controls. Thus, when persons with AS processed novel metaphoric and unrelated two-word expressions, their N400 amplitudes did not differ, suggesting that they process novel, potentially meaningful semantic relations as if they were meaningless. In addition, when persons with AS processed conventional and novel metaphors, their N400 amplitudes were significantly more negative compared to neurotypical controls. No such difference was found in the N400 amplitude for processing literal or unrelated meanings in persons with AS. These findings suggest that for persons with AS, integration of novel metaphoric meanings is as difficult as the integration of unrelated, nonsensical meanings. The findings may thus provide electrophysiological evidence for the specific difficulties exhibited by persons with AS in processing creative language such as novel metaphors (Gold et al., 2010).

At the cognitive level, Gold and Faust (2012) have attempted to explain the difficulties in processing metaphoric language typically exhibited by persons with AS by extending the Empathizing-Systemizing theory proposed by Baron-Cohen (2009). This perspective argues that in the language domain, conventional language processing is rule-based and thus considered the more systemized part of semantic processing, which remains intact in persons with AS. Creative language processing, on the other hand, involves some degree of semantic rule-violation strategies, such as the ability to violate conventional, dominant semantic relations and connect remote associations into a new and appropriate linguistic product. As such, the authors argue that creative language processing can be considered similar to the Empathic system, in the sense that it is much less rule-based; thus the processing of this type of language may be disrupted in persons with AS (Gold and Faust, 2012).

We have recently applied network science tools to investigate the structure of semantic memory of persons with AS compared to neurotypical controls (Kenett et al., under review). We show that the semantic memory structure of persons with AS is more compartmentalized than that of neurotypical controls: it breaks apart into smaller sub-parts. Community structure, known as modularity, is extensively studied in network science (Newman, 2006; Fortunato, 2010; Meunier et al., 2010). The principle of modularity seems to be a fundamental principle of brain network organization and modularity disruption has been related to neurodegenerative diseases (Bullmore and Sporns, 2012). We claim that the hyper-modular structure of semantic memory in persons with AS is related to their rigidity of thought: a hyper-modular mental lexicon network organization may hinder their ability to break away from a specific module in the network.
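To make the notion of modularity concrete, the sketch below computes the widely used modularity score Q for a given division of a network into communities: for each community, the fraction of links that fall inside it minus the fraction expected if links were placed at random with the same node degrees. It is an illustration only and is not the analysis used in the study described above; the function name `modularity` and the six-node network `two_modules` are hypothetical.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    communities: iterable of node collections that together cover every node once.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # total number of links
    q = 0.0
    for community in communities:
        members = set(community)
        # Links with both ends inside the community (each is counted twice).
        internal = sum(1 for u in members for v in adj[u] if v in members) / 2
        # Sum of the degrees of the community's nodes.
        degree = sum(len(adj[u]) for u in members)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


# Hypothetical network: two tightly knit triangles joined by a single link (c-d).
two_modules = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

q_split = modularity(two_modules, [{"a", "b", "c"}, {"d", "e", "f"}])
q_whole = modularity(two_modules, [{"a", "b", "c", "d", "e", "f"}])
print(round(q_split, 3), round(q_whole, 3))  # 0.357 0.0
```

A higher Q for the two-community division indicates that the network separates cleanly into modules. On this reading, a hyper-modular lexicon would be one whose best division yields an unusually high Q, with few links leading out of each module.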

In summary, we suggest that systemized, highly conventional and relatively rigid processing is crucial for efficient processing of the more conventional, rule-based parts of language, associated with the LH. Nevertheless, extreme states of rigid semantic processing may disrupt the ability to process the more creative aspects of language, associated with higher RH involvement, as evident in persons with AS. Neural, behavioral and network research in persons with AS is beginning to converge on a possibly more coherent explanation of the difficulties such persons exhibit in processing the more creative types of language such as novel metaphors. Such difficulties may be related to an extremely rigid semantic system state, most likely as a result of neural under-connectivity which disrupts their ability for flexible semantic processing. Research on the effect of network rigidity on cognitive processing has only recently begun, and much further work is required to better understand this effect (Arenas et al., 2012; Shai et al., 2014). We now turn to the other extreme end of the semantic state continuum: the chaotic state.

# **SEMANTIC CHAOS**

On the other extreme of the semantic continuum lies the chaotic state. In network science terms, chaotic networks can be defined as being random, or nearly random (high CC and very low ASPL), marking a network which is very highly connected and only weakly organized. Current neurocognitive studies which applied network science tools to investigate the developing brain have shown that brain network structures reorganize from an initial chaotic SWN state to a more structured network state (Boersma et al., 2011; Fan et al., 2011; Yap et al., 2011; de Bie et al., 2012; Smit et al., 2012; van Straaten and Stam, 2013). These studies provide neurophysiological evidence that complex brain networks start from a more chaotic, strongly small-worlded state, and slowly, as the brain matures, shift to a more structured state, while retaining SWN properties. Thus, in accordance with Siegel's (2010) notion of well-being, the brain transitions from a chaotic state to a more structured, balanced state. In fact, network research in neurodegenerative diseases shows how such diseases alter healthy network states (van Straaten and Stam, 2013). Schizophrenia is a distinctive disease that results in altered neurocognitive network states.

Application of network science to EEG and fMRI data of individuals with schizophrenia has revealed loss of overall functional connectivity and small-world properties with increased network randomness (Micheloyannis et al., 2006; Liu et al., 2008; Rubinov et al., 2009; Lynall et al., 2010). Several fMRI studies have reported reduced clustering and reduced modularity in patients with schizophrenia (Liu et al., 2008; Alexander-Bloch et al., 2010), all supporting the network randomization theory of schizophrenia (Rubinov et al., 2009). Furthermore, the severity of the disruption of the small-world structure of the brain seems to be related to the duration of the illness (Liu et al., 2008; Rubinov et al., 2009).

We have recently conducted behavioral and MEG research to investigate metaphor processing in persons with schizophrenia compared to neurotypical controls (Zeev-Wolf et al., 2014; Zeev-Wolf et al., in preparation). This research showed how the ability to differentiate between potentially meaningful, novel metaphoric expressions and meaningless word pairs may be deficient in persons with schizophrenia. Persons with schizophrenia appear to over-rely on coarse semantic coding, which may disrupt their ability to balance between finding new meanings for novel metaphors, on the one hand, and rejecting meaningless linguistic stimuli on the other hand. This was accompanied by a different pattern of hemispheric involvement during the processing of linguistic expressions. Specifically, we found a deficient pattern of excessive RH involvement for all types of expressions, mainly for novel metaphors, in persons with schizophrenia compared to neurotypical controls. Thus, at short stimulus onset asynchronies (SOAs), while neurotypical persons exhibited a LH advantage for novel metaphor processing, persons with schizophrenia exhibited a RH advantage. Furthermore, while neurotypical controls exhibited a LH advantage for processing literal and conventional metaphors, persons with schizophrenia exhibited a RH advantage (Zeev-Wolf et al., 2014). The MEG research provided further evidence for the unbalanced relations between hemispheres in processing novel metaphors. We found a general RH over-activation and unbalanced hemispheric activation during metaphor comprehension in persons with schizophrenia, as compared to neurotypical controls (Zeev-Wolf et al., in preparation).

At the cognitive level, network research investigating language and thought disorders in persons with schizophrenia is still in its early stages (Mota et al., 2012; Voorspoels et al., 2014). Mota et al. (2012) used network tools to study speech acts produced by manic and schizophrenic patients, creating speech graphs for each clinical population. This research shows how quantitative network measures can differentiate between persons with schizophrenia (by quantitatively accounting for the schizophrenic phenomenon of "poverty of speech"), manic patients (by quantitatively accounting for the manic phenomenon of "flight of speech") and neurotypical controls, providing valuable clinical information not captured by classical clinical measures. Further cognitive network research is required to better quantify the semantic memory of persons with schizophrenia and how it deviates from that of neurotypical controls, in order to better understand symptoms such as "loose associations".
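As a rough illustration of what a speech graph is, the sketch below turns a transcript into a directed word graph. It assumes that each distinct word is a node and each pair of consecutive words is a directed edge. The function name, the example transcripts, and the three summary measures are hypothetical choices for illustration, not the published feature set.

```python
from collections import Counter


def speech_graph_summary(transcript):
    """Summarize a transcript as a directed word graph (illustrative measures only)."""
    words = transcript.lower().split()
    # Each consecutive word pair is one directed transition; Counter tallies repeats.
    transitions = Counter(zip(words, words[1:]))
    return {
        "nodes": len(set(words)),
        "edges": len(transitions),
        "repeated_transitions": sum(1 for c in transitions.values() if c > 1),
    }


# Invented transcripts: one highly repetitive, one more varied.
repetitive = speech_graph_summary("i went home i went home i went home")
varied = speech_graph_summary("i went home then the dog ran to the park")
print(repetitive)
print(varied)
```

Measures of this kind give a number to how repetitive or wide-ranging a stretch of speech is, which is what allows groups to be compared quantitatively.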

In summary, a chaotic semantic network state allows for more flexible creative processing, associated with the RH. While a SWN state is a crucial aspect of neurocognitive structural and functional organization (Bullmore and Sporns, 2012; Baronchelli et al., 2013), an overly chaotic SWN state may lead to cognitive deficiencies, as apparent in persons suffering from schizophrenia (Zeev-Wolf et al., 2014). As such, the semantic system must balance between rigidity, which may disrupt creative language processing, and chaos, which may disrupt conventional, literal language processing. Furthermore, semantic chaos may also disrupt the processing of creative language, interfering with the appropriateness and relevance aspects of creative products. Each of these extremes thus affects the two components of the creative product: novelty (flexibility, chaos) and appropriateness (systemized, rigidity), respectively. This balance is achieved by integration.

# **SEMANTIC INTEGRATION**

To avoid the extreme states of either semantic rigidity or semantic chaos, we suggest that the neurocognitive system must strive for balanced dynamics in its semantic processing. On the one hand, a highly structured rule-based semantic system is advantageous to the cognitive system for quickly retrieving the more conventional types of language, such as literal meanings and highly conventional metaphoric expressions. This systematic, constraining semantic relation may thus offer a processing advantage for the rule-based semantic system of the LH. On the other hand, when the semantic relations between words comprising a linguistic expression are distant and unusual, such as in novel metaphors, the rule-based semantic system of the LH may require a complementary neural system that is able to cope with the potential rule violations created by non-conventional semantic combinations (Faust, 2012). These two systems must cooperate in a balanced manner, to achieve semantic, including metaphorical, well-being (Siegel, 2010) and to avoid extreme conditions where one system is dominant. Such unbalanced conditions can result in extreme rigidity, leading to an autistic-like state, or extreme chaos, leading to a schizophrenic-like state (as reviewed above). Our notion of an interaction between a rule-based, more rigid, systemized linguistic LH system and a hyper-flexible, more chaotic, non-systemized linguistic RH system is supported by the fine-coarse model, as described above (Mirous and Beeman, 2012). However, what might be the general neurocognitive basis for such a sub-division?

We have recently proposed a general account for the relations between two hemispheric systems that may support the creative process in different modalities (Kenett et al., under review). We suggest that creativity is not confined to the RH, but that it is a product of a dual system interaction in a given cognitive domain: a specialized neurocognitive system responsible for conventional processing and a non-specialized neurocognitive system responsible for unconventional processing. The interaction between these two systems allows for effective processing of both conventional and unconventional stimuli and may thus support creativity. We investigated our theory in a cognitive task in which the RH is the specialized system, namely face processing, in order to generalize the findings from language research to the visual domain. Faces have been shown to be more typically processed by the RH (Yovel et al., 2008), thus allowing us to investigate our account. We show how conventional, natural faces are better processed by the specialized RH system, whereas unconventional faces are better processed by the non-specialized LH system (Kenett et al., under review). Furthermore, we show that only the processing of unconventional faces presented to the non-specialized LH system is significantly positively related to creative ability (see Aziz-Zadeh et al., 2013 for supporting neural evidence). Thus, a well-balanced interaction between specialized and non-specialized neurocognitive systems seems to be critical for the efficient processing of all types of stimuli and mainly for coping with the less conventional, creative aspects of reality.

Our theory and findings are supported by the growing body of research showing the importance of hemispheric communication for creativity (Razumnikova, 2007; Takeuchi et al., 2010; Zhao et al., 2014). Takeuchi et al. (2010) found a significant positive correlation between the size of the corpus callosum and creative ability. The authors interpret their findings as supporting the idea that creativity is associated with "efficient integration of information" through integrated white matter pathways. In a follow-up study, these authors conducted a resting state functional imaging study to investigate gray and white matter correlations with intelligence and creativity (Takeuchi et al., 2011). In regard to creativity, this research found a significant positive relation between white matter and creativity, further demonstrating the importance of white matter connectivity for creative ability. Recently, Zhao et al. (2014) conducted a functional connectivity study to examine hemispheric activation in verbal creativity. The authors report bilateral neural pathway activation with greater functional connectivity in the RH. It might be argued that this intra-hemispheric activation is required for the complex interplay between specialized and non-specialized systems in processing conventional and unconventional stimuli and even possibly conventional and unconventional features of a given stimulus.

From a network perspective, classical theory on creativity has directly related it to semantic (or associative) memory structure (Mednick, 1962; Runco and Jaeger, 2012). Mednick's theory of individual differences in creativity proposes that high creative persons are characterized by "flat" (broader) instead of "steep" (few) association hierarchies (Mednick, 1962). Schilling (2005) proposed a SWN theory of creative insight problem solving, suggesting that insight is achieved via restructuring of the semantic memory network. Rossman and Fink (2010) found that high creative persons give lower estimates of the distance between unrelated word pairs as compared to less creative persons, implying that high creative persons may have a wider, more interconnected semantic network than low creative persons. Finally, we have recently conducted an empirical network study directly investigating Mednick's notion of the difference between low and high creative persons (Kenett et al., 2014). We show how the semantic network of low creative persons is more rigid than that of high creative persons, thus providing empirical network support for Mednick's theory (but see Benedek and Neubauer, 2013 for an alternative view). Thus, individual differences in creative ability may be constrained by semantic memory structure, which is in accord with our proposed semantic continuum. In line with this notion, the more rigid the semantic memory state, the "less creative" it is, up to the point of a clinical state (persons with AS). Conversely, the more chaotic the semantic memory state, the "more creative" it is, up to the point of a clinical state (persons with schizophrenia).

In summary, semantic integration is crucial for semantic well-being and seems to be implemented by hemispheric communication between a specialized system and a non-specialized system. We propose that this explanation complements the fine-coarse semantic processing model and provides a comprehensive account for the contradicting role of the RH in metaphor processing (Faust, 2012; Rapp, 2012). Novel metaphor processing first requires sense retrieval of the conventional parts of the metaphor, followed by a process of sense creation which links together the remote parts of the novel metaphor into a new meaning (Bowdle and Gentner, 2005). Thus, activation of both hemispheres is required: each system contributes its unique processing and, via efficient and flexible intra-hemispheric communication, semantic integration is achieved. This flexible interaction dynamics between the specialized and non-specialized systems may result in the ability to cope with the full range of semantic processing, including novel metaphor comprehension. However, deficient intra-hemispheric communication can result in the extreme states of the semantic continuum. We propose that individual differences in the relation between the LH specialized and RH non-specialized linguistic systems are related to differences in lexicon organization across the semantic continuum, as expressed in the difference between low creative versus high creative persons.

### **CONCLUSIONS–THE WELL-BALANCED SEMANTIC BRAIN**

We began this paper by stating that language is complex. We propose that language is a complex semantic system with varying types that require a delicate balance between the more rigid and the more chaotic aspects of semantic processing, striving for integration. This semantic integration is achieved by hemispheric communication and structural and functional neurocognitive connectivity. We propose a semantic continuum which ranges from extremely rigid to extremely chaotic mental lexicon organization. We argue that such a continuum can explain different modes of semantic processing in clinical populations (such as persons with AS or schizophrenia) as well as individual differences in creative ability. We provide neural, behavioral and network science evidence, which converges on such a neurocognitive network continuum. Finally, we describe how different patterns of novel metaphor processing can be explained by such a continuum and how it reconciles the contradicting evidence found in regard to the role of the RH and LH in metaphor processing.

Application of graph theory in neurocognitive research in the past two decades has provided quantitative means to explore the structure and dynamics of brain networks at all levels. So far, network science has been mainly used to investigate neural structural and functional networks, but such application is also growing in the investigation of the cognitive domain (Baronchelli et al., 2013). Network research at the neural level has identified two key principles of neural networks: functional segregation and integration (Bullmore and Sporns, 2012). In this paper we suggest that these two principles are at the basis of the cognitive task of metaphor processing: while a functional segregation of hemispheric systems operates on complementary types of stimuli, only through semantic integration is efficient metaphor processing achieved. We suggest that analyzing the different mental lexicon states that result in various communication, language and thought conditions can greatly contribute to research on these conditions, by bringing together seemingly unrelated findings and providing a theoretical framework to which they can be related.

A major direction for future research is to relate the systemized LH and the less systemized, more flexible RH semantic systems to the mental lexicon network. Mainly, are there dual parallel lexicon networks which allow efficient systemized and flexible processing? Or is there rather a general mental lexicon network which is somehow represented at the whole brain and operated differently by the systemized LH and flexible RH systems? While the growing mass of evidence of hemispheric communication during semantic processing (also supported by the FCT) seems to support the latter, future converging network and neurocognitive research is required to further investigate the matter. Recently, Caeyenberghs and Leemans (2014) conducted a network based fiber tractography analysis in order to reconstruct the LH and RH structural networks. These authors show how the LH is significantly more structured than the RH, whereas the RH is more small-worlded than the LH (see Caeyenberghs and Leemans, 2014 for a full description). Thus, these findings provide further neural support for our theory. Further research is needed to relate hemispheric network properties and the cognitive mental lexicon. Finally, further network research is required to better quantify our proposed semantic continuum. Mainly, how can a balanced integrated semantic state be quantified in network terms? Another such direction is the application of network science to study lexical organization in chaotic conditions, such as persons suffering from schizophrenia. Such research, which is currently lacking, can further strengthen our proposed semantic continuum and shed a unique light on this clinical condition. Finally, as brain organization at all levels adheres to a network organization, network science should be used to extend and develop neurocognitive models and theories. 
Adding such a network layer to models and theories can help restructure current models and provide more general accounts and empirical predictions. The theory presented here is one such attempt.

### **ACKNOWLEDGMENTS**

We thank Dror Kenett for his helpful remarks on this manuscript. This research was supported by the Israel Science Foundation (ISF) grant (number 724/09) to Miriam Faust and partially supported by the I-CORE Program of the Planning and Budgeting Committee.

### **REFERENCES**


Milgram, S. (1967). The small world problem. *Psychology Today* 1, 62–67.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 February 2014; accepted: 24 June 2014; published online: 14 July 2014*. *Citation: Faust M and Kenett YN (2014) Rigidity, chaos and integration: hemispheric interaction and individual differences in metaphor comprehension. Front. Hum. Neurosci. 8:511. doi: 10.3389/fnhum.2014.00511*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Faust and Kenett. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# The role of the precuneus in metaphor comprehension: evidence from an fMRI study in people with schizophrenia and healthy participants

# **Nira Mashal<sup>1,2</sup>\*, Tali Vishne<sup>3</sup> and Nathaniel Laor<sup>3,4</sup>**

<sup>1</sup> School of Education, Bar-Ilan University, Ramat-Gan, Israel

<sup>2</sup> Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel

<sup>3</sup> Tel Aviv-Brull Community Mental Health Center, Tel Aviv, Israel

<sup>4</sup> Child Study Center, Yale University, New Haven, CT, USA

### **Edited by:**

Seana Coulson, University of California, San Diego, USA

### **Reviewed by:**

Thilo Van Eimeren, Christian-Albrechts University, Germany

Alexander Michael Rapp, University of Tuebingen, Germany

**\*Correspondence:**

Nira Mashal, School of Education, Bar-Ilan University, Ramat-Gan 5290002, Israel e-mail: mashaln@biu.ac.il

Comprehension of conventional and novel metaphors involves traditional language-related cortical regions as well as non-language related regions. While semantic processing is crucial for understanding metaphors, it is not sufficient. Recently the precuneus has been identified as a region that mediates complex and highly integrated tasks, including retrieval of episodic memory and mental imagery. Although the understanding of non-literal language is relatively easy for healthy individuals, people with schizophrenia exhibit deficits in this domain. The present study aims to examine whether people with schizophrenia differentially recruit the precuneus, extending to the superior parietal (SP) cortex (SPL), to support their deficit in metaphor comprehension. We also examine interregional associations between the precuneus/SPL and language-related brain regions. Twelve people with schizophrenia and twelve healthy controls were scanned while silently reading literal word pairs, conventional metaphors, and novel metaphors. People with schizophrenia showed reduced comprehension of both conventional and novel metaphors. Analysis of functional connectivity found that the correlations between activation in the left precuneus/SPL and activation in the left posterior superior temporal sulcus (PSTS) were significant for both literal word pairs and novel metaphors, and significant correlations were found between activation in the right precuneus/SPL and activation in the right PSTS for the three types of semantic relations. These results were found in the schizophrenia group alone. Furthermore, relative to controls, people with schizophrenia demonstrated increased activation in the right precuneus/SPL. Our results may suggest that individuals with schizophrenia use mental imagery to support comprehension of both literal and metaphoric language. 
In particular, our findings indicate over-integration of language and non-language brain regions during more effortful processes of novel metaphor comprehension.

**Keywords: schizophrenia, novel metaphors, precuneus, language, fMRI**

### **INTRODUCTION**

Patients with schizophrenia demonstrate pervasive deficits in processing different pragmatic aspects of language, and in particular they show impairments in understanding proverbs, irony, and metaphors (Rapp, 2009). Comprehension of figurative language relies on effortful cognitive processes in which the non-literal message of the utterance is extracted. People with schizophrenia tend to interpret proverbs literally, a phenomenon termed "concretism", and clinicians regard proverb interpretation as a potential tool in the diagnosis of schizophrenia (Reich, 1981). Some researchers have suggested that schizophrenia is associated with more general difficulties in abstract thinking (for a review see Thoma and Daum, 2006). The present study focuses on the challenges that people with schizophrenia experience when comprehending metaphors, especially novel metaphors.

Metaphors do not constitute a homogenous class of expressions but instead there is a continuum from idioms (dead metaphors) at one end to novel metaphors (live metaphors) at the other end (Fraser, 1998), with metaphors of different levels of conventionality in between. That is, some conventional metaphors had once been novel but due to repeated use have lost their metaphoricity. Metaphor comprehension is said to depend on level of conventionality (Glucksberg and Keysar, 1990; Giora, 1997; Giora and Fein, 1999; Bowdle and Gentner, 2005). The Career of Metaphor model (Bowdle and Gentner, 2005) argues that a newly created metaphor is comprehended via a comparison process, whereas a conventional metaphor is understood via a categorization process. Because the meanings of conventional metaphors are already stored in long term memory (i.e., they have been lexicalized), they are retrieved directly from the mental lexicon or via a previously created abstract metaphoric category. Unlike the comprehension of conventional metaphors, comprehension of novel metaphors involves an on-line effortful process of extraction and comparison of features.

Computing the metaphorical interpretation of an utterance relies on additional cognitive processes. Appraisal of the meaning of figurative language seems to be associated with the development of the ability to evoke mental images. Accordingly, school-aged children provide less sophisticated, less comprehensive, and more concrete mental images of idiom content than do adults (Nippold and Duthie, 2003). Behavioral evidence concerning the role of mental imagery in the comprehension and memory of idioms suggests that when people interpret idioms they construct a general mental image that is strongly constrained by conceptual mappings between base and target domains (for a review see Gibbs and O'Brien, 1990). For example, the mental image associated with *spill the beans* is derived from the conceptual mapping between the image of a mind as a container and the image of ideas as physical entities (Lakoff, 1987). This mapping evokes the image of taking ideas out of the physical container of the mind. According to Gibbs and O'Brien (1990), these images are unconscious, automatic, and independent of modality. With respect to this notion, Bottini et al. (1994) noted that the retrieval of information from episodic memory as well as mental imagery may be necessary to overcome the denotative violation inherent in metaphoric language.

It has been suggested that deficient language comprehension in schizophrenia is associated with right hemisphere involvement (e.g., Kircher et al., 2002; Mitchell and Crow, 2005; Bleich-Cohen et al., 2009; for a review see Rapp, 2009). According to Mitchell and Crow (2005), the abnormalities in language processing that are typical of schizophrenia reflect activation in right hemisphere homolog regions of key left hemisphere language regions. Furthermore, Mitchell and Crow (2005) argued that these functional changes indicate the loss or reversal of lateralized activation of brain regions associated with particular components of language processing. Although there is behavioral evidence of impairments in non-literal language comprehension in schizophrenia (de Bonis et al., 1997; Drury et al., 1998; for a review see Rapp, 2009; but see also Titone et al., 2002), only a few neuroimaging studies have tested metaphoric processing in this population (Kircher et al., 2007; Mashal et al., 2013).

If the right hemisphere is deficient in schizophrenia (Mitchell and Crow, 2005), and since there is some evidence suggesting that processing of novel metaphors involves the right hemisphere (Mashal et al., 2005, 2007; Schmidt et al., 2007; Pobric et al., 2008; Mashal and Faust, 2009, but see Rapp et al., 2004, 2007), it is especially intriguing to test novel metaphor processing in schizophrenia. Kircher et al. (2007) found disrupted brain activation during an implicit task of metaphor processing in people with schizophrenia. Participants silently read novel metaphoric sentences (e.g., *the lovers' words are harp sounds*) as well as matching literal sentences (e.g., *the lovers' words are lies*), and then decided whether the sentence had a positive or a negative connotation. People with schizophrenia demonstrated increased activation in the left inferior frontal gyrus (IFG) while processing novel metaphors, whereas healthy participants demonstrated stronger signal changes in the right superior/middle temporal gyrus. Interestingly, the severity of concretism, as rated with the Positive and Negative Syndrome Scale (PANSS), was negatively correlated with left IFG activation, suggesting that activation of this region contributes to concrete thinking in schizophrenia. In a recent fMRI study (Mashal et al., 2013), people with schizophrenia who were asked to silently read novel metaphors demonstrated increased activation in left middle frontal gyrus (MFG) relative to processing of meaningless word pairs. This pattern of activation differed from enhanced brain activation in the right IFG observed in healthy controls. Thus, reversed lateralization patterns were documented in schizophrenia. 
These results suggest that inefficient processing of novel metaphors in schizophrenia may involve compensatory recruitment of additional brain regions, such as the left MFG, a region known to be involved in executive functioning, and specifically in working memory (e.g., Braver et al., 1997; Jha and McCarthy, 2006). Furthermore, direct comparison between the people with schizophrenia and healthy adults on processing of literal expressions and novel metaphors relative to a baseline condition revealed greater activation in left precuneus in the schizophrenia group.

The precuneus has been studied extensively over the past decade as a central hub of the default mode network (DMN), which typically shows deactivation compared to rest for sensory motor tasks in healthy participants (e.g., Fransson, 2005; Cavanna and Trimble, 2006; Fransson and Marrelec, 2008; Margulies et al., 2009; Zhang and Li, 2012). It has been observed that tasks that demand much attention are associated with decreased activity in the DMN (e.g., Mazoyer et al., 2001). The precuneus is interconnected with both cortical and subcortical regions. It is specifically connected to parietal areas, including the inferior and superior parietal (SP) cortex and the intraparietal sulcus, which have been associated with processing of visuo-spatial information (Selemon and Goldman-Rakic, 1988). Tracer injection studies in non-human primates have shown that the extra-precuneus cortico-cortical connections include the supplementary motor cortex, dorsal premotor area, anterior cingulate, and language related areas such as the prefrontal cortex (BA 8, 9, and 46), as well as the posterior superior temporal sulcus (PSTS) (for a review see Cavanna and Trimble, 2006). These widespread connections with frontal and temporal regions suggest that the precuneus may be involved in a variety of highly integrated and associative behavioral functions. The precuneus has been linked with language related tasks at word (e.g., Kouider et al., 2010) and sentence level comprehension (Whitney et al., 2009). Reviewing 100 studies, Price (2010) concluded that the comparison of comprehensible and incomprehensible sentences is associated with activation in four core regions including the precuneus, the anterior and posterior parts of the left middle temporal gyrus (MTG), bilateral anterior temporal poles, and the left angular gyrus. 
These regions were also identified in a meta-analysis of 120 studies (Binder et al., 2009) that pointed out seven brain regions engaged in semantic processing, including the posterior cingulate extending to the precuneus.

The precuneus has also been linked to episodic memory retrieval (Shallice et al., 1994), processing of mental imagery (Hassabis et al., 2007; Johnson et al., 2007; Burgess, 2008), and visuo-spatial memory functions (Vincent et al., 2006; Epstein et al., 2007). Previous studies have found that the retrieval of contextual associations is related to activation in the posterior precuneus and left prefrontal cortex. Lundstrom et al. (2005) suggested that the posterior precuneus is activated during regeneration of previous contextual associations and that the left lateral inferior frontal cortex is engaged in explicit retrieval as well as in integration of the contextual associations. Thus, the precuneus (together with inferior frontal cortex) is implicated in the recollection of past experiences. According to Binder et al. (2009), the precuneus is involved primarily in encoding episodic memories but at the same time it is consistently activated in semantic tasks, as it stores meaningful experiences together with their related associations in order to guide future behavior.

Evidence regarding the role of the precuneus in metaphor comprehension is mixed. Data from an fMRI study with healthy participants showed prominent left precuneus activation when familiar metaphoric sentences were contrasted with literal sentences (Schmidt et al., 2010). Data from another fMRI study indicated that the right precuneus plays an important role in processing novel metaphors but not in processing familiar metaphors (Mashal et al., 2005), suggesting that it involves retrieval of information from long-term episodic memory or the use of mental imagery. This interpretation is in line with Lakoff and Johnson's (1980) idea that metaphoric language comprehension may depend on conceptualizations of personal experiences that are stored in episodic memory. Studies with patients with schizophrenia reported other findings. For instance, Kircher et al. (2007) found that literal sentences elicited greater activation in the left and right precuneus relative to metaphoric sentences. Our previous work documented increased activation in the left precuneus during processing of both literal expressions and novel metaphors in people with schizophrenia relative to healthy control participants (Mashal et al., 2013). This means that the precuneus appears to be involved not only in metaphor processing but also in processing of literal language. People with schizophrenia appear to recruit the precuneus but the exact role of the precuneus in language processing in this population remains unclear.

The aim of the present study is to define the role of the precuneus/SPL in processing of metaphors in schizophrenia by applying region-of-interest (ROI) analysis to bilateral precuneus/SPL and language regions. Furthermore, the focus is on precuneus/SPL activation and connectivity. We used a functional connectivity method that measures the interaction of one brain region with another. We thus measured the functional connectivity of the precuneus/SPL with language brain regions (IFG, PSTS) with which it is connected (Cavanna and Trimble, 2006). We also explored whether comprehension of conventional and novel metaphors is associated with signal change in the precuneus/SPL. We hypothesized that the precuneus/SPL would be more strongly activated when participants with schizophrenia processed literal language and novel metaphors relative to healthy participants. Furthermore, we expected to find a correlation between precuneus/SPL response and activation in language brain regions in schizophrenia that would attest to compensation for deficient metaphoric language processing. We also expected to find a positive correlation between signal change in the precuneus/SPL and comprehension of both conventional and novel metaphors.

# **METHOD**

# **PARTICIPANTS**

Twelve outpatients with schizophrenia (mean age = 28.08, *SD* = 4.34) and 12 healthy volunteers (mean age = 27.08, *SD* = 4.10) took part in this research. All participants were native Hebrew speakers and right handed according to self-report. The patient group included five women and had a mean of 12.3 years of formal education (*SD* = 1.3), and the control group included seven women and had a mean of 13.1 years of formal education (*SD* = 1.0). There were no statistically significant group differences in age (*t*(22) = 0.58, *ns*), gender (χ<sup>2</sup> = 0.67, df = 1, *ns*), or education (*t*(22) = 1.01, *ns*). Patients were recruited through the Tel Aviv Brull Community Mental Health Center, Israel. Two certified psychiatrists verified diagnoses according to the guidelines of the Structured Clinical Interview of the DSM-IV (SCID), Axis I, Patient Edition (First et al., 1994).
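The group-matching statistics for age and gender can be recomputed from the summary values given above. The sketch below assumes a pooled-variance two-sample *t*-test and an uncorrected Pearson chi-square; the text does not state which variants were used, so both choices (and the function names) are assumptions.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Age: patients (28.08, SD 4.34) vs. controls (27.08, SD 4.10), n = 12 each.
t_age, df_age = pooled_t(28.08, 4.34, 12, 27.08, 4.10, 12)

# Gender: rows are groups (patients, controls), columns are (women, men).
chi2_gender = chi_square_2x2([[5, 7], [7, 5]])

print(round(t_age, 2), df_age, round(chi2_gender, 2))  # 0.58 22 0.67
```

Under these assumptions the recomputed values agree with the reported *t*(22) = 0.58 for age and χ<sup>2</sup> = 0.67 for gender.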

Prior to the imaging session, patients were clinically assessed with the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987) by a clinically trained person. The total mean PANSS score was 58.83 (*SD* = 12.55), with a score of 11.75 (*SD* = 4.29) for positive symptoms, 17.00 (*SD* = 6.95) for negative symptoms, and 30.08 (*SD* = 5.53) for general symptoms. All patients were on stable doses of atypical antipsychotic medication (mean chlorpromazine equivalents = 440 mg/day). Participants received a full explanation of the nature of the study as well as its potential risks and benefits and then provided written informed consent. The study was approved by the Institutional Review Board of Tel Aviv Sourasky Medical Center.

In the present study we reanalyzed the data collected by Mashal et al. (2013) in which 14 people with schizophrenia and 14 healthy participants were scanned. Two participants in each group showed no significant activation in the precuneus/SPL and were thus excluded from the present study.

### **BEHAVIORAL TESTING**

Participants completed a multiple-choice metaphor comprehension questionnaire. The questionnaire included 30 word pairs: 10 conventional metaphors, 10 novel metaphors, and 10 meaningless expressions (Mashal et al., 2013). For each word pair, four interpretations were provided: a correct interpretation, a literal distracter, an unrelated interpretation, and a phrase saying: "this expression is meaningless". Participants were instructed to select the best response. The questionnaire was administered after the fMRI session.

### **fMRI EXPERIMENT**

Data collection was described in Mashal et al. (2013). Here we reanalyzed the data using an ROI analysis and a functional connectivity approach, both based on the extraction of individual time courses. To provide the reader with all necessary details, we describe all relevant experimental information from our previous paper.

### **STIMULI**

We selected 96 Hebrew word pairs that formed four types of semantic relations: literal (*birth weight*), conventional metaphors (*sealed lips*), novel metaphors (*pure hand*), or unrelated (*grain computer*). Several pretests were performed prior to the study. The aim of the first pretest was to determine whether each two-word expression was literal, metaphoric, or meaningless. Twenty healthy judges saw a list of expressions and were asked to decide whether each expression was literally plausible, metaphorically plausible, or unrelated. For each condition we selected expressions that were rated by at least 75% of the judges as literally or metaphorically plausible, or as meaningless. To distinguish between conventional and novel metaphors, another group of 10 judges saw a list of only the plausible metaphors from the first pretest. They were asked to rate the degree of familiarity of these expressions on a 5-point scale ranging from 1 (highly unfamiliar) to 5 (highly familiar). Expressions with a score higher than three were considered conventional (average rating 4.67), whereas expressions with a score lower than three on the familiarity scale were considered novel metaphors (average rating 1.98). The third pretest assessed subjective rating of word frequency. Thirty-one additional raters were asked to rate all words on a 5-point scale ranging from 1 (infrequent) to 5 (highly frequent). The average rating was 3.45 for literal expressions, 3.79 for conventional metaphors, 3.67 for novel metaphors, and 3.38 for unrelated word pairs.

### **EXPERIMENTAL TASK AND PROCEDURE**

The stimuli were presented in a block design fashion. Each block contained six word pairs in one of the experimental conditions. Each word pair was presented for 3000 ms followed by a 500 ms blank. The blocks were separated by either 6 s or 9 s, in which participants viewed a fixation point on a gray background (baseline). Each experimental condition appeared four times (with a total of 16 blocks) during each scan session. Each block contained one distracter, so that within a block of literal word pairs (or conventional or novel metaphors) there was one expression that was meaningless, and within the block of unrelated word pairs appeared one metaphoric expression. The first 18 s of the scan were excluded to allow for T2<sup>∗</sup> equilibration effects.

Participants were asked to silently read each word pair and decide whether the word pair made sense. Prior to the fMRI scan the task was practiced with stimuli that were not used in the experiment.

### **IMAGE ACQUISITION**

Imaging measurements were acquired through a 3T GE scanner (GE, Milwaukee, WI, USA). All images were acquired using a standard quadrature head coil. The scanning session included anatomical and functional imaging. A 3D spoiled gradient echo (SPGR) sequence with high resolution (a slice thickness of 1 mm) was acquired for each person, in order to allow volumetric statistical analyses of the functional signal change and to facilitate later coordinate determinations. The functional T2<sup>∗</sup> weighted images were acquired using gradient echo planar imaging pulse sequence (TR/TE/flip angle = 3000/35/90) with FOV of 200 × 200 mm<sup>2</sup> , and acquisition matrix dimensions of 96 × 96. Thirty-nine contiguous axial slices with 3.0 mm thickness and 0 mm gap were prescribed over the entire brain, resulting in a total of 159 volumes (6201 images).
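The acquisition figures above can be cross-checked with simple arithmetic. The sketch below uses only values stated in the text; the variable names are illustrative, and the in-plane voxel size is a derived quantity that the text does not report directly.

```python
TR_S = 3.0          # repetition time in seconds (TR = 3000 ms)
N_SLICES = 39       # axial slices per volume
N_VOLUMES = 159     # functional volumes per scan
FOV_MM = 200.0      # field of view along one in-plane axis
MATRIX = 96         # acquisition matrix size along that axis
DISCARDED_S = 18.0  # initial period excluded for T2* equilibration

total_images = N_SLICES * N_VOLUMES          # slices x volumes
scan_duration_s = N_VOLUMES * TR_S           # total functional scan time
in_plane_voxel_mm = FOV_MM / MATRIX          # derived in-plane resolution
discarded_volumes = int(DISCARDED_S / TR_S)  # volumes dropped at scan start

print(total_images, scan_duration_s, round(in_plane_voxel_mm, 2), discarded_volumes)
# prints: 6201 477.0 2.08 6
```

The 6201 images match the reported total, and the 18 s excluded at the start of the scan correspond to six volumes at a TR of 3 s.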

### **IMAGING DATA ANALYSIS**

The fMRI data were processed through BrainVoyager software (Version 4.9; Brain Innovation, Maastricht, The Netherlands). Prior to statistical tests, motion correction, high frequency temporal filtering (0.006 Hz), and drift correction were applied to the raw data; no head movement > 1.5 mm was observed in any participant. Pre-processed functional images were incorporated into the 3D datasets through tri-linear interpolation. Images were smoothed with a 6-mm full-width at half-maximum (FWHM) Gaussian filter. The complete dataset was transformed into Talairach space (Talairach and Tournoux, 1988). To allow for T2<sup>∗</sup> equilibrium effects, the first six images of each functional scan were excluded.

### **ROIs ANALYSES**

Our ROIs were defined anatomically and functionally. Specific effects were studied in the left and right precuneus extending laterally to the superior parietal lobule (SPL) and in pre-determined regions that are part of the language network: the left and right IFG, the left MFG, and the left and right PSTS. Anatomic definition of ROIs was based on sulci and gyri. The precuneus/SPL (BA 7) is limited anteriorly by the cingulate sulcus, posteriorly by the medial portion of the parieto-occipital fissure, and inferiorly by the subparietal sulcus and the intraparietal sulcus. The IFG ROIs (left and right) comprised the pars triangularis (BA 45/46), and the PSTS ROIs (left and right) comprised the area at or near the PSTS between the superior temporal gyrus and the MTG (BA 22). Our ROIs were also functionally selected by calculating three-dimensional statistical parametric maps, separately for each participant, using a general linear model in which all three meaningful experimental conditions (literal expressions, conventional metaphors, novel metaphors) were positive predictors, and resting state was a negative predictor, with an expected lag of 6 s (accounting for the hemodynamic response delay). Thus, for each participant, task-related activity within the pre-determined regions was identified by convolving the boxcar function with a hemodynamic response function (HRF). **Table 1** presents the average Talairach coordinates of each ROI in each group.
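The construction of a task predictor by convolving a boxcar with an HRF can be illustrated as follows. This is a minimal sketch and not the BrainVoyager implementation: the single-gamma HRF, its parameters, and the block onsets are assumptions chosen for illustration. The block length of seven volumes follows from six word pairs of 3.5 s each at a TR of 3 s.

```python
import math

TR_S = 3.0  # repetition time taken from the acquisition parameters

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma HRF (assumed shape); peaks at (shape - 1) * scale = 5 s."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def boxcar(n_volumes, onsets, duration):
    """1 during task blocks and 0 during fixation; onsets and duration in volumes."""
    box = [0.0] * n_volumes
    for onset in onsets:
        for i in range(onset, min(onset + duration, n_volumes)):
            box[i] = 1.0
    return box

def convolve_with_hrf(box, tr=TR_S, hrf_length_s=30.0):
    """Discrete convolution of the boxcar with a unit-sum HRF kernel sampled at the TR."""
    kernel = [gamma_hrf(k * tr) for k in range(int(hrf_length_s / tr) + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    predictor = []
    for i in range(len(box)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if i - k >= 0:
                acc += box[i - k] * w
        predictor.append(acc)
    return predictor

# Hypothetical timing: two 21-s blocks (7 volumes each) separated by fixation.
box = boxcar(n_volumes=30, onsets=[3, 13], duration=7)
predictor = convolve_with_hrf(box)
```

The resulting predictor rises gradually after each block onset and decays after block offset, which is the delayed response that the general linear model then fits to each voxel's time course.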

**Table 1 | Mean Talairach coordinates of activation clusters in regions of interest (ROIs)**.

\* healthy participants only; \*\* patients only; SP = superior parietal; IFG = inferior frontal gyrus; PSTS = posterior superior temporal sulcus; MFG = middle frontal gyrus.

Time courses of statistically significant voxels were collected in each of the ROIs for each person. Individual averaged MR signals were calculated from all epochs (blocks) of the same condition per activated ROI. Signals were then transformed into percent signal change (PSC) relative to baseline. For all analyses involving the fMRI signal extracted from the ROIs, cluster size involved at least 50 voxels, and the significance threshold was set at *p* < 0.01, uncorrected. Significance tests were thus performed on the average PSC obtained within the cluster of all ROIs, as determined for each condition. Because we examined seven predefined ROIs, we set a more conservative threshold of *p* = 0.007 (calculated as 0.05/7) to account for multiple comparisons. The statistical analyses were conducted with STATISTICA software (version 5).
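The PSC transformation and the corrected threshold can be sketched in a few lines. The signal values below are hypothetical, and the exact baseline definition used in the original analysis is not specified beyond "relative to baseline", so averaging the fixation samples is an assumption.

```python
def percent_signal_change(block_signal, baseline_signal):
    """Mean signal of a block expressed as percent change from the mean baseline."""
    baseline = sum(baseline_signal) / len(baseline_signal)
    task = sum(block_signal) / len(block_signal)
    return 100.0 * (task - baseline) / baseline

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

# Hypothetical raw MR signal values (arbitrary units) from one ROI.
baseline = [1000.0, 1002.0, 998.0]
block = [1010.0, 1012.0, 1008.0, 1010.0]

psc = percent_signal_change(block, baseline)
alpha_roi = bonferroni_alpha(0.05, 7)  # seven predefined ROIs

print(psc, round(alpha_roi, 3))  # prints: 1.0 0.007
```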

# **FUNCTIONAL CONNECTIVITY ANALYSIS**

Functional connectivity analyses were performed by computing pair-wise correlations between activation in the precuneus/SPL and activation in language regions (PSTS, IFG). For each participant, fMRI time series (one for each ROI) were averaged separately across voxels within these ROIs for each type of semantic relation (literal word pairs, conventional metaphors, and novel metaphors). Pair-wise Pearson correlation coefficients were computed between each pair of regions (left precuneus/SPL-left PSTS, right precuneus/SPL-right PSTS, left precuneus/SPL-left IFG), using the averaged time series across participants (for each group and condition) during task performance (excluding the between-blocks intervals). Next, we standardized these signals by subtracting the mean activation from them and dividing by the SD, highlighting the specific condition fluctuations (see also Ionta et al., 2014). The significance of the correlations was evaluated through a random permutation test (for similar bootstrapping analysis see Arzouan et al., 2007). In this test, Pearson correlation coefficients are calculated from 5,000 random permutations of the averaged time courses, and are then used to construct the distribution and test the significance of the original correlation value. Additional correction was used to compensate for the multiple comparisons (2 groups × 3 semantic relations × 3 pairs of regions), resulting in a conservative threshold of *p* = 0.002 (calculated as 0.05/18).
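The correlation and permutation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the original analysis code: the time courses are hypothetical, the test is taken to be two-sided, and shuffling individual time points ignores the temporal autocorrelation of fMRI signals, which a full analysis might need to address.

```python
import math
import random

def zscore(series):
    """Standardize a series: subtract its mean and divide by its sample SD."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / (len(series) - 1))
    return [(x - mean) / sd for x in series]

def pearson_r(x, y):
    """Pearson correlation computed from the standardized series."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p value: share of shuffles whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)

# Hypothetical group-averaged time courses for two ROIs in one condition.
roi_a = [0.1, 0.4, 0.8, 1.0, 0.9, 0.5, 0.2, 0.0, 0.3, 0.7, 1.1, 0.9]
roi_b = [0.2, 0.5, 0.7, 1.1, 0.8, 0.6, 0.1, 0.1, 0.4, 0.6, 1.0, 1.0]

r = pearson_r(roi_a, roi_b)
p = permutation_p_value(roi_a, roi_b)
significant = p < 0.05 / 18  # corrected for 2 groups x 3 relations x 3 region pairs
```

With 5,000 permutations the smallest attainable *p* value in this sketch is 1/5001, which is below the corrected threshold, so the permutation count is sufficient to detect effects at that level.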

# **The relation between metaphor comprehension and precuneus/SPL activation**

Next, we evaluated the correlation between behavioral scores on the metaphor comprehension questionnaire and precuneus/SPL activation. We thus calculated Pearson correlations between the PSC elicited by each metaphoric condition (conventional and novel metaphors) and the scores obtained in the metaphor questionnaire, separately for each group. Then, we tested the significance of these correlations with a random permutation test (Arzouan et al., 2007) that generated 5,000 random permutations for each condition. This method relies on minimal assumptions and can be applied when the assumptions of a parametric approach are untenable (Nichols and Holmes, 2002). The 5,000 permutations were used to construct the distribution, and test the significance of the original correlation value with a *p* value of 0.006 corrected for multiple comparisons (0.05/8 comparisons = 2 groups × 2 semantic relations × 2 pairs of regions).

### **RESULTS**

### **BEHAVIORAL RESULTS**

*Metaphoric questionnaire*: People with schizophrenia understood fewer conventional metaphors (mean = 81.25%, *SD* = 18.07) than did healthy individuals (mean = 97.92%, *SD* = 4.8), *t*(22) = 3.08, *p* < 0.01, and fewer novel metaphors (mean = 68.73% correct, *SD* = 17.10) than did healthy individuals (mean = 88.96%, *SD* = 11.82), *t*(22) = 3.42, *p* < 0.01. No significant group difference was found in comprehension of meaningless word pairs (*p* > 0.05). **Figure 1** presents questionnaire responses by type of expression and group.

### **ROI ANALYSIS**

Average PSC was analyzed in each of the ROIs by a two-way repeated measures ANOVA in regions showing significant activation by both groups or by a one-way repeated measures ANOVA in regions in which there was significant activation in only one group (see **Table 1**).

A two-way repeated measures ANOVA for signal change within the right precuneus/SPL, with the two groups (schizophrenia, healthy) as a between-subject factor and expression type (literal, conventional, novel) as a within-subject factor, revealed a main effect of group, *F*(1,22) = 9.29, *p* = 0.006. A Scheffe *post hoc* analysis revealed greater signal change in the schizophrenia group than in the healthy group, *p* < 0.01. The main effect of expression type was also significant, *F*(2,44) = 8.74, *p* = 0.006. A Scheffe *post hoc* analysis indicated that literal expressions led to greater signal change than did both conventional metaphors, *p* < 0.01, and novel metaphors, *p* < 0.05. However, the group × expression type interaction was not significant, *F*(2,44) = 3.55, *p* = 0.007 (see **Figure 2**). A two-way repeated measures ANOVA for signal change within the left precuneus/SPL, with the two groups (schizophrenia, healthy) as a between-subject factor and expression type (literal, conventional, novel) as a within-subject factor, revealed no significant effects (*p*s > 0.007).

Percent signal change in the right and left PSTS and the right and left IFG are presented in **Figure 3**. A two-way repeated measures ANOVA for signal change within the right PSTS with the two groups (schizophrenia, healthy) as a between-subject factor and expression type (literal, conventional, novel metaphors) as a within-subject factor, revealed a main effect of expression type, *F*(2,44) = 6.27, *p* = 0.004. A Scheffe *post hoc* analysis indicated that literal expressions led to greater signal change than did novel metaphors, *p* < 0.01 (**Figure 3**). No other effects reached significance (*p* > 0.007). A two-way repeated measures ANOVA for signal change within the left PSTS revealed no significant main effects (*p* > 0.007).

Significant activation in the right IFG was seen in healthy participants alone and therefore a one-way repeated measures ANOVA was performed on signal change in this location, with expression type (literal, conventional, novel) as a within-subject factor. This analysis revealed a significant main effect of expression type, *F*(2,22) = 10.01, *p* = 0.0008. Signal change for conventional metaphors was significantly weaker than was signal change for novel metaphors, *p* < 0.05, and significantly weaker than was signal change for literal expressions, *p* < 0.01. A two-way repeated measures ANOVA for signal change within the left IFG, with group as a between-subject factor and expression type as a within-subject factor, revealed a significant interaction, *F*(2,44) = 8.85, *p* = 0.0006. A Scheffe *post hoc* analysis showed that literal expressions led to greater signal change in healthy participants than they did in people with schizophrenia, *p* < 0.05. All other effects were not significant (*p* > 0.007).

Finally, because only people with schizophrenia showed significant activation in the left MFG, a one-way ANOVA was performed on signal change in this location, with expression type as a within-subject factor (literal, conventional, novel). No significant main effect of expression type was found (*p* > 0.007) (see **Figure 4**).

### **FUNCTIONAL CONNECTIVITY ANALYSIS**

To determine connectivity patterns we calculated pair-wise Pearson correlations between activation in the precuneus/SPL and activation in language regions for each expression type, separately for each group (see **Table 2**).

*People with schizophrenia*: A permutation test analysis showed a significant correlation between activation in the right precuneus/SPL and activation in the right PSTS for literal word pairs, *p* < 0.001, conventional metaphors, *p* < 0.0001, and novel metaphors, *p* < 0.0001. The correlations between activation in the left precuneus/SPL and activation in the left PSTS were significant for both literal word pairs, *p* < 0.0001, and novel metaphors, *p* < 0.00001. There was also a significant correlation between activation in the left precuneus/SPL and activation in the left IFG, but only for novel metaphors, *p* < 0.0001. These results point to coupling between the left precuneus/SPL and left-hemisphere language regions during processing of both literal word pairs and novel metaphors, and between the right precuneus/SPL and the right PSTS during processing of all semantic relations.

*Healthy group*: No significant correlation between precuneus/SPL activation and activation in the other ROIs was observed within the control group.

### **CORRELATIONS BETWEEN PERFORMANCE ON THE METAPHOR QUESTIONNAIRE AND ACTIVATION IN THE PRECUNEUS/SPL**

To further examine whether metaphor comprehension is related to precuneus/SPL activation, we calculated the correlation between scores on the metaphor questionnaire and the BOLD signal recorded within the left and the right precuneus/SPL.

**Table 2 | Pair-wise Pearson correlations between activation in precuneus/SPL and activation in pre-determined ROIs, by expression type and group**.

\* = statistically significant association at the α = 0.0001 level using permutation test.

*People with schizophrenia*: Using the permutation test, the only correlation that was found to be significant was the correlation between the comprehension of novel metaphors and BOLD signal in the right precuneus/SPL, *r* = 0.83, *p* < 0.001. Thus, the more correct responses that were given on the questionnaire, the stronger was the BOLD signal within the right precuneus/SPL.

*Healthy participants*: No significant correlations between questionnaire score and precuneus/SPL activation were found in the control group.

### **DISCUSSION**

The purpose of the present study was to examine the role of the precuneus/SPL in metaphor comprehension in schizophrenia. Three main findings emerged: (1) people with schizophrenia showed greater activation in the right precuneus/SPL relative to healthy participants; (2) within the schizophrenia group BOLD signal in the left precuneus/SPL and in the left PSTS correlated positively during comprehension of both literal word pairs and novel metaphors. There was also a positive correlation between activation in the right precuneus/SPL and in the right PSTS in all semantic relations. In addition, the left precuneus/SPL was co-activated with the left IFG during novel metaphor processing. No equivalent correlations with activation in the precuneus/SPL were found in the healthy group; and (3) within the schizophrenia group comprehension of novel metaphors, as measured by an offline questionnaire, was correlated with increased activation in the right precuneus/SPL.

The behavioral results showed that people with schizophrenia understood fewer metaphors than did healthy participants. This reduced accuracy is consistent with previous evidence of difficulties in metaphor comprehension in schizophrenia (e.g., Iakimova et al., 2005; Kircher et al., 2007), and is associated with an abnormal pattern of brain activation in schizophrenia (Kircher et al., 2007; Mashal et al., 2013). The present study suggests that the right precuneus/SPL is involved in processing linguistic expressions in schizophrenia, and in particular in understanding novel metaphors. People with schizophrenia may recruit this right posterior parietal region to compensate for their deficient metaphor comprehension. It is also possible that metaphor comprehension is deficient in schizophrenia because this area is recruited. However, the current study cannot determine which explanation is correct. Our results also show that increased novel metaphor comprehension (as assessed by the offline questionnaire) was correlated with increased activation in the right precuneus/SPL, consistent with previous views about the central role of the right hemisphere in metaphor comprehension (Bottini et al., 1994; Giora, 1997, 2003; Mashal et al., 2005, 2007).

The precuneus/SPL has been linked to both linguistic and cognitive processes. According to recent meta-analyses, the precuneus is part of the brain networks associated with semantic processing (Binder et al., 2009; Price, 2010). Our findings point to increased activation in the right precuneus/SPL in schizophrenia as compared to controls. It is possible that this increased activation reflects the process of linking two words into a meaningful expression. However, because processing novel metaphors is demanding, requiring the extraction of relevant features of two disparate domains (Bowdle and Gentner, 2005), greater activation is expected when we compare novel metaphors to literal expressions. Nevertheless, the results of the ROI analysis documented similar signal change across different semantic relations. Hence, it is less likely that precuneus/SPL activation reflects semantic processing in general (Binder et al., 2009). Following Lakoff and Johnson (1980), we assume that people construct mental images in order to use and understand not only figurative language but also literal language. The way in which people construct these mental images differs between the two types of expressions, though. While the mental images invoked by figurative language are constrained by conceptual mappings between the base and target domains, the mental images invoked by literal language are based on the understanding of basic level prototypes. Thus, it is possible that people with schizophrenia, unlike healthy participants, use the right precuneus/SPL to form mental images for both literal and figurative language. Alternatively, it is possible that people with schizophrenia engage in retrieval of personal experiences from episodic memory (Lakoff and Johnson, 1980).

The low activation observed in the right precuneus/SPL within the healthy group may be related to the observation that the precuneus is part of the DMN (e.g., Fransson, 2005; Cavanna and Trimble, 2006; Fransson and Marrelec, 2008; Margulies et al., 2009; Zhang and Li, 2012). Indeed, there is evidence suggesting that the precuneus is normally less activated during attention-demanding tasks (Cabeza and Nyberg, 2000). It is therefore possible that the pattern of right precuneus/SPL activation in the control group reflects reliance on attentional resources during metaphor processing. It is also possible that the increase in activation in the schizophrenia group is due to abnormalities in the resting state network. Bluhm et al. (2007) reported altered spontaneous fMRI signal fluctuations in the precuneus/posterior cingulate cortex in schizophrenia during resting state. Thus, our results may suggest that whereas healthy participants activate the right precuneus/SPL in accordance with its role as a central hub in the DMN, people with schizophrenia fail to use these attentional resources.

Functional connectivity analyses allowed us to detect associations between neural regions that conventional activation-based analyses cannot address. An important finding of our study is the strong functional connectivity between the left precuneus/SPL and the left PSTS during comprehension of literal word pairs and novel metaphors, as well as the strong connectivity between the right precuneus/SPL and the right PSTS during processing of all semantic relations. These findings suggest that the interactions between the precuneus/SPL and the posterior language area, PSTS, may serve to mediate metaphor comprehension in schizophrenia. The fact that the left precuneus/SPL and the left PSTS, to which the precuneus has anatomical connections (Cavanna and Trimble, 2006), were correlated during literal language comprehension as well indicates that people with schizophrenia may automatically activate mental images in response to both literal and metaphoric expressions (Lakoff and Johnson, 1980). The mental images may then be transformed to auditory representations in the left PSTS to enhance comprehension. In addition, the fact that the left precuneus/SPL was co-activated with the left IFG during novel metaphor comprehension suggests that people with schizophrenia use this brain region in collaboration with the IFG to facilitate novel metaphor comprehension. As argued by Lundstrom et al. (2005), this co-activation may reflect reliance on previous contextual associations which are processed in the precuneus/SPL and their integration in the left IFG. Thus, our results may explain some of the inconsistency in previous fMRI studies in which both left and right precuneus involvement was seen in processing of both literal language and metaphors (e.g., Kircher et al., 2007; Schmidt et al., 2010; Mashal et al., 2013).
We suggest that people with schizophrenia, but not healthy participants, use the bilateral precuneus/SPL in collaboration with language areas to facilitate both literal and novel metaphor comprehension.

The current ROI analysis revealed abnormal patterns of signal change in people with schizophrenia. Whereas the right IFG was activated in the healthy group, no such activation was recorded in the schizophrenia group. Thus, consistent with the right hemisphere hypothesis (Mitchell and Crow, 2005), lateralization patterns were different in this group. Interestingly, the ROI analysis found greater activation for processing literal expressions as well as novel metaphors relative to conventional metaphors. This finding indicates that the right lateralized activation observed in healthy individuals was not limited to the interpretation of figurative language but included literal language as well. Unlike the group difference that was documented in the right IFG, both groups activated the left IFG. The ROI analysis demonstrated that the healthy group had greater activation in the left IFG while processing literal word pairs than did the schizophrenia group. Thus, whereas both groups activate the left IFG during metaphor processing to the same extent, the patients show deficient activation of the right IFG.

There are some limitations to our study. First, a larger sample size would have strengthened our conclusions. Given that evidence from different analyses converges in showing involvement of the right precuneus/SPL in novel metaphor processing in schizophrenia, we believe the results will be replicated with a larger group of patients. However, a larger sample size of healthy participants is required to test whether the lack of significant connectivity seen in this group stems from the small sample of healthy participants. Second, we performed the analyses on a subgroup of 24 participants (out of 28 in the original study) who showed significant activation within selected ROIs. It is thus possible that the activation pattern seen here is not universal. We note that the exact role of the precuneus/SPL in language processing is still not entirely clear. However, if the precuneus/SPL activates mental images in response to the current task then we expect to see activation in this area during performance of tasks that explicitly tap into the mental visualization of linguistic expressions. Furthermore, because the expressions used in the current study form a continuum in terms of literality and abstractness (Laor, 1990), people with schizophrenia evoke different mental images on that continuum. Future studies are needed to shed more light on the type of mental images processed by the precuneus. Finally, we did not control for medication effects. Although there is evidence that atypical antipsychotic medication enhances cognitive performance (e.g., Sumiyoshi et al., 2001) and especially attention and verbal fluency (Meltzer and McGurk, 1999), the effects of medication on metaphor processing remain unclear.

In summary, our results shed light on precuneus/SPL involvement in metaphor comprehension in people with schizophrenia. The inefficient processing of metaphors in schizophrenia is related to increased activation in the right precuneus/SPL. It appears that people with schizophrenia recruit the right precuneus/SPL to facilitate novel metaphor comprehension, probably because they rely more on mental imagery and episodic retrieval. Furthermore, people with schizophrenia seem to recruit the bilateral precuneus/SPL while processing novel metaphors, as observed by the co-activation of these regions and both language areas. In contrast, healthy participants seem to rely on the bilateral IFG to process literal expressions and the right IFG to facilitate novel metaphor comprehension. Our results also indicate that the precuneus/SPL contributes to comprehension of literal expressions in schizophrenia, as manifested by tight coupling between the precuneus/SPL and the PSTS during literal language processing.

# **ACKNOWLEDGMENTS**

Support for this study was provided by a NARSAD Young Investigator Award from the Brain and Behavior Research Foundation given to the first author. We thank Michael and Barbara Bass for their generous support.

# **REFERENCES**


schizophrenic patients: anomalies in the default network. *Schizophr. Bull.* 33, 1004–1012. doi: 10.1093/schbul/sbm052


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 January 2014; accepted: 24 September 2014; published online: 16 October 2014*.

*Citation: Mashal N, Vishne T and Laor N (2014) The role of the precuneus in metaphor comprehension: evidence from an fMRI study in people with schizophrenia and healthy participants. Front. Hum. Neurosci. 8:818. doi: 10.3389/fnhum.2014.00818 This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Mashal, Vishne and Laor. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# Is the comprehension of idiomatic sentences indeed impaired in paranoid Schizophrenia? A window into semantic processing deficits

#### *Francesca Pesciarelli<sup>1</sup>\*, Tania Gamberoni<sup>2,3</sup>, Fabio Ferlazzo<sup>4</sup>, Leo Lo Russo<sup>3</sup>, Francesca Pedrazzi<sup>5</sup>, Ermanno Melati<sup>5</sup> and Cristina Cacciari<sup>1</sup>*

*<sup>1</sup> Department of Biomedical, Metabolic, and Neurological Sciences, University of Modena, Modena, Italy*

*<sup>2</sup> Centro Salute Mentale Pavullo, Modena, Italy*

*<sup>3</sup> Villa Igea Private Hospital, Modena, Italy*

*<sup>4</sup> Department of Psychology, Sapienza University of Rome, Rome, Italy*

*<sup>5</sup> Centro Salute Mentale Polo Ovest Modena, Modena, Italy*

#### *Edited by:*

*Seana Coulson, University of California at San Diego, USA*

#### *Reviewed by:*

*Nathaniel Delaney-Busch, Tufts University, USA*

*Michael Kiang, University of Toronto, Canada*

#### *\*Correspondence:*

*Francesca Pesciarelli, Department of Biomedical, Metabolic, and Neurological Sciences, University of Modena, Via Campi 287, Modena, 41100, Italy e-mail: francesca.pesciarelli@unimore.it*

Schizophrenia patients have been reported to be more impaired in comprehending non-literal than literal language since early studies on proverbs. Preference for literal rather than figurative interpretations continues to be documented. The main aim of this study was to establish whether patients are indeed able to use combinatorial semantic processing to comprehend literal sentences, and both combinatorial analysis and retrieval of pre-stored meanings to comprehend idiomatic sentences. To investigate idiomatic and literal sentence comprehension in patients with paranoid schizophrenia, the study employed a sentence continuation task in which subjects were asked to decide whether a target word was a sensible continuation of a previous sentence fragment. Patients and healthy controls were faster in accepting sensible continuations than in rejecting non-sensible ones in both literal and idiomatic sentences. Patients were as accurate as controls in comprehending literal and idiomatic sentences, but they were overall slower than controls in all conditions. Once the contribution of cognitive covariates was partialled out, the response times (RTs) to sensible idiomatic continuations of patients did not significantly differ from those of controls. This suggests that the state of residual schizophrenia did not contribute to slower processing of sensible idioms above and beyond the cognitive deficits that are typically associated with schizophrenia.

**Keywords: paranoid schizophrenia, language comprehension, idioms, predictability, multiword units**

### **INTRODUCTION**

Schizophrenia (SZ) is a chronic, debilitating illness characterized by perturbations in cognition, affect, and behavior (DSM-V; American Psychiatric Association, 2013). Most SZ patients have substantial cognitive impairments, compared to overall normative standards and to premorbid functioning, often including language, together with executive function, memory, and attention (for overviews, see Kuperberg and Heckers, 2000; Gold et al., 2009; Harvey, 2010; Barch and Ceaser, 2012; Fisher et al., 2013). SZ has been associated with widespread abnormality of a network of brain areas (e.g., a reversed laterality of activation in the superior temporal gyrus, morphological asymmetries in the superior temporal lobe, structural abnormalities of the ventral parts of the prefrontal cortex) that include the frontal and temporal cortex, the hippocampus, and subcortical regions (for overviews, see Kuperberg and Heckers, 2000; Mitchell and Crow, 2005). The brain areas with abnormal activation or morphology partially overlap with the areas necessary for language comprehension, and specifically for non-literal language comprehension (for overviews, see Thoma and Daum, 2006; Romero Lauro et al., 2008; Cacciari and Papagno, 2012). This brain dysfunction in SZ has been thought to underlie the clinical symptom of concretism (i.e., difficulty in interpreting abstract, non-literal language) that leads to impaired comprehension of non-literal complex structures (Kircher et al., 2007; Schettino et al., 2010; Mashal et al., 2013).

A vast literature on SZ patients has documented semantic processing impairments at single word and sentence levels (for overviews, see Condray et al., 2002; Kiang and Kutas, 2005; Pomarol-Clotet et al., 2008; Kuperberg, 2010a,b). At a word level, a wealth of behavioral and EEG studies compared semantic priming<sup>1</sup> effects in SZ patients and healthy controls, obtaining divergent results (for overviews, see Minzenberg et al., 2002; Pomarol-Clotet et al., 2008; Kuperberg, 2010a,b; Mathalon et al., 2010; Wang et al., 2011). Studies found an association between SZ and increased spread of activation to weak associates instead of, or in addition to, strong associates at short stimulus onset asynchronies (SOAs, i.e., the interval between the onset of prime and target presentations; less than 300 ms). This hyper-priming effect was often accompanied by reduced or absent priming at long SOAs (more than 300 ms). The exact interpretation of different semantic priming effects at short and long lags is still disputed. For instance, according to the *Activation-Maintenance model* (Salisbury, 2004, 2008) disinhibition within semantic memory leads to the initial large automatic spread of activation in the mental lexicon that would be responsible for the hyper-priming effect often found at short SOAs. Activation would then decay as a function of bottom-up semantic memory trace dissipation (Neely, 1991) coupled with impairment in long-term top-down verbal working memory maintenance. Deficits in maintenance and use of contextual information would lead to impaired semantic priming at long SOAs. In sum, semantic dysfunction in schizophrenia would result from automatic over-activation in semantic networks at short lags and dysfunction in late, controlled processes of context use at long lags (Niznikiewicz et al., 2010). In fact, insensitivity to contextual information is thought to be one of the hallmarks of the linguistic behavior of SZ patients (e.g., Niznikiewicz et al., 1997; Kuperberg et al., 1998; Cohen et al., 1999; Titone et al., 2002). Failure in using contextual information may reflect a more general inability of patients to construct and maintain an internal representation of context for control of action (Cohen and Servan-Schreiber, 1992). This has been correlated with deficits in maintaining context in working memory (e.g., Cohen et al., 1999; Barch et al., 1996). Patients may fail to efficiently use contextual information also because of their inability to identify and encode contextually relevant information (Chapman et al., 1976). However, Titone et al. (2000, 2002) documented that, under specific circumstances, SZ patients may activate contextually relevant information but may fail in inhibiting contextually irrelevant information, especially at long SOAs (Minzenberg et al., 2002), because of a general deficit in controlled semantic processing.

<sup>1</sup>Semantic priming occurs whenever there is more efficient processing of a target word when preceded by a related stimulus or context.

At a sentence level, processing deficits in SZ patients appeared in different forms including syntactic, semantic and pragmatic aspects (e.g., Kuperberg et al., 1998, 2006; Ditman and Kuperberg, 2007, 2010). For instance, it has been shown that SZ patients are relatively insensitive to semantic anomalies presumably because of impairment in building up context during online language processing (Ditman and Kuperberg, 2007). At least some of the sentence-level comprehension abnormalities observed in SZ patients were thought to arise (Kuperberg, 2007, 2010b) from *an imbalance in activity between semantic-memory based and combinatorial mechanisms: unlike healthy controls, patients may fail to engage in combinatorial processing; interpretation (and possibly production) may therefore be primarily driven by semantic memory-based processes* (Kuperberg, 2010b, p. 597). In sum, in SZ these two streams of analysis would fail to cooperate and interact to produce the final sentence interpretation while in normal comprehenders the semantic memory-based stream of analysis occurs partly in parallel with the combinatorial stream of analysis in which the lexico-semantic information of individual words is integrated compositionally with morphosyntactic and thematic structures to determine the sentence meaning.

In normal language comprehension, the general function of combinatorial semantic processing is to integrate the meaning of single words into a coherent sentence representation. However, language comprises many different materials whose actual comprehension requires going beyond compositional processes. In fact, to comprehend multiword units such as idioms (e.g., *break the ice*, *beat about the bush*), binomials (e.g., *bride and groom*, *spic and span*), or collocations (e.g., *black coffee*, *morning sickness*), it is necessary to merge combinatorial single word processing with retrieval of lexicalized meanings (for overviews, see Siyanova-Chanturia, 2013; Cacciari, 2014). Establishing whether SZ patients are indeed able to use combinatorial semantic processing in literal sentence comprehension, and both combinatorial analysis and retrieval of stored, global meanings in idiomatic sentence comprehension, is the main aim of this study.

### **DEFICITS IN THE COMPREHENSION OF NON-LITERAL LANGUAGE IN SZ**

SZ patients have been reported to be more impaired in comprehending non-literal than literal language since early studies on proverbs and metaphors (Gorham, 1961; Kasanin, 1994). Impairment in the comprehension of non-literal language continues to be documented in terms of preference for literal rather than figurative interpretations and poor appreciation of irony (*literality bias*) (metaphors: Chapman, 1960; Cutting and Murphy, 1990; Spitzer, 1997; Drury et al., 1998; Langdon et al., 2002; Langdon and Coltheart, 2004; Kircher et al., 2007; Mashal et al., 2013; idioms: Titone et al., 2002; Iakimova et al., 2005, 2006, 2010; Schettino et al., 2010; proverbs: Gorham, 1961; de Bonis et al., 1997; Sponheim et al., 2003; Brüne and Bodenstein, 2005; Kiang et al., 2007; Thoma et al., 2009; irony: Herold et al., 2002; Langdon et al., 2002; Rapp et al., 2013).

Poor understanding of non-literal language has been attributed to a variety of factors, including a generalized pragmatic comprehension deficit (Tavano et al., 2008). However, recently the idea of a unique mechanism underlying non-literal language deficits has been questioned (Martin and McDonald, 2003; Champagne-Lavau et al., 2006, 2007) by studies that observed qualitatively distinct deficits in different types of non-literal expression, notably in metaphor comprehension (Iakimova et al., 2005; Elvevåg et al., 2011), appreciation of irony (Langdon et al., 2002) and idioms (with poorer performances on literally plausible than on literally implausible idioms, e.g., *skate on thin ice* vs. *throw caution to the winds*) (Titone et al., 2002; Iakimova et al., 2010; Schettino et al., 2010). Then deficient comprehension of non-literal language has been attributed to poor theory of mind (ToM), defined as the ability to attribute mental states to oneself and the others in order to explain and predict behavior in social contexts (Brüne, 2005; Brüne and Bodenstein, 2005; Mo et al., 2008; Champagne-Lavau and Stip, 2009; Gavilán and García-Albea, 2010; Schettino et al., 2010; but see Langdon et al., 2002; Varga et al., 2014). Impaired figurative language comprehension has also been linked to inadequate use of contextual information to construct abstract figurative meanings (Strandburg et al., 1997; Kircher et al., 2007). However, Titone et al. (2002; see also Iakimova et al., 2006, 2010) questioned the idea that SZ patients necessarily exhibit a literality bias. In fact, the lexical decision study of Titone et al. showed that SZ patients were as able as control subjects to use idiomatic contexts to generate idiomatic interpretations when the idiomatic meaning was literally implausible (e.g., *come up roses*) but they instead failed when the idiomatic meaning was semantically ambiguous having also a literal counterpart (e.g., *break the ice*). 
In sum, this selectively spared ability to comprehend unambiguous idioms would confirm that patients do not have difficulty in understanding non-literal meanings *per se*. Rather they would fail in suppressing competing literal meanings: *difficulty in inhibiting literal interpretation of idiomatic phrases when one is possible, and/or processing ambiguous stimuli, are the sources of contextual failures in schizophrenia* (Titone et al., 2002, p. 318). Unfortunately, in the study by Titone et al. (2002) patients were presented only with idiomatic sentences, without any literal sentence control condition. Hence it is impossible to establish whether patients were comparably good at incrementally integrating word meanings in a compositional way (as necessary for literal sentence comprehension) and at retrieving prefabricated idiomatic meanings from semantic memory (as necessary for idiomatic sentence comprehension). Differences between the comprehension of literally plausible and implausible idioms were observed also by Schettino et al. (2010) in a picture-sentence matching study. SZ patients and healthy controls were presented with literal and idiomatic sentences followed by a picture correctly or incorrectly depicting the sentence meaning. SZ patients were impaired in choosing the appropriate picture in both types of idiomatic sentence, with a particularly poor performance for literally plausible idioms. However, this result may be influenced by the difficulty of representing abstract idiomatic meanings in a pictorial format. Literal pictures may have been easier to elaborate than idiomatic pictures, leading to underestimation of the actual ability of patients to comprehend idioms (Papagno and Caporali, 2007).

### **THE PRESENT STUDY**

The present study aimed at investigating whether SZ patients were indeed able to use combinatorial semantic processing to comprehend literal sentences, and both combinatorial analysis and retrieval of pre-stored meanings to comprehend idiomatic sentences. In fact, idioms are strings of words with a highly conventionalized meaning stored in long-term semantic memory. Idiomatic meaning does not derive from the composition of idiom constituent word meanings and often refers to abstract mental states or events. We used an online sentence continuation verification task and controlled for a factor that is known to play a major role in literal and non-literal language comprehension, namely the predictability of incoming words. In the sentence continuation verification task, participants are asked to decide whether a target word is a sensible continuation of a previous sentence fragment. This relatively easy task has been widely employed in the psycholinguistic literature to assess sentence comprehension (Burgess and Shallice, 1996), since it is well suited to obtaining information on moment-by-moment comprehension while placing little demand on the need to maintain and update information in working memory. The presentation of both the sentence fragment and the target word was self-paced rather than regulated at fixed rates, because self-paced methods are known to allow subjects to read at a pace that matches their internal comprehension processes (Just and Carpenter, 1980; Kuperberg et al., 2006). Using similar, fixed time durations for patients and controls would have been problematic also because evidence (Butler et al., 2002; Quelen et al., 2005) showed that SZ patients typically need longer presentation durations to perceive a stimulus.

We used only idioms without a literal counterpart (i.e., literally implausible strings, see the Appendix in Supplementary Material and **Table 2** for examples), because evidence showed that SZ patients may be deficient in strategically using contextual information for inhibiting competing literal interpretations when idioms also possess a literal meaning (Titone et al., 2002; Schettino et al., 2010). Since idioms typically have a prefabricated structure, their presence in a sentence may be determined in advance, or reasonably predicted, based on part of an idiom string (e.g., *carry the world on one's...* triggers high expectations for the idiomatic completion *shoulder*) (Cacciari and Tabossi, 1988). Several behavioral and EEG studies on language-preserved participants showed that predictable idioms are understood faster than unpredictable ones (e.g., Cacciari and Tabossi, 1988; Cacciari et al., 2007; Vespignani et al., 2010); when the initial fragment of a string creates high expectancy about a final idiomatic conclusion, recognition of a word providing an unexpected ending is slowed down (Tabossi et al., 2005). In sum, idiom predictability can constrain the search through semantic memory, facilitating the processing of anticipated components or hindering that of unpredicted ones. However, notwithstanding the acknowledged relevance of word predictability in language processing (for overviews see Federmeier, 2007; Davenport and Coulson, 2011; Cacciari, 2014), this factor has been rather neglected in previous idiom studies on SZ patients<sup>2</sup>. Hence we manipulated the predictability of sentence-final words, designing literal and idiomatic sentences whose final words were comparably highly expected. While we expect healthy controls to be equally facilitated in anticipating what comes next in literal and idiomatic sentences, patients may be more facilitated by idiomatic than literal predictability because of the bound, prefabricated structure of idioms.

As we mentioned, in her *Dual Stream hypothesis* Kuperberg (2007, 2010a,b) argued that SZ patients may be characterized by overreliance on the semantic memory-based stream of language processing at the expense of the combinatorial processing stream. Paradoxically, overreliance on semantic memory-based language processing may turn out to be more detrimental to literal than to idiomatic language comprehension. In fact, if one assumes that idiomatic meanings do not have to be compositionally established but are directly retrieved from semantic memory, then this would imply that idiom interpretation in SZ patients should be even more reliant on semantic memory-based processes than in healthy controls. In contrast, comprehending literal sentences requires syntactic and semantic integration of the constituent word meanings. Hence SZ patients may perform nearly as well as healthy controls in comprehending idiomatic ready-to-go meanings, when idioms do not have a competing literal counterpart, while being impaired in understanding literal sentences, at variance with the *literality bias* suggested by prior studies.

<sup>2</sup>Only two studies reported idiom predictability scores (low scores in Titone et al., 2002 and medium scores in Iakimova et al., 2010).

Kuperberg (2010b) argued that retrieval of idioms with a literal counterpart (i.e., ambiguous idioms such as, for instance, *break the ice*) could be relatively facilitated because *a relative impairment in engaging additional combinatorial processing to construct the implausible literal meaning of such idioms* [may result] *in less conflict and increased access to the stored idiomatic meaning* (p. 596). Here we argue that this may also be true of idioms without a literal counterpart (as those used in this study) reflecting a general imbalance of SZ patients toward semantic memory-based processing.

The literature indicates that SZ patients tend to be slower than healthy controls on most cognitive measures (Vinogradov et al., 1998; Harvey, 2010). This may artificially increase the reaction time difference between groups. Hence finding slower response times (RTs) in patients than in healthy controls may not be sufficient for concluding that comprehension is impaired. To overcome this problem, semantic priming studies (e.g., Spitzer et al., 1993; Kiefer et al., 2009) often analyzed the effect of prior context on the target word in terms of a priming score (PRI) (see Methods Section). PRIs would reflect the amount of facilitation of prior context on the RTs to a target word (Spitzer et al., 1993). Although the use of PRI primarily derives from single word semantic priming studies, we measured the PRIs of patients and healthy participants when sentence-final words completed literal and idiomatic sentences in sensible or non-sensible ways, assuming that sentence-final words could be facilitated by the previous sentence fragments. As reported in the Introduction, in SZ deficient semantic processing may produce distorted priming effects at short lags such that access to words preceded by related primes may be abnormally increased (or reduced) (Ditman and Kuperberg, 2007). Hence, patients, unlike controls, may exhibit exaggerated contextual priming on correct target words, as reflected by PRIs larger in patients than in controls.

Studies documented that abnormal semantic processing is often closely associated with evidence of thought disorders, especially in severely ill patients (Ditman et al., 2011). This multidimensional disturbance may emerge in both language comprehension and production with loose lexical associations, incoherent language production, deficient abstract thinking and semantic memory deficits (Andreasen, 1979; Kuperberg and Heckers, 2000; Pomarol-Clotet et al., 2008; Salisbury, 2008; Levy et al., 2010). These disorders are thought to be particularly detrimental to nonliteral language comprehension (Iakimova et al., 2010; Schettino et al., 2010; Mashal et al., 2013). Although the severity of the clinical profiles of the SZ patients involved in this study went from mild to moderate, we tested possible effects of thought disorder (as reflected by scores in the *Positive and Negative Syndrome Scale*, *PANSS*) on target word processing.

We tested a group of relatively young patients (20–45 years old) characterized by mild-to-moderate forms of paranoid SZ (as reflected by PANSS scores) and ongoing clinical stability. The choice of this clinical profile was motivated by evidence that in general paranoid SZ patients (together with schizoaffective patients) have higher levels of cognitive ability relative to other forms of the disorder (Goldstein et al., 2005). This may result in a patient sample with a relatively moderate average level of psychopathology, limiting the potential of any inference about illness state effects on language comprehension but with the advantage of possibly showing aberrant language comprehension already in mild-to-moderate forms of this complex pathology.

In summary, the general aim of the study was to test whether overreliance on the semantic memory-based stream of language processing, at the expense of the combinatorial processing stream, may paradoxically lead to less impaired comprehension of idiomatic than of literal sentences. SZ patients, unlike healthy participants, may in fact perform worse on literal sentences that require full combinatorial analysis than on idiomatic meanings that do not have to be compositionally established but are directly retrieved from semantic memory. However, impaired language processing in SZ may produce distorted semantic effects such that patients, unlike controls, may exhibit exaggerated contextual priming effects. Lastly, we expect the severity of thought disorders within the patient group to affect both RTs and accuracy.

# **EXPERIMENT METHODS**

### *Participants*

Participants consisted of 39 (14 female; mean age = 31 years, age range = 20–45, *SD* = 6.2) chronic outpatients with paranoid SZ (DSM-V; American Psychiatric Association, 2013) and 39 healthy volunteers as control participants. Italian was the native language of all participants. The general inclusion criteria were at least 10 years of formal education and age between 18 and 45 years. Patients were recruited from the geographically defined catchment area of Modena and treated by the West Modena Mental Health Service and by a clinic reporting to the same Mental Health Daycare district. Healthy control participants were volunteers recruited in the community through public advertisements. Controls were pairwise matched to patients for age (±2), sex, and education (±2) (see **Table 1**). Controls self-reported no history of alcohol or substance abuse, no major medical or neurological illness, and no psychiatric illness in first-degree relatives. To exclude any past or present psychiatric disorder, controls were administered the *Brief Psychiatric Rating Scale* (BPRS, Ventura et al., 1993). The diagnosis of paranoid schizophrenia of patients was based on the *Positive and Negative Syndrome Scale* (PANSS; Kay et al., 1987; score = 46.69, range = 34–68, *SD* = 8.1) and it was confirmed by the clinical consensus of staff psychiatrists. The PANSS is a semi-structured interview designed to assess the presence and severity of positive (7 items, e.g., *hallucinations*, *conceptual disorganization*), negative (7 items, e.g., *emotional withdrawal*, *difficulty in abstract thinking or concretism*), and general (16 items, e.g., *anxiety*, *unusual thought content*) psychopathological symptoms. The interview was administered to patients by senior psychiatrists blind to the cognitive data and was aimed at assessing the patients' symptom status in the past week. Based on PANSS classification criteria, 35 patients had a mild form of SZ (PANSS Total score from 34 to 55) and four a moderate form (from 61 to 68)<sup>3</sup>. At time of testing, all patients were responsive and clinically stabilized. None of them had comorbid psychiatric disorders, alcohol or substance abuse prior to the study, history of traumatic head injury with loss of consciousness, epilepsy, or other neurological diseases. Thirty-three of the 39 patients were prescribed second-generation antipsychotic medications (as defined by Lohr and Braff, 2003), two first-generation antipsychotics, and four a combination of first- and second-generation antipsychotics. At time of testing, patients had a mean IQ of 88 (range = 58–126, *SD* = 18), assessed with the *Wechsler Adult Intelligence Scale* (WAIS-R), a mean education of 12.6 years (range = 10–14, *SD* = 1.33), and a mean illness duration of 8.97 years (range = 1–29, *SD* = 5.94) (see **Table 1**). A set of neuropsychological tests was administered to patients and control participants to assess general cognitive functions and language (**Table 1**). Specifically, both patients and controls were administered the Syntactic competence sub-scale of the *Batteria per l'analisi dei deficit afasici* (*B.A.D.A.*, Miceli et al., 1994), an Italian battery on language comprehension originally designed for aphasic patients, to assess basic syntactic comprehension ability, and the *Phonemic and Semantic Fluency Tests* (Italian Version; Novelli et al., 1986) to assess general cognitive functioning and semantic processing deficits (for overviews, see Henry and Crawford, 2005). In the *Phonemic fluency test*, individuals produce as many words beginning with given letters (in Italian, F, P, L) as possible in a time interval (60 s for each letter). In the *Semantic fluency test*, individuals produce as many members of given stimulus categories (car brands, fruits, and animals) as possible in a time interval (60 s for each category). For controls, Digit Span and Vocabulary subtests of WAIS-R were used to estimate, respectively, verbal short-term memory and global verbal intelligence function (Lezak et al., 2004). Patients had significantly poorer performances than healthy controls in all tests (**Table 1**).

<sup>3</sup>According to PANSS classification criteria, Total scores up to 58 are indicative of a mild form of psychopathology, and up to 75 of a moderate form.

### **Table 1 | Demographic characteristics of the study sample, and clinical characteristics of the schizophrenic patients.**

*M, male; F, female; FG, first-generation antipsychotics; SG, second-generation antipsychotics; FSG, combination of first- and second-generation antipsychotics.*

Written informed consent was obtained from all participants. Permission for the study was obtained from the Ethical Committee of Modena (*Comitato Etico Provinciale, Azienda Ospedaliero-Universitaria di Modena*).

### *Materials*

Experimental stimuli were formed by 38 idiomatic and 38 literal sentences (see **Table 2** for examples, and the Appendix in Supplementary Material for the idiom list). The final words of all sentences were highly predictable in context, as shown by cloze probability values (see below). Prior to the study, we performed several tests to norm the experimental materials on language-unimpaired subjects (not involved in any other phases of the experiment). First, 60 idioms without a plausible literal meaning were selected from an Italian Idiom Dictionary (e.g., *avere dei grilli per la testa*, to be full of strange ideas; *mettersi il cuore in pace*, to put one's mind at rest) and were divided into two lists. Each list was submitted to 20 participants who rated the familiarity of each idiom (from 1: *Never heard* to 7: *Heard very often*) and provided a meaning paraphrase. The 38 idioms selected as experimental materials were highly familiar (*M* = 5.02, *SD* = 0.59, range = 3.69–5.94) and were correctly paraphrased (*M* = 88.8%, *SD* = 8.2, range = 76–100%). Idioms were formed on average by 5.3 words (*SD* = 0.7, range = 4–7). Then, 38 sentences (mean number of words = 7.5, *SD* = 1.01, range = 6–10) ending with the idiom string and without any bias to the idiomatic meaning were created together with 38 literal sentences of comparable length and syntactic structure (mean number of words = 7.7, *SD* = 1.02, range = 6–10; *t* < 1) (see **Table 2** for examples). To test the cloze probability of sentence-final words (i.e., the probability that a specific word is given to complete a specific sentence context), different questionnaires containing sentence fragments of increasing length were created. Ninety different healthy participants were asked to complete the sentence fragments with the first word that came to their mind. In the final set of experimental materials, idiomatic and literal final words had statistically indistinguishable, very high cloze probability mean values (*M* = 0.90, *SD* = 0.8, range = 0.75–1; Idiomatic sentences: *M* = 0.89, *SD* = 0.7, range = 0.75–1; Literal sentences: *M* = 0.91, *SD* = 1.4, range = 0.76–1; *t* < 1).

### **Table 2 | Examples of experimental sentences in Italian and with word-by-word English translations.**

*Good and bad continuations are indicated in capital letters. The idiom meaning is provided in parentheses.*
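The cloze-probability definition given above (the proportion of norming participants who complete a fragment with a specific word) can be illustrated with a minimal Python sketch. The function name `cloze_probability` and the example responses are hypothetical and are not taken from the norming data of the study.

```python
from collections import Counter


def cloze_probability(completions, target):
    """Proportion of completions that match the target word (case-insensitive)."""
    if not completions:
        raise ValueError("at least one completion is required")
    # Normalize responses so that "Testa" and "testa " count as the same word.
    counts = Counter(word.strip().lower() for word in completions)
    return counts[target.strip().lower()] / len(completions)


# Hypothetical responses to the fragment "Giulia aveva dei grilli per la ..."
responses = ["testa"] * 9 + ["casa"]
print(cloze_probability(responses, "TESTA"))
```

Under this definition, a non-sensible continuation that no participant produced has a cloze value of zero.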

The 38 literal and 38 idiomatic sentences were presented in two conditions. In the Sensible continuation condition, the sentence-final word was the word that obtained the highest cloze value in the norming phase. These corresponded to the idiom-final words in idiomatic sentences. In the Non-Sensible condition, the last words of idiomatic and literal sentences were substituted with unexpected constituents (cloze value equal to zero in both conditions), semantically incongruent with the idiomatic or literal meaning of the sentence, and without any association to any of the preceding words (e.g., idiomatic sentence: *Giulia aveva dei grilli per la TESTA/SPUGNA*, *Giulia had some crickets for the HEAD/SPONGE*, Giulia was full of strange ideas; literal sentence: *Maria alla sera andava a nuotare in PISCINA/CRATERE*, *Maria at night went swimming in the POOL/CRATER*). In order to ensure that the effects of interest were not linked to specific word characteristics, the words forming sensible and non-sensible continuations in each condition were matched for grammatical class, length, frequency, and Age of Acquisition (AoA). In addition, we included 76 filler sentences without any idiom strings whose last word had low to medium cloze probability. The last constituent completed the sentences in sensible ways in half of the filler sentences and in non-sensible ways in the remaining half. Two lists were created and participants were randomly assigned to one of the two lists so that each sentence was presented only in the sensible or non-sensible version. Each list contained 152 sentences: 38 sentences with sensible continuations (19 idiomatic and 19 literal), 38 sentences with non-sensible continuations (19 idiomatic and 19 literal), and 76 filler literal sentences (38 sensible and 38 non-sensible). Idiomatic sentences represented only 25% of the total number of sentences to prevent participants from developing specific processing strategies for non-literal sentences, as is common practice in the psycholinguistic literature.
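The counterbalancing arithmetic described above can be checked with a short Python sketch. It is only an illustration: the function `build_lists`, the item identifiers, and the alternating assignment rule are hypothetical, and the sketch assumes that fillers appeared in the same version in both lists, which the text does not specify.

```python
def build_lists(n_idiomatic=38, n_literal=38, n_fillers=76):
    """Return two lists of (item_id, sentence_type, continuation) tuples."""
    lists = {"A": [], "B": []}
    for kind, n_items in (("idiomatic", n_idiomatic), ("literal", n_literal)):
        for i in range(n_items):
            # Each sentence is sensible in one list and non-sensible in the other.
            if i % 2 == 0:
                version_a, version_b = "sensible", "non-sensible"
            else:
                version_a, version_b = "non-sensible", "sensible"
            lists["A"].append((f"{kind}-{i}", kind, version_a))
            lists["B"].append((f"{kind}-{i}", kind, version_b))
    for i in range(n_fillers):
        # Assumption: fillers keep the same version in both lists.
        continuation = "sensible" if i % 2 == 0 else "non-sensible"
        for name in lists:
            lists[name].append((f"filler-{i}", "filler", continuation))
    return lists


lists = build_lists()
idiomatic_share = sum(1 for item in lists["A"] if item[1] == "idiomatic") / len(lists["A"])
print(len(lists["A"]), idiomatic_share)
```

Each list built this way holds 152 trials, of which 38 are idiomatic, matching the 25% proportion reported in the text.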

### *Design and procedure*

Testing and experiment were performed in different sessions (on average three sessions for patients, and two for controls) taking place a few days one after the other. The order of testing and experiment was quasi-randomized across participants.

Each experimental trial began with a fixation cross (+) in the center of a computer screen. A spacebar press initiated the presentation of a sentence fragment that was formed by the sentence without the last word (e.g., *Giulia aveva dei grilli per la*). A second spacebar press initiated the presentation of the target word, which could complete the sentence fragment in a sensible or non-sensible way. The target word was written in GENEVA BOLD 14 and appeared in the center of the screen. The presentation of the target word lasted until a response was given. Participants were instructed to press a *YES* button as quickly and accurately as possible when the target word was a good, sensible continuation of the previous sentence fragment (e.g., *TESTA*) and a *NO* button when the target word was a bad, non-sensible continuation (e.g., *SPUGNA*). The positions of the response buttons were counterbalanced across participants. An experimenter sat behind the patient to ensure that s/he was pressing the spacebar for advancing in the sentence presentation and the response buttons for responding (which always happened). Each participant performed 10 practice trials formed by five literal sentences ending with sensible continuations and five with non-sensible continuations. The practice was followed by the 152 experimental trials. Stimulus presentation and response collection were performed using a purpose-written E-Prime script (Psychology Software Tools).

### **STATISTICAL ANALYSES**

The mean RTs to correct answers and the accuracy proportions of patients and healthy controls in the different conditions are plotted in **Figures 1**, **2**. The mean RTs of correct responses and the accuracy proportions were submitted to analyses of covariance (ANCOVAs) to control for confounding effects of the following covariates: Verbal fluencies (phonemic and semantic), Vocabulary, and Digit span. Group (patients vs. controls) was a between-subject factor; Sentence (idiomatic vs. literal) and Continuation (sensible vs. non-sensible) were within-subject factors. *Post-hoc* Newman-Keuls tests were employed to further examine significant interactions (α = 0.05). Comparing healthy subjects and patients may raise a reliability issue for the effects in an ANCOVA design. Thus, we checked the reliability of significant effects from the ANCOVAs by estimating the sampling distribution under the null hypothesis that no difference exists between healthy subjects and patients, using a non-parametric bootstrap procedure (Efron and Tibshirani, 1993; Di Nocera and Ferlazzo, 2000). Namely, on each step: (1) we re-sampled with replacement from the original set of data, creating two bootstrap samples and thus making the null hypothesis true; and (2) the ANCOVA was performed on the bootstrap samples. The procedure was repeated 10,000 times in order to obtain the empirical *F* distribution under the null hypothesis. The empirical distribution was then used to estimate the probability of the original *F*-values under the null hypothesis. The probability values obtained through the bootstrap procedure are hereafter denoted as *p*<sub>boot</sub>.
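The logic of the bootstrap check can be sketched in Python under simplifying assumptions. The sketch below is not the authors' analysis: it replaces the ANCOVA with a plain two-group *F* statistic (no covariates, no within-subject factors), it interprets "re-sampling from the original set of data" as drawing both bootstrap samples from the pooled observations, and the names `f_two_groups` and `bootstrap_p` are hypothetical.

```python
import random
from statistics import mean


def f_two_groups(a, b):
    """One-way F statistic for two groups, df = (1, len(a) + len(b) - 2)."""
    mean_a, mean_b, grand = mean(a), mean(b), mean(a + b)
    ss_between = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
    ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    if ss_within == 0:  # degenerate resample: no within-group variability
        return float("inf")
    return ss_between / (ss_within / (len(a) + len(b) - 2))


def bootstrap_p(a, b, n_boot=10_000, seed=0):
    """Estimate p_boot: share of null-resampled F values >= the observed F."""
    rng = random.Random(seed)
    observed = f_two_groups(a, b)
    pooled = a + b  # pooling the groups makes the null hypothesis true
    exceed = 0
    for _ in range(n_boot):
        boot_a = rng.choices(pooled, k=len(a))  # sampling with replacement
        boot_b = rng.choices(pooled, k=len(b))
        if f_two_groups(boot_a, boot_b) >= observed:
            exceed += 1
    return exceed / n_boot


# Illustrative RTs in ms (made-up values, not data from the study).
patients = [1500 + 10 * i for i in range(20)]
controls = [1000 + 10 * i for i in range(20)]
print(bootstrap_p(patients, controls, n_boot=1000))
```

In a full replication the statistic computed on each pair of bootstrap samples would be the ANCOVA *F* for the effect of interest rather than this simplified version.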

The effect of prior context on target words was also operationalized in terms of a priming score (PRI), defined as the percentage of facilitation, $\mathrm{PRI} = [(RT_{\text{unrelated targets}} - RT_{\text{related targets}}) / RT_{\text{unrelated targets}}] \times 100$ (Spitzer et al., 1993; Kiefer et al., 2009), in the RTs to correct answers. We calculated the PRI for each participant in each condition and entered it in an ANOVA with Group as a between-subject factor and Sentence as a within-subject factor.
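As a worked illustration of the priming score, the short Python sketch below computes the percentage of facilitation for one participant in one condition. The function name and the RT values are hypothetical, and the sketch assumes that non-sensible continuations play the role of unrelated targets and sensible continuations that of related targets.

```python
def priming_score(rt_unrelated, rt_related):
    """Percentage of facilitation: (RT_unrelated - RT_related) / RT_unrelated * 100."""
    if rt_unrelated <= 0:
        raise ValueError("rt_unrelated must be a positive reaction time")
    return (rt_unrelated - rt_related) / rt_unrelated * 100


# Made-up mean RTs in ms for a single participant and sentence type.
print(round(priming_score(rt_unrelated=1000, rt_related=826), 1))
```

Because the difference is divided by the slower (unrelated) RT, the score expresses facilitation relative to each participant's own baseline speed, which is why it is less sensitive to overall slowing than a raw RT difference.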

To qualify the nature of our effects determining the specific contributions of cognitive, illness-related, and demographical variables to patients' performance, we computed hierarchical regression analyses on the RTs to correct answers using blockwise entry. Twelve predictor variables divided in three blocks were entered in the following order: Block 1 was formed by variables assessing general cognitive and linguistic skills [Verbal fluencies (phonemic and semantic), Vocabulary, BADA, IQ, Digit span]; Block 2 was formed by illness-related variables (years of illness, medications, and PANSS Total Scale); and Block 3 by demographic variables (age, sex, education).

Finally, to explore any effects of the severity of thought disorders, we correlated the mean RTs and accuracy proportions with the scores on specific items of PANSS (i.e., P2, *Conceptual disorganization,* and N5, *Difficulty in abstract thinking or Concretism*) and on the Negative and Positive Subscales of PANSS. A conservative significance threshold of 0.01 was used to correct for the large number of correlations.

## **RESULTS**

After adjustment for the covariates, the ANCOVA on the mean RTs to correct answers showed significant main effects of Group [*F*(1, 72) = 9.98, *p* < 0.002, *p*<sub>boot</sub> < 0.001, η<sup>2</sup><sub>p</sub> = 0.12], with patients overall slower than controls (+478 ms), and of Continuation [*F*(1, 72) = 5.22, *p* < 0.03, *p*<sub>boot</sub> < 0.001, η<sup>2</sup><sub>p</sub> = 0.07], with non-sensible continuations overall slower than sensible ones (+234 ms). A significant Group by Sentence by Continuation interaction was also obtained [*F*(1, 72) = 4.33, *p* < 0.04, *p*<sub>boot</sub> = 0.014, η<sup>2</sup><sub>p</sub> = 0.06] (see **Table 3A**). *Post-hoc* tests revealed that patients were significantly faster in responding to sensible than to non-sensible continuations in both literal and idiomatic sentences (Idiomatic sentences: −367 ms, *p* < 0.0001; Literal sentences: −253 ms, *p* < 0.0001), and to idiomatic than to literal continuations when they were sensible (−158 ms, *p* < 0.0003) but not when they were non-sensible (−44 ms). Patients were significantly slower than controls in rejecting non-sensible continuations in literal and idiomatic sentences (+529 ms, *p* < 0.01; +578 ms, *p* < 0.005; respectively) and, at trend level, in accepting sensible literal continuations (+427 ms, *p* = 0.06). Patients did not significantly differ from controls in accepting sensible idiomatic completions (*p* = 0.13). Controls were faster on sensible than non-sensible continuations in literal and idiomatic sentences (Idiomatic: −168 ms, *p* < 0.0001; Literal: −154 ms, *p* < 0.001), and faster on idiomatic than on literal continuations both when these were sensible (−108 ms, *p* < 0.01) and when they were non-sensible (−93 ms, *p* < 0.02). No significant effects of the covariates emerged [Vocabulary: *F*(1, 72) = 2.67, *p* = 0.11; Digit span: *F* < 1; Phonemic fluency: *F* < 1; Semantic fluency: *F*(1, 72) = 2.62, *p* = 0.11]. However, the high number of covariates introduced in the analysis may have reduced the statistical power by adding random noise to the model. Hence we conducted a further ANCOVA with the same factors as the previous one but dropping the least significant covariate (i.e., phonemic fluency). The results of this ANCOVA (see **Table 3B**) mirror the results of the previous one, with the exception of two covariates that now show effects close to significance, namely Vocabulary (*p* = 0.066) and Semantic Fluency (*p* = 0.051).

**Table 3A | Summary of ANCOVA results for Reaction Times and Accuracy for Group, Sentence, and Continuation while controlling for Phonemic and Semantic fluencies, Vocabulary, and Digit span.**

**Table 3B | Summary of ANCOVA results for Reaction Times for Group, Sentence, and Continuation while controlling for Semantic fluency, Vocabulary, and Digit span.**


Since, as we mentioned, slowing of RTs may inflate contextual effects and group differences, we compared the priming scores (PRI) of controls and patients in the different experimental conditions (see Methods Section). The ANOVA revealed only a significant main effect of Sentence [*F*(1, 76) = 4.176, *p* < 0.04, η<sup>2</sup><sub>p</sub> = 0.052], with higher priming scores in idiomatic than in literal sentences (17.4 vs. 12.2%, respectively). Percentages of facilitation were numerically, although not significantly, higher in patients than in controls, especially in idiomatic sentences (idiomatic sentences: 18.6 vs. 16.2%; literal sentences: 12.8 vs. 11.6%, respectively for patients and controls).

Significant effects in the hierarchical regression analyses on patients' RTs revealed that Cognitive variables [i.e., Verbal fluencies (phonemic and semantic), Vocabulary, BADA, IQ, Digit span] accounted for 49.4% of the variance [*F*(6, 32) = 5.21, *p* < 0.001, *r*<sup>2</sup> = 0.49] in the responses to sensible idiomatic continuations [*F*(12, 26) = 3.14, *p* < 0.007, *r*<sup>2</sup> = 0.59], with significant contributions of Digit span and IQ [*t*(32) = −2.04, *p* < 0.05; *t*(32) = 2.12, *p* < 0.04, respectively]. None of the blocks produced significant *r*<sup>2</sup> changes in sensible literal continuations [*F*(12, 26) = 2.18, *p* < 0.05, *r*<sup>2</sup> = 0.47]. Cognitive variables also accounted for 34.5% of the variance [*F*(6, 32) = 2.81, *p* < 0.03, *r*<sup>2</sup> = 0.34] in non-sensible literal continuations [*F*(12, 26) = 2.16, *p* < 0.05, *r*<sup>2</sup> = 0.5], with a significant contribution of Digit span [*t*(32) = −2.4, *p* < 0.02].
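The relation between the reported *r*<sup>2</sup> values and their *F* statistics can be checked with the standard formula for the overall test of a regression model, $F = (R^2 / k) / [(1 - R^2) / df_{\text{residual}}]$, where $k$ is the number of predictors. The Python sketch below is a reader's cross-check, not part of the original analysis; the function name is hypothetical, and it assumes that the *F*(6, 32) values refer to the first block of six cognitive predictors entered alone.

```python
def f_from_r2(r2, n_predictors, df_residual):
    """Overall F statistic of a regression model, derived from its R squared."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    return (r2 / n_predictors) / ((1 - r2) / df_residual)


# Block 1 (six cognitive predictors), sensible idiomatic continuations.
print(round(f_from_r2(0.494, 6, 32), 2))
# Block 1, non-sensible literal continuations.
print(round(f_from_r2(0.345, 6, 32), 2))
```

Values recomputed this way agree with the reported *F*(6, 32) statistics to two decimals; small discrepancies for the 12-predictor models are to be expected because the *r*<sup>2</sup> values are reported only to two decimals.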

The ANCOVA on accuracy<sup>4</sup> (see **Table 3A**) revealed a significant main effect of Sentence [*F*(1, 72) = 6.63, *p* < 0.01, *p*<sub>boot</sub> < 0.002, η<sup>2</sup><sub>p</sub> = 0.08], with higher accuracy in literal than in idiomatic sentences (98.5 and 97%, respectively). The only covariate leading to a statistically reliable effect was Vocabulary [*F*(1, 72) = 10.36, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.13].
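The partial eta squared values reported in this section follow from the *F* statistics and their degrees of freedom through the standard identity $\eta^2_p = (F \cdot df_{\text{effect}}) / (F \cdot df_{\text{effect}} + df_{\text{error}})$. The Python sketch below, with a hypothetical function name, is offered only as a cross-check of the reported effect sizes.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) taken from the Results section.
reported = [
    ("Group, RTs", 9.98, 1, 72),
    ("Continuation, RTs", 5.22, 1, 72),
    ("Group x Sentence x Continuation, RTs", 4.33, 1, 72),
    ("Sentence, accuracy", 6.63, 1, 72),
    ("Vocabulary, accuracy", 10.36, 1, 72),
]
for label, f_value, df_effect, df_error in reported:
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```

The recomputed values match the effect sizes given in the text (0.12, 0.07, 0.06, 0.08, and 0.13, respectively).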

### **EFFECTS OF CLINICAL VARIABLES**

The correlations between the scores on items P2 and N5 of PANSS<sup>5</sup> and RTs and accuracy did not yield any significant results (α = 0.01). However, some results significant at trend level merit reporting. Specifically, *Conceptual disorganization* scores (P2) correlated positively with the RTs to sensible and non-sensible idiomatic continuations (*p* = 0.04; *p* = 0.05, respectively), and inversely with accuracy in responding to non-sensible idiomatic and literal continuations (*p* = 0.02; *p* = 0.02, respectively). Again at trend level, *Difficulty in abstract thought or Concretism* scores (N5) correlated positively with the RTs to sensible idiomatic continuations (*p* = 0.02), and inversely with accuracy in non-sensible literal continuations (*p* = 0.03). The Negative scale scores correlated positively with the RTs to sensible idiomatic continuations (*p* = 0.009), and with the RTs to sensible and non-sensible literal continuations (but at trend level: *p* = 0.02; *p* = 0.04, respectively). Accuracy in responding to non-sensible idiomatic and literal continuations correlated inversely with Negative scale scores (*p* = 0.007; *p* = 0.004, respectively) and with Positive scale scores (*p* = 0.01; *p* = 0.01, respectively).

### **DISCUSSION AND CONCLUSIONS**

In normal sentence processing, comprehenders constantly compute the relationships between individual words in a combinatorial way and compare this information with the relationships that are prestored within semantic memory (Kuperberg, 2007, 2010a,b). The semantic memory-based stream of analysis occurs partly in parallel with the combinatorial stream of analysis, in which the lexical-semantic information of individual words is integrated compositionally with morphosyntactic and thematic structures to determine the sentence meaning. It has been proposed (Kuperberg, 2007, 2010a,b) that in SZ patients an imbalance between the two streams of analysis may lead to a sentence comprehension deficit due to over-reliance on semantic memory-based activity at the expense of the combinatorial, integrative stream of analysis. Inspired by the *Dual Stream hypothesis* of Kuperberg (2010b), we explored the possibility that idiom comprehension may be relatively spared in SZ patients when idioms are familiar, literally implausible, and predictable before offset. Idiomatic meanings should in fact be directly retrieved from semantic memory; hence patients' over-reliance on a semantic memory-based stream of analysis may turn into a processing resource rather than a limitation. Paradoxically, and despite the equally high predictability of sentence-final words, patients' performance may be poorer in literal sentences, which instead require syntactic and semantic integration of the constituent word meanings. This may lead to patient performance close to that of controls in idiomatic but not in literal sentences.

<sup>4</sup>A qualitative analysis of the patients' errors revealed that they made a slightly higher number of errors in rejecting non-sensible idiomatic continuations than in accepting sensible idiomatic ones (35 vs. 28, respectively), and a slightly higher number of errors in accepting sensible literal continuations than in rejecting non-sensible literal ones (22 vs. 15, respectively).

<sup>5</sup>According to PANSS criteria, in our patient sample the severity of Conceptual disorganization (P2) went from absent to moderate (mean score = 1.67, range = 1–4, *SD* = 0.91) and Difficulty in abstract thought or Concretism (N5) from absent to mild (mean score = 2.00, range = 1–3, *SD* = 0.79). The low average scores in these two items, and in general in the Negative and Positive scales, may have limited the potential for detecting correlations of RTs and accuracy with clinical variables.

Our results showed that patients were overall slower than healthy controls (+478 ms), as expected given the documented general slowing of SZ patients. Patients were faster in correctly responding to sensible than to non-sensible continuations in both idiomatic and literal sentences. They were also faster in responding to sensible idiomatic continuations than to sensible literal ones, in line with our hypothesis of an advantage driven by the conventionalized nature of idioms. The ANCOVA and the regression analyses showed that cognitive variables indeed played a role in shaping the comprehension performance of patients, in line with the evidence of a generalized intellectual impairment of SZ patients even when, like the patients tested in this study, they are relatively well-functioning. Once the contribution of the covariates was partialled out, results showed that patients were slower than controls in correctly rejecting non-sensible literal and idiomatic sentences, and in accepting sensible literal continuations. The RTs of patients to idiomatic sentences were still slower than those of controls, but this difference was not statistically significant. This cannot be taken to imply that patients comprehended idioms as controls did. Rather, these results showed that the state of residual schizophrenia did not contribute to slower processing of sensible idioms above and beyond the cognitive deficits that characterized patients. This was clarified by the results of the hierarchical regression analysis, which showed that the reaction times to sensible idioms (and to literal non-sensible sentences) were largely explained by differences in the cognitive variables (notably, verbal memory and IQ for sensible idioms, and verbal memory for non-sensible literal sentences). In sum, the cognitive dysfunction of the SZ patients tested in this study affected the comprehension of idiomatic as well as of literal sentences, and it was even more pronounced for literal, compositional sentences, in line with our predictions. It should be noted that we measured reaction times to sentence-final words, whose processing may differ from that of words within a sentence. In fact, wrap-up effects at the end of sentences place the highest demands on literal, combinatorial processing (Kuperberg et al., 2010).

Patients' accuracy was close to that of controls (96.5 vs. 98.5%, respectively), in contrast to prior studies (e.g., Iakimova et al., 2005, 2010; Thoma et al., 2009; Schettino et al., 2010). We cannot exclude that the lack of a group difference on accuracy across the different experimental conditions may reflect a ceiling effect. Scores in the Vocabulary subtest of WAIS had a general effect on accuracy, a result of interest given that this subtest of WAIS is believed to tap premorbid intelligence in SZ (Lezak et al., 2004) and the documented association of verbal intelligence to efficient sentence comprehension (Hunt, 1977).

The analyses of the priming scores (PRIs) revealed a stronger effect of idiomatic than of literal contexts on target words. It is unlikely that this effect may be due to predictability since sentence-final words were equally highly predictable in both types of sentence. Rather, it seems to reflect the conventionalized, bound nature of idiom strings. In fact, when overlearned figurative expressions are familiar *they provide a degree of context and cloze probability significantly beyond that of literal statements* (Strandburg et al., 1997, p. 605).

In sum, idiom-final words seemed to be more accessible<sup>6</sup> to SZ patients than literal-final words, but the processing of both types of words was severely affected by the patients' cognitive abnormalities. Regression analysis showed that cognitive variables (notably, verbal memory and IQ) accounted for a high amount of variance in patients' RTs to sensible idioms and to non-sensible literal sentences. Specifically, short-term verbal memory had a specific role in RTs to non-sensible literal sentences, and both short-term verbal memory and IQ<sup>7</sup> in RTs to sensible idioms. Prior studies reported mixed evidence on the effects of patients' IQ: it affected idiom comprehension in Iakimova et al. (2010) but was not a significant predictor of correct responses to idioms in Schettino et al. (2010). In Varga et al. (2014), SZ patients with lower IQ were impaired in comprehending unconventional metaphors and irony while performing close to controls in comprehending conventional metaphors (that could in principle be similar to idioms, although no examples are provided in the study). Higher IQ patients performed overall as well as controls. A previous study by Kazmerski et al. (2003) also reported evidence of a link between IQ and figurative language comprehension in healthy participants, in that individuals with lower IQ had more difficulty in understanding figurative language than higher IQ individuals.

Correlations showed some effects of the severity of thought disorder on patients' performance, although of a limited nature given the clinical profile of patients. In fact, RTs tended to slow down as *Conceptual disorganization*, *Difficulty in abstract thinking*, and negative symptoms increased within the patient group. Specifically, higher scores in the item *Conceptual disorganization* (P2) of PANSS were associated with longer RTs to idioms and decreased accuracy on non-sensible sentences (no matter whether literal or idiomatic). This is consistent with evidence that high scores in P2 reflect semantic processing dysfunction (Kiefer et al., 2009). We also found that higher scores in *Difficulty in abstract thinking or Concretism* (N5) were associated with longer RTs to sensible idioms (as in Iakimova et al., 2010) and decreased accuracy on non-sensible literal sentences. N5 scores are thought to reflect deficient comprehension of abstract, non-literal language (e.g., Kircher et al., 2007; Iakimova et al., 2010; Mashal et al., 2013), as confirmed by recent brain imaging evidence (Kircher et al., 2007; Mashal et al., 2013) that reduced brain activation (in the left IFG and left MFG) during non-literal language comprehension was correlated with high scores in N5. Higher scores in the Negative Scale of PANSS were associated with longer RTs to sensible idiomatic continuations and to literal ones (sensible and non-sensible). This would be consistent with the claim that severity of negative symptoms is associated with deficits in executive functions (e.g., Basso et al., 1998; O'Leary et al., 2000; Schettino et al., 2010) that brain-imaging studies (e.g., Zempleni et al., 2007; Romero Lauro et al., 2008; Proverbio et al., 2009) showed to be relevant to language comprehension, and particularly to idiom comprehension. Lastly, higher scores in the Positive Scale, as well as in the Negative Scale, were associated with decreased accuracy in rejecting non-sensible literal and idiomatic continuations. This would conform to evidence that an increase in the severity of positive symptoms is linked to meaning processing deficits (Kuperberg and Heckers, 2000; Brüne and Bodenstein, 2005; Salisbury, 2008; Iakimova et al., 2010). Overall, these results indicate that language comprehension in patients with more severe psychopathology was defective in several respects, including differentiating between idiomatic and semantically incongruous literal sentences. This suggests that the ability to comprehend idiomatic expressions and to differentiate conventionalized from anomalous expressions may be indicative of the severity of the linguistic and cognitive deficits of SZ patients. Improving this ability may also constitute a promising path for the treatment of cognitive deficits in SZ patients. In sum, in line with prior evidence (Ditman and Kuperberg, 2007; Titone et al., 2007), our results suggest that even though SZ does not necessarily lead to a loss of semantic-lexical knowledge, it clearly modifies the mechanisms whereby this knowledge is retrieved.

<sup>6</sup>Accessibility refers to the readiness with which a word is retrieved from semantic memory.

<sup>7</sup>This is consistent with what Verguts and De Boeck (2002; see also Hunt, 1977) defined as the ubiquitous finding of a substantial correlation between memory capacity and general (fluid) intelligence.

There are some limitations to our study that need to be addressed. First, inclusion criteria may have resulted in a patient sample with mild-to-moderate average levels of psychopathology, and this may have limited the potential for detecting possible correlations with clinical variables due to floor effects. Second, patients were tested while they were clinically stabilized, which limits any conclusions on the exact nature of the language processing perturbations in paranoid SZ. Third, patients were on antipsychotic medication (mostly second-generation antipsychotic medication); hence an effect of treatment could not be ruled out. Fourth, patients and controls were matched in education. Controlling for a factor such as education, which may account for some variance in neuropsychological measures, may remove variance attributable to the variable of interest. Lastly, we only tested patients with paranoid SZ without any comparisons with other forms of SZ. Whatever the case, our results would still be relevant insofar as they show that there is not a global language dysfunction in mild-to-moderate paranoid SZ but rather qualitatively different language processing impairments that differently affect literal and non-literal language. This may shed some further light on the complexity of the neural underpinnings of literal and non-literal language comprehension as well as on the manifestations of this neurodevelopmental disorder.

As we mentioned in the Introduction, the neural correlates of SZ partly overlap with the functional neuroanatomy of idioms. In fact, converging evidence on language-impaired and language-unimpaired subjects coming from lesion, rTMS, and fMRI studies (for overviews, see Thoma and Daum, 2006; Bohrn et al., 2012; Cacciari and Papagno, 2012; Rapp et al., 2012) showed that idiom comprehension is based on a complex neural network that includes the temporal cortex, the superior medial frontal gyrus and the inferior frontal gyrus in the left hemisphere; and the superior and middle temporal gyri, the temporal pole and the inferior frontal gyrus in the right hemisphere, with more extended activations in the left than in the right hemisphere. This neural architecture is not solely involved in idiom comprehension. For instance, idioms and metaphors have largely overlapping activation foci in the left hemisphere (e.g., in the left inferior frontal gyrus), together with important differences: a more extended activation in the dorso-lateral prefrontal cortex for idioms than for metaphors, and different clusters of activation in the right inferior frontal gyrus (Bohrn et al., 2012) and right middle temporal gyrus (Rapp et al., 2012) for metaphors than for idioms, which may in part depend on the novelty of metaphorical meanings. To the best of our knowledge, none of the studies on figurative language comprehension in SZ has so far tested the comprehension of idioms and metaphors within the same sample of patients. Comparing the comprehension of conventional, prestored idiomatic meanings to that of novel, unconventional metaphors would instead provide important evidence on the neural underpinnings of non-literal language comprehension and on whether SZ patients may indeed be favored by the prefabricated nature of idioms as compared to the computation of novel metaphorical meanings that require the blending of distant semantic domains.

# **AUTHOR CONTRIBUTIONS**

Cristina Cacciari initially conceived the idea for the study which was then further developed and finalized by Cristina Cacciari, Francesca Pesciarelli, and Tania Gamberoni. The stimulus materials were prepared by Cristina Cacciari with the help of a doctoral student. Data collection was made possible by Tania Gamberoni, Leo Lo Russo, Francesca Pedrazzi, Ermanno Melati, and Francesca Pesciarelli. Analyses were run by Fabio Ferlazzo and Francesca Pesciarelli. The majority of this paper was written by Cristina Cacciari and Francesca Pesciarelli.

# **ACKNOWLEDGMENTS**

We are grateful to Andrea Cavariani, Chiara Reali, and Daniela Mauri for collecting the data and to the medical doctors and staff of the psychiatric units for their invaluable help.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00799/abstract

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 March 2014; accepted: 19 September 2014; published online: 09 October 2014.*

*Citation: Pesciarelli F, Gamberoni T, Ferlazzo F, Lo Russo L, Pedrazzi F, Melati E and Cacciari C (2014) Is the comprehension of idiomatic sentences indeed impaired in paranoid Schizophrenia? A window into semantic processing deficits. Front. Hum. Neurosci. 8:799. doi: 10.3389/fnhum.2014.00799*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Pesciarelli, Gamberoni, Ferlazzo, Lo Russo, Pedrazzi, Melati and Cacciari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Individual differences in executive control relate to metaphor processing: an eye movement study of sentence reading

# *Georgie Columbus<sup>1,2†</sup>, Naveed A. Sheikh<sup>1,2†</sup>, Marilena Côté-Lecaldare<sup>1</sup>, Katja Häuser<sup>2,3</sup>, Shari R. Baum<sup>2,3</sup> and Debra Titone<sup>1,2\*</sup>*

*<sup>1</sup> Department of Psychology, McGill University, Montreal, QC, Canada*

*<sup>2</sup> Centre for Research on Brain, Language, and Music, McGill University, Montreal, QC, Canada*

*<sup>3</sup> School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada*

### *Edited by:*

*Vicky T. Lai, University of South Carolina, USA*

### *Reviewed by:*

*Gwenda L. Schmidt-Snoek, Hope College, USA*

*Victoria Kazmerski, Penn State Erie, The Behrend College, USA*

#### *\*Correspondence:*

*Debra Titone, Department of Psychology, McGill University, 1205 Dr. Penfield Ave., Montreal, QC H3A 1B1, Canada*

*e-mail: dtitone@psych.mcgill.ca*

*†Co-first authors.*

Metaphors are common elements of language that allow us to creatively stretch the limits of word meaning. However, metaphors vary in their degree of novelty, which determines whether people must create new meanings on-line or retrieve previously known metaphorical meanings from memory. Such variations affect the degree to which general cognitive capacities such as executive control are required for successful comprehension. We investigated whether individual differences in executive control relate to metaphor processing using eye movement measures of reading. Thirty-nine participants read sentences including metaphors or idioms, another form of figurative language that is more likely to rely on meaning retrieval. They also completed the AX-CPT, a domain-general executive control task. In Experiment 1, we examined sentences containing metaphorical or literal uses of verbs, presented with or without prior context. In Experiment 2, we examined sentences containing idioms or literal phrases for the same participants to determine whether the link to executive control was qualitatively similar or different to Experiment 1. When metaphors were low familiar, all people read verbs used as metaphors more slowly than verbs used literally (this difference was smaller for high familiar metaphors). Executive control capacity modulated this pattern in that high executive control readers spent more time reading verbs when a prior context forced a particular interpretation (metaphorical or literal), and they had faster total metaphor reading times when there was a prior context. Interestingly, executive control did not relate to idiom processing for the same readers. Here, all readers had faster total reading times for high familiar idioms than literal phrases. Thus, executive control relates to metaphor but not idiom processing for these readers, and for the particular metaphor and idiom reading manipulations presented.

**Keywords: metaphor, idioms, executive control, eye movements, sentence reading, context**

### **INTRODUCTION**

Many instances of language incorporate metaphorical uses of words, some of which are familiar but some of which are unfamiliar. Consider *The students grasped the concept*, where the verb *grasp* refers to taking hold of something conceptually rather than physically (its literal interpretation). Such common metaphors may generally go unnoticed. Indeed, when familiarity is high, comprehension may simply proceed by retrieving this familiar metaphoric meaning from memory, in the same way comprehension normally proceeds for other types of figurative language, such as idioms (e.g., Libben and Titone, 2008; Titone et al., 2014). In contrast, consider *The textbooks snored on the desk* where *snore* means "to go unused," which is metaphorically related to its typical literal meaning, "the sound one makes when one is asleep." Here, the metaphorical meaning is not as familiar as it was for *grasp*, thus, the mental effort required to comprehend this sentence may increase because its intended metaphorical meaning must be generated in the moment (Kintsch, 2000; Kazmerski et al., 2003; Cardillo et al., 2012).

In this study, we investigated whether individual differences in general cognitive capacities, specifically domain-general executive control, relate to metaphor processing. Moreover, we examined this relationship as a function of metaphor familiarity and other factors relevant to on-line comprehension, such as prior contextual constraint, which may or may not force a metaphorical interpretation. We also investigated, for the same participants, whether a relationship between executive control and comprehension extends to another class of figurative language, idioms, which are likely to be more lexicalized than metaphors, and thus amenable to rapid retrieval from memory. As will be seen, our main conclusion is that individual differences in executive control are indeed important for metaphor processing, in a way that varies with familiarity and prior contextual support, and that potentially differs from idioms.

Most psycholinguistic studies of metaphor have investigated nominal metaphors, such as *My lawyer is a shark*, for which some semantic features of the vehicle *shark* (e.g., viciousness) but not others (e.g., marine animal) are attributed to the topic *lawyer*. While it is debated whether this process occurs through category attribution (Glucksberg, 2001, 2003), shared feature comparison (Bowdle and Gentner, 2005), or feature attribution via a multidimensional semantic network search (Kintsch, 2000, 2001; Kintsch and Bowles, 2002), virtually all agree that metaphor understanding depends on the accumulation of one's past experience with particular metaphoric forms—that is, their familiarity.

Indeed, all theoretical accounts would posit that metaphor comprehension should be faster and more accurate when metaphors are familiar than when they are unfamiliar. In the category attribution view (Glucksberg, 2001, 2003), this would arise because people directly retrieve familiar metaphoric features, and easily suppress irrelevant features. In the feature alignment view (Bowdle and Gentner, 2005), this would arise because familiar senses of metaphors tend to become integrated over time with the literal word (see also Kintsch, 2000). Consistent with these views, the figurative meanings of familiar vs. unfamiliar metaphors are primed more quickly (Blasko and Connine, 1993), are judged more quickly in phrasal classification tasks (Mashal and Faust, 2009; Goldstein et al., 2012), and undergo a less computationally intensive comparison process (Goldstein et al., 2012; Lai and Curran, 2013; Mashal, 2013). This leads to faster reading and reaction times (Blasko and Connine, 1993; Blasko and Briihl, 1997; Mashal and Faust, 2009; Lai and Curran, 2013), and increased accuracy (Goldstein et al., 2012; Mashal, 2013).

Moreover, some researchers have emphasized how metaphorical knowledge evolves over time by examining, for example, how unfamiliar metaphors can experimentally be made more familiar through repeated exposure (Cardillo et al., 2012; Goldstein et al., 2012). Similarly, some have posited that the figurative meanings of familiar metaphors become lexicalized over time (i.e., they turn into "dead" metaphors, Bowdle and Gentner, 2005), with their component words becoming more polysemous as familiarity increases over time (Glucksberg, 2001, 2003). For these reasons, our understanding of metaphor processing may relate to other work on single word polysemy or homonymy (e.g., Rayner and Frazier, 1989; Frazier and Rayner, 1990; Frisson and Pickering, 1999; Pickering and Frisson, 2001; Klepousniotou et al., 2008), where the crucial question is how selection occurs when multiple meanings are activated.

Of note, several studies have suggested that resolving lexical ambiguity requires increased executive or cognitive control compared to what is required for comprehending unambiguous words (e.g., Gernsbacher and Faust, 1991; Miyake et al., 1994; Gernsbacher and Robertson, 1995; Wagner and Gunter, 2004). Executive control refers to the cognitive skills that govern planning, working memory, and selective attention (Miyake et al., 2000; Karbach and Kray, 2009), which are thought to rely on intact frontal lobe function (e.g., Miyake et al., 2000; Braver et al., 2001). Gernsbacher and Faust (1991; see also Gernsbacher et al., 1990) showed that readers with low comprehension skill (a potential proxy for low executive control) were less capable of inhibiting inappropriate interpretations of lexically ambiguous words (e.g., deciding that *ace* is not related to *He dug with a spade*). Similar results were found in other work for comprehenders with low reading spans, often taken as a measure of working memory (Gunter et al., 2003; Wagner and Gunter, 2004). For example, Miyake et al. (1994) found that readers with low reading spans took longer to read late-occurring disambiguating contexts when the interpretation was unfamiliar or unexpected, suggesting that working memory was necessary to keep both interpretations active until later disambiguating information arrived. These studies are noteworthy in highlighting how a biased context can change what may be considered optimal within a particular comprehension situation. Accordingly, when a prior context is unbiased, the optimal comprehension strategy might be to maintain activation of multiple word meanings or senses in working memory until subsequent disambiguating information arrives. 
In contrast, when a prior context is biased, the optimal comprehension strategy might be to immediately select or commit to the contextually relevant interpretation of a word's meaning or sense (e.g., Frazier and Rayner, 1990; Frisson and Pickering, 1999).

Given the potential relation between general cognitive capacities and single-word ambiguity resolution, it is reasonable to expect that executive control should also be important for metaphor processing, and indeed, the literature provides some support for this hypothesis. With respect to metaphors and executive control specifically, Chiappe and Chiappe (2007) showed that people with better inhibitory skills (measured by reverse digit span) produced more accurate metaphor interpretations than those with lower skills. Similarly, Kazmerski et al. (2003) found that high-IQ participants (where IQ was correlated with both working memory and vocabulary performance) were more likely to automatically compute metaphorical meanings than low-IQ participants. High-IQ participants also gave better interpretations for metaphors in a subsequent task. In a neuroimaging study, Prat et al. (2012) found that individuals with low vocabulary and working memory performance showed greater activation in the right inferior and middle frontal gyri when processing nominal metaphors (e.g., *He is a prince*), especially those in biased vs. neutral contexts. These findings cohere with other evidence showing that individuals with executive control deficits (e.g., people with schizophrenia) have difficulties processing metaphors (e.g., Mashal et al., 2013).

The role of executive control may be especially important for unfamiliar metaphors. Consistent with this idea, Mashal et al. (2007) found that unfamiliar two-word metaphors (e.g., *sweet sleep*) led to greater neural activation in frontal brain regions (the left middle frontal gyrus, right inferior frontal gyrus, and right posterior superior temporal sulcus) compared to familiar metaphors and literal phrases. In another study, Mashal (2013) found that people with larger reverse digit spans had better recall, comprehension, and recognition for unfamiliar metaphors compared to unrelated word pairs. Such results are consistent with those of Gernsbacher and Robertson (1995) and Gernsbacher et al. (2001), showing that the need to actively suppress metaphor-irrelevant features in a behavioral task was critical for comprehension.

Thus, several sources of evidence suggest that executive control demands during metaphor processing should differ as a function of familiarity and prior context; however, several questions remain. First, while prior work has investigated how individual differences relate to metaphor processing, this work often conflates more than one kind of individual difference (e.g., reading span tasks, where performance is based on language processing, vocabulary knowledge, and working memory capacity). Indeed, the majority of tests in the literature have been verbal and language-based in nature (e.g., reading span), thus making it unclear whether domain-general aspects of executive control relate to performance. Second, although previous work has shown that executive control is necessary for resolving lexical ambiguity, to our knowledge no study has investigated how familiarity and context jointly influence executive control demands during metaphor processing. Finally, past studies have investigated metaphors presented in isolation (e.g., Mashal et al., 2007; Mashal and Faust, 2009; Goldstein et al., 2012; Mashal, 2013) or used secondary tasks, which could compromise the naturalness of comprehension (e.g., Kazmerski et al., 2003; Pierce et al., 2010; Goldstein et al., 2012; Lai and Curran, 2013).

The present study thus addresses some of these limitations in a sentence reading experiment where participants' eye movements are recorded as they naturally read sentences containing metaphors. With respect to assessing individual differences in executive control, we used a well-studied domain-general executive control task (AX-CPT, e.g., Braver et al., 2001). Specifically, we examined different aspects of the eye movement record to determine exactly when executive control is necessary for computing a metaphorical meaning during the time course of reading, and how that varies as a function of the familiarity of the metaphors in question, and of the degree of contextual support provided by the sentence.

To these ends, in Experiment 1, we created sentences containing metaphors that hinged on a metaphoric interpretation of individual verbs (i.e., predicate metaphors), as well as literal sentences using the same verbs. All sentences had the same structure consisting of subject noun, verb, disambiguating context and neutral ending (e.g., *The textbook snored on the desk at the end of the day*). Each sentence could also have a context word prior to the subject noun that biased either a metaphorical or a literal interpretation of the verb (e.g., *The unopened textbooks snored on the desk at the end of the day*). Our eye movement measures assessed how long people read the critical verb region in terms of first pass reading, and the whole metaphor region (i.e., noun + verb) in terms of total reading time, thus incorporating fixations occurring after readers had encountered a later disambiguating context (which should indicate whether the verb should have been interpreted metaphorically or literally). This allowed us to construct a time course ranging from early to late, enabling us to assess whether individual differences in executive control differentially related to different points of this time course.

When there was no prior context, we expected that readers would delay committing to a metaphorical interpretation of the verb at the point of the verb (e.g., *snored*), as has been found in prior work on polysemous verbs (Pickering and Frisson, 2001). However, we generally expected that a prior context that biased a specific interpretation of the verb (e.g., *unopened textbooks*) would cause readers to commit to the contextually appropriate interpretation. Of note, we expected that these general effects would be modulated by both metaphor familiarity and individual differences in executive control.

In Experiment 2, we extended the results of Experiment 1 by examining another class of figurative language that is likely to be more lexicalized or familiar than metaphors—idiomatic expressions. Idioms have whole meanings that go beyond the combination of the literal meanings (i.e., *kick the bucket* is not related to the act of kicking nor to a pail), and can be accessed from memory as a single lexical item, while also activating the lexical meanings of the component words (Libben and Titone, 2008). Idioms therefore have meaning ambiguity at the word level, like metaphors, but occur in more predictable word configurations than metaphors, thus functioning to a greater extent than metaphors as highly familiar lexicalized entities. Thus, we generally expected that idioms, unlike metaphors, would not show a strong relation to individual differences in executive control.

## **EXPERIMENT 1: METAPHOR PROCESSING AND EXECUTIVE CONTROL**

## **METHOD**

### *Participants*

Thirty-six native speakers of English participated for course credit or compensation of \$10/h. All participants were from the McGill or Montreal community and had normal or corrected-to-normal vision and no self-reported history of speech or hearing disorders. Their mean age was 22.74 years (*SD* = 2.73), and they had a mean of 16.3 years of education (*SD* = 2.0).

### *Stimuli*

We created sentences containing metaphors and literal sentences of the type described above, e.g., *The textbook snored on the desk at the end of the day; The sailor snored in the hammock at the end of the day; The unopened textbooks snored on the desk at the end of the day; The tired sailor snored in the hammock at the end of the day.*

The stimulus set consisted of 256 sentences, which included 64 unique verbs, taken from a larger set of metaphors developed by Cardillo et al. (2010). Cardillo et al. normed these verbs in their metaphorical or literal sentences for literalness, figurativeness, plausibility, naturalness, imageability, frequency and interpretability. We modified the Cardillo et al. sentences by adding a neutral continuation (e.g., *The textbook snored on the desk at the end of the day*). This ensured that neither the verb nor the disambiguating context region was sentence-final. We also modified the sentences by presenting them in two conditions: With an adjective providing context prior to the topic noun of the sentence, or without a prior context (see **Table 1**).

Because we modified the original Cardillo et al. (2010) sentences, we conducted our own normative study to assess familiarity of the literal and metaphorical uses of the verbs. We asked 23 native English speakers (none of whom participated in the sentence reading task) to rate how familiar the verb was on a seven-point Likert scale (1 = *Not at all familiar*, 7 = *Very highly familiar*) for its use (literal or metaphor) in its specific context. The surveys contained sentences for the full set of 64 verbs, but were divided into two versions so that participants only rated either the literal or metaphorical use of each verb. Each survey was presented in one of two pseudo-randomly ordered lists. Thus, 12 participants rated one version, and 11 participants rated a second version. We then calculated a global familiarity score for each verb by creating a ratio of average metaphoric to literal ratings. Across items, this ratio ranged from 0.71 to 1.19 (mean = 0.98), where a value of 1 indicated that the metaphorical and literal senses of the verb were equally familiar, values greater than 1 indicated that the metaphorical sense was more familiar than the literal sense, and values less than 1 indicated that the metaphorical sense was less familiar than the literal sense. This ratio allowed us to determine relative metaphor vs. literal familiarity.
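The familiarity ratio described above can be illustrated with a short sketch. This is not the authors' script: the function name `familiarity_ratio` and the example ratings are hypothetical, and the only assumption taken from the text is that the score is the mean metaphoric rating divided by the mean literal rating for a given verb.

```python
from statistics import mean


def familiarity_ratio(metaphor_ratings, literal_ratings):
    """Ratio of mean metaphoric to mean literal familiarity for one verb.

    Values above 1 mean the metaphorical sense was rated as more familiar
    than the literal sense; values below 1 mean the opposite.
    """
    if not metaphor_ratings or not literal_ratings:
        raise ValueError("each verb needs at least one rating per sense")
    return mean(metaphor_ratings) / mean(literal_ratings)


# Hypothetical seven-point ratings for a single verb (not data from the study).
ratio = familiarity_ratio([5, 6, 4, 5], [6, 6, 5, 7])
print(round(ratio, 2))
```

Because the two senses of each verb were rated by different participants, the ratio is computed over group means rather than within raters.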

### *Apparatus*

We used an Eye-Link 1000 tower mounted system (SR-Research™, Ontario, Canada) that sampled eye position every millisecond. Viewing was binocular but eye movements were recorded from the right eye only, using a head rest. Stimuli were presented on a 21-inch ViewSonic CRT monitor with a screen resolution of 1024 × 768 pixels, using EyeTrack 7.10 software developed at UMass Amherst (blogs.umass.edu/eyelab/software). Text was presented on a single line in yellow 10-point Monaco font on a black background. Three characters subtended approximately 1° of visual angle.

### *Procedure*

The research was carried out with the approval of the McGill University Research Ethics Board. Participants completed a language background questionnaire before the reading task. Eye movements were calibrated using a nine-point grid. The verb-context pairings were presented once in each of six counterbalanced lists, such that if a participant viewed the metaphor *The textbook snored on the desk at the end of the day*, s/he would not see the same sentence with the added adjective (*The unopened textbook snored on the desk at the end of the day*); nor would s/he see the literal counterparts of the metaphor stimuli in that list (*The [tired] sailor snored in the hammock at the end of the day*). No participant saw the same metaphor or literal sentence more than once.

In addition to the experimental sentences, participants also read 16 practice sentences, for a total of 80 stimulus sentences in each list, and 54 trials belonging to a second experiment (see Experiment 2). All stimuli were randomly presented. Practice sentences could be figurative or literal. Eight occurred at the beginning of the reading task and eight occurred after a rest break at the midway point. Twenty-two percent of trials were followed by yes-no comprehension questions.

After the main sentence reading task, participants completed an executive control task, the AX-CPT (Braver et al., 2001). This task uses letter stimuli, but as they are symbolic and not dependent on language processing, the task is domain-general. In this task, participants saw letters one at a time in the center of the screen, and were instructed to press one button when an "X" immediately followed an "A," and to press another button for all other trials. "AX" target trials occurred in 70% of all trials (total trials = 430), and the remaining 30% of trials were divided equally among three non-target letter combinations (10% each). Here, "B" stands for any letter that is not "A," and "Y" stands for any letter that is not "X." The easiest non-target condition was "BY," which provides a baseline for comparison with the other non-target trials. Our measure of interest was based on the "BX" trials because encountering the "X" on these trials would trigger a pre-potent tendency to push the button indicated for target "AX" responses rather than non-target responses. Overcoming this tendency is thought to require reactive control. Because of the *a priori* similarity between the processes involved in reactive control and what we expect to be required during metaphor interpretation (i.e., suppressing a pre-potent tendency to interpret the words of a metaphor literally), we derived a cost score for each participant by subtracting the average correct reaction time (in milliseconds) for BY trials from the average correct reaction time for BX trials.
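A minimal sketch of the cost score follows, assuming trial-level records with a trial type, an accuracy flag, and a reaction time. The function name `bx_cost`, the tuple layout, and the sample trials are all hypothetical; only the BX minus BY definition over correct trials comes from the text.

```python
from statistics import mean


def bx_cost(trials):
    """BX minus BY mean correct reaction time (ms) for one participant.

    `trials` is a list of (trial_type, correct, rt_ms) tuples.
    Larger values indicate a larger cost on BX trials.
    """
    def mean_correct_rt(trial_type):
        rts = [rt for t, correct, rt in trials if t == trial_type and correct]
        if not rts:
            raise ValueError(f"no correct {trial_type} trials")
        return mean(rts)

    return mean_correct_rt("BX") - mean_correct_rt("BY")


# Hypothetical trials for one participant (not data from the study).
trials = [
    ("AX", True, 410),
    ("BX", True, 520),
    ("BX", False, 300),  # incorrect response, excluded from the mean
    ("BX", True, 480),
    ("BY", True, 400),
    ("BY", True, 420),
]
cost = bx_cost(trials)
print(cost)
```

Restricting the means to correct trials keeps fast erroneous "target" responses on BX trials from shrinking the apparent cost.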

### **RESULTS**

Overall comprehension question accuracy was 96.4%, indicating that participants performed the language task well. Eye movement data were analyzed using linear mixed effects (LME) models (lme4 package, version 0.999999-2; Bates et al., 2013, in the R Project for Statistical Computing environment, version 3.0.2; R Development Core Team, 2013). One important reason for using LME over traditional statistics is that it allows us to investigate continuous variables that are based on subject-related differences (e.g., executive control costs) and item-related differences (e.g., familiarity ratings for metaphors or idioms). This kind of analysis cannot be easily accomplished using traditional ANOVA (see Baayen et al., 2008, for a more detailed account of the rationale for using LME). To index early cognitive processes, such as lexical access, and later cognitive processes, such as semantic integration (Rayner, 1998, 2009; Rayner et al., 2012), we analyzed gaze duration (the sum of all fixation durations during the first pass) of the verb (Verb GD), and total reading time (the sum of all fixation durations) of the whole metaphor region (Metaphor TRT), respectively. Thus, for the sentence *The [unopened] textbook snored on the desk at the end of the day*, we analyzed Verb GD for *snored*, and Metaphor TRT for *textbook snored*.

**Table 2 | Means and standard deviations for median split familiarity and executive function in Experiment 1.**
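The two reading-time measures defined above can be sketched from an ordered fixation record. This is a simplified illustration, not the EyeTrack pipeline used in the study: the region labels, function names, and fixation sequence are hypothetical, and the gaze duration function does not check whether the region was skipped on the first pass.

```python
def gaze_duration(fixations, region):
    """Sum of fixation durations on `region` from first entry until the eyes first leave it."""
    total = 0
    entered = False
    for reg, dur in fixations:
        if reg == region:
            entered = True
            total += dur
        elif entered:
            break  # first pass on the region has ended
    return total


def total_reading_time(fixations, regions):
    """Sum of all fixation durations on any of `regions`, including rereading."""
    return sum(dur for reg, dur in fixations if reg in regions)


# Hypothetical fixation sequence: (region, duration in ms), in temporal order.
fixations = [
    ("noun", 210), ("verb", 250), ("verb", 120),
    ("context", 300), ("verb", 180), ("noun", 150),
]
verb_gd = gaze_duration(fixations, "verb")
metaphor_trt = total_reading_time(fixations, {"noun", "verb"})
print(verb_gd, metaphor_trt)
```

In this example the later refixation of the verb counts toward total reading time but not toward gaze duration, which is why the two measures can index earlier and later stages of processing.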

We fit LME models to each eye movement measure. In each model, familiarity (i.e., metaphor/literal familiarity ratio; continuous), executive control (continuous), context (with or without prior context), and condition (metaphor or literal) were fixed effects. Categorical predictors were deviation coded except where noted otherwise, and all continuous predictors were scaled to reduce collinearity. Maximum correlations among main effects were < 0.16 for each main model. Subjects and items (sentences) were random intercepts across the models; random slopes were included in models only when they were statistically warranted (cf. Baayen et al., 2008; Barr et al., 2013). In addition, for consistency across models, we computed *p*-values using the number of model terms minus one for the degrees of freedom. All model formulae were near-identical in that they included a four-way interaction term for familiarity ratio × executive control × context × condition. For ease of data interpretation, we present the means and standard deviations for all continuous factors in **Table 2**, with familiarity ratio and executive control median split (recall that they were treated as continuous variables in all models).
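The predictor preparation described above can be sketched as follows. The exact contrast values used in the study are not reported, so the ±0.5 deviation codes are an assumption, as are the function names; the scaling step assumes centering on the mean and dividing by the sample standard deviation.

```python
from statistics import mean, stdev


def deviation_code(values, first_level):
    """Code a two-level factor as -0.5 / +0.5 (assumed contrast values)."""
    return [-0.5 if v == first_level else 0.5 for v in values]


def scale(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical trial-level predictors (not data from the study).
condition = deviation_code(["literal", "metaphor", "metaphor"], "literal")
cost_z = scale([1, 2, 3])
print(condition, cost_z)
```

With centered predictors, each lower-order term in a model with interactions is estimated at the average of the other predictors, which is what keeps correlations among the main effects low.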

### *Verb GD*

We removed extreme outliers (Verb GD < 80 ms or > 2000 ms) from the dataset, retaining 94.3% of observations. Stepwise log likelihood model comparisons showed that by-subject and by-item random slopes were not warranted for categorical variables in this model. Subject-averaged (F1) means broken down by metaphor condition, familiarity ratio, and context are presented in **Figure 1**. The full model is presented in **Table 3**.
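The trimming rule can be expressed as a small filter. The function name and the example durations are hypothetical; the 80 ms and 2000 ms cutoffs are the ones stated for Verb GD.

```python
def trim_outliers(durations_ms, lower=80, upper=2000):
    """Drop durations below `lower` or above `upper`; return kept values and proportion retained."""
    kept = [d for d in durations_ms if lower <= d <= upper]
    retained = len(kept) / len(durations_ms) if durations_ms else 0.0
    return kept, retained


# Hypothetical gaze durations in ms (not data from the study).
kept, retained = trim_outliers([60, 80, 250, 2000, 2100])
print(kept, retained)
```

The same function would cover the Metaphor TRT trimming by passing `upper=4000`.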

We found a significant interaction between condition and familiarity ratio (*b* = −0.04, *SE* = 0.02, *p* < 0.05), indicating that verbs used in a metaphorical sentence had longer gaze durations than the same verbs used in a literal sentence to the extent that they were low in familiarity. To further assess the source and direction of this interaction, we computed sub-models where the data were median split into high and low metaphor-literal familiarity ratios. Readers' Verb GD were longer for low familiar verbs in metaphor sentences (e.g., *The textbook snored on the desk at the end of the day*) compared to low familiar verbs in literal sentences (e.g., *The sailor snored in the hammock at the end of the day*) (*b* = 0.08, *SE* = 0.02, *p* < 0.001); there were no significant effects for high familiar metaphor verbs. Thus, it is likely that readers considered high familiarity ratio verbs to be ambiguous in terms of their metaphoric vs. literal uses, leading to no differences in Verb GD on the verbs as a function of whether they were intended metaphorically or literally.

**FIGURE 1 | Context and familiarity subject-averaged (F1) mean reading times (ms) for Verb GD.** Error bars show standard error of the mean.

**Table 3 | Effect sizes (*b*), standard errors (*SE*), and *p*-values for the verb gaze duration logistic LME model.**

*\*p ≤ 0.05.*

We also found an interaction between executive control and presence of a prior context, which did not interact with condition (*b* = −0.04, *SE* = 0.02, *p* < 0.05). This effect indicated that readers with higher executive control read verbs more slowly when there was a prior context, across both metaphorical and literal sentences; in contrast, readers with low executive control showed no difference in Verb GD as a function of prior context (see the partial effects plot in **Figure 2**). This suggests that readers with high executive control expended more effort to commit to a particular interpretation of the verb at the point of the verb, while readers with low executive control did not.

### *Metaphor TRT*

The models fit to Metaphor TRT included a covariate for character length (continuous) because metaphor length, unlike verb length, varied across sentences with metaphoric vs. literal verb use (e.g., *model flitted* vs. *butterfly flitted*). Extreme outliers were once again removed (Metaphor TRT < 80 ms or > 4000 ms), leaving 94.5% of the observations. Log likelihood model comparisons showed that by-subject and by-item random slopes were warranted for condition and prior context and were thus included. Subject-averaged (F1) means broken down by metaphor condition, familiarity ratio, and context are presented in **Figure 3**.

As seen in **Table 4**, there was a two-way interaction between context and condition (*b* = −0.09, *SE* = 0.05, *p* = 0.05), and a three-way interaction between condition, context, and executive control (*b* = 0.08, *SE* = 0.04, *p* = 0.05; see the partial effects plot in **Figure 4**). To determine the source of the three-way interaction, we ran sub-models split by whether or not there was a prior context. In the model fit to the data without a prior context, there was only a main effect of condition (*b* = 0.13, *SE* = 0.04, *p* < 0.001). In the model fit to the data with a prior context, there was a trend for an interaction between condition and executive control which did not reach significance (*b* = 0.04, *SE* = 0.03, *p* = 0.11). In addition to these sub-models, we also ran sub-models split by condition (i.e., metaphor or literal sentences); the model for the metaphor sentences showed a main effect for context (*b* = −0.08, *SE* = 0.04, *p* = 0.05).

**FIGURE 3 | Context and familiarity subject-averaged (F1) mean reading times (ms) for Metaphor TRT.** Error bars show standard error of the mean.

Taken together, the trends observed in the follow-up analyses suggest that the best interpretation of the original three-way interaction is that participants with high executive control read metaphors as quickly as literal sentences when there was a prior supportive context. In all other cases, participants read metaphors more slowly than literal sentences. This Metaphor TRT finding is compatible with the one reported above for Verb GD: together, they suggest that participants with high executive control spent more time reading the verb following a prior context, and thus that their efforts toward contextual integration occurred earlier than those of participants with low executive control.

### *Regression probability into prior context*

Given the differences between readers with high and low executive control in Metaphor TRT as a function of prior context, we wished to determine whether and how executive control related to how people read the prior context word itself (e.g., *unopened* in *unopened textbooks snored*). We thus calculated the probability that readers would regress into the prior context word at any point while reading the sentence, and analyzed these data using a generalized LME model. As the model only evaluates data from sentences that had prior contexts, it only included a three-way interaction term for condition × executive control × familiarity ratio, unlike the verb and metaphor region models above. Log likelihood model comparisons showed that random slopes were not warranted in this model.
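As a rough illustration of the dependent measure, the sketch below flags a trial as containing a regression into the context word when that word is fixated after any word to its right has been fixated. The word-position encoding, the function names, and the example trials are hypothetical. The proportion it computes is descriptive only; the analysis reported here modeled the trial-level outcome with a generalized LME model.

```python
def regressed_into(fixated_positions, target):
    """True if `target` was fixated after any word to its right had been fixated."""
    seen_later = False
    for pos in fixated_positions:
        if pos > target:
            seen_later = True
        elif pos == target and seen_later:
            return True
    return False


def regression_probability(trials, target):
    """Proportion of trials with at least one regression into `target`."""
    flags = [regressed_into(t, target) for t in trials]
    return sum(flags) / len(flags)


# Hypothetical fixation sequences as word positions; position 1 is the context word.
trials = [
    [0, 1, 2, 3, 1, 4],  # returns to the context word after reading ahead
    [0, 1, 2, 3, 4],     # no regression
    [1, 2, 1, 3],        # regression from the noun
    [0, 2, 3],           # context word skipped, never regressed into
]
prob = regression_probability(trials, target=1)
print(prob)
```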

As seen in **Table 5**, and consistent with the idea that low executive control readers semantically committed to a particular context-driven interpretation of the verb after the first pass, we found a significant interaction for condition × executive control (*b* = 0.31, *SE* = 0.16, *p* = 0.05). This interaction indicated that readers with low executive control had a considerably higher probability of regressing into the prior context, particularly when they were reading a metaphor sentence rather than a literal sentence (i.e., *textbooks snored* rather than *sailors snored*; see the partial effects plot in **Figure 5**).



### **DISCUSSION**

In Experiment 1, we examined whether domain-general executive control related to how people read verbs used metaphorically or literally as a function of familiarity and prior context. We found that readers with high but not low executive control took the prior context into account at the point of the verb on the first pass: They exhibited longer Verb GD when a prior context occurred, irrespective of metaphor familiarity. Interestingly, although familiarity speeded Verb GD generally, this general facilitative effect of familiarity did not interact with prior context or executive control.

With respect to later reading measures, however, executive control did interact with condition and context. In terms of Metaphor TRT, people with high executive control showed longer reading times for metaphorical vs. literally intended verbs when there was a prior context. In contrast, people with low executive control did not show this difference; rather, they were slower across the board for metaphors. Readers with low executive control were also more likely than those with high executive control to regress back into the context word, suggesting that they had to work harder to make sense of the sentence after it had been fully read and the intended meaning became clear. Thus, when the context biased a particular interpretation (e.g., *unopened textbooks snored*), people with high but not low executive control spent more time reading the metaphorical verb, presumably to semantically commit to the contextually appropriate interpretation on the first pass. Further, they spent less time either rereading the metaphorical regions of the sentence (e.g., *textbooks snored*) or regressing to the biased context word (e.g., *unopened*). Consequently, readers with high executive control displayed a more efficient reading strategy by integrating contextual cues as they occurred on the first pass, whereas readers with low executive control were less likely to do so.

**FIGURE 4 | Metaphor TRT partial effects as a function of condition and executive control in sentences with (A) No Prior Context and with (B) Prior Context after removing the effects of familiarity and noun length, and between-subject and between-item variance.** Error bands show 95% confidence intervals.

*\*p ≤ 0.05.*

While the overall pattern of metaphor data is relatively clear, one open question is whether a similar pattern of executive function interactions occurs for other forms of figurative language, such as idiomatic expressions, that have fewer on-line comprehension demands. Idiomatic expressions, like metaphors, have figurative meanings that can be more or less familiar (e.g., *kick the bucket*, which is familiar in English and figuratively means "to die"; *bore his cross*, which is known but less familiar in English and figuratively means to "accept one's burden in life"). However, unlike the metaphors used in Experiment 1, the component words of idioms have a high likelihood of co-occurring, independent of what meaning is intended (Wulff, 2008). The implication of this difference between idioms and metaphors is that encountering the initial words of an idiom (e.g., *kick the...*) may enable people to strongly anticipate their completion (e.g., *bucket*), particularly when idioms are highly familiar. This early anticipation of idiom-final words might in turn enable a head start on semantic processing (Cacciari and Tabossi, 1988; Titone and Connine, 1994), such that the interpretive demands faced when one ultimately encounters an idiom-final word are reduced.

In this way, idioms might differ from the situation engendered by metaphors (particularly those included in Experiment 1), where there is no basis upon which to anticipate a figuratively biased verb at a lexical level (e.g., *The unopened textbooks snored*), even in the condition where the context is semantically consistent with a metaphorical interpretation of the verb.

Thus, in Experiment 2, we examined whether the presumed lexical boost afforded idioms compared to metaphors would reduce the overall demands of comprehension, and result in a pattern of data where the same participants in Experiment 1, who showed executive control dependencies for metaphors, would fail to show such dependencies for idioms randomly interspersed in the same experimental set. Of note, because the set of idiomatic sentences included in Experiment 2 was not initially intended to serve this purpose, the two sets are not perfectly comparable in a point-for-point sense; thus, it is not readily possible to statistically compare differences across Experiments 1 and 2. However, qualitative comparison of data patterns across Experiments 1 and 2 may be useful for informing future research efforts that directly compare metaphors to idioms using methods and materials deliberately intended to do so.

## **EXPERIMENT 2: IDIOM PROCESSING AND EXECUTIVE CONTROL**

## **METHOD**

### *Participants*

The participants were the same individuals who completed Experiment 1.

### *Stimuli*

We created sentences containing idioms that all had the same verb-determiner-noun structure: Subject noun, verb, determiner, object noun, and disambiguating context (*Roxy bit her lip and tried to keep the plans for the surprise party a secret*). As in Experiment 1, we had two regions of interest, Idiom-final noun GD and Idiom TRT, the former reflecting early or first-pass comprehension, and the latter reflecting later or second-pass comprehension.

Each sentence contained an idiom or matched literal phrase, followed by a disambiguating context, which forced a particular interpretation of the idiom. There were three conditions, as seen in **Table 6** below. In one condition, idioms were followed by a context that biased the idiom's figurative meaning (Id-Id). In a second condition, idioms were followed by a context that biased the idiom's literal meaning (Id-Lit). In the control condition, a matched literal phrase was always followed by a literal context (Lit-Lit). The stimulus set consisted of 54 idioms that were selected from a larger set of well-normed idioms (Libben and Titone, 2008), which included familiarity ratings on a 5-point scale (1 = *I never or almost never encounter the idiom*, and 5 = *I encounter the idiom frequently*).

**Table 6 | Example low and high familiar sentences from idiom-idiom, idiom-literal, and literal-literal conditions.**

**Table 7 | Means and standard deviations for familiarity and executive function in Experiment 2, split by median.**

### *Apparatus*

Same as Experiment 1.

### *Procedure*

The procedure was identical to Experiment 1 because the idiom sentences analyzed here were randomly intermixed in the metaphor set reported in Experiment 1. For the idiom sentences, participants viewed one of six counterbalanced lists. There were 54 target sentences in each list. An idiom or its literal control, but not both, appeared once in a given list in only one condition. Thus, if a participant viewed the idiom *Roxy bit her lip and tried to keep the plans for the surprise party a secret* in the Id-Id condition, s/he would not see that idiom in the Id-Lit condition (*Roxy bit her lip as she rushed through breakfast in a hurry to get to school*), or its matched literal control in the Lit-Lit condition (*Roxy cut her lip as she rushed through breakfast in a hurry to get to school*). No participant saw any sentence more than once.
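The counterbalancing constraint described above (each item appears once per list, in exactly one condition) can be illustrated with a small sketch. It assumes a plain Latin-square rotation over the three conditions, which yields three lists; the text reports six lists but does not say what distinguished the additional ones, so that part is not reproduced here. The names `CONDITIONS`, `N_ITEMS`, and `build_lists` are hypothetical and are not taken from the authors' materials.

```python
# Hypothetical sketch of Latin-square counterbalancing; not the authors' script.
CONDITIONS = ("Id-Id", "Id-Lit", "Lit-Lit")
N_ITEMS = 54  # number of idiom items reported in the text


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one {item_index: condition} mapping per list.

    Each item appears once per list, and its condition rotates across lists,
    so a participant assigned to one list never sees an item in two conditions.
    """
    n_cond = len(conditions)
    return [
        {item: conditions[(item + shift) % n_cond] for item in range(n_items)}
        for shift in range(n_cond)
    ]


lists = build_lists()
```

Under this rotation every list contains 18 items per condition, and across the three lists every item is seen in all three conditions.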

### **RESULTS**

Overall comprehension accuracy was 94.5%, indicating that participants were attentive during the experiment.

The same fixed effect structure was applied to each eye movement measure (i.e., Noun GD and Idiom TRT). The fixed effect structure included a three-way interaction term for familiarity (continuous) × executive control (continuous) × condition (Id-Id, Id-Lit or Lit-Lit; deviation coded). As in Experiment 1, the continuous predictors were scaled, maximum correlations (all < 0.28) showed minimal effects of collinearity, and random intercepts were included for subjects and items. For ease of data interpretation, we present the means and standard deviations for all continuous factors in **Table 7**, with familiarity categorized into high and low with a median split, and the executive control means and standard deviations repeated from Experiment 1.
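The predictor preparation described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code: it assumes that "scaled" means z-scoring and that "deviation coded" means sum-to-zero contrasts, and the choice of which level receives the −1 row is likewise an assumption. The helper names and the ratings are made up for illustration.

```python
from statistics import mean, median, stdev


def scale(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Deviation (sum-to-zero) contrasts for the three-level condition factor.
# Which level receives the -1 row is an assumption, not stated in the text.
DEVIATION_CODES = {
    "Id-Id": (1, 0),
    "Id-Lit": (0, 1),
    "Lit-Lit": (-1, -1),
}


def median_split(values):
    """Label each value 'high' or 'low' relative to the median (descriptive use only)."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


# Made-up familiarity ratings on the 1-5 scale, for illustration only.
ratings = [1.8, 2.4, 3.1, 3.9, 4.6]
scaled = scale(ratings)
labels = median_split(ratings)
```

The median split is used only for presenting descriptive statistics; the models themselves treat familiarity and executive control as continuous predictors.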

### *Noun GD*

There were no significant interactions or main effects of idiomatic condition. Thus, idiom-final words were read equally fast in all experimental conditions. As well, there were no interactions with executive control.

### *Idiom TRT*

A covariate was added for idiom length. We removed extreme outliers (Idiom TRT < 80 ms or > 4000 ms), retaining 95.13% of the total observations. Log likelihood model comparisons showed that by-subject and by-item random slopes were supported for condition in the model. Subject-averaged (F1) means broken down by condition and familiarity are presented in **Figure 6**.
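The outlier criterion above amounts to a simple range filter. The sketch below is illustrative only: `trim_trt` is a hypothetical helper, and the reading times are made-up values rather than data from the study.

```python
def trim_trt(times_ms, low=80, high=4000):
    """Drop total reading times below `low` or above `high` (both in ms).

    Returns the retained values and the proportion of observations retained.
    """
    kept = [t for t in times_ms if low <= t <= high]
    return kept, len(kept) / len(times_ms)


# Made-up reading times in ms, for illustration only.
toy_times = [60, 80, 350, 1200, 4000, 4500]
kept, retained = trim_trt(toy_times)
```

Values exactly at the 80 ms and 4000 ms bounds are kept, since the stated rule removes only observations strictly below or above them.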

As seen in **Table 8**, we found a significant interaction of condition by familiarity (*b* = −0.12, *SE* = 0.04, *p* < 0.05), indicating that readers had shorter total reading times for high familiar Id-Id phrases (e.g., *Roxy bit her lip and tried to keep the plans for the surprise party a secret*) compared to their reading times for low familiar Id-Id phrases (e.g., *Josh bore his cross the entire flight and didn't complain about the snoring man*) (see the partial effects plot in **Figure 7**). There was no such interaction with familiarity for the Id-Lit contrast in this model. Moreover, a treatment coded model with Lit-Lit as the baseline showed an interaction with familiarity for Lit-Lit vs. Id-Id (*b* = −0.12, *SE* = 0.04, *p* < 0.05), but not for Lit-Lit vs. Id-Lit (*p* > 0.20), whereas a relevelled model with Id-Id as the baseline showed a trend for an interaction with familiarity for Id-Id vs. Id-Lit (*b* = 0.06, *SE* = 0.03, *p* = 0.06). These interactions are shown in **Figure 7**. No other effect was significant.
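The difference between the two treatment coded models comes down to which condition is coded as all zeros. The following sketch shows how relevelling changes the dummy variables and, with them, the pairwise comparisons that each coefficient expresses. It is a generic illustration of treatment coding with a hypothetical helper, `treatment_codes`, and does not reproduce the authors' modelling code.

```python
LEVELS = ("Id-Id", "Id-Lit", "Lit-Lit")


def treatment_codes(levels, baseline):
    """Return {level: dummy tuple}; the baseline level is coded all zeros.

    Each dummy column compares one non-baseline level against the baseline.
    """
    others = [lv for lv in levels if lv != baseline]
    return {
        lv: tuple(1 if lv == other else 0 for other in others)
        for lv in levels
    }


# Baseline Lit-Lit: columns express Id-Id vs. Lit-Lit and Id-Lit vs. Lit-Lit.
lit_baseline = treatment_codes(LEVELS, "Lit-Lit")
# Baseline Id-Id: columns express Id-Lit vs. Id-Id and Lit-Lit vs. Id-Id.
idid_baseline = treatment_codes(LEVELS, "Id-Id")
```

With Lit-Lit as the baseline, no coefficient directly compares Id-Id with Id-Lit, which is why a relevelled model is needed to test that contrast.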

To better locate the source of the familiarity interactions, we computed treatment coded sub-models split into low and high familiarity sentences with Id-Id as the baseline to compare Id-Id vs. Id-Lit and Lit-Lit (since the significant differences in the preceding analyses only involved Id-Id). The model fit to high familiar phrases showed that readers had faster Idiom TRT for Id-Id sentences than for Lit-Lit sentences (*b* = 0.16, *SE* = 0.06, *p* < 0.05). No other effects were significant. Thus, of note, there were no interactions with executive control.

### **DISCUSSION**

Experiment 2 investigated whether idiom processing was modulated by individual differences in executive control for the same participants tested in Experiment 1. Our results show that high familiar idioms had shorter total reading times than matched literal phrases when the sentence was ultimately biased toward an idiomatic interpretation. Finally, of relevance to our question of interest, individual differences in executive control never interacted with any reading measure.

**Table 8 | Effect sizes (*b*), standard errors (*SE*), and *Pr*(>|*t*|) values for the idiom total reading time logistic LME model.**

*\*p ≤ 0.05.*


### **GENERAL DISCUSSION**

We used eye movement measures of sentence reading to determine whether familiarity and context modulate metaphor processing as a function of individual differences in executive control. We also assessed how the relationship between individual differences in executive control and comprehension extended qualitatively to idiom processing for the same participants. There were three key findings.

The first key finding was that, in Experiment 1, relative familiarity of a metaphorical vs. literal interpretation of the verb modulated how much time people spent reading the verb on the first pass (see **Figure 1**). Of note, this effect occurred irrespective of prior context or individual differences in executive control. Accordingly, when metaphor familiarity was low, Verb GD was slower for verbs intended metaphorically than for verbs intended literally. This difference decreased as metaphor familiarity increased (relative to familiarity of the verb's literal interpretation). This suggests that when people encounter verbs intended metaphorically, immediate comprehension is slowed if the metaphorical meaning of the verb is less familiar. The slowing of immediate comprehension potentially reflects some combination of the time necessary for inhibiting the more familiar literal interpretation of the verb, and for generating or retrieving from memory the verb's metaphorical sense.


Our second key finding involved metaphors and executive control in Experiment 1. Specifically, when people encountered a metaphorically or literally intended verb following a prior context that supported the intended interpretation of that verb, readers with high executive control spent more time fixating the verb on the first pass, presumably to immediately integrate the appropriate meaning with the prior context. In contrast, readers with low executive control did not spend extra time fixating the verb under the same circumstances. They consequently experienced comprehension difficulty later on in the sentence, as indicated by longer total reading times of the metaphor region, and a higher likelihood of regressing back to the context word, particularly in the metaphorically biased condition. Thus, these results suggest that high and low executive control readers differed in the rapidity with which they used context to interpret the verb on the first pass, and this difference propagated to later portions of the sentence: High executive control readers made immediate semantic commitments, whereas low executive control readers delayed their semantic commitments pending subsequent disambiguating parts of the sentence (see also Frazier and Rayner, 1990; Pickering and Frisson, 2001, for related work on single-word lexical ambiguity).

Our third key finding involves the role of executive control in idiom processing (Experiment 2). Unlike the global pattern found for metaphor processing, individual differences in executive control did not modulate reading times for idioms in Experiment 2, despite the fact that idiom reading times were affected by idiom familiarity (albeit measured in a very different way than it was for metaphors in Experiment 1). Specifically, high familiar idioms in sentences that had a subsequent idiom context had shorter total reading times for the idiom region than both matched controls and low familiar idioms. This suggests that when an idiom was familiar, people were more likely to entertain its figurative meaning and consequently were less likely to revisit the idiom on the second pass, presumably because the initial semantic commitment made to the figurative interpretation of the phrase was confirmed by subsequent context.

Of note, figurative condition had no significant first-pass effects for gaze duration on idiom-final nouns, unlike the pattern found for metaphor-final verbs, where all readers showed longer gaze durations when there was no prior context and the verb was figuratively intended. This suggests that the global effort needed to resolve a semantic commitment on the first pass is generally reduced for idioms compared to metaphors, perhaps due to the fact that relatively common idiomatic expressions enjoy a lexical boost due to the high co-occurrence of their component words (Wulff, 2008). Thus, it is possible that idioms are so thoroughly lexicalized for native speakers of a language (even ones that are rated as low familiar) that people can partially anticipate their final words and thus get a head start on processing those words lexically and semantically (Cacciari and Tabossi, 1988; Titone and Connine, 1994; Libben and Titone, 2008). This state of affairs may be especially true for natural reading contexts, such as in the current study, where early anticipation of idiom-final words may be enhanced by some amount of parafoveal processing of upcoming words (Kliegl et al., 2007; Hohenstein et al., 2010; Angele et al., 2013). While parafoveal processing of words was also certainly possible for the metaphors in Experiment 1, these metaphors do not likely enjoy the same lexicalized, collocation status as idioms. Thus, readers may not have been as able to extract useful parafoveal information for metaphors that would enable a meaningful head start on processing.

In summary, the results suggest that general cognitive capacities, such as executive control, are important for processing metaphors during natural sentence reading. The results also suggest that not all elements of figurative language may incur the same executive control demands as metaphors. Specifically, executive control demands for idioms during natural reading may differ because idioms are generally more familiar both lexically and semantically compared to metaphorical language. Thus, the results of the present study, while preliminary, suggest that further comparison of metaphors and idioms is a potentially fruitful avenue of inquiry.

### **AUTHOR CONTRIBUTIONS**

Debra Titone designed the study, Marilena Côté-Lecaldare helped create the materials, Naveed A. Sheikh programmed the experiment, and Marilena Côté-Lecaldare, Katja Häuser and Georgie Columbus acquired the data. Debra Titone, Georgie Columbus, Marilena Côté-Lecaldare and Naveed A. Sheikh analyzed and interpreted the data, and wrote the manuscript. Debra Titone, Georgie Columbus, Naveed A. Sheikh, Shari R. Baum and Katja Häuser revised the final manuscript.

### **ACKNOWLEDGMENTS**

This work was supported by a Social Sciences and Humanities Research Council (SSHRC) Standard Operating Grant (Titone) and a Natural Sciences and Engineering Research Council (NSERC) Discovery Award (Titone).

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 April 2014; accepted: 18 December 2014; published online: 13 January 2015.*

*Citation: Columbus G, Sheikh NA, Côté-Lecaldare M, Häuser K, Baum SR and Titone D (2015) Individual differences in executive control relate to metaphor processing: an eye movement study of sentence reading. Front. Hum. Neurosci. 8:1057. doi: 10.3389/fnhum.2014.01057*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2015 Columbus, Sheikh, Côté-Lecaldare, Häuser, Baum and Titone. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The importance of being apt: metaphor comprehension in Alzheimer's disease

# *Carlos Roncero\* and Roberto G. de Almeida*

*Department of Psychology, Concordia University, Montreal, QC, Canada*

### *Edited by:*

*Seana Coulson, University of California at San Diego, USA*

### *Reviewed by:*

*Cyma Van Petten, State University of New York, Binghamton, USA*

*Christelle Declercq, Université de Reims Champagne-Ardenne, France*

### *\*Correspondence:*

*Carlos Roncero, Bloomfield Centre for Research in Aging, Lady Davis Institute, Jewish General Hospital, McGill University Health Network, 3755 Côte Ste-Catherine Road West, Montréal, QC H4B 1R6, Canada*

*e-mail: tcroncero@gmail.com*

We investigated the effect of aptness in the comprehension of copular metaphors (e.g., *Lawyers are sharks*) by Alzheimer's Disease (AD) patients. Aptness is the extent to which the vehicle (e.g., *shark*) captures salient properties of the topic (e.g., *lawyers*). A group of AD patients provided interpretations for metaphors that varied both in aptness and familiarity. Compared to healthy controls, AD patients produced worse interpretations, but interpretation ability was related to a metaphor's aptness rather than to its familiarity level, and patients with superior abstraction ability produced better interpretations. Therefore, the ability to construct figurative interpretations for metaphors is not always diminished in AD patients nor is it dependent only on the novelty level of the expression. We show that Alzheimer's patients' capacity to build figurative interpretations for metaphors is related to both item variables, such as aptness, and participant variables, such as abstraction ability.

**Keywords: metaphor, Alzheimer's disease, figurative language, language comprehension, aptness, familiarity, abstraction, simile**

### **INTRODUCTION**

Why are we so good at understanding metaphors when they express such obvious falsities? Upon hearing *Juliet is the sun*, how should we interpret Romeo's state of mind? Clearly, what he intends to express about Juliet (the *topic*) seems to be easily understood by attributing to her some property of the sun (the *vehicle*)—perhaps that of sheer brightness, uniqueness, or being vital for life. Although copular metaphors—those with the form *x is y*—are pervasive in natural language and explored profusely in literary works, their comprehension might require considerable cognitive effort. This effort may come from different levels of analysis that metaphors call for, including computing the literally anomalous meaning (what is *said*), interpreting properties of topic and vehicle, and arriving at an interpretation that is assumed to be close to what the speaker *intended* to express. Understanding a metaphor, thus, may engage many systems—from linguistic parsing and semantic composition to executive functions involved in attaining an interpretation that goes beyond what the sentence expresses literally.

We report a study on the interpretation of metaphors by patients diagnosed with probable Alzheimer's disease (AD). Considering the well-documented difficulties that AD patients have with linguistic processes (e.g., Manouilidou et al., 2009), semantic memory (e.g., Whatmough and Chertkow, 2002; Capitani et al., 2003), and working memory, in particular with executive functions (e.g., Baddeley et al., 1986; Bäckman et al., 2005), the task of interpreting non-literal sentences might seem a daunting one for this population. Surprisingly, however, only four studies to our knowledge have investigated how AD patients interpret metaphors (Winner and Gardner, 1977; Papagno, 2001; Amanzio et al., 2008; Maki et al., 2012). These studies differ substantially in method, language, types of metaphors employed, and stimulus properties. Only the study by Amanzio et al. (2008), for example, controlled for level of conventionality, contrasting conventional and familiar metaphors with novel ones. They found that AD patients have difficulty with novel metaphors, but their comprehension of conventional metaphors was similar to that of healthy controls. They suggested that the main reason for the novel-metaphor impairment in AD might be defective executive functions and what they called "verbal reasoning," which are deemed necessary to compute relations between novel topic-vehicle combinations. Conventional metaphors, in contrast, were argued to rely less on executive functions and more on retrieving an associated meaning from semantic memory<sup>1</sup>.

In the present study, we investigate the role of another variable in metaphor interpretation by AD patients: *aptness*. This variable reflects the degree to which properties of the vehicle capture properties that are applicable to (or can be predicated about) the topic. For example, in *Lawyers are sharks* the vehicle *shark* by hypothesis activates properties that might be true of *lawyer*. Crucially, aptness is independent of conventionality and familiarity: an unfamiliar metaphor can still be apt based on the properties of the vehicle that are applicable to the topic; and a conventional metaphor can be inapt if the common figurative meaning of the vehicle does not apply to the topic. We also evaluated to what degree a patient's ability to perform abstractions could predict metaphor interpretation, on the assumption that abstraction might be required to detach the literal meaning from the expression and generate an interpretation that approaches the intended meaning.

<sup>1</sup>We use "meaning" in a loose sense often to cover both a literal meaning (roughly, what is said) and the intended message or even what is implicated by an expression (e.g., "metaphor meaning"), which might differ substantially from what is literally expressed. When necessary we make adjustments to our use of "meaning" to reflect these distinctions.

We start off with a brief discussion on the comprehension of different types of figurative expressions in AD: proverbs, sarcasm, idioms, and metaphors. Our main goal is to gather the pattern of performance of AD patients in diverse types of tests employed in the investigation of figurative language, and which motivate our study on metaphor, reported below. A secondary goal of our discussion on figurative language in AD involves evaluating both subject and item variables employed in these studies, which is crucial for understanding how a meaning that approximates that of the intended message is attained and how it may be disrupted in AD.

### **FIGURATIVE LANGUAGE IN ALZHEIMER'S DISEASE**

Thus far, 22 studies have investigated diverse forms of figurative language in AD, including proverbs, idioms, sarcastic expressions, and metaphors (for a recent review, see Rapp and Wild, 2011)<sup>2</sup>. What seems to be common to these forms of expression is that there is a stark contrast between what is *said* and what is *intended* by a token utterance. We follow here a classical distinction in pragmatics (e.g., Grice, 1989) assuming that what is *said* is the literal interpretation of the expression, its compositional meaning based on word meanings and how they combine structurally.<sup>3</sup> We thus take what is *intended* by a given expression to be what is implicated (rather than explicated), or what the speaker intends to express, whether this intention can be easily calculated (such as the ironic *It is hot in here*, uttered by a visitor to Yukon in January) or not (*Juliet is the sun*). While this distinction has been well established in many research circles in cognitive science, what more recent psycholinguistic and cognitive neuroscience research has shown is that numerous variables play an important role in the process of calculating the intended message from what is said (see, e.g., the papers in Gibbs, 2008, and Roncero and de Almeida, 2014, for reviews). The main variables of interest include the expression's familiarity (Blasko and Brihl, 1997; Thibodeau and Durgin, 2011), conventionality (Bowdle and Gentner, 2005; Gentner and Bowdle, 2008), and aptness (Chiappe and Kennedy, 1999; Jones and Estes, 2005, 2006; Glucksberg and Haught, 2006). In addition, in the more specific case of figurative expressions in AD, variables such as the degree to which the patient must rely on executive functions such as inhibition and abstraction (Laflache and Albert, 1995; Chapman et al., 1997; Papagno et al., 2003), and whether or not the expression is "frozen," i.e., stored as a whole (Amanzio et al., 2008; Rassiga et al., 2009), have been investigated.

We will take this last variable as the perspective from which we briefly discuss the studies on AD patients' interpretation of figurative expressions. The main reason for focusing on this variable is that "frozen" and "non-frozen" expressions by hypothesis rely upon different cognitive resources. In frozen expressions (e.g., idioms), the nonliteral meaning is fixed in the sense that the interpretation relies more on the retrieval of a conventional meaning from semantic memory than on the computation of a novel meaning. In contrast, in non-frozen expressions, such as most metaphors, actual interpretation requires the computation of a meaning, rather than retrieval from memory: even in the case of familiar and conventional metaphors, the actual property retrieved from the vehicle to predicate on the topic is flexible because numerous properties are usually associated with a given conventional vehicle (e.g., *ruthless*, *aggressive*, *sneaky*, etc., for *shark*; see Roncero and de Almeida, 2014)<sup>4</sup>.

### **PROVERBS**

A proverb often involves the "teaching of a lesson," which reflects its *intended* message. For example, *Too many cooks spoil the broth* suggests that too many people involved in a single project can spoil the end result. Although one could argue that these expressions are compositional, for their literal meanings are obtained from their constituents, proverbs are used to express something else, perhaps analogous to the expression itself, and thus they require the retrieval or the computation of another message. Studies have shown that AD patients prefer literal (or "concrete") rather than figurative ("abstract") interpretations of proverbs (e.g., *Rome wasn't built in a day*; Code and Lodge, 1987) and familiar, proverb-like sentences (e.g., *He's saving up for a rainy day*; Kempler et al., 1988). These results have been obtained with both free-interpretation (Code and Lodge, 1987) and multiple-choice tasks (Kempler et al., 1988), suggesting that AD patients' abstracting abilities might be impaired, making it difficult for patients to go beyond what is explicitly said in the sentence. However, Brundage (1996) found that the difficulty with proverbs is mostly due to their familiarity, suggesting instead that comprehension of proverbs relies more on remembering an associated meaning, which becomes stronger with increased familiarity, rather than relying on a process of abstraction from the words in the proverbs. Consistent with these results, Laflache and Albert (1995) found AD patients could provide accurate proverb interpretations, despite showing impaired abstract thinking abilities. This effect was further confirmed by Chapman et al. (1997), but only for familiar proverbs, with patients showing an impairment for unfamiliar proverbs. Chapman et al. also found that when patients were given a multiple-choice task with four alternatives, including "concrete" and "abstract" (i.e., figurative) interpretations, the effect of familiarity disappeared, with patients having difficulty selecting the figurative interpretation. Therefore, the different conditions (multiple-choice vs. verbal explanation) appear to have distinct cognitive demands, as AD patients performed worse in the multiple-choice condition.

<sup>2</sup>We should also note that of the 22 studies on figurative language in AD, namely the 20 reviewed by Rapp and Wild (2011) and two more recent ones (Yamaguchi et al., 2011; Maki et al., 2012), several have employed figurative language as a diagnostic tool for early dementia (e.g., Code and Lodge, 1987; Santos et al., 2009; Yamaguchi et al., 2011), while others have actually investigated the nature of figurative language understanding obtained from the pattern of linguistic and cognitive deficits in AD. We restrict our discussion to the latter types, which are more closely related to the present study.

<sup>3</sup>We will use "semantic composition" or "compositional" to refer to expressions from which a meaning is obtained by computing the (denotational) meaning of the constituent words and how they combine syntactically. How a non-compositional meaning is obtained (i.e., a figurative meaning not based on the words stated) remains a matter of debate. Some have argued that certain idioms, like *kicked the bucket*, are expressions that behave as if they are lexicalized (Swinney and Cutler, 1979; but see Holsinger and Kaiser, 2013) or simply accessed in memory (McCabe, 1988; Caillies and Declercq, 2011). Meanwhile, others (e.g., Searle, 1979) have argued that obtaining the figurative meaning of an expression requires one to first entertain its literal meaning to later discard it, or that a figurative meaning can actually be obtained "directly" (Gibbs, 2001; Glucksberg, 2003). These debates, however, will not be resolved in the present paper.

<sup>4</sup>Although we distinguish figurative expressions according to the frozen/non-frozen contrast, what is important is how the intended message conveyed by these expressions is attained: whether it is stored or computed anew.

### **SARCASM**

Sarcasm is a form of expression that usually stands in opposition to what is said: *It is hot in here* (Yukon, circa January) means "it is cold" (or, more properly, its negation: "not hot"). While the meaning of the expression itself is compositional, its intended meaning needs to be inferred from a given intonation or context. The investigation of sarcasm interpretation in AD has also employed either free-interpretation (Kipps et al., 2009; Rankin et al., 2009) or multiple-choice (Maki et al., 2012) tasks. Both Kipps et al. (2009) and Rankin et al. (2009) employed two subtests of the TASIT (The Awareness of Social Inference Test; see McDonald et al., 2003) in which patients watched vignettes with actors engaging in dialogs that could be interpreted as being either sincere or sarcastic (e.g., *Ruth*: *Great movie, wasn't it?* [. . . ] *Michael*: [. . . ] *I feel I could see it another dozen times*). Because the dialogs contained the same sentences, participants had to rely on extra-linguistic cues such as intonation, facial expressions, and gestures to judge whether the actors in the scene were being sincere or sarcastic. Both studies found no impoverished sarcasm comprehension in AD patients compared to controls. In the study by Maki et al. (2012), however, AD patients did perform significantly worse than both a group of healthy elderly controls and a group of patients with mild cognitive impairment (MCI): in fact, the AD patients chose the literal interpretation significantly more often than either of the other groups. As with proverbs, the pattern of patients' performance may be due to the task: the use of a multiple-choice paradigm rather than interpretations through verbal explanations.

### **IDIOMS**

Idioms such as *pushing up daisies* (meaning "dead and buried") and *hit the sack* ("going to bed") represent meanings that are not compositional and might require retrieving the associated meaning from memory. Studies investigating idioms in AD have also found that performance in verbal descriptions is superior to that in multiple-choice or picture selection tasks (Papagno et al., 2003). However, Papagno et al. (2003) also showed that performance in the picture selection task varies with the nature of the pictures presented as alternatives. When patients are presented with pictures representing literal and figurative interpretations of common Italian idioms (e.g., *to have a green thumb*), they perform at chance (e.g., selecting either a picture of a man with a thumb full of green paint or a picture of a woman gardening), but when the alternative pictures represent either the figurative meaning (the gardener) or an unrelated picture containing the depiction of one word referent from the idiom (e.g., someone with a thumb stuck in a drawer), patients select the figurative interpretation significantly more than the alternative. These results suggest that AD patients are capable of interpreting figurative expressions, but have difficulty efficiently suppressing the literal interpretation when it is presented as an alternative (Papagno et al., 2003).

The hypothesis that AD patients have difficulty suppressing a literal interpretation was further investigated in two matching tasks by Rassiga et al. (2009). In the first, patients had to choose among four drawings the one that corresponded to the interpretation of the idiom. In the second, patients had to match the idiom to one of four alternative words, one associated with the figurative interpretation, one associated with a word in the idiom (the literal alternative), and two unrelated words. Performance was worse than that of controls in both conditions. In the picture-matching task, participants chose the picture corresponding to the literal interpretation more often than the one corresponding to the figurative interpretation. In the sentence-to-word task, however, patients chose the word representing the figurative meaning of the idiom significantly more than other alternatives. Rassiga et al. found that performance on the sentence-to-word task was predicted only by executive-function scores, as measured by a dual-task that included digit span and paper-and-pencil maze tasks (Baddeley et al., 1997). These results again suggest that difficulty in idiom interpretation might be due to failure of inhibition of the literal interpretation, while the degree of inhibition needed can be affected by test modality: picture matching requires more inhibition than single-word matching, possibly because alternative scenes involve more working-memory resources to match to an appropriate sentence. By extension, verbal explanations would have required even less inhibition as they do not involve foils, although Rassiga et al. (2009) did not employ this technique. Indeed, in studies in which AD patients were asked to provide verbal explanations for idioms (Papagno, 2001; Amanzio et al., 2008), no impairment was found.

### **METAPHORS**

Copular metaphors, in contrast to idioms, require identifying the relevant property associated with the vehicle that can be applied to the topic (Ortony, 1979; de Almeida et al., 2010). Thus, in *Juliet is the sun* one needs to search for possible ways in which the topic (*Juliet*) could be predicated by the vehicle (*sun*). As suggested by Papagno (2001, p. 1458), metaphors involve "an active search of the specific semantic attribute," more so than other types of figurative language. Because AD patients have impaired executive functions (Baddeley et al., 1986), it follows that metaphors' increased cognitive demands may cause interpretation difficulties, in particular in the search for an appropriate intended meaning.

In what was perhaps the first study examining metaphor interpretation in AD, Winner and Gardner (1977) asked seven individuals (all diagnosed with pre-senile dementia and probable AD) to select, among four pictures, the one that best matched a given metaphorical statement (e.g., *a heavy heart can really make a difference*). Two of the picture-types used are relevant to the present paper: one that matched the figurative meaning of the metaphor (a picture of a man crying), and a second one, which displayed the literal form (a person having difficulty walking due to carrying a large red heart). AD patients were found to pick the picture representing the metaphorical meaning as many times as the picture representing the literal meaning (45 and 44% respectively). This result is consistent with those found for proverbs, idioms, and sarcasm, which show that AD patients have difficulty selecting the *intended* meaning in the presence of a literal competitor.<sup>5</sup>

Papagno (2001), however, employed a verbal explanations task to examine AD patients' comprehension of idioms and metaphors over a 6-month period. Both the idioms and metaphors were considered highly familiar in Italian, to the extent that their meanings could be found in a dictionary. The assumption was that AD patients should have known these expressions, but could have "lost" them during the disease progression. At first examination, only four patients demonstrated impairment for nonliteral language, with metaphor comprehension being the *least* impaired linguistic ability. Among the errors produced, however, a distinction did emerge between idioms and metaphors. Whereas the most common error for idioms was a literal interpretation, the most common error for metaphors was an inability to produce a response. When AD patients were retested at a later stage, there was an overall decrease in nonliteral language comprehension, yet further analysis showed this result was attributable to metaphors only. AD patients showed no decrement for idioms. These results led Papagno to suggest that language impairment, especially for figurative language, is not an early symptom of AD and may only occur late into the progression of the disease.

In addition to using verbal explanations rather than a matching paradigm, Papagno's (2001) study contrasts with Winner and Gardner's (1977) for the familiarity of the items used. Winner and Gardner (1977) did not report the familiarity level of their items, whereas Papagno's metaphors were chosen for their high familiarity. It is possible that the distinct results in the two studies reflect the familiarity level of the individual items in the study. In other words, comprehension was better in Papagno's study because the metaphors were more familiar than those used by Winner and Gardner. This hypothesis was investigated by Amanzio et al. (2008), who compared AD patients' interpretation of novel metaphors with conventional ones—the same conventional metaphors used by Papagno (2001).

Amanzio et al. (2008) predicted that AD patients would show good interpretation for conventional metaphors, whose meanings are well known, because patients would simply need to retrieve the associated figurative meaning from memory, as done for idioms. In contrast, AD patients would have more difficulty with novel metaphors, whose figurative interpretation must be constructed based on possible relationships between topics and vehicles. Thus, the assumption was that for novel metaphors there were no figurative meanings stored in memory. To further support this retrieval-construction dichotomy, Amanzio et al. also compared participants' ability to interpret novel and conventional metaphors to their interpretations of idioms, as these are also assumed to rely simply on memory retrieval. Thus, patients and controls were predicted to show similar performance for idioms and conventional metaphors that rely on retrieval, but worse interpretation for novel metaphors whose meaning needs to be constructed. The results supported their predictions. In addition to conventional metaphors, AD patients also showed good interpretation (similar to controls) for idioms. Novel metaphors were the only category where AD patients displayed a relative impairment compared to controls. AD patients' performance in verbal, visual reasoning, and executive-function tasks was also the best predictor of metaphor interpretation scores. Amanzio et al. took these results to support their hypothesis that the main obstacle faced by individuals when interpreting novel metaphors is the need to construct a meaning due to impaired executive functions: "if the central executive is damaged, the ability to create a new resemblance, required to understand a novel metaphor, may be defective" (p. 7). When the comprehension process relies on retrieval rather than construction, AD patients' verbal explanations are similar to controls', as observed for idioms and conventional metaphors. However, when the process relies on construction, comprehension can be impaired, as observed for novel metaphors.

# **STUDY 1: THE COMPREHENSION OF METAPHORS IN ALZHEIMER'S DISEASE**

We set out to study metaphor comprehension in AD with three main goals in mind. First, we were interested in understanding how the possible breakdown of metaphor comprehension in this population might inform us about the normal processes involved in metaphor processing. We see the investigation of patterns of impaired performance (both in groups of patients and in single-case studies) as an important method for understanding how unimpaired linguistic and cognitive systems work (see Caramazza, 1986; Zurif et al., 1989). Clearly, the contrast between meaning construction and meaning retrieval suggested by studies on figurative language with AD patients implies that different cognitive mechanisms might be recruited in metaphor comprehension, and that empirical results depend on task and stimulus variables. Thus, a second goal in our study was to explore the role of different variables underlying metaphor comprehension, particularly the role of familiarity and aptness. Finally, our third goal was to understand figurative language comprehension in AD proper, and more specifically how the semantic and pragmatic systems might break down with the disease. The paucity of metaphor comprehension studies in AD is surprising given how productive these expressions are in natural language. An exploration of how metaphors are understood can ultimately help us understand how linguistic, semantic memory, and working memory systems are affected with the progression of the disease.

### **FAMILIARITY AND APTNESS**

As we have seen in our brief review of the literature on figurative language in AD, comprehension is better for what we referred to as "frozen" than for "non-frozen" expressions, and this difference reflects distinct cognitive demands. Whereas frozen expressions require retrieving an associated meaning, non-frozen expressions, such as most copular metaphors, require constructing a meaning based on the relation between topic and vehicle words. Whether or not good performance is observed, however, is related to two additional factors: task modality and familiarity. Patients typically perform better when asked to provide verbal explanations rather than when they are asked to match an expression to a picture, word, or sentence alternative; and they also perform better when expressions are familiar. Familiarity, however, interacts with other ways of conceiving how one attains the meaning of a figurative expression. One of them is what Giora (1997) called *saliency*. She argues that both the literal and figurative meanings, when available, compete during comprehension, but the meaning with the highest level of *saliency* will be chosen. Saliency, then, is akin to the activation level that one meaning will reach, winning out against a competitor, regardless of whether the winner is a literal or a figurative interpretation. With regard to a particular figurative expression, then, the greater the familiarity, the more strongly that expression will be associated with a nonliteral meaning. In other words, familiarity has the effect of making the nonliteral meaning more salient, which makes subsequent retrieval of those meanings easier. Supporting this argument, studies with healthy adults found that familiar nonliteral expressions are read faster than less familiar ones (e.g., Blasko and Brihl, 1997).

<sup>5</sup>We also note that Maki et al. (2012) reported worse interpretation for conventional Japanese metaphors (single-word and copular). We would argue, however, that their use of a multiple-choice task could have again biased patients to select the literal foil.

The impact of familiarity on saliency, then, is somewhat straightforward: increased experience leads to stronger traces in semantic memory. Gentner and Bowdle (2008), for example, argue that vehicles initially have only an associated literal meaning, but can gain an additional meaning from their frequent use with different topics. Over time, exposure to the nonliteral use of the vehicle leads it to become stored in semantic memory and thus retrieved whenever the vehicle is heard or read in a statement. A metaphor such as *That film was a blockbuster* is taken to mean that the film had great commercial success, as opposed to meaning that the film exploded a city block. Gentner and Bowdle refer to such vehicles as *conventional*. In the case of *dead metaphors*, vehicles have become so conventional that only a nonliteral meaning remains. It is worth noting that these highly conventional vehicles were the types used by Papagno (2001) and Amanzio et al. (2008). They found that AD patients interpreted conventional metaphors as well as healthy controls, but were worse than controls when asked to interpret novel metaphors.

In contrast to familiarity, aptness is not related to one's experience but it is rather more related to the salient properties activated by the expressions' topic and vehicle. More specifically, aptness is seen as reflecting the degree to which the vehicle term captures salient properties of the topic (McCabe, 1983; Chiappe et al., 2003b); thus, an expression is more apt when the vehicle captures many properties of the topic. For example, the word *rail* is not a conventional vehicle (Jones and Estes, 2006) and lacks a strongly associated nonliteral meaning. Thus, the statement *John is a rail* has no clear meaning other than the anomalous literal one. Pairing the vehicle with the topic *fashion model*, however, to state *That fashion model is a rail* conveys that the person is extremely thin and skinny—like a rail. Here, the expression is interpretable not from experience with the vehicle, but rather because the statement is highly apt: the vehicle *rail* captures salient attributes associated with *fashion model* (i.e., thinness).

Aptness has been found to correlate strongly with ease of comprehension (Chiappe et al., 2003a). Some researchers (e.g., Glucksberg, 2003, 2008) even argue that aptness is a more important variable than familiarity for metaphor comprehension because unfamiliar metaphors can be well understood if the statements are sufficiently apt (e.g., Glucksberg and Haught, 2006). For AD patients, aptness could make comprehension easier because the relevant properties are salient for both the topic and the vehicle. Patients would then be biased toward selecting those properties as the ones needed for interpretation because they have the highest saliency level. For this reason, statements such as *The senator is a fossil* are more apt and easier to understand than *The track star is a fossil*. Although fossil in both cases has the relevant attribute of *old*, this attribute is more salient of senators than track stars, which makes it easier to employ the relevant attribute (Jones and Estes, 2005, 2006).

In summary, it can be argued that both familiarity and aptness are variables that might account for better metaphor comprehension. It has been difficult, however, to determine which of these variables—aptness or familiarity—is more important because several studies report significant positive correlations between participants' aptness and familiarity ratings (e.g., Jones and Estes, 2006; Thibodeau and Durgin, 2011). Such results have cast doubt on studies that have relied on subjective ratings of familiarity from participants because these ratings may have actually reflected the items' aptness level (Jones and Estes, 2005). To remedy this problem in the present study, we used instead an objective measure of familiarity in our analysis: Internet frequency counts, gathered using the guidelines from Roncero et al. (2006). These counts were first used to reduce an initial large cohort of metaphors to those used in the present study. For aptness judgments, we collected norms from a large group of older adults. By using two different types of measurements (subjective ratings for aptness, but objective ratings for familiarity), we aimed at better tapping into these distinct variables for metaphors whose meanings were absent from a dictionary.

### **OTHER COGNITIVE AND LINGUISTIC VARIABLES**

In addition to examining how different levels of familiarity and aptness impact metaphor interpretation, we also examined whether a participant's ability to infer a relationship between two objects would predict their ability to interpret metaphors. Recall that constructing the meaning of a metaphor (e.g., *time is money*) often involves understanding the relationship between two terms (e.g., *time-money*). The vehicle (*money*) is understood to predicate something about the topic (*time*), and interpretation requires understanding what properties about *time* are being made salient by *money* (e.g., that time is valuable). Therefore, we were especially interested in scores obtained in the Similarities subtest of the Wechsler Adult Intelligence Scale (WAIS-IV). In this task, participants are asked how two objects are "alike" (e.g., *horse-tiger*, *food-gasoline*) with different scores allocated based on the quality of the answer provided. This score is a good measure of a participant's ability to create new relations between two objects, and numerous studies have used it to assess AD patients' executive functions and in particular abstracting abilities (e.g., Laflache and Albert, 1995; Chapman et al., 1997; Helmes and Ostbye, 2002). Our prediction was that the best interpretations would be produced by patients who demonstrate the greatest ability to list salient similarities between two objects.

We also examined working memory as measured by the digit span task (also a subtest of WAIS-IV). Because constructing the meaning of a metaphor presumably requires being able to hold both the topic and vehicle terms in working memory, participants with an extremely short digit span could have difficulty holding the topic term in working memory once the vehicle term itself is processed. Consequently, participants with a very limited digit span would be expected to display poor metaphor interpretation abilities. Finally, we examined if the form of the expression would impact interpretation. More specifically, we tested whether or not participants would have less difficulty interpreting a metaphor such as *The mall is a zoo* if it was presented as a simile (*The mall is like a zoo*). Career of Metaphor theory (Bowdle and Gentner, 2005; Gentner and Bowdle, 2008) proposes that the comprehension of a novel metaphor involves a comparison process between topic and vehicle (e.g., *teachers-sculptors*). More specifically, to understand a new metaphor such as *Teachers are sculptors*, the metaphor must be understood as a simile via the form *Teachers are like sculptors*. If this is the case, when AD patients are presented with a novel metaphor, they might attempt to mentally transform it into a simile to compare topic and vehicle. However, a central executive impairment (Baddeley et al., 2001; Amanzio et al., 2008) might hinder the ability to perform such a metaphor-to-simile conversion. In order to test this hypothesis we asked participants to interpret both metaphors and comparable similes, which enabled us to determine if interpretation was better when the metaphors were presented directly as similes.

In summary, in order to control for the possibility that our familiarity ratings would reflect the item's aptness rather than its general frequency, we collected Internet frequency counts as an objective measure of the items' familiarity. We predicted that apt metaphors would be better understood, regardless of their familiarity level, but that this effect would interact with participants' abstraction ability. More specifically, our prediction was that patients with higher similarity scores would produce better interpretations. Lastly, we examined whether AD patients would understand the relationship between topic and vehicle better when these were presented as similes rather than metaphors.

### **METHODS**

### *Participants*

Eleven patients with probable AD (age range 55–86), diagnosed with mild-to-moderate cognitive impairment, were recruited with assistance from the Alzheimer's Society of Montreal, as well as Sunrise of Beaconsfield, a retirement community in Beaconsfield, Quebec. Participants were referred to us as individuals who had been given a diagnosis of AD according to the criteria specified by the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA; McKhann et al., 1984), and had no other diagnosed dementia or pathology. We also examined patient files to verify the diagnosis. The study was fully explained to each participant and they gave written informed consent to participate in the study (the protocol was approved by the Institutional Review Board of the Douglas Mental Health University Institute). MoCA (Montreal Cognitive Assessment; Nasreddine et al., 2005) and MMSE (Mini-Mental State Examination; Folstein et al., 1975) were administered to all AD patients. Further criteria for participating in the study were the ability to understand and follow commands and an MMSE score of at least 16. Demographic and neuropsychological data for all participants appear in **Table 1**.

Ten healthy elderly controls, with an age range of 63–86, were recruited from Sunrise of Beaconsfield, were caregivers of the participants diagnosed with AD, or were recruited from the general public. For controls, only the MoCA was administered, with the requirement that all controls obtain a score above 25. All participants (AD patients and controls) were native speakers of English, or were bilinguals with a fluent command of English (case of two individuals), having attended university in English and worked professionally their entire lives in English. Therefore, these participants' English proficiency level was considered sufficient for the present study. All participants had a minimum education level of 6 years (**Table 1**).

**Table 1 | Demographic and neuropsychology data for AD patients and normal controls.**


*AD, Alzheimer's Disease; NC, Normal Controls; MoCA, Montreal Cognitive Assessment (Nasreddine et al., 2005); MMSE, Mini-Mental State Examination (Folstein et al., 1975); Digit Span and Similarities, Subtests of the WAIS-IV; SD, Standard Deviation.*

### *Materials*

The preparation of the stimuli involved two main phases. The first included an aptness-rating task, with a group of healthy elderly individuals, and the collection of frequency counts from the Internet, using the Google Search Engine (see below). These frequency counts allowed us to first identify those metaphors that were familiar and those that were unfamiliar. In the second phase, interpretation norms for a subset of metaphors from the first phase were obtained with another group of healthy elderly individuals. We also collected frequency counts from the Corpus of Contemporary American English (COCA; Davies, 2009) for the metaphors presented to participants. These COCA scores allowed us to check that the Internet counts for the metaphors do accurately reflect general frequency and served as a second objective rating of familiarity. Aptness, familiarity, and interpretation norms for the materials employed in the present study appear in the Supplementary Material.

*Aptness.* Twenty healthy elderly controls (age range 60–83; 10 females), all native speakers of English, were recruited from the general public and given monetary compensation for completing a rating task. These participants did not take part in the subsequent metaphor interpretation task. They were presented a booklet containing 84 metaphors (e.g., *Trees are umbrellas*) taken from another study (Roncero and de Almeida, 2014). Below each expression, there was a scale ranging from 1 to 7. Participants were asked to rate how apt they found each metaphor, where 1 was labeled *not apt,* 4 as *moderately apt*, and 7 as *very apt.* Aptness was explained as how valid they thought each statement was. *Politics is a jungle* was given as an example of an apt statement, while *Politics is a beach* was given as an example of a less apt statement.

*Familiarity.* The Google search engine was used to collect Internet frequency counts following the guidelines set by Roncero et al. (2006). In this method, a metaphor (e.g., *Music is medicine*) is written within quotation marks into the search box to produce a list of websites where the searched item was found. The first website listed is examined, and in general, if the metaphor is used in a figurative manner and expresses the meaning of the searched metaphor, then its production is included in the frequency counts. The next website listed is then examined, and so on, until a cut-off point of 30 "hits" is reached. To be clear, more than simply the first 30 websites are examined. Websites are examined one-by-one until a maximum of 30 productions that properly express the meaning of the searched metaphor is found. Furthermore, repetitions of the same metaphor within a website (e.g., when posts quote the same sentence repeatedly), and repetitions of the same title for a song or book across websites, are counted only once within the total; consequently, this method is more meticulous than simply examining the first 30 websites listed. Regarding the cut-off, the number of hits reported by Google can exceed 10,000 even for less familiar metaphors such as *Cities are jungles*, and can approach the millions for very familiar metaphors such as *Time is money*. As a practical solution, Roncero et al. chose 30 after remarking that few metaphors actually yield this number of productions. Expressions that reach a familiarity score of 30 would also still reflect a relatively higher level of familiarity compared to the rest of the expressions.

Note that the number of hits that Google initially lists is separate from the list of websites it lists. For example, although Google may inform that there are 11,300 hits for the metaphor *Cities are jungles*, the number of websites initially listed is only 99. After listing these websites, Google will print the statement "in order to show you the most relevant entries, we have omitted some entries very similar." This number varies per expression; for example, while it is 99 results prior to the Google statement for *Cities are jungles*, it is 243 for *Lawyers are sharks*, and the non-listed hits come from the same websites that Google already listed. In the present study, we checked all websites until a frequency count of 30 was reached or when Google printed the statement "in order to show you the most relevant results..." for the searched metaphor. Therefore, a frequency count for a metaphor based on the Google search engine reflects how many distinct websites displayed a spontaneous use of the expression.
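The counting rules described above can be summarized in a short sketch. This is only an illustration under stated assumptions: the `Production` record, its field names, and the function `familiarity_count` are hypothetical, and the figurative/literal judgment is assumed to have been made by a human rater beforehand, as in the original procedure.

```python
from typing import Iterable, NamedTuple, Optional


class Production(NamedTuple):
    """One occurrence of the searched expression (hypothetical record)."""
    website: str                      # site on which the expression appeared
    is_figurative: bool               # rater judged it to express the metaphor's meaning
    work_title: Optional[str] = None  # song/book title, if the expression is a title


def familiarity_count(productions: Iterable[Production], cutoff: int = 30) -> int:
    """Count distinct figurative productions, stopping at the cut-off."""
    seen_sites = set()
    seen_titles = set()
    count = 0
    for p in productions:
        if count >= cutoff:
            break  # cut-off of 30 productions reached
        if not p.is_figurative:
            continue  # literal or unrelated uses are not counted
        if p.website in seen_sites:
            continue  # repetitions within a website count once
        if p.work_title is not None and p.work_title in seen_titles:
            continue  # the same song/book title across websites counts once
        seen_sites.add(p.website)
        if p.work_title is not None:
            seen_titles.add(p.work_title)
        count += 1
    return count
```

In this sketch the list passed in would end either when the cut-off is reached or when the search engine stops listing distinct websites, mirroring the two stopping conditions described above.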

To further ensure that these Internet frequency counts were tapping into expression familiarity, we also collected COCA frequency counts for the metaphors that were employed in the interpretation task. However, for these frequencies a less restrictive set of guidelines was used, and no maximum cut-off points were applied. The topic (e.g., *time*) was entered as the search word and the vehicle (e.g., *money*) was entered as a collocation within five words before or after the topic. Each listed production was then checked for whether it was expressing a literal or figurative interpretation. For example, *He was run out of both time and money* would be considered a literal interpretation because it refers to time and money in a concrete, non-figurative, manner. In contrast, a sentence such as *He didn't understand that for his lawyer time was money* would be included because it reflects the figurative meaning of *time is money*. We also included examples in the overall count when the exact structure was different, but the expressed meaning was the same. For example, while participants in the present study interpreted *Time is money*, the COCA count totals included productions such as *Time was money*, *money is equivalent to time*, and *In this profession, money and time are equivalent*. In summary, a given COCA example was excluded from the count totals only when it expressed a literal interpretation of the topic-vehicle relationship.
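As a rough illustration of the collocation criterion (vehicle within five words before or after the topic), the following sketch checks a single sentence. It is not the COCA interface itself: the function name `collocates_within` and the simple tokenizer are assumptions made for this example, and the literal/figurative decision would still be made by hand.

```python
import re


def collocates_within(text: str, topic: str, vehicle: str, window: int = 5) -> bool:
    """Return True if `vehicle` occurs within `window` words of `topic`."""
    # Simple lowercase word tokenizer (an assumption; COCA uses its own tokenization).
    tokens = re.findall(r"[a-z']+", text.lower())
    topic_pos = [i for i, tok in enumerate(tokens) if tok == topic.lower()]
    vehicle_pos = [i for i, tok in enumerate(tokens) if tok == vehicle.lower()]
    return any(0 < abs(i - j) <= window for i in topic_pos for j in vehicle_pos)
```

For instance, `collocates_within("He didn't understand that for his lawyer time was money", "time", "money")` finds the vehicle two words after the topic, so the sentence would be retrieved for manual checking.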

*Selection of metaphors.* Prior to the study, we decided to employ a set of metaphors that varied in terms of aptness and familiarity, and that only 20 expressions would be presented to each AD patient because a larger number of items would conceivably cause fatigue, due also to other pre-experimental tasks involved. We identified 5 metaphors (or their equivalent similes) that were apt but not familiar (aptness rating higher than 3.5, but an Internet frequency count less than 15), and 5 metaphors that were neither apt nor familiar (an Internet frequency count less than 15, and an aptness rating less than 3.5). There were no items with an Internet frequency count greater than 15 that had an aptness rating less than 3.5; these metaphors would have been categorized as familiar, but not apt. Therefore, to complete our cohort, we identified 10 metaphors that were apt and familiar (an Internet frequency count greater than 15 and an aptness rating higher than 3.5). However, as we later discuss, two of the apt and familiar metaphors were ultimately removed from the analyses due to difficulty in interpretation.
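The grouping criteria can be expressed as a small function. This is a minimal sketch: the name `item_group` is hypothetical, and because the text does not say how values exactly equal to a threshold were handled, the sketch assumes that only values strictly above 3.5 (aptness) or 15 (frequency count) qualify as apt or familiar.

```python
def item_group(aptness: float, frequency: int,
               apt_cut: float = 3.5, freq_cut: int = 15) -> str:
    """Assign a metaphor to an item group from its aptness rating and frequency count."""
    # Assumption: ties at the threshold fall into the lower category.
    apt = aptness > apt_cut
    familiar = frequency > freq_cut
    if apt and familiar:
        return "apt and familiar"
    if apt:
        return "apt but not familiar"
    if familiar:
        return "familiar but not apt"  # no items fell in this cell in the present study
    return "neither apt nor familiar"
```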

*Interpretation norms.* In order to collect interpretation norms (i.e., to obtain the most common interpretation for each expression), a booklet containing the 20 selected metaphors was created. This booklet was presented to 20 healthy controls (age range 60–84; 14 females) who had neither participated in the rating norms nor served as controls in the subsequent interpretation task. In this booklet, each metaphor was presented within a sentence that asked participants to state which property was being expressed about the topic (e.g., *Education is a stairway because education is. . .* ). This method helped facilitate answers that would reflect a particular property or adjective. People were asked to write their answers on a line placed beneath each expression.

The different properties expressed by each participant were collapsed under a single property when they were considered synonyms. For example, *ruthless* and *aggressive* for *Lawyers are sharks* were grouped together under the property label *ruthless*, while *valuable* and *important* for *Time is money* were both categorized under the property *important*. Participants also sometimes wrote elaborate sentences that had the similar meaning of a particular property, without necessarily using a synonym of that particular property. For example, one participant wrote *Lawyers are sharks because lawyers are out to get you!* This sentence was categorized as expressing the property label *ruthless* for *lawyers are sharks* because the sentence conveys the idea that lawyers are *ruthless*.

Two judges were involved in coding the responses. The first judge created the set of properties that reflected the interpretations written for each metaphor. The second judge then verified whether the property chosen was appropriate for the interpretation given. The second judge consulted the first judge when there were any disagreements and resolved any discrepancy. Once the set of properties had been decided, any property stated by a minimum of three participants was considered a salient property for that metaphor. Any properties stated by only two individuals were considered less salient properties. Properties stated only once were considered non-salient properties. This procedure allowed us to identify salient properties for all of the metaphors, but certain metaphors lacked less salient properties as there were no properties mentioned by at least two individuals. A list of the metaphors sorted by item group, accompanied by their salient and less salient properties, is presented in the Supplementary Material.
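The salience thresholds just described (three or more mentions, exactly two, or only one) can be sketched as follows. The function name `classify_properties` and the label strings are illustrative assumptions; the input is assumed to be the list of property labels after synonyms have already been collapsed by the judges.

```python
from collections import Counter
from typing import Dict, List


def classify_properties(labels: List[str]) -> Dict[str, str]:
    """Map each coded property label to a salience level based on how often it was stated."""
    counts = Counter(labels)
    levels = {}
    for prop, n in counts.items():
        if n >= 3:
            levels[prop] = "salient"        # stated by at least three participants
        elif n == 2:
            levels[prop] = "less salient"   # stated by exactly two participants
        else:
            levels[prop] = "non-salient"    # stated only once
    return levels
```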

*Stimuli.* Two booklets were created for the metaphor interpretation task. One booklet listed half the original metaphors as similes (e.g., *Cities are like jungles* rather than *Cities are jungles*). In the second booklet, the topic-vehicle pairs were in the same order, but those items that were metaphors in booklet 1 were written as similes in booklet 2, and vice-versa.

### *Procedure*

A researcher first administered the MoCA and MMSE, if the participant was a person diagnosed with AD, or only the MoCA if the participant was an elderly control. In addition, two subtests of the WAIS-IV were administered: similarities and the forward digit span task. Afterwards—or at another session if the previous tasks took longer than an hour—each participant was read either the first sentence of booklet 1 or 2 and asked to provide an interpretation of the sentence. For example, the first time the metaphor *Music is medicine* was read, the researcher asked the participant, *What is someone trying to say when he or she says that music is medicine?* If the participant could not provide an answer, or failed to mention a particular property, the researcher then asked the participant, *If someone were to say that music is medicine, what would they be trying to say about music?* This method helped prompt answers that reflected a particular property that could then be matched to the interpretation norms. After an interpretation had been given by the participant, and transcribed by the researcher, or if the participant was still unable to provide an adequate answer, the next sentence was read, and so on, for all 20 items. Participants were given unlimited time to provide an interpretation, and told it was fine if they could not think of an interpretation. Sessions involving the metaphor interpretation task lasted approximately 30 min.

### **RESULTS AND DISCUSSION**

Interpretation scoring followed a procedure similar to that used by Papagno (2001). A score of 2 was given if the interpretation mentioned a salient property, but a score of 1 if the interpretation mentioned a less-salient property. If the interpretation expressed a meaning completely different from the salient or less salient properties, or if the participant was unable to provide an answer, a score of 0 was given. Therefore, the maximum average interpretation score obtainable for a particular group was 2. In order to score the answers, one researcher first allocated a set of scores based on the transcriptions, while a second judge, who was blind to whether the interpretations came from a control or a person diagnosed with AD, also provided scores as a reliability check. The interclass correlation was 0.85 (*p* < 0.001). Because this reliability score was high, the scores from the first researcher were used in all subsequent analyses.
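The 2/1/0 scoring rule can be illustrated with a brief sketch. The function `interpretation_score`, the salience labels, and the example norms are hypothetical; the step of matching a transcribed answer to a property label was done by human judges and is taken as given here.

```python
from typing import Dict, Optional


def interpretation_score(coded_property: Optional[str], salience: Dict[str, str]) -> int:
    """Score one interpretation: 2 = salient, 1 = less salient, 0 = other or no answer."""
    if coded_property is None:
        return 0  # participant could not provide an interpretation
    level = salience.get(coded_property)
    if level == "salient":
        return 2
    if level == "less salient":
        return 1
    return 0  # meaning differs from the salient and less salient properties


# Hypothetical norms for one item, for illustration only.
norms = {"ruthless": "salient", "greedy": "less salient"}
scores = [interpretation_score(p, norms) for p in ["ruthless", "greedy", "tall", None]]
mean_score = sum(scores) / len(scores)  # group means are bounded by 0 and 2
```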

As mentioned above, two items were dropped from analysis because participants (AD patients and controls) repeatedly expressed difficulty providing an interpretation. Several participants expressed understanding the statement *Life is a journey*, but stated it was difficult to put into words a particular meaning. When interpretations were provided, most participants provided elaborate discussions about life in general rather than providing a particular property. The metaphor *Genes are blueprints* also caused confusion and was consequently dropped from analysis. For most participants, there was an initial period where the participant had to be told that the sentence meant genes "as in DNA" as opposed to *blue jeans.* Several participants (especially AD patients) expressed not understanding the concept *DNA.* Therefore, the metaphor interpretation scores for these items were not included in any of the analyses involving metaphor interpretation scores.

### *Group analyses*

For AD patients, the mean metaphor score was 1.21 (*SD* = 0.49) and the mean simile score was 1.32 (*SD* = 0.19). For normal elderly controls, the mean metaphor score was 1.45 (*SD* = 0.14), and the mean simile score was 1.55 (*SD* = 0.31). Figure 1 in the supplementary materials displays these results. We first ran a repeated-measures ANOVA that compared AD patients' and elderly controls' general means for metaphor and simile interpretation scores. Group (AD vs. Control) was the between-subject factor and form (metaphor vs. simile) was the within-subject factor. The main effect of form was not significant [*F*(1, 19) = 1.47, *p* = 0.24, η<sub>p</sub><sup>2</sup> = 0.072], nor was the interaction [*F*(1, 19) = 0.01, *p* = 0.95, η<sub>p</sub><sup>2</sup> = 0.001]. These non-significant effects, in principle, go against the hypothesis that topic-vehicle words presented in metaphor form (*x is y*) would be harder to understand because they would need to be converted into simile form (*x is like y*), while similes allow for a direct comparison. The lack of a difference cannot rule out this hypothesis, in particular because it applies primarily to novel metaphors (e.g., Gentner and Bowdle, 2008). It is important to note, however, that the verbal interpretation task we employed requires a figurative interpretation of the relation between topic and vehicle, and thus metaphor and simile forms might yield the same interpretation strategy, with both leading to a figurative interpretation. Due to this lack of difference, we refer to these expressions simply as metaphors. The mean metaphor interpretation score for AD patients was 1.26 (*SD* = 0.29) vs. 1.50 (*SD* = 0.19) for normal controls, and this difference was significant when we ran the repeated-measures ANOVA [*F*(1, 19) = 4.82, *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.20]. We also compared this difference by items, and again found a significant difference [*t*(17) = −2.17, *p* < 0.05, *r* = 0.47].
Thus, the difference between groups, regardless of expression type or other stimulus variables (see below), suggests an impairment in figurative language interpretation in AD, an effect that has not been obtained in tasks that require overt explanation of figurative meaning (e.g., Papagno, 2001; Amanzio et al., 2008). Recall, however, that those null differences were found for conventional metaphors only. We next examine how item variables influenced interpretation.
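For readers who wish to check the reported effect sizes, the following sketch recovers them from the test statistics using two standard conversions: partial eta squared from an *F* ratio, and the effect size *r* from a *t* value. The function names are ours, and we assume (the text does not say so explicitly) that these are the conversions behind the reported values.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from F: (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def r_from_t(t_value: float, df: float) -> float:
    """Effect size r from t: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t_value ** 2 / (t_value ** 2 + df))


# Group effect, F(1, 19) = 4.82
print(round(partial_eta_squared(4.82, 1, 19), 2))   # 0.2
# Main effect of form, F(1, 19) = 1.47
print(round(partial_eta_squared(1.47, 1, 19), 3))   # 0.072
# By-items comparison, t(17) = -2.17
print(round(r_from_t(-2.17, 17), 2))                # 0.47
```

The recovered values agree with those reported above, which also supports reading the effect-size symbol in the ANOVA results as partial eta squared.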

### *Effects of aptness and familiarity*

We first examined whether the COCA counts would correlate with the Google counts in order to validate Internet frequencies as predictors of familiarity. We found a positive correlation between the Google search counts and the COCA counts [*r*(16) = 0.49, *p* < 0.05], which suggests the Google counts collected using the Roncero et al. (2006) method tap into how familiar participants may be with a metaphorically expressed meaning. In order to better understand the aptness and familiarity effects on metaphor interpretation, we ran a multiple regression with the aptness ratings, Internet frequency counts, and COCA counts as predictors and AD patients' interpretation scores as the dependent variable. The overall model was significant [Adj *R*<sup>2</sup> = 0.43, *F*(3, 14) = 5.34, *p* < 0.05]. Among the individual predictors, however, only the aptness ratings were a significant predictor of interpretation scores (*t* = 3.58, *p* < 0.01); neither the Internet frequency counts (*t* = −0.23, *p* = 0.82) nor the COCA counts (*t* = −0.38, *p* = 0.71) were. See Figure 2 in the supplementary materials for a scatterplot between aptness ratings and AD patients' interpretation scores. Similar results were found for controls' interpretation scores. Again, the overall model was significant [Adj *R*<sup>2</sup> = 0.43, *F*(3, 14) = 5.27, *p* < 0.05], and aptness ratings were a significant predictor (*t* = 3.58, *p* < 0.01), but Internet frequency counts (*t* = −0.09, *p* = 0.93) and COCA counts (*t* = −1.06, *p* = 0.31) were not.
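As a consistency check on the two regression models, the adjusted *R*<sup>2</sup> can be recovered from the overall *F* statistic and its degrees of freedom. The sketch below uses the standard identities for an ordinary least-squares model with an intercept; the function names are ours, and the model form is an assumption, since the text reports only the summary statistics.

```python
def r_squared_from_f(f_value: float, k: int, df_error: int) -> float:
    """R^2 implied by the overall F test with k predictors."""
    return (f_value * k) / (f_value * k + df_error)


def adjusted_r_squared(r2: float, k: int, df_error: int) -> float:
    """Adjusted R^2, with n recovered as df_error + k + 1 (intercept assumed)."""
    n = df_error + k + 1
    return 1 - (1 - r2) * (n - 1) / df_error


# AD patients: F(3, 14) = 5.34; controls: F(3, 14) = 5.27
for f_value in (5.34, 5.27):
    r2 = r_squared_from_f(f_value, k=3, df_error=14)
    print(round(adjusted_r_squared(r2, k=3, df_error=14), 2))  # 0.43 in both cases
```

Both models yield an adjusted *R*<sup>2</sup> of about 0.43, in line with the reported values, and the error degrees of freedom imply that the regressions were run over the 18 retained items.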

### *Participant variables*

The results are suggestive of a general aptness effect on metaphor interpretation, with overall better interpretation when the relation between topic and vehicle is deemed apt rather than familiar. However, participant variables also need to be taken into account before we can generalize over the effect of aptness on comprehension. In order to better understand the factors influencing performance, we first compared participants' working memory as measured by the simple digit span task. No difference was found between AD patients (*M* = 11.18; *SD* = 2.44) and controls [*M* = 11.60; *SD* = 1.07; *t*(14.01) = 0.52, *p* = 0.61, *r* = 0.14]. AD patients had a mean similarity judgment score of 17.18 (*SD* = 10.13), while controls' score was 26.20 (*SD* = 1.81). This difference was significant [*t*(10.70) = −2.90, *p* < 0.05, *r* = 0.66]. Finally, AD patients' metaphor interpretation scores were found to correlate with similarity judgment scores [*r*(9) = 0.68, *p* < 0.05], but not digit span scores [*r*(9) = 0.06, *p* = 0.86]. Therefore, the ability to abstract a relationship between two objects might be considered a strong predictor of patients' abilities to interpret metaphor.
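The fractional degrees of freedom reported for the group comparisons are what one would expect from a *t*-test that does not assume equal variances (Welch's test). The sketch below recomputes the similarities comparison from the summary statistics alone. Two things are assumptions on our part: that Welch's test was the procedure used, and the group sizes of 11 AD patients and 10 controls, which we infer from the reported degrees of freedom [*r*(9) and *F*(1, 19)] rather than take from this section.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from group means, SDs and sizes."""
    v1 = sd1 ** 2 / n1          # squared standard error, group 1
    v2 = sd2 ** 2 / n2          # squared standard error, group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Similarities subtest: AD (assumed n = 11) vs. controls (assumed n = 10)
t, df = welch_t(17.18, 10.13, 11, 26.20, 1.81, 10)
print(round(t, 2), round(df, 2))  # -2.9 10.7
```

Under these assumptions the sketch gives *t* ≈ −2.90 on about 10.70 degrees of freedom, matching the reported test. The much larger variance in the AD group is what pulls the degrees of freedom well below the pooled value of 19.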

### **STUDY 2: SUBJECTIVE FAMILIARITY RATINGS BY ELDERLY INDIVIDUALS**

In our main study, aptness ratings provided by participants were strong predictors of metaphor comprehension. In contrast, two objective measures of familiarity, Internet counts and COCA frequency counts, failed to predict which metaphors would be better interpreted by AD patients. This result is surprising considering studies (e.g., Amanzio et al., 2008) suggesting that familiarity is a strong predictor for the correct interpretation of metaphors. One crucial difference between our metaphors and those of Amanzio et al., however, is that while they used conventional metaphors (obtainable from the dictionary) as their set of familiar and easily understandable metaphors, we used metaphors whose meaning requires understanding a perceived relationship between a topic-vehicle pair even when that metaphor is unfamiliar (e.g., *Deserts are ovens*). A valid concern, however, is that our measures may have more properly reflected *frequency* rather than *familiarity per se*. We assume that more frequent expressions are also more familiar, but such measures only indirectly reflect *familiarity* when personal experience itself is considered, and familiarity may exist on a more subjective level entirely. For example, one may have heard a particular expression such as *Lawyers are sharks* only a few times (say, fewer than ten), but nevertheless consider the expression rather familiar. Our concern in the first study was that such judgments are influenced by aptness. More specifically, apt metaphors are more easily understood (Chiappe et al., 2003a), and this ease of interpretation might lead people to have the impression that these metaphors are actually more familiar than they are (Thibodeau and Durgin, 2011; see also Jacoby and Whitehouse, 1989; Whittlesea and Williams, 2001, and Westerman, 2008 for parallels in visual recognition memory).
These concerns motivated our preference for objective measures of familiarity in the study on metaphor interpretation, but this preference may have come at the cost of examining familiarity only indirectly: we measured frequency (occurrence across a medium) rather than familiarity itself, which might have predicted better interpretations.

In the present study, we examined the predictive value of subjective familiarity ratings. A group of elderly adults was asked to judge how familiar they found the expressions used in the metaphor interpretation study. To ensure that participants would not be biased by aptness (the concern that motivated our previous study's use of objective measures), participants were specifically told to ignore all aspects related to the metaphor, except their personal experience. We then examined whether these subjective ratings of familiarity would predict the AD interpretation scores collected in the previous study. Because our concern from the beginning of the investigation has been the biasing effect of aptness on familiarity ratings, we predicted that there would be a significant relationship between aptness and familiarity ratings despite our best efforts to remove such bias. Such a result would reflect the ease-of-interpretation effect, whereby people consider statements more familiar because they are more quickly or easily interpreted. Furthermore, we predicted that even if the subjective familiarity ratings were a significant predictor of interpretation scores, the aptness ratings collected in the previous study would nevertheless be the stronger predictor of interpretation.

### **METHODS**

### *Participants*

Twenty elderly adults (age range: 67–88, 15 females) were recruited from a list of participants at the Bloomfield Center for Research in Aging (Lady Davis Institute, Jewish General Hospital, Montreal). These participants are accustomed to being recruited for various studies, have no known psychiatric illness nor signs of dementia, and all participants obtained scores over 26 (normal range) on a MoCA task that was administered before the metaphor familiarity task. On this occasion, participants were also administered a large battery of tasks for various unrelated studies, which included the present familiarity ratings. Participants were given monetary compensation for their time.

### *Familiarity ratings*

Ratings were collected vocally. Participants were told they would be read a series of metaphors, and they were to rate how often they had heard the expression in the past, employing a scale ranging from 1 (not at all) to 7 (practically every day). They were further told that they were not to rate how well they liked the expression, nor how well they understood it, but to focus solely on how often they had heard this given expression previously. Finally, they were told that while some expressions may be quite familiar, others could be ones they had never heard before, in which case they should simply answer honestly with a rating of 1. After the participant confirmed understanding the instructions, the researcher then read each metaphor in the following manner: *From 1 to 7, 1 being not at all, how often have you heard the metaphor...* (e.g., *Music is medicine*)? All participants were read each expression in this manner one-by-one, and all participants were presented the expressions in the same order. The participant then stated a number between 1 and 7 as their rating, and the researcher recorded this response. If the participant chose to change their rating before moving to the next expression, this new rating replaced the first one. When participants gave two responses (e.g., stating *"I'd give that a 2 or a 3"*), the lower number was always chosen.

### **RESULTS AND DISCUSSION**

The collected familiarity ratings are presented in the Supplementary Material. The mean metaphor rating was 3.45 (*SD* = 1.84). These ratings correlated significantly with both the Internet counts [*r*(16) = 0.61, *p* < 0.01] and the COCA counts [*r*(16) = 0.59, *p* < 0.01], which suggests that participants' ratings were tapping into the general frequency of the expression. However, as predicted, subjective familiarity ratings were also significantly correlated with the aptness ratings obtained in the previous study [*r*(16) = 0.70, *p* < 0.01], which suggests that ratings were affected by an item's aptness level. Indeed, the significant correlation with aptness may explain why these subjective measures of familiarity, unlike the previously collected objective measures, were a significant predictor for the interpretation scores of AD patients from study 1 [*r*(16) = 0.53, *p* < 0.05]. Noting these three significant relationships (aptness with familiarity, aptness with interpretation scores, familiarity with interpretation scores), we checked for mediation by running a regression with both aptness and familiarity ratings as predictors, and AD patients' interpretation scores as the dependent variable. The overall model was significant [*F*(2, 15) = 8.26, *p* < 0.01], but among the individual predictors, only aptness was significant (*t* = 2.78, *p* < 0.05; familiarity, *t* = 0.16, *p* = 0.88).

The results thus allow us to argue that AD patients' superior interpretation of metaphors rated more familiar is fully mediated by the aptness level of these metaphors. Furthermore, the semipartial correlation between AD patients' interpretation scores and aptness ratings was 0.50, but only 0.03 for familiarity ratings, a large drop from the 0.53 obtained when familiarity ratings are considered alone. Therefore, the results cement our findings from the previous study: when considering the types of metaphors that will be best interpreted by AD patients, aptness rather than familiarity is the more important predictor. Metaphors that older adults consider more familiar will be better interpreted by AD patients, but this relationship depends on the aptness level of these metaphors. In other words, metaphors considered more familiar are better understood by AD patients because they are also inherently more apt.
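The logic of the semipartial correlations can be made concrete with a small sketch. In a two-predictor regression, the squared semipartial correlation of one predictor equals the gain in *R*<sup>2</sup> over a model containing only the other predictor. The sketch below recovers the aptness semipartial from the reported statistics: the overall *F*(2, 15) = 8.26 and the zero-order correlation of 0.53 between familiarity and interpretation scores. The function names are ours, and the calculation is an approximate check under the assumption of an ordinary least-squares model, not a reanalysis of the data.

```python
import math


def r_squared_from_f(f_value: float, k: int, df_error: int) -> float:
    """R^2 implied by the overall F test with k predictors."""
    return (f_value * k) / (f_value * k + df_error)


def semipartial_from_increment(r2_full: float, r_other: float) -> float:
    """Semipartial r of predictor A in a two-predictor model:
    sqrt(R^2_full - r_yB^2), where r_yB is the zero-order r of predictor B."""
    return math.sqrt(r2_full - r_other ** 2)


r2_full = r_squared_from_f(8.26, k=2, df_error=15)
sr_aptness = semipartial_from_increment(r2_full, r_other=0.53)
print(round(r2_full, 2), round(sr_aptness, 2))  # 0.52 0.49
```

The recovered value of about 0.49 is close to the reported 0.50; the small gap is what one would expect from working with rounded inputs. Familiarity, by contrast, adds almost nothing once aptness is in the model, which is the pattern the mediation argument rests on.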

# **GENERAL DISCUSSION**

Thus far, only studies employing a multiple choice or a match-to-target kind of test have found deterioration in figurative language comprehension in mild-to-moderate AD patients (e.g., Chapman et al., 1997; Rassiga et al., 2009). A similar pattern was also obtained by two of the other four studies investigating metaphor comprehension in AD (Winner and Gardner, 1977; Maki et al., 2012). One problem with such studies is that the task provides a literal interpretation together with the figurative one; in those circumstances, AD patients may have difficulty inhibiting the literal interpretation in order to select the figurative one. When tasks require free verbal interpretations of familiar figurative language, AD patients usually do not differ from controls (e.g., Papagno, 2001; Amanzio et al., 2008). In the present study, participants were asked to provide interpretations for different metaphors and similes, and while metaphors and similes did not differ within groups, AD patients produced overall worse figurative interpretations than did controls. Crucially, however, we also show that the pattern of impairment in AD depends both on the patient's abstraction abilities and on the aptness of the metaphor. In this section we focus on two key issues related to the pattern of figurative language performance in AD: stimulus variables, in particular aptness and familiarity, and whether the meaning of a figurative expression is constructed or stored in memory. We follow this discussion with a proposal for how aptness and abstraction abilities interact to yield metaphor interpretation.

### **APTNESS AND FAMILIARITY**

Familiarity with a figurative expression has been one of the most studied variables in both the psycholinguistics and the cognitive neuropsychology literatures (see Gibbs, 2008; Rapp and Wild, 2011). The studies by Papagno (2001) and Amanzio et al. (2008), more specifically, both employed familiar (dictionary-listed) metaphors; and in both studies, patients had no difficulty with familiar metaphors, on the assumption that their meanings could be accessed from memory. Amanzio et al. showed moreover that the patients had greater difficulty interpreting novel metaphors, on the assumption that they would need to compute a novel meaning and that this ability might be affected in AD. From these results, Amanzio et al. argued that novelty was a crucial variable for predicting performance. In the present study, we showed the importance of *aptness* for predicting metaphor interpretation. Patients' interpretations, despite being worse than the controls', were particularly affected by item variables. Furthermore, when we examined aptness and familiarity as predictors of interpretation scores, the pattern of results pointed to aptness playing a bigger role than familiarity for metaphors whose meanings require the computation of a relationship between the topic and vehicle. Finally, even when we did find a significant relationship between subjective ratings of familiarity and metaphor interpretation, this relationship was found to be fully mediated by aptness. Therefore, a familiar metaphor is understood well because it is apt.

In the world of music, a cliché question is whether a song is popular because it is good, or good because it is popular. This question has an analogous one within the world of metaphor: is a metaphor apt because it is familiar, or familiar because it is apt? (Thibodeau and Durgin, 2011). In the present study, we were unable to identify a familiar metaphor that was not considered apt, despite working initially with a cohort of 84 metaphors (see Roncero and de Almeida, 2014, for the full set and norms). In contrast, it was possible to find metaphors considered apt but not familiar. We believe this reflects a tendency, perhaps a necessity, for expressions to be apt before they are familiar because aptness more so than familiarity will breed comprehensibility. Indeed, consider an extremely inapt statement such as *Flags are dust*. One could recite this metaphor *ad nauseam* and probably never compose a meaning other than the seemingly anomalous literal one. Thus, some level of aptness is needed to give metaphors a comprehensible "lift-off" (Chiappe et al., 2003b; Roncero et al., 2006). Consistent with this idea, we found that the relationship between subjective (i.e., rated) familiarity and interpretation ability was fully mediated by aptness in the study on metaphor interpretation. Note also that while Amanzio et al. (2008) stressed the importance of item familiarity for predicting AD patients' interpretation of metaphors, the metaphors they used were also highly apt. Furthermore, it is unclear whether the abstraction abilities of the AD patients in that study were significantly lower than, or similar to, those of the control participants; we have found abstraction ability to be a key predictive variable for AD patients' interpretation of metaphors.

The results also remind us of a basic characteristic of language: comprehension is possible for statements that have never been heard before because language is compositional and systematic, and thus productive. Novel expressions (or novel combinations of familiar words) such as metaphors are understood because they hinge on our ability to compose meanings and thus evaluate the relationship between the expressions' constituents even when they are not familiar. Consider, for example, the metaphor *Deserts are ovens.* Participants consistently stated they had rarely or never heard this expression before, and it had low COCA and Internet counts, yet AD patients and normal participants almost always correctly stated its metaphorical meaning. Familiarity is probably more important for opaque relationships, i.e., those that are not deducible from the words in the expression, as is the case for idioms, where the meaning of the expression and its constituent lexical items seem to share an arbitrary relationship. In contrast, in a metaphor where the vehicle used is often selected to express a particular relationship (e.g., *Juliet is the sun*), the aptness of the expression and the abstraction abilities of the individual can be expected to play a greater role in determining how easily it is interpreted correctly. In summary, while familiarity plays a role in metaphor comprehension, as shown in previous studies, the aptness of a metaphor seems to play an even greater role.

### **ABSTRACTION, RETRIEVAL, AND CONSTRUCTION**

These variables cut across another key distinction in how the meaning of a metaphor is attained: whether it is by accessing a stored representation in memory or whether it is constructed online. As our data suggest, being able to take advantage of the qualities related to aptness depends on the extent to which the ability to abstract is preserved. If so, patients with more impaired abstraction abilities can be expected to demonstrate more difficulty constructing interpretations for metaphors. If the level of abstraction required is low, however, it is possible that even patients with more impaired abstraction abilities can demonstrate normal levels of comprehension. For example, it is easier to determine the relationship between carrots and broccoli (both are vegetables) than between music and tides (both have rhythms). In our data, we noticed that among the metaphors best interpreted by AD patients was *Deserts are ovens*, where the salient property was *hot*; a majority of the AD patients also correctly interpreted the metaphor *Hair is a rainbow* as meaning hair is *colorful*. However, neither of these metaphors had been rated apt, and both had low Internet frequency counts (*n* < 5). These findings support the argument that metaphors can be easily interpreted, despite not being very apt, when the abstraction level demanded is low, and this may be especially true when the properties needed for interpretation are concrete and sensory in nature (Aisenman, 1999). In contrast, a metaphor such as *Families are fortresses*, which would be more difficult for people with low abstraction ability, makes no reference to a specific literal property of fortresses (e.g., made of stone), and refers to more abstract concepts of security and protection.
These observations support one of our main suggestions: insofar as different metaphors require constructing meanings that reflect different abstraction levels, individuals who are better abstractors should have an easier time interpreting metaphors. Although this suggestion is made within the limited scope of our investigation with Alzheimer's patients, it points to an important aspect of metaphor—and figurative language—interpretation in general: the ability to infer intended messages from the often anomalous linguistic expressions requires a computational mechanism capable of generating properties and relations beyond linguistic denotation. We suggest that this mechanism is intrinsically associated with comprehenders' abstracting capacities.

The pattern of our results then supports the idea that abstraction plays a key role in building figurative representations, with the aptness of an expression being the most important stimulus property. Whereas Amanzio et al. (2008) argued that novelty matters, we would argue that aptness matters more. Statements are familiar because they are apt, and unfamiliar statements should be understood by AD patients when the aptness level is sufficiently high. Also, while it is difficult to compare our study to that of Amanzio et al., given the numerous methodological differences (language, materials, task, and participant variables), a critical issue seems to be the metaphors used in both studies. Amanzio et al. employed conventional Italian metaphors whose meanings were retrievable from the dictionary, and often included topics that were simply proper names (e.g., *Marco is a lion*). Metaphors that use proper names as topics only require selecting a property of the vehicle; and this process is more consistent with the need to simply retrieve an associated property (perhaps a typical property of lions) to use in the predication. In contrast, the familiar metaphors used in the present study never used topics that were proper names; thus, even for familiar metaphors, patients needed to construct a particular relationship between the topic and vehicle to attain an interpretation. With regard to the computations required for interpretation, then, our set of metaphors could be taken as similar to the novel metaphors used by Amanzio et al.: rather than being retrieved from memory, their meanings had to be constructed, based on the properties triggered by the vehicle that could be predicated of the topic. This process can be expected to be easier for all participants when the statements are apt.

Finally, these results also reinforce a key contrast between ("unfrozen") metaphors and other ("frozen") figurative language types. Whereas patients can rely more on retrieval processes for familiar expressions that have a specific associated meaning, many common metaphors require patients to deploy abilities that allow them to construct an appropriate interpretation. As we have seen, most studies on proverbs, idioms, sarcasm, and metaphor suggest that AD patients' primary difficulty with figurative language seems to lie in inhibiting a literal interpretation when it is available (e.g., in multiple-choice tasks). But when there is no competition from a literal interpretation and when expressions are familiar, AD patients often did not differ significantly from healthy controls. For less familiar expressions, however, AD patients have shown a reduced ability compared to controls, suggesting an impaired ability to build new meanings. The present study found, however, that the ability to build new meanings is associated with the capacity to abstract away from literal meaning (compute related predicates) and with the aptness of the expression.

# **INTERPRETING METAPHORS IN ALZHEIMER'S**

While we have shown evidence for the role of both abstraction as a participant variable and aptness as a stimulus variable in metaphor comprehension, it is not yet clear how the two interact to yield a successful interpretation. As measured by the WAIS-IV similarities subtest, abstraction is a process required to go beyond word meanings in search of properties that can account for how two referents might be related. While any two referents can be related (e.g., two things that are concrete or co-exist on Earth), finding appropriate relations requires an examination of which properties *P* can be predicated of any two given objects *x* and *y* such that *P*(*x*) and *P*(*y*) can be deemed true. Aptness is the variable that facilitates this process in metaphor comprehension. An apt metaphor is one in which sets of properties about the vehicle can be attributed to the salient properties of the topic. The process of finding which properties of the vehicle can be predicated of the topic relies on accessing sets of predicates in memory (e.g., [*ruthless*[*shark*]], [*carnivore*[*shark*]]) or building them anew ([*sneaky*[*shark*]]) and applying them to the topic ([*ruthless*[*lawyer*]], [*sneaky*[*lawyer*]]) to yield an interpretation of the metaphor. Furthermore, this process can interact with the progression of AD. Over the course of the disease, as semantic memory deteriorates progressively (Chertkow and Bub, 1990; Mårdh et al., 2013), less familiar information is expected to be "lost" first (Laisney et al., 2011) and semantic categories are expected to become increasingly prototypical over time (Chertkow and Bub, 1990; Laisney et al., 2011; Mårdh et al., 2013). Therefore, metaphors whose interpretations rely on less salient properties of topic-vehicle relations can be expected to engender greater difficulty.
Apt metaphors, in contrast, often rely on vehicles whose properties can be easily predicated of the topic (Glucksberg, 2003; Jones and Estes, 2006) and may remain comprehensible for AD patients because their properties might still be available in semantic memory.

We would thus argue that patients who are better abstractors (i.e., closer to or at normal abstraction ability) can interpret even unfamiliar metaphors when these are apt. We suggest that they are capable of doing so because they can compute not only the literal meanings of words but also search for the vehicle predicates that make a predication about the topic appropriate. It is also worth noting that the interpretation errors made by AD patients in the metaphor interpretation study were either geared toward an alternative interpretation—neither compatible with the metaphor nor literal—or simply revealed an inability to produce an interpretation, thus replicating Papagno's (2001) results. These results suggest that patients recognized that the expressions presented to them were false if interpreted literally, but may have lost the ability to search for the predicates that would enable them to construct an alternative interpretation. For example, patients who were unable to interpret a metaphor would typically respond with statements such as "that makes no sense, alcohol can't be a crutch!" Therefore, these patients have retained an ability to recognize whether a statement is true of the world (i.e., that metaphors are literally false), but have difficulty making the correct abstraction.

### **CONCLUSION**

Davidson (1978) proposed that metaphors invite us to appreciate some fact rather than expressing it overtly. Although families are not literally fortresses, the metaphor does call our attention to what fortresses can possibly predicate of families. Overall, our study suggests that the capacity to appreciate a metaphor is available to patients with AD when they can perform the abstractions needed to go beyond literal meaning and when the metaphors have a sufficient level of aptness.

### **ACKNOWLEDGMENTS**

This study was supported by a graduate fellowship from the Alzheimer's Society of Canada and Fonds de recherche du Québec - Nature et technologies (FQRNT) to Carlos Roncero, and by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Roberto de Almeida. We thank Leah and Marnie McQuire for their assistance collecting norming data from normal elderly controls; Melissa Hindley and Vanesso Manco, who assisted with the collection of data from AD patients; and Marco de Caro for serving as judge in the interpretation scores. We also thank the staff at Sunrise Senior Living community of Beaconsfield and the Alzheimer's Society of Montreal for facilitating the collection of data from participants. We are indebted in particular to Raffaela Cavaliere, Pascale Godbout, and Linda Banks at the Alzheimer's Society of Montreal. Finally, we thank Shelley Solomon for scheduling the norming sessions with participants in the subjective familiarity ratings study.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00973/abstract

### **REFERENCES**


anatomic and cognitive correlates in neurodegenerative disease. *Neuroimage* 47, 2005–2015. doi: 10.1016/j.neuroimage.2009.05.077


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 April 2014; accepted: 14 November 2014; published online: 02 December 2014.*

*Citation: Roncero C and de Almeida RG (2014) The importance of being apt: metaphor comprehension in Alzheimer's disease. Front. Hum. Neurosci. 8:973. doi: 10.3389/fnhum.2014.00973*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Roncero and de Almeida. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Metaphors are physical and abstract: ERPs to metaphorically modified nouns resemble ERPs to abstract language

#### **Bálint Forgács<sup>1,2,3</sup>\*, Megan D. Bardolph<sup>1</sup>, Ben D. Amsel<sup>1</sup>, Katherine A. DeLong<sup>1</sup> and Marta Kutas<sup>1</sup>**

<sup>1</sup> Kutas Cognitive Electrophysiology Lab, Department of Cognitive Science, University of California, San Diego, CA, USA

<sup>2</sup> Cognitive Development Center, Department of Cognitive Science, Central European University, Budapest, Hungary

<sup>3</sup> Laboratoire Psychologie de la Perception, Université Paris Descartes, Paris, France

### **Edited by:**

Vicky T. Lai, University of South Carolina, USA

### **Reviewed by:**

Christelle Declercq, Université de Reims Champagne-Ardenne, France

Roel M. Willems, Donders Institute for Brain, Cognition and Behaviour, Netherlands

### **\*Correspondence:**

Bálint Forgács, Laboratoire Psychologie de la Perception, Université Paris Descartes, 45, Rue des Saints-Pères, 75006 Paris, France e-mail: balint.forgacs@parisdescartes.fr

Metaphorical expressions very often involve words referring to physical entities and experiences. Yet, figures of speech such as metaphors are not intended to be understood literally, word-by-word. We used event-related brain potentials (ERPs) to determine whether metaphorical expressions are processed more like physical or more like abstract expressions. To this end, novel adjective-noun word pairs were presented visually in three conditions: (1) Physical, easy to experience with the senses (e.g., "printed schedule"); (2) Abstract, difficult to experience with the senses (e.g., "conditional schedule"); and (3) novel Metaphorical, expressions with a physical adjective, but a figurative meaning (e.g., "thin schedule"). We replicated the N400 lexical concreteness effect for concrete vs. abstract adjectives. In order to increase the sensitivity of the concreteness manipulation on the expressions, we divided each condition into high and low groups according to rated concreteness. Mirroring the adjective result, we observed an N400 concreteness effect at the noun for physical expressions with high concreteness ratings vs. abstract expressions with low concreteness ratings, even though the nouns per se did not differ in lexical concreteness. Paradoxically, the N400 to nouns in the metaphorical expressions was indistinguishable from that to nouns in the literal abstract expressions, but only for the more concrete subgroup of metaphors; the N400 to the less concrete subgroup of metaphors patterned with that to nouns in the literal concrete expressions. In sum, we not only find evidence for conceptual concreteness separable from lexical concreteness but also that the processing of metaphorical expressions is not driven strictly by either lexical or conceptual concreteness.

**Keywords: metaphor, figurative language, ERPs, N400, concreteness effect, abstract-concrete, novel expressions**

# **INTRODUCTION**

Metaphors are pervasive in everyday language, arguably being much more than mere rhetorical or poetic tools, possibly even serving as key instruments of linguistic change and innovation (Bréal, 1900). The high frequency of metaphors in natural language is taken by some to reflect the underlying metaphorical nature of the conceptual system. In their cognitive metaphor theory, Lakoff and Johnson (1980) propose that abstract target domains (e.g., *mind*) are structured and grounded via systematic mappings from concrete source domains (e.g., *containers*), thereby establishing conceptual metaphors (THE MIND IS A CONTAINER) which support everyday metaphorical expressions (e.g., "He couldn't get the movie out of his head"). The term that refers to the source domain is also called the *vehicle*; it carries the proposition that is stated about the *topic*, the term that in turn refers to the concept of the target domain. Even though the source domains are concrete, they are not intended for literal interpretation. For instance, the expressions "thick book" and "steamy book" are both noun phrases comprising an adjective evoking a physical property, followed by a noun. In the first case, the expression as a whole is understood literally, as an object with the physical property of thickness. In contrast, "steamy book" is not understood literally as a tome emitting steam but rather figuratively as a salacious romantic novel.

Embodied cognition (e.g., Lakoff and Johnson, 1999), however, argues the opposite, namely that metaphors are understood via the parallel co-activation of neural structures representing and/or processing physical properties (i.e., "steaminess" in the above example). Consistent with this hypothesis, Desai et al. (2011) found that metaphorical sentences involving physical motion ("The public grasped the idea") were associated with fMRI activations of the left anterior inferior parietal lobe, a secondary sensorimotor area, just like sentences involving literal physical motion ("The girl grasped the flowers"). Moreover, since metaphors activated the left middle superior temporal sulcus similarly to abstract paraphrases ("The public understood the idea"), both an abstract and physical component are implicated. Less strict theories of embodied language processing (e.g., Binder and Desai, 2011) suggest that only novel expressions activate sensorimotor regions, and familiar expressions and/or familiar contexts rely on more abstract representations. This proposal resonates with language processing models such as the graded salience hypothesis (Giora, 1997, 2003) or the coarse semantic coding theory (Beeman, 1998; Jung-Beeman, 2005) that predict different processes for novel expressions (i.e., by the right hemisphere), regardless of figurativeness, and for conventional expressions (i.e., by the left hemisphere) as a result of their salient meaning and/or high degree of association. These potential differences between comprehension of conventional and novel metaphors fall outside the scope of the current inquiry, since we focus solely on relatively novel expressions.

Our aim in this report is to better understand the role that physical (or concrete, as we will also refer to them) properties of individual words (adjectives) and/or concepts (expressed by adjective-noun pairs) play in novel metaphor comprehension in real-time. To that end, we employ an online methodology that permits moment-by-moment examination of the metamorphosis from concrete, literal language into metaphorical, emergent concepts. Specifically, we recorded event-related brain potentials (ERPs)—a method enabling not just quantitative, but also qualitative, comparisons of the neural processing related to linguistic phenomena as they unfold in time.

ERP studies of metaphor processing are often centered on the N400 ERP component. The visual N400 (Kutas and Hillyard, 1980) is a negative-going centroparietally maximal ERP component peaking approximately 400 ms after stimulus onset, sensitive to the ease or difficulty of semantic memory access (for a review, see Kutas and Federmeier, 2011). Although the N400 is sensitive to a wide variety of factors that vary at the single-word level (e.g., concreteness, frequency, orthographic neighborhood size, repetition), the effects of top-down contextual information generally outweigh those of bottom-up information.

Several ERP studies have reported larger N400s to words appearing in metaphorical (e.g., "power is a strong *intoxicant*") vs. literal ("whiskey is a strong *intoxicant*") expressions (e.g., Pynte et al., 1996; Coulson and Van Petten, 2002, 2007; Tartter et al., 2002; Arzouan et al., 2007; Lai et al., 2009; Lai and Curran, 2013); overall, however, results have been inconsistent. Conventional metaphors (e.g., "broken heart") have a fixed figurative meaning, and might be stored as lexical units (Jackendoff, 1997). Perhaps as a consequence they have been found to be processed faster and more accurately than novel metaphors (e.g., "rusty moves"), for which figurative meaning needs to be computed on-line (e.g., Faust and Mashal, 2007; Forgács et al., 2014). Investigations of novel metaphors, however, differ considerably in their details. Pynte et al. (1996), for example, modified the *topics* of conventional metaphors ("fighters" in "Those fighters are lions"), not the *vehicles* ("lions") that carry the figurative meaning; in Tartter et al. (2002), sentence-final words were not identical across conditions, leading to differences in frequency and cloze probability; Coulson and Van Petten (2002, 2007) controlled for the cloze probability of sentence-final words, but not for the novelty and complexity of the expressions themselves (for further concerns with their stimuli see Lai et al., 2009).

Lai et al. (2009) carried out a well-controlled study, using a mixture of noun-, adjective- and verb-based metaphors in sentences of varying complexity. They showed that while conventional and novel metaphors both elicited larger amplitude negativities relative to literal sentences early in the N400 time window (320–440 ms), processing of conventional metaphors converged with that of the literal sentences, whereas novel metaphors continued to be treated more like anomalous sentences. They attributed the sustained negativity (between 440–560 ms) elicited by novel metaphors to semantic integration processes.

Figurative language also has been studied using semantically linked word pairs that constitute relatively minimal linguistic contexts. Arzouan et al. (2007) compared literal, conventional metaphoric, novel metaphoric, and unrelated two-word expressions by manipulating the first word while matching the second word on several psycholinguistic measures. They found that the N400 to the second word monotonically increased from literal, to conventional metaphorical, to novel metaphorical, to unrelated pairs. They also found differences in scalp topography and timing that suggest qualitative differences between the processing of conventional and novel metaphorical expressions; specifically they suggested that a late negative wave (between 550–880 ms) reflects secondary semantic integration, specific to novel metaphors. However, when novel metaphors are compared to conventional literal expressions or to sentences, novelty and figurativeness are confounded; hence the source of the effect is not clear. Comparing novel metaphors to conventional metaphors is not an optimal solution; firstly because it is, in essence, a manipulation of language conventionality; and secondly, and perhaps more importantly, there might be different processes involved in comprehending the two (cf., Bowdle and Gentner, 2005; Forgács et al., 2012). Together, these studies demonstrate, nevertheless, that metaphoricity influences real time language processing within the same time window (i.e., 200–900 ms post stimulus onset) as many other semantic factors (Kutas and Federmeier, 2011).

The N400 time window of the ERP is likewise sensitive to the concrete-abstract dimension of words, which might play a key role in the creation and comprehension of metaphors, which often involve mapping between an abstract (target) and a more concrete (source) concept. After controlling for potential confounding factors between concrete and abstract words, recent work shows behavioral processing advantages for abstract words (Kousta et al., 2011; Barber et al., 2013). ERP studies also indicate rapid differential processing of concrete and abstract words by about 300 ms post stimulus onset (e.g., Kounios and Holcomb, 1994; Holcomb et al., 1999; West and Holcomb, 2000; Lee and Federmeier, 2008; Barber et al., 2013; for a summary see Kutas et al., 2006). ERP concreteness effects are typically characterized as *greater negativity to concrete words* relative to abstract words starting within the N400 time window. These differences sometimes extend into a later time window (500–800 ms), where they typically exhibit more anterior, and more right lateralized, scalp topographies (e.g., West and Holcomb, 2000). These potentially separable electrophysiological constituents of the concreteness effect are consistent with Paivio's (2007) dual coding theory, under which there are two semantic systems: a linguistic one that encodes both abstract and concrete words, and a non-verbal imagistic system that encodes only concrete words. On this theory, concrete words enjoy a processing advantage because they activate dual representations and tap into neural resources in both the linguistic and imagistic systems.

Concreteness effects have been observed in weakly constraining sentence contexts (e.g., West and Holcomb, 2000), as well as in single word contexts. Swaab et al. (2002) presented abstract and concrete words following (un)related prime words, and observed canonical N400 priming effects (larger amplitude N400s to words preceded by unrelated relative to related word primes) for both concrete and abstract words. However, they also found topographic differences between the abstract and concrete words, and an enhanced frontal negativity for concrete words, regardless of prime relatedness, consistent with structural and/or qualitative processing differences. It appears that in contrast to sentential contexts that eliminate topographic differences (Holcomb et al., 1999), single word contexts do not suffice to override qualitative ERP concreteness effects. Nonetheless, it seems that context also may have some bearing on the elicitation of concreteness effects.

Whereas minimal context ERP studies have typically manipulated concreteness by presenting different sets of concrete and abstract words that thus could differ on any number of other factors, Huang et al. (2010) cleverly relied on different adjective-same noun combinations to manipulate whether a given noun was modified in a concrete or abstract fashion. They conducted a divided visual field ERP study in which polysemous nouns (e.g., "book") were presented in the left and right visual fields (LVF, RVF), modified either by abstract adjectives ("interesting") or concrete adjectives ("thick"). Following flashes to the RVF (left hemisphere), concretely modified nouns ("thick book") evoked reduced N400s (300–500 ms) relative to abstractly modified nouns ("interesting book"); this is the reverse of the canonical ERP concreteness effect. The authors suggested that concrete adjectives (which themselves evoked the canonical concreteness effect) established a more constraining context than abstract adjectives, and the resulting increased expectancy led to reduced N400s. Following LVF (right hemisphere) presentation, concrete (vs. abstract) expressions evoked a sustained negativity over frontal electrode sites only in a later 500–900 ms time window, consistent with previously reported qualitative processing dissociations, and therefore with some versions of the dual-coding theory. Based on the results of the two-word studies of Swaab et al. (2002) and Huang et al. (2010), the canonical (context-driven) N400 expectancy effect observed in published metaphor studies might be independent of the lexical concreteness effect seen in the same window, as the two effects seem to go in opposite directions.

To sum up, metaphorical expressions very often rely on physical expressions denoting concrete source domains to describe abstract target domains. Whereas figurative meaning clearly goes beyond the sum of its parts (i.e., the physical senses of constituent words), it is less clear to what extent (and when) the physical senses of constituent concrete words impact immediate processing of metaphorical expressions. Electrophysiological studies of metaphor processing generally show *smaller* amplitude N400s to literal relative to metaphorical expressions. In contrast, electrophysiological studies with centrally presented single words or expressions typically report a *greater* negativity within the N400 time window (and sometimes beyond) to more concrete relative to more abstract words. Against this background literature, we set out to assess whether metaphorical expressions created by combining physical adjectives that do not literally modify nouns (e.g., "sticky meeting") are processed more like concrete or abstract adjective-noun expressions.

We adopted the word pair paradigm of Huang et al. (2010) in which different adjectives are combined with the same noun to rule out any potential lexical differences between target stimuli. Given that familiarity can mediate between concreteness and context effects (Levy-Drori and Henik, 2006), we limited our exploration to novel metaphorical adjective-noun word pairs, thereby ruling out conventional metaphors that might be stored in the lexicon, and thus invoke different processing. We compared and contrasted the following conditions, for which individual stimulus items were formed by combining three different adjectives with the same noun: (1) Abstract Literal (AL) expressions which were comprised of an abstract adjective + noun (e.g., "conditional schedule"); (2) Concrete Literal (CL) expressions which were comprised of a concrete adjective + noun (e.g., "printed schedule"); and (3) Metaphorical (MET) expressions which were comprised of a different concrete adjective + noun (e.g., "thin schedule") that were likely to be interpreted metaphorically as they could not sensibly be interpreted literally (See **Table 1** for additional representative stimuli).



At issue was whether processing of MET expressions would be driven (1) by the concrete (physical) nature of the adjective (e.g., "thin")—in which case ERPs to the MET nouns would mimic those to the CL nouns; (2) by the non-literal abstract nature of the noun phrases—in which case ERPs to the MET nouns would mimic those to the AL nouns; or (3) by the non-literal, metaphorical nature of the noun phrase interpretation (e.g., "thin schedule")—in which case ERPs to the MET nouns would differ from those to both the CL and AL nouns, eliciting the largest N400 and/or late negativity as in most ERP studies of novel metaphors.

We consider several potential outcomes in the N400 window (300–500 ms) of the adjective as well as the noun. We will first inspect the ERPs elicited by the adjectives to obtain a lexical concreteness effect baseline. We expect to see larger N400s to concrete adjectives (easily experienced with the senses) in both the CL and MET conditions compared to the abstract adjectives (not easily experienced with the senses) in the AL condition. If the concreteness of the adjective drives the processing and interpretation of the noun phrase, then we expect to see an N400 concreteness effect at the noun such that the CL expressions, and also the MET expressions, exhibit larger N400s than AL expressions (CL = MET > AL). Conversely, if it is the abstractness of the emergent concept to which the noun phrase refers rather than the abstractness of the adjective *per se* that drives processing and interpretation (such that the MET noun is processed as if it followed an abstract adjective) then the MET and AL nouns would elicit equivalently reduced N400 amplitudes (CL > MET = AL). If, however, the system is concurrently sensitive to emergent concreteness and to the distinction between abstract concepts that are literal and those that are metaphorical, then the N400 to metaphors may be even larger than the N400 for abstract expressions (MET > AL) due to increased processing demands of understanding a novel expression formed by an adjective referring to a physical trait that a noun cannot literally possess. This outcome would converge with the ERP metaphor literature, and with the differential sensitivity of the N400 to independent factors of concreteness and ease of processing/expectancy.

# **MATERIALS AND METHODS**

### **STIMULI**

To create the two-word expressions used in the ERP study, each of 212 nouns was combined with 3 different adjectives to form 636 novel word pairs. The nouns were polysemous in that metaphorical (MET), concrete literal (CL) or abstract literal (AL) expressions could result from modification by the different adjectives. Examples of the stimuli can be seen in **Table 1**.

The AL word pairs consisted of abstract adjectives modifying nouns to form expressions referring to abstract concepts. In the CL and MET conditions, adjectives were concrete, but in the MET condition adjectives modified nouns in a nonliteral manner: 43% of the adjectives were shared across these two conditions. CL expressions referred to entities easily experienced by the senses, whereas AL and MET expressions referred to entities not easily experienced with the senses. Word pairs were designed to be meaningful but novel, with novelty controlled for by corpus measures. All word pairs appeared 4 times or less in the BNC and the probability of a noun following an adjective was less than 0.01. Semantic relatedness between constituents of expressions was low, as measured by Latent Semantic Analysis (LSA) (*M* = 0.11, *SD* = 0.11).
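The two corpus-based novelty criteria can be written as a simple filter. The following is only a minimal sketch, not the authors' actual procedure: the function name `is_novel_pair`, its parameters, and the example counts are hypothetical, and estimating the transitional probability as pair count divided by adjective count is an assumption, since the text does not specify the estimator. Only the two thresholds (at most 4 occurrences, probability below 0.01) come from the text.

```python
def is_novel_pair(pair_count, adjective_count,
                  max_pair_count=4, max_transition_prob=0.01):
    """Return True if an adjective-noun pair meets both novelty criteria.

    pair_count:      corpus occurrences of the adjective followed by the noun
    adjective_count: corpus occurrences of the adjective overall
    Assumption: P(noun | adjective) is estimated as
    pair_count / adjective_count (not specified in the text).
    """
    if adjective_count <= 0:
        # An unattested adjective gives no evidence that the pair is familiar.
        return pair_count == 0
    transition_prob = pair_count / adjective_count
    return pair_count <= max_pair_count and transition_prob < max_transition_prob


# Hypothetical counts, for illustration only.
rare_pair = is_novel_pair(pair_count=2, adjective_count=5000)
frequent_pair = is_novel_pair(pair_count=40, adjective_count=5000)
print(rare_pair, frequent_pair)
```

Note that a pair can pass the raw-count criterion and still fail the probability criterion when the adjective itself is rare, which is why both checks are applied.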

To ensure that stimuli were consistent with the definitions above, all word pairs were rated in an online norming study by 90 UCSD students not participating in the ERP study. Word pairs were rated along three dimensions (concreteness, literalness, and meaningfulness) on seven-point Likert scales (1: not at all; 7: completely). The three tasks assigned randomly to individual word pairs were: (1) "How easy is it to experience with the senses?"; (2) "How literal is it?"; and (3) "How meaningful is it?" We chose a literalness rating in order to avoid the explanation or definition of "metaphorical" and/or "figurative", suspecting that it might be easier to determine whether something is meant literally than figuratively. Each participant saw every word pair but rated individual expressions along only one dimension. Across participants, all word pairs were rated for all dimensions. Pairs in which the CL expression was rated more abstract than the AL expression, or for which the MET was rated more literal than the CL or AL expressions, were excluded. Of the 212 normed items, the two rated least meaningful were discarded. Of the remaining 210 items, the half (105) rated most meaningful (and most literal and most metaphorical for the corresponding conditions) were used as stimuli, with the rest assigned to be fillers. Item norming statistics are summarized in **Table 2**. Using the same target nouns in each condition ensured that noun lexical factors were identically matched (i.e., no differences in terms of frequency, length or other psycholinguistic measure).
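The exclusion rule applied to the norming data can be sketched as a predicate over the mean ratings of the three expressions built on one noun. This is an illustrative reading rather than the authors' code: the function name `keep_item`, the dictionary layout, and the example ratings are hypothetical, and the sketch assumes that a MET expression rated more literal than *either* literal expression triggers exclusion.

```python
def keep_item(concreteness, literalness):
    """Apply the norming exclusion rule to one noun's three expressions.

    concreteness, literalness: dicts with keys "AL", "CL", "MET" holding
    mean ratings on the 1-7 scale.
    Assumption: MET rated more literal than either CL or AL excludes the item.
    """
    # CL rated more abstract (i.e., less concrete) than AL: exclude.
    if concreteness["CL"] < concreteness["AL"]:
        return False
    # MET rated more literal than a literal expression: exclude.
    if literalness["MET"] > min(literalness["CL"], literalness["AL"]):
        return False
    return True


# Hypothetical ratings, for illustration only.
good = keep_item({"AL": 2.5, "CL": 5.8, "MET": 3.0},
                 {"AL": 5.5, "CL": 6.2, "MET": 2.4})
bad = keep_item({"AL": 4.9, "CL": 4.1, "MET": 3.0},
                {"AL": 5.5, "CL": 6.2, "MET": 2.4})
print(good, bad)
```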

A one-way ANOVA revealed significant differences between conditions with respect to concreteness, *F*(2,312) = 162.3, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.51, literalness, *F*(2,312) = 387.9, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.71, and meaningfulness, *F*(2,312) = 114, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.42. Levene's test of equality of error variances was significant for concreteness, *F*(2,312) = 7.84, *p* < 0.001, and meaningfulness, *F*(2,312) = 5.04, *p* < 0.01, and there was a strong trend for literalness, *F*(2,312) = 3, *p* = 0.051. Therefore, the Tamhane *post hoc* test was used for pairwise comparisons. All conditions were significantly different from each other in concreteness (*p* < 0.001), and literalness (*p* < 0.05), while in meaningfulness AL and CL expressions were not significantly different, with only MET differing from the other conditions (*p* < 0.001, although all conditions were still above 4, the middle of the scale used).
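The reported effect sizes can be checked against the *F* values, because partial eta squared is recoverable from *F* and its degrees of freedom as *F*·df<sub>effect</sub> / (*F*·df<sub>effect</sub> + df<sub>error</sub>). The short sketch below, whose function name `partial_eta_squared` is our own, also shows where the error degrees of freedom come from, assuming the ANOVA was run over the 105 items in each of the 3 conditions.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


n_conditions, n_items = 3, 105
df_effect = n_conditions - 1                       # 2
df_error = n_conditions * n_items - n_conditions   # 315 - 3 = 312

reported_f = {"concreteness": 162.3, "literalness": 387.9, "meaningfulness": 114.0}
effect_sizes = {name: round(partial_eta_squared(f, df_effect, df_error), 2)
                for name, f in reported_f.items()}
print(effect_sizes)
# {'concreteness': 0.51, 'literalness': 0.71, 'meaningfulness': 0.42}
```

The recomputed values match the effect sizes reported in the text.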

Participants read each adjective-noun pair followed by a probe word that was either related or unrelated to the two-word phrase. Examples of stimuli and related probe words are shown in **Table 3**.



**Table 3 | Example stimuli and related probe words**.


### **ERP PARTICIPANTS**

Forty-two UCSD volunteers (18 females) participated for course credit or were compensated at $7/h. Participants were right-handed, native English speakers with normal or corrected-to-normal vision, ranging from 18–29 years old (*M* = 21). Of the 42 participants, 7 were excluded from further analysis due to excessive eye blink or movement artifacts, leaving 35 participants in the analyses reported below.

### **PROCEDURE**

The experiment was conducted according to human subject protocols approved by the University of California, San Diego Institutional Review Board. All participants provided their informed consent in writing before participating in the experiment. ERPs were recorded in a single session in a sound-attenuated, electrically shielded chamber. Participants sat one meter in front of a CRT monitor and read adjective-noun pairs followed by probe words. Participants used two hand-held buttons to indicate whether the probe word (e.g., "leader") was related to the adjective-noun pair (e.g., "respected person"). Importantly, this technique encouraged participants to comprehend the novel metaphorical expressions in a figurative rather than a literal sense. Response hand was counterbalanced across participants and lists. Stimuli were centrally presented in white Arial 26-point font on a black background. Participants completed 6 blocks of 35 items each with short breaks between them. Each trial started with a blank screen (1000 ms), followed by a fixation cross "+" (1000 ms). The adjective appeared centrally for 200 ms, followed by a 300 ms blank screen, followed by the noun for 200 ms, followed by a 1500 ms blank screen, and finally a probe word appeared for 200 ms. After 800 ms following the probe onset, a question mark "?" was displayed until participants responded with a button press. A small red dot was presented centrally and slightly below the text throughout the trials, except during the question mark and the first 1000 ms blank screen; participants were instructed not to blink when it was present. Participants saw all 105 target nouns once, and each was paired with a single adjective once, resulting in 35 items from each condition, along with 105 filler expressions. Items were arranged in 5 different lists to avoid order effects. Each of the 5 lists was separated into 3 sublists so that each noun was paired with all 3 adjectives across participants.
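The trial structure fixes when each event occurs relative to adjective onset, which is the time-locking point used in the ERP analyses. As a rough sketch (the event labels and the helper `onsets_relative_to` are our own naming; the durations are those given above), the onsets can be derived as follows:

```python
# Durations in ms, in presentation order, as described in the procedure.
TRIAL_EVENTS = [
    ("blank", 1000),
    ("fixation", 1000),
    ("adjective", 200),
    ("blank_after_adjective", 300),
    ("noun", 200),
    ("blank_after_noun", 1500),
    ("probe", 200),
]


def onsets_relative_to(events, anchor):
    """Return each event's onset in ms relative to the anchor event's onset."""
    onsets, elapsed = {}, 0
    for name, duration in events:
        onsets[name] = elapsed
        elapsed += duration
    return {name: onset - onsets[anchor] for name, onset in onsets.items()}


onsets = onsets_relative_to(TRIAL_EVENTS, "adjective")
question_mark_onset = onsets["probe"] + 800  # "?" appears 800 ms after probe onset

# A 300-500 ms post-noun window, expressed relative to adjective onset.
noun_n400_window = (onsets["noun"] + 300, onsets["noun"] + 500)

# Design check: 6 blocks of 35 trials = 105 targets + 105 fillers.
total_trials = 6 * 35

print(onsets["noun"], noun_n400_window, question_mark_onset, total_trials)
```

The noun therefore appears 500 ms after adjective onset, so a 300–500 ms post-noun window corresponds to 800–1000 ms post-adjective onset.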

### **EEG RECORDING**

The electroencephalogram (EEG) was recorded from 26 electrodes arranged geodesically in an Electro-cap, each referenced online to an electrode over the left mastoid. Blinks and eye movements were monitored from electrodes placed on the outer canthi and under each eye, also referenced to the left mastoid. Electrode impedances were kept below 5 kΩ. The EEG was amplified with Grass amplifiers with a band pass of 0.01–100 Hz and was continuously digitized at a sampling rate of 250 samples/second.

### **DATA ANALYSIS**

Trials contaminated by eye movements, excessive muscle activity, or amplifier blocking were rejected off-line before averaging: these trials (8.3% for MET, 10.6% for CL, and 10.1% for AL) were excluded from further analysis. Data were re-referenced off-line to the algebraic mean of the left and right mastoids and averaged for each experimental condition, time-locked to adjective onsets. ERPs were computed for epochs extending from 500 ms pre- to 1500 ms post-adjective onset, using a pre-stimulus baseline of 500 ms. Since we were interested in the processing of the two word adjective-noun expression, we baseline corrected only prior to the adjective, practically treating the two-word combination as one experimental unit.
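To make the epoching and baseline step concrete, the sketch below converts epoch times to sample indices at the 250 samples/second rate and subtracts the mean of the 500 ms pre-adjective interval from a single-channel epoch. It is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the epoch is a plain list of microvolt values, and the example data are synthetic.

```python
from statistics import mean

SAMPLING_RATE_HZ = 250   # 4 ms per sample
EPOCH_START_MS = -500    # epoch begins 500 ms before adjective onset
EPOCH_END_MS = 1500      # and ends 1500 ms after it


def ms_to_sample(t_ms):
    """Index within the epoch of the sample recorded at time t_ms."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the pre-adjective interval from every sample."""
    n_baseline = ms_to_sample(0)  # samples before adjective onset
    baseline = mean(epoch[:n_baseline])
    return [value - baseline for value in epoch]


n_samples = ms_to_sample(EPOCH_END_MS)  # length of one epoch in samples

# Synthetic single-channel epoch: a 3 µV offset before onset, 1 µV afterwards.
raw_epoch = [3.0] * ms_to_sample(0) + [1.0] * (n_samples - ms_to_sample(0))
corrected = baseline_correct(raw_epoch)
print(n_samples, corrected[0], corrected[-1])
```

Because the baseline is taken only before the adjective, any slow drift between adjective and noun remains in the noun window, which is the sense in which the two-word expression is treated as one unit.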

ANOVAs were used to analyze mean amplitude ERPs over 6 medial central electrodes (MiCe, MiPa, RMCe, LMCe, LMFr, RMFr) where concreteness effects in the N400 time window are commonly observed: these were the same electrode sites over which adjective concreteness effects were assessed to determine inclusion in statistical analyses. Based on the literature, we analyzed concreteness effects in the following time windows: (1) an adjective N400 time window (300–500 ms post-adjective onset); and (2) a noun N400 time window (300–500 ms post-noun onset).
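The dependent measure, mean amplitude in a time window at each of the six electrodes, can be sketched as follows. This is an illustration only: the data layout (a dictionary mapping electrode labels to lists of microvolt values for an epoch starting 500 ms before adjective onset), the helper names, the half-open treatment of the window's end point, and the synthetic values are all assumptions.

```python
from statistics import mean

SAMPLING_RATE_HZ = 250
EPOCH_START_MS = -500
NOUN_ONSET_MS = 500  # adjective (200 ms) + blank (300 ms)

ELECTRODES = ["MiCe", "MiPa", "RMCe", "LMCe", "LMFr", "RMFr"]


def to_index(t_ms):
    """Sample index within the epoch for a time relative to adjective onset."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def mean_amplitude(erp, start_ms, end_ms):
    """Mean amplitude per electrode over [start_ms, end_ms) (end excluded)."""
    first, last = to_index(start_ms), to_index(end_ms)
    return {ch: mean(erp[ch][first:last]) for ch in ELECTRODES}


# Synthetic ERP: flat at 0 µV except for a -2 µV deflection 800-1000 ms
# after adjective onset (i.e., 300-500 ms after noun onset).
erp = {}
for ch in ELECTRODES:
    samples = [0.0] * 500
    samples[to_index(800):to_index(1000)] = [-2.0] * 50
    erp[ch] = samples

adjective_n400 = mean_amplitude(erp, 300, 500)
noun_n400 = mean_amplitude(erp, NOUN_ONSET_MS + 300, NOUN_ONSET_MS + 500)
print(mean(adjective_n400.values()), mean(noun_n400.values()))
```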

# **RESULTS**

### **BEHAVIORAL RESULTS**

Overall response accuracy (*M* = 88%, SD = 6%) for button presses indicating whether or not the probe was related to the word pair suggested that the word pairs were read for comprehension.

### **ERP RESULTS**

Average ERPs for all 35 participants are shown in **Figure 1**.

### **Adjective N400 (300–500 ms post-adjective onset)**

An ANOVA with 3 levels of word type and 6 levels of electrode location revealed a main effect of word type, *F*(2,68) = 10.65, *p* < 0.001, with CL and MET adjectives showing greater N400 mean amplitude (−1.78 µV and −2 µV, respectively) than AL adjectives (−0.67 µV). Planned pairwise comparisons indicated that the mean amplitudes for CL and AL adjectives and for MET and AL adjectives were significantly different (*p* < 0.001).
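The degrees of freedom reported for the word type effect follow from the within-participant design. For a repeated-measures factor with *k* levels tested on *n* participants, the effect has *k* − 1 degrees of freedom and its error term has (*k* − 1)(*n* − 1). The helper below (`rm_anova_df` is a hypothetical name) is a quick sanity check, assuming no sphericity correction is applied to the degrees of freedom.

```python
def rm_anova_df(n_participants, n_levels):
    """Degrees of freedom (effect, error) for a within-participant main effect."""
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_participants - 1)
    return df_effect, df_error


three_level = rm_anova_df(n_participants=35, n_levels=3)
two_level = rm_anova_df(n_participants=35, n_levels=2)
print(three_level, two_level)
```

With 35 participants, a three-level word type factor yields (2, 68), and a two-level factor would yield (1, 34).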

Our AL and CL conditions were based on adjective-noun pair concreteness ratings. However, to ensure that these labels also matched the concreteness of the adjectives alone, we obtained concreteness ratings for adjectives in the CL and AL conditions from Brysbaert et al. (2013). For 46 items not found in the database, concreteness ratings were collected from 7 UCSD undergrad students who did not participate in the ERP study.

Adjectives were sorted into high and low concreteness conditions using a median split. An ANOVA with 2 levels of word type (high and low concreteness adjectives) and 6 levels of electrode location revealed a main effect of word type, *F*(1,34) = 15.59, *p* < 0.001, with high concreteness adjectives eliciting a greater negativity (mean amplitude = −1.73 µV) than low concreteness adjectives (−0.62 µV). As these results were nearly identical to the results for our labeled conditions, we assume that the difference between AL and CL adjectives indeed reflects lexical concreteness.
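A median split of this kind can be sketched in a few lines. The function name `median_split`, the example adjectives' ratings, and the handling of ties (items exactly at the median go to the low group) are illustrative assumptions; the text does not say how ties were resolved.

```python
from statistics import median


def median_split(ratings):
    """Split items into (high, low) groups around the median rating.

    ratings: dict mapping item -> concreteness rating.
    Assumption: items exactly at the median are assigned to the low group.
    """
    cutoff = median(ratings.values())
    high = sorted(item for item, r in ratings.items() if r > cutoff)
    low = sorted(item for item, r in ratings.items() if r <= cutoff)
    return high, low


# Hypothetical ratings, for illustration only.
example_ratings = {"printed": 4.5, "thick": 4.2, "conditional": 1.8, "interesting": 2.1}
high_group, low_group = median_split(example_ratings)
print(high_group, low_group)
```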

### **Noun N400 (800–1000 ms post-adjective onset)**

A concreteness effect in the expected direction is visible, with nouns in the CL condition eliciting a larger N400 than AL nouns. MET nouns appear to be patterning with CL nouns, also eliciting a larger N400 relative to AL nouns. However, an ANOVA with 3 levels of word type and 6 levels of electrode location showed no main effect of word type, *F*(2,68) = 1.24, *p* = 0.3. In order to increase the sensitivity of the concreteness manipulation, we sorted the data based on paired concreteness ratings into the most concrete and least concrete items within conditions. The most concrete (top half of CL) and least concrete (bottom half of AL) items were compared to MET items in order to obtain a clearer pattern of concreteness effects, if they were indeed present in the data.

An ANOVA with 3 levels of word type (MET, CL-high, and AL-low) and 6 levels of electrode location revealed a main effect of word type, *F*(2,68) = 4.38, *p* < 0.05, with CL-high and MET nouns showing greater mean N400 amplitude (−1.39 µV and −1.17 µV, respectively) than AL-low nouns (mean amplitude = −0.29 µV) (**Figure 1**). Pairwise comparisons indicated that the difference between CL-high and AL-low nouns and MET and AL-low nouns was statistically significant (*p* < 0.05 for both comparisons). Thus the N400 pattern at the noun resembles that at the adjective.

As with the median split of the CL and AL conditions, we split MET items based on participant ratings of pair concreteness. These two conditions, MET-high and MET-low, were analyzed in order to better understand how metaphorical items may be processed on the basis of their rated concreteness. An ANOVA with 2 levels of word type (MET-high and MET-low) and 6 levels of electrode revealed a significant main effect of word type, *F*(1,34) = 5.98, *p* < 0.05: the MET-low group was associated with a larger N400 mean amplitude (−1.62 µV) than the MET-high group (−0.53 µV). We next compared these two MET groups to the high and low CL ERPs (**Figure 2**).

First, the high-concreteness MET group was compared to the most abstract and most CL conditions described above. An ANOVA with 3 levels of word type (MET-high, CL-high, and AL-low) and 6 levels of electrode location revealed a main effect of word type, *F*(2,68) = 3.18, *p* < 0.05, with MET-high nouns showing a reduced N400 mean amplitude (−0.53 µV) compared to CL-high nouns (−1.39 µV). Pairwise comparisons showed that the difference between MET-high and CL-high nouns was borderline significant (*p* = 0.08) and there was no statistical difference between MET-high and AL-low nouns (*p* = 0.58).

Second, the low-concreteness MET group was compared to the most abstract and most CL conditions described above. An ANOVA with 3 levels of word type (MET-low, CL-high, and AL-low) and 6 levels of electrode location revealed a main effect of word type, *F*(2,28) = 4.10, *p* < 0.05, with MET-low nouns showing increased N400 mean amplitude (−1.62 µV) compared to AL-low nouns (−0.29 µV). Pairwise comparisons showed that the difference between MET-low and AL-low nouns was statistically significant (*p* < 0.05) and there was no statistical difference between MET-low and CL-high nouns (*p* = 0.66).

To ensure that the observed differences at the noun are not merely spillover from the adjectives, we conducted three ANOVAs as above in the adjective N400 time window (300–500 ms post-adjective onset). Comparing MET (−2.01 µV), CL-high (−2.35 µV), and AL-low (−0.8 µV) revealed a main effect of word type, *F*(2,68) = 8.73, *p* < 0.001, with a significant difference between AL-low and both CL-high (*p* < 0.001) and MET (*p* < 0.01). The ANOVA including MET-high (−1.71 µV), CL-high and AL-low also showed a main effect of word type, *F*(2,68) = 6.98, *p* < 0.01, where only MET-high was different compared to AL-low (*p* < 0.05). The ANOVA with MET-low (−2.32 µV), CL-high, and AL-low likewise revealed a main effect of word type, *F*(2,68) = 7.64, *p* < 0.01, where only MET-low was different from AL-low (*p* < 0.01). In sum, while the pattern of N400 effects at the noun mimicked that at the adjective in the (high vs. low) literal conditions, this was not the case for the high vs. low MET conditions, which reversed their direction from adjective to noun.

# **DISCUSSION**

In the current study we examined the real-time processing of novel metaphorical ("sticky meeting"), AL ("constructive meeting"), and CL ("loud meeting") two-word (adjective-noun) expressions. We replicated the well-known N400 lexical concreteness effect on the initial adjectives of the two-word expressions. A reliable concreteness effect also emerged for the nouns of the literal expressions when the most CL expressions were compared with the most AL expressions, despite the absence of any difference in rated lexical noun concreteness. We also found that on average the N400 to the metaphorical expressions patterned with that to the most CL expressions rather than with that to the most AL expressions, contrary to what we expected. Upon dividing the metaphorical expressions into more concrete vs. more abstract subgroups based on pair concreteness ratings we found that, paradoxically, the more abstract subgroup of metaphors was associated with a larger N400 than not only the most AL expressions but also the more concrete metaphor expressions.

The N400 concreteness effect on the prenominal adjectives resembles that reported for single words (Huang et al., 2010; Rabovsky et al., 2012; Amsel and Cree, 2013; Barber et al., 2013). This finding has been hypothesized to reflect activation of a richer network of semantic representations (or greater activation within a given network) during the processing of concrete vs. less concrete words. Within the literal expressions split by pairwise concreteness, target nouns elicited a prolonged negativity starting in the N400 time window that varied in amplitude with the rated concreteness of the expression. Huang et al. (2010) had shown that modifying a noun in a more concrete vs. more abstract manner can lead to a concreteness effect (e.g., for "book" in "interesting book" vs. "thick book"). Like Huang et al. (2010), we find that for literal (non-metaphorical) expressions, the concreteness of an adjective seems to determine the concreteness effect on a subsequent noun, at least at the extremes (the direction of their effect cannot be directly compared with ours as they employed visual half field presentation and a different task, among other differences). This manifestation of the concreteness effect at the noun is particularly striking given that the nouns themselves do not differ on this very measure (of concreteness).

Unlike some ERP studies (West and Holcomb, 2000; Huang et al., 2010), we find no evidence that our concreteness effects reflect imagery-related processes over and above the processes that routinely influence N400 amplitude. Specifically, our concreteness effects at neither the adjectives nor the nouns of literal expressions exhibited more frontal and/or right hemispheric distributions than the canonical N400 distribution to written words. As we already noted, our N400 concreteness effect at the adjective is consistent with a proposed richness of the activated conceptual representations in a lexico-semantic system (e.g., Holcomb et al., 1999; Barber et al., 2013). Following the same logic, our concreteness effect for the concretely vs. abstractly modified nouns could reflect the richness of the emergent higher-level conceptual representation, at least for the literal expressions. For example, reading "thick book" in order to determine its relation to an upcoming probe word could activate a richer network of features of the concrete concept BOOK than "interesting book". This possibility does not necessarily implicate sensory/motor activations for the interpretation of the CL expressions, as it could just as well reflect greater activation within an amodal semantic system (Plaut and Shallice, 1993). Our results for the literal expressions extend the results of single word studies (e.g., Barber et al., 2013) and Paivio's (2007) dual coding theory insofar as they demonstrate that concreteness need not be a strictly lexical property (i.e., pegged to single word meanings), but can be an emergent property of higher-level concepts as well.

For the metaphorical expressions, however, our N400 data pattern diverges from that of our literal expressions, as well as from Huang et al. (2010). When we compare the metaphor noun N400s to the noun N400s of the most abstract and most CL expressions, our data (at first glance) suggest that the concreteness effect at the noun is driven by the lexical concreteness of the adjective, as seems to be the case in the literal expressions and in Huang et al. (2010). To the extent that concreteness effects at the noun are merely an extension (spillover) of the ERP concreteness effect at the adjective, this pattern should remain unchanged for all metaphorical expressions. However, when we divide our metaphorical expressions by paired concreteness, the more concrete metaphors appeared to be processed (i.e., looked) more like AL expressions, and the more abstract metaphors looked more like CL expressions. In other words, the elicited negativity is reversed within the metaphorical expressions, with expressions rated as more abstract eliciting larger noun N400s than those rated as more concrete. If the negativity for metaphors observed in the N400 time window were a concreteness effect proper, high concreteness metaphors should have elicited a greater negativity than low concreteness metaphors. Yet the more concrete a metaphor was rated, the smaller the negativity it elicited. At a minimum, this pattern of results demonstrates that the processing of the nouns in the metaphorical expressions cannot be driven strictly by either lexical concreteness or higher-level emergent concreteness.

Of course, concreteness is only one of many factors known to influence the ERP, and in particular the N400. Less literal and more novel expressions have been shown to elicit larger N400s. Target words in novel metaphors usually elicit larger N400 amplitudes than target words in literal expressions (Coulson and Van Petten, 2002, 2007; Arzouan et al., 2007; Lai et al., 2009), and relative to conventional metaphors they elicit larger negativities slightly later as well, post-N400 (Arzouan et al., 2007; Lai et al., 2009). Contra this monotonic relationship, our high concreteness metaphorical expression condition did not elicit larger N400s than our AL condition, and strikingly, was *reduced* in comparison with CL expressions (see **Figure 2**). Even though their concreteness positively correlated with literalness (*r*(105) = 0.59, *p* < 0.001) and the more abstract metaphorical expressions did elicit a larger N400 than the more AL expressions, they did not differ from the more CL expressions. If the increased negativity for metaphors in comparison with more AL expressions (**Figure 1**) were due to metaphoricity *per se*, it should have manifested for all metaphors, but it did not.

One reason why our results diverged in part from other investigations of metaphorical language may be that our expressions were matched on novelty across conditions, whereas in the aforementioned studies only the novel metaphors were unfamiliar. As a result, all three of our experimental conditions may have invoked some additional constructive or integrative processing (linked in previous reports to the post-N400, sustained negativity). On this possibility, our finding of equivalent N400s for more AL expressions and more concrete metaphorical expressions (despite a lexical concreteness difference at the adjectives) suggests that readers need not necessarily construct the literal (i.e., physical) interpretation of a novel metaphorical expression before understanding its figurative meaning. This interpretation is consistent with parallel models of metaphor comprehension (Glucksberg, 2003): the abstract, figurative meaning of metaphors might be readily and directly available, as also inferred by Blasko and Connine (1993). Our findings argue against other models of serial processing of metaphors as well. For example, Giora (1997, 2003) proposed that the comprehension of novel metaphors requires the rejection of a salient, literal meaning before arriving at a non-salient metaphorical meaning. If we assume that serial processing would result in non-identical ERP responses, the lack of differences between more concrete metaphors and more AL word pairs does not support the serial processing assumption (unless the latter have both a salient and a non-salient literal meaning).

Moreover, our results indicate that figurative meaning need not be directly derivative of the physical aspects of verbal expressions, but rather may at times emerge abstractly at least by the time window of the N400, a well-established marker of semantic analysis. Consequently, our data might pose a challenge to strong views of embodied cognition (e.g., Lakoff and Johnson, 1999). On a strong embodiment view, sensorimotor source domains (e.g., physical sensation of warmth) are activated in parallel with more abstract target domains, so as to provide structure and semantic content for understanding metaphorical expressions (as in "warm smile"). Gallese and Lakoff (2005) propose that "grasping an idea" involves some of the same motor activations as "grasping a banana". In other words, during conceptual integration both conceptual domains should be active at the same time; if so, we expected to see this reflected in a canonical N400 concreteness effect at the noun. We did not. Among other possible interpretations, the apparent absence of a processing difference between more concrete metaphorical expressions and more AL expressions at the noun could be taken to mean that by the time some metaphorical meanings are constructed, physical aspects of the words might no longer be playing a tangible role in comprehension.

A cumulative conclusion thus far is that neither concreteness nor metaphoricity *per se* can fully account for the processing differences among our novel literal and metaphorical expressions, at least in the N400 time window. We can speculate about what additional factor may be influencing our results. Among metaphorical expressions, rated pair concreteness correlates with meaningfulness, *r*(105) = 0.53, *p* < 0.001 (a phenomenon observed also by Forgács et al., 2014). Thus, the greater N400 elicited by the less concrete metaphors could reflect the typical inverse relationship between context-driven expectancy and N400 amplitude, rather than processes specific either to lexical concreteness, or to figurative meaning. Perhaps the metaphorical expressions rated more concrete and more meaningful were more likely to increase semantic expectancies for the upcoming noun.
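As a brief aside on how a correlation reported as *r*(df) relates to its significance test, the sketch below converts a Pearson correlation into the corresponding *t* statistic via t = r·√(df / (1 − r²)). It assumes the usual convention that the degrees of freedom equal the number of items minus 2; the function name is hypothetical, and this is an illustration of the standard formula rather than a reconstruction of the analysis code used here.

```python
import math


def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df degrees of freedom.

    Uses t = r * sqrt(df / (1 - r**2)); df is assumed to be n - 2.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(df / (1.0 - r * r))


# Correlation between pair concreteness and meaningfulness, as reported above.
t_meaningfulness = t_from_r(0.53, 105)
print(round(t_meaningfulness, 2))
```

A *t* value of this size on 105 degrees of freedom lies far beyond conventional critical values, which is in line with the reported *p* < 0.001.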

Our finding that more meaningful and more concrete metaphorical expressions seem to be processed like more AL expressions fits nicely with a newly emerging picture of metaphor comprehension. On this view, there is no empirical reason to assume that processing of metaphors invokes special processes that are not also required for comprehending literal language. Indeed, despite long held assumptions about the special role of the right hemisphere in figurative language, recent results suggest that it does not play a privileged role in metaphor comprehension after all (Rapp et al., 2004, 2007; Coulson and Van Petten, 2007; Bohrn et al., 2012; Forgács et al., 2012, 2014). Likewise, there is no support for the proposal that figurative meaning of novel metaphorical expressions proceeds only after attempts at (salient) literal meaning fail (Forgács et al., 2014).

Forgács (2014) has developed a novel theoretical framework for metaphor comprehension: Abstract Conceptual Substitution (ACS). According to this view, for an initial metaphorical interpretation it might suffice to substitute the vehicle term ("fluffy" in "fluffy speech") with one of its abstract, nonphysical properties, prior to any systematic mapping, or structural alignment, etc. This take on metaphor interpretation is closely related to that of Sperber and Wilson (2008), and to the lexical pragmatic account of Wilson and Carston (2007) and Carston (2010). They propose that metaphors are part of a continuum of loose language use (together with hyperbole and approximation, for example), which are understood via the generation of *ad hoc* concepts. For example, in the expression "fluffy speech" the concept FLUFFY is transformed into FLUFFY\*, which is conceptually both broader and narrower (i.e., more general and more specific at the same time), in ways left as yet unspecified, than the original, encoded, lexical concept. Forgács specifies this broadening/narrowing in terms of the abstract-concrete dimension: FLUFFY\* could broaden the lexical concept FLUFFY by activating more of its abstract properties (e.g., *superfluous, cushy*, etc.), but narrow the lexical concept by suppressing all of its concrete/physical properties (e.g., *physically protruding fluff, textile, texture*, etc.). This approach is similar to Glucksberg's (2003) category assertion view, but does not rely on the creation of superordinate *ad hoc* categories or on the generation of *ad hoc* concepts. Instead, it might suffice to conceptually substitute the most relevant (i.e., contextually most activated) abstract property for the vehicle term ("fluffy"), creating "*superfluous, cushy* speech". This is not merely a paraphrase, however, because expressing *superfluous* with "fluffy" brings along with it several cognitive consequences, such as deniability, negotiability, etc., much like indirect speech (cf. Pinker et al., 2008). The lack of a concreteness effect at least for the more meaningful, more concrete metaphorical expressions vs. the more AL expressions is consistent with this abstract substitution view in that the system seems to substitute abstract but not concrete properties for the vehicle term in our novel metaphorical expressions.

To sum up, our results suggest that the concreteness effect does not merely reflect the concreteness of individual words, but may also be sensitive to the concreteness of higher-level conceptual information. At least in the N400 time window, and seemingly only for more meaningful, more concrete adjectival metaphors, our findings suggest that metaphorical language may be processed and presumably understood in an abstract manner, despite the concrete nature of its constituent parts. In conclusion, it appears that certain metaphorical expressions created from physical concepts and words can be as *readily grasped* and as *rapidly digested* as AL expressions, although their comprehension is not strictly driven by concreteness.

### **AUTHOR CONTRIBUTIONS**

Bálint Forgács conceived research; Bálint Forgács, Megan D. Bardolph, Ben D. Amsel, Katherine A. DeLong, and Marta Kutas designed research; Bálint Forgács, Megan D. Bardolph performed research; Bálint Forgács, Megan D. Bardolph, Ben D. Amsel, Katherine A. DeLong, and Marta Kutas analyzed data; Bálint Forgács, Megan D. Bardolph, Ben D. Amsel, Katherine A. DeLong, and Marta Kutas wrote the paper.

### **ACKNOWLEDGMENTS**

We gratefully acknowledge the invaluable help of Tom Urbach, PhD, in experimental design and stimulus norming, Jamie Alexandre, PhD, in programming, Gabriel Doyle, PhD, in corpus linguistic measures, and Lindsay Crissman and Alex Kuo in data collection. Bálint Forgács was supported by a Fulbright fellowship, and by an Emergence(s) program grant from the City of Paris to Judit Gervain. The research was funded by grant NICHD22614 to Marta Kutas.

### **REFERENCES**


figurative noun noun compound words. *Neuroimage* 63, 1432–1442. doi: 10.1016/j.neuroimage.2012.07.029


Wilson, D., and Carston, R. (2007). "A unitary approach to lexical pragmatics: relevance, inference and ad hoc concepts," in *Pragmatics*, ed N. Burton-Roberts (Basingstoke: Palgrave MacMillan), 230–259.

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 April 2014; accepted: 12 January 2015; published online: 10 February 2015*.

*Citation: Forgács B, Bardolph MD, Amsel BD, DeLong KA and Kutas M (2015) Metaphors are physical and abstract: ERPs to metaphorically modified nouns resemble ERPs to abstract language. Front. Hum. Neurosci. 9:28. doi: 10.3389/fnhum.2015.00028*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2015 Forgács, Bardolph, Amsel, DeLong and Kutas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

# The role of literal meaning in figurative language comprehension: evidence from masked priming ERP

#### *Hanna Weiland<sup>1</sup>\*, Valentina Bambini<sup>2</sup> and Petra B. Schumacher<sup>1,3</sup>*

*<sup>1</sup> Department of English and Linguistics, Johannes Gutenberg University Mainz, Mainz, Germany*

*<sup>2</sup> Center for Neurocognition and Theoretical Syntax, Institute for Advanced Study IUSS Pavia, Pavia, Italy*

*<sup>3</sup> Institute of German Language and Literature I, University of Cologne, Cologne, Germany*

### *Edited by:*

*Vicky T. Lai, Max Planck Institute for Psycholinguistics, Netherlands*

### *Reviewed by:*

*Arthur M. Jacobs, Freie Universität Berlin, Germany Cristina Cacciari, University of Modena, Italy*

### *\*Correspondence:*

*Hanna Weiland, Department of English and Linguistics, Johannes Gutenberg University Mainz, Jakob-Welder-Weg. 18, Mainz 55099, Germany e-mail: weiland@uni-mainz.de*

The role of literal meaning during the construction of meaning that goes beyond pure literal composition was investigated by combining cross-modal masked priming and ERPs. This experimental design was chosen to compare two conflicting theoretical positions on this topic. The indirect access account claims that literal aspects are processed first, and additional meaning components are computed only if no satisfactory interpretation is reached. In contrast, the direct access approach argues that figurative aspects can be accessed immediately. We presented metaphors (*These lawyers are hyenas*, Experiment 1a and 1b) and producer-for-product metonymies (*The boy read Böll*, Experiment 2a and 2b) with and without a prime word that was semantically relevant to the literal meaning of the target word (*furry* and *talented*, respectively). In the presentation without priming, metaphors revealed a biphasic N400-Late Positivity pattern, while metonymies showed an N400 only. We interpret the findings within a two-phase language architecture where contextual expectations guide initial access (N400) and precede pragmatic adjustment resulting in reconceptualization (Late Positivity). With masked priming, the N400-difference was reduced for metaphors and vanished for metonymies. This speaks against the direct access view that predicts a facilitating effect for the literal condition only and hence would predict the N400-difference to increase. The results are more consistent with indirect access accounts that argue for facilitation effects for both conditions and consequently for consistent or even smaller N400-amplitude differences. This combined masked priming ERP paradigm therefore yields new insights into the role of literal meaning in the online composition of figurative language.

**Keywords: metaphor, metonymy, literal meaning, masked priming, N400, late positivity, experimental pragmatics**

# **INTRODUCTION**

Human communication often requires the construction of meaning that goes beyond the pure compositional computation of the literal meaning of the single sentence components. In contrast to popular belief, these non-literal utterances are not rare individual cases but an ever-present phenomenon in our daily communication (cf. Lakoff and Johnson, 1980). Several types of non-literal language have already been theoretically discussed and empirically investigated in the domain of experimental pragmatics and neuropragmatics (Noveck and Reboul, 2008; Bambini and Bara, 2012; Schumacher, 2013); however, some important questions remain. In the following, we will first concentrate on the processing of metaphors since they play a prominent role in the theoretical discussion (cf. Grice, 1975; Sperber and Wilson, 1985; Gibbs, 1994; Giora, 1997; Glucksberg et al., 2001; Carston, 2010a). We will discuss the general underlying mechanisms in metaphor comprehension, then we will investigate the role of literal meaning aspects on early processing through a novel experimental design. This will be complemented by establishing a connection to one other type of non-literal language, namely metonymy. We report two experiments that investigated (i) the cognitive basis of metaphor and metonymy comprehension in German through event-related potentials (ERPs) and (ii) the role of literal meaning in figurative language processing by using the cross-modal masked priming technique in sentential context in combination with ERPs.

### **THEORETICAL DEBATE OVER FIGURATIVE MEANING**

The contribution of literal meaning aspects<sup>1</sup> during figurative processing marks one of the dividing lines between competing theories. While some theories suggest that the processor always starts from the literal meaning (indirect access account), others assume that only the relevant meaning is processed (e.g., direct access account, Relevance Theory). In the following we will concentrate on these two extreme positions, since the relatively old debate between the indirect and direct access account still continues and has not been fully resolved. In particular the question of the early contribution of literal meaning aspects that rests at the core of these two opposing views has not been settled yet. Of course, a range of theoretical approaches exist beside these poles that have adopted less extreme positions: Gibbs (2002) and Recanati (1995) for instance argue for literal meaning to play only a local role: it is activated for single words within a figurative utterance but the processing of the literal meaning of the whole figurative utterance is not required. Others consider literal meaning to linger in the background (cf. Carston, 2010b, but also Giora, 2008). Literal meaning is also suggested to be the important foundation for blending (cf. Fauconnier and Turner, 2002; Coulson and Oakley, 2005) or mapping processes (cf. Coulson and Matlock, 2001; Croft, 2002), merging of features (cf. Kintsch, 2000) or the activation of secondary cognitive representations (cf. Evans, 2010). Gentner and Wolff (1997) and Wolff and Gentner (2011) relate the role of literal meaning to the progress of the career of the metaphor. At the beginning, the metaphorical meaning is created via structural alignment of the components of the literal meaning, but in the course of repeated usage, the metaphorical meaning is stored in the lexicon (yielding a dead metaphor). Furthermore, non-literal language use encompasses many different phenomena, including irony, humor, hyperbole, simile, and so forth (cf. e.g., Giora, 1995; Carston, 2002; Sperber and Wilson, 2008; Gibbs and Colston, 2012). It is thus important to identify the differences and commonalities between the various types of figurative language comprehension. In the following, we focus on the link between metaphor and metonymy, which has not been investigated systematically yet (but see Gibbs, 1990; Rundblad and Annaz, 2010; Bambini et al., 2013 for initial developmental and behavioral findings).

<sup>1</sup>We use literal meaning to refer to the meaning of a word that is stored in the lexicon and that the word has when taken by itself ("context-free"). This definition is based on Searle (1978, 1979), who stated that the components of sentences constitute their literal meaning. A single word and a sentence both could have more than one literal meaning, e.g., homonyms. Crucially, one must differentiate between the literality of the whole utterance and that of the single words within it, as pointed out by, e.g., Gibbs (2002) and Recanati (1995). Literal meaning is often contrasted with figurative language in a sense that everything that is not literal is figurative. Besides this binary view, some approaches consider literal and figurative meaning as the endpoints on a scale on which the different phenomena (e.g., metaphor, metonymy, irony, idioms) can be arranged.

In non-literal language processing<sup>2</sup>, the meaning of an utterance must be extended beyond the standard connotation. Not only the range of required modifications varies but also the range of possible interpretations. Metaphor (e.g., *These dancers are butterflies*) is the linguistic phenomenon that allows the widest range of possible interpretations. Even in the simple form "X is Y," one can imagine a reading in which the dancers are colorful, fluttering, light-footed, and so forth. Since ancient times, the understanding of metaphors has often been defined as transferring properties of a word or phrase, the source (e.g., *butterflies*), to an event, person or object, the target (e.g., *dancers*), where source and target are not directly connected (cf. the "transfer" Aristotle discussed in *Rhetoric*; see also Black, 1962; Lakoff, 1993).

How does a metaphoric reading emerge? This question has sparked a lot of debate (see e.g., Gibbs and Colston, 2012 for an overview). As mentioned above, we will concentrate on two extreme positions, the indirect and direct access view. The indirect access view (also labeled standard model) originates from the approaches by Grice and Searle. Grice (1975) assumed that a metaphor violates the conversational maxim of quality, but the addressee assumes the violation to be intentional and then seeks a meaningful interpretation by means of pragmatically driven implicature. Searle (1979) suggested that metaphors are processed in three steps. First the utterance is identified as not being literally interpretable, i.e., what is said is not what is meant. Second the addressee has to look for possible alternative interpretations of the utterance by comparison of properties. In the last step, the identified properties are checked for their sensicality. Accordingly, the possible interpretations of the metaphor are always achieved by going through the literal meaning. In terms of language processing, this approach would predict differences between literal and non-literal utterances, where any utterance is claimed to be first interpreted literally. These assumptions of the indirect access view have been criticized in subsequent work (cf. e.g., Sperber and Wilson, 1986; Giora, 1997; Gibbs, 2002).

The direct access view argues against the idea that the literal meaning is always accessed first (cf. Gibbs, 1994; Glucksberg, 2008). Originating from the idea that the understanding of metaphor is based on dual reference, Glucksberg (2008) for instance suggested that the processing of metaphor does not include more steps than the interpretation of literal utterances. Assuming that the vehicle (source) has a literal and a metaphorical reference, the processor only has to choose the appropriate one. For the metaphor *This lawyer is a shark*, the processor activates the metaphorical reference *shark* of the predator category that includes all properties relevant for the metaphor (e.g., aggressive, predatory, etc.) but none of the properties irrelevant for the metaphor (e.g., having fins). In a literal context (*This animal is a shark*), the literal reference, including properties like swimming, having fins, and so forth, is selected (see also Kintsch, 2000 for a computational account utilizing latent semantic analysis). Accordingly, literal and figurative meaning should be processed equally fast. These accounts further predict that the pre-activation of the literal meaning of the vehicle (e.g., *shark*) should hamper the processing of the metaphor (cf. Glucksberg, 2008:68: "[l]iteral meanings do not have unconditional priority, and so they are not necessarily easier to compute than nonliteral meanings.").

A similar view has been advanced by Relevance Theory (Carston, 2002; Sperber and Wilson, 2008), where the linguistic content of any type of utterance (metaphoric, hyperbolic, literal, etc.) is underdetermined and the underlying processes should thus be the same. Utterance interpretation is guided by the Principle of Relevance and based on inferential reasoning. Two processes, narrowing and broadening, are involved in the construction of meaning, through which the addressee creates an ad hoc concept, including the relevant meaning range for the current context (cf. Carston, 2002, 2010a; Wilson and Carston, 2007). Although the processing of literal meaning only involves the selection of the lexical meaning and no narrowing or broadening processes, the underlying inferential steps are suggested to be nearly identical in both cases (Sperber and Wilson, 1986). Hence, access to literal meaning is not obligatory.

<sup>2</sup>A clear discrimination between literal and figurative (non-literal) sentences has been abandoned in the literature in favor of a continuum hypothesis between the figurative and literal pole (cf. e.g., Coulson and van Petten, 2002; Coulson, 2006; Rubio Fernández, 2007; Sperber and Wilson, 2008). Throughout this manuscript, we use the terms literal and figurative to discriminate between the two test conditions.

A newer relevance-theoretic approach by Carston (2010b) contemplated additional effort for the interpretation of figurative language. Based on empirical findings from Rubio Fernández (2007), Carston argued for the lingering of literal meaning even in figurative language processing. Accordingly, (extended) metaphors are appreciated and reflected upon with literal meaning aspects in mind. In terms of processing, this suggests that literal meaning aspects are accessible early on and active throughout metaphor processing.

### **EXPERIMENTAL EVIDENCE ON THE ROLE OF THE LITERAL MEANING**

Previous experimental research indicates that costs are exerted during metaphor processing (for behavioral findings see e.g., Cacciari and Glucksberg, 1994; Noveck et al., 2001), which are further modulated by numerous factors like familiarity, appropriateness, and context (for evidence at the behavioral and neural level see e.g., Gibbs, 1994; Giora, 1997; Bambini et al., 2011; Forgács et al., 2012). As far as ERP studies are concerned, several experiments have been conducted in different languages, e.g., in English (Coulson and van Petten, 2002, 2007; Lai et al., 2009; De Grauwe et al., 2010), French (Pynte et al., 1996), Hebrew (Arzouan et al., 2007) and Italian (Resta, 2012). All studies reported a more pronounced N400 for metaphors in contrast to literal control conditions. Hence, the N400 can be considered a stable component in the ERP-research on metaphor that is found for the processing of literary (Resta, 2012) and every-day metaphors, both verbal (Lai et al., 2009) and nominal (e.g., Pynte et al., 1996). Pynte et al. (1996) and Lai et al. (2009) also manipulated the conventionality of usage and the surrounding context. They reported a more pronounced N400 for all metaphors, with amplitudinal variations as a function of the examined factors (e.g., irrelevant context increased the N400-amplitude). The N400 for metaphors has been associated with the cognitive effort needed to comprehend the metaphor, e.g., the search in semantic space (cf. Coulson and van Petten, 2002). In contrast, the studies reported mixed results with respect to later ERP components. The ERP results by Coulson and van Petten (2002), De Grauwe et al. (2010) and Resta (2012) revealed a more pronounced positive deflection for metaphors. Resta linked the Late Positivity to a pragmatic processing stage, which follows semantic processing (N400). Coulson and van Petten (2002) interpreted this effect in terms of recovery of the underlying conceptual metaphor. De Grauwe et al. (2010) considered demands from conflict resolution or selection of the contextually appropriate meaning. Given that late positive effects are observed outside of metaphor processing as well (in other non-literal cases, but also in semantic reversal anomalies; e.g., Regel et al., 2011; Brouwer et al., 2012; Schumacher, 2014), a more general account of the underlying processes is warranted, reflecting resolution of conflicts from prior processing streams. Other studies on metaphor did not report later effects (cf. Pynte et al., 1996; Coulson and van Petten, 2007; Lai et al., 2009), which could be due to the selection of the time window of interest (Coulson and van Petten, 2007) or the fact that different word classes (adjectives and verbs) were measured (Lai et al., 2009), which could point toward distinct degrees of sensitivity of ERPs to different word classes and related mechanisms.

In general, the findings indicate that figurative language processing exerts costs relative to the processing of more literally used expressions, which is measurable in two discrete processing stages reflected by N400 and Late Positivity effects. However, previous ERP data cannot shed light on the time-course and contribution of literal meaning aspects, as they do not allow us to tap into very early processes or to determine whether there is a mandatory initial stage of literal analysis (Bambini and Resta, 2012). A more refined method is required to address this issue. In previous behavioral studies, metaphors were already investigated through priming experiments, for instance in contextual priming studies (Gildea and Glucksberg, 1983; Glucksberg et al., 2001) or in cross-modal priming (cf. Blasko and Connine, 1993; Rubio Fernández, 2007). These priming studies showed the influence of contextual cues and the time-course of property suppression and enhancement. The cross-modal priming data by Rubio Fernández (2007) revealed priming of contextually relevant and irrelevant (literal) meaning aspects until 400 ms after the metaphor (e.g., *plant* and *spike* primed *cactus* in *John doesn't like physical contact. Even his girlfriend finds it difficult to come close to him. John is a cactus*.); 1000 ms after the critical word, the literal meaning was no longer activated. Yet, findings were mixed (cf. also Rubio Fernández, 2007) and the material used was heterogeneous (e.g., adjectives vs. nouns as primes; a mix of hyponymical, heteronymical, and meronymical prime-target relations; metaphor and metonymy interspersed). Furthermore, an even more time-sensitive method than the measure of reaction times is required to answer the question about the role of literal meaning in the early processing of figurative language more adequately. Therefore, it seemed fruitful to combine the masked priming paradigm with ERPs.

### **RATIONALE OF THE PRESENT STUDY**

Here we combined the highly time-sensitive ERP method with masked priming. In contrast to the reaction-time studies mentioned above, we presented the prime word immediately before the target word at which point a figurative reading emerges and time-locked the ERP to the word-recognition point of the critical word (see below for further details). Furthermore, because we were interested in early automatic processes of figurative language processing, we used pattern masked priming (cf. Kiefer and Spitzer, 2000). Holcomb and Grainger (2006, 2007) provide a detailed description of the interaction of masked priming and ERPs. In this model, processing difficulties at the semantic level are primarily reflected in the N400, where the semantic meaning of the whole word is computed and therefore unrelated prime-target pairs elicit the largest amplitude followed by semantically related prime-target pairs (cf. e.g., Holcomb et al., 2005; Kiyonaga et al., 2007; but note that effects of lexical processing as early as 200 ms after stimulus onset have been reported; e.g., Pulvermüller et al., 2001 for face-related activity verbs; Kissler et al., 2007; Ponz et al., 2014 for processing of emotional information). Based on these findings from word list presentation, we successfully tested the applicability of the masked priming ERP paradigm to sentence processing (Schumacher et al., 2012). Using a procedure as described in more detail in Procedure and illustrated in **Figure 2**, participants listened to sentences for comprehension (e.g., *A student attended a talk in Berlin*) and looked at a pattern mask display. 100 ms before the target word (e.g., *talk*), a masked word was presented visually for 67 ms. ERPs time-locked to the recognition point of the target word revealed that a related prime (e.g., *speaker-talk*) engendered a lower N400-amplitude relative to an unrelated prime (*tailor-talk*) in sentential context but also in word lists, reflecting facilitation.
This allows us to look at the role of literal meaning aspects in figurative processing by presenting a probe word associated with the literal meaning prior to the vehicle (e.g., *these songs are drugs: illegal-drugs*). Accordingly, unrelated meaning aspects should hinder processing and induce a more enhanced N400, while related meaning aspects should show facilitation. Within this paradigm, we capitalize on the N400's contribution to lexical access. Crucially, the N400 has also been associated with further subcomponents of lexical processing (i.e., storage, retrieval, integration), which are subserved by distinct neuroanatomical regions (cf. Lau et al., 2008).

Based on the theories discussed above and previous findings, the following predictions can be formulated for metaphor comprehension: first, we expect a biphasic N400-Late Positivity pattern with greater amplitude deflections for the metaphorical condition relative to the literal control in a comprehension task without priming (cf. Coulson and van Petten, 2002; Arzouan et al., 2007; Resta, 2012). Second, to address the question of what role the literal meaning plays in figurative language, we employ the masked priming ERP technique. This should reveal whether a probe word associated with the literal meaning eases or hinders comprehension of the vehicle, where facilitation should be reflected in reduced N400-amplitudes. To this end we also used difference wave plots to compare metaphor comprehension processes with and without priming. Difference waves are created by subtracting the literal condition from the metaphorical one for the presentation without and with priming separately. The hypotheses for the comparison of the factor priming are schematically illustrated in **Figure 1**. The indirect access approach (Grice, 1975; Searle, 1979) and also the theories by Recanati (1995), Giora (1997), and Carston (2002) predict the literal prime to have no negative and even a facilitating effect on the computation of both conditions, since the property of the literal meaning counts as a related prime for these accounts. Hence with priming, the N400 amplitude difference between literal and figurative conditions should remain the same or even decrease if the prime has a more positive impact on the figurative condition, as can be seen in **Figure 1A**. In contrast, the direct access approach and parallel or relevance-theoretic approaches (e.g., Gibbs, 1989; Kintsch, 2000; Glucksberg, 2008; Sperber and Wilson, 2008) argue for a hampering effect of the literal prime in the figurative condition since the literal meaning is not accessed initially. Hence, when primed, the N400-amplitude should increase for the figurative condition and decrease for the literal condition. As a result, the difference plot for the primed conditions should show a more pronounced negativity as is shown in **Figure 1B**.
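The difference-wave logic described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function names, the 50 ms sampling grid, and all amplitude values are hypothetical, and only the subtraction (figurative minus literal, separately for unprimed and primed presentation) and the 250–500 ms N400 window come from the text.

```python
from statistics import mean


def difference_wave(figurative, literal):
    """Point-by-point figurative-minus-literal amplitude (microvolts)."""
    if len(figurative) != len(literal):
        raise ValueError("waveforms must have the same number of samples")
    return [f - l for f, l in zip(figurative, literal)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of `wave` over samples with start_ms <= time <= end_ms."""
    return mean(v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms)


# Hypothetical grand averages sampled every 50 ms (illustrative values only).
times_ms = list(range(0, 600, 50))
unprimed = {
    "figurative": [0, 0, -1, -2, -3, -4, -5, -5, -4, -3, -2, -1],
    "literal":    [0, 0, -1, -1, -2, -2, -3, -3, -2, -2, -1, -1],
}
primed = {
    "figurative": [0, 0, -1, -2, -3, -5, -6, -6, -5, -4, -3, -1],
    "literal":    [0, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1],
}

# N400 effect = mean of the difference wave in the 250-500 ms window.
n400_effect = {}
for label, condition in (("unprimed", unprimed), ("primed", primed)):
    diff = difference_wave(condition["figurative"], condition["literal"])
    n400_effect[label] = mean_amplitude(diff, times_ms, 250, 500)

# A more negative effect with priming corresponds to the Figure 1B pattern;
# an unchanged or smaller effect corresponds to the Figure 1A pattern.
larger_negativity_with_priming = n400_effect["primed"] < n400_effect["unprimed"]
```

With these made-up values the primed difference wave is more negative than the unprimed one, which is the pattern the direct access accounts would predict.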

As a secondary goal of this research, we wanted to compare metaphor and one particular type of metonymy. This was motivated by the observation that existing theories make different proposals about the (dis)similarity of the processes underlying the computation of metaphors and metonymies. Some accounts argue for the processes to be the same (e.g., Sperber and Wilson, 1985, 2008; Frisson and Pickering, 2001), others suggest them to be different (Lakoff and Turner, 1989; Croft, 1993, 2002). The term metonymy is used for utterances in which a word or phrase is used to refer to something connected to the used expression, e.g., the name of an artist for the work produced by him (*The boy read Böll*). This close connection between the two readings may be directly reflected in the lexical representation (cf. e.g., Pustejovsky, 1995; Asher, 2011). Furthermore, different underlying mechanisms have been ascribed to a range of metonymy types (e.g., Copestake and Briscoe, 1995; Nunberg, 1995). For example, producer-for-product metonymy is less context-specific, frequently used and based on general patterns like "X for Y". In contrast, cases like *The ham sandwich wants to pay* are categorized as meaning transfer, resulting in a reconceptualization of the source. For the cognitive linguistic approach, metonymy is based on mapping within a domain or domain matrix (cf. Lakoff and Turner, 1989; Croft, 2002) or represents a conceptual shift (cf. e.g., Barcelona, 2002; for an overview see Panther and Thornburg, 2003), whereas metaphor is subject to mapping processes between two unrelated domains (cf. e.g., Langacker, 1987; Lakoff, 1993). Accordingly, in *The boy read Böll*, *Böll* relies on a domain that includes the concepts "person" and "work of Böll."

The comprehension of producer-for-product metonymy has been investigated behaviorally, indicating no processing effort (Bambini et al., 2013; see Frisson, 2009 for an overview). While this type of metonymy has not been tested using ERPs before, there are a number of existing studies on logical metonymy (*The boy began the novel*) and different types of nominal metonymies (content-container alternations: *Tim put the beer on the table*; *Tom drank the bottle*), including reference transfer like *The ham sandwich wants to pay* (Kuperberg et al., 2010; Schumacher, 2011, 2013, 2014). These studies cannot support a unified account for processing metonymy. They suggest that metonymies that can be resolved by meaning selection in the lexical representation evoke an N400 and that meaning adjustment that requires reconceptualization (and hence modification of discourse representation) engenders a Late Positivity (cf. Schumacher, 2013). By testing producer-for-product metonymies we want to contribute to this typology and also establish the link to metaphor processing. Using masked priming can provide further insights into the role of literal meaning components.

### **EXPERIMENT 1—LITERAL MEANING IN METAPHOR COMPREHENSION**

In this experiment, we compared the processing of nominal metaphors with that of literal expressions in German to investigate the time-course of metaphor comprehension and whether the literal meaning of a word is activated in the processing of a metaphor. First, the methods applied in the experiment without and with literal primes are described. Then, the results for metaphor without (Experiment 1a) and with priming (Experiment 1b) are reported and finally compared with respect to the impact of priming. Experiment 1a and 2b and Experiment 1b and 2a were presented together in one session, but for expository reasons we present them separately as Experiment 1 and 2.

### **METHODS**

### *Participants*

In total, 56 right-handed native speakers of German were paid for participating in this study. All reported normal or corrected-to-normal eyesight and no history of neurological disorder. Twenty-seven took part in Experiment 1a. Due to too many artifacts from eye-movement, three of them had to be excluded from the data analysis; hence 24 participants entered the statistical analysis (mean age 25.1, ranging from 20 to 30, 17 female). In Experiment 1b, four of the 29 participating subjects had to be excluded from the data analysis because of extensive ocular artifacts. Therefore 25 participants (mean age 24.2, ranging from 19 to 29, 15 female) entered the analysis of the ERP data.

### *Stimuli*

The stimuli were carefully controlled for several factors that are known to influence the processing of metaphors and in particular the N400 effect. First, we collected the familiarity values (cf. Pynte et al., 1996; Lai et al., 2009) and chose metaphors that are neither already lexicalized nor completely unfamiliar (using a scale from not known (1) to well known (5), metaphors from the middle range were selected; see **Table 1** for values). Second, since Kutas and Hillyard (1980) showed that senseless sentences elicit a more pronounced N400 than meaningful utterances, we asked participants to judge the sensicality of the metaphors and the respective literal control sentences. Third, another factor that is known to influence the N400 is cloze probability. More expected words (high cloze probability value) elicit a reduced N400 in contrast to less expected ones (cf. Kutas and Hillyard, 1984). Therefore we truncated the sentences before the critical word and asked participants to complete the sentence fragments by writing down the first continuation that came to their mind. These completions were compared with the actual sentence endings and the percentage of accordance was calculated for each item (regular cloze probability). We also employed a novel approach by analyzing the completions on the basis of whether they resulted in a metaphorical or a literal reading (category cloze probability). This second step was guided by the idea that, based on theoretical approaches that focus on type-mismatches (e.g., Pustejovsky, 1995; Asher, 2011) or processes within or between domains (e.g., Croft, 2002), it seems promising to determine categorical expectations as well, in order to test whether the N400 is sensitive to category-specific (±metaphorical) predictions of the processor. For that reason, we also calculated the values of categorical accordance by counting the category matches, i.e., metaphorical completions for the metaphorical items and literal completions for the literal items. Based on these pre-tests, we selected 40 metaphors and corresponding control sentences whose values are summarized in **Table 1**.

*Familiarity and sensicality were rated on five-point scales. For familiarity, the endpoints were labeled not known (value = 1) and well known (value = 5). In the sensicality rating, a happy smiley stood for meaningful (value = 1) and a sad smiley for meaningless (value = 5). The term category refers to figurative or literal continuations.*

To summarize, the 40 chosen metaphors received medium familiarity scores, to assure that no dead (lexicalized) or totally unknown metaphors were used. The literal controls and metaphors do not differ with respect to their cloze probability values, but with respect to the categorical completions (category cloze probability). As can be seen, the metaphors were classified between meaningful and meaningless whereas the literal controls were rated more toward the meaningful endpoint of the scale. This is a typical pattern in the metaphor literature (see also the material in Bambini et al., 2013). Crucially, the selected metaphors were not rated as anomalous or meaningless. Example stimuli are provided in **Table 2**.
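The two cloze measures described above can be made concrete with a small sketch. It assumes each completion has already been hand-coded as yielding a literal or a figurative reading; the function names, the sentence fragment, and the example completions are hypothetical and serve only to show how the regular and the category value can diverge for the same item.

```python
def regular_cloze(completions, target_word):
    """Share of completions identical to the word actually used in the item."""
    hits = sum(1 for word, _ in completions if word.lower() == target_word.lower())
    return hits / len(completions)


def category_cloze(completions, item_category):
    """Share of completions whose reading matches the item's category."""
    hits = sum(1 for _, reading in completions if reading == item_category)
    return hits / len(completions)


# Hypothetical completions of the fragment "These songs are ..." for a
# metaphorical item ending in "drugs". Each entry is (word, coded reading).
completions = [
    ("hits", "literal"),
    ("old", "literal"),
    ("beautiful", "literal"),
    ("poison", "figurative"),
]

regular = regular_cloze(completions, "drugs")         # nobody produced "drugs"
category = category_cloze(completions, "figurative")  # one figurative completion
```

In this toy case the regular cloze value is zero while the category value is not, which is why the two measures are reported separately.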

In the conditions with priming (Experiment 1b), the critical word (target; e.g., *hyenas*) was primed with a property of the literal meaning of the target, e.g., *furry.* Based on the close linkage between concepts and their properties (cf. Solomon and Barsalou, 2004) and to avoid problems with potential differences in the relation between prime words and targets (cf. Becker, 1980; Rubio Fernández, 2007), we only used adjectives as primes that were identified as properties of the literal and not of the figurative meaning of the corresponding target word. The appropriate properties were identified in a pre-test, in which participants saw a noun (*hyena*) and a property (*furry*) and had to rate the coherence between these two. Each noun was presented three times with different preselected adjectives to identify the one with the highest coherence value. The summary of the property values can be seen in **Table 3**. Since we used the same word as prime for the literal and the figurative condition, no confounding effects due to the range are expected. See the supplementary material for the whole set of stimuli (target, vehicle, and prime) and respective property coherence values.

**Table 2 | Example of critical stimuli for Experiment 1a and 1b.**


**Table 3 | Summary of results from pre-tests for selected primes for Experiment 1b.**


*Frequency values are based on wortschatz.uni-leipzig.de. Coherence was assessed on a six-point scale from no coherence (1) to strong coherence (6) between the target word and a particular property.*

In Experiment 1a and 1b, the 80 critical sentences were presented together with 208 filler sentences in three different pseudorandomized orders. The sentences were recorded as natural speech by a female German native speaker in a sound-attenuated booth. Phonetic analyses of the critical words (targets) and comparisons of duration, pitch and intensity registered no significant differences between the conditions (all *F*s < 1).

### *Procedure*

We used a cross-modal masked priming paradigm adopted by Kiyonaga et al. (2007) and verified in Schumacher et al. (2012) in which the targets were part of auditorily presented sentences, as can be seen in **Figure 2**. Since priming was set as a factor, sentences for Experiment 1a and 2a were presented without primes (but with the forward mask on display) and stimuli for Experiment 1b and 2b with the masking procedure. We now explain the latter in more detail. A fixation asterisk was presented at the beginning of each trial for 500 ms in the center of the monitor. It was followed by the forward mask that consisted of 11 hash marks (#) and the auditory stimulus that started simultaneously. In the condition with priming, the forward mask was replaced by the prime 100 ms before the onset of the auditorily presented target word, hence with 100 ms stimulus onset asynchrony (SOA). The prime was presented for 67 ms and then immediately replaced by the backward mask that consisted of 11 capitalized "X." Until the end of the auditory stimulus, the backward mask remained on the monitor. The sentence presentation was followed by a 1500 ms long blank screen and then by a question mark. At this point, participants had to perform the first of two tasks, which we employed to control for their attention. This first task (color change detection) controlled for the attention paid to the visual display and additionally was meant to distract the participants from the prime presentation. Participants had to detect a color change in the pattern masks (in 44% of all trials), which lasted for only 100 ms. The color change occurred on the forward mask, at least 1000 ms before the target, to avoid an impact on the recorded critical interval. The first task ended by participants pressing one of two buttons ("Yes" or "No") with a maximum response latency of 2000 ms. 
Following another blank screen of 1500 ms, the second task (probe recognition), implemented to force the participants to pay attention to the auditory stimuli, was indicated by a visually presented word. Participants had to determine whether they had heard this word in the preceding sentence or not. The pressing of one of two possible answer buttons terminated the trial that was followed by a 1500 ms long blank screen. After that, the next trial started. The visual stimuli were presented in the middle of the screen in off-white against a black background. The letters were shown in Deja Vu Sans Mono font (34 pt.), in which all letters have the same width.
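The timing of a single trial up to the first task can be laid out as a simple event list. The sketch below is only an illustration of the schedule described above: the function and event names are hypothetical, and the target onset and sentence duration in the usage example are made-up values. The durations (500 ms fixation, 100 ms SOA, 67 ms prime, 1500 ms blank) are taken from the text.

```python
FIXATION_MS = 500        # fixation asterisk
PRIME_SOA_MS = 100       # prime onset precedes target onset by this much
PRIME_DURATION_MS = 67   # prime is then replaced by the backward mask
BLANK_MS = 1500          # blank screen after the sentence


def trial_timeline(target_onset_ms, sentence_duration_ms, primed=True):
    """Return (event, onset_ms) pairs relative to trial start.

    `target_onset_ms` and `sentence_duration_ms` are measured from audio onset.
    Without priming, the forward mask simply stays on display.
    """
    if target_onset_ms < PRIME_SOA_MS:
        raise ValueError("target must start at least one SOA after audio onset")
    audio_onset = FIXATION_MS  # forward mask and audio start together
    events = [("fixation", 0), ("forward_mask_and_audio", audio_onset)]
    if primed:
        prime_onset = audio_onset + target_onset_ms - PRIME_SOA_MS
        events.append(("prime", prime_onset))
        events.append(("backward_mask", prime_onset + PRIME_DURATION_MS))
    events.append(("target_word", audio_onset + target_onset_ms))
    audio_end = audio_onset + sentence_duration_ms
    events.append(("blank", audio_end))
    events.append(("question_mark", audio_end + BLANK_MS))
    return events


# Hypothetical sentence: target word starts 1800 ms into a 2600 ms recording.
timeline = trial_timeline(target_onset_ms=1800, sentence_duration_ms=2600)
```

Keeping the prime onset defined relative to the target onset, rather than to trial start, is what keeps the SOA constant across sentences of different lengths.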

Before each session, participants were carefully instructed about the task. The main experiment was divided into eight blocks with short pauses in-between and preceded by a short training block. Afterwards, a prime detection task was administered to assess the individual prime awareness (cf. Kiyonaga et al., 2007), by first asking the participants in an informal manner if they had noticed anything unusual. Second, after being briefly informed about the masked priming, the participants saw 30 primes under similar visual conditions (the forward mask lasted 933 or 1933 ms, the prime was again presented for 67 ms and the backward mask lasted 1000 ms) but without auditory stimuli. During the experimental session, the participants sat in front of a 17-inch monitor in a soundproof cabin.

### *EEG recording procedure*

We recorded the electroencephalogram (EEG) from 26 Ag/AgCl scalp electrodes mounted on the scalp by an elastic cap (*Electro-Cap International*). The EEG was digitized at a rate of 500 Hz and amplified by a *Brain Vision Brain-Amp* amplifier; impedances were kept below 4 kΩ. The EEG was referenced online to the left mastoid and re-referenced offline to linked mastoids. We placed the ground at AFz, three electrodes around the subject's right eye (over and under the eye and at its outer canthus) and one electrode at the outer canthus of the left eye. The eye-electrodes served to control for artifacts from eye-movement. To avoid slow signal drifts, the EEG data were processed offline with a 0.3–20.0 Hz band pass filter.
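The offline re-referencing step can be illustrated with the standard arithmetic for a linked-mastoid reference. The sketch assumes that the right mastoid was recorded as an ordinary channel against the online left-mastoid reference, which the text does not state explicitly; the function name and sample values are hypothetical.

```python
def rereference_to_linked_mastoids(channel, right_mastoid):
    """Re-reference one channel from left mastoid to the mastoid average.

    Both inputs are sample lists recorded against the left mastoid (M1).
    Since V_ch - (V_M1 + V_M2) / 2 == (V_ch - V_M1) - (V_M2 - V_M1) / 2,
    it suffices to subtract half of the recorded right-mastoid signal.
    """
    if len(channel) != len(right_mastoid):
        raise ValueError("signals must have the same number of samples")
    return [c - 0.5 * m for c, m in zip(channel, right_mastoid)]


# Hypothetical samples in microvolts.
pz_linked = rereference_to_linked_mastoids([10.0, 4.0], [2.0, -2.0])
```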

Crucially, previous auditory studies reported that the ERP signature, in particular the N400, varied depending on the word recognition point (cf. van Petten et al., 1999; O'Rourke and Holcomb, 2002; Schumacher et al., 2012). When time-locked to word onset, there were N400-differences for words with early and late word recognition points. When time-locked to the word recognition point, these differences diminished. Therefore we determined the word recognition point of each critical target in a gating task and time-locked the ERPs to it. For the gating task, the critical words were cut individually and then judged by six native speakers of German who were asked to listen to each sentence carefully and to identify the target word by completing it verbally. By extending the sentences in 50 ms steps, we determined the point at which most participants were able to correctly identify the target word. The word recognition point was on average 168 ms (range 24–374 ms) after the word onset.
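The gating procedure and the subsequent time-locking can be sketched as follows. This is an illustration under stated assumptions: "most participants" is read here as more than half of the six listeners, and the function names, gate values, and sample times are hypothetical.

```python
def recognition_point(correct_per_gate, n_listeners=6):
    """First gate (ms after word onset) at which more than half of the
    listeners identified the target; None if no gate reaches a majority.

    `correct_per_gate` maps gate position in ms to the number of correct
    identifications. The majority criterion is an assumption of this sketch.
    """
    for gate_ms in sorted(correct_per_gate):
        if correct_per_gate[gate_ms] > n_listeners / 2:
            return gate_ms
    return None


def relock(sample_times_ms, word_onset_ms, recognition_ms):
    """Re-express sample times relative to the word recognition point."""
    zero = word_onset_ms + recognition_ms
    return [t - zero for t in sample_times_ms]


# Hypothetical gating data for one target word (gates in 50 ms steps).
gates = {50: 0, 100: 1, 150: 3, 200: 5, 250: 6}
point = recognition_point(gates)

# Samples at 2000, 2200 and 2400 ms for a word that starts at 1800 ms.
relocked = relock([2000, 2200, 2400], word_onset_ms=1800, recognition_ms=point)
```

Shifting the zero point per item in this way aligns early- and late-recognized words on the moment they become identifiable rather than on their acoustic onset.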

Average ERPs were calculated per condition, participant and electrode from the word recognition point to up to 1500 ms and then subjected to automatic (rejection criterion of EOG: >40µV) and manual rejections. 17.37% of all trials had to be excluded due to artifacts. Because of false responses in the probe task or time-outs, 4.97% of the trials were also excluded. In total, 71.4% of the trials without priming (Experiment 1a) and 80.8% of the trials with priming (Experiment 1b) entered the statistical analysis.

### *Data analysis*

We ran statistical analyses for the behavioral data over accuracy rates and reaction times over subjects and items for both tasks. The critical time-windows were predefined by visual inspection. ANOVAs of the ERP data were computed with the factor FIGURATIVENESS (figurative vs. literal) and the factor ROI (topographical region of interest), computed for lateral and midline channels separately. The lateral electrodes were grouped by location as follows: left anterior (F7/F3/FC5/FC1/C3), right anterior (F4/F8/FC2/FC6/C4), left posterior (T7/CP5/CP1/P7/P3), and right posterior (T8/CP2/CP6/P4/P8). The six electrodes from the midline were grouped pair-wise: frontal (Fz/FCz), central (Cz/CPz), and parietal (Pz/POz). Only trials with correct responses to the probe recognition task entered the analysis. The statistical analyses were carried out in a hierarchical manner. To control for potential type I errors due to violations of sphericity, the data were adjusted using the Huynh-Feldt procedure (cf. Huynh and Feldt, 1970).
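The ROI grouping listed above lends itself to a small lookup structure. The sketch below only shows how per-channel values could be collapsed into the ROI levels that enter the ANOVA; the electrode groupings are taken from the text, whereas the function name and the amplitude values are hypothetical.

```python
from statistics import mean

LATERAL_ROIS = {
    "left_anterior": ["F7", "F3", "FC5", "FC1", "C3"],
    "right_anterior": ["F4", "F8", "FC2", "FC6", "C4"],
    "left_posterior": ["T7", "CP5", "CP1", "P7", "P3"],
    "right_posterior": ["T8", "CP2", "CP6", "P4", "P8"],
}
MIDLINE_ROIS = {
    "frontal": ["Fz", "FCz"],
    "central": ["Cz", "CPz"],
    "parietal": ["Pz", "POz"],
}


def roi_means(channel_amplitudes, rois):
    """Average per-channel mean amplitudes (microvolts) within each ROI."""
    return {
        roi: mean(channel_amplitudes[ch] for ch in channels)
        for roi, channels in rois.items()
    }


# Hypothetical figurative-minus-literal amplitudes in the 250-500 ms window:
# a small anterior and a larger posterior negativity (illustrative only).
effect = {}
for roi, channels in LATERAL_ROIS.items():
    for ch in channels:
        effect[ch] = -0.5 if roi.endswith("anterior") else -2.0

lateral_effect = roi_means(effect, LATERAL_ROIS)
```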

### **EXPERIMENT 1A**

### *Behavioral results*

For the color change detection and the probe recognition task, we calculated reaction times and accuracy rates for the literal and metaphorical condition separately. With over 94% correct answers, the results revealed that the participants paid attention to the visual (94.81%, *SD* = 0.12) and auditory (94.86%, *SD* = 0.03) stimuli. Statistical analyses revealed no differences for the factor FIGURATIVENESS for accuracy rates and reaction times in both tasks (all *F*s < 1).

### *Electrophysiological results*

Visual inspection of the grand average ERPs revealed two critical time-windows for the comparison of the literal and the figurative condition (see **Figure 3A**): a more negative deflection for metaphors between 250 and 500 ms (N400-window) and a more positive deflection between 700 and 900 ms (Late Positivity). We ran separate ANOVAs for both time windows that revealed an interaction of FIGURATIVENESS × ROI between 250 and 500 ms (N400 time-window) [*F*(3, 69) = 13.67, *p* < 0.001], significant in the left [*F*(1, 23) = 6.86, *p* < 0.05] and right [*F*(1, 23) = 18.12, *p* < 0.001] posterior regions. For the midline electrodes, the statistical analyses for the N400-window also revealed an interaction of FIGURATIVENESS × ROI [*F*(2, 46) = 17.25, *p* < 0.001], significant in the central [*F*(1, 23) = 10.29, *p* < 0.01] and posterior [*F*(1, 23) = 17.64, *p* < 0.001] regions. For the Late Positivity time-window (700–900 ms), ANOVAs showed an interaction of FIGURATIVENESS × ROI for the lateral electrodes [*F*(3, 69) = 3.48, *p* < 0.05] significant in both left [*F*(1, 23) = 18.37, *p* < 0.001] and right [*F*(1, 23) = 13.33, *p* < 0.01] posterior regions. For the midline electrodes, statistical analyses also revealed a main effect of FIGURATIVENESS in the late time-window [*F*(1, 23) = 12.31, *p* < 0.01].

### *Discussion*

Experiment 1a investigated the processing of nominal metaphors in German without priming. ERPs revealed a biphasic N400-Late Positivity with more pronounced deflections for the metaphorical in comparison to the literal condition. The behavioral data indicated high attentiveness of the participants to visual and auditory stimuli.

The ERP data can be interpreted in line with the idea that the N400 reflects enhanced costs in the lexical access phase, influenced by context and the degree of categorical expectancy. The classical cloze probability values are at 0% in both conditions, therefore the N400-difference (250–500 ms) cannot be explained based on the expectancy of a particular word. The category expectancy value however matches the N400 deflection. The high expectation of any word that completes the sentence literally (almost 100%) elicited a less pronounced N400-amplitude than the unexpected metaphorical completion with any word belonging to the metaphorical category (value of categorical accordance below 1%). In the later time-window, metaphors showed a more pronounced positive-going wave between 700 and 900 ms than the literal control condition. This Late Positivity might be interpreted as reflecting enhanced costs due to pragmatically or inferentially driven mapping processes, involving two unrelated domains. This has consequences for discourse representation: in the metaphorical condition, the integration of a referent in the discourse involves the combination of two domains and processing is hence more costly than the simple establishment of a new referent in the literal condition.

without priming in **(A)** (Experiment 1a) and with priming in **(B)** (Experiment 1b). Negativity is plotted up. Vertical bar represents the word recognition point of the target.

Experiment 1a, like other studies before, found enhanced costs for the processing of metaphors in comparison to literal utterances. Hence, theories that argued for metaphors to be interpreted as easily as comparable literal utterances are challenged (e.g., Sperber and Wilson, 1986; Glucksberg, 2008). In contrast, these findings support theoretical accounts that assume more steps or higher effort during the interpretation of metaphors (cf. Grice, 1975; Searle, 1979; Carston, 2010b). Yet these results do not allow us to discriminate between different accounts of the steps in figurative processing.

Because we are interested in the role of literal meaning during the early processing stage in the computation of metaphors, Experiment 1a served to set up a baseline. In the following, the same materials are presented with primes that were literal properties of the critical word, and the results are then compared with the findings for metaphor processing without priming.

### **EXPERIMENT 1B**

### *Behavioral results*

As before, participants performed well in both tasks, indicating that they paid high attention to the visual and auditory stimuli. They responded correctly to 98.6% (*SD* = 0.05) of the color change detection and 95.9% (*SD* = 0.02) of the probe recognition task. For accuracy rates, ANOVAs with the factor FIGURATIVENESS revealed no significant differences for both tasks (all *F*s < 1). For reaction times, statistical analyses showed no differences for the color change detection task (all *F*s < 1) and, for the probe recognition task, a significant difference in the subject analysis only [*F*<sub>1</sub>(1, 24) = 5.13, *p* < 0.05; *F*<sub>2</sub> < 1]. This was due to slower reaction times for the literal (mean = 909 ms) than for the figurative condition (877 ms).

### *Electrophysiological results*

**Figure 3B** shows the grand average ERPs in the masked priming paradigm. To allow for a good comparison with the unprimed conditions, we picked the same time-windows between 250 and 500 ms (N400) and 700 and 900 ms (Late Positivity). Crucially, the figurative and the literal conditions did not seem to differ in this Late Positivity-window but further downstream. Indeed, statistical analyses showed no significant effect for the 700–900 ms time-window (*F* < 1). For the N400 time-window, ANOVAs revealed an interaction of the factors FIGURATIVENESS × ROI [*F*(3, 72) = 7.71, *p* < 0.001], which was resolved significantly in the left [*F*(1, 24) = 6.92, *p* < 0.05] and right [*F*(1, 24) = 7.05, *p* < 0.05] posterior regions, and for the midline electrodes [*F*(2, 48) = 10.03, *p* < 0.001], significant in the posterior region [*F*(1, 24) = 4.82, *p* < 0.05]. Additionally, we analyzed the time-window between 1100 and 1300 ms (based on visual inspection) in which the metaphorical condition elicited a more positive deflection than the literal control condition. ANOVA showed a main effect of FIGURATIVENESS for the lateral [*F*(1, 24) = 5.49, *p* < 0.05] and the midline electrodes [*F*(1, 24) = 4.41, *p* < 0.05].

### *Post-ERP test*

As described in the Procedure subsection, participants were asked to perform a prime detection task. On average, participants detected 18 of 30 primes correctly. The average prime detection rate hence was 59.3%, which mirrors the results from other experiments (e.g., Kiefer, 2002; Kiyonaga et al., 2007). In addition, we controlled for a possible influence of the individual prime detection rate on the size of the N400. To this end, we calculated the N400-amplitude difference separately for three midline electrodes (Cz, CPz, and Pz) by subtracting, for each participant, the maximal amplitude value in the critical time-window (250–500 ms) of the literal condition from that of the figurative condition. These values were then correlated with the individual prime detection rates. The statistical analysis revealed no reliable correlation for any of the three electrodes: Cz (Pearson's *r* = 0.111, *p* = 0.598), CPz (*r* = 0.161, *p* = 0.442), or Pz (*r* = 0.167, *p* = 0.425).
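The correlation control described above can be illustrated with a minimal Python sketch. It assumes that "maximal amplitude value" means the largest sample value in the window (a negative-peak reading would pass `pick=min` instead); the function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def window_peak(times_ms, amplitudes, start_ms=250, end_ms=500, pick=max):
    """Return the extreme amplitude inside [start_ms, end_ms].

    `pick=max` is an assumption about what "maximal amplitude" means;
    use `pick=min` to take the most negative value instead.
    """
    in_window = [a for t, a in zip(times_ms, amplitudes) if start_ms <= t <= end_ms]
    if not in_window:
        raise ValueError("no samples fall inside the requested window")
    return pick(in_window)


def n400_difference(times_ms, figurative, literal, pick=max):
    """Figurative peak minus literal peak for one participant and electrode."""
    return (window_peak(times_ms, figurative, pick=pick)
            - window_peak(times_ms, literal, pick=pick))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for three hypothetical participants (not data from the study).
detection_rates = [0.50, 0.60, 0.70]
amplitude_differences = [-1.2, -0.4, -0.9]
print(round(pearson_r(detection_rates, amplitude_differences), 3))
```

In such a setup, one correlation would be computed per electrode, each over the per-participant pairs of detection rate and amplitude difference.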

### *Discussion*

In this experiment, metaphors were presented within a masked priming paradigm to investigate the role of literal meaning during the lexical access phase of the critical word. As in the condition without priming, we found a more pronounced N400<sup>3</sup> (250–500 ms) and Late Positivity (1100–1300 ms) for the metaphorical in comparison with the literal condition. Hence, independent of priming, the processing of the critical word (*hyenas*) is more demanding in the metaphorical environment than in the literal one, both during the lexical access phase and in later discourse updating processes.

To see in which direction the literal primes influence the N400 (decreasing or increasing the amplitude difference), we calculated difference wave plots by subtracting the literal from the metaphorical condition for both experiments (without and with priming) separately (cf. Roehm et al., 2007). This allowed us to filter out differences between the two participant groups and differences arising from the different presentation modalities. Visual inspection of **Figure 4** revealed a slightly reduced N400-amplitude difference (between 250 and 500 ms) for the presentation with a literal prime in Experiment 1b. This was supported by statistical analyses (*p*s < 0.01). The literal prime word therefore has a facilitating effect on language processing (cf. Rolke et al., 2001; Kiefer, 2002; Grossi, 2006). The fact that the N400-amplitude difference is reduced in the masked priming conditions indicates a greater benefit of the literal prime word in the figurative than in the literal interpretation. Processing the metaphor may profit from the subliminal prime due to pre-activation of the semantic network of the target, which eases the extra operations required. The data suggest that the pre-activation of the literal meaning of the target word within a metaphor does not hamper, but rather facilitates processing.
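The difference-wave logic can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each condition is represented by a single grand-average waveform sampled at the same time points, and all values are invented for the example.

```python
def difference_wave(metaphorical, literal):
    """Subtract the literal from the metaphorical waveform, sample by sample."""
    if len(metaphorical) != len(literal):
        raise ValueError("waveforms must have the same number of samples")
    return [m - l for m, l in zip(metaphorical, literal)]


def mean_in_window(times_ms, wave, start_ms=250, end_ms=500):
    """Mean amplitude of a waveform inside [start_ms, end_ms]."""
    values = [v for t, v in zip(times_ms, wave) if start_ms <= t <= end_ms]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return sum(values) / len(values)


# Invented grand averages in microvolts (not data from the study).
times_ms = [0, 250, 375, 500, 750]
unprimed = difference_wave([0.0, -3.0, -4.0, -3.0, 1.0], [0.0, -1.0, -1.5, -1.0, 0.0])
primed = difference_wave([0.0, -2.0, -2.5, -2.0, 0.0], [0.0, -1.0, -1.5, -1.0, 0.0])

# A less negative mean difference with priming corresponds to a reduced N400 effect.
print(mean_in_window(times_ms, unprimed), mean_in_window(times_ms, primed))
```

Because each difference wave is computed within one experiment, group-level offsets that affect both conditions equally cancel out before the two experiments are compared.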

Based on these observations, accounts that maintain access to literal meaning aspects like the indirect access view (Grice, 1975; Searle, 1979) are supported. Likewise, theories that expect the literal meaning to linger around in figurative language processing are substantiated as well (Giora, 2008; Carston, 2010b). In contrast, theories that claim that the literal meaning does not play a role in metaphor processing (Sperber and Wilson, 2008) or that predict a hampering effect of the literal prime word (Glucksberg, 2008) cannot be confirmed by the current results. The difference wave plots thus reveal evidence against the direct access account.

line) priming. The vertical bar marks the word recognition point of the target. The critical time-window is shaded in gray.

Besides the reduced N400-amplitude, priming had an impact on later components. With priming, the Late Positivity was delayed from 700–900 ms to 1100–1300 ms. The latency shift might result from interferences from the prime presentation in earlier phases that hamper (later) pragmatic operations. A possible explanation might be that this delay is caused by the literal prime holding up the mapping or reconceptualization processes. Note, however, that the previous ERP experiment on priming in sentential context already revealed an influence of masked priming in later processing stages, where the repetition priming condition registered a late positive deflection in sentential context but not in list presentation (cf. Schumacher et al., 2012). To decide whether the linguistic information of the primes or a more general disturbance triggered by prime presentation provokes the latency shift, further investigations are needed.

In sum, metaphors elicited a more pronounced biphasic N400-Late Positivity pattern than the literal controls, independent of prime presentation. The processing of metaphors can therefore be interpreted as more costly than the processing of literal language. When preceded by a literal prime word, the effort in the lexical access phase of the target word is reduced (smaller amplitude difference) and the Late Positivity is delayed. The reduced N400 refutes the direct access view and supports the idea that the literal meaning is accessed or at least lingering during figurative language processing.

<sup>3</sup>Crucially, the fact that we found an N400-amplitude difference between the two conditions speaks for stimulus processing beyond the word level in both presentation modalities (without and with priming). If participants processed each word separately, we should not have found differences at *hyenas*, neither in the condition without nor in the condition with priming.

# **EXPERIMENT 2—LITERAL MEANING IN METONYMY COMPREHENSION**

To extend the findings for the role of literal meaning to another type of non-literal language, we also tested producer-for-product metonymies utilizing the same experimental design as for metaphors.

### **METHODS**

### *Participants*

We gathered data from the same 56 participants as tested in Experiment 1. In Experiment 2a, five of 29 participants had to be excluded due to too many artifacts; hence 24 subjects (mean age 24.3, 15 female) entered the analysis of the ERP data. In Experiment 2b (with priming), the ERP data of 22 subjects (mean age 24.9, age ranged from 21 to 30, 16 women) were analyzed after discarding data from five participants due to extensive ocular and motion artifacts.

### *Stimuli*

The materials were pretested on sensicality, familiarity, cloze probability, and category cloze probability. Since the tested metonymies all belong to the conventional producer-for-product type, we controlled for familiarity of the famous person used (cf. Frisson and Pickering, 2007). Therefore we conducted a test similar to the familiarity pretest by Frisson and Pickering (2007) and calculated the percentage of participants that correctly named the profession of the respective famous individuals. The findings are summarized in **Table 4**.

Again it was necessary to find good primes for the literal meaning of the target word (all last names of famous people like *Böll*). We used the same property acceptability test as described for metaphors and used adjectives as properties to keep the conditions similar to the metaphor experiment. Because the properties were not allowed to be potential properties of the metonymical meaning, it was challenging to find appropriate ones. Many adjectives that describe famous individuals such as painters (e.g., *lively*) also describe their work, i.e., the metonymical meaning. Therefore, we used rather general adjectives comprising human and biographical characteristics (e.g., *divorced, talented*). Their characteristics are summarized in **Table 6**. The critical words and properties with the corresponding property coherence values are provided in the supplementary material.

**Table 4 | Summary of mean values from pre-tests for selected metonymy and corresponding literal controls.**

*Familiarity reports the percentage of participants who identified the correct profession for the famous people used in this type of metonymy. Sensicality was assessed on a five-point scale ranging from meaningful (value = 1) to meaningless (value = 5). Category refers to figurative or literal continuations.*

In both experiments (2a and 2b), the 80 critical sentences were presented together with 208 filler sentences in three different pseudorandomized orders. An example of the metonymies of the type producer-for-product (author-for-work, designer-for-clothing, composer-for-composition, and painter-for-painting) and their literal control sentences can be seen in **Table 5**.

The sentences of the metonymical condition were recorded by the same female German native speaker and under the same conditions outlined in Experiment 1. Again phonetic analyses of the critical targets registered no significant differences (all *F*s < 1) between the metonymical and the literal condition in terms of duration, pitch and intensity.

### *Procedure*

We used the same procedure as in Experiment 1.

### *EEG recording procedure*

The recording procedure was the same as in Experiment 1. Six participants determined the word recognition point for the critical targets in a gating task (on average 191 ms after name onset, ranging from 18 to 398 ms). Based on probe task responses and filtering procedures, we had to exclude 7.78% of the trials due to artifacts and incorrect responses for the condition without priming and 8.49% of the trials for the condition with priming.

**Table 5 | Example of critical stimuli for Experiment 2a and 2b.**


**Table 6 | Summary of results from pre-tests for selected primes for Experiment 2b.**


*Frequency values are based on wortschatz.uni-leipzig.de. Coherence was assessed on a six-point scale from no coherence (1) to strong coherence (6) between the target word and a particular property.*

### *Data analysis*

ANOVAs were carried out for the behavioral data over reaction times and accuracy rates for both tasks. The critical time-windows for the ERP analysis were determined by visual inspection, and statistical analyses were computed for the mean amplitude value of the ERP data. ANOVAs for Experiment 2a (without priming) and 2b (with priming) were calculated with the factors FIGURATIVENESS (figurative vs. literal) and ROI (topographical regions of interest).
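As an illustration of the dependent measure, the following Python sketch computes the mean amplitude per region of interest in a given time-window. The electrode groupings in `ROIS` are hypothetical placeholders, since the electrodes assigned to each region are not listed in this section, and the sample values are invented.

```python
from statistics import mean

# Hypothetical ROI layout: the actual electrode assignment is an assumption.
ROIS = {
    "left_anterior": ["F3", "FC3"],
    "right_anterior": ["F4", "FC4"],
    "left_posterior": ["CP3", "P3"],
    "right_posterior": ["CP4", "P4"],
}


def roi_mean_amplitude(erp, times_ms, electrodes, start_ms, end_ms):
    """Mean amplitude over all given electrodes and all samples in the window.

    `erp` maps an electrode name to its list of amplitudes, aligned with `times_ms`.
    """
    indices = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    if not indices:
        raise ValueError("no samples fall inside the requested window")
    return mean(erp[e][i] for e in electrodes for i in indices)


# Invented averaged ERP for one participant and condition, in microvolts.
times_ms = [0, 250, 500, 750]
erp = {
    "CP3": [0.0, -1.0, -3.0, 0.0],
    "P3": [0.0, -2.0, -4.0, 0.0],
}
print(roi_mean_amplitude(erp, times_ms, ROIS["left_posterior"], 250, 500))
```

One such value per participant, condition, and ROI would then enter the repeated-measures ANOVA.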

# **EXPERIMENT 2A**

### *Behavioral results*

The behavioral responses indicated that participants paid attention to the visual and auditory stimuli. The color change detection task yielded over 98% (*SD* = 0.07) correct responses, the probe recognition task over 96% (*SD* = 0.02). For both tasks, ANOVAs for accuracy rates revealed no differences for the factor FIGURATIVENESS for subjects and items (all *F*s < 1). For reaction times, statistical analyses for the color change detection task also revealed no effects (all *F*s < 1) and for the probe recognition task a reliable difference by subjects only [*F*<sub>1</sub>(1, 23) = 7.48, *p* < 0.05]. This was reflected in faster reaction times for metonymies (mean = 856 ms) than for the literal control sentences (mean = 885 ms).

### *Electrophysiological results*

**Figure 5A** shows the grand average ERPs for metonymies and their literal controls. The figurative condition elicited a more pronounced negativity between 200 and 350 ms (N400 time-window) in contrast to the literal condition. The findings, based on visual inspection, were confirmed by statistical analyses. ANOVAs revealed an interaction of FIGURATIVENESS × ROI [*F*(3, 69) = 3.11, *p* < 0.05], significant only in the left posterior region [*F*(1, 23) = 4.45, *p* < 0.05], for the lateral regions of interest and no significant effect for the midline (*F* < 1).

### *Discussion*

In this experiment, German producer-for-product metonymies were presented without priming. ERPs revealed a more pronounced N400 (200–350 ms) and no later effects<sup>4</sup>. This result is in contrast to previous studies on metonymy that reported a biphasic pattern (reference transfer; Schumacher, 2014) or a monophasic Late Positivity (container-for-content metonymy; Schumacher, 2013). In turn, for the more conventionalized producer-for-product metonymies tested here, eye-tracking studies (Frisson and Pickering, 1999, 2007; McElree et al., 2006) and timed sensicality judgments (Bambini et al., 2013) did not find differences between metonymical and literal utterances if the author was familiar (*read/met Dickens*). However, for unfamiliar authors the processing of the metonymical condition, assessed via eye-tracking, was more costly (*read/met Needham*) if presented without supporting context. Since we presented the metonymies with well-known famous people (familiarity value >90%), one would expect to find no differences based on the former results.

In comparison with previous studies on metonymy, we can draw two important conclusions: first, the ERP data substantiate the existence of different types of metonymy and their respective underlying processes. We assume that producer-for-product metonymy is based on a general metonymic pattern ("X for Y"). The *ham sandwich* example tested by Schumacher (2011, 2014) is categorized as a case of reference transfer since it requires transfer operations on the discourse referent. The interpretation of examples like *The ham sandwich wanted to pay* requires information that is stored in the lexical entry of a restaurant script. Since it is not expressed in the utterance, the processor has to relate these two domains (Nunberg, 1995). This differentiation can explain the findings for reference transfer and producer-for-product metonymy<sup>5</sup>. The producer-for-product type only requires the selection of the correct meaning from the lexical representation. The reference transfer type involves the establishment of a relation between two domains (e.g., "restaurant" and "ham sandwich") via inferencing or mapping processes, which are reflected in the Late Positivity. This explains why we found this effect for metaphors too, but not for the producer-for-product metonymies, where no such operations are required (no Late Positivity). This explanation is also in line with the assumptions made by Schumacher (2013). She argued that reference transfer involves a reference shift and therefore modifications in discourse structure, whereas producer-for-product metonymy involves meaning selection processes only. Second, the observed N400 reflects a context-dependent, expectancy-driven process (cf. e.g., Kutas and Hillyard, 1984). This view matches the results because the metonymical completion was expected neither in the metonymies tested here nor in the reference transfer examples. Additionally, Schumacher (2014) reported the absence of an N400-difference when the critical sentences are preceded by a supporting context (higher expectancies of a metonymical completion via pre-activation of e.g., the restaurant setting).

<sup>4</sup>Although visual inspection suggests some later effect, statistical analyses with 50 ms windows from 350 to 900 ms revealed only a significant effect for the time-window between 400 and 450 ms (ROIs and midline) and for the time-window from 550 to 600 ms (ROIs only). Since no two adjacent time-windows elicited significant differences, we do not consider this difference reliable (cf. Gunter et al., 2000).

Finally, ERPs and eye-tracking seem to differ in their sensitivity to the underlying language computation processes. This might result from the distinct presentation modalities: in eye-tracking experiments, the participants are presented with the entire sentence at once, allowing them to regress to earlier parts of the utterance. In ERP studies, the participants do not have the possibility to check earlier parts since the material is presented as rapid serial visual sequences or auditorily.

The current experiment revealed evidence for different types of meaning adjustment. The simpler metonymy type (producer-for-product metonymy) elicited a monophasic N400 and differs from the more complex type (reference transfer) and metaphor, which both showed an additional Late Positivity. In Experiment 2b, we presented the producer-for-product metonymies with literal prime words that preceded the target. The comparison of Experiment 2a with 2b will then provide additional insights about the role of literal meaning during figurative language processing.

### **EXPERIMENT 2B**

### *Behavioral results*

As in the preceding experiments, the participants performed well in all tasks. The accuracy rate for the color change detection task was 95.03% (*SD* = 0.12) and for the probe detection task 95.6% (*SD* = 0.03). ANOVAs showed no significant differences for the factor FIGURATIVENESS for accuracy rates in either task (all *F*s < 1). For reaction times, statistical analyses registered no reliable difference for the color change detection task (*F*s < 1) and an effect in the subject analysis only for the probe recognition task [*F*<sub>1</sub>(1, 21) = 5.74, *p* < 0.05; *F*<sub>2</sub> < 1]. Participants reacted significantly faster in the figurative condition (mean reaction time = 905 ms) than in the literal control condition (mean = 932 ms).

### *Electrophysiological results*

Although visual inspection of **Figure 5B** suggests no differences between the metonymical and the literal condition if preceded by a prime word, we ran ANOVAs over the same time-window (200–350 ms) as in the unprimed condition to keep the results comparable. The statistical results supported the impression gained by visual inspection. The N400 time-window revealed no significant effects (all *F*s < 1). Additionally, we computed ANOVAs with 50 ms time-windows from 0 to 1000 ms. The time-window from 750 to 800 ms showed a main effect of FIGURATIVENESS in the lateral and the midline analysis only (*F*s > 13.7, *p* < 0.01). Again, we do not consider this difference reliable, because the time-window is short and no adjacent time-windows elicited significant differences (cf. Gunter et al., 2000).
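The reliability criterion applied to the 50 ms windows (an effect counts only if at least two adjacent windows are significant) can be expressed as a short Python sketch. The function names are hypothetical, and the significance flags are assumed to come from per-window ANOVAs computed elsewhere.

```python
def windows(start_ms, end_ms, width_ms=50):
    """Consecutive, non-overlapping (start, end) windows covering the range."""
    return [(s, s + width_ms) for s in range(start_ms, end_ms, width_ms)]


def reliable_windows(significant):
    """Indices of significant windows that have a significant direct neighbor."""
    keep = []
    for i, is_significant in enumerate(significant):
        left = i > 0 and significant[i - 1]
        right = i + 1 < len(significant) and significant[i + 1]
        if is_significant and (left or right):
            keep.append(i)
    return keep


# Pattern reported for the primed metonymies: only 750-800 ms is significant.
wins = windows(0, 1000)
flags = [w == (750, 800) for w in wins]
print(reliable_windows(flags))
```

An isolated significant window yields an empty list under this rule, which corresponds to treating the effect as unreliable.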

### *Post-ERP test*

In the subsequent prime detection task, participants performed around chance level. On average they named 18 primes correctly (59.81%). Since ERPs did not differ in the potential N400 window (see above), we did not compute correlations for prime detection and N400-amplitude differences.

### *Discussion*

In this experiment we presented the stimuli tested in Experiment 2a with a literal prime word to investigate the role of literal meaning in the processing of other types of non-literal language. With priming, producer-for-product metonymies did not elicit an N400 or any other later effects in comparison to the literal controls.

When presented without a literal prime, metonymies evoked an N400, reflecting enhanced costs in the lexical access phase of the critical word (e.g., *Böll*). The observation that metonymies did not elicit an N400-difference when preceded by a literal prime points either toward different underlying costs in early language processes in the comparison of metaphor and metonymy or toward a different degree of sensitivity to priming effects. Unfortunately, metaphor and metonymy have never been directly compared in an ERP study before. The only studies carrying out a direct comparison are behavioral. A reading and reaction time study by Gibbs (1990) tested how easily figurative reinstatements can be used as anaphors for literal referents that were introduced in a short story and reported faster reaction times for metaphors than for metonymies. This indicates higher processing effort for metonymy than for metaphor and contradicts our ERP findings. The pattern observed by Gibbs may however be confounded by an animacy shift for the metonymical (*poor surgeon*/*scalpel*) but not for the metaphorical reinstatement (*poor surgeon*/*butcher*). Additionally, the use of *the scalpel* seems less conventionalized than *the butcher* for referring to a poor surgeon. Since the material was not controlled for familiarity, conventionality or animacy, the longer reading times for metonymies could result from these factors. The current differences may therefore be best explained on the basis of typological differences between metonymy and metaphor. Conversely, the direct comparison in sensicality judgments in Bambini et al. (2013) revealed that the processing costs for interpreting metaphors are higher than for metonymy; in the latter case, response times equaled those of the literal condition, in line with our ERP findings.

<sup>5</sup>Note that the suggestions by Nunberg (1995) also included the strict necessity of supporting context for the processing of reference transfer, but this was refuted by Schumacher (2014).

# **GENERAL DISCUSSION**

This study investigated the processing of figurative language by using ERPs in combination with masked cross-modal priming to examine the role of literal meaning aspects during the processing of metaphors (Experiment 1) and metonymies (Experiment 2). In the conditions without priming, metaphors revealed a biphasic N400-Late Positivity pattern, while metonymies evoked a monophasic N400. This suggests different underlying mechanisms for the processing of figurative language. In combination with masked priming, the data revealed facilitating priming effects of literal prime words when priming a figurative utterance and a different degree of the impact of the prime on metaphors (reduced N400; Experiment 1b) and metonymies (vanished N400; Experiment 2b).

### **METAPHOR PROCESSING**

We tested nominal metaphors for the first time in German and replicated previous findings from studies that investigated other languages and different degrees of conventionality (cf. Coulson and van Petten, 2002; Arzouan et al., 2007; Resta, 2012). Metaphors evoked a biphasic pattern, which we interpret in terms of enhanced costs during lexical access of the critical word, e.g., *hyenas* (N400) and computational demands required for the modification of the current discourse representation (Late Positivity). The difficulties during lexical access can be explained by several factors, especially context integration. The N400-amplitude reflects the category cloze probability of almost 100% for the literal and below 1% for the metaphorical condition (where category refers to literal or figurative completions in the cloze task). The processor expected a literal completion in both conditions and therefore the metaphorical completion is unexpected and hampers the lexical access of the critical word. In ERP research on metaphors, the N400 has been shown to be sensitive to several factors that may interact with each other, e.g., familiarity (Lai et al., 2009) or the preceding context (Pynte et al., 1996). Taken together, these results converge on the view that the N400 reflects processing effort during lexical access. The N400 is thus sensitive to the preceding context, category expectancy, and the degree of conventionality.

As far as the Late Positivity is concerned, across the literature one can find many possible explanations: it could be associated with pragmatically driven implicatures as suggested by Grice (1975) (see Resta, 2012), mapping operations between two unrelated domains as proposed by the cognitive linguistic approach (cf. e.g., Coulson and Matlock, 2001; Croft, 2002; Wolff and Gentner, 2011; but see Lai and Curran, 2013, for the assignment of mapping processes to the N400), meaning construction via blending of cognitive models (cf. Fauconnier and Turner, 2002; Coulson and Oakley, 2005), the activation of secondary cognitive models (cf. Evans, 2010), associative processes as implied by Searle (1979) and Recanati (1995), or the generation of ad hoc concepts via narrowing and broadening (Carston, 2010a). It is noteworthy that the Late Positivity has not been reported in all of the previous experiments investigating metaphor processing. The mixed findings have already been attributed to differences in the design of the studies that did not report a Late Positivity, e.g., the involvement of different word classes or the analysis of smaller time-windows (see above). Yet the differences across experiments may also be due to qualitative differences in the metaphorical materials. The crucial distinction we want to point out is between verbal metaphors (cf. Lai et al., 2009; Lai and Curran, 2013) that elicited a monophasic N400 and nominal metaphors as in the current study that evoked a biphasic N400-Late Positivity pattern. This distinction would fit with the proposal that the Late Positivity reflects operations on discourse representation structure where costs accrue whenever a discourse referent is added to the discourse or must be modified (cf. Burkhardt, 2007; Schumacher, 2013). Hence, in the case of *These lobbyists are hyenas*, a discourse representation for *hyenas* is established but the metaphoric interpretation requires the extraction of certain properties. 
The shift from an entity-denoting discourse referent to a property results in modifications in the discourse representation and referent deletion<sup>6</sup>.

Together with previous findings on metaphors, this study revealed that the processing of metaphorical utterances is more demanding than the processing of literal sentences. Thus, the results support theoretical accounts that argue for different processing costs (metaphor > literal utterance). This includes for example the indirect access account (cf. Grice, 1975; Searle, 1979), which postulates an additional step (more costs) in the processing of metaphors, and the idea of Relevance Theory by Carston (2010b), who argues for enhanced costs for metaphors due to the construction of a relevant ad hoc concept via narrowing and broadening. Although our discussion has started from the extreme poles of the direct and indirect access accounts as they emerge in the pragmatic literature, our findings are compatible with other proposals as well, for example the computational model by Kintsch (2000), who argued that metaphor processing takes place in three steps: first the semantic neighborhood of the vehicle is activated, then a network is created via spreading activation (cf. also Quillian, 1962, 1967), involving the target, the vehicle and the environment of the vehicle, the size of which depends on the degree of the relation between those two. In the last step, the meaning of the metaphor is created by computing the connection between vehicle and target with the highest activation. Although Kintsch argued for no differences between the underlying processes in metaphors and literal utterances, our findings can still be accounted for with his model. First, the vehicle word is activated, which can be related to differences in the N400 time-window due to accessibility (contextual effects). In terms of spreading, a related prime word that immediately precedes the vehicle should ease activation (reduced N400). The Late Positivity can then be attributed to costs emerging from the computation of the connection between target and vehicle. If this is right, the Late Positivity might be sensitive to the semantic distance between vehicle and target word.

<sup>6</sup>Evidence for referent deletion comes from pronoun tests that indicate that "hyena" is no longer accessible: My boss is a hyena. #It is aggressive.
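The three steps summarized above can be made concrete with a toy Python sketch. It is not Kintsch's implementation: cosine similarity over small hand-made feature vectors stands in for the semantic space, the product of two similarities stands in for spreading activation, and all words, dimensions, and values are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def strongest_link(vectors, target, vehicle, m=3):
    """Toy three-step procedure: neighborhood, activation, best connection."""
    # Step 1: activate the m nearest neighbors of the vehicle.
    others = [w for w in vectors if w not in (target, vehicle)]
    neighborhood = sorted(
        others, key=lambda w: cosine(vectors[w], vectors[vehicle]), reverse=True
    )[:m]
    # Step 2: a neighbor is active to the extent it relates to vehicle AND target.
    activation = {
        w: cosine(vectors[w], vectors[vehicle]) * cosine(vectors[w], vectors[target])
        for w in neighborhood
    }
    # Step 3: the most active neighbor links vehicle and target.
    return max(activation, key=activation.get), activation


# Hypothetical feature dimensions: (animal, aggressive, political).
VECTORS = {
    "hyenas": [1.0, 0.8, 0.0],
    "lobbyists": [0.0, 0.6, 1.0],
    "scavenger": [0.9, 0.5, 0.0],
    "aggressive": [0.1, 1.0, 0.1],
    "fur": [1.0, 0.0, 0.0],
    "parliament": [0.0, 0.0, 1.0],
}

best, activation = strongest_link(VECTORS, target="lobbyists", vehicle="hyenas")
print(best)
```

In this toy space, a property shared by vehicle and target wins over properties that are close to the vehicle alone, which mirrors the idea that the connection with the highest activation determines the metaphorical meaning.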

Accounts that argue against additional effort in the processing of metaphors cannot be supported (cf. Frisson and Pickering, 2001; Sperber and Wilson, 2008). The direct access view rejects the requirement of an additional step (cf. Gibbs, 1994; Glucksberg, 2008) based on previous findings that showed that metaphorical and literal sentences are read equally fast (cf. Blasko and Connine, 1993) and with no speed differences (cf. McElree and Nordlie, 1999). Although ERPs do not reveal the number of steps involved in general, they clearly indicate enhanced cognitive effort for metaphors. This assumption is also supported by lower accuracy rates found for metaphors in comparison to literal strings using the SAT paradigm (McElree and Nordlie, 1999) and sensicality judgments (Bambini et al., 2013).

### **METONYMY PROCESSING**

The investigation of producer-for-product metonymy in ERP revealed the existence of at least two different types of metonymic processes. Producer-for-product metonymy only requires simple selection processes, which is reflected in a monophasic N400; it is more demanding than the processing of literal utterances since the metonymic completion is not the expected type of category, while the representation of the discourse referent is unaffected (no Late Positivity). In contrast, reference transfer (cf. Schumacher, 2014) revealed a biphasic pattern that reflected expectancy-based difficulties during lexical access of e.g., *ham sandwich* and thereafter modifications in the discourse representation structure via inferentially or pragmatically driven processes (see above for a more elaborate discussion). These findings support approaches that differentiate between various metonymy types, for instance metonymy requiring transfer operations on the referent and metonymy that is subjected to more general lexical operations of meaning adjustment (cf. Copestake and Briscoe, 1995; Nunberg, 1995). The present results are challenging for theoretical approaches that assume the same cognitive operation for both types of metonymy.

### **METAPHOR AND METONYMY IN COMPARISON**

Additionally, the indirect comparison of metaphors (N400-Late Positivity) and producer-for-product metonymies (N400) without priming (Experiment 1a and 2a) demonstrated differences as well. This leads to some initial conclusions about the (dis)similarity of metaphor and metonymy processing. Both metaphor and metonymy registered a more pronounced N400 amplitude in comparison to their literal control conditions. We interpret the N400 to reflect enhanced costs during lexical access to the respective critical word due to the low category expectancy value for both figurative conditions (below 3%). Crucially, if presented within the priming condition, metonymy no longer elicited an N400 but metaphor still did. This could result from either different degrees of sensitivity to priming or, more likely, from a different amount of costs required in the lexical access phase. Since we suggest that the N400 is not limited to reflecting distinct degrees of category expectancy, the difference between figurative types may be due to different meaning adjustment operations. Lexical access in metonymy might only require the selection of the appropriate meaning, while in metaphor lexical access of the vehicle includes the generation of a new meaning. Similar to the differentiation between reference transfer and producer-for-product metonymy, metaphors, in contrast to the metonymies tested here, require operations on the discourse referent and hence showed a more pronounced Late Positivity.

Therefore the findings support accounts that argue for differences between metaphors (mapping between unrelated domains) and metonymies (conceptual shift within a domain (matrix)) (cf. e.g., Lakoff and Turner, 1989; Croft, 1993, 2002). Accounts that suggest the same cognitive costs for the interpretation of metaphors and metonymy are in contrast challenged by the results (cf. Sperber and Wilson, 1985, 2008; Frisson and Pickering, 2001).

Metaphor and metonymy are only two of many types of figurative language use. An important task for future research will be the development of a typology of figurative language that goes beyond these two types. Initial evidence for this comes from Schumacher (2013) who investigated different types of metonymy (but see also Ferretti et al., 2007 on proverbs, Regel et al., 2011 on irony, Vespignani et al., 2010 on idioms, among others). Note also that this study concentrated on temporal aspects of metaphor and metonymy. Obviously, research on the neuroanatomy of figurative processing is essential to complement our understanding of the language architecture but such an endeavor lies beyond the scope of the current research (see e.g., Bohrn et al., 2012; Rapp et al., 2012 for corresponding meta-analyses on figurative language).

### **ROLE OF LITERAL MEANING ASPECTS**

We presented metaphors and metonymies without and with priming to investigate the role of literal word meaning in figurative language processing. Based on previous studies (cf. Holcomb and Grainger, 2006; Kiyonaga et al., 2007; Schumacher et al., 2012), we expected that a semantically related prime that precedes the target word eases the processing of this target during the lexical access phase. This is reflected in a reduced N400 amplitude. In contrast, unrelated prime words should hamper lexical access and therefore result in a more pronounced negative-going wave. We used this knowledge to investigate two theoretical positions. The indirect access view argued for the literal meaning of a word to be always accessed first, even during figurative language processing. Therefore a prime word that is a property of the literal but not of the figurative meaning counts as a related prime. Literal priming should therefore elicit the same effects as semantic priming, i.e., a reduced or unchanged N400-amplitude difference. On the other hand, the direct access view suggested that the literal meaning of the target word is not accessed in figurative processing. Therefore the literal prime can be equated with unrelated priming and should have no facilitating or even a hampering effect (more pronounced N400). The calculated difference wave plot (see **Figure 4**) compared the amplitude difference between the metaphoric and literal conditions without and with priming. It revealed that the literal prime word has no hampering effect (no enhanced amplitude) on the processing of figurative utterances. In contrast, the N400-amplitude was reduced. This observation is also supported by the processing patterns in the metonymy study, where masked priming resulted in the absence of a difference between the two conditions, indicating that the literal prime does not impede processing.
The findings therefore point against accounts that argue for literal meaning to play no role or even to have a negative influence on the processing of figurative utterances (e.g., Glucksberg, 2008; Sperber and Wilson, 2008). The data in turn support theories that integrate the literal meaning in the processing of metaphors and metonymies. This involves relatively strict accounts that propose a literal first step in their model (Grice, 1975; Searle, 1979), as well as accounts that argue for the lingering of literal meaning (Carston, 2010b). Our findings are also in line with theoretical approaches that argue for the literal meaning to have a role in blending, e.g., mapping processes or semantic/computational approaches (cf. Kintsch, 2000; Croft, 2002; Fauconnier and Turner, 2002; Coulson and Oakley, 2005; Wolff and Gentner, 2011).

The cross-modal priming technique adopted here for the first time therefore gave important insights into the role of literal meaning in figurative language processing. Namely it shows that during the lexical access phase (N400), independent of figurativity, the literal meaning is activated and therefore primes related to the literal meaning of the critical word facilitate lexical access (reduced N400). It is uncontroversial that the contextually relevant meaning is determined within a short period of time. Still, and crucially, the masked priming data revealed that literal meaning aspects are initially available regardless of whether they are contextually relevant or not. Interestingly, converging evidence comes from the literature on idioms: the literal meaning of the constituent words can be available until the end of idiom strings, and even after the idiomatic meaning has already been recognized (Cacciari, 2014). Theories about figurative language should thus include a phase in which the literal meaning of the critical word is accessed.

Besides the reduced N400-amplitude, literal priming in metaphors causes a delayed Late Positivity. Because priming in metonymies did not elicit a Late Positivity, we can exclude the possibility that literal priming *per se* results in a delayed Positivity. Based on the findings of Schumacher et al. (2012), who used the same paradigm, we argue that in sentential context primes influence processes besides lexical access that are already demanding when computed without priming. This would explain why we found a delayed Late Positivity in 1b (for metaphors) but not in 2b (for metonymies). The findings by Schumacher et al. (2012) then could be reinterpreted in terms of a delayed P325, which reflects the entire repetition of the auditory prime word by the visual target during lexical form processing (cf. Holcomb and Grainger, 2006). Additional studies, for instance with figurative prime words, are needed to shed further light on these findings.

In sum, our data indicate that literal meaning aspects are accessed during the processing of metaphor and metonymy. We further suggest that the electrophysiological differences observed between the ERP patterns in metaphor and metonymic processing call for a more refined typology of figurative processes. To this end, we discussed different types of metonymy (such as producer-for-product metonymy vs. reference transfer) and the possibility of different types of metaphor (nominal vs. verbal).

# **ACKNOWLEDGMENTS**

This research was supported by a collaborative research initiative of the European Science Foundation's EURO-XPRAG Research Networking Program to the second and third author. We would like to thank Florian Bogner, Anika Jödicke, R. Muralikrishnan and Markus Philipp for technical support.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00583/abstract

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 April 2014; accepted: 14 July 2014; published online: 04 August 2014. Citation: Weiland H, Bambini V and Schumacher PB (2014) The role of literal meaning in figurative language comprehension: evidence from masked priming ERP. Front. Hum. Neurosci. 8:583. doi: 10.3389/fnhum.2014.00583*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Weiland, Bambini and Schumacher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Mapping the brain's metaphor circuitry: metaphorical thought in everyday reason

# *George Lakoff\**

*Department of Linguistics, University of California Berkeley, Berkeley, CA, USA*

### *Edited by:*

*Seana Coulson, University of California at San Diego, USA*

### *Reviewed by:*

*Bernard J. Baars, The Neurosciences Institute, USA Seana Coulson, University of California at San Diego, USA*

### *\*Correspondence:*

*George Lakoff, Department of Linguistics, University of California Berkeley, 1303 Dwinelle Hall, #2650, Berkeley, CA 94720-2650, USA e-mail: lakoff@berkeley.edu*

An overview of the basics of metaphorical thought and language from the perspective of Neurocognition, the integrated interdisciplinary study of how conceptual thought and language work in the brain. The paper outlines a theory of metaphor circuitry and discusses how everyday reason makes use of embodied metaphor circuitry.

**Keywords: conceptual metaphor theory, neural circuitry, embodied cognition**

### **HOW WE GOT HERE: CONCEPTUAL METAPHOR IN EVERYDAY REASON**

The discovery of conceptual metaphor independently by Michael Reddy and myself in the late 1970's showed that metaphor is primarily conceptual, and secondarily linguistic, gestural, and visual (Reddy, 1979; Lakoff and Johnson, 1980/2002). There are metaphorical ideas everywhere and they affect how we act. Metaphorical thought and the metaphorical understanding of situations arise independent of language. This discovery led almost immediately to the hypothesis that everyday reason that is understood as "abstract" (not just about "concrete" physical objects and actions) makes use of embodied metaphorical thought (Lakoff and Johnson, 1999).

Reddy had found that the abstract concepts of communication and ideas are understood via a conceptual metaphor:


This notation from Lakoff and Johnson characterizes a conceptual mapping from a "source domain" frame for sending objects in containers to a "target domain" frame for communicating ideas via language.

Reddy found over 100 classes of expressions for this metaphor. Examples include: *You finally got through to him. The meaning is right there in the words. Put your thoughts into clear language. Your words are hollow.* And many more. His point was that the generalization covering the linguistic metaphors was not in language, but in the metaphorical concept of communication as sending idea-objects in language-containers.

Reddy furthermore pointed out that the metaphor created an important inference about communication: the speaker is primarily responsible for its success. If you put an object in a container and send it, the receiver will find the *same* object inside. Reddy observes that in real communication, the hearer has as much responsibility as the speaker, and that what the hearer hears is very often not what the speaker intends. However, the metaphor is often taken literally, as if it were true.

### **METAPHOR SYSTEMS AND DOMAINS OF THOUGHT**

A crucial idea in the study of metaphor is the *conceptual metaphor system* for characterizing a *domain of thought*. This idea was first worked out by Eve Sweetser and Alan Schwartz (see Lakoff and Johnson, 1999, chapter 12). They observed that there is a domain of Mind (a metaphorical target) that is understood via a very general metaphor that is in turn split into four subcases, each associated with a separate source domain. The general metaphor is the following conceptual mapping:


The four special case conceptual metaphors are:


There are many linguistic examples of each.


What seems to define these domains are embodied brain regions or structures significantly involved in performing these functions. The questions raised by this analysis are, What is a domain in the brain? What defines a metaphor system and a domain neurally?

Domains seem to be characterized by hierarchically structured frames. A frame is a complex schema, a mental structure that organizes knowledge. Each frame makes use of primitive concepts and may make use of conceptual metaphors. The elements of a frame are called Semantic Roles.

For example, the semantic roles of the Seeing Frame are: The Viewpoint, The Viewer, Eyes, Light, The Directing of the Eyes, The Act of Seeing, Things Seen, The Gaze (the line from the eyes to the thing seen); Degree of Clarity. There is also knowledge about seeing: You need enough light to see; light has a source; the gaze must extend from the eyes to the thing seen in order to see; things look different from different viewpoints; and so on.

A crucial thing we learn from this is that important abstract concepts are not merely understood via one conceptual metaphor, but via multiple conceptual metaphors that provide different understandings of the concepts. For example, Communication is not just Sending, but it is also Leading (when Thinking is Moving), Showing (when Understanding is Seeing Clearly), and Feeding (when Thinking is Eating). Ideas, metaphorically, can be not only Manipulable Objects, but Locations and Food as well.

Lakoff and Johnson (1999) have shown that important concepts like Event, Action, Causation, the Mind, the Self, Morality, and Being are each defined via multiple conceptual metaphors, sometimes between a dozen and two dozen.

I made a discovery similar to Reddy's at about the same time. I had found that the abstract concept of Love is commonly understood in terms of a Journey. There are lots of linguistic expressions of this sort: *Our relationship hit a dead-end street. The marriage is on the rocks. We're getting nowhere in this relationship. We're going in different directions. We're at a crossroads in our relationship. We're spinning our wheels in this relationship.* And many more. The generalization over these cases is not in the linguistic expressions but in a conceptual mapping (indicated by "==>").


# **EMBODIED PRIMARY METAPHORS**

Mark Johnson and I later discovered that this complex metaphor was made up of more basic components. There are primitive metaphors that are acquired in ordinary daily life when two basic embodied experiences regularly occur together. For example, purposes are understood as destinations. In everyday life, achieving purposes often requires getting to a destination. If you want a cold beer, you'll have to go to the refrigerator. In American culture, people are expected to have goals in life, and a couple in a long-term love relationship is expected to have compatible life goals. Metaphorically that means having common destinations. A relationship is a metaphorical vehicle for three reasons: First, a vehicle is a means of getting to a destination. Second, a vehicle is a *container*. In general, relationships are understood in terms of containers: you are *in* a relationship; you can *enter* or *leave* a relationship. Third, intimacy is understood metaphorically in terms of closeness: *We're very close; we're drifting apart.* Thus, a relationship is conceptualized as a container in which you are close and which is a means for reaching destinations.

Johnson and I reasoned as follows: Why is intimacy metaphorized as closeness? Because intimacy requires being physically close. Why is a relationship a container? Because when you are growing up, you tend to live in the same enclosed space as your relatives. Purposes are conceptualized as destinations because, over and over again, to achieve a purpose you have to go to a specific location. The general principle is that regular correlations in real-world embodied experience lead to primitive conceptual metaphors (*embodied primary metaphors using embodied primitive concepts*) that can combine to form complex conceptual metaphors, like the Love Is a Journey metaphor.

These considerations led directly to the theory of embodied cognition. The most popular theory of meaning at the time was that concepts were all literal, that there were no metaphorical concepts, and that concepts got their meaning via truth conditions—directly from conditions holding objectively in the real world, independent of the intervention of human minds and brains. The existence of conceptual metaphors did not fit that theory. The idea that there are primitive conceptual metaphors that arise from regular correlations in embodied experience did not fit that theory. If we were right, then a new theory of meaning for concepts was necessary.

The most obvious candidate was a theory of embodied cognition. Physical concepts, like running and jumping, chairs and people, could be understood through the sensorimotor system: they can be performed, seen, felt. If abstract concepts get their meaning via conceptual metaphor, and if complex conceptual metaphors are made up of primitive conceptual metaphors that get their meaning via embodied experience, then the meaning of concepts comes through embodied cognition.

If that was so, Johnson and I realized that there should be significant real-world consequences. Take the metaphor of Labor as a Resource, where companies seek *cheap labor*, with workers seen as interchangeable commodities to be purchased for minimum cost in a *labor market* and working people are hired through the "Human Resource Department." Thus, corporations, to maximize profits, should seek to minimize the "cost" of labor by cutting pay and benefits, outsourcing, and laying off workers whenever possible. Johnson and I saw enormous social and political consequences arising from abstract thought being characterized metaphorically.

# **METAPHOR AND THE MEANING OF IDIOMS**

The earliest examples we looked at took us to the study of idioms. The traditional theory held that idioms had arbitrary meanings. We discovered that the meanings of a huge range of idioms were anything but arbitrary. They made use of conceptual metaphors! But not in any obvious way.

The first one I looked at was: *We're spinning our wheels in this relationship*. It has a conventional image, with knowledge about the image: The wheels are on a car. The car is stuck with the wheels spinning (either in sand, or on ice, etc.). The car isn't moving. We're putting a lot of effort into getting it moving, but it won't move. We are frustrated.

The Love Is a Journey mapping applies to the conceptual knowledge about the image. The car (a vehicle) is the relationship, the travelers are lovers and they are not making progress toward common destinations (compatible life goals). They feel frustrated.

That is what it means to be spinning your wheels in a relationship. **The conceptual metaphor applies to knowledge about the image, yielding the meaning of the idiom!**

But although the Love Is a Journey metaphor applies systematically in understanding this idiom, the literal meanings of the words in the idiom ("spinning" and "wheels") are not mapped by this metaphor. Those words activate a conventional mental image with associated knowledge commonplace in one's culture. There is a system of metaphors fixed in the mind that applies naturally, automatically, very quickly, and unconsciously to such knowledge, linking the knowledge of the image to the meaning of the idiom.

There are a huge number of idioms like this. Consider *The marriage is on the rocks.* The marriage (the relationship) is a boat (a vehicle). A boat on the rocks is not moving forward. The couple in the boat is not progressing toward their common destination (compatible life goals). The boat is likely to be harmed in some way. Even if it gets free of the rocks, it may not be able to continue on the journey. That is, even if the marriage survives, the couple may still split up. And when the boat hits the rocks, the passengers may be hurt physically. Given the metaphor that psychological harm is physical harm, the couple may be psychologically harmed by the incident.

If you have that image for the idiom and that knowledge about the image, then that is what the idiom means metaphorically. That same Love is a Journey metaphor, applying to a different image and knowledge, yields a different meaning.

These constitute a special class of idioms: they are both imageable and metaphorical. New ones are being created all the time (Lakoff, 1987, case study 2).

A Note: Metaphorical mappings occur at a certain level of generalization. In the Love Is a Journey metaphor, the relationship is a generalized vehicle. There are special cases of vehicles: cars, boats, planes (*We may have to bail out*), rockets (*We've just taken off*), trains (*We're off the track*). It's important to recognize the general level of the conceptual metaphor. Encountering *The marriage is on the rocks*, you should not conclude that the conceptual metaphor is Love Is a Boat.

A caution: Not every speaker has the same image and knowledge. For example, some speakers understand "on the rocks" in terms of a scotch on the rocks image and the idiom will seem to them to have an arbitrary meaning. For them, the Love Is a Journey metaphor does not apply, and the idiom is not metaphorical. It works for them as if it were a single lexical item with an arbitrary meaning, that is, one that does not follow from the language. For example, it may mean, "will probably get a divorce."

On the other hand, the arbitrary meaning may use a different conceptual metaphor, as in *The couple will probably split up*, which uses the conceptual metaphor that a relationship is a single entity made up of two parts. "Splitting up" means the relationship comes apart and there is no longer a single entity.

When a neuroscientist is using an idiom in metaphor research where there is averaging over a number of subjects, it is important to make sure that all the subjects use the same metaphor in understanding the idiom. That is not easy to do. Moreover, the metaphor may apply systematically not to the words "spin" and "wheels," or to the words "on" and "rocks," but rather to the concepts in the way the image is understood—if it is understood at all!

Some idioms are completely arbitrary, that is, you cannot figure out the meaning from the words. Take "by and large." It was originally a nautical term from the days of sailing ships. To sail "by" meant close by the wind, whereas to sail "large," meant with the wind fully behind you filling the sails (making them large). If a ship sailed well both "by and large," then it sailed well under most conditions. Via the commonplace metaphor that Action Is Motion, with sailing as a special case of motion, sailing by and large came metaphorically to mean action by and large, that is, under most conditions. With the complete loss of "by and large" in its nautical meaning, the meaning of "by and large" kept the meaning of "mostly" but the systematic metaphorical relationship to the words was lost.

Some neuroscientists choose to study idioms with body part names like *hand*, or words for what body parts do, like *kick* or *bite*. The point is to see if the relevant body part word activates the brain region in the topographic map of the body in the motor cortex. But such idioms vary in their degree of arbitrariness and directness. There is a commonplace conceptual metaphor, Control Is Control by the Hands. It occurs in the understanding of idioms like *It's in your hands now, He's got the whole world in his hands*, *They handed over the company to the Mafia*. In these cases there is a relatively direct metaphorical connection between hands and control. But that particular metaphor is not present in the understanding of *He's an old hand at phonological analysis; Tax cuts are handouts to the wealthy; Don't bite the hand that feeds you.*

The idiom *kick the bucket* has been used in some neuroscience experiments to see if there is activation in the foot region of the motor cortex. What would one expect? Not much. First, there is a lot of variation across speakers. For many speakers it is an arbitrary idiom, with the meaning of *kick* playing no role at all in the meaning. For some there is a weak mental image. Here is mine:

The bucket is upright. There is some but not much liquid in it. It is weakly kicked over and what liquid there is spills out, and it is empty and on its side after the kick. There is a common conceptual metaphor that seems to be applying here: Life is a Fluid in the Body, as in sentences like *The life drained out of him; He's full of life; He's brimming with life.* The spilling out of the fluid from the bucket means death. But since there was not much fluid in it in the first place, it suggests a particular kind of death: death when there is not much life left, as with an old person expected to die soon. You won't say *She kicked the bucket* of a child run over by a car or a young woman who died in childbirth.

Incidentally an image like mine appears in a prominent place in two popular movies. In *It's a Mad, Mad, Mad, Mad World*, Jimmy Durante plays an old man who dies of a heart attack on a mountain. As rigor mortis sets in, his leg goes out and kicks over a bucket that tumbles down the mountain. In *Young Frankenstein*, the man soon to become the monster dies and, in rigor mortis, kicks over a slop bucket at the edge of the bed. The kicking of the bucket is a comic way of indicating death, a visual pun in two slapstick movies.

But for many speakers, *kick the bucket* is an arbitrary idiom, with no mental image of kicking. Even in the best of cases, one shouldn't expect much by way of foot activation in the motor cortex. The kicking is only indirectly connected to the death, and then only via a conceptual metaphor that has nothing directly to do with kicking. In addition, the bucket may be a container, like the body, but that's a weak connection. And for most speakers, there is no connection at all.

The morals for neuroscientists: Be aware of what kind of idioms you are using in your experiments and what their cognitive analysis is. Always list the idioms you are using in any write-up of your experiment. And test your subjects for the images they may or may not associate with the idioms.

### **EMOTION METAPHORS**

In the early 1980's, Zoltán Kövecses and I discovered that systems of emotion metaphors arise from the physiology of emotions (Lakoff, 1987; Kövecses, 2000, 2002). For example, Paul Ekman and his colleagues found that when one is angry, skin temperature rises, blood pressure increases, and there is interference with accurate visual perception and fine motor control (Ekman et al., 1983). That is why we get such linguistic metaphorical expressions as *boiling mad, He exploded, blind with rage, hopping mad*, and many more (Lakoff, 1987, Case Study 1; Wilkowski et al., 2009).

Damasio (1996) has observed that such bodily experiences have correlates in the brain's somatosensory system which are registered and can be seen via neuroimaging in the ventromedial prefrontal cortex as "somatic markers" that play an important role in decision making. This raises the possibility that emotions *are constituted by* the bodily effects that are registered in the brain during emotional experience. Thus, it would be natural for emotions to be metaphorically conceptualized as those bodily effects, as Kövecses and I observed. This accords with the theoretical model of Lindeman and Abramson (2008) of the causal mechanisms of depression. They hypothesize that "(a) the inability to alter events is conceptualized metaphorically as motor incapacity; (b) as part of this conceptualization, the experience of motor incapacity is mentally simulated; and (c) this simulation leads to both feelings of lethargy and peripheral physiological changes consistent with motor incapacity."

These ideas, together with our emotion metaphor research, raise the possibility that one can get insight into emotional states via neuroscience and the study of linguistic metaphors for physical states.

### **METAPHOR SCIENCE IN LANGUAGE**

A whole field of metaphor science developed after 1980, including research on the role of conceptual metaphor in grammar. The first major paper on construction grammar came out in 1987, a 100+ page study of There-constructions that demonstrated the importance of conceptual metaphor in grammar (Lakoff, 1987, Case study 3). Since then, Adele Goldberg and Ellen Dodge, in book-length studies, have demonstrated how conceptual metaphors work in grammatical constructions (Goldberg, 1995; Dodge, 2010). Following those insights, Karen Sullivan has since provided the first general theory of how conceptual metaphor structures grammatical constructions (Sullivan, 2007, 2013).

Why does research on metaphor in grammar matter for an understanding of abstract thought? Because that research appears to show that there is a bifurcation in the way conceptual metaphor works in abstract thought.


### **MAPPING METAPHOR CIRCUITRY**

In 1988, Jerome Feldman and I set up the Neural Theory of Language group at the International Computer Science Institute at UC Berkeley. Its goal was to apply neural computation to results in cognitive linguistics and embodied cognition (Feldman, 2006).

In 1997, Srini Narayanan worked out a neural computational theory of metaphor in his dissertation (Narayanan, 1997) and has expanded on that work greatly since then (Feldman and Narayanan, 2004; Loenneker-Rodman and Narayanan, 2012). He and I have been working on a theory of the neural circuitry required for thought and language. The following is a discussion of the current status of our research as it applies to metaphor.


# **OUR CURRENT NEURAL THEORY OF METAPHOR**

Our current theory begins with a basic observation: The division between concrete and abstract thought is based on what can be observed from the outside. Physical entities, properties, and activities are "concrete." What is not visible is called "abstract": emotions, purposes, ideas, and understandings of other nonvisible things (freedom, time, social organization, systems of thought, and so on). From the perspective of the brain, each of those abstractions is physical, because all thought and understanding is physical, carried out by neural circuitry. That puts "concrete" and "abstract" ideas on the same basis in the brain. Where conceptual metaphor theorists saw conceptual metaphor as conceptualizing the abstract in terms of the concrete, neural metaphor theory linked neural circuitry to other neural circuitry, allowing for a uniform theory, as follows:


but still understood via embodied primitive concepts and primary metaphors.

# **PRIMITIVE CONCEPTS AND PRIMARY METAPHORS**

# *Primitive concepts*

The research of Talmy (2000), Langacker (1987), Fillmore (1968), and Narayanan (1997) has indicated that there are embodied primitive concepts that arise in all natural languages.

Primitive concepts are all embodied via brain circuitry linked to the body via the sensorimotor system (Regier, 1997). Motion, for example, is characterized both via **topographic maps** of the visual field in which activation moves across the visual map, coordinated with **executing circuitry** for moving the body from an initial location, through a course of motion, to a final location.

The embodiment circuitry for different primitive concepts makes use of different parts of the brain, which are anatomically organized by links to the body. For example, topographic maps for motion are in region MT (or V5) in the occipito-temporal lobe, while sequentially operating circuitry for executing bodily movement occurs in the premotor and supplementary motor cortices.

We know from research on mirror neuron systems that there are premotor-parietal pathways for linking action with vision (and imagined vision) (Gallese et al., 1996; Rizzolatti et al., 1996; Gallese and Lakoff, 2005).

We (that is, researchers in embodied neurocognition) hypothesize that primitive concepts have a **schema structure** that mediates between embodiment circuitry and complex concepts that are expressed by linguistic structures in natural language.

Elementary schemas have a Part-whole structure, with the entire schema as the Whole and the Semantic Roles as the Parts. Examples of Primitive Schemas with their Semantic Roles include:


From a neural perspective, the elements of a schema are neural ensembles (called "nodes"), linked together to form a "neural gestalt." A neural gestalt is defined by very simple activation strengths and threshold conditions: each semantic role node, when activated, activates the whole schema node, which in turn activates all of its role nodes.
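The gestalt rule just described can be illustrated with a toy threshold model. This is a minimal sketch, not the authors' implementation: the function name `settle`, the uniform `weight` and `threshold` parameters, and the particular Motion roles listed are all assumptions chosen for illustration.

```python
def settle(whole, roles, initially_active, weight=1.0, threshold=1.0):
    """Propagate activation in a toy gestalt until nothing changes.

    Assumption: every role-to-whole and whole-to-role link has the same
    weight, and a node fires once its summed input reaches the threshold.
    """
    active = set(initially_active)
    changed = True
    while changed:
        changed = False
        # Input to the whole-schema node: one weighted link per active role.
        whole_input = weight * sum(1 for role in roles if role in active)
        if whole not in active and whole_input >= threshold:
            active.add(whole)
            changed = True
        # An active whole node sends one weighted link to every role node.
        if whole in active and weight >= threshold:
            for role in roles:
                if role not in active:
                    active.add(role)
                    changed = True
    return active


# Hypothetical role list for a Motion schema (illustrative only).
MOTION_ROLES = ("Mover", "Source", "Goal")

print(sorted(settle("Motion", MOTION_ROLES, {"Goal"})))
```

With the default values, activating any single role completes the whole schema. With weaker links (for example `weight=0.5`), a single active role is not enough to reach threshold, which is one way to picture why the activation strengths and thresholds matter.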

**Complex concepts** are formed by **neural binding circuits**, which bind together schemas in different parts of the brain. A simple example is the concept INTO, which brings together schemas for Motion and Containment: the Source of the Motion is bound to the Exterior of a bounded region and the Goal of the Motion is bound to the Interior of the bounded region.
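As a rough illustration of the INTO example, the two bindings can be written down as pairs of role nodes that are forced to fire together. This is only a sketch under stated assumptions: the tuple encoding `(schema, role)` and the helper `fires_with` are hypothetical, and real binding circuitry is of course not a lookup over pairs.

```python
# The two binding pairs follow the INTO example in the text;
# the (schema, role) tuple encoding is an assumption.
INTO_BINDINGS = [
    (("Motion", "Source"), ("Containment", "Exterior")),
    (("Motion", "Goal"), ("Containment", "Interior")),
]


def fires_with(node, bindings):
    """Return every role node that fires together with `node`.

    Bound nodes are treated as identical, so activation spreads across
    binding links until no new node is reached.
    """
    group = {node}
    changed = True
    while changed:
        changed = False
        for a, b in bindings:
            if (a in group) != (b in group):  # exactly one end is active
                group |= {a, b}
                changed = True
    return group


print(fires_with(("Motion", "Goal"), INTO_BINDINGS))
```

Activating the Goal of the Motion schema brings along the Interior of the Containment schema, while unbound roles are left alone, so each schema can still be used on its own.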

A **Binding Circuit** links two semantic role nodes in different schemas in different locations. It has to meet certain conditions. The schemas have to be able to function independently or as a single complex schema. The bound nodes have to be taken as "identical"; that is, they have to be indistinguishable in their firing patterns. Both conditions are accomplished as follows.


Binding circuits are the primary mechanism of neural composition, forming complex concepts by binding nodes across diverse brain regions.

# *Primary metaphors*

Primary metaphors (Grady, 1997) are circuits that map primitive neural schemas onto other primitive neural schemas. This occurs when those pairs of neural schemas are regularly activated together because of real-world experience.

Here is a commonplace example. It is a very common occurrence in everyday life that one has to go to a specific location in order to achieve a given purpose. If you want a cold beer, you have to go to the refrigerator where the beer is kept. If you want to brush your teeth, you have to go to the bathroom where the toothbrush and toothpaste are kept. And so on, case after case, day after day. Even infants, to feel secure, have to crawl over to where their favorite toy animal or their blanket is lying. These experiences give rise to the primary metaphor Purposes Are Destinations, which is widespread around the world. It maps the Motion Schema onto the Purposeful Action Schema as follows:


Each of these is a submapping; the whole collection of mappings jointly constitutes the metaphor mapping.

This mapping reflects a real-world fact. In the repeated experiences of going to a location to achieve a purpose, the elements of the motion schema correspond to the elements of the purposeful action schema. That is, the Actor is the Mover, the Action is the Motion, and so on.

Each of these correspondences in experience has a reflex in the brain: the corresponding nodes occur in different brain regions, but they fire together. Here is our hypothesis:

• The nodes that regularly fire together strengthen (via Hebbian learning) with regular firing.


What determines the direction of first spiking? The answer is simple: the direction from which most activation comes regularly. That will be the metaphorical Source.

When we look at examples, this explanation appears to hold. Here are some examples:


So far, this works for the many cases checked. The result is that the Sources and Targets of primary metaphors can be predicted by the STDP theory of neural learning, which is a truly remarkable result.

There are hundreds, if not thousands, of primary metaphors structuring our conceptual system. They are learned via neural learning mechanisms early in life, usually before language, just by functioning in the everyday world.

Each primary metaphor neurally maps one primitive schema onto another, creating an asymmetric circuit linking them. But each primitive schema can also occur independently of any metaphor circuitry. That means that the metaphor circuitry must be gated: normally the gates modulating the connecting synapses would not be firing above base rate; the metaphor circuit is turned on when the gates are turned on, emitting sufficient neurotransmitters to allow activation to flow.

Each submapping has a gate. In the whole mapping, the gates work together. How? The theory requires the submapping gates and the gate for the whole mapping to form a gestalt circuit. Activating any submapping activates the whole mapping, and activating the whole mapping activates each submapping. As before, gestalt circuits have easy-to-learn combinations of activation and threshold strengths.
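The gating idea can be sketched in a few lines. This is an illustrative toy, not the circuit itself: the submapping table is reconstructed from the surrounding prose (Mover to Actor, Motion to Action, Destination to Purpose) and is therefore an assumption, as is the helper `map_through_gate`. A single boolean stands in for the gestalt of submapping gates, which open or close together.

```python
# Submappings reconstructed from the surrounding prose (an assumption).
PURPOSES_ARE_DESTINATIONS = {
    "Mover": "Actor",
    "Motion": "Action",
    "Destination": "Purpose",
}


def map_through_gate(source_activation, mapping, gate_open):
    """Pass source-role activation to target roles only when the gate is open."""
    if not gate_open:
        return {}  # the source schema is being used on its own
    return {
        mapping[role]: level
        for role, level in source_activation.items()
        if role in mapping
    }


literal = map_through_gate({"Destination": 1.0}, PURPOSES_ARE_DESTINATIONS, False)
metaphorical = map_through_gate({"Destination": 1.0}, PURPOSES_ARE_DESTINATIONS, True)
print(literal, metaphorical)
```

With the gate closed, activating Destination has no effect on the Purposeful Action schema; with the gate open, the same activation reaches Purpose. The mapping is one-directional by construction, mirroring the asymmetry of the circuit.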

### **EMBODIED COGNITION: THE EXPERIMENTAL RESULTS**

In Narayanan's theory of primary metaphor, the metaphors are neural circuits asymmetrically linking two brain regions, a source region to a target region, with inferences from the source region used in the target region. That means that the physical consequences of source domain activation will, via the metaphor circuitry, yield corresponding target domain activation. It follows that the activation of metaphor circuitry can prime target domain behavior, where "prime" means that it contributes neural activation that makes the behavior more likely. Here are some cases of conceptual metaphors and the confirming experiments, in which there is source domain brain activation connected via metaphor circuitry to target domain brain regions that govern target domain behavior.


washing expunged the guilt, and they saw no need to perform a helping act to expunge their guilt.


Why does this happen? Conceptual metaphors are asymmetrical physical circuits in the brain allowing the consequences of source domain activation to apply in the cases of target domain activation. Those consequences can be a sense of filth after immoral behavior, inferences affecting crime policy, feelings of pain in empathy with a loved one, leaning forward physically, judgments of importance or temperature, and so on.

Experimental results of this sort were predicted by the idea of embodied conceptual metaphor. The experimental confirmation goes well beyond the cases just listed. The following two dozen studies will provide a sense of how robust the phenomenon is: Fishy smells induce suspicion, negative moral evaluation lessens the value of money, wiping the slate clean allows one to ignore past mistakes, unburdening yourself of a secret lowers the estimation of the upward slant of hills, and many more cases where metaphor circuitry linking two brain areas leads to behavior deriving from the physical metaphor linkage. Enjoy these: (Boroditsky, 2000; Singer et al., 2004, 2006; Aziz-Zadeh et al., 2006; Gibbs, 2006; Wilson and Gibbs, 2007; Casasanto, 2008; Boulenger et al., 2009; IJzerman and Semin, 2009; Schubert and Koole, 2009; Landau et al., 2010; Sapolsky, 2010; Desai et al., 2011; Lee and Schwarz, 2011, 2012; Saygin et al., 2011; Fay and Maner, 2012; Mattingly and Lewandowski, 2013; Pitts et al., 2013; Deckman et al., 2014; Galinsky et al., 2014; Knowles et al., 2014; Masicampo and Ambady, 2014; Sassenrath et al., 2014; Schoel et al., 2014; Slepian et al., 2014; Stellar and Willer, 2014).

### **THE NEURAL METAPHOR SYSTEM**

This should not be thought of as a mere laundry list of cases. What links them together are the mechanisms that create the neural metaphor system: the neural learning mechanisms, the mapping circuits, the bindings, and the best-fit condition. "Best-fit" is more accurately called the conservation of energy law, namely, maximize the activation of existing circuitry with strong synapses that takes the least energy. Why, for example, should smelling fishy be behaviorally connected to suspicion (Lee and Schwarz, 2012)? The metaphor system contains all of the following. Note that special cases are instances of neural bindings of special to general cases that have been learned.

	- *Experiential basis:* In eating, pure food correlates with well-being, rotten food, with ill-being (Lakoff, 2008, Ch. 4)
	- *Special cases:* Communication is Sending; Thinking Is Eating: Understanding Is Digesting Understanding Is Perceiving: special case: Smelling

*Special Case:* Achieving a Purpose is Getting Desirable food A Difficulty Is Getting Undesirable food *Special Case:* Rotten food *Special Case*: Rotten fish

• *Definition:* Suspicion is an understanding that someone has acted immorally to thwart someone else's purposes without their knowledge.

Here we have primary metaphors with special cases. They fit together to form a fixed complex metaphor system that defines the abstract concept of suspicion. Because this is an existing complex neural metaphor system in the brain, it can be activated in experiments to prime behavior. That is what is going on in embodied cognition experiments that show metaphor influencing behavior.

What is particularly interesting in the Lee and Schwarz paper is what they call "bidirectionality" in "metaphorical effects" in the experiments. They showed not just that fishy smells induce suspicion, but also that *subjects in whom suspicion had been induced were better able to distinguish the smell of fish oil from other smelly oils.* Their point is that, while other experiments show unidirectional, source-to-target metaphorical effects, they could produce bidirectional experimental effects.

What does this mean? Bear in mind that bidirectionality of *experimental effect* may or may not mean bidirectionality of the *metaphorical mapping*.

There are two important considerations not discussed by the experimenters. First, in Narayanan's neural theory of conceptual metaphor, STDP (spike-timing dependent plasticity) changes bidirectional ordinary Hebbian circuitry by strengthening synapses in the regularly first-spiking direction and weakening synapses in the opposite direction. Strengthening and weakening produces *relative* asymmetry, not absolute asymmetry. Moreover, the *amount* of strengthening and weakening depends on *how* *regularly* there is spiking in one direction rather than the other. In short, there should be variation in degree. Weakening does not mean no activation in that direction, only less, often much less. But it may still be enough to produce priming effects.
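The point about relative asymmetry can be made concrete with a deliberately simplified spike-timing rule. This is a toy sketch and not Narayanan's model: the function `stdp_weights`, the starting weight `w0`, the fixed `step`, and the clipping range are all assumptions, and real STDP depends on precise spike-time differences rather than on simple counts of which node fired first.

```python
def stdp_weights(source_first, target_first, w0=0.5, step=0.004):
    """Toy spike-timing rule for one pair of reciprocally connected nodes.

    Each episode in which the source node spikes first strengthens the
    source-to-target synapse and weakens the reverse one by `step`;
    episodes with the opposite order do the reverse. Weights are clipped
    to the range [0, 1] at the end.
    """
    net = step * (source_first - target_first)
    forward = min(1.0, max(0.0, w0 + net))
    reverse = min(1.0, max(0.0, w0 - net))
    return forward, reverse


forward, reverse = stdp_weights(source_first=90, target_first=10)
print(round(forward, 2), round(reverse, 2))
```

In this toy, a 90-to-10 split in firing order leaves the forward weight well above the reverse weight, but the reverse weight stays above zero. An even split leaves both weights unchanged. That is the sense in which the asymmetry is a matter of degree, leaving room for weak target-to-source priming.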

Narayanan's STDP theory makes the following prediction: Activating a conceptual domain that is a metaphorical target of one or more conceptual metaphors will provide some (often little) activation of one or more source domains of various metaphors. That is, a target domain can prime (to some extent, perhaps small) possible sources. For example, divorce should prime splitting apart and going in separate directions. Difficulties should prime burdens, roadblocks, containment, uphill climbs, etc. Success should prime climbing ladders, getting fruit, reaching destinations, and so on. Cognitive linguists studying metaphor have long noticed such effects intuitively.

Second, there is the issue of language in general: the relation between words and their meaning *is* bidirectional. This is especially true of idioms that are both imageable and metaphorical. *Smell fishy* is such an idiom. It has an olfactory image. One can imagine what rotten fish (or fish oil) smells like. This could account for the bidirectional effect of the experiment, as follows.

Suspicion activates the idea of the *immoral* thwarting of someone's *purposes*. Immorality weakly primes rottenness (one of the primary metaphorical sources), and purposefulness weakly primes getting food to eat (one of the primary metaphorical sources), which in turn would thwart eating. Rotten food has the special case of smelly fish, and that smell image primes the idiom. That weak priming may still be strong enough to help distinguish fish oil smells from other smells.

Moreover, these two considerations are not mutually exclusive, and their effects could combine in the experiment to yield a bidirectional experimental effect. The Lee and Schwarz experiment is lovely and points to the need to better understand the difference between unidirectionality in metaphorical mapping and unidirectionality in experimental effect.

### **METAPHORICAL INFERENCE: THE INVARIANCE HYPOTHESIS**

How can conceptual metaphors provide content to abstract concepts, and how can different conceptual metaphors for a concept provide different content?

The circuitry constituting primary metaphors makes use of the structure of the source concept to reason about the target concept (Lakoff, 1993). For example, consider States, e.g., depression, confusion, etc. A State is understood metaphorically as a container, that is, a bounded region in space. Just as you can be in a bounded region, you can be in a state; just as you can enter a bounded region, you can enter a state; just as you can get out of a bounded region, you can get out of a state. The concept of a bounded region is used in the mapping from space to states. Or consider an executing schema that carries out a process. If you are building a house, the house is not yet finished. If you are metaphorically building an institution, the institution is still not complete.

Metaphor mappings are many-to-many.

• A target can have many sources. For example, Anger can be seen as Heat (boiling mad, all burned up, seething), Pressure (He exploded), Madness (go crazy, an insane rage), A Wild Animal (bristling with anger, unleashed his anger, a ferocious temper), and so on.


### *Embodiment and meaningfulness*

Primitive concepts and primary metaphors are at the heart of any neural theory of concepts. The reason is that they are all embodied, and embodiment is what makes concepts meaningful, linking what is going on in our brains to our understanding of the real world.

That does not mean that we understand the real world as it is in some objective sense. But it does mean that we understand the world on the basis of certain of our real experiences in it, even if our understanding is metaphorical in nature, as it commonly is. Metaphorical understanding of our experience is a natural consequence of being neural beings with both bodies and brains connected as they are, with the kind of neural learning capacities that we have.

Abstract concepts don't just float in the air. They have to be given embodied meaning somehow. Embodied metaphor is a major mechanism for characterizing how we understand abstract concepts.

### **COMMON COMPLEXES OF PRIMARY METAPHORS**

Neural binding does not just create complex schemas. It also creates complex metaphors, many of them so commonplace as to go unnoticed. A Linear Scale is a Vertical Line with a schema bound to it. The schema has a Bottom, a Top, and Distances from the Bottom to points along the line. The vertical line with this schema also has a metaphor bound to it: More Is Up; Less Is Down, where the verticality is in the source domain of the primary metaphor. This has a metaphorical inference, namely, Comparison of Amount Is Relative Height. Thus, *Your income is higher than mine* and *You have a bigger income than me* both mean that you make **more** money than me.

To this complex we bind the primary metaphor Change is Motion. Then, *My income rose* and *My income grew* both mean that my income changed so that I made **more** money.

Now consider the commonplace primary metaphor, Linear Scales are Paths. This primary metaphor can be seen in expressions like *Harry is way ahead of Bill in athletic ability* and *Sally's intellect is way beyond Max's.* The metaphor is as follows:


We now bind another metaphor to this complex: Fictive motion, or A Line Is the Motion Tracing the Line. This is the metaphor in sentences like *The road runs through the woods* and *The roof slopes downward*. Binding this metaphor to the complex we have yields sentences like *Sally's intellect goes way beyond Max's* and *Corporate profits are far outpacing wages*, where the motions of *go* and *outpace* trace the distance along the vertical line.

To this complex we now bind another primary metaphor: Purposes Are Destinations, as discussed above. This has the metaphorical inference that Success Is Upward Motion and Failing Is Falling. This yields sentences like *She is climbing the ladder of success* and *The middle class is falling further behind the one percent*. Note that *behind* in *falling behind* suggests forward motion, while *falling* suggests upward motion against a force pulling one downwards.

Suppose we now bind to the Purposes are Destinations metaphor an Impediment to Motion, namely, a Rigid Container that constrains motion out of the container. This gives rise to metaphorical sentences like *It's hard to climb out of poverty*, *He's trapped in poverty* and *She started climbing the corporate ladder and hit the glass ceiling*.

This metaphor complex includes a Vertical Line with a Bottom-to-Top schema bound to it, and the metaphors More Is Up, Comparison of Amount Is Relative Height, Change Is Motion, Linear Scales Are Paths, A Line Is the Motion Tracing the Line, Purposes Are Destinations, Success Is Upward Motion and Failing Is Falling, and Being in a Rigid Container Constrains Motion Out of It. Because these are virtually all primary metaphors, we learn complexes of them with ease, without noticing all that is in the complex. Indeed, when one does not notice the metaphors in the complex, a sentence like *He's trapped in poverty* may seem literal.

The Neuroscience Moral: Such complex conceptual metaphors are embodied via many different brain regions. There is no single region for understanding complex ideas of any sort. Current neuroscience techniques are not likely to find evidence of all the metaphors in such a complex. Neuroscientists studying the anatomy of activation by metaphor with current techniques should probably keep to simple cases.

A General Moral: Primary metaphors, even complexes of many of them, are so natural, embodied, and deep that they can structure one's understanding without one noticing that they are there. The neuroscience of concepts leads to a general principle: **You can only understand what the neural circuitry in your brain allows you to understand.** If you don't notice that you are using circuitry that is metaphorical, you will take the metaphors as being literal.

### **LINGUISTIC METAPHORS**

Since language expresses thought, language expresses metaphorical thought as well. But in addition, grammars allow language to combine thoughts to produce an unlimited range of possible thoughts. That works for linguistic metaphor as well. Grammar allows us to combine metaphors to produce an unlimited range of new metaphorical ideas—a range that draws on primary metaphors and basic complexes of primary metaphors, but which goes way beyond those. The contemporary study of *figurative language* draws upon primary metaphor and complexes of primary metaphor, combining with grammar to produce that unlimited range of complex metaphorical thought (See Dancygier and Sweetser, 2014).

The simplest case of linguistic metaphor makes use of simulation in context. Imagine someone offering an explanation of something and his respondent says *That's just not clear*. In the Thought As Vision metaphor system, Understanding Is Seeing Clearly. In the context of a proposed explanation, the word *clear* activates the Thought as Vision system and the sentence metaphorically conveys that the speaker doesn't understand the explanation. The context activates the target domain of the metaphor and the language supplies the source domain.

Another simple case is a head noun preceded by a modifying adjective, as in *brilliant student*. Here the noun *student* is the target concept and the adjective *brilliant* is the metaphorical source. In the Thought as Vision system, an especially bright light source enables especially clear vision, by oneself and/or others. Metaphor simulation is needed here. A *student* is someone who is trying to understand some subject matter. If that student is a source of metaphorical light, then the student has a capacity that is a causal source of her own clear understanding. That constitutes her "brilliance."

Sullivan points out that the adjective in such cases cannot be the target and the noun, the source. Thus, ∗*intelligent light* is metaphorically ill-formed, where *intelligent* is the target and *light* is the source. However, there are cases where the adjective is the target and the noun is the source, namely, where the adjective is a domain adjective, that is, an adjective that names a domain, as in *emotional intelligence*, where *emotional* specifies the target domain as emotion and *intelligence* is applied from the source domain of cognition.

A linguistically naïve view of metaphor characterizes the basic form of a metaphor as A is B (as in "the student is brilliant"), where A is the target and B is the source. But that fails in the case of domain adjectives. *Spiritual wealth* has *spiritual* defining the target domain and *wealth* defining the source domain. To understand *spiritual wealth*, you have to try to simulate a frame-to-frame mapping from the domain of wealth to the domain of spirituality: for example, considerable wealth might map to considerable spirituality, and multiple forms of wealth might map to multiple forms of spirituality. But the A is B form is not available for *spiritual wealth.* ∗*His wealth is spiritual* is ill-formed in metaphorical grammar. To get some sense of the range of such cases, consider *emotional intelligence*, but not ∗*His intelligence is emotional*; *economic war*, but not ∗*This war is economic.*

Metaphor is woven into grammar in complex ways. A common example of metaphor in grammar is described as the construction *the X of Y*, where X is a metaphor source and Y is a metaphor target. But the real examples are more complex. Consider the following examples: *He is in the grip of anger*. *We're riding in the fast lane on the freeway of love*.

In the first case there are two metaphors that act together: Emotions Are Exerters of Force and Control is Control by the Hands. Anger is a special case of an emotion exerting force and thereby control, which is metaphorically control by the hands. That is what it means to be in the "grip" of anger.

In the second case, the metaphors are Love is a Journey and Action is Motion. But there is an extra wrinkle. The freeway is a metonymy; it stands for travel on a freeway. Driving in the fast lane is the specific mode of travel. It is exciting. It is reckless. You could get hurt. The sentence as a whole, with that construction and the metonymy, describes a reckless love affair that is exciting but can lead to emotional harm.

### *Everyday complexity of linguistic metaphors*

Metaphors play crucial roles in complex ideas. On Sunday, June 26, 2011, the following headline appeared in the main column on the front page of the NY Times. It was read by millions:

*Insiders Sound an Alarm Amid a Natural Gas Rush Productivity of Shale Wells is a Concern — Investor Flood Spurs Talk of Bubble*

Let us look at some of the metaphors, one at a time.

*Insiders.* An institution is understood as a Container, with an inside and an outside. Those on the inside of the institution are called "insiders." The natural gas industry is such an institution. "Insiders" have "inside information" that the institution tries to keep inside, often because the stock price would change (in this case, fall) if the true information were known "outside" the industry.

*Sound an alarm.* An alarm is a loud warning sound indicating immediate danger. To "sound" an alarm is to create a loud alarm sound heard by those in danger of being significantly harmed. In this case, the metaphorical "harm" is financial. Financial "harm" is understood as loss of money in the market.

Putting these together, we form the idea that people with "inside information" about an industry are loudly warning that investors in that industry may lose a lot of money on their stock investments.

*Amid.* Amid is a spatial term indicating that a physical entity is surrounded by a lot of other physical entities.

*Natural gas rush.* This is a metaphor based on "Gold Rush," in which a large number of people with little real information traveled hurriedly to find gold for the taking. Some did get very rich, but most people worked very hard digging for gold without finding any. In this metaphor, what is preserved is the "rush to get rich quick." What is changed is that natural gas has replaced gold as the way to get rich quick.

Putting all this together via bindings, we get: People working for the natural gas industry who have "inside information" find themselves surrounded by people trying to get rich quick in the natural gas industry and are warning those people of possible loss of money in natural gas stock.

Cases like this are everywhere. Just pick up a newspaper or newsmagazine and start reading. The individual metaphors contribute pieces of knowledge. To piece this knowledge together, the meanings of the individual metaphors have to be combined. That is, neural circuitry must be activated to form an overall coherent meaning. In the neural theory of language, the problem of what gets bound together neurally is called the "best fit" problem: what circuitry can be activated with the least energy to fit the pieces together? The brain is a physical system that works by conservation of energy. The stronger the synapses in a circuit, the less energy it takes to activate that circuit. That means that circuitry with the strongest existing synapses is most likely to be activated to form a best fit. In short, the brain will tend to use what it already knows as much as possible to create a "best fit."
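The least-energy idea behind "best fit" can be pictured with a toy selection rule. This is a hedged sketch, not the model cited in the text: the cost function (the reciprocal of each synaptic strength), the helper names `activation_energy` and `best_fit`, and the example strengths are all made up for illustration.

```python
def activation_energy(synapse_strengths):
    """Toy cost: each synapse costs the reciprocal of its strength."""
    return sum(1.0 / s for s in synapse_strengths)


def best_fit(candidates):
    """Return the name of the candidate circuit that is cheapest to activate."""
    return min(candidates, key=lambda name: activation_energy(candidates[name]))


# Hypothetical candidate circuits with made-up synaptic strengths in (0, 1].
candidates = {
    "well-practiced reading": [0.9, 0.8, 0.9],
    "novel reading": [0.3, 0.4, 0.2],
}
print(best_fit(candidates))
```

Under this assumed cost, the circuit with the stronger existing synapses is selected, which matches the qualitative claim that the brain tends to reuse what it already knows.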

Neural computational modeling of "best fit" for a limited range of cases has been done by Bryant (2009).

### **COMPLEX METAPHORICAL BLENDS**

Consider the example, "*Investor Flood Spurs Talk of Bubble*."

The concept of "inflation" is based on an economic metaphor that real value is substance, and "inflated value" is made up partly of substance and partly of air. Real value is the ability to yield at least a certain amount of profit on an ongoing basis. Inflation occurs when the price of a stock or property gets higher than its real value. Metaphorically, the inflated part of the value is air, not substance.

The concept of a "bubble" comes with an image and knowledge about the image: the bubble is constituted of a fixed amount of substance. The bubble gets bigger when air is pumped into it. The amount of substance is fixed, most of the bubble is air with no substance, and the surface of the bubble gets thinner as the bubble gets bigger. Eventually, the surface gets so thin that the bubble breaks and collapses.

In the stock market, a metaphorical bubble is a fixed amount of stock or property. As more people invest in it, the price may go up while the real value does not. That is, there is no "substance" to the investments. Eventually, the amount of value per unit of price is so little, that investors withdraw their investments, and the value *drops* precipitously ("a collapse"). The primary metaphors here are Real Value Is Substance, Inflated Value Is Air; *More Is Up*; and A Success is A Rise; A Failure is A Fall. Success in investing is a gain of real value of investments. Failure in investing means a loss of real value of investments.

In a market segment, a certain amount of investment is needed for the market segment to produce real value. Too many investors can drive stock prices up beyond real value and result in inflated value. Too much inflation produces the threat of a "bubble" that will break, and result in a considerable loss in the value of investments.

A literal flood is a large body of uncontrolled rushing water that can sweep people up in it, can do a lot of damage, and harm those caught in the flood. An "investor flood" is a metaphorical flood made up of investors. This is made up of a primary metaphor, Multiplex Is Mass, in which a large number of unspecified indistinguishable individuals is conceptualized as a fluid mass, as when you see hundreds of sheep at a distance as a sheep, or when you see a lot of people as a crowd *flowing* through the streets. Investors who buy stock in a market segment are metaphorically understood as "entering" the market segment. Thus, the market segment is understood as a bounded region of space, and buying stock that is part of that market segment is seen as "entering." A flood moves in a direction, and hence an "investor flood" refers to a large mass of investors entering a single area of the market.

The word "spur" literally refers to the spur on the boot of someone riding a horse. The rider spurs the horse to make it move by inflicting a small pain creating fear of a greater pain to come if the horse does not move. The primary metaphor here is Action Is Motion. "Spur" means to cause action by inflicting fear of pain. The second metaphor used is Financial Loss Is Painful Harm. Metaphorically, "spur" in this case means that the uncontrolled flood of investors in natural gas is causing talk of a bubble because of a fear of financial loss.

Understanding "*Investor Flood Spurs Talk of Bubble"* makes use of a number of very general metaphors: Multiplex Is Mass; More Is Up, Less Is Down; Success is Rising, Failing is Falling; Action is Motion; Financial Loss Is Pain; Real Value Is Substance, Inflation is Air. In addition, certain frames are used: a flood frame, a bubble frame, and a spurring frame. From the perspective of the brain, these are neurally activated and neurally bound together in just the right way via grammar and what is called "best fit."

Note that understanding "*Investor Flood Spurs Talk of Bubble*" does not involve first assigning a literal meaning to the phrase and then applying a metaphor directly to the whole. Rather, the pieces activate fixed and very general conceptual metaphors and frames, which are then fit together via both grammar and neural best-fit mechanisms to make the most sense in context.

This is common in poetic metaphors. Dylan Thomas wrote "Do not go gentle into that good night." The sentence has no literal meaning. But it has a powerful metaphorical meaning since it evokes three metaphors for death. "Go" activates Death Is Departure, as in "He's left us." "Night" activates Death is Darkness. And "Gentle" activates Life is a Struggle and Death is Giving up the Struggle. The sentence as a whole is given metaphorical meaning via these three conceptual metaphors, each applying to different words in the sentence.

But one doesn't have to look to headlines or poetry. Ordinary language also works this way. Take a sentence like, "Because he skipped steps, what he said didn't add up." Again, the sentence has no literal meaning. Two metaphors are used. "Skipped steps" evokes Thinking Is Moving and Rational Thinking is Moving Step-by-Step. "Didn't add up" evokes Thinking Is Adding, A Thought is a number to be "counted" in the addition, and the Conclusion of an Argument Is the Sum (as in "Let me sum up").

### **NEURAL BINDING CREATES "BLENDS"**

To complete the picture we have given of the current state of metaphor theory, we need to consider some examples of "blends" (Fauconnier and Turner, 2002; Grady et al., 1999). During the home run race in which Mark McGwire and Sammy Sosa sought to break the home run records of Babe Ruth (60 in 154 games) and Roger Maris (61 in 162 games), the race was portrayed visually by a cartoon that appeared daily in newspapers. In the cartoon, a number of batters were lined up as in a race, with the one "ahead" on the right and the ones "behind" on the left. The text might read something like "McGwire is two games behind Ruth," "McGwire catches up with Ruth," and "McGwire passes Ruth." The metaphor being used was: An attempt to break a record Is a Race between the Challengers and the Record Holder. In addition, there was a neural binding. Babe Ruth played many years before McGwire and Sosa, and was long dead when McGwire and Sosa challenged his record. To allow the metaphor to apply, a neural binding is needed, identifying Ruth of yesteryear and the contemporary challengers as racers in the same race at the current time. The metaphor, plus the neural binding, creates what is called a "blend."

Another classic example of a "blend" can be seen in cases where the following metaphor applies.

The Profession Metaphor:

A Person who performs actions with a certain *characteristic.*

Is A Member of a Profession known for *that characteristic.*

This is a metaphor, but it also has a neural binding across the source and target: the "characteristic" must be the same. The result of the metaphor plus the binding is called a "blend." The most famous example is the pair:

(1) My butcher is a surgeon.

(2) My surgeon is a butcher.

These draw upon the following frame-based knowledge:

The Butcher Frame: A butcher is someone who characteristically cuts without care and control.

The Surgeon Frame: A surgeon is someone who characteristically cuts with great care and control.

The same metaphor applies in both cases, but with different "characteristics" and different professions. In (1), the source domain Profession uses the Surgeon Frame (a special case of Profession), and the "characteristic" is "cutting with great care and control." In (2), the source domain Profession uses the Butcher Frame (a different special case of Profession) and the "characteristic" is "cutting without care and control." The example uses three kinds of mechanism: A metaphor, a binding, and two frames that are special cases of Profession in the source domain of the metaphor.

### **METAPHORS APPLY TO NARRATIVES**

In any culture, there are narratives. Each narrative has certain dimensions of structure. There is a frame structure, a linear order structure, an emotion structure, and a metaphor structure. The clearest description of how these metaphorical narratives work is given in Chapter 1 of *The Political Mind* (Lakoff, 2008). The emotion structure is particularly interesting.

The Hero-Villain narrative begins with a Villain doing harm or threatening a Victim: Hearing of the harm, you feel anger or outrage. The Hero encounters the Villain and you don't know who is going to win. You feel fear or anxiety. The Hero wins. You feel relief and joy. Such "canned emotions" are built into narrative structures. Moreover, the Hero-Villain narrative can apply, via metaphor, to a political race, to scientific discovery (e.g., *The Double Helix*), to a whistle-blower at a company that is endangering the public (e.g., *Erin Brockovich*).

Jenny Lederer has analyzed a children's story from this perspective. For example, there is a story about a young fish ("The Noble Gnarble") living at the bottom of the ocean who wants to see sunlight. The fish swims up and up, encountering a new danger at each level and overcoming the danger by virtue of what would normally be seen as a handicap that happens to be just the advantage needed to escape the danger. The story is a classical Overcoming-Obstacles-to-Reach-a-Noble-Goal narrative, applied metaphorically to a young fish. Such metaphorical narratives are everywhere.

# **HOW DO WE UNDERSTAND REALLY COMPLEX METAPHORS?**

Metaphorical understanding is based on the embodiment imposed by primary metaphors, which arise via ordinary neural mechanisms when commonplace embodied experiences regularly occur together. Linguistic expressions that are metaphorical are typically complex from a conceptual point of view. They may use a number of conceptual metaphors (many of them primary metaphors) as well as frames and bindings. What are called "blends" arise from metaphors and/or frames plus neural bindings.

What neural bindings occur is often a matter of grammar plus "best fit" in context. According to our current theory of "best-fit," complex neural circuitry is activated in context when that circuitry has the overall strongest synaptic strength in that context, and therefore can be activated with the least energy.

What is remarkable is that this is done instantly. No special talent is needed. Millions of readers read the above headlines in the NY Times, understood them instantly, and never noticed that there was anything unusual about them.

### **CASCADES**

Narayanan and I are in the process of developing a theory of neural cascades to make sense of this data and much more. We distinguish learned cascade circuitry from a functioning cascade of activation and inhibition. Cascades are two-way circuits linking diverse brain regions connected to the body, allowing meaning from multiple realms of embodied experience to "give meaning" to linguistic, gestural, and other aspects of form. Each link in a cascade circuit does very little, but they add up to produce all of human thought.

### **EMBODIMENT: THE CENTRAL ISSUE**

All of what we have been discussing stresses the centrality of embodiment as *the* mechanism of meaningfulness. It may be relatively obvious that sensorimotor embodiment plays a role in concepts that are not abstract, like running, kicking, seeing, smelling, and obvious concepts having to do with acting and perceiving. The neural theory of metaphor allows for the sensorimotor system to account for the meaning of abstract concepts as well, in the ways that we have seen throughout this paper.

A theory of cascades is necessary for two reasons. First, in complex concepts that make use of multiple primary concepts and primary metaphors, there will be a multiplicity of embodiment, and cascade theory provides the circuitry necessary to carry this out. Second, it provides the circuitry necessary to link the embodiment of linguistic form (in sound, writing, sign, and gesture) to the embodiment of meaning.

### **MULTIMODALITY, NOT MODULARITY**

A major moral: From all the examples given above, it should be clear that there is no one "module" in the brain that handles language, or metaphor, or abstract thought. It takes extensive cascade circuits linking many diverse brain regions to allow for the indefinitely large variety of human reason and imagination.

### **EPILOG**

This volume is a contribution to the scientific study of how the human brain can give rise to the details of thought and language—in this case, metaphorical thought and language. Neuroscience alone cannot answer this question, since it does not study the details of thought and language. Cognitive linguistics does. Hence, this paper. Experimental embodied cognition research also contributes scientific research on this issue. And finally, neural computation of the sort pioneered by Srini Narayanan has allowed us to model the requisite neural circuitry and neural learning mechanisms.

The very existence of this volume is testimony to the desire for cooperation across four disciplines, an integration of which is necessary to address this issue. I would like to express my gratitude to Frontiers and to the editors of this volume for taking on such a cooperative scientific enterprise.

### **ACKNOWLEDGMENTS**

I would like to thank the following for their help: Oana David, Ellen Dodge, Jerry Feldman, Jisup Hong, Mark Johnson, Lara Krisst, Srini Narayanan, Aucher Serr, Elise Stickles, and Eve Sweetser. The research on conceptual metaphor presented here has been supported in part by the Berkeley MetaNet Project through the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Defense US Army Research Laboratory—contract number W911NF-12-C-0022. Disclaimer: The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government.

### **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2014; accepted: 10 November 2014; published online: 16 December 2014.*

*Citation: Lakoff G (2014) Mapping the brain's metaphor circuitry: metaphorical thought in everyday reason. Front. Hum. Neurosci. 8:958. doi: 10.3389/fnhum.2014.00958*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Lakoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Auditory and motion metaphors have different scalp distributions: an ERP study

#### Gwenda L. Schmidt-Snoek\*, Ashley R. Drew†, Elizabeth C. Barile† and Stephen J. Agauas†

Department of Psychology, Hope College, Holland, MI, USA

### Edited by:

Seana Coulson, University of California at San Diego, USA

### Reviewed by:

Bálint Forgács, Central European University, Hungary; Sabrina Schneider, University of Tuebingen, Germany

### \*Correspondence:

Gwenda L. Schmidt-Snoek, Department of Psychology, Hope College, 35 E 12th St., Holland, MI 49423, USA schmidtg@hope.edu

### †Present address:

Ashley R. Drew, Department of Psychology, Temple University; Elizabeth C. Barile, Department of Chemistry, University of Illinois at Urbana-Champaign; Stephen J. Agauas, Department of Psychology, Central Michigan University

Received: 30 June 2014; Accepted: 23 February 2015; Published: 13 March 2015

### Citation:

Schmidt-Snoek GL, Drew AR, Barile EC and Agauas SJ (2015) Auditory and motion metaphors have different scalp distributions: an ERP study. Front. Hum. Neurosci. 9:126. doi: 10.3389/fnhum.2015.00126

While many links have been established between sensory-motor words used literally (kick the ball) and sensory-motor regions of the brain, it is less clear whether metaphorically used words (kick the habit) also show such signs of "embodiment." Additionally, not much is known about the timing or nature of the connection between language and sensory-motor neural processing. We used stimuli divided into three figurativeness conditions (literal, metaphor, and anomalous) and two modality conditions: auditory (Her limousine was a privileged snort) and motion (The editorial was a brass-knuckle punch). The conditions were matched on a large number of potentially confounding factors including cloze probability. The electroencephalographic response to the final word of each sentence was measured at 64 electrode sites on the scalp of 22 participants and event-related potentials (ERPs) calculated. Analysis revealed greater amplitudes for metaphorical than literal sentences in both 350–500 ms and 500–650 ms timeframes. Results supported the possibility of different neural substrates for motion and auditory sentences. Greater differences for motion sentences were seen in the left posterior and left central electrode sites than elsewhere on the scalp. These findings are consistent with a sensory-motor neural categorization of language and with the integration of modal and amodal information during the N400 and P600 timeframes.

### Keywords: metaphor, motion, auditory, familiarity, imageability, embodied language, N400, P600

# Introduction

Now that many neural links have been established between language and action (e.g., Wallentin et al., 2005) it is time to move beyond the debate about whether language and cognition are embodied and to begin investigating the timing and nature of the neural link between language and sensory-motor aspects of experience (Chatterjee, 2010). Many investigators have demonstrated links between literally used action words (grasp the ball) and action areas of the brain, but fewer have done so with action words used metaphorically (grasp the idea), which do not literally refer to actions. In the current study we investigate the metaphorical use of motion and auditory words using event-related potentials (ERPs). Using this method allows us to examine the timing of the link between conceptual and sensorimotor aspects of a semantic concept.

# Modality and Metaphor

Most psychology and cognitive science researchers have previously regarded language and cognition as amodal; recently, however, the notion that cognition may be grounded in sensory-motor experience and embodied has become dominant (Barsalou, 2008). Several studies have shown the activation of sensory or motor regions of the brain during the processing of words or other stimuli depicting actions or sensory experiences (Pulvermüller, 2005; Wallentin et al., 2005; Daselaar et al., 2010). Nevertheless, intense debate surrounds the embodied view of language (Gibbs, 2013a). To move forward, it is not necessary to continue to demonstrate links between cognition and action. Rather, work must now focus on the nature of that embodiment, the direction of influence between modal and amodal representations, and the timing of the connection between them (Mahon and Caramazza, 2008; Chatterjee, 2010; Gibbs, 2013a). For example, Rueschemeyer et al. (2010) found that participants performing an intentional action (but not a nonintentional action) showed a priming effect for processing words depicting manipulable compared to nonmanipulable objects. If something like intentionality is important in showing a link between sensorimotor processes and language, the nature of embodiment may be more complex than previously thought.

Support for embodied theories of language has frequently come from reports of activations in sensory-motor areas of the brain triggered by action words (e.g., Barsalou, 1999; but see Mahon and Caramazza, 2008, who also discuss other interpretations of these findings). Using metaphor is a particularly compelling way to examine embodied theories of language. Showing the neural activation associated with words referring to physical actions (grasp the ball) does not go as far as extending this to action words used metaphorically (grasp the idea). When a word with sensory-motor properties used in a non-literal way recruits the sensory-motor regions of the brain, this activation provides strong support for a robust association between physical experience and completely abstract concepts in the brain, such as understanding (grasping) an idea. While some studies failed to find this association (e.g., Aziz-Zadeh and Damasio, 2008; Cardillo et al., 2012), some recent attempts have been successful (e.g., Cacciari et al., 2011; Desai et al., 2011, 2013). Most studies have reported links between action concepts and the motor system, but links in other modalities such as texture and the sensory system (e.g., Lacey et al., 2012) have also been reported. We extend this work by comparing metaphors based on two modalities, auditory (The flowers were a colorful clamor) and motion (Her inquiries were a nervous scamper).

Identifying when and how modal and amodal representations interact in the brain is important for understanding the nature of embodiment. For example, the link between the amodal and modal representation of a specific word or concept may only happen at a specific stage of processing rather than globally (Ritchie, 2008). Neuroimaging studies by their nature do not provide precise timing information, but electrophysiological methods do. In particular, the N400 ERP component, a negativity occurring about 400 ms after stimulus presentation, is sensitive to anomaly as in He took a sip from the transmitter. N400 amplitude has traditionally been considered an index of the ease of semantic integration, with larger amplitudes reflecting more difficult integration (Kutas and Hillyard, 1980). Current thinking suggests the N400 is more specifically associated with neural access to initial conceptual representations or semantic retrieval (Van Petten and Luka, 2006). In fact, Federmeier and Laszlo (2009) proposed the N400 is associated with the binding of data from various modalities, creating a multimodal conceptual representation that is dynamically created and highly context dependent (Kutas and Federmeier, 2011). The N400 is an ideal measure for investigating the timing of the neural basis of metaphor based on different modalities and is our primary dependent measure.

### TABLE 1 | Examples of each sentence type.

### The Current Study

Stimuli in the current study were divided into three figurativeness conditions (literal, metaphor and anomalous) crossed with two modality conditions (auditory and motion). Sentences included auditory literal (His comeback was a haughty **snort**), auditory metaphor (Her limousine was a privileged **snort**), motion literal (The blow was a single **punch**), motion metaphor (The editorial was a brass-knuckle **punch**), and anomalous. See **Table 1** for more examples. The conditions were matched on a large number of potentially confounding factors (Cardillo et al., 2010).

The purpose of the current study was to use ERP to investigate the nature and time course of metaphor comprehension based on two different modalities. We compared the neural processing of motion and auditory modalities. It is common in ERP studies to use differences in component distribution across the scalp to infer differences in neural areas recruited. Kutas and Federmeier (2011) discuss a number of such examples with the N400. We hypothesize that if the N400 reflects the binding of data from different modal and amodal representations, different parts of the brain should be recruited in addition to language areas for each modality, for example the motor cortex for motion sentences and the auditory cortex for auditory sentences. We predicted a difference in the scalp distribution of the N400 for the two modalities demonstrating different underlying patterns of activation at 400 ms post stimulus.

While examining our data in the present study, it became apparent that differences in positivity were occurring in the P600 time range. The P600 ERP component has traditionally been considered an index of syntactic error processing, although it is now known to be involved in various complex sentence-processing mechanisms (e.g., Gouvea et al., 2010; Kutas and Federmeier, 2011), including semantic integration (Brouwer et al., 2012). We thus added the P600 as an additional dependent measure.

# Materials and Methods

# Participants

Participants were 28 volunteers with at least 1 year of postsecondary education from the Hope College community. Data from two participants, who scored less than 60% correct in the anomalous condition, were excluded since their score suggests they may not have comprehended many metaphors. Data from an additional four participants were excluded due to insufficient acceptable trials (less than 20 per condition). The remaining 22 participants (17 women, mean age 20.8 years, range 18–23, mean years of education 14.5, range 13–16) were native English speakers, and had no history of neurological or psychiatric disorders. All participants were right-handed with a mean handedness score of 0.84 (SD = 0.16) (Annett, 1970); 11 reported left-handed family members. This study was approved by the Hope College Human Subjects Review Board and all participants provided written informed consent prior to participation.

# Creation of Stimuli

A preliminary list of 411 sentences was compiled consisting of literal, metaphorical, and anomalous sentences. Literal and metaphorical sentences were obtained from Cardillo et al. (2010). Cardillo et al. matched sentences on 10 dimensions: length, frequency, concreteness, familiarity, naturalness, imageability, figurativeness, interpretability, valence, and valence judgment reaction time. The sentences began with a subject followed by the past or present tense form of the verb "be" followed by an adjective for the object (e.g., His job was an endless). Each sentence ended with either an auditory or motion target word as the object (e.g., groan). Motion words were physical actions depicting motion such as climb, dig, and stampede, whereas auditory words included sounds like sneeze, chirp, and hiss. For each target word, a literal and a metaphorical sentence were written (**Table 1**). Thus, each target word was used both literally and figuratively based on the context of the noun phrase. In the present study, the same sentence structure and target words were used to create anomalous sentences. Anomalous sentences were created by the authors and had neither a literal nor metaphorical meaning. These sentences were included as a control condition for comparison with the literal and metaphorical conditions.

Before the final selection of stimuli, three preliminary studies further characterized the sentences. Fifty-two native English speakers, who did not participate in the main experiment, completed a cloze probability questionnaire by finishing each sentence with the first word that came to mind. Words were keyed into a spreadsheet using a standard computer keyboard. Excluding one participant due to non-compliance with task instructions, data from the remaining 51 participants (35 women, mean age 19 years) were used to calculate the cloze probability of each sentence. The sum of answers matching the actual target word was divided by the number of participants to measure the sentence ending predictability.
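The cloze probability calculation described above (matching completions divided by the number of respondents) can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function name, the case and whitespace normalization, and the example completions are all assumptions made for the sketch.

```python
def cloze_probability(responses, target):
    """Proportion of respondents whose completion matches the target word."""
    if not responses:
        raise ValueError("need at least one response")
    # Assumption: matching ignores case and surrounding whitespace.
    wanted = target.strip().lower()
    matches = sum(1 for r in responses if r.strip().lower() == wanted)
    return matches / len(responses)


# Hypothetical completions for "His job was an endless ..." (illustrative only).
example_responses = ["grind", "groan", "task", "Groan ", "cycle"]
example_cloze = cloze_probability(example_responses, "groan")
```

With 51 respondents, as in the questionnaire described above, each matching completion would contribute 1/51 to a sentence's cloze probability.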

A second questionnaire was completed by 20 native English speakers (14 women, mean age 18 years) who did not participate in the main experiment. Using a 7-point scale (1 = low, 7 = high), each participant rated the familiarity and imageability of 277 sentences (70 literal, 70 metaphorical, 137 anomalous). Responses were keyed into a spreadsheet using a standard computer keyboard. The anomalous sentence ratings were added to the collection of literal and metaphorical sentence ratings.

Third, a pilot test was conducted to attain average response times and accuracy ratings for each sentence. Twenty native English speakers (13 women, mean age 18 years) were tested on the original stimulus set of 411 sentences using the procedure from the main experiment.

The resulting cloze probability, familiarity, imageability, pilot response time, and pilot accuracy ratings were used in the final selection of stimuli to create the most balanced stimuli possible. In addition, several other factors were balanced. Crucially, modality (auditory, motion) and figurativeness (literal, metaphorical) factors did not differ on cloze probability ratings (ps > 0.05).

Some of the stimuli had an adjective modifying the final target word (Cardillo et al., 2010). Across motion and auditory sentences, there was no difference in the number of sentences having an adjective modifying the object (target) and those that did not (p > 0.05). The frequency and concreteness of adjectives in motion vs. auditory sentences did not differ as a whole or looking at literal and metaphorical sentences separately (all ps > 0.05). However, several factors across figurativeness conditions for either modality could not be balanced (ps < 0.05). **Table 2** lists all the factors considered and descriptive statistics for the four sentence types. **Table 3** gives the results of t-tests conducted to assess differences. The final stimulus set contained 300 sentences, 50 in each condition.
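Comparing a stimulus property between two sets of 50 sentences with an independent-samples t-test gives 50 + 50 − 2 = 98 degrees of freedom. The sketch below assumes a pooled-variance (Student's) t-test; the text does not state which variant was used, and the function name and toy values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Toy ratings, invented purely to exercise the function.
t_value, degrees_of_freedom = pooled_t([1, 2, 3], [2, 3, 4])
```

The resulting statistic would then be compared against the t distribution with `df` degrees of freedom to obtain a two-tailed p value.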

# Procedure

Participants were tested individually in a single experimental session. Stimuli were presented using E-Prime software (Psychology Software Tools, Pittsburgh, PA, USA) in 20pt Arial bold font, with white text on a black background. During a practice block of 10 sentences, participants were acclimated to the task and given verbal feedback regarding their task performance and blinking. Each trial began with the beginning of the sentence (the entire sentence except the last word). Participants controlled the advancement of the trial by pressing the spacebar when ready. Next, an automatic timed sequence occurred in which participants were asked not to blink: fixation cross (500 ms), final word of the sentence (1200 ms), and a response screen (limited to 5000 ms). The response screen instructed participants to indicate whether the presented sentence was literal, metaphorical, or anomalous via keyboard response with the first three fingers of the right hand. This ensured that metaphorical trials were processed as metaphorical by the participant since incorrect trials were discarded. It also ensured the subjects were attending to and processing the sentences. However, it may be that a certain neural pattern motivated the participants to give a particular behavioral response, triggering our results and resulting in circular reasoning. The present results must be interpreted with this caveat in mind. Once an answer was given, the next trial began after a randomly assigned intertrial interval between 900 ms and 1150 ms in 50 ms increments.

### TABLE 2 | Characteristics of the final stimuli.

KF frequency = frequency value from Kučera and Francis (1967). BN frequency = SUBTLEX frequency value from Brysbaert and New (2009). Familiarity, Imageability and Valence reflect ratings of the entire sentence. Frequency and Concreteness ratings reflect the mean value of all content words in each sentence. Since concreteness ratings are based on published norms of individual words, they do not necessarily reflect the concreteness or imageability of the sentence as a whole. Valence ratings were binary; subjects rated sentences as positive or neutral/negative.

### TABLE 3 | Comparison of final stimuli across modality and figurativeness conditions.

Degrees of freedom = 98 for each t-test. ns = non-significant, p < 0.05 (two-tailed). See the legend for Table 2 for information about the items listed.

Frontiers in Human Neuroscience | www.frontiersin.org March 2015 | Volume 9 | Article 126 |

Each of the 17 blocks contained an equal number of each sentence type in a unique random order for each participant. An additional version of the experiment was formed by reversing the order of the blocks. Participants were randomly assigned to one of the two block orders to reduce word priming effects in the experiment. Participants controlled their resting time upon the completion of every block. The total duration of the study was approximately two hours.
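The timed portion of a trial described in the procedure (fixation, final word, response screen, then a randomly assigned intertrial interval) can be summarized in a short sketch. The experiment itself was run in E-Prime, so the Python below is only an illustration; the constant and function names are hypothetical, and uniform sampling of the interval is an assumption.

```python
import random

FIXATION_MS = 500
FINAL_WORD_MS = 1200
RESPONSE_LIMIT_MS = 5000
# Intertrial intervals from 900 ms to 1150 ms in 50 ms increments.
ITI_CHOICES_MS = list(range(900, 1151, 50))


def draw_iti(rng):
    """Pick one intertrial interval (assumed uniform over the allowed values)."""
    return rng.choice(ITI_CHOICES_MS)


def trial_timeline(rng):
    """Ordered (event, duration in ms) pairs for the timed part of one trial."""
    return [
        ("fixation_cross", FIXATION_MS),
        ("final_word", FINAL_WORD_MS),
        ("response_screen_max", RESPONSE_LIMIT_MS),
        ("intertrial_interval", draw_iti(rng)),
    ]


timeline = trial_timeline(random.Random(0))
```

The self-paced sentence beginning is left out of the timeline because its duration was controlled by the participant.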

### Electrophysiological Recording

Scalp activity was recorded with a 64 channel BioSemi ActiveTwo system (BioSemi Inc., Amsterdam, Netherlands) with an analog-to-digital sampling rate of 512 Hz and a bandwidth of 104 Hz. A Common Mode Sense (CMS) active electrode was used as the reference, and a Driven Right Leg (DRL) passive electrode was used as the ground. Active Ag-AgCl pin-type electrodes were inserted into a Lycra head cap with locations based upon the American Electroencephalographic Society (1994). Electrooculograms (EOG) were recorded using flat-type electrodes placed on the left and right infraorbital ridge and outer canthus. In addition, two more flat-type electrodes were placed on the left and right mastoids. Individual electrode offsets were kept between ±30 mV.

Offline, electroencephalography (EEG)/ERP analyses were conducted using EMSE Suite software (Source Signal Imaging Inc., San Diego, CA, USA). The left and right mastoid recordings were averaged and used as the offline reference. A digital bandpass filter of 0.01–30 Hz was applied to the EEG recordings, and then an individual eye artifact filter removed eye movements for each participant. ERPs were obtained through stimulus-locked averaging of each condition with an epoch extending from 200 ms pre-stimulus to 800 ms post-stimulus. Trials in which EEG or EOG channels exceeded ±50 µV, or in which the participant did not respond correctly in 5000 ms, were eliminated. The remaining segments were baseline corrected and then averaged to create ERP waveforms for each participant. The mean number of trials averaged per condition per participant across all cells of data was 35.6 (SD = 6.9, range 20–50). Across the six conditions, the condition with the smallest number of mean trials per participant per condition was the auditory metaphor condition at 30.0 (SD = 6.4) and the condition with the largest number was the auditory anomalous condition with 40.0 (SD = 7.1). **Figure 1** shows how the 64 electrode sites were divided into the following eight scalp regions: Left Anterior (FP1, AF7, AF3, F7, F5, F3, F1, FT7, FC5, FC3, FC1), Left Center (T7, C5, C3, C1), Left Posterior (TP7, CP5, CP3, CP1, P9, P7, P5, P3, P1, PO7, PO3, O1), Center Anterior (FPz, AFz, Fz, FCz, Cz), Center Posterior (CPz, Pz, POz, Oz, Iz), Right Anterior (FP2, AF4, AF8, F2, F4, F6, F8, FC2, FC4, FC6, FT8), Right Center (C2, C4, C6, T8), Right Posterior (CP2, CP4, CP6, TP8, P2, P4, P6, P8, P10, PO4, PO8, O2). We operationalized the N400 amplitude as the area under the curve from 350 ms to 500 ms and the P600 amplitude as the area under the curve from 500 ms to 650 ms, based on visual inspection of grand averages.
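The single-channel steps described above (artifact rejection at ±50 µV, baseline correction, averaging, and an area-under-the-curve amplitude in a fixed time window) can be sketched as follows. The analysis itself was carried out in EMSE Suite, so this is not the authors' pipeline: the function names are hypothetical, and the use of the mean pre-stimulus voltage as the baseline and of a signed trapezoidal area are assumptions.

```python
SAMPLING_RATE_HZ = 512     # recording rate given in the text
EPOCH_START_MS = -200.0    # epoch runs from -200 ms to 800 ms
REJECT_UV = 50.0           # rejection threshold, in microvolts


def sample_times_ms(n_samples):
    """Time stamp (ms relative to word onset) of each sample in an epoch."""
    step = 1000.0 / SAMPLING_RATE_HZ
    return [EPOCH_START_MS + i * step for i in range(n_samples)]


def is_clean(epoch):
    """True if no sample exceeds the +/-50 microvolt criterion."""
    return all(abs(v) <= REJECT_UV for v in epoch)


def baseline_correct(epoch, times):
    """Subtract the mean pre-stimulus voltage (assumed baseline definition)."""
    pre = [v for v, t in zip(epoch, times) if t < 0]
    offset = sum(pre) / len(pre)
    return [v - offset for v in epoch]


def average_erp(epochs, times):
    """Average the baseline-corrected epochs that pass artifact rejection."""
    kept = [baseline_correct(e, times) for e in epochs if is_clean(e)]
    if not kept:
        raise ValueError("no acceptable trials")
    return [sum(column) / len(kept) for column in zip(*kept)]


def window_area(erp, times, start_ms, end_ms):
    """Signed trapezoidal area (microvolt * ms) between start_ms and end_ms."""
    points = [(t, v) for t, v in zip(times, erp) if start_ms <= t <= end_ms]
    return sum((t2 - t1) * (v1 + v2) / 2.0
               for (t1, v1), (t2, v2) in zip(points, points[1:]))
```

Under these assumptions, the N400 measure for one channel would be `window_area(erp, times, 350, 500)` and the P600 measure `window_area(erp, times, 500, 650)`; regional values would then be obtained by combining the channels listed for each scalp region.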

# Results

# Behavioral

The mean accuracy score across participants was 0.76 (SD = 0.05) and only correct trials were included in the ERP and reaction time analyses. A 2 (modality) × 3 (figurativeness) repeated measures ANOVA using the Huynh-Feldt sphericity correction was conducted on mean accuracy scores revealing an effect of figurativeness, F(2,42) = 8.2, p = 0.003, ε = 0.784 and a modality × figurativeness interaction, F(2,42) = 6.1, p = 0.005, ε = 1.0 (Degrees of freedom are reported with sphericity assumed throughout this manuscript). The interaction can be explained by an effect of modality for metaphors, F(1,21) = 15.8, p = 0.001, but not for literal or anomalous sentences, ps > 0.05. Planned comparisons revealed that metaphorical sentences (M = 0.68, SD = 0.13) were processed less accurately than either literal (M = 0.78, SD = 0.09, t(21) = 3.0, p = 0.006) or anomalous (M = 0.83, SD = 0.11, t(21) = 3.2, p = 0.005) sentences.

A 2 (modality) × 3 (figurativeness) repeated measures ANOVA using the Huynh-Feldt sphericity correction was conducted on mean reaction times revealing only an effect of figurativeness, F(2,42) = 8.3, p = 0.001, ε = 0.98. Planned comparisons revealed that metaphorical sentences (M = 855 ms, SD = 201 ms) were processed more slowly than either literal (M = 772 ms, SD = 245 ms, t(21) = 2.7, p = 0.01) or anomalous (M = 705 ms, SD = 182 ms, t(21) = 3.9, p = 0.001) sentences.

### Electrophysiological N400

A 2 (modality) × 3 (literal/metaphor/anomalous) × 8 (scalp region) repeated measures ANOVA using the Huynh-Feldt sphericity correction was conducted to assess differences in N400 amplitude. This revealed a main effect of figurativeness, F(2,42) = 24.0, p < 0.001, ε = 0.79, a main effect of scalp region, F(7,147) = 3.6, p = 0.025, ε = 0.37, and a trending modality × scalp region interaction, F(7,147) = 2.5, p = 0.063, ε = 0.46.

**Figure 2** shows the largest N400 amplitudes were for anomalous sentences, followed by metaphorical sentences, t(21) = 3.9, p = 0.001, followed by literal sentences, t(21) = 4.8, p < 0.001. The modality × scalp region interaction reflects larger N400 amplitudes for motion than auditory sentences, with significant differences in the Left Center, t(21) = 2.1, p = 0.047, Left Posterior, t(21) = 2.5, p = 0.02, and Center Posterior, t(21) = 2.3, p = 0.03 scalp regions (**Figure 3**).

To determine whether the figurativeness effect included a difference between literal and metaphorical sentences, the analysis was repeated without the anomalous condition, revealing a similar pattern with a main effect of figurativeness, F(1,21) = 18.0, p < 0.001, a main effect of scalp region, F(7,147) = 2.8, p = 0.055, ε = 0.37, and a modality × scalp region interaction, F(7,147) = 2.7, p = 0.049, ε = 0.47. No other effects or interactions in either analysis were observed, ps > 0.05.

### P600

Visual inspection of the findings suggested possible effects in the 500–650 ms time window, which we called the P600. To investigate this possibility, the same two analyses were conducted for the P600 amplitude. The first analysis revealed a main effect of figurativeness, F(2,42) = 4.9, p = 0.012, ε = 1.0 (see **Figure 3**), a main effect of scalp region, F(7,147) = 9.7, p < 0.001, ε = 0.47, and a modality × scalp region interaction, F(7,147) = 3.5, p = 0.024, ε = 0.38, with no other effects or interactions, ps > 0.05. **Figure 2** shows that anomalous sentences had a larger P600 amplitude than metaphorical sentences, t(21) = 2.2, p = 0.04, but metaphor sentences did not differ from literal sentences, p > 0.28.

With the anomalous condition removed, only an effect of scalp region was found, F(7,147) = 9.6, p < 0.001, ε = 0.43. Paired sample t-tests revealed no differences between auditory and motion sentences at any of the eight scalp regions, ts > 0.13. Similar to the N400 pattern, the modality × scalp region interaction reflects possibly larger P600 amplitudes for motion than auditory sentences in the Left Center and Left Posterior scalp regions, with few differences elsewhere (**Figure 3**).

### Confounding Factors

The main effect of figurativeness for both the N400 and P600 may be confounded by the familiarity or imageability of the sentences. To explore this possibility, we created two levels of familiarity and imageability by performing a median split on the previously normed ratings. Separate 2 (high, low) × 8 (scalp region) ANOVA analyses using the Huynh-Feldt sphericity correction revealed significant effects for both the N400 (Familiarity, F(1,21) = 33.3, p < 0.001; Imageability, F(1,21) = 15.2, p = 0.001) and P600 (Familiarity, F(1,21) = 5.8, p = 0.03; Imageability, F(1,21) = 4.9, p = 0.04).
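The median split described above can be sketched as follows. The ratings are hypothetical placeholders rather than the normed values used in the study, and assigning items that fall exactly at the median to the "low" group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def median_split(ratings):
    """Label each item 'high' or 'low' relative to the median rating."""
    cut = median(ratings.values())
    # Assumption: items exactly at the median are placed in the 'low' group
    return {item: ("high" if score > cut else "low")
            for item, score in ratings.items()}

# Hypothetical familiarity ratings keyed by sentence ID (not the study's norms)
familiarity = {"s1": 2.1, "s2": 5.4, "s3": 3.3, "s4": 6.0, "s5": 1.8, "s6": 4.7}
groups = median_split(familiarity)
high_items = sorted(item for item, label in groups.items() if label == "high")
low_items = sorted(item for item, label in groups.items() if label == "low")
print(high_items, low_items)
```

The resulting two-level factor (high, low) is what would then be crossed with scalp region in the 2 × 8 analyses.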

last word in auditory and motion sentences (anomalous excluded). For the N400, black rectangles indicate the Left Center, Left Posterior, and Center Posterior regions which had statistical differences between auditory and motion conditions, ps < 0.05. For the P600, black rectangles indicate the Left Center and Left Posterior regions which had differences approaching significance (ps < 0.18); the location × condition interaction was significant.

# Discussion

The current study explored the effect of modality on metaphor processing. We used ERPs to compare the processing of motion (The partnership was a financial tailspin) and auditory (His emails were an insistent knock) unfamiliar metaphors to literal and anomalous sentences using the same final word. We hypothesized a difference in the neural basis of motion compared to auditory metaphors. As predicted, we found a modality by scalp region interaction for the N400, and we discovered the same interaction for the P600. There were no interactions with figurativeness. These results support embodied views of language and suggest that metaphorical language is not qualitatively distinct from language in general. They also support the view that integration of modality and language information may be taking place in the 400 ms timeframe and later.

# Modality

This study suggests different neural processing of auditory and motion-based literal and metaphorical language for the N400 timeframe and also for the later P600 timeframe. Both components index various aspects of language processing. The N400 response to language stimuli represents aspects of semantic processing, including the possible building of a multimodal conceptual representation. The P600 is thought to underlie a revision process that occurs as more information is accounted for during the process of sentence comprehension (Kutas and Federmeier, 2011). Sensory-motor aspects of meaning may be accessed as early as 200 ms (Boulenger et al., 2012). The present findings suggest modality information is still processed and integrated in the 350--650 ms time window with two processes represented by the N400 and P600.

Many behavioral studies have demonstrated a link between the metaphorical use of language and sensory or motor processes, including novel sensory metaphors (the past is heavy) (e.g., Slepian and Ambady, 2014), conventional sensory metaphors (anger is heat) (Wilkowski et al., 2009), or conventional motion metaphors (love is a journey) (Gibbs, 2013b). Sensory-motor regions of the brain have recently been shown to be activated in response not only to sensory-motor words but also to those words used metaphorically (e.g., Cacciari et al., 2011; Lacey et al., 2012; Desai et al., 2013).

These studies link motor and language processing but do not provide information about the timing or nature of the link. Studies using EEG or MEG demonstrate activation of the motor cortex within 200 ms after the presentation of a word depicting action (Hauk and Pulvermüller, 2004). N400 effects have been found for the processing of visually perceived motion (Proverbio and Riva, 2009) and for the processing of a new meaning grounded in perception or action such as paddling a canoe with a Frisbee (Chwilla et al., 2007). The present findings extend these reports to literally and metaphorically used motion and auditory words presented in sentences. Our effects in the 350--650 ms timeframe suggest the integration and revision processes indexed by the N400 and P600 are likely to occur for both literal and metaphorical sentences with motion and auditory sensory-motor components in a later timeframe. Thus modality information continues to be processed during this time. This result is consistent with views that suggest the embodiment of language is not automatic and instant (Mahon and Caramazza, 2008; Rueschemeyer et al., 2010; Gibbs, 2013a) while not supporting an amodal view of language. (But see Mahon and Caramazza, 2008, who suggest that the activation of the literal meaning of metaphors during comprehension may be sufficient to modulate modality-specific processes, although such processes may not be required for comprehension.) Since the effect existed for both literal and metaphorical sentences, metaphorical language may not be qualitatively distinct from language in general.

# Figurativeness

The current findings demonstrate a graded N400 effect with the amplitude of the N400 increasing from literal to metaphorical to anomalous sentences, consistently found across metaphor ERP studies (e.g., Arzouan et al., 2007; Lai et al., 2009). We also found a similar graded effect for the P600 in the 500--650 ms time range. Because our literal sentences were more imageable and familiar than our metaphorical sentences, it is probable that these factors can partially or completely account for our findings (Lee and Federmeier, 2008; Schmidt and Seger, 2009). Indeed, a median split based on these factors revealed significant differences in both the N400 and P600. The confounding by familiarity and imageability may need to be considered in comparisons between literal and metaphorical stimuli (Schmidt et al., 2010). ERP studies reporting a difference between literal and metaphorical stimuli, including ours, either do not mention matching familiarity between the sentences or if they do, do not balance the sentence types on familiarity. In these cases, metaphorical sentences are reported to be or appear to be less familiar than literal sentences. Indeed, when the metaphors are highly familiar or conventional, N400 differences between literal and metaphorical sentences are not always present (e.g., Balconi and Amenta, 2010). Studies reporting metaphor--literal differences in the N400 have also not addressed the imageability of the sentences used (Coulson and Van Petten, 2002, 2007; Kazmerski et al., 2003; Arzouan et al., 2007; Lai et al., 2009; Gold et al., 2010; Goldstein et al., 2012; Tzuyin Lai and Curran, 2013). Similarly, our metaphorical sentences were less imageable than our literal sentences.

# Conclusion

We report here the first ERP study of motion and auditory based metaphors. Our findings are consistent with the conclusion that the modality of the metaphor may influence its neural instantiation. The current findings also suggest that integration of modal and amodal meanings may be taking place during the N400 and P600 timeframes. Additional work is required to understand the exact nature of this integration. Further exploration of the interaction between the factor of modality on one hand and imageability and familiarity on the other hand is also warranted.

# Author Statement

GS, EB and SA were involved in the development of stimuli, initial design of the experiment and pilot testing. GS, AD, EB, and SA revised the study design based on pilot testing and finalized the selection of stimuli. AD and EB acquired the data. AD, EB and GS analyzed and interpreted the data. GS and AD wrote the first draft of the manuscript. GS, AD, EB, and SA contributed to and approved of the final manuscript.

# Acknowledgments

The authors thank John Shaughnessy, Audrey Weil, Elizabeth Burks, Rebecca Kresnak, Elizabeth Fast, and Nathan Axdorff for their assistance with various aspects of this work.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Schmidt-Snoek, Drew, Barile and Agauas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Action verbs are processed differently in metaphorical and literal sentences depending on the semantic match of visual primes

# *Melissa Troyer\*, Lauren B. Curley, Luke E. Miller, Ayse P. Saygin and Benjamin K. Bergen*

*Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA*

### *Edited by:*

*Vicky T. Lai, Max Planck Institute for Psycholinguistics, Netherlands*

### *Reviewed by:*

*Leonardo Fernandino, Medical College of Wisconsin, USA; Marcus Perlman, University of Wisconsin, Madison, USA*

#### *\*Correspondence:*

*Melissa Troyer, Department of Cognitive Science, 9500 Gilman Dr., La Jolla, CA 92093, USA e-mail: mtroyer@ucsd.edu*

Language comprehension requires rapid and flexible access to information stored in long-term memory, likely influenced by activation of rich world knowledge and by brain systems that support the processing of sensorimotor content. We hypothesized that while literal language about biological motion might rely on neurocognitive representations of biological motion specific to the details of the actions described, metaphors rely on more generic representations of motion. In a priming and self-paced reading paradigm, participants saw video clips or images of (a) an intact point-light walker or (b) a scrambled control and read sentences containing literal or metaphoric uses of biological motion verbs either closely or distantly related to the depicted action (walking). We predicted that reading times for literal and metaphorical sentences would show differential sensitivity to the match between the verb and the visual prime. In Experiment 1, we observed interactions between the prime type (walker or scrambled video) and the verb type (close or distant match) for both literal and metaphorical sentences, but with strikingly different patterns. We found no difference in the verb region of literal sentences for Close-Match verbs after walker or scrambled motion primes, but Distant-Match verbs were read more quickly following walker primes. For metaphorical sentences, the results were roughly reversed, with Distant-Match verbs being read more slowly following a walker compared to scrambled motion. In Experiment 2, we observed a similar pattern following still image primes, though critical interactions emerged later in the sentence. We interpret these findings as evidence for shared recruitment of cognitive and neural mechanisms for processing visual and verbal biological motion information. Metaphoric language using biological motion verbs may recruit neurocognitive mechanisms similar to those used in processing literal language but be represented in a less-specific way.

**Keywords: sentence processing, verbal semantics, point-light walkers, biological motion, metaphor**

# **INTRODUCTION**

A central question in the cognitive neuroscience of language is how meaning is represented and accessed in the brain during language comprehension and production. It has been well established that language and perception interact in the brain and in behavior. For instance, even after the presentation of a single word, semantically related words are processed more quickly (i.e., primed), possibly due to spreading activation of features shared between the two words (Anderson, 1983), and semantic priming occurs across mixed input and target modalities, such that pictures can also prime related words, and vice versa (Sperber et al., 1979). In the domain of action perception, a great deal of research supports the notion that visually perceiving actions and processing language about actions rely on overlapping representations in the mind and brain (Barsalou, 1999; Glenberg and Kaschak, 2002; Bergen et al., 2007). Here, we set out to explore the relationship between processing language about biological motion and visually perceiving it. To this end, we asked how a visual prime depicting biological motion would affect reading times of literal and metaphorical language containing action verbs with either a close or more distant semantic relationship to the prime.

Literal meaning is dependent not only on information stored in semantic memory but also on the physical context, including representations that are grounded in perception and action (Barsalou, 1999). Glenberg and Kaschak (2002) found that when people processed linguistic information about motion directed toward the self (e.g., *Close the drawer*), compared to away from the self (e.g., *Open the drawer*), they were faster to respond using a button in that direction (that is, closer to the self, compared to farther away). A subsequent study investigating this *action compatibility effect* (ACE) further probed its timing and found that when participants did not know which response button they would use prior to reading such sentences, compatibility effects were no longer present (Borreggine and Kaschak, 2006). These findings minimally suggest that processing language about actions can rely upon representations overlapping with those used to process the physical actions themselves but that the timecourse of such activations is sensitive to context.

Other studies have shown that literal language about objects and space can both facilitate (e.g., Stanfield and Zwaan, 2001) and inhibit (e.g., Richardson et al., 2003; Bergen et al., 2007, 2010) similar behavioral responses, largely dependent on the timing between primes and the response measure. Stanfield and Zwaan (2001) had participants read sentences about objects which implied an object's orientation (either horizontal, e.g., *John put the pencil in the drawer*, or vertical, e.g., *John put the pencil in the cup*). Next, participants were shown a picture of an object in either a horizontal or vertical orientation and were asked whether it had been mentioned in the preceding sentence. They observed that participants responded more quickly when the object matched the implied orientation of the object mentioned in the sentences.

Bergen et al. (2007) investigated the extent to which visual imagery is used in understanding both literal and metaphorical language and found interference-type effects. After reading a short sentence, participants categorized a shape presented in either the upper or lower part of the screen. When the sentence contained concrete verbs or nouns which were semantically associated with the concepts "up" (e.g., *The cork rocketed*) vs. "down" (e.g., *The glass fell*), participants were systematically slower to make a decision about an object presented in the associated part of the screen. These findings support the hypothesis that participants use visual imagery to process literal language about space. Taken together with other behavioral results suggesting that people use mental imagery or simulations when processing information about which effector, or body part, is being used in an action (Bergen et al., 2010), the shape of an object described (Stanfield and Zwaan, 2001), and the axis along which motion occurs (Richardson et al., 2003), these findings suggest people use partially overlapping representations for processing information about physical action and space and linguistic content about action and space.

Functional imaging studies have demonstrated support for this conclusion through comparisons of brain regions involved in sensory perception (e.g., tactile, visual) and motor function with the brain regions involved in processing language whose meaning may be derived using these modalities (Just et al., 2004; Boulenger et al., 2009; Moody and Gennari, 2009; Saygin et al., 2010; Willems et al., 2010). In the domain of motor output, Moody and Gennari (2009) found that a region in premotor cortex was sensitive to the degree of real-world effort required for the action described by a verb in a particular context (e.g., pushing a piano requires more effort than pushing a chair). Interestingly, the anterior inferior frontal gyrus, a region thought to be involved in semantic processing more generally (Demb et al., 1995; Kuperberg et al., 2008) (but not in motor processing, specifically), was also sensitive to the degree of effort. Furthermore, action words read on their own (e.g., *kick*, *lick*, etc.) have been shown to preferentially activate motor regions corresponding to the particular effector (Hauk et al., 2004). Using transcranial magnetic stimulation (TMS) to target specific areas of the motor strip, Pulvermüller et al. (2005) also observed selective interference for effector-specific regions in the motor strip for words related to those effectors. These findings suggest a role for the motor system in more general processing of linguistic content about action.

As for perceptual systems, Just et al. (2004) showed that, compared to low imageability sentences, highly imageable language (e.g., sentences like *The number eight when rotated 90 degrees looks like a pair of spectacles*) modulated activity in the intraparietal sulcus, a brain region thought to play a role in visual attention (Wojciuluk and Kanwisher, 1999) and visual working memory (Todd and Marois, 2004). Higher visual areas such as V5/MT+ have also been implicated in processing language about visual motion (Saygin et al., 2010), though others have found this region either to be unaffected by processing high-motion verbs compared to low-motion (Bedny et al., 2008) or implicated only in a minority of subjects for motion compared to static verbs (Humphreys et al., 2013).

This work converges to suggest that motor production and visual perception of motion are candidates for neurocognitive processes which may contribute to comprehending language about motion. It is now well-established that visually processing others' actions relies on neural mechanisms overlapping with those used for language processing (Rizzolatti et al., 2001; Arbib, 2005). In the present study, we focused in particular on *biological* motion, which refers to the characteristic movement patterns of animate entities as well as specific stimuli used in vision science to study its perception (see below; Johannson, 1973; Blake and Shiffrar, 2007). Producing and understanding biological motion are important functions for humans, and humans are sensitive to perceiving biological motion even when cues are relatively minimal, as with the point-light displays used in vision science (Johannson, 1973), which are animations showing only points placed over key joints of a moving person. Humans exhibit robust perception of biological motion even in degraded conditions (see Blake and Shiffrar, 2007, for a review).

Studies of biological motion perception using point-light walkers have most consistently implicated the posterior superior temporal sulcus (pSTS) as a key region (Grossman et al., 2000, 2005; Vaina et al., 2001; Beauchamp et al., 2003; Pelphrey et al., 2003; Puce and Perrett, 2003; Saygin, 2007; Vangeneugden et al., 2011; van Kemenade et al., 2012; Gilaie-Dotan et al., 2013). In addition, Kemmerer et al. (2008) found that the pSTS was specifically recruited more during silent reading of biological motion verbs similar to *run*, e.g., *jog*, *walk*, compared to other types of action verbs, implicating involvement of this region in both perception of visual biological motion and comprehension of language about such motion. Studies have also shown involvement of other regions in processing biological motion including parietal cortex, body and motion-sensitive visual areas in lateral temporal cortex (EBA, MT+), other areas in temporal and occipital cortex, and the cerebellum (Vaina et al., 2001; Grossman and Blake, 2002; Servos et al., 2002; Saygin et al., 2004b; Nelissen et al., 2005; Jastorff et al., 2010; Sokolov et al., 2012; Vangeneugden et al., 2014).

Importantly, regions that overlap with classical language areas in inferior frontal/ventral premotor cortex (PMC) have been linked to biological motion processing (Saygin et al., 2004b; Saygin, 2007; Pavlova, 2012; van Kemenade et al., 2012; Gilaie-Dotan et al., 2013). One interpretation of the role of frontal regions is that when people view point-light walkers, they recruit their own motor resources for performing the action, as suggested by proponents of embodied cognition and the "mirror neuron" system (Rizzolatti et al., 2001). That the premotor cortex, a brain region that supports the production of motor acts, is activated during perception of point-light walkers supports the notion that this process of motor simulation occurs even when the action is depicted via motion cues (Saygin et al., 2004b). The relationship between motor processing and biological motion perception is not purely correlational: disruption of processing in these regions due to brain injury or virtual lesions induced by TMS leads to deficits in biological motion perception (Saygin, 2007; van Kemenade et al., 2012). Furthermore, activation in these areas can be modulated in those who are experts at performing actions they are viewing, as in the case of professional dancers (Calvo-Merino et al., 2005; Cross et al., 2006). In the general population, individual differences in biological motion are predicted by individual levels of motor imagery (Miller and Saygin, 2013). Given the overlap between frontal regions involved in processing of visual biological motion on the one hand and language comprehension on the other, it is likely that language comprehension may benefit from recruitment of neurocognitive processes also involved in visual perception of motion.

There is therefore evidence that regions involved in both motor execution and visual biological motion perception can be recruited in understanding the semantics of literal language. Another active area of inquiry is the processing of metaphorical meaning. The range of linguistic use and experience extends far beyond the most literal uses of verbs like *throwing* and *walking* to uses in figurative and metaphorical contexts. Conceptual metaphor theory (Lakoff, 1993) proposes that metaphors are understood through a mapping from more concrete source domains (for instance, space) to more abstract target domains (for instance, time, as in, *Time is flying by*). This theory leads to predictions that the cognitive and neural representation of concrete verbs will be activated when they are used metaphorically due to the mapping from the source to target domains. For instance, in the sentence, *The movie was racing to its end*, the verb *racing* might activate brain regions involved in motor execution of running or racing.

Studies directly comparing literal and metaphorical language to investigate whether brain regions and cognitive processes involved in sensory and motor processing are recruited equally for both types of language have found mixed results. Bergen et al. (2007) observed null effects when investigating metaphorical language about space in their spatial interference paradigm. Unlike literal language, where interference effects were observed, metaphorical language related to the spatial concepts "up" (*The numbers rocketed*) and "down" (*The quantity fell*) did not lead to interference when participants made decisions about objects located in the upper or lower halves of a computer screen, respectively. These findings suggest no (or lesser) overlap for spatial representations and metaphorical language about space (compared to literal language about space).

However, other studies have provided evidence that non-literal uses of language about space and motion do recruit more general perceptuo-motor representations. For instance, Matlock et al. have argued that when participants read sentences involving fictive motion describing static events using motion verbs, they mentally simulate motion despite no implication of a physical change (Matlock, 2004; Richardson and Matlock, 2007). For instance, Matlock (2004) had participants read stories ending in a sentence containing a fictive motion verb (e.g., *The road runs through the valley*) and make judgments about whether the sentence made sense given the context. For fictive motion sentences (but not for literal sentences in control experiments, which did not contain fictive motion), the time to make the decision was dependent on properties of the sentence including the speed of travel, the distance traveled, and the ease or difficulty of terrain. These findings minimally suggest that even for language that doesn't imply true motion, individuals access motion-like properties when processing fictive motion verbs.

In a study of metaphorical uses of motion verbs, Wilson and Gibbs (2007) looked at how quickly phrases were read following real or imagined movements made by participants. Across two experiments, participants memorized a set of actions to be performed (Experiment 1) or imagined (Experiment 2) when they viewed particular symbols (e.g., the symbol "and" was paired with the action *push*). Then, they performed a task in which they viewed a symbol, either performed or imagined the action, and then read metaphorical language either related (e.g., *push the argument*) or not (e.g., *stamp out a fear*) to the action. For both performed and imagined actions, participants were faster to read phrases (as measured by the response time of a button press after reading the phrase) when the previous action was congruent with the verb. Wilson and Gibbs interpreted these findings as evidence that processing of metaphorical uses of action verbs relies on representations shared with executing their literal meanings physically. However, if individuals access lexical representations of verbs associated with performed or imagined actions (e.g., activating the word *push* while performing the action of pushing), this alone could be sufficient to prime reading of the phrase *push the argument* even if participants do not activate such representations during normal metaphorical language comprehension. An additional limitation of this study is that it provided no literal comparison (e.g., reading phrases like *push the cart*, or even reading bare verbs like *push*). Such comparisons would be informative as to the extent to which such concepts (like *push*), when used metaphorically, recruit representations overlapping with those used in motor executions.

In the neuroimaging literature, evidence that metaphorical language recruits perceptuo-motor representations is also mixed. Some studies have found no evidence that processing metaphorical language about particular body parts, for instance, recruits brain regions involved in moving those body parts (Aziz-Zadeh et al., 2006; Raposo et al., 2009). One study using word-by-word reading of language (rather than whole-sentence reading) observed a relationship between both literal and idiomatic sentences involving different parts of the body and the corresponding somatotopic regions of the motor strip (Boulenger et al., 2009). For instance, sentences like *John grasped the object/idea* elicited stronger activity in the finger areas of motor strip while sentences like *Pablo kicked the ball/habit* elicited stronger activity in the foot areas of motor strip. However, this study involved a limited number of stimuli in each condition with many repetitions across the experiment, which limits the conclusions that can be drawn about typical sentence comprehension.

Other imaging studies have investigated fictive motion sentences (such as *A crack was running along the wall* or *The pipe goes into the house*), with mixed results. One study found no difference in activations in visual motion perception regions for language describing fictive motion and actual motion (Wallentin et al., 2005). Saygin et al. (2010) individually located motion-sensitive areas as well as face-sensitive areas in each participant and compared brain activity in real motion, fictive motion, and static sentences which were presented as audiovisual movies of a person speaking the sentences. They observed a gradient pattern: actual motion sentences elicited the greatest amount of activity in visual motion perception areas, followed by fictive motion sentences and finally static sentences. This pattern of activity was not observed in face-sensitive areas, showing that the effect was indeed related to motion semantics. In another study, literal motion sentences also modulated activity in motion-sensitive areas for American Sign Language, suggesting that the concurrent recruitment of motion processing mechanisms during language processing does not abolish the effect (McCullough et al., 2012). These findings suggest that both literal and figurative language about motion recruit brain regions involved in visual motion perception, though the latter does so to a lesser extent than the former.

Bergen (2012) suggests that mixed results in studies investigating recruitment of perceptuo-motor representations during metaphorical language comprehension may be due to the timecourse of processing. In many studies investigating metaphor (e.g., Aziz-Zadeh et al., 2006; Raposo et al., 2009), whole sentences are presented concurrently to the participant, a procedure which may not maintain the temporal precision necessary to observe effects. In other studies using either shorter phrases or word-by-word presentation of metaphorical language, effects have been observed both behaviorally (Wilson and Gibbs, 2007) and in neuroimaging data (Boulenger et al., 2009). Consider the following metaphorical sentence, an example from our materials: *The story was ambling toward its conclusion.* In natural spoken language, at the point of processing the word *ambling*, the listener may already have enough information (from having heard, first, *The story*) to know that the literal use of *ambling* is inappropriate; stories are abstract and have no legs. However, there are certain similarities that might be mapped out between *stories*, which cannot amble, and *people*, who can. A story takes place on a timeline, and people perceive a timeline through which life moves forward. There are therefore systematic relationships between the word *ambling* used metaphorically and *ambling* used literally. However, in the case of this metaphorical sentence, it is possible that the full metaphorical meaning might not be understood until the end of the sentence (after having heard *toward its conclusion*). That is to say, incrementally comprehending the meaning of such a metaphor might operate rather differently than incrementally comprehending the meaning of a similar literal sentence (e.g., *The teacher was ambling toward the school*). 
Further, while metaphorical language can recruit brain regions thought to underlie perceptuo-motor representations relevant for the literal use of the language, evidence suggests that these representations may not be as strongly activated for metaphorical language as for literal language (Saygin et al., 2010).

Here, we examined the extent to which processing visual biological motion affected the processing of motion verbs used in both metaphorical and literal contexts. We used short videos of point-light walkers (Johansson, 1973), which display representations of human biological motion, as primes. To determine the effect that recently viewing physical walking motion would have on processing closely-matching verbs (such as *ambling*, *walking*) compared to more distantly-matching verbs (such as *leaping*, *catapulting*), we also used a control motion condition with inverted, scrambled versions of the point-light walkers. A novel contribution of our approach is that the study fully crosses literal and metaphorical language (using identical verbs) with prime type (presence or absence of an action prime) and match-type of the verb use. Following other behavioral and neuroimaging research, we expected to see facilitation following the point-light walker during self-paced reading for literal sentences containing closely-matching verbs compared to those containing distantly-matching verbs. As for our metaphorical sentences, which used the same verbs as the literal sentences but in different contexts, we predicted that if processing visual biological motion is less involved (on average) when reading metaphorical language, we would observe a smaller difference in (or absence of) facilitation between closely-matching and distantly-matching verbs. Such findings would also indicate that processing metaphorical verbs might rely upon a less precise representation of the motion described by the verb.

### **EXPERIMENT 1**

### **METHODS**

### *Norming studies*

We conducted a norming study to prepare experimental materials. We created a set of 18 verbs intended to be descriptive of our point-light walker (this was called the Close-Match set) along with a set of 18 verbs intended to be much less descriptive of this action (this was called the Distant-Match set). For instance, *ambling* was included as a proposed Close-Match verb, and *catapulting* was included as a proposed Distant-Match verb. All Close- and Distant-Match verbs denoted biological motion and could describe an individual moving unidirectionally along a path. In addition, nine Control verbs were included in the norming study. These verbs also described biological motion, though of a very different nature from motion along a path (e.g., *shoving* and *sitting*). These verbs were included to provide variety in the set of verbs and also to act as a baseline comparison, as biological motion verbs that should be the least likely to match the video.

Five volunteers rated all verbs for how closely they described the action being performed in a short video clip (the point-light walker; see below) on a scale of 1 (least similar to video clip) to 7 (most similar to video clip). For all five participants, verbs were presented in alphabetical order.

As predicted, Close-Match verbs were rated higher (*M* = 5.76, *SD* = 1.09) than Distant-Match verbs (*M* = 1.511, *SD* = 0.94), *t*(4) = 10.34, *p* < 0.001. The nine highest-rated Close-Match verbs were chosen to be included as the verbs in the sentences used in the study. All of these verbs equaled or exceeded the mean of the total set (ratings for each provided in **Table 1**). Nine of the lowest 10 Distant-Match verbs were also included [one verb, *leapfrogging*, was discarded due to its low frequency, as determined by ratings from the MRC Psycholinguistic Database (Wilson, 1988)]. As for the control verbs (*M* = 1.64, *SD* = 1.05), these were not rated differently from the Distant-Match verb group, but were rated lower than the Close-Match verbs, *t*(4) = 10.49, *p* < 0.001.
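The reported degrees of freedom (4, with five raters) are consistent with a paired comparison over per-rater mean ratings. As a minimal sketch under that assumption, the statistic can be computed as below; the function name `paired_t` and the ratings are hypothetical illustrations, not the study's data.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD uses n - 1).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se, len(diffs) - 1


# Hypothetical per-rater mean ratings on the 1-7 scale (NOT the study's data).
close_match = [5.9, 5.4, 6.1, 5.6, 5.8]
distant_match = [1.4, 1.7, 1.3, 1.8, 1.5]

t_stat, df = paired_t(close_match, distant_match)
print(f"t({df}) = {t_stat:.2f}")
```

With five raters the test has 4 degrees of freedom, matching the form of the statistics reported above.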

Using the 18 verbs from the first set of norms, two sentences were constructed for each verb, one literal and one metaphorical. Four volunteers rated these sentences for how natural they sounded on a scale from 1 (least natural) to 7 (most natural). In addition to the 36 experimental items (18 literal, 18 metaphorical), we included 12 Filler sentences to be used in the study as well as 12 additional implausible sentences which were only used for the norming study. Filler sentences had the same structure as experimental sentences, except that the verb was not a biological motion verb (e.g., *The teenager was learning in the classroom.*). Implausible sentences were included for variety and so that participants saw sentences specifically designed to elicit low ratings (e.g., *The bread was baking the chef.*). Crucially, both the literal (*M* = 5.96, *SD* = 1.40) and metaphorical (*M* = 4.74, *SD* = 1.43) sets were rated significantly higher than the implausible (*M* = 1.88, *SD* = 1.14) items, *t*(3) = 5.16, *p* < 0.05 and *t*(3) = 3.60, *p* < 0.05, respectively. A smaller (but reliable) difference was observed between the metaphorical and literal sets, *t*(3) = 20.37, *p* < 0.001. This difference is important to consider in interpreting any overall differences in reading times for literal and metaphorical sentences observed in the reading time experiment. Numerically, filler sentences were rated the most natural (*M* = 6.56, *SD* = 0.85) and were rated higher than the metaphorical sentences [*t*(3) = 4.96, *p* < 0.05], but not the Literal sentences.

**Table 1 | Ratings for Close-Match and Distant-Match verbs used in the experiment.**


### *Participants*

Participants were 39 undergraduate students, ages 18–34 (*M* = 22, 27 female), at the University of California, San Diego. All participants reported that they were native English speakers and gave informed consent for the study, which was approved by the University of California, San Diego Institutional Review Board. Participants received partial course credit for participating in the experiment.

### *Materials*

*Visual primes.* Stimuli were presented on a CRT screen using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) for Matlab (refresh rate 60 Hz and screen resolution 1024 × 768 pixels).

Two visual primes were used: an intact, coherent point-light walker, and a scrambled point-light display. The intact point-light walker was taken from the stimulus set reported in Ahlström et al. (1997). This walker was created by videotaping a human actor walking in place (on a treadmill) and recording the joint positions (e.g., elbow, wrist) of the whole body. The walker was composed of 10 black dots against a white background (see **Figure 1**). The height of the walker subtended approximately 5.6° of visual angle when viewed at a distance of approximately 91 cm. Given the bias of English speakers (who read left to right) to prefer and conceive of actions proceeding from left to right (Chatterjee et al., 1995, 1999; Christman and Pinger, 1997; Chatterjee, 2001), the walkers were always facing to the right.

**[...] primes, and of the static prime images from Experiment 2, including both (C) Walker, and (D) Scrambled primes.**

To create a control prime, we inverted and spatially scrambled the individual points of the intact point-light walker so that the figure could no longer be seen as a person walking. For each trial, the starting position of each dot was pseudorandomly chosen within a rectangle subtending approximately 5.6° of visual angle viewed at a distance of 91 cm, in order to match the overall size and dimensions of the upright walker. This manipulation preserved important low-level features of the walker stimuli, including local motion information and point trajectories, while removing global motion information present in the intact walker. The final image appeared as a cluster of centralized dots following individual ellipsoidal paths, based on the same individual dot trajectories as the intact walker (see **Figure 1**). These stimuli have been used in previous studies of biological motion processing (Grossman and Blake, 2002; Saygin et al., 2004b; Miller and Saygin, 2013).
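The scrambling manipulation described above can be sketched as follows. This is a minimal illustration, not the stimulus code used in the study: the function name `scramble_walker`, the frame format (a list of frames, each a list of `(x, y)` dot positions), and the uniform sampling of start positions are all assumptions.

```python
import random


def scramble_walker(frames, box_width, box_height, seed=None):
    """Invert and spatially scramble a point-light display.

    Each dot keeps its own frame-to-frame displacement (flipped
    vertically), but starts from a pseudorandom position inside a
    box_width x box_height rectangle, so local motion is preserved
    while the global walker configuration is destroyed.
    """
    rng = random.Random(seed)
    n_dots = len(frames[0])
    starts = [
        (rng.uniform(0, box_width), rng.uniform(0, box_height))
        for _ in range(n_dots)
    ]
    scrambled = []
    for frame in frames:
        new_frame = []
        for d, (x, y) in enumerate(frame):
            x0, y0 = frames[0][d]
            dx, dy = x - x0, y - y0        # displacement from the dot's first frame
            sx, sy = starts[d]
            new_frame.append((sx + dx, sy - dy))  # minus dy: vertical inversion
        scrambled.append(new_frame)
    return scrambled


# Tiny hypothetical display: 2 dots over 2 frames.
demo = [[(0.0, 0.0), (1.0, 1.0)], [(0.5, 0.2), (1.0, 1.5)]]
print(scramble_walker(demo, box_width=10.0, box_height=5.0, seed=1))
```

Only the starting positions are constrained to the rectangle here; whether later frames were also clipped to it is not stated in the text.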

*Sentence materials.* Each participant read a total of 72 sentences: 36 experimental sentences as well as 36 Filler sentences. Twelve of the Filler sentences had been included in the second norming study, and 24 additional sentences were created so that the number of Filler sentences equaled the number of experimental sentences. Examples of sentences from each sentence condition (Literal, Metaphorical, and Filler) and Verb Match condition (Close-Match, Distant-Match) are presented in (1). Sentences were presented with center-screen self-paced reading, with regions indicated by slashes.


Statistics for number of syllables, lexical frequency, concreteness, and imageability for each critical word were computed separately for each sentence type (literal, metaphorical). Critical words were (1) the subject noun (e.g., *teacher*), (2) the verb (e.g., *walking*), and (3) the final noun of the prepositional phrase (e.g., *school*). Some nouns were compound, and in these cases, the nouns were not included in the analyses. Pairwise *t*-tests were then performed for each sentence type and critical word for Close- vs. Distant-Match conditions. These statistics are provided in Supplementary Tables 1, 2. The only difference based on a lexical variable was for imageability, which was higher for Distant-Match nouns compared to Close-Match nouns for metaphorical sentences at the first critical word (the subject noun). This noun was rated as more highly imageable in the Distant-Match condition (*M* = 560) than in the Close-Match condition (*M* = 441), *t*(6.935) = −3.4471, *p* < 0.05. All other tests revealed no differences between groups.

Each sentence was followed by a comprehension question, which the participant answered with *yes* or *no* using the keyboard. Half of the comprehension questions were correctly answered with *yes* and half with *no*.

### *Design and procedure*

The experiment design was a 2 (Prime Type: Walker or Scrambled) × 2 (Verb Match: Close-Match or Distant-Match) × 2 (Sentence Type: Literal or Metaphorical). The materials were pseudo-randomized across three lists, such that each participant read each sentence exactly once. The type of visual prime (Walker or Scrambled) preceding the sentence varied across the three lists, such that an intact walker preceded two thirds of the experimental items and a scrambled walker preceded one third of the experimental items. The reasoning behind this choice in design was that the content of the scrambled walkers should be dissimilar to both Close- and Distant-Match verbs. However, the proportion of Filler sentences preceded by a scrambled visual prime was two thirds, with one third being preceded by a walker visual prime, so that across the experiment, all participants saw a total of 36 sentences preceded by a walker prime and 36 sentences preceded by a scrambled prime.
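The prime proportions can be checked with a short sketch. The rotation scheme below (each item receives its minority prime in exactly one of the three lists) is an assumption made for illustration; the text specifies only the proportions, not how items were assigned. The function name `assign_primes` is hypothetical.

```python
def assign_primes(n_items, minority_prime, majority_prime, n_lists=3):
    """Rotate the minority prime across lists so that, in every list,
    one third of the items get minority_prime and two thirds get
    majority_prime (assumed scheme, for illustration only)."""
    return [
        [minority_prime if i % n_lists == k else majority_prime
         for i in range(n_items)]
        for k in range(n_lists)
    ]


# 36 experimental items: two thirds Walker, one third Scrambled.
experimental = assign_primes(36, "Scrambled", "Walker")
# 36 Filler items: two thirds Scrambled, one third Walker.
fillers = assign_primes(36, "Walker", "Scrambled")

for exp_list, fill_list in zip(experimental, fillers):
    combined = exp_list + fill_list
    print(combined.count("Walker"), combined.count("Scrambled"))
```

In each list, 24 + 12 sentences follow a Walker prime and 12 + 24 follow a Scrambled prime, giving the 36/36 balance described above.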

On each trial of the experiment, a crosshair appeared in the center of the screen for 3 s (followed by an inter-stimulus interval (ISI) of 1200 ms). Participants then saw a visual prime (for approximately 1000 ms) followed by a display of three dashes in the center of the screen. This was the cue that they could begin reading the sentence for that trial at their own pace by pressing the space bar. A button press was always followed by a delay of 200 ms before the next region was presented. After participants read all three regions, they were presented with a comprehension question, which they answered with yes or no by pressing the "z" or "m" key, respectively. Participants were then given feedback ("Correct" or "Incorrect").
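The fixed part of the trial timeline can be summarized in a small sketch. The durations and key mapping are taken from the description above; the constant and function names are hypothetical, and the prime duration is treated as exactly 1000 ms although the text says "approximately".

```python
# Fixed-duration events before the reading cue, in ms (from the text).
PRE_SENTENCE_EVENTS = [
    ("fixation crosshair", 3000),
    ("inter-stimulus interval", 1200),
    ("visual prime", 1000),  # "approximately 1000 ms" in the text
]

# Delay between a button press and the next region, in ms.
POST_KEYPRESS_DELAY_MS = 200

# Comprehension-question response keys ("z" = yes, "m" = no).
RESPONSE_KEYS = {"z": "yes", "m": "no"}


def time_before_cue_ms(events=PRE_SENTENCE_EVENTS):
    """Total fixed time from trial onset to the three-dash reading cue."""
    return sum(duration for _, duration in events)


def decode_response(key):
    """Map a key press to a yes/no answer; None for any other key."""
    return RESPONSE_KEYS.get(key.lower())


print(time_before_cue_ms())
print(decode_response("z"), decode_response("m"))
```

Everything after the cue is self-paced, so only the pre-sentence interval has a fixed duration.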

### *Analysis*

Statistical analyses used mixed-effects models incorporating both fixed and random effects and were performed using the lme4 package in the software program R. Fixed effects included Prime Type (Walker, Scrambled) and Verb Match (Close-Match, Distant-Match). Since each participant most likely displays some idiosyncratic behavior (for instance, some people may be faster readers than others), and since individual sentences may be more or less difficult to process (independent of their length), we treated both subject and item as random effects. In addition, phrase length (in number of characters) was included as a fixed effect in all models. For most analyses, we consider Literal sentences separately from Metaphorical sentences, but in overall analyses, we also analyzed effects of Sentence Type (Literal, Metaphorical, or Filler). Finally, Markov Chain Monte Carlo (MCMC) sampling was used to estimate *p*-values for fixed effects using the pvals.fnc command (languageR package) applied to the lme4 models in R.
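One way to write down a model of this kind for a single sentence region is sketched below. It is an illustration under stated assumptions rather than the exact specification used: it assumes random intercepts only (the text does not say whether random slopes were included), and the symbols are introduced here for convenience.

$$
\mathrm{RT}_{ij} = \beta_0 + \beta_1\,\mathrm{Prime}_{ij} + \beta_2\,\mathrm{Match}_{j} + \beta_3\,(\mathrm{Prime}_{ij} \times \mathrm{Match}_{j}) + \beta_4\,\mathrm{Length}_{j} + u_i + w_j + \varepsilon_{ij}
$$

$$
u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad w_j \sim \mathcal{N}(0, \sigma_w^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $\mathrm{RT}_{ij}$ is the reading time of participant $i$ on item $j$, $u_i$ and $w_j$ are the by-participant and by-item random intercepts, and $\beta_3$ is the Prime Type × Verb Match interaction.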

### **RESULTS**

### *Reading times by Sentence Type*

Mean reading times by region for each Sentence Type (Filler, Literal, Metaphorical) are displayed in **Figure 2A**. An analysis including Sentence Type (with Filler acting as the baseline in the model) and length of region as fixed effects and participant and item as random effects showed that there were differences based on Sentence Type in regions 2 and 3 (see **Table 2**). To examine these differences, we used follow-up mixed-effects linear regression models restricting comparisons to only two conditions at a time. Here, we found an increase in RTs for Metaphorical sentences in region 2 (the verb region) compared to both Literal sentences (β = 43.797, *SE* = 17.670, *t* = 2.479, *p* < 0.05) and Filler sentences (β = 56.195, *SE* = 15.28, *t* = 3.739, *p* < 0.001). Filler and Literal sentences did not differ in region 2. In region 3, Filler sentences led to faster reading times than both Metaphorical sentences (β = 102.316, *SE* = 22.568, *t* = 4.534, *p* < 0.001) and Literal sentences (β = 54.251, *SE* = 21.931, *t* = 2.474, *p* < 0.05), but only a marginal difference was observed between Literal and Metaphorical sentences at region 3 (β = 44.474, *SE* = 22.832, *t* = 1.948, *p* = 0.052).
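In linear mixed-effects output of this kind, the *t* value reported with each fixed effect is typically the estimate divided by its standard error. The sketch below recomputes that ratio for several of the comparisons reported above; the helper name `wald_t` and the short labels are introduced here for illustration.

```python
def wald_t(estimate, standard_error):
    """Return the t value as estimate / standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


# (label, beta, SE, reported t) taken from the comparisons reported above.
reported = [
    ("Metaphorical vs. Literal, region 2", 43.797, 17.670, 2.479),
    ("Metaphorical vs. Filler, region 3", 102.316, 22.568, 4.534),
    ("Literal vs. Filler, region 3", 54.251, 21.931, 2.474),
    ("Metaphorical vs. Literal, region 3", 44.474, 22.832, 1.948),
]

for label, beta, se, t_reported in reported:
    print(f"{label}: beta/SE = {wald_t(beta, se):.3f} (reported t = {t_reported})")
```

For these four comparisons the recomputed ratios agree with the reported *t* values to three decimal places.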

### *Literal sentences*

Reading times for Literal sentence regions by condition are displayed in **Figure 2B**, and model estimates are provided in **Table 3**. For Literal sentences, there were no effects or interactions of prime or match type at region 1, the noun phrase.

At region 2, the verb phrase, we observed an interaction of Verb Match and Prime Type (β = −129.880, *SE* = 50.522, *t* = −2.571, *p* < 0.05), with no main effect of either Verb Match or Prime Type. To interpret this interaction, we then fit mixed-effects linear regression models on pairs of conditions. Only one significant difference was observed, with the Distant-Match sentences being read more quickly at region 2 following Walker primes compared to Scrambled primes (β = −93.18, *SE* = 30.45, *t* = −3.061, *p* < 0.001). This suggests facilitation for less closely matching verbs following the viewing of a walker.
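A 2 × 2 interaction of this kind can be read as a difference of differences: the effect of the prime within Distant-Match sentences minus the effect of the prime within Close-Match sentences. The sketch below illustrates the arithmetic with hypothetical cell means (not values from the study); note that the sign and size of a model coefficient also depend on how the factors are coded, which the text does not specify.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 design.

    cell_means maps (prime, match) -> mean reading time in ms.
    Returns (contrast, prime_effect_close, prime_effect_distant).
    """
    prime_effect_close = (
        cell_means[("Walker", "Close")] - cell_means[("Scrambled", "Close")]
    )
    prime_effect_distant = (
        cell_means[("Walker", "Distant")] - cell_means[("Scrambled", "Distant")]
    )
    return (
        prime_effect_distant - prime_effect_close,
        prime_effect_close,
        prime_effect_distant,
    )


# Hypothetical cell means in ms (illustrative only; NOT values from the study).
example_means = {
    ("Walker", "Close"): 620.0,
    ("Scrambled", "Close"): 600.0,
    ("Walker", "Distant"): 560.0,
    ("Scrambled", "Distant"): 650.0,
}

contrast, close_effect, distant_effect = interaction_contrast(example_means)
print(close_effect, distant_effect, contrast)
```

In this made-up example the Walker prime speeds Distant-Match sentences (a negative prime effect) while slightly slowing Close-Match sentences, the same qualitative shape as the pairwise pattern described for Literal sentences.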

Finally, at region 3, the final prepositional phrase, we observed marginal main effects of both Prime Type (β = 87*.*607, *SE* = 48*.*639, *t* = 1*.*801, *p* = 0*.*072) and Verb Match (β = 138*.*852, *SE* = 72*.*416, *t* = 1*.*917, *p* = 0*.*056), and a significant interaction of match and Prime Type (β = −163*.*964, *SE* = 69*.*054, *t* = −2*.*374, *p <* 0*.*05). Interpretation of this interaction is not straightforward. We again performed mixed-effects linear regression models on pairs of conditions to interpret this interaction. Although numerically, the largest difference was within the Scrambled prime conditions, there was no significant difference between the Scrambled/Close-Match and Scrambled/Distant-Match conditions (*p* = 0*.*14). However, there were marginal differences between prime types both within the Close-Match conditions (β = 87*.*850, *SE* = 51*.*468, *t* = 1*.*707, *p* = 0*.*089) and Distant-Match conditions (β = −78*.*296, *SE* = 43*.*735, *t* = −1*.*790, *p* = 0*.*074). In Literal sentences with Close-Match verbs, reading times trended toward being shorter after the Scrambled primes, possibly suggesting interference for verbs that closely matched the walker prime. In Literal sentences with Distant-Match verbs, however, reading times trended toward being faster after a Walker prime, suggesting that sentences with verbs which matched less closely to the Walker prime may have been easier to comprehend.

### *Metaphorical sentences*

Reading times for Metaphorical sentences by region, Prime Type, and Verb Match type are displayed in **Figure 2B**, and model estimates and statistics are provided in **Table 4**. No significant effects or interactions of Prime Type or Verb Match were observed at region 1.

At region 2, an interaction of Prime Type and Verb Match was again observed (β = 132*.*30, *SE* = 61*.*00, *t* = 2*.*169, *p <* 0*.*05). To interpret this interaction, conditions were then compared by pair. Only one significant difference was observed, with the Distant-Match sentences being read more slowly at region 2 following Walker primes compared to the Scrambled primes (β = 99*.*14, *SE* = 45*.*77, *t* = 2*.*166, *p <* 0*.*05). This finding is the opposite of that observed in region 2 in the Literal condition, where the Distant-Match verb regions were read more quickly after Walker primes compared to the Scrambled primes.

At region 3, there was also a marginal interaction of Prime Type and Verb Match (β = 128.441, *SE* = 71.540, *t* = 1.795, *p* = 0.073). Subsequent paired tests showed that within only the Distant-Match conditions, region 3 was read more slowly in the Walker prime condition than in the Scrambled prime condition (β = 127.210, *SE* = 52.222, *t* = 2.436, *p* < 0.05). In addition, within the Walker prime conditions, the Distant-Match condition was read marginally more slowly than the Close-Match condition (β = 129.556, *SE* = 66.896, *t* = 1.937, *p* = 0.053).

**[...] for Literal and Metaphorical sentences.** Error bars represent standard errors of the mean. Interactions between Verb Match and Prime Type at the [...] The ∗ indicates a significant interaction of Match and Prime for that region (the reader is referred to the text for main effects and other statistics).

### **Table 2 | Model estimates and statistics for analysis by Sentence Type from Experiment 1.**

### **Table 3 | Model estimates and statistics for the Literal sentences.**

### **Table 4 | Model estimates and statistics for the Metaphorical sentences, Experiment 1.**

### *Comprehension questions*

Overall, comprehension question accuracy was high. Across all sentences (including fillers), mean accuracy was 96.70% (*SD* = 2*.*94%), with a range of 91.43–100%. There were no significant differences in accuracy by Sentence Type, Prime Type, or Verb Match type, nor any interactions of these variables. Mean accuracy for Filler sentences was 96.85% (*SD* = 3*.*42%); for Literal sentences, 96.42% (*SD* = 4*.*45%); and for Metaphorical sentences, 96.71% (*SD* = 4*.*62%).

### **DISCUSSION OF EXPERIMENT 1**

Experiment 1 set out to ask whether comprehending visual depictions of biological motion and processing (a) literal and/or (b) metaphorical verbal material about biological motion recruit overlapping neurocognitive representations. Our findings suggest that at least part of these representations may be shared across visual and literal verbal modalities as well as across visual and metaphorical verbal modalities, though the precise representations used for processing verbal material may differ depending on the specific type of language being used (i.e., literal vs. metaphorical).

In Literal sentences, reading times were speeded at the verb region (region 2) following intact (compared to scrambled) walker primes for verbs which only distantly matched the action depicted in the prime. That is, a prime video showing a point-light display of a human figure walking led to faster reading times for the verb region of sentences containing verbs which had been rated as dissimilar to the action depicted in the video (e.g., verbs like *vaulting* or *catapulting*). Furthermore, at a subsequent region of the sentence (a final prepositional phrase), sentences containing closely-matching verbs (like *ambling* and *strolling*) were read more slowly following intact walker primes (compared to scrambled). Following intact walkers, Close-Match verb sentences were also read more slowly than Distant-Match verb sentences. These findings suggest that processing literal language about biological motion may rely on similar representations as the visual depiction of similar actions, such that there is interference in reading language shortly following the processing of a video with closely-matching visual content.

As for the Metaphorical sentences, reading times were *slowed* at the verb region (region 2) following intact (compared to scrambled) walker primes for verbs which only distantly matched the action depicted in the prime. At the final region, there were also slower reading times for sentences containing Distant-Match verbs following intact (compared to scrambled) walker primes, and there were longer reading times for Distant-Match verb conditions compared to Close-Match verb conditions following intact walker primes. These findings show a reversed pattern, compared to the Literal sentences: here, it appears that less-closely-matching verbal content shows interference. Tentatively, these findings imply broader (or less precise) representations for biological motion verbs being used metaphorically. That is, given that metaphorical use of biological motion verbs led to an increase in reading times for Distant-Match verbs following intact walker primes, it appears that these verbs led to interference in understanding language. This would be expected if metaphorical uses of biological motion verbs activate broader swaths of semantic content in memory. For example, although the verb *jogging* might evoke action-specific representations when it occurs literally, it might activate more general motion representations when used metaphorically, including representations that overlap with the visual depiction of a walker.

These findings converge to suggest that there is an overlap in the representations for biological motion content in both verbal material (i.e., comprehending sentences) and visual material (i.e., processing the video). However, whether the shared representations across visual and verbal modalities are driven primarily by motion *per se* or whether they rely on information about form (e.g., of a human walker) remains an open question. That is, which aspects of the videos are important for representations that may be accessed during language processing? It may be the case that seeing a still image which implies the presence of biological motion may be sufficient to activate representations from memory that may be useful for processing related language about biological motion. Indeed, neuropsychological and TMS studies have shown that injury in ventral premotor cortex leads to deficits in processing actions with or without motion (Saygin et al., 2004a; Pobric and Hamilton, 2006; Saygin, 2007). Furthermore, Grossman and Blake (2001) reported that imagined biological motion recruits brain regions implicated in processing biological motion such as the pSTS. Experiment 2 addresses this question by attempting to replicate Experiment 1 (which used visual biological motion primes) and by extending the paradigm to include a second group of participants who viewed still image primes of similar walkers in order to ask whether the priming effects seen in Experiment 1 were primarily due to viewing motion, or whether they could be induced by static content, where motion is only implied.

### **EXPERIMENT 2**

In Experiment 2, we sought to replicate and extend the finding from Experiment 1 that the processing of literal and metaphorical language about biological motion was affected by visual processing of biological motion in different ways. We also attempted to tease apart potential effects of visual form and visual motion on language processing. To this end, we randomly assigned half of our participants to view primes that were videos of point light displays (identical to those used in Experiment 1), and the other half saw primes that were static images created from similar displays of either randomly displayed dots or point-light walkers. We predicted that if the visual image of the form of a walker alone is enough to imply motion, then we should see nearly identical processing of language following still images compared to motion videos. If, however, visual form is not sufficient to activate neurocognitive representations of motion that may be accessed in language comprehension, then we may observe no priming effects for the still images (or smaller priming effects). We were also interested in the extent to which the still images might have a different effect on metaphorical and literal language, compared to the motion video clips. For instance, more-specific representations of biological motion might be accessed when processing literal biological motion language (such as *The teacher was ambling toward the school.*), but less-specific representations of biological motion might be accessed when processing metaphorical biological motion language (like *The story was ambling toward its conclusion.*). If this is the case, then we might see priming for only the metaphorical sentences following static primes, but not for literal sentences.

### **METHODS**

### *Participants*

Participants were 59 undergraduates, ages 18–32 (*M* = 21, 48 female) at the University of California, San Diego. Thirty participants saw motion video primes and 29 participants saw still image primes. All participants reported that they were native English speakers, and gave informed consent for the study, which was approved by the Institutional Review Board. Participants received partial course credit for participating in the experiment.

### *Materials*

Stimuli were presented on a CRT screen using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) for Matlab (refresh rate 60 Hz and screen resolution 1024 × 768 pixels).

Visual prime materials for the participants who saw motion primes (i.e., the Motion group) were identical to those used in Experiment 1. The still images used in this experiment (in the Static group) were taken from similar point-light walker displays (Vanrie and Verfaillie, 2004) and edited in Inkscape so that they subtended approximately the same visual angle as the videos (5.6° of visual angle viewed at a distance of approximately 91 cm). Scrambled images were also taken from screen captures of scrambled and inverted versions of the upright walkers. As before, images were presented on the screen with a small amount of jitter (up to 0.4° of visual angle) to prevent visual adaptation between trials.
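For readers who want the stimulus sizes in physical units, visual angle can be converted to on-screen extent with the standard relation size = 2 · d · tan(θ / 2). The sketch below applies it to the values given above; the function names are hypothetical, and converting further to pixels would require the physical screen dimensions, which are not reported.

```python
import math


def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) subtending angle_deg at a viewing distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def visual_angle_from_size(size_cm, distance_cm):
    """Visual angle (degrees) subtended by size_cm at a viewing distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


walker_height_cm = size_from_visual_angle(5.6, 91.0)  # prime height
max_jitter_cm = size_from_visual_angle(0.4, 91.0)     # largest jitter offset

print(f"walker height: {walker_height_cm:.1f} cm")
print(f"maximum jitter: {max_jitter_cm:.2f} cm")
```

At 91 cm, 5.6° corresponds to roughly 8.9 cm on the screen and 0.4° to roughly 0.6 cm.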

All sentence materials were identical to those used in Experiment 1.

### *Design and procedure*

Participants were placed into one of two groups: Motion (replicating Experiment 1) or Static. Within each group, the design was the same: 2 (Prime Type: Walker or Scrambled) × 2 (Verb Match: Close-Match or Distant-Match) × 2 (Sentence Type: Literal or Metaphorical). Other than the separate Motion and Static participant groups, the design and procedure were identical to those of Experiment 1.

### *Analysis*

Motion and Static participant groups were analyzed separately. Analyses were otherwise identical to those used in Experiment 1.

### **RESULTS**

### *Reading times by Sentence Type*

Reading times by Sentence Type (Literal, Metaphorical, Filler), collapsed across Verb Match and Prime Type, followed a pattern similar to that of Experiment 1 for both the Motion group and the Static group, with minor differences. Mean reading times by region for each Sentence Type (Filler, Literal, Metaphorical) are displayed in **Figure 3A** (Motion group) and **Figure 4A** (Static group).

For the Motion group, differences emerged at all three regions, with Metaphorical sentences being read slower than the other types of sentences at all three regions. Pair-wise comparisons using linear mixed-effects models revealed that at region 1, Metaphorical sentences were read more slowly than Literal sentences (β = 43*.*766, *SE* = 71*.*202, *t* = 6*.*329, *p <* 0*.*01) and read marginally slower than Filler sentences (β = 29*.*461, *SE* = 15*.*864, *t* = 1*.*857, *p* = 0*.*06). At region 2, Metaphorical sentences were read more slowly than both Literal sentences (β = 77*.*458, *SE* = 15*.*695, *t* = 4*.*935, *p <* 0*.*001) and Filler sentences (β = 73*.*177, *SE* = 13*.*703, *t* = 5*.*340, *p <* 0*.*001). This pattern persisted at region 3, with Metaphorical sentences being read more slowly than both Literal sentences (β = 127*.*910, *SE* = 19*.*328, *t* = 6*.*618, *p <* 0*.*001) and Filler sentences (β = 128*.*612, *SE* = 20*.*301, *t* = 6*.*335, *p <* 0*.*001). There were no differences between reading times for Literal sentences and Filler sentences in any of the regions.

For the Static group, differences emerged at all three regions. The metaphorical condition was again read the slowest across regions. At region 1, Metaphorical sentences were read more slowly than both Literal sentences (β = 71.734, *SE* = 16.012, *t* = 4.480, *p* < 0.001) and Filler sentences (β = 36.062, *SE* = 16.803, *t* = 2.146, *p* < 0.05). Literal sentences were also read marginally more slowly than Filler sentences at this region (β = −27.602, *SE* = 15.845, *t* = −1.742, *p* = 0.08). At region 2, Metaphorical sentences were read more slowly than both Literal sentences (β = 67.076, *SE* = 17.578, *t* = 3.816, *p* < 0.001) and Filler sentences (β = 99.378, *SE* = 14.713, *t* = 6.754, *p* < 0.001). In addition, Literal sentences were read more slowly than Filler sentences at region 2 (β = 33.198, *SE* = 13.5, *t* = 2.459, *p* < 0.05). The pattern at region 3 was the same as at region 2. Metaphorical sentences were read more slowly than both Literal sentences (β = 112.684, *SE* = 22.922, *t* = 4.916, *p* < 0.005) and Filler sentences (β = 140.385, *SE* = 20.755, *t* = 6.764, *p* < 0.001). Literal sentences were read more slowly than Filler sentences at region 3 (β = 42.40, *SE* = 18.49, *t* = 2.281, *p* < 0.05).

[...] which contained the verb, were observed for Literal sentences whereas no significant main effects or interactions were observed in this region for Metaphorical sentences. The ∗ indicates a significant interaction of Match and Prime for that region (the reader is referred to the text for main effects and other statistics).

### *Reading time differences for Motion vs. Static groups*

To determine whether the Group Type (i.e., whether participants saw motion vs. static primes) had an effect on overall reading time, we analyzed reading times for each sentence region as a function of Sentence Type (Literal, Metaphorical, or Filler sentences) and Group (Motion, Static). As discussed in the previous section (Reading times by Sentence Type), significant differences emerged based on Sentence Type, but there were no main effects of Group Type or interactions of Group Type and Sentence Type (all *p*s *>* 0.10).

### *Literal sentences, Motion group*

Mean reading times for the Literal sentences for the Motion group are plotted in **Figure 3B**. The Literal sentences in the Motion group serve as a direct replication of the Literal sentences from Experiment 1. However, in contrast to Experiment 1, at region 1, there was a main effect of Verb Match (β = 90.941, *SE* = 44.879, *t* = 2.026, *p* < 0.05), a main effect of Prime Type (main effect of walker type), and an interaction of Verb Match and Prime Type (β = −122.107, *SE* = 43.225, *t* = −2.825, *p* < 0.01). These findings suggest that the Distant-Match verb sentences were read more slowly at region 1; that the Scrambled primes led to overall slower reading times at region 1; and that there was an interaction of the two factors. Given that Close- and Distant-Match sentences began with different words, even at region 1, it is possible that subtle differences in the noun phrases used in these conditions contributed to the differences observed in region 1.

As in Experiment 1, we observed an interaction between Prime Type and Verb Match at region 2, the verb region (β = −98.22, *SE* = 39.02, *t* = −2.517, *p* < 0.05). In addition, we observed a main effect of Prime Type in this region (β = 61.65, *SE* = 27.47, *t* = 2.245, *p* < 0.05). The main effect of Prime Type suggests that overall, Literal sentences were read more slowly at the verb following Walker primes (compared to Scrambled primes). However, the interaction suggests that how closely the verb matched the walker affected reading times, as well. To investigate this interaction, we conducted pair-wise tests using linear mixed-effects models. These revealed that the interaction was driven by the Walker prime/Close-Match condition, which was read more slowly than both the Scrambled prime/Close-Match and Walker prime/Distant-Match conditions (*p*s < 0.05). As in Experiment 1, these results suggest that participants exhibited interference in processing closely-matching verbs after viewing a video of a point-light walker.

At region 3, there was only an interaction of Verb Match and Prime Type (β = −117.705, *SE* = 49.229, *t* = −2.391, *p* < 0.05), with no main effect of either Verb Match or Prime Type. To follow up on this interaction, pair-wise regressions were conducted and revealed that within the Close-Match verb condition, there was a marginal difference, such that Walker primes led to slower reading times than Scrambled primes (*p* = 0.07).
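As a rough illustration of how the reported quantities relate to one another, the sketch below recomputes the test statistic for the region 3 interaction as the coefficient divided by its standard error. The normal approximation used for the *p*-value is an assumption made here for illustration only; the text does not state how *p*-values were derived from the mixed-effects models, and the function names are hypothetical.

```python
import math


def wald_t(beta: float, se: float) -> float:
    """Test statistic as coefficient divided by its standard error."""
    return beta / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value treating t as a standard normal deviate.

    Assumption for illustration: the source does not say which
    reference distribution was used for its p-values.
    """
    return math.erfc(abs(t) / math.sqrt(2))


# Values reported above: Verb Match x Prime Type interaction at region 3
# (Literal sentences, Motion group).
beta, se = -117.705, 49.229
t = wald_t(beta, se)
p = two_sided_p_normal(t)
print(f"t = {t:.3f}, p = {p:.3f}")
```

Under these assumptions the recomputed statistic agrees with the reported *t* to three decimal places, and the approximate *p*-value falls below the 0.05 threshold, in line with the reported result.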

### *Literal sentences, Static group*

Mean reading times for Literal sentences for the Static group are plotted in **Figure 4B**. At region 1, there were no significant main effects of Verb Match or Prime Type, though there was a marginal interaction of Verb Match and Prime Type (β = −73.011, *SE* = 40.768, *t* = −1.791, *p* = 0.07).

At region 2, unlike in the Motion group, there were no main effects or interactions of any type.

At region 3, there was an interaction of Prime Type and Verb Match (β = −128.496, *SE* = 60.727, *t* = −2.116, *p* < 0.05). There were also marginal effects of both Prime Type (β = 83.26, *SE* = 42.371, *t* = 1.965, *p* = 0.05) and Verb Match (β = 135.716, *SE* = 78.901, *t* = 1.72, *p* = 0.09). To tease apart the interaction of Prime Type and Verb Match, follow-up pair-wise regressions were performed. Within the Close-Match verb conditions, Walker primes led to slower reading times than Scrambled primes at region 3 (*p* < 0.05). Additionally, within the Scrambled prime conditions, there was a trend for the Distant-Match verb condition to lead to slower reading times than the Close-Match verb condition. Overall, the pattern of results at region 3 looks similar to that of the Motion group for Literal sentences, but the results at the preceding region (region 2) differ, with similar reading times across conditions in the Static group.

### *Metaphorical sentences, Motion group*

Mean reading times for Metaphorical sentences for the Motion group are plotted in **Figure 3B**. Unlike in Experiment 1, there were no significant differences at any of the three regions. However, at region 3, a numerical pattern similar to that seen in the Literal sentences, Motion group in Experiment 2 was observed, with the Distant-Match verb conditions leading to slower reading times compared to the Close-Match conditions.

### *Metaphorical sentences, Static group*

Mean reading times for Metaphorical sentences in the Static group are plotted in **Figure 4B**. At region 1, there was only a main effect of Prime Type, with Walker primes leading to longer reading times than Scrambled primes (β = 86.667, *SE* = 41.228, *t* = 2.102, *p* < 0.05).

At region 2, unlike in the Motion group, there were no main effects or interactions of any type.

At region 3, there was an interaction of Prime Type and Verb Match (β = 201.01, *SE* = 81.671, *t* = 2.461, *p* < 0.05). Follow-up tests showed that this interaction was driven by a difference within the Distant-Match verbs, with reading times at region 3 being slower after Walker primes than after Scrambled primes. These findings echo the results from the Metaphorical sentences in Experiment 1 (using video primes), where similar effects suggested interference following intact Walker primes for sentences containing Distant-Match verbs.

### *Comprehension questions*

High accuracy was observed for comprehension questions in both the Motion group (*M* = 95.95%, *SD* = 5.85%) and the Static group (*M* = 97.14%, *SD* = 3.59%). For the Motion group, mean accuracy was 96.97% (*SD* = 4.13%) for filler sentences; 96.58% (*SD* = 5.24%) for literal sentences; and 95.29% (*SD* = 7.50%) for metaphorical sentences. For the Static group, mean accuracy was 97.74% (*SD* = 2.88%) for filler sentences; 97.42% (*SD* = 3.67%) for literal sentences; and 96.82% (*SD* = 5.29%) for metaphorical sentences. There were no differences based on Group, Sentence Type, Verb Match, or Prime Type (all *p*s > 0.10).

### **DISCUSSION: EXPERIMENT 2**

Experiment 2 used both moving and static primes. For Literal sentences, the findings from Experiment 1 were conceptually replicated: in the critical verb region, reading times were slowed for verbs closely matching the action depicted in the upright walker prime videos compared to other conditions. As in Experiment 1, these results suggest that participants exhibit interference in processing closely-matching verbs used literally after viewing a video of an upright point-light walker. However, for Metaphorical sentences, there were no main effects or interactions of Prime Type and Verb Match in the Motion group in Experiment 2 (unlike in Experiment 1), though a similar numerical pattern was observed for these sentences in the final region. The lack of replication may be due to an under-powered study, as we had slightly fewer participants per group in Experiment 2 and perhaps not enough power to detect an interaction.

Literal sentences in the Static group followed a pattern similar to that of Literal sentences in the Motion group, though differences did not emerge until after the verb region; that is, they emerged at the final region (region 3). Here, an interaction of Verb Match and Prime Type emerged, with sentences containing Close-Match verbs leading to slower reading times after Walker primes compared to Scrambled primes. Interestingly, at region 3, sentences with Close-Match verbs were also read more slowly than sentences with Distant-Match verbs. This pattern is somewhat difficult to interpret, but taken together, these findings suggest a general difficulty for Close-Match verb sentences following Walker primes that emerged later in the sentence following still images than following video primes.

As for Metaphorical sentences in the Static group, the results suggest a similarity with Experiment 1 (where video primes were used), with an interaction of Prime Type and Verb Match at region 3 suggesting that sentences containing Distant-Match verbs are processed more slowly after Walker primes compared to Scrambled primes. However, as in the Literal sentences in the Static group, this interaction only emerged at region 3 whereas in Experiment 1, it emerged earlier (at the critical verb region, region 2). Although the pattern of results observed in Experiment 1 did not replicate in the Motion group of Experiment 2, still images, at least in this sample of participants, were able to differentially affect the processing times of sentences containing distantly- vs. closely-matching verbal content. These findings suggest that there may be partially overlapping neurocognitive representations for metaphorical language about biological motion and form-based visual content about biological motion.

# **GENERAL DISCUSSION**

In summary, we observed different patterns of reading times for metaphorical compared to literal sentences following biological motion video and (to some extent) still image primes. Across two experiments, for literal sentences, we observed a pattern that may roughly be connected with interference for sentences that contained verbs closely matching the sensorimotor content of the primes. However, in Experiment 1, for metaphorical sentences, we observed a pattern that suggested interference for sentences containing distantly-matching verbs. As for still images, we observed a similar pattern, except that effects were not observed at the verb region of sentences, but rather at a later prepositional phrase at the end of the sentence.

What type of mechanism could cause Close-Match verbs to be processed more slowly in literal sentences but more quickly in the metaphorical sentences? To the extent that an interference-type pattern emerged for the Close-Match verbs in the Literal condition following videos of point-light walkers, this pattern may have been due to difficulty in accessing a precise representation. For instance, if a participant had just read a Literal sentence with the phrase *was ambling*, she may have attempted to access a recently-activated representation of walking (recently accessed due to the viewing of a video prime of an upright point-light walker). However, if the recently-accessed representation did not constitute a close enough fit with the desired element in memory (e.g., a representation of *ambling*), the participant may have sensed an error due to a slight mismatch between the recently-accessed walking and the attempted access to *ambling*.

Relatedly, semantic priming paradigms have demonstrated that for *coordinates*, that is, categorically related words or pictures which might be used in many similar situations, participants often exhibit interference effects rather than facilitation in picture-naming tasks involving word primes (Alario et al., 2000; Sailor et al., 2009). Semantic priming studies have primarily been conducted using noun coordinates describing objects and animate beings, though one recent study by Bergen et al. (2010) used pictures and words in a verification task to investigate actions and verbs. Participants were slower to reject mismatching picture-word pairs when the effector typically used to perform the action was the same across the picture and the word (e.g., a picture of someone running and the word *kick*). Here, the presence of interference for similar types of actions also suggests that people activate overlapping representations in visual and verbal modalities. In our study, the visual prime of a walking action and the verb region of the sentences similarly provide an analog to noun coordinate studies using action/verb coordinates. It is possible that for our literal sentences, residual activation of a walking action led to interference for processing the verb when it closely matched the walking action. For the Distant-Match verbs, it could be that participants were able to better use previously activated sensorimotor features from the walking action, with lack of precise match allowing for facilitation in a way that the closer-matching verbs did not achieve. This might be possible if it was easier to distinguish between the walking action and, e.g., verbs like *catapulting*, but both share some overall semantic features such as motion. This pattern is roughly analogous to findings from the semantic priming literature showing that semantic *associates* lead to facilitation under circumstances when semantic coordinates do not (Alario et al., 2000; Sailor et al., 2009). This is not to say that our videos (and static images) of a walking action and Distant-Match verbs like *catapulting* are in fact semantic associates, but a relationship remains between the two which is substantially different from the relationship between the walking action and our Close-Match verbs.

It is important to note that coordinate interference effects in picture-naming tasks are typically restricted to short priming latencies (Alario et al., 2000; Sailor et al., 2009). In our studies, latency for the presentation of the critical verb following the visual prime was variable, given that participants read each phrase at a self-paced rate, but on average, latencies were much longer (including the time between the prime and beginning to read the sentence as well as the time it took to read the first phrase of the sentence). To our knowledge, no other study has examined the timing dynamics of such priming effects of verbs in sentences.

It is difficult to know precisely why such differences for the Static group emerged later in the sentence. One possibility is that activation of sensorimotor semantic content following images may be slower-acting compared to videos. A related possibility is that videos increase the strength of activation levels (in a spreading-activation-type semantic network, e.g., Anderson, 1983). As for the Static primes, the information might not have been activated strongly enough to create such interference/facilitation effects until further down the line, when the content of the sentence had been processed more fully, possibly during so-called "sentence wrap-up" effects (Just and Carpenter, 1980; Rayner et al., 2000).

As for the Metaphorical sentences, it is unclear why the findings from Experiment 1 did not replicate in the Motion group in Experiment 2. It is possible that Experiment 2 was under-powered and that the numerical difference would be reliable in a larger group. To the extent that the findings from Experiment 1 are replicable, however, it appears that metaphorical language containing biological motion verbs may rely on less-specific sensorimotor representations compared to literal language, as indicated by the fact that interference-type effects were observed for the verbs which distantly, rather than closely, matched the content of the video primes. For the still images (Experiment 2), effects were again observed later in the sentence (as was the case for the Literal sentences), and may therefore reflect a similar phenomenon of either slower-acting or weaker activation of sensorimotor semantic content following images compared to videos.

Though our experimental sentences were matched for many lexical properties, the first noun of the metaphorical sentences was found to be more highly imageable for Close-Match conditions compared to Distant-Match conditions. Imageability is known to be highly correlated with concreteness, both of which are known to affect reading times (for critical word 1 in the experimental and filler items from the present study, imageability and concreteness were highly correlated, *r* = 88.55, *p* < 0.001). In general, high concreteness and imageability are thought to be associated with shorter reading times compared to lower concreteness and imageability (Holmes and Langford, 1976; Juhasz and Rayner, 2003), though this may be modulated by individual differences in a tendency to use imagery (Denis, 1982). However, our results indicated that for metaphorical sentences, differences in reading times at the verb region (just subsequent to processing the subject noun region) went in the opposite direction, with the more highly imageable Distant-Match condition leading to longer reading times during the verb region compared to the less highly imageable Close-Match condition. We argue that this pattern indicates a form of facilitation for more closely-matching verbs, which did not themselves differ in mean imageability across the Close- and Distant-Match conditions. Given that the differences indicated by the *t*-test would bias our results in the opposite direction, we do not see this difference as terribly worrisome for our interpretation. However, future studies would do well to match this property more carefully across items.

The possibility that metaphors may be associated with the processing of less precise semantic content (that is, that they are associated with a broad distribution of content from semantic memory compared to literal language) is consistent with popular theories about right-hemisphere language processing. First, data from patients with right-hemisphere compared to left-hemisphere lesions (Winner and Gardner, 1977) and studies of healthy participants using methods from cognitive neuroscience (Mashal and Faust, 2008; Pobric et al., 2008) suggest preferential processing of metaphorical language by the right hemisphere, though not all experimental data support this interpretation (e.g., Coulson and Van Petten, 2007). Second, the right hemisphere may operate over less fine-grained linguistic representations than the left hemisphere. This "coarse coding" hypothesis (Beeman et al., 1994) suggests that there are hemispheric differences in specificity of coding of linguistic information, with the left hemisphere possibly homing in on specific words/concepts while the right hemisphere may activate a broader array of words and/or conceptual content. Our findings are consistent with the notion that broader arrays of semantic content may be activated during the processing of metaphorical language compared to literal language.

Mashal et al. (2007) suggest that the right hemisphere may not necessarily be specialized for metaphorical uses of language, *per se*, but rather for non-familiar language (similar to the "Graded Salience Hypothesis"; Giora, 1997). In an fMRI study, they observed an increased activation in the right hemisphere for novel metaphors (such as *pearl tears*), but not for conventional metaphors (such as *bright student*). Relatedly, the "career of metaphor" hypothesis (Bowdle and Gentner, 2005) proposes that as metaphors age, becoming more and more frequent in production, they become more crystallized so that the original, compositional meaning of their components is lost, and the full meaning of the metaphor is all that is retained. Under this hypothesis, proposals such as Lakoff's (1993) theory of conceptual metaphor are only relevant for new metaphors. That is, novel metaphors are more likely to be interpreted as mappings from a more concrete source domain to the more abstract target domain whereas older metaphors are more fixed in meaning (in the target domain). Recently, an fMRI study by Desai et al. (2011) found imaging evidence to support this hypothesis. They investigated sensorimotor metaphors which were rated as more or less familiar by an independent set of participants. An inverse correlation between familiarity of the metaphor and activation of primary motor areas was observed, such that more familiar metaphors activated these areas to a lesser extent. The authors took these findings as evidence for the (abstract) target of metaphors being ". . . understood in terms of the base domain through motoric simulations, which gradually become less detailed while still maintaining their roots in the base domain" (p. 10).

The metaphors used in the present studies are likely to be relatively new metaphors, by such definitions, which means that according to proposals like Bowdle and Gentner's (2005) "career of metaphor" theory and Mashal et al.'s (2007) hypothesis regarding the novelty of metaphors, these are the types of metaphors that are likely to engage the recruitment of sensorimotor representations in their processing. According to these theories, in doing so, they may also be likely to engage a different pattern of brain regions from those engaged when processing literal sentences. Our findings are consistent with an account in which metaphorical language is processed in a different way from literal language, but still may recruit sensorimotor representations overlapping with the processing of visual biological motion and possibly implied motion (as suggested by the findings from the Static group in Experiment 2).

# **ACKNOWLEDGMENTS**

This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0022. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government. Author Melissa Troyer was supported by an NSF Graduate Research Fellowship and the Kroner Fellowship at UCSD. Ayse P. Saygin and Luke E. Miller were supported by the grant NSF CAREER BCS 1151805. We would also like to thank Sandra Zerkle, Alexandria Bell, Wednesday Bushong, and Michael Belcher for assistance in collecting data.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00982/abstract

# **REFERENCES**


Brainard, D. H. (1997). The psychophysics toolbox. *Spat. Vis.* 10, 433–436. doi: 10.1163/156856897X00357


priming in the picture-word interference task. *Q. J. Exp. Psychol.* 62, 789–801. doi: 10.1080/17470210802254383


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2014; accepted: 17 November 2014; published online: 04 December 2014.*

*Citation: Troyer M, Curley LB, Miller LE, Saygin AP and Bergen BK (2014) Action verbs are processed differently in metaphorical and literal sentences depending on the semantic match of visual primes. Front. Hum. Neurosci. 8:982. doi: 10.3389/fnhum.2014.00982*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Troyer, Curley, Miller, Saygin and Bergen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# How vertical hand movements impact brain activity elicited by literally and metaphorically related words: an ERP study of embodied metaphor

# *Megan Bardolph and Seana Coulson\**

*Cognitive Science Department, University of California, San Diego, La Jolla, CA, USA*

### *Edited by:*

*Vicky T. Lai, University of South Carolina, USA*

### *Reviewed by:*

*Manuel De Vega, Universidad de La Laguna, Spain Lea Hald, Radboud University Nijmegen, Netherlands*

### *\*Correspondence:*

*Seana Coulson, Cognitive Science Department, University of California, San Diego, 9500 Gilman Drive, Mail Code 0515, La Jolla, CA 92093-0515, USA e-mail: scoulson@ucsd.edu*

Embodied metaphor theory suggests abstract concepts are metaphorically linked to more experientially basic ones and recruit sensorimotor cortex for their comprehension. To test whether words associated with spatial attributes reactivate traces in sensorimotor cortex, we recorded EEG from the scalp of healthy adults as they read words while performing a concurrent task involving either upward- or downward-directed arm movements. ERPs were time-locked to words associated with vertical space, either literally (ascend, descend) or metaphorically (inspire, defeat), as participants made vertical movements that were either congruent or incongruent with the words. Congruency effects emerged 200–300 ms after word onset for literal words, but not until after 500 ms post-onset for metaphorically related words. Results argue against a strong version of embodied metaphor theory, but support a role for sensorimotor simulation in concrete language.

**Keywords: semantics, embodiment, grounded meaning, N400, LPC, motor resonance, compatibility effects**

### **INTRODUCTION**

Embodied or grounded theories of cognition suggest the neural substrate of word meaning involves brain regions extending considerably beyond the traditional language areas in the brain. Because words are frequently encountered in the context of the objects, actions, and events they represent, their linguistic representations are associated with experiential traces of their referents. The word "dog," for example, is associated with the perceptual attributes of dogs, motoric routines for interacting with dogs, as well as the features of situations in which one typically encounters dogs. According to embodied models of language comprehension, understanding the meaning of "dog" involves the reactivation of contextually relevant experiential traces in sensorimotor cortex (e.g., Zwaan and Madden, 2005).

Embodied models of language meaning contrast with symbolic approaches prevalent in the Twentieth century (e.g., Pylyshyn, 1984), but accord better with views advanced by Nineteenth century neurologists. Observations of patients with brain damage led these neurologists to suggest that concepts are represented in a distributed array of brain regions that includes sensory and motor areas (Wernicke, 1874/1977; Head, 1926). Recent years have provided evidence in support of embodied meaning as behavioral research has shown tight links between the language system and both perception (Barsalou, 2010) and action (Fischer and Zwaan, 2008), and neuroimaging studies have shown that linguistic stimuli activate brain regions associated with sensorimotor processing (Pulvermüller, 2005; Barsalou, 2008). However, the precise import of these sensorimotor activations is controversial, as is the issue of whether amodal representations play a significant role in the neural representation of meaning (see Meteyard et al., 2012 for an insightful review).

Although there is growing agreement that the neural representation of concrete concepts involves sensory as well as motor processing regions, the representation of abstract concepts is hotly contested. Perhaps the most radical suggestion to date is that of Gallese and Lakoff (2005) that embodied content underpins all conceptual structure. Their claim is that abstract concepts used in imagination and language comprehension recruit the same neural substrates as those recruited by primary experience involving direct perception and (inter)action. According to this model, abstract concepts are grounded in experience via the mediation of metaphoric mappings. The concept of time, for example, is grounded in experiences of motion through space such that neural resources recruited for our understanding of spatial motion can be redeployed for relevant inferences about the domain of time (Boroditsky, 2011). On such accounts, metaphoric references to time *passing*, *creeping*, or *flying* are understandable because of an overlap in the neural substrate of concepts for time and for motion (Lakoff and Johnson, 1999).

Indeed, metaphor theorists have noted that spatial metaphors are highly prevalent, being a common feature of languages throughout the world (Kovecses, 2006), and applying to numerous abstract domains (Lakoff and Johnson, 1980). Orientation metaphors, for example, occur in the domains of morality, health, rationality, consciousness, and control, consistently mapping positive elements upwards and negative ones downwards. In the domain of morality, for example, we talk about *upstanding citizens* vs. people of *low character*; in the domain of control, we talk about the *overlords* vs. the *underclass*. Similarly, positive emotions such as happiness are associated with upper regions of space, while negative ones such as sadness are associated with lower regions of space, as in "*I was feeling down, but having lunch with you has really cheered me up!*" Embodied metaphor theory suggests that spatial features are activated in understanding metaphoric uses of these words, just as they are for literal uses of "up," "down," "over," and "under." Moreover, because the theory posits links between the two domains in a metaphor, these spatial schemas are part of our concepts for morality, control, and emotions.

Consistent with embodied metaphor theory, behavioral research has shown that spatial attributes are active in judgment tasks involving abstract concepts structured by orientation metaphors. Much of the research on this topic has taken advantage of stimulus-response compatibility effects, or the finding that participants respond more quickly and accurately in judgment tasks when the nature of the response matches some feature of the stimulus (e.g., indicating the presence of a stimulus in the right side of space with a right hand response). For example, the action sentence compatibility effect (ACE) is the finding that participants respond more quickly to sentences about actions involving movement away from their bodies ("You closed the drawer.") when the response requires a movement away from their bodies; likewise, participants respond more quickly to sentences about movement toward their bodies ("You opened the drawer.") when the response involves movement toward their bodies (Glenberg and Kaschak, 2002). Santana and De Vega (2011) utilized the ACE paradigm to show that vertical motion verbs were subject to compatibility effects. In this study, participants read the Spanish equivalents of sentences such as "The pressured gas made the balloon rise," (literal) and "His talent for politics made him rise to victory," (metaphor), and pressed a button in response to the animated motion of the verb. Responses were faster when the direction of movement matched the direction of the verb, suggesting literal and metaphorical motion verbs activate movement schemas along a similar time course.

Some evidence supports the presence of compatibility effects even in the processing of individual words. For example, when asked to judge which of two social groups (such as "masters" and "servants") was more powerful, participants responded faster when the more powerful group was presented at the top of the screen; when asked to judge which group was less powerful, participants' responses were faster when the chosen group (i.e., servants) was presented at the bottom of the screen (Schubert, 2005). Similarly, when asked to judge whether words (such as "hero" and "liar") had a positive or a negative meaning, participants' responses were faster for positively valenced items when they appeared above the fixation point, and faster for negatively valenced items when they appeared below fixation (Meier and Robinson, 2004).

The observation of compatibility effects for both sentences and individual words is relevant because models of embodied meaning differ regarding the automaticity of motor activity during language comprehension, as well as the language processing stage at which such effects arise. Pulvermüller (2005) posits a strong embodiment model in which sensorimotor meaning activation is rooted in fundamental aspects of neuronal function. On this model, the frequent co-occurrence of words such as "raise" with, say, the action of raising one's hand, leads to the formation of neuronal ensembles connecting the neural representation of the word with the relevant motor programs. Once assembled, the acoustic representation of the word triggers the rapid, automatic activation of associated motor programs. Whereas strong embodiment models suggest the activation of motor schemas arises automatically in the course of word comprehension, weak embodiment models suggest sensorimotor activations are more relevant for situation model construction, processes that occur at the phrase and sentence levels (Havas et al., 2007; Simmons et al., 2008).

Indeed, the automaticity of spatial activations for abstract concepts is somewhat suspect. Spatial compatibility effects such as those reported by Meier and Robinson (2004) are not always observed, and their emergence is heavily task dependent (Lebois et al., 2014). For example, Brookshire et al. (2010) presented positively and negatively valenced words in different colored fonts, and asked participants to indicate the font color with button presses that required either an upward or a downward movement. In experimental conditions that encouraged participants to attend to the meaning of the words, upward responses were fastest for positively valenced words, and downward responses were fastest for negatively valenced words. Spatial congruency effects were absent, however, in conditions that encouraged participants to attend only to the color of the words (Brookshire et al., 2010).

Moreover, while the spatial congruency effects reported by, for example, Brookshire et al. (2010) show that word meanings can rapidly influence the motor system, it is unclear whether activity in motor cortex plays any role in the representation of word meanings themselves. Critics of embodied meaning have also argued for the importance of measures with high temporal resolution, noting that fMRI data cannot be used to distinguish early effects indexing word meaning from later, more strategic simulation effects that might arise from conscious mental imagery (Mahon and Caramazza, 2008). Derived from synaptically generated current flow within patches of neural tissue, event-related brain potentials (ERPs) are a real time measure of brain function that has been associated with numerous aspects of language (Kutas et al., 2006). Here we combined ERP measures of word comprehension with an experimental manipulation intended to modulate activity in the motor system.

Previous ERP studies investigating the action-sentence compatibility effect (ACE) have shown neural responses associated with hand movements that are compatible or incompatible with actions described in sentences (Aravena et al., 2010). When participants judged sentence meaning via button press, they were faster to respond when the shape of their hand was compatible with actions described in the sentences (e.g., an open hand press in response to a sentence about clapping). ERPs time locked to sentence-final verbs (e.g., "applauded") revealed enhanced negativity in a 350–650 ms time window for the incompatible compared to the compatible condition. Although the negativity appears to peak at least 600 ms after word onset, the authors describe it as an N400-like effect, and interpret it as an indication that the motor task impacted participants' language comprehension processes.

To further explore sensorimotor contributions to meaning, here we tested whether concurrent hand movements in the vertical plane altered the brain's real time response to words with associated spatial attributes. Accordingly, we recorded participants' EEG as they moved marbles either upwards or downwards while reading words associated with different regions of vertical space. Because spatial associations can be more or less experientially grounded, we compared the impact of our concurrent motion task on words whose verticality was either literal, involving words such as "ascend" and "descend," or metaphorical, involving words such as "inspire" and "defeat." The target of the motor act was thus designed to be either congruent or incongruent with the "height" of the word so that we could examine the timing and topographic profile of ERP congruency effects for words whose verticality was either literal or metaphorical.

The use of a single-word reading task allowed us to test predictions of strong embodiment that spatial features are an automatic aspect of meaning activations. If words reactivate associated experiential traces, one would expect the brain's real time response to the words to be modulated by the congruency between participants' hand movements and the vertical features activated by the words. Embodied theories of meaning predict any such congruency effects should arise during the early stages of meaning processing; i.e., within the first 500 ms after word onset. Moreover, embodied theories of metaphor processing suggest broadly similar congruency effects should emerge for words whose verticality is metaphorical, as for words whose verticality is literal.

### **METHODS**

### **PARTICIPANTS**

This study was conducted with the approval of the UC San Diego Institutional Review Board. Data reported here were from 24 UC San Diego undergraduates (13 male, 11 female). Eleven additional participants were excluded from analysis due to excessive movement artifacts or other technical problems. Participants' ages ranged from 18 to 34, with a mean of 20 years. All participants were right-handed, had normal or corrected-to-normal vision, and none had a history of traumatic head injuries or psychiatric problems. All participants gave informed consent and, in exchange for participation, received extra credit toward their grade in a cognitive science, linguistics, or psychology course.

### **MATERIALS**

Experimental materials included 84 words in the Literal verticality condition, and 84 words in the Metaphorical verticality condition. Words in the Literal verticality condition were a subset of materials used in Collins (2011), while words in the Metaphorical verticality condition were assembled from materials used in Brookshire et al. (2010) (kindly provided by Daniel Casasanto) and materials published in Meier and Robinson (2004).

Words in the Literal verticality condition were divided into 42 Literal Low words (e.g., descend, floor) and 42 Literal High words (e.g., ascend, ceiling). Words in the Metaphorical verticality condition were divided into 42 Metaphorical Low words (e.g., defeat, poverty) and 42 Metaphorical High words (e.g., inspire, power). See **Table 1** for example stimuli. Words in each of the four conditions were roughly matched for psycholinguistic variables as measured by the MRC database (Coltheart, 1981b), including log word frequency (1.9), number of letters (5.4), number of syllables (1.6), familiarity (541.9), and concreteness (433.3). Materials also included 84 "filler" words deemed by experimenters to be relatively neutral with respect to verticality, and similar in log word frequency (1.5), number of letters (5.6), number of syllables (1.7), familiarity (520.8), and concreteness (539.7).

Because Metaphorical Verticality derives from metaphors associating positively valenced items with upper regions of space, and negatively valenced items with lower regions of space, valence and arousal ratings were obtained for these materials from the Affective Norms for English Words (ANEW) 2010 dataset. Valence ratings range from 1 to 9, where 1 is highly negative, 5 is neutral, and 9 is highly positive. Average valence ratings for Metaphorical High words ranged from 6.5 to 8.7 (average = 7.5), whereas ratings for the Metaphorical Low words ranged from 1.5 to 3.28 (average = 2.5). Metaphorical High and Metaphorical Low words were matched on arousal ratings: 5.6 (*SD* = 1*.*1) vs. 5.7 (*SD* = 0*.*8), respectively.

The verticality of these materials was established in a separate norming study using 10 new participants drawn from the same pool as the ERP study. Participants in the norming study were instructed to rate each word on a 5-point scale from 1 (Very low) to 5 (Very high), on which 3 signaled "Neither high nor low." Participants judged both sets of Low words to be below 3, viz. Literal (2.39) and Metaphorical (2.21), and both sets of High words to be above 3, viz. Literal (4.08) and Metaphorical (4.03). The average rating for the filler items was 3.01.

### **PROCEDURE AND DESIGN**

Participants were seated in a chair facing a computer monitor located approximately forty inches (101 cm) in front of them. On the floor next to the chair was an apparatus containing approximately 100 black marbles. The apparatus had two wooden trays lined with green or red felt (see Image 1 in Supplementary Materials). The green tray was mounted above the red tray for the entire experiment. Both trays were angled so that the marbles in the trays would roll toward the front, where the participant could easily reach them. Participants moved marbles from one tray to the other while reading words on the computer screen.

Since few ERP studies of language processing involve ongoing, controlled movement, it was unclear whether this paradigm would give rise to lateralized EEG activity reflecting motor control processes unrelated to the movement congruency manipulation. Consequently, half the participants used their left hand throughout the study and half the participants used their right hand. In this way, we hoped to examine the impact of response hand on the ERPs, and whether or not any observed hand effects interacted with experimental variables such as Word direction (low/high), Movement direction (upward/downward), or Lit/Met verticality (literal/metaphorical).

The experiment was divided into four blocks—two in which participants were instructed to move marbles into the red tray, and two in which the green tray was the intended target. Participants changed direction after the second block, and the initial direction (upwards toward the green tray, or downwards toward the red tray) was counterbalanced across participants. At the beginning of each block, participants received verbal instruction indicating which colored tray they should move the marbles into; no language about moving upwards or downwards was used during the instructions. Participants were told their task was to move marbles from one tray to the other while reading words presented on the computer monitor.

At the beginning of each block, participants fixated on the center of the monitor and waited until the word "Ready" appeared. Upon seeing the word "Ready," they began moving marbles into the specified tray, using only the arm on the ipsilateral side of the tray. The use of the left or right arm was counterbalanced across participants. Participants were asked to move the marbles at a constant rate, without moving their shoulder and without looking at the marble apparatus. Participants were informed that there would be a memory test afterward to ensure they were reading the words. At the end of a block, the experimenter moved the marbles back into the appropriate tray, as necessary.

Each trial was preceded by a small, yellow fixation cross at the center of the screen, followed by the presentation of a word. Words were presented for 500 ms, followed by 1000 ms of fixation cross. Each block involved the presentation of 21 words from each of the four experimental categories (Literal Low, Literal High, Metaphorical Low, and Metaphorical High) as well as 42 fillers. Presentation order was randomized. Each word was presented twice, once accompanied by an upward movement, and once by a downward movement. Words presented in the first block were presented again in the third block in a different random order; likewise the same words were presented in the second and fourth blocks.

Design was thus mixed, with Hand (left/right) as a between-participants variable, and Movement direction (upwards/downwards), Word direction (Low/High), and Lit/Met verticality (literal/metaphorical) as within-participants variables.
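The block structure described above can be made concrete with a short sketch. It is an illustration only, not the presentation software used in the study: the function `build_blocks`, the placeholder word labels, and the way each word list is cut into two halves are all assumptions (the text does not state how words were assigned to the two word sets).

```python
import random

CATEGORIES = ["Literal Low", "Literal High", "Metaphorical Low", "Metaphorical High"]


def build_blocks(words_by_category, fillers, first_direction, rng):
    """Return four blocks of trials; each trial is a dict (hypothetical helper)."""
    if first_direction not in ("upward", "downward"):
        raise ValueError("first_direction must be 'upward' or 'downward'")
    second_direction = "downward" if first_direction == "upward" else "upward"
    # Movement direction changes after the second block.
    directions = [first_direction, first_direction, second_direction, second_direction]

    # Assumption: each word list is simply cut in half; the source does not
    # say how words were assigned to the two word sets.
    word_sets = [[], []]
    labelled = [(cat, words_by_category[cat]) for cat in CATEGORIES]
    labelled.append(("Filler", fillers))
    for label, words in labelled:
        half = len(words) // 2
        word_sets[0] += [(word, label) for word in words[:half]]
        word_sets[1] += [(word, label) for word in words[half:]]

    blocks = []
    for index, direction in enumerate(directions):
        # Blocks 1 and 3 share one word set; blocks 2 and 4 share the other.
        trials = [
            {"word": word, "category": label, "movement": direction}
            for word, label in word_sets[index % 2]
        ]
        rng.shuffle(trials)  # new random order for every block
        blocks.append(trials)
    return blocks


# Placeholder stimuli standing in for the real word lists.
words_by_category = {cat: [f"{cat} {i:02d}" for i in range(42)] for cat in CATEGORIES}
fillers = [f"Filler {i:02d}" for i in range(84)]
blocks = build_blocks(words_by_category, fillers, "upward", random.Random(0))
print([len(block) for block in blocks])
```

With 42 words per category and 84 fillers, each block in this sketch holds 21 words from each experimental category plus 42 fillers, and every word is paired once with each movement direction.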

### **EEG RECORDING AND ANALYSIS**

Participants' EEG was recorded with 29 tin electrodes embedded in an Electro-Cap, and arranged in the International 10–20 configuration. EEG recording was referenced on-line to an electrode placed over the left mastoid, and later re-referenced to an average of activity recorded from left and right mastoids. Blinks were monitored by comparing activity at the FP2 channel with recordings from an electrode under the right eye. Horizontal eye movements were monitored via a bipolar derivation of electrodes placed next to each eye (on the outer canthi). All impedances were kept below 5 kOhms.

Analysis involved mean amplitude measurements taken in four intervals intended to capture ERP components to visually presented words: 200–300 ms (P2), 300–500 ms (N400), 500–700 ms (LPC), and 700–1100 ms (slow wave). These time windows were chosen based on the ERP literature on language and memory (reviewed in Kutas and Van Petten, 1994; Kutas et al., 2006), and were similar to those used in previous studies in our laboratory (e.g., Davenport and Coulson, 2011).
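A mean amplitude measurement of this kind is simply the average voltage of the samples falling within a window. The sketch below shows the computation for a single averaged waveform at one electrode. The sampling rate (250 Hz), the epoch start (−100 ms), the half-open treatment of each window, and the function name `mean_amplitude` are assumptions made for illustration; none of them is reported in the text.

```python
# Time windows (ms post-word onset) taken from the text.
WINDOWS_MS = {
    "P2": (200, 300),
    "N400": (300, 500),
    "LPC": (500, 700),
    "slow wave": (700, 1100),
}


def mean_amplitude(samples_uv, srate_hz, epoch_start_ms, window_ms):
    """Average voltage (in µV) of the samples inside [start, end) of a window."""
    start_ms, end_ms = window_ms
    # Convert window edges from ms (relative to word onset) to sample indices.
    first = round((start_ms - epoch_start_ms) * srate_hz / 1000)
    last = round((end_ms - epoch_start_ms) * srate_hz / 1000)
    if first < 0 or last > len(samples_uv) or first >= last:
        raise ValueError("window falls outside the epoch")
    segment = samples_uv[first:last]
    return sum(segment) / len(segment)


# Toy waveform: hypothetical 250 Hz recording from -100 to 1100 ms, in which
# each sample's "voltage" equals its latency in ms, so results are easy to check.
SRATE_HZ = 250          # assumed, not reported in the text
EPOCH_START_MS = -100   # assumed, not reported in the text
epoch = [EPOCH_START_MS + 4 * i for i in range(300)]

amplitudes = {
    name: mean_amplitude(epoch, SRATE_HZ, EPOCH_START_MS, window)
    for name, window in WINDOWS_MS.items()
}
print(amplitudes)
```

In an actual analysis this computation would be repeated for every participant, condition, and electrode before the values are entered into the ANOVAs.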

Measurements were subjected to two sets of analyses. The first was a set of pre-planned comparisons motivated by the embodied cognition literature, intended to test, first, whether literal words would elicit different brain activity in the incongruent than in the congruent movement condition, and, second, whether metaphorical words would do so. As is customary in these analyses (see Kaschak et al., 2005), Word direction and Movement direction were treated as a single Congruency factor. Accordingly, separate planned comparisons of ERPs to Congruent (low words with downward movements and high words with upward movements) and Incongruent (low words with upward movements and high words with downward movements) stimuli were conducted for Literal and Metaphorical words, respectively. As response hand was counterbalanced, it was not included as a factor in the analysis. Consequently, factors in these planned comparisons included Congruency (congruent/incongruent), Region (6 levels), and Electrode Site (3 levels). The electrode sites included in each Region can be seen in **Figure 1**.
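The collapsing of Word direction and Movement direction into a single Congruency factor can be written out explicitly. The following is a minimal sketch; the function name `congruency` and the string labels are illustrative choices, not part of the original analysis pipeline.

```python
def congruency(word_direction, movement_direction):
    """Collapse Word direction and Movement direction into one Congruency label."""
    if word_direction not in ("low", "high"):
        raise ValueError("word_direction must be 'low' or 'high'")
    if movement_direction not in ("upward", "downward"):
        raise ValueError("movement_direction must be 'upward' or 'downward'")
    # Congruent: low words with downward movements, high words with upward movements.
    congruent_pairs = {("low", "downward"), ("high", "upward")}
    if (word_direction, movement_direction) in congruent_pairs:
        return "congruent"
    return "incongruent"


for word in ("low", "high"):
    for movement in ("upward", "downward"):
        print(word, movement, congruency(word, movement))
```

Coded this way, a main effect of Congruency in the planned comparisons corresponds to the Word direction x Movement direction interaction in the omnibus analyses.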

The second set of analyses was intended to test which, if any, of our independent variables affected the ERPs, and, as such, encompassed all aspects of our design, including the response Hand factor. In these analyses, ERP measurements were subjected to omnibus ANOVA with between-subjects factor Hand (Left/Right), and within-subjects factors Lit/Met verticality (literal, metaphorical), Word direction (low, high), Movement direction (upwards, downwards), Region (6 regions of electrodes) and Electrode site (3 levels). Omnibus analyses enabled us to examine whether there were any unanticipated interactions between experimental variables, and, more importantly, to assess potential differences in congruency effects for literal vs. metaphorical words.

### **RESULTS**

### **ERP MEASURES: PLANNED COMPARISONS**

Planned comparisons revealed Congruency effects for words whose verticality was Literal only during the interval 200–300 ms post-onset [Congruency *F*(1*,* 23) = 5*.*23, *p <* 0*.*05; Congruency × Region *F*(5*,* 23) = 4*.*05, *p <* 0*.*05]. Words viewed in the incongruent movement condition elicited slightly more positive ERPs over right frontal electrode sites. Comparable analyses of words whose verticality was metaphorical revealed congruency effects only during the 500–700 ms time window [*F*(1*,* 23) = 5*.*57, *p <* 0*.*05].

### **ERP MEASURES: OMNIBUS ANALYSES**

### *Early congruency effect*

Analysis of ERPs measured 200–300 ms post-word onset revealed a significant four-way interaction of Lit/Met x Word direction x Movement direction x Electrode [*F*(5*,* 110) = 3*.*21, *p <* 0*.*05]. Results of the planned comparisons suggest this interaction reflects the presence of congruency effects in the literal words coupled with their absence in the metaphorical words. Early congruency effects for literal but not metaphorical words can be seen in **Figure 2**.

### *Valence effect*

Analysis of ERPs measured 300–500 ms post-onset revealed an interaction of Word direction x Lit/Met [*F*(1*,* 22) = 5*.*38, *p <* 0*.*05]. This interaction was followed up with separate repeated measures ANOVAs for literal and metaphorical words. Analysis of words whose verticality was literal revealed no experimental effects.

By contrast, analysis of words whose verticality was metaphorical revealed a main effect of Word direction [*F*(1*,* 22) = 6*.*64, *p <* 0*.*05]. Relative to metaphorical low words, metaphorical high words elicited slightly less negative ERPs over central scalp sites (see **Figure 3**). As these differences a) did not interact with movement direction and b) were not observed for the affectively neutral items (viz. the literal words, or, for that matter, the filler words), Word direction effects in the metaphorical words were presumed to be due to the fact that high words were positively valenced, while the low words were negatively valenced.¹

### *Late congruency effects*

Analysis of ERPs measured 500–700 ms post-word onset revealed a Hand x Region x Electrode site interaction [*F*(10*,* 220) = 4*.*64, *p <* 0*.*01], as participants using their Left hand had more positive ERPs over frontal central electrodes than those using their Right hand. As this contrasted with the Congruency effect observed in our planned comparisons of ERPs time locked to metaphors, which collapsed across the Hand factor (see Section ERP Measures: Planned Comparisons), we conducted separate *post-hoc* analyses of ERPs recorded from participants using their Left hand, and those using their Right hand. In each group, ERPs to metaphors were measured 500–700 ms and subjected to repeated measures ANOVA with factors Congruency (congruent/incongruent), Region (6 levels), and Electrode (3 levels). A significant congruency effect was found for participants using their Left hand [*F*(1*,* 11) = 4*.*86, *p <* 0*.*05], but not for participants using their Right hand [*F*(1*,* 11) = 1*.*53, *p* = 0*.*24]. The discrepancy between the effect of Hand in the omnibus analysis and the effect of Congruency in the planned comparisons presumably arises because the Metaphor Congruency effect is driven by ERPs in the group who moved the marbles with their non-dominant hand. As one might expect from the planned comparisons, analyses of the ERPs to words whose verticality was literal showed no sign of Congruency effects in either the group using their Left hand [*F*(1*,* 11) = 1*.*22, *p* = 0*.*29] or the group using their Right hand [*F*(1*,* 11) = 1*.*13, *p* = 0*.*31]. ERPs from the group using their Left hand are shown in **Figure 4**. Reliable Congruency effects were observed 500–700 ms after the onset of Metaphorical but not Literal words.

¹Based on the suggestion of a reviewer, we analyzed ERPs for metaphorical high and low words separately to investigate whether either type of word (positive or negative valence) yielded a congruency effect of Word direction x Motion direction. There was a trend toward a significant Word direction x Motion direction interaction for metaphorical high words [*F*(1*,* 23) = 3*.*49, *p* = 0*.*07], with high words viewed in the downward condition (incongruent) eliciting more positive ERPs than in the upward condition (congruent) (µ = 2*.*13, µ = 1*.*67 µV). However, given the unmotivated nature of these analyses, it would be unwise to draw any conclusions from this non-significant trend.

*[Figure caption fragment]* **Literal (right) verticality stimuli.** ERP traces are shown for Congruent (e.g., High words viewed during Upward movement) and Incongruent words; the caption also refers to a topographic scalp map of the mean amplitude voltage difference for the Literal verticality condition in the 200–300 ms time window.


Omnibus analysis of the 700–1100 ms time window revealed an interaction between Hand and Lit/Met [*F*(1*,* 22) = 4*.*92, *p <* 0*.*05] and an interaction among Hand, Word Direction, and Movement Direction [*F*(1*,* 22) = 6*.*69, *p <* 0*.*05]. To further explore these interactions with Hand group, we conducted two *post-hoc* ANOVAs, one for each Hand group, each with factors Lit/Met (literal/metaphorical), Word Direction (high/low), Movement Direction (upwards/downwards), Region (6 levels), and Electrode site (3 levels). Analysis of participants using their Right Hand did not reveal any experimental effects. For participants using their Left Hand, analysis revealed a Word direction x Movement direction interaction [*F*(1*,* 11) = 7*.*30, *p <* 0*.*05], i.e., a Congruency effect. **Figure 5** shows that ERPs were more positive for words in the incongruent movement condition. As no interactions with the Lit/Met factor were observed, these effects were presumed to be similar for literal and metaphorical words. Moreover, analysis of ERPs recorded from participants using their Left Hand indicated Congruency effects just missed significance in both Literal [*F*(1*,* 11) = 4*.*66, *p* = 0*.*054] and Metaphorical [*F*(1*,* 11) = 4*.*40, *p* = 0*.*06] words.

### **MEMORY TEST**

Due to time constraints on the experimental session (viz. all experiments had to be completed within a 2-h time span), only 22 of our 24 participants completed the memory test administered at the conclusion of the marbles task. The memory test for each participant consisted of a random subset of "old" words, along with an equivalent number of "new" words that had not been presented during the experiment. Participants correctly recognized an average of 40% of the "old" words, with an average false alarm rate of 16.1%. Repeated measures ANOVA with the between-subjects factor Hand, and within-subjects factors Lit/Met and Word Direction, revealed only a main effect of Lit/Met [*F*(1*,* 20) = 7*.*1, *p <* 0*.*05], with higher hit rates for Metaphorical (45%) than Literal (36.4%) words.

*[Figure caption fragment]* **the Metaphorical condition (top panel) and in the Literal condition (bottom panel).** ERPs shown reflect a composite of the fronto-central ROI electrodes circled and were taken from participants using their left hand. Shaded area represents significant congruency effects 500–700 ms for Metaphorical condition only.


### **DISCUSSION**

To explore the contributions of motor cortex to the processing of literal and metaphorical language, the present study examined the event-related brain response to visually presented words as participants performed a concurrent movement task. In keeping with embodied theories of meaning, we found that words whose verticality was literal (e.g., ceiling, floor) elicited more positive ERPs 200–300 ms post-word onset when accompanied by incongruent movements than congruent ones. Effects of movement congruency did not impact ERPs to words whose verticality was metaphorical until more than 500 ms after word onset. During the late interval 700–1100 ms, both literal and metaphorical words elicited more positive ERPs in the incongruent movement condition. Observed effects argue against a strong view of embodied metaphor that predicts early, bottom-up sensorimotor contributions to metaphoric meanings. However, the late congruency effects for the metaphorical items suggest participants were sensitive to a connection between abstract concepts and particular regions of vertical space in accordance with the orientation metaphor GOOD IS UP.

### **EARLY CONGRUENCY EFFECTS FOR LITERAL WORDS**

Early congruency effects observed for the literal condition are consistent with work in cognitive psychology suggesting words depicting concrete objects and actions can activate spatial features with some modal content. For example, Zwaan and Yaxley (2003) showed that participants were faster to judge that pairs of words, such as "attic" and "basement," were related to one another when they were arranged iconically with "attic" above "basement" than for the reverse configuration. Behavioral research also supports interaction between language and the visual system, as words associated with either high ("hat") or low ("boot") regions of space modulate visual attention in those regions (Estes et al., 2008). Moreover, responses to words such as "head" and "foot" have been shown to be faster when they involve a movement that is congruent with the words' spatial associations than incongruent, suggesting that words can evoke spatial schemas with a motor component (Borghi et al., 2004).

Not only do results of the present study support behavioral demonstrations of cross-talk between language and overt motor behavior, they go beyond those findings by showing the impact of the movement manipulation on the real time processing of the words themselves. We suggest that the congruency effect shown in **Figure 2** reflects greater activity in motor or premotor cortex due to the incompatibility of the vertical hand movements with the vertical features evoked by the words. In support of this interpretation, the 200 ms onset of the congruency effect is coincident with the time at which semantic processing of action verbs begins to influence movement kinematics (Boulenger et al., 2006). The interval 200 ms post-onset was also the interval in which Hauk and Pulvermüller (2004) observed a somatotopic response profile in ERPs to action verbs such as "kick" and "lick," consistent with a generator in motor and premotor cortex (Shtyrov et al., 2004).

The timing and topography of the observed congruency effect were also reminiscent of differences in the electrophysiological response to action verbs associated with motor features, and concrete nouns associated with a preponderance of visual features. Relative to nouns, action verbs elicit enhanced positivities over fronto-central electrodes beginning 200 ms post-onset, argued to reflect greater activity in motor and premotor cortices (Preissl et al., 1995; Pulvermüller et al., 1999a). Similar effects have been observed in the electrophysiological response to nouns with predominantly motor vs. predominantly visual associations, indicating these positivities do not stem from syntactic differences between nouns and verbs (Pulvermüller et al., 1999b).

The results of the current study add to the findings reviewed above by showing that engaging the motor system during a reading task can change the early ERP response depending on whether the task supports or hinders embodied processing of a single word. In sum, the early congruency effect for literal words may reflect increased recruitment of motor and premotor cortex engendered by the conflicting demands of the movement task and the motoric features activated by the words. As such, data from the present study provide support for the embodiment claim that concrete concepts have a perceptuo-motor basis and recruit brain structures involved in perception and action.

### **N400 INTERVAL**

One perhaps surprising finding in the present study was that the experimental manipulation, that is, directing participants to move marbles either upwards or downwards, had no detectable effects on the portion of the ERP waveform most consistently linked to semantic processing: the N400 component. These data contrast with prior reports of N400 effects resulting from experimental manipulations that varied the availability of modal information (Kellenbach et al., 2000; Collins et al., 2011; Hald et al., 2011). The present study differed from previous work, however, in its focus on motoric aspects of word meaning, as opposed to perceptual ones, such as the auditory or visual features of objects (Chwilla et al., 2007; Collins et al., 2011). Moreover, research on the neural substrate of action verb processing suggests motoric features are activated 200–300 ms after word onset (Pulvermüller, 2005), and perhaps precede the activation of perceptual features that have been shown to influence the amplitude of the N400 component (Chwilla et al., 2007).

Although it is unwise to read too much into a null result, the absence of N400 congruency effects in the present study is a poor fit with strong embodied meaning models that suggest words elicit fast, automatic sensorimotor activations that operate similarly across contexts. Moreover, the absence of N400 congruency effects in the present study contrasts with the report that verbs such as "applauded" elicit greater N400 when the experimental task requires an incongruent response (involving a clenched fist) than a congruent one (involving an open hand) (Aravena et al., 2010; Ibáñez et al., 2013). One crucial difference between these ERP studies of the action sentence compatibility effect and the present study is our use of single words rather than sentences. The disparity between our findings here and those reported by Aravena et al. (2010) is easily reconciled by models of embodied meaning that suggest sensorimotor activations result from sentence- rather than word-level meaning construction (Zwaan, 2004). Unlike strong embodiment models, these models suggest that words prompt listeners to construct sensorimotor simulations that unfold over time. These weak embodiment models posit contextual variability in conceptual activations associated with words, and depict simulation as more of a strategic process (e.g., Lebois et al., 2014).

Supporters of a weak embodiment approach, then, might explain our failure to observe movement congruency effects by noting that the present study did not promote deep enough processing to impact the semantic retrieval operations indexed by the N400. Our task was simply to read the words while performing the movement task, and while participants were told that there would be a memory task, it was not administered until the very end of the ERP recording session. Behavioral research on spatial schemas evoked by words such as "sky" and "ground" indicates the emergence of spatial congruency effects depends on tasks that highlight these words' spatial features (Brookshire et al., 2010; Lebois et al., 2014). Future work should examine how tasks designed to increase the depth of processing impact early vs. late ERP congruency effects observed in the present study.

Another explanation of the absence of congruency effects 300–500 ms post-onset is that the design lacked adequate power. We find this unlikely, however, because we did observe reliable differences during this interval between metaphorically high words such as "delight" and metaphorically low words such as "agony" (see **Figure 3**). Given that no such difference was observed for literal high and low words that were matched in affective valence, we attribute this effect to the fact that the high words were positively valenced, while the low words were negative. The timing and topography of observed valence effects were in keeping with previous studies of affectively valenced words (see Fischler and Bradley, 2006 for a review), and did not interact with the movement condition.

### **LATE CONGRUENCY EFFECTS**

In contrast to the literally related words, metaphorically related words did not elicit movement congruency effects until after 500 ms, arguing against a strong embodied metaphor theory. Instead, metaphorical congruency effects emerged later, with incongruent words eliciting more positive ERPs than congruent ones in the latter part of the epoch. Congruency effects suggest that participants were sensitive to the purely metaphorical verticality of words such as "inspire" and "defeat." However, the lateness of these effects suggests they index processing that follows the initial access of meaning. Indeed, congruency effects for the metaphors presumably reflect the same sorts of intuitions that the participants in our norming study employed to assign a verticality rating to words such as "delight."

Interestingly, single words do not usually elicit late positive effects unless they are employed as a part of a judgment task. Late positivities are more commonly elicited by words in sentences and larger discourses (Van Petten and Luka, 2012), and the amplitude of the late positive complex has been argued to reflect the cost of integrating a word into the larger context (Brouwer et al., 2012). In the present study, the ongoing movement task presumably served as the context, and larger positivities to the incongruent words may reflect the need for additional processing when the direction of planned motion does not match the verticality of the presented word. Consistent with this interpretation, the amplitude of the late positivity is enhanced by response conflict (Doucet and Stelmack, 1999), especially in spatial compatibility paradigms (Leuthold and Sommer, 1999).

Late emerging congruency effects might thus be argued to support a weak form of embodied metaphor in which sensorimotor simulations arise in late stages of meaning processing, perhaps to guide pragmatic inferences. In keeping with this suggestion, late positivities in ERPs to linguistic stimuli have been associated with a number of pragmatic phenomena that require inferential operations, such as the comprehension of jokes (Coulson and Lovett, 2004), ironic remarks (Regel et al., 2010), metaphors (Coulson and Van Petten, 2002), and semantic novelty (Davenport and Coulson, 2011). Indeed, spatial schemas are often used to evoke pragmatic inferences, as in a performance evaluation that reads, "This employee has reached rock bottom, yet continues to dig."

Departing somewhat from strong embodied models such as that suggested by Gallese and Lakoff (2005), we suggest a more tempered version of embodied metaphor in which the deployment of sensorimotor simulations is not automatic but instead depends on strategic factors. Indeed, strong embodiment might be understood as adopting an overly reflexive model of lexical activation in which a given word gives rise to a definitive sensorimotor simulation. Many psycholinguists have eschewed the early idea that words automatically activate a fixed lexical entry, suggesting instead that they prompt context-sensitive retrieval from semantic memory (e.g., Coulson, 2006; Elman, 2009). Likewise, some advocates of grounded cognition have suggested that the linguistic activation of modal information varies as a function of context and task (Louwerse and Jeuniaux, 2008; Lebois et al., 2014).

On such a view, the relevance of applicable conceptual metaphors varies as a function of context in much the same way that the relevance of other conceptual information does. As a result, the mere reading of a word such as "wealth" will not necessarily elicit the spatial activations derived from the metaphor GOOD IS UP. Rather, contextual factors will render those associations more or less relevant. In the present study, the cognitive set induced by the movement task may have enhanced the salience of the orientation metaphor, making participants more sensitive to the congruency between concepts such as wealth and upwards-directed movements.

### **SUMMARY**

The early congruency effect for literal words may reflect increased recruitment of motor and premotor cortex engendered by the conflicting demands of the movement task and the motoric features activated by the words. As such, data from the present study provide support for the embodiment claim that concrete concepts have a perceptuo-motor basis and recruit brain structures involved in perception and action. By similar reasoning, the absence of comparable congruency effects for the metaphors argues against embodied metaphor theories that posit rapid, bottom-up activation of sensorimotor cortex as part of word comprehension. Late emerging congruency effects for metaphors are more consistent with weak embodiment that posits strategic connections between abstract concepts and spatial schemas. Whereas the early congruency effects reflect the impact of the movement task on processing word meaning, the late congruency effects may reflect the availability of spatial schemas for pragmatic inference.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.01031/abstract

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 August 2014; accepted: 08 December 2014; published online: 23 December 2014.*

*Citation: Bardolph M and Coulson S (2014) How vertical hand movements impact brain activity elicited by literally and metaphorically related words: an ERP study of embodied metaphor. Front. Hum. Neurosci. 8:1031. doi: 10.3389/fnhum.2014.01031*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Bardolph and Coulson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

