# FACING THE OTHER: NOVEL THEORIES AND METHODS IN FACE PERCEPTION RESEARCH

EDITED BY: Davide Rivolta, Aina Puce and Mark A. Williams

PUBLISHED IN: Frontiers in Human Neuroscience

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-794-1 DOI 10.3389/978-2-88919-794-1

## **FACING THE OTHER: NOVEL THEORIES AND METHODS IN FACE PERCEPTION RESEARCH**

Topic Editors:

**Davide Rivolta,** University of East London (UEL), UK

**Aina Puce,** Indiana University, USA

**Mark A. Williams,** Macquarie University, Australia

"facciaAfaccia" Image by IUCU

We rely heavily on faces during social interactions. Humans possess the ability to recognise thousands of people very quickly and accurately without effort. The serious social difficulties that follow abnormalities of the face recognition system (i.e., prosopagnosia) strongly underline the importance of typical face skills in our everyday life. Over the last fifty years, research on prosopagnosia, along with research in the healthy population, has provided insights into the cognitive and neural features behind typical face recognition. This has also been achieved thanks to non-invasive neuroimaging techniques such as functional Magnetic Resonance Imaging (fMRI), Electroencephalography (EEG), Magnetoencephalography (MEG), Diffusion Tensor Imaging (DTI) and Transcranial Magnetic Stimulation (TMS). However, there is still much debate about the cognitive and neural mechanisms of face perception.

In the current "Research Topic" we plan to gather experimental works, opinions, commentaries, mini-reviews and reviews that focus on new or novel theories and methods in face perception research. Where is the field at the moment? Do we need to re-think the experimental procedures we have adopted so far? Again, what kind of techniques (or combination of them) and analysis methods will be important in the future? From the experimental point of view we encourage both behavioural and neuroimaging contributions (e.g., fMRI, EEG, MEG, DTI and TMS).

Despite the main emphasis on face perception, memory and identification, we will also consider original works that focus on other aspects of face processing, such as expression recognition, attractiveness judgments and face imagery. In addition, animal investigations and experimental manipulations that alter face recognition abilities in typical human subjects (e.g., hypnosis) are also welcome. Overall, we are proposing a Research Topic that looks at face processing using different perspectives and welcome contributions from different domains such as psychology, neurology, neuroscience, cognitive science and philosophy.

The current "Research Topic" evolved from the desire to acknowledge the relatively recent loss of three giants in the field: Drs. Shlomo Bentin, Truett Allison and Andy Calder. We dedicate this "Research Topic" to them and their pioneering studies.

**Citation:** Rivolta, D., Puce, A., Williams, M. A., eds. (2016). Facing the Other: Novel Theories and Methods in Face Perception Research. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-794-1

# Table of Contents

*Editorial: Facing the Other: Novel Theories and Methods in Face Perception Research*

Davide Rivolta, Aina Puce and Mark A. Williams

## **Prosopagnosia**


Janina Esins, Johannes Schultz, Christian Wallraven and Isabelle Bülthoff


Tina T. Liu and Marlene Behrmann

*Multi-voxel pattern analysis (MVPA) reveals abnormal fMRI activity in both the "core" and "extended" face network in congenital prosopagnosia*

Davide Rivolta, Alexandra Woolgar, Romina Palermo, Marina Butko, Laura Schmalzl and Mark A. Williams

## **Reviews/Theories**

*The rehabilitation of face recognition impairments: a critical review and future directions*

Sarah Bate and Rachel J. Bennetts

*Face processing improvements in prosopagnosia: successes and failures over the last 50 years*

Joseph M. DeGutis, Christopher Chiu, Mallory E. Grosso and Sarah Cohan

*In your face: transcendence in embodied interaction*

Shaun Gallagher


## **Clinical Conditions**

*A parametric study of fear generalization to faces and non-face objects: relationship to discrimination thresholds*

Daphne J. Holt, Emily A. Boeke, Rick P. F. Wolthusen, Shahin Nasr, Mohammed R. Milad and Roger B. H. Tootell


## **Holistic/Configural Processing/Facial features**

*Using hypnosis to disrupt face processing: mirrored-self misidentification delusion and different visual media*

Michael H. Connors, Amanda J. Barnier, Max Coltheart, Robyn Langdon, Rochelle E. Cox, Davide Rivolta and Peter W. Halligan

*Individual differences in cortical face selectivity predict behavioral performance in face recognition*

Lijie Huang, Yiying Song, Jingguang Li, Zonglei Zhen, Zetian Yang and Jia Liu


Günter Meinhardt, Bozana Meinhardt-Injac and Malte Persike

*Differential age-related changes in N170 responses to upright faces, inverted faces, and eyes in Japanese children*

Kensaku Miki, Yukiko Honda, Yasuyuki Takeshima, Shoko Watanabe and Ryusuke Kakigi

*The face inversion effect in opponent-stimulus rivalry*

Malte Persike, Bozana Meinhardt-Injac and Günter Meinhardt

*Photographic but not line-drawn faces show early perceptual neural sensitivity to eye gaze direction*

Alejandra Rossi, Francisco J. Parada, Marianne Latinus and Aina Puce

*Own-race and own-age biases facilitate visual awareness of faces under interocular suppression*

Timo Stein, Albert End and Philipp Sterzer


## **Face identity, expression and body expression**


Andrew D. Engell and Gregory McCarthy

*Discriminable spatial patterns of activation for faces and bodies in the fusiform gyrus*

Na Yeon Kim, Su Mei Lee, Margret C. Erlendsdottir and Gregory McCarthy


Alla Yankouskaya, Glyn W. Humphreys and Pia Rotshtein

# Editorial: Facing the Other: Novel Theories and Methods in Face Perception Research

Davide Rivolta<sup>1</sup>\*, Aina Puce<sup>2</sup> and Mark A. Williams<sup>3</sup>

*<sup>1</sup> School of Psychology, University of East London, London, UK, <sup>2</sup> Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA, <sup>3</sup> Perception in Action Research Centre, and ARC Centre of Excellence in Cognition and its Disorders, Department of Cognitive Science, Faculty of Human Sciences, Macquarie University, Sydney, NSW, Australia*

Keywords: face recognition, holistic processing, Prosopagnosia, fMRI methods, EEG/MEG, fNIRS, Philosophy, MVPA

**The Editorial on the Research Topic**

### **Facing the Other: Novel Theories and Methods in Face Perception Research**

Human and non-human primates rely on information gathered from faces during social interaction. Two channels of information are gathered from the face—the identity of the individual (conveyed by featural and configural aspects of the face), as well as their mental state and potential intentions (conveyed by the dynamic face). With respect to identity, humans can recognize thousands of people who are familiar to them very quickly and accurately without effort. Similarly, new individuals can be identified often after only one previous encounter. With respect to the affective/mental states of others, these can be inferred from affective expressions as well as other facial signals such as gaze changes from both strangers and those who are well known to us.

That brain injuries can cause selective deficits in the recognition of identity was at the core of a now classic model of face perception proposed by Vicki Bruce and Andy Young in the mid-1980s (Bruce and Young, 1986). The original model also postulated a pathway for dealing with facial expressions. Indeed, further work on the affective aspects of face processing was conducted by Andy Calder in collaboration with Andy Young (Calder and Young, 2005). The elements of this model have been given a parallel processing and functional neuroanatomical bent, based not only on patient studies but also on functional neuroimaging investigations showing ventral visual pathway activity in healthy subjects (Haxby et al., 2000). Multivariate classification methods (e.g., MVPA) applied to fMRI data further indicate that category-specific patches of cortex, such as those responsive to faces, may be identified in the ventral visual system (Haxby et al., 2001), consistent with human intracranial neurophysiological studies (Puce et al., 1999).

In the current "Research Topic" we have gathered 33 works that include experimental studies, as well as hypothetical and theoretical contributions that review the various behavioral, neurophysiological, and hemodynamic correlates of face processing across the lifespan in both typical and atypical populations. It is our view that future important achievements in the field will likely be derived via interdisciplinary collaborations of scientists coming from different fields, as evidenced by the manuscripts in this volume that include contributions from social and cognitive neuroscientists, cognitive scientists, clinical psychologists, philosophers, as well as engineers, and physicists. Overall, face processing in healthy subjects formed the major corpus of manuscripts in the current "Research Topic." Prosopagnosia was the main theme of six research studies (see below) and two reviews on strategies to enhance face-processing skills (Bate and Bennetts; DeGutis et al.), thus representing the most investigated condition in this special issue on face processing.

#### Edited and reviewed by:

*Srikantan S. Nagarajan, University of California, San Francisco, USA*

## \*Correspondence:

*Davide Rivolta d.rivolta@uel.ac.uk*

Received: *02 December 2015* Accepted: *19 January 2016* Published: *08 February 2016*

#### Citation:

*Rivolta D, Puce A and Williams MA (2016) Editorial: Facing the Other: Novel Theories and Methods in Face Perception Research. Front. Hum. Neurosci. 10:32. doi: 10.3389/fnhum.2016.00032*

Clinical conditions such as social anxiety, epilepsy, attention deficit hyperactivity disorder (ADHD) and autism have also received attention in this "Research Topic". In individuals with social anxiety, skin conductance recordings show enhanced unconscious threat processing relative to those from neurotypical individuals (Jusyte and Schonenberg). Furthermore, emotion processing has been discussed in light of threat detection (Holt et al.) and in relation to the genesis and maintenance of psychopathology (Tanzer et al.). Finally, individuals with ADHD and autism have been differentiated using a novel classification method of hemodynamic responses (as measured with functional near infra-red spectroscopy; fNIRS; Ichikawa et al.).

Face perception, particularly with respect to gleaning an individual's identity, has long been proposed to engage a holistic/configural type of processing, which involves the analysis of the face as a whole, rather than processing individual features in isolation (Young et al., 1987; Maurer et al., 2002; McKone and Yovel, 2009). Holistic/configural processing has been inferred from, among other sources, studies of the "face inversion effect" (i.e., greater difficulty perceiving facial identity in inverted relative to upright faces; Yin, 1969) and the "composite face effect" (i.e., reduced facial identification performance when face halves are vertically aligned compared to when they are misaligned; Young et al., 1987). Profound facial identification deficits (i.e., prosopagnosia) are known to follow injuries to part of the ventral occipito-temporal cortex, seriously undermining the ability of the affected individuals to maintain normal social interactions (Barton, 2008; Rossion, 2008). A more puzzling problem is that of facial recognition deficits that have been present from birth in individuals with no known neurological disease, a condition known as congenital or developmental prosopagnosia (CP; Duchaine, 2000; Behrmann and Avidan, 2005; Rivolta et al., 2013). Face identity recognition can also be more difficult (i.e., increased response time, reduced performance) for neurotypical healthy subjects when the face belongs to an individual of a different race from their own; this is known as the "Other Race Effect" (ORE; Meissner and Brigham, 2001).

In the current "Research Topic", the importance of holistic/configural processing for typical face perception has been underlined in two contributions showing its impairment in both CP (Liu and Behrmann) and in acquired prosopagnosia (AP; Jansari et al.). Using tasks that tapped into holistic/configural processing, such as the composite faces task (Liu and Behrmann), the Navon task and the face-fracturing test (Jansari et al.), these studies demonstrate similar deficits in face processing in a group of individuals who have had face processing deficits since birth (i.e., CP) and in an individual who acquired his deficit much later in life (i.e., AP). In a contribution investigating the ORE, Esins et al. show that despite its apparent similarity to CP, the ORE and CP are unlikely to share the same cognitive mechanism.

Additionally, five studies in the current issue have attempted to further delineate characteristics of holistic/configural face processing in healthy subjects in terms of the experimental design of the composite face task (Meinhardt et al.), of the physical properties of the face itself (Persike et al.; Stein et al.), and of the familiarity of the face (Liccione et al.; Visconti di Oleggio Castello et al.). In sum, results demonstrate: (1) the validity of the complete design of the composite face task (Meinhardt et al.); (2) a stronger inversion effect for faces than for houses when assessed using opponent-stimulus rivalry (Persike et al.); (3) stronger face-detection mechanisms for same-race and same-age faces when assessed by continuous flash suppression (Stein et al.); (4) a critical role of face familiarity (especially for personally familiar people such as family members) in driving a stronger face-inversion effect (Liccione et al.) and in driving better detection of social cues (Visconti di Oleggio Castello et al.). Finally, novel philosophical accounts based on the phenomenological tradition (e.g., Heidegger, 1996; Merleau-Ponty, 2002) and on work from Levinas (e.g., Levinas, 1969) that mainly focus on the embodied nature of humans have been proposed to re-interpret studies of typical and atypical face processing (Gallagher; Liccione et al.).

Additionally, two further lines of investigation round out the studies dealing with the holistic/configural processing of faces: one using hypnosis (Connors et al.), and one adopting an individual-differences approach with very large samples (Huang et al.; Yovel et al.). These studies highlight that important information can be gleaned from between-subject variance in datasets. The evidence that face recognition ability varies across individuals and dissociates from other cognitive abilities is explored as a model that may result in the discovery of other specific abilities (Wilmer et al.).

Given the distributed system for processing face identity and expression in the human brain (Haxby et al., 2000), the current "Research Topic" also features a series of contributions that investigate interactions between face identity, face expression, and body expression in neurotypical subjects (Van den Stock and de Gelder; Vicario and Newman; Yankouskaya et al.), and in individuals with CP (Daini et al.). Results in control participants demonstrated that: (1) task-irrelevant bodily expressions influence face-identity matching performance (Van den Stock and de Gelder); (2) emotional-face primes affect the perception of emotional hand gestures (Vicario and Newman); (3) face identity and expression interact when assessed with the Garner paradigm, the composite face task, and the divided attention tasks (Yankouskaya et al.). In contrast, people with CP were impaired in detecting the identity of unfamiliar faces, but not in the detection of non-emotional facial expressions, thus suggesting a dissociation between changeable and invariant configural processing in CP (Daini et al.). Additionally, Kim et al. show that MVPA is more sensitive than traditional univariate analysis for characterizing the spatial distribution of face- and body-specific activations in the human brain. These results have been corroborated in a second paper (Rivolta et al.) that additionally demonstrated aberrant face versus object activation patterns in CP compared to typical face recognizers. Intracranial EEG recordings in drug-resistant epileptic patients suggest that eye-sensitive brain regions are actually more abundant and more selective than brain regions that are face- and body-sensitive (Engell and McCarthy).

Neurophysiological studies over twenty years ago demonstrated that a specific negative potential at around 170 ms post-stimulus onset can index aspects of face processing. This ERP was first demonstrated in scalp EEG recordings by Shlomo Bentin and his colleagues, and is known as N170 (Bentin et al., 1996), and in intracranial EEG (N200) by Truett Allison and his team (Allison et al., 1994). The magnetic analog of N170 can also be recorded with magnetoencephalography (MEG), and this entity is known as M170 (Liu et al., 2002; Rivolta et al.). N/M170 is not the only component to show face-sensitive properties; other ERP components have also been described in adults and also in older children (Taylor et al., 2004; Rivolta et al., 2012, 2014; Rossion, 2014). The later components involved in recollection and familiarity of faces were also explored in CPs, demonstrating abnormal neural processing during face recognition in these individuals compared to controls (Burns et al.).

How can the neurophysiological data inform our understanding of face processing in the human brain? Hemodynamic studies have identified the neuroanatomical substrates for face processing in the human brain. MEG and EEG studies have the capability to characterize the timing underlying these processes (Buzsáki et al., 2012). In the current "Research Topic," N/M170, and other face components, were studied during holistic/configural processing (Marinkovic et al.; Vakli et al.). Reduced gender-adaptation from stretched faces (a manipulation that affects holistic/configural processing) as compared to normal faces (Vakli et al.), and increased and delayed M170 in the right posterior fusiform gyrus (Marinkovic et al.) for inverted faces were found (in line with earlier scalp EEG studies, e.g., Bentin et al., 1996). N170 recordings to eyes and upright and inverted faces in Japanese children indicate that an adult neurophysiological pattern is not seen in children that are younger than 13 years of age (Miki et al.). Interestingly, Nakabayashi and Liu have re-examined the developmental behavioral literature and make the claim that holistic processing is present in early childhood, indicating that some future studies will need to reconcile behavioral and neurophysiological data.

Social context influences how neurophysiological activity to emotional expressions manifests. Specifically, as early as N170, augmentation of the neural response occurs to non-neutral expressions in faces that have been designated as future partners for a social interaction. These data clearly indicate how top-down processing can modulate sensory activity (Bublatzky et al.).

As already noted, neurophysiological methods can identify the timing of neural activity and its dynamics. Given that this is the case, these methods are ideal for studying activity elicited to dynamic faces. Rossi and colleagues show that augmented N170s to viewed dynamic gaze aversions occur to real but not impoverished faces, suggesting that local scleral/iris luminance and contrast play a role in generating these responses. Additionally, bursts of gamma activity at around 200 and 300 ms post-motion onset may signal detection of facial motion (Rossi et al.). There is a need for more studies evaluating both the dynamics of the MEG and EEG signals and ERP measures so that the earlier and more recent literatures can be bridged. In a similar fashion, comparing data in the same subjects viewing static and dynamic faces (the former in highly controlled lab setting and the latter in more ecologically valid contexts) is greatly needed.

The current "Research Topic" evolved from the desire to acknowledge the relatively recent loss of three giants in the field: Drs. Shlomo Bentin, Truett Allison, and Andy Calder. Shlomo Bentin was fascinated by the holistic/configural aspect of face processing, Andy Calder was stimulated to study how the brain deals with affective facial information, and Truett Allison was interested in the functional neuroanatomy of both facial processing streams: identity and affect. All three scientists were known for working with multiple assessment methods and varied subject populations. We dedicate this "Research Topic" to them and their pioneering studies.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Rivolta, Puce and Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Recognition memory in developmental prosopagnosia: electrophysiological evidence for abnormal routes to face recognition

**Edwin J. Burns\*, Jeremy J. Tree and Christoph T. Weidemann\***

Department of Psychology, Swansea University, Swansea, Wales, UK

#### **Edited by:**

Davide Rivolta, University of East London, UK

#### **Reviewed by:**

Brad Duchaine, Dartmouth College, USA; Ciara Mary Greene, University College Cork, Ireland

#### **\*Correspondence:**

Edwin J. Burns and Christoph T. Weidemann, Department of Psychology, Swansea University, Singleton Park, Swansea SA2 8PP, Wales, UK; e-mail: edwinjamesburns@gmail.com; ctw@cogsci.info

Dual process models of recognition memory propose two distinct routes for recognizing a face: recollection and familiarity. Recollection is characterized by the remembering of some contextual detail from a previous encounter with a face whereas familiarity is the feeling of finding a face familiar without any contextual details. The Remember/Know (R/K) paradigm is thought to index the relative contributions of recollection and familiarity to recognition performance. Despite researchers measuring face recognition deficits in developmental prosopagnosia (DP) through a variety of methods, none have considered the distinct contributions of recollection and familiarity to recognition performance. The present study examined recognition memory for faces in eight individuals with DP and a group of controls using an R/K paradigm while recording electroencephalogram (EEG) data at the scalp. Those with DP were found to produce fewer correct "remember" responses and more false alarms than controls. EEG results showed that posterior "remember" old/new effects were delayed and restricted to the right posterior (RP) area in those with DP in comparison to the controls. A posterior "know" old/new effect commonly associated with familiarity for faces was only present in the controls whereas individuals with DP exhibited a frontal "know" old/new effect commonly associated with words, objects and pictures. These results suggest that individuals with DP do not utilize normal face-specific routes when making face recognition judgments but instead process faces using a pathway more commonly associated with objects.

**Keywords: prosopagnosia, face recognition, recognition memory, familiarity, recollection, electroencephalogram (EEG)**

## **INTRODUCTION**

Prosopagnosia is a selective face perception disorder characterized by an impairment in recognizing faces combined with intact low level visual processing (Bodamer, 1947). It had been thought until recently that prosopagnosia was a rare disorder, with the vast majority of identified cases acquiring problems with face recognition following some form of brain injury (Farah, 1990). However, cases with no evidence of neurological injury have been identified in recent years (e.g., de Haan, 1999; Duchaine, 2000; Duchaine et al., 2003). These latter cases have become known as Congenital or Developmental Prosopagnosia (DP). It has been suggested that as many as 1 in 40 of the population meets the criteria for DP (Kennerknecht et al., 2006), with some cases appearing to run in families (Duchaine et al., 2007; Grueter et al., 2007). While individuals with DP exhibit difficulties in recognizing faces, many, but not all, have been shown to possess normal attractiveness processing (Carbon et al., 2010), as well as intact recognition abilities for eye gaze (Duchaine et al., 2009), face emotion (Duchaine et al., 2003; Humphreys et al., 2007), face motion information (Steede et al., 2007; Longmore and Tree, 2013) and greebles (artificial objects designed to be processed holistically like a face; Duchaine et al., 2004).

Face recognition deficits associated with prosopagnosia have been studied using a wide variety of methods: forced choice tasks (e.g., Duchaine and Nakayama, 2006; Rivolta et al., 2012), familiarity judgments (e.g., Kress and Daum, 2003; Grueter et al., 2007) or recall tests for semantic information related to faces such as a name or profession (e.g., Grueter et al., 2007). Dual process models of recognition memory (e.g., Atkinson and Juola, 1973, 1974; Mandler, 1980; Jacoby, 1991; Yonelinas, 1994) propose that there are two distinct routes with which one can recognize a previously seen face: familiarity and recollection. Most of us can relate to the experience of meeting someone and finding their face familiar but, rather frustratingly, being unable to remember any details from when or where one might have met them; this is an example of familiarity based recognition. Recollection on the other hand is characterized by remembering some form of contextual detail, such as specific previous encounters. Traditional dual process models propose that familiarity can vary in strength whereas recollection is usually assumed to be an all-or-nothing, high strength memory (Yonelinas, 2002; for an alternative perspective on the nature of recollection, see Donaldson, 1996; Wixted, 2007; Wixted and Mickes, 2010).

A raft of behavioral, neuropsychological, electrophysiological and neuroimaging studies have provided evidence in support of this dissociation between familiarity and recollection (for reviews, see Yonelinas, 2002; Aggleton and Brown, 2006; Diana et al., 2007). One behavioral method for dissociating familiarity and recollection is the Remember/Know (R/K) procedure (Tulving, 1985). Participants are asked to study a series of items and are then tested on the studied target items along with previously unknown lures. Participants are required to judge each test item as "Remember" (they can recollect some detail of the item from study), "Know" (they know they have seen the item in the previous list but cannot recollect any details of its presentation) or "New" (the item was not on the previous list). It is thought that "remember" responses reflect the recollection process whereas "know" responses measure the contribution of familiarity (Yonelinas, 2002). This suggests that remember responses are associated with high confidence due to the high strength of memory that recollecting details surrounding an item's previous occurrence brings (Eichenbaum et al., 2007). Know responses, however, engender a more pliable level of confidence due to the fact that familiarity can vary in memory strength (Eichenbaum et al., 2007). The R/K procedure has been successful in dissociating recollection and familiarity effects in electrophysiological (Düzel et al., 1997) and neuroimaging studies (Henson et al., 1999). The present study is the first to use the R/K paradigm to study the recognition of previously unknown faces in individuals with DP.
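The scoring logic of the R/K procedure described above can be illustrated with a minimal sketch. It assumes each test trial is recorded as a pair of an old/new flag and one of the three responses; the function name `score_remember_know`, the data layout and the example counts are hypothetical and are not taken from the study.

```python
from collections import Counter

def score_remember_know(trials):
    """Return hit and false-alarm rates for "remember" and "know" responses.

    trials: iterable of (is_old, response) pairs, where is_old is True for a
    studied target and False for a lure, and response is one of
    "remember", "know" or "new". This layout is an illustrative assumption.
    """
    trials = list(trials)
    old = Counter(resp for is_old, resp in trials if is_old)
    new = Counter(resp for is_old, resp in trials if not is_old)
    n_old = sum(old.values())
    n_new = sum(new.values())
    if n_old == 0 or n_new == 0:
        raise ValueError("need at least one target and one lure trial")
    return {
        # Hits: "remember"/"know" responses given to studied targets.
        "remember_hit_rate": old["remember"] / n_old,
        "know_hit_rate": old["know"] / n_old,
        # False alarms: "remember"/"know" responses given to unstudied lures.
        "remember_false_alarm_rate": new["remember"] / n_new,
        "know_false_alarm_rate": new["know"] / n_new,
    }

# Made-up example: 10 targets and 10 lures.
example_trials = (
    [(True, "remember")] * 5 + [(True, "know")] * 3 + [(True, "new")] * 2
    + [(False, "remember")] * 1 + [(False, "know")] * 2 + [(False, "new")] * 7
)
print(score_remember_know(example_trials))
```

Keeping hits and false alarms separate for each response type mirrors the way "remember" and "know" judgments are treated as indices of recollection and familiarity, respectively.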

Traditionally, event related potential (ERP) studies of pictures (e.g., Tsivilis et al., 2001), objects (e.g., Duarte et al., 2004; Groh-Bordin et al., 2006) and words (e.g., Curran, 2000; Maratos et al., 2000) have found familiarity to be associated with early enhanced positivity over frontal regions between 300–500 ms after test stimulus onset, whereas later positivity over parietal sites between 500–700 ms indicates recollection. However, recent ERP studies examining recognition memory for previously unknown faces have suggested that familiarity and recollection for faces might differ temporally and neurally from those for words and objects (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007; Herzmann et al., 2011). These results contribute to the ample evidence suggesting that faces are special stimuli processed differently from other objects (for a review, see McKone and Robbins, 2011). By using an adapted R/K procedure, Yovel and Paller (2004) found that familiarity for faces was associated with a parietal old/new effect between 300–700 ms, whereas recollection for faces was associated with similar positivity over the posterior of the scalp, but also some anterior regions during the same time period. Recollection and familiarity were also found to be maximal between 500–700 ms after stimulus onset. A study by MacKenzie and Donaldson (2007) also found spatially and temporally similar familiarity and recollection ERP effects for faces. In contrast to these studies, Curran and Hancock (2007) found face related ERP effects similar to those for words, pictures and objects. These results might be due to their participants recognizing face images on the basis of extraneous information in the images rather than the facial features. In a follow-up study, Herzmann et al. (2011) showed ERP effects for faces in line with the earlier work cited above when extraneous cues were excluded from face images. These results suggest that the removal of any such extraneous cues from face images is important for the study of face processing, and they are consistent with previous work showing that general object processing can be dissociated from that of faces (McNeil and Warrington, 1993; Farah et al., 1995; Moscovitch et al., 1997).

The present study examines recognition memory for faces in those with normal face recognition abilities and individuals with DP in order to determine the relative contributions of recollection and familiarity to performance in these two groups. Moreover, the use of electroencephalogram (EEG) measures enables us to determine the degree to which differences in performance across these two groups reflect qualitative (rather than just quantitative) differences in face processing.

## **METHODS**

### **PARTICIPANTS**

Eight individuals with DP and 20 control participants took part in this study. Four of the individuals with DP and 11 of the control participants were female. The ages of the individuals with DP ranged from 20–38 years (*M* = 25.6 years) and those of the control participants ranged from 18–40 years (*M* = 24.5 years). All participants had normal or corrected to normal vision. One of the individuals with DP and 2 of the control participants were left-handed. Data from 1 control participant were rejected from all analyses because behavioral performance appeared to be at chance levels. Nine controls failed to correctly respond "know" on enough trials to create reliable ERP waveforms for these responses and were excluded from the ERP analyses described below (we confirmed that their ERPs for correct "remember" responses matched those for the remaining 10 controls and their choice responses are included in **Tables 1**–**4**). The ERPs for the control group are based on 5 male and 5 female participants between the ages of 19 and 40 years (*M* = 27.9), one of whom was left-handed. Ethical approval for this study was granted by the departmental Ethics Committee at Swansea University.

In line with previous researchers (Duchaine et al., 2007; Bate et al., 2008), we used a battery of neuropsychological tests (described in detail below) to diagnose DP. Unless noted otherwise, we took the appropriate norms from the respective research publications. **Table 1** displays the DP cases that participated in this experiment and their neuropsychological tests of face processing impairment. The Famous Faces Test (FFT; Duchaine and Nakayama, 2005) consists of 60 celebrity faces which the participant is required to name or identify in some way. We collected FFT data from 164 participants (101 female) using a shortened FFT (35 faces) in a separate study from the present one to ascertain normative means and SDs for the general population in the local geographical area (*M* = 94.6%, *SD* = 6.23). As can be seen from **Table 1**, all of the DP cases were severely impaired at recognizing famous faces. The Cambridge Face Memory Test (CFMT; Duchaine and Nakayama, 2006) requires the participant to memorize six target faces presented in a number of different views; these faces must then be identified when displayed individually with two distractor faces. We only recruited DP cases that showed an impairment of two SDs or more below the mean in both the CFMT and FFT. During the Cambridge Face Perception Test (CFPT; Duchaine et al., 2007), participants are shown a target face presented in three-quarter view along with six faces presented in frontal view; these six faces have been morphed to appear similar in varying percentages to the target face. Participants are required to arrange the faces in order of similarity to the target face. The test displays faces either upright or inverted. As can be seen from **Table 1**, five of the DP participants were impaired on the CFPT with a sixth case approaching 2 SDs below the mean; it should be noted that a diagnosis of prosopagnosia is not reliant upon impairment on this task. We also screened control participants for prosopagnosia by administering the CFMT and confirmed that all *z*-scores were within the normal range (−1.5 to 1.4, *M* = −0.36).

**Table 1 | Neuropsychological testing results of the 8 DP cases: Famous Faces Test (FFT), Cambridge Face Memory Test (CFMT), Cambridge Face Perception Test upright and inverted (CFPTupr and CFPTinv).**

#### **Table 2 | Mean accuracy and proportion of correct and incorrect responses (with standard errors).**

#### **Table 3 | Discriminability (with standard errors).**

#### **Table 4 | Mean response times (RTs) of correct and incorrect responses in ms (standard errors)**.

**FIGURE 1 | Mock-up examples of the male and female face stimuli used**.

### **STIMULI**

Experimental stimuli consisted of 324 photographic bitmap images of faces, half of which were male. **Figure 1** shows mock-up examples of two such stimuli. All faces were unknown to the participants. The faces were presented in the center of a black background on a 14-inch color monitor. The stimuli subtended horizontal and vertical visual angles of approximately 3.9° and 5.4°, respectively. In addition, each face was masked to remove the original background, hair, and ears, i.e., cues that could lead to recognition not based upon the face itself. Luminance of each face was homogenized for the same purpose.

### **PROCEDURE**

Following application of electrodes (described below), participants were seated on a comfortable chair in a dimly lit booth. The participants faced a computer screen at a distance of approximately 90 cm, with the response buttons placed comfortably within reach to record responses. Participants were fully instructed prior to a practice session consisting of a study and test phase. Before the beginning of any study or test phase, the instructions for each task were repeated to remind participants as to what was required. Between phases, participants were also reminded to remain as still as possible and to fixate centrally throughout stimulus presentation.

The experiment comprised 27 blocks of study and test lists. At study, participants were asked to remember the faces as best they could and were told that their memory for the faces would be tested in a subsequent test phase. In each study phase, participants viewed four repetitions of six face images (half of which were male) for a total of 24 trials. Presentation of the face images was random, subject to the constraints that all six faces had to be presented before the next round of repetitions and that no faces repeated across blocks. Each trial consisted of a white fixation cross presented for either 450 or 550 ms, followed by the presentation of a face image for 2500 ms. A 500 ms blank screen then followed prior to the presentation of the next trial.
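The ordering constraint for the study phase can be illustrated with a short sketch. This is not the authors' presentation software; the face identifiers and the function name `make_study_sequence` are hypothetical, and the code only assumes what the text states: six faces per block, four rounds of repetition, and a fresh random order within each round. The final line checks that 27 blocks of six studied and six new faces account for the 324 stimuli.

```python
import random

N_BLOCKS = 27          # study/test blocks (from the text)
FACES_PER_STUDY = 6    # studied faces per block (from the text)
NEW_PER_TEST = 6       # unstudied lure faces per block (from the text)
REPETITIONS = 4        # rounds of repetition in each study phase


def make_study_sequence(faces, repetitions=REPETITIONS, rng=None):
    """Return a study order in which every face appears once per round."""
    rng = rng or random.Random()
    sequence = []
    for _ in range(repetitions):
        round_order = list(faces)
        rng.shuffle(round_order)      # random order within the round
        sequence.extend(round_order)  # a round finishes before the next starts
    return sequence


# Hypothetical face identifiers for one block.
faces = [f"face_{i:02d}" for i in range(FACES_PER_STUDY)]
study = make_study_sequence(faces, rng=random.Random(0))

# Total number of distinct faces needed if no face repeats across blocks.
total_faces = N_BLOCKS * (FACES_PER_STUDY + NEW_PER_TEST)
```

Shuffling within rounds, rather than shuffling all 24 trials at once, is what guarantees that all six faces are shown before any face is repeated again.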

All faces displayed during the previous study phase, and the six new faces were presented in a random order at test (subject to the constraint that no faces repeated across blocks). Participants were asked to decide whether each face had been presented in the previous study phase, or not, by pressing "remember" if they could remember specific details from the study phase, "know" if they thought the face was encountered in the previous study phase but without remembering any details, or "new" with the first three fingers of their dominant hand (the mapping between buttons and responses was counterbalanced across all participants). Each trial consisted of a white fixation cross presented for either 450 or 550 ms, followed by the presentation of a face image for 2000 ms. Following the face, a white fixation cross would appear again for 150 ms and then a screen prompting participants to respond "remember", "know" or "new" would appear; this screen would remain on screen until a response was made. Participants could not respond until this response prompt screen had appeared. After a response was made, another fixation screen would appear for 150 ms followed by another screen prompting participants to rate on a scale of 1–6 how confident they were of their previous response.
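As a reading aid, the mapping from test responses to the outcome categories analyzed in the Results can be sketched as follows. The function name `score_trial` and the category labels are illustrative assumptions, not taken from the study's analysis code.

```python
VALID_RESPONSES = {"remember", "know", "new"}


def score_trial(studied, response):
    """Classify one test trial.

    studied  -- True if the face appeared in the preceding study phase
    response -- "remember", "know" or "new"
    """
    if response not in VALID_RESPONSES:
        raise ValueError(f"unexpected response: {response!r}")
    if studied:
        # Old face: "remember"/"know" are hits, "new" is a miss.
        return "miss" if response == "new" else f"hit_{response}"
    # Lure face: "new" is a correct rejection, anything else a false alarm.
    return "correct_rejection" if response == "new" else f"false_alarm_{response}"
```

For example, a studied face answered with "know" counts as a "know" hit, whereas an unstudied face answered with "know" counts as a "know" false alarm.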

### **EEG RECORDING**

We recorded electrophysiological data throughout the experiment. The recording at scalp was taken from 128 Ag-AgCl "active" electrodes set in an elastic Biosemi (Amsterdam, the Netherlands) cap. Each electrode was set within the cap in equidistant concentric circles from the 10–20 system position Cz (Jasper, 1958). The horizontal electro-oculogram (EOG) was recorded from electrodes placed on the outer canthi of each eye. The vertical EOG was recorded from an electrode placed below the left eye. The EEG was recorded referenced to a common mode sense (CMS) electrode, and then re-referenced offline to a common average reference through the use of Brain Electrical Source Analysis (BESA) software (MEGIS software GmbH, Graefelfing, Germany). All electrode channels were band pass filtered from 0.01 to 40 Hz. The analogue signal was digitally sampled at a rate of 512 Hz. ERPs were time locked to the presentation of stimuli, with an epoch that began 200 ms prior to stimulus onset and lasted for 1000 ms post-stimulus. Epochs found to contain EOG artifacts exceeding ±100 µV were rejected from analysis, as were trials where drift from baseline (difference between first and last data point) was greater than 50 µV. We retained data only from those participants with at least 20 remaining trials in each of the experimental conditions of interest. Blink artifacts were corrected using the algorithm implemented in BESA (Berg and Scherg, 1994).
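The rejection criteria above can be expressed as a minimal sketch. The actual processing was carried out in BESA, so this is only an illustration of the stated thresholds. It assumes that the drift check is applied to each EEG channel separately and as an absolute difference, which the text does not specify; the function names and data layout are hypothetical.

```python
EOG_LIMIT_UV = 100.0    # reject if any EOG sample exceeds +/-100 microvolts
DRIFT_LIMIT_UV = 50.0   # reject if |last - first| sample exceeds 50 microvolts
MIN_TRIALS = 20         # minimum usable trials per condition per participant


def keep_epoch(eeg_channels, eog_channels):
    """Return True if an epoch passes both rejection criteria.

    Each argument is a list of channels; each channel is a list of
    baseline-corrected samples in microvolts.
    """
    eog_ok = all(abs(v) <= EOG_LIMIT_UV for ch in eog_channels for v in ch)
    # Assumption: drift is evaluated per EEG channel, in absolute terms.
    drift_ok = all(abs(ch[-1] - ch[0]) <= DRIFT_LIMIT_UV for ch in eeg_channels)
    return eog_ok and drift_ok


def enough_trials(trial_counts, minimum=MIN_TRIALS):
    """Return True if every condition of interest has enough retained trials."""
    return all(n >= minimum for n in trial_counts.values())
```

A participant with, say, 45 usable "remember" trials but only 12 usable "know" trials would fail `enough_trials`, which mirrors the exclusion of controls with too few correct "know" responses.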

## **RESULTS**

### **BEHAVIORAL RESULTS**

**Table 2** displays the percentage of hits, that is, the correct identification of a studied face as studied, from the control and DP participants. Between samples *t*-tests comparing the two groups revealed significant differences for the hits [*t*(25) = 4.52, *SE* = 2.89, *p* = 0.009], suggesting that the control participants were better at identifying studied faces as having been previously seen when compared to the individuals with DP. The mean proportions of response types for hits for the controls and those with DP are also shown in **Table 2**. A mixed within-between subject ANOVA of Group (DP, control) × Response ("remember", "know") revealed a significant Group × Response interaction [*F*(1,25) = 7.84, *MSE* = 5363.29, *p* = 0.01] and a significant effect of Response [*F*(1,25) = 10.74, *MSE* = 7346.11, *p* = 0.003]. Paired samples *t*-tests revealed that the control participants made significantly more "remember" than "know" responses when correctly identifying an old face as previously seen [*t*(18) = 5.315, *SE* = 8.91, *p* < 0.001], and no significant differences in response proportions for individuals with DP [*t*(7) = 0.332, *SE* = 11.22, *p* = 0.75]. Between samples *t*-tests revealed significant differences between the individuals with DP and control participants in their proportion of "remember" responses [*t*(25) = 2.8, *SE* = 7.79, *p* = 0.01]. These results show that when control participants correctly identified previously studied faces, they did so more frequently using "remember" responses than individuals with DP.

**Table 2** also displays the percentage of false alarms, that is, the incorrect identification of a previously unknown lure face as studied, from the control and DP participants. Between samples *t*-tests comparing the two groups revealed significant differences for the false alarms [*t*(25) = −4.21, *SE* = 4.73, *p* < 0.001], suggesting that the DP participants were more likely to identify an unstudied face as studied in comparison to the controls. Also displayed in **Table 2** is the mean proportion of incorrect identification of test faces as studied (false alarms). A mixed within-between subject ANOVA of Group (DP, control) × Response ("remember", "know") revealed a significant effect of Response [*F*(1,25) = 44.26, *MSE* = 43514.79, *p* < 0.001]. Paired samples *t*-tests revealed that both groups were more likely to incorrectly identify a previously unknown face as being studied using a "know" response rather than a "remember" response [*t*(18) = 4.247, *SE* = 11.78, *p* < 0.001, and *t*(7) = 13.568, *SE* = 5.48, *p* < 0.001], for control participants and individuals with DP respectively.

**Table 3** displays the mean discriminability (hits − false alarms; Donaldson, 1996). A discriminability score of 0 corresponds to no discrimination between studied and new items. A between samples *t*-test revealed significant differences in discriminability between the DP and control participants [*t*(25) = 4.98, *SE* = 0.25, *p* < 0.001]. This suggests that individuals with DP found it harder than controls to discriminate between old and new faces. Between samples *t*-tests also revealed that for "remember" responses, control participants were more effective than those with DP at discriminating old and new faces [*t*(25) = 2.78, *SE* = 0.35, *p* = 0.01], whereas we found no difference in discriminability for "know" responses [*t*(25) = 0.82, *SE* = 0.27, *p* = 0.42]. One sample *t*-tests revealed that "remember" responses significantly discriminated old and new faces [*t*(18) = 11.11, *SE* = 2.42, *p* < 0.001, and *t*(7) = 12.48, *SE* = 1.45, *p* < 0.001], for the control and DP participants respectively. Neither group, however, reliably discriminated old and new faces when responding "know" [*t*(18) = 1.43, *SE* = 0.24, *p* = 0.169, and *t*(7) = 0.392, *SE* = 0.35, *p* = 0.78], for the control and DP participants respectively.
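The discriminability index described above (hits minus false alarms) can be sketched as follows. The counts are invented purely for illustration and are not data from this study. The number of old and new test faces (27 blocks × 6 each) follows from the Procedure. The exact scale used in **Table 3** is not stated in this section, so the sketch works with proportions as an assumption.

```python
N_OLD = 27 * 6   # studied faces shown at test across all blocks
N_NEW = 27 * 6   # unstudied lure faces shown at test across all blocks


def discriminability(hits, false_alarms, n_old=N_OLD, n_new=N_NEW):
    """Hit rate minus false alarm rate; 0 means no old/new discrimination."""
    return hits / n_old - false_alarms / n_new


# Hypothetical counts for one participant, split by response type.
overall = discriminability(hits=120, false_alarms=30)
remember_only = discriminability(hits=80, false_alarms=5)
know_only = discriminability(hits=40, false_alarms=25)
```

Computing the index separately for "remember" and "know" responses, as in the last two lines, corresponds to the per-response comparisons reported above.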

The response times for correct "remember" and "know" responses across the two groups are displayed in **Table 4**. A mixed within-between subject ANOVA of Group (DP, control) × Response ("remember", "know") revealed a significant Group × Response interaction [*F*(1,25) = 12.86, *MSE* = 2622262.51, *p* = 0.001]. Within groups *t*-tests revealed that control participants and individuals with DP responded significantly faster with "remember" than "know" for previously studied faces [*t*(18) = −3.46, *SE* = 169.75, *p* = 0.003, and *t*(7) = −4.919, *SE* = 78.04, *p* = 0.002, respectively]. There were no significant response time differences between the two groups for correct "remember" [*t*(25) = 0.99, *SE* = 183.24, *p* = 0.23], and correct "know" [*t*(25) = 1.29, *SE* = 292.85, *p* = 0.21], responses.

**Table 4** also displays the incorrect "remember" and "know" responses across the two groups. A mixed within-between subject ANOVA of Group (DP, control) × Response ("remember", "know") revealed no significant effects of Response [*F*(1,25) = 2.53, *MSE* = 729116, *p* = 0.126], or Response × Group [*F*(1,25) = 0.032, *MSE* = 9306, *p* = 0.86]. Pairwise comparisons revealed no significant differences between response times for incorrect "remember" responses across groups [*t*(23) = 1.68, *SE* = 0.400, *p* = 0.11], but individuals with DP made incorrect "know" responses significantly faster than the control participants [*t*(25) = 2.148, *SE* = 322, *p* = 0.04]. Pairwise comparisons revealed that the correct "remember" responses were faster than incorrect "remember" responses in the control group [*t*(16) = 3.16, *SE* = 208, *p* = 0.006], but not the DP group [*t*(7) = 1.11, *SE* = 131, *p* = 0.3]. There were no significant differences between response times for correct vs. incorrect "know" responses in either the controls [*t*(18) = 1.38, *SE* = 221, *p* = 0.18], or DP group [*t*(7) = 0.128, *SE* = 68, *p* = 0.9].

Overall, the pattern of performance for the two groups in this task suggests that (a) recognition memory for faces in individuals with DP was clearly impaired relative to the control participants; (b) control participants showed the typical pattern of a greater proportion of "remember" than "know" responses (consistent with other work: Yovel and Paller, 2004; MacKenzie and Donaldson, 2007, 2009); whilst (c) individuals with DP showed no preference for "remember" responses.

### **ELECTROPHYSIOLOGICAL RESULTS**

#### **ERP effects commonly associated with recognition memory for faces**

For analyses, we divided the central scalp area into four a-priori regions of interest at time intervals of 300–500 ms and 500–700 ms (cf. Yovel and Paller, 2004; MacKenzie and Donaldson, 2007) as recollection and familiarity for faces were previously found to occur across both these time windows. The main regions of focus were across the left and right hemispheres from anterior (left hemisphere: D2, D12, D13; right hemisphere: C2, B31, B32) and posterior sites (left hemisphere: D16, D17, D28; right hemisphere: B2, B18, B19). These electrodes were chosen as they would capture the enhanced positivity exhibited for familiarity and recollection of faces as identified by previous research (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007). These sites would also allow us to examine possible topographical differences between where these effects occur in those with DP and those with intact face recognition skills. **Figure 2** displays the locations of these electrodes in the Biosemi cap system.
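A minimal sketch of how mean amplitudes for these regions and time windows could be computed is given below. The electrode groupings, sampling rate (512 Hz) and epoch start (−200 ms) come from the text. The data layout (a dictionary of per-channel averaged waveforms), the half-open sample windows and the function names are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 512
EPOCH_START_MS = -200   # epochs begin 200 ms before stimulus onset

ROIS = {
    "LA": ["D2", "D12", "D13"],    # left anterior
    "RA": ["C2", "B31", "B32"],    # right anterior
    "LP": ["D16", "D17", "D28"],   # left posterior
    "RP": ["B2", "B18", "B19"],    # right posterior
}
WINDOWS_MS = {"early": (300, 500), "late": (500, 700)}


def window_indices(start_ms, end_ms):
    """Convert a post-stimulus window in ms to sample indices in the epoch."""
    first = round((start_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)
    last = round((end_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)
    return first, last


def roi_mean_amplitude(erp, roi, window):
    """Average an ERP over a region's electrodes and a time window.

    erp -- dict mapping channel name to a list of samples in microvolts
    """
    first, last = window_indices(*WINDOWS_MS[window])
    values = [v for ch in ROIS[roi] for v in erp[ch][first:last]]
    return sum(values) / len(values)
```

One value per participant, response type, region and window obtained this way is the kind of input that the ANOVAs and pairwise comparisons in the following sections operate on.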

#### **ANOVAs from the four scalp locations**

We performed mixed within-between subject ANOVAs with factors of Correct Response (remember, know, correct rejections), Location (anterior, posterior), Hemisphere (left, right) and Group (control, DP) on the data from the 300–500 ms and 500–700 ms time windows. In the 300–500 ms time window we found a main effect of Location [*F*(1,16) = , *MSE* = 31.71, *p* = 0.001], a significant Location × Hemisphere interaction [*F*(1,16) = 12.84, *MSE* = 7.51, *p* = 0.002] and a Response × Hemisphere × Group interaction that approached significance [*F*(2,32) = 2.9, *MSE* = 0.73, *p* = 0.069]. In the later time window (500–700 ms), we found main effects of Location [*F*(1,16) = 20.11, *MSE* = 43.14, *p* < 0.001] and Response [*F*(2,32) = 6.77, *MSE* = 7.58, *p* = 0.004], significant interactions for Location × Response [*F*(2,32) = 5.29, *MSE* = 0.97, *p* = 0.01], Location × Hemisphere [*F*(2,16) = 6.6, *MSE* = 5.71, *p* = 0.021] and Response × Hemisphere × Group [*F*(2,32) = 3.48, *MSE* = 1.16, *p* = 0.043], and a Response × Hemisphere interaction that approached significance [*F*(1.42,22.67) = 3.56, *MSE* = 1.67, *p* = 0.059]. For the Response × Hemisphere interaction, Mauchly's Test of Sphericity indicated that the assumption of sphericity had been violated, χ²(2) = 7.96, *p* = 0.019; degrees of freedom were therefore corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.71). The following sections contain pairwise comparisons that reveal the causes of these effects.
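The Greenhouse-Geisser correction multiplies both degrees of freedom by the sphericity estimate ε. The short sketch below applies the rounded ε = 0.71 to the uncorrected *F*(2,32) and also recovers the ε implied by the reported error degrees of freedom. The function name is illustrative only.

```python
def gg_corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the Greenhouse-Geisser epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Using the rounded epsilon reported in the text.
df1, df2 = gg_corrected_df(2, 32, 0.71)

# Epsilon implied by the reported corrected error df of 22.67.
implied_epsilon = 22.67 / 32
```

With the rounded ε the corrected values are 1.42 and 22.72. The reported 22.67 corresponds to an unrounded ε of about 0.708, which also gives 1.42 for the effect degrees of freedom, so the reported values are mutually consistent.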

#### **ERP effects commonly associated with familiarity for faces**

**Figure 3** shows enhanced positivity over the posterior and anterior scalp regions, particularly over the left hemisphere, for control participants when they correctly responded "know" compared to correct "new" responses. The DP cases display some faint positivity over the central and posterior scalp across 300–700 ms; however, this positivity appears markedly diminished and covers less of the anterior scalp in comparison to the controls.

Examining the ERP waveforms in **Figure 5**, controls appear to show enhanced positivity for correct "know" responses from around 200–1000 ms when compared to correct rejections, but only over the left hemisphere. While DP cases display some positivity for correct "know" responses from around 300–400 ms in all scalp areas, this positivity only lasts until 600–700 ms, and is of smaller magnitude when compared to that of the controls.

Pairwise comparisons at each scalp location from the 300–500 ms time window revealed that the control group's ERPs for correct "know" responses were more positive than those for correct rejections at the left posterior (LP) region [*t*(9) = 2.39, *SE* = 0.207, *p* = 0.038]. The DP group exhibited no such positivity over any scalp location in this time period.

In the 500–700 ms time window, pairwise comparisons revealed that in control participants, ERPs over the LP area for correct "know" responses were more positive than those for correct rejections [*t*(9) = 2.656, *SE* = 0.254, *p* = 0.026]. Again, as in the earlier time window, the DP group exhibited no apparent "know" old/new effects in any of the four scalp locations. In addition, we found a significant difference between the groups when the mean amplitude of ERPs for correct rejection responses was subtracted from that for correct "know" responses at the LP site [*t*(16) = 2.168, *SE* = 0.382, *p* = 0.046], suggesting a greater old/new effect for correct "know" responses in the control participants in the later time window.
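For readers less familiar with these statistics, the relation between the reported *t* and *SE* values in a paired comparison can be sketched as follows. The amplitudes are invented for illustration and are not data from this study. The sketch assumes that the reported *SE* is the standard error of the mean paired difference.

```python
import math
import statistics


def paired_t(condition_a, condition_b):
    """Paired-samples t statistic for per-participant mean amplitudes.

    Returns (t, degrees of freedom, standard error of the mean difference).
    """
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1, se


# Hypothetical LP amplitudes (microvolts) for four participants.
know = [2.0, 3.0, 4.0, 5.0]
correct_rejection = [1.0, 1.0, 2.0, 2.0]
t_value, df, se = paired_t(know, correct_rejection)
```

Under that assumption, multiplying a reported *t* by its *SE* gives the mean old/new amplitude difference in microvolts.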

ERPs for correct "know" responses are more positive than those for correct rejections over the LP region in the controls in both time windows. This suggests that the controls show a face-specific familiarity old/new effect similar to that found by previous research (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007). In contrast, our results suggest that for individuals with DP, ERPs in the four central scalp regions do not distinguish between correct "know" responses and correct rejections; the expected face-specific familiarity signal is absent in the DP group.

#### **ERP effects commonly associated with recollection for faces**

**Figure 4** shows that EEG voltage for correct "remember" responses is more positive than that for correct rejections across the whole scalp for both participant groups. This difference appears maximal over the right hemisphere's central area and is more pronounced in control participants than in individuals with DP.

ERPs for the four main regions of interest are shown in **Figures 5** and **6** for the control and the DP participants respectively. **Figure 5** suggests that control participants exhibit enhanced positivity for correct "remember" responses when compared to correct rejections from around 200–300 ms until the end of the epoch at 1000 ms over all four scalp regions. Individuals with DP also display similar positivity to that of the controls for correct "remember" responses from around 200 ms until the end of the epoch in the RP region (**Figure 6**). This correct "remember" positivity, however, does not appear in the other scalp regions until around 300–400 ms after stimulus onset, but the "remember" old/new effect appears to be of similar magnitude for both participant groups.

Pairwise comparisons at each scalp location from the 300–500 ms time window revealed that the control group's ERPs for correct "remember" responses were more positive than those for correct rejections at the LP region [*t*(9) = 4.49, *SE* = 0.133, *p* = 0.001] and at the left anterior (LA) location [*t*(9) = 2.33, *SE* = 0.126, *p* = 0.042]. We found no correct "remember" old/new effects at any of the four a-priori scalp locations in the DP group between 300–500 ms.

In the 500–700 ms time window, pairwise comparisons revealed that correct "remember" responses at LP [*t*(9) = 3.398, *SE* = 0.27, *p* = 0.008] and RP [*t*(9) = 3.807, *SE* = 0.315, *p* = 0.004] regions were more positive than correct rejections in the control group. We also found that ERPs for correct "remember" responses were more positive than those for correct rejections in the DP group [*t*(7) = 2.487, *SE* = 0.345, *p* = 0.042], but only over the RP region of the scalp.

This pattern of ERPs for control participants is consistent with previous research finding correct "remember" old/new effects over posterior and anterior scalp sites between 300–700 ms (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007). The correct "remember" old/new effect, however, appears to be delayed in those with DP, as enhanced positivity was present only in the later time window. This effect also seems smaller in the DP group than in the controls, as indicated by the positivity being restricted to the RP region of the scalp.

#### **Recollection vs. familiarity**

No significant differences were found between correct "remember" and correct "know" responses in the 300–500 ms time window for either of the two participant groups.

Further analyses on the controls between 500–700 ms revealed enhanced positivity for correct "remember" compared to correct "know" responses at RP [*t*(9) = 4.667, *SE* = 0.298, *p* = 0.001] and right anterior (RA) [*t*(9) = 2.483, *SE* = 0.347, *p* = 0.035] locations. We also found that ERPs for correct "remember" responses were more positive than "know" [*t*(7) = 2.84, *SE* = 0.27, *p* = 0.025] responses in the DP group at the RP location.

This suggests that recollection is a much stronger signal than familiarity in the control group, but only over the right hemisphere. ERP differences in the DP group were again restricted to the posterior of the scalp, with the enhanced positivity for recollection relative to familiarity appearing only over the right parietal region of the scalp.

It is possible that differences between the two groups with regard to significant old/new effects were only due to differential power to detect these effects (due to different sample sizes and trial numbers). To rule this possibility out, we repeated the analyses after removing the two control participants with the fewest correct "know" responses and then matched the average trial numbers between the two groups. These analyses revealed the same pattern of results.

#### **ERP effects commonly associated with familiarity for words and objects**

Visual inspection of **Figure 3** suggests the appearance of a frontal correct "know" old/new effect over the furthermost mid and left frontal sites in those with DP. Intriguingly, this frontal effect does not appear in the controls. A "know" old/new effect over frontal sites between 300–500 ms has previously been associated with familiarity of objects, pictures and words (e.g., Curran, 2000; Maratos et al., 2000; Tsivilis et al., 2001; Duarte et al., 2004; Groh-Bordin et al., 2006), but not generally for faces (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007; Herzmann et al., 2011). Knowing that previous research (Curran, 2000; Maratos et al., 2000; Tsivilis et al., 2001; Duarte et al., 2004; Groh-Bordin et al., 2006) has identified this effect as occurring between the frontal and polarfrontal regions of the scalp, and visually inspecting where this effect was apparent in our data, we averaged the electrodes (C17, C18, C19, C27 and C28) to form a post-hoc region of interest: inferior mid left anterior (ILA). We also created an additional two regions of interest to more robustly confirm any possible effects using the exact frontal electrodes (Left Frontal (LF): C27, C29 and C32; Right Frontal (RF): C16, C14 and C10) as used by previous research (Duarte et al., 2004). The Duarte et al. (2004) study was chosen as the authors used visual objects which appeared to most closely match the stimuli used in the present study.

**Figure 7** displays the ERPs from the ILA region where we identified the apparent frontal positivity related to correct "know" responses. The three waveforms for correct responses appear qualitatively similar within the individuals with DP, suggesting a similar underlying cognitive process being engaged when making recognition judgments of a face in DP. Qualitative differences are clearly apparent when these waveforms from the DP group are compared to the correct response waveforms from the control group. Differences such as these suggest that the two groups are possibly engaging in different cognitive processes when making face recognition judgments.

To better assess the apparent frontal old/new effect for individuals with DP, and in an effort to look for any possible familiarity effects normally associated with objects, pictures and words, we conducted mixed within-between subject ANOVAs on the ILA region with factors of Correct Response ("remember", "know", "new") and Group (control, DP) across the 300–500 and 500– 700 ms time windows.

In the 300–500 ms time window over the ILA region, we found a significant interaction for Response × Group [*F*(2,32) = 0.126, *MSE* = 0.0846, *p* = 0.022]. Pairwise comparisons revealed that correct "know" responses at the ILA site [*t*(7) = 2.88, *SE* = 0.14, *p* = 0.024] were significantly more positive than correct rejections in the DP group. Conversely, correct rejections were significantly more positive than the correct "know" responses at this site in the control group [*t*(9) = 2.88, *SE* = 0.073, *p* = 0.024]. Independent samples *t*-tests revealed that the magnitude of the correct "know" effect was greater in the DP group than in the controls [*t*(16) = 3.24, *SE* = 0.26, *p* = 0.005]. Repeating these analyses using the electrodes examined by previous research (Duarte et al., 2004) confirmed these effects at the LF, but not RF, region.

In the 500–700 ms time period we found a significant interaction for Response × Group [*F*(2,32) = 0.025, *MSE* = 0.012, *p* = 0.033]. Paired samples *t*-tests revealed a positive correct "know" old/new effect [*t*(7) = 2.91, *SE* = 0.29, *p* = 0.022] in the DP group. No differences were found between any of the control waveforms in this time window. Between group comparisons revealed that this "know" old/new effect was larger in the DP group [*t*(16) = 2.67, *SE* = 0.46, *p* = 0.017]. As with the earlier time window, these effects were confirmed at the LF, but not the RF, location.

We repeated the analyses after removing the two control participants with the fewest correct "know" responses and matching the average trial numbers between the two groups. We ranked the DP participants by the number of their correct "know" responses and separately ranked the controls in the same manner. We then matched each DP participant with their respectively ranked control participant, and reduced the number of trials for each DP participant to that of their matched control participant. The selection of which trials to remove was decided at random by a Python script. This was possible for 7 of the DP cases. In the remaining pair, the control participant had more correct "know" responses than the matched DP case, so the control participant's trial numbers were reduced to match those of the DP participant. These analyses revealed the same pattern of results.
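The original Python script is not reproduced in the article. The following is a minimal sketch of the matching procedure as described, with hypothetical function names and data layout (dictionaries mapping participant identifiers to lists of correct "know" trials). It assumes that ranking is from most to fewest trials and that, within each pair, whichever participant has more trials is randomly subsampled down to the other's count.

```python
import random


def subsample(trials, n, rng):
    """Randomly keep n trials, preserving their original order."""
    keep = sorted(rng.sample(range(len(trials)), n))
    return [trials[i] for i in keep]


def match_trial_counts(dp_trials, control_trials, rng=None):
    """Rank-match the two groups and equalize trial counts within each pair."""
    rng = rng or random.Random()
    dp_ranked = sorted(dp_trials, key=lambda p: len(dp_trials[p]), reverse=True)
    ctrl_ranked = sorted(control_trials, key=lambda p: len(control_trials[p]),
                         reverse=True)
    # Drop the controls with the fewest trials so both groups are equal in size.
    ctrl_ranked = ctrl_ranked[:len(dp_ranked)]

    dp_matched, ctrl_matched = {}, {}
    for dp_id, ctrl_id in zip(dp_ranked, ctrl_ranked):
        target = min(len(dp_trials[dp_id]), len(control_trials[ctrl_id]))
        dp_matched[dp_id] = subsample(dp_trials[dp_id], target, rng)
        ctrl_matched[ctrl_id] = subsample(control_trials[ctrl_id], target, rng)
    return dp_matched, ctrl_matched
```

Taking the minimum within each pair covers both cases described above: usually the DP participant's trials are reduced, but where the control has more trials the reduction falls on the control instead.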

These results suggest that when making recognition judgments, the DP group process faces using a neural pathway commonly associated with words, objects and pictures (e.g., Curran, 2000; Maratos et al., 2000; Tsivilis et al., 2001; Duarte et al., 2004; Groh-Bordin et al., 2006). Familiarity in DP thus appears to be driven by this object related recognition pathway. The controls, however, exhibit no evidence that they process faces using this route; instead, it appears that they use routes commonly associated with intact face recognition abilities (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007).

## **DISCUSSION**

We examined recognition memory for previously unknown faces in both control participants and individuals with DP. Previous research has identified face recognition impairments in individuals with DP (e.g., Duchaine and Nakayama, 2006), but we are not aware of any previous attempts to assess the roles of familiarity and recollection for recognition performance in this group. We used an R/K recognition memory paradigm to measure the relative contributions of recollection, inferred from "remember" responses, and familiarity, inferred from "know" responses, to recognition memory for faces. We also obtained EEG recordings to identify the neural mechanisms involved in recognizing recently encountered faces in these two groups. We found that individuals with DP exhibited a variety of behavioral deficits in recognition memory and also differed in their electrophysiological response to test stimuli from individuals with normal face processing abilities. Specifically, in individuals with DP we observed (a) a relatively low proportion of "remember" responses and corresponding high proportion of "know" responses (suggestive of low levels of recollection); (b) a relatively high proportion of false alarms; (c) an apparent lack of a posterior familiarity ERP old/new effect commonly associated with faces as evidenced by similar waveforms for correct "know" responses and correct rejections across all time windows; (d) the appearance of a frontal "know" ERP old/new effect commonly associated with familiarity for objects, pictures and words (but not faces); and (e) a delay in the appearance of a recollection related ERP old/new effect as evidenced by ERPs for trials with correct "remember" responses only appearing more positive than those for trials with correct rejections in the later time window.

### **BEHAVIORAL FINDINGS**

In agreement with previous research we found that individuals with DP have a general impairment for recognizing faces when compared to controls. This impairment was driven by a decreased ability to correctly identify a previously seen face, but also through difficulties in correctly identifying a previously unknown face as new; these problems drive the DP group's diminished capacity to discriminate old from new faces in comparison to the controls. For control participants this task was very easy as evidenced by their extremely high discriminability score, whereas individuals with DP exhibited considerable difficulties, especially for "know" responses which did not discriminate at all between old and new items. Controls identified faces on the basis of recollection in the vast majority of trials, utilizing familiarity much less frequently, as indicated by their relatively high proportion of "remember" rather than "know" responses. Conversely, among individuals with DP the proportions of correct "remember" and "know" responses were about equal. Even though individuals with DP exhibited a much higher false alarm rate than controls, both groups made predominantly "know" responses in this category, suggesting that the similar proportion of correct "remember" and "know" responses in individuals with DP might reflect a specific impairment in recollection rather than a general inability to distinguish between "remember" and "know" responses.

Dual process models of recognition memory purport that familiarity is a faster process than recollection (Yonelinas, 2002), and as such one would expect "know" responses to be faster than "remember" responses—a pattern opposite to that we observed. This discrepancy, however, can be explained by the quality of the distinct phenomenological experiences of recollection and familiarity. It is entirely possible that "remember" and "know" response times do not accurately reflect the actual temporal activation of recollection and familiarity, but rather the speed with which a participant can be confident enough to make a decision (Dewhurst and Conway, 1994; Dewhurst et al., 2006). For example, a participant might respond "remember" the instant a contextual detail is recollected due to the strength of evidence associated with this information. On the other hand, a feeling of familiarity without context may require extra time to elicit a "know" response. Under these circumstances, the dual process model's assumption that familiarity is activated earlier than recollection is still compatible with the behavioral results of faster remember response times observed with the R/K procedure. It should be noted that it is remarkable that any RT differences exist at all; participants could only make a recognition response during a prompt screen which appeared after the face had already been displayed onscreen for 2000 ms.

What reasons could there be for the above differences between those with DP and normal face processing abilities? One explanation might be that of facial distinctiveness, or at least perceived facial distinctiveness, affecting recognition. Previous research has suggested that remember responses are primarily influenced by the distinctiveness of a face, with increasing distinctiveness leading to more recollected experiences (Dewhurst et al., 2005). Increasing distinctiveness has also been indicated as causing fewer false alarms (Light et al., 1979). It has been shown that some individuals with DP display random patterns when rating distinctiveness (Carbon et al., 2010); thus, DP cases might have an inability to pick up on the subtle cues from a face that aid recollection. While those with DP appear unable to judge distinctiveness in a similar fashion to controls, it would be interesting to see if distinctiveness, at least with regard to how those with DP perceive it, could influence later recognition performance. For example, is subsequent recognition performance for faces rated as distinctive at study by individuals with DP more accurate compared to faces rated as not distinctive, and if so, is this generally through the use of recollection? If recollection is primarily aided by distinctiveness, and those with DP are incapable of making reliable distinctiveness judgments, then this raises the question of what "remember" responses in individuals with DP are based on. It might be interesting to see if other factors identified in face recognition are being used by those with DP, such as attractiveness, memorability, typicality or how much each face reminds them of someone they already know (Dewhurst et al., 2005).

Increasing usage of familiarity in discrimination tasks has been linked with face typicality, that is, how much a face looks like an average face (Vokey and Read, 1992; Dewhurst et al., 2005). Typicality and distinctiveness have been proposed to be opposite ends of a continuum upon which faces can be found (Johnston et al., 1997). Valentine (1991) formalized this idea into a face-space model, a multidimensional space in which faces are located according to their characteristics, at the center of which is an average, or typical, exemplar face. Faces that appear to be more typical, or lacking in distinctive features, are grouped around the center of this space, where the increased density and similarity of the faces makes it much more difficult to discriminate between them. These faces are suggested to increase familiarity-based recognition judgments for both studied and unstudied faces. Faces found further away from this center, those that are more distinctive, are much less susceptible to false alarms and are increasingly identified by recollection (Dewhurst et al., 2005).
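The geometry of such a face-space can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: each face is represented as a point with arbitrary numeric coordinates, distinctiveness is taken to be Euclidean distance from the average face, and local crowding is summarized by the distance to the nearest other face. None of these choices is prescribed by the model as described here, and the function names are hypothetical.

```python
import math


def centroid(faces):
    """Average face: the coordinate-wise mean of all face vectors."""
    dims = len(faces[0])
    return [sum(face[d] for face in faces) / len(faces) for d in range(dims)]


def distinctiveness(face, center):
    """Distance from the average face; larger values mean more distinctive."""
    return math.dist(face, center)


def nearest_neighbor_distance(index, faces):
    """Distance from faces[index] to its closest other face.

    Small values indicate a crowded region of the space, where faces are
    harder to tell apart.
    """
    target = faces[index]
    return min(math.dist(target, other)
               for i, other in enumerate(faces) if i != index)
```

In this toy formulation, a typical face has a low `distinctiveness` value and a small `nearest_neighbor_distance`, corresponding to the dense central region in which discrimination is proposed to be more difficult.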

The DP group's low discriminability scores and increased usage of familiarity suggest that face-spaces for individuals with DP are smaller than those in individuals with normal face processing abilities, effectively leading to faces being closer to the center. This would suggest some testable predictions: because the space within which individuals with DP place faces is diminished when compared to controls, those with DP should be less susceptible to the face-space effects found in recollection and familiarity when faces are morphed to appear either more average or more distinctive. For example, in those with intact face recognition abilities we should find large increases in recollection if we caricatured faces to make them appear more distinctive and fewer false alarms to such faces. In theory, the magnitude of these effects should be diminished, or possibly non-existent, in DP. Similarly, it should be possible to induce DP-like recognition memory behavior in those with intact face processing skills if we averaged faces to make them appear more typical. It would be interesting to see if doing so would then cause the electrophysiological signatures of recollection and familiarity in those with intact face recognition abilities to appear more similar to those observed in individuals with DP. Two studies have found some normal face-space effects in DP (Nishimura et al., 2010; Susilo et al., 2010); however, the lack of a recognition memory paradigm measuring the contributions of recollection and familiarity in either experiment suggests the need for further research.

### **ELECTROPHYSIOLOGICAL FINDINGS**

The electrophysiological results for the control participants replicate previous research (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007) in finding anterior and posterior old/new effects for "remember" responses and only posterior effects for "know" responses. Taking "remember" responses as an index of recollection, and "know" responses as an index of familiarity, recollection ERP old/new effects in the controls appeared generally to occur over anterior and posterior sites in the 300–500 ms and over posterior sites in the 500–700 ms time windows. Familiarity ERP old/new effects appeared only over LP sites in both time windows. We also found ERPs for "remember" responses to be more positive over right hemisphere regions than those for "know" responses (which were indistinguishable from those for correct rejections). In the controls, the complete lack of an anterior familiarity effect similar to that typically found for objects and words (e.g., Curran, 2000; Maratos et al., 2000; Tsivilis et al., 2001; Duarte et al., 2004; Groh-Bordin et al., 2006) suggests that such effects in other studies using face stimuli (e.g., Curran and Hancock, 2007) might be driven by features that are not central to faces such as hair, clothing, jewelry, and other objects also present in the stimuli. Furthermore, we also found agreement with previous research that recollection related activity for faces was greater than that of the activity associated with familiarity (Yovel and Paller, 2004; MacKenzie and Donaldson, 2007), at least with regard to the right hemisphere.

The lack of recollection effects in the 300–500 ms time window for participants with DP could be due to a general delay in the neural processing of face stimuli relative to the control group. Parietal recollection old/new effects for objects and words generally do not become apparent until 500 ms after stimulus onset (e.g., Maratos et al., 2000; Tsivilis et al., 2001), so if individuals with DP processed faces like other objects, we would not expect a recollection effect earlier than 500 ms after stimulus onset. ERPs for correct "remember" responses, however, look qualitatively similar between the two groups, which would be inconsistent with the delayed neural processing of face stimuli in our DP group. Alternatively, it might be the case that the recollection old/new effect in the 300–500 ms time window for the control group is distinct from the corresponding effect in the later time window. Whereas the early and late effects have commonly been assumed to both index recollection (Yovel and Paller, 2004), it has been suggested that the early parietal effect might instead index familiarity (MacKenzie and Donaldson, 2007). It seems plausible that feelings of familiarity precede or at least coincide with recollection, and the early correct "remember" ERP positivity we observed over the left hemisphere of control participants may thus reflect a familiarity signal. The absence of this ERP positivity in individuals with DP could be related to their lack of a posterior ERP old/new effect for "know" responses: both could index a lack of familiarity for previously studied faces. Consistent with this explanation is the fact that early correct "remember" and "know" ERP waveforms are virtually identical over the LP region for control participants.

Looking at the later time window, we see a clear recollection ERP old/new effect in the DP cases, one that is similar topographically and in magnitude to that of the controls, at least over the RP region. This suggests that the phenomenological experience of recollecting a face in individuals with DP is intact despite the drastically reduced proportion of "remember" responses in this group. Dual process theories that view recollection as an all-or-nothing process (e.g., Yonelinas, 1994) would predict similar effect sizes for effects due to recollection. Despite the recollection old/new effect between the two groups appearing to be of similar magnitude over the right parietal region, the fact that no old/new effects were found over other scalp locations might suggest a quantitatively weaker recollection signal in those with DP. This lends tentative support to the proposal that recollection could be a graded, as opposed to discrete, process as suggested by some theories of recognition memory (Wixted, 2007; Wixted and Mickes, 2010).

In the present paradigm, we relied upon participants' own self-generated details for recollection. There might be a concern that this method does not index recollection commonly experienced in the real world, such as that for names, occupations or places. The Yovel and Paller (2004) study found recollection ERP old/new effects related to self-generated details surrounding a face to be qualitatively similar to that of occupations; these recollection old/new effects were also topographically similar to those found here. This would suggest that recollection of self-generated information attached to a face is the same as semantic information provided from external sources. MacKenzie and Donaldson (2007), however, found a larger old/new effect when names were recollected in comparison to self-generated details. It would therefore be of interest to see whether the recollection deficits observed here in our DP group would continue to be observed when an objective measure of recollection, such as a name, is employed; if names were no different from other semantic information, then we should observe similar behavioral and electrophysiological abnormalities in DP to those observed for recollection here.

The face related posterior familiarity old/new effect, however, appears to be absent in the DP group, suggesting that the subjective experience leading to "know" responses might differ between the two groups. While those with intact face recognition clearly exhibit a posterior ERP old/new effect when experiencing familiarity for faces, those with DP appear to engage a familiarity route more commonly associated with object, picture and word recognition towards the front of the scalp. This is to our knowledge the first clear evidence that individuals with DP are not processing faces using a specialized, face-specific pathway, but are instead using a route more commonly associated with general objects. Even more interesting is that this pathway appears to be engaged by the DP group during all recognition judgments, as evidenced by the qualitative similarities between the three different correct response ERP waveforms at the frontal region. The ERP waveforms exhibited by the controls in all response categories were qualitatively different in comparison to the DP group, so much so that the correct "know" old/new effect was actually more negative in amplitude in the control group. This finding was a reversal of the correct "know" old/new effect found in the DP group at the same site. It thus appears that an attempt is made to engage the object familiarity process in parallel with the face related recollection experience in DP. These results offer an exciting insight as to why those with DP might be experiencing problems when trying to recognize a face; a face is not treated entirely as special, but also processed using a generic, object related pathway in the brain.

Some authors (e.g., Yovel and Paller, 2004) have suggested that the parietal familiarity and recollection old/new effects are reliant on similar neural generators, thus implying that recollection and familiarity are merely quantitatively different strengths of the same signal. Furthermore, it has also been suggested that the frontal familiarity old/new effect merely reflects conceptual priming (Yovel and Paller, 2004; Paller et al., 2007; Voss et al., 2012) due to the existence of a base level of meaning for stimuli such as words (Maratos et al., 2000) and everyday objects (e.g., Duarte et al., 2004) in recognition memory experiments. The dissociation between the parietal familiarity and recollection effects in those with DP, and the appearance of the frontal familiarity effect commonly associated with objects and words, lends support to the proposal that the posterior familiarity and recollection old/new effects for faces are being driven by dissociable processes. Further to this, that previously novel faces, stimuli highlighted as not susceptible to the conceptual priming problem (Yovel and Paller, 2004), should elicit a frontal familiarity effect in the DP group suggests that the conceptual priming hypothesis is incorrect. Instead, our results would appear to add support to the notion that the mid-frontal ERP effect does actually index a generic familiarity process.

An alternative view, however, might be able to reconcile our data with the conceptual priming hypothesis. Voss and Paller (2007) found that the magnitude of the mid-frontal old/new effect increased in response to increasing ratings of meaningfulness for shapeless blobs; this supports the view that the mid-frontal old/new effect is merely an index of conceptual priming. If our DP cases are not entirely processing faces as faces through typical routes, as evidenced by the lack of a parietal familiarity old/new effect, then it might be the case that they are attempting to find some form of meaning in the faces instead. By trying to find some meaningful way to examine the faces, rather than treating them merely as faces, our DP participants might therefore be rating faces as familiar on the basis of conceptual priming. Our data would therefore still be compatible with the conceptual priming hypothesis if this were found to be the case. Regardless of the underlying neural cause of the mid-frontal old/new effect, it would appear that this pathway is driving familiarity based recognition judgments in our DP group.

It should be noted that although not significant, the topographical figures and waveforms do appear to hint at a possible familiarity effect in the DP group that is qualitatively similar to the controls, albeit hugely dissipated. The lack of differences between correct "know" and correct rejection waveforms at posterior sites might be an index of the difficulty that those with DP are finding at discriminating between the old and new faces; the face related familiarity signal elicited by a face may be so weak for individuals with DP in comparison to the controls that it is incapable of creating a large enough effect in the waveforms to be statistically apparent here. Maybe due to this weakness in the face-specific familiarity route, those with DP then engage the more general object and word familiarity route to aid recognition.

No previous recognition memory study for faces has found a modulation of this parietal familiarity effect. This occurrence in the present study suggests that familiarity for faces could be modulated in a similar way to the anterior familiarity effect seen for objects and words. Future research could employ experimental manipulations to uncover whether these familiarity effects can be modulated through increasing levels of familiarity or confidence. Another possibility could be that familiarity for faces is linked to the same underlying process that detects distinctiveness in faces; the fact that those with DP might be incapable of making distinctiveness judgments in a similar fashion to those with intact face processing abilities (Carbon et al., 2010) could be due to the fact that they are utilizing an object/word route to make such judgments. The nature of the parietal face related familiarity effect has been largely ignored by recognition memory researchers and is an area ripe for study, not only in DP, but also in those with intact face processing abilities. Combined with experimental manipulations of facial distinctiveness, they could provide investigators with a powerful framework within which to elucidate the possible causes of recognition deficits in DP.

### **CONCLUSIONS**

The present study examined recognition memory for previously unknown faces in DP using an R/K paradigm. From our findings it is clear that there are a range of abnormalities in recognition memory for faces in individuals with DP. These findings supply compelling evidence that future DP researchers should take the relative contributions of recollection and familiarity into consideration when designing studies investigating face recognition. Our electrophysiological results give the first clear evidence that individuals with DP process faces like other objects and we propose that the associated impairments in performance may be related to difficulties in judging distinctiveness and/or typicality of previously unknown faces (Carbon et al., 2010). This finding would not have been apparent from the behavioral results alone and highlights the importance of combining different approaches when investigating face recognition deficits in DP.

The present research also has important implications when diagnosing, and testing treatments of, DP. Further work is required to discover the extent to which those with DP and normal face recognition abilities are utilizing familiarity and recollection when completing the widely used CFMT, a primary tool for diagnosing DP. Around half of all individuals that contact us reporting problems with faces fail to meet the criteria for a diagnosis of prosopagnosia when using the CFMT. The CFMT simply asks participants to pick out a target face from a choice of three faces, with no measure as to how this decision was made. The basis on which those with intact face recognition abilities are identifying faces on the CFMT is as yet unknown, although one could imagine it is primarily through the use of recollection. Those that meet the criteria for a diagnosis on the CFMT might be more reliant on, as our study has demonstrated, a weakened recollection signal and abnormal familiarity route. It is possible that the individuals that report problems with faces, yet fail to meet a diagnosis, might be in some as yet undetected group exhibiting quantifiably distinct recognition processes. If we were to incorporate the R/K or confidence response options into the CFMT, we might find differences between those who report problems yet score within the normal range on the CFMT and others who report no such difficulties. Our findings provide new insights into recognition memory for faces in DP and should guide future research and attempts to improve diagnosis.

## **ACKNOWLEDGMENTS**

We would like to thank Seb Whiteford for helping create the stimuli, Sarah Ghawji and Ting Wang for assisting in the collection of the data and all of our DP participants for contributing their time and effort in taking part in this study.

## **REFERENCES**


Yovel, G., and Paller, K. A. (2004). The neural basis of the butcher-on-the-bus phenomenon: when a face seems familiar but is not remembered. *Neuroimage* 21, 789–800. doi: 10.1016/j.neuroimage.2003.09.034

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 April 2014; accepted: 25 July 2014; published online: 14 August 2014*. *Citation: Burns EJ, Tree JJ and Weidemann CT (2014) Recognition memory in developmental prosopagnosia: electrophysiological evidence for abnormal routes to face recognition. Front. Hum. Neurosci. 8:622. doi: 10.3389/fnhum.2014.00622 This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Burns, Tree and Weidemann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Behavioral dissociation between emotional and non-emotional facial expressions in congenital prosopagnosia

## *Roberta Daini<sup>1,2</sup>\*, Chiara M. Comparetti<sup>1</sup> and Paola Ricciardelli<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, University of Milano-Bicocca, Milan, Italy*

*<sup>2</sup> Milan Centre for Neuroscience, Milan, Italy*

#### *Edited by:*

*Mark A. Williams, Macquarie University, Australia*

#### *Reviewed by:*

*Yaroslav O. Halchenko, Dartmouth College, USA; Jeremy Tree, University of Swansea, UK*

#### *\*Correspondence:*

*Roberta Daini, Dipartimento di Psicologia, Università degli Studi di Milano-Bicocca, Piazza dell'Ateneo Nuovo, 1, 20126 Milano, Italy e-mail: roberta.daini@unimib.it*

Neuropsychological and neuroimaging studies have shown that facial recognition and emotional expressions are dissociable. However, it is unknown if a single system supports the processing of emotional and non-emotional facial expressions. We aimed to understand if individuals with impairment in face recognition from birth (congenital prosopagnosia, CP) can use non-emotional facial expressions to recognize a face as an already seen one, and thus, process this facial dimension independently from features (which are impaired in CP), and basic emotional expressions. To this end, we carried out a behavioral study in which we compared the performance of 6 CP individuals to that of typical development individuals, using upright and inverted faces. Four avatar faces with a neutral expression were presented in the initial phase. The target faces presented in the recognition phase, in which a recognition task was requested (2AFC paradigm), could be identical (neutral) to those of the initial phase or present biologically plausible changes to features, non-emotional expressions, or emotional expressions. After this task, a second task was performed, in which the participants had to detect whether or not the recognized face exactly matched the study face or showed any difference. The results confirmed the CPs' impairment in the configural processing of the invariant aspects of the face, but also showed a spared configural processing of non-emotional facial expression (task 1). Interestingly and unlike the non-emotional expressions, the configural processing of emotional expressions was compromised in CPs and did not improve their change detection ability (task 2). These new results have theoretical implications for face perception models since they suggest that, at least in CPs, non-emotional expressions are processed configurally, can be dissociated from other facial dimensions, and may serve as a compensatory strategy to achieve face recognition.

**Keywords: face perception, congenital prosopagnosia, unfamiliar face recognition, emotional expressions, non-emotional expression processing**

## **INTRODUCTION**

Prosopagnosia refers to a category-specific perceptual deficit in face recognition. It can be acquired (i.e., resulting from brain damage, mainly after lesions of occipito-temporal regions; Bodamer, 1947) or congenital (McConachie, 1976). Congenital prosopagnosia (CP) is not caused by brain lesions, but is present from birth, and it occurs along with intact sensory visual abilities and normal intelligence (Behrmann and Avidan, 2005). It has been described as a quite common cognitive disorder, which occurs in 2.47% of the population and almost always runs in families (Kennerknecht et al., 2006). It can be quite dysfunctional given the importance of faces in social life (Behrmann and Avidan, 2005).

Faces, in fact, are among the most important visual stimuli we perceive as they *simultaneously* convey several pieces of important social information. They inform us not only about a person's identity, gender, or age, but also about their mood, emotion, and direction of gaze. Thus, faces can be considered *multi-dimensional* stimuli. Although several behavioral and neuropsychological studies have brought evidence for the existence of cognitive and neural mechanisms dedicated to face perception (Kanwisher et al., 1997, 1999; Posamentier and Abdi, 2003; Kanwisher and Yovel, 2006), still little is known about how these various dimensions are coded and how they are integrated into a single face percept. A first classical distinction has been made between facial expression and facial recognition and identity, which would be processed along two separate routes after an initial stage of visual structural encoding (Bruce and Young, 1986; Kanwisher et al., 1997, 1999; Haxby et al., 2000; Posamentier and Abdi, 2003; Kanwisher and Yovel, 2006). Indeed, it has been reported that prosopagnosic patients with lesions in associative visual cortices, despite their deficit in face recognition, can still recognize emotional facial expressions, whereas deficits in expression recognition can occur in patients without prosopagnosia (e.g., Kurucz and Feldmar, 1979; Adolphs et al., 1995), suggesting that expression and identity can be processed independently from each other.

Using fMRI, Haxby et al. (2000) proposed a distributed neural system model for face perception in which face responsive regions were grouped in two systems: the core system that includes areas involved with the visuo-perceptual analysis of a face, and the extended system that includes areas that are involved in the extraction of other information (such as semantics, speech, emotions). Within the core system they emphasize a further distinction between the representation of invariant and changeable aspects of faces. In particular, an important functional and anatomical distinction has been made for the processing of invariant aspects (i.e., eyes, nose, mouth, etc.) and that of changeable aspects of the face (such as eye-gaze direction, facial expression, lip movement, and pre-lexical speech perception), with the former being responsible for the processing of face identity, and the latter being involved in the perception of information that facilitates social interaction and communication (e.g., facial expression).

In the analysis of facial expressions the classical models *implicitly* assume an emotional content (Bruce and Young, 1986; Haxby et al., 2000). However, in everyday life people can show expressions on their faces which do not convey an emotional state. A good example is represented by celebrity impersonators who can mimic the ways in which famous people move their faces. Contrary to facial emotional expressions that are universally recognized and expressed in the same way by all individuals (Ekman and Friesen, 1976), this particular kind of facial expression (called dynamic facial signatures) is idiosyncratic, does not carry an emotional content and provides cues beyond the form of the face (Munhall et al., 2002; O'Toole et al., 2002).

The fact that an observer can quickly and easily recognize in the impersonator's performance the facial mimics of that particular famous actor or politician indicates that we have the ability to extract the identity of a face not only from its invariant aspects (e.g., visual appearance), but also from its changeable aspects (e.g., facial motion and expressions), and even when they do not convey an affective state. People move in unique ways and thus have dynamic facial signatures that perceivers can recognize (Lander et al., 1999). Hence, at least for familiar faces, the person's identity is conveyed both by emotional and non-emotional facial expressions (Hill and Johnston, 2001; Posamentier and Abdi, 2003; Lander and Metcalfe, 2007). Moreover, there is evidence that our brain and cognitive systems can also recognize people both from features and from facial expressions that do not convey an affective state (Knappmeyer et al., 2001). However, there are several outstanding issues regarding the processing of non-emotional facial expressions.

What happens when we perceive expressions that are not emotional (e.g., when somebody pulls his/her face in a meaningless but distinct way)? In keeping with the existing cognitive and neural models, would they be analyzed by the same mechanism and cortical regions underlying the processing of emotional facial expressions? Or instead, would they be processed and perceived as a change in the invariant features of the face? Although Haxby et al.'s model has been modified to accommodate the recognition of familiar faces through the processing of non-emotional facial expressions by differentiating the role of visual familiarity from the role of person knowledge (O'Toole et al., 2002; Gobbini and Haxby, 2006), no claim has been made about a possible distinction between emotional and non-emotional facial expressions in unfamiliar (unknown) faces.

Recently, it has been proposed that information about identity could be coded both in the FFA and in the STS. Specifically, the FFA would process static features for both familiar and unfamiliar faces, and the STS, as well as processing emotional facial expression, could also code face identity in the form of dynamic, non-emotional identity signatures (O'Toole et al., 2002). Dynamic information, in fact, contributes to face/person recognition particularly in poor viewing conditions and when invariant facial cues are degraded (Knight and Johnston, 1997; Lander et al., 1999, 2001; Lander and Bruce, 2000). This is because characteristic movements and gestures are reliable cues not only to identity, but also to the recognition of faces of unknown people that have already been seen. In other words, face recognition (i.e., the ability to categorize a face as already seen, although unknown) also relies on changeable features of the face and their dynamic patterns, as does face identity (i.e., the ability to recognize a face as familiar and retrieve our knowledge of it). Lander and Davies (2007), using a face recognition task, showed that characteristic motion information could be extracted very rapidly and efficiently when learning a new face, thus suggesting that as a face is learned, dynamic facial information is encoded with its identity and could be used for face recognition also in *unfamiliar* faces.

Although, like acquired prosopagnosic patients (Kurucz and Feldmar, 1979; Tranel et al., 1988; Adolphs et al., 1995; but see also Humphreys et al., 2007), congenital prosopagnosic individuals are indistinguishable from controls in perceiving emotional facial expressions (e.g., Behrmann and Avidan, 2005), very little investigation has been carried out to understand whether in this population non-emotional facial expressions can lead to person recognition, and are dissociable from other facial dimensions (i.e., facial features and emotional facial expressions).

The first evidence suggesting that non-emotional facial expressions could be processed in a specific way, dissociable from emotional facial expressions and other facial features, comes from a study by Comparetti et al. (2011) on typical development individuals (young adults). In this behavioral study both the changeable (emotional and non-emotional expressions) and the invariant (features) aspects of unfamiliar faces were manipulated to investigate a possible new dissociation between emotional and non-emotional facial expressions (i.e., expressions that do not have an affective meaning). Participants were asked to perform a recognition task (2AFC paradigm) and a change detection task, using upright and inverted faces. The faces to be recognized could be either identical to the ones presented in the exposure phase (a face bearing a neutral expression), the same but modified in their internal features, emotional and non-emotional facial expressions, or new faces. Once participants recognized a face as an already seen one, they had to detect whether it was identical to the one previously seen or contained a change. The change could concern the size of the eyes or the mouth (invariant feature manipulation) or the presence or absence of an emotional or a non-emotional facial expression. Accuracy and reaction times (RTs) were measured. It was hypothesized that, if the emotional and non-emotional facial expressions were processed differently, a difference in performance for the three manipulations should emerge. The results showed that each of the three different manipulation conditions had a different impact on the inversion effect (i.e., a decrement in performance that occurs when faces are inverted, thought to reflect a disruption in configural processing and in encoding invariant features; Yin, 1969). In particular, the magnitude of the inversion effect differed in the three manipulations, indicating a difference not only in the processing of the invariant features and the emotional facial expressions, but also a further difference in the processing of non-emotional and emotional facial expressions.

These differences could be due to the fact that although both emotional facial expressions and non-emotional facial expressions convey biological motion, only the former would involve the emotional system (i.e., the extended system in Haxby et al.'s model). Since both types of facial expressions convey dynamic facial information, it is plausible that they are processed by the same area of the core system (i.e., the STS). However, other areas outside the core system could also be involved in processing them, causing the differences between emotional and non-emotional expressions (Gobbini and Haxby, 2006). Thus, it is an open question whether non-emotional facial expressions, which seem to be processed differently both from invariant features and emotional facial expressions, can lead to, or contribute to, categorizing a face as already seen (i.e., face recognition).

Following our previous study (Comparetti et al., 2011), we hypothesized that non-emotional and emotional expressions are processed separately, just as invariant features and changeable aspects are.

Important hints come from the study of congenital prosopagnosics, who are impaired at recognizing faces, have difficulties in deriving the configural or holistic relations between face features, but can use facial movement information conveyed by a dynamic face to recognize facial identities (Steede et al., 2007) or to discriminate in a matching task whether two sequentially presented dynamic unfamiliar faces were or not the same identity (Lander et al., 2004). CP individuals, similar to patients affected by acquired prosopagnosia (Busigny and Rossion, 2010), are minimally affected by face inversion and some of them even show a better performance for inverted than for upright faces (the "inversion" superiority effect) (Avidan et al., 2011). Therefore, given that it has been found that in typical development individuals invariant features, emotional and non-emotional facial expressions differ in terms of configural face processing (Comparetti et al., 2011), CP individuals may process non-emotional facial expressions differently than invariant face features, and in the same way as typical development individuals. Moreover, if the processing of non-emotional facial expressions is intact in CP individuals, then it is possible that they use them as cues to facilitate face recognition, thus compensating for their face processing deficits.

The aim of the present study was two-fold. First, we wanted to investigate whether facial expressions that do not convey an affective state (i.e., non-emotional facial expressions) are processed in the same way as emotional facial expressions by congenital prosopagnosic individuals. Second, we wondered whether in CP individuals these expressions could be used as a cue to face recognition, given that these individuals should be unimpaired, or less impaired, in processing the changeable aspects of a face (Steede et al., 2007). To this end, as in Comparetti et al. (2011), we used the face inversion paradigm and we presented static unfamiliar faces in which one of the following facial aspects was changed: emotional expression; "non-emotional" expression; size of invariant features. Two different tasks were used: a same/different person task (recognition task) and a change detection task. The first task allowed us to test the effect of our manipulations on face recognition processing, whereas the second one was designed to test whether, within the same identity, the change of a specific facial aspect was successfully detected. Moreover, we exploited the face inversion effect as an indicator of underlying perceptual processing. A difference in the magnitude of the face inversion effect for each manipulation in each task would reflect a difference in the processing of face recognition and emotional/non-emotional facial expressions.

## **METHOD**

## **PARTICIPANTS**

The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and fulfilled the ethical standard procedure recommended by the Italian Association of Psychology (AIP). All experimental protocols were also approved by the ethical committee of the University of Milano-Bicocca. All the participants were volunteers and gave their informed consent to the study.

Six participants (3 F and 3 M; aged between 25 and 45 years old; mean = 35; *SD* = 8.83), who reported in a non-structured interview lifelong difficulties in face recognition and showed impaired performance on tests of face recognition, took part in the study. They were right-handed, had normal or corrected-to-normal vision, and had no neurological or neuropsychological deficit aside from the impairment in face processing.

Their performance was compared with that of a control group of 10 typical development individuals (6 F and 4 M). The controls did not have difficulties in face recognition (self-report) and were matched to the CP group by age [controls aged between 22 and 49 years old; mean = 33.8; *SD* = 9.55; CPs vs. controls *t*(14) = 0.25; n.s.].
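The age-matching statistic can be checked from the summary values reported for the two groups. The sketch below assumes a Student's independent-samples *t*-test with pooled variance, which is what 14 degrees of freedom for groups of 6 and 10 imply; the text does not name the variant, and the function name is purely illustrative.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t from group means, SDs, and sizes.

    Assumes equal variances (pooled estimate); returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Ages as reported: CPs (n = 6), controls (n = 10).
t_age, df_age = pooled_t_from_summary(35.0, 8.83, 6, 33.8, 9.55, 10)
print(f"t({df_age}) = {t_age:.2f}")  # t(14) = 0.25
```

Under this assumption the summary statistics reproduce the reported value, which supports the claim that the groups did not differ in age.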

## **ASSESSMENT OF CONGENITAL PROSOPAGNOSIA**

Because there is an ongoing debate on how to diagnose CP, and on the heterogeneity of the deficit (Schmalzl et al., 2008), in the present study we assessed the face perception problems reported by the CP participants by means of more than one neuropsychological test. The problems reported in an unstructured pre-test interview concerned perceived face recognition difficulties, uncertainty in face recognition, prolonged recognition times and the development of compensatory strategies, a pattern compatible with the presence of CP. The presence of CP was further confirmed by comparing the performance of each participant to normative data on three face processing tasks: the Benton Facial Recognition Test, the TEMA subtest for memory faces, and the Cambridge Face Memory Test.

The Benton Facial Recognition Test (BFRT, Benton and Van Allen, 1968; Ferracuti and Ferracuti, 1992), widely used for acquired prosopagnosia, is a test to assess face recognition abilities. For each item, individuals are presented with a target face above six test faces, and they are asked to indicate which of the six images match the target face.

In the TEMA (Reynolds and Bigler, 1995), the subtest for memory faces requires the recognition of target faces from sets of photos of individuals differing in terms of age, gender and ethnic backgrounds, with an increasing number of targets and distracters.

The Cambridge Face Memory Test (CFMT, Duchaine and Nakayama, 2006; Bowles et al., 2009) is the most widely used and valid test to diagnose CP and it measures face memory (Wilmer et al., 2012); participants learn six unfamiliar target faces, and subsequently are required to recognize them from sets of three faces (one target and two distractor faces). In addition, the test faces differ from the learned images (e.g., they are seen from different viewpoints or with added visual noise). The CFMT includes two versions based on the orientation of faces, upright and inverted.

**Table 1** shows the performance of our experimental group at each test. Inclusion criteria required a pathological performance at least in two out of three tests.
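The inclusion rule (a pathological score on at least two of the three tests) can be expressed as a simple count. The sketch below is illustrative only: the test labels, the function name, and the two example participants are hypothetical and are not taken from Table 1, and the cut-offs themselves are assumed to have been applied beforehand.

```python
TESTS = ("BFRT", "TEMA_faces", "CFMT")

def meets_inclusion(below_cutoff, minimum=2):
    """Return True if the score fell below the cut-off on at least `minimum` tests.

    `below_cutoff` maps each test name to True when that test's score was pathological.
    """
    return sum(bool(below_cutoff[test]) for test in TESTS) >= minimum

# Hypothetical participants, for illustration only (not data from the study).
example_a = {"BFRT": True, "TEMA_faces": False, "CFMT": True}
example_b = {"BFRT": False, "TEMA_faces": False, "CFMT": True}

print(meets_inclusion(example_a))  # True: two tests below cut-off
print(meets_inclusion(example_b))  # False: only one test below cut-off
```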

#### **STIMULI**

Stimuli were the same as used in Comparetti et al. (2011). The faces were created from digital photos of real faces by means of Adobe Photoshop and Poser 5.0 software (Curious Labs, Inc., and e-frontier, Inc., Santa Cruz, CA) as follows. Firstly, by means of Photoshop a completely symmetrical face was created by duplicating just one hemi-face of the original face. Therefore, the left and the right hemi-faces were perfect mirror-images of one another. This ensured that none of the stimuli used contained any intrinsic, unintended asymmetries that could facilitate recognition. Then, the mirror digital photos were imported in a different software program (Poser 5.0) to generate 12 neutral basic stimuli. For every face, external features were almost entirely removed by the software so that face recognition could only be based on the internal features.

**Table 1 | CP's demographic information and performances on tests of face recognition.**

*\*Score falling below the cut off.*

The stimuli comprised 12 neutral basic (unmodified) faces generated by Poser and three sets of modified faces in which different manipulations were made (features, emotions, and non-emotional facial expressions). Among the neutral stimuli, 4 were target stimuli (2 picturing females and 2 picturing males) and 8 were distracters (4 F and 4 M), plus 72 modified stimuli which were generated from target and distracter stimuli. For every manipulation, two different versions were created from each neutral face (12 neutral faces × 3 manipulations × 2 versions = 72; see **Figure 1**). The first manipulation concerned the size of features. From each target stimulus and from each distracter, one modified stimulus (version 1) was created in which the eyes were enlarged and another one was created in which the mouth was enlarged (version 2). Both changes consisted of an increase in size of 1 Poser software unit. This unit respects the boundaries of biological compatibility. The second manipulation concerned emotional facial expressions. Neutral stimuli were now manipulated by means of Poser 5.0 software to show either a happy (version 1) or a sad (version 2) expression. Finally, the non-emotional facial expressions were created by manipulating the neutral faces in their upper (version 1) and lower part (version 2) respectively, around the eyes and the mouth. In doing so, the resulting facial expressions did not express an affective state (i.e., non-emotional facial expressions).
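The composition of the stimulus set described above can be sketched by enumerating every combination of neutral face, manipulation, and version. The labels used here (for example `target_1` or `upper_face`) are placeholders chosen for illustration, not the file names used in the study.

```python
from itertools import product

targets = [f"target_{i}" for i in range(1, 5)]          # 4 target faces
distracters = [f"distracter_{i}" for i in range(1, 9)]  # 8 distracter faces
neutral_faces = targets + distracters

# Each manipulation has two versions, as described in the text.
manipulations = {
    "features": ("eyes_enlarged", "mouth_enlarged"),
    "emotional": ("happy", "sad"),
    "non_emotional": ("upper_face", "lower_face"),
}

# One modified stimulus per neutral face, manipulation, and version.
modified = [
    (face, manipulation, version)
    for face, (manipulation, versions) in product(neutral_faces, manipulations.items())
    for version in versions
]

print(len(neutral_faces), len(modified), len(neutral_faces) + len(modified))  # 12 72 84
```

The count of 72 modified stimuli follows from 12 neutral faces × 3 manipulations × 2 versions, giving 84 stimuli in total per orientation.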

In order to validate the modified stimuli for use in the present and in other studies (e.g., Comparetti et al., 2011), a scalar rating was performed on a sample of 36 stimuli (12 randomly selected from each stimulus set) to evaluate whether or not they conveyed an emotional facial expression. The selected stimuli were presented in upright orientation on a PC display. Twenty typical development participants who did not take part in the present study (12 F and 8 M, aged between 18 and 32 years) had to evaluate the faces on a Likert-like scale from 0 (stimulus does not express any emotion) to 4 (stimulus clearly expresses an emotion) (see **Table 2**). Following that, they had to indicate which emotion they perceived. They could choose among 8 alternatives: happiness, sadness, anger, disgust, fear, surprise, "other," or "non-emotions." Each stimulus lasted until response.

A univariate Analysis of Variance (ANOVA) was performed on the mean percentages of emotional and non-emotional ratings. The effect of stimulus condition was significant [*F*(1, 19) = 81.19; *p* < 0.0001]. The results were as follows. In the case of manipulation of features and manipulation of non-emotional facial expressions the stimuli were generally perceived as not expressing a particular affective state and they did not differ from each other, whereas all the stimuli bearing an emotional facial expression were judged as expressing an emotion and differed both from those with modified features (*p* < 0.0001) and from those displaying a non-emotional facial expression (*p* < 0.0005). Moreover, faces expressing happiness were judged as happy stimuli, and those expressing sadness as sad stimuli. Therefore, the rating analysis corroborated the validity of our face stimuli.

In the present experiment, each face (7.1° × 9.2°) was presented in gray scale and against the same black background. All of the stimuli were presented both upright and inverted (see **Figure 1**).

#### **APPARATUS**

The experiment took place in a dark, sound attenuated room. Participants sat in front of a PC monitor at a distance of approximately 70 cm. The screen was framed with a circular black cardboard mask about 15 cm in diameter. Stimulus presentation and registration of task performance were controlled by the Presentation software (version 9.8). Two keyboards were used: one for the participants, covered by a black card with openings corresponding to the "yes" and "no" buttons (recognition task, see below), and one for the experimenter (same/different task, see below).

**FIGURE 1 | (A)** Examples of basic neutral faces (male and female) and their modified versions. The changes in features are depicted in the two upper rows. On the left enlarged eye size; on the right enlarged mouth size. The changes in non-emotional expressions are depicted in the middle rows. On the left the change occurred in the upper part of the face; on the right the change occurred in the lower part of the face. The changes in emotional expression are depicted in the lower rows. On the left a happy expression; on the right a sad expression. **(B)** Stimuli shown inverted. Each manipulation complied with the parameters of biological compatibility.

**Table 2 | Mean percentage (%) of emotional and non-emotional ratings given to each type of modified stimulus.**

*Frequency values of participants' answers falling above 50% are in bold.*

#### **PROCEDURE**

The experiment was divided into two sessions, an exposure session and an experimental session. In the exposure session the participants saw the 4 target faces on the screen, one by one, 10 times, for 3 s each time. The experimental session followed the exposure session and was divided into four blocks: 2 of upright faces and 2 of inverted faces. In each block neutral and manipulated faces were presented in random order. For each experimental trial the sequence of events was as follows. The trial started with a fixation cross in the center of the screen, which lasted 250 ms; then the face stimulus was presented in the center for 500 ms, followed by a gray screen for each task (the same/different person task and the change detection task). For every stimulus participants were asked to indicate whether or not the face was one of the target stimuli. Participants had to press the "yes" button if they had seen the face in the exposure phase, or the "no" button if they did not recognize the face (2 Alternative Forced Choice paradigm). When a stimulus received a "yes" response, participants then had to judge whether the stimulus was exactly the same as the one seen in the exposure phase or whether there was some change. For the same/different task the experimenter registered the participant's answer on another keyboard by pressing the "same" or "different" key. For both the recognition task and the same/different task accuracy was recorded and analyzed.

We used the presence of the inversion effect as a marker of configural processing (e.g., Rossion, 2008).

## **RESULTS**

The percentage of correct responses was used as a dependent variable (**Tables 3**, **4**).

An ANOVA was run separately for each task (recognition and change detection) and for each orientation (upright and inverted), with group (CPs and controls) as a between-subject factor and condition (neutral, features, non-emotional, and emotional expressions) as a within-subject factor. Independent-samples *t*-tests were run as *post-hoc* tests to compare the performances of the two groups for significant interactions. One-sample *t*-tests against chance level (50%) were also performed in order to test that the effects were not due to chance.

All statistical analyses were conducted using the software package Statistica for Windows (version 8.0, Statsoft Inc., 2007). The variances between groups were assessed by Levene's test for the homogeneity of the variances.

**Figure 2** illustrates participants' performance (CPs and controls) at the first task for each experimental condition.

A first ANOVA was run for the recognition task and the upright condition.

A main effect of group emerged [*F*(1, 14) = 9.72; *p* = 0.007], confirming that the two groups came from different populations in terms of their ability to recognize unfamiliar faces. As expected, controls were better than CPs in recognition (88.87 vs. 72.19%, respectively). However, the significant interaction between group and condition [*F*(3, 42) = 3.196; *p* = 0.033] indicates that this was the case only for neutral faces [controls: 95% vs. CPs: 72.5%, *t*(14) = −4.084; *p* = 0.001; Levene test: *F*(1, 14) = 0.096; *p* = 0.761] and for faces with modified feature size [controls: 91.26% vs. CPs: 68.08%, *t*(14) = −4.804; *p* = 0.0002; Levene test: *F*(1, 14) = 0.317; *p* = 0.582]. The two groups did not differ for either non-emotional facial expressions [controls: 84.21% vs. CPs: 76.51%, *t*(14) = −1.14; n.s.] or emotional facial expressions [controls: 85.02% vs. CPs: 71.66%, *t*(14) = −1.62; n.s.].

No significant main effect of condition emerged [*F*(3, 42) = 1.21; n.s.].

A second ANOVA was run for the recognition task and the inverted condition.

No significant main effect of group [*F*(1, 14) = 3.361; n.s.], or condition [*F*(3, 42) = 0.306; n.s.] emerged. Their interaction was also not significant [*F*(3, 42) = 0.183; n.s.].

These results are consistent with the idea that in control subjects a configural processing of features is triggered only by upright faces (e.g., Diamond and Carey, 1986), and is compromised in CP individuals (e.g., de Gelder and Rouw, 2000; Behrmann and Avidan, 2005).

In order to assess configural face-specific mechanisms, the face inversion effect was computed as the difference in accuracy between upright and inverted faces, and CP individuals' performance was compared to that of controls for each task by means of an ANOVA. No significant effect occurred [Group: *F*(1, 14) = 0.446; n.s.; Condition: *F*(3, 42) = 0.245; n.s.; Interaction: *F*(3, 42) = 3.391; n.s.].


**Table 3 | The mean percentages of correct responses for each participant subdivided for each condition in Task 1 (Recognition task).**

**Table 4 | The mean percentages of correct responses for each participant subdivided for each condition in Task 2 (Change detection task).**


The detection of features, and non-emotional and emotional expression changes was assessed by the second task, in which participants were requested to judge if the faces recognized as already seen in the first task were exactly the same or somehow different from those seen in the exposure phase. **Figure 3** illustrates the performance at the second task of controls and CP individuals for each experimental condition.

A third ANOVA was run on change detection accuracy for the upright condition.

A main effect of group emerged [*F*(1, 14) = 18.68; *p* = 0.0007], confirming a better performance of controls in this task (70.75 vs. 50.41%, respectively). A main effect of condition also emerged [*F*(3, 42) = 5.76; *p* = 0.002], as well as a significant interaction between group and condition [*F*(3, 42) = 6.459; *p* = 0.001]. In particular, these results indicated that the feature condition differed from all the other ones (all *p* < 0.005) and that a difference between the two groups was present only when faces had emotional expressions [controls: 89.04% vs. CPs: 29.08%, *t*(14) = −7.819; *p* < 0.0001; Levene test: *F*(1, 14) = 0.041; *p* = 0.842].

A fourth ANOVA was run on change detection accuracy for the inverted condition.

Only the main effects of group [*F*(1, 14) = 5.976; *p* = 0.028] and condition [*F*(3, 42) = 4.199; *p* = 0.01] emerged, confirming a slightly better performance of controls in this task (55.31 vs. 39.19%, respectively), and a different performance with feature modified faces than with neutral (*p* = 0.004) and emotional expression faces (*p* = 0.014).

As can be seen in **Figure 3**, a change in the size of features was very hard to detect both for CP individuals and controls. Both groups performed below 50%, with upright and with inverted stimuli (CPs: 41.08%, 19.26%, and controls: 45.66%, 38.85%, respectively). This result could be due to the fact that the face processing mechanisms have a low sensitivity to such modifications so as to guarantee efficiency in face identification even when some modifications to the face features (such as puffiness) occur.

However, the performance at the second task was generally very low in both groups and for this reason we tested each condition in each group vs. the percentage of random responses (50%).

Controls showed a performance above the chance level in the neutral [*t*(9) = 3.618; *p* = 0.006], the non-emotional [*t*(9) = 6.243; *p* = 0.0001] and the emotional expression [*t*(9) = 8.947; *p* < 0.0001] conditions with upright stimuli. As regards the inverted condition, performance was above chance level only in the emotional expression condition [*t*(9) = 3.051; *p* = 0.014].

In contrast, the CPs' performance was never significantly above the chance level, and in two conditions it was significantly below chance: the features condition with inverted stimuli [*t*(5) = −3.348; *p* = 0.020] and the emotional expression condition with upright stimuli [*t*(5) = −3.325; *p* = 0.021].
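As a quick consistency check on the chance-level tests, each reported *t* value can be compared with the two-tailed critical value of Student's *t* at α = .05 for its degrees of freedom. This is only a sketch: the critical values are standard table values (2.571 for 5 df, 2.262 for 9 df), the labels and function name are illustrative, and the two-tailed criterion is an assumption, since the text does not state the sidedness of the tests.

```python
# Two-tailed critical values of Student's t at alpha = .05 (standard table values).
T_CRIT_05 = {5: 2.571, 9: 2.262}

# (label, degrees of freedom, reported t against the 50% chance level)
reported = [
    ("controls, upright neutral", 9, 3.618),
    ("controls, upright non-emotional", 9, 6.243),
    ("controls, upright emotional", 9, 8.947),
    ("controls, inverted emotional", 9, 3.051),
    ("CPs, inverted features", 5, -3.348),
    ("CPs, upright emotional", 5, -3.325),
]

def direction(t_value, df):
    """Classify a one-sample t against chance using the tabled critical value."""
    if abs(t_value) <= T_CRIT_05[df]:
        return "not different from chance"
    return "above chance" if t_value > 0 else "below chance"

results = {label: direction(t_value, df) for label, df, t_value in reported}
for label, verdict in results.items():
    print(f"{label}: {verdict}")
```

All six reported values exceed their critical value in absolute terms, with the sign separating the above-chance control conditions from the two below-chance CP conditions.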

It is interesting to note that the presence of emotional expressions facilitates the detection of change in the controls, and reduces it in the CPs. It is not the same for non-emotional expressions.

Overall, the results of task 2 suggest a difference in the processing of emotional and non-emotional facial expressions.

The face inversion effect was computed for task 2 as well, and an ANOVA was run with group as a between-subject factor and condition as a within-subject factor. No significant effects were found [Group: *F*(1, 14) = 0.289; n.s.; Condition: *F*(3, 42) = 1.763; n.s.; Interaction: *F*(3, 42) = 1.694; n.s.]. Nevertheless, the inspection of **Figures 4**, **5** suggests that CP individuals show a greater inversion effect in the condition of non-emotional expressions, in task 2 as much as in task 1.
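The inversion effect is defined as upright accuracy minus inverted accuracy. The sketch below applies that definition to the group means reported in the text for the change detection task. It is illustrative only: the ANOVA in the study was presumably run on per-participant scores, which are not reproduced here, and the variable and function names are assumptions.

```python
# Group mean accuracy (%) in the change detection task, as reported in the text.
overall_upright = {"controls": 70.75, "CPs": 50.41}
overall_inverted = {"controls": 55.31, "CPs": 39.19}
features_upright = {"controls": 45.66, "CPs": 41.08}
features_inverted = {"controls": 38.85, "CPs": 19.26}

def inversion_effect(upright, inverted):
    """Upright minus inverted accuracy per group, in percentage points."""
    return {group: round(upright[group] - inverted[group], 2) for group in upright}

overall = inversion_effect(overall_upright, overall_inverted)
features = inversion_effect(features_upright, features_inverted)

print(overall)   # {'controls': 15.44, 'CPs': 11.22}
print(features)  # {'controls': 6.81, 'CPs': 21.82}
```

A positive value indicates a cost of inversion. Differences in group means of this kind are descriptive and do not by themselves imply a significant effect.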

### **DISCUSSION**

The aim of this investigation was to shed new light on emotional and non-emotional facial expression processing and to investigate whether in CP individuals these expressions could be used as a cue to face recognition, given that they should be less impaired, or not at all impaired (e.g., Steede et al., 2007). Two consecutive tasks were used. In the first task participants had to recognize static unfamiliar faces which could differ either in the emotional facial expressions, in the non-emotional facial expressions, or in the size of invariant features from a set of previously presented faces. The face stimuli were presented either upright or inverted. We also developed a new task (task 2, the change detection task), in which participants were asked to detect whether or not a change occurred in the recognized face compared to the exposure session.

**plotted as a function of group and experimental manipulations.** Error bars represent standard error of the mean.

The first main result that emerged from our data was that in task 1, in the upright presentation condition, CPs had a significantly worse performance than controls only for two conditions: neutral and feature-modified faces. This is in line with the hypothesis proposed in the literature (e.g., de Gelder and Rouw, 2000; Behrmann and Avidan, 2005) in which congenital prosopagnosia is characterized by an impairment in processing the invariant features of faces.

The second main result concerned the fact that we did not find a difference between CPs and controls in the recognition of unfamiliar faces (as our stimuli were) when the manipulations involved facial expressions (emotional and non-emotional), thus suggesting in CPs both a dissociation between changeable and invariant aspects, and a spared processing of the changeable aspects of the face. Although it has already been shown that dynamic facial expressions can help face recognition (Longmore and Tree, 2013), our finding further indicates that CPs could effectively use non-emotional facial expressions of static images as a cue to recognition, but this seems not to be the case for emotional ones (as evident from their performance in task 2).

Even though we did not formally assess CPs' ability to discriminate emotional expressions, it is worth noting that CP individuals performed poorly at detecting emotional expression changes but this did not seem to affect their ability to detect a change in non-emotional facial expressions. This result is new and suggests that CPs' good performance in the detection of changes in facial expressions is likely to reflect the use of face motion cues even when they have to be derived from a static image of the face, as in the present study.

The anatomo-functional correlate of the processing of changeable aspects of a face is considered to be the Superior Temporal Sulcus (STS; Haxby et al., 2000), the same area which also underlies the processing of biological motion (Allison et al., 2000). It has been reported that responses to facial expressions and other changeable aspects of the face, such as gaze directions, have different locations in the STS (Engell and Haxby, 2007). Therefore, given the heterogeneity of STS, and on the basis of our results, it could be argued that STS region functionality is preserved in CPs. Therefore, one may expect that CP individuals could also perceive biological motion, in the same way they could process the changeable aspect of a face. Future research is needed to clarify this issue.

Interestingly, our results also bring evidence of a further differentiation between emotional and non-emotional facial expression processing. This is a new finding that extends the results present in the literature (Longmore and Tree, 2013), and is not accommodated by many face recognition models (Bruce and Young, 1986; Kanwisher et al., 1997, 1999; Haxby et al., 2000; Posamentier and Abdi, 2003; Kanwisher and Yovel, 2006). Although CPs' performance in the recognition task did not differ from that of controls in the emotional and non-emotional expressions conditions, in the second task it dropped severely when the change occurred in the emotional facial expressions.

Taken together, these data indicate that the processing of emotional and non-emotional facial expressions differs and that successful recognition of unfamiliar faces can rely on the detection of non-emotional changeable facial features, at least in subjects affected by CP. A possible explanation for this is that emotions conveyed by facial expression have a more universal meaning than non-emotional facial expressions, which instead can be idiosyncratic and more suitable for face recognition (idiosyncratic dynamic facial signature, as defined by O'Toole et al., 2002). In other words, emotional facial expressions are less useful in recognizing an unfamiliar face that has been seen only once. Hence, non-emotional expressions can be used as a better cue to face identity even when the face is unfamiliar. Note also that our findings demonstrate accurate detection of non-emotional expressions (task 2), in addition to a dissociation from emotional expressions. Therefore, non-emotional expressions can be memorized and used independently of emotional expressions for correct face recognition by both controls and CPs.

We suggest that CPs could rely more on changeable features for improving face recognition, and this is why they could also be more sensitive to detecting differences in these face dimensions.

Our explanation is consistent with the results of a previous study by Lander and Davies (2007), who argued that faces can be recognized from facial expression even when they are unfamiliar because, as a face is learnt, information about its characteristic motion is encoded with identity. Indeed, it seems that typically developing individuals were able to extract and encode dynamic information even when viewing a face for a very short time, such as in our exposure session. Our findings are consistent with this idea and support the proposal of rapid learning of characteristic "implied" motion patterns. In this vein, CPs may have developed a special ability to extract information on identity from the changeable aspects of faces at the expense of more fine-tuned emotional expression processing.

In controls, the presence of an emotional expression, in fact, facilitates the detection of a difference in the recognized face, while in CPs the performance associated with these stimuli is greatly reduced (task 2). This indicates that in CPs the affective component of facial expression does not play a key role in face recognition.

In line with the model of O'Toole et al. (2002), we propose that the processing of changeable facial aspects can lead to face identification, since important cues to identity are extracted through it. These cues are useful for recognizing both familiar (Albonico et al., 2012) and unfamiliar faces, as shown by previous studies (Longmore and Tree, 2013) and the present study. In particular, we argue that the processing of non-emotional facial expressions is preserved and enhanced in CP individuals, who can then use it to compensate for their face recognition deficits. We also speculate that the nature of the processing of the changeable aspects of a face could be configural. Specifically, this holds for non-emotional expressions, as revealed by the presence of a large inversion effect in CP participants both in the recognition and in the change detection task. Interestingly, in our second task (change detection task) the processing of emotional facial expressions seems to be analytic rather than holistic. In fact, not only did CPs show very poor performance in detecting a change in the emotional expressions, but they also showed an "inversion of the inversion effect" (i.e., better performance for inverted than upright stimuli). This is in line with previous studies (Chen and Chen, 2010), which suggested that the information relevant for emotion detection is better extracted from the movements of single facial regions and is processed more analytically than non-emotional expression information.

We think that the configural processing of invariant features is the typical mode to reach face recognition and identification, but when this mechanism is impaired such as in congenital prosopagnosia, the analytic processing of single features and the processing of the non-emotional expressions (which are changeable aspects of a face and are processed via a different and dissociable pathway from that of the facial features) can help compensate for face recognition impairments.

In conclusion, congenital prosopagnosics, even if characterized by a deficit in the global processing of invariant features, could show a preserved analysis of changeable aspects, in particular of non-emotional facial expressions, which can be used for face recognition.

A speculative hypothesis, to be tested in a future study with a larger sample, is that although the configural mechanisms processing invariant features are impaired in CPs (in keeping with their difficulty in face recognition tests), the configural processing of changeable aspects could instead be preserved.

## **AUTHOR CONTRIBUTIONS**

Roberta Daini designed the experiment, analyzed the data, wrote the manuscript, discussed the results, and prepared the figures. Chiara M. Comparetti designed and performed the experiment, wrote the Method Section, and prepared the stimuli. Paola Ricciardelli discussed the results, and wrote and revised the manuscript.

## **ACKNOWLEDGMENTS**

Paola Ricciardelli was supported by a grant from Università di Milano-Bicocca (Fondo di Ateneo 2011). We thank Andrea Albonico, Manuela Malaspina and Laura Corpaccini for helping with data collection.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 15 November 2014; published online: 03 December 2014.*

*Citation: Daini R, Comparetti CM and Ricciardelli P (2014) Behavioral dissociation between emotional and non-emotional facial expressions in congenital prosopagnosia. Front. Hum. Neurosci. 8:974. doi: 10.3389/fnhum.2014.00974*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Daini, Comparetti and Ricciardelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Do congenital prosopagnosia and the other-race effect affect the same face recognition mechanisms?

## *Janina Esins<sup>1</sup>\*, Johannes Schultz<sup>1,2</sup>, Christian Wallraven<sup>3</sup> and Isabelle Bülthoff<sup>1,3</sup>*

*<sup>1</sup> Human Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Tübingen, Germany*

*<sup>2</sup> Department of Psychology, Durham University, Durham, UK*

*<sup>3</sup> Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*Tamara L. Watson, University of Western Sydney, Australia Roberta Daini, Università degli studi di Milano - Bicocca, Italy*

#### *\*Correspondence:*

*Janina Esins, Human Perception, Cognition and Action, Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, Tübingen 72076, Germany e-mail: janina.esins@tuebingen.mpg.de*

Congenital prosopagnosia (CP), an innate impairment in recognizing faces, as well as the other-race effect (ORE), a disadvantage in recognizing faces of foreign races, both affect face recognition abilities. Are the same face processing mechanisms affected in both situations? To investigate this question, we tested three groups of 21 participants: German congenital prosopagnosics, South Korean participants and German controls on three different tasks involving faces and objects. First we tested all participants on the Cambridge Face Memory Test in which they had to recognize Caucasian target faces in a 3-alternative-forced-choice task. German controls performed better than Koreans who performed better than prosopagnosics. In the second experiment, participants rated the similarity of Caucasian faces that differed parametrically in either features or second-order relations (configuration). Prosopagnosics were less sensitive to configuration changes than both other groups. In addition, while all groups were more sensitive to changes in features than in configuration, this difference was smaller in Koreans. In the third experiment, participants had to learn exemplars of artificial objects, natural objects, and faces and recognize them among distractors of the same category. Here prosopagnosics performed worse than participants in the other two groups only when they were tested on face stimuli. In sum, Koreans and prosopagnosic participants differed from German controls in different ways in all tests. This suggests that German congenital prosopagnosics perceive Caucasian faces differently than do Korean participants. Importantly, our results suggest that different processing impairments underlie the ORE and CP.

**Keywords: congenital prosopagnosia, other-race effect, face recognition, Asian, Caucasian**

### **INTRODUCTION**

Recognizing faces is arguably the most important way to identify other humans and bears great social importance. Even though faces are a visually homogeneous object class, most humans are experts in face identification: within milliseconds we can identify a familiar face in poor lighting, after 15 years of aging, 20 pounds of weight loss, or with a different hairdo, and this is true for the several hundred acquaintances we have on average.

One explanation for this achievement is that we use "holistic processing" for faces: we integrate the different components of a face [e.g., the form and color of the features (eyes, nose, and mouth) and their configuration (i.e., spatial distances between the features)] into a whole and do not process single pieces of information individually (Maurer et al., 2002). If the retrieval of this information is disturbed, holistic processing and thus face recognition are impaired (Collishaw and Hole, 2000). Configural processing in particular is considered to be one of the most important aspects of holistic processing: disturbing this process alone already strongly affects holistic processing of faces (Maurer et al., 2002).

Most humans are undoubtedly experts at every-day face recognition but this expertise can be disturbed in various ways. Two well-known phenomena in which people show impaired face recognition abilities are congenital prosopagnosia (CP) and the other-race effect (ORE).

CP is an innate impairment in face processing. People with CP often encounter social difficulties, like being considered arrogant or ignorant because they fail to recognize and greet acquaintances. Therefore, some of them tend to lead a socially withdrawn life. An estimated 2.5% of the population is affected (Kennerknecht et al., 2008). In contrast to the acquired form of prosopagnosia, which is caused by acquired brain damage, CP is inborn and there are no evident brain lesions. In addition, several studies found normal functional brain responses to faces using fMRI (e.g., Avidan et al., 2005; Avidan and Behrmann, 2009) and EEG (e.g., Towler et al., 2012), but subtle differences in connectivity between face processing brain regions for congenital prosopagnosics compared with controls (Avidan et al., 2008). In a single case study of CP, this reduced connectivity could be enhanced by training on spatial integration of mouth and eye regions of faces. The training also had positive effects on face recognition performance, but these effects vanished after a few months (DeGutis et al., 2007).

The ORE describes the fact that we recognize faces of our own (familiar) race faster and more accurately than faces of an unfamiliar ethnicity (Meissner and Brigham, 2001). This effect (also called "cross-race bias," "own-race advantage," or "other-race deficit") is a common and well-known phenomenon. Several models exist to explain the underlying mechanisms causing the ORE. The most common explanation is the higher level of expertise for same-race faces compared with other-race faces (Meissner and Brigham, 2001). This perceptual expertise hypothesis states that the frequent encounter and the training in individuating own-race faces lead to greater experience in encoding the dimensions most useful to individuate faces of that race. Nevertheless, competing models exist, like the social categorization hypothesis, which states that mere social out-group categorization is sufficient to elicit a drop in face recognition performance (Bernstein et al., 2007). Another hypothesis is the categorization-individuation model, which combines perceptual experience, social categorization and motivated individuation (discrimination among individuals within a racial group, which requires attending to face-identity characteristics rather than to category-diagnostic characteristics), all three of which co-act and generate the ORE (Hugenberg et al., 2010). The underlying mechanisms are not yet clear, but it has been shown that the ORE can be overcome by training, although only for the trained faces (McKone et al., 2007).

As nearly everyone has experienced the ORE, it is sometimes cited as an example by congenital prosopagnosics when they try to describe to non-prosopagnosics what they experience in everyday life. Both phenomena are characterized by the difficulty in telling people apart or recognizing previously encountered people based on their faces. But also, in both cases, there is evidence for parallels in disturbances of face processing as reviewed in the following.

Some studies used the inversion effect or the composite face effect to test face processing abilities of their participants. The inversion effect describes the effect that face recognition performance is reduced if the faces are presented upside down. The strength of this effect is significantly larger for faces than for other objects for which we are not experts. The composite face effect describes the illusion of a new identity when combining the top half of the face of one person with the bottom half face of another person. The two halves cannot be processed individually and create the face of a new, third person. The illusion disappears when the two halves are misaligned. Both effects, the face inversion and the composite face effect, are considered to be hallmarks for holistic face processing. Both disrupt the configural information leaving the featural information intact. This again is an indication of the importance of configural processing for holistic processing (Maurer et al., 2002). A study testing congenital prosopagnosic participants found no face inversion effect or composite-face effect, neither in accuracy nor in reaction times, indicating their impairment in holistic processing of faces (Avidan et al., 2011). Regarding the face inversion effect for other-race faces, two experiments testing European and Asian participants found a larger effect for same-race faces than for other-race faces in both groups of participants (Rhodes et al., 1989). When testing the composite face task with Asian and European participants, similarly, Michel and colleagues found a significantly larger composite face effect for same-race faces compared with other-race faces (Michel et al., 2006).

In a study conducted by Lobmaier and colleagues, congenital prosopagnosics were tested with scrambled faces (configural information destroyed) and blurred faces (featural information destroyed) in a delayed matching task. Prosopagnosic participants showed significantly worse performance than controls in both conditions (Lobmaier et al., 2010). Chinese and Caucasian-Australian participants tested in an old-new recognition task on blurred and scrambled Asian and Caucasian faces also showed a significantly worse performance for other-race faces than for own-race faces in both conditions (Hayward et al., 2008).

In another study, congenital prosopagnosic participants were tested on a same-different task with the so-called "Jane" set of stimuli (Le Grand et al., 2006). These stimulus faces differ either in features, configuration, or contour. Only a minority of the prosopagnosic participants performed significantly worse than controls on the faces differing in configuration or features, but most prosopagnosics performed significantly worse on faces differing in their contour. A study with Asian participants using the same "Jane" stimuli and a similarly created Asian female face set also showed only marginal effects (Mondloch et al., 2010): Chinese participants were significantly slower on other-race compared with same-race faces [analysis collapsed over all three types (features, configuration, contour), with the longest mean reaction times for the faces differing in contour] but showed no significant differences in performance for any modification (features, configuration, contour). Even though this lack of differences between groups for the "Jane" stimuli was challenged by Yovel and Duchaine (2006) (this will be discussed in our general discussion), we note that similar results for other-race observers and prosopagnosic observers were obtained in both studies.

There are several different causes that can reduce face recognition ability (aging, illnesses, drug consumption, etc.). However, the two face recognition disturbances under study here, CP and the ORE, seem to impair face recognition abilities in a similar way, namely by disrupting featural and configural face processing (depending on the stimuli and task used, as reviewed above), causing a lack or reduction of face expertise. Also, in both cases face recognition performance can be increased to a certain extent through training. These similarities could be a hint that the same face processing mechanisms are impaired.

To verify the hypothesis of a common underlying disturbance, it is necessary to compare in detail whether the same kind of impairments appear when looking specifically and directly at featural and configural processing. On one hand, if differences in face recognition performance appear, we can exclude a common underlying disturbance. On the other hand, if similar impairments are found, the hypothesis that the same mechanisms are disturbed is not proven, but possible. In any case, a direct comparison between CP and the ORE is a great chance to get further insights into the yet unknown mechanisms underlying face processing and face recognition.

To conduct this direct comparison we recruited three age- and gender-matched participant groups with a comparatively large sample size of 21 participants per group: German congenital prosopagnosic participants, Korean participants, and German controls. All participant groups performed the same three tests: (1) the Cambridge Face Memory Test (CFMT, Duchaine and Nakayama, 2006), an objective measure of the ability to recognize Caucasian faces, (2) a parametric test of the sensitivity to configural and featural information in faces; sensitivity to these two types of facial information has been shown to be reduced in congenital prosopagnosics and other-race observers in previous studies, and (3) a recognition task of faces and familiar and unfamiliar objects to test the influence of expertise on recognition performance.

As all face stimuli used in our tests were derived from Caucasian faces, we expected the Korean group to exhibit evidence of the ORE that could be compared with the performance of the prosopagnosics, while the German control group would serve as a baseline. Our predictions for each test were the following: (1) For the CFMT, Koreans and prosopagnosics would have a lower score compared with German controls, due to the disadvantage in recognizing other-race faces for the Koreans and the innate face recognition impairment for the prosopagnosics. This test is a general measure of the severity of face recognition impairments and does not detect whether differences in the nature of the impairments exist. (2) We expected to find a decreased sensitivity to configural and featural information for prosopagnosics and Koreans. This prediction was based on reported deficits in processing both kinds of information in prosopagnosic as well as other-race observers (Lobmaier et al., 2010; Hayward et al., 2008, respectively). If prosopagnosics and Koreans showed differences in the extraction of featural and configural information, we could exclude that common mechanisms are impaired. (3) In the object and face recognition test we expected an impaired recognition performance of the face stimuli for Koreans and prosopagnosics, again due to the disadvantage in recognizing other-race faces for the Koreans and the innate face recognition impairment for the prosopagnosics. We expected to find no differences across all participant groups in recognizing the non-expertise object stimuli. Despite a study describing that 54 congenital prosopagnosics self-reported impaired object recognition during interviews (Grüter et al., 2008), most studies explicitly testing object recognition found nearly-normal to normal object recognition abilities for prosopagnosic participants. When impairments were found, they were less pronounced than face recognition impairments (see Kress and Daum, 2003; Le Grand et al., 2006 for reviews).

## **MATERIALS AND METHODS**

**PARTICIPANTS**

We tested three groups of participants: German congenital prosopagnosic participants (from now on referred to as "prosopagnosics"), South Korean participants ("Koreans"), and German control participants ("Germans") with 21 participants per group. The ratio of female to male participants as well as the age of participants in each group was matched as closely as possible. Note that it was hard to recruit older male Korean participants, presumably for cultural reasons; therefore we had to resort to younger male participants in that group to have matching numbers of participants in all groups.

So far, no universally accepted standard diagnostic tool for CP exists: while the CFMT is widely used to characterize prosopagnosic participants (e.g., Rivolta et al., 2011; Kimchi et al., 2012), other diagnostic means exist. The prosopagnosics of our study were identified by a questionnaire and interview (Stollhoff et al., 2011). Due to time constraints the Koreans and Germans did not participate in the diagnostic interview but reported having no problems in recognizing faces of their friends and family members. To provide an objective measure of face processing abilities and to maintain comparability with other studies, we tested all participants on the CFMT and report their scores and z-scores, based on the results of the German controls, in **Table 1**.
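The z-scores mentioned above express each participant's CFMT score relative to the German control group. A minimal sketch of that normalization follows, assuming the control group's sample standard deviation is used (the text does not specify this). The scores are made up for illustration and are not the values from Table 1; the function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev


def z_scores(scores, reference):
    """Standardize scores against the mean and SD of a reference group."""
    ref_mean = mean(reference)
    ref_sd = stdev(reference)  # sample SD (n - 1 denominator); an assumption
    return [(s - ref_mean) / ref_sd for s in scores]


# Hypothetical percent-correct scores, not the study's data.
controls = [80.0, 85.0, 90.0, 75.0, 70.0]
tested = [55.0, 80.0]

z = z_scores(tested, controls)
print([round(value, 2) for value in z])
```

A score equal to the control mean maps to zero, and negative values indicate performance below the control group.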

All participants provided informed consent. All participants had normal or corrected-to-normal visual acuity.

#### *German congenital prosopagnosic participants*

The prosopagnosics were diagnosed by the Institute of Human Genetics, Universitätsklinikum Münster, based on a screening questionnaire and a diagnostic semi-structured interview (Stollhoff et al., 2011). All prosopagnosics were tested at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany and compensated with 8 Euro per hour plus travel expenses.

#### *Korean participants*

The Korean participants were compensated with 30,000 Won (approximately 20 Euro) for the whole experiment. All participants of this group were tested at Korea University in Seoul, South Korea. The Koreans did not perform a diagnostic interview but were asked if they had noticeable problems recognizing faces of friends and family members. None of the participants reported face recognition impairments.

#### *German control participants*

The German control participants were compensated with 8 Euro per hour. All participants of this group were tested at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. The Germans did not perform a diagnostic interview but were asked if they had noticeable problems recognizing faces of friends and family members. None of the participants reported face recognition impairments.

#### **ANALYSIS**

Many studies found faster reaction times for Asian compared with Caucasian participants regardless of the task (Rushton and Jensen, 2005). We made similar observations in our study and hence we do not compare reaction times between our Asian and Caucasian participants, as any comparison would not give interpretable results. Nevertheless, we compared reaction times for prosopagnosics and Germans for the object recognition task, as participants in both groups share the same ethnicity.

All analyses were conducted with Matlab2011b (Natick, MA) and IBM SPSS Statistics Version 20 (Armonk, NY). The dependent variables analyzed in each test are described in the respective sections.

#### **Table 1 | Overview of the participants in the three different groups.**

*Depicted are their sex (f, female; m, male), age in years, and their scores in the CFMT as well as the corresponding z-scores, based on the results of the German controls.*

We report effect sizes as partial eta squared (η<sub>p</sub><sup>2</sup>). For One-Way ANOVAs, partial eta squared and eta squared (η<sup>2</sup>) are the same. For our Two-Way ANOVAs, partial eta squared differs from eta squared; therefore we give both values.
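The relation between the two effect-size measures can be made concrete. The sketch below uses the standard definitions (eta squared is the effect sum of squares divided by the total sum of squares; partial eta squared divides by the effect plus error sums of squares only). The sums of squares are hypothetical and the function names are chosen for illustration; nothing here reproduces values from the study.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of the total variance attributed to the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Proportion of effect-plus-error variance attributed to the effect."""
    return ss_effect / (ss_effect + ss_error)


# One-way design: total = effect + error, so both measures coincide.
ss_group, ss_error_1 = 26.0, 6.0
one_way_eta = eta_squared(ss_group, ss_group + ss_error_1)
one_way_partial = partial_eta_squared(ss_group, ss_error_1)

# Two-way design: the total also contains the other factor and the
# interaction, so eta squared is smaller than partial eta squared.
ss_a, ss_b, ss_ab, ss_error_2 = 20.0, 10.0, 5.0, 15.0
ss_total_2 = ss_a + ss_b + ss_ab + ss_error_2
two_way_eta = eta_squared(ss_a, ss_total_2)
two_way_partial = partial_eta_squared(ss_a, ss_error_2)

print(one_way_eta, one_way_partial)
print(two_way_eta, round(two_way_partial, 4))
```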

#### **APPARATUS**

All participants were tested individually. For prosopagnosics and Germans the experiments were run on a desktop PC with a 24-inch screen; Koreans performed the tests on a MacBook Pro with a 17-inch screen. The CFMT is Java-script based; Matlab and Psychtoolbox were used to run the other experiments. Participants were seated at a viewing distance of approximately 60 cm from the screen.

## **PROCEDURE**

The procedure was approved by the local IRB. All participants completed three tests: (1) the CFMT, (2) a rating task of the similarity of faces differing in features or configuration, (3) an object recognition task. All tests were conducted in the same order to obtain comparable results for each participant. Participants could take self-paced breaks between experiments.

## **TEST BATTERY**

## **CAMBRIDGE FACE MEMORY TEST**

#### *Motivation*

The CFMT was created and provided by Bradley Duchaine and Ken Nakayama (Duchaine and Nakayama, 2006). This test assesses recognition abilities using unfamiliar faces in a 3-alternative-forced-choice task. It has been widely used in recent years in studies of CP and of the ORE. Therefore, we used it here as an objective measure of face recognition abilities.

#### *Stimuli*

As this test has been described in detail in the original study, only a short description is given here. Pictures of the faces of young male Caucasians shown under three different viewpoints and under different lighting and noise conditions were used in recognition tests of increasing difficulty. For a complete description of the test see the original study (Duchaine and Nakayama, 2006).

## *Task*

First the participants were familiarized with six target faces which they then had to recognize among distractors in a 3-alternative-forced-choice task with tests of increasing difficulty. No feedback was given. The test can be run in an upright and inverted condition. We only used the upright condition.

## *Results*

The percent correct recognition of participants was calculated, and the mean and standard error of the three participant groups are depicted in **Figure 1**.

Germans (mean percent correct = 82.3%, *SD* = 8.3) performed significantly better than Koreans (mean = 73.7%, *SD* = 8.8), who performed significantly better than prosopagnosics (mean = 55.2%, *SD* = 5.9) [One-Way ANOVA: *F*(2, 62) = 67.34, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.69, with Tukey HSD *post-hoc* tests: all comparisons *p* ≤ 0.002].
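For readers less familiar with the test statistic, the following sketch shows how a one-way ANOVA *F* value is obtained from raw group scores by splitting the variance into between-group and within-group parts. It is an illustration in plain Python with made-up scores, not the SPSS analysis used in the study; the *p*-value and the Tukey HSD post-hoc comparisons are omitted because they require the *F* and studentized range distributions. The function name `one_way_anova` is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - mean(group)) ** 2 for group in groups for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three small groups, not the study's data.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_between, df_within = one_way_anova(example_groups)
print(round(f_value, 2), df_between, df_within)
```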

#### *Discussion*

As predicted, the Koreans and prosopagnosics performed significantly worse than the Germans. Furthermore, the prosopagnosics performed significantly worse than the Koreans. The significant difference in performance for the Germans and Koreans shows an own-race advantage for the Germans. We assume that reduced performance of the Koreans is due to the ORE; however, as we did not perform the reverse test with Asian faces, we cannot completely exclude an alternative cause for this difference between participant groups. We suggest that this is very unlikely, because the CFMT and its Chinese version (comprising Chinese faces depicted in a similar way and format as the faces in the CFMT; only published after our data acquisition) were already successfully used to measure the ORE in a complete cross-over design in Caucasian and Asian participants (McKone et al., 2012).

From our finding that Koreans show a significantly better recognition performance than prosopagnosics we cannot exclude that the same mechanisms for processing Caucasian faces are affected in these groups. But we can infer that CP has a stronger impact on face recognition abilities compared with the ORE.

## **SIMILARITY RATING OF FACES DIFFERING IN FEATURES OR CONFIGURATION**

#### *Motivation*

This test was conducted to measure in what way and to what extent the retrieval of featural and configural information is disturbed in other-race observers and prosopagnosics. Based on this pattern we want to infer if we can exclude that the same mechanisms for processing Caucasian faces are affected in CP and the ORE. As discussed in the introduction, previous studies found disturbances in holistic processing (e.g., Avidan et al., 2011 for CP; Rhodes et al., 1989; Michel et al., 2006 for the ORE), and disruptions of configural and featural processing (e.g., Lobmaier et al., 2010 for CP; Hayward et al., 2008 for the ORE). However, other studies using different tasks and stimuli found only minor or no impairments in configural and featural processing (e.g., Le Grand et al., 2006 for CP; Mondloch et al., 2010 for the ORE). The pattern of findings obtained so far was too inconsistent and not detailed enough to draw conclusions regarding our research question. To resolve this controversy and to obtain usable data, we assessed the fine-grained sensitivity to featural and configural facial information and compared the effects of CP and the ORE.

**FIGURE 1 | Performance of the 3 participant groups in the CFMT.** Data are displayed as mean percentage correct responses. Error bars: SEM.

## *Stimulus creation*

We generated eight natural-looking face sets with gradual small-step changes in features and configuration to determine the degree of sensitivity to featural and configural facial information, without resorting to unnatural modifications (like blurring or scrambling). The faces in each of our stimulus sets differ only in internal features and their configuration. Skin texture and outer face shape were held constant to allow testing purely for sensitivity to internal features and configuration. The face stimuli contain no extra-facial cues (no hair, makeup, clothing, or jewelry).

The stimuli were created using faces from our in-house 3D face database (Troje and Bülthoff, 1996). The faces are 3D laser scans of the faces of real persons. A morphable model allows the four main face regions to be isolated and exchanged between any faces of the database (Vetter and Blanz, 1999). Those four regions are: both eyes (including eyebrows), the nose, the mouth, and the outer face shape (**Figure 2**). For these regions, the texture (i.e., "skin") and/or the shape can be morphed as well as exchanged between all faces. Additionally, the regions can be shifted within each face (e.g., moving the eyes up or further apart).

We chose pairs of faces from the database such that the faces in each pair differed largely from each other in both configuration and features. Previous studies that have used faces differing in either features or configuration have shown that participants are more sensitive to featural than to configural changes (Freire et al., 2000; Goffaux et al., 2005; Maurer et al., 2007; Rotshtein et al., 2007). For this reason we further increased the configural differences of the face pairs by shifting the features slightly (e.g., we moved the eyes closer together in the face which had more closely spaced eyes, and moved the eyes further apart in the other face of the pair). This was done to create the best conditions for measuring configural sensitivity, which is one main focus of our study, while remaining within natural limits. A pilot study, described further below, tested whether the faces are still perceived as natural.

The outer face shape and skin texture of the modified faces were averaged within each pair and applied to both modified faces to create two faces A and B (**Figure 3B**). A and B exhibit different features and inner configuration but identical averaged outer face shape and skin texture. Based on the faces A and B we then generated two more faces by creating a face X with features of face A and the configuration of face B (i.e., the features of face A were moved to the feature locations of face B) and vice versa for face Y (see scheme in **Figure 3A**; see actual face stimuli in **Figure 3B**). By morphing between these four faces in 25% increments we generated a whole set of faces parametrically differing from each other in features (**Figure 3C**, horizontal axes) or configuration (**Figure 3C**, vertical axes). We created eight different sets in the same way as the one depicted in **Figure 3C**, one for each of eight pairs of original faces of our database (note: each original face was used only in one set).

To ensure that the faces we created appeared just as natural as the original faces, we ran a pilot study in which participants rated the naturalness of the modified and original faces without any knowledge about the facial modifications. The modified faces we used for our study showed no significant difference in perceived naturalness compared with the original scanned faces of real people (Esins et al., 2011).

Further, to verify that featural and configural modifications introduced similar amounts of changes in the pictures, we calculated the mean pixelwise image differences between the stimuli with the greatest configural and featural parametrical differences per set. We took the two end point faces of the vertical bar (see **Figure 4**) and calculated their Euclidean distance for each pixel and did the same for the two end point faces of the horizontal bar. Then we calculated the average pixel distance for the two comparisons<sup>1</sup>. With this method we obtained mean Euclidean pixel distances for configural and featural changes, for each of the eight created sets. A Wilcoxon signed rank test run on all eight mean distances for the featural changes vs. the eight configural change distances was not significant (*p* = 0.31), supporting the idea that featural and configural face modifications introduced similar amounts of computational change in the pictures.

<sup>1</sup>Only pixels which actually differed between both images were taken into consideration. Thus, the gray background and the common outer face shape were omitted for the averaging process. This avoids an artificial reduction of the mean pixel distances.
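The pixelwise comparison described above can be illustrated with a minimal sketch. It assumes images stored as equal-length lists of RGB tuples; this format, the function name `mean_pixel_distance`, and the toy pixel values are illustrative assumptions and are not taken from the original analysis code.

```python
import math

def mean_pixel_distance(img_a, img_b):
    """Mean Euclidean distance over the pixels that differ between two images.

    Images are equal-length lists of (r, g, b) tuples (an assumed format).
    Pixels that are identical in both images (e.g., gray background or the
    shared outer face shape) are skipped so they do not shrink the mean.
    """
    distances = [
        math.dist(pixel_a, pixel_b)
        for pixel_a, pixel_b in zip(img_a, img_b)
        if pixel_a != pixel_b
    ]
    return sum(distances) / len(distances) if distances else 0.0

# Toy 4-pixel "images" (hypothetical values): two pixels match, two differ.
face_1 = [(128, 128, 128), (200, 150, 120), (90, 60, 50), (128, 128, 128)]
face_2 = [(128, 128, 128), (203, 154, 120), (90, 60, 56), (128, 128, 128)]

print(mean_pixel_distance(face_1, face_2))  # distances 5 and 6, mean 5.5
```

Applied to the two end point faces of each bar, such a function would yield one featural and one configural distance per set, which could then be compared across the eight sets with a paired non-parametric test.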

#### *Task*

Participants had to rate the pair-wise similarity of faces originating from the same set. Due to time limitations we used only nine test faces per set: the ones located on the central horizontal bar (differing in features) and the central vertical bar (differing in configuration) of each set (see **Figure 4**). Each face was compared with the eight other faces on the central bars of the same set and with itself. Trials in which faces differed in both features and configuration were considered filler trials, included to prevent participants from realizing the nature of the stimuli, and were omitted from the analysis. Therefore, in sum, for each of the eight sets, we analyzed 29 pair-wise similarity ratings: nine identical face comparisons (100% parametrical similarity), eight face comparisons with 75% parametrical similarity (two faces next to each other in the set), six face comparisons with 50% parametrical similarity, four face comparisons with 25% parametrical similarity, and two face comparisons with 0% parametrical similarity (comparison of the extreme faces of the same bar). So in total, 232 comparisons entered the analysis. The order of comparisons was randomized within and across sets for each participant.
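The trial counts given above can be reproduced with a short enumeration. The sketch below assumes that each face is described by a pair of morph levels (features, configuration) in 25% steps and that each unordered pair of the nine test faces is counted once; the coordinate layout and all variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement

LEVELS = [0, 25, 50, 75, 100]  # morph level in percent
N_SETS = 8

# Faces as (feature_level, configuration_level) coordinates (assumed layout).
horizontal_bar = [(f, 50) for f in LEVELS]  # features vary
vertical_bar = [(50, c) for c in LEVELS]    # configuration varies
test_faces = sorted(set(horizontal_bar + vertical_bar))  # 9 faces, shared center

counts = Counter()
fillers = 0
for (f1, c1), (f2, c2) in combinations_with_replacement(test_faces, 2):
    if f1 != f2 and c1 != c2:
        fillers += 1  # differs in both dimensions: filler trial, not analyzed
        continue
    similarity = 100 - abs(f1 - f2) - abs(c1 - c2)
    counts[similarity] += 1

analysed_per_set = sum(counts.values())
print(dict(counts))               # {100: 9, 75: 8, 50: 6, 25: 4, 0: 2} (key order may vary)
print(analysed_per_set)           # 29
print(analysed_per_set * N_SETS)  # 232
```

Under these assumptions, 16 further pairs per set differ in both dimensions and correspond to the filler trials.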

Participants had to rate the perceived similarity on a Likert scale from 1 (little similarity) to 7 (high similarity/identical) and were told to use the whole range of ratings over the whole experiment. The participants saw the first face for 2000 ms, then a pixelated face mask for 800 ms, and then the second face for another 2000 ms. Subsequently, the Likert scale appeared on the screen: here participants marked their rating by moving a slider via the arrow keys on the keyboard (**Figure 5**). The start position of the slider was randomized. There was no time restriction for entering the answer; however, participants were told to rate the similarity without deliberating for too long. After every 20 comparisons there was a self-paced pause.

The face and mask stimuli had a size of approximately 5.7° horizontal and 8.6° vertical visual angle. To prevent pixel matching, the faces were presented at different random positions on the screen within a viewing angle of about 7.6° horizontally and 10.5° vertically.

#### *Analysis*

For every participant we calculated the mean similarity ratings across all eight sets at each of the five levels of parametric similarity (100, 75, 50, 25, 0%). Example data of one German participant is given in **Figure 6**. The black triangles show the average rating of face pairs of all sets differing in features, sorted by the different parametrical similarities. The gray squares show the same for configural changes. As expected, Germans gave similarity ratings close to 7 (high similarity) for very similar faces.

A linear regression (*y* = β*x* + ε) was fitted to these mean similarity ratings (dotted black and gray lines in **Figure 6**). The steepness of the slopes (β) was then used as a measure of sensitivity: steeper slopes indicate more strongly perceived configural or featural changes. For every participant we calculated one regression slope for their featural and one for their configural ratings. The mean and the standard error of the sensitivity β per participant group are illustrated in **Figure 7A**.

To compare performance data, we took a closer look at the pattern of sensitivity to features and configuration: For each individual participant, we subtracted their configural sensitivity from their featural sensitivity. We refer to this difference as 'featural advantage'. The illustration in **Figure 7B** shows the mean of the calculated differences, i.e., the mean of the featural advantage for each group.
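The two measures defined above (regression slope β as sensitivity, and featural advantage as the difference of the two slopes) can be sketched as follows. The ratings are hypothetical, the predictor is assumed to be parametric similarity scaled to the range 0 to 1, and the fit is an ordinary least-squares line with an intercept; whether the original fit included an intercept is not stated, so this is an assumption.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (fit includes an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Parametric similarity levels (0%, 25%, 50%, 75%, 100%) scaled to 0..1.
similarity = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical mean ratings of one participant on the 1-7 scale.
featural_ratings = [2.0, 3.0, 4.0, 5.0, 6.0]
configural_ratings = [4.0, 4.5, 5.0, 5.5, 6.0]

beta_featural = ols_slope(similarity, featural_ratings)
beta_configural = ols_slope(similarity, configural_ratings)

# Featural advantage: featural sensitivity minus configural sensitivity.
featural_advantage = beta_featural - beta_configural

print(beta_featural, beta_configural, featural_advantage)  # 4.0 2.0 2.0
```

In this toy example the ratings drop faster with decreasing featural similarity than with decreasing configural similarity, so the featural slope is steeper and the featural advantage is positive.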

#### *Results*

A 2 × 3 ANOVA on the regression slopes β as a measure of sensitivity showed that the main effect of change type (configural, featural) was significant [*F*(1, 60) = 233.7, *p* < 0.001, η<sup>2</sup> = 0.46, η<sup>2</sup><sub>p</sub> = 0.796]. All participants showed a greater sensitivity to changes in features than to changes in configurations. The main effect of participant group (prosopagnosics, Koreans, Germans) was also significant [*F*(2, 60) = 6.46, *p* = 0.003, η<sup>2</sup> = 0.07, η<sup>2</sup><sub>p</sub> = 0.18]. The interaction between change type and participant group was significant, too [*F*(2, 60) = 5.48, *p* = 0.007, η<sup>2</sup> = 0.02, η<sup>2</sup><sub>p</sub> = 0.15].

across all face comparisons of all sets were calculated. The sensitivity ratings for changes in features (black triangles) and configuration (gray squares) are shown separately. The error bars depict standard error. A linear regression (*y* = β*x* + ε) was fitted to both curves individually (dotted black and dotted gray, respectively). The slopes (β) serve as measure of the sensitivity to features and configuration.

Analysis of simple effects for both change types (configural, featural) was carried out. The group difference in sensitivity to features approached significance [One-Way ANOVA *F*(2, 62) = 3.12, *p* = 0.0515, η<sup>2</sup><sub>p</sub> = 0.09], which was mainly driven by the difference between prosopagnosics and Germans (Tukey HSD *post-hoc* test, *p* = 0.051, both other differences *p* > 0.17). For configural changes there were significant group differences in sensitivity [One-Way ANOVA *F*(2, 62) = 9.11, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.23], with prosopagnosics performing significantly differently from Koreans and Germans (Tukey HSD *post-hoc* test, *p* = 0.001 and *p* = 0.003, respectively; Tukey HSD *post-hoc* test for Koreans vs. Germans, *p* = 0.91).

For analysis of the featural advantage (**Figure 7B**) we conducted a One-Way ANOVA to further examine the significant interaction between participant group and change type. The ANOVA showed significant differences between the three groups [*F*(2, 62) = 5.48, *p* = 0.007, η<sup>2</sup><sub>p</sub> = 0.15], which are the same values as for the interaction in the 2 × 3 ANOVA, as expected. The Tukey HSD *post-hoc* tests revealed a significant difference in the featural advantage between Koreans and prosopagnosics (*p* = 0.005), a difference approaching significance for the Koreans vs. the Germans (*p* = 0.091), and no difference for prosopagnosics vs. Germans (*p* = 0.51).
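As a reading aid, the partial eta squared values reported in this section can be recovered from the *F* values and degrees of freedom through the standard identity η<sup>2</sup><sub>p</sub> = (*F* · df<sub>effect</sub>) / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The sketch below is only a consistency check on the reported numbers; the function name is an illustrative assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values reported for the 2 x 3 ANOVA on the sensitivity slopes.
print(round(partial_eta_squared(233.7, 1, 60), 3))  # change type: 0.796
print(round(partial_eta_squared(6.46, 2, 60), 2))   # participant group: 0.18
print(round(partial_eta_squared(5.48, 2, 60), 2))   # interaction: 0.15

# Values reported for the One-Way ANOVAs (simple effects).
print(round(partial_eta_squared(3.12, 2, 62), 2))   # featural sensitivity: 0.09
print(round(partial_eta_squared(9.11, 2, 62), 2))   # configural sensitivity: 0.23
```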

#### *Discussion*

There is a clear difference in sensitivity to features and configuration of our face stimuli between Koreans and prosopagnosics: while both groups show about the same sensitivity to featural changes, we found that prosopagnosics have a significantly reduced sensitivity to configuration compared with Koreans (and Germans). In addition, the featural advantage was significantly smaller for Koreans than for the prosopagnosics. These differences in absolute sensitivity to configural and featural changes, and also the differences in featural advantage, suggest that Korean and prosopagnosic participants do not perceive our Caucasian face stimuli in the same way. Because CP and the ORE show parallels in disrupting featural and configural face processing, we hypothesized that the same mechanisms are disturbed in both cases. This would result in a similarly reduced sensitivity to features and configuration for participants affected by CP or the ORE. But as Korean and prosopagnosic participants show a different pattern of disturbance of their sensitivity, we can reject this hypothesis and conclude that different underlying mechanisms are affected.

Our similarity rating task also allowed us to obtain a more detailed picture of the sensitivities to featural and configural information in CP and the ORE. For the prosopagnosics compared with the Germans, the group difference approached significance for sensitivity to features (*p* = 0.051) and reached significance for sensitivity to configuration (**Figure 7A**). These results bridge the gap between two studies reporting conflicting results using the so-called "Jane" stimuli (Le Grand et al., 2006) and "Alfred" stimuli (Yovel and Kanwisher, 2004; Yovel and Duchaine, 2006), which, like our stimuli, also differ in features and configuration (and contour for the "Jane" stimuli). Only a minority of the prosopagnosic participants performed significantly worse than controls on the "Jane" stimuli differing in features and configuration (Le Grand et al., 2006). Based on the data by Le Grand and colleagues given in Table 4 of that study, comparing prosopagnosics and controls, one can estimate that there was a significant performance difference for the configural but not for the featural modifications. Yovel and colleagues also used the "Jane" stimuli with prosopagnosics and controls and confirmed the significant performance difference between groups for configural modifications and the non-significant difference for featural modifications (Yovel and Duchaine, 2006). However, they criticized the "Jane" stimuli for including obvious brightness differences (due to makeup) for the featural modifications. For their own "Alfred" stimuli they found significantly reduced sensitivity to featural and configural modifications for prosopagnosic participants (Yovel and Duchaine, 2006; Duchaine et al., 2007). In turn, their "Alfred" stimuli were criticized for configural modifications going beyond natural limits (as discussed in Maurer et al., 2007).

Our newly created stimulus set contains no extra-facial cues (no hair, makeup, glasses, or beard) and exhibits configural changes which have been tested to be within natural limits. With these well-controlled stimuli, our results suggest that for prosopagnosic participants, the retrieval of the configural information of a face is indeed impaired compared with the Germans. For the sensitivity to features, our results lie between the non-significant results obtained with the "Jane" stimuli and the significant results obtained with "Alfred" faces. Therefore, we conclude that the retrieval of featural information might be impaired for prosopagnosics, although to a lesser degree than the retrieval of configural information.

We found no significant difference in sensitivity to featural or configural information between the Korean and German groups. Our results are in accordance with a previous study, also using the "Jane" stimuli, that found no differences between Caucasian and Asian participants (Mondloch et al., 2010). In contrast, other studies found an own-race advantage for both configuration and feature changes (Rhodes et al., 2006; Hayward et al., 2008). However, we note that the stimuli used in those latter studies involved different kinds of changes than those used in our present study: features and configuration were changed by blurring and scrambling (Hayward et al., 2008), or features were changed through changes in color (Rhodes et al., 2006). This opens the possibility that the ORE impacts differently on the perception of these different kinds of stimulus modifications. Nevertheless, as our stimuli contain more natural and ecological modifications of faces, we believe that our results better reflect participants' face perception.

Even though we found no significant differences in sensitivity to featural or configural information between Germans and Koreans, we found that the featural advantage shows a trend to be larger for the Germans compared with the Koreans. Although this difference only approaches significance, we present two explanations for this pattern. The first explanation is that due to the ORE, the sensitivity pattern is altered for our Korean participants. The ORE could reflect Koreans' lower expertise with other-race facial features, whereas their configural processing stays unaffected when viewing other-race faces. The second explanation is that the effect is due to cultural differences. Studies have shown that Western Caucasian and Eastern Asian participants focus on different areas of faces and have dissimilar patterns of fixation when looking at faces (Blais et al., 2008). It might be that German and Korean participants employ different strategies when comparing faces in our task, which could have caused the effects we found. In accordance with this hypothesis, a study using Navon figures reported that Eastern Asian participants focus more on global configuration compared with Western Caucasian participants (McKone et al., 2010). By analogy, a greater focus on configurations in faces could explain the reduced featural advantage we observed in the Korean group.

Furthermore, our results show that all groups, regardless of their race and face recognition abilities, were more sensitive to differences in the featural than in the configural dimension of our stimulus set (**Figure 7A**). The presence of a featural advantage is in accordance with findings of previous studies using faces modified within natural limits in their configuration and features, where participants showed a higher sensitivity for featural changes as well (Freire et al., 2000; Goffaux et al., 2005; Maurer et al., 2007; Rotshtein et al., 2007). Even though for the "Alfred" stimuli similar sensitivities to featural and configural modifications were found by Yovel and Kanwisher (2004), their result should be regarded with caution in view of the unnatural configural modifications of their face stimuli (as discussed in Maurer et al., 2007). In contrast, we took care that our face stimuli were always natural looking, and pixelwise analyses of our stimuli, as described earlier, revealed no differences in induced image changes in the featural and configural dimensions. In other words, our stimuli exhibit the same pixelwise variation for the featural and configural changes. The fact that the observers nevertheless show a featural advantage suggests that humans are more sensitive to featural information, and/or perceive these changes to be more profound than changes in configuration. Another possible explanation is that it is more difficult to compare faces differing in configuration than to compare faces differing in features. Additionally, differences between two naturally occurring faces are more likely to be featural than configural. Therefore, the human face discrimination system might have developed to be better at detecting featural than configural differences between faces.

## **OBJECT RECOGNITION**

#### *Motivation*

In this test we measured the influence of expertise on recognition performance. To this end, we compared recognition performance for objects for which one group has expertise (Caucasian faces) to recognition performance for objects for which no group has expertise (seashells and blue objects).

#### *Stimulus creation*

Three categories of stimuli were used: computer renditions of natural objects (seashells), artificial novel objects (blue objects, dissimilar to any known shapes) and faces. See **Figure 8** for examples of these three categories of objects. All objects and faces were full 3D models, which allowed us to train and test participants on different viewpoints (see below). For each category we created four targets and twelve distractors.

Sixteen synthetic seashells were taken from a previously created stimulus set (Gaißert et al., 2010). The shells were created using a mathematical model (Fowler et al., 1992) implemented in the software ShellyLib (www.shelly.de). Attention was paid to sample stimuli spread evenly over the parametrically defined stimulus set space (see Gaißert et al., 2010 for details).

The blue objects were created with 3D Studio Max by Christoph D. Dahl (unpublished work) and were novel to all participants. Differences between these objects are less obvious for a human observer, making recognition more difficult.

For the face stimuli, 16 male Caucasian faces were selected from the MPI 3D face database (Troje and Bülthoff, 1996). The 16 faces were chosen to have as few salient distinctive features as possible (all were clean-shaven and had the same gaze direction, and none showed blemishes, moles, etc.).

None of the stimuli had been seen before by our participants. We created two sets of images for each stimulus category: frontal views for the learning phase, and stimuli rotated by 15 degrees to the right around the vertical axis (yaw) for the testing phase. The change between learning and testing was designed to prevent pixel matching of the stimuli.

All stimuli were shown at a viewing angle of approximately 9.5° horizontally and vertically.

#### *Task*

There was one block of trials per stimulus category, with the same procedure in all three blocks, as follows: During the learning phase, participants had to memorize four target exemplars depicted in frontal view. First, all four targets were shown together on the screen, then each of the four targets was shown one after the other, and finally all target exemplars were presented together again. Participants could control when to switch to the next screen via a button press. They were aware that if they switched to the next view they could not return to the previous one. No time restriction was applied. During testing, participants saw the images depicting the targets and distractors of the same category under a new orientation and performed an old-new-decision task by pressing buttons on a standard computer keyboard (old = left hand button press; new = right hand button press). Stimuli were presented for a duration of 2000 ms or until key press, whichever came first. The next image appeared as soon as an answer was entered.

Targets and distractors were presented in pseudo-randomized order: The testing was divided into three runs. Four targets and four distractors per category were shown in each run. While the targets were the same in each run, four new distractors were presented, such that all four targets were seen three times and each of the 12 distractors was seen only once. The order of the stimulus blocks (shells, faces, then blue objects) was fixed to induce similar effects of tiredness in all participants. Participants took short self-paced breaks between blocks.
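The run structure of the testing phase can be sketched as follows. The exact pseudo-randomization procedure is not described in the text, so the simple shuffling used here, the function name `build_test_schedule`, and the stimulus labels are illustrative assumptions.

```python
import random

def build_test_schedule(targets, distractors, n_runs=3, seed=None):
    """Build test runs: every run shows all targets plus a fresh set of distractors.

    Returns a list of runs; each run is a shuffled list of
    (stimulus, correct_response) tuples with responses "old" or "new".
    """
    rng = random.Random(seed)
    per_run = len(distractors) // n_runs
    shuffled = distractors[:]
    rng.shuffle(shuffled)

    schedule = []
    for run in range(n_runs):
        run_distractors = shuffled[run * per_run:(run + 1) * per_run]
        trials = [(t, "old") for t in targets]
        trials += [(d, "new") for d in run_distractors]
        rng.shuffle(trials)  # assumed: plain shuffling within a run
        schedule.append(trials)
    return schedule

# Hypothetical labels for one stimulus category (4 targets, 12 distractors).
targets = ["T1", "T2", "T3", "T4"]
distractors = [f"D{i}" for i in range(1, 13)]

schedule = build_test_schedule(targets, distractors, seed=1)
for run_number, trials in enumerate(schedule, start=1):
    print(run_number, trials)
```

With four targets and twelve distractors this yields three runs of eight trials, so each category contributes 12 target trials and 12 distractor trials in total.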

We kept the number of targets and distractors low, as performing tests with faces can be demotivating for prosopagnosics. We used the same number of stimuli in all stimulus categories to ensure comparability. The high similarity between the nonface objects was designed to avoid ceiling performance despite the low number of stimuli and to mimic the homogeneity of the face stimuli.

#### *Analysis*

The results were analyzed based on the dependent measure *d*′. The term *d*′ refers to a signal-detection theory measure (Macmillan and Creelman, 2005) and is an index of subjects' ability to discriminate between signal (target stimuli) and noise (distractors). The maximum possible *d*′ value in this experiment is 3.46 (this depends on the number of trials). A *d*′ of zero indicates chance discrimination performance; higher values indicate increasing ability to tell targets and distractors apart.
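A minimal sketch of the sensitivity index follows. The text does not state how perfect hit rates or zero false-alarm rates were handled; the sketch assumes the common correction of clamping rates to the interval [1/(2N), 1 − 1/(2N)], which reproduces the stated maximum of 3.46 for 12 target trials and 12 distractor trials per category. The function name `d_prime` is illustrative.

```python
from statistics import NormalDist

def d_prime(hits, n_targets, false_alarms, n_distractors):
    """Sensitivity index: z(hit rate) minus z(false-alarm rate).

    Rates of 0 or 1 are clamped to 1/(2N) and 1 - 1/(2N) so that the
    z-transform stays finite (an assumed correction, see lead-in).
    """
    z = NormalDist().inv_cdf

    def clamped_rate(count, n):
        return min(max(count / n, 1 / (2 * n)), 1 - 1 / (2 * n))

    hit_rate = clamped_rate(hits, n_targets)
    false_alarm_rate = clamped_rate(false_alarms, n_distractors)
    return z(hit_rate) - z(false_alarm_rate)

# Perfect performance: 12 of 12 targets recognized, no false alarms.
print(round(d_prime(12, 12, 0, 12), 2))  # 3.46

# Chance performance: hit rate equals false-alarm rate.
print(d_prime(6, 12, 6, 12))  # 0.0
```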

#### *Results*

For a summary analysis of the general influence of object category (faces, shells, blue objects) and participant group (prosopagnosics, Koreans, Germans) we ran a 3 × 3 ANOVA on the *d*′ values. The main effect of participant group was not significant [*F*(2, 60) = 1.22, *p* = 0.303, η<sup>2</sup> = 0.009, η<sup>2</sup><sub>p</sub> = 0.04], but the main effect of object category was [*F*(2, 60) = 145.54, *p* < 0.001, η<sup>2</sup> = 0.52, η<sup>2</sup><sub>p</sub> = 0.71], as was the interaction between participant group and object category [*F*(4, 120) = 7.14, *p* < 0.001, η<sup>2</sup> = 0.05, η<sup>2</sup><sub>p</sub> = 0.19]. **Figure 9** depicts the performance of all groups graphically. The Germans and the Koreans were best at recognizing faces, followed by shells, and worst at recognizing the blue objects. This order differs for the prosopagnosics, who were best at recognizing shells, followed by faces and then blue objects.

A One-Way ANOVA on the *d*′ values for each object category across participant groups revealed significant differences for the face stimuli: *F*(2, 62) = 8.14, *p* = 0.001, η<sup>2</sup><sub>p</sub> = 0.21. A *post-hoc* analysis showed that prosopagnosics' performance was significantly different from that of the other two groups (Games-Howell test, *p* ≤ 0.01 for prosopagnosics vs. Koreans and prosopagnosics vs. Germans). The other One-Way ANOVAs and *post-hoc* tests for shells and blue objects, respectively, were not significant (all *p*s > 0.2).

We also compared reaction times of Germans and prosopagnosics for the non-face object categories (shells, blue objects) with the Wilcoxon rank-sum test. We found no significant differences (*p* = 0.13 for shells, *p* = 0.31 for blue objects).

#### *Discussion*

As expected, no significant differences between groups were found for shells and blue objects. This can be explained by the fact that all participants were equally non-experts for these objects. Performance differed only for faces. We found that prosopagnosics, as non-experts for faces, performed less well on face recognition than the other two groups. Interestingly, the Koreans, also non-experts for our Caucasian stimuli, did not exhibit a lower recognition performance than Germans. An obvious reason for the absence of the ORE is the small number of targets to be memorized for this test. It is thus likely that the task was too easy for all non-prosopagnosic participants. For the prosopagnosics, our results show that the task is difficult even with this small number of target faces. This confirms the results we observed in the CFMT, namely that CP has a stronger impact on face recognition abilities compared with the ORE.

We compared recognition performance for faces not only with one type of object but with easy and difficult object categories, which reduces the risk of ceiling or floor effects. Germans and Koreans recognized the non-face objects less easily than the faces, probably because, even for Koreans, their expertise for faces is better than their expertise for the visually similar non-face objects. For prosopagnosics, accuracy for faces lay between their accuracy for the easy and the difficult object category. This indicates that the stimuli were not too easy to recognize.

Our findings confirm previous results indicating that, although some prosopagnosics might show object recognition deficits, those impairments are less severe than their face recognition deficits (Kress and Daum, 2003; Le Grand et al., 2006). A further aspect of object recognition expertise worth exploring is reaction time. Behrmann and colleagues found that the object recognition deficit of their five prosopagnosic participants did not show in accuracy, but in reaction time (Behrmann et al., 2005); and in a study by Duchaine and Nakayama (2005), many prosopagnosic participants exhibited longer reaction times rather than lower recognition accuracy compared with control participants: four of their seven prosopagnosic participants had a reaction time slower by more than 2 SD compared with the mean reaction time of their controls in most tasks. We did not find slower reaction times for non-face object recognition for prosopagnosics compared with Germans. These results thus exclude a general recognition deficit in our prosopagnosics.

## **CORRELATIONS BETWEEN TESTS**

Given that we ran several face processing experiments with different tasks testing for different aspects of recognition, we also examined the degree of correlation between test performances. For this we calculated Pearson's correlations between task performances across participants of all groups (**Table 2**).

Performances on all four face-related tasks [CFMT, sensitivity to features (Feat) and configuration (Conf), object recognition task with face stimuli (Faces)] were positively correlated, and these correlations were significant or approached significance. The effect sizes of these correlations (0.22 < *r* < 0.49) were medium and hence the proportions of shared variance (0.05 < *r*<sup>2</sup> < 0.24) were rather small. Thus, we assume that although different aspects of face perception are investigated by the tests (i.e., recognition performance, memory, and sensitivity to features and configuration), these aspects are nevertheless to some degree dependent on each other.
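The relation between a correlation coefficient and the proportion of shared variance can be made concrete with a short sketch. The scores below are hypothetical and only illustrate the computation; the function name `pearson_r` is an illustrative assumption.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of six participants on two tests.
cfmt_scores = [40, 52, 61, 47, 58, 66]
configural_sensitivity = [1.1, 1.9, 2.2, 1.2, 2.6, 2.4]

r = pearson_r(cfmt_scores, configural_sensitivity)
print(round(r, 2), round(r ** 2, 2))  # correlation and shared variance

# Shared variance implied by the bounds of the reported range of r.
for r_reported in (0.22, 0.49):
    print(r_reported, round(r_reported ** 2, 2))  # 0.05 and 0.24
```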

Surprisingly, there was another significant but negative correlation (with a rather small effect size): participants with a high sensitivity to the configuration of a face tended to perform poorly in the shell recognition task. The small proportion of shared variance of *r*<sup>2</sup> = 0.09 led us to refrain from any speculation.

## **GENERAL DISCUSSION**

The combination of tasks used in this study tested various aspects of face and object recognition, which allowed us to compare directly the influence of CP and the ORE. Our hypothesis, based on previous findings, was that in CP and the ORE the same underlying mechanisms might be affected. While we could disprove this hypothesis (this is discussed in detail below), we were able to confirm results of previous studies and, importantly, we gained new insights concerning the similarities between these two impairments of face recognition.

First, we were able to replicate the findings that congenital prosopagnosics exhibit face recognition deficits but no object recognition deficits (Le Grand et al., 2006). Second, we were able to replicate the ORE with our Koreans in the CFMT. Interestingly, our results differ somewhat from the results by McKone et al. (2012), who only found a trend toward a different performance between their Asian and Caucasian participants on the original CFMT. A possible explanation for this discrepancy is that their Asian participants may have had more experience with Caucasian faces because they were overseas students living in Australia at the time of testing. Our Asian participants were tested in Korea and thus were likely to have less experience with Caucasian faces. Third, our experiment testing sensitivity toward featural and configural changes within a face resolves discrepancies between studies testing sensitivity toward featural and configural facial information for prosopagnosics (Le Grand et al., 2006; Yovel and Duchaine, 2006). Our results, in the context of previous studies, show that, compared with German controls, prosopagnosics exhibit an impaired sensitivity toward configural information and possibly, though only to a lesser extent, toward featural information of a face.

Importantly, besides those confirmations of previous findings, we report the new finding that sensitivities to features and configuration of a face differ between Korean and prosopagnosic participants. For both groups, the observed sensitivity to the featural changes in a face was about the same. The Koreans, however, were better than prosopagnosics (and as good as Germans) at detecting fine changes in configural information in a face. When comparing CP with the ORE, we asked if they derive from a disturbance in the same underlying mechanisms. Our results indicate that this is not the case: in particular, the difference in absolute sensitivity to configural and featural changes for prosopagnosic and other-race observers is a strong indicator that CP and the ORE impair face recognition differently. As we used the same face stimuli to test all participant groups, our results indicate that a lack of expertise for a certain face group does not impact configural processing of those faces (Korean group), while CP does (prosopagnosic group). Even though we cannot explain what exactly causes this difference, these results clearly show that there are different mechanisms underlying both impairments. Therefore, we are not "prosopagnosic for other-race faces" (see also Wang et al., 2009).

**Table 2 | Pairwise correlations between test scores of all participants combined.**

*Depicted are the correlation coefficient, and in parentheses the p-value of the coefficient. Negative correlations are marked in red, significant correlations are written in bold letters. (CFMT, final score; Feat, sensitivity to featural changes in a face; Conf, sensitivity to configural changes in a face; Shells, Faces, Blue objects: d′ values for shells, faces and the blue objects in the object recognition task.)*

Our second main finding is that face recognition performance is more strongly affected by CP than by the ORE. Our prosopagnosics performed significantly worse than the Koreans in all face recognition tasks. A possible explanation is that generally an existing expertise for same-race faces can be used for recognition of untrained other-race faces, while no such expertise exists in CP (Carbon et al., 2007).

The findings of our test battery also have some further implications for the general understanding of face perception and face processing. First, we find that better configural sensitivity relates to better face recognition ability. Koreans and Germans performed significantly better in the general face recognition task Cambridge Face Memory Test, and at the same time showed a significantly higher sensitivity to configural changes in our second test than the prosopagnosics. This importance of configural processing for holistic processing has so far only been shown by disrupting configural information, e.g., by the inversion effect (Freire et al., 2000). Our finding is an important result that allows us to gain further insight into which aspect of face recognition relates to being a good face recognizer. When correlating performance in the CFMT with the sensitivity to configural changes across all participants, we obtained a significant but medium proportion of shared variance of *r*<sup>2</sup> = 0.24 (which is larger than the proportion of shared variance of *r*<sup>2</sup> = 0.09 of performance in the CFMT and sensitivity to featural changes). Until now, studies looking for processes related to face recognition performance mostly correlated it with holistic processing in general (e.g., performance in the composite face task or part-whole-face-task). Different proportions of shared variance were found: either zero (*r*<sup>2</sup> = 0.003, Konar et al., 2010), or medium (*r*<sup>2</sup> = 0.16, Richler et al., 2011), or similar to our value (*r*<sup>2</sup> = 0.21, DeGutis et al., 2013). The range of results in these studies might be explained by the different measures used for face recognition (CFMT vs. own identity recognition tasks), holistic processing (composite face task vs. part-whole-face-task) and different approaches to calculate the effect scores (subtraction scores vs. regression scores, and partial vs. complete composite face design). Whether general problems in processing faces result in an inability to see subtle differences in facial configuration, whether a reduced sensitivity to configuration results in impaired face recognition ability, or whether configural sensitivity and face recognition performance are impaired by disrupting a common underlying process remains an open question. This is a decade-old and as yet unanswered issue (Barton et al., 2003), which we cannot address using our current data. Nevertheless, our results strengthen the hypothesis that configural processing is linked to face recognition ability, but the proportions of shared variance are only low to medium, which shows that configural sensitivity and/or holistic processing cannot solely explain face processing abilities.
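For readers more used to correlation coefficients than to proportions of shared variance, the values quoted above can be converted by taking square roots. The conversion below is only an illustration under the assumption that each value is a simple squared Pearson correlation; it gives magnitudes only, because *r*<sup>2</sup> does not carry the sign of *r*.

$$
|r| = \sqrt{r^{2}}: \qquad
\sqrt{0.24} \approx 0.49, \quad
\sqrt{0.09} = 0.30, \quad
\sqrt{0.003} \approx 0.05, \quad
\sqrt{0.16} = 0.40, \quad
\sqrt{0.21} \approx 0.46
$$

On this reading, even the strongest relationship reported here (configural sensitivity and CFMT performance) corresponds to a correlation of roughly 0.5, which leaves about three quarters of the variance unexplained.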

The second implication of our findings for face processing stems from the fact that we find no difference in terms of sensitivity to facial features between Koreans and prosopagnosics. This suggests that this aspect is not crucial for determining face recognition abilities. This finding is supported by the low effect size found when correlating the sensitivity to featural changes with face recognition performance (tested using either the CFMT or the face recognition performance in the object recognition task): only a small portion of the variance of face recognition abilities is explained by the sensitivity to differences in features (*r*<sup>2</sup> = 0.09 and 0.11, respectively).

Overall, with our test battery we were able to replicate results of previous studies and provide new insights into the face processing disturbances caused by CP and the ORE. Thus, when a (Caucasian) prosopagnosic person tries to explain his or her condition to a (Korean) non-prosopagnosic person with the ORE ("They all look the same to you; everyone else does for me, too"), this is an inexact comparison. Although the perception of Caucasian faces by Korean and prosopagnosic observers differs, the analogy probably gives at least an idea of the problems congenital prosopagnosics (though to a stronger extent) have to face.

## **ACKNOWLEDGMENTS**

This research was supported by funding from the Max Planck Society, as well as from the world class university (WCU) program. We would like to thank all participants who participated in this study. Also the help of Prof. Dr. Ingo Kennerknecht in contacting the prosopagnosic participants is highly appreciated. We thank Bradley Duchaine and Ken Nakayama, Nina Gaißert, and Christoph Dahl for graciously giving us their stimulus material.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00759/abstract

#### **REFERENCES**


using inverted face stimuli and prosopagnosic participants. *Neuropsychologia* 44, 576–585. doi: 10.1016/j.neuropsychologia.2005.07.001


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 08 September 2014; published online: 29 September 2014.*

*Citation: Esins J, Schultz J, Wallraven C and Bülthoff I (2014) Do congenital prosopagnosia and the other-race effect affect the same face recognition mechanisms? Front. Hum. Neurosci. 8:759. doi: 10.3389/fnhum.2014.00759*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Esins, Schultz, Wallraven and Bülthoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The man who mistook his neuropsychologist for a popstar: when configural processing fails in acquired prosopagnosia

Ashok Jansari <sup>1</sup> \*, Scott Miller <sup>2</sup> , Laura Pearce<sup>2</sup> , Stephanie Cobb<sup>2</sup> , Noam Sagiv <sup>3</sup> , Adrian L. Williams <sup>3</sup> , Jeremy J. Tree<sup>4</sup> and J. Richard Hanley <sup>5</sup>

<sup>1</sup> Department of Psychology, Goldsmiths, University of London, London, UK, <sup>2</sup> School of Psychology, University of East London, London, UK, <sup>3</sup> Department of Life Sciences, Centre for Cognition and Neuroimaging, Brunel University, London, UK, <sup>4</sup> Department of Psychology, College of Health and Human Sciences, Swansea University, Swansea, UK, <sup>5</sup> Department of Psychology, University of Essex, Colchester, UK

We report the case of an individual with acquired prosopagnosia who experiences extreme difficulties in recognizing familiar faces in everyday life despite excellent object recognition skills. Formal testing indicates that he is also severely impaired at remembering pre-experimentally unfamiliar faces and that he takes an extremely long time to identify famous faces and to match unfamiliar faces. Nevertheless, he performs as accurately and quickly as controls at identifying inverted familiar and unfamiliar faces and can recognize famous faces from their external features. He also performs as accurately as controls at recognizing famous faces when fracturing conceals the configural information in the face. He shows evidence of impaired global processing but normal local processing of Navon figures. This case appears to reflect the clearest example yet of an acquired prosopagnosic patient whose familiar face recognition deficit is caused by a severe configural processing deficit in the absence of any problems in featural processing. These preserved featural skills together with apparently intact visual imagery for faces allow him to identify a surprisingly large number of famous faces when unlimited time is available. The theoretical implications of this pattern of performance for understanding the nature of acquired prosopagnosia are discussed.

Keywords: face-recognition, FRUs, Navon, mental-imagery, prosopagnosia, featural processing, holistic processing, configural processing

#### Edited by:

Mark A. Williams, Macquarie University, Australia

#### Reviewed by:

Beatrice De Gelder, Maastricht University, Belgium

Jeremy B. Wilmer, Wellesley College, USA

#### \*Correspondence:

Ashok Jansari, Department of Psychology, Goldsmiths, University of London, Lewisham Way, London, SE14 6NW, UK

a.jansari@gold.ac.uk

Received: 23 May 2014; Accepted: 19 June 2015; Published: 17 July 2015

#### Citation:

Jansari A, Miller S, Pearce L, Cobb S, Sagiv N, Williams AL, Tree JJ and Hanley JR (2015) The man who mistook his neuropsychologist for a popstar: when configural processing fails in acquired prosopagnosia. Front. Hum. Neurosci. 9:390. doi: 10.3389/fnhum.2015.00390

## Introduction

Several acquired prosopagnosic patients have been reported with severe difficulties in identifying faces despite being able to recognize other classes of objects (e.g., McNeil and Warrington, 1993; Riddoch et al., 2008; Rivest et al., 2009; Rossion et al., 2011). The existence of such cases can be used to suggest that a special system dedicated to faces that is not involved in object recognition has been damaged. However, because human faces share the same basic features, it could also be argued that faces are simply very "difficult objects" to recognize. Partial damage to the recognition system might affect faces, but not objects if faces require an additional level of visual processing relative to objects. This position is weakened, however, by the case of acquired object agnosic CK (Moscovitch et al., 1997) and the case of developmental agnosic AW (Germine et al., 2011) who are able to identify faces despite having significant difficulties in recognizing everyday objects.

Farah (1990, 1991, 2000) has claimed that recognition of objects and faces typically relies on two distinct forms of visual processing. In Farah's exposition, objects typically require decomposition into parts before they can be identified. For example, identifying a chair might involve recognizing that it has some legs, a flat surface on top of these legs and some sort of back section. The ability to interpret and encode individual parts will be referred to here as "featural processing." Conversely, Farah claimed that faces cannot be recognized by decomposition into parts and are therefore recognized almost exclusively using another system that sees the whole. The ability to combine individual parts into a whole has been given various names by different authors such as holistic, gestalt, or configural processing (e.g., Calis et al., 1984; Diamond and Carey, 1986; Young et al., 1987; Moscovitch et al., 1997; Barton, 2009; see Maurer et al., 2002 for a review). In this paper, following Maurer et al. (2002), we will use holistic to refer to the ability to "glue" individual elements of a face into a coherent whole and configural to refer to "first-order relations that define faces (i.e., two eyes above a nose and mouth)" (p. 255).

Can the dissociation between object agnosia and prosopagnosia be explained solely in terms of the distinction between featural and configural processing? There is strong evidence that the agnosic patient CK (Moscovitch et al., 1997; Moscovitch and Moscovitch, 2000) has preserved configural processing despite impaired featural processing. CK could recognize familiar faces whenever the configural information appeared to be accessible from the visual stimulus. He therefore performed well when faces were presented as cartoons, as caricatures, in disguise, and when a single internal feature had been removed. He could also recognize famous faces when all of the external features had been removed, and when they were vertically misaligned. Crucially, however, CK was severely impaired at recognizing inverted famous faces where the configural information is hard to extract. He also performed poorly on other facial recognition tasks in which the configural information was reduced or absent such as recognition of famous faces from their external features and recognition of horizontally misaligned famous faces. Although normal controls were inconvenienced by these manipulations, they all performed very much better than CK. Presumably this is because their object recognition system (unlike that of CK) is able to perform some degree of compensatory feature-based processing on a face when configural information cannot be accessed. On the basis of CK's preserved and impaired pattern of performance with faces, Moscovitch et al. (1997, p. 592) concluded that the ability to identify faces depends crucially on the "spatial relations of the internal features of a face (the eyes, the nose, and the mouth) to each other" and is quite separate from the ability to recognize objects.

Important evidence concerning the existence of a configural deficit in prosopagnosia has come from the study of patient PS who has no low level visual processing impairments. PS is able to distinguish Arcimboldo faces and Mooney faces (Rossion et al., 2011) from non-facial stimuli accurately and at normal speed. This finding suggests that she has preserved ability to process faces holistically. However, PS is severely impaired at matching upright unfamiliar faces (Busigny and Rossion, 2010), but performs as accurately and quickly as controls at matching inverted unfamiliar faces. This dissociation suggests that she can use featural but not configural information to distinguish one face from another. Further evidence of a configural deficit is that PS shows no evidence of perceiving facial features or facial composites more accurately when they appear in the context of a whole face (Ramon et al., 2010) than when they appear alone. This finding suggests that she processes individual facial features independently of the overall facial configuration. Another acquired prosopagnosic (GG) also showed no face inversion effect, no face composite effect and no part-whole advantage on unfamiliar face recognition tasks, consistent with impaired configural but preserved featural processing (Busigny et al., 2010). Barton et al. (2003) reported an acquired prosopagnosic who could detect facial feature changes but was relatively insensitive to manipulations that distorted overall facial geometry of unfamiliar faces. Similar results have also been reported by de Gelder and colleagues (e.g., Huis In 't Veld et al., 2012) with individuals who have the developmental variant of prosopagnosia.

Such studies provide convincing evidence that the problems that prosopagnosic patients such as PS and GG experience in processing unfamiliar faces are associated with a configural processing deficit. Nevertheless, there is less evidence that the core deficit in recognizing familiar faces in acquired prosopagnosia is caused by a configural processing impairment. It has never been demonstrated that prosopagnosic patients perform well at identifying familiar faces when featural rather than configural processing appears to be critical for recognition. For example, PS was impaired relative to controls at recognizing the inverted faces of the students that she taught (Busigny and Rossion, 2010). Rivest et al. (2009) reported similar findings in another individual with acquired prosopagnosia, who performed much worse than controls at identifying familiar faces even when they were inverted or fractured.

Such findings raise the possibility that the core deficit in recognizing familiar faces in at least some forms of prosopagnosia is at a deeper level than a purely configural processing deficit. For example, Burton et al. (1991) and Burton and Young (1999) argue that associative prosopagnosics have an impairment at the level of face recognition units (FRUs). More precisely, they claim that the appropriate FRU may be activated when a familiar face is seen, but the connections between the FRU and stored knowledge about the person are so weak that the face is not overtly recognized. On the assumption that the same FRUs are used to recognize familiar faces regardless of orientation, an impairment of this kind should affect recognition of familiar faces regardless of whether they are presented upright, inverted or fractured. The view that the familiar face identification impairment in prosopagnosia is caused by a configural processing deficit would therefore be bolstered if a patient can be found whose performance when recognizing familiar faces shows preserved featural but impaired configural processing. Despite performing badly on tests with unfamiliar faces that require configural processing, such a patient might perform well at familiar face processing tasks such as inverted or fractured face recognition. Below we report the case of an individual (DY) who appears to fit this profile. As we will demonstrate, however, his performance is different in some interesting respects from that typically found in acquired prosopagnosia.

A total of nine studies are presented, two demonstrating the specificity of DY's prosopagnosia, four manipulating levels of configural and featural processing of faces, one manipulating global vs. local processing of non-face stimuli and two investigating visual imagery for famous faces. Experiment 1 addresses DY's ability to make within-category discriminations for non-face visual objects, while Experiment 2 assesses his performance on a visual recognition task of objects and famous faces that have been matched for difficulty. Experiment 3 uses the classic face inversion study to demonstrate the impact of this paradigm on DY's processing of unfamiliar faces while Experiment 5 addresses the impact of inverting a small set of famous faces that he sometimes recognizes. Experiments 4 and 6 use variants of paradigms devised by Moscovitch et al. to pit featural and configural processing against one another using familiar faces. Experiment 7 uses a classic cognitive paradigm devised by Navon (1977) to investigate global and featural processing in non-facial stimuli. This experiment provides evidence that DY's configural deficit is of a general kind and is not confined to the processing of facial materials. Finally, Experiment 8 adapts the Young et al. (1994) approach to looking at mental imagery for faces in the patient using a forced-choice recognition task while Experiment 9 addresses this issue using a free recall paradigm.

## Case History

DY is a right-handed male sales executive born in 1946. After a routine eye check in 1999, a left homonymous hemianopia was revealed and a subsequent MRI scan identified a large arteriovenous malformation (AVM) located in the right posterior hemisphere. In 2000, DY was treated with embolization of the AVM which involves obstruction of the AVM blood vessels with a special glue. This was followed by gamma-knife surgery which captured 90% of the malformation. This is a procedure for treating tumors and AVMs using gamma radiation delivered to a precise location by concentrating multiple beams from weaker sources. In 2001, DY suffered a right occipital intracerebral hemorrhage resulting from bleeding from the AVM. **Figure 1** presents two T2-weighted fluid attenuated inversion recovery (FLAIR) images taken 10 years later in 2011 illustrating DY's lesion. The MR signal from the CSF is suppressed and results in the lesion being more prominent. DY's lesion appears to be confined to the posterior RH, but affecting parietal, temporal, and a large part of the occipital lobe. Regions affected include the precuneus, the cuneus, and lingual gyrus. Also affected are middle temporal and fusiform gyri in the temporal lobe, and the middle and inferior occipital gyri. The regions affected include Brodmann areas 7, 17–19, and 37.

DY recovered in hospital and reported significant cognitive difficulties, mostly memory problems and disorientation. DY felt very distressed because he did not recognize his wife or grandchildren when they came to visit him in hospital. After leaving hospital, DY reported several cognitive difficulties. Initially he experienced confusion in certain situations such as looking at items in the fridge, and he found looking at the shelves in supermarkets unbearable. DY described the experiences, in his own words as "dyslexia of the eyes." Most of these initial difficulties resolved, but his ability to recognize faces has never returned to normal. DY recalls an incident when shopping with his wife, where they separated. Later, upon seeing him again in the street, his wife walked toward him and waited to be acknowledged; he did not recognize her and walked straight past. He has now adopted techniques such as remembering his wife's clothes if they go out so that he can tell who she is when in crowds. DY reports that he is often able to perform relatively well in everyday life with his impairment because if he is expecting to see somebody in a location he is more likely to recognize them successfully by using memory of voices or clothing. In fact, using his very strong visual memory skills, he is able to disguise an extremely profound impairment.

During preliminary testing, analysis of DY's verbal protocol while naming faces suggested that the majority of his correct recognitions were based on individual features found in faces rather than a simple recognition of the face. During this phase of testing, he named a picture of his neuropsychologist (and first author, AJ) as the popstar George Michael. Whilst incorrect, this showed that DY was using AJ's goatee beard, slightly darker skin and gold earring to arrive at the name of George Michael. Now that DY associates the goatee beard with AJ, he has commented a number of times that if AJ shaves off the goatee beard, he will no longer be able to "recognize" him.

DY currently reports no noticeable difficulties with recognizing objects but does have difficulty in finding his way, and often has to ask for directions. There were no indications of difficulties with reading or with color recognition. There was no evidence of a loss of long term memory. The studies reported were conducted over a period of 5 years starting when DY was 60 years of age and in generally good health.

## Participants

DY's performance was assessed relative to either published norms or against a group of age and WTAR-IQ (Wechsler, 2001) matched healthy male control participants. Due to the long timespan of the studies reported, different sets of controls were used for each study. Crawford and Garthwaite's (2002) method for comparing a single case with a group of control subjects was used for statistical comparison of DY's performance against that of the controls.
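The source does not spell out the test statistic. As a sketch, and assuming the comparison takes the form of the modified *t*-test for a single case described by Crawford and Howell (1998), on which the cited method is understood to build, a single score $x^{*}$ is compared with a control sample of size $n$, mean $\bar{x}$ and standard deviation $s$ as follows (all symbols are introduced here for illustration):

$$
t_{n-1} = \frac{x^{*} - \bar{x}}{s\,\sqrt{\dfrac{n+1}{n}}}
$$

The statistic is referred to a *t* distribution with $n - 1$ degrees of freedom. The factor $\sqrt{(n+1)/n}$ treats the control mean and standard deviation as sample estimates rather than population values, which matters most when the control group is small.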

## Ethical Approval

All the studies described received approval from the University of East London's Ethics Committee.

FIGURE 1 | Two axial T2 FLAIR MR images showing the location of DY's lesion (1 × 1 mm in-plane resolution, 3 mm slice thickness; slice locations relative to the nasion are as follows: (A) −35 mm, (B) −26 mm). The lesion is confined mainly to the posterior regions of the right hemisphere (predominantly occipital, but also including the precuneus), and extending ventrally into temporal lobe regions.

## Background Testing/Standard Cognitive Functioning

## Visuo-Spatial Processing

## Visual Object and Space Perception Battery (VOSP; Warrington and James, 1991)

DY was normal on five out of the eight subtests of the VOSP (**Table 1**) and his main impairments were on the object identification tasks of "silhouettes" and "progressive silhouettes." In these tests, only the outline forms of the objects are visible. Poor performance on these tests suggests that DY has difficulty recognizing the outlines of shapes, termed global forms. In object decision tasks when the parts of the objects are available, performance is at a normal level.

## Birmingham Object Recognition Battery (BORB; Riddoch and Humphreys, 1993)

On this extensive test of visuo-spatial abilities which addresses different levels of visual processing, DY's performance was within the normal range on all tests apart from overlapping figures and matching horizontal lines (**Table 1**). He was able to recognize individual letters, geometric shapes, and line drawings with no difficulty; however, when these were overlapped with each other, DY's performance time in correctly naming the figures fell outside the normal range.

## Facial Expressions of Emotions: Stimuli and Tests (FEEST; Young et al., 2002)

In this test of ability to process emotions from faces, a series of faces showing six standard emotions are presented on a computer screen with six verbal labels corresponding to each emotion. Apart from particular problems in identifying the "anger" emotion, DY's performance was within one standard deviation of the control mean and was often superior.

## Recognizing Mooney faces (Busigny et al., 2010)

Mooney faces are two-tone black and white pictures of faces that do not contain clear facial features. It is difficult to see them as faces when they are presented as inverted. This finding suggests that holistic processing is required in order to identify Mooney faces accurately.

The procedure developed by Busigny et al. (2010) was used with DY. Busigny et al.'s procedure involved presentation of 80 black and white Mooney faces selected from an original set created by Schurger and colleagues (Art of Science Competition, Princeton University, http://www.princeton.edu/artofscience/gallery). The 80 stimuli were presented both upright and upside-down randomly in two blocks of 80 trials. Each stimulus was presented on a gray background and the participant had to decide whether or not they saw a face by pressing one of two keys on the computer keyboard; they were informed that they should only use the "face" response if they saw a face upright and that anything else should be categorized as a non-face. Participants were instructed to respond as accurately and as quickly as possible. Following their response, a central fixation cross was presented for 300 ms and then a gray screen for 300 ms before the next stimulus.

The results showed that DY was correct on 127/160 trials, which was significantly different from the performance of his matched controls (M = 146.4, SD = 7.2), t(7) = 2.54, p = 0.019. DY's average response time per trial was 1317 ms, which was also significantly different from that of the controls (M = 940 ms; SD = 117), t(7) = 3.04, p = 0.009. It therefore appears that DY has an impairment in holistic processing of unfamiliar faces.
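The statistics above can be checked by hand. The sketch below assumes the single-case comparison is a modified *t*-test of the Crawford and Howell (1998) form, with the control standard deviation inflated by the square root of (n + 1)/n and n − 1 degrees of freedom, and that the reported p values are one-tailed; neither assumption is stated in the source. The function names are hypothetical, and the tail probability is obtained by simple numerical integration rather than a statistics library.

```python
import math


def crawford_howell_t(case_score, control_mean, control_sd, n_controls):
    """Modified t statistic for one case versus a small control sample.

    Assumed form: t = (case - mean) / (sd * sqrt((n + 1) / n)), df = n - 1.
    """
    standard_error = control_sd * math.sqrt((n_controls + 1) / n_controls)
    t_value = (case_score - control_mean) / standard_error
    return t_value, n_controls - 1


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t_value, df, steps=2000):
    """P(T >= |t|), using Simpson's rule for the area between 0 and |t|."""
    upper = abs(t_value)
    if upper == 0:
        return 0.5
    if steps % 2:  # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Values taken from the Mooney face results (eight controls, so df = 7).
t_acc, df = crawford_howell_t(127, 146.4, 7.2, 8)   # accuracy out of 160
t_rt, _ = crawford_howell_t(1317, 940, 117, 8)      # mean response time in ms
print(f"accuracy: t({df}) = {t_acc:.2f}, one-tailed p = {one_tailed_p(t_acc, df):.3f}")
print(f"latency:  t({df}) = {t_rt:.2f}, one-tailed p = {one_tailed_p(t_rt, df):.3f}")
```

Under these assumptions the absolute *t* values come out close to the reported 2.54 and 3.04. The sign simply reflects the direction of the difference: DY was less accurate and slower than the controls.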

TABLE 1 | Breakdown of DY's performance on VOSP and BORB subtests.


## Benton Face-Matching Test (Benton and van Allen, 1968)

On the Benton Face-Matching Test, DY's score of 41 placed him just within normal limits. At face value, this result could be interpreted as normal face perception. Indeed, De Renzi and Pellegrino's (1998) case Anna, similar to DY, did not exhibit object agnosia and showed poor performance on a range of face-perception tasks. De Renzi et al. (1991) took Anna's normal performance on the Benton test as implying intact face perception. However, Duchaine and Nakayama (2004) have shown that this test has poor specificity for picking up face recognition difficulties. Further, Farah (1991) rightly cautions against using just accuracy for interpreting performance on this task since this can mask an abnormal strategy (an issue that is very relevant to DY's performance in Experiment 6 of the current study; see later). As pointed out by Newcombe (1979, p. 319) "Some prosopagnosic patients are reported to match faces normally... Latencies, however, are not invariably measured." To address this issue, DY's performance on the test was timed and it was found that he took 12 min to complete the task. This is an extremely slow time and DY (who tends to verbalize his thoughts when performing such tasks) laboriously compared different features to arrive at his seemingly "normal" accuracy score. The conclusion is therefore that DY's overall processing of the unfamiliar faces on this task is abnormal. This suggests that he finds it difficult to use configural information to distinguish one unfamiliar face from another.

## Experiment 1: Within-category Naming

In order to evaluate the specificity of DY's visual recognition abilities, his within-category naming was assessed.

## Stimuli and Procedure

Participants were shown a series of 20 images of familiar objects in each of four categories (national flags, types of car, famous buildings, and football shirts) and were asked to name each exemplar. All exemplars of flags, with a few exceptions, conform to the same rectangular shape and therefore the only way to name the country is by processing the specific information within the flag. Similarly, cars tend to conform to a prototypical shape but differ along dimensions such as relative sizes of different parts, insignias, etc. The exemplars of buildings were chosen so that there were visually similar exemplars such as famous bridges. Finally, given DY's interest in football, shirts belonging to teams in the English Premier League were used; as in the case of flags, all football shirts have the same shape and so identifying the particular football club requires analysis of each exemplar's colors and insignias.

## Participants

DY's responses were compared to those of eight healthy male controls matched for age (range 55–65 years, mean 60.9 years) and education.

## Results and Discussion

**Figure 2** shows DY's performance compared to that of the controls and shows that he was within one standard deviation or less of the control mean and therefore within normal limits (all p > 0.05). This finding demonstrates that DY does not show a within-category recognition deficit, implying that his naming difficulties are restricted to faces.

## Experiment 2: Familiar Object and Face Recognition

One criticism that can be leveled against using intact object recognition in the context of face recognition difficulties to suggest a specific impairment in the latter ability is that normal controls are likely to perform near ceiling levels on both recognition tasks involving faces and everyday objects (Farah, 1994). Judging a neuropsychological patient's performance as 'intact' relative to such ceiling effects therefore can be questionable. To overcome this potential criticism, DY was administered a naming test in which the difficulty of faces and objects had been titrated to be of equal difficulty.

## Stimuli and Procedure

The Essex-Exeter Matched Difficulty Object and Faces test (Lyons et al., 2002) has been specifically created to include sets of objects and faces that have been matched for naming difficulty such that normal performance on neither test is at ceiling levels. The test consists of four subsections, two for faces and two for objects. Each of the subsections contains 31 items, resulting in a total of 62 items being presented for each category. Stimuli are presented on a computer screen for an unlimited time with the participant having to provide the name or sufficient semantic information to demonstrate recognition. DY's responses were compared to the mean and standard deviation of the 50 participants in the original Lyons et al. (2002) paper.

## Results and Discussion

DY named 35/62 of the objects (M = 41.2, SD = 6.5), showing that even when items have quite specific names (e.g., puffin) he performs within the control range, t(49) = 0.94, p > 0.05. His responses were both accurate and fast. DY achieved a score of 33/62 (M = 42.2, SD = 12.6) for face naming, which is also within the normal range, t(49) = 0.72, p > 0.05. This performance seems paradoxical for an individual who claims not to be able to recognize his wife and other close family members. However, as with his performance on the Benton test, it is important to take account of the method that he used to achieve such a level of performance. Unlike his rapid responses to objects, DY's responses to faces were slow and faltering. Analysis of his verbal protocol while naming faces suggested that the majority of his correct responses were based on individual features found in faces rather than a "normal" recognition of the face<sup>1</sup>. For example, when shown an iconic picture of Marilyn Monroe, DY took 7 s to arrive at a name and then said that it was a guess based on her beauty spot and the shape of her lips! Also, rather than stating who the person was, he asked the question "Is that Marilyn Monroe?" In sum, although DY is sometimes able to recognize faces, his method of doing so is far from normal and we believe that the apparently normal accuracy score for naming faces masks a profound face recognition difficulty. In Experiment 4, we will formally demonstrate this impairment by measuring RT as well as accuracy when investigating DY's ability to recognize famous faces. Experiment 3 examines learning of unfamiliar faces because featural cues to identity are much less likely to be available to DY with unfamiliar than with famous faces.
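The single-case comparisons reported above can be illustrated with a modified t-test of the kind described by Crawford and Howell (1998), in which the patient is treated as a sample of one and the control standard deviation is inflated to allow for the finite control sample. The text does not name the test used for these comparisons, so this choice is an assumption, and the function name `single_case_t` is purely illustrative. The following Python sketch computes only the t statistic and its degrees of freedom from the scores given in the text.

```python
import math


def single_case_t(case_score, control_mean, control_sd, n_controls):
    """Modified t statistic comparing one case with a control sample.

    Assumed formula (Crawford and Howell, 1998):
        t = (x - M) / (SD * sqrt((n + 1) / n)), with df = n - 1
    """
    # Inflate the control SD because the control mean and SD are
    # sample estimates rather than population parameters.
    standard_error = control_sd * math.sqrt((n_controls + 1) / n_controls)
    t_value = (case_score - control_mean) / standard_error
    degrees_of_freedom = n_controls - 1
    return t_value, degrees_of_freedom


# Scores reported in the text for the object and face naming subtests.
t_objects, df_objects = single_case_t(35, 41.2, 6.5, 50)
t_faces, df_faces = single_case_t(33, 42.2, 12.6, 50)

print(f"objects: t({df_objects}) = {t_objects:.2f}")
print(f"faces:   t({df_faces}) = {t_faces:.2f}")
```

The sign is negative because DY scored below the control mean in both cases; the absolute values agree with the reported t(49) = 0.94 and t(49) = 0.72.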

## Experiment 3: Cambridge Face Memory Test (Duchaine and Nakayama, 2006)

The Duchaine and Nakayama (2006) Cambridge Face Memory Test provides an opportunity to investigate whether DY is significantly impaired at learning new faces. It is designed to explore recognition memory for unfamiliar faces in both upright and inverted conditions. The standard finding from normal controls is superior memory for faces when seen upright compared to when seen inverted. This "face inversion effect" has been used as a hallmark indication of the special nature of face processing as under normal circumstances, faces are processed as a configural whole. However, when faces are inverted, this dedicated form of processing is disrupted, increasing the reliance on featural processing. If DY's normal accuracy for familiar faces in Experiment 2 is associated with excellent featural processing and impaired configural processing, it would follow that DY should show a greatly reduced face inversion effect relative to controls.

## Stimuli and Procedure

Duchaine and Nakayama's standard procedure was employed. Briefly, participants are presented with black and white images of unfamiliar faces to memorize. Immediately after a set of learning trials for each face or set of faces, the participant is asked to select the target from among an array that includes two distractors. There are three stages increasing in difficulty with a different number of stimuli for each section: Introductory (N = 18), Novel (N = 30), Novel + Noise (N = 24). The test was completed by DY and each normal control participant in an upright condition followed by the inverted condition. (It should be noted that while inverting a face could involve disruption of configural processing, the CFMT also introduces an additional memory component because there is a delay between initial learning of the to-be-recognized upright face and the test trials with inverted faces. Therefore, we acknowledge that there is a contamination of a pure inversion effect as measured by the CFMT. However, since many other research groups have used this measure, we do so while acknowledging this caveat.)

## Participants

DY's performance was compared to that of 10 normal controls matched for age and WTAR IQ. The controls had a mean age of 59.2 (range 51–67) and mean IQ of 104.7 (range 92–115).

## Results

**Table 2** presents the performance of DY and the NCs as a function of condition, broken down by sub-category within condition. As expected, collapsing across the different conditions, the NCs show a superiority for recognizing faces upright compared to inverted, t(9) = 6.47, p < 0.001. Conversely, DY performs at least as well on the inverted faces as on the upright faces. Overall, DY was significantly impaired relative to controls in the upright condition, t(9) = 3.18, p = 0.01, but within normal limits for the inverted condition, t(9) = 0.74, p = 0.48. Looking more closely at the sub-categories, there was a significant difference between DY and the normal controls in the upright introductory, t(9) = 11.32, p < 0.001 and novel sub-categories, t(9) = 2.49, p < 0.05. Contrasting with the upright condition, DY was always within normal limits in the inverted condition (all p > 0.05). There was no significant difference between DY and the controls in the most difficult sub-category of both conditions but as can be seen, his performance in both cases was below chance. Only in the condition where noise is added (a manipulation that disrupts local/feature processing more than global processing) did DY show any evidence of poor performance.

<sup>1</sup>Two examples of DY's face recognition problems are presented in the Supplementary Material. DY was asked if he knew the name of the celebrity and if he provided a name, to give a rating from 1 to 10 of his confidence.

#### TABLE 2 | Performance on Cambridge Face Memory Test (standard deviations in parentheses).

Directly comparing the difference between the upright and inverted conditions, using Crawford and Garthwaite's (2005) Revised Standardized Difference Test (RSDT), it was found that DY was significantly different to NCs, t(9) = 10.45, p = 0.00004.

## Discussion

DY's poor performance in the upright condition clearly reveals a significant impairment in learning new upright faces. Interestingly, DY showed no significant impairment relative to controls in the inverted faces condition, consistent with the view that his featural processing of faces is normal. The results strongly suggest that he is relying on featural rather than configural information to identify previously unfamiliar faces. As expected, the normal controls display the upright superiority effect, achieving higher scores in the upright condition than in the inverted condition. However, DY did not show the upright superiority effect and in fact performed slightly better in the inverted condition than the upright condition (**Table 2**). The finding that prosopagnosics perform at least as well on inverted as upright faces has been termed the "inverted inversion effect" (Farah et al., 1995) and is even found in some cases of developmental prosopagnosia (Duchaine et al., 2006; Le Grand et al., 2006; Bate et al., 2008).

The absence of an inversion effect for DY must be treated with some caution, however. First, the performances of DY and controls were near floor in some of the inverted conditions. Second, because of the structure of the CFMT, it is not possible to counterbalance half sets of the upright and inverted conditions. Since our primary aim was to objectively demonstrate DY's difficulty in remembering pre-experimentally unfamiliar faces, we conducted the upright condition first and followed this with the inverted condition in order to test for the inversion effect. So one explanation for the lack of inversion effect in DY is that it may have come about because the inverted faces were presented after the faces had already been presented in the upright condition.

## Experiment 4: "Face-fracturing" Test

In this experiment, we investigated the time that it takes for DY to recognize a famous face. It seems highly likely that his recognition strategy will lead to extremely long RTs even if it sometimes produces accurate performance. We tried to ensure fairly accurate performance by using a set of faces that DY was able to identify consistently. The stimuli were generated by asking DY's wife to provide a list of names of famous people who, she felt, DY recognized on a consistent basis when they appeared on TV or in the newspapers. It was stressed that this recognition should be based on visual attributes rather than their names, voices or any semantic information. Using a variety of sources, 25 easily recognizable photos were compiled for the set of stimuli. The critical dependent variable was the speed with which these faces could be identified by DY.

A second goal of the experiment was to investigate DY's ability to recognize fractured faces. Moscovitch et al. (1997) showed that their object agnosic patient CK had impaired recognition of faces that were created by taking intact photographs and cutting them into five or six parts. Individual features (eyes, noses, etc.) were kept intact and the first-order relations between the features were kept intact (e.g., the eyes were kept above the nose which was kept above the mouth, etc.). Moscovitch et al. found that whilst CK's performance was completely normal in the intact condition, his performance fell six standard deviations below the mean of the controls when the same faces were "fractured." Since exactly the same visual information was available in both conditions, these data strongly suggest that the manipulation of isolating features spatially by destroying the gestalt impaired CK's recognition ability. In Experiment 4, Moscovitch et al.'s paradigm was adopted for use with DY.

## Stimuli

Photos that DY's wife thought he would recognize were compiled for the 'intact' set of stimuli. It was stipulated that all the faces had to be of individuals who had come to public prominence before 1999 when DY was first diagnosed with brain damage. Then using image-manipulation software (Corel Draw), each of these color photos was digitally cut using the criteria suggested by Moscovitch et al. (1997). **Figure 3** gives an example of the face of Bob Geldof in the two conditions. In total 25 faces were used.

## Participants

DY's performance was compared to a group of six male control participants who also participated in Experiment 1.

## Procedure

Each list of intact and fractured faces was divided into two equal sets to allow counter-balancing of conditions. Half of the intact set and half of the fractured set were used on Day 1 of testing and then the remainder were used on Day 2 of testing which took place a week later. Stimuli were presented individually on a laptop using E-prime software and the participant was asked to name as quickly as possible the individual in the display. This allowed accuracy and response times for correctly named stimuli to be measured.

## Results

In terms of naming accuracy, controls named a mean of 24/25 of the intact faces (SD = 0.63) and 23.17/25 of the fractured faces (SD = 1.17). DY performed very similarly to the controls, naming 23 faces in the intact condition, t(5) = 1.46, p > 0.05, and 22 in the fractured condition, t(5) = 0.93, p > 0.05. DY's accurate performance for intact faces is expected given that the stimulus set was created by asking his wife for faces that he consistently recognizes; however, it is striking that fracturing has no significant impact on his overall accuracy. By contrast, CK (Moscovitch et al., 1997) was severely impaired by a fracturing manipulation.

To investigate performance further, the average times for correct responses were compared (see **Figure 4**). As can be seen, the normal pattern of performance is that the fractured condition takes longer, almost double that of the intact condition. However, DY shows the opposite pattern, with his average time in the fractured condition being only 2 s slower than that of the controls whereas his average time in the intact condition was 10 s slower. Directly comparing the difference between RTs in the intact and fractured conditions (Crawford and Garthwaite, 2005) revealed that DY was significantly different from NCs [t(5) = 10.66, p = 0.00013].

## Discussion

The results demonstrated that DY takes a relatively long time to identify familiar faces despite his accurate performance. They also revealed that face fracturing has no impact on his familiar face identification accuracy. Furthermore, the time that DY took to arrive at an answer was in fact faster in the fractured condition, and, anecdotally, he reported that he found this condition easier. Overall, DY's performance implies reliance on featural processing irrespective of whether the face is presented intact or fractured. It may well be that he performs more quickly in the fractured face condition because his impaired configural processing skills interfere with face recognition in the standard condition (cf. Farah et al., 1995; Boutsen and Humphreys, 2002). The finding that his performance was faster in the fractured condition is consistent at some levels with the evidence of inversion superiority in Experiment 3. As suggested by Farah et al. (1995, p. 2093), this "concept of dominance by a specialized but impaired brain system" has been invoked to explain the discrepancy found in other areas of neuropsychology such as linguistic performance following left-hemisphere brain damage. There may be no interference in the fractured condition because such stimuli do not activate DY's impaired configural processing system.

## Experiment 5: Inverted Famous Faces

The results of Experiments 3 and 4 suggest that DY's problems in identifying faces are associated with a deficit in configural processing despite normal featural processing. Experiment 5 investigated his ability to identify inverted pre-morbidly familiar faces. If he uses featural rather than configural information, it would be predicted that he would show no effect of inversion on accuracy of naming or on the time necessary to recognize a face as being familiar.

## Stimuli and Procedure

In the first phase of this experiment, the faces from Experiment 4 were used again. They were inverted and presented on a laptop computer. There was no time limit. Following Moscovitch and Moscovitch (2000), an answer was deemed correct if the name was provided or if sufficient semantic information to demonstrate recognition was produced. At least a week separated the presentation of the upright and inverted faces. In the second phase of the experiment, which took place several months later, new inverted pictures of the 25 faces used in Experiment 4 were presented. We used new pictures of the celebrities to avoid any possible priming from having seen the images used in Experiment 4. Participants had to respond with a key press as to whether or not they recognized the inverted face as familiar, a procedure that allowed RT for recognition to be measured.

## Participants

DY's performance was compared to that of nine normal controls matched for age and WTAR IQ. The controls had a mean age of 63.1 years (range 59–69) and mean IQ of 112 (range 90–117).

## Results

**Table 3** shows performance of DY and the matched controls for recognition in the inverted and upright conditions. DY's Phase 1 upright accuracy scores come from Experiment 4. There was no significant difference between DY and the control participants in the inverted condition, t(8) = 0.76, p > 0.05, with his performance falling within one standard deviation of the control mean. The accuracy with which inverted faces were recognized in Phase 2 was also within the normal range, as was the length of time required to make these identification decisions.

## Discussion

When faces that DY can recognize are inverted, his recognition is within normal limits in terms of both speed and accuracy. It is also interesting to note that, like the normal controls, DY's performance was much better in the upright than in the inverted condition. A strong version of a theory that suggests that DY only has access to featural processing might predict that his performance should be the same upright and inverted since he would be basing his recognition on a simple featural match. One possibility, therefore, is that the recognition of individual facial features is, to at least some extent, orientation specific (Moscovitch and Moscovitch, 2000). If so, inversion will make not only configural processing but also featural processing of faces more difficult. Consistent with such an account, a number of experiments have shown that inversion disrupts face feature perception in matching tasks (e.g., Yovel and Kanwisher, 2004; Yovel and Duchaine, 2006). If it is assumed that face fracturing does not interfere with featural processing to the same degree as inversion, then this would explain why performance was much better for fractured faces in Experiment 4 than for inverted faces in the current experiment.

## Experiment 6: External Features

Moscovitch and Moscovitch (2000) argued that if their patient CK's object agnosia was driven by reliance on a face-processing system that is based on configural processing as a result of damage to the part-based system, he would suffer if the main configural information were removed from a face. To explore this, they created stimuli in which the main configuration of eyes, nose and mouth was cut out and replaced by a white space. They found that whereas healthy controls were somewhat impaired by this manipulation (with recognition dropping to 63.8% of that with the faces whole), CK was grossly impaired, with his performance dropping to 33.3%. Experiment 6 was conducted to investigate DY's recognition of familiar faces using only external features.

## Stimuli and Procedure

A corpus of 42 famous faces was assembled. Some of these were of faces that DY is known to recognize and had been used in Experiments 4 and 5; however, care was taken to make sure that the same photograph was not used for the current study and instead new photographs were found. The remaining faces were of famous individuals who are often recognized from their very particular hairstyles or other features outside the face. Following Moscovitch and Moscovitch's (2000) procedure, a line was drawn just above the eyes and the edges of this were joined to points just either side of the mouth and finally these two points were brought together just underneath the mouth. The space created by these five lines was filled in with white space (see **Figure 5**). All participants were presented with the stimuli in the same order on a laptop computer. There was no time limit and, again, an answer was deemed correct if the name was provided or sufficient semantic information to demonstrate recognition was produced.

## Participants

DY's performance was compared to that of the nine controls who took part in Experiment 5.

## Results

DY identified 31/42 of the faces correctly compared to the mean of the controls which was 32.1 (SD = 6.37). There was no significant difference between these scores, t(8) = 0.17, n.s.

## Discussion

DY's unimpaired performance on this test shows that he is able to recognize familiar faces from their external features. This finding represents a dissociation with patient CK (Moscovitch and Moscovitch, 2000) whose performance on this task was severely impaired. It provides further evidence that DY's familiar face recognition impairment is characterized by normal featural processing but impaired configural processing.

## Experiment 7: Navon Figures

In a landmark study, Navon (1977) investigated the relationship between processing at the "global" level looking at the whole, and processing at the more local level looking at the specific elements of this whole. Using arrays of stimuli where the target (e.g., a large H) was made up of many constituent elements (e.g., small squares or other letters) he found that the "global pattern is apprehended but not its components. All but three subjects did not even notice that the stimuli were made of small letters" (Navon, 1977, p. 368). In his third experiment, he looked at the effect of directing attention either to the global figure (e.g., the large H) or its constituent elements.

Inferences were made from differences in response times when the letters were conflicting (e.g., large H composed of small Ss) and when the letters were consistent (e.g., large H composed of small Hs). Navon found that participants were quicker to recognize global letters than constituent local elements. More importantly, they were also significantly impaired in recognizing local letters when they conflicted with the global letter (e.g., a large S composed of smaller Hs), but not in recognizing global letters when the images conflicted. Navon proposed that global configural aspects of an image are perceived before the local parts. This finding, which has been replicated many times (see Kimchi, 1992 for a review), has been termed the "global precedence hypothesis." Darling et al. (2009) found that normal participants who were the most susceptible to global interference when recognizing local letters on the Navon task performed better on a test of unfamiliar face identification. Martin and Macrae (2010) reported that individuals who show weak global interference show a reduced face inversion effect on a test of face recognition. There is therefore evidence that global processing in a Navon-style paradigm corresponds with configural processing and local processing corresponds with featural processing. Moreover, individuals with developmental prosopagnosia have been shown to have local rather than global preference (Behrmann et al., 2005) on this task. Consequently, it would be predicted that if DY has an impairment in configural processing, he should not show the global precedence effect. However, this remains an open question because Busigny et al. (2010) and Busigny and Rossion (2011) found that two different patients with acquired prosopagnosia both showed the standard global precedence effect on the Navon task.

Experiment 7 investigated the Navon effect in DY and matched controls; the latter were expected to show quicker responses in the global attention condition than in the local attention condition. The critical issues were whether DY would be slower in the global level attention condition than the local level attention condition, and whether there would be any significant difference in DY's responses at the global level (as in normal performance) in the conflicting and consistent conditions. At the local level, controls should perform more slowly in the conflicting conditions. Would, however, DY show any significant difference between response times made in the conflicting and consistent conditions?

## Stimuli

Four Navon-type letter images were created. These consisted of large figures of H and S composed of either smaller Hs or Ss, resulting in four possible images, two consistent and two conflicting (see **Figure 6**). The large letters were created on a template using Arial font, point size 300. The smaller letters were created using Arial font, point size 24.

## Participants

DY and nine normal controls from Experiment 3 took part in this experiment.

## Procedure

The test involved a fixation point presented for 2 s, followed by the letter image being presented on a computer screen for 100 ms using E-prime software. The image was then followed by a mask which was a simple array of dots that covered the same visual angle as the experimental stimuli. The participants' task was to respond as quickly and as accurately as possible to whether the image attended to was an H or S by pressing the H or S keys on the keyboard. The mask remained until a response was made. Each of the stimuli was presented 20 times, with a total of 80 trials; 40 of the trials were classified as "consistent" (i.e., H made of Hs and S made of Ss) and the remaining 40 trials were classified as "conflicting" (i.e., H made of Ss and S made of Hs). DY and controls carried out the test in two conditions. The first condition was to respond to the identity of the "global" letter and the second condition was to respond to the identity of the "local" letter. Accuracy and response times were recorded. Before the tests in both conditions, a series of practice items were presented using combinations of the letters L and B until the participant felt comfortable enough to proceed with the test.
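The trial structure described above (four stimuli, each shown 20 times, giving 40 consistent and 40 conflicting trials per attention condition) can be sketched in Python as follows. The experiment itself was run in E-prime; the function names `build_trials` and `correct_key`, the dictionary layout, and the random shuffling of trial order are all assumptions made for illustration, since the text does not state how trials were ordered.

```python
import random

GLOBAL_LETTERS = ("H", "S")
LOCAL_LETTERS = ("H", "S")
REPETITIONS = 20  # each of the four stimuli is presented 20 times


def build_trials(seed=None):
    """Return a shuffled list of 80 Navon trials as dictionaries."""
    trials = []
    for global_letter in GLOBAL_LETTERS:
        for local_letter in LOCAL_LETTERS:
            # A stimulus is consistent when both levels show the same letter.
            if global_letter == local_letter:
                trial_type = "consistent"
            else:
                trial_type = "conflicting"
            for _ in range(REPETITIONS):
                trials.append(
                    {"global": global_letter, "local": local_letter, "type": trial_type}
                )
    # Assumption: trial order is randomized; the source does not specify this.
    random.Random(seed).shuffle(trials)
    return trials


def correct_key(trial, attend):
    """Correct response key for the attended level ('global' or 'local')."""
    return trial[attend]


trials = build_trials(seed=1)
n_consistent = sum(trial["type"] == "consistent" for trial in trials)
print(len(trials), n_consistent, len(trials) - n_consistent)
```

The same trial list would serve both attention conditions; only the `attend` argument passed to `correct_key` changes between the global and local tasks.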

## Results

In terms of accuracy, DY made 6.25% errors; one control made 23.75% errors while the error rate for the remainder ranged between 0 and 7.5% so this control was omitted from further analysis. Response times for correct responses were analyzed, and outliers that were more than 2 SDs above the mean were removed for each participant.
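The outlier rule described above (removing correct-trial response times more than 2 SDs above each participant's mean) might be implemented along the following lines. This is a minimal sketch: the function name `trim_slow_outliers` and the example response times are hypothetical, and the use of the sample standard deviation computed over all of a participant's correct trials is an assumption, as the text does not say whether trimming was done per condition or which SD estimator was used.

```python
import statistics


def trim_slow_outliers(rts_ms, criterion_sd=2.0):
    """Drop response times more than `criterion_sd` SDs above the mean."""
    mean_rt = statistics.mean(rts_ms)
    # Assumption: sample SD (n - 1 denominator); the source does not specify.
    sd_rt = statistics.stdev(rts_ms)
    cutoff = mean_rt + criterion_sd * sd_rt
    # Only slow responses are removed, matching "above the mean" in the text.
    return [rt for rt in rts_ms if rt <= cutoff]


# Hypothetical correct-trial RTs (ms) for one participant, with one slow response.
example_rts = [480, 510, 495, 530, 505, 520, 490, 515, 500, 1400]
trimmed = trim_slow_outliers(example_rts)
print(len(example_rts), len(trimmed))
```

Because the rule is one-sided, unusually fast responses are retained; a two-sided criterion would need an additional lower cutoff.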

DY's response times were found to be in the normal range in both local conditions [consistent: 513 ms, t(7) = 1.03, ns; conflicting: 547 ms, t(7) = 0.38, ns] (see **Figure 7**). However, in the global task he was significantly slower in the consistent condition [591 ms, t(7) = 3.77, p = 0.007] while the difference in reaction times to that of the controls in the conflicting condition approached significance [627 ms, t(7) = 2.01, p = 0.08].

The normal controls showed a classic interference effect in the local condition with the responses to the consistent letters being faster than those to the conflicting ones (local consistent M = 445 ms, SD = 62; local conflicting M = 517 ms, SD = 73), t(7) = 3.86, p = 0.0062. A similar interference was found in the global condition (global consistent M = 413 ms, SD = 44; global conflicting M = 460 ms, SD = 79), t(7) = 3.20, p = 0.015. Unlike the controls, DY was not susceptible to the interference effect in either the local condition, t(69) = 1.39, p = 0.17, or the global condition, t(69) = 1.01, p = 0.31.

Finally, the global and local conditions were compared to one another. For the normal controls, the global condition was faster and this difference approached significance [Global M = 430 ms, Local M = 480 ms; t(7) = 2.11, p = 0.073]. DY, on the other hand, was significantly faster in the local condition (Global M = 609 ms, Local M = 530 ms), t(139) = 3.76, p < 0.001.

## Discussion

As expected, the results from the healthy controls replicated the classic Navon effect, i.e., that perception of the whole precedes that of constituent elements of an image. This is demonstrated starkly in the significant slowing down when the task is to name the constituent element when its identity conflicts with that of the global form. This happened for both the global and local conditions of the task and while Navon did not find this in his original study, the same has been found by Behrmann et al. (2005), Busigny et al. (2010), and Busigny and Rossion (2011). However, DY's response is quite abnormal and somewhat different from that of the patients studied by Busigny and colleagues. In the local task, his reaction times were comfortably within normal limits but, unlike controls, he showed no interference effect. His perception of the whole is grossly abnormal, however, with his reaction times being slower than those of controls in both consistent and conflicting conditions. Further, he derives no advantage when the global form matches the local elements and showed no interference effect. Finally, unlike Busigny and Rossion's (2011) patient PS, and similar to Busigny et al.'s (2010) patient GG, DY was significantly faster in the local task. This finding is consistent with intact featural processing paired with impaired global processing.

## Experiment 8: Mental Imagery for Famous Faces

Young et al. (1994) conducted a series of studies on prosopagnosic patients HJA and PH to investigate the links between visual recognition and mental imagery for faces. The results showed that it was possible for an apperceptive prosopagnosic patient such as HJA to have a profound face recognition difficulty and yet perform very well on tasks requiring him to make judgements requiring imagery for faces that he did not recognize. Experiments 8 and 9 were constructed to investigate the integrity of DY's mental imagery for faces.

## Stimuli

Following Young et al.'s (1994) procedure, four sets of 20 different people who had become famous before the onset of DY's face recognition difficulties (2001) were created. In each set, 10 had a particular feature and 10 did not. The features used were baldness (i.e., 10 people known for being balding or with shaved heads such as the actor Telly Savalas and 10 hirsute people), facial hair (10 people known for usually having mustaches or beards and 10 who were not), fair hair (10 people with fair hair and 10 with dark hair), and glasses (10 people known to usually wear spectacles and 10 who did not). Within each set, the order of the 20 names was pseudo-randomized.

## Participants

DY's performance was compared to that of eight normal controls matched for age and WTAR IQ. The controls had a mean age of 61.0 years (range 58–65) and mean IQ of 112 (range 101–117).

## Procedure

Each name within a set was presented individually and the participant was asked to imagine the person's face and answer the question relevant to that set, i.e., balding vs. not balding, facial hair vs. no facial hair, fair vs. dark hair and glasses or no glasses. Examples from each set include: balding vs. not balding, Telly Savalas (correct answer "yes"), Elvis Presley (correct answer "no"); facial hair vs. no facial hair, Groucho Marx (correct answer "yes"), Cliff Richard (correct answer "no"); fair vs. dark hair, Meg Ryan (correct answer "yes"), Jimi Hendrix (correct answer "no"); glasses vs. no glasses, Buddy Holly (correct answer "yes"), Paul McCartney (correct answer "no").

## Results

Overall, DY achieved 85% accuracy across all four categories and this matched the average of the control participants which was also 85%. From **Figure 8** it can be seen that across the four categories, DY's performance was at the mean level of the controls or was within one standard deviation. As a result, no further analysis was conducted.

## Experiment 9: Mental Imagery Free Recall

Experiment 8 involved a simple "yes/no" decision and DY's performance seemed perfectly intact. However, it could be argued that good performance on this task is driven by propositional knowledge of different attributes of individuals' faces and that this is not a convincing demonstration of the intactness of DY's internal representations of faces. Therefore, Experiment 9 examined DY's mental representations by conducting a free recall mental imagery task in which he was asked to describe in his own words what a number of people looked like.

## Stimuli and Procedure

The names of 10 famous personalities were read out to participants one at a time. They were asked to describe the person's face to the best of their ability and to avoid semantic attributes, i.e., to base the descriptions purely on visual features. The protocols of the descriptions were transcribed and the resulting transcripts had any remaining non-visual semantic information removed. The 10 verbal descriptions produced by each participant were then given to a set of six raters along with the 10 target names. The raters were simply asked to match the descriptions to the names. From this procedure, the dependent variable for each experimental participant was the average number of their descriptions that were correctly matched to target names by the raters.
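The dependent variable described above, the average number of a participant's descriptions that raters matched to the correct name, could be computed as in the following Python sketch. The function name `description_score`, the data layout, and the placeholder names in the example are hypothetical and are not taken from the study materials.

```python
from statistics import mean


def description_score(rater_matches, targets):
    """Average number of correctly matched descriptions across raters.

    rater_matches: one dict per rater, mapping description id -> chosen name.
    targets: dict mapping description id -> the name actually described.
    """
    per_rater_correct = [
        sum(choices.get(desc_id) == name for desc_id, name in targets.items())
        for choices in rater_matches
    ]
    return mean(per_rater_correct)


# Hypothetical example with three descriptions and two raters.
targets = {"d1": "Person A", "d2": "Person B", "d3": "Person C"}
rater_matches = [
    {"d1": "Person A", "d2": "Person B", "d3": "Person C"},  # all correct
    {"d1": "Person A", "d2": "Person C", "d3": "Person B"},  # two swapped
]
score = description_score(rater_matches, targets)
print(score)
```

In the experiment itself there would be 10 descriptions and six raters per participant, so the score ranges from 0 to 10.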

## Participants

DY's performance was compared to that of eight normal controls matched for age and WTAR IQ. The controls had a mean age of 61.2 (range 51–69) and mean IQ of 107.4 (range 90–117).

## Results

Across the six raters, DY's verbal descriptions scored 8/10 which was well-within the range of the normal controls (M = 8.4, SD = 1.16), t(7) = 0.33, p > 0.05.

## Discussion of Experiments 8 and 9

The results from Experiments 8 and 9 show that, despite DY having profound problems in recognizing faces, he nonetheless is able to make very good judgments and provide recognizable descriptions from his internal mental images of famous people's faces. His performance (in Experiment 8) is similar to that of the apperceptive prosopagnosic patient HJA who also performed normally on facial imagery tasks for faces that he could not recognize. The mental imagery studies strongly imply that DY's internal representations of the faces of famous people are largely intact. We conclude that his face recognition units are preserved and can be accessed from familiar names and from the semantic system (see Craigie and Hanley, 1993, 1997, for discussion of how this form of retrieval appears to take place).

## General Discussion

In this study, we have presented data from a patient who, in the context of relatively unimpaired naming of familiar objects, complains of profound face recognition difficulties. His impairment with once familiar faces is so severe that, in everyday life, he is unable to recognize even close members of his family such as his wife, children, or grandchildren. **Table 4** presents a summary of DY's performance as standardized scores relative to controls on the main background tests and the nine experimental studies. **Table 4** also indicates the type of processing that was under investigation in each experiment.

Consistent with his reported problems, DY performed much worse than controls at learning new upright faces (Experiment 3). Paradoxically, he identified famous faces more accurately than might have been expected (Experiment 2) and his accuracy in unfamiliar face-matching tasks (e.g., Benton and van Allen, 1968) put him within the "normal" range. However, closer inspection of his performance revealed that his higher-than-expected accuracy was based almost entirely on identification of particular features within faces. Mistaking a photo of one of the researchers for George Michael because the former has a goatee beard provided an example of how reliant he is on recognition of individual features. This featural strategy revealed itself clearly when he took much longer to identify upright famous faces (Experiment 4) than controls. Similarly, the time that he took to complete the Benton unfamiliar face-matching test was grossly abnormal, and his verbal protocols clearly revealed a laborious feature-by-feature matching strategy.

The results of subsequent experiments provided further evidence of DY's reliance on featural processing. He produced normal or relatively preserved performance on tests of familiar face recognition that depend on processing of featural information such as recognition of fractured familiar faces (Experiment 4), recognition of inverted familiar faces (Experiment 5), and recognition of famous faces from their external features (Experiment 6).

In none of these tasks is configural information readily available from faces, and DY appears to be relatively unaffected by its absence. DY's performance in these four experiments, together with his unimpaired ability to name objects, represents a double dissociation with the object agnosic patient CK (Moscovitch et al., 1997; Moscovitch and Moscovitch, 2000) who performed badly at object recognition and on face recognition tasks that require featural processing despite excellent recognition of faces when configural information is available.

TABLE 4 | Summary of DY's performance on background and experimental tests relative to controls with z-scores where possible (numbers in brackets denote experiment numbers).

NCx̄, Normal Control (NC) mean; NCsd, NC standard deviation; BFMT, Benton Face Matching Task; Essex-Exeter, Essex Exeter Matched Difficulty Task; CFMT, Cambridge Face Memory Task; RT, mean response time (ms); Acc, Accuracy; FR, Free Recall.

When configural information must be used to achieve normal levels of performance, as in the time required to recognize upright famous faces, DY performed much worse than controls (Experiment 4). Consistent with a holistic processing deficit, DY performed poorly on the Mooney faces. Consistent with a more general global processing deficit, DY performed differently from controls in Experiment 7 where he showed evidence of impaired processing of the global form of the Navon Figures. In line with the views of Farah and Moscovitch, therefore, the performance of DY provides strong evidence of a patient with prosopagnosia whose problems in recognizing familiar faces are the consequence of a holistic/configural processing deficit.

Nevertheless, DY's accurate recognition of upright familiar faces (Experiment 2) raises an important question. Why is he able to identify so many famous faces via a featural processing strategy when the performance of many prosopagnosics on such tasks is either at chance or is severely impaired? Is this because DY has unusually good featural processing skills? Some prosopagnosics such as HJA and MS (Newcombe et al., 1989) do suffer from object recognition deficits as well as from face recognition deficits. So, their total inability to identify any familiar faces probably does reflect severe featural as well as configural processing impairments.

However, the situation is different with other prosopagnosic patients whose accuracy at familiar face recognition is severely impaired such as LH (Levine and Calvanio, 1989; Farah et al., 1995), FB (Riddoch et al., 2008), and WJ (McNeil and Warrington, 1993). All three cases appear to have preserved featural processing: LH was able to recognize difficult objects well and performed well at matching inverted unfamiliar faces; FB showed excellent ability to name familiar objects and to learn names for greebles (complex novel shapes); WJ showed excellent ability to identify sheep faces. It therefore seems unlikely that the featural skills of DY are markedly superior to those of all three of these patients. So why are they much less accurate than DY at familiar face identification? One possible explanation is that familiar face identification problems in these three patients reflect a more associative form of prosopagnosia than that experienced by DY. In these three individuals, there may be an impairment either to the face recognition units themselves or to the connections between the face recognition units and the rest of the cognitive system (Burton and Young, 1999). A problem of this kind would impair identification of familiar faces even if featural processing was entirely preserved.

It is also interesting to note that DY's ability to identify inverted and fractured familiar faces makes a striking contrast with the performance of patient DC, reported by Rivest et al. (2009). Like DY, DC had excellent object recognition skills, consistent with preserved featural processing despite problems in identifying familiar faces and matching unfamiliar faces. Unlike DY, however, DC was impaired relative to controls at identifying fractured and inverted familiar faces. For example, DC recognized only 9.1% of inverted pictures of famous faces that he could identify when presented upright. The corresponding figure for controls was 52%. Rivest et al. concluded that the parts-based system cannot by itself identify familiar faces and that the configural processing system must interact with the featural system to recognize fractured and inverted faces. A configural processing deficit, they argue, will invariably lead to a problem in identifying inverted faces. They therefore predict that it should not be possible to observe a prosopagnosic patient who provides a double dissociation with the object agnosic CK by performing well at object recognition and at the recognition of inverted and fractured faces. As we suggested earlier, however, DY appears to represent exactly such a case. It is therefore worth considering instead whether an impairment at the level of the face recognition units (Burton and Young, 1999) might be able to explain DC's poor performance when recognizing familiar faces. Because a face recognition unit impairment would affect face processing at a point at which featural and configural processing have already been completed, it would disrupt identification of familiar faces regardless of whether they were upright, inverted or fractured. This is precisely the pattern of performance that Rivest et al. observed in DC.

In conclusion, although there is considerable evidence that prosopagnosics' impaired configural processing interferes with their processing of unfamiliar faces (e.g., Rossion et al., 2011), there is much less evidence that a configural processing deficit is the cause of impaired identification of familiar faces in prosopagnosia. Indeed, it appears that the familiar face processing problems experienced by prosopagnosic patients such as DC (Rivest et al., 2009), LH (Levine and Calvanio, 1989; Farah et al., 1995), FB (Riddoch et al., 2008), and WJ (McNeil and Warrington, 1993) can be readily explained in terms of a problem at the level of the face recognition units (Burton and Young, 1999). In cases of apperceptive prosopagnosia such as HJA or ME (Young et al., 1994), there is evidence of impaired featural as well as impaired configural processing. The case of DY is therefore unusual in that he has no problem in recognizing objects (consistent with unimpaired featural processing). Neither does he have a problem in identifying inverted or fractured familiar faces or in accessing mental images of familiar faces (consistent with unimpaired face recognition units). He therefore presents the clearest case yet reported of an acquired prosopagnosic whose impaired processing of familiar faces appears to be the consequence of a configural processing deficit.

## Acknowledgments

We would like to thank DY and his wife for their generous time in participating in the studies and Avery Braun, Jacob Waite, and Nadine Wanke for their help in preparing the manuscript. We would also like to thank Bruno Rossion and Thomas Busigny for allowing us to use their Mooney paradigm. Part of the research was funded by a grant awarded to AJ by the Experimental Psychology Society (EPS).

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2015.00390



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Jansari, Miller, Pearce, Cobb, Sagiv, Williams, Tree and Hanley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Impaired holistic processing of left-right composite faces in congenital prosopagnosia

## *Tina T. Liu1,2\* and Marlene Behrmann1,2*

*<sup>1</sup> Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA*

*<sup>2</sup> Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*David White, The University of New South Wales, Australia*

*Friederike Zimmermann, University of East London, UK*

#### *\*Correspondence:*

*Tina T. Liu, Department of Psychology, Carnegie Mellon University, Baker Hall 331, Pittsburgh, PA, USA e-mail: tinaliu@cmu.edu*

Congenital prosopagnosia (CP) refers to a lifelong impairment in face processing despite normal visual and intellectual skills. Many studies have suggested that the key underlying deficit in CP is one of a failure to engage holistic processing. Moreover, there has been some suggestion that, in normal observers, there may be greater involvement of the right than left hemisphere in holistic processing. To examine the proposed deficit in holistic processing and its potential hemispheric atypicality in CP, we compared the performance of 8 CP individuals with both matched controls and a large group of non-matched controls on a novel, vertical composite task. In this task, participants judged whether a cued half of a face (either left or right half) was the same or different at study and test, and the two face halves could be either aligned or misaligned. The standard index of holistic processing is one in which the unattended face half influences performance on the cued half and this influence is greater in the aligned than in the misaligned condition. Relative to controls, the CP participants, both at a group and at an individual level, did not show holistic processing in the vertical composite task. There was also no difference in performance as a function of hemifield of the cued face half in the CP individuals, and this was true in the control participants, as well. The findings clearly confirm the deficit in holistic processing in CP and reveal the useful application of this novel experimental paradigm to this population and potentially to others as well.

**Keywords: congenital prosopagnosia, holistic processing, composite face effect, chimeric face, face lateralization**

## **INTRODUCTION**

Congenital prosopagnosia (CP) refers to the apparently lifelong impairment in face recognition despite normal vision, intelligence, and other cognitive skills. Individuals with CP generally have great difficulties recognizing faces of other people, including their friends and family members, and can even have problems recognizing their own face. CP is a puzzling disorder as these individuals do not have frank neurological damage and, yet, they do not attain mastery of face recognition incidentally over the course of development (for a review, see Behrmann and Avidan, 2005). Of importance to vision science, CP offers a unique window into understanding the psychological and neural mechanism of face processing and, as such, this neurodevelopmental condition has received considerable attention recently.

Unlike acquired prosopagnosia (AP), which results from explicit brain damage and is rare, CP is more common in the population at large, with a prevalence of approximately 2% in both Caucasian (Kennerknecht et al., 2006) and non-Caucasian populations (Kennerknecht et al., 2007, 2008), and runs in some families (de Haan, 1999; Grüeter et al., 2007; Johnen et al., 2014). Much of the recent research has explored the neural basis of CP and has identified differences, relative to controls, in the distributed face network. These differences are apparent to a greater degree in the more extended/anterior portions of the network than in the more core/posterior regions (Avidan and Behrmann, 2014; Avidan et al., 2014; but see Furl et al., 2010 for a different finding). Studies that explore the psychological or computational basis of CP have largely focused on the failure of these individuals to process visual information holistically and the goal of this study is to explore this further.

## **HOLISTIC PROCESSING (HP) OF FACES**

Given that all faces differ only slightly in the shape and size of facial features which are arranged in the same top-heavy configurations, the spatial relations among these features are considered particularly important for face recognition. In line with this idea, it has been suggested that facial features and their spatial relations are processed holistically (for a review, see Maurer et al., 2002; Cheung and Gauthier, 2010); in other words, there is obligatory or non-independent encoding of all parts of the face and the parts cannot be ignored (for review of recent evidence and perspectives, see Richler and Gauthier, 2014). Moreover, not every observer engages holistic face processing to the same degree and individual differences in HP may have a significant impact on face recognition and may even be used to predict face recognition skills (Richler et al., 2011; Wang et al., 2012).

Many measures of HP have been developed and converging evidence from the face inversion task (Yin, 1969), the part-whole task (Tanaka and Farah, 1993) and the composite face task (Young et al., 1987; Richler et al., 2008) all support the idea that, in normal object perception, faces are processed in a more holistic fashion than other types of objects. Also, expertise with a class of objects can confer the need for HP of homogeneous exemplars (Richler et al., 2009) and HP emerges over the course of development (Scherf et al., 2009). Although there is still considerable debate in the literature concerning the relationship between HP and configural processing, we adopt an operational definition such as that articulated by Richler and Gauthier (2014; see also Amishav and Kimchi, 2010, for definitions) and examine the extent to which parts of a face are encoded in a mandatory and non-independent fashion.

In contrast with the reliance on HP evinced by normal observers, impaired HP and an over-reliance on featural processing are frequently reported in individuals with CP (e.g., Levine and Calvanio, 1989; Avidan et al., 2011). For example, CPs do not show the expected decrement in performance in inverted vs. upright faces (Behrmann et al., 2005). CPs also do not show normal performance in the context of the composite face paradigm. In the standard horizontal composite design, participants make same/different judgments of one half of two faces (say the top half) and the two halves of a face can be either misaligned or aligned. The signature of HP, known as the *composite face effect*, refers to the adverse impact on matching when the two relevant halves are the same (the top halves are identical across the two faces or the bottom halves are identical) and the two irrelevant halves are different, and this interference from the unattended half is greater when the face halves are aligned rather than when they are misaligned (see Rossion, 2013 for a review). That is, when the face halves are aligned, the interference from the irrelevant halves convincingly demonstrates that face processing is "holistic": observers cannot help but process information about the unattended portion of the face, even if it is task-irrelevant. This interference is not apparent to the same extent in the misaligned trials indicating that the face is not perceived holistically when the parts are not in their usual configuration.

Interestingly, unlike the pattern described above, individuals with CP do not make false alarms in the horizontal composite task and do not show the increased interference from the aligned compared with the misaligned unattended half face (Avidan et al., 2011; Palermo et al., 2011). Instead, they perform more veridically than the controls (faster RTs and fewer false alarms), remaining immune to the contribution of the unattended aligned half, and thereby reflecting the deficit in HP in CP. Whether this holistic deficit is true for all CP individuals is unclear. For example, Le Grand et al. (2006) showed that, on a standard composite face task (attend to top or bottom half of face), 7 of their 8 CPs exhibit a composite face effect that is not differentiable from controls.

The failure of CP individuals to apprehend all parts simultaneously appears to extend beyond their ability to encode face parts holistically. For example, in one recent study (Tanzer et al., 2014), CP individuals were asked to judge the width of visually presented rectangles while ignoring their irrelevant height, or to judge changes in width while height remained constant in the context of a Garner speeded classification task. While controls exhibited the expected Garner interference, no such interference was observed for the CPs, indicating impaired HP of integral, non-facial shape dimensions. Both CPs and controls exhibited the same level of Garner interference when the task was changed to reporting non-shape dimensions (in this case, color). These findings indicate a deficit in holistic integral perception of shape dimensions in CP (but see recent paper by Busigny et al., 2014 for argument on face-specific impairment in holistic processing).

Some studies also show that deficits in CP extend beyond configural processing *per se*, as these individuals are also impaired at integrating featural and configural information (Kimchi et al., 2012), and show local superiority and precedence in a hierarchical Navon letter task (Behrmann et al., 2005). Perhaps unsurprisingly, these deficits adversely impact shape perception more generally rather than just affecting face perception, and this more general perceptual disorder results in difficulties in subordinate-level object discrimination, as well (Behrmann et al., 2005; Garrido et al., 2008). We note that, although there is a growing consensus that HP is affected in CP, this may not be true of all individual cases. As noted previously, most CP individuals in Le Grand et al. (2006) showed a normal composite effect. DeGutis et al. (2012) reported that the inversion and scrambling of face images produced comparable deficits in CPs and controls, suggesting that both groups use holistic processing and configural information to recognize gender. In addition, in some studies, CP participants exhibited the typical global superiority in the Navon compound letter task, assumed to tap into higher-order componential processing, as well (Duchaine et al., 2007b; but see Avidan et al., 2005 for impaired global perception in CP). The crux of the current study is to explore HP in CP further with use of a fine-tuned, novel paradigm, as described below, and to characterize this ability both at the group level and the level of each CP individual.

### **HEMISPHERIC LATERALIZATION OF FACES IN CP**

In addition to characterizing HP in CP, here, we examine an additional aspect of their behavior concerning possible differences in hemispheric specialization between CP and controls. There is a general consensus in the field that face perception and holistic processing are more strongly mediated by computations of the right hemisphere (RH) than the left hemisphere (LH). For example, Rossion et al. (2000) found that the RH was activated to a greater extent when participants matched whole faces than face parts whereas this pattern of activity was reversed in the LH homologous region (see also Meng et al., 2012, for differences in hemispheric computations in face perception). Whether CP individuals show a difference in hemispheric profile remains to be determined.

To date, there have been very few detailed explorations of differences in hemispheric specialization in CP vs. controls. Hasson et al. (2003) reported that their CP participant, YT, evinced activation in left lateral occipital (LO) cortex that was more than 1 SD outside the normal range, although they went on to show, using a laterality index, that this difference was unlikely to be associated with YT's face perception difficulty as some of the normal observers showed the same bias toward LH activation. Of course, the absence of a difference in the RH goes against the idea of a disadvantage in the preferential RH HP but nevertheless, the subtle LO difference in CP prompts us to explore this issue further. It is also the case that Avidan et al. (2014) noted a slight difference between CP and controls in hemispheric organization: specifically, they showed that there was greater activation (but not number of voxels or any other dependent measure) in CPs in the left superior temporal sulcus (STS) compared with the controls. Additionally, the right, but not the left occipital face area (OFA), was slightly larger in the controls than in the CP although the activation profiles were not dissimilar across the two groups. Together, these subtle atypicalities, although inconsistent across dependent measures, sides and studies, lead us to examine hemispheric differences in CP more closely in our characterization of HP.

## **THE CURRENT STUDY**

In the current study, we adopt a novel paradigm, the vertical composite task, and explore further the manner and extent to which individuals with CP are impaired at HP and whether this impairment is differentially modulated by hemispheric lateralization, relative to controls. The design of this novel task is a modification of the standard (horizontal) composite task, which has been used extensively to uncover HP of faces under a variety of contexts and manipulations (for overview, see Richler and Gauthier, 2013; Rossion, 2013), and specifically, it is designed to permit us to examine the hemispheric effects, as well.

To explore HP and its hemispheric effects in CP at both the group and the individual level, we examine whether the unattended half of the aligned/misaligned face influences performance on the attended half of faces that are halved vertically. Thus, we examine whether there is any effect on performance from the uncued half-face when participants are cued to the left (right visual field) or right (left visual field) of the chimeric face stimulus. To this end, we created left-right composite (or chimeric) faces by pairing the left half of one face with the right half of another face of the same gender and race. **Figure 1** shows a schematic depiction of the paradigm.

This paradigm potentially affords us several advantages over the standard horizontal composite task. Bisecting the face along the horizontal vs. vertical dimension may make a difference to face perception. CK, an acquired agnosic individual who was able to recognize faces much better than objects, was able to identify famous faces much better when the faces were halved down the midline (and the two halves were misaligned) than when the bisection and misalignment was along the horizontal meridian (Moscovitch et al., 1997). Additionally, this paradigm permits us to examine the relative contribution of the left and right hemispheres to HP. This task is based on the rationale of the "chimeric face effect" (for example, see Indersmitten and Gur, 2003): when a chimeric face is presented over central fixation, observers show a robust preference to select chimeric faces made from two left sides of the original face as being more similar to the original face than chimeric faces made from two right sides (with the left side usually projected to the RH) (Gilbert and Bakan, 1973; Brady et al., 2005). This relative RH advantage for faces is so robust that it is even observed in non-human primates (Dahl et al., 2013) and the vertical composite task is motivated by this chimeric technique. Here, we combine the "chimeric face" technique in which two face halves are paired (along the vertical midline here) with the established composite face paradigm to explore the hemispheric basis of HP in the normal and CP observers.

#### **FIGURE 1 | Schematic diagram depicting the left-right composite paradigm.** In this example, the cued part is on the left (with a green/shaded background) and the irrelevant part is on the right (with a white background). The format of the study and test faces can be either both aligned or both misaligned. Participants are instructed to make a same/different judgment based on the cued part in the study face and the test face, and to ignore the other irrelevant part. In congruent trials, the study and test face halves can both be the same (AB→AB) or different (AB→DC). In incongruent trials, a change can occur either in the irrelevant part (AB→AC) or in the cued part (AB→DB).
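The four trial types listed in the Figure 1 caption can be derived mechanically from the study and test faces. The sketch below is illustrative only: it assumes each composite is written as a two-letter string (left half, right half), and the function name `classify_trial` is hypothetical rather than taken from the authors' materials.

```python
def classify_trial(study, test, cued_side="left"):
    """Return (correct_response, congruency) for one study/test pair.

    study, test: two-character strings such as "AB" (left half, right half).
    cued_side: "left" or "right", the half the participant must judge.
    """
    cued = 0 if cued_side == "left" else 1
    uncued = 1 - cued
    cued_same = study[cued] == test[cued]
    uncued_same = study[uncued] == test[uncued]
    # The correct answer depends only on the cued half.
    correct = "same" if cued_same else "different"
    # A trial is congruent when both halves point to the same answer.
    congruency = "congruent" if cued_same == uncued_same else "incongruent"
    return correct, congruency

# The four cue-left examples from the Figure 1 caption.
for test_face in ("AB", "DC", "AC", "DB"):
    print("AB ->", test_face, classify_trial("AB", test_face))
```

With the cue on the left, AB→AB and AB→DC come out as congruent, while AB→AC and AB→DB come out as incongruent, in line with the caption.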

We adopt the complete version of the composite task here (for a review, see Gauthier and Bukach, 2007; for a recent exchange of opinions, see Richler and Gauthier, 2013; Rossion, 2013), which includes both congruent trials, in which the relevant and irrelevant halves lead to the same response (i.e., both are the same or both are different), and incongruent trials, in which the relevant and irrelevant halves lead to conflicting responses (i.e., one half is the same and the other is different). In the example of **Figure 1** in which the cued part is on the left (with a green/shaded background), the format of the study and test faces can be either both aligned or both misaligned. In addition, the study and test face halves can be either both the same or both different ("congruent condition," e.g., study face AB is followed by test face AB or by test face DC), or one half is different between study and test ("incongruent condition," e.g., study face AB is followed by test face AC or by test face DB). Although we expect performance differences between congruent and incongruent conditions (i.e., the "congruency effect," akin to a Stroop-type interference), the critical result, generally taken as an indicator of HP, is the interaction between alignment and congruency. That is, HP is defined as aligned (congruent − incongruent) *d*′ minus misaligned (congruent − incongruent) *d*′. Based on our predictions, we expect to observe a difference in the magnitude of HP (i.e., the interaction between alignment and congruency) between controls and CPs. Using this exact paradigm, we have previously obtained evidence for a composite effect in control participants (Liu et al., in press) and, as such, have verified the efficacy of the vertical composite task for uncovering HP. We note that in the controls, there was no modulation of HP by hemisphere, as we might have predicted given the evidence for greater RH involvement in HP. We have suggested several reasons why this interaction with hemisphere might be absent, but the pertinent question here is whether the CP individuals differ in their hemispheric contribution to HP relative to the controls.
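The HP index defined above is a difference of differences in sensitivity (d′). The following sketch shows one way it could be computed from response counts. Several details are assumptions rather than the authors' procedure: a hit is taken to be a "same" response on a same trial and a false alarm a "same" response on a different trial, a log-linear correction is applied to avoid rates of exactly 0 or 1, and all counts are invented for illustration.

```python
from statistics import NormalDist

def d_prime(hits, n_same, false_alarms, n_diff):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Assumption: log-linear correction (+0.5 / +1) keeps rates away from 0 and 1.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_same + 1)
    fa_rate = (false_alarms + 0.5) / (n_diff + 1)
    return z(hit_rate) - z(fa_rate)

def hp_index(counts):
    """HP = aligned (congruent - incongruent) d' minus misaligned (congruent - incongruent) d'.

    counts maps (alignment, congruency) -> (hits, n_same, false_alarms, n_diff).
    """
    d = {cond: d_prime(*c) for cond, c in counts.items()}
    aligned = d[("aligned", "congruent")] - d[("aligned", "incongruent")]
    misaligned = d[("misaligned", "congruent")] - d[("misaligned", "incongruent")]
    return aligned - misaligned

# Hypothetical counts for one observer (20 same and 20 different trials per cell).
example = {
    ("aligned", "congruent"): (18, 20, 2, 20),
    ("aligned", "incongruent"): (13, 20, 7, 20),
    ("misaligned", "congruent"): (17, 20, 3, 20),
    ("misaligned", "incongruent"): (16, 20, 4, 20),
}
print(round(hp_index(example), 2))
```

A positive index indicates a larger congruency effect for aligned than for misaligned faces, which is the signature of holistic processing; a value near zero is the pattern expected if the unattended half exerts no alignment-specific influence.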

## **METHODS**

#### **PARTICIPANTS**

All participants reported normal or corrected-to-normal vision and all were right-handed according to their responses to the Edinburgh Handedness Inventory (Oldfield, 1971). Informed consent was obtained prior to the start of the experiment and the protocol was approved by the Institutional Review Board at CMU.

#### *Congenital prosopagnosics*

Eight individuals with CP (age range 18–57 years, mean age = 36.6) participated in the study. Seven were tested in Pittsburgh, PA and one was tested in Nashville, TN. All CPs reported substantial life-long difficulties with face recognition and this impairment was confirmed by poor performance in both the Cambridge Face Memory Test (CFMT) and a famous face questionnaire used successfully to differentiate CP from controls in previous studies (Avidan et al., 2011) (see **Table 1**).

#### *Control participants*

Two groups of control participants were recruited from the Pittsburgh community. One group consisted of thirty-two Caucasian students (mean age = 22.3 years, 12 M and 20 F) from Carnegie Mellon University (CMU) who participated in the study for course credit. The data from these individuals are reported in our previous study (Liu et al., in press). The other group consisted of eight individuals, age- and gender-matched to the CPs (age ± 3 years), recruited from the Pittsburgh community. As evident below, there are no differences in the performance of these two control groups and so we aggregate the data to obtain a large sample against which to benchmark the CP data.

**Table 1 | Biographic details and results (raw values and** *z* **scores) of face perception measures for 8 CP individuals.**

*Color code: z-scores that exceed 2 SDs are denoted in red italics and z-scores that are between 1 and 2 SDs are denoted in blue italics.*

*\*Calculation of CFMT Upright z-score is based on the data from 20 controls provided in Duchaine et al. (2007a,b), M* = *59.6 (out of 72 responses), SD* = *7.6.*

*\*\*Calculation of CFMT Inverted z-score is based on the data from 20 controls provided in Duchaine and Nakayama (2006), M* = *42.05 (out of 72 responses), SD* = *4.71.*

*\*\*\*Calculation of Famous face questionnaire z-score is based on the control data reported in Avidan et al. (2011), M* = *84.1 (%correct), SD* = *13.2.*
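The z-scores reported in Table 1 follow from the control means and SDs given in the table notes. A minimal sketch of that calculation is shown below, using the CFMT Upright norms (M = 59.6, SD = 7.6). The raw score of 40 is a hypothetical value chosen for illustration, not a participant's score, and the function names are illustrative.

```python
def z_score(raw, control_mean, control_sd):
    """Standardize a raw score against control norms."""
    return (raw - control_mean) / control_sd

def sd_band(z):
    """Label a z-score using the cut-offs from the Table 1 color code."""
    magnitude = abs(z)
    if magnitude > 2:
        return "beyond 2 SD"
    if magnitude >= 1:
        return "between 1 and 2 SD"
    return "within 1 SD"

# Hypothetical raw CFMT Upright score of 40 out of 72 (not taken from Table 1).
z = z_score(40, 59.6, 7.6)
print(round(z, 2), sd_band(z))  # prints "-2.58 beyond 2 SD"
```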

#### **MATERIALS AND APPARATUS**

The composite stimuli were created from 40 front-view Caucasian male faces (stimuli from Tanaka Lab) with neutral expressions and without hair or glasses. All faces were converted to grayscale images. Each face was approximately 170 pixels in width and 240 pixels in height and was fitted onto a uniform 320 × 420 pixel black background. To ensure that the task could not be performed based purely on facial symmetry (e.g., one eye is higher than the other, larger proportion of mouth on the right side), within each race, the twenty faces were subdivided into five groups of four similar faces based on prior ratings<sup>1</sup>. Each composite face was then created by pairing the left half of one face with the right half of another face from the same group. A 3-pixel-thick vertical white line was inserted at the center of the face to form a gap between the left- and right-half face. See **Figure 2** for examples of a cue-left aligned incongruent trial and a cue-right misaligned congruent trial. Within each group, the positions of the eight face halves (left and right halves of the four faces) were rotated through a partial Latin square design such that one composite face was never studied again throughout the experiment. Two misaligned versions were included to counterbalance the up/down position of the left and right sides of the composite face: each misaligned composite face was created by moving the left half up or down approximately 80 pixels (around one third of the face).
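The construction described above (left half of one face, right half of another, a 3-pixel white gap, and an optional vertical offset of the left half) can be sketched in plain Python. This is an illustration only, not the authors' stimulus-generation code: the function name, the nested-list image representation, and the choice to draw the white line over the full canvas height are assumptions.

```python
def make_composite(face_a, face_b, gap=3, shift=0, background=0, line=255):
    """Join the left half of face_a to the right half of face_b.

    Faces are equal-sized grayscale images stored as lists of pixel rows.
    shift > 0 moves the left half down by that many pixels, shift < 0
    moves it up, and shift == 0 gives an aligned composite.
    """
    height, width = len(face_a), len(face_a[0])
    half = width // 2
    canvas_height = height + abs(shift)
    left_top = shift if shift > 0 else 0
    right_top = -shift if shift < 0 else 0

    # Start from a uniform background and draw the vertical white line
    # (assumption: the line spans the whole canvas height).
    canvas = [[background] * (width + gap) for _ in range(canvas_height)]
    for row in canvas:
        row[half:half + gap] = [line] * gap

    # Paste the two half faces at their vertical offsets.
    for y in range(height):
        canvas[left_top + y][:half] = face_a[y][:half]
        canvas[right_top + y][half + gap:] = face_b[y][half:]
    return canvas


# Toy 3 x 4 "faces" with constant pixel values, for illustration only.
face_a = [[1] * 4 for _ in range(3)]
face_b = [[2] * 4 for _ in range(3)]
aligned = make_composite(face_a, face_b, gap=1)
misaligned = make_composite(face_a, face_b, gap=1, shift=1)
```

With the dimensions reported in the text, a misaligned stimulus would correspond to `shift=80` or `shift=-80` on faces roughly 240 pixels high.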

For CP and their age- and gender-matched controls, stimuli were displayed on a 14-inch laptop with a resolution of 1366 × 768 pixels and 60-Hz frame rate. These two groups of participants viewed the display from a distance of approximately 40 cm (although this was not fixed), and the face on the screen was 4 cm wide and 5.5 cm high; thus, each face subtended about 5.5° horizontally and 7.9° vertically. For the student control group, stimuli were displayed on a 20-inch monitor with a resolution of 1680 × 1050 pixels and 60-Hz frame rate. Participants viewed the display from a distance of approximately 50 cm, and the face on the screen was 4.4 cm wide and 6.2 cm high; thus, each face subtended about 5° horizontally and 7° vertically.
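The visual angles quoted above follow from the stimulus size and viewing distance. The sketch below uses the standard formula, angle = 2 · atan(size / (2 · distance)); the function name is ours, and because viewing distance was only approximate in the study, the results should be read as approximations of the reported angles.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Student control setup: 4.4 x 6.2 cm face viewed from about 50 cm.
student_w = visual_angle_deg(4.4, 50)
student_h = visual_angle_deg(6.2, 50)

# Laptop setup (CPs and matched controls): 4 x 5.5 cm face at about 40 cm.
laptop_w = visual_angle_deg(4.0, 40)
laptop_h = visual_angle_deg(5.5, 40)

print(f"student: {student_w:.1f} x {student_h:.1f} deg")
print(f"laptop:  {laptop_w:.1f} x {laptop_h:.1f} deg")
```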

#### **DESIGN**

This study had one between-subject variable: participant group (CP vs. control; see below for details on combining two control groups), and three within-subject variables: alignment (aligned vs. misaligned), congruency (congruent vs. incongruent), and visual field (left vs. right of test face). The dependent variable was recognition performance (*d*′).
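For readers unfamiliar with the dependent variable, *d*′ is the signal-detection sensitivity index, computed as the difference between the z-transformed hit rate and false-alarm rate. The sketch below is a generic illustration rather than the authors' analysis code. It assumes that a "same" response on a same trial counts as a hit and a "same" response on a different trial counts as a false alarm, and it applies a log-linear correction for extreme rates; the text specifies neither choice.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each cell (log-linear correction) so that rates of
    exactly 0 or 1 do not yield infinite z-scores. The correction is
    an assumption of this sketch, not a detail reported in the study.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in one condition.
sensitivity = d_prime(hits=30, misses=10, false_alarms=10, correct_rejections=30)
print(f"d' = {sensitivity:.2f}")
```

In this design, one such value would be computed per participant for each combination of alignment, congruency, and visual field.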

#### **PROCEDURE**

The sequence of displays in a single trial is illustrated in **Figure 2**. Each trial began with a black fixation cross presented at the center of the gray screen for 500 ms. After that, a study composite face was shown for 500 ms, followed by a 300-ms mask. A test composite face, together with a square bracket cueing which half of the face (left or right half) was to be judged, was then displayed for 5 s or until a response was made (whichever came first). Participants were asked to judge whether the cued half in the test composite was identical or not to that in the study composite. Participants were instructed to respond as quickly and as accurately as possible by pressing "F" and "J" on the keyboard. The mapping of the key response was counterbalanced across participants. The aligned and misaligned trials were blocked and the experiment consisted of eight blocks of 80 trials each, resulting in a total of 640 trials. The experiment took around 35 min to complete (although some CP individuals took quite a bit longer to complete this). Each participant completed a practice session of 24 trials (consisting of both aligned and misaligned conditions) prior to the experiment. Practice data were checked and, in very rare cases, when accuracy fell below 60% correct, the participant was asked to complete one more practice session before proceeding to the experiment.

#### **RESULTS AND DISCUSSION**

Preliminary analyses comparing the discrimination performance (*d*′) between the 8 matched controls and the 32 college student controls revealed no main effect of group (*p* > 0.05), or interaction of any other factor with group (*p* > 0.05). Therefore, we judged our matched control group to be a representative sample of observers with normal face perception and merged their data with those of the larger control group so as to have a widely sampled distribution of normal performance against which to compare the CPs.

### **CP VS. CONTROL**

A four-way mixed ANOVA on discrimination performance (*d*′), with alignment (aligned, misaligned), congruency (congruent, incongruent), and visual field (cueing left, right of the test face) as within-subjects factors, and participant group (CP, control) as the between-subjects factor revealed a significant effect of group [*F*(1, 46) = 41.639, *p* < 0.001]. As expected, the CP individuals exhibited poorer discrimination performance relative to controls (controls: mean *d*′ = 1.33, *SD* = 0.53; CP: mean *d*′ = 0.57, *SD* = 0.37), confirming their status as impaired at face perception. There was also a significant interaction of congruency × group, *F*(1, 46) = 9.931, *p* = 0.003, but group did not interact with any other factor alone [visual field × group: *F*(1, 46) = 0.657, *p* = 0.422; alignment × group: *F*(1, 46) = 3.546, *p* = 0.066], or with the combination of any two or three factors [alignment × congruency × group: *F*(1, 46) = 2.758, *p* = 0.104; visual field × alignment × congruency × group: *F*(1, 46) = 2.256, *p* = 0.140]. We examined the basis of the congruency by group interaction by carrying out paired-samples *t*-tests, comparing performance in congruent vs. incongruent trials separately for the CP and control groups. A significant congruency effect was observed in both the control group, *t*(39) = 11.851, *p* < 0.001, and in the CP group, *t*(7) = 3.155, *p* = 0.016, somewhat attenuated in the latter case perhaps because of reduced statistical power relative to controls. According to previous research (Bukach et al., 2006; Richler et al., 2008; Curby et al., 2013; but see Rossion, 2013 for a counterargument), the congruency effect alone can be taken as evidence for HP, and therefore, the observed congruency by group interaction confirms a difference between the CP and control observers.

<sup>1</sup>A naïve observer at CMU grouped the Caucasian male faces in the database in a way that maximizes face similarity within each group. Each group contained up to four faces and each face was used only once.

Note also that because of our a priori hypotheses and the fact that some of the higher-order interactions are trending toward statistical significance, we undertook further investigation within each group so as to elucidate any possible differences in response profile per group. As laid out in the rationale, the alignment by congruency interaction is the most stringent criterion for HP. Because of this a priori prediction and the possibility that unbalanced sample size might have concealed the potential HP by group interaction, we investigated the alignment by congruency interaction separately in CP and in controls. To this end, we conducted a 2 × 2 (alignment × congruency) repeated-measures ANOVA on discrimination performance (*d*′) separately within the CP group and within the control group. We also excluded the factor of visual field from further analysis because it failed to show a main effect or interaction with any factors in the previous ANOVA, suggesting equal participation of both hemispheres in the left-right composite face task across all groups. Performance (*d*′) on congruent and incongruent trials in the aligned and misaligned conditions is plotted separately for controls and CP in **Figure 3**.

#### *Controls*

In the control data (aggregated over 32 college student controls and 8 matched controls), there was a significant alignment by congruency interaction [*F*(1, 39) = 41.488, *p* < 0.001], indicative of HP of left-right composite faces. In other words, judgment of the cued half is strongly influenced by the irrelevant half when faces are aligned, and this influence is reduced when faces are misaligned. In addition, the main effect of congruency was significant, *F*(1, 39) = 150.191, *p* < 0.001. The main effect of alignment was also significant, *F*(1, 39) = 52.603, *p* < 0.001, with better performance in the aligned than misaligned condition. A follow-up paired-samples *t*-test revealed that the enhanced performance in the aligned vs. misaligned condition was only observed in congruent trials, *t*(39) = 10.091, *p* < 0.001, where relevant and irrelevant halves led to the same response (i.e., both are same or both are different). This means that the response on the relevant half is facilitated by the irrelevant half because their responses are congruent, and this facilitation is larger in the aligned than misaligned condition. In contrast, there was no difference in performance between the aligned and misaligned conditions in incongruent trials, *t*(39) = 0.333, *p* = 0.741, where relevant and irrelevant halves elicit different responses (same vs. different response).

<sup>2</sup>The black square brackets in the figure are for illustration purposes only. In the experiments, the cue was a yellow frame overlaid on top of the black outline of either the left or the right face half. We opted for the black square brackets here because the yellow frame would not stand out from the black background in the monochrome version of the figure.

### *CP*

In contrast with the profile of the control participants, the composite face effect was absent in the CP data, evidenced by a non-significant interaction between alignment and congruency, *F*(1, 7) = 2.095, *p* = 0.191. Note that the alignment by congruency interaction was significant in the eight age- and gender-matched controls, *F*(1, 7) = 5.723, *p* < 0.05, and therefore, the absence of this interaction in the CP group was not due to a lack of statistical power. Note that because in the upright version of CFMT in **Table 1**, MN (*z*-score = −1.00), SH (*z*-score = −0.34) and SC (*z*-score = −1.79) performed within 2 SD of the normal range, here we further used a leave-one-out procedure (MN/SH/SC) and repeated the analysis of the composite face effect. The pattern of alignment by congruency interaction was not affected by this procedure [without MN: *F*(1, 6) = 1.428, *p* = 0.277; without SH: *F*(1, 6) = 2.907, *p* = 0.139; without SC: *F*(1, 6) = 4.391, *p* = 0.081] and therefore we decided to include all three of these CPs in the final analysis. In addition, CPs showed a significant main effect of alignment, *F*(1, 7) = 6.879, *p* = 0.034, with better performance in the aligned condition than in the misaligned condition and a main effect of congruency, *F*(1, 7) = 9.981, *p* = 0.016, with higher discrimination sensitivity for congruent trials than incongruent trials. Nevertheless, the magnitude of both effects of alignment and congruency was much smaller than that of the control group. See **Figure 3** for a comparison among CP and the two control groups.

Because of the possible heterogeneity in HP in CP individuals, we also undertook an analysis of performance at the individual level and we report these data below. To do so, Crawford's modified *t*-test (Crawford and Howell, 1998; Crawford and Garthwaite, 2002) was used to assess the performance difference between each CP's score and the control sample. To this end, we created two indices critical for this task: specifically, for each participant, the congruency index was created by subtracting performance in the incongruent trials from that in the congruent trials, i.e., congruency index = congruent *d*′ − incongruent *d*′, and the holistic processing index was created by subtracting the difference between congruent and incongruent trials in the misaligned condition from that in the aligned condition, i.e., holistic processing index = (aligned congruent *d*′ − aligned incongruent *d*′) − (misaligned congruent *d*′ − misaligned incongruent *d*′).
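The two indices and the single-case comparison can be sketched as follows. The modified *t* statistic is the one given by Crawford and Howell (1998), *t* = (case − control mean) / (control SD × √((n + 1)/n)) with n − 1 degrees of freedom. The function names and example values are hypothetical, and the sketch stops at the *t* statistic; a p-value would additionally require a *t* distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def congruency_index(congruent_d, incongruent_d):
    """Congruent d' minus incongruent d'."""
    return congruent_d - incongruent_d


def holistic_processing_index(al_con, al_incon, mis_con, mis_incon):
    """Aligned congruency effect minus misaligned congruency effect."""
    return (congruency_index(al_con, al_incon)
            - congruency_index(mis_con, mis_incon))


def crawford_howell_t(case_score, control_scores):
    """Modified t comparing one case against a small control sample.

    Returns the t statistic and its degrees of freedom (n - 1).
    """
    n = len(control_scores)
    control_sd = stdev(control_scores)  # sample SD (n - 1 denominator)
    t = (case_score - mean(control_scores)) / (control_sd * math.sqrt((n + 1) / n))
    return t, n - 1


# Hypothetical d' values and control indices, for illustration only.
case_hp = holistic_processing_index(0.9, 0.6, 0.7, 0.5)
controls_hp = [0.8, 1.1, 0.6, 0.9, 1.2, 0.7]
t_value, df = crawford_howell_t(case_hp, controls_hp)
```

A negative *t* indicates that the case's index falls below the control mean, which is the direction of interest for a reduced holistic processing effect.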

As can be seen from **Table 2**, five out of eight CPs showed significant impairment in the holistic processing index from the Crawford's *t*-test (*p* < 0.05, two-tailed) and the individual data from each CP participant is shown in **Figure 4**. We note that three CPs (BQ, MN, and TD) do not show a statistically significant HP effect and two of these three, MN and TD, show a trend in the right direction and it is only participant BQ who shows a different profile. Closer scrutiny of BQ's data shows higher *d*′ for the aligned congruent than aligned incongruent trials, but his *d*′ for misaligned congruent trials is 0.00, which is very unusual compared to the other CPs and because of this, there is no significant composite effect. Based on these results, we can conclude that 7 CP individuals (to a greater or lesser degree) show a reduction in HP of faces.

**Table 2 | Crawford's** *t***-test scores of the aligned congruency index, misaligned congruency index, and the holistic processing index for each individual CP participant.**

*Color code: Crawford's t-scores (negative only) with p values < 0.05 (two-tailed) are denoted in red italics and those with p values < 0.05 (one-tailed) are denoted in blue italics.*

*\*Holistic processing index is calculated as aligned (congruent − incongruent) d′ − misaligned (congruent − incongruent) d′.*

For comparison purposes, we also computed the HP scores for each of the 40 controls using a leave-one-out procedure (compute means based on all controls with the exception of the target control and then assess the status of the left-out control relative to the mean and distribution of the group and this was repeated for each participant). Of the controls, 13 out of 40 do not show a Crawford significant HP result relative to the control group mean. In fact, there has been some recent consideration of the variability of performance (and lack of consistency at an individual level) of the standard composite effect (Ross et al., in press) and some discussion on ways to enhance the reliability and robustness of the finding, which holds strongly at the group level.
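The leave-one-out procedure described above can be sketched as a loop in which each control is treated in turn as the single case and compared against the remaining controls. This is a minimal illustration under our own naming, with the Crawford and Howell (1998) statistic written inline so the snippet runs on its own; the example scores are hypothetical.

```python
import math
from statistics import mean, stdev


def leave_one_out_t(scores):
    """Modified t for each score relative to all the other scores.

    Returns a list of (t, df) pairs, one per participant.
    """
    results = []
    for i, score in enumerate(scores):
        rest = scores[:i] + scores[i + 1:]  # everyone except participant i
        n = len(rest)
        t = (score - mean(rest)) / (stdev(rest) * math.sqrt((n + 1) / n))
        results.append((t, n - 1))
    return results


# Hypothetical holistic processing indices for five controls.
control_hp = [0.4, 0.9, 1.1, 0.7, 1.3]
for t, df in leave_one_out_t(control_hp):
    print(f"t({df}) = {t:.2f}")
```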

### **GENERAL DISCUSSION**

Congenital prosopagnosia is an intriguing neurodevelopmental disorder in which individuals are impaired at face perception apparently from birth, in the absence of any sensory or intellectual deficits. The reigning hypothesis is that the psychological mechanism that underlies the difficulty in face processing in these individuals is one in which holistic processing (HP) is impaired. Much research has provided evidence in support of this hypothesis including data showing that CP individuals do not show the expected inversion effect (Rouw and de Gelder, 2002; Behrmann et al., 2005; Avidan et al., 2011), do not show a global superiority effect in a Navon-compound letter task (Behrmann et al., 2005; Avidan et al., 2011) and do not show HP in a standard composite top-down task (for example, Ramon et al., 2010; Avidan et al., 2011; Palermo et al., 2011). Closer scrutiny, however, reveals several counterexamples. For example, Le Grand et al. (2006) reported that, of the eight CPs who participated in their study, surprisingly, only one CP showed an abnormal composite effect. Additionally, Susilo et al. (2011) reported that the CP in their study showed a composite effect across three different tasks (naming and two same/different judgments). Also, Schmalzl et al. (2008) tested a family of seven developmental prosopagnosia (DP) individuals (spanning four generations) and reported that only four individuals failed to show the normal composite effect, and, finally, Williams et al. (2007) found a normal composite effect in a case of DP. In light of the contrasting results reported to date, the purpose of the present investigation was to examine further the nature of holistic processing in CP vs. matched controls using a new left-right composite face task. In addition, we wished to assess possible differences between the groups in hemispheric modulation of the composite effect and to document the magnitude of the HP effect at an individual level. 
The vertical composite task was modeled after the well-known face chimeric effect in which two half faces presented to the left and right of fixation reveal superior processing of the half face that occupies the left visual field (right hemisphere).

At a group level, unlike the control individuals (*n* = 40, comprising matched controls and a large group of non-matched controls), the CP individuals did not show an interaction of congruency × alignment. Moreover, the CP performance (in *d*′) is significantly lower than that of the controls, some of whom are directly pairwise matched with the CP individuals. Of interest, the CP group does show significantly poorer performance when the faces are misaligned compared with when they are aligned, reflecting residual sensitivity to first-order properties of the face (Maurer et al., 2002). The CPs also show a main effect of congruency, with higher discrimination sensitivity for congruent trials than incongruent trials. Although some accept this signature as a measure of HP, in that the unattended face half influences performance on the attended face half, the congruency effect alone is not considered the gold-standard metric of HP, which is the alignment × congruency interaction.

Given the heterogeneity of individual CP cases, as reviewed above, we assessed each participant individually. The majority of CP individuals performed outside the normal range when a case-by-case analysis was done (7 out of 8 participants), further confirming the difficulties in HP. However, we note that almost a third of the controls also failed to show a composite effect when the individual control data were assessed (see Ross et al., in press for more detailed discussion of the reliability of the composite face task).

Surprisingly, but interestingly, we observed no differences between the controls and the CP in terms of modulation of the composite effect by hemisphere, i.e., performance was the same independent of whether the cued face half fell in the right or left visual field. While we were surprised by the absence of hemispheric modulation in the controls (see Liu et al., in press) given how closely this paradigm mirrors the known chimeric face result, of interest here is that the CPs, too, show no hemispheric modulation.

In sum, the CP individuals performed more poorly than the controls in a task of face matching that taps HP. These results support the claim that a breakdown in holistic processing may be at the basis of CP. The paradigm we designed appears to be effective in uncovering this difficulty and confirms the deficit in HP noted in many previous reports (for example, Avidan et al., 2005; Ramon et al., 2010; Palermo et al., 2011; Kimchi et al., 2012). We note that a decrement in HP in CP may be quite ubiquitous and may even be evident in the failure of these individuals to determine aspect ratio (conjoint representation of the length and the width of rectangles) (Tanzer et al., 2014) and in their ability to configurally represent other non-face stimuli (Lange et al., 2009).

It is also the case that CPs may not only be impaired at HP but may even show some deficits in featural processing as well. For example, in the context of a Garner speeded-classification task using facial stimuli, unlike in the controls, the CP group exhibited no Garner interference in either the featural or the configural judgments. When classifying upright faces that varied in features (shape of eyes, nose, and mouth) and configuration (intereyes and nose–mouth spacing), the CPs could attend to configural information and make configural judgments without interference from irrelevant variation in featural information; similarly, they could attend to featural information and make featural judgments without interference from irrelevant variation in configural information. This pattern of performance, which is in clear contrast to the symmetric Garner interference observed in matched controls (and in young controls), indicates that featural information and configural information are separable in CP's upright face processing. That is, CPs do not perceive and process faces holistically. Rather, CPs process facial features and facial configuration independently.

Taken together, the findings of the current study are consistent with previous reports of altered visual perception in CP, specifically in the domain of HP. With this basic understanding of the possible underpinnings of the impairment, there have been some recent attempts to remediate the face processing deficits in CP with specific focus on retraining HP. DeGutis et al. (2007) devised a behavioral task that required discrimination of faces by their spatial configuration. This task was completed repeatedly by a single prosopagnosic individual and interestingly, after extensive training, not only did the individual improve in behavioral performance but also evinced a face-selective N170 after training that was not evident pre-training. There was also an increase in functional connectivity between ventral occipital temporal face-selective regions (right occipital face area and right fusiform face area) post-training. More recently, DeGutis et al. (2014) explored whether it is possible to enhance face processing in a large group of CPs using a 3-week online face-training program targeting holistic face processing. The trained CPs showed moderate but significant overall training-related improvements on measures of front-view face discrimination and some showed significantly increased holistic face processing to the point of being similar to that of unimpaired control subjects. The findings also showed modest but consistent self-reported diary improvements. Clearly, further work along similar lines will continue to add to our understanding of the underlying deficit in CP and ways in which this can be offset through intervention.

## **CONCLUSIONS**

Consistent with the suggestion that impaired HP may underlie CP's difficulty in face processing, using a novel left-right composite face paradigm, we observed normal HP in control observers but reduced HP in CP. In addition to the group level performance, detailed examination of individual level performance showed that most CP individuals evinced no HP although this was also true in the individual profiles of about one third of the controls. Contrary to our prediction on differential hemispheric contribution to HP, neither CP nor control group showed any difference in performance as a function of hemifield of the cued face half, suggesting equal participation of both hemispheres to HP. In conclusion, the present study verified the use of a novel left-right composite face paradigm, which may potentially contribute to the study of HP in individuals with normal face perception and atypical face perception.

#### **ACKNOWLEDGMENTS**

This work was supported by a grant from the National Science Foundation to Marlene Behrmann (BCS-1354350) and by a grant from the National Science Foundation Temporal Dynamics of Learning Center, SBE0542013 (G. Cottrell: Co-PI: Marlene Behrmann). We thank Dr. James Tanaka for granting access to the database of face stimuli, which we used in this study. We thank Dr. Jennifer Richler for her helpful comments and Dr. Bradley Duchaine for providing detailed information on the Cambridge Face Memory Test. We also thank Akshat Gupta and Ryan Egan for helping with the face stimuli creation and data collection.

## **REFERENCES**


test score differences. *Neuropsychologia* 40, 1196–1208. doi: 10.1016/S0028-3932(01)00224-X


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 June 2014; accepted: 05 September 2014; published online: 29 September 2014.*

*Citation: Liu TT and Behrmann M (2014) Impaired holistic processing of left-right composite faces in congenital prosopagnosia. Front. Hum. Neurosci. 8:750. doi: 10.3389/fnhum.2014.00750*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Liu and Behrmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Multi-voxel pattern analysis (MVPA) reveals abnormal fMRI activity in both the "core" and "extended" face network in congenital prosopagnosia

## *Davide Rivolta<sup>1,2</sup>\*, Alexandra Woolgar<sup>2</sup>, Romina Palermo<sup>3</sup>, Marina Butko<sup>2</sup>, Laura Schmalzl<sup>4</sup> and Mark A. Williams<sup>2</sup>*

*<sup>1</sup> School of Psychology, University of East London, London, UK*

*<sup>2</sup> Perception in Action Research Centre, and ARC Centre of Excellence in Cognition and its Disorders, Department of Cognitive Science, Faculty of Human Sciences, Macquarie University, Sydney, NSW, Australia*

*<sup>3</sup> School of Psychology, and ARC Centre of Excellence in Cognition and its Disorders, University of Western Australia, Crawley, WA, Australia*

*<sup>4</sup> Department of Family and Preventive Medicine, University of California San Diego, La Jolla, CA, USA*

#### *Edited by:*

*Aina Puce, Indiana University, USA*

#### *Reviewed by:*

*Marlene Behrmann, Carnegie Mellon University, USA Garga Chatterjee, Indian Statistical Institute, India*

#### *\*Correspondence:*

*Davide Rivolta, School of Psychology, University of East London, Water Lane, E15 4LZ London, UK e-mail: d.rivolta@uel.ac.uk*

The ability to identify faces is mediated by a network of cortical and subcortical brain regions in humans. It is still a matter of debate which regions represent the functional substrate of congenital prosopagnosia (CP), a condition characterized by a lifelong impairment in face recognition, and affecting around 2.5% of the general population. Here, we used functional Magnetic Resonance Imaging (fMRI) to measure neural responses to faces, objects, bodies, and body-parts in a group of seven CPs and ten healthy control participants. Using multi-voxel pattern analysis (MVPA) of the fMRI data we demonstrate that neural activity within the "core" (i.e., occipital face area and fusiform face area) and "extended" (i.e., anterior temporal cortex) face regions in CPs showed reduced discriminability between faces and objects. Reduced differentiation between faces and objects in CP was also seen in the right parahippocampal cortex. In contrast, discriminability between faces and bodies/body-parts and objects and bodies/body-parts across the ventral visual system was typical in CPs. In addition to MVPA, we also ran traditional mass-univariate analysis, which failed to show any group differences in face and object discriminability. In sum, these findings demonstrate (i) impairments in face-object representation in CP, which encompass both the "core" and "extended" face regions, and (ii) superior power of MVPA in detecting group differences.

**Keywords: face perception, body perception, object perception, prosopagnosia, MVPA, multivariate analysis, unfamiliar face, fMRI**

## **INTRODUCTION**

People are typically able to recognize hundreds of familiar faces with ease. Regions within the inferior occipital cortex (i.e., occipital face area, OFA), fusiform gyrus (i.e., fusiform face area, FFA), and anterior temporal lobe (AT) are part of a neural network that supports this extraordinary ability (Haxby et al., 2000; Ishai, 2008; Kanwisher, 2010). In particular, the OFA and the FFA are argued to represent "core" regions supporting the perception and recognition of visually presented faces, whereas the AT is considered an "extended" region, which mediates aspects of identity, name, and biographical information (Haxby et al., 2000; Kriegeskorte et al., 2007). Functional Magnetic Resonance Imaging (fMRI) studies have shown that these regions play a critical role in the recognition of facial identity. For instance, OFA and FFA fMRI activity is correlated with behavioral measures of face recognition ability (Yovel and Kanwisher, 2005; Kriegeskorte et al., 2007; Furl et al., 2011). In addition, brain injuries encompassing at least one of these regions often result in severe face recognition deficits (i.e., acquired prosopagnosia) (Barton, 2008; Rossion, 2008).

Face recognition difficulties are also apparent in approximately 2–3% of the general adult population with no reported brain injuries (Kennerknecht et al., 2006; Bowles et al., 2009; Wilmer et al., 2010). This specific difficulty in recognizing faces, in the context of otherwise intact sensory and intellectual functioning, is known as developmental or *congenital prosopagnosia* (CP) (McConachie, 1976; Duchaine, 2000; Behrmann and Avidan, 2005; Duchaine and Nakayama, 2006b; Schmalzl et al., 2008; Rivolta et al., 2010, 2012a). Some people with CP do not have difficulty differentiating between other similar objects (Behrmann et al., 2005; Wilson et al., 2010), whereas some people do (Duchaine et al., 2007; Lobmaier et al., 2010).

The neuro-functional correlates of CP are still far from clear. Two single case studies of CP reported atypical functioning of the FFA (Hadjikhani and De Gelder, 2002; Bentin et al., 2007). The FFA was also implicated in a study by Furl et al. (2011), who functionally localized ROIs (i.e., by contrasting faces—cars fMRI activity) and found weaker peak activity and a smaller number of fusiform gyrus face-voxels in a group of 15 CPs as compared to matched controls in these ROIs. However, there was no difference between CPs and controls when a whole brain analysis was conducted. Repetition suppression paradigms have typically indicated that both CPs and controls show a diminished fMRI signal to the repeated presentation of faces within the OFA and FFA (Avidan et al., 2005; Avidan and Behrmann, 2009; Furl et al., 2011). In contrast, other studies have not demonstrated atypical activity in core regions (i.e., the OFA or FFA) of CPs (Hasson et al., 2003; Avidan et al., 2005; Avidan and Behrmann, 2009). Typical face sensitive occipital and fusiform activity has also been demonstrated with magnetoencephalography (MEG) in a group of six CPs when considering source-reconstructed event-related fields (ERFs) activity (Rivolta et al., 2012b). Thus, previous fMRI and MEG studies suggest that posterior face activity may be necessary, but not sufficient, for normal face recognition (see Rossion, 2008 for similar arguments based on acquired prosopagnosia patients), and leaves open the possibility that regions outside the "core" OFA and FFA may play an important role in the behavioral face recognition difficulties underlying CP.

Support for the involvement of "extended" systems in face identity recognition comes from a recent fMRI study in which a group of seven CPs showed reduced AT activity for famous faces compared to controls, and also reduced AT functional connectivity with "core" face regions (Avidan et al., 2013). This study also showed relatively intact OFA and FFA activity, thus providing a functional dissociation between spared "core" face regions and impaired "extended" regions in CP. Aberrant functioning of the AT in CP is also in line with anatomical data showing AT volume reduction in CP (Behrmann et al., 2007) and reduced anatomical connectivity of the AT regions in CP (Thomas et al., 2009). These data thus support proposals that CP is a disconnection syndrome where, due to anatomical and functional deficiencies, intact "core" face regions cannot pass their information to more anterior "extended" regions (Avidan et al., 2013; Rivolta et al., 2013).

Taken together, we see an inconsistent pattern across studies, with some showing OFA and FFA dysfunction, but others showing only AT abnormalities. While these differences may have been driven by the heterogeneity of CP itself (Schmalzl et al., 2008), they may also be the result of the power and sensitivity of the fMRI analysis approach adopted so far in the CP literature. In particular, all previous fMRI studies investigating face processing skills in CP have used traditional mass-univariate analysis. Recent evidence has, however, suggested that multivariate analysis of fMRI datasets (multi-voxel pattern analysis, MVPA) provides a more sensitive analytical approach than traditional univariate analysis (Cox and Savoy, 2003; Haynes and Rees, 2006; Norman et al., 2006). In addition, univariate analyses may be less sensitive to activity in AT regions (Mur et al., 2009), which are susceptible to signal distortion due to the ear canals and sinuses (Ojemann et al., 1997). Here, we use MVPA for the first time to investigate face processing activity in a group of seven CPs and 10 matched controls.

In addition to presenting faces and objects/scenes as visual stimuli (as in most previous neuroimaging CP investigations), in the current study we also included bodies and body parts. Bodies not only match faces for visual exposure and perceptual experience (Reed et al., 2012), but there is also evidence suggesting that body perception shares perceptual mechanisms (i.e., holistic processing) with faces (Reed et al., 2003; Willems et al., 2014), and that the processing of bodies can be impaired in CP (Righart and de Gelder, 2007; Van den Stock et al., 2008). Thus, participants were presented with visual stimuli from four different categories (faces, headless bodies, body parts, and objects) and their task was to press a button whenever a stimulus was immediately repeated (i.e., one-back task).

## **METHODS AND RESULTS**

## **PARTICIPANTS**

Seven people with CP (4 Females, Mean age = 39.7, Range: 22–58, *SD* = 14.30) and 10 people who did not report face processing impairments (4 Females, Mean age = 33.6, Range: 27–55, *SD* = 9.55) completed the experiment. All participants reported normal or corrected-to-normal vision and no history of neurological or psychiatric conditions, and all except one CP were right-handed. All participants provided written consent after the experimental procedure was explained. The study received ethics approval from Macquarie University and conforms to The Code of Ethics of the World Medical Association (Declaration of Helsinki), printed in the *British Medical Journal* (18th July 1964).

## **TASKS USED TO CONFIRM CP**

All participants with CP were recruited through the online Australian Prosopagnosia Register (https://www.maccs.mq.edu.au/research/projects/prosopagnosia/register), where they registered because they were experiencing face recognition difficulties in everyday life. For detailed behavioral data of all CPs see Rivolta et al. (2012a). The CPs completed three tests of face identity recognition: (i) the MACCS Famous Face Test 2008 (MFFT-08), which measures the ability to identify famous faces (Palermo et al., 2011); (ii) the Cambridge Face Memory Test (CFMT, Duchaine and Nakayama, 2006a), which measures memory for newly learned faces; and (iii) the Cambridge Face Perception Test (CFPT, Duchaine et al., 2007), which assesses face-matching abilities. A participant was considered CP if performance on at least one of these three diagnostic tasks was at least 2 *SD* below the mean (see **Table 1** for age-standardized z-scores calculated from the normative data in Bowles et al., 2009).
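The inclusion rule described above can be sketched in a few lines. This is only an illustration: the function name, the dictionary layout, and the example z-scores are hypothetical and are not taken from **Table 1**; the sketch assumes that "at least 2 *SD* below the mean" corresponds to a z-score of −2 or lower.

```python
def meets_cp_criterion(z_scores, cutoff=-2.0):
    """Return True if at least one diagnostic z-score is at or below the cutoff."""
    return any(z <= cutoff for z in z_scores.values())


# Hypothetical z-scores for two illustrative participants (not real data).
participant_a = {"MFFT-08": -2.4, "CFMT": -1.1, "CFPT": -0.6}
participant_b = {"MFFT-08": -0.9, "CFMT": -1.5, "CFPT": -1.8}

print(meets_cp_criterion(participant_a))  # impaired on one test
print(meets_cp_criterion(participant_b))  # within 2 SD on all three tests
```

Under this rule a single impaired test is sufficient, so participants classified as CP may differ in which aspect of face identity processing is affected.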

Further tasks were administered to rule out the possibility that their face processing difficulties were a consequence of low-level vision problems, general cognitive difficulties or impaired social functioning. All CPs showed normal contrast sensitivity as assessed by the *Functional Acuity Contrast Test* (FACT, Vision Sciences Research Corporation 2002) and normal color perception with the *Ishihara Test for Color Blindness* (Ishihara, 1925). Performance on the length, size, orientation and picture naming (long version) subtests of the *Birmingham Object Recognition Battery* (BORB) (Riddoch and Humphreys, 1993) confirmed that basic object recognition skills were intact. The *Raven Colored Progressive Matrices* (Raven et al., 1998) further indicated that the IQ of all participants with CP was within the normal range. None of the CPs scored within the autistic range on the *Autism-Spectrum Quotient* (AQ, Baron-Cohen et al., 2001). Thus, the everyday face recognition difficulties reported by the CPs are not due to low-level visual difficulties, low IQ, or impaired social functioning.



*Scores falling more than 2 SD below the mean are displayed in italics.*

None of the participants showed any sign of anatomical brain alterations. Anatomical volumes (i.e., structural MRIs) were routinely checked by an expert physician at St Vincent's Hospital (Sydney).

#### **MRI DATA ACQUISITION**

Functional images were acquired with a 3-Tesla Philips scanner at St Vincent's Hospital (Sydney, New South Wales, Australia). At the beginning of the experimental session a high-resolution anatomical scan was acquired for each participant using a 3D-MPRAGE (magnetization prepared rapid gradient echo) sequence. Subsequently, high-resolution functional scans were obtained using an 8-channel head coil and a gradient echo planar imaging (EPI) sequence (114 time points per run; Inter-scan interval: 2 s, *TR* = 3000 ms, *TE* = 32 ms, voxel size = 1.4 × 1.4 × 2.0 mm; inter-slice gap: 20%). The 15 oblique axial slices were aligned approximately parallel to the anterior / posterior commissure line.

#### **fMRI EXPERIMENT**

#### *Behavioral task: the one-back task*

During the experiment participants were presented with visual stimuli belonging to four different categories: faces, headless bodies, individual body parts (hands and feet) and objects. All stimuli were grayscale photographs matched for brightness and contrast. The set of stimuli included a total of 240 images, 60 for each of the four stimulus categories (half of the "face" and "body" stimuli depicted females and half depicted males). Stimuli covered approximately 4.1° of visual angle.

The presentation of stimuli during the fMRI acquisition was programmed with Presentation software (Neurobehavioral Systems, Albany, CA; http://www.neurobs.com/) and run on a 15-inch Macintosh PowerBook with screen resolution set to 1280 × 854 pixels. Stimuli were back-projected via a projector onto a screen positioned 1.5 m behind the fMRI scanner, and participants viewed the screen through a mirror mounted on the head-coil and positioned at 10 cm distance from their head. An optic fiber button box was used to record the participants' responses.

Participants' brain activity was recorded in 8 functional runs with a duration of 336 s each. During each run, 114 functional scans (TRs) were acquired. The stimulus categories were presented in a blocked design with a total of 32 blocks of 16 s each. Each of the 32 blocks contained 16 stimuli of a specific category. Stimuli were presented in the center of the screen for 500 ms with a 500 ms inter-stimulus interval (ISI). The maintenance of attention to the stimuli was ensured by presenting participants with a standard "one-back" task. The task required pressing a button whenever a particular image was repeated consecutively (10% of the trials were repeats). The order of blocks was counterbalanced across subjects. In addition, a fixation block (where a fixation cross was presented in the middle of the white screen) was presented at the beginning of each block and at the end of each fourth block (which corresponded to the end of the functional run).
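The block timing and the one-back rule described above can be illustrated with a short sketch. The function names and the stimulus labels are hypothetical; the code only shows that a 16 s block with 500 ms stimuli and a 500 ms ISI holds 16 stimuli, and how consecutive repeats (the trials requiring a button press) can be located in a sequence.

```python
def stimuli_per_block(block_s, stimulus_s, isi_s):
    """Number of stimulus slots that fit in one block."""
    return round(block_s / (stimulus_s + isi_s))


def one_back_targets(sequence):
    """Indices of items that repeat the immediately preceding item."""
    return [i for i in range(1, len(sequence)) if sequence[i] == sequence[i - 1]]


n_stimuli = stimuli_per_block(block_s=16.0, stimulus_s=0.5, isi_s=0.5)

# Hypothetical fragment of a face block; labels are placeholders.
demo_block = ["face_01", "face_02", "face_02", "face_03", "face_04", "face_04"]
targets = one_back_targets(demo_block)

print(n_stimuli, targets)
```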

#### *One-back task performance*

The one-back task was administered to ensure that participants were paying attention to the stimuli. Performance on the one-back task was analyzed by running a repeated-measures ANOVA with Group (controls, CPs) as a between-subject factor and Category (face, body, body part, object) as a within-subject factor. Performance on the one-back task did not differ between Controls (*M* = 0.771, s.e.m. = 0.185) and CPs (*M* = 0.722, s.e.m. = 0.221), *F*(1, 15) = 2.9, *p* = 0.109. This was the case across all stimulus categories (no main effect of Category [*F*(3, 45) = 1.79, *p* = 0.163]; no Category by Group interaction [*F*(3, 45) = 1.32, *p* = 0.277]), which is not surprising given that the one-back task was relatively simple and could be completed by attending to only part of the image.

#### *fMRI processing and multi-voxel pattern analysis (MVPA)*

Preprocessing of the fMRI data was carried out using SPM8 (Wellcome Department of Imaging Neuroscience, London, UK; www.fil.ion.ucl.ac.uk). All EPI images were spatially realigned to the mean functional image and smoothed with a 4 mm full-width at half maximum (FWHM) kernel. The timecourse of each voxel was high-pass filtered with a cutoff of 128 s.

Multi-voxel pattern analysis (MVPA) was used to discriminate patterns of activation pertaining to faces, objects, bodies, and body parts in each participant separately. These analyses used spatially realigned, smoothed native space images which were additionally smoothed with a 4 mm (FWHM) kernel. First, for each participant, the multiple regression approach of SPM8 was used to estimate the response to each of the face, object, body, body part, and fixation blocks in each of the 8 scanning acquisition runs, with additional regressors of no interest included to model the run means. Blocks were modeled using 16 s box car functions convolved with the canonical haemodynamic response function of SPM. This yielded 8 beta estimates for each of the face, object, body, and body part conditions (one for each run). Next, MVPA was used to estimate the pair-wise discriminability of these beta estimates using a roaming searchlight (Kriegeskorte et al., 2006). The approach identifies voxels where the pattern of activation in their local neighborhood can discriminate between conditions.
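To make the block regressors concrete, the sketch below builds a 16 s box car and convolves it with a double-gamma haemodynamic response function. It is an illustration only, not the SPM8 implementation: the HRF shape parameters (response peaking around 6 s, undershoot around 16 s, ratio 1/6), the 0.1 s integration step, the single block onset at 16 s, and all function names are assumptions made for this example.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s); shape parameters are assumed defaults."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit rate.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def block_regressor(onsets_s, block_s, n_scans, tr_s, dt=0.1):
    """Box car convolved with the HRF, then sampled once per scan."""
    n_fine = int(round(n_scans * tr_s / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + block_s) / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0

    hrf = [double_gamma_hrf(k * dt) for k in range(int(round(32.0 / dt)))]

    # Discrete convolution on the fine time grid (Riemann sum with step dt).
    convolved = [0.0] * n_fine
    for i, b in enumerate(boxcar):
        if b == 0.0:
            continue
        for k, h in enumerate(hrf):
            if i + k >= n_fine:
                break
            convolved[i + k] += b * h * dt

    step = tr_s / dt
    return [convolved[int(round(s * step))] for s in range(n_scans)]


# One hypothetical 16 s block starting 16 s into a run of 114 scans (TR = 3 s).
regressor = block_regressor(onsets_s=[16.0], block_s=16.0, n_scans=114, tr_s=3.0)
print(len(regressor), round(max(regressor), 3))
```

In a full model, one such regressor per condition and run would enter the multiple regression, and the fitted weights are the beta estimates that the searchlight analysis operates on.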

The analysis of face vs. object proceeded as follows. For each participant, the pattern of beta values from the 16 relevant images (8 faces and 8 objects) was extracted from a spherical ROI (radius, 10 mm) centered in turn on each voxel in the brain, yielding 16 multivoxel vectors. All the voxels in each sphere contributed to each vector, without feature selection. A linear support vector machine, LinearCSVMC (Chang and Lin, 2011), was trained to discriminate between the vectors pertaining to faces and those pertaining to objects. We used a leave-one-out 8-fold splitter: the classifier was trained using the data from 7 of the 8 runs and was subsequently tested on its accuracy at classifying the unseen data from the remaining run. This process was performed in 8 iterations, using all 8 possible combinations of train and test runs. The classification accuracies from the 8 iterations were then averaged to give a mean accuracy score for that sphere, which was assigned to the central voxel. This procedure was repeated for every voxel in the brain, yielding whole-brain classification accuracy maps for each individual. This analysis was carried out using custom Matlab scripts wrapping the LIBSVM library (Chang and Lin, 2011). Finally, to combine data across individuals, the normalization parameters derived from normalizing the mean EPI image for each participant were used to normalize the classification accuracy maps. Accuracy maps for control and CP participants, separately, were entered into one-sample *t*-tests comparing group accuracy scores to chance (50%). The resulting whole brain statistical maps were then thresholded at *t* > 8.403, equivalent to *p* < 0.05 with Family Wise Error (FWE) correction in the control group analysis. This analysis reveals voxels where the local patterns of activation reliably discriminate between faces and objects within each group separately. To identify regions where face vs. object discrimination was significantly greater in controls relative to CPs, the accuracy maps were additionally entered into a two-sample *t*-test (control minus patient). The resulting whole brain statistical map was then thresholded to visualize clusters surviving cluster level correction for multiple comparisons at *p* < 0.05. The same procedure was carried out for the remaining pair-wise discriminations (faces vs. bodies, faces vs. body parts, objects vs. bodies, and objects vs. body parts).
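The leave-one-run-out scheme applied within each searchlight sphere can be sketched as follows. This is an illustration rather than the analysis code: to keep the example dependency-free, a nearest-centroid classifier stands in for the linear support vector machine, the two-voxel "sphere" and its beta values are made up, and all names are hypothetical. With two balanced classes, chance accuracy is 0.5.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_run_out_accuracy(patterns, labels, runs):
    """Mean accuracy over folds, holding out one run per fold."""
    fold_accuracies = []
    for held_out in sorted(set(runs)):
        samples = list(zip(patterns, labels, runs))
        train = [(p, l) for p, l, r in samples if r != held_out]
        test = [(p, l) for p, l, r in samples if r == held_out]

        # Stand-in classifier: one centroid per condition from the training runs.
        centroids = {
            lab: centroid([p for p, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        correct = 0
        for p, l in test:
            predicted = min(centroids, key=lambda c: squared_distance(p, centroids[c]))
            correct += predicted == l
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)


# Made-up beta patterns for one sphere: 8 runs x 2 conditions = 16 vectors.
patterns, labels, runs = [], [], []
for run in range(8):
    patterns.append([1.0 + 0.01 * run, 0.2])
    labels.append("face")
    runs.append(run)
    patterns.append([0.2, 1.0 + 0.01 * run])
    labels.append("object")
    runs.append(run)

accuracy = leave_one_run_out_accuracy(patterns, labels, runs)
print(accuracy)
```

In a searchlight analysis the returned mean accuracy would be written to the central voxel of the sphere, and the procedure repeated for every voxel.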

## *MVPA results*

*Within-group analyses: controls and CPs.* Controls showed an above chance discrimination pattern between faces and objects over the fusiform gyri and inferior occipital gyri (see **Figure 1** and **Table 2**). Controls also showed above chance discrimination between faces and bodies in the fusiform gyri, left middle occipital gyrus and lateral inferior occipital gyri (see **Figure 1** and **Table 2**), and above chance discrimination between faces and body parts over fusiform gyri, left inferior temporal gyrus, lingual gyri, left superior occipital gyrus, right middle occipital gyrus, and lateral inferior occipital gyri (see **Figure 1** and **Table 2**). Controls' pattern of activity could discriminate above chance between objects and bodies over the left inferior occipital gyrus, right middle occipital gyrus, and right fusiform gyrus (see **Figure 2** and **Table 2**). Finally, controls showed an above chance discrimination pattern between objects and body parts over the inferior occipital gyrus (bilateral), fusiform gyrus (bilateral), right lingual gyrus, left inferior temporal gyrus, and right middle temporal gyrus (see **Figure 2** and **Table 2**).

CPs' MVPA activity over the right fusiform gyrus, left middle occipital gyrus, and left inferior occipital gyrus could discriminate between faces and objects at levels above chance (see **Figure 1** and **Table 2**). CPs also showed above chance discrimination between faces and bodies in the right fusiform gyrus, right lingual gyrus, left middle occipital gyrus, and inferior occipital gyri (see **Figure 1** and **Table 2**), and above chance discrimination between faces and body parts over the left inferior occipital gyrus and right lingual gyrus (see **Figure 1** and **Table 2**). CPs' pattern of fMRI activity could discriminate between objects and bodies over the right inferior occipital gyrus. Finally, CPs showed an above chance discrimination pattern between objects and body parts over the inferior occipital gyrus (bilateral), fusiform gyrus (bilateral), right lingual gyrus, left inferior temporal gyrus, and right middle temporal gyrus (see **Figure 2** and **Table 2**).

*Between-group analyses: controls vs. CPs.* The between-groups comparison indicated stronger face-object discrimination in controls than in CPs. This group difference was evident in the fusiform gyri, right inferior occipital gyrus, right inferior temporal gyrus, and right parahippocampal gyrus (**Figure 3** and **Table 2**). The two groups' MVPA activity did not differ when discriminating faces vs. bodies, faces vs. body parts, objects vs. bodies, or objects vs. body parts.

## *fMRI mass-univariate analysis*

To compare our MVPA results to standard fMRI univariate findings, we performed a group level whole-brain mass-univariate analysis as implemented in SPM. To define face-sensitive regions, we compared faces vs. objects. Processing of all EPI images followed standard SPM procedure. All EPI images were normalized to the T1-weighted MNI structural template and smoothed with a 4 mm Gaussian filter. As for the multivariate analysis, the multiple regression approach of SPM8 was used to estimate the response to each block in each of the 8 scanning acquisition runs, for each participant, with additional regressors of no interest included to model the run means. Blocks were modeled using 16 s box car functions convolved with the canonical haemodynamic response function of SPM. This yielded 8 beta estimates for each condition, one for each run. To find face-discriminating regions in each group (controls, patients), a one-sample *t*-test was performed for each group separately using face minus object contrasts as reference images. The resulting map was thresholded at *t* > 8.403, equivalent to *p* < 0.05 with FWE correction. A between-groups (controls minus patients) comparison using a two-sample independent *t*-test with unequal variance was performed with face minus object contrast images. The resulting whole brain statistical map was then thresholded to visualize clusters surviving cluster level correction for multiple comparisons at *p* < 0.05. In addition, face selective regions were also investigated in each subject separately (i.e., *single-subject analysis*) by contrasting the BOLD signal associated with presentation of faces compared to objects at the single subject level (*p* < 0.05 FWE).
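For a single voxel, the group-level statistics described above reduce to two textbook formulas, sketched here under stated assumptions. The one-sample statistic tests values against a reference (zero for contrast images; 0.5 would be the reference for classification accuracies), and the two-sample statistic is Welch's unequal-variance *t* with Welch-Satterthwaite degrees of freedom. The input values are made up, the function names are hypothetical, and the sketch does not implement the FWE or cluster-level corrections applied to the whole-brain maps.

```python
import math
import statistics


def one_sample_t(values, reference=0.0):
    """t statistic and degrees of freedom for mean(values) vs. a reference."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - reference) / standard_error, n - 1


def welch_t(group_a, group_b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Made-up values at one voxel (not study data).
t_one, df_one = one_sample_t([0.6, 0.7, 0.8], reference=0.5)
t_two, df_two = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t_one, 3), df_one, round(t_two, 3), round(df_two, 3))
```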

## *fMRI results: mass univariate analysis*

*Within-group analyses: Controls and CPs.* At the group level, using the same threshold that was used in the MVPA analysis (*t* > 8.403), we could not find any statistically significant fMRI activity (face-sensitive activity could not be found even with a more permissive threshold of *p* < 0.001 uncorrected). Since this lack of group activity could potentially be due to the between-subject variability in the location of face-sensitive regions, we additionally performed single-subject analyses, where we compared face vs. object activity. Results, in line with previous studies (e.g., Avidan et al., 2013), indicated that all controls show "core" face activity in the right OFA and FFA. Five out of seven CPs also showed OFA and four CPs showed FFA (**Table 3**).


**Table 2 | Anatomical regions (Label), MNI coordinates (x, y, z),** *z***-values (***z***-value), Brodmann areas (BA), clusters sizes (KE), and sides (L, Left; R, Right) of the within- (Controls and CPs) and between- (Controls vs. CPs) group effects.**


*Contrasts reported: face vs. object, face vs. body, face vs. body part, object vs. body, and object vs. body part.*

*Between-group analyses: controls vs. CPs.* The group comparison did not show any statistically significant difference between controls and CPs. Thus, as predicted, mass-univariate analysis is not as sensitive as MVPA in detecting group differences. Given the small number of single-subject localized face-sensitive regions in CPs (see **Table 3**), we did not run any statistical analysis to compare the two groups.

## **DISCUSSION**

We investigated the neural characteristics of CP by examining the pattern of activity to faces, objects, headless bodies, and body parts using MVPA. We found that the pattern of fMRI activity within both the "core" and "extended" face regions showed reduced sensitivity in discriminating faces and objects in a group of seven CPs as compared to a group of control participants. For the first time, we also report that this pattern of poor discrimination between faces and objects in CPs is also evident in the right parahippocampal gyrus. The two groups did not show any difference in face-body, face-body part, object-body, and object-body part discriminations. Given that mass-univariate results failed to report any group difference, we can also conclude that MVPA represents a more sensitive approach than traditional univariate statistics in detecting group differences (Norman et al., 2006). Note that since only the face-object contrast showed group differences and the univariate analysis failed to report differences between controls and CPs, we can rule out that the group differences are explained in terms of general activity differences.

We acknowledge that face-sensitive regions (e.g., OFA and FFA) are traditionally defined using mass-univariate analysis (Kanwisher et al., 1997). In the current study, due to the lack of face-sensitive (i.e., univariate) regions, we could not localize, at a *group level*, OFA, FFA and AT. In addition, we could not ascertain whether MVPA-defined face-object discriminant regions (**Figures 2**, **3**) include OFA, FFA, and AT. However, in order to compare the current study to previous findings in CP, we label the MVPA activities in the lateral occipital, fusiform and AT cortex as, respectively, OFA, FFA, and AT (**Figures 1**, **2**).

Results showed that, in controls, OFA and FFA activity could discriminate between face and non-face (i.e., objects, bodies, body parts) stimuli above chance (**Figure 1**). This result is in line with previous human neuroimaging (Pitcher et al., 2009), lesion (Barton, 2008), and animal (Tsao et al., 2008a) studies indicating the critical role of the ventral visual system for face, body, and object processing (see Yovel and Freiwald, 2013 for a review). Despite the finding that occipito-temporal regions in people with CP could be used to discriminate face vs. non-face stimuli (**Figure 1**), the crucial direct comparison between CPs and control participants demonstrated a reduced face-object discriminatory pattern in CP, which was evident in the right OFA, bilateral FFA, right AT, and right parahippocampal gyrus (**Figure 2**). The finding of OFA and FFA functional aberrations is in line with previous single case studies showing reduced (or absent) posterior face activity in CP (Hadjikhani and De Gelder, 2002; Bentin et al., 2007). However, this result is in disagreement with recent studies in groups of CPs which show typical "core" albeit impaired "extended" face regions (Avidan and Behrmann, 2009; Avidan et al., 2013), and points toward the better sensitivity of MVPA with respect to univariate analysis of group fMRI data. In addition, "core" face-region aberrations demonstrate the crucial involvement of early face regions in CP, thus potentially arguing against a "disconnection syndrome" account, which characterizes CP as the result of the functional isolation between (relatively spared) posterior face regions and (impaired) anterior face nodes (Avidan et al., 2013; Rivolta et al., 2013).

**Table 3 | Core face regions (i.e., OFA, FFA, STS) activity in the right (R) and left (L) hemisphere for both controls and CPs.**

*"x" indicates the presence of a particular face region in a subject, whereas a blank space indicates its absence (activity thresholded at p < 0.05 FWE).*

In agreement with Avidan et al. (2013), we report atypical AT face-sensitive activity in CP. This finding further suggests the pivotal role of AT for typical face processing (Williams et al., 2006). However, in contrast to Avidan et al. (2013), we also showed AT face-object group differences for unfamiliar, and not just famous, faces. Human (Rajimehr et al., 2009) and monkey (Tsao et al., 2008b) studies suggest that the AT patches respond to face stimuli in general, but are particularly sensitive to face identity. Given that the current study did not adopt familiar/famous faces and did not involve any identity or learning process, our finding of diminished unfamiliar-face vs. object discrimination in CP further demonstrates the sensitivity of MVPA for the decoding of atypical neurophysiological properties of the human face recognition system.

A core face region that did not show MVPA face-object discriminant activity in either CPs or controls was the superior temporal sulcus (STS). The STS has been previously implicated in changeable aspects of face processing (Hoffman and Haxby, 2000; Puce and Perrett, 2003), facial emotion expression (Said et al., 2010) and facial dynamics (Schultz et al., 2013) (see Haxby et al., 2000 for a review). Given that we used static stimuli that did not show facial expressions, it is likely that our experimental setting was not the most appropriate for engaging STS activity.

Overall, these results demonstrate for the first time with MVPA that both "core" and "extended" face regions show an abnormal pattern of fMRI activity in CP. Thus, aberrant activity in a network including occipital and temporal regions mediates atypical face processing skills in CP. It is important to note, however, that since the MVPA analysis adopted tests only for neural discrimination accuracy between category pairs (i.e., face vs. object), we cannot claim that the reduced face-object discrimination in CP is truly face-specific. In theory, the aberrant discrimination pattern in CP could have been equally driven by object or face processing. The lack of an object-body and object-body part group difference seems to exclude an object-specific coding problem. However, in the same fashion, the lack of face-body and face-body part group differences seems to rule out a face-specific problem in CP. Given the nature of the condition, which is often characterized by a disproportionate deficit in face processing (Duchaine and Nakayama, 2005), and given that the group differences appear in brain areas strongly implicated in face (Haxby et al., 2000; Avidan et al., 2013), rather than object (Kanwisher, 2010) processing, it nevertheless seems plausible to suggest that the group difference depicts a "face-driven" MVPA accuracy reduction in CP.

A finding never reported before in CP neuroimaging literature is the reduced face-object discrimination in the right parahippocampal gyrus. Given that the parahippocampal gyrus is a region strongly implicated in memory processing (Davachi et al., 2003) and involved in unfamiliar (Rivolta et al., 2014) and familiar (Leveroni et al., 2000) face perception, our results point toward a potential anatomical locus of face-object processing problems in CP. We note that the 1-back task did not tax memory, and CPs and controls did not differ in their performance on this task. It is, thus, possible that reduced face-object discrimination in the parahippocampal gyrus may reflect poor face memory in CP, as highlighted by their poor performance on the CFMT (see **Table 1**). Future studies which adopt tasks specifically tapping memorial aspects of face processing may clarify why reduced sensitivity was seen in this area.

Our findings of face-body, face-body part, object-body, and object-body part representations within the occipital and fusiform cortices (both in controls and CPs) are consistent with previous studies (Bar et al., 2006; Peelen and Downing, 2007) highlighting the importance of posterior ventral regions for body and object processing. The absence of group differences for face vs. body/body-part activity, albeit in agreement with previous behavioral studies suggesting typical body processing in CP (Duchaine et al., 2006), disagrees with previous EEG (Righart and de Gelder, 2007) and fMRI (Van den Stock et al., 2008) evidence reporting neurophysiological group differences, thus highlighting the need for future investigations.

#### **CONCLUSIONS**

The current study demonstrates that face-object discriminatory abilities in the lateral occipital cortex, fusiform gyrus, AT cortex and parahippocampal gyrus are compromised in people with CP. Although our analysis cannot directly establish a "face-driven" coding problem in CP, the clinical features of the condition and the localization of the group differences in well-known "core" and "extended" face regions seem to point to a pivotal contribution of face processing deficits to the neural pattern observed. Thus, both core and extended face networks appear to reflect the behavioral abnormality that congenital prosopagnosics experience in everyday life, and may provide a neural marker of CP. Future studies should further investigate the face-specificity issue by, for instance, testing the neural representation of multiple exemplars of individual faces and objects.

#### **ACKNOWLEDGMENTS**

We wish to thank Regine Zopf for programming the "one-back" task and C. Ellie Wilson for the help she offered in recruiting participants with CP and in programming the MFFT-08. We also wish to thank the Kanwisher lab (MIT) for providing the stimuli we adopted in the one-back task. This work was supported by the Macquarie University Research Excellence Scholarship (iMQRES) to DR. Mark A. Williams is supported by the Australian Research Council Fellowship Schemes (DP0984919). Alexandra Woolgar is the recipient of an Australian Research Council Discovery Early Career Researcher Award (DECRA, DE120100898).

#### **REFERENCES**


Ishihara, S. (1925). *Tests for Colour-Blindness*, *5th Edn.* Tokyo: Kanehara.

Kanwisher, N. (2010). Functional specificity in the human brain: a window into the functional architecture of the mind. *Proc. Natl. Acad. Sci. U.S.A.* 107, 11163–11170. doi: 10.1073/pnas.1005062107


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 June 2014; accepted: 30 October 2014; published online: 13 November 2014.*

*Citation: Rivolta D, Woolgar A, Palermo R, Butko M, Schmalzl L and Williams MA (2014) Multi-voxel pattern analysis (MVPA) reveals abnormal fMRI activity in both the "core" and "extended" face network in congenital prosopagnosia. Front. Hum. Neurosci. 8:925. doi: 10.3389/fnhum.2014.00925*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Rivolta, Woolgar, Palermo, Butko, Schmalzl and Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The rehabilitation of face recognition impairments: a critical review and future directions

## *Sarah Bate\* and Rachel J. Bennetts*

*Department of Psychology, Faculty of Science and Technology, Bournemouth University, Poole, UK*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*Jeremy Tree, University of Swansea, UK Joseph M. DeGutis, Harvard University, USA Lucy Yardley, University of Southampton, UK*

#### *\*Correspondence:*

*Sarah Bate, Department of Psychology, Faculty of Science and Technology, Poole House, Bournemouth University, Fern Barrow, Poole BH12 5BB, UK e-mail: sbate@bournemouth.ac.uk*

While much research has investigated the neural and cognitive characteristics of face recognition impairments (prosopagnosia), much less work has examined their rehabilitation. In this paper, we present a critical analysis of the studies that have attempted to improve face-processing skills in acquired and developmental prosopagnosia, and place them in the context of the wider neurorehabilitation literature. First, we examine whether neuroplasticity within the typical face-processing system varies across the lifespan, in order to examine whether timing of intervention may be crucial. Second, we examine reports of interventions in acquired prosopagnosia, where training in compensatory strategies has had some success. Third, we examine reports of interventions in developmental prosopagnosia, where compensatory training in children and remedial training in adults have both been successful. However, the gains are somewhat limited—compensatory strategies have resulted in labored recognition techniques and limited generalization to untrained faces, and remedial techniques require longer periods of training and result in limited maintenance of gains. Critically, intervention suitability and outcome in both forms of the condition likely depends on a complex interaction of factors, including prosopagnosia severity, the precise functional locus of the impairment, and individual differences such as age. Finally, we discuss future directions in the rehabilitation of prosopagnosia, and the possibility of boosting the effects of cognitive training programmes by simultaneous administration of oxytocin or non-invasive brain stimulation. We conclude that future work using more systematic methods and larger participant groups is clearly required, and in the case of developmental prosopagnosia, there is an urgent need to develop early detection and remediation tools for children, in order to optimize intervention outcome.

**Keywords: face recognition, prosopagnosia, neurorehabilitation, cognitive training, face processing**

## **INTRODUCTION**

Prosopagnosia is a cognitive condition characterized by a relatively selective deficit in face recognition. Traditionally the disorder has been described in a small number of individuals who acquire face recognition difficulties following neurological injury or illness, typically affecting occipitotemporal regions (De Renzi et al., 1994; Gainotti and Marra, 2011). Although acquired prosopagnosia (AP) in its purest form is a rare condition (Gloning et al., 1967; Zihl and von Cramon, 1986), many more individuals with brain damage are believed to experience moderate-to-severe face-processing deficits alongside other cognitive impairments (Hécaen and Angelergues, 1962; Valentine et al., 2006). Further, as many as 2.9% (Bowles et al., 2009) of the population may experience developmental prosopagnosia (DP)—an apparently parallel form of the disorder that occurs in the absence of neurological injury or lower-level visual deficits (e.g., Duchaine and Nakayama, 2005; Bate and Cook, 2012). While some people cope relatively well with prosopagnosia, it can have a devastating effect on an individual's everyday social and occupational functioning (Yardley et al., 2008). Hence, exploration of the remediation of prosopagnosia is an urgent clinical issue that, unfortunately, has received little attention to date. It is important to note that rehabilitation is not necessary in all cases of prosopagnosia—some people with DP cope relatively well, and many devise their own strategies to recognize the people around them (e.g., Fine, 2012). However, Yardley et al. (2008) note that the majority of their participants reported negative psychosocial experiences related to DP, particularly at a younger age. As such, investigations into the effectiveness of remediation techniques—especially those used in children—are important both on a theoretical and a practical level.

The few studies that have attempted to remedy face-processing deficits in individuals with AP or DP are summarized in **Table 1**. In the current paper, we present a critical review of substantive published attempts to rehabilitate AP and DP, examining both the design of each training programme and the research participants themselves, in an attempt to place the findings in the context of the wider neurorehabilitation literature. It has been argued that the main aim of neuropsychological rehabilitation is to reduce the impact of impairments on everyday living, whether through restoration of function or the adoption of coping strategies (Wilson, 2003). In the context of face recognition, rehabilitation may therefore encourage an individual to develop compensatory strategies that aid person recognition, or attempt to restore (or, in the case of DP, to develop) normal face-processing mechanisms via more extensive visuo-cognitive training (referred to as "remedial training" in this paper). Although the neurorehabilitation literature is vast, it has seldom been applied to disorders of face-processing. As such, current research offers little guidance as to which approach (compensatory or remedial) may be more effective in prosopagnosia, or the factors that may influence the effectiveness of each method. Therefore, the main aim of this review is to provide guidance on this issue.

First, we address the question of whether the typical face-processing system retains neuroplasticity throughout the lifespan: in other words, is there evidence that the face-processing system might be able to learn or improve face-specific processing mechanisms at any point in time, or should prosopagnosia interventions focus primarily on critical periods of development or the development of compensatory strategies? Second, we examine intervention studies in AP and DP, with a specific focus on factors that may affect success, including the nature of the disorder, the type of intervention, and individual differences between participants. Finally, we discuss future directions in the rehabilitation of prosopagnosia.

## **DOES THE TYPICAL FACE-PROCESSING SYSTEM REMAIN PLASTIC THROUGHOUT THE LIFESPAN?**

The term "neuroplasticity" typically refers to a neural system's capacity to learn new skills or improve existing capabilities, either during normal development or after neurological damage (e.g., Huttenlocher, 2002). Traditionally, there have been two main theories on neuroplasticity (Thomas, 2003). The first proposes that an innate blueprint specializes cognitive systems for a particular function, which emerges during critical periods within development. This perspective suggests that once the relevant neural structures have been specialized for their purpose, any damage can only be overcome by the adoption of compensatory behavioral strategies. In face-processing, this might take the form of recognizing people based on individual facial features, or using additional semantic cues during face encoding. In contrast, the other viewpoint proposes that the brain retains plasticity throughout the lifespan, and hidden reserves may aid the acquisition of new skills or compensate for damage, provided that appropriate intervention techniques are used. Drawing on the available neurorehabilitation literature, Thomas (2003) concludes that the brain's structures are not irreversibly determined by an innate plan, but plasticity is nevertheless limited. Further, these limits may fluctuate throughout development, and are not necessarily consistent across different neural systems. Therefore, before examining neuroplasticity in the context of prosopagnosia, it follows that neuroplasticity within the typical face-processing system should be examined. That is, is it theoretically possible that face recognition skills can be improved at any point in the lifespan, or does research using neurotypical participants indicate that any plasticity in the neural face-processing system is short-lived following birth?

A dominant theory of the development of face-processing posits that crude brain circuits become specialized for face recognition in response to early visual experience with faces (the "perceptual narrowing" hypothesis: Nelson, 2001). Evidence supporting this theory comes from findings that very young infants can discriminate between monkey and other-race faces, whereas older infants and adults no longer have this ability (e.g., Pascalis et al., 2002; Kelly et al., 2007). Although these findings suggest some plasticity in the face-processing system in the first few months of life, Nelson suggests that early specialization of neural tissue for face-processing may lead to a lack of plasticity in later years.

Behavioral studies tracking the development of face recognition skills also suggest that specialized face processing systems emerge early in life. In a review of developmental studies conducted to date, Crookes and McKone (2009) conclude that adultlike face-processing strategies are obtained by early childhood in qualitative if not quantitative terms, suggesting a window for plasticity only within the first years of life. For example, one key marker of mature face-processing skills is the ability to process faces on a holistic basis, taking into account the overall configuration of facial features and the spacing between them (Maurer et al., 2002). As Crookes and McKone note, evidence of holistic processing has been observed in children as young as 3 or 4 years using classical paradigms such as the face inversion effect (Sangrigoli and de Schonen, 2004), the composite effect (de Heering et al., 2007; Macchi Cassia et al., 2009a), the part-whole effect for upright but not inverted faces (Pellicano and Rhodes, 2003), and tests that assess sensitivity to spacing between facial features (McKone and Boyer, 2006; Pellicano et al., 2006). A second marker of adult-like face-processing skills is the "inner-feature advantage" whereby adults are more proficient at recognizing familiar faces from the inner compared to the outer features (Ellis et al., 1979; Young et al., 1985)—a preference that has also been observed in children as young as 5 years of age (Wilson et al., 2007). Further, Pozzulo and Lindsay (1998) reported a meta-analysis that summarized findings from eyewitness studies that used children as participants. In agreement with the above studies, the authors noted that children as young as 5 years of age display adult-like performance in their ability to identify perpetrators from target-present (but not target-absent) line-ups. 
These studies therefore indicate that, despite evidence indicating a large increase in face recognition ability throughout childhood (presumably due to the need for more generalized mechanisms to develop), there is no qualitative change in face perception beyond 4–5 years of age. In fact, given increasing evidence that even infants are capable of holistic processing (Cohen and Cashon, 2001; Bhatt et al., 2005; Hayden et al., 2007) it is possible that face-processing skills are fully-developed at a very early age, implying a limit on plasticity beyond early childhood. This idea is supported by studies of adolescents and adults who were born with dense cataracts—despite the fact that the cataracts were removed before 7 months of age, participants show abnormal face-processing skills (Le Grand et al., 2001, 2004) but normal object discrimination (Robbins et al., 2010), indicating that early visual input is particularly important for the development of face-processing mechanisms.

While early visual input may be necessary for the initial development of face-processing mechanisms, it remains possible that these mechanisms can be refined or altered later in life. Despite evidence of early commitment to face-specific regions, neuroimaging studies suggest that the cortical face-processing system (Haxby et al., 2000; Gobbini and Haxby, 2007) continues to develop well into adolescence. For instance, Passarotti et al. (2003) found more diverse activation in the fusiform region for children as opposed to adults. Similarly, Gathers et al. (2004) reported that activation in the fusiform gyrus is not greater for faces compared with objects until 10 years of age, although they did note such activation more posteriorly in the inferior occipital region. Other studies suggest that both activation of the core face-processing system and connectivity between the different neural areas changes between the ages of 7 and 11 years (Cohen Kadosh et al., 2011, 2013). Event-related potential (ERP) components also continue to mature through late childhood into early adolescence: Taylor et al. (2004) reported that face inversion did not influence the face-specific N170 response until 8–11 years of age. While these findings raise the possibility that plasticity may remain in the face-processing system at least until adolescence, De Schonen et al. (2005) warn that plasticity during typical brain development is most likely due to modification of synaptic organization, rather than redistribution of face-processing mechanisms to other cortical regions. Hence, these findings do not imply that other neural areas can simply take over face-processing following brain damage.

There are also several lines of evidence that support the idea that the face-processing system may retain some plasticity even in adulthood. For instance, Germine et al. (2010) tested over 60,000 participants aged from pre-adolescence to middle-age on their ability to learn new faces. In three experiments, Germine and colleagues found that face learning ability improves up until the age of 30, although the recognition of inverted faces and name recognition peak at a much earlier age. Other evidence supporting plasticity in the adult face-processing system comes from studies of the other-race effect, or the finding that we are better at recognizing faces from our own race than those from other races (e.g., Malpass and Kravitz, 1969). Critically, one of the explanations for this effect is based on the presumption that the phenomenon reflects the lack of experience the viewer has had with faces from the other race (Meissner and Brigham, 2001; Hancock and Rhodes, 2008). Although the effect has been observed in infants as young as 3 months of age (e.g., Sangrigoli and de Schonen, 2004; Kelly et al., 2005, 2007), evidence suggests it remains plastic and reversible even in adulthood. Specifically, Hancock and Rhodes (2008) found a reduced other-race effect, accompanied by increased holistic processing, for participants who reported higher levels of contact with another race (see also Meissner and Brigham, 2001; Sangrigoli et al., 2005; de Heering and Rossion, 2008; Kuefner et al., 2008; Macchi Cassia et al., 2009b; Rhodes et al., 2009, for similar studies of the "own-age bias"). More interestingly, though, training can improve recognition of other-race faces. Tanaka and Pierce (2009) trained Caucasian students to discriminate between African-American and Hispanic faces, and reported an improvement in the recognition of novel stimuli of the same race, along with changes to the N250 ERP component to the other-race faces (see also Elliott et al., 1973; McKone et al., 2007). 
Notably, McKone et al. (2007) showed normal levels of holistic processing for trained cross-race faces, indicating that training can have an effect on the manner in which faces are processed, not just the accuracy with which they are identified.

In sum, behavioral and neural investigations using typical participants suggest that the face-processing system may retain some plasticity throughout childhood and into adulthood. This raises the possibility that face recognition deficits can be rehabilitated, at least in some circumstances.

## **NEUROREHABILITATION OF ACQUIRED PROSOPAGNOSIA**

Anderson et al. (2001) outline two potential means of recovery following brain injury: the spontaneous healing of damaged tissue may lead to reactivation of pre-existing neural pathways, or anatomical reorganization may allow different neural areas to take over the behavioral function of the damaged area. Given evidence that the face-processing system retains some plasticity in adulthood, remediation of face-processing skills following neurological injury may be possible. However, as with any other acquired deficit, it is likely that a number of general constraints will influence the success of intervention. These might include the age at which the lesion was acquired, the severity of the lesion, and the precise functional implications of the lesion. These factors may dictate the type of intervention that is suitable for the individual, and whether it should focus on compensatory rather than remedial training.

### **TIMING OF INJURY**

There is a general view that the developing brain has greater plasticity than the adult brain: Huttenlocher (2002) concludes that, across the neurorehabilitation literature, neuroplasticity in adults has generally been found to be lower than in children. Further, in early development there are higher levels of some genes and proteins that are required for neuronal growth, synaptogenesis and the proliferation of dendritic spines, and these levels significantly reduce with aging (Huttenlocher and Dabholkar, 1997). It therefore follows that compensatory reorganization and transfer of function is more likely after early brain injury (e.g., Elbert et al., 2001).

If plasticity in the developing face-processing system is greater in childhood than in adulthood, one would predict that spontaneous recovery might occur in children to a greater extent than in adults. There have been some instances of recovery of prosopagnosia in adults in the absence of any formal attempts at rehabilitation (e.g., Malone et al., 1982; Lang et al., 2006), but this is by no means consistent: many other cases have found no evidence of improvement or recovery over time (e.g., Sparr et al., 1991; Ogden, 1993; Spillmann et al., 2000). However, work examining the effects of peri- or prenatal injuries on the development of face recognition skills suggests that the infant system may be more plastic following damage than the adult system. For instance, Mancini et al. (1994) found that perinatal unilateral lesions only had mild effects on later face-processing abilities in children ranging in age from 5 to 14 years. In fact, less than half of the children were impaired at face- or object-processing, and face-processing deficits were no more common than object-processing deficits following a right hemisphere lesion.

Although these studies suggest some level of neural reorganization is possible following early damage (see also Ballantyne and Trauner, 1999), it is important to note that age of injury does not have a straightforward relationship with plasticity in the face-processing system. De Schonen et al. (2005) reported a similar study with a group of 5- to 17-year-olds who acquired unilateral posterior lesions involving the temporal cortex during the pre-, peri- or postnatal period. In general, deficits in low-level configural processing were related to face-processing deficits in patients with a lesion acquired before or at birth, when visual experience starts. These findings converge with other work in the neurorehabilitation literature indicating that there may be a U-shaped effect of damage, with prenatal injury leading to the poorest outcome (i.e., with no evidence of transfer of function from the damaged site to intact tissue: Anderson et al., 2001); greater plasticity in early childhood leading to cortical reorganization and greater sparing of function; and more limited plasticity in late adolescence and adulthood. In a similar vein, advanced age at the time of injury may result in less complete recovery compared to younger persons with comparable injuries (Katz and Alexander, 1994). However, the mechanisms of this phenomenon are not known, and it may simply be that increasing age leads to a reduced capacity for compensation or reduced cognitive reserve (Lye and Shores, 2000). In other words, a more general cognitive decline due to ageing may make it more difficult to relearn old skills or acquire new compensatory strategies.

Another factor that should be taken into account when considering age of injury is the extent of the lesion. Pediatric research has indicated that children with generalized cerebral insult can exhibit both slower recovery and poorer outcome than do adults who suffer similar insults, possibly because attention, memory and learning skills have not been fully developed (Hessen et al., 2007). Without these capacities, the child does not have the tools to efficiently acquire new abilities and cannot progress along the normal pathway of cognitive development.

In sum, evidence from lesion studies suggests that early neurological damage may be more amenable to rehabilitation, but this is modulated by complex interactions with the exact timing and extent of the damage. Currently it is difficult to relate this directly to the prosopagnosia rehabilitation literature, as there is only one study that has attempted to remedy AP in childhood. Ellis and Young (1988) studied an 8-year-old child (KD) who acquired prosopagnosia after anesthetic complications damaged the lateral third and fourth ventricles at 3 years of age (see **Table 1**). The authors suggest that a persistent left-sided motor weakness implied a right hemisphere lesion, whereas initial loss of vision following the incident suggested bilateral occipital damage. She also had object agnosia, and the underlying deficit seemed to be an inability to construct adequate representations of visual stimuli. The researchers designed a remedial training programme that required KD to complete four tasks over a period of 18 months, including (1) simultaneous matching of photographs of familiar and unfamiliar faces, (2) paired discriminations of computer-generated schematic faces, (3) paired discriminations of digitized images of real faces and (4) the learning of face-name associations. Unfortunately, none of the programmes brought about an improvement in KD's face-processing skills. It is unclear why this programme failed to work, although it is likely that the extensive bilateral damage may have prevented any gains (see section Lesion Size and Location). Notably, this is the only study to date that has attempted to remedy AP acquired as a child, *and* the only study to attempt rehabilitation of a child with AP. 
As such, it is difficult to assess whether the lack of improvement following this intervention relates to the timing of the injury (3 years of age) or the timing of the intervention (8 years of age), or to comment on the cognitive characteristics/skills that may impact the success of the intervention (e.g., co-occurring object agnosia).

While age of injury may be an important determinant of the success of rehabilitation in AP, the timing of the intervention relative to the injury could also be an important consideration when planning interventions. For example, evidence from the stroke literature suggests that the speed of intervention following the cerebral incident may be fundamental for success. Some studies propose that there are parallels between plasticity mechanisms in the developing nervous system and those occurring in the adult brain immediately following stroke, but that this plasticity diminishes quickly (Biernaskie et al., 2004; Carmichael et al., 2005; Brown et al., 2009). This indicates that the brain may be most receptive to interventions immediately after a stroke, and suggests that early intervention could be crucial in these cases. However, it is currently unknown whether this temporarily increased plasticity extends to (a) the face-processing system, and (b) prosopagnosia acquired from insults other than stroke; it is also unclear whether it interacts with the age of the patient or other factors such as lesion location or severity.

### **LESION SIZE AND LOCATION**

Many causes of the lesions that bring about AP have been reported, including stroke, carbon monoxide poisoning, temporal lobectomy, encephalitis, neoplasm, and head trauma. Further, recent reports have described cases of AP alongside degenerative conditions such as frontotemporal lobar degeneration (Josephs, 2007) and posterior cortical atrophy (McMonangle et al., 2006; Sugimoto et al., 2012), and after temporal lobe atrophy (Joubert et al., 2003; Chan et al., 2009). With such a wide range of preceding causes, attempts to rehabilitate AP must take into account the extent and location of neurological damage, and in particular how different patterns of damage may be associated with different deficits. For example, some recent detailed analyses indicate that the primary site of damage in most cases is to posterior regions of the brain (e.g., Arnott et al., 2008). However, damage to more anterior regions has been reported to bring about "prosopamnesia," a condition in which patients retain the ability to recognize faces that they knew before the neurological accident, but cannot create stable representations of new faces in memory (e.g., Crane and Milner, 2002). As no attempts have been made to rehabilitate prosopamnesia, it is unknown whether one type of impairment is more amenable to intervention.

Lateralization of the lesion is another potentially important consideration. It was traditionally thought that AP results from unilateral damage to the right hemisphere, particularly the right occipitotemporal area. In line with this hypothesis, De Renzi et al. (1994) reported unilateral occipitotemporal lesions in three cases of AP, and cited 27 previously reported cases that presented with similar damage. However, some reports suggest the disorder can also result from unilateral left hemisphere lesions (Mattson et al., 2000; Barton, 2008), although De Renzi et al. (1987) suggested that prosopagnosia resulting from left hemisphere lesions can result in a more variable pattern of symptoms, and Gainotti and Marra (2011) suggest that AP cases involving left and right hemisphere lesions present with different patterns of functional impairment. This suggests that right and left hemisphere cases may warrant different methods of intervention (see section Identifying the Functional Impairment).

AP has also been reported in the context of bilateral damage (e.g., Damasio et al., 1982; Barton et al., 2002; Boutsen and Humphreys, 2002). Some authors have suggested that unilateral lesions bring about more selective impairments in face-processing, whereas bilateral lesions cause more extensive disruption (Warrington and James, 1967; Boeri and Salmaggi, 1994). This latter suggestion seems logical, given that, when only one hemisphere is affected, it is plausible that neural areas in the undamaged hemisphere might compensate for lost abilities at least to some degree; whereas no such compensation can occur in individuals with damage to both sides of the brain. Indeed, in the more general neurorehabilitation literature, functional plasticity is generally not observed in cases of bilateral damage, and greater damage tends to lead to worse outcomes. Broadly speaking, plasticity is most associated with focal lesions where true recovery with relatively little compensation is possible, presumably because some of the tissue that is crucial for function is unaffected by the lesion (Moon et al., 2009). While large focal lesions may also be associated with good recovery, this tends to only occur when damage is unilateral.

When looking at instances of spontaneous recovery from AP, there is some indication that this occurred following unilateral (Glowic and Violon, 1981; Lang et al., 2006) rather than bilateral (Sparr et al., 1991; Ogden, 1993) damage. When it comes to formal interventions (summarized in **Table 1**), two of the three AP studies that have reported some success involve patients with unilateral damage (i.e., Polster and Rapcsak, 1996; Francis et al., 2002); the other study reporting improvement involved a patient with bilateral damage that did not consistently affect the same areas of the brain (Powell et al., 2008). The two interventions that failed to show improvement (Ellis and Young, 1988; De Haan et al., 1991b) both involved patients with apparently more extensive bilateral damage.

### **IDENTIFYING THE FUNCTIONAL IMPAIRMENT**

Initial cognitive assessments are required to inform the design of an intervention programme, although previous attempts at cognitive neuropsychological rehabilitation have often failed to follow this principle (Wilson and Patterson, 1990; Hillis, 1993). Fortunately, we have a relatively sophisticated understanding of the cognitive and neural underpinnings of the face-processing system, and dominant models of face recognition have traditionally been used to interpret cases of prosopagnosia and to guide intervention strategy. Traditionally, the face-processing system has been viewed as a sequential and hierarchical multi-process system, where impairment can occur at a variety of stages (Bruce and Young, 1986; see **Figure 1**). Specifically, an initial stage of early visual analysis is followed by "structural encoding," where view-centered representations (used to perceive changeable aspects of the face, such as emotional expression) are transformed into viewpoint-independent representations (used to perceive unchangeable aspects of the face—most notably identity). The face recognition units (FRUs) compare all stored representations of familiar faces to an incoming percept. If a match is achieved, access to semantic information is provided by the relevant person identity node (PIN), culminating in retrieval of the person's name. Although these processes are widely distributed across many neural systems that work in concert to process faces, specialized anatomical structures have been identified that largely map onto the functional stages proposed in the cognitive model (Haxby et al., 2000; see **Figure 1**).

The modular model permits disruption either to specific subprocesses, or to the connections between different units. The sequential nature of the model assumes that processing cannot be continued (at least at an overt level) past a damaged stage. Thus, prosopagnosia may result from three loci of damage within the framework: first, an AP may be unable to construct an adequate percept of a face, which would affect all later stages of processing (i.e., they would be unable to recognize a face as familiar or identify it; e.g., patient HJA: Humphreys and Riddoch, 1987; patient BM: Sergent and Villemure, 1989); second, an AP may be able to achieve a normal face percept but cannot access stored face memories (the FRUs), in which case they would be unable to ascertain familiarity or identity (e.g., patient LH: Etcoff et al., 1991; patient NR: De Haan et al., 1992); or third, an AP may be able to perceive faces and make familiarity judgments, but fail to access person-specific information or PINs, in which case they would achieve a normal face percept and a sense of familiarity with a face, but identification (i.e., access to any semantic information about the person) would remain poor (e.g., patient ME: De Haan et al., 1991a).
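The sequential assumption described above can be made concrete with a small illustrative sketch. This is a toy rendering of the logic only, not an implementation from Bruce and Young (1986) or any other cited work: the stage labels, the ability labels, and the function name `overt_abilities` are hypothetical, and the sketch assumes a strictly serial model in which damage to one stage blocks all overt processing from that stage onward.

```python
# Toy sketch of a strictly sequential face-processing model.
# Stage and ability labels are hypothetical names chosen for illustration.
STAGES = [
    ("structural_encoding", "face percept"),
    ("fru", "familiarity judgment"),        # face recognition units
    ("pin", "semantic identification"),     # person identity nodes
    ("name_retrieval", "naming"),
]


def overt_abilities(damaged_stage=None):
    """Return which overt abilities are spared when one stage is damaged.

    Assumes that overt processing cannot continue past a damaged stage,
    so the damaged stage and every later stage are marked as impaired.
    """
    valid = [stage for stage, _ in STAGES]
    if damaged_stage is not None and damaged_stage not in valid:
        raise ValueError(f"unknown stage: {damaged_stage!r}")

    abilities = {}
    intact = True
    for stage, ability in STAGES:
        if stage == damaged_stage:
            intact = False  # this stage and all later ones are blocked
        abilities[ability] = intact
    return abilities


if __name__ == "__main__":
    for locus in ("structural_encoding", "fru", "pin"):
        print(locus, overt_abilities(locus))
```

Under these assumptions, damage at the structural encoding stage impairs every later ability, damage at the FRUs spares only the face percept, and damage at the PINs spares both the percept and the sense of familiarity, mirroring the three loci listed above. The sketch deliberately ignores covert recognition and damage to the connections between units, both of which the model also permits.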

In the majority of cases reported in the literature, patients with AP retain the ability to recognize people on the basis of other, non-face cues (e.g., body, voice). In some cases, however, impairments in face recognition are a subset of a more general person recognition problem—this is often associated with damage to the right anterior temporal lobe (Gainotti, 2013). In other words, these cases represent a subtly different type of disorder—one of semantic memory. Various interpretations of the exact nature of semantic disorders of this type exist, including impaired overt access to an output from semantics (Hanley et al., 1989), inability to use a "common access point" to gain semantic information (De Haan et al., 1991a), actual loss of person-based semantic knowledge (Evans et al., 1995; Laws et al., 1995), and damage to a specialized semantic store that contains information about singular objects (Ellis et al., 1989).

It therefore follows that an initial assessment should identify the functional locus of the impairment (be it perceptual, mnemonic, or a more general semantic memory problem) and training should be tailored to that weakness. Several cases in the AP rehabilitation literature demonstrate the importance of tailoring training programmes to the locus of the deficit. Most strikingly, Francis et al. (2002) created a number of therapy tasks tailored to patient NE, who had deficits at both structural and semantic levels, and/or deficits in the access links between structural and semantic knowledge. In three studies, the authors demonstrated that therapy was effective when it emphasized semantic information about people, and linked this knowledge to visual representations (imagery or photographs of faces); whereas therapy directed at processes that were not underpinning the impairment (i.e., name retrieval) was unsuccessful. In another case, Powell et al. (2008) investigated the rehabilitation of face recognition deficits in 20 adults who presented with a broad range of cognitive impairments following brain injury. The participants completed three training programmes targeted at the recognition of unfamiliar faces, comprised of (1) a semantic association technique that provided additional verbal information about faces, (2) caricatured versions of target faces for recognition, and (3) a part-recognition technique that drew participants' attention toward distinctive facial features. The patient group as a whole showed small improvements in each of the three training conditions compared to a control condition where participants were simply exposed to faces. However, when the techniques were applied to a single case of profound acquired prosopagnosia (patient WJ, described in McNeil and Warrington, 1993; see **Table 1**), little or no improvement was observed following the semantic association and caricaturing programmes, whereas the part-recognition technique yielded 25% greater accuracy than the control condition. This result may be explained by focussing on the functional locus of impairment: WJ was impaired at the level of structural encoding, and relied on a feature-by-feature processing strategy that could be boosted by compensatory training. In some ways this is a surprising finding given that many prosopagnosics adopt this strategy in everyday life, and one might expect that WJ would naturally be using the technique even in the "simple exposure" condition. Nevertheless, this finding suggests not only that part recognition may be an effective method of circumventing damage to the typical face recognition system, but also that training in use of the technique may further boost a compensatory strategy that many individuals with prosopagnosia naturally adopt.

Clearly though, regardless of whether training is targeted at the impairment itself, other influences may prevent training success (e.g., KD, Ellis and Young, 1988). For instance, different levels of impairment may be more or less amenable to treatment: a number of authors have argued that prosopagnosia arising from perceptual deficits is most resistant to treatment and also least likely to show treatment generalization effects (Wilson, 1987; Ellis and Young, 1988; Francis et al., 2002). Polster and Rapcsak (1996) examined the effects of "deep encoding"—that is, incorporating personality judgments or providing names and other semantic information at the point of encoding—in patient RJ. They found that RJ, who showed face perception impairments, did not benefit from "shallow" encoding instructions to focus on facial features, yet performed relatively well with "deep" encoding instructions where he was required to rate faces in terms of their personality traits or was provided with semantic or name information during the study phase. The authors suggest that semantic information may aid recognition memory by establishing additional visually derived and identity-specific semantic codes. However, the gains did not generalize to novel viewpoints of the learned faces, and the authors conclude that the patient simply could not compensate for his inability to construct abstract structural codes that normally allow faces to be recognized from different orientations. Hence, even training in compensatory behavioral mechanisms could not circumvent the severity of the patient's face perception impairment.

While perceptual difficulties may well contribute to intervention success, it is of note that another study failed to rehabilitate an AP adult with higher-order impairments, patient PH. PH had profound face recognition impairments, but was found to display some covert recognition on several behavioral tasks, indicating he had a higher-level impairment affecting the FRUs or PINs, or the connection between them. Based on the knowledge that PH was capable of face recognition on an unconscious level, De Haan et al. (1991b) used a category-presentation method to try to improve the patient's face-processing skills. Specifically, PH was presented with the occupation performed by a set of famous people, and was asked to subsequently recognize their faces. Unfortunately, PH was only successful in recognizing faces from one of the six occupational categories that were used in the study, and the improvement was not maintained in a follow-up test 2 months later. This does not suggest that higher-order impairments cannot be remedied, but it does emphasize that, as discussed above, other factors such as age and lesion severity may contribute to the success of rehabilitation. It is pertinent to note that PH was an adult who had experienced bilateral damage to the temporo-occipital junction, and he did present with some perceptual impairments (see **Table 1**).

Finally, some cases of AP present with damage to more than one sub-process of the theoretical model. Francis et al. (2002) suggest that, when a patient's deficit is due to multiple impairments, intervention must target each of these in order for improvement to occur. For example, in their investigation described above, the authors found that therapy targeted at only one of NE's deficits (the semantic problem) without considering the other (the prosopagnosia) was ineffective.

### **IMPLICATIONS FOR INTERVENTION: COMPENSATORY OR REMEDIAL TRAINING?**

One of the critical debates in neurorehabilitation is concerned with whether training should encourage the formation of behavioral compensatory mechanisms, or attempt to strengthen normal behavioral mechanisms (remedial training). There has been only one attempt to restore normal processing in a case of AP to date, which unfortunately was not successful (KD, Ellis and Young, 1988). Clearly, no conclusions can be drawn on the utility of remedial methods for acquired cases from a single case alone, particularly given the unusual characteristics of the case (i.e., the age of acquisition, treatment option, and lesion size and location: see section Lesion Size and Location).

While attempts at remedial training are currently very limited, three of the four published studies examining the use of compensatory strategies in AP report some success (see **Table 1**). It is of note that two of these studies describe individuals with similar perceptual deficits in face-processing, yet found success using different techniques. While Powell et al. (2008) found a benefit of part-based but not semantic encoding for WJ, Polster and Rapcsak (1996) found a greater benefit for semantic or "deep" encoding than part-based encoding for patient RJ. It is unclear why featural and not semantic training helped WJ whereas the reverse pattern was observed in RJ, but these reports suggest both techniques may be beneficial, albeit for different individuals.

Of the studies presented in **Table 1**, only one of the four compensatory training studies had no effect—the study presented by De Haan et al. (1991b). Pertinently, the patient described in this study differs from those in the other studies, as they had a severe mnemonic rather than perceptual difficulty, and had also suffered bilateral damage. Based on the limited available evidence, compensatory training therefore appears to be more successful in AP than remedial techniques. Yet, further research is clearly required to examine the utility of remedial training in this form of the condition, and to assess which factors may influence the success of various training methods—for example, perhaps remedial training is more effective for patients with unilateral lesions, or for those with mnemonic deficits. Indeed, research into face-name encoding in Alzheimer's disease has had some success with remedial mnemonic techniques such as errorless learning and spaced retrieval (e.g., Haslam et al., 2011), but these techniques have not yet been applied in mnemonic cases of AP.

Understanding the conditions in which remedial techniques are effective is particularly important given that the wider neurorehabilitation literature suggests their benefits are larger than those of behavioral compensation (e.g., Sitzer et al., 2006). Within the AP literature, compensatory techniques show some limitations: NE (Francis et al., 2002) showed significant gains following training, but despite her success in the laboratory, she continued to encounter substantial problems in everyday life. She interpreted this as a case of competing demands—she was using a highly contrived method for remembering and recognizing new people, as well as coping with more general memory deficits. Such instances highlight the limitations of compensatory training, and should remedial training prove effective for at least some cases of AP, this may be a preferable option in terms of outcome.

## **DEVELOPMENTAL DISORDERS**

#### **DP AND NEUROPLASTICITY**

While we do not yet have a complete understanding of the genetic, neurological, and cognitive underpinnings of DP, it is viewed by most as a parallel disorder to AP. Yet, some caution should be exercised in application of the principles of neurorehabilitation discussed above to the developmental form of the condition. Thomas (2003) notes that developmental disorders represent the limits of plasticity, given that spontaneous reorganization and compensation during the natural developmental process do not overcome whatever abnormalities are underpinning the condition, as they may do following focal damage in the peri- or postnatal period (e.g., Mancini et al., 1994). Granted, it would be very difficult to actually find any cases of spontaneous recovery in DP, and this is further complicated by our limited understanding of the developmental trajectory of the condition and the existence of any early biobehavioral indicators. Nevertheless, the persistence of deficits in developmental disorders suggests atypical limitations on plasticity rather than focal damage, perhaps because disruption to early brain development alters low-level neurocomputational constraints, which prevent certain neural regions from acquiring normal specialized functions (Thomas and Karmiloff-Smith, 2003). It has been suggested that DP can be attributed to a failure to develop the visuo-cognitive mechanisms required for successful face recognition (Susilo and Duchaine, 2013), although it is unclear whether this comes about via genetic influences (Kennerknecht et al., 2006) or unrelated neurological abnormalities (e.g., Behrmann et al., 2007; Garrido et al., 2009). Importantly, while there is some evidence for a genetic factor in DP, Pennington (2001) argues that the correspondence between genes and the complex behavioral phenotypes observed in heterogeneous disorders such as DP is many-to-many rather than one-to-one. Hence, it is unlikely that a specific gene or set of genes exists for certain cognitive functions, including face-processing.

Understanding the underpinnings of DP is an important issue when it comes to the design of intervention programmes: Karmiloff-Smith and colleagues warn that apparently normal behavior in developmental disorders may be achieved by compensatory strategies that obscure underlying atypical processes (Karmiloff-Smith et al., 2002). In the context of face-processing this is evident in Williams Syndrome, a chromosomal disorder where face recognition skills are apparently normal (e.g., Wang et al., 1995), yet are underpinned by poor configural processing mechanisms (Karmiloff-Smith et al., 2004). It is also clear that individuals with DP develop complex and intriguing compensatory strategies that permit them to disguise their face recognition impairment in many real life scenarios (e.g., Yardley et al., 2008), and it remains unclear whether these techniques can sometimes obscure impaired processing strategies on behavioral tests of face and object processing. Thus, an important implication for the design of intervention programmes is that apparently specific cognitive deficits in developmental disorders do not necessarily imply a specific and localized site of neural impairment as has traditionally been observed in cases of adult brain damage.

This latter point has important implications for the notion that training should target the locus of functional impairment (see section Identifying the Functional Impairment). Several authors have attempted to interpret DP within the same theoretical framework that has traditionally been used for AP (e.g., Bruce and Young, 1986), and have used these findings to subsequently inform their rehabilitation programmes (e.g., Brunsdon et al., 2006; Schmalzl et al., 2008). However, some caution should be exercised when applying adult frameworks of normal functioning to developmental deficits. The traditional cognitive neuropsychological approach adopts the logic that implications about cognitive structure can be derived from the patterns of behavioral impairment that are observed in adults with acquired brain damage; for instance, the assumption that particular cognitive systems have modular structures allows for the possibility that highly selective patterns of impairment implicate relative independence of different sub-processes. Interpretation of apparently similar patterns of deficits in developmental disorders is tempting, particularly as one might infer that specific impairments in acquired and developmental cases correspond to acquired damage to a particular module in the former, and failure to develop that module in the latter (notably, Temple, 1997, offers just such a characterization for cases of DP). Yet, this inference is controversial: some researchers have argued that development itself violates the basic assumptions of classic cognitive neuropsychological models, and that there is no reason to suppose that abnormalities in development lead to the production of a cognitive system that simply maps onto the fully developed system (Bishop, 1997; Karmiloff-Smith, 1997).

Alternative explanations for DP may be found in the neurodevelopmental theories described in section Introduction. For example, one might assume that the basic apparatus for the face-processing system is present, but an abnormality in development has prevented these brain areas from becoming specialized for faces. One theory that adopts this notion is the amygdala/fusiform modulation model (Schultz, 2005), which proposes that the preference for face-like stimuli seen in newborn infants is underpinned by functions in the amygdala that draw attention to social stimuli. This increased social attention is thought to consequently provide the scaffolding that supports social learning and modulates activity in the critical face-processing area of the brain, the fusiform gyrus (see **Figure 2**). This model has been used to explain the underpinnings of face-processing and socioemotional deficits in autism spectrum disorder (ASD), based on the premise that faces have less emotional salience for these individuals.

The theory that face-processing deficits in ASD stem from a lack of social interest in faces has informed the development of face training programmes, such as the *Let's Face It* package (Tanaka et al., 2003). *Let's Face It* is a series of computerized games that target the child's ability to attend to faces, in addition to identity and expression recognition skills. Some gains have been noted in ASD participants following participation in the programme (Tanaka et al., 2010), although it is unlikely that similar gains would result in DP given the proposed visuo-cognitive rather than socio-attentional underpinnings of the condition (e.g., Duchaine et al., 2010). Although we do not have a clear understanding of the actual underpinnings and developmental trajectory of DP, the evidence from the ASD literature suggests that intervention can initiate specialization within a crude face-processing system, and that there may be potential for remedial training techniques in developmental conditions.

### **COMPENSATORY OR REMEDIAL TRAINING?**

The more general neurodevelopmental literature casts doubt on the potential for remedial training in developmental disorders. For instance, Thomas (2003) concludes that only compensatory changes can take place in developmental disorders, as underlying abnormalities are built into the relevant neural structures preventing experience-dependent plasticity. De Haan (2001) presents an example of this argument using a group of individuals with ASD, none of whom could categorically perceive facial expressions. Yet, only those participants with lower IQs appeared to be impaired on an expression-recognition task, indicating that the individuals with higher IQs were using compensatory strategies to achieve good recognition by other means. She therefore allows that there is "a degree of plasticity in the developing system that allows for development of alternative strategies/mechanisms in face-processing" (p. 393), but little to no opportunity for remediation.

In the DP literature, there have been two attempts to improve face recognition via compensatory strategies, and two to remedy normal face-processing strategies (see **Table 1**). First, Brunsdon et al. (2006) attempted to improve face recognition skills in an eight-year-old child (AL), who had problems perceiving and recognizing faces. The researchers gave AL a set of 17 personally known faces (i.e., those of friends and family) to learn on stimulus cards, while his attention was drawn to distinguishing features of the faces. AL continued training until he recognized all the faces in four consecutive sessions, which occurred after 14 sessions within a 1-month period. A similar technique was adopted by Schmalzl et al. (2008), in their work with K, a four-year-old girl with DP. K achieved 100% accuracy in four consecutive sessions after nine attempts at training, and eye movement recordings indicated that she spent a longer time viewing the inner facial features after training. Both children reported benefits to their everyday recognition of the trained faces, although the benefits of training did not generalize to untrained faces in AL (generalization was not tested in K).

On the other hand, DeGutis et al. (2007) described a remedial training programme that suggests normal networks can be strengthened in DP. They report the case of an adult with DP, MZ, who had severe impairments in face perception. The training task was administered over 14 months in two separate intervals. Training required MZ to perform a perceptual classification task repeatedly over large numbers of trials. Specifically, facial stimuli were adjusted to vary in 2 mm increments according to eyebrow height and mouth height. MZ was required to classify each face into one of two categories: those faces with higher eyebrows and lower mouths, and those faces with lower eyebrows and higher mouths. After training, behavioral evidence indicated that MZ's face-processing ability improved on a range of behavioral tasks. However, the most pertinent findings of the study came from changes in neurophysiological measures that were taken before and after training. Specifically, the authors used electroencephalography to investigate whether MZ displayed a selective N170 response for faces compared with watches. Although this face-selective component was not evident before training, its selectivity after training was normal. Further, levels of functional connectivity between key areas of the neurological face-processing system (see **Figure 1**) were increased after training. The authors suggested the training task was likely successful because it allowed MZ to become sensitive to spacing differences around the eye region and nose/mouth region and encouraged her to integrate the spacing of these features into a coherent representation of the face. This gain was specific to training with upright faces: 8000 training trials with inverted faces improved MZ's ability to classify inverted faces but did not improve her performance with upright faces. However, there are some important caveats to these findings. MZ showed limited maintenance of training gains: she reported that the behavioral benefits faded after a few weeks without training, and post-training measures showed that her face-specific N170 had reverted to its pre-training lack of face sensitivity after 15 weeks without training. Notably though, when the authors attempted to retrain MZ 15 weeks after training stopped, fewer trials were required than in the initial training to restore her improved performance on the assessment tests.

These findings were given weight by DeGutis et al. (2014), who showed that holistic processing improved in 13 out of 24 DPs who completed the same training programme over a 3-week period. Interestingly, the DPs who responded better to training only differed from those who achieved little gain according to the CFMT (a test of face memory: Duchaine and Nakayama, 2006) and not tests of face perception. In fact, the DPs who responded most to training were initially poorer at the CFMT (i.e., their prosopagnosia was more severe), although this comparison was not significant when a *post-hoc* correction was applied.

In sum, while at least some success was achieved in all four DP studies reported to date, it is difficult to draw general conclusions on the utility of each technique, particularly given the differences in age between the participants. The next section evaluates the factors that may have influenced treatment outcome in the studies described above.

#### **OTHER INFLUENCES ON TREATMENT OUTCOME IN DP**

In the AP literature, a number of authors have argued that level of impairment in prosopagnosia is an important factor in treatment outcome, and particularly that prosopagnosia arising from perceptual deficits is more resilient to intervention and generalization (Wilson, 1987; Ellis and Young, 1988; Francis et al., 2002). Although it is currently unclear whether DP can also be partitioned into different functional subtypes, some individuals with DP do appear to present with deficits in face perception, whereas others do not (e.g., Bate et al., 2009). Interestingly, the two compensatory training studies used children who did have impairments in face perception, and while there was little evidence of generalization to other faces (analogous to the findings in the AP literature), the gains did translate to everyday life. These studies demonstrate that, in DP, the recognition of a set of familiar face photographs can be improved with relatively little but precisely targeted training, even in the context of severe face perception impairments. Perhaps more strikingly, everyday gains were also noted in the individual reported by DeGutis et al. (2007), who also had a severe face perception impairment. This finding indicates that it is possible to apply remedial programmes to individuals with perceptual impairments, at least in adults with DP. Critically, DeGutis et al. (2014) found that larger training gains appear to be associated with poorer face recognition performance, and were not related to perceptual abilities.

Given that DeGutis et al.'s (2014) remedial training programme was not successful in all DPs, it is likely that different subtypes of the condition are better suited to particular training methods. As only one (unsuccessful) remedial programme has been trialed with an AP participant, it remains unclear whether (a) DP is simply easier to treat than AP using remedial training, (b) perceptual deficits are not as severe in DP as in AP, (c) the methods used in the DP studies are simply more effective than those employed in the AP studies, or (d) the nature of the lesion in the AP participant precluded any improvement regardless of intervention strategy.

One might also question the influence of age in the DP studies (see section Timing of Injury). From the available evidence it is very difficult to draw any conclusions on the suitability of remedial or compensatory training for different age groups, given the former were only carried out in adults, and the latter in children. However, the studies reported by DeGutis and colleagues indicate that plasticity is retained in adult DPs, and provide encouraging evidence for the use of remedial programmes even in adulthood. Whether the same benefits will be greater in children is unknown, but Dalrymple et al. (2012) briefly describe a DP child, TM, for whom remedial training was not successful. The authors note several explanations for this, including the severity of his prosopagnosia, the intensity of training, and motivational factors (the training was quite tedious). It is clear that, although successful training strategies are beginning to emerge in adult studies, these strategies will need to be adapted and made age-appropriate for children, even if they target similar mechanisms.

If early intervention is critical in DP (before the development of unhelpful compensatory strategies and the passing of any critical periods), research needs to focus on early detection of the condition. Bradshaw (2001) argues that the consequences of atypical development may not be observable on a behavioral level for some time after they have occurred, indicating that urgent work is required to establish the developmental trajectory of DP, and its biobehavioral markers and risk factors.

## **FURTHER CONSIDERATIONS OF INTERVENTION PROGRAMMES**

#### **SPECIFICITY OF TRAINING**

It is clear from the above discussion that the most successful training programmes (whether compensatory or remedial) are those that target the impairment itself. In particular, the studies reported by DeGutis et al. (2007, 2014) indicate that training in holistic processing (a mechanism that is believed to be disrupted in both AP and DP) may be particularly fruitful. Pertinently though, it is possible to target such mechanisms using both facial (e.g., Maurer et al., 2002) and non-facial (e.g., Navon, 1977) stimuli. Such findings have important implications for training, given evidence that intervention using non-facial holistic processing techniques may not be beneficial for individuals with prosopagnosia. For instance, as mentioned in section Compensatory or Remedial Training?, training with inverted face stimuli did not improve performance with upright faces in a participant with DP (DeGutis et al., 2007). A similar finding was reported in a study that attempted to train neurotypical participants in holistic processing using inverted faces (Robbins and McKone, 2003). While it is unclear exactly why this effect occurs, it is possible that training with inverted faces simply does not improve holistic processing strategies, and instead encourages processing strategies that are optimal for the recognition of inverted but not upright faces (Farah, 1996; Kanwisher, 2000). Alternatively, it may simply be that there is a limit to the amount of transfer that is possible in perceptual learning, and upright faces are just too different from inverted faces for any gains to generalize (Fahle, 2005).

Perhaps the most striking demonstration of the need for face-specific training comes from a study reported by Behrmann et al. (2005). These authors describe the case of SM, a 24-year-old man with visual agnosia and concomitant prosopagnosia following damage to the right anterior and posterior temporal regions, corpus callosum, and left basal ganglia. The authors trained SM to recognize Greebles (novel objects that require the integration of different "features" composed of complex shapes; Gauthier and Tarr, 1997) over a 31-week period. As has been observed in previous studies (e.g., Gauthier and Tarr, 1997; Duchaine et al., 2004), SM showed a significant improvement in recognizing Greebles that also extended to untrained stimuli and common objects. However, his face recognition skills became even more impaired following training. When this became evident, the authors stopped the training programme and concluded that residual neural tissue with limited capacity may compete for representations. These findings indicate that, at least in the case of holistic processing, any attempts to remediate prosopagnosia must utilize faces in order to be effective.

#### **GENERALIZATION, MAINTENANCE AND TRANSFER**

Failure to elicit treatment generalization both to untreated items and also to alternative versions of the treated items has been common in the treatment of visual recognition difficulties, for both objects and faces (see Riddoch and Humphreys, 1994). In the AP studies that showed some success, there was only evidence of generalization in the study reported by Francis et al. (2002). In fact, these authors concur with Ellis and Young (1988) that level of impairment is an important factor in remediation outcome and particularly findings of generalization. Francis et al. (2002) propose that person-specific generalization in their study within the treated group of photos (i.e., generalization of trained images to other images of the same person) may have been related to the fact that NE did not exhibit perceptual deficits. They propose that failures to achieve this type of generalization in other cases may relate to difficulties earlier in face-processing and particularly at a perceptual level (Ellis and Young, 1988).

However, a different pattern emerges in the DP literature. The one study that assessed generalization of the compensatory training programme within laboratory-based assessments found no evidence of generalization to untrained faces, although AL did show benefits for different images of the trained faces (Brunsdon et al., 2006). However, response latencies were unusually long in AL, suggesting implementation of the strategy was labored. This observation is akin to the report of NE (Francis et al., 2002), who also received benefits from compensatory training, but found the strategies were often inefficient to implement in everyday life. Nevertheless, both AL and K (Schmalzl et al., 2008) reported improved recognition of the trained individuals in everyday life, and the gains were maintained at 3-month and 4-week follow-ups, respectively. K was also described in Wilson et al. (2010) when she was 7.5 years old, and continued maintenance of the gains was reported (but note that the authors suggest K may be on the autism spectrum). These observations suggest that in DP compensatory training may be rapid, suitable for adults and young children, suitable for individuals with perceptual impairments, and the gains may translate to everyday life (but only for trained faces) and be maintained.

On the other hand, the remedial holistic training programme reported by DeGutis et al. (2007, 2014) also generalized to improvements in everyday face recognition (i.e., the gains were not restricted to the faces used in training), as evidenced by self-report diaries kept by the participants. However, MZ showed limited maintenance of training gains (DeGutis et al., 2007), which raises the possibility that while remedial training may bring about greater and more generalized gains, these benefits may quickly fade without continued rehearsal. Furthermore, training in the larger group study was only successful in 13 of the 24 participants, and was not linked to pre-training performance on perceptual tests. This indicates that gains from remedial training can vary significantly between individuals, and a more complex set of factors may influence treatment outcome.

#### **INDIVIDUAL DIFFERENCES**

Much evidence indicates that age may be an important variable in predicting success in neurorehabilitation. Although no clear patterns can currently be seen in the prosopagnosia literature, it is likely that participant age may dictate the choice of training technique. For example, although the DP studies indicate that compensatory training can be effective even in children, the case of TM (Dalrymple et al., 2012) raises the possibility that remedial training techniques are simply not age-appropriate. Given that the broader neurorehabilitation literature suggests that remedial training should be more effective in children, future work needs to develop adaptations of remedial programmes for specific age ranges.

The wider neurorehabilitation literature also suggests that other individual differences can influence intervention outcome, although it is too early to comment on whether these hold true for prosopagnosia. For instance, there is controversial evidence that gender predicts recovery from acquired damage in adulthood (Anderson et al., 2001), as hormones may cause the female brain to develop more rapidly and with a more diffuse organization, perhaps permitting greater plasticity and potential for reorganization of function (Strauss et al., 1992; Kolb, 1995).

In addition, individuals with higher intelligence and superior education are less affected by brain damage (Wilson, 2003), and Anderson et al. (2001) conclude that family function, socioeconomic status, access to rehabilitation, and response to disability all make a powerful contribution to recovery. In the longer-term, it is environmental rather than organic factors that tend to predict recovery from acquired brain damage (e.g., Kolb, 1995). Hence, these factors may influence the outcome of rehabilitation studies, and should be taken into account when evaluating intervention success.

## **FUTURE DIRECTIONS**

Clearly, future work needs to explore both compensatory and remedial training strategies in more depth, and match their suitability to both AP and DP, their potential subtypes, and properties of the individual participant. Future work should also investigate participants' emotional response to interventions: for example, whether training programmes can lead to negative outcomes (e.g., frustration or feelings of low self-worth if they are ineffective), and how these compare to the relatively modest behavioral gains reported to date. Future studies may also move beyond purely behavioral interventions: given that large gains in everyday face recognition have not been reported following any type of training, alternative methodologies may offer more fruitful means of boosting face recognition skills in prosopagnosia. Two methodologies in particular have the potential to supplement face training programmes: intranasal inhalation of oxytocin and non-invasive brain stimulation.

Recent evidence suggests that intranasal inhalation of oxytocin can temporarily improve face recognition skills in both typical participants and those with DP. Oxytocin is a neuropeptide that affects social cognition, potentially by increasing the perceptual salience of social cues (Bartz et al., 2011). Several studies of neurotypical populations have found better memory for faces (but not other, non-social stimuli) following inhalation of oxytocin (Guastella et al., 2008; Savaskan et al., 2008; Rimmele et al., 2009). More notably, a recent study found that participants with DP showed better performance on both a face matching and a face memory task following inhalation of oxytocin, compared with a placebo condition (Bate et al., 2014). Currently it is unclear why people with DP benefit from inhalation of oxytocin. On a neural level, findings from participants with typical face recognition suggest that oxytocin modulates activity in several regions implicated in face processing—namely, the FFA and the amygdala (Domes et al., 2010; Gamer et al., 2010). DPs show structural and connectivity abnormalities in the core face-processing system, around the fusiform and temporal gyri (Garrido et al., 2009) and within the ventro-occipital cortex (Thomas et al., 2009). Therefore, it is possible that oxytocin-related modulation of activity in these areas could underpin increased face recognition performance for the DPs in Bate et al.'s (2014) study. However, further work incorporating neuroimaging of DPs under oxytocin conditions is necessary to explore this possibility.

Inhalation of oxytocin has been found to increase fixations to the eye region of the face in typical participants (Guastella et al., 2008; Gamer et al., 2010). The eye region is considered optimal for face recognition (Peterson and Eckstein, 2012), and several studies have found that DPs spend less time looking at the eye region than typical controls (e.g., Schwarzer et al., 2007). It is possible that oxytocin encouraged DP participants to attend to the eye region more than usual, which may have increased their performance in face-processing tasks. Once again, further work using eye-tracking technology is necessary to explore this possibility. Future work may consider combining inhalation of oxytocin with behavioral training in an attempt to increase or speed up training gains, and/or to extend the benefits of oxytocin inhalation beyond a single session.

Another class of techniques that has been shown to improve face recognition performance, at least temporarily, is non-invasive brain stimulation. There are many types of non-invasive brain stimulation, but three in particular show promise for interventions in prosopagnosia: transcranial electric stimulation (incorporating transcranial direct current stimulation, or tDCS; and transcranial random noise stimulation, or tRNS) and galvanic vestibular stimulation (GVS). In transcranial electric stimulation, a weak current (usually 1–3 mA) is applied to the scalp via electrodes. tDCS involves the use of a constant current. Areas under the anode exhibit increased cortical excitability, whereas areas under the cathode show the opposite effect (Paulus, 2011). tDCS has been shown to improve performance in typical participants in a range of cognitive tasks, including low-level vision, executive functioning, memory, and language (Kuo and Nitsche, 2012). Notably, tDCS has also been used in stroke patients (generally those with aphasia), and, in concert with cognitive training, has been shown to improve speech and naming abilities (see Krause and Cohen Kadosh, 2013, for a review). This may occur because tDCS facilitates compensation in non-damaged regions, reduces activation in non-damaged regions that may inhibit activation in or around lesioned areas, or increases residual output of partially damaged areas (Cohen Kadosh, 2013). In other words, tDCS may be useful in conjunction with both remedial and compensatory training strategies, but choice of strategy and stimulation site (lesion area/contralateral lesion area) could vary from patient to patient, depending on the site and extent of damage. To date, tDCS has not been applied to prosopagnosia, or in face perception tasks in typical participants. However, Ross et al. (2010) found that anodal tDCS over the right anterior temporal lobe significantly improved name recall for famous faces in a group of young adults with typical face recognition, indicating that anterior temporal tDCS may be useful in mnemonic cases of AP or DP.

tRNS involves the use of a current that changes several hundred times per second, taking its value from a random noise distribution centered around 0 (Paulus, 2011). Because the current oscillates between the two electrodes, there is no anode or cathode, and the areas under both electrodes show enhanced cortical excitability (Cohen Kadosh, 2013). Like tDCS, tRNS has been shown to improve cognitive abilities in a range of domains, including motor and perceptual learning (Terney et al., 2008; Fertonani et al., 2011). tRNS also shows long-term effects: when combined with 5 days of cognitive training for numerosity or mental calculation, stimulation resulted in increased training gains that remained evident between 16 weeks and 6 months later (Cappelletti et al., 2013; Snowball et al., 2013). Like tDCS, tRNS has not been applied in AP or DP as yet. However, evidence from training studies in other domains suggests that combining cognitive training (such as the techniques used by DeGutis et al., 2014) with tRNS may enhance its effects, although work is needed to clarify which combination of training task and stimulation site is effective in various types of prosopagnosia.
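The contrast between the two waveforms described above can be made concrete with a small simulation. This is an illustrative sketch only: the Gaussian noise distribution, the amplitude values, and the function names `tdcs_current` and `trns_current` are assumptions chosen for the example, not stimulation parameters taken from the cited studies.

```python
import random
import statistics


def tdcs_current(n_samples, amplitude_ma=1.0):
    """Constant current: every sample has the same value (hypothetical 1 mA)."""
    return [amplitude_ma] * n_samples


def trns_current(n_samples, sd_ma=0.5, seed=0):
    """Random noise current centered on 0.

    Each sample is drawn independently; a Gaussian distribution and the
    0.5 mA standard deviation are assumptions made for illustration.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_ma) for _ in range(n_samples)]


dc = tdcs_current(10_000)
noise = trns_current(10_000)

# The constant current keeps one polarity throughout, so one electrode
# always acts as the anode and the other as the cathode.
print("tDCS mean (mA):", statistics.mean(dc))

# The noise current averages out near zero and takes both signs, so
# neither electrode is a fixed anode or cathode.
print("tRNS mean (mA):", round(statistics.mean(noise), 3))
print("tRNS takes both polarities:", min(noise) < 0 < max(noise))
```

Because the simulated tRNS signal has no net direction, the sketch is consistent with the point that both electrode sites are treated alike under this form of stimulation.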

GVS resembles tDCS of the vestibular nerve—electrodes are placed on the mastoid bones, which stimulates the vestibular nerve and, in turn, all vestibular relay stations upstream. fMRI studies have revealed that GVS activates a wide range of cortical areas including several associated with face-processing (e.g., the superior temporal gyrus and temporo-parietal cortex; Bense et al., 2001). Only one study has examined GVS in face recognition: Wilkinson et al. (2005) applied GVS to patient RC, who acquired prosopagnosia following damage to the right temporal lobe (amongst other areas). Short sessions of GVS improved RC's face discrimination performance to above-chance levels. However, the discrimination task was not strictly identity-matching—RC was required to choose a face that did not have its eyes and mouth inverted, rather than to choose between two typical faces. As such, it is difficult to say whether the stimulation simply improved detection of abnormalities in a face, or whether the effects would carry over to other face processing tasks (e.g., face memory). Once again, further work is necessary to confirm whether GVS may also be beneficial for DPs, or in other cases of AP with different lesions or functional profiles.

## **SUMMARY**

In sum, while there have been few attempts to improve face recognition skills in either AP or DP, some tentative conclusions can be drawn from the available data and the wider neurorehabilitation literature. First, there is evidence to suggest that both forms of the condition respond to compensatory training, and that some adults with DP benefit from remedial training (although currently it is unclear precisely why some participants show benefits, whereas others do not). It is also unclear whether remedial programmes may be useful in AP, and in children with DP. While the benefits of compensatory training programmes appear to be that they are suitable for both adults and children and their gains are more long-lasting, they also promote more labored processing strategies that are less likely to generalize to the recognition of untrained faces. On the other hand, remedial training techniques may promote more efficient "normal" processing strategies that are more likely to generalize to untrained faces, yet it takes more training to achieve these gains and they require continued rehearsal.

Given that there have been very few studies in this area, further research into the duration, maintenance, and long-term benefits of remedial and compensatory training is necessary. It is likely that the suitability of these programmes for different individuals will have a complex interaction with age, the type of injury in acquired cases, the severity and nature of the prosopagnosia, and other environmental influences. In any case, gains are likely to be mild-to-moderate, and the utility of alternative methodologies (i.e., oxytocin inhalation or brain stimulation) should be considered. It is important to note that use of these techniques is in its infancy, and while single applications may bring about short-term gains in face recognition skills, there are likely to be significant safety considerations associated with everyday application of the techniques. Alternatively, performance of remedial training under oxytocin or stimulation conditions may bring about larger and longer-term benefits than the behavioral programme alone. Future work using more systematic methods and larger participant groups is clearly required, and in the case of DP, there is an urgent need to develop early detection and remediation tools for children in order to optimize intervention outcome.

## **REFERENCES**


circuits with prolonged modes of activation within both the peri-infarct zone and distant sites. *J. Neurosci.* 29, 1719–1734. doi: 10.1523/JNEUROSCI.4249-08.2009


An MRI and PET study and a review of the literature. *Neuropsychologia* 32, 893–902. doi: 10.1016/0028-3932(94)90041-8


prosopagnosia. *Cogn. Neuropsychol.* 27, 30–45. doi: 10.1080/02643294.2010.490207


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 17 June 2014; published online: 23 July 2014.*

*Citation: Bate S and Bennetts RJ (2014) The rehabilitation of face recognition impairments: a critical review and future directions. Front. Hum. Neurosci. 8:491. doi: 10.3389/fnhum.2014.00491*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Bate and Bennetts. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Face processing improvements in prosopagnosia: successes and failures over the last 50 years

## *Joseph M. DeGutis<sup>1,2</sup>\*, Christopher Chiu<sup>1</sup>, Mallory E. Grosso<sup>1</sup> and Sarah Cohan<sup>2</sup>*

*<sup>1</sup> Boston Attention and Learning Laboratory, VA Boston Healthcare System, Jamaica Plain, MA, USA*

*<sup>2</sup> Vision Sciences Laboratory, Department of Psychology, Harvard University, Cambridge, MA, USA*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*Arun Bokde, Trinity College Institute of Neuroscience, USA; Janina Esins, Max Planck Institute for Biological Cybernetics, Germany; Isabelle Bülthoff, Max Planck Institute for Biological Cybernetics, Germany*

#### *\*Correspondence:*

*Joseph M. DeGutis, Geriatric Research Education and Clinical Center, VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130, USA; e-mail: degutis@wjh.harvard.edu*

Clinicians and researchers have widely believed that face processing cannot be improved in prosopagnosia. Though more than a dozen reported studies have attempted to enhance face processing in prosopagnosics over the last 50 years, evidence for effective treatment approaches has only begun to emerge. Here, we review the current literature on spontaneous recovery in acquired prosopagnosia (AP), as well as treatment attempts in acquired and developmental prosopagnosia (DP), differentiating between compensatory and remedial approaches. We find that for AP, rather than remedial methods, strategic compensatory training such as verbalizing distinctive facial features has been shown to be the most effective approach (despite limited evidence of generalization). In children with DP, compensatory training has also shown some effectiveness. In adults with DP, two recent larger-scale studies, one using remedial training and another administering oxytocin, have demonstrated group-level improvements and evidence of generalization. These results suggest that DPs, perhaps because of their more intact face processing infrastructure, may benefit more from treatments targeting face processing than APs.

**Keywords: acquired prosopagnosia, developmental prosopagnosia, recovery, rehabilitation, treatment, cognitive training**

## **INTRODUCTION**

Prosopagnosia is a deficit in the ability to perceive and recognize faces, and most commonly results from genetic/developmental causes (developmental prosopagnosia may affect up to 1 in 40 people in the general population; Kennerknecht et al., 2006, 2008). More rarely, prosopagnosia is caused by acquired brain injury that damages occipital-temporal or anterior temporal regions (Barton, 2008). Though developmental and acquired prosopagnosics may have more or less severe perceptual deficits, they all generally have difficulties with building a rich holistic face representation sufficient for face identification (Bukach et al., 2006; Ramon et al., 2010; Avidan et al., 2011; Palermo et al., 2011; DeGutis et al., 2012b). Instead, prosopagnosics attempt to learn and recognize faces using a less effective piecemeal approach, or rely on nonfacial cues such as voice and clothing. Reliance on these alternative methods leaves prosopagnosics with significant recognition deficits that may lead to a restricted social circle, more limited employment opportunities, and loss of self-confidence (Yardley et al., 2008). Because of these potentially debilitating consequences and the high prevalence of prosopagnosia, developing treatments to enhance face recognition is a valuable endeavor.

A widely held belief among clinicians and researchers is that prosopagnosics cannot significantly improve their face processing ability. As recently as 2005, Coltheart suggested that "there may be domains of cognition for which an impairment caused by brain damage is such that restoration of normal processing is impossible. It is conceivable that face processing is one such domain." Coltheart goes on to suggest that this may be because "face processing depends on a specific brain region and this region may have a particular kind of structure that is specialized for the specific types of computations needed for recognizing the unique stimulus that faces are" (Coltheart et al., 2005). The acquired prosopagnosia (AP) literature somewhat reinforces Coltheart's claim, though more recent studies of developmental prosopagnosia (DP) (including two from Coltheart's group: Brunsdon et al., 2006; Schmalzl et al., 2008) suggest that improvement in some aspects of face processing, even at the group level, is indeed possible. In the current article, we first review the AP recovery and treatment literature and consider explanations of limited treatment-related improvements. We then review the more promising treatment-related improvements observed in DPs and discuss explanations for differences between developmental and acquired prosopagnosics.

#### **METHOD OF SEARCH AND SELECTION CRITERIA**

Using PubMed, Google Scholar, and Web of Science as search engines, we searched for articles using the keyword "prosopagnosia" in conjunction with each of the following keywords: "recovery," "training," "treatment," "therapy," "rehabilitation," "improvement," "enhancement," "amelioration," "restoration," and "compensation." We included both peer-reviewed empirical articles and book chapters and focused our search on prosopagnosia due to acquired brain injury and DP (which includes congenital prosopagnosia). However, we excluded studies where prosopagnosia was a symptom of a more global deficit such as in cases of neurodegenerative disease (Cronin-Golomb et al., 2000; Turan et al., 2013) and autism spectrum disorder (Weigelt et al., 2012).
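The keyword pairing described above can be sketched in a few lines for readers who wish to reproduce the search. This is a minimal sketch, not the authors' procedure: the `AND` query syntax and the helper name `build_queries` are assumptions, since the text states only that the keywords were used "in conjunction."

```python
from itertools import product

# Primary and secondary keywords exactly as listed in the text.
PRIMARY = "prosopagnosia"
SECONDARY = [
    "recovery", "training", "treatment", "therapy", "rehabilitation",
    "improvement", "enhancement", "amelioration", "restoration",
    "compensation",
]

# Search engines named in the text.
ENGINES = ["PubMed", "Google Scholar", "Web of Science"]


def build_queries(primary=PRIMARY, secondary=SECONDARY):
    """Pair the primary keyword with each secondary keyword.

    The boolean AND form is an assumption made for illustration; each
    search engine has its own query syntax.
    """
    return [f'"{primary}" AND "{term}"' for term in secondary]


queries = build_queries()

# One search per (engine, query) pair.
searches = list(product(ENGINES, queries))

for engine, query in searches[:3]:
    print(f"{engine}: {query}")
print(len(queries), "queries per engine,", len(searches), "searches in total")
```

Listing the pairs explicitly makes the scope of the search easy to audit: ten keyword combinations run across three engines.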

## **SPONTANEOUS RECOVERY IN ACQUIRED PROSOPAGNOSIA**

Studies of spontaneous recovery in AP are useful in that they can help determine the potential for the face processing system to naturally improve after damage, and can shed light on the possibilities for treatment-related improvements. As can be seen in **Table 1** and **Figure 1**, our search revealed seven studies that assessed spontaneous recovery in AP, four of which suggest that recovery of face recognition abilities is possible. The first study to report recovery is a case of a 20-year-old man who experienced prosopagnosia after falling from a horse and suffering bilateral, though predominantly left-sided, occipital-temporal contusions (Glowic and Violon, 1981). Remarkably, from 4 months postinjury to 1 year, the patient reported a full recovery in his face processing abilities. Unfortunately, because no neuroimaging data are presented, it is difficult to know if this recovery was due to healing of the peripheral vasculature and support structures (e.g., reduced inflammation) or reorganization of the brain. Lang et al. (2006) provide more convincing evidence of neural reorganization, reporting full recovery after 6 months in an 89-year-old prosopagnosic woman with damage to right occipital-temporal regions. Interestingly, a post-recovery functional MRI revealed exclusive activation of the left fusiform face area (FFA) rather than the more typical right FFA activation when viewing faces, suggesting possible reorganization of face processing to homologous regions in the left hemisphere. Though these cases of full recovery are notable, they are somewhat limited by their reliance on the patients' self-report.

Using more objective tests of face perception and memory, Malone and colleagues described partial recovery in two acquired prosopagnosic patients with bilateral occipital lesions (Malone et al., 1982). One patient (64-year-old male), who was first assessed 10 weeks after symptom onset and again 12 weeks later, demonstrated improved recognition of familiar faces but not improved perceptual discrimination of unfamiliar faces. Another AP (26-year-old male) was first assessed for prosopagnosia 1 week after an acquired brain injury due to a gunshot wound, and again 6 weeks post-surgery. He showed improved perceptual discrimination but no improvement on familiar face recognition. These two cases suggest that even with relatively similar lesions, recovery of face perception and recovery of face memory are dissociable and may represent two distinct targets for treatments.

In a fairly large group study of right hemisphere stroke survivors, Hier et al. (1983) reported that of 19 right hemisphere stroke patients suffering from prosopagnosia (according to performance on a famous faces test), 50% recovered after 9 weeks and 90% recovered after 20 weeks. Despite the relatively large number of patients in this study, a major limitation is that it relied exclusively on a famous faces test for diagnosis and tracking of prosopagnosia. Because they did not account for pre-morbid familiarity, this may have inflated the incidence of prosopagnosia and, because of potential practice effects, exaggerated the degree of natural recovery. An additional issue is that the group lesion overlap was centered in the temporal-parietal junction, which is significantly superior to occipital-temporal lesions typically associated with AP. Thus, these high recovery rates may not generalize to more typical cases of AP.

*T1, First testing session at specified time after injury; T2, Final testing session after injury.*

In contrast to these four studies showing evidence of recovery, three studies of patients with bilateral occipital-temporal lesions failed to find evidence of recovery. Comparing assessments made 2 weeks after brain injury in a 22-year-old prosopagnosic with assessments made 40 years later, Sparr et al. (1991) did not find any evidence of recovery on an informal famous faces task. Ogden (1993) similarly failed to find evidence of any improvements of face processing functions in her study of a 24-year-old AP who was first tested about 2 months after injury and then 6 years post-injury. Finally, Spillmann et al. (2000) assessed their patient 15 months after stroke and then 3 years later, and likewise found no recovery.

Collectively, these studies provide evidence that some recovery from AP is possible in certain patients. Considering the positive results of the patients with unilateral lesions (Glowic and Violon, 1981; Hier et al., 1983; Lang et al., 2006) along with the lack of recovery in patients with bilateral occipital-temporal damage (Ogden, 1993; Spillmann et al., 2000), it seems that unilateral lesions may have the best prognosis for recovery. Bilateral lesions likely damage homologous core face processing regions such as the occipital face area (OFA), FFA and the posterior superior temporal sulcus (pSTS) (Haxby et al., 2001), which may destroy key nodes in the face processing network (see more on this below). This is consistent with the observation that APs with bilateral damage generally have more severe face recognition deficits than those with unilateral damage (Barton, 2008). We did not find that recovery varied by age, gender, or handedness. Additionally, although it is likely that there is a graded window of recovery for AP that is similar to other acquired visual disorders (Zihl, 2011), besides the Hier study we did not find strong evidence that those initially assessed earlier showed more recovery. However, due to the small number of studies and the variability across studies in methods of prosopagnosia diagnosis and time points used to assess recovery, the conclusions we can draw are limited.

In spite of these limitations, these studies suggest that the face processing system may have some capacity for neural reorganization after damage and leave open the possibility that treatments could significantly enhance face processing, potentially more for APs with unilateral lesions.

## **COMPENSATORY TREATMENT APPROACHES IN ACQUIRED PROSOPAGNOSIA**

Several compensatory treatment attempts have been made to try to alleviate symptoms in AP, as seen in **Table 2** and **Figure 1**. These treatments seek to teach patients ways to work around their face recognition deficits, either by using intact systems in the domain of perceptual face processing (e.g., attending to facial features), semantic processing (e.g., encoding a face in conjunction with details about the person's profession), using verbal strategies (e.g., verbalizing distinct facial features), or using intact implicit face recognition mechanisms. About half of these studies show some benefits (Beyn and Knyazeva, 1962; Polster and Rapcsak, 1996; Francis et al., 2002; Mayer and Rossion, 2007), though it is still an open question how much these treatments generalize<sup>1</sup> beyond the faces used in the specific training programs.

The first reported attempt at enhancing face recognition in prosopagnosia was by Beyn and Knyazeva (1962) who presented a 39-year-old patient (C.H.) with severe deficits in recognizing familiar faces, likely from bilateral occipital-temporal damage. Through systematic practice of faces with special attention to facial features and expressions, as well as practice copying faces, Beyn reported that C.H. showed some improvements in recognizing faces in real-world circumstances. Although neither standardized methods of training nor objective tests were used, this study provides preliminary evidence that attending to specific facial features may be beneficial in lessening face processing deficits.

<sup>1</sup>For the following treatment studies, we defined "generalization" as evidence of improvements in processing novel face stimuli that are different from the treatment intervention itself. Studies that do not have evidence of generalization by this definition could still have real-world implications (e.g., training to specifically better recognize a friend's face) but may be less useful than studies with evidence of enhancing more global aspects of face processing abilities.

Mayer and Rossion (2007) also showed some improvements using feature training in prosopagnosic P.S., a 52-year-old patient with damage to the regions involving the left fusiform gyrus (encompassing the left FFA) and right inferior occipital gyrus (encompassing the right OFA). They had P.S. verbally analyze internal facial features, progressing from (1) faces with caricatured features, to (2) unknown adult faces, to (3) unknown faces of children, and finally to (4) children in P.S.'s kindergarten class. P.S. was first asked to sort each set of faces based on a criterion feature (e.g., length of the mouth) and then to describe the distinctive internal feature for each face in the set. This strategy was then applied to her kindergarten class, where she made index cards of every child's distinctive internal facial features. After 4 months of training (two sessions per week), she improved at recognition of pictures of her students and reported relying more on internal features. Moreover, she could confidently stay with her students outside the school environment, suggesting some real-world training-related improvements.

Francis et al. (2002) also found some evidence for improvement after compensatory training in a 21-year-old (N.E.) with prosopagnosia and person-based semantic deficits due to primarily right, possibly bilateral, temporal lobe damage from herpes encephalitis. When comparing several compensatory face learning strategies, they found that the encoding approaches that targeted both semantic impairments and face processing deficits were the most effective—they not only improved recognition of unfamiliar faces, but also faces of individuals familiar to the patient. Despite these promising results, the authors caution that N.E.'s face perception abilities were largely intact and the improvements they observed may not hold for acquired prosopagnosics with more severe perceptual deficits.

Powell et al. (2008) also showed some face recognition improvement after providing different encoding strategies to acquired prosopagnosic W.J., who had damage to left occipital, left frontal, bilateral temporal, and right occipital regions (McNeil and Warrington, 1993; Powell et al., 2008). Compared to being provided with semantic information along with the faces or encoding faces with caricatured features, instructing the patient to attend to distinctive features (e.g., This is Victoria, she has large eyes and freckles) improved facial recognition the most. This provides additional evidence that attending to distinctive features can be a useful compensatory aid to face learning in APs.

Though these studies reported evidence of improvements and positive impacts on everyday life, other studies using compensatory feature and semantic training in APs have found very limited improvements (Polster and Rapcsak, 1996) or failed to find any improvements (Wilson, 1987). In a 68-year-old AP male (R.J.) with right occipital-temporal damage and semantic impairments, Polster and Rapcsak (1996) compared several encoding instructions while R.J. attempted to learn new faces, shown from front-views. Among rating features (e.g., narrow-set vs. wide-set eyes), rating personality traits (e.g., lively vs. dull), identifying a distinctive feature (e.g., verbalize most distinctive feature), and attaching semantic information, encoding by rating personality traits and attaching semantic information yielded the most improvements during recognition of the same front-view versions of the faces. Unfortunately, these improvements did not generalize to improvements at recognizing novel ¾ views of these faces, suggesting that the information being learned was view-specific and may be of limited use in real-world settings. In another discouraging attempt, Wilson (1987) had a 27-year-old prosopagnosic with right temporal-parietal damage practice face recognition by attaching concrete visual images to each face and miming the image (e.g., This face is Sue: think of "soup" and mime eating soup). On each of the 11 test assessment sessions, performance did not demonstrate any appreciable improvement with either strategy.

Another compensatory approach with somewhat discouraging outcomes is the use of covert face recognition abilities, shown to be intact in some APs (though not all APs, see Barton et al., 2001), to improve overt recognition (i.e., provoked overt recognition). According to Burton's interactive activation and competition model of face recognition (Burton et al., 1990), covert recognition in APs arises from weak connectivity between face recognition units and person identity nodes (PINs), resulting in less activation of the PINs. The logic is that by incorporating semantic information (e.g., an individual's profession) while seeing someone's face, the activation of the PINs necessary for overt recognition could be strengthened, leading to improved recognition in APs. For example, Sergent and Poncet (1990) showed eight faces of famous politicians to acquired prosopagnosic P.V., who had damage to left anterior temporal and right temporal-parietal regions. Though P.V. was unable to identify the faces, once the experimenter said that they all had the same occupation, she correctly guessed they were politicians and was able to identify seven out of eight faces. De Haan et al. (1991) replicated this effect in a limited way in a 23-year-old patient (P.H.) using a slightly modified paradigm in which the experimenters provided the category of profession. Out of the six categories they tried, improvements were limited to a single category in which the faces were highly related (actors from a particular soap opera). P.H.'s ability to recognize these faces faded after 2 months. Though using covert recognition mechanisms to aid overt recognition is theoretically appealing and may be possible in particular situations for certain patients (for a review see Morrison et al., 2000), the findings have been too inconsistent to be useful for more general rehabilitation.

Together, the results of compensatory training attempts in APs provide hope, but also suggest that no single approach is appropriate for all APs. Even with the most generally successful approach of focusing on distinct facial features, there are cases where it failed to work or where the effects of training failed to generalize beyond the faces used in training. One issue with many of these studies is that they did not adequately measure generalization to different tasks and different faces. Incorporating these measures of generalization in future studies would be useful to better gauge the therapeutic benefits of these approaches. One interesting pattern that we observed is that compensatory treatments were *more* successful in patients with *bilateral* lesions (e.g., Mayer and Rossion, 2007; Powell et al., 2008) compared to those with *unilateral* lesions (e.g., Wilson, 1987; Polster and Rapcsak, 1996). This stands in contrast to the spontaneous recovery results above, and paradoxically suggests that those with more extensive lesions have more to benefit from compensatory approaches. Though this could be an anomaly from the small number of studies in this literature, it warrants further investigation.

In sum, the available evidence suggests that one should choose compensatory treatments that are specific to each AP's deficits (e.g., perceptual vs. more semantic deficits) and their residual abilities (e.g., ability to identify distinctive features or identify personality traits from faces) as well as use guidance from theoretical models of face recognition (Bruce and Young, 1986; Haxby et al., 2001). However, considering the variable results of this rather small literature, a thoughtful trial-and-error approach using several treatments may be the most successful method in implementing compensatory training with APs.

## **REMEDIAL TREATMENT APPROACHES IN ACQUIRED PROSOPAGNOSIA**

While compensatory training utilizes strategies to work around prosopagnosics' face recognition deficits, remedial training directly targets prosopagnosics' underlying deficits (i.e., holistic face processing) to promote more normal patterns of face processing. Despite evidence that face processing abilities can improve through recovery and compensatory training in some APs, there is currently no evidence that treatment approaches that attempt to directly remediate face processing in APs are effective (see **Table 2** and **Figure 1**).

Ellis and Young (1988) present a very thorough attempt to retrain face discrimination in an 8-year-old prosopagnosic child (K.D.) with diffuse brain damage caused by meningococcal meningitis. In particular, over an 18-month period, they provided K.D. with systematic face discrimination training and face-name learning with feedback. Their thought was that perhaps systematic practice with a finite set of faces in a controlled environment would improve some aspects of face processing. They found no evidence of improvements after either repeated discrimination of familiar and unfamiliar faces or discrimination of schematic faces that differed on one to four features. They also failed to find any evidence that K.D. could learn face-name pairs. A potential drawback to this study is that the daily intensity of training was relatively low (on average, K.D. performed ∼10 trials/day) and training was not sufficiently adapted to K.D.'s ability level (i.e. there were no face tasks that she could successfully complete at the beginning of training). This likely made the training tasks quite frustrating and discouraging. Even after considering that K.D. may have had reduced motivation, this study still provides evidence that the face processing system, once damaged, is not easily remediated even in a young, plastic brain.

More recently, DeGutis et al. (2013) used a higher intensity holistic face training program (30 sessions × 900 trials/session over 1 month) in a 46-year-old acquired prosopagnosic (C.C.) with a right occipital-temporal lesion. In particular, C.C. trained on a task in which she had to integrate configural information from the eye and mouth region to accurately categorize computer-generated faces into one of two arbitrary categories (faces with higher eyebrows and lower mouths are category 1, whereas faces with lower eyebrows and higher mouths are category 2). The logic was that these face judgments would be strategic and slow at first, and then with practice become faster and more holistic. Despite showing some modest improvements on the training task, C.C. did not show any appreciable generalization to assessments using novel faces (DeGutis et al., 2013). Notably, a smaller dosage of the same training program (15 vs. 30 sessions) has recently been shown to enhance aspects of face perception and subjective face recognition abilities in a group of developmental prosopagnosics (see below, DeGutis et al., 2014). The discrepancy between C.C.'s results and those of DPs could reflect that it is more difficult to remediate AP compared to DP, though additional attempts to remediate AP are necessary to confirm this. Together, these results show no evidence that approaches which attempt to remediate face processing in AP are successful.

## **OTHER TREATMENT APPROACHES IN ACQUIRED PROSOPAGNOSIA**

In addition to these compensatory and remedial approaches in AP, researchers have tried other means to improve face processing in APs. Wilkinson et al. (2005) used galvanic vestibular stimulation in a 61-year-old patient with AP from extensive damage to the right hemisphere, including the entire temporal lobe, inferior frontal gyrus, and superior parietal lobe. Their logic was that since face-selective brain regions are strongly activated by vestibular stimulation (Bense et al., 2001), electrical stimulation of the vestibular system may restore aspects of face perception. Electrical currents were administered via the left and right vestibular nerves during a forced choice face-matching task. Accuracy significantly improved from chance level to 70% after switching the stimulation polarity, either from right to left or from left to right (Wilkinson et al., 2005). These improvements could result from generally enhanced alertness/attention or from the vestibular system's effects on visuospatial perception (Wilkinson et al., 2008).

Using a different approach, Behrmann et al. (2005) tried to improve face processing in an AP by training within-category discrimination of face-like objects ("greebles," Gauthier and Tarr, 1997). Their logic was that greeble training would engage visual expertise mechanisms similar to those engaged by faces, and that stimulating these expertise mechanisms may enhance face perception. In particular, 24-year-old acquired prosopagnosic patient S.M., who suffered damage to his right anterior and posterior temporal regions, was trained to become a greeble expert over a period of 31 sessions (at least two sessions per week). Although the patient demonstrated marked improvements in recognizing greebles, he showed *more* impairment in facial recognition post-training, suggesting some potential competition between greeble processing and face processing. This study makes the important point that in order for an acquired prosopagnosic to improve at face processing, they likely have to train with faces.

## **WHY DO TREATMENTS PRODUCE RATHER LIMITED IMPROVEMENTS IN ACQUIRED PROSOPAGNOSIA?**

Together, the AP recovery and rehabilitation literature is consistent with Coltheart's view that the capacity to restore face processing abilities to normal levels is limited. However, there is evidence that at least some recovery is possible and that compensatory treatments can produce improvements, though it remains to be determined if these improvements generalize and if these strategies will be useful tools for APs in their everyday lives.

One explanation for the limited capability to restore normal face processing in AP is, as Coltheart (2005) suggests, that face processing relies on specific cognitive (e.g., holistic processing) and neural mechanisms (e.g., core face processing regions which include the FFA, occipital face area-OFA, and posterior superior temporal sulcus-pSTS). It could be that when these face-selective mechanisms are damaged, because of differences between face and object processing and the limits of neural plasticity, they cannot be taken over by more general object processing mechanisms. The existence of a double-dissociation between prosopagnosics with normal object processing and patients with impaired object processing but intact face processing (Moscovitch et al., 1997; Germine et al., 2011) supports this distinction between object and face processing. If face-specific neural mechanisms become damaged, it may be that more general object recognition mechanisms cannot be used to efficiently recognize faces, but possibly can only aid in more effortful feature processing. This would account for some of the success of compensatory training in which APs are taught to verbalize distinct features (e.g., Mayer and Rossion, 2007). The distinctiveness of face and object processing may also explain why training on face-like objects (greebles) failed to improve face processing.

Another explanation for limited treatment-related improvements in AP is that to some degree, face processing sub-regions in the core (FFA, OFA, pSTS) and extended networks (anterior temporal lobes) represent distinct, independent functions and are not redundant. This lack of redundancy within the face processing network could reduce the capacity for reorganization amongst intact regions and make damage to any single region more catastrophic. Evidence for specialization amongst face processing regions comes from an fMRI study showing that the FFA is sensitive to both face parts and face configuration, while the OFA and pSTS are sensitive to the presence of real face parts but not to the correct configuration of those parts (Liu et al., 2010). Furthermore, the pSTS has been shown to be much more sensitive to dynamic aspects of faces (e.g., facial expressions) than the FFA or OFA (Pitcher et al., 2011). Patient studies also support functional independence within the face processing network. Barton (2008) found that patients with lesions to right occipital-temporal regions had more specific deficits in perceiving facial structure and configuration, particularly of the eye region, whereas those with more anterior temporal damage had greater deficits in accessing face memories.

Though face regions may be highly specific within a hemisphere, there may be more redundancy across hemispheres (e.g., right and left FFA). This redundancy would be consistent with findings that unilateral lesions are typically associated with less pronounced deficits than bilateral lesions (unilateral: Barton, 2008; in contrast, bilateral: Rossion et al., 2003) and could explain why more APs recover after unilateral lesions than bilateral lesions. Furthermore, some redundancy amongst homologous areas can help explain Lang et al.'s (2006) demonstration of complete recovery as well as engagement of the left FFA after damage to right occipital-temporal regions. Despite some redundancy, homologous regions might have somewhat different functional properties. For example, one functional imaging study has suggested that feature- or part-based face processing characterizes the function of the left FFA, while whole-face processing characterizes that of the right FFA (Rossion et al., 2000).

The differentiation between face and object processing, further specialization amongst face selective regions, and even specialization of face selective regions in each hemisphere, may combine to make face recognition particularly dependent on coordination amongst nodes in a highly specific network. Indeed, evidence suggests that the coordination amongst face processing nodes may be a crucial aspect of successful face processing (Moeller et al., 2008). This specialization in a network may mean that the function of a single face-selective region cannot be fully taken over by the remaining face processing regions and clearly cannot be taken over by non-face processing regions. The relative specificity of face processing contrasts with acquired brain injuries causing aphasia (i.e., dysfunction in language comprehension or expression), where evidence suggests that peri-lesional and homologous regions can take over functions of damaged regions (Hamilton et al., 2011; Shah et al., 2013). This may reflect more redundancy in language processing compared to face processing. This high level of specialization and expertise involved in face recognition may make it more vulnerable to disruption and result in AP having a somewhat limited capacity for treatment (for a more extensive discussion of neural plasticity in face processing and prosopagnosia, see Bate and Bennetts, 2014).

## **ATTEMPTS TO ENHANCE FACE PROCESSING IN DEVELOPMENTAL PROSOPAGNOSIA**

As can be seen in **Table 3** and **Figure 1**, the current evidence suggests that compared to the AP findings there may be more potential for treatment-related face processing improvements in DP. In our review of the current literature, five out of six attempts with DP showed some degree of success in bettering aspects of face processing, three of which showed evidence of generalization beyond the faces used in training. It is also notable that there have been two recent group treatment studies (Bate et al., 2014; DeGutis et al., 2014). These studies are important in testing whether treatments work on a DP population level rather than just for particular cases.

## **COMPENSATORY TREATMENT APPROACHES IN DEVELOPMENTAL PROSOPAGNOSIA**

Brunsdon et al. (2006) published the first positive attempt to rehabilitate an eight-year-old developmental prosopagnosic (AL) using "feature naming" training, a compensatory approach similar to those used in AP. In particular, AL was taught to perceive, discuss, and remember five distinctive facial characteristics of 17 faces of people he knew. The first two characteristics were always age and gender (which AL could likely recognize) and the other three characteristics were distinctive facial features such as "long thin face," "wide nostrils," "high curved eyebrows," "wrinkles around the eyes," and "freckles." After 14 practice sessions over 1 month, AL showed improved recognition of not only the originally trained face images, but also of images of the same faces from different angles with and without hair. He also reported anecdotal real-life improvements of recognizing these faces.

*Generalization: Evidence of improvements in processing novel face stimuli that are different from the treatment intervention itself.*

Using the same training approach as Brunsdon et al. (2006), Schmalzl et al. (2008) showed similar positive results with a 4-year-old developmental prosopagnosic, K. Not only did K. show improvements in recognizing target faces, but 4 weeks after training, she also improved on recognizing the faces in different orientations. Additionally, before training, K. made abnormal eye movements focused on the external aspects of the face, whereas after training, her scan paths were more normal and involved greater scanning of internal features. This more normal pattern of scanning internal features also generalized to untrained faces. Together, these results suggest that by training compensatory mechanisms in DP children, it is possible to enhance recognition of trained faces, and that this may lead to more normal face scanning patterns.

It is possible that these compensatory strategies could also help adult developmental prosopagnosics. Like K. before training (Schmalzl et al., 2008), adult DPs have been shown to have more dispersed eye movements and to fixate more often on external facial features (Schwarzer et al., 2007). Thus, similar compensatory training may result in adult DPs paying more attention to the internal features and better remembering particular faces. However, compensatory training could be less effective in adult DPs because they may already be quite well-practiced at using compensatory strategies, including attending to distinctive features.

## **REMEDIAL TREATMENT APPROACHES IN DEVELOPMENTAL PROSOPAGNOSIA**

In addition to the positive results of compensatory training in children with DP, evidence suggests that remedial training in DPs can produce more general improvements in face processing (DeGutis et al., 2007, 2014). An advantage of this approach over compensatory approaches is that it is more automatically implemented, which may better promote generalization.

The training procedure used in two of these studies was very similar and targeted enhancing holistic face processing. The rationale was that DPs could apply some holistic processing to faces, but only over a spatially limited area (e.g., Barton et al., 2003; DeGutis et al., 2012b) and the aim of training was to enhance prosopagnosics' ability to perceive internal feature spacing information across a greater spatial extent of the face. To accomplish this aim, DeGutis et al. (2007) designed a task where participants make category judgments based on integrating two vertical feature spacings: the distance between the eye and eyebrows, and between the mouth and nose. It was thought that, after thousands of trials, DPs could learn to allocate attention to both feature spacings simultaneously, resulting in greater sensitivity to configural information across the inner components of the face (i.e., greater holistic processing).
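The category judgment described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each face is summarized by two vertical spacings in arbitrary units and that the category boundary depends on the sum of both spacings relative to a neutral face. The names `Face`, `categorize`, and `categorize_by_eyebrows_only`, along with all numeric values, are illustrative assumptions and not details of the published training program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    """Hypothetical summary of a computer-generated face (arbitrary units)."""
    eye_to_eyebrow: float  # vertical distance between eye and eyebrow
    nose_to_mouth: float   # vertical distance between nose and mouth


# Assumed neutral spacings; the actual stimulus parameters are not given in the text.
NEUTRAL = Face(eye_to_eyebrow=10.0, nose_to_mouth=10.0)


def categorize(face: Face, neutral: Face = NEUTRAL) -> int:
    """Return 1 for 'higher eyebrows / lower mouth', otherwise 2.

    Both spacings are combined, so neither feature alone decides the category.
    """
    combined = (face.eye_to_eyebrow - neutral.eye_to_eyebrow) + (
        face.nose_to_mouth - neutral.nose_to_mouth
    )
    return 1 if combined > 0 else 2


def categorize_by_eyebrows_only(face: Face, neutral: Face = NEUTRAL) -> int:
    """A single-feature strategy, shown only for contrast."""
    return 1 if face.eye_to_eyebrow > neutral.eye_to_eyebrow else 2


if __name__ == "__main__":
    faces = [
        Face(12.0, 12.0),  # both spacings large
        Face(8.0, 8.0),    # both spacings small
        Face(9.5, 12.0),   # eyebrows slightly low, mouth clearly low
    ]
    for face in faces:
        print(face, categorize(face), categorize_by_eyebrows_only(face))
```

Under these assumptions, the third face is assigned to different categories by the integrated rule and by the eyebrow-only strategy, which is one way to see why a task of this kind could encourage attention to both spacings at once.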

The first study using this procedure had a 48-year-old DP (M.Z.) perform several months of this procedure (over 20,000 trials; DeGutis et al., 2007). After training, she showed improvements on standardized tests of face perception/recognition (e.g., Benton Face Perception Test) and also experienced daily life improvements. M.Z. reported that these effects lasted for several months before fading. Additionally, immediately following training, she demonstrated a more normal pattern of event-related potential selectivity, showing a greater N170 (an occipitotemporal potential normally selective to faces and thought to reflect holistic face processing, see Jacques and Rossion, 2009) in response to faces than objects, and enhanced functional MRI connectivity within right hemisphere face-selective regions during face viewing. These signatures of normal face processing were not present before training. This suggests that it is possible to enhance face recognition in an adult DP using a remedial approach and that this can enhance signatures of normal face processing.

A recent study of 24 DPs that used a similar procedure (though participants performed only 15 sessions of training rather than >50) suggests that face processing can be enhanced at the group level (DeGutis et al., 2014). After training, DPs demonstrated overall enhanced performance on several face perception tasks as well as evidence of daily life improvements on a self-report diary. Furthermore, those who particularly excelled at the training task showed the strongest improvements on measures of face perception and enhanced holistic face processing. In fact, whereas prior to training there was a marked difference in holistic face processing between better trainees and controls, after training there were no significant differences between the two groups. However, not all aspects of face processing were enhanced: there were no improvements on measures that required face discrimination from different viewpoints, tasks shown to be particularly challenging for prosopagnosics (Marotta et al., 2002; Lee et al., 2010).

In contrast to these positive reports of training holistic face processing in DPs, there is one report of a failed remedial attempt in an adolescent DP (Dalrymple et al., 2012), which used a somewhat similar training approach to Ellis and Young (1988). Dalrymple et al. (2012) reported an attempt by DeGutis and colleagues to train 12-year-old T.M. to recognize the face of his mother. T.M. made a "mom/not-mom" response when presented with a picture of either his mother or age-matched females, and was provided feedback after each response. After 47 sessions of training (∼10–15 min per session) over a span of 10 months, T.M. did not demonstrate any appreciable improvements on the mom/not-mom task nor did he report improvements in daily life. Similar to Ellis and Young (1988), the intensity of training was somewhat low and insufficient motivation could have been a factor. Regardless, the results of this study are cautionary and suggest that there could be limitations to improvements in face processing in DPs even in the younger, developing brain.

Together, these studies suggest that remedial cognitive training that targets holistic face processing can enhance face processing in DPs and can potentially generalize to improvements in everyday life. Though remedial training did not help all DPs nor did it even enhance all aspects of face processing in the DPs it did help, these studies provide compelling evidence that the face processing system in DPs is at least partially remediable rather than permanently deficient.

## **OTHER TREATMENT APPROACHES IN DEVELOPMENTAL PROSOPAGNOSIA**

In addition to remedial training, another recent promising study by Bate et al. (2014) attempted to improve face processing in developmental prosopagnosics by administering intranasal oxytocin. Oxytocin is a neuropeptide that has been shown to be involved in several aspects of social cognition including pair-bonding and trust (Walum et al., 2012) and may be dysfunctional in individuals with deficits in social cognition such as autism. Oxytocin has also been shown to enhance the ability to infer the mental state of others on a task that requires sensitivity to subtle information from the eye region (Baron-Cohen et al., 2001; Domes et al., 2007). This is relevant to prosopagnosia in that the eye region is highly diagnostic for face recognition (Butler et al., 2010) and that processing of the eye region has been shown to be particularly impaired in prosopagnosics (DeGutis et al., 2012b). Further supporting this link between oxytocin and facial recognition ability, a recent study of 178 families with at least one autistic child found that variation in the oxytocin receptor gene, *OXTR*, was strongly associated with face recognition performance on the Warrington Face Memory Test (Skuse et al., 2014).

In light of these associations between oxytocin and facial recognition, Bate et al. (2014) attempted to enhance face perception and face memory in DPs using intranasal oxytocin. Ten DPs and ten normal controls were given both oxytocin and a placebo spray, with participants and experimenters both blind to condition assignment. Forty-five minutes after inhalation of the drug or placebo, participants completed novel versions of the Cambridge Face Memory Test (CFMT) and a simultaneous face-matching task. The results showed that DPs had significantly better performance on both tasks after inhaling oxytocin compared to when they inhaled placebo, while the control group showed no differences between conditions. DPs' improvement on both well-validated face memory and perception tasks is notable. Though the mechanisms of this improvement remain to be elucidated, one possibility is that oxytocin enhanced face-specific attention mechanisms, such as to internal features or the eye region in particular. These promising results suggest that further exploration of oxytocin's potential to produce longer-lasting improvements would be an exciting future direction not only for DPs, but for APs as well.

## **HOW DO TREATMENTS IMPROVE FACE PROCESSING MORE IN DPs THAN APs?**

The studies reviewed above demonstrate that developmental prosopagnosics can benefit from several types of treatment. Thus, we suggest that compared to acquired prosopagnosics, developmental prosopagnosics may have a substantially greater capacity for improvement.

A likely explanation for DPs' potentially greater ability to benefit from treatments than APs is that they have a more intact face processing infrastructure compared to APs. Though studies have reported structural neural differences between DPs and controls (Behrmann et al., 2007; Garrido et al., 2009; Thomas et al., 2009), these differences are subtle when compared to the typically larger, more absolute lesions associated with AP (Barton, 2008). For example, Garrido et al. (2009) found that compared to controls, DPs had reduced cortical volume in the right anterior fusiform/temporal region, right middle fusiform gyrus, and superior temporal regions. They also found that better scores on face identity tasks were significantly correlated with the volume of the right middle fusiform gyrus. In addition to these cortical differences between DPs and controls, Thomas et al. (2009) report preliminary evidence that DPs have reduced white matter integrity between occipital-temporal and occipital-frontal regions, suggestive of compromised connectivity within the face processing network and between face processing regions and more anterior regions. Together, this suggests that despite not having gross anatomical differences from controls, DPs have subtle structural differences that likely contribute to their face recognition deficits. Though these subtle structural differences may be important aspects of DPs' face recognition deficits, their subtlety may allow for greater neural plasticity and treatment-related improvements compared to acquired prosopagnosics who may have more catastrophic structural damage (for additional discussion on neural plasticity in face processing with regards to prosopagnosia, see Bate and Bennetts, 2014).

In addition to having structure similar to controls, several recent studies provide evidence that DPs' face processing mechanisms are not qualitatively different from those of controls, but instead show more subtle quantitative differences. For example, DPs generally have a normal face selective N170 ERP component, which represents relatively normal earlier stages of perceptual processing, but have a reduced N170 difference between upright and inverted faces, which may reflect reduced holistic face processing or the use of somewhat similar mechanisms for upright and inverted faces (Towler et al., 2012). Notably, unlike DPs, the majority of individuals with AP do not show a face selective N170 (Dalrymple et al., 2011; Prieto et al., 2011), which may explain some of the differences in treatment success between APs and DPs. Additional ERP evidence for similarities between DPs and controls is that during successful face recognition, DPs show normal N250 and P600f ERP components, potentials related to early visual and later post-perceptual stages of face recognition. This suggests that on the rare occasions that DPs recognize a face, they use mechanisms similar to those of controls. Furthermore, in functional MRI scans, DPs have shown some face selectivity amongst the core face processing regions (Bentin et al., 2007; Minnebusch et al., 2009; Furl et al., 2011), although they may have fewer face selective regions and may show slightly reduced selectivity (Furl et al., 2011).

Together, these studies suggest that DPs may have the ability to process faces in a way that is qualitatively similar to controls, but may have disrupted connectivity within the face processing system. It could be that treatments are improving face recognition in DPs by boosting connectivity within DPs' intact face processing infrastructure. Evidence supporting this idea comes from DeGutis et al. (2007), who found increased coherence amongst face-selective regions after training.

DPs' subtle differences from controls and capacity for improvement have interesting similarities and differences with other developmental disorders affecting face processing. For example, the lack of an N170 inversion effect is also found in autism and Williams Syndrome (Towler and Eimer, 2012). Additionally, both individuals with autism and those with DP show dysfunctional face adaptation effects (Pellicano et al., 2007; Palermo et al., 2011). This may suggest that these disorders share a common abnormal developmental trajectory. However, in contrast to autism and Williams Syndrome that are defined in part by marked social differences, DPs show more typical social behavior. For example, it has been shown that DPs attend to the eye region as much as healthy controls (DeGutis et al., 2012b), and that many can efficiently recognize emotion (Palermo et al., 2011; though see Le Grand et al., 2006) and gender (DeGutis et al., 2012a; though see Kress and Daum, 2003) from faces. Furthermore, evidence suggests that holistic face processing is a core deficit in DP (DeGutis et al., 2012b; as well as acquired prosopagnosia, see Busigny et al., 2014) while this is not the case with autism (see Weigelt et al., 2012 for a review) or Williams syndrome (Bellugi et al., 2000). Together, this suggests that unlike autism and Williams syndrome in which there are more global developmental consequences, DP is more specifically associated with developmental abnormalities in face processing. These abnormalities are more quantitatively than qualitatively different from controls.

Though the current DP treatment studies demonstrate that face processing improvements are possible from training, it still remains to be seen whether DPs can truly achieve normal face recognition abilities. Even in cases where treatments were effective at improving face processing (Bate et al., 2014; DeGutis et al., 2014), DPs' abilities either continued to be below average or the skills learned did not generalize to all aspects of face processing (e.g., did not generalize to discrimination across viewpoints in DeGutis et al., 2014). Furthermore, even after successful training, evidence suggests that skills may not be "self-perpetuating" (e.g., DeGutis et al., 2007) and it is likely that without continued intervention DPs return to their dysfunctional ways of perceiving and remembering faces. Thus, though the current demonstrations lay the groundwork for the treatment of DP, there is much work ahead to create effective long-lasting treatments (for additional discussion on future directions, please see Bate and Bennetts, 2014).

## **SUMMARY**

Prosopagnosia has a high incidence (particularly DP) and can significantly impair social engagement and everyday functioning (Yardley et al., 2008). Currently there are no widely accepted treatments and instead, prosopagnosics are commonly left to learn how to recognize individuals through their own process of trial-and-error with alternative strategies (e.g., voice, gait, clothing, etc.). In our review of the literature, we find evidence that effective treatments are just beginning to emerge. Though the most consistent treatment successes have been in DP, we find some evidence for the capacity for improvements in AP as well. In addition to enhancing the daily functioning of prosopagnosics, understanding how to better improve face processing could also lead to helping several other populations with face processing and social cognitive deficits including those suffering from autism, Williams syndrome, schizophrenia, as well as those with age-related cognitive decline and dementia. Finally, understanding the mechanisms of these treatments and how successful treatment impacts the cognitive and neural signatures of face processing can lead to broader insights into the capacity for cognitive systems and the brain to reorganize.

## **REFERENCES**


Bellugi, U., Lichtenberger, L., Jones, W., Lai, Z., and St. George, M. (2000). I. The neurocognitive profile of Williams Syndrome: a complex pattern of strengths and weaknesses. *J. Cogn. Neurosci.* 12(Suppl. 1), 7–29. doi: 10.1162/089892900561959


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 May 2014; accepted: 09 July 2014; published online: 05 August 2014. Citation: DeGutis JM, Chiu C, Grosso ME and Cohan S (2014) Face processing improvements in prosopagnosia: successes and failures over the last 50 years. Front. Hum. Neurosci. 8:561. doi: 10.3389/fnhum.2014.00561*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 DeGutis, Chiu, Grosso and Cohan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## In your face: transcendence in embodied interaction

## **Shaun Gallagher 1,2,3\***

<sup>1</sup> Department of Philosophy, Lillian and Morrie Moss Professor of Excellence in Philosophy, University of Memphis, Memphis, TN, USA

<sup>2</sup> Department of Philosophy, School of Humanities, University of Hertfordshire, Hatfield, Hertfordshire, UK

<sup>3</sup> Department of Philosophy, Faculty of Law, Humanities and the Arts, University of Wollongong, Wollongong, NSW, Australia

#### **Edited by:**

Davide Rivolta, University of East London, UK

#### **Reviewed by:**

Glenn Carruthers, Macquarie University, Australia; Viridiana Mazzola, University of Geneva, Switzerland

#### **\*Correspondence:**

Shaun Gallagher, Department of Philosophy, Lillian and Morrie Moss Professor of Excellence in Philosophy, University of Memphis, 331 Clement Hall, Memphis, TN 38152, USA e-mail: s.gallagher@memphis.edu

In cognitive psychology, studies concerning the face tend to focus on questions about face recognition, theory of mind (ToM) and empathy. Questions about the face, however, also fit into a very different set of issues that are central to ethics. Based especially on the work of Levinas, philosophers have come to see that reference to the face of another person can anchor conceptions of moral responsibility and ethical demand. Levinas points to a certain irreducibility and transcendence implicit in the face of the other. In this paper I argue that the notion of transcendence involved in this kind of analysis can be given a naturalistic interpretation by drawing on recent interactive approaches to social cognition found in developmental psychology, phenomenology, and the study of autism.

**Keywords: interaction, face, ethics, transcendence, Levinas**

## **INTRODUCTION**

What I see in the other's face is irreducible to its physiognomy, its shape or morphological features, its color or physical properties. The significance of the face transcends any of these things. This is an insight associated with the philosophy of Emmanuel Levinas (1969). The other person, in her otherness, resists being simply an entity—whether physical object or epistemological subject. The other is not the sum total of her ontological parts, but in some way transcends all of those parts. Nor is this transcendence simply a way of pointing to something invisible, a mind or a set of mental states that we might be able to infer or simulate. The other is not, analogically, another *me*, or a set of mental states that are like mine. Rather, Levinas suggests, the other, in her alterity, makes an ethical demand on me, to which I am obligated to respond. The face-to-face is primarily an ethical relation—the other's face is perceived as an obligation to respond.

For Levinas, ethics is not a matter of theory, argumentation or the promulgation of rules, but is based on an experience of transcendence encountered in the other's face. The situation in which I experience this transcendence is "when the face has turned to me, in its very nakedness. It is by itself, and not by reference to a system" (Levinas, 1969, p. 75). In this circumstance the other's vulnerability shines through her face, independent of context, and elicits a response from me.

The face is characterized by proximity and distance at the same time. When the other's face is close to me, it is so not merely in physical geographical terms, the way an instrument or artifact might be. Its closeness demands a response that could range from a passionate kiss to a punch, or some less extreme and more polite behavior of moving away or asking for space. Even in contexts that involve a close examination in scientific or medical terms, the face demands some form of respect. Yet, even in its closeness there is something distant in it since one's experience of the other is not just in terms of the physicality of the face. The face (or more generally, the body) is never the totality of the other.

Although Levinas is in some respects a religious thinker (Veling, 1999; Purcell, 2006), his ethics is not necessarily religious, and his thinking about the face can be interpreted in secular terms of embodied, and especially affective, intersubjective experience. At least on one reading (Bergo, 2011), his ethical concept of transcendence is not informed by his religious thought; it is rather the other way around. Religious thinking may be motivated by the transcendence encountered in our intersubjective relations. Ethics begins in face-to-face experience and not in a theological dictum or reference to God. At the same time, however, Levinas (1991) associates the notion of transcendence indicated in the other's face with a form of infinity and as something beyond the reach of science. It is something that is "beyond understanding" (déborde la compréhension) (Levinas, 1991, p. 18).

What I want to demonstrate in this paper is that we can retain this kind of ethical significance, this ineliminable and irreducible transcendence of the other, as seen in the other's face, and still stand firmly on naturalistic grounds to gain an understanding of its significance. Although I view Levinas as presenting an important challenge to science and naturalistic philosophy, and in that regard I want to take this challenge seriously, my argument will not be in total agreement with Levinas. I'll argue that the transcendence at stake in this context involves one's capacity to perceive in the other the possibilities of further interactions that have the potential to take one beyond oneself. The transcendence isn't just in the other; it's in our possible interactions with the other.

#### **INTERACTING WITH FACES**

What is it about the other's face, or more generally, about the other, that elicits the ethical response? I want to work this out in terms that relate to recent debates in social cognition—and specifically in the context of embodied and enactive cognition. In this regard, I will reject what may appear to be a rather easy solution—an easy way to explain the transcendence of the other. That is, I will reject the idea that what transcends the face is the mind—the mental states (intentions, emotions, etc.) that somehow may be physically expressed in facial expressions but are themselves truly hidden and relatively transcendent, behind or beyond physical expressions.

The idea that the mind is hidden away, and thus transcendent to embodied comportment has been called the *unobservability principle* (UP; Krueger, 2012, p. 149). Leslie (1987, p. 164) provides a clear statement of UP: "Because the mental states of others (and indeed ourselves) are completely hidden from the senses, they can only ever be inferred". Many such statements of UP can be found in the theory of mind (ToM) literature. Karmiloff-Smith (1992, p. 138), for example, contends that ToM "involves inferences based on unobservables (mental states, such as belief)..." Or again, Johnson (2000, p. 22): "Mental states, and the minds that possess them, are necessarily unobservable constructs that must be inferred by observers rather than perceived directly".

In opposition to ToM, however, phenomenologists have argued against the supposed ubiquity of mindreading by theoretical inference or simulation, and have defended an embodied/enactive view that social understanding depends, in large part, on interaction rather than on mindreading (Gallagher, 2001, 2005, 2012; Gallagher and Hutto, 2008; De Jaegher et al., 2010). In interactive contexts direct perception also plays an important role in social cognition (Gallagher, 2008); and one's perception of the other is often focused on the face. The argument in favor of interaction theory (IT) has turned out to be a large, complex, and controversial one. I will not try to provide the entire story or enter into many of the details in this paper. Rather, with a focus on the role of face perception, I will discuss some of the experimental literature and its interpretations. Much of the interpretation that we find in this literature is consistent with UP and the ToM approach, and in this regard it follows a common explanatory principle, namely, that social cognition will ultimately be explained once we identify the process or mechanism within the individual agent responsible for the individual's ability to understand the other. In contrast to this principle, and consistent with interactionist views on social cognition, I'll argue that in basic (and most) instances social cognition is accomplished by something that goes beyond the individual agent, namely, the interaction itself. I'll suggest that it is in this interaction that we will be able to find an explanation of the kind of transcendence discussed by Levinas in the ethical context.

My aim in this section is not to provide an exhaustive review of the empirical literature on face perception, but to cover some of the relevant research pertinent to a range of social-cognitive experience. A good starting point is the research of Meltzoff and Moore (1977, 1994) on neonate imitation. These well-known, but still controversial experiments show that infants from birth are typically able to interact with their caregivers, in a way that privileges the face. Most of the experiments are on the imitation of facial gestures, such as tongue protrusion, mouth opening, and pursing of lips. But there are also experiments that show the infant is able to imitate angular tongue protrusion, movement of eyebrows, as well as smiles, grimaces, frowns, and so on (e.g., Field et al., 1982). We can note that one important aspect of these findings is simply that infants are attracted to faces. To explain this basic fact, Meltzoff and Moore (1997) propose an explanation in terms of a *cross-modal* mechanism. Faces are attractive and meaningful to infants because what the infant sees is generally isomorphic to its own felt experiences. The cross-modal integration of vision and proprioception allows the infant to make some kind of pragmatic sense of the other's expression, in a way that calls forth a response (see Gallagher and Meltzoff, 1996), or as Meltzoff and Moore (1997) put it, it calls forth an action which serves the specific purpose that the infant is able to employ imitation to verify the identity of others.

There are disagreements about whether this kind of response is genuine imitation, or whether it's the result of perceptual priming in a system with underdeveloped inhibitory mechanisms, or simply a form of contagion (see, e.g., Hurley and Chater, 2004). There are, accordingly, disagreements about the nature of the activity and the nature of the mechanisms to be found in the infant that would account for their ability. Without settling these kinds of disagreements, proponents of IT consider neonate imitation to be part of "primary intersubjectivity" (Trevarthen, 1979), and, regardless of how it is explained or what internal mechanism is involved, its significance is primarily that it is a very early process that pulls the infant into a dyadic and dynamic interaction with the other. One can set aside questions about whether the infant is conscious of what it is doing, or whether internal representations are involved, or whether it's a strictly automatic response that comes down to mirror neuron activation, and still see that the significance of the infant's response to the other's face is tied to the fact that it is not a one-way process. The adult initiates the process in a way that elicits the infant's response and establishes an interaction that is two-way or reciprocal. The infant comes to be enactively coupled to the other in this interaction. The idea of enactive coupling means, in this context, that (1) it is a dynamic process (i.e., one in which a co-dependence is established between the coupled systems such that what happens in or to one system is partly dependent on the situation of the other); (2) that the recurrent engagement with the other person leads to a structural congruence between self and other (Thompson, 2007, p. 45); and (3) that the engaging organisms (or agents) maintain their autonomy (their own internal self-organization).<sup>1</sup> Accordingly, although one can still talk of individuals who engage in the interaction, a full account of neonate imitation is not reducible to mechanisms at work in either or both of the individuals. Complex coordination patterns that result from the mutual interaction of a social encounter, as such, are not simply inputs to individual mechanisms (De Jaegher et al., 2010). Such coordination processes can acquire a momentum of their own and can pull the participants into further or continuing interaction. Interaction in intersubjective contexts goes beyond each participant; it results in something (the creation of meaning) that goes beyond what each individual qua individual, that is, on its own, can bring to the process.<sup>2</sup>

<sup>1</sup> See Di Paolo and De Jaegher (2012) for a more formal account of dynamic coupling.

Tracking and discriminating faces are some of the earliest infant capacities (Stern, 1985; Johnson et al., 1991; Walton et al., 1992; Hendricks-Jansen, 1996; Mondloch et al., 1999; Bushnell, 2001). Faces have saliency, not only for newborn infants, but throughout the life span, and many, if not most, of our interactions with others are conducted face-to-face where enactive coupling is the rule, and where interaction itself is enabling and sometimes constitutive of social cognition. Developmental studies indicate the continued importance of faces. We know that infants "vocalize and gesture in a way that seems [affectively and temporally] 'tuned' to the vocalizations and gestures of the other person" (Gopnik and Meltzoff, 1997, p. 131) and that "in the gentle, immediate, affectionate, and rhythmically regulated playful exchanges of proto-conversation, 2-month-old infants look at the eyes and mouth of the person addressing them while listening to the voice" (Trevarthen and Aitken, 2001, p. 6). This has been dramatically demonstrated in still-face experiments (Tronick et al., 1978) where the animation of the other's face is shown to be absolutely essential to the interaction process. The advent of joint attention (sometime during the first year; see Reddy, 2008), "secondary intersubjectivity" (Trevarthen and Hubley, 1978), as well as social referencing (Klinnert et al., 1983; Mumme et al., 1996) all depend on making visual contact with the face of the other.

Eye-tracking studies and our everyday phenomenology attest to the fact that the importance of the perception of the other's gaze for a grasp of intentions and emotions continues in adulthood. We see meaning and emotion in the faces of others. Phenomenologists have noted this often in their criticisms of the UP.

I do not see anger . . . as a psychic fact hidden behind the gesture . . .. The gesture does not make me think of anger, it is anger itself. . . . I perceive the grief or the anger of the other in his conduct, in the face or his hands, without recourse to any "inner experience" (Merleau-Ponty, 2002, pp. 214, 415).

Anger, shame, hate, and love are not psychic facts hidden at the bottom of another's consciousness: they are types of behavior or styles of conduct which are visible from the outside. They exist on this face or in those gestures, not hidden behind them (Merleau-Ponty, 1964, pp. 52–53).

As Merleau-Ponty understands the notion of behavior, it is not a meaningless set of movements that require us to make inferences beyond what we can see. Behavior is meaningful, and what we can see is the meaning and the intention in the actions and expressions of others. Accordingly, this is not a form of psychological behaviorism. The phenomenologists are not alone in this. Consider that Wittgenstein, a philosopher from a very different tradition, says much the same thing.

Look into someone else's face, and see the consciousness in it, and a particular shade of consciousness. You see on it, in it, joy, indifference, interest, excitement, torpor, and so on. . . . Do you look into yourself in order to recognize the fury in *his* face? (Wittgenstein, 1967, §229).

[I]t is as if the human face were in a way translucent and that I were seeing it not in reflected light but rather in its own (Wittgenstein, 1980, §170).

In intersubjective contexts, visual perception of the face of the other is not equivalent to glancing at an object. It's not a matter of me seeing the other's face, *simpliciter*, but of seeing that the other sees me (or quite literally, seeing the other seeing me). The fact that the other returns the gaze, and that this strongly registers in our perception (cf. Sartre, 1956; Stawarska, 2009), provides part of the basis for regarding the other not as mere object but as a perceiving subject—and carries with it ethical significance. The other's gaze is precisely not something that can be subsumed into a strictly visual representation of eye direction since it has an affective impact on my own system that sets me up for further response. Perception of another's face activates not just the face recognition area and ventral pathway, but also the dorsal visual pathway—suggesting that we perceive affordances for possible responsive actions in the face of the other (Debruille et al., 2012). Faced with the face of a real person, at a minimum, subjects make eye contact with very subtle eye movements. Accordingly, face perception presents not just objective patterns that we might recognize as emotions. It involves complex interactive behavioral and response patterns arising out of an active engagement with the other's face—not a simple recognition of facial features but an interactive perception that constitutes the recognition of emotions.<sup>3</sup>

It's a mistake, of course, to take the face as an isolated entity, or to think that face-based emotion recognition is informationally encapsulated (pace Goldman, 2006, p. 110), even if in many cases we focus on the face in everyday life. We rely on a variety of bodily aspects in social interaction—posture, movement, gesture, vocal intonation and prosody—as well as communicative and narrative practices (Gallagher and Hutto, 2008), place-related and contextual factors, background knowledge about the person, etc., and our own prior experience. In this regard, we can also say that some of what is true of perception in general also applies to face perception. For example, meaningful perception of any sort may rely on activation of association brain areas outside of very early perceptual processing areas, like visual cortex V1. But recent research shows that even neuronal activity in the earliest of perceptual processing areas, such as V1, reflects more than simple feature detection. For instance, V1 neurons anticipate reward if they have been relevantly attuned by prior experience (Shuler and Bear, 2006). What we see in the present, including faces, incorporates an affective sense of relevant past experiences, so that reportable visual perception is already informed with affective value from the start. Barrett and Bar's *affective prediction hypothesis* "implies that responses signaling an object's salience, relevance or value do not occur as a separate step after the object is identified. Instead, affective responses support vision from the very moment that visual stimulation begins" (Barrett and Bar, 2009, p. 1325). Along with the earliest visual processing, the medial orbital frontal cortex is activated and initiates a train of muscular and hormonal changes throughout the body, "interoceptive sensations" from organs, muscles, and joints associated with prior experience, which are integrated with current exteroceptive sensory information that help to guide response and subsequent actions. Accordingly, along with the perception of the environment, we also undergo and possibly experience, more or less recessively, certain bodily affective changes that accompany this integrated processing (Barrett and Bar, 2009, p. 1326). In other words, before we fully recognize an object or a face, for what it is, our bodies are already configured into overall peripheral and autonomic patterns based on prior associations. In terms of the predictive coding model used by Barrett and Bar, priors are not just in the brain, but involve a whole body adjustment.

<sup>2</sup> As Di Paolo et al. (2008) put it, "interaction can dynamically create phenomena that do not directly result from the individual capacities or behaviors of any of the partners if investigated on their own" (p. 279).

<sup>3</sup> If we think of emotions as complex patterns of experiences and behaviors and as such as "individuated in patterns of characteristic features"—features that may include bodily expressions, behaviors, action expressions, etc., then emotion perception can be considered a form of pattern recognition (Izard, 1972; Izard et al., 2000; Newen et al., in press). In this regard, the facial expressions play a major role.

Disruptions to intersubjective interaction and to emotional attunement can equally enlighten us about the nature of social cognition. If a facial expression contradicts other interactive or communicative processes, for example, if an actor shows happy facial gestures while telling a sad story (Decety and Chaminade, 2003), the result is puzzlement, distrust, and more explicit attempts to figure out motivations. If perception of the emotion pattern on the face is disrupted, intersubjective problems develop.

While most people perceive the face or body of another as a familiar whole imbued with life, subjectivity, and expression, schizophrenia patients will sometimes focus on individual parts or the purely material aspect of the person before them (see Addington and Addington, 1998; Sass and Pienkos, 2013).

As a result, in such instances of schizophrenia (as well as in autism), subjects have a propensity to view the face as an array of unrelated details; they miss the pattern/gestalt and fail to recognize the emotion.

In cases of Möbius Syndrome (MS), a form of congenital bilateral facial paralysis resulting from developmental problems with the sixth and seventh cranial nerves (Briegel, 2006), subjects lack the capacity for facial expression and full control of eye movements. These physical problems can lead to difficulties with social understanding and behavior. Some subjects with MS manifest traits of social inhibition, introversion, feelings of social inadequacy and inferiority (Briegel, 2007) and report feeling out of sync with others (Cole, 1999b; Cole and Spalding, 2009). Indeed, part of the problem in MS is not *in* MS itself, but in the regard of others and in interactions between people with MS and others. Because facial expressions play a large role in intersubjective interaction, we anticipate facial responses and when they do not occur (as in MS) interaction can be disrupted in terms of its dynamics and affectivity, leading to confusion or feelings of social discomfort. This does not rule out the possibility that people with MS can find alternative strategies for interacting and understanding others (as Krueger and Michael, 2012 argue),<sup>4</sup> but it does highlight the importance of facial expressions for social cognition.

Face-related problems with intersubjective interaction are also to be found in cases of blindness (both congenital and acquired), in those on the autism spectrum who actively avoid looking at faces, and in those with facial disfigurements or Parkinson's Disease. Jonathan Cole (1999a) gives an excellent account of these conditions with respect to the social difficulties that come along with them. Cole also takes us back to issues raised by Levinas.

If face-to-face relationships involve feelings toward and between people, any external face, another's face, puts a demand on me. It asks me to recognize another, for what I cannot fully assimilate I must respect, and for Levinas this recognition summons me to a form of moral responsibility, in the face of the other, which cannot be brought under the control of my reason and therefore cannot be explained. This moral or ethical responsibility can be viewed in terms of the need for a response, for the face of the other requires me to respond and enter into a relationship, but a relationship that I cannot fully control, that neither of us can fully control. It involves a risk so evident for many of those with facial problems that they avoid it (Cole, 1999a, p. 196).

## **ABOUT FACE: RESPONDING TO LEVINAS**

It's clear from the various empirical studies cited above that, as Krueger and Michael (2012, p. 4) so aptly put it, "the face is the center of gravity for our social interactions". But there is also something that seemingly floats free of a purely physical science. Levinas insists on transcendence. For him, I experience transcendence "when the face has turned to me, in its very nakedness. It is by itself, and not by reference to a system" (Levinas, 1969, p. 75). Levinas associates war with the concept of totality (a complete system, the opposite of a never complete infinity) and a denial of morality: "War renders morality derisory" (Levinas, 1969, p. 21). In this regard it is notable that the face of the other in battle has profound inhibitory effects on violent behavior directed towards the other (Grossman, 1996; Protevi, 2008). Killing involves an objectification (or de-subjectification) of the other in practices that include covering or ignoring the other's face. In this particular context, the denial of the face signifies that the other is reduced to a complete system which excludes the possibility of any further interaction. One finds this same denial, a closing down of interaction possibilities, in cases of torture and solitary confinement (Guenther, 2013; Gallagher, 2014).

<sup>4</sup> I'm in favor of a pluralist approach to social cognition (Fiebich, 2012), which does not deny that we can use some form of theoretical inference or simulation (see Gallagher, 2001), as well as narrative and communicative practices to gain understanding of others (Gallagher and Hutto, 2008). Pace Krueger and Michael (2012), I do not deny such possibilities. I do think, however, there is significant behavioral and phenomenological evidence to suggest that most of our everyday encounters with others are primarily embodied interactions, including communicative interactions, and that third-person theoretical inference and simulation are exceptional rather than common. I note also that I'm not at all convinced that a reverse simulation model as a form of mimicry can be thought of as "endorsing" what Krueger and Michael (2012) call strong interaction theory, as they claim.

I want to suggest, along with Levinas (1969), and adopting his terms, that "infinity is produced in the relationship of the same with the other" (Levinas, 1969, p. 26). But this means that there is nothing about the face in itself, *solus ipse*, or on its own, that generates the ethical demand. Nor is there anything like a complete alterity of the other that is not already mediated in interaction; much of what I am is already shaped in my interactions with others. Levinas emphasizes the asymmetrical demand of the other on me (e.g., Levinas, 1969, p. 46). Yet, we could think that the ethical demand is generated in the mutual turning towards each other. What is important is that the other looks back at me, as I meet her gaze with my own—this mutual experience, which is an aspect of primary intersubjectivity, sparks an interaction between me and the other. The transcendence associated with the ethical is not something unreachable in the other, but is generated in the interaction that transcends all individuality.

The most basic and primary experience of the other is this face-to-face, which sets in play the interaction and the transcendence: an interaction that transcends the individuals involved and requires a response that is never complete. The meaning that emerges or is established by the interaction calls for further interpretation, interaction or communication. The ethical, which is about our way of living with others, is built around this primary intersubjective experience—and around it we start to build certain practices.

Interactionists sometimes use the metaphor of the tango (e.g., Di Paolo et al., 2013). Just as it takes two to tango, one cannot accomplish interaction by oneself. Just as when two people dance the tango, something dynamic is created that neither one could create on one's own. One might think that the metaphor of the tango involves an overly formal structure and that perhaps something more like a free dance form is more appropriate for how the dynamics of interaction work. But most of our lives are lived within social and intersubjective structures (practices and institutions) that do specify how we relate to one another. In some cases this takes the shape of a norm or rule that requires that we mutually recognize our responsibility to the other. Even within such structures, however, even in those that may support totalizing practices, but perhaps short of war, torture, and solitary confinement, one can find the possibility of transcendence in face-to-face relationships. In that interaction there is a mutual expectation of response, and an expectation that we will continue the interaction to some defined or perhaps ill-defined and imperfect end.

Levinas is right about the face, and about its irreducibility; but the other's face is not an absolute alterity, nor does it lead us beyond what we can find in our daily interactions.

On the one hand, in this realm (and clearly in the realm of some institutions) there are no guarantees that we find the kind of transcendence that Levinas talks about. On the other hand, the transcendence that may be found in interactions can open up a vista of possibilities—possibilities of further interactions that have the potential to take me beyond myself, and that make the other incalculably significant, someone I turn away from at my own risk.

### **ACKNOWLEDGMENTS**

This work is supported by the Marie-Curie Initial Training Network, "TESIS: toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828), and the Humboldt Foundation, Anneliese Maier Research Award. An earlier version of this paper was presented at the Workshop on Other Minds, St. Hilda's College, Oxford University (12 March 2013).


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 April 2014; accepted: 18 June 2014; published online: 09 July 2014. Citation: Gallagher S (2014) In your face: transcendence in embodied interaction. Front. Hum. Neurosci. 8:495. doi: 10.3389/fnhum.2014.00495*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Gallagher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Development of holistic vs. featural processing in face recognition

## **Kazuyo Nakabayashi<sup>1</sup>\* and Chang Hong Liu<sup>2</sup>**

<sup>1</sup> Department of Psychology, University of Hull, Hull, UK

<sup>2</sup> Department of Psychology, Faculty of Science and Technology, Bournemouth University, Poole, UK

#### **Edited by:**

Mark A. Williams, Macquarie University, Australia

#### **Reviewed by:**

Benjamin J. Balas, North Dakota State University, USA

Lesley Calderwood, University of the West of Scotland, Scotland

#### **\*Correspondence:**

Kazuyo Nakabayashi, Department of Psychology, University of Hull, Hull, HU6 7RX, UK

e-mail: k.nakabayashi@hull.ac.uk

According to a classic view developed by Carey and Diamond (1977), young children process faces in a piecemeal fashion before adult-like holistic processing starts to emerge at the age of around 10 years. This is known as the encoding switch hypothesis. Since then, a growing body of studies has challenged the theory. This article will provide a critical appraisal of this literature, followed by an analysis of some more recent developments. We will conclude, quite contrary to the classical view, that holistic processing is not only present in early child development, but could even precede the development of part-based processing.

**Keywords: holistic processing, part processing, developmental face recognition, children, encoding switch, holistic interference, part-whole paradigm**

The encoding-switch hypothesis came from a series of experiments carried out by Carey and Diamond (Carey and Diamond, 1977; Diamond and Carey, 1977; Carey et al., 1980). It was a developmental perspective on Yin's (1969, 1970) finding that face recognition was more adversely affected by inversion than the recognition of non-face stimuli, such as airplanes or snowflakes. Face inversion is commonly known to disrupt the processing of configural information (i.e., the spatial layout of facial features) that is critical for successful face recognition.

Carey and Diamond found an adult-like inversion effect among 10-year-olds. In contrast, recognition performance by 6- and 8-year-olds was poorer than that of adults for upright faces but not for inverted faces. Based on the assumption that inversion disrupts configural processing, Carey and Diamond reasoned that configural processing is only fully developed at around 10 years of age. The lack of an age difference for inverted faces could be due to all ages using a similar piecemeal encoding strategy. The authors suggested that the emergence of the inversion effect at the age of 10 might reflect the development of the ability to represent upright faces using configural information.

However, a number of subsequent studies have challenged the methodologies and theoretical arguments in Carey and Diamond's earlier studies. For instance, Flin (1985) argued that young children's sensitivity to facial orientation could have been masked by a floor effect in Carey and Diamond's study. If the recognition task is made sufficiently sensitive, age differences in children's recognition of inverted faces should be observed. To reduce the likelihood of floor effects, Flin re-examined the inversion effect using a small number of faces with a long inspection time in an old-new recognition task. As expected, all 7- to 16-year-old children in the study showed a typical inversion effect.

In fact, there is a growing body of evidence demonstrating that the inversion effect emerges much earlier than previously suggested. Infants' sensitivity to face orientation has been reported in a number of studies (e.g., Fagan, 1972; Cohen and Cashon, 2001; Turati et al., 2004; Bhatt et al., 2005; Leo and Simion, 2009; Zieber et al., 2013). For instance, 5-month-olds were sensitive to changes to configural information in upright faces, but not in inverted faces (Bhatt et al., 2005). Moreover, 4-month-olds recognized upright faces better than inverted faces when identifying the target face (Turati et al., 2004). Furthermore, even newborns were sensitive to the orientation of a face as their preference for attractive faces disappeared when the faces were inverted (Slater et al., 2000). Newborns as young as 1 to 3 days old (Leo and Simion, 2009) exhibited sensitivity to first-order configural information (i.e., the spatial layout of facial features common to all human faces) as they showed a preference for faces as opposed to non-face stimuli of comparable visual complexity (e.g., Johnson and Morton, 1991). These results provide compelling evidence for configural processing at a very early stage of life.

It is worth noting that the presence of the inversion effect at an early age only demonstrates the existence of holistic or configural processing. However, this does not mean that such processing is already adult-like or that it will not go through further development. In addition, as prior studies used different tasks (e.g., preferential looking vs. a standard recognition task) tailored to the age of participants, it would be difficult to assess whether the observed effects across studies reflect identical holistic/configural processing.

Moreover, some researchers have pointed out that featural and configural manipulations may not differ in a fundamental way. For instance, Riesenhuber et al. (2004) hypothesized that the inversion effect on configural processing could have resulted from separating faces with configural and featural transformations into different blocks. Such a design could encourage a specific recognition strategy for detecting one type of change, rather than provoking generic face recognition strategies. As predicted, they found no difference in the recognition of inverted configural and featural changes when they were presented in the same block. Others have also questioned the featural and configural distinction (e.g., Sekuler et al., 2004; Yovel and Duchaine, 2006). It would be interesting to establish whether these results could be replicated in children.

Another piece of key evidence for the encoding switch hypothesis comes from the paraphernalia-to-fool paradigm in which Carey and Diamond (1977) manipulated facial expressions and paraphernalia (e.g., hat, hairstyle, and glasses). Here, children were first presented with a target face. Next the target was paired with a distractor face. The task was to identify which face in the pair was the target. The target varied either in facial expression (no expression or expression) or paraphernalia (paraphernalia removed or added), whereas the distractor either wore the same paraphernalia or posed the same expression as the target during inspection.

The finding was that 6- and 8-year-olds were highly susceptible to errors when the distractor face wore the same paraphernalia as the target. This tendency declined markedly among 10-year-olds. Young children's reliance on paraphernalia in making identity judgments was also demonstrated when the pair of images was presented simultaneously (Diamond and Carey, 1977). However, when faces were familiar (e.g., classmates), children were able to ignore paraphernalia and instead used cues that were diagnostic of identity. Diamond and Carey argued that young children represent unfamiliar faces in terms of isolated features when making identity judgements. However, children at the age of around 10 years start to show an adult level of capacity for efficient processing of the configural information of a face. The authors suggested that the developmental changes could be due to increased experience with faces in general, but may also be a result of the maturation of the right hemisphere for dealing with configural representations of faces.

Subsequently, Lundy et al. (2001) suggested that the inconsistencies in the literature could be explained by stimulus size. Studies reporting holistic processing in young children used small images (e.g., Flin, 1985; Baenninger, 1994) while others arguing for a change from part to whole processing used larger images (e.g., Schwarzer, 2000). Young children have a tendency to process a stimulus as an undifferentiated whole when the overall image can be perceived in a single glance. Moreover, young children's differentiation of stimulus components may be related to a limitation in the ability to narrow the focus of attention (see Enns and Girgus, 1985). Hence, the influence of paraphernalia is likely to vary depending on stimulus size as well as age-related attentional limitations.

Based on these considerations, Lundy et al. conducted a paraphernalia-to-fool study to examine the effects of visual angle size on face recognition across 3-, 7-, and 10-year-olds. As predicted, 10-year-olds performed better than the two younger groups, who showed equivalent performance. The size of the visual angle influenced only 7-year-olds' performance, with a significant improvement with a large visual angle. However, unlike Diamond and Carey, Lundy et al. did not treat the children's susceptibility to paraphernalia as evidence for piecemeal processing. Rather, they interpreted it as evidence for younger children's tendency to process a small image more holistically. According to this account, the younger children in Carey and Diamond's study treated faces and paraphernalia as undifferentiated wholes whereby paraphernalia is processed as part of the face. Clearly, this was an important shift from Carey and Diamond's interpretation of the same effect. Whilst Carey and Diamond considered holistic processing as the ability to separate irrelevant features from relevant facial information, Lundy et al. treated the effects of irrelevant features as evidence for holistic processing.

Although paraphernalia (e.g., a change of hairstyle or the addition of glasses) can also affect adults' face recognition, it does not directly influence their configural or featural processing (Righi et al., 2012). This is because adults tend to include paraphernalia as part of a face, which disrupts encoding of relevant holistic information. Consistent with Lundy et al., Freire and Lee's (2001) paraphernalia study found that children have the capacity to process configural information by 4 years of age. Nevertheless, misleading paraphernalia could still hamper recognition as children's memory is susceptible to superfluous information. Therefore, distracting effects of paraphernalia do not necessarily provide evidence for a lack of configural processing as they could be due to a limitation in more general cognitive abilities.

Moreover, as Baenninger (1994) pointed out, the results from Carey and Diamond's study could have been derived from the distinctiveness of paraphernalia that distracted children's attention away from relevant information of a face. Furthermore, Flin (1985) argues that distinctive paraphernalia would be of greater perceptual salience than relevant facial information when target-distractor similarity is high, making children more susceptible to distracting effects of paraphernalia. Flin showed that when facial information was made salient (target-distractor dissimilar), 6-year-olds accurately judged the identity while ignoring paraphernalia. However, when target-distractor similarity was high, 4- and 6-year-olds made their identity judgements based on paraphernalia. Flin, therefore, suggests that Diamond and Carey's (1977) findings could be explained by the task difficulty in that young children ignored relevant facial information while attending only to salient paraphernalia cues. Subsequently, Carey and Diamond (1994) modified their original encoding switch hypothesis by stating that even young children process faces holistically.

Studies using the composite-face paradigm also support holistic processing among young children (e.g., Tanaka et al., 1998; Pellicano and Rhodes, 2003; see also Carey and Diamond, 1994; Carey, 1996; for a modification of their original theory). The composite effect was first reported by Young et al. (1987), who studied how adult face recognition could be affected when the top half of a face was combined with the bottom half of another face. Typically, participants had greater difficulty identifying the top half of the composite face when the two halves were aligned than when they were misaligned. This provided strong evidence that adults tend to perceive the composite face as an undifferentiated whole (see Rossion, 2013 for a recent review of the composite effect).

The composite effect was later demonstrated among 6-year-olds (Carey and Diamond, 1994) and preschool children (de Heering et al., 2007; Cassia et al., 2009). Like adults, children identified the top part of a composite face better when the face was misaligned than when it was aligned. In fact, Susilo et al. (2009) even reported a larger composite effect for 8- to 13-year-olds than for adults when the stimuli were child faces. Subsequently, Turati et al. (2010) showed that infants as young as 3 months exhibit the composite effect, indicating that they are capable of processing faces holistically.

Overall, the literature appears to show no fundamental difference in the way children and adults use holistic information to perceive, store, and recognize faces (e.g., Tanaka et al., 1998; de Heering et al., 2007; Mondloch et al., 2007). This leaves an unanswered question: if holistic processing does not separate children's face processing from that of adults, then what might explain the difference in their recognition performance? We suggest that the answer may partly lie in the inverse of the original encoding switch hypothesis. That is, it may be the proficiency of piecemeal, rather than holistic, processing that takes longer to develop. The evidence for this comes mainly from a holistic interference effect on facial part recognition.

The holistic interference effect was studied in a variant of the part-whole paradigm first developed by Tanaka et al. (Tanaka and Farah, 1993; Tanaka and Sengco, 1997). Unlike the earlier paraphernalia studies, which examined how adding or removing non-facial cues (e.g., glasses or a hat) affects identity judgments, the part-whole paradigm directly measures holistic vs. featural processing by examining part recognition in or out of the facial context. Tanaka et al. discovered that adults' part recognition was better in a whole face than in isolation. In a subsequent study, Tanaka et al. showed that by 6 years of age, children displayed holistic effects similar to those of adults (Tanaka et al., 1998). Hence children's face processing does not change from a part-based to a holistic strategy from this age. Pellicano and Rhodes (2003) later extended the finding to 4-year-olds by showing that children at this age remembered face parts better when tested in a whole face than in isolation.

Leder and Carbon (2005) subsequently provided a fresh approach to the part-whole paradigm using adults. They pointed out that empirical evidence for the whole face advantage had emerged when whole faces were learnt, leaving a gap in our knowledge as to whether the same advantage would still arise when facial parts are learned without the context of a whole face. An underlying assumption was that if learning parts imposes a strict part-based representation, an additional context at test might be ignored. Hence, performance in whole and part conditions may be comparable.

However, the authors found that when isolated parts were learned, presenting the parts in a whole face at test impaired part recognition. This finding held even when participants knew which part was critical, which implies that it was the unexpected context that hindered part recognition, not uncertainty about the critical part *per se*. These results suggest that the interaction between facial features and the whole plays a key role in adult face processing. Leder and Carbon argue that holistic interference is the essence of holistic processing because it demonstrates how difficult it is to ignore irrelevant facial information in a whole face.

Nakabayashi and Liu (2013) have subsequently investigated the holistic interference effect among children. They found that the effect was stronger among 6-year-olds than among 9–10-year-olds or adults. Participants in their study judged whether a sequentially presented pair of probe and test eyes were of the same child in four conditions: (1) both probe and test were isolated eyes (part-part); (2) probe was isolated eyes but tested in a whole face (part-whole); (3) probe was a whole face and tested with a part (whole-part); (4) both probe and test eyes were presented in a whole face (whole-whole). The results showed developmental differences when a part was presented in a whole face (i.e., part-whole, whole-part, and whole-whole), with 6-year-olds showing poorer part recognition performance than 9–10-year-olds or adults. In the part-part condition, 6-year-olds were able to identify the parts as well as the two older groups. These findings suggest that holistic processing is already present at 6 years of age. More importantly, the results demonstrate that it is the ability to inhibit the influence of this holistic processing on part recognition that seems to require a longer period of development.

Nakabayashi and Liu suggest that the developmental differences in part recognition may reflect differences in general inhibitory abilities. For instance, research using the go/no-go task procedure reveals that children's ability to inhibit an automatic response continues to develop throughout childhood (e.g., Dowsett and Livesey, 2000; Brocki and Bohlin, 2004; Berwid et al., 2005). The go/no-go task typically requires a key-press response to frequently presented "go" stimuli while inhibiting responses to "no-go" stimuli. Nine-year-olds exhibited better inhibitory processes to soccer balls than 7-year-olds (Cragg et al., 2009). As children get older, they become more able to inhibit their responses at an earlier stage of responding. Perhaps the lack of maturity in this ability among 6-year-olds in Nakabayashi and Liu's study led to their susceptibility to holistic interference. Their 6-year-olds were less able to inhibit the tendency to process irrelevant information as part of a face.

Evidence for a slower development of piecemeal, as opposed to holistic, processing can also be found in a study by Liu et al. (2013), who investigated the development of facial feature processing in 8–9-, 13–14-year-olds, and adults. In one experiment, participants learnt whole faces, followed by a standard old-new recognition test whereby they identified one of the following test items: eyes; nose; mouth; inner face; outer face; or whole face. The results showed no age difference in the recognition of whole faces, but unlike adults and 13–14-year-olds, 8–9-year-olds were unable to distinguish between old and new facial features. More importantly, when part recognition was preceded by whole learning even 13–14-year-olds did not seem to naturally encode and recognize isolated parts. Based on these findings, the authors suggest that the processing of isolated facial regions differs in its developmental course from that of holistic processing, and that holistic processing may be more dominant before adulthood relative to featural and configural processing.

Results reported in these studies call for a more radical revision of the classical encoding switch hypothesis. The reverse of it may be a more accurate description of the developmental trajectory. It seems that holistic processing is a default mode of processing from an early age as there appears no qualitative difference in the way young children and adults use holistic information. The real developmental differences seem to lie in the ability to successfully encode and extract a critical part from a whole face while ignoring irrelevant information. It is this successful execution of piecemeal processing that seems to take longer to develop to an adult level.

Finally, we should caution that although this review has focused on the development of face recognition, we are not making any assumption that the developmental pattern is specific to face perception. As few studies have made a direct comparison between face and non-face visual processing in the developmental literature, it would be difficult to ascertain whether the development of face recognition has its own unique course. However, some researchers have examined whether certain mechanisms are unique to faces by using both faces and non-face objects (e.g., Yovel and Duchaine, 2006). This line of enquiry will be important for future research because it addresses the question of whether the current knowledge about face recognition development is domain specific or domain general.

#### **ACKNOWLEDGMENTS**

We would like to thank Toby Lloyd-Jones and the two reviewers for their helpful comments on the manuscript.

#### **REFERENCES**


Tanaka, J. W., and Sengco, J. A. (1997). Features and their configuration in face recognition. *Mem. Cognit.* 25, 583–592. doi: 10.3758/bf03211301


Zieber, N., Kangas, A., Hock, A., Hayden, A., Collins, R., Beda, H., et al. (2013). Perceptual specialization and configural face processing in infancy. *J. Exp. Child Psychol.* 116, 625–639. doi: 10.1016/j.jecp.2013.07.007

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 April 2014; accepted: 30 September 2014; published online: 20 October 2014*.

*Citation: Nakabayashi K and Liu CH (2014) Development of holistic vs. featural processing in face recognition. Front. Hum. Neurosci. 8:831. doi: 10.3389/fnhum.2014.00831*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Nakabayashi and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Face recognition: a model specific ability

## **Jeremy B. Wilmer <sup>1</sup>\*, Laura T. Germine<sup>2</sup> and Ken Nakayama<sup>3</sup>**

<sup>1</sup> Department of Psychology, Wellesley College, Wellesley, MA, USA

<sup>2</sup> Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA

<sup>3</sup> Department of Psychology, Harvard University, Cambridge, MA, USA

#### **Edited by:**

Mark A. Williams, Macquarie University, Australia

#### **Reviewed by:**

Fiona N. Newell, Trinity College Dublin, Ireland

Roberta Daini, Università degli studi di Milano - Bicocca, Italy

#### **\*Correspondence:**

Jeremy B. Wilmer, Department of Psychology, Wellesley College, 106 Central Street, Wellesley, MA 02481, USA

e-mail: jwilmer@wellesley.edu

In our everyday lives, we view it as a matter of course that different people are good at different things. It can be surprising, in this context, to learn that most of what is known about cognitive ability variation across individuals concerns the broadest of all cognitive abilities; an ability referred to as general intelligence, general mental ability, or just g. In contrast, our knowledge of specific abilities, those that correlate little with g, is severely constrained. Here, we draw upon our experience investigating an exceptionally specific ability, face recognition, to make the case that many specific abilities could easily have been missed. In making this case, we derive key insights from earlier false starts in the measurement of face recognition's variation across individuals, and we highlight the convergence of factors that enabled the recent discovery that this variation is specific. We propose that the case of face recognition ability illustrates a set of tools and perspectives that could accelerate fruitful work on specific cognitive abilities. By revealing relatively independent dimensions of human ability, such work would enhance our capacity to understand the uniqueness of individual minds.

**Keywords: specific ability, individual differences, face recognition, intelligence, IQ, multiple intelligences, cambridge face memory test, generalist gene**

## **INTRODUCTION**

Most of what we know about human cognitive ability—and by ability, we mean *variation across individuals* in performance or potential—concerns *g*. *g* is the single, broad ability that has been observed to account for a large portion of the variation in any sufficiently large and diverse battery of cognitive tests (Spearman, 1904; Carroll, 1993; Jensen, 1998). Studies of *g* (and of highly g-related tests) have long dominated the human abilities literature, producing the bulk of known genetic (Plomin et al., 2013), neural (Deary et al., 2010), clinical (American Psychiatric Association, 2013), academic (Neisser et al., 1996), professional (Schmidt and Hunter, 2004), and personal (Jensen, 1998; Deary, 2012) correlates of human abilities. In contrast to the literature on *g*, the parallel literature on specific abilities, those abilities that correlate little with *g*, is tiny (Neisser et al., 1996; Jensen, 1998; Schmidt and Hunter, 2004; Deary, 2012).

Why do we know so little about specific abilities? Lack of interest cannot account for this limited knowledge. Theories hypothesizing consequential specific ability dimensions have enjoyed wild popularity in fields as diverse as education and business, as well as in the media (Goleman, 1995; Gardner, 2006). Another possible explanation for the lack of knowledge about specific abilities is that they simply do not play a very important role in our lives (Schmidt and Hunter, 2004). Indeed, upon cursory examination, the sheer size and apparent comprehensiveness of the human abilities literature make it difficult to imagine that important specific abilities could have been missed.

We will argue here, nevertheless, that it is too early to write off specific abilities as unimportant or inconsequential. We propose, on the contrary, that the lack of emphasis on specific abilities is an artifact of (a) traditional test development procedures in the human abilities literature and (b) the bottleneck of human subjects testing; and we suggest that recent methodological advances and discoveries could be harnessed to fundamentally rebalance our broad understanding of human talent toward a greater appreciation of specific abilities. We will base this argument on insights we have gained from researching face recognition ability. Work in our labs and others has recently established face recognition as an exceptionally specific ability (Wilmer et al., 2010, 2012; Wilhelm et al., 2010; Davis et al., 2011; Hildebrandt et al., 2011; Peterson and Miller, 2012; McGugin et al., 2012; Palermo et al., 2013).

To be clear, when we use the terms *specific*, *specific ability*, *specificity*, or *specifically* in this paper, we use them in their classic human variation sense to refer to performance that correlates little across individuals with general intelligence (Spearman, 1904; Carroll, 1993; Jensen, 1998). The term *specific* is frequently used differently in the experimental psychology and human neuroscience literatures. In these literatures, it refers neither to individual differences nor to general intelligence, but it is rather used as a shorthand for domain or process specificity (Gazzaniga, 2004). While studies of individual differences can and do effectively tackle questions of domain and process specificity (Wilmer, 2008), here we focus on the more basic question of whether an ability dissociates from (is specific relative to) general intelligence.

In the first section below, we briefly review the evidence that face recognition varies specifically across individuals. In the second section, we examine two illuminating false starts whereby well-resourced efforts to measure face recognition ability misinterpreted promising evidence for its specificity. These false starts demonstrate how easily a specific ability can be overlooked. In the third and final section, we identify three key factors that fueled the recent discovery that face recognition ability is specific and that, we believe, could likewise fuel the discovery of further specific abilities. These factors were: incorporation of priorities, discoveries, and techniques from experimental psychology and cognitive neuroscience; the development and validation of an excellent test; and a powerful internet-enabled Citizen Science approach to investigating human variation.

## **FACE RECOGNITION VARIES SPECIFICALLY ACROSS INDIVIDUALS**

The core evidence that face recognition varies specifically across individuals comes from two complementary sources. The first source is face recognition's dissociations from other, more general cognitive abilities; the second, equally-critical source is the robust associations observed among assessments that measure face recognition ability in very different ways.

Face recognition, as measured by the widely-used Cambridge Face Memory Test (CFMT; Duchaine and Nakayama, 2006), dissociates strongly from more general abilities. It dissociates almost completely from standardized IQ tests. To date, its mean reported correlation with such IQ tests, weighted by sample size and corrected for range restriction in the IQ tests, is 0.01 (Davis et al., 2011; Peterson and Miller, 2012; Palermo et al., 2013). Face recognition, as measured by the CFMT, also dissociates surprisingly strongly from other recognition abilities. It shares a mere 3% of its variation with the recognition of word pairs (*n* = 3003; 95% CIs 2–4%; Wilmer et al., 2010, 2012); and even *within* the realm of visual recognition, it shares only 7% of its variation with the recognition of abstract art images (*n* = 4475; 95% CIs 5–8%; Wilmer et al., 2010, 2012).
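The arithmetic behind these summary statistics can be sketched briefly. The following is a minimal illustration, not the cited studies' analysis: the study-level correlations, sample sizes, and SD ratios are hypothetical placeholders, and the Thorndike Case II formula is assumed here as one common range-restriction correction (the source does not state which correction was applied). The final line simply converts the reported shared-variance percentages back to correlations, since shared variance is the squared correlation.

```python
import math

def correct_range_restriction(r, sd_ratio):
    """Thorndike Case II correction (an assumed choice of method).

    sd_ratio = unrestricted SD / restricted SD of the range-restricted
    variable (here, IQ). A ratio of 1.0 leaves r unchanged.
    """
    return r * sd_ratio / math.sqrt(1 + r ** 2 * (sd_ratio ** 2 - 1))

def weighted_mean_r(rs, ns):
    """Mean correlation weighted by each study's sample size."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    return r ** 2

# HYPOTHETICAL inputs: (observed r, sample size, SD ratio) per study.
studies = [(-0.05, 80, 1.5), (0.02, 150, 1.3), (0.04, 240, 1.2)]

corrected = [correct_range_restriction(r, u) for r, _, u in studies]
mean_r = weighted_mean_r(corrected, [n for _, n, _ in studies])
print(f"weighted mean corrected r = {mean_r:.3f}")

# Shared variance of 3% and 7% corresponds to r of about 0.17 and 0.26.
print(round(math.sqrt(0.03), 2), round(math.sqrt(0.07), 2))
```

Read this way, the 3% and 7% figures in the text correspond to correlations of roughly 0.17 and 0.26, which are small next to the 0.52 to 0.60 correlations among face tests reported in the next paragraph.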

These pervasive dissociations from other abilities are not a result of poor measurement. Not only is the CFMT as reliable per unit time as the most widely-used IQ test (Wechsler, 2008; Wilmer et al., 2010), but it correlates well with tests that measure face identity processing in quite different ways. Two such tests are the Cambridge Face Perception Test (CFPT), which correlates 0.60 with the CFMT (*n* = 124; Bowles et al., 2009), and the Cambridge Famous Faces Memory Test (CFFMT), which correlates 0.52 with the CFMT (*n* = 1219; Wilmer et al., 2010, 2012).

The CFPT and CFFMT differ from the CFMT in multiple ways. The CFMT assesses one's ability to memorize a set of previously unfamiliar faces and then, shortly thereafter, recognize them among distractors. The CFPT, in contrast, assesses one's ability to *rank* several faces by the similarity of their identity to a *simultaneously-viewed* "exemplar" face. The CFFMT, in further contrast, assesses one's ability to *attach names or other identifying information to* celebrity faces *learned haphazardly over one's lifetime*.

The CFMT, CFPT, and CFFMT thus differ starkly in both the task being performed (from visual matching in the CFPT to recognition in the CFMT to recall in the CFFMT) and the duration over which faces must be remembered (from milliseconds in the CFPT to minutes in the CFMT to years or decades in the CFFMT), making their robust intercorrelations a powerful demonstration of valid measurement.

Finally, and perhaps most impressive of all, the CFMT correlates 0.37 with a person's self-rating on the single statement "I can recognize famous celebrities in photos or on TV" (*n* = 190; 95% CI 0.24–0.49; Wilmer et al., 2010). This is substantially larger than the average 0.15 correlation found between objective and self-report measures of memory abilities in a major meta-analysis of 24,897 individuals tested across 169 studies (Beaudoin and Desrichard, 2011).
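The confidence interval quoted for this correlation can be approximated with the standard Fisher z-transformation. The sketch below assumes that method for illustration; the source does not say how the interval was computed, and the function name is ours.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z.

    The correlation is mapped to z = atanh(r), whose standard error is
    1 / sqrt(n - 3); the interval is built in z-space and mapped back.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = pearson_ci(0.37, 190)
print(f"{lower:.2f} to {upper:.2f}")  # prints "0.24 to 0.49"
```

With *r* = 0.37 and *n* = 190, this reproduces the reported interval of 0.24–0.49, whose lower bound lies above the 0.15 meta-analytic average.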

Associations like these between CFMT and CFPT, CFFMT, and self-reported recognition ability critically distinguish specificity from invalid measurement. As we will see below, such associations, as a counterpoint to face recognition's persistent dissociations, were the missing piece in prior face recognition ability research.

In addition to being specific, human variation in face recognition is highly heritable (Wilmer et al., 2010). This combination of specificity and heritability is rare (Wilmer et al., 2010). Indeed, so consistently has specificity traded off against heritability in past research that a recent behavioral genetic theory, the "generalist gene" theory, posited that most or all cognitive variation results from the same set of genes (Kovas and Plomin, 2006). A major exception to the generalist gene theory (Wilmer et al., 2010; Plomin et al., 2013), face recognition's heritability demonstrates that different sets of genes contribute independently to human cognitive ability. Given the example of face recognition, it is worth considering not only how many other specific abilities exist, but also whether any of them are as strongly heritable as face recognition.

In sum, face recognition, at least when measured via the CFMT, is exceptionally specific. Moreover, it is rare among specific abilities for its high heritability. In the next section, by examining two earlier false starts in the valid measurement of face recognition ability, we illustrate barriers to the discovery of its specificity that could bear importantly on the search for further specific abilities.

## **TWO FALSE STARTS IN THE MEASUREMENT OF FACE RECOGNITION ABILITY**

In this section, we will recount two major efforts to assess face recognition ability. These efforts, begun nearly 70 years apart, are among history's most concerted efforts to measure any social ability (Kihlstrom and Cantor, 2000; Wilmer et al., 2012). In each case, initial promising evidence for face recognition's specificity was misinterpreted as invalid measurement, and development of the test in question was abandoned. These missed opportunities to examine the specific ability of face recognition seem unlikely to us to be isolated examples. The lessons learned from these missed opportunities may therefore provide valuable information on where to search for additional specific abilities.

The first false start in the measurement of face recognition ability involved the George Washington Social Intelligence Test (GWSIT), developed in the late 1920s (Hunt, 1928). The GWSIT consisted of six subtests, two of which involved faces. A face recognition subtest assessed the ability to learn the names for a set of twelve novel target faces; then, presented with a larger group of faces, one was required to pick out the target faces and recall their names. The second subtest involving faces assessed the ability to label the mental states of faces based on their expression. The remaining four subtests verbally assessed other aspects of social knowledge and social judgment (Hunt, 1928).

The initial validation study for the GWSIT clearly showed that, though none of its subtests correlated particularly highly with each other (maximum *r* = 0.44), the face recognition subtest dissociated most strongly of all from the other subtests (mean *r* = 0.22; Hunt, 1928). On the basis of these dissociations, as well as a dissociation from a measure of general intelligence, Hunt (1928) presciently suggested that "the special ability of being able to recognize [faces] is relatively independent of pure 'brains'".

What happened next is telling. Surprisingly, at least in hindsight, the promising evidence that the GWSIT provided for face recognition's specificity was not eagerly pursued. Quite to the contrary, the GWSIT was roundly criticized for failing to measure a unitary social ability. That is, it was criticized (a) because its subtests dissociated strongly from each other; and (b) because the small amount of overlap between its subtests was ultimately attributed to general intelligence (Thorndike, 1936; Thorndike and Stein, 1937). On this basis, the GWSIT rapidly fell out of favor as a research instrument (Kihlstrom and Cantor, 2000). Moreover, roughly three decades after it was introduced, the GWSIT was cited, in what would soon become the classic paper on test validity, as the classic example of an invalid test (Campbell and Fiske, 1959). In sum, far from inspiring further research, face recognition's clear and persistent dissociations from other abilities were the core inspiration for the rejection of the GWSIT as a valid ability measure.

Lest one be tempted to write off the rejection of the GWSIT as an isolated historical event, let us move forward nearly 70 years to a second, remarkably similar story. This story involved the third edition of the highly influential Wechsler Memory Scale (WMS-III), introduced in 1997 (Wechsler, 1997). The WMS-III added, for the first time in the WMS's history, a face recognition subtest. This subtest assessed the ability to memorize a set of faces and then classify a subsequent series of faces as "old" (seen before) vs. "new" (not seen before) (Wechsler, 1997). As with the GWSIT, the WMS-III's face recognition subtest dissociated persistently from other measures. These other measures included the WMS's own verbal and visual recognition subtests (Wechsler, 1997; Millis et al., 1999; Holdnack and Delis, 2004). Again, such dissociations were viewed as a liability rather than a virtue. The face recognition subtest was criticized for its dissociations (Millis et al., 1999; Holdnack and Delis, 2004), and it was dropped from the WMS-IV (Wechsler, 2009).

Seven decades apart, the story was the same. Face recognition's dissociations fueled a presumption of invalid measurement and an abandonment of measures, with remarkably little work aimed at disentangling specificity from invalid measurement by examining correlations across diverse measures of face recognition ability. The persistence with which face recognition was overlooked in these cases illustrates a blind spot for specificity that we believe is broadly characteristic of traditional test development practices in the human ability literature.

## **FACE RECOGNITION AS A MODEL IN THE SEARCH FOR FURTHER SPECIFIC ABILITIES**

We will now discuss three key factors that fueled the recent discovery that face recognition varies specifically across individuals, and that could plausibly fuel the discovery of further specifically varying abilities. These factors were: incorporation of priorities, discoveries, and techniques from experimental psychology and cognitive neuroscience; the development and validation of an excellent test; and a powerful internet-enabled Citizen Science approach to investigating human variation.

In contrast to the human ability literature's capacity to overlook dissociations, the cognitive neuroscience and experimental psychology literatures have, throughout their history, actively sought out dissociations. A remarkable aspect of the WMS story is that its face recognition subtest was introduced the same year, 1997, as major reports of face-selective activation in the human fusiform face area (FFA; see also Sergent et al., 1992; Kanwisher et al., 1997; McCarthy et al., 1997). Simultaneously, the WMS's dissociations inspired disappointment and rejection, while the FFA's dissociations inspired excitement and follow-up work. Indeed, the FFA's dissociations, along with other key neural and cognitive dissociations, have played a central role in solidifying the status of face processing as a major model system in studies of mind and brain. Such different reactions to evidence for dissociation are instructive when considering where to look for specific abilities. Perhaps equally valuable inspiration on where to look could be derived from the orphan tests of human abilities research (tests that were reliable yet abandoned due to their persistent dissociations) and the core dissociable model systems of cognitive neuroscience and experimental psychology.

As illustrative examples of ability domains that could plausibly contain additional specific abilities, consider social cognition, navigation, and dynamic visual perception. In the case of social cognition, the dissociations produced by the GWSIT raise the possibility that additional specific social abilities may exist (Hunt, 1928; Kihlstrom and Cantor, 2000; see also Mayer et al., 2008), and several aspects of social cognition, including theory of mind and joint attention, have been associated with distinct neural areas (Saxe, 2006). Navigation and dynamic visual perception, too, each involve well-defined neural areas (Epstein and Kanwisher, 1998; Newsome and Pare, 1988), and appear to dissociate from at least some general abilities (Hegarty et al., 2006; Wilmer and Nakayama, 2007). These are merely a few illustrative examples of the many domains in which orphan tests and/or functional or neural dissociations exist. We expect that there exist tens or hundreds of additional areas where such evidence is compelling enough to consider initiating a search for specific abilities.

The recent discovery of face recognition's specificity owes much to the careful development of a single, high-quality test: the CFMT (Duchaine and Nakayama, 2006). Ironically, it was the cognitive neuroscience and experimental psychology literatures, not the human abilities literature, that inspired the development of the CFMT (Duchaine and Nakayama, 2006). The CFMT's development drew primarily from three scientific areas. First, it drew from the stimulus-control techniques of visual psychophysics to produce well-controlled stimuli. Second, it drew from the dissociation-focused manipulations of cognitive neuroscience and experimental psychology to achieve an effective isolation of face processing mechanisms. Third, it drew from the practical test design methods of patient-based neuropsychological testing to minimize its demands on test-takers' general cognitive resources, including their capacity to attend, interpret, and problem-solve (taxing such general resources likely increases a test's reliance on general intellectual ability) (Duchaine and Nakayama, 2006).

The exceptional specificity of face recognition, as measured by the CFMT, is a case study in the value of incorporating the priorities, discoveries, and techniques of cognitive neuroscience and experimental psychology into efforts to measure human ability. Meaningful progress in the isolation of specific abilities, however, additionally requires a combination of rigorous psychometrics and access to the large, diverse samples of participants that enable iterative development, validation, and norming of high-quality tests.

Fortunately, we live at a time when the internet has opened up unprecedented opportunities for testing large samples. Resources such as Amazon Mechanical Turk® (an online clearing-house for small jobs where psychological research is increasingly conducted) enable the rapid recruitment and testing of large samples. Our own web-based work on face recognition and other abilities has been powered by our *Citizen Science* project TestMyBrain.org (Germine et al., 2012). As with other citizen science initiatives (Bonney et al., 2009), TestMyBrain.org seeks to actively collaborate with the general public to answer scientific questions. At TestMyBrain.org, we make high-quality tests freely available via the web, and participants complete these tests to learn about themselves. We then aggregate data across participants to further refine the tests we offer and to answer scientific questions. Due to high public interest in self-discovery, the ease of participation across demographic groups, and the near-zero incremental cost of recruiting and testing each additional participant, our studies of face recognition have been able to rapidly collect high-quality data from many thousands of individuals of varied age, sex, occupation, and socioeconomic status (Wilmer et al., 2010, 2012; Germine et al., 2011a,b, 2012). Citizen science projects like TestMyBrain.org, as well as other large-scale internet-based testing projects like Mechanical Turk, provide the necessary throughput to capture specific abilities and examine their importance in our lives.

## **CONCLUSIONS**

Here, we have examined face recognition as a model specific ability. First, we reviewed the recent work that documents face recognition's specificity. Second, we recounted two major false starts in the measurement of face recognition ability. These false starts reveal a capacity for specific abilities not only to be missed, but indeed, to be actively avoided by major test development efforts. Third, we discussed three key factors that contributed to the discovery that face recognition ability is specific and that, we believe, could serve as a compass for the discovery of further specific abilities. These factors were: incorporation of priorities, discoveries, and techniques from experimental psychology and cognitive neuroscience; the development and validation of an excellent test; and a powerful internet-enabled Citizen Science approach to investigating human variation. We suggest that the time is right for a renewed effort to investigate specific abilities, and that this effort can be guided by the model example of face recognition ability. By revealing relatively independent dimensions of human ability, such work would enhance our capacity to understand the uniqueness of individual minds.

## **ACKNOWLEDGMENTS**

This work was supported by a Brachman Hoffman Fellowship to Jeremy B. Wilmer.

## **REFERENCES**


Germine, L., Nakayama, K., Duchaine, B. C., Chabris, C. F., Chatterjee, G., and Wilmer, J. B. (2012). Is the Web as good as the lab? Comparable performance from Web and lab in cognitive/perceptual experiments. *Psychon. Bull. Rev.* 19, 847–857. doi: 10.3758/s13423-012-0296-9

Goleman, D. (1995). *Emotional Intelligence.* New York: Bantam Books.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 May 2014; accepted: 10 September 2014; published online: 10 October 2014*.

*Citation: Wilmer JB, Germine LT and Nakayama K (2014) Face recognition: a model specific ability. Front. Hum. Neurosci. 8:769. doi: 10.3389/fnhum.2014.00769*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Wilmer, Germine and Nakayama. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## A parametric study of fear generalization to faces and non-face objects: relationship to discrimination thresholds

#### **Daphne J. Holt<sup>1,2,3</sup>\*, Emily A. Boeke<sup>1</sup>, Rick P. F. Wolthusen<sup>1,4</sup>, Shahin Nasr<sup>3,5</sup>, Mohammed R. Milad<sup>1,2</sup> and Roger B. H. Tootell<sup>2,3,5</sup>**

<sup>1</sup> The Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA

<sup>2</sup> Harvard Medical School, Boston, MA, USA

<sup>3</sup> The Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, MA, USA

<sup>4</sup> Department of Child and Adolescent Psychiatry, Faculty of Medicine Carl Gustav Carus of the Technische Universität Dresden, Dresden, Germany

<sup>5</sup> The Department of Radiology, Massachusetts General Hospital, Boston, MA, USA

#### **Edited by:**

Aina Puce, Indiana University, USA

#### **Reviewed by:**

Andrew Jahn, Indiana University, USA

Assaf Harel, National Institutes of Health, USA

#### **\*Correspondence:**

Daphne J. Holt, The Department of Psychiatry, Massachusetts General Hospital, 149 13th St., Room 2608, Charlestown, Boston, MA 02129, USA e-mail: dholt@partners.org

Fear generalization is the production of fear responses to a stimulus that is similar, but not identical, to a threatening stimulus. Although prior studies have found that fear generalization magnitudes are qualitatively related to the degree of perceptual similarity to the threatening stimulus, the precise relationship between these two functions has not been measured systematically. Also, it remains unknown whether fear generalization mechanisms differ for social and non-social information. To examine these questions, we measured perceptual discrimination and fear generalization in the same subjects, using images of human faces and non-face control stimuli ("blobs") that were perceptually matched to the faces. First, each subject's ability to discriminate between pairs of faces or blobs was measured. Each subject then underwent a Pavlovian fear conditioning procedure, in which each of the paired conditioned stimuli (CS) was either followed (CS+) or not followed (CS−) by a shock. Skin conductance responses (SCRs) were also measured. Subjects were then presented with the CS+, CS− and five levels of a CS+-to-CS− morph continuum between the paired stimuli, which were identified based on individual discrimination thresholds. Finally, subjects rated the likelihood that each stimulus had been followed by a shock. Subjects showed both autonomic (SCR-based) and conscious (ratings-based) fear responses to morphs that they could not discriminate from the CS+ (generalization). For both faces and non-face objects, fear generalization was not found above discrimination thresholds. However, subjects exhibited greater fear generalization in the shock likelihood ratings compared to the SCRs, particularly for faces. These findings reveal that autonomic threat detection mechanisms in humans are highly sensitive to small perceptual differences between stimuli. Also, the conscious evaluation of threat shows broader generalization than autonomic responses, biased towards labeling a stimulus as threatening.

**Keywords: fear, faces, emotion, learning, generalization, perception**

## **INTRODUCTION**

Fear generalization is an adaptive process in which a fear response occurs to stimuli that are similar to a threatening stimulus (Lissek et al., 2008; Hajcak et al., 2009; Dunsmoor and Labar, 2013; Haddad et al., 2013). Some generalization of fear responses is presumably crucial for survival, because similar stimuli may well be similarly dangerous. For instance, it is appropriate to be afraid of a dog that looks and sounds like a dog that previously bit you ("once bitten, twice shy"). However, fear generalization processes may be abnormal in some psychopathological states (Lissek, 2012).

The process of stimulus generalization has been studied for decades, using a variety of methods and stimuli, in a range of species including pigeons, goldfish, worms and humans (Ghirlanda and Enquist, 2003). In humans, the generalization of fear-related responses has been studied primarily using Pavlovian fear conditioning paradigms. In these studies, a variety of outcomes have been used to index fear generalization, including electromyography-measured startle responses (Lissek et al., 2008; Hajcak et al., 2009; Haddad et al., 2013), skin conductance responses (SCRs; Vervliet et al., 2010b; Dunsmoor and Labar, 2013) and explicit ratings (ERs) of fear or shock likelihood (Vervliet et al., 2006, 2010b; Lissek et al., 2008; Hajcak et al., 2009). Each of these studies found significantly increased fear-related responses to stimuli that were perceptually similar (compared to those that were less similar) to a conditioned stimulus (CS) that had been paired with an aversive outcome, such as an electrical shock. In other words, conditioned fear responses were found to generalize from a CS (typically an abstract shape such as a circle or a rectangle) paired with a shock (a CS+) to a slightly altered version of that CS that was not paired with the shock (a generalization stimulus, GS). Although these studies have described a qualitative association between fear generalization magnitudes and the degree of perceptual similarity of the GSs to the CS+, the precise relationship between discrimination ability and fear generalization in humans has not been systematically studied. One might predict that autonomic measures of fear responses would generalize beyond perceptual discrimination thresholds, i.e., that subjects would show fear responses to similar but easily distinguishable stimuli. Alternatively, autonomic responses might be more sensitive than perception in some cases, based on prior demonstrations of sub-threshold summation (Kulikowski and King-Smith, 1973; To et al., 2011) and "unconscious" fear responses (Morris et al., 1998; Whalen et al., 1998).

An additional possibility is that fear generalization gradients might narrow or broaden depending on the context or type of stimuli encountered. For example, the ability to both discriminate and extract common features from similar stimuli is important in social contexts. It is often necessary to quickly assess whether an individual is a friend or foe, generalizing from prior experience and erring on the side of a defensive posture when in doubt, until additional information becomes available. However, the benefits of generalization during social interactions are balanced against the advantages of being able to discriminate among specific individuals with whom one has different relationships.

Recognition and discrimination among distinct humans occurs primarily via recognition of faces (McKone et al., 2007). Many lines of evidence suggest that faces are processed in a specialized manner by the brain. For example, psychophysical studies have shown that faces are processed "holistically" (Kemp et al., 1996; Farah et al., 1998; Hole et al., 1999). In contrast, other types of stimuli are processed in a more piecemeal manner, based on their feature components. Face-specific processing mechanisms are anatomically segregated in specialized pathways in the brain in both humans and monkeys (Kanwisher et al., 1997; Tsao et al., 2008; Pinsk et al., 2009; Rajimehr et al., 2009; Ku et al., 2011; Nasr and Tootell, 2012). Thus, it is possible that these unique aspects of face perception influence the generalization of fear responses across perceptually similar faces.

Thus, in the current study, we aimed to (1) measure the relationship between visual discriminability and fear generalization; and (2) compare fear generalization gradients for faces and non-face control stimuli. First, we predicted that significant fear generalization would occur to stimuli that were indistinguishable from a threatening stimulus (one that had been associatively linked to an aversive experience, an electrical shock). Second, we predicted that fear generalization would be greater to faces, compared to non-face control stimuli.

## **MATERIALS AND METHODS**

#### **OVERVIEW OF EXPERIMENT 1 (FACES) AND EXPERIMENT 2 ("BLOBS")**

We created image morphs between images of (1) two distinct human faces (Experiment 1); and (2) two distinct non-face shapes or "blobs" (Experiment 2) (**Figure 1**). Later in the experiment, one of the two faces or blobs (the conditioned stimuli, CS) was paired with an electrical shock (the CS+) during a Pavlovian fear conditioning procedure.

First, each subject performed a *discrimination task* to identify the image morph that he could distinguish from the CS+ stimulus at a 75% accuracy level. This value was defined as the Just Noticeable Difference (JND) threshold.

Second, subjects underwent a Pavlovian *fear conditioning procedure*, in which the CS+ stimulus was intermittently followed by a shock, and the other stimulus of the pair was not followed by a shock (the CS−). SCRs were measured continuously.

Third, subjects underwent a *fear generalization procedure* during which they were presented with the CS+, CS− and five morphs whose degree of difference from the CS+ was determined by the subject's performance on the discrimination task (i.e., the specific JND for that subject). SCRs were measured continuously.

Fourth, subjects were presented with each of the previously presented stimuli, and asked to rate the likelihood that each stimulus had ever been followed by a shock (*explicit ratings*).

#### **PARTICIPANTS**

Seventy-one healthy male volunteers (mean age: 24.61 ± 0.91) were recruited using an on-line advertisement and enrolled in the study (39 and 32 for Experiment 1 and 2, respectively). Only males were included in this initial study in order to minimize SCR heterogeneity related to gender differences in fear responses (Milad et al., 2006). Participants had no history of psychiatric or neurologic illness, as determined by a phone screen and the Structured Clinical Interview for DSM-IV (SCID; First et al., 1995). All subjects had normal or near normal vision, based on Snellen acuity.

The study was approved by the Partners Healthcare Institutional Review Board, and written informed consent was obtained from all subjects at the time of enrollment.

#### **STIMULI**

#### **Experiment 1: Faces**

Four images of human faces (see **Figure 1A**) were generated using FaceGen 3.4 (Singular Inversions, Canada), as described previously (Yue et al., 2011, 2013; Holt et al., 2014). All four faces (A, B, C and D) were male, Caucasian, and achromatic (i.e., all color parameters were set to 0). FaceGen was then used to create morphs (99 even steps) between faces A and B and between faces C and D.

#### **Experiment 2: Non-face "blobs"**

Four images of three-dimensional, unfamiliar shapes ("blobs"; see **Figure 1B**) were generated as described elsewhere (Yue et al., 2013). To equate the texture pattern of the blob and face stimuli, a synthesized texture was generated from scrambling the texture of the face images (Portilla et al., 2003) and overlaid onto the blob stimuli. As with the faces, morphs (99 even steps) were created between blobs E and F and between blobs G and H.

#### **DISCRIMINATION TASK**

Participants were assigned one of the two pairs of face (A/B or C/D, Experiment 1) or blob (E/F or G/H, Experiment 2) stimuli. The assigned face or blob pair was counterbalanced across subjects. Later in the experiment (during the Fear Conditioning procedure, see below), one of the two stimuli (the CS+) was paired with an electrical shock (the unconditioned stimulus, US), while the other stimulus (the CS−) was not paired with a shock. The CS+ and CS− assignment within each pair was counterbalanced across subjects.

Before the Fear Conditioning procedure, the subjects' ability to discriminate between the pair of stimuli assigned to them was evaluated using a forced-choice discrimination task. Prior to any measurements, the subject practiced the task until they confirmed that they understood the procedure (3–5 trials). The task consisted of three runs of 50 trials each. During each trial, participants first viewed the CS+ stimulus for 500 ms. Following an inter-trial interval (ITI) of 500 ms, the participants were presented with a morph and the CS+ stimulus side by side. Subjects were then asked to select which stimulus they had previously seen, by pressing one of two buttons, indicating the image on the right or the left. The positioning of the morph and CS+ stimulus was randomized across trials. Participants had unlimited time to respond (self-paced). The morphs used in the discrimination task were 6%, 12%, 24%, 48%, and 100% different from the CS+ stimulus (100% different = the other stimulus of the pair, the CS−). The participant's response was followed by an ITI of 1 s. Following completion of the task, participant accuracy was plotted against the morph level (the percentage difference from the CS+ stimulus), in order to calculate the JND level (**Figure 2**) for that participant. The JND was the morph level (% difference from the CS+ stimulus, which could fall between the morphs presented during the discrimination task) that could be distinguished from the CS+ stimulus at an accuracy of 75%. For an independent experiment (not shown here), subjects performed this task a second time (3 additional runs) following the completion of the Fear Conditioning and Fear Generalization phases of the experiment.
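The JND estimate described above can be illustrated with a minimal sketch. It assumes linear interpolation between adjacent tested morph levels, which is one simple way for the JND to "fall between the morphs presented"; the source does not state the fitting method, and the function name and accuracy values below are hypothetical.

```python
def jnd_from_accuracy(levels, accuracy, criterion=0.75):
    """Morph level (% difference from the CS+) at which accuracy reaches `criterion`.

    Assumption: linear interpolation between adjacent tested levels.
    Returns None if the criterion is never reached.
    """
    if accuracy[0] >= criterion:
        return float(levels[0])      # criterion already met at the smallest morph
    points = list(zip(levels, accuracy))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:     # first upward crossing of the criterion
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None

levels = [6, 12, 24, 48, 100]                # morph levels used in the task (%)
accuracy = [0.52, 0.60, 0.70, 0.90, 0.98]    # hypothetical proportions correct
print(jnd_from_accuracy(levels, accuracy))   # close to 30 for these made-up data
```

A fitted psychometric function (for example, a logistic or Weibull curve) would be a common alternative to interpolation and could yield a slightly different threshold.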

#### **FEAR CONDITIONING AND FEAR GENERALIZATION**

The Coulbourn Instruments Lablink V System (Allentown, Pennsylvania) was used for these two phases of the experiment. Skin conductance levels were measured with the Coulbourn Isolated Skin Conductance Coupler. Before the Fear Conditioning procedure, two electrodes were placed on the palm of the participant's left hand (to record SCRs) and on the index finger and middle finger of the participant's right hand (to deliver the US, a mild electrical stimulus 500 ms in duration). Next, the intensity of the US was set by each participant to a level that was "highly annoying but not painful" (Milad et al., 2005; Holt et al., 2009). Also, prior to these procedures, the subjects were told that, during the experiment, each stimulus may or may not be followed by the US, but one stimulus was more likely to be followed by the US. They were also told that they would be asked questions about what they had observed following the experiment. Throughout these two procedures, subjects were observed through a closed circuit video camera to ensure that they were awake and attentive.

#### **Fear conditioning**

This phase consisted of 8 CS+ trials and 8 CS− trials, each 6 s long, presented in a pseudorandom order (Milad et al., 2005). ITIs were 9, 12, or 15 s in duration. The CS+ was followed by the US in 5 of the 8 CS+ trials, and the CS− was never followed by a shock.

In the first 17 participants, a pilot version of the Fear Conditioning phase was used (12 trials, 50% reinforcement). Because this version did not produce reliable learning in this group (learning occurred in 12/17 subjects), this phase was modified. (Since the goal of the study was to examine generalization of previously learned fear responses, it was important that subjects demonstrate adequate fear learning initially, see below).

#### **Fear generalization**

This phase began after a 1-min break following Fear Conditioning. During the Fear Generalization procedure, subjects were presented with the CS+, CS− and five morph levels (m1, m2, m3, m4, and m5) whose degree of difference from the CS+ was determined by the subject's performance on the discrimination task (*m1* = 0.125 JND; *m2* = 0.25 JND; *m3* = 0.5 JND; *m4* = 1.0 JND; *m5* = 1.5 JND) (**Figure 3**). This phase consisted of 35 trials, i.e., five trials for each stimulus category. The ITIs were again 9, 12, and 15 s in length and each stimulus was presented for 6 s. For each subject, stimuli were presented in one of two different pseudorandom orders, so that no more than two of the same stimuli were presented consecutively (to avoid habituation of responses due to repetition) (Lissek et al., 2008; Dunsmoor et al., 2009), counterbalanced across subjects. During this phase, the CS+ was always followed by the shock (100% reinforcement), in order to minimize extinction of the association produced by viewing many CS+-like stimuli that were not followed by a shock.
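The ordering constraint described above (35 trials, five per stimulus, no more than two identical stimuli in a row) can be sketched as follows. The study used two fixed pseudorandom orders; the rejection-sampling generator, the function names, and the seed below are illustrative assumptions, not the authors' procedure.

```python
import random

STIMULI = ["CS+", "m1", "m2", "m3", "m4", "m5", "CS-"]

def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def generalization_order(reps=5, max_consecutive=2, seed=0):
    """Shuffle until no stimulus appears more than `max_consecutive` times in a row."""
    rng = random.Random(seed)        # hypothetical seed, for reproducibility only
    trials = STIMULI * reps          # 7 stimuli x 5 repetitions = 35 trials
    while True:
        rng.shuffle(trials)
        if max_run(trials) <= max_consecutive:
            return list(trials)

order = generalization_order()
print(len(order), max_run(order))
```

Reshuffling until the constraint holds is simple and fast at this trial count, since most random orders of 35 trials already satisfy it.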

#### **EXPLICIT FEAR RATINGS**

Following Fear Generalization, subjects were presented with each of the previously presented stimuli once (in one of two pseudorandom orders, counterbalanced across subjects; stimulus presentation time = 6 s; ITI = 9 s), then asked to rate the likelihood that the stimulus had ever been followed by a shock (on a scale of 0–100% likely).

#### **SKIN CONDUCTANCE DATA PRE-PROCESSING**

During Fear Conditioning and Fear Generalization, skin conductance was recorded continuously. A participant was considered a "responder" if ≥2 of the 16 trials of the Fear Conditioning phase showed a response greater than 0.05 µS (Schnur et al., 1999; Turner et al., 2005). Data from subjects who did not meet this criterion ("non-responders") were excluded from further analysis (see below).

For both Fear Conditioning and Fear Generalization, the SCR to the stimulus was calculated by subtracting the mean skin conductance for the 2 s prior to stimulus onset from the peak of the skin conductance during the 6 s of stimulus presentation. In addition, for the analysis of the Fear Generalization data only, SCRs were calculated in an identical manner (using the 2 s prior to stimulus onset as the baseline) for the first 6 s of the ITI that immediately followed stimulus offset. Thus, we tested for fear generalization during two time intervals: (1) during the stimulus presentation (*immediate* fear generalization, IFG); and (2) following stimulus offset (*delayed* fear generalization, DFG). SCRs were square-root transformed and averaged across each stimulus type for both Fear Conditioning (CS+ and CS−) and Fear Generalization (CS+, m1, m2, m3, m4, m5, CS−), prior to the statistical analyses.
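The SCR scoring described above can be sketched as below. The function names, the sampling rate, and the toy trace are hypothetical. The sign-preserving square root is an assumption, since the source does not say how negative baseline-corrected values were handled.

```python
import math

def scr(trace, onset_idx, rate_hz, window_start_s=0.0, window_s=6.0, baseline_s=2.0):
    """Peak conductance in a 6 s window minus the mean of the 2 s before stimulus onset.

    window_start_s=0 scores the stimulus period (IFG);
    window_start_s=6 scores the first 6 s after stimulus offset (DFG).
    """
    n_base = int(baseline_s * rate_hz)
    start = onset_idx + int(window_start_s * rate_hz)
    stop = start + int(window_s * rate_hz)
    baseline = trace[onset_idx - n_base:onset_idx]
    return max(trace[start:stop]) - sum(baseline) / len(baseline)

def sqrt_transform(x):
    # Assumption: keep the sign so that negative differences stay defined.
    return math.copysign(math.sqrt(abs(x)), x)

# Toy trace in µS at a hypothetical 10 Hz: 2 s baseline, 6 s stimulus, 6 s ITI.
trace = [1.0] * 50 + [1.25] + [1.0] * 39 + [1.0625] + [1.0] * 49
ifg = scr(trace, onset_idx=20, rate_hz=10)                      # stimulus period
dfg = scr(trace, onset_idx=20, rate_hz=10, window_start_s=6.0)  # after offset
print(sqrt_transform(ifg), sqrt_transform(dfg))
```

Both windows share the same pre-stimulus baseline, matching the description that the delayed response was scored "in an identical manner".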

#### **PARTICIPANTS INCLUDED IN THE ANALYSES**

Because our goal was to measure fear generalization in participants who had successfully learned to discriminate the CS+ and CS−, we included data in our analyses from the participants who demonstrated successful learning only. The "learner" criterion for each individual consisted of a difference in shock likelihood ratings between the CS+ and CS− stimuli ≥50%.

Also, data from three subjects were excluded from the analyses because their discrimination task data were unusable; another subject's data were excluded because he fell asleep during the Fear Generalization procedure. Of the remaining 67 subjects, 53 were learners (28 and 25 subjects in Experiment 1 and 2, respectively). Of the 53 learners, 6 were non-responders (4 and 2 subjects in Experiment 1 and 2, respectively). Thus, 47 subjects (24 and 23 subjects in Experiment 1 and 2, respectively) were included in the analyses (mean age: 23.61 ± 0.94).
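As a check on the inclusion arithmetic, the sketch below applies the learner criterion and reproduces the reported sample sizes from the counts given in the text. The function name `is_learner` and the variable names are hypothetical.

```python
def is_learner(rating_cs_plus, rating_cs_minus, min_difference=50):
    """Learner criterion: CS+ minus CS- shock likelihood rating >= 50 points."""
    return (rating_cs_plus - rating_cs_minus) >= min_difference


# Counts reported in the text, per experiment.
learners = {"Experiment 1": 28, "Experiment 2": 25}
non_responders = {"Experiment 1": 4, "Experiment 2": 2}

# Learners who were also skin conductance responders.
included = {exp: learners[exp] - non_responders[exp] for exp in learners}
total_included = sum(included.values())

print(included, total_included)
```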

Lastly, two of the subjects who participated in Experiment 2 had JND values following the discrimination task that were too high to permit assignment of stimuli. These subjects were assigned generalization stimuli that differed maximally from the CS+ (*m1* = 8%, *m2* = 17%, *m3* = 33%, *m4* = 66%, *m5* = 99%). Excluding these two subjects from the analyses did not alter the findings.

## **DATA ANALYSES**

#### **Fear conditioning**

The presence of significant differential fear conditioning (CS+ minus CS− responses, *p* < 0.05) was assessed using paired, two-tailed *t*-tests.
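For readers unfamiliar with the paired test, a minimal sketch of the *t* statistic follows. The function name and the example values are hypothetical; the *p*-value is omitted because the Python standard library has no *t*-distribution, so in practice a statistics package would be used for that step.

```python
import math


def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equal-length sequences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical square-root-transformed SCRs for four subjects.
cs_plus = [0.5, 0.7, 0.6, 0.9]
cs_minus = [0.3, 0.4, 0.5, 0.4]

t_stat, df = paired_t(cs_plus, cs_minus)
print(round(t_stat, 3), df)
```

With *n* subjects the test has *n* − 1 degrees of freedom, which is why the full-sample comparisons (*n* = 47) are reported as *t*(46).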

#### **Fear generalization**

In the *SCR data*, we tested for fear generalization using a repeated measures ANOVA with three factors: stimulus level (6: m1, m2, m3, m4, m5, CS−), experimental phase (2: during stimulus presentation, following stimulus offset), and stimulus type (2: faces, blobs) as a between-subjects factor. SCR data collected for the CS+ were not included in this analysis (thus, there are six stimulus levels in this ANOVA), since the presence of the shock during the ITI phase confounds the measurement of the CS+ response following stimulus offset. Significant main effects and interactions with stimulus level (*p* < 0.05) were followed up by paired, two-tailed *t*-tests.

In the *ER data*, we conducted a second repeated measures ANOVA with two factors: stimulus level (7: CS+, m1, m2, m3, m4, m5, CS−), and stimulus type (2: faces, blobs) as a between-subjects factor. Significant main effects and interactions with stimulus level (*p* < 0.05) were followed up by paired, two-tailed *t*-tests.

*Comparison of autonomic and explicit fear generalization*: we compared the amount of fear generalization in the SCRs (independently for the IFG and DFG responses) to the fear generalization in the ERs, using ANOVAs performed on normalized data (normalization permitted comparison of SCR and ratings data). These two ANOVAs included three factors: response type (2: SCR, ratings), stimulus level (seven or six for the IFG and DFG analyses, respectively), and stimulus type (2: faces, blobs) as a between-subjects factor.

For each subject, each averaged value for a given stimulus was normalized using the following formula: (Value − Minimum)/(Maximum − Minimum), where Minimum is the smallest average value for a given subject (i.e., the subject's average value for the CS+, m1, m2, m3, m4, m5, or CS−, whichever is the smallest), and Maximum is the largest averaged value for the subject (i.e., their average value for the CS+, m1, m2, m3, m4, m5, or CS−, whichever is the largest). Thus, each subject's largest average response was scaled to 1, and the smallest response was scaled to 0.
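The normalization formula can be sketched as below. The function name and the example ratings are hypothetical; raising an error when a subject's maximum equals the minimum is an assumption, since the text does not say how such a case would be handled.

```python
def normalize_subject(values):
    """Min-max normalize one subject's averaged responses to the range 0-1.

    `values` maps stimulus labels to that subject's averaged response.
    """
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        # Assumption: a flat response profile cannot be normalized.
        raise ValueError("all values are identical; normalization is undefined")
    return {stim: (v - lo) / (hi - lo) for stim, v in values.items()}


# Hypothetical shock likelihood ratings (0-100) for one subject.
ratings = {"CS+": 90, "m1": 80, "m2": 60, "m3": 40, "m4": 20, "m5": 10, "CS-": 0}

normalized = normalize_subject(ratings)
print(normalized)
```

Because SCRs (in µS) and ratings (in %) are each rescaled within subject, the two measures end up on a common 0 to 1 scale, which is what allows response type to be entered as a factor in the same ANOVA.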

#### **Correlations**

Correlations among fear conditioning, fear generalization (for the morphs to which there was significant fear generalization, see below) and JND levels were examined using Pearson's *r*.
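A minimal sketch of Pearson's *r* is given below for reference. The function name and the example values are hypothetical and do not come from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject values: differential conditioning (CS+ minus CS-)
# and differential SCR to m1 (m1 minus CS-).
conditioning = [0.10, 0.25, 0.30, 0.45, 0.60]
generalization_m1 = [0.05, 0.10, 0.20, 0.15, 0.35]

print(round(pearson_r(conditioning, generalization_m1), 2))
```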

## **RESULTS**

## **FEAR CONDITIONING**

In both Experiment 1 and 2, subjects acquired differential, conditioned fear responses (CS+ > CS−, *p*s < 0.001). We found no difference between the level of differential fear conditioning acquired during the two experiments (*t*(45) = 0.69, *p* = 0.50; mean SCR to CS+ = 0.40 ± 0.07 µS (mean ± SEM); mean SCR to CS− = 0.16 ± 0.08 µS across all subjects (*n* = 47); comparison of the CS+ vs. CS−: *t*(46) = 7.22; *p* = 4 × 10<sup>−9</sup>). During Fear Generalization, this learning was maintained (i.e., SCRs were significantly greater to the CS+ compared to the CS− during Fear Generalization in both experiments (*p*s < 0.004)).

## **FEAR GENERALIZATION: SKIN CONDUCTANCE RESPONSES**

Fear generalization was defined by the presence of a significantly greater SCR to a morph (m1 +/− the other morphs) compared to the SCR to the CS− (Lissek et al., 2008; Haddad et al., 2013). The ANOVA revealed a significant effect of stimulus level (*F*(5,225) = 4.78, *p* < 0.001) and a significant stimulus level by experimental phase interaction (*F*(5,225) = 2.55, *p* = 0.03), with no main effects or interactions with stimulus type (all *p*s > 0.22). Follow-up tests revealed that, across both experiments, during the stimulus presentation, there was significant generalization to m1, compared to the CS− (*t*(46) = 2.08, *p* = 0.043; *p*s > 0.25 for the other morph levels). Following stimulus offset, there was generalization to m1 (*t*(46) = 3.80, *p* = 0.0004) and m2 (*t*(46) = 2.74, *p* = 0.009), and a trend towards generalization to m3 (*t*(46) = 1.93, *p* = 0.06) and m4 (*t*(46) = 1.90, *p* = 0.06), with no generalization to m5 (*p* = 0.29) (**Figure 4**). Thus, there was both immediate (IFG, during stimulus presentation) and delayed (DFG, following stimulus offset) fear generalization to morphs that were perceptually similar to the CS+ (i.e., perceptually closer to and indistinguishable from the CS+, compared to the JND threshold = m4).

The interaction with experimental phase arose from the greater amount of delayed, compared to immediate, fear generalization (DFG > IFG, **Figure 5**). A direct comparison of the differential SCRs (response to the morph − the response to the CS−) during the two phases of the experiment confirmed that the responses were greater for m1 (*t*(46) = 3.32, *p* = 0.002) and m2 (*t*(46) = 2.14, *p* = 0.04) (*p*s for the other morph levels >0.12) following stimulus offset, compared to during stimulus presentation.

Consistent with the absence of an interaction with stimulus type in the ANOVA, the pattern of responses was similar across Experiments 1 (faces) and 2 (blobs), although the effects at the individual morph levels appeared to be slightly (but nonsignificantly) stronger in Experiment 1 (**Figures 4** and **5**).

## **FEAR GENERALIZATION: EXPLICIT RATINGS**

For the ERs, there was a significant effect of stimulus level (*F*(6,270) = 61.29, *p* < 0.001), as well as a significant interaction between stimulus type and level (*F*(6,270) = 2.5, *p* = 0.023). This pattern of results arose from the presence of (1) explicit fear generalization to the morph stimuli; and (2) greater fear generalization in Experiment 1 (faces) compared to Experiment 2 (blobs) (**Figure 6**). Using the CS− as the baseline, comparison condition, we found fear generalization to all morph levels in both experiments (all *p*s < 0.013). However, because the shock likelihood ratings of the CS− were always 0, we also computed fear generalization using the ratings for m5 (the morph that was the most different from the CS+) as the comparison condition. Compared to the m5 ratings, in Experiment 1, subjects showed significantly greater shock likelihood ratings to the CS+, m1, m2, m3 and m4 (all *p*s < 0.0008), whereas in Experiment 2, subjects showed greater shock likelihood ratings to the CS+, m1, m2, m3 (all *p*s < 0.01) but not m4 (*p* = 0.52). Consistent with this, a direct comparison of the ratings across the two experiments at each stimulus level showed that there were significantly higher shock likelihood ratings in Experiment 1 compared to Experiment 2 for m1 (*t*(45) = 2.06, *p* = 0.046) and m3 (*t*(45) = 2.77, *p* = 0.008) (*p*s for the comparisons at the other stimulus levels >0.17).

In summary, although the SCRs showed similar generalization patterns and magnitudes across the two experiments (i.e., to faces and blobs), there was greater *explicit* fear generalization to perceptually similar faces, compared to the non-face control stimuli.

## **DIRECT COMPARISON OF AUTONOMIC (SCR) AND EXPLICIT FEAR GENERALIZATION IN NORMALIZED DATA**

#### **I. Comparison of shock likelihood ratings vs. SCRs during stimulus presentation**

Here again we found a significant effect of stimulus level (*F*(6,270) = 46.10, *p* < 0.001), consistent with the results described above showing fear generalization to the morphs for both the SCRs and ERs, across both experiments (**Figure 7A**). In addition, there was a significant interaction of stimulus level by response type (*F*(6,270) = 17.16, *p* < 0.001), with no significant interactions with stimulus type (*p*s > 0.07). In the normalized data, the ratings values were significantly greater than the SCR values for m1, m2, and m3 (*p*s < 0.002) but not for m4 and m5 (*p*s > 0.21) or CS−, which showed the opposite pattern (*p* = 5 × 10<sup>−10</sup>), since the ratings of the CS− were always 0.

#### **II. Comparison of shock likelihood ratings vs. SCRs following stimulus offset**

Similar results were found following stimulus offset, with a significant effect of stimulus level (*F*(5,225) = 33.02, *p* < 0.001), and an interaction of stimulus level by response type (*F*(5,225) = 14.49, *p* < 0.001), with no interactions with stimulus type (*p*s > 0.24) (**Figure 7B**). The ratings values were significantly greater than the SCR values for m1, m2 and m3 (*p*s < 0.05) but not for m4 (*p* = 0.31). Also, m5 and the CS− showed the opposite pattern (*p* = 0.01 and 1 × 10<sup>−8</sup>, respectively).

Thus, this analysis demonstrates statistically that a greater amount of fear generalization was present in the ERs compared to the SCRs in both experiments.

### **CORRELATIONS BETWEEN FEAR LEARNING AND FEAR GENERALIZATION**

In the full sample (*n* = 47), the success of differential fear conditioning (i.e., the magnitude of the difference between SCRs to the CS+ and CS−) predicted the differential SCR to m1 (vs. CS−) during the stimulus presentation (*r* = 0.36, *p* = 0.01) and following stimulus offset (*r* = 0.59, *p* < 0.001), and the differential SCR to m2 following stimulus offset (*r* = 0.48, *p* = 0.001). In Experiment 1 only (*n* = 24), similar correlations were found between fear conditioning success and differential SCRs to m1 during the stimulus presentation (*r* = 0.65, *p* = 0.001) and to m1 and m2 following stimulus offset (m1: *r* = 0.81, *p* < 0.001; m2: *r* = 0.60, *p* = 0.002). Similar correlations were found when the SCRs to the CS− were not subtracted from the SCRs to the morphs.

**generalization procedure**. Bar plots of SCRs during fear generalization of the subjects of the two experiments combined (**A,D**; n = 47), Experiment 1 (**B,E**; n = 24) and Experiment 2 (**C,F**; n = 23) are shown. Panels (**A,B** and **C**) show mean maximum SCRs during the 6-s stimulus presentation; panels (**D,E** and **F**) show the mean maximum SCRs following stimulus offset, during the first 6 s of the ITI. Data for the CS+ are omitted from the graphs of the ITI data (panels **D,E** and **F**), since the responses to the CS+ were likely influenced by the unconditioned stimulus (the electrical shock), which was delivered during the ITI immediately following the presentation of the CS+. A symbol over a CS+, m1 or m2 bar indicates that the mean SCR for this stimulus was significantly

The blue arrows indicate the morph level corresponding to the JND, m4. Error bars represent one standard error from the mean. Overall, these data reveal that a similar pattern of fear generalization occurs in response to perceptually similar face and non-face control stimuli. In Experiment 1 (faces), there was significant fear generalization to m1 (t(23) = 2.44, p = 0.02) and m2 (t(23) = 2.30, p = 0.03) during the stimulus presentation, and to m1 (t(23) = 2.55, p = 0.02) and m2 (t(23) = 2.46, p = 0.02) following stimulus offset (ps for the other morphs > 0.15). In Experiment 2, there was generalization to m1 only, following stimulus offset (t(22) = 2.93, p = 0.008), with no significant fear generalization during the stimulus presentation (all other ps > 0.08).

Fear conditioning success was also correlated with the amount of explicit fear generalization to m1 (ratings to m1 vs. m5) in the full sample (*r* = 0.32, *p* = 0.027, *n* = 47) and in Experiment 2 (*r* = 0.47, *p* = 0.02, *n* = 23).

We found no correlations between JND levels and magnitudes of fear learning or fear generalization.

## **DISCUSSION**

#### **SUMMARY OF FINDINGS**

First, we found that fear generalization is closely linked to perceptual discriminability. Specifically, in all analyses, generalization did not occur above discrimination thresholds. Second, we showed that conscious fear responses, measured as shock likelihood ratings, showed a broader fear generalization gradient than the SCRs. Also, both peripheral and conscious measures of fear generalization correlated with the success of acquisition of conditioned fear responses, suggesting that fear generalization here was not due to poor encoding of the original CS-US association. Lastly, partially confirming our prediction, conscious fear generalization was greater in response to faces than to non-face control stimuli.

#### **GENERALIZATION OF FEAR RESPONSES IS LINKED TO PERCEPTUAL DISCRIMINABILITY**

During stimulus presentation, SCR-based fear generalization occurred to the stimulus morph that was perceptually closest to the CS+ (m1), and then extended further following stimulus offset, to include m2 as well. In the ERs, generalization also occurred to m3 and variably (in Experiment 1 only) to m4, which represented the discrimination threshold, but not to m5.

corrected for all stimulus conditions by adjusting the mean response during the 2-s interval before the stimulus onset to 0.

These findings are in line with many previous studies conducted in non-mammalian species (e.g., pigeons responding to varying frequencies of light) showing a relationship between perceptual similarity and generalization of operant responses, which typically have a Gaussian distribution (Ghirlanda and Enquist, 2003). Here, we provide empirical evidence for this type of relationship in humans, demonstrating that the autonomic fear system in humans is sensitive to quite small perceptual differences between stimuli.

The finding of a broader fear generalization gradient in the post-experiment shock likelihood ratings, compared to the SCRs, is consistent with the results of two previous studies that used on-line shock likelihood ratings (Lissek et al., 2008; Haddad et al., 2013), suggesting that this is a robust phenomenon. This dissociation may at first appear counter-intuitive, since the mechanism(s) generating the conscious appraisal of threat seems to be "throwing away" more accurate information possessed by a lower level system.

However, we speculate that this conservative bias in conscious fear responses may have promoted survival during primate evolution. It may be advantageous, in certain contexts, to be wary of stimuli that are similar, but clearly not identical, to known threats, given that these stimuli may have other common characteristics. In the current study, the autonomic system was not mobilized for the morphs that were similar to, but distinguishable from, the CS+, suggesting that the cost of mobilizing the physiological resources to respond to a threat is outweighed, in the short term, by the benefits of gathering more information about the stimulus. A conscious perception of a potential threat may serve the purpose of directing attentional resources towards gathering this additional information (LeDoux, 2000). If new evidence suggests that the stimulus is indeed threatening, then the autonomic fear system may be recruited at that point.

The neural circuitry responsible for these two types of fear generalization responses has not been fully characterized. However, it is known that distinct subfields of the hippocampal formation are involved in the individual coding of (vs. the generalization of features across) similar stimuli or events (Aimone et al., 2011; Newman and Hasselmo, 2014). Other studies have reported that the medial prefrontal cortex and midline thalamus also contribute to these processes (Xu et al., 2012; Xu and Südhof, 2013). During face perception, it is likely that the face-selective areas within the ventral temporal cortex, including the fusiform face area (Kanwisher et al., 1997) and anterior temporal area (Rajimehr et al., 2009; Nasr and Tootell, 2012), communicate with this fronto-thalamic-hippocampal memory network.

Consistent with the work conducted in rodents, functional imaging studies of fear generalization in humans using Pavlovian conditioning procedures have found that the medial prefrontal cortex (Dunsmoor et al., 2011; Greenberg et al., 2013a; Lissek et al., 2013a; Cha et al., 2014b) and hippocampus (Lissek et al., 2013a) show response gradients that are consistent with a fear generalization phenomenon. Similar gradients have also been detected in the responses of regions known to be important in salience detection and fear production, such as the insula (Dunsmoor et al., 2011; Greenberg et al., 2013a; Lissek et al., 2013a), striatum (Dunsmoor et al., 2011; Greenberg et al., 2013a) and ventral tegmental area (Cha et al., 2014a). However, the mechanisms responsible for integrating the relevant perceptual and motivational information to produce these response gradients remain unclear. Studies that parametrically vary each component (e.g., the perceptual features and motivational value of the stimuli) may clarify how these distinct types of information are used to inform both automatic and conscious perceptions of threat and resulting behavior.

#### **AUTONOMIC FEAR GENERALIZATION HAS AN EXTENDED TIME COURSE**

The generalization gradients observed in our SCR data were larger following stimulus offset than during the presentation of the stimulus. This slow time course is typical of SCRs (Bach et al., 2010). This delayed generalization response may also reflect an interaction between the initial autonomic response and the conscious assessment of threat: top-down processes may augment fear responses over time. Alternatively, subjects may experience an acute increase in fear during the time period when they expect to receive a shock, immediately following stimulus offset. The absence of the shock in the context of an increased expectation for it may produce a "prediction error" signal (Li and McNally, 2014), contributing to this late SCR. Future studies that manipulate the predictability of the shock may determine whether this response is indeed linked to prediction error-related mechanisms, or merely reflects the long latency of SCRs.

**FIGURE 6 | Explicit shock likelihood ratings following the fear generalization procedure**. Bar plots of shock likelihood ratings for the two experiments combined (**A**; n = 47), Experiment 1 (**B**; n = 24) and Experiment 2 (**C**; n = 23) are shown. A symbol over a bar indicates that the mean ratings for this stimulus were significantly greater than the mean ratings for m5, the morph that was the most different perceptually from the CS+ (\*\* p < 0.01; <sup>+</sup> p < 0.001). The blue arrows indicate the morph level corresponding to the JND, m4. Error bars represent one standard error from the mean. Compared to the m5 ratings, subjects showed significantly greater shock likelihood ratings to the CS+ (t(23) = 9.00, p = 5 × 10<sup>−9</sup>), m1 (t(23) = 7.24, p = 2 × 10<sup>−7</sup>), m2 (t(23) = 7.10, p = 3 × 10<sup>−7</sup>), m3 (t(23) = 8.12, p = 3 × 10<sup>−8</sup>), and m4 (t(23) = 3.87, p = 0.0008) in Experiment 1 (faces), and to the CS+ (t(22) = 5.90, p = 6 × 10<sup>−6</sup>), m1 (t(22) = 4.43, p = 0.0006), m2 (t(22) = 4.50, p = 0.0002), and m3 (t(22) = 2.83, p = 0.01), but not m4 (p = 0.52) in Experiment 2 (blobs). Direct comparisons of the ratings of the two experiments revealed that there was more explicit fear generalization to faces than to blobs (see text).

#### **CONSCIOUS FEAR GENERALIZATION WAS GREATER TO FACES THAN TO NON-FACE CONTROL STIMULI**

Conscious fear generalization (shock likelihood ratings) was greater to the face stimuli, compared to the perceptually matched control stimuli. Although we can only speculate regarding the mechanisms underlying this effect, one possibility is that the holistic, configural-based (vs. feature-based) processing mechanisms relied upon during face perception promote generalization of fear responses across similar-appearing faces. This hypothesis could be explored further in follow-up work in which, in addition to faces, inverted or contrast-reversed faces (which are processed in a feature-based manner) are used as generalization stimuli.

It is important to also note that our interpretation of this finding is somewhat limited by the fact that we used unrecognizable shapes ("blobs") as our control stimuli. Fear generalization may be greater for stimuli that are recognizable members of a known category of objects (Dunsmoor et al., 2013) (i.e., clear category membership may facilitate the extraction of general features of objects), compared to stimuli that are unrecognizable and seemingly arbitrary. Future work using non-face, known objects as control stimuli could further test whether fear generalization to faces differs from that to other objects. However, these experiments would also need to account for disadvantages associated with these types of control stimuli, i.e., they would not be closely matched to the face stimuli in terms of lower level cues.

Another open question is whether the pattern of results seen here would change if faces with emotional expressions, such as fear, were used as stimuli. Dunsmoor et al. conducted several studies in which a morph continuum between a fearful and a neutral face was used to generate the generalization stimuli (Dunsmoor et al., 2009, 2011). In these experiments, the CS+ stimulus was a morph that was at the midpoint of the fear-to-neutral continuum. They found an asymmetric generalization gradient, with the most fear generalization in response to a morph on the "fear side" of the continuum. Given these data and the results of the current study, one question remains: is there fear generalization to faces with fearful expressions (or other biologically prepared stimuli) that are above the discrimination threshold (i.e., to those that can be clearly discriminated from the CS+) due to their intrinsic aversiveness? An alternative possibility is that discriminability among perceptually similar fearful faces is lower than that among perceptually similar neutral faces (perhaps because of the evolutionary importance of defending oneself rapidly from any possible threat), which would lead to greater generalization across fearful faces. These competing explanations could be investigated with the approach used in the current study.

#### **LIMITATIONS**

This study has several limitations. First, we studied only males, in order to minimize heterogeneity in our data in this first study using this paradigm. A similar study in females is currently underway to determine whether the effects seen here differ across genders. Second, the shock likelihood ratings were not collected during the fear generalization procedure but immediately afterwards. This was done in order to avoid suppression of fear responses by evaluative processes (Lange et al., 2003; Taylor et al., 2003), but this aspect of our design may have affected our results. However, because previous studies that used on-line shock likelihood or fear ratings found qualitatively similar results (i.e., more apparent fear generalization in ratings than in physiological measures) (Lissek et al., 2008; Haddad et al., 2013), this seems unlikely to have had a large effect. Third, our findings could have been influenced by the fact that subjects viewed face or blob stimuli during the discrimination task, before undergoing Pavlovian fear conditioning and generalization procedures with some of the same stimuli. This raises the possibility that other types of learning processes, such as latent inhibition (the inhibitory effect of stimulus pre-exposure on fear conditioning and generalization (Vervliet et al., 2010a)) occurred. However, a latent inhibition effect would have led to a reduction in the level of differential fear conditioning achieved. Given that differential fear conditioning was robust in both experiments, and fear generalization magnitudes correlated with the amount of fear conditioning, this effect was likely small or insignificant.

#### **FUTURE STUDIES AND CLINICAL IMPLICATIONS**

The development of quantitative measures of perceptual and emotional processes and their interactions is needed for several reasons. After validating such measures, the mechanisms governing these processes can be explored further, by varying the experimental design and measuring additional outcomes, including the underlying brain mechanisms. Also, although some degree of fear generalization is adaptive, excessive generalization of fear or other types of emotional responses may lead to inappropriate behaviors and responses during social interactions, giving rise, in some cases, to psychopathological states. For example, fear generalization has been shown to be excessive in anxiety disorders (Lissek et al., 2010, 2013b; Greenberg et al., 2013b; Kaczkurkin and Lissek, 2013; Cha et al., 2014b). Thus, a quantitative index of abnormal fear generalization may serve as an intermediate phenotype for these disorders, which can serve as a target of treatment and early intervention. Abnormal fear processes have been demonstrated in depression (Nissen et al., 2010) and schizophrenia (Jensen et al., 2008; Holt et al., 2009; Romaniuk et al., 2010) as well. In light of the evidence for abnormalities in neural systems that span diagnostic categories in psychiatry (Insel et al., 2010), the study of fear-related processes in patients with a wide range of symptom types may clarify the degree to which patients with distinct primary diagnoses share a common vulnerability to negative affect and the experience of inappropriate fear.

response; ER = explicit ratings.

## **ACKNOWLEDGMENTS**

This work was supported by the National Institute of Mental Health (Daphne J. Holt: RO1MH095904) and the National Eye Institute (Roger B. H. Tootell: R01EY017081).

### **REFERENCES**


research on mental disorders. *Am. J. Psychiatry* 167, 748–751. doi: 10.1176/appi.ajp.2010.09091379


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 26 July 2014; published online: 05 September 2014*. *Citation: Holt DJ, Boeke EA, Wolthusen RPF, Nasr S, Milad MR and Tootell RBH (2014) A parametric study of fear generalization to faces and non-face objects: relationship to discrimination thresholds. Front. Hum. Neurosci. 8:624. doi: 10.3389/fnhum.2014.00624*

*This article was submitted to the journal Frontiers in Human Neuroscience*. *Copyright © 2014 Holt, Boeke, Wolthusen, Nasr, Milad and Tootell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Novel method to classify hemodynamic response obtained using multi-channel fNIRS measurements into two groups: exploring the combinations of channels

## *Hiroko Ichikawa<sup>1,2,3</sup>\*, Jun Kitazono<sup>4</sup>, Kenji Nagata<sup>4</sup>, Akira Manda<sup>4</sup>, Keiichi Shimamura<sup>5,6</sup>, Ryoichi Sakuta<sup>5,6</sup>, Masato Okada<sup>4,7</sup>, Masami K. Yamaguchi<sup>1,2</sup>, So Kanazawa<sup>8</sup> and Ryusuke Kakigi<sup>9</sup>*

*<sup>1</sup> Department of Psychology, Chuo University, Tokyo, Japan*

*<sup>2</sup> Research and Development Initiative, Chuo University, Tokyo, Japan*

*<sup>3</sup> Japan Society for the Promotion of Sciences, Tokyo, Japan*

*<sup>4</sup> Department of Complexity Science and Engineering, The University of Tokyo, Kashiwa, Japan*

*<sup>5</sup> Department of Pediatrics, Dokkyo Medical University Koshigaya Hospital, Koshigaya, Japan*

*<sup>6</sup> Center for Child Development and Psychosomatic Medicine, Dokkyo Medical University Koshigaya Hospital, Koshigaya, Japan*

*<sup>7</sup> RIKEN Brain Science Institute, Wako, Japan*

*<sup>8</sup> Department of Psychology, Japan Women's University, Kawasaki, Japan*

*<sup>9</sup> Department of Integrative Physiology, National Institute for Physiological Sciences, Okazaki, Japan*

#### *Edited by:*

*Aina Puce, Indiana University, USA*

#### *Reviewed by:*

*Giancarlo Zito, 'S. Giovanni Calibita' Fatebefratelli Hospital, Italy Yosuke Kita, National Center of Neurology and Psychiatry, Japan*

#### *\*Correspondence:*

*Hiroko Ichikawa, Department of Psychology, Chuo University, 742-1, Higashi-Nakano, Hachioji, Tokyo 192-0393, Japan e-mail: ichihiro@tamacc.chuo-u.ac.jp*

Near-infrared spectroscopy (NIRS) in psychiatric studies has widely demonstrated that cerebral hemodynamics differs among psychiatric patients. Recently we found that children with attention-deficit/hyperactivity disorder (ADHD) and children with autism spectrum disorders (ASD) showed different hemodynamic responses to their own mother's face. Based on this finding, we may be able to classify the hemodynamic data into those two groups and predict to which diagnostic group an unknown participant belongs. In the present study, we proposed a novel statistical method for classifying the hemodynamic data of these two groups. By applying a support vector machine (SVM), we searched for the combination of measurement channels at which the hemodynamic response differed between the ADHD and the ASD children. The SVM found the optimal subset of channels in each data set and successfully classified the ADHD data from the ASD data. For the 24-dimensional hemodynamic data, two optimal subsets classified the hemodynamic data with 84% classification accuracy, while the subset containing all 24 channels classified with 62% classification accuracy. These results indicate the potential application of our novel method for classifying the hemodynamic data into two groups and revealing the combinations of channels that efficiently differentiate the two groups.

**Keywords: hemodynamic data, near-infrared spectroscopy (NIRS), support vector machine (SVM), sparse modeling, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD)**

## **INTRODUCTION**

Near-infrared spectroscopy (NIRS) has been utilized to measure brain activity in humans (for review, Ferrari and Quaresima, 2012). Because NIRS is non-invasive and requires less stabilization of participants than other neuroimaging techniques, NIRS is highly suitable for studies with infants, children (for review, Lloyd-Fox et al., 2010; Gervain et al., 2011) and patients with psychiatric symptoms or disorders such as schizophrenia, depression, anxiety disorder, and attention-deficit/hyperactivity disorder (ADHD) (for review, Fukuda, 2009; Ernst et al., 2012). ADHD is characterized by major symptoms of hyperactivity, impulsivity and inattention (American Psychiatric Association, 2000). Children with ADHD (ADHD children) show atypical hemodynamic responses in the prefrontal region associated with attention and working memory deficits (Weber et al., 2005; Negoro et al., 2010; Monden et al., 2012, but see also Schecklmann et al., 2010).

The neural response of ADHD children in face processing is different from that of typically developing children (TD children). Tye et al. (2013) used ERP techniques to examine the face-inversion effect and gaze processing in ADHD children, children with ASD (ASD children), children with comorbid ASD+ADHD, and TD children. They found that children with ADHD (ADHD/ADHD+ASD) showed atypical responses that reflect an early attentional stage of face processing, while children with ASD (ASD/ASD+ADHD) showed atypical responses in gaze processing and atypical neural specialization, which are likely to be more relevant to the characteristic social deficits of autism. As far as we know, Tye et al. (2013) was the first study to find that basic face processing in ADHD children differs from that of ASD children, and no previous study has investigated familiar face processing in children with ADHD. Dawson et al. (2002) demonstrated that ASD children showed no differential ERPs when viewing their mother's face and viewing an unfamiliar female face, while TD children did. Although the hemodynamic response to one's mother's face has not been tested with ADHD children, we may suppose that an atypical response might also be observed in ADHD children.

Recently our group found that boys with ADHD showed a different hemodynamic response to their own mother's face than typically developing boys (Shimamura et al., 2012, under review). Thorell et al. (2012) tested non-clinical 8.5-year-old children and reported that attachment disorganization and executive functioning were independently related to ADHD symptoms. Furthermore, Carlsson et al. (2008) investigated the endogenous and exogenous factors that predict inattentiveness and hyperactivity in middle childhood and demonstrated that the quality of caregiving predicted inattention and hyperactivity more powerfully than did early biological or temperamental factors. The quality of the relationship between the child and the caregiver might increase or reduce the amount of communication between the two, even in TD children, which may affect the development of the child's neural basis for processing his/her mother's face. (Incidentally, all adults who participated as "caregivers" in Shimamura et al.'s study were the mothers of the children.)

Furthermore, we found that boys with ADHD showed a different hemodynamic response to their own mother's face than boys with ASD (Shimamura et al., 2012, under review). Although ASD is characterized by difficulties with social interaction, restricted interests, and repetitive behaviors (American Psychiatric Association, 2000), there is a large overlap of symptoms in ADHD and ASD patients, such as hyperactivity, restlessness, and impairments in social cognitive abilities (Yerys et al., 2009; Taurines et al., 2012; van der Meer et al., 2012). In actual clinical practice it is often difficult to distinguish between these patients because of their overlapping symptoms (Yoshida and Uchiyama, 2004). Using fNIRS, we presented boys of both groups with images of their mother's face and measured their cerebral hemodynamics in the bilateral temporal area. Only children with ADHD showed a significantly greater concentration of oxyhemoglobin (oxy-Hb) in the bilateral temporal area than the pre-task baseline; children with ASD showed a decrease of oxy-Hb concentration in the left temporal area.

These findings suggest the possibility of distinguishing the cerebral hemodynamic data of the ADHD participants from those of the ASD participants and the possibility of classifying the data into two separate diagnostic groups. With such classification, we might be able to predict to which diagnostic group an unknown participant belongs by analyzing his/her hemodynamic data.

To classify the hemodynamic data into two groups, a promising method involves the Support Vector Machine (SVM; Vapnik, 1982). The SVM is a multivariate method for binary classification and has recently been introduced to NIRS studies aimed at the development of BCI (Sitaram et al., 2007; Cui et al., 2010).

When using multivariate pattern analysis methods such as SVM, we can improve classification accuracy by selecting informative variables and eliminating uninformative ones (Weston et al., 2001; Guyon and Elisseeff, 2003; Liu and Yu, 2005). This process is called feature selection or variable selection and has been increasingly applied in recent BCI studies (Yamashita et al., 2008; Gottemukkula and Derakhshani, 2011). A number of feature selection methods have been proposed in past studies (Guyon and Elisseeff, 2003; Liu and Yu, 2005). These methods, however, do not necessarily select the subset of variables that gives the best classification accuracy because these methods are intended to be applied to high-dimensional data such as genetic data, and are designed to decrease computation time by using approximate treatment. Here, without approximate treatment, we exhaustively evaluated the classification accuracy of all 2<sup>24</sup> − 1 = 16,777,215 subsets of channels, using 5-fold cross validation, to find the best one.

In this study, we applied SVM to real hemodynamic data obtained in Shimamura et al. (2012, under review) and tried to classify the data into two groups: ADHD participants and ASD participants. We exhaustively searched for the optimal subset among all subsets of channels and evaluated the classification accuracy using 5-fold cross validation. To compare the effectiveness of SVM with standard methods, we also applied two feature selection methods, Lasso (Tibshirani, 1996) and sparse logistic regression (SLR) (Yamashita et al., 2008), and a channel-wise *t*-test, which is one of the most popular statistical analyses for finding an activated channel in NIRS measurement.

## **METHOD AND ANALYSIS**

## **PARTICIPANTS**

The participants in this study were nine boys with ADHD (three ADHD-only and six with comorbid ADHD+ASD; mean age = 9y9m, *SD* = 1y7m) and eight boys with ASD (two ASD-only and six with comorbid ASD+ADHD; mean age = 9y9m, *SD* = 1y4m). Six ADHD boys received methylphenidate, and one ADHD boy received atomoxetine. The mean score on the ADHD-Rating Scale (ADHD-RS) was 34.2 (range = 11–52; *SD* = 13.9) in the ADHD group and 26.6 (range = 16–38; *SD* = 7.6) in the ASD group. A two-tailed two-sample *t*-test did not show a significant difference in their ADHD-RS scores [*t*(15) = 1.375, *ns*]. The two diagnostic groups did not differ in age or sex. All diagnoses were based on the DSM-IV-TR (American Psychiatric Association, 2000) and were made by a pediatric neurologist (Ryoichi Sakuta).
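As a plausibility check, the reported *t* value can be recomputed approximately from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) two-sample *t*-test, which the text does not specify, and it uses the rounded means and *SD*s given above, so only approximate agreement with the reported value can be expected. The function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and degrees of freedom from summary statistics.

    Assumes equal variances (pooled estimate). This is an assumption,
    not something stated in the text.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# ADHD-RS summary values as reported above (rounded in the source).
t_value, df = pooled_t(34.2, 13.9, 9, 26.6, 7.6, 8)
print(f"t({df}) = {t_value:.3f}")
```

Under these assumptions the statistic comes out at about 1.37 with 15 degrees of freedom, close to the reported *t*(15) = 1.375; the small gap is consistent with rounding of the published means and *SD*s.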

## **PROTOCOL**

During the measurement, participants observed the visual stimuli presented on the monitor. A single trial comprised a baseline period and a test period. Each trial started with a baseline period during which the participants fixated on a black dot displayed on the monitor at the rate of 1 Hz. The duration of the baseline period was at least 20 s. Following this, a test period began. During the test period, the image of the child's mother's face or that of an unknown female face was presented. Either the mother's face or the unknown female face appeared successively 10 times at the rate of 1 Hz. The duration of the test period was 10 s. The mother's face and the unknown face were presented alternately. Each participant performed five trials. In this study, we analyzed the hemodynamic data only from when the boys were passively looking at their mother's face.

The study was approved by the Ethical Committee of the Dokkyo Medical University Koshigaya Hospital and by the Ethical Committee of Chuo University. Written informed consent was obtained from the participants and their parent. The experiments were conducted according to the Declaration of Helsinki.

#### **FUNCTIONAL NIRS RECORDING**

We used a Hitachi ETG-4000 system (Hitachi Medical, Chiba, Japan), which can record from 24 channels simultaneously, with 12 channels for the right temporal area and 12 for the left. This instrument generates two wavelengths of NIR (695 and 830 nm) and measures the time courses of the levels of oxyhemoglobin (oxy-Hb), deoxyhemoglobin (deoxy-Hb), and their sum (total hemoglobin: total-Hb) with 0.1 s time resolution. Based on previous studies showing that the oxy-Hb change reflects task-related neural activity more reliably than deoxy-Hb or total-Hb (Strangman et al., 2003; Cui et al., 2011), we analyzed only the oxy-Hb concentration.

The probes were set on the child's scalp at the bilateral temporal area centered at T5 and T6 according to the International 10–20 system (Jasper, 1958) (**Figure 1**). When the probes were positioned, the experimenter checked that the fibers were touching the child's scalp correctly. The Hitachi ETG-4000 system automatically detects whether the contact is adequate to measure the emerging photons for each channel. All the trials were rejected from the analysis if adequate contact between the fibers and the child's scalp could not be achieved because of hair interference.

### **PREPROCESSING OF DATA**

The raw data on oxy-Hb concentrations from each channel were digitally band-pass-filtered at 0.02–1.0 Hz to remove any noise due to heartbeat pulsations and any longitudinal signal drift.
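The filter design used for this band-pass step is not stated. As a minimal sketch of the idea only, the example below keeps the discrete Fourier components between 0.02 and 1.0 Hz and discards the rest. This brick-wall mask is an assumed stand-in, not the filter of the recording software; the function name `bandpass_dft` and the synthetic signal are hypothetical.

```python
import cmath
import math


def bandpass_dft(signal, fs_hz, low_hz, high_hz):
    """Keep only DFT components whose frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    # Forward DFT, O(n^2); adequate for a short illustrative series.
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same absolute frequency.
        freq_hz = min(k, n - k) * fs_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0j
    # Inverse DFT; the imaginary part is numerical noise for real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


fs_hz = 10.0  # 0.1 s time resolution, as stated for the recording
n = 200       # 20 s of synthetic data
slow = [math.sin(2 * math.pi * 0.1 * t / fs_hz) for t in range(n)]        # in band
fast = [0.5 * math.sin(2 * math.pi * 3.0 * t / fs_hz) for t in range(n)]  # out of band
raw = [5.0 + s + f for s, f in zip(slow, fast)]  # constant offset plus both sinusoids
filtered = bandpass_dft(raw, fs_hz, 0.02, 1.0)
```

In this synthetic case the constant offset and the 3 Hz component are removed and the 0.1 Hz component is retained. A practical pipeline would more likely use a smooth filter (for example, a Butterworth design) to avoid ringing at the band edges.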

Although the raw NIRS data were originally relative values, and could not be compared directly across subjects or channels, the normalized data such as the Z-score could be averaged regardless of the unit (Schroeter et al., 2003; Shimada and Hiraki, 2006). This calculation of the Z-score is a reliable analysis for changes in concentration in the children's brains since the analysis is independent of the differential path length factor (DPF).

In order to show the relative change of oxy-Hb concentration during the task period, we standardized the raw data into Z-scores based on the baseline period preceding the task period. We calculated the Z-scores of oxy-Hb as a time series with 0.1 s time resolution from 3 s before the test period onset to the test period offset (Ichikawa et al., 2010; Kobayashi et al., 2011, 2012). The Z-score at each time point indicates the deviation of the hemodynamic response during the presentation of faces from the "baseline." The "baseline" for calculating the Z-score was a period of 3 s immediately before the beginning of each test period, which reflects the activation during the observation of the blank and fixation points. The Z-scores were calculated using the following formula:

$$d = (x_{test} - m_{baseline}) / s \tag{1}$$

*x<sub>test</sub>* represents the raw data [mM mm] at each time point during the test period and *m<sub>baseline</sub>* represents the mean of the raw data during the baseline period. *s* represents the *SD* of the raw data during the baseline period.
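Equation (1) can be sketched in a few lines. The example assumes the sample *SD* (denominator *n* − 1) for *s*, since the text does not say which estimator was used; the function name `baseline_zscores` and the numbers are illustrative only.

```python
import statistics


def baseline_zscores(test_values, baseline_values):
    """Standardize test-period samples against the baseline mean and SD."""
    m_baseline = statistics.mean(baseline_values)
    s = statistics.stdev(baseline_values)  # sample SD (n - 1): an assumption
    return [(x_test - m_baseline) / s for x_test in test_values]


# Hypothetical raw values: a short baseline and two test-period samples.
baseline = [1.0, 2.0, 3.0]
test = [2.0, 4.0]
z = baseline_zscores(test, baseline)
```

Because both the numerator and *s* carry the same unit, the resulting score is unitless, which is what allows averaging across channels and participants.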

**FIGURE 1 |** ... test period was fixed for 10 s. The baseline period and test period were ... between the fibers was set at 3 cm.

Consistent with a previous study (Boynton et al., 1996) and our previous studies using NIRS (Ichikawa et al., 2010; Nakato et al., 2011), we found that the response peak lags a few seconds behind stimulus onset. Therefore, we performed the following analyses on the mean Z-scores from 3 to 10 s after the face stimulus onset.
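The averaging window can be illustrated as follows. This sketch assumes that the first element of the series is the sample at stimulus onset and that the window is half-open, [3 s, 10 s); neither detail is specified in the text. The name `mean_z_in_window` is hypothetical.

```python
def mean_z_in_window(z_series, dt_s=0.1, start_s=3.0, end_s=10.0):
    """Mean Z-score over [start_s, end_s) after stimulus onset.

    z_series[0] is assumed to be the sample at stimulus onset.
    """
    first = round(start_s / dt_s)
    last = round(end_s / dt_s)  # exclusive upper index
    window = z_series[first:last]
    return sum(window) / len(window)


# Synthetic 10 s series sampled every 0.1 s (100 samples).
z_series = [float(i) for i in range(100)]
feature = mean_z_in_window(z_series)
```

One such mean per channel and trial gives the feature vector that is passed to the classifier.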

The Z-scores were calculated separately for each trial. We excluded from further analysis any trial in which not all 24 channels were completely recorded because of hair interference or motion artifact. Finally, we obtained 50 data: 25 from 6 participants of the ADHD group and 25 from 8 participants of the ASD group. The mean number of valid trials was 4.3 (*SD* = 2.7) per child with ADHD and 3.1 (*SD* = 1.3) per child with ASD.

#### **SUPPORT VECTOR MACHINE (SVM)**

We trained the SVM to discriminate the data according to the diagnostic group of the participant from whom they were obtained. We used the mean Z-scores of hemodynamic activities and the diagnostic group of participants as inputs and outputs of the SVM, respectively.

SVMs are state-of-the-art models for classification with a high generalization capability (Vapnik, 1982; Bennett and Mangasarian, 1992; Cortes and Vapnik, 1995). Given input data, an SVM classifies them into two classes. An SVM learns the relationship between the input data and their classes from the training samples, and then predicts the class of unknown data. We write the training data set as

$$\left\{ (\mathbf{x}_i, t_i) \mid \mathbf{x}_i \in \mathbb{R}^D, \ t_i \in \{+1, -1\} \right\}_{i=1}^N, \tag{2}$$

where *x<sub>i</sub>* is a *D*-dimensional feature vector, and *t<sub>i</sub>* is the class label of *x<sub>i</sub>*. In this study, the mean Z-scores were calculated for 24 channels (or 12 for hemispheric analysis) for each trial, and were used as a 24-dimensional (or 12-dimensional) feature vector. We had 50 valid trial data sets and thus 50 feature vectors as input to the SVM. We assigned *t<sub>i</sub>* = +1 to ADHD participants and *t<sub>i</sub>* = −1 to ASD participants.

Given this data set, an SVM finds a hyperplane in the feature vector space that separates the samples into two groups. The obtained hyperplane is called a decision boundary. New samples are then classified according to which side of the boundary they belong. Classifying samples using a decision boundary is a common practice for other linear classifiers such as logistic regression. What characterizes SVMs is margin maximizing. The margin is the distance between the decision boundary and the closest sample to the decision boundary. By maximizing the margin, SVMs are capable of accurately predicting the classes of new samples. The decision boundary is expressed as a linear equation as follows:

$$y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b = 0 \tag{3}$$

Here *w* is a weight vector and *b* is a bias term. We want to find *w* and *b* that satisfy *y*(*x<sub>i</sub>*) > 0 for *t<sub>i</sub>* = 1 and *y*(*x<sub>i</sub>*) < 0 for *t<sub>i</sub>* = −1, that is, *t<sub>i</sub>y*(*x<sub>i</sub>*) > 0 for all samples (Vapnik, 1982). A larger absolute value of a component of *w* indicates that the corresponding component of *x* contributes more to classifying the data into two groups. A positive sign of a component of *w* indicates that a larger value of the corresponding component of *x* raises the possibility of *t<sub>i</sub>* = 1, while a negative sign indicates that a larger value of the corresponding component of *x* raises the possibility of *t<sub>i</sub>* = −1.
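For a given weight vector and bias, the decision rule and the margin described in this section can be sketched as below. The weights, bias, and samples are hypothetical values chosen for illustration, and the training step that maximizes the margin is not shown.

```python
import math


def decision_value(w, b, x):
    """y(x) = w^T x + b, the signed score relative to the decision boundary."""
    return sum(w_d * x_d for w_d, x_d in zip(w, x)) + b


def predict(w, b, x):
    """Class label from the sign of y(x); y(x) = 0 is assigned to -1 by convention."""
    return 1 if decision_value(w, b, x) > 0 else -1


def margin(w, b, samples):
    """Distance from the decision boundary to the closest sample."""
    norm = math.sqrt(sum(w_d * w_d for w_d in w))
    return min(abs(decision_value(w, b, x)) for x in samples) / norm


w, b = [3.0, -4.0], 1.0             # hypothetical weights and bias
samples = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical two-channel feature vectors
labels = [predict(w, b, x) for x in samples]
closest_distance = margin(w, b, samples)
```

Here the first feature has a positive weight, so larger values push a sample toward the class +1, while the second feature has a negative weight and pushes toward −1.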

#### **CROSS VALIDATION**

Cross validation (CV) is a method for evaluating how well a learning machine such as an SVM generalizes to unknown data that are not used in training (Kohavi, 1995). In CV, the data set is divided into two parts. One part is used for training the machine, and the other part is used for testing the machine's ability. This training and testing procedure is repeated using different partitionings. CV is effective when the number of data is limited.

We explain the *K*-fold CV for the SVM below. First, we divide the data set into *K* parts *C*<sub>1</sub>, ..., *C<sub>K</sub>*. For each *k* = 1, ..., *K*, we train the SVM using the data other than the *k*-th part *C<sub>k</sub>*. We denote the resulting decision boundary by *y*<sup>\\*k*</sup>(*x*). Then, by using this boundary *y*<sup>\\*k*</sup>(*x*), we predict the classes of the data in *C<sub>k</sub>*, and compare them against the true class labels *t*. We repeat this procedure for every *k* = 1, ..., *K*, and calculate the following cross validation error (CVE):

$$\text{CVE} = \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in C_k} L(t_i, y^{\backslash k}(\mathbf{x}_i)), \tag{4}$$

$$L(t, y(\mathbf{x})) = \begin{cases} 0 & (t\,y(\mathbf{x}) > 0) \\ 1 & (t\,y(\mathbf{x}) < 0) \end{cases} \tag{5}$$

*L*(*t*, *y*(*x*)) indicates whether the prediction is correct or not. When the prediction is correct, *L*(*t*, *y*(*x*)) = 0, and when the prediction is incorrect, *L*(*t*, *y*(*x*)) = 1. CVE represents the ratio of the number of incorrectly predicted data to the total number of data. If the CVE is small, the generalization capability of the SVM is high. The classification accuracy is defined using the following formula:

$$(1 - \text{CVE}) \times 100\tag{6}$$
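Equations (4)–(6) can be sketched as a short routine. To keep the example self-contained, the trainer is a simple class-mean linear classifier used as a stand-in; it is not an SVM, and any function that returns a decision function *y*(*x*) could be substituted. The interleaved partition into folds and the treatment of *ty*(*x*) = 0 as an error are also assumptions, as are all names and data.

```python
def k_fold_cve(xs, ts, k, train):
    """Cross validation error: fraction of held-out samples misclassified.

    `train(train_xs, train_ts)` must return a decision function y(x).
    """
    n = len(xs)
    folds = [list(range(start, n, k)) for start in range(k)]  # C_1, ..., C_K
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train_xs = [xs[i] for i in range(n) if i not in held]
        train_ts = [ts[i] for i in range(n) if i not in held]
        y = train(train_xs, train_ts)
        # Loss is 1 when t * y(x) is not positive (ties counted as errors).
        errors += sum(1 for i in held_out if ts[i] * y(xs[i]) <= 0)
    return errors / n


def train_centroid(train_xs, train_ts):
    """Stand-in linear classifier (NOT an SVM): boundary halfway between class means."""
    dim = len(train_xs[0])
    pos = [x for x, t in zip(train_xs, train_ts) if t == 1]
    neg = [x for x, t in zip(train_xs, train_ts) if t == -1]
    mu_pos = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mu_neg = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    b = -sum(w_d * (p + q) / 2 for w_d, p, q in zip(w, mu_pos, mu_neg))
    return lambda x: sum(w_d * x_d for w_d, x_d in zip(w, x)) + b


# Hypothetical one-dimensional data: five samples per class, well separated.
xs = [[2.0], [2.1], [2.2], [2.3], [2.4], [-2.0], [-2.1], [-2.2], [-2.3], [-2.4]]
ts = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
cve = k_fold_cve(xs, ts, k=5, train=train_centroid)
accuracy = (1 - cve) * 100  # Equation (6)
```

With several trials per participant, a partition that keeps all trials of one participant in the same fold would give a stricter estimate of generalization to new participants; the text does not state how the folds were formed.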

We used a 5-fold CV. In this study, we used three data sets: (1) *D* = 12 (12 channels placed on the right temporal area), (2) *D* = 12 (12 channels placed on the left temporal area), and (3) *D* = 24 (24 channels placed on the bilateral temporal areas). Sets (1) and (2) were subsets of (3).

We selected a subset *A* of the *D* features and set *x<sub>i</sub><sup>A</sup>* := (*x<sub>id</sub>*)<sub>*d*∈*A*</sub> ∈ ℝ<sup>|*A*|</sup>. We then applied the 5-fold CV to the data set {(*x<sub>i</sub><sup>A</sup>*, *t<sub>i</sub>*)}<sub>*i*=1</sub><sup>*N*</sup> and calculated the CVE. We carried out this process for all 2<sup>*D*</sup> − 1 non-empty subsets.

Our aim was to find the optimal subsets that classify the data into two groups most accurately. In other words, we wanted to find the combination of NIRS measurement channels that respond distinctively to the experimental stimuli depending on the diagnostic group to which the data belong.
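The exhaustive search over channel subsets can be outlined as follows. The scoring function here is a toy stand-in for "run 5-fold CV on the projected data and return the CVE", in which only two hypothetical features are informative; all names are illustrative and the sketch makes no claim about how the original search was implemented.

```python
from itertools import combinations


def all_nonempty_subsets(n_features):
    """Yield every non-empty subset of feature indices, 2**n_features - 1 in total."""
    for size in range(1, n_features + 1):
        yield from combinations(range(n_features), size)


def project(x, subset):
    """Restrict a feature vector to the selected channels."""
    return [x[d] for d in subset]


def exhaustive_search(n_features, evaluate_cve):
    """Evaluate every subset and return (CVE, subset) pairs, lowest CVE first."""
    results = [(evaluate_cve(subset), subset)
               for subset in all_nonempty_subsets(n_features)]
    results.sort(key=lambda item: item[0])
    return results


def toy_cve(subset):
    """Hypothetical score: features 0 and 2 lower the error, extra features cost a little."""
    informative = len({0, 2} & set(subset))
    return 0.5 - 0.2 * informative + 0.01 * len(subset)


ranked = exhaustive_search(4, toy_cve)
best_cve, best_subset = ranked[0]
```

In an actual analysis, `evaluate_cve` would project every feature vector with `project`, train the classifier within each fold, and return the cross validation error for that subset.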

#### **RESULTS**

We used the mean Z-scores of hemodynamic response obtained from each channel and the diagnostic group of participants as inputs and outputs for the SVM, respectively. We trained the SVM to find the optimal subset in the data set of channels in order to classify the data into two diagnostic groups. For each subset, we evaluated its classification accuracy using 5-fold CV.

We conducted the exhaustive search of three data sets of Z-scores of hemodynamic data: (1) the 12-dimension dataset obtained from the right 12 channels, (2) the 12-dimension dataset obtained from the left 12 channels, and (3) the 24-dimension dataset obtained from the bilateral 24 channels. For the 12-dimension datasets [(1) and (2)], SVM training and calculation of the CVE were repeated with 2<sup>12</sup> − 1 = 4095 subsets. For the 24-dimension dataset (3), SVM training and calculation of the CVE were repeated with 2<sup>24</sup> − 1 = 16,777,215 subsets.
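The subset counts follow from simple counting, which the sketch below reproduces in two ways. The estimate of the total number of training runs assumes exactly one SVM fit per fold and per subset, which is an assumption about the procedure rather than a reported figure.

```python
from math import comb


def n_nonempty_subsets(n_channels):
    """Number of non-empty subsets of n_channels channels."""
    return 2 ** n_channels - 1


for d in (12, 24):
    # The same count is obtained by summing over subset sizes 1..d.
    assert n_nonempty_subsets(d) == sum(comb(d, size) for size in range(1, d + 1))

counts = {d: n_nonempty_subsets(d) for d in (12, 24)}

# Assumed cost: 5 fits (one per fold) for each subset in the three data sets.
total_svm_fits = 5 * (2 * counts[12] + counts[24])
print(counts, total_svm_fits)
```

Adding one channel doubles the number of subsets, which is why the exhaustive approach is feasible for 24 channels but would quickly become impractical for much larger probe arrays.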

## **RESULTS OF SVM CLASSIFICATION APPLIED ON THE DATA OBTAINED FROM THE RIGHT HEMISPHERE**

Among the 4095 subsets, we found nine subsets that classified the data into two groups most accurately. The measurement channels that comprised those subsets are listed in **Table 1**. The best classification accuracy, achieved by these subsets, was 70%. **Figure 2A** shows the classification accuracy for all 4095 subsets.

**Figure 2B** shows the feature subsets corresponding to the classification accuracy represented in **Figure 2A**. **Figure 2B** shows that Ch. 15, 16, and 18 repeatedly appear in the subsets represented on the left side, which have relatively higher classification accuracy. These channels are filled with darker color, indicating that they have stronger coefficients than other channels. Furthermore, the color of each of these three channels was consistent across all the subsets. This indicates that the sign of each channel's weight is the same in all the subsets.

Moreover, **Figure 2B** shows that the red colored cells and blue colored cells are separated. The lower-numbered channels (Ch. 13–17) are constantly red and the higher-numbered channels (Ch. 18, 19, and 23) are constantly blue. This suggests that in the right hemisphere the brain areas contributing to classification are separated. On the other hand, the other channels were constantly filled with yellowish color. These channels do not have strong weights and thus do not contribute to the classification in any of the 4095 subsets.

**Figure 3** shows the classification accuracy of the best 50 subsets (**Figure 3A**) and the corresponding feature subsets (**Figure 3B**).

**Table 1 | The measurement channels that comprised the subsets with the best classification accuracy<sup>a</sup>.**


*<sup>a</sup>The numbers indicate the channel numbers. In the right hemisphere, there were nine best subsets. The percentage in parentheses indicates the best classification accuracy for each data set.*

Each cell indicates the weight of each channel in the feature subset. Ch. 15, 16, and 18 appear in most of the subsets. Ch. 15 was in 49 of the 50 subsets, Ch. 16 was in 45 subsets, and Ch. 18 was in 40 subsets. These three channels were more often used in the best 50 subsets and contributed more effectively to the classification. The positions of these channels are illustrated in **Figure 3C**.
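Counting how often each channel appears among the top-ranked subsets is a simple tally. The ranked list below is hypothetical and much shorter than the 50 subsets examined in the text; the function name `channel_frequency` is illustrative.

```python
from collections import Counter


def channel_frequency(ranked_subsets, top_n):
    """Count, per channel, how many of the top_n subsets contain it."""
    counts = Counter()
    for subset in ranked_subsets[:top_n]:
        counts.update(set(subset))  # each channel counted once per subset
    return counts


# Hypothetical subsets ranked by accuracy (best first); channel numbers are illustrative.
ranked_subsets = [(15, 16, 18), (15, 16), (15, 18, 23), (13, 15), (16, 18)]
freq = channel_frequency(ranked_subsets, top_n=4)
most_frequent_channel, n_subsets = freq.most_common(1)[0]
```

A channel that recurs across many near-optimal subsets, with a weight of consistent sign, is a more stable indicator than a channel that appears only in the single best subset.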

The cells corresponding to Ch. 15 and 16 are red, but those corresponding to Ch. 18 are blue. This tendency is consistent through the best 50 subsets. This result indicated that a greater hemodynamic response in Ch. 15 and 16 occurred more often in ADHD participants than in ASD participants, and that a greater response at Ch. 18 was more often found in ASD participants than in ADHD participants.

### **RESULTS OF SVM CLASSIFICATION APPLIED ON THE DATA FROM THE LEFT HEMISPHERE**

The best subset for classifying the data into two groups contained the five channels listed in **Table 1**. This subset had the best classification accuracy of 74%. **Figure 4** shows the classification accuracy for all 4095 subsets (**Figure 4A**) and the corresponding feature subsets (**Figure 4B**). Ch. 5 and 6 constantly appear in the subsets with relatively higher classification accuracy and had greater weight values than the other channels. Furthermore, the color of their cells is consistent through all the subsets. This indicates that the sign of their weights is common to all the subsets.

Compared with the result from the right hemisphere (**Figure 3B**), the red colored cells and blue colored cells are scattered among the channels. This suggests that in the left hemisphere the brain areas contributing to classification are not clearly separated.

**FIGURE 2 | (A)** The classification accuracy for the 4095 subsets consisting of the channels in the right hemisphere. The horizontal axis represents the rank-ordered subsets. The vertical axis indicates the classification accuracy. The highest accuracy is represented on the left side. **(B)** The channel numbers in the subsets represented in **(A)**. Horizontal and vertical axes represent the rank-ordered feature subsets and the channel numbers, respectively. Colored cells indicate that the channel was in the subset and black cells indicate that the channel was not in the subset. The color of each cell indicates the value of the weight corresponding to the Z-score of each channel. The red/yellow spectrum indicates positive values and the blue spectrum indicates negative values.

**Figure 5** shows the classification accuracy of the best 50 subsets (**Figure 5A**) and the corresponding feature subsets (**Figure 5B**). Ch. 6 was most often used in the best 50 subsets; it was in 43 of the 50 subsets. Next, Ch. 3 and 7 were in 32 subsets. It is worth mentioning that Ch. 5 and 6 appear alternately in the top 17 subsets. The positions of these channels are illustrated in **Figure 5C**.

Ch. 3, 5, and 7 are red and only Ch. 6 is blue. This tendency is consistent through the best 50 subsets. This result indicated that at Ch. 3, 5, and 7 ADHD participants showed greater hemodynamic response than ASD participants, and only at Ch. 6 did ASD participants show greater hemodynamic response than ADHD participants.

### **RESULTS OF SVM CLASSIFICATION APPLIED ON THE DATA FROM BILATERAL HEMISPHERES**

Among the 16,777,215 subsets, we found two subsets that classified the data into two groups most accurately. The measurement channels that comprised those subsets are listed in **Table 1**. These subsets had the best classification accuracy of 84%. **Figure 6A** shows the classification accuracy of the subsets with accuracy above 70%. The classification accuracy was higher when using the data set from the bilateral temporal areas than when using data sets from the temporal area of one side. This result indicated that the bilateral temporal areas interacted with each other and that the combined data analysis increased the amount of information available for the classification.

**Figure 6B** shows the feature subsets corresponding to the classification accuracy represented in **Figure 6A**. As in the analysis on the data sets from each hemisphere, Ch. 6 appeared constantly and was colored blue.

**Figure 7** shows the classification accuracy of the best 50 subsets (**Figure 7A**) and the corresponding feature subsets (**Figure 7B**). Ch. 6 was in all of the 50 subsets and was thus the channel most often used in the best 50 subsets. Ch. 3 and 14 were in 49 subsets. The positions of these channels are illustrated in **Figure 7C**.

Ch. 3, 5, and 14 are red and only Ch. 6 is blue. This tendency is consistent through the top 50 subsets. This result indicated that the greater hemodynamic response in Ch. 3, 5, and 14 more often occurred in ADHD participants than in ASD participants, and that those at Ch. 6 are more often found in ASD participants than in ADHD participants.

### **LASSO AND SPARSE LOGISTIC REGRESSION**

To demonstrate the effectiveness of the exhaustive search, we compared the exhaustive search with two existing methods, least absolute shrinkage and selection operator (LASSO, Tibshirani, 1996) and SLR (Yamashita et al., 2008). These are classification methods that incorporate feature selection as a part of the process of classifier training. We applied LASSO and SLR to datasets (1)–(3), and evaluated classification accuracies by using 5-fold CV.

By using LASSO, the classification accuracies were (1) 57.5% and (2) 66%. These accuracies rank (1) 1106th (in the top 27.0%) and (2) 147th (in the top 3.59%) of the 4095 accuracies obtained by the exhaustive search. The classification accuracy in dataset (3) was 70%, and this ranks 139,815th (in the top 0.833%) of the 16 million accuracies obtained by the exhaustive search.

By using SLR, the classification accuracies were (1) 52% and (2) 64%. These accuracies rank (1) 1978th (in the top 48.3%) and (2) 372nd (in the top 9.08%) of the 4095 accuracies obtained by the exhaustive search. The classification accuracy in dataset (3) was 66%, and this ranks 698,955th (in the top 4.17%) of the 16 million accuracies obtained by the exhaustive search.
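The "top x%" figures quoted for LASSO and SLR are the rank divided by the number of subsets evaluated. The short check below reproduces them from the reported ranks; how ties in accuracy were ranked is not stated, so the ranks are taken as given. The dictionary keys and function name are illustrative.

```python
def top_percent(rank, total):
    """Position of a rank within `total` evaluated subsets, as a percentage."""
    return 100.0 * rank / total


n_12 = 2 ** 12 - 1  # subsets per hemisphere
n_24 = 2 ** 24 - 1  # subsets for the bilateral data set

# Ranks as reported in the text for datasets (1), (2), and (3).
reported_ranks = {
    "LASSO (1)": (1106, n_12),
    "LASSO (2)": (147, n_12),
    "LASSO (3)": (139815, n_24),
    "SLR (1)": (1978, n_12),
    "SLR (2)": (372, n_12),
    "SLR (3)": (698955, n_24),
}
percentages = {name: top_percent(rank, total)
               for name, (rank, total) in reported_ranks.items()}
for name, value in percentages.items():
    print(f"{name}: top {value:.3g}%")
```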

**FIGURE 7 | (A)** ... subsets consisting of the channels in the bilateral hemispheres. **(B)** The channel numbers in the subsets represented in **(A)**. **(C)** The position of the channels referred to in the text.

These results showed that, in datasets (2) and (3), the classification accuracies using LASSO and SLR were high and were located near the top of all the accuracies obtained by the exhaustive search. In comparison, in dataset (1) the classification accuracies were low compared to those in datasets (2) and (3), and were in the middle of all the accuracies obtained by the exhaustive search. These findings highlight that, depending on the dataset, the classification accuracy obtained with LASSO and SLR is not always high compared to the best result obtained by the exhaustive search. This would not have been revealed without performing the exhaustive search. This means that neither LASSO nor SLR could uncover the latent structures relevant to discrimination between ASD and ADHD in the high-dimensional fNIRS data.

#### **CHANNEL-WISE ANALYSIS OF 24 CHANNELS**

With the aim of comparing the effectiveness of SVM with a standard method, we also performed a channel-wise *t*-test on each channel. A two-tailed two-sample *t*-test was conducted for the difference of Z-scores between the ADHD participants and ASD participants during the 3–10 s of the test trials. For all 24 channels, a *t*-test was performed. To reduce the risk of a Type I error, we performed the corrections using the false discovery rate (FDR) (Singh and Dan, 2006).

We did not find any significant difference in the mean Z-scores between ADHD participants and ASD participants. To compare the hemodynamic change with the baseline activation, we conducted a two-tailed one-sample *t*-test against the baseline. The *t*-test was repeated for each of 24 channels by applying the FDR procedure (Singh and Dan, 2006). However, we did not find any significant hemodynamic change from the baseline either in ADHD participants or ASD participants.
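The FDR correction applied across the 24 channel-wise tests can be illustrated with the Benjamini-Hochberg step-up procedure. Both the choice of this particular FDR procedure and the level *q* = 0.05 are assumptions, since the text names only "the FDR procedure"; the *p*-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is under its threshold.
        if p_values[i] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for i in order[:largest_k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for eight channels.
p_values = [0.041, 0.001, 0.205, 0.008, 0.039, 0.06, 0.074, 0.042]
significant = benjamini_hochberg(p_values)
```

In this toy example several uncorrected *p*-values fall below 0.05, but only the two smallest survive the correction, which illustrates how channel-wise testing loses sensitivity as the number of channels grows.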

## **DISCUSSION**

In this study, we applied SVM to real hemodynamic data obtained in Shimamura et al. (2012, under review) and tried to classify the data into two groups: ADHD participants and ASD participants. We exhaustively searched the optimal subset of measurement channels and evaluated each classification accuracy using 5-fold CV. We showed that the classification accuracy when using the best subset of channels was 84%, while that using all 24 channels was 62%. Additionally, we applied two feature selection methods, LASSO (Tibshirani, 1996) and SLR (Yamashita et al., 2008), which are intended to be applied to high-dimensional data, and confirmed that the best subset of channels was not selected when using these methods. Furthermore, the channel-wise analysis did not find any significant channels that showed distinctive activation for groups.

In the right hemisphere, SVM classification indicated that the greater weight values were consistently found in three channels: Ch. 15, 16, and 18. These channels contribute to the classification in opposite directions because their weights have opposite signs in constructing the decision boundary. The signs of the weights corresponding to Ch. 15 and 16 were positive, while that corresponding to Ch. 18 was negative. This result indicates that hemodynamic data obtained from Ch. 15 and 16 might increase for ADHD children and those obtained from Ch. 18 might decrease. To classify the input data into an ADHD group or an ASD group, we can focus on only the activity of these three channels, rather than all the measurement channels. (Using only these channels, we can classify the hemodynamic data into an ADHD group and an ASD group with 68% classification accuracy.) Ch. 15 and 16 correspond to the temporal-parietal junction (TPJ). The TPJ is an area responsive to theory of mind (ToM) (Saxe and Kanwisher, 2003) and is involved in familiar face recognition (Gobbini et al., 2004; Gobbini and Haxby, 2007). Although the neural response to a maternal face has not been investigated with school-aged children, a previous study with adults (Ramasubbu et al., 2007) demonstrated that a maternal face evoked a stronger and broader hemodynamic activation than did unfamiliar faces. On the other hand, Bartels and Zeki (2004) demonstrated that maternal attachment and romantic love commonly activated the brain's reward system, yet deactivated regions associated with negative emotions, social judgment, and ToM. Based on these previous studies, we can assume that the increased hemodynamic response in the TPJ of ADHD participants might be related to the atypical attachment of ADHD children to their mother (Shimamura et al., 2012).

In the left hemisphere, though no channels appeared consistently through the best 50 subsets, some channels appeared alternately. In the best 17 subsets Ch. 3, 5, 7 appeared, while in the 18th to 28th subsets only Ch. 5 appeared dominantly. Below the 29th subsets, Ch. 6 appeared again and various unstable channels appeared alternately. These channels behave like small "patches." These small-scale activities by some patches reflect the sparse expression of the information in/from the left hemisphere (Kitazono, 2013).

The imbalance between the hemispheres in the channels that contribute to the classification might reflect different stages of face processing. Previous studies demonstrated that the left hemisphere predominates when faces are processed featurally, whereas the right hemisphere predominates when faces are processed configurally (Koenig and Hillger, 1991; Rossion et al., 2000; Scott and Nelson, 2006). Based on these findings, we can suppose that the greater number of channels in the left hemisphere might input raw (lower level) information and process featural properties such as eyes and mouth independently, whereas the lesser number of channels in the right hemisphere might input the pre-processed information and process the configuration of faces.

The best classification accuracy, 84%, was obtained by the SVM classification on the bilateral data set of 24 channels. We had 50 data, and 42 of those 50 were correctly classified into the diagnostic group to which the participants belonged. Let us compare the results of the SVM classification in each hemisphere with those in the bilateral hemispheres. Three channels (Ch. 15, 16, and 18) are important in the SVM classification in the right hemisphere, while four channels (Ch. 3, 5, 6, and 7) are important in the left hemisphere. Four channels (Ch. 3, 5, 6, and 14) are also important in the SVM classification in the bilateral hemispheres. Only three (Ch. 3, 5, and 6) of the seven channels (Ch. 3, 5, 6, 7, 15, 16, and 18) are common to the single-hemisphere and bilateral analyses. Moreover, one channel (Ch. 14) is important in the bilateral analysis but not in the right-hemisphere analysis. Thus, the channels used in the optimal subset in the bilateral analysis were not consistent with those used in the optimal subsets for each hemisphere. It would be logical that different input results in different output (Kitazono, 2013).

In this study, we aimed to classify the hemodynamic data into two distinct participant groups: ADHD participants and ASD participants. We exhaustively searched for the optimal subset of fNIRS measurement channels and evaluated the classification accuracy using 5-fold CV. We successfully found the optimal subset for the classification of the real hemodynamic data with 84% accuracy. We conclude that SVM combined with exhaustive search provides an effective method for classifying hemodynamic data obtained from multichannel NIRS measurements.
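The search strategy summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: a simple nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the data are synthetic, and all function names are hypothetical. Note that if all 24 channels enter the search, there are 2^24 − 1 (about 16.8 million) non-empty subsets, so the toy example uses only four channels.

```python
import itertools
import random
import statistics


def nearest_centroid_correct(X, y, channels, train_idx, test_idx):
    """Fit class centroids on the training rows; count correct test predictions."""
    centroids = {}
    for label in set(y[i] for i in train_idx):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [statistics.fmean(r[c] for r in rows) for c in channels]
    correct = 0
    for i in test_idx:
        point = [X[i][c] for c in channels]
        predicted = min(
            centroids,
            key=lambda lab: sum((p - q) ** 2 for p, q in zip(point, centroids[lab])),
        )
        correct += predicted == y[i]
    return correct


def cross_validated_accuracy(X, y, channels, k=5, seed=0):
    """k-fold cross-validated accuracy for one channel subset."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        correct += nearest_centroid_correct(X, y, channels, train, fold)
    return correct / len(X)


def exhaustive_search(X, y, n_channels, k=5):
    """Score every non-empty channel subset and return them best first."""
    results = []
    for size in range(1, n_channels + 1):
        for subset in itertools.combinations(range(n_channels), size):
            results.append((cross_validated_accuracy(X, y, subset, k), subset))
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


# Synthetic data (assumption): 50 participants, 4 channels, only channel 2 informative.
rng = random.Random(1)
y = [0] * 25 + [1] * 25
X = [[rng.gauss(0, 1) + (3.0 * label if ch == 2 else 0.0) for ch in range(4)]
     for label in y]
ranking = exhaustive_search(X, y, n_channels=4)
best_accuracy, best_subset = ranking[0]
```

Ranking every subset, rather than keeping only the winner, makes it possible to inspect which channels recur among the best-scoring subsets, as was done for the best 50 subsets above.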

## **ACKNOWLEDGMENTS**

This study was supported by Grant-in-Aid for Scientific Research on Innovative Areas, "Face perception and recognition" from MEXT KAKENHI (20119002 to Masami K. Yamaguchi, 23119708 to Masato Okada); Grant-in-Aid for Scientific Research on Innovative Areas, "Sparse Modeling" from JSPS KAKENHI (25120009 to Masato Okada and Kenji Nagata, 26120529 to Hiroko Ichikawa); Grant-in-Aid for Scientific Research on Innovative Areas, "Exploring the Limits of Computation (ELC)" from JSPS KAKENHI (25106506 to Kenji Nagata); Grant-in-Aid for Scientific Research (A) from JSPS KAKENHI (20240020 to Masato Okada); Grant-in-Aid for Young Scientists (B) from JSPS KAKENHI (22700230 to Kenji Nagata) and a Grant-in-Aid for JSPS Fellows from JSPS KAKENHI (10J06155 to Jun Kitazono, 24 7809 to Hiroko Ichikawa).

## **REFERENCES**


traits? A selective review. *ADHD Atten. Defic. Hyperact. Disord*. 4, 115–139. doi: 10.1007/s12402-012-0086-2


activity patterns. *Neuroimage* 42, 1414–1429. doi: 10.1016/j.neuroimage.2008.05.050


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 05 March 2014; accepted: 13 June 2014; published online: 02 July 2014.*

*Citation: Ichikawa H, Kitazono J, Nagata K, Manda A, Shimamura K, Sakuta R, Okada M, Yamaguchi MK, Kanazawa S and Kakigi R (2014) Novel method to classify hemodynamic response obtained using multi-channel fNIRS measurements into two groups: exploring the combinations of channels. Front. Hum. Neurosci. 8:480. doi: 10.3389/fnhum.2014.00480*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Ichikawa, Kitazono, Nagata, Manda, Shimamura, Sakuta, Okada, Yamaguchi, Kanazawa and Kakigi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Subliminal cues bias perception of facial affect in patients with social phobia: evidence for enhanced unconscious threat processing

## *Aiste Jusyte<sup>1,2</sup>\* and Michael Schönenberg<sup>1</sup>*

<sup>1</sup> Department of Clinical Psychology and Psychotherapy, University of Tübingen, Tübingen, Germany <sup>2</sup> LEAD Graduate School, University of Tübingen, Tübingen, Germany

#### *Edited by:*

Aina Puce, Indiana University, USA

#### *Reviewed by:*

Jorge Almeida, University of Coimbra, Portugal Timo Stein, University of Trento, Italy

#### *\*Correspondence:*

Aiste Jusyte, LEAD Graduate School, University of Tübingen, Europastraße 6, 72072 Tübingen, Germany e-mail: aiste.jusyte@uni-tuebingen.de

Socially anxious individuals have been shown to exhibit altered processing of facial affect, especially expressions signaling threat. Enhanced unaware processing has been suggested as an important mechanism which may give rise to anxious conscious cognition and behavior. This study investigated whether individuals with social anxiety disorder (SAD) are perceptually more vulnerable to the biasing effects of subliminal threat cues compared to healthy controls. In a perceptual judgment task, 23 SAD and 23 matched control participants were asked to rate the affective valence of parametrically manipulated affective expressions ranging from neutral to angry. Each trial was preceded by subliminal presentation of an angry/neutral cue. The SAD group tended to rate target faces as "angry" when the preceding subliminal stimulus was angry vs. neutral, while healthy participants were not biased by the subliminal stimulus presentation. The perceptual bias in SAD was also associated with higher reaction time latencies in the subliminal angry cue condition. The results provide further support for enhanced unconscious threat processing in SAD individuals. The implications for etiology, maintenance, and treatment of SAD are discussed.

**Keywords: social anxiety, threat bias, subliminal, face perception, preattentive processing**

#### **INTRODUCTION**

According to evolutionary accounts of threat processing, affective facial expressions, especially those depicting a source of direct (anger) or indirect (fear, disgust) threat, represent a class of signals relevant for survival. A great amount of empirical evidence suggests that very quick processing of threatening signals is part of the innate functional repertoire of a healthy human brain (Williams et al., 2004; Koster et al., 2005; Bar-Haim et al., 2007; Ohrmann et al., 2007; Bishop, 2008; Bannerman et al., 2009; LoBue, 2009; Pichon et al., 2012). These results can be interpreted in terms of the preparedness theory, according to which the existence of these neuronal mechanisms is beneficial for the survival of the organism (LeDoux, 2000, 2003; Öhman et al., 2000, 2007; Öhman and Mineka, 2001a). Quick information processing may translate to a crucial temporal advantage of milliseconds to prepare for and execute a behavioral response when faced with sudden danger. Due to direct neural projections to the visual cortex, the amygdala is considered to be the key structure modulating this early processing advantage for threatening information (Morris et al., 1998, 2001; Anderson and Phelps, 2001; Vuilleumier et al., 2004). In accordance with these neuroimaging findings, prior studies have provided behavioral evidence by demonstrating that fearful cues actually enhance perceptual sensitivity (Phelps et al., 2006; Stolarova et al., 2006; Lim and Pessoa, 2008). However, the threatening character of certain stimuli does not necessarily have to be inherent; stimuli may also acquire their aversive quality through learning experiences (Stolarova et al., 2006; Keil et al., 2007a,b). These different types of threatening stimuli may reflect qualitatively different aspects of a threat, e.g., inherent vs. acquired, which may affect processing in different ways. Prior investigations from our work group showed that participants acquired a perceptual bias to subliminal threat only when inherently aversive stimuli (angry faces) were paired with aversive outcomes via a prior conditioning procedure (Jusyte and Schönenberg, 2013). While this initial evidence suggests that learning experiences may enhance unconscious visual processing of threatening stimuli, it remains unclear how durable these effects may be and whether similar mechanisms can be assumed in relevant psychopathologies, such as anxiety disorders.

Generalized social anxiety disorder (SAD) is a prevalent morbidity with a typically early onset and chronic manifestation (Bruce et al., 2005). Symptoms revolve around an intense, persistent (anticipatory) fear of social and performance situations that is usually accompanied by increased autonomic arousal (McTeague et al., 2009) and results in subsequent avoidance behavior (Fehm et al., 2008). While avoidance is an undisputed maintaining mechanism in all anxiety disorders, it cannot fully explain the persisting nature of SAD (Hirsch and Clark, 2004). A large number of studies have pointed out that hypervigilance toward threatening information may represent a key mechanism contributing to the maintenance of this disorder (Staugaard, 2010). It has been suggested that social stimuli, especially faces signaling threat or disapproval, are particularly salient for individuals with social anxiety (Rapee and Heimberg, 1997), possibly as a result of the inherent biological preparedness (Öhman and Mineka, 2001b) and aversive learning experiences. Accordingly, socially anxious individuals have been shown to exhibit an attentional bias toward angry faces in visual attention paradigms (Horley et al., 2004; Pineles and Mineka, 2005; Klumpp and Amir, 2009) as well as enhanced neural reactivity toward angry expressions in limbic and extrastriate visual areas compared to healthy controls (Stein et al., 2002; Straube et al., 2005; Phan et al., 2006; Klumpp et al., 2012).

Enhanced unconscious threat processing is a possible mechanism underlying cognitive biases in anxiety disorders, as it may impact later processing stages and engender affective, cognitive as well as behavioral phenomenology, thus giving rise to the overgeneralization of fear (Dunsmoor et al., 2011). Although numerous studies have examined subliminal threat processing in other anxiety disorders (Brooks et al., 2012), only a few studies have addressed this issue in socially anxious populations using disorder-specific stimuli, i.e., threatening faces. Empirical evidence, which mostly stems from analogous group studies, supports the notion that (social) anxiety is associated with altered early visual processing (Li et al., 2008), altered engagement and guidance of attentional resources (Mogg and Bradley, 2002; Holmes et al., 2009), and an enhanced subcortical response (Bishop et al., 2004; Phan et al., 2006; Vizueta et al., 2011), and that it may affect subsequent social judgments (Li et al., 2008) when the threatening stimuli are presented under conditions of restricted awareness.

Our research group has previously established a paradigmatic approach to investigate how subliminal threat cues may affect perceptual decisions (Jusyte and Schönenberg, 2013). In a series of experiments, healthy volunteers made affective judgments of morphed affective stimuli that were blends of neutral and angry expressions. Subliminal cues resulted in biased affective judgments of the morphed stimuli (i.e., more "angry" responses) only when the subliminal stimulus was angry and had been previously paired with an aversive experience. These results indicated that a perceptual bias to subliminal threat is acquired only when the negative primes were paired with aversive outcomes in a previous conditioning procedure, which may mirror fear acquisition in real-world contexts. As highlighted earlier, patients with SAD represent a group with an especially pronounced bias to threatening facial expressions associated with alterations in preattentive processing. In contrast to healthy individuals, social anxiety may be associated with an increased salience of angry faces to such an extent that even an unconscious "hint" of hostility may be enough to distort visual processing, resulting in a perceptual bias for anger even without prior experimental conditioning, possibly due to prior aversive conditioning in real-world contexts. SAD patients may be perceptually more vulnerable to the biasing effects of unconscious threat cues, which could form the basis of affective, cognitive and behavioral symptoms in social anxiety.

The present study aimed to investigate this issue in individuals with SAD. Specifically, we were interested in whether subtle signals of threat that are presented under conditions of restricted awareness would result in biased performance on a subsequent affective judgment task. We expected SAD patients to make more "angry" responses if the preceding subliminal stimulus was angry as opposed to neutral, but healthy control participants were not expected to show this effect. The perceptual bias in the SAD group was expected to be larger for ambiguous mask stimuli (morphed facial expressions ranging between angry and neutral) due to a larger susceptibility to biasing effects of the subliminal cues. In accordance with the affective judgment, we expected faster reaction times (RTs) for unambiguous as opposed to ambiguous mask stimuli for both groups and a facilitation of visual processing reflected in lower RT latencies for the subliminal threat condition in the SAD group only. These effects would provide further support for enhanced unconscious threat processing in SAD individuals and may have important implications for the development of new treatment strategies.

## **MATERIALS AND METHODS**

### **PARTICIPANTS**

Social anxiety disorder and control group participants were recruited via an electronic announcement addressing all undergraduate students of the University of Tübingen who either experience anxiety in social interactions or have no interactional difficulties. Interested individuals were then invited to participate, completed a self-report battery of social anxiety measures, and were administered a clinical interview in order to confirm the SAD/healthy control group status. All participants completed German versions of several questionnaires assessing the dimensional severity of social anxiety. The *Social Interaction Anxiety Scale (SIAS)* was used to assess the anxiety experienced in social interactional situations; the *Social Phobia Scale* (*SPS*; Mattick and Clarke, 1998; Stangier et al., 1999) was employed to measure levels of anxiety when individuals are scrutinized by others, and the *Liebowitz Social Anxiety Scale* (*LSAS*; Liebowitz, 1987; Stangier et al., 2003) was used to assess the range of social interaction and performance situations that social phobics may fear/avoid. Furthermore, a structured interview [Mini International Neuropsychiatric Interview (MINI; Sheehan et al., 1998)] was administered by trained psychologists in order to validate the clinical diagnosis of SAD and to ensure the diagnosis-free status of healthy control participants. Exclusion criteria for the SAD participants were: a history of or current disorder of the schizophrenic or bipolar/manic spectrum, a diagnosis of borderline or antisocial personality disorder, as well as awareness of the subliminal stimulus prime as assessed in the recognition task. In the healthy control group, exclusion criteria were a current psychopathology or a history thereof as well as awareness of the subliminal stimulus. Two participants from the SAD group and three controls were excluded due to their performance on the recognition task, which indicated that they were aware of the subliminal prime. The final sample consisted of 23 SAD subjects and 23 healthy controls (see **Table 1** for more details). Subjects signed an informed written consent and received monetary compensation for participation. All experiments reported here were approved by the local ethics committee and are in accordance with the Declaration of Helsinki.

#### **MATERIALS**

#### *Facial stimuli*

Angry and neutral facial expressions of seven male models from the Karolinska Directed Emotional Faces database (Goeleven et al., 2008) were selected for the stimulus material. We only included models who depict anger without opening the mouth or baring teeth in order to limit the confounding effects of visual features in the masking procedure (Calvo and Nummenmaa, 2008). This resulted in a total of 14 color pictures (7 models × 2 expressions), which were edited in order to match the basic visual features (luminance, color) and size (cropping with an oval mask) using Adobe Photoshop CS4. This was necessary in order to achieve maximum masking efficiency. The emotional expression was parametrically varied using a morphing procedure (FantaMorph software, Abrosoft, Beijing, China) in which angry and neutral expressions of the same model were blended together. This resulted in a set of 11 intensity levels (10% increment steps) of angry expressions ranging from 0% (neutral) to 100% (angry) for each model (**Figure 1B**). One model identity was randomly selected for the subliminal stimulus set (unambiguous neutral and angry expressions). The stimulus material for the perceptual judgment task consisted of graded expressions of the remaining models (6 remaining models × 11 intensity levels), which were used as mask stimuli, and two subliminal stimuli (neutral and angry expression of a randomly selected model identity). Visual stimuli were delivered via Presentation software (Version 14.5) throughout all phases of the experiment. Face stimuli (300 × 375 pixels) were presented in the center of a 19-inch CRT monitor against a black background.

**Table 1 | Demographic and control measures.**

The data represented in the table refers to means and SDs for each measure (in parentheses). LSAS = Liebowitz Social Anxiety Scale; SIAS = Social Interaction Anxiety Scale; SPS = Social Phobia Scale; CR = confidence rating.

#### **PROCEDURE**

After providing written informed consent, participants completed the questionnaires and the diagnostic interview. The subsequent experimental procedure included three consecutive steps. In the first step, the participants were exposed to the subliminal stimulus set in order to establish comparability with the original experimental design from our previous studies (Jusyte and Schönenberg, 2013). Next, the participants performed the perceptual decision task. In the third step, the participants' ability to perceive the subliminal stimulus was assessed in order to ensure that all subjects were unaware of the subliminal stimulus condition.

#### *Step I: exposure*

During the exposure phase, neutral (50% of the trials) and angry expressions of one model identity (which later served as the subliminal stimulus pair) were presented a total of 20 times in pseudo-randomized order with no more than three identical trials in a row. The temporal structure for the exposure trials was as follows: an angry/neutral face was presented for 4 s, followed by 1 s inter-trial-interval (ISI, blank screen). Participants were instructed to pay close attention to the visual stimuli in order to "get acquainted with the stimulus material"<sup>1</sup>.
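The constraint on the exposure order can be sketched with simple rejection sampling. This is only an illustration under stated assumptions: it reads "a total of 20 times" as 20 trials overall (10 neutral, 10 angry), and the function names are hypothetical; the actual randomization routine used in the experiment is not described.

```python
import random


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def exposure_order(n_trials=20, max_run=3, seed=None):
    """Reshuffle until no expression occurs more than max_run times in a row."""
    rng = random.Random(seed)
    trials = ["neutral"] * (n_trials // 2) + ["angry"] * (n_trials // 2)
    while True:
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)


order = exposure_order(seed=42)
```

Rejection sampling keeps every admissible order equally likely, at the cost of an unpredictable (but in practice small) number of reshuffles.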

#### *Perceptual decision task*

The task for the participants was to indicate whether a briefly presented face stimulus was angry or neutral via a button press. The participants were not informed about the subliminal stimulus presentation and were instructed to react as quickly and accurately as possible. Trials were organized in blocks, with the subliminal stimulus being either angry or neutral on every trial throughout a given block. One block consisted of 22 trials in which a subliminal stimulus was immediately masked by a supraliminal presentation of a mask stimulus. Per block, 11 intensity levels of two different models were presented once in random order. A total of six blocks were necessary in order to present all intensity levels of each model once with a preceding subliminal neutral as well as angry prime. Four repetitions, or a total of 528 trials (6 models × 11 intensities × 2 subliminal stimuli × 4 repetitions), were presented during the experiment. Block and trial order was randomized for each repetition and participant. The temporal trial structure was as follows: The trial began with a fixation cross (500 ms, centered) followed by a subliminal angry or neutral stimulus (30 ms), which was immediately replaced by a 100 ms presentation of the mask, which was then followed by a 100 ms checkered stimulus (**Figure 1A**) and finally the perceptual decision task. After the participants' response, the next trial began after a 1 s ISI.
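The block structure described above can be made concrete with a short sketch that builds one participant's trial list and confirms the trial count. The model labels, the random pairing of models within a repetition, and the function name are assumptions for illustration; how the two models of a block were chosen is not stated in the text.

```python
import random

MODELS = [f"model_{i}" for i in range(1, 7)]  # six mask identities (hypothetical labels)
INTENSITIES = list(range(0, 101, 10))         # 0% (neutral) to 100% (angry)
PRIMES = ["angry", "neutral"]
REPETITIONS = 4

# Pre-response event durations in milliseconds, as given in the text.
TIMELINE_MS = [("fixation", 500), ("prime", 30), ("mask", 100), ("checkerboard", 100)]


def build_session(seed=None):
    """Return a list of blocks; each block is a list of 22 trial dicts."""
    rng = random.Random(seed)
    session = []
    for repetition in range(REPETITIONS):
        models = MODELS[:]
        rng.shuffle(models)
        # Assumption: models are paired at random within each repetition.
        pairs = [models[i:i + 2] for i in range(0, len(models), 2)]
        blocks = [(prime, pair) for prime in PRIMES for pair in pairs]
        rng.shuffle(blocks)
        for prime, pair in blocks:
            trials = [
                {"repetition": repetition, "prime": prime, "model": m, "intensity": level}
                for m in pair
                for level in INTENSITIES
            ]
            rng.shuffle(trials)
            session.append(trials)
    return session


session = build_session(seed=0)
n_trials = sum(len(block) for block in session)
```

Under these assumptions the session contains 24 blocks of 22 trials, and every prime × model × intensity combination occurs once per repetition.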

#### *Recognition task*

A major issue in all paradigms investigating subliminal processing is the difficulty of ensuring that these stimuli were not consciously perceived (Pessoa, 2005). In order to address this issue, several steps were undertaken (Li et al., 2008). First of all, the participants had no notion of the subliminal stimulus condition. During the experimental task, the subliminal stimuli were presented for merely 30 ms and backwardly masked by a stimulus with very similar perceptual properties. Furthermore, in a recognition task following the experiment, we assessed both subjective and objective awareness of the subliminal prime in a perceptual decision task and a subsequent confidence rating.

<sup>1</sup>The present study is an extension of our previous experimental work. In that study, we also included a conditioning procedure and investigated how learning experiences may influence preattentive processing using the same paradigm (Jusyte and Schönenberg, 2013). In this prior investigation, we included a conditioning procedure during which the subliminal stimuli used in the experiment served as CS+/−. In order to compare the results to a condition in which no conditioning was applied, we included an exposure phase prior to the experiment. Hence, the exposure procedure in this study was included as means of ensuring comparability to our previous work with healthy participants.

Before the recognition task, participants were debriefed about the subliminal stimulus presentations and were instructed to indicate whether the first, brief stimulus was neutral/angry and to ignore the mask. Following the perceptual decision, the participants were asked to indicate how confident they were that they answered correctly on a scale ranging from 1 (not sure at all) to 10 (completely confident). The confidence rating was chosen as a subjective measure of awareness. In a total of 36 randomized trials (6 models × 2 subliminal stimuli × 3 repetitions), intermediate intensity pictures (50%) of each of the six models from the experimental task were presented as face masks and preceded by either a subliminal angry or neutral stimulus. The (temporal) trial structure was identical to the perceptual decision task with the exception of the confidence rating. *d*′ scores were computed as objective indices of awareness. Both subjective and objective awareness of the subliminal stimulus condition were taken into account and only subjects who were considered unaware in both respects were included in the final analysis. Subjects were considered unaware and included in the analysis if they produced a *d*′ score between 1 and −1 (*d*′ range = ±3.829; *d*′ = 0 indicates no discriminatory ability) and did not exhibit significantly higher confidence ratings on correct vs. erroneous responses in the recognition task.
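The objective awareness criterion can be sketched as a standard signal-detection computation of the sensitivity index (d′). The sketch rests on assumptions not stated in the text: "angry" responses to an angry prime are treated as hits, and hit or false-alarm rates of exactly 0 or 1 are moved to 1/(2N) or 1 − 1/(2N). With 18 trials per prime type, that correction yields a maximum of roughly 3.83, which appears consistent with the range reported above; the function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: z(hit rate) minus z(false-alarm rate)."""
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections

    def clamp(rate, n):
        # Assumed correction: rates of 0 or 1 become 1/(2n) or 1 - 1/(2n).
        return min(max(rate, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf
    hit_rate = clamp(hits / n_signal, n_signal)
    fa_rate = clamp(false_alarms / n_noise, n_noise)
    return z(hit_rate) - z(fa_rate)


def is_objectively_unaware(score, limit=1.0):
    """Inclusion rule from the text: the score lies between -1 and 1."""
    return -limit < score < limit


chance = d_prime(9, 9, 9, 9)       # guessing on 18 angry and 18 neutral primes
ceiling = d_prime(18, 0, 0, 18)    # perfect discrimination
```

A participant who guesses obtains a score of 0 and passes the inclusion rule, whereas perfect discrimination lies far outside the ±1 window.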

#### **RESULTS**

#### **SAMPLE**

Demographic and psychopathological description of the final sample is displayed in **Table 1**. There were no significant differences with regard to age, gender, educational status, objective/subjective indices of awareness of the subliminal stimulus between groups. The SAD group scored significantly higher on all three dimensional measures of social anxiety (LSAS, SIAS, SPS) than the control group. None of the control group participants was diagnosed in the structured interview. All experimental group participants fulfilled the categorical diagnostic criteria for social phobia.

#### **RECOGNITION TASK**

*d*′ scores were computed for each participant. Participants who outperformed the criterion range were excluded from the analysis (two control group and three SAD group participants). A one-sample *t*-test for the final sample revealed no significant difference from chance level for either the control [*t*(22) = 1.84; *p* > 0.05] or the SAD [*t*(22) = 0.49; *p* > 0.1] group; the analysis over collapsed data across groups also did not reach significance [*t*(45) = 1.67; *p* > 0.1]. To investigate the subjective awareness of the subliminal stimulus condition, we computed paired-sample *t*-tests regarding confidence ratings on the correct vs. incorrect responses in the recognition task for each group [SAD: *t*(22) = 1.10; controls: *t*(22) = 0.53; *p*s > 0.1], yielding no significant differences (see **Table 1** for more details). The results showed that the subjects had virtually no awareness of the subliminal stimulus condition.

#### **PERCEPTUAL DECISION TASK**

The data analysis for the perceptual decision task was conducted in several steps. Firstly, we computed an analysis in order to investigate the potential perceptual bias. For this purpose, an analysis was computed for each group with total values reflecting the mean number of "angry" responses for each subliminal stimulus type and mask stimulus intensity. In a second step, we aimed to explore potential group differences in the perceptual bias related to subliminal stimulus type by employing *d*′ scores. This type of analysis is a more sophisticated way to examine the relative biases in perception for angry as opposed to neutral primes and has the advantage of reflecting the perceptual bias in a single value, thereby reducing the complexity of the model. Lastly, in order to control for potential speed-accuracy trade-offs that may be associated with the observed effects, we conducted an analysis of RT data.

#### *Perceptual bias*

In order to investigate the perceptual bias, an initial repeated-measures ANOVA with two within-subjects factors (subliminal stimulus type and intensity) as well as one between-subjects factor (group) was conducted using mean proportion of "angry" responses for condition and intensity level. The results indicated a main effect of stimulus intensity [*F*(10, 440) = 572.32; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.93], which was further qualified by a significant condition × group [*F*(1, 44) = 572.32; *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.10] and a group × intensity interaction on a statistical trend level [*F*(10, 440) = 1.80; *p* < 0.10; η<sub>p</sub><sup>2</sup> = 0.04]. To further investigate the interaction effects, separate 2 (subliminal stimulus type) × 11 (intensity levels) repeated-measures ANOVAs were computed (**Figure 2**) for each group. For the control group, there was a significant effect of stimulus intensity [*F*(10, 220) = 292.08; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.93], but neither subliminal stimulus type [*F*(1, 22) = 0.23; *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.01] nor the interaction [*F*(10, 220) = 0.44; *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.02] reached significance. The SAD group, however, showed a significant effect of both intensity [*F*(10, 220) = 282.67, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.93] and subliminal stimulus type [*F*(1, 22) = 7.05; *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.24], as well as an interaction effect [*F*(10, 220) = 1.93; *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.08]. Paired-sample *t*-tests (subliminal angry vs. neutral stimulus) were computed in order to further qualify the interaction effect, yielding significant differences at the first five intensity levels (all *p*s < 0.05). Thus, the results indicate that only SAD group subjects tended to make more "angry" responses when the subliminal stimulus was angry.
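For readers who wish to relate the reported effect sizes to the test statistics, partial eta squared can be recovered from an *F* value and its degrees of freedom via the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The following is a minimal sketch; the function name is hypothetical, and the two example calls use values reported in the paragraph above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of stimulus intensity in the omnibus ANOVA: F(10, 440) = 572.32
eta_intensity = partial_eta_squared(572.32, 10, 440)

# Effect of subliminal stimulus type in the SAD group: F(1, 22) = 7.05
eta_prime_sad = partial_eta_squared(7.05, 1, 22)
```

Rounded to two decimals, these calls give 0.93 and 0.24, matching the values reported for those two effects.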

to angry (100). The dark circles and solid lines represent an angry subliminal

In order to investigate whether the differences in perceptual biases are evident between groups, an additional joint analysis was computed. Indices of bias for angry and neutral subliminal conditions (*d*′ scores) were computed in the same manner as for the recognition task. This resulted in 11 scores (intensity of the mask stimulus) for each experimental condition. Positive scores represent a bias toward "angry" ratings of the mask stimulus in the subliminal angry relative to the neutral condition, and vice versa, while a *d*′ value around 0 represents no systematic bias. *d*′ scores were analyzed using a repeated-measures ANOVA (**Figure 4**) with one within-subjects factor (Intensity) and one between-subjects factor (Group). Neither stimulus intensity [*F*(10, 440) = 0.53; *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.01] nor the intensity × group interaction reached significance [*F*(10, 440) = 1.01, *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.02]. However, there was a significant group effect [*F*(1, 44) = 4.34, *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.09]. Subsequent one-sample *t*-tests computed with a total mean *d*′ score over all 11 intensity levels revealed no significant differences from chance level for the control group [*M* = −0.03; SD = 0.21; *t*(22) = −0.63; *p* > 0.1], whereas the effect was significant for the SAD participants [*M* = 0.10; SD = 0.20; *t*(22) = 2.36; *p* < 0.05]. These results indicate that a systematic tendency for angry responses as a function of subliminal stimulus condition was only evident in SAD participants as opposed to the control group.
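The one-sample tests against zero can be approximated from the reported summary statistics using t = M / (SD / √n). The sketch below is illustrative only: the function name is hypothetical, and because the published means and SDs are rounded to two decimals, the recomputed values are close to, but not identical with, the reported *t* statistics.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


t_sad, df_sad = one_sample_t(0.10, 0.20, 23)           # reported: t(22) = 2.36
t_control, df_control = one_sample_t(-0.03, 0.21, 23)  # reported: t(22) = -0.63
```

With these rounded inputs the recomputed statistics are approximately 2.40 for the SAD group and −0.69 for the control group, each on 22 degrees of freedom.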

Furthermore, a correlation analysis was conducted to further investigate the relationship between the extent of perceptual bias in the perceptual decision task (mean *d*′ scores reflecting the relative tendency to rate mask stimuli as "angry" when the preceding subliminal stimulus was angry) and the objective awareness measure (sensitivity *d*′ scores reflecting the ability to discriminate between the subliminal stimulus conditions) obtained in the recognition task. There were no significant correlations between these two measures either on the group level (SAD: *r* = 0.15; controls: *r* = −0.10) or in the collapsed data (*r* = −0.03, all *p*s > 0.1).

#### *Reaction time (Abrantes-Pais et al., 2007)*

Reaction time latencies larger than three seconds were excluded from the analysis. The percentage of excluded trials was not significantly different between the control and SAD group [*M* = 6.39; SD = 10.48 and *M* = 3.13; SD = 4.24; *t*(44) = 1.36; *p* > 0.1]. An initial repeated-measures ANOVA with the within-subjects factors subliminal stimulus type and intensity as well as one between-subjects factor (group) was conducted. The results yielded a main effect of stimulus intensity [*F*(10, 220) = 52.57; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.54] as well as a significant effect of subliminal stimulus condition [*F*(10, 440) = 5.10; *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.10]. These main effects were further qualified by a condition × intensity interaction [*F*(10, 440) = 4.30; *p* < 0.01; η<sub>p</sub><sup>2</sup> = 0.09] and a condition × intensity × group interaction [*F*(10, 440) = 2.10; *p* < 0.05; η<sub>p</sub><sup>2</sup> = 0.05]. To further investigate the interaction effects, 2 (subliminal stimulus type) × 11 (stimulus intensity) repeated-measures ANOVAs were computed with mean RTs (**Figure 3**) for each group.

stimulus, the white circles and dashed lines represent a neutral subliminal stimulus. gSAD, generalized social anxiety disorder. MS, milliseconds; SEM, standard error of mean

A significant effect of stimulus intensity emerged in the control group [*F*(10, 220) = 26.46; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.55]; there was no significant effect of subliminal stimulus condition [*F*(1, 22) = 2.29; *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.09] nor did the interaction [*F*(10, 220) = 1.67; *p* > 0.05; η<sub>p</sub><sup>2</sup> = 0.07] reach significance. In the SAD group, there was a significant effect of stimulus intensity [*F*(10, 220) = 27.45; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.55] and a significant interaction [*F*(10, 220) = 4.05; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.16], but no significant effect of subliminal stimulus type [*F*(1, 22) = 2.8; *p* > 0.1; η<sub>p</sub><sup>2</sup> = 0.11]. Paired-sample *t*-tests (subliminal angry vs. neutral stimulus) were computed for the *post hoc* analysis in order to further investigate the interaction effect for each intensity level of the mask stimuli. The results revealed that SAD participants exhibited significantly higher RT latencies when the subliminal stimulus was angry vs. neutral at the first four intensity levels (all *p*s < 0.05). In both the SAD and the control group, a significant quadratic [*F*(1, 22) = 39.90; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.64 vs. *F*(1, 22) = 79.24; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.78] as well as linear trend [*F*(1, 22) = 47.17; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.68 vs. *F*(1, 22) = 51.40; *p* < 0.001; η<sub>p</sub><sup>2</sup> = 0.70] emerged for the averaged RT data, which indicates an inverted U-shape pattern as well as lower RT latencies for unambiguous angry vs. neutral expressions.

## **DISCUSSION**

The present study investigated whether SAD patients are more susceptible to the biasing effects of threatening subliminal cues. The results of the perceptual judgment task showed that SAD subjects tended to make more "angry" responses regarding graded mask stimuli in trials with a preceding angry vs. neutral subliminal cue, while the proportion of "angry" responses did not vary as a function of subliminal stimulus condition in healthy subjects.

These results may reflect alterations in early visual processing, which possibly stem from hypersensitivity to threatening cues in associated subcortical structures (Straube et al., 2005; Phan et al., 2006; Stein et al., 2007). Accordingly, subliminal threat cues have been shown to elicit a robust neural response, particularly in anxiety-prone individuals (Li et al., 2008; Ball et al., 2012; Brooks et al., 2012). Several studies have investigated whether social phobia is associated with an increased sensitivity to facial expressions of threat by employing morphed stimuli of varying emotional intensity, yielding conflicting findings (Richards et al., 2002; Mullins and Duke, 2004; Philippot and Douilliez, 2005; Joormann and Gotlib, 2006; Montagne et al., 2006; Rossignol et al., 2007; Schofield et al., 2007; Stevens et al., 2008; Garner et al., 2009; Heuer et al., 2010). Most of these studies failed to demonstrate that social anxiety is associated with a biased interpretation of emotion (Mullins and Duke, 2004; Philippot and Douilliez, 2005; Schofield et al., 2007), while one study reported a higher (Montagne et al., 2006) and another a lower (Joormann and Gotlib, 2006) threshold for the onset of negative emotion in facial expressions. The results of the present study are in line with previous literature that failed to find evidence for a biased interpretation of emotion in SAD: our findings do not indicate a dramatically increased general perceptual sensitivity to angry expressions in SAD, but they do provide support for a vulnerability to the biasing effects of stimuli presented outside of awareness. The hypersensitivity to, and earlier onset of, hostile cue perception in facial expressions, of which the anxious individual may not even be aware, has the potential to cause anxious rumination and misinterpretation of the social partner's facial expression, resulting in cognitive overload and a failure to down-regulate these emerging misinterpretations by means of top-down control.

Interestingly, the perceptual bias was observed at relatively low perceptual intensities of anger in mask stimuli (0–50% anger proportion) in the preceding subliminal angry vs. neutral stimulus condition. This finding is intriguing, because one would expect the biasing effect of subliminal cues to be most prominent at intermediate stimulus intensity levels of the mask stimulus due to their ambiguity. Our data show that the SAD group is particularly sensitive to the biasing effects of the hostile subliminal stimulus even when the mask stimulus barely contains anger.

The overall results of RT latency revealed an inverted U-shape pattern with respect to stimulus intensity in both groups, reflecting lower RTs for unambiguous angry and neutral expressions and an increase at intermediate stimulus intensity levels. Hence, both groups exhibited peak RT latencies at intermediate intensity levels. This pattern may reflect judgment uncertainty associated with stimulus complexity, which can be assumed to be higher for ambiguous vs. prototypical expressions (Lim and Pessoa, 2008). Moreover, both groups exhibited faster RTs for unambiguous angry vs. neutral expressions. This pattern indicates a behavioral speeding effect for angry faces, which may reflect a prioritized processing of angry vs. neutral stimuli (Lim and Pessoa, 2008; Lee et al., 2009).

Furthermore, we hypothesized that the perceptual bias evident in the SAD group would also be associated with behavioral speeding, i.e., shorter RT latencies, in the subliminal angry vs. neutral condition. Our data did not provide support for this assumption; in fact, a contrary interaction effect emerged: SAD subjects tended to show higher RT latencies in the subliminal angry cue condition at low to intermediate intensity levels. Interestingly, the differential RT slowing corresponded closely with the intensity levels at which the perceptual judgment bias for the subliminal angry vs. neutral condition was most prominent. This may be due to the incompatibility between the prime and the masking stimuli, which call for different response alternatives and result in a competition that is considered to be a major determinant of prolonged RT and erroneous responses (Klapp and Hinkley, 2002; Praamstra and Seiss, 2005). Furthermore, the evidence regarding RT speeding for threatening faces in SAD patients appears to be rather inconsistent (Staugaard, 2010). While some studies report a behavioral facilitation for affective material (Becker, 2009; Lee et al., 2009; Olatunji et al., 2011), there is also a line of evidence demonstrating a behavioral interference, in particular for negative stimuli (Buodo et al., 2002; Pereira et al., 2006; Sommer et al., 2008; Pereira et al., 2010). Recent evidence has also uncovered the neural mechanisms underlying the interference effects of negative emotional stimuli on behavior (RT slowing), which may represent the basis of defensive behavioral responses such as freezing (Pereira et al., 2010; Pichon et al., 2012).

The present study extends our previous experimental work, which has some implications for the understanding of general mechanisms of affective stimulus processing as well as for the etiological models of anxious psychopathology. The affective judgment pattern observed in the SAD group strongly resembles the results obtained in Experiment 1 of our previous experimental series (Jusyte and Schönenberg, 2013). The behavioral data of healthy participants who performed the same judgment task after undergoing an aversive learning procedure, in which the angry face (which later served as the subliminal stimulus in the affective judgment task) was paired with an aversive outcome, bear substantial similarity to the performance of SAD participants, who did not receive aversive conditioning. Therefore, the paradigm employed in our previous investigation with healthy participants may be an analog of the naturalistic process by which attentional vigilance in social anxiety develops, where an inherently negative stimulus is repeatedly paired with aversive experiences. To some extent, this may also reflect a natural and adaptive process by means of which individuals become more sensitive to facial displays of threat/disapproval in those individuals with whom they associate unpleasant experiences. In future studies, it would be interesting to investigate whether other forms of experiential learning based on interactional outcomes, such as paradigms involving social exclusion or inclusion experiences, would result in a similar sensitization toward subliminal threat stimuli.

The results of the present study also have some important implications regarding the development and maintenance of SAD. The data indicate that SAD patients exhibit an inherent anxious response pattern and appear to be sensitive to even very subtle signs of threat, which have the potential to guide volitional behavior. The fact that SAD participants do not require conditioning in order to exhibit this sensitivity may be due to previous learning experiences in real life, in which facial expressions of anger or disapproval have acquired a potent signaling function. An angry face may represent such a highly potent signal of threat for social phobics that even a subtle "hint" of a hostile percept could suffice to bias early visual processing, resulting in a perceptual bias for "angry" responses even without experimental conditioning, possibly because of prior aversive conditioning in real-world contexts.

Interestingly, both in this as well as our previous investigation using the same paradigmatic approach, we did not find evidence for a biased performance as a function of subliminal prime in healthy individuals, which contradicts a large number of studies from the priming literature (Murphy and Zajonc, 1993; Rotteveel et al., 2001; Winkielman et al., 2005; Dannlowski and Suslow, 2006; Almeida et al., 2013). On the other hand, not all studies have been able to replicate the threat processing advantage, and conflicting evidence is reported over a variety of paradigms stemming from the attentional as well as perceptual unawareness literature (Pessoa, 2005; Pessoa et al., 2005, 2006; Bar-Haim et al., 2007; Purcell and Stewart, 2010; Lee et al., 2011). This may in part be due to the stimulus material employed in these studies. For instance, in an experimental series conducted by Calvo and Nummenmaa (2008), the authors concluded that not the emotional valence but certain salient physical features may underlie the processing advantage of emotional expressions in the face-in-the-crowd paradigm. These salient features refer to the distribution of luminance in an emotional face caused by narrowing or widening the eyes, visibility of the teeth or opening the mouth. Paradigms for the investigation of subliminal threat processing may be even more vulnerable to these confounding effects. Considering the fact that many of the studies from the priming literature used characters rather than faces as masks, and the primes themselves were not cropped to remove areas such as hair in order to reduce contrast and target visibility, the reported priming effects may in part be due to a greater prime visibility. In addition, this possibility is very hard to rule out because most studies did not employ a valid awareness manipulation check. Some authors go so far as to say that priming effects may actually just reflect visual confounds caused by insufficient masking ability (Pessoa, 2005; Pessoa et al., 2005, 2006). However, a number of recent findings call these conclusions into question. For instance, studies that employed extremely brief presentation times (17–20 ms) still found a reliable amygdalar signal to briefly presented threatening stimuli (Pegna et al., 2004; Whalen et al., 2004; Liddell et al., 2005; Williams et al., 2005; Ohrmann et al., 2007; Pegna et al., 2008). Hence, the presence or absence of behavioral priming effects may critically depend on the extent of such activation, which is likely why this study did find priming effects in individuals who have been shown to be particularly sensitive to displays of threat, namely SAD.

The present study has several strengths and limitations worth mentioning. Among the strengths are the homogeneous SAD group and the well-matched control group as well as the within-subjects repeated-measures design, which affords high statistical power. Furthermore, we employed highly homogeneous stimulus material regarding color, luminance and the distribution of light and dark areas in the emotional faces, which allowed for a very efficient masking procedure. The assessment of subjective as well as objective awareness of the subliminal stimulus is recommended for investigations that employ subliminal primes (Pessoa, 2005), and this recommendation was followed in the present study. One limitation concerns our stimulus material, which included only one emotional expression, namely varying intensities of anger. Several studies have shown that socially anxious individuals exhibit alterations in the processing of facial expressions exhibiting not only overt aggression (anger) but also milder forms of hostile expression that signal disapproval, such as disgust and contempt (Stein et al., 2002; Amir et al., 2005; Phan et al., 2006). Hence, future studies should attempt to investigate how the present findings extend to other forms of hostile facial expression. Furthermore, although we made attempts to match the priming and masking stimuli on low-level visual features by excluding models with visible displays of teeth, the influence of such features cannot be entirely ruled out. For instance, two recent studies that used subliminal presentation using continuous flash suppression indicated that low-level features, such as spatial frequencies, may underlie emotion processing advantages observed in similar paradigms (Stein and Sterzer, 2012; Stein et al., 2014). Perhaps the strongest argument against such an explanation is that we found group differences between SAD and healthy controls; thus, the perceptual sensitivity to subliminal displays of anger was associated with a factor related to an inherently individual characteristic of one group. Although the role of low-level features cannot be entirely ruled out, we believe that it does not sufficiently explain all of the results obtained in this study. For future research working with backward-masking paradigms, we recommend including a more rigorous and sophisticated control of low-level visual features, such as adjustment and matching of root mean square contrast.

In addition, while enhanced visual processing due to direct projections from hyperactive subcortical structures is a likely mechanism that accounts for the present results, we did not test these assumptions using brain imaging techniques. Moreover, we cannot rule out that the observed effects are related to differences in response priming rather than shifts in perceptual sensitivity; that is, the prime could simply affect response criteria rather than actual expression perception. Future studies that employ both imaging techniques and sophisticated experimental designs that make it possible to distinguish between response and perceptual biases are needed to understand the underlying mechanism. Finally, this study did not elucidate whether the enhanced perceptual sensitivity is a correlate of SAD symptoms or rather a marker of vulnerability. This issue should be addressed in future research.

In summary, the present work provides further evidence for enhanced perceptual processing of threatening facial expressions in SAD individuals. These findings raise the question of whether the bias observed in our study is stable and whether it can be modified by means of classical cognitive–behavioral intervention methods or new computer-based training approaches that target attentional processes (Bar-Haim, 2010). It is possible that modification of later processing stages may have a synergistic effect on the automatic processing stages, but the anxious perceptual processing style may also be stable, which would mean that anxious individuals would always remain prone to relapse into an anxious psychopathology. Incorporation of these aspects in psychoeducation and strengthening the patient's ability to employ top-down strategies in order to counter the hyperactive threat detection system may be a useful strategy to down-regulate the hypersensitive perceptual threat processing. The present paradigmatic approach may be useful in future studies in order to elucidate these issues and could also prove to be a suitable outcome measure that reflects early information processing.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Rebekka Kreußer, Alexander Schneidt, and Eva Wiedemann for their support in conducting the study. This research was supported by the LEAD Graduate School [GSC1028], a project of the Excellence Initiative of the German federal and state governments. The authors acknowledge the support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Tuebingen University.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 May 2014; accepted: 14 July 2014; published online: 04 August 2014. Citation: Jusyte A and Schönenberg M (2014) Subliminal cues bias perception of facial affect in patients with social phobia: evidence for enhanced unconscious threat processing. Front. Hum. Neurosci. 8:580. doi: 10.3389/fnhum.2014.00580 This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Jusyte and Schönenberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Project PAVE (Personality And Vision Experimentation): role of personal and interpersonal resilience in the perception of emotional facial expression

## **Michal Tanzer \*, Golan Shahar and Galia Avidan**

Department of Psychology, Ben-Gurion University of the Negev, Beer Sheva, Israel

#### **Edited by:**

Davide Rivolta, University of East London, UK

#### **Reviewed by:**

Anna Sedda, University of Pavia, Italy Darren Burke, University of Newcastle, Australia

#### **\*Correspondence:**

Michal Tanzer, Department of Psychology, Ben-Gurion University of the Negev, P.O. Box 653, Beer Sheva 84105, Israel e-mail: tanzer@post.bgu.ac.il

The aim of the proposed theoretical model is to illuminate personal and interpersonal resilience by drawing from the field of emotional face perception. We suggest that perception/recognition of emotional facial expressions serves as a central link between subjective, self-related processes and the social context. Emotional face perception constitutes a salient social cue underlying interpersonal communication and behavior. Because problems in communication and interpersonal behavior underlie most, if not all, forms of psychopathology, it follows that perception/recognition of emotional facial expressions impacts psychopathology. The ability to accurately interpret another person's facial expression is crucial in subsequently deciding on an appropriate course of action. However, perception in general, and of emotional facial expressions in particular, is highly influenced by individuals' personality and self-concept. Herein we briefly outline well-established theories of personal and interpersonal resilience and link them to the neuro-cognitive basis of face perception. We then describe the findings of our ongoing program of research linking two well-established resilience factors, general self-efficacy (GSE) and perceived social support (PSS), with face perception. We conclude by pointing out avenues for future research focusing on possible genetic markers and patterns of brain connectivity associated with the proposed model. Implications of our integrative model for psychotherapy are discussed.

**Keywords: angry expression, happy expression, general self-efficacy, perceived social support, biased emotion recognition**

The notion that individuals and social context actively shape each other, evident in numerous conceptual perspectives, is represented in Albert Bandura's seminal principle of *reciprocal determinism* (Bandura, 1978; see also Shahar, 2006 for review of such action models in clinical psychology). Herein we extend this notion by proposing an integrative model that incorporates research on perception and recognition of emotional facial expressions. Specifically, we posit that biased emotional face perception and its relation to individuals' personality and self-concepts may explain vulnerability to, and resilience in the face of, a host of psychological difficulties. We begin by providing a brief overview of the well-established concepts that contributed to our overarching model. We then describe findings emanating from our ongoing program of research entitled **Project PAVE** (**P**ersonality **A**nd **V**ision **E**xperimentation) which link personal and interpersonal resilience and perception of emotional facial expressions. We conclude by noting avenues for future research focused on possible genetic markers and patterns of brain connectivity associated with the proposed model, as well as implications for psychotherapy.

## **PERCEPTION OF FACIAL EXPRESSION AND ITS ROLE IN VULNERABILITY TO, AND RESILIENCE IN THE FACE OF, PSYCHOPATHOLOGY**

As presented in **Figure 1** (Step 1), mounting evidence in social, developmental, and clinical psychology, inspired by Bandura's principle, highlights the active role of individuals in shaping their own environment, consequently affecting their interpersonal relations, risk for psychopathology, or self-concept (Lerner, 1982; Swann, 1983, 1990; Buss, 1987; Hammen, 1991; Joiner, 1994; Wachtel, 1994; for review, see Shahar, 2006). For example, depressed or self-critical individuals may generate interpersonal aversive circumstances that eventually maintain or elicit their depressive state and/or their self-criticism (Joiner, 1994; Mongrain, 1998; Joiner et al., 1999; Zuroff and Duncan, 1999; Priel and Shahar, 2000; Shahar and Priel, 2003; Blatt and Shahar, 2004; Shahar et al., 2004; Bareket-Bojmel and Shahar, 2011; Shahar and STREALTH LAB, in press).

But what is the mechanism underlying these findings? According to some social theories, individuals form their self-concept, at least in part, based on the ways others observe them and relate to them (e.g., the looking glass; Cooley, 1902). A similar notion was also postulated by the well-known psychoanalyst Winnicott in his theory regarding "mirroring" (1971), according to which infants form their sense of self by mentally absorbing their mother's facial expression as she attends to them. Relatedly, according to Swann's self-verification theory, people are motivated to search for evidence confirming their self-concept, and this motivation influences perceptual information processing (Snyder and Swann, 1978; Murray et al., 2000) as well as social interactions (Swann et al., 1989, 1994). Specifically, depressed individuals were more prone to interact with partners who perceived them unfavorably and were indeed more alienated and rejected than nondepressed individuals (Swann et al., 1992).

Another approach for understanding this vicious cycle comes from Beck's cognitive model, which states that depressed individuals are likely to process information in a dysfunctional manner and that this biased acquisition and processing style contributes to the maintenance of their psychopathology (Beck, 1967, 2008). Studies supporting this notion stress the causal role of biased attention in increased emotional vulnerability and investigate how interventions that modulate biased processing affect psychopathological disorders (for related reviews see Mathews and MacLeod, 2002; Browning et al., 2010; Hakamata et al., 2010; Wells et al., 2010, and the special section on cognitive bias modification in The Journal of Abnormal Psychology, Koster et al., 2009, but see Hallion and Ruscio, 2011). For example, the induction of an attentional bias by manipulating the location of threatening/neutral words prior to the presentation of to-be-detected probes in the dot probe paradigm modified individuals' response time and consequently affected their mood during a standardized stress manipulation. Specifically, the group that was biased toward threat by being presented with threatening words prior to the probe exhibited a greater increase in negative mood during the following stress task, compared to the group presented with neutral words (MacLeod et al., 2002).

Thus, previous studies imply that biased information processing, and specifically the processing of social-emotional information, may play a primary role in the development and maintenance of psychopathology (Beck, 1967; Mathews and MacLeod, 2005; Bar-Haim et al., 2007; Clark et al., 2009; Disner et al., 2011; Roiser et al., 2012), in turn affecting interpersonal relations (Swann et al., 1992; Shahar, 2006). This process is illustrated in Step 2 of **Figure 1**.

Previous studies have attested to segments of the processes presented in **Figure 1**. For example, anxiety has been associated with the tendency to attend to threatening information [e.g., the emotional Stroop (Stroop, 1935), the dot probe task (MacLeod et al., 1986) and the emotional spatial cuing paradigm (Fox et al., 2001), for further elaborations on these tasks see reviews by Bar-Haim et al., 2007; Cisler and Koster, 2010]. Using the dot probe paradigm (see description above), it has been demonstrated that individuals with a general anxiety disorder are faster to respond to probes replacing threat words, than neutral words, as compared to controls (e.g., MacLeod et al., 1986; Mogg et al., 1992). Similarly, depression was associated with a bias toward negative congruent information, mostly due to a difficulty in disengaging from information with a negative valence (for reviews see Gotlib et al., 2004; Mathews and MacLeod, 2005; Clark et al., 2009; Gotlib and Joormann, 2010; Disner et al., 2011; Roiser et al., 2012). Relatedly, in the emotional Stroop task, in which participants' response time to name the color of an emotional written word indicates their ability to disengage from the emotional context, depressed patients were slower to name the color of negative emotional words, compared to non-depressed controls (Gotlib and McCann, 1984; Broomfield et al., 2007).

In relation to the above, Roiser et al. (2012) proposed a cognitive neuropsychological approach for the understanding and treatment of depression. This model is based on a presumed causal chain linking negative information processing (e.g., biased emotional perception, attention and memory) to the development of symptoms of depression. Presumably, such a cognitive bias is affected by alterations in biological factors (e.g., monoamine transmission via different brain circuits involved in affective regulation and processing) and their interactions with both environmental and genetic factors (see the detailed model in Roiser et al., 2012). Importantly, this model was based on results obtained from a longitudinal design (Forbes et al., 2007) as well as on studies conducted with individuals at risk for developing depression or with individuals who have recovered from it (see Roiser et al., 2012).

Within this general theoretical template, our own particular contribution lies in the focus on biased processing of emotional facial expressions as depicted in Step 3 of **Figure 1**.

Given its unique evolutionary and social significance, face perception is probably the most multifaceted visual perceptual skill in humans. In addition to invariant information such as identity and gender, faces convey a large amount of subtle, variant, changeable information such as age (Ishai, 2008), expressions (Fox et al., 2000), intentions (van't Wout and Sanfey, 2008) and mood (Adolphs, 2003) upon which human observers rely for social interaction and communication. A wealth of behavioral literature posits that this efficient and multifaceted processing of faces is accomplished *in a qualitatively different* fashion compared to the processing of other object categories. Specifically, deriving a rapid and accurate representation of the face requires a disproportionate reliance on the configuration of the physical features of the face relative to that required for non-face object recognition (Behrmann et al., in press). This *holistic processing* is considered a hallmark of face perception (Farah et al., 1995; Richler et al., 2011; Behrmann et al., in press; DeGutis et al., 2013). Neuroimaging studies in humans collectively point to a number of "core regions" that show selective responses associated with the visual invariant, as well as variant, properties of faces. Additionally, there are a number of regions outside the occipitotemporal cortex that constitute an "extended" face recognition system with unique roles in processing high-level attributes of face perception such as memory and emotion (Haxby et al., 2000; Gobbini and Haxby, 2007; Ishai, 2008).

Of all the different types of information embedded in the face, facial expressions are of most relevance to the present investigation. Emotional face perception constitutes a key mechanism for social communication which is crucial for forming appropriate actions during social interactions (Öhman and Mineka, 2001; Haxby et al., 2002; Russell et al., 2003). Individuals' facial expressions allude to the expresser's emotional state and may elicit a similar response in the observer (Haxby et al., 2002). The preference to look at face-like stimuli can be observed in newborns (Johnson et al., 1991), and first signs of facial expression recognition abilities are witnessed during the first year of life (Walker-Andrews, 1997; De Haan and Nelson, 1998; Farroni et al., 2007). Moreover, the process of recognizing an emotion from a face in order to produce conceptual knowledge of this expression was suggested to involve areas in the core and extended systems via their anatomical and functional connections (Adolphs, 2002).

Furthermore, and most pertinent to our proposed model, psychopathological disorders were shown to be closely associated with biased processing of emotional face stimuli (see Mathews and MacLeod, 2005; Cisler and Koster, 2010; Yiend, 2010, for reviews). For example, individuals suffering from comorbid anxiety and depression recognized angry expressions better than happy and neutral expressions, a pattern that is reversed compared to controls (Gilboa-Schechtman et al., 2002). Additionally, Jermann et al. (2008) reported a positive correlation between depressive symptoms and the conscious recollection of sad expressions. Moreover, socially phobic patients better recalled faces that they judged as "critical" at the learning phase, while non-anxious controls performed better with faces that were judged as "safe" (Lundh and Öst, 1996; Coles and Heimberg, 2005).

The notion of biased attention toward recognition of facial expressions is also related to the idea that individuals' thoughts and feelings about themselves are closely related to the way in which they believe others perceive them (Cooley, 1902; Sullivan, 1953; Shraugher and Schoeneman, 1979). Moreover, the way individuals perceive themselves affects the way they perceive others (Swann, 1983, 1990; Leary, 1990). This notion is well captured in the seminal quote by Merleau-Ponty (1964) "*I begin to live my intentions in the facial expressions of the other and likewise begin to live the other's volitions in my own gestures" (p. 119*). Thus, through the prism of emotional face perception, which is shaped by one's own self-views, individuals interpret their social environments, and this subjective interpretation may in turn affect psychopathology and project back on their self-perception.

But what about resilience to psychopathology? Individuals have the ability to adapt, cope and maintain a stable equilibrium in the face of life stressors (Rutter, 1985; Richardson, 2002; Bonanno, 2004; Shahar et al., 2012). Yet, the question of why some people are more emotionally resilient than others still awaits an answer. We suggest that the relation between resilience factors, such as personality traits or social variables, and the processing of emotional facial expressions may be informative for understanding risk/resilience to psychopathology in terms of prevention: by investigating what makes some people more immune to the effects of negative valence or, alternatively, more receptive to positive valence, we may be able to identify those individuals who are most vulnerable to adverse circumstances (Hauser et al., 2006; Shahar, 2012). Step 4 of **Figure 1** depicts our full-fledged model.

A number of factors have been associated with resilience, among them having high self-esteem or self-efficacy (Garmezy, 1991; Werner and Smith, 1992; Rutter, 1993; Masten, 1994), having emotional stability, extraversion or agreeableness (Friborg et al., 2005) and reporting elevated levels of perceived social support (PSS; Cohen and Wills, 1985; Kessler et al., 1985; Cohen et al., 2000; Cohen, 2004; Uchino, 2006; Lakey and Orehek, 2011). These factors were shown to contribute to positive outcomes and protect against negative ones. For example, social support has been shown to protect against a wide variety of adverse outcomes including depression (Lakey and Cronin, 2008), post-traumatic stress disorder (Brewin et al., 2000), and physical illness (Uchino, 2006) and to promote positive consequences such as self-care (Graven and Grant, 2014), coping strategies (Cohen and Wills, 1985; Davis and Swan, 1999; Wills and Fegan, 2001), self-control (Wills and Bantum, 2012) and optimism (Karademas, 2006). Importantly, there is almost no research on the possible underlying mechanisms mediating these effects, particularly from the neuro-cognitive perspective, let alone research focusing on face perception.

### **PROJECT PAVE**

Project PAVE was launched in order to examine our proposed link between vulnerability/resilience, emotional face perception, and self/social functioning. In the following sections we will describe the findings emanating from this project and note some future directions and implications.

First, we examined the associations between emotional face recognition and general self-efficacy (GSE), a central dimension of personal resilience pertaining to individuals' positive beliefs about their own capabilities (Bandura, 1997). We hypothesized that a happy facial expression may signal approval by others, which should be congruent with the perceiver's high self-worth. Thus, we predicted that GSE would be positively correlated with accurate recognition of happy facial expressions.

To test our hypotheses, we used a morph technique that blended two emotional stimuli to create a new image containing a specified percentage of each of the original stimuli (see **Figure 2**). This method enabled us to assess both accuracy and bias depending on the morph level of the dominant expression. Participants (*n* = 70) were asked to classify the expression presented in each trial. Accuracy was determined by the dominant expression within each morph blend. Prior to the behavioral task, participants completed a battery of questionnaires assessing their self-efficacy and depressive symptoms. As predicted, and even after controlling for depressive symptoms (in this, as well as in all other studies described below), individuals with high self-efficacy showed a specific bias towards recognition of happy facial expressions. We interpreted this effect as a way to form and maintain affirming relations, which may serve as a protective factor during stress (Tanzer et al., 2013a).
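As a rough illustration of the morph paradigm just described, the sketch below blends two images by a specified percentage and scores responses against the dominant expression. It is a simplified sketch under stated assumptions, not the procedure used in the studies: the function names (`blend`, `dominant`, `score`), the plain pixel cross-dissolve (dedicated morphing software typically also warps facial geometry), the handling of a 50% blend as ambiguous, and the definition of bias as the share of responses naming one expression are all hypothetical choices made here for clarity.

```python
def blend(pixels_a, pixels_b, pct_a):
    """Cross-dissolve two equal-length grayscale pixel lists.

    pct_a is the percentage (0-100) contributed by image A. This only mixes
    intensities; it does not warp facial landmarks as morphing tools do.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same size")
    w = pct_a / 100.0
    return [w * a + (1.0 - w) * b for a, b in zip(pixels_a, pixels_b)]


def dominant(pct_a, label_a, label_b):
    """Expression contributing more than half of the blend (None at 50%)."""
    if pct_a == 50:
        return None  # assumption: an even blend has no correct answer
    return label_a if pct_a > 50 else label_b


def score(trials, label_a="happy", label_b="angry"):
    """Score a list of (pct_a, response) trials.

    Returns (accuracy, bias). Accuracy is the share of unambiguous trials
    answered with the dominant expression. Bias (an illustrative definition)
    is the share of all trials answered with label_a.
    """
    trials = list(trials)
    keyed = [(dominant(p, label_a, label_b), r) for p, r in trials]
    unambiguous = [(d, r) for d, r in keyed if d is not None]
    accuracy = sum(d == r for d, r in unambiguous) / len(unambiguous)
    bias = sum(r == label_a for _, r in trials) / len(trials)
    return accuracy, bias
```

Under these assumptions, a participant who labels most blends as happy regardless of the morph level would show a high bias value toward happy together with reduced accuracy on angry-dominant trials, which is the kind of pattern the accuracy and bias measures are meant to separate.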

Next, we hypothesized that happy facial expressions would be remembered better than angry expressions, as the former may serve as a potential shelter that one could lean on and recall in a time of need. Thus we conducted another study in which participants (*n* = 92) were asked to memorize faces portraying happy/angry expressions and then (after a short interval) to recall which face was previously presented and retrieve the portrayed expression. As expected, GSE was positively correlated with better identity recognition for faces portraying a happy expression during the learning phase and with the tendency to recall the learned expression as happy. Taken together, our findings suggest that individuals with high GSE are tuned, in terms of *both* recognition and memory, to "happy others", possibly as a way of self-verification of their own positive self-views. This self-efficacious prism, through which one interprets his/her surroundings, may reduce stress and protect against potential hazards, consequently minimizing the risk for psychopathology (Tanzer et al., 2013a).

In our next line of studies we sought to examine other protective factors that are more related to the social context. Inspired by theories linking cognitive processes to interpersonal relationships (Leary, 1990, 2005; Pickett et al., 2004; Pickett and Gardner, 2005), we focused on PSS. PSS refers to the interpersonal network of resources that is available to individuals to provide help during time of need (Cohen and Wills, 1985; Lakey and Cronin, 2008; Lakey and Orehek, 2011). Based on the known role of PSS as a main protective factor against a wide range of negative life events or as a stress buffer minimizing their aversive outcomes (e.g., Cohen and Wills, 1985; Theran et al., 2006; Lakey and Cronin, 2008; Shahar et al., 2009; Lakey and Orehek, 2011), we predicted that it would be negatively associated with recognition of an angry expression, as the latter is a sign of threat one should avoid. Using the morph paradigm again, we now morphed between angry and neutral facial expressions and indeed found that individuals (*n* = 71) with elevated levels of PSS were less accurate in recognizing angry facial expressions (Tanzer et al., Submitted). Thus, positive PSS emerged as a protective factor that enables individuals to monitor their environments and overlook angry facial expressions, arguably being more open to positive and rewarding exchanges.

We also examined the impact of PSS on emotional face processing in a stressful situation induced by a failure/success manipulation (for details regarding the manipulation see Mendelson and Gruen, 2005; Tanzer et al., 2013b). Participants (*n* = 142) first completed questionnaires assessing their PSS and depressive symptoms and were then randomly allocated to a failure or a success condition, in which they were led to believe that they had either failed or succeeded at the Raven intelligence test (Raven et al., 1985). We hypothesized that PSS would act as a protective shield against hazards (e.g., an angry facial expression) in a time of need (e.g., the failure condition). Following the failure/success manipulation, they participated in the morph experiment that enabled assessing the accuracy and bias involved in recognition of emotional facial expressions (**Figure 2**). As expected, we found that in the failure group (i.e., where individuals were falsely led to believe that they had failed an intelligence test, a result bearing on their self-worth), participants with elevated levels of PSS, as compared to those with low levels of PSS, were less accurate in recognizing angry facial expressions, possibly as a way to maintain their self-worth during a time of need (Tanzer et al., 2013b).

**FIGURE 2 | Example of morph stimuli used in the experiments**. The original stimuli (AM01) were taken from the KDEF database (Lundqvist et al., 1998). In this example, the stimuli are composed of angry and happy faces morphed together to create a continuum of blends.

In a similar fashion, we continued our investigation and examined how induced social support interacts with individuals' self-worth (i.e., GSE) in relation to recognition of an angry facial expression. Participants (*n* = 54) first completed questionnaires assessing their GSE, PSS and depressive symptoms. They then took part in an imagery task, where they were asked to visualize a close partner or someone else who betrayed them in a time of need. Following this manipulation, they participated in the morph experiment. We predicted that both elements (i.e., positive support and elevated levels of GSE) would act synergistically to produce a bias against negative social cues (i.e., an angry facial expression). Such an intriguing interaction was indeed found and interpreted as a "protective shield" enabling individuals to monitor their surroundings in order to avoid recognition of angry expressions, which consequently improves their well-being (Tanzer et al., Submitted).

### **CONCLUSIONS, FUTURE DIRECTIONS, AND IMPLICATIONS**

Taken together, these biases towards positive facial expressions (e.g., happy expressions) or against negative ones (e.g., angry expressions) may suggest biased emotional face processing as an underlying mechanism in the chain that leads from personality/self-concepts or interpersonal relations to risk/resilience to psychopathology. Protective factors (e.g., GSE and PSS) may serve as a "narrow" adaptive prism through which one interprets his/her surroundings. This biased perception may consequently lead to selective attention to, or dismissal of, specific aspects of the environment, which eventually generates benevolent effects and reduces maladaptive ones. Whereas research on biased face processing in clinical populations has developed tremendously in the past decade (e.g., Mathews and MacLeod, 2005; Cisler and Koster, 2010; Yiend, 2010, for reviews), research on individual differences within non-clinical populations is still in its infancy, and we suggest that focusing on the latter would open up an important avenue for a better understanding of human behavior that, in turn, may promote psychotherapy interventions.

Our suggested model emanated from different theories in diverse subfields of psychology (i.e., clinical, social and cognitive) and neuroscience. Thus, we were inspired by Bandura (1978) on reciprocal determinism and by the perspective of action theory that stresses individuals' role in actively shaping their own environment (Lerner, 1982; Brandstadter, 1998; Shahar, 2006). Additionally, we built upon Winnicott's notion of the mirroring role of the mother as a vehicle for self-knowledge (Winnicott, 1971; see also Shahar and STREALTH LAB, in press), on social-clinical theories which aimed to explain how individuals construct self-views (e.g., the looking glass; Cooley, 1902), and how these self-views affect individuals' perception (self-verification theory; Swann, 1983, 1990). Moreover, we were influenced by theories on biased cognition such as Beck's notion of individuals' dysfunctional schemas and their effect on information processing. Finally, we were inspired by our vast interest in face processing, in relation to cognitive and developmental aspects (Behrmann and Avidan, 2005; Behrmann et al., in press; Avidan and Behrmann, 2014). As is evident, even when designing the most "basic" cognitive paradigm, one should bear in mind and take into account the existence of individual differences and the interplay between individuals' self and their outer subjective surroundings.

Our theoretical model alludes to neural mechanisms that may be involved in emotional face perception. While an extensive review of the vast literature on the neural basis of face perception lies outside the scope of this brief article (see Haxby et al., 2000; Gobbini and Haxby, 2007; Ishai, 2008; Rossion, 2014), we wish to point out the importance of focusing on the amygdala, known for its role in emotional face processing and its vast direct and indirect connections to cortical and subcortical structures, which make it an important neural "hub" (LeDoux, 2000; Davis and Whalen, 2001). Specifically, it has been suggested that regulation of emotional stimuli may be accomplished by the reciprocal connections between the amygdala and orbital and ventro-medial prefrontal cortex (Adolphs, 2002; Vuilleumier, 2005). This coupling between the amygdala and prefrontal areas has been the focus of numerous studies, which have implicated its association with genetic individual differences (i.e., genetic polymorphism; Hariri et al., 2002) and more specifically with allelic variation in the promoter region of the serotonin transporter gene (5-HTTLPR). For example, carriers of the s-allele, compared with the l-allele, of 5-HTTLPR showed elevated hemodynamic response to fearful expressions during fMRI scans (Hariri et al., 2002), which was associated with reduced coupling between the amygdala and the subgenual cingulate gyrus (Pezawas et al., 2005). Interestingly, an attentional bias toward happy facial expressions was associated with carrying the "l" allele (Pérez-Edgar et al., 2010), thus possibly implicating this genetic variable as a potential protective factor against stressful life events (Fox et al., 2009).

Evidence more pertinent to our presented model and to the suggested future directions comes from studies reporting that the strength of the functional connections (as assessed with fMRI) between the amygdala and medial prefrontal areas was associated with the size of one's social network (Bickart et al., 2012), as well as with diverse psychopathologies (e.g., anxiety: Kim et al., 2011a,b). Moreover, amygdala activation in response to happy facial expressions was associated with the personality trait extraversion (Canli et al., 2002; Canli, 2004), which might be associated with generalized self-efficacy. Furthermore, PSS was found to moderate the relation between amygdala activity in response to fearful and angry facial expressions and trait anxiety, such that only low PSS predicted the relation between amygdala activity and trait anxiety (Hyde et al., 2011).

Taken together, these different findings call for future studies that will enable their integration into a single comprehensive framework using diverse methodologies to measure functional signal in face related regions and the connectivity between these regions, as well as genetic, self and face processing measures. We hypothesize that individual differences in variables associated with self-concept will manifest in cognitive processing biases that would be related to gene polymorphism accompanied by variations in the coupling of amygdala and frontal areas. Accordingly, resilient individuals will show lower amygdala reactivity to angry faces, and this reactivity would be due to enhanced suppression from frontal areas.

Another related brain region that is considered part of the extended face processing network is the insula, known for its involvement in affective processing (Adolphs, 2002) and empathy (Wicker et al., 2003; Adolphs, 2009; Singer and Lamm, 2009). Consistent with this account, the abilities to recognize and experience facial expressions (specifically disgust) are impaired in individuals with bilateral lesions in the insular cortex (Calder et al., 2000; Adolphs et al., 2003). In addition to these roles, which may be mediated by the connectivity of the insular cortex to the amygdala, this region is also considered part of the visceral somatosensory cortex and hence may be involved in modulating introspective information (Craig, 2002, 2008) as well as mediating responses to aversive stimuli (Phillips et al., 2003a). Thus, in light of our findings, and emanating from the notion that self-perception affects how individuals modulate their outer surroundings, future studies linking insula activation and functional connectivity during emotional face recognition to self/social variables are warranted. Importantly, previous findings already allude to such an association; for example, insula activation during emotional recognition was associated with trait anxiety (Stein et al., 2007), social phobia (Gentili et al., 2008), schizophrenia and affective disorders (for review see Phillips et al., 2003b).

Moreover, future longitudinal studies should enable the construction of a more cohesive map of the relations in our proposed model. Such a line of inquiry is also expected to illuminate alternative explanations, for example that biased perception may be a consequence (Koster et al., 2009), influenced by psychopathology, resilience, or both (MacLeod et al., 2002; Mathews and MacLeod, 2005; Yiend, 2010). We stress that even though all of our studies were conducted on a non-clinical population and we controlled for depressive symptoms (Tanzer et al., 2013a,b, Submitted), we cannot completely rule out other probable explanations such as the possibility that previous psychopathological conditions (e.g., anxiety or affective disorders) might have accounted for some of the bias found in our results.

Also, individuals' past experiences and exposure to their caregivers' facial expressions might influence not only how these individuals form their sense of self, but also the saliency of these expressions later on. For example, it has been demonstrated that maltreated children directed their attention away from angry faces, as compared to controls, and interestingly, this bias to avoid threatening stimuli was dependent on the severity of the physical abuse they had suffered (Pine et al., 2005). Also, as suggested above, future studies focusing on genetic markers and their interaction with self-variables in association with biased face processing may shed more light on other possible explanations emanating from the nurture vs. nature problem (i.e., consequences vs. predispositions). Nevertheless, our experimental manipulations support the suggested interpretation that self/social variables serve as predispositions that may lead to a cognitive bias in emotional face perception (i.e., consequence), which may affect risk/resilience to psychopathology, and not vice versa. Moreover, previous studies that examined these associations and explored psychological interventions to alter biased processing found supportive evidence for such a causal link among healthy populations (Mathews and MacLeod, 2002; MacLeod et al., 2002; Browning et al., 2007, 2010; Murphy et al., 2009; Hakamata et al., 2010; Wells et al., 2010).

### **ACKNOWLEDGMENTS**

The research described in this paper was partially supported by a grant from the Israeli Science Foundation (ISF 384/10) to Galia Avidan.

### **REFERENCES**


the establishment of novel treatment for anxiety. *Biol. Psychiatry* 68, 982–990. doi: 10.1016/j.biopsych.2010.07.021


vulnerability to stress-related psychopathology. *Am. J. Psychiatry* 162, 291–296. doi: 10.1176/appi.ajp.162.2.291


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 24 April 2014; accepted: 18 July 2014; published online: 13 August 2014*. *Citation: Tanzer M, Shahar G and Avidan G (2014) Project PAVE (Personality And Vision Experimentation): role of personal and interpersonal resilience in the perception of emotional facial expression. Front. Hum. Neurosci. 8:602. doi: 10.3389/fnhum.2014.00602*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Tanzer, Shahar and Avidan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Using hypnosis to disrupt face processing: mirrored-self misidentification delusion and different visual media

## *Michael H. Connors1,2,3\*, Amanda J. Barnier1,2, Max Coltheart1,2, Robyn Langdon1,2, Rochelle E. Cox1,2, Davide Rivolta4,5,6 and Peter W. Halligan1,7*

<sup>1</sup> ARC Centre of Excellence in Cognition and its Disorders, Sydney, NSW, Australia

<sup>2</sup> Department of Cognitive Science, Macquarie University, Sydney, NSW, Australia

<sup>3</sup> Dementia Collaborative Research Centre, School of Psychiatry, University of New South Wales, Sydney, NSW, Australia

<sup>4</sup> School of Psychology, University of East London, London, UK

<sup>5</sup> Department of Neurophysiology, Max Planck Institute for Brain Research, Frankfurt am Main, Germany

<sup>6</sup> Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany

<sup>7</sup> School of Psychology, Cardiff University, Cardiff, UK

#### *Edited by:*

Aina Puce, Indiana University, USA

#### *Reviewed by:*

Aina Puce, Indiana University, USA

Devin Terhune, University of Oxford, UK

#### *\*Correspondence:*

Michael H. Connors, Dementia Collaborative Research Centre, School of Psychiatry, Level 3, Australian Graduate School of Management Building (G27), University of New South Wales, Sydney, NSW 2052, Australia e-mail: michael.connors@mq.edu.au

Mirrored-self misidentification delusion is the belief that one's reflection in the mirror is not oneself. This experiment used hypnotic suggestion to impair normal face processing in healthy participants and recreate key aspects of the delusion in the laboratory. From a pool of 439 participants, 22 high hypnotisable participants ("highs") and 20 low hypnotisable participants were selected on the basis of their extreme scores on two separately administered measures of hypnotisability. These participants received a hypnotic induction and a suggestion for either (i) impaired self-face recognition or (ii) impaired recognition of all faces. Participants were tested on their ability to recognize themselves in a mirror and other visual media – including a photograph, live video, and handheld mirror – and their ability to recognize other people, including the experimenter and famous faces. Both suggestions produced impaired self-face recognition and recreated key aspects of the delusion in highs. However, only the suggestion for impaired other-face recognition disrupted recognition of other faces, albeit in a minority of highs. The findings confirm that hypnotic suggestion can disrupt face processing and recreate features of mirrored-self misidentification. The variability seen in participants' responses also corresponds to the heterogeneity seen in clinical patients. An important direction for future research will be to examine sources of this variability within both clinical patients and the hypnotic model.

**Keywords: delusion, face perception, hypnosis, instrumental hypnosis, mirror sign, mirrored-self misidentification, self-recognition, visual self-recognition**

#### **INTRODUCTION**

Hypnotic suggestions can temporarily disrupt or alter many cognitive processes (Hilgard, 1965; Kihlstrom, 1985, 2007; Oakley and Halligan, 2009, 2013). In visual perception, for example, specific hypnotic suggestions can cause participants to hallucinate (Szechtman et al., 1998), become blind (Bryant and McConkey, 1999), or selectively ignore particular areas of their visual field (Oakley and Halligan, 2009; Priftis et al., 2011). These experiences can be very compelling – to the point that many participants have difficulty distinguishing the hypnotically suggested alterations from reality (Woody and Szechtman, 2000, 2011; Bryant and Mallard, 2003) – yet are completely reversible (Hilgard, 1965; Kihlstrom, 1985, 2007). In some cases, these alterations may even reflect changes to otherwise automatic cognitive processes (Lifshitz et al., 2013). Hypnotic suggestion is thus a powerful tool to manipulate and study cognition (Oakley and Halligan, 2009, 2013). One such application is in the study of clinical disorders (Kihlstrom, 1979). In previous work, we used hypnotic suggestion to disrupt self-recognition and "model" the neuropsychiatric mirrored-self misidentification delusion, the belief that one's reflection in the mirror is a stranger (e.g., Connors et al., 2012a). The current experiment extends this work by using hypnotic suggestion to disrupt face processing while testing both self-recognition and face recognition across different visual media.

#### **MODELLING MIRRORED-SELF MISIDENTIFICATION DELUSION**

Mirrored-self misidentification delusion commonly occurs in dementia. Approximately 2–7% of patients with Alzheimer's disease misidentify their own reflection in the mirror (see Connors and Coltheart, 2011; Connors et al., in press-b). The delusion can also occur in schizophrenia (Gluckman, 1968) and after stroke (Villarejo et al., 2011). Patients vary in their reactions to the "stranger." Some patients treat their reflection as a companion (Phillips et al., 1996). Other patients remain indifferent (Breen et al., 2001) or are deeply suspicious of the stranger (Gluckman, 1968). The delusion can occur despite intact semantic knowledge of mirrors (e.g., being able to define their properties and function; Breen et al., 2001). The delusion can also occur despite an ability to accurately recognize *other* people's reflections in the mirror (Spangenberg et al., 1998; Breen et al., 2001; Villarejo et al., 2011).

The influential two-factor theory of clinical delusions proposed by Langdon and Coltheart (2000; see also Coltheart et al., 2011) holds that two separate factors are necessary for a delusion. The first factor (Factor 1) explains the *content* of a delusion and typically involves some type of perceptual and/or emotional anomaly. In the case of mirrored-self misidentification, either impaired face processing (which leads to a difficulty in recognizing one's own face in the mirror) or mirror agnosia (an inability to use mirror knowledge when interacting with mirrors) can lead to the idea that there is a stranger in the mirror (Breen et al., 2001). The second factor (Factor 2) explains why the delusion is *maintained* and involves a deficit in belief evaluation. This second factor accounts for why some patients with impaired face processing or mirror agnosia develop a delusion and others do not (for a description of patients with these deficits without the delusion, see Ellis and Florence, 1990; Connors and Coltheart, 2011). The second factor may result from damage to the prefrontal cortex. This damage may be specific to the right dorsolateral prefrontal cortex (Coltheart, 2010), though it might also involve other areas, such as the ventromedial prefrontal cortex (Gilboa, 2010; Turner and Coltheart, 2010) or right inferior frontal gyrus (Sharot et al., 2011).

Delusions are difficult to study because of co-occurring symptoms and impairments. Mirrored-self misidentification delusion, in particular, is difficult to study because of the cognitive and neurological deterioration associated with dementia. Hypnotic suggestion allows researchers to recreate critical aspects of the delusion while avoiding some of these challenges (Kihlstrom, 1979; Kihlstrom and Hoyt, 1988; Cox and Barnier, 2010; Connors, 2012). Hypnotic suggestion is able to recreate many of the "surface features" of mirrored-self misidentification. The majority of high hypnotisable participants ("highs"), for example, who are hypnotized and given a suggestion to see a stranger in a mirror, report this experience and show features strikingly similar to clinical patients (Barnier et al., 2008, 2011). Participants, for example, maintain this belief when challenged and interact with their reflection as if it were another person.

Hypnosis may also be able to model the underlying neuropsychological processes of mirrored-self misidentification delusion as specified by the two-factor theory. Whereas a suggestion for impaired face processing or mirror agnosia may produce the content of the delusion (Factor 1), hypnosis by itself may disrupt belief evaluation (Factor 2; Connors et al., 2012a,b, 2013). People tend to accept ideas during hypnosis that they would normally reject in an ordinary, everyday state of consciousness (Shor, 1959). In support of this, previous research has shown that a hypnotic induction by itself reduces the ability of highs to distinguish between suggested and real events (Bryant and Mallard, 2003); encourages more holistic, rather than detail-oriented, processing of visual memory (Crawford and Allen, 1983); and affects brain areas, such as the upper pons, thalamus, rostral areas of the right anterior cingulate cortex, prefrontal cortex, and right inferior parietal lobule, that are involved in attention, absorption, and critical thinking (Rainville et al., 2002; Oakley, 2008; Deeley et al., 2012). Our previous research has compared participants given suggestions either with or without hypnosis to manipulate Factor 2 and demonstrated that hypnosis is necessary for most participants to experience the delusion (Connors et al., 2012a, 2013). Specific suggestions within hypnosis may thus allow researchers to create a laboratory model of mirrored-self misidentification and hence the unique opportunity to investigate selective cognitive influences in a controlled manner.

#### **RESPONSES TO DIFFERENT VISUAL MEDIA**

Given the apparent success of the hypnotic modeling paradigm so far, the current experiment aimed to better define some of the parameters of hypnotic mirrored-self misidentification. In particular, we focused on the impaired face processing (Factor 1) thought to be responsible for the delusion's content and sought to extend previous research in three ways. First, we examined whether hypnotic disruptions to self-face recognition generalized to include other visual media. This is directly relevant to the clinical disorder. Some patients with mirrored-self misidentification, for example, remain able to recognize themselves in photographs (Phillips et al., 1996; Breen et al., 2001) and small, handheld mirrors (Kumakura, 1982; Feinberg, 2001). Other patients, however, fail to recognize themselves in photographs (Biringer et al., 1991) or in any type of mirror or reflective surface (Gluckman, 1968; Spangenberg et al., 1998). In healthy participants, there is also evidence that self-face recognition in photographs involves different neural mechanisms to mirror images (Butler et al., 2012; Suddendorf and Butler, 2013).

Second, we examined whether the hypnotically mediated disruptions to face processing affected recognition of *other* people's faces. In the clinical condition, patients with mirrored-self misidentification vary in the extent to which they can recognize images of other people. Whereas some patients recognize people other than themselves in the mirror (e.g., Spangenberg et al., 1998; Breen et al., 2001; Van den Stock et al., 2012), other patients report that all people in the mirror are strangers (Phillips et al., 1996; Breen et al., 2001). Some patients are also impaired in recognizing famous faces (Breen et al., 2001). The current experiment therefore examined whether participants recognized the hypnotist in the mirror, a photograph of a person familiar to them (their lecturer), and a series of famous faces.

Finally, we attempted to create a more general deficit in face processing and examined whether the type of impairment specified in the hypnotic suggestion affected participants' responses. This is theoretically important because there are different views on the type of face processing deficit responsible for mirrored-self misidentification. The account by Phillips et al. (1996) implies that a deficit specific to self-face recognition is responsible for the content of the delusion and explains why some patients can recognize other people in the mirror but not themselves. An alternative account by Breen et al. (2001; see also Langdon, 2011), however, suggests that a more general face processing deficit is responsible for the content of the delusion and is evident in neuropsychological tests of face processing in some patients. Against this background, this experiment compared two suggestions to help disambiguate different types of face processing deficit. The first suggestion was the Factor 1 suggestion for impaired face processing used in previous work (Connors et al., 2012a, 2013, in press-a). This suggestion indirectly implied that participants would only fail to recognize their own face in the mirror, so is referred to here as the *suggestion for impaired self-face recognition.* The second suggestion was a new suggestion designed to impair recognition of all faces. It is referred to here as the *suggestion for impaired general-face recognition*.

#### **OVERVIEW OF THE CURRENT EXPERIMENT**

A hypnotist provided high and low hypnotisable participants with a hypnotic induction and either a suggestion for impaired self-face recognition or a suggestion for impaired general-face recognition. The experimenter then asked participants to identify who they saw in a mirror and in a series of photographs that included participants' own photograph and a photograph of a high-profile lecturer from their psychology course. The experimenter then tested participants' ability to recognize famous faces in a forced-choice familiarity test (see Young and De Haan, 1988; Rivolta et al., 2010, 2012). After this, the experimenter tested whether participants could identify themselves in a live video image and then a handheld mirror. Next, to assess participants' understanding of mirrors, the experimenter asked them to define mirrors and to touch a ball that was only visible by its reflection in the mirror on the wall (see Connors and Coltheart, 2011). Finally, the experimenter tested participants' ability to recognize the hypnotist in the mirror when the hypnotist stood next to them.

This order of tests was not counterbalanced as previous work suggested that some challenges were more likely to break down the delusion than others (Barnier et al., 2011; Connors et al., 2012a). As a result, the tests were presented in a fixed order, starting with those considered to be least confronting and ending with the most confronting. It was expected that both suggestions would generate mirrored-self misidentification in highs, but not lows. In particular, it was expected that whereas highs given the suggestion for impaired self-face recognition would be able to recognize themselves in the other visual media and recognize other faces, highs given the suggestion for impaired general-face recognition would not recognize themselves or other faces in any media.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS AND DESIGN**

Participants were selected from a pool of 439 students (101 males, 318 females, 20 not disclosed) of mean age 22.06 years (SD = 6.25) on the basis of a 10-item modified version of the *Harvard Group Scale of Hypnotic Susceptibility, Form A* (HGSHS:A; Shor and Orne, 1962). High scorers (participants who scored 7 or greater) and low scorers (participants who scored 3 or less) were invited to participate in the current experiment, which also included an 11-item modified version of the *Stanford Hypnotic Susceptibility Scale, Form C* (SHSS:C; Weitzenhoffer and Hilgard, 1962)<sup>1</sup> in the same session. Participants received payment (\$20 for 1.5 h) for their involvement. A total of 51 participants (16 males, 35 females) of mean age 21.92 years (SD = 6.10) completed this session. Only participants who scored in the range 7–11 (*highs*) or 0–3 (*lows*) on both the HGSHS:A and SHSS:C were included in the analyses.
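The inclusion rule described above can be sketched as a small classification function. This is only an illustrative sketch: the function name `classify_hypnotisability` and the argument names are hypothetical and do not come from the original study materials.

```python
from typing import Optional

# Score ranges stated in the text (inclusive bounds).
HIGH_RANGE = range(7, 12)  # 7-11
LOW_RANGE = range(0, 4)    # 0-3


def classify_hypnotisability(hgshs_a: int, shss_c: int) -> Optional[str]:
    """Return 'high' or 'low' when both scale scores fall in the same band.

    Returns None when the participant would be excluded from the analyses,
    for example when the two scales disagree or a score falls in the 4-6 band.
    """
    if hgshs_a in HIGH_RANGE and shss_c in HIGH_RANGE:
        return "high"
    if hgshs_a in LOW_RANGE and shss_c in LOW_RANGE:
        return "low"
    return None


# Hypothetical scores, for illustration only.
print(classify_hypnotisability(8, 9))  # high
print(classify_hypnotisability(2, 1))  # low
print(classify_hypnotisability(8, 5))  # None
```

Requiring agreement across both scales is what reduces the 51 participants who completed the session to the final analysed sample.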

The final sample consisted of 22 highs (8 males, 14 females) of mean age 21.32 years (SD = 3.85), and 20 lows (7 males, 13 females) of mean age 21.15 years (SD = 5.28). Highs had a mean score of 8.05 (SD = 0.90) on the HGSHS:A and 8.91 (SD = 1.23) on the SHSS:C. Lows had a mean score of 1.60 (SD = 1.19) on the HGSHS:A and 1.55 (SD = 1.19) on the SHSS:C. Participants were tested in a 2 (hypnotisability: high vs. low) × 2 (suggestion: impaired self-face recognition vs. impaired general-face recognition) between-subjects design. Participants were asked not to participate if they had any ongoing psychological condition, problems with substance abuse, or if they had ever suffered a serious head injury or neurological illness. All participants provided written informed consent. Research was approved by the Macquarie University Human Research Ethics Committee.

#### **MATERIALS AND PROCEDURE**

The hypnotist tested participants individually in a 90-min session. This session consisted of an experimental session and a postexperimental inquiry. Both the experimental session and the postexperimental inquiry were recorded using a video camera.

#### *Experimental session*

Before the experiment, the hypnotist briefly explained the experiment and obtained participants' informed consent. Next, the hypnotist took participants' photograph using a digital camera. The hypnotist then printed the photograph, unbeknownst to participants, who were occupied completing payment forms. To do this, the hypnotist used a Canon Selphy CP780 compact photo printer to produce a standard 14.8 cm × 10.0 cm color photograph. Once printed, the photograph was placed in a photo album containing nine other photographs of faces that were produced using the same camera and printer.

The hypnotist then administered a standard hypnotic induction (∼10 min, from the SHSS:C; Weitzenhoffer and Hilgard, 1962). The hypnotist administered the first 10 items from the SHSS:C and scored participants' responses.

*Suggestion.* After these items, the hypnotist uncovered a mirror (∼40 cm × 50 cm) that was mounted on a wall next to the participants' chair. The mirror was positioned so that participants could look directly into it by turning their head to the left and leaning slightly forward (see **Figure 1**). The hypnotist gave participants one of two suggestions for a deficit in face processing. Participants were randomly assigned to receive either the suggestion for impaired self-face recognition (11 highs, 10 lows) or the suggestion for impaired general-face recognition (11 highs, 10 lows). The suggestion for impaired self-face recognition was:

<sup>1</sup>The 10-item modified HGSHS:A included: head falling, eye closure, hand lowering, finger lock, moving hands together, communication inhibition, experiencing of fly, eye catalepsy, posthypnotic suggestion, and posthypnotic amnesia; arm rigidity and arm immobilization items were removed to ensure that the procedure could be conducted within the time limits of a 1 h class. The 11-item tailored SHSS:C included: hand lowering, moving hands apart, mosquito hallucination, taste hallucination, arm rigidity, dream, age regression, arm immobilization, anosmia, negative visual hallucination, and posthypnotic amnesia; the auditory hallucination item was removed to ensure that the procedure could be conducted within the time limits of a 1 h individual session.

When you look to your left, there will be a mirror there, and you will see a person in it. When you see this person in the mirror, you will not be able to recognize this person. When you open your eyes and turn your head to your left, whilst remaining as deeply relaxed and comfortably hypnotized as you feel now, you will see a face in the mirror that you will not be able to identify, as if you have never seen this face before.

The suggestion for impaired general-face recognition was:

When you look to your left, there will be a mirror there, and you will see a person in it. When you see this person in the mirror, you will not be able to recognize this person. In fact, when you open your eyes and look around, you will not be able to recognize any person you see. That's right, whenever you see a face, it will seem unfamiliar to you and you will not be able to recognize who it is. When you open your eyes whilst remaining as deeply relaxed and comfortably hypnotized as you feel now, all faces will seem unfamiliar to you and you will not be able to recognize them.

The hypnotist checked that participants understood the suggestion. The hypnotist then asked participants to slowly open their eyes, turn their head to the left, and look into the mirror.

*Test 1: mirror 1.* The hypnotist asked participants to identify who they saw in the mirror and to briefly describe them. If participants reported seeing someone other than themselves, the hypnotist asked participants if they had ever seen this person before.

*Test 2: photograph.* The hypnotist handed participants a photo album that contained the participants' photograph and nine other photos (eight of unfamiliar faces, one of their lecturer's face) in one of four fixed randomized orders. The hypnotist asked participants to look at each photo one at a time and to indicate whether the face was familiar or unfamiliar. If participants reported that a face was familiar, the hypnotist asked participants who the person was. When this was completed, the hypnotist took the photo album from participants and asked participants to close their eyes.

*Test 3: famous faces.* The hypnotist placed a keyboard on participants' lap and started the forced-choice familiarity task of famous faces on the computer (see Rivolta et al., 2012, for more detail). As shown in **Figure 1**, the computer was positioned in the room approximately 45° to the participants' right; participants were asked to swivel their chair to face the screen directly. The hypnotist explained to participants that two faces would appear on the computer screen at the same time. One face would belong to someone famous; the other face would belong to someone who was not famous. Participants had to indicate using the keyboard which face was the famous face – that is, whether they thought the famous face was on the left or on the right. The task had 30 trials and involved 30 sets of faces: 30 famous faces (actors, politicians, and musicians who were well known to Australian participants) and 30 unfamiliar faces matched as closely as possible for age, sex, and attractiveness. The famous faces included Jennifer Aniston, Tony Blair, Sandra Bullock, George Bush, Nicolas Cage, Prince Charles, Bill Clinton, George Clooney, Kevin Costner, Tom Cruise, Robert De Niro, Johnny Depp, Cameron Diaz, Leonardo DiCaprio, Clint Eastwood, Queen Elizabeth II, Mel Gibson, Hugh Grant, Tom Hanks, Paris Hilton, Dustin Hoffman, John Howard, Nicole Kidman, Madonna, Kylie Minogue, Brad Pitt, Julia Roberts, John Travolta, Robin Williams, and Catherine Zeta-Jones. The faces were presented as black and white photographs, approximately 10 cm high, on a 51 cm × 32 cm (24″) Macintosh computer screen. The order and positioning (left vs. right) of the famous faces were randomized. Participants were approximately 50 cm from the computer screen and gave their responses by pressing relevant keys on the keyboard. There was no time limit on responses; once a response was selected, the next set of faces appeared. The hypnotist told participants they should try to be as accurate as they could and that if they were unsure they should guess (there was no emphasis on speed). After these instructions, the hypnotist asked participants to open their eyes and begin the task. When the task was completed, the hypnotist took the keyboard from participants.

*Test 4: mirror 2.* The hypnotist asked participants to look again at the mirror on their left and to identify who they saw. This was done to see whether participants maintained their delusion after the famous faces task.

*Test 5: video.* The hypnotist activated a live video feed of the participants' face and shoulders on the computer screen. This required a second video camera, focused on participants, which was concealed above the computer screen. The hypnotist asked participants to look at the computer screen and identify who they saw. The hypnotist then turned off the computer screen.

*Test 6: handheld mirror.* The hypnotist gave participants a handheld mirror to hold and asked them to identify who they saw in it. The hypnotist then took the handheld mirror from participants.

*Test 7: mirror agnosia.* The hypnotist first asked participants to define what mirrors are. The hypnotist then held a plastic ball, slightly larger than a tennis ball, above participants' shoulder and asked them to touch the ball. The hypnotist looked to see whether participants reached towards the ball or towards the ball's reflection in the mirror (as in mirror agnosia; see Connors and Coltheart, 2011).

*Test 8: mirror 3 and hypnotist's reflection.* The hypnotist asked participants to once again look at the mirror on the wall to their left. The hypnotist then moved position so that participants could see the hypnotist's reflection in the mirror. The hypnotist asked participants who they saw. If participants reported seeing the hypnotist but not themselves, the hypnotist asked them to explain how they could see the hypnotist but not themselves. The hypnotist then touched participants on the shoulder while they were looking in the mirror and asked participants what happened.

*Cancellation and deinduction.* The hypnotist canceled the suggestion by telling participants that everything was back to normal and that they were able to recognize themselves and other faces, just as they always had been able to. The hypnotist asked participants to look in the mirror once more and checked that they could recognize themselves. Next, the hypnotist gave participants the final SHSS:C suggestion for posthypnotic amnesia and administered the SHSS:C deinduction, which involved gradually awakening participants as the hypnotist counted from 20 to 1. The hypnotist then tested and canceled participants' posthypnotic amnesia.

#### *Postexperimental inquiry*

For all media (mirror, photograph, video, handheld mirror), the hypnotist asked participants to describe their experience of looking at it and to rate the extent to which they believed that they were looking at a stranger (1 = *not at all*, 7 = *completely*). The hypnotist also asked participants to repeat the famous faces task to assess whether participants showed different responses when not affected by hypnosis or suggestion. Finally, the hypnotist debriefed participants and thanked them for their time.

#### *Coding of responses*

After testing all participants, the hypnotist and a rater (who was unaware of the aims of the experiment and the conditions in which participants were tested) independently examined the videotape records of the experiment. The two raters scored whether or not participants recognized themselves in each of the different visual media. The raters also scored whether or not participants recognized their lecturer in a photograph and the hypnotist in the mirror. Interrater reliability was 100%.

## **RESULTS**

## **EXPERIENCING THE DELUSION**

Participants were scored as passing the suggestion if they identified their reflection in the mirror as someone other than themselves. Overall, 9 (82%) highs given the suggestion for impaired self-face recognition and 5 (46%) highs given the suggestion for impaired general-face recognition passed the suggestion. Fisher's exact test showed that this difference did not reach statistical significance, *p* = 0.18. No lows passed the suggestion. The 14 highs who reported seeing a stranger were asked if they had ever seen this person before. Of these, 10 (71%; 8 impaired self-face recognition, 2 impaired general-face recognition) said they had never seen the person before, 2 (14%; 2 impaired general-face recognition) said they had seen the person before, and 2 (14%; 2 impaired self-face recognition) were unsure. Consistent with previous research (Connors et al., 2013, 2014), a *post hoc* analysis revealed that highs who passed the suggestion had higher SHSS:C scores than highs who failed the suggestion, *F*(1,18) = 4.56, *p* = 0.05, η<sub>p</sub><sup>2</sup> = 0.20, but did not differ on HGSHS:A scores, *F*(1,18) = 0.24, *p* = 0.63, η<sub>p</sub><sup>2</sup> = 0.01. The remainder of the results focus on the highs who passed the suggestion unless otherwise specified.
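As an illustration of the Fisher's exact comparison of pass rates reported above (9 of 11 vs. 5 of 11 highs), the following minimal sketch computes a two-sided *p* value from the hypergeometric distribution. It assumes the common two-sided definition, in which all tables no more probable than the observed one are summed; the text does not state which definition or software was used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: suggestion (self-face, general-face); columns: passed, failed.
p_value = fisher_exact_two_sided(9, 2, 5, 6)
print(round(p_value, 2))  # 0.18
```

With only 11 highs per suggestion, a difference of this size (82% vs. 46%) remains well within what the exact test treats as compatible with no difference.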

#### **RESPONSE TO THE DIFFERENT MEDIA**

The responses of participants to the different visual media are shown in **Table 1**. Participants were scored as being impaired on these tests if they failed to identify themselves. Statistical comparisons using Fisher's exact test revealed that more highs given the impaired self-face recognition suggestion failed to recognize themselves in the photograph (*p* = 0.02) and in the mirror the second time it was presented (*p* = 0.02) than highs who received the impaired general-face recognition suggestion. There were, however, no differences between suggestions in terms of highs' responses to the video (*p* = 0.15), handheld mirror (*p* = 0.59), or the mirror on its third presentation (*p* = 1.00).

Overall, three highs (27%) given the impaired self-face recognition suggestion and one high (9%) given the impaired general-face recognition suggestion failed to recognize themselves in all visual media – these four highs maintained the suggested experience across all tests. In contrast, two highs (18%) given the impaired self-face recognition suggestion and six highs (55%) given the impaired general-face recognition suggestion recognized themselves in all visual media – these eight highs failed the suggested experience. The remaining six highs (55%) given the impaired self-face recognition suggestion and four highs (36%) given the impaired general-face recognition suggestion showed mixed responses – these ten highs recognized themselves in some media but not others. Some of these highs initially failed to recognize themselves in the mirror but breached the suggested experience during the course of the experiment. In the case of the impaired self-face recognition suggestion, two of the nine highs (22%) who initially passed the suggestion reported recognizing themselves in the mirror the second time it was presented. In the case of the impaired general-face recognition suggestion, four of the five highs (80%) who initially passed the suggestion reported recognizing themselves in the mirror the second time it was presented. This left only one high given the impaired general-face recognition suggestion who failed to recognize themselves across different visual media. These findings implied that the experience of the impaired general-face recognition suggestion broke down more quickly than the experience of the impaired self-face recognition suggestion.

**Table 1 | The number and percentage of participants who failed the visual tests.**

Tests in italics involved recognition of other people; \*test does not involve self or other recognition.

Despite this, highs given the impaired general-face recognition suggestion were more likely to not recognize other people than highs given the impaired self-face recognition suggestion. A greater proportion of highs who passed the impaired general-face recognition suggestion failed to recognize their lecturer's photograph or the hypnotist in the mirror than highs who passed the impaired self-face suggestion (**Table 1**). This difference between suggestions was also evident in the famous faces task. Participants were scored as being impaired on the famous faces task if their scores were at chance during the experiment, but significantly above it once the suggestion was canceled. As shown in **Table 1**, two highs (18%) given the suggestion for impaired general-face recognition met this criterion: They scored 10/30 and 14/30 during the experiment, but were unimpaired when they repeated the task in the postexperimental inquiry and scored 30/30 and 28/30, respectively. In contrast, no highs given the impaired self-face suggestion and no lows had difficulty completing the famous faces task (for highs, *M* = 26.68, SD = 5.08; for lows, *M* = 27.35, SD = 2.23). A repeated-measures ANOVA, however, revealed no group differences between highs and lows or between the two suggestions, most likely due to the small number of participants experiencing these effects (all *F*s < 3.22, all *p*s > 0.08, all η<sub>p</sub><sup>2</sup>s < 0.08).
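The impairment criterion for the famous faces task (at chance during the experiment, significantly above chance afterwards) can be sketched with an exact binomial test. The text does not specify how "at chance" was assessed, so the one-sided test against a guessing rate of 0.5 and the 0.05 threshold used here are assumptions, as are the function names.

```python
from math import comb


def binom_tail_at_least(k: int, n: int = 30, p: float = 0.5) -> float:
    """Probability of k or more correct responses out of n under guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def meets_impairment_criterion(
    score_during: int, score_after: int, n: int = 30, alpha: float = 0.05
) -> bool:
    """True if the score is at chance during the suggestion and above it after.

    'At chance' is taken here to mean that a one-sided exact binomial test
    against p = 0.5 is not significant at alpha (an assumed operationalisation).
    """
    at_chance_during = binom_tail_at_least(score_during, n) >= alpha
    above_chance_after = binom_tail_at_least(score_after, n) < alpha
    return at_chance_during and above_chance_after


# The two highs described in the text: 10/30 then 30/30, and 14/30 then 28/30.
print(meets_impairment_criterion(10, 30))  # True
print(meets_impairment_criterion(14, 28))  # True
```

Under these assumptions, both reported participants satisfy the criterion, whereas a participant scoring well above chance on both occasions would not.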

All participants' ratings of belief in the postexperimental inquiry are shown in **Table 2**. Ratings across the different media were compared using a mixed ANOVA with between-subject factors of hypnotisability (high vs. low) and suggestion (impaired self-face recognition vs. impaired general-face recognition) and a within-subject factor of visual media (mirror, photograph, video, handheld mirror). In all media, highs rated their belief that they were looking at a stranger higher than lows, *F*(1,38) = 33.77, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.47. There was also a significant difference between visual media, *F*(3,38) = 11.25, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.23, and a significant interaction between hypnotisability and visual media, *F*(3,38) = 8.60, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.19. Whereas highs overall reported moderate ratings for the mirror and gave declining ratings thereafter, lows reported consistently low ratings for all visual media. There was no difference between suggestions and no interactions between hypnotisability and suggestion (all *F*s < 3.63, all *p*s > 0.07, all η<sub>p</sub><sup>2</sup>s < 0.09). Overall, this indicates that the effects were limited to highs, effects declined somewhat over the visual media, and there were no clear differences between the two suggestions.
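For readers who wish to relate the reported effect sizes to the test statistics, partial eta squared can be recovered from an *F* ratio and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies it to the main effect of hypnotisability; the function name is hypothetical, and the calculation is illustrative rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of hypnotisability on belief ratings: F(1,38) = 33.77.
print(round(partial_eta_squared(33.77, 1, 38), 2))  # 0.47
```

The same identity applied to the *post hoc* comparison of SHSS:C scores, *F*(1,18) = 4.56, gives a value of approximately 0.20.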

During the postexperimental inquiry, highs who passed the suggestion described a compelling experience. When asked about their experience of looking in the mirror, highs given the suggestion for impaired self-face recognition made comments like, "It just wasn't me. I thought that if I looked in the mirror, I would see me, but it didn't look or feel like me." Another high given this suggestion said, "It was a bit bewildering actually ... I was looking at someone in there but I couldn't register who it was. I was confused. I thought, 'Who is this person?'" Highs given the suggestion for impaired general-face recognition reported similar experiences. One high given this suggestion, for example, said, "It was weird. I know when you look in the mirror, it's meant to be you, but it was just unfamiliar. I just didn't recognize it was me." Another high given this suggestion said, "I actually felt like there was actually another person in the mirror. That another person was looking back at me. They felt familiar, but I didn't know who they were."

**Table 2 | The postexperimental ratings of all participants regarding the extent to which they believed they were looking at a stranger in each of the visual media.**

Ratings were made on a scale of 1–7 (1 = not at all, 7 = completely). Standard deviations are in parentheses.

When asked about the other visual media, many highs reported experiences similar to those they had when looking in the mirror. When describing the experience of looking at his photo, one high given the suggestion for impaired self-face recognition said, "I remember looking at it and being confused, like I was in the mirror. I felt as if I should know who it was, but I didn't." Another high given this suggestion described looking at her photo in a similar way: "I eventually came to the conclusion that I had never seen this person before. It was a similar experience to when I was looking in the mirror." When asked about the live video, highs said, "It felt weird, very similar to the feeling I had when I looked in the mirror. It just felt like I should be seeing me but it wasn't me. Sort of familiar, like feeling familiar with it, but also very unfamiliar." Other highs made comments like, "He looked very familiar. It looked like the guy in the mirror" and "I didn't think it looked like me. It just felt like someone really foreign, someone I wasn't familiar with."

When asked about the handheld mirror during the postexperimental inquiry, one high said he saw, "The same thing [as the mirror]. Just familiar but unfamiliar. Not what I would normally expect to see and feel." The one high who received the suggestion for impaired general-face recognition and maintained the delusion reported that she did not remember her experiences looking in the mirror. Such unsuggested posthypnotic amnesia is rare (Hilgard and Cooper, 1965; Hilgard, 1966; Cooper, 1979), but was also present in a participant in a previous experiment (Connors et al., 2012b). The other high given this suggestion who was impaired on the famous faces task described his experience as very compelling: "I found it extremely difficult. They both just looked famous, I could not really tell. Sometimes I could tell them apart after a while but sometimes I just had no clue who it was." However, these highs were in the minority; the majority of highs reported recognizing the famous faces and recognizing themselves in the handheld mirror.

A number of highs who did not show the delusion reported that they had some difficulty recognizing themselves. Three highs who received the suggestion for impaired general-face recognition said that they were initially unsure who they were looking at. Two of these highs said that they concluded it was them when they noticed the person in the mirror was wearing the same clothes as them, and the third said he recognized it was him when he saw the person move at the same time as he did. Likewise, four highs who displayed the delusion and failed to recognize themselves in the mirror (two impaired self-face recognition, two impaired general-face recognition) reported some initial difficulty recognizing themselves in the live video. These highs said they concluded it was themselves because they recognized the room they were in. A further three highs who experienced the delusion after the suggestion for impaired self-face recognition said that they had difficulty recognizing themselves in the handheld mirror but that the fact that they were holding and controlling it led them to believe it was themselves. Finally, one high (given the suggestion for impaired self-face recognition) breached her delusion after the hypnotist appeared next to her. This participant described having difficulty reconciling her subjective experience with what she knew to be true: "When you moved behind me, I realized it had to be me in the mirror. I still had some doubts though. My experience was that I still didn't think it was me, but logically it had to be me."

## **DISCUSSION**

### **OVERVIEW**

Both hypnotic suggestions disrupted the ability of highs to recognize themselves in the mirror. Highs, however, showed a different pattern of responses to the other visual media depending on the nature of the suggestion received. When tested on their ability to recognize themselves in other visual media, a proportion of highs given the suggestion for impaired self-face recognition failed also to recognize themselves in a photograph, in a live video, and in a handheld mirror. In contrast, only one high who received the suggestion for impaired general-face recognition failed to recognize herself in other visual media. When tested on their ability to recognize other faces using the famous faces task, no highs given the suggestion for impaired self-face recognition were impaired, whereas two highs given the suggestion for impaired general-face recognition were impaired. Although these findings are obviously limited by the small numbers of highs passing and maintaining the delusion, the findings show the potential for these two suggestions to model different aspects of mirrored-self misidentification.

#### **SELF-FACE RECOGNITION IN DIFFERENT VISUAL MEDIA**

As in previous work (Connors et al., 2012a), a hypnotic suggestion for impaired self-face recognition was able to recreate the surface features of the mirrored-self misidentification delusion. In particular, participants reported that their reflection was not themselves and maintained this belief over time. The current experiment extended previous findings by examining how participants with the hypnotic delusion responded to different visual media. The findings show that this suggestion affected the ability of some highs to recognize themselves in other visual media, despite not directly specifying this in the suggestion. As expected, however, the suggestion for impaired self-face recognition did not impair the ability of highs to recognize other people. Highs given this suggestion correctly identified their lecturer's photograph, identified the hypnotist in the mirror, and were not impaired in the famous faces task. These highs also showed an intact procedural understanding of mirrors. These findings indicate that hypnotic suggestion might be able to selectively impair self-face recognition in some participants. Nevertheless, this pattern of responses differs from that of some clinical patients with mirrored-self misidentification, who often show more general deficits in face processing (Phillips et al., 1996; Breen et al., 2001; Van den Stock et al., 2012).

This experiment used a new suggestion – a suggestion for impaired general-face recognition. This suggestion for impaired general-face recognition, however, did not seem to be as successful at generating mirrored-self misidentification as the original suggestion. Fewer participants receiving this suggestion reported the delusion than those receiving the original suggestion, although this difference did not reach statistical significance. The resulting delusion also broke down quickly, leaving only one participant who maintained the delusion through all the tests. This participant failed to recognize herself or other people in any of the different visual tests, yet showed an intact procedural understanding of mirrors. Although limited by the single participant, this high demonstrates that it is possible to generate a general face processing deficit using hypnotic suggestion. Importantly, two highs given the suggestion for impaired general-face recognition were impaired on the famous faces task. These participants performed at a level very similar to patients with prosopagnosia, a condition in which people have difficulty recognizing faces (Behrmann and Avidan, 2005; Rivolta et al., 2013) and show impairments in recognizing famous faces in forced-choice tasks (Young and De Haan, 1988; Rivolta et al., 2012). Unlike patients, however, these two participants showed no sign of impairment once the suggestion was canceled. These findings indicate that hypnotic suggestion can create a general face processing deficit that can be measured on a formal neuropsychological test. The findings are consistent with Oakley and Halligan (2013), who used hypnotic suggestion to model prosopagnosia in a single participant. The current experiment replicated this finding using a more stringent, forced-choice measure, though only in two participants. Together, these findings indicate that hypnotic suggestion may be able to disrupt face processing in certain high hypnotisable participants. However, the fact that only 18% of highs given this suggestion showed this deficit reveals the difficulty of this type of hypnotic suggestion (as a comparison, 23% of highs in this experiment passed the suggestion for negative visual hallucination – to not see a specific object – in the SHSS:C; this suggestion is known to be difficult even for highs; Hilgard, 1965).

Other factors may have also prevented some participants from responding to the suggestion for impaired general-face recognition. Three participants reported in the postexperimental inquiry that they felt anxious when they heard this suggestion and were worried about what it would be like to not recognize faces. None of these participants experienced the delusion and it is possible that their anxiety interfered with their response to the suggestion. A fourth participant reported in the postexperimental inquiry that she had difficulty imagining what it would be like to not recognize faces. This participant likewise did not develop the delusion and it is possible that her difficulty anticipating the effects of the suggestion prevented her from responding. Overall, these findings highlight a limitation of using hypnosis to model clinical conditions. Responses are affected by factors such as the participants' expectations and interpretations, as well as the relative difficulty of the suggestion. It is thus not the verbatim suggestion, but the participants' interpretation of the suggestion and ability to experience it that shapes their response (McConkey, 1991, 2008). It is important to consider these factors when designing a hypnotic analog (see Connors et al., 2012b).

For both suggestions, a proportion of highs breached the hypnotic delusion during the visual tests. The visual tests, although not designed to challenge participants' hypnotic experiences, provided accumulating evidence against the hypnotic delusion and this may have led some highs to breach their delusion. As a result, it is difficult to compare the different tests because they were given in a single order that was designed to minimize breaching. However, the fact that some highs breached the delusion is consistent with previous research, which found that directly challenging the hypnotic delusion with confronting evidence led some participants to breach the delusion and report seeing themselves in the mirror (Connors et al., 2012a). The finding is also consistent with research that has found that a proportion of highs experiencing a hypnotic delusion (Noble and McConkey, 1995; Cox and Barnier, 2009) or posthypnotic amnesia (Kihlstrom et al., 1980; McConkey and Sheehan, 1981; Coe, 1989; Coe and Sluis, 1989) breach their experience in response to challenges. Hypnotic effects require participants to resolve the conflict between objective reality and the suggested experience (McConkey, 1983; Mallard and Bryant, 2006). Challenges both draw attention to and increase this conflict, leading some participants to breach the suggested effect. Nevertheless, a proportion of highs maintain their hypnotic responses in the face of confronting evidence and an important question for future research is whether particular individual differences predict whether participants maintain or breach their hypnotic experience (Connors et al., 2014).

#### **HETEROGENEITY IN RESPONSES**

As in previous work (Connors et al., 2012a), hypnotized participants displayed considerable variation in their responses to hypnotic suggestions and this variation corresponds to heterogeneity seen in clinical reports. Both hypnotized participants and clinical patients, for example, vary in the extent to which they recognize themselves in photographs, video, and handheld mirrors (Biringer et al., 1991; Breen et al., 2001; Connors and Coltheart, 2011). For both hypnotized participants and clinical patients, it is likely that the specific properties of the different visual media influence self-recognition. These properties may, in part, explain why some participants (and patients) recognize themselves in some visual media but not in others. Mirrors, for example, offer movement and depth cues that are not present in photographs. As a result, mirrors provide a highly realistic image that could be confused with a real person, whereas photographs provide a static, two-dimensional image that is unlikely to be confused in the same way (see also Butler et al., 2012; Suddendorf and Butler, 2013). In a similar way, a handheld mirror shows just the face in its narrow field of vision and is accompanied by greater physical control of the visual image than a larger mirror on the wall. All these cues could lead some participants and patients to identify themselves in a handheld mirror, despite being unable to identify themselves in a larger mirror on the wall and despite having a clear understanding of how mirrors operate.

A large part of the variability, however, may also originate from the participants and patients themselves. Within the hypnotic model, for example, there are a number of sources of variation. Highs might interpret the same verbal suggestion in different ways to each other (see McConkey, 1991, 2008) and/or differ in their ability to experience specific types of hypnotic effects (see Woody et al., 2005). As a result, they may have different responses to the visual media. Highs also could use different cognitive strategies to experience the suggestion and this could lead to different responses (McConkey, 1991, 2008; McConkey and Barnier, 2004). Previous research, for example, has shown that highs using a constructive strategy (in which they actively use cognitive strategies to experience the hypnotic suggestion) were more likely to pass a suggestion for hypnotic blindness than participants using a concentrative strategy (in which they focused on the hypnotist's words; Bryant and McConkey, 1990). In addition, highs could vary in terms of how completely they respond to the suggestion (see Spanos, 1986). Cognitive-delusory suggestions tend to be more difficult to experience, even for highs, and highs could vary in their ability to generate a compelling and vivid experience.

In the clinical delusion, there are a number of other sources of variation. The variability, for example, could be due to the specific aspects of face processing that are impaired (see Langdon, 2011). The influential model of face processing by Bruce and Young (1986) holds that face processing involves a sequence of stages. These stages include encoding the structural properties of a face, experiencing a sense of familiarity if the face is known, accessing semantic information about the person, and naming the person. Patients who only have impairment at a late stage of face processing (such as in accessing semantic information or naming) may still experience a sense of familiarity when looking at images of themselves in some media. This sense of familiarity could provide the basis of self-face recognition in these instances (see Mandler, 1980). In contrast, patients who have more pervasive impairments or impairment at an earlier stage of face processing (such as in encoding the structural properties of faces) may fail to experience even this sense of familiarity when looking at images of themselves. As a result, these patients may fail to recognize themselves in all media. Future research could investigate this possibility by directly testing clinical patients and potentially also by using hypnotic models.

#### **IMPLICATIONS AND FUTURE DIRECTIONS**

The current study has a number of limitations that could be addressed in future work. Given the significant variability evident among participants, larger sample sizes will be required to fully define the nature of the face-perception deficits and examine the role of individual differences. Future research could also formally test for both familiarity and recognition, use other types of face processing tests, and use larger numbers of trials to detect smaller effects. In addition, future research could examine the specific visual cues that participants use to recognize themselves in different media. Research, for example, could vary the size of the image in each medium, use time-delayed video footage to remove contingency cues, and disguise the video monitor as a mirror by placing a frame around it to alter expectations associated with the medium. As mirrors present images in a different orientation to photographs, reversing the axis of left and right, future research could also examine the role of this visual transformation by presenting photographs of participants and famous faces in this orientation. Finally, given that several participants suggested that their anxiety might have prevented them from experiencing the impaired general-face recognition suggestion, future research could consider revising the wording of this particular suggestion to make it appear more benign. This could be done, for example, by emphasizing that the effect would only be temporary and by suggesting that participants might find the experience both pleasant and interesting.

In addition to these issues, we acknowledge a number of important differences between clinical delusions and hypnotic models. Clinical delusions are functionally disruptive, and typically endure for long periods of time and across different contexts. In contrast, hypnotic delusions are short-lived, highly contextualized, and limited to the laboratory (see Barnier et al., 2008; Cox and Barnier, 2010). These differences between clinical and hypnotically suggested delusions obviously limit the ability to generalize experimental findings to clinical patients. For example, the longer duration of clinical delusions may lead to more extensive elaboration of the delusion, compared to the shorter exposure in otherwise healthy controls and where delusions are observed at their inception. Indeed, clinical patients with mirrored-self misidentification can often seem accustomed, even indifferent, to the stranger and attribute names and details to them (Breen et al., 2001). In contrast, many hypnotized participants appear surprised or shocked to see a stranger in the mirror and report not having seen the person before or knowing who they are. This difference in timeframe may be useful to simulate the experiences of patients when their delusion first forms, which is usually not possible to study directly in clinical patients. It is also important to note, though, that some aspects of the delusion may not be captured in the hypnotic model as they may require the persistence of the experience over long periods of time and across different contexts. It is also important to recognize that, despite our focus on a monothematic delusion, many patients reporting this belief may experience other clinically related symptoms as a result of their overall condition (see Brodaty et al., 2013a,b).

Despite these differences, specific hypnotic suggestions could be used to test theoretical accounts of other clinical delusions. Other delusions, such as Capgras (the belief that a loved one is replaced by a visually similar impostor) and Frégoli (the belief that familiar people are following one around in disguise), may be due, in part, to disorders in face processing (Ellis and Young, 1990; see also Coltheart et al., 2011; Langdon, 2011). In Capgras delusion, loss of autonomic responsiveness to faces may lead to the idea that a known person has been replaced by an impostor. In Frégoli delusion, heightened autonomic responsiveness to faces may lead to the idea that strangers are known people in disguise. Future research could use hypnotic suggestion to manipulate face processing and model these other clinical delusions. According to Langdon and Coltheart's (2000) two-factor theory, a deficit in belief evaluation is also necessary for a delusion to form. In other research we have conducted (Connors et al., 2012a, 2013), we have found that a hypnotic induction can model this Factor 2 and specifically disrupt belief evaluation. It remains possible, however, that some individuals may not need to have a deficit in belief evaluation hypnotically induced in order to accept a suggestion for a delusional belief (Connors et al., 2013). In particular, pre-existing differences in the belief evaluation process could themselves act as Factor 2 and predispose certain individuals to delusions. Within hypnosis, there is also some evidence of variability in how highs rate their subjective experiences of a hypnotic induction (Terhune and Cardeña, 2010) and in how they objectively respond to suggestions following different types of hypnotic inductions (Brown et al., 2001). An important direction for future research, therefore, is to characterize the nature of Factor 2 in both clinical patients and hypnotic analogs.

Hypnotic suggestions can also be used to investigate face processing independently of delusional belief. Specific suggestions can be designed to selectively impair specific stages of face processing within cognitive models. Adopting Bruce and Young's (1986) influential account, for example, a suggestion to not be able to discriminate features in faces could disrupt the structural encoding of faces, a suggestion to not recognize familiar faces could disrupt face recognition units that represent previously seen faces, and a suggestion to not be able to recall personal information about faces could disrupt the person identity nodes that link recognized faces to knowledge about the people. The ability to produce these effects on demand makes hypnotic suggestion particularly suited to neuroimaging (Oakley and Halligan, 2009, 2013; Woody and Szechtman, 2011). Future research could examine the underlying functional neuroanatomy and altered functional connectivity associated with hypnotic disruptions to face processing. Such investigations have the potential to inform neural models of face processing (see Gobbini and Haxby, 2007; Haxby and Gobbini, 2011; Kanwisher and Barton, 2011). While it is important to carefully screen participants both on their hypnotisability and their ability to experience these specific suggestions in order to carry out such research, hypnotic suggestion provides a unique means of examining how higher-order cognitive processes influence different stages of face perception. As such, hypnosis offers considerable promise as a methodology to study both face perception and its pathologies.

## **ACKNOWLEDGMENTS**

We are grateful to Graham Jamieson, John Kihlstrom, and Andrew Young for helpful comments on an earlier version of this manuscript. We are also grateful to Jocelyn Elliott and Talia Morris for their research assistance. This research was supported by the Australian Research Council Centre of Excellence in Cognition and its Disorders (CE110001021). Davide Rivolta was supported by the LOWE grant Neuronale Koordination Forschungsschwerpunkt Frankfurt (NeFF).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 October 2013; accepted: 11 May 2014; published online: 18 June 2014. Citation: Connors MH, Barnier AJ, Coltheart M, Langdon R, Cox RE, Rivolta D and Halligan PW (2014) Using hypnosis to disrupt face processing: mirrored-self misidentification delusion and different visual media. Front. Hum. Neurosci. 8:361. doi: 10.3389/fnhum.2014.00361*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Connors, Barnier, Coltheart, Langdon, Cox, Rivolta and Halligan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Individual differences in cortical face selectivity predict behavioral performance in face recognition

## *Lijie Huang1,2†, Yiying Song1,2†, Jingguang Li 1,2, Zonglei Zhen1,2, Zetian Yang1,2 and Jia Liu2,3\**

*<sup>1</sup> State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China*

*<sup>2</sup> Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China*

*<sup>3</sup> School of Psychology, Beijing Normal University, Beijing, China*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*Valerie Goffaux, KU Leuven, Belgium; Nicholas Furl, MRC Cognition and Brain Sciences Unit, UK*

#### *\*Correspondence:*

*Jia Liu, School of Psychology, Beijing Normal University, Room 405, Yingdong Building, 19 Xinjiekouwai St., Haidian District, Beijing 100875, China e-mail: liujia@bnu.edu.cn*

*†These authors have contributed equally to this work.*

In functional magnetic resonance imaging studies, object selectivity is defined as a higher neural response to an object category than other object categories. Importantly, object selectivity is widely considered as a neural signature of a functionally-specialized area in processing its preferred object category in the human brain. However, the behavioral significance of the object selectivity remains unclear. In the present study, we used the individual differences approach to correlate participants' face selectivity in the face-selective regions with their behavioral performance in face recognition measured outside the scanner in a large sample of healthy adults. Face selectivity was defined as the z score of activation with the contrast of faces vs. non-face objects, and the face recognition ability was indexed as the normalized residual of the accuracy in recognizing previously-learned faces after regressing out that for non-face objects in an old/new memory task. We found that the participants with higher face selectivity in the fusiform face area (FFA) and the occipital face area (OFA), but not in the posterior part of the superior temporal sulcus (pSTS), possessed higher face recognition ability. Importantly, the association of face selectivity in the FFA and face recognition ability cannot be accounted for by FFA response to objects or behavioral performance in object recognition, suggesting that the association is domain-specific. Finally, the association is reliable, confirmed by the replication from another independent participant group. In sum, our finding provides empirical evidence on the validity of using object selectivity as a neural signature in defining object-selective regions in the human brain.

**Keywords: object selectivity, fusiform face area, face recognition, individual differences, functional magnetic resonance imaging**

## **INTRODUCTION**

In neurophysiological studies, a standard criterion for neural selectivity is that the response of a neuron should be at least twice as great for the preferred stimulus category as for any other stimulus category (Tovee et al., 1993). Following this principle, functional magnetic resonance imaging (fMRI) studies have identified several object-selective regions in the human ventral visual pathway, each of which responds more highly to one object category than to other object categories. These regions include the fusiform face area (FFA) responding selectively to faces (Kanwisher et al., 1997), the parahippocampal place area (PPA) responding selectively to places (Epstein and Kanwisher, 1998), the extrastriate body area (EBA) responding selectively to bodies (Downing et al., 2001), and the visual word form area (VWFA) responding selectively to visual words (Cohen et al., 2000). The object selectivity was taken as a neural signature of a functionally specialized region in processing its preferred object category. However, a fundamental question that remains unclear is whether object selectivity is indeed read out for behavioral performance on object recognition.

One of the best-documented forms of object selectivity in the fMRI literature is the selective response for faces. A number of face-selective regions have been identified in human occipital-temporal cortex: most notably, the FFA, which is localized in the middle fusiform gyrus, the occipital face area (OFA) localized in the inferior occipital gyri (Gauthier et al., 2000), and a region in the posterior part of the superior temporal sulcus (pSTS, Allison et al., 2000; Hoffman and Haxby, 2000). The face-selective regions typically respond more than twice as strongly to faces as to non-face objects (for review, see Kanwisher, 2000, 2003), and face selectivity is defined as the response difference between faces and non-face objects. Prior studies suggest a functional division of labor among the three face-selective regions, with the OFA and the FFA more involved in face recognition, whereas the pSTS is more involved in processing of dynamic and social information in faces (Haxby et al., 2000; Calder and Young, 2005). The role of the OFA and FFA in face recognition is supported by three lines of evidence. First, evidence from fMRI adaptation paradigms indicates that OFA responses show sensitivity to physical changes of faces (Rotshtein et al., 2005; Fox et al., 2009) and FFA responses are sensitive to identity changes (Andrews and Ewbank, 2004; Winston et al., 2004; Rotshtein et al., 2005; Fox et al., 2009). Second, recent studies with multivariate pattern analysis (MVPA) have found distinct response patterns induced by different individual faces in the OFA and FFA (Nestor et al., 2011; Goesaert and Op de Beeck, 2013). Third, more direct evidence of face-selective regions contributing to face recognition came from neuropsychological studies showing that lesions in approximately the locations of the OFA and FFA can lead to selective impairment in face recognition (i.e., acquired prosopagnosia, AP) (Damasio et al., 1982; Sergent and Signoret, 1992; Barton et al., 2002).
Yet, it remains unclear whether and how face selectivity obtained in fMRI studies contributes to behavioral performance in face recognition in normal participants. Several fMRI studies have indicated that face-selective responses in the FFA and OFA are related to trial-to-trial behavioral success of face recognition. For example, the activations in the FFA and OFA were higher in trials when participants successfully detected and identified a face than when they did not (Grill-Spector et al., 2004), and the spatial patterns of activation in the FFA and OFA were more stable among correct than incorrect trials in a face discrimination task (Zhang et al., 2012).

If the face-selective responses in the FFA and OFA indeed contribute to behavioral performance of face recognition, they should be related not only to the trial-to-trial behavioral success of face recognition within individual participants, but also to the individual differences in this ability across participants. Yet the evidence regarding whether individual differences in face selectivity are related to those in face recognition ability is ambiguous. An intuitive approach to examine this issue is to compare face selectivity in individuals with normal face recognition ability with that in individuals severely impaired in this ability in the absence of obvious lesions (i.e., developmental prosopagnosia, DP) (e.g., Kress and Daum, 2003; Behrmann and Avidan, 2005; Duchaine and Nakayama, 2006). However, the findings are mixed. Some studies found that face selectivity was either absent or weakened in the FFA of DP individuals (Hadjikhani and de Gelder, 2002; DeGutis et al., 2007; Minnebusch et al., 2009; Furl et al., 2011), whereas other studies found that face selectivity in the FFA was intact in DP (Hasson et al., 2003; Behrmann et al., 2005; Avidan and Behrmann, 2009). These contradictory results may be accounted for by several possible factors, such as the lack of statistical power (i.e., small number of DP participants tested), the heterogeneous nature of DP, and the possibility that the FFA might not be the neural substrate of DP. Another approach to address the relevance of face selectivity to individual differences in face recognition ability is to examine the correlation between these two measures. To date, only one study has used this approach and shown a positive correlation between face selectivity in the FFA and face recognition ability (Furl et al., 2011). However, the correlation was examined across both DP and normal participants.
Thus, it is unknown whether the correlation resulted partly from the group difference between DP and normal participants, or whether there was a linear relationship between face selectivity and face recognition ability in the normal population. Therefore, in order to overcome the limitations of previous research, here we used fMRI to examine the correlation between individuals' face selectivity in face-selective regions and their face recognition ability in a large sample of normal participants.

To do this, we first measured participants' face selectivity in the face-selective regions (i.e., the FFA, OFA, and pSTS) when they viewed faces and non-face objects in the scanner (*N* = 294). Face selectivity was calculated as the z score of activation with the contrast of faces vs. non-face objects. Second, we measured the participants' face recognition ability with an old/new memory task outside the scanner. We used a difference measure between performance with faces and performance with flowers as an index of face-specific recognition ability (FRA), which isolated processes specific to face recognition by subtracting out variances reflecting domain-general cognitive processes (e.g., general visual discrimination abilities, attention, task engagement, and decision making) (Wang et al., 2012). Third, we used an individual differences approach to examine whether the magnitudes of face selectivity in the face-selective regions were associated with participants' FRA, and, if established, whether the association was specific to face processing by controlling for irrelevant factors (e.g., response for objects or behavioral performance in object recognition). Finally, to ensure sufficient statistical power and replicability (Pashler and Harris, 2012), we performed a replication of the analysis with an independent large sample of participants (*N* = 201).

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

Two cohorts of college students were recruited from Beijing Normal University, Beijing, China. Cohort 1 consisted of 294 participants (age: 17–24, mean age = 20.7; 155 females), and Cohort 2 consisted of 201 participants (age: 18–23, mean age = 20.3; 123 females). Participants reported normal or corrected-to-normal vision. Participants with self-reported psychiatric and neurological disorders were excluded. Both behavioral and MRI protocols were approved by the Institutional Review Board of Beijing Normal University. Written informed consent was obtained from all participants prior to the experiment. Six participants (5 females) in Cohort 1 and one male participant in Cohort 2 did not take part in the behavioral test and consequently were excluded from further analyses.

## **STIMULI**

A dynamic face localizer was used in the fMRI scanning (Pitcher et al., 2011), containing colored movie clips of four object categories. Movie clips of faces were filmed on a black background, and framed close-up to reveal only the faces of 7 Caucasian children as they danced or played with toys or adults (who were out of frame). Movie clips of objects, scenes and scrambled objects were included to examine the selectivity of the FFA to faces. The objects were moving toys; the scenes were mostly pastoral scenes shot from a car window while driving slowly through leafy suburbs, along with some other videos taken while flying through canyons or walking through tunnels; and the scrambled objects were constructed by scrambling each frame of the object movie clips (for more details on the stimuli, see Pitcher et al., 2011).

## **fMRI SCANNING**

Each participant attended three runs in total, each of which lasted 3 min 18 s. Each run contained two block sets, intermixed with three 18-s rest blocks at the beginning, middle, and end of the run. Each block set consisted of four blocks with four stimulus categories, with each stimulus category presented in an 18-s block that contained six 3-s clips. The order of stimulus category blocks in each run was palindromic and was randomized across runs. During the scanning, participants were instructed to passively view movie clips containing faces, objects, scenes, or scrambled objects.
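The stated run length follows directly from the block structure. A minimal sketch that checks the arithmetic, using only the figures quoted above; the constant and function names are illustrative, and reading "palindromic" as a second block set that mirrors the first is an assumption rather than something the text spells out:

```python
# Timing of one localizer run, using only the figures quoted in the text.
REST_BLOCKS = 3        # 18-s rest at the beginning, middle, and end
BLOCK_SETS = 2
CATEGORIES = ["faces", "objects", "scenes", "scrambled"]
BLOCK_SECONDS = 18
CLIPS_PER_BLOCK = 6
CLIP_SECONDS = 3


def run_duration_seconds() -> int:
    """Total run length: stimulus blocks plus rest blocks."""
    stimulus = BLOCK_SETS * len(CATEGORIES) * BLOCK_SECONDS
    rest = REST_BLOCKS * BLOCK_SECONDS
    return stimulus + rest


def palindromic_order(first_set):
    """Assumed reading of 'palindromic': the second set mirrors the first."""
    return list(first_set) + list(reversed(first_set))


total = run_duration_seconds()
minutes, seconds = divmod(total, 60)
print(f"{minutes} min {seconds} s")   # 3 min 18 s
print(palindromic_order(CATEGORIES))
```

Eight 18-s stimulus blocks (144 s) plus three 18-s rest blocks (54 s) give 198 s, which matches the reported 3 min 18 s.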

### **IMAGE ACQUISITION**

Scanning was conducted on a Siemens 3T scanner (MAGNETOM Trio, a Tim system) with a 12-channel phased-array head coil at BNU Imaging Center for Brain Research, Beijing, China. Functional images were acquired using a gradient-echo echo planar imaging sequence (30 slices, repetition time (*TR*) = 2.0 s, echo time (*TE*) = 30 ms, voxel size = 3.125 × 3.125 × 4.8 mm<sup>3</sup>). Slices were oriented parallel to each participant's temporal cortex covering the whole brain. In addition, a high-resolution T1-weighted MPRAGE anatomical scan was acquired for registration purposes and for anatomically localizing the functional activations.

### **fMRI DATA PREPROCESSING**

Data were analyzed using tools from the Oxford Center for Functional MRI of the Brain Software Library (FSL) (Smith et al., 2004) and in-house Python codes. A 2-stage registration was used to align functional data to Montreal Neurological Institute (MNI) standard templates. First, the functional data were aligned to structural images with a linear registration; and then the structural images were warped to MNI standard template with a non-linear approach. Functional data preprocessing included high-pass temporal filtering with a high-pass cutoff of 120 s, motion correction, and spatial smoothing using a 6-mm full-width at half-maximum (FWHM) Gaussian kernel. The voxel size of functional data was resampled to 2 × 2 × 2 mm<sup>3</sup>.
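For readers less familiar with these parameters, the two preprocessing settings can be restated in more basic units. The sketch below is illustrative only: it converts the 6-mm FWHM to the standard deviation of the equivalent Gaussian and the 120-s cutoff period to a frequency. The function name is hypothetical and nothing here reproduces the FSL implementation.

```python
import math

FWHM_MM = 6.0            # smoothing kernel width from the text
HIGHPASS_CUTOFF_S = 120  # high-pass cutoff period from the text


def fwhm_to_sigma(fwhm: float) -> float:
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(FWHM_MM)
cutoff_hz = 1.0 / HIGHPASS_CUTOFF_S
print(f"sigma = {sigma_mm:.3f} mm")     # sigma = 2.548 mm
print(f"cutoff = {cutoff_hz:.4f} Hz")   # cutoff = 0.0083 Hz
```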

For the functional data of each participant, the general linear model (GLM) modeled the face, object, scene, and scrambled object stimuli as explanatory variables (EVs), convolved with a hemodynamic response function (HRF). Within the time course of each EV, the onset and duration of every stimulus were modeled. The temporal derivative of each EV was modeled to improve the sensitivity of the model. Motion parameters were entered into the GLM as confounding variables of no interest. Statistical contrasts between pairs of different object categories were evaluated. After the first-level analysis, all 3 runs from each participant were combined using a fixed-effects analysis at the second level, and the resulting images were warped into the MNI template. Finally, the resulting contrast maps from all participants were passed forward to a random-effects group-level analysis.
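To make the notion of an EV concrete, the following sketch builds one block regressor and convolves it with an HRF in plain Python. Several things are assumptions and not taken from the paper: the double-gamma HRF shape and its parameters, the number of volumes (198 s divided by the 2.0-s TR, ignoring any discarded volumes), and the face-block onsets. It does not reproduce the FSL model, which also included temporal derivatives and motion regressors.

```python
import math

TR = 2.0          # seconds, from the acquisition parameters
N_VOLUMES = 99    # assumed: 198-s run / 2.0-s TR, no discarded volumes


def double_gamma_hrf(t: float) -> float:
    """Double-gamma HRF with commonly used parameters (an assumption)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def boxcar(onsets, duration, n_volumes=N_VOLUMES, tr=TR):
    """1.0 while a block is on screen, 0.0 otherwise, sampled once per TR."""
    return [
        1.0 if any(on <= i * tr < on + duration for on in onsets) else 0.0
        for i in range(n_volumes)
    ]


def convolve_with_hrf(stimulus, tr=TR, hrf_seconds=32.0):
    """Causal discrete convolution of a stimulus time course with the HRF."""
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_seconds / tr))]
    out = []
    for i in range(len(stimulus)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if i - k >= 0:
                acc += stimulus[i - k] * h
        out.append(acc)
    return out


# Hypothetical onsets (s) of the two face blocks within one run.
face_onsets = [18.0, 162.0]
face_ev = convolve_with_hrf(boxcar(face_onsets, duration=18.0))
```

The resulting `face_ev` is flat during the initial rest block and rises only after the first face block begins, which is the delayed, smoothed shape a GLM regressor for a block design is expected to have.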

## **ROI IDENTIFICATION AND FACE SELECTIVITY CALCULATION**

The z-statistic image for the contrast of faces vs. objects in the group-level analysis was thresholded at *z* > 2.58 (one-tailed *p* < 0.005, uncorrected) and segmented into several clusters using watershed segmentation codes developed in Python (available in the scikit-image project, http://scikit-image.org). To simplify the ROI definition for a large number of participants in our study, the ROIs for each individual were defined by projecting the ROIs obtained from the group-level analysis to each individual's brain, given that the group-level analysis provided information on the location of the ROIs by summarizing the data from all participants. The FFA was defined as the region of interest (ROI), consisting of a set of contiguous voxels that were significantly activated for faces vs. objects in the fusiform gyrus in both hemispheres (30 voxels minimum). The OFA and the pSTS were defined in the same way but localized in inferior occipital cortex and the posterior STS, respectively. Face selectivity in each ROI for each participant was calculated as the average z score from the contrast of faces vs. objects across all voxels within each ROI. Note that the face selectivity of the ROI was calculated from the same set of data that were used to define the ROI; however, this bias was unlikely to affect the brain-behavior correlation, because calculation of correlation is based on the variance, not the mean. That is, the bias may inflate the mean magnitudes of face selectivity in the ROIs for all participants, but it would not inflate the individual differences (i.e., variances) of face selectivity. For further control analysis, we also extracted the average z scores in the ROIs for faces (faces > fixation) and objects (objects > fixation).
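The two numerical steps in this section, the correspondence between the z threshold and the one-tailed p value and the averaging of z scores within an ROI, can be sketched as follows. The flat list representation of the z map and mask, the function name, and the toy values are hypothetical simplifications; the actual analysis operated on 3D images and used watershed segmentation, which is not shown here.

```python
from statistics import NormalDist, mean

Z_THRESHOLD = 2.58

# One-tailed p value for the group-level threshold (about 0.0049).
one_tailed_p = 1.0 - NormalDist().cdf(Z_THRESHOLD)


def roi_face_selectivity(z_map, roi_mask):
    """Mean faces-vs-objects z score over the voxels inside the ROI."""
    values = [z for z, inside in zip(z_map, roi_mask) if inside]
    if not values:
        raise ValueError("ROI mask selects no voxels")
    return mean(values)


# Hypothetical toy data: one participant's z map and a group-defined mask.
z_map = [1.0, 3.0, 5.0, -2.0]
roi_mask = [False, True, True, False]
selectivity = roi_face_selectivity(z_map, roi_mask)
```

Because the mask is fixed at the group level, the only quantity that differs across participants is the set of z values falling inside it.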

### **BEHAVIORAL TEST**

The old/new recognition memory paradigm was used to measure participants' FRA. Specifically, for Cohort 1, 60 face images and 60 flower images were used (**Figure 1**). Face images were gray-scale adult Chinese faces with the external contours removed (leaving a roughly oval shape with no hair on the top and sides, with the addition of the neck). Flower images were gray-scale pictures of common flowers with leaves and background removed. There were two blocks in this task: a face block and a flower block, which were counterbalanced across participants. Each block consisted of one study segment and one test segment. In the study segment, 20 images of each stimulus category were shown twice. Each image was presented for 1 s with an interstimulus interval (ISI) of 0.5 s. In the test segment, the 20 studied images were shown twice, randomly intermixed with 40 new images from the same category. On presentation of each image, participants were instructed to indicate whether the image had been shown in the study segment. Cohort 2 was tested by a short version of the task (i.e., halved length), which was reported previously (Wang et al., 2012). For each stimulus category, 10 images were learned and tested (with 20 new images as distractors). Otherwise, all experimental parameters were identical to those described for Cohort 1. For each participant, a recognition score was calculated as the recognition accuracy (hits + correct rejections) for each category (face and object/flower). The FRA was calculated as the normalized residual of the face recognition score after regressing out the object (i.e., flower) recognition score.

**FIGURE 1 | Example stimuli and trial types in the old/new recognition task.** In the study segment, participants studied a series of images of either faces or flowers. In the test segment, the studied images were shown with new images from the same category intermixed. Participants were asked to indicate which of the images had been shown in the study segment.
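The FRA computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: "normalized" is taken here to mean dividing the regression residuals by their sample standard deviation, accuracy is expressed as a proportion of test trials, and the function names and the five example scores are hypothetical.

```python
from statistics import mean, stdev


def recognition_accuracy(hits: int, correct_rejections: int, n_trials: int) -> float:
    """Proportion of correct old/new responses (hits + correct rejections)."""
    return (hits + correct_rejections) / n_trials


def face_specific_recognition_ability(face_scores, flower_scores):
    """Residual of face score after regressing out flower score, z-scaled."""
    mean_flower = mean(flower_scores)
    mean_face = mean(face_scores)
    sxx = sum((x - mean_flower) ** 2 for x in flower_scores)
    sxy = sum(
        (x - mean_flower) * (y - mean_face)
        for x, y in zip(flower_scores, face_scores)
    )
    slope = sxy / sxx
    intercept = mean_face - slope * mean_flower
    residuals = [
        y - (intercept + slope * x) for x, y in zip(flower_scores, face_scores)
    ]
    sd = stdev(residuals)  # residuals have mean zero by construction
    return [r / sd for r in residuals]


# Hypothetical accuracies for five participants (not data from the study).
face = [0.80, 0.75, 0.90, 0.70, 0.85]
flower = [0.78, 0.80, 0.82, 0.70, 0.75]
fra = face_specific_recognition_ability(face, flower)
```

By construction the resulting scores are uncorrelated with flower recognition, which is what allows the FRA to index face-specific ability rather than general recognition memory.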

#### **VOXEL-WISE WHOLE-BRAIN ANALYSIS**

In addition to ROI analysis, we searched for any voxels in the whole brain that showed a significant correlation between face selectivity and FRA across participants in Cohort 1. We first identified clusters of contiguous voxels showing a significant correlation effect (*p* < 0.05, uncorrected), and then tested these clusters with whole-brain correction (WBC) and small-volume corrections (SVC). In the WBC, the minimum cluster size above which the probability of type I error was below 0.05 was determined by the cluster program in FSL using Gaussian Random Field theory. Then, the SVCs were performed in preselected anatomical masks for regions implicated in face processing, namely, the right occipital fusiform cortex, bilateral STS, anterior temporal cortex, amygdala, OFC, and precuneus. All masks were taken from the Harvard–Oxford probabilistic structural atlas available with FSL 5.0 (FMRIB, Oxford, UK; http://www.fmrib.ox.ac.uk/fsl) with the threshold at 25%. For each mask, the minimum cluster size above which the probability of type I error was below 0.05 was determined.

### **RESULTS**

### **FACE SELECTIVITY IN THE FFA AND FACE RECOGNITION ABILITY**

Based on the group-level z statistic image for the contrast of faces vs. objects (see Methods for details), the FFA was localized within the mid-fusiform gyrus in both hemispheres in both cohorts of participants (for coordinates of peak voxel and cluster size, see **Table 1**). **Figure 2A** shows the left and right FFA from the group-level analysis on an inflated cortical surface of the MNI standard template. Consistent with previous literature, the right FFA was larger and more face-selective than the left FFA (see **Table 1** for details).

The critical test is whether face selectivity in the FFA was correlated with face recognition ability. Face selectivity for each participant was calculated as the average z score from the contrast of faces vs. objects across all voxels within the ROIs, while the FRA was calculated as the normalized residual of the face recognition score after regressing out the object recognition score in the old/new recognition task (**Table 2** shows descriptive statistics for this task). We found that face selectivity in the FFA of both hemispheres was positively correlated with the FRA in Cohort 1 (left FFA: Pearson's *r* = 0.16, *p* = 0.008; right FFA: Pearson's *r* = 0.14, *p* = 0.016; for scatterplots, see **Figures 2B,C**). Because there was no significant difference in the face selectivity-FRA correlation between the left and right FFA (Steiger's *Z*-test, *z* < 1), face selectivity in the left and right FFA was collapsed across hemispheres and used for further analyses (correlation between face selectivity of the FFA and FRA, Pearson's *r* = 0.16, *p* = 0.008). Next, we examined whether the link between face selectivity in the FFA and the FRA was specific to face processing (i.e., domain-specific), or whether the association could be accounted for by factors not specific to face processing (i.e., domain-general).

First, since face selectivity was calculated from the contrast of faces vs. objects, we need to rule out the possibility that the face selectivity-FRA correlation resulted largely from a negative correlation between FFA responses to objects and FRA, rather than a positive correlation between FFA response to faces and FRA. We found that the correlation between FRA and FFA response to objects (vs. fixation) was essentially zero (Pearson's *r* = −0.003, *p* = 0.97). Further, the FRA was positively correlated with FFA response to faces after controlling for FFA response to objects (partial *r* = 0.13, *p* = 0.03). So it is the neural response to faces, not that to objects, that led to the association between face selectivity and the FRA. Second, the face selectivity-FRA correlation was unlikely to be explained by the participants' behavioral performance on object recognition either, because there was no correlation between face selectivity and the object recognition scores (Pearson's *r* = 0, *p* = 0.99), whereas face selectivity was positively correlated with face recognition scores (*r* = 0.14, *p* = 0.02). Hence, the face selectivity-FRA correlation was not confounded by the variance in neural response or behavioral performance for non-face objects. Third, previous studies have shown that females are better at face recognition than males (e.g., Rehnman and Herlitz, 2007; Sommer et al., 2013), and we replicated this finding with the measure of the FRA in our study [*t*(286) = 2.55, *p* = 0.01, Cohen's *d* = 0.30]. Therefore, the face selectivity-FRA association may result from the group difference between male and female participants, rather than a linear relationship across both groups of participants. To exclude this alternative, we calculated the partial correlation between face selectivity and FRA, controlling for gender. We found that the association between FRA and face selectivity remained (partial correlation *r* = 0.14, *p* = 0.02), and thus could not be explained by gender difference.
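The partial correlations used in these control analyses can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it uses the standard first-order partial correlation formula, and the simulated variables (`gender`, `fra`, `selectivity`) and their effect sizes are hypothetical.

```python
import numpy as np

def partial_correlation(x, y, covariate):
    """First-order partial correlation of x and y, controlling for one covariate."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, covariate)[0, 1]
    r_yz = np.corrcoef(y, covariate)[0, 1]
    # Remove the variance that x and y each share with the covariate
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Simulated (hypothetical) data: a binary covariate and two related measures
rng = np.random.default_rng(0)
n = 200
gender = rng.integers(0, 2, size=n).astype(float)
fra = 0.3 * gender + rng.normal(size=n)
selectivity = 0.2 * fra + 0.1 * gender + rng.normal(size=n)

r_partial = partial_correlation(selectivity, fra, gender)
```

The same value results from correlating the residuals of the two measures after each has been regressed on the covariate, which is the sense in which the covariate is "controlled for".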

**FIGURE 2 | Face selectivity in the fusiform face area (FFA) and occipital face area (OFA) was correlated with face-specific recognition ability (FRA). (A)** The FFA, OFA, and the posterior part of the superior temporal sulcus (pSTS) from group-level analysis displayed on an inflated cortical surface of MNI standard template for Cohort 1. Z statistic image for the contrast of faces vs. objects in group-level analysis was thresholded at *Z* > 2.58 (one-tailed *p* < 0.005, uncorrected). **(B–D)** Scatter plots between FRA and face selectivity in the **(B)** right FFA, **(C)** left FFA, and **(D)** right OFA. The face selectivity for each participant was calculated as the average z score from the contrast of faces vs. objects across all voxels in each ROI, and the FRA was calculated as the normalized residual of the face recognition score after regressing out the object recognition score in the old/new recognition task.

**Table 2 | Mean scores and standard deviations (***SD***) of the performance in the old/new recognition task and the FFA responses.**

*The FFA response to faces was calculated as the average z scores across all voxels from the contrast of faces vs. fixation, and the FFA response to objects was calculated from the contrast of objects vs. fixation. Face selectivity was calculated as the average z score from the contrast of faces vs. objects.*

Together, the above control analyses indicated that the association between face selectivity in the FFA and FRA is domain-specific and cannot be accounted for by factors not specific to face processing.

Given the anatomical variability of face-selective regions across individuals, further analyses were performed to rule out the possibility that the FFA based on group-level analysis may lack the specificity to capture the FFA in individuals, especially in poor performers. First, we localized the FFAs in the poorest face recognizers (*N* = 20) at the individual level (*p* < 0.01, uncorrected), and then compared their anatomical variability with that of the best recognizers (*N* = 20). We found that the mean peak voxel coordinates of the FFA in the poor group (right FFA: 42.50, −53.63, −21.75; left FFA: −40.17, −50.67, −21.50) were very close to those in the good group (right FFA: 41.50, −48.88, −22.38; left FFA: −40.71, −48.35, −23.18). Moreover, SDs of the peak voxel coordinates in the poor group (right FFA: *SD*<sub>x</sub> = 2.60, *SD*<sub>y</sub> = 5.39, *SD*<sub>z</sub> = 3.31 mm; left FFA: *SD*<sub>x</sub> = 3.21, *SD*<sub>y</sub> = 7.32, *SD*<sub>z</sub> = 4.48 mm) were comparable to those in the good group (right FFA: *SD*<sub>x</sub> = 3.35, *SD*<sub>y</sub> = 7.14, *SD*<sub>z</sub> = 3.33 mm; left FFA: *SD*<sub>x</sub> = 2.91, *SD*<sub>y</sub> = 5.41, *SD*<sub>z</sub> = 4.71 mm), indicating comparable anatomical variability of the FFA between the poor and good performers. Second, there were 9 participants fitting the definition of DP (i.e., FRA scores < −2 *SD*) in Cohort 1. We recomputed the correlation between face selectivity in the FFA and FRA with the 9 participants excluded, and found that the correlation remained significant (Pearson's *r* = 0.130, *p* = 0.03). Third, we defined the FFA based on group-level analysis with a more stringent threshold (one-tailed *p* < 0.0001, uncorrected), and found that the correlation between face selectivity in the FFA and FRA remained unchanged (Pearson's *r* = 0.15, *p* = 0.009). Taken together, these results confirmed the validity of using the group-level ROIs in the current study.

Finally, though we have revealed a face selectivity-FRA association in the FFA, the effect size of the association was rather modest (*r* = 0.16). Did this reflect the true correlation coefficient between face selectivity in the FFA and FRA, or was the observed correlation coefficient somehow biased toward a low value? To examine the reliability of the association, we replicated this finding with another independent cohort of participants following the same procedure. The face selectivity-FRA association was confirmed in Cohort 2, and more importantly, the effect size of the association was comparable to that of Cohort 1 (Pearson's *r* = 0.15, *p* = 0.04). In addition, the association was not confounded by either the FFA response to objects (correlation between FFA response to objects and FRA, Pearson's *r* = −0.03, *p* = 0.66), or the behavioral performance on object recognition (correlation between face selectivity of FFA and the object recognition score, Pearson's *r* = −0.03, *p* = 0.72). Nor could the association be solely explained by the gender difference, because the partial correlation between face selectivity and FRA, controlling for gender, was 0.12 (*p* = 0.10). In sum, although the effect size is modest, face selectivity in the FFA was reliably associated with FRA, and the association is specific to face processing.

### **FACE SELECTIVITY IN OTHER FACE-SELECTIVE REGIONS AND FACE RECOGNITION ABILITY**

Was face selectivity in other face-selective regions associated with face recognition ability? With the group-level z statistic image for the contrast of faces vs. objects, the right OFA and bilateral pSTS were obtained in Cohort 1 (**Figure 2A**, **Table 1**), while the left OFA was not obtained, possibly due to large anatomical variability of the left OFA across individuals. We found that face selectivity in the right OFA was positively correlated with the FRA (Pearson's *r* = 0.16, *p* = 0.006, **Figure 2D**). In contrast, whereas the pSTS showed a selective response to faces, its face selectivity was not correlated with the FRA (right: Pearson's *r* = −0.03, *p* = 0.59; left: Pearson's *r* = 0.06, *p* = 0.35).

These results were replicated in Cohort 2. Specifically, the bilateral OFA and pSTS were obtained in Cohort 2. Face selectivity in the OFA (right: Pearson's *r* = 0.19, *p* = 0.008; left: Pearson's *r* = 0.28, *p* < 0.001), but not that in the pSTS (right: Pearson's *r* = 0.02, *p* = 0.78; left: Pearson's *r* = −0.02, *p* = 0.78), was positively correlated with the FRA. Taken together, these results indicated that face selectivity in the FFA and OFA could predict individual differences in face recognition ability, while face selectivity in the pSTS was not linked to face recognition ability, consistent with the functional division of labor among the three face-selective regions suggested in previous literature (Haxby et al., 2000; Calder and Young, 2005).

In our study, face selectivity of the ROIs was computed from the same dataset that was used to define the ROIs. To demonstrate that the face selectivity-FRA correlation is not subject to circularity, and to further demonstrate that the results could not be accounted for by the approach of group-level ROI definition, we localized the ROIs in one cohort (i.e., Cohort 2), and then used the face selectivity in these predefined ROIs from the other cohort (i.e., Cohort 1) for the correlation analysis. The cross-cohort analysis replicated the finding from the within-cohort analysis: face selectivity in the FFA and OFA was positively correlated with the FRA in Cohort 1 using the ROIs defined in Cohort 2 (left FFA: Pearson's *r* = 0.15, *p* = 0.013; right FFA: Pearson's *r* = 0.14, *p* = 0.015; right OFA: Pearson's *r* = 0.15, *p* = 0.009), whereas face selectivity in the right pSTS was not correlated with the FRA (Pearson's *r* = −0.06, *p* = 0.32).
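The logic of the cross-cohort analysis can be sketched in a few lines: the ROI mask comes from one cohort's group-level map, and selectivity is then measured in independent participants. The sketch below runs on simulated data only; the array shapes, variable names, and signal strengths are hypothetical, and only the *Z* > 2.58 threshold is taken from the Figure 2 caption.

```python
import numpy as np

rng = np.random.default_rng(42)
n_voxels = 500        # hypothetical number of voxels in a search region
n_subjects_c1 = 40    # hypothetical number of Cohort 1 participants

# Hypothetical group-level z map (faces vs. objects) from the ROI-defining cohort
group_z_cohort2 = rng.normal(0.0, 1.0, n_voxels)
group_z_cohort2[:50] += 4.0  # simulate a face-selective patch

# Step 1: define the ROI in Cohort 2 by thresholding the group map
roi_mask = group_z_cohort2 > 2.58

# Step 2: in the independent cohort, average each participant's z scores in the ROI
subject_z_cohort1 = rng.normal(0.0, 1.0, (n_subjects_c1, n_voxels))
selectivity = subject_z_cohort1[:, roi_mask].mean(axis=1)

# Step 3: correlate face selectivity with (hypothetical) FRA scores
fra = rng.normal(0.0, 1.0, n_subjects_c1)
r = np.corrcoef(selectivity, fra)[0, 1]
```

Because the voxels are selected without reference to the data in which selectivity is measured, voxel selection cannot by itself inflate the brain-behavior correlation.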

## **WHOLE BRAIN ANALYSIS**

In addition to the ROI analysis, we searched for any voxels in the whole brain that showed a correlation between face selectivity and FRA across participants in Cohort 1. The results of the whole-brain analysis were in agreement with those of the ROI analysis (**Figure 3**). That is, FRA was positively correlated with face selectivity in a cluster in the right inferior occipital cortex (MNI coordinates of peak: 42, −92, −10, cluster size: 1645, peak *z*-value: 3.98), and another cluster in the left inferior occipital and fusiform cortex (MNI coordinates of peak: −42, −44, −30, cluster size: 1098, peak *z*-value: 3.95) with whole-brain correction. Then, anatomical masks were created for small volume corrections (SVC, *p* < 0.05) in regions implicated in face processing, including the right occipital fusiform cortex, the bilateral STS, anterior temporal cortex, amygdala, OFC, and precuneus. A significant positive correlation between FRA and face selectivity was found in a cluster in the right fusiform cortex (MNI coordinates of peak: 42, −44, −22, cluster size: 135, peak *z*-value: 3.03).

## **DISCUSSION**

Following the criterion for neural selectivity adopted in neurophysiological research, fMRI studies have identified multiple object-selective areas in the human brain. In this study, we investigated the behavioral significance of object selectivity by correlating the inter-subject variance of face selectivity in face-selective regions with individuals' specific ability to recognize faces. In two independent large samples of participants, we found that individuals with higher face selectivity in the FFA and OFA consistently exhibited better face recognition ability. Furthermore, the association between face selectivity in the FFA and face recognition ability could not be explained merely by the FFA responses to objects, general object recognition ability, or gender, suggesting that the observed association is specific to face processing. In contrast, there was no association between face selectivity in the pSTS and face recognition ability. In sum, these findings provide empirical evidence that face selectivity in the FFA and OFA contributes to behavioral performance of face recognition. The behavioral relevance of face selectivity to face recognition supports the validity of using object selectivity in defining object-selective regions, though the validity of object selectivity can also be demonstrated by other approaches.

**FIGURE 3 | Voxel-wise correlation between face selectivity and FRA.** The results were displayed on an inflated cortical surface of MNI standard template, thresholded at *z* > 1.96 (two-tailed *p* < 0.05, uncorrected).

Our study provides the first evidence that face selectivity in the FFA and OFA is related to individual differences in face recognition ability in the normal population. Notably, the association remained after removing the extreme individuals fitting the definition of DP in our study. Thus, these results corroborate and extend the recent study demonstrating this association in the FFA across DP and normal participants (Furl et al., 2011). In addition, previous studies have shown that both the average response (Grill-Spector et al., 2004) and the spatial pattern of response in the FFA and OFA (Zhang et al., 2012) are involved in trial-to-trial behavioral success of recognizing faces. These two lines of evidence converge to indicate that face responses in the FFA and OFA contribute to behavioral performance of face recognition. Our results are more generally in agreement with previous studies showing that the FFA response reflects the percept of a face, rather than the physical stimuli, in binocular rivalry (Tong et al., 1998) and the Rubin vase-face illusion (Hasson et al., 2001; Andrews et al., 2002), and that the FFA response for upright vs. inverted faces was positively correlated with the behavioral face-inversion effect (Yovel and Kanwisher, 2005). Taken together, these results suggest that face-selective responses may serve as the neural correlate of face perception and face recognition. Perhaps the most convincing evidence that face-selective regions contribute to face recognition comes from the neuropsychological literature. The lesions of acquired prosopagnosic patients are usually found in ventral occipito-temporal cortex, involving the OFA, the FFA, or both, either right-sided or bilateral (Damasio et al., 1982; Sergent and Signoret, 1992; Barton et al., 2002). Importantly, results from prosopagnosic patient PS indicated that both the FFA and the OFA, and the integrity of their interaction, are necessary for successful face identification (Rossion et al., 2003; Schiltz et al., 2006; Rossion, 2008).

Further, our results suggest that the association between face selectivity in the FFA and face recognition ability is domain-specific. First, the association is stimulus-specific because it is not accounted for by neural response or behavioral performance for non-face objects. Thus, a specific processing mechanism may exist for faces that is distinct from those for other object categories. Although an alternative hypothesis proposes that the mechanisms involved in face recognition are also engaged in expert exemplar discrimination for any homogeneous visual category (Diamond and Carey, 1986; Gauthier et al., 1999, 2000), the stimulus specificity of face recognition has been supported by evidence from behavioral, developmental, electrophysiological, and clinical work, in addition to fMRI studies. Behaviorally, face recognition is more disrupted by inversion (e.g., Yin, 1969) and shows more holistic processing than object recognition (e.g., Tanaka and Farah, 1993), and there is greater development with age in face recognition than in object recognition (Carey and Diamond, 1977; Golarai et al., 2007; Weigelt et al., 2014). The neuropsychology literature on AP (Rossion et al., 2003; Busigny et al., 2010) and object agnosia (Moscovitch et al., 1997) contains evidence for a double dissociation between face and object recognition, and electrophysiological studies reveal a specialized region in the monkey brain dedicated to processing faces, consisting entirely of face-selective cells (e.g., Tsao et al., 2006). Interestingly, the relevance of the object-selective response of an object-selective region to object recognition performance has also been demonstrated for other object categories. For example, the response to written sentences and letter strings, but not that to other object categories, in the VWFA increased linearly with reading performance (words read per minute) (Dehaene et al., 2010), and the object-selective activation in object areas (e.g., the lateral occipital complex) was positively correlated with performance of object naming across participants (Grill-Spector et al., 2000). Therefore, object selectivity may serve as a neural signature of a functionally specialized region. Note that the behavior-selectivity correlation provides sufficient but not necessary evidence to support the validity of using object selectivity to define an object-selective region.

Second, the association cannot be accounted for by domain-general cognitive processes (e.g., attention, task engagement, general visual discrimination abilities, and decision making), further suggesting the domain-specificity of the association. Although both the responses in face-selective regions (Wojciulik et al., 1998) and behavioral performance in face recognition tasks are sensitive to attention and task engagement, these general cognitive components should be largely removed from the association after subtracting the response to objects from that to faces, and regressing object recognition scores out of face recognition scores, because objects and faces likely underwent the same general cognitive processes. In addition, the correlation analysis was based on the link between in-scanner neural activation and out-of-scanner behavioral performance; therefore, those who were more attentive during scanning were not necessarily those who were more attentive or more engaged in the behavioral tasks outside the scanner. Finally, the observation that pSTS activation did not correlate with FRA also argues against the possibility that the link between face selectivity in the FFA and face recognition ability was accounted for by general cognitive processes.

Face-selective regions are known to have anatomical variability across individuals, which may be averaged out in group-level ROIs; however, our results were unlikely to be accounted for by the approach used to define the ROIs. First, the correlation did not result from the FFA variability in poor performers, because the anatomical variability of the FFA was comparable between poor and good performers, and the correlation between face selectivity in the FFA and FRA remained significant with poor performers excluded. Second, the same pattern of results was observed when the FFA was defined with a more stringent threshold or with a cross-cohort analysis, indicating that defining ROIs at the individual level is not critical for observing the behavior-selectivity correlation. Finally, the results of the whole-brain analysis agreed well with those of the ROI analysis.

Compared with previous studies, our study has a distinctive methodological merit: the association was examined in a large sample of participants and, more importantly, replicated in another independent large sample, which allows us to reveal a reliable brain-behavior association. Notably, not only the association but also its effect size was replicated in the independent sample. Yarkoni (2009) has argued that the combination of small sample sizes and stringent alpha-correction levels leads to grossly inflated correlations, whereas the correlations in our results are rather modest, in line with other previous studies with large sample sizes (e.g., Holmes et al., 2012; Hao et al., 2013; He et al., 2013). There are several possible explanations for the modest effect size of the association between face selectivity in the FFA and face recognition ability. First, the response in face-selective regions as measured in our study is only one of many possible neural measures that may account for a portion of variance in face recognition ability; others include the cluster size (Furl et al., 2011) and gray matter volume of the face-selective regions (Behrmann et al., 2007; Golarai et al., 2007; Garrido et al., 2009; Dinkelacker et al., 2011), the functional connectivity (Zhu et al., 2011; Avidan et al., 2013) and anatomical connections between different face-selective regions (Thomas et al., 2009), and the connectivity between face-selective regions and the rest of the brain. Second, the old/new face recognition memory task may capture only a portion of variance in face recognition ability (Wilhelm et al., 2010; Wang et al., 2012). Third, the group-level ROIs in our study likely included some non-face-selective voxels and/or excluded some face-selective voxels in each individual, which may lead to underestimation of the true correlation coefficients between face selectivity and FRA. Therefore, further studies with face-selective ROIs defined at the individual level may help characterize the association more precisely. Fourth, although the dynamic localizer of Caucasian faces was sufficient to demonstrate the link between face-selective responses and face recognition ability in our study, videos of young adult Asian faces may be more desirable stimuli to tap into expert face recognition for our participants. Future research adopting optimal face stimuli may characterize the correlation more accurately. Finally, the reliability of the measurement of both face selectivity and FRA is not perfect, which may further attenuate the observed correlation (Schmidt and Hunter, 1999). In sum, it is implausible that any single neural measure would account for a large proportion of variance in a complex behavioral skill such as face recognition.
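The point about imperfect measurement reliability can be made concrete with the classical (Spearman) correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is illustrative only: the reliability values of 0.7 are hypothetical placeholders, since no reliabilities are reported in this section.

```python
import math

def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation: r_obs / sqrt(rel_x * rel_y)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Observed r = 0.16 (Cohort 1, FFA); both reliabilities are HYPOTHETICAL (0.7)
corrected = disattenuated_correlation(0.16, 0.7, 0.7)
```

Under these assumed reliabilities the corrected estimate would be about 0.23, so measurement error alone could account for part, but not all, of the gap between a modest observed correlation and a strong underlying one.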

In conclusion, our study provides some of the first evidence that face selectivity in the FFA can predict face recognition ability in the normal population. Several issues remain unaddressed in our study and are important topics for future research. First, the exact mechanism underlying this association is still unknown. One possibility is that the higher face selectivity observed with fMRI reflects a larger number of face-responsive neurons in face-selective regions and/or sharper tuning of these neurons, as observed in neurophysiology studies (Tsao et al., 2006), which contributes to better behavioral performance. Another possibility is that increased face-selective response is accompanied by enhanced connectivity between different face processing regions (Saygin et al., 2012), and enhanced connectivity (e.g., more efficient transfer of face-related information) leads to better performance (e.g., Zhu et al., 2011). Future studies combining different techniques (such as single-cell recording, fMRI, and diffusion tensor imaging) are needed to explore these possibilities in depth. Second, some studies have demonstrated that the FFA can be divided into two sub-regions (Pinsk et al., 2009; Weiner and Grill-Spector, 2010), and their functional roles in the association need to be further characterized. Third, although neuropsychological (Damasio et al., 1982; Sergent and Signoret, 1992; Barton et al., 2002) and transcranial magnetic stimulation (TMS) studies (Pitcher et al., 2009) have indicated a causal role of the face-selective regions in face recognition performance, we acknowledge that the correlation between face selectivity and face recognition ability in our study could be explained in the other direction. That is, for example, the FFA may be more selective to faces in good recognizers because they accumulate more information when presented with faces than poor recognizers do. Finally, future studies are invited to extend the behavioral significance of object selectivity to other object categories, e.g., the place-selective response in the PPA for place recognition, and the body-selective response in the EBA for body recognition, so as to investigate whether the association between object selectivity and object recognition ability is a general principle for object recognition.

## **ACKNOWLEDGMENTS**

This study was funded by the National Natural Science Foundation of China (31100808, 30800295, 31230031, 91132703, 31221003) and the National Basic Research Program of China (2010CB833903).

## **REFERENCES**


link negative affect, impaired social functioning, and polygenic depression risk. *J. Neurosci*. 32, 18087–18100. doi: 10.1523/JNEUROSCI.2531-12.2012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 February 2014; accepted: 15 June 2014; published online: 02 July 2014. Citation: Huang L, Song Y, Li J, Zhen Z, Yang Z and Liu J (2014) Individual differences in cortical face selectivity predict behavioral performance in face recognition. Front. Hum. Neurosci. 8:483. doi: 10.3389/fnhum.2014.00483*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Huang, Song, Li, Zhen, Yang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Familiarity is not notoriety: phenomenological accounts of face recognition

**Davide Liccione<sup>1,2</sup>\*, Sara Moruzzi<sup>1,2</sup>, Federica Rossi<sup>1,3</sup>, Alessia Manganaro<sup>1</sup>, Marco Porta<sup>4</sup>, Nahumi Nugrahaningsih<sup>4</sup>, Valentina Caserio<sup>1</sup> and Nicola Allegri<sup>1,2</sup>**

<sup>1</sup> Lombard School of Psychotherapy, Pavia, Italy

<sup>2</sup> Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy

<sup>3</sup> Nursing Home De Rodolfi, Vigevano, Pavia, Italy

<sup>4</sup> Department of Engineering, University of Pavia, Pavia, Italy

#### **Edited by:**

Aina Puce, Indiana University, USA

#### **Reviewed by:**

Julie A. Brefczynski-Lewis, West Virginia University, USA; Boutheina Jemel, Universite de Montreal, Canada

#### **\*Correspondence:**

Davide Liccione, Department of Brain and Behavioral Sciences, University of Pavia, Piazza Botta 11, 27100 Pavia, Italy e-mail: davide.liccione@unipv.it

From a phenomenological perspective, faces are perceived differently from objects as their perception always involves the possibility of a relational engagement (Bredlau, 2011). This is especially true for familiar faces, i.e., faces of people with a history of real relational engagements. Similarly, the valence of emotional expressions assumes a key role, as it defines the sense and direction of this engagement. Following these premises, the aim of the present study is to demonstrate that face recognition is facilitated by at least two variables, familiarity and emotional expression, and that perception of familiar faces is not influenced by orientation. To test this hypothesis, we implemented a 3 × 3 × 2 factorial design, showing 17 healthy subjects three types of faces (unfamiliar, personally familiar, famous) characterized by three different emotional expressions (happy, angry/sad, neutral) and in two different orientations (upright vs. inverted). We showed every subject a total of 180 faces with the instruction to give a familiarity judgment. Reaction times (RTs) were recorded, and we found that the recognition of a face is facilitated by personal familiarity and emotional expression, and that this process is otherwise independent from a cognitive elaboration of stimuli and remains stable despite orientation. These results highlight the need to make a distinction between famous and personally familiar faces when studying face perception and to consider its historical aspects from a phenomenological point of view.

**Keywords: face recognition, familiarity, inversion, facial expression, person, phenomenology**

## **INTRODUCTION**

Face recognition is an essential task in human daily life, as it allows the identification of the person in front of you and provides the possibility of a relational engagement (Kleinke, 1986). Several types of information can be extracted from the perception of a face, including age, gender, and emotional state, but above all, identity (Morrison et al., 2001; Jenkins and Burton, 2008). Faces constitute a separate perceptual category, differing in many aspects from other stimuli, such as objects (Tanaka and Sengco, 1997). They are perceived holistically, in contrast with other objects, which receive an elaboration based on the processing of constitutive details (Tanaka and Sengco, 1997; Farah et al., 1998; Ge et al., 2006). Face perception is defined as "holistic" because faces are processed as gestalts, with single facial features (nose, mouth, eyes and so on) having a less fundamental role than the global face configuration (Maurer et al., 2002). As a result, any experimental manipulation preventing this kind of elaboration can impair judgments about face identity. The most studied of these manipulations is face inversion, which produces the so-called "face inversion effect". Inversion prevents the encoding of spatial information and makes it impossible to perceive individual faces as a whole, forcing stimulus processing based on a system of specific and integrated features. This usually results in lower accuracy and slower reaction times (RTs; Valentine, 1988). Some interesting findings have emerged from presenting inverted faces to patients with prosopagnosia, a neurological disorder characterized by the inability to recognize faces (Bauer, 1984; Grüter et al., 2008; Gainotti, 2014). Patients with congenital (Rivolta et al., 2012) and acquired prosopagnosia (Busigny and Rossion, 2010) appear to lack holistic perceptual processing abilities, being minimally (if at all) affected by face inversion. Furthermore, some studies show better performance for inverted than upright faces, though this latter effect is not very common in either form of prosopagnosia (Farah et al., 1995a; Behrmann et al., 2005; Busigny and Rossion, 2010).

Besides the perceptual aspects of face recognition, great interest has been shown in the processing of the so-called "emotional valence" (Bruce and Young, 1986). Traditional cognitive models of face recognition propose that facial identity and facial expressions are processed through different routes. Bruce and Young (1986) hypothesized the existence of two distinct processing pathways: one involved in identity recognition, the other in the analysis of facial expressions. The model is supported by clinical (Young et al., 1993), neurophysiological (Hasselmo et al., 1989) and neuroradiological (Winston et al., 2004) evidence, and has also led to the formulation of a distributed neural system of face recognition (Haxby et al., 2000; Rivolta et al., 2014). However, some experimental evidence undermines the dual route hypothesis. Many studies have shown an influence of facial expressions on identity recognition of newly learned faces (Foa et al., 2000; D'Argembeau and Van der Linden, 2007) and of famous faces (Gallegos and Tranel, 2005). Moreover, Van de Stock and de Gelder (2014) demonstrated that face identity perception mechanisms interact not only with the processing of facial expressions but also with bodily expressions.

Some attention has also been devoted to the study of emotion recognition in inverted faces. The literature on this topic is quite heterogeneous: while some studies found a detrimental effect of inversion only for the recognition of some emotional expressions (McKelvie, 1995; Calvo and Nummenmaa, 2008), others found a general difficulty in recognizing inverted expressions for all types of emotions (Goren and Wilson, 2006). However, the most widely accepted idea is that the only expression not affected by inversion is happiness (Leppänen and Hietanen, 2004; Bombari et al., 2013). A limitation of the studies presented above is that they do not distinguish between famous and personally familiar faces, although the recognition of these two types of stimuli may differ in various respects. In this regard, Herzmann et al. (2004) studied RTs, priming, and skin conductance responses to unfamiliar, famous and personally familiar faces. They found faster RTs for both famous and personally familiar faces, but a greater skin conductance response only for the latter category. Moreover, recognition of personally familiar and famous faces seems to rely on different brain areas. Taylor et al. (2009), in an fMRI study, compared unknown, famous and familiar faces, finding that the extent and areas of activation varied according to face type.

The three types of stimuli appear to be profoundly different if considered from a phenomenological perspective. Phenomenological theories claim that perception is an active process, structurally embodied, embedded, extended and enactive,<sup>1</sup> and that person recognition is different from object recognition.

What we perceive is determined by what we can do, and this is valid for both objects and people (Noë, 2004): the difference is that while an object reveals itself in a pattern of possibilities of action, a face reveals itself in a pattern of relational possibilities. In fact, in encountering another person the most pressing task is relational engagement. In these terms it appears clear why familiar faces are different from famous and unknown faces: if we encounter a familiar person (i.e., a person who has a history of real relational engagements with us), many ways of being in engagement become vivid and start to pertain to our personal experience and to its significance. In observing a familiar person we experience ourselves in our personal possibilities of relational engagement. In this way, particular importance is given to the processing of emotional expressions, because they define the "sense"<sup>2</sup> of this engagement (Bredlau, 2011). We therefore hypothesize that familiarity is so powerful a constituent of face perception that it overcomes the effect of stimulus inversion and is not influenced by emotional expressions.

Therefore, the aim of the present study is to investigate whether manipulations of orientation and expressions can influence the processing of facial identity of unfamiliar, personally familiar and famous faces. Our hypothesis is that inversion does not affect the vivid experiential perception of a familiar face, leading to similar RTs for inverted, compared to upright familiar faces. For this purpose, we presented our subjects with pictures of unfamiliar, famous and personally familiar faces, both upright and inverted, with three different emotional expressions: happy, neutral and sad/angry. The main element of evaluation was the RTs of our subjects during a face recognition task.

## **METHODS**

## **PARTICIPANTS**

Seventeen adults (5 male; 12 female), with normal or corrected-to-normal vision, ranging in age from 23 to 36 (*M* = 27.7, *SD* = 2.43 years), participated in this study. All participants were unaware of the purpose of the experiment. The study conformed to the national guidelines and regulations of the A.I.P. (Italian Association of Psychology), and was approved by the Lombard School of Psychotherapy ethical review committee. All subjects gave informed consent.

## **STIMULI**

Visual stimuli consisted of digitized grayscale images of familiar, famous and unknown faces, displaying positive, negative or neutral expressions. All images were selected for high-resolution frontal views and forward eye-gaze. Pictures were homogenized for average brightness and contrast, and did not show significant differences in these parameters across categories. In accordance with the purpose of this study, we did not remove hair, glasses or other distinctive features from the portraits, in order to keep an authentic approach to face perception.

*Familiar faces*. These highly familiar faces consisted of pictures of 10 familiar people for each subject. The choice of familiar people was based on a questionnaire previously completed by the participants, who were asked to indicate 10 relatives or significant others (e.g., spouse, partner, etc.). The researchers contacted each familiar person and photographed them with three different expressions (positive, negative and neutral), making a total of 30 photos. Originally, relatives were asked to pose happy, angry and neutral expressions. Nevertheless, because of their difficulty in intentionally producing unequivocally angry faces, we chose to categorize those facial expressions (and therefore also the others) by emotional valence (positive, negative and neutral) rather than by discrete emotional states. So, our negative familiar stimuli can encompass both angry and sad faces. Difficulty in producing negative expressions on command has been shown in other studies (Öhman et al., 2001). All familiar people gave informed consent.

<sup>1</sup>Roughly, defining perception as *embodied* means considering the important role of the body shape in perceiving and experiencing the world and how we act in it. The idea that perception is *embedded* (and, on this ground, also *"extended"*) claims that perception is always situated in the environment: the objects (or events) are not isolated entities but instead, in Heidegger's words, "at hand", i.e., available to manipulation, and in this sense, they shape our perceptions and actions. The *enactive* dimension of perception reveals that it is not merely an analysis of the actual physical features of objects but perception calling for action. For more detailed explanations see Gallagher (2008) and Noë (2004).

<sup>2</sup>Here we use the word "sense" to convey the idea of direction, purpose, motivation etc.

*Famous faces*. Famous people were selected for use in this experiment on the basis of findings from a pilot study. Sixteen celebrities appearing regularly in the media (politicians, actors, television celebrities etc.) were chosen; for each celebrity, three images judged by the authors as having neutral, positive and negative expressions were downloaded from the Internet. Fifty-one subjects (32 female, 19 male) from outside the study were asked to rate the portraits for notoriety and emotional expression. For each face, participants were asked to answer the question "What's the name of this person?" and to rate the person's notoriety on a Likert-type scale ranging from 0 (not at all familiar) to 6 (very familiar). To assess emotional expressions, participants were asked to judge whether each expression was positive, negative or neutral. The final stimulus set comprised 10 (5 male and 5 female) of the 16 celebrities who met the following criteria: identity recognized by 100% of participants and each emotional expression correctly rated by 85% of participants.
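The two inclusion criteria for celebrities can be read as a simple filter over the pilot ratings. The Python sketch below is only an illustration: the `CelebrityRatings` record, the `meets_criteria` helper and all example rates are hypothetical and are not taken from the pilot data. It also assumes that "correctly rated by 85%" means at least 85% of raters.

```python
from dataclasses import dataclass


@dataclass
class CelebrityRatings:
    """Hypothetical summary of pilot ratings for one celebrity."""
    name: str
    identity_rate: float      # fraction of raters who named the person
    expression_rates: dict    # fraction of raters agreeing, per expression


def meets_criteria(c, identity_min=1.0, expression_min=0.85):
    """True if identity is recognized by all raters and every expression
    reaches the agreement threshold (assumed to be 'at least 85%')."""
    return c.identity_rate >= identity_min and all(
        rate >= expression_min for rate in c.expression_rates.values()
    )


# Invented example values, for illustration only.
pilot = [
    CelebrityRatings("A", 1.00, {"positive": 0.96, "negative": 0.88, "neutral": 0.90}),
    CelebrityRatings("B", 1.00, {"positive": 0.98, "negative": 0.70, "neutral": 0.92}),
    CelebrityRatings("C", 0.94, {"positive": 0.99, "negative": 0.90, "neutral": 0.95}),
]

selected = [c for c in pilot if meets_criteria(c)]
print([c.name for c in selected])
```

In this invented example, celebrity B fails on the negative expression and celebrity C fails on identity recognition, so only A would be retained.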

*Unknown faces*. Ten unknown faces were included in the experiment (for each participant we used photos of the relatives and significant others of other subjects).

In total, 90 photographs were used for each participant: 10 personally familiar, 10 famous and 10 unknown faces, each displaying the three expressions. Each photograph was presented both upright and inverted, for a total of 180 pictures (see **Figure 1** for an example), all presented in a single session.
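The stimulus counts follow directly from crossing the design factors. As a minimal sketch (the dictionary keys and the `build_trials` helper are illustrative names, not part of the original materials), the full set for one participant can be enumerated as follows:

```python
from itertools import product

CLASSES = ("familiar", "famous", "unknown")
EXPRESSIONS = ("positive", "negative", "neutral")
ORIENTATIONS = ("upright", "inverted")
IDENTITIES_PER_CLASS = 10


def build_trials():
    """Enumerate every class x identity x expression x orientation cell."""
    return [
        {"class": c, "identity": i, "expression": e, "orientation": o}
        for c, i, e, o in product(
            CLASSES, range(IDENTITIES_PER_CLASS), EXPRESSIONS, ORIENTATIONS
        )
    ]


trials = build_trials()
# A photograph is one identity with one expression, regardless of orientation.
photos = {(t["class"], t["identity"], t["expression"]) for t in trials}
print(len(photos), len(trials))  # 90 180
```

Thirty identities with three expressions give 90 photographs, and showing each one in two orientations gives the 180 pictures viewed by each participant.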

## **PROCEDURE**

As participants arrived at the laboratory, they read the information sheet, completed the consent form and were informed that they would perform computer-based tasks. Participants were seated in a quiet room, approximately 60 cm from the screen, and viewed all 180 images in one continuous block. All images were presented once for 5000 ms in randomized order with a black inter-stimulus slide lasting 2000 ms (**Figure 2**). Participants were instructed to press, as quickly as possible, one of two keys (B and M, with key assignment counterbalanced across subjects) according to their subjective recognition judgment (whether the face was known or not). No training was given to the participants prior to the facial recognition task.
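The timing parameters above imply a fixed session length. The sketch below is a rough illustration that assumes every image stays on screen for the full 5000 ms regardless of the response and that each image is followed by one 2000 ms slide. The function names and the seed are hypothetical and do not describe the presentation software actually used.

```python
import random

STIMULUS_MS = 5000   # presentation time per image
ISI_MS = 2000        # black inter-stimulus slide
N_TRIALS = 180


def randomized_order(n_trials, seed):
    """Return a random permutation of trial indices (each image shown once)."""
    rng = random.Random(seed)
    order = list(range(n_trials))
    rng.shuffle(order)
    return order


def session_duration_ms(n_trials, stimulus_ms=STIMULUS_MS, isi_ms=ISI_MS):
    """Total time if every trial lasts stimulus + inter-stimulus slide."""
    return n_trials * (stimulus_ms + isi_ms)


order = randomized_order(N_TRIALS, seed=1)
minutes = session_duration_ms(N_TRIALS) / 60_000
print(minutes)  # 21.0
```

Under these assumptions the recognition block lasts 21 minutes, excluding instructions and consent.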

## **REACTION TIME**

Participants' RTs were recorded by the Tobii Studio 1750 eye tracker software. Raw RT data were exported from Tobii Studio and processed using an *ad-hoc* software module developed with Microsoft Access. The results were then imported into SPSS databases for statistical analysis.

## **STATISTICS**

Data analyses were performed using SPSS Statistics for Windows. Statistical analysis was performed on the log-transformed RT data. The main purpose of this log transformation is to bring the sampled data in line with the assumptions of parametric statistics (such as ANOVA) and to deal with outliers. A 3 (class: familiar/famous/unknown) × 3 (expression: positive/negative/neutral) × 2 (orientation: upright/inverted) repeated measures ANOVA explored whether RTs differed across stimuli. Stimulus type (familiar, famous and unknown), expression (positive, negative and neutral) and orientation (upright and inverted) were entered as within-subjects variables. Effect sizes (partial eta-squared, $\eta_p^2$, for *F*-statistics) are reported together with *p*-values for significant main effects and interactions, and *post-hoc t*-tests were Bonferroni-corrected to require a significance value of *p* < 0.01. An $\eta_p^2$ value above 0.01 indicates a small effect, above 0.06 a medium effect, and above 0.14 a large effect. We used Mauchly's Test of Sphericity to test the assumption of sphericity; if this assumption is violated, the *F*-statistic is positively biased, rendering it invalid and increasing the risk of a Type I error. To overcome this problem, the Greenhouse-Geisser correction was applied to the degrees of freedom (*df*).
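Two of the steps described above, the log transformation and the effect size labels, can be sketched in a few lines of Python. This is not the SPSS procedure used in the study: the natural logarithm is an assumption (the base is not stated), the example RTs are invented, and both function names are hypothetical.

```python
import math


def log_transform(rts_ms):
    """Natural-log transform of reaction times (base is an assumption)."""
    return [math.log(rt) for rt in rts_ms]


def classify_partial_eta_squared(eta_p2):
    """Label an effect size using the thresholds stated in the text."""
    if eta_p2 > 0.14:
        return "large"
    if eta_p2 > 0.06:
        return "medium"
    if eta_p2 > 0.01:
        return "small"
    return "negligible"


# Invented RTs with one slow outlier, to show how the transform compresses it.
raw = [600.0, 650.0, 700.0, 3000.0]
logged = log_transform(raw)
print(max(raw) / min(raw), max(logged) / min(logged))
print(classify_partial_eta_squared(0.50), classify_partial_eta_squared(0.03))
```

The slow response is five times the fastest one on the raw scale but much closer to it on the log scale, which is why the transformation reduces the leverage of outliers.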

## **RESULTS**

**Table 1** shows mean values of reaction times.

**Table 1 | Means and standard error for reaction times.**

This analysis revealed the main effects of Orientation [*F*(1,169) = 167.04, *p* < 0.001, $\eta_p^2$ = 0.50], Class [*F*(1.87,315.39) = 69.80, *p* < 0.001, $\eta_p^2$ = 0.29] and Expression [*F*(2,338) = 5.62, *p* = 0.004, $\eta_p^2$ = 0.03]. Pairwise comparisons (**Figure 3**) reveal that RTs in upright condition were lower than in inverted condition (*p* < 0.001); RTs in detecting familiar faces were significantly faster compared to both famous (*p* < 0.001) and unknown faces (*p* < 0.001). RTs were faster for famous compared to unknown (*p* = 0.001); and for positive compared to neutral (*p* = 0.01) and negative expressions (*p* = 0.01). No differences were found between neutral and negative expressions (*p* > 0.05). Analysis revealed that all two-way interactions were significant (Orientation × Expression [*F*(2,338) = 11.16, *p* < 0.001, $\eta_p^2$ = 0.06]; Orientation × Class [*F*(1.74,293.76) = 46.48, *p* < 0.001, $\eta_p^2$ = 0.21]; Class × Expression [*F*(3.72,628.58) = 8.20, *p* < 0.001, $\eta_p^2$ = 0.05]).
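As a consistency check on the reported effect sizes, partial eta-squared can be recovered from an *F* value and its degrees of freedom through the standard identity η²p = (*F* · df₁) / (*F* · df₁ + df₂). The sketch below applies it to the values reported above. It is a reader-side check written in Python, not part of the original SPSS analysis, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta-squared)
reported = [
    ("Orientation", 167.04, 1, 169, 0.50),
    ("Class", 69.80, 1.87, 315.39, 0.29),
    ("Expression", 5.62, 2, 338, 0.03),
    ("Orientation x Expression", 11.16, 2, 338, 0.06),
    ("Orientation x Class", 46.48, 1.74, 293.76, 0.21),
    ("Class x Expression", 8.20, 3.72, 628.58, 0.05),
]

for name, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: recomputed {eta:.3f}, reported {eta_reported:.2f}")
```

The recomputed values fall within about 0.01 of the reported ones. Small differences are expected because the published *F* values and corrected degrees of freedom are themselves rounded.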

We found a significant three-way interaction between orientation, class and expression [*F*(3.64,614.61) = 2.81, *p* = 0.02, $\eta_p^2$ = 0.02]. Breaking this interaction down by orientation (**Figure 4**) showed no significant RT differences between the upright and inverted conditions for familiar faces, for any of the expressions; for famous and unknown faces, instead, RTs were significantly higher in the inverted orientation for all the expressions.

**FIGURE 3 | Pairwise comparisons of the main effects of orientation, class and expression**. Single asterisk indicates significance at p < 0.001.

Breaking the interaction between orientation, class and expression down by expression in the upright condition (**Figure 5**, left part) showed a significant difference between neutral and negative expressions only for familiar faces.

Breaking the interaction between orientation, class and expression down by expression in the inverted condition (**Figure 5**, right part) showed that familiar faces replicated the results of the upright condition, that famous faces differed significantly between negative and both positive and neutral expressions, and that unknown faces differed significantly between positive and both neutral and negative expressions.

## **DISCUSSION**

The purpose of our study was to evaluate, in a face recognition task, the effects of different levels of face familiarity (personally familiar, famous and unknown faces), orientation (upright or inverted) and emotional expression (positive, neutral or negative). The main results can be summarized as follows: (1) regardless of orientation and expression, familiar faces are recognized faster than other stimuli; (2) inverted orientation delays response times for famous and unknown faces, but does not seem to do so for familiar faces; and (3) there appears to be a significant relation between familiarity and expression, which is in turn affected by orientation.

Concerning the first issue, although based on RTs, our data are consistent with studies that show different psychophysiological responses for famous than for unknown persons (Tranel et al., 1985; Ellis et al., 1999). In our study, faces of personally familiar people (relatives, spouse, partner, etc.) are identified more quickly than famous and unknown faces across all conditions. These data are consistent with Herzmann et al. (2004), who found higher autonomic responses for familiar, compared to both famous and unknown faces, although in their study RTs did not differ between familiar and famous stimuli. This discrepancy can be explained by the different operational definitions of familiarity: Herzmann et al. (2004) used a broad concept of familiarity (portraits of lecturing staff), while we use a strict notion of familiarity, and this difference may result in different RTs during the recognition task.

The second point concerns the interaction between orientation and class. Our results are consistent with studies demonstrating that RTs in face recognition increase when stimuli are inverted, confirming the difficulty of recognizing faces in this orientation (e.g., Itier and Taylor, 2002). However, we found this effect only for famous and unknown faces: response times for inverted familiar faces were not significantly higher than for the same stimuli in the upright condition. Some authors claim that the holistic processing used for upright faces is lost with inversion, and that inverted faces, like objects, are processed only on the basis of their parts (e.g., Farah et al., 1995a,b). However, most of the studies that tested the effect of inversion have examined face recognition of famous and unknown people only (e.g., Itier and Taylor, 2002; Megreya and Burton, 2006). Hence, the absence of similar tasks in the literature makes it difficult to provide an exhaustive explanation of the phenomenon. We can suppose that the holistic configuration is less compromised in the inverted condition as a function of familiarity. Further research is, however, needed.

Regarding the third issue, the interaction between familiarity, expression and orientation, our data show that in the upright condition subjects are faster in evaluating negative familiar faces than neutral ones. No differences were found across expressions when famous and unknown faces were shown. Our results are consistent with studies that emphasize the joint effect of familiarity and expression in face recognition (Baudouin et al., 2000; Gallegos and Tranel, 2005; Dobel et al., 2008). One explanation for this pattern of results is based on the assumption that face recognition is easier if faces display typical rather than atypical expressions. On this view, a form of "*perceptual learning*" defines the type of cognitive representation of known faces (Kaufmann and Schweinberger, 2004). It has been claimed that famous faces are more frequently depicted displaying one typical expression (generally positive) than all possible ones, which results in faster recognition when they are smiling. Our famous stimuli varied in their typical expressions (in the Italian media context Vittorio Sgarbi is more frequently portrayed with negative than positive expressions, unlike Roberto Benigni, while for other stimuli such as Queen Elizabeth or Barack Obama a typical expression is difficult to establish). This could partially explain the lack of differences between expressions in this class of stimuli. Nevertheless, the perceptual learning explanation cannot account for our results for familiar faces. It is difficult to assume that there is a prototypical emotional representation for each family member, since the history of the relationship is too varied to expose a subject to just one of their emotional expressions. And even if there were, it would be characterized by an extremely high inter-individual variability.
One could argue that our subjects chose relatives with whom they had a higher affinity and good relationship and therefore cognitive representations of them were characterized by positive expressions. However, it is likely that the "expressive" representation of a relative is influenced by his character or his personality but plausibly independent from affection (for example if one has a taciturn or sulky disposition, his face representation will be characterized more by a neutral expression than positive, but this does not imply less affection towards him). So, regarding stimuli used in this experiment, the absence of a distinctive prototypical representation was, for different reasons, a common condition for both familiar and famous faces.

With regard to the inverted condition, some interesting results were obtained. Emotional expressions had an influence only on famous and unknown faces. No differences were found between the three expressions in the "familiar" condition. The literature on the processing of emotional expressions in the inverted condition is quite scarce and heterogeneous. While some studies show a detrimental effect of inversion on the recognition of all expressions apart from positive ones (McKelvie, 1995; Calvo and Nummenmaa, 2008), others reported an inversion effect for all types of emotions (Prkachin, 2003) or even opposite results, with happy faces being more affected by inversion than the others (Goren and Wilson, 2006). In our study, although participants were not instructed to explicitly recognize the emotions presented, our results seem to confirm those studies that show an easier processing of positive expressions also in the inverted condition (Leppänen and Hietanen, 2004; Bombari et al., 2013). Again, familiar faces seem to constitute a distinct type of stimulus, being minimally affected by inversion in the analysis of the emotional effect.

## **PHILOSOPHICAL PHENOMENOLOGICAL VIEW**

In division 1 of Being and Time, Heidegger (1996) argues that we ordinarily encounter objects as equipment, that is, as being for certain sorts of tasks (hammering, writing, etc.). He states that we do not generally encounter beings as detached, theoretical entities [*Vorhanden*] but as available or "ready-to-hand" [*Zuhanden*] and entwined in a tacit, holistic contexture of equipment (Ratcliffe, 2002). This account is reinforced by Merleau-Ponty (1962), who claims that the perceived object is always contextualized, not just by its physical surroundings, but by the particular projects and interests of the perceiver: the particular and potential actions that the perceiver is engaged in or could be engaged in. As Noë notes: "Perception is not something that happens to us, or in us. It is something we do [. . .] What we perceive is determined by what we do (or what we know how to do); it is determined by what we are ready to do. In ways I try to make precise, we enact our perceptual experience; we act it out" (Noë, 2004). Hence, by following a phenomenological approach, perception is an active process that is structurally embodied and embedded, but it is possible to argue that the perception of a person is different when compared to the perception of objects. Recognizing a human face means to become aware of a particular kind of percept—the face of another human being like me—but it does not always mean to identify a "person": a person is a human being regarded as an individual, an individual is a single human being as distinct from all other human beings (Liccione, 2013). Moreover, in encountering another person the most pressing task is relational engagement, and the way for which this engagement can be achieved depends upon many (inter)subjective and contextual factors, such as facial expressions. 
When we perceive an unknown person's portrait we recognize a "face" (not an object), but our relational engagement with him/her is based only on the mere social meaning of his/her facial expression (i.e., in terms of approach/escape behavior). So, his/her identity and possible relational engagement are not interrelated. When we perceive a famous face, like that of Barack Obama, we really individualize a "person", Barack Obama, the current president of the U.S., that is, a human individual with specified personality characteristics, so our relational engagement with him is based on our "media" knowledge. In this case, identity is an important factor, but recognition of Barack Obama does not take the shape of a personal and unique historical pattern of relational engagements. Instead, recognition of our mother's face occurs in the context of an exclusive and unique historical pattern of interactive opportunities that are so salient as to be constitutive of its recognition. Identity is a decisive factor.

Burton et al. (2005) proposed that the better recognition of familiar faces, with respect to unfamiliar ones, is due to a more functional refinement of stored representations of the former. This fine-tuning of face representations is "exposure-driven", that is, each new image of a face gradually upgrades its abstract representation, merging features that are constant across all possible variations. Our results (lower RTs for familiar faces) fit well with this explanation, given that it is possible to argue that a person is more exposed to the faces of his/her family members than to those of celebrities and therefore holds more powerful abstract representations of the former than of the latter. Our familiar stimuli encompass several categories of relatives (e.g., parents, spouse, partner, etc.), for which it is reasonable to assume a different frequency of encounters and, consequently, various degrees of refinement of their abstract representations. Therefore, in order to verify the frequency hypothesis it would be necessary to provide experimental control of variables related to exposure effects (such as length of acquaintanceship with each relative and how much time a subject has spent with him/her). In this way it would be possible to examine whether response times among familiar stimuli are affected by frequency of exposure. We cannot establish this solely with the data of this study.

Exposure time is often referred to the domain of vision: Johnston and Edmonds (2009) correctly wrote that celebrities "may be very well known to the participants for a long period of time, *have been seen* in many different views and contexts, *have been seen* on many different occasions, and *have been seen* for lengthy periods of time (our italics)". Let us take a hypothetical example in which a family member and a celebrity have the same exposure time to the subject (e.g., a distant relative and a very famous anchorman). It is possible to argue that their cognitive representations share identical degrees of refinement and the same level of robustness to variation. Nevertheless, there are important qualitative differences in the quantitative exposure to these faces (famous and familiar targets): the celebrity's face never "looked" at me, that is, the celebrity never directed his or her gaze toward my person and, correspondingly, although I have *seen* that face, I have never *looked* at it. There is no real (eye) contact with famous faces since there is no intentional reciprocity for engagement. According to Stawarska (2006), mutual gaze implies an attention contact, yielding social attunement: an intentional gaze toward the eyes of another, returned by him or her, allows for a second-person relation, while observation without contact produces a third-person relation. Cole (1999) claimed that in our social relationships we "exchange or share a mutual gaze". Cooperative visual attention is considered a fundamental step for cognitive development and especially for social and emotional competences (Stawarska, 2006), and recently Mason et al. (2004) have shown that gaze direction contributes to the memorability of others. In our study, subjects were asked to produce a recognition judgment (whether the face was known or not), and familiar faces were the only targets for which it is possible to argue a past history of mutual glances.
Moreover, the mutual gaze between members of the family is affectively characterized, unlike that with strangers. We can argue that these qualitative aspects are unique to personally familiar faces: even if exposure is a decisive factor that strengthens familiarity (Burton et al., 2005), affective and emotional aspects related to personal narratives with others seem to play a special role in face processing. Gobbini et al. (2004) and Gobbini and Haxby (2007) showed different neuronal activation patterns in response to familiar faces, compared to famous or unknown ones, and these data are confirmed by other fMRI studies (Todorov et al., 2007; Vuilleumier and Pourtois, 2007). As argued by Gobbini et al. (2004) and Gobbini and Haxby (2007), interpersonal relationships with familiar people provide "person knowledge", a set of salient biographical and autobiographical details that are *integral components of cognitive representation* of them (our italics).

According to this vision, it is possible to argue that the encounter with expressive famous faces does not have the same meaning as that connected to family members, that is the same quality of personal significance with relatives: expressions displayed on familiar faces are linked to memories that imply particular relational engagements and these can co-occur with recognition. Ratcliffe (2008), argues that "feelings of familiarity [. . .] or relatedness [. . .] can play a role in constituting the sense that a perceived entity is a remembered entity". In other words, the relational horizons towards "my" mother (also) contribute to the recognition of her as "my" mother.

Arciero and Bondolfi (2009) claim that "at pre-reflective level, e-moting is the embodied meaning of an ongoing situation, perceived as a global mode of feeling and concurrently as a relational domain". We can consider the "emotional face" as a salient cue of this relational domain that discloses new possibilities of action and passion. Indeed, Cole (1999) argues that face-to-face encounters involve feeling toward and between people and that other faces put a "demand" on one, that is, they require responding and entering into a relationship. So, expressive faces always imply my Self. The social meaning of facial expressions for the Self is often singled out to explain different behavioral responses to angry and happy faces: positive expressions evoke approval and satisfaction with our conduct, while angry expressions denote disconfirmation (D'Argembeau et al., 2010). Both confirmation and disconfirmation of the Self move the subject to relational acts (to speak, to smile, to discuss, to embrace), and in this sense sad expressions elicit concern and call for caring. We suppose, however, that the relation between the significance of an expression and the Self is (more) meaningful when it actually implies the Self. In fact, there are no angry, sad or happy faces but rather angry, sad or happy people with whom the subject has different relational engagements. Consequently, an angry expression by Barack Obama does not involve a sense of disconfirmation, as it would if the same expression were displayed on one's mother's face! The same can be said for sad and happy expressions. This can explain our results about the interplay between identity and emotional expression, and particularly the facilitating role of negative familiar faces.
Indeed negative expressions are associated with "critical" relational contexts and can similarly imply negative emotional responses (such as concern, worry, quandary, but also sadness and anger): we can suppose that when these expressions are displayed by significant others the Self is more involved because of significant past relational engagement with them.

To summarize, famous and familiar faces differ with respect to the historical conditions that have shaped and structured the experience with the person that these images depict. When we perceive faces, we are required to potentially actualize relational engagement, but if the faces carry an affective history (a past), they will be better recognized because historical relationships with them have the nature of *lived experiences*. Therefore, familiarity represents an indispensable condition for the perception of another's face to be connected to a history of relational engagement. It is not a stimulus that is added to the perceptive structure of the face, but rather an embodied meaning which manifests itself in the face of a familiar person, inevitably referring back to the self. This phenomenological point of view explains why the holistic perception of a familiar face is maintained even if inverted: in our study, RTs for inverted familiar faces showed no significant differences compared to those for upright familiar faces. If we consider familiarity as constitutive of perception, and not only as perceptive content, it is therefore plausible that inversion can in no way act on it, unless the facial structure is so deformed as to render recognition impossible. With reference to the inverted condition, the results regarding the relationship between emotional expression and class are open to question; for this reason, further research is necessary to replicate these findings.

#### **LIMITS**

We used a limited set of faces that were repeatedly presented to subjects across the upright and inverted conditions. This may have resulted in lower uncertainty during the recognition tasks.

We conducted a pilot study to evaluate the recognition of positive, negative and neutral expressions displayed by famous faces. We did not plan a similar questionnaire to assess the same expressions displayed by familiar faces.

We did not collect data for familiar faces regarding (1) length of acquaintanceship, (2) how much time a subject spent with them and (3) degree of appreciation for each familiar member. It is possible to argue that our concept of "familiarity" is independent of (unrelated to) at least the first two variables, but we cannot establish this solely with the data of this study.

The results of the present study showed that face recognition is facilitated by familiarity and emotional expression, emphasizing the distinction between famous and personally familiar faces and stressing the importance of historical aspects from a phenomenological point of view.

#### **REFERENCES**


Liccione, D. (2013). Verso una neuropsicopatologia ermeneutica. *Int. J. Philos. Psychol.* 4, 305–324. doi: 10.4453/rifp.2013.0032


Merleau-Ponty, M. (1962). *Phenomenology of Perception.* London: Routledge.

Morrison, D. J., Bruce, V., and Burton, A. M. (2001). Understanding provoked overt recognition in prosopagnosia. *Vis. Cogn.* 8, 47–65. doi: 10.1080/13506280042000027

Noë, A. (2004). *Action in Perception.* Cambridge: MIT.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 12 August 2014; published online: 01 September 2014*.

*Citation: Liccione D, Moruzzi S, Rossi F, Manganaro A, Porta M, Nugrahaningsih N, Caserio V and Allegri N (2014) Familiarity is not notoriety: phenomenological accounts of face recognition. Front. Hum. Neurosci. 8:672. doi: 10.3389/fnhum.2014.00672*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Liccione, Moruzzi, Rossi, Manganaro, Porta, Nugrahaningsih, Caserio and Allegri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Spatio-temporal dynamics and laterality effects of face inversion, feature presence and configuration, and face outline

## *Ksenija Marinkovic<sup>1,2</sup>\*, Maureen G. Courtney<sup>3</sup>, Thomas Witzel<sup>4</sup>, Anders M. Dale<sup>1,5</sup> and Eric Halgren<sup>1,5</sup>*

*<sup>1</sup> Department of Radiology, University of California San Diego, La Jolla, CA, USA*

*<sup>2</sup> Department of Psychology, San Diego State University, San Diego, CA, USA*

*<sup>3</sup> Cognitive Neuroimaging Laboratory, Center for Memory and Brain, Boston University, Boston, MA, USA*

*<sup>4</sup> Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Radiology Department at Harvard Medical School, Boston, MA, USA*

*<sup>5</sup> Department of Neurosciences, University of California San Diego, La Jolla, CA, USA*

#### *Edited by:*

*Mark A. Williams, Macquarie University, Australia*

#### *Reviewed by:*

*Christian Dobel, Westfälische Wilhelms-Universität Münster, Germany*

*Emmanuel J. Barbeau, Centre National de la Recherche Scientifique, France*

#### *\*Correspondence:*

*Ksenija Marinkovic, Department of Psychology, San Diego State University, 5500 Campanile Dr. San Diego, CA 92182–4611, USA e-mail: kmarinkovic@mail.sdsu.edu*

Although a crucial role of the fusiform gyrus (FG) in face processing has been demonstrated with a variety of methods, converging evidence suggests that face processing involves an interactive and overlapping processing cascade in distributed brain areas. Here we examine the spatio-temporal stages and their functional tuning to face inversion, presence and configuration of inner features, and face contour in healthy subjects during passive viewing. Anatomically-constrained magnetoencephalography (aMEG) combines high-density whole-head MEG recordings and distributed source modeling with high-resolution structural MRI. Each person's reconstructed cortical surface served to constrain noise-normalized minimum norm inverse source estimates. The earliest activity was estimated to the occipital cortex at ∼100 ms after stimulus onset and was sensitive to an initial coarse level visual analysis. Activity in the right-lateralized ventral temporal area (inclusive of the FG) peaked at ∼160 ms and was largest to inverted faces. Images containing facial features in the veridical and rearranged configuration irrespective of the facial outline elicited intermediate level activity. The M160 stage may provide structural representations necessary for downstream distributed areas to process identity and emotional expression. However, inverted faces additionally engaged the left ventral temporal area at ∼180 ms and were uniquely subserved by bilateral processing. This observation is consistent with the dual route model and spared processing of inverted faces in prosopagnosia. The subsequent deflection, peaking at ∼240 ms in the anterior temporal areas bilaterally, was largest to normal, upright faces. It may reflect initial engagement of the distributed network subserving individuation and familiarity. These results support dynamic models suggesting that processing of unfamiliar faces in the absence of a cognitive task is subserved by a distributed and interactive neural circuit.

**Keywords: magnetoencephalography, faces, fusiform gyrus, temporal cortex, laterality, dual route model, face inversion**

## **INTRODUCTION**

Faces have captured a great deal of attention in the neuroimaging field, resulting in important insights into the brain networks that underlie material-specific processing. Based on neuroimaging evidence of right-dominant activity in the fusiform cortex that is greater to faces than other meaningful visual stimuli, this area has been termed the "fusiform face area" (Kanwisher et al., 1997; Kanwisher and Yovel, 2006), although the nature of its "face-specificity" has been debated (Gauthier et al., 1999; Halgren et al., 2000; Haxby et al., 2001; Haxby, 2006; Cowell and Cottrell, 2013).

Studies using temporally precise methodology such as ERPs (Event-Related Potentials) and MEG (Magnetoencephalography) reveal a face-sensitive deflection peaking at around 170 ms (N170 and its magnetic counterpart M170) estimated to that region (Lu et al., 1991; Halgren et al., 2000; Liu et al., 2000; Watanabe et al., 2003; Schweinberger et al., 2007; Eimer, 2011; Miki et al., 2011; Rossion and Jacques, 2011; Taylor et al., 2011). Intracranial studies confirm both the timing and the location of the primary generator of these potentials in the inferotemporal region (Allison et al., 1994; Halgren et al., 1994a; McCarthy et al., 1997; Puce et al., 1997; Barbeau et al., 2008) but also indicate that the face processing is subserved by a distributed network additionally comprising anterior temporal and prefrontal regions (Halgren et al., 1994b; Klopp et al., 1999; Marinkovic et al., 2000; Barbeau et al., 2008). Generators of face-induced N170 are highly consistent with the fMRI activity in the right fusiform gyrus (FG) (Puce et al., 1997) although fMRI studies also confirm engagement of distributed occipital, temporal, and frontal areas (Ishai et al., 2004; Chan and Downing, 2011).

Converging evidence suggests that faces are processed in a series of successive, but overlapping and mutually interactive processing stages engaging multiple brain areas. Following encoding in the posterior visual areas (at ∼100 ms), activation peaks in the FG at about 170 ms after stimulus onset. At this time it is briefly phase locked with the activity in distributed association cortices primarily in ventral temporal and prefrontal regions (Klopp et al., 2000), suggesting that the face processing is mediated by a network of simultaneously active sources during the N170 stage. The N170 is followed by a deflection at ∼240 ms (Barbeau et al., 2008) and subsequent activity that mediates integration with mnemonic, emotional, and other contributions in distributed areas, resulting in face recognition (Halgren et al., 1994a,b; Puce et al., 1999). This broad outline of the spatio-temporal activity pattern is consistent with the original model proposed by Bruce and Young (1986) which, in turn, serves as the foundation of the currently prevalent accounts (Halgren et al., 1994a; Haxby et al., 2002; Ishai, 2008; Behrmann and Plaut, 2013). Even though these models conceptualize face processing as being mostly sequential in nature, it is clear that this is an interactive process with overlapping, rather than discrete and temporally circumscribed stages (Halgren et al., 1994a,b; Barbeau et al., 2008; Behrmann and Plaut, 2013). They flexibly mediate structural encoding, familiarity, and retrieval of semantic information resulting in recognition, with an increasing degree of reliance on distributed and interactive circuits.

The goal of this study was to examine the spatio-temporal stages and the functional tuning of the areas engaged during face processing with an anatomically-constrained MEG method. This multimodal methodology combines whole-head high-density MEG and a distributed source modeling approach with high-resolution structural MRI and cortical reconstruction to estimate the anatomical distribution of the underlying neural networks in a time-sensitive manner (Dale and Sereno, 1993; Hämäläinen and Ilmoniemi, 1994; Dale et al., 1999, 2000; Fischl et al., 1999a). Our analysis focused on both the relative amplitudes and latencies of the deflections evoked by faces and other conditions, as well as the spatial pattern of estimated activation. In particular, we wished to examine the sensitivity of the M170 to presence and configuration of inner features, face inversion, and face outline. Some of these variables have been manipulated in other studies (Bentin et al., 1996; Eimer, 2000b; Tong et al., 2000; Macchi Cassia et al., 2006; Zion-Golumbic and Bentin, 2007; Harris and Nakayama, 2008; Rossion and Jacques, 2008; Liu et al., 2010; Nichols et al., 2010; Gao et al., 2013) but we aimed to explore these effects in a more comprehensive manner. We used grayscale photographs of unfamiliar faces and manipulated face orientation (upright vs. inverted), internal features and external outline (present or absent) and the relative feature configuration (canonical vs. rearranged) resulting in the following conditions: "normal—N," "inverted—I," correct facial features presented in an oval without the hairline ("oval—O"), unnaturally rearranged facial features within the natural face outline ("rearranged—R"), blank faces with natural outlines but with no features ("blank—B"). Visual control (C) stimuli were obtained by randomizing grayscale patches of the face images so that they no longer looked like faces while preserving the spatial frequency, luminance, and overall shape.
We were especially interested in investigating the functional profile of the M170 and its sensitivity to the presence and absence of features and their arrangement. For instance, if it indeed reflects a face-encoding stage, then it will be responsive to the presence of facial features irrespective of the facial outline (Bentin et al., 1996; Eimer, 2000b; Tong et al., 2000; Zion-Golumbic and Bentin, 2007). Furthermore, by using methodology that provides reasonable spatial source estimates, we wished to examine the spatial characteristics of the M170. For instance, even though the right hemisphere (RH) dominance of the M170 has been established (Halgren et al., 2000; Rossion et al., 2003a; Kloth et al., 2006), contributions of the left hemisphere (LH) at this latency in the context of these manipulations are not clear.

A special case is presented by inverting face stimuli and we included this condition in our study. Impaired recognition of faces that are presented upside-down, relative to other objects (Valentine, 1988) has been termed the "face inversion effect" and is associated with larger amplitude and longer latency of the N170 (Rossion et al., 1999; Eimer, 2000a; Itier and Taylor, 2004a). fMRI studies, however, show that the inverted faces evoke either a smaller or equivalent activity in the FG than the upright faces (Kanwisher et al., 1998; Gauthier et al., 1999; Haxby et al., 1999). Moreover, some fMRI evidence suggests that inverted faces also recruit non-face ("object") areas, evoking stronger responses more medially (Aguirre et al., 1999; Haxby et al., 1999). The dual route model suggests that inverted faces are additionally processed by the LH in a feature-based manner (Moscovitch et al., 1997; De Gelder and Rouw, 2001). This model was examined by comparing the M170 activity to inverted faces in the left and right fusiform cortices and in other engaged areas. The M170 is commonly followed by activity peaking at ∼240 ms which is the earliest deflection that is reliably sensitive to face repetition and may reflect emergence of familiarity through learning (Tanaka et al., 2006; Schweinberger et al., 2007; Zimmermann and Eimer, 2013). We examined spatio-temporal characteristics of the M240 and its activity profile as a function of face orientation, features, and outline. Given that our primary focus of interest was the M170 and the relatively early processing stages that are relevant to the stimulus manipulations, we wished to minimize the semantic aspects of the processing. To that end, we used faces that were unfamiliar to our participants and employed a task of passive viewing with short presentation intervals.

## **METHODS**

#### **SUBJECTS**

MEG recordings and structural MRI scans were obtained from 14 healthy right-handed male subjects between 22 and 29 years of age (mean = 24.21 ± 1.85). The subjects had no neurological impairments and no structural brain abnormalities were seen on their MRI scans. All subjects signed statements of consent that were approved by the relevant review board and were monetarily reimbursed for their participation.

#### **MATERIAL**

**FIGURE 1 | Group-based average dynamic statistical parametric maps of estimated activity to all six conditions on ventral surfaces at ∼107 ms, ∼160 ms, and at ∼240 ms, showing estimates in ventral and lateral views.** Early visual activity (at ∼107 ms) is stronger to inverted faces and control stimuli. Inverted faces evoked the strongest M160 activity estimated to the fusiform gyrus, followed by oval, normal and rearranged images. Blank faces and randomized control faces evoked the weakest activity. The subsequent deflection, peaking at ∼240 ms, was largest to normal faces in the ventral and anterolateral temporal areas bilaterally. Examples of the images are shown below. The individual in the photos consented to the publication of these images.

Participants viewed six different types of grayscale photos (examples are shown in **Figure 1**) including: normal upright faces (N), inverted faces (I), normal face features presented in an oval without hairline (O), faces with features that were rearranged into unnatural positions (R), blank faces without features but with normal hairline (B), and randomized visual control stimuli (C). The control stimuli consisted of random grayscale patches that no longer looked like faces but that preserved the spatial frequency, luminance, and overall shape. In an effort to ascertain that image manipulations did not cause potentially confounding changes in visual properties, a 2D spatial FFT was calculated across images. The control stimuli did not differ from normal faces in the mean power at low, middle or high spatial frequency bands (<5, 5–15, or 15–40 cycles per degree of visual angle, respectively). The stimulus set was comprised of the photos of six different Caucasian individuals that were not familiar to any of our subjects. All faces had neutral expression and were selected from a larger set used in prior studies (Marinkovic and Halgren, 1998). The six photographs were manipulated to obtain images across all six conditions.
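The spatial-frequency check described above can be illustrated with a minimal sketch. It is not the authors' implementation, which is not described in the text: the function names `dft2_power` and `band_power`, the naive DFT, the toy image size and the toy band edges are all assumptions made for illustration. Resolving frequencies up to 40 cycles per degree would require at least 80 pixels per degree, so a real analysis would use the full-resolution photographs and a fast FFT routine.

```python
import cmath
import math


def dft2_power(img):
    """Power spectrum of a small image (list of rows) via a naive 2-D DFT."""
    n_rows, n_cols = len(img), len(img[0])

    def dft(vec):
        n = len(vec)
        return [sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(vec)) for k in range(n)]

    # Transform every row, then every column of the row-transformed data.
    rows = [dft(r) for r in img]
    cols = [dft([rows[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # cols[kx][ky] holds the coefficient at (vertical ky, horizontal kx).
    return [[abs(cols[kx][ky]) ** 2 for kx in range(n_cols)]
            for ky in range(n_rows)]


def band_power(img, deg_w, deg_h, bands):
    """Mean power per band; bands are (low, high) in cycles per degree.

    deg_w and deg_h are the image width and height in degrees of visual
    angle (hypothetical parameters; the paper reports 4 x 6 degrees).
    """
    power = dft2_power(img)
    n_rows, n_cols = len(img), len(img[0])
    sums = [0.0] * len(bands)
    counts = [0] * len(bands)
    for ky in range(n_rows):
        # Map the DFT index to a signed frequency in cycles per degree.
        fy = (ky if ky <= n_rows // 2 else ky - n_rows) / deg_h
        for kx in range(n_cols):
            fx = (kx if kx <= n_cols // 2 else kx - n_cols) / deg_w
            f = math.hypot(fx, fy)
            if f == 0.0:
                continue  # skip the DC term (mean luminance)
            for b, (lo, hi) in enumerate(bands):
                if lo <= f < hi:
                    sums[b] += power[ky][kx]
                    counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy example: a vertical grating with 4 cycles across a 4-degree-wide image,
# i.e. 1 cycle per degree, on a coarse 12 x 16 pixel grid.
grating = [[math.cos(2 * math.pi * 4 * c / 16) for c in range(16)]
           for _ in range(12)]
toy_bands = [(0.0, 0.75), (0.75, 1.5), (1.5, 3.0)]
low, mid, high = band_power(grating, deg_w=4.0, deg_h=6.0, bands=toy_bands)
```

With real stimuli, the band means obtained for the control images and for the normal faces would then be compared statistically, using the band edges reported above.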

#### **TASK**

During the MEG recording session the subjects were instructed to passively observe images that were presented in a randomized order on a computer-driven back-projection screen in front of the subject. Each image was presented for 225 ms at 1 s intervals on a gray background within a visual angle subtending 4° horizontal × 6° vertical. Each stimulus was repeated 16 times, yielding a total of 96 stimuli per condition.
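The trial counts follow directly from the design. The short calculation below is only a worked check. The variable names are illustrative, and the session duration assumes that the 1 s interval is measured from onset to onset, which the text does not state explicitly.

```python
n_identities = 6      # photographed individuals
n_conditions = 6      # N, I, O, R, B, C
n_repetitions = 16    # presentations of each stimulus
soa_s = 1.0           # assumption: 1 s from one stimulus onset to the next

trials_per_condition = n_identities * n_repetitions   # 6 x 16
total_trials = trials_per_condition * n_conditions
duration_min = total_trials * soa_s / 60.0

print(trials_per_condition, total_trials, round(duration_min, 1))
```

Under that assumption, the whole stimulus sequence would last a little under ten minutes.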

### **DATA ACQUISITION AND ANALYSIS**

MEG signals were recorded from 204 channels (102 pairs of planar gradiometers) with a whole-head Neuromag Vectorview instrument (Elekta Neuromag) in a magnetically and electrically shielded room. The signals were recorded continuously at a 601 Hz sampling rate with minimal filtering (0.1–200 Hz). Averages for each stimulus type were constructed from trials free of eyeblinks or other occasional artifacts. On average, 8.5 ± 4.4% of trials were discarded. The position of magnetic coils attached to the skull, the main fiducial points such as the nose, nasion and preauricular points, as well as a large array of random points spread across the scalp were digitized with a 3Space Isotrak II system for subsequent precise co-registration with structural MRI images.

Each person's cortical surface was reconstructed from high-resolution T1-weighted MRI structural images (1.5T Picker Eclipse, Marconi Medical, Cleveland OH) and was subsampled to ∼2500 dipole locations per hemisphere (Dale et al., 1999; Fischl et al., 1999a). This cortical surface served as the solution space to constrain a noise-normalized minimum norm inverse solution, here termed anatomically-constrained MEG or aMEG. The forward solution was calculated using a boundary element model (Oostendorp and Van Oosterom, 1991). Using a linear estimation minimum norm approach with no constraints on dipole orientation (Dale and Sereno, 1993; Hämäläinen and Ilmoniemi, 1994), dipole strength power was estimated at each cortical location every 5 ms. The estimates were normalized by noise obtained from the average pre-stimulus baseline which reduced the point-spread function variability (Liu et al., 2002a), and resulted in a series of frames of dynamic statistical parametric maps (dSPMs) of estimated cortical activity (Dale et al., 2000). These noise-normalized estimates of the current dipole power for each location fit the F distribution and can be viewed as "brain movies" as they unfold in time. Group averages for each condition were obtained by aligning cortical folding patterns across all individuals and averaging their inverse estimates (Fischl et al., 1999b; Dale et al., 2000). **Figure 1** presents the group average dSPMs of the overall activity patterns evoked by each stimulus condition at 107, 160, and 240 ms after stimulus onset. Estimated cortical activity is displayed on inflated views of an averaged cortical surface.
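For readers less familiar with noise-normalized minimum norm estimates, the computation described above can be written compactly. What follows is a sketch in the general form given by Dale et al. (2000), not a specification of the exact pipeline used here. The notation is introduced for illustration only: $\mathbf{A}$ is the forward (gain) matrix, $\mathbf{R}$ and $\mathbf{C}$ are the assumed source and sensor-noise covariance matrices, $\mathbf{x}(t)$ is the vector of sensor data at time $t$, $\mathbf{w}_j$ is the $j$-th row of the inverse operator $\mathbf{W}$, and $G_i$ is the set of three orthogonal dipole components at cortical location $i$.

$$
\mathbf{W} = \mathbf{R}\mathbf{A}^{\mathsf{T}}\left(\mathbf{A}\mathbf{R}\mathbf{A}^{\mathsf{T}} + \mathbf{C}\right)^{-1},
\qquad
q_i(t) = \frac{\sum_{j \in G_i} \left(\mathbf{w}_j\,\mathbf{x}(t)\right)^{2}}{\sum_{j \in G_i} \mathbf{w}_j\,\mathbf{C}\,\mathbf{w}_j^{\mathsf{T}}}
$$

The numerator is the estimated dipole power at location $i$ and the denominator is the power expected there from noise alone, so $q_i(t)$ has an expected value of about 1 when only noise is present. Summing over the three orientation components is what allows the estimate to be referred to an F distribution.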

Whereas the movie snapshots represent estimated activity for the whole cortical surface at each time point, an alternative way of examining the data is to look at the timecourses (estimated noise-normalized dipole strength across time) for the selected regions of interest (ROIs). These waveforms represent estimated dipole strength moments in the cortical *source* space and are suitable for assessing the effects of stimulus conditions on both amplitude and latency (Marinkovic et al., 2003). In order to further explore activity timecourses and to ascertain statistical significance of the particular comparisons, ROIs were chosen for the relevant areas on the cortical surface based on the overall group average estimated activity. They included the posterior occipital cortex (Occ), the lateral FG, and ventrolateral anterior temporal cortex (aTL). The same group-based ROIs were used for all subjects in a manner blind to their individual activations by means of an automatic spherical morphing procedure (Fischl et al., 1999b). The ROIs contained 4.8 ± 2.3 vertices on average, corresponding to ∼2.7 cm<sup>2</sup> of the cortical surface. The noise-normalized dipole strength estimates were averaged across all cortical points contained in each ROI at each time point. These values obtained for each subject and task condition were used for the statistical analysis. Within-subject ANOVAs were employed to examine differences in activity among conditions at different latencies. In most cases it was possible to determine singular amplitude peaks within the three latency windows of interest. For the occipital activity peaking at ∼107 ms (M107), peak amplitudes were detected within a 90–125 ms time window for each subject and task condition with an automatic algorithm. This made it possible to also examine task condition effects on peak latencies. Similarly, peak amplitude of the M160 in the right FG was identified within a 130–190 ms time window for each subject and task condition.
Activity in the left hemisphere at this latency was weaker and less consistent across subjects, making it difficult to detect amplitude peaks. Instead, average amplitudes were used to examine task condition effects on the activity within the 120–150 and 170–190 ms latency windows in the left FG. Within-subject ANOVA (Woodward et al., 1990) was used to statistically compare differences across conditions for each ROI and each of the three deflections. The Bonferroni method (Woodward et al., 1990) was used as a conservative protection against inflated Type I error due to multiple comparisons and the adjusted *p*-values are reported unless specified differently.
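The ROI measures described above (averaging across vertices, peak detection within a latency window, and mean amplitude within a window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the automatic algorithm used in the study, whose details are not given. The function names and the synthetic waveform are hypothetical, and the 5 ms time step simply mirrors the estimation interval mentioned earlier.

```python
import math


def roi_timecourse(vertex_timecourses):
    """Average noise-normalized estimates across the vertices of one ROI."""
    n_vertices = len(vertex_timecourses)
    n_times = len(vertex_timecourses[0])
    return [sum(v[t] for v in vertex_timecourses) / n_vertices
            for t in range(n_times)]


def peak_in_window(times_ms, amps, t_min, t_max):
    """Return (peak amplitude, peak latency) within [t_min, t_max] ms."""
    idx = [i for i, t in enumerate(times_ms) if t_min <= t <= t_max]
    best = max(idx, key=lambda i: amps[i])
    return amps[best], times_ms[best]


def mean_in_window(times_ms, amps, t_min, t_max):
    """Mean amplitude within [t_min, t_max] ms, for use when peaks are unclear."""
    vals = [a for t, a in zip(times_ms, amps) if t_min <= t <= t_max]
    return sum(vals) / len(vals)


# Synthetic two-vertex ROI with bumps near 105 ms and 160 ms (made-up data).
times = list(range(0, 300, 5))


def bump(t, center, height):
    return height * math.exp(-((t - center) / 15.0) ** 2)


vertices = [
    [bump(t, 105, 1.0) + bump(t, 160, 2.0) for t in times],
    [bump(t, 105, 1.2) + bump(t, 160, 1.8) for t in times],
]
roi = roi_timecourse(vertices)

early_amp, early_lat = peak_in_window(times, roi, 90, 125)
m160_amp, m160_lat = peak_in_window(times, roi, 130, 190)
late_mean = mean_in_window(times, roi, 170, 190)
```

One such value per subject and condition would then enter the within-subject ANOVA. The window-mean variant corresponds to the approach taken for the left FG, where individual peaks could not be identified reliably.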

## **RESULTS**

Inspection of the overall activity indicates that the earliest activity is estimated to the occipital region at ∼107 ms (M107) after stimulus onset. It propagates anteriorly via the ventral visual stream to the predominantly right ventral temporal areas peaking at ∼160 ms (M160), and further on to the anterior ventrolateral temporal and prefrontal regions at ∼240 ms (M240). Group-average dSPM estimates are shown in **Figure 1** for the activity at 107, 160, and 240 ms. Timecourses derived from the relevant ROIs are shown in **Figure 2**, and graphs of mean estimated activity across all conditions in **Figure 3**. **Table 1** summarizes main results across the ROIs and peak latencies.

The early occipital response peaks at 107 ms with a very similar amplitude and profile in both hemispheres. This observation was confirmed with an ANOVA of the peak amplitude (within the 90–125 ms time window) with the factors of hemisphere and condition type. There was no main effect of laterality [*F*(1, 13) = 0.29, *p* > 0.5] and no laterality × condition interaction [*F*(5, 65) = 0.99, *p* > 0.45], so the results were pooled across both hemispheres. The main effect of Condition [*F*(5, 65) = 8.0, *p* < 0.0001] results from a greater peak amplitude to inverted faces and control stimuli [*F*(1, 13) = 12.9, *p* < 0.05] as compared to all other stimuli. The peak latency (107 ms) does not differ between the two hemispheres [*F*(1, 13) = 0.61, *p* > 0.45], but the peak latency to inverted faces (111 ms) is longer than the latency to all other stimuli (106 ms), *F*(1, 13) = 11.1, *p* < 0.05.

The subsequent deflection (M160) is right-dominant and is estimated to the fusiform cortex (**Figure 1**). ANOVA of the peak amplitude (within the 130–190 ms time window) indicates that the right M160 is uniquely sensitive to condition differences as shown by the significant main effect, *F*(5, 65) = 8.4, *p* < 0.0001. Inverted faces evoke greater activity than all other stimuli, *F*(1, 13) = 14.8, *p* < 0.01, followed by other stimuli that include facial features such as the oval, normal, and rearranged faces (**Figures 1**–**3**). Activity to normal, upright faces does not differ from the activity to faces with rearranged features, or to normal features presented in an oval. That is, the canonically oriented stimuli containing inner features regardless of their arrangement elicit activity that appears to be very similar at ∼160 ms latency. Blank facial outlines with no features elicit the weakest activity, *F*(1, 13) = 21.5, *p* < 0.01.

At around this latency, activity estimated to the left FG is much weaker overall (**Figures 1**, **2**). Since the peak patterns at this latency in the left hemisphere are not consistent or always clearly distinguishable across subjects, the condition effects are examined by averaging response amplitudes within the specified latency windows. The first average deflection peaking at 140 ms (average amplitude within 120–150 ms) is not differentiated by any of the stimulus characteristics, as indicated by the lack of main effect, *F*(1, 13) = 0.1, ns (**Figures 2**, **3**). However, the main effect of the deflection peaking at 180 ms (average amplitude within 170–190 ms latency window), *F*(5, 65) = 2.5, *p* < 0.05, reflected its sensitivity to inversion. This deflection tends to be greater to inverted than all other stimuli, *F*(1, 13) = 4.0, *p* < 0.07 (uncorrected). A similar pattern but with a more robust effect of inversion is observed in the left aTL at this latency (**Figure 2**), with a significant main effect of condition, *F*(5, 65) = 9.6, *p* < 0.001. In the aTL, inverted faces elicit greater activity than all other stimuli at ∼180 ms, *F*(1, 13) = 22.5, *p* < 0.001. Therefore, inverted faces selectively engage the left ventral temporal cortex with slightly longer peak latency than the right-dominant fusiform area.

The M160 is followed by another peak at ∼240 ms (M240) after stimulus onset (**Figures 1**–**3**). The strongest M240 is elicited by normal faces, especially along the ventral stream, including the left FG and aTL bilaterally. ANOVA of the peak amplitudes within 210–250 ms latency in the left FG revealed a main effect of condition, *F*(5, 65) = 2.5, *p* < 0.05, with a tendency for normal faces eliciting greater activity than all other stimuli, *F*(1, 13) = 8.4, *p* < 0.07. In the left anterior ventrolateral temporal cortex, the activity to normal faces was also stronger than to all other stimuli overall, *F*(1, 13) = 11.4, *p* < 0.05, although it did not differ from the stimuli with features presented within the oval. The peak latency (239 ± 16 ms) did not differ across conditions with the exception of a longer peak latency trend for the inverted faces (246 ms), *F*(1, 13) = 7.9, *p* < 0.09. Finally, as on the left, the activity to normal faces in the right aTL was greater than to other stimuli within the 220–270 ms time window, *F*(1, 13) = 9.8, *p* < 0.05. The peak latency (255 ± 20 ms) was longer on the right than on the left, *F*(1, 13) = 36.1, *p* < 0.001.

## **DISCUSSION**

Our results support models proposing that face processing unfolds in successive, but overlapping and mutually dependent spatio-temporal stages in the ventral visual stream. The incoming face stimuli are analyzed for their visual characteristics at ∼100 ms in the occipital visual areas as indexed by M107. Structural encoding of the face-specific aspects takes place in the FG at ∼160 ms (M160) especially on the right, with the exception of the inverted faces that additionally activate anteroventral temporal cortex on the left. Subsequent, presumably more integrative processing, engages distributed inferoventral and anterolateral temporal areas at ∼240 ms (M240) bilaterally. These latencies of face-related activity peaks have been observed in other MEG studies (Schweinberger et al., 2007; Taylor et al., 2011) and confirmed with iEEG (Barbeau et al., 2008), lending further support to similar stages proposed by other models (Bruce and Young, 1986; Halgren et al., 1994a; Haxby et al., 2000).

### **M107—SENSITIVITY TO LOW-LEVEL VISUAL FEATURES**

In the present study, the initial activity peak (M107) in the occipital area is greater to inverted and randomized control faces in comparison to other stimulus categories. Other ERP and MEG studies have also reported larger peak at ∼100 ms to inverted faces (Linkenkaer-Hansen et al., 1998; Itier and Taylor, 2002, 2004a; Schweinberger et al., 2007; Meeren et al., 2008) and to randomized control faces (Halgren et al., 2000) in comparison to normal faces. Based on such findings, it has been proposed that stimulus categorization takes place at ∼100 ms based on holistic perception of a face (Liu et al., 2002b; Itier and Taylor, 2004b). However, other evidence suggests that the activity differences may be merely due to low-level visual differences. MEG studies indicate that the mid-occipital M100 amplitude is increased as a function of parametrically varied pixel noise (Tarkiainen et al., 2002) and spatial frequency (Tanskanen et al., 2005). Similarly, the fMRI-BOLD signal is larger to visually randomized faces in retinotopic areas (Lerner et al., 2001). This evidence is consistent with the idea that the observed categorical differentiation at ∼100 ms is based on low-level visual characteristics rather than a holistic percept (Rossion and Caharel, 2011; Cauchoix et al., 2014). Nevertheless, this deflection may represent an initial step in the face-sensitive analysis of the global visual characteristics with the purpose of tuning and facilitating subsequent processing (Halgren et al., 1994a; Itier and Taylor, 2004a). All of our stimuli belong to the face-like category, but those that deviate more from a global face template based on their shape (inverted faces) or texture and contour (randomized control stimuli) evoke the strongest M107 activity in the occipital area (**Figure 2**). 
Based on its sensitivity to low-level features, this initial stage may serve as a domain-specific gate, "flagging" stimuli that deviate in orientation or shape (Portin et al., 1999; Tsao and Livingstone, 2008) and allowing for a fast visual categorization (Crouzet and Thorpe, 2011). This stage may facilitate the subsequent structural encoding stage, which is represented in the FG at 160 ms and carries out further refinement (Rossion and Caharel, 2011).


**Table 1 | ANOVA results for the main effects and condition contrasts carried out for M107, M160, and M240 response amplitudes and peak latencies.**

*The p-values for condition contrasts are reported with Bonferroni adjustment.*
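As a reminder of what the Bonferroni adjustment does, the following minimal sketch multiplies each raw *p*-value by the number of comparisons in the family and caps the result at 1. The function name and the example values are illustrative assumptions. The number of contrasts in each family used in this study is not stated here, so the example family size is arbitrary.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values for a family of three contrasts.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

An adjusted value is then compared against the nominal threshold (for example 0.05) in the usual way.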

#### **M160—GLOBAL FACE ENCODING**

This stage is reflected in a strongly right-lateralized M160 deflection which was greatest to inverted faces. All other face-like stimuli (normal, oval, and rearranged) evoked similar, intermediate-level activity in the fusiform cortex, whereas blank and randomized control faces evoked the weakest activity (**Figure 2**). This suggests that the face representation formed at this stage is based on a roughly face-like template that contains basic visual elements of a face: oval-shaped contour in an upright position with contrasting facial features regardless of whether they are spaced appropriately. Although the M160 representation lacks precision allowing for individuation at this stage, the stimuli that were most face-like evoked stronger activity than the blank faces and control stimuli which carry very little visual information needed for subsequent recognition. Our data are consistent with previous suggestions that this deflection reflects the operation of a face-encoding processing stage (Halgren et al., 1994a; Bentin et al., 1996; Puce et al., 1999; Eimer, 2000b; Downing et al., 2001; Bentin and Carmel, 2002), akin to the structural encoding ("face detection") module originally proposed by Bruce and Young (1986). In contrast to M107 which is sensitive to gross visual characteristics, the M160 deflection (presumably analogous to N170 in the ERP literature) is larger to stimuli that broadly resemble faces and can be processed further for familiarity. Consistent with other evidence, the M160 is responsive to the presence of facial features in the veridical or rearranged configuration irrespective of the facial outline (Bentin et al., 1996; Zion-Golumbic and Bentin, 2007). The M160 is attenuated to blank faces that lack internal features and to randomized control stimuli, confirming other similar findings at this latency in the FG (Eimer, 2000b; Tong et al., 2000). 
The finding that the right-lateralized M160 is similar in amplitude to stimuli containing inner features irrespective of their configuration could represent a process broadly generalizable to other types of visual stimuli such as words. For instance, ventral temporal cortex on the left is comparably activated by real and pseudowords, but not by other control stimuli (Cohen et al., 2002). In other words, the presence of the requisite features, even if they are in unnatural locations, may be necessary and sufficient for initial acceptance of a stimulus as possibly representing a face. This aspect of the face processor may be useful in situations when faces are seen in non-habitual orientations (for example, when the observed face is on a person lying on her side) and/or when much of the face is obscured by a hat or hair.

The N170 is largely insensitive to familiarity or repetition and consequently unresponsive to individuation (Marinkovic and Halgren, 1998; Puce et al., 1999; Bentin and Deouell, 2000; Eimer, 2000a; Anaki et al., 2007; Schweinberger et al., 2007; Barbeau et al., 2008; Taylor et al., 2011; Rivolta et al., 2012), providing additional evidence for its role in global face encoding (Bentin et al., 1996). In contrast, the process of individuation and recognition is subserved at the subsequent stage at ∼240 ms, located downstream in temporal cortices bilaterally. During the M160, the face-like features may be extracted by a domain-specific mechanism, permitting formation of a unitary and holistic representation of a face (Tanaka and Farah, 1993; Bentin and Golland, 2002; Schiltz and Rossion, 2006; Jacques and Rossion, 2009). This representation may be projected to distributed association cortices for further mnemonic, semantic, and emotional processing, resulting in the integration of the recognition process, as suggested by face-selective broadband coherence in intracranial EEG between the fusiform and distributed cortical areas (Klopp et al., 2000). The M160 was estimated to the right-dominant ventral temporal area, in the FG. Indeed, intracranial recordings confirm that the primary generators of the N170 deflection are in the fusiform area (Allison et al., 1994; Halgren et al., 1994a; McCarthy et al., 1997; Puce et al., 1997; Barbeau et al., 2008), in agreement with neuroimaging evidence (Kanwisher and Yovel, 2006).

#### **FACE INVERSION ENGAGES DUAL-ROUTE PROCESSING**

The M160 in the right fusiform cortex to inverted faces had a larger amplitude and longer peak latency than all other stimuli, replicating results of numerous other ERP and MEG studies (Bentin et al., 1996; Eimer, 2000a; Rossion et al., 1999, 2000; Liu et al., 2000; Sagiv and Bentin, 2001; Itier and Taylor, 2002, 2004a; Watanabe et al., 2003; Kloth et al., 2006; Honda et al., 2007). In the left ventral temporal cortex, the immediately preceding deflection peaked at ∼140 ms and was insensitive to any manipulation (**Figures 2**, **3**). However, the immediately subsequent deflection peaking at ∼180 ms on the left was selectively elicited by inverted faces (**Figure 2**) in a manner similar to the right M160. Clearly, the M160 is not maximal to optimal stimuli (i.e., normal, upright faces) but to inverted stimuli that deviate from the canonical orientation. At this point, the inverted faces have been classified as faces and need to engage additional resources to continue being processed for recognition. Even though the overall activity at this latency is much weaker in the LH, the deflection at 180 ms is elicited selectively by inverted faces. This indicates that they may uniquely engage bilateral ventral temporal cortices, supporting a dual route model (Moscovitch et al., 1997; De Gelder and Rouw, 2001; Rhodes et al., 2004), as well as the related idea that inverted faces recruit other mechanisms in addition to the right fusiform region (Aguirre et al., 1999; Haxby et al., 1999; Rossion et al., 2000; Yovel and Kanwisher, 2005; Epstein et al., 2006; Rossion, 2009). Despite a clear RH dominance in face processing, some evidence suggests that the LH contributes significantly to processing inverted faces. Behavioral studies using divided visual field methodology show the RH advantage in discriminating upright, but not inverted, faces (Hillger and Koenig, 1991; Cattaneo et al., 2013), indicating left hemisphere engagement during processing of inverted faces.
Similarly, split-brain monkeys show the face inversion effect when the stimuli are presented to the RH, but not to the LH (Vermeire and Hamilton, 1998). The face recognition deficit in prosopagnosic patients is more pronounced with bilateral lesions (Barton, 2008), possibly resulting from a disruption in interhemispheric communication which is critical for integrated perceptual decisions. Furthermore, relatively spared processing of inverted faces in prosopagnosia (Farah, 1996; De Gelder and Rouw, 2001) could be explained by a model of bilateral engagement of a more general system for visual objects (Aguirre et al., 1999; Haxby et al., 1999). Finally, MEG studies (Dobel et al., 2008, 2011) reported that individuals with congenital prosopagnosia manifested a decreased M170 and a strongly reduced gamma power in the left fusiform cortex, confirming left hemisphere involvement in normal face processing. This observation is confirmed by an fMRI study showing decreased activation in the left FG in congenital prosopagnosic patients (Dinkelacker et al., 2011). Therefore, it appears that by disturbing canonical face processing, face inversion creates suboptimal conditions for face recognition (Rossion, 2008), resulting in bilateral engagement of the ventral visual stream. This effect is not unique inasmuch as the N170 is similarly augmented to contrast inversion and misaligned face halves (Itier and Taylor, 2002; Letourneau and Mitchell, 2008; Jacques and Rossion, 2010), which may also rely on additional visual processing mechanisms. Furthermore, engagement of additional resources in the non-dominant hemisphere by visually deviating stimuli may reflect a more general principle that extends beyond faces. For instance, even though left-dominance of language processing has been firmly established (Price, 2010), the right hemisphere is selectively engaged by unpronounceable non-words (Marinkovic et al., 2014).
Similarly, the right ventral occipitotemporal cortex is more strongly activated by words in the less fluent language in bilingual speakers (Leonard et al., 2010).

#### **M240—EMERGENCE OF FAMILIARITY VIA REPETITION**

Extensive imaging evidence obtained with hemodynamic methods has been commonly interpreted in the context of dedicated face-processing modules particularly in the fusiform area (Kanwisher et al., 1997; Kanwisher and Yovel, 2006). However, spatio-temporally sensitive methods impose the idea of distributed and partly sequential processing encompassing mutually dependent and overlapping areas whereby the face-relevant information is increasingly refined in the posterior-to-anterior direction, reaching identity/semantic networks in the anterior temporal and inferior prefrontal cortices (Halgren et al., 1994a,b, 2000; Puce et al., 1999; Barbeau et al., 2008). Faces are processed by the ventral processing stream similar to other visual stimuli. Subsequent to an early engagement of the striate cortex (M107), ventral occipito-temporal areas support an intermediate material-specific processing stage (M160) providing structural representations to downstream distributed associative areas for processing of identity and emotional expression (Bruce and Young, 1986; Klopp et al., 2000; Liu et al., 2002b). In contrast to M107 and M160 that were larger to inverted faces, the normal, upright faces evoked the largest M240 estimated to the ventral and anterior temporal areas bilaterally, in agreement with other MEG reports (Schweinberger et al., 2007). The M240 deflection engages distributed anterior temporal cortices and may index familiarity detection and recognition, supporting previous iEEG findings (Barbeau et al., 2008). Furthermore, recent evidence shows that the (presumably analogous) N250 is sensitive not only to familiarity (Caharel et al., 2014), but that it emerges to previously unfamiliar faces as a result of repetition and, consequently, familiarization (Tanaka et al., 2006; Schweinberger et al., 2007; Pierce et al., 2011; Zimmermann and Eimer, 2013). 
Even though we did not manipulate repetition in a condition-specific manner, the present results are consistent with the idea that this deflection may reflect access to recognition units and activation of a memory trace for the particular face that has become familiar with repetition (Zimmermann and Eimer, 2013). Our localization estimates and the observation of the sensitivity of the inferior and anterior temporal cortices to face orientation and identity are supported by fMRI studies (Sugiura et al., 2001; Rotshtein et al., 2005; Kriegeskorte et al., 2007; Nasr and Tootell, 2012; O'Neil et al., 2013) and are further confirmed with single cell recordings in non-human primates (Freiwald and Tsao, 2010). Similarly, lesion studies report that anterior temporal lesions result in face recognition impairments (Glosser et al., 2003; Barton, 2008; Gainotti and Marra, 2011). Thus, it appears that the familiarity detection stage depends on the anterior temporal structures, and possibly on the perirhinal cortex in particular (Allison et al., 1994; Halgren et al., 1994a; Henson et al., 2003).

Even though the estimated M240 sources in our study are bilaterally distributed, the overall activity is left-dominant. It is generally accepted that the left hemisphere is essential for the semantic domain, especially in language tasks, whereas the right hemisphere subserves face processing (Dien, 2009). Right hemisphere bias for faces has been widely reported and accepted (De Renzi et al., 1994; Kanwisher et al., 1997). However, even during face processing the left hemisphere may play a dominant role in storage and retrieval of semantic face attributions as indicated by lesion (Glosser et al., 2003; Snowden et al., 2004) and imaging evidence (Griffith et al., 2006). Given that in our study only photographs of previously unknown faces were used, the connection with the semantic system is speculative. Nevertheless, an increase in M240 resulting from repeated exposure to upright, normal faces may partially stem from initial engagement of the network supporting person-specific information (Gainotti and Marra, 2011; Zimmermann and Eimer, 2013). These semantic face attributions may be represented in the left hemisphere, as is the case with the left-lateralized N360 to famous faces (Barbeau et al., 2008). Baron and Osherson (2011) used face stimuli in a visual categorization task and showed that the left anterior temporal lobe was especially sensitive to combinatorial face categorization. The importance of the left hemisphere is supported by reports of prosopagnosia resulting from ventral lesions in the left hemisphere (Verstichel and Chia, 1999; Vuilleumier et al., 2003). Furthermore, Dinkelacker et al. (2011) showed decreased fMRI activation in the left FG in congenital prosopagnosic individuals. Similarly, an MEG study found weaker activity overall in the left occipitotemporal areas in congenital prosopagnosic patients (Dobel et al., 2008).
Nevertheless, the overwhelming evidence suggests that face processing depends on distributed bilateral contributions (Farah, 1990; Haxby et al., 2000; Verosky and Turk-Browne, 2012) even in the case of emotional face processing (Fusar-Poli et al., 2009).

The anterolateral and ventral temporal regions may be essential for bringing together the configural representation of the face stimuli with the identity-relevant representations as part of a distributed network (Avidan et al., 2013; O'Neil et al., 2014). iEEG recordings show coherence between the FG and distributed association areas at ∼200 ms to faces (Klopp et al., 2000) and functional connectivity studies support this finding (O'Neil et al., 2014). This transitional entrainment may represent a widespread projection for further processing. The M240 may thus represent the familiarity detection stage as an initial step in accessing the person identity/semantic system that exists for the famous faces or personal acquaintances, followed by the full-fledged recognition percept laden with emotional, mnemonic, and other associations (Halgren et al., 1994a). Since the participants in our experiment were engaged in passive viewing of unfamiliar faces and were not asked to make any explicit judgments, we interpret the M240 as an index of familiarity with caution. Nevertheless, people excel at making attributions about unfamiliar faces such as age, gender, attractiveness, intelligence, etc. (Bruce and Young, 1986) and the M240 may index a familiarity detection stage within a generic face processing stream.

#### **CONCLUSION**

Faces are highly relevant visual objects engaging a multi-stage cascade of mutually dependent and overlapping distributed activity in the ventral visual stream with flexible downstream allocation. An initial analysis of the low-level visual characteristics takes place in the occipital region at ∼100 ms. Its sensitivity to low-level visual features and deviation in orientation or shape and texture may facilitate fast initial categorization. The subsequent activity of the predominantly right ventral temporal area (centered on the posterior FG) at ∼160 ms may index the face detection stage by subserving structural encoding necessary for downstream individuation and recognition. Additional engagement of the left ventral temporal area at ∼180 ms by inverted faces is consistent with the dual route model and spared processing of inverted faces in prosopagnosia. The M240 may index engagement of the familiarity processing network in bilateral, distributed anteroventral temporal areas. Thus, our data support dynamic models of face processing that suggest that face perception is subserved by a distributed and interactive neural circuit (Bruce and Young, 1986; Halgren et al., 1994a,b; Puce et al., 1999; Haxby et al., 2000; Klopp et al., 2000; De Gelder and Rouw, 2001; Rossion et al., 2003b; Ishai et al., 2005; Barbeau et al., 2008; Nasr and Tootell, 2012; Cauchoix et al., 2014).

## **ACKNOWLEDGMENTS**

We are grateful to Rupali P. Dhond, Sharelle Baldwin, Jeremy Jordin, Dave Post, Brendan Cox, Kim Paulson, Jeffrey Lewine, and Bruce Fischl. This work was supported by NIH (NS18741, AA016624, RR14075, P01 HD033113).

## **REFERENCES**


an event-related potential study. *Brain Topogr.* 20, 31–39. doi: 10.1007/s10548-007-0028-z


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 08 October 2014; published online: 10 November 2014.*

*Citation: Marinkovic K, Courtney MG, Witzel T, Dale AM and Halgren E (2014) Spatio-temporal dynamics and laterality effects of face inversion, feature presence and configuration, and face outline. Front. Hum. Neurosci. 8:868. doi: 10.3389/fnhum.2014.00868*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Marinkovic, Courtney, Witzel, Dale and Halgren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The complete design in the composite face paradigm: role of response bias, target certainty, and feedback

## *Günter Meinhardt\*, Bozana Meinhardt-Injac and Malte Persike*

*Department of Psychology, Johannes Gutenberg University Mainz, Mainz, Germany*

#### *Edited by:*

*Mark A. Williams, Macquarie University, Australia*

#### *Reviewed by:*

*Mark A. Williams, Macquarie University, Australia; Guillaume A. Rousselet, University of Glasgow, UK*

#### *\*Correspondence:*

*Günter Meinhardt, Methods Section, Department of Psychology, Johannes Gutenberg University Mainz, Wallstr. 3, 55099 Mainz, Germany e-mail: meinharg@uni-mainz.de*

Some years ago an improved design (the "complete design") was proposed to assess the composite face effect in terms of a congruency effect, defined as the performance difference for congruent and incongruent target to no-target relationships (Cheung et al., 2008). In a recent paper Rossion (2013) questioned whether the congruency effect was a valid hallmark of perceptual integration, because it may contain confounds with face-unspecific interference effects. Here we argue that the complete design is well-balanced and allows one to separate face-specific from face-unspecific effects. We used the complete design for a same/different composite stimulus matching task with face and non-face objects (watches). Subjects performed the task with and without trial-by-trial feedback, and with low and high certainty about the target half. Results showed large congruency effects for faces, particularly when subjects were informed late in the trial about which face halves had to be matched. Analysis of response bias revealed that subjects preferred the "different" response in incongruent trials, which is expected when upper and lower face halves are integrated perceptually at the encoding stage. The results pattern was observed in the absence of feedback, while providing feedback generally attenuated the congruency effect, and led to an avoidance of response bias. For watches no or marginal congruency effects and a moderate global "same" bias were observed. We conclude that the congruency effect, when complemented by an evaluation of response bias, is a valid hallmark of feature integration that allows one to separate faces from non-face objects.

**Keywords: feature integration, composite effect, congruency effect, response bias, selective attention**

### **1. INTRODUCTION**

A common observation in face perception or recognition experiments is that observers have difficulty judging face parts independently. In various studies, Tanaka and colleagues found that facial context strongly modulates recognition of face parts; for houses, researchers have observed less contextual influence (Tanaka and Farah, 1993; Tanaka and Sengco, 1997). The strong interdependence of parts in part-to-whole recognition and matching tasks led to the conclusion that faces are "special" compared to other object categories in that face processing involves relatively little part-based decomposition (Young et al., 1987; Tanaka and Farah, 1993; Farah et al., 1998). The stronger integration of parts for faces compared to non-face objects was substantiated in subsequent studies using classic hallmarks of feature integration (Gauthier et al., 1998; Yovel and Kanwisher, 2004; Kanwisher and Yovel, 2006; Robbins and McKone, 2007; Macchi Cassia et al., 2009; Taubert, 2009; Meinhardt-Injac, 2013).

Integrative processing of object parts may also arise and strengthen as a function of expertise, even with novel and artificial objects (Gauthier and Tarr, 1997). Testing selective attention to object parts, Gauthier et al. (2003) found evidence that car experts had problems ignoring irrelevant car features. Further, the researchers found that the N170, a face-selective ERP component (Bentin et al., 1996; Itier and Taylor, 2004; Rousselet et al., 2004, 2008; Jacques and Rossion, 2009), was jointly modulated by cars and faces among car experts, which indicates that integrated encoding of object features may have a common sensory basis in objects of expertise. Later measurements failed to confirm similar results in measures of feature integration for faces and non-face objects of expertise, which led to criticism of the expertise hypothesis (Robbins and McKone, 2007). Despite the dispute about the role of expertise, there is consensus that faces and non-face objects differ in their degree of part integration when high degrees of familiarity, expertise or training are not involved (Gauthier et al., 2003; McKone et al., 2006; Rossion, 2013).

#### **1.1. THE COMPOSITE FACE PARADIGM**

A frequently used behavioral approach to measuring the degree of integration among face parts is the composite face paradigm (Young et al., 1987). In this paradigm, face composites are formed by combining a lower and an upper half, each stemming from a different person. In the experiment, two such composite faces are shown and observers have to match either the upper or lower halves. **Figure 1A** illustrates matching the upper halves of two composite faces. When two upper halves are the same but are paired with different lower halves (see "same" example in **Figure 1**), the upper halves look different. Because the two whole faces are indeed different, the failure to selectively attend to just one half may be because of perceptual integration among both halves (Rossion and Boremanse, 2008). Misaligning the halves hampers integration, and each one can be attended selectively (see **Figure 1B**).

In several studies the composite face paradigm was used in the following variant (Goffaux and Rossion, 2006; Rossion and Boremanse, 2008; Jacques and Rossion, 2009). In "same" trials the upper face halves were the same while the lower ones were different. In "different" trials upper and lower halves were both different (see dashed gray boxes in **Figure 2**). Perceptual integration was concluded from the performance difference obtained for aligned and misaligned arrangements. The results of these experiments showed that strong modulatory effects of alignment existed for the "same," but not for "different" trials. Therefore, the authors confined their analyses to the hit-rate (i.e., the rate of correctly indicating same face halves).

The particular way of defining same and different trials and the use of only the hit rate led to the criticism that non-perceptual strategies may have affected the results (Cheung et al., 2008). First, Cheung and colleagues argued that the frequency of same and different unattended face halves should be balanced to avoid induction of bias toward the "different" response category. As shown in **Figure 2** (see gray boxes), the design used by Rossion and colleagues [called the "partial design" (PD) by Cheung and colleagues] includes more different halves than same halves, which might bias an observer's response strategy toward "different" responses. Second, they argued that, generally, any measure of feature integration should not be affected by an observer's response strategies. In line with the theory of signal detection, they claimed that a bias-free measure of performance should be used. Such a measure can only derive from the performance achieved for both response categories (MacMillan and Creelman, 2005, p. 6).
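The imbalance in the partial design can be made concrete by tallying how often a pair of halves is the same or different across trial types. The following is a minimal sketch, assuming every trial type occurs equally often; the names `PARTIAL_DESIGN`, `COMPLETE_DESIGN`, and `half_relation_counts` are illustrative and do not come from the cited studies.

```python
from collections import Counter

# Each trial type is (relation of target halves, relation of non-target halves).
# Assumption: all trial types are presented equally often.
PARTIAL_DESIGN = [
    ("same", "different"),       # "same" trial
    ("different", "different"),  # "different" trial
]
COMPLETE_DESIGN = [
    ("same", "same"),            # congruent "same" trial
    ("different", "different"),  # congruent "different" trial
    ("same", "different"),       # incongruent "same" trial
    ("different", "same"),       # incongruent "different" trial
]


def half_relation_counts(design):
    """Count same vs. different half pairs (target and non-target) in a design."""
    counts = Counter()
    for target, non_target in design:
        counts[target] += 1
        counts[non_target] += 1
    return counts


partial = half_relation_counts(PARTIAL_DESIGN)
complete = half_relation_counts(COMPLETE_DESIGN)
print("partial design: ", dict(partial))
print("complete design:", dict(complete))
```

Under these assumptions, different half pairs outnumber same half pairs in the partial design, whereas the two are equally frequent in the complete design.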

To construct a design with an equal number of same and different halves they proposed to compose same- and different-trials in *congruent* (see 1st row in **Figure 2**) and *incongruent* (see 2nd row in **Figure 2**) variants, and referred to this partitioning as the "complete design," CD. To use a bias-free measure, they proposed using *d*′, which is calculated from the relative frequency data of both response categories. Further, to measure how face halves interact the authors suggested using the performance difference achieved with congruent and incongruent trials, the *congruency effect* (CE). The authors pointed out that comparison of aligned and misaligned conditions is possible with the CD, but it is not necessary (Cheung et al., 2008, p. 1328), because the CE included the effect of interest with all aligned stimuli.
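As an illustration of how such a measure might be computed, the sketch below derives a sensitivity index from "same" responses in both trial classes and takes the congruent minus incongruent difference as the CE. It assumes the simple yes/no formula (z-transformed hit rate minus z-transformed false alarm rate) and a log-linear correction for extreme rates; same/different tasks are sometimes analyzed with other decision models, and the source does not state which variant was used. All function names and the trial counts are hypothetical.

```python
from statistics import NormalDist


def z(p):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(p)


def corrected_rate(count, n):
    """Log-linear correction (an assumption) keeping rates away from 0 and 1."""
    return (count + 0.5) / (n + 1)


def d_prime(hits, n_same, false_alarms, n_different):
    """Sensitivity from 'same' responses on same trials (hits) and on
    different trials (false alarms), using the yes/no formula."""
    hit_rate = corrected_rate(hits, n_same)
    fa_rate = corrected_rate(false_alarms, n_different)
    return z(hit_rate) - z(fa_rate)


def congruency_effect(congruent, incongruent):
    """CE as d' in congruent trials minus d' in incongruent trials."""
    return d_prime(**congruent) - d_prime(**incongruent)


# Hypothetical counts for one observer, 50 same and 50 different trials each.
congruent = dict(hits=45, n_same=50, false_alarms=5, n_different=50)
incongruent = dict(hits=30, n_same=50, false_alarms=15, n_different=50)
print(round(congruency_effect(congruent, incongruent), 2))
```

Because both hits and false alarms enter the computation, a general tendency to answer "same" or "different" shifts the two z-scores together and largely cancels out of the difference.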

While performance in the congruent trials is largely unaffected by the global or local focus on the face stimuli, performance in incongruent trials can only be good if the observer can attend to only the target half and ignore the non-target half, as the non-target halves vary orthogonally to the target halves and are the same when the target halves are different and vice versa. An observer who is unable to selectively attend to the face halves and instead integrates across the two halves would perform well in congruent trials, but at chance levels in incongruent trials, which would result in a maximal CE. On the other hand, if the observer is able to maintain a part-based focus on only the target halves, performance would become equal in congruent and incongruent trials; thus, the CE would vanish.
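The two limiting cases described above can be sketched as deterministic, noise-free observers. This is an idealization for illustration only: the observer functions and the `TRIAL_TYPES` table are hypothetical constructs, not a model proposed in the cited work.

```python
# Trial types of the complete design:
# (congruency, correct response) -> (target relation, non-target relation)
TRIAL_TYPES = {
    ("congruent", "same"): ("same", "same"),
    ("congruent", "different"): ("different", "different"),
    ("incongruent", "same"): ("same", "different"),
    ("incongruent", "different"): ("different", "same"),
}


def holistic_observer(target, non_target):
    """Judges whole faces: responds 'same' only if both halves are the same."""
    if target == "same" and non_target == "same":
        return "same"
    return "different"


def part_based_observer(target, non_target):
    """Attends only to the target halves and ignores the non-target halves."""
    return target


def accuracy(observer, congruency):
    """Proportion of correct responses over the trial types of one congruency."""
    trials = [
        (correct, relations)
        for (cong, correct), relations in TRIAL_TYPES.items()
        if cong == congruency
    ]
    n_correct = sum(observer(*relations) == correct for correct, relations in trials)
    return n_correct / len(trials)


for name, observer in [("holistic", holistic_observer),
                       ("part-based", part_based_observer)]:
    print(name, accuracy(observer, "congruent"), accuracy(observer, "incongruent"))
```

In this idealization the holistic observer answers "different" on every incongruent trial, so it is correct on only half of them, while the part-based observer performs equally in both trial classes.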

Favoring a perceptual account of facial feature integration, one may be tempted to analyze only the "same" trials, and to disregard the "different" trials (Rossion, 2013, p. 42). However, ignoring performance achieved with one trial class may seriously confound perceptual and non-perceptual sources of the observer's decisions. In this context, it is important to note that the CD is only an experimental design and it does not favor any theoretical account of object processing. As outlined below, it is possible to derive testable hypotheses for the perceptual account of the composite effect within the CD. Advantageously, these hypotheses can be tested using bias-free measurements of performance in a same/different forced choice task.

#### **1.2. TESTING THE PERCEPTUAL AND THE DECISIONAL ACCOUNT OF THE COMPOSITE EFFECT WITH THE COMPLETE DESIGN**

Some authors regard the composite effect as a visual illusion that stems from perceptual integration of upper and lower face halves (Rossion, 2008, 2013). To make the perceptual account of the composite effect more explicit, one may conceive an ideal "holistic" observer who refers to a whole face as the smallest perceptual unit when exposed to natural and intact face stimuli. However, this notion is just an ideal, because human observers can take a part-based focus of facial stimuli (Meinhardt-Injac et al., 2010, 2011). As outlined above, this observer would yield a large congruency effect in the complete design. Moreover, she/he would show a unique response pattern in incongruent trials (see **Figure 1A**). When exposed to the "same" trials, she/he should tend to respond "different" because the wholes formed by fusing the upper and lower halves are different. In the "different" trials she/he should also tend to respond "different" because the wholes are also different. That is, an observer who relies on the perceptual integration of the upper and lower halves should exhibit a strong response bias toward the "different" response category in incongruent trials. Conversely, in congruent trials, she/he should exhibit no response bias because the wholes are the same in the "same" trials and different in the "different" trials. This means that a unique and testable prediction exists for the perceptual account of the congruency effect in the CD.

**Prediction 1.** *Suppose in a same/different face matching experiment in the complete design upper and lower face halves are perceptually integrated into a unified whole facial percept, and the observer relies on this percept in most of the trials when she/he decides about the identity of face halves. In this case a large congruency effect will exist with a strong bias toward "different" responses in incongruent trials and no bias toward either response category in congruent trials.*

This prediction has an important implication for the conclusions that can be drawn from the absence of response bias in incongruent trials. As it is implied by Prediction 1, a bias toward "different" responses in incongruent trials is a *necessary condition* for the perceptual account. If the bias is not observed, it can be concluded that the subject's response behavior is not guided by a unified whole facial percept (i.e., she/he is no "holistic" observer). On the other hand, when the scheme of results is observed as postulated by Prediction 1, it does not offer conclusive evidence that a unified whole facial percept underlies the response behavior because alternative sources may yield the same result. However, it is good evidence because the crucial observation is a complex one that comprises three coincident components.
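One way to quantify the response bias referred to in Prediction 1 is the criterion index from signal detection theory. The sketch below is an assumption about how bias could be scored, not necessarily the measure used in this study: "same" responses on same trials count as hits, "same" responses on different trials count as false alarms, and a positive value indicates a tendency toward "different" responses. The function name `criterion` and the example rates are hypothetical.

```python
from statistics import NormalDist


def criterion(hit_rate, false_alarm_rate):
    """Criterion c = -(z(H) + z(F)) / 2.

    With 'same' treated as the signal response, c > 0 means the observer
    says 'same' less often than an unbiased observer would, that is,
    a bias toward 'different'; c near 0 means no bias.
    """
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))


# Hypothetical rates illustrating the pattern stated in Prediction 1.
c_congruent = criterion(hit_rate=0.90, false_alarm_rate=0.10)
c_incongruent = criterion(hit_rate=0.55, false_alarm_rate=0.15)
print(round(c_congruent, 2), round(c_incongruent, 2))
```

With these illustrative rates the congruent criterion is close to zero and the incongruent criterion is positive, which is the combination the perceptual account requires alongside a large congruency effect.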

Let us now turn to the alternative view that face halves are perceived and encoded as independent parts, but interact at the decisional level (Richler et al., 2008a,b). As far as we see, the kind of interaction at the decisional level has not yet been explicated such that testable predictions can be derived concerning the nature of response bias (see Discussion, in Cheung et al., 2008). This lack of explanation is clearly a drawback. However, as the researchers pointed out, the interaction of face halves is stronger for faces than for other objects and it occurs automatically, while non-face objects need training or aiding context (Gauthier et al., 2003; Richler et al., 2009a). The degree of part interaction is expected to increase with increasing object expertise (Gauthier and Tarr, 2002; Gauthier et al., 2003; Richler and Gauthier, 2013). From this, it follows that there should be a strong congruency effect for faces, but not for non-familiar non-face objects. For the nature of the bias, no specific prediction is possible with the explication of this theory.

As outlined above, the nature of errors in incongruent trials is particularly important to understand the way face halves interact. The "holistic" observer is not expected to be prone to wrongly saying "same" when the target halves are different because then the target halves *and* the wholes are different. Instead, she/he is prone to wrongly saying "different" when the target halves are the same because the wholes are different. Hence, a case in which errors of both kinds are equally likely in incongruent trials (i.e., there is no bias toward either response category) would offer strong evidence that the observer does not rely on an unparsed whole facial representation. However, a strong congruency effect means that the observer makes many errors in the incongruent trials. While the absence of a "different" bias in incongruent trials would speak against a perceptual account of holistic processing, comparisons with the results for non-face objects are necessary to decide whether the congruency effect might reflect, at least partly, response interference, as with the Stroop effect (Richler et al., 2009b). Response interference should affect faces and non-face objects alike. However, if congruency effects were negligible for non-face objects but substantial for faces, this finding would speak against response interference and would suggest a decisional account for the interaction among the face halves (see Discussion).

#### **1.3. TASK CONSTRAINTS**

The same/different matching task used to study the composite effect involves categorization at the individual level, which is an important task constraint (see Discussion). Schyns and colleagues (Smith et al., 2004; van Rijsbergen and Schyns, 2009) recorded the early perceptual and face selective N170 and the P300, which reflects activation involved in categorial decisions (Goodale and Milner, 1992), while subjects made categorial decisions about faces (e.g., gender, facial expression). They found evidence for face specific encoding at early stages, but not much selectivity for the diagnostic features of the given categorization task. Modulation by mostly task-relevant diagnostic features was found only for the P300. Measuring the selectivity for spatial frequencies showed that the N170 was sensitive to both low and high spatial frequencies, while the P300 responded mainly to the high spatial frequencies of task-relevant diagnostic features (Smith et al., 2004). From these results the authors concluded that categorial decisions about objects are made at a later stage that transforms and reorganizes detailed diagnostic features.

The findings of Schyns and colleagues indicate that variation of task constraints can offer valuable clues about the functional role and the locus of feature integration in face perception. The difficulty of ignoring irrelevant context can be modulated by informing the observer early or late in the trial which object parts are to be compared. With an early cue the observer can try to attend to only diagnostic features, and to ignore irrelevant context. When the cue comes late in the trial, the observer must encode relevant and irrelevant features, and recall only the relevant features at decision. Therefore, contextual influence should be larger in the late cue condition. Second, feedback about correctness can help the observer to control contextual influence, and to optimize attentional selection. In a recent study (Meinhardt-Injac et al., 2011) it was shown that observers were able to use trial-by-trial feedback to regulate the influence of irrelevant external context features on relevant internal features. Strong improvement in accuracy was observed compared to the no-feedback condition, indicating that feedback indeed helped observers to attend to the diagnostic features.

Because faces and non-face objects differ in their degree of part integration, early/late cueing and feedback should modulate the congruency effect differently for the two stimulus categories. Contextual influence is expected to be moderate for non-face objects. Therefore, the modulating influence of early/late cueing and feedback should also be small. In contrast, congruency effects for faces are expected to be substantial. The temporal cue position and feedback should therefore be crucial for controlling the influence of irrelevant facial features.

Using the CD enables us to characterize the nature of feature integration by judging congruency effects along with response bias. In particular, the perceptual account of the composite effect can be tested within the framework of the CD. Additionally, variation of constraints for attending diagnostic features and providing feedback or not can be used as a further means to strengthen a differential results pattern for faces and non-face objects. In this study we demonstrate that the CD is suitable for revealing different processing schemes for face and non-face objects reliably.

## **2. MATERIALS AND METHODS**

#### **2.1. STUDY OUTLINE**

As in a previous study using the CD (Richler et al., 2009c), we used a same/different face matching task in which a composite study image was shown for a longer time interval (800 ms), followed by a composite test image shown for a shorter time interval (433 ms, see **Figure 4**). A cue informed the observer which halves, the upper or lower, were to be attended. The observers' task was to decide, as accurately as possible, whether the cued halves were the same or different. One group of participants received acoustical trial-by-trial feedback about correctness, the other received no feedback. The temporal position of the target cue was varied to modulate the constraints for attending diagnostic features. When the target cue coincides with the study image, the observer can adjust his/her attentional focus to only the target half and maintain it throughout the trial. When the target cue comes briefly before the test, the observer must encode the whole stimulus at study and then shift his/her attention toward the target half at test. Hence, an effective part-based strategy is possible if the observer is certain from the beginning of the trial about which halves are to be matched (Riesenhuber et al., 2004; Riesenhuber and Wolff, 2009). With variations in feedback and temporal cue position it is possible to measure performance under conditions where observers have good attentional control and learning opportunities (cue at the beginning of the trial and trial-by-trial feedback) and measure that point at which attentional control is hampered and the decision behavior cannot be optimized by cognitive markers (no feedback and cue briefly before test image). These conditions should illuminate whether faces and non-face objects differ concerning the efficient extraction of diagnostic cues for identity matching of halves. 
If feature integration across halves is mandatory for faces, faces should be less efficient in this respect because the influence of irrelevant features remains, and interferes with piecemeal analysis.

#### **2.2. PARTICIPANTS**

Fifty-one subjects participated in the experiment with face stimuli: 24 in the no-feedback group and 27 in the feedback group. Thirty-eight subjects participated in the experiment with non-face stimuli: 19 in the feedback and 19 in the no-feedback group. In all groups, the proportion of female participants was about 65%. All participants were undergraduate students of psychology at the Johannes Gutenberg University Mainz, aged between 20 and 24 years. Subjects had normal or corrected-to-normal vision, using corrective lenses in the latter case. All subjects were naive with respect to the purpose of the experiment. They were given course credit points for participation. The study was conducted in accordance with the Declaration of Helsinki. In detail, subjects participated voluntarily and gave written informed consent for their participation. In addition, participants were informed that they were free to stop the experiment at any time without negative consequences, and that their data would be removed from the panel. The data were analyzed anonymously.

#### **2.3. APPARATUS**

The experiment was executed with Inquisit runtime units. Stimuli were displayed on NEC Spectra View 2040 TFT displays in 1280 × 1024 resolution at a refresh rate of 60 Hz. Screen mean luminance *L*<sub>0</sub> was 100 cd/m<sup>2</sup> at a Michelson contrast of (*L*<sub>max</sub> − *L*<sub>min</sub>)/(*L*<sub>max</sub> + *L*<sub>min</sub>) = 0.98. No gamma correction was used. The room was darkened so that the ambient illumination matched that of the screen. Stimuli were viewed binocularly at a distance of 70 cm. Subjects used a distance marker but no chin rest throughout the experiment. Stimulus size was 250 × 350 pixels (width × height), which corresponded to 10 × 12.5 cm of the screen, or 8° × 10° measured in degrees of visual angle at 70 cm viewing distance. Stimulus position jittered randomly within a region of ± 50 pixels around the center of the screen to preclude pixel matching strategies between two subsequent stimulus presentations. Masks subtended 350 × 450 pixels (width × height), and their position was always fixed at the screen center. They were constructed from randomly ordered 5 × 5 pixel blocks of the prior image shown. Subjects provided responses on an external key-pad, and wore light headphones for acoustical feedback in the feedback condition.
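The conversion from physical stimulus size to visual angle can be checked with a few lines of code. The following is a minimal sketch that assumes the usual full-angle formula 2·atan(size / (2·distance)); the function name `visual_angle_deg` is introduced here for illustration only and is not part of the study's software.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) subtended by an extent viewed head-on."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Stimulus extent on the screen (10 x 12.5 cm) and viewing distance (70 cm)
# are taken from the apparatus description.
width_deg = visual_angle_deg(10.0, 70.0)
height_deg = visual_angle_deg(12.5, 70.0)
print(f"{width_deg:.1f} x {height_deg:.1f} deg")
```

The result rounds to the reported values of about 8° × 10°.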

## **2.4. STIMULI**

## *2.4.1. Face stimuli*

Photographs of 20 male models were used for stimulus construction. The models gave written consent for scientific use and publication of their face images. These photographs were frontal view shots of the whole face, captured in a professional photo studio under controlled lighting conditions. The original images were edited using Adobe Photoshop CS4 to generate the set of stimuli used in the experiment. Photographs were initially converted to 8 bit grayscale pictures and superimposed with an elliptical frame mask to obliterate all external facial features, such as hair, ears, or chin line. The elliptical cutouts were then split horizontally at the bridge of the nose, thus yielding 20 upper and 20 lower face halves. Each upper half was recombined with three lower halves to constitute a final set of 60 compound faces. The cutline between the face halves was concealed with a white bar 5 pixels in thickness. It was ensured that no upper half was ever recombined with the lower half of the same original face. In addition, each of the 20 lower and upper halves appeared exactly three times in the final set of stimuli.
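The recombination constraints (60 composites, no half paired with its own counterpart, every half used exactly three times) can be satisfied in many ways. The sketch below shows one hypothetical scheme, a cyclic shift of the lower halves; the text does not state how the pairings were actually chosen, so `build_composites` and its assignment rule are assumptions for illustration.

```python
from collections import Counter


def build_composites(n_faces: int = 20, per_upper: int = 3) -> list[tuple[int, int]]:
    """Pair upper half u with lower halves (u+1 .. u+per_upper) mod n_faces.

    Hypothetical assignment rule: a cyclic shift guarantees that no upper half
    meets its own lower half as long as per_upper < n_faces.
    """
    return [
        (upper, (upper + shift) % n_faces)
        for upper in range(n_faces)
        for shift in range(1, per_upper + 1)
    ]


pairs = build_composites()
upper_counts = Counter(upper for upper, _ in pairs)
lower_counts = Counter(lower for _, lower in pairs)
print(len(pairs), set(upper_counts.values()), set(lower_counts.values()))
```

Any other assignment that meets the three constraints would serve equally well; the cyclic shift is merely the simplest one to verify.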

## *2.4.2. Non-face stimuli*

Twenty watches were used for the non-face stimuli. Watches were sampled from internet sources, and selected such that they had high overall resemblance, showed the same time, and had non-salient distinctive single features. The images were converted to grayscale and matched on lightness and contrast. The cutline for subdividing into upper and lower halves was exactly through the midpoint of the clock face. All external features were removed using a circular frame mask that contained only the clock face of the watches with numbers and hour hands. Stimulus examples are shown in **Figure 3**. As for the faces, a final set of 60 composite stimuli was constructed.

## **2.5. PROCEDURE**

A same/different forced choice matching task was used. Subjects were informed that face pairs could differ in the cued and non-cued halves and that object matching was to be done upon just the cued halves. The temporal order of events in a trial sequence was: fixation mark (750 ms) → blank (300 ms) → study face stimulus (800 ms) → mask (400 ms) → blank (800 ms) → test face stimulus (433 ms) → mask (400 ms) → blank frame until response (see **Figure 4**). The allocation of participants to the feedback and the no-feedback group was random. Subjects were made familiar with the task by going through randomly selected probe trials to ensure that the instructions were understood and could be put into practice. All subjects completed two cue conditions. In the "cue 1st" condition a rectangular bracket marking the target face half was shown simultaneously with the study face, and remained until the test face was masked. In the "cue 2nd" condition the cue presentation began with the mask of the study face. A trial was deemed congruent (CC) when the non-cued half of the face was different in "different" trials and same in "same" trials, and it was considered incongruent (IC) when the non-cued half was same in "different" trials and different in "same" trials.
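The congruency definition amounts to asking whether the non-cued halves call for the same response as the cued halves. The following minimal sketch makes this explicit and also sums the fixed event durations of one trial; the `Trial` class and the function names are illustrative assumptions, not part of the experiment software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    cued_same: bool      # cued (target) halves identical at study and test?
    noncued_same: bool   # non-cued halves identical at study and test?


def congruency(trial: Trial) -> str:
    """'CC' if the non-cued halves agree with the correct response, else 'IC'."""
    return "CC" if trial.cued_same == trial.noncued_same else "IC"


def correct_response(trial: Trial) -> str:
    """Only the cued halves determine the correct answer."""
    return "same" if trial.cued_same else "different"


# Fixed-duration events of one trial (ms), in the order given in the procedure.
TIMELINE_MS = [
    ("fixation mark", 750), ("blank", 300), ("study stimulus", 800),
    ("mask", 400), ("blank", 800), ("test stimulus", 433), ("mask", 400),
]
fixed_duration_ms = sum(duration for _, duration in TIMELINE_MS)

for trial in (Trial(True, True), Trial(False, False),
              Trial(True, False), Trial(False, True)):
    print(congruency(trial), correct_response(trial))
print(fixed_duration_ms, "ms before the response interval")
```

In incongruent trials the whole composite changes in a way that conflicts with the correct answer, which is why the type of error made in these trials is diagnostic.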

For each stimulus class the experimental design was a 2 (Feedback) × 2 (Cue position) × 2 (Congruency) × 2 (Target half) factorial plan. Feedback was implemented as a between-subjects factor; all others were within-subjects factors. Each condition was measured with 16 "same" and 16 "different" trials. Trials were shuffled and assembled in a randomly ordered measurement list, but with cue position ordered in blocks<sup>1</sup>. The two blocks, interleaved by a brief pause, were administered on a single day. Each block lasted about 15 min.
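The within-subjects part of the design can be enumerated to see how many trials it implies. This is a sketch under the assumption that "each condition" refers to one cue position × congruency × target half cell, so that each cell contains 16 "same" and 16 "different" trials; the variable names are illustrative.

```python
from itertools import product

# Within-subjects factors; feedback is between subjects and therefore omitted.
FACTORS = {
    "cue_position": ("cue 1st", "cue 2nd"),
    "congruency": ("CC", "IC"),
    "target_half": ("upper", "lower"),
    "trial_type": ("same", "different"),
}
REPS_PER_CELL = 16  # assumed: 16 trials per cell and trial type

cells = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]
trials_per_subject = len(cells) * REPS_PER_CELL
trials_per_block = trials_per_subject // len(FACTORS["cue_position"])
pooled_n = REPS_PER_CELL * len(FACTORS["target_half"])  # after pooling halves

print(len(cells), trials_per_subject, trials_per_block, pooled_n)
```

Under this reading, pooling over the two target halves gives 32 trials per trial type in each cue position × congruency cell.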

## **2.6. DEPENDENT MEASURES AND DATA TRANSFORMATIONS**

<sup>1</sup>Pilot measurements showed that having cue position randomly interleaved rendered the task too difficult.

For the same/different experiment the "same" response category was defined as the target category. Accordingly, hit-rate (Hit) was defined as the rate of correctly identifying same target halves and correct rejection rate (CR) was defined as the rate of correctly identifying different target halves. False alarm rate (FA) and the rate of misses (Miss) were defined as being the complementary rates to CR and Hit, respectively. Rates were estimated by pooling across the relative frequencies obtained for upper and lower half matching. The relative frequency data were transformed into *d*′ according to

$$d' = z(\text{CR}) - z(\text{Miss}).\tag{1}$$

In Equation (1), *z* is the quantile function of the standard normal distribution. If the standard scale is shifted leftward by *d*′/2, the fair response criterion is located at the origin (see Appendix). By calculating the response criterion *c* on this scale

$$c = z(\text{CR}) - \frac{d'}{2} \tag{2}$$

response bias can be evaluated because positive values of *c* mean that the observer prefers "different" responses, while negative values of *c* indicate that she/he prefers the "same" response category (see **Figure A1**).

A bias measure can alternatively be defined in terms of the error proportion:

$$q = \frac{\text{Miss}}{\text{Miss} + \text{FA}}.\tag{3}$$

If *q* = 0.5, then both kinds of errors are made with the same frequency. A ratio of *q* > 0.5 indicates a tendency to say "different" while *q* < 0.5 indicates a preference toward "same" responses. The error proportion measure, *q*, has the advantage that it is easy to interpret. For example, a value of *q* = 0.7 means that 70% of all errors are wrong "different" responses and 30% are wrong "same" responses.

A further way to assess response bias is to look at the odds-ratio statistics. The odds-ratio of both errors is defined as

$$OR = \frac{\text{Miss/Hit}}{\text{FA/CR}}.\tag{4}$$

The odds-ratio is a straightforward way to assess how much higher the odds are for wrong "different" responses compared to wrong "same" responses.
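The four measures defined in Equations (1) to (4) can be computed together from a hit rate and a correct rejection rate. The following is a minimal sketch using the Python standard library; the function name `sdt_measures` and the example rates are assumptions for illustration, and the rates must lie strictly between 0 and 1.

```python
from statistics import NormalDist

# Quantile function z(.) of the standard normal distribution.
z = NormalDist().inv_cdf


def sdt_measures(hit: float, cr: float) -> dict[str, float]:
    """Sensitivity and bias measures with "same" as the target category."""
    miss = 1.0 - hit   # wrong "different" responses
    fa = 1.0 - cr      # wrong "same" responses
    d_prime = z(cr) - z(miss)               # Equation (1)
    c = z(cr) - d_prime / 2.0               # Equation (2): c > 0 favors "different"
    q = miss / (miss + fa)                  # Equation (3): share of misses among errors
    odds_ratio = (miss / hit) / (fa / cr)   # Equation (4)
    return {"d_prime": d_prime, "c": c, "q": q, "OR": odds_ratio}


# Hypothetical observer: 80% hits, 90% correct rejections.
example = sdt_measures(hit=0.80, cr=0.90)
print({name: round(value, 3) for name, value in example.items()})
```

For this hypothetical observer misses are twice as frequent as false alarms, so *c* is positive, *q* is two thirds, and the odds-ratio exceeds 1, all three indicating a preference for "different" responses.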

#### **2.7. DATA ANALYSIS**

Agglomerating the rates for upper half and lower half matching resulted in *N* = 32 replications for each trial type. If CR or Miss rates were zero or unity, they were corrected to 1/(2*N*) and 1 − 1/(2*N*), respectively, before *d*′ data were calculated (MacMillan and Creelman, 2005, p. 8). The *d*′ data were analyzed with ANOVA with feedback as the grouping factor and cue position and congruency as repeated measurement factors. Separate analyses were carried out for faces and watches. Congruency effects were calculated from the *d*′ data by taking the difference *CE* = *d*′(*CC*) − *d*′(*IC*) on the level of individual subjects.

## **3. RESULTS**
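The correction for extreme rates and the per-subject congruency effect can be sketched as follows. The helper names are hypothetical and the numerical inputs are made up for illustration; only the correction rule and the difference definition are taken from the text.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal


def correct_rate(rate: float, n: int = 32) -> float:
    """Replace rates of 0 or 1 by 1/(2N) and 1 - 1/(2N), respectively."""
    if rate <= 0.0:
        return 1.0 / (2 * n)
    if rate >= 1.0:
        return 1.0 - 1.0 / (2 * n)
    return rate


def d_prime(cr: float, miss: float, n: int = 32) -> float:
    """Equation (1), applied after correcting extreme rates."""
    return z(correct_rate(cr, n)) - z(correct_rate(miss, n))


def congruency_effect(d_cc: float, d_ic: float) -> float:
    """CE = d'(CC) - d'(IC) for one subject."""
    return d_cc - d_ic


# Hypothetical subject: flawless in congruent trials, error-prone in incongruent ones.
d_cc = d_prime(cr=1.0, miss=0.0)
d_ic = d_prime(cr=0.875, miss=0.25)
print(round(d_cc, 2), round(d_ic, 2), round(congruency_effect(d_cc, d_ic), 2))
```

Without the correction, a flawless cell would give an infinite *d*′; with *N* = 32 the corrected value is finite, slightly above 4.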

#### **3.1. MATCHING ACCURACY**

**Figure 5** shows the data for faces and watches as Box-Whisker plots. Widely different results were obtained for faces and watches. The ANOVA results for faces (see **Table 1**) indicated a strong effect of cue position [*F*(1, 49) = 88.8, *p* = 1.4 · 10<sup>−12</sup>, η<sub>p</sub><sup>2</sup> = 0.644] and a strong effect of congruency [*F*(1, 49) = 132, *p* = 1.4 · 10<sup>−15</sup>, η<sub>p</sub><sup>2</sup> = 0.73]. The effect of congruency was strongly modulated by cue position [*F*(1, 49) = 30.0, *p* = 1.5 · 10<sup>−6</sup>, η<sub>p</sub><sup>2</sup> = 0.379], and, to a smaller degree, by feedback [*F*(1, 49) = 4.29, *p* = 0.044, η<sub>p</sub><sup>2</sup> = 0.081]. There was no main effect of feedback [*F*(1, 49) = 0.03, *p* = 0.968, η<sub>p</sub><sup>2</sup> = 0.001], and cue position and feedback did not interact [*F*(1, 49) = 0.22, *p* = 0.64, η<sub>p</sub><sup>2</sup> = 0.004].

For watches (see **Table 2**), there was a strong effect of cue position [*F*(1, 36) = 107, *p* = 2.5 · 10<sup>−12</sup>, η<sub>p</sub><sup>2</sup> = 0.748] and a smaller effect of congruency [*F*(1, 36) = 8.62, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.193]. The latter effect depended neither on cue position [*F*(1, 36) = 0.57, *p* = 0.456, η<sub>p</sub><sup>2</sup> = 0.016] nor on feedback [*F*(1, 36) = 0.02, *p* = 0.897, η<sub>p</sub><sup>2</sup> < 0.001]. As for faces, there was no main effect of feedback [*F*(1, 36) = 0.15, *p* = 0.701, η<sub>p</sub><sup>2</sup> = 0.004], and feedback and cue position did not interact [*F*(1, 36) = 2.30, *p* = 0.139, η<sub>p</sub><sup>2</sup> = 0.06].

#### **3.2. CONGRUENCY EFFECTS**

**Figure 6** shows the congruency effects (CE) for faces (open symbols) and watches (filled symbols) as Box-Whisker plots<sup>2</sup>. For watches, a significant congruency effect existed in only one condition, when the cue came at the second position in the absence of feedback [*CE* = 0.404, *t*(18) = 2.969, *p* = 0.008]. The lack of any interactions of congruency with cue position or feedback (see above) indicates that these factors did not modulate the congruency effect (see **Table 2**). Further, the analysis yielded no interaction of all three factors [feedback × cue position × congruency, *F*(1, 36) = 2.30, *p* = 0.138, η<sub>p</sub><sup>2</sup> = 0.06].

Congruency effects for faces were strong, ranging from about 0.75 *d*′ units (cue1st with feedback) to 1.75 *d*′ units (cue2nd without feedback). Congruency effects for faces depended largely on cue position, and were much larger when the cue came at the second position [*CE* = 0.565, *F*(1, 49) = 30.0, *p* = 1.5 · 10<sup>−6</sup>]. Congruency effects were also stronger without than with feedback [*CE* = 0.416, *F*(1, 49) = 4.29, *p* = 0.044]. Feedback and cue position were found to modulate the congruency effect independently, since the higher level interaction among all three factors failed to reach significance [feedback × cue position × congruency, *F*(1, 49) = 2.39, *p* = 0.128, η<sub>p</sub><sup>2</sup> = 0.047].

Hence, we found a clear pattern for congruency effects. For watches, the data yielded a consistent tendency to perform better in congruent contexts compared to incongruent contexts (see **Figure 5**). However, congruency effects remained marginal, clearly below half a *d*′ unit, and did not depend on cue position or feedback. For faces, in contrast, there were large congruency effects, which were strongly modulated by cue position (η<sub>p</sub><sup>2</sup> = 0.379), and, to a minor degree, by feedback (η<sub>p</sub><sup>2</sup> = 0.081).

#### **3.3. RESPONSE BIAS**

**Figure 7** shows the response criterion *c* for faces (upper panels, A) and watches (lower panels, B) as Box-Whisker plots. **Tables 3**, **4** show detailed results, including both the *c* and the *q* measure, miss and false alarm rates, overall error rate *pe*, and odds ratio of misses and false alarms. To judge response bias statistically it has to be verified whether the mean *c* value is significantly above ("different" bias), or below ("same" bias) the expected value 0, as indicated by the Whiskers<sup>3</sup>. For faces, there was only one significant bias in the feedback condition (see left upper panel of **Figure 7**), where a tendency toward "same" responses existed for congruent trials when the cue came at the second position [*c* = −0.14, *t*(26) = −5.03, *p* = 3.1 · 10<sup>−5</sup>]. There was no response bias in the absence of feedback in congruent contexts; however, a pronounced tendency toward "different" responses existed in incongruent trials [cue1st: *c* = 0.27, *t*(23) = 5.19, *p* = 2.9 · 10<sup>−5</sup>; cue2nd: *c* = 0.24, *t*(23) = 4.80, *p* = 7.6 · 10<sup>−5</sup>]. To judge bias it is also important how many errors occurred in a given condition because response bias is of practical relevance only if a substantial number of errors are made. This was the case for incongruent trials in the absence of feedback. Here, 14% misses stood against 5.3% false alarms when the cue came first (*pe* = 9.7%), and 29.8% misses compared to 15.5% false alarms when the cue came at the second position (*pe* = 22.6%). Response bias did not occur in the feedback condition in incongruent trials when the cue came at the second position, although the error rates were rather high (*pe* = 18.8%, see last line of **Table 3**). Instead, there was a "same" bias in congruent trials, but there, the error rate was moderate, with 9.3% false alarms compared to 5.5% misses (*pe* = 7.4%, see 2nd last line of **Table 3**). This indicates that trial-by-trial feedback influenced the subjects' response strategies. Comparing the likelihood of both kinds of errors with the odds-ratio statistics confirmed this result. In the absence of feedback and in incongruent trials the chance for wrong "different" responses was more than double the chance for wrong "same" responses when the cue came at the second position, and nearly threefold when the cue came at the first position. With feedback and in congruent trials the chance for wrong "different" responses was nearly halved when the cue came at the second position. All other odds-ratios are about 1, which indicates balanced chances for errors of both kinds.

<sup>2</sup>Note that, since the CE is defined as a difference measure (see Materials and Methods), congruency effects are significant at the 5% alpha level if 0 is outside the confidence interval of the mean, which is easily seen from the Whiskers. We do not report ANOVA tables for the CE measure, since the results are identical with those for all interactions involving congruency at the original *d*′ data (see **Tables 1**, **2**). We report results from pairwise tests necessary to judge differences in the magnitude of the CE. However, these tests also coincide with the tests for the interactions involving the congruency factor, since congruency and feedback have only 2 levels.

<sup>3</sup>Note that ANOVA of the *c* data does not indicate whether the values deviate significantly from 0. Therefore, results of separate *t*-statistics for each condition are listed in **Tables 3**, **4**.

*The table shows source of variation, sum of squares (SS), degrees of freedom (df), variance estimate (σ̂<sup>2</sup>), F-ratio (F), significance level, p, and partial eta-squared, η<sub>p</sub><sup>2</sup>.*
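The miss and false alarm rates quoted for faces can be inserted into the definitions of the error rate, the error proportion *q*, and the odds-ratio to retrace the verbal summary. The sketch below is a plausibility check on the group-level rates only; the values in **Table 3** may have been aggregated differently (for example as means of per-subject ratios), and the function and dictionary names are illustrative.

```python
def bias_summary(miss: float, fa: float) -> dict[str, float]:
    """Error rate, error proportion q, and odds-ratio from miss and FA rates."""
    hit, cr = 1.0 - miss, 1.0 - fa
    return {
        "pe": (miss + fa) / 2.0,
        "q": miss / (miss + fa),
        "OR": (miss / hit) / (fa / cr),
    }


# Rates as quoted in the text for face stimuli (proportions, not percent).
reported = {
    "no feedback, IC, cue 1st": bias_summary(miss=0.140, fa=0.053),
    "no feedback, IC, cue 2nd": bias_summary(miss=0.298, fa=0.155),
    "feedback, CC, cue 2nd": bias_summary(miss=0.055, fa=0.093),
}
for condition, values in reported.items():
    print(condition, {name: round(value, 2) for name, value in values.items()})
```

The resulting odds-ratios are close to 3, somewhat above 2, and somewhat above 0.5, in line with "nearly threefold", "more than double", and "nearly halved".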

For watches the *c* values were negative in all conditions, which indicates a global bias toward "same" responses. However, statistical significance was reached only in two conditions, congruent trials when the cue came at the second position, in the presence of feedback [*c* = −0.16, *t*(18) = −2.39, *p* = 0.028], and in its absence [*c* = −0.17, *t*(18) = −2.59, *p* = 0.018]. In both conditions a significant proportion of errors occurred (see **Table 4**). Testing first against second cue position for congruent trials revealed a stronger "same" bias at the second cue position in the presence of feedback [*c* = 0.124, *F*(1, 36) = 6.46, *p* = 0.015] and in its absence [*c* = 0.123, *F*(1, 36) = 6.31, *p* = 0.017]. In incongruent trials no corresponding differences were found [feedback: *c* = 0.03, *F*(1, 36) = 0.43, *p* = 0.518; no feedback: *c* = −0.05, *F*(1, 36) = 1.12, *p* = 0.296]. A global bias toward "same" responses also became apparent in the mean odds-ratio, which was 0.73, indicating that wrong "different" responses had about a three quarters chance to occur compared to wrong "same" responses. In the two conditions where a significant bias measure *c* was observed this chance fell to about 0.5. This shift from the general balance of chances observed for watches was not nearly as strong as the three shifts of chances observed in the face matching experiment.

## **4. DISCUSSION**

Testing the effects of congruency, target certainty, and feedback in a same/different matching task showed strong effects of congruency and target certainty, while feedback yielded no effects on overall matching performance. This finding was the case for faces and watches. The magnitude of the congruency effects, however, differed widely between the two object classes. For watches, congruency effects were consistently present in all conditions, but marginal, and reached significance only in the condition where subjects could not prepare well for the task (no feedback and late target cue). For faces, in contrast, there were large congruency effects, which were substantial when subjects could prepare well for the task (feedback and target cue already at study) and very large if not (no feedback and late target cue). For faces, feedback and target certainty modulated the congruency effect independently (additively), while no modulatory influence of these factors was found for watches. Hence, the magnitude of congruency effects and the pattern of their dependency on feedback and target half certainty clearly separated facial from the non-facial watch stimuli.

Analysis of response bias also revealed differential result patterns for faces and watches. For faces, response bias strongly depended on the feedback condition, while, for watches, feedback did not influence the nature of response preferences. For faces, a strong "different" bias was observed in the absence of feedback in incongruent trials, and no response preference was found in congruent trials. With feedback, the "different" bias vanished completely, but a "same" bias emerged in congruent trials and when the cue came at the second position. For watches, a marginal, but general "same" bias was found, which was significant in congruent trials when the cue came at the second position. Hence, a "different" bias in incongruent trials, which might be diagnostic of a "holistic" representation, which interferes with proper comparison of parts, was only found for faces. Indeed, the finding of the large congruency effects, together with a strong "different" bias only in incongruent trials, came out in the no-feedback condition where no external signals communicated to the observer that she/he erroneously judged face halves as different. This finding is strong support for the perceptual account of the composite effect (see Introduction).

#### **4.1. TARGET CERTAINTY**

Subjects made more errors when they were informed about the target half briefly before test. However, the effect of cue position was differential for congruent and incongruent trials only for faces, not watches. For faces, the need to change the attentional focus within a trial impaired performance to larger degrees in incongruent trials, thus enlarging the congruency effect (see **Figures 5**, **6**). As stated above (see Materials and Methods) the reasoning behind the manipulation of cue position was to probe whether the congruency effect depended on how the subjects prepared for the task. While it was a reasonable assumption that a priori knowledge and the opportunity to adopt a viewing strategy in advance would regulate the face processing mode (Riesenhuber et al., 2004; Riesenhuber and Wolff, 2009), our results for the bias measure only partly support this claim. Regardless of feedback, subjects could try to encode and compare only the target face half when the cue came before the trial. While subjects made more errors mostly in incongruent trials when the cue came late in the trial (see **Table 3**), the strong bias in favor of "different" judgments was the same for both cue positions when there was no feedback. Hence, the opportunity to adjust the attentional focus in advance clearly reduced the absolute number of errors in incongruent trials, but it did not change their nature. This finding suggests that the early cue enabled a part-based viewing strategy in more of the trials, but errors still came from global contextual influence.

#### **Table 3 | Bias measure results for faces.**

*The table shows mean c value, its standard error, se, t-value and probability that the expected value 0 is within the distribution of the mean c value, misses, false alarms, error rate, pe = (Miss + FA)/2, and error proportion measure, q, on a percent scale, odds-ratio, OR, and sample size, N.*

#### **4.2. FEEDBACK**

The results of this study showed that the effects of feedback are highly differential for faces and watches. For watches, providing feedback or not did not have much effect. Feedback did not modulate performance, it did not modulate the congruency effect, and it did not change the nature of response preferences in any respect. For faces, feedback did not modulate the general level of matching accuracy; however, it did modulate the congruency effect and it changed observers' response preferences qualitatively (see **Figure 7**). With feedback, the response bias pattern that suggested a perceptual account of the congruency effect in the no feedback condition was lost. This finding indicates that, with trial-by-trial feedback, observers adjusted either their perceptual or decisional strategies. In the following, we argue that subjects adjusted mostly their decisional strategies.

#### **Table 5 | Pairwise tests for face-half matching with and without feedback.**

If an observer frequently resorts to the "different" response in incongruent trials, trial-by-trial feedback would signal to her/him that she/he overlooked the sameness of the target halves, which should initiate a more careful use of the "different" button in the course of the experiment. We noted above that feedback reduced the matching errors particularly in incongruent trials, which limited the congruency effect (see **Table 3**). However, overall face-matching performance was the same with and without feedback. This can only happen if performance in congruent trials worsens in the presence of feedback, which was exactly the case. **Table 5** shows the pairwise comparisons of performance with and without feedback. The results confirm that performance slightly worsened in congruent trials and slightly improved in incongruent trials; however, none of these changes reached statistical significance. Indeed, the largest change was worsening of performance in congruent trials when the cue came at the second position, which just failed to reach significance. These results show that the smaller congruency effects in the feedback condition (see Results) were artifacts of changes in opposite directions for congruent and incongruent trials. In fact, no net improvement of performance occurred by providing trial-by-trial feedback.

When we look at the changes in the nature of errors, the conclusion that feedback led only to a change of decisional strategy is further substantiated. The net error rate *pe* was mostly comparable in two corresponding conditions with and without feedback; however, there was an overall shift in bias toward more "same" responses with feedback. In the cue 1st condition, this did not occur because errors occurred only occasionally.

A change of response criteria without net performance improvement indicates that feedback could not be used to refine a perceptual strategy with better attentional control of the unattended face halves. However, Meinhardt-Injac et al. (2011) reported this function of feedback in controlling the contextual effects of external features on internal target features. There, subjects were able to use feedback for improvement in incongruent trials, while performance in congruent trials remained as good as that in the no-feedback condition. However, the task in Meinhardt-Injac and colleagues' study was less complex and did not require an attentional shift within a trial, which is a difficult task (Lincolt et al., 1997). In addition, learning to regulate the influence of incongruent context information was easier because it could be achieved by learning to better focus on the inner face parts and ignore the facial surroundings.

#### **4.3. RESPONSE BIAS FOR WATCH STIMULI**

The discussion in the foregoing section showed that the response bias results for faces can be explained by the perceptual account of the composite effect in cases with no external markers that might alert subjects to the fact that they falsely think the halves are different. When such external markers were provided, subjects changed their response strategies and relabeled perceptual states as "same," which they formerly labeled "different". Because the performance measure in the complete design was bias-free, the findings suggest that this strategy was decisional and did not lead to net change of performance.

In addition to feedback and reward, a further factor that might influence response bias is the stimulus material. **Figure 7A** shows that faces in the congruent trials were judged as "same" or "different" with practically equal likelihood when there was no feedback. This finding indicates that the stimulus material was well balanced in this respect. Composite watch stimuli, however, had to be constructed from exemplars with high overall similarity. These stimuli differed by single details; otherwise the matching task would have been too easy. The bias data (see Results, **Figure 7B** and **Table 4**) show that subjects had a general tendency to overlook the crucial details that made the difference. This tendency was independent of the congruency relation; however, it occurred more frequently when the cue came at the second position. This is plausible because, with an additional attentional shift, finding the crucial feature in 400 ms is more difficult. In the cue 1st condition, the search was restricted to just one half.

#### **4.4. THE CONGRUENCY EFFECT IN THE COMPLETE DESIGN**

The findings of the present study support a perceptual account of the congruency effect for faces because congruency effects coincided with a response preference for "different" responses in incongruent trials, as is expected from the composite face "illusion" (Rossion and Boremanse, 2008; Rossion, 2013). These findings are consistent with recent findings of Gao et al. (2011), who used the CD to study the effect of priming local vs. global processing levels with Navon primes prior to composite face matching. Instead of using non-face controls, they compared congruency effects and response bias in aligned and misaligned arrangements of face halves. As in the present study, the authors found strong congruency effects, which were accompanied by a "different" bias only in the incongruent trials for the aligned arrangement. For the misaligned arrangement, both the congruency effect and the bias vanished. Hence, there are currently two studies that used the CD and obtained results in agreement with the "holistic" encoding hypothesis for face stimuli, while non-face stimuli or misaligned faces yielded different result patterns in the combined effects of congruency and response bias.

Gauthier and colleagues also reported larger congruency effects for faces than for non-face objects (Gauthier et al., 2003; Richler et al., 2009a; Richler and Gauthier, 2013); however, they found mixed results for the nature of the bias. Cheung et al. (2008) reported a "different" bias for full-spectrum faces and low-pass filtered faces, and a "same" bias for high-pass filtered faces. In a series of experiments with arrangements similar to this study, a "different" bias was also observed for the late cue condition (Richler et al., 2008b, Exp. 1 and Exp. 3). However, in a later replication with different timings a "same" bias was reported (Richler et al., 2009c). In our estimation, a "same" bias is not easily explained in terms of facial feature integration. A preference for "same" responses in incongruent trials would mean that a subject more often indicates sameness of composite faces even though both the target halves *and* the wholes formed by an integration of the halves differ. A possible explanation for a "same" bias could be that face part interaction enters into the calculation of an internal, multi-feature similarity measure. Since partly different (incongruent) is less than totally different (congruent), it could well be that the observer shows no "different" bias in incongruent trials when she/he is conservative with the response criterion on the latent similarity scale<sup>4</sup>. Because the authors currently refrain from a unique interpretation of response bias (Cheung et al., 2008), a theoretical gap exists that should be closed by an explication of the rules for the interaction of independently encoded parts at the decision stage.

#### **4.5. THE USE OF TASK-RELEVANT OBJECT INFORMATION**

At the individual level of categorization, single facial features, their configural relationship (Leder and Bruce, 2000; Leder et al., 2001), and global face features such as skin texture and hue (Hancock et al., 2000; Meinhardt-Injac et al., 2013) can, in principle, be diagnostic for match or mismatch. This is a major difference to the categorization experiments of Schyns and colleagues (Smith et al., 2004; Joyce et al., 2006), where it was a priori clear that the inner face region around the eyes was most diagnostic for the gender discrimination task and the mouth region for the facial expression discrimination task. Certainly, face-half matching with randomly changing target definitions between upper and lower halves also requires one to separate these two highly diagnostic face regions. However, our results are disappointing with respect to a better use and sharpening of diagnostic information with facilitative task demands. Results for the early cue show that selective encoding and comparing of diagnostic features was possible only for watches. For faces, the influence of the irrelevant face halves remained substantial, even though the observer could try to encode just one half. The results for the influence of feedback also show that faces and watches differ in the effective use of relevant cues. For faces, there was no learning because improvement in incongruent trials was achieved at the cost of impairment in the congruent trials. For watches, the small but significant congruency effect in the late cue condition vanished when feedback was provided; however, performance in congruent trials stayed the same. Absence of congruency effects and learning to improve in incongruent contexts showed that observers succeeded in retrieving diagnostic feature information mostly for watches.

<sup>4</sup>We thank Peter Hancock for drawing our attention to this interpretation.

Results obtained with the bubbles technique (Gosselin and Schyns, 2001) suggest the presence of both diagnostic and less diagnostic features for faces at the early perceptual level, which indicates an automatic, task-independent mechanism for faces (Smith et al., 2004; van Rijsbergen and Schyns, 2009). The authors showed task-related modulation of the late P300 by demonstrating that the potential became more negative when the task-diagnostic features were faded in, and less negative when the task-diagnostic features of the concurrent task were present. This finding might indicate task-related feature selection at later stages. However, these two groups of facial features were presented individually. For the problem addressed here, it would be interesting to see by how much the negativity is reduced when the task-irrelevant features are added. Comparing relative changes of both the N170 and P300 would indicate where feature integration among relevant and irrelevant features is stronger: at encoding or at decision. This is left to forthcoming experimentation.

## **5. CONCLUSIONS**

It has been shown that the complete design can be used to derive testable predictions for the mechanisms of facial feature integration, which can be contrasted against results for non-facial objects. In studying the composite effect, the CD is highly recommended, since it ensures that the number of same and different face half pairings is fully balanced across attended and non-attended halves. Because of the high theoretical importance of the nature of response bias, use of a fully balanced design is mandatory, and it should be excluded that a response bias is induced merely by an unequal number of same and different face halves. With respect to the uniqueness of the congruency effect, the findings regarding the effects of feedback have revealed a weakness of the CE; therefore, we recommend not relying only on a difference measure (CE) when judging the effects of the congruency manipulation. Performance in incongruent trials is certainly more sensitive to task demands, but sensitivity for congruent trials must also be monitored, since there may be changes in opposite directions. As an alternative that avoids some disadvantages of difference scores (Peter et al., 1993), regression-based techniques could be used (DeGutis et al., 2013). However, the initial empirical comparisons indicate no higher reliability of the regression method. Because bias-free performance measures are linked to the CD, it allows researchers to assess performance and response bias independently. As a formal framework for experimental design, the CD is neutral regarding divergent theoretical accounts of feature integration. Therefore, we agree with Richler and Gauthier (2013) that the CD is, at present, the right framework for studying the composite effect.
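The caution about relying on the difference measure alone can be illustrated with a minimal sketch. The sensitivity values below are hypothetical and are not taken from the study; they are chosen only to show how a shrinking CE can coexist with unchanged mean performance when congruent and incongruent trials change in opposite directions.

```python
def congruency_effect(d_congruent: float, d_incongruent: float) -> float:
    """CE as a difference score: sensitivity in congruent minus incongruent trials."""
    return d_congruent - d_incongruent

def mean_sensitivity(d_congruent: float, d_incongruent: float) -> float:
    """Net performance across both congruency conditions."""
    return (d_congruent + d_incongruent) / 2

# Hypothetical d' values, for illustration only
conditions = {
    "no feedback": {"congruent": 2.0, "incongruent": 1.2},
    "feedback": {"congruent": 1.8, "incongruent": 1.4},
}

summary = {}
for label, d in conditions.items():
    ce = congruency_effect(d["congruent"], d["incongruent"])
    net = mean_sensitivity(d["congruent"], d["incongruent"])
    summary[label] = (ce, net)
    print(f"{label}: CE = {ce:.2f}, mean d' = {net:.2f}")
```

In this constructed case the CE is halved under feedback although net performance is identical, which is why both conditions should be reported alongside the difference score.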

#### **AUTHOR CONTRIBUTIONS**

All authors contributed equally to conception and design of the study. Malte Persike conducted the experiments and data preparation. Günter Meinhardt contributed data analysis and interpretation. All authors were involved in writing, preparation of the manuscript and its final approval. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

### **REFERENCES**


Hancock, P. J., Bruce, V., and Burton, A. M. (2000). Recognition of unfamiliar faces. *Trends Cogn. Sci.* 4, 330–337. doi: 10.1016/S1364-6613(00)01519-9


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 14 October 2014; published online: 31 October 2014. Citation: Meinhardt G, Meinhardt-Injac B and Persike M (2014) The complete design in the composite face paradigm: role of response bias, target certainty, and feedback. Front. Hum. Neurosci. 8:885. doi: 10.3389/fnhum.2014.00885*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Meinhardt, Meinhardt-Injac and Persike. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **APPENDIX: THE BIAS MEASURE** *C*

Let us define the four events resulting from a 2 × 2 stimulus-response matrix with "same" and "different" as the response alternatives (see **Figure A1**) in terms of conditional probabilities:

$$\begin{array}{lcl}CR &= P(\text{"different"}|D) \\ FA &= P(\text{"same"}|D) \\ Miss &= P(\text{"different"}|S) \\ Hit &= P(\text{"same"}|S) \end{array}$$

According to the basic assumptions of signal detection theory, these probabilities derive from normal probability density functions (likelihood functions), *f*(*x*|*D*) and *f*(*x*|*S*), with equal variance σ<sup>2</sup>. For the difference of the means of both distributions we have

$$
\Delta \mu = \mu_S - \mu_D = k - \mu_D + \mu_S - k
$$

where *k* is the decision criterion on the sensory continuum *x*, which is assumed to be constant throughout all measurements. Dividing by σ

$$d' = \frac{\Delta\mu}{\sigma} = \frac{k - \mu_D}{\sigma} + \frac{\mu_S - k}{\sigma} = z_D - z_S = \Phi^{-1}(\text{CR}) - \Phi^{-1}(\text{Miss}).$$


Here, Φ<sup>−1</sup> is the inverse distribution function (quantile function) of the standard normal distribution, *z<sub>D</sub>* is the standard quantile of *k* relative to *f*(*x*|*D*), and *z<sub>S</sub>* is the standard quantile of *k* relative to *f*(*x*|*S*). Now, verify that standardization of *x* with respect to *f*(*x*|*D*) maps μ<sub>*D*</sub> → 0 and μ<sub>*S*</sub> → *d*′, i.e.,

$$z(\mu_D) = \frac{\mu_D - \mu_D}{\sigma} = 0, \qquad z(\mu_S) = \frac{\mu_S - \mu_D}{\sigma} = d'.$$

The standardization *z* = (*x* − μ<sub>*D*</sub>)/σ may be shifted to a new origin at half the standardized distance between the means, *d*′/2:

$$z' = z - \frac{d'}{2}.$$

This scale is chosen to express the response criterion *k* on a transformed standard axis:

$$c = z_D - \frac{d'}{2}.$$

On this scale, positive values of *c* mean that the response criterion is closer to μ<sub>*S*</sub>, and negative values mean that it is closer to μ<sub>*D*</sub>. The means transform to *z*′(μ<sub>*D*</sub>) = −*d*′/2 and *z*′(μ<sub>*S*</sub>) = *d*′/2, respectively.
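The derivation above can be turned into a short computation. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for Φ<sup>−1</sup>; the function name and the example rates are hypothetical and serve only to illustrate the formulas for *d*′ and *c*.

```python
from statistics import NormalDist

def dprime_and_criterion(cr_rate: float, miss_rate: float) -> tuple[float, float]:
    """Return (d', c) from CR = P("different"|D) and Miss = P("different"|S)."""
    inv_phi = NormalDist().inv_cdf   # quantile function of the standard normal
    z_d = inv_phi(cr_rate)           # criterion k relative to f(x|D)
    z_s = inv_phi(miss_rate)         # criterion k relative to f(x|S)
    d_prime = z_d - z_s
    c = z_d - d_prime / 2            # algebraically equal to (z_d + z_s) / 2
    return d_prime, c

# Hypothetical response rates, for illustration only
d_prime, c = dprime_and_criterion(cr_rate=0.80, miss_rate=0.30)
print(f"d' = {d_prime:.3f}, c = {c:.3f}")
```

With these example rates *c* is positive, i.e., the criterion lies closer to μ<sub>*S*</sub>. Rates of exactly 0 or 1 have no finite quantile, so `inv_cdf` raises an error for them; how such cells are handled is not specified in the text and is left open here.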


# Differential age-related changes in N170 responses to upright faces, inverted faces, and eyes in Japanese children

Kensaku Miki<sup>1,2</sup>\*<sup>†</sup>, Yukiko Honda<sup>1†</sup>, Yasuyuki Takeshima<sup>1</sup>, Shoko Watanabe<sup>1†</sup> and Ryusuke Kakigi<sup>1,2†</sup>

<sup>1</sup> Department of Integrative Physiology, National Institute for Physiological Sciences, Okazaki, Japan, <sup>2</sup> Department of Physiological Sciences, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan

The main objectives of this study were to investigate the development of face perception in Japanese children, focusing on the changes in face processing strategies (holistic and/or configural vs. feature-based) that occur during childhood. To achieve this, we analyzed the face-related N170 component, evoked by upright face, inverted face, and eyes stimuli in 82 Japanese children aged between 8- and 13-years-old. During the experiment, the children were asked to perform a target detection task in which they were told to press a button when they saw images of faces or kettles with mustaches, glasses, and fake noses; i.e., an implicit face perception task. The N170 signals observed after the presentation of the upright face stimuli were longer in duration and/or had at least two peaks in the 8–11-year-old children, whereas those seen in the 12–13-year-old children were sharp and only had a single peak. N170 latency was significantly longer after the presentation of the eyes stimuli than after the presentation of the upright face stimuli in the 10- and 12-year-old children. In addition, significant differences in N170 latency were observed among all three stimulus types in the 13-year-old children. N170 amplitude was significantly greater after the presentation of the eyes stimuli than after the presentation of the upright face stimuli in the 8–10- and 12-year-old children. The results of the present study indicate that the upright face stimuli were processed using holistic and/or configural processing by the 13-year-old children.

Keywords: N170, face, development, EEG, eyes, inversion

#### Edited by:

Aina Puce, Indiana University, USA

#### Reviewed by:

Florian Bublatzky, University of Mannheim, Germany; Emilie Meaux, Laboratory for Neurology and Imaging of Cognition (LabNIC), Switzerland

#### \*Correspondence:

Kensaku Miki, Department of Integrative Physiology, National Institute for Physiological Sciences, 38 Nishigonaka Myodaiji, Okazaki, 444-8585, Japan kensaku@nips.ac.jp

<sup>†</sup>These authors have contributed equally to this work.

Received: 18 February 2014; Accepted: 22 April 2015; Published: 02 June 2015

#### Citation:

Miki K, Honda Y, Takeshima Y, Watanabe S and Kakigi R (2015) Differential age-related changes in N170 responses to upright faces, inverted faces, and eyes in Japanese children. Front. Hum. Neurosci. 9:263. doi: 10.3389/fnhum.2015.00263

## Introduction

The face contains a lot of information that is relevant to our daily lives, such as information about age, sex, and familiarity, and plays an important role in social communication. Accordingly, the face has been extensively examined in many previous psychological studies. For example, Bruce and Young (1986) described seven codes that can be distinguished during face processing, which they named pictorial, structural, identity-specific semantic, visually-derived semantic, name, expression, and facial speech codes; i.e., the face recognition model. In addition, three types of information are known to be important for face perception (Lee et al., 2013; Liu et al., 2013). The first type is isolated featural information, such as the size of the eyes. The second is configural information, which refers to the spatial relationships between facial features, and the third is holistic information referring to the facial gestalt, which represents the fusion of featural and configural information into an unbroken whole (Tanaka and Farah, 1993).

It has been reported that faces are processed using holistic and/or configural strategies rather than feature-based strategies, which are generally used for object perception (Maurer et al., 2002). In addition, there is a phenomenon unique to humans and non-human primates. Psychological studies have reported that face recognition was more difficult when inverted faces were presented rather than upright faces and named this phenomenon the face inversion effect. These findings suggest that face inversion might disrupt the holistic and/or configural processing of facial information (Tanaka and Farah, 1993; Mondloch et al., 2002).

EEG demonstrated that a negative component is evoked at approximately 170 ms during object perception, and this component was termed N170 (Bentin et al., 1996; George et al., 1996). N170 was shown to be larger during the viewing of faces than during the observation of other objects, such as cars or chairs (Rossion and Jacques, 2008), and was found to exhibit longer latency and a greater amplitude when eyes were being examined than during the viewing of upright faces (Watanabe et al., 1999). Therefore, N170 has been proposed to reflect holistic and/or configural processing during face perception. In previous studies, N170 was found to display longer latency and a greater amplitude during the observation of inverted faces than during the viewing of upright faces (Watanabe et al., 2003; Honda et al., 2007); therefore, N170 appears to be modulated by facial inversion, possibly because facial inversion disrupts holistic and/or configural processing and forces featural processing to be employed (Maurer et al., 2002; Rossion and Gauthier, 2002). In addition, recent studies based on event-related potential (ERP) and eye-tracking data have shown the importance of the eyes for face perception processing (Meaux et al., 2014; Nemrodov et al., 2014). For example, it was reported that the amplitude of N170 was greater when the subject fixated on the eyes than when they examined other locations (the forehead, nasion, nose, or mouth) (Nemrodov et al., 2014).

Some researchers have studied the development of face perception using neuroimaging methods (e.g., Lee et al., 2013). Many EEG-based studies have detected changes in N170 with age (Taylor et al., 1999, 2004; de Haan et al., 2002; Itier and Taylor, 2004a,b). In an infant study (de Haan et al., 2002), a putative "infant N170" signal was found to be sensitive to the species of animal to which the presented face belonged; however, the orientation of the face did not influence processing until a later stage, which differed from the findings obtained in adults. The inversion effect does not seem to affect the latency of N170 until 8–11 years of age and does not appear to affect the amplitude of N170 until 13–14 years of age (Taylor et al., 2004). Batty and Taylor (2006) also showed that the sensitivity of N170 to emotions develops late; i.e., at 14 to 15 years of age. A recent study that examined EEG and eye-tracking data detected a correlation between initial fixation on the eyes and N170 and suggested that this correlation was partially driven by common developmental dynamics (Meaux et al., 2014). In an fMRI study, adolescents exhibited face-related activity in the fusiform face area (FFA), occipital face area (OFA), and superior temporal sulcus (STS), which were similar to the regions that were activated in the adult group, whereas none of these face-related regions were activated in the children (Scherf et al., 2007).

Cultural differences in face processing mechanisms are known to exist. Blais et al. (2008) described cultural differences in eye movements between Western Caucasians and East Asians during the learning, recognition, and categorization of faces. In addition, a recent fMRI study found differences between the face processing mechanisms of Western individuals and East Asians; i.e., they detected an analytical style of face processing in the Western subjects and a holistic processing style in the East Asians (Goh et al., 2010).

In this study, we mainly investigated the development of face perception in children, focusing on the changes in face processing strategies (holistic and/or configural vs. feature-based) that occur during childhood. We also compared our results for Japanese children with the findings for Western children reported in previous studies. Based on the findings of previous developmental studies that examined EEG and eye-tracking data, we mainly focused on the N170 component as a developmental marker of face perception processing in this study (Taylor et al., 1999, 2004; Itier and Taylor, 2004a,b; Meaux et al., 2014). On the other hand, P100 is considered to reflect basic and early processing, e.g., responses to changes in luminance, visual field, and visual size (see Meaux et al., 2014). Some studies have detected face-related effects on P100 (Batty and Taylor, 2003; Itier and Taylor, 2004a; Taylor et al., 2004). However, whilst age-related changes in N170 are considered to reflect the development of face perception processing, it has been suggested that age-related changes in P100 reflect the general development of sensory and/or cognitive function, e.g., the development of visual acuity or visual attention (Pastò and Burack, 1997; Skoczenski and Norcia, 2002; Want et al., 2003; Betts et al., 2006; Crookes and McKone, 2009). Therefore, we also analyzed the changes in the P100 component that occur during childhood and compared them with the changes in N170 during the same period. Eighty-two subjects were analyzed in this study after being classified into 6 age groups (8-, 9-, 10-, 11-, 12-, and 13-year-olds), which differed from the method used in previous EEG studies, in which the subjects were divided into two-year age groups (4–5-, 6–7-, 8–9-, 10–11-, 12–13-, and 14–15-year-olds) (Taylor et al., 1999, 2004; Itier and Taylor, 2004a,b). This was the first study to investigate the development of face perception in a large number of Japanese children.

## Methods and Materials

## Subjects

Ninety-one normal right-handed volunteers with normal or corrected visual acuity participated in this study. However, two 8-year-olds, two 9-year-olds, two 12-year-olds, and three 13-year-olds were excluded from the ERP analysis because of artifactual EEG contamination. Therefore, the ERP data of 82 subjects were analyzed.


The 82 subjects were divided into 6 age groups; i.e., into 8-year-olds (n = 11, 3 males, mean age: 8.6 ± 0.24-years-old), 9-year-olds (n = 17, 7 males, mean age: 9.4 ± 0.22-years-old), 10-year-olds (n = 15, 10 males, mean age: 10.2 ± 0.15-years-old), 11-year-olds (n = 12, 4 males, mean age: 11.1 ± 0.27-years-old), 12-year-olds (n = 10, 7 males, mean age: 12.5 ± 0.20-years-old), and 13-year-olds (n = 17, 8 males, mean age: 13.4 ± 0.30-years-old). The subjects were recruited from a primary school and a junior high school in Okazaki city, Aichi Prefecture, Japan. All of the subjects were in age-appropriate levels at school, and none of them had learning or attention problems.
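As a quick consistency check on the sample description, the reported exclusions and group sizes can be tallied. The counts are taken from the text; the variable names are only illustrative.

```python
RECRUITED = 91

# Subjects excluded from the ERP analysis, by age in years (from the text)
excluded = {8: 2, 9: 2, 12: 2, 13: 3}

# Analyzed subjects per age group: (n, number of males), from the text
groups = {8: (11, 3), 9: (17, 7), 10: (15, 10), 11: (12, 4), 12: (10, 7), 13: (17, 8)}

n_analyzed = sum(n for n, _ in groups.values())
n_males = sum(m for _, m in groups.values())

print("analyzed:", n_analyzed)
print("recruited minus excluded:", RECRUITED - sum(excluded.values()))
print("males:", n_males, "females:", n_analyzed - n_males)
```

Both routes give 82 analyzed subjects, in agreement with the number stated above.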

All of the subjects and their parents gave their informed consent to participate in the experiment, which was approved by the ethics committee of the National Institute for Physiological Sciences. All of the experiments were conducted according to the Declaration of Helsinki. Each of the subjects was given a reward at the end of the experiment.

## Visual Stimuli

We presented the following five types of stimuli to the children (**Figure 1**):


especially small children, than non-facial stimuli, such as cars, chairs, flowers, butterflies, or animals, and thus, the presentation of the target stimuli might have helped to minimize habituation and drowsiness. The subjects were asked to push a button as quickly as possible when the target stimuli were presented.

We presented the kettle and target stimuli to ensure that the experimental task acted as an implicit face perception task, and the luminance and contrast of the kettle stimuli differed from those of the upright face, inverted face, and eyes stimuli. Therefore, we analyzed the results obtained under conditions (1), (2), and (3).

The upright face, inverted face, eyes, and kettle stimuli each consisted of 50 different images, and thirty different images were used for the target stimuli. The upright and inverted face stimuli did not have mustaches or glasses. All of the images were gray-scaled and unfamiliar to the subjects. The stimuli were shown for a relatively short period (250 ms) in order to minimize the influence of artifacts, and the interstimulus interval lasted for 1000–1200 ms. The stimuli were presented in random order, and a scrambled image, which was made by replacing the 160,000 pixels in the stimulus images with faces, was presented throughout the inter-stimulus interval to minimize the changes in luminance and contrast among the upright face, inverted face, and eyes stimuli (**Figure 1**). In addition, we asked the subjects to blink during the presentation of the scrambled image. Therefore, each trial took 1250–1450 ms.
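The trial timing described above can be sketched as follows. The stimulus duration and the range of the inter-stimulus interval come from the text; the uniform jitter in 1-ms steps and the function name are assumptions made for illustration, since the distribution of the interval is not reported.

```python
import random

STIMULUS_MS = 250            # stimulus duration reported in the text
ISI_RANGE_MS = (1000, 1200)  # inter-stimulus interval reported in the text

def trial_duration_ms(rng: random.Random) -> int:
    """Duration of one trial: stimulus plus a jittered inter-stimulus interval."""
    isi = rng.randint(*ISI_RANGE_MS)  # assumption: uniform jitter in 1-ms steps
    return STIMULUS_MS + isi

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
durations = [trial_duration_ms(rng) for _ in range(1000)]
print(min(durations), max(durations))
```

Every simulated trial falls within the 1250–1450 ms range stated in the text.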

The stimuli and scrambled images measured 9.6 degrees × 9.6 degrees and were presented using a personal computer (DELL Dimension XPS T750r) and a monitor (Sony GDM-F520). A red light that measured 0.2 degrees in diameter and was located 140 cm from the subject's eyes was presented as a fixation point throughout the experiment. The fixation point, stimuli, and scrambled image were presented in the center of the monitor. The subjects were seated on a chair and were instructed to concentrate on the fixation point during the experiment.

To minimize habituation and drowsiness, each subject took part in more than 10 short-term recording sessions. Each recording session included 19–21 trials of the upright face, inverted face, eyes, and kettle stimuli, and 3–5 trials of the target stimuli. In total, the experiment took less than 30 min. Each session was delivered in a pseudorandom order among the subjects.

## Event-Related Potential (ERP) Recording and Data Analysis

ERPs were recorded by averaging EEG using a Neuropack MEB 2200 system (Nihon Kohden, Tokyo, Japan) with non-polarizable Ag/AgCl electrodes. EEG electrodes were placed at Fz, Cz, T3, T4, C3, C4, Pz, P3, P4, T5, T6, O1, and O2 based on the International 10–20 System, and additional electrodes were placed at T5' (2 cm below T5) and T6' (2 cm below T6) (Taylor et al., 1999; Watanabe et al., 1999). The reference electrode was placed on the tip of the nose, and the ground electrode was placed on the forehead. An electrooculogram (EOG) was also recorded using an electrode located above the right eye and the reference electrode in order to assess the subjects' blinking and eye movements. The impedance of all electrodes was kept at less than 5 kΩ. EEG and EOG were recorded simultaneously with a band-pass of 0.1–50 Hz, and digitized at a rate of 1000 Hz. Because the 0.1–50 Hz band-pass filter alone did not completely remove the noise above 50 Hz, we also used a 60 Hz AC filter to remove such noise. The time window for the recording ran from 100 ms before to 400 ms after stimulus onset in order to minimize the influence of artifacts.

As for artifact rejection, epochs in which the variations in the EEG and EOG signals were larger than ±80 µV were automatically excluded from the on-line averaging. The percentage of rejected epochs was 10.0% in the 8-year-olds, 20.6% in the 9-year-olds, 25.5% in the 10-year-olds, 25.7% in the 11-year-olds, 14.0% in the 12-year-olds, and 14.8% in the 13-year-olds.
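The rejection rule can be sketched as below. This is not the on-line procedure of the recording system; it is a minimal illustration that assumes an epoch is rejected when any EEG or EOG sample exceeds 80 µV in absolute value. The data layout, function names, and example values are hypothetical.

```python
THRESHOLD_UV = 80.0  # rejection threshold reported in the text

def is_clean(epoch: dict[str, list[float]], threshold_uv: float = THRESHOLD_UV) -> bool:
    """True if no EEG or EOG sample exceeds the threshold in absolute value."""
    return all(abs(v) <= threshold_uv for samples in epoch.values() for v in samples)

def reject_artifacts(epochs: list[dict[str, list[float]]]) -> tuple[list, float]:
    """Return the clean epochs and the percentage of rejected epochs."""
    kept = [e for e in epochs if is_clean(e)]
    rejected_pct = 100.0 * (len(epochs) - len(kept)) / len(epochs) if epochs else 0.0
    return kept, rejected_pct

# Hypothetical epochs (values in µV); two of them contain large excursions
epochs = [
    {"T5": [3.0, -12.0, 8.0], "T6": [2.0, -15.0, 6.0], "EOG": [5.0, 7.0, 4.0]},
    {"T5": [4.0, -10.0, 9.0], "T6": [1.0, -14.0, 5.0], "EOG": [20.0, 150.0, 60.0]},
    {"T5": [2.0, -11.0, 7.0], "T6": [3.0, -13.0, 4.0], "EOG": [6.0, 3.0, 5.0]},
    {"T5": [5.0, -9.0, 6.0], "T6": [2.0, -95.0, 3.0], "EOG": [4.0, 6.0, 5.0]},
]
kept, rejected_pct = reject_artifacts(epochs)
print(len(kept), rejected_pct)
```

The percentage returned by `reject_artifacts` corresponds to the per-group rejection rates reported above.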

More than 40 ERP trials were averaged for each condition. However, the number of averaged trials was less than 40 for all conditions in three 10-year-olds, one 11-year-old, and one 13-year-old.

As for the ERP analysis, the time window for the analysis ran from 100 ms before to 400 ms after stimulus onset in order to minimize the influence of artifacts, and the data obtained during the 100 ms before stimulus onset were used as the baseline. We analyzed the N170 component from 50 ms before its maximum (negative) to 50 ms after its maximum (negative) using the grand-average waveforms recorded for each group by the T5 (left) and T6 (right) electrodes. Peak latency was determined individually at the point after stimulus onset at which the N170 amplitude for each condition peaked. In the maximal N170 amplitude analysis, we used both the baseline-to-peak and peak-to-peak methods. As additional analyses, the latency and amplitude (baseline-to-peak) of the P100 component were measured at the O1 (left) and O2 (right) electrodes. N170 was longer in duration and/or had at least two peaks in many of the 8–11-year-old children, and the positive component that followed N170 could not always be clearly identified. Therefore, we did not measure the positive component that followed N170.
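The baseline correction and the baseline-to-peak measurement can be sketched as follows. The sampling rate, epoch limits, and the ±50 ms search window are taken from the text; the synthetic waveform, the assumed grand-average latency of 170 ms, and all function names are hypothetical. The peak-to-peak measure is not sketched because the reference peak is not specified in the text.

```python
import math

FS_HZ = 1000           # sampling rate reported in the text
EPOCH_START_MS = -100  # epoch runs from -100 ms to +400 ms

def ms_to_index(t_ms: int) -> int:
    """Sample index of a latency given in ms relative to stimulus onset."""
    return (t_ms - EPOCH_START_MS) * FS_HZ // 1000

def baseline_correct(epoch: list[float]) -> list[float]:
    """Subtract the mean of the 100 ms pre-stimulus interval."""
    n_base = ms_to_index(0)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]

def negative_peak(epoch: list[float], start_ms: int, end_ms: int) -> tuple[int, float]:
    """Latency (ms) and amplitude of the most negative sample in a window."""
    lo, hi = ms_to_index(start_ms), ms_to_index(end_ms)
    idx = min(range(lo, hi + 1), key=lambda i: epoch[i])
    return idx * 1000 // FS_HZ + EPOCH_START_MS, epoch[idx]

# Synthetic waveform (hypothetical): a negative deflection centred at 170 ms
raw = [5.0 - 10.0 * math.exp(-((t - 170) / 20.0) ** 2)
       for t in range(EPOCH_START_MS, 401)]

corrected = baseline_correct(raw)
grand_latency = 170  # assumed N170 latency in the group's grand average
latency, amplitude = negative_peak(corrected, grand_latency - 50, grand_latency + 50)
print(latency, round(amplitude, 2))
```

Restricting the individual search to the window around the grand-average peak keeps the measurement from latching onto unrelated deflections elsewhere in the epoch.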

The data were analyzed by repeated-measures analysis of variance (ANOVA), and stimulus condition (upright face, inverted face, or eyes), electrode (P100: O1 or O2, N170: T5 or T6), and age (8-, 9-, 10-, 11-, 12-, or 13-years-old) were included as factors. Huynh and Feldt's correction was used if the sphericity assumption was violated. The Bonferroni test was used for post hoc analyses, and p-values of < 0.05 were considered significant.
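For the post hoc step, the Bonferroni adjustment multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below uses hypothetical p-values for the three pairwise comparisons among the stimulus conditions; it illustrates the adjustment only and is not the statistical software used in the study.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical unadjusted p-values for the three pairwise comparisons
pairwise = {
    "upright vs. inverted": 0.004,
    "upright vs. eyes": 0.030,
    "inverted vs. eyes": 0.500,
}
adjusted = dict(zip(pairwise, bonferroni(list(pairwise.values()))))
for name, p in adjusted.items():
    print(f"{name}: adjusted p = {p:.3f}, significant = {p < 0.05}")
```

With three comparisons, an unadjusted p-value must fall below about 0.0167 to remain significant at the 0.05 level.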

## Results

## P100

## P100 Waveform

**Figure 2** shows the grand-averaged waveforms obtained for the 8–13-year-old children in all stimulus conditions (upright face, inverted face, and eyes) by the O1 (left) and O2 (right) electrodes. **Table 1** shows the mean (and standard deviation) P100 latency and amplitude (baseline-to-peak) values obtained for each age group in each condition.

## P100 Latency

The stimulus condition (F = 20.20, p < 0.01, partial η<sup>2</sup> = 0.210) and electrode (F = 15.92, p < 0.01, partial η<sup>2</sup> = 0.173) had significant effects on the latency of P100. P100 latency was shortest after the presentation of the upright face stimuli, and shorter P100 latency values were detected at the O2 electrode (right hemisphere) than at the O1 electrode (left hemisphere). Age (F = 0.82, p > 0.05, partial η<sup>2</sup> = 0.051) did not have a significant effect on P100 latency, nor did any of the interactions between the parameters.
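Partial η<sup>2</sup> can be recovered from an F value and its degrees of freedom as F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The degrees of freedom are not reported in the text; the values below are an assumption derived from the design (82 subjects, 6 age groups, 3 stimulus conditions, 2 electrodes), and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

N_SUBJECTS, N_GROUPS = 82, 6
DF_SUBJECTS = N_SUBJECTS - N_GROUPS  # 76, assumed between-subject error df

# (F, df_effect, df_error); the dfs are assumptions, not reported values
effects = {
    "stimulus condition": (20.20, 2, 2 * DF_SUBJECTS),
    "electrode": (15.92, 1, DF_SUBJECTS),
    "age": (0.82, 5, DF_SUBJECTS),
}
for name, (f, df1, df2) in effects.items():
    print(f"{name}: partial eta^2 = {partial_eta_squared(f, df1, df2):.3f}")
```

Under these assumed degrees of freedom, the computed values match the effect sizes reported for P100 latency (0.210, 0.173, and 0.051).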

We investigated the differences in P100 latency among the stimulus conditions in each age group using post-hoc analysis. P100 latency was significantly longer after the presentation of the inverted face stimuli than after the presentation of the upright face stimuli in the 9-, 10-, and 13-year-old children (9-year-olds: p < 0.01, 10-year-olds: p < 0.05, 13-year-olds: p < 0.05). In addition, P100 latency was significantly longer after the presentation of the eyes stimuli than after the presentation of the upright face stimuli in 10- and 13-year-old children (p < 0.05). The stimulus condition did not have a significant effect on P100 latency in the 8- or 11–12-year-old children (8-year-olds: F = 1.573, p > 0.05, partial η<sup>2</sup> = 0.136; 11-year-olds: F = 2.885, p > 0.05, partial η<sup>2</sup> = 0.208; 12-year-olds: F = 2.511, p > 0.05, partial η<sup>2</sup> = 0.218).

## P100 Amplitude

The stimulus condition (F = 31.87, p < 0.01, partial η<sup>2</sup> = 0.295) and age (F = 3.32, p < 0.01, partial η<sup>2</sup> = 0.179) had significant effects on P100 amplitude. P100 amplitude was greatest after the presentation of the inverted face stimuli and decreased as age increased. However, the effect of electrode (F = 0.01, p > 0.05, partial η<sup>2</sup> = 0.000) on the amplitude of P100 was not significant. The stimulus condition × electrode (F = 3.18, p < 0.05, partial η<sup>2</sup> = 0.040) and stimulus condition × age (F = 2.03, p < 0.05, partial η<sup>2</sup> = 0.118) interactions also had significant effects on P100 amplitude.

FIGURE 2 | Grand-averaged waveforms recorded by the O1 (left) and O2 (right) electrodes in the three stimulus conditions (upright face, inverted face, and eyes) in each age group. P100 amplitude decreased as age increased.

TABLE 1 | Latency and amplitude (baseline-to-peak) of P100 at the O1 and O2 electrodes in the upright face, inverted face, and eyes stimulus conditions.

Data are presented as the mean and standard deviation for each age group.

We investigated the differences in P100 amplitude among the stimulus conditions in each age group using post-hoc analysis. P100 amplitude was significantly greater after the presentation of the inverted face stimuli than after the presentation of the eyes stimuli in the 8–11-year-old children (8-year-olds: p < 0.05, 9-year-olds: p < 0.01, 10-year-olds: p < 0.01, 11-year-olds: p < 0.05). In addition, P100 amplitude was significantly greater after the presentation of the inverted face stimuli than after the presentation of the upright face stimuli in the 9-year-old children (p < 0.05). In 12-year-old children, the stimulus condition did not have a significant effect on P100 amplitude (F = 2.607, p > 0.05, partial η <sup>2</sup> = 0.225). In 13-year-old children, the stimulus condition had a significant effect on P100 amplitude (F = 4.565, p < 0.05, partial η <sup>2</sup> = 0.222). P100 tended to exhibit a greater amplitude after the presentation of the inverted face stimuli than after the presentation of the upright face stimuli, but this difference was not significant (p = 0.061).

#### N170

#### N170 Waveform

**Figure 3** shows the grand-averaged waveforms obtained with the upright face stimuli by the T6 (right) electrode in each age group. The large negative deflection (N170) observed after the presentation of the upright face stimuli was longer in duration and/or had at least two peaks in the 8–11-year-old children, whereas it was sharp and had a single peak in the 12–13-year-old children (**Figure 3**). **Figure 4** shows the grand-averaged waveforms recorded for the 8- to 13-year-old children in each stimulus condition (upright face, inverted face, or eyes) by the T5 (left) and T6 (right) electrodes. The grand-averaged waveforms exhibited a large negative deflection in all age groups and all stimulus conditions. **Table 2** shows the mean (and standard deviation) N170 latency and amplitude (baseline-to-peak and peak-to-peak) values obtained in each stimulus condition (upright face, inverted face, and eyes) by the T5 (left) and T6 (right) electrodes.

#### N170 Latency

The stimulus condition (F = 33.24, p < 0.01, partial η <sup>2</sup> = 0.304) and age (F = 11.44, p < 0.01, partial η <sup>2</sup> = 0.429) had significant effects on N170 latency. N170 latency was greatest after the presentation of the eyes stimuli, and decreased as age increased. The effect of electrode (F = 1.98, p > 0.05, partial η <sup>2</sup> = 0.025) on N170 latency was not significant, nor were any of the interactions between the parameters.

We investigated the differences in N170 latency among the stimulus conditions in each age group using post-hoc analysis. Stimulus condition (F = 1.93, p > 0.05, partial η <sup>2</sup> = 0.161) did not have a significant effect on N170 latency in the 8-year-old children. The N170 latency observed after the presentation of the eyes stimuli was significantly longer than those observed after the presentation of the upright or inverted face stimuli in the 10- and 12-year-old children (p < 0.01). On the other hand, the N170 latency observed after the presentation of the eyes stimuli was significantly longer than that observed after the presentation of the inverted face stimuli in the 9- and 11-year-old children (9-year-olds: p < 0.05, 11-year-olds: p < 0.01). Significant differences in N170 latency were observed among all three stimulus conditions in the 13-year-old children, with the upright face stimuli producing the shortest latency and the eyes stimuli producing the longest latency (p < 0.01).

#### N170 Amplitude According to the Baseline-to-Peak Method

The stimulus condition (F = 33.04, p < 0.01, partial η <sup>2</sup> = 0.303) and electrode (F = 22.57, p < 0.01, partial η <sup>2</sup> = 0.229) had significant effects on N170 amplitude, while the effect of age (F = 0.29, p > 0.05, partial η <sup>2</sup> = 0.018) was not significant. The stimulus condition × age interaction (F = 2.79, p < 0.01, partial η <sup>2</sup> = 0.155) also had a significant effect on N170 amplitude. N170 amplitude was greatest after the presentation of the eyes stimuli, and larger N170 amplitude values were detected at the T6 electrode (right hemisphere) than at the T5 electrode (left hemisphere).

We investigated the differences in N170 amplitude according to the baseline-to-peak method among the stimulus conditions in each age group using post-hoc analysis. A significant effect of stimulus condition was detected in the 8–10- and 12-year-old children. In addition, N170 exhibited a significantly greater amplitude after the presentation of the eyes stimuli than after the presentation of the upright face or inverted face stimuli in the 8–10- and 12-year-old children (upright face: 8-year-olds: p < 0.01, 9-year-olds: p < 0.05, 10-year-olds: p < 0.01, 12-year-olds: p < 0.05; inverted face: 8-year-olds: p < 0.05, 9-year-olds: p < 0.01, 10-year-olds: p < 0.05, 12-year-olds: p < 0.01). In the 13-year-old children, N170 tended to exhibit a greater amplitude after the presentation of the eyes stimuli than after the presentation of the upright face stimuli, which was similar to the results we obtained for the 12-year-old children.

#### N170 Amplitude According to the Peak-to-Peak Method

The stimulus condition (F = 31.20, p < 0.01, partial η <sup>2</sup> = 0.291) and electrode (F = 13.49, p < 0.01, partial η <sup>2</sup> = 0.151) had significant effects on N170 amplitude, while the effect of age (F = 1.46, p > 0.05, partial η <sup>2</sup> = 0.088) was not significant. The stimulus condition × electrode interaction (F = 6.33, p < 0.01, partial η <sup>2</sup> = 0.077) also had a significant effect on N170 amplitude. N170 amplitude was smallest after the presentation of the upright face stimuli, and greater N170 amplitude values were detected at the T6 electrode (right hemisphere) than at the T5 electrode (left hemisphere), as was found using the baseline-to-peak method.

We investigated the differences in N170 amplitude according to the peak-to-peak method among the stimulus conditions in each age group using post-hoc analysis. A significant effect of stimulus condition was detected in the 10–13-year-old children. The amplitude of N170 was significantly greater after the presentation of the inverted face or eyes stimuli than after the presentation of the upright face stimuli in the 10-, 12-, and 13-year-old children (inverted face: 10-year-olds: p < 0.05, 12-year-olds: p < 0.01, 13-year-olds: p < 0.05; eyes: 10-year-olds: p < 0.01, 12-year-olds: p < 0.05, 13-year-olds: p < 0.01). The amplitude of N170 was significantly greater after the presentation of the inverted face stimuli than after the presentation of the upright face stimuli in the 11-year-old children (p < 0.05).

TABLE 2 | Latency and amplitude (baseline-to-peak and peak-to-peak values) of N170 at the T5 and T6 electrodes in the upright face, inverted face, and eyes stimulus conditions.

Data are shown as the mean and standard deviation for each age group.

## Latency Differences Between N170 and P100

We assessed the latency differences between N170 and P100 in the left (N170 at T5 and P100 at O1) and right (N170 at T6 and P100 at O2) hemispheres. **Table 3** shows the mean (and standard deviation) latency differences between the N170 and P100 components in the left and right hemispheres.

The resultant data were analyzed by repeated-measures ANOVA with stimulus condition (upright face, inverted face, or eyes), hemisphere (left or right), and age (8-, 9-, 10-, 11-, 12-, or 13-years-old) as factors. The latency difference between N170 and P100 was significantly affected by the stimulus condition (F = 11.11, p < 0.01, partial η <sup>2</sup> = 0.128) and age (F = 7.22, p < 0.01, partial η <sup>2</sup> = 0.322). The effect of hemisphere (F = 0.80, p > 0.05, partial η <sup>2</sup> = 0.010) was not significant. In addition, the stimulus condition × hemisphere × age interaction (F = 1.925, p < 0.05, partial η <sup>2</sup> = 0.112) had a significant effect on the latency differences between N170 and P100. None of the other interactions had similar effects. The latency differences between N170 and P100 were smallest after the presentation of the inverted face stimuli and decreased as age increased.
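As a reading aid, the partial η <sup>2</sup> values reported throughout the Results can be related to their F statistics through the standard identity partial η <sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The following minimal sketch applies this identity. The degrees of freedom are not stated in this section, so the values used below (5 for the six age groups and 76 for the error term) are illustrative assumptions, not reported figures, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    # F * df_effect equals SS_effect / MS_error, and df_error equals SS_error / MS_error,
    # so the ratio below is SS_effect / (SS_effect + SS_error).
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# Assumed degrees of freedom (not reported in the text; for illustration only).
ASSUMED_DF_AGE = 5
ASSUMED_DF_ERROR = 76

# F value for the effect of age on the N170-P100 latency difference, as reported above.
eta_age = partial_eta_squared(7.22, ASSUMED_DF_AGE, ASSUMED_DF_ERROR)
print(round(eta_age, 3))
```

Under these assumed degrees of freedom the result is close to the value of 0.322 reported for the effect of age. This only shows that the two statistics are mutually consistent under the assumption and does not establish the actual design.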

## Discussion

The results of the present study can be summarized as follows: (1) P100 amplitude decreased significantly as age increased; (2) N170 latency significantly decreased as age increased, and the latency differences between N170 and P100 significantly decreased as age increased; (3) N170 exhibited a significantly longer latency after the presentation of the eyes stimuli than after the presentation of the upright face stimuli in the 10- and 12-year-old children; (4) Significant differences in N170 latency were observed among all three stimulus types in the 13-year-old children; (5) N170 exhibited a significantly greater amplitude after the presentation of the eyes stimuli than after the presentation of the upright face stimuli in the 8–10- and 12-year-old children.

The reduction in P100 amplitude observed with age in the present study agreed with the results of our previous study (Miki et al., 2011). Based on the findings of previous studies (Pastò and Burack, 1997; Skoczenski and Norcia, 2002; Want et al., 2003; Betts et al., 2006; Crookes and McKone, 2009), we speculate that age-related changes in P100 reflect the general development of sensory and/or cognitive function.

The reduction in N170 latency observed with age in the present study was consistent with the findings of a previous study (Taylor et al., 2004), and the latency differences between N170 and P100 significantly decreased as age increased. Based on the findings of previous studies, we speculate that: (1) the reduction in N170 latency seen with age was not influenced by P100 latency; and (2) the observed reduction in N170 latency was driven by perceptual and cognitive development (Mitchell and Neville, 2004; Doucet et al., 2005), which differs from the underlying mechanism that is considered to be responsible for changes in P100 latency. Previous fMRI studies have observed developmental changes in face-specific areas of the brain (the FFA, OFA, and STS) (Scherf et al., 2007) and the extended face-processing network (e.g., the inferior frontal gyrus) (Joseph et al., 2011). Therefore, we speculate that changes in N170 latency might reflect developmental changes in areas of the brain related to face perception.

TABLE 3 | Latency differences between N170 and P100 in the left (N170 at T5 and P100 at O1) and right (N170 at T6 and P100 at O2) hemispheres in the upright face, inverted face, and eyes stimulus conditions.

Data are shown as the mean and standard deviation for each age group.

In the present study, we found that after the presentation of the upright face stimuli the N170 component was longer in duration and/or had at least two peaks in the 8–11-year-old children, whereas it was sharp and had one peak in the 12–13-year-old children. In a previous study (Taylor et al., 2004), N170 was composed of two subcomponents in about two thirds of young children (less than 10–11 years old). The first N170 component (N170a) was only present in some young children and was rarely detected in older children. In addition, it had a flatter developmental curve and reached adult levels at a younger age. In contrast, the second N170 component (N170b) showed a prolonged and steeper maturation curve, the latency of which was markedly longer in the younger age groups. Taylor et al. (2004) suggested that these two subcomponents might reflect different functional sources in the temporo-occipital and lateral temporal cortices. The N170 components observed in the 8- to 11-year-old children in the present study were consistent with those described in the above study.

When we used the peak-to-peak method to analyze our data, we found that the amplitude of N170 was affected by the previous component; i.e., P100. In this study, the stimulus condition and electrode had significant effects on N170 amplitude according to both the peak-to-peak and baseline-to-peak methods, while the effect of age was not significant. On the other hand, the stimulus condition × age interaction was demonstrated to have a significant effect on N170 by the baseline-to-peak method, but not the peak-to-peak method. Therefore, we consider that the baseline-to-peak method might be more valuable for investigating developmental changes in N170 amplitude than the peak-to-peak method. In addition, we consider that it is important to analyze both P100 and N170 and to perform comparisons between N170 amplitude data obtained with the baseline-to-peak and peak-to-peak methods during studies of the developmental changes in face perception-related N170 signals. The amplitude of N170 might be affected by the amplitude of P100, and our findings regarding the relationship between P100 and N170 suggest that the development of face perception (reflected by N170) is based on the development of more basic visual functions (indicated by P100).
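The difference between the two amplitude measures discussed above can be made concrete with a small sketch. It assumes a baseline-corrected waveform (baseline at 0 µV) sampled at a single temporal electrode, and the search windows, function names, and toy voltages are hypothetical choices for illustration, not the analysis settings of this study.

```python
def peak_in_window(times_ms, voltages_uv, start_ms, end_ms, polarity):
    """Return (latency, amplitude) of the most positive or most negative sample in a window."""
    samples = [(v, t) for t, v in zip(times_ms, voltages_uv) if start_ms <= t <= end_ms]
    if not samples:
        raise ValueError("no samples fall inside the search window")
    pick = max if polarity == "positive" else min
    amplitude, latency = pick(samples)
    return latency, amplitude


def n170_amplitudes(times_ms, voltages_uv, p_window=(80, 150), n_window=(130, 250)):
    """Both N170 amplitude measures for a baseline-corrected waveform (windows are assumed)."""
    _, p_amp = peak_in_window(times_ms, voltages_uv, *p_window, "positive")
    _, n_amp = peak_in_window(times_ms, voltages_uv, *n_window, "negative")
    return {
        "baseline_to_peak": abs(n_amp),  # N170 peak measured against the 0 µV baseline
        "peak_to_peak": p_amp - n_amp,   # preceding positive peak to N170 peak
    }


# Two toy waveforms with the same N170 peak but different preceding positive peaks.
times = [0, 50, 100, 150, 200, 250, 300]
small_positive = [0.0, 1.0, 6.0, -2.0, -8.0, -1.0, 0.0]
large_positive = [0.0, 1.0, 10.0, -2.0, -8.0, -1.0, 0.0]

result_small = n170_amplitudes(times, small_positive)
result_large = n170_amplitudes(times, large_positive)
print(result_small, result_large)
```

In this toy case the baseline-to-peak value is identical for both waveforms, whereas the peak-to-peak value grows with the preceding positive peak. This is the sense in which a peak-to-peak N170 amplitude can inherit age-related changes in the preceding positivity.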

Previous face inversion studies have shown that inversion disrupted holistic and/or configural processing during face perception, but had little or no effect when the presented stimuli were processed featurally (Freire et al., 2000; Maurer et al., 2002). Mondloch et al. (2002) speculated that configural processing might only approach adult levels after children reach 10 years of age and might develop very slowly. In the present study, the inversion effect was only observed in the 13-year-old children. Based on the abovementioned studies, we speculate that: (1) the upright face stimuli were processed holistically and/or configurally in the 13-year-old children, but not in those younger than 12-years-old; and (2) that at 13 years of age the pattern of responses to facial stimuli becomes similar to those described in adult studies (Watanabe et al., 2003).

The results of the present study were generally consistent with the findings of previous studies (Taylor et al., 1999, 2004; Itier and Taylor, 2004a,b). However, our findings differed from those of previous studies with regard to the age at which the adult response pattern was observed. We consider that cultural differences might have been one of the reasons for this. Many studies have examined cultural differences in face perception processing. Blais et al. (2008) monitored the eye movements of Western Caucasians and East Asians during learning and recognition in a face recognition task and a face categorization by race task, and suggested that the Western Caucasians consistently fixated on the eye region and partially concentrated on the mouth, whereas the East Asian subjects fixated more on the central region of the face. In addition, an fMRI study demonstrated higher facial selectivity in Western individuals in the left FFA and a greater degree of right-sided lateralization in the FFA in East Asians. These findings were consistent with an analytical style of face processing in Western individuals and a holistic processing style in East Asians (Goh et al., 2010). Based on the above studies, we speculated that our findings might differ from those of previous developmental studies of N170 due to variations in face perception processing between Japanese and Western children. We consider that our division of the subjects into one-year age groups, rather than the two-year age groups used in previous studies, might also partially explain these differences.

In the present study, we investigated the development of face perception in Japanese children, and the results obtained led to speculation regarding the changes in face processing strategies (holistic and/or configural vs. feature-based) that occur during childhood and cultural differences in face perception strategies. This study had several limitations, with the most important being that we did not examine children under 7 years old or over 14 years old. However, we detected a marked change in the development of face perception during childhood in this study. The second limitation was associated with the presented stimuli. The use of target images including facial features might have resulted in some bias, for example, a bias in the way the children visually explored the facial stimuli (it might have encouraged them to focus on the eyes or the nose/mouth regions), which could have affected the processing strategies they used to perceive faces. In addition, some of the images of kettles with mustaches, glasses, and fake noses that were presented as target stimuli might have been perceived as face stimuli because they were only presented for a short period (250 ms). However, our results were consistent with those of previous studies involving children and adults. Therefore, we consider that the target stimuli used in this study ensured that the experimental task was an implicit face perception task, although they might have introduced some element of bias. The third limitation was that we only examined Japanese children. As described above, cultural differences in the development of face perception might be important, and we intend to investigate this issue in more detail in future studies. Another limitation was that the properties of the presented face (upright and inverted) and eyes stimuli, e.g., their luminance, differed. P100 might be affected by the properties of visual stimuli rather than differences in the type of processing involved in face perception and so might have varied among the stimulus conditions in the present study. In future studies, we intend to use stimuli with similar properties and investigate the relationships between P100 and N170 during face processing.

## Author Contribution

SW designed the experiments. YH and SW performed the experiments and EEG data recording. YH performed the EEG data analysis and statistical analyses. YT made the program used for the stimulus presentation in this study. KM performed the statistical analyses and wrote the manuscript. RK made suggestions about this study and had overall responsibility for it.

## Acknowledgments

We are very grateful for the cooperation of Mishima Primary School in Okazaki City and Okazaki Junior High School, affiliated with Aichi University of Education. This study was supported by COI STREAM (Center of Innovation Science and Technology based Radical Innovation and Entrepreneurship Program), Ministry of Education, Culture, Sports, Science, and Technology, Japan.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Miki, Honda, Takeshima, Watanabe and Kakigi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The face inversion effect in opponent-stimulus rivalry

## *Malte Persike\*, Bozana Meinhardt-Injac and Günter Meinhardt*

*Research Methods and Statistics, Department of Psychology, Institute of Psychology, Johannes Gutenberg University Mainz, Mainz, Germany*

#### *Edited by:*

*Davide Rivolta, University of East London, UK*

#### *Reviewed by:*

*Guillaume A. Rousselet, University of Glasgow, UK Timo Stein, Charité Universitätsmedizin Berlin, Germany*

#### *\*Correspondence:*

*Malte Persike, Research Methods and Statistics, Department of Psychology, Institute of Psychology, Johannes Gutenberg University Mainz, Mainz, Rheinland-Pfalz, Wallstr. 3, 55099 Mainz, Germany e-mail: persike@uni-mainz.de*

The face inversion effect is regarded as a hallmark of face-specific processing, and can be observed in a large variety of visual tasks. Face inversion effects are also reported in binocular rivalry. However, it is unclear whether these effects are face-specific, and distinct from the general tendency of visual awareness to privilege upright objects. We studied continuous rivalry across more than 600 dominance epochs for each observer, having faces and houses rival against their inverted counterparts, and letting faces rival against houses in both upright and inverted orientation. We found strong inversion effects for faces and houses in both the frequency of dominance epochs and their duration. Inversion effects for faces, however, were substantially larger, reaching a 70:30 distribution of dominance times for upright versus inverted faces, while a 60:40 distribution was obtained for upright versus inverted houses. Inversion effects for faces reached a Cohen's *d* of 0.85, compared to a value of 0.33 for houses. Dominance times for rivalry of faces against houses had a 60:40 distribution in favor of faces, independent of the orientation of the objects. These results confirm the general tendency of visual awareness to prefer upright objects, and demonstrate the outstanding role of faces. Since effect size measures clearly distinguish face stimuli in opponent-stimulus rivalry, the method is highly recommended for testing the effects of face manipulations against non-face reference objects.

**Keywords: binocular rivalry, inversion effect, visual awareness, predominance ratio, face specificity**

## **1. INTRODUCTION**

When highly dissimilar images are presented to corresponding regions of the two eyes, an observer experiences binocular rivalry: dynamic alternations of two percepts that compete for dominance. Because the physical stimuli are constantly visible to each eye but conscious perception fluctuates, binocular rivalry ranks among the most intriguing paradigms to study properties of visual awareness. While in earlier conceptualizations it was proposed that binocular rivalry reflected competition between monocular neurons within the LGN and the primary visual cortex (Blake, 1989), it has since been established that competitive interactions at multiple neural sites are involved, including lower, eye-specific areas as well as higher cortical areas which respond to input from both eyes (Blake and Logothetis, 2002; Tong et al., 2006). Although the issue is still subject to ongoing debate, the involvement of higher, object-related cortical levels with input from both eyes has contributed to the idea that neural representations of the two stimuli compete for visual awareness, independent of the eye that actually views the stimulus (Leopold and Logothetis, 1996; Logothetis et al., 1996). A striking observation in favor of pattern competition rather than eye competition was that subjects experienced no dominance changes when sudden eye-reversals of stimulus presentations were introduced in flickering displays (Logothetis et al., 1996), suggesting that eye-independent mechanisms stabilize the conscious experience of the dominant stimulus alternative.

Evidence for pattern competition was mostly found with complex object stimuli which particularly stimulate extrastriate, object-related brain regions lacking retinotopic organization and responding largely independently of scale or viewpoint. Using dichoptic presentation of face and house stimuli, it was found that activation in the face-tuned fusiform face area (FFA; Kanwisher and Yovel, 2006) alternated with activation in the parahippocampal place area (PPA), which preferentially responds to houses and places (Tong et al., 1998), in the same way as if the two single eyes were stimulated with faces and houses in physical alternation. When the residual FFA activity was explored during the epochs in which the perception of intact face stimuli was suppressed, this activity was still found to be greater than the activity caused by invisible scrambled faces (Jiang and He, 2006). This suggests that stimulus processing still reaches higher-level areas even if conscious perception is suppressed (Tong et al., 2006).

Earlier studies on binocular rivalry reported an influence of object-related, configural stimulus properties. Controlling for low-level stimulus properties, faces were still found to have stronger dominance phases compared to random dot patterns (Yu and Blake, 1992). The authors moreover found stronger dominance for dot patterns that could be grouped into meaningful structures ("dalmatian dog") compared to random patterns that lacked this property. Surprisingly, the advantage for the dalmatian dog patterns was found irrespective of whether or not the subjects had consciously recognized the structure as meaningful. These and related observations support the notion that activity from higher-level visual areas rather than adaptation of eye-tuned neurons during their mutual inhibition initiates the perceptual switch among the rivaling percepts.

Yu and Blake (1992) also reported an advantage of upright orientation over inverted presentation for meaningful dot patterns. Such inversion effects in binocular rivalry suggest that familiarity and learning history with common objects influence their time of conscious perception and suppression (Jiang et al., 2007). Inversion effects play a particular role in face perception, since faces are the object category whose correct perceptual assessment depends most strongly on the upright orientation (Yin, 1969). Humans are face experts, and can recognize faces correctly even from distorted images, unusual viewpoints, or after significant aging, unless they are turned upside down (Maurer et al., 2002). Even strong distortions, which make a face appear grotesque, remain unnoticed when a face is turned upside-down ("Thatcher illusion"; Thompson, 1980). These observations led to the conclusion that inversion mainly affects processing of the configural properties of faces, while featural properties remain relatively unaffected by inversion (Carey and Diamond, 1977; Murray et al., 2000; Leder et al., 2001). However, there are also claims that the same facial cues are used for upright and inverted faces (Sekuler et al., 2004), and that inversion effects are not different for single features or features in the usual facial configuration (Rakover and Teucher, 1997), leading to a debate whether inversion changes face processing qualitatively (Rossion and Boremanse, 2008) or quantitatively (Riesenhuber et al., 2004; Sekuler et al., 2004; Riesenhuber and Wolff, 2009). However, measures of holistic face perception, such as the part-whole effect (Tanaka and Farah, 1993) and the composite effect (Young et al., 1987), are likewise critically dependent on the upright orientation (Rossion and Boremanse, 2008). Meanwhile, the face inversion effect (FIE) is recognized as one important hallmark of face specificity, and FIE measurement is used whenever the involvement of proprietary face-specific mechanisms is investigated (Maurer et al., 2002).

In binocular rivalry, early evidence for predominance of upright compared to inverted faces was reported by Engel (1959) who asked subjects to give a summary statement about predominance over a fixed epoch of 1 min length. Using a novel variant of binocular rivalry termed continuous flash suppression (CFS; Tsuchiya and Koch, 2005), Jiang and colleagues showed that upright faces break predominance of dynamic noise patterns in the first rival epoch about 400 ms earlier than inverted faces (Jiang et al., 2007). However, no further control objects were used to indicate whether the upright advantage of faces is face-specific. Using the same paradigm and adding house control objects Zhou et al. (2010) replicated the FIE. Upright faces broke the first dominance epoch of noise patterns earlier than inverted faces, while identical durations were obtained for upright versus inverted houses, indicating face specificity of the inversion effect in the CFS paradigm. A recent CFS study with objects from a variety of categories, however, amended this finding (Stein et al., 2012). The authors reported inversion effects for bodies, faces, dogs, and birds, but no or minor ones for lamps and chairs. Using a relative change measure to normalize the effects they documented disproportionately large inversion effects for faces and bodies, indicating that these two object categories are largely separated in terms of the strength of the inversion effect.

The results of Stein and colleagues are promising for using CFS as a paradigm to identify face-specific effects when contrasted with object categories which are analyzed in a part-based fashion, like houses (Yovel and Kanwisher, 2004; Kanwisher and Yovel, 2006) or cars (Cassia et al., 2009). Interestingly, recent reports of face inversion effects all stem from the CFS paradigm (Yang et al., 2007; Stein et al., 2011a,b, 2012). With the traditional opponent-stimulus rivalry paradigm there are currently no data on the inversion effect for faces compared to other object categories. The current study aims at filling this gap by systematically comparing inversion effects for faces and houses, since houses are preferably chosen as non-face reference objects in neuroimaging studies on face perception. By estimating effect size measures, the strength and object specificity of inversion effects observed in CFS and opponent-stimulus rivalry can be directly compared. This may offer a basis for deciding which rivalry paradigm is more appropriate for testing a given set of hypotheses.

## **2. MATERIALS AND METHODS**

### **2.1. STUDY OUTLINE**

The study aimed at measuring the effects of stimulus inversion for face and house stimuli in opponent-stimulus rivalry. In experiment I faces and houses rivaled against their inverted counterparts. In experiment II faces rivaled against houses, both in upright and inverted orientation. Eye-reversal and artificial blink events were included to indicate eye or pattern dominance (Blake et al., 1980; Logothetis et al., 1996). Experimental sessions were executed on four consecutive days to obtain representative within-subject data that allow generalization over temporal state variations between days. Each session comprised four experimental runs for each of the four stimulus conditions. Since comparison of dominance and suppression across stimulus categories requires a match in low-level stimulus properties (Yu and Blake, 1992), we matched the stimulus material with respect to its spatial dimensions and RMS contrast (Peli, 1990). The latter was achieved via an image manipulation procedure that produced images with identical gray level histograms (see below). Hence, the stimulus material matched not only in gray-level variance, but also in its first-order image statistics. Since the proportion of mixed dominance epochs, where subjects could not decide whether stimulus alternative A or B was dominant, increases with image size (Yu and Blake, 1992), we adjusted image size such that not more than 50% of mixed dominance epochs could be expected, while the images were still sufficiently large to contain the relevant object details. This also provided leeway to obtain effects pertaining to each rival alternative and the mixed percept. Dominance was measured in terms of epoch frequency, duration, and their joint effect. Effect sizes and normalized effect measures were estimated.
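To illustrate the three dominance measures named above, the following minimal sketch summarizes a list of dominance epochs by frequency, mean duration, and total dominance time, and derives a predominance ratio from the latter. The epoch format, the percept labels, the function names, and the reading of the "joint effect" as total dominance time are assumptions made for illustration and should not be taken as the exact definitions used in this study.

```python
from collections import defaultdict


def dominance_summary(epochs):
    """Summarize rivalry epochs given as (percept_label, duration_s) pairs."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for label, duration in epochs:
        if duration < 0:
            raise ValueError("epoch durations must be non-negative")
        counts[label] += 1
        totals[label] += duration
    return {
        label: {
            "frequency": counts[label],
            "mean_duration": totals[label] / counts[label],
            # Total time is frequency times mean duration, taken here as the joint effect.
            "total_time": totals[label],
        }
        for label in counts
    }


def predominance_ratio(summary, label_a, label_b):
    """Share of exclusive dominance time held by label_a, ignoring mixed percepts."""
    time_a = summary[label_a]["total_time"]
    time_b = summary[label_b]["total_time"]
    if time_a + time_b == 0:
        raise ValueError("no exclusive dominance time recorded")
    return time_a / (time_a + time_b)


# Hypothetical epochs from one run: upright vs. inverted stimulus plus mixed percepts.
example_epochs = [
    ("upright", 3.0), ("mixed", 1.0), ("inverted", 1.5),
    ("upright", 4.0), ("inverted", 1.5),
]
summary = dominance_summary(example_epochs)
ratio = predominance_ratio(summary, "upright", "inverted")
print(summary["upright"], round(ratio, 2))
```

Keeping frequency and mean duration separate makes it possible to see whether a predominance advantage arises from more frequent epochs, from longer epochs, or from both.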

### **2.2. PARTICIPANTS**

Seventeen German volunteers participated in this study (12 females and 5 males). All were undergraduate students of psychology at the Johannes Gutenberg University Mainz, age span 20–24 years. All participants had normal or corrected to normal vision, using corrective lenses in the latter case. All subjects were naive with respect to the purpose of the experiment. They were given course credit points for participation. The study was conducted in accordance with the Declaration of Helsinki. In detail, subjects participated voluntarily and gave written informed consent to their participation. In addition, participants were informed that they were free to stop the experiment at any time without negative consequences. The data were analyzed anonymously.

### **2.3. APPARATUS**

The experiment was executed on standard desktop computers with Inquisit 4 runtime units. Subjects viewed dichoptically through a custom built mirror stereoscope from a viewing distance of 60 cm. Responses were given via external Cedrus RB-830 response pads with internal high-precision timers for accurate response time measurements. Patterns were displayed on NEC MultiSync E222W TFT displays at 1650 × 1050 pixel resolution and a refresh rate of 60 Hz. No gamma correction was used. The room was darkened so that the ambient illumination approximately matched the illumination on the screen.

### **2.4. STIMULI**

Photographs of faces and houses were selected as stimulus patterns. Face images were selected from the Radboud Faces Database (Langner et al., 2012); house images were sampled from internet sources. Faces were frontal views of eight Caucasian models with neutral facial expression. House photographs were eight straight shots depicting the gable end of the structure (see **Figure 1**). Picture backgrounds were removed in Adobe Photoshop. The images were converted to grayscale and downsampled to a picture height of 125 pixels, or 3.37° of visual angle. The widths of both faces and houses spanned from 90 to 110 pixels, or 2.42° to 2.96°, depending on the specific aspect ratio of a given image. To achieve maximal congruency in pixel overlap between two dichoptically presented images, pairs of face and house images with similar shape and geometry were assembled. Only these matching pairs of faces and houses were set against each other in the experiment. Images were flipped over the horizontal axis to create inverted versions.
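The pixel-to-degree conversions above can be checked with a minimal sketch. Only the pixel sizes and the 60 cm viewing distance come from the text; the pixel pitch of about 0.282 mm is an assumption (a typical value for a 22-inch 16:10 panel) and the function name is hypothetical.

```python
import math

# Assumption (not stated in the text): pixel pitch of roughly 0.282 mm.
PIXEL_PITCH_MM = 0.282
# Viewing distance of 60 cm, as reported in the Apparatus section.
VIEWING_DISTANCE_MM = 600.0


def visual_angle_deg(size_px, pitch_mm=PIXEL_PITCH_MM,
                     distance_mm=VIEWING_DISTANCE_MM):
    """Visual angle in degrees subtended by an extent of size_px pixels."""
    size_mm = size_px * pitch_mm
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))


# Picture height and the two extreme picture widths from the text.
for px in (125, 90, 110):
    print(px, "px ->", round(visual_angle_deg(px), 2), "deg")
```

Under the assumed pixel pitch the computed angles agree with the reported values to within rounding.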

Luminance histograms of all images were equalized with Matlab procedures developed in-house. First, the average histogram of pixel intensity values was computed across all images. An adaptive quantile transformation then conformed the pixel intensities of each image to the average histogram, yielding images with identical luminance histograms. The mean luminance of each image was 0.518 in a normalized [0,1] range, or 93.2 cd/m<sup>2</sup> on screen. Maximum screen luminance was 187.7 cd/m<sup>2</sup> and minimum screen luminance was 3.7 cd/m<sup>2</sup>. RMS contrast (Peli, 1990) of all images was 0.176 in normalized units. Images were finally superimposed onto a background noise pattern with a size of 150 × 150 pixels, or 4.04° × 4.04°, and a grain resolution of three pixels. The luminance distribution of the noise pattern was sampled from the previously computed average luminance histogram in order to keep the luminance distribution of the whole stimulus unchanged. The background pattern was identical for both eyes and only changed between experimental conditions. This was done to help observers maintain eye vergence on the whole stimulus during foreground changes. In addition, four location markers were placed right outside the corners of the background pattern at positions identical to each eye. The whole stimulus arrangement was displayed on a gray screen canvas with a luminance of 93.2 cd/m<sup>2</sup>, thereby matching the mean luminance of each stimulus. See **Figure 1** for stimulus examples from experiment I (**Figure 1A**) and experiment II (**Figure 1B**).
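The following is a minimal sketch of rank-based histogram matching, not the in-house Matlab procedure itself, whose details are not given here. It assumes that all images are flattened to equally long lists of normalized intensities, that ties are broken by pixel order, and that RMS contrast is taken as the standard deviation of the normalized intensities. All function names are hypothetical.

```python
import math


def match_to_average_histogram(images):
    """Map each image's pixel ranks onto quantiles of the pooled distribution.

    `images` is a list of equally long flat lists of intensities in [0, 1].
    Pooling equally sized images yields the same distribution as averaging
    their histograms. Ties are broken by pixel order (an arbitrary choice).
    """
    k = len(images)
    n = len(images[0])
    pooled = sorted(v for img in images for v in img)
    # One target value per rank: the central pooled value of each block of k.
    target = [pooled[i * k + k // 2] for i in range(n)]
    matched = []
    for img in images:
        order = sorted(range(n), key=lambda idx: img[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        matched.append(out)
    return matched


def rms_contrast(pixels):
    """RMS contrast, taken here as the SD of normalized intensities."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
```

Because every output image receives the same set of target values, all matched images share mean luminance, RMS contrast, and the full first order statistics, while the rank order of pixels within each image is preserved.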

### **2.5. PROCEDURE**

Prior to each experimental session, participants completed an extensive calibration procedure to adjust the stereoscope to their ocular anatomy and vergence disposition. In addition, a standard blink test was performed to determine the dominant eye. Fifteen of the seventeen participants were right-dominant.

The main blocks of both experiments comprised two stimulus conditions, constructed from different pairings of stimuli. Experiment I contained pairings of (a) upright faces with inverted faces, and (b) upright houses with inverted houses. Experiment II paired (a) upright faces with upright houses, and (b) inverted faces with inverted houses. **Figure 1** provides stimulus examples for all stimulus conditions from both experiments. Since each experimental condition presented different stimulus types, the assignment of response button to stimulus category needed to be learned before entering the main experimental block. The learning task consisted of 64 trials, 32 trials for each of the two stimulus categories which were to be juxtaposed in the main experiment. A learning trial was the binocular display of one stimulus, viewed through the stereoscope. Participants had to press the response button corresponding to the stimulus category on screen. Participants were allowed to proceed to the main experiment only if they reached a proportion correct of at least 0.96, i.e., no more than 2 errors in 64 learning trials.

A main experimental block started with the dichoptic display of one stimulus pair (**Figure 1C**). Subjects indicated via a button press which of the two stimuli was perceived as unambiguously dominant at any given moment. When neither of the two stimuli was dominant, thus resulting in a fused percept containing parts of both stimuli, both response buttons were to be released. A button press was followed by a latency period of 600–800 ms allowing for the dominance percept to consolidate. If the button was released while still within latency, no experimental manipulation commenced. If, however, the button press was retained until after the latency period, one of three experimental manipulations took effect with equal likelihood. First, the stimulus presentation could remain unaltered by keeping the same stimulus arrangement on screen as before the button press (the "no-change," or "normal" condition). Second, the stimulus presentation could be reversed between eyes, so that each eye would afterwards be presented with that stimulus which the other eye had viewed before (the "eye reversal" condition). Third, both stimuli could disappear for two frames (33 ms) leaving only the underlying background mask visible, and then reappear in the same stimulus arrangement as before (the "blink" condition). When either an eye reversal or a blink had occurred, the next three button presses never triggered a latency phase, and the respective epochs were always of the no-change variant without any stimulus change. This was done in order to avoid rapid cascades of eye reversals or blinks on consecutive button presses. The procedure further ensured that about 1/2 of all epochs were no-change epochs, 1/4 eye reversal epochs and 1/4 blink epochs.

The four opponent-stimulus rivalry conditions were blocked and administered during one single session. Participants were asked to take brief pauses between experimental blocks. Each participant attended four sessions for the respective experiment over the course of four consecutive days. A session comprised 64 epochs in each learning task and 240 epochs in each experimental block, 180 of which were no-change trials, 30 eye reversal trials, and 30 blink trials. A session took observers between 40 and 60 min. Participants were free to stop the experiment at any given time via an exit button if they felt the task became uncomfortable.

### **2.6. DEPENDENT MEASURES AND OUTLIER CLEARING**

The length of dominance epochs was recorded for each stimulus category in both possible pairings (see previous section). A dominance epoch was defined as the time duration for which participants had one of the response buttons depressed. Moreover, the duration of ambiguous epochs was recorded, where participants reported an unclear percept containing parts from both presented stimuli. Note that pairwise stimulus rivalry, as employed here, may yield different dominance durations for the same stimulus category, depending on which other stimulus it is paired with. Hence, each of the four stimulus conditions produces two sets of dominance durations. For example, dominance durations for the "upright face" category can either stem from its pairing with inverted faces or upright houses.

For each subject the data from all four sessions per stimulus condition were merged into one data set. Since response time measurements are susceptible to lapses in attention and erroneously prolonged button presses, dominance durations were cleared for outliers by calculating the mean (*M*) and standard deviation (*SD*) for each set of dominance epochs and clipping all dominance durations beyond *M* + 2.5*SD*. For no participant were more than 1.94% of the recorded dominance epochs excluded. The raw data of all subjects, including the positions of the outlier criteria on the time continuum, are supplied in the electronic supplement of this article.
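The outlier criterion can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, the sample standard deviation (*n* − 1) is assumed, and only the upper criterion *M* + 2.5*SD* named in the text is applied.

```python
import statistics


def clip_outliers(durations, k=2.5):
    """Drop durations above mean + k * SD; return (kept, cutoff)."""
    m = statistics.mean(durations)
    sd = statistics.stdev(durations)  # sample SD (n - 1); an assumption
    cutoff = m + k * sd
    kept = [d for d in durations if d <= cutoff]
    return kept, cutoff


# Hypothetical durations in seconds: one erroneously prolonged button press.
durations = [2.0] * 19 + [30.0]
kept, cutoff = clip_outliers(durations)
print(len(durations) - len(kept), "epoch(s) removed, cutoff =", round(cutoff, 2))
```

Since the criterion is computed once per set of dominance epochs, it would be applied separately to each subject and stimulus condition after merging the four sessions.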

### **2.7. DATA ANALYSIS**

The frequency of dominance epochs and their duration were analyzed with repeated-measures ANOVA. Separate analyses were carried out for each experiment and each dependent variable. The data of experiment I were analyzed for effects of percept (upright or inverted), object type (face or house) and switch (no-change, blink, and eye reversal). The data of experiment II were analyzed for effects of percept (face or house), orientation (upright or inverted) and switch. For analyzing the frequency data the percept factor included the epochs where observers experienced ambiguous percepts. For analyzing the dominance durations, epochs with mixed percepts were not included. Correspondingly, and as commonly defined (Yu and Blake, 1992), we calculated the predominance ratio (*PR*) as the ratio of the summed dominance duration for one single stimulus alternative (e.g., *A*) to the sum of the added dominance durations of both rivaling stimulus alternatives (*A* + *B*):

$$PR(\mathbf{A}) = \frac{\Sigma_D(\mathbf{A})}{\Sigma_D(\mathbf{A}) + \Sigma_D(\mathbf{B})},\tag{1}$$

hence *PR*(B) = 1 − *PR*(A). *PR* measures were calculated on the level of individual subjects, and were analyzed statistically.
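Equation (1) translates directly into a short sketch. The function name and the example durations are hypothetical. Epochs with mixed percepts are assumed to have been excluded beforehand, as described above.

```python
def predominance_ratio(durations_a, durations_b):
    """PR(A): summed dominance time of A over summed dominance time of A and B."""
    total_a = sum(durations_a)
    total_b = sum(durations_b)
    return total_a / (total_a + total_b)


# Hypothetical dominance durations (seconds) for one subject.
upright = [3.0, 4.0]
inverted = [1.0, 2.0]
print(predominance_ratio(upright, inverted))
```

Because *PR* is built from summed durations, it reflects both how often and how long an alternative is dominant, and the two ratios of a rival pair add up to 1.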

In order to normalize differences in the mean duration of dominance epochs for the opponent rival stimuli we calculated a relative change measure *C*% as

$$C_{\%} = \frac{D_A - D_B}{D_A} \times 100\%,\tag{2}$$

where *A* was defined as the condition for which longer dominance epoch durations were expected, i.e., the upright orientation for rivalry of upright against inverted objects and the face category for rivalry of faces against houses.
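As a minimal sketch of Equation (2), with a hypothetical function name and hypothetical mean durations:

```python
def relative_change_percent(mean_a, mean_b):
    """C%: reduction of mean dominance duration from A to B, relative to A."""
    return (mean_a - mean_b) / mean_a * 100.0


# Hypothetical mean dominance durations in seconds.
print(relative_change_percent(3.0, 2.25))
```

A positive value indicates shorter dominance epochs for alternative *B* than for the reference alternative *A*. Since the measure is scaled by the duration of *A*, it can be compared across conditions with different absolute dominance times.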

## **3. RESULTS**

### **3.1. FREQUENCIES OF DOMINANCE EPOCHS**

**Tables 1**, **2** summarize the frequency statistics of the dominance epochs in the two experiments, and **Figure 2** shows the mean number of epochs with their confidence intervals. In the no-change condition without eye reversal or blink the observers experienced about 665 dominance epochs for rivalry of faces and houses against their inverted counterparts, and for rivalry of faces against houses. In about half of all epochs (between 55% and 65%) the observers experienced "mixed" percepts, where they could not unambiguously decide between seeing alternative A or B. For the given stimulus size of about 3° visual angle, this result is in line with earlier findings (Yu and Blake, 1992). For the remaining epochs of unique percepts observers experienced a higher frequency of dominance epochs for upright than for inverted stimuli in experiment I (see **Figure 2A**). ANOVA revealed no overall effect of object type (face or house) [*F*(1, 16) = 0.01, *p* = 0.941], an effect of percept [*F*(2, 32) = 55.43, *p* < 0.001], and an interaction of percept with object type [*F*(2, 32) = 13.46, *p* < 0.001], indicating a stronger effect of stimulus inversion for faces compared to houses. Pairwise comparisons within object category revealed inversion effects (calculated as the difference upright minus inverted) for faces [*F*(1, 16) = 26.13, *p* < 0.001] and for houses [*F*(1, 16) = 18.85, *p* < 0.001].

*N* (houses) 876.8

**Table 2 | Frequencies of dominance epochs for rivalry of faces versus houses (***N* **= 17).**

*N* (upright) 851.7

*N* (inverted) 831.2

For experiment II ANOVA indicated no overall effect of orientation [*F*(1, 16) = 2.43, *p* = 0.138], an effect of percept [*F*(2, 32) = 33.47, *p* < 0.001], and no interaction of percept with orientation [*F*(2, 32) = 0.344, *p* = 0.711], substantiating the same pattern of effects in the two panels of **Figure 2B**. Pairwise comparisons within each orientation revealed no differences in the frequency of dominance epochs among the two rival objects in upright [*F*(1, 16) = 0.77, *p* = 0.392] and inverted presentation [*F*(1, 16) = 1.36, *p* = 0.261].

**Tables 1**, **2** confirm that, after data clearing, blink and eye reversal epochs taken together still occurred with about the same frequency as the unique percepts in normal rivalry (i.e., the no-change condition). Eye reversal and blink trials practically did not occur during mixed percepts, since blink or eye reversal trials were initiated only when the subjects indicated prolonged unique dominance of one percept. Exceptions could occur only when the observer released a key precisely during the frame refresh before an eye reversal or switch. Such trials were excluded from the analyses.

### **3.2. DURATIONS OF DOMINANCE EPOCHS**

**Tables 3**, **4** summarize the statistics for the average dominance durations of the two stimulus alternatives. The data are illustrated in **Figure 3**. For rivalry of upright against inverted objects (experiment I) ANOVA yielded main effects of percept [*F*(1, 16) = 28.16, *p* < 0.001] and switch condition [*F*(2, 32) = 43.32, *p* < 0.001], but no effect of object type [*F*(1, 16) = 1.51, *p* = 0.236]. The object type × percept interaction failed to reach significance [*F*(1, 16) = 2.11, *p* = 0.165]. However, this result was due to the inclusion of the blink and eye reversal conditions. Analysis of just the data for normal, undisturbed rivalry epochs revealed a significant object type × percept interaction [*F*(1, 16) = 6.51, *p* < 0.025], corresponding to the intersecting scheme of the means (see **Figure 3A**, solid symbols for faces and houses). The data in **Table 3** show that the mean dominance times for upright faces were about 1000 ms longer than the mean dominance times for inverted faces, while the inversion effect for houses was less than 500 ms. The reduction of dominance time due to inversion (*C*%) was 30% for faces, compared to just 12.5% for houses in normal rivalry. Estimation of effect size for the inversion effects via the population variance estimates from the two paired samples (*d* = μ/σ̂<sub>*pop*</sub>) revealed a large effect size (*d* > 0.8) for the inversion effect of faces, but a medium effect size for the inversion effect of houses (*d* ≈ 0.5), referring to Cohen's effect size classification (Cohen, 1988). Note that effect sizes for stimulus inversion in epochs with artificially induced termination (i.e., blink or eye reversal) yielded similar results (see Discussion).
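The effect size estimate can be illustrated with a short sketch. The exact estimator of the population standard deviation is not spelled out here, so the code below shows one plausible reading: the mean difference divided by the pooled standard deviation of the two paired samples. This is an assumption, and the function name and data are hypothetical.

```python
import math
import statistics


def cohens_d_pooled(sample_a, sample_b):
    """Mean difference divided by the pooled SD of two equally sized samples."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    pooled_var = (statistics.variance(sample_a)
                  + statistics.variance(sample_b)) / 2.0
    return mean_diff / math.sqrt(pooled_var)


# Hypothetical per-subject mean dominance durations in seconds.
upright = [3.0, 4.0, 5.0]
inverted = [2.0, 3.0, 4.0]
print(cohens_d_pooled(upright, inverted))
```

Other conventions, such as dividing by the standard deviation of the paired differences, would yield different values for the same data, so the classification into medium and large effects depends on the estimator chosen.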

For rivalry of faces against houses (experiment II) ANOVA indicated main effects of percept [*F*(1, 16) = 10.37, *p* < 0.005] and switch condition [*F*(2, 32) = 27.99, *p* < 0.001], but no effect of orientation [*F*(1, 16) = 2.45, *p* = 0.136]. The orientation × percept interaction failed to reach significance [*F*(1, 16) = 1.01, *p* = 0.329]. This result persisted when analyzing the data for no-change conditions only [*F*(1, 16) = 0.17, *p* = 0.689], corresponding to the parallel course of the means (see **Figure 3B**). Overall, the data demonstrate longer dominance durations for faces compared to houses independent of the orientation of the objects. The average difference in dominance duration was about 750 ms, which corresponds to 25% shorter dominance durations for houses compared to faces in the relative change measure, *C*%. Calculation of Cohen's *d* revealed a medium to large effect size of about *d* = 0.64 (see **Table 4**).

In both experiments the effects of blink and eye reversal practically coincided (see **Figure 3**). Both led to a strong shortening of the current dominance epoch, having observers signal perceptual change approximately 750 ms after the manipulation occurred (see **Tables 3**, **4**). Dominance of upright faces and houses apparently survived the manipulation for some extra time in experiment I (see **Figure 3** and Discussion).

**Table 3 | Mean durations of dominance epochs (seconds) for rivalry of upright versus inverted objects (***N* **= 17).**

**Table 4 | Mean durations of dominance epochs (seconds) for rivalry of faces versus houses (***N* **= 17).**

**FIGURE 3 | Mean durations of dominance epochs (seconds) for upright faces and houses rivaling against their inverted counterparts (A), and faces rivaling against houses (B).** Error bars indicate 95% confidence limits of the means.

To check whether the results for the durations of the dominance epochs depend on the position on the duration scale we additionally analyzed the three quartiles of the dominance epoch duration distributions. Note that the dominance epoch durations usually follow a Gamma distribution (see Logothetis et al., 1996), which also holds for the data of this study for normal rivalry epochs which were not artificially terminated by blink or eye reversal (see distribution functions of dominance epoch durations in the electronic data supplement of this article). This means that, generally, Mode < Median < Mean holds for the duration data, so the distributions are positively skewed and the mean is the largest of all three distribution statistics, and is usually located between the median and the 3rd quartile. The results (see Supplementary Materials, Figure S1, and Tables S1, S2) show that the major findings obtained with the mean durations are maintained with all three quartiles: For normal rivalry there is an inversion effect of about 30% for faces, compared to just 12.5% for houses, and the duration advantage of faces over houses is about 25%. Also the effect sizes in the Cohen's *d* measure differ only marginally across the different duration statistics. This indicates that the effects of inversion and object category do not concern a particular band of epoch durations (e.g., only the longer ones), but all epoch durations to similar degrees. This is further indicated by the fact that the skewness of the distributions, measured via the third central moment, *m*3, is not modulated by inversion or object category in normal rivalry (see Table S3 in Supplementary Materials).
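The ordering Mode < Median < Mean for Gamma-distributed durations can be illustrated with simulated data. The shape and scale values below are hypothetical and not fitted to the data of this study. The function name is likewise hypothetical.

```python
import random
import statistics


def gamma_location_stats(shape, scale, n=20000, seed=1):
    """Return (mode, median, mean) for simulated Gamma-distributed durations."""
    rng = random.Random(seed)
    sample = [rng.gammavariate(shape, scale) for _ in range(n)]
    mode = (shape - 1.0) * scale  # analytic mode, valid for shape > 1
    return mode, statistics.median(sample), statistics.mean(sample)


mode, median, mean = gamma_location_stats(shape=3.0, scale=1.0)
print(mode < median < mean)
```

The mode is taken from the analytic expression because a sample mode is ill-defined for continuous data, whereas median and mean are estimated from the simulated sample.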

To look for indications of possible response strategies in favor of upright objects (experiment I), or in favor of faces when rivaling against houses (experiment II), we analyzed the durations of the ambiguity epochs between the unique perceptual states (see Table S4 in Supplementary Materials). For experiment I the ambiguous epochs in the transition from upright to inverted objects and in the transition from inverted to upright objects had practically the same length [faces: Δ = 66 ms, *t*(16) = −0.463, *p* = 0.649; houses: Δ = 63 ms, *t*(16) = −0.554, *p* = 0.587]. However, for rivalry of faces against houses, the ambiguity epochs before the face percept were about 300–450 ms shorter than before the house percept [upright: Δ = 314 ms, *t*(16) = −2.551, *p* < 0.05; inverted: Δ = 451 ms, *t*(16) = −2.761, *p* < 0.05], indicating that subjects tended to resolve the ambiguity state earlier in favor of the face than the house percept. This may have perceptual or non-perceptual reasons (see Discussion).
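The *t* statistics with 16 degrees of freedom are consistent with paired comparisons across the 17 subjects. A minimal sketch of such a comparison is given below, assuming a paired *t*-test on per-subject mean ambiguity durations. The test type is an assumption, and the function name and data are hypothetical.

```python
import math
import statistics


def paired_t(values_a, values_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical ambiguity durations (seconds) before face and house percepts.
before_face = [1.0, 2.0, 3.0, 4.0]
before_house = [1.5, 2.0, 4.0, 5.5]
t, df = paired_t(before_face, before_house)
print(round(t, 3), df)
```

A negative *t* value here indicates shorter ambiguity epochs before the first alternative than before the second.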

### **3.3. PREDOMINANCE RATIOS**

**Table 5 | Predominance ratio statistics for rivalry of upright versus inverted objects (***N* **= 17).**

**Table 6 | Predominance ratio statistics for rivalry of faces versus houses (***N* **= 17).**

The analyses in the foregoing sections have shown that upright objects gain an advantage in both the frequency of dominance epochs and their mean durations. Since the absolute dominance time of a perceptual alternative is given by the sum of durations of all its dominance epochs, the alternative which is more frequently dominant and has longer dominance periods will have larger absolute dominance time, and therefore show the larger predominance ratio (*PR*; see section 2). **Tables 5**, **6** summarize the predominance ratios and their statistics for faces and houses rivaling against their inverted counterparts (**Table 5**) and faces rivaling against houses in upright and inverted orientation (**Table 6**). Using the *PR* the inversion effect is given by the deviation from the expected value *E*(*PR*) = 0.5 for equal absolute dominance durations (IE, last line of **Tables 5**, **6**, listed in percent). A one sample test was calculated for the deviation of the *PR* from 0.5. The *PR* data from experiment I suggest significant inversion effects (*PR* > 0.5) for both faces and houses in all conditions. For normal rivalry (i.e., the no-change condition), the proportions of dominance times for upright and inverted objects were approximately 70:30 for faces, while, for houses, they were approximately 60:40. Calculating the odds ratio for the predominance ratios according to

$$OR_{IE}(\text{face}, \text{house}) = \frac{PR(\text{upright face}) / (1 - PR(\text{upright face}))}{PR(\text{upright house}) / (1 - PR(\text{upright house}))} \tag{3}$$

yielded a value of 1.59, indicating 1.6 times larger odds for upright faces compared to upright houses. For rivalry of faces against houses the *PR* values reveal dominance time proportions of approximately 60:40 in favor of faces (see **Table 6**) in normal rivalry, which is a significant deviation from an even distribution. This occurred for upright and inverted faces with approximately equal likelihood (*OR* = 1.04).
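The odds ratio of Equation (3) can be illustrated with the rounded proportions given in the text. The function names are hypothetical. Using 0.70 and 0.60 instead of the exact *PR* values from **Table 5** is a simplification, so the result only approximates the reported value of 1.59.

```python
def odds(p):
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1.0 - p)


def odds_ratio(pr_first, pr_second):
    """Ratio of the odds of two predominance ratios."""
    return odds(pr_first) / odds(pr_second)


# Rounded proportions from the text: 70:30 for faces, 60:40 for houses.
print(round(odds_ratio(0.70, 0.60), 2))
```

With the rounded inputs the sketch gives about 1.56, which is close to the reported 1.59. The remaining gap is presumably due to rounding of the proportions.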

## **4. DISCUSSION**

Measuring inversion effects for faces and houses in opponent-stimulus rivalry has revealed a strong advantage for upright objects. While inversion effects were found for both object categories, the effects for faces were significantly stronger, and involved both the frequency (see **Table 1** and **Figure 2A**) and the mean duration of dominance epochs (see **Table 3** and **Figure 3A**). Upright houses retained an advantage over inverted houses mostly with respect to mean epoch duration, and a smaller one in their frequency (ibid). The joint effect of frequency and duration of dominance epochs is impressive for faces, showing a distribution of 70:30 of total dominance time for upright faces rivaling against their inverted counterparts, compared to a 60:40 distribution for upright versus inverted houses. Moreover, the mean dominance duration advantage for upright faces of about 1 s, with an effect size of *d* = 0.85 is impressive, and contrasts strongly with the advantage of upright houses of scarcely half a second, amounting to an effect size of *d* = 0.33. The canonical result of experiment I is that both object categories show inversion effects in opponent-stimulus rivalry, but the effects for faces are disproportionately stronger. This means that both object classes are well separated with respect to their inversion effects in opponent-stimulus rivalry. The results of experiment II show that dominance epochs for faces and houses occur with equal frequency (see **Figure 2B** and **Table 2**), but the epochs of houses are about 25% shorter (see **Table 4**), leading to a 60:40 distribution of total dominance times for faces and houses independent of orientation. Overall, the results demonstrate that upright faces enjoy privileged presence in visual awareness.

We included blink and eye-reversal events in order to assemble evidence whether rivalry of common objects, which are known to be processed in specialized brain areas (Tong et al., 1998), rests more on eye- or pattern dominance (Blake et al., 1980; Logothetis et al., 1996). The most intriguing result found for these manipulations is that they yielded practically the same effect, namely terminating the current rivalry epoch. Dominance epochs in these conditions are about half as long as normal dominance epochs (see **Figure 3** and **Tables 3**, **4**), and their mean duration of about 1500 ms shows that these epochs terminate roughly 700–800 ms after the manipulation took effect. This is an expected delay caused by the evaluation of the changed percept and response preparation. If dominance rests on eye-specific mechanisms, immediate termination of the epoch is expected for the eye-reversal condition (Blake et al., 1980; Logothetis et al., 1996). However, since there is a local spatio-temporal luminance change caused by both eye reversal and blink, the termination of the current dominance epoch may be due to just this. A blink is merely a temporal disturbance of the same spatial image presentation while eye-reversal switches the eye-specific channels through which higher level object areas receive the stimulus input. Termination of their input should exert a greater effect than a brief interruption of the input flow in the same channels. In fact, it did not, regardless of the patterns which were rivaling. This points to pattern dominance (Logothetis et al., 1996) over eye dominance (Blake et al., 1980) for rivaling faces and houses. In further support of pattern dominance we observed inversion effects for faces and houses in these two conditions (see **Figure 3A**, and **Tables 3**, **5**). 
Upright objects survived an eye reversal or blink for a longer time than their inverted counterparts, indicating that the termination of the dominance epoch is, at least partly, under higher level control, and not fully determined by the physical screen event.

The scheme of results for inversion effects reported here (experiment I) contrasts with effects found in continuous flash suppression (CFS), where a strong FIE was found, but no inversion effect for houses (Zhou et al., 2010). Stein et al. (2012) used CFS to study inversion effects for a large variety of objects. As in the present study a relative change measure was reported, which gauges the size of the effect independent of its absolute position on the time scale. For the *C*% values, the authors obtained about 25% for faces, 20% for bodies, 6% for dogs and birds, and practically no effects for inanimate objects like lamps and chairs. Houses were not tested. In this study we obtained *C*% values of about 30% for faces and 12.5% for houses. Although the database for the inversion effect in different binocular rivalry techniques is currently limited, the superior inversion effects for faces and bodies in the study of Stein and colleagues indicate that CFS lets those objects reach visual awareness earlier that combine effects of familiarity and long-lasting learning (expertise) with the effects of domain-specific processing in specialized brain areas. Faces (FFA) and bodies [extrastriate body area (EBA) and fusiform body area (FBA); see Brandman and Yovel, 2010] were the only objects used in the study of Stein and colleagues that match both criteria. Houses fit only the latter criterion (see Introduction), and fail to induce an inversion effect in CFS (Zhou et al., 2010). Findings of Jiang et al. (2007) point in the same direction. Using CFS they found strong inversion effects for faces and for Chinese and Hebrew words, but the latter only for readers of the respective language.

In opponent-stimulus rivalry, where two unmasked and clearly visible stimulus alternatives compete for perceptual dominance, inversion effects are not limited to objects with domain specific processing and objects of expertise. Even for noisy dot figures that are more easily combined into meaningful objects under upright viewing conditions (Yu and Blake, 1992) the upright orientation is privileged. Moreover, the clear inversion effect obtained for houses in this study shows that in direct opponent-stimulus rivalry the upright view is preferred for those objects which are meaningful to us as common objects predominantly in upright orientation. We should therefore expect that plants, trees, chairs and lamps, which all failed to yield an inversion effect in CFS (Stein et al., 2012), yield inversion effects when paired in opponent-stimulus rivalry. The magnitudes of the inversion effects for faces and houses in opponent-stimulus rivalry resemble the magnitudes of inversion effects obtained for a variety of face and non-face objects in the seminal study on the effects of inversion by Yin (1969). The author compared recognition memory for photographs of faces with other objects which are mostly seen upright in everyday life (houses, airplanes, stick figures). He obtained inversion effects for all objects, but recognition memory for faces was disproportionately impaired by inversion. This led him to conclude that inversion effects reflect an experience dependent component that concerns all mono-oriented objects, as well as a component that is specific for faces. Apparently, both components shine through in direct opponent-stimulus rivalry, while in CFS only the latter component takes effect, comprising both generic category specific expertise (Carey and Diamond, 1977) and domain specificity (Kanwisher, 2000; Yovel and Kanwisher, 2004).

While studying inversion effects of the same stimuli in binocular rivalry is not confounded with low level image differences (experiment I), category specific effects (experiment II) are not easily evaluated. In this study we matched images for their 1st order luminance statistics, since images with larger contrasts are known to reduce the time of their suppression while their dominance times remain unchanged (Blake and Logothetis, 2002). We thus can assume that the 60:40 advantage for faces compared to houses is not due to different luminance histograms of both categories. Differences may, however, arise from category specific spatial frequency spectra. Control of amplitude spectra for face and non-face objects is possible, but at the cost of a significant loss in face detail information (Willenbockel et al., 2010). Most current CFS studies on the inversion effect did not apply control of low level image properties, since they were not aiming at across category comparison of suppression times.

Results for opponent-stimulus rivalry show that particularly large inversion effects can be expected for faces, and minor but significant ones for other common mono-oriented objects. Hence, face speciality is well reflected by dominance in binocular rivalry. The large dominance advantage for upright faces makes the paradigm particularly suitable to study domains of face perception where the inversion effect is highly diagnostic, such as featural and relational image manipulations (Leder and Bruce, 2000; Leder et al., 2001), familiarity (Hancock et al., 2000; Veres-Injac and Persike, 2009), and own/other race effects (Young et al., 2012). Further, the smaller but present inversion effect for common mono-oriented objects renders them highly suitable as nonfacial benchmarks. Inversion effects in CFS appear to be smaller and tightly focused on objects of expertise with domain specific processing. Hence, CFS exhibits higher categorial selectivity of the inversion effect.

A disadvantage of having observers track their perceptual states in opponent-stimulus rivalry is that the tracking results may be confounded with possible response preferences, since subjects may tend to resolve ambiguous percepts earlier in favor of a preferred stimulus alternative. To account for possible response preferences, some authors use catch trials in which mixtures of both patterns, overlaid in transparency, are presented to both eyes. A response bias in favor of one category is inferred from asymmetrical results in the dominance measure for the same mixture proportions, e.g., for 70:30 compared to 30:70 (Lee and Blake, 2004; Baker and Graf, 2009). Using this technique Baker and Graf (2009) found no evidence for a response tendency toward more familiar patterns when natural images rivaled against noise. We decided not to include such catch trials, since we already included the "blink" and "eye-reversal" trials, and interleaved binocular trials interfere with the dichoptic viewing cycle. However, analysis of the epochs with mixed percepts can give valuable hints whether possible response preferences might bias the subjects' perceptual reports. If such a bias exists, then the observers should signal the end of a mixed percept earlier when going from stimulus alternative A to B compared to moving from B to A. This means that, if there is a response bias toward one stimulus alternative, the mean durations of both kinds of mixed percepts should not be the same. The results (see Table S4 in Supplementary Materials) indicate equal durations of the epochs with mixed percepts between upright and inverted objects and between inverted and upright objects, for both faces and houses. However, for rivalry of faces against houses, the mixed epochs that were resolved into faces were 300–450 ms shorter than the mixed epochs that ended up in houses, indicating a perceptual or a decisional asymmetry in the perceptual alternations among the object categories. 
On the basis of the present data it cannot be excluded that the observed face-to-house dominance ratio of 60:40 rests, at least partly, on response preferences for faces.

It is important to note that in opponent-stimulus rivalry observers just indicate what they actually see, and the stimulus alternatives are clearly visible and unmasked objects. In CFS, however, subjects perform a speeded detection task and the stimulus of interest is masked by a highly effective spatio-temporal noise masker. In view of the fact that there is external noise and decision noise in CFS, it is not surprising that the influence of higher level stimulus properties, like structure and meaning, does not take effect so easily. However, CFS is much more apt for studying higher level stimulus influence on *unconscious* processing, including subcortical processing that may reach object-selective areas via subcortical projections (Pasley et al., 2004; Williams et al., 2004). Investigators may decide which paradigm applies best for the hypotheses under scrutiny.

## **AUTHOR CONTRIBUTIONS**

All authors contributed equally to conception and design of the study. Malte Persike conducted the experiments and data preparation. Günter Meinhardt contributed data analysis and interpretation. All authors were involved in writing, preparation of the manuscript and its final approval. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00295/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 March 2014; accepted: 22 April 2014; published online: 15 May 2014. Citation: Persike M, Meinhardt-Injac B and Meinhardt G (2014) The face inversion effect in opponent-stimulus rivalry. Front. Hum. Neurosci. 8:295. doi: 10.3389/fnhum.2014.00295*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Persike, Meinhardt-Injac and Meinhardt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Photographic but not line-drawn faces show early perceptual neural sensitivity to eye gaze direction

Alejandra Rossi<sup>1,2</sup>\*<sup>†</sup>, Francisco J. Parada<sup>2,3,4†</sup>, Marianne Latinus<sup>3,5</sup> and Aina Puce<sup>1,2,3</sup>

<sup>1</sup> Cognitive Science Program, Indiana University, Bloomington, IN, USA, <sup>2</sup> Program in Neuroscience, Indiana University, Bloomington, IN, USA, <sup>3</sup> Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA, <sup>4</sup> Department of Psychiatry, Harvard Medical School, Boston, MA, USA, <sup>5</sup> Institut de Neurosciences de la Timone, UMR7289, CNRS, Aix-Marseille Université, Marseille, France

Our brains readily decode facial movements and changes in social attention, reflected in earlier and larger N170 event-related potentials (ERPs) to viewing gaze aversions vs. direct gaze in real faces (Puce et al., 2000). In contrast, gaze aversions in line-drawn faces do not produce these N170 differences (Rossi et al., 2014), suggesting that physical stimulus properties or experimental context may drive these effects. Here we investigated the role of stimulus-induced context on neurophysiological responses to dynamic gaze. Sixteen healthy adults viewed line-drawn and real faces, with dynamic eye aversion and direct gaze transitions, and control stimuli (scrambled arrays and checkerboards) while continuous electroencephalographic (EEG) activity was recorded. EEG data from 2 temporo-occipital clusters of 9 electrodes in each hemisphere where N170 activity is known to be maximal were selected for analysis. N170 peak amplitude and latency, and temporal dynamics from Event-Related Spectral Perturbations (ERSPs) were measured in 16 healthy subjects. Real faces generated larger N170s for averted vs. direct gaze motion; however, N170s to real and direct gaze were as large as those to respective controls. N170 amplitude did not differ across line-drawn gaze changes. Overall, bilateral mean gamma power changes for faces relative to control stimuli occurred between 150–350 ms, potentially reflecting signal detection of facial motion. Our data indicate that experimental context does not drive N170 differences to viewed gaze changes. Low-level stimulus properties, such as the high sclera/iris contrast change in real eyes, likely drive the N170 changes to viewed aversive movements.

#### Keywords: N170 ERP, real faces, line-drawn faces, gaze aversion, apparent motion

#### Edited by:

Davide Rivolta, University of East London, UK

#### Reviewed by:

Christine Parsons, University of Oxford, UK; John Towler, Birkbeck, University of London, UK

#### \*Correspondence:

Alejandra Rossi, Cognitive Science Program, Indiana University, 1900 East Tenth St., Bloomington, IN 47405, USA alejandrarossic@gmail.com

†These authors have contributed equally to this work.

Received: 29 April 2014; Accepted: 19 March 2015; Published: 10 April 2015

#### Citation:

Rossi A, Parada FJ, Latinus M and Puce A (2015) Photographic but not line-drawn faces show early perceptual neural sensitivity to eye gaze direction. Front. Hum. Neurosci. 9:185. doi: 10.3389/fnhum.2015.00185

## Introduction

Successful social behavior requires evaluating incoming sensory information and merging it with situationally relevant behavioral responses. Though a part of our social life may rely on purely reflexive behaviors, specialized neural activity is needed in evaluating social cues (Stanley and Adolphs, 2013). Over the past two decades social neuroscience, the study of social and cognitive influences on biological processes (Cacioppo, 1994; Cacioppo et al., 2000; Ochsner and Lieberman, 2001), has aimed to generate a brain-based understanding of social behaviors. An individual's social cognitive understanding of the world is likely to not be context-invariant; however, the effects of task and experimental context on social cognition are seldom studied. In the case of social attention, in daily life a gaze change will occur in the context of directed emotions and actions from not only one's self, but from others around us. This social environment, with many multisensory cues and continually changing context, is difficult to reproduce in a controlled laboratory setting. However, even in a controlled laboratory setting, experimental context can potentially modulate neural responses to particular stimulus conditions or tasks, and may underlie some of the differences observed between studies in the literature. In a laboratory setting, experimental context can be created within a trial, across trials or conditions, or across experimental sessions. Context effects could potentially be driven by the characteristics of the stimuli (bottom-up), or by task demands/instructions to subjects (top-down).

One particularly striking experimental context effect has been reported for viewing faces. It has long been known that the N170 event-related potential (ERP) is strongly driven by the physical or structural characteristics of a face stimulus (Bentin et al., 1996). In an elegant experimental manipulation, Bentin and Golland (2002) recorded an N170 ERP evoked to (static) schematic line-drawings of faces, scrambled versions of the same faces, and line-drawings of common objects. In their design different subject groups were exposed to the stimuli with different block orders. The scrambled, or jumbled, versions of the line-drawn face stimuli had recognizable features, whose position relative to the outline of the face was altered. As expected, N170s were elicited to all stimulus categories, and were significantly larger to the intact schematic faces in both experiments. Critical to the current discussion, significantly larger N170s occurred to jumbled schematic faces but only when that stimulus block directly followed the schematic face block (Bentin and Golland, 2002), indicating how important stimulus-induced context effects can be in a laboratory setting. In a different study, N170 amplitude elicited to Mooney faces decreased by priming with photographic images of the same individuals represented in the Mooney faces (Jemel et al., 2003). The strongest priming effect occurred to images that were the actual photographic image of the Mooney face stimulus (a bottom-up effect); however, priming was also observed to different real images of the same individual relative to the Mooney faces (top-down effect) (Jemel et al., 2003). As a third example of the importance of experimental context effects, differences in the lateralization of N170 to faces can occur as a function of stimulus conditions used in the experiment. For example, the classic right lateralization of N170 is seen when faces are randomly presented among other object classes (e.g., Bötzel et al., 1995; Bentin et al., 1996; Eimer, 1998; Itier and Taylor, 2004) compared to a bilateral or even left-lateralization pattern when faces are presented in series with other faces (Deffke et al., 2007). These findings caution how important experimental context can be on N170s elicited to faces (Maurer et al., 2008). Indeed, N170 is larger to ambiguous face-like stimuli that are perceived as faces relative to the same stimuli when they are not seen as faces (George et al., 1996; Sagiv and Bentin, 2001; Bentin and Golland, 2002; Latinus and Taylor, 2005, 2006). These effects have been proposed to be driven by stimulus context by a number of investigators (Bentin and Golland, 2002; Latinus and Taylor, 2006).

Isolated eyes evoke larger and delayed N170s relative to full faces (Bentin et al., 1996; Jemel et al., 1999; Puce and Perrett, 2003). Hence, the context of the face itself (e.g., outline and other face parts) may affect the neural response elicited to the eye stimulus, an effect that does not occur when other face parts are presented in isolation. Due to its sensitivity to dynamic gaze transitions (Puce et al., 2000; Conty et al., 2007), N170 has been posited to be a neural marker of communicative intent (Puce, 2013). Relevant for the present study, N170s to dynamic gaze aversions are larger and earlier than those to gaze transitions looking directly at the observer (Puce et al., 2000; Watanabe et al., 2002; but see Conty et al., 2007). This effect occurs to full images of faces, and isolated eyes (Puce et al., 2000), suggesting that N170 signals changes in social attention, and reflects the potential salience of gaze direction (Puce and Perrett, 2003; Conty et al., 2007).

N170 modulation to dynamic facial movements is not exclusive to eyes: larger N170s occur to mouth opening vs. closing movements, potentially reflecting a response to a pending utterance (Puce et al., 2000), and this effect occurs in both real and line-drawn faces (Puce et al., 2003; Rossi et al., 2014). Unlike in dynamic mouth motion, N170s to gaze aversions are strongly modulated by stimulus type: real faces show N170 differences to averted vs. direct gaze (Puce et al., 2000), whereas line-drawn faces do not (Rossi et al., 2014). These differences raise the question of how stimulus-driven context affects the N170 elicited to dynamic facial movements. Hence, here we recorded N170 ERPs to dynamic gaze transitions to both real and line-drawn dynamic face images, and scrambled controls within the same experiment (using an experimental structure similar to that of Puce et al., 2003). We performed a standard ERP peak analysis, focusing on N170, and reasoned that if stimulus-context effects were driving N170 modulation, we would expect to observe larger N170s to gaze aversion vs. direct gaze for both real and line-drawn faces. In contrast, if the N170 effect was driven by low-level stimulus features only in the eye stimuli (e.g., high iris/sclera contrast in the real images of faces), then the N170 effect would be seen only to dynamic images using real faces and not to line-drawn face images. This N170 modulation would not be predicted to occur for real control stimuli, in line with our previous studies. Finally, if the N170 effect was driven by a general low-level effect of local stimulus contrast change (occurring in both face and control stimuli), then we might expect to observe larger N170s to the real faces and their respective controls, relative to the line-drawn stimuli.

As well as examining averaged ERP activity, we also investigated oscillatory electroencephalographic (EEG) behavior post-motion onset to all stimulus types at electrode sites generating maximal N170 activity in a frequency range of 5–50 Hz. Previous studies evaluating facial motion effects have focused exclusively on averaged ERPs, which represent linearly summed EEG trials that are phase-locked to a relevant event (e.g., motion onset), and that are independent of ongoing EEG activity (Jervis et al., 1983). It has been proposed that the transient phase-resetting of ongoing oscillatory EEG activity underlies ERP generation (Brandt and Jansen, 1991). However, oscillatory EEG activity that is not phase-locked can also occur to a stimulus event, and will not be seen in an averaged ERP (Makeig et al., 2004). Oscillatory EEG activity expressed both as a function of EEG frequency and time relative to stimulus onset and/or execution of motor response can be identified using time-frequency decomposition of EEG signals, and displayed as Event-Related Spectral Perturbation (ERSP) plots (Makeig et al., 2004). Changes in a given EEG frequency band can occur from more than one process or underlying mechanism (e.g., see Sedley and Cunningham, 2013). Modulation of alpha band (8–12 Hz) power has been linked to changes in attentional state (Worden et al., 2000; Sauseng et al., 2005; Thut et al., 2006; Fries et al., 2008), and performance on visual perception tasks (Ergenoglu et al., 2004; Babiloni et al., 2006; Thut et al., 2006). Alpha may act as an inhibitory brain signal (Klimesch, 2012), which might enable timing of processing, and gated access to knowledge, and orientation in time, place, and context (Basar et al., 1997; Palva and Palva, 2007; Klimesch, 2012). Increases in beta band power (12–30 Hz) may reflect maintenance of current behaviorally relevant sensorimotor or cognitive states (Engel and Fries, 2010), whereas gamma band power (>30 Hz) increases may facilitate cortical processing, cognitive control and perceptual awareness (Ray and Cole, 1985; Tallon-Baudry and Bertrand, 1999; Grossmann et al., 2007; Engel and Fries, 2010; but see Sedley and Cunningham, 2013).

Ideally, ERSP and ERP analyses performed in parallel could more completely characterize neural activity to different task demands and conditions. However, the relationship between oscillatory EEG activity and ERP activity is complex (see Rossi et al., 2014) and is not typically studied. Our previous comparisons of ERP and ERSP activity to viewed dynamic eye and mouth movements in dynamic line-drawn faces showed statistically significant differences for apparent motion in the beta and gamma bands between facial motion conditions, which differed in timing and frequency content relative to control motion stimuli (Rossi et al., 2014). Given our previous study, here we expected to observe oscillatory EEG changes in beta and gamma bands that would occur at different post-motion onset times for facial and control motion stimuli.

## Materials and Methods

## Participants

Seventeen healthy participants provided written informed consent to participate in the study. All participants had normal or corrected-to-normal vision, and were free of a history of neuropsychiatric disorders. The study protocol was approved by the Institutional Review Board at Indiana University, Bloomington (IRB 1202007935).

High-density (256 channel) EEG and behavioral data were collected from all participants, and data from 1 individual had to be excluded from further analysis due to a large amount of artifactual EEG contamination from facial/neck muscle activity, as well as line noise. Hence, data from 16 participants (7 males, 9 females) with an average age of 26 years (range 21–34 years) were submitted for analysis. The 16 participants were right-handed, as assessed by the Edinburgh Handedness Inventory (mean: R64.6, SD: 19) (Oldfield, 1971).

## Stimuli

Participants viewed four different types of visual displays alternating between natural images of facial motion and respective motion controls, as well as motion of line-drawn faces and their respective motion controls. The real face, with eyes averting and looking directly at the observer, had a respective motion control that consisted of a colored checkerboard with checks moving towards the left and right in the same visuospatial position as the eyes in the real face (similar to that used in Puce et al., 1998). A line-drawn face, with eyes averting and looking at the observer, had a respective motion control in which line segments in the scrambled stimulus moved with a similar spatial excursion to the eyes, with the same number of pixels contributing to the motion (**Figure 1**).

## Stimulus Creation

Real faces had been originally created from still 8-bit color photographs of posed direct and extreme averted (30 degree) gaze positions in both left and right directions. The stimulus face was superimposed on a background of concentric grayscale circles of different luminance. The images were originally created to be presented sequentially to depict dynamic gaze transitions, and mouth motion (see Puce et al., 1998). The corresponding non-facial motion controls consisted of a colored checkerboard pattern that was constructed from hues taken from the original head. Separate corresponding control stimuli were created so that sequential presentation resulted in checks alternating their position in the same regions of the visual field as the eyes in the real face (Puce et al., 1998), and to ensure that subjects did not visualize a "face" in the dynamic control stimulus.

White line-drawn faces on a black background had been originally created from a multimarker recording of facial expressions using specialized biological motion creation software from which lines were generated between some of the point lights [Elite Motion Analysis System (BTS, Milan, Italy)]. The black and white control stimuli had originally been created by extracting line segments from the line-drawn face and spatially re-arranging them in the visual space in an earlier version of Photoshop (Adobe Systems, Inc.), so that the face was no longer recognizable (Puce et al., 2003). The existing line-drawn faces were modified for Rossi et al. (2014) in Photoshop CS5 (Adobe Systems, Inc.) by adding a schematic iris to the face which when spatially displaced could signal a gaze change on the stimulus face. A direct gaze consisted of a diamond-shaped schematic pupil positioned in the center of each schematic eye. Averted gaze consisted of an arrow-shaped schematic pupil that was moved to the extremity of the schematic eye (**Figure 1**). Thus, by toggling the two schematic eye conditions, observers reported seeing a convincing "direct" vs. "averted" gaze transition in the line-drawn face. Similarly, line-drawn control stimuli were created, using a rearranged "scramble" of the lines making up the eye movements on the face stimuli, ensuring that all stimuli presented would be equiluminant, and have similar motion excursions, as well as contrast and spatial frequency characteristics (**Figure 1A**). On debriefing post-experiment, subjects did not report seeing "eye" stimuli in the line-drawn control stimulus.

For all stimulus types, the effect of smooth movement was generated and no side-switch transition was possible (e.g., eyes looking to the right followed by eyes looking to the left). Negative-contrast versions (inverse colors) of all the stimulus versions were constructed to be used as infrequently presented targets (**Figure 1B**).

#### Procedure

Participants viewed the stimuli displayed on a 24-inch monitor (Dell Ultra Sharp U2412M, refresh rate of 60 Hz) resulting in an overall visual angle of 5 × 3 (vertical × horizontal) degrees. Participants completed four experimental runs in total; each run lasted approximately 6 min to allow participants to remain still for the EEG recording and maintain their level of alertness. After each run, participants had a self-paced break.

All stimulus types were always presented in each experimental run, with a run consisting of the repeated presentation of the following alternating 20 s stimulus blocks (**Figure 1A**; following the procedure used in Puce et al., 2003):


4. LINE CONTROL. The spatially "scrambled" versions of the LINE FACE were alternated to produce apparent motion in the same part of the visual field as the eyes in the LINE FACE condition, similar to that used in Rossi et al. (2014).

Stimulus onset asynchrony was randomly varied between 1000 and 1500 ms on each experimental trial (i.e., between two consecutive apparent motion onsets). A total of 210 trials were acquired per stimulus type (LINE FACE, LINE CONTROL, REAL FACE, REAL CONTROL).

The experiment was run using Presentation Version 14 (NeuroBehavioral Systems, 2010). Participant reaction times and accuracy were logged, and time stamps for different stimulus types (as well as button press responses for detected target stimuli) for each trial were automatically sent to the EEG system and stored in the EEG file.

Participants were instructed to press a button indicating the presence of a target stimulus. Target stimuli were negative-contrast versions of all stimuli used in the experiment (**Figure 1B**). Targets were randomly assigned to each alternating block (20% of trials). Trials with targets were not included in ERP/EEG analyses. Similarly, so as to remove potential confounds created by changes in stimulus type (i.e., for the first stimulus of each block, as well as for stimuli immediately following targets), trials following a target and the first stimuli of each block were not included in ERP/EEG analyses. The purpose of the target detection task was to keep participants attentive. All participants completed a short practice run (36 trials) at the beginning of the session and were given feedback regarding detection of target stimuli. All participants completed the practice run with 100% accuracy. EEG trials from the practice run were not included in subsequent analyses.

### EEG Data Acquisition and Preprocessing

## EEG Data Acquisition

A Net Amps 300 high-impedance EEG amplifier and NetStation software (V4.4) were used to record EEG from a 256-electrode HydroCel Geodesic Sensor Net (Electrical Geodesics Inc.) while the participant sat in a comfortable chair and performed the task in a dimly lit, humidified room. Continuous 256 channel EEG data were recorded with respect to a vertex reference using a sampling rate of 500 Hz and bandpass filter of 0.1–200 Hz (the ground electrode was sited on the midline parietal scalp). Stimulus delivery and subject behavioral responses were time-stamped onto all EEG files. Impedances were maintained below 60 kΩ as per the manufacturer's recommended guidelines. Impedances were tested at the beginning of the experimental session and then once more at the half-way point of the experiment, allowing any high-impedance electrode contacts to be adjusted if necessary.

## EEG Data Preprocessing

EEG data were first exported from EGI Net Station software as simple binary files. The same pre-processing procedure was applied to the ERP and ERSP analyses. All EEG pre-processing procedures were performed using functions from the EEGLAB toolbox (Delorme and Makeig, 2004) running under MATLAB R2010b (The Mathworks, Natick, MA). EEG data were first segmented into 1700 ms epochs: 572 ms pre-stimulus baseline and 1082 ms after apparent motion onset. EEG amplitude at each trial was normalized relative to the pre-stimulus baseline based on the event-markers, identifying each trial type. ERP data were displayed using a 200 ms pre-motion onset and 600 ms after the motion transition (see **Figure 3**). [A manufacturer-specified latency correction factor was applied to all behavioral data and epoched ERP data. In our case, given a sampling rate of 500 Hz, a correction of 18 ms was made, as per manufacturer guidelines].
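The epoching and baseline step described above can be illustrated with a minimal Python sketch. This is not the authors' EEGLAB code: the function names and the toy recording are hypothetical, and the baseline correction is assumed to be a simple subtraction of the mean pre-stimulus amplitude.

```python
SAMPLING_RATE_HZ = 500  # sampling rate stated in the text


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * rate_hz / 1000)


def extract_epoch(signal, onset_index, pre_ms, post_ms):
    """Cut one epoch around an event and subtract the pre-stimulus mean."""
    pre = ms_to_samples(pre_ms)
    post = ms_to_samples(post_ms)
    start, stop = onset_index - pre, onset_index + post
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[start:stop]
    baseline = sum(epoch[:pre]) / pre  # mean of the pre-stimulus samples
    return [value - baseline for value in epoch]


# Toy single-channel recording (hypothetical): an offset of 5 that steps to 7 at the event.
onset = 1000
recording = [5.0] * onset + [7.0] * 1000
epoch = extract_epoch(recording, onset, pre_ms=572, post_ms=1082)
```

After correction the pre-stimulus samples of the toy epoch sit at zero, so the post-onset step is expressed relative to baseline rather than to the arbitrary recording offset.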

EEG epochs were first visually inspected to identify and exclude bad channels from each individual subject EEG dataset. The electrodes identified as bad differed between subjects; average number of "bad" electrodes was 22 ± 2.15 (standard error of mean) out of 256 channels. Epochs with very large artifacts (e.g., very large subject movements and channel drifts) were manually rejected prior to subjecting the EEG data to subsequent artifact detection analyses.

Independent Component Analysis (ICA) was used to identify and subtract components representing artifacts such as eye movements, eye blinks, carotid pulse, muscle activity and line-noise (Bell and Sejnowski, 1995; Delorme and Makeig, 2004). This allowed trials with eyeblinks to be adequately corrected, and allowed these trials to be included in the analysis. A total of 32 ICA components were generated for each participant's EEG dataset. Eyeblinks, cardiac artifact and muscle activity were identified in isolated ICA components. Following removal of artifactual ICA components and reconstitution of the EEG signal, interpolation of bad channels was performed to regenerate a 256-channel EEG dataset. Bad channels were interpolated using a spherical interpolation: electrical activity was interpolated with respect to the surrounding nearest neighbor electrodes.

Data were re-referenced to a common average reference. ERP components such as the N170 and the vertex positive potential (VPP) amplitude have previously been shown to be very sensitive to reference location (Joyce and Rossion, 2005). The average reference has been suggested as being optimal as it captures finer hemispheric differences and shows the most symmetry between positive and negative ERP peaks for face-related stimuli (Joyce and Rossion, 2005). Only behaviorally correct EEG trials, i.e., no false alarms for targets, were included in subsequent analyses.
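Re-referencing to a common average amounts to subtracting, at every time point, the mean voltage across channels. The following Python sketch shows the arithmetic on a toy array. The function name and data are hypothetical, and the sketch deliberately ignores details such as adding the original reference channel back before averaging.

```python
def rereference_to_average(samples_by_channel):
    """Subtract the across-channel mean from every channel at each time point.

    samples_by_channel: list of channels, each a list of voltages over time.
    """
    n_channels = len(samples_by_channel)
    n_times = len(samples_by_channel[0])
    rereferenced = [[0.0] * n_times for _ in range(n_channels)]
    for t in range(n_times):
        mean_at_t = sum(channel[t] for channel in samples_by_channel) / n_channels
        for c in range(n_channels):
            rereferenced[c][t] = samples_by_channel[c][t] - mean_at_t
    return rereferenced


# Three toy channels, two time points each (hypothetical values).
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
avg_ref = rereference_to_average(data)
```

By construction the re-referenced channels sum to zero at each time point, which is why positive and negative ERP peaks appear more symmetric under this reference.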

## EEG/ERP Data Analyses

Two temporo-occipital 9 electrode clusters including equivalent 10–10 system sites PO7/P9 and PO8/P10 were chosen for further analyses, based on inspection of the grand averaged data from the current study and previously reported maxima in N170 amplitudes that used 4 electrode clusters for 64- and 128-channel EEG derivations, and P09 and P10 for smaller electrode arrays of 10–10 system sites (Puce et al., 2000, 2003; Carrick et al., 2007; Brefczynski-Lewis et al., 2011; Rossi et al., 2014; **Figures 2**, **3**). Averaged data from the 9 electrodes in each hemispheric cluster were used in all subsequent ERP analyses. Similarly, single-trial EEG data recorded from the same 9 electrodes in each hemispheric cluster were used for ERSP analysis.

### Analysis of Event-Related Potentials

A digital 40 Hz infinite impulse response (IIR) low-pass filter was applied to the artifact-free behaviorally correct EEG data. Average ERPs were generated for each of the eight conditions and for each subject (about 200 trials per condition per subject on average). The ERPs from all subjects were averaged to generate a grand-average set of ERP waveforms for each condition and EEG channel.

Data from each 9 electrode temporo-occipital cluster were extracted and the average time-course for each electrode cluster was generated for each subject and condition, and was subsequently used for calculating N170 amplitude and latency. In line with previous work (Puce et al., 2003), we focused on the N170 as a neural marker of the perception of facial motion. In our data, consistent with previous studies, the N170 showed a lateralized posterior scalp distribution (**Figure 2**). Using an automated peak detection procedure within a search time window of 150–250 ms after apparent motion onset, N170 peak amplitudes and latencies were extracted for each condition, each subject, and each electrode cluster independently.
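A windowed peak search of this kind can be sketched in a few lines of Python. The text does not describe the authors' detection routine, so the version below simply assumes that the N170 peak is the most negative sample within the 150–250 ms window; the function name and the toy waveform are hypothetical.

```python
def find_negative_peak(waveform_uv, times_ms, window_ms=(150, 250)):
    """Return (amplitude, latency) of the most negative sample in the window."""
    candidates = [
        (amplitude, time)
        for amplitude, time in zip(waveform_uv, times_ms)
        if window_ms[0] <= time <= window_ms[1]
    ]
    if not candidates:
        raise ValueError("no samples fall inside the search window")
    return min(candidates)  # tuples compare by amplitude first


# Time axis in 2 ms steps (500 Hz), from -200 ms to 598 ms.
times = list(range(-200, 600, 2))
# Toy ERP (hypothetical): flat except for a triangular negative deflection centred on 180 ms.
erp = [-max(0.0, 4.0 - abs(t - 180) / 10.0) for t in times]
amplitude, latency = find_negative_peak(erp, times)
```

In practice this would be run once per condition, subject, and electrode cluster, yielding the amplitude and latency values that enter the ANOVA.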

#### Analysis of Event-Related Spectral Perturbations


All analyses were performed using custom in-house routines written using the EEGLAB toolbox (Delorme and Makeig, 2004) running under MATLAB. Artifact-free, behaviorally correct EEG segments were convolved with a linearly increasing Morlet wavelet on a trial-by-trial basis for each condition and subject. Specifically, the length of the wavelet increased linearly from 1 to 12 cycles across the frequency range of 5–50 Hz (theta, alpha, beta, and low-gamma). The linear increment of wavelet cycles is a commonly used practice when calculating spectral components in neurophysiological data, so that temporal resolution can be comparable for lower and higher EEG frequencies (Le Van Quyen et al., 2001) (for a detailed account on spectral analyses of EEG see Herrmann et al., 2005). After the EEG signals in each trial were convolved with a Morlet wavelet, they were transformed into power, and the resulting values were then averaged across trials. We analyzed the spectral power of components in the theta (5–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and low-gamma (30–50 Hz) EEG frequency bands as they evolved over the post-movement epoch. In order to extract event-related spectral power from raw power, a standard baseline procedure was applied on a trial-by-trial basis (Grandchamp and Delorme, 2011). The window used as baseline comprised data points in the −200 to 0 ms pre-stimulus range.
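To make the linear cycle schedule concrete, the following Python sketch interpolates the number of wavelet cycles between 1 (at 5 Hz) and 12 (at 50 Hz), builds a complex Morlet wavelet, and computes single-trial power at one frequency. It is an illustration under stated assumptions, not the authors' EEGLAB routine: the function names are hypothetical, the Gaussian width is assumed to follow the common convention σ_t = cycles / (2πf), and the wavelet is truncated at ±3 SD.

```python
import cmath
import math


def cycles_for_frequency(freq_hz, f_min=5.0, f_max=50.0, c_min=1.0, c_max=12.0):
    """Linearly interpolate the number of wavelet cycles between f_min and f_max."""
    fraction = (freq_hz - f_min) / (f_max - f_min)
    return c_min + fraction * (c_max - c_min)


def morlet_wavelet(freq_hz, n_cycles, rate_hz):
    """Complex Morlet wavelet: a complex sinusoid under a Gaussian envelope."""
    sigma_t = n_cycles / (2.0 * math.pi * freq_hz)  # envelope SD in seconds (assumed convention)
    half_len = int(math.ceil(3.0 * sigma_t * rate_hz))  # cover +/- 3 SD
    wavelet = []
    for k in range(-half_len, half_len + 1):
        t = k / rate_hz
        envelope = math.exp(-t * t / (2.0 * sigma_t * sigma_t))
        wavelet.append(envelope * cmath.exp(2j * math.pi * freq_hz * t))
    norm = math.sqrt(sum(abs(w) ** 2 for w in wavelet))  # unit-energy normalization
    return [w / norm for w in wavelet]


def wavelet_power(signal, freq_hz, rate_hz):
    """Power over time at one frequency; samples beyond the edges count as zero."""
    wavelet = morlet_wavelet(freq_hz, cycles_for_frequency(freq_hz), rate_hz)
    half_len = len(wavelet) // 2
    power = []
    for i in range(len(signal)):
        acc = 0j
        for k, w in enumerate(wavelet):
            j = i + k - half_len
            if 0 <= j < len(signal):
                acc += signal[j] * w.conjugate()
        power.append(abs(acc) ** 2)
    return power


# Toy trial (hypothetical): one second of a 20 Hz sine sampled at 500 Hz.
rate = 500
trial = [math.sin(2.0 * math.pi * 20.0 * n / rate) for n in range(rate)]
power_20 = wavelet_power(trial, 20.0, rate)
power_40 = wavelet_power(trial, 40.0, rate)
```

For the toy 20 Hz trial, power at the matching frequency is far larger than at 40 Hz in the middle of the epoch. Averaging such single-trial power across trials gives the total (evoked plus induced) power discussed in the text.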

Induced activity is defined as EEG activity that is elicited to the stimulus, but may not be precisely time- or phase-locked to the stimulus transition (in this case apparent motion onset). However, each individual EEG epoch will also contain evoked activity, hence a calculation of "total power" (i.e., sum of evoked and induced activity) in each frequency band was made for EEG epochs in our study (see Tallon-Baudry et al., 1996).

As we have previously noted differences between facial motion stimulus type for eye and mouth movements (Rossi et al., 2014), we performed a similar analysis and generated differential ERSP plots between pairs of conditions: LINE CONTROL Direct vs. Away, LINE FACE Direct vs. Away, REAL CONTROL Direct vs. Away and REAL FACE Direct vs. Away.

## Statistical Testing for Significant Differences

## **ERP peak analysis**

Differences in temporo-occipital N170 peak amplitude and latency were evaluated using a 4-way repeated-measures ANOVA with Hemisphere (Left, Right), Configuration (Face, Control), Stimulus Type (Real, Line) and Motion (Away, Toward) as within-subjects factors using SPSS for MAC 18.0 (SPSS Inc.). Significant main effects were identified at P values of less than 0.05 (after Greenhouse-Geisser correction). Contrasts were evaluated using the Bonferroni criterion to correct for multiple comparisons with P values of less than 0.05 identifying significant effects.

Furthermore, to assess the specificity of the effect to REAL FACE stimuli, we performed paired t-tests between the motion conditions for each stimulus condition. Four t-tests were performed (i.e., REAL FACE AWAY × REAL FACE DIRECT, REAL CONTROL AWAY × REAL CONTROL DIRECT, LINE FACE AWAY × LINE FACE DIRECT, LINE CONTROL AWAY × LINE CONTROL DIRECT). The level of statistical significance (a priori two-tailed) was set at p < 0.05.
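For readers less familiar with the paired test, the statistic for each of the four comparisons can be sketched as follows. This is a generic textbook implementation in Python, not the SPSS procedure used in the study, and the amplitude values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Hypothetical N170 amplitudes (microvolts) for four subjects; not real data.
away = [-4.0, -5.0, -6.0, -5.0]
direct = [-3.0, -3.0, -5.0, -3.0]
t_value, dof = paired_t_statistic(away, direct)
```

The resulting t value is then compared against the two-tailed critical value of Student's t-distribution for the given degrees of freedom (15 in a sample of 16 participants).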

## **ERSP analysis**

To measure the complete temporal extent of effects over frequency, we used a bootstrap approach (N = 1000 bootstraps), identifying time-frequency data points of statistically significant differences based on data-driven 95% confidence intervals (as described and implemented in Pernet et al., 2011) from the data of the two 9-electrode clusters. Non-parametric permutation was used to estimate the distribution under the null hypothesis of no differences in oscillatory amplitude between the pair of conditions.

Due to our paired design, when a subject was selected randomly, results from all his or her conditions were included in that sample. For each condition, we averaged the data across (resampled) participants and computed differences between conditions. Thus, for each one of the observed mean differences between conditions for a given frequency at each time-point, a t-statistic was calculated. At this stage, time points were thresholded: a time point was retained if its t-statistic corresponded to a p value below 0.05 according to the Student's t-distribution. This procedure was repeated for all ERSP time-points at each frequency. Temporally contiguous suprathreshold time points were grouped into temporal clusters. At each bootstrap iteration, the temporal cluster mass was computed as the sum of the t-statistics over consecutively significant time-points, with the maximum cluster mass being recorded. Finally, temporal clusters in the observed data were deemed significant if their mass exceeded the maximum cluster mass of 95% of all bootstrap replicates (corresponding to a significance level of 0.05). This method allowed correction for multiple comparisons. Thus, the cluster mass statistic identified temporal regions with significant differences while avoiding false-positives arising from multiple comparisons (Pernet et al., 2011). This approach is comparable to false-discovery rate (FDR) correction (Benjamini and Hochberg, 1995).
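The cluster-mass logic for a single frequency can be sketched in Python as follows. This is an illustrative simplification rather than the implementation of Pernet et al. (2011): the function names are hypothetical, the null distribution is obtained by centering the paired differences on zero before resampling whole subjects (an assumption), clusters are formed on absolute t values regardless of sign (also an assumption), and the critical t of 2.131 presumes 16 participants.

```python
import math
import random

T_CRIT = 2.131  # two-tailed p < 0.05 for df = 15; assumes 16 participants

def t_series(diffs):
    """One-sample t at each time point; diffs[s][t] is subject s's condition difference."""
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [d[t] for d in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def clusters(tvals, crit=T_CRIT):
    """Group contiguous supra-threshold points; return (start, stop, mass) tuples."""
    found, start = [], None
    for i, t in enumerate(tvals + [0.0]):          # sentinel closes a trailing cluster
        if abs(t) > crit and start is None:
            start = i
        elif abs(t) <= crit and start is not None:
            found.append((start, i, sum(abs(v) for v in tvals[start:i])))
            start = None
    return found

def cluster_mass_test(diffs, n_boot=1000, alpha=0.05, seed=0):
    """Return observed clusters whose mass exceeds the bootstrap max-mass threshold."""
    rng = random.Random(seed)
    n, n_t = len(diffs), len(diffs[0])
    observed = clusters(t_series(diffs))
    means = [sum(d[t] for d in diffs) / n for t in range(n_t)]
    centred = [[d[t] - means[t] for t in range(n_t)] for d in diffs]  # impose the null
    max_mass = []
    for _ in range(n_boot):
        # Resample whole subjects so each subject's time course stays intact
        sample = [centred[rng.randrange(n)] for _ in range(n)]
        masses = [m for _, _, m in clusters(t_series(sample))]
        max_mass.append(max(masses) if masses else 0.0)
    max_mass.sort()
    threshold = max_mass[int(math.ceil((1 - alpha) * n_boot)) - 1]
    return [(a, b, m) for a, b, m in observed if m > threshold], threshold
```

Recording only the maximum cluster mass in each bootstrap replicate is what provides control over multiple comparisons across time points: an observed cluster must exceed what the largest chance cluster would reach in 95% of replicates.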

## Results

## Behavioral Data

Participants identified the target stimulus (image negative) with 99% accuracy by button press. Mean reaction time for target detection by stimulus type was 557 ± 94 ms (s.d.) for REAL faces, 577 ± 112 ms for REAL controls, 607 ± 113 ms for LINE faces and 560 ± 92 ms for LINE controls. A 2-way (Configuration × Stimulus Type) repeated-measures ANOVA showed a significant main effect of Stimulus Type (F(1,15) = 11.067, P < 0.001) and an interaction effect of Configuration by Stimulus (F(1,15) = 26.24, P < 0.001).

For the significant main effect of Stimulus Type, REAL stimuli (real faces and real controls) generated faster responses relative to LINE stimuli (line-drawn faces and line-drawn scrambled controls, mean difference: 30 ± 9 ms). The significant interaction effect of Configuration by Stimulus Type indicated that REAL faces generated a faster response compared to REAL controls (mean difference: 47 ± 11 ms), while the opposite was seen for line-drawn stimuli, where line-drawn controls were identified faster (mean difference: 47 ± 13 ms).

The current behavioral task was used to help participants pay attention to the display. EEG epochs from these target trials were not included in subsequent analyses.

## Peak Analysis of the N170 ERP

N170 amplitudes and latencies were extracted from each of the two temporo-occipital scalp electrode clusters for each participant and condition for subsequent statistical testing. N170 was maximal over the temporo-occipital scalp, as demonstrated by the topographic voltage maps (**Figure 2**) plotted at the time point at which the N170 was maximal in amplitude. N170 was elicited in all stimulus conditions (**Figure 3**).

N170 latency and amplitude data for each condition and hemisphere are shown in **Table 1**. A 4-way repeated-measures ANOVA for N170 peak amplitude differences revealed a significant main effect for hemisphere (F(1,15) = 11.265, P < 0.001) and stimulus type (Real vs. Line; F(1,15) = 46.289, P < 0.001). The main effects for configuration (Face, Control) and motion (Away, Direct) were not significant. A significant interaction effect was observed between stimulus type and motion (F(1,15) = 5.143, P < 0.05).

For the significant main effect of hemisphere, post hoc paired comparisons revealed that N170 amplitude was greater for the right hemisphere relative to the left hemisphere (mean difference: 0.66 ± 0.26 µV). The main effect of stimulus type showed that N170 amplitude was greater for the line-drawn stimuli relative to the real stimuli (mean difference: 0.55 ± 0.08 µV, **Figure 3**). Post hoc comparisons for the interaction effect between stimulus type and motion revealed that among the REAL stimuli (i.e., real faces and real controls) the N170 for averted gaze was significantly larger than that to direct gaze (mean difference: 0.29 ± 0.13 µV) (**Figure 3**); an effect not seen for line-drawn stimuli.

TABLE 1 | Group N170 peak amplitude (µV) and latency (ms) data: Mean and Standard Errors (Std) as a function of hemisphere (Hem) and Condition.

Legend: Ampl = amplitude; Lat = latency.

One might expect that our 4-way ANOVA would reveal a 3-way interaction between stimulus type, configuration, and motion. This was not the case. When we compare our current data to those of Puce et al. (2000), we note that the authors also did not find interaction effects on N170 amplitude as assessed by means of 3-way ANOVAs performed at isolated hemispherically homologous electrode sites in a study that was performed using only 22 EEG electrodes. To investigate potential differences and similarities between the two studies, we further explored our current high-density EEG data by running paired t-tests on N170 amplitude for the Away and Direct motion transition for each stimulus type and configuration in the right occipitotemporal cluster (given that the main differences in the original study were reported in the right hemisphere). These analyses indicated that N170 amplitude was significantly larger for Away relative to Direct for REAL FACES (t(15) = −2.229, P = 0.04, mean difference = 0.46 µV), consistent with the difference reported in Puce et al. (2000). In contrast, N170 amplitudes for LINE faces were not significantly different [LINE FACE Away relative to LINE FACE Direct (t(15) = 0.411, P = 0.69, mean difference = 0.08 µV)]. Similar comparisons across the control conditions were also not significant: REAL controls showed no differences between conditions [REAL CONTROL Away relative to Direct (t(15) = −0.85, P = 0.41, mean difference = −0.11 µV)] and LINE controls also showed no significant differences in N170 amplitudes [LINE CONTROL Away relative to Direct (t(15) = 0.241, P = 0.12, mean difference = 1.6 µV)].

The 4-way repeated-measures ANOVA for N170 latency revealed a significant main effect of Stimulus type (F(1,15) = 39.49, P < 0.001) and Configuration (F(1,15) = 7.773, P < 0.05). No other statistically significant main effects of hemisphere or motion, or interaction effects were observed for N170 latency. For the significant main effect of Stimulus type, post hoc comparisons indicated that the effect might have been driven by the shorter latencies to REAL faces compared to LINE faces (mean difference: 26 ± 4 ms, **Figure 3**). For the significant main effect of Configuration, post hoc comparisons suggested that N170s for CONTROL (both REAL and LINE) were shorter compared to FACE stimuli (mean difference: 9 ± 3 ms, **Figure 3**). To statistically evaluate these differences, four paired t-tests were performed (see Section Materials and Methods). None of these latency differences were found to be statistically significant [REAL FACE Away relative to REAL FACE Direct (t(15) = 0.979, P = 0.343, mean difference = 6.06 ms); LINE FACE Away relative to LINE FACE Direct (t(15) = 1.563, P = 0.139, mean difference = 6.78 ms); REAL CONTROL Away relative to REAL CONTROL Direct (t(15) = 1.140, P = 0.272, mean difference = 4.31 ms); LINE CONTROL Away relative to LINE CONTROL Direct (t(15) = 0.457, P = 0.654, mean difference = 2.96 ms)].

### Temporal Dynamics: ERSP Plots

ERSP plots demonstrated clear activity in selected EEG bands in all conditions in the post-motion onset period (**Figure 4**). The activity profile was similar across all conditions, including activity spanning over multiple time points and frequency bands. A common feature across all conditions was a prolonged burst of activity in the theta (5–8 Hz) and alpha (8–12 Hz) frequency bands in both electrode clusters extending from ∼100–400 ms. Moreover, a consistent decrease in amplitude in the beta range (12–30 Hz) was also seen for most conditions extending from ∼150–400 ms. An additional feature in the ERSP plots was activity in the low-gamma band (30–50 Hz) peaking at around 200 ms after apparent motion onset for most conditions (**Figure 4**).

FIGURE 4 | ERSP plots as a function of condition and hemisphere. Left (L) and right (R) occipitotemporal data are presented in left and right columns, respectively. In each four-part display panel total ERSP activity is shown for each respective stimulus condition. The y-axis displays frequency (Hz) and the x-axis displays time (ms). Power of ERSP activity in decibels (a default unit used in analysis packages such as EEGLAB; Delorme and Makeig, 2004) is depicted by the color calibration bar at the right of the figure. The vertical broken line at time zero indicates the apparent motion stimulus onset.

Statistically significant differences between conditions were seen only in the beta (12–30 Hz) and low-gamma (30–50 Hz) frequency bands. **Figure 5** displays masked time-frequency statistically significant differences between facial motion transitions and control motion transitions. We discuss these differences for each stimulus type below.

#### Line Face

ERSP comparisons between gaze transitions for LINE FACE stimuli produced statistically significant differences in beta and gamma bands in the interval of 150–300 ms post-motion onset in both electrode clusters (**Figure 5**). Larger bilateral gamma amplitudes (∼30–∼40 Hz) were seen to direct gaze transitions (LINE FACE Direct) peaking at around 200 ms relative to gaze aversions (LINE FACE Away) (**Figure 5**, first row). For the opposite side of the contrast, larger bilateral gamma activity at ∼40 Hz occurred at a later point in time (at ∼300 ms for LINE FACE Away vs. Direct, **Figure 5**, first row). Significant differences in beta activity were only observed in the left temporo-occipital electrode cluster, with a relatively larger decrement in beta amplitude for Away relative to Direct.

FIGURE 5 | Group data: Statistically significant ERSP plot differences between stimulus conditions as a function of hemisphere. Left (L) and right (R) occipitotemporal data are presented in left and right columns, respectively. LINE FACE, REAL FACE, LINE CONTROL and REAL CONTROL difference plots appear from top to bottom panels, respectively. For LINE FACE, LINE FACE Away was subtracted from LINE FACE Direct, and for REAL FACE, REAL FACE Away was subtracted from REAL FACE Direct. For LINE CONTROL, the ERSP plot from LINE CONTROL Away has been subtracted from LINE CONTROL Direct. For REAL CONTROL, REAL CONTROL Away was subtracted from REAL CONTROL Direct. Frequency (Hz) is displayed on the y-axis as a function of time (ms). The direction of the difference in spectral power is depicted by the color calibration bar at the right of the figure. Warm colors depict increased power for condition 1 (Direct) whilst cool colors indicate increased power for condition 2 (Away). Gray areas in the plot indicate regions where the differences between conditions were not significant. The vertical broken line at time zero indicates the apparent motion stimulus onset.

#### Real Face

REAL FACE stimuli elicited divergent significant differences across hemispheres (as shown in **Figure 5**, second row) relative to LINE FACES. In the left electrode cluster, statistically significant differences occurred at similar times and were confined to the same frequency range for REAL and LINE face stimuli. Direct gaze changes in REAL FACES elicited larger gamma amplitudes peaking at ∼200 ms (∼30–45 Hz range) relative to gaze aversions, while Away gaze changes elicited stronger gamma power at ∼300 ms (∼40 Hz), and at ∼500 ms (∼35–45 Hz range). In the right electrode cluster, a later biphasic difference in gamma amplitude consisted of initial augmentation and then suppression of activity for direct relative to averted gaze. These effects occurred for frequencies ∼40 and ∼50 Hz and peaked between 400 ms (**Figure 5**, second row). Unlike in the left hemisphere, gamma effects for REAL FACES in the right cluster occurred later in time relative to LINE FACES.

#### Line Control

Unlike for the face stimuli, significant differences for LINE CONTROL stimuli were effectively confined to the beta band and consisted of brief bilateral periods between ∼300 and ∼400 ms after movement onset (**Figure 5**, third row). This difference was driven by stronger beta suppression in the Away condition in both hemispheres.

#### Real Control

REAL CONTROL stimuli generated a much more diverse pattern of differences extending between ∼100 and ∼600 ms after movement onset in beta and gamma activity in the left hemisphere (**Figure 5**, fourth row) relative to LINE CONTROL stimuli. First, significantly stronger activity for the REAL CONTROL Away condition consisted of an early beta component peaking right before 200 ms (∼20–25 Hz), and a gamma component at ∼200 ms (∼45–50 Hz). In this same time range, gamma at ∼35–40 Hz between ∼200–300 ms was significantly larger for the Direct condition. Later in time, a gamma burst extending from ∼40–50 Hz occurred between ∼500–600 ms and was stronger for the Away condition, while a lower frequency gamma component extending between ∼35–40 Hz was significantly stronger between ∼550–600 ms for the Direct condition (**Figure 5**, fourth row). Unlike the left electrode cluster, the right cluster showed more limited significant differences in oscillatory activity between conditions. Specifically, a very early gamma band response peaked at ∼100 ms, being stronger for the Direct condition, and a later higher frequency gamma component at ∼45 Hz peaking at ∼500 ms was stronger for the Away condition.

## Discussion

## ERP Data: N170 Effects

Our main purpose for the experiment was to look for stimulus-induced context effects that might produce modulations of the N170 by gaze transition in line-drawn faces, when presented with real images of faces in the same experiment. For a stimulus context effect to be present, we would expect to observe parallel effects in the form of larger N170s to gaze aversions vs. direct gaze for both real and line-drawn faces. Given our experimental design, this would translate to a significant interaction effect of Motion [Away, Direct] × Config [Face, Control]. If, however, N170 modulation occurred only to gaze changes in real faces, then we would expect to see a significant interaction effect of Motion [Away, Direct] × Config [Face, Control] × Stimulus [Real, Line]. Finally, if N170 modulation was driven by general low-level effects of stimulus luminance and contrast, then we might expect to observe a significant main effect of Stimulus type [Real, Line].

Interestingly, our analysis generated effects that were more complex than predicted for N170 amplitude. We observed a significant interaction effect of Motion [Away, Direct] × Stim [Real, Line], which was not what we had predicted. The nature of these differences was clarified with paired t-tests, which indicated that N170 was larger for averted vs. direct gaze only for real faces, consistent with our previous study (Puce et al., 2000). Also consistent with our previous work (Rossi et al., 2014), there was no effect of gaze aversion on N170 in our line-drawn face stimuli (or control stimuli). Having said that, there were other striking differences in the current dataset that resulted from our initial predictions not being upheld. These results raise a number of interesting questions about the nature of stimuli and experimental designs, which we subsequently discuss in detail.

However, relative to our original experimental question, based on the above findings we would argue that stimulus-context effects from real faces were not present for viewed eye movements in impoverished faces, when both stimulus categories are presented within the same experiment. This suggests that the difference in N170 amplitude to gaze changes in real faces might be driven by a different neural mechanism relative to N170 modulation by mouth movements. We previously reported that mouth opening vs. closing movements in both real and line-drawn faces produce N170 amplitude modulations (Puce et al., 2003). Consistent with what we had previously postulated, it appears that information from mouth movements can be accessed from both real and impoverished images, unlike information from the eyes that appears to require real faces.

Why would ERPs elicited to impoverished mouth movements behave so differently to those observed to gaze transitions? Bassili (1978, 1979) originally reported behavioral data to viewed emotional expressions on point-light face stimuli. Success in recognizing different emotions was driven by the subject focusing on either the upper or lower regions of the impoverished face (i.e., eyes/brows vs. mouth, respectively) (Bassili, 1979). The line-drawn stimuli in this study can also be regarded as biological motion stimuli (see Oram and Perrett, 1994, 1996). Most typically, these impoverished forms of stimuli are used to represent very effectively the articulated motion of the joints of the body—where information related to the type of activity being observed can be readily identified from seeing these minimalist displays (Blake and Shiffrar, 2007). There is a very large literature demonstrating the sensitivity of the human brain to biological motion stimuli (see reviews by Giese and Poggio, 2003; Puce and Perrett, 2003; Blake and Shiffrar, 2007). Despite this, very few research groups have studied brain responses to biological motion stimuli involving the face (see the original studies of Bassili, 1978, 1979). Similar to movements of the body, mouth movements are a type of articulated motion. Mouth opening and closing occurs due to the actions of the articulated mandible. Hence, we argue that our previously reported ERP data that demonstrate differences between mouth opening and closing are representing a brain response to articulated biological motion (e.g., see Beauchamp et al., 2002; Peuskens et al., 2005).

Other facial movements involving the forehead and eyes do not require the movements of articulated joints, and gaze aversions also fall into this category. Changes in the eyes, either associated with gaze aversions, or with emotions such as fear, surprise, and happiness alter the amount of seen eye white area, which can modulate the brain's response to these types of stimuli even when these observed changes are task-irrelevant (Whalen et al., 2004; Hardee et al., 2008). This is likely to be driven by the high-contrast human iris-sclera complex. A gaze change, such as a lateral gaze shift, produces a local visuospatial luminance/contrast change. This type of stimulus, which can readily be seen at a distance, is thought to have evolved for the purposes of facilitating social interactions (Emery, 2000; Rosati and Hare, 2009). Human eyes are unique among primates with respect to this attribute, with most other species showing very little difference in contrast between irises and sclera (Rosati and Hare, 2009).

The lack of demonstrated N170 differences to gaze aversions relative to direct gaze transitions in this and our previous study using line-drawn faces (Rossi et al., 2014) supports the idea that neural activity to eye gaze transitions in real faces might be triggered by low-level stimulus features. Specifically, changes in local visual contrast and increased eye white area (see also Whalen et al., 2004; Hardee et al., 2008) as irises/pupils move from a direct to an averted position likely drive the N170 differences previously reported by Puce et al. (2000) and also seen to the real faces in the current study. We found only a 2-way interaction on N170 amplitude, suggesting that a contribution of low-level visual features on the modulation of the N170 cannot be ruled out, as indicated by a discernible ERP to the motion control stimuli. The real control stimuli also presented a high local luminance/contrast difference in the same part of the visual space as did the eye stimulus in the real face. However, based on the paired t-test data the local high contrast effect cannot totally account for the observed modulation of N170 amplitude to gaze motion: no differences were seen between N170s to the "Away" and "Direct" transitions for real control stimuli. The local high contrast of the iris/sclera complex probably contributes to the N170, but cannot explain the differential effect seen on the N170 to the real face stimuli. It may well be that the actual configuration of the eye plays a role in the response. This, in some ways, could be regarded as a low-level feature also: a feature that is embedded in a more complex stimulus (the dynamic face). A set of studies with parametric manipulations of these variables would be required to get to the bottom of this effect.
Additionally, it would be interesting to investigate the relative effect of iris/sclera contrast and the configuration effect on the N170 by using, for instance, faces where human eyes were replaced by non-human primate eyes (Dupierrix et al., 2014), or by examining responses to non-human primate gaze changes, as typically the iris/sclera complex in the eyes of non-human primates does not show these high local contrast differences (Emery, 2000; Rosati and Hare, 2009).

Viewing the gaze changes of another individual is thought to produce reflexive changes in one's visuospatial attention (Hietanen et al., 2008; Itier and Batty, 2009). It could be argued that both the eyes and their respective scrambled controls might cue participants' visual attention in the motion direction. This possibility has been discussed in the literature (Grossmann et al., 2007; Hadjikhani et al., 2008; Straube et al., 2010). Our line-drawn controls in their "averted" state looked like arrows (facing left and then right), and N170 did not differ between the "arrow" control and the direct gaze control condition (a diamond shape). Behavioral and ERP studies of visuospatial cueing paradigms using Posner-like tasks (Posner, 1980) in healthy subjects have demonstrated similar behavioral effects for both arrows and schematic eyes, but different ERP-related effects that most typically occur beyond the P100 and N170 that are specific to these visuospatial cueing tasks. Specifically, anterior and posterior negativities have been described to arrow and schematic gaze-cues respectively (Hietanen et al., 2008; Brignani et al., 2009). Interestingly, when real faces are used in a gaze-cueing paradigm, differences in early ERPs, such as P100 and N170 (P1 and N1) have been reported, producing larger amplitudes for valid trials (Schuller and Rossion, 2001). These experimental results, despite being generated in different experimental designs, are consistent with the current study in that schematic eyes do not elicit changes in earlier sensory ERPs such as N170. This finding bears further investigation, given that the schematic eyes in the visuospatial cueing studies did have contrast between "irises" and "sclera", unlike those in our current study.

A further point needs to be made on the issue of the schematic representation of faces. Our participants all reported that they found the gaze transitions in both types of faces to be compelling. However, some interesting differences in behavior and neurophysiology were observed for the impoverished stimulus categories. Subjects detected target stimuli that consisted of image negatives for all presented stimulus types. Participants were slower at detecting impoverished face targets relative to real face targets, and were slower for impoverished faces than for impoverished controls. For real stimuli, face targets were identified faster relative to controls. We cannot directly relate our behavioral data to our ERP findings: the ERPs were recorded to trials where no behavioral response was recorded, so we can only speculate about the potential nature of our ERP findings to the impoverished stimuli. One consideration might be that the impoverished faces in the current study might not be treated as faces by the brain. A stimulus such as an impoverished face might be ambiguous, and would hence take a longer time to be evaluated and might require more detailed processing. This might manifest as increased response time (for the detection of targets), as well as increased N170 latencies (which were seen as a main effect for line-drawn vs. real stimuli). Coupled with the longer latency is also an increased N170 amplitude (seen as a main effect for stimulus type for line faces and controls). The increased N170 latencies and amplitudes observed here might potentially reflect the more effortful processing that might be required of these stimuli.

Some intriguing differences in N170 activity relating to data of our control stimuli need to be addressed. In our original study, the checkerboard controls had movements that were not congruous with one another i.e., checks changed in two locations corresponding to each eye in the real face, but moved in opposite lateral directions (Puce et al., 1998). The control stimuli were created deliberately in this fashion, as in the piloting of data for the earlier study, subjects reported a very convincing and persistent illusion of eyes that the checkerboard control stimuli created. This created the unwanted confound of visualization in the study. This effect was abolished by introducing a movement condition where checks reversed in opposite directions, and we used this control stimulus in our previous study (see Puce et al., 1998). In the current study we were concerned that differences in the type of presented motion (congruous vs. incongruous) across the stimulus conditions may have, in part, contributed to the differences in the neurophysiological response between faces and controls. Therefore, we chose to have congruous motion for all stimulus types. Interestingly, in doing so we may well have created a stimulus-context effect for the control stimuli—and have potentially allowed subjects to ''see'' eyes in the control stimuli (as we had previously experienced). This occurred only for the REAL CONTROL stimuli, and not for the LINE CONTROL stimuli—there were no differences related to movement direction. So, we might have actually created an unexpected effect of stimulus-context in this experiment, with the REAL FACES providing a context for the REAL CONTROLS (not unlike that seen by Bentin and Golland, 2002). Our original purpose for running the experiment was to explore context effects related to eye gaze changes in LINE FACES in the presence of REAL FACES. In this latter case, we can state that no effect of stimulus context was observed.

The effects that are induced in N170 activity here underscore how important low-level stimulus manipulations related to luminance/contrast and also motion can be, and these have the potential to interact with task-related variables. Hence, control stimuli of multiple types may have to be used in an experiment so as to understand the nature of observed differences in neurophysiological data across different conditions.

## EEG Spectral Power

Total EEG spectral power to the very brief apparent motion transition generated consistent prolonged bursts of activity in theta, beta and gamma EEG frequency bands which overall behaved similarly across conditions in a task requiring detecting negatives of the stimuli (see **Figure 4**). We expected to observe oscillatory EEG changes in beta and gamma bands that would occur at different post-motion onset times for facial and control motion stimuli when statistical comparisons were made between conditions, in line with our previous study where participants detected color changes in line drawn face and control motion stimuli (Rossi et al., 2014). Statistically significant differences between stimulus conditions were confined to the beta and gamma bands only. The main significant change in gamma activity occurred at ∼200 and ∼300 ms post-motion onset for LINE faces bilaterally, but only in the left electrode cluster for REAL faces. Direct gaze transitions elicited stronger gamma amplitudes at ∼200 ms (for LINE and REAL faces), whereas averted gaze elicited stronger gamma amplitudes at ∼300 ms. We speculate that these bursts of activity reflect processing of facial information, as these changes were not present in the respective control conditions (compare first and second rows of **Figure 5**). REAL controls showed a gamma amplitude increment (∼40–50 Hz) for the Direct condition, which occurred ∼100 ms later than the gamma burst seen for REAL faces (compare second and last rows, **Figure 5**). It may be that this gamma burst to the controls might be a general coherent motion effect in data that were sampled from our occipito-temporal electrode clusters in a task where negative images of stimuli had to be detected.

In our previous study, we examined ERSP changes to impoverished line-drawn faces and controls only, where facial movements included eye and mouth movements in a color detection task that required a behavioral response for all presented stimuli (Rossi et al., 2014). In that study we also observed significant transient increases in the beta and gamma ranges to the facial motion stimuli, but these changes in oscillatory activity tended to occur at different time points relative to those seen in the current study. Gamma range changes to eye and mouth movements, if present, occurred much later in time relative to the current study (e.g., after 400 ms post-motion onset relative to the changes at ∼150–300 ms in the current study) and favored direct gaze and mouth closing movements. Beta range changes, if present, showed a short burst at ∼100 ms and a more prolonged burst between 350–550 ms favoring averted gaze (Rossi et al., 2014). In our previous study (Rossi et al., 2014) subjects viewed line-drawn face displays where the color of the lines could be either white or red, with subjects having to make a color decision on every seen motion transition. In our current study, as we had both line-drawn and real faces in the same experiment, we elected to use a target detection task where subjects had to identify negatives of all stimulus types, to try to ensure that equal attention was given to all stimulus types. Taken together, the ERSP findings of both studies would indicate that the significant differences in oscillatory behavior we observed at sites producing maximal ERP activity might be driven by task differences/decisions rather than the motion characteristics of the stimuli per se. Having said that, in both studies, changes in oscillatory activity were different across faces and respective controls, suggesting that changes in total oscillatory activity post-motion onset may reflect a complex mix of task and stimulus-related properties.
At this point in time much more data are needed to make sense of these changes in oscillatory activity to facial motion stimuli and their relationship to ERP activity—it is not yet customary to perform both types of analysis in studies.

Caruana et al. (2014) performed an intracranial EEG study where ERPs and oscillatory activity were examined side-by-side. They presented epilepsy surgery patients with faces showing dynamic gaze changes while intracranial EEG was recorded from electrodes sited in the posterolateral temporal cortex. Intracranial N200 ERP activity (the analog of the scalp N170) and transient broadband high frequency gamma band activity (out to 500 Hz) occurring at around the same time as the N200 were significantly larger when patients viewed averted gaze relative to either direct gaze or lateral switching movements (Caruana et al., 2014). Lachaux et al. (2005) reported intracranial gamma-band activity in left fusiform gyrus, occipital gyrus and intraparietal sulcus for a static face detection task. Gamma band amplitude (40–200 Hz) significantly increased between 250–500 ms poststimulus onset, and these condition differences were not present in the ERP (Lachaux et al., 2005). In contrast, both intracranial N200 activity and high frequency gamma activity in ventral temporal cortex to viewing static faces have been documented (Engell and McCarthy, 2011). The presence of gamma activity in the ERSP predicted the presence and size of the N200 at a particular site. Relevant to this study, however, N200 activity was elicited to impoverished face stimuli, but notably gamma activity was absent (Engell and McCarthy, 2011), indicating that the relationship between intracranial ERP activity and gamma activity can be a very complex one. It is difficult to make a comparison between intracranial and scalp EEG studies, because with intracranial EEG high frequency gamma band activity can be sampled, whereas in scalp EEG studies the skull effectively acts as a low pass filter so that gamma frequencies typically will be recorded only under 100 Hz (Srinivasan et al., 1996).
Interestingly, impoverished face stimuli, as used in the Engell and McCarthy (2011) study and in our study, do not appear to generate prominent gamma activity. In our study, stimuli tended to evoke sustained activity in frequency ranges below gamma; gamma activity was transient and was seen at around 200 ms post-motion onset in electrodes sited over lateral temporal scalp (**Figure 4**), similar to the broadband gamma reported in the intracranial EEG studies (e.g., Lachaux et al., 2005; Engell and McCarthy, 2011; Caruana et al., 2014). In another scalp EEG study, Zion-Golumbic and Bentin (2007) noted that activity in the 25–45 Hz range between 200–300 ms post-stimulus onset was largest for (static) real faces compared to scrambled real faces in midline parieto-occipital locations.

In our study, it is not entirely clear if the gamma activity could be related to configurality, movement, or a combination of both. However, we believe that the gamma responses at ∼200 and ∼300 ms present both for LINE and REAL stimuli reflect a preferential response to facial movement. An alternative explanation is that as a gamma response was seen for LINE and REAL faces and controls, albeit at different post-motion onsets, these gamma responses might be correlates of motion perception in the horizontal axis (see **Figure 5**). However, in our previous study we included mouth movements (with a large vertical component) in an impoverished face, and recorded gamma activity to both stimulus types (Rossi et al., 2014). From the relatively few existing studies in the literature, it is clear that the relationship between oscillatory EEG activity, including gamma, and stimulus and task type is complex. Much further work, in which ERP and ERSP activity to viewing motion stimuli with different attributes (e.g., linear vs. non-linear, inward vs. outward radial, looming vs. receding) are directly compared, will be needed to disentangle these issues.

As noted earlier, LINE FACES and CONTROLS differ in configuration, but produce identical movements. A common spectral change between LINE FACES and CONTROLS was a decrement in beta power, maximal at ∼25 Hz for the conditions producing direct gaze (FACES) and a diamond shape (CONTROLS), albeit at slightly different time intervals (FACES from ∼215–300 ms, CONTROLS ∼300–400 ms). Since the common feature between these stimulus types is apparent motion of identical numbers of pixels, these beta components might be related to the encoding of the movement. Beta spectral power has been previously associated with the perception and production of movement per se (Pfurtscheller et al., 1996; Müller et al., 2003; Müller-Putz et al., 2007). However, the beta change for LINE FACES occurred earlier than that for LINE CONTROLS. We have previously seen similar decreases in the 20–30 Hz beta range for line-drawn faces producing identical lateral eye movements as those used in the current study (Rossi et al., 2014). However, in Rossi et al. (2014) decrements in beta spectral power occurred later (∼400–550 ms). In these two studies, participants were asked to respond to very different stimulus attributes, and with different stimulus probability. In Rossi et al. (2014) participants responded on every trial to indicate the color of the line-drawn stimulus (which varied randomly from white to red, and vice-versa). In the current study, participants responded to infrequent targets that were negatives of all stimuli. Hence, at this stage it is not possible to clarify the nature of the observed beta power change to the apparent motion stimulus. We do believe that these differences were driven by task demands, and will have to explicitly test this in future studies.

## General Conclusions

We would advocate that future studies evaluate ERP and ERSP activity in parallel, so that we can develop an understanding of the functional significance of each type of neurophysiological activity, and how one might affect the other. Gaze changes produced gamma activity irrespective of face type: direct gaze elicited more gamma at an earlier latency relative to averted gaze. Overall, our N170 ERP peak analysis argues for the idea that gaze changes/eye movements in impoverished line-drawn faces do not trigger the neural responses that have been associated with the perception of socially relevant facial motion (relevant for communicative behavior), replicating data in an earlier study (Rossi et al., 2014). In contrast, real faces in this study as well as others (e.g., Puce et al., 2000, 2003; Conty et al., 2007; Latinus et al., revised and resubmitted) show N170 sensitivity to gaze changes. Interestingly, differences between our real faces and control stimuli were not as strong as we had previously demonstrated (Puce et al., 2000), and this may be due to the possibility of visualizing our current controls as faces. Overall our data indicate that N170s elicited to social attention manipulations are not modulated by top-down processes (such as priming or context) for impoverished faces.

Taking together the findings of our previous and current studies, N170s to gaze changes appear to be generated by different processes relative to mouth movements. Eye movements/gaze changes in real faces generate local visuospatial luminance/contrast changes, producing an N170 that is altered by the luminance/contrast change that occurs between changing gaze conditions. When gaze changes are presented in an impoverished face, there is no differential luminance/contrast change and the N170 does not show modulation across gaze conditions. In contrast, mouth opening and closing movements are a type of articulated biological motion, whose moving form modulates N170 irrespective of whether the movement occurs in a real or an impoverished face (Puce et al., 2003; Rossi et al., 2014), and thus largely independently of luminance/contrast. Yet, the motion of both face parts happens to elicit ERP activity at the same latency with a similar scalp topography that likely reflects an aggregate of neural activity from various parts of motion-sensitive cortex, as shown clearly by fMRI studies (e.g., Puce and Perrett, 2003). The functional dynamics of this very heterogeneous brain network will likely only be disentangled by aggregating the data from a number of different investigations, such as functional connectivity using fMRI, intracranial EEG and scalp EEG/MEG studies, which examine evoked and induced neurophysiological activity in healthy subjects and individuals with neuropsychiatric disorders.

## Acknowledgments

AP, FJP, and AR were supported by NIH grant NS-049436. We also thank our participants, who made this experiment possible.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The Guest Associate Editor Davide Rivolta declares that, despite having collaborated with author Aina Puce, the review process was handled objectively and no conflict of interest exists.

Copyright © 2015 Rossi, Parada, Latinus and Puce. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# HUMAN NEUROSCIENCE

## Own-race and own-age biases facilitate visual awareness of faces under interocular suppression

## **Timo Stein<sup>1,2,3</sup>\*, Albert End<sup>2,4,5</sup> and Philipp Sterzer<sup>2,3,6</sup>**

<sup>1</sup> Center for Mind/Brain Sciences, CIMeC, University of Trento, Rovereto, Italy

<sup>2</sup> Department of Psychiatry, Charité Universitätsmedizin Berlin, Berlin, Germany

<sup>3</sup> Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany

<sup>4</sup> DFG Research Unit Person Perception, Friedrich Schiller University of Jena, Jena, Germany

<sup>5</sup> Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

<sup>6</sup> Bernstein Center for Computational Neuroscience, Berlin, Germany

#### **Edited by:**

Aina Puce, Indiana University, USA

#### **Reviewed by:**

Martin Lages, University of Glasgow, UK

Przemyslaw Tomalski, University of Warsaw, Poland

#### **\*Correspondence:**

Timo Stein, Center for Mind/Brain Sciences, CIMeC, University of Trento, Palazzo Fedrigotti, Corso Bettini 31, 38068 Rovereto, Italy

e-mail: timo@timostein.de

The detection of a face in a visual scene is the first stage in the face processing hierarchy. Although all subsequent, more elaborate face processing depends on the initial detection of a face, surprisingly little is known about the perceptual mechanisms underlying face detection. Recent evidence suggests that relatively hard-wired face detection mechanisms are broadly tuned to all face-like visual patterns as long as they respect the typical spatial configuration of the eyes above the mouth. Here, we qualify this notion by showing that face detection mechanisms are also sensitive to face shape and facial surface reflectance properties. We used continuous flash suppression (CFS) to render faces invisible at the beginning of a trial and measured the time upright and inverted faces needed to break into awareness. Young Caucasian adult observers were presented with faces from their own race or from another race (race experiment) and with faces from their own age group or from another age group (age experiment). Faces matching the observers' own race and age group were detected more quickly. Moreover, the advantage of upright over inverted faces in overcoming CFS, i.e., the face inversion effect (FIE), was larger for own-race and own-age faces. These results demonstrate that differences in face shape and surface reflectance influence access to awareness and configural face processing at the initial detection stage. Although we did not collect data from observers of another race or age group, these findings are a first indication that face detection mechanisms are shaped by visual experience with faces from one's own social group. Such experience-based fine-tuning of face detection mechanisms may equip in-group faces with a competitive advantage for access to conscious awareness.

**Keywords: face perception, face detection, visual awareness, race, age, interocular suppression, continuous flash suppression**

## **INTRODUCTION**

Faces are a rich source of important social information. Before this information can be accessed, however, the presence of a face in a visual scene needs to be detected. While much research has examined how we identify and remember individual faces, surprisingly little is known about the perceptual mechanisms underlying the initial detection of a face. Most classical theories of face perception only deal with the perceptual and cognitive operations that are carried out after a face has been detected in a scene (Bruce and Young, 1986; Burton et al., 1999; Haxby et al., 2000). It appears plausible, however, that face detection is supported by perceptual mechanisms distinct from those analyzing specific facial properties such as identity, because face detection and face recognition have fundamentally different computational goals (Tsao and Livingstone, 2008): Whereas recognition mechanisms need to extract facial information that distinguishes individual faces, detection mechanisms need to be sensitive to information that is common to all faces. Indeed, there is evidence for a dissociation between face detection and face recognition in prosopagnosic individuals who show severe deficits in face discrimination but perform well in face detection tasks (de Gelder and Rouw, 2000; Le Grand et al., 2006; Garrido et al., 2008). Accordingly, recent models of face perception have incorporated a distinct initial stage of face detection in a hierarchy of face processing stages (de Gelder et al., 2003; Johnson, 2005; Duchaine and Nakayama, 2006; Tsao and Livingstone, 2008).

How could face detection mechanisms localize regions in a visual scene that contain a face? Because all faces share the same global structure, face detection can efficiently be achieved by matching the visual input to an internal representation corresponding to the structure of a prototypical face (Lewis and Ellis, 2003). Although the exact nature of this face representation or face template is currently unknown, it appears likely that face detection mechanisms are tuned to the spatial configuration of facial parts that are invariant across different face exemplars (e.g., two eyes above nose above mouth; McKone et al., 2007; Tsao and Livingstone, 2008). When these "first-order relations" are distorted by turning faces upside down, face detection performance declines significantly (Purcell and Stewart, 1988; Lewis and Edmonds, 2003; Garrido et al., 2008). Because upright and inverted faces are physically identical, this face inversion effect (FIE) supports the notion that face detection mechanisms rely on information about the common spatial configuration of facial parts.

A particularly striking demonstration of the impact of face inversion on detection performance comes from experiments using strong interocular suppression induced by continuous flash suppression (CFS; Tsuchiya and Koch, 2005). In CFS, a train of high-contrast, contour-rich masks flashed into one eye can render a face photograph projected to the other eye invisible for up to several seconds (see **Figure 1A**). The time faces need to overcome suppression and gain access to awareness is strongly modulated by their orientation: Upright faces break into awareness much more quickly than inverted faces (Jiang et al., 2007; Yang et al., 2007; Stein et al., 2011a). This FIE in breaking continuous flash suppression (b-CFS) is larger than the effect of inversion on b-CFS for most other objects (Stein et al., 2012b), indicating that the FIE reflects face-specific detection mechanisms (Zhou et al., 2010a). Thus, comparing the duration of perceptual suppression of physically identical upright and inverted faces under CFS represents a powerful and well-controlled method for studying mechanisms of face detection.

With this approach, we have recently found evidence that face detection mechanisms are broadly tuned to register all visual information that could be indicative of a face: Even simple schematic head-shaped patterns consisting of three dark blobs were detected more quickly when the spatial arrangement of these blobs resembled the face-like configuration of two eyes above the mouth than when this configuration was inverted (Stein et al., 2011b). Interestingly, these face-like patterns also preferentially attract the gaze of newborns in their first few days of life (e.g., Farroni et al., 2005). Thus, it is possible that relatively hard-wired face detection mechanisms respond to all visual patterns that contain face-like first-order relations among face-like parts (also see Tomalski et al., 2009a,b). However, there is also evidence that face detection mechanisms can be modified by visual experience and respond optimally to those faces that have been encountered most frequently. First, the inversion effect for schematic face-like patterns is smaller than for naturalistic face photographs (Stein et al., 2011b). Second, Gobbini et al. (2013) recently reported that upright faces of close friends overcame CFS more quickly than upright faces of strangers. However, as this b-CFS study did not include inverted faces, faster detection of highly familiar faces could have been due to uncontrolled differences in low-level physical stimulus characteristics.

To better understand the tuning properties of face detection mechanisms, in the present study we used b-CFS to measure detection performance and inversion effects for faces from the observer's own race or from another race (race experiment) and for faces from the observer's own age group or from another age group (age experiment). While it is well established that the greater experience we have with people from our own race and age group is associated with better recognition memory for own-race and own-age faces (Meissner and Brigham, 2001; Rhodes and Anastasi, 2012), it is unknown whether own-race and own-age biases facilitate the initial detection of a face. Faces from different races and age groups have identical first-order relations among facial parts, but differ in face shape and surface reflectance properties (Berry and McArthur, 1986; Hill et al., 1995). Thus, if face detection mechanisms were relatively hard-wired and broadly tuned to fit all visual patterns having face-like first-order relations (Stein et al., 2011b), the FIE should be of similar size for all face categories. Alternatively, if the mechanisms supporting visual awareness were shaped by experience and thus optimally tuned to more frequently encountered faces (Gobbini et al., 2013), the FIE in b-CFS should be larger for same-race and same-age faces than for other-race and other-age faces.

**FIGURE 1 | Breaking continuous flash suppression (b-CFS) paradigm and face stimuli. (A)** Schematic of an example b-CFS trial. An upright or an inverted face was gradually introduced to one eye. To render the face target invisible for the first seconds of each trial through interocular suppression, CFS masks flashing at 10 Hz were presented to the other eye. The contrast of the CFS masks was slowly ramped down over the course of each trial. Participants indicated as quickly and accurately as possible on which side of fixation the target or any part of the target became visible. **(B)** Example face stimuli. Rows from top to bottom: young Caucasian adults from the race experiment, young Black adults from the race experiment, young Caucasian adults from the age experiment, and old Caucasian adults from the age experiment.

### **METHOD**

#### **PARTICIPANTS**

Fourteen Caucasian students (12 female, age range 20–35 years, *M* = 24.9 years, *SD* = 4.5 years) participated for course credit or monetary compensation. All participants had normal or corrected-to-normal vision and were naïve to the purpose of the study. The study protocol was approved by the Charité ethics committee.

#### **DISPLAY AND STIMULI**

Participants viewed a CRT screen from a distance of 50 cm through a mirror stereoscope, such that each eye was presented with one of two fusion contours (11.0° × 11.0° of visual angle) consisting of white noise pixels (width 0.5°). Because the precise luminance was not critical to our research question, i.e., the comparison of physically identical upright and inverted faces, we did not linearize the monitor output. Faces were presented on a mid-gray background within these fusion contours, with the remainder of the screen being black. In the center of each fusion contour a fixation cross (0.7° × 0.7°) was displayed and participants were asked to maintain stable fixation throughout each experimental block. We created multicolored Mondrian-like CFS masks (10.0° × 10.0°) consisting of randomly arranged circles (diameter 0.4°–1.8°) and selected 120 colored face photographs from the "Center for Vital Longevity Face Database" (Minear and Park, 2004).
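To make the display geometry concrete, the following minimal Python sketch converts the visual angles reported above into physical on-screen extents at the stated 50 cm viewing distance. It assumes the standard relation size = 2 · d · tan(θ/2) for a flat stimulus centred on the line of sight and ignores the optical path through the mirror stereoscope; the function name is illustrative and not taken from the original study.

```python
import math

VIEWING_DISTANCE_CM = 50.0  # viewing distance stated in the text


def visual_angle_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`.

    Assumes a flat stimulus centred on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    # Angles reported in the text (degrees of visual angle)
    for label, angle in [("fusion contour", 11.0),
                         ("CFS mask", 10.0),
                         ("fixation cross", 0.7)]:
        print(f"{label}: {angle} deg -> {visual_angle_to_cm(angle):.2f} cm")
```

For small angles the relation is nearly linear, so at this distance one degree corresponds to a little under 0.9 cm on the screen.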

In the *race experiment*, we used 30 photographs of young Caucasian adults (15 female, age range 18–27 years, *M* = 21.9 years, *SD* = 2.6 years) and 30 photographs of young Black adults (15 female, age range 18–30 years, *M* = 22.8 years, *SD* = 3.4 years). In the *age experiment*, we used another set of 30 photographs of young Caucasian adults (age range 18–27 years, *M* = 22.0 years, *SD* = 2.1 years) and 30 photographs of older Caucasian adults (age range 65–91 years, *M* = 74.9 years, *SD* = 7.1 years). All non-facial features were cropped and the images were resized to approximately 3.5° × 4.0°, retaining some variability in face size (**Figure 1B**). Then the stimuli's luminance and RMS contrast were adjusted (based only on the monitor's input values, as we did not linearize the monitor output), separately for each RGB channel (in the race experiment, the RMS contrast was slightly higher than in the age experiment). To preserve each face's original color composition, we computed the relative contribution of each RGB channel to the luminance of the original stimulus, which served as a weighting factor for each RGB channel. These weighting factors were then used to normalize each RGB channel's luminance proportionally to its weight in the original image.

Note that a precise matching of low-level stimulus characteristics was not critical to our research question, as we compared breakthrough from CFS for physically identical stimuli shown in upright and inverted orientations. Even for grayscale face stimuli, it is virtually impossible to equate all low-level physical stimulus properties that may influence b-CFS (e.g., Yang et al., 2007; Stein and Sterzer, 2012; Stein et al., 2012b). For colored face photographs, this problem is further complicated by the nontrivial interaction of color channels. Therefore, we did not attempt to precisely match the color photographs used in the present experiments, but only sought to achieve roughly similar overall suppression durations.

## **PROCEDURE**

Participants performed a standard b-CFS localization task: After a 1-s fixation period, CFS masks flashing at 10 Hz were presented to one randomly selected eye, while a face was gradually introduced to the other eye by ramping up its contrast over the first second of each trial. Beginning 2.1 s after trial onset, the contrast of the CFS masks was linearly ramped down to zero over 6.9 s. The face was presented until response or for a maximum trial length of 10 s. On each trial, a face was centered at a random vertical position (maximally 2.6° below or above the fixation cross) in the left or the right half of the fusion contour (2.9° from the fixation cross). Participants were informed about the presentation of upright and inverted face targets and were asked to press the left or the right arrow key on the keyboard to indicate as fast and accurately as possible on which side of fixation a face or any part of a face emerged from suppression.
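The trial time course described above can be sketched as two contrast functions of time. This is only an illustration under stated assumptions: time zero is taken to be the onset of the masks and face (after the fixation period), contrasts are expressed relative to their maximum, and the face ramp is assumed to be linear, which the text does not specify. All names are hypothetical.

```python
FACE_RAMP_END_S = 1.0        # face reaches full contrast after the first second
MASK_RAMP_START_S = 2.1      # mask contrast starts to decrease 2.1 s after onset
MASK_RAMP_DURATION_S = 6.9   # mask contrast reaches zero 6.9 s later
MAX_TRIAL_S = 10.0           # maximum trial length


def face_contrast(t):
    """Relative face contrast (0..1) at time t in seconds after trial onset.

    A linear ramp is an assumption; the text only states that the contrast
    was ramped up over the first second.
    """
    return min(max(t / FACE_RAMP_END_S, 0.0), 1.0)


def mask_contrast(t):
    """Relative CFS mask contrast (0..1) at time t in seconds after trial onset."""
    if t <= MASK_RAMP_START_S:
        return 1.0
    elapsed = t - MASK_RAMP_START_S
    # Linear decrease to zero, as stated in the text
    return max(1.0 - elapsed / MASK_RAMP_DURATION_S, 0.0)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.1, 5.55, 9.0, MAX_TRIAL_S):
        print(f"t = {t:5.2f} s  face = {face_contrast(t):.2f}  "
              f"mask = {mask_contrast(t):.2f}")
```

Under these assumptions the masks reach zero contrast 9 s after onset, so an unreported target remains visible without competing masks during the final second of a trial.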

Both the race and the age experiment consisted of 240 trials (separated by mandatory breaks after 80 and 160 trials). We counterbalanced the order of the two experiments across participants. In both experiments each combination of two face categories (race experiment: Caucasian faces, Black faces; age experiment: young faces, old faces), two face orientations (upright and inverted), two eyes for face presentation, and 30 face exemplars was presented once. The order of trials was randomized.
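The fully crossed design described above (2 face categories × 2 orientations × 2 eyes × 30 exemplars = 240 trials per experiment) can be sketched as follows. The dictionary keys, the function name, and the seeding are illustrative assumptions, not details of the original experiment code.

```python
import itertools
import random


def build_trials(categories, n_exemplars=30, seed=None):
    """Return a randomized list with one trial per design cell.

    Each combination of face category, orientation, eye receiving the face,
    and face exemplar occurs exactly once.
    """
    orientations = ("upright", "inverted")
    eyes = ("left", "right")
    trials = [
        {"category": c, "orientation": o, "face_eye": e, "exemplar": i}
        for c, o, e, i in itertools.product(
            categories, orientations, eyes, range(n_exemplars))
    ]
    random.Random(seed).shuffle(trials)  # randomize the trial order
    return trials


if __name__ == "__main__":
    race_trials = build_trials(("Caucasian", "Black"), seed=1)
    # Mandatory breaks after 80 and 160 trials give three blocks
    blocks = [race_trials[i:i + 80] for i in range(0, len(race_trials), 80)]
    print(len(race_trials), [len(b) for b in blocks])
```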

## **ANALYSIS**

We excluded trials with incorrect responses from the analysis (race experiment: 1.7% of all trials, age experiment: 2.1% of all trials). As an effect size estimate for the paired *t*-tests we report Cohen's *d*, computed as the mean of the paired differences divided by the standard deviation of these differences.
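As a worked illustration of this effect size, the sketch below derives Cohen's *d*, the paired *t* statistic, and a 95% confidence interval from summary statistics alone. It assumes that *d* is the mean paired difference divided by the standard deviation of the differences, reported as an absolute value, which is consistent with the values given in the Results. The input numbers are the rounded upright-minus-inverted difference for Caucasian faces in the race experiment (*M* = −865 ms, *SD* = 747 ms, *n* = 14), so the output matches the reported statistics only up to rounding. The critical value 2.160 is the two-tailed 5% *t* quantile for 13 degrees of freedom; the function name is hypothetical.

```python
import math

T_CRIT_DF13 = 2.160  # two-tailed 5% critical t value for 13 degrees of freedom


def paired_summary(mean_diff, sd_diff, n, t_crit):
    """Effect size, t statistic and confidence interval for a paired comparison.

    mean_diff and sd_diff describe the per-participant differences.
    """
    se = sd_diff / math.sqrt(n)           # standard error of the mean difference
    d = abs(mean_diff) / sd_diff          # Cohen's d for paired data (magnitude)
    t = mean_diff / se                    # paired t statistic with n - 1 df
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return d, t, ci


if __name__ == "__main__":
    d, t, ci = paired_summary(-865.0, 747.0, 14, T_CRIT_DF13)
    print(f"d = {d:.2f}, t(13) = {t:.2f}, "
          f"95% CI [{ci[0]:.0f} ms, {ci[1]:.0f} ms]")
```

Because *t* = *d* × √*n* for paired data, the effect size and test statistic carry the same information once the sample size is known.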

## **RESULTS**

#### **RACE EXPERIMENT**

A repeated-measures ANOVA with the factors race (Caucasian, Black) and orientation (upright, inverted) on the mean suppression durations yielded a significant main effect of orientation, *F*(1,13) = 15.98, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.55, reflecting overall shorter suppression durations for upright faces, and a significant race-by-orientation interaction, *F*(1,13) = 11.05, *p* = 0.005, η<sub>p</sub><sup>2</sup> = 0.46. The main effect of race did not reach statistical significance, *F*(1,13) = 3.67, *p* = 0.078, η<sub>p</sub><sup>2</sup> = 0.22. Compared to their inverted counterparts, suppression durations were shorter for both upright Caucasian faces, *t*(13) = −4.33, *p* = 0.001, *d* = 1.16 (*M* = −865 ms, *SD* = 747 ms, 95% CI [−1296 ms, −433 ms]), and upright Black faces, *t*(13) = −3.01, *p* = 0.010, *d* = 0.80 (*M* = −442 ms, *SD* = 550 ms, 95% CI [−760 ms, −124 ms]). Importantly, however, the significant interaction demonstrated that the FIE was significantly larger for Caucasian faces, i.e., for own-race faces (*M* = 423 ms, *SD* = 476 ms, 95% CI [148 ms, 697 ms], see **Figures 2A,B**).

**FIGURE 2 | … experiment (C, D)**. **(A)** Mean suppression durations for upright and inverted Caucasian and Black faces. Error bars show 95% CIs for the mean difference between upright and inverted faces (that is, 95% CIs of the face inversion effects), separately for Caucasian and Black faces. **(B)** Individual subject data. Left panel: Inversion effects (difference in mean suppression durations between upright and inverted faces) for Caucasian and Black faces. Right panel: Interaction effect, with positive values reflecting a larger inversion effect for Caucasian faces than for Black …

#### **AGE EXPERIMENT**

For the age experiment, a repeated-measures ANOVA with the factors age (young, old) and orientation revealed a similar pattern of results. There was a significant main effect of age, *F*(1,13) = 26.08, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.67, reflecting overall shorter suppression durations for young faces, a significant main effect of orientation, *F*(1,13) = 22.09, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.63, and a significant age-by-orientation interaction, *F*(1,13) = 29.65, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.70. Again, compared to their inverted counterparts, suppression durations were shorter for both upright young faces, *t*(13) = −6.39, *p* < 0.001, *d* = 1.71 (*M* = −1055 ms, *SD* = 618 ms, 95% CI [−1413 ms, −699 ms]), and for upright old faces, *t*(13) = −2.42, *p* = 0.031, *d* = 0.65 (*M* = −406 ms, *SD* = 628 ms, 95% CI [−768 ms, −43 ms]). Crucially, the significant interaction demonstrated a larger FIE for young faces, i.e., for own-age faces (*M* = 650 ms, *SD* = 446 ms, 95% CI [392 ms, 907 ms], see **Figures 2C,D**).
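The ANOVA statistics reported for both experiments can be related to one another with two standard identities, sketched below in Python. First, partial eta squared follows from the *F* value and its degrees of freedom as *F* · df<sub>effect</sub> / (*F* · df<sub>effect</sub> + df<sub>error</sub>). Second, in a 2 × 2 within-subject design the interaction *F*(1, *n* − 1) equals the squared one-sample *t* of the per-participant difference of differences. The function names are illustrative, and because the inputs are the rounded means and standard deviations from the text, the recomputed values agree with the reported ones only approximately.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def interaction_f_from_differences(mean_diff, sd_diff, n):
    """Interaction F(1, n - 1) in a 2 x 2 within-subject design.

    mean_diff and sd_diff describe the per-participant difference between
    the two inversion effects; F is the squared one-sample t statistic.
    """
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t * t


if __name__ == "__main__":
    # Race experiment: orientation main effect and interaction
    print(round(partial_eta_squared(15.98, 1, 13), 2))
    print(round(interaction_f_from_differences(423.0, 476.0, 14), 2))
    # Age experiment: interaction
    print(round(partial_eta_squared(29.65, 1, 13), 2))
    print(round(interaction_f_from_differences(650.0, 446.0, 14), 2))
```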

#### **LINEAR MIXED EFFECTS ANALYSES**

To account for variability in suppression durations between face exemplars, we also performed linear mixed effects analyses using the lme4 package (Bates et al., 2012) for R (R Core Team) on the raw suppression durations and, due to their positive skew, also on log-transformed suppression durations. These analyses had random intercepts for participants and for individual face exemplars. Reduced models containing only these random effects of participants and face exemplars were tested against models including fixed effects of orientation (upright, inverted) or face category (race experiment: Caucasian, Black; age experiment: young, old) using likelihood ratio tests. To test for the interaction effect, models with the orientation-by-category interaction were compared to models with the two fixed factors only.

For the analyses of raw suppression durations from the race experiment, the comparison of the reduced model with the model containing the additional fixed factor of orientation was significant, χ<sup>2</sup>(1) = 102.82, *p* < 0.001, while the comparison with the model containing the additional fixed factor of face category did not reach significance, χ<sup>2</sup>(1) = 1.31, *p* = 0.252. Most importantly, the interaction was significant, χ<sup>2</sup>(1) = 10.76, *p* = 0.001. The results of the analyses of log-transformed suppression durations from the race experiment were similar, for orientation, χ<sup>2</sup>(1) = 98.86, *p* < 0.001, for face category, χ<sup>2</sup>(1) = 2.15, *p* = 0.143, and for the interaction, χ<sup>2</sup>(1) = 10.20, *p* = 0.001.

For the age experiment, analogous analyses of raw suppression durations revealed a significant effect of orientation, χ<sup>2</sup>(1) = 94.94, *p* < 0.001, a significant effect of face category, χ<sup>2</sup>(1) = 4.18, *p* = 0.041, and a significant interaction effect, χ<sup>2</sup>(1) = 19.17, *p* < 0.001. Finally, a similar pattern of results was obtained for the analyses of log-transformed suppression durations from the age experiment, for orientation, χ<sup>2</sup>(1) = 93.914, *p* < 0.001, for face category, χ<sup>2</sup>(1) = 5.14, *p* = 0.023, and for the interaction, χ<sup>2</sup>(1) = 18.24, *p* < 0.001. Thus, the results from the linear mixed effects analyses were consistent with the outcome of the standard repeated-measures ANOVA reported above, meaning that the effects persisted after accounting for variability across individual face exemplars.
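The likelihood ratio tests reported above compare nested models that differ by a single parameter, so each statistic is referred to a chi-square distribution with one degree of freedom. The original analyses were run with lme4 in R; the following Python sketch does not refit any model and only illustrates how such a statistic maps to its *p*-value. It uses the fact that, for one degree of freedom, the chi-square survival function equals erfc(√(χ²/2)). The function names and the log-likelihood values in the example are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_square):
    """P-value of a likelihood ratio statistic with 1 degree of freedom.

    For 1 df the chi-square survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


if __name__ == "__main__":
    # Statistics reported in the text for raw suppression durations
    for label, chi2 in [("race: face category", 1.31),
                        ("race: interaction", 10.76),
                        ("age: face category", 4.18)]:
        print(f"{label}: chi2(1) = {chi2}, p = {lrt_p_value_df1(chi2):.3f}")
    # Hypothetical log-likelihoods, for illustration only
    print(lrt_statistic(-100.0, -95.0))
```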

#### **SIMILARITY OF SUPPRESSION DURATIONS FOR INVERTED FACES**

Additional *post hoc t*-tests showed that for inverted faces suppression durations differed neither between own-race and other-race faces nor between own-age and other-age faces, both *t* < 1. The similarity of suppression durations for inverted faces can be regarded as an a posteriori validation of our attempt to match faces in terms of low-level physical stimulus characteristics. By contrast, for faces displayed in upright orientation, suppression durations were shorter for own-race faces compared to other-race faces, *t*(13) = −3.57, *p* = 0.003, *d* = 0.95 (*M* = −389 ms, *SD* = 407 ms, 95% CI [−624 ms, −153 ms]), as well as for own-age faces compared to other-age faces, *t*(13) = −6.34, *p* < 0.001, *d* = 1.69 (*M* = −660 ms, *SD* = 389 ms, 95% CI [−884 ms, −435 ms]). Thus, the increased FIE for own-race and own-age faces most likely reflected a greater advantage of upright over inverted faces in gaining access to awareness.

#### **EXPERIMENTAL ORDER AND OWN-RACE VS. OWN-AGE BIAS**

Because we used a within-subjects design, it is possible that the temporal order of the experiments affected our results. In particular, young Caucasian faces were included in both experiments. Thus, after the first experiment observers might have been accustomed to the presentation of the specific face categories used in the first experiment, of which only young Caucasian faces were repeated in the second experiment (albeit using different exemplars). We therefore conducted an additional mixed ANOVA with the between-subjects factor experimental order (race experiment first, age experiment first) and the within-subjects factors experiment (race, age), face category (own, other), and orientation. There was no significant four-way interaction and there were no significant three-way interactions with experimental order (all *F* < 1), indicating that the difference in FIEs for own- and other faces was similar for the first and the second experiment, both for the race experiment (first, *M* = 423 ms, *SD* = 477 ms; second, *M* = 423 ms, *SD* = 513 ms), as well as for the age experiment (first, *M* = 528 ms, *SD* = 427 ms; second, *M* = 771 ms, *SD* = 464 ms). Furthermore, because the three-way interaction between experiment, face category, and orientation was not significant, *F*(1,12) = 2.76, *p* = 0.123, η<sub>p</sub><sup>2</sup> = 0.19, there was no evidence for differences in the strength of the FIE modulation by the own-race and the own-age bias.

## **DISCUSSION**

Upright faces have a robust advantage over inverted faces in overcoming CFS and breaking into awareness (e.g., Jiang et al., 2007; Yang et al., 2007; Stein et al., 2011a,b). This FIE demonstrates the sensitivity of detection mechanisms to the global facial structure, i.e., the spatial configuration or first-order relations of face parts, which is disrupted in inverted faces (Purcell and Stewart, 1988; Lewis and Edmonds, 2003; Lewis and Ellis, 2003). Most likely, the FIE reflects face-specific detection mechanisms, as the impact of inversion on b-CFS is greater for faces than for most other object categories (Zhou et al., 2010a; Stein et al., 2012b). The present findings show that face detection mechanisms are not only sensitive to face orientation, but also to comparably subtle differences in face shape and surface reflectance. Young Caucasian adults detected faces of their own race and age group more quickly than young Black faces and old Caucasian faces. This advantage of upright own-race and own-age faces over upright other-race and other-age faces is unlikely to merely reflect differences in low-level physical stimulus properties (e.g., higher contrast at the hairline in young Caucasian faces), because we did not obtain similar differences in suppression durations when the same faces were inverted. Moreover, the advantage of upright over inverted faces in gaining access to awareness, i.e., the FIE, was increased for own-race and own-age faces. This indicates that configural face processing at the initial detection stage can be influenced by facial properties that differ between faces from different race and age groups, namely by differences in face shape and surface reflectance (including, e.g., albedo, hue, texture; Russell et al., 2007).

This influence of face shape and facial surface reflectance properties on the FIE in simple detection has implications for our understanding of the perceptual mechanisms involved in visual awareness of faces. It has been proposed that faces are detected by matching the visual input to a deformable internal representation of a prototypical face (Lewis and Ellis, 2003). A poor match between this (upright) face template and inverted faces could account for the FIE. We have recently provided evidence that this face template only represents the prototypical first-order and ordinal luminance contrast relationships among facial parts that are shared by all faces under natural lighting conditions (Stein et al., 2011b). This account cannot explain the increased FIE for own-race and own-age faces. Rather, the present findings indicate that the face template guiding detection holds a more detailed representation of a prototypical face, containing information about face shape and surface reflectance properties.

## **LIMITATIONS OF THE PRESENT ONE-GROUP DESIGN FOR INTERPRETING OWN-RACE AND OWN-AGE BIASES**

It seems natural to interpret these own-race and own-age biases as indicative that experience with people from one's own race and age group finely tunes detection mechanisms to faces from one's own social categories. However, to be precise, our data only show that for young Caucasian observers the advantage of upright over inverted faces in gaining access to awareness is larger for young Caucasian faces than for young Black and old Caucasian faces. While the comparison of upright and inverted faces rules out that low-level stimulus differences caused this pattern of results, our findings do not yet establish unequivocal evidence for a fine-tuning of face detection mechanisms to one's own social categories. For this, it would have been necessary to show a reversed pattern of results with young Black or old Caucasian observers. As we could not collect data from these groups of observers due to logistic challenges, testing for this crossover interaction remains an important avenue for future studies. Thus, our findings leave open the possibility that face detection mechanisms are generally tuned to detect young faces with light skin color (e.g., Rhodes, 2006), independent of the observer's own social group membership and visual experience.

Although we cannot exclude this possibility, in the light of other recent evidence for the influence of experience on face detection (Gobbini et al., 2013) we consider an experience-based mechanism a more likely explanation for the present findings. This interpretation would dovetail with recent accounts of own-race and own-age effects in face recognition memory. For example, the "experience-based holistic account" by Rossion and Michel (2011) holds that memory deficits for other-race (and potentially other-age) faces result from a poor match between the faces' unfamiliar morphology and an experience-derived template representing the global structure of an average face. Consequently, information diagnostic for discriminating individual out-group faces is processed in a less holistic, more piecemeal fashion, and thus less efficiently (Tanaka et al., 2004; Michel et al., 2006; de Heering and Rossion, 2008). In support of this notion, the detrimental effect of inversion on recognition memory is reduced for other-race (Rhodes et al., 1989; Hancock and Rhodes, 2008; Rhodes et al., 2009) and other-age (Kuefner et al., 2008) faces. Adopting this view, face detection could involve fitting the visual input to a face template that is shaped by the observer's specific experience with faces. The goodness of fit between the visual input and this experience-based face template would determine detection performance and equip faces from one's own social categories with an advantage in gaining access to awareness.

### **UNCONSCIOUS PROCESSING OF FACIAL RACE AND AGE OR MERE TUNING OF FACE DETECTION MECHANISMS?**

In the present study we recorded the duration of perceptual suppression as a marker of different perceptual sensitivities to faces from the observer's own and other race or age group. A number of previous b-CFS studies went one step further and took a difference in breakthrough from CFS as evidence for differential unconscious processing occurring while stimuli are still suppressed (e.g., Jiang et al., 2007; Stein et al., 2011c). Most commonly, this inference rested on the comparison to a binocular control condition not involving CFS. However, we have recently provided theoretical and empirical reasons that question the logic of relying on a control condition to infer unconscious processing under interocular suppression (Stein et al., 2011a; Stein and Sterzer, 2014). Therefore, following other recent b-CFS studies (e.g., Yang et al., 2007; Tsuchiya et al., 2009; Zhou et al., 2010b; Stein et al., 2014), here we did not include a binocular control condition and do not claim that differences in suppression durations necessarily reflect differential unconscious processing of facial race and age under CFS.

One may still argue that, because faces presented under CFS went undetected for several seconds, differences in breakthrough need to reflect unconscious processing of facial race and age during this long period of subjective invisibility. However, comparable detection latencies can be obtained with techniques other than CFS, such as difficult visual search for faces (e.g., Garrido et al., 2008). Thus, the mere length of overall response times cannot be taken as proof of unconscious processing. To provide unequivocal evidence for unconscious processing, one would need to demonstrate that a subliminal stimulus that is rendered permanently invisible still has some influence on a measure of perceptual or cognitive processing (Stein et al., 2011a; Stein and Sterzer, 2014). Adopting this dissociation logic, neuroimaging studies revealed that neural responses differentiate between invisible faces and non-face stimuli (e.g., Jiang and He, 2006; Sterzer et al., 2008, 2009; for a review see Sterzer et al., 2014). There is only limited evidence, however, for specific facial features being processed unconsciously (Adams et al., 2010; Xu et al., 2011). Most studies indicate that the representation of facial shape, gender, identity, expression, and eye gaze requires awareness (Moradi et al., 2005; Shin et al., 2009; Yang et al., 2010; Amihai et al., 2011; Stein and Sterzer, 2011; Stein et al., 2012a). Amihai et al. (2011) found that faces rendered invisible through CFS failed to induce race adaptation aftereffects, indicating that there is no unconscious processing of facial properties that discriminate faces from different races. It thus appears more likely that the own-race and own-age biases observed in the present study reflect processing differences at the transition to awareness (cf. Gayet et al., 2014), that is, differences in stimulus detectability (Stein et al., 2011a; Stein and Sterzer, 2014).

## **POSSIBLE NEURAL MECHANISMS**

Previously, we found evidence for face detection mechanisms in adult observers being broadly tuned to all head-shaped visual patterns with two dark blobs over one dark blob on a lighter background (Stein et al., 2011b), similar to the face-like stimuli that optimally drive newborns' orienting behavior (e.g., Farroni et al., 2005). This finding led us to speculate that the initial detection of a face might rely on an inborn subcortical face detection pathway involving the superior colliculus, pulvinar, and the amygdala (de Gelder et al., 2003; Johnson, 2005; Nguyen et al., 2013). The present findings appear inconsistent with such a coarsely tuned subcortical face detection pathway, because the processing of relatively subtle differences in face shape and surface reflectance likely requires more elaborate cortical visual processing. Indeed, face-sensitive cortical visual regions such as fusiform and occipital face areas exhibit differential responses to own- and other-race faces (Golby et al., 2001; Feng et al., 2011; Natu et al., 2011) and, possibly, to own- and other-age faces (Ebner et al., 2013). Moreover, the impact of face inversion on the early face-sensitive event-related potential N170 is larger for own- than other-race faces (Vizioli et al., 2010). Consistent with the present findings, this suggests that early cortical markers of structural face encoding are finely tuned to own-race faces. One important task for future studies is to directly relate these neural measures of face processing to facilitated awareness of own-race and own-age faces. Another interesting direction for future neuroimaging work is to determine whether cortical responses to faces suppressed through CFS (Jiang and He, 2006; Sterzer et al., 2008, 2009) distinguish faces from different race and age groups without awareness.

#### **CONCLUSION**

In conclusion, the modulation of FIEs by race and age revealed in the present study demonstrates that the perceptual mechanisms governing awareness of faces are not only sensitive to the spatial configuration of facial parts, but also to variations in face shape and surface reflectance properties. These findings show that face detection mechanisms are more complex than previously thought and provide a first indication that experience fine-tunes the earliest levels of visual processing to faces from our own social groups.

#### **ACKNOWLEDGMENTS**

Philipp Sterzer and Timo Stein were supported by the German Research Foundation (grants STE 1430/2-1, STE 1430/6-1, and STE 2239/1-1).


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2014; accepted: 14 July 2014; published online: 01 August 2014*. *Citation: Stein T, End A and Sterzer P (2014) Own-race and own-age biases facilitate visual awareness of faces under interocular suppression. Front. Hum. Neurosci. 8:582. doi: 10.3389/fnhum.2014.00582*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Stein, End and Sterzer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Altering second-order configurations reduces the adaptation effects on early face-sensitive event-related potential components

## *Pál Vakli<sup>1</sup>, Kornél Németh<sup>1</sup>, Márta Zimmer<sup>1</sup>, Stefan R. Schweinberger<sup>2,3</sup> and Gyula Kovács<sup>1,2,3</sup>\**

<sup>1</sup> Department of Cognitive Science, Budapest University of Technology and Economics, Budapest, Hungary

<sup>2</sup> Institute of Psychology, Friedrich Schiller University of Jena, Jena, Germany

<sup>3</sup> DFG Research Unit Person Perception, Friedrich Schiller University of Jena, Jena, Germany

#### *Edited by:*

Davide Rivolta, University of East London, UK

#### *Reviewed by:*

Davide Rivolta, University of East London, UK Romina Palermo, University of Western Australia, Australia

#### *\*Correspondence:*

Gyula Kovács, Institute of Psychology, Friedrich Schiller University of Jena, Leutragraben 1, 07743 Jena, Germany e-mail: gyula.kovacs@uni-jena.de

The spatial distances among the features of a face are commonly referred to as second-order relations, and the coding of these properties is often regarded as a cornerstone of face recognition. Previous studies have provided mixed results regarding whether the N170, a face-sensitive component of the event-related potential, is sensitive to second-order relations. Here we investigated this issue in a gender discrimination paradigm following long-term (5 s) adaptation to normal or vertically stretched male and female faces, considering that the latter manipulation substantially alters the position of the inner facial features. Gender-ambiguous faces were more likely judged to be female following adaptation to a male face and vice versa. This aftereffect was smaller, but still statistically significant, after adaptation to vertically stretched adapters than after adaptation to unstretched adapters. Event-related potential recordings revealed that adaptation effects measured on the amplitude of the N170 were strongly modulated by the second-order relations of the adapter: N170 amplitude was reduced after adaptation to faces, but this reduction was smaller after adaptation to stretched than to unstretched faces. These findings suggest that early face processing, as reflected in the N170 component, proceeds by extracting the spatial relations of inner facial features.

**Keywords: second-order relations, face processing, N170, neural adaptation, face aftereffect**

#### **INTRODUCTION**

Human faces invariably contain the same basic features positioned in the same fashion. This basic feature configuration is called first-order relational information (CONF1st; Diamond and Carey, 1986) and distinguishes the category of faces from other non-face object categories (Maurer et al., 2002). The variation in metric distances between these facial features is referred to as second-order relational information (CONF2nd; Diamond and Carey, 1986). Results show that humans are highly sensitive to CONF2nd (Haig, 1984), and it has been suggested that this information is important for face recognition and the discrimination of individual faces from each other (Tanaka and Farah, 1991; Tanaka and Sengco, 1997; Leder and Bruce, 2000; Rotshtein et al., 2007; Richler et al., 2009).

Although previous results underline the importance of CONF2nd in the representation of face identity, this view has been challenged more recently. First, it has been shown that face recognition based exclusively on these properties is relatively poor when they remain within the range of real-world variations (Taschereau-Dumouchel et al., 2010). Second, geometrical distortions that affect second-order relations have little or no effect on face recognition performance either (Hole et al., 2002), suggesting that the extraction of simple distances between facial features is not crucial for face recognition.

In the past few years, electrophysiological studies have focused on the N170 event-related potential (ERP) component or on its magneto-encephalographic counterpart, the M170, which are face-specific in the sense that they are usually larger to faces than to non-face objects (Bentin et al., 1996; Itier and Taylor, 2004; Gao et al., 2013; Rivolta et al., 2014; for review see Eimer, 2011; Rossion and Jacques, 2011). It has been suggested that the N170 is sensitive to the CONF1st of faces. For example, presenting the same facial features in a scrambled configuration reduces the amplitude of the N/M170 (e.g., Bentin et al., 1996; Gao et al., 2013), while stimulus inversion, which interrupts configural face processing (Yin, 1969), delays and enhances the N170 as compared to upright faces (Eimer, 2000a; Rossion et al., 2000; Wiese et al., 2009). Therefore it seems that the N/M170 electromagnetic component is associated with the early and generic structural processing of faces, related to the category of faces per se (Bentin et al., 1996; Jeffreys, 1996; Schendan et al., 1998; Eimer, 2000a,b; Joyce and Rossion, 2005; Kloth et al., 2010; Ganis et al., 2012; Gao et al., 2013).

One aspect, however, that has remained largely neglected is the relation of the N/M170 to the processing of CONF2nd. Some results suggest that the N170 is relatively insensitive to manipulations that change the CONF2nd. In a previous study using a passive viewing paradigm, altering faces by displacing the eyes and mouth and hence changing the CONF2nd while leaving the CONF1st intact did not modulate the amplitude or the latency of the N170 component (Halit et al., 2000). The N170 was, however, larger in amplitude in response to faces that were judged atypical and unattractive when compared to typical and attractive ones. The authors concluded that the N170 may be related to the encoding of faces in relation to a general face prototype, whereas individual recognition mechanisms may be reflected in the later P2 component, which indeed showed sensitivity to the configural modification of faces (Halit et al., 2000). In a more recent experiment, participants were presented with pairs of faces that differed either in their local features or their CONF2nd properties (Mercure et al., 2008). The N170 did not differ between featurally and configurally manipulated faces, either when the participants had to make same/different judgements or when they were explicitly instructed to focus on the featural or configural differences between the members of each face pair. On the other hand, other studies suggest that the N170 of the right hemisphere reflects neural functions that are related to the processing of CONF2nd as well (Scott and Nelson, 2006; Zimmer and Kovacs, 2011). Scott and Nelson (2006) recorded ERPs to previously familiarized faces in which either the eyes and mouth were displaced while leaving the CONF1st unaffected, or the same features were replaced by those of another individual without any change in their position. In a passive viewing paradigm, the overall amplitude and latency of the N170 did not differ in response to the original familiar and modified face stimuli. On the other hand, when analyzing difference waveforms (obtained by subtracting the ERP responses for the altered faces from those evoked by the original ones), the authors found a greater N170 amplitude difference for configural than for feature changes over the right hemisphere. The opposite pattern was observed over the left hemisphere. This result is indicative of the role of CONF2nd in the processing of faces as reflected in the N170 component.
Moreover, it has also been demonstrated that adaptation of the N170, that is, the reduction of its amplitude to face repetition, is evident and even enhanced over the right hemisphere for faces with expanded and contracted inner features (Zimmer and Kovacs, 2011).

Taken together, the few studies mentioned above yield mixed results regarding whether the N170 reflects face processing mechanisms engaged in the coding of CONF2nd of faces. Another stimulus manipulation that changes the aspect ratio and hence the CONF2nd of faces without affecting CONF1st is stretching the entire face along one of its axes (Hole, 2011). It has been shown that human face recognition is surprisingly robust to stretching (Hole et al., 2002; Bindemann et al., 2008). In a repetition-priming paradigm, Bindemann et al. (2008) found that the presentation of stretched and normally proportioned primes leads to no repetition-related effects for the N170 at all, and repetition effects in the subsequent N250r component were equivalent for both prime conditions. However, recent results suggest that distinct neural mechanisms underlie priming and adaptation aftereffects (Walther et al., 2013). More specifically, Walther et al. (2013) have shown that behavioral priming (reduced response times and increased accuracy in identity classification for repeated faces) and aftereffects (contrastive perceptual biases in identity judgment) can be demonstrated within a single paradigm for unambiguous and ambiguous faces, respectively. Importantly, the two effects never occurred concurrently for the same stimuli, indicating that distinct mechanisms can account for these phenomena. Therefore it is possible that the paradigm of Bindemann et al. (2008) is less suited to test the earlier structural encoding steps of face processing reflected in the N170. In the current experiment we applied an adaptation paradigm (Webster and MacLin, 1999) involving face gender judgments that has previously been shown to lead to robust reductions of the N/M170 (Kovacs et al., 2006; Harris and Nakayama, 2008; Kloth et al., 2010) to test whether changing the aspect ratio of faces changes the adaptation of the N170 as well.
We hypothesized that if the N170 reflects solely the processing of the CONF1st of a face, then the adaptation effect on the N170 should be similar for the normal and stretched adaptor conditions. Alternatively, if the extraction of CONF2nd is also reflected in the N170, then changing the aspect ratio of the adaptor face should decrease the N170 adaptation effect, that is, a smaller amplitude reduction or no amplitude reduction at all is expected when compared to normally proportioned adapters.

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Twelve naive, healthy volunteers (8 females) with normal or corrected-to-normal vision served as subjects (mean age: 21.55 ± 2.42 years) and gave written informed consent. The study conformed to the protocols approved by the Ethical Committee of the Budapest University of Technology and Economics.

#### **STIMULI**

Face stimuli (gray-scale full-front images, mean luminance = 1.17 cd/m<sup>2</sup>, three young males and three young females) were identical to those of Kovacs et al. (2006), had no obvious gender-specific features, and were fit behind an oval mask (6° × 5.9°). Female-male pairs were entered into a landmark-based morphing algorithm (Winmorph 3.01). Ten faces, ranging from 100% female to 100% male in 10% steps, were created (leaving out the 50/50% level) and were used as test stimuli. Additional typical female (NORMF) and male (NORMM) faces were chosen as adapters (luminance = 1.1 cd/m<sup>2</sup>). These images were vertically stretched (STRF and STRM) by 110% and horizontally compressed by 37% and were used as adapters as well. The Fourier phase randomized version (Nasanen, 1999) of a normal face was created and served as an adapter in the control (CTRL) condition. This image lacked any shape information while it preserved the amplitude spectrum of the original image. The inclusion of this stimulus condition was necessary for the ERP analysis in order to assess the putative, category-level N170 adaptation effect; that is, the amplitude reduction in response to face repetition when compared to a condition in which the face is preceded by a non-face stimulus (Kovacs et al., 2006, 2007; Kloth et al., 2010). Thus, five adapter conditions (CTRL, NORMF, NORMM, STRF, STRM) were used in total. To control for low-level adaptation, and since previous studies suggested that the N170 is, to a large extent, independent of the size of the stimuli (Jeffreys, 1996), all adapters differed in size from the targets (NORM: 6.8° × 6.3°, STR: 6.8° × 2.4°) and the position of the test stimulus varied randomly within a 1° range along the horizontal and vertical dimensions in each trial.
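The idea behind the phase-randomized control adapter can be illustrated with a minimal sketch. It is not the MATLAB procedure used for the actual stimuli, nor necessarily the method of Nasanen (1999). It uses a common trick as an assumption: the amplitude spectrum of the image is kept, while its phase spectrum is replaced by that of a random-noise image, which guarantees a real-valued result. The naive transform and all function names are illustrative and only practical for tiny arrays.

```python
import cmath
import math
import random


def dft2(img):
    """Naive 2-D discrete Fourier transform of a small image (list of rows)."""
    n, m = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / n + v * x / m))
                 for y in range(n) for x in range(m))
             for v in range(m)]
            for u in range(n)]


def idft2(spec):
    """Inverse of dft2; returns complex pixel values."""
    n, m = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * math.pi * (u * y / n + v * x / m))
                 for u in range(n) for v in range(m)) / (n * m)
             for x in range(m)]
            for y in range(n)]


def phase_randomize(img, seed=0):
    """Keep the amplitude spectrum of img, borrow the phases of a noise image."""
    rng = random.Random(seed)
    n, m = len(img), len(img[0])
    noise = [[rng.random() for _ in range(m)] for _ in range(n)]
    img_spec, noise_spec = dft2(img), dft2(noise)
    mixed = [[abs(img_spec[u][v]) * cmath.exp(1j * cmath.phase(noise_spec[u][v]))
              for v in range(m)]
             for u in range(n)]
    # Imaginary parts are numerically zero because the mixed spectrum is Hermitian
    return [[value.real for value in row] for row in idft2(mixed)]


# Toy 6 x 6 "image" standing in for a face photograph
image = [[float((3 * y + 5 * x) % 7) for x in range(6)] for y in range(6)]
scrambled = phase_randomize(image)

amp_in = [[abs(c) for c in row] for row in dft2(image)]
amp_out = [[abs(c) for c in row] for row in dft2(scrambled)]
max_amp_diff = max(abs(a - b)
                   for row_a, row_b in zip(amp_in, amp_out)
                   for a, b in zip(row_a, row_b))
max_pixel_diff = max(abs(a - b)
                     for row_a, row_b in zip(image, scrambled)
                     for a, b in zip(row_a, row_b))
print(f"max amplitude difference: {max_amp_diff:.2e}")
print(f"max pixel difference: {max_pixel_diff:.2f}")
```

The amplitude spectra of the original and the scrambled array agree to numerical precision, whereas the pixel values differ. This mirrors the property described above: the spatial structure is destroyed while the amplitude spectrum is preserved.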

#### **PROCEDURE**

Stimuli were presented centrally (21" monitor, resolution = 1024 × 768, 60 Hz vertical refresh rate; viewing distance = 72 cm) on a uniform gray background (luminance = 1.3 cd/m<sup>2</sup>). The five adaptor conditions (CTRL, NORMF, NORMM, STRF, STRM) were given in separate blocks (pseudo-randomized order) with short breaks in between. All software was written in MATLAB 6.5 (Mathworks Inc.) using Psychtoolbox 2.45. Subjects were tested in a dimly lit room (background luminance <1 cd/m<sup>2</sup>). They were instructed to fixate a central cross and to perform a two-alternative forced choice gender discrimination task on the test faces. Stimuli were presented according to the method of constant stimuli. The adapter was presented for 5000 ms, followed by a 550 ms gap, and then the test face was presented for 200 ms (**Figure 1**). Within a block, each test stimulus was presented 5 times, yielding 150 trials in each block. The total recording time was approximately 90 min.
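The trial counts and timing described above can be cross-checked with a little arithmetic. The sketch below assumes that the 150 trials per block arise from morph levels × repetitions × morph continua, which is an inference and not stated explicitly, and it takes the 150 ms fixation period from the Figure 1 caption. Response times and breaks are not modeled, so the result is a lower bound on session length.

```python
FIXATION_MS = 150  # from the Figure 1 caption
ADAPTER_MS = 5000
GAP_MS = 550
TEST_MS = 200

# Morph levels in % male; the 50/50 level was left out
MORPH_LEVELS = [p for p in range(0, 101, 10) if p != 50]
REPETITIONS = 5
TRIALS_PER_BLOCK = 150
N_BLOCKS = 5

# Assumption: trials per block = morph levels x repetitions x morph continua
n_continua = TRIALS_PER_BLOCK // (len(MORPH_LEVELS) * REPETITIONS)

trial_ms = FIXATION_MS + ADAPTER_MS + GAP_MS + TEST_MS
stimulation_min = N_BLOCKS * TRIALS_PER_BLOCK * trial_ms / 60000

print(f"{len(MORPH_LEVELS)} morph levels, {n_continua} implied morph continua")
print(f"{trial_ms} ms of stimulation per trial, {stimulation_min:.2f} min in total")
```

Under these assumptions the pure stimulation time is just under 74 min, which is compatible with a total recording time of approximately 90 min once responses and breaks are added.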

#### **ELECTROPHYSIOLOGICAL RECORDINGS**

ERPs were recorded via 32 Ag/AgCl electrodes placed according to the 10/20 system (impedances <5 kΩ, sampling rate: 1000 Hz, ground: FT9, reference: AFz). EEG was segmented offline [Brain-Vision Analyser (Brain Products GmbH)] into 1100 ms long trials including a 100 ms prestimulus interval. Trials containing blinks, movements, A/D saturation or EEG baseline drift were rejected.

**FIGURE 1 | Experimental protocol and adaptor images.** In the beginning of each trial, a fixation cross was presented in the center of the screen for 150 ms, followed by one of the five adaptor images (from top to bottom: NORMF, NORMM, STRF, STRM, and CTRL) which was visible for 5000 ms. This was followed by the presentation of a blank screen for 550 ms, and then the test face was displayed for 200 ms.

After artifact rejection 92% of the trials remained available for further analysis. ERPs were averaged separately for each subject, condition and channel. Averages were then digitally filtered (0.5–25 Hz) with a zero phase shift digital filter and were re-referenced to average.

## **DATA ANALYSIS**

Behavioral data was modeled by the Weibull psychometric function (Psignifit; Wichmann and Hill, 2001). A three-way repeated measures analysis of variance (ANOVA) was conducted with adapter configuration (2 – NORM, STR), adapter gender (2 – F, M) and morph-level (10) as within-subject factors on the participants' female-male decisions. As we were interested in comparing the aftereffects in case of normal and stretched adapters, and our control stimulus was neither matched to the configuration, nor to the gender of the adaptor faces, we excluded this condition from the statistical analysis. To compare the magnitude of adaptation directly in the NORM and STR conditions, we first calculated the magnitude of the aftereffect by subtracting the percentage of trials endorsed as female obtained during the female adapted conditions from that of the male adapted condition, separately for NORM and STR. Next, the magnitude of aftereffect was subjected to a two-way within-subject ANOVA with configuration (2) and morph-level (10) as factors.
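The aftereffect magnitude defined above is a simple per-morph-level difference score. The following sketch shows the computation for a single hypothetical participant. All percentages are invented for illustration and are not data from the study, and the function and variable names are assumptions.

```python
MORPH_LEVELS = [0, 10, 20, 30, 40, 60, 70, 80, 90, 100]  # % male in the test face


def aftereffect_magnitude(pct_female_after_male, pct_female_after_female):
    """Percent 'female' responses after male adapters minus after female adapters."""
    if len(pct_female_after_male) != len(pct_female_after_female):
        raise ValueError("both conditions need one value per morph level")
    return [m - f for m, f in zip(pct_female_after_male, pct_female_after_female)]


# Hypothetical percentages of 'female' responses per morph level (illustration only)
norm_after_male = [100, 100, 96, 90, 78, 52, 30, 14, 6, 2]
norm_after_female = [98, 94, 82, 60, 40, 18, 8, 4, 2, 0]
str_after_male = [100, 98, 92, 84, 70, 44, 26, 12, 6, 2]
str_after_female = [98, 94, 84, 64, 46, 24, 12, 6, 4, 0]

norm_effect = aftereffect_magnitude(norm_after_male, norm_after_female)
str_effect = aftereffect_magnitude(str_after_male, str_after_female)

for level, n_eff, s_eff in zip(MORPH_LEVELS, norm_effect, str_effect):
    print(f"{level:3d}% male: NORM {n_eff:3d}, STR {s_eff:3d} percentage points")
```

Positive values indicate the expected contrastive aftereffect. In the analysis described above, such difference scores (one per participant, configuration, and morph level) enter the two-way ANOVA.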

Analyses of the ERP waveforms included the amplitude and latency of three major components: (1) the P100 (measured at O1, O2), defined as a main positive deflection around 110 ms; (2) the N170 (P7/P8, P9/P10, PO7/PO8, PO9/PO10; Eimer, 2000a; Rossion et al., 2000); and (3) the P200 (O1/O2, P5/P6, PO3/PO4, PO7/PO8). After averaging, the individual peak amplitudes were measured for each subject and condition in the time windows of 70–130 ms (P100), 140–210 ms (N170) and 215–320 ms (P200). Latencies were measured at the peak amplitudes. Categorical adaptation effects were determined by comparing the ERP responses found in NORM and STR to those in CTRL. To obtain a sufficient number of trials, data was collapsed across the female and male adaptor conditions as well as across the 10 different target morph-levels (Kovacs et al., 2006; Zimmer and Kovacs, 2011). Amplitude and latency values were entered into a three-way repeated measures ANOVA with adapter type (3: CTRL, NORM, STR), hemisphere (2) and electrode (N170: 4, P200: 4) as within-subject factors. P100 amplitude and latency values were analyzed using a two-way repeated measures ANOVA with adapter type (3) and hemisphere (2) as within-subject factors. All analyses involved Greenhouse-Geisser adjusted degrees of freedom to correct for non-sphericity. *Post hoc* comparisons were made using Bonferroni tests.
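Peak measurement within fixed time windows, as described above, can be sketched as follows. The windows and component polarities are taken from the text, whereas the waveform is synthetic (a sum of three Gaussian deflections) and the function names are illustrative. The peak detection of the analysis software actually used may differ in detail.

```python
import math

# (start ms, end ms, polarity): +1 searches for a maximum, -1 for a minimum
WINDOWS_MS = {
    "P100": (70, 130, +1),
    "N170": (140, 210, -1),
    "P200": (215, 320, +1),
}


def find_peak(times_ms, amplitudes, start_ms, end_ms, polarity):
    """Return (latency, amplitude) of the most extreme sample inside the window."""
    candidates = [(polarity * amp, t, amp)
                  for t, amp in zip(times_ms, amplitudes)
                  if start_ms <= t <= end_ms]
    if not candidates:
        raise ValueError("no samples inside the requested window")
    _, latency, amplitude = max(candidates, key=lambda item: item[0])
    return latency, amplitude


def gaussian(t, center, width):
    return math.exp(-((t - center) ** 2) / (2 * width ** 2))


# Synthetic 1100 ms epoch sampled at 1000 Hz with a 100 ms prestimulus interval
times = list(range(-100, 1000))
erp = [3.0 * gaussian(t, 110, 10) - 5.0 * gaussian(t, 170, 12)
       + 4.0 * gaussian(t, 250, 20) for t in times]

peaks = {name: find_peak(times, erp, start, end, pol)
         for name, (start, end, pol) in WINDOWS_MS.items()}

for name, (latency, amplitude) in peaks.items():
    print(f"{name}: {amplitude:.2f} (arbitrary units) at {latency} ms")
```

Applied to each subject, condition, and electrode, a routine of this kind yields the amplitude and latency values that enter the repeated measures ANOVAs.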

## **RESULTS**

## **BEHAVIORAL RESULTS**

Subjects could solve the gender-discrimination task (**Figure 2**), as suggested by the significant main effect of morph-level [*F*(1.98,21.79) = 396.65, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.97]. The comparison of female and male adaptor conditions confirmed previous findings in the sense that adaptation to a face with a given gender biases perception towards the opposite gender (Kovacs et al., 2006; Kloth et al., 2010). This is expressed by the fact that significantly more faces were judged as female after being adapted to a male face and vice versa [main effect of adapter gender: *F*(1,11) = 139.17, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.93]. In addition, the aftereffect was larger for intermediate than for less ambiguous morph-levels [adapter gender × morph-level interaction: *F*(9,99) = 11.7, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.52]. This effect was independent of the adapter configuration as the three-way interaction was not significant [*F*(3.1,34.13) = 1.47, *p* = 0.24, η<sub>p</sub><sup>2</sup> = 0.12]. The main effect of configuration tended to show a stronger aftereffect for NORM when compared to STR [*F*(1,11) = 4.43, *p* = 0.059, η<sub>p</sub><sup>2</sup> = 0.29] and it showed a significant interaction with adapter gender [*F*(1,11) = 5.54, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.33]. *Post hoc* tests confirmed the presence of aftereffects in case of NORM and STR adaptors as well; significantly more faces were judged as female following adaptation to either normal (*p* < 0.0001) or stretched male faces (*p* < 0.01) when compared to their female counterparts. No other main effects or interactions were significant.

The direct comparison of the magnitude of aftereffect (see Materials and Methods) for the two configuration conditions showed that the aftereffect is significantly larger for NORM adapters when compared to STR [main effect of configuration: *F*(1,11) = 5.54, *p* = 0.038, η<sub>p</sub><sup>2</sup> = 0.33]. The aftereffect was larger for the ambiguous faces when compared to less ambiguous ones [main effect of morph-level: *F*(9,99) = 11.72, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.51]. Altogether, these results suggest that adapting to a stretched face is able to bias the perception of a subsequent ambiguous face, but to a lesser degree than a normally proportioned adapter does.

#### **EVENT-RELATED POTENTIAL RESULTS**

The early component peaks P1, N170, and P200 were observable at their typical latencies in the event-related potential following the onset of the test faces (**Figure 3**). The N170 was strongly affected by the type of adaptor image (**Figure 4**) in the sense that both NORM and STR led to lower amplitudes than CTRL [**Figure 5**; main effect of adaptation: *F*(2,22) = 49.44, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.82]. This adaptation effect was smaller over the left when compared to the right hemisphere [interaction of hemisphere and adapter condition: *F*(2,22) = 12.6, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.53] and somewhat larger for more superior (P7, P8, PO7, PO8) when compared to more inferior electrodes [P9, P10, PO9, PO10; electrode × adapter interaction: *F*(1.92,21.07) = 6.3, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.37]. STR led to lower N170 amplitudes than CTRL (*post hoc* test: *p* < 0.0001 for both hemispheres), reflecting categorical adaptation effects in spite of the changes in CONF2nd. However, STR led to significantly higher N170 amplitudes than NORM (*p* < 0.001 for both hemispheres), suggesting that the alterations of CONF2nd modulate the adaptation processes as well.

A significant main effect of adapter condition was observed [*F*(2,22) = 9.93, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.47] due to the N170 latencies being significantly longer after being adapted to NORM when compared to CTRL (*p* < 0.001). In addition, the latencies were significantly shorter over the right when compared to the left hemisphere [main effect of hemisphere: *F*(1,11) = 17.17, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.61] and over P9/P10 when compared to the electrodes P7/P8 (*p* < 0.01) and PO7/PO8 [*p* < 0.01; main effect of electrode: *F*(2.13,23.42) = 8.11, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.42]. Altogether these results suggest that the early and generic structural steps of face processing, reflected in the N170, are sensitive to both the first and second-order configuration changes of the stimuli.

The amplitude of the P200 was significantly higher over the right when compared to the left hemisphere [main effect of hemisphere; *F*(1,11) = 5.02, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.31]. A significant main effect of electrode [*F*(2.14,23.5) = 4.64, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.3] was also observed due to the P200 being higher in amplitude over the PO3/PO4 when compared to the P5/P6 electrodes (*p* < 0.01). In addition, a significant adapter condition × hemisphere interaction [*F*(2,22) = 4.86, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.30] was observed. *Post hoc* comparisons, however, failed to show any consistent adaptation effects over either hemisphere. For the latency of the P200, only a significant main effect of electrode was observed [*F*(3,33) = 5.11, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.32] due to the P200 being longer in latency over the P5/P6 when compared to the O1/O2 (*p* < 0.05) and PO7/PO8 electrodes (*p* < 0.05). Finally, regarding the latency of the P100 we observed a significant main effect of adapter condition [*F*(2,22) = 10.79, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.5]. *Post hoc* tests revealed that the P100 peaked later after adaptation to NORM when compared to CTRL (*p* < 0.05) and STR (*p* < 0.001). No other main effects and interactions regarding P100 latencies or amplitudes were significant.

**FIGURE 4 | Grand average ERPs for the CTRL (gray line), NORM (black continuous line) and STR (black dashed line).** The bottom of the figure depicts the topographical maps of the activity, with white dots marking the electrode locations used for N170 analysis (10 ms time-window centered on the peak). Negativity is indicated by blue.

#### **DISCUSSION**

In the present study we tested the effect of CONF2nd on gender-specific aftereffects by vertically stretching the adaptor images. We found contrastive biases in gender perception after adaptation to normal and vertically stretched faces. This finding corroborates the results of previous experiments demonstrating that gender-ambiguous faces are perceived as more masculine after prolonged exposure to a female face and vice versa (Webster et al., 2004; Kovacs et al., 2006; Kloth et al., 2010; Zhao et al., 2011). However, the strength of the aftereffect was smaller for stretched than for unstretched adapters, which suggests that the aftereffects are sensitive to the CONF2nd of the adapters. The pattern of our results implies that gender-specific aftereffects arise partially from processes sensitive to the CONF2nd of faces. This is surprising, given that (1) stretching of a face leaves face-recognition performance unaffected (Hole et al., 2002; Bindemann et al., 2008) and (2) aftereffects are suggested to have greater transfer across transformations preserving identity (Yamashita et al., 2005). Our results challenge this theory (see Tillman and Webster, 2012 for similar conclusions) and suggest that changes of CONF2nd affect gender-specific aftereffects, in spite of their identity-preserving nature. Previous studies have emphasized the role of features in the perception of face gender, since face parts such as the eyebrows, eyes or mouth convey sufficient information for gender discrimination even when they are presented in isolation (Brown and Perrett, 1993; Yamaguchi et al., 1995). Nevertheless, there is evidence for the contribution of relational information to the perception of face gender as well. For example, changes in eyebrow-eyelid distance have been shown to affect gender classification performance (Campbell et al., 1999). Thus, it is possible that stretching faces in the present study altered such relational cues and hence affected the masculinity/femininity of the adaptor faces, which resulted in the decrease of gender-specific aftereffects.

It is important to note that the distortions applied in our study changed substantially the second-order relations of the face; however, they affected the shape of local features as well. The importance of the second-order relations in face discrimination is supported by the observation that differences in the metric distances between facial features play a significant role in perceiving two faces as same or different as well (Rotshtein et al., 2007). Further studies [e.g., by applying the so-called "Jane stimuli" (see Mondloch et al., 2002)] are necessary to investigate the relative contribution of facial features and second-order relations on the N170 adaptation effects.

So far, very few studies have tested the effect of face configuration on N/M170 and these experiments could convincingly show its sensitivity to the CONF1st (Bentin et al., 1996; Eimer, 2000a; Rossion et al., 2000; Gao et al., 2013). Prior work regarding CONF2nd led to equivocal results, with studies emphasizing either the relative insensitivity (Halit et al., 2000; Bindemann et al., 2008; Mercure et al., 2008) or sensitivity (Scott and Nelson, 2006; Zimmer and Kovacs, 2011) of N170 to CONF2nd. The current results show category-specific adaptation effects for STR in the form of lower N170 amplitudes when compared to CTRL, but this adaptation effect was smaller than the one observed for NORM. This suggests that the generic, category-specific face-processing steps, reflected in these comparisons of the N170 (Kloth et al., 2010), mirror both the first and second-order properties of stimuli.

Previous studies that failed to demonstrate sensitivity to CONF2nd in the N170 time window typically compared the overall neural response (i.e., the amplitude and latency of the N170) to intact and configurally altered face stimuli (Halit et al., 2000; Mercure et al., 2008). Assessing the effect of stimulus repetition on neural responses, on the other hand, offers a more sensitive method to disentangle the nature of neural representations in a specific brain area or time window. This approach proved to be effective in functional imaging research with the presumption that repetition reduces neural activity only if the subsequently presented stimuli activate the same neural population. This allows for the identification of separate subpopulations of neurons selective for a particular stimulus attribute whose responses cannot be discerned when measuring the overall neural activity (Grill-Spector et al., 2006; Krekelberg et al., 2006). In this respect, our results complement previous findings demonstrating that face-selective areas of the human occipito-temporal cortex show less adaptation to repeated faces when they differ in their second-order relations (Rhodes et al., 2009). To conclude, it is possible that the modulation of the neuronal responses by adaptation is more sensitive to the relatively small changes of CONF2nd stimulus manipulations when compared to the absolute electrophysiological response (for the comparison of stimulus selectivity of neural response and adaptation see Sawamura et al., 2006), explaining why previous studies did not find the N170 to be sensitive to CONF2nd.

Previous studies have shown that face gender aftereffects are accompanied by the reduction of the BOLD signal in the fusiform face area and the occipital face area (Kovacs et al., 2008; Nagy et al., 2012). On the basis of these results, it is possible that the aftereffects observed in the present study reflect the adaptation of these face-selective cortical areas; however, this claim should be investigated with functional imaging methods.

Evidence is surprisingly scarce regarding the physiological mechanisms underlying face adaptation aftereffects. It has been shown that cholinergic mechanisms play a role in the face repetition effects observed in the fusiform gyrus (Thiel et al., 2002). In the somatosensory domain, the contribution of glutamatergic neural systems to perceptual adaptation has been demonstrated (Folger et al., 2008). Thus, while certainly speculative, it is possible that cholinergic and glutamatergic neurotransmitter pathways play a role in the face adaptation effects we observed. Further studies could investigate this possibility by means of specific neuro-pharmacological testing.

## **CONCLUSION**

The present results demonstrate that facial aftereffects evoked by adaptation to normal or vertically stretched faces show sensitivity to second-order relations of facial features. In accordance with the behavioral results, adaptation effects on the N170 ERP component were present, but were smaller in magnitude, after being adapted to stretched faces, suggesting the sensitivity of N170 to second-order relations manipulated by linear distortion.

## **AUTHOR CONTRIBUTIONS**

Designed the experiment: Stefan R. Schweinberger, Gyula Kovács, Márta Zimmer; data acquisition: Kornél Németh, Pál Vakli; data analyses: Kornél Németh, Pál Vakli, Gyula Kovács; interpretation of the data: Kornél Németh, Pál Vakli, Stefan R. Schweinberger, Gyula Kovács; provided materials: Kornél Németh, Pál Vakli, Márta Zimmer, Gyula Kovács; wrote the article: Kornél Németh, Pál Vakli, Márta Zimmer, Stefan R. Schweinberger, Gyula Kovács; proofed/revised the article: Stefan R. Schweinberger, Kornél Németh, Márta Zimmer, Gyula Kovács.

### **ACKNOWLEDGMENTS**

This work was supported by the Hungarian Scientific Research Fund (OTKA) PD 101499 (Márta Zimmer), by the Deutsche Forschungsgemeinschaft (KO 3918/1-2, 2-1; Gyula Kovács) and by the National Development Agency (TÁMOP; TÁMOP-4.2.2/B-10/1-2010-0009; Pál Vakli).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 March 2014; accepted: 28 May 2014; published online: 12 June 2014. Citation: Vakli P, Németh K, Zimmer M, Schweinberger SR and Kovács G (2014) Altering second-order configurations reduces the adaptation effects on early face-sensitive event-related potential components. Front. Hum. Neurosci. 8:426. doi: 10.3389/fnhum.2014.00426*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Vakli, Németh, Zimmer, Schweinberger and Kovács. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Facilitated detection of social cues conveyed by familiar faces

## *Matteo Visconti di Oleggio Castello<sup>1†</sup>, J. Swaroop Guntupalli<sup>1†</sup>, Hua Yang<sup>1</sup> and M. Ida Gobbini<sup>1,2</sup>\**

*<sup>1</sup> Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA*

*<sup>2</sup> Department of Medicina Specialistica, Diagnostica e Sperimentale (DIMES), Medical School, University of Bologna, Italy*

#### *Edited by:*

*Aina Puce, Indiana University, USA*

#### *Reviewed by:*

*Joseph Allen Harris, Otto-von-Guericke Universität Magdeburg, Germany Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *\*Correspondence:*

*M. Ida Gobbini, DIMES at the Medical School, University of Bologna, Viale C. Berti-Pichat 5, 40126 Bologna, Italy e-mail: mariaida.gobbini@unibo.it; maria.i.gobbini@dartmouth.edu*

*†These authors have contributed equally to this work.*

Recognition of the identity of familiar faces in conditions with poor visibility or over large changes in head angle, lighting and partial occlusion is far more accurate than recognition of unfamiliar faces in similar conditions. Here we used a visual search paradigm to test if one class of social cues transmitted by faces—direction of another's attention as conveyed by gaze direction and head orientation—is perceived more rapidly in personally familiar faces than in unfamiliar faces. We found a strong effect of familiarity on the detection of these social cues, suggesting that the times to process these signals in familiar faces are markedly faster than the corresponding processing times for unfamiliar faces. In the light of these new data, hypotheses on the organization of the visual system for processing faces are formulated and discussed.

**Keywords: face perception, familiar face recognition, attention, visual search, eye gaze, head angle, social cognition**

## **INTRODUCTION**

In previous work we have proposed that recognition of familiar faces is based on activation of a distributed network of areas including the theory of mind areas and areas involved in the emotional response (Gobbini et al., 2004; Leibenluft et al., 2004; Gobbini and Haxby, 2006, 2007; Gobbini, 2010). In this manuscript we present new data in the context of a series of psychophysical experiments that focus on visual processing of familiar faces.

We are constantly exposed to faces and face perception is extremely efficient and quick. Even in the context of disrupted visual awareness through various forms of masking and interocular suppression, faces seem to be detected and processed by the visual system more so than other categories of stimuli. For example, upright faces break through interocular suppression one-half second faster than do inverted faces, indicating that the upright facial configuration is processed even when the subject is unaware of the image (Jiang et al., 2007; Yang et al., 2007; Zhou et al., 2010). Social cues such as facial expressions, head direction, and eye gaze direction also appear to be processed when the subject is unaware of the face image, as evidenced by faster breakthrough of interocular suppression by faces with fearful expressions, faces presented in full-frontal view, and faces with eye gaze directed at the viewer (Jiang and He, 2006; Yang et al., 2007; Stein et al., 2011; Gobbini et al., 2013a). Neural response to masked or suppressed faces with fearful expression has been reported in the amygdala suggesting the possibility of a subcortical pathway for fast processing of socially relevant stimuli (Morris et al., 1998; Whalen et al., 1998; Williams et al., 2004; and for review see Tamietto and de Gelder, 2010; but see also Pessoa and Adolphs, 2010; and Valdés-Sosa et al., 2011).

Measurement of saccadic reaction has shown that we can detect a face as fast as 100 ms after stimulus onset (Crouzet et al., 2010). Some research supports the idea that faces, like colors, shapes or orientation, might be processed pre-attentively (according to the definition of parallel processing proposed by Treisman and Gelade, 1980), in an automatic way (Hershler and Hochstein, 2005; but see also VanRullen, 2006). Interestingly, the first face-specific evoked potential has been consistently reported at around 170 ms post-stimulus (Bentin et al., 1996; Puce et al., 1999; Eimer and Holmes, 2002), raising the question of which aspect and what level of processing at short latencies (before the N170) is performed to enable rapid face detection.

According to our functional model on face perception (Haxby et al., 2000, 2002) the encoding of the structural aspect of a face that affords recognition of identity is performed by a distinct pathway as compared to the one that is involved with perception of facial movements and, more generally, biological motion (Allison et al., 2000; O'Toole et al., 2002; Winston et al., 2004; Gobbini et al., 2007, 2011; Pitcher et al., 2012). While the ventral temporal pathway, in particular the fusiform gyrus, seems to be involved in recognition of the unchangeable aspect of a face, the posterior superior temporal sulcus (pSTS) seems to be involved with perception of the changeable aspects of a face. The STS also seems to be involved in detecting other people's direction of attention. Neurons in the anterior temporal cortex of the monkey are tuned to direction of others' social attention cues, such as head orientation, eye gaze and body movements (Perrett et al., 1985). In humans, fMRI has shown specific regions, such as the posterior and anterior superior temporal sulcus, the fusiform gyrus, and the medial prefrontal cortex, to be preferentially engaged by eye gaze and head turns, highlighting how dedicated neuronal populations are involved in processing relevant social cues (Hoffman and Haxby, 2000; Pageler et al., 2003; Pelphrey et al., 2003; Engell and Haxby, 2007; Schweinberger et al., 2007; Carlin et al., 2012; and for a review Senju and Johnson, 2009).

We have shown that personally familiar faces are detected more efficiently than are faces of strangers in conditions in which attentional resources are reduced and in which faces are rendered subjectively invisible (Gobbini et al., 2013b). Others have reported faster detection of familiar faces in visual search paradigms (Tong and Nakayama, 1999; see also Deuve et al., 2009) and showed that detecting a specific identity involves a serial search with no pop-out. In Tong and Nakayama (1999), detection of one's own face or a familiar face was faster than detection of unfamiliar faces, with a smaller effect of familiarity on search speed that was not significant in one experiment and less than half of the effect on detection speed in a second experiment.

With the present experiment we tested whether social cues, which are supposedly processed by a distinct pathway from that for identity, are detected more efficiently if conveyed by familiar faces. We predicted that the familiarity of a face affects not only the visual representation of invariant aspects for identification, but also the perception of subtle changes that can signal an internal state, such as direction of attention. The extensive expertise with a familiar face might result in efficient processing that is independent of capture of attention. We used a visual search paradigm in which the task is to detect a target with a specified direction of attention (toward or away from the viewer) as conveyed by the gaze direction or head angle of personally familiar or unfamiliar faces. Importantly, all distractors on target present trials were unfamiliar faces, to avoid confounding faster processing of the target social cue in a familiar face with attentional capture by the familiar face, an effect that would bias the search toward checking the familiar face containing the target feature earlier than the distractor faces (such a confound muddied the interpretation of results in Buttle and Raymond, 2003). If distractors are familiar faces, a shallower slope for the effect of set size on reaction time (response time vs. set size function, RSF) could be due to faster processing of the familiar face distractors rather than to attentional biasing of a serial search, as was the case in Persike et al. (2013). Thus, in our paradigm an effect of the familiarity of the face with the target feature on the RSF would indicate attentional capture unconfounded by faster processing of distractors. Conversely, an effect of familiarity on target social cue detection independent of an effect on RSF would indicate faster processing in familiar faces independent of attentional capture. Results showed no effect of the familiarity of the target face on the RSF, indicating that the main effect of familiarity on reaction time, which was constant across set sizes, was due to faster processing of only the target stimulus, not to altered processing of distractors or to an attention-driven bias to process familiar target stimuli earlier in a visual search.

Thus, our results confirm our prediction. Two facial cues for others' direction of attention, gaze direction and head angle, are detected much faster if the faces are personally familiar, corroborating our previous findings on facilitated detection of personally familiar faces under conditions of lack of awareness and reduced attentional resources (Gobbini et al., 2013b). These results suggest that the learned representation involves not only invariant features for identifying familiar individuals but also changeable features for social communication.

## **METHODS**

## **PARTICIPANTS**

Two sets of four friends (three females, five males) participated in the experiment. As a criterion for familiarity, we chose friends that had extensive interaction with each other for more than a year before the experiment. They were recruited from the Dartmouth College community. Their pictures were taken in different head and gaze orientations to be used as stimuli in the experiment. To ensure that all the stimuli were equal in terms of image quality, we took the pictures in a photo studio with identical lighting and camera placement and settings. Subjects were reimbursed for their participation; all gave written informed consent to use their pictures and to participate in the experiment. The experiment was approved by the local IRB committee.

## **STIMULI**

For each subject we created three sets of images: target familiar faces (three identities), target unknown faces (three identities), and distractor unknown faces (five identities). Three target unknown individuals were pseudo-randomly sampled from a set of eight identities (four females). Five different identities were used as distractors. Images of the distractor face identities were never used as targets. The pictures of the eight unfamiliar individuals had been previously taken at the University of Vermont with the same lighting, camera placement and settings used for the friends.

Images were cropped, resized to 150 × 150 pixels, and then grayscaled using ImageMagick (version 6.8.7-7 Q16, x86\_64, 2013-11-27) on Mac OS X 10.9.2. The average pixel intensity of each image (ranging from 0 to 255) was set to 128 with a standard deviation of 40 using the SHINE toolbox (function *lumMatch*) (Willenbockel et al., 2010) in MATLAB (version 8.1.0.604, R2013a).

## **EXPERIMENTAL SETUP**

The experiment was run on an Apple MacPro 1,1 with an Apple Cinema HD display (23-inch) set at a resolution of 1280 × 800 pixels with a 60 Hz refresh rate, using Psychtoolbox (version 3.0.8) (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007) in MATLAB (version 7.8.0.347, R2009a).

Before the actual experiment, subjects practiced the task with a set of unrelated images. They sat at a distance of approximately 80 cm from the screen (eyes to screen) in a dimly lit room. The experiment consisted of four different tasks (see below for a detailed description) divided into four blocks. At the beginning of each block, a visual cue indicated the current task. After two blocks, the script invited the subjects to take a break and let the experimenter know they completed the first part of the experiment. After this break, the experimenter ran the script for the second part, and subjects completed the last two blocks. The order of the tasks was randomized.

Stimuli were presented on a gray background (pixel intensity set to 128 for all the pixels), and were positioned approximately 6.89° from the fixation point. Each stimulus had a retinal size of approximately 4.08 × 4.08°. Intertrial intervals were randomly jittered from trial to trial, ranging from 800 to 1000 ms, during which subjects were required to maintain fixation on a black cross in the center of the screen. Stimulus presentation ended with the subject's response or after 3000 ms if no response was made. Subjects were not required to maintain fixation during stimulus presentation (**Figure 1**).

### **TASKS**

Subjects were required to detect a target among a different number of distractors (set of 2 or 4 or 6 stimuli), and had to press the left arrow-key (YES) when they found the target, or the right arrow-key (NO) if the target was absent. They heard a beep if they were wrong or if they took too much time to respond (maximum allowed time of 3 s).

The experiment had four tasks. The first two tasks investigated detection of a target with gaze orientation that differed from distractors, controlling for head orientation; all stimuli depicted faces in frontal view. In Task 1 subjects detected a face with gaze directed to the observer among faces with averted gaze. In Task 2 they detected a face with averted gaze among faces with gaze directed to the observer. The other two tasks investigated detection of a target with head orientation that differed from distractors, controlling for gaze orientation; all stimuli depicted faces with gaze directed to the observer. In Task 3 subjects detected a face in full view among faces in profile view (head turned approximately 40°). In Task 4 subjects detected a face in profile view among faces in full view. The order of the tasks was randomized for each participant.

**FIGURE 1 | Example trials for the different stimulus array sizes used in the experiment.** Stimuli were positioned on a circle, separated by 60° from each other, making them equidistant from the fixation point and lying on a regular hexagon. Note that for set sizes of two and four there are three possible shapes that the stimuli can create (rotations of 60 and 120° of the shape depicted here), which were randomly chosen from trial to trial. See details in the text.

We manipulated the set size (total number of stimuli on the screen: 2, 4, or 6), the familiarity of the target, and the presence of the target. For all set sizes, the stimuli were positioned on a circle with a radius of 250 px (or 6.89° of visual angle) centered on the fixation point, and were positioned on the vertices of a regular hexagon. Thus, all stimuli were equidistant from the fixation point, and the first saccade covered the same distance regardless of the condition. We controlled the position of the stimuli such that the shape they created was always symmetrical with respect to the fixation point (see **Figure 1**). Thus, the total number of possible shapes was 3, 3, and 1 respectively for set sizes of 2, 4, and 6 (for set sizes of 2 and 4, the other possible shapes are rotations of 60 and 120° of the shapes in **Figure 1**).
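The layout rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' Psychtoolbox script: the function names are hypothetical, the starting angle of the hexagon is an assumption (the text does not give its orientation), and "symmetrical with respect to the fixation point" is read here as point symmetry, which reproduces the stated counts of 3, 3, and 1 possible shapes.

```python
import math
import random

RADIUS_PX = 250        # from the text: radius of the circle in pixels
START_ANGLE_DEG = 0.0  # assumption: the hexagon's orientation is not given in the text


def vertex_positions(radius=RADIUS_PX, start_deg=START_ANGLE_DEG):
    """Return (x, y) pixel offsets from fixation for the six hexagon vertices."""
    positions = []
    for k in range(6):
        angle = math.radians(start_deg + 60 * k)  # vertices are 60 degrees apart
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


def symmetric_layouts(set_size):
    """Return all vertex-index sets that are point-symmetric about fixation."""
    opposite_pairs = [(k, k + 3) for k in range(3)]  # three pairs of opposite vertices
    if set_size == 2:
        return [set(pair) for pair in opposite_pairs]
    if set_size == 4:
        # Dropping one opposite pair leaves the other two pairs (four vertices).
        return [set(range(6)) - set(pair) for pair in opposite_pairs]
    if set_size == 6:
        return [set(range(6))]
    raise ValueError("set size must be 2, 4, or 6")


def sample_layout(set_size, rng=random):
    """Pick one symmetric layout at random and return its pixel positions."""
    positions = vertex_positions()
    chosen = rng.choice(symmetric_layouts(set_size))
    return [positions[i] for i in sorted(chosen)]


for n in (2, 4, 6):
    print(n, "stimuli:", len(symmetric_layouts(n)), "possible shape(s)")
```

Because every vertex lies on the same circle, each sampled position is 250 px from fixation regardless of set size, which is the property the design relies on.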

Since we were unable to completely cross the target position and the possible shapes due to time constraints for the experiment, we decided to balance the occurrence of the target in the left and right hemifield, thus avoiding any lateral bias. The shape and the target position were randomly determined for each trial with the constraint that in 50% of the trials the target was on the left side.

The target could be either a familiar or a stranger face. Likewise, on each target absent trial one distractor image was a target face identity (familiar or stranger) with the same gaze and head orientation as the other distractors. Half of target absent trials had a familiar target identity as a distractor, and half had a stranger target identity as a distractor. Thus, the presence of a target identity was not informative on the presence of a target gaze or head orientation.

We also controlled for rightward and leftward orientation of gaze and head angle of targets in Tasks 2 and 4, in which the target had either averted gaze or averted head angle. The orientation of the targets was balanced to the left and right. In Tasks 1 and 3 the orientation of the distractors was similarly balanced. For each trial, all distractors were oriented to one side. Half of the trials had all distractors oriented to the left, and the other half had all distractors oriented to the right.

For each task we presented each target identity two times for each set size, target present or absent, and right- or leftward orientation condition, thus yielding 144 trials per task (Number of target identities × 2 × Set size × Presence of target × Orientation = 6 × 2 × 3 × 2 × 2 = 144). The trial order was randomized.
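The trial count for one task can be checked by enumerating the full crossing of factors. The sketch below is illustrative only: the identity labels, dictionary keys, and the `build_trials` helper are hypothetical names, and the actual experiment was scripted in MATLAB with Psychtoolbox.

```python
import itertools
import random

# Hypothetical labels: three familiar and three unfamiliar target identities.
TARGET_IDENTITIES = [("familiar", i) for i in range(1, 4)] + \
                    [("unfamiliar", i) for i in range(1, 4)]
REPETITIONS = (1, 2)
SET_SIZES = (2, 4, 6)
TARGET_PRESENT = (True, False)
ORIENTATIONS = ("left", "right")


def build_trials(seed=None):
    """Fully cross the design factors for one task and shuffle the trial order."""
    rng = random.Random(seed)
    trials = [
        {
            "familiar": familiarity == "familiar",
            "identity": identity,
            "repetition": repetition,
            "set_size": set_size,
            "target_present": present,
            "orientation": orientation,
        }
        for (familiarity, identity), repetition, set_size, present, orientation
        in itertools.product(
            TARGET_IDENTITIES, REPETITIONS, SET_SIZES, TARGET_PRESENT, ORIENTATIONS
        )
    ]
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 6 x 2 x 3 x 2 x 2 = 144
```

The enumeration also makes the balance of the design explicit: half of the trials are target present and half involve a familiar target identity.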

## **RESULTS**

We analyzed reaction times for target present and target absent trials separately. **Table 1** shows the Reaction Times (RTs) in ms and **Table 2** shows mean *d*′ values and SE for each task and each condition.

Data were analyzed in R (version 3.0.2, R Core Team, 2013) using a Linear Mixed-Effect Model on RTs and *d*′ values, as implemented in the package *lme4* (version 1.0-6, Bates et al., 2014). The model was then fitted with Maximum-Likelihood estimation. To find the best fitting model, different models were evaluated according to the AIC (Akaike Information Criterion), and tested by means of a log-likelihood ratio test (Baayen et al., 2008). Once the best model was found, interaction or main fixed effects of this model were also evaluated with a log-likelihood ratio test (Baayen et al., 2008).
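The arithmetic behind this model-comparison procedure can be sketched as follows. The code does not fit mixed models (that was done with *lme4* in R); it only shows how two already fitted, nested models would be compared by AIC and by a log-likelihood ratio test. The log-likelihoods and parameter counts are hypothetical, chosen so that the test statistic equals 1.28 on one degree of freedom, the value reported later in this section for the familiarity × set size interaction. The helper names are assumptions introduced for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_test(ll_reduced, k_reduced, ll_full, k_full):
    """Compare nested models fitted by maximum likelihood."""
    statistic = 2.0 * (ll_full - ll_reduced)
    df = k_full - k_reduced
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical fits: the full model has one extra (interaction) parameter.
ll_full, k_full = -1000.00, 8
ll_reduced, k_reduced = -1000.64, 7

stat, df, p = likelihood_ratio_test(ll_reduced, k_reduced, ll_full, k_full)
print(f"chi2({df}) = {stat:.2f}, p = {p:.2f}")
print(f"AIC full = {aic(ll_full, k_full):.2f}, AIC reduced = {aic(ll_reduced, k_reduced):.2f}")
```

In this hypothetical case the extra parameter does not improve the fit significantly and the reduced model has the lower AIC, so the simpler model would be retained.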

Reliability of parameter estimates for main fixed effects and contrasts was evaluated through parametric bootstrapping (10,000 replicates), from which we computed 95% basic bootstrap confidence intervals (bCI). Effect sizes for familiarity and 95% bCa confidence intervals (10,000 repetitions) shown in **Tables 3, 4** were computed using the package *bootES* (version 1.01, Kirby and Gerlanc, 2013).
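To illustrate what a parametric bootstrap with a basic confidence interval involves, the sketch below applies the idea to a deliberately simplified case: the mean of per-subject RT differences under a normal model, rather than the parameters of a mixed-effects model as in the actual analysis. The data values and the function name are hypothetical. The basic interval reflects the bootstrap quantiles around the point estimate, giving (2θ̂ − q<sub>0.975</sub>, 2θ̂ − q<sub>0.025</sub>).

```python
import random
import statistics


def basic_bootstrap_ci(data, n_boot=10_000, alpha=0.05, seed=0):
    """Parametric bootstrap of a mean with a basic (reflected) confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    estimate = statistics.fmean(data)
    sd = statistics.stdev(data)

    # Parametric step: simulate new samples from the fitted normal model
    # and re-estimate the mean on each simulated sample.
    boot_means = sorted(
        statistics.fmean([rng.gauss(estimate, sd) for _ in range(n)])
        for _ in range(n_boot)
    )

    lower_q = boot_means[round((alpha / 2) * n_boot)]
    upper_q = boot_means[round((1 - alpha / 2) * n_boot) - 1]

    # Basic interval: reflect the bootstrap quantiles around the estimate.
    return estimate, (2 * estimate - upper_q, 2 * estimate - lower_q)


# Hypothetical familiar-minus-unfamiliar RT differences (ms) for eight subjects.
differences = [-120.0, -95.0, -60.0, -110.0, -70.0, -85.0, -40.0, -90.0]
estimate, (low, high) = basic_bootstrap_ci(differences)
print(f"estimate = {estimate:.1f} ms, 95% basic CI = [{low:.1f}, {high:.1f}]")
```

An interval that excludes zero, as for the familiarity estimate in the target present analysis, indicates a reliable effect; the bCa intervals used for the effect sizes additionally correct for bias and skew and are not shown here.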

**Table 1 | Mean RTs [ms] for each condition and each task, correct responses only.**


**Table 2 | Mean** *d*′ **values and SE (***N* **= 8, in parentheses) for each task and condition.**


### **TARGET PRESENT**

We first created a general model entering main effects of task, set size, and familiarity of the target, and the interaction between set size and familiarity; subjects and target items were entered as random effects with random intercepts and random slopes for familiarity. Then we removed random slopes for familiarity (one at a time) to test whether a more parsimonious model could be found. Indeed, we found that removing random slopes for both random effects decreased the AIC, while the χ<sup>2</sup> log-likelihood ratio tests were not significant.

The RSFs for familiar and unfamiliar targets were not significantly different, as indicated by a non-significant interaction between familiarity and set size (χ<sup>2</sup>(1) = 1.28, *p* = 0.26). Consequently, we further simplified the model by removing this interaction effect. This yielded the best model in terms of AIC, with task, set size, and familiarity as main fixed effects, and subjects and target items as random effects with random intercepts.

**Table 3 | Unstandardized effect size [ms] of familiarity for the Target Present condition and Cohen's d effect size of familiarity across set sizes in the four tasks (bCa bias-corrected and accelerated confidence intervals, computed with 10,000 repetitions).**

**Table 4 | Unstandardized effect size [ms] of familiarity for the Target Absent conditions and Cohen's d effect size of familiarity across set sizes in the four tasks (bCa bias-corrected and accelerated confidence intervals, computed with 10,000 repetitions).**

We found a main effect of familiarity (χ<sup>2</sup>(1) = 21.07, *p* < 0.0001, parameter estimate = −83.8 ms, 95% bCI: [−115.7, −52.1]), set size (χ<sup>2</sup>(1) = 385.35, *p* < 0.0001, parameter estimate = 168.6 ms, bCI: [152.4, 185.1]), and task (χ<sup>2</sup>(3) = 73.94, *p* < 0.0001). The strong effect of set size on target present trials for all tasks indicates that visual search for gaze direction and head angle is serial with no evidence for parallel search or pop-out. The mean slope of the RSF on target present trials was 91 ms/item for gaze direction and 77 ms/item for head angle. The mean difference in detection time for target social cues between familiar and unfamiliar faces was 109 ms for gaze direction and 65 ms for head angle. We found a statistical difference between the gaze and head tasks (Gaze vs. Head, parameter estimate = 62.97 ms, bCI: [49.2, 76.7]), but no difference between Task 1 and Task 2 (parameter estimate = 9.18 ms, bCI: [−11.5, 30.1]) nor between Task 3 and Task 4 (parameter estimate = 15.57 ms, bCI: [−5.1, 36.1]). For an overview of all results in the Target Present conditions see **Tables 1**–**3**, **5**, and **Figures 2**, **3**.

#### **TARGET ABSENT**

We ran the same analysis for target absent trials and found that the best model again included task, set size, and familiarity as fixed main effects, and subjects and target items as random effects with random intercepts. None of the interactions (two-way or three-way) was significant.

We found main effects of task (χ<sup>2</sup>(3) = 215.88, *p* < 0.0001) and set size (χ<sup>2</sup>(1) = 1443.3, *p* < 0.0001, parameter estimate = 335.5 ms, bCI: [320.9, 350.0]), but not of familiarity (χ<sup>2</sup>(1) = 1.3, *p* = 0.26, parameter estimate = −15.1 ms, bCI: [−41.9, 11.2]). The mean RSF slope on target absent trials was 192 ms/item for gaze direction and 143 ms/item for head angle. A contrast of tasks showed that the first two tasks were statistically different from the last two (Gaze vs. Head, parameter estimate = 95.0 ms, bCI: [82.6, 107.0]), and that Task 3 was statistically different from Task 4 (Detect Full View vs. Detect Profile View, parameter estimate = 47.2 ms, bCI: [29.6, 65.1]), but Task 1 was not statistically different from Task 2 (Detect Direct Gaze vs. Detect Averted Gaze, parameter estimate = 17.6 ms, bCI: [−0.1, 35.6]). For an overview of all results in the Target Absent conditions see **Tables 1**, **2**, **4**, **5**, and **Figures 2**, **3**.

#### *d*′**-VALUES**

Since many subjects had False Alarm (FA) rates of 0, we computed the Hit and FA rates by adding 0.5 to the corresponding counts and dividing by N + 1, thus scaling the rates to avoid the extremes of 0 and 1. To analyze *d*′ values, we used the same analyses (Linear Mixed-Effect Models) as described above.
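The correction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the conventional definition of the sensitivity index, *d*′ = z(Hit rate) − z(FA rate), and the function names and trial counts are hypothetical.

```python
from statistics import NormalDist


def corrected_rate(count: int, n_trials: int) -> float:
    """Add 0.5 to the count and divide by N + 1, keeping the rate away from 0 and 1."""
    return (count + 0.5) / (n_trials + 1)


def d_prime(hits: int, n_present: int, false_alarms: int, n_absent: int) -> float:
    """Sensitivity index: z(corrected hit rate) - z(corrected false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_present)
    fa_rate = corrected_rate(false_alarms, n_absent)
    return z(hit_rate) - z(fa_rate)


# Hypothetical subject: 18 hits on 20 target present trials and
# no false alarms on 20 target absent trials.
example = d_prime(hits=18, n_present=20, false_alarms=0, n_absent=20)
print(f"d' = {example:.2f}")
```

Without the correction, a false-alarm rate of exactly 0 would map to an infinite z-score, so *d*′ would be undefined for the many subjects who made no false alarms.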

**Table 5 | RSF slope estimates for Target Present and Target Absent conditions for the four tasks and for Familiar/Stranger targets (bCa bias-corrected and accelerated confidence intervals, computed with 10,000 repetitions).**


We found that the best model included task, set size, and familiarity as fixed main effects, and subjects as a random effect with random intercepts. None of the interactions (two-way or three-way) was significant.

We found main effects of set size (χ<sup>2</sup>(1) = 10.26, *p* = 0.0014, parameter estimate = −0.1284 [−0.2074, −0.0509]), familiarity (χ<sup>2</sup>(1) = 14.32, *p* = 0.0002, parameter estimate = 0.2490 [0.1258, 0.3769]), and task (χ<sup>2</sup>(3) = 7.83, *p* = 0.0497). The task contrasts specified above were not statistically significant for the *d*′ values: Gaze vs. Head angle, parameter estimate = −0.0538 [−0.3510, 0.0060]; Task 1 vs. Task 2, parameter estimate = 0.0872 [−0.2134, 0.1430]; Task 3 vs. Task 4, parameter estimate = −0.0570 [−0.1081, 0.2547] (see **Table 2** for the mean *d*′ values and SE for each condition and each task).

#### **DISCUSSION**

Face perception is arguably one of the most developed visual skills in humans. Faces are detected more readily than other objects (Crouzet et al., 2010). Familiar face perception is especially sensitive and efficient and is dramatically better than unfamiliar face perception (Jenkins and Burton, 2011). Here we show that one class of social cues transmitted by faces—perception of the direction of another's attention—is detected much more rapidly in familiar faces than in unfamiliar faces. In previous work, we have shown that personally familiar faces, as compared to faces of strangers, are detected more readily in conditions with reduced attentional resources and even without awareness (Gobbini et al., 2013b). With the experiments reported in the present manuscript, we extend this line of research to show that the increased efficiency afforded by familiarity includes not only simple detection but also the perception of socially-relevant cues.

We used a visual search paradigm to test the effect of face familiarity on the detection of a target with a different gaze or head orientation. We found that the familiarity of the face with the target feature had a strong effect on detection time but no effect on RSF slopes—in other words, a facilitation of social cue detection that was constant across set sizes. This result indicates that the social cue was detected much faster in familiar than unfamiliar faces and that attentional capture—a bias to process the familiar faces earlier in a serial visual search—did not play a significant role, as such an effect would be reflected in a flatter RSF.

As expected, we found that increasing the number of distractors made the task harder, as evidenced by increased reaction times and decreased *d*′ values. Moreover, as expected, we found that detecting a target head orientation was faster than detecting a target gaze direction, albeit with no difference in accuracy. This effect could be due to the fact that head orientation differences produce larger changes in the visual stimulus than gaze direction differences do, thus making the visual search easier.

Our results clearly show that detection of target gaze directions and head angles involves a serial visual search with no indication of parallel processing or pop-out. Detection times on target present trials showed a strong effect of set size. This finding is consistent with those of Tong and Nakayama (1999), who found that detection of a target individual (self or a stranger) among distractor faces involved a serial search. Pop-out for simple face detection among non-face distractors was shown in one report using large set sizes (Hershler and Hochstein, 2005) but appears to be due to low-level visual features, namely the amplitude spectrum of spatial frequencies (VanRullen, 2006).

Images of familiar and unfamiliar faces were carefully matched. All pictures were made with the same lighting and photographic equipment in a studio setting. Mean luminance and contrast were the same for all stimuli. Thus, spurious low-level differences cannot account for performance differences between the detection of familiar and stranger targets. Indeed, we found a large main effect of familiarity for both the speed and accuracy of target detection.

The slope of the RSF indicates how much time is required to check each stimulus for the target feature. Target absent trials require checking all stimuli for the target feature, resulting in RSF slopes that are about twice as steep as those for target present trials, on which visual search terminates with detection of the target feature. Processing each distractor for gaze orientation, as indicated by the RSF slope on target absent trials, required on average 192 ms, and processing each distractor for head angle required 143 ms. In this context, the effect of familiarity on the gaze orientation and head angle tasks (109 ms and 65 ms, respectively) suggests that these signals are processed markedly faster in familiar faces than in unfamiliar faces.
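The two-to-one relation between target absent and target present slopes follows from a simple serial, self-terminating search model, sketched below under stated assumptions: each item takes a fixed time to check, the target is equally likely to be at any position in the search order, and the set sizes of 2 and 6 are hypothetical placeholders. Only the four slope values in `reported` are taken from the text.

```python
def expected_items_checked(set_size: int, target_present: bool) -> float:
    """Expected number of items inspected in a serial, self-terminating search."""
    if target_present:
        # Target equally likely at positions 1..n, so the mean position is (n + 1) / 2.
        return (set_size + 1) / 2
    # Without a target, every item must be checked before answering "no".
    return float(set_size)


def predicted_slope(per_item_ms: float, target_present: bool,
                    small: int = 2, large: int = 6) -> float:
    """Predicted RT increase per added item between two (hypothetical) set sizes."""
    extra_checks = (expected_items_checked(large, target_present)
                    - expected_items_checked(small, target_present))
    return per_item_ms * extra_checks / (large - small)


# Slopes reported in the text, in ms/item: (target present, target absent).
reported = {"gaze": (91, 192), "head": (77, 143)}
ratios = {task: absent / present for task, (present, absent) in reported.items()}

for task, ratio in ratios.items():
    print(f"{task}: absent/present slope ratio = {ratio:.2f}")
print(predicted_slope(192, target_present=True), predicted_slope(192, target_present=False))
```

Under this model a per-item cost of 192 ms predicts a target present slope of half that value; the observed ratios for both tasks are close to the predicted factor of two.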

Familiar faces may also attract attention, biasing visual search to process familiar faces earlier than unfamiliar faces, an effect that could also cause faster detection of social cues in familiar faces. Such an effect, however, would make the RSF slope flatter for familiar target trials than for unfamiliar target trials, a difference that was not significant in the current study. In Tong and Nakayama (1999), the RSF slope was slightly flatter for finding one's own face than for finding an unfamiliar face target in a visual search task. This effect was not significant in their first experiment, with an RSF slope difference of 15 ms/item, and was significant in the second experiment, with an RSF slope difference of 23 ms/item. The estimate of the equivalent effect in our data, based on target present trials as in Tong and Nakayama (1999), was 10 ms/item and not significant. When we include this non-significant effect in a model that accounts for the difference in detection times with both cue processing and RSF slope differences, the facilitation of detection by familiarity is still due mostly to faster processing of the social cue rather than to looking at familiar faces earlier. The more parsimonious explanation that better fits our data, therefore, is that the target social cue (gaze angle and head direction) is examined in each stimulus in the search array, that this process is serial, that a familiar face is no more likely than an unfamiliar face to be examined earlier in the serial search, and that the social cue is processed more quickly if the face is familiar.

We also found that responding "no" on target absent trials was slowed by 20–40 ms if the distractors all had attention directed away from the viewer, as indicated either by averted gaze or averted head angle. Perceived gaze and head orientation represent strong signals for reallocating attention in humans, and the attentional shift to the side elicited when someone else stares or turns their head away from us appears to be automatic (Friesen and Kingstone, 1998; Frischen et al., 2007). This automatic diversion of attention may be the underlying cause for slower response times on target absent trials when distractor face images had averted gaze or head angle. To summarize, not only are familiar faces detected faster than are faces of strangers (Tong and Nakayama, 1999; Deuve et al., 2009; Ramon et al., 2011; Gobbini et al., 2013b) but also cues that represent strong social signals (Perrett et al., 1985; Senju and Johnson, 2009; Stein et al., 2011; Gobbini et al., 2013a)—eye gaze and head direction—are detected much more rapidly if they are perceived in a familiar face.

We spend a great amount of time looking at the faces of immediate family and close friends, which become intimately familiar through repeated exposure and social interaction extending over years. This slow and prolonged exposure can contribute to the development of a more stable representation of the visual appearance of a familiar face. Personally familiar faces, in contrast to the faces of strangers, are detected faster and recognized with great efficiency in conditions of poor visibility and over large changes in head angle, lighting, partial occlusion, and age (Burton et al., 1999; O'Toole et al., 2006; Johnston and Edmonds, 2009; Burton and Jenkins, 2011). Personally familiar faces are among the most highly-learned and salient visual stimuli for humans and are associated with changes in the representation of both the visual appearance and associated person knowledge, affording highly efficient and robust recognition. By contrast, recognition of unfamiliar faces (identifying a target unfamiliar face among other faces) is surprisingly inaccurate (Burton et al., 1999; O'Toole et al., 2006; Burton and Jenkins, 2011). Whereas the performance of machine vision systems for face recognition is equivalent to human performance for unfamiliar face recognition, human performance for familiar face recognition is much better (Jenkins and Burton, 2011; O'Toole et al., 2011). Understanding the perceptual and neural mechanisms underlying this remarkable performance is of great interest for understanding how neural systems become highly efficient for highly salient stimuli and for designing better machine vision systems. The relative roles played by detectors for fragmentary or holistic visual features and by top-down influences of semantic information in the facilitation of familiar face processing are unknown.
Face detection and perception of the direction of another's attention, however, appear to be extremely fast, efficient, and independent of attentional resources and even awareness (Jiang et al., 2007; Crouzet and Thorpe, 2011; Gobbini et al., 2013a), suggesting that top-down influences of semantic information may play a minor role and that facilitation of familiar face processing may be due mostly to the development of detectors of fragmentary or holistic visual features that are specific to familiar individuals.

A distributed system for face perception has been described in humans (Haxby et al., 2000, 2002; Ishai et al., 2005; Gobbini and Haxby, 2007; Haxby and Gobbini, 2011) and monkeys (Tsao et al., 2008; Freiwald and Tsao, 2010). In humans the system includes visual cortical areas that are involved in perception of invariant visual attributes diagnostic of identity and perception of changeable aspects for facial expression and speech (the "core system") and additional areas involved in representation of information associated with faces, such as person knowledge, emotion, and spatial attention (the "extended system") (Haxby et al., 2000, 2002; Ishai et al., 2005; Gobbini and Haxby, 2007; Taylor et al., 2009; Natu and O'Toole, 2011; Bobes et al., 2013). Repeated exposure to faces might result in natural and protracted learning that tunes this hierarchical and distributed system at all levels to afford efficient and robust detection and identification of these faces. This could be due to development of representations of the visual appearance across many different changes in head angle, lighting, expression, and partial occlusion. The integration of multiple representations into a general representation of an individual could help build a system that is stable, robust, and efficient (Bruce, 1994; Burton et al., 2011). Neurophysiological data from monkeys suggest that a view-independent representation of faces is achieved through a series of processing steps from posterior toward more anterior face responsive patches in the temporal cortex that exhibit population responses tuned to head angle more posteriorly (MF/ML) and to head-angle invariant face identity more anteriorly (AM) (Freiwald and Tsao, 2010). 
In humans, face areas in the core system are tuned differentially to face parts (the occipital face area, OFA), invariant aspects that support recognition of identity (the fusiform face area, FFA) and changeable aspects such as facial expression, eye gaze, and speech movements (the pSTS). In addition, human face areas have been described in anterior temporal and inferior frontal cortices (the ATFA and IFFA) that may play a critical role in identification (Rajimehr et al., 2009; Kriegeskorte et al., 2007; Natu et al., 2010; Nestor et al., 2011; Kietzmann et al., 2012; Anzellotti and Caramazza, 2014; Anzellotti et al., 2014).

Classical cognitive models of face perception and recognition posit that visual recognition necessarily precedes access to person knowledge (Bruce and Young, 1986). Evoked potential studies have shown that the first face-specific response to a face, the N170, is not modulated by familiarity (Bentin et al., 1999; Puce et al., 1999; Eimer, 2000; Paller et al., 2000; Abdel Rahman, 2011; but see also Caharel et al., 2011). Instead, modulation of the response by familiarity appears at later latencies (greater than 250 ms) (Eimer, 2000; Schweinberger et al., 2004; Tanaka et al., 2006). Whereas early face-specific evoked potentials are recorded in posterior temporal locations, the later potentials that are modulated by familiarity are recorded in temporal, frontal and parietal locations (Bentin et al., 1999; Puce et al., 1999; Eimer, 2000; Tanaka et al., 2006). Faster detection without awareness of personally familiar faces as compared to faces of strangers suggests that early face processing that precedes explicit recognition may be facilitated for personally familiar faces (Gobbini et al., 2013b). Models of object perception hypothesize that the recognition of objects despite pronounced changes in appearance is due to a multistep sequence of processing, characterized by stages in which stimulus features of increasing complexity are analyzed and combined until a representation invariant to visual transformation is achieved in the inferior temporal cortex (Ullman et al., 2002; Riesenhuber and Poggio, 2002; Serre et al., 2007; DiCarlo et al., 2012; but see also Kravitz et al., 2013).

Psychophysical studies have shown that faces can be detected very rapidly, with the earliest reliable saccades to faces at 100–110 ms (Crouzet et al., 2010; Crouzet and Thorpe, 2011). Face-specific patterns of neural activity can be detected as early as 100 ms with EEG using multivariate pattern analysis (Cauchoix et al., 2014). These very rapid responses to faces may be due to low-level visual features that are more frequent in faces (Tanskanen et al., 2005). For example, Honey et al. (2008) and Crouzet and Thorpe (2011) demonstrated the importance of specific spatial frequency amplitudes underlying ultra-fast face detection. Specific properties of faces, such as eye gaze direction, head angle and personal familiarity, differentially facilitate detection even without awareness (Stein et al., 2011; Gobbini et al., 2013a,b). These findings raise the question of how such fast and preconscious processing can be achieved: through a subcortical system (for a review see Tamietto and de Gelder, 2010; but see also Pessoa and Adolphs, 2010) or through a cortical route with a fast feed-forward integration of information (VanRullen and Thorpe, 2001) and activation of the distributed network in the fronto-parietal areas for retrieval of person knowledge. Highly-learned representations of personally familiar faces may also include detectors for visual features (face fragments or more holistic configurations) that are diagnostic for familiar individuals (Butler et al., 2010). The facilitation of familiar face processing that appears to be at least partially independent of attentional resources and awareness may be due to activation of such learned diagnostic feature detectors. The results presented here suggest that these detectors also may be specific for features that carry social signals, such as eye gaze direction, head orientation, and expression.

A largely unexplored mechanism in the expertise for familiar faces involves detectors for diagnostic facial features in early visual cortex. Petro et al. (2013) have shown that facial attributes such as gender and expression can be decoded, using multivariate pattern analysis (MVPA), in V1 cortical patches. Diagnostic features specific to familiar faces might be learned through experience and might afford "pre-recognition" detection, namely facilitated detection without an explicit recognition of the identity of highly familiar faces. Instead, explicit recognition of a highly familiar face may require top-down processing from neural systems that are involved in retrieval of person knowledge and in the emotional response, and this top-down input could serve to tune and amplify the visual representation of personally familiar faces (Gobbini and Haxby, 2007; Gobbini, 2010).

In this manuscript we have presented new evidence for facilitated processing of personally familiar faces. We have highlighted the importance of testing the human system for familiar face detection and recognition. Experiments using familiar faces as stimuli can offer insight on the organization of the neural systems for recognition of highly familiar objects, can help improve software for face recognition and can shed further light on practical issues such as flaws in eye witness reports. Our expertise with face recognition seems to be most developed for familiar faces, and unfamiliar face recognition is disappointing. Our expertise with familiar faces could be due to the integrated functioning of the distributed neural system for face perception at multiple levels (Haxby et al., 2000, 2001; Gobbini and Haxby, 2007; Haxby and Gobbini, 2011). The extended system components for the representation of person knowledge may interact with the representation of the visual appearance to stabilize and strengthen the representation of visual features that are diagnostic of the identity and facial gestures of familiar individuals. The development of a robust representation of the visual appearances of familiar individuals affords detection even in conditions with poor visibility (O'Toole et al., 2006; Burton and Jenkins, 2011). Activation of these simple features might facilitate detection preceding explicit recognition and facilitate processing of social signals. Understanding how learning tunes integrated processing of personally familiar faces in the hierarchical system for face perception may serve as a model for how learning tunes neural systems for recognition of other highly salient stimuli, such as gestures and actions, personal objects and places, or voices and written words.

## **AUTHOR CONTRIBUTIONS**

M. Ida Gobbini conceived the idea; M. Ida Gobbini, J. Swaroop Guntupalli and Matteo Visconti di Oleggio Castello designed the experiment; J. Swaroop Guntupalli and Hua Yang collected and analyzed the data of a pilot study; Matteo Visconti di Oleggio Castello collected and analyzed the data for the final experiment; M. Ida Gobbini and Matteo Visconti di Oleggio Castello wrote the manuscript; Hua Yang and J. Swaroop Guntupalli provided critical inputs to the final version of the manuscript.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Jim Haxby and Carlo Cipolli for insightful discussions and comments on the manuscript. The authors would also like to thank Ming Meng and Patrick Cavanagh for feedback on the experimental design. Furthermore, the authors would like to thank Easha Narayan, Kelsey Wheeler and Courtney Rogers for help with subject recruitment and stimulus development.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 06 June 2014; accepted: 13 August 2014; published online: 02 September 2014.*

*Citation: Visconti di Oleggio Castello M, Guntupalli JS, Yang H and Gobbini MI (2014) Facilitated detection of social cues conveyed by familiar faces. Front. Hum. Neurosci. 8:678. doi: 10.3389/fnhum.2014.00678*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Visconti di Oleggio Castello, Guntupalli, Yang and Gobbini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## What can individual differences reveal about face processing?

## **Galit Yovel <sup>1</sup>\*, Jeremy B. Wilmer <sup>2</sup> and Brad Duchaine<sup>3</sup>**

<sup>1</sup> School of Psychological Sciences, Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel

<sup>2</sup> Department of Psychology, Wellesley College, Wellesley, MA, USA

<sup>3</sup> Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA

#### **Edited by:**

Mark A. Williams, Macquarie University, Australia

#### **Reviewed by:**

Ruthger Righart, Technische Universität München, Germany; Cyril R. Pernet, University of Edinburgh, UK

#### **\*Correspondence:**

Galit Yovel, School of Psychological Sciences, Sagol School of Neuroscience, Sharet Building, Tel Aviv University, Tel Aviv 69978, Israel; e-mail: gality@post.tau.ac.il

Faces are probably the most widely studied visual stimulus. Most research on face processing has used a group-mean approach that averages behavioral or neural responses to faces across individuals and treats variance between individuals as noise. However, individual differences in face processing can provide valuable information that complements and extends findings from group-mean studies. Here we demonstrate that studies employing an individual differences approach—examining associations and dissociations across individuals—can answer fundamental questions about the way face processing operates. In particular these studies allow us to associate and dissociate the mechanisms involved in face processing, tie behavioral face processing mechanisms to neural mechanisms, link face processing to broader capacities and quantify developmental influences on face processing. The individual differences approach we illustrate here is a powerful method that should be further explored within the domain of face processing as well as fruitfully applied across the cognitive sciences.

**Keywords: face recognition, individual differences, holistic processing, fusiform face area, behavioral genetics**

The cognitive and neural bases of face processing have been extensively investigated. The majority of these studies have taken a group-mean approach, focusing on the average cognitive or neural response and treating natural variation across individuals as noise. Here, we seek to highlight a small but growing literature that treats such variation as a valuable signal in its own right. These studies complement and extend group-mean studies by providing a powerful, independent way to examine the functional organization, neural bases, and developmental origins of skilled face processing (Wilmer, 2008). As such, the goal of this review is not merely to document the existence of individual differences in face processing. Rather, we focus on cases where associations and dissociations across individuals have advanced our theoretical understanding. These associations and dissociations have, for example, tested theorized links between behavioral face processing measures such as measures of holistic processing or the processing of familiar and unfamiliar faces; they have associated and dissociated different neural face processing measures and examined their relationships with behavior; they have dissociated face recognition from more general cognitive abilities; and they have isolated genetic and environmental contributions to face processing.

## **ARE FACE RECOGNITION ABILITIES MEDIATED BY HOLISTIC PROCESSING MECHANISMS?**

The notion that faces are processed more holistically than objects is one of the most extensively studied ideas in the face processing literature. Although holistic processing has been defined in many different ways (Richler et al., 2012), it has typically been measured with two main tasks: the part-whole task (Tanaka and Farah, 1993) and the composite face task (Young et al., 1987; Le Grand et al., 2004; see **Figure 1A**). In the part-whole task, subjects are asked to recognize parts of previously learned faces, presented either embedded in a whole face or alone. Performance is typically better when the parts are presented within a whole face than when they are presented alone (**Figure 1A**, right). The composite task measures recognition of one half of a face when it is paired with an inconsistent, task-irrelevant other half. Recognition of the target half is better when the inconsistent half is misaligned than when it is aligned (**Figure 1A**, left).

Both of these tasks have shown interactive processing among face parts for upright faces but little or no interactivity for inverted faces or non-face objects (for review, see Maurer et al., 2002). These findings have led to suggestions that this holistic processing ability underlies our remarkable face recognition abilities (Maurer et al., 2002). Several studies with prosopagnosic individuals have reported impaired holistic processing of faces as measured with the composite task (Le Grand et al., 2006; Avidan et al., 2011; Palermo et al., 2011), suggesting that poor face recognition abilities may be associated with impaired holistic processing. In normal individuals, however, this question had not been examined until recent individual differences studies tested the correlation between holistic processing measures and face recognition abilities.

The first study that directly assessed the correlations between these tasks (Konar et al., 2010) measured the composite effect as the difference in performance between aligned (i.e., whole condition) and misaligned (i.e., part condition) faces. Surprisingly, this measure of holistic processing failed to correlate with their measure of face recognition abilities. Three subsequent studies, however, found significant positive relationships between face recognition and both the composite effect (Richler et al., 2011; Wang et al., 2012; Degutis et al., 2013) and the part-whole effect (Wang et al., 2012; Degutis et al., 2013), but not with performance on a non-face global-local task (Wang et al., 2012), suggesting that these correlations are not mediated by general visual processing abilities. Two of these studies (Richler et al., 2011; Degutis et al., 2013) used the reliable and well-validated Cambridge Face Memory Test (CFMT; Wilmer et al., 2012), which may partially account for the higher correlations they revealed. These studies further found that the size of the correlations depends on the analytic method used to measure holistic processing (regression vs. subtraction scores, Degutis et al., 2013) and the design of the composite task used (congruency/interference vs. standard design, Richler and Gauthier, 2013; Rossion, 2013).<sup>1</sup>

<sup>1</sup>The composite face effect has been measured in these studies with two different paradigms, the *congruency/interference paradigm* or the *standard*

Taken as a whole, these studies suggest that holistic processing and face recognition are indeed linked, but not as strongly as had been widely assumed; therefore, other factors must also contribute to face recognition abilities. One such factor was suggested by a recent study showing a small but significant correlation across individuals between face recognition ability and face aftereffects (Dennett et al., 2012). The face aftereffect is used as a tool to assess face space coding of face identity. The magnitude of the face aftereffect reflects the steepness of the response function along a given dimension (i.e., facial feature). Dennett et al. (2012) have revealed that steeper functions are associated with better face recognition abilities. Future studies are needed to assess the relative contributions of holistic processing, face space coding, and other factors to face recognition abilities.

## **DO DIFFERENT MEASURES OF HOLISTIC FACE PROCESSING TAP THE SAME MECHANISMS?**

Given evidence that faces are processed by holistic mechanisms, another basic question that has been overlooked for many years is whether different measures of holistic processing reflect the same mechanism. The part-whole and composite face effects have often been considered measures of the same process and have been used interchangeably in the face processing literature (Richler et al., 2012). It was therefore puzzling when studies revealed low correlations between these two holistic processing measures (*r* = 0.23 in Degutis et al., 2013; *r* = 0.03 in Wang et al., 2012). Degutis et al. (2013), however, found that the correlation between the two holistic measures was substantial (*r* = 0.44) when holistic processing scores were computed via a regression-based method (**Figure 1B**), and higher than when they were computed via the subtraction-based method used in group-mean studies. This finding has generated discussion of how to translate measures used in group-mean studies into a form that can validly capture both clinical and non-clinical human variation (for an extensive discussion of this question see Degutis et al., 2013).<sup>2</sup> More generally, such individual differences studies force us to critically examine commonly used measures and better determine what they measure.
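The contrast between subtraction-based and regression-based scores can be made concrete with a small sketch. It reflects one common reading of the regression approach (residualizing the condition of interest on the control condition across participants) and is not taken from the cited studies; the function names and the accuracy values for six participants are hypothetical.

```python
from statistics import fmean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def subtraction_scores(condition: list[float], control: list[float]) -> list[float]:
    """Per-participant difference: condition of interest minus control."""
    return [c - k for c, k in zip(condition, control)]


def regression_scores(condition: list[float], control: list[float]) -> list[float]:
    """Residuals of the condition of interest after regressing it on the control."""
    mc, mk = fmean(condition), fmean(control)
    slope = (sum((k - mk) * (c - mc) for c, k in zip(condition, control))
             / sum((k - mk) ** 2 for k in control))
    intercept = mc - slope * mk
    return [c - (intercept + slope * k) for c, k in zip(condition, control)]


# Hypothetical part-whole accuracies for six participants (not real data).
part = [0.60, 0.72, 0.55, 0.80, 0.65, 0.70]    # control condition
whole = [0.78, 0.80, 0.70, 0.86, 0.85, 0.74]   # condition of interest

sub = subtraction_scores(whole, part)
res = regression_scores(whole, part)
print(f"r(subtraction, control) = {pearson_r(sub, part):.2f}")
print(f"r(regression, control)  = {pearson_r(res, part):.2f}")
```

By construction, the residual scores are uncorrelated with the control condition, whereas difference scores generally remain correlated with it; this is one reason the two scoring methods can yield different correlations with other measures.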

## **ARE FACE PARTS AND THEIR SPACING REPRESENTED BY THE SAME MECHANISM OR DIFFERENT MECHANISMS?**

A study that examined individual differences in discrimination of face parts and of the spacing between parts provides another example of how individual differences findings may not only complement data from group-mean studies, but also aid in defining our measures of interest (Yovel and Kanwisher, 2008). The term configural processing has been frequently used to describe how faces are represented (Maurer et al., 2002). One way in which configural processing has been measured is by examining sensitivity to the distance among face parts (e.g., the distance between the two eyes), which has been claimed to be critical for face recognition (Le Grand et al., 2001). However, more recent papers highlighted the role of both the spacing among face parts and the shape of those parts in face processing and suggested that both are mediated by the same face processing mechanism (Yovel and Duchaine, 2006; McKone and Yovel, 2009; Amishav and Kimchi, 2010). An individual differences study strongly supported the latter claim by showing, for upright faces, a high correlation between discrimination of faces that differ in the spacing among parts and discrimination of faces that differ in the shape of parts. In contrast, the same correlation was effectively zero for inverted faces or houses (Yovel and Kanwisher, 2008). These findings have led to the suggestion that the definition of holistic/configural processing should include the processing of both the shapes of parts and the spacing among them (McKone and Yovel, 2009). Consistent with these findings, Yovel and Duchaine (2006) have reported that prosopagnosic individuals show similarly poor discrimination of face parts and the spacing among them, which suggests that the processing of both types of information is impaired in individuals with poor face recognition abilities.

## **DO FACE EXPRESSION AND FACE IDENTITY PROCESSING RELY ON DIFFERENT MECHANISMS?**

The question of whether face expression and identity are processed by a common mechanism or separate mechanisms has been debated in the cognitive (Ganel and Goshen-Gottstein, 2004), neuropsychological (Bruce and Young, 1986) and neuroimaging literature (Calder and Young, 2005; Gobbini and Haxby, 2007). An individual differences approach can address this question by assessing the correlations among tests of expression and identity processing. A critical prerequisite for such an approach, however, is reliable and valid tests of expression processing. In a recent study, Palermo et al. (2011) argued that no existing expression processing test met the high standards necessary to enable such an approach. They then developed two new tests, one expression matching test and one expression labeling test. Both tests efficiently captured expression processing abilities, demonstrating strong reliability and validity despite their brevity; moreover, these tests demonstrated suitability for capturing a broad range of performance, avoiding the ceiling effects found in the majority of existing expression processing tests (Palermo et al., 2011). Interestingly, in an 80-person sample, the variation shared between these two expression recognition tests (*r* = 0.45) was virtually independent of performance on the CFMT (Duchaine and Nakayama, 2006), evidence for expression-specific mechanisms. At the same time, the expression matching test correlated robustly with the CFMT (*r* = 0.40), evidence for face-general mechanisms. Further studies are now needed to determine which particular expression processing mechanisms are shared with, and which are independent of, identity processing.

*paradigm*. The two types of paradigms, and the extent to which they provide a valid measure of holistic processing, are discussed comprehensively in Richler and Gauthier (2013) and Rossion (2013). Briefly, the *standard paradigm* provides a measure of holistic processing by assessing the interference of irrelevant different face halves on the processing of the other face halves. The *congruency/interference design* provides a general Stroop-like measure in which congruent and incongruent trials are compared.

<sup>2</sup>Regression scores are computed by regressing, rather than subtracting, the part-based condition from the whole face condition. In the part-whole task, regression scores are computed by regressing the part condition from the whole condition. In the composite face task, regression scores are computed by regressing the misaligned task from the aligned task. A more detailed discussion about the subtraction and regression approaches is found in Degutis et al. (2013).

## **ARE FAMILIAR AND UNFAMILIAR FACES PROCESSED BY DIFFERENT MECHANISMS?**

Face recognition abilities have been measured both with famous or personally familiar faces and with unfamiliar faces. In several studies Burton et al. provided evidence that familiar faces may be processed qualitatively differently from unfamiliar faces (Jenkins and Burton, 2011). They then used an individual differences approach to provide an independent test of their theory, examining the correlations between performance on several tasks that examined matching of upright and inverted familiar and unfamiliar faces (Megreya and Burton, 2006). Whereas correlations between tasks that measured unfamiliar face matching abilities were high, relatively low correlations were found between tasks that examined matching of unfamiliar and familiar faces. Interestingly, matching of unfamiliar upright faces was highly correlated with matching of inverted familiar faces. Based on the notion that face processing mechanisms are specialized for the processing of upright but not inverted faces, the authors interpreted these findings as strong support for their theory of qualitatively different processing of familiar vs. unfamiliar faces, going so far as to suggest that "unfamiliar faces are not faces". These findings illustrate how individual differences can provide an independent test of a theory derived from group-mean studies. It is noteworthy that, unlike the lack of correlation found in matching tasks, correlations between famous and unfamiliar faces are found in a memory task (Wilmer et al., 2012). Furthermore, most prosopagnosic individuals are impaired on both familiar and unfamiliar face recognition tasks (Duchaine et al., 2007; Dalrymple et al., 2011). Future studies are now needed to determine whether these findings, both group-mean based and individual differences based, hold across a variety of face matching and face memory tasks.

## **DO COGNITIVE AND NEURAL MEASURES OF FACE PROCESSING REFLECT THE SAME MECHANISMS?**

Faces are known to elicit robust and distinct neural responses with both functional MRI and electrophysiological measures (**Figure 2**). To better understand what type of processing these neural measures reflect, it is important to determine to what extent they are associated with cognitive measures of face processing as well as the extent to which different neural measures are correlated among themselves.

One of the most well-established findings in the face processing literature is the face inversion effect, that is, the substantial drop in performance found for inverted relative to upright faces (**Figure 2A**, Yin, 1969). A difference between the group-mean responses to upright and inverted faces was found in three regions: in two face areas, the fusiform face area (FFA) and the superior temporal sulcus face area (STS-FA), the response was higher for upright than inverted faces, whereas in the lateral occipital complex (LOC) object area the response was higher for inverted than upright faces (Yovel and Kanwisher, 2005). However, a correlation between the behavioral and fMRI measures of the face inversion effect was found only with the FFA (**Figure 2D**). These findings suggest the FFA as a neural locus of the face inversion effect and highlight the importance of assessing correlations as well as differences in mean responses, because group means may be consistent with the behavioral effect but not associated with it.

The relationship between cognitive and neural measures of face processing has also been examined in a study that related different cognitive measures of face perception and memory to various face-related event-related potential (ERP) components (Herzmann et al., 2010). This study revealed moderate correlations between a cognitive measure of face processing (i.e., a combined performance score on various perception and memory tasks) and the latency of the N170, an ERP component that is much stronger to faces than other stimuli (**Figure 2B**), but no correlation with an earlier component, the P100. Moderate correlations were also found with later ERP components related to face memory or person recognition. Such studies are important in determining which aspects of face processing are reflected by different ERP components and provide converging evidence to the majority of ERP studies that employ the more common group-based analysis approach (e.g., Schweinberger et al., 2004).

## **DO EEG AND fMRI MEASURES OF FACE PROCESSING REFLECT THE SAME MECHANISMS?**

Face-selective neural responses (i.e., higher group-mean response to faces than non-faces) have been reported in hundreds of fMRI and EEG studies. However, only one study has examined the correlation across individuals between the EEG and fMRI face-selective measures. This study revealed that the magnitude of face-selectivity in both the FFA and the STS-FA was associated with the face-selectivity of the EEG response approximately 170 ms after stimulus onset (N170) (Sadeh et al., 2010; **Figure 2E**). The face-selectivity of the occipital face area (OFA) was not correlated with the face-selectivity of the N170 but was correlated with ERP face-selectivity at 100–110 ms after stimulus onset, consistent with transcranial magnetic stimulation studies that varied pulse timing (Pitcher et al., 2007, 2012). These studies nicely demonstrate how correlational analysis of EEG and fMRI can reveal temporal dissociations among different brain regions and link different brain areas to the time course of different stages of face processing. Importantly, these correlations extend group-means findings by showing which of these neural face-selective measures, which have been primarily studied separately, are strongly linked and therefore reflect the same underlying neural mechanisms of face processing.

A similar approach has been used to investigate the face inversion effect present in ERP and fMRI measures. The mid-temporal face-selective areas, the FFA and STS-FA, show a higher response to upright than inverted faces. In contrast, object-general areas (LOC) show a higher response to inverted than upright faces (Yovel and Kanwisher, 2005). The N170 is larger in amplitude and slightly delayed for inverted relative to upright faces. Two mechanisms have been suggested to account for the increased N170 amplitude to inverted faces. According to the *qualitative hypothesis*, increased amplitude for inverted faces reflects the recruitment of additional non-face mechanisms that are not used for the processing of upright faces. Thus, the increased response to inverted faces in the object area may contribute to the increased N170 amplitude to inverted faces. In contrast, the *quantitative hypothesis* suggests that the same processes generate the N170 response to upright and inverted faces but that the increased amplitude for inverted faces reflects the greater demands that inverted face processing places on face mechanisms.

[...] response to faces than non-face objects) regions in the occipital-temporal cortex: the occipital face area (OFA) in the lateral occipital cortex, the fusiform face area (FFA) in the fusiform gyrus and the superior temporal sulcus face [...] selectivity (i.e., difference in ERP or fMRI response to faces than non-faces) was found at 170 ms with the FFA and STS-FA, but not with the OFA (Sadeh et al., 2010).

To directly test these two hypotheses, the N170 and fMRI face inversion effects were measured in a simultaneous EEG-fMRI study. The N170 face inversion effect was calculated for each subject as the normalized difference between the response to upright and inverted faces (Sadeh et al., 2011). In addition, face-selective and object-general areas were localized and the difference in their response to upright and inverted faces was measured. A correlational analysis between the fMRI face inversion effect (i.e., the difference between the response to inverted and upright faces) in the object and face-selective areas and the N170 face inversion effect revealed a very strong correlation with the object areas (*r* = 0.8) but not with the face areas. These findings further support the *qualitative hypothesis*, which suggests that inverted faces engage object mechanisms that are not used for the processing of upright faces (see also, Moscovitch et al., 1997; Pitcher et al., 2011).
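The logic of this across-subject analysis can be sketched in a few lines. The text does not spell out the normalization, so the sketch assumes the common form (inverted − upright) / (inverted + upright) applied to response magnitudes; the function names and all per-subject values are hypothetical and for illustration only.

```python
from math import sqrt
from statistics import mean

def normalized_difference(inverted, upright):
    """Assumed normalization: (inv - up) / (inv + up) on response magnitudes.
    Magnitudes are used because N170 amplitudes are negative voltages."""
    a, b = abs(inverted), abs(upright)
    return (a - b) / (a + b)

def pearson_r(x, y):
    """Pearson correlation across subjects between two inversion-effect measures."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-subject responses (not data from Sadeh et al., 2011).
n170_upright = [-4.0, -5.2, -3.1, -6.0, -4.5]   # microvolts
n170_inverted = [-5.0, -6.9, -3.5, -8.1, -5.2]
loc_upright = [1.0, 1.2, 0.9, 1.1, 1.0]         # fMRI signal change in object areas
loc_inverted = [1.3, 1.7, 1.0, 1.8, 1.2]

n170_fie = [normalized_difference(i, u) for i, u in zip(n170_inverted, n170_upright)]
loc_fie = [normalized_difference(i, u) for i, u in zip(loc_inverted, loc_upright)]
r_object = pearson_r(n170_fie, loc_fie)
```

The same computation would be repeated with the face-selective areas in place of the object areas, and the two correlations compared.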

These simultaneous fMRI-EEG studies nicely demonstrate how combining the two methods can provide insight into the temporal characteristics of brain areas and the possible neural generators of ERP signals. In particular, the correlations between the face-selective measures indicate an earlier latency for the OFA than for the mid-temporal face areas. The face inversion effect study attributed the increased N170 amplitude for inverted faces to object areas rather than to the nearby face areas, a finding that cannot be obtained from source localization analysis of EEG data alone. These findings therefore not only further establish the link between the ERP and fMRI face markers but also enhance our understanding of the spatial-temporal architecture of the face system.

### **HOW SPECIFIC IS FACE RECOGNITION ABILITY?**

In the sections above, we have explored how individual differences can link and dissociate mechanisms *within* the domain of face processing. Individual differences can also reveal links and dissociations *between* face processing and other cognitive abilities.

An active line of research has recently revealed that face recognition is an uncommonly specific ability. In the psychometric literature, the term *specific* is typically applied to an ability that shows some degree of independence from general intelligence (Wai et al., 2009). By this definition, face recognition is exceptionally specific. To date, its mean reported correlation with measures of general intelligence, weighted for sample size and corrected for range restriction in the IQ measures, is 0.01 (Davis et al., 2011; Peterson and Miller, 2012; Palermo et al., 2013).
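How such a summary correlation can be assembled is sketched below. The text does not state which range-restriction correction was applied, so the sketch assumes the standard correction for direct range restriction (often called Thorndike's Case II) followed by a sample-size-weighted mean; the function names and all input values are hypothetical.

```python
from math import sqrt

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Correct an observed correlation for direct range restriction.
    u is the ratio of the population SD to the sample SD of the restricted variable."""
    u = sd_unrestricted / sd_restricted
    return r * u / sqrt(1 + r * r * (u * u - 1))

def weighted_mean_r(rs, ns):
    """Mean correlation across studies, weighted by sample size."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

# Hypothetical studies: observed r with IQ, sample size, and sample IQ SD.
observed = [0.05, -0.03, 0.02]
sizes = [120, 300, 80]
sample_sds = [10.0, 12.0, 9.0]
POPULATION_SD = 15.0  # conventional IQ standard deviation

corrected = [correct_range_restriction(r, POPULATION_SD, sd)
             for r, sd in zip(observed, sample_sds)]
summary_r = weighted_mean_r(corrected, sizes)
```

The correction increases the magnitude of each correlation when the sample is less variable than the population, so a near-zero summary value after correction is a stronger statement than a near-zero uncorrected value.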

[...] CFMT are very different tests. FFMT measures the ability to name faces stored incidentally in memory over months or years of cultural exposure. [...] memory shortly before being tested. This dissociation demonstrates that CFMT captures a specific recognition capacity.

Moreover, face recognition dissociates strongly even from other types of recognition memory. For example, in diverse samples totaling over 4000 participants, well-validated tests of verbal and non-face visual recognition ability, respectively, explained only about 3% and 7% of the variance in face recognition ability (Wechsler, 1997; Wilmer et al., 2010, 2012; **Figure 3**). These findings are consistent with several reports that show that neuropsychological cases sometimes show selective impairment or sparing of face recognition (Moscovitch et al., 1997; Duchaine et al., 2006; Rezlescu et al., 2012; Busigny et al., 2014).

Ironically, the history of psychometric ability testing has seen face recognition ability dropped at least twice from prominent test development efforts when its pervasive dissociations from other social and memory abilities were mistaken for lack of valid measurement (Thorndike, 1936; Kihlstrom and Cantor, 2000; Holdnack and Delis, 2004; Wilmer et al., 2012). Only in recent years has it been discovered that face recognition's dissociations from other abilities reflect a valid, unique dimension of human ability (Wilhelm et al., 2010; Wilmer et al., 2010, 2012; Hildebrandt et al., 2011; McGugin et al., 2012; **Figure 3**). Guided by the example of face recognition, we suggest that an individual differences approach be used to further define the cognitive and neural boundaries of face processing, as well as to search for other unique social and cognitive ability dimensions.

## **HOW IS FACE RECOGNITION ABILITY SHAPED BY GENES AND ENVIRONMENT?**

Individual differences in face processing abilities provide a powerful vehicle for exploring the contributions of genes and environments to face processing via twin and family studies. A recent twin study showed that face recognition in adults is highly heritable (Wilmer et al., 2010). Correlations between identical twins on the CFMT (0.70) were twice those of fraternal twins (0.29), evidence that the strong family resemblance for face recognition ability resulted from genetic factors rather than common environmental factors (**Figure 4**).
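The reasoning from twin correlations to genetic influence can be illustrated with the classical Falconer approximation. This is a minimal sketch of that textbook approximation, not necessarily the model fitted by Wilmer et al. (2010); the function name is hypothetical, and the two correlations are the ones quoted above.

```python
def falconer_estimates(r_mz, r_dz):
    """Classical Falconer approximation from twin correlations.
    h2: additive genetic variance, c2: shared (common) environment,
    e2: non-shared environment plus measurement error."""
    h2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return h2, c2, e2

# Twin correlations on the CFMT quoted in the text.
h2, c2, e2 = falconer_estimates(r_mz=0.70, r_dz=0.29)
```

With these inputs the approximation gives a heritability of roughly 0.82 and a shared-environment estimate of roughly -0.12. A slightly negative estimate is usually read as no detectable contribution of the common environment, which matches the interpretation given in the text.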

The combination of face recognition's uncommon specificity and high heritability runs counter to a classic finding in behavioral genetics that more specific abilities are less heritable (Plomin et al., 2013; see section above on face recognition's specificity). That classic finding inspired a prominent theory that the majority of genetic variance in any cognitive ability is attributable to general intelligence (Kovas and Plomin, 2006). Face recognition presents a clear exception to that theory. Further, face recognition's heritability suggests that it could be used as a model system to study cognitive and neural resilience to environmental variation. Despite its strong dissociation from general intelligence, face recognition may be similar to general intelligence in showing increasing heritability with age (Wilmer et al., 2010; Zhu et al., 2010). If so, then increasing heritability may be a more generalized principle of development than previously recognized.

Future work is needed to determine whether adult face recognition can be parsed into heritable subcomponents. One twin study found a non-zero genetic contribution to the composite face effect, but not to the part-whole effect, suggesting a relatively constrained role of holistic face processing mechanisms in face recognition's heritability (Zhu et al., 2010). Future work could also explore the specific genetic and environmental mechanisms by which a broad natural tendency for relatively good or poor face recognition ability is expressed. A richer understanding of such mechanisms might inspire novel interventions to enhance, or accommodations to support, the important social task of recognizing others in our everyday lives. Moreover, a more detailed understanding of the reasons for face recognition's high degree of resilience to environmental variation might fuel efforts to maximize neural and cognitive resilience in other domains as well.

## **CONCLUSION**

This review demonstrates how assessment of associations and dissociations among measures of face processing by an individual differences approach can provide answers to basic questions about face processing mechanisms. The questions tackled by the examples in our review address the nature of various face processing mechanisms, their relationship to each other, and their relationship to broader aspects of cognition. Many questions still await such investigations. These questions include: what associations and dissociations exist between additional face processing mechanisms, including those used to glean age, gender, and attractiveness? What are the detailed neural and genetic mechanisms of each aspect of face processing? What plasticity exists, at what ages, and what are the practical correlates of being good or bad at face processing? How do aspects of face processing beyond face recognition relate to a broader array of human capacities?

Most existing work on individual differences in face processing has aimed to isolate broad patterns of association and dissociation among abilities, or between abilities and their underlying mechanisms. Much work remains to be done at this relatively coarse level of analysis. At the same time, there is a need to begin digging deeper, making increasingly fine-grained theoretical distinctions about the specific neural, cognitive, genetic, and environmental mechanisms that shape and constitute such broad associations and dissociations. Fine-grained work of this sort is both promising and challenging; it requires (a) a greater number of high-quality tests; (b) more highly multivariate statistical models; and (c) a larger number of participant-hours. Each such requirement comes with its own costs and complications; however, all can be overcome for fine-grained questions of sufficient theoretical or practical import.

As this review indicates, correlational analyses not only expand our methodological and statistical armory but also force consideration of the theoretical meaning of our measures in a way that a group-mean approach may not. The individual differences approach can therefore provide valuable information that complements and extends the inferences supported by the commonly used group-mean approach. We anticipate this approach will be as fruitful in other domains of cognitive science as it has been, and will likely continue to be, in the study of face processing.

**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 April 2014; accepted: 10 July 2014; published online: 19 August 2014*. *Citation: Yovel G, Wilmer JB and Duchaine B (2014) What can individual differences reveal about face processing? Front. Hum. Neurosci. 8:562. doi: 10.3389/fnhum.2014.00562*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Yovel, Wilmer and Duchaine. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Social and emotional relevance in face processing: happy faces of future interaction partners enhance the late positive potential

## **Florian Bublatzky\*, Antje B. M. Gerdes, Andrew J. White, Martin Riemer and Georg W. Alpers**

School of Social Sciences, Clinical Psychology, Biological Psychology, and Psychotherapy, University of Mannheim, Mannheim, Germany

#### **Edited by:**

Aina Puce, Indiana University, USA

**Reviewed by:**

Gilles Pourtois, University of Ghent, Belgium Elizabeth Bendycki DaSilva, Indiana University, USA

#### **\*Correspondence:**

Florian Bublatzky, School of Social Sciences, Clinical Psychology, Biological Psychology, and Psychotherapy, University of Mannheim, L13,17, 68131 Mannheim, Germany e-mail: f.bublatzky@uni-mannheim.de

Human face perception is modulated by both emotional valence and social relevance, but their interaction has rarely been examined. Event-related brain potentials (ERP) to happy, neutral, and angry facial expressions with different degrees of social relevance were recorded. To implement a social anticipation task, relevance was manipulated by presenting faces of two specific actors as future interaction partners (socially relevant), whereas two other face actors remained non-relevant. In a further control task all stimuli were presented without specific relevance instructions (passive viewing). Face stimuli of four actors (2 women, from the KDEF) were randomly presented for 1s to 26 participants (16 female). Results showed an augmented N170, early posterior negativity (EPN), and late positive potential (LPP) for emotional in contrast to neutral facial expressions. Of particular interest, face processing varied as a function of experimental tasks. Whereas task effects were observed for P1 and EPN regardless of instructed relevance, LPP amplitudes were modulated by emotional facial expression and relevance manipulation. The LPP was specifically enhanced for happy facial expressions of the anticipated future interaction partners. This underscores that social relevance can impact face processing already at an early stage of visual processing. These findings are discussed within the framework of motivated attention and face processing theories.

**Keywords: face processing, social interaction, emotion, anticipation, ERP**

## **INTRODUCTION**

Humans are intrinsically social. From an evolutionary perspective, social information is critical for survival as it contributes to successful commitment, procreation and preservation (Tooby and Cosmides, 1992; Brothers, 2002). Thus, conspecifics are primary elicitors of emotions designed to promote both affiliation and protection in the face of constantly changing environmental conditions (Keltner and Kring, 1998). Accordingly, viewing facial stimuli is highly informative and mediates perceptual, physiological, and behavioral responses (Hamm et al., 2003; Vuilleumier and Pourtois, 2007).

To investigate the link between social and emotional information processing, the present study focuses on the social relevance of facial pictures. Human faces contain salient social signals mediating information about one's own and the others' identity, emotional state, and intentions (Ekman and Friesen, 1975; Öhman, 1986). The neural signature of face processing has been outlined in recent research (Haxby et al., 2002; Adolphs and Spezio, 2006). Given the crucial importance of being able to efficiently read and understand facial expressions, it has been proposed that distinct brain structures are centrally involved in face processing (e.g., fusiform face area (FFA), superior temporal sulcus (STS); Kanwisher et al., 1997; but see Chao et al., 1999). In addition, research has identified several neural substrates involved in both emotional and social processes (e.g., amygdala, insula, medial prefrontal cortex (MPFC); Gusnard et al., 2001; Norris et al., 2004; Williams et al., 2004; Northoff et al., 2006; Schmitz and Johnson, 2007; Olsson and Ochsner, 2008; Sabatinelli et al., 2011). In order to adequately interact in social situations, observing emotional facial expressions facilitates perceptual, attentional, and behavioral responses (Alpers and Gerdes, 2007; Alpers et al., 2011). For instance, in visual search tasks, threatening (schematic) faces are detected more quickly than friendly or neutral target faces, especially among highly anxious participants (Byrne and Eysenck, 1995; Öhman et al., 2001). In line with an evolutionary perspective, this processing advantage has been described specifically for angry and fearful faces mediating potential threat to the observer (Byrne and Eysenck, 1995; Whalen et al., 2001).

Electrophysiological measures are particularly well-suited to investigate the temporal dynamics of face processing. Event-related brain potential (ERP) studies have revealed processing differences for facial stimuli within the first 100 ms after stimulus onset. For instance, suggested to reflect attention gain control in extrastriate sources, enhanced P1 amplitudes were observed for fearful compared to neutral faces in visuo-spatial attention tasks (Pourtois et al., 2004). Further, temporo-occipital negativities have been shown to be sensitive to facial stimuli (N170; Bentin et al., 1996) and emotional facial expression (early posterior negativity, EPN; Schupp et al., 2004). The N170 is probably the most frequently investigated ERP component in face processing. It has been primarily related to the structural encoding of faces in temporo-occipital processing areas; for instance, as evidenced by studies manipulating structural features (e.g., face inversion; Itier and Taylor, 2002), presentation of specific face and body parts (e.g., only eye region; Itier et al., 2006), and spatial attention tasks (Holmes et al., 2003; Jacques and Rossion, 2007). Regarding the emotional state and intentions conveyed by facial expression, an early posterior negativity (occipito-temporal EPN; 150–300 ms) and late positive potentials (centro-parietal LPP; 300–700 ms) have been observed for angry as compared to neutral faces, but also for happy faces (Sato et al., 2001; Liddell et al., 2004; Schupp et al., 2004; Williams et al., 2006; Holmes et al., 2008; but see Wangelin et al., 2012). Further, these effects were more pronounced in socially anxious participants (Moser et al., 2008; Sewell et al., 2008; but see Mühlberger et al., 2009) and participants undergoing socially-mediated aversive anticipation (Wieser et al., 2010; Bublatzky and Schupp, 2012). Of particular interest, enhanced LPP amplitudes have been observed for neutral expressions of primed familiar faces (Schweinberger et al., 2002; Kaufmann et al., 2008), and when faces are high in social relevance (e.g., romantic partner, family members; Guerra et al., 2012).

The effects of emotional stimulus content on attention have also been documented with a variety of other visual stimuli (e.g., pictures of naturalistic scenes, words, and hand gestures; Schupp et al., 2006a; Kissler et al., 2007; Flaisch et al., 2009; Schacht and Sommer, 2009). Further, EPN and LPP components were found to vary as a function of emotional arousal (i.e., pronounced EPN/LPP for highly emotionally arousing pictures; Schupp et al., 2006a), and the LPP appeared sensitive to emotion regulation (Hajcak et al., 2010; Thiruchselvam et al., 2011). In addition, both ERP components have been observed to occur spontaneously during passive picture viewing and during performance of concurrent explicit attention tasks (Schupp et al., 2006a; Pourtois et al., 2013). These results are in line with those of several neuroimaging studies (i.e., showing increased BOLD responses in distributed occipital, parietal and inferior temporal networks; Junghöfer et al., 2006; Sabatinelli et al., 2011) and studies that have shown clear differences in autonomic and reflex activity for emotional compared to neutral stimuli (e.g., Bradley et al., 2001). In sum, there is ample evidence supporting the notion that EPN and LPP components reflect motivationally guided selective attention to significant stimuli (Schupp et al., 2006a).

Building on these findings, the present study examined the joint effects of social relevance of facial stimuli and the displayed emotional expressions. Using an instructional learning paradigm, participants were informed that they would later be introduced to the person presented in a specific face picture. Thus, these faces acquired social relevance by virtue of being potential interaction partners and were contrasted with other non-relevant face actors. Furthermore, the manipulation of facial expressions (happy, neutral, angry) allowed us to model the emotional valence and arousal of the anticipated social situation. In light of previous research on face processing, electrocortical processing was hypothesized to differentiate socially relevant from non-relevant faces (enhanced EPN/LPP). Further, valence effects are proposed to account for prioritized emotion processing (e.g., threat- or happy-advantage; Schupp et al., 2004; Williams et al., 2006). For instance, based on higher motivational impact, emotional compared to neutral face processing may benefit from additional social relevance as reflected by enhanced LPP amplitudes (Schupp et al., 2007). Integrating different experimental paradigms and methodologies, the present study constitutes a new experimental approach to examine the mutual impact of social and emotional processes by means of the anticipation of a socially relevant situation.

## **METHODS**

## **PARTICIPANTS**

The sample consisted of 26 healthy volunteers (16 females) aged between 19 and 34 years (*M* = 23, *SD* = 4.3) and recruited from the University of Mannheim (STAI-State *M* = 35.3, *SD* = 4.6; STAI-Trait *M* = 38.8, *SD* = 7.7; SIAS *M* = 16, *SD* = 7.4; FNE brief version *M* = 33.9, *SD* = 7.8). All participants were informed about the study protocol before providing informed consent in accordance with the university's ethics guidelines. Participants received course credits for their participation.

### **MATERIALS AND PRESENTATION**

Happy, neutral, and angry facial expressions of 4 different face actors (2 female) were selected from the Karolinska Directed Emotional Faces (KDEF; Lundqvist et al., 1998).<sup>1</sup> Pictures (1024 × 768 pixels) were randomly presented for 1 s without interstimulus gaps (see **Figure 1**). The full set of pictures (*N* = 12) was presented 60 times in each of two separate blocks, so that each block consisted of 720 trials. The first block served as a control condition without specific instructions (passive viewing task). For the second block (meet task), two specific face actors (1 female and 1 male) were introduced as future interaction partners. Accordingly, two face actors were instructed as relevant whereas the other two face actors were non-relevant with respect to future interaction. Assignment of face stimuli to the relevant/non-relevant condition was counterbalanced across participants. Within blocks, face pictures were presented in a different order for each participant. Accounting for potential repetition effects (see Flaisch et al., 2008), picture randomization was restricted to no more than three repetitions of the same facial expression, equal transition probabilities between facial expressions and face actors, and no immediate repetition of the same actor displaying the same facial expression. Pictures were presented on a 22-inch computer screen located approximately 1 meter in front of the participants.
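The restricted randomization described above can be illustrated with a short sketch. It is not the authors' procedure: it assumes that "no more than three repetitions" means at most three consecutive trials with the same expression, it does not enforce equal transition probabilities, and the names `admissible` and `make_block` are hypothetical. Actor codes are taken from the KDEF identifiers in footnote 1.

```python
import random

ACTORS = ["af20", "af25", "am10", "am25"]    # KDEF actors listed in footnote 1
EXPRESSIONS = ["happy", "neutral", "angry"]
REPEATS = 60                                  # presentations per picture and block
MAX_RUN = 3                                   # assumption: at most 3 consecutive trials per expression


def admissible(seq, pic):
    """Check the two local constraints for appending `pic` to `seq`."""
    if seq and seq[-1] == pic:                # same actor with same expression back to back
        return False
    recent = seq[-MAX_RUN:]
    if len(recent) == MAX_RUN and all(p[1] == pic[1] for p in recent):
        return False                          # would create a run longer than MAX_RUN
    return True


def make_block(rng):
    """Draw one block order by weighted sampling, restarting on dead ends."""
    pictures = [(a, e) for a in ACTORS for e in EXPRESSIONS]
    while True:
        remaining = {p: REPEATS for p in pictures}
        seq = []
        while len(seq) < len(pictures) * REPEATS:
            options = [p for p in pictures if remaining[p] > 0 and admissible(seq, p)]
            if not options:
                break                         # dead end: discard and start over
            weights = [remaining[p] for p in options]
            pic = rng.choices(options, weights=weights, k=1)[0]
            seq.append(pic)
            remaining[pic] -= 1
        else:
            return seq                        # all 720 trials placed


block = make_block(random.Random(1))
print(len(block))  # 720
```

Weighting candidates by their remaining count keeps the pictures roughly balanced throughout the block, which makes dead ends near the end of the sequence uncommon.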

#### **PROCEDURE**

After the EEG sensor net was attached, participants were seated in a dimly-lit and sound-attenuated room. During a practice run (12 picture trials), participants were familiarized with the picture viewing procedure. In the following passive viewing task, participants were instructed to attend to each picture appearing on the screen. Before the meet task, instructions were given concerning the relevance of face stimuli by indicating who were the relevant and non-relevant face actors. With respect to the kind of interaction situation, the meet instruction was deliberately kept vague and neutral ("You are going to meet one of these two people at the end of the experiment"). After each block, valence and arousal ratings of the picture set were assessed using the paper-pencil version of the self-assessment manikin (SAM; Bradley and Lang, 1994). At the end of the experiment, a debriefing interview was completed.

<sup>1</sup>KDEF identifiers: actor 1: af20has, af20nes, af20ans; actor 2: af25has, af25nes, af25ans; actor 3: am10has, am10nes, am10ans; actor 4: am25has, am25nes, am25ans.

#### **EEG RECORDING**

Electrophysiological data were collected using a 64-channel actiCap system (BrainProducts, Munich, Germany) with Ag/AgCl active electrodes mounted into a cap according to the 10–10 system (Falk Minow Services, Herrsching, Germany). The EEG was recorded continuously with a sampling rate of 500 Hz with FCz as the reference electrode, and filtered on-line from 0.1–100 Hz using VisionRecorder acquisition software and BrainAmp DC amplifiers (BrainProducts, Munich, Germany). Impedances were kept below 10 kΩ. Off-line analyses were performed using VisionAnalyzer 2.0 (BrainProducts) and EMEGS (Peyk et al., 2011) and included low-pass filtering at 30 Hz, artifact detection, sensor interpolation, baseline-correction, and conversion to an average reference (Junghöfer et al., 2000). Stimulus-synchronized epochs were extracted and lasted from 100 ms before to 800 ms after stimulus onset. Finally, separate average waveforms were calculated for the experimental conditions Facial Expression (happy, neutral, angry), Task (passive, meet), and Relevance (relevant, non-relevant),<sup>2</sup> for each sensor and participant.
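As a rough illustration of three of the off-line steps named above (epoch extraction, baseline correction, and conversion to an average reference), the following sketch operates on plain Python lists. It is not the VisionAnalyzer or EMEGS implementation; the function names, the order of the steps, and the toy data are assumptions made for the example. Only the sampling rate and epoch limits come from the text.

```python
SFREQ = 500                 # Hz, as reported
PRE_MS, POST_MS = 100, 800  # epoch limits relative to stimulus onset


def ms_to_samples(ms, sfreq=SFREQ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * sfreq / 1000))


def cut_epoch(data, onset):
    """data: one list of samples per sensor; onset: stimulus onset sample index."""
    start = onset - ms_to_samples(PRE_MS)
    stop = onset + ms_to_samples(POST_MS)
    return [ch[start:stop] for ch in data]


def baseline_correct(epoch, n_base):
    """Subtract each sensor's mean over the first n_base (pre-stimulus) samples."""
    out = []
    for ch in epoch:
        base = sum(ch[:n_base]) / n_base
        out.append([v - base for v in ch])
    return out


def average_reference(epoch):
    """Subtract the mean across sensors at every time point."""
    n_ch = len(epoch)
    means = [sum(col) / n_ch for col in zip(*epoch)]
    return [[v - m for v, m in zip(ch, means)] for ch in epoch]


# Toy recording: 3 sensors, 1000 samples, with a sensor-specific offset and drift.
data = [[(ch + 1) * 0.5 + 0.01 * t * (ch + 1) for t in range(1000)] for ch in range(3)]
epoch = cut_epoch(data, onset=200)
epoch = baseline_correct(epoch, ms_to_samples(PRE_MS))
epoch = average_reference(epoch)
print(len(epoch), len(epoch[0]))  # 3 450
```

At 500 Hz, the 100 ms baseline corresponds to 50 samples and the 800 ms post-stimulus interval to 400 samples, giving 450 samples per epoch.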

## **DATA REDUCTION AND ANALYSES**

#### **Self-report data**

Valence and arousal ratings were analyzed with repeated measures ANOVAs including the factors Facial Expression (happy, neutral, angry), Task (passive, meet), and Relevance (relevant, non-relevant).

## **Event-related potentials**

To examine the effects of facial expression, instructed task, and relevance on face processing, a two-step procedure was used. As a first step, visual inspection and single sensor waveform analysis were used in concert to identify relevant ERP components. To this end, single sensor waveform analyses were calculated for each time point and each sensor separately (see Peyk et al., 2011) for the factors Facial Expression (happy, neutral, angry), Task (passive, meet), and Relevance (relevant, non-relevant). To correct for multiple testing, effects were only considered meaningful when the effects were observed for at least eight continuous data points and two neighboring sensors (cf., Bublatzky and Schupp, 2012). Supporting this cluster selection procedure, visual inspection helped ensure that no effects relevant to the main hypothesis regarding the interaction between Facial Expression, Task, and Relevance were missed.
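The extent criterion can be sketched as follows. This is a minimal illustration, not the EMEGS routine: it assumes that an effect counts only when a sensor and at least one adjacent sensor are significant together for eight or more consecutive samples (16 ms at 500 Hz), and the adjacency map, sensor names in the demo, and function names are hypothetical.

```python
MIN_RUN = 8  # consecutive samples; 16 ms at the 500 Hz sampling rate


def runs(mask, min_len=MIN_RUN):
    """Return (start, stop) pairs of True-runs that are at least min_len long."""
    found, start = [], None
    for i, flag in enumerate(list(mask) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                found.append((start, i))
            start = None
    return found


def meaningful_effects(sig, neighbors):
    """sig: sensor -> list of booleans per time point (point-wise test significant).
    neighbors: sensor -> list of adjacent sensors (hypothetical adjacency map)."""
    result = {}
    for sensor, mask in sig.items():
        spans = []
        for other in neighbors.get(sensor, []):
            joint = [a and b for a, b in zip(mask, sig[other])]
            spans.extend(runs(joint))
        if spans:
            result[sensor] = sorted(set(spans))
    return result


n = 50
sig = {
    "PO9": [10 <= t < 20 for t in range(n)],   # 10 significant samples
    "O1": [12 <= t < 22 for t in range(n)],    # overlaps PO9 for 8 samples
    "PO10": [30 <= t < 35 for t in range(n)],  # too short to count
}
neighbors = {"PO9": ["O1"], "O1": ["PO9"], "PO10": ["O1"]}
print(meaningful_effects(sig, neighbors))
```

In this toy example only PO9 and O1 are retained, for the eight samples in which both are significant.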

Following this, conventional ERP analyses were based on area scores. Repeated measures ANOVAs based on mean activity in selected sensor clusters and time windows were performed. The P1 component was scored over parieto-occipital clusters (left: O1, PO3; right: O2, PO4) between 100 and 140 ms after picture onset. The N170 was scored at P7 and P8 between 150 and 200 ms. The EPN component was scored at bilateral posterior sensors (PO9 and PO10) between 260 and 360 ms after stimulus onset. To account for the broad distribution of the LPP component, mean activity was scored in bilateral centro-parietal clusters (left: FC1, C1, CP1, P1; right: FC2, C2, CP2, P2) in a time window from 450–700 ms.
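A mean-amplitude area score of this kind can be sketched in a few lines. The windows and clusters below are those given in the text; the left/right assignment of P7/P8 and PO9/PO10 follows the 10–10 convention (odd numbers left, even numbers right). Treating the window as half-open, and the names `ms_to_index` and `area_score`, are assumptions of the example.

```python
SFREQ = 500            # Hz
EPOCH_START_MS = -100  # epochs begin 100 ms before stimulus onset

COMPONENTS = {
    "P1": {"window": (100, 140), "left": ["O1", "PO3"], "right": ["O2", "PO4"]},
    "N170": {"window": (150, 200), "left": ["P7"], "right": ["P8"]},
    "EPN": {"window": (260, 360), "left": ["PO9"], "right": ["PO10"]},
    "LPP": {"window": (450, 700),
            "left": ["FC1", "C1", "CP1", "P1"],
            "right": ["FC2", "C2", "CP2", "P2"]},
}


def ms_to_index(ms):
    """Sample index within the epoch for a latency given in ms after onset."""
    return int(round((ms - EPOCH_START_MS) * SFREQ / 1000))


def area_score(erp, sensors, window):
    """Mean amplitude over the given sensors and the half-open window [start, end)."""
    lo, hi = ms_to_index(window[0]), ms_to_index(window[1])
    values = [v for s in sensors for v in erp[s][lo:hi]]
    return sum(values) / len(values)


# Toy average waveform: flat except for a left-hemisphere deflection in the P1 window.
n_samples = ms_to_index(800)
erp = {s: [0.0] * n_samples for s in ["O1", "PO3", "O2", "PO4"]}
lo, hi = ms_to_index(100), ms_to_index(140)
erp["O1"][lo:hi] = [2.0] * (hi - lo)
erp["PO3"][lo:hi] = [4.0] * (hi - lo)

p1 = COMPONENTS["P1"]
print(area_score(erp, p1["left"], p1["window"]), area_score(erp, p1["right"], p1["window"]))  # 3.0 0.0
```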

An overall multivariate ANOVA tested interaction effects between Facial Expression (happy, neutral, angry), Task (passive, meet), Relevance (relevant, non-relevant), and Laterality (left, right) as a function of ERP Component (P1, N170, EPN, LPP) using Wilks statistics. Significant main effects were observed for Component, *F*(3,23) = 30.72, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.80, Facial Expression, *F*(2,24) = 11.78, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.50, Task, *F*(1,25) = 8.51, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.25, but not for Relevance, *F*(1,25) = 0.15, *p* = 0.70, η<sub>p</sub><sup>2</sup> = 0.01, or Laterality, *F*(1,25) = 3.35, *p* = 0.08, η<sub>p</sub><sup>2</sup> = 0.12. Of particular importance, higher-order interactions were revealed for Component by Facial Expression, *F*(6,20) = 9.81, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.75, and Component by Task, *F*(3,23) = 16.69, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.69. Directly testing the interaction between the three task-sensitive ERP components (P1, EPN, LPP) revealed significant variation of Component as a function of Facial Expression, *F*(4,22) = 13.23, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.71, and Task, *F*(2,24) = 26.13, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.69. To follow up on these interactions, separate repeated measures ANOVAs including the factors Facial Expression, Task, Relevance, and Laterality were conducted for each ERP component.

<sup>2</sup>Note: As the passive viewing task did not contain a relevance manipulation, an artificial data split was undertaken to adjust factor structure (i.e., each 1 male and 1 female face actor were assigned artificially to relevant/non-relevant condition).

For effects involving repeated measures, the Greenhouse-Geisser procedure was used to correct for violations of sphericity, and partial η<sup>2</sup> (η<sub>p</sub><sup>2</sup>) is reported as a measure of effect size. To control for type 1 error, Bonferroni correction was applied for *post hoc t*-tests.
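For readers who wish to check the reported effect sizes, partial η<sup>2</sup> can be recovered from an *F* value and its degrees of freedom as *F* · df<sub>effect</sub> / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The sketch below applies this standard identity together with a plain Bonferroni adjustment. The function names are illustrative, and the *p*-values passed to `bonferroni` are made-up inputs, not results from this study.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Main effect of Task in the overall analysis: F(1,25) = 8.51
print(round(partial_eta_squared(8.51, 1, 25), 2))  # 0.25
# Facial Expression effect on the P1: F(2,50) = 7.44
print(round(partial_eta_squared(7.44, 2, 50), 2))  # 0.23

# Hypothetical p-values for three post hoc comparisons
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Both recomputed values agree with the effect sizes reported for these tests.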

## **RESULTS**

#### **SELF-REPORT DATA**

Overall, valence ratings differed significantly for Facial Expression, *F*(2,48) = 308.06, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.93. Happy facial expressions (*M* = 7.86, *SD* = 0.16) were rated as more pleasant than neutral and angry faces (*M* = 5.15 and 2.44, *SD* = 0.16 and 0.14), *p*s < 0.001, and neutral as more pleasant than angry faces, *p* < 0.001. Although a marginally significant main effect of Task, *F*(1,24) = 3.78, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.14, indicated that faces were rated as more pleasant during the meet task, the interaction Facial Expression by Task was not significant, *F*(2,48) = 1.34, *p* = 0.27, η<sub>p</sub><sup>2</sup> = 0.05. Neither instructed Relevance, *F*(1,24) = 0.57, *p* = 0.27, η<sub>p</sub><sup>2</sup> = 0.05, nor any higher-order interaction reached significance, *F*s < 1, *p*s > 0.70, η<sub>p</sub><sup>2</sup> < 0.01.

Arousal ratings varied for Facial Expression, *F*(2,48) = 45.82, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.66. Both happy and angry facial expressions (*M* = 4.21 and 5.81, *SD* = 0.38 and 0.33) were rated as more arousing than neutral expressions (*M* = 2.62, *SD* = 0.29), *p*s < 0.001, and angry faces as more arousing than happy expressions, *p* < 0.01. No main effects were observed for Task or Relevance, *F*s(1,24) = 2.79 and 1.67, *p*s = 0.11 and 0.21, η<sub>p</sub><sup>2</sup> = 0.10 and 0.07. However, arousal ratings varied as a function of Facial Expression by Task, *F*(2,48) = 6.28, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.21. To follow up on the differential impact of the passive viewing and meet tasks, facial expressions were tested separately.

Happy face pictures were rated as more arousing during the passive viewing than during the meet task, *F*(1,24) = 14.32, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.37. Neither Relevance, *F*(1,24) = 0.01, *p* = 0.93, η<sub>p</sub><sup>2</sup> < 0.01, nor the interaction Task by Relevance reached significance, *F*(1,24) = 1.08, *p* = 0.31, η<sub>p</sub><sup>2</sup> = 0.04. Similarly, angry faces were rated higher in arousal during the passive viewing compared to the meet task, *F*(1,24) = 7.48, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.24. Neither Relevance, *F*(1,24) = 3.09, *p* = 0.09, η<sub>p</sub><sup>2</sup> = 0.11, nor Task by Relevance, *F*(1,24) = 0.13, *p* = 0.72, η<sub>p</sub><sup>2</sup> < 0.01, reached significance. In contrast, arousal ratings for neutral faces did not vary by Task, *F*(1,24) = 1.0, *p* = 0.33, η<sub>p</sub><sup>2</sup> = 0.04, Relevance, *F*(1,24) = 0.12, *p* = 0.73, η<sub>p</sub><sup>2</sup> = 0.01, or Task by Relevance, *F*(1,24) < 0.01, *p* = 1.0, η<sub>p</sub><sup>2</sup> < 0.01.

## **EVENT-RELATED POTENTIALS**

Results indicated that verbal instructions about future interaction partners modulated early and late face processing as revealed by enhanced P1, EPN, and LPP amplitudes (**Figures 2**, **3**). Further, the interaction of social and emotional relevance varied across the visual processing stream. Whereas early components revealed independent main effects of Facial Expression and Task (shown by P1, N170, and EPN), the LPP was markedly augmented for happy faces considered as future interaction partners (**Figure 4**).

#### **P1 component**

Enhanced P1 amplitude for the meet compared to the passive viewing task reached marginal significance, *F*(1,25) = 3.57, *p* = 0.07, η<sub>p</sub><sup>2</sup> = 0.13; however, instructed Relevance did not increase P1 amplitude, *F*(1,25) = 0.01, *p* = 0.92, η<sub>p</sub><sup>2</sup> < 0.01. Further, emotional Facial Expression modulated the P1 component, *F*(2,50) = 7.44, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.23. Follow-up tests revealed that amplitudes were more pronounced for angry facial expressions compared to neutral and happy faces, *F*s(1,25) = 8.71 and 10.72, *p*s < 0.01, η<sub>p</sub><sup>2</sup> = 0.26 and 0.30. The difference between happy and neutral facial expressions was not statistically significant, *F*(1,25) = 0.70, *p* = 0.41, η<sub>p</sub><sup>2</sup> = 0.03. The P1 amplitude was more pronounced over the right hemisphere, *F*(1,25) = 8.79, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.26. No further interactions including Facial Expression, Task or Relevance reached statistical significance, *F*s < 1.76, *p*s > 0.18, η<sub>p</sub><sup>2</sup> < 0.07.

### **N170 component**

Whereas Task and Relevance did not modulate the N170, *F*s(1,25) = 0.31 and 0.33, *p*s = 0.58 and 0.57, η<sub>p</sub><sup>2</sup> = 0.01 and 0.01, amplitudes varied as a function of Facial Expression, *F*(2,50) = 9.23, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.27. The N170 was more pronounced for both happy and angry faces compared to neutral facial expressions, *F*s(1,25) = 26.70 and 3.98, *p* < 0.001 and *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.52 and 0.14. The difference between happy and angry faces reached marginal significance, *F*(1,25) = 4.02, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.14. No main effect of Laterality was observed, *F*(1,25) = 2.75, *p* = 0.11, η<sub>p</sub><sup>2</sup> = 0.10, and no interaction including Facial Expression, Task, and Relevance reached statistical significance, *F*s < 0.71, *p*s > 0.48, η<sub>p</sub><sup>2</sup> < 0.03.

#### **Early posterior negativity**

More pronounced negativity was observed for the meet compared to the passive viewing task, *F*(1,25) = 43.61, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.64; however, the relevance instruction did not modulate the EPN, *F*(1,25) = 0.80, *p* = 0.38, η<sub>p</sub><sup>2</sup> = 0.03. Replicating previous findings, the EPN amplitude varied as a function of Facial Expression, *F*(2,50) = 16.28, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.39. Happy and angry face processing was associated with enlarged EPN amplitudes compared to neutral stimuli, *F*s(1,25) = 37.91 and 10.94, *p*s < 0.001 and 0.01, η<sub>p</sub><sup>2</sup> = 0.60 and 0.30. Further, the EPN was more pronounced for happy compared to angry faces, *F*(1,25) = 5.05, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.17. In addition, more pronounced negativities were observed over the left in contrast to the right hemisphere, *F*(1,25) = 12.30, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.33. No further interactions including Facial Expression, Task, and Relevance reached statistical significance, *F*s < 1.1, *p*s > 0.33, η<sub>p</sub><sup>2</sup> < 0.05.

#### **Late positive potential**

Broadly distributed LPPs were modulated by Task, *F*(1,25) = 6.41, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.20, and Facial Expression, *F*(2,50) = 5.34, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.18. Happy and angry faces elicited larger LPPs compared to neutral materials, *F*s(1,25) = 11.46 and 5.21, *p*s < 0.01 and 0.05, η<sub>p</sub><sup>2</sup> = 0.31 and 0.17, although no difference was found between happy and angry facial expressions, *F*(1,25) = 0.23, *p* = 0.64, η<sub>p</sub><sup>2</sup> = 0.01. No differences were observed for instructed Relevance, *F*(1,25) = 1.74, *p* = 0.20, η<sub>p</sub><sup>2</sup> = 0.07, and Laterality, *F*(1,25) < 0.01, *p* = 0.98, η<sub>p</sub><sup>2</sup> < 0.01.

Of particular interest, a significant interaction emerged for Facial Expression by Relevance, *F*(2,50) = 3.6, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.12. Further, a near-significant interaction was observed for Task by Relevance, *F*(2,50) = 3.32, *p* = 0.08, η<sub>p</sub><sup>2</sup> = 0.12, but not for Facial Expression by Task, *F*(2,50) = 2.18, *p* = 0.14, η<sub>p</sub><sup>2</sup> = 0.08, or the higher order interaction Facial Expression by Task by Relevance, *F*(2,50) = 0.08, *p* = 0.91, η<sub>p</sub><sup>2</sup> < 0.01. To follow up these interactions, analyses were conducted separately for each experimental task (see **Figure 4**).

For the meet task, a significant main effect of Facial Expression was observed, *F*(2,50) = 4.86, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.16. Follow-up analyses revealed pronounced LPP amplitudes for happy faces, *F*(1,25) = 9.33, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.27, and, at marginal significance, for angry faces, *F*(1,25) = 3.89, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.14, compared to neutral facial expressions. No difference was observed between happy and angry facial expressions, *F*(1,25) = 1.26, *p* = 0.27, η<sub>p</sub><sup>2</sup> = 0.05. Whereas the interaction Facial Expression by Relevance did not reach significance, *F*(2,50) = 1.83, *p* = 0.18, η<sub>p</sub><sup>2</sup> = 0.07, a near-significant main effect of Relevance was observed, *F*(1,25) = 3.85, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.13. Exploratory follow-up analyses testing relevant compared to non-relevant faces revealed enhanced LPP amplitudes for relevant happy faces, *F*(1,25) = 4.12, *p* = 0.05, η<sub>p</sub><sup>2</sup> = 0.14, but not for relevant neutral, *F*(1,25) = 2.32, *p* = 0.14, η<sub>p</sub><sup>2</sup> = 0.09, or angry faces, *F*(1,25) = 0.02, *p* = 0.91, η<sub>p</sub><sup>2</sup> < 0.01.

In contrast, for the passive viewing task, only the main effect of Facial Expression reached marginal significance, *F*(2,50) = 2.75, *p* = 0.07, η<sub>p</sub><sup>2</sup> = 0.10. Follow-up tests revealed pronounced LPP for angry faces, *F*(1,25) = 4.47, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.15, and marginally enhanced amplitudes for happy faces, *F*(1,25) = 2.92, *p* = 0.10, η<sub>p</sub><sup>2</sup> = 0.10, compared to neutral facial expressions. No difference was observed for happy and angry faces in the passive viewing task, *F*(1,25) = 0.35, *p* = 0.56, η<sub>p</sub><sup>2</sup> = 0.01. Neither Relevance, *F*(1,25) < 0.01, *p* = 0.99, η<sub>p</sub><sup>2</sup> < 0.01, nor Laterality, *F*(1,25) = 0.03, *p* = 0.86, η<sub>p</sub><sup>2</sup> < 0.01, modulated LPP amplitudes, and no interaction reached significance, *F*s < 1.86, *p*s > 0.17, η<sub>p</sub><sup>2</sup> < 0.07.

## **DISCUSSION**

The present study examined the impact of instructed social relevance and emotional facial expression on face processing. The main finding was that mere verbal instruction about social contingencies can modulate early and late face processing as indicated by enhanced P1, EPN, and LPP amplitudes. Importantly, event-related potential measures revealed that the interaction of social and emotional significance varied across the visual processing stream. Whereas rather early components revealed independent main effects of facial expression and task instruction (P1, N170, and EPN), the LPP was augmented specifically for happy faces of future interaction partners. These results support the notion of a joint impact of emotional and social information mediating face perception.

The anticipation of social interaction with another individual is of considerable value. In the present study, social relevance was manipulated by introducing two specific face actors as future interaction partners (meet task). Results indicate that this socioemotional context is associated with specific processing patterns as participants view face pictures. The first ERP component sensitive to both task instruction and emotional facial expression was the P1 component, which was enlarged for angry faces compared to happy and neutral facial expressions. Further, regardless of facial expression, enhanced P1 amplitudes were observed during the meet compared to the passive viewing task. Thus, several previous findings were replicated: enhanced P1 amplitudes in explicit attention tasks (Hillyard and Anllo-Vento, 1998; Pourtois et al., 2004) and implicit processing biases during self-relevant conditions (e.g., instructed threat or in specific phobia; Kolassa et al., 2006; Michalowski et al., 2009; Bublatzky and Schupp, 2012). Presumably based on intensified visual processing in the extrastriate cortex (Hillyard and Anllo-Vento, 1998; Pourtois et al., 2013), the present P1 effects may indicate enhanced vigilance during task conditions of high self-relevance.

Both N170 and EPN components varied as a function of emotional facial expression. As enhanced negativities have been found for both happy and angry compared to neutral faces, these findings suggest that selective face processing occurs as a function of stimulus arousal. Whereas the N170 has been mostly related to structural encoding of non-affective facial features (Sato et al., 2001; Eimer and Holmes, 2002) within occipitotemporal areas (e.g., STS; Itier and Taylor, 2004), the present data are in line with a growing body of literature showing that the N170 is subject to emotional modulation (Pizzagalli et al., 2002; Batty and Taylor, 2003; Rossignol et al., 2005) similar to the EPN component. Further, indicating the enhanced relevance of facial stimuli for (sub-)clinical populations with high levels of social anxiety, pronounced N170 and EPN amplitudes have been observed for angry facial expressions (Kolassa and Miltner, 2006; Mühlberger et al., 2009; Wieser et al., 2010). Here, valence-specific effects were also observed in healthy participants, but with more pronounced N170/EPN amplitudes for happy facial expressions. One promising direction for future studies is to manipulate the implicit level of social relevance when examining interindividual differences in emotional face processing (e.g., familiar loved vs. unfamiliar faces displaying emotions; Guerra et al., 2012).

Regarding late positive potentials, face processing was modulated by both task and emotional relevance. Similar to past research (Schupp et al., 2004), faces displaying angry expressions were associated with enhanced LPP amplitudes; however, this effect was similarly present for happy faces. Of particular interest, the social relevance manipulation revealed an interactive relationship with emotional facial expression. Whereas both happy and angry faces elicited an enhanced late parietal positivity compared to neutral stimuli, this effect was more pronounced when viewing potential interaction partners displaying happy facial expressions. A similar trend was observed for neutral, but not angry, faces of purported interaction partners compared to non-relevant faces. Thus, whereas emotional and social relevance independently modulated early ERP components, indicating either a threat advantage (P1) or selective emotion processing (EPN), a later processing stage revealed specifically enhanced amplitudes for socially relevant happy faces (LPP). These findings appear in line with the evaluative space model of affective processing (Cacioppo and Berntson, 1994; Cacioppo et al., 1999). Depending on the level of activation, emotional input may provoke different processing and response gradients. For instance, at low activation levels, pleasant stimuli may exert a greater influence than unpleasant stimuli in guiding motivational tendencies (e.g., explorative behavior). Accordingly, in rather low-arousing experimental conditions, happy facial expressions may be more efficient in activating the motivational approach system than angry faces fostering avoidance. This hypothesis could be tested with socially relevant faces presented under conditions of low and high arousal (e.g., threat-of-shock paradigm; Grillon and Charney, 2011; Bublatzky et al., 2010, 2013).
Importantly, future research is needed to connect findings from the perceptual/attentional domain to the functional level, for instance, by testing approach/avoidance behavior (e.g., decision making; Pittig et al., 2014) to socially relevant happy/angry faces in social phobia (Wangelin et al., 2012).

Over and above the impact of implicit stimulus relevance (i.e., emotional facial expression), explicit instructions about social relevance in the meet task were associated with increased P1, EPN, and LPP amplitudes. These findings may complement recent research utilizing selective attention paradigms (Delorme et al., 2004; Pourtois et al., 2013). For instance, Schupp et al. (2007) observed pronounced EPN and enhanced late parietal positivities for target pictures of different semantic categories. Of particular interest, pictures displaying highly arousing content potentiated attention effects specifically during later processing stages (Schupp et al., 2007). In the present social anticipation task, the emotional facial features were not counting targets and were actually rated as low in arousal; however, a boost of emotion-focused attention was observable specifically for happy facial expressions of purported interaction partners. Here, the reference to neural systems involved in various means of relevance processing based on bottom-up or top-down regulation may be informative (e.g., relevance based on task instruction, emotional, or social information; Schmitz and Johnson, 2007; Pourtois et al., 2013). For instance, paying attention to specific stimulus features modulates BOLD responses in the visual cortex (Kastner and Ungerleider, 2000), and for both emotional scenes and facial expressions a great overlap of neural activity has been demonstrated in the amygdala and medial prefrontal cortex (Sabatinelli et al., 2011); the latter being strongly involved in self-referential processing (Gusnard et al., 2001; Northoff et al., 2006; Olsson and Ochsner, 2008; Schwarz et al., 2013).

Several noteworthy aspects and alternative explanations of the present findings need to be acknowledged and should be addressed in future research. First, the critical test of the interaction between social relevance and facial expression was based on the processing of the same face stimuli that differed only in instructed social relevance. This approach has the advantage of ruling out potential effects due to physical differences, as apparent in comparing "social" vs. "non-social" stimuli, but required that a fixed order of passive viewing task, followed by social meet instruction, was adopted. Thus, excessive stimulus repetitions may have reduced emotion or task effects. However, similar to previous research (Codispoti et al., 2006; Schupp et al., 2006b), neither EPN nor LPP components revealed a reduction of selective emotion processing in the later task. On the contrary, the present LPP amplitudes were generally enhanced during the social anticipation task. Furthermore, cognitive processes, such as working memory load or implicit emotion regulation, may have contributed to the absence of enhanced LPP to socially relevant angry faces. For instance, recent studies observed reduced LPP amplitudes to aversive stimuli under working memory load, suggesting that threat processing is contingent on available cognitive resources (MacNamara et al., 2011; Van Dillen and Derks, 2012). Alternatively, implicit emotion regulation may have reduced LPP amplitudes to aversive stimuli as shown in previous studies (Hajcak et al., 2010; Thiruchselvam et al., 2011). Here, future research may implement resource competition (e.g., by means of concurrent tasks or distractor stimuli) and active emotion regulation strategies. This could help clarify how social relevance affects emotional and cognitive processes in face perception.

The effects of selective attention elicited by either implicit emotional or explicitly instructed task relevance have been assessed in previous studies (Schupp et al., 2007; Pourtois et al., 2013). Extending this line of research, the present study utilized a novel approach to manipulate stimulus relevance by introducing specific face actors as future interaction partners. Social relevance was found to modulate face processing differently across the visual processing stream. Whereas early ERP components revealed independent effects of social and emotional relevance (P1, N170, EPN), later processing stages were associated with specifically enhanced LPP for happy facial expressions when displayed by future interaction partners. Thus, social relevance may facilitate evaluative face processing according to socio-emotional settings (i.e., future interaction; Fischer and van Kleef, 2010).

### **ACKNOWLEDGMENTS**

We are grateful to Susanne LaMura for her assistance in data collection. This work was supported, in part, by the "Struktur-und Innovationsfonds (SI-BW)" of the state of Baden-Wuerttemberg, Germany.

#### **REFERENCES**


absence of habituation in the visual cortex. *Neuroreport* 17, 365–369. doi: 10.1097/01.wnr.0000203355.88061.c6


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 17 June 2014; published online: 16 July 2014*.

*Citation: Bublatzky F, Gerdes ABM, White AJ, Riemer M and Alpers GW (2014) Social and emotional relevance in face processing: happy faces of future interaction partners enhance the late positive potential. Front. Hum. Neurosci. 8:493. doi: 10.3389/fnhum.2014.00493*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Bublatzky, Gerdes, White, Riemer and Alpers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.

## Face, eye, and body selective responses in fusiform gyrus and adjacent cortex: an intracranial EEG study

#### *Andrew D. Engell<sup>1</sup> and Gregory McCarthy<sup>2</sup>\**

*<sup>1</sup> Kenyon Psychological Neuroscience Laboratory, Department of Psychology, Kenyon College, Gambier, OH, USA <sup>2</sup> Human Neuroscience Laboratory, Department of Psychology, Yale University, New Haven, CT, USA*

#### *Edited by:*

*Aina Puce, Indiana University, USA*

#### *Reviewed by:*

*Nathalie George, Centre National de la Recherche Scientifique, France Shozo Tobimatsu, Kyushu University, Japan*

#### *\*Correspondence:*

*Gregory McCarthy, Department of Psychology, Yale University, PO Box 208205, New Haven, CT 06520-8205, USA e-mail: gregory.mccarthy@yale.edu*

Functional MRI (fMRI) studies have investigated the degree to which processing of whole faces, face-parts, and bodies is differentially localized within the fusiform gyrus and adjacent ventral occipitotemporal cortex. While some studies have emphasized the spatial differentiation of processing into discrete areas, others have emphasized the overlap of processing and the importance of distributed patterns of activity. Intracranial EEG (iEEG) recorded from subdural electrodes provides excellent temporal and spatial resolution of local neural activity, and thus provides an alternative method to fMRI for studying differences and commonalities in face and body processing. In this study we recorded iEEG from 12 patients while they viewed images of novel faces, isolated eyes, headless bodies, and flowers. Event-related potential analysis identified 69 occipitotemporal sites at which there was a face-, eye-, or body-selective response when contrasted to flowers. However, when comparing faces, eyes, and bodies to each other at these sites, we identified only 3 face-specific, 13 eye-specific, and 1 body-specific electrodes. Thus, at the majority of sites, faces, eyes, and bodies evoked similar responses. However, we identified ten locations at which the amplitude of the responses spatially varied across adjacent electrodes, indicating that the configuration of current sources and sinks was different for faces, eyes, and bodies. Our results also demonstrate that eye-sensitive regions are more abundant and more purely selective than face- or body-sensitive regions, particularly in lateral occipitotemporal cortex.

**Keywords: face area, body area, eye, face-part, N200, ECoG, iEEG**

## **INTRODUCTION**

The discovery of face-selective neurons in the macaque temporal lobe (Gross et al., 1969, 1972) set in motion a productive research program in the study of face perception in the primate visual system. Functional neuroimaging (Sergent et al., 1992; Haxby et al., 1994; Puce et al., 1995) and intracranial EEG (iEEG) (Allison et al., 1994a,b) studies expanded that program to the neural basis of face perception in the human. These early studies showed that faces selectively activated regions of the (predominantly right) ventral occipitotemporal cortex (VOTC). In particular, functional MRI studies have consistently identified a small region of the lateral mid-fusiform gyrus as face selective, a region that is often referred to as the "fusiform face area" (FFA; Kanwisher et al., 1997). Subsequent fMRI research identified a region posterior to the FFA, dubbed the "occipital face area" (OFA) that is also preferentially activated by the perception of faces (Gauthier et al., 2000). These regions have since been promoted as "core nodes" in an extended face processing network (e.g., Haxby et al., 2000; Rossion et al., 2003; Calder and Young, 2005; Ishai, 2008; Pitcher et al., 2011).

While regions within the VOTC are unequivocally sensitive to faces, VOTC regions are also active during perception of non-face corporeal stimuli. For example, the perception of bodies also evokes a larger hemodynamic response than the perception of non-corporeal objects along part of the fusiform gyrus (Schwarzlose et al., 2005; Peelen and Downing, 2005; Peelen et al., 2006; Pinsk et al., 2009; van de Riet et al., 2009; see de Gelder et al., 2010). Although there is substantial overlap between VOTC areas activated by faces and by bodies, some studies have identified a discrete region activated by bodies that is dissociable from the FFA and which has been named the "fusiform body area" (FBA; Schwarzlose et al., 2005). Isolated bodies also activate lateral occipitotemporal cortex (LOTC) at the intersection of the anterior occipital and inferior temporal gyri, an area named the "extrastriate body area" (EBA; Downing and Peelen, 2011).

Other studies have shown that regions of the occipitotemporal cortex (OTC) co-extensive with the FFA, the FBA, and the EBA respond to biological motion (Bonda et al., 1996; Grossman and Blake, 2002; Peelen et al., 2006; Grezes et al., 2007; Pichon et al., 2008; Engell and McCarthy, 2013; Shultz et al., 2014). These results suggest the possibility that these regions are not responding to specific body parts *per se*, but are engaged by the processing of intentional or social agents (Shultz et al., 2014). Indeed, in a recent study, Shultz and McCarthy (2012) showed that areas of the VOTC co-extensive with the FFA responded to the apparently purposeful motion of machines.

Most studies of non-face corporeal perception have been conducted using fMRI. Intracranial electroencephalography has millisecond temporal resolution and, depending upon the configuration of electrodes, can have high anatomical resolution (although coverage can be sporadic). There have been very few iEEG studies of social agent perception that have focused on stimuli other than whole faces. One of the few such reports found that images of hands evoked a category-selective event-related potential (ERP) from recording sites on the right VOTC and left LOTC (McCarthy et al., 1999). At these locations there was no concomitant category-selective response to faces or face-parts (eyes, lips, and noses). Similarly, a more recent report found a single body-selective site on the right LOTC at which there was no appreciable response to faces (Pourtois et al., 2007).

These findings provide limited support for the idea that face and non-face body parts are processed in distinct brain areas. However, given the extent of the activation overlap between faces and bodies observed in fMRI, a more systematic study is warranted, particularly since one of the two aforementioned studies used images of isolated hands rather than bodies as a stimulus (McCarthy et al., 1999). Subsequent neuroimaging has shown that hands and bodies evoke dissociable neural responses (Bracci et al., 2010). The second study reported results from a single electrode within a single patient. Small samples are common in iEEG as this method relies on the participation of individuals undergoing invasive brain procedures, often for pharmacologically intractable epilepsy. Nonetheless, results from a single electrode raise concerns about the replicability and generalizability of the findings.

Here we use iEEG to investigate the functional selectivity and spatial relationship of the response to three visual categories of social agents (whole faces, eyes in isolation, and headless-bodies). Both time-locked ERPs and event-related spectral perturbations (ERSPs) were investigated. We address the possible limitations of previous iEEG experiments by using images of whole bodies (without heads) rather than isolated body parts, and by using a large sample of 1536 electrode sites across twelve patients. In addition to evaluating the amplitude of the ERP at each subdural electrode to faces, bodies, and eyes, we also examined the spatial distribution of the ERPs evoked by each stimulus type across adjacent electrodes. The spatial configuration of current sinks and sources relative to the recording electrodes determines the spatial distribution of voltage over the cortex. If the same source configuration was responsible for the ERP evoked by each different stimulus, then the spatial distribution would be the same. However, if the spatial distribution of voltage evoked by faces, bodies, and eyes differed across closely adjacent sites, this would be strong evidence that a different pattern of neural activity, and perhaps a different subset of neurons, was activated by the different stimulus categories.

## **MATERIALS AND METHODS**

#### **PROCEDURE**

Stimulus presentation was computer controlled and displayed on a 17-inch LCD monitor (800 × 600 pixels) positioned on a table over the patient's bed. The viewing distance was adjusted for patient comfort. Patients were asked to view sequentially presented stimuli that were randomly selected from four categories: novel faces, eyes in isolation, headless bodies, and flowers (**Figures 1A,B**). In total, patients viewed 40 unique exemplars from each stimulus category. Each image was displayed for 750 ms with a jittered stimulus onset asynchrony that varied randomly between 1800 and 2200 ms (**Figure 1C**). In the first version of the experiment (see below) images from the face, eye, body, and flower categories were presented a roughly equal number of times (77–83 trials of each for patient 1, and 174–188 trials of each for patient 2). In the second version, 40 trials were presented from each of these categories. Two patients experienced a longer version in which 80 trials were presented from each category. To ensure the patient's engagement with the task, a target circle was presented on ∼11.1% of trials to which a speeded button press response was required. Presentation of the stimuli was intermittently paused to give the patients a rest period.
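The timing described above can be illustrated with a minimal sketch of a trial schedule. The function name `build_schedule`, the uniform integer jitter, and the choice of 20 target trials are assumptions made only for illustration; 20 targets among 160 category trials happens to give 20/180 ≈ 11.1%, but the source does not state how targets were interleaved.

```python
import random

CATEGORIES = ["face", "eye", "body", "flower"]

def build_schedule(trials_per_category=40, n_targets=20, seed=0):
    """Return a shuffled list of (onset_ms, label) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    labels = [c for c in CATEGORIES for _ in range(trials_per_category)]
    labels += ["target"] * n_targets          # assumed number of target-circle trials
    rng.shuffle(labels)
    schedule, onset = [], 0
    for label in labels:
        schedule.append((onset, label))
        onset += rng.randint(1800, 2200)      # jittered SOA in ms (uniform jitter assumed)
    return schedule

schedule = build_schedule()
target_fraction = sum(1 for _, lab in schedule if lab == "target") / len(schedule)
```

Each image would then be shown for 750 ms from its onset, leaving a blank interval until the next onset.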

#### **STIMULI**

The first two of our twelve participants saw grayscale images (**Figure 1A**). Face stimuli were created from photographs taken from various sources (see Allison et al., 1999 for details). Eye stimuli were the same pictures used for faces, but cropped so that only the eye-region remained visible. Body stimuli were photographs of males and females, digitally cropped so that the head was removed. Flower stimuli were color photographs converted to grayscale images. Prior studies from our laboratory have shown that face-selective ERPs evoked at subdural VOTC electrodes are readily dissociable from many different non-corporeal object categories, including letter strings, patterns, and flowers (cf. Allison et al., 1994a, 1999; McCarthy et al., 1999; Engell and McCarthy, 2010, 2011). We chose flowers as control stimuli because they are a category of living stimuli with visual symmetry and can be readily individualized.

The remaining ten participants saw color stimuli (**Figure 1B**). Face stimuli were realistic faces created using FaceGen (Singular Inversions, Toronto, Ontario, Canada). Eye stimuli were the same images used for faces, but cropped so that only the eye-region remained visible. Body stimuli were created using Poser 6.0 (Curious Labs Inc., Santa Cruz, California, USA). Flower stimuli were the same as those described above, but were not converted to grayscale.

### **EEG ACQUISITION**

Recordings were obtained from 1536 electrodes implanted in 12 patients (median age = 33 years, age range: 18–54 years, 8 female, 4 male) with medically intractable epilepsy who were being evaluated for possible surgery by the Yale Epilepsy Surgery Program (Spencer et al., 1982). In these patients, strips or grids of stainless steel electrodes (2.2 mm surface diameter) were placed subdurally on the cortical surface. The placement of the strips was determined by the clinical needs of each patient, and thus electrode locations varied across individuals. The studies reported here were included among other sensory and cognitive experiments in which each subject participated, typically 4–8 days following implantation of electrodes. At the time of participation, medication levels to control seizures and post-operative pain varied across patients. The EEG experiments were not conducted immediately before or after seizures nor were any of our sites of interest revealed to be in epileptogenic cortex. The EEG protocol was approved by the IRB of the Yale University School of Medicine. All participants provided informed consent.

Local field potentials were recorded and amplified with a common reference using an SA Instruments EEG amplifier system with a 0.1–100 Hz bandpass. The reference was a small post implanted in the outer table of the patient's skull. The location of this post varied across patients, but it was always in the skull adjacent to superior frontal or parietal cortex. Most often, the post was implanted at the top of the skull in a region roughly adjacent to electrodes C3 or C4 of the 10-20 EEG Electrode system. From each patient we simultaneously recorded from 128 electrodes with a concentration of sites on ventral occipitotemporal, lateral occipital, posterior lateral temporal, and parietal cortices. The EEG signal was continuously digitized with 14-bit resolution and a sampling rate of 500 Hz using a Microstar 4200 A/D data acquisition board. The digitized signal was written to disk using a custom PC-based acquisition system. A digital code unique to each experimental condition was recorded in a separate channel at the onset of each stimulus presentation.

#### **ELECTRODE LOCALIZATION**

A high-resolution anatomical scan (1 × 1 × 1.5 mm) was acquired for each patient prior to implantation. Post-implant CT scans in which the electrodes were easily detected and localized in 3D were then co-registered to the anatomical MR data. Each patient's brain was transformed to MNI space using the Bioimage Suite software package (http://www.bioimagesuite.org) to facilitate visualization of recording sites of interest from all patients on a standard brain. In cases in which the inherent imprecision of spatial normalization resulted in an electrode appearing just off the brain, the electrode position was projected to the cortical surface. This approach allowed for a convenient graphical representation of the overall distribution of electrodes on the brain's surface (**Figures 2A**, **3**). However, as the exact gyral and sulcal boundaries of the brain varied among our subjects, this summary view does not precisely reflect the location of any individual electrode with respect to anatomical landmarks of the subject's brain in which it was located.

Instances of conditional voltage changes over space (see Rate of Voltage Changes Over Space) are also displayed on cortical surfaces. In an effort to preserve the relationship of electrode locations to sulcal and gyral boundaries, we projected these electrodes on to each individual's brain (**Figure 4**, P1, P2, P3, P4, & P7). However, for two of the participants the signal to noise was insufficient to achieve quality segmentation so the electrodes from these individuals are shown on a canonical brain surface (**Figure 4**, P5 & P6).

#### **EVENT-RELATED POTENTIAL (ERP) ANALYSIS**

ERP analyses were performed using custom MATLAB (The Mathworks, Inc.) functions. Residual line noise (60 Hz) was filtered in MATLAB using a 5th order Butterworth filter that was applied in a temporally symmetric manner to avoid introducing phase shifts. Baseline-adjusted ERPs were created by signal averaging the EEG across trials for each experimental condition and subtracting from each time-point the average of a 100 ms pre-stimulus epoch. Low-pass filtering was achieved with a temporally symmetric smoothing kernel with a total length of five time-points (from −2 to +2 time-points) that was convolved with the average ERP waveforms prior to amplitude and latency measurements of the N200.
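The averaging, baseline adjustment, and five-point smoothing steps can be sketched as follows. This is an illustration in Python rather than the authors' MATLAB code: the function names are hypothetical, the kernel weights are assumed to be uniform (the source gives only the kernel length), and the handling of the waveform edges is likewise an assumption.

```python
def average_trials(trials):
    """Signal-average equal-length trials into a single ERP waveform."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]

def baseline_correct(erp, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(erp[:n_baseline]) / n_baseline
    return [v - base for v in erp]

def smooth5(erp):
    """Symmetric moving average over offsets -2..+2 (uniform weights assumed).

    Near the edges the window shrinks to the samples that exist (an assumption).
    """
    n = len(erp)
    out = []
    for i in range(n):
        window = erp[max(0, i - 2):min(n, i + 3)]
        out.append(sum(window) / len(window))
    return out

# Toy data: two identical trials, two baseline samples, one deflection.
trials = [[2.0, 2.0, 2.0, 2.0, 2.0, 12.0, 2.0, 2.0, 2.0]] * 2
erp = baseline_correct(average_trials(trials), n_baseline=2)
smoothed = smooth5(erp)
```

At the 500 Hz sampling rate reported below, the 100 ms baseline corresponds to 50 samples and the five-point kernel spans 10 ms.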

A computer algorithm was used to identify electrodes that were "selective" for a particular category. Guided by previously published criteria (Allison et al., 1999), face-, eye-, and body-selective sites were defined as those with a peak negativity occurring between 160 and 240 ms post stimulus onset (N200) that was at least −50 μV in amplitude and at least twice as large for the category of interest as for the control condition (flowers). A similar selection criterion (i.e., a category response twice as large as for all other tested categories) has previously been used in both single cell (e.g., Perrett et al., 1982; Baylis et al., 1985; Leonard et al., 1985) and human local field potential (LFP) (e.g., Puce et al., 1997; Allison et al., 1999; McCarthy et al., 1999; Puce et al., 1999; Engell and McCarthy, 2010, 2011) investigations of face-selective responses. Consistent with previous human LFP studies, this was based on the qualitative comparison of the peak magnitude of ERPs. Automated detection by the computer algorithm was followed by visual inspection by the authors to screen for artifacts. Nine of 41 face-selective electrodes, three of 54 eye-selective electrodes, and seven of 29 body-selective electrodes that were identified by the computer algorithm were excluded from analysis.
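A minimal sketch of the selectivity rule is given below. It is not the authors' algorithm: the function names are hypothetical, the ERP is assumed to begin with a 100 ms pre-stimulus baseline sampled at 500 Hz, and "twice as large" is read as the category peak being at least twice as negative as the flower peak.

```python
FS_HZ = 500        # sampling rate reported in the source
BASELINE_MS = 100  # assumed: each ERP starts 100 ms before stimulus onset

def ms_to_index(t_ms, fs_hz=FS_HZ, baseline_ms=BASELINE_MS):
    """Convert a post-stimulus latency in ms to a sample index."""
    return int(round((t_ms + baseline_ms) * fs_hz / 1000))

def n200_peak(erp_uv, window_ms=(160, 240)):
    """Return (amplitude_uv, latency_ms) of the most negative sample in the window."""
    start, stop = ms_to_index(window_ms[0]), ms_to_index(window_ms[1])
    i = min(range(start, stop + 1), key=lambda k: erp_uv[k])
    return erp_uv[i], i * 1000 / FS_HZ - BASELINE_MS

def is_selective(category_peak_uv, flower_peak_uv, min_uv=-50.0, ratio=2.0):
    """Peak at least -50 uV and at least twice as negative as the control peak."""
    return category_peak_uv <= min_uv and category_peak_uv <= ratio * flower_peak_uv

# Toy waveform: a -80 uV peak at 180 ms, and a larger deflection outside the window.
erp = [0.0] * 250
erp[ms_to_index(180)] = -80.0
erp[ms_to_index(300)] = -200.0
amplitude, latency = n200_peak(erp)
```

Restricting the search to 160–240 ms keeps the later, larger deflection in the toy waveform from being mistaken for the N200.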

For each set of category-selective electrodes, we created an average ERP from all electrodes contributed by a given patient. We then identified the peak amplitude of each category-evoked response from within our epoch of interest (160–240 ms) and the latency at which the peak occurred. Wilcoxon signed rank tests were then used for pairwise contrasts of the four conditions to test for differences in the peak amplitude and latency of the N200 response. For each group of category-selective electrodes, we performed five pairwise tests, which included all possible pairings except for the category-selective condition vs. flowers. The latter test was not performed because category vs. flower was our selection criterion. We used a Bonferroni correction to adjust the significance threshold for our five contrasts from *p* < 0.05 to *p* < 0.01.

**FIGURE 2 |** **(A)** The locations of 69 category-selective electrodes displayed on a standard brain (left panel). Selectivity was defined by contrasting each of the conditions of interest (faces, eyes, bodies) to flowers. Therefore, a single electrode could be identified as "selective" for more than one condition. The color at each location indicates which category (or categories) met selectivity criteria (see Materials and Methods). For reference to standard imaging results, the ORANGE overlay indicates voxels at which there is a ≥33% probability of being face-selective in a face vs. scene fMRI experiment (Engell and McCarthy, 2013). **(B)** The grand-average ERPs from sites that were category-selective for one or more conditions. The grand-averages can therefore include sites that were selective for multiple conditions. For instance, the body-selective ERP (bottom row) includes the response recorded from the "Body," "Face & Body," and "Face & Eye & Body" locations. The grand average ERP was created by averaging patient ERPs, which could each include one or more electrodes. We report both the patient sample size (*N*) and the total number of electrodes. **(C)** The relative increase in event-related gamma power at the same category-selective sites.

**FIGURE 3 | Category-specific sites.** The locations of category-specific electrodes overlaid on a standard brain. At these sites, the category-selectivity was determined by contrasting the condition of interest to *all* other conditions, not only flowers. Using this more conservative criterion we found three face-specific, thirteen eye-specific, and one body-specific site. A color border indicates electrodes that were contributed by the same patient. For instance, the orange border around the eye-specific sites on the left and the right occipitotemporal cortex indicates that the same patient contributed these two electrodes.

The normalized locations (MNI) of the category-selective electrodes were plotted onto a standard brain (**Figure 2A**). K-means clustering, as implemented with the "kmeans" function in MATLAB's (The Mathworks, Inc.) Statistics Toolbox, was used to further summarize the electrode locations by segmenting them into four clusters and identifying the locations of the cluster centroids. We chose *k* = 4 because visual inspection of the electrode locations suggested a cluster on the ventral temporal surface, and another on the occipitotemporal surface of each hemisphere. The selection of four clusters was supported by the quantitative observation that four clusters explained 89, 85, and 87% of the total spatial variance for faces, eyes, and bodies, respectively.
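The clustering step and the "spatial variance explained" summary can be illustrated with a small pure-Python k-means. This sketch is not MATLAB's `kmeans`: it uses a deterministic farthest-point initialization (an assumption made for reproducibility), the coordinates are made-up values rather than the study's electrodes, and variance explained is taken to be one minus the ratio of within-cluster to total sum of squared distances, which is one plausible reading of the text.

```python
def dist2(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]

def kmeans(points, k, n_iter=100):
    # Farthest-point initialization (assumed; not MATLAB's default scheme).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def variance_explained(points, centers, labels):
    total = sum(dist2(p, centroid(points)) for p in points)
    within = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return 1.0 - within / total

# Hypothetical coordinates: two groups per hemisphere, three sites each.
points = [(30, -50, -15), (32, -48, -16), (28, -52, -14),
          (40, -85, -5), (42, -83, -4), (38, -87, -6),
          (-30, -50, -15), (-32, -48, -16), (-28, -52, -14),
          (-40, -85, -5), (-42, -83, -4), (-38, -87, -6)]
centers, labels = kmeans(points, k=4)
explained = variance_explained(points, centers, labels)
```

With real electrode locations the explained fraction would be lower than in this toy example, since the synthetic groups here are deliberately tight and well separated.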

To test for latency differences we identified the peak of the N200 (minimum amplitude) within our critical time window (160–240 ms) for each electrode. The latencies of these peaks for each condition were then contrasted using Wilcoxon signed rank tests.

Finally, we identified category-*specific* electrodes. At these sites the response to a given category (faces, eyes, or bodies) met the criteria for selectivity as compared to *all* other corporeal categories and flowers.
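The distinction between category-*selective* and category-*specific* sites can be made concrete with a short sketch. The helper names and the peak values are hypothetical; the rule applied is the one described in the text (a peak of at least −50 μV that is at least twice as negative as the comparison peak), tested against flowers only for selectivity and against every other condition for specificity.

```python
CORPOREAL = ("face", "eye", "body")

def meets_criterion(peak_uv, reference_uv, min_uv=-50.0, ratio=2.0):
    """True if the peak is at least -50 uV and twice as negative as the reference."""
    return peak_uv <= min_uv and peak_uv <= ratio * reference_uv

def classify_site(peaks_uv):
    """peaks_uv maps condition name -> N200 peak amplitude in microvolts."""
    selective = [c for c in CORPOREAL
                 if meets_criterion(peaks_uv[c], peaks_uv["flower"])]
    specific = [c for c in CORPOREAL
                if all(meets_criterion(peaks_uv[c], peaks_uv[other])
                       for other in peaks_uv if other != c)]
    return selective, specific

# Hypothetical sites: one responds to all corporeal categories, one mainly to eyes.
overlapping_site = {"face": -100.0, "eye": -90.0, "body": -60.0, "flower": -20.0}
eye_site = {"face": -40.0, "eye": -120.0, "body": -25.0, "flower": -15.0}
```

Under this rule a specific site is necessarily also selective, whereas a site selective for several categories cannot be specific for any of them.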

#### **RATE OF VOLTAGE CHANGES OVER SPACE**

**FIGURE 4 | Peak-voltage changes over space.** We visually identified 12 locations (seven patients) at which adjacent electrodes showed an N200 to at least two of our conditions of interest (faces, eyes, bodies), and at which the peak-voltage to these conditions changed at different rates over space. For patient 1E (top row, left column) we display the waveform from two adjacent electrodes on the right ventral temporal surface. Inspection of these waveforms shows that at electrode E1 there is a prominent N200 to both faces and eyes, but not bodies. At the adjacent electrode, E2, bodies evoke an N200 that is qualitatively larger than faces and eyes. Moreover, compared to the response at E1 the eye-N200 has diminished more sharply than the face-N200. The log of the peak difference is displayed in the bar graph. For all other patients we display only these bar graphs to represent the rate of change for each of the three conditions. At each location (i.e., each collection of adjacent sites) two or more of the conditions experience a different rate of change over space, indicating differing patterns of current sinks and sources.

The changes in voltage created by the current sinks and sources of active neurons can be recorded throughout the volume conductor of the brain, but the strength of the voltage diminishes with distance from the source, resulting in a weaker signal at more distal recording sites. The rate at which this signal decays (as a percentage of the peak amplitude) should be the same for ERPs generated by the same configuration of current sinks and sources; this is true regardless of the initial strength of the signal. If ERPs evoked by two different stimulus categories fall off at different rates across adjacent electrodes, it indicates a different configuration of sinks and sources, and thus a different pattern of neural activity. We looked for all instances of differential voltage changes over space in the peak ERP response to faces, eyes, and bodies by visually inspecting all ERPs from the 12 patients. At these locations, we quantified the change for each condition by calculating the log of the peak voltage difference between the adjacent electrodes. This returned a complex number for negative peak differences. For these values we report the negative of the real part of the complex number.
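The handling of negative differences can be illustrated with Python's `cmath` module. The function name, the direction of the subtraction, and the use of the natural logarithm are assumptions; the source specifies only that the log of the peak difference was taken and that the negative of the real part was reported when the result was complex.

```python
import cmath

def log_peak_difference(peak_a_uv, peak_b_uv):
    """Signed log of the peak-voltage difference between two adjacent electrodes.

    For a negative difference d, log(d) = ln|d| + i*pi, so the reported value
    is -ln|d|. A difference of exactly zero has no logarithm and raises ValueError.
    """
    diff = peak_a_uv - peak_b_uv              # assumed direction: electrode A minus B
    value = cmath.log(diff)                   # natural log; complex when diff < 0
    return -value.real if diff < 0 else value.real

# Hypothetical peaks (microvolts) at two adjacent electrodes.
faces_change = log_peak_difference(-120.0, -30.0)   # difference of -90 uV
bodies_change = log_peak_difference(-30.0, -120.0)  # difference of +90 uV
```

The result is equivalent to sign(d)·ln|d|, which preserves the direction of the change while compressing its magnitude.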

#### **EVENT-RELATED SPECTRAL PERTURBATION (ERSP) ANALYSIS**

In addition to the ERP analysis, we also analyzed the EEG using a time-frequency approach that evaluated event-related changes in gamma power (gamma event-related spectral perturbations; γERSP) at each of the category-selective sites. Following our prior reports (Engell and McCarthy, 2010, 2011, 2014) we removed the mean signal-averaged ERP from the raw EEG signal for each trial prior to ERSP analysis. This ensured that any significant spectral differences between categories did not merely reflect the frequency composition of the phase-locked ERP. As a result of this approach, the frequency-domain analysis reported here is insensitive to spectral changes that undergo phase resetting (i.e., phase-locked "evoked" EEG responses). However, these signal components are well captured in the time-domain analysis (i.e., ERP), resulting in a full characterization of the data.
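Removing the signal-averaged ERP from each trial can be sketched in a few lines. The function name and the toy data are hypothetical; the operation shown is simply the subtraction of the across-trial mean waveform from every single-trial waveform for one condition at one electrode.

```python
def remove_evoked(trials):
    """Subtract the across-trial mean (the ERP) from each single-trial waveform."""
    n = len(trials)
    erp = [sum(col) / n for col in zip(*trials)]
    return [[v - m for v, m in zip(trial, erp)] for trial in trials]

# Toy example: two trials of three samples each.
trials = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
residuals = remove_evoked(trials)
residual_mean = [sum(col) / len(col) for col in zip(*residuals)]
```

By construction the residual trials average to zero at every time-point, so any remaining spectral change reflects activity that is not phase-locked to stimulus onset.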

Event-related spectral perturbations were computed using EEGLAB v7.1 (Delorme and Makeig, 2004) and MATLAB v7.9 (The Mathworks, Inc.). Time-frequency power spectra were estimated using Morlet wavelet analysis based on 3 cycles at the lowest frequency (11.6 Hz) increasing to 16 cycles at the highest frequency (125 Hz). Change in power induced by each category (i.e., ERSP) was estimated by calculating the ratio of log power (dB) between the post-stimulus and pre-stimulus epochs. ERSPs within the gamma band (30–100 Hz) were averaged at each time-point to create a "gamma power-wave" over time. This frequency range for gamma was selected on the basis of the prior literature. Reports in the animal (Singer and Gray, 1995; Tallon-Baudry and Bertrand, 1999) and human (e.g., Lachaux et al., 2005; Tsuchiya et al., 2008; Fisch et al., 2009; Engell and McCarthy, 2010, 2011, 2014; Engell et al., 2012) literatures have defined 30 Hz as the lower bound of the gamma band. These same human intracranial studies have reported an upper bound for gamma between 70 and 200 Hz. The amplifiers used in our studies imposed a 100 Hz (−3 dB) upper limit on the iEEG signal, so we restricted the upper range of the gamma band to 100 Hz. For each condition and each site with a category-selective N200 we estimated the area under the curve (AUC) within an epoch that appeared to be most sensitive to the task (Engell and McCarthy, 2011, 2014; Engell et al., 2012). The 150–600 ms window showed the largest changes in gamma power for all conditions, and we therefore focused our analysis on this window. Where appropriate these AUC estimates were contrasted with paired-sample *t*-tests.
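The construction of the gamma power-wave and its AUC can be sketched as follows, starting from an already computed time-frequency power matrix. This is not the EEGLAB pipeline: the function names are hypothetical, the dB change is assumed to be 10·log10 of power relative to the mean pre-stimulus power at each frequency, and the AUC is assumed to be a trapezoidal integral over 150–600 ms.

```python
import math

def gamma_power_wave(tf_power, freqs_hz, baseline_idx, band=(30.0, 100.0)):
    """Average dB change from baseline across gamma-band frequencies per time-point.

    tf_power[f][t] holds power for frequency row f and time column t.
    """
    db_rows = []
    for row, f in zip(tf_power, freqs_hz):
        if band[0] <= f <= band[1]:
            base = sum(row[i] for i in baseline_idx) / len(baseline_idx)
            db_rows.append([10.0 * math.log10(p / base) for p in row])
    return [sum(col) / len(col) for col in zip(*db_rows)]

def auc(wave, times_ms, window_ms=(150, 600)):
    """Trapezoidal area under the power-wave within the analysis window."""
    pts = [(t, v) for t, v in zip(times_ms, wave)
           if window_ms[0] <= t <= window_ms[1]]
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))

# Toy matrix: the 20 Hz row falls outside the gamma band and is ignored.
freqs_hz = [20.0, 40.0, 80.0]
times_ms = [-100, 0, 150, 300, 600]
tf_power = [[1.0, 1.0, 100.0, 100.0, 100.0],
            [1.0, 1.0, 10.0, 10.0, 10.0],
            [2.0, 2.0, 20.0, 20.0, 20.0]]
wave = gamma_power_wave(tf_power, freqs_hz, baseline_idx=[0, 1])
area = auc(wave, times_ms)
```

In this toy case both gamma rows show a tenfold (10 dB) increase after stimulus onset, so the power-wave is 10 dB throughout the window and the area is 10 dB × 450 ms.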

## **RESULTS**

## **EVENT-RELATED POTENTIALS**

#### *Face-selective electrodes*

We identified 32 face-selective electrodes (20 RH, 12 LH) across 11 patients (**Table 1**, **Figure 2**). At these locations, the peak amplitude medians of faces, eyes, bodies, and flowers were −106.66, −94.32, −50.54, and −19.40 μV, respectively. The Wilcoxon signed rank test showed that the face response was larger (i.e., more negative) than the body response, *Z* = 2.76, *p* = 0.006, but not the eye response, *Z* = 1.07, *p* = 0.286. Note that selection of these sites was solely based on the faces > control contrast, so the selection process did not necessitate that faces would be larger than bodies or eyes. The peak response to eyes was larger than bodies, *Z* = 2.93, *p* = 0.003, and flowers, *Z* = 2.93, *p* = 0.003. The peak response to bodies was larger than flowers, *Z* = 2.85, *p* = 0.004.

The latency medians of faces, eyes, bodies, and flowers were 162, 178, 186, and 192 ms, respectively. The Wilcoxon signed rank test showed that the face response was marginally earlier than the eye response, *Z* = 2.54, *p* = 0.011, and significantly earlier than the body response, *Z* = 2.67, *p* = 0.008. The latency of the peak to eyes did not differ from the latency of the peak to bodies, *Z* = 1.11, *p* = 0.266, but was marginally earlier than flowers, *Z* = 2.40, *p* = 0.016. Bodies and flowers did not differ, *Z* = 0.71, *p* = 0.476.

**Table 1 | Electrode locations of face-selective sites within each spatial cluster.**

*The MNI coordinates from each of the 32 face-selective electrodes are grouped according to which cluster they were assigned to by the k-means clustering analysis. The MNI coordinates in the first row are the centroid locations within each cluster. Coordinates in bold typeface indicate electrodes that were face-specific as well as face-selective (see Materials and Methods).*

The cluster detection algorithm (see Materials and Methods) identified two spatial clusters of electrodes within each hemisphere for face-selective responses. The cluster centroids in the right hemisphere were located at 33, −55, 17 and 27, −80, −8. The cluster centroids in the left hemisphere were located at −46, −59, −20 and −38, −89, 1 (**Figure 5**). Despite the sparse sampling inherent in iEEG, these centroids roughly corresponded to face-selectivity peaks as identified by the Atlas of Social Agent Perception (Engell and McCarthy, 2013). The small sample sizes within each cluster precluded statistical analysis of the ERPs. We describe the relevant qualitative results from within these clusters in the context of our discussion section.

In a second analysis, we identified face-*specific*, rather than face-*selective* (see Materials and Methods), electrodes. We found three face-specific sites (2 RH, 1 LH) contributed by three patients (bolded coordinates in **Table 1**).

#### *Eye-selective electrodes*

We identified 51 eye-selective electrodes (30 RH, 21 LH) across 12 patients (**Table 2**, **Figure 2**). At these locations, the peak amplitude medians of faces, eyes, bodies, and flowers were −51.66, −97.79, −27.40, and −15.16 μV, respectively. The Wilcoxon signed rank test showed that the eye response was larger than the response to faces, *Z* = 2.90, *p* = 0.004, and bodies, *Z* = 3.06, *p* = 0.002. The average peak response to faces was significantly larger than to bodies, *Z* = 2.75, *p* = 0.006, and flowers, *Z* = 3.06, *p* = 0.002. The body response was marginally larger than the flower response, *Z* = 2.20, *p* = 0.028.

The latency medians of faces, eyes, bodies, and flowers were 168, 179, 187, and 182 ms, respectively. The Wilcoxon signed rank test showed that the eye response was significantly later than the face response, *Z* = 2.71, *p* = 0.007, but did not differ from the body response, *Z* = 0.67, *p* = 0.505. The latency of the peak to faces was marginally earlier than bodies, *Z* = 1.94, *p* = 0.052, and significantly earlier than flowers, *Z* = 2.59, *p* = 0.010. Bodies and flowers did not differ, *Z* = 1.57, *p* = 0.116.

The cluster detection algorithm identified two spatial clusters of electrodes within each hemisphere for the eye-selective ERPs. The cluster centroids in the right hemisphere were located at 33, −55, 18 and 38, −85, −2. The cluster centroids in the left hemisphere were located at −40, −68, −12 and −31, −92, 1 (**Figure 5**). The small sample sizes within each cluster precluded statistical analysis of the ERPs. We describe the relevant qualitative results from within these clusters in the context of our discussion section.

**FIGURE 5 | Results of spatial-clustering. (A)** The locations of the centroids from the k-means cluster analysis as seen on the right ventral temporal (Right VT; 1st row), right occipitotemporal (Right OT; 2nd row), left ventral temporal (Left VT; 3rd row), and left occipitotemporal (Left OT; 4th row) surfaces. Centroids are displayed for faces (red), eyes (green), and bodies (blue). **(B)** Grand-average ERPs were calculated from the electrodes within each spatial cluster. The "I" bar in each plot represents 50 μV along the y-axis and is located along the x-axis at the time of stimulus onset. We report both the patient sample size (N) and the total number of electrodes included in each ERP. Bar graphs show the relative increase in event-related gamma power (AUC of log power change from baseline) at these same electrodes. Bar height indicates AUC between 0 and 100 db2. Note, only *increases* in gamma power are shown. At the body-selective sites in the left ventral and occipitotemporal regions there was desynchronization to flowers and to eyes, respectively.


**Table 2 | Electrode locations of eye-selective sites within each spatial cluster.**

*The MNI coordinates from each of the 51 eye-selective electrodes are grouped according to which cluster they were assigned to by the k-means clustering analysis. The MNI coordinates in the first row are the centroid locations within each cluster. Coordinates in bold typeface indicate electrodes that were eye-specific as well as eye-selective (see Materials and Methods).*

In a second analysis, we identified eye-*specific*, rather than eye-*selective* (see Materials and Methods), electrodes. We found 13 eye-specific sites (4 RH, 9 LH) contributed by eight patients (bolded coordinates in **Table 2**). Within patients from whom several eye-specific sites were identified, there was only one instance in which the electrodes were closely adjacent. In all other cases the electrodes were located in different within-hemisphere locations (e.g., lateral temporal and ventral temporal cortices) or in different hemispheres.

#### *Body-selective electrodes*

We identified 21 body-selective electrodes (15 RH, 6 LH) across ten patients (**Table 3**, **Figure 2**). At these locations, the peak amplitude medians of faces, eyes, bodies, and flowers were −83.63, −69.88, −75.45, and −21.30 μV, respectively. The Wilcoxon signed rank test showed that the body response was not larger than faces, *Z* = 0.56, *p* = 0.575, or eyes, *Z* = 0.66, *p* = 0.508. The average peak response to faces was significantly larger than to flowers, *Z* = 2.60, *p* = 0.009, but not to eyes, *Z* = 0.05, *p* = 0.959. The eye response was marginally larger than the flower response, *Z* = 2.50, *p* = 0.013.

The latency medians of faces, eyes, bodies, and flowers were 181, 195, 182, and 197 ms, respectively. The Wilcoxon signed rank test showed that the latency of the body response did not differ from that of the response to faces, *Z* = 1.40, *p* = 0.161, or eyes, *Z* = 0.83, *p* = 0.406. The latency of the peak to faces was marginally earlier than eyes, *Z* = 2.45, *p* = 0.014, but did not differ from flowers, *Z* = 1.74, *p* = 0.083. Eyes and flowers did not differ, *Z* = 0.12, *p* = 0.906.

The cluster detection algorithm identified two spatial clusters of electrodes within each hemisphere for body-selective ERPs. The cluster centroids in the right hemisphere were located at 36, −55, 19 and 45, −83, −2. The cluster centroids in the left hemisphere were located at −34, −48, −22 and −20, −96, 5 (**Figure 5**). The small sample sizes within each cluster precluded statistical analysis of the ERPs. We describe the relevant qualitative results from within these clusters in the context of our discussion section.

In a second analysis, we identified body-*specific*, rather than body-*selective* (see Materials and Methods), electrodes. We identified only one body-specific site, which was located in the left hemisphere (bolded coordinates in **Table 3**).

#### **VOLTAGE CHANGES OVER SPACE**

We visually identified 10 occurrences from seven patients in which the peak of the ERP to all, or some, of the categories of interest changed at different rates across neighboring electrodes (**Figure 4**). Of these 10, seven were located on the ventral surface (6 right hemisphere) and three on the lateral occipitotemporal surface (1 right hemisphere). At many of these locations the most notable difference was between bodies and the other conditions. However, the peak rate of change between faces and eyes also differed, though this difference was often more subtle than that of bodies.

**Table 3 | Electrode locations of body-selective sites within each spatial cluster.**

*The MNI coordinates from each of the 21 body-selective electrodes are grouped according to which cluster they were assigned to by the k-means clustering analysis. The MNI coordinates in the first row are the centroid locations within each cluster. Coordinates in bold typeface indicate an electrode that was body-specific as well as body-selective (see Materials and Methods).*

#### **EVENT-RELATED SPECTRAL PERTURBATIONS**

Across the electrodes that were selective for faces (*N* = 32), eyes (*N* = 51), and bodies (*N* = 21), we observed substantial change in gamma power as an effect of stimulus presentation. However, there were few differences between categories. At the face-selective ERP locations the face-γERSP was larger than the eye-γERSP, *t*(10) = 3.69, *p* = 0.004, and marginally larger than the body-γERSP, *t*(10) = 2.46, *p* = 0.03. There were no other pairwise differences, *p*s > 0.01. At the eye-selective ERP locations the eye-γERSP was *smaller* than the face-γERSP, *t*(11) = 3.12, *p* = 0.010. This same relationship was seen even when only including face-specific sites in the eye-γERSP average. At these sites the face-γERSP was also marginally larger than the body-γERSP, *t*(11) = 2.67, *p* = 0.022. There were no other pairwise differences, *p*s > 0.01. At the body-selective ERP locations there were no pairwise differences between conditions, *p*s > 0.01.

We also investigated the category-selective γERSPs within each of the four spatial clusters identified by the cluster detection algorithm. As with the ERPs, the small samples within each cluster precluded statistical analyses. We describe the relevant qualitative results from within these clusters in the context of our discussion section.

### **DISCUSSION**

In this paper we report several findings regarding the selectivity and organization of cortical areas engaged in the perception of faces, eyes, and bodies. Overall, we found substantial overlap in the activation by the three corporeal stimuli. The majority of electrodes selective for one of the three corporeal categories were selective for one or both of the other categories. However, we did identify several electrode sites that were specific for only a single category—particularly for eyes. Furthermore, we report evidence for different spatial distribution of voltage evoked by the three corporeal stimuli at closely adjacent electrodes. This indicates that the ERPs evoked by these stimuli are being generated by different configurations of current sinks and sources despite their substantial spatial overlap. In the following section we will discuss the evidence for this as well as general observations regarding the response properties and locations of face, eye, and body selective sites. We conclude by discussing these findings in the context of the perception of social agents.

#### **NEURAL SELECTIVITY OF RESPONSES**

Locations showing selectivity for at least one of the categories of interest were found widely distributed across bilateral VOTC. No category-selective locations were found in the frontal or parietal lobes. The wide spatial distribution observed in VOTC is consistent with prior iEEG reports from our laboratory (e.g., Allison et al., 1994a, 1999; Engell and McCarthy, 2010, 2011), but inconsistent with fMRI studies that often report highly localized selective regions (e.g., Sergent et al., 1992; Haxby et al., 1994; Puce et al., 1995; Kanwisher et al., 1997; Gauthier et al., 2000; but see Pinsk et al., 2009; Weiner and Grill-Spector, 2013). We used a cluster detection algorithm to assign each category-selective electrode to one of four spatial clusters, and then identified the centroids of those clusters. For each of the social agent conditions, the centroids (**Figure 5**) were well aligned to face- and body-selective areas identified in the fMRI literature. Face-selective electrodes clustered around the FFA (ventral temporal cortex) and OFA (posterior and lateral occipitotemporal cortex) in each hemisphere, as did eye- and body-selective electrodes. These findings suggest that group analysis of fMRI data has emphasized regions of maximal overlap at the expense of detecting spatial variability across, and perhaps within, individuals. Weiner and Grill-Spector (2013) have recently reported that high-resolution fMRI of individual participants shows that face- and body-selective regions repeat throughout the occipitotemporal cortex. If so, our findings might reflect a coarse sampling of this repeating pattern.

#### *Face-selectivity*

Category-selective face-N200s were identified at 32 locations across 11 patients. Consistent with a prior report (McCarthy et al., 1999), the peak of the face-N200 at these sites was qualitatively larger and earlier than the peak of the eye-N200. The face-N200 was also larger than the body-N200. Although these sites were selected for their face-selectivity, the body-N200 and eye-N200 recorded there were greater than the ERP evoked by flowers. Therefore, the underlying cortex is sensitive to nonface images of social agents, despite being optimally activated by faces. Notably though, the locations of the face-selective sites were widely distributed across bilateral VOTC and thus might span functionally heterogeneous regions that include some areas, such as the FFA located in right ventral temporal cortex (rVT), that are more face-selective than others.

Of the 32 face-selective electrodes, 14 were located within the rVT region, suggesting that there might indeed be greater face-selectivity within this region. However, the responses at these sites did not appreciably differ from those in our other three regions of interest. Perhaps more surprising was that none of the 14 face-*selective* sites within the rVT met the criteria to be face-*specific*. In other words, no electrodes on or around the FFA, a region often considered to be a functional module for face perception, were category-selective for faces as compared to eyes and/or bodies.

#### *Eye-selectivity*

Eyes are a critical feature of faces and attract the most attention during natural looking (Janik et al., 1978). In a previous iEEG study, McCarthy et al. (1999) found right VOTC and left LOTC sites at which face-parts (eyes, lips, noses) evoked a larger and later N200 than did whole faces. In that study, the authors averaged over the potentials independently evoked by eyes, lips, and noses. This approach creates the possibility that averaging in the potentially weaker responses of lips and noses will obscure eye-selective electrodes. Similarly, they averaged hands and flowers to create their control condition, and thus did not directly contrast face-parts with non-face body parts. In the current study, we focused on eyes and identified 51 eye-selective sites across twelve patients. The grand-averaged response to faces, eyes, and bodies across these sites was very similar to the response from face-selective sites, with the exception that the eye response was qualitatively larger and later than the face response. As with the face-selective sites, the body response at these eye-selective sites was larger (though not significantly so) than the flower response.

Unlike the face-selective sites, the eye-selective sites were more likely to also be category-specific. That is, 27 of the 51 sites were not identified as being selective for faces or bodies. Moreover, 13 of those 27 sites were eye-specific, all but two of which were located in bilateral occipital and lateral occipitotemporal cortex.

#### *Body-selectivity*

Category-selective body-N200s were identified at 21 electrodes across ten patients. As with the face-selective sites discussed above, the body-selective sites were also sensitive to the other social agent categories. The responses to all three categories were larger than the response to flowers at these sites, and all were highly similar to one another. There was only a single electrode site in a single subject at which the body-N200 was category-specific. In other words, we found no evidence of regions that preferred bodies to faces or eyes. Particularly striking was the response of the subgroup of electrodes from within the right ventral temporal cortex, which includes the so-called fusiform body area (Schwarzlose et al., 2005). Here, the peak ERP response to bodies was qualitatively *smaller* than to faces or eyes (see **Figure 5**). The body-γERSP was also smaller than the face-γERSP. This contrasts with sites within the right occipitotemporal region (near the so-called extrastriate body area) where bodies elicited a larger N200 and γERSP than faces or eyes.

#### **INDEPENDENT OR SHARED NEURAL SUBSTRATE**

As discussed above, neuroimaging studies report substantial overlap in VOTC brain regions activated by faces and bodies. The striking overlap of these networks is highlighted in a large-sample fMRI study (Engell and McCarthy, 2013). That study used dynamic "point light" displays (i.e., biological motion) rather than static body images, but the location and magnitude of activation evoked by bodies and point-light displays are strongly correlated (Peelen et al., 2006).

Our current ERP results offer some insight into the nature of this extensive overlap observed in fMRI. Consistent with fMRI studies, we found that electrodes sensitive to one visual category of animate agent (faces, eyes, or bodies) were frequently sensitive to one or both of the other categories. However, we found ten instances in which the amplitude of the peak ERP response changed at different rates across electrodes for two or more of the categories at closely adjacent electrodes. For instance, we observed a large difference in the rate of change for faces and for bodies (and to a lesser extent, eyes) across adjacent electrodes located on the right fusiform gyrus of Patient 1E (see **Figure 4**). Indeed, in this particular case the amplitude of the peak response increases for one condition while decreasing for the other. This differential rate of change in voltage across adjacent electrodes cannot be accounted for by a consistent configuration of current sources and sinks that is activated at different strengths. Rather it indicates that these ERP distributions are caused by a different pattern of input and/or the participation of at least some different neural elements for faces, bodies, and eyes.

It is important to note that sites at which *no* evidence was found for the different spatial rates of change for faces, bodies, and eyes outnumbered sites at which we found such evidence. However, a favorable spatial relationship between electrode locations and sink/source configurations is necessary to record category-selective N200s at neighboring electrodes. Therefore, an inability to find appropriate electrodes can be due simply to the unsystematic spatial sampling typical of subdural recordings. The presence of functionally heterogeneous, but spatially regular and interdigitated neurons at a scale much smaller than our interelectrode distance could also result in indistinguishable voltage distributions with our methods. In contrast, we cannot think of alternative explanations that would account for differential rate change, thus making the ten instances across seven patients reported here compelling.

#### **FACES: PART AND WHOLE**

An influential model of face processing posits that detection and representation of face-parts occurs in the "occipital face area" of the posterior occipitotemporal cortex (Haxby et al., 2001). Consistent with this model, we found the majority of eye-specific sites in bilateral posterior occipitotemporal cortex. In addition, there were few face-selective sites in this region. However, our data are inconsistent with another key feature of this model; namely, that the fusiform face area is primarily involved in holistic processing of the whole face. Eye-selective sites in this region slightly outnumbered face-selective sites. Moreover, we found no electrodes in this region that were face-specific when compared to isolated eyes and bodies.

We have previously shown that the face-N200 is functionally distinct from the face-γERSP (Engell and McCarthy, 2010, 2011) and have proposed that the former is involved in early detection, whereas the latter is involved in elaborative processing. It is notable then, that the face-γERSP was *larger* than the eye-γERSP at eye-selective ERP sites. The qualitative nature of these results demands caution, but we speculate that the elaborative processing of a whole face follows the initial evoked response to eyes. This would reconcile the current results with the neuroimaging literature because changes in gamma power, and not evoked-potentials, are most closely related to changes in the hemodynamic response (Mukamel et al., 2005; Niessing et al., 2005; Lachaux et al., 2007; Koch et al., 2009; Ojemann et al., 2010; Scheeringa et al., 2011; Hermes et al., 2012). However, we observed that the face-γERSP is larger than the eye-γERSP at eye-selective sites within the OFA region as well, which is inconsistent with fMRI reports of a greater OFA response to face-parts (Liu et al., 2010).

## **CONCLUSIONS**

Direct electrical recordings from the surface of the fusiform gyrus and adjacent VOTC and LOTC show a complex pattern of activation for the perception of faces, bodies, and eyes. Most electrodes selective for one category of corporeal stimuli (relative to the control category of flowers) showed selectivity for the other corporeal categories as well. This was particularly true for body stimuli: only one electrode site of 1536 total sites examined, and of 69 sites showing a response selective for at least one category of corporeal stimuli, was specific for bodies. Perhaps most surprisingly, no electrode site in the vicinity of the fusiform face and body areas (as defined by fMRI studies) showed face or body specificity. These data do not, then, provide evidence for highly discrete processing regions for these different stimulus types. However, we did find a differential spatial distribution over closely adjacent electrodes for the maximum ERP response to bodies (and to a lesser extent for eyes) relative to faces. This suggests that *within* a given region, the different stimulus types engaged different configurations of current sinks and sources. Taken together, these results suggest a lumpy or patchy spatial representation for these different types of corporeal stimuli rather than segregation into highly discrete regions.

## **ACKNOWLEDGMENTS**

We thank Dr. Dennis D. Spencer, and Mr. William Walker for their help in acquiring the intracranial EEG data reported here. This research was supported by National Institute of Mental Health fellowship MH-093986 (Andrew D. Engell) and National Institute of Mental Health grant MH-005286 (Gregory McCarthy).

## **REFERENCES**


human extrastriate visual cortex. *Neuropsychologia* 45, 2621–2625. doi: 10.1016/j.neuropsychologia.2007.04.005


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 02 August 2014; published online: 21 August 2014. Citation: Engell AD and McCarthy G (2014) Face, eye, and body selective responses in fusiform gyrus and adjacent cortex: an intracranial EEG study. Front. Hum. Neurosci. 8:642. doi: 10.3389/fnhum.2014.00642*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Engell and McCarthy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Discriminable spatial patterns of activation for faces and bodies in the fusiform gyrus

## *NaYeon Kim, Su Mei Lee, Margret C. Erlendsdottir and Gregory McCarthy\**

Human Neuroscience Laboratory, Department of Psychology, Yale University, New Haven, CT, USA

#### *Edited by:*

Davide Rivolta, University of East London, UK

#### *Reviewed by:*

Galia Avidan, Ben-Gurion University of the Negev, Israel
Guy Michael Wallis, University of Queensland, Australia

#### *\*Correspondence:*

Gregory McCarthy, Human Neuroscience Laboratory, Department of Psychology, Yale University, P. O. Box 208205, 2 Hillhouse Avenue, New Haven, CT 06520-8205, USA e-mail: gregory.mccarthy@yale.edu

Functional neuroimaging studies consistently report that the visual perception of faces and bodies strongly activates regions within ventral occipitotemporal cortex (VOTC) and, in particular, within the mid-lateral fusiform gyrus. One unresolved issue is the degree to which faces and bodies activate discrete or overlapping cortical regions within this area. Here, we examined VOTC activity to faces and bodies at high spatial resolution, using univariate and multivariate analysis approaches sensitive to differences in both the strength and spatial pattern of activation. Faces and bodies evoked substantially overlapping activations in the fusiform gyrus when each was compared to the control category of houses. No discrete regions of activation for faces and bodies in the fusiform gyrus survived a direct statistical comparison using standard univariate statistics. However, multi-voxel pattern analysis differentiated faces and bodies in regions where univariate analysis found no significant difference in the strength of activation. Using a whole-brain multivariate searchlight approach, we also found extensive regions in VOTC, beyond those defined as fusiform face and body areas using standard criteria, where the spatial pattern of activation discriminated faces and bodies. These findings provide insights into the spatial distribution of face- and body-specific activations in VOTC and the identification of functionally specialized regions.

**Keywords: face area, body area, fusiform gyrus, multivoxel pattern analysis, fMRI**

#### **INTRODUCTION**

The ability to extract biologically relevant information from faces and bodies is critical for social interactions among humans and for many nonhuman animals. Single-cell recording in sheep and in monkeys has revealed that some temporal lobe neurons respond selectively to faces (Gross et al., 1972; Perrett et al., 1982; Kendrick and Baldwin, 1987; Tanaka et al., 1991), hands (Desimone and Albright, 1984), or headless bodies (Wachsmuth et al., 1994). In humans, there is converging evidence that faces activate regions of ventral occipitotemporal cortex (VOTC), and in particular a region of the lateral mid-fusiform gyrus (e.g., Sergent et al., 1992; Allison et al., 1994; Haxby et al., 1994; Puce et al., 1995; McCarthy et al., 1997). This latter region has been shown to respond selectively to faces when compared to a variety of non-corporeal control stimuli, such as scenes, objects, letter strings, and textures. Indeed, such apparent selectivity has led to its widely adopted functional designation as the fusiform face area, or FFA (Kanwisher et al., 1997).

Areas selective to bodies have also been reported in studies using fMRI. Downing et al. (2001) reported a region of lateral occipitotemporal cortex (LOTC) to be selectively activated by bodies without faces (a region they designated as the extrastriate body area, or EBA). In a later study, Peelen and Downing (2005) reported a similar body-selective area along the VOTC in the fusiform gyrus that they designated as the fusiform body area, or FBA. The selectivity for bodies in the FBA has been studied by comparing the response to bodies or body parts to noncorporeal objects (Taylor et al., 2007; Hodzic et al., 2009; Vocks et al., 2010; Willems et al., 2010; Ewbank et al., 2011), object parts (Costantini et al., 2011), or scrambled bodies (Aleong and Paus, 2010).

Evidence for anatomically distinct FFA and FBA would be consistent with a modular neural organization, as has been previously proposed for face processing (Kanwisher et al., 1997). However, if the same voxels respond equally to faces and bodies, a more distributed organization may be considered. Studies in the macaque using fMRI have found multiple clusters in the superior temporal sulcus (STS) of the macaque brain that respond to faces (Tsao et al., 2003), and an adjacent and overlapping region that responds to body parts (Pinsk et al., 2005, 2009). In fMRI studies that have compared activations in human VOTC evoked by faces and bodies, evidence for the anatomical distinction of these areas has been equivocal. For example, Spiridon et al. (2006) reported that the activation evoked by body stimuli was not statistically significantly different than that evoked by faces in the FFA. Morris et al. (2008) found no statistical difference in a VOTC region corresponding to the FFA when contrasting activations evoked during guided eye fixations of the face or body of a static image of a male human avatar. The time courses of activity confirmed that faces and torsos evoked no differential activation in this region, but hands evoked much less activation. This finding was consistent with an earlier report by Morris et al. (2006), which found that bodies with naturally occluded faces and faces with naturally occluded bodies equally activated a lateral region of the VOTC corresponding to the FFA. However, in both studies by Morris et al. (2006, 2008), viewing bodies with occluded faces or making guided fixations upon a torso activated a region adjacent and medial to the FFA. Morris et al. (2006) note that the medial VOTC areas activated by bodies without faces were the same as those previously identified with differential processing of objects and textures, suggesting that these activations might represent domain-general processes.

In a recent intracranial event-related potential study, recordings from subdural electrodes along the VOTC including the fusiform gyrus were compared for faces, bodies, and eyes (Engell and McCarthy, 2014). While many sites in this region showed strong selectivity to these corporeal stimuli compared to a control category, most sites that responded to one of these three stimulus categories responded to the other two. However, the authors also showed shifts in the spatial distribution of voltage associated with faces, bodies and eyes at adjacent electrode sites, suggesting that a different configuration of current sinks and sources is engaged by these stimulus types. This finding suggests a differential neural organization among the three categories at a finer spatial scale.

Others, however, have made a stronger argument in favor of separate selective face and body areas, while also noting regions of overlap. Using higher spatial resolution than most contemporaneous studies, Schwarzlose et al. (2005) initially defined the FFA on the basis of the face > object contrast, and the FBA on the basis of the body > object contrast. They then defined face- or body-selective regions by eliminating overlapping voxels that were included in both the FFA and FBA from the initial contrasts. The time courses of activation in these non-overlapping areas demonstrated the regions' selective response to either faces or bodies. Weiner and Grill-Spector (2010) reported face and body activations within VOTC that overlapped minimally and alternated with one another, rather than a single specialized area for faces or bodies. However, while both studies reported the region's response to faces compared to objects and bodies compared to objects, they did not directly contrast faces and bodies to each other. Indeed, activations evoked by faces and bodies have rarely been statistically compared to each other when those regions are identified. Thus it remains unclear whether the areas defined as selective had significantly different levels of activation.

The studies reviewed thus far have focused on identifying discrete selective regions that respond only to faces, or only to bodies, and have thus deemphasized the regions where the activations for faces and bodies overlap. An alternative perspective is that faces and bodies may be represented in activation *patterns* within a larger area of VOTC, rather than in discrete regions such as the FFA or FBA (see Haxby et al., 2001, for evidence supporting a pattern perspective for face processing). Multi-voxel pattern analysis (MVPA) has been employed to determine whether sufficient information exists within local brain regions to classify a stimulus into one of a number of different categories (e.g., Haxby et al., 2001; Connolly et al., 2012), and to investigate the functional organization of the regions at a finer scale (Downing et al., 2007). Peelen and Downing (2007) have suggested that MVPA reveals more subtle functional differences in activations that overlap at a larger spatial scale.

Here, we examined the activation to faces and bodies at high spatial resolution in a sample of 21 young adults. Our focus was upon the fusiform gyrus and adjacent VOTC regions, with the goal of determining the degree of overlap between face and body activations, and the degree to which faces and bodies can be discriminated within regions of overlap. Using a univariate general linear model (GLM) approach, we first tested whether discrete regions of the fusiform gyrus were activated when faces and bodies were statistically compared. We then used MVPA to determine whether sufficient information was present in the pattern of activation in areas where both faces and bodies evoked overlapping and statistically indistinct activation to classify a stimulus as a face or body. Finally, we conducted a whole-brain multivariate searchlight analysis to identify all regions in the brain where faces and bodies could be discriminated.

## **MATERIALS AND METHODS**

#### **SUBJECTS**

Twenty-one healthy adults (13 female, mean age 23.7 ± 4.0 years, all right-handed) with normal or corrected-to-normal vision and no history of neurological or psychiatric illnesses participated in this study. All participants gave written informed consent. The Yale Human Investigations Committee approved the protocol.

#### **EXPERIMENTAL DESIGN**

**Figure 1** presents exemplars of the stimuli used in the experiment. Face stimuli were created using FaceGen software (Singular Inversions, Toronto, ON, Canada). Body stimuli were created using Poser 6.0 (Curious Labs Inc., Santa Cruz, CA, USA). House stimuli were photographs of houses with natural scenes in the background. All stimuli were presented on the center of a screen (10° × 10°) located behind the participant in the scanner and viewed with a mirror mounted in the head coil.

Each participant completed four runs, each of which lasted 4 min 54 s. Each run consisted of a pseudo-randomized block design in which 12-s stimulus blocks were interleaved with 12-s blocks of fixation. A total of 12 stimulus blocks were presented in every run, including four each for faces, bodies, and houses. Stimulus blocks consisted of eight images from a single category. Stimuli were presented for 1 s each, interleaved with 500 ms of fixation. Participants were instructed to count the number of times they saw the same picture twice consecutively. No button press was required.
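The block timing described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that each of the 12 stimulus blocks is paired with exactly one 12-s fixation block, which the text implies but does not state outright.

```python
# Timing values are taken from the design description; names are illustrative.
STIM_S = 1.0              # each image is shown for 1 s
GAP_S = 0.5               # followed by 500 ms of fixation
IMAGES_PER_BLOCK = 8
BLOCKS_PER_CATEGORY = 4
CATEGORIES = ("face", "body", "house")
FIXATION_BLOCK_S = 12.0
RUN_S = 4 * 60 + 54       # reported run length: 4 min 54 s

# Duration of one stimulus block: 8 x (1 s + 0.5 s)
stimulus_block_s = IMAGES_PER_BLOCK * (STIM_S + GAP_S)

# Number of stimulus blocks per run: 4 per category x 3 categories
n_stimulus_blocks = BLOCKS_PER_CATEGORY * len(CATEGORIES)

# Assumption: one fixation block accompanies each stimulus block.
alternating_s = n_stimulus_blocks * (stimulus_block_s + FIXATION_BLOCK_S)

# Time in the reported run length not covered by the alternating blocks.
unitemized_s = RUN_S - alternating_s
```

Under this assumption, eight 1.5-s trials reproduce the stated 12-s block length, and the alternating stimulus and fixation blocks account for 288 s of the 294-s run. The text does not itemize the remaining 6 s.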

#### **fMRI IMAGE ACQUISITION AND PREPROCESSING**

Data were acquired at the Magnetic Resonance Research Center at Yale University using a 3.0 T Siemens TIM Trio scanner with a 32-channel head coil. Functional images were acquired using a multiband imaging sequence (TR = 2000 ms, TE = 32 ms, flip angle = 62°, FOV = 210 × 202 mm, matrix = 104 × 100, slice thickness = 2.0 mm, 60 slices, multiband acceleration factor = 3) yielding isotropic voxels that were 2 mm³. Two structural images were acquired for registration: T1 coplanar images were acquired using a T1 Flash sequence (TR = 335 ms, TE = 2.61 ms, flip angle = 70°, FOV = 240 mm, matrix = 192 × 192, slice thickness = 2.0 mm, 60 slices), and high-resolution images were acquired using a 3D MP-RAGE sequence (TR = 2530 ms, TE = 2.77 ms, flip angle = 7°, FOV = 256 mm, matrix = 256 × 256, slice thickness = 1 mm, 176 slices).

#### **ANALYSIS OVERVIEW**

The data were analyzed using several different methods so that our observations could be compared to previously reported findings. We first conducted a conventional univariate GLM to obtain parameter estimates for each condition and whole-brain statistical maps for the contrasts of interest: face > house, body > house, face > body, and body > face. We examined the face > house and body > house activation maps and measured the overlap between the two. Because our focus in this paper is upon the VOTC, we used the temporal occipital fusiform cortex (TOFC) overlay from the Harvard–Oxford Structural Atlas as an anatomical mask, or region of interest (ROI), for several of our analyses. Each hemisphere was analyzed separately.

Using a beta series derived from hemodynamic model fitting, we used MVPA to test whether activation patterns in the overlapping region could discriminate faces, bodies, and houses. We also compared each subject's uncorrected contrast maps in the subject's own anatomical space, as some previous studies reported face- and body-specific activations on a subject-by-subject basis (e.g., Schwarzlose et al., 2005; Weiner and Grill-Spector, 2010).

Finally, we conducted a multivariate searchlight analysis of the whole brain to discover regions that could discriminate faces from bodies that fall outside of the fusiform regions that were the focus of our initial analyses.

#### *Image preprocessing*

Image preprocessing was performed using the FMRIB Software Library (FSL; http://www.fmrib.ox.ac.uk/fsl). Structural and functional images were skull-stripped using the Brain Extraction Tool (BET). The first three volumes (6 s) of each functional dataset were discarded to allow for MR equilibration. Functional images then underwent motion correction (using the MCFLIRT linear realignment) and high-pass filtering with a 0.01 Hz cut-off to remove low-frequency drift. Data were not spatially smoothed. The functional data were registered to the coplanar images, which were in turn registered to the high-resolution structural images, using non-linear registration, and then normalized to the Montreal Neurological Institute's template (MNI152). For subject-specific analyses, each participant's functional images were registered to the participant's own high-resolution structural images.

#### *Whole-brain contrast maps*

Whole-brain voxel-wise GLM analyses were performed using FSL's FMRI Expert Analysis Tool (FEAT). Each condition within each preprocessed run was modeled with a boxcar function convolved with a gamma hemodynamic response function. The model included explanatory variables (EVs) for the three stimulus types: faces, bodies, and houses, as well as confound EVs to exclude time points with excessive head motion (>2 mm) from analysis. Subject-level analyses combining multiple runs were conducted using a fixed effects model. Group-level analyses were performed using a mixed effects model, with the random effects component of variance estimated using FSL's FLAME 1 + 2 procedure. Clusters were defined as contiguous sets of voxels with *Z* > 2.3 and then thresholded using Gaussian random field theory (cluster probability *p* < 0.05) to correct for multiple comparisons (Worsley et al., 1996). We also generated uncorrected statistical maps with *Z* > 1.96 for the subject-specific analyses as described below.
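As an illustration of how a block regressor of this kind can be built, the following sketch convolves a boxcar with a gamma-shaped hemodynamic response and samples the result once per TR. It is not FSL's implementation. The gamma parameters (mean lag 6 s, standard deviation 3 s), the fine time grid, the block onsets, and all function names are assumptions made for the example and are not reported in the text.

```python
import math

def gamma_hrf(t, mean_lag=6.0, sd=3.0):
    """Gamma density with the given mean and SD in seconds (assumed values)."""
    if t <= 0:
        return 0.0
    shape = (mean_lag / sd) ** 2
    scale = sd ** 2 / mean_lag
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))

def block_regressor(onsets, duration, n_vols, tr=2.0, dt=0.1, kernel_len=30.0):
    """Boxcar (1 during blocks, 0 elsewhere) convolved with the gamma HRF,
    computed on a fine grid of step dt and then sampled once per TR."""
    n_fine = int(round(n_vols * tr / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    # Discretized kernel; multiplying by dt approximates the integral.
    kernel = [gamma_hrf(k * dt) * dt for k in range(int(round(kernel_len / dt)))]
    convolved = [0.0] * n_fine
    for i, value in enumerate(boxcar):
        if value:
            for k, weight in enumerate(kernel):
                if i + k < n_fine:
                    convolved[i + k] += weight
    step = int(round(tr / dt))
    return [convolved[v * step] for v in range(n_vols)]

# Hypothetical onsets (s) for four 12-s blocks of one category in one run.
face_regressor = block_regressor(onsets=[12.0, 84.0, 156.0, 228.0],
                                 duration=12.0, n_vols=144)
```

One such regressor per stimulus category, plus any confound regressors, would form the columns of the design matrix against which each voxel's time series is fit.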

#### *Subject-specific analysis*

Whole-brain statistical maps were generated for each subject using an uncorrected threshold of *Z* > 1.96. Subject-specific ROIs were defined by transforming the Harvard-Oxford Atlas TOFC ROI into subject space. Within this ROI, we obtained the intersection of the face > house and body > house contrasts (i.e., the "overlap") to identify voxels that respond to both faces and bodies. We also obtained the intersection of the face > house and face > body contrasts ("face specific") to identify voxels that were selective to faces in both contrasts, and similarly, the intersection of the body > house and body > face contrasts ("body specific"). We then excluded "face specific" and "body specific" voxels from the "overlap" such that the remaining voxels ("exclusive overlap") showed preferential response to both faces and bodies compared to houses, but did not respond differently between faces and bodies.
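The voxel-selection logic above amounts to a few set operations on thresholded contrast maps. The sketch below expresses it with plain Python sets. The function names, the dictionary-based map representation, and the toy *Z* values are hypothetical and serve only to make the logic concrete.

```python
Z_THRESH = 1.96  # uncorrected threshold stated in the text

def suprathreshold(zmap, roi):
    """Voxels inside the ROI whose Z value exceeds the threshold."""
    return {v for v in roi if zmap.get(v, 0.0) > Z_THRESH}

def partition_roi(z_face_house, z_body_house, z_face_body, z_body_face, roi):
    """Split ROI voxels into the four sets defined in the text."""
    face_gt_house = suprathreshold(z_face_house, roi)
    body_gt_house = suprathreshold(z_body_house, roi)
    overlap = face_gt_house & body_gt_house
    face_specific = face_gt_house & suprathreshold(z_face_body, roi)
    body_specific = body_gt_house & suprathreshold(z_body_face, roi)
    exclusive_overlap = overlap - face_specific - body_specific
    return {
        "overlap": overlap,
        "face_specific": face_specific,
        "body_specific": body_specific,
        "exclusive_overlap": exclusive_overlap,
    }

# Toy example: voxel index -> Z value (made-up numbers; voxel 9 is outside the ROI).
roi = {0, 1, 2, 3, 4}
result = partition_roi(
    z_face_house={0: 3.0, 1: 2.5, 2: 2.2, 3: 0.5, 4: 2.4, 9: 5.0},
    z_body_house={0: 2.8, 1: 2.1, 2: 2.6, 3: 3.0, 4: 0.1, 9: 5.0},
    z_face_body={0: 0.2, 1: 2.4, 2: -2.5, 3: -1.0, 4: 2.5},
    z_body_face={0: -0.2, 1: -2.4, 2: 2.5, 3: 1.0, 4: -2.5},
    roi=roi,
)
```

In this toy case only voxel 0 survives as "exclusive overlap": it exceeds threshold for both corporeal categories relative to houses but for neither direct face-versus-body contrast.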

#### *Multi-voxel pattern classification*

MVPA was performed on the "exclusive overlap" (i.e., voxels in the overlap between the face > house and body > house contrasts that did not respond differently between faces and bodies) from the group-level uncorrected (*Z* > 1.96), unsmoothed statistical maps within the TOFC ROI.

To perform the pattern analysis, we first obtained parameter estimates (or betas) for each stimulus block and for each participant by using hemodynamic model fitting. Specifically, the preprocessed functional data were registered and normalized to the MNI 152 template using FSL's Non-linear Image Registration Tool (FNIRT). Regression analyses were then performed using AFNI's (Cox, 1996) 3dDeconvolve and 3dREMLfit functions, where each stimulus block was modeled using the BLOCK5 basis function with duration of 12 s. The resulting beta volumes for each stimulus block (16 beta volumes in total for each stimulus category) were concatenated into a single beta series for each subject.

A three-way classification (faces, bodies, and houses) was performed on the beta series, only within the overlap ROI, using a linear support vector machine (SVM) classifier as implemented in PyMVPA (Hanke et al., 2009). Within each volume in the beta series, each voxel's beta values were mean-normalized (by *Z*-scoring using the mean and standard deviation of the voxels within the overlap ROI), which effectively removed mean differences across volumes in the beta series. This was done to ensure that any MVPA differences found were based on spatial pattern differences and not mean activation level differences. Classification training and testing were performed using a leave-one-run-out cross-validation strategy. We ensured that each condition contained the same number of examples in the training and testing sets using PyMVPA's Balancer. Confusion matrices were generated during classification to assess if the three-way classification discriminated all three categories successfully instead of only a subset of categories.
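The two preprocessing ideas in this paragraph, normalizing each beta volume across its voxels and holding out one run per fold, can be sketched as follows. This is a standard-library illustration rather than the PyMVPA pipeline; the beta values and run labels are invented, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def zscore_volume(betas):
    """Z-score one beta volume across its voxels (mean 0, SD 1 within the ROI)."""
    mu, sigma = mean(betas), pstdev(betas)  # population SD is an assumption
    return [(b - mu) / sigma for b in betas]


def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices), one fold per run."""
    for held_out in sorted(set(run_labels)):
        train = [i for i, r in enumerate(run_labels) if r != held_out]
        test = [i for i, r in enumerate(run_labels) if r == held_out]
        yield held_out, train, test


# Hypothetical beta series: four volumes (blocks) of three voxels each.
beta_series = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [2.0, 2.0, 5.0], [0.0, 4.0, 2.0]]
runs = [1, 1, 2, 2]  # hypothetical run membership of each block

normalized = [zscore_volume(volume) for volume in beta_series]
folds = list(leave_one_run_out(runs))
```

The first two volumes differ only in their mean level, so they become identical after normalization; a classifier trained on `normalized` can therefore rely only on the spatial pattern, which is the point of the step.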

#### *Whole-brain searchlight analysis*

A whole-brain searchlight analysis was performed to identify all brain regions that discriminated between faces and bodies (Kriegeskorte et al., 2006). For each participant, voxels were extracted from a spherical searchlight with a two-voxel radius (33 voxels in each searchlight including the central voxel) and MVPA was performed. The searchlight then moved through each voxel in the brain. We examined pair-wise classification performance (faces vs. bodies) rather than three-way classification (faces vs. bodies vs. houses) because distinct patterns of activity evoked by houses led to higher classification performance in regions medial to the fusiform gyrus. As in the ROI-based multivariate analyses described above, the data were normalized to remove mean activation differences between categories. A linear SVM classifier was trained and tested using the data from each searchlight, using a leave-one-run-out cross-validation strategy. The classification accuracy of each searchlight was assigned to the central voxel in the sphere, yielding an image of whole-brain classification accuracy for each participant. These images were then entered into a second-level one-sample *t*-test to identify voxels that showed significantly higher than chance level classification accuracy (0.50), using the AFNI program 3dttest++. To correct for multiple comparisons, the output group-level statistical map was thresholded using a false discovery rate of *q*(FDR) < 0.05.
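The figure of 33 voxels per searchlight follows from counting the integer grid points within two voxels of the center. The sketch below assumes an isotropic voxel grid and a Euclidean distance criterion; the function name is illustrative and not part of PyMVPA.

```python
from itertools import product


def sphere_offsets(radius):
    """Integer voxel offsets (dx, dy, dz) lying within `radius` voxels of the center."""
    r = int(radius)
    return [
        (dx, dy, dz)
        for dx, dy, dz in product(range(-r, r + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    ]


offsets = sphere_offsets(2)
n_voxels = len(offsets)  # center + 6 + 12 + 8 + 6 neighbors
```

Adding each offset to a center coordinate gives the voxels whose values form one searchlight pattern; the resulting accuracy is then written back to the center voxel.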

### **RESULTS**

#### **GROUP-LEVEL GLM**

As expected, we found bilateral activity within the TOFC ROI for both the face > house and body > house contrasts (peak coordinates in **Table 1**). **Figure 2** displays regions within the TOFC ROI that were significantly activated in the group-level face > house (red) and body > house (yellow) contrasts. The overlap of the face > house and body > house activation maps is also shown (blue).

**Table 2** summarizes the average volume of activation in the face > house and body > house contrast maps and the overlap. In the right hemisphere, the overall volume of activated voxels was 1808 mm<sup>3</sup> in the face > house contrast and 2760 mm<sup>3</sup> in the body > house contrast. The overlap of the two contrasts was 976 mm<sup>3</sup> (54% of the face > house activation). In the left hemisphere, the face > house contrast yielded 1008 mm<sup>3</sup> of activation, while the body > house activation yielded 2176 mm<sup>3</sup> with 696 mm<sup>3</sup> of overlap (69% of the face > house activation). The face > house and body > house contrasts showed peak activity in the same location in the left hemisphere, and the proportion of the overlap was larger than in the right hemisphere.
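The overlap percentages quoted above can be checked directly from the reported volumes. The short sketch below uses only the numbers given in the text; the dictionary layout and function name are illustrative.

```python
# Activation volumes in mm^3, as reported in the text.
volumes = {
    "RH": {"face>house": 1808, "body>house": 2760, "overlap": 976},
    "LH": {"face>house": 1008, "body>house": 2176, "overlap": 696},
}


def overlap_percent(hemi, reference="face>house"):
    """Overlap volume as a percentage of the reference contrast's volume."""
    return 100.0 * volumes[hemi]["overlap"] / volumes[hemi][reference]


rh_pct = round(overlap_percent("RH"))  # 54
lh_pct = round(overlap_percent("LH"))  # 69
```

Passing `reference="body>house"` instead would express the same overlap as a share of the body-evoked activation.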

**Figure 3** shows regions in the TOFC ROI that were significantly activated in the group-level face > body and body > face contrasts. In the face > body contrast, no significantly activated voxels were found in the fusiform gyrus. However, we found medial and lateral activations in both hemispheres in the body > face contrast (orange and cyan). Peak coordinates for the body > face clusters are also included in **Table 1**. The medial body > face clusters were not observed in the body > house contrast (shown in **Figure 1**). In both hemispheres, the lateral body > face clusters were lateral and posterior to the body > house clusters.

No voxels in the overlap between the face > house and body > house contrast maps were activated when faces and bodies were directly contrasted. Thus, no overlapping voxels showed statistically different activations to faces or bodies.

#### **GROUP-LEVEL INTERSECTIONS**

As described above, group-level GLM analyses revealed differential activation maps for faces and bodies in the direct face vs. body contrasts. Despite the strong face-selectivity of the region, we did not find significantly greater activity to faces in the face > body contrast. To guard against a Type II error, we explored the uncorrected statistical maps (*Z* > 1.96) for the four contrasts of interest. **Figure 4** presents regions that were activated in the face > house, body > house, face > body, and body > face contrasts at the uncorrected level. We examined intersections between uncorrected contrast maps to explore the spatial distribution of voxels that exhibited preference to faces and/or bodies in the four contrasts. These analyses were restricted to the TOFC ROI.

No activations were found for the face > body contrast.

**TOFC ROI.** Faces (red) and bodies (yellow) evoked substantially overlapping activations when compared to houses (see **Table 2** for the overall volumes of activations). The blue overlay indicates the overlap between the face > house and body > house contrast maps. The images are centered on the peak activation of the face > house contrast, and activations in the right hemisphere are magnified. R: right, L: left, A: anterior, P: posterior.

Note: RH = right hemisphere, LH = left hemisphere.

As shown in **Figure 4**, 8% of the right hemisphere, and 4% of the left hemisphere voxels from the face > house contrast showed a *Z* > 1.96 for the face > body contrast. These "face specific" voxels (green) appeared on the medial part of the face > house cluster in both hemispheres. 31% of the right hemisphere, and 23% of the left hemisphere voxels from the body > house contrast showed a *Z* > 1.96 for the body > face contrast. These "body specific" voxels (cyan) were found lateral to the body > house clusters.

We also examined whether voxels within the overlap between the face > house and body > house contrasts showed differential activation to faces and bodies in these uncorrected data. We found that some voxels within the overlap exhibited greater activity to faces or bodies (i.e., face > body > house or body > face > house), but the majority (>80%) of the overlapping voxels did not show mean activation difference between faces and bodies ("exclusive overlap"). The size of overlap with no difference between faces and bodies was 52% (right) and 67% (left) of the face > house activations.

**FIGURE 3 | Direct face vs. body contrasts within the TOFC ROI.** No significantly activated voxels were found in the face > body contrast. Medial and lateral activations were observed in the body > face contrast (orange and cyan). Lateral activations were also found in the body > house contrast, indicating that bodies evoked greater activation in the lateral regions (cyan) than both faces and houses.

#### **SPATIAL DISTRIBUTIONS OF FACE- AND BODY-EVOKED ACTIVITY**

We further examined how the magnitude of face- and body-evoked activity (relative to baseline) varies along the lateral–medial axis (x-axis) and the posterior–anterior axis within the TOFC ROI. As **Figure 5A** shows, face activity was numerically greater than body activity only within the mid-fusiform gyrus (x-coordinates from 32 to 40 mm), and body activity became greater than face activity in areas medial and lateral to the mid-fusiform gyrus, consistent with the GLM results reported above. However, two-tailed one-sample *t*-tests performed on the mean activities at each x-coordinate revealed that the magnitudes of face and body activations were not significantly different. **Figure 5B** shows how face and body activations change along a posterior–anterior axis (y-axis). Because we were interested in the lateral regions where face- and body-evoked activations overlapped, we included only the lateral half of the ROI (from *x* = 36 to 52 mm). Both face and body activity increased toward the posterior, but there was no difference between the two categories. The subtle difference in the magnitude of face- and body-evoked activations is consistent with the GLM results reported above.

#### **SUBJECT-SPECIFIC RESULTS**

Some have argued that face- and body-selective regions in VOTC should be examined in each subject's anatomical space because anatomical differences between subjects are obscured when coregistering individuals' data into the template brain for group-level analyses (e.g., Peelen et al., 2006; Weiner and Grill-Spector, 2012). To address this issue, we performed the same analyses in each individual subject's anatomical brain space.

x-coordinates (MNI) from 52 mm (lateral) to 16 mm (medial). **(B)** y-coordinate.

As in group-level analyses, we generated uncorrected statistical maps in subject space for the contrasts of interest (i.e., face > house, body > house, face > body, body > face) and obtained the intersections of the maps.

In subject-based analyses, we focused on the proportion of overlap between the face > house and body > house contrasts and the intersection of the face > house and face > body contrasts, in order to compare the size of the overlap relative to the entire face > house activation in each subject. We also calculated the proportion of the face > body voxels among the face > house voxels ("face specific") in each subject.

On average, 41% of voxels in the right hemisphere and 34% of voxels in the left hemisphere that showed greater activation to faces compared to houses overlapped with the body > house map. Nineteen subjects (out of 21) showed face > body and body > face clusters within the fusiform ROI in both hemispheres. 34% (right hemisphere) and 36% (left hemisphere) of the face > house voxels were also included in the face > body map. Thus, we found substantial overlap between the face > house and body > house maps and a small intersection of the face > house and face > body maps in subject-based analyses, similar to what we have observed in group-level contrast maps.

There was large individual variability in the volumes of activation and proportion of the overlap. **Figure 6** presents individual subjects' functional data on the surface of the lateral VOTC. The "face specific" (face > body and face > house) voxels were displayed in red, the "body specific" (body > face and body > house) in orange, and the "exclusive overlap" (the overlap with no difference between faces and bodies) in yellow.

#### **MULTI-VOXEL PATTERN ANALYSIS ON THE OVERLAP**

We performed MVPA to examine if the voxels that show no activation differences to faces and bodies in univariate GLM analyses could nonetheless discriminate between faces and bodies based on their spatial pattern of activity. Voxels were selected based on the group-level uncorrected statistical maps. 172 voxels in the right hemisphere and 121 voxels in the left hemisphere were selected independently of contiguity.

On average, we found high classification performance (faces vs. bodies vs. houses) in both hemispheres: 58.5% in the right hemisphere and 58.2% in the left hemisphere. A one-sample *t*-test confirmed that these group mean accuracies were significantly above the chance level of 33.3% (*p* < 10<sup>−4</sup>). Houses were classified as accurately as faces and bodies within the voxels. As presented in **Table 3**, the confusion matrix showed no preferences. That is, there was no more misclassification between faces and bodies than between faces and houses or between bodies and houses.

**FIGURE 6 | Activations in subject space.** "Face specific" (red) and "body specific" (orange) regions for each of the 21 subjects are displayed, along with "exclusive overlap" (yellow) regions where face > house and body > house but faces are not different from bodies.

Thus, the activation patterns within the overlapping voxels were discriminable among our three categories even though these voxels did not show a mean activation difference between faces and bodies.
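For readers unfamiliar with how a confusion matrix complements overall accuracy, the following sketch computes both for a three-way problem. The labels and predictions are invented for illustration and are not data from the study; only the three categories and the chance level of one in three come from the text.

```python
CATEGORIES = ["face", "body", "house"]
CHANCE = 1.0 / len(CATEGORIES)  # about 0.333 for three-way classification


def confusion_matrix(true_labels, predicted_labels, categories=CATEGORIES):
    """Rows are true categories, columns are predictions; each row sums to 1."""
    counts = {t: {p: 0 for p in categories} for t in categories}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = {}
    for t in categories:
        total = sum(counts[t].values())
        matrix[t] = {p: counts[t][p] / total for p in categories}
    return matrix


# Hypothetical test-set labels and classifier outputs (not data from the study).
truth = ["face", "face", "face", "face", "body", "body", "body", "body",
         "house", "house", "house", "house"]
preds = ["face", "face", "body", "house", "body", "body", "face", "house",
         "house", "house", "face", "body"]

matrix = confusion_matrix(truth, preds)
accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
```

In this toy case the errors for each true category are spread evenly over the two wrong answers, which is the pattern the text describes as showing "no preferences". A classifier that merely separated houses from everything else would instead concentrate its errors in the face/body cells.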

#### **MULTIVARIATE SEARCHLIGHT ANALYSIS**

A whole-brain searchlight analysis demonstrated that faces and bodies are highly accurately decoded by local patterns of activity within VOTC and within the occipital lobe (**Figure 7**). Extensive regions where classification accuracies above chance level were obtained included bilateral VOTC that extended to the occipital regions, and the supramarginal gyri. Within the VOTC, faces and bodies were discriminable above chance in regions beyond the boundaries of those that selectively responded to faces and/or bodies compared to houses. **Figure 7B** overlays the group-level searchlight results upon the regions that showed mean activation differences between faces and bodies. Most of the voxels in the TOFC ROI that strongly respond to faces and/or bodies also contained local pattern differences between the two categories, even without mean activation differences.

### **DISCUSSION**

There is agreement in the literature that the perception of both faces and bodies activates regions of the VOTC, principally within the fusiform gyrus. At issue is the degree to which these activations overlap or are anatomically distinct, and thus provide evidence for a highly modular or more distributed neural architecture. The overlap in activation in the VOTC evoked by the perception of faces and bodies has been observed in prior studies (e.g., Morris et al., 2006, 2008; Spiridon et al., 2006), as has the finding that some voxels are more strongly activated by one or the other category (e.g., Schwarzlose et al., 2005; Weiner and Grill-Spector, 2010). Our results indicate more overlap than has been reported in prior studies. In the right hemisphere, we observed that 54% of the face > house activation overlapped with the body > house contrast, while other groups have reported less than 30% on average (Schwarzlose et al., 2005; Weiner and Grill-Spector, 2010). Furthermore, following correction for multiple comparisons, we found no voxels in which faces evoke more activity than bodies in group level analyses, and relatively few voxels showing such differences in uncorrected statistical comparisons.

**Table 3 | Confusion matrices from three-way classification of face, body, and house blocks.**

Values represent average fraction (across participants) of face, body, and house blocks each classified as face, body, or house blocks. An ideal confusion matrix (i.e., perfect classification) would have values of 1 on the diagonal (grayed entries) and values of 0 on the off-diagonal entries (errors). RH, right hemisphere; LH, left hemisphere.

The shapes of the spatial distributions of activation for faces and bodies in the medial-lateral extent of the fusiform gyrus were similar. However, despite having similarly located spatial peaks (i.e., where the beta values were at maximum), the spatial distribution of activation for bodies was somewhat kurtotic in appearance relative to faces, i.e., somewhat flatter in the middle and with more activity in the medial and lateral tails. The group-level GLM revealed a significant body > face contrast in both the medial and lateral tails of the distribution. The voxels in the medial tail of the distribution did not differentiate bodies from houses, a result that recalls Morris et al. (2006), who argued that a medial fusiform activation by bodies was not a domain-specific process. However, the lateral tail of the spatial distributions *did* contain voxels that were more strongly activated by bodies than by faces *and* by houses. It may be, then, that it is these voxels that compose the fusiform body area, although the bulk of these voxels were within or lateral to the inferior occipital sulcus and thus located in the inferior temporal gyrus. It is notable, however, that the activation evoked by bodies in this lateral region is less than half of that observed in the mid-fusiform gyrus where faces and houses evoke nearly equivalent activation. That is, the VOTC area in which significant differences between faces and bodies were obtained in GLM was at the lateral periphery of *both* the face and body fusiform activations.

If the existence of voxels where faces and bodies evoke significantly different levels of activation is evidence for a discrete neural instantiation of a modular processing stream, what, then, does the overlap represent? Does the overlap represent a hemodynamic or vascular smearing or other spatial blurring of otherwise discrete neural representations? Is it an artifact of combining data across subjects? Or is this evidence for a functional convergence of face and body activations? Hemodynamic smearing or other blurring would seem more likely to occur in a region between two spatially distinct peaks. However, as we have seen, the medial–lateral peaks of the face and body activations were roughly the same in the mid-fusiform, and the area showing the strongest body > (face and house) response is lateral to both peaks. While it is very likely that combining across subjects contributed to some of the observed overlap, our individual subject analysis revealed substantial overlap of face and body activations, with overlap as high as 80% of activated voxels in the right fusiform gyrus of one individual.

The issue of functional convergence is less easily addressed. Using MVPA, we observed that the pattern of activation within the region of overlap, where no mean activation differences between faces and bodies were present, still contained sufficient information to discriminate faces from bodies. The confusion matrices for three-way classification indicated that the classifier did not simply distinguish faces from non-face stimuli, or bodies from non-body stimuli. Good classification accuracies in regions of overlap were found at both the group-level and individual subject analysis. These results are compatible with the idea that faces and bodies have an intermixed or patchy representation at a finer scale within a larger face and body sensitive area (Pinsk et al., 2009; Weiner and Grill-Spector, 2010) and support the suggestion from earlier studies (e.g., Grill-Spector et al., 2006; Çukur et al., 2013) that the VOTC, and the FFA in particular, has a more heterogeneous organization than previously appreciated.

Discriminable activation patterns for faces and bodies have been previously reported. For example, Weiner and Grill-Spector (2010) used a winner-take-all classifier to identify faces or body parts among six object categories (faces, body parts, houses, flowers, guitars, and cars; chance level 17%). Within the voxels in lateral ventral temporal cortex that showed selective response to faces or body parts (i.e., the union of face-selective and limb-selective voxels), accuracies of 97% for faces and 94% for body parts were found. However, these accuracies are difficult to directly compare to the present study as different classifier methods and different ROIs were used. We have applied a more restrictive criterion for voxel selection and included a relatively small number of examples (i.e., the number of blocks for each stimulus type), which might have reduced classification performance (O'Toole et al., 2007; Etzel et al., 2009). Our results show that, even in the voxels that fall within overlapping activations, faces and bodies were discriminated above chance in the patterns of activity. This finding suggests that the representations of faces and bodies converge in some regions of VOTC, but remain nevertheless discriminable.

This interpretation is consistent with a recent intracranial EEG study from our laboratory (Engell and McCarthy, 2014) in which ERPs recorded from subdural electrodes showed strong selectivity to different corporeal stimuli (faces, isolated eyes, and headless bodies) compared to a control category (flowers) from sites along the fusiform gyrus and surrounding cortex. However, most sites that were selective to one type of corporeal stimulus were also sensitive to the other types – that is, there were few sites that responded exclusively to one type of corporeal stimulus – and only one of 1536 electrode sites examined in 12 subjects showed a specific response to bodies compared to faces and isolated eyes. Engell and McCarthy did find, however, instances in which the different corporeal stimuli evoked a difference in the spatial distribution of voltage over closely spaced adjacent electrodes – suggesting that faces, bodies, and eyes engaged a different configuration of current sources and sinks, despite activating the same electrodes. Engell and McCarthy concluded that this was evidence consistent with a lumpy or patchy representation of corporeal stimuli that may be evident at a finer spatial resolution than that offered by fMRI.

However, other studies have suggested that regions of the fusiform gyrus identified as face-selective in standard localizer tasks respond strongly to such stimuli as dynamic point-light displays of human ambulation (Engell and McCarthy, 2013) and the purposeful, or causal, movements of machines that are otherwise devoid of human surface characteristics (Shultz and McCarthy, 2012). This suggests that at least some of the regions of VOTC identified as face-selective by standard localizer tasks integrate information about social or intentional agents and may, as a consequence, show task-related variation in patterns of activation in addition to stimulus-related variation. Based upon intracranial ERP studies, we previously suggested that there may be a temporal course whereby areas initially respond to an exemplar of a specific stimulus category (perhaps reflected by the initial stimulus-driven N200 ERP recorded directly from the fusiform gyrus), followed by a period in which the initial representation is modified by other stimulus and task factors, perhaps reflected in the subsequent gamma activity at the same electrode sites (Puce et al., 1999; Engell and McCarthy, 2010).

Our paper has focused upon the fusiform gyrus and on functional regions defined in the extensive literature on high-level visual perception as the FFA and FBA using typical methods for identifying these regions. However, we also used a whole-brain searchlight MVPA approach to search for other regions that could discriminate faces from bodies at above chance levels. Extensive regions of the VOTC beyond the operationally defined FFA and FBA could significantly discriminate faces from bodies, as could the posterior STS, the supramarginal gyrus and intraparietal sulcus. Faces and bodies are visually very different, and so perhaps it is not surprising that many visual regions of the brain can discriminate these stimuli. Of course this same concern applies to the discriminability between faces and bodies observed in the overlap region of our operationally defined FFA and FBA. Our results demonstrate differential pattern information specific to faces and bodies, but the current study does not address what specific information within faces or bodies is represented in the patterns of activity. Given the enhanced sensitivity of multi-voxel patterns to more specific information compared to univariate analyses, future research may be able to investigate whether more specific features of a face or body image, rather than generic categorical information, can also be decoded by activation patterns in VOTC.

To conclude, the current study investigated the similarities and distinctiveness of face- and body-evoked activations in VOTC, specifically in the fusiform gyrus. The results support the view that regions in VOTC maintain functional specificity for faces and bodies, even though the two categories were not differentiated in mean activation levels within those regions. This study also exemplifies the use of univariate and multivariate analyses in investigating similar but disparate activations of a local brain region.

## **ACKNOWLEDGMENTS**

This research was supported by National Institute of Mental Health grant MH-005286 (Gregory McCarthy) and the Yale University Faculty of Arts and Sciences (FAS) Imaging Fund.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 June 2014; accepted: 29 July 2014; published online: 14 August 2014. Citation: Kim NY, Lee SM, Erlendsdottir MC and McCarthy G (2014) Discriminable spatial patterns of activation for faces and bodies in the fusiform gyrus. Front. Hum. Neurosci. 8:632. doi: 10.3389/fnhum.2014.00632*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Kim, Lee, Erlendsdottir and McCarthy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Face identity matching is influenced by emotions conveyed by face and body

## **Jan Van den Stock<sup>1,2</sup> and Beatrice de Gelder<sup>1,3,4</sup>\***

<sup>1</sup> Department of Neuroscience, Division of Old Age Psychiatry, Brain and Emotion Laboratory Leuven (BELL), KU Leuven, Leuven, Belgium

<sup>2</sup> Old Age Psychiatry, University Hospitals Leuven, Leuven, Belgium

<sup>3</sup> Cognitive and Affective Neuroscience Laboratory, Tilburg University, Tilburg, Netherlands

<sup>4</sup> Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Brain and Emotion Laboratory Maastricht, Maastricht University, Maastricht, Netherlands

#### **Edited by:**

Davide Rivolta, University of East London, UK

#### **Reviewed by:**

Anna Sedda, University of Pavia, Italy; David P. McGovern, Trinity College Institute of Neuroscience, Ireland; Megan L. Willis, Australian Catholic University, Australia

#### **\*Correspondence:**

Beatrice de Gelder, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Brain and Emotion Laboratory Maastricht, Maastricht University and Maastricht Brain Imaging Centre, Oxfordlaan 55, 6229 ER Maastricht, Netherlands e-mail: b.degelder@maastrichtuniversity.nl

Faces provide information about multiple characteristics like personal identity and emotion. Classical models of face perception postulate separate sub-systems for identity and expression recognition but recent studies have documented emotional contextual influences on recognition of faces. The present study reports three experiments where participants were presented with realistic face-body compounds in a 2 category (face and body) × 2 emotion (neutral and fearful) factorial design. The task always consisted of two-alternative forced choice facial identity matching. The results show that during simultaneous face identity matching, the task-irrelevant bodily expressions influence processing of facial identity, under conditions of unlimited viewing (Experiment 1) as well as during brief (750 ms) presentation (Experiment 2). In addition, delayed (5000 ms) face identity matching of rapidly (150 ms) presented face-body compounds was also influenced by the body expression (Experiment 3). The results indicate that face identity perception mechanisms interact with processing of bodily and facial expressions.

**Keywords: face, body, emotion, identity, context**

## **INTRODUCTION**

Faces provide powerful interpersonal communicative signals and influential theories of face perception have proposed dedicated behavioral and neural mechanisms underlying perception of faces. Two hallmarks of classical theories of face perception are that processing of faces is dominant over other object classes and that different kinds of facial information like identity, expression and direction of gaze are processed in separate, relatively independent subsystems (e.g., Bruce and Young, 1986; Haxby et al., 2000; Calder and Young, 2005). Yet, there is growing evidence challenging these basic principles. For instance, it has been reported that contextual cues that in daily life frequently co-occur with faces influence how we perceive and process faces (de Gelder et al., 2006; de Gelder and Van den Stock, 2011b; Wieser and Brosch, 2012). For example, studies have shown that perception of facial expressions is influenced by vocal expressions (de Gelder and Vroomen, 2000), bodily expressions (Meeren et al., 2005; Van den Stock et al., 2007; Aviezer et al., 2008) and background scenes (Righart and de Gelder, 2006, 2008a; Van den Stock and de Gelder, 2012; Van den Stock et al., 2013b). There is also evidence that facial expressions influence recognition of body expressions (Willis et al., 2011). From a theoretical perspective, these cross-categorical emotional context influences may be explained by activation of an emotion system that is not category specific and therefore common for faces and bodies, thereby modulating face expression categorization.

Secondly, a few studies have challenged the notion of segregated processing streams for identity and expression perception. On the one hand, there is evidence from studies exploiting perceptual mechanisms like interference (Schweinberger and Soukup, 1998; Schweinberger et al., 1999) and adaptation (Leopold et al., 2001; Webster et al., 2004), indicating that recognition of facial expressions interacts with task-irrelevant processing of facial identity, while recognition of identity is relatively independent of facial expression (Fox and Barton, 2007; Fox et al., 2008). On the other hand, using a sequential match-to-sample paradigm, Chen et al. (2011) reported lower accuracies for matching facial identities with emotional expressions, compared to neutral faces, consistent with other studies using different paradigms (D'argembeau et al., 2003; Kaufmann and Schweinberger, 2004; Gallegos and Tranel, 2005; D'argembeau and Van Der Linden, 2007; Savaskan et al., 2007; Levy and Bentin, 2008). In addition, there is clinical evidence from subjects with prosopagnosia that identity perception is influenced by the emotion conveyed by the face (de Gelder et al., 2003; Van den Stock et al., 2008; Huis in 't Veld et al., 2012).

These studies investigated either contextual influences on face emotion perception or interactions between face identity and face emotion processing. However, little is known about whether contextual emotion cues, such as body postures, also influence perception of facial identity, which is presumably processed, at least partly, by mechanisms different from those serving the emotional components of the face perception network (Haxby and Gobbini, 2011). In this study, we combine findings of contextual modulation of facial expression perception on the one hand, and face identity and emotion interactions on the other hand. We investigated whether emotional information conveyed by both facial and bodily expressions influences perception of facial identity. For this purpose we created compound images of whole persons consisting of either neutral or emotional faces and bodies that had matched or mismatched expressions, while participants were always required to assess the face identity. This design allows contrasting predictions of different theories on facial identity recognition. On the one hand, theories assigning a cardinal role to processing of the shape of the face (e.g., Kanwisher et al., 1997) would predict minimal influences of both the facial and the bodily expression. On the other hand, a significant influence of the emotion of the facial and bodily expression on face identity recognition is more compatible with theories proposing distributed but parallel and interactive processing of multi-faceted faces (e.g., de Gelder et al., 2003; Campanella and Belin, 2007).

## **EXPERIMENT 1: SELF-PACED SIMULTANEOUS MATCHING OF FACE IDENTITY**

#### **METHOD**

#### **Participants**

Twenty participants volunteered for the experiment (10 male, mean (SD) age = 23.9 (7.7)) in exchange for course credits. None of the participants had a neurologic or psychiatric history and all had normal or corrected to normal vision. Informed consent was obtained according to the declaration of Helsinki.

#### **Stimulus materials**

Pictures of facial expressions were taken from the Karolinska Directed Emotional Faces (KDEF) (Lundqvist et al., 1998) and from our own database. In a pilot study, the faces were randomly presented one by one on a screen and participants (*N* = 20) were instructed to categorize the emotion expressed in the face in a seven alternative forced choice paradigm (anger, disgust, fear, happiness, neutral, surprise or sadness). None of these participants took part in any of the other experiments. On the basis of this pilot study, we selected 80 fearful (40 female) and 80 neutral (40 female) facial expressions, all recognized accurately by at least 75% of the participants.

Stimuli of whole body expressions were taken from our own validated database (de Gelder and Van den Stock, 2011a). The selected stimuli displayed fearful body postures and an instrumental action (pouring water in a glass). We used action images instead of neutral body postures, because like the fearful expressions, instrumental actions elicit movement and action representation and we wanted to control for these variables. Forty fearful (20 female) and 40 instrumental (20 female) body expressions were selected.

We created realistic face-body compounds by carefully resizing and combining facial and bodily expressions. A total of 80 compound stimuli were created following a 2 face (fearful and neutral) × 2 body (fearful and neutral) factorial procedure, resulting in 20 stimuli (10 male) per condition. Face and body were always of the same gender, but only half of face-body pairs expressed the same emotion, with the other half displaying an emotion mismatch (e.g., a fearful face with a neutral body).
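The factorial structure described above can be made concrete with a short enumeration. The following is only an illustrative sketch: the field names and the `build_design` helper are hypothetical and are not taken from the authors' materials. It assumes 10 female and 10 male compounds per cell, as implied by "20 stimuli (10 male) per condition".

```python
from itertools import product

# Factor levels as described in the text; "neutral" bodies are the
# instrumental-action postures (pouring water in a glass).
FACE_LEVELS = ("fearful", "neutral")
BODY_LEVELS = ("fearful", "neutral")
GENDERS = ("female", "male")
PER_CELL_PER_GENDER = 10  # assumption: 20 stimuli per cell, 10 of them male


def build_design():
    """Return one record per compound stimulus in the 2 x 2 design."""
    stimuli = []
    for face, body, gender in product(FACE_LEVELS, BODY_LEVELS, GENDERS):
        for index in range(PER_CELL_PER_GENDER):
            stimuli.append({
                "face": face,
                "body": body,
                "gender": gender,           # face and body share one gender
                "congruent": face == body,  # matched vs. mismatched emotion
                "index": index,
            })
    return stimuli


design = build_design()
print(len(design), "compound stimuli in total")
```

Enumerating the cells this way makes it easy to confirm that half of the compounds are emotionally congruent and half are mismatched.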

#### **Procedure**

A trial consisted of a compound face-body stimulus presented simultaneously with two face images left and right underneath the face-body compound image. One of the faces was the same as the face of the compound stimulus. The other face belonged to a different actor, but was matched regarding emotional expression as well as main visual features, such as hair color and gender (see **Figure 1** for stimulus examples). Participants were instructed to indicate which of the two bottom faces matched the one of the compound stimulus. We attempted to minimize the visibility of non-face identifying cues, such as hairstyle, in both face alternatives. Therefore, both face alternatives showed only the inner canvas of the head, in order to reduce simple image-matching processes. Instructions stressed answering as accurately and as quickly as possible. The stimuli were presented until the participant responded. Interstimulus interval was 2000 ms. The experiment started with two practice trials, during which the subject received feedback. The position of the target face was counterbalanced.

### **RESULTS**

Mean accuracies and median response times (RTs) were calculated for every condition. The results are shown in the left panel of **Figure 2**. A 2 facial expression (fearful and neutral) × 2 bodily expression (fearful and neutral) repeated measures ANOVA was carried out on the accuracy and RT data. This revealed for the accuracy data a main effect of facial expression (*F*(1,19) = 4.571; *p* = 0.046; $\eta_p^2$ = 0.194) and bodily expression (*F*(1,19) = 4.678; *p* = 0.043; $\eta_p^2$ = 0.198), but no significant interaction (*F*(1,19) = 0.812; *p* = 0.379; $\eta_p^2$ = 0.041). The main effect of facial expression reflects that neutral faces are matched more accurately than fearful faces, while the main effect of body expression indicates that faces with a neutral body are more accurately matched than faces with a fearful body. The reaction time data only showed a main effect of bodily expression (*F*(1,19) = 12.100; *p* = 0.003; $\eta_p^2$ = 0.389), indicating that matching faces with a neutral body was performed faster than matching faces with a fearful body.
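As a reading aid, partial eta squared can be recovered from an *F* value and its degrees of freedom through the standard identity (effect size = *F* × df<sub>effect</sub> / (*F* × df<sub>effect</sub> + df<sub>error</sub>)). The sketch below is not part of the original analysis; the function name is our own. Applied to the four *F* values reported above, it reproduces the reported effect sizes to three decimals.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for Experiment 1, each with (1, 19) degrees of freedom.
reported = {
    "accuracy: face expression": 4.571,
    "accuracy: body expression": 4.678,
    "accuracy: face x body": 0.812,
    "RT: body expression": 12.100,
}

for label, f_value in reported.items():
    print(label, round(partial_eta_squared(f_value, 1, 19), 3))
```

The same function can be applied to the statistics of the later experiments by changing the error degrees of freedom.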

We included equal numbers of male and female participants in the present experiment because there is evidence of gender differences in emotion perception (Donges et al., 2012; Kret and de Gelder, 2012). To investigate the influence of gender of the observer on the results, we performed the same repeated-measures ANOVAs with gender of the observer as an additional between-subjects variable. This revealed no significant main or interaction effects of gender of the observer (all *p'*s ≥ 0.239). Therefore, we did not consider gender of the observer as a variable of interest in the following experiments.

#### **DISCUSSION**

The results show that matching of facial identity is influenced by the emotion expressed in the face, but also by the task irrelevant body expression as seen in the accuracy and reaction time data.

**FIGURE 1 | Stimulus examples**. Examples of experimental stimuli showing on top a fearful face on a fearful body **(A)**; a neutral face on a fearful body **(B)**; a fearful face on a neutral body **(C)** and a neutral face on a neutral body **(D)**. On the bottom two face identities are presented. Both show the same expression as the one on top, but only one is of the same actor as the face on top (in the figure the bottom left alternative is always of the same identity as the one on top).

Accuracy and reaction time data show consistent patterns, indicating that the effects cannot be explained by a speed-accuracy trade-off. The lower accuracy for matching identity of fearful faces compared to neutral faces is in line with a recent study using a sequential match-to-sample paradigm (Chen et al., 2011). More interesting for the present purpose: the body expression effect shows that the previously reported influence of body emotion on recognition of facial emotion (de Gelder et al., 2006; de Gelder and Van den Stock, 2011b) extends to facial identity recognition.

Although the instruction stated to respond as accurately and as fast as possible, the viewing time was unlimited. A possible explanation for the body expression effect may be that subjects spent more time looking at the fearful body expressions, compared to the neutral ones. Therefore, a question is whether the body expression effect still obtains with limited viewing time when the duration of stimulus presentation is too short to allow exploration of task irrelevant stimulus attributes. We investigated this issue in Experiment 2.

## **EXPERIMENT 2: TIME-CONSTRAINED SIMULTANEOUS MATCHING OF FACE IDENTITY**

#### **METHOD**

#### **Participants**

Nineteen participants volunteered for the experiment (2 male, mean (SD) age = 19.2 (1.6)) in exchange for course credits. None of the participants had a neurologic or psychiatric history and all had normal or corrected to normal vision. Informed consent was obtained according to the declaration of Helsinki.

#### **Procedure**

The procedure was identical to the one in Experiment 1, except that stimulus presentation was limited to 750 ms. A pilot study with different durations indicated that 750 ms was the shortest duration that was still associated with an acceptable accuracy rate (>75%).

#### **RESULTS**

We conducted the same analysis as described in Experiment 1. The results are shown in the middle panel of **Figure 2**. RT data refer to RTs post-stimulus offset. The ANOVA on the accuracy data revealed a main effect of bodily emotion (*F*(1,18) = 10.174; *p* = 0.005; $\eta_p^2$ = 0.361) and body × face emotion interaction (*F*(1,18) = 12.455; *p* = 0.002; $\eta_p^2$ = 0.409). The main effect of body expression indicates that faces with a neutral body are more accurately matched than faces with a fearful body. To follow up on the interaction, we quantified the effect of body emotion (neutral body minus fearful body) as a function of face emotion. A paired sample *t*-test showed that the body emotion effect was significantly larger for neutral faces (*t*(18) = 3.529, *p* = 0.002). More specifically, fearful bodies result in lower accuracies, but only when they are presented with a neutral face (*t*(18) = 4.328; *p* < 0.001) and not with a fearful face (*t*(18) = 0.475; *p* = 0.640). The analysis of the reaction times revealed a main effect of facial emotion (*F*(1,18) = 13.552, *p* = 0.002; $\eta_p^2$ = 0.430) as the only significant result, with fearful faces resulting in longer RTs than neutral faces.
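The follow-up on the interaction amounts to a difference-of-differences contrast: for each participant, the body effect (neutral body minus fearful body) for fearful faces is subtracted from the body effect for neutral faces, and the resulting scores are tested against zero. In a 2 × 2 within-subject design this test is equivalent to the interaction *F* test (*F* = *t*²); indeed 3.529² ≈ 12.45, in line with the reported *F*(1,18) = 12.455. The sketch below illustrates the computation only. The function name is ours and the accuracy values are made up for illustration; they are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def body_effect_contrast(nf_nb, nf_fb, ff_nb, ff_fb):
    """Paired t statistic comparing the body effect between face emotions.

    Each argument holds one accuracy score per participant:
    nf_nb = neutral face / neutral body, nf_fb = neutral face / fearful body,
    ff_nb = fearful face / neutral body, ff_fb = fearful face / fearful body.
    Returns (t, degrees of freedom).
    """
    diffs = [
        (a - b) - (c - d)  # body effect for neutral faces minus fearful faces
        for a, b, c, d in zip(nf_nb, nf_fb, ff_nb, ff_fb)
    ]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


# Hypothetical percent-correct scores for four participants (illustration only).
t_value, df = body_effect_contrast(
    nf_nb=[90, 85, 95, 80],
    nf_fb=[80, 80, 85, 75],
    ff_nb=[85, 80, 90, 80],
    ff_fb=[85, 75, 90, 75],
)
print(f"t({df}) = {t_value:.3f}")
```

A positive *t* value here means that the body effect is larger for neutral faces than for fearful faces, which is the direction reported in the text.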

### **DISCUSSION**

The results of Experiment 2 show that the body expression effect also holds when the viewing time is shortened to 750 ms in order to minimize visual exploration of the task irrelevant body expression. Moreover, a pilot study showed that 750 ms is the minimal duration to obtain an overall accuracy of at least 75% (when chance level is 50%). This result indicates that the body expression effect cannot fully be explained by extensive visual exploration of the fearful body expressions, compared to the neutral body expressions. Although 750 ms was the shortest duration at which participants showed a satisfactory performance according to the results of the pilot study, this duration does not exclude a differential looking time at fearful vs. neutral bodies.

In addition, the results indicate that the body expression effect primarily occurs when the facial expression is neutral, consistent with our previous study on the influence of body expressions on categorization of facial expressions (Van den Stock et al., 2007).

In both Experiments 1 and 2, participants had to make a saccade from the face on top to the two faces at the bottom of the stimulus. The area spanning the distance between the two fixation regions contains the bodily expression, which raises the question whether the effects can be explained by the fact that a saccade always covers the region of the body expression. To investigate this issue, we modified the design in order to exclude saccades across the body expression in Experiment 3.

## **EXPERIMENT 3: TIME-CONSTRAINED DELAYED MATCHING OF FACE IDENTITY**

#### **METHOD**

#### **Participants**

Nineteen participants volunteered for the experiment (14 male, mean (SD) age = 19.8 (1.9)) in exchange for course credits. None of the participants had a neurologic or psychiatric history and all had normal or corrected to normal vision. Informed consent was obtained according to the declaration of Helsinki.

#### **Procedure**

The procedure was identical to the one in Experiment 1, except that the task was modified to a delayed match-to-sample task. The face-body compound was presented for 150 ms, which is insufficient to encode the face and make a saccade. A 5000 ms delay, during which a blank screen was presented, followed the stimulus. We included this delay to avoid responses based on after-images. Subsequently, the two isolated faces were presented until the participant responded. This design does not require the subject to make any saccades during presentation of the face-body compound stimulus and minimizes the occurrence of after-image effects.

While we could also have moved the answer stimuli above the central display to avoid saccades, we preferred to make a more substantial change to the design, while maintaining the central research question (does body emotion influence processing of face identity?). Furthermore, the 150 ms presentation of the composite stimulus does not provide enough time both to look at the task irrelevant body and to sufficiently encode the identity of the face stimulus. It should be noted that the task required the identity to be sufficiently encoded and stored in working memory, as the response screen did not appear until 5000 ms after the offset of the composite stimulus.

#### **RESULTS**

The results are shown in the right panel of **Figure 2**. RT data refer to RTs measured from the onset of the screen showing the two face images. The analysis of the accuracy data revealed a main effect of body expression (*F*(1,18) = 8.824, *p* = 0.008; $\eta_p^2$ = 0.329), while there was a main effect of body (*F*(1,18) = 6.958, *p* = 0.017; $\eta_p^2$ = 0.279) and face expression (*F*(1,18) = 5.449, *p* = 0.031; $\eta_p^2$ = 0.232) in the RT data. The main effects of body expression reflect the fact that faces combined with a neutral body are matched faster and more accurately than faces with a fearful body, while the main effect of facial expression indicates that neutral faces are matched faster than fearful faces.

#### **DISCUSSION**

The results show that sequential matching of face identity is influenced by the task irrelevant body expression, even when presentation time is reduced to 150 ms, no saccades are required and the influence of after-image effects is minimized.

## **BETWEEN-EXPERIMENTS ANALYSIS**

To investigate the effect of the three experimental designs, we performed a repeated-measures ANOVA with version as between-subjects variable (self-paced direct matching; time constrained direct matching; delayed matching) and facial expression and body expression as within-subject variables on the accuracy and the reaction time data. For the accuracy data, the results revealed a significant main effect of body expression (*F*(1,55) = 23.878, *p* < 0.001; $\eta_p^2$ = 0.303), reflecting lower performance for fearful body expressions; a significant main effect of version (*F*(2,55) = 8.686, *p* < 0.001; $\eta_p^2$ = 0.240), a significant body expression × face expression interaction (*F*(1,55) = 4.186, *p* = 0.046; $\eta_p^2$ = 0.071) and finally a significant body expression × face expression × version interaction (*F*(2,55) = 4.560, *p* = 0.015; $\eta_p^2$ = 0.142). Bonferroni corrected post-hoc tests on the main and interaction effects revealed that accuracies were higher in Experiment 1 (self-paced) than in Experiment 2 (*p* = 0.004) and Experiment 3 (*p* = 0.001), while there was no difference between Experiments 2 and 3 (*p* = 0.999). Follow-up of the body expression × face expression interaction by means of paired *t*-tests showed that the effect of the body emotion (neutral body minus fearful body) was larger for neutral faces than for fearful faces, although this was only marginally significant (*t*(57) = 1.877, *p* = 0.066). More specifically, a fearful body expression only significantly reduced performance when the face was neutral (*t*(57) = 4.096, *p* < 0.001) but not when the face was fearful (*t*(57) = 1.327, *p* = 0.379). Similarly, a fearful face expression only reduced performance when the body was neutral (*t*(57) = 2.152, *p* = 0.036) and not when the body was fearful (*t*(57) = 0.596, *p* = 0.553).
We performed a one-way ANOVA with Experiment (3 levels) as factor on the differential effect of body emotion on face emotion ((neutral face/neutral body minus neutral face/fearful body) minus (fearful face/neutral body minus fearful face/fearful body)). This revealed a main effect (*F*(2,57) = 4.560, *p* = 0.015) and Tukey Honestly Significant Difference (HSD) corrected *post-hoc* tests showed that there was only a significant difference between Experiments 1 and 2, indicating that the body emotion × face emotion interaction effect was larger in Experiment 2 than in Experiment 1. For the reaction time data, there was a main effect of body expression (*F*(1,55) = 21.455, *p* < 0.001; $\eta_p^2$ = 0.281) reflecting slower performance for fearful body expressions; a main effect of face expression (*F*(1,55) = 10.500, *p* = 0.002; $\eta_p^2$ = 0.160), reflecting slower performance for fearful face expressions; and a main effect of version (*F*(2,55) = 41.670, *p* < 0.001; $\eta_p^2$ = 0.602).

## **GENERAL DISCUSSION**

Recently we have documented that recognition memory for face identity is influenced by the affective valence of the visual context, as conveyed by body expressions (Van den Stock and de Gelder, 2012). We hypothesized that these differences originate at the perception stage and therefore predicted for the current study that matching of facial identity is influenced by the emotional context, i.e., body expressions (de Gelder and Bertelson, 2003).

We performed three experiments investigating the influence of task irrelevant body and face expressions on processing of facial identity. Participants were presented with realistic face-body compounds in a 2 category (face and body) × 2 emotion (neutral and fearful) factorial design. The task always consisted of two-alternative forced choice facial identity matching. Although the task variables were increasingly manipulated to tap into facial identity processing and aimed to minimize effects of non-interest, such as simple image matching, viewing time and attention, there was always an influence of the task irrelevant body expression. Moreover, the analysis of the pooled data of the three experiments revealed that the most significant and largest effect was the effect of body emotion.

There is evidence showing that both faces and bodies share similar perceptual (Robbins and Coltheart, 2012) and neural (Reed et al., 2003; Stekelenburg and de Gelder, 2004; Van De Riet et al., 2009; Schmalzl et al., 2012) processing routines and this may be the underlying mechanism through which face-body interactions occur. In fact, a similar mechanism has been proposed for facial expression recognition (Van den Stock et al., 2007) and recent data indicate that disrupting the canonical face-body configuration reduces the influence of the body expression on the recognition of the facial expression (Aviezer et al., 2012). Although accumulating evidence shows that both faces and bodies are processed configurally, this does not exclude that a face-body compound stimulus is processed as one configuration. In fact, an event-related potential (ERP) study showed that the emotional expression of a body influences the early electrophysiological markers (P1, occurring around 115 ms) during facial expression categorization (Meeren et al., 2005). Perhaps the strongest behavioral support for the hypothesis that processing of the identity of a face has a strong intrinsic coupling with the body is provided in a recent study revealing that adaptation to body identity results in perceptual after-effects on facial identity perception (Ghuman et al., 2010).

Alternatively, it cannot be ruled out that a fearful body expression attracted more (covert) attention (Posner and Petersen, 1990) than the neutral body posture (Bannerman et al., 2009). In line with this, there is evidence from cortically blind patients indicating that body shape and body emotion are processed even without awareness (Tamietto et al., 2009; Van den Stock et al., 2011, 2013a). Orienting responses may be triggered by the emotional body expression in order to detect the source of potential danger, leading to a reduced encoding of facial details (Kensinger et al., 2007). This could lead to a reduction in time used to process facial identity when combined with a fearful body expression, which could account for the results we report here.

These hypotheses both have adaptive benefits at face value. In the face of danger (as communicated by fearful conspecifics), the primary focus would be to detect and adequately react to the source of danger, rather than devoting resources to the processing of the identity of the bystanders. In fact, we have previously provided evidence for a neural mechanism supporting motor preparation when viewing fearful body expressions (de Gelder et al., 2004). The finding that the body expression effect is primarily observed with neutral faces is compatible with this line of reasoning. When the stimulus at the focus of attention, i.e., the face, is signaling threat, the body expression is of less importance and has less influence. By extension, the present results provide evidence that the interactions between face identity and face emotion processing that have been previously reported (D'argembeau et al., 2003; Kaufmann and Schweinberger, 2004; Gallegos and Tranel, 2005; D'argembeau and Van Der Linden, 2007; Savaskan et al., 2007; Levy and Bentin, 2008; Chen et al., 2011) also apply for face identity and body expression.

However, in the analysis of the accuracy data of the combined Experiments, there was a main effect of body expression, while the effect of face expression only occurred in interaction with the body expression. The interaction effect more particularly revealed that the effect of body expression was significantly larger when the face expression was neutral and similarly, the effect of face expression only occurred when the body expression was neutral. The absence of a main effect of face expression, in combination with the occurrence of the main effect of body expression and face × body expression interaction may reflect that the body expression influence outweighs the influence of facial emotion on face identity matching. This conjecture would be in line with fMRI-studies, directly comparing emotional face and body stimuli. While faces typically trigger more amygdala and striate cortex activity compared to bodies, the inverse contrast appears to activate a more widespread and extensive set of regions, including frontal, parietal, temporal, occipital and subcortical structures (Van De Riet et al., 2009; Kret et al., 2011).

Although cross-categorical influences on emotion recognition have been mainly examined at the perception stage, the neural correlates of emotional influence on identity recognition have been primarily investigated in the memory stage and the findings point to an important role of the amygdala (for reviews, see Hamann, 2001; Kensinger, 2004; Phelps, 2004; Labar and Cabeza, 2006). The amygdala may also play a role in the effects we observe in the present study. It has been documented that both neutral and fearful faces activate the amygdala (Zald, 2003; Fusar-Poli et al., 2009), as do fearful and neutral body expressions (Hadjikhani and de Gelder, 2003; de Gelder et al., 2004, 2010; Van den Stock et al., 2014). In addition, we have shown that emotional body expressions presented in the blind hemifield of a cortically blind patient activate the amygdala as well as other subcortical structures like the superior colliculus and the thalamic pulvinar (de Gelder and Hadjikhani, 2006; Van den Stock et al., 2011). These findings support the notion that emotional body expressions are processed automatically and thereby have an influence on face identity perception.

The current study supports the notion that the effects of body expression on recognition memory for face identity (Van den Stock and de Gelder, 2012) originate at least in part during the perception stage. In Experiment 1 we used a rather "liberal" setup with unlimited viewing time and participants were instructed to respond as accurately and quickly as possible. Although the average reaction time was around 1500 ms, the accuracy data showed no ceiling effect. This finding may be explained by the fact that participants engaged in visual exploration of the task irrelevant body expression.

In Experiment 2, stimulus presentation of the face-body compound was limited to 750 ms, which was the minimal duration to allow sufficient accuracy (>75%) on the basis of a pilot study. Although we have no objective measure that participants refrained from looking at the body expression, the short presentation of the compound stimulus does not allow elaborate exploration of the body expression. The average reaction time of about 1200 ms (750 ms stimulus presentation + around 450 ms response latency) is about 300 ms shorter than in Experiment 1 and compatible with the notion that participants spent more time looking at the body expression in Experiment 1.

However, in both Experiments 1 and 2, the task required making a saccade across the body expression. This was no longer the case in Experiment 3, which also reduced presentation of the compound stimulus to 150 ms, which is insufficient to visually explore the task irrelevant body expression. Interestingly, the results still showed an influence of the body expression on face identity processing.

In conclusion, the results of the present study indicate that task irrelevant bodily expressions influence facial identity matching under different task conditions and hence the findings are compatible with an automatic interaction of emotional expression information and face identity processing.

## **ACKNOWLEDGMENTS**

Jan Van den Stock is a post-doctoral researcher supported by FWO-Vlaanderen (1.5.072.13N). Beatrice de Gelder is partly supported by FES and FP7-FET-Open grant and by an Adv ERC grant.

## **REFERENCES**


Bruce, V., and Young, A. W. (1986). Understanding face recognition. *Br. J. Psychol.* 77, 305–327.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 November 2013; accepted: 23 January 2014; published online: 12 February 2014.*

*Citation: Van den Stock J and de Gelder B (2014) Face identity matching is influenced by emotions conveyed by face and body. Front. Hum. Neurosci. 8:53. doi: 10.3389/fnhum.2014.00053*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2014 Van den Stock and de Gelder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Emotions affect the recognition of hand gestures

## *Carmelo M. Vicario\* and Anica Newman*

*School of Psychology, University of Queensland, Brisbane, Australia*

#### *Edited by:*

*Davide Rivolta, Max Planck Society, Germany*

#### *Reviewed by:*

*Anthony Singhal, University of Alberta, Canada John Symons, University of Kansas, USA*

#### *\*Correspondence:*

*Carmelo M. Vicario, School of Psychology, The University of Queensland, McElwain Building, St. Lucia, QLD 4072, Australia e-mail: carmelo.vicario@uniroma1.it*

The body is closely tied to the processing of social and emotional information. The purpose of this study was to determine whether a relationship between emotions and social attitudes conveyed through gestures exists. Thus, we tested the effect of pro-social (i.e., happy face) and anti-social (i.e., angry face) emotional primes on the ability to detect socially relevant hand postures (i.e., pictures depicting an open/closed hand). In particular, participants were required to establish, as quickly as possible, if the test stimulus (i.e., a hand posture) was the same or different, compared to the reference stimulus (i.e., a hand posture) previously displayed on the computer screen. Results show that facial primes, displayed between the reference and the test stimuli, influence the recognition of hand postures, according to the social attitude implicitly related to the stimulus. We found that perception of pro-social (i.e., happy face) primes resulted in slower RTs in detecting the open hand posture as compared to the closed hand posture. Vice versa, perception of the anti-social (i.e., angry face) prime resulted in slower RTs in detecting the closed hand posture compared to the open hand posture. These results suggest that the social attitude implicitly conveyed by the displayed stimuli might represent the conceptual link between emotions and gestures.

#### **Keywords: facial expressions, anger, happiness, pro-sociality, antisociality, hand posture**

## **INTRODUCTION**

The body is closely tied to the processing of social and emotional information. Embodied cognition theories posit that knowledge is grounded in the brain's modal systems for perception, action, and affect (Candidi et al., 2010; Vicario et al., 2013). These systems are automatically engaged during online conceptual processing, thus allowing the re-enactment of modality-specific patterns of activity similar to those called into play during the actual experience of perception, action, and emotion (Barsalou et al., 2003; Barsalou, 2008).

Several versions of embodied cognition have been proposed in the last 20 years (for a discussion, see Wilson, 2002). A common aspect emphasized by embodied cognition theories is the simulation of experience in modality-specific systems. Examples include Glenberg's (1997) theory of memory, Barsalou's (1999) theory of perceptual symbol systems, and Damasio's (1994) theory of emotion. The main idea underlying all theories is that cognitive representations and operations are fundamentally grounded in their physical context (Niedenthal et al., 2005). For example, Reed and Farah (1995) asked a group of participants to judge whether two human figures depicted the same posture. The results showed that participants were better at detecting changes in the arm position of a visually presented figure while using their upper limb to generate a response and better at detecting changes in the figure's legs while moving their own legs to generate a response.

Interesting insights in support of the embodied cognition theories have also been provided in a study on attitudes, conceived by Darwin (1872/1904) as a collection of motor behaviors, especially postures, that convey an organism's affective response toward an object. For example, Wells and Petty (1980) instructed a group of participants to nod their heads vertically, or to shake their heads horizontally, while wearing headphones. While performing these movements, participants heard either a disagreeable or an agreeable message about a university-related topic. In a subsequent phase, they rated the degree to which they agreed with the message. The results showed that the participants' head movements modulated their judgments. Specifically, participants who nodded their heads while hearing the message judged it to be more favorable than participants who shook their heads.

Furthermore, embodiment is also critically involved in the representation of emotions. Niedenthal et al. (2001) demonstrated that facial mimicry plays a causal role in the processing of emotional expressions. Participants watched one facial expression morph into another and had to detect when the expression changed. Some participants were free to mimic the expressions, while others were prevented from mimicking by holding a pencil laterally between their lips and teeth. Consistent with the embodiment hypothesis, participants free to mimic the expressions detected the change in emotional expression more efficiently than did participants who were prevented from mimicking the expressions. This evidence (see Niedenthal et al., 2005 for other works on this argument) suggests that feedback from facial mimicry is important in a perceiver's ability to process emotional expressions.

Mimicry has recently been shown to relate to social attitudes. For example, Leighton et al. (2010) investigated whether social attitudes have a direct and specific effect on mimicry. To address this, a group of participants was primed with pro-social, neutral or anti-social words in a scrambled sentence task. They were then tested for mimicry using a stimulus-response compatibility task which required the execution of a pre-specified movement (e.g., opening their hand) on presentation of a compatible (open) or incompatible (closed) hand movement. Results showed that pro-social priming produced a larger automatic imitation effect than anti-social priming, indicating that the relationship between mimicry and social attitudes is bidirectional, and that social attitudes have a direct and specific effect on the tendency to imitate behavior without intention or conscious awareness.

All the works discussed above suggest a relationship between social attitudes and the processing of emotions. In fact, an emotional expression is not only informative about the emotional state of a person, but is also a signal of that person's affiliative intention (Hess et al., 2000). Accordingly, it was suggested that individuals who show happy expressions are perceived as highly affiliative, whereas individuals who show anger are perceived as highly nonaffiliative, especially when the expresser is male (Knutson, 1996; Hess et al., 2000).

In consideration of this suggestion, in the current research we addressed the hypothesis that pro-social (i.e., happy) vs. antisocial (i.e., angry) facial expressions influence the recognition of the social attitudes implicitly conveyed by a particular hand posture (i.e., closed hand posture: anti-social attitude; open hand posture: pro-social attitude). In fact, as described by Givens (2008), uplifted palms (i.e., open hand) suggest a vulnerable or non-aggressive pose that appeals to listeners as allies rather than as rivals or foes. Moreover, Shaver et al. (1987) found that fist clenching (i.e., closed hand) is involved in the anger prototype.

Therefore, we used a same/different task (Farell, 1985), which is a behavioral paradigm classically used for testing the effect of task irrelevant stimuli (e.g., visual primes) on participants' performance. In fact, the time taken to make a same/different judgment can be a particularly useful measurement as it can be used to isolate the mental processes underlying a phenomenon of interest (Sternberg, 1969).

We predict that the "angry" prime (which reflects an antisocial attitude) selectively interferes with the reaction times (RTs) in detecting pictures depicting a closed hand posture (i.e., antisocial attitude) compared to open hand postures; vice-versa, we expect that the "happy" prime (which reflects a pro-social attitude) selectively interferes with RTs in detecting pictures depicting an open hand posture (i.e., pro-social attitude) compared to a closed hand posture. This prediction was made according to studies suggesting some relationship between social attitudes, emotion and embodiment (see Niedenthal et al., 2005 for a review).

In consideration of the paradigm used to explore our research goal (i.e., same/different task, Farell, 1985), we expect to detect performance interference (i.e., slower RTs) rather than performance facilitation (i.e., faster RTs), as predicted according to the results by Leighton et al. (2010). In fact, several previous works (Pratto and John, 1991; MacKay et al., 2004; Most et al., 2005; Ihssen et al., 2007) have shown that rapidly presented emotional pictures (as in the current study) interfere with performance in detecting other stimuli, probably because this type of presentation captures attentional resources. A recent study clarifies the mechanism behind the attentional interference reported in association with the involvement of emotional stimuli (Hodsoll et al., 2011). These authors showed, in five separate experiments, that both positive (i.e., happy) and negative (i.e., fearful and angry) facial expressions interfere with RTs. Importantly, the RT interference was reported only when the emotional stimulus was irrelevant to the execution of the task, as in our study. Moreover, the same/different task adopted in our study implies the involvement of working memory processes, since participants were required to retrieve the reference stimulus to make the comparison. Accordingly, there is evidence of emotions having an "interference" effect on working memory (WM) processes. For example, the study by Kensinger and Corkin (2003) indicated that negative faces slow down responses in a non-verbal WM task. Unfortunately, they did not use happy faces. Interestingly, this effect was only found with facial stimuli (not with emotional words), suggesting that the performance interference is specific for facial stimuli.

## **METHODS**

## **PARTICIPANTS**

Thirteen consenting healthy participants (4 male; mean age 22.71 ± 3.17 years) were recruited from the University of Queensland (Australia). All were right-handed, had normal or corrected-to-normal vision, and were proficient in the English language. Participants were all naïve to the purpose of the experiment. The experiment was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki.

### **STIMULI AND PROCEDURE**

Participants were positioned 50 cm from a 21-inch Dell computer monitor configured at a refresh rate of 60 Hz. All the visual stimuli were presented in a single session which included the three hand postures (i.e., closed hand, open hand, and a neutral control posture) used in the study by Leighton et al. (2010) (see **Figure 1** for details) and three facial expressions (i.e., happy, angry, and a neutral expression as control prime).

**FIGURE 1 | Stimuli used for the experimental paradigm.** *Hand postures:* **(A)** Neutral hand posture; **(B)** Closed hand posture; **(C)** Open hand posture. Reprinted from Leighton et al. (2010).

The stimuli were presented in the center of the screen for a total of 270 trials (i.e., 90 trials for the "*same*" condition: 10 trials × 3 facial expressions × 3 hand postures; 180 trials for the "*different*" condition: 10 trials × 3 facial expressions × 6 combinations of the 3 hand postures). A typical trial sequence was presented as follows: First, the computer program displayed a ready signal (fixation cross) lasting 1000 ms. Next, a reference stimulus (i.e., one of the three hand postures) was presented for 1000 ms. Immediately after the reference stimulus disappeared, the computer program displayed a visual prime (i.e., one of the three face stimuli) lasting 500 ms. Finally, once the visual prime disappeared, the computer program displayed a test stimulus (i.e., one of the three hand postures; see **Figure 2** for a typical trial sequence). Using one of two keyboard buttons (V and B, with the response mapping counterbalanced across subjects), participants were required to establish, as quickly as possible, whether the test stimulus was the same as or different from the reference stimulus. Before testing, participants were required to complete 27 practice trials.
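The trial counts described above can be checked by enumerating the design. The following is a minimal sketch, not the authors' presentation software: the names `POSTURES`, `PRIMES`, `REPETITIONS` and `build_trials` are hypothetical, and the sketch assumes that every combination of reference posture, prime and test posture was repeated 10 times, which is what the stated totals imply.

```python
from itertools import product

# Hypothetical labels for the stimuli described in the text.
POSTURES = ("neutral", "closed", "open")   # hand postures (reference and test)
PRIMES = ("neutral", "angry", "happy")     # facial expressions (task-irrelevant prime)
REPETITIONS = 10                           # trials per cell, as stated in the text


def build_trials():
    """Enumerate every reference x prime x test cell, repeated REPETITIONS times."""
    trials = []
    for prime, reference, test in product(PRIMES, POSTURES, POSTURES):
        condition = "same" if reference == test else "different"
        for _ in range(REPETITIONS):
            trials.append(
                {
                    "reference": reference,
                    "prime": prime,
                    "test": test,
                    "condition": condition,
                }
            )
    return trials


trials = build_trials()
n_same = sum(t["condition"] == "same" for t in trials)
n_different = len(trials) - n_same
print(len(trials), n_same, n_different)
```

In an actual session the list would presumably be shuffled before presentation; the order of trials is not specified in the text.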

#### **DATA ANALYSIS**

Given the difference in the number of conditions when the reference and test were the "*same*" compared to when the reference and test were "*different*," we decided to collapse the data of the "*different*" conditions in order to obtain 3 test stimulus categories from the initial six combinations: (1) reference "*open*" test "*neutral*"; (2) reference "*open*" test "*closed*"; (3) reference "*closed*" test "*neutral*"; (4) reference "*closed*" test "*open*"; (5) reference "*neutral*" test "*open*"; (6) reference "*neutral*" test "*closed*."

Before collapsing the data to obtain 3 categories, we performed 3 separate ANOVAs (one for each emotional prime) in order to verify whether the reference stimulus (*per se*) influenced the participants' performance. The results showed no significant difference among the 6 stimulus conditions for the neutral [*F*(5, 60) = 0.73, *p* = 0.602, η<sub>p</sub><sup>2</sup> = 0.057, power = 0.245], as well as for the happy [*F*(5, 60) = 2.03, *p* = 0.086, η<sub>p</sub><sup>2</sup> = 0.145, power = 0.639] and angry [*F*(5, 60) = 1.78, *p* = 0.129, η<sub>p</sub><sup>2</sup> = 0.129, power = 0.574] facial expressions. Thus, we collapsed the data for all three emotional primes in order to obtain three "different" test stimulus categories: (1) *Neutral test stimulus category*: reference "*open*" test "*neutral*" with reference "*closed*" test "*neutral*"; (2) *Open test stimulus category*: reference "*neutral*" test "*open*" with reference "*closed*" test "*open*"; (3) *Closed test stimulus category*: reference "*neutral*" test "*closed*" with reference "*open*" test "*closed*." A further analysis was performed on "same" condition trials in which a neutral expression prime was presented, to control for an effect of the reference/test stimulus compatibility on the participants' performance. No congruency effect was found for the three considered postures when a neutral expression prime was presented [*F*(2, 24) = 1.33, *p* = 0.28, η<sub>p</sub><sup>2</sup> = 0.100, power = 0.260].
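The collapsing step described above groups the six "different" combinations by their test posture. A minimal sketch of that grouping follows. The function name `collapse_different` and the RT values are hypothetical, and the sketch assumes that collapsing means averaging the two condition means that share a test posture; the text does not state whether means or raw trials were pooled.

```python
def collapse_different(mean_rt_by_pair):
    """Average the two (reference, test) pairs that share the same test posture.

    mean_rt_by_pair maps (reference, test) -> mean RT in ms for "different" trials.
    """
    collapsed = {}
    for test in ("neutral", "open", "closed"):
        values = [
            rt
            for (reference, tested), rt in mean_rt_by_pair.items()
            if tested == test and reference != tested
        ]
        collapsed[test] = sum(values) / len(values)
    return collapsed


# Illustrative values only, not data from the study.
example = {
    ("open", "neutral"): 500.0,
    ("closed", "neutral"): 520.0,
    ("neutral", "open"): 480.0,
    ("closed", "open"): 500.0,
    ("neutral", "closed"): 510.0,
    ("open", "closed"): 530.0,
}
collapsed = collapse_different(example)
print(collapsed)
```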

After collapsing the data of the different conditions, we normalized the RTs obtained for both angry and happy prime conditions by dividing them by those obtained for the neutral expression prime condition (i.e., baseline). Thus, normalized RTs were entered in a 2 × 2 × 3 factorial design with Stimulus (*same* and *different*), Facial expression (*anger* and *happiness*), and Hand posture (*neutral, open, closed*) as main factors. *Post hoc* comparisons were performed with Fisher *post hoc* tests. For all tests, statistical significance was set at *p* < 0.05. Errors and false alarms were removed before performing the analysis. They were distributed as follows: *Errors*: No expression "same" (1.02%), no expression "different" (0.85%); Anger "same" (1.2%), anger "different" (1.02%); Happiness "same" (1.20%), happiness "different" (0.65%). *False alarms*: No expression "same" (1.05%), no expression "different" (0.54%); Anger "same" (1.4%), anger "different" (0.54%); Happiness "same" (1.08%), happiness "different" (0.25%).
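The baseline normalization can be illustrated with a short sketch. The function name `normalize_rts`, the dictionary layout and the RT values are assumptions made for illustration; the sketch divides each emotional-prime mean RT by the neutral-prime mean RT of the matching Stimulus × Hand posture cell, so that values above 1 indicate slower responses than baseline.

```python
def normalize_rts(mean_rt, baseline="neutral"):
    """Divide each emotional-prime RT by the neutral-prime RT of the same cell.

    mean_rt maps (prime, condition, posture) -> mean RT in ms.
    Returns normalized RTs for the non-baseline primes only.
    """
    normalized = {}
    for (prime, condition, posture), rt in mean_rt.items():
        if prime == baseline:
            continue  # the baseline itself is not entered in the analysis
        normalized[(prime, condition, posture)] = rt / mean_rt[(baseline, condition, posture)]
    return normalized


# Illustrative values only, not data from the study.
example_rt = {
    ("neutral", "same", "closed"): 500.0,
    ("angry", "same", "closed"): 550.0,
    ("happy", "same", "closed"): 475.0,
}
normalized = normalize_rts(example_rt)
print(normalized)
```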

Data analysis was performed using Statistica software, version 8.0, StatSoft, Inc., Tulsa, OK, USA.

## **RESULTS**

The repeated measures ANOVA revealed that there was no significant main effect for the Stimulus [*F*(1, 12) = 0.001, *p* = 0.971, η<sub>p</sub><sup>2</sup> < 0.001, power = 0.050], Facial expression [*F*(1, 12) = 0.842, *p* = 0.337, η<sub>p</sub><sup>2</sup> = 0.06, power = 0.135] and Hand posture [*F*(2, 24) = 1.23, *p* = 0.309, η<sub>p</sub><sup>2</sup> = 0.093, power = 0.242] main factors. No significant results were found for the Stimulus × Facial expression [*F*(1, 12) = 0.006, *p* = 0.939, η<sub>p</sub><sup>2</sup> < 0.001, power = 0.050] and Stimulus × Hand posture [*F*(2, 24) = 0.105, *p* = 0.901, η<sub>p</sub><sup>2</sup> = 0.008, power = 0.064] interaction factors. However, a significant result for the Facial expression × Hand posture interaction factor was found [*F*(2, 24) = 3.42, *p* = 0.049, η<sub>p</sub><sup>2</sup> = 0.221, power = 0.586]. In particular, we found that the RTs in detecting the closed hand posture were significantly slower compared to the open (*p* = 0.01) and the neutral (*p* = 0.03) hand postures, when presented with the angry prime. Analyses also revealed a significant Stimulus × Facial expression × Hand posture interaction factor [*F*(2, 24) = 6.73, *p* = 0.004, η<sub>p</sub><sup>2</sup> = 0.359, power = 0.878]. *Post hoc* comparisons showed a significant interaction exclusively for the "*same*" condition. In particular, we found that RTs in detecting a closed hand posture were slower, compared to the RTs for both neutral (*p* = 0.04) and open (*p* = 0.01) hand postures, when the angry prime was presented. Vice versa, we found that the RTs in detecting both neutral (*p* = 0.02) and open (*p* = 0.03) hand postures were significantly slower, compared to the RTs for the closed hand posture, when the happy prime was presented.

Similar results were found for each single posture by comparing RTs when participants were presented with both emotional primes. In particular, we found that RTs for both neutral and open hand postures were slower in the happy prime condition (*p* = 0*.*045; *p* = 0*.*016, respectively) compared to the angry prime condition. Vice-versa, RTs for detecting the closed hand posture were slower in the angry prime condition (*p* = 0*.*003) compared to the happy prime condition. No significant differences were reported for the "*different*" conditions (see **Figure 3** for details).
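As a reader-side consistency check, and not part of the reported analysis, partial eta squared can be recovered from an *F* value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The function name `partial_eta_squared` is hypothetical. Because the inputs are rounded *F* values, small discrepancies in the third decimal place relative to the reported effect sizes are to be expected.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Three-way interaction reported above: F(2, 24) = 6.73
three_way = round(partial_eta_squared(6.73, 2, 24), 3)

# Neutral-prime control ANOVA reported in the Data Analysis section: F(5, 60) = 0.73
neutral_control = round(partial_eta_squared(0.73, 5, 60), 3)

print(three_way, neutral_control)
```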

A further analysis was conducted to examine the accuracy of responses (Error %). A significant main effect for the Stimulus factor was detected [*F*(1, 12) = 20.02, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.627, power = 0.984]. In particular, we found a lower accuracy in detecting same hand postures (*M* = 10.7% ± 1.53) with respect to different hand postures (*M* = 4.1% ± 0.65). We also detected a significant difference in accuracy of responses for Facial expressions [*F*(2, 24) = 4.18, *p* = 0.027, η<sub>p</sub><sup>2</sup> = 0.258, power = 0.680]. In particular, we found a higher error rate when the angry face was presented (*M* = 8.9% ± 1.19) as compared to both neutral (*M* = 6.7% ± 1.08, *p* = 0.031) and happy (*M* = 6.4% ± 0.95, *p* = 0.012) faces.

No other significant main effects were detected: Hand posture [*F*(2, 24) = 0.45, *p* = 0.642, η<sub>p</sub><sup>2</sup> = 0.036, power = 0.114]. Likewise, no significant interaction effects were found: Stimulus × Facial expression [*F*(2, 24) = 0.61, *p* = 0.547, η<sub>p</sub><sup>2</sup> = 0.049, power = 0.141], Stimulus × Hand posture [*F*(2, 24) = 0.48, *p* = 0.619, η<sub>p</sub><sup>2</sup> = 0.039, power = 0.120], Facial expression × Hand posture [*F*(4, 48) = 1.34, *p* = 0.268, η<sub>p</sub><sup>2</sup> = 0.100, power = 0.386] and Stimulus × Facial expression × Hand posture [*F*(4, 48) = 1.09, *p* = 0.372, η<sub>p</sub><sup>2</sup> = 0.083, power = 0.316].

**FIGURE 3 | Participants' performance in the same/different task.** The figure shows RTs (normalized by dividing them by the baseline condition, i.e., the neutral facial expression) for both angry and happy primes when presented with Neutral, Open or Closed hand postures. The results show a significant difference for the "same" condition and no significant difference for the "different" condition. <sup>∗</sup>Denotes *p*-values < 0.05. Vertical bars indicate standard errors of the mean.

## **DISCUSSION**

It was recently suggested that embodied simulation mediates our capacity to share the meaning of actions, intentions, feelings, and emotions with others, thus grounding our identification with and connectedness to others (Gallese, 2009).

In the current research we sought to investigate the existence of a relationship between emotion and embodiment. In particular, we were interested in testing the existence of a relationship between pro-social vs. anti-social facial expressions (i.e., happiness vs. anger) and pro-social vs. anti-social hand postures (i.e., open hand vs. closed hand postures). Thus, a same/different paradigm was used to investigate whether the RTs in recognizing a particular hand posture would be affected by a particular emotional prime, depending on its social meaning.

Several studies have shown that exposure to faces expressing emotions (i.e., happiness or anger) affects facial mimicry (see Hess and Fischer, 2013 for a recent review). On the other hand, it has been shown that facial mimicry might be influenced by social attitudes. For example, when people watch a funny movie with friends, they laugh more than if they see the same movie alone (Jakobs et al., 1999). Moreover, there is evidence that facial expressions can affect the recognition of social attitudes. For example, a person showing happiness is typically perceived as having affiliative intentions, whereas a person showing anger or disgust is not (Hess et al., 2000).

In line with this research, which demonstrates a link between emotion and social attitude, the results provided by the "same" condition of our study show that the presentation of happy facial primes resulted in slower RTs in detecting the open hand posture compared to the closed hand posture. Vice-versa, presentation of the angry facial prime slowed down RTs in detecting the closed hand posture compared to the open hand posture. For the neutral hand posture, we found a pattern of results similar to that documented for the open hand posture. A possible explanation for this result is that the social meaning associated with the neutral posture was similar (i.e., pro-social) to that associated with the open hand posture. This is a likely interpretation, considering that the stimulus used for the neutral posture was a partially open hand. However, this suggestion remains to be verified since our participants were not asked to rate the degree of social attitude (pro-social vs. anti-social) associated with the three hand pictures.

On the other hand, no difference was reported for the "different" condition. Possible explanations for this difference between the "same" and the "different" conditions include (1) the higher number of trials (*n* = 180) for the "different" condition compared to the "same" condition (*n* = 90), which could have caused some habituation effect in the "different" condition and thereby reduced the expected effect; and (2) the higher number of combinations (i.e., 6 different trial conditions), which increases the perceptual variability (compared to the 3 trial conditions of the "same" condition).

One hypothesis regarding the application of theories of embodied cognition to emotion is that the perception of emotional meaning involves the embodiment of the implied emotion (Adolphs, 2002). Thus, it has been suggested that decoding these signals is accompanied by unconscious imitation as the perception of an individual's facial expression induces a corresponding reaction in the observer's facial muscles (Dimberg et al., 2000).

On the other hand, our study provides evidence of a direct link between emotion and embodiment, which extends beyond the unconscious imitation of the displayed facial expression. In fact, we found that the exposure to facial expressions affects the recognition of hand pictures, according to the social attitude implicitly related to the stimulus posture. This suggests that perceiving facial expressions might automatically pre-activate the expectation for social (pro vs. anti-social) outcomes which in turn affects the recognition of a social attitude implicitly conveyed through the hand gesture.

Our result provides new insights into the emotion/embodiment issue as it shows the existence of a particular relationship between emotions and gestures. In particular, it suggests that the "social attitude" might represent the link between gestures and emotions.

Our study bears limitations such as the limited number of participants and the absence of data about the degree to which each of the three hand postures is perceived to communicate pro-social and anti-social attitudes. This notwithstanding, we believe it creates the rationale for more extensive investigation of the emotion-embodiment issue.

Future work exploring this issue might investigate whether: (1) visual primes associated with a social attitude (i.e., pictures depicting open vs. closed hands) affect the recognition of pro-social vs. anti-social facial expressions (i.e., happiness vs. anger); (2) facial expressions such as those used in the current study affect hand mimicry, depending on the social attitude associated with the displayed posture; (3) mood manipulation (e.g., happiness) influences the recognition of socially relevant gestures. Moreover, to explore the soundness and the generalizability of these effects, it would be interesting to replicate this experiment using different paradigms (e.g., a go/no-go task).

#### **ACKNOWLEDGMENTS**

We would like to thank Dr. Ada Kritikos for making her laboratory available for the implementation of this study.

#### **REFERENCES**


Vicario, C. M., Candidi, M., and Aglioti, S. M. (2013). Cortico-spinal embodiment of newly acquired, action-related semantic associations. *Brain Stimul.* 6, 952–958. doi: 10.1016/j.brs.2013.05.010

Wells, G. L., and Petty, R. E. (1980). The effects of overt head movements on persuasion: Compatibility and incompatibility of responses. *Basic Appl. Soc. Psychol.* 1, 219–230. doi: 10.1207/s15324834basp0103\_2

Wilson, M. (2002). Six views of embodied cognition. *Psychon. Bull. Rev.* 9, 625–636. doi: 10.3758/BF03196322

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 September 2013; accepted: 11 December 2013; published online: 26 December 2013.*

*Citation: Vicario CM and Newman A (2013) Emotions affect the recognition of hand gestures. Front. Hum. Neurosci. 7:906. doi: 10.3389/fnhum.2013.00906*

*This article was submitted to the journal Frontiers in Human Neuroscience.*

*Copyright © 2013 Vicario and Newman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The processing of facial identity and expression is interactive, but dependent on task and experience

## **Alla Yankouskaya<sup>1</sup>\*, Glyn W. Humphreys <sup>1</sup> and Pia Rotshtein<sup>2</sup>**

<sup>1</sup> Cognitive Neuropsychology Centre, Department of Experimental Psychology, University of Oxford, Oxford, UK <sup>2</sup> School of Psychology, University of Birmingham, Birmingham, UK

#### **Edited by:**

Davide Rivolta, University of East London, UK

#### **Reviewed by:**

Cheryl Grady, University of Toronto, Canada Jason Friedman, Macquarie University, Australia

#### **\*Correspondence:**

Alla Yankouskaya, Cognitive Neuropsychology Centre, Department of Experimental Psychology, University of Oxford, 9, South Parks Road, Oxford OX1 3UD, UK e-mail: alla.yankouskaya@psy.ox.ac.uk

Facial identity and emotional expression are two important sources of information for daily social interaction. However, the link between these two aspects of face processing has been the focus of an unresolved debate for the past three decades. Three views have been advocated: (1) separate and parallel processing of identity and emotional expression signals derived from faces; (2) asymmetric processing with the computation of emotion in faces depending on facial identity coding but not vice versa; and (3) integrated processing of facial identity and emotion. We present studies with healthy participants that primarily apply methods from mathematical psychology, formally testing the relations between the processing of facial identity and emotion. Specifically, we focus on the "Garner" paradigm, the composite face effect and the divided attention tasks. We further ask whether the architecture of face-related processes is fixed or flexible and whether (and how) it can be shaped by experience. We conclude that formal methods of testing the relations between processes show that the processing of facial identity and expressions interact, and hence are not fully independent. We further demonstrate that the architecture of the relations depends on experience, where experience leads to a higher degree of interdependence in the processing of identity and expressions. We propose that this change occurs because integrative processes are more efficient than parallel ones. Finally, we argue that the dynamic aspects of face processing need to be incorporated into theories in this field.

**Keywords: face processing, integration, identity, emotions, redundancy gains, capacity processing**

## **INTRODUCTION**

It is difficult to find a more complex source of information in social interaction than human faces. Gaze direction, emotional expression and identity are perceived very rapidly, allowing us to make a judgment of a face seen for less than a hundred milliseconds. How is this broad range of facial information processed by our perceptual system? To answer this question, scientists have used two general approaches. The first focuses on the independent manipulation of each type of facial information, e.g., emotional expressions (Bassili, 1979; Bartlett et al., 1999; Baudouin et al., 2000; Calder et al., 2000; Adolphs, 2002; Balconi and Lucchiari, 2005) or person identity (Bruce et al., 1991; Collishaw and Hole, 2000; Baudouin and Humphreys, 2006; Caharel et al., 2009). The second approach is to manipulate both types of information together, to determine whether different types of facial information are processed in an integrative or independent manner (Etcoff, 1984; Bruce and Young, 1986; Campbell et al., 1986; de Gelder et al., 2003; Wild-Wall, 2004; Calder and Young, 2005; Curby et al., 2012). The focus of this review is on studies adopting the latter approach to address the still outstanding question of whether identity and emotional expression information in faces are processed independently or interactively. We attempt to answer this question using a novel application of mathematical procedures to psychological problems. We further discuss the novel hypothesis that the architecture of face processing is dynamic and shaped by experience.

Three paradigms are commonly used with healthy participants to assess the relationship between factors in systematic ways: the "Garner paradigm", the facial composite paradigm and the divided attention paradigm. Methodological issues within each paradigm and the contrasting processes that they "weight" are described in detail. The review begins with a brief highlight of the three views on interactive vs. independent processing of identity and emotion in faces and the supporting evidence for each. The three following sections present the evidence on interactions between identity and emotional expression from studies employing each task. The last section summarizes our knowledge about the relations between identity and emotion processing in faces and proposes directions for further studies.

## **THREE VIEWS ON INTERACTIONS BETWEEN IDENTITY AND EMOTIONAL EXPRESSION PROCESSING IN FACES**

A critical question, fundamental for building models of face processing, is whether identity and emotional expressions in faces interact or whether they are processed by strictly separated routes. This section provides a brief summary of contemporary views on the relationship between the two types of facial information. To date, three accounts have been proposed.

*The first account, independent processing,* proposes that there is separate and parallel processing of identity and emotional expression signals from faces (Bruce and Young, 1986). The main support for the separate-parallel routes comes from neuropsychological studies showing double dissociations in emotion and identity processing. Patients have been reported to have impaired recognition of face identity but not emotion (Bruyer et al., 1983; Jones and Tranel, 2001; Nunn et al., 2001), while other patients have impaired discrimination of face expression but not identity (Humphreys et al., 1993) or impairments in recognizing specific emotions (e.g., Adolphs et al., 1994; Calder et al., 2000).

*The second account, asymmetric dependency,* argues for asymmetric processing of identity and emotional expression in faces; namely that emotion processing depends on facial identity coding but not vice versa (Schweinberger and Soukup, 1998; Schweinberger et al., 1999; Baudouin et al., 2000; Kaufmann and Schweinberger, 2004; Atkinson et al., 2005). A common finding in studies that support asymmetric dependency is that observers are able to attend and respond to the identity of faces while ignoring emotional and speech expressions, but they are unable to ignore identity when attending and responding to either emotional expression or speech (Schweinberger and Soukup, 1998; Schweinberger et al., 1999). Similar results have been reported in studies examining the relationship between gender and emotion in faces (Le Gal and Bruce, 2002; Atkinson et al., 2005). These findings are consistent with the idea that information about invariant aspects of faces influences how changeable aspects of faces are computed, while information about changeable aspects of faces does not influence the processing of invariant face properties (Haxby et al., 2000).

*The third account, interactive processing,* supports the idea of interactive processing between facial identity and emotion (Ganel and Goshen-Gottstein, 2002, 2004; Wild-Wall, 2004; Yankouskaya et al., 2012; Wang et al., 2013). Ganel and Goshen-Gottstein (2002, 2004) provided evidence for symmetric interference between facial identity and emotions in familiar faces and proposed that the mechanisms involved in processing familiar identity and expression are interconnected, with facial identity serving as a reference from which different expressions are more easily derived. The study by Yankouskaya et al. (2012) further supports the interactive view by demonstrating redundancy gains and super capacity in processing faces containing both a target identity and a target emotional expression, as compared with faces in which a single target (a target identity or emotion) is present. The interactive model is also supported by neuroimaging findings (for a review, see Calder and Young, 2005).

It is important to note that the asymmetric and symmetric interactive accounts do not necessarily imply that there is only one shared mechanism for processing identity and emotion information from faces (Calder and Young, 2005). These accounts suggest a high degree of interconnection between emotion and identity processing, whether they are incorporated in one representational space (Calder and Young, 2005), or in separate ones (Haxby et al., 2002).

In the following sections we discuss in detail evidence based on formal testing of the three models of identity and expression processing.

#### **THE GARNER TASK**

The Garner paradigm was originally designed to establish the nature of the relationship between the properties of two-dimensional stimuli (Garner, 1974). It is assumed that if two dimensions of a stimulus are processed interactively, variation in one dimension will interfere with processing of the second dimension. In contrast, if the two dimensions are processed independently, neither will interfere with the other. Typically, an observer is required to make speeded two-choice classifications of four types of stimuli as the two dimensions of the stimuli are varied orthogonally. The stimuli are presented in three experimental conditions: a control condition (the stimuli vary along the relevant dimension, while the irrelevant dimension is held constant); an orthogonal condition (both the relevant and irrelevant dimensions vary); and a correlated condition (the two dimensions co-vary). Garner interference (GI) is defined as an increase in reaction times (RTs) and/or error rates for the relevant target dimension in the orthogonal condition relative to the control (constant) and correlated conditions. The difference between the correlated and control blocks provides a measure of the potential benefit arising from integrating the two dimensions, though this aspect is rarely considered in studies using the Garner paradigm.
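
The two indices described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `garner_indices` and the RT values are hypothetical, interference is computed against the control block alone, and a real analysis would work from trial-level data with error rates and inferential statistics.

```python
from statistics import mean

def garner_indices(control_rts, orthogonal_rts, correlated_rts):
    """Return (interference, redundancy_gain) in the units of the RTs.

    interference    = mean RT(orthogonal) - mean RT(control)
                      (a positive value is the signature of GI)
    redundancy_gain = mean RT(control) - mean RT(correlated)
                      (a positive value suggests a benefit from
                      integrating the two dimensions)
    """
    control = mean(control_rts)
    interference = mean(orthogonal_rts) - control
    redundancy_gain = control - mean(correlated_rts)
    return interference, redundancy_gain

# Hypothetical block means (ms) for an expression-classification task.
gi, gain = garner_indices(
    control_rts=[520, 540, 530],
    orthogonal_rts=[560, 580, 570],
    correlated_rts=[500, 510, 520],
)
print(f"Garner interference: {gi} ms, redundancy gain: {gain} ms")
```

Computing both indices separately for the identity task and the expression task is what allows symmetric and asymmetric interference patterns to be distinguished.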

Studies using the Garner paradigm have yielded conflicting results. While some studies show no interference in responses to either expression or identity, suggesting independent processing (e.g., Etcoff, 1984), others show an asymmetrical effect (an effect of identity on expression but not vice versa; e.g., Schweinberger and Soukup, 1998), symmetrical effects with familiar faces but not with unfamiliar faces (e.g., Ganel and Goshen-Gottstein, 2004), or symmetrical interactions between facial expression and facial familiarity that emerge for some expressions (happiness and neutral) but not for others (disgust and fear) (Wild-Wall, 2004). One possible reason for the variability in the results may be the use of a small stimulus set in many studies using this paradigm. Typically only two different stimulus exemplars displaying one of two emotions are used (e.g., see Schweinberger and Soukup, 1998). This limited set of stimuli is repeated across trials, allowing participants to develop a strategy of discriminating stimuli based on local image details (e.g., variations in lighting, photographic grain) rather than on expression and identity. Such a strategy may limit interference between the dimensions. Another important issue is that different picture-based strategies may be used for the identity and emotion decision tasks in the Garner paradigm. In the identity decision task, pictorial strategies might be used to discriminate individuals based on the shape of a face or on non-facial cues such as hair style (e.g., see the stimuli in Etcoff, 1984, and Schweinberger and Soukup, 1998). For the expression decision task, however, where participants are required to attend to internal facial features, this strategy may be inappropriate. This can lead to differences in task difficulty, which may contribute to the asymmetric interference effects between identity and emotional expression judgments.

The relative discriminability between the exemplars of the two dimensions can also affect results in the Garner paradigm. Wang et al. (2013) orthogonally manipulated the discriminability (Disc) of stimuli within the two relevant dimensions (e.g., high Disc identities and high Disc expressions, high Disc identities and low Disc expressions). The results showed asymmetric interference from identity to emotional expression when the discriminability of the facial expression was low and that of facial identity was high. In contrast there was interference from emotional expression on identity when the discriminability of facial expression was high and that of facial identity low. When both dimensions were low in discriminability, interference was found in both directions, while there was no interference when both dimensions were highly discriminable. The authors argued that, when discriminability is low, people refer to additional information from an irrelevant dimension, and this results in GI (Wang et al., 2013). Ganel and Goshen-Gottstein (2004) controlled for pictorial processing strategies and they also equated the discriminability of identity and expression judgments. In this case symmetric interference was found between expression and identity judgments, though only for familiar faces (Ganel and Goshen-Gottstein, 2004).

Taken together, the above studies suggest that the degree of interaction between identity and emotional expression in faces is associated with the level of discriminability of the two dimensions. It is less clear, however, why no interaction is observed when both dimensions are highly discriminable. It is possible that participants process each relevant dimension separately from the irrelevant one, because there is enough information carried by each dimension. However, there is also the possibility that in the orthogonal condition participants tend to switch their attention between the two dimensions that constantly change. Hence, on some occasions participants direct attention to the irrelevant dimension, which leads to a potential increase in errors and longer RTs. Thus, the effects of the unattended stimulus dimensions arise due to trial-by-trial fluctuations in attention that lead to the irrelevant dimension sometimes being attended (Lavie and Tsal, 1994; Weissman et al., 2009). On these occasions performance will be affected by variation in the irrelevant dimension, even though the dimensions might be processed independently.

## **THE COMPOSITE FACE TASK**

Composite faces combine the top half of one face with the bottom half of another face. When aligned, the two face halves appear to fuse together to produce a novel face, making it difficult to selectively process either half of the composite by itself (Young et al., 1987; Mondloch et al., 2006; Rhodes et al., 2006; McKone, 2008; Rossion, 2013). In the composite paradigm, the task is to attend to one half of the face (e.g., the top), and either name it (naming version) or determine whether it is the same as or different from the half face in a second composite stimulus (matching version), while ignoring the non-target half (e.g., the bottom part of the face). There are two critical conditions: one in which the two halves of the faces are aligned, "encouraging" holistic processing, and one in which the two halves are not aligned, so that they are less likely to be processed as a single perceptual unit. Note that, as in the Garner paradigm, perceptual integration is indexed by the level of interference of the irrelevant dimension on the processing of the relevant dimension.

When the two halves of the faces are smoothly aligned, the novel face in the composite condition can create a conflicting situation as it does not match the identity of either the top or the bottom half. In contrast, when two halves are misaligned, the face is not encoded as a perceptual whole, and the information of either part can be assessed without mutual interference. The robust finding is that participants are slower, and less accurate in identity judgments of the top half when the face halves are vertically aligned compared to when they are spatially unaligned (e.g., Young et al., 1987; McKone, 2008). Similar to the effects with facial identity, there is also a composite effect for emotional expressions (Calder et al., 2000, Experiment 1).

Interestingly, when identity and expression information are combined, the composite effect in identity has been found to operate independently of the effect in emotional expression. In Calder et al. (2000, Experiment 4), three types of composite faces were employed: (i) two halves of the same person posing different facial expressions (same-identity/different-expression composites); (ii) two halves of different people posing the same facial expression (different-identity/same-expression composites); and (iii) two halves of different identities posing different facial expressions (different-identity/different-expression composites). Participants performed two tasks: judging the identity or the expression of each face. The RT pattern depended on the task. In the identity task, judging the identity of the top half of the face was facilitated if it matched the identity of the bottom half, and this was independent of whether the expressions (the irrelevant dimension in this case) matched or mismatched. Similarly, in the expression task, when the two halves were matched for expression, responses were facilitated independently of facial identity. Thus, the results indicated that people could selectively attend to either of the facial dimensions (see a similar conclusion in Etcoff's (1984) study where participants performed a Garner task).

Critical examination of Calder et al.'s (2000) Experiment 4 highlights a few important points. First, the authors did not equate for difficulty across the conditions and trial types (e.g., identity decisions were easier than expression decisions). It could be that when decisions are easier, participants tend to rely on a single source of information to make the decision (Wang et al., 2013); however, if the decision is difficult, participants may refer to the irrelevant dimension to provide additional information to make a correct classification judgment, or they may need a longer time to ignore the irrelevant information. In both cases this does not imply complete independence between the coding of identity and emotional expression. Second, the high cognitive demands on the perceptual system, required to focus attention on just one part of the faces, may have affected the results. For example, similar to the Garner task, participants may have attended to the irrelevant dimension due to trial-by-trial fluctuations in attention or local details of the images. Finally, the results may reflect a tradeoff between speed and accuracy, as the accuracy results indicate that most errors were made during conditions where the top and bottom halves did not match on either expression or identity. Furthermore, Richler et al. (2008) found that discriminability (*d'*) on trials when both face halves had the same identity was higher than discriminability on trials when the two halves had different identities. In summary, the composite face task cannot unambiguously provide evidence for separate routes for processing of facial identity and emotional expressions.

### **THE DIVIDED ATTENTION TASK**

The divided attention task has been used in studies examining holistic vs. featural processing in faces (Wenger and Townsend, 2001) and independent vs. interactive processing of identity and expressed emotion in faces (Wenger and Townsend, 2001; Yankouskaya et al., 2012, 2014a,b).

In the divided attention task, participants are required to monitor two sources of information simultaneously for a target to decide if the target is present or absent. There are two main advantages in employing the divided attention task. First, the task requires people to attend to facial identity and emotional expression simultaneously, a situation that closely resembles daily life. Second, in contrast to the selective attention task, the divided attention task controls for performance in the single target conditions by including the double target display. There is considerable evidence that, when a visual display contains two targets that require the same response, RTs are faster compared to when only one target appears (Miller, 1982; Mordkoff and Miller, 1993; Miller et al., 2001; Wenger and Townsend, 2006). For example, in Mordkoff and Miller's (1993) study participants were required to divide their attention between the separable dimensions of color and shape, with all stimulus features being attributes of a single object. Participants were asked to press a button if the target color (green), the target shape (X), or both target features (green X) were displayed, and to withhold their response otherwise. The mean RT on redundant target trials was significantly shorter than the mean RT on single target trials (Mordkoff and Miller, 1993).

Although different explanations can be put forward to account for this redundant target effect (RTE), the most relevant here are the Independent Race Model (Raab, 1962) and the Coactivation Model (Miller, 1982). According to the Independent Race Model, redundancy gains are explained by means of "statistical facilitation" (Raab, 1962). Whenever two targets are presented simultaneously, the faster signal determines the response "target present" (i.e., this signal wins the race). As long as the processing time distributions for the two signals overlap, RTs will be speeded when two targets occur since the winning signal can always be used for the response (Raab, 1962). Note that which signal finishes "first" may depend on whether it is attended. For example, either emotional expression or identity may be computed first if there are fluctuations in attention to each independent dimension.
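
Statistical facilitation can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the finishing times of the identity and expression channels are independent and normally distributed; the means, standard deviation and function name are hypothetical and are not taken from any of the cited studies.

```python
import random

def simulate_race(n_trials=10_000, seed=0):
    """Simulate an independent race between two channels.

    On redundant-target trials the response is triggered by whichever
    channel finishes first, so the redundant RT is the minimum of the
    two channel finishing times on that trial.
    """
    rng = random.Random(seed)
    identity = [rng.gauss(500, 50) for _ in range(n_trials)]  # assumed ms
    emotion = [rng.gauss(520, 50) for _ in range(n_trials)]   # assumed ms
    redundant = [min(i, e) for i, e in zip(identity, emotion)]

    def average(values):
        return sum(values) / len(values)

    return average(identity), average(emotion), average(redundant)

mean_identity, mean_emotion, mean_redundant = simulate_race()
print(f"identity only: {mean_identity:.1f} ms")
print(f"emotion only:  {mean_emotion:.1f} ms")
print(f"redundant:     {mean_redundant:.1f} ms")
```

The redundant mean is lower than either single-channel mean even though neither channel is processed any faster, which is why a redundancy gain on its own cannot distinguish a race from coactivation.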

An alternative explanation for the RTE is the coactivation view. According to this model, the information supporting a response "target present" is pooled across the features defining the targets prior to response execution (Miller, 1982). When both target identity and target emotional expression contribute activation toward the same decision threshold, the response will be activated more rapidly relative to when only one attribute contributes activation.

The critical contrast for the two models compares the probability of the response times obtained on redundant target trials with the sum of the probabilities for responses being made on either single target trial. The Independent Race Model holds that at no point in the cumulative distribution functions should the probability of a response to redundant targets exceed the sum of the probabilities for responses to either single target. In contrast, the coactivation account predicts that responses to the redundant targets can be made before either single target generates enough activation to produce a response. Thus, the number of fastest responses to a face containing both the target identity and the target emotional expression should be larger than the number of fastest responses to either target facial identity or target expression when presented as single targets. The procedure assessing the relations between the number of fast responses in the single target trials vs. the dual target trials is referred to as the Miller inequality test, or the race model inequality test.
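
A minimal sketch of the inequality check is given below, using empirical cumulative distribution functions (CDFs). The function names and RT values are hypothetical, and the code only flags time points at which the bound is exceeded descriptively; published analyses test violations statistically across participants, typically at a set of RT quantiles.

```python
def ecdf(rts, t):
    """Empirical CDF: proportion of responses with RT <= t."""
    return sum(rt <= t for rt in rts) / len(rts)

def race_model_violations(redundant, single_i, single_e, time_points):
    """Return the time points where F_IE(t) > F_I(t) + F_E(t).

    F_IE is the CDF for redundant targets; F_I and F_E are the CDFs for
    the identity-only and expression-only targets. The summed bound is
    capped at 1 because a probability cannot exceed 1.
    """
    violations = []
    for t in time_points:
        bound = min(1.0, ecdf(single_i, t) + ecdf(single_e, t))
        if ecdf(redundant, t) > bound:
            violations.append(t)
    return violations

# Hypothetical RTs (ms) for one participant.
redundant = [380, 400, 410, 430, 450]
single_i = [450, 470, 480, 500, 520]
single_e = [460, 480, 490, 510, 530]
time_points = list(range(380, 541, 20))

violations = race_model_violations(redundant, single_i, single_e, time_points)
print("Race model bound exceeded at (ms):", violations)
```

In this made-up data set the violations occur only in the fast tail of the distribution, which is where coactivation is expected to show itself.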

An alternative approach to testing independent vs. coactive processing is to examine the effects of the RTE on the workload capacity of the system (Townsend and Nozawa, 1995). The concept of workload capacity reflects the efficiency with which a cognitive system performs a task. Mathematically, the workload capacity (*C(t)*) is defined by the hazard function that gives the rate of process completion at any point in time (given that the process under observation has not yet completed) (Townsend and Wenger, 2004). Importantly, the yardstick for the capacity model (Townsend and Nozawa, 1995) is the standard parallel model (e.g., the Independent Race Model; Raab, 1962), in which processing on individual dimensions does not change with increasing workload and signals are processed in parallel without mutual interference. In terms of the capacity model, the standard parallel processing model is associated with unlimited capacity (*C(t)* = 1), as processing one dimension has no impact on the processing of the second dimension. Processing with limited capacity (*C(t)* < 1) is associated with decreasing performance (e.g., slowing in RT) when the workload increases and the system performs sub-optimally. On the other hand, the overall workload could decrease when redundant targets are presented, leading to facilitation in performance (e.g., faster RT). In this case the system is said to operate at super capacity (*C(t)* > 1). Super capacity emerges because a decision is made before any single dimension alone provides sufficient evidence to support it. Hence less processing is needed of each dimension to enable a decision, making the process more efficient. The super capacity mode violates the race model inequality (Townsend and Wenger, 2004; Townsend and Eidels, 2011), suggesting positive dependency between the two dimensions.
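
For a task in which either target is sufficient for a response, the capacity coefficient of Townsend and Nozawa (1995) is usually written as the ratio of integrated hazard functions, *C(t)* = *H*<sub>IE</sub>(*t*) / [*H*<sub>I</sub>(*t*) + *H*<sub>E</sub>(*t*)], where *H*(*t*) = −ln *S*(*t*) and *S*(*t*) is the proportion of responses not yet made at time *t*. The sketch below computes this at a single time point; the function names and RT values are hypothetical, and practical estimators add smoothing and inferential tests that are omitted here.

```python
import math

def survivor(rts, t):
    """Survivor function S(t): proportion of responses with RT > t."""
    return sum(rt > t for rt in rts) / len(rts)

def capacity_or(redundant, single_i, single_e, t):
    """Capacity coefficient C(t) = H_IE(t) / (H_I(t) + H_E(t)).

    H(t) = -ln S(t) is the integrated hazard. Returns None where the
    ratio is undefined: S(t) = 0 for any condition (log of zero), or no
    single-target responses yet (zero denominator).
    """
    s_ie = survivor(redundant, t)
    s_i = survivor(single_i, t)
    s_e = survivor(single_e, t)
    if min(s_ie, s_i, s_e) == 0.0 or (s_i == 1.0 and s_e == 1.0):
        return None
    return -math.log(s_ie) / (-math.log(s_i) - math.log(s_e))

# Hypothetical RTs (ms).
single_i = [400, 450, 550, 600]
single_e = [420, 480, 520, 580]
redundant_race = [380, 420, 460, 540]                      # S(500) = 1/4
redundant_fast = [360, 400, 440, 460, 470, 480, 490, 520]  # S(500) = 1/8

c_race = capacity_or(redundant_race, single_i, single_e, 500)
c_fast = capacity_or(redundant_fast, single_i, single_e, 500)
print(f"C(500) race-like data: {c_race:.2f}")
print(f"C(500) faster redundant data: {c_fast:.2f}")
```

The first data set is constructed so that the redundant survivor function equals the product of the two single-target survivor functions, which is the independent parallel benchmark; the second has more fast redundant responses than that benchmark allows and so yields a coefficient above 1.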

The Race Model and the capacity measure have been used in tests of independence vs. coactivation in the processing of facial identity and emotional expression. Yankouskaya et al. (2012) employed the divided attention task under conditions where participants had to detect target identities and target emotional expressions from photographs of a set of target faces. Three of these photographs contained targets: stimulus 1 had both the target identity and the target emotion (i.e., the redundant target); stimulus 2 contained the target identity and a non-target emotional expression; stimulus 3 contained the target emotional expression and a non-target identity (**Figure 1**). Three non-target faces were photographs of three different people, and expressed emotions different to those in target faces. Identity, gender and emotional expression information were varied across these studies.

The general results showed that super-additive redundancy gains occurred between face identity and emotional expression. Particularly striking was the finding that there were violations of the race model inequality test (Miller, 1982) when the target identity was combined with the target expression in a single face. Violation of the race model inequality occurred for combinations of a sad or an angry expression with facial identity, but not when identity was combined with a neutral expression. In the last case, the authors report no evidence for any redundancy gain. Yankouskaya et al. (2012) suggest that unfamiliar faces bearing a neutral expression do not carry expression-contingent features and a neutral expression may be defined by the absence of an expression, making it more idiosyncratic to the particular face.

Importantly, the mathematical tests of the race model and capacity measures provide us with a precise analysis of the relationship between the processing of identity and emotional expression (Yankouskaya et al., 2012), facilitating estimation of the effect of different factors on the relationship (Yankouskaya et al., 2014a,b).

Taken together, the data derived from the divided attention task within the framework of the race model and capacity measures of processing are consistent with coactive processing when a target identity is paired with a distinct emotional expression. Coactivation is beneficial for the cognitive system because it allows information derived from identity and emotion in faces to be pooled, leading to super capacity of the system. This super capacity emerges because combining information reduces the demands on resources compared to when each channel is considered independently.

## **DO EXPERIENCE AND FAMILIARITY WITH FACES MODULATE THE WAY THAT EXPRESSION AND IDENTITY PROCESSING INTERACT?**

Based on common observation, the recognition of identity and emotional expression in faces in everyday life is easy. We can catch the face of a familiar person in a crowd, or an expression on a face, in a few seconds. Likewise, we are typically quick to judge whether a briefly seen face is unfamiliar or whether a stranger's face has a particular expression. On the other hand, it may take longer for us to recognize a familiar face with an unusual expression or a stranger's smiling face, because it makes us doubt whether the person is familiar or not (Baudouin et al., 2000). These examples show that familiarity judgments to faces are affected by the expression of the faces, and the interaction occurs for both unfamiliar and familiar faces (Baudouin et al., 2000; Elfenbein and Ambady, 2002; Eastwood et al., 2003; Wild-Wall, 2004; Calvo and Nummenmaa, 2008). Familiarity with faces can be conceptualized at multiple levels: (1) continuous contact across the lifespan with faces in general may gradually shape the way we process faces; (2) there may be familiarity with faces from a specific ethnic or cultural group; and (3) there may be familiarity and increased experience with the faces of specific individuals (through both media channels and direct social interactions).

Experience with human faces changes across the lifespan and affects the way we process faces. For example, the processing of both identity and expressions improves from childhood to adulthood (Schwarzer, 2000; Baudouin et al., 2010; Germine et al., 2011) and gradually declines in older people (Plude and Hoyer, 1986; Ruffman et al., 2008; Obermeyer et al., 2012). It is unclear, however, whether general experience with faces through the lifespan affects the way identity and expression interact.


We used the divided attention paradigm to assess how aging affects the integration of visual information from faces. Three groups of participants aged 20–30, 40–50 and 60–70 performed a divided attention task in which they had to detect the presence of a target facial identity or a target facial expression. Three target stimuli were used: (1) with the target identity but not the target expression; (2) with the target expression but not the target identity; and (3) with both the target identity and the target expression (the redundant target condition). On non-target trials the faces contained neither the target identity nor the target expression. All groups were faster in responding to a face containing both the target identity and emotion compared to faces containing either single target. Furthermore, the redundancy gains for combined targets exceeded the performance limits predicted by independent processing of facial identity and emotion. These results held across the age range, suggesting that there is interactive processing of facial identity and emotion which is independent of the effects of cognitive aging. Remarkably, there was an increase in the extent of co-activation across the adult lifespan, so that with increased age the benefits of redundant targets were larger. This was reflected by an increased probability of fast response trials and by increased processing efficiency, evidenced by "higher" super capacity (**Figures 2**, **3**).

<sup>1</sup>Graphic representations of the distributions were constructed using group RT distributions obtained by averaging individual RT distributions (Ulrich et al., 2007). When the CDFs are plotted, the Independent Race Model requires that the CDF of the redundant target trials falls below and to the right of the summed CDF (fewer fast responses on redundant target trials than the number of fast responses summed over both single targets); any reliable violation of this pattern provides support for the co-activation model.

The evidence on the effects of life experience with faces is mirrored by the data on processing faces from the same vs. a different race. It is well documented that the processing of own-race faces is advantaged for both expressions (Elfenbein and Ambady, 2002; Kubota and Ito, 2007) and identity (Levin, 2000; Kito and Lee, 2002; Walker and Tanaka, 2003; Michel et al., 2006; Cassidy et al., 2011). In a recent study Yankouskaya et al. (2014a) showed that experience with own-race faces affected the integration of identity and emotional information. The relations between the processing of facial identity and emotion in own- and other-race faces were examined using a fully crossed design with participants from three different ethnicities, all residing in the UK at the time of the study (Yankouskaya et al., 2014a). Three groups of participants (European, African and Asian individuals) performed the divided attention task on three sets of six female portrait photographs, one set for each ethnic group. In each set, three photographs contained targets: Stimulus 1 had both the target identity and the target emotion, sad (IE); Stimulus 2 contained the target identity and a non-target emotional expression, happy (I); Stimulus 3 contained the target emotional expression, sad, and a non-target identity (E). Three non-target faces were photographs of three other people expressing emotions different from those in target faces (angry, surprised, and neutral). The benefits of redundant identity and emotion signals were evaluated and formally tested in relation to models of independent and coactive feature processing and measures of processing capacity for the different types of stimuli (see the section on the divided attention task above). The results suggested coactive processing of identity and emotion, linked to super capacity, for own-race but not for other-race faces (**Figure 4**).

Furthermore, in the study of Yankouskaya et al. (2014a), the evidence for a race effect on the integration of emotion and identity information was asymmetric. European participants only showed evidence of perceptual integration for their own-race faces. However, African and Asian participants showed this both for their own-race faces and for European faces, but they did not show it for Asian and African faces, respectively (both other-race) (**Figure 4**). This asymmetry reflects the number of contacts with other-race faces; as all participants were residing in the UK at the time of testing, the Asian and African participants had greater familiarity with European faces than Europeans had with Asian and African faces (**Table 1**). A formal test showed that variations in the size of the redundancy gains across other-race faces were strongly linked to the number of social contacts, but less so to the quality of the contact with other-race members. This suggests that experience with faces facilitates the coactive processing of identity and emotional expression.

The capacity analysis also demonstrated super capacity for processing identity and emotional expression within own-race faces, indicating that the observed responses for the redundant target face were greater than predicted by the combined response to single targets (**Figure 5**). In contrast, adding information to other-race faces generated results indicative of a negative dependency and suggesting that the processing of identity and emotional expression in other-race faces operates with limited capacity. The negative dependency for other-race faces held true for European participants but not for African and Asian groups where responses for European faces showed positive dependency.

Collectively, these results suggest that one component of the own race face advantage is the increase in the integration of identity and emotional expression information in own-race faces. This effect is strongly linked to individual experience with particular types of face.

Finally, familiarity with specific individuals can also change the way information from the face is processed. Ganel and Goshen-Gottstein (2004) predicted that GI should be greater for familiar compared to unfamiliar faces, because representations of familiar faces contain richer and more detailed structural descriptions than representations of unfamiliar faces. As a consequence, perceivers should be more likely to be sensitive to the associations between invariant and changeable aspects of familiar faces than they are to those of unfamiliar faces (Ganel and Goshen-Gottstein, 2004). This was demonstrated using the Garner paradigm where participants had to make identity and emotion judgments for personally familiar and unfamiliar faces. The authors report that interference between identity and expression increased for familiar faces (Ganel and Goshen-Gottstein, 2004), consistent with this information being processed in a more integral way in this case.

**Figure caption (fragment):** top row, European participants (own-race, African and Asian faces from left to right); middle row, African participants (European, African and Asian faces from left to right); bottom row, Asian participants (European, African and Asian faces from left to right). I + E is the sum of the distributions for I and E (in purple). These graphs show whether the redundant target information is processed coactively (the IE line lies to the left of the I + E line; for details see Yankouskaya et al., 2014a).

#### **Table 1 | Mean number (standard deviation in brackets) of well-known own- and other-race people for groups of European, African and Asian participants**.

\* In bold for own-race people.

Taken together, the studies above suggest that familiarity modulates the relationship between the processing of identity and emotional expression in faces. Increased experience with faces leads to increased integration of information. As discussed above, pooling information across multiple channels allows the system to operate at super capacity, enhancing processing efficiency. We suggest that experience with faces results in a qualitative change to the way faces are processed. Importantly, this change occurs in adulthood, demonstrating that our face processing system retains flexibility throughout life. Furthermore, the above results show that there is no single system for processing faces; rather, multiple mechanisms operate in parallel depending on the faces processed and on our previous experience with them. For example, the identity and emotion of novel faces (e.g., faces from a different ethnicity) are processed in parallel, while identity and emotion information from highly familiar face types is integrated. Thus we propose that experience shapes the connections between different processing channels, thereby increasing the efficacy of the processing in each of the individual channels. This raises the question of at what stage of face processing identity and emotion are connected.

## **AT WHAT STAGE OF THE PROCESSING INFORMATION ON IDENTITY AND EMOTION IS INTEGRATED**

There are several stages of processing at which identity and expression/emotion could interact during face processing. The coactivation view (Miller, 1982) suggests that the interaction between identity and emotional expression leading to a super-redundancy gain occurs just after the two stimuli have been separately coded, but prior to a decision about target presence. The interactive view (Mordkoff and Yantis, 1991) suggests that information about facial identity and emotional expression may be exchanged at early perceptual levels (inter-stimulus crosstalk) or at a decisional stage (non-target response bias). We next briefly discuss studies which may offer some resolution to these conflicting views.

Evidence for separate mechanisms for emotion and identity processing that interact prior to the decision comes primarily from neuropsychological cases and neuroimaging studies. The neuropsychological evidence mentioned above (Behrmann et al., 2007; Riddoch et al., 2008) shows a double dissociation between expression and identity processing. Neuroimaging studies suggest that different neural structures are involved in processing identity (invariant) and emotion (variant) information (Haxby et al., 2002). For example, it has been shown that regions within the superior temporal sulcus process expressions, while regions along the Fusiform Gyrus process identity (Winston et al., 2004). It has further been shown that processing within these two regions is relatively separate (Fairhall and Ishai, 2007). Taken together, these findings suggest that at some stage identity and expression are processed separately.

The alternative view suggests a single mechanism for processing identity and expressions from faces (Calder and Young, 2005). On this view, identity and expression are not processed by dissociated mechanisms; instead, the two dimensions are processed within a single multi-dimensional space. This view relies on computational, neuropsychological and neuroimaging evidence. Computationally, it has been shown that the principal components derived from pictures of different identities posing different expressions contain identity-specific, emotion-specific and shared emotion and identity components (Cottrell et al., 2002). Thus the authors argue that within a single face representation system, different dimensions code for dissociated as well as shared features across the two dimensions. A critical review of neuropsychological studies by Calder and Young (2005) further suggests that most patients who are impaired at identity processing (prosopagnosia) also show impaired emotion recognition when formally tested, albeit less severely. Finally, Calder and Young review neuroimaging studies showing that regions along the Fusiform Gyrus (assumed to be solely processing identity) often show sensitivity to facial expression (Vuilleumier et al., 2003), while regions along the superior temporal sulcus (assumed to be dedicated to expression) are often sensitive to face identity (Winston et al., 2004).

In summary, it is unclear whether the interactive nature of emotion and identity processing arises from a single multi-dimensional space or from interaction between different processing streams. Further research is needed to address this question, possibly using methods with higher temporal resolution, such as EEG or MEG.

#### **CONCLUSION**

We started our review by outlining three accounts of the relationship between the processing of identity and emotional expression in faces: independent, asymmetric and co-active processing of the two facial dimensions. We discussed in detail the support for each account from studies employing the Garner interference paradigm, the composite faces paradigm, and the divided attention paradigm. Based on this, we conclude:

First, there is compelling evidence against strictly independent processing of identity and emotional expression (Ganel and Goshen-Gottstein, 2002, 2004; Wang et al., 2013), with perhaps the strongest evidence coming from studies of redundancy gains (particularly the mathematical tests against models assuming independent processing of expression and identity) (Yankouskaya et al., 2012, 2014a,b; Fitousi and Wenger, 2013).

Second, there are two crucial conditions for the interaction to occur: equal discriminability of identity and emotional expression (Ganel and Goshen-Gottstein, 2002; Wang et al., 2013) and an expression that is emotionally valenced (i.e., other than a neutral expression) (Yankouskaya et al., 2012).

Third, interactive processing of identity and emotional information in faces is modulated by familiarity and experience with faces (Ganel and Goshen-Gottstein, 2002; Yankouskaya et al., 2014a). Both greater familiarity and experience with faces facilitate the interaction.
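To make the first conclusion more concrete, one widely used test against independent (race) models of redundancy gains is the race model inequality, which states that the cumulative probability of responding by time t on redundant-target trials should not exceed the sum of the corresponding probabilities on the two single-target trial types. The sketch below illustrates that check only. It is an assumption that this simple form is representative of the tests used in the cited studies, and the function names and reaction times are hypothetical.

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical P(RT <= t) evaluated at each time point in t_grid."""
    rts = np.sort(np.asarray(rts, dtype=float))
    t_grid = np.asarray(t_grid, dtype=float)
    return np.searchsorted(rts, t_grid, side="right") / rts.size

def race_model_violation(rt_redundant, rt_identity, rt_expression, t_grid):
    """Redundant-target CDF minus the race model bound at each time point.

    Positive values mark time points at which responses to redundant targets
    are faster than any independent race between the identity and expression
    channels would allow.
    """
    bound = np.minimum(1.0, ecdf(rt_identity, t_grid) + ecdf(rt_expression, t_grid))
    return ecdf(rt_redundant, t_grid) - bound


# Hypothetical reaction times in ms, for illustration only.
rt_identity = [520, 540, 560, 600, 640]      # target identity only
rt_expression = [530, 555, 580, 610, 650]    # target expression only
rt_redundant = [450, 470, 490, 520, 560]     # both targets present
t_grid = np.arange(440, 680, 20)

violation = race_model_violation(rt_redundant, rt_identity, rt_expression, t_grid)
print("time points violating the bound:", t_grid[violation > 0])
```

In practice the inequality is evaluated per participant and the violations are then tested statistically across the group; the sketch omits that step.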

#### **REFERENCES**


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 June 2014; accepted: 28 October 2014; published online: 14 November 2014*.

*Citation: Yankouskaya A, Humphreys GW and Rotshtein P (2014) The processing of facial identity and expression is interactive, but dependent on task and experience. Front. Hum. Neurosci. 8:920. doi: 10.3389/fnhum.2014.00920*

*This article was submitted to the journal Frontiers in Human Neuroscience*.

*Copyright © 2014 Yankouskaya, Humphreys and Rotshtein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms*.
