READING FACES AND BODIES: BEHAVIORAL AND NEURAL PROCESSES UNDERLYING THE UNDERSTANDING OF, AND INTERACTION WITH, OTHERS

EDITED BY: Paola Ricciardelli, Andrew P. Bayliss and Rossana Actis-Grosso PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-145-6 DOI 10.3389/978-2-88945-145-6

# About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **READING FACES AND BODIES: BEHAVIORAL AND NEURAL PROCESSES UNDERLYING THE UNDERSTANDING OF, AND INTERACTION WITH, OTHERS**

Topic Editors:

**Paola Ricciardelli,** University of Milano-Bicocca & Milan Centre for Neuroscience, Italy

**Andrew P. Bayliss,** University of East Anglia, UK

**Rossana Actis-Grosso,** University of Milano-Bicocca & Milan Centre for Neuroscience, Italy

Artwork by Daniele Zavagno, Department of Psychology, University of Milano-Bicocca (email: daniele.zavagno@unimib.it)

The aim of this Research Topic was to offer an interdisciplinary forum for researchers interested in the interplay of face, eye gaze, and body perception in the understanding of others, with an emphasis on behavioural and neural processing. The papers included in this topic come from cognitive, neuroscience and social psychology perspectives and shed new light on how facial and body cues interact with each other and with social, ecological and contextual factors (such as social identification and group membership) to form a unified representation that can guide our perceptions of and responses to other people. Altogether, they provide an up-to-date picture of advances in this fascinating research field.

**Citation:** Ricciardelli, P., Bayliss, A. P., Actis-Grosso, R., eds. (2017). Reading Faces and Bodies: Behavioral and Neural Processes Underlying the Understanding of, and Interaction with, Others. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-145-6

# Table of Contents


*Facial feedback affects valence judgments of dynamic and static emotional expressions*

Sylwia Hyniewska and Wataru Sato

*Emotional processing in Parkinson's disease and schizophrenia: evidence for response bias deficits in PD*

Ilona P. Laskowska, Ludwika Gawryś, Szymon Łęski and Dariusz Koziorowski

*Processing of masked and unmasked emotional faces under different attentional conditions: an electrophysiological investigation*

Marzia Del Zotto and Alan J. Pegna

# **Role of Social factors in social information processing**


# Editorial: Reading Faces and Bodies: Behavioral and Neural Processes Underlying the Understanding of, and Interaction with, Others

Paola Ricciardelli <sup>1, 2</sup>\*, Andrew P. Bayliss <sup>3</sup> and Rossana Actis-Grosso <sup>1, 2</sup>

*<sup>1</sup> Department of Psychology, University of Milano-Bicocca, Milan, Italy, <sup>2</sup> Milan Centre for Neuroscience, Milan, Italy, <sup>3</sup> School of Psychology, University of East Anglia, Norwich, UK*

Keywords: face perception, gaze cueing, body language, atypical development, social information processing

#### **Editorial on the Research Topic**

#### **Reading Faces and Bodies: Behavioral and Neural Processes Underlying the Understanding of, and Interaction with, Others**

The ability to understand other people as beings with intentional and mental states is fundamental to adapting to our social world. To this end, our perceptual and neural systems have evolved to extract useful information from the faces and moving bodies of other humans, allowing reciprocal social interaction and communication.

Central sources of socially meaningful cues are the face and eye gaze, which can be visually analyzed to understand a person's emotions, focus of attention, intentions, beliefs, and desires. This body of information, although complex, is readily detected and used by people to go beyond a person's facial appearance and make inferences about personal dispositions and personality traits, such as trustworthiness.

Using a range of methodologies and techniques, the contributions to this Research Topic have addressed how we process and integrate the different types of information coming from static and dynamic faces and moving bodies, as well as how person categorization cues influence the way in which we process faces. The issues that emerged from these behavioral, neuropsychological, computer-based, and neurophysiological studies are briefly reviewed below, along with some remarks on future research directions and outstanding questions.

The specificity and importance of faces as visual stimuli was addressed in the study by Shyi and Wang, who, by means of a face composite task, tested the possibility that the top half of a face might induce stronger holistic processing than the bottom half. Their results show instead that holistic processing may be distributed homogeneously within an upright face.

The ability of adults to decode children's facial expressions was studied by Gadea et al. These authors analyzed the relation between the facial expressions of a group of children when they told a lie and the accuracy with which a sample of adults detected the lie, finding that lies told with emotional facial expressions are more easily recognized by adults than lies told with a "poker face." They also correlated the lie detectors' accuracy with their subclinical personality-disorder traits. The results indicate that the presence of an emotion helps the observer read the other person's mind, and they highlight a modulatory effect of personality traits on this ability. Moreover, Hyniewska and Sato investigated the interaction between facial cues, as an index of internal emotional state, and dynamic emotional expressions produced by both an actor and the observer. Their study shows that the evaluation of an emotional face is influenced not only by the emotional expression of the face being judged, but also by the facial expression of the person doing the judging.

Edited and reviewed by: *Bernhard Hommel, Leiden University, Netherlands*

> \*Correspondence: *Paola Ricciardelli paola.ricciardelli@unimib.it*

#### Specialty section:

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

Received: *14 November 2016* Accepted: *23 November 2016* Published: *08 December 2016*

#### Citation:

*Ricciardelli P, Bayliss AP and Actis-Grosso R (2016) Editorial: Reading Faces and Bodies: Behavioral and Neural Processes Underlying the Understanding of, and Interaction with, Others. Front. Psychol. 7:1923. doi: 10.3389/fpsyg.2016.01923*

Lewinski showed that people are not very accurate at recognizing neutral faces as neutral. By comparing human performance with that of automated facial coding (AFC) software, he found that the software was far more accurate than people. This finding raises new questions about the mechanism underlying this discrepancy and about the functional meaning and advantage of seeing a face as emotional.

An important role in face processing may be played by the fact that in everyday life the external (e.g., hairstyle) and internal components of the face are not seen in isolation. In this respect, the paper by Saegusa et al. showed that attractiveness judgments of hair surrounding a task-irrelevant face were always influenced by the attractiveness of the face itself. This study provides evidence that visual attractiveness information, relevant for person categorization and personality trait inference (Dion et al., 1972), is integrated at the perceptual level. An outstanding issue for future research concerns the temporal dynamics of this integration and where in the human brain (e.g., in the occipitotemporal cortex) it occurs.

However, facial cues are not the only source of crucial information about a person's internal state. In everyday situations, body language or "bodily kinematics" is equally important, especially when facial signals are unavailable to the observer. A growing body of evidence shows that body motion cues are also a core component of social interactions and contribute to the first impression of a person. Actis-Grosso et al. directly compared pictures of static emotional faces with body motion cues (i.e., biological motion displays) to test their efficacy in conveying emotions. They found that emotions are not all recognized in the same way: some emotions (sadness) are better recognized when conveyed by static faces, whereas others (fear) are better recognized from motion displays.

With regard to how face and body motion cues may contribute to social understanding in typical and atypical populations, it is becoming apparent that variance in face recognition ability within the general population is much higher than previously thought. Albonico et al. show that motion improves the face recognition performance of poor face recognizers, but does not improve that of people who already find face recognition easy. In their study, Actis-Grosso et al. also compared emotion recognition performance in young adults with low or high autistic traits, finding that the two groups may rely on different cues to recognize emotions.

To date, little is known about how facial and body cues interact with each other and with social (e.g., social identification and group membership) and ecological factors to form a unified representation that can guide our perceptions of and responses to other people. Jarick and Kingstone based their study on the hypothesis that a cornerstone of non-verbal communication is eye contact between individuals and the time for which it is held. They show experimentally that the effect of eye contact, which is considered a form of body language, can be quickly and profoundly altered merely by having participants who had never met before play a game in a cooperative or competitive manner. Laskowska et al. used a more ecologically valid test (the Emotional Intelligence Scale–Faces), in which a mixture of basic and complex (or social) emotions was presented, to assess whether the deficit in facial emotion recognition present in Parkinson's disease (PD) is due to impaired sensory processes or to impaired decision-making processes. They compared patients with PD to healthy controls and to a group of patients with schizophrenia. Whereas in patients with schizophrenia the facial emotion recognition deficit seems to originate only from a generalized sensory impairment, patients with PD showed both decreased sensitivity and a change in response bias compared with healthy controls. This study indicates that a more ecological approach better differentiates the origins of impaired everyday emotion recognition in pathological populations.

In a similar vein, by using more realistic 3D avatars that suddenly shifted their eyes, thus mimicking natural social interaction more closely, Dalmaso et al. provide some evidence that in patients with right-hemisphere damage the ability to shift attention in response to eye gaze stimuli (the gaze cueing effect) is preserved, and that head orientation does not seem to modulate the gaze cueing effect. Combining the study of neuropsychological patients with that of the processing of social cues therefore provides new insights into both the neural and the behavioral mechanisms of social attention. On the behavioral side, Bobak and Langton cast doubt on the long-held view that gaze cueing does not require top-down control by showing that we do not follow gaze direction when working memory capacity is occupied.

There is as yet no full theoretical account of how we process, integrate, and interpret the various social signals in a visual image. In an ERP study, Del Zotto and Pegna addressed how the brain processes positive and negative facial emotions, focusing on the interaction between awareness, non-spatial selective attention, and emotion processing. Using a backward masking paradigm, they found that attention and awareness are partially dissociated in emotion processing, as indicated by the finding that they affect different EEG components at different processing latencies.

Finally, the work of Proietti et al. demonstrates that we look at the faces of people of different ages in different ways. This is important because it adds to data regarding other categories, such as ethnicity, and uses eye tracking to supplement measures such as processing speed, telling us more about processing style and content for in- and out-group individuals. By contrast, the studies by Cañadas et al. and Jacquot et al. provide new evidence on how person categorization and person knowledge, respectively, can bias cognitive processes. Cañadas et al. show that when learning about the reliability of people in an economic trust game, participants generalize the positive behavior of white faces to other members of that group, while remaining sensitive to the individual behavior of black faces. Jacquot et al. show that even people whom you believe to be incompetent can alter your metacognitive appraisal of your own accuracy at a task. That is, after a two-alternative forced-choice (2AFC) judgment, seeing a video of a person nodding their head boosts confidence that one's decision was correct, and seeing a head shake reduces it. The effect is smaller, but still present, even if the person in the video is known to be incompetent. Jacquot et al. also used facial EMG and found smile-muscle activity only when competent people nodded their head following difficult judgments.

In conclusion, the variety of approaches and methods employed by the studies included in this topic highlights the need to adopt a multidisciplinary perspective to reach a full theoretical account of how we extract, process, and interpret the various social signals coming from a person. Such an account should integrate information from the face and body as well as social and contextual information, thus also helping to advance current models of face processing. One question that future research should still address, for example, is how personality inferences derived from a person's perceptual appearance bias the cognitive processes involved in understanding others. In future studies, comparing groups of individuals in normal and pathological conditions might help to better understand the interplay between individual differences and social perception. We hope that the papers included here can stimulate and guide research in social cognition and social neuroscience by bringing together research in the fields of cognitive and social psychology.

# REFERENCES

Dion, K., Berscheid, E., and Walster, E. (1972). What is beautiful is good. J. Pers. Soc. Psychol. 24, 285–290. doi: 10.1037/h0033731

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

# AUTHOR CONTRIBUTIONS

PR: Planned the topic and edited the majority of papers included in the topic. AB: Edited some papers included in the topic. RA: Planned the topic and edited some papers included in the topic.

# FUNDING

This work was supported by a grant from the University of Milano-Bicocca (Fondo di Ateneo 2014) to PR and to RA.

Copyright © 2016 Ricciardelli, Bayliss and Actis-Grosso. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Testing Differential Holistic Processing Within a Face: No Evidence of Asymmetry from the Complete Composite Task

Gary C.-W. Shyi <sup>1, 2</sup> and Chao-Chih Wang <sup>1</sup>\*

<sup>1</sup> Department of Psychology and Center for Research in Cognitive Science, National Chung Cheng University, Chia-Yi, Taiwan, <sup>2</sup> Advanced Institute of Manufacturing with High-tech Innovations, National Chung Cheng University, Chia-Yi, Taiwan

The composite face task is one of the most popular research paradigms for measuring holistic processing of upright faces. The exact mechanism underlying holistic processing remains elusive and controversial, and some studies have suggested that holistic processing may not be evenly distributed, in that the top half of a face might induce stronger holistic processing than the bottom half. In two experiments, we further examined the possibility of asymmetric holistic processing. Prior to Experiment 1, we confirmed that perceptual discriminability was equated between top and bottom face halves: we found no differences in performance between top and bottom face halves when they were presented individually. Then, in Experiment 1, using the composite face task with the complete design to reduce response bias, we failed to obtain evidence supporting the notion of asymmetric holistic processing between top and bottom face halves. To further reduce performance variability and to remove lingering holistic effects observed in the misaligned condition in Experiment 1, in Experiment 2 we doubled the number of trials and increased the misalignment between top and bottom face halves to make it more salient. Even with these additional manipulations, we were unable to find evidence indicative of asymmetric holistic processing. Taken together, these findings suggest that holistic processing is distributed homogeneously within an upright face.

#### Edited by:

Rossana Actis-Grosso, University of Milano-Bicocca, Italy

#### Reviewed by:

Luis J. Fuentes, University of Murcia, Spain

Emanuela Bricolo, University of Milano-Bicocca, Italy

> \*Correspondence: Chao-Chih Wang ccu.george@gmail.com

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 15 June 2015 Accepted: 20 September 2016 Published: 04 October 2016

#### Citation:

Shyi GC-W and Wang C-C (2016) Testing Differential Holistic Processing Within a Face: No Evidence of Asymmetry from the Complete Composite Task. Front. Psychol. 7:1506. doi: 10.3389/fpsyg.2016.01506

Keywords: face recognition, holistic processing, asymmetry, congruency effect, perceptual field hypothesis

# INTRODUCTION

Face recognition is a ubiquitous ability for humans, and many investigators agree that holistic processing lies at its core (Tanaka and Farah, 1993; McKone, 2010). Using the composite face task, Young et al. (1987) were among the first to demonstrate holistic processing of faces, which many regard as the hallmark of face processing and which is at the core of the debate between the expertise hypothesis and the domain-specificity hypothesis of face processing (Kanwisher, 2000; Gauthier and Tarr, 2002; Gauthier et al., 2010; McKone, 2010). The composite task has been used to assess failures of selective attention to irrelevant face parts: such failures result in unwarranted processing of the irrelevant parts, which in turn interferes with processing of the target parts. Participants cannot focus on a specific part (e.g., the top face half) while ignoring the irrelevant part (e.g., the bottom face half), which implies that faces are processed holistically rather than as parts that are combined. Young et al. (1987) designed the composite task and used celebrity faces as stimuli. Participants were asked to name celebrities based on the top half of composite faces, and the bottom face half interfered with performance more in the aligned (composite) than in the misaligned (non-composite) condition. In other words, it was more difficult for participants to name the celebrity shown in the top face half in the aligned than in the misaligned condition. Based on these findings, Young et al. (1987) suggested that for aligned faces, participants perceive the top and bottom face parts as integrated, and that such integrated, holistic processing is disrupted with misaligned faces. Young et al. (1987) concluded that processing face identity requires holistic processing, not merely featural processing. Interestingly, they used inverted faces instead of misaligned faces in their second experiment and found comparable results, suggesting that face inversion might share the same mechanisms with (or at least be functionally equivalent to) misalignment in terms of disrupting holistic processing (Gauthier and Bukach, 2007).

Following Young et al.'s (1987) initial study, Hole (1994) demonstrated that irrelevant face parts also influence simultaneous matching of unfamiliar faces. On each trial, a pair of faces was presented simultaneously and observers had to judge, by button press, whether the top parts of the displayed faces were the same or different. This differs from the naming task used by Young et al. (1987). Regardless of task, findings from these earlier studies lend support to the conjecture that upright faces are processed holistically rather than via piecemeal featural processing.

Over the past two decades, researchers have found support for the notion that holistic processing plays a central role in face perception and recognition (Gauthier et al., 1998; Gauthier and Tarr, 2002; Robbins and McKone, 2003, 2007), and many are now trying to answer the question regarding the exact nature of holistic processing and its underlying mechanisms (for reviews, see Rossion, 2013; Richler and Gauthier, 2014). Currently, there are two main hypotheses, the template hypothesis (also called the holistic encoding hypothesis) and the attention strategy hypothesis (Richler and Gauthier, 2014). According to the template hypothesis, faces are encoded as a single unit to fit a template (Tanaka and Farah, 1993; Farah et al., 1998). The whole face is matched to a unified memory template rather than to parts. In other words, faces are represented as an undifferentiated whole because facial features are glued into a single unitary representation (Richler et al., 2012). Alternatively, the attention strategy hypothesis proposes that faces are processed holistically because attention to the whole becomes automatized with experience (Richler et al., 2011b, 2012). In other words, while facial features could be encoded and represented independently, holistic processing arises from a strategy of attending to all face parts simultaneously.

In addition to these two views, Rossion (2008, 2009, 2013) and van Belle et al. (2010) proposed the perceptual field hypothesis, which in their view is compatible with the holistic encoding or template hypothesis, to explain the inversion effect in face processing (Rossion, 2013). Specifically, as illustrated in **Figure 1**, the perceptual field for an upright face is expanded to cover almost the entire face, which results in an observer's perception of a whole face, rather than a collection of facial features in isolation (Rossion, 2009; van Belle et al., 2010). When faces are inverted, however, the perceptual field is contracted to contain only specific local features, and observers perceive one local feature at a time (**Figure 1**). This hypothesis has been used to explain inversion effects in face perception. For example, using gaze-contingent displays, van Belle et al. (2010) showed that the difference between upright and inverted faces disappears when observers can only perceive a face one piece at a time through a gaze-contingent window that necessarily disrupts holistic processing.

**Figure 1** | Perceptual field for upright and inverted faces (from Rossion, 2009, with permission).

The face template, attention strategy, and perceptual field hypotheses might not be completely incompatible with one another in terms of symmetry of holistic processing within a face. All three hypotheses emphasize integration of facial features into some sort of holistic representation during face processing, and posit that it is difficult, if not impossible, to process features independently in upright faces. However, the three hypotheses differ in terms of the origin of holistic processing. Whereas the attention strategy and perceptual field hypotheses emphasize the influence of experience from regular exposure to faces and frequent social interactions (Rossion, 2013; Richler and Gauthier, 2014), the template hypothesis postulates an internal origin for holistic processing. Specifically, the face template may be established innately, and its impact can be observed during early infancy (McKone et al., 2007; McKone, 2010).

One implication of the perceptual field hypothesis (Rossion, 2009, 2013) is that participants perceive upright faces in their entirety rather than as a combination of top and bottom halves, even though they are able to pay attention to the top or bottom half upon request. Therefore, both the template and perceptual field hypotheses appear to assume homogeneous or unitary holistic processing within an upright face. Consequently, both hypotheses predict that comparable holistic effects should be observed regardless of whether top or bottom face halves are targets. In contrast, according to the attention strategy hypothesis, holistic processing is a failure of selective attention in the composite task (Richler et al., 2009, 2012; Richler and Gauthier, 2014), and it is unclear whether attentional weights for top and bottom parts are equal. If they are, then holistic processing should be symmetrical; if they are not, holistic processing should be asymmetrical. In fact, a recent study by Chua et al. (2014) showed that attentional weights to different face parts (and hence holistic processing) can be modulated via learning to attend to the top, the bottom, or both face parts, depending on which part or parts were diagnostic for differentiating group members.

Alternatively, processing resources may not be evenly distributed within a face, leading to the prediction of asymmetrical holistic processing in the composite face task. In fact, empirical evidence reviewed by Rossion (2013) suggests that the face research field at large appears to favor a top/bottom asymmetry in holistic processing. The first empirical evidence purportedly supporting asymmetric holistic processing came from Young et al. (1987), where reaction times (RTs) for naming were longer for top than for bottom face halves in the composite (aligned) condition. Furthermore, the magnitude of the alignment effect, that is, the RT difference between the composite (aligned) and non-composite (misaligned) conditions, was greater for top (256 ms) than for bottom (159 ms) face parts.

More generally, Rossion (2013) offered three possible explanations for asymmetry in holistic processing. First, the top part (e.g., eyes and eyebrows) is more important than the bottom part when recognizing identity. Second, the location of optimal fixation for identifying a face is in the top part. Third, the top part includes more elements (two eyes, eyebrows, part of the nose) than the bottom part (essentially a single mouth). Although these possibilities sound reasonable, it is important to re-evaluate the results of Young et al. (1987) more carefully before accepting the asymmetry hypothesis. Specifically, mean RTs in the non-composite (misaligned) condition were shorter for top (1041 ms) than for bottom (1123 ms) parts, even though top (1297 ms) and bottom (1282 ms) parts yielded comparable RTs in the composite condition, suggesting that it was easier for participants to compare misaligned top parts than misaligned bottom parts. Therefore, the results from misaligned (non-composite) trials fail to provide a baseline control for differential holistic processing between top and bottom parts on aligned (composite) trials.
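The re-evaluation above is simple arithmetic on the four mean RTs quoted from Young et al. (1987). The Python sketch below only re-computes numbers already given in the text; the dictionary layout and the helper name `alignment_effect` are illustrative choices, not part of the original study. It makes one point explicit: the difference between the two alignment effects splits exactly into the gap between halves on aligned trials plus the gap between halves on misaligned (baseline) trials.

```python
# Mean naming RTs (ms) from Young et al. (1987), as quoted in the text.
# The dictionary layout and helper name are illustrative, not from the source.
rts = {
    "top": {"aligned": 1297, "misaligned": 1041},
    "bottom": {"aligned": 1282, "misaligned": 1123},
}


def alignment_effect(cond: dict) -> int:
    """Alignment effect: RT(aligned/composite) minus RT(misaligned/non-composite)."""
    return cond["aligned"] - cond["misaligned"]


effects = {half: alignment_effect(c) for half, c in rts.items()}

# Gap between top and bottom halves within each alignment condition.
aligned_gap = rts["top"]["aligned"] - rts["bottom"]["aligned"]
baseline_gap = rts["bottom"]["misaligned"] - rts["top"]["misaligned"]

# The difference between the two alignment effects splits exactly into
# the aligned gap plus the misaligned (baseline) gap.
asymmetry = effects["top"] - effects["bottom"]
assert asymmetry == aligned_gap + baseline_gap

print(effects)                     # {'top': 256, 'bottom': 159}
print(asymmetry)                   # 97
print(aligned_gap, baseline_gap)   # 15 82
```

On these means, 82 of the 97 ms difference between the two alignment effects comes from the misaligned baseline rather than from the aligned trials, which is why the misaligned condition cannot serve as a neutral baseline here.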

In a more recent study, Schwartz et al. (2002) reported the reverse finding: holistic processing (measured as an RT difference) was larger when the bottom face part was the target than when the top face part was the target. However, closer inspection reveals that their results might be driven by ceiling effects. Specifically, accuracy for top parts was 98% in the aligned condition and 99% in the misaligned condition, yielding a relatively small alignment effect (i.e., 1%). In contrast, accuracy for bottom parts was 87% in the aligned condition and 91% in the misaligned condition, resulting in an alignment effect of 4%. The near-perfect performance for top parts in the misaligned condition suggests that top face halves were more easily discriminable than bottom face halves, which may confound the baseline for inferring differential holistic processing. We think it is important to control the relative discriminability of top and bottom face halves before assessing the possibility of differential holistic processing within a face.
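Proportions close to 100% compress differences, so equal percentage-point changes are not equally informative near ceiling. The sketch below is an illustration added here, not an analysis reported by Schwartz et al. (2002): it applies the standard probit (z) transform, via Python's `statistics.NormalDist`, to the accuracies quoted above. On that scale the top-part alignment effect (about 0.27) is no longer smaller than the bottom-part effect (about 0.21), which is one way to see why near-ceiling accuracy makes a weak baseline.

```python
from statistics import NormalDist

# Proportions correct for Schwartz et al. (2002), as quoted in the text above.
ACC = {
    "top": {"aligned": 0.98, "misaligned": 0.99},
    "bottom": {"aligned": 0.87, "misaligned": 0.91},
}

# Probit transform: proportion -> quantile of the standard normal distribution.
z = NormalDist().inv_cdf

for part, acc in ACC.items():
    # Sign convention: misaligned minus aligned (positive = cost of alignment).
    raw = acc["misaligned"] - acc["aligned"]
    probit = z(acc["misaligned"]) - z(acc["aligned"])
    print(f"{part}: raw = {raw:.2f}, probit = {probit:.2f}")
# top: raw = 0.01, probit = 0.27
# bottom: raw = 0.04, probit = 0.21
```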

Finally, some studies have demonstrated that the eye region is more diagnostic than the mouth region, and suggest that changes are easier to detect in the eye region than in the mouth region (Davies et al., 1977; Haig, 1985; Gosselin and Schyns, 2001). For example, Davies et al. (1977) asked participants to select which of six faces matched a target face. Participants were more likely to erroneously choose faces that differed from the target in the mouth than faces that differed in the eyes. In other words, the eyes were more salient face cues, such that participants were more likely to notice when they changed.

Taken together, these findings suggest that the conclusion that there is a top/bottom asymmetry in face processing (Rossion, 2013) may be at least partially due to uneven discriminability between the two face halves. Therefore, in the present study we examined the possibility of differential holistic processing within a face without confounding relative discriminability. Specifically, we first confirmed that the discriminability of top and bottom face parts was equal. Then, we used the complete composite task (Gauthier and Bukach, 2007) to test whether there is differential holistic processing within a face.

Both Young et al. (1987) and Hole (1994) calculated the difference in performance between aligned and misaligned conditions (alignment effect) only for trials on which the top halves were the same, ignoring data from "different" trials (Robbins and McKone, 2007; Rossion, 2013). Gauthier and Bukach (2007) proposed what they called the complete design to replace this traditional composite task, also called the partial design, for two reasons. First, although some researchers have suggested that only data from "same" trials in the partial design should be analyzed (Robbins and McKone, 2007; Rossion, 2013; see **Figure 2**), Gauthier and Bukach (2007) and Richler and Gauthier (2014) argued that data from both "same" and "different" trials should be analyzed, because both are relevant for explaining the composite illusion. When "different" trials are ignored in the partial design, it is impossible to determine whether irrelevant parts facilitate or interfere with performance when relevant parts are different.

The second and perhaps more critical reason is that the partial design is susceptible to response biases (Richler and Gauthier, 2014), because participants tend to respond "same" more often in the upright face condition than in the inverted face condition (Farah et al., 1998; Wenger and Ingvalson, 2003) and in the aligned condition than in the misaligned condition (Gauthier and Bukach, 2007). Moreover, participants are more likely to respond "same" on trials where relevant and irrelevant parts are both "same" or both "different" (congruent trials) than on trials where one part is the same and the other is different (incongruent trials), but in the partial design correct response and congruency are confounded (all "same" trials are incongruent and all "different" trials are congruent).
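The confound in the partial design follows directly from these definitions. The hypothetical helper below (its name is ours) classifies a trial by its correct response and its congruency. Enumerating the complete design yields all four combinations, whereas fixing the irrelevant half to "different", as in the partial design, leaves only "same"/incongruent and "different"/congruent trials.

```python
from itertools import product


def classify(target_same: bool, irrelevant_same: bool) -> tuple[str, str]:
    """Return (correct response, congruency) for one composite-task trial."""
    correct = "same" if target_same else "different"
    # Congruent: target and irrelevant halves would call for the same response.
    congruency = "congruent" if target_same == irrelevant_same else "incongruent"
    return correct, congruency


# Complete design: the irrelevant half may be the same or different.
complete = {classify(t, i) for t, i in product([True, False], repeat=2)}
# Partial design: the irrelevant half is always different.
partial = {classify(t, False) for t in [True, False]}

print(sorted(complete))
print(sorted(partial))  # [('different', 'congruent'), ('same', 'incongruent')]
```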

To rule out these potential problems, Gauthier and Bukach (2007) proposed that holistic processing should be assessed in terms of a congruency effect (i.e., the difference in performance between the congruent and incongruent conditions) using sensitivity (d') as the dependent variable (Green and Swets, 1966). Sensitivity is better on congruent than incongruent trials in the aligned condition, and the magnitude of this congruency effect is reduced in the misaligned condition (see Richler and Gauthier, 2014, for a meta-analysis). We therefore consider the complete design of the composite task appropriate for the present purposes.
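Sensitivity here is the standard signal-detection index d' = z(H) - z(FA), where z is the inverse of the standard normal cumulative distribution, H the hit rate and FA the false-alarm rate. The sketch below assumes, as in the Design section of Experiment 1, that a "same" response on a "same" trial is a hit and a "same" response on a "different" trial is a false alarm. The log-linear correction for rates of 0 or 1 is our assumption; the text does not say how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits: int, n_same: int, false_alarms: int, n_different: int) -> float:
    """Sensitivity d' = z(H) - z(FA) for a same/different task.

    hits: "same" responses on "same" trials; false_alarms: "same" responses
    on "different" trials. Assumed log-linear correction: add 0.5 to each
    count and 1 to each trial total, so rates never reach 0 or 1.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_same + 1)
    fa_rate = (false_alarms + 0.5) / (n_different + 1)
    return z(hit_rate) - z(fa_rate)


# Usage with illustrative counts (not data from this study):
print(round(d_prime(28, 32, 6, 32), 2))
```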

In the present study, we calculated congruency effects as an index of holistic processing, and expected to find an interaction between congruency and alignment, with a larger congruency effect for aligned than misaligned faces (Richler et al., 2011a; Wong et al., 2012; Richler and Gauthier, 2014). Moreover, to avoid potential confounds, we verified that perceptual discriminability was equivalent between the top and bottom face halves used in Experiment 1.
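The predicted pattern can be stated as two differences of cell means. The sketch below, with hypothetical function and key names, takes one participant's d' in the four alignment × congruency cells, computes the congruency effect (congruent minus incongruent) separately for aligned and misaligned trials, and returns their difference as the interaction contrast. A positive contrast corresponds to the predicted reduction of the congruency effect by misalignment.

```python
def congruency_effects(d: dict[tuple[str, str], float]) -> dict[str, float]:
    """Congruency effects and their alignment contrast for one participant.

    `d` maps (alignment, congruency) -> d', e.g. ("aligned", "congruent") -> 2.5.
    """
    ce_aligned = d[("aligned", "congruent")] - d[("aligned", "incongruent")]
    ce_misaligned = d[("misaligned", "congruent")] - d[("misaligned", "incongruent")]
    return {
        "aligned": ce_aligned,
        "misaligned": ce_misaligned,
        # Congruency x alignment interaction: positive when misalignment
        # shrinks the congruency effect.
        "interaction": ce_aligned - ce_misaligned,
    }
```

Computing the contrast per participant, rather than on group means, is what allows it to be tested against zero across participants.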

# EXPERIMENT 1

The goal of Experiment 1 was to test whether the magnitude of holistic processing would differ when the top versus bottom face half was the target in the complete design of the composite task (Gauthier and Bukach, 2007; Wong et al., 2012). Prior to the composite task, we verified that top and bottom face halves were equally discriminable.

# Materials and Methods

#### Participants

Sixteen college students (6 male, 10 female) from National Chung Cheng University participated in Experiment 1 for NT\$100. Mean age was 21.5 years (SD = 2.56, range = 18–25 years). All participants had normal or corrected-to-normal vision. Participants were recruited with the approval of the Research Ethics Committee of National Chung Cheng University, Chia-Yi, Taiwan.

#### Design

We adopted the complete design and computed a measure of sensitivity (d') for each participant as the dependent variable. "Same" responses on "same" trials were defined as hits, and "same" responses on "different" trials were defined as false alarms. In each trial, two composite faces were shown simultaneously. The top or bottom part was designated as the target for each block. For aligned composites, the top and bottom face halves were modified slightly when necessary to create a smooth alignment between the two halves. For misaligned composites, the top and bottom face halves were offset horizontally. The same face stimuli were used for aligned and misaligned conditions regardless of whether the target was the top or bottom face half.

#### Stimuli

For face stimuli, we first created 32 different Asian face images with an equal number of male and female faces using FaceGen 3.1 (Singular Inversions, Canada). Half (eight male and eight female) were designated as the target set, and the remaining half were designated as the irrelevant set. To ensure that top and bottom face halves were equally discriminable, we tested another group of 14 college students (six female, eight male) from National Chung Cheng University in a task where face halves (top or bottom) were presented alone. A pair of face halves was presented in each trial, and participants were asked to judge whether or not the two halves were identical. Each participant completed eight practice trials and 256 formal trials, which took about 20 min. Mean performance for top face halves (M = 2.07) was comparable to mean performance for bottom face halves (M = 2.17), t(13) = 0.675, p > 0.05, suggesting that the face halves were equally discriminable. These face halves were then used to construct face composites.

Top halves from the target set were randomly paired with bottom halves from the irrelevant set to create face composites in accordance with the complete design illustrated in **Figure 2**. Specifically, there were 16 composites for each of the four composite types (A/B, A/C, D/C, and D/B) in **Figure 2**.

Each face image was shown in grayscale on a black square background 100 pixels on each side. When presented on the display screen, each face was about 4.01 cm wide and 4.80 cm high, subtending a visual angle of about 5.10° × 6.11° at a viewing distance of approximately 45 cm. A white line extending beyond the edges of the face was overlaid horizontally across the middle of each face to clearly demarcate the top and bottom halves. The line was 8.18 cm long and 0.14 cm high, subtending 10.39° × 0.18° of visual angle. The white line did not disrupt the perceptual integrity of the face, but was necessary to clearly distinguish the top and bottom halves (Rossion, 2013). The top and bottom halves of faces were separated by about 2° of visual angle (25 pixels) in the misaligned condition.
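The stimulus sizes above can be reproduced with the standard full-angle formula θ = 2·atan(s / 2d), where s is the object size and d the viewing distance; the original does not state which formula was used, so this is an assumption that happens to match the reported values. The function name below is ours.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


VIEWING_DISTANCE_CM = 45.0
for label, size in [("face width", 4.01), ("face height", 4.80),
                    ("line length", 8.18), ("line height", 0.14)]:
    print(f"{label}: {visual_angle_deg(size, VIEWING_DISTANCE_CM):.2f} deg")
# face width: 5.10 deg
# face height: 6.11 deg
# line length: 10.39 deg
# line height: 0.18 deg
```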

#### Procedure

In each trial (**Figure 3**), a fixation cross was shown for 500 ms, followed by a pair of composite faces presented for 2000 ms. Participants were asked to judge whether the top halves of the faces were identical while ignoring the bottom halves, or vice versa. One face was located in the upper-left quadrant and the other in the lower-right quadrant to discourage feature-by-feature comparison. The center of the face in the upper-left quadrant was about 4.69 cm (6° of visual angle) below the top edge of the monitor and about 13.31 cm (16.83° of visual angle) from the left edge of the monitor. The center of the face in the lower-right quadrant was roughly the same distances from the bottom and right edges of the monitor. The two faces were separated by a center-to-center distance of about 14.12 cm (17.83° of visual angle). The top half was the target in one block, and the bottom half was the target in another block. The order of the two blocks was counterbalanced across participants. It took about 40 min for participants to complete 24 practice trials and 256 experimental trials. Note that the same set of 256 composite images was used in both blocks.

# Results and Discussion

fpsyg-07-01506 October 1, 2016 Time: 13:47 # 6

Mean d' was computed in each condition and submitted to a three-way repeated-measures ANOVA with part (top vs. bottom), congruency (congruent vs. incongruent), and alignment (aligned vs. misaligned) as within-participant variables. As illustrated in **Figure 4**, the main effects of congruency and part were both significant, F(1,15) = 5.01, MSE = 2.60, p < 0.05, and F(1,15) = 44.27, MSE = 17.41, p < 0.001, respectively. Performance in the bottom part condition (M = 2.21) was better than that in the top part condition (M = 1.92), and performance on congruent trials (M = 2.43) was better than that on incongruent trials (M = 1.70). The two-way interactions between part and congruency, F(1,15) = 6.57, MSE = 1.92, p < 0.05, and between alignment and congruency, F(1,15) = 30.56, MSE = 7.13, p < 0.001, were also significant. The difference between congruent and incongruent trials in the top part condition (M = 0.98) was greater than that in the bottom part condition (M = 0.49). The difference between congruent and incongruent trials in the aligned condition (M = 1.21) was greater than that in the misaligned condition (M = 0.27). However, the three-way interaction between part, alignment, and congruency was not significant, F < 1.

The two-way interaction between alignment and congruency is consistent with many previous studies (Richler and Gauthier, 2014), indicating that irrelevant parts were less likely to affect judgments of relevant parts in the misaligned (M = 0.27) than in the aligned condition (M = 1.21) because spatial misalignment disrupts holistic processing. To further examine the possibility of differential holistic processing, we submitted the difference in d' between congruent and incongruent trials (the congruency effect) to a two-way repeated-measures ANOVA with part and alignment as within-participant factors. As illustrated in **Figure 5**, the main effects of part and alignment were both significant, F(1,15) = 6.57, MSE = 3.85, p < 0.022, and F(1,15) = 30.56, MSE = 14.27, p < 0.001, respectively. The congruency effect for the top part condition (M = 0.49) was greater than that for the bottom part condition (M = 0.23). The congruency effect in the aligned condition (M = 0.60) was greater than that in the misaligned condition (M = 0.13). However, the interaction between part and alignment was not significant, F < 1. Therefore, we found no evidence for differential holistic processing between the top and bottom parts.
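Because every factor in this design has two levels, each effect tested above has one degree of freedom, and a repeated-measures F(1, n - 1) equals the squared one-sample t computed on per-participant contrast scores. This is consistent with the identical F values (6.57 and 30.56) obtained for part × congruency and alignment × congruency in the three-way ANOVA and for the main effects of part and alignment on the congruency effect. The function below is a minimal sketch of that identity, with a hypothetical name; it is not the software used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def single_df_f(contrast_scores: list[float]) -> float:
    """Repeated-measures F(1, n-1) for a single-df within-participant effect.

    Each participant contributes one contrast score, e.g. (top-part congruency
    effect) minus (bottom-part congruency effect). F equals the squared
    one-sample t statistic testing the mean contrast against zero.
    """
    n = len(contrast_scores)
    if n < 2:
        raise ValueError("need at least two participants")
    t = mean(contrast_scores) / (stdev(contrast_scores) / sqrt(n))
    return t * t
```

Multiplying every contrast score by the same constant leaves F unchanged, so only the pattern of contrast weights, not their scaling, determines the test.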

Although the three-way interaction between part, alignment, and congruency, which would indicate an asymmetry in holistic processing between top and bottom face halves, was not significant, it is worth noting that the two-way interaction between part and congruency was significant. Follow-up analyses showed a difference between the top and bottom parts on incongruent but not congruent trials, F(1,15) = 10.54, MSE = 2.25, p < 0.01, and F < 1, respectively. These results suggest that the top and bottom parts might not have been equally discriminable.

However, these findings do not necessarily mean that we failed to control perceptual discriminability between the top and bottom parts. Rather, a more plausible explanation may have to do with the fact that face halves with equivalent discriminability when presented in isolation were positioned together to create whole faces. Instructions to respond to the target part while ignoring the irrelevant part may not completely prevent perceptual input from the latter while participants presumably focused processing on the former. As Rossion (2009) and van Belle et al. (2010) predict, the perceptual field likely encompasses the entire face when it is presented upright (compared to when it is inverted). Moreover, although the face features included in the perceptual field may be identical regardless of which part is the target when the two parts are aligned, this may not be the case when the two parts are separated in the misaligned condition.

As illustrated in the left half of **Figure 6**, when the top of a face is the target, the perceptual field may contain more facial details than when the bottom part is the target. This difference may be more disruptive to performance on incongruent trials, where the top and bottom parts elicit contrasting responses, than on congruent trials, where the two parts elicit identical responses. These differences may have contributed to the interaction between part and congruency in Experiment 1. In fact, inspection of **Figure 4** suggests that aligned and misaligned conditions yielded comparable performance for top and bottom parts on congruent trials, whereas on incongruent trials performance for top and bottom parts differed depending on alignment.

FIGURE 5 | Congruency effects, defined as differences in d' between congruent and incongruent trials, as a function of face part (top vs. bottom) and alignment (aligned vs. misaligned) in Experiment 1. Error bars indicate ±1 standard error of the mean.

# EXPERIMENT 2

As a better control for the potential confound discussed above, we further displaced top and bottom parts so that they were completely separated in the misaligned condition (see right side of **Figure 5**), which may additionally serve to eliminate lingering holistic effects observed in that condition of Experiment 1 (**Figure 4**). It is worth noting that in Experiment 1, variability of the congruency effect for the aligned condition when top half was the target was relatively large compared to the other conditions (**Figure 5**). To reduce performance variability, we doubled the number of trials in Experiment 2.

# Materials and Methods

#### Participants


Nineteen college students (9 male, 10 female) from National Chung Cheng University in Chiayi County, Taiwan, participated in Experiment 2. All participants had normal or corrected-to-normal vision and received NTD\$120 for their participation.

#### Stimuli

Stimuli were the same as in Experiment 1, except that we increased the separation between top and bottom face halves in the misaligned condition. The top part was displaced to the right by about 4° of visual angle in Experiment 2, double the displacement used in Experiment 1.

#### Procedure

The procedure was identical to Experiment 1. Each participant completed eight practice trials and 512 formal trials, which took about 50 min.

# Results and Discussion

As illustrated in **Figure 7**, mean d' was computed in each condition and submitted to a three-way repeated-measures ANOVA with part, congruency, and alignment as within-participant variables. The main effects of part, alignment, and congruency were significant, F(1,18) = 4.56, MSE = 3.94, p < 0.05, F(1,18) = 10.08, MSE = 1.51, p < 0.001, and F(1,18) = 67.52, MSE = 10.22, p < 0.001, respectively. Performance in the top part condition (M = 2.51) was better than that in the bottom part condition (M = 2.18). Performance in the misaligned condition (M = 2.45) was better than that in the aligned condition (M = 2.24). Performance on congruent trials (M = 2.61) was better than that on incongruent trials (M = 2.09). The two-way interaction between alignment and congruency was also significant, F(1,18) = 22.38, MSE = 5.33, p < 0.001. The difference between congruent and incongruent trials in the aligned condition (M = 0.45) was greater than that in the misaligned condition (M = 0.07). However, the three-way interaction was not significant, F < 1. As indicated in **Figure 7**, the interaction between congruency and alignment was very similar regardless of whether the top or bottom half was the target. Unlike in Experiment 1, the interaction between part and congruency was not significant, F(1,18) = 1.28, MSE = 0.22, p > 0.05.

This observation was further confirmed when we used the magnitude of the congruency effect (i.e., the difference in d' between congruent and incongruent trials) as the dependent variable and performed a two-way repeated-measures ANOVA with part and alignment as within-participant factors. As shown in **Figure 8**, only the main effect of alignment was significant, F(1,18) = 22.38, MSE = 10.67, p < 0.001. The congruency effect in the aligned condition (M = 0.89) was greater than that in the misaligned condition (M = 0.14). Neither the main effect of part nor its interaction with alignment was significant, Fs < 1. These results indicate that, compared to Experiment 1, we were better able to control perceptual discriminability between top and bottom parts when we enlarged the spatial separation between them in the misaligned condition.

# GENERAL DISCUSSION

The main purpose of the present study was to examine whether differential holistic processing between the top and bottom face parts, measured by the congruency effect in the complete design, would be eliminated when the parts were equated in terms of perceptual discriminability. In Young et al. (1987), reaction times were longer in the aligned than in the misaligned condition, and there was an interaction between part and alignment. Rossion (2013) recently suggested that this finding is indicative of a top–bottom asymmetry in the composite effect, where holistic processing is larger for the top than the bottom part.

However, the differential holistic processing obtained by Young et al. (1987) may have been due to a confound from stimulus discriminability. To avoid this confound, it is necessary to control discriminability between top and bottom face parts. In the stimulus pretest for Experiment 1, participants performed comparably when top or bottom halves were presented in isolation, indicating that top and bottom face halves were equally discriminable perceptually. Furthermore, the results of Experiments 1 and 2 suggest that holistic processing is distributed homogeneously within an upright face, consistent with predictions derived from both the template and perceptual field hypotheses, which suggest that upright faces induce a relatively large perceptual spatial window that encompasses the entire face. Our findings are also consistent with predictions based on the attention strategy hypothesis if attentional weights are equal for all face parts.

Given our findings, we suggest that the results of Young et al. (1987), which have been taken as an indication of top–bottom asymmetry, might have been caused by differences in stimulus discriminability. In addition to the physical factor of discriminability, it is worth noting that Rossion (2013) proposed several other factors that may affect the homogeneity of holistic processing within a face. For example, there are more fixations on the eye region than on the mouth region (Bombari et al., 2009; Xu and Tanaka, 2013). Moreover, the eye region seems to be more attractive and to convey more social information (Baron-Cohen et al., 1997). In contrast, some patients (e.g., those with prosopagnosia) attend less to the top half of faces (Orban de Xivry et al., 2008; Ramon et al., 2010).

# CONCLUSION

The present study was designed to examine whether there is differential holistic processing within a face. Our findings demonstrate a top/bottom symmetry, not asymmetry, in holistic processing, lending credence to the proposal that representations underlying holistic processing are unitary and homogeneous, with equal weighting between top and bottom face parts. Although our results support the general notion of symmetric holistic processing within an upright face, this does not necessarily mean that the magnitude of holistic processing for top and bottom parts cannot be altered. Quite the contrary: recent studies have shown that attention and experience can modulate holistic processing (Chua et al., 2014; Richler and Gauthier, 2014). Alternatively, researchers could consider the possibility that the holistic encoding (template) and attention strategy hypotheses are both in operation, such that while representations of upright faces are holistic, their processing can be subject to attentional modulation. For example, in Experiment 2, we enlarged the separation between top and bottom parts in the misaligned condition to the point that they were completely separated by 4° without any visible overlap (see the two panels on the left in **Figure 6**). We speculate that with the complete separation of top and bottom parts in the misaligned condition, participants in Experiment 2 probably had more opportunity to learn, perhaps by more effectively constricting their perceptual field to the top part when it was the target, thereby minimizing potential interference from the irrelevant bottom part, especially when the bottom part would elicit an incongruent response. This may be why no significant difference between top and bottom parts was found when we used the congruency effect as the dependent measure in Experiment 2.
In future studies, we seek to unravel the factors that may modulate holistic processing, especially with respect to predictions based on the holistic encoding versus attention strategy hypotheses.

# AUTHOR CONTRIBUTIONS

GS contributed to the rationale for the experiments, analyzed the data, and drafted the manuscript. C-CW identified a conflict in the rationale, designed the experiments, and collected and analyzed the data. Both authors revised the manuscript and replied to the reviewers.

# ACKNOWLEDGMENTS

This study was supported by a research grant (No. 103-2815-C-194-016-H) from the Ministry of Science and Technology in Taiwan, ROC, awarded to GS. We would like to express our appreciation to Isabel Gauthier and Jennifer Richler for their comments on an early version of the manuscript. We also thank Patty Lee for her assistance in data collection for Experiment 2. Finally, we would like to express our gratitude for the candid and constructive comments from three reviewers, which have greatly improved the manuscript. A portion of the study was presented at the 15th Annual Meeting of the Vision Sciences Society in St. Pete Beach, Florida, USA in May 2015.

## REFERENCES


Tanaka, J. W., and Farah, M. J. (1993). Parts and wholes in face recognition. Q. J. Exp. Psychol. Hum. Exp. Psychol. 46, 225–245. doi: 10.1080/14640749308401045

van Belle, G., de Graef, P., Verfaillie, K., Rossion, B., and Lefèvre, P. (2010). Face inversion impairs holistic perception: evidence from gaze-contingent stimulation. J. Vis. 10, 1–13. doi: 10.1167/10.5.10


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer EB and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Shyi and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Visual scanning behavior is related to recognition performance for own- and other-age faces

*Valentina Proietti<sup>1,2</sup>\*, Viola Macchi Cassia<sup>2,3</sup>, Francesca dell'Amore<sup>3</sup>, Stefania Conte<sup>2,3</sup> and Emanuela Bricolo<sup>2,3</sup>*

*<sup>1</sup> Department of Psychology, Brock University, St. Catharines, ON, Canada, <sup>2</sup> NeuroMI, Milan Center for Neuroscience, Milan, Italy, <sup>3</sup> Department of Psychology, University of Milano-Bicocca, Milan, Italy*

It is well-established that our recognition ability is enhanced for faces belonging to familiar categories, such as own-race faces and own-age faces. Recent evidence suggests that, for race, the recognition bias is also accompanied by different visual scanning strategies for own- compared to other-race faces. Here, we tested the hypothesis that these differences in visual scanning patterns extend also to the comparison between own- and other-age faces and contribute to the own-age recognition advantage. Participants (young adults with limited experience with infants) were tested in an old/new recognition memory task where they encoded and subsequently recognized a series of adult and infant faces while their eye movements were recorded. Consistent with findings on the other-race bias, we found evidence of an own-age bias in recognition which was accompanied by differential scanning patterns, and consequently differential encoding strategies, for own- compared to other-age faces. Gaze patterns for own-age faces involved a more dynamic sampling of the internal features and longer viewing time on the eye region compared to the other regions of the face. This latter strategy was extensively employed during learning (vs. recognition) and was positively correlated with discriminability. These results suggest that deeply encoding the eye region is functional for recognition and that the own-age bias is evident not only in differential recognition performance, but also in the employment of different sampling strategies found to be effective for accurate recognition.

#### *Edited by:*

*Andrew Bayliss, University of East Anglia, UK*

#### *Reviewed by:*

*Tilo Strobach, Medical School Hamburg, Germany*

*Peter James Hills, Anglia Ruskin University, UK*

> *\*Correspondence: Valentina Proietti vproietti@brocku.ca*

### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 31 May 2015 Accepted: 19 October 2015 Published: 03 November 2015*

#### *Citation:*

*Proietti V, Macchi Cassia V, dell'Amore F, Conte S and Bricolo E (2015) Visual scanning behavior is related to recognition performance for own- and other-age faces. Front. Psychol. 6:1684. doi: 10.3389/fpsyg.2015.01684*

Keywords: face age, age bias, eye movements, encoding, recognition, adult faces, infant faces

# INTRODUCTION

It is well-known that our ability to recognize faces varies depending on certain facial dimensions: individuals generally recognize human faces and faces of their own race more accurately and faster than other-species faces (see review in Dufour et al., 2006) and other-race faces (see review by Meissner and Brigham, 2001). Age is also known to affect how faces are remembered. In a seminal study by Bäckman (1991), young adults recognized own-age faces more accurately than other-age faces regardless of whether the faces were familiar (famous) or unfamiliar. This original finding of an advantage in the processing of own-age compared to other-age faces (i.e., the own-age bias, OAB) in young adults has been replicated in numerous studies investigating either identity recognition (in eyewitness paradigms or old/new recognition tasks) or identity matching (in delayed match-to-sample tasks), when performance for young adult (i.e., own-age) faces was compared to that for older adult faces (e.g., Anastasi and Rhodes, 2006; Wiese et al., 2008; He et al., 2011), child faces (Anastasi and Rhodes, 2005; Kuefner et al., 2008; Harrison and Hole, 2009; Hills and Lewis, 2011), or infant faces (Kuefner et al., 2008; Macchi Cassia et al., 2009a,b; Yovel et al., 2012).

For all these dimensions, the faces that are more readily recognized (human faces, own-race faces and own-age faces), when compared with their within-category counterparts (other-species faces, other-race faces and other-age faces), are those with which participants have accumulated abundant experience. Superior recognition of faces from highly familiar categories has been attributed to perceptual expertise as well as to social cognitive factors. According to perceptual expertise accounts, extensive experience with faces from a given category (e.g., own-race) results in exquisite sensitivity to differences among faces in, for example, the shape and spacing of facial features (e.g., Rhodes et al., 2009; Tanaka and Pierce, 2009; Mondloch et al., 2010). According to social cognitive accounts, adults encode faces of in-group members at the individual level, whereas they encode faces of out-group members at the categorical level (Levin, 2000; Sporer, 2001; Ge et al., 2009). Recent proposals have argued for an integrative framework in which social cognition and perceptual expertise interact in determining an individual's sensitivity to individuating facial characteristics (Sporer, 2001; Young et al., 2012).

Indeed, there is ample evidence that adults process faces from different races differently, both in terms of the underlying neural mechanisms and the associated visual processing strategies. For example, electrophysiological studies have found that the face-sensitive N170 is of larger amplitude in response to upright other-race faces than to upright own-race faces, and that face inversion affects this component more for the latter than the former. These results suggest that although configural/holistic information is extracted from faces of both racial groups, upright other-race faces impose increased processing demands (e.g., Caharel et al., 2011; Montalan et al., 2013). Although results are not always consistent, several behavioral studies have suggested that both configural/holistic information (e.g., Tanaka et al., 2004; Michel et al., 2006) and featural cues (e.g., Hayward et al., 2008; Mondloch et al., 2010) are extracted more effectively from own-race faces than from other-race faces.

More recently, the question of whether, and to what extent, the own-race bias in face memory is related to perceptual processing differences has been productively addressed using eye-tracking methodologies, which provide a direct measure of visual scanning behavior through on-line recording of visual fixations on various portions of the face with high temporal and spatial resolution. When viewing faces, adults spend more time fixating the internal features, e.g., the eyes, nose and mouth (e.g., Janik et al., 1978; Walker-Smith et al., 2013), and this scanning strategy is related to subsequent recognition (e.g., Henderson et al., 2005). Given that eye movements are important for face memory, several studies have explored whether recognition deficits observed for faces belonging to less familiar race groups can be related to non-optimal exploration of these faces during encoding and/or recognition. Conflicting findings have been obtained in the investigation of this hypothesis. Some recent studies have shown how culture affects the way people view faces: Western observers tend to look longer at the eye region (reflecting the use of analytic perceptual strategies), whereas East Asians tend to focus more on the nose region (possibly reflecting the use of more holistic perceptual strategies; Blais et al., 2008; Caldara et al., 2010; Fu et al., 2012; Hills and Pake, 2013). While some studies found that these cross-cultural variations in scanning strategies do not differ for own- compared to other-race faces (Blais et al., 2008; Caldara et al., 2010; Hills and Pake, 2013), other studies showed that these variations are modulated by face race (East Asian participants: Fu et al., 2012; Hu et al., 2014; Western participants: Goldinger et al., 2009; Wu et al., 2012; McDonnell et al., 2014).
Western participants were found to make more fixations on the eye region of same-race faces than of other-race faces, and to fixate longer on the nose and mouth region of Asian than of Caucasian faces (e.g., Goldinger et al., 2009); they have also been reported to make a larger number of shorter fixations while exploring own-race compared to other-race faces, suggesting the use of more active scanning strategies for the former than for the latter (e.g., Wu et al., 2012). The same pattern of scanning behavior is observed during recognition, as the eyes of same-race faces are sampled more often than those of other-race faces, whereas the opposite occurs for the mouth (Nakabayashi et al., 2012).

Compared with the own-race bias, how faces of different ages are perceptually encoded and processed has received little investigation. In young adults, the behavioral own-age recognition advantage is mirrored by ERP responses showing a higher degree of specialization for own-age (i.e., young) faces than for other-age (i.e., older) faces (larger N170, VPP, and frontocentral P200 for older compared to young faces; larger occipital P200 for young compared to older faces; Wiese et al., 2008, 2012; Ebner et al., 2011). However, evidence of perceptual processing differences between adult faces and faces belonging to other age groups comes mainly from studies comparing how much stimulus manipulations known to hinder configural and/or holistic processing disrupt the discrimination of those faces, as indexed by the face inversion effect (e.g., Kuefner et al., 2008) and the composite-face effect (e.g., de Heering and Rossion, 2008). These studies have shown that adults rely more heavily on expert configural/holistic strategies when processing own-age faces than when processing elderly adult faces (Proietti et al., 2013; Wiese et al., 2013), child faces (de Heering and Rossion, 2008; Kuefner et al., 2008, 2010), and infant faces (Macchi Cassia et al., 2009a).

Critically, although this evidence clearly supports the hypothesis of a perceptual processing advantage for young adult faces over a wide range of other-age face types, investigations of how individuals visually scan own- and other-age faces, and of how differences in scanning behavior may relate to recognition performance, are quite limited. To the best of our knowledge, only three studies have addressed this question by recording young adult participants' eye movements through eye-tracking methodologies, and all focused on the comparison between young adult (i.e., own-age) and elderly adult faces (Firestone et al., 2007; He et al., 2011; Short et al., 2014). Results converge in showing that young adults look longer at own-age faces than at older adult faces, both when faces are presented in isolation (Firestone et al., 2007; He et al., 2011) and when they are embedded in naturalistic scenes and the two face ages directly compete for attention (Short et al., 2014).

Among these studies, though, only Firestone et al. (2007) actually investigated whether the distribution of eye movements across facial regions differed for young and older adult faces: Short et al. (2014) treated each face as a single region of interest, and He et al. (2011) only divided each face into lower and upper halves and found no difference between young and older adult faces in the distribution of looking time across the two regions. Firestone et al.'s (2007) results confirmed the general tendency of Caucasian observers to look longest at the eye region, followed by the nose and the mouth regions. However, although young (i.e., own-age) faces received more transitions between facial regions than older adult faces, they also received less sampling of the eyes and more sampling of the nose and mouth. Moreover, the authors found that, irrespective of face age, increased looking time on the nose region was associated with successful subsequent recognition. The authors concluded that patterns of eye scanning during the encoding of unfamiliar faces are critically related to recognition. However, the finding that looking at the nose, rather than at the eye region, mediated correct identification is at odds with demonstrations that longer looking at the upper facial regions (i.e., hair, eyes) results in more accurate recognition of own-race faces (McDonnell et al., 2014).

The aim of the present study was to extend available evidence on the relationship between visual scanning behavior and recognition performance for own- and other-age faces by comparing the eye movement scanning patterns exhibited by young adult participants while encoding and recognizing adult and infant faces within an old/new recognition memory task. Infant faces were chosen because infants are rarely present in a typical adult's everyday environment, so individuals' exposure to this face category is very limited and can be estimated rather well. The influence of experience with infant faces was controlled by selecting participants who had no or limited direct contact with infants (i.e., infant novices), according to the same criteria used in previous studies comparing discrimination and processing abilities for adult and infant faces (Kuefner et al., 2008; Macchi Cassia et al., 2009a,b; see also Yovel et al., 2012). In these studies, infant novices showed better discrimination for young adult faces than for infant faces in a delayed two-alternative forced-choice matching-to-sample task, in which they were asked to match a briefly presented target face to one of two simultaneously presented test faces appearing after a short delay. Critically, adult participants also showed an inversion effect that was selective for young adult faces. Because it is well established that at least a portion of the inversion effect is related to configural processing of upright faces (Mondloch et al., 2002), the authors interpreted the complete absence of an inversion effect for infant faces as evidence that configural processing was not engaged to any extent for the recognition of these faces.

In light of this evidence, the present study had three main goals: (1) to extend available evidence of a recognition bias for adult over infant faces using an old/new recognition memory task; (2) to investigate whether adults show differences in gaze patterns while encoding and/or recognizing adult and infant faces; (3) to test whether these differences in gaze patterns are related to recognition performance. Based on the overarching hypothesis that face recognition varies as a function of expertise with different face categories (i.e., own- vs. other-age faces) and that such improvement may be explained by differential visual encoding strategies, we predicted that: (1) participants would show an own-age recognition advantage as indicated by higher recognition accuracy and/or lower response times (RTs) for adult compared to infant faces; (2) looking behavior (looking time and the dynamicity of visual exploration) would differ for adult and infant faces; (3) recognition performance would vary as a function of looking behavior, possibly with longer fixations on the upper regions of the face being linked to more efficient subsequent recognition.

# MATERIALS AND METHODS

# Participants

Participants were 27 female university students aged 19 to 29 years (*M* = 23.89 years, *SD* = 2.05). They were eligible if they had no offspring and had not acquired extensive experience with infants (i.e., children aged 2 years or younger). To this end, potential participants were screened before testing via a questionnaire with specific questions assessing whether, in the past 5 years, they had had nieces or nephews, contact with infants of friends or acquaintances, and/or a job that put them in contact with infants. Inclusion criteria were identical to those of earlier studies investigating the own-age bias in participants with little or no experience with infants (Macchi Cassia et al., 2009a,b; i.e., less than 520 h of experience per year in the past 5 years). Participants included in the sample had acquired an average of 91.48 h (*SD* = 114.62, range = 0–520) of experience per year over the past 5 years. All participants were Italian and right-handed, and all had normal or corrected-to-normal vision. All procedures used in the current study complied with the ethical standards outlined in the Declaration of Helsinki (BMJ 1991; 302: 1194) and were approved by the Ethics Committee of the University of Milano-Bicocca. All participants signed an informed consent form before testing and received course credits for their participation.

# Stimuli

Twenty-four color photos of female adult faces and 24 photos of infant (aged 3–5 months) faces were used as stimuli. Faces were all Caucasian, frontal, and with neutral expression; an oval-shaped occluder was placed on each face to conceal background information (e.g., hair and ears; **Figure 1**). The hue and brightness of the color face images (resolution 72 dpi) were equalized, and all images were normalized to the same width (306 pixels, 7.2 cm, 6.3° of visual angle). The height of the stimuli, and consequently of the occluder, differed between the two types of faces in order to maintain ecological validity (adult faces = 10.6 cm, 9.3° of visual angle; infant faces = 7.94 cm, 7° of visual angle). Faces in each age group were normalized to the same shape and size. Moreover, the positions of the eyes, nose, and mouth were aligned to the locations of these features in the average image computed on the 24 stimuli, so that the major features of all face stimuli were located in the same face regions.
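The visual angles above can be cross-checked with the standard formula θ = 2·arctan(s / 2D) for a stimulus of size s viewed at distance D. The sketch below assumes a viewing distance of 65 cm, taken from the Apparatus description (where it is given as the approximate distance to the eye-tracker camera at the base of the screen), so it is only an approximation; the function name is ours.

```python
from math import atan, degrees


def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle (degrees) subtended by a stimulus of size_cm,
    centred on the line of sight, viewed from distance_cm."""
    return degrees(2 * atan(size_cm / (2 * distance_cm)))


# Assumed viewing distance of 65 cm; widths/heights from the Stimuli section.
sizes_cm = {"width (both ages)": 7.2, "adult height": 10.6, "infant height": 7.94}
angles = {name: round(visual_angle_deg(s, 65), 1) for name, s in sizes_cm.items()}
# angles -> {'width (both ages)': 6.3, 'adult height': 9.3, 'infant height': 7.0}
```

Under this assumption the formula reproduces the reported values (6.3°, 9.3°, and about 7°), which suggests the stimulus sizes and viewing distance are mutually consistent.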

# Apparatus

All faces appeared on a light gray background at the center of a 19-inch Samsung SyncMaster 1200 NF screen with a resolution of 1024 × 768 pixels. Stimulus presentation and response collection were controlled by E-Prime 2.0 software. Participants' eye movements were recorded using an Applied Science Laboratories (ASL) Model 504 Eye Tracker 6 system. Participants rested their head on a chin-rest and sat about 65 cm from the eye tracker camera, which was located at the base of the presentation screen and recorded eye movements at a sampling rate of 50 Hz.

# Procedure

Participants were tested in an old/new face recognition task while their eye movements were recorded. A manual calibration of gaze position was conducted at the beginning of the testing session, and repeated at the beginning of each experimental block, using a nine-point fixation procedure. The calibration was validated and repeated when necessary until the optimal calibration criterion was reached.

Each trial started with a fixation cross at the center of the screen, which participants had to fixate for 500 ms for the target face to appear for 3000 ms. Participants were instructed to inspect carefully and memorize a sequence of 12 adult and 12 infant faces presented in random order at the center of the screen. Consecutive faces were separated by a 1000-ms gray noise mask, intended to reduce possible retinal persistence effects, followed by the 500-ms fixation point (see **Figure 1**).

After the 24 trials of the learning phase, participants performed a filler task that created a temporal gap between the learning and recognition phases and reduced potential recency effects. In brief, the filler was an object search task in which participants were asked to identify a specific shape (e.g., a square) among distractor shapes (e.g., triangles). Once the target was identified, a new trial began, and this process repeated until 3 min had elapsed. Participants' eye movements during the filler task were not recorded and their performance was not analyzed. Immediately afterward, the test phase began with the presentation of the 24 familiar faces previously seen in the learning phase plus 24 new faces (12 adult and 12 infant), randomly intermixed. Each trial started with a fixation cross at the center of the screen, which participants had to fixate for 500 ms for the target face to appear. The face remained on the screen until the participant classified it as familiar (already seen in the learning phase) or novel by pressing one of two joystick buttons (**Figure 1**). The face images presented in the learning and test phases were counterbalanced between participants, as was the response assigned to each joystick button.

# RESULTS

# Behavioral Performance

Three behavioral performance measures were computed for each participant separately for adult and infant faces (sensitivity index, d'; response bias, c; and mean correct response times, RTs) and analyzed to test our first prediction that participants would show an own-age recognition advantage, as revealed by higher recognition accuracy and/or lower RTs for adult compared to infant faces. **Table 1** shows the mean and standard error of the mean (SE) for each measure. To assess the own-age bias in the recognition data, we conducted paired-sample *t*-tests comparing each performance measure between adult and infant faces (**Table 1**). Participants recognized adult faces more accurately than infant faces, as indicated by a significant difference in the sensitivity index, *t*(26) = 2.226, *p* = 0.035 (i.e., higher d' for adult than for infant faces). The comparisons for response bias and mean RTs did not reach statistical significance (c: *p* = 0.074; mean RTs: *p* = 0.094), although the pattern for mean RTs was in the expected direction. Because it is not unusual to obtain a recognition bias on some measures but not others in similar tasks (Meissner and Brigham, 2001; McDonnell et al., 2014), we take these data as reflecting the presence of an own-age bias (OAB).
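For readers reproducing these measures, the signal-detection indices follow the standard definitions d' = z(H) - z(FA) and c = -[z(H) + z(FA)] / 2, where H and FA are the hit and false-alarm rates and z is the inverse standard normal CDF. The sketch below is illustrative only: the function names are ours, and because the text does not say how hit or false-alarm rates of 0 or 1 were handled, the log-linear correction used here is an assumption. It also shows the paired *t* statistic with n - 1 degrees of freedom used to compare adult and infant faces.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

_z = NormalDist().inv_cdf  # z-transform: inverse of the standard normal CDF


def sdt_indices(hits, n_old, false_alarms, n_new):
    """Return (d_prime, c) for one participant and one face age.

    hits: old faces correctly called 'familiar' (out of n_old);
    false_alarms: new faces wrongly called 'familiar' (out of n_new).
    Assumption: a log-linear correction (+0.5 to counts, +1 to totals)
    keeps rates away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    d_prime = z_hit - z_fa
    c = -0.5 * (z_hit + z_fa)
    return d_prime, c


def paired_t(xs, ys):
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical participant: 12 old and 12 new faces per face age at test.
d_adult, c_adult = sdt_indices(hits=10, n_old=12, false_alarms=3, n_new=12)
d_infant, c_infant = sdt_indices(hits=8, n_old=12, false_alarms=4, n_new=12)
```

With one adult and one infant d' per participant, `paired_t(adult_d_values, infant_d_values)` would give a *t* with 26 degrees of freedom for 27 participants; the two-tailed *p*-value then comes from the *t* distribution (for example via `scipy.stats.ttest_rel`, which computes the same statistic).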

# Eye Movements

Participants' eye movement scanning behavior was analyzed for both the learning and the recognition phases in order to test our second prediction that looking behavior would differ for adult and infant faces. Three areas of interest (AOIs) were defined for each face of the two age groups: the eyes (right and left combined), the nose, and the mouth (see **Figure 2**).

TABLE 1 | Behavioral performance measures: means and SE of sensitivity index (d'), response bias (c), mean correct response times (RTs) in milliseconds for adult and infant faces.


*p-values in bold are significant (p < 0.05).*

The three AOIs were equal in size and together covered 36% of the total area of the face (each AOI covered 12% of the face). Thus, the proportion of the face captured by the AOIs was held constant for adult and infant faces (see **Figure 2**).

Two measures were derived from eye movement data: percentage of total viewing time on each AOI and number of visits per unit time (second) across all AOIs. The first provides a measure of the relative amount of sampling of each facial feature, while the second indexes the dynamicity of visual processing across the whole face. The percentage of total viewing time was calculated for each trial by dividing the total fixation time on each AOI by the total fixation time on the whole face and multiplying the result by 100. Percentages, rather than raw viewing times, were used in order to directly compare viewing time across the learning and recognition phases, which differed in trial duration (learning: 3 s; recognition: until response, *M* = 1735.44 ms, *SE* = 127.47 ms). Note that the total fixation time on the three AOIs (eyes, nose, and mouth) did not add up to 100% of the on-face fixation time, because some fixations fell outside the AOIs but still within the face area. Number of visits per second was calculated for each trial by dividing the total number of visits received by the AOIs (a visit being counted each time the gaze entered a specific AOI in a given trial) by trial duration in seconds, which was fixed at 3 s for learning trials and varied with response time for recognition trials. For this measure, the left and right eyes were treated as separate AOIs.
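The two per-trial measures can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes each trial's on-face fixations are available as an ordered list of (AOI label, duration in ms) pairs, where the labels and function names are hypothetical and `None` marks a fixation on the face but outside every AOI. A visit is counted whenever a fixation lands in an AOI after a fixation in a different AOI or outside all AOIs; counting the first in-AOI fixation of a trial as a visit is our assumption.

```python
from collections import Counter

# Hypothetical AOI labels; None marks an on-face fixation outside every AOI.
EYE_LABELS = {"left_eye", "right_eye"}


def percent_viewing_time(fixations):
    """Percentage of on-face fixation time falling on eyes, nose and mouth.

    fixations: ordered list of (aoi_label, duration_ms) for one trial,
    restricted to fixations that landed on the face. Left and right eyes
    are pooled into a single 'eyes' AOI for this measure.
    """
    total_ms = sum(duration for _, duration in fixations)
    per_aoi = Counter()
    for label, duration in fixations:
        if label is None:
            continue  # still part of total_ms, but belongs to no AOI
        key = "eyes" if label in EYE_LABELS else label
        per_aoi[key] += duration
    return {aoi: 100 * per_aoi[aoi] / total_ms for aoi in ("eyes", "nose", "mouth")}


def visits_per_second(fixations, trial_duration_s):
    """Number of AOI entries per second; left and right eyes stay separate.

    A visit is counted when a fixation lands in an AOI and the preceding
    fixation was in a different AOI or outside all AOIs. The first
    in-AOI fixation of a trial also counts (an assumption).
    """
    visits = 0
    previous = None
    for label, _ in fixations:
        if label is not None and label != previous:
            visits += 1
        previous = label
    return visits / trial_duration_s


# Example learning trial (3 s); durations in ms are made up for illustration.
trial = [("left_eye", 400), ("left_eye", 200), ("right_eye", 400),
         (None, 200), ("nose", 500), ("left_eye", 300)]
pct = percent_viewing_time(trial)       # {'eyes': 65.0, 'nose': 25.0, 'mouth': 0.0}
rate = visits_per_second(trial, 3.0)    # 4 visits / 3 s
```

Because the denominator of the percentage is all on-face fixation time, the three AOI percentages need not sum to 100%, mirroring the note above.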

Different sets of analyses were performed for each eye movement measure. A first set included eye movement data from all trials. Furthermore, to test our third prediction that recognition performance would vary as a function of looking behavior, a second and a third set of analyses were performed separately for trials that triggered a correct response during the recognition phase and for those that were incorrectly recognized. Correct and incorrect trials were analyzed separately because, while all participants made at least one correct response on both adult and infant trials, two participants had no incorrect responses in at least one of the two conditions. For all sets of analyses, analyses of variance (ANOVAs) were conducted on each eye movement measure with the factors face age (adult, infant), phase (learning, recognition), and, for percentage of total viewing time, AOI (eyes, nose, mouth). All comparisons were Bonferroni corrected.

TABLE 2 | Mean and standard error of the percentage of total looking time on each of the three AOIs (eyes, nose, mouth) for adult and infant faces during the learning and recognition phases.

*Reported data refer to all the trials (correct and incorrect).*

### Analyses on All Trials

#### *Percentage of total viewing time*

The mean and SE of the percentage of total viewing time on each AOI for the adult and infant faces during learning and recognition are shown in **Table 2**. The 2 × 2 × 3 ANOVA showed a significant main effect of AOI, *F*(2,52) = 16.582, *p <* 0.001, η² = 0.389. Bonferroni-corrected multiple-comparison tests revealed an overall smaller percentage of viewing time on the mouth region (*M* = 11.26%, *SE* = 1.92%) compared to both the nose region (*M* = 27.99%, *SE* = 2.39%), *p <* 0.001, and the eye region (*M* = 33.87%, *SE* = 3.16%), *p <* 0.001. No differences emerged between the eye and the nose regions (*p >* 0.74). The AOI main effect was qualified by two significant interactions, between AOI and face age, *F*(2,52) = 6.999, *p* = 0.002, η² = 0.212, and between AOI and phase, *F*(2,52) = 3.958, *p* = 0.025, η² = 0.132 (see **Figure 3**). *Post hoc* pairwise *t*-tests showed that the only significant difference between adult and infant faces across the two phases concerned the percentage of viewing time on the eyes, which was higher for adult faces (*M* = 36.71%, *SE* = 3.13%) than for infant faces (*M* = 31.03%, *SE* = 3.36%), *t*(26) = 3.775, *p* = 0.003. The mouth region was the least fixated of the three AOIs for both adult and infant faces, *p*s *<* 0.01. *Post hoc t*-tests also showed that participants looked significantly longer at the eye AOI during learning (*M* = 37.26%, *SE* = 3.69%) than during recognition (*M* = 30.48%, *SE* = 3.01%), *t*(26) = 2.878, *p* = 0.024. No other difference was significant, *p*s *>* 0.14. Also, in both the learning and recognition phases, the mouth region was the least fixated of the three AOIs, *p*s *<* 0.007.
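The Bonferroni correction applied to these pairwise comparisons can be sketched as follows: each raw *p*-value is multiplied by the number of comparisons m in the family and capped at 1, which is equivalent to testing each raw *p* against α/m. The function name is ours, and treating the three AOI pairs (eyes–nose, eyes–mouth, nose–mouth) as one family with m = 3 is an illustrative assumption; the raw *p*-values below are made up. Libraries such as statsmodels offer the same adjustment through `statsmodels.stats.multitest.multipletests(..., method="bonferroni")`.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons.

    Each raw p is multiplied by the family size m and capped at 1.0;
    an adjusted p below alpha corresponds to a raw p below alpha / m.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for the three pairwise AOI comparisons.
raw = {"eyes-nose": 0.25, "eyes-mouth": 0.0002, "nose-mouth": 0.0001}
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))
significant = {pair: p < 0.05 for pair, p in adjusted.items()}
```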

#### *Number of visits per second*

The mean and SE of the number of visits per second for the adult and infant faces during learning and recognition are shown in **Table 3**. The 2 × 2 ANOVA with face age and phase as within-subjects factors revealed main effects of both face age, *F*(1,26) = 33.370, *p <* 0.001, η² = 0.562, and phase, *F*(1,26) = 27.916, *p <* 0.001, η² = 0.518, indicating that participants made more visits per second while viewing adult faces (*M* = 1.89, *SE* = 0.09) than infant faces (*M* = 1.66, *SE* = 0.09), and more visits per second when recognizing faces (*M* = 1.97, *SE* = 0.10) than during learning (*M* = 1.58, *SE* = 0.09; see **Figure 4**).

TABLE 3 | The mean and standard error of the number of visits per second for the adult and infant face conditions, separately for the learning and recognition phases.

*Mean and SE values refer to all the trials (first line), correct trials (second line) and incorrect trials (third line).*

### Analyses on Correct Trials

#### *Percentage of total viewing time*

The mean and SE of the percentage of total viewing time on each AOI for the adult and infant faces during learning and recognition are shown in **Table 4**. The 2 × 2 × 3 ANOVA revealed main effects of face age, *F*(1,26) = 5.07, *p* = 0.033, η² = 0.163, and AOI, *F*(2,52) = 17.366, *p <* 0.001, η² = 0.400. Participants spent a larger percentage of time looking at the three AOIs for adult faces (*M* = 24.92%, *SE* = 1.0%) than for infant faces (*M* = 23.91%, *SE* = 1.02%). Bonferroni-corrected multiple-comparison tests revealed an overall smaller percentage of viewing time on the mouth region (*M* = 11.32%, *SE* = 1.97%) compared to both the nose region (*M* = 28.16%, *SE* = 2.29%), *p <* 0.001, and the eye region (*M* = 33.77%, *SE* = 3.08%), *p <* 0.001. Viewing time did not differ between the eye and the nose regions (*p >* 0.75). The AOI main effect was qualified by two significant two-way interactions, with face age, *F*(2,52) = 10.330, *p <* 0.001, η² = 0.284, and with phase, *F*(2,52) = 5.462, *p* = 0.007, η² = 0.174 (see **Figure 5**). The percentage of time that participants spent viewing the eye region was higher for adult faces (*M* = 37.09%, *SE* = 3.02%) than for infant faces (*M* = 30.46%, *SE* = 3.31%), *t*(26) = 4.455, *p <* 0.001, whereas the two face ages did not differ significantly in viewing time on the nose, *p* = 0.155, or the mouth, *p* = 0.137. For both face ages, the mouth AOI was the least fixated region, *p*s *<* 0.01. The AOI × phase interaction arose because participants spent more time viewing the eye region during the learning phase (*M* = 37.61%, *SE* = 3.601%) than during the recognition phase (*M* = 29.93%, *SE* = 2.97%), *t*(26) = 3.266, *p* = 0.021. Furthermore, in the recognition phase both the eye and the nose regions were viewed more than the mouth, *p*s *<* 0.001.

TABLE 4 | The mean and standard error of the percentage of total looking time on each of the three AOIs (eyes, nose, mouth) for adult and infant faces, separately for the learning and recognition phases.

*Data are presented separately for correct and incorrect trials.*

#### *Number of visits per second*

The mean and SE of the number of visits per second for the adult and infant faces during learning and recognition are shown in **Table 3**. The 2 × 2 ANOVA with face age and phase as within-subjects factors showed main effects of face age, *F*(1,26) = 37.836, *p <* 0.001, η² = 0.593, and phase, *F*(1,26) = 36.273, *p <* 0.001, η² = 0.582, indicating that participants made more visits per second while viewing adult faces (*M* = 1.89, *SE* = 0.09) than infant faces (*M* = 1.67, *SE* = 0.09), and more visits per second when recognizing faces (*M* = 1.99, *SE* = 0.10) than when learning them (*M* = 1.57, *SE* = 0.09; see **Figure 6**).
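Effect sizes in these ANOVAs can be cross-checked against the reported *F* ratios. The η² values reported in this section coincide with partial eta squared computed from *F* and its degrees of freedom, η²ₚ = (F · df_effect) / (F · df_effect + df_error). A minimal sketch (the function name is ours):

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Phase effect on visits per second, correct trials: F(1,26) = 36.273
print(round(partial_eta_squared(36.273, 1, 26), 3))  # prints 0.582
```

The same check applied to the face age effect, *F*(1,26) = 37.836, gives 0.593, matching the value reported above.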

### Analyses on Incorrect Trials

#### *Percentage of total viewing time*

The mean and SE of the percentage of total viewing time on each AOI for the adult and infant faces during learning and recognition are shown in **Table 4**. The 2 × 2 × 3 ANOVA on the distribution of viewing time across the different AOIs for faces that were not correctly recognized during the recognition phase revealed a main effect of AOI, *F*(2,48) = 13.223, *p <* 0.001, η² = 0.355, with shorter dwell time on the mouth region (*M* = 10.85%, *SE* = 1.95%) than on the eye region (*M* = 31.14%, *SE* = 3.10%) and the nose region (*M* = 27.25%, *SE* = 2.746%). No other main effects or interactions attained significance, *p*s *>* 0.23.

#### *Number of visits per second*

The mean and SE of the number of visits per second for the adult and infant faces during learning and recognition are shown in **Table 3**. The 2 × 2 ANOVA with face age and phase as within-subjects factors revealed significant main effects of phase, *F*(1,24) = 16.854, *p <* 0.01, η² = 0.413, with participants making more visits per second during the recognition phase (*M* = 1.87, *SE* = 0.10) than during the learning phase (*M* = 1.48, *SE* = 0.10), and of face age, *F*(1,24) = 18.925, *p <* 0.01, η² = 0.441, with more visits per second for adult faces (*M* = 1.79, *SE* = 0.09) than for infant faces (*M* = 1.55, *SE* = 0.09).

### Relations between Behavioral Performance and Eye Movements

To further explore the relation between scanning behavior and recognition performance we correlated percentage of viewing time on the eye and mouth region during the learning phase with two measures of behavioral performance – i.e., d' and mean RTs – for adult and infant faces separately.

#### *Percentage of total viewing time*

Two-tailed Pearson correlations revealed that a higher percentage of dwell time on the eye region was associated with more accurate recognition, as measured by d', for infant faces, *r* = 0.423, *p* = 0.028 (especially during learning, *r* = 0.497, *p* = 0.008). The same correlation failed to reach significance for adult faces, *r* = 0.315, *p* = 0.109. Percentage of viewing time on the mouth region during recognition was positively correlated with mean RTs for correct recognition responses for infant faces, *r* = 0.449, *p* = 0.019, with a marginal positive correlation for adult faces, *r* = 0.376, *p* = 0.053.
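The correlation tests above can be sketched with the sample Pearson coefficient and its associated *t* statistic, t = r·√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. This is a generic illustration with function names of our own; the two-tailed *p*-value then comes from the *t* distribution (for instance, `scipy.stats.pearsonr` returns both *r* and the two-sided *p*). For example, *r* = 0.423 with 27 participants corresponds to *t*(25) of about 2.33.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic and df (n - 2) for testing H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r), n - 2


# Hypothetical per-participant data: eye-region dwell (%) and d' for infant faces.
eye_dwell = [22.0, 35.5, 41.0, 28.3, 30.1]
d_prime = [0.8, 1.4, 1.9, 1.0, 1.2]
t_stat, df = r_to_t(pearson_r(eye_dwell, d_prime), len(eye_dwell))
```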

#### *Number of visits per second*

For both adult and infant faces number of visits during recognition was positively correlated with recognition accuracy (d') (adult faces: *r* = 0.395, *p* = 0.041; infant faces: *r* = 0.385, *p* = 0.047).

# DISCUSSION

The current study explored the impact of face age on the visual processing strategies employed during encoding and recognition of face stimuli.

In the only previous study investigating how face age modulates the distribution of gaze across facial regions, Firestone et al. (2007) examined how young and older adults' visual exploration strategies and recognition performance differ for young and older adult face stimuli. Here, we extended this work by analyzing young adults' scanning behavior on young adult faces and on a face category that is more physically distant and less frequently encountered, namely infant faces.

Analysis of our participants' response performance provides evidence for the presence of an own-age bias. Results from our study confirmed the presence of the expected markers for the own-age bias, with higher recognition accuracy (d') for adult compared to infant faces and a trend toward mean RTs being faster for the former than the latter. Other studies using similar paradigms found a weaker or absent own-age bias (Firestone et al., 2007). In spite of the methodological differences between this study and previous studies (Kuefner et al., 2008; Macchi Cassia et al., 2009b) comparing adults' performance in the processing of adult and infant faces, the current results suggest that, in the absence of consistent experience with other-age faces, young adults show an advantage in the recognition of own-age compared to other-age faces.

Most interestingly, our eye-movement data provide novel evidence that adult and infant faces elicit different gaze patterns in adults with little experience with infants. Both of our looking-behavior measures (percentage of total viewing time on each AOI and number of visits per second) differed significantly between the two face categories, with adult faces being associated with a higher number of visits per second and a higher percentage of viewing time on the eye region, independently of the task participants had to perform (memorizing or recognizing the face). The first measure (number of visits per second) indexes the dynamicity of visual exploration, since we counted a visit whenever a fixation landed in one of the AOIs and was preceded by a saccade originating either from another AOI or from a region of the face not included in any AOI. Therefore, the higher number of visits per second found for adult faces can be taken as an index of more dynamic visual exploration of these faces compared to infant faces.

Regarding the percentage of viewing time, the general pattern of attention to facial features was consistent with previous research (Henderson et al., 2005; Flowe, 2011; Nakabayashi et al., 2012; McDonnell et al., 2014) showing that participants fixate the eyes more than other regions of the face. In addition, as predicted, participants processed own-age and other-age faces differently. In this regard, our finding of a higher percentage of viewing time on the eye region of adult compared to infant faces seems to be at odds with the study by Firestone et al. (2007), in which young adults looked longer at the eye region of older faces than of young faces. At least two important methodological differences between the current study and Firestone et al.'s (2007) study may explain the conflicting results. First, in Firestone et al.'s (2007) study participants' eye movements were recorded during an age judgment task, so the longer fixation time on the eye region of older adult faces compared to young adult faces might reflect the specific task demands. Participants had to focus on the age of the faces, and it is conceivable that they fixated the region that is most informative in that context, which is probably the eye region, given the presence of wrinkles. Second, participants in Firestone et al.'s (2007) study were not screened for their amount of experience with older adults, and, as shown in previous studies, amount of contact can substantially modulate perceptual strategies during the processing of older adult faces (Proietti et al., 2013). Additional evidence is needed to clarify whether the inconsistency between our results and those of Firestone et al. (2007) reflects a genuine effect of older adult faces as a distinct face category or the effect of task demands (i.e., age judgments compared to a recognition task).

Nonetheless, it is important to underline that the results we obtained (higher percentage of viewing time on the eye region for own- compared to other-age faces) are in line with findings from studies on the race bias showing that Caucasian participants dwell longer on the eye region of own-race faces than of other-race faces (Goldinger et al., 2009; Wu et al., 2012; McDonnell et al., 2014; but see Blais et al., 2008; Caldara et al., 2010; Hills and Pake, 2013 for no differences in looking behavior for own-race vs. other-race faces). Previous studies examining phenomena such as the face inversion effect (e.g., Kuefner et al., 2008) or the composite effect (Kuefner et al., 2010) have shown that adult participants rely on different perceptual strategies when processing own- and other-age faces. The present findings add to this earlier evidence by showing that part (though probably not all) of the difference in how individuals encode different categories of faces, whether the categories differ in age or in race, lies in differential attention to discrete facial features. At least for Caucasian participants, exploring the eye region is an effective strategy that is employed more extensively for familiar than for unfamiliar face categories.

A second important finding of the current study concerns the difference in scanning strategies employed for encoding and recognition. The majority of existing studies on the age and race biases analyzed participants' looking behavior during face learning (Firestone et al., 2007; Goldinger et al., 2009). Our results suggest that scanning strategies change as a function of the task participants have to perform (encoding or recognizing a face). Specifically, regardless of face age, participants tended to focus their attention more on the eye region in the learning phase, whereas they tended to use a less specific strategy in the recognition phase. These findings seem to be at odds with those of an earlier study by Henderson et al. (2005) showing that the distribution of looking time across face features becomes more restricted from learning to recognition, with increasing dwell time on the eye and nose regions and decreasing looking time on the other features (e.g., mouth, chin, forehead). However, several methodological differences may explain the inconsistency. For example, each participant in Henderson et al.'s (2005) study was tested in two different learning conditions, only one of which was a free-viewing condition as in our study. In the second condition, participants had to keep their gaze steady on the area directly between the eyes. This restricted viewing condition during learning may have biased participants to keep their gaze within the same region even during recognition, thus restricting the distribution of fixations across face features. In addition, in the learning phase of Henderson et al.'s (2005) study each face was presented for 10 s, whereas in the current study we used a much shorter presentation duration (i.e., 3 s). It is possible that this shorter duration induced participants to focus their attention on the most informative facial feature (i.e., the eyes) rather than moving attention across features. In any case, our findings agree with those of Henderson et al. (2005) in pointing to the eyes as an important (based on our data, the most important) feature for face learning.

In addition, our data indicate that participants used a more dynamic strategy during recognition than during learning, which is reasonable if we consider that participants have to explore all features in order to find the familiar cues encoded during learning. More importantly, the use of a more dynamic strategy (a higher number of visits) is beneficial for recognition, as indicated by the positive correlation between number of visits and recognition accuracy (d'). This finding suggests that, during the short time (*M* = 1735 ms) before participants make a recognition decision and provide their response, a global and more dynamic scanning of the whole face is more effective than a more analytic exploration of the features, for both adult and infant faces.
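The recognition accuracy index d' is conventionally computed from hit and false-alarm rates as d' = z(H) − z(FA), where z is the inverse of the standard normal cumulative distribution. The Python sketch below shows that computation and a Pearson correlation between per-participant visit counts and d'. It is illustrative only: the log-linear correction for rates of 0 or 1 is an assumption (the text does not say how extreme rates were handled), and the per-participant numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count (log-linear correction, an assumption here) so that
    rates of exactly 0 or 1 do not yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: mean number of AOI visits per test face for each participant,
# and (hits, misses, false alarms, correct rejections) in the recognition phase.
visits = [4.1, 5.3, 3.8, 6.0, 4.7]
counts = [(14, 6, 5, 15), (17, 3, 4, 16), (12, 8, 6, 14), (18, 2, 3, 17), (15, 5, 5, 15)]
dprimes = [d_prime(*c) for c in counts]
print(round(pearson_r(visits, dprimes), 2))
```

A positive coefficient on real data would correspond to the association described above; with only a handful of participants, such a coefficient is of course unstable.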

This conclusion is further supported by the finding that the amount of sampling of the eye region during learning was, to some extent, associated with differences in recognition performance. The analyses conducted separately for correct and incorrect trials confirmed that the greater sampling of the eye region, relative to the other AOIs, for adult compared to infant faces occurred only for faces that were subsequently recognized correctly. This again suggests that the eyes are diagnostic of identity. This was confirmed by correlation analyses showing that viewing time on the eye region was associated with correct identity discrimination in the subsequent recognition phase. Of note, this was especially true for infant faces, whose eye region was viewed less overall than that of adult faces; in the adult face condition, the extensive use of the eye region may have masked the effect and led to the absence of a direct association between this exploration strategy and recognition accuracy. Therefore, the correlation results, combined with the results from correct vs. incorrect trials, provide robust evidence of the relevance of exploring the eye region for efficient identity recognition.
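The measure behind these analyses, the percentage of viewing time on an area of interest (AOI), is the dwell time on that AOI divided by the total dwell time on the face. The Python sketch below computes it per learning trial and averages it separately for each combination of face age and later recognition outcome. The trial record layout, the AOI names, and the dwell times are hypothetical, chosen only to illustrate the split into correct and incorrect trials.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    face_age: str               # "adult" or "infant"
    later_correct: bool         # was this face correctly recognized at test?
    dwell_ms: dict[str, float]  # dwell time per AOI (hypothetical AOI names)


def percent_on(trial: Trial, aoi: str) -> float:
    """Percentage of total dwell time on the face that fell on one AOI."""
    total = sum(trial.dwell_ms.values())
    return 100.0 * trial.dwell_ms.get(aoi, 0.0) / total if total else 0.0


def eye_share_by_condition(trials: list[Trial]) -> dict[tuple[str, bool], float]:
    """Mean % viewing time on the eyes for each (face age, later correct) cell."""
    cells: dict[tuple[str, bool], list[float]] = defaultdict(list)
    for t in trials:
        cells[(t.face_age, t.later_correct)].append(percent_on(t, "eyes"))
    return {key: mean(values) for key, values in cells.items()}


trials = [
    Trial("adult", True, {"eyes": 1500, "nose": 600, "mouth": 400, "other": 500}),
    Trial("adult", False, {"eyes": 900, "nose": 800, "mouth": 700, "other": 600}),
    Trial("infant", True, {"eyes": 1200, "nose": 700, "mouth": 500, "other": 600}),
]
print(eye_share_by_condition(trials))
```

Expressing dwell time as a percentage of total face viewing, rather than in milliseconds, keeps trials comparable when the overall time spent on the face varies.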

Unlike exploration of the eye region, visual exploration of the mouth region proved detrimental to subsequent face recognition: for both adult and infant faces, longer inspection of the mouth was related to longer RTs in the identification of familiar faces. To the best of our knowledge, the only studies showing that visual exploration of the mouth region is important for face recognition are those using emotional faces (Eisenbarth and Alpers, 2011); in these cases, looking at the mouth clearly represents an important strategy for gathering information about the face. Since the faces used in the current study all posed a static, neutral expression, it is reasonable to assume that the mouth region did not provide any additional information diagnostic of identity. Rather, dwelling on the mouth region at encoding led to longer RTs.

# CONCLUSION

This study provides novel evidence of an own-age bias in young adults. This bias is evident not only in recognition performance for adult compared to infant faces, but also in the use of sampling strategies during encoding (longer looking at the eye region and a more dynamic exploration) that are effective for accurate recognition. The selective use of these strategies for own-age faces was more pronounced during the encoding of novel faces than during recognition of the familiarized faces, again pointing to the relevance of these strategies for efficient learning of facial identity.

The finding of differential scanning strategies for own-age as compared to other-age faces extends earlier evidence of perceptual processing differences between adult and infant faces in adults with limited experience with infants (Macchi Cassia et al., 2009a,b). The current study does not provide a direct test of the impact of experience on scanning patterns, as it lacks a comparison with experienced adults who have regular contact with infants. However, participants were intentionally selected for having very limited experience with infants, according to the same inclusion criteria used in earlier studies that compared the magnitude of the OAB in novice and experienced participants (i.e., maternity-ward nurses, Macchi Cassia et al., 2009b; first-time mothers with younger siblings, Macchi Cassia et al., 2009a). In these studies (both cited in the Introduction), the experienced participants, unlike the novices, showed no (or a smaller) OAB in perceptual recognition and a generalized inversion effect for adult and infant faces, suggesting that experience with infants can modulate recognition ability and induce the use of face-specific processing strategies in adulthood. In light of this earlier evidence, our finding that adult and infant faces elicit different gaze patterns in adults selected for having very limited experience with infants suggests that scanning strategies play a critical role in the recognition advantage for faces belonging to the most familiar age categories.

# ACKNOWLEDGMENTS

The authors wish to thank Enrica Longo for help in recruiting and testing participants. This research was supported by a scholarship from the University of Milano-Bicocca to the first author.

### REFERENCES

Blais, C., Jack, R. E., Scheepers, C., Fiset, D., and Caldara, R. (2008). Culture shapes how we look at faces. *PLoS ONE* 3:e3022. doi: 10.1371/journal.pone.0003022


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Proietti, Macchi Cassia, dell'Amore, Conte and Bricolo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Automated facial coding software outperforms people in recognizing neutral faces as neutral from standardized datasets

#### *Peter Lewinski\**

*The Amsterdam School of Communication Research, Department of Communication Science, University of Amsterdam, Amsterdam, Netherlands*

#### *Edited by:*

*Paola Ricciardelli, University of Milano-Bicocca, Italy*

#### *Reviewed by:*

*Luis J. Fuentes, Universidad de Murcia, Spain; Francesca Gasparini, University of Milano-Bicocca, Italy*

#### *\*Correspondence:*

*Peter Lewinski, The Amsterdam School of Communication Research, Department of Communication Science, University of Amsterdam, Postbus 15793, 1001 NG Amsterdam, Netherlands; p.lewinski@uva.nl*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 22 April 2015; Accepted: 31 August 2015; Published: 11 September 2015*

#### *Citation:*

*Lewinski P (2015) Automated facial coding software outperforms people in recognizing neutral faces as neutral from standardized datasets. Front. Psychol. 6:1386. doi: 10.3389/fpsyg.2015.01386*

Little is known about people's accuracy in recognizing neutral faces as neutral. In this paper, I demonstrate the importance of knowing how well people recognize neutral faces. I contrasted human recognition scores for 100 typical, neutral, front-facing facial images with the scores of an arguably objective judge, automated facial coding (AFC) software. I hypothesized that the software would outperform humans in recognizing neutral faces because of the inherently objective nature of computer algorithms. Results confirmed this hypothesis. I provide the first-ever evidence that computer software (90%) was more accurate in recognizing neutral faces than people were (59%). I posit two theoretical mechanisms, i.e., smile-as-a-baseline and false recognition of emotion, as possible explanations for my findings.

Keywords: non-verbal communication, facial expression, face recognition, neutral face, automated facial coding

# Introduction

Recognizing a neutral face as neutral is vital in social interactions. By virtue of "expressing" "nothing" (for a separate discussion on faces "expressing" something, see Russell and Fernández-Dols, 1997), a neutral face should indicate lack of emotion, e.g., lack of anger, fear, or disgust. This article's inspiration was the interesting observation that in the literature on facial recognition, little attention has been paid to neutral face recognition scores of human raters. Russell (1994) and Nelson and Russell (2013), who provided the two most important overviews on the topic, did not include or discuss recognition rates of lack of emotion (neutral) in neutral faces. They provided overviews of matching scores (i.e., accuracy) for six basic emotions, but they were silent on the issue of recognition accuracy of neutral faces.

A distinct lack of articles that explicitly report accuracy scores for the recognition of neutral faces could explain the silence of researchers in this field. One notable exception is the *Amsterdam Dynamic Facial Expression Set* (ADFES; van der Schalk et al., 2011), where the authors provide an average matching score of 0.67 for their neutral faces. This score is considerably low when one considers that the average for the six basic emotions is also in this range (around 0.67; see Nelson and Russell, 2013, Table A1 for datasets between pre-1994 and 2010).

In this paper, I demonstrate a fascinating effect on the recognition of non-expressive, neutral faces both by humans and by software, though I can only speculate as to its theoretical mechanisms. I provide the first evidence that computer software is better in recognizing neutral faces than people are. I open up a potentially productive new area for studying the precise mechanism behind my findings, and I entertain speculation on two possible causes for my findings, i.e., smile-as-a-baseline and false recognition of emotion. In addition, I note in my discussion that independently of the exact mechanism, this finding already has practical implications.

In the current paper, I attempt to fill a gap in the literature by analyzing the recognition accuracy of neutral faces, using secondary data from human raters and from an "objective rater." I define this objective rater as automated facial coding (AFC) software. Therefore, I compare human versus software accuracy in recognizing neutral faces (i.e., lack of emotion) in clearly neutral images of a face. The use of such an objective rater could become a standard in the field of non-verbal communication from facial expressions.

#### Objective Rater

I assume throughout that the computer software is an objective rater because it follows the same coding schema (i.e., an algorithm) for every rating. Technically, software of this type cannot deviate from the algorithm and cannot take into account extraneous information, e.g., a social context or situation. Furthermore, software does not have personal biases stemming from age, culture, or gender. In short, computer software cannot display individual differences in recognizing emotions. To illustrate, I submit a far-fetched example. Studies on the recognition of emotionally neutral faces in clinically depressed patients (e.g., Leppänen et al., 2004) have revealed that these individuals perform worse (are less accurate and slower) in recognizing neutral faces than healthy participants. Computer software cannot be depressed or otherwise experience emotional or cognitive abnormalities as humans can. Thus, most importantly, I argue that the software has no specific incentive to over-detect emotion in somewhat ambiguous stimuli (such as neutral faces).

Furthermore, as explained below, the AFC software and human raters had to perform essentially the same task, i.e., to choose one target label (neutral) out of a single, unvarying choice set. A training set of ∼10,000 images is extremely small compared to the number of faces/expressions seen (both consciously and unconsciously) by an average person aged 30 years. Thus, in comparing human to software, the human rater arguably still has a much richer training data set (speaking figuratively) than the particular AFC software tested in this paper. It was not possible to locate a reference discussing how many faces/expressions an average, healthy adult sees by age 30, but this number perhaps runs into the trillions; AFC software should therefore be no match for human recognition, yet it nevertheless might be. This is likely because the human system is not "software" that just needs more instances of the same stimuli to recognize them correctly; instead, the human system likely has many recognition biases. Based on this reasoning, my hypothesis states that human raters will have significantly lower recognition accuracy rates than AFC software.

# Materials and Methods

To test my hypothesis, I gathered a representative sample of neutral faces from standardized datasets. I then computed human accuracy scores for these faces. Next, I analyzed those neutral faces with AFC software – FaceReader (Noldus, 2014) – and computed FaceReader's accuracy scores. Finally, I compared the human and FaceReader performance in recognizing neutral faces. I report how I determined sample size and all study measures in the sections that follow.

#### Neutral Faces

I used all available images of neutral faces in both the *Karolinska Directed Emotional Faces* (KDEF; Goeleven et al., 2008) and the *Warsaw Set of Emotional Facial Expression Pictures* (WSEFEP; Olszanowski et al., 2015) datasets. KDEF is a typical dataset of emotional faces that also includes baseline, that is, neutral, images. See **Figure 1** for a typical neutral face image. WSEFEP closely replicates the KDEF methodology of gathering faces, i.e., it contains close-up, front-facing, light-adjusted images of people's faces. KDEF is a standard dataset in facial expression and AFC research and a popular choice with researchers, having been cited over 160 times. Importantly, this choice was also made because KDEF was included in the original training set of the AFC software and WSEFEP was not. This distinction allowed me to test whether this factor could explain potential differences.

#### Actors Posing a Face

In addition, during the creation of both datasets, the actors expressing the emotion (or posing a neutral face) received specific procedural instructions and underwent extensive training. Thus, consistency and standardization justified my selection of KDEF and its replication, WSEFEP. There were 70 neutral faces in KDEF (50% women) and 30 in WSEFEP (53% women), for a total of 100 images. The actors in those 100 images were specifically instructed by the creators of the respective datasets to pose a neutral face (see Lundqvist et al., 1998 for the definition of a neutral face).

Nevertheless, I sought to make sure that the faces were indeed neutral. Therefore, the images were coded by a certified Facial Action Coding System (FACS) coder (Ekman et al., 2002) to identify any facial movement [a so-called Action Unit (AU)] indicative, at least partially, of an emotion. None of the images contained a significant AU (or combination of AUs) that I defined as part of a basic emotion expression based on the EMFACS-7 classification (Friesen and Ekman, 1983). I note that it is unorthodox to use FACS (Ekman et al., 2002, p. 10) to "code" neutral still images; however, the Investigator's Guide, part of the FACS manual, is not clear about this issue and in principle permits such coding. Further, Griffin (2014), among others, used a similar procedure to determine whether a neutral face has a truly neutral expression. By using the EMFACS-7 classification and the FACS manual's Investigator's Guide, and by following Griffin (2014), I believe I have adopted the best approach to ensure truly neutral expressions.
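The screening logic can be expressed as a set check: an image is flagged if its coded AUs contain every AU of some emotion prototype (a single diagnostic AU can be represented as a one-element set). In the Python sketch below, the prototype table is a deliberately incomplete, hypothetical placeholder containing only the widely cited AU 6 + AU 12 combination for happiness; an actual screen would take its combinations from the EMFACS-7 tables, and AU intensity is ignored.

```python
# Hypothetical, incomplete prototype table: emotion -> AUs that must all be present.
# A real screen would load the full EMFACS-7 combinations instead.
PROTOTYPES: dict[str, frozenset[int]] = {
    "happiness": frozenset({6, 12}),
}


def emotion_prototypes_present(coded_aus: set[int],
                               prototypes: dict[str, frozenset[int]] = PROTOTYPES) -> list[str]:
    """Return the emotions whose full AU combination appears among the coded AUs."""
    return [emotion for emotion, aus in prototypes.items() if aus <= coded_aus]


def is_neutral(coded_aus: set[int],
               prototypes: dict[str, frozenset[int]] = PROTOTYPES) -> bool:
    """An image passes the screen if no emotion prototype is fully present."""
    return not emotion_prototypes_present(coded_aus, prototypes)


print(is_neutral(set()))        # → True (no AUs coded)
print(is_neutral({6, 12, 25}))  # → False (contains the happiness prototype)
```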

#### Human Ratings of the Datasets

I manually downloaded the datasets and extracted from them the matching scores for the neutral faces (see Table A1 in Appendix A). In both datasets, the matching scores were defined as "the percentage of observers who selected the predicted label" (Nelson and Russell, 2013, p. 9). I took these matching scores as a proxy for the accuracy of human recognition.
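Under the definition quoted above, a matching score for one image is simply the share of judges whose chosen label equals the intended label. A minimal Python sketch, with hypothetical judge responses (multiply by 100 to express the score as a percentage):

```python
from collections import Counter


def matching_score(responses: list[str], target: str = "neutral") -> float:
    """Proportion of observers who selected the predicted (target) label."""
    if not responses:
        raise ValueError("no responses for this image")
    return Counter(responses)[target] / len(responses)


# Hypothetical responses from ten judges to one neutral face.
responses = ["neutral"] * 6 + ["sad", "angry", "indistinct", "neutral"]
print(matching_score(responses))  # → 0.7
```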

#### Human Face Categorization

The authors of both datasets asked the human judges to choose one label out of a list of six basic emotions (happy, surprised, angry, sad, disgusted, and fearful), a "neutral" label, or another option (KDEF: "indistinct"; WSEFEP: "acceptance," "anticipation," "other emotion") when they saw a target face (*N*KDEF = 490; *N*WSEFEP = 210). In both datasets, the target faces showed either a basic emotion expression or a neutral face. The order of presentation of the faces was randomized in both datasets; furthermore, the human judges saw only a sub-sample of all possible target face images (to minimize order and anchoring effects). See Appendix B for excerpts from the two datasets' descriptions of the judgment task for human coders.
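The presentation scheme described above (each judge sees a random sub-sample, in random order) can be sketched in Python as follows. The sizes and the independent draw per judge are assumptions for illustration; the actual allocation schemes used for KDEF and WSEFEP are not described here and may differ, for example by balancing how often each image is judged.

```python
import random
from typing import Optional


def assign_subsets(image_ids: list[str], n_judges: int, per_judge: int,
                   seed: Optional[int] = None) -> list[list[str]]:
    """Give each judge a random subset of images, already in random order."""
    if per_judge > len(image_ids):
        raise ValueError("per_judge cannot exceed the number of images")
    rng = random.Random(seed)  # seeded generator for a reproducible allocation
    # random.sample draws without replacement and returns items in selection
    # order, so each judge's list is both a random subset and a random order.
    return [rng.sample(image_ids, per_judge) for _ in range(n_judges)]


# Hypothetical sizes: 40 images, 5 judges, 12 images per judge.
images = [f"img{i:03d}" for i in range(40)]
for judge, subset in enumerate(assign_subsets(images, n_judges=5, per_judge=12, seed=1)):
    print(judge, subset[:3])
```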

# Automated Facial Coding Software – FaceReader

As an instance of AFC software, I used FaceReader (Noldus, 2014), a software tool that automatically and programmatically analyzes facial expression of emotion. An average recognition score of 89% over the six basic emotions was reported for FaceReader in den Uyl and van Kuilenberg (2005), revalidated to 88% in Lewinski et al. (2014a). This software has been available for scientific research since den Uyl and van Kuilenberg (2005). Researchers have used FaceReader in a multitude of contexts such as, but not limited to: human–computer interaction (Goldberg, 2014); social psychology (Chentsova-Dutton and Tsai, 2010); consumer science (Chan et al., 2014); advertising (Lewinski et al., 2014b); and multimedia research (Romero-Hall et al., 2014). Of more relevance to the current paper is the software's specific use in assessing the role of recognition of emotional facial expressions in human raters only (Choliz and Fernandez-Abascal, 2012).

#### FaceReader Face Categorization

FaceReader works in three steps. First, it detects a face in the image. Next, it identifies 500 key landmark points in the face through an Active Appearance Model (Cootes and Taylor, 2004), visualized as a superimposed 3D virtual mesh. In the last stage, it classifies the image according to how likely it is that each emotion is present (or not) in the person's face. A three-layer artificial neural network trained on more than 10,000 instances of six basic emotions and neutral faces makes this classification possible. The software can then assign a label to each target face, choosing from six basic emotions, a neutral label, or a "failed to recognize" option. The software therefore followed a procedure very similar to that of the human judges: it had to choose a label for a target face out of six basic emotions and a neutral label, or indicate that it could not classify the face (failure). The number of classification choices is thus similar to that in the human judges' task; however, it could be argued that the two tasks are not equivalent one-to-one. See van Kuilenburg et al. (2005) for a detailed algorithmic description of this software. In addition, **Figure 1** provides a visualization of FaceReader analysis.

FaceReader's emotion detection algorithm outputs a value from 0 to 1 for each basic emotion, plus neutral. Higher values indicate a greater likelihood that the person in the image or video experiences the target emotion (or lack thereof). I took this measure as a proxy for classification accuracy. It is technically impossible to compute matching scores for FaceReader as one might do with human raters, because the number of "raters" is always *n* = 1, i.e., the software itself.
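The two uses of these per-category values can be sketched in Python: a single label taken as the highest-scoring category, and the neutral value itself used as the per-image accuracy proxy, averaged over images. This is not FaceReader's interface and calls no FaceReader API. Deriving the label by taking the maximum is an assumption, as are the category identifiers and the per-image values, which are hypothetical.

```python
from statistics import mean
from typing import Optional

# Category names follow the text; they are not necessarily FaceReader's own identifiers.
CATEGORIES = ("neutral", "happy", "surprised", "angry", "sad", "disgusted", "fearful")


def assign_label(scores: Optional[dict[str, float]]) -> str:
    """Return the highest-scoring category, or "failed" when no face was found (None).

    Ties go to the category listed first in CATEGORIES.
    """
    if scores is None:
        return "failed"
    return max(CATEGORIES, key=lambda c: scores.get(c, 0.0))


def neutral_accuracy(per_image_scores: list[dict[str, float]]) -> float:
    """Mean of the per-image neutral values, used as the accuracy proxy."""
    return mean(s.get("neutral", 0.0) for s in per_image_scores)


# Hypothetical per-image values in [0, 1] for two neutral faces.
images = [
    {"neutral": 0.95, "sad": 0.10},
    {"neutral": 0.40, "happy": 0.55},
]
print([assign_label(s) for s in images])  # → ['neutral', 'happy']
print(neutral_accuracy(images))
```

Because the accuracy proxy averages continuous values instead of counting labels, an image can contribute partial credit (0.40 in the second example) even when its highest-scoring label is not neutral.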

# Results

Human participants judged 100 images of neutral faces, and FaceReader analyzed the same 100 images. FaceReader successfully analyzed all the images (no "fail to detect"). An independent-samples *t*-test was run to determine whether accuracy scores differed between humans and FaceReader. Variances were not homogeneous, as assessed by Levene's test for equality of variances (*p <* 0.0005). The accuracy scores were lower for humans (*M* = 0.59, *SD* = 0.23) than for FaceReader (*M* = 0.90, *SD* = 0.14), a statistically significant difference (*M* = −0.31, 95% CI [−0.37, −0.26]), [*t*(167.96) = 11.62, *p <* 0.0005]. Additionally, Cohen's effect size (*d* = 1.68) suggested high practical significance (Cohen, 1988). People, on average, recognized 59 images as neutral out of a set of 100 neutral images; FaceReader recognized 90 images as neutral out of the same set. FaceReader outperformed humans by 31 percentage points, i.e., it accurately recognized 31 more images than humans did. See Table A1 in Appendix A in the Supplementary Material for an overview of the accuracy scores for each image.
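The fractional degrees of freedom reported above, *t*(167.96), are what an unequal-variances (Welch) *t*-test produces. The Python sketch below computes Welch's *t* with the Welch-Satterthwaite degrees of freedom, and Cohen's *d* standardized by the pooled standard deviation. The pooled-SD convention for *d* is an assumption, since the text does not say which variant was used. Recomputing from the rounded means and SDs in the text gives values close to, but not identical with, those reported, as expected from rounding. For *p*-values, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic for mean(b) - mean(a) and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d for mean(b) - mean(a), standardized by the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)


# Hypothetical per-image accuracy scores (humans: matching scores; software: neutral values).
humans = [0.55, 0.30, 0.80, 0.62, 0.41, 0.75]
software = [0.95, 0.88, 0.99, 0.70, 0.93, 0.97]
print(welch_t(humans, software), cohens_d(humans, software))
```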

#### Training Set

Potentially, inclusion or exclusion of the KDEF and WSEFEP datasets in the software's training set could bias the results, because the software might be better at recognizing a neutral face as neutral if it had previously seen it. According to the software developer, the KDEF dataset was included in the training set, while the WSEFEP dataset was not. Furthermore, a number of unnamed datasets were also included, resulting in more than 10,000 images in the entire training set. Therefore, inclusion in or exclusion from the training set could potentially explain the results reported above.

To test whether this factor (inclusion vs. exclusion) biased the results, the same statistical tests as above were run separately on KDEF and WSEFEP. Two separate independent-samples *t*-tests were conducted, first only on the KDEF dataset (included in the training set) and then only on the WSEFEP dataset (not included in the training set). As expected, the difference between the accuracy scores of human coders and FaceReader was significant both in the KDEF dataset alone [*t*(138) = 11.12, *p <* 0.0005] and in the WSEFEP dataset alone [*t*(58) = 4.50, *p <* 0.0005], replicating the main result obtained with the datasets combined. Therefore, in this study, whether a dataset was included in the original training set of this particular AFC software did not affect the recognition of neutral faces.
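The reported degrees of freedom for the per-dataset tests, 138 = 70 + 70 − 2 and 58 = 30 + 30 − 2, are consistent with the pooled-variance (Student's) form of the independent-samples *t*-test. The Python sketch below groups per-image scores by dataset and runs that test within each group. The record layout and the example scores are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance (Student's) t for mean(b) - mean(a), with df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def per_dataset_tests(records):
    """Run student_t separately for each dataset.

    records: iterable of (dataset_name, human_score, software_score) tuples,
    one per image (a hypothetical layout). Returns {dataset_name: (t, df)}.
    """
    groups = defaultdict(lambda: ([], []))
    for dataset, human, software in records:
        humans, software_scores = groups[dataset]
        humans.append(human)
        software_scores.append(software)
    return {name: student_t(h, s) for name, (h, s) in groups.items()}


# Hypothetical per-image scores for a handful of images from each dataset.
records = [
    ("KDEF", 0.5, 0.9), ("KDEF", 0.6, 1.0), ("KDEF", 0.7, 0.8),
    ("WSEFEP", 0.4, 0.9), ("WSEFEP", 0.6, 0.7),
]
print(per_dataset_tests(records))
```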

# Discussion

I demonstrated that AFC software massively outperforms human raters in recognizing neutral faces, a finding with important, far-reaching implications. First, I recommend that recognition rates for neutral faces be reported in all future emotion recognition studies. Second, further study of why humans only recognize on average about 60% of neutral faces as neutral is crucial. This study did not test an explanatory mechanism. However, I offer some speculation regarding two theoretical reasons for humans' surprisingly low performance in the sections below.

#### Theoretical Implications

One explanation of my findings is the phenomenon of the smile-as-a-baseline. In contemporary society, the baseline, i.e., neutral, emotional expression might be a smile rather than a technically neutral face. Some researchers (see Lee et al., 2008) have presented evidence that neutral faces look threatening, or at least "negative." This finding could shed light on why humans have so much trouble recognizing "nothing" in truly neutral faces. That is, people are socialized into seeing happiness (or at least some kind of emotion) in the course of interpreting other people's emotions, acting upon that interpretation, and consequently relating better to other people.

Another explanation for my findings could be the phenomenon known as false recognition of emotion (see Fernández-Dols et al., 2008), which is, bizarrely, contradictory to the smile-as-a-baseline explanation. Fernández-Dols et al. (2008) found that semantic rather than perceptual context of the facial stimuli provokes erroneously perceiving a particular emotion in the facial expression. I add to this theory by showing relatively low accuracy for human raters (59%) and high accuracy for AFC software (90%) in recognizing neutral faces. Undoubtedly, AFC software has no semantic framework from which to draw; perhaps the lack of such a framework makes the software less biased in neutral face recognition (i.e., avoiding false-positive errors).

Today's AFC software cannot interpret the surrounding semantic context of the face, whereas people perform that interpretation almost instinctively (Fernández-Dols et al., 2008). From the software's "perspective," seeing a face can only be a neutral experience, while a human might be scanning faces for an extra (contextual or semantic) layer of meaning. However, it must be noted that this argument assumes that the difference between a perfect score of 100% and the actual score of 59% (i.e., 41%) is accounted for by labeling the neutral face with another emotion label (e.g., anger, sadness, disgust, etc.) or even a non-emotional label. For example, the original labels included an "indistinct" expression in the KDEF dataset and "acceptance" and "anticipation" in the WSEFEP dataset. Thus, a new theoretical question arises: would neutral recognition scores differ if only emotional or only non-emotional labels were included in the original datasets? This can be investigated in future studies.

Also worth pointing out is that both the AFC software and the human raters had only a limited number of categories to choose from, as both datasets used the so-called forced-choice method (see Russell, 1994 for criticism). Despite these conditions, the human recognition scores for neutral faces still fell short of Haidt's criterion of 0.70–0.90 accuracy, which is the threshold at which a particular emotion (or lack thereof) in the face could be considered universally recognizable (see Haidt and Keltner, 1999). Beyond these theoretical discussions, my findings also have some practical, real-world implications.

#### Practical Implications

First, for professionals who judge others' nonverbal behavior (e.g., police officers, judges, psychotherapists, etc.), it must be highlighted that human observers are not usually sensitive enough to see a neutral face as neutral. This shortcoming may result in professionals acting on incorrect assumptions (e.g., a police officer subduing a pedestrian after wrongly assuming that a face was angry rather than neutral).

Second, with the advent of wearable devices such as Google Glass, the clear advantage of software in recognizing neutral faces might be exploited. Even though Google Glass has been discontinued as of the beginning of 2015, the genie is out of the bottle – similar powerful devices are expected to be available in the near future. Thus, considering the situation described in the previous paragraph, a police officer equipped with Google Glass could be more effective in executing their duties (in Dubai, this is already the case, as seen in Gulf News from May 20, 2014). Wearable tech like Google Glass that included a utility like AFC software could indicate when others have a neutral face, reducing the chances of police officers engaging in needless interventions, possibly reducing violence overall.

# Limitations

#### Image Quality

One possible limitation of my study is the use of AFC software as an "objective" rater. AFC software has been known, in principle, to code expressions slightly differently depending, for instance, on the positioning of the face in the picture, uneven saturation, or varying hue (see e.g., FaceReader manual; Noldus, 2014). As much as this is a valid argument, it is equally valid for human raters, as people would be similarly influenced by image quality in judging facial expression.

#### Face Morphology

Another possible limitation of my study is the morphology of the face itself (e.g., wrinkles, bulges, folds; see e.g., Hess et al., 2004). For example, some people have a mouth shape that naturally (i.e., when not otherwise emoting) looks like a smile (curved up) or a frown (curved down). Bushy eyebrows, meanwhile, may also give the appearance of a frown. Because of differences in facial morphology, neutral affiliative faces are less readily confused with angry faces than are dominant faces (Hess et al., 2007). I did not control for such morphological differences, either in the FACS coding of the images or in my selection of the images.

However, I argue that I did indeed control for these possible confounds by presenting the exact same set of neutral faces to the AFC software as was presented to the human raters. Any possible differences in image quality, related photo characteristics, or facial morphology were kept constant and were the same for both software and human raters. If the software were "confused" by the quality of the photo or the morphology of the face, the same factor would apply equally to the human raters. In any case, I deem this particular limitation unlikely given the highly standardized nature of the image sets used (see Materials and Methods).

#### Posed vs. Spontaneous Expressions

On the theoretical level, the current manuscript investigates and focuses only on differences in perception (by both software and human coders) for datasets containing clear, posed, and prototypical expressions. Such sets are standard in the field because they allow heightened control over the independent variable to which the participants are exposed (the stimuli themselves) and help define what is meant by a particular emotion. Furthermore, this paper focuses only on neutral expressions, and in principle, the issues surrounding the similarity between posed and spontaneous facial expressions of emotion likely do not apply to a study of neutral faces. This has perhaps never been tested, but it is difficult to think of a theoretical or practical reason why neutral face recognition should differ depending on whether the face comes from a spontaneous or a posed facial expression dataset (e.g., there is no muscle movement in neutral faces, as there is in emotional expressions). The current paper investigates only neutral expressions, and thus the debate on posed vs. spontaneous expressions is likely not applicable to neutral expressions to the same degree as it is to emotional expressions. However, future studies may find it worthwhile to test software vs. human accuracy on spontaneous expression datasets to test this assumption empirically.

### Coding Task

My hope is that the clarifications in the introduction and methods sections on the procedure and internal workings of the software, together with the detailed description of the participants' task, provide sufficient evidence that the tasks of the software and the human raters were similar in nature, and that humans should inherently have had an advantage over the software. However, I recognize that the human judges in the two datasets and the FaceReader software had slightly different recognition tasks, as the choice set differed across all three instances, and this might have biased the results.

Nonetheless, it must be recognized that no two existing datasets are constructed in exactly the same way. Human judging in KDEF and WSEFEP, as well as in other well-known datasets (e.g., JACFEE, MSFDE, ADFES, RaFD, UCDSEE, FACES), varied in at least (a) whether human ratings were included; (b) the recognition procedure; and (c) whether a "neutral" and/or "other" label was offered. This is why these other datasets could not be included: only two datasets were identified that contained human scores for neutral faces and followed a protocol similar to the one the AFC software follows. WSEFEP and KDEF met those criteria, hence their use in this paper. Evaluating other well-known datasets would require access to their raw images and new judgments of them by human coders, e.g., on a crowdsourcing platform, which was deemed beyond the scope of the current paper.

### Anchoring Effect

Another possible limitation of this study lies in using the "matching scores" (i.e., the accuracy) for human raters reported with the datasets themselves (WSEFEP and KDEF). In both datasets, the raters were also judging other basic emotional expressions (to validate those datasets), as in repeated-measures experiments; see Olszanowski et al. (2015; WSEFEP) and Goeleven et al. (2008; KDEF) for details. Even though each rater saw only a limited number of images and the order was randomized, rating faces other than neutral ones could have produced a so-called anchoring effect (e.g., see Russell, 1994). In other words, a previously seen emotion could have influenced the recognition of the subsequent expression. Nevertheless, KDEF and WSEFEP (which followed the KDEF methodology) are typical examples of facial expression datasets used widely in research. For the developers of such sets, it would be impractical to expose each human rater to only one subset of images, as this would require an enormous sample of raters. A possible solution would be to present the subset of neutral images in random order to a number of independent judges recruited from crowdsourcing platforms (e.g., MTurk). I may adopt that methodology in future studies.

### References


# Acknowledgment

The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme FP7/2007-2013/ under REA grant agreement 290255.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01386


**Conflict of Interest Statement:** Peter Lewinski has worked – as a Marie Curie Research Fellow – for Vicarious Perception Technologies B.V., Amsterdam – an artificial intelligence company that develops FaceReader software for Noldus Information Technologies B.V. He is also a research fellow in ASCoR.

*Copyright © 2015 Lewinski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Working memory load disrupts gaze-cued orienting of attention**

*Anna K. Bobak and Stephen R. H. Langton\**

*School of Natural Sciences, University of Stirling, Stirling, UK*

A large body of work has shown that a perceived gaze shift produces a shift in a viewer's spatial attention in the direction of the seen gaze. A controversial issue is the extent to which this gaze-cued orienting effect is stimulus-driven or under a degree of top-down control. In two experiments we show that the gaze-cued orienting effect is disrupted by a concurrent task that has been shown to place high demands on executive resources: random number generation (RNG). In Experiment 1, participants were faster to locate targets that appeared in gaze-cued locations than targets that appeared in locations opposite to those indicated by the gaze shifts, while simultaneously and continuously reciting aloud the digits 1–9 in order; however, this gaze-cueing effect was eliminated when participants continuously recited the same digits in a random order. RNG also interfered with gaze-cued orienting in Experiment 2, in which participants made a speeded letter identification response. Together, these data suggest that gaze-cued orienting is actually under top-down control. We argue that top-down signals sustain a goal to shift attention in response to gazes, such that orienting ordinarily occurs when gazes are perceived; however, this goal cannot always be maintained when multiple competing goals are simultaneously active in working memory.

#### *Edited by:*

*Andrew Bayliss, University of East Anglia, UK*

#### *Reviewed by:*

*Nathalie George, Centre National de la Recherche Scientifique, France*

*Ramesh K. Mishra, University of Hyderabad, India*

#### *\*Correspondence:*

*Stephen R. H. Langton, School of Natural Sciences, University of Stirling, Stirling FK9 4LA, UK*

*stephen.langton@stir.ac.uk*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 24 April 2015; Accepted: 05 August 2015; Published: 24 August 2015*

#### *Citation:*

*Bobak AK and Langton SRH (2015) Working memory load disrupts gaze-cued orienting of attention. Front. Psychol. 6:1258. doi: 10.3389/fpsyg.2015.01258*

**Keywords: gaze-cued attention, working memory, top-down control, random number generation, executive load**

# **Introduction**

In various social contexts, people tend to take notice of others' gaze direction. The past two decades have seen a large number of studies investigating this social orienting phenomenon using a modified version of Posner's (1980) cueing paradigm (see Frischen et al., 2007, for a review). In this task, response times (RTs) to detect, identify, or localize targets appearing in gazed-at locations (i.e., cued targets) are compared with responses to targets in locations that have not been gazed at (i.e., uncued targets). In line with the view that people tend to attend to where others are looking, studies have consistently shown shorter RTs to cued than to uncued targets (e.g., Friesen and Kingstone, 1998; Driver et al., 1999; Langton and Bruce, 1999). The authors of the original studies demonstrating this gaze-cueing effect argued for its reflexive, stimulus-driven nature, a claim supported by more recent evidence suggesting that the effect is immune to interference from a concurrent working memory (WM) load (Law et al., 2010; Hayward and Ristic, 2013). The aim of this paper is to revisit this recent evidence and to investigate whether a more demanding concurrent WM task will disrupt gaze-cued orienting. Such a result would suggest that, rather than being a stimulus-driven reflex, gaze cueing is better understood as being under a degree of top-down control.

Researchers have drawn a broad distinction between, on the one hand, exogenous, bottom-up, reflexive, or stimulus-driven attention and, on the other, endogenous, top-down, or wilful attention (e.g., Posner, 1980; Jonides, 1981). Several lines of evidence suggest that the gaze-cueing effect is more like the former than the latter. First, it emerges even when participants are explicitly asked to ignore the faces that provide the directional cues (Langton and Bruce, 1999). Second, the gaze-cueing effect is observed when participants are aware that gaze cues do not reliably predict the locations of the forthcoming targets (i.e., targets are equally likely to appear in any of the possible target locations following any gaze cue), and even when targets are actually more likely to appear in uncued than in cued locations (Driver et al., 1999; Kuhn and Kingstone, 2009). Third, gaze cueing occurs even when participants know with 100 per cent certainty that targets will appear in a particular location (Galfano et al., 2012). Finally, gaze cues facilitate attention shifts even when a peripheral target is accompanied by an irrelevant sudden-onset distractor in the mirror-opposite location (Friesen et al., 2005).

Despite this compelling evidence for the stimulus-driven character of social orienting, some authors suggest that a top-down component is involved in the process (e.g., Vecera and Rizzo, 2004, 2006; Koval et al., 2005). For example, Vecera and Rizzo (2004, 2006) demonstrated that patient EVR, who had sustained large lesions to the orbitofrontal cortex (a part of the brain linked to executive functioning), showed normal exogenous orienting of attention in response to sudden-onset peripheral cues, but did not show an orienting response to centrally presented gaze cues. This was the case irrespective of how well the gazes predicted the likely location of the targets (50 and 75% accuracy). As a result of the neurological damage, EVR was also left with certain difficulties in goal-directed behavior, such as typical daily activities or decision making when presented with a problem (Vecera and Rizzo, 2004). The authors therefore argued that gaze-directed orienting is subject to top-down modulation in a similar way to other behaviors that require sustained and selective attention to socially relevant cues, such as words and arrows. A recent study by Tipples (2008) reported that, indeed, individual differences in self-reported attentional control are linked to orienting cued by arrows and gazes, but not to orienting cued by peripherally presented sudden-onset stimuli.

Ostensibly, these neuropsychological data do seem to suggest that gaze-cued orienting is less like a stimulus-driven reflex and more akin to endogenous, wilful orienting of attention. However, as pointed out by Frischen et al. (2007), we should be cautious about over-interpreting these results, because it is unclear whether EVR displayed a normal pattern of cueing before sustaining the brain lesion. Hietanen et al. (2006) pointed out that not all individuals display the typical pattern of reflexive orienting to gaze cues, and EVR could have been one of them. Nevertheless, Vecera and Rizzo's work certainly hints at top-down involvement in gaze-cued orienting.

If gaze-cued attention is modulated by top-down processes, WM is the likely mechanism responsible for the modulation. Indeed, numerous studies have shown that WM is linked to attentional control in the antisaccade task (Kane et al., 2001) and that attention to visual distractors is influenced by the content of WM (Spinks et al., 2004; Lavie and De Fockert, 2005). Moreover, what is attended to has been found to be congruent with the content of WM (Downing, 2000; Pratt and Hommel, 2003; Olivers et al., 2006; Soto et al., 2008; Olivers, 2009). WM is therefore a convincing candidate for a system controlling "endogenous" shifts of attention, which may include those made in response to gazes. However, across two experiments, Law et al. (2010) found no evidence for WM involvement in gaze cueing. Although RTs to peripheral targets following a gaze cue were slower overall when participants were engaged in a concurrent high-load WM task (retaining a five-digit sequence during each gaze-cueing trial) than when they performed a low-load WM task (retaining a single digit) or no concurrent secondary task, the gaze-cueing effect remained intact across all secondary task conditions. A recent study by Hayward and Ristic (2013) yielded similar results: once again, gaze-cued orienting was resilient to a concurrent WM load (retaining a five-digit sequence). However, the authors went a step further by demonstrating that their concurrent WM task did in fact disrupt endogenous orienting of attention, suggesting that gaze-cued orienting and endogenous orienting are independent processes.

In summary, although the work of Vecera and Rizzo (2004, 2006) has suggested that top-down factors might be involved in gaze-cued orienting of attention, the effect has proved resistant to the demands imposed by concurrent cognitive tasks (Law et al., 2010; Hayward and Ristic, 2013). The question of whether gaze-cued orienting is best described as an exogenous or an endogenous process therefore remains unresolved.

In this paper we revisit the finding that gaze-cued orienting is unaffected by a concurrent cognitive load. The digit load task used by both Law et al. (2010, Experiment 1) and Hayward and Ristic (2013) has at least two shortcomings. First, it does not necessarily place large demands on WM resources. For example, Baddeley and Hitch (1974, cited in Baddeley, 1990) showed that participants could maintain and rehearse aloud sequences of up to eight digits while simultaneously carrying out reasoning, learning, and comprehension tasks, with only minimal interference; Law et al. (2010) and Hayward and Ristic (2013) each used just five-digit sequences in their high-load secondary tasks. Second, there is a growing body of research showing that WM is flexible and can prioritize between competing goals (see Ma et al., 2014, for a review). Pertinently, maintenance rehearsal, the resource-demanding aspect of the digit load task employed in these studies, could have been suspended during the brief period when participants were performing the gaze-cueing task. To see how, consider the sequence of events on each trial in the relevant experiments reported by Law et al. (2010) and Hayward and Ristic (2013). Following the presentation of a fixation cross, participants were shown the to-be-retained digit sequence for 1500 ms. The fixation cross then reappeared for 1000 ms before the presentation of the gazing face, which was displayed for up to 1000 ms, depending on the stimulus onset asynchrony (SOA) condition. This was followed by the presentation of the target, which demanded either a localisation response (Law et al., 2010), averaging around 450 ms under digit load conditions, or a detection response (Hayward and Ristic, 2013), averaging around 400 ms.
Finally, participants were given a WM prompt (a single digit from the retained sequence), to which they responded by entering the next digit in the five-digit sequence. Participants could therefore have encoded the digit sequence on its presentation and continued to rehearse it for up to 2500 ms before the gaze cue was presented. Rehearsal could then have been suspended while the gaze cue was displayed and the target was presented and responded to, a period that would have amounted to at most 1500 ms. During this time, WM resources could have been available to initiate an attention shift in the direction of the gaze cue, producing the normal gaze-cueing effect on RTs. Rehearsal of the digit sequence could then have been resumed successfully because, as shown by Baddeley (2002), material can be stored passively in WM (i.e., without rehearsal) for up to 2000 ms before decay renders it irretrievable. The sequence would therefore still be available in WM for subsequent rehearsal and for the response to the memory prompt.
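The timing argument above reduces to simple arithmetic on the trial phases. The sketch below is our illustration, not part of the original studies; it treats the reported mean localisation RT of about 450 ms (Law et al., 2010) as the duration of the target phase, which is a simplification, since individual RTs vary from trial to trial.

```python
# Trial-phase durations (ms) as described for the digit-load studies.
DIGITS_MS = 1500         # to-be-retained digit sequence on screen
FIXATION_MS = 1000       # fixation cross before the gazing face
MAX_CUE_MS = 1000        # longest gaze-cue display (depends on SOA)
TARGET_RT_MS = 450       # approximate mean localisation RT under digit load
PASSIVE_STORE_MS = 2000  # unrehearsed storage limit cited from Baddeley (2002)

# Time available for encoding and rehearsal before the gaze cue appears.
rehearsal_window = DIGITS_MS + FIXATION_MS

# Time during which rehearsal would have to be suspended (cue + target phase).
suspension_window = MAX_CUE_MS + TARGET_RT_MS

# Could the digits survive the suspension without rehearsal?
survives = suspension_window <= PASSIVE_STORE_MS

print(rehearsal_window, suspension_window, survives)  # 2500 1450 True
```

Substituting the roughly 400 ms detection RTs reported by Hayward and Ristic (2013) shortens the suspension window to 1400 ms, leaving an even larger margin below the 2000 ms limit.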

Our argument is therefore that, regardless of whether or not the digit load task places excessively high demands on participants' executive resources, those demands are not necessarily imposed during the period when participants are shifting attention in response to the seen gazes. What is clearly needed is a secondary task that must genuinely be carried out simultaneously and continuously with the gaze-cueing procedure. Law et al. (2010) attempted one such task. In their second experiment, participants carried out a sequence of gaze-cueing trials while listening to an auditory description of a matrix pattern, which they used to build up a mental image of a shape. Participants visualized a 5 × 3 grid of unfilled squares. They were then presented with a 15-word sequence consisting of the words "filled" and "unfilled," which instructed them which of the squares on their imaginary grid should be filled in and which should be left blank. The resulting grid of filled and unfilled squares depicted one of the digits 1–9, which participants were then asked to report. This task clearly demands both manipulation and maintenance of visuospatial information, and would seem to require that processing be carried out simultaneously with the gaze-cueing task. Gaze-cued orienting was nonetheless unaffected by this secondary task, leading the authors to conclude that it is a largely stimulus-driven reflex. However, as with the digit load task, participants could have strategically suspended the processing aspect of the secondary task (the mental filling-in of the squares) until after the gaze trials had been completed. The task would then have become one of maintaining a verbal sequence in memory during the gaze-cueing trials.
Alternatively, participants could have allocated resources to building up the mental image between gaze-cueing trials, briefly suspended this while the gaze cues and targets were presented, and then resumed the mental grid filling before the start of the next gaze-cueing trial. Both possibilities are consistent with the flexible allocation of WM resources according to the prioritized goal (Ma et al., 2014).
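To make the matrix task concrete, the sketch below turns a 15-word "filled"/"unfilled" sequence into a 5 × 3 grid. The reading of 5 × 3 as five rows of three squares, the row-by-row filling order, and the example pattern for the digit 7 are our assumptions for illustration; the exact encoding used by Law et al. (2010) is not described here.

```python
ROWS, COLS = 5, 3  # assumed: five rows of three squares, filled row by row

def build_grid(words):
    """Render a 15-word 'filled'/'unfilled' sequence as rows of '#' and '.'."""
    if len(words) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} words, got {len(words)}")
    if any(w not in ("filled", "unfilled") for w in words):
        raise ValueError("words must be 'filled' or 'unfilled'")
    cells = ["#" if w == "filled" else "." for w in words]
    return ["".join(cells[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]

# Hypothetical description of the digit 7: a full top row, then the right column.
seven = ["filled"] * 3 + ["unfilled", "unfilled", "filled"] * 4
for row in build_grid(seven):
    print(row)
```

Because the grid can only be rendered once all 15 words are known, a listener could in principle store the words as a verbal list and defer the visuospatial step, which is the strategy considered above.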

In the experiments reported in this paper we employed an executively demanding secondary task that must genuinely be completed concurrently with the gaze-cueing procedure: random number generation (RNG). Generating random sequences from a well-known and well-defined set of items, such as the numbers one to nine or the letters of the alphabet, requires participants to generate and run a plan for retrieving an item from the appropriate set. They must keep track of how often they have generated each item and compare sequences to some conception of randomness. If recent sequences are judged to be insufficiently random, a new strategy must be devised and initiated. In addition, well-learned or stereotypical sequences (e.g., 1-2-3-4, or A-B-C-D) must be inhibited. Random sequence generation therefore seems to draw on a range of executive processes, a claim supported by the work of Miyake et al. (2000) and Jahanshahi et al. (1998). For example, the latter group showed that transcranial magnetic stimulation of the left dorsolateral prefrontal cortex, an area associated with executive functioning, impaired participants' ability to generate random sequences of numbers. Concurrent generation of random sequences has also been shown to impair a range of tasks, including learning simple contingencies (Dienes et al., 1991), performing mental arithmetic (Logie et al., 1994), syllogistic reasoning (Gilhooly et al., 1993), and choosing appropriate moves in chess and remembering the positions of chess pieces (Robbins et al., 1996). Random number or random interval generation, unlike producing equal intervals, was reported to disrupt performance on the Corsi Blocks Task (Vandierendonck et al., 2004) and on other tasks tapping executive components of spatial WM (Towse and Cheshire, 2007).
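Two of the demands listed above, keeping item frequencies balanced and avoiding stereotyped runs, can be quantified on a recorded sequence. The sketch below uses two simple indices of our own choosing (a chi-square statistic on digit counts, and the share of successive pairs that differ by exactly one); they are illustrative only and are not measures used in this study, whose recorded sequences were not analyzed. It contrasts ordered counting with an idealized generator that samples the digits with replacement.

```python
import random
from collections import Counter

DIGITS = range(1, 10)

def adjacency_rate(seq):
    """Share of successive pairs that step up or down by exactly one (e.g. 3-4, 7-6)."""
    pairs = list(zip(seq, seq[1:]))
    return sum(abs(a - b) == 1 for a, b in pairs) / len(pairs)

def frequency_chi2(seq):
    """Chi-square statistic of digit counts against a uniform distribution over 1-9."""
    counts = Counter(seq)
    expected = len(seq) / len(DIGITS)
    return sum((counts[d] - expected) ** 2 / expected for d in DIGITS)

# Ordered counting: 90 digits (90 s at one digit per second) of 1-9 in sequence.
counting = [d for _ in range(10) for d in DIGITS]

# Idealized random generation: 90 independent draws with replacement.
rng = random.Random(0)
ideal = [rng.choice(DIGITS) for _ in range(90)]

print(round(adjacency_rate(counting), 3), frequency_chi2(counting))  # 0.899 0.0
print(round(adjacency_rate(ideal), 3), round(frequency_chi2(ideal), 1))
```

Ordered counting is perfectly balanced in frequency (a chi-square of 0) yet highly stereotyped, which illustrates why tracking frequencies alone does not yield a random sequence; for a uniform generator, about 16/81 (roughly 0.2) of successive pairs are adjacent by chance.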

The evidence that RNG taps executive processes, particularly those involved in spatial WM tasks, and the fact that it can be performed continuously make it a good candidate for a secondary task with which to investigate the impact of WM on the gaze-cueing effect. In each of the experiments reported here, participants performed blocks of standard gaze-cueing trials with target localization (Experiment 1) or target identification (Experiment 2) responses. In the easy secondary task conditions, participants repeatedly recited aloud the digits 1 to 9 in sequence at a rate of one digit per second while performing the gaze-cueing trials. In the hard secondary task conditions, participants generated random numbers from the same set of digits, again at a rate of one per second. Counting aloud in order is a stereotyped response, which should not be demanding of executive resources. Gaze-cued orienting, whether stimulus-driven or involving a volitional component, ought to be observed under these conditions. However, if attention shifts in response to seen gazes share executive processes with RNG, we would expect the effect to be reduced or absent when participants are engaged in the hard secondary task.

# **Experiment 1**

# **Materials and Methods**

# Participants

University of Stirling students and visitors (17 women and 7 men; mean age 23.71 years, range 18–40 years) were recruited through the online sign-up system and online advertising. Psychology students were awarded experimental credits for their participation, and the remaining volunteers participated on an entirely voluntary basis. All participants reported normal or corrected-to-normal vision. All experimental procedures were approved by the University of Stirling Research Ethics Committee and adhere to the principles of the 1964 Helsinki Declaration. Written informed consent was obtained from all participants.

# Materials and Apparatus

# *Primary gaze cueing task*

A color photograph of a male face with a neutral expression, cropped to remove all external features and subtending 5.7° × 3.7° of visual angle, was used in the experiment. The face model was selected from the Radboud Faces Database (Langner et al., 2010), and the stimuli were prepared using Adobe Photoshop 7.0. A cross subtending 0.3° served as the fixation point at the beginning of each trial. The target was a white asterisk subtending 0.3°, located at eye level 5 cm (4.1°) to the left or right of the midpoint of the photograph.
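Stimulus sizes here are given in degrees of visual angle, which depend on the 70 cm viewing distance stated in the Procedure. The sketch below shows the standard trigonometric conversion; the on-screen face size in centimeters it yields is our back-calculation, not a value reported by the authors.

```python
import math

VIEWING_DISTANCE_CM = 70.0  # viewing distance stated in the Procedure

def eccentricity_deg(offset_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Angle between fixation and a point offset_cm away on the screen."""
    return math.degrees(math.atan(offset_cm / distance_cm))

def size_deg(extent_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Full angle subtended by an object of extent_cm centred on the line of sight."""
    return math.degrees(2 * math.atan(extent_cm / (2 * distance_cm)))

def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Inverse of size_deg: on-screen extent needed to subtend angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

print(round(eccentricity_deg(5.0), 1))  # 4.1
print(round(extent_cm(5.7), 1), round(extent_cm(3.7), 1))  # approximate face size in cm
```

The 4.1° figure for the 5 cm target offset is reproduced, which is a useful sanity check that the stated distance and eccentricity agree.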

# *Secondary task*

In the hard secondary task condition, participants were required to produce random sequences of the numbers 1 to 9; in the easy condition, they recited the digits 1 to 9 aloud in sequence. In both conditions the rate was one digit per second, paced by a JOYO JM-65 metronome. Sequences were recorded using an Olympus VN-5500 Digital Voice Recorder to ensure that participants were indeed performing the relevant secondary task.

All stimuli were presented against a black background on a 17-inch monitor set to 1152 × 864 pixels with a refresh rate of 75 Hz, using E-Prime software (Psychology Software Tools, Pittsburgh, PA, USA). Reaction times and responses to targets were registered using a Serial Response Box (Psychology Software Tools, Pittsburgh, PA, USA).

#### Design

The experiment employed a within-subjects design with three independent variables: cue validity (cued, uncued), secondary task (hard, easy), and SOA (300 ms, 1000 ms). The dependent variable was RT to targets.

### Procedure

All participants were seated 70 cm from the computer screen in a dimly lit room. Participants performed the secondary tasks concurrently with the gaze-cueing trials. In the hard secondary task condition, participants were asked to imagine a hat containing an infinite number of numbers from one to nine and to pull them out one at a time, replacing each after it had been read. They were asked to generate the numbers aloud at a rate of one per second, paced by the sound of a metronome, and were informed that their voice would be recorded for further analysis. In the easy secondary task condition, participants were instructed to recite the digit sequence from 1 to 9 repeatedly at a rate of one digit per second. Again, participants were asked to keep pace with the metronome and were informed that their voice was being recorded.

An example of a gaze-cueing trial is illustrated in **Figure 1**. All trials began with a fixation cross displayed on the screen for 1000 ms. This was followed by a directly gazing face for 750 ms, after which the gaze shifted to the left or right. The gaze cue was displayed for either 300 ms or 1000 ms before the onset of the target stimulus (i.e., the SOA). The gaze cue did not predict the target location (50% cued and 50% uncued trials). Both the cue and the target remained on screen until response. Participants were asked to press the right foremost button on the serial response box for targets appearing on the right side of the face and the left foremost button for targets appearing on the left.

Participants completed a set of four blocks of 32 trials under each secondary task condition. These comprised 16 repetitions of the factorial combinations of cue validity (cued, uncued), SOA (300 ms, 1000 ms), and gaze direction (left, right). Whether participants began with the set of four blocks under easy or hard secondary task conditions was counterbalanced across participants. Before each set of four blocks, participants completed a block of 16 practice trials. Blocks in each set consisted of trials drawn randomly without replacement from the pool of 128 trials. Participants were given 5 s before the first trial in each block to begin reciting the appropriate digit sequence (i.e., random or sequential).
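The trial structure just described (2 × 2 × 2 cells, 16 repetitions, 128 trials dealt into four blocks of 32) can be sketched as follows. The experiment itself ran in E-Prime; this Python version is a hypothetical reconstruction that assumes the pool is shuffled once and split into consecutive blocks, and it does not model practice trials or the counterbalancing of secondary task order.

```python
import itertools
import random

VALIDITY = ("cued", "uncued")
SOA_MS = (300, 1000)
GAZE = ("left", "right")
REPETITIONS = 16
BLOCKS = 4

def target_side(validity, gaze):
    """Cued targets appear where the face looks; uncued targets on the opposite side."""
    if validity == "cued":
        return gaze
    return "right" if gaze == "left" else "left"

def build_blocks(seed=None):
    cells = itertools.product(VALIDITY, SOA_MS, GAZE)  # 8 factorial cells
    pool = [
        {"validity": v, "soa_ms": s, "gaze": g, "target": target_side(v, g)}
        for v, s, g in cells
        for _ in range(REPETITIONS)
    ]  # 8 x 16 = 128 trials
    random.Random(seed).shuffle(pool)  # random order, i.e. drawing without replacement
    size = len(pool) // BLOCKS
    return [pool[i * size:(i + 1) * size] for i in range(BLOCKS)]

blocks = build_blocks(seed=1)
print([len(b) for b in blocks])  # [32, 32, 32, 32]
```

Shuffling the full pool before splitting keeps every cell at exactly 16 trials per secondary task condition, while the cells are only approximately balanced within any single block.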

Volunteers were informed that the gaze direction of the displayed face did not reliably predict the location of the upcoming target. They were also advised that both tasks were equally important and that they should aim to maximize performance on each of them.

## **Results**

Gaze-cueing trials with errors were removed from analysis, resulting in the loss of 1.47% of the data. From the remaining data, median RTs were computed for each participant in each condition of the experiment. The interparticipant means of these RTs are recorded in the top row of **Table 1**. The data clearly violated the homogeneity of variance assumption (Hartley's *F*<sub>max</sub> = 8.77, *p* < 0.01). The data were therefore transformed by computing the reciprocal of each participant's median RT in each condition of the experiment. This transformation stabilized the variances (Hartley's *F*<sub>max</sub> = 2.10, *p* > 0.05 after the transformation), as can also be seen in **Table 1**, which shows the means and standard deviations of the transformed data (middle row) and the corresponding means after conversion back to the original scale (bottom row). All inferential statistics were conducted on the reciprocally transformed data.



*The units on the original scale are milliseconds. Units on the transformed scale are milliseconds<sup>−1</sup>. The table also shows the percentage of correct responses in each condition.*
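Hartley's *F*<sub>max</sub> is the ratio of the largest to the smallest variance across conditions, and averaging reciprocal RTs before converting back to milliseconds amounts to taking a harmonic mean. The sketch below illustrates both steps on invented RTs (the study's per-participant data are not reproduced here); it is an illustration, not the authors' analysis script, and it does not compute the *p* values, which require *F*<sub>max</sub> critical-value tables.

```python
from statistics import mean, variance

def hartley_fmax(groups):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

def reciprocal(rts_ms):
    """Reciprocal transform: RT in ms becomes speed in ms^-1."""
    return [1.0 / rt for rt in rts_ms]

def back_transform_mean(rts_ms):
    """Mean on the reciprocal scale converted back to ms (the harmonic mean)."""
    return 1.0 / mean(reciprocal(rts_ms))

# Invented per-participant median RTs (ms) for a fast and a slow condition.
easy = [350, 360, 370, 380, 390]
hard = [400, 450, 500, 600, 700]

print(round(hartley_fmax([easy, hard]), 1))  # 58.0
print(round(hartley_fmax([reciprocal(easy), reciprocal(hard)]), 1))
print(round(back_transform_mean(hard), 1))
```

In this toy example the slow condition's long tail inflates its variance, and the reciprocal transform compresses that tail, so the variance ratio drops markedly; the back-transformed mean sits below the arithmetic mean (530 ms) for the same reason.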

The transformed data were subjected to an analysis of variance (ANOVA) with cue validity, secondary task, and SOA as repeated-measures factors. There was a significant main effect of secondary task, *F*(1, 23) = 72.89, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.76, reflecting overall slowing of reaction times under the hard secondary task condition (*M* = 495 ms) relative to the easy task (*M* = 374 ms). There was also a significant main effect of SOA, *F*(1, 23) = 18.86, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.45, with faster reaction times to targets appearing 1000 ms after the onset of the gaze cue (*M* = 416 ms) than after 300 ms (*M* = 436 ms). The effect of cue validity did not reach significance, *F*(1, 23) = 2.06, *p* = 0.17, η<sup>2</sup><sub>p</sub> = 0.08, showing that, overall, participants responded no faster to cued targets (*M* = 424 ms) than to uncued targets (*M* = 428 ms). However, the main effects were qualified by a significant interaction between secondary task and cue validity, *F*(1, 23) = 6.85, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.23, confirming that the gaze-cueing effect was modulated by the secondary task demands. Simple main effects analyses revealed that, under easy secondary task conditions, cued targets (*M* = 369 ms) were located faster than uncued targets (*M* = 379 ms), *F*(1, 46) = 8.69, *p* < 0.01, but that under hard secondary task conditions, performance for cued targets (*M* = 499 ms) was equivalent to that for uncued targets (*M* = 492 ms), *F*(1, 46) = 1.42, *p* = 0.24.
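The reported effect sizes can be recomputed from the *F* ratios with the standard identity η<sup>2</sup><sub>p</sub> = *F*·df<sub>effect</sub> / (*F*·df<sub>effect</sub> + df<sub>error</sub>), and the gaze-cueing effect is the uncued minus cued RT. The sketch below applies both to the values reported above as a consistency check; the means are back-transformed means, so the cueing effects are approximate.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

def cueing_effect(uncued_ms, cued_ms):
    """Gaze-cueing effect: positive values mean faster responses to cued targets."""
    return uncued_ms - cued_ms

reported_f = {
    "secondary task": 72.89,
    "SOA": 18.86,
    "cue validity": 2.06,
    "task x cue validity": 6.85,
}
for effect, f in reported_f.items():
    print(effect, round(partial_eta_squared(f, 1, 23), 2))
# prints 0.76, 0.45, 0.08 and 0.23, matching the reported values

print(cueing_effect(379, 369))  # easy secondary task: 10
print(cueing_effect(492, 499))  # hard secondary task: -7
```

The sign flip from +10 ms to −7 ms is the interaction in concrete terms: a conventional cueing advantage under the easy task and none under RNG.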

Finally, the ANOVA revealed a marginally significant interaction between cue validity and SOA, *F*(1, 23) = 3.79, *p* = 0.06, reflecting the observation that at the 300 ms SOA cued targets (*M* = 431 ms) were responded to faster than uncued targets (*M* = 442 ms), whereas at the 1000 ms SOA the trend was in the opposite direction, with slightly faster location of uncued targets (*M* = 415 ms) than cued targets (*M* = 418 ms). No other interactions reached significance (*p*s > 0.13).<sup>1</sup>

The percentages of correct responses are also shown in **Table 1**. These data show that participants performed the target localization task very well indeed, making errors on just 1.4% of trials. Moreover, there is no evidence of a speed–accuracy trade-off that would compromise interpretation of the RT data. As performance was essentially at ceiling in all conditions, no further analyses were conducted on these data.

#### **Discussion**

The overall pattern of the data indicated a cueing effect under easy dual-task conditions, which disappeared when participants were engaged in an executively demanding secondary task. Participants were also slower and somewhat less accurate at target localization under hard than under easy secondary task conditions, which suggests that generating random number sequences is indeed more demanding than reciting ordered sequences of digits. However, although participants' accuracy was slightly lower under hard secondary task conditions, it was still very high, suggesting that participants did not simply abandon the target localization task or avert their gaze from the screen while performing the demanding secondary task. One possibility, however, is that participants maintained relatively high localization accuracy under difficult secondary task conditions by compromising their performance in generating random numbers. For example, they might have wavered from the requirement to generate numbers at a rate of one per second, or they may not have maintained an acceptable level of randomness. As we did not analyze these data, we cannot address this possibility directly. The available data do suggest, however, that the RNG task had a detrimental effect on gaze-cued orienting. So, whether or not participants strayed from the maximum demands of the RNG task, it was still sufficient to disrupt gaze-cued orienting relative to performance in the easy secondary task condition.

The results of Experiment 1 imply that the mechanisms involved in generating random number sequences are also involved in generating an attention shift in response to a seen gaze. A key assumption underlying this interpretation is that the RT difference between uncued and cued targets is caused by the allocation of visual attention in response to the gaze cue. However, an alternative interpretation is that this RT difference reflects a difference in the degree of stimulus-response compatibility between these cases. The argument is as follows. First, there is evidence that gazes and other social cues automatically trigger the generation of spatial codes (Langton et al., 1996; Langton, 2000; Langton and Bruce, 2000). It is reasonable to assume, therefore, that the gaze cues in the present experiment also triggered the generation of such codes. On cued trials, the gazes would generate spatial codes that are the same as those required for the key-press responses (e.g., gaze right, target right); on uncued trials, these codes would differ (e.g., gaze right, target left). The RT difference between uncued and cued conditions could therefore result from, for example, difficulties in response selection rather than from any shift of visuospatial attention. The interaction we observed in Experiment 1 might therefore reflect the influence of RNG on response selection processes rather than on gaze-cued orienting of attention. This problem was addressed in Experiment 2.

<sup>1</sup> To examine whether the interference of RNG with gaze-cued orienting might stem from an incompatibility between the spatial code generated by the appearance of the target and one that might be associated with generating random numbers (e.g., producing number sequences from left to right in visual imagery), we also performed an ANOVA with target location (left vs. right) as an additional repeated-measures factor. However, target location interacted with neither of the other two factors, nor did the predicted interaction between target location, secondary task, and cue validity reach statistical significance (*p* = 0.84).

# **Experiment 2**

To rule out a response selection account of the cueing effect observed in Experiment 1, in Experiment 2 we used a target identification task rather than a target localization task. We also included a condition that ought to be immune to a demanding secondary task: one in which the identity of a target is assessed as a function of whether or not its location has been indicated by a peripheral luminance change.

# **Materials and Methods**

### Participants

Undergraduates from the University of Stirling (*N* = 32, 14 female, 18 male) were recruited for this experiment. They received course credit for participation. The mean age was 21.59 years (range: 18–44 years).

### Materials and Apparatus

These were identical to those used in Experiment 1 in all but the following respects. The target stimuli for both the gaze cueing and peripheral cueing tasks were the letters T and F in 18-point Arial font. In the peripheral cueing task, two grey boxes appeared centered 4.1° to the left and right of the central fixation cross. The lines of these boxes were 1 pixel thick, and the boxes measured 1.6° in height and 1.4° in width. The spatial cue in this condition was rendered by replacing one of the grey placeholder boxes with an identically sized white box, the lines of which were 6 pixels thick.

### Design

The experiment had a 2 × 2 × 2 design, with cue type (gaze cue, peripheral cue) as a between-subjects independent variable and cue validity (cued, uncued) and task type (hard, easy) as within-subjects variables. SOA was not manipulated in this experiment and was instead fixed at 300 ms for both cue types. This SOA produced the largest gaze-cueing effect in Experiment 1 and is also short enough to elicit a cueing effect from peripheral onsets (Müller and Rabbitt, 1989).

### Procedure

The easy and hard secondary tasks were identical to those used in Experiment 1. The procedure for gaze-cueing trials was identical to that of Experiment 1, save for the facts that the SOA was fixed at 300 ms for all trials, targets comprised the letters T and F, and participants were asked to identify the target letter on each trial by pressing the topmost button on the response box for the letter T and the bottom button for the letter F.

Trials in the peripheral cue condition began with a 2000 ms presentation of the display comprising the fixation cross and placeholders. One of the placeholder boxes was then replaced by the white cue box. The target letter (T or F) appeared, centered in either the cued or the uncued box, 300 ms after the onset of the cue, and remained on the screen until the participant had responded.

Participants completed 64 trials under each secondary task condition, divided into two blocks of 32 trials. A block of 16 practice trials preceded each pair of experimental blocks. The order in which participants completed each pair of easy and hard secondary task blocks was counterbalanced across participants, and participants were randomly allocated to either the gaze-cueing or the peripheral cueing task, with the constraint that an equal number took part in each task.

# **Results**

Participants made errors on 4% of all gaze-cueing trials in Experiment 2, and these responses were removed from subsequent analyses of the RT data. Median RTs were then computed as in Experiment 1, and the inter-participant means and standard deviations of these data are presented in **Table 2**. Once again, because of the heterogeneity of variance evident in the data (Hartley's *F*<sub>max</sub> = 18.84, *p* < 0.01), RTs were subjected to a reciprocal transform, which was found to stabilize the variances across experimental conditions (Hartley's *F*<sub>max</sub> = 1.93, *p* > 0.05). The means and standard deviations of these transformed data are also presented in **Table 2**, along with the corresponding untransformed means. As in Experiment 1, all inferential statistics were conducted on the reciprocally transformed data.
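Hartley's *F*<sub>max</sub> is simply the largest condition variance divided by the smallest, and the reciprocal transform replaces each RT (in ms) with a rate (in ms<sup>−1</sup>). The Python sketch below illustrates both steps under stated assumptions: the transform is taken to be 1/RT, consistent with the ms<sup>−1</sup> units in Table 2; per-participant medians are taken over correct trials only; and the RT values are hypothetical, not data from Experiment 2. The function names are our own, and the sketch does not compute the *p*-value for *F*<sub>max</sub>, which requires Hartley's critical-value tables.

```python
from statistics import median, variance


def participant_median(trials):
    """Median RT (ms) over correct trials only; trials are (rt_ms, correct) pairs."""
    return median(rt for rt, correct in trials if correct)


def hartley_fmax(groups):
    """Hartley's F_max: largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def reciprocal(rts_ms):
    """Reciprocal transform (assumed 1/RT): ms -> ms^-1."""
    return [1.0 / rt for rt in rts_ms]


# Hypothetical per-participant median RTs (ms) for two conditions;
# illustrative values only, not data from the study.
easy = [430, 455, 470, 490, 505]
hard = [480, 560, 610, 700, 820]

# Compare the variance ratio before and after the transform.
print(hartley_fmax([easy, hard]))
print(hartley_fmax([reciprocal(easy), reciprocal(hard)]))
```

Because slow conditions tend to have both longer and more variable RTs, compressing the upper tail with 1/RT is a common way to bring condition variances closer together before an ANOVA.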

An ANOVA was conducted on the reciprocally transformed RT data, with secondary task (easy vs. hard) and cue validity (cued vs. uncued) as repeated measures factors and cue type (gaze vs. peripheral) as a between-subjects factor. This analysis yielded a main effect of secondary task, *F*(1, 30) = 62.03, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.67, with faster identification of targets under easy secondary task conditions (*M* = 472 ms) than under hard secondary task conditions (*M* = 577 ms). There was also a main effect of cue validity, *F*(1, 30) = 62.17, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.68, reflecting faster performance for cued targets (*M* = 495 ms) than for uncued targets (*M* = 545 ms). However, these main effects were qualified by interactions between secondary task and cue validity, *F*(1, 30) = 24.66, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.45, and between cue validity and cue type, *F*(1, 30) = 29.74, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.50, and by a three-way interaction among all three factors, *F*(1, 30) = 4.62, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.13.
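Because partial eta squared is SS<sub>effect</sub>/(SS<sub>effect</sub> + SS<sub>error</sub>), and *F* = (SS<sub>effect</sub>/df<sub>effect</sub>)/(SS<sub>error</sub>/df<sub>error</sub>), the effect size can be recovered from a reported *F* ratio and its degrees of freedom alone: η<sub>p</sub><sup>2</sup> = *F*·df<sub>effect</sub>/(*F*·df<sub>effect</sub> + df<sub>error</sub>). The short Python check below is a sketch (the function name is ours) that applies this identity to several of the *F* values reported above; for the effects listed, it reproduces the reported η<sub>p</sub><sup>2</sup> values to two decimal places.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for the omnibus ANOVA in Experiment 2.
reported = {
    "secondary task": (62.03, 1, 30),
    "secondary task x cue validity": (24.66, 1, 30),
    "cue validity x cue type": (29.74, 1, 30),
    "three-way interaction": (4.62, 1, 30),
}

for effect, (f, df1, df2) in reported.items():
    print(f"{effect}: {partial_eta_squared(f, df1, df2):.2f}")
```

The same function applies to the follow-up ANOVAs for each cue type, which have df<sub>error</sub> = 15.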

In order to explore the significant three-way interaction, separate repeated measures ANOVAs were conducted on the RT data from the group who performed the gaze-cueing primary task and those who performed the peripheral cueing task, each with cue validity and secondary task as factors.

#### **TABLE 2 | Means and standard deviations (in parentheses) of responses in each condition of Experiment 2.**

*The units on the original scale are milliseconds. Units on the transformed scale are milliseconds<sup>−1</sup>. The table also shows the percentage of correct responses in each condition.*

#### Gaze-cueing Task

For the group performing the gaze cueing trials, the ANOVA yielded significant main effects of secondary task, *F*(1, 15) = 26.17, *p* < 0.01, η<sub>p</sub><sup>2</sup> = 0.64, and cue validity, *F*(1, 15) = 6.74, *p* < 0.05, η<sub>p</sub><sup>2</sup> = 0.31, and a significant interaction between these factors, *F*(1, 15) = 4.54, *p* = 0.05, η<sub>p</sub><sup>2</sup> = 0.23. Simple main effects analyses indicated that under easy secondary task conditions, participants were faster to identify cued targets (*M* = 458 ms) than uncued targets (*M* = 479 ms), *F*(1, 30) = 11.28, *p* < 0.01; however, there was no such cueing effect under hard secondary task conditions (cued targets: *M* = 576 ms; uncued targets: *M* = 582 ms), *F*(1, 30) = 0.44, *p* = 0.51.

#### Peripheral Cueing Task

The equivalent analysis conducted on the data from participants who performed the peripheral cueing trials yielded main effects of secondary task, *F*(1, 15) = 40.39, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.73, and cue validity, *F*(1, 15) = 56.98, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.79, and a significant interaction between these factors, *F*(1, 15) = 22.48, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.60. Subsequent simple main effects analyses confirmed that the effects of cue validity were reliable under both easy secondary task conditions (cued targets: *M* = 433 ms; uncued targets: *M* = 526 ms), *F*(1, 30) = 78.60, *p* < 0.001, and hard secondary task conditions (cued targets: *M* = 541 ms; uncued targets: *M* = 613 ms), *F*(1, 30) = 21.93, *p* < 0.001, with the interaction presumably arising because the magnitude of the cueing effect was larger under the former (93 ms) than the latter (72 ms)<sup>2</sup>.
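The cueing effects quoted in this section are differences between condition means: uncued minus cued RT, so positive values mean cued targets were identified faster. The Python sketch below recomputes them from the untransformed means reported in the text for both cue types. These are descriptive differences only; the inferential tests above were run on reciprocally transformed RTs, and the dictionary layout and function name are our own illustrative choices.

```python
# Untransformed mean RTs (ms) reported in the text for Experiment 2:
# (cue type, secondary task) -> (cued, uncued).
means = {
    ("gaze", "easy"): (458, 479),
    ("gaze", "hard"): (576, 582),
    ("peripheral", "easy"): (433, 526),
    ("peripheral", "hard"): (541, 613),
}


def cueing_effect(cued_ms, uncued_ms):
    """Cueing effect in ms: uncued minus cued (positive = cued targets faster)."""
    return uncued_ms - cued_ms


for (cue, task), (cued, uncued) in means.items():
    print(f"{cue:<10} {task:<4} {cueing_effect(cued, uncued):>3} ms")
```

This yields 93 and 72 ms for peripheral cues, matching the values in the text, and 21 and 6 ms for gaze cues, the latter being the non-significant difference under RNG.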

The percentages of correct responses are also shown in **Table 2**. Participants were clearly performing at a high level of accuracy, and there is no evidence of a speed-accuracy trade-off that would compromise interpretation of the RT data. No further analyses were conducted on these data.

# **Discussion**

In Experiment 2, all participants performed a target identification task instead of the target localization task used in Experiment 1. For half of the participants, spatial cues were provided by a gaze shift, as in Experiment 1, whereas peripheral luminance transients served as the cues for the remaining participants. Once again, participants carried out the gaze-cueing or peripheral cueing task while simultaneously performing an easy secondary task in some blocks of trials and a hard secondary task (RNG) in others. Results indicated significant cueing effects under easy secondary task conditions for both types of cue; however, the gaze cueing effect, but not the peripheral cueing effect, was eliminated when participants simultaneously performed the executively demanding RNG task. This finding supports the conclusion from Experiment 1 that gaze-cued orienting of attention and RNG involve at least some of the same cognitive mechanisms.

One curious aspect of the data is that the peripheral cueing effect was reduced, though not eliminated, under hard secondary task conditions. Peripheral luminance changes are thought to capture attention in a purely stimulus-driven fashion (e.g., Jonides and Yantis, 1988; Yantis and Jonides, 1990; Franconeri et al., 2005), so why should the cueing effect have been influenced at all by an executively demanding secondary task? One possibility is that under easy secondary task conditions, the procedure allowed peripheral cues to trigger both exogenous and endogenous orienting of attention. Studies investigating the two types of orienting suggest that they have distinct but overlapping time courses: orienting based on peripheral cues occurs rapidly and is strongest between 100 and 300 ms after cue onset, peaking at around 150 ms, whereas endogenous orienting is slower and peaks at around 300 ms (e.g., Müller and Rabbitt, 1989; Cheal and Lyon, 1991). Thus, at the 300 ms SOA used in Experiment 2, we might expect both kinds of orienting to direct attention toward the target location, producing additive effects on RT under easy secondary task conditions. If RNG disrupts only endogenous orienting, some facilitation from the rapid exogenous orienting of attention should remain under the more difficult secondary task, as was observed.

A similar argument might be made for gaze-cued orienting. At an SOA of 300 ms, the advantage for target identification at cued versus uncued locations could involve both an exogenous and an endogenous deployment of attention, with RNG disrupting only the latter. However, as we have observed, there is no residual cueing effect under difficult dual task conditions that could be attributed to exogenous factors. Therefore, the gaze-cueing effect observed under easy secondary task conditions is likely to be driven by some of the same endogenous mechanisms that are involved in RNG.

<sup>2</sup>As with Experiment 1, we also performed an ANOVA including target location (left vs. right) as an additional repeated measures factor, but again this analysis failed to yield any significant effects involving this factor (*p*s > 0.14).

# **General Discussion**

The two experiments reported here investigated the extent to which gaze-cued orienting of attention is under top-down control. In each experiment, we assessed RTs to targets whose location was cued by a gaze shift, relative to targets that appeared in the location opposite to that indicated by the direction of gaze. To examine the involvement of voluntary control in gaze cueing, we compared performance while participants simultaneously completed an easy secondary task with performance while they executed a demanding secondary task. With both a target localization (Experiment 1) and a target identification (Experiment 2) decision, a gaze cueing effect was observed when participants were simultaneously executing the undemanding secondary task (repeatedly reciting the digits 1–9 in sequence); however, gaze cueing was disrupted when participants were simultaneously generating random numbers. RNG is argued to place high demands on WM resources (e.g., Vandierendonck et al., 2004; Towse and Cheshire, 2007). We therefore conclude that these same resources are involved in orienting attention on the basis of an observed shift in someone's gaze. In other words, gaze-cued attention is not a strongly automatic process but is instead under a degree of top-down control.

The results obtained in these experiments contradict those of Law et al. (2010) and Hayward and Ristic (2013) who found that gaze-cued orienting was resistant to a secondary task load. However, as argued above, it may be that the secondary tasks used in these studies could be temporarily suspended while participants performed the gaze-cueing trials. Our data show that a WM task that runs fully in parallel with gaze cueing trials (i.e., it is not suspended at any point during the gaze cueing trials) does, indeed, disrupt the gaze cueing effect.

Should we therefore understand gaze-cued orienting to be simply another manifestation of volitional, endogenous orienting of attention, that is, the deliberate allocation of attentional resources in response to current goals? The answer seems to be no. While our data suggest that gaze-cued orienting shares resources with whatever control processes are used in RNG, plenty of other data point to it being much more like a stimulus-driven effect, that is, the allocation of resources based on factors external to the observer. For example, it is observed even when gazes are known to be uninformative or even counter-informative about the likely location of an upcoming target (see Frischen et al., 2007). Indeed, at least two studies have shown that attention can be deployed volitionally toward the location opposite to that indicated by a gaze cue while simultaneously being deployed in the gazed-at direction (Friesen et al., 2004; Hayward and Ristic, 2013). These data suggest that gaze-cued attention and volitional orienting are independent of one another.

So, gaze-cued attention should not be thought of as another example of a purely volitional process (i.e., endogenous orienting), but then neither can it be described as a stimulus-driven reflex (i.e., exogenous orienting). Stimulus-driven processes occur whenever their triggering stimuli are present, and are resistant to concurrent load manipulations. The data reported here suggest that, in contrast, gaze-cued orienting *is* influenced by a concurrent WM load. Gaze-cued attention therefore clearly bears a resemblance to exogenous orienting as well as to endogenous forms of orienting. The difficulty, then, is generating a theory that can account for these seemingly contradictory observations.

Ristic and Kingstone's (2012) solution to this dilemma is that gazes, arrows and words with spatial meaning engage a unique mechanism called *automated symbolic orienting*, which occurs without intention and arises from the overlearning of associations between cues and target events. Our proposal differs in that it acknowledges a specific role for top-down control in gaze-cued orienting. We suggest that orienting to gazes occurs as a result of an internally generated goal that is maintained by top-down signals from WM. This goal might be characterized by the rule "look where others look" and may arise through, for example, learning about contingencies between gazes and rewarding target events, a suggestion originally made by Langton and Bruce (1999) and Driver et al. (1999) to explain their observations of gaze-cued orienting.

The key idea is that "look where others look" is a goal state that is almost permanently maintained by top-down signals that activate mechanisms involved in detecting and responding to the appropriate environmental trigger (a gaze shift, for example). This top-down activation is what gives gaze-cued orienting its resemblance to endogenous attentional control. However, because of this top-down activation, any stimulus that meets the relevant criteria (e.g., moving eyes or eye-like stimuli) will trigger the associated behavior (an attention shift). This attention shift occurs as long as the default goal state remains undisrupted by other, highly demanding attentional goals that engage WM concomitantly.

Notably, the gaze-cued orienting effect will persist even in the face of concurrent task demands, as long as the concurrent task does not recruit the same top-down mechanisms that are involved in maintaining the "look where others look" goal state. Repeatedly counting from 1 to 9 is a well-practiced routine, which does not require the generation and maintenance of complex stimulus-response mappings, the establishment of novel module-to-module couplings, iterative monitoring and modification of performance, and so on. Maintaining a digit load in WM may be similarly untaxing, as it relies on a dedicated component of WM (e.g., the phonological loop in the WM model; see Baddeley, 2000), and it is unclear whether it is performed in parallel with the gaze cueing trials. RNG, on the other hand, requires much more in the way of controlled processing. One must first generate a strategy in order to produce the desired output; representations of the possible response alternatives must be activated and maintained in WM so that they are available for selection; the output must be monitored in relation to some internally generated concept of randomness; and inhibitory processes likely act to suppress the generation of overlearned sequences (Towse and Cheshire, 2007). These might be thought of as a number of sub-goals that must be generated and maintained in order to satisfy the main task goal of generating the random sequence. We suggest that it is this requirement that swamps the ability to maintain the goal of looking where others look (cf. Duncan et al., 1996).

This theory suggests that it is the number of simultaneously active sub-goals required by RNG that disrupts the orienting of attention to seen gazes; however, it is of course possible that the source of interference is one or more of the component processes themselves. Further research will be required to explore this possibility. The theory also offers a solution to another puzzle. If gaze-cued orienting were truly a stimulus-driven process, it ought to occur every time a gaze shift is viewed, and it would likely be accompanied by an overt shift in gaze, as covert and overt orienting usually, but not inevitably, occur in tandem (see Findlay and Gilchrist, 2003); yet automatic *overt* attention shifts in response to others' gazes patently do not occur outside the confines of the laboratory. How is it that averted gazes, which readily trigger covert attention shifts in the laboratory, do not seem to trigger overt shifts in more naturalistic situations? The answer may be that gazes seen in natural situations simply do not tend to trigger covert shifts of attention, because of the high cognitive demands imposed by the social situations in which these gazes occur. Covert gaze cueing might be observed in the laboratory, where participants' concurrently active goals are reduced to the generation and maintenance of relatively straightforward stimulus-response mappings (e.g., press the top button for a letter T, the bottom button for a letter F); however, the effect may vanish in many everyday interactions, in which people tend to have multiple, continuously changing concurrent goals. Pertinently, Gregory et al. (2015) recently showed that when viewing a "live" scene with socially engaged actors, overt attention to gazes and heads is reduced (cf. Freeth et al., 2013). The authors explain their findings in terms of the cognitive load required for processing bodies and making higher cognitive judgements about the presented social scene. This load disrupts the "reflexive" shifts of attention that occur when gazes are viewed passively, as in a laboratory environment. It is possible that the secondary task used in our studies imposed cognitive demands high enough to stop the WM system from prioritizing gazes.

An alternative explanation for our data is that, rather than imposing high general cognitive demands, RNG exerts its effects on gaze-cued orienting specifically by disrupting the spatial processing involved in extracting gaze direction from the eyes and executing an attention shift in the computed direction. In support of this suggestion, it is well known that mental representations of numbers are associated with spatial codes (e.g., Zorzi et al., 2002), with low numbers associated with the left side of space and high numbers with the right side (Dehaene et al., 1993). Pertinently, there is also a large body of research showing that parietal cortex is involved in numerical representations in humans and other primates (see Nieder, 2004, for a review) and that gaze-cued

The proposal is, then, that the same spatial processing resources may be involved in gaze-cued orienting and RNG. This is an intriguing suggestion, as it could account for why RNG disrupts gaze-cued orienting whereas other high-load tasks do not. It is not immediately obvious, however, why generating numbers in an ordered sequence in our easy secondary task conditions would not also involve the same spatial resources as generating the same digits in a random order. Indeed, one might argue that spatial coding is actually stronger for ordered number generation, as one can readily imagine the ordered sequence on a number line running from left to right. On this view, any spatial coding induced by the generation of numbers is likely to be controlled across the secondary tasks used in our experiments. In support of a spatial account, it could be argued that RNG draws more heavily on spatial resources than does ordered number generation, for the latter simply involves reading off a stereotyped verbal sequence, which might not activate individual spatial representations to the same extent as RNG. Indeed, numbers are likely associated with different kinds of representations (verbal as well as visuo-spatial), with different representations deployed according to the nature of the number-involving task (e.g., van Dijck et al., 2009). Given this, it is of course possible that neither secondary task involves the activation of spatial codes; both random and ordered number generation may involve verbal rather than spatial coding of numbers. According to this account, neither task would affect gaze-cued orienting by drawing on a limited spatial resource.

Our data do not allow us to tease these possibilities apart directly, although the fact that the spatial location of the target interacted with neither secondary task nor cue validity hints that spatial coding may not be a crucial factor<sup>1,2</sup>. Nevertheless, the suggestion that RNG exerts its effects on gaze-cued orienting through a spatial mechanism clearly warrants further research.

In summary, in two experiments we assessed the effects of a concurrent WM demand on social orienting. Our main finding was that social attention was disrupted by the RNG task. In contrast to previous laboratory-based findings, these data suggest that attention cued by gazes is, indeed, dependent on top-down control.

# **Acknowledgments**

This research was supported by the Economic and Social Research Council (grant number ES/1034803/1). The authors thank Graeme Lavery and Amy Walker, who collected the data for Experiment 2, and two reviewers for their helpful comments on an earlier draft of the manuscript.

# **References**

Baddeley, A. D. (2002). Is working memory still working? *Euro. Psychol.* 7, 85–97. doi: 10.1027//1016-9040.7.2.85


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Bobak and Langton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Space-based and object-centered gaze cuing of attention in right hemisphere-damaged patients

*Mario Dalmaso<sup>1</sup>\*, Luigi Castelli<sup>1,2</sup>, Konstantinos Priftis<sup>3,4</sup>, Marta Buccheri<sup>1</sup>, Daniela Primon<sup>5</sup>, Silvia Tronco<sup>5</sup> and Giovanni Galfano<sup>1,2</sup>*

*<sup>1</sup> Department of Developmental and Social Psychology, University of Padova, Padova, Italy, <sup>2</sup> Center for Cognitive Neuroscience, University of Padova, Padova, Italy, <sup>3</sup> Department of General Psychology, University of Padova, Padova, Italy, <sup>4</sup> Human Inspired Technologies Research Center, University of Padova, Padova, Italy, <sup>5</sup> Department of Rehabilitation, Unità Locale Socio Sanitaria 15, Cittadella, Italy*

Gaze cuing of attention is a well-established phenomenon consisting of the tendency to shift attention to the location signaled by the averted gaze of other individuals. Evidence suggests that this phenomenon might follow intrinsic object-centered features of the head containing the gaze cue. In the present exploratory study, we aimed to investigate whether such an object-centered component is present in neuropsychological patients with a lesion involving the right hemisphere, which is known to play a critical role both in orienting of attention and in face processing. To this end, we used a modified gaze-cuing paradigm in which a centrally placed head with averted gaze was presented either in the standard upright position or rotated 90° clockwise or anti-clockwise. Afterward, a to-be-detected target was presented in either the right or the left hemifield. The results showed that gaze cuing of attention was present only when the target appeared in the left visual hemifield and was not modulated by head orientation. This suggests that gaze cuing of attention in right hemisphere-damaged patients can operate within different frames of reference.

Keywords: gaze cuing, object-centered attention, right hemisphere-damaged patients, hemispheric asymmetry, social cognition

# Introduction

The eyes of our conspecifics represent a privileged target for our attention, as shown by several recent studies (e.g., Birmingham et al., 2008; Levy et al., 2013; Boggia and Ristic, 2015). The prioritized processing of eye gaze stimuli might be related to the fact that they are a valuable source of information, providing important insights not only into where other individuals are attending, but also into their internal states, such as future intentions or beliefs (e.g., Baron-Cohen, 1995). This, in turn, can help us interact more effectively with our social and physical environment (e.g., Emery, 2000; Shepherd, 2010).

The relevance of the eye gaze of others is demonstrated by a phenomenon known as gaze cuing of attention, which consists of the tendency to shift attention in the direction in which a face is looking (for a review, see Frischen et al., 2007). This can be investigated empirically by asking participants to respond manually to a lateralized target that is preceded by the onset of a task-irrelevant, centrally placed face with averted gaze. Shorter reaction times (RTs) are generally observed when the target appears at the spatial location indicated by the gaze of the face stimulus than when the target appears elsewhere (i.e., the gaze-cuing effect; see Friesen and Kingstone, 1998). This pattern of results confirms that individuals tend to shift attention in the direction indicated by eye gaze stimuli (see also Driver et al., 1999; Galfano et al., 2012). The intrinsically social nature of this behavior has recently been supported by several studies conducted both on healthy participants (e.g., Teufel et al., 2010; Schulz et al., 2014; Cole et al., 2015) and on clinical populations (e.g., Nestor et al., 2010; Dalmaso et al., 2013, 2015; Marotta et al., 2014). In particular, studies of healthy participants have shown that gaze cuing is strongly affected by social variables related to both the observer and the person observed, as well as to the relationship between the two individuals. For instance, group membership (e.g., Pavan et al., 2011; Ciardo et al., 2014; Chen and Zhao, 2015), social status/dominance (e.g., Jones et al., 2010; Dalmaso et al., 2012, 2014), political affiliation (Liuzza et al., 2011, 2013; see also Carraro et al., 2015), trustworthiness (e.g., Süßenbach and Schönbrodt, 2014), and participants' age (e.g., Slessor et al., 2008; Kuhn et al., 2015), autistic traits (e.g., Senju et al., 2004; Bayliss et al., 2005; Ristic et al., 2005), or phobias (e.g., Pletti et al., 2015) can all affect gaze cuing of attention.

#### *Edited by:*

*Paola Ricciardelli, University of Milano-Bicocca, Italy*

#### *Reviewed by:*

*Marco Tullio Liuzza, Sapienza University of Rome, Italy*

*Andrea Marotta, Sapienza University of Rome, Italy*

#### *\*Correspondence:*

*Mario Dalmaso, Department of Developmental and Social Psychology, University of Padova, Via Venezia 8, 35131 Padova, Italy; mario.dalmaso@gmail.com*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 22 April 2015; Accepted: 20 July 2015; Published: 04 August 2015*

#### *Citation:*

*Dalmaso M, Castelli L, Priftis K, Buccheri M, Primon D, Tronco S and Galfano G (2015) Space-based and object-centered gaze cuing of attention in right hemisphere-damaged patients. Front. Psychol. 6:1119. doi: 10.3389/fpsyg.2015.01119*

The great relevance of gaze in shaping human behavior has prompted researchers to hypothesize the presence of a neurocognitive mechanism specifically devoted to gaze cuing of attention, although the results are not always consistent (e.g., Hietanen et al., 2006; Tipper et al., 2008; Nummenmaa and Calder, 2009). However, according to a recent neuroimaging study, the neural underpinnings of gaze cuing of attention seem to involve several brain areas related to gaze and face processing (Callejas et al., 2014). More specifically, these brain areas would first process the sensory information conveyed by facial stimuli, and this information would subsequently be passed to several regions involved in orienting of attention. Interestingly, these regions would mostly be located in the right hemisphere, which is well known to be specialized for face processing (e.g., Ojemann et al., 1992), and would include the right posterior superior temporal sulcus, the right posterior intraparietal sulcus, and the right inferior frontal junction.

The involvement of the right hemisphere in gaze cuing of attention has also been investigated in studies adopting a causal approach, both with healthy individuals (e.g., Porciello et al., 2014) and with neuropsychological patients. Among the patient studies, Kingstone et al. (2000) observed gaze cuing of attention in two split-brain patients, but only when a lateralized eye gaze cue was projected to the right hemisphere. Interestingly, when gaze cues were replaced with a non-social cue such as an arrow, the cuing effect was bilateral (Ristic et al., 2002). Other studies focused on patients with brain lesions specifically localized in the right hemisphere. Akiyama et al. (2006) observed preserved arrow cuing of attention alongside impaired gaze cuing of attention in a patient with a rare lesion circumscribed to the right superior temporal gyrus, a region shown to be crucially involved in face and gaze processing (e.g., Allison et al., 2000). However, no clear conclusion about a possible lateralization of the effects as a function of visual hemifield can be drawn, because the patient's left hemianopia meant that only right-sided targets could be tested. Furthermore, Vuilleumier (2002) presented four right hemisphere-damaged patients with peripheral targets and eye gaze cues (Experiments 4 and 5) or arrow cues (Experiment 6). Even though left neglect was present in all participants, eye gaze stimuli elicited reliable orienting that was detectable even on the contralesional side, whereas arrow cuing of attention was weaker overall. More recently, Bonato et al. (2009) tested right hemisphere-damaged patients (with or without left neglect) by presenting centrally placed symbolic cues (i.e., arrows and numbers) and schematic eye gaze stimuli. Strikingly, in both groups (patients with or without left neglect), reliable orienting of spatial attention emerged in response to arrow cues but not to numbers, whereas eye gaze produced orienting of attention only in patients without left neglect.

All the aforementioned neuropsychological studies provided interesting insights into the functioning of a broad brain network that would support gaze cuing of attention. Although a direct comparison of their findings is difficult, because of the different types of brain lesions characterizing the patients and the different paradigms adopted, these studies share a common feature: they focused only on the space-based component of visual attention. Indeed, participants were presented with centrally placed faces (or eyes only) displayed upright, and targets could generally appear either at the gazed-at location or in the opposite hemifield. This approach, however, does not make it possible to tease apart the contributions of two different modes of attention shifting that depend on specific reference frames. On the one hand, attention mechanisms operate on simple spatial coordinates along hypothetical spatial vectors. On the other hand, humans can shift their attention within at least one other frame of reference, which is object-centered (e.g., Fink et al., 1997; Behrmann and Tipper, 1999). In this case, the way individuals allocate their attentional resources in response to a cuing stimulus is shaped by intrinsic structural features of the object rather than by the simple spatial information it conveys. According to neuroimaging evidence, space-based and object-centered attention mechanisms would be served mainly by common brain regions, primarily located in the parietal cortex. In particular, these regions would include the left lateral inferior parietal cortex, the left prefrontal cortex, the left and right medial superior parietal cortex, and the cerebellar vermis (Fink et al., 1997). In addition, other brain regions would be differentially recruited by the two frames of reference: while object-centered attention would also recruit the left striate and prestriate cortex, space-based attention would recruit regions located in the right hemisphere, such as the inferior temporal/fusiform gyrus and the dorsolateral prefrontal cortex (Fink et al., 1997).

In the same vein, studies addressing the relationship between gaze cuing and frames of reference have provided evidence that attentional shifts occur even when the head is not presented in the standard upright position (Bayliss et al., 2004; see also Bayliss and Tipper, 2006). More specifically, Bayliss et al. (2004) employed a standard gaze-cuing task in which a central face with direct gaze suddenly looked rightwards or leftwards, after which a to-be-detected target was presented in either the right or the left hemifield. The peculiarity of this task was that the facial stimulus could appear either in the canonical upright orientation, resulting in a face looking rightwards or leftwards, or rotated 90° clockwise or anti-clockwise, resulting in a face looking upwards or downwards. When the face was presented upright, participants were faster in detecting targets that appeared at the spatial location indicated by eye gaze (space-based orienting). Intriguingly, when the face was rotated, participants were still faster in detecting targets that appeared at the spatial location the face would have looked at had it been presented upright (object-centered orienting). For instance, faces rotated 90° clockwise with eye gaze directed downwards elicited faster responses for targets on the right than on the left side of the screen. Conversely, faces rotated 90° anti-clockwise with eye gaze directed downwards elicited faster responses for targets on the left than on the right side of the screen. This pattern of results is in line with previous evidence suggesting that eye gaze direction and head orientation are computed in parallel (e.g., Langton et al., 2000) rather than sequentially (i.e., eye gaze direction first, followed by head orientation), as originally proposed in the pioneering studies by Perrett et al. (1992).
This would explain the presence of the gaze-cuing effect even within the object-centered frame: in this case, individuals would tend to compute gaze direction as if the head were oriented upright, which is undoubtedly the more likely situation in everyday social interactions (see Bayliss and Tipper, 2006). From a neuroanatomical perspective, the computation of eye gaze and head direction would be supported mainly by the right superior temporal sulcus, a brain area heavily involved in face processing (e.g., Haxby et al., 2000). However, more work is needed to obtain a broader picture of the neural mechanisms underlying this social form of spatial orienting.
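The object-centered mapping described above can be made concrete with a small sketch. It is not the authors' code: it assumes screen coordinates with x pointing to the observer's right and y pointing up, treats "clockwise" as seen by the observer, and uses hypothetical helper names. Rotating the observed on-screen gaze vector back by the head's rotation recovers the direction the face would look in if it were upright, which is the side on which a target counts as congruent in the object-centered frame.

```python
# Screen directions as integer unit vectors: x points to the observer's right, y points up.
DIRECTIONS = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}
NAMES = {vec: name for name, vec in DIRECTIONS.items()}


def rotate_clockwise(vec, degrees):
    """Rotate a 2-D vector clockwise by a multiple of 90 degrees (negative = anti-clockwise)."""
    if degrees % 90:
        raise ValueError("only multiples of 90 degrees are supported")
    x, y = vec
    for _ in range((degrees // 90) % 4):
        x, y = y, -x  # one 90-degree clockwise turn maps (x, y) to (y, -x)
    return (x, y)


def upright_equivalent(screen_gaze, head_rotation_cw):
    """Direction the face would look in if its head were rotated back to upright.

    head_rotation_cw: 0 for upright, 90 for clockwise, -90 for anti-clockwise (assumed coding).
    """
    return NAMES[rotate_clockwise(DIRECTIONS[screen_gaze], -head_rotation_cw)]


def is_congruent(screen_gaze, head_rotation_cw, target_side):
    """Object-centered congruency: the target lies where the upright-equivalent gaze points."""
    return upright_equivalent(screen_gaze, head_rotation_cw) == target_side


# The two examples given in the text.
print(upright_equivalent("down", 90))   # clockwise head, downward gaze
print(upright_equivalent("down", -90))  # anti-clockwise head, downward gaze
```

Run as written, this prints `right` and then `left`, matching the two examples above. For an upright head (rotation 0) the function simply returns the on-screen gaze direction, so the same congruency rule covers the space-based frame.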

Interestingly, both space-based and object-centered attention components seem to be preserved in right hemisphere-damaged patients (e.g., Driver and Halligan, 1991; Behrmann and Tipper, 1999). For instance, Behrmann and Tipper (1999) presented right hemisphere-damaged patients with two disks connected by a line, one in the left hemifield and one in the right hemifield, and with two squares, likewise one in each hemifield. In this configuration, slower RTs were reported in response to targets that appeared on the left stimuli (i.e., both disks and squares) than on the right ones. However, when the two disks swapped their spatial positions by rotating through 180°, slower RTs were reported in response to targets that appeared on the right disk than on the left disk. For the squares, which, unlike the disks, remained stationary, slower RTs continued to be reported in response to left targets. These intriguing results seem to confirm that right hemisphere-damaged patients can allocate visual attention in different frames simultaneously. However, to the best of our knowledge, no study has so far investigated this ability in gaze cuing of attention.

The aim of the present study was therefore twofold. First, we aimed to provide further evidence concerning gaze cuing of attention in right hemisphere-damaged patients. Unlike previous studies that used schematic faces as cuing stimuli (e.g., Vuilleumier, 2002; Bonato et al., 2009), here we employed 3D avatars with a greater degree of ecological validity, which should make eye gaze stimuli particularly relevant. Indeed, according to recent evidence, sensitivity to eye gaze direction seems to be reduced when line-drawn face stimuli, such as those used in both Bonato's and Vuilleumier's studies, are employed (Rossi et al., 2015). By contrast, the use of 3D avatars should facilitate the emergence of robust gaze cuing of attention while maintaining strict control over the physical features of the facial stimuli.

Second, we aimed to explore whether right hemisphere-damaged patients exhibit a specific difficulty in object-centered gaze cuing of attention that could not have been detected in previous studies, which invariably used upright faces (e.g., Vuilleumier, 2002). To this end, we exploited the paradigm devised by Bayliss et al. (2004). Because a slightly different experimental setting was used for clinical testing, we first attempted to replicate the main findings of Bayliss et al. (2004) in a sample of young healthy individuals (Experiment 1). In line with Bayliss et al. (2004), we expected reliable and comparable gaze cuing of attention irrespective of whether facial stimuli were presented upright or rotated. The same task was then administered in Experiment 2 to a group of right hemisphere-damaged patients and to a matched group of healthy controls. We focused on right hemisphere-damaged patients in keeping with early neuropsychological studies using spatial cuing procedures (e.g., Posner et al., 1984). In addition, we included patients with diffuse lesions and did not target specific brain areas, because neuroimaging evidence suggests that a wide neural circuitry is involved in face processing and social attention (e.g., Allison et al., 2000; Haxby et al., 2000; Callejas et al., 2014). If right hemisphere-damaged patients process eye gaze stimuli within different frames of reference, then gaze cuing of attention should emerge irrespective of head orientation. Conversely, if they process eye gaze stimuli only within a canonical frame in which the head is presented upright, then gaze cuing of attention should be expected only within this frame.
In either case, these results from neuropsychological patients could hopefully provide new insights into both the behavioral mechanisms and the neural underpinnings of space-based and object-centered gaze cuing of attention.

# Experiment 1: Young Healthy Adults

## Materials and Methods

### Participants

Twenty-six first-year undergraduate students (*Mean age* = 19.27 years, SD = 0.604, 5 males, 4 left-handed) enrolled at the University of Padova participated in the experiment as part of course requirements. All participants were naïve to the purpose of the experiment and provided written consent. The study was approved by the Ethics Committee for Psychological Research at the University of Padova and was conducted in accordance with the Declaration of Helsinki.

### Stimuli and Apparatus

Face stimuli consisted of eight 3D full-color avatars (4 males and 4 females) created with FaceGen 3.1. Each face had three versions: one with direct gaze, one with gaze averted rightwards, and one with gaze averted leftwards. Faces lacked distracting elements such as hair and clothes (see also Pavan et al., 2011).

Stimulus presentation and data collection were controlled by a laptop PC running E-Prime 1.1. Participants sat 57 cm from the monitor (1024 × 768 pixels, 60 Hz), on which stimuli were presented against a gray background (*R* = 180, *G* = 180, *B* = 180).
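Because the stimulus sizes in this study are given in degrees of visual angle at a 57-cm viewing distance, it may help to see how such values translate into physical extent on the screen. The sketch below uses the standard trigonometric relation and assumes that sizes are centered on the line of sight and that eccentricities are measured from screen center to the stimulus center; the function names are illustrative. Converting centimeters to pixels would additionally require the monitor's physical width, which is not reported, so that step is left out.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # viewing distance reported for this setup


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of a stimulus, centered on the line of sight, that subtends angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def offset_cm(eccentricity_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from the screen center of a point at the given eccentricity."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))


for label, value in [
    ("fixation cross, 1 deg", extent_cm(1.0)),
    ("face height, 16.8 deg", extent_cm(16.8)),
    ("target offset, 13.3 deg", offset_cm(13.3)),
]:
    print(f"{label}: {value:.2f} cm")
```

This prints roughly 0.99 cm, 16.83 cm, and 13.47 cm. At 57 cm, one centimeter on the screen subtends close to one degree, which is why this viewing distance is a common choice in visual attention experiments.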

### Procedure

The procedure was similar to that used by Bayliss et al. (2004). Each trial began with a centrally placed black fixation cross (1° height × 1° width) presented for 650 ms (see **Figure 1**), followed by a face with direct gaze, which served as a pre-cue. Depending on the condition, this face could appear in one of three orientations: upright (space-based frame; 16.8° height × 14.4° width), or rotated 90° clockwise or anti-clockwise (object-centered frame). For the two rotated orientations, the rotation was centered on the midpoint between the eyes. After 1500 ms, the same face was presented with gaze averted either rightwards or leftwards, which served as a spatial cue. After a fixed 500-ms stimulus onset asynchrony (SOA), a black square (1.3° height × 1.3° width), which served as the target, appeared 13.3° to the right or to the left of the center of the screen. Participants were instructed to detect the target by pressing the space bar as fast as possible with the index finger of their dominant hand. In the space-based frame, a trial was congruent when the target appeared at the spatial location gazed at by the upright face. In the object-centered frame, a trial was congruent when the target appeared at the spatial location that the rotated face would have looked at had it been presented upright (see **Figure 1**). Both cue and target stimuli remained visible until the participant's response or until 3000 ms had elapsed, whichever came first. Catch trials, in which no target appeared and participants had to refrain from responding, were also included to discourage anticipatory responses. The red words "NO RESPONSE" and "ERROR" were presented when participants did not respond within 3000 ms (i.e., missed responses) and when they responded on catch trials (i.e., false alarms), respectively.
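The trial timeline described above can be summarized as a list of event onsets. The sketch below assumes that the 3000-ms response window is counted from target onset, and it converts each duration into frames at the 60-Hz refresh rate reported for the monitor; the event names are hypothetical labels, not E-Prime objects.

```python
REFRESH_HZ = 60
FRAME_MS = 1000 / REFRESH_HZ  # about 16.67 ms per frame

# (event, duration in ms) in presentation order, as described in the procedure.
EVENTS = [
    ("fixation_cross", 650),
    ("face_direct_gaze", 1500),  # pre-cue
    ("face_averted_gaze", 500),  # spatial cue; its duration before the target is the SOA
]
RESPONSE_WINDOW_MS = 3000  # assumed to start at target onset


def trial_onsets():
    """Onset (ms from trial start) of each event, of the target, and of the response deadline."""
    onsets, t = {}, 0
    for name, duration in EVENTS:
        onsets[name] = t
        t += duration
    onsets["target"] = t
    onsets["response_deadline"] = t + RESPONSE_WINDOW_MS
    return onsets


def to_frames(duration_ms):
    """Nearest whole number of screen refreshes for a duration."""
    return round(duration_ms / FRAME_MS)


print(trial_onsets())
print([to_frames(d) for _, d in EVENTS])
```

Under these assumptions the target appears 2650 ms after trial onset and the response deadline falls at 5650 ms. All three pre-target durations are whole multiples of the 60-Hz frame (39, 90, and 30 frames), so they can be presented without rounding.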

On each trial, face frame (upright vs. rotated), gaze direction (left vs. right), and target location (left vs. right) were selected randomly, with each combination of these factors presented an equal number of times. When the face was not upright, the head was equally likely to be rotated clockwise or anti-clockwise. Participants were informed that head orientation and gaze direction did not predict the location of the upcoming target, which was equally likely to appear on the right or on the left. They were also asked to keep their eyes on the center of the screen throughout the experiment. A practice block of 9 target-present trials and 3 catch trials was followed by three experimental blocks, each composed of 64 target-present trials and 16 catch trials, for a total of 240 experimental trials.
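One way to build a trial list that satisfies these constraints is sketched below. It assumes that balancing was done within each block and that the list was then shuffled; how catch trials were assigned face orientations and gaze directions is not stated, so balancing them the same way is an assumption. With frame × gaze × target fully crossed and rotated trials split evenly between clockwise and anti-clockwise, each of the 8 frame × gaze × target cells appears 8 times in a 64-trial block.

```python
import itertools
import random

# Relative weights: upright trials are half of all trials; rotated trials split evenly.
ORIENTATIONS = (("upright", 2), ("clockwise", 1), ("anticlockwise", 1))
SIDES = ("left", "right")


def make_block(rng, reps_present=4, reps_catch=1):
    """One block of 64 target-present and 16 catch trials (with the defaults), shuffled."""
    trials = []
    for gaze, target in itertools.product(SIDES, SIDES):
        for orientation, weight in ORIENTATIONS:
            trials.extend(
                {"orientation": orientation, "gaze": gaze, "target": target}
                for _ in range(weight * reps_present)
            )
    # Catch trials (no target); balancing orientation and gaze here is an assumption.
    for gaze in SIDES:
        for orientation, weight in ORIENTATIONS:
            trials.extend(
                {"orientation": orientation, "gaze": gaze, "target": None}
                for _ in range(weight * 2 * reps_catch)
            )
    rng.shuffle(trials)
    return trials


rng = random.Random(2015)  # fixed seed only so the example is reproducible
experiment = [make_block(rng) for _ in range(3)]
n_trials = sum(len(block) for block in experiment)
n_catch = sum(t["target"] is None for block in experiment for t in block)
print(n_trials, n_catch)  # 240 48
```

Each trial is a fresh dictionary, so the shuffled list can be annotated (for example with responses) without aliasing between trials.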

## Results

### Data Reduction

Missed responses (0.24% of trials) and false alarms (0.4% of trials) were removed and, given their low frequency, were not analyzed further. Anticipations (RTs shorter than 100 ms) and outliers (RTs more than 3 SD above each participant's mean) were also removed (1.5% of trials; see also Bonato et al., 2009).
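The trimming rule can be sketched as follows. The text does not specify every detail, so this version assumes that the rule is applied to each participant's correct RTs pooled across conditions, that the mean and sample SD are computed after anticipations are dropped, and that a single pass is made; the function name is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, floor_ms=100, n_sd=3.0):
    """Remove anticipations (RT < floor_ms) and slow outliers (RT > mean + n_sd * SD).

    Assumptions: one participant's correct RTs pooled across conditions, with the
    mean and sample SD computed after anticipations are dropped, in a single pass.
    """
    kept = [rt for rt in rts if rt >= floor_ms]
    cutoff = mean(kept) + n_sd * stdev(kept)
    return [rt for rt in kept if rt <= cutoff]


rts = [300] * 20 + [2000, 50]
print(trim_rts(rts) == [300] * 20)  # True: 50 ms is an anticipation, 2000 ms an outlier
```

Note that with very few trials a single extreme value inflates the SD enough that it may never exceed the 3-SD cutoff; with the roughly 200 target-present trials per participant used here, this is not a practical concern.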

### Reaction Time Analysis

Reaction times for correct responses were analyzed using JASP 0.7 (Love et al., 2015) through a repeated-measures ANOVA with cue-target spatial congruency (2: congruent vs. incongruent) and frame (2: space-based vs. object-centered) as within-participant factors. In addition, to assess which hypothesis (H0 vs. H1) was better supported by the data, Bayes Factors (BF; e.g., Rouder et al., 2009) were also computed.

The only significant main effect was that of cue-target spatial congruency, *F*(1,25) = 8.484, *p* = 0.007, η<sup>2</sup><sub>p</sub> = 0.253, confirming the presence of an overall gaze-cuing effect, with shorter RTs on spatially congruent trials (*M* = 328 ms, SE = 7.06) than on spatially incongruent trials (*M* = 334 ms, SE = 7.64). The main effect of frame only approached significance, *F*(1,25) = 3.966, *p* = 0.057, η<sup>2</sup><sub>p</sub> = 0.137 (see **Figure 2**). Importantly, the cue-target spatial congruency × frame interaction was not significant (*F* < 1, *p* = 0.894), suggesting a comparable gaze-cuing effect in each frame. In line with this, BF analysis showed that the model with only main effects, BF10 = 9.829, was preferable to the model that also included the interaction, BF10 = 2.632. For completeness, one-tailed paired *t*-tests were performed between congruent and incongruent trials, separately for each frame. These analyses revealed a significant gaze-cuing effect both in the space-based frame, *t*(25) = 2.147, *p* = 0.021, *dz* = 0.421, and in the object-centered frame, *t*(25) = 2.097, *p* = 0.023, *dz* = 0.411 (see **Figure 2**). BF analysis showed that, in both the space-based frame, BF10 = 2.610, and the object-centered frame, BF10 = 2.843, the model supporting H1 (i.e., the presence of the gaze-cuing effect) was preferable to the model supporting H0 (i.e., its absence).
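The paired *t*-tests and the *dz* effect sizes above are linked by a simple identity: *dz* is the mean congruency difference divided by the SD of the differences, and the paired *t* equals *dz* × √n. The sketch below computes both from per-participant condition means (hypothetical data and function name); the one-tailed *p*-value would additionally require the Student *t* distribution (for example `scipy.stats.t.sf`), which is omitted to keep the example free of external libraries.

```python
from math import sqrt
from statistics import mean, stdev


def cuing_effect_test(congruent, incongruent):
    """Paired comparison of per-participant mean RTs (ms).

    Returns the mean cuing effect (incongruent - congruent), Cohen's dz, and the paired t.
    """
    if len(congruent) != len(incongruent) or len(congruent) < 2:
        raise ValueError("need two equal-length lists with at least two participants")
    diffs = [inc - con for con, inc in zip(congruent, incongruent)]
    n = len(diffs)
    dz = mean(diffs) / stdev(diffs)  # mean difference in units of the SD of the differences
    t = dz * sqrt(n)                 # paired t statistic with n - 1 degrees of freedom
    return {"effect_ms": mean(diffs), "dz": dz, "t": t, "df": n - 1}


# Identity check against the values reported above (n = 26): dz = t / sqrt(n).
print(round(2.147 / sqrt(26), 3), round(2.097 / sqrt(26), 3))  # 0.421 0.411
```

The same identity holds for the Experiment 2 tests with n = 9, where *dz* is simply *t*/3.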

Overall, this pattern of results is fully consistent with the findings reported by Bayliss et al. (2004) and confirms that the paradigm used here is suitable for revealing both a space-based and an object-centered component of gaze cuing. Therefore, the same paradigm was used in Experiment 2 to assess whether these two components emerged in right hemisphere-damaged patients.

FIGURE 1 | Stimuli (not drawn to scale) and sequence of events for (A) an incongruent trial with a head oriented upright (space-based frame), (B) a congruent trial with a head oriented clockwise (object-centered frame), and (C) an incongruent trial with a head oriented anti-clockwise (object-centered frame).

# Experiment 2: Right Hemisphere-Damaged Patients vs. Healthy Matched Controls

## Materials and Methods

### Participants

The experimental group was composed of eleven individuals recruited at a public clinic in northern Italy. They were recruited on the basis of the absence of intellectual disability and a diagnosis of brain lesions limited to the right hemisphere, as documented by board-certified neuroradiological reports (see **Figure 3**). Two patients were excluded from the analyses because of difficulties in understanding the instructions and completing the experiment. The final sample was thus composed of nine patients (*Mean age* = 63 years, SD = 15.2, *mean education* = 7.56 years, SD = 2.65, three females, all right-handed). Demographic and clinical information on the patients is reported in **Table 1**.

The control group was composed of 9 healthy individuals (*Mean age* = 63.11 years, SD = 15.44, *mean education* = 11.22 years, SD = 5.26, three females, all right-handed), recruited from the local population to match the patients for age, education, gender, and handedness. Two-tailed independent-samples *t*-tests comparing patients and controls on age, *t*(16) = 0.015, *p* = 0.988, and education, *t*(16) = 1.867, *p* = 0.087, confirmed that the two groups were roughly comparable. An interview was administered to all controls to exclude any previous history of neurological disease.

### Stimuli and Apparatus

Stimuli and apparatus were identical to those employed in Experiment 1.

### Procedure

The procedure was identical to that employed in Experiment 1.

## Results

### Data Reduction

Data reduction was the same as in Experiment 1. Missed responses (6.35% of trials) and false alarms (2.08% of trials) were removed and analyzed separately. Anticipations (RTs shorter than 100 ms) and outliers (RTs more than 3 SD above each participant's mean) were also removed (1.24% of trials).

### Reaction Time Analysis

Reaction times for correct responses were analyzed using JASP 0.7 (Love et al., 2015) through a mixed-design ANOVA with cue-target spatial congruency (2: congruent vs. incongruent) and frame (2: space-based vs. object-centered) as within-participant factors. Hemifield (2: right vs. left) was also included as a within-participant factor to investigate potential lateralized effects in right hemisphere-damaged patients (see also Bonato et al., 2009). Group (2: right hemisphere-damaged patients vs. healthy controls) was included as a between-participant factor.

The main effect of cue-target spatial congruency was significant, *F*(1,16) = 5.568, *p* = 0.031, η<sup>2</sup><sub>p</sub> = 0.258, confirming the presence of an overall gaze-cuing effect, with shorter RTs on spatially congruent trials (*M* = 697 ms, SE = 68.24) than on spatially incongruent trials (*M* = 725 ms, SE = 70.96). The main effect of hemifield was also significant, *F*(1,16) = 8.738, *p* = 0.009, η<sup>2</sup><sub>p</sub> = 0.353, owing to shorter RTs when the target appeared in the right hemifield (*M* = 634 ms, SE = 57.12) than in the left hemifield (*M* = 788 ms, SE = 87.91). The main effect of group was significant as well, *F*(1,16) = 6.116, *p* = 0.025, η<sup>2</sup><sub>p</sub> = 0.277, owing to shorter RTs in healthy participants (*M* = 539 ms, SE = 63.22) than in right hemisphere-damaged patients (*M* = 883 ms, SE = 123.53). The cue-target spatial congruency × hemifield interaction was significant, *F*(1,16) = 9.998, *p* = 0.006, η<sup>2</sup><sub>p</sub> = 0.385, as was the hemifield × group interaction, *F*(1,16) = 10.244, *p* = 0.006, η<sup>2</sup><sub>p</sub> = 0.390, whereas the cue-target spatial congruency × group interaction was not significant, *F*(1,16) = 3.269, *p* = 0.089, η<sup>2</sup><sub>p</sub> = 0.190.
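Partial eta squared is defined as SS<sub>effect</sub>/(SS<sub>effect</sub> + SS<sub>error</sub>). Because *F* = (SS<sub>effect</sub>/df<sub>effect</sub>)/(SS<sub>error</sub>/df<sub>error</sub>), it can be recovered from an *F* ratio and its degrees of freedom alone, which offers a quick consistency check on reported statistics. A minimal sketch (function name illustrative) that reproduces the three main-effect sizes reported above:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


for label, f in [("congruency", 5.568), ("hemifield", 8.738), ("group", 6.116)]:
    print(label, round(partial_eta_squared(f, 1, 16), 3))
```

This prints 0.258, 0.353, and 0.277 for the three effects. The same formula applied to Experiment 1, with 25 error degrees of freedom, gives 0.253 for *F*(1,25) = 8.484.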


<sup>a</sup>*Confirmed by neuroradiological reports. CN, caudate nucleus; F, frontal; IP, intraparenchymal; O, occipital; P, parietal; PA, parenchymal; T, temporal; TN, thalamic nucleus.* <sup>b</sup>*H, hemorrhagic; I, ischemic.* <sup>c</sup>*Acute event.*

Importantly, all the previous two-way interactions were qualified by the cue-target spatial congruency × hemifield × group three-way interaction, *F*(1,16) = 4.713, *p* = 0.045, η<sup>2</sup><sub>p</sub> = 0.228. This three-way interaction was further analyzed through two separate ANOVAs, one per hemifield, with cue-target spatial congruency as within-participant factor and group as between-participant factor. For targets appearing in the right hemifield, the main effect of group was not significant, *F*(1,16) = 2.374, *p* = 0.143, η<sup>2</sup><sub>p</sub> = 0.129, although RTs were numerically shorter in healthy participants (*M* = 546 ms, SE = 65.19) than in right hemisphere-damaged patients (*M* = 722 ms, SE = 93.82). All other results were nonsignificant (*F*s < 1, *p*s > 0.436; BF10s < 1). Nevertheless, for completeness, one-tailed paired *t*-tests between congruent and incongruent trials, run separately for each group, confirmed that the gaze-cuing effect was absent both in healthy controls, *t*(8) = −0.605, *p* = 0.281, *dz* = −0.202, and in right hemisphere-damaged patients, *t*(8) = −0.620, *p* = 0.276, *dz* = −0.207 (see **Figure 4**). BF analysis showed that, both in healthy controls, BF10 = 0.222, and in right hemisphere-damaged patients, BF10 = 0.221, the model supporting H0 was preferable to the model supporting H1.

For targets appearing in the left hemifield, the main effect of cue-target spatial congruency was significant, *F*(1,16) = 9.951, *p* = 0.006, η<sup>2</sup><sub>p</sub> = 0.383, owing to shorter RTs on spatially congruent trials (*M* = 756 ms, SE = 83.98) than on spatially incongruent trials (*M* = 821 ms, SE = 92.84), as was the main effect of group, *F*(1,16) = 8.425, *p* = 0.010, η<sup>2</sup><sub>p</sub> = 0.345, owing to overall shorter RTs in healthy controls (*M* = 533 ms, SE = 61.61) than in right hemisphere-damaged patients (*M* = 1043 ms, SE = 164.68). The cue-target spatial congruency × group interaction was also significant, *F*(1,16) = 5.170, *p* = 0.037, η<sup>2</sup><sub>p</sub> = 0.244. One-tailed paired *t*-tests between congruent and incongruent trials, run separately for each group, indicated that the gaze-cuing effect was present both in healthy controls, *t*(8) = 2.244, *p* = 0.028, *dz* = 0.748, and in patients, *t*(8) = 2.768, *p* = 0.012, *dz* = 0.923, but the effect was much stronger in the latter group (18 ms vs. 112 ms).<sup>1</sup> BF analysis showed that, both in healthy controls, BF10 = 3.261, and in right hemisphere-damaged patients, BF10 = 6.195, the model supporting H1 was preferable to the model supporting H0.

Because Bonato et al. (2009) documented a disengagement deficit in their patients, at least when arrow cues were used, we also applied their index to explore whether this phenomenon was evident in our sample as well. To this end, the gaze-cuing effect (RT<sub>incongruent</sub> − RT<sub>congruent</sub>) was calculated separately for targets appearing in the left and in the right hemifield, and the difference between the two was then computed (Cuing<sub>left</sub> − Cuing<sub>right</sub>). A one-sample *t*-test showed that this index was significantly different from zero, *t*(8) = 2.761, *p* = 0.025, *dz* = 0.920, thus confirming the presence of a disengagement deficit in our sample of right hemisphere-damaged patients. BF analysis showed that the model supporting H1 was preferable to the model supporting H0, BF10 = 3.134.
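The disengagement index described above yields one number per patient, which is then tested against zero across patients. The sketch below follows that description; it is not Bonato et al.'s original implementation, the data structure and function names are hypothetical, and the patient values are invented for illustration. A positive index means that cuing was larger for left (contralesional) targets than for right targets.

```python
from math import sqrt
from statistics import mean, stdev


def disengagement_index(mean_rt):
    """Cuing(left) - Cuing(right) for one patient.

    mean_rt maps (hemifield, congruency) to that patient's mean RT in ms, where
    cuing = RT(incongruent) - RT(congruent) within a hemifield.
    """
    cuing_left = mean_rt[("left", "incongruent")] - mean_rt[("left", "congruent")]
    cuing_right = mean_rt[("right", "incongruent")] - mean_rt[("right", "congruent")]
    return cuing_left - cuing_right


def one_sample_t(values, mu=0.0):
    """One-sample t against mu (n - 1 df) and the matching standardized effect dz."""
    n = len(values)
    dz = (mean(values) - mu) / stdev(values)
    return dz * sqrt(n), n - 1, dz


# Hypothetical patient: 100 ms of cuing on the left, -10 ms on the right.
patient = {("left", "congruent"): 800, ("left", "incongruent"): 900,
           ("right", "congruent"): 600, ("right", "incongruent"): 590}
print(disengagement_index(patient))  # 110
```

Applied to the nine patients, `one_sample_t` would take the list of nine indices and return the *t* statistic, its 8 degrees of freedom, and *dz*; as before, converting *t* to a *p*-value requires the Student *t* distribution.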

Importantly, none of the interactions involving cue-target spatial congruency and frame was significant (*F*s < 1, *p*s > 0.443, BF10s < 1), suggesting a comparable gaze-cuing effect in the two frames (see **Table 2**). Nevertheless, for completeness, one-tailed paired *t*-tests were performed between congruent and incongruent trials, separately for each frame and group. These analyses were carried out only for targets appearing in the left hemifield, since the gaze-cuing effect was observed only there. In healthy controls, a significant gaze-cuing effect emerged in the space-based frame, *t*(8) = 2.325, *p* = 0.024, *dz* = 0.775, whereas in the object-centered frame the effect was not significant, *t*(8) = 0.719, *p* = 0.246, *dz* = 0.240, although the means were in the expected direction, with shorter RTs on congruent trials (*M* = 534 ms, SE = 59.17) than on incongruent trials (*M* = 540 ms, SE = 63.41). BF analysis showed that the model supporting H1 was preferable to the model supporting H0 in the space-based frame, BF10 = 3.601, whereas in the object-centered frame this was less clear, BF10 = 0.587. Patients oriented attention in response to gaze both in the space-based frame, *t*(8) = 1.866, *p* = 0.049, *dz* = 0.622, and in the object-centered frame, *t*(8) = 2.258, *p* = 0.027, *dz* = 0.753. BF analysis showed that the model supporting H1 was preferable to the model supporting H0 both in the space-based frame, BF10 = 2.066, and in the object-centered frame, BF10 = 3.320.

<sup>1</sup>The fact that gaze cuing emerged only when the target appeared in the left hemifield could be expected for right hemisphere-damaged patients. The same pattern in healthy controls should not come as a surprise either, in that a recent study found that even in healthy individuals gaze cuing was detectable only when targets appeared in the left hemifield (Marotta et al., 2012a), likely reflecting the hemispheric specialization for face processing. To test whether the gaze-cuing effect emerged only in the left hemifield in Experiment 1 as well, we further analyzed the RT data from Experiment 1 through a repeated-measures ANOVA with cue-target spatial congruency (2: congruent vs. incongruent), frame (2: space-based vs. object-centered), and hemifield (2: right vs. left) as within-participant factors. The results remained virtually unchanged: the only significant result was the main effect of cue-target spatial congruency, *F*(1,25) = 8.534, *p* = 0.007, η<sup>2</sup><sub>p</sub> = 0.254. The cue-target spatial congruency × hemifield interaction did not reach significance, *F*(1,25) = 1.289, *p* = 0.267, η<sup>2</sup><sub>p</sub> = 0.049, although the gaze-cuing effect (RT<sub>incongruent</sub> − RT<sub>congruent</sub>) was larger for targets appearing in the left hemifield (7 ms) than in the right hemifield (3 ms). Interestingly, one-sample *t*-tests confirmed that the gaze-cuing effect differed significantly from zero in the left hemifield, *t*(25) = 2.870, *p* = 0.008, *dz* = 0.563, but not in the right hemifield, *t*(25) = 1.165, *p* = 0.255, *dz* = 0.228. In line with this, BF analysis showed that the model supporting H1 was preferable to the model supporting H0 in the left hemifield, BF10 = 5.566, but not in the right hemifield, BF10 = 0.381.

### Error Analysis

Missed responses were analyzed through a mixed-effects logit model (e.g., Jaeger, 2008), with cue-target spatial congruency, frame, hemifield, and group as fixed effects and participant as a random effect. A first model tested both main effects and interactions; in this model, no significant results emerged (*p*s > 0.152). For this reason, a second model was fitted that included only the main effects. In this model, the main effect of hemifield was significant, *b* = −1.440, SE = 0.173, *z* = −8.316, *p* < 0.001, owing to more missed responses when the target appeared in the left hemifield than in the right hemifield. The main effect of group was also significant, *b* = −5.094, SE = 1.236, *z* = −4.121, *p* < 0.001, owing to more missed responses in right hemisphere-damaged patients than in healthy controls (see **Table 2**). The other main effects were not significant (*p*s > 0.635). Model comparison was performed following the guidelines proposed by Bolker et al. (2009). BIC values indicated very strong evidence for the model with only main effects (BIC = 1132.8) over the model that also included interactions (BIC = 1213.0), ΔBIC = 81 (see Raftery, 1995). Likewise, the likelihood ratio test indicated that adding the interactions did not provide additional information with respect to the model with only main effects, χ<sup>2</sup>(11) = 9.691, *p* = 0.558.
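Two pieces of bookkeeping in this model comparison can be made explicit. First, with four two-level fixed effects, each coded by a single parameter, the full-factorial model adds 2<sup>4</sup> − 1 − 4 = 11 interaction terms to the main-effects model, which matches the 11 degrees of freedom of the likelihood ratio test (and, for the two-factor false-alarm model, 2<sup>2</sup> − 1 − 2 = 1). Second, a BIC difference can be graded on Raftery's (1995) bands: 0 to 2 weak, 2 to 6 positive, 6 to 10 strong, above 10 very strong. The sketch below is illustrative; how values falling exactly on a band boundary are classified is a convention choice, and the differences are computed from the rounded BIC values reported in the text.

```python
def raftery_grade(delta_bic):
    """Grade the evidence for the lower-BIC model using Raftery's (1995) bands.

    delta_bic = BIC(other model) - BIC(preferred model), so it is non-negative.
    Bands: 0-2 weak, 2-6 positive, 6-10 strong, above 10 very strong.
    """
    if delta_bic < 0:
        raise ValueError("delta_bic must be computed against the lower-BIC model")
    if delta_bic <= 2:
        return "weak"
    if delta_bic <= 6:
        return "positive"
    if delta_bic <= 10:
        return "strong"
    return "very strong"


def n_interaction_terms(k):
    """Number of interaction terms among k two-level fixed effects: 2**k - 1 - k."""
    return 2 ** k - 1 - k


print(raftery_grade(1213.0 - 1132.8), n_interaction_terms(4))  # very strong 11
print(raftery_grade(1200.4 - 1195.2), n_interaction_terms(2))  # positive 1
```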

Similarly, false alarms in catch trials were analyzed through a mixed-effects logit model with frame and group as fixed effects and participant as a random effect. A first model tested both main effects and interactions; in this model, no significant results emerged (*p*s > 0.258). For this reason, a second model was fitted that included only the main effects. In this model, the main effect of group was significant, *b* = −4.992, SE = 1.209, *z* = −4.130, *p* < 0.001, owing to more false alarms in right hemisphere-damaged patients than in healthy controls (see **Table 2**). The main effect of frame was not significant (*p* = 0.647). Model comparison was performed following the guidelines proposed by Bolker et al. (2009). BIC values indicated positive evidence for the model with only main effects (BIC = 1195.2) over the model that also included interactions (BIC = 1200.4), ΔBIC = 5 (see Raftery, 1995). Likewise, the likelihood ratio test indicated that adding the interaction did not provide additional information with respect to the model with only main effects, χ<sup>2</sup>(1) = 2.946, *p* = 0.09.
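The *p*-values of the likelihood ratio tests above are upper-tail probabilities of the χ<sup>2</sup> distribution evaluated at the test statistic (twice the difference in log-likelihood between the nested models), with degrees of freedom equal to the number of parameters dropped. In practice a library routine such as `scipy.stats.chi2.sf` would be used; to keep the example self-contained, the sketch below implements the closed-form expressions that exist for integer degrees of freedom, and it reproduces both reported values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with a positive integer df.

    Uses the closed forms for integer degrees of freedom (no external libraries).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x**(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Likelihood ratio statistics reported for the two error analyses.
print(round(chi2_sf(9.691, 11), 3))  # missed responses: 0.558
print(round(chi2_sf(2.946, 1), 3))   # false alarms: 0.086
```

The second value, 0.086, is the unrounded counterpart of the *p* = 0.09 reported for false alarms. As a sanity check, the function returns approximately 0.05 at the familiar critical values 3.841 (df = 1) and 5.991 (df = 2).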

# Discussion

The ability to orient attention in response to spatial signals provided by our conspecifics represents a key element of human behavior (e.g., Baron-Cohen, 1995), and research has focused both on the cognitive aspects of the phenomenon and on the neural underpinnings of gaze cuing of attention. Neuroimaging studies (e.g., Hietanen et al., 2006; Tipper et al., 2008; Callejas et al., 2014) indicate that this form of social orienting involves brain areas mainly localized in the right hemisphere. The possible existence of a broad neural network devoted to gaze cuing of attention has also emerged from neuropsychological studies (e.g., Vuilleumier, 2002; Bonato et al., 2009), in which right hemisphere-damaged patients often showed a relatively spared ability to shift attention toward spatial locations indicated by eye gaze stimuli, at least when the lesions did not specifically involve the superior temporal gyrus and the superior temporal sulcus (Akiyama et al., 2006).

The general aim of the present study was to provide further evidence on gaze cuing of attention in a sample of right hemisphere-damaged patients. Unlike previous neuropsychological studies, which focused only on the space-based component of gaze cuing of attention, here we also explored the object-centered component of this form of orienting. To this end, in two experiments we employed a task similar to that devised by Bayliss et al. (2004), in which a centrally placed head with averted gaze, displayed upright (space-based orienting) or rotated 90° clockwise or anticlockwise (object-centered orienting), preceded the onset of a target that could appear either in the right or in the left hemifield. In Experiment 1, we tested a sample of young healthy individuals. In Experiment 2, the same task was administered to a sample of right hemisphere-damaged patients and to a matched group of healthy individuals.

TABLE 2 | Mean reaction times (RTs; ms) and percentage of errors (%E) for all conditions in Experiment 2.

*Values in brackets are SEM. C, congruent trial; I, incongruent trial; MR, Missed Responses; FA, False Alarms.*

As for the overall gaze-cuing effect, the results from right hemisphere-damaged patients were, on the whole, consistent with those reported by Vuilleumier (2002) and Bonato et al. (2009), in that the ability to shift attention in response to eye gaze stimuli was preserved. However, gaze cuing was significant only when targets appeared in the left hemifield (see Bonato et al., 2009). This finding is in line with previous evidence that right hemisphere-damaged patients often suffer from a disengagement deficit of attention following a spatially incongruent cue pointing to the right visual hemifield (e.g., Posner et al., 1984; Bartolomeo et al., 2001; for a review, see Bartolomeo and Chokron, 2002). In other words, responses would be particularly slowed when targets are presented on the contralesional side (i.e., left visual hemifield) after a spatial cue that shifted attention toward the ipsilesional side (i.e., right visual hemifield). A disengagement deficit seems more frequent in response to peripheral cues (see Losier and Klein, 2001), although it has also been documented in response to centrally placed arrow cues (Bonato et al., 2009; Olk et al., 2010), but not in response to centrally placed eye gaze cues (Bonato et al., 2009). Strikingly, our results provide the first evidence of a disengagement deficit in response to centrally placed gaze cues in patients with damage to the right hemisphere. Although the comparison between our results and those reported by Bonato et al. (2009) must be treated with caution, owing to relevant differences in both methodology and clinical samples, the discrepant pattern may be tentatively explained by the specific type of eye gaze stimuli used in the two studies. Indeed, while in the present study we employed 3D avatars that suddenly moved their eyes rightwards or leftwards, mimicking actual social interactions, in Bonato et al. (2009) participants were presented with schematic eyes in isolation (i.e., not embedded within a face) with static pupils oriented rightwards or leftwards.

Interestingly, in the present study, participants in the control group also showed reliable gaze cuing of attention only for targets appearing in the left hemifield, even though this effect was significantly larger among right hemisphere-damaged patients (112 ms) than among healthy participants (18 ms, a magnitude in line with previous reports; e.g., Friesen and Kingstone, 1998). Gaze cuing of attention only in response to targets presented in the left hemifield has also been documented in a recent study by Marotta et al. (2012a), who administered to healthy participants a paradigm similar to the one employed here. In more detail, Marotta et al. (2012a) asked participants to detect a target, which could appear on the right or on the left, in the presence of centrally presented, task-irrelevant arrow and eye gaze cues oriented rightwards or leftwards. Strikingly, while reliable arrow cuing of attention emerged irrespective of whether the target appeared in the left or in the right hemifield, reliable gaze cuing emerged only in response to targets appearing in the left hemifield. The authors interpreted this pattern of results as likely reflecting the specialization of the right hemisphere in face processing (e.g., Ojemann et al., 1992). This conclusion is also consistent with a previous study in healthy individuals suggesting that, while symbolic spatial cuing of attention (such as that obtained with arrows) would be supported by brain mechanisms spread bilaterally, gaze cuing of attention would be specifically supported by brain areas located in the right hemisphere (Greene and Zaidel, 2011).
Moreover, as discussed in the introduction, this scenario is also supported by evidence from split-brain patients, who exhibited arrow cuing of attention in response to targets presented in both hemifields, whereas their gaze cuing of attention was limited to targets presented on the left (Kingstone et al., 2000; Ristic et al., 2002). However, the scarcity of evidence on this topic calls for caution in drawing this conclusion, and future studies are needed to test exhaustively the possibly different contributions that the two hemispheres make to social and symbolic cuing of attention.
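The hemifield-specific gaze-cuing effects compared above (e.g., 112 ms vs. 18 ms) are difference scores between congruent and incongruent trials. A minimal Python sketch, assuming the conventional definition of the cuing effect as mean RT on incongruent trials minus mean RT on congruent trials, computed separately for each target hemifield; the function name and all trial values are hypothetical and are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def cuing_effects(trials):
    """Gaze-cuing effect per target hemifield, in ms.

    `trials` is an iterable of (hemifield, congruent, rt_ms) tuples. The effect is
    mean RT on incongruent trials minus mean RT on congruent trials, so positive
    values mean faster responses when gaze pointed toward the target's side.
    Hemifields lacking either trial type are skipped.
    """
    rts = defaultdict(list)
    for hemifield, congruent, rt in trials:
        rts[(hemifield, congruent)].append(rt)
    effects = {}
    for hemifield in sorted({h for h, _ in rts}):
        congruent_rts = rts.get((hemifield, True))
        incongruent_rts = rts.get((hemifield, False))
        if congruent_rts and incongruent_rts:
            effects[hemifield] = mean(incongruent_rts) - mean(congruent_rts)
    return effects


# Hypothetical trials: a large effect in the left hemifield, a small one in the right.
example = [
    ("left", True, 400), ("left", True, 410), ("left", False, 520), ("left", False, 510),
    ("right", True, 450), ("right", True, 460), ("right", False, 455), ("right", False, 465),
]
print(cuing_effects(example))
```

For brevity the sketch pools trials; an actual analysis would typically compute the effect per participant (on correct trials only) before averaging within each group.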

One of the major goals of the present study was also to address the potential role of the frame of reference (i.e., space-based vs. object-centered) in shaping attentional orienting. In Experiment 1, we replicated the pattern of results reported by Bayliss et al. (2004) in a sample of young healthy individuals. Indeed, reliable and comparable gaze cuing of attention emerged irrespective of head orientation. Importantly, a similar pattern emerged in Experiment 2, namely gaze cuing of attention of similar magnitude was observed under both space-based and object-centered frames, at least when targets were presented in the left hemifield. This finding provides further evidence that, in right hemisphere-damaged patients as well, visual attention can operate within different frames of reference, and suggests that this ability is not limited to symbolic cues (e.g., Driver and Halligan, 1991; Behrmann and Tipper, 1999) but extends to a social stimulus such as eye gaze. An intriguing question for future studies is whether space-based and object-centered orienting are sensitive to contextual information, such as the presence or absence of reference objects (e.g., placeholders) in the periphery. Indeed, recent evidence has shown that, when no placeholders are presented in a gaze cuing task, the gaze cuing effect is not confined to a specific spatial location but is also detectable for targets appearing at different locations within the cued hemifield. By contrast, when placeholders are used, the gaze cuing effect emerges only for targets appearing inside the placeholder (Wiese et al., 2013; see also Marotta et al., 2012b for similar results). Following this rationale, it would be interesting to employ a modified version of the present paradigm in which the presence of peripheral placeholders is manipulated. On the basis of the results reported by Wiese et al. (2013), in the presence of placeholders the gaze cuing effect should emerge only within the space-based frame of reference.

Future work could also address some limitations of the present study. First, at the time of testing we were unable to administer standardized neuropsychological tests to all the individuals in our clinical sample. This prevented us from assessing the potential presence of hemispatial neglect and its possible role in shaping socio-attentional mechanisms. For instance, a neuropsychological assessment tool such as the Behavioural Inattention Test (e.g., Wilson et al., 1987) could be employed to unveil any potential relationship between symptom variables and gaze cuing of attention within different frames of reference. Furthermore, regarding the methodological aspects of the paradigm employed here, it is important to highlight that we used a fixed 500-ms SOA. The main reason for this choice was consistency with the original study of Bayliss et al. (2004), in which the same SOA was used. However, future studies could employ a broader range of SOAs to properly assess the temporal dynamics underlying the gaze cuing effect within different frames of reference. Bayliss and Tipper (2006), who employed a paradigm similar to that proposed by Bayliss et al. (2004), used two SOAs of about 200 and 500 ms. In both cases, they reported both space-based and object-centered gaze cuing of attention, but at the shorter SOA this effect was overall weaker, especially within the object-centered frame (gaze cuing effect = 4 ms) as compared to the space-based frame (gaze cuing effect = 9 ms).

In summary, our results confirm the presence of spared gaze cuing of attention in right hemisphere-damaged patients and are overall consistent with previous studies (e.g., Vuilleumier, 2002; Bonato et al., 2009). Furthermore, they provide the first evidence that gaze cuing of attention in right hemisphere-damaged patients can operate within different frames of reference. Previous studies focused only on symbolic spatial cues (e.g., Driver and Halligan, 1991; Behrmann and Tipper, 1999), whereas here we show that object-centered orienting is also preserved for a relevant social cue such as eye gaze. Because the study of the neural underpinnings of gaze cuing of attention is still an ongoing endeavor, further studies are necessary to obtain a comprehensive picture of the brain areas involved in this form of social orienting. In this regard, a causal approach based on neuropsychological evidence, aimed at addressing the effects of more focal lesions not limited to the right hemisphere (e.g., Vecera and Rizzo, 2004, 2006), represents a fruitful path for future research.

# Acknowledgments

This research was financially supported by the Italian Ministry of Education, University and Research (Futuro in Ricerca 2012, Grant RBFR12F0BD). We thank Yuri Bertelli for his assistance in data collection of Experiment 1 and Gianmarco Altoè for statistical advice.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Dalmaso, Castelli, Priftis, Buccheri, Primon, Tronco and Galfano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Emotion recognition through static faces and moving bodies: a comparison between typically developed adults and individuals with high level of autistic traits†

#### *Edited by:*

*Anna M. Borghi, University of Bologna and Institute of Cognitive Sciences and Technologies, Italy*

#### *Reviewed by:*

*Maddalena Fabbri Destro, Istituto Italiano di Tecnologia – Brain Center for Social and Motor Cognition, Italy*

*Lubna Ahmed, St Mary's University Twickenham, UK*

#### *\*Correspondence:*

*Rossana Actis-Grosso rossana.actis@unimib.it*

*†Part of the data was presented by Actis-Grosso and Ricciardelli (2013) at the 29th Annual Meeting of the International Society of Psychophysics, Freiburg, Germany, October 2013.*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 22 June 2015; Accepted: 28 September 2015; Published: 23 October 2015*

#### *Citation:*

*Actis-Grosso R, Bossi F and Ricciardelli P (2015) Emotion recognition through static faces and moving bodies: a comparison between typically developed adults and individuals with high level of autistic traits. Front. Psychol. 6:1570. doi: 10.3389/fpsyg.2015.01570*

#### *Rossana Actis-Grosso<sup>1,2</sup>\*, Francesco Bossi<sup>1</sup> and Paola Ricciardelli<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, University of Milano-Bicocca, Milano, Italy, <sup>2</sup> Milan Centre for Neuroscience, Milano, Italy*

We investigated whether the type of stimulus (pictures of static faces vs. body motion) contributes differently to the recognition of emotions. The performance (accuracy and response times) of 25 young adults with Low Autistic Traits (LAT group; 21 males) and 20 young adults (16 males) with either High Autistic Traits or High Functioning Autism Spectrum Disorder (HAT group) was compared in the recognition of four emotions (Happiness, Anger, Fear, and Sadness), shown either in static faces or in moving-body patch-light displays (PLDs). Overall, HAT individuals were as accurate as LAT ones in perceiving emotions with both faces and PLDs. Moreover, they correctly described non-emotional actions depicted by PLDs, indicating that they perceived the motion conveyed by the PLDs *per se*. For LAT participants, happiness proved to be the easiest emotion to recognize: in line with previous studies, we found a happy face advantage for faces, which for the first time was also found for bodies (happy body advantage). Furthermore, LAT participants recognized sadness better from static faces and fear better from PLDs. This advantage of motion kinematics in the recognition of fear was not present in HAT participants, suggesting that (i) emotion recognition is not generally impaired in HAT individuals and (ii) the cues exploited for emotion recognition by the LAT and HAT groups are not always the same. These findings are discussed against the background of emotional processing in typically and atypically developed individuals.

Keywords: emotion recognition, faces, biological motion, point-light displays, Autism Spectrum Disorders, Asperger Syndrome, Autism Spectrum Conditions

# INTRODUCTION

Research on emotion recognition has been dominated by studies focusing on faces and using *static stimuli*, in particular static photographs of facial expressions. This is probably due to two reasons. First, the recognition of facial emotional expressions is efficient with both static and moving images (although facial motion was found to increase the likelihood of recognizing basic expressions; Bassili, 1978, 1979), whereas this is not true for other body parts, for which emotion recognition is far more efficient with dynamic stimuli (e.g., Atkinson et al., 2004). Second, since the seminal study by Ekman et al. (1969), there has been well-documented evidence that, through facial expressions, the human face has evolved as a major signaling and communication channel for emotions. For several decades, scientists seemed to ignore the fact that emotions are expressed and communicated to others with the whole body, which certainly includes the face but also the hands, body postures, gait velocity, tone and volume of the voice, and so on (i.e., body language). In recent years, however, an increasing number of scientists have become aware that facial expressions are not the only source of emotionally relevant information, and a small but consistent body of research shows that human observers are able to distinguish at least a limited set of emotions from static body expressions in the absence of facial cues (see Atkinson, 2013 for a review).

Emotion processing and emotion recognition have been widely investigated not only in typically developed (TD) individuals but also in clinical populations, with a particular emphasis on people with Autism Spectrum Disorders (ASDs) or Autism Spectrum Conditions (ASCs). According to DSM-5 (American Psychiatric Association [APA], 2013), ASD refers to a set of complex, polygenic neurodevelopmental disorders characterized, among other symptoms, by social and communication deficits. Recently, the broader label of ASC, which includes ASDs, has been used to characterize difficulties in social and communication functioning alongside repetitive behavior and restricted interests (Baron-Cohen et al., 2009; Ashwin et al., 2015). This point of view assumes that ASDs lie on a continuum of social-communication disability that, in the general population, ranges from no impairment to pathological conditions (Baron-Cohen et al., 2001). This view does not support the idea of diagnostic categories of autism but assumes that any person may have "autistic traits", or what has been called "the broader autism phenotype" (Bailey et al., 1995). Therefore, signs of social and communication deficits can be found even in individuals who have not received a formal diagnosis of ASC but present a high level of autistic traits. Even though the diagnostic criteria for autism do not require a difficulty in identifying emotional cues, it is commonly assumed that emotion recognition difficulties are present in individuals with ASC and may also be present in individuals with "High Autistic Traits" (considered as part of a broader continuum). As with TD individuals, most work on emotion processing in ASC has focused mainly on faces and has used static stimuli.

Although during past decades there has been growing interest in the role of facial movement in emotional expressions, the results are controversial, given that it is very hard to experimentally separate the processing of facial identity from that of emotional expressions. In an attempt to reduce non-motion cues, researchers have typically employed point-light or patch-light displays (hereinafter referred to as PLDs) of human bodies (biological motion), in which static form information is minimal or absent but motion information (kinematics and dynamics) and motion-mediated structural information are preserved (Johansson, 1973). PLDs are obtained by placing visible markers only on a few crucial points (i.e., joints) of the body (or of the face, in the adaptation used for studying facial motion). These displays have been shown to convey a variety of information to the human observer, such as the nature of the action (Kozlowski and Cutting, 1978; Dittrich, 1993) and the gender of the actor (Mather and Murdoch, 1994).

A growing number of studies using PLDs show a link between motion and emotion for both faces and bodies.

While there is no consensus on whether facial motion can facilitate emotion recognition (e.g., Knight and Johnston, 1997; Bould and Morris, 2008; Fiorentini and Viviani, 2011), it is now well recognized that body language (also referred to as bodily kinematics) is sufficient for the perception of emotions (Atkinson et al., 2004; Clarke et al., 2005). This may imply that people are able to perceive emotions from kinematic patterns without having to compute the detailed shape of the human form first.

Evidence is thus accumulating that humans can recognize emotions not only from photographs of facial expressions, as documented by a large body of research, but also from (a) static body postures, (b) PLDs of moving faces, (c) PLDs of moving bodies, and even (d) PLDs of moving body parts.

Hence, while faces are universally recognized as the major signaling and communication channel for emotions (George, 2013), a growing body of evidence shows that bodily kinematics are also crucial for emotion recognition. The aim of the present study was therefore to investigate whether emotion recognition differs depending on whether emotions are conveyed through a static face or through body motion. In particular, we aimed to study emotion recognition by excluding, on the one hand, the motion component from faces and, on the other hand, face identity from body motion. We hypothesized that, since different emotions are characterized by different facial features and different dynamic components of body language, these two sources of information may play different roles in the recognition of different emotions. We think that this comparison could also help to better understand the process of emotion recognition in typically developed individuals and shed new light on ASC; for this reason, we tested both TD individuals and young adults with high-functioning ASCs. In fact, it is possible that the static components of faces and the dynamic components of the human body contribute differently as cues to the recognition of different emotions, and that the role of these cues differs in individuals with ASC.

Research on emotion recognition difficulties in ASC has reported very mixed results. Several studies found generalized deficits on various emotion-reading tasks (e.g., Davies et al., 1994; Corbett et al., 2009), but a significant number of papers also reported no differences between typical and autistic participants (e.g., Baron-Cohen et al., 1997; Castelli, 2005; Jones et al., 2011). Research has also investigated the idea that, rather than a generalized deficit, individuals with autism might have difficulties in recognizing only some of the six basic emotions (i.e., Happiness, Surprise, Fear, Sadness, Disgust, and Anger), but in this case too the results are controversial: some studies reported evidence of a selective difficulty in recognizing, for example, surprise (e.g., Baron-Cohen et al., 1993) or fear (e.g., Ashwin et al., 2006; Humphreys et al., 2007; Wallace et al., 2008), whereas other studies failed to replicate these findings (e.g., Baron-Cohen et al., 1997; Castelli, 2005; Lacroix et al., 2009).

In a recent meta-analysis, Uljarevic and Hamilton (2013) brought together data from 48 papers, testing over 980 participants with autism and using both faces and bodies (and both static and dynamic stimuli) as stimuli. The results of this meta-analysis show an emotion recognition difficulty in autism, with the recognition of happiness only marginally impaired and the recognition of fear slightly worse than that of happiness.

To date, only a few research groups have explored whether individuals with ASC differ from TD observers in perceiving bodily emotion from PLDs, and their results are not entirely consistent (see Kaiser and Shiffrar, 2009, for a review). In a series of works by Moore et al. (1997, 2007), ASC individuals were shown to have a reduced ability, compared to controls, to verbally report subjective states and emotions from PLDs, whereas no differences were found in reporting actions or objects. According to the authors, this deficit in the ability to describe emotional body actions could be interpreted as a deficit either at a perceptual level (i.e., people with autism do not correctly perceive the emotional information conveyed by PLD kinematics) or at a semantic level (i.e., people with autism adequately perceive the emotional information but fail to associate it with the appropriate descriptive words). To resolve this ambiguity, Atkinson (2009) used a forced-choice paradigm to investigate the ability of ASC individuals to recognize emotions or actions from PLDs. As in previous studies, Atkinson found impaired emotion recognition in ASC. However, in contrast to Moore and colleagues, the ASC group also showed deficits in labeling the displayed actions from PLDs and, more generally, an elevated motion coherence threshold.

Interestingly, a central issue in explaining the impairment of ASC individuals in recognizing emotions from PLDs concerns a more general impairment in integrating local elements into a coherent whole. In this respect, the ability to recognize and label biological motion from PLDs, independently of its emotional content, is crucial, but research on this issue has led to very mixed results. Specifically, some studies have reported ASC-related impairments in identifying biological motion from PLDs (e.g., Blake et al., 2003; Annaz et al., 2010), whereas other studies failed to reveal any ASC-related impairment (Murphy et al., 2009; Saygin et al., 2010). Although a possible explanation for these discrepant results may lie in differences in the severity of ASC among the individuals who participated in the studies (see Blake et al., 2003), a recent study by Robertson et al. (2014) points to another possible reason for the inconsistency. When TD and ASC individuals were compared in a series of coherent motion perception judgements, both groups showed the same basic pattern of accuracy in judging the direction of motion, with performance decreasing with reduced motion coherence and shorter viewing durations of the displays. However, these effects were enhanced in the ASC group: despite equal performance with the longer displays, the ASC group performed much worse than the TD group with the shorter displays and at decreasing levels of stimulus coherence.
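Motion coherence, which the studies above manipulate, is often operationalized with random-dot displays in which only a proportion of dots move in a common (signal) direction while the remaining dots move in random directions. The Python sketch below illustrates that idea under this assumption only; it is not the stimulus procedure of Robertson et al. (2014) or Atkinson (2009), and all function names and parameters are hypothetical.

```python
import math
import random


def dot_directions(n_dots, coherence, signal_deg, rng):
    """Return one motion direction (in degrees) per dot for a single frame.

    round(n_dots * coherence) dots move in the signal direction; the remaining
    "noise" dots each get an independent, uniformly random direction.
    """
    if not 0.0 <= coherence <= 1.0:
        raise ValueError("coherence must be between 0 and 1")
    n_signal = round(n_dots * coherence)
    directions = [signal_deg] * n_signal
    directions += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    rng.shuffle(directions)  # signal dots are not tied to particular dot indices
    return directions


def step(positions, directions, speed):
    """Displace each (x, y) dot by `speed` units along its direction."""
    return [
        (x + speed * math.cos(math.radians(d)), y + speed * math.sin(math.radians(d)))
        for (x, y), d in zip(positions, directions)
    ]


# Usage: 100 dots at 30% coherence drifting rightwards (0 degrees).
rng = random.Random(1)
positions = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
directions = dot_directions(n_dots=100, coherence=0.3, signal_deg=0.0, rng=rng)
positions = step(positions, directions, speed=1.5)
```

Lowering `coherence` leaves fewer dots carrying the common direction, which is why direction judgements become harder; in this sketch, shorter viewing durations would correspond to fewer calls to `step`.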

To our knowledge, only two studies have tried to compare faces and bodies in emotion recognition in TD individuals. In an fMRI study, Atkinson et al. (2012) showed participants 2-s-long digital video clips displaying point-light facial or body movements corresponding to angry, happy, and emotionally neutral movements. The results showed, among other things, that facial and body motion selectively activate the Fusiform Face Area and the Extrastriate Body Area, respectively (the former coding for the static structure of faces and the latter for that of bodies), but no evidence was found for an emotional modulation in these areas.

While Atkinson et al. (2012) compared *moving* PLD faces with *moving* PLD bodies, Alaerts et al. (2011) carried out a study in which, as in the present study, *static* faces and *moving* bodies were compared. The main aim of Alaerts et al.'s (2011) study was to investigate potential gender differences in a series of tasks involving the recognition of some basic aspects (e.g., the displayed action or the actor's gender) from PLDs depicting the body movements of a male and a female actor. Additionally, they tested whether the ability to recognize emotions from bodily PLD kinematics was correlated with the ability to recognize emotions from facial cues consisting of static photographs of the eye region, as assessed by the 'Reading the Mind in the Eyes Test' (revised version, Baron-Cohen et al., 2001). A strong correlation between emotion recognition from body PLDs and from facial cues was found, indicating that the ability to recognize the emotions expressed by other individuals generalizes across *facial* and *body* emotion perception.

Yet, no study has ever investigated whether the static components of emotional faces and the dynamic components of body language are differently involved in the recognition of different emotions, which in fact are characterized by different patterns of facial features and bodily kinematics.

We hypothesized that bodily kinematics play a fundamental role in the recognition of some emotions, while facial expressions should be crucial in the recognition of others. Indeed, while facial expressions of emotions such as happiness and anger are unequivocally recognized as such, facial expressions of other emotions are often confused: for example, a fearful face could easily be confused with a surprised one (e.g., Smith and Schyns, 2009; George, 2013). We thus reasoned that bodily kinematics are used by the emotion recognition system to disambiguate between these emotions, and for this reason we expected body language to be at least as important as static faces in the recognition of fear, also in light of the fact that fear is usually associated with behaviors such as shivering, which are more easily detected through body language than through emotional faces.

On the other hand, the bodily kinematics associated with some emotions, such as sadness, could easily be confused with neutral kinematics. For example, body language often associates a slow gait and some configural cues, such as a bowed posture and a lowered head, with sadness. However, for some individuals the very same features can be the default posture and thus express no particular emotion. We thus expected facial expression to play a major role in the recognition of sadness, given also that sadness is often associated with behaviors such as crying or moaning, which are better expressed in the face.

Furthermore, two effects are well documented in the literature: the so-called happy face advantage and the anger superiority effect. The former consists of happy faces being recognized (and remembered) more easily and readily than other emotional faces, such as sad or fearful faces (Leppänen and Hietanen, 2003; Shimamura et al., 2006). The anger superiority effect, in turn, refers to the fact that angry faces are easier to detect than happy faces in a crowd of neutral ones: angry faces pop out of crowds, perhaps as a result of a preattentive, parallel search (Hansen and Hansen, 1988). It is thus interesting to see whether the same advantages also extend to bodily kinematics. For example, in a study by Atkinson et al. (2004), in which emotion recognition was studied with PLDs and full-light displays in both static and dynamic conditions and with different qualities of motion expressing the emotions (i.e., normal, exaggerated, and very exaggerated), recognition success differed across individual emotions. In particular, disgust and anger conveyed by dynamic PLDs were more likely to be confused with fear, whereas the opposite was true for sadness and happiness, which were less likely to be confused. In contrast, in a work by Chouchourelou et al. (2006), among five different emotions, the greatest visual sensitivity was found for angry walkers, and Ikeda and Watanabe (2009) found that the detection of anger was more strongly linked to explicit gait detection than the detection of happiness. Furthermore, Atkinson et al. (2012) reported that their pilot work indicated that angry and happy point-light movements tended to be more readily identifiable than certain other emotions for both facial and body expressions. We therefore reasoned that it might be possible to find an advantage for at least one emotion (i.e., happiness), given that the kinematics associated with happiness are distinctive, being faster and smoother than those of all the other emotions.

Based on previous research on both ASC and TD participants, we also hypothesized that the two groups would differ in the cues they rely on (i.e., static facial cues or dynamic body cues) to recognize the different emotions. In particular, given that the recognition of happiness is only marginally impaired in ASC individuals (Uljarevic and Hamilton, 2013), we expected them to be as good as TD individuals in recognizing happiness from both facial expressions and PLDs. For fear, instead, we expected different recognition performance in the ASC and TD groups, not only in light of the worse recognition of fear found in the meta-analysis by Uljarevic and Hamilton (2013), but also on the basis of several studies suggesting that a dysfunction of the amygdala in autism (the amygdala has a specific role in the processing of fear; Adolphs, 2008) could cause poor recognition of fear and other negative emotions (Baron-Cohen et al., 2000; Howard et al., 2000; Ashwin et al., 2006). Lastly, given that the detection of anger has been shown to be more strongly linked to explicit gait detection (Ikeda and Watanabe, 2009), a difference in global motion processing between ASC and TD participants could translate into a different pattern of anger recognition.

To summarize, in the present study we explored several hypotheses. Firstly, we aimed to evaluate the different roles of body language and emotional faces in the recognition of different emotions in TD individuals. Secondly, we wanted to uncover any differences between individuals with High Autistic Traits (HAT group) and TD individuals with Low Autistic Traits (LAT group) in recognizing emotions from static faces and PLDs. Specifically, in LAT individuals we expected (i) body language to be at least as effective as static faces in the recognition of fear; (ii) sadness to be better recognized through facial expressions; and (iii) an advantage for happiness also when it is conveyed through PLDs (in close similarity with the happy face advantage). Moreover, we expected (iv) HAT individuals to be as good as LAT ones in recognizing happiness from both facial expressions and PLDs; and (v) HAT individuals to rely on different cues than LAT participants for the recognition of fear and anger.

To this end, we performed an exploratory experiment in which we compared the performance (i.e., accuracy and response times) of two groups of participants (i.e., HAT and LAT group) in the recognition of four basic emotions (fear, anger, sadness, and happiness), conveyed either by static face images or by PLDs<sup>1</sup>.

To make sure that all participants could correctly perceive the motion conveyed by the PLDs *per se*, a control test referred to as "action recognition test" (see Alaerts et al., 2011) was conducted using biological motion displays in which the actor performed neutral actions (e.g., rowing).

# MATERIALS AND METHODS

# Participants

Twenty-five TD individuals with Low Autistic Traits ("LAT" group; 21 males, 4 females, mean age = 22.3 years, *SD* = 2.9) and twenty young adults with High Autistic Traits ("HAT" group; 16 males, 4 females, mean age = 22.8 years, *SD* = 9.0) took part in the experiment. LAT participants were undergraduate students from the University of Milano-Bicocca who received course credits for their participation in the study. HAT participants were recruited from a community center, the "Spazio Nautilus Onlus", and had been diagnosed by different clinical teams as follows: 17 participants with Asperger Syndrome (AS) and three with Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS), according to DSM-IV-TR (American Psychiatric Association [APA], 2000) or ICD-10 (World Health Organization [WHO], 1992) criteria. Reliable IQ measures were obtained for 13 AS participants (mean IQ = 118.92, *SD* = 23.392) through standardized tests administered by the same clinical teams who made the ASD diagnosis. Although it was not possible to obtain a formal IQ assessment from all of them, the participants in the HAT group led an autonomous life and/or had a job requiring good cognitive and intellectual functioning, but showed impairments in social and communication skills. It is noteworthy that no relationship has been found between IQ and biological motion perception in ASD (Atkinson, 2009). All 45 participants had normal or corrected-to-normal vision and were unaware of the purpose of the study.

<sup>1</sup>We compared only four of the six emotions typically considered basic (see Ekman and Friesen, 1971) because both disgust and surprise have been found (e.g., Dittrich et al., 1996; Atkinson et al., 2004) to be easily confused with other emotions (surprise is also considered a mixed emotion). For this reason, we preferred not to include them in our study.

# Ethical Statements

All participants gave written informed consent before testing. The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and followed the ethical standard procedures recommended by the Italian Association of Psychology (AIP). The study was specifically approved by the local Ethics Committee of Milano-Bicocca University.

# Apparatus and Materials

The experiment was carried out in a dimly lit room. Participants sat approximately 60 cm away from a 19-inch LCD monitor (Acer® V196lb; resolution: 1600 × 1200 pixels; refresh rate: 75 Hz) connected to a personal computer with an Intel® Core™ i7-3517U 1.90 GHz processor and an NVIDIA® GeForce® GT 620M video board.

Four emotions were tested: happiness, sadness, anger, and fear. Eight emotional faces taken from the Radboud Faces Database (Langner et al., 2010) were used as static face stimuli. There were two faces for each emotion, one portraying a male and one a female, which were considered as two versions of the same emotion and coded as version 1 and 2, respectively. Eight patch light displays (PLDs) were used as bodily PLD kinematic stimuli. In these, emotions were conveyed solely by biological motion, specifically by the kinematics of light patches placed on the joints of an actor, each emotion being expressed through two different motion sequences, coded as version 1 and 2, respectively (Atkinson et al., 2004, 2012).

In the action recognition test (Alaerts et al., 2011), eight additional PLDs of white dots moving against a black background were also used as stimuli, showing eight different non-emotional actions (i.e., walking, riding a bike, jumping, painting, rowing, playing tennis, saluting, using a hoe, Atkinson et al., 2004, 2012).

A computerized version of the Autism Quotient (AQ) questionnaire (Baron-Cohen et al., 2001b) was completed by the participants at the end of the experimental session. The questionnaire consists of 50 statements with four possible responses (True, Almost True, Almost False, False).

# Procedure

The participants were tested individually. The software E-Prime® 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA, USA) was used for stimulus presentation and data recording.

The experiment was divided into four sessions: (i) static faces test; (ii) bodily PLD kinematics test; (iii) action-recognition test (Alaerts et al., 2011); and (iv) AQ questionnaire.

Instructions were provided verbally and also appeared written on the monitor at the start of each session.

To make sure that the participants could extract meaningful information from PLDs, and to familiarize them with the task, each participant was shown a short movie displaying a PLD of a walking man before the experiment.

The order of sessions (i) and (ii) was counterbalanced across participants. Both the static faces test and the bodily PLD kinematics test consisted of 24 trials (8 stimuli × 3 repetitions), presented in random order for each participant. In these first two sessions, participants were asked to indicate the displayed emotion as fast as possible by pressing one of four keys on a keyboard. A forced-choice paradigm was used to avoid any interference caused by possible differences in the ability to associate emotional information with the appropriate descriptive words. The four response options (happiness, sadness, anger, and fear) were assigned to the respective response buttons (Q-key, D-key, K-key, P-key), each labeled with the emotion name.
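The trial structure described above can be sketched as follows. This is a minimal Python illustration, not the authors' E-Prime script. The function names are hypothetical, and so are the even/odd counterbalancing rule and the emotion-to-key mapping (which pairs emotions and keys in the order the text lists them).

```python
import random

EMOTIONS = ["happiness", "sadness", "anger", "fear"]
VERSIONS = [1, 2]
REPETITIONS = 3
# Assumed mapping: emotions and keys paired in the order listed in the text.
RESPONSE_KEYS = {"happiness": "Q", "sadness": "D", "anger": "K", "fear": "P"}


def build_trials(stimulus_type, rng):
    """Return a shuffled list of 24 trials (4 emotions x 2 versions x 3 repetitions)."""
    trials = [
        {
            "stimulus_type": stimulus_type,
            "emotion": emotion,
            "version": version,
            "correct_key": RESPONSE_KEYS[emotion],
        }
        for emotion in EMOTIONS
        for version in VERSIONS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)  # random presentation order for this participant
    return trials


def session_order(participant_index):
    """Hypothetical counterbalancing: even-indexed participants see faces first."""
    if participant_index % 2 == 0:
        return ["static_faces", "bodily_pld"]
    return ["bodily_pld", "static_faces"]


def build_session(participant_index, seed=None):
    """Both emotion-recognition tests for one participant, in counterbalanced order."""
    rng = random.Random(seed)
    return [build_trials(test, rng) for test in session_order(participant_index)]


session = build_session(participant_index=3, seed=42)
print([block[0]["stimulus_type"] for block in session], len(session[0]))
```

Shuffling a fully crossed list keeps every emotion × version cell at exactly three presentations while randomizing their order. The text does not specify how repetitions were scheduled within a test.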

Pressing a response button started a blank interval of 1 s, followed by the next trial. Each trial lasted at most 6 s. For faces, the picture was shown for 1 s and was followed by a 5 s black mask; for PLDs, the display was shown for 3 s and was followed by a 3 s black mask. After this, the blank interval, and then the next trial, started automatically.
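The trial timing can be made explicit with a small sketch. It assumes that a key press during either the stimulus or the mask ends the trial immediately. This is our reading of the procedure rather than something the text states outright, and all names are hypothetical.

```python
STIMULUS_DURATION_S = {"static_faces": 1.0, "bodily_pld": 3.0}
MAX_TRIAL_S = 6.0  # stimulus + black mask
BLANK_S = 1.0      # blank interval before the next trial


def trial_timeline(stimulus_type, rt_s=None):
    """Return (phase, start_s, end_s) tuples for one trial.

    rt_s is the response time from stimulus onset, or None for a timeout.
    Assumption: a response at any point ends the stimulus/mask phase.
    """
    stim = STIMULUS_DURATION_S[stimulus_type]
    end = MAX_TRIAL_S if rt_s is None else min(rt_s, MAX_TRIAL_S)
    phases = [("stimulus", 0.0, min(stim, end))]
    if end > stim:  # the mask appears only if no response came during the stimulus
        phases.append(("mask", stim, end))
    phases.append(("blank", end, end + BLANK_S))
    return phases


print(trial_timeline("static_faces"))     # timeout: 1 s face, 5 s mask, 1 s blank
print(trial_timeline("bodily_pld", 2.5))  # response during the PLD itself
```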

In the action-recognition test, participants watched a series of eight short movies (each lasting 3 s) and were asked to verbally describe the actions displayed in the point-light animations. Each series always started with the walking man already seen before the experimental session, and the other seven movies followed in random order. Each movie was presented cyclically for a maximum duration of 5 min. Participants were instructed to press the spacebar when they were satisfied with their description, which was recorded by the experimenter. Pressing the spacebar started the next trial.

Finally, for the AQ questionnaire (Baron-Cohen et al., 2001b), participants were asked to read each of the 50 statements and to press one of four response keys (1-key, 2-key, 3-key, 4-key) for each one. The keys were labeled with the four response options (True, Almost True, Almost False, and False, respectively). E-Prime® 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA, USA) was used both to present the questionnaire and to compute the total score automatically.
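The text does not say how the total score was computed. A common convention, assumed here, is the binary scoring of Baron-Cohen et al. (2001b), which gives totals from 0 to 50. Each item adds one point when the response falls on the autistic-like side, whether mildly or strongly. The sketch below takes the set of agree-keyed items as a parameter rather than reproducing the published key, and `score_aq` is a hypothetical name.

```python
# Response codes as labeled on the keyboard in the text:
# 1 = True, 2 = Almost True, 3 = Almost False, 4 = False.
AGREE_CODES = {1, 2}
DISAGREE_CODES = {3, 4}


def score_aq(responses, agree_keyed_items):
    """Binary AQ total (0-50), assuming the standard one-point-per-item scoring.

    responses: dict mapping item number (1-50) to a response code 1-4.
    agree_keyed_items: items scoring 1 point for True/Almost True; all other
        items score 1 point for Almost False/False. The actual key must be
        taken from Baron-Cohen et al. (2001b); it is not reproduced here.
    """
    if set(responses) != set(range(1, 51)):
        raise ValueError("expected responses to all 50 items")
    score = 0
    for item, code in responses.items():
        if code not in AGREE_CODES | DISAGREE_CODES:
            raise ValueError(f"invalid response code {code} for item {item}")
        scored_codes = AGREE_CODES if item in agree_keyed_items else DISAGREE_CODES
        score += code in scored_codes  # True counts as 1, False as 0
    return score
```

Binary scoring deliberately ignores response strength: "True" and "Almost True" contribute equally.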

The whole experiment lasted approximately 30 min. Participants were free to interrupt the experiment at any moment and to take a brief rest between sessions.

# RESULTS

# Preliminary Data Analysis

Three preliminary analyses were performed.

First, we checked for a possible effect of Repetition on accuracy in the emotion recognition test: a repeated-measures analysis of variance (ANOVA) with Emotion, Stimulus, Stimulus Number, and Repetition as within-subjects factors showed no significant effect of Repetition [*F*(2,38) = 1.549, *p* = 0.226].

Second, an independent sample *t*-test on the number of correct responses for the action recognition test was carried out. It showed no difference in accuracy between LAT and HAT groups [*t*(43) = −0.542, *p* = 0.590], indicating that all participants could correctly perceive the action performed in the video and conveyed by the PLDs *per se*.

Third, AQ scores were compared between the two experimental groups with an independent-samples *t*-test, to make sure that the groups differed in terms of autistic traits. The test revealed a significant difference [*t*(29.06) = −5.214, *p <* 0.001] between LAT participants (mean = 16.08, *SD* = 5.09) and HAT participants (mean = 27.55, *SD* = 8.721), confirming that the two groups indeed differed.
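The fractional degrees of freedom, *t*(29.06), suggest that Welch's correction for unequal variances was applied. This is our inference, not a statement from the text. As a check, the Welch statistic and its Welch–Satterthwaite degrees of freedom can be recomputed from the reported group means, SDs, and sample sizes with a short standard-library sketch.

```python
import math


def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# AQ summary statistics reported above: LAT (n = 25) vs. HAT (n = 20).
t, df = welch_t_from_stats(16.08, 5.09, 25, 27.55, 8.721, 20)
print(f"t({df:.2f}) = {t:.3f}")  # prints: t(29.06) = -5.214
```

With the rounded means and SDs given in the text, the recomputed values match the reported statistic. A *p*-value would additionally require the Student *t* distribution, for example from SciPy.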

For the main analyses, sessions (i) "static faces test" and (ii) "bodily PLD kinematics test" are hereinafter jointly referred to as the "emotion recognition test". To compare LAT and HAT recognition performance in the emotion recognition test, accuracy (i.e., the proportion of correct responses) and response times were analyzed with two Mixed Models analyses. The first four trials of each participant were treated as practice trials and discarded from the analysis. Degrees of freedom in the Mixed Models analyses were estimated with the Satterthwaite approximation. In the next section, the results for accuracy and response times in the static faces and bodily PLD kinematics tests are discussed separately.
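The preprocessing step described above can be sketched as follows. The sketch drops each participant's first four trials, then expresses accuracy as a proportion correct per cell. The record layout and field names are hypothetical. The sketch also assumes that `trial_index` counts trials across the participant's whole emotion recognition test. The mixed models themselves are not reproduced here.

```python
from collections import defaultdict

N_PRACTICE = 4  # first four trials per participant treated as practice


def accuracy_by_cell(trials):
    """Proportion correct per (participant, emotion, stimulus_type) cell.

    trials: iterable of dicts with keys 'participant', 'trial_index'
    (0-based order within the emotion recognition test), 'emotion',
    'stimulus_type' and 'correct' (bool).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for tr in trials:
        if tr["trial_index"] < N_PRACTICE:
            continue  # discard practice trials
        cell = (tr["participant"], tr["emotion"], tr["stimulus_type"])
        counts[cell] += 1
        hits[cell] += tr["correct"]  # bool adds as 0 or 1
    return {cell: hits[cell] / counts[cell] for cell in counts}


demo = [
    {"participant": "p1", "trial_index": i, "emotion": "fear",
     "stimulus_type": "faces", "correct": i in (4, 5, 7)}
    for i in range(8)
]
print(accuracy_by_cell(demo))
```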

### Accuracy

For both LAT and HAT participants overall classification accuracy averaged across stimulus type was high [89.5% (*SD* = 22.52) for LAT vs. 84.6% (*SD* = 25.42) for HAT].

A Mixed Models analysis with Emotion, Stimulus Type, and Stimulus Version as within-subjects independent variables and Group as a between-subjects independent variable showed a significant main effect of Emotion [*F*(3,129) = 15.309, *p <* 0.001]. The interactions Emotion × Stimulus Type [*F*(3,151.913) = 2.921, *p <* 0.05] and Emotion × Stimulus Type × Group [*F*(3,151.913) = 4.198, *p <* 0.01] were also significant. No other main effects or interactions were significant.

**Table 1** reports the estimated variance component of each random factor. Estimated variance components larger than zero indicate that each random factor captures part of the variance in the data, so the model accounts for the data dependency due to the repeated-measures design (Gallucci and Leone, 2012).

**Figure 1** shows the main effect of Emotion. *Post-hoc* tests (Sidak correction) revealed that accuracy for trials conveying happiness (mean = 0.943) and anger (mean = 0.923) was higher than for those conveying fear (mean = 0.807) and sadness (mean = 0.810; all *p*s *<* 0.001). Happiness and anger did not differ from each other (*p* = 0.974), nor did fear and sadness (*p >* 0.999).
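For reference, the Sidak correction keeps the familywise error rate at α across *m* comparisons. It does so by testing each comparison at 1 − (1 − α)^(1/*m*), or equivalently by adjusting each *p*-value to 1 − (1 − *p*)^*m*. The sketch below illustrates the formula with *m* = 6, the number of pairwise comparisons among four emotions. The text does not specify the exact adjustment applied by the statistical software used in the study.

```python
def sidak_adjust(p, m):
    """Sidak-adjusted p-value for one of m comparisons: 1 - (1 - p)**m, capped at 1."""
    if not 0.0 <= p <= 1.0 or m < 1:
        raise ValueError("p must be in [0, 1] and m >= 1")
    return min(1.0, 1.0 - (1.0 - p) ** m)


def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the familywise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Four emotions give 4 * 3 / 2 = 6 pairwise post-hoc comparisons.
m = 6
print(round(sidak_alpha(0.05, m), 5))   # 0.00851
print(round(sidak_adjust(0.01, m), 4))  # 0.0585
```

The two functions express the same rule: an adjusted *p* below α corresponds to an unadjusted *p* below the per-comparison threshold.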

As stated above, the main effect of Emotion was modulated by a significant interaction with Stimulus Type, which was itself modulated by a significant 3-way interaction Emotion × Stimulus Type × Group. To follow up this significant 3-way interaction, three different Simple Effect Analyses were performed on the interaction Emotion × Stimulus Type × Group.

A first Simple Effect Analysis compared differences among single emotions conveyed by different types of stimuli for the two groups of participants (see **Figure 2**).

With static face stimuli, LAT participants were the least accurate for fear (mean = 0.78), which differed significantly from happiness (mean = 0.973, *p <* 0.001) and anger (mean = 0.94, *p* = 0.001). By contrast, the HAT group did not show any significant difference between emotions.

With PLDs, LAT participants were the least accurate for sadness (mean = 0.753), which differed significantly from fear (mean = 0.88, *p* = 0.038), happiness (mean = 0.98, *p <* 0.001), and anger (mean = 0.96, *p <* 0.001). HAT participants were less accurate for fear (mean = 0.725) than for happiness (mean = 0.867, *p* = 0.038) and anger (mean = 0.917, *p* = 0.001). They were also less accurate for sadness (mean = 0.767) than for anger (*p* = 0.023).

A second Simple Effect Analysis compared the two groups' accuracy for different emotions as a function of stimulus type. It showed that with PLDs, HAT participants recognized both fear (mean = 0.725) and happiness (mean = 0.867) less accurately (*p* = 0.004 for fear and *p* = 0.033 for happiness) than LAT participants (mean = 0.88 for fear and mean = 0.98 for happiness).

Finally, a third Simple Effect Analysis compared the two different kinds of stimuli as a function of different emotions in the two groups of participants (**Figure 3**).

It showed that LAT participants recognized fear better through PLDs (mean = 0.88) than through faces (mean = 0.78, *p* = 0.025), while they recognized sadness better through faces (mean = 0.893) than through PLDs (mean = 0.753, *p* = 0.002). This dissociation can be appreciated in **Figure 4**.

By contrast, HAT participants recognized fear better through faces (mean = 0.842) than through PLDs (mean = 0.725, *p* = 0.019).

Thus, as hypothesized, LAT participants recognized fear more accurately when it was conveyed by PLDs, whereas the HAT group showed the opposite pattern, recognizing fear more accurately when it was conveyed by static face images.

### Response Times

Data for static faces were analysed and considered separately from data for PLDs, given that each PLD lasted 3 s whereas each static face was presented for 1 s. Furthermore, participants were more familiar with pictures of static emotional faces than with PLDs. As a consequence, each PLD, at least at its first repetition, was watched for its entire duration (i.e., 3 s), whereas participants often responded to static faces before the end of the stimulus duration.

A Mixed Models analysis with Emotion, Stimulus Type, Stimulus Version, and Repetition as within-subjects independent variables and Group as a between-subjects independent variable showed main effects of both Stimulus Type [*F*(1,44.199) = 256.941, *p <* 0.001] and Repetition [*F*(2,169.615) = 16.53, *p <* 0.001], as well as a more interesting main effect of Emotion [*F*(3,144.064) = 20.294, *p <* 0.001]. Three interactions were significant: Stimulus Type × Repetition [*F*(2,169.718) = 7.787, *p* = 0.001], Repetition × Group [*F*(2,169.615) = 3.237, *p* = 0.042], and the 3-way interaction Emotion × Stimulus Type × Group [*F*(3,137.029) = 2.749, *p* = 0.045]. No other interaction was significant.

**Table 2** reports the estimated variance component of each random factor. Estimated variance components larger than zero indicate that each random factor captures part of the variance in the data, so the model accounts for the data dependency due to the repeated-measures design (Gallucci and Leone, 2012).

The main effect of Stimulus Type arose because, as explained above, RTs for pictures of static faces were consistently faster than RTs for PLDs. The main effect of Repetition was modulated by significant interactions with both Stimulus Type and Group. A Simple Effect Analysis on the first interaction (i.e., Repetition × Stimulus Type) showed that RTs for PLDs at the first presentation (mean = 2804.077 ms) were significantly longer than at the second (mean = 2477.570 ms, *p <* 0.001) and third repetitions (mean = 2402.502 ms, *p <* 0.001), which did not significantly differ from each other. By contrast, RTs for faces did not significantly differ among the three repetitions (all *p*s *>* 0.5). A Simple Effect Analysis on the second interaction (i.e., Repetition × Group) showed that LAT participants presented a linear trend in RTs across repetitions, whereas HAT participants did not. For LAT participants, the first presentation (mean = 2072.365 ms) showed higher RTs than both the second (mean = 1935.685 ms, *p* = 0.059) and the third repetition (mean = 1794.718 ms, *p <* 0.001), which also differed significantly from each other (*p* = 0.014). For HAT participants, only the first presentation (mean = 2088.473 ms) showed higher RTs than the second (mean = 1840.949 ms, *p <* 0.001) and the third (mean = 1890.212 ms, *p* = 0.002), which did not differ from each other (*p >* 0.999).

*Post hoc* tests (Sidak correction) on the main effect of Emotion showed that LAT participants' RTs for trials conveying happiness (mean = 1676.756 ms) were lower than for trials conveying all the other emotions: fear (mean = 2106.651 ms, *p <* 0.001), sadness (mean = 2088.837 ms, *p <* 0.001), and anger (mean = 1864.781 ms, *p* = 0.06). Moreover, anger presented significantly lower RTs than fear (*p* = 0.003) and sadness (*p* = 0.014). For the HAT group, only happiness (mean = 1724.201 ms) showed lower RTs than all the other emotions (all *p*s *<* 0.05): fear (mean = 2084.563 ms), sadness (mean = 1987.800 ms), and anger (mean = 1962.947 ms).

The main effect of Emotion was modulated by the 3-way interaction Emotion × Stimulus Type × Group, on which a Simple Effect Analysis was conducted. For faces (**Figure 5**, left), both LAT and HAT participants showed lower RTs for happiness (mean = 1090.173 ms and 1016.228 ms for the LAT and HAT groups, respectively) than for fear (LAT mean = 1572.327 ms, *p <* 0.001; HAT mean = 1468.859 ms, *p <* 0.001) and sadness (LAT mean = 1390.105 ms, *p* = 0.01; HAT mean = 1423.866 ms, *p* = 0.001). RTs for happiness were also lower than for anger, but only for HAT participants (mean = 1296.003 ms, *p* = 0.042). For LAT participants, RTs for angry faces were significantly lower than RTs for fearful ones (*p* = 0.007).

For PLDs (**Figure 5**, right), LAT participants showed significantly lower RTs for PLDs conveying happiness (mean = 2263.338 ms) than for those conveying fear (mean = 2640.976 ms, *p* = 0.001) and sadness (mean = 2787.568 ms, *p <* 0.001). RTs for angry PLDs were also lower than those for sad PLDs (*p* = 0.018). HAT participants did not show a significant RT advantage for any emotion with PLDs.

Regarding this last comparison, it should be noted that, in principle, the differences between RTs for different emotions with PLDs might reflect differences in the actor's performance rather than in emotion recognition. In other words, PLDs conveying happiness might have been detected faster not because happiness is the easiest emotion to detect, but because the actor was more effective in performing that specific emotion than all the others. The same caveat applies to static pictures. Nevertheless, given that studies using PLDs as stimuli for emotion recognition are not as common as studies using pictures of static faces, any comparison across different emotions with PLDs should be taken with caution, and no strong conclusion should be drawn from it.

# DISCUSSION


We investigated whether the type of stimulus (i.e., pictures of static faces vs. body motion) contributes differently to the recognition of four different emotions (i.e., Happiness, Anger, Fear, and Sadness). To this end, we performed an exploratory study comparing LAT and HAT individuals to test whether the two groups based their recognition on different cues (static facial cues vs. bodily kinematics). Specifically, in LAT individuals we were interested in seeing (i) whether body language was at least as effective as static faces in the recognition of fear; (ii) whether sadness was better recognized through facial expressions; and (iii) whether an advantage for happiness was also present when happiness was conveyed through PLDs. Moreover, we expected (iv) HAT individuals to be as good as LAT ones in recognizing happiness from both facial expressions and PLDs; and (v) HAT individuals to rely on different cues than LAT ones for the recognition of fear and anger.

Interestingly, the action recognition test showed no difference between the LAT and HAT groups, indicating that HAT participants could correctly perceive the motion conveyed by the PLDs *per se*. This result is consistent with the findings of Moore et al. (1997) and Hubert et al. (2009), who reported that participants with autism were perfectly capable of integrating the individual points of the PLD into a whole. It is also consistent with several other studies showing that global processing of hierarchical stimuli (i.e., the integration of local elements into a coherent whole) is not specifically impaired in people with autism (e.g., Mottron et al., 2003; see Dakin and Frith, 2005, for a review). However, several studies have also reported ASC impairments in identifying biological motion from PLDs (e.g., Blake et al., 2003; Atkinson, 2009; Annaz et al., 2010; Nackaerts et al., 2012), as well as a more general ASC deficit in coherent motion processing (Spencer et al., 2000; Milne et al., 2002). The latter deficit is typically considered a good example of atypical global perceptual processing in individuals with ASC, given also its correlation with other markers of atypical global perception (Pellicano et al., 2005). The fact that our HAT participants could correctly perceive biological motion from PLDs is thus at odds with the above studies. This ability could have two possible causes. First, all our participants were high functioning (see Blake et al., 2003). Second, the displays in both the action recognition test and the emotion recognition test lasted 3 s. Robertson et al. (2014) found that, when presented with PLDs, individuals with ASC performed comparably to control participants only if the PLD duration was "long" (1.5 s), whereas they were impaired at a shorter duration (0.2 s). It is thus possible that HAT participants showed behavioral results similar to those of LAT participants only because the PLDs used in our study were "long enough", and that a shorter viewing duration would worsen their performance. A further possibility is a combination of these factors. Our participants were high functioning, without a severe deficit in global perceptual processing (or had been rehabilitated), and the PLDs were long enough for the local elements to be efficiently integrated into a global configuration.

The same factors that could explain HAT individuals' ability to recognize biological motion may also account for their high accuracy in perceiving emotions from both faces and PLDs, where they were overall as accurate as LAT participants. The fact that our HAT participants could compensate for their possible deficits in emotion recognition makes it all the more striking that they relied on different cues from those used by the LAT group.

Overall, our results confirm that emotion recognition is not globally compromised in HAT participants – at least for our group of participants and with the type of stimuli used in the present study – since some impairment was found only for specific emotions.

However, unlike LAT participants, HAT participants did not show a significant advantage for any emotion. Happiness, in fact, proved to be the easiest emotion to recognize for LAT participants but not for HAT participants. In line with previous studies (Leppänen and Hietanen, 2003; Shimamura et al., 2006), for LAT participants we found a *happy face advantage* (although accuracy did not differ between happiness and anger), which, for the first time, was also found for bodies. We propose to call this latter effect the *happy body advantage*, to underline its similarity with the analogous happy face advantage (i.e., better accuracy and faster response times). One of our initial hypotheses was that an advantage would be found for the recognition of happiness in PLDs. Our reasoning was mainly based on the peculiarity of the kinematics associated with happiness (faster and smoother than the kinematics associated with all the other emotions), and was in line with previous studies (Atkinson et al., 2004, 2012). The results for LAT participants thus confirm our reasoning. By contrast, the results for HAT participants partially contradict our initial hypothesis that HAT participants would be as good as LAT ones in recognizing happiness from both facial expressions and PLDs. In fact, HAT participants did not show the same advantage as LAT ones for the recognition of happiness. Interestingly, Uljarevic and Hamilton (2013) found only a negligible impairment in the recognition of happiness in autism. This very mild impairment could thus be the reason why no advantage was shown for this particular emotion in our HAT sample.

However, in LAT participants, neither the happy face advantage nor the happy body advantage extended to accuracy relative to anger: for both stimulus types, happiness and anger were recognized equally accurately. Thus, even if both happy faces and happy PLDs were recognized faster than all the other emotions (and more accurately than fear and sadness), they were not recognized more accurately than angry faces and PLDs. We think that, at least for faces, this could be due to the so-called anger superiority effect (Hansen and Hansen, 1988), whereby angry faces are easier to detect than happy faces. This effect is usually observed in visual search paradigms (in which angry faces pop out from a crowd of neutral ones), which are typical attentional tasks. We speculate that, even if the anger superiority effect does not emerge in a perceptual task such as the one used in our study, angry stimuli are nonetheless more perceptually (and behaviourally) salient than those conveying fear and sadness. For this reason, they do not differ from the happy ones, which are easier to detect (because of the happy face advantage). Although an anger superiority effect has never been reported for bodies, the same reasoning holds for PLDs. Greater visual sensitivity has been reported for angry walkers than for walkers expressing the other emotions tested (Chouchourelou et al., 2006). Moreover, the detection of gait in PLDs is more strongly linked to anger than to happiness (Ikeda and Watanabe, 2009). It is thus possible that, for PLDs as well as for faces, anger is at least as perceptually salient as happiness, which would explain why no difference in accuracy between happy and angry PLDs was found.

For the remaining two emotions tested in this study, fear and sadness, an interesting result emerged: certain emotions are expressed better through dynamic information than through static information, and vice versa. In particular, LAT participants relied more on static faces to recognize sadness, but on PLDs to recognize fear. This is in line with our initial hypotheses. First, we hypothesized that facial expressions would play a major role in the recognition of sadness. The body language associated with sadness (e.g., slow gait, bowed and reclined head) could appear neutral (i.e., non-emotional) to some individuals, and sadness is often associated with behaviors such as crying or moaning, which are better expressed with the face. Second, we hypothesized that bodily kinematics would be at least as important as static faces for the recognition of fear. The emotion recognition system would use bodily kinematics to disambiguate between fear and surprise, which can be easily confused (see Smith and Schyns, 2009). This is also consistent with the idea that fear is usually associated with behaviors such as shivering, which are more easily detectable in body language than in emotional faces.

The advantage of motion kinematics for the recognition of fear was not present in HAT participants. In fact, as far as fear processing and recognition are concerned, our results show that HAT participants tend to use strategies based on processing facial details, which differ from those used by control participants.

Several explanations of this result are possible. One is based on the fact that adults with HAT are usually trained to recognize different emotions through faces. For this reason, they could learn to compensate for a general deficit in emotion recognition. In doing so, however, they would learn to rely more on static facial details than on bodily kinematics, for which they receive no specific training. This would explain why our HAT individuals do not rely on kinematic cues to recognize fear, whereas LAT individuals do.

Another possible explanation again refers to a lack of confidence in bodily kinematic cues in HAT individuals, but refers neither to a deficit in emotion recognition nor to a possible compensation for it. This second explanation is based on the fact that empathy deficits in autism are a function of interoceptive deficits related to alexithymia (Silani et al., 2008). Alexithymia, in turn, has been found to correlate with confidence in emotion perception from point-light displays (Lorey et al., 2012). Lorey et al. (2012) examined how the ability to perceive one's own emotions, assessed with the Toronto Alexithymia Scale, is related both to the ability to perceive emotions depicted in PLDs and to the confidence in these perceptions. People with higher alexithymia scores were significantly less confident about their decisions, but did not differ from people with lower alexithymia scores in the valence of their ratings. Recent fMRI studies (e.g., Silani et al., 2008; Bird et al., 2010) have shown that the particular difficulties in emotional awareness in individuals with HAT are not related to their impairments in self-reflection/mentalizing, but are instead a function of interoceptive deficits related to alexithymia. Bird et al. (2010) suggest that the empathy deficits observed in autism may be due to the large comorbidity between alexithymic traits and autism, rather than representing a necessary feature of the social impairments in autism. Our HAT participants may well have presented a high interoceptive deficit. If so, this would explain why they did not rely on kinematic cues, being less confident than LAT participants in their judgments of PLDs (Lorey et al., 2012). This speculation requires further research, but it does not exclude the other suggested possibility of a lack of confidence in judgments based on bodily kinematics.
In both cases, in fact, it is possible that HAT individuals simply rely more on static cues because, if anything, they may have been trained with emotional faces and not with emotional bodies.

A last possible explanation of why individuals with HAT do not use body cues to recognize fear as LAT individuals do is a possible impairment in global motion perception. As already suggested in this section, such an impairment, even if present, would not emerge in this study (because of the long stimulus duration of the PLDs), yet it could explain why HAT participants do not rely on bodily kinematics to recognize fear.

We think that a possible way to study biological motion perception in ASC without having to deal with long durations and motion coherence (which is of course involved in PLDs, not to mention the fact that PLDs lasting less than 1 s are difficult to perceive as emotional) would be to study the biological motion of a single point of light. Our suggestion is based on the idea that our ability to recognize biological motion is not strictly tied to the dynamic template of the classical PLD, but rather to the kinematic structure of the movement of *each* single point (Runeson and Frykholm, 1981). In particular, our perceptual system is very well attuned to a peculiarity of human movement, namely a specific relation between velocity and curvature known as the two-thirds power law (Lacquaniti et al., 1983). Sensitivity to the biological motion of a single point of light has been investigated in adults (e.g., Viviani and Stucchi, 1989; de'Sperati and Viviani, 1997; Actis-Grosso et al., 2001; Carlini et al., 2012) as well as in 4-day-old human neonates (using a standard preferential-looking paradigm; Méary et al., 2007), and these studies indicate that human motion perception is attuned to biological kinematics. However, the biological motion of a single point of light has not yet been studied in ASC. Classical PLDs, by contrast, have been: for example, findings from a preferential-looking paradigm in 2-year-old toddlers indicate that only TD children showed a clear looking preference for biological PLDs, whereas toddlers diagnosed with autism did not (Klin et al., 2009). We think that a similar study with the biological motion of a single point of light could rule out any involvement of motion coherence and duration, thus helping to resolve the inconsistent results on biological motion perception in ASC reported by different authors. It should also be noted that a single point of light could convey emotions, and could therefore be studied in both TD and ASC populations.
Indeed, not only has it been shown that arm movements alone, performing simple actions, convey information about affect (Pollick et al., 2001), but it was recently found that specific motion patterns increase the perceived intensity and arousal associated with emotional faces (Chafi et al., 2012). Following this line of research, and taking into account recent evidence of a link between single-dot kinematics and localization (Actis-Grosso et al., 2008), we think it would be possible to find specific kinematics (e.g., absolute velocity, accelerations, stops) related to specific emotions, so that a single point of light could be perceived as happier or sadder, in analogy with classical studies on animacy (Heider and Simmel, 1944; Michotte, 1954). This would help to clarify the link between the perception of emotion (and, more generally, of agency, as highlighted by studies on the so-called social network; Wheatley et al., 2007) and the perception of motion (Tavares et al., 2011). We therefore think that future research should consider a new experiment focused on the perceived animacy and/or emotions of a single point, in order to study the kinematic features of biological motion with short-duration stimuli.
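The two-thirds power law can be written as A = K·C^(2/3), where A is angular velocity, C is curvature and K is a gain factor; since A = V·C, this is equivalent to a tangential speed V that grows with the radius of curvature R = 1/C as V = K·R^(1/3). As a minimal sketch, assuming the standard textbook example rather than any stimulus from the studies cited above, the Python code below moves a single point along an ellipse with harmonic (sinusoidal) motion, which satisfies the law exactly, and recovers the exponent by regressing log speed on log radius of curvature. The function names and ellipse parameters are illustrative choices.

```python
import math

def ellipse_kinematics(a, b, omega, n=200):
    """Speed V and radius of curvature R of a point moving on the ellipse
    x = a*cos(omega*t), y = b*sin(omega*t), sampled over one period."""
    speeds, radii = [], []
    period = 2 * math.pi / omega
    for i in range(n):
        t = period * i / n
        # first and second time derivatives of the position
        dx, dy = -a * omega * math.sin(omega * t), b * omega * math.cos(omega * t)
        ddx = -a * omega**2 * math.cos(omega * t)
        ddy = -b * omega**2 * math.sin(omega * t)
        v = math.hypot(dx, dy)
        curvature = abs(dx * ddy - dy * ddx) / v**3
        speeds.append(v)
        radii.append(1.0 / curvature)
    return speeds, radii

def loglog_fit(xs, ys):
    """Least-squares fit of log(y) = log(K) + beta*log(x); returns (K, beta)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    beta = (sum((u - mx) * (w - my) for u, w in zip(lx, ly))
            / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - beta * mx), beta

speeds, radii = ellipse_kinematics(a=2.0, b=1.0, omega=1.0)
K, beta = loglog_fit(radii, speeds)
print(f"beta = {beta:.4f}")  # beta = 0.3333, i.e. V = K * R**(1/3)
```

A dot following the same elliptical path at constant speed would instead give a slope of 0, which is one simple way to construct a single point that violates the law.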

In our view, the finding that HAT participants exhibited a different recognition pattern for fear, and were generally more inclined to use strategies based on processing static face details, could also account for the difficulty in recognizing emotions from static faces often found in the autistic population, in which recognition of fear has also been found to be worse than in TD individuals (Uljarevic and Hamilton, 2013). What we suggest is that the recognition of emotions is based on kinematics even when static faces have to be judged. Indeed, it has recently been suggested (Actis-Grosso and Zavagno, 2015) that pictures of emotional faces may convey implied motion, namely the fact that a still photograph of an object in motion may convey dynamic information about the position of the object immediately before and after the photograph was taken (Freyd, 1983; Kourtzi and Kanwisher, 2000). Focusing on the facial expression of emotions, Actis-Grosso and Zavagno (2015) hypothesized that all emotions could be classified in terms of an inherent dynamism, which might leave a visible trace within the facial expression of an emotion (Freedberg and Gallese, 2007), and that some facial emotions are more visually dynamic than others. They asked a group of participants to rate both the emotional content and the dynamicity of emotional faces taken from static artworks, and found that some facial emotions (i.e., disgust, anger, and fear) were positively related to the dynamicity attributed to the artworks. This provides first evidence that even static emotional faces can be, in some sense, dynamic, allowing the observer to extract dynamic information from their static representations. If this result generalizes to other emotions and to photographs such as those used in this study, we think it would be possible to find a specific impairment in ASC in recognizing "dynamic" emotions in static pictures.

# CONCLUSION

This study highlights for the first time that certain emotions are expressed and perceived better through dynamic information, whereas others are better recognized through static information, and that LAT individuals and individuals with HAT base their emotion recognition on different cues. We thus think that future research, rather than searching for a universal and primary emotion recognition impairment in autism, should take into account that different emotions are better recognized through different stimulus types, which are processed differently by LAT individuals and individuals with HAT. We also think that the present study, besides shedding some light on the link between the perception of motion and the perception of emotion in HAT individuals, suggests future directions both for scientific research, which should study in more detail the kinematics associated with single emotions and the way in which individuals with ASC rely on them to recognize emotions, and for clinical training, which should focus more on body movement.

# ACKNOWLEDGMENTS

The authors are indebted to the volunteers who donated their time to participate in the study, and to the staff of the community center "Spazio Nautilus Onlus" in Milan, where the participants with HAT were tested, for their collaboration. The authors are also very grateful to A.P. Atkinson, who very kindly provided them with the body PLD stimuli, and to R. Dotsch, G. Bijlstra, and O. Langner, who very kindly provided them with the Radboud Faces Database. The authors thank Irene Tesoro for helping in data collection, and Marcello Gallucci for his advice on statistical analyses. This research was funded by a grant from the University of Milano-Bicocca to RA-G and PR and a scholarship from the same University to FB.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Actis-Grosso, Bossi and Ricciardelli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Something in the way people move: the benefit of facial movements in face identification

*Andrea Albonico<sup>1,2</sup>\*, Manuela Malaspina<sup>1,2</sup> and Roberta Daini<sup>1,2</sup>\**

*<sup>1</sup> Department of Psychology, Università degli studi di Milano-Bicocca, Milan, Italy, <sup>2</sup> NeuroMI – Milan Center for Neuroscience, Milano, Italy*

While the dissociation between invariant aspects of a face and emotional expressions has been studied extensively, the role of non-emotional changeable aspects in face recognition has rarely been considered in the literature. The purpose of the present study was to understand whether information on changeable aspects (with and without emotional content) can help individuals with poor face recognition abilities (when recognition is based on invariant features) to recognize famous faces. From a population of 80 university students, we selected two groups of participants, one with poor performance (experimental group, EG) and the other with good performance (control group, CG). On the basis of a preliminary experiment, we selected videos of 16 Italian celebrities, which were presented in three different conditions: motionless, with non-emotional expressions, and with emotional expressions. While the CG's performance did not differ across the three conditions, the EG performed significantly better in the two conditions with facial movements, which did not differ from each other. These results suggest a role of changeable aspects in the identification of famous faces, which emerges only when invariant features are not analyzed properly.

#### *Edited by:*

*Andrew Bayliss, University of East Anglia, UK*

#### *Reviewed by:*

*Juan Lupiáñez, University of Granada, Spain*

*Peter Lewinski, University of Amsterdam, Netherlands*

#### *\*Correspondence:*

*Andrea Albonico and Roberta Daini, Department of Psychology, Università degli studi di Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milan, Italy; a.albonico@campus.unimib.it; roberta.daini@unimib.it*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 30 March 2015; Accepted: 30 July 2015; Published: 11 August 2015*

#### *Citation:*

*Albonico A, Malaspina M and Daini R (2015) Something in the way people move: the benefit of facial movements in face identification. Front. Psychol. 6:1211. doi: 10.3389/fpsyg.2015.01211*

Keywords: face perception, famous face identification, emotional expressions, typical facial expressions

# Introduction

Many fields of cognitive neuroscience have provided evidence that faces are *special* visual stimuli (e.g., Farah et al., 1998) and that we have specific brain regions for processing them (Michel et al., 1989; Sergent et al., 1992; Kanwisher et al., 1997; Allison et al., 1999). Whether this is a matter of *nature* or *nurture* is still debated, but recent studies based on the performance of participants' relatives have shown that genetic factors may contribute both to the recognition abilities of healthy people and to the face recognition difficulties of poor recognizers (e.g., Schmalzl et al., 2008; Wilmer et al., 2010). For instance, Wilmer et al. (2010) found a correlation of 0.7 between monozygotic twins on face recognition tasks, more than twice that between dizygotic twins (0.29), suggesting that this specialization is largely genetically determined.

The numerous demonstrations of specificity in face recognition have led to the development of cognitive models (Bruce and Young, 1986; Haxby et al., 2000; Gobbini and Haxby, 2007) that largely agree on the existence of a dissociation between the processing of invariant features, which permits recognition of a face's identity, and the processing of changeable aspects of a face, which are a major source of information about social context. The anatomo-functional model of Haxby et al. (2000), for example, includes a module that processes the changeable aspects of a face relevant to social interaction, such as eye gaze direction in relation to the orientation of attention, the lip movements associated with oral communication (pre-lexical speech perception), and the facial expressions that convey emotional tone.

According to Haxby et al. (2000), facial expressions and facial identity are processed by two separate routes (involving the superior temporal sulcus, STS, and the fusiform face area, FFA, respectively) after an initial common encoding stage, and are therefore independent of each other. However, some studies have shown that judgments of expression can be modulated by the identity and familiarity of a face (Schweinberger and Soukup, 1998) and vice versa (Sansone and Tiberghien, 1994; Baudouin et al., 2000; Lander and Metcalfe, 2007), suggesting an interaction between the processing of emotion and identity in some circumstances (Gobbini and Haxby, 2007).

This dissociation has been studied mainly with static faces, with or without emotional expressions. Our everyday experience of faces, however, mostly involves moving faces, and the way people move their faces can be very "typical." There are, indeed, non-emotional facial expressions that may be prototypical of a particular individual, as most people have some characteristic facial expressions. Unlike emotional expressions, which are largely shared and universally recognizable (Ekman and Friesen, 1976; Ekman et al., 1987; Izard, 1994; Elfenbein and Ambady, 2002; but see also Matsumoto, 1992, and Russell, 1994, for the influence of culture and language on the recognition of emotion from facial expressions), these expressions (dynamic facial signatures) are idiosyncratic and often lack a clear underlying emotional tone (O'Toole et al., 2002).

O'Toole et al. (2002) proposed a model of face recognition in which the processing of the changeable aspects of a face is twofold: on the one hand, changeable aspects are processed, like many other motion stimuli, by the dorsal stream (i.e., the pathway in the human visual system that processes spatial and motion information; Ungerleider and Haxby, 1994) and convey information about social content; on the other hand, a structure-from-motion representation is derived, which allows the identification of dynamic facial signatures and the recognition of familiar faces.

According to O'Toole et al. (2002), we can learn the idiosyncratic movements of a specific face and use them to improve recognition: it has been shown that it is possible to learn to discriminate "artificial" individuals based solely on facial motion information (Hill and Johnston, 2001; Knappmeyer et al., 2001), and that faces can be recognized more easily in the presence of facial movements under non-optimal viewing conditions (Knight and Johnston, 1997; Lander et al., 1999, 2001; Lander and Bruce, 2000). These two sets of data reflect the two ways in which motion can aid face recognition. On the one hand, the idiosyncratic movements of a specific face could be a distinct, independent cue to recognition, represented separately from the structural information of the face (supplemental information hypothesis, SIH); on the other hand, facial motion could facilitate perception of the structure of the face, resulting in improved recognition (representation enhancement hypothesis, REH). These two explanations of the motion advantage in face recognition are not mutually exclusive, and various factors, such as face familiarity and task demands, may influence the relative importance of the two types of cue (O'Toole et al., 2002).

Several studies have demonstrated the motion advantage (with both rigid and non-rigid facial movements) in different types of tasks involving familiar and unfamiliar face recognition, both in normal individuals (e.g., Lander et al., 1999; Knappmeyer et al., 2003; Butcher et al., 2011) and in congenital prosopagnosics (Steede et al., 2007; Longmore and Tree, 2013; Daini et al., 2014), but only a few studies have drawn a further distinction among non-rigid movements: facial expressions can be either expressions with emotional content or expressions without any affective content. Since both kinds of expressions convey dynamic facial information, they may be processed in a similar way. However, a recent study by Gobbini and Haxby (2006) has shown that areas outside the core system for face recognition (i.e., FFA) could be involved in processing dynamic expressions, and the involvement of these additional areas could account for the difference usually found between emotional and non-emotional facial expressions, as well as between changeable and invariant features.

It is still an open question whether non-emotional facial expressions can contribute to face recognition.

The first evidence suggesting that emotional expressions are processed specifically and independently of non-emotional expressions in our neural system came from two studies showing that these two categories of expressions can have different effects on the recognition of static images of unfamiliar faces, both in healthy individuals (Comparetti et al., 2011) and in congenital prosopagnosics (Daini et al., 2014). However, to our knowledge, whether this difference between emotional and non-emotional expressions can also be observed in the recognition of familiar (i.e., famous) faces has not yet been investigated.

In order to explore the role of emotional and non-emotional expressions in face recognition, we selected two groups of participants differing in their ability to recognize faces [a poor recognition or experimental group (EG) and a good recognition or control group (CG)], and asked them to identify celebrities in three different presentation conditions: one static and two dynamic (emotional and non-emotional). We used videos instead of static pictures because the dynamic component of facial movement is more evident and realistic in this kind of stimulus.

We sought to investigate whether people with poor face recognition skills can benefit from the information provided by the changeable, dynamic components of a face when identifying other people. Moreover, we asked whether any such benefit would be strictly related to emotional expressions, or would instead be due to facial movements independently of emotional content, which would confirm that our visual system analyses and stores non-emotional expressions independently of invariant features and of the content of emotional expressions.

# Materials and Methods

#### Preliminary Experiment for Stimuli Selection

The stimuli consisted of 3-s videos of 22 Italian celebrities from television, sports, politics, and science, balanced by gender (11 males and 11 females). We created three sets of stimuli, corresponding to three conditions: one in which the face had a neutral expression (Set 1, no expression), a second in which the face assumed an expression of joy (Set 2, emotional expression), and a third in which the face assumed an expression without any affective connotation (Set 3, non-emotional expression). The 3-s clips were cut from longer videos available on the web using VirtualDub 1.9.8. The extracted frames were standardized in size and brightness with a photo editing program (Adobe Photoshop CS4) and then reassembled into a new video (size: 640 × 480 pixels, 22.1° × 34.7°) using Windows Movie Maker. As a result, each celebrity appeared in three videos, one for each experimental condition, and each set (Set 1, no expression; Set 2, emotional expression; Set 3, non-emotional expression) comprised 22 videos.

Before the main experiment, a separate group of 30 students at the University of Milan-Bicocca (15 males and 15 females, mean age = 22.97 years, SD = 2.19) rated these videos in order to select the stimuli. Each of these participants gave informed consent, in accordance with the ethical guidelines of the University of Milan-Bicocca ethics committee. All of them had normal or corrected-to-normal vision and no evidence of neurological or neurophysiological alterations.

The 66 videos were rated to confirm the absence of expression in the Set 1 stimuli, the correct recognition of the emotion represented in the Set 2 stimuli, and the absence of an affective connotation in the Set 3 stimuli.

Each independent, naïve judge was shown 22 videos randomly selected from the full set of 66, presented with E-Prime 2.0 on a PC screen (1280 × 768 pixels, 40.5 × 30.5 cm, 60 Hz refresh rate). No time limit was given.

At the end of each video, participants in this preliminary experiment were asked to rate the presence of a dynamic expression on a five-point semantic differential scale ranging from 1 (absence of expression) to 5 (a very dynamic expression). They were then asked to classify the expression they had just seen (where possible) in terms of emotional content, by answering a multiple-choice question (possible answers: joy, fear, sadness, disgust, anger, surprise, no emotion, other emotion). As a result of this procedure, each video obtained 10 independent ratings. Two scores were then calculated for each video: the mean rating of the dynamic content of the face and the modal expression type conveyed by the face. Different selection criteria were used for each set: (1) for neutral expression videos, a low intensity (< 1.8) and no emotional content; (2) for emotional expression videos, a high intensity (> 3.5) and the correct type of expression (joy); and (3) for non-emotional expression videos, a high intensity (> 3.5) and no emotional content. We considered a video satisfactory when at least seven out of 10 judges had given the expected response. Videos that did not meet these requirements were excluded. Furthermore, if even one video of a celebrity was misjudged, all three videos of that celebrity were discarded, in order to keep the three types of stimuli matched. Six of the 22 celebrities and their videos were thus excluded.
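The selection rules above can be made explicit in a few lines. This is a minimal Python sketch under our own reading of the text: a video passes if its mean dynamism rating meets the threshold for its set and at least 7 of its 10 judges chose the expected category ("joy" for Set 2, "no emotion" for Sets 1 and 3); a celebrity is kept only if all three of their videos pass. The 10 ratings per video follow from 30 judges each viewing 22 of the 66 videos (30 × 22 / 66 = 10). How the intensity and agreement criteria were combined is not spelled out in the text, and the data structure, function names and ratings in the usage example are hypothetical, not the authors' data or code.

```python
from statistics import mean

# Hypothetical encoding of the three sets' criteria (thresholds from the text).
CRITERIA = {
    "neutral":       {"intensity_ok": lambda m: m < 1.8, "expected": "no emotion"},
    "emotional":     {"intensity_ok": lambda m: m > 3.5, "expected": "joy"},
    "non_emotional": {"intensity_ok": lambda m: m > 3.5, "expected": "no emotion"},
}
MIN_AGREEMENT = 7  # at least 7 of the 10 judges must give the expected category

def video_passes(condition, intensities, categories):
    """intensities: 1-5 dynamism ratings; categories: multiple-choice answers."""
    rule = CRITERIA[condition]
    agreement = sum(c == rule["expected"] for c in categories)
    return rule["intensity_ok"](mean(intensities)) and agreement >= MIN_AGREEMENT

def select_celebrities(ratings):
    """ratings: {celebrity: {condition: (intensities, categories)}}.
    A celebrity is kept only if all three of their videos pass."""
    return sorted(
        celeb for celeb, videos in ratings.items()
        if all(video_passes(cond, *videos[cond]) for cond in CRITERIA)
    )

ratings = {  # hypothetical ratings, 10 judges per video
    "celebrity_A": {
        "neutral": ([1, 1, 2, 1, 1, 2, 1, 1, 2, 1], ["no emotion"] * 9 + ["sadness"]),
        "emotional": ([4, 5, 4, 4, 3, 5, 4, 4, 5, 4], ["joy"] * 10),
        "non_emotional": ([4, 4, 3, 4, 5, 4, 4, 3, 4, 4],
                          ["no emotion"] * 8 + ["surprise", "other emotion"]),
    },
    "celebrity_B": {
        "neutral": ([1, 2, 1, 1, 1, 1, 2, 1, 1, 1], ["no emotion"] * 10),
        "emotional": ([4, 4, 5, 4, 4, 4, 3, 4, 5, 4], ["joy"] * 10),
        # only 6 of 10 judges saw no emotion, so all of B's videos are dropped
        "non_emotional": ([4, 4, 4, 5, 4, 3, 4, 4, 4, 4],
                          ["no emotion"] * 6 + ["surprise"] * 4),
    },
}
print(select_celebrities(ratings))  # ['celebrity_A']
```

Dropping a celebrity as soon as any one video fails is what keeps the three stimulus sets matched item by item, at the cost of losing otherwise usable videos.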

The final set of stimuli (see Appendix 1 in Supplementary Material) thus included videos of the 16 remaining celebrities (eight males and eight females), with three videos for each: a neutral expression video, a dynamic emotional expression (joy), and a dynamic non-emotional expression.

#### Main Experiment: Participant Selection

In selecting participants, we sought to identify individuals with poor face recognition skills, as well as a control group (CG). For this purpose, we recruited from the psychology student population of the University of Milan-Bicocca (through the university's Sona System©) 63 volunteers who reported no difficulties in face recognition (13 males and 50 females, all right-handed, age range 19–25, mean age 21.59 ± 1.91) and 17 volunteers (1 male and 16 females, all right-handed, age range 19–26, mean age 21.29 ± 2.17) who reported difficulties in recognizing familiar faces.

Participants received course credits for taking part, and each was asked to sign an informed consent form for the processing of personal data, in accordance with the ethical guidelines of the University of Milan-Bicocca ethics committee.

All participants had normal or corrected-to-normal vision and no evidence of neurological or neurophysiological alterations.

To evaluate face processing abilities, each participant completed the upright and inverted versions of the Cambridge Face Memory Test (CFMT; Duchaine and Nakayama, 2006; Bowles et al., 2009). This test consists of three stages of increasing difficulty, for a total of 72 trials. The inverted version was used to calculate an inversion effect index, IE (Yin, 1969). The mean scores of the participants with no face recognition difficulties were used to calculate each participant's *z*-scores; we also calculated *z*-scores against published control scores (Duchaine and Nakayama, 2006) to confirm that the EG was correctly selected as having poor face recognition abilities.

None of the 63 participants without face recognition difficulties showed a pathological score on the CFMT (i.e., performance more than 2.0 SD below the mean). Twenty-four of the 63 participants (2 males and 22 females, all right-handed, age range 19–25, mean age 21.54 ± 1.69) made up our final CG (good recognizers), selected on the basis of above-mean performance (*z* > 0) and of their willingness to come back for the second part of the experiment.

Of the seventeen participants who reported having noticed difficulties in face recognition, only 14 (1 male and 13 females, all right-handed, age range 19–26, mean age 21.57 ± 3.41) were selected on the basis of their CFMT scores as poor recognizers (EG) for the second phase of the experiment (i.e., performance more than 2.0 SD below the mean). The other three participants (C.M., G.T., and P.R.) were excluded because their performance in the test phase was not clearly pathological (i.e., not more than 2.0 SD below the mean), so the presence of a face recognition impairment was uncertain (**Table 1**).

∗*Scores falling 2 SD below the mean.*
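The group assignment described above can be sketched as follows, assuming z = (score − reference mean) / reference SD computed with the sample SD (n − 1 denominator), which the text does not specify. In the study the reference sample was the 63 participants reporting no difficulties (with published norms as a cross-check), and CG membership also depended on willingness to return, which this sketch does not model. The function names and all scores are hypothetical.

```python
from statistics import mean, stdev

def z_score(score, reference):
    """Standardize a score against a reference sample (sample SD, n - 1)."""
    return (score - mean(reference)) / stdev(reference)

def assign_group(z, reports_difficulty, cutoff=-2.0):
    """Selection rule as described in the text (return visit not modelled)."""
    if reports_difficulty:
        # self-reported poor recognizers enter the EG only below -2 SD
        return "EG" if z < cutoff else "excluded"
    # non-reporters are eligible for the CG only above the mean
    return "CG" if z > 0 else "not selected"

# hypothetical upright CFMT scores (maximum 72), not the study's data
reference = [50, 58, 62, 66, 54, 60, 70, 56, 64, 60]
participants = [("P1", 44, True), ("P2", 52, True), ("P3", 65, False), ("P4", 57, False)]
groups = {pid: assign_group(z_score(s, reference), rep) for pid, s, rep in participants}
print(groups)  # {'P1': 'EG', 'P2': 'excluded', 'P3': 'CG', 'P4': 'not selected'}
```

P2 mirrors the three excluded participants: a self-reported difficulty, but a score that does not fall below the −2 SD cutoff.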

#### Control Group and Experimental Group

Thirty-eight participants (14 with difficulties in face recognition and 24 controls), selected as described above, took part in this experiment on the role of the dynamic aspects of a face in the identification of famous people.

The group of 14 participants with difficulties in face recognition (1 male and 13 females, mean age 21.57 years, SD = 3.41; mean education 16.3 years, SD = 1.9) did not differ from the CG (2 males and 22 females, mean age 21.54 years, SD = 1.69; mean education 16.1 years, SD = 1.51) in age [*t*(36) = –0.045, *p* = 0.963] or years of education [*t*(21) = –0.284, *p* = 0.777]. All participants were right-handed.

#### Stimuli and Procedure

The 48 videos showing the 16 Italian celebrities (eight males and eight females) in three conditions (neutral expression, dynamic emotional expression [joy], and dynamic non-emotional expression) were used in the main experiment.

The experiment, programmed and run in E-Prime 2.0, was divided into three blocks corresponding to the three experimental conditions: neutral (all the stimuli from Set 1), emotional expressions (Set 2 stimuli), and non-emotional expressions (Set 3 stimuli). The order of the stimuli within each block was randomized, and the order of the three blocks was counterbalanced across participants (three different sequences). Participants were seated 40 cm from the PC screen, and the videos were shown centrally on the screen after a 500-ms black mask. Participants were asked to watch each video and then to name the celebrity or, alternatively, to give any biographical information linked to that face. No time limit was given for the response. Thus, the maximum score for each condition was 16, corresponding to correct identification of all the celebrities. At the end of the experimental session, participants were given the names of the celebrities and asked whether they knew all of them. None of the celebrities was unknown to the participants.
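The three block orders named in the Results (N–E–NE, NE–N–E, E–NE–N) are the three cyclic rotations of a single order, which together form a 3 × 3 Latin square: across sequences, each condition occupies each serial position exactly once. The Python sketch below generates these orders and a per-participant session plan with trial order shuffled within each block. How participants were actually allocated to sequences is not reported, so the round-robin allocation, the function names and the placeholder celebrity labels are hypothetical.

```python
import random
from itertools import cycle

CONDITIONS = ["N", "E", "NE"]  # neutral, emotional, non-emotional blocks

def cyclic_sequences(conditions):
    """All cyclic rotations of the block order (a Latin square)."""
    return [conditions[i:] + conditions[:i] for i in range(len(conditions))]

def assign_sequences(participant_ids, sequences):
    """Hypothetical round-robin allocation of participants to block orders."""
    return dict(zip(participant_ids, cycle(sequences)))

def session_plan(sequence, celebrities, seed=None):
    """Blocks in the given order; trial order randomized within each block."""
    rng = random.Random(seed)
    return [(block, rng.sample(celebrities, k=len(celebrities))) for block in sequence]

sequences = cyclic_sequences(CONDITIONS)
print(["-".join(s) for s in sequences])  # ['N-E-NE', 'E-NE-N', 'NE-N-E']

allocation = assign_sequences([f"P{i:02d}" for i in range(1, 7)], sequences)
celebrities = [f"celebrity_{k:02d}" for k in range(1, 17)]  # 16 placeholder labels
plan = session_plan(allocation["P01"], celebrities, seed=1)
```

Each participant thus sees every celebrity once per block, 48 trials in total, which matches the maximum score of 16 per condition.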

# Results

The number of correct famous face identifications in the three conditions was submitted as the dependent variable to an analysis of variance with two between-subjects factors, "Group" (two levels: EG and CG) and "Sequence" [three levels: neutral–emotional–non-emotional (N–E–NE), non-emotional–neutral–emotional (NE–N–E), and emotional–non-emotional–neutral (E–NE–N)], and one within-subjects factor, "Stimulus Condition" [three levels: neutral (N), emotional (E), and non-emotional (NE)].

The analysis of variance showed a significant main effect of "Group" [*F*(1,32) = 8.912; *p* = 0.005; η² = 0.28] and a significant main effect of "Stimulus Condition" [*F*(2,64) = 6.474; *p* = 0.003; η² = 0.20]. The main effect of "Sequence" [*F*(2,32) = 0.833; *p* = 0.444; η² = 0.052] was not significant. The "Group" by "Stimulus Condition" interaction was significant [*F*(2,64) = 7.387; *p* = 0.001; η² = 0.23], as was the "Stimulus Condition" by "Sequence" interaction [*F*(4,64) = 3.375; *p* = 0.014; η² = 0.21]. The "Group" by "Sequence" [*F*(2,32) = 1.751; *p* = 0.190; η² = 0.109] and the "Group" by "Sequence" by "Stimulus Condition" [*F*(4,64) = 0.511; *p* = 0.728; η² = 0.03] interactions were not significant.
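As a check on the reported statistics, the degrees of freedom follow directly from the design: 38 participants in 2 (Group) × 3 (Sequence) = 6 between-subjects cells leave 38 − 6 = 32 error df for between-subjects effects, and the three-level within-subjects factor multiplies this by 3 − 1 = 2, giving 64 error df for Stimulus Condition and its interactions. The Python sketch below derives every F-test's df for a fully crossed mixed design, assuming a single within-subjects factor and uncorrected degrees of freedom (no sphericity adjustment), which matches the integer dfs reported above; the function name is our own.

```python
from itertools import combinations
from math import prod

def mixed_anova_dfs(n_subjects, between, within):
    """Numerator and error df for every effect of a fully crossed mixed ANOVA.

    between: {factor: n_levels} for between-subjects factors
    within:  {factor: n_levels} with exactly one within-subjects factor
    Returns {effect_name: (df_effect, df_error)}, uncorrected for sphericity.
    """
    [(w_name, w_levels)] = within.items()
    df_err_between = n_subjects - prod(between.values())  # subjects within cells
    df_err_within = (w_levels - 1) * df_err_between       # within factor x subjects
    effects = {}
    names = list(between)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            df_b = prod(between[f] - 1 for f in combo)  # empty product is 1
            if combo:
                effects[" x ".join(combo)] = (df_b, df_err_between)
            effects[" x ".join(combo + (w_name,))] = (df_b * (w_levels - 1), df_err_within)
    return effects

dfs = mixed_anova_dfs(38, {"Group": 2, "Sequence": 3}, {"Stimulus Condition": 3})
for effect, (df1, df2) in dfs.items():
    print(f"{effect}: F({df1},{df2})")
```

Every pair printed by this sketch corresponds to one of the seven F-tests reported above.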

The EG, made up of participants who had shown difficulties on unfamiliar face recognition tests, also performed worse than the controls in the famous face identification task (9.35 vs. 11.54 correct answers out of 16, respectively). The main effect of Stimulus Condition showed that, overall, the neutral condition (10.14) was significantly more difficult than the non-emotional condition (10.78; *p* = 0.009), which in turn did not differ from the emotional condition (10.42).

The significant "Group" by "Stimulus Condition" interaction (**Figure 1**) was further explored with Bonferroni-corrected *post hoc* comparisons. The EG performed significantly worse than the CG in all three conditions (neutral: *p* = 0.001; emotional: *p* = 0.015; non-emotional: *p* = 0.036). Moreover, in the EG, performance in the neutral condition was significantly lower than in the other two conditions (*p* = 0.019 and *p* = 0.001), whereas the emotional and non-emotional conditions did not differ significantly (*p* = 0.222). No differences among the three stimulus conditions were present for the CG (all *p* > 0.05).
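The Bonferroni procedure used for these post hoc tests controls the family-wise error rate by multiplying each raw p-value by the number of comparisons m in the family (capping the result at 1), which is equivalent to testing each raw p-value against α/m. A minimal Python sketch follows; the p-values reported above are presumably already adjusted, how the families of comparisons were defined is not stated, and the raw p-values in the example are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(1, m * p) for a family of m tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def significant(p_values, alpha=0.05):
    """Flags raw p-values that survive Bonferroni correction (p < alpha / m)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# hypothetical raw p-values for three pairwise comparisons
# (e.g. N vs E, N vs NE, E vs NE within one group)
raw = [0.004, 0.03, 0.2]
print([round(p, 4) for p in bonferroni_adjust(raw)])  # [0.012, 0.09, 0.6]
print(significant(raw))                               # [True, False, False]
```

The two functions give the same decisions: an adjusted p below α is the same condition as a raw p below α/m.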

These results are consistent with the hypothesis that individuals with poor face recognition skills can be helped by the presence of facial movements: their performance improved in the two dynamic conditions, independently of emotional content. The presence of an emotion did not provide any additional benefit; if anything, facial movements without emotional content yielded a small, non-significant improvement over the emotional condition. One possible explanation for these findings is that it is easier to extract a person's typical expressions when their affective meaning does not also have to be processed.

The significant "Stimulus Condition" by "Sequence" interaction was also explored with Bonferroni-corrected *post hoc* comparisons. In the first sequence (N–E–NE), performance in the first block (neutral) was significantly lower than in the third (non-emotional; *p* = 0.007), while in the third sequence (E–NE–N), performance in the first block (emotional) was significantly lower than in the second (non-emotional; *p* = 0.011). In general, performance in the first block was always lower than in the other two; the difference was not significant only in the second sequence (NE–N–E), between the first (dynamic, non-emotional) block and the second (neutral) block. Nevertheless, the absence of any interaction involving both group and sequence confirms that the effects of interest were independent of this "obvious" sequence effect.

# Discussion

We selected two groups on the basis of their differences in performing face matching and recognition judgments during a neuropsychological assessment. These two groups were also significantly different during an experimental task involving the identification of famous faces. The group with poor face recognition skills was impaired in recognizing famous people, even when external cues, such as hair and upper body parts, were available. The difference in performance between the two groups was not only quantitative but also qualitative: only the performance of the group with poor face recognition showed an effect of stimulus condition. In particular, their performance was significantly worse for neutral faces than for the two dynamic conditions.

The presence of facial expressions did not help the group with good recognition to recognize famous people: this was probably because they were already at ceiling in recognizing faces from their invariant features, so the expressions were of no help. As shown by other authors (for a review, see O'Toole et al., 2002), facial motion is unlikely to benefit identity recognition under normal conditions, but dynamic information can be helpful in difficult conditions (such as poor illumination or noise) and can improve face recognition (Steede et al., 2007; Longmore and Tree, 2013). Our results show that difficult conditions can relate not only to external factors but also to limitations of the participants themselves.

Indeed, only the poor recognition group derived benefits from the celebrities' facial movements, while the group with good recognition did not derive any benefit from the presence of expressions in famous faces.

A possible explanation is that both groups could use facial movements, but the good recognizers already showed maximum performance in the neutral condition, because their ability to process invariant facial aspects was preserved and more efficient than their processing of facial movements. An alternative explanation is that people with difficulties in processing the invariant aspects of faces learned to use facial movement cues to compensate for their difficulties, becoming better than the controls at using that information. Unfortunately, our data do not allow us to disentangle these two hypotheses, and new studies will be necessary for a better understanding of the interaction and integration of the two systems for face recognition.

The order of presentation had an effect on performance: the same celebrity presented a second or third time was better recognized than on the first occasion, independently of the group. Moreover, we did not test all possible sequences, which is a limitation of the present study. Nonetheless, the fact that the sequence did not interact with the difference observed between groups in the three conditions suggests that this effect was independent of the effects we were interested in.

Finally, the improvement in performance in the poor face recognition group was not limited to the emotional expressions condition but was also present in the non-emotional expressions condition. This is consistent with the idea that the performance of individuals with difficulties in face recognition is improved not by the emotional content but by the changeable aspects, which were present in both conditions of our experiment. This does not mean that the two aspects of faces, emotional and non-emotional expressions, do not differ from each other, but it suggests that the way people move their faces, whether they are speaking or smiling, is typical and can be separately processed, stored and recovered. It is possible that when people move their faces to express emotional content, their expressions are as typical as when the facial movements do not contain any emotional content.

Our results showed that facial expressions can improve face recognition also in the case of famous faces, in addition to the already demonstrated case of unfamiliar faces (Comparetti et al., 2011; Daini et al., 2014). On the other hand, the differences between our results and those of previous studies may be attributed to the different presentation of the stimuli. While previous studies (Comparetti et al., 2011; Daini et al., 2014) used static images and found that emotional and non-emotional expressions can have different effects during a recognition task, in our study we found that both types of expressions can have the same impact on the identification of naturally moving faces of famous people. This result could be due to the fact that video stimuli convey more dynamic information than static pictures, allowing participants to focus more on the dynamic information itself and thereby minimizing the differences between emotional and non-emotional content. Moreover, we did not ask participants to detect the expression, but to identify the person, independently of the presence of a facial expression. Nevertheless, one possible limitation of our study is the small sample size of the group with poor face recognition abilities, which, however, reflects the difficulty of finding and recruiting such participants.

Our results support the hypothesis that our visual system analyses and stores non-emotional expressions independently of invariant features and emotional content, as proposed only in the model of O'Toole et al. (2002), and that in poor face recognizers the changeable aspects might be preserved and might aid recognition by providing supplemental information (independent of the structural information of the face, according to the SIH). Such information (together with the invariant features of the face and all biographical information) is part of what defines the identity of a person, and is the basis of facial imitation.

# Conclusion

Facial movements are not particularly useful for individuals with good face processing performance when they have to identify famous people under optimal conditions, because they can already rely on invariant aspects, which appear to be the main pathway for identification. By contrast, individuals with poor face recognition skills can benefit from facial movements in order to identify well-known celebrities, suggesting that motion information can be extracted from an image sequence or a video and can act as a cue for identification. It appears that poor recognizers may have coded and learned face information by relying on a relatively preserved processing of changeable aspects, compared with a less efficient processing of invariant aspects, and that they could use motion as a supplemental cue in face recognition. On the other hand, the poorer performance of our EG, despite the advantage of dynamic facial information, suggests the greater relevance of invariant features in face recognition and their deficient processing in those participants.

Our results are relevant for both theoretical and practical reasons. They support the hypothesis of a system in our brain that is able to process, learn and recover typical facial expressions independently of emotional content. Such a system seems to be preserved in individuals who show poor face recognition abilities (and this could be extended to individuals with congenital prosopagnosia), and it is possible that these individuals could be trained, possibly early in life, to improve their ability to interact with others in everyday life.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01211


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Albonico, Malaspina and Daini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Facial feedback affects valence judgments of dynamic and static emotional expressions

Sylwia Hyniewska\* and Wataru Sato

The Hakubi Project, Primate Research Institute, Kyoto University, Inuyama, Japan

The ability to judge others' emotions is required for the establishment and maintenance of smooth interactions in a community. Several lines of evidence suggest that the attribution of meaning to a face is influenced by the facial actions produced by an observer during the observation of a face. However, empirical studies testing causal relationships between observers' facial actions and emotion judgments have reported mixed findings. This issue was investigated by measuring emotion judgments in terms of valence and arousal dimensions while comparing dynamic vs. static presentations of facial expressions. We presented pictures and videos of facial expressions of anger and happiness. Participants (N = 36) were asked to differentiate between the gender of faces by activating the corrugator supercilii muscle (brow lowering) and zygomaticus major muscle (cheek raising). They were also asked to evaluate the internal states of the stimuli using the affect grid while maintaining the facial action until they finished responding. The cheek raising condition increased the attributed valence scores compared with the brow-lowering condition. This effect of facial actions was observed for static as well as for dynamic facial expressions. These data suggest that facial feedback mechanisms contribute to the judgment of the valence of emotional facial expressions.

#### Edited by:

Paola Ricciardelli, University of Milano-Bicocca, Italy

#### Reviewed by:

Fabien D'Hondt, Université Catholique de Louvain, Belgium
Victoria Ashley, Veterans Affairs Northern California Health Care System, USA

#### \*Correspondence:

Sylwia Hyniewska, The Hakubi Project, Primate Research Institute, Kyoto University, Masukawa Building for Education and Research 406, Kitashirakawa-Oiwakecho, Sakyo, Kyoto 606-8501, Japan sylwia.hyniewska@gmail.com

#### Specialty section:

This article was submitted to Cognition, a section of the journal Frontiers in Psychology

Received: 22 November 2014 Accepted: 01 March 2015 Published: 17 March 2015

#### Citation:

Hyniewska S and Sato W (2015) Facial feedback affects valence judgments of dynamic and static emotional expressions. Front. Psychol. 6:291. doi: 10.3389/fpsyg.2015.00291

Keywords: facial feedback, dynamic expression, emotion recognition, facial expression, dimensional rating

# Introduction

Judging the emotions that others are experiencing is an important skill in managing interpersonal relationships. Given that emotions guide behaviors (Frijda, 2010), understanding others' emotions allows us to predict their behaviors and to coordinate social relationships. In fact, evaluating the emotional content of any behavior is essential in all social encounters, starting with basic judgments of the extent to which an ongoing event is attractive or aversive to another individual (Russell, 1994; Widen, 2013).

Several lines of evidence suggest that emotion judgment is modulated through such behavior as the mimicry of observed facial expressions. It has long been known that humans have a tendency to spontaneously imitate the expressions of others (Smith, 1759/1976), and experimental psychological studies have provided empirical evidence that the simple viewing of facial emotional expressions leads to the reproduction of similar expressions by viewers (e.g., Dimberg, 1982). Several researchers have proposed that the facial actions resulting from such mimicry influence emotion judgment via the feedback effect (Hatfield et al., 1994; Goldman and Sripada, 2005; Niedenthal et al., 2010). Specifically, researchers have suggested that muscle activations in response to others' emotional facial expressions provide feedback to the brain in the form of proprioceptive signals, which activate the representation of one's own emotional bodily state; this representation leads to understanding the emotions experienced by other people (Hatfield et al., 1994; Goldman and Sripada, 2005; Niedenthal et al., 2010). Neuroscientific research supports such ideas by showing that the mutual influence of the production and observation of expressions can be explained by a shared neural substrate, the mirror neuron system (Grezes and Decety, 2001; Atkinson and Adolphs, 2005; Iacoboni, 2009). Thus, the influence of facial feedback on the interpretation of the emotional expressions of others can be explained theoretically.

However, empirical investigations of the causal relationship between the facial actions of observers and the judgment of emotions have reported mixed findings. Several studies reported results supporting this relationship using designs involving the manipulation of facial actions with instruments (Niedenthal et al., 2001; Oberman et al., 2007; Ponari et al., 2012; Rychlowska et al., 2014), cosmetic procedures (Neal and Chartrand, 2011), and instructions (Stel and van Knippenberg, 2008). For example, Niedenthal et al. (2001) showed that participants whose spontaneous facial actions were disrupted by holding a pen in their mouth were slower to detect a change from one expression to another compared with participants who were free to react with their facial muscles. Neal and Chartrand (2011) found that limiting facial mimicry by injecting Botox into faces and amplifying the subjective experience of facial actions by applying gel to faces impaired and improved, respectively, emotion recognition based on facial expressions. However, some of these studies only partially supported this relationship. For example, Oberman et al. (2007), who also used a pen-holding technique to constrain the facial actions of observers, reported that this disruption impaired the emotion-labeling performance in response to some but not all emotions. Stel and van Knippenberg (2008) showed that constraining facial actions by asking participants not to move their faces reduced the speed but not the accuracy with which the emotion depicted in facial expressions was recognized. Furthermore, several studies that tested the correlation between the degree of facial mimicry and the accuracy of expression recognition found no evidence of such a relationship (Blairy et al., 1999; Hess and Blairy, 2001; however, see Sato et al., 2013). 
Following those findings, a number of researchers (Blairy et al., 1999; Hess and Blairy, 2001; Hess and Fischer, 2013) pointed out that whether the facial actions of observers modulate judgments of perceived emotional expressions in unrestricted conditions remains unclear.

Two factors seem to be important in order to clarify this issue: the use of dimensional measures to evaluate emotion and the use of dynamic vs. static presentations of facial expressions.

First, although all previous studies tested facial emotion judgments using emotional categories (e.g., anger), facial emotion can be interpreted using dimensions of valence and arousal. These dimensions are superordinate to categories (Russell, 2003), and the most prevalent interpretation of them is that valence, which ranges from negative to positive, represents the qualitative component, whereas arousal, which ranges from low to high, reflects the energy level (Russell, 2003). It has been proposed that dimensional judgments of facial expressions may be more fundamental than categorical ones (Russell et al., 2003). Several studies have supported this notion; for example, preschoolers order facial expressions in a two-dimensional space of valence and arousal without the use of emotion labels, as these seem not to be readily available at this stage of development (Russell and Bullock, 1986). Based on these data, we could argue that the unconscious feedback from the face, which is not explicitly related to an ongoing evaluative task and acts on a basic and non-verbal level, would be more clearly related to the dimensional attribution stage of facial expression judgments. Consistent with this notion, a recent study found a significant correlation between facial mimicry and emotion recognition using dimensional, specifically valence, ratings (Sato et al., 2013). Based on these findings, we hypothesized that facial actions would have a clear effect on emotion judgments made with dimensional valence ratings.

Second, although none of the previous studies compared dynamic and static presentations of facial expressions, this difference may modulate the facial feedback effect on emotion recognition. Previous psychological studies have shown that, compared with static facial expressions, dynamic ones facilitate various types of psychological activities, including facial mimicry (Weyers et al., 2006; Sato et al., 2008; Rymarczyk et al., 2011), subjective emotional arousal (Sato and Yoshikawa, 2007a), and emotion recognition (Wehrle et al., 2000; Biele and Grabowska, 2006). Functional neuroimaging studies have also shown that dynamic vs. static facial expressions enhanced activity in the mirror neuron system (Sato et al., 2004). Based on these data, we hypothesized that facial action would influence ratings of static and dynamic presentations of facial expressions and exert a stronger impact in reaction to dynamic presentations.

To test these hypotheses, we investigated the effect of facial actions on emotional evaluations offered in terms of valence and arousal ratings of dynamic and static facial expressions. To manipulate participants' facial actions, we used the voluntary facial action technique (Dimberg and Söderkvist, 2010), which requires participants to lower their brows (corrugator supercilii muscle) or raise their cheeks (zygomaticus major muscle) to differentiate between two types of stimuli; in our study, it was the gender of the stimuli that differed. This technique has been shown to be effective in the modulation of the valence of the subjective emotion reported while viewing emotional facial expressions in situations in which participants are not aware that the purpose of the experiment involves examining the effect of facial action on emotional processing (Dimberg and Söderkvist, 2010). We also prepared a cover story and a dummy task, to be administered before the actual facial action task, to hide the experimental purpose. We presented facial expressions of anger and happiness because (1) the voluntary facial action technique can elicit mimicry-like facial actions in response to these expressions and (2) correlations between facial actions and valence evaluations have been reported for these expressions (Sato et al., 2013).

## Materials and Methods

#### Participants

Thirty-six students from Kyoto University (15 females, 21 males, mean ± SD age, 22.1 ± 2.1 years) participated in this study. All participants had normal or corrected-to-normal visual acuity. Although six additional volunteers participated in the study, their data were not analyzed due to their reported psychological problems or outlier ratings (>2 SD from the group mean). Participants signed a written informed consent form after the experimental procedures were explained. The study was approved by the local ethics committee of the Primate Research Institute, Kyoto University. Participants were reimbursed for their time and effort.
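The exclusion criterion for outlier ratings (more than 2 SD from the group mean) can be sketched as follows. The source does not state which summary rating or which SD formula was used, so this sketch assumes one summary score per participant and the sample SD; the function `flag_outliers` and the participant IDs are hypothetical.

```python
import statistics


def flag_outliers(scores, criterion=2.0):
    """Return the IDs whose score lies more than `criterion` SDs from the group mean.

    scores: dict mapping a participant ID to one summary rating
    (assumed here, e.g., that participant's mean valence rating).
    """
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (an assumption; the source does not specify)
    return sorted(pid for pid, value in scores.items() if abs(value - mean) > criterion * sd)


# Hypothetical data: nine typical participants and one extreme rater.
ratings = {f"p{i}": 5.0 for i in range(9)}
ratings["p9"] = 9.0
print(flag_outliers(ratings))
```

With very small groups a single extreme value inflates the SD, so a 2 SD rule can only flag anyone once the group is reasonably large.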

#### Experimental Design

We used a three-factorial within-participants design: observer's action (brow lowering, cheek raising) × stimulus emotion (anger, happiness) × stimulus presentation (dynamic, static). Valence and arousal scores were the two dependent variables.
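As a minimal illustration, the 2 × 2 × 2 within-participants design can be enumerated in Python; the constant names are illustrative and not taken from the study's materials.

```python
from itertools import product

# Levels of the three within-participants factors described above.
ACTIONS = ("brow lowering", "cheek raising")
EMOTIONS = ("anger", "happiness")
PRESENTATIONS = ("dynamic", "static")

# Every participant contributes data to all 2 x 2 x 2 = 8 cells,
# each scored on both dependent variables.
CONDITIONS = list(product(ACTIONS, EMOTIONS, PRESENTATIONS))
DEPENDENT_VARIABLES = ("valence", "arousal")

for cell in CONDITIONS:
    print(cell)
```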

### Stimuli

The facial expressions (**Figure 1A**) were taken from the video corpus of emotional displays performed by Kyoto University students (Sato and Yoshikawa, 2007b). The selection and validation of the angry and happy expressions in dynamic and static styles were described in a previous study (Sato and Yoshikawa, 2007b), which found high levels of accuracy in the recognition of these expressions by participants. Static pictures showed the peak expression in the video displays. Four displays of each emotion were chosen (the expressions of two male and two female actors). A dummy task, which preceded the task of interest, involved the presentation of pictures of robots and animals. Each stimulus subtended a visual angle of about 7.8° horizontally × 9.8° vertically. The viewing distance was approximately 0.7 m.

#### Apparatus

The presentation of stimuli was controlled by Presentation® software version 14.9 (Neurobehavioral Systems) running on a Windows computer (HP Z200 SFF, Hewlett-Packard). The stimuli were presented on a 19-inch CRT monitor (HM903D-A, Iiyama). The facial actions of participants were monitored through a hidden digital camera (QuickCam IM, Logitech).

#### Procedure

Participants were led individually to a sound-attenuated experimental room. As part of a cover story to reduce awareness of the focus of the research, a plethysmograph device was attached to the non-dominant hand of participants, and participants were told it would measure their heartbeat during the entire experiment. Participants then relaxed for 3 min.

Computer guidelines about the cover story and procedures were provided to participants. The first guideline indicated that the aim of the study was to investigate the practical use of technology by handicapped persons. Participants were told that they would be assigned to perform two tasks randomly chosen from a wide range of possible tasks; however, the same tasks were actually assigned to all participants. Participants were asked to evaluate the internal state of the stimuli by pressing keys to respond to an affect grid (Russell et al., 1989), which graphically represented the two dimensions of valence, from unpleasant (1) to pleasant (9), and arousal, from low arousal (1) to high arousal (9) (**Figure 1B**). Following Russell et al. (1989), the midpoint of each scale was explained as representing a neutral, average feeling, whereas the corners were defined as representing extreme emotions, such as excitement and depression.
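The affect grid yields one valence and one arousal score per response. The sketch below shows one way a selected grid cell could be converted into the two 1 to 9 scores; it assumes 0-based screen indices, valence increasing from left to right and arousal increasing from bottom to top, and the function name `grid_cell_to_scores` is hypothetical. The source does not describe the key layout used.

```python
GRID_SIZE = 9  # 9 x 9 affect grid: valence 1-9 and arousal 1-9


def grid_cell_to_scores(row_from_top, col_from_left):
    """Convert a selected affect-grid cell into (valence, arousal) scores.

    Assumes 0-based indices, valence increasing left to right
    (1 = unpleasant, 9 = pleasant) and arousal increasing bottom to top
    (1 = low, 9 = high).
    """
    if not (0 <= row_from_top < GRID_SIZE and 0 <= col_from_left < GRID_SIZE):
        raise ValueError("cell lies outside the 9 x 9 grid")
    valence = col_from_left + 1
    arousal = GRID_SIZE - row_from_top  # the top row is the highest arousal
    return valence, arousal


print(grid_cell_to_scores(4, 4))  # the central (neutral, average) cell
```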

In the dummy task, participants performed shoulder actions in response to the photographs of robots and animals. They were asked to move their left and right shoulder forward as fast as possible in response to robots and animals, respectively, and to evaluate the internal states of the stimuli using the affect grid. They were asked to hold the shoulder position until they finished responding. After a few practice trials for actions and for ratings with actions, a total of 12 trials, consisting of six trials each with robots and animals, were conducted. The order of trials was randomized. Each trial consisted of the presentation of a fixation cross for 500 ms; this was followed by the presentation of the stimulus for 1500 ms and then by the presentation of the affect grid. The inter-trial interval was 1000 ms. The results from the dummy task are not reported as the performance on this task was irrelevant to the purpose of the study.

In the experimental task (**Figure 1C**), participants performed facial actions in response to emotional facial expressions. In one condition, they were asked to lower their brows in response to women and to raise their cheeks in response to men, as fast as possible; in the other condition, this mapping was reversed. They were also asked to evaluate the internal states of the stimuli using the affect grid while maintaining the facial action until they finished responding. The participants engaged in a few practice trials for actions and for ratings with actions. During the practice, participants were observed through a hidden camera by an experimenter certified in the use of the Facial Action Coding System (FACS: Ekman et al., 2002) to ensure that their facial actions were correct according to this system. If a participant did not perform the facial actions appropriately (i.e., Action Units 4 and 12 for brow lowering and cheek raising, respectively), the experimenter corrected the actions, explaining that the plethysmograph device was not able to accurately detect the responses. The experimenter pointed either to the brows or to the cheeks and asked whether the participant could reproduce the expression presented on the screen, while making the expression herself. No affective terminology, and no related terms such as "frown" or "smile," was used to describe the facial actions. One intervention was sufficient to correct facial actions during the experimental task. The participants completed a total of 64 trials presented in two blocks of 32. In one block, participants were asked to lower their brows when seeing women; in the other, they were asked to do so when seeing men (and the reverse for cheek raising). The same stimuli were used in both blocks. The event sequence of each trial was the same as that in the dummy task (i.e., a fixation cross for 500 ms, the stimulus for 1500 ms, and then the affect grid).
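The block-dependent response rule and the trial timeline described above can be summarized in a short Python sketch. The block labels "A" and "B" and the function names are hypothetical; the source states only that the gender-to-action mapping was reversed between the two blocks and gives the event durations.

```python
# Trial timing reported for both tasks (ms). The affect grid stays on screen
# until the participant responds, so it has no fixed duration here.
FIXATION_MS = 500
STIMULUS_MS = 1500
ITI_MS = 1000

# Hypothetical block labels: the gender-to-action mapping is reversed between blocks.
ACTION_MAPPINGS = {
    "A": {"female": "brow lowering", "male": "cheek raising"},
    "B": {"female": "cheek raising", "male": "brow lowering"},
}


def required_action(block, actor_gender):
    """Return the facial action the participant should perform in this block."""
    return ACTION_MAPPINGS[block][actor_gender]


def trial_events(block, actor_gender):
    """List one trial's events as (event, duration_ms) pairs.

    The facial action is started in response to the stimulus and held while
    the affect grid is answered (duration None means "until response").
    """
    action = required_action(block, actor_gender)
    return [
        ("fixation cross", FIXATION_MS),
        (f"stimulus; start {action}", STIMULUS_MS),
        (f"affect grid; hold {action}", None),
        ("inter-trial interval", ITI_MS),
    ]


for event, duration in trial_events("A", "male"):
    print(event, duration)
```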

After the experiment, the participants were interviewed. This process confirmed that no-one was aware of the purpose of our experiment. Participants were then debriefed regarding the experiment. Permission to use their data was requested and granted in all cases.

### Data Analysis

Repeated-measures analyses of variance (ANOVAs) were performed with observer's action (cheek or brow activation), stimulus emotion (happiness or anger), and stimulus presentation (dynamic or static) as factors. Valence and arousal were analyzed separately. Our effect of interest was the observer's action. When this factor was significant, we further tested for simple effects under each stimulus condition using t-tests (one-tailed). The simple effects of other factors were also examined using t-tests (two-tailed). Because preliminary analyses showed no significant main or interactive effects of participant gender, this factor was disregarded in the following analyses. The results of all tests were considered statistically significant at p < 0.05.
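Because every factor is within participants, each simple effect compares two sets of scores from the same participants. A minimal sketch of the paired-samples t statistic is shown below, assuming one score per participant per condition; `paired_t` is a hypothetical helper, not the authors' analysis code. For the one-tailed test of the observer's action (cheek raising > brow lowering), the p-value is the upper-tail probability of t with df = n - 1; recent SciPy versions provide this through `scipy.stats.ttest_rel(a, b, alternative="greater")`.

```python
import math
import statistics


def paired_t(condition_a, condition_b):
    """Paired-samples t statistic and degrees of freedom for condition_a - condition_b.

    Each list holds one score per participant, in the same participant order.
    A positive t means condition_a scored higher on average.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("both conditions need one score per participant")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical valence scores for four participants.
cheek_raising = [5, 6, 7, 8]
brow_lowering = [4, 4, 6, 6]
print(paired_t(cheek_raising, brow_lowering))
```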

# Results

In terms of valence scores (**Figure 2** left; see **Supplementary Figure 1** left for difference scores between the cheek-raising and brow-lowering conditions), the three-way ANOVA revealed a main effect of the observer's action, F(1, 35) = 10.34, MSE = 0.24, p < 0.005, ηp² = 0.228, with more positive scores under the cheek-raising than under the brow-lowering condition. Simple-effect analyses confirmed that the effect of the observer's action (cheek raising > brow lowering) was significant for dynamic happy, t(35) = 2.12, p < 0.05, static happy, t(35) = 1.95, p < 0.05, dynamic angry, t(35) = 1.84, p < 0.05, and static angry expressions, t(35) = 3.31, p < 0.005. We found no significant interactions involving the observer's action, F(1, 35) < 1.18, p > 0.1. Additionally, the main effect of stimulus emotion (happiness > anger), F(1, 35) = 571.36, MSE = 3.11, p < 0.001, ηp² = 0.942, and the interaction between stimulus emotion and stimulus presentation, F(1, 35) = 8.22, MSE = 0.10, p < 0.005, ηp² = 0.190, were significant. Simple-effect analyses for the interaction revealed that the effect of stimulus emotion (happiness > anger) was significant for both dynamic and static presentations, t(35) > 22.03, p < 0.001, and that the effect of stimulus presentation (static > dynamic) was significant for angry, t(35) = 2.93, p < 0.01, but not for happy expressions, t(35) = 1.11, p > 0.1. The main effect of stimulus presentation was not significant, F(1, 35) = 2.72, p > 0.1.

In terms of arousal (**Figure 2** right, **Supplementary Figure 1** right), the three-way ANOVA showed no significant main effect or interactions involving the observer's action, F(1, 35) < 2.27, p > 0.1. However, we found a significant main effect of stimulus presentation (dynamic > static), F(1, 35) = 12.32, MSE = 3.58, p < 0.005, ηp² = 0.260, and a significant interaction between stimulus emotion and stimulus presentation, F(1, 35) = 19.22, MSE = 0.57, p < 0.001, ηp² = 0.354. Simple-effect analyses for the interaction revealed that the effect of stimulus emotion was not significant for either dynamic or static presentations, t(35) < 1.55, p > 0.1, and that the effect of stimulus presentation (dynamic > static) was significant for angry, t(35) = 4.23, p < 0.001, and marginally significant for happy expressions, t(35) = 1.99, p < 0.1. The main effect of stimulus emotion was not significant, F(1, 35) = 0.62, p > 0.1.
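The partial eta squared values in this section can be recovered from each F ratio and its degrees of freedom, using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error), which equals SS_effect / (SS_effect + SS_error). For example, F(1, 35) = 571.36 gives approximately 0.942. The sketch below is an illustrative check, not part of the original analysis; `partial_eta_squared` is a hypothetical helper.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Uses eta_p^2 = (F * df_effect) / (F * df_effect + df_error), which is
    algebraically equal to SS_effect / (SS_effect + SS_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F ratios reported for valence and arousal, each with df = (1, 35).
for f_value in (10.34, 571.36, 8.22, 12.32, 19.22):
    print(f"F(1, 35) = {f_value}: partial eta squared = {partial_eta_squared(f_value, 1, 35):.3f}")
```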

# Discussion

Consistent with our first hypothesis, our results showed that observers' facial actions had an impact on the valence ratings of the stimulus facial expressions. Specifically, cheek raising led to higher valence scores for facial expressions than did brow lowering. These results are consistent with several previous studies reporting that the manipulation of observers' facial actions influenced emotion recognition (Niedenthal et al., 2001; Oberman et al., 2007; Neal and Chartrand, 2011). However, several studies reported cases in which facial action had no clear effect on the attribution of emotional labels to facial expressions (Oberman et al., 2007; Stel and van Knippenberg, 2008). Given these inconsistencies in categorical attributions, we relied on valence judgments, which have been described as more fundamental than categorical judgments (Russell et al., 2003). Our experiment was the first to test the facial feedback effect using dimensional valence ratings, which seem better able to detect qualitative changes in judgments of others' emotions.

With regard to our second hypothesis, the modulating effect of facial actions (cheek raising and brow lowering) was observed in response to static as well as dynamic presentations. However, contrary to our expectations, the effect of facial action was equally strong for both presentation formats. This result is inconsistent with previous data showing that dynamic facial expressions were better able than static ones to elicit facial mimicry, subjective emotion, and emotion recognition (e.g., Sato and Yoshikawa, 2007a). Consistent with most data regarding the effect of dynamic presentations (e.g., Detenber et al., 1998; Sato and Yoshikawa, 2007a), our data showed that dynamic stimuli were rated as more arousing than static ones; we therefore assume that our dynamic stimuli had a stronger emotional impact than our static stimuli, as observed in previous studies. One possible interpretation of this discrepancy is that we asked participants to voluntarily and clearly perform facial actions in response to both dynamic and static facial expressions; this manipulation may have induced the same feedback for both types of presentation. It is possible that, in natural settings, the recognition of dynamic facial expressions is enhanced because they elicit stronger facial mimicry than static facial expressions do.

Our results showing a clear facial feedback effect on the valence attributed to facial expressions may have theoretical implications. The literature on facial mimicry has long assumed that the feedback effect of facial actions plays a fundamental role in expression recognition (Hatfield et al., 1994). Experimental evidence has supported the importance of facial mimicry in the processing of facial expressions, showing that facial mimicry occurs rapidly, even before conscious awareness of faces (Dimberg et al., 2000), and that it is elicited at developmentally early stages, even in newborn infants (Meltzoff and Moore, 1977). However, the specific information about others' emotional expressions provided by the facial feedback effect remained unknown. In the literature on facial expression recognition, it has been proposed that dimensional evaluation is fundamental to this process (Russell et al., 2003). This notion has been supported by empirical evidence that the valence of facial expressions is processed rapidly, before conscious awareness of faces (Murphy and Zajonc, 1993), and that it is recognized at developmentally early stages, such as 2 years of age (Russell and Bullock, 1986). However, the mechanism by which the valence of expressions is recognized also remained unknown. Our results connect these bodies of literature and suggest that facial feedback plays a fundamental role in emotion recognition by providing information about the valence of facial expressions.

Our results may also have practical implications. Using an experimental approach, we showed that the voluntary facial action technique (Dimberg and Söderkvist, 2010) is effective in eliciting the facial feedback effect on judgments of emotional expressions. This easy and non-intrusive method may be used in ecological settings to assist in judging others' emotions. For example, it may be possible to apply this method to individuals affected by psychiatric disorders involving impairments in emotional communication, such as autism spectrum disorder (ASD). Individuals with ASD are characterized primarily by impaired recognition of emotional facial expressions (Hobson, 1993). Consistent with the notion of a facial feedback effect, a recent study revealed that individuals with ASD were impaired compared with typically developing controls in spontaneous facial mimicry of others' emotional expressions (Yoshimura et al., in press). At the same time, this study showed that the ASD group was able to voluntarily imitate facial expressions as well as the control group. Based on these data, we speculate that it may be possible to assist individuals with ASD in their valence judgments of facial expressions by applying the voluntary facial action technique congruently with others' facial expressions. It would be interesting to explore such possibilities in future research.

In addition to the effect of observers' facial action, our results showed that dynamic presentations of facial expressions intensified the ratings of arousal and, in part, of valence. The intensifying effect of dynamic presentations on arousal ratings is in line with previous studies reporting that ratings of intensity (Biele and Grabowska, 2006) and subjectively experienced arousal (Sato and Yoshikawa, 2007a) were higher for dynamic than for static facial expressions, and that ratings of experienced arousal were higher for dynamic than for static emotional scenes (Detenber et al., 1998; Simons et al., 1999, 2000). A modulatory effect of dynamic presentations on valence ratings was also reported in some studies using scenery stimuli (Detenber et al., 1998; Simons et al., 2000). Together with these data, our results suggest that dynamic presentations intensify the dimensional evaluation of emotional facial expressions, independently of the effect of the observer's facial action.

Several limitations of the present study should be acknowledged. First, because we contrasted two facial actions, we could not determine whether each facial action increased or decreased the valence evaluations. This issue could be investigated by introducing a baseline, such as a condition or group without any predefined facial constraints. Clarifying this issue would increase our understanding of the phenomenon. Second, because we used only two basic emotions (cf. Ekman, 1992), whether other valenced emotional expressions would show a similar facial feedback effect remains an open question. Further studies should address this limitation by including expressions of other basic emotions (e.g., fear) or even complex emotions (e.g., excitement; cf. Yik et al., 2011).

In summary, our data showed an effect of facial action on valence judgments. When individuals activated the zygomaticus major muscle they attributed more positive valence to dynamic and static facial expressions than when they activated the corrugator supercilii muscle. These results suggest that facial feedback mechanisms contribute to the evaluation of the valence of emotional facial expressions.

# Author Contributions

SH and WS were responsible for the conception and design of the study, data acquisition and analysis, the interpretation of results, and the writing of the manuscript.

# Acknowledgments

We thank Professor T. Matsuzawa for helpful advice and Ms. K. Minemoto for technical support. This study was supported by funds from the Japan Society for the Promotion of Science Funding Program for Postdoctoral Fellowship (PE13059) and for Next Generation World-Leading Researchers (LZ008).

# Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2015.00291/abstract

Supplementary Figure 1 | Differences in scores between the observer action conditions. Mean (with SE) differences between cheek raising and brow lowering conditions for valence (left) and arousal (right).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Hyniewska and Sato. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Emotional processing in Parkinson's disease and schizophrenia: evidence for response bias deficits in PD

*Ilona P. Laskowska<sup>1</sup>\*, Ludwika Gawryś<sup>2</sup>, Szymon Łęski<sup>3</sup> and Dariusz Koziorowski<sup>4</sup>*

*<sup>1</sup> Music Performance and Brain Laboratory, Department of Cognitive Psychology, University of Finance and Management, Warsaw, Poland, <sup>2</sup> 2nd Department of Psychiatry, Institute of Psychiatry and Neurology, Warsaw, Poland, <sup>3</sup> Laboratory of Neuroinformatics, Department of Neurophysiology, Nencki Institute of Experimental Biology, Warsaw, Poland, <sup>4</sup> Department of Neurology, Faculty of Health Science, Medical University of Warsaw, Warsaw, Poland*

#### *Edited by:*

*Paola Ricciardelli, University of Milano-Bicocca, Italy*

#### *Reviewed by:*

*Michela Sarlo, University of Padova, Italy*

*Roberta Daini, Università degli Studi di Milano-Bicocca, Italy*

#### *\*Correspondence:*

*Ilona P. Laskowska, Music Performance and Brain Laboratory, Department of Cognitive Psychology, University of Finance and Management, 55 Pawia Street, Warsaw 01-030, Poland ilaskowska@gmail.com*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 01 July 2015 Accepted: 04 September 2015 Published: 24 September 2015*

#### *Citation:*

*Laskowska IP, Gawryś L, Łęski S and Koziorowski D (2015) Emotional processing in Parkinson's disease and schizophrenia: evidence for response bias deficits in PD. Front. Psychol. 6:1417. doi: 10.3389/fpsyg.2015.01417*

Deficits in facial emotion recognition in Parkinson's disease (PD) patients have been well documented. Nevertheless, it is still not clear whether these deficits are secondary to other cognitive impairments. The aim of this study was to determine whether deficits in facial emotion recognition in PD result from impaired sensory processes or from impaired decision processes. To address this question, we tested the ability to recognize a mixture of basic and complex emotions in 38 non-demented PD patients and 38 healthy controls matched on demographic characteristics. By using a task with an increased level of ambiguity, in conjunction with signal detection theory, we were able to differentiate between sensitivity and response bias in facial emotion recognition. Sensitivity and response bias were calculated using the *d-prime* value and the *c* index, respectively. Our study is the first to employ the EIS-F scale for assessing facial emotion recognition among PD patients; to test its validity as an assessment tool, a group comprising schizophrenia patients and healthy controls was also tested. Patients with PD recognized emotions with less accuracy than healthy individuals (*d-prime*) and used a more liberal response criterion (*c* index). By contrast, patients with schizophrenia showed only diminished sensitivity (*d-prime*). Our results suggest that an impaired ability to recognize facial emotions in PD patients may result from both decreased sensitivity and a significantly more liberal response criterion, whereas facial emotion recognition deficits in schizophrenia may stem from a generalized sensory impairment only.

Keywords: facial emotion recognition, mild cognitive impairment, Parkinson's disease, response bias, schizophrenia, signal detection theory

# Introduction

In recent years, there has been increasing interest in the wide range of cognitive symptoms accompanying neurodegenerative disorders. This trend is reflected in the recommendations published in the DSM-5 (American Psychiatric Association, 2013), which state that neuropsychological assessment for such disorders should be expanded to include social cognition. This recommendation pertains to both mild and major neurocognitive disorders. Mild neurocognitive disorder (mNCD) in the domain of social cognition is defined as: "subtle changes in behavior or attitude, often described as a change in personality, such as less ability to recognize social cues or read facial expressions" (p. 595). For the assessment of patient competency in social cognition, the DSM-5 recommends two types of measures: (1) those that assess the ability to recognize a variety of both positive and negative emotions, and (2) those that assess the ability to consider the mental states and experiences of others.

Parkinson's disease (PD) is one such neurodegenerative disorder in which impaired facial emotion recognition has frequently been identified (Dujardin et al., 2004; Ariatti et al., 2008; Baggio et al., 2012; Bediou et al., 2012). PD results from a loss of dopamine neurons in the pars compacta of the substantia nigra and a loss of some of the neurons in the ventral tegmental area (Braak et al., 2003; Drui et al., 2014). This degeneration affects both the nigrostriatal and the mesocorticolimbic systems and seems to be associated with facial emotion recognition ability in PD (Péron et al., 2012). The most common impairments concern the recognition of basic negative emotions: fear, sadness, anger, and disgust (for review, see: Gray and Tickle-Degnen, 2010). Inability to recognize anger and disgust has been shown to be directly related to dopamine depletion (Sprengelmeyer et al., 2003; Lawrence et al., 2007).

There is still much debate as to whether emotion recognition impairment in PD is restricted to negative emotions (Suzuki et al., 2006; Gray and Tickle-Degnen, 2010). One recently published study (Buxton et al., 2013) found that PD patients did exhibit deficits in the ability to recognize happiness. In that study, six basic emotions were presented at three levels of intensity: low, medium and high. The results showed that the PD group's ability to identify happiness was impaired when the intensity of the emotion was decreased to medium or low, whereas a similar pattern was not observed for negative emotions. The results of Buxton et al. indicate that impairment in facial emotion recognition among PD patients is (1) not restricted to negative emotions, and (2) dependent upon the intensity of the stimulus. Impairment in the recognition of complex emotions (so-called "social emotions") has also been documented. One study found that the ability to recognize arrogance was reduced among PD patients following temporary withdrawal from dopamine replacement therapy (Martins et al., 2008). Such findings raise the question of whether additional cognitive processes are involved when patients are faced with ambiguous stimuli, such as basic emotions of reduced intensity or more complex emotions.

Impaired performance on various cognitive tasks is well described in PD patients (Verbaan et al., 2007; Muslimovic et al., 2009). Executive dysfunctions, which are extremely common (Owen, 2004; Kudlicka et al., 2011; Ravizza et al., 2012) even in the early stages of the disease (Levin and Katzen, 2005), include difficulties with decision-making, categorization, executive attention, and working memory.

A small number of studies have investigated the possibility that emotion recognition impairment in PD is secondary to executive dysfunction. However, owing to the paucity of data, no consistent conclusion could be drawn (Gray and Tickle-Degnen, 2010). Moreover, most of these studies (e.g., Breitenstein et al., 1998; Pell and Leonard, 2005; Clark et al., 2008; Herrera et al., 2011) used different neuropsychological measures (e.g., verbal fluency tasks, the Trail Making Test (TMT), the Wisconsin Card Sorting Test, Visual Search) to examine different aspects of executive function. Nevertheless, the failure of these studies to find a link between emotion recognition impairment and executive dysfunction does not rule out the possibility that such a link exists. To investigate the processes underlying facial emotion recognition, it may be useful to employ a method which already incorporates some aspects of executive function (e.g., decision-making) and covers the range of emotions encountered in everyday life. Such a method would include tasks which involve a degree of uncertainty and therefore place demands on an individual's decision-making processes. According to Krantz (1969), decision-making comprises sensory processes and cognitive decision processes. Signal detection theory (SDT) is a useful tool for analyzing both of these components of performance (Stanislaw and Todorov, 1999; Macmillan and Creelman, 2004): decision-making strategy is measured in terms of response bias, whereas the accuracy of stimulus detection is expressed in terms of sensitivity. The distinction between sensitivity and response bias appears to be particularly important when tasks are more difficult and when response strategies play an important role because of greater stimulus ambiguity.

To date, the SDT has been applied in only a few studies of facial emotion recognition (e.g., Tsoi et al., 2008; Pixton, 2011; Huang et al., 2013). To the best of our knowledge, it has been used in only one study involving PD patients (Narme et al., 2011). However, Narme et al. (2011) were more interested in specific visuospatial deficits than in executive function deficits. They hypothesized that facial emotion recognition impairment in PD may result from configural processing deficits. Their study included the following tasks: (1) a facial emotion recognition task, (2) an upside-down facial emotion recognition task, and (3) a facial configuration detection task. Their results showed that configural performance was positively correlated with the recognition of negative emotions. It has been suggested that impaired recognition of emotion from facial cues could be related, at least partially, to altered configural processing, especially of vertical, second-order information. However, these results contrast with previous studies examining visuospatial deficits among PD patients (for review, see: Gray and Tickle-Degnen, 2010), which suggest that facial emotion recognition deficits in PD are independent of general deficits in face processing. These discrepancies could be accounted for by task differences, since the major studies assessed visuospatial deficits with the Benton Facial Recognition Task, which was not designed specifically as a measure of configural processing. It is worth noting that decision-making deficits, categorization impairments and decreased working memory in PD were not controlled in the study by Narme et al. (2011). However, the authors of that study acknowledge the need for future studies to clarify the role of attention and executive functions in more complex experimental tasks which demand specific cognitive activity (Narme et al., 2011).

In our study, we chose not to focus on emotion-specific recognition deficits, opting instead to assess emotion recognition deficits using a wider range of emotions. We tested the ability of PD patients to recognize a mixture of basic and complex emotions. To do this, we employed the Emotional Intelligence Scale – Faces (EIS-F), which complies with the recommendations of the DSM-5. As far as we know, our study is the first to assess the usefulness of the EIS-F for research into cognitive impairment in patients with neurological and psychiatric disorders.

The aims of our study were as follows: (1) to assess facial emotion recognition among patients with PD, using a task which is more ecologically valid than those which merely assess basic emotions. This will give us a more accurate picture of the ability of these patients to recognize facial emotions in a natural environment; (2) to answer the question of whether deficits in facial emotion recognition in PD result from impaired sensory processes (i.e., decreased sensitivity in stimulus detection), or from a decision-making impairment (measured as response bias). By using a task with a greater level of ambiguity in conjunction with the SDT, we should be able to differentiate sensory process deficits from decision-making deficits; (3) to assess the diagnostic accuracy of EIS-F and to ascertain its usefulness as an assessment tool for mNCD.

To check the validity of the EIS-F for assessing facial emotion recognition among PD patients, a control group comprising schizophrenia (SCH) patients was also tested. As with PD patients, a number of studies have shown that patients with schizophrenia exhibit impaired facial emotion recognition compared with healthy controls (Feinberg et al., 1986; Archer et al., 1992; Salem et al., 1996; Addington and Addington, 1998; Kohler et al., 2003). Impaired emotion recognition in both PD patients (e.g., Dujardin et al., 2004; Lawrence et al., 2007; Clark et al., 2008) and SCH patients (e.g., Bediou et al., 2005) has been attributed to a disturbed dopaminergic system. There is some evidence of an inverted-U-shaped relationship between emotion recognition ability and dopamine level (Delaveau et al., 2005). In one study (Delaveau et al., 2005), administering levodopa to healthy individuals affected amygdala activation during a facial emotion recognition task. We can therefore expect that both diminished dopaminergic innervation of the amygdala in PD and dopamine overstimulation in SCH will have a negative impact on emotion recognition ability. Furthermore, it is worth noting that people with schizophrenia, in contrast to PD patients, may show a more general deficit in face perception (for review, see: Bortolon et al., 2015). Performance in the BVRT and similar tasks is usually impaired in schizophrenia patients compared with healthy controls: according to Bortolon et al. (2015), only four studies found no such difference, whereas over a dozen studies reported impaired performance in schizophrenia patients. In PD patients, the opposite pattern is found: Gray and Tickle-Degnen (2010) found that in 15 studies examining BVRT performance there was no difference between PD patients and healthy controls, and only four showed impaired performance in the PD group. The results of the meta-analysis performed by the authors of that review suggest that facial emotion recognition impairment in PD cannot be explained in terms of a general visuospatial deficit.

We therefore included the SCH group as an additional control group that would be expected to exhibit decreased discriminability of facial emotions. We expected robust visual face processing deficits to have a significant effect on performance in the EIS-F, given that the task requires discrimination of subtle facial features. We also wanted to find out whether decreased sensitivity to facial emotions is accompanied by changes in response strategy. As already mentioned, the SDT is rarely used in emotion recognition studies, and the link between response strategy and the ability to discriminate emotions has not been determined. Our goal was to study both of these factors; by including the SCH group, we would be able to determine response strategy changes in the context of sensitivity changes.

# Materials and Methods

#### Participants

Thirty-eight non-demented patients with Parkinson's disease (14 females) and 38 healthy controls matched for sex, age, and education took part in the study. PD patients were recruited via the Parkinson's Disease Association in Bydgoszcz and the Parkinson's Disease Foundation in Warsaw. All patients met the criteria for idiopathic Parkinson's disease with motor symptoms of mild or medium intensity, as rated on the Hoehn and Yahr Scale (mean = 2.34, *SD* = 1). All patients were treated with levodopa or a dopamine agonist and were tested soon after medication was administered (i.e., during their "on" state). PD patients with a Mini-Mental State Examination (MMSE) score below 24 were excluded from the test group.

In addition, a group of 26 patients with schizophrenia (nine females) was compared with 26 healthy controls matched for sex, age, and education. These patients were recruited from schizophrenia foundations in Bydgoszcz, Inowrocław, Sicienko, and Toruń. The schizophrenia patients were diagnosed in accordance with the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), and were being treated with antipsychotic medication at the time of testing. Schizophrenia patients with an MMSE score below 24 were excluded from the test group.

All participants were native speakers of Polish. Healthy controls were recruited from the general population. Measures of cognitive functions (MMSE) and depression (BDI) were administered prior to testing. **Table 1** shows the demographic and clinical characteristics of Parkinson's disease and schizophrenia patients and their respective control groups. Informed written consent was obtained from all subjects prior to testing and the study was approved by the local ethics committee.

#### Neuropsychological Assessment

In order to assess the accuracy of the EIS-F and its diagnostic utility for PD patients, we examined the relationship between emotional processing and cognitive functions. Each patient with PD and each control subject was given a set of neuropsychological tests, administered by a neuropsychologist. To assess visual attention, psychomotor speed and alternating attention, we employed the TMT, a widely used tool for testing executive functions. The TMT comprises part A (number sequencing) and part B (number-letter switching); we also calculated the TMT B-A index to remove the speed component. To examine selective attention and inhibitory control, we used the Stroop Test (STR), which comprises a word-reading index (WR) and a color-word naming index (CW) and is designed to assess cognitive speed and executive function. To test visual and verbal memory, we used the Benton Visual Retention Test (BVRT), which is designed to assess short-term visual memory, and the Rey Auditory Verbal Learning Test (RAVLT), which is widely used to assess episodic verbal memory. We used RAVLT trials 1–5 as a measure of verbal learning, and the RAVLT LTM score as an indicator of the ability to retrieve information after a 20-min interval.

| | Patients *M* | Patients *SD* | HC *M* | HC *SD* | *t/U* | *p* |
|---|---|---|---|---|---|---|
| Education (years) | 12.83 | 3.48 | 13.33 | 2.69 | 277.0# | 0.13 |
| Disease duration | 17.69 | 9.17 | | | | |
| BDI | 11.17 | 11.94 | 8.23 | 8.64 | 282.0# | 0.28 |
| MMSE | 28.42 | 1.81 | | | | |

*PD, Parkinson's disease; HC, healthy controls; M, mean; SD, standard deviation; t/U, t statistic or U statistic (for non-normally distributed samples); BDI, Beck Depression Inventory; MMSE, Mini-Mental State Examination; Str WR, CW, Stroop Color-Word Test (Word Card, Color-Word Card); TMT A, B, B-A, Trail Making Test part A and B, and difference; BVRT corrects, errors, Benton Visual Retention Test, number of correct answers, and number of errors; RAVLT 1-5, Rey Auditory Verbal Learning Test sum of trials 1-5, LTM, long term memory score; SCH, Schizophrenia patients.*

Mild cognitive impairment (MCI) was defined by neuropsychological testing as impaired performance (i.e., 2 SD below the mean score for the age- and education-matched control group) in two neuropsychological tests (see: Litvan et al., 2012). Subjective complaints of cognitive problems (or lack thereof) were not treated as a factor in the selection process.
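The MCI criterion above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: "2 SD below the mean" is computed from the sample mean and sample SD of the matched control group, "in two tests" is read as "in at least two tests," and for timed measures such as the TMT (where more seconds means poorer performance) the impaired range lies above the control mean. The function names and test labels are hypothetical.

```python
import numpy as np


def count_impaired_tests(patient, controls, higher_is_worse, cutoff_sd=2.0):
    """Count tests on which a patient is at least `cutoff_sd` control SDs on the poorer side.

    patient:         {test_name: score} for one patient (hypothetical labels)
    controls:        {test_name: sequence of scores from the matched control group}
    higher_is_worse: {test_name: bool}; True for timed measures such as TMT (assumption)
    """
    n_impaired = 0
    for test, score in patient.items():
        ctrl = np.asarray(controls[test], dtype=float)
        mean, sd = ctrl.mean(), ctrl.std(ddof=1)  # sample SD of the controls
        # Distance from the control mean in SD units, signed so that positive = poorer.
        if higher_is_worse[test]:
            z_poorer = (score - mean) / sd
        else:
            z_poorer = (mean - score) / sd
        if z_poorer >= cutoff_sd:
            n_impaired += 1
    return n_impaired


def is_mci(patient, controls, higher_is_worse, min_tests=2):
    """Classify as MCI when at least `min_tests` tests fall in the impaired range."""
    return count_impaired_tests(patient, controls, higher_is_worse) >= min_tests
```

For example, with hypothetical control TMT B times of 60, 70, 80, 90 and 100 s (mean 80 s, SD about 15.8 s), a patient taking 120 s lies about 2.5 SD on the poorer side and counts as impaired on that test; a second impaired test would then meet the MCI criterion.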

#### Facial Emotion Recognition Task

Our study utilized the Emotional Intelligence Scale – Faces (EIS-F; Matczak et al., 2005). The test comprises 18 photographs, nine featuring male faces and nine featuring female faces. In each set of nine, four photographs depict positive emotions and five depict negative emotions. Each photograph is accompanied by a list of six possible emotions (see **Figure 1**). The subject must determine which of the six emotions are shown in each photograph, and which are not, by choosing one of three possible responses: "shown," "not shown," and "hard to say." Before the test begins, the subject is instructed that the "hard to say" response should only be given as a last resort. There are 108 items in total (i.e., 18 photographs × 6 emotion names). Each photograph expresses between one and four emotions. A perfect score requires the identification of all 45 "shown" emotions and the correct rejection of all 63 "not shown" emotions.

The emotions depicted in the photographs include both basic emotions (positive: joy, surprise; negative: sadness, anxiety, anger, disgust) and complex emotions (positive: tenderness, self-contentment, pride, satisfaction, admiration, hope, coquetry, composure, self-confidence, curiosity, expectation, interest, astonishment; negative: unpleasant surprise, confusion, aversion, distrust, resignation, regret, disappointment, insecurity, disregard, feeling of superiority, indignation, envy, hate, contempt, unease, jealousy, disbelief).

#### Data Analysis

In the EIS-F, each decision as to whether a given emotion is present (or not) in a photograph is treated as a separate answer. Of the 108 items, the correct response is "shown" for 45 and "not shown" for 63. The authors of the test proposed only one performance indicator, i.e., the total number of correct responses, whether "shown" or "not shown." However, this summary result can be broken down into six possible response types: a correct positive answer ("shown"), a correct negative answer ("not shown"), an incorrect positive answer ("shown"), an incorrect negative answer ("not shown"), a response of "hard to say" when the correct response should have been "shown," and a response of "hard to say" when the correct response should have been "not shown."

In our study, we analyzed the EIS-F results using the SDT (Macmillan, 2002). The SDT predicts that, for tasks which require a yes/no answer, performance depends on how accurately the subject discriminates between a known process (the signal) and chance (the noise). Moreover, the SDT takes into account the response strategy employed by the subject: under uncertainty, the subject may give a positive response (liberal strategy) or a negative response (conservative strategy). The EIS-F presents subjects with this choice, in that they must decide whether or not a given photograph depicts each emotion in the accompanying list. In accordance with the SDT classification, correct positive responses ("shown") are called "hits," correct negative responses ("not shown") are "correct rejections," incorrect positive responses ("shown") are "false alarms," and incorrect negative responses ("not shown") are "misses." In the EIS-F, "hard to say" responses are problematic because the SDT does not provide for this option. However, since "hard to say" responses are scored as errors in the EIS-F, we also classed them as incorrect in our analysis. Thus, a "hard to say" response when the correct response was "shown" is classed as a miss, and a "hard to say" response when the correct response was "not shown" is classed as a false alarm.
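The classification rules just described, including the treatment of "hard to say" responses, can be expressed as a small function. This is a sketch, assuming each of the 108 EIS-F items is stored as a pair (answer key, response) using the literal labels "shown", "not shown" and "hard to say"; the function and constant names are hypothetical.

```python
from collections import Counter

RESPONSES = {"shown", "not shown", "hard to say"}
OUTCOMES = ("hit", "miss", "false_alarm", "correct_rejection")


def classify(key, response):
    """Map one EIS-F item (answer key, response) to an SDT outcome."""
    if response not in RESPONSES:
        raise ValueError(f"unexpected response: {response!r}")
    if key == "shown":
        # "not shown" and "hard to say" both count as misses
        return "hit" if response == "shown" else "miss"
    if key == "not shown":
        # "shown" and "hard to say" both count as false alarms
        return "correct_rejection" if response == "not shown" else "false_alarm"
    raise ValueError(f"unexpected answer key: {key!r}")


def tally(keys, responses):
    """Count SDT outcomes over all items for one participant."""
    if len(keys) != len(responses):
        raise ValueError("keys and responses must have the same length")
    counts = Counter(classify(k, r) for k, r in zip(keys, responses))
    return {outcome: counts[outcome] for outcome in OUTCOMES}
```

Under this scheme the original EIS-F total score equals hits plus correct rejections, while for a complete protocol hits plus misses is always 45 and false alarms plus correct rejections is always 63.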

In order to measure performance in a given task, the SDT uses the sensitivity index *d'* (Macmillan, 2002), which is the difference between the *z*-transformed hit rate and the *z*-transformed false alarm rate. The higher the value of *d'*, the more accurately the subject distinguishes signal from noise.
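For reference, the standard equal-variance definition of *d'* (e.g., Macmillan and Creelman, 2004) is written out below, where *z* = Φ⁻¹ is the inverse of the standard normal cumulative distribution function. The denominators use the EIS-F item counts of 45 signal ("shown") and 63 noise ("not shown") items per participant.

$$
H = \frac{N_{\text{hit}}}{N_{\text{hit}} + N_{\text{miss}}} = \frac{N_{\text{hit}}}{45}, \qquad
F = \frac{N_{\text{FA}}}{N_{\text{FA}} + N_{\text{CR}}} = \frac{N_{\text{FA}}}{63}, \qquad
d' = z(H) - z(F), \quad z = \Phi^{-1}
$$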

The second index used by the SDT is the response bias index *c*. Positive *c* values indicate a conservative response strategy: in cases of uncertainty, the subject is more likely to give a negative response (expressed in the EIS-F as a "not shown" response). A negative *c* index indicates a liberal response strategy, whereby the subject gives a positive response in cases of uncertainty (expressed in the EIS-F as a "shown" response). We chose the *c* index instead of the common β index because it is not affected by changes in *d'* (Ingham, 1970; Macmillan, 1993).
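A minimal sketch of computing both indices from one participant's counts, using SciPy's `scipy.stats.norm.ppf` as the *z*-transform and the standard definition *c* = −[*z*(H) + *z*(F)]/2. The article does not say how hit or false alarm rates of exactly 0 or 1 (which give infinite *z*-scores) were handled; the optional log-linear correction here (adding 0.5 to each count and 1 to each denominator) is an assumption, as is the function name.

```python
from scipy.stats import norm


def sdt_indices(hits, misses, false_alarms, correct_rejections, loglinear=True):
    """Return (d_prime, c) for one participant under the equal-variance Gaussian model."""
    n_signal = hits + misses                     # 45 "shown" items in the EIS-F
    n_noise = false_alarms + correct_rejections  # 63 "not shown" items
    if loglinear:
        # Assumed correction so that rates never reach exactly 0 or 1
        hit_rate = (hits + 0.5) / (n_signal + 1)
        fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    else:
        hit_rate = hits / n_signal
        fa_rate = false_alarms / n_noise
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa     # sensitivity
    c = -(z_hit + z_fa) / 2    # response bias: negative = liberal, positive = conservative
    return d_prime, c
```

For example, 30 hits out of 45 and 21 false alarms out of 63 (H = 2/3, F = 1/3, no correction) give *d'* ≈ 0.86 and *c* = 0, i.e., no bias; raising both the hit and the false alarm rate pushes *c* below zero, the liberal pattern described for the PD group.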

#### Statistical Analysis

Analyses were performed using custom scripts written in the Python programming language with packages for scientific computing: SciPy (Oliphant, 2007), sdt\_metrics, and pandas. Group differences in demographic, clinical, and cognitive characteristics and in facial emotion recognition variables were analyzed using independent two-tailed *t*-tests for normally distributed variables and the Mann–Whitney test for non-normally distributed variables. Correlations between neuropsychological, demographic, and clinical factors and facial emotion recognition variables (*d'*, *c*, hit rate, false alarm rate) were analyzed using Pearson's correlations. Receiver Operating Characteristic (ROC) curves were plotted for each group (patients/controls).
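The test-selection rule described above can be sketched with SciPy. The article does not state how normality was assessed; the Shapiro–Wilk test at α = 0.05 used here is an assumption, as are the helper names. The `sdt_metrics` package is not used, since its interface is not described in the text.

```python
import numpy as np
from scipy import stats


def compare_groups(x, y, alpha=0.05):
    """Two-tailed comparison: t-test if both samples look normal, else Mann-Whitney U."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumed normality check (Shapiro-Wilk) on each group separately
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)
    if p_x > alpha and p_y > alpha:
        t, p = stats.ttest_ind(x, y)  # independent samples, two-sided by default
        return "t", float(t), float(p)
    u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    return "U", float(u), float(p)


def correlate(a, b):
    """Pearson correlation, e.g., between TMT B-A and the c index."""
    r, p = stats.pearsonr(a, b)
    return float(r), float(p)
```

Applying the same rule to every variable keeps the *t*/*U* column of the demographic table consistent with the distribution of each measure.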

# Results

#### PD Patients

There were no significant differences between PD patients and HCs regarding demographic variables (see **Table 1**). Depression scores were significantly higher in the PD group. Patients scored significantly lower on the MMSE than the HCs. Significant differences were observed in the executive functions measures (TMT, STR). There were no significant differences in memory measures (RAVLT, BVRT).

The hit rate was considerably higher in the PD group than in the HC group (**Table 2**), although this was accompanied by a higher rate of false alarm responses. Despite the higher rate of hits, the *d'* sensitivity index (which indicates the accuracy of recognition) showed no difference between the groups. For both groups, a large number of hits were accompanied by an equally large number of false alarms. Both groups employed a liberal response strategy (as indicated by a negative *c* index). At the same time, the response bias was significantly higher (i.e., larger deviation from zero) among PD patients, which shows that there is a greater tendency to give positive responses in this group.

#### Patients with Schizophrenia

The performance of patients with schizophrenia in the EIS-F, as expressed by hit rate, did not differ significantly from that of the healthy controls, although the schizophrenia patients gave a significantly higher number of false alarm responses (**Table 3**). This resulted in a considerably lower sensitivity index for schizophrenia patients compared with healthy controls. Both groups employed a liberal response strategy and showed no statistically significant difference in *c* index.

#### Receiver Operating Characteristic

We plotted ROC curves defined by the average hit and false alarm rates for each patient and control group, and calculated the *d'* index corresponding to these averages. As shown in **Figure 2**, all ROC curves lie relatively close to the dotted diagonal line representing the performance of a random choice strategy, which suggests that the task was relatively difficult. However, all groups performed above chance level. Note that younger HCs performed better than older HCs, which suggests that age affects facial emotion recognition ability.

*Figure 2 caption (partially lost in extraction): … for PD group; SCH, schizophrenia patients; HCSCH, control for SCH group.*

The markers on the ROC curves represent average hit rate and false alarm rate within each group. Notably, the markers are positioned almost symmetrically for all groups (indicating little bias as measured by *c* index) except for the PD patient group, whose marker lies further to the right (higher hit and false alarm rates, larger deviation of *c* from zero).
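Under the equal-variance Gaussian model assumed by *d'*, every point on a single ROC curve has the same sensitivity, because z(H) = z(F) + *d'*. The sketch below traces such an iso-sensitivity curve using hypothetical helper names and only the Python standard library. It shows why groups with low *d'* lie near the chance diagonal. It also shows why a change in bias moves a group's marker along its curve rather than off it.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal distribution


def roc_hit_rate(fa_rate: float, d_prime: float) -> float:
    """Hit rate predicted at a given false alarm rate for a fixed d'.

    Uses z(H) = z(F) + d'; with d' = 0 this reduces to the chance diagonal H = F.
    """
    if not 0.0 < fa_rate < 1.0:
        raise ValueError("fa_rate must lie strictly between 0 and 1")
    return _std.cdf(_std.inv_cdf(fa_rate) + d_prime)


def roc_curve(d_prime: float, n_points: int = 99) -> list[tuple[float, float]]:
    """(F, H) pairs tracing the iso-sensitivity ROC curve for d_prime."""
    fa_rates = [(i + 1) / (n_points + 1) for i in range(n_points)]
    return [(f, roc_hit_rate(f, d_prime)) for f in fa_rates]
```

Call `roc_curve(0.56)` and `roc_curve(1.03)` with the lowest and highest group means reported in the Discussion. At F = 0.5 these curves predict H ≈ 0.71 and H ≈ 0.85, respectively. Both stay fairly close to the diagonal.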

#### Correlations

In order to examine the relation between age and facial emotion recognition ability, a correlation analysis was performed for all healthy controls. We found a significant negative correlation between age and *d'* (*r* = −0.42, *p <* 0.001). There was no significant correlation between age and *c* index (*r* = 0.13, *p* = 0.26). We also found no significant correlation between years of education and *d'* (*r* = 0.07, *p* = 0.67) or *c* index (*r* = −0.005, *p* = 0.98). Since the *c* index was lower among PD patients, we carried out a correlation analysis between facial emotion recognition variables and cognitive function measures. We found a significant negative correlation between executive performance (TMT B-A) and response bias (*c* index) (*r* = −0.36, *p <* 0.05), and a significant positive correlation between TMT B-A and false alarm rate (*r* = 0.33, *p <* 0.05).

#### Emotion Recognition in Relation to MCI

To classify PD patients with respect to MCI, patients were categorized as impaired if their score was at least two standard deviations below the control group mean on two neuropsychological tests. On this basis, 29 of the 38 PD patients were classified as cognitively intact and nine were classified as MCI. We compared the emotion recognition performance of MCI and non-MCI PD patients and found no statistically significant differences.
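The MCI rule above can be expressed as a simple z-score criterion. This is a sketch under stated assumptions. The test names are hypothetical, and the control mean and sample standard deviation serve as the norm. For measures where higher scores are worse (such as TMT completion times), the sign is flipped so that "below" always means "worse". The text does not say how the authors handled such measures.

```python
from statistics import mean, stdev


def is_impaired(score: float, control_scores: list[float],
                higher_is_better: bool = True, cutoff_sd: float = 2.0) -> bool:
    """True if score is at least cutoff_sd control SDs worse than the control mean."""
    z = (score - mean(control_scores)) / stdev(control_scores)
    if not higher_is_better:  # e.g. completion times, where larger values are worse
        z = -z
    return z <= -cutoff_sd


def classify_mci(patient: dict[str, float], controls: dict[str, list[float]],
                 higher_is_better: dict[str, bool], min_tests: int = 2) -> bool:
    """Flag MCI when the patient is impaired on at least min_tests tests (assumed reading)."""
    n_impaired = sum(
        is_impaired(patient[test], controls[test], higher_is_better.get(test, True))
        for test in patient
    )
    return n_impaired >= min_tests
```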

# Discussion

Facial emotion recognition requires particular perceptual skills, such as the ability to discriminate facial features. However, because the most commonly used tests in research into emotion recognition consist of "yes/no"-type questions and answers, an equally important role is played by the cognitive decision process. In our study, we used the EIS-F to assess how both of these processes contribute to facial emotion recognition in a group of individuals with PD and in a group of individuals with schizophrenia. The SDT was used to measure perceptual sensitivity and response bias. We found that: (1) patients with PD, in comparison with age-matched healthy controls, displayed a sensory deficit in facial emotion recognition, as indicated by a decreased *d'* index value; (2) individuals with PD employed a more liberal response strategy than healthy individuals; (3) patients with schizophrenia showed lower sensitivity in stimulus identification than individuals from the age-matched healthy control group. Notably, decreased discriminability in the schizophrenia group was not accompanied by changes in response strategy, as indicated by the similar *c* index values in the schizophrenia and healthy control groups. These findings indicate that facial emotion recognition ability can be sensitive to at least two potentially different process impairments, and that the SDT may detect the impact of both sensory and executive deficits. Our findings are consistent with the view that difficulties in facial emotion recognition in PD are not merely the result of a general deficit in face processing, but also the effect of executive control impairment (Péron et al., 2012). By contrast, facial emotion recognition difficulties in schizophrenia may stem from generalized perceptual impairment (Archer et al., 1992).

Patients with PD showed concurrent deficits on the Stroop and TMT measures. It seems plausible that difficulty in processing emotions may stem from impaired executive functions. Indeed, we found a correlation between the patients' results on the facial emotion recognition task (*c* index, false alarm rate) and the TMT B-A measure. This finding suggests that the observation that prosodic emotion recognition deficits in PD partially depend on executive dysfunction (Gray and Tinkle-Degnen, 2010) also extends to facial emotion recognition. Moreover, we did not find any significant correlation between the *d'* index value and the results of executive function tests. Our results thus validate the specificity of the *d'* and *c* measures, which are sensitive to impairments in two different processes.

With regard to the sensory deficit in facial emotion recognition in PD, our findings were relatively similar to those of Narme et al. (2011). However, the overall discriminability of facial emotions in the study by Narme et al. (2011) was significantly higher than that observed in our study. The fact that we introduced stimuli of varying levels of difficulty (i.e., ambiguity and intensity) may explain this difference. This may also explain why both groups employed a highly conservative criterion (*c* = 0.4 in the PD group and *c* = 0.33 in healthy controls) in the study by Narme et al. (2011). Our use of complex facial emotions with lower intensity and higher ambiguity enabled us to detect changes in response strategy among PD patients, as expressed by a decreased *c* index value.

One could argue that the deficits in facial emotion recognition are an effect of general decline in cognitive functioning in PD. We excluded this possibility by comparing those patients with MCI with cognitively intact PD patients. We found that the difference in *d'* and *c* indices between the groups was not statistically significant. This finding concurs with the results of a study by Herrera et al. (2011), which revealed that emotion recognition impairment among PD patients was not related to the patients' cognitive status (in both the PD MCI and PD non-MCI groups, approximately half of the patients displayed impaired facial emotion recognition).

For the HC groups, our results regarding the negative correlation of age and ability to discriminate facial emotions are broadly consistent with previous studies (Orgeta and Phillips, 2008; Ruffman et al., 2008); however, the stimuli in our study were a mixture of basic and complex emotions, and we did not analyze positive and negative emotions separately.

By using the SDT, we found that the EIS-F is able to discriminate between patients with PD and HCs. In contrast to the majority of tests used to assess facial expression recognition, the EIS-F measures a mixture of basic and complex emotions. Thus, the EIS-F has the advantages of an ecologically valid test, in which the ambiguity of an emotion does not merely reflect a reduction in stimulus intensity. The test subject has to assign the stimulus to more specific categories of meaning within the complex process of emotion classification. In other words, as well as having to decide whether or not a photograph depicts a given emotion (e.g., a positive emotion), the subject must also define that emotion more precisely (e.g., pride, relief, flirtatiousness). For this reason, we can assume that the level of difficulty of the EIS-F is high, an assumption confirmed by the hit rate, false alarm rate, and *d'* values. It should be stressed that the SDT has thus far seldom been used in analyses of facial emotion recognition, and there are very few studies of clinical populations (Diehl-Schmid et al., 2007; Tsoi et al., 2008; Narme et al., 2011; Huang et al., 2013). That said, the existing literature indicates that the sensitivity index values obtained in our study were relatively low. The values ranged from 0.56 (mean for the PD group) to 1.03 (mean for the younger control group), similar to those obtained for healthy individuals in fast-paced (12.5–25 ms) basic emotion recognition tests (Pixton, 2011). In a study of schizophrenia patients (Tsoi et al., 2008), which also used a relatively short exposure time of 50 ms, the *d'* sensitivity index results were similar to those seen in our study. In another study of patients with schizophrenia, in which the faces shown to the subjects had been manipulated to exhibit different levels of intensity (Huang et al., 2013), the *d'* sensitivity index fell as the intensity of the stimulus decreased. Even in groups of healthy individuals, the sensitivity index was lower than 1 when the intensity of a given emotion fell below 50%. The results of these studies suggest that the difficulty level of the EIS-F is indeed high, and comparable to that of tests which either limit the exposure time or considerably reduce the intensity of the stimulus.

Future research should examine the sensitivity and accuracy of the EIS-F. One possible approach is to check whether the results obtained with the EIS-F correlate with those obtained using other tests of facial emotion recognition. It would be particularly useful to compare it with more simply designed tests (e.g., Penn Emotional Facial Recognition – ER40) and with considerably more difficult tests (e.g., tests in which the intensity of the presented emotions is manipulated).

Recognition of the social emotions used in the EIS-F requires not only efficient perception of a stimulus, but also efficient language competence. Our study was severely wanting in this regard, as we did not use any measure of verbal comprehension (e.g., the relevant vocabulary subtest from the Wechsler Adult Intelligence Scale). However, this may be of more relevance to patients with schizophrenia than to those with PD, given that language deficits are not typical among PD patients with cognitive dysfunction (Goldman and Litvan, 2011). Another limitation of this study is that there were no relevant neuropsychological measures to test whether the facial emotion recognition deficits in SCH could be due to a more specific cognitive impairment; we only screened patients' global cognitive abilities using the Mini-Mental State Examination, excluding patients with a score below 24. Moreover, it has been noted that the stability of facial emotion recognition impairments over the course of schizophrenia may indicate an intermediate phenotype or an endophenotype of schizophrenia (Bediou et al., 2012), which suggests that facial emotion recognition impairments are not directly related to general cognitive constraints. Also, since our study only examined PD patients and schizophrenia patients who were taking medication for their condition, the effect of nonpharmacological interventions on facial emotion recognition remains untested.

## Conclusion

Little research has been done into the process of "natural" social emotion recognition, since the majority of studies have used morphed faces in order to manipulate the intensity of emotions (e.g., by morphing two basic emotions in varying proportions). In doing so, variables may be strictly controlled (i.e., the proportion of an assessed emotion in the morphed stimulus). However, this does not fully reflect the natural emotions seen in everyday life. We do come across mixed emotions in our daily lives (e.g., anger mixed with sadness), but more often than not we are required to identify social emotions, such as mixtures of contempt and dislike, or admiration and pride. In our study we used the EIS-F, which has the advantages of an ecologically valid test measuring social emotion recognition. Our results suggest that: (1) PD significantly changes response bias and causes a slight decrease in sensitivity in the recognition of social emotions; (2) schizophrenia has very little effect on response bias, but is significantly connected with decreased sensitivity in the recognition of social emotions.

# Acknowledgments

We wish to thank Dr. Andrzej Koczorowski for his helpful advice and Mr. Emmanuel Levy for his technical support. This study was supported by funds from the National Science Center in Poland for Postdoctoral Fellowship (DEC-2012/04/S/HS6/00575).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Laskowska, Gawryś, Łęski and Koziorowski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Processing of masked and unmasked emotional faces under different attentional conditions: an electrophysiological investigation**

*Marzia Del Zotto <sup>1</sup>\* and Alan J. Pegna <sup>1,2</sup>\**

*<sup>1</sup> Laboratory of Experimental Neuropsychology, Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland; <sup>2</sup> School of Psychology, University of Queensland, Brisbane, QLD, Australia*

In order to investigate the interactions between non-spatial selective attention, awareness and emotion processing, we carried out an ERP study using a backward masking paradigm, in which angry, fearful, happy, and neutral facial expressions were presented while participants attempted to detect the presence of one or the other category of facial expressions in the different experimental blocks. ERP results showed that negative emotions enhanced an early N170 response over temporal-occipital leads in both masked and unmasked conditions, independently of selective attention. A later effect arising at the P2 was linked to awareness. Finally, selective attention was found to affect the N2 and N3 components over occipito-parietal leads. Our findings reveal that (i) the initial processing of facial expressions arises prior to attention and awareness; (ii) attention and awareness give rise to temporally distinct periods of activation, independently of the type of emotion, with only a partial degree of overlap; and (iii) selective attention appears to be influenced by the emotional nature of the stimuli, which in turn impinges on unconscious processing at a very early stage. This study confirms previous reports that negative facial expressions can be processed rapidly, in the absence of visual awareness and independently of selective attention. On the other hand, attention and awareness may operate in a synergistic way, depending on task demand.

**Keywords: ERP, emotions, faces, subliminal, masking, awareness, selective attention**

*Edited by: Paola Ricciardelli, University of Milan, Italy*

*Reviewed by: Tessa Marzi, University of Florence, Italy; Shahd Al-Janabi, Macquarie University, Australia*

*\*Correspondence: Marzia Del Zotto, marzia.delzotto@unige.ch; Alan J. Pegna, alan.pegna@unige.ch, a.pegna@uq.edu.au*

*Specialty section: This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 19 May 2015; Accepted: 20 October 2015; Published: 31 October 2015*

#### *Citation:*

*Del Zotto M and Pegna AJ (2015) Processing of masked and unmasked emotional faces under different attentional conditions: an electrophysiological investigation. Front. Psychol. 6:1691. doi: 10.3389/fpsyg.2015.01691*

# **INTRODUCTION**

In the last two decades, many studies have focused on conscious and unconscious processing of emotional stimuli (for reviews, see Lindquist et al., 2012; Pourtois et al., 2013). One of the most extensively investigated categories of stimuli in this field is human facial expressions. Indeed, owing to their critical role in social, emotional and cognitive function, human faces constitute a biologically relevant category of visual stimuli, thought to be processed very rapidly and to lead to an immediate regulation of behavior. Notably, it has been shown that emotions can selectively influence early aspects of visual perception, modulating the strength of the neuronal signal (Batty and Taylor, 2003; Bocanegra and Zeelenberg, 2009).

Along these lines, electrophysiological studies using face stimuli have provided evidence that faces can be processed at an early stage and without awareness both in healthy controls (e.g., Kiss and Eimer, 2008; Pegna et al., 2008, 2011) and in patients with cortical blindness (Gonzalez Andino et al., 2009; Del Zotto et al., 2013). Using a backward masking paradigm in healthy participants, Kiss and Eimer (2008) found that both subliminal and supraliminal fearful faces produced an enhanced early frontal positivity compared to neutral faces between 140 and 180 ms post-stimulus. In addition, the N2 (180–250 ms) was modulated by emotion at frontal and central sites, though only on subliminal trials. In another backward masking study, Pegna et al. (2008) found an increased N170 for masked fearful compared to non-fearful (happy and neutral) faces. Moreover, the N2 was observed to be greater over the right posterior leads for fearful compared to non-fearful expressions, increasing progressively with target detectability. Despite the discrepancies between these two studies regarding the location of the effects (possibly due to the use of different reference electrodes), these observations reveal that negative (fearful) emotional expressions are differentiated early in the course of visual information processing, and that this remains true even when the stimuli are not consciously detected. Such findings corroborate the existence of a rapid, preattentive process, in which negative emotional stimuli initiate attentional capture more effectively than positive or neutral ones (Öhman et al., 2001; Vuilleumier, 2005; Maratos, 2011).

A number of studies have also addressed the temporal dynamics of attention shifting for emotional faces, with some reports again claiming the presence of a very early effect. In a covert attention shifting paradigm using a bar-probe task with fearful, happy and neutral facial expressions as emotional cues, Pourtois et al. (2004) found an enhanced negative modulation of the C1 (80–100 ms) component for fearful compared to happy faces. Moreover, the P1 component was enhanced for targets appearing in the former location of fearful faces, confirming that fearful faces were efficient and rapid attractors of attention. These results demonstrated that emotional features can modulate neural activity in the striate cortex independently of spatial attention, prior to the N170. Nevertheless, previous (Clark et al., 1994) and subsequent (Rossi and Pourtois, 2013) studies have confirmed that the C1 is mainly generated in the striate and extrastriate visual cortices, and that it can be sensitive to spatial (Vanlessen et al., 2012) and non-spatial (Proverbio et al., 2010) visual attention. For instance, selective attention to high and low spatial frequency gratings can increase the negativity or the positivity of the C1, respectively, starting at 60 ms after stimulus presentation (Zani and Proverbio, 2009), independently of the attended location. Conversely, the occipital P1 is modulated by spatial attention *per se* and in conjunction with non-spatial features (Zani and Proverbio, 2012). Moreover, the C1 component is sensitive to the affective valence of threatening compared to neutral stimuli (Stolarova et al., 2006). Nevertheless, the exact effect of emotional processing on the C1 component under subliminal conditions is still unclear.

Subsequently, a view emerged that opposed the one postulating a rapid, preattentive processing of emotional faces. It claimed that neural processing of emotional face stimuli requires some degree of attention for detection and processing to occur (Pessoa et al., 2002a,b, 2005a; Wronka and Walentowska, 2011). By manipulating the attentional load of concurrent tasks while presenting emotional faces (Erthal et al., 2005; Pessoa et al., 2005b; Yates et al., 2010), these studies found that emotional stimuli were processed when the competing tasks required few attentional resources, but not when the attentional demand was high.

Recent studies have claimed that these two views are not mutually exclusive, since attention and emotion interact to different degrees depending on perceptual and cognitive load, as well as on task demand (Okon-Singer et al., 2007). From this perspective, emotionally relevant stimuli would be processed automatically, in the sense that they do not require conscious monitoring, as long as sufficient attentional resources are available for processing to occur. Additionally, in a series of behavioral priming experiments, Finkbeiner and Palermo (2009) found that non-emotional masked faces, unlike non-face stimuli, could be processed unconsciously even when spatial attention was not oriented toward them. These findings were replicated by another study confirming that, in comparison to non-faces, faces produced a priming effect regardless of spatial and temporal attention (Quek and Finkbeiner, 2013). These authors concluded that faces in general can be processed without awareness or attention.

In independent lines of study, separate investigations have addressed the electrophysiological correlates of visual awareness. Classically, several authors have proposed that the P300 (or, more specifically, the P3b) may be linked to visual awareness, since it is more pronounced when stimuli are consciously perceived (Babiloni et al., 2006; Del Cul et al., 2007; Lamy et al., 2008). Conversely, other reports suggest that the P3b may rather reflect consequences of conscious perception (Rutiku et al., 2015), distinguishing between "perceptual awareness," associated more with the attentional processing of visual stimuli, and "contextual awareness," associated more with working memory and the context of face stimuli (Navajas et al., 2014). Other investigations have pointed to the possibility of an earlier component, arising closer to 230 ms. Indeed, in a series of studies, it was proposed that the P2 and the N2 may reflect the earliest activity linked to awareness; this activity was consequently named "visual awareness negativity" or VAN (Koivisto and Revonsuo, 2008). This early posterior negative deflection, peaking around 200–250 ms after stimulus onset over the lateral-occipital cortex (Koivisto and Revonsuo, 2003), is only elicited by stimuli presented above the subjective threshold of perception. This component has been found in different experimental paradigms using reduced visibility, such as masking, reduced-contrast stimuli, attentional blink, and change blindness (Koivisto and Revonsuo, 2010), and has been observed in conjunction with non-spatial (Koivisto and Revonsuo, 2008) as well as spatial attention (Koivisto et al., 2009). Evidence supports the hypothesis that the early part of the VAN (130–200 ms) is modulated completely independently of attentional manipulation, whereas the later part (200–300 ms) is influenced by selective attention at posterior temporal sites. This suggests that the selection negativity (SN) and the VAN are dissociable (Koivisto and Revonsuo, 2007, 2008), with the later part of the VAN likely reflecting recurrent processing in the ventral visual stream (Vanni et al., 1996). However, the VAN and the P3 could simply represent different stages of conscious processing, with the earlier phase reflecting the initial sensory aspect of conscious perception, and the later stage denoting the experience of the stimuli.

Despite the existing studies, the interactions between attention and awareness, and even more so with emotion, are still unclear due to the scarcity of studies specifically addressing this issue. In order to explore the interplay between awareness, attention, and emotion processing and to characterize their dynamics, we carried out a study in which we systematically varied the top-down contribution (defined as voluntary selective attention) and the bottom-up stimulus-driven contribution (defined as the extent of masking) in emotional face processing. Combining a selective attentional task (e.g., Proverbio et al., 2010) with a backward masking paradigm of different emotional categories, our intention was to determine: (i) how much processing could occur with limited visibility and without voluntary attention; and (ii) which type of differences arose across emotional expressions. Consequently, we recorded the EEG and computed ERPs during the central presentation of backward-masked face stimuli, depicting "negative" (i.e., fearful or angry) or "nonnegative" (i.e., happy or neutral) expressions that were either attended or not (relevant or irrelevant to the task). Selective attention was manipulated by instructing participants to select a specific target category (e.g., "negative" stimuli) while ignoring the other category.

Four sets of ERP components were examined: (i) the C1, considered the earliest index of emotional modulation (Pourtois et al., 2004) and of object-based attention (Proverbio et al., 2010); (ii) the N170, considered an index of conscious (Batty and Taylor, 2003) and unconscious (Pegna et al., 2008) face processing; (iii) the P2, the N2 (VAN), and the P3, known as electrophysiological correlates of conscious access to visual stimuli (Railo et al., 2015); and (iv) the N2 and N3, which are possible indices of SN in conjunction with awareness (Koivisto et al., 2009).

# **MATERIALS AND METHODS**

# **Participants**

Sixteen healthy volunteers took part in the EEG experiment (age range 19–33, mean = 24.25, SD = 4.33). All participants were right-handed as measured on the Oldfield–Edinburgh scale (Oldfield, 1971; mean laterality index: 13.9, range: 8–20) with normal or corrected-to-normal vision and gave their informed written consent prior to the procedure. The investigation was approved by the local Ethics Committee. Participants were paid 30 CHF for their contribution.

The group consisted of six women (age range 19–33, mean = 23.7) and 10 men (age range 19–33, mean = 24.8), mainly students of the University of Geneva and staff from Geneva University Hospital. Since anxiety is thought to influence behavioral and ERP responses, especially to emotional stimuli (e.g., Holmes et al., 2009; Mühlberger et al., 2009; Putman, 2011), we administered the State/Trait Anxiety Inventory Test (STAI; Self-Evaluation Questionnaire of Spielberger et al., 1970) to all participants before every EEG recording session. This test measures anxiety levels in adults, differentiating between the temporary condition of "state anxiety" (S-A) and the more general and long-standing quality of "trait anxiety" (T-A). None of the participants showed a pathological level of anxiety (mean standard scores: male group S-A *≈* 48, T-A *≈* 41; female group S-A *≈* 42, T-A *≈* 44). During the ERP analysis, two participants were excluded from the experimental sample due to excessive artifacts.

# **Apparatus and Stimuli**

Black and white pictures of actors displaying happy (H), angry (A), fearful (F), and neutral (Ne) facial expressions were selected from a previously established database<sup>1</sup>. The stimuli were modified using Adobe Photoshop 11 in order to remove hair, ears, and unwanted facial signs, and to keep luminance constant across emotional categories. Stimuli consisted of bitmap images of 6 cm *×* 6 cm (237 *×* 237 pixels) subtending a visual angle of 3° when viewed at a distance of 114 cm from the screen. We used 40 stimuli (20 adult faces representing males and 20 representing females) for each emotional condition (angry, fearful, happy, and neutral). We created cropped faces on a uniform black background for every emotional category. Scrambled faces, obtained by randomly scrambling 20 *×* 20-pixel squares of every cropped face, were used as masks in the backward masking paradigm, thus preserving the same physical parameters (Di Lollo et al., 2000). The total number of stimuli was 160, and each stimulus was presented 10 times, for a total of 1600 trials. The run was composed of 10 blocks of 160 stimuli each, displayed using E-prime™ software. The presentation of stimuli in every block, as well as the sequence of blocks and the response hand, were counterbalanced across participants and randomized within participants by the software.
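Two details of the stimulus preparation can be checked or reproduced with a few lines of code. The first is the visual angle subtended by a 6 cm image at 114 cm, which is 2·atan(3/114) ≈ 3.0°. The second is the construction of a mask by shuffling 20 *×* 20-pixel squares. The scrambling function is a hypothetical sketch and not the authors' Photoshop procedure. Because 237 is not a multiple of 20, we assume the 17-pixel right and bottom borders are left unshuffled.

```python
import math
import random
from typing import Optional


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def scramble_tiles(image: list[list[int]], tile: int = 20,
                   seed: Optional[int] = None) -> list[list[int]]:
    """Return a copy of image (rows of pixel values) with its tile x tile squares shuffled.

    Only whole tiles are moved; any right or bottom border narrower than one tile
    stays in place (an assumption for sizes that are not multiples of tile).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(image) // tile, len(image[0]) // tile
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    sources = positions[:]
    rng.shuffle(sources)  # random permutation of tile locations
    out = [row[:] for row in image]
    for (dst_r, dst_c), (src_r, src_c) in zip(positions, sources):
        for y in range(tile):
            src = image[src_r * tile + y][src_c * tile:(src_c + 1) * tile]
            out[dst_r * tile + y][dst_c * tile:(dst_c + 1) * tile] = src
    return out
```

Because tiles are only permuted, the mask keeps the exact pixel-value distribution of the source face. That is one reason scrambled faces are used to hold low-level physical parameters constant.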

# **Design and Procedure**

Participants were comfortably seated in a moderately dark room (Faraday cage) while pictures were presented at the center of the screen. In order to manipulate voluntary selective attention, we used an attention task in which participants had to respond to a pre-defined category of stimuli by pressing a button on a keyboard, while ignoring the other categories. In half of the blocks, participants were asked to respond to happy and neutral stimuli (defined as "pleasant" faces), while in the other half, they responded to fearful and angry stimuli (defined as "negative" faces; 50% in each category). Participants were instructed to respond as accurately and quickly as possible. During the EEG recording session, they were also asked to avoid any movement and to limit eye blinks. Before each sequence started, the task instructions were displayed on the computer screen, and the experimenter informed the participants of the target category to which they had to respond. The target category varied randomly across blocks. Further, the experimenter verbally reiterated the instructions. The stimuli were presented for either 21 ms ("subliminal" presentation or masked condition) or 290 ms ("supraliminal" presentation or unmasked condition), and were followed immediately by a mask consisting of a scrambled face. The duration of the mask was set such that the total stimulus duration (target + mask) was 311 ms. Masks thus lasted 290 and 21 ms, respectively (**Figure 1A**). In each sequence, half of the stimuli were presented subliminally. In the subliminal condition, it was emphasized that targets would be difficult to detect, but the participants were asked to focus on the stimulus and to respond as soon as possible if they thought the target belonged to the specified target category (pleasant or negative emotional expressions, depending on the block).

<sup>1</sup>Previously, 46 volunteers (24 females and 22 males) classified the type of emotion of 130 faces taken from different databases by means of a taxonomical scale of six basic emotions as defined by Ekman and Friesen (1971): anger, happiness, fear, surprise, disgust, and sadness (plus neutral). In the EEG experiment, we selected only those pictures reaching a threshold of 70% consensus.

**FIGURE 1 | (A)** Experimental procedure: face stimuli are presented for 21 ms (masked condition) or 290 ms (unmasked condition), followed immediately by the mask (scrambled face). The total duration (stimulus + mask) is 311 ms, with an inter-stimulus interval of 1400 to 1700 ms, during which participants can give a manual response (key press). **(B)** The scalp distribution and names of the 204 electrodes used during the EEG experiment. The colored circles delimit the different ROIs (regions of interest) used in the ERP analysis. Each color refers to specific ROI(s) for specific components: yellow, C1 (40–100 ms); green, N170 (140–190 ms), N2 (subliminal 280–330 ms, supraliminal 240–320 ms) and N3 (320–390 ms); pink, P1 (95–135 ms) and P2 (200–260 ms); blue, P3 (380–580 ms). Electrodes within discrete ROIs are merged together.

Prior to the recording session, a training procedure was performed to familiarize the participants with the task and with the stimulus categories. Stimuli used in the training session were different from those used in the actual experiment. However, paper printouts of all faces were shown to the participants once before the EEG experiment, to ensure that there was no ambiguity about the emotional expressions of the faces.
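The trial structure described in this section can be sketched as a block generator. In that structure, target plus mask always totals 311 ms, half of the stimuli in each block are masked, and the inter-stimulus interval is 1400–1700 ms. This is illustrative only. The `Trial` and `build_block` names are hypothetical, and the actual experiment was run in E-prime. Drawing the ISI uniformly at 1-ms resolution and choosing the masked half at random are our assumptions.

```python
import random
from dataclasses import dataclass

TOTAL_MS = 311  # target + mask duration, constant across conditions


@dataclass(frozen=True)
class Trial:
    stimulus_id: int
    masked: bool
    target_ms: int
    mask_ms: int
    isi_ms: int


def build_block(stimulus_ids: list[int], rng: random.Random,
                masked_ms: int = 21, unmasked_ms: int = 290,
                isi_range: tuple[int, int] = (1400, 1700)) -> list[Trial]:
    """One block: every stimulus once, a random half masked, in shuffled order."""
    ids = list(stimulus_ids)
    rng.shuffle(ids)
    masked_set = set(ids[: len(ids) // 2])  # which half is masked: assumption
    rng.shuffle(ids)  # independent presentation order
    trials = []
    for sid in ids:
        is_masked = sid in masked_set
        target = masked_ms if is_masked else unmasked_ms
        trials.append(Trial(sid, is_masked, target, TOTAL_MS - target,
                            rng.randint(*isi_range)))  # inclusive uniform ISI
    return trials
```

Passing an explicit `random.Random(seed)` keeps each simulated block reproducible. Ten calls with the 160 stimulus IDs yield the 1600 trials described above.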

# **Behavioral Data and Statistical Analysis**

Two-way repeated-measures analyses of variance (ANOVAs) were applied to *d-prime values*, to the *criterion* (*c*), and to *reaction times* (RTs), the latter for target stimuli only. The within-participant factors were (i) *Presentation*: Masked and Unmasked; and (ii) *Emotion*: Pleasant (Pls) and Negative (Neg).

D-prime (*d′*) was used to evaluate the accuracy of participants' performance (signal detection theory; Macmillan and Creelman, 1991). This sensitivity index was computed from the hit and false-alarm rates of each participant in each category. The criterion was used to evaluate participants' willingness to make a false alarm. Defining the criterion as a *z*-score on the signal-absent distribution, a high criterion value implies that the respondent requires strong evidence before declaring that the signal is present.
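The signal detection measures above can be computed from a participant's trial counts. The sketch below uses the standard formulas d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, and also returns λ = −z(F), the criterion expressed as a *z*-score on the signal-absent distribution as defined in the text; note that c = λ − d′/2. `sdt_measures` is a hypothetical helper, and the optional log-linear correction for hit or false-alarm rates of 0 or 1 is a common convention, not something reported for this study.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard-normal CDF: probability -> z-score

def sdt_measures(hits, misses, false_alarms, correct_rejections, loglinear=False):
    """Return (d_prime, c, lam) for one participant and category.

    d'  = z(H) - z(F)          sensitivity
    c   = -(z(H) + z(F)) / 2   criterion relative to the midpoint (= lam - d'/2)
    lam = -z(F)                criterion located on the signal-absent distribution
    loglinear=True adds 0.5 to each count (an assumed correction, not from the
    study) so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    adj = 0.5 if loglinear else 0.0
    h = (hits + adj) / (hits + misses + 2 * adj)
    f = (false_alarms + adj) / (false_alarms + correct_rejections + 2 * adj)
    zh, zf = _z(h), _z(f)
    return zh - zf, -(zh + zf) / 2, -zf
```

With 40 hits out of 50 targets and 10 false alarms out of 50 non-targets, d′ is about 1.68 and c is 0 (an unbiased observer); fewer false alarms at the same sensitivity push c above 0, i.e., a more conservative respondent.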

# **ERP Recordings and Analysis**

Continuous EEG data were acquired at 1000 Hz with a Geodesics system (Electrical Geodesics, Inc., USA) using 256 equally spaced scalp electrodes referenced to the vertex. Impedance was kept below 50 kΩ. ERPs were computed with Cartool software (version 3.40; http://sites.google.com/site/fbmlab/cartool). The EEG signal was band-pass filtered offline between 0.01 and 50 Hz and epoched offline from 100 ms before to 1400 ms after face onset. Separate epochs were computed for each of the 16 stimulus categories (4 emotions × 2 presentations × 2 target categories) using only correct responses, and were baseline-corrected using the 100 ms pre-stimulus interval. All epochs contaminated by blinks, eye movements, or other artifacts (EEG sweeps with amplitudes exceeding ±100 µV) were excluded from averaging, and remaining artifacts were rejected manually upon visual inspection. For the ERP analysis, we systematically excluded 52 electrodes situated over the face and in the most inferior part of the cap, reducing the total number from 256 to 204 (**Figure 1B**). ERPs were then recalculated against the average reference (Lehmann and Skrandies, 1980).
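The epoching steps described above can be sketched with NumPy. This is an illustration under stated assumptions, not Cartool's implementation: `average_erp` is a hypothetical function, the input is assumed to be already filtered and restricted to the 204 retained electrodes, the caller is assumed to pass only correct-response onsets for one stimulus category, the ±100 µV criterion is applied after baseline correction, and manual rejection after visual inspection is not modelled.

```python
import numpy as np

FS_HZ = 1000                  # sampling rate reported above (1 sample = 1 ms)
PRE_MS, POST_MS = 100, 1400   # epoch from -100 ms to +1400 ms around face onset
REJECT_UV = 100.0             # drop epochs whose amplitude exceeds +/-100 µV

def average_erp(eeg_uv, onsets):
    """Epoch, baseline-correct, reject and average one stimulus category.

    eeg_uv: (n_channels, n_samples) continuous EEG in µV, already filtered.
    onsets: sample indices of face onsets (correct-response trials only).
    Returns the average-referenced ERP (n_channels, n_times) and the number
    of epochs kept.
    """
    pre = PRE_MS * FS_HZ // 1000
    post = POST_MS * FS_HZ // 1000
    kept = []
    for t in onsets:
        if t - pre < 0 or t + post > eeg_uv.shape[1]:
            continue                          # epoch would run off the recording
        epoch = eeg_uv[:, t - pre:t + post].astype(float)  # copy, input untouched
        # Baseline correction: subtract each channel's mean over the
        # 100 ms pre-stimulus interval.
        epoch -= epoch[:, :pre].mean(axis=1, keepdims=True)
        if np.abs(epoch).max() > REJECT_UV:
            continue                          # amplitude criterion for artifacts
        kept.append(epoch)
    if not kept:
        raise ValueError("no artifact-free epochs for this category")
    erp = np.mean(kept, axis=0)
    # Re-reference to the average of all retained electrodes.
    erp -= erp.mean(axis=0, keepdims=True)
    return erp, len(kept)
```

Because re-referencing is a linear operation, applying it to the averaged ERP, as described in the text, gives the same result as re-referencing each epoch before averaging.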

We defined several regions of interest (ROIs), each consisting of a group of electrodes whose signals were merged (**Figure 1B**). We measured the peak amplitude and latency of the following components:

- C1 (40–100 ms) over left temporal-occipital (83, 93, and TP9), right temporal-occipital (TP10, 191, and 201), and middle occipital (146, 147, and 156) regions (**Figure 1B**).
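Peak amplitude and latency within an ROI can be measured by merging the ROI electrodes and picking the extreme value inside each component's window. The sketch below takes its windows from the Figure 1B caption. Several things are assumptions: "merged" is implemented as averaging, polarity is inferred from the component names and the signs reported in the Results, the mapping from electrode labels such as TP9 to array rows is left to the caller, and `roi_peak` is a hypothetical helper that simply takes the most extreme sample rather than requiring a local maximum.

```python
import numpy as np

# Search windows (ms) from the Figure 1B caption; the N2 window depends on presentation.
WINDOWS_MS = {
    "C1": (40, 100), "P1": (95, 135), "N170": (140, 190), "P2": (200, 260),
    "N2_masked": (280, 330), "N2_unmasked": (240, 320),
    "N3": (320, 390), "P3": (380, 580),
}
# -1: search for the most negative point; +1: the most positive point.
POLARITY = {"C1": -1, "P1": 1, "N170": -1, "P2": 1, "N2": -1, "N3": -1, "P3": 1}

def roi_peak(erp, times_ms, roi_rows, component, presentation=None):
    """Return (peak amplitude in µV, peak latency in ms) for one ROI.

    erp: (n_channels, n_times) average ERP; times_ms: (n_times,) latencies;
    roi_rows: row indices of the ROI electrodes (assumed mapping).
    """
    if component == "N2":
        if presentation not in ("masked", "unmasked"):
            raise ValueError("N2 needs presentation='masked' or 'unmasked'")
        key = f"N2_{presentation}"
    else:
        key = component
    start, end = WINDOWS_MS[key]
    roi = erp[roi_rows].mean(axis=0)                 # merge ROI electrodes
    in_win = (times_ms >= start) & (times_ms <= end)
    seg = roi[in_win]
    i = int(np.argmax(POLARITY[component] * seg))    # most negative/positive sample
    return float(seg[i]), float(times_ms[in_win][i])
```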


ERP amplitude and latency values were analyzed separately for each component using four-way repeated-measures ANOVAs with the following factors: (i) *Emotion (E)*: Angry (A), Happy (H), Fearful (F), Neutral (N); (ii) *Presentation*: Masked and Unmasked; (iii) *Target Category*: Target (T), Non-target (NT); (iv) *ROIs*: left temporal-occipital (Lf), middle-occipital (Cx), and right temporal-occipital (Rh) for the C1; left and right hemispheres for the P1, N170, P2, N2, and N3; and middle-parietal and middle-occipital areas for the P3. We also computed separate repeated-measures ANOVAs (Emotion × Target Category × ROIs) for the supraliminal and subliminal conditions to detect specific effects that might not emerge from the main ANOVA.

In the behavioral and ERP analyses, Fisher's LSD tests were used for *post hoc* comparisons of means within ANOVA interactions. Greenhouse–Geisser corrections were applied to reduce the positive bias arising from repeated factors with more than two levels. We report effect sizes (partial eta squared, η<sup>2</sup><sub>p</sub>) in addition to probability values.
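The quantities reported throughout the Results (F, MSE, ε, η<sup>2</sup><sub>p</sub>) can be illustrated with a one-way repeated-measures ANOVA. This is a simplified sketch, not the factorial analysis used in the study: `rm_anova_oneway` and `lsd_t` are hypothetical names. The Greenhouse–Geisser ε is computed from the double-centred covariance matrix of the levels; it is exactly 1 for a two-level factor and can fall to 1/(k − 1) for k levels. A corrected p-value would then be taken from the F distribution with ε·df1 and ε·df2 degrees of freedom (for example with `scipy.stats.f.sf`). The LSD statistic shown uses the ANOVA error mean square, which is one common form of Fisher's LSD for repeated measures; the text does not state which variant the authors' software applied.

```python
import numpy as np

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA on an (n_participants, k_levels) array.

    Returns F, (df1, df2), MSE (error mean square), Greenhouse-Geisser
    epsilon and partial eta squared.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    gm = data.mean()
    ss_cond = n * ((data.mean(axis=0) - gm) ** 2).sum()   # between levels
    ss_subj = k * ((data.mean(axis=1) - gm) ** 2).sum()   # between participants
    ss_err = ((data - gm) ** 2).sum() - ss_cond - ss_subj # level x participant
    df1, df2 = k - 1, (n - 1) * (k - 1)
    mse = ss_err / df2
    F = (ss_cond / df1) / mse
    # Greenhouse-Geisser epsilon from the double-centred covariance matrix.
    S = np.cov(data, rowvar=False)
    H = np.eye(k) - np.ones((k, k)) / k
    D = H @ S @ H
    eps = np.trace(D) ** 2 / (df1 * np.trace(D @ D))
    eta_p2 = ss_cond / (ss_cond + ss_err)
    return F, (df1, df2), mse, eps, eta_p2

def lsd_t(mean_i, mean_j, mse, n):
    """Fisher LSD t statistic for two level means (df = df2 of the ANOVA)."""
    return (mean_i - mean_j) / np.sqrt(2 * mse / n)
```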

# **RESULTS**

# **Behavioral Results**

Participants' accuracy in discriminating facial expressions in the masked condition was 46.7% (*z*-score: 0.12), which did not differ significantly from chance level (binomial distribution: *p* < 0.72). In the unmasked condition, accuracy was 85.4% (*z*-score: −1.24).
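A comparison with chance of the kind reported above can be made with an exact binomial test. The sketch below assumes a chance level of 50% (a two-way target/non-target decision) and uses the usual two-sided rule of summing the probabilities of all outcomes no more likely than the observed one. `binom_two_sided_p` is a hypothetical helper, and because the number of masked trials is not given in this section, the usage line relies on made-up counts.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k correct responses in n trials.

    Sums the probabilities of every outcome that is no more likely than the
    observed one; the small relative tolerance absorbs rounding error.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))

# Hypothetical counts: 47 correct out of 100 masked trials (about 47%).
print(round(binom_two_sided_p(47, 100), 3))
```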

The *d-prime* analysis showed that *d′* differed significantly between masked (0.78) and unmasked (2.16) stimuli ["Presentation": *F*(1,13) = 98.84, MSE = 0.27, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.88], as did the *criterion* (*c*) ["Presentation": *F*(1,13) = 9.99, MSE = 0.53, ε = 1, *p* < 0.008, η<sup>2</sup><sub>p</sub> = 0.50; subliminal: 0.66; supraliminal: 0.05].
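Partial eta squared can be recovered from a reported *F* ratio and its degrees of freedom, which offers a quick consistency check on the statistics in this section. This is a standard identity rather than a step described by the authors; for the effect of Presentation on *d′*:

$$
\eta^2_p = \frac{F \cdot df_1}{F \cdot df_1 + df_2} = \frac{98.84 \times 1}{98.84 \times 1 + 13} \approx 0.88
$$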

Reaction times were significantly shorter for unmasked than for masked targets [*F*(1,13) = 9.35, MSE = 11660, ε = 1, *p* < 0.009, η<sup>2</sup><sub>p</sub> = 0.41; mean values: Sup = 613 ms vs. Sub = 675 ms] and marginally shorter for faces of negative valence [*F*(1,13) = 4.11, MSE = 7825, ε = 1, *p* = 0.06, η<sup>2</sup><sub>p</sub> = 0.24; mean values: Pls = 661 ms and Neg = 627 ms]. More specifically, this marginal valence effect was driven entirely by the slower responses to neutral faces, which were grouped with happy faces in the pleasant category, as shown by the significant effect in a repeated-measures ANOVA with the four emotions as separate levels [*F*(3,39) = 10.79, MSE = 11637, ε = 0.78, *p* < 0.0002, η<sup>2</sup><sub>p</sub> = 0.45; mean values: *A* = 626 ms, *F* = 629 ms, and *H* = 608 ms vs. *N* = 714 ms; *post hoc* comparisons: *p*s < 0.0002].

# **Electrophysiological Results**

### C1 Component

#### *Latency*

An earlier peak was detected for (i) non-targets (63 ms) compared with targets (68 ms) in the masked presentation (*p* < 0.01); (ii) masked (63 ms) compared with unmasked (67 ms) non-target stimuli (*p* < 0.02); and (iii) unmasked (64 ms) compared with masked (68 ms) target stimuli (*p* < 0.05), as shown by the "Presentation × Target Category" interaction [*F*(1,13) = 12, MSE = 194, ε = 1, *p* < 0.004, η<sup>2</sup><sub>p</sub> = 0.48]. Masked stimuli peaked earlier over right (64 ms) than left (67 ms) electrodes, whereas unmasked stimuli peaked later over central (67 ms) than left (64 ms) electrodes ["Presentation × ROI": *F*(2,26) = 3.66, MSE = 103, ε = 0.85, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.22].

*Masked presentation*. The attentional effect found in the main ANOVA was also present in this analysis ["Target Category": *F*(1,13) = 6.1, MSE = 1.28, ε = 1, *p* < 0.03, η<sup>2</sup><sub>p</sub> = 0.32], with an earlier peak for non-target (63 ms) than for target (68 ms) stimuli.

*Unmasked presentation*. No significant result was found.

#### *Peak*

The "Emotion × Target Category" interaction was significant [*F*(3,39) = 2.8, MSE = 1.28, ε = 1, *p* = 0.05, η<sup>2</sup><sub>p</sub> = 0.17], showing a difference between targets (−0.98 µV) and non-targets (−0.4 µV) only for fearful faces (*post hoc* comparisons: *p*s < 0.003) and, in the attentive condition, a difference between fearful faces and happy (−0.54 µV), angry and neutral (both −0.55 µV) faces (*post hoc* comparisons: *p*s < 0.002).

*Masked presentation*. Amplitude was affected by the "Emotion × Target Category" interaction [*F*(3,39) = 3, MSE = 3.06, ε = 1, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.19], which showed increased negativity for fearful (−1.49 µV) compared with angry (−0.57 µV), happy (−0.44 µV; *p* < 0.02), and neutral faces (−0.45 µV; *p*s < 0.01), only in the attentive condition. Moreover, only fearful faces elicited a greater negativity for targets (−1.49 µV) than for non-targets (−0.35 µV; *p* < 0.005).

*Unmasked presentation*. No significant result was found.

### P1 Component

#### *Latency*

Negative facial expressions elicited an earlier peak than pleasant ones ["Emotion": *F*(3,39) = 13.33, MSE = 66, ε = 0.65, *p* < 0.0002, η<sup>2</sup><sub>p</sub> = 0.5; mean values: *A* = 116 ms and *F* = 114 ms vs. *H* = 120 ms and *N* = 119 ms; *post hoc* comparisons: *p*s < 0.009].

*Masked presentation*. Fearful faces elicited an earlier peak than all other emotional expressions ["Emotion": *F*(3,39) = 11.12, MSE = 70, ε = 0.62, *p* < 0.0005, η<sup>2</sup><sub>p</sub> = 0.46; mean values: *A* = 118 ms, *F* = 113 ms, *H* = 122 ms, *N* = 120 ms; *post hoc* comparisons: *p*s < 0.01]. Moreover, angry faces differed significantly from both happy and neutral faces (*post hoc* comparisons: *p*s < 0.0001).

*Unmasked presentation*. Negative facial expressions elicited an earlier peak than pleasant ones ["Emotion": *F*(3,39) = 4.77, MSE = 50, ε = 0.86, *p* < 0.009, η<sup>2</sup><sub>p</sub> = 0.27; mean values: *A* = 115 ms and *F* = 115 ms vs. *H* = 118 ms and *N* = 119 ms; *post hoc* comparisons: *p*s < 0.04].

#### *Peak*

No significant result was found in the main ANOVA or in the separate ANOVAs for each presentation condition.

### N170 Component

#### *Latency*

Masked stimuli elicited an earlier peak than unmasked faces ["Presentation": *F*(1,13) = 12.1, MSE = 118, ε = 1, *p* < 0.004, η<sup>2</sup><sub>p</sub> = 0.48; mean values: 163 vs. 167 ms].

*Masked presentation*. Negative (angry and fearful) facial expressions elicited an earlier peak than pleasant (happy and neutral) faces ["Emotion": *F*(3,39) = 4.84, MSE = 116, ε = 0.51, *p* < 0.03, η<sup>2</sup><sub>p</sub> = 0.27; *A* = 162 ms and *F* = 161 ms vs. *H* ≈ *N* = 167 ms; *post hoc* comparisons: *p*s < 0.05].

*Unmasked presentation*. The "Emotion" factor [*F*(3,39) = 6.74, MSE = 47, ε = 0.84, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.34] revealed a later peak for happy (169 ms) than for fearful (165 ms) and angry (163 ms) facial expressions, and for neutral (167 ms) compared with angry faces (*post hoc* comparisons: *p*s < 0.01). Moreover, this component peaked earlier over the right (164 ms) than the left (168 ms) hemisphere ["ROI": *F*(1,13) = 4.9, MSE = 216, ε = 1, *p* < 0.045, η<sup>2</sup><sub>p</sub> = 0.27].

#### *Peak*

Unmasked pictures produced a greater amplitude than masked faces ["Presentation": *F*(1,13) = 31.35, MSE = 9.75, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.71; mean values: −5.7 vs. −4 µV]. Negative facial expressions significantly increased the amplitude of this component compared with pleasant faces ["Emotion": *F*(3,39) = 6.38, MSE = 4.99, ε = 0.58, *p* < 0.009, η<sup>2</sup><sub>p</sub> = 0.33; mean values: *A* = −5.24 µV and *F* = −5.36 µV vs. *H* = −4.63 µV and *N* = −4.23 µV; *post hoc* comparisons: *p*s < 0.05]. The "Presentation × Emotion" interaction [*F*(3,39) = 4.1, MSE = 2.56, ε = 0.73, *p* < 0.025, η<sup>2</sup><sub>p</sub> = 0.24] revealed a progressive increase in negativity from neutral to angry faces (*N* = −3 µV, *H* = −3.79 µV, *F* = −4.6 µV, *A* = −4.77 µV; *p*s < 0.02) only in the masked condition (**Figure 2**). In the unmasked condition, only fearful faces (−6.12 µV) elicited a greater negativity than happy (−5.48 µV) and neutral (−5.44 µV) facial expressions (*post hoc* comparisons: *p*s < 0.04).

*Masked presentation*. Angry and fearful facial expressions elicited a greater negativity than happy and neutral faces ["Emotion": *F*(3,39) = 9.26, MSE = 4, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.42; *A* = −4.77 µV, *F* = −4.6 µV, *H* = −3.79 µV, *N* = −3 µV; *post hoc* comparisons: *p*s < 0.05].

*Unmasked presentation*. No significant result was found.

### P2 Component

#### *Latency*

Targets elicited an earlier peak than non-targets ["Target Category": *F*(1,13) = 6.6, MSE = 236, ε = 1, *p* < 0.024, η<sup>2</sup><sub>p</sub> = 0.34; mean values: 231 vs. 234 ms, respectively], as did unmasked compared with masked stimuli ["Presentation": *F*(1,13) = 5.89, MSE = 242, ε = 1, *p* < 0.03, η<sup>2</sup><sub>p</sub> = 0.32; mean values: 231 vs. 235 ms, respectively]. This component peaked earlier over right than left leads, as shown by the "ROI" factor [*F*(1,13) = 6.89, MSE = 134, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.35; mean values: 231 vs. 234 ms, respectively].

*Masked presentation*. The "Target Category" factor was significant [*F*(1,13) = 6.64, MSE = 220, ε = 1, *p* < 0.023, η<sup>2</sup><sub>p</sub> = 0.34; mean values: T = 232 ms vs. NT = 237 ms], as was the "ROI" factor *per se* [*F*(1,13) = 9.53, MSE = 54, ε = 1, *p* < 0.009, η<sup>2</sup><sub>p</sub> = 0.42; mean values: Rh = 233 ms vs. Lf = 236 ms].

*Unmasked presentation*. No significant result was found.

#### *Peak*

Amplitude was greater for unmasked (6.1 µV) than masked (1.6 µV) stimuli, as shown by the main effect of "Presentation" [*F*(1,13) = 51.85, MSE = 41, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.8; see **Figure 3**].

*Masked presentation*. No significant result was found.

*Unmasked presentation*. No significant result was found.

### N2 Component

#### *Latency*

The peak was earlier for unmasked (281 ms) than masked (304 ms) stimuli ["Presentation": *F*(1,13) = 51.45, MSE = 1147, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.80]. The interaction with the "Emotion" factor showed that each emotional category of unmasked faces elicited an earlier peak than the same emotional category of masked stimuli [*F*(3,39) = 3.83, MSE = 151, ε = 0.73, *p* < 0.03, η<sup>2</sup><sub>p</sub> = 0.23; mean values: A = 276 vs. 305 ms; F = 283 vs. 304 ms; H = 279 vs. 304 ms; N = 285 vs. 303 ms; *p*s < 0.0001].

*Masked and unmasked presentation*. No significant result was found.

#### *Peak*

Target Category affected amplitude only in the masked presentation, where targets (−1.89 µV) elicited a greater amplitude than non-targets (−0.97 µV), as shown by *post hoc* comparisons (*p* < 0.002) following the "Presentation × Target Category" interaction [*F*(1,13) = 5.65, MSE = 3.14, ε = 1, *p* < 0.04, η<sup>2</sup><sub>p</sub> = 0.3].

*Masked presentation*. The "Target Category" factor [*F*(1,13) = 7.42, MSE = 6.34, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.36] indicated that target stimuli (−1.89 µV) elicited a larger negative amplitude than non-target stimuli (−0.97 µV).

*Unmasked presentation*. No significant result was found.

**FIGURE 2 | (A)** Masked presentation; **(B)** Unmasked presentation. Both panels show grand-average ERPs merged across the electrodes of the N170 ROIs and across attentive and inattentive conditions (above), and Scalp Current Density maps between 160 and 190 ms after the presentation of emotional stimuli, corresponding to the N170 time window (below). Each ERP (and map) represents a different emotional condition: Anger (A, black), Fear (F, red), Happiness (H, blue), Neutral (N, green).

### N3 Component

#### *Latency*

No significant result was found in the separate ANOVAs computed for each type of presentation.

#### *Peak*

Masked stimuli elicited a smaller negativity than unmasked faces ["Presentation": *F*(1,13) = 32.7, MSE = 20, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.72; mean values: 1.87 vs. −0.53 µV], as did non-targets compared with targets, independently of the type of presentation ["Target Category": *F*(1,13) = 12.12, MSE = 2.88, ε = 1, *p* < 0.004, η<sup>2</sup><sub>p</sub> = 0.48; mean values: 1 vs. 0.39 µV, respectively]. This component was more negative over the left than over the right hemisphere ["ROI": *F*(1,13) = 6.28, MSE = 9.74, ε = 1, *p* < 0.03, η<sup>2</sup><sub>p</sub> = 0.33; mean values: 0.3 vs. 1 µV, respectively].

*Masked presentation*. The "Target Category" factor was significant, revealing a greater negativity for targets than for non-targets [*F*(1,13) = 14.2, MSE = 1.12, ε = 1, *p* < 0.003, η<sup>2</sup><sub>p</sub> = 0.52; mean values: *T* = 1.6 µV vs. NT = 2.1 µV].

*Unmasked presentation*. The "Target Category" factor was again significant, revealing a greater negativity for targets than for non-targets [*F*(1,13) = 5.67, MSE = 3.36, ε = 1, *p* < 0.04, η<sup>2</sup><sub>p</sub> = 0.3; mean values: *T* = −0.83 µV vs. NT = −0.24 µV].

### P3 Component

#### *Latency*

The "Target Category" factor was significant *per se*, showing an earlier peak for non-target than for target stimuli [*F*(1,13) = 8.54, MSE = 1001, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.40; mean values: 473 vs. 482 ms, respectively]. This component peaked earlier over occipital than over parietal leads ["ROI": *F*(1,13) = 5.77, MSE = 5210, ε = 1, *p* < 0.032, η<sup>2</sup><sub>p</sub> = 0.31; mean values: 469 vs. 485 ms, respectively].

*Masked presentation*. No significant result was found.

*Unmasked presentation*. The "Target Category" factor was significant *per se*, showing an earlier peak for non-target than for target stimuli [*F*(1,13) = 7, MSE = 1205, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.35; mean values: 472 vs. 485 ms, respectively].

#### *Peak*


The "Presentation" factor was significant [*F*(1,13) = 33.53, MSE = 12.85, ε = 1, *p* < 0.0001, η<sup>2</sup><sub>p</sub> = 0.72], showing greater amplitude for unmasked (5.2 µV) than for masked (3.24 µV) emotional expressions (**Figure 4**). Targets elicited a greater positivity than non-targets ["Attention" (i.e., Target Category): *F*(1,13) = 12.5, MSE = 3.38, ε = 1, *p* < 0.004, η<sup>2</sup><sub>p</sub> = 0.5]. This effect was observed only over parietal areas (4.47 vs. 3.49 µV, *p* < 0.0002), not over occipital areas (4.3 vs. 4.6 µV, *p* = 0.2), as shown by the "Attention × ROI" interaction [*F*(1,13) = 7.66, MSE = 1.9, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.37].

*Masked presentation*. Amplitude was significantly more positive for target than for non-target stimuli ["Attention": *F*(1,13) = 7.4, MSE = 6.67, ε = 1, *p* < 0.02, η<sup>2</sup><sub>p</sub> = 0.5; mean values: 3.71 and 2.77 µV, respectively].

*Unmasked presentation*. Amplitudes were greater for targets than for non-targets only over parietal leads, as revealed by the "Attention × ROI" interaction [*F*(1,13) = 4.98, MSE = 0.94, ε = 1, *p* < 0.04, η<sup>2</sup><sub>p</sub> = 0.28; mean values, parietal ROI: *T* = 5.2 µV vs. NT = 4.6 µV, *p* < 0.008; mean values, occipital ROI: *T* = NT = 5.49 µV].

# **DISCUSSION**

Our study reveals that facial expressions can be processed without awareness and independently of whether participants are engaged in an attempt to detect a specific emotion. Additionally, our results highlight that top-down selective attention and awareness of these stimuli operate in distinct time periods.

In both masked and unmasked conditions, the N170 showed an increased negative amplitude in response to negative faces compared to happy and neutral faces, revealing that the initial processing of emotion occurs independently of awareness (**Figure 2**). The P2 component was found to be linked to stimulus visibility, independently of attention and emotion, while the P3 and N3 showed an interaction between awareness and selective attention. Finally, the N2 and N3 were greater for targets than non-targets, independently of facial expression and type of presentation (masked vs. unmasked). At the N2 level, an enhanced response for targets emerged only in the masked condition, while, at the N3 level, the effect was observed for both masked and unmasked stimuli.

Our study therefore confirms previous findings demonstrating that emotional expressions are processed without awareness. Furthermore, this processing occurs even when the emotion is unattended. However, attention can modulate unconscious processing at a very early stage (within 100 ms after stimulus onset, according to our findings), depending on the type of emotional stimulus. Interestingly, we found that attention and awareness for these emotional stimuli arise at different time periods and rely on networks different from those involved in the initial stage.

# **Temporal Processing of Emotions**

Evidence from electrophysiological studies using implicit tasks has supported a model of automatic (defined here as occurring without voluntary attention) and rapid processing of emotional expressions (Batty and Taylor, 2003), suggesting that the N170 can be linked to the processing of emotional faces. Batty and Taylor (2003) investigated six basic emotional expressions (sadness, fear, disgust, anger, happiness, and surprise) as well as neutral faces, and found that the N170 was modulated by emotion, with a significantly larger amplitude for fearful than for other expressions. Blau et al. (2007) also observed an enhanced N170 amplitude for fearful compared with neutral faces under non-explicit task conditions. They showed that the processing of facial structure and of emotion produces electrophysiological responses within the same time interval, suggesting that emotion processing does not occur solely after a supposed initial encoding period, as previously reported (Eimer and Holmes, 2002; Ashley et al., 2004). Separate investigations concluded that emotional expressions modulate the N170 even in the absence of visual awareness. As noted above, a greater activation for negative emotions has been found for backward-masked faces presented below the threshold of visual awareness (Pegna et al., 2008). Kiss and Eimer (2008) identified the same time window for subliminal processing of emotions, between 140 and 180 ms, describing an enhanced response to fearful compared with happy faces over frontal and central sites in both supraliminal and subliminal conditions. Thus, it seems clear that strong modulatory activity occurs very rapidly in response to emotional valence, even when the stimulus is not accessible to perceptual awareness.

Findings of an N170 modulation for undetectable faces were replicated in a study using supraliminal and subliminal faces in which emotions were irrelevant to the task (Pegna et al., 2011). The N170 was enhanced for both detectable and undetectable fearful faces, even when the participants were engaged in an orthogonal task (i.e., comparing lateral flanker bars). Consequently, the modulation of the N170 by emotional expressions is not eliminated when the stimuli are unattended. In an earlier MEG study, Bayle and Taylor (2010) demonstrated that the M170 was sensitive to emotions whether or not they were attended (although different areas appeared to be involved in each case). This contradicts some previous findings, such as those of Wronka and Walentowska (2011), who investigated emotional face processing using a selective attention task and reported an effect on the N170 only when participants were asked to respond to the emotional expression of the face; when asked to judge the gender of the faces, the effect disappeared. These authors argued that top-down attentional control mechanisms were required for emotional processing, in accordance with previous reports (Pessoa et al., 2002a,b). Although this conclusion has been much debated, our study corroborates reports suggesting that the N170 is modulated by emotions without attention or awareness, and prior to their engagement.

Interestingly, we also found a modulation at the C1 (N70) level for fearful faces, which showed an increased negativity compared with the other emotions. This early effect of emotion, arising at 70 ms, may be due to the salience of fearful faces in conjunction with top-down mechanisms of voluntary attention that respond to visual features of the specific type of stimulus. This would explain why, in our case, participants focusing on threatening stimuli produced an enhanced C1 even under conditions of limited visibility. However, the effect was found only in the masked attentive condition, suggesting that fear might capture attention only during brief presentations of stimuli (i.e., 21 ms in our case). In support of this explanation, Bannerman et al. (2010) described a comparable outcome using a spatial cueing paradigm in which they measured saccades and manual RTs to the cued locations. Fearful or neutral body expressions were used as cues and presented for brief (20 ms) or long (100 ms) durations. They were followed by targets to which participants responded either with a button press or with a saccade. With short presentations, manual responses did not differ across emotional cues, whereas longer presentations produced an emotion-dependent effect. Importantly, saccadic RTs were significantly faster for fearful than for neutral expressions at validly cued locations, but only with brief presentations. Thus, shorter presentations may enhance the effect of threatening cues, at least for saccadic responses, a finding that could corroborate our results for the C1.

Using behavioral paradigms, Quek and Finkbeiner (2013) confirmed that masked non-emotional faces are processed even when they are not attended. Our ERP data extend this effect to emotional valence, in agreement with other ERP (e.g., Pegna et al., 2011) and fMRI studies (e.g., Vuilleumier et al., 2001). Interestingly, our findings also reveal that selective attention affects the extent of unconscious processing differently depending on the emotion. Indeed, selective attention can boost the processing of threatening faces in the masked condition, suggesting that unconscious processes can in turn modulate selective attention. This view adopts a middle position between the complete independence of awareness and attentional resources (e.g., Posner and Snyder, 1975; McCormick, 1997) and an influence of attention on the engagement of cognitive resources in unconscious processing (e.g., Dehaene et al., 2006). However, the intentions of participants and their action plans can influence the initial unconscious processing of visual stimuli, as proposed by Neumann (1984). This would suggest that the early process is likely to be modulated by top-down strategic control, and that unconscious processing of visual stimuli can be defined as "automatic" inasmuch as it does not provide the information necessary to support strategic processing steps (for a complete review, see Kiefer, 2007). Additionally, it has been demonstrated that the specific instructions given to participants before a task (Van den Bussche et al., 2009), as well as knowledge of a prime or stimulus before an experiment starts (Al-Janabi and Finkbeiner, 2012), can increase subsequent masked cueing effects, enhancing the perception and discrimination of unseen stimuli and boosting invisible objects into consciousness (Lin and Murray, 2014).
It may be argued that, in our case, exposure to the stimuli before the experimental session, as well as their random presentation below and above the threshold of visibility, could have enhanced the emotional effect in the subliminal condition. However, this top-down influence seems to appear only with negative emotions, which argues against this interpretation.

# **Temporal Processing of Attention and Awareness**

No effects of emotion, selective attention or awareness were observed on the P1 amplitude. The P2, on the other hand, was affected by stimulus visibility but showed no modulation associated with attention (**Figure 3**), while the N2 showed an interaction of attention with the visibility of the stimuli, the response being modulated by selective attention only when the stimuli were masked. The N3 and the P3, however, produced greater responses for targets than for non-targets in both the masked and the unmasked conditions (**Figure 4**), revealing neural activity linked essentially to selective attention during this period. Thus, the neural mechanisms of visual attention and visual awareness seem to be independent and dissociated in the initial periods, with awareness arising slightly earlier in time.

Our results therefore corroborate previous suggestions that the SN and awareness (the VAN) are dissociable (Koivisto and Revonsuo, 2007, 2008; Railo et al., 2015), as selective attention did not modulate the first period of our awareness-related component, occurring between 200 and 260 ms after stimulus onset (the P2 component). Attentional effects, by contrast, appeared at the N2 and N3 level, only from 280 ms after stimulus onset. This view is by no means generally agreed upon. For instance, Shafto and Pitts (2015) showed that the N170, the VAN and the P300 were absent during inattentional blindness, but were present and modulated in the aware condition, when faces were task-relevant. They claim that selective attention and perceptual awareness are distinct and separable processes, but only singularly dissociable, meaning that attention can operate even in the absence of awareness, while perceptual awareness cannot operate without attention. Their results argue against the hypothesis of the P3 wave as a neural marker of workspace activation and conscious access (Dehaene et al., 2014). Indeed, it is not always justified to assume that the VAN indexes visual perception alone, because attentional ERP components have latencies and topographies similar to those of the VAN, making them difficult to distinguish from one another (Rutiku et al., 2015). In addition, the type of paradigm employed (e.g., feature-based vs. spatial attention, stimulus expectation, and adaptation or storage in working memory) may affect how readily visual stimuli gain access to consciousness. Thus, the prerequisites and consequences of consciousness may become confused with awareness *per se* and its actual neural markers (Aru et al., 2012; De Graaf et al., 2012).

In our case, the P2, corresponding here to the first part of the VAN (200–260 ms), was shown to emerge independently of top-down selection. Awareness and selective attention began to interact at the N2 level, likely corresponding to the second part of the VAN, showing the greatest interaction effect on the parietal

# **REFERENCES**

Al-Janabi, S., and Finkbeiner, M. (2012). Effective processing of masked eye gaze requires volitional control. *Exp. Brain Res.* 216, 433–443. doi: 10.1007/s00221- 011-2944-0

and, thus, dependent on top-down mechanisms (voluntary attention). This suggests that the P300 should rather be seen as a consequence of consciousness, related for example to postperceptual processing, rather than to awareness *per se* (Shafto and Pitts, 2015). **CONCLUSION**

> In the literature, it has been shown that both emotional faces and awareness can affect the amplitude (Balconi and Lucchiari, 2007) and latency (Balconi and Mazza, 2009) of the N2, while threatening stimuli appear to enhance the N170 (Pegna et al., 2008, 2011). The increased amplitudes for such stimuli are interpreted as a heightened activity in response to their emotional content, while the delayed peak for the masked stimuli may reflect the effort necessary to compute a weaker stimulus. These findings are in line with the view that emotional stimuli are capable of capturing attention and eliciting a rapid, preattentive response in the absence of awareness (Öhman et al., 2000, 2001; Vuilleumier, 2005; Maratos, 2011) and is consistent with the hypothesis of dual pathways for visual processing, which includes a subcortical pathway through the superior colliculus and pulvinar to the amygdala allowing a rapid response to signals of threat is required (Liddell et al., 2004, 2005; LeDoux, 2007).

> P3. The N3 and the P3 waves showed an effect linked to the detectability of stimuli independently of the emotional valence

Finally, the pattern of effects, observed in the ERPs, appears to be in line with the "cumulative influence model" put forward by (Tallon-Baudry, 2012). This model states that attention and awareness might be initially independent and combined later on, when the response of the subject reaches the decisional stage. In our study, attention and awareness are indeed not initially combined, as the latter emerges after around 200 ms before any effect of selective attention is observed (i.e., targets and nontargets do not differ). On the other hand, effects of selective attention, appear after 280 ms, first in interaction with stimulus visibility, but then independently. The fact that the neural signatures of awareness and of selective attention are distinct, albeit partially, argues in favor of their relative independence and suggest that both can contribute to the final decisional processes. That said, selective attention appears to be influenced by the emotional nature of the stimuli, which in turn impinges on unconscious processing at a very early stage (Finkbeiner and Palermo, 2009).

# **ACKNOWLEDGMENTS**

This investigation was supported by the Swiss National Science Foundation grant no. 320030-144187. We are grateful to Martina Franchini who provided diligent assistance with EEG recording and analysis.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Del Zotto and Pegna. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Source unreliability decreases but does not cancel the impact of social information on metacognitive evaluations

*Amélie Jacquot<sup>1</sup>\*, Terry Eskenazi<sup>2</sup>, Edith Sales-Wuillemin<sup>3</sup>, Benoît Montalan<sup>4</sup>, Joëlle Proust<sup>5</sup>, Julie Grèzes<sup>2</sup> and Laurence Conty<sup>1</sup>\**

*<sup>1</sup> Laboratoire de Psychopathologie et Neuropsychologie EA 2027, Université Paris 8, Saint-Denis, France, <sup>2</sup> Laboratoire de Neurosciences Cognitives INSERM U960, Ecole Normale Supérieure, Paris, France, <sup>3</sup> Laboratoire de Socio-Psychologie et Management du Sport EA 4180, Université de Bourgogne, Dijon, France, <sup>4</sup> Laboratoire ICONES EA 4699, Université de Normandie, Mont-Saint-Aignan, France, <sup>5</sup> Institut Jean Nicod, Ecole Normale Supérieure, Paris, France*

#### *Edited by:*

*Andrew Bayliss, University of East Anglia, UK*

#### *Reviewed by:*

*Stefan Kopp, Bielefeld University, Germany*

*Francesco Foroni, International School for Advanced Studies, Italy*

#### *\*Correspondence:*

*Amélie Jacquot and Laurence Conty, Laboratoire de Psychopathologie et Neuropsychologie EA 2027, Université Paris 8, 2 Rue de la Liberté, Saint-Denis 93526 Cedex, France amelijacquot@gmail.com; laurence.conty@univ-paris8.fr*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 29 April 2015 Accepted: 31 August 2015 Published: 14 September 2015*

#### *Citation:*

*Jacquot A, Eskenazi T, Sales-Wuillemin E, Montalan B, Proust J, Grèzes J and Conty L (2015) Source unreliability decreases but does not cancel the impact of social information on metacognitive evaluations. Front. Psychol. 6:1385. doi: 10.3389/fpsyg.2015.01385*

Through metacognitive evaluations, individuals assess their own cognitive operations with respect to their current goals. We have previously shown that non-verbal social cues spontaneously influence these evaluations, even when the cues are unreliable. Here, we explore whether a belief about the reliability of the source can modulate this form of social impact. Participants performed a two-alternative forced choice task that varied in difficulty. The task was followed by a video of a person who was presented as being either competent or incompetent at performing the task. That person provided random feedback to the participant through facial expressions indicating agreement, disagreement or uncertainty. Participants then provided a metacognitive evaluation by rating their confidence in their answer. Results revealed that participants' confidence was higher following agreements. Interestingly, this effect was merely reduced but not canceled for the incompetent individual, even though participants were able to perceive the individual's incompetence. Moreover, perceived agreement induced zygomaticus activity, but only when the feedback was provided for difficult trials by the competent individual. This last result strongly suggests that people implicitly appraise the relevance of social feedback with respect to their current goal. Together, our findings suggest that people always integrate social agreement into their metacognitive evaluations, even when epistemic vigilance mechanisms alert them to the risk of being misinformed.

Keywords: metacognition, social influence, facial expression, confidence, electromyography, epistemic reliability

# Introduction

Besides communicating important information about others' feelings and attitudes (George and Conty, 2008), non-verbal social cues such as gaze or facial expression also provide circumstantial information that may guide people's decisions. Remarkably, non-verbal social cues can spontaneously affect metacognitive evaluations of past decisions (Eskenazi et al., in revision). Metacognition refers to the process by which individuals monitor and control their likely success in cognitive tasks (Proust, 2010). As we make a decision, we concurrently monitor our mental activity in order to regulate information processing and behavior (Koriat, 2007). In experimental research, metacognitive evaluations are usually measured by a second-order decision, which may take the form of a subjective confidence judgment about performance on a first-order task (i.e., retrospective evaluations; Fleming et al., 2010; Kepecs and Mainen, 2012). Several studies have sought to identify the informational cues that people use to form their metacognitive evaluations (Alter and Oppenheimer, 2009; Bahrami et al., 2010; Koriat and Ackerman, 2010). In previous work from our lab, we found that people spontaneously adjust their retrospective metacognitive evaluations based on the non-verbal feedback given by another individual (Eskenazi et al., in revision). Here, we investigated whether this form of social influence varies as a function of the reliability of the social source providing the feedback.

**Abbreviations:** 2AFC, two-alternative forced choice; RFRs, rapid facial reactions; EMG, electromyography; PP, previous participants; SE, standard error; SD, standard deviation; RTs, reaction times.

In our previous work, we asked people to perform a 2AFC perceptual task and then rate their confidence in each response. Participants' confidence ratings were higher after another person had oriented his/her gaze toward their response (25% of trials) than when the person gazed at the opposite response (25% of trials) or when there was no social cue (50% of trials). Intriguingly, this effect of non-verbal feedback on confidence ratings was present even though participants were told that the person's gaze direction was uninformative and should be ignored (Experiment 1). Furthermore, the effect was observed although the person gazed at the participant's response only half of the time and regardless of response accuracy. Participants therefore viewed equal numbers of trials with objectively correct and objectively incorrect feedback, which rendered the person's gaze direction unreliable for task purposes. When the same design was used but participants were led to believe that the gaze direction reflected a PP's response to the same question, the effect was observed at the expense of participants' metacognitive sensitivity (Experiment 2). Finally, task difficulty (which strongly determines participants' degree of certainty prior to feedback) did not modulate social influence, contrary to what previous studies would have predicted (Festinger, 1954; Laland, 2004). Our results therefore suggest that people have a natural tendency to spontaneously assign relevance and trustworthiness to social information, especially when that information is perceived as positive feedback.

In the real world, however, not all social sources are equally reliable. A strong susceptibility of metacognition to social information regardless of source reliability carries a major risk of accidental or intentional misinformation. Developmental studies consistently demonstrate that children trust others selectively from 4 years of age. They monitor informants' past accuracy and adjust their decisions to the information provided to them (Harris and Corriveau, 2011; Harris et al., 2012; Mills, 2013; Bernard et al., 2015b). We are thus cognitively equipped to evaluate the epistemic reliability of (social) sources of information, a capacity termed 'epistemic vigilance' (Sperber et al., 2010). We seem to be able to assign a weight to social information that determines the extent to which we assimilate that information. Here, we studied the extent to which beliefs about the epistemic reliability of a social source modulate the influence of non-verbal social information on metacognitive evaluations. Gauging the reliability of an informant mainly consists in gauging the accuracy of the message he/she communicates (Bernard et al., 2015a). The associated mechanisms may rely on a variety of cues, such as the quality of the message, the perceived benevolence of the informant, the number of congruent informants and/or one's own perceptual or memorial certainty (Sperber et al., 2010; Bernard et al., 2015b). However, to be reliable, the informant must meet a critical condition: he/she must be competent, i.e., possess genuine information (as opposed to misinformation or no information; Fiske et al., 2007; Sperber et al., 2010).

In an amended version of the paradigm described above, participants in the current study performed a first-order 2AFC task, followed by a subjective confidence rating of their own response on each trial. Before reporting their rating, participants saw a short video clip in which an individual either smiled and nodded to express agreement with the participant's response, frowned and shook his/her head to signal disagreement, or raised his/her eyebrows and shoulders to express uncertainty. Participants were led to believe that these individuals were PPs and that the expression they displayed on each trial reflected whether they had given the same response as the participant (agreement), the opposite response (disagreement), or no response (uncertainty). However, as in Eskenazi et al. (in revision), the experiment was controlled so that participants received equal amounts of objectively correct feedback, objectively incorrect feedback, and uncertain feedback. Throughout the experimental session, each participant saw two individuals, one of whom was presented as being more competent at the task than the other. We hypothesized that participants would be more likely to align their confidence with the non-verbal social feedback when it was provided by the more competent individual. We also expected confidence ratings to be higher following positive feedback (agreement) than following disagreement or uncertainty, as our previous results revealed that people were more susceptible to positive/concordant feedback.

In addition to confidence ratings, we recorded participants' facial muscle activity as an implicit marker of feedback processing. When exposed to facial expressions, people typically display RFRs, which are detectable by EMG and usually match the perceived expression (Bush et al., 1986; Dimberg and Thunberg, 1998; Dimberg et al., 2000; Hess and Blairy, 2001; McIntosh, 2006). The exact mechanisms underlying RFRs remain hotly debated. Several authors have suggested that RFRs reflect the internal simulation of perceived emotions, which facilitates their understanding. In line with this notion, RFRs have been shown to play a role in the elaboration of judgments about perceived facial expressions (Niedenthal et al., 2001; Oberman et al., 2007) as well as in ratings of one's emotional reaction to others' facial expressions (e.g., Sato et al., 2013). Most importantly for the present work, RFRs have been reported to be modulated by the subjective relevance or meaningfulness of the facial expression (Soussignan et al., 2013, 2015). For example, RFRs typically increase for in-group as compared to out-group members (McHugo et al., 1991; Bourgeois and Hess, 2008; van der Schalk et al., 2011), and when there is potential for interaction with others (Grèzes et al., 2013). Here, we expected to observe greater RFRs to the facial expressions of the competent agent, who by definition provides more reliable feedback than the incompetent agent.

We recorded the EMG activity of participants' zygomaticus major (the facial muscle that pulls the corners of the mouth upward into a smile) and corrugator supercilii (the facial muscle that pulls the brows together). Viewing positive facial expressions usually elicits increased zygomaticus major activity, whereas negative facial expressions evoke increased corrugator activity (Dimberg, 1982; Wild et al., 2001; Sonnby-Borgström, 2002; Dewied et al., 2006; Weyers et al., 2006; Sato et al., 2008; Schilbach et al., 2008; Schrammel et al., 2009; Dimberg et al., 2011; Moody and McIntosh, 2011; Rymarczyk et al., 2011). Here, we expected to observe RFRs in the form of increased zygomaticus activity in response to expressions of agreement (as compared to disagreement and uncertainty), and increased corrugator activity in response to disagreement (as compared to agreement and uncertainty). Furthermore, as markers of the implicit monitoring of competence, we expected RFRs to be enhanced for the competent as compared to the incompetent individual. We also expected these RFR effects to be most pronounced in zygomaticus activity, reflecting the particular susceptibility to positive social feedback.

# Materials and Methods

## Participants

Twenty-eight volunteers participated in the experiment (14 females; mean age = 24.86 ± 0.79). All reported normal or corrected-to-normal vision and had no neurological or psychiatric history. Each participant gave written informed consent and received a compensation of 20€. The study was approved by the local research ethics committee (CPP Ile de France III, approval no. Am5569-1-2489). Data from three participants were excluded from the analysis: two failed to correctly identify the competent individual, and one reported extreme confidence rating values (see Data Analysis).

## Stimuli

### Dots Display Stimuli

The first-order 2AFC task was a number estimation task in which participants judged whether target displays contained more or fewer dots than a reference display. The displays consisted of arrays of white dots (10-pixel diameter) randomly distributed on a black disk (320-pixel diameter), with at least 10 pixels separating the dots from each other. For target displays, the number of dots varied from 32 to 68 in steps of 4, while the number of dots was fixed at 50 for the reference display. Task difficulty was manipulated by varying the difference in the number of dots between the target and reference displays. This difference ranged from ±2 to ±18 dots in five steps (±2, ±6, ±10, ±14, ±18), yielding five levels of task difficulty. Using a program in Matlab, we randomly generated 48 different target displays for each distance, as well as 10 different reference displays.
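The display-generation step can be sketched as rejection sampling: candidate dot centers are drawn uniformly inside the disk, and a candidate is kept only if it respects the minimum separation from every dot already placed. The sketch below is a hypothetical Python re-implementation, not the authors' Matlab program, which is not described. It assumes that "at least 10 pixels separating the dots" means a 10-px edge-to-edge gap (so centers lie at least 20 px apart) and that every dot must lie entirely inside the disk. The `difficulty_level` mapping (largest distance = level 1, smallest = level 5) is inferred from the text, where levels 1 and 2 are the easy trials and level 5 the hardest.

```python
import math
import random

DISK_DIAMETER = 320     # px, black disk (from the text)
DOT_DIAMETER = 10       # px, white dots (from the text)
MIN_GAP = 10            # px, ASSUMED edge-to-edge gap between neighboring dots
REFERENCE_COUNT = 50    # dots in the reference display
TARGET_COUNTS = list(range(32, 69, 4))   # 32, 36, ..., 68


def make_display(n_dots, rng, max_attempts=100_000):
    """Place n_dots non-overlapping dots uniformly inside the disk (rejection sampling)."""
    max_center_r = DISK_DIAMETER / 2 - DOT_DIAMETER / 2   # keep the whole dot inside the disk
    min_center_dist = DOT_DIAMETER + MIN_GAP              # required center-to-center distance
    dots = []
    for _ in range(max_attempts):
        if len(dots) == n_dots:
            break
        # The square root of a uniform variate gives points uniform over the disk's area.
        r = max_center_r * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x, y = r * math.cos(theta), r * math.sin(theta)
        if all(math.hypot(x - px, y - py) >= min_center_dist for px, py in dots):
            dots.append((x, y))
    if len(dots) < n_dots:
        raise RuntimeError("could not place all dots; try another seed")
    return dots


def difficulty_level(n_dots):
    """Map a target count to difficulty 1 (|distance| = 18) ... 5 (|distance| = 2)."""
    distance = abs(n_dots - REFERENCE_COUNT)   # one of 2, 6, 10, 14, 18
    return 5 - (distance - 2) // 4
```

For example, `make_display(68, random.Random(0))` returns 68 (x, y) centers in pixels relative to the disk center; calling it 48 times per target count would give a stimulus set of the size described above.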

### Social Stimuli

The social stimuli consisted of 1.5-s videos created for the purpose of the experiment (see Supplementary Methods). Individuals without distinctive features (e.g., mustache, piercing, jewelry) were filmed wearing black t-shirts against a white background under identical lighting conditions. They were filmed individually, and frames contained frontal views extending from the top of the head to the shoulders. Several videos were filmed for each expression: agreement (i.e., smiling and nodding), disagreement (i.e., frowning and head shaking), and uncertainty (i.e., raising eyebrows and shrugging). These non-verbal facial expressions were accompanied by head movements and were filmed in an ecologically valid context (see Supplementary Methods). A series of pre-tests was conducted to select videos of two pairs of individuals (one female pair and one male pair; mean age = 32.75 years, *SD* = 2.22) who were matched for perceived competence and trustworthiness. We selected three videos of each individual, one for each expression (agreement, disagreement, or uncertainty), and paired videos that pre-tests had shown to be equally persuasive and emotional in the context of our experimental task. Each participant was presented with only one pair of individuals, either the male or the female pair.

## Experimental Procedure

### Procedure

Participants were tested individually in a room where they were seated approximately 90 cm from a 17-inch LCD monitor. Stimulus presentation was controlled using E-Prime 2.0 software (Psychology Software Tools, Inc., Pittsburgh, PA, USA). Each trial began with a 400-ms fixation cross, followed by a brief 100-ms target display. The symbols "−" and "+" appeared on the left and right sides of the screen, respectively, 300 ms after target offset and remained onscreen until the participant responded. Using one of two response buttons, participants decided whether the target display contained more ("+") or fewer ("−") dots than the reference display. After responding, participants were presented with a 1.5-s video of a social agent displaying an expression and were asked to indicate their confidence in their response on a scale from 0 (not confident at all) to 100 (very confident). The scale remained on screen for up to 3000 ms, during which participants could respond (**Figure 1**). Each participant completed 10 blocks of 24 trials, and each block began with a reference display presented for 3000 ms. Participants had to keep the reference display in mind to evaluate the upcoming target stimuli.

### Belief Manipulation

Participants were led to believe that the individuals seen in the videos were actual participants who had previously taken part in the same experiment. On each trial, the individual's expression supposedly reflected that PP's answer to the very same dot question. We explained that an expression of agreement would be shown when the PP had given the same response as the participant. If the PP had given the opposite response, the participant would see an expression of disagreement. If the PP had not responded, an expression of uncertainty would appear. To make this cover story credible, participants were filmed before beginning the experiment (wearing a black t-shirt and expressing agreement, disagreement, or uncertainty) and were told that the videos would be used in future sessions of the experiment.

### Competence Manipulation

At the beginning of the experiment, participants were shown a picture of the two PPs they would see during the experimental session, together with each PP's fictive success rate on the task. These scores were manipulated to present one PP as more competent at the task than the other: they were randomly generated to lie between 94.0 and 98.9% for the "competent" PP and between 61.0 and 65.9% for the "incompetent" PP. The two PPs used in each experimental session were always of the same gender; half of the participants were presented with two female PPs and the other half with two male PPs. The assignment of competence levels to PPs was counterbalanced across participants.
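The score and counterbalancing manipulation can be sketched as follows. This is a hypothetical illustration: the text gives only the ranges, so drawing uniformly on a 0.1-point grid is an assumption, and so is the alternation scheme in the illustrative helper `assign_roles`, which stands in for whatever counterbalancing list the authors actually used.

```python
import random


def fictive_success_rates(rng):
    """Draw displayed success rates (%) on an ASSUMED 0.1-point grid within the stated ranges."""
    competent = rng.randint(940, 989) / 10     # 94.0 ... 98.9 (randint is inclusive)
    incompetent = rng.randint(610, 659) / 10   # 61.0 ... 65.9
    return competent, incompetent


def assign_roles(participant_index, pair):
    """Hypothetical counterbalancing: alternate which face of a same-gender pair is 'competent'."""
    first, second = pair
    if participant_index % 2 == 0:
        return {"competent": first, "incompetent": second}
    return {"competent": second, "incompetent": first}
```

With this scheme, consecutive participants see the same two faces with opposite competence labels, so face identity and competence are not confounded across the sample.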

### Block Distribution

To reinforce the association between PPs and competence levels, the experiment began with two blocks of easy trials (Difficulty 1 and 2). In these induction blocks, the "competent" PP gave correct feedback on 80% of trials (agreement if the participant gave the correct answer, disagreement if not), incorrect feedback on 10% of trials (disagreement if the participant gave the correct answer, agreement if not), and expressed uncertainty on the remaining 10%. By contrast, the "incompetent" PP gave correct feedback on only 20% of trials, incorrect feedback on 40%, and expressed uncertainty on the remaining 40%. Only easy trials were included so that participants could easily discern correct from incorrect feedback. During these induction blocks, participants performed 12 trials per difficulty level (2) and PP (2), resulting in 48 trials randomly distributed between the two blocks. After the two induction blocks, participants performed three experimental blocks comprising harder trials (Difficulty 3, 4, and 5). In these blocks, both PPs provided random feedback, with equal numbers of agreement, disagreement, and uncertainty expressions (i.e., one third each). The three experimental blocks immediately followed the two induction blocks so that participants would not notice the change in feedback distribution. The experimental blocks comprised 12 trials per difficulty level (3) and PP (2), and all 72 trials were randomly divided across the three blocks. The entire procedure (two induction blocks followed by three experimental blocks) was performed twice, so participants completed 240 trials in total, of which 144 (experimental trials) were analyzed.
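The trial arithmetic and feedback rules above can be made concrete with a short sketch. It is a hypothetical reconstruction, not the authors' E-Prime script: `expression_for` encodes the induction-block rule that maps a feedback type (correct, incorrect, uncertain) onto the expression shown given the participant's accuracy, and `experimental_blocks` builds the 72 experimental trials with expressions independent of accuracy. Balancing the three expressions within every PP × difficulty cell (4 of each per 12 trials) is an assumption; the text states only that they were equally frequent. The sketch does not draw the induction feedback types, because the text does not say how the 80/10/10 and 20/40/40 proportions were realized over 24 trials per PP.

```python
import random

EXPRESSIONS = ("agreement", "disagreement", "uncertainty")
PPS = ("competent", "incompetent")

# Trial counts stated in the text
INDUCTION_TRIALS = 12 * 2 * 2       # 12 per difficulty (1, 2) x PP -> 48
EXPERIMENTAL_TRIALS = 12 * 3 * 2    # 12 per difficulty (3, 4, 5) x PP -> 72
TOTAL_TRIALS = 2 * (INDUCTION_TRIALS + EXPERIMENTAL_TRIALS)   # procedure run twice -> 240
ANALYZED_TRIALS = 2 * EXPERIMENTAL_TRIALS                     # -> 144


def expression_for(feedback, participant_correct):
    """Induction blocks: turn a feedback type into the expression displayed by the PP."""
    if feedback == "uncertain":
        return "uncertainty"
    if feedback == "correct":      # PP sides with the objectively correct answer
        return "agreement" if participant_correct else "disagreement"
    if feedback == "incorrect":    # PP sides with the objectively wrong answer
        return "disagreement" if participant_correct else "agreement"
    raise ValueError(f"unknown feedback type: {feedback}")


def experimental_blocks(rng, n_per_cell=12, n_blocks=3):
    """Experimental blocks: one third of each expression, ASSUMED balanced per PP x difficulty cell."""
    trials = [
        (pp, difficulty, EXPRESSIONS[i % len(EXPRESSIONS)])
        for pp in PPS
        for difficulty in (3, 4, 5)
        for i in range(n_per_cell)
    ]
    rng.shuffle(trials)                       # random order across the whole set
    size = len(trials) // n_blocks            # 72 // 3 = 24 trials per block
    return [trials[b * size:(b + 1) * size] for b in range(n_blocks)]
```

Each of the three returned blocks holds 24 trials, which matches the stated 10 blocks of 24 trials once the two 24-trial induction blocks are added and the whole sequence is run twice.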

### Post-Test

At the end of the experiment, pictures of the PPs with neutral expressions were presented together on the screen, and participants had to choose the more "competent" one. Next, each PP was presented individually, and participants were asked to indicate ("yes" or "no") whether they thought the PP had influenced their confidence, and to what extent (on a scale of 0–3). Participants were also asked to rate each PP's competence and trustworthiness on a scale of −5 ("not at all") to 5 ("entirely").

## Electrophysiological Data Recording and Reduction

We collected surface facial EMG recordings from each participant using an ADInstruments acquisition system (PowerLab 8/30, ML870/P). Because the right hemisphere has been shown to be dominant for spontaneous emotional facial reactions (Dimberg and Petterson, 2000), the EMG electrodes were placed on the left side of each participant's face.

Throughout the experiment, we continuously recorded *corrugator supercilii* (eyebrow frowning) and *zygomaticus major* (elevation of the mouth corners) muscle activity using Sensormedics 4-mm shielded Ag/AgCl miniature electrodes. Each muscle's activity was recorded by two electrodes placed over the muscle about 1.25 cm apart (center to center) and roughly parallel to the muscle. The ground electrode was placed dorsally at the base of the neck. Before the electrodes were attached, target sites were cleaned with alcohol and rubbed to reduce inter-electrode impedance. The signal was sampled at 2 kHz with an online band-pass filter of 500 Hz and then integrated.

Because RFRs have been reported to occur within the first second of face presentation (Dimberg and Thunberg, 1998; Dimberg et al., 2000; Moody et al., 2007), for each trial of the experimental blocks we extracted the EMG data from 300 ms before to 1000 ms after video onset. Integral values were then subsampled offline at 10 Hz, yielding 100-ms time bins. EMG trials with a noisy baseline (more than 2 SD above or below the mean) were rejected.

Next, the data were log-transformed [Ln (μV)] to reduce the impact of extreme values and standardized (transformed to *Z*-scores) for each participant and for each muscle. Finally, the baseline value (300 ms before video onset) was subtracted from each trial.
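The EMG reduction steps above can be sketched with NumPy for one participant and one muscle. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: "subsampled at 10 Hz" is implemented as averaging the integrated signal within each 100-ms bin; the baseline is taken as the mean of the three pre-onset bins; the ±2 SD rejection criterion is computed across that participant's trial baselines; and z-scoring uses all retained trials and bins of that muscle. The input is assumed to be strictly positive integrated EMG (in μV) so that the natural log is defined.

```python
import numpy as np

FS_HZ = 2000                            # sampling rate reported in the text
BIN_MS = 100                            # 10-Hz subsampling -> 100-ms bins
PRE_MS, POST_MS = 300, 1000             # epoch: -300 ms to +1000 ms around video onset
N_PRE = PRE_MS // BIN_MS                # 3 baseline bins
N_BINS = (PRE_MS + POST_MS) // BIN_MS   # 13 bins in total (3 baseline + 10 time windows)


def bin_epochs(epochs):
    """Average integrated EMG (trials x samples) into 100-ms bins (ASSUMED averaging)."""
    per_bin = FS_HZ * BIN_MS // 1000    # 200 samples per bin
    n_trials = epochs.shape[0]
    usable = epochs[:, : N_BINS * per_bin]
    return usable.reshape(n_trials, N_BINS, per_bin).mean(axis=2)


def preprocess_muscle(binned):
    """Reject noisy baselines, log-transform, z-score, then subtract each trial's baseline."""
    baseline = binned[:, :N_PRE].mean(axis=1)
    mu, sd = baseline.mean(), baseline.std(ddof=1)
    keep = np.abs(baseline - mu) <= 2 * sd                 # drop trials beyond +/-2 SD
    logged = np.log(binned[keep])                          # Ln(uV); values must be > 0
    z = (logged - logged.mean()) / logged.std(ddof=1)      # z-score over kept trials and bins
    corrected = z[:, N_PRE:] - z[:, :N_PRE].mean(axis=1, keepdims=True)
    return corrected, keep                                 # (n_kept, 10 time windows), mask
```

The ten post-onset columns of the returned array correspond to the ten Time Windows entered into the ANOVAs, and `keep` can be used to compute each participant's rejection rate.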

## Data Analysis

### Behavioral Data

Analyses were conducted on the experimental blocks, which included 144 trials in total. Accuracy and reaction times (RTs) for the dot task were submitted to repeated measures ANOVAs with Difficulty (3, 4, 5) as a within-subject factor. A repeated measures ANOVA was conducted on confidence ratings with Competence (Competent vs. Incompetent), Expression (Agreement vs. Disagreement vs. Uncertainty), and Difficulty (3, 4, 5) as within-subject factors. When the sphericity assumption was not met, degrees of freedom were adjusted using the Greenhouse–Geisser correction (in which case ε and corrected p values are reported). Planned comparisons were performed when main effects or interactions were observed.
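The Greenhouse–Geisser correction rescales both degrees of freedom of an F test by an estimate ε of the departure from sphericity. For a single within-subject factor with k levels, ε can be computed from the sample covariance matrix S of the k repeated measures projected onto k − 1 orthonormal contrasts C: ε = (tr V)² / ((k − 1) · tr(V²)), with V = CᵀSC; ε lies between 1/(k − 1) and 1. The sketch below covers only a one-way design (the analyses above involve several factors) and builds the contrasts by QR decomposition, which is our own choice; it is not the statistics package used in the study.

```python
import numpy as np


def rm_anova_gg(data):
    """One-way repeated-measures ANOVA with a Greenhouse-Geisser epsilon.

    data: array of shape (n_subjects, k_levels).
    Returns F, epsilon, and the corrected numerator and denominator df.
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj           # condition x subject residual
    df1, df2 = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df1) / (ss_err / df2)

    # Orthonormal contrasts orthogonal to the unit vector: QR of [1, e_1, ..., e_{k-1}].
    q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, : k - 1]]))
    C = q[:, 1:]
    V = C.T @ np.cov(data, rowvar=False) @ C
    eps = np.trace(V) ** 2 / (df1 * np.trace(V @ V))
    return F, eps, eps * df1, eps * df2
```

A corrected p value could then be obtained from the F distribution with the corrected degrees of freedom, for example with `scipy.stats.f.sf(F, df1_corr, df2_corr)`. With only two levels, ε is always 1 and F equals the square of the paired t statistic.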

We conducted *t*-tests to compare the two PPs on the variables recorded during the post-test: Competence, Trustworthiness, and the degree of influence of the PPs. The post-test indicated that two of the 28 participants did not explicitly recognize the competent agent, and another participant reported an extreme value for confidence (*>*2 SD above the mean). All three participants were excluded from the analyses.

### Electrophysiological Data

Participants with a high rate of trial rejection (2 SD above the mean rate; i.e., *>*25%) were excluded from the analyses of zygomaticus (n = 2) and corrugator (n = 2) activity. The data for each muscle were submitted separately to a repeated measures ANOVA with Competence (Competent vs. Incompetent), Expression (Agreement vs. Disagreement vs. Uncertainty), Difficulty (3, 4, 5), and Time Windows (10) as within-subject factors. When the sphericity assumption was not met, degrees of freedom were adjusted using the Greenhouse–Geisser correction (in which case ε and corrected p values are reported). Planned comparisons were performed when main effects or interactions were observed.

# Results

## Behavioral Results

### First-Order Task

The ANOVAs conducted on performance in the dot task showed a main effect of Difficulty on accuracy [*F*(4,24) = 216.32, ε = 0.73, *p*corr *<* 0.0001] and on RTs [*F*(4,24) = 36.02, ε = 0.43, *p*corr *<* 0.0001]. Planned comparisons showed that accuracy decreased (all *p*s *<* 0.05) and RTs increased (all *p*s *<* 0.05) with task difficulty (see **Table 1**).

TABLE 1 | Accuracy and response time at each level of difficulty for the first-order task (with SD).


### Confidence

The ANOVA indicated a main effect of Difficulty [*F*(2,48) = 38.44, ε = 0.69; *p*corr *<* 0.0001]: confidence decreased as task difficulty increased (all *p*s *<* 0.005). A main effect of Expression was also observed [*F*(2,48) = 22.01, ε = 0.72; *p*corr *<* 0.0001]. Agreement led to higher confidence than Disagreement [*t*(24) = 5.82; *p <* 0.0001; mean effect size = 7.75 ± 1.51] and Uncertainty [*t*(24) = 5.13; *p <* 0.0001; mean effect size = 6.27 ± 1.30]. Disagreement and Uncertainty did not differ significantly [*t*(24) = 1.85; *p >* 0.05; mean difference = 1.48 ± 0.79], suggesting that Disagreement did not impact confidence in our experimental design. Importantly, we observed an interaction between Competence and Expression [*F*(2,48) = 10.49, ε = 0.91; *p*corr *<* 0.0001], indicating that agreement expressed by the competent PP had a greater impact on confidence than agreement expressed by the incompetent PP [*t*(24) = 3.78; *p <* 0.001; **Figure 2**].

### Post-Test

The *t*-tests revealed that, after performing the task, participants perceived the competent PP as more competent [*t*(24) = 9.05, *p <* 0.0001] and also more trustworthy [*t*(24) = 6.03, *p <* 0.0001] than the incompetent PP (**Figure 3**). Moreover, 88% of participants reported having been influenced by the competent PP, whereas 52% reported having been influenced by the incompetent PP. The competent PP was also reported to have influenced participants' confidence more strongly than the incompetent PP [*t*(24) = 4.86, *p <* 0.0001]. One-sample *t*-tests against zero confirmed that participants reported having been influenced by both the competent and the incompetent PPs [all *t*(24) *>* 4.0, all *p*s *<* 0.001; **Figure 3**].

## Electrophysiological Results

### Zygomaticus

The ANOVA revealed a main effect of Time Windows [*F*(9,198) = 2.3, ε = 0.52, *p*corr *<* 0.05] but no main effect of the other factors. However, a three-way interaction among Competence, Expression, and Difficulty was observed [*F*(4,88) = 2.6, ε = 0.90, *p*corr *<* 0.05]. Agreement expressed by the competent PP induced greater zygomaticus activity than Disagreement expressed by that same PP. This effect was largest at difficulty level 5 [i.e., hardest trials; *t*(22) = 2.16; *p <* 0.05], where Agreement also induced greater activity than Uncertainty [*t*(22) = 2.36; *p <* 0.05]. The difference between Agreement and Disagreement approached significance at difficulty level 4 [*t*(22) = 1.74; *p* = 0.09] but disappeared at difficulty level 3 [i.e., easiest trials; *t*(22) = 1.44; *p >* 0.1]. Importantly, these modulations were not observed for the expressions of the incompetent PP (all *p*s *>* 0.2; **Figure 4**).

### Corrugators

The ANOVA revealed a main effect of Time Windows [*F*(9,198) = 6.27, ε = 0.46, *p*corr *<* 0.01] but no main effect of the other factors and no interactions between factors on corrugator activity (all *F*s *<* 2.28; all *p*s *>* 0.1). We had expected to find increased corrugator activity in the disagreement condition, as previous experiments have shown an impact of negative emotional displays on this muscle's activity (Larsen et al., 2003). There are two possible explanations for this lack of modulation. First, the lack of impact of disagreement on confidence suggests that participants did not judge this expression to be particularly relevant to the task. Second, corrugator activity is known to be sensitive to task difficulty (van Boxtel, 2010). Because the dot task was immediately followed by social feedback, task difficulty may have had a carryover effect on corrugator activity that would contaminate the effect of feedback. This last possibility limits any further interpretation of corrugator activity patterns.

# Discussion

In this experiment, we investigated whether non-verbal social feedback provided by sources with varying epistemic reliability modulates metacognitive evaluations. To this end, we explored how subjective confidence in performance on a first-order task and RFRs to non-verbal social feedback varied as a function of the reliability of the social source. The results indicated that individuals always integrated social agreement into the elaboration of their metacognitive evaluation, even when mechanisms for epistemic vigilance alert participants to the risk of being misinformed. Albeit to a lesser extent, agreement provided by an unreliable source still impacted participants' post-decision confidence ratings. However, when asked explicitly, participants were able to distinguish competent informants from incompetent ones. In addition, the RFRs indicated an implicit processing of the competence attributed to the social source.

Regarding subjective confidence, our findings revealed a pattern similar to that reported by Eskenazi et al. (in revision). Participants adjusted their confidence ratings as a function of the information provided by another individual's non-verbal cues. In Eskenazi et al. (in revision, Experiment 2), participants' confidence ratings were higher after another person (presented as a PP) had oriented his/her gaze toward their response than in the absence of social cues. Here, we found that subjective confidence was higher after an individual expressed agreement than after disagreement or uncertainty. Eskenazi et al. (in revision, Experiment 2) also found participants' confidence ratings to be lower after another person had oriented his/her gaze toward the opposite response than when there was no social cue. Here, however, perceived disagreement was not associated with lower confidence than perceived uncertainty. One possible explanation is that perceived uncertainty was not neutral and may itself have been sufficient to lower participants' confidence. We consider this unlikely, however, because any effect of perceived disagreement or uncertainty should then have been modulated by source reliability, just as the effect of perceived agreement was. The absence of such modulation by source reliability points to the view that participants did not judge others' disagreement and uncertainty to be task-relevant, so that these expressions did not impact confidence. It may further suggest that others' gaze direction is processed in a more reflexive manner than the facial expressions used in the current study. It is well known that, from as early as 3 months of age, human infants automatically orient their attention in the direction of an adult's gaze (Hood et al., 1998; Farroni et al., 2000; Senju et al., 2006). In adults, this mechanism has been shown to automatically affect evaluations of objects in the environment that are looked at by others (Bayliss et al., 2006, 2007; Manera et al., 2014). Incongruent gaze direction might therefore be more difficult to ignore than more complex disagreeing facial expressions. In any case, in Eskenazi et al. (in revision), the effect of congruent gaze was significantly larger than the effect of incongruent gaze. Together with the present results, these findings support the view that positive/concordant social information (i.e., agreement) has a stronger effect on confidence judgments than negative/disapproving social information (i.e., disagreement).

Two main (non-exclusive) mechanisms may account for this agreement effect. First, the particular sensitivity to agreement may reflect individuals' biased tendency to see themselves in a positive light (Leary, 2007) and to expect positive rather than negative feedback (Hepper et al., 2011). This tendency has been suggested to help people maintain a positive self-concept (Taylor and Brown, 1988). In other words, the susceptibility to social agreement reported here may reflect a self-serving bias: individuals seemed to reject the validity of disagreeing feedback, focusing on their potential successes while overlooking their potential failures. A self-serving bias is well known to heavily influence judgment processes (Fiske et al., 2007); here, we suggest that it can also influence metacognitive evaluations of past decisions. Second, agreement may be automatically appraised as more reliable than disagreement or uncertainty. It is well known that positive feelings in one area cause other traits to be viewed positively, a form of confirmation bias called the "halo effect" (Thorndike, 1920; Asch, 1946; Nisbett and Wilson, 1977). Consistent with this, when reliability was not manipulated, all of our videos were judged to be more competent, persuasive, and trustworthy when expressing agreement than when expressing disagreement or uncertainty (see Supplementary Methods-Pre-test 3).

Importantly, the impact of agreement (i.e., positive social feedback) on subjective confidence ratings was greater when it was provided by a competent rather than an incompetent social source. This demonstrates that participants were sensitive to the epistemic reliability of the social source, which modulated the weight they assigned to the social information when elaborating their metacognitive evaluations. Moreover, the post-tests showed that the "competent" individual was rated as both more competent and more trustworthy. This suggests that competence judgments automatically led participants to calibrate trust as well (Fiske et al., 2007). However, this effect may not be specific to competence. The halo effect predicts that the competent agent was not only judged as more trustworthy but was also perceived more positively overall than the incompetent agent, and this general positivity may have mediated the greater impact of the competent agent's agreement on confidence. If so, in-group members (who are known to be appraised more positively than out-group members; e.g., Molenberghs, 2013) should have a similar effect on confidence as the competent agent in our study.

Furthermore, the electrophysiological results showed zygomaticus activity in response to agreement expressed by the competent individual, but only on difficult trials, i.e., when participants were uncertain about their performance on the first-order task. This effect emerged at the medium level of difficulty (Difficulty 4) and peaked at the highest level (Difficulty 5). This suggests that RFRs depend not only on the reliability attributed to the source but also on the perceiver's informational needs. Our physiological results thus demonstrate, first, that participants implicitly processed the reliability of the source and, second, that they processed social feedback as a function of its relevance to their current goal, such that social cues with high informative value amplified the EMG activity.

RFRs are thought to predominantly reflect non-affective motor mimicry (Bavelas et al., 1986; Chartrand and Bargh, 1999), which initially evolved to identify the emotional expression of perceived faces (Hatfield and Rapson, 1993; Niedenthal et al., 2005; McIntosh, 2006) and then to encourage affiliation by favoring liking (Lakin et al., 2003). However, it has also been suggested that RFRs reflect the emotional readout of the perceived facial expression (Cacioppo et al., 1986; Buck, 1994; Dimberg and Thunberg, 1998; Grèzes et al., 2013), which may vary substantially as a function of its relevance to the self (Grèzes et al., 2013; Soussignan et al., 2013, 2015). Our results best fit the second hypothesis. In this study, the amplified zygomaticus activity likely reflects participants' sense that social agreement signals a higher probability of success in the task than anticipated (Carver and Scheier, 1990). In other words, it might correspond to the positive experience of having one's response confirmed by a competent individual, an experience that intensifies as uncertainty about one's prior performance increases. This pattern is thus consistent with reports that zygomaticus activity increases with the reward value attributed to smiling faces (Sims et al., 2012) and with the arousal level of pleasant facial expressions (Fujimura et al., 2010). The data further suggest that the zygomaticus responses we observed were contingent on participants' expectations on each trial.

Interestingly, the social-functional perspective assumes that emotions enable individuals to respond to the situation at hand (Keltner and Haidt, 1999). One might thus expect the positive experience reflected in zygomaticus activity to contribute to the participant's confidence judgment and, in turn, to increase confidence. Intriguingly, however, the behavioral data did not follow the same pattern as the RFRs: positive feedback from both the competent and the incompetent source affected confidence independently of task difficulty. We might thus speculate that the confidence modulations we observed reflect an automatic association between another person's approval and higher subjective confidence in one's own decision. Another person's endorsement of one's prior response may be motivationally strong enough to raise confidence in that response in a non-analytic manner, even when the source has been presented, and appraised, as unreliable. On this view, agreement automatically increases participants' confidence, whereas the emotional response reflected in zygomaticus activity depends on the appraisal of the context. This proposal implies that inhibiting RFRs during our experiment would not alter the effect of agreement on confidence. The lack of a clear dissociation between the two effects may reveal a subtle role of the emotional reaction (reflected in zygomaticus activity) in mediating the impact of social agreement on participants' confidence.

It is also noteworthy that in the post-test, participants reported having been influenced by the incompetent individual, even though they rated him/her as incompetent and untrustworthy. This is in line with the finding that the implicit processing of social information may be dissociated from explicit beliefs (Chaiken and Trope, 1999; Forgas et al., 2003; Hassin et al., 2005; Bargh, 2006). This further suggests that participants were partly aware of their failure to screen social information as a function of its reliability. They seem to always integrate agreement into the elaboration of their metacognitive evaluation.

Previous studies have shown that individuals are irrationally susceptible to social feedback, treating it indiscriminately as reliable information (Bahrami et al., 2010; Eskenazi et al., in revision). This may be because social feedback is reliable more often than not in natural settings. In line with this notion, it has been claimed that cooperation has become an evolutionarily stable strategy that motivates the perception of other participants as knowledgeable and trustworthy partners (Tomasello, 2014). However, although generally adaptive, such social susceptibility can also be detrimental, compromising performance (Bahrami et al., 2010) as well as the accuracy with which performance is evaluated (i.e., metacognitive sensitivity; Eskenazi et al., in revision). The present study extends these findings by demonstrating that individuals are particularly susceptible to positive social feedback, even when they are aware of its unreliability. We propose that this apparently irrational tendency to take on board another's confirmation when forming metacognitive evaluations is driven by the motivation to maintain a positive self-concept. One could further speculate that such a self-serving bias has implications for goal achievement. By maintaining a positive self-concept and enhancing confidence, positive social feedback may help individuals engage in the task and devote resources to it, which could eventually improve their success (Custers and Aarts, 2005). Likewise, positive reinforcement has been shown to strongly influence learning (Jones et al., 2011).

## Conclusion

Even though we are able to distinguish reliable from unreliable informants both implicitly and explicitly, when elaborating metacognitive evaluations of our past decisions, we are inclined to treat social feedback as reliable when it is confirmatory. Our results further highlight that negative social feedback is less effective at altering one's confidence in oneself. This positively biased processing of social information is robust and may play an instrumental role in social learning, which future investigations should address.

# Acknowledgments

This research was made possible by an advanced grant, "Dividnorm" project # 269616, from the European Research Council. AJ is supported by a "DIM cerveau et pensée" fellowship provided by Region Ile-de-France.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01385


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Jacquot, Eskenazi, Sales-Wuillemin, Montalan, Proust, Grèzes and Conty. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **The duality of gaze: eyes extract and signal social information during sustained cooperative and competitive dyadic gaze**

*Michelle Jarick <sup>1</sup> \* and Alan Kingstone <sup>2</sup>*

*<sup>1</sup> Neurocognition of Attention and Perception Lab, Department of Psychology, MacEwan University, Edmonton, AB, Canada, <sup>2</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada*

#### *Edited by:*

*Rossana Actis-Grosso, Università degli Studi di Milano-Bicocca, Italy*

#### *Reviewed by:*

*Giuseppe Di Pellegrino, University of Bologna, Italy*

*Luis R. Manssuer, Bangor University, UK*

#### *\*Correspondence:*

*Michelle Jarick, Neurocognition of Attention and Perception Lab, Department of Psychology, MacEwan University, 10700 104 Avenue, Edmonton, AB T5J 4S2, Canada*

*jarickm@macewan.ca*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 05 June 2015; Accepted: 07 September 2015; Published: 23 September 2015*

#### *Citation:*

*Jarick M and Kingstone A (2015) The duality of gaze: eyes extract and signal social information during sustained cooperative and competitive dyadic gaze. Front. Psychol. 6:1423. doi: 10.3389/fpsyg.2015.01423*

In contrast to non-human primate eyes, which have a dark sclera surrounding a dark iris, human eyes have a white sclera that surrounds a dark iris. This high-contrast morphology allows humans to determine quickly and easily where others are looking and to infer what they are attending to. In recent years, an enormous body of work has used photos and schematic images of faces to study these aspects of social attention, e.g., the selection of the eyes of others and the shift of attention to where those eyes are directed. However, evolutionary theory holds that humans did not develop a high-contrast morphology simply to use the eyes of others as attentional cues; rather, they sacrificed camouflage for communication, that is, to signal their thoughts and intentions to others. In the present study we demonstrate the importance of this signaling function, taking as our starting point the hypothesis that a cornerstone of non-verbal communication is eye contact between individuals and the time for which it is held. In a single, simple study we show experimentally that the effect of eye contact can be quickly and profoundly altered merely by having participants who had never met before play a game in a cooperative or competitive manner. After the game, participants were asked to make eye contact for a prolonged period of time (10 min). Those who had played the game cooperatively found this terribly difficult to do, repeatedly talking and breaking gaze. In contrast, those who had played the game competitively were able to stare quietly at each other for sustained periods. Collectively, these data demonstrate that when looking at the eyes of a real person, one both acquires information from and signals information to that person. This duality of gaze is critical to non-verbal communication, with the nature of that communication shaped by the relationship between individuals, e.g., cooperative or competitive.

**Keywords: gaze, attention, cooperation, competition, eye contact**

# **Introduction**

The human eye's morphology is unique among primates in that it possesses a white sclera surrounding a darker iris and pupil. As a result of this high visual contrast, and unlike in nonhuman primates, it is easy to determine where a human being is looking. One provocative proposal is that the high contrast polarity of the human eye is an evolutionary adaptation that arose after the human and chimpanzee lineages split approximately six million years ago, and that this singular morphological adaptation served as a catalyst for the emergence of new forms of communication (Kobayashi and Kohshima, 1997). That is, unlike other primates, humans sacrificed the camouflage of their looking behavior for communication. As a result, we can determine quickly, quietly, and with remarkable fidelity where someone else is looking, and this has a profound impact on our own behavior. For instance, much research suggests that the contrast polarity of the eyes can influence joint attention, such that human attention is oriented in the same direction as another's gaze (Friesen and Kingstone, 1998; Driver et al., 1999). Moreover, Ricciardelli et al. (2009) have shown that reversing the contrast polarity of the eyes disrupts the perception of, and response to, another's gaze, supporting the importance of this factor in joint attention.

While a tremendous amount of research has examined how humans discriminate and orient to the eyes of others, typically using photographs or schematic faces (e.g., Friesen and Kingstone, 1998; Hietanen and Leppanen, 2003), there is a recent and growing appreciation in the field that the high contrast between iris and sclera does not exist only to support one's ability to read the eyes of others as attentional cues. Rather, it also serves to signal one's internal states and intentions to others (see Risko et al., 2012; Laidlaw et al., in press, for reviews). The following recent studies illustrate this point.

Wu et al. (2014) investigated whether, and when, humans use gaze to signal information to one another in a natural situation between two individuals: eating. In a series of three experiments, they established (1) that there is a social norm to look away when someone begins to take a bite, (2) that people are more likely to look down at their food just before taking a bite, and, pertinent to this paper, (3) that when one person looks down, signaling that a bite is forthcoming, the other person responds to that signal by looking away. These data suggest that natural gaze signaling occurs in social contexts (e.g., while sharing a meal), is read by another person, and can trigger a gaze response that differs from gaze following during joint attention. That is, the partner at the meal does not look down at the food or directly at the eater as a bite is about to be taken, but rather looks away in a manner consistent with the social norm (see also Wu et al., 2013).

More recently, Gobel et al. (2015) demonstrated that participants' beliefs about social context can have a profound effect on the information they signal with their eyes. Participants were filmed while they watched videos of the faces of higher- or lower-ranked people. They believed either that the recordings of their viewing behavior would later be seen by the people depicted in the videos or that no one would see them. When participants believed that the recordings would later be seen by those depicted, they looked less at the eyes of higher-ranked people and more at the eyes of lower-ranked individuals, suggesting that they used their gaze to signal information that was sensitive to social rank (e.g., Foulsham et al., 2010; Cheng et al., 2013).

Collectively, and critically for the aim of the present study, these recent studies suggest that natural real-time social attention between individuals is a two-way street, where each person both signals and reads gaze information (Wu et al., 2014), and that the nature of this gaze signaling changes with the social context between individuals (Gobel et al., 2015). The present study combined these two ideas and put them to a direct test. To do so, we required dyads whose members did not know each other before taking part in the study to hold direct eye gaze well beyond the natural period of a few seconds (Argyle and Dean, 1965). In addition, we manipulated the social context by having participants first play either a competitive or a cooperative game. Our working hypothesis was as follows. If making eye contact with another person brings into play the duality of eye gaze (that is, gaze serves both to read information from, and to signal information to, another person), and if the nature of this gaze communication varies with social context (Wu et al., 2013), then requiring people to hold eye gaze far beyond the comfort zone of a few seconds should amplify the communication occurring between them to the point that it becomes observable in their behavior alone.

Admittedly, this is a rather bold prediction, but it is grounded in the foundational idea that eye gaze (as evidenced by its unique morphology) is an extraordinarily powerful and important visual stimulus for humans, one that supports communication between individuals. Furthermore, as the data from Wu et al. and Gobel et al. reviewed above suggest, the use of eye gaze is extremely sensitive to social context and the norms that reside within it. Indeed, social context exerts such a powerful force on looking behavior that when individuals who do not know each other share a space, there is a marked tendency to avoid looking at each other. This has been demonstrated on several recent occasions (Foulsham et al., 2011; Laidlaw et al., 2011; Gallup et al., 2012). For instance, Laidlaw et al. (2011) demonstrated that people sitting in a waiting room were more likely to look in the direction of a chair when it was empty than when it was occupied by a stranger.

In other words, there is good reason to think that people will find it extremely difficult to look a stranger in the eye for a prolonged period of time. So much so that we hazard a guess: if the reader of this brief report imagines walking into a study, playing a game with a stranger, and then being asked to sit down beside this new partner and, for the next 10 min, stare into her or his eyes while the partner stares deeply back, simply imagining this situation might cause some discomfort. One might also imagine that the nature of the game one first played with one's partner, and the social context it established, could have a tremendous impact on what one feels is being communicated while looking into each other's eyes. For example, if the game was cooperative, the communication might feel positive and unifying, almost intimate, and one might try to break eye contact or talk about something neutral to reduce the intimacy being created. In contrast, if the game and social context with the partner were competitive, the dynamic might feel more like a staring contest.

Consistent with these proposals, research has shown that strangers who wish to limit the level of intimacy reduce the amount of eye contact they make (Argyle and Dean, 1965), whereas those who wish to portray dominance engage in more eye contact (Exline et al., 1965). We therefore predicted that dyads in the cooperative group might limit eye contact to keep intimacy at bay, whereas dyads in the competitive group might maintain eye contact to assert their dominance. The null hypothesis was that this task would be easy and insensitive to any change in social context primed by having participants first play a short game. After all, the participants did not know the person they were partnered with, the preceding game, as we will show, simply involved working on puzzles, and the task itself "just" involved looking into the eyes of another person.

# **Materials and Methods**

#### **Participants**

Forty-two undergraduate students participated (15 males, 27 females; mean age 20 years). Participants were tested in pairs (21 dyads in total). One dyad admitted to having been in class together and was excluded from the analysis; all other participants reported being strangers. There were 10 cooperative dyads (7 males, 13 females; seven same-sex and three opposite-sex) and 11 competitive dyads (8 males, 14 females; seven same-sex and four opposite-sex). All participants gave informed consent before participating, and the Research Ethics Boards approved the study procedures.

#### **Procedure**

Dyads were randomly assigned to either the cooperative or the competitive context. Tangram puzzles are a type of dissection puzzle composed of different geometric pieces that can be combined to form a broad range of shapes and/or patterns; the task is to combine all the pieces to form the requested shape or pattern, then move on to the next one, and so on. In the cooperative context, participants were asked to complete a series of Tangram puzzles together as a team, whereas in the competitive context each participant completed their own Tangram puzzles in a race against the other person. Participants had 5 min to complete as many puzzles as they could.

All participants were seated at the same table, with cooperative dyads beside one another and competitive dyads on different sides of the table (see **Figure 1** for a schematic of the setup). Thus, participants in the competitive context could see each other's progress, which was designed to add to the competitive nature of the situation. Consistent with the different nature of the games, all the dyads in the cooperative task conversed with one another throughout the 5-min session, typically about the task itself (its difficulty, which pieces should go where, etc.). In contrast, it was unusual for the competitive dyads to talk with one another, and they never engaged in any helping or cooperative behaviors, such as assisting the other individual with solving a puzzle. These observations provided us with a solid basis for believing that the two tasks had succeeded in establishing different types of relationships in the two groups, i.e., cooperative or competitive. Although we do not have eye contact and speech data from the cooperative dyads during the game, a recent paper by Ho et al. (2015) did track the eyes of dyads while they engaged in cooperative games and found that eye gaze is used to signal both the end and the beginning of a speaking turn. Specifically, a speaker ends his or her speaking turn with direct gaze at the listener, and the listener then begins to speak while averting their gaze. These data also make the important point that both eye contact and the breaking of eye contact are communicative social signals.

After the puzzle game, participants were asked to relocate to a different section of the room and sit next to one another (about one foot apart). They were instructed to make eye contact for as long as they could within a 10-min period, and it was emphasized that they were not to "cheat," e.g., by closing their eyes or looking at another part of the partner's face. If they broke eye contact, they were to tell the experimenter and simply start again until the 10 min had elapsed. There was no penalty for breaking eye contact (other than extending the total time required to accumulate 10 min of eye contact), and the experimenter was very patient with participants when they did break eye contact. Participants had to stay still in their seats and could only turn their heads toward their partner to make eye contact. We reasoned that having participants sit side by side would maximize the physical proximity between them in a natural way (e.g., akin to sitting on a bus) and ensure that, when their heads were turned, they would be very close to one another (see **Figure 2**). Because a head turn of this nature is effortful, and the act is therefore unquestionably volitional, we reasoned that it would only further enhance the gaze signal. These were the only constraints on participants; they were otherwise free to talk, smile, laugh, etc.

Eye contact was evaluated using three different sources. The first source was the participants themselves: they were explicitly instructed not to "cheat" and to self-report when they felt eye contact was broken. The second source was the experimenter, who was trained to watch participants and stop them if he detected a break in eye contact, e.g., a look elsewhere on the face of the participant's partner. The third source was video recorded with three HD Sony camcorders (two each capturing one participant's face and one capturing the interaction of both participants). The video was analyzed offline (at 1080p resolution) by two independent coders (author MJ and a research assistant) who were blind to the cooperative and competitive conditions.

**FIGURE 2 | Example of the eye contact phase of the experiment.** Participants were seated in close proximity, akin to sitting on a bus or next to someone in a classroom.

# **Results**

The videos were coded for four behavioral markers: gaze, smiling, laughing, and talking. Inter-rater reliability for the proportion of time spent in each behavior was high (*r* = 0.99 for eye contact, *r* = 0.82 for talking, *r* = 0.62 for smiling, and *r* = 0.82 for laughing). **Figure 3** shows scarf plots representing the behavioral markers across the 10-min period for five representative dyads in each group. Note that some dyads' total time exceeds 10 min (600 s) because of spontaneous interruptions, e.g., a sneeze or a question. The data analysis, however, is restricted to the 10 min spent engaged in the task of trying to maintain eye contact.

These scarf plots are presented to illustrate how qualitatively differently the two types of dyads behaved. The general behavior of the cooperative dyads, presented on the left of **Figure 3**, is punctuated by talking, laughing, smiling, and repeated failures to maintain eye contact for sustained periods of time. In contrast, the competitive dyads, presented on the right of **Figure 3**, rarely talk, laugh, or even smile, and they hold direct eye gaze with one another for remarkably long periods, with a break in gaze clearly the exception rather than the rule. These patterns of behavior illustrate that the Tangram puzzle prime was a powerful manipulation in our study, and they converge with our predicted outcomes, i.e., that dyads in the cooperative group would find it difficult to sustain eye contact while dyads in the competitive group would not. While talking is consistent with the positive social relationship within the cooperative dyads, many participants also casually reported that engaging in conversation helped make the eye contact experience less uncomfortable (i.e., less intimate). For instance, cooperative dyads might acknowledge that they should stop talking and focus on keeping eye contact, but within a few seconds of direct eye contact they were back to conversing. Consistent with this, the conversation topics tended to be non-intimate small talk about school, work, extracurricular activities, etc. Those in the competitive social context, on the other hand, were able to sustain gaze and did not feel it necessary to talk, smile, or laugh.

To submit key aspects of these data to statistical analysis, the observed behaviors (eye contact, talking, smiling, laughing) were averaged within each dyad (since the behaviors of the two partners co-occurred) and subjected to one-tailed independent-samples *t*-tests, with the proportion of time spent performing each behavior as the dependent variable. See **Figure 4** for mean proportions across the two groups.

The results showed significant differences between the groups in the proportion of the 10 min spent making eye contact [*t*(19) = 2.005, *p* = 0.029], the proportion of time spent talking [*t*(19) = 3.56, *p* = 0.001], the proportion spent smiling [*t*(19) = 2.299, *p* = 0.016], and the proportion of time spent laughing [*t*(19) = 2.26, *p* = 0.018]. That is, the competitive group was able to keep eye contact for longer (*M* = 93.9% of the time) than the cooperative group (*M* = 80.9% of the time), whereas the cooperative group talked significantly more (*M* = 47.3 vs. 5.9%), smiled significantly more (*M* = 19.7 vs. 6.6%), and laughed significantly more (*M* = 3.9 vs. 0.1%) than the competitive group.
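The dyad-level comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis script: it assumes each dyad is reduced to the mean of its two partners' proportions, and it uses Student's pooled-variance *t* statistic (the text does not say whether variances were pooled). The function names `dyad_means` and `pooled_t` and the example proportions are hypothetical and are not the study's data. The sketch also makes explicit why 10 cooperative and 11 competitive dyads yield the 19 degrees of freedom reported above.

```python
import math
from statistics import mean, variance


def dyad_means(pairs):
    """Reduce each dyad to one observation: the mean of its two partners' proportions.

    `pairs` holds one (partner_a, partner_b) tuple of proportions (0 to 1) per dyad.
    """
    return [(a + b) / 2 for a, b in pairs]


def pooled_t(group1, group2):
    """Student's independent-samples t statistic using a pooled variance estimate.

    Returns (t, df) with df = n1 + n2 - 2. A positive t means group1 has the
    larger mean, i.e. the predicted direction if group1 is expected to be higher.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances (n - 1 denominators) weighted by their df.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df


# Hypothetical per-partner eye-contact proportions (illustration only, not study data).
competitive = dyad_means([(0.85, 0.95), (0.9, 1.0), (1.0, 1.0)])
cooperative = dyad_means([(0.6, 0.8), (0.8, 0.8), (0.9, 0.9)])
t, df = pooled_t(competitive, cooperative)
print(f"t({df}) = {t:.3f}")

# With 10 and 11 dyads, df = 10 + 11 - 2 = 19, matching the t(19) values in the text.
```

Turning the statistic into a one-tailed *p*-value requires the *t* distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, alternative="greater")` (SciPy 1.6 or later) returns the pooled-variance statistic and the one-tailed *p*-value directly. A one-tailed test is used here because, as in the text, the direction of each difference was predicted in advance.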

As most of the dyads were same-sex pairs, reliable same- vs. opposite-sex comparisons could not be made. However, we did remove the opposite-sex pairs to check for a mixed-sex bias, and the results did not change: with only the same-sex pairs, all behavioral measures still differed significantly across the conditions (all *p* < 0.05), showing the robustness of our effects.

**FIGURE 3 | Scarf plots representing both duration and frequency of participant behaviors as a function of time across the 10-min period.**

# **Discussion**

In general, researchers have assumed that social attention in the real world can be studied by investigating how people attend to images of people (e.g., Friesen and Kingstone, 1998; Hietanen and Leppanen, 2003). Over the past few years, however, investigators have begun to argue that studying how people attend to mere representations of people fails to capture a key aspect of social attention in the real world (Kingstone et al., 2003, 2008; Myllyneva and Hietanen, 2015). That is, we do not look at other people simply to extract information about where they are looking. We also look at other people to signal information about ourselves to them, just as they look at us to signal information about themselves. This looking at others both to extract and to signal information is what we refer to as the *duality of gaze*, although we hasten to add that this duality is not strictly limited to looks toward individuals, as looks away from people also serve as important communicative signals (e.g., Ho et al., 2015).

To date, the evidence in support of this latter position has been limited, but what has been collected is consistent with it. Some (but by no means all) of this evidence was touched on in the introduction to the present paper. For example, Freeth et al. (2013) demonstrated that people answering questions from a live vs. a video-recorded interviewer were sensitive to changes in eye contact only with the live interviewer. Similarly, Foulsham et al. (2011) have reported that people avert their gaze more when approaching a real person than when approaching a video of that person. All these studies are predicated on the notion that a duality of gaze exists in a live situation and is absent when one is faced with a video. However, none directly tests the idea that live direct gaze is communicative in nature. The present study does precisely that.

In a deceptively straightforward experiment, we show that when people are required to make eye contact for a sustained period of time, the social relationship primed between them determines whether eye contact can be maintained. When the social relationship was cooperative, eye contact was very difficult to sustain and talking became very frequent. This is consistent with the notion that individuals find eye contact uncomfortable and reduce this discomfort in two ways: by limiting the sending and receiving of (potentially intimate) gaze signals, and by distracting themselves with conversation. An alternative, and not mutually exclusive, possibility is that participants break gaze to regulate their emotional arousal. Future investigation will be required to determine whether one or both of these processes are at work.

In contrast, when the relationship between the two participants had been primed to be competitive, participants were able to maintain direct gaze for longer stretches of time, far beyond what is normal, and they talked relatively little. This is consistent with the idea that, within a competitive context, eye contact could be perceived as a display of dominance and performed as a staring contest. Indeed, a few participants in the competitive condition spontaneously described their strategy as a staring contest.

In sum, this simple study is a singular, explicit and powerful demonstration that when two individuals make eye contact, their gaze serves a communicative function. That function is exquisitely sensitive to, and shaped by, small manipulations of their relationship. Simply asking participants to work together on a puzzle for 5 min, either cooperatively or competitively, can profoundly alter their ability to sit side by side and look each other in the eye for a period of time.

In addition to its theoretical implications, the present study makes two methodological contributions.

The first concerns the effectiveness of the Tangram game in priming a cooperative or competitive relationship between participants. To our knowledge, this has not been demonstrated before. The game is therefore a potentially powerful tool for future social scientists who wish to manipulate the relationship between two or more individuals in a subtle but robust manner.

The second is the staring task itself. It is not an overstatement to say that asking participants to stare at one another could be one of the most powerful quick tests a researcher can use to determine the underlying nature of their relationship. If dyads have great difficulty keeping eye contact and talk with one another a great deal, this would indicate that their relationship is cooperative. Conversely, if they have little difficulty making eye contact and talk little, one might infer that their relationship is more competitive.

That said, it is also important to note that at present we do not have a clear notion of what "baseline" performance on this task is. It is tempting to think that a baseline could be obtained with no puzzle task, or with participants doing the puzzle task alone. However, either option would merely leave the relationship between the members of each dyad free to vary, depending on whether they experienced the eye contact task as cooperative or competitive. Indeed, many other social factors may also modulate the nature of the eye contact task, such as the perceived attractiveness of the individuals in the dyads, their culture, their sexual orientation, and their social status. Each of these will further complicate what "true baseline" performance is.

In closing, and with the caveats above in place, it is perhaps worthwhile to indulge in a small degree of speculation about two questions: the behaviors we observed as a function of eye gaze and social context, and the factors that future investigation may find to be driving them. We acknowledge clearly that what follows is speculative.

It is generally assumed that eye contact signals interpersonal thoughts, attitudes, and intentions (e.g., Baron-Cohen et al., 1997), but little is known about whether or how it does so during live social interactions. Some of the earliest researchers to study this phenomenon focused on how eye contact influenced the level of intimacy or dominance when performed at close distances (Argyle et al., 1973). For instance, researchers showed that individuals make more eye contact with people whom they like and are attracted to (Exline et al., 1965). Another study reported that couples in love make more eye contact overall than couples who are not in love (Rubin, 1970). Compellingly, strangers have reported feelings of passionate love after spending only 2 min engaged in unbroken eye contact (Kellerman et al., 1989).

To account for these effects of eye contact on intimacy, Argyle and Dean (1965) proposed that the level of intimacy between strangers could be maintained by balancing four factors: eye contact, proximity, topic of conversation, and smiling. For instance, to keep intimacy levels low, one should stand farther apart, reduce eye contact, and talk about something banal such as the weather.

More recently, Ponkanen and Hietanen (2012) demonstrated that eye contact with a live individual causes a significant increase in nervous system arousal (galvanic skin response), and that this increase was even more pronounced in response to a smiling face than to a neutral face. Arousal is a physiological response to intimacy. Following Argyle and Dean's original proposal, one could therefore predict the greatest arousal for eye contact with a smiling face at close proximity during a personal conversation.

Against this historical backdrop, one might speculate that the cooperative dyads in the present study were already in a more intimate environment than normal, simply by sitting close to one another. Hence, the most direct way to reduce intimacy was to break eye contact, which is what we observed. However, because the task was to maintain eye contact, the other avenue was to engage in neutral conversation, which we also observed. The competitive dyads, on the other hand, also sat close to one another but had been primed to exert dominance. Their need to break eye contact or engage in idle conversation was therefore relatively low. Hence, in this group eye contact was sustained and talking was not.

# **Conclusion**

Here we showed experimentally that the effect of eye contact could be quickly and profoundly altered by the social context primed by a simple puzzle game. Those who had played the game cooperatively found eye contact very difficult to sustain and engaged in a great deal of talking, smiling and laughing. In contrast, those who had played the game competitively were able to stare quietly at each other for long periods with little smiling or laughing. These findings support our hypothesis that when looking at the eyes of a real person, one both acquires information from that person and signals information to them. This duality of gaze is critical to non-verbal communication, and the nature of that communication is shaped by the relationship between individuals, i.e., cooperative or competitive.

# **Acknowledgments**

We thank Patti Leclerc for helping with data coding. This project was funded by grants to AK from the Natural Sciences and Engineering Research Council of Canada and the Social Sciences and Humanities Research Council of Canada.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Jarick and Kingstone. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Deceit and facial expression in children: the enabling role of the "poker face" child and the dependent personality of the detector

*Marien Gadea\*, Marta Aliño, Raúl Espert and Alicia Salvador*

*Department of Psychobiology, Faculty of Psychology, University of València, València, Spain*

#### *Edited by:*

*Rossana Actis-Grosso, Università degli Studi di Milano-Bicocca, Italy*

#### *Reviewed by:*

*Chiara Turati, Università degli Studi di Milano-Bicocca, Italy*

*Katja Koelkebeck, University of Muenster, Germany*

#### *\*Correspondence:*

*Marien Gadea, Department of Psychobiology, Faculty of Psychology, University of València, Avenida Blasco Ibañez 21, 46010 València, Spain*

*marien.gadea@uv.es*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

> *Received: 01 April 2015 Accepted: 14 July 2015 Published: 28 July 2015*

#### *Citation:*

*Gadea M, Aliño M, Espert R and Salvador A (2015) Deceit and facial expression in children: the enabling role of the "poker face" child and the dependent personality of the detector. Front. Psychol. 6:1089. doi: 10.3389/fpsyg.2015.01089*

This study examines the relation between the facial expressions of a group of children when they told a lie and the accuracy with which a sample of adults detected the lie. To evaluate the intensity and type of emotional content of the children's faces, we applied an automated method capable of analyzing the facial information in the video recordings (FaceReader 5.0 software). The program classified each video as showing either a neutral or an emotional facial expression. The mean number of hits was significantly higher for the emotional than for the neutral videos. There was also a significant negative correlation between the intensity of the neutral expression and the number of hits from the detectors. Lies told with an emotional facial expression were more easily recognized by adults than lies told with a "poker face"; thus, the less expressive the child, the harder it was to detect the lie. The accuracy of the lie detectors was then correlated with their subclinical personality-disorder traits. Participants scoring higher on dependent personality were significantly better lie detectors. There was also a non-significant tendency for women to discriminate better, whereas men tended to be more suspicious than women when judging the children's veracity. This study is the first to automatically decode the facial information of the lying child and relate these results to the personality characteristics of the lie detectors in the context of deceptive behavior research. We suggest an implication for forensic psychology: it may be worth exploring whether inducing an emotion in a child during an interview could be useful for evaluating testimony in legal trials.

Keywords: deceit, children, facial expression, emotion, dependent personality, gender differences

# Introduction

The present study addresses the general need to identify valid indicators of deceptive behavior and/or to find measures that validly discriminate between liars and truth-tellers. For a long time, this question has stimulated interdisciplinary research despite criticism and skepticism. That skepticism stems from the general lack of valid and reliable results and from debate about the utility of the laboratory designs used to explore deception (Vrij and Granhag, 2012). One important demand in this field is to examine deception in naturalistic settings. The aim is to obtain relatively unrestricted "honest vs. deceptive" statements and so improve ecological validity (Gamer and Ambach, 2014).

A comprehensive definition of deception is a "successful or unsuccessful deliberate attempt, without forewarning, to create in another person a belief which the communicator considers to be untrue" (Vrij, 2000, p. 15). Deceptive behavior is thus an interpersonal exchange of information, and a very special one, involving a liar and a deceived person, or lie detector. In this kind of social communication, adults achieve an overall accuracy rate of around 54% of all statements judged, regardless of whether they are truths or lies. This is only slightly above chance (see the meta-analysis of nearly 24,500 veracity judgments by Bond and DePaulo, 2006).

The ability to differentiate between children's true and false statements is also an important issue in this field, because children can be victims of, or witnesses to, crimes and may be required to testify about their experiences in court (Brunet et al., 2013; see also the review of Lee, 2013). Studies show that adults are highly inaccurate when attempting to detect children's deception, and rarely perform above chance levels. This holds for parents, child protection lawyers, police and social workers, and judges (Crossman and Lewis, 2006; Edelstein et al., 2006). Some authors have partially explained this by noting that children who lie closely mimic the behavior of children who are telling the truth (e.g., by making direct eye contact; Talwar and Lee, 2002).

To date, no demographic individual difference (e.g., gender, education, age, experience) has been shown to be reliably related to deception-detection accuracy (Aamodt and Custer, 2006). It has recently been shown that the limitations of lie detection stem mainly from the weakness of behavioral cues to deception (Hartwig and Bond, 2011). Efforts to improve lie detection should therefore focus on increasing the behavioral differences between liars and truth tellers.

With some exceptions, few studies have focused on the liar to determine individual differences in the ability to lie. Some authors have argued that the liar needs to be studied in more depth to fully understand deceptive behavior (Wright et al., 2013). For instance, Wright et al. (2012) showed that the best lie detectors are also the best liars. Moreover, some physical characteristics and observed behaviors have been associated with the liar. The underlying assumption is that hidden mental states associated with the act of lying could influence behavior, so that the lie could be inferred (Hill and Craig, 2002). In this sense, certain parameters of speech (Spence et al., 2012) and a number of kinetic variables (Duran et al., 2013) have been shown to differentiate between liars and truth tellers.

One of the most studied parameters is the facial expression of the liar, given that the human face provides a number of signals that are essential for interpersonal communication in our social life. The face has been considered the site of the most expressive behavior and a window to the subject's mental states. It is also a place where people may be unable to fully control how their intentions translate into their expressions (DePaulo, 1992).
Automated tools have recently been used to study the upper face when deceptive behavior appears spontaneously (Duran et al., 2013). During deception, continuous fluctuations of movement in the upper face showed less stable and more complex dynamical properties, even though the overall amount of movement did not differ between deception and truth. The face is thought to be susceptible to "leakage" of hidden negative emotional states (supposedly associated with deceptive behavior). This leakage takes the form of facial microexpressions lasting only tenths of a second, which some authors claim to be a clue to deception (Ekman and Friesen, 2003). Leakage may be especially relevant for certain types of lie. For instance, Warren et al. (2009) showed that detectors performed significantly above chance for emotional, compared with unemotional, lies, and they identified the liars' subtle facial expressions as the key to the task. One can therefore assume that when speakers express emotions, their facial expression is a determining factor in whether their deceptive behavior can be detected.

One may then ask whether every individual perceives those emotional cues of the face equally. If not (as is plausible), some detectors may perceive them in a way better suited to detecting lies. In fact, despite the very poor average performance in lie detection, some people seem to be especially good at this task. Ekman et al. (1999) found accuracy rates from 68 to 73% among groups with a special interest in deception detection, although, unfortunately, they did not measure individual differences in the detectors' emotional perception. Of particular interest is the finding of Ein-Dor and Perry (2014) that attachment anxiety, but not other types of anxiety, predicted more accurate detection of deceitful statements.
It is reasonable to think that individuals with attachment anxiety do not perceive the emotional information expressed by others in the same manner as healthy individuals. Moreover, the search can be extended beyond explicitly clinical disorders to subclinical traits. In his work on personality variation, Nettle (2006) argues for the possible adaptive function of certain personality characteristics often viewed as undesirable. For instance, the benefits of neuroticism would be vigilance to dangers, striving, and competitiveness, set against its costs of stress, anxiety disorder, and depression. Other authors have suggested that clinical levels of paranoia may represent the inevitable cost of efficient threat perception (or "justified" suspicion), which is necessary for the survival of the human species (Green and Phillips, 2004). Exploring subclinical levels of these personality traits in the normal population in relation to lie detection could therefore also be of interest.

Given the above scenario, our main aim was to investigate the relation between the facial expressions of a group of children when they told a lie and the likelihood of the lie being detected by a sample of adult detectors. To evaluate the intensity and type of emotional content of the children's faces, we applied an automated method capable of analyzing the facial information in the video recordings: the FaceReader 5.0 software. This tool uses a neural network trained on a high-quality set of approximately 10,000 facial images, manually labeled as emotional or not by human coders. On its own, it achieves a classification accuracy of 89% (den Uyl and van Kuilenburg, 2005; Terzis et al., 2010). We expected that the emotionality in the children's faces, as measured by FaceReader, would be positively related to success in the lie detection task. A second aim was to test whether subclinical anxious or paranoid personality-disorder traits in the sample of detectors were related to more accurate detection of interpersonal deceit.

# Materials and Methods

#### Subjects

#### The Liars: Children Performing the Videotapes

Four Caucasian, Spanish-speaking, 7-year-old girls were selected from a local school after informed consent had been obtained from their parents. We recruited children from a primary school classroom to which we had access through one of the teachers, with the permission of the head teacher. Only the parents of five children responded favorably: the parents of the four girls and of one boy. We did not include the boy in the sample, in order to keep it homogeneous with respect to gender. This age was chosen because the ability to lie, in the sense of intentionally shaping one's discourse to deceive, is thought to be established by then. Developmental theories suggest that the ability to lie increases from 2-3 years of age and peaks at 6 years, in parallel with the development of Theory of Mind and executive functioning in children (Talwar and Lee, 2008; Evans and Lee, 2013; Cheung et al., 2015). The participants had no reported history of psychiatric disorders, medical illness or chronic pharmacological treatment. According to their teacher, their intellectual abilities were normative and homogeneous.

#### The Detectors: Sample of Adults Who Watched the Videotapes

A total of 104 young adults, undergraduate volunteers aged 18-26 years, were recruited from the university community to take part in the experiment without monetary reward. The sample comprised 29 men with a mean age of 20.03 years (SD 4.07) and 75 women with a mean age of 19.91 years (SD 4.48). Participants were excluded if they reported current or past psychiatric/neurologic illness, use of psychotropic medication, or chronic pharmacological treatment.

All subjects were treated in accordance with the "Ethical Principles of Psychologists and Code of Conduct"<sup>1</sup>. All procedures were in accordance with the 1964 Helsinki Declaration and its later amendments, and with the standards of our institutional committee for ethics in research with humans, which approved the experiments.

#### Materials

### Recording of the Videos: the Deceit Detection Test

The Deceit Detection Test consisted of 12 video recordings of true or false statements made by the girls described above. This material was prepared as follows.

The videos were filmed with a Sony Handycam HDR-PJ200E in a well-illuminated room of our laboratory. Special care was taken to ensure good frontal lighting of the participant's face, which is an important requirement for the FaceReader 5.0 software to produce reliable results. It is also important that participants look directly toward the camera while showing their facial expression. Although the software can handle rotations of up to 40°, minimal rotation is desirable to ensure optimal reading quality. The recordings, with a resolution of 320 × 240 at 12 frames per second, were saved as AVI files to be analyzed later with the FaceReader 5.0 software.

The recordings were made by two researchers who were unaware of the aim of the experiment. Once the recording setup was ready, one of the researchers instructed the children about the task. The children were told to tell a story of their own choosing. The story had to be about what happened at a particular moment in time, and it had to be either false or true. That is, they were asked either to tell the truth or to make up a spontaneous lie about what happened at a specific moment of their life (past, present, or future). Meanwhile, the other researcher operated the camera. Six videos were recorded for each child: three with false content and three with true content. The children were put under mild pressure to do this, and to do it well. One researcher (a man unknown to them), pointing the camera at the child's face, told them that the acting was "very important for their parents and they must perform the show correctly." The exact instructions given and an English transcription of the valid recordings are provided in the Supplementary Material section.
It has been argued that giving fixed instructions about when to lie (and when not to) deprives an experimental study of ecological validity (Sweeney and Ceci, 2014). Our aim, however, was for the situation to resemble a demand on the children to follow instructions from an authority, for instance to obey a parent in the context of a legal trial. There is evidence that children are quite able to fabricate false reports to gain some advantage or to satisfy authority (Bala et al., 2005). These procedures resulted in 24 naturalistic and spontaneously performed videos, each lasting at most about 1 min.

The videos were pre-selected before the experiment was run. The children were free to say whatever they wanted, and some of their statements were inadequate. The instruction was to tell a plausible lie, so statements like "I go to the moon this afternoon" were considered invalid. Other statements were too short or not easily understood. Each video was therefore also evaluated individually and separately by three clinical judges, with the aim of selecting the best videos in terms of general credibility, veracity and realism, speech quality, and similar duration. This resulted in 14 videos (seven genuine, seven deceptive) lasting an average of 32 s, with a minimum of 12 s and a maximum of 64 s (SD: 15.7). The minimum number of videos for a given child was two and the maximum five.

These videos were submitted to the FaceReader analysis, but two of them could not be evaluated by the software for technical reasons. The final data analysis was therefore performed on the 12 videos included in the test: six truths and six lies, with a minimum of one and a maximum of four videos from the same child. These videos were presented on a computer to the sample of detectors, sequentially and in semi-counterbalanced order.
After each video, a gray screen with the options genuine or deceptive was shown ("was the girl telling a lie or telling a truth?") so that the detector could make a direct, self-report judgment. The scoring of the videos is explained in the following sections.

<sup>1</sup>http://www.apa.org/ethics/code/principles.pdf

## Automated Analysis of Facial Expression: FaceReader 5.0

Facial expressions were analyzed using FaceReader software version 5.0 (Noldus Information Technology B. V., 2012). This commercially available program uses algorithms to evaluate and classify facial images and videos on a frame-by-frame basis. The categories are the basic emotions (happiness, sadness, anger, surprise, fear, and disgust) plus a neutral category (Ekman, 1970). FaceReader works in three steps: (1) face finding, (2) face modeling, and (3) face classification. It first locates the face using the Active Template Method. It then creates a virtual, superimposed 3D Active Appearance Model featuring almost 500 distinct key points on the face. In the third stage, it computes scores for the intensity and probability of the facial expressions of the basic emotions. These variables measure the magnitude of the emotion being shown, from 0% (not at all) to 100% (perfect match). In our study, the software analyzed more than 370 s of video recording, i.e., around 5,685 frames, on six basic emotion scales.

FaceReader offers four face models (general, children, East Asian, and elderly), and the appropriate one was selected (Caucasian children between 3 and 10 years). FaceReader can also account for the characteristic facial expression that some people have by nature (sad, angry, etc.). The software can be calibrated to correct for such person-specific biases toward a certain facial expression, so that a real emotion can be analyzed. To do so, one or more videos must be used as calibration material, as was done in the current study. Here, at least two videos of each child were chosen for the calibration process. A higher sample rate was also used, because none of our videos exceeded 1 min in length. This ensured that the calibration material contained a diverse set of images.

After the analysis, we classified each video into one of just two categories: "neutral" or "emotional." Neutral videos were those in which the percentage of neutral (non-emotional) expression was between 70 and 94%. Emotional videos were those in which the sum of all expressed emotions exceeded the percentage of neutral expression. Emotional videos could therefore be predominantly "happy" in some cases and predominantly "sad" in others. This classification resulted in six emotional videos and six neutral videos; in each set, half were lies and half were truths.
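The two-category rule above can be illustrated with a small function. This is a minimal sketch, assuming that each video's FaceReader output has already been summarized as one neutral percentage plus a percentage per expressed emotion. The function name `classify_video` and the dictionary layout are hypothetical. The text does not say what happens when a video meets neither criterion, or both. The sketch checks the neutral criterion first and returns `None` for unclassified videos, and both of these choices are assumptions.

```python
from typing import Mapping, Optional


def classify_video(neutral_pct: float, emotion_pcts: Mapping[str, float]) -> Optional[str]:
    """Label a video "neutral" or "emotional" from summarized FaceReader percentages.

    Assumptions (not stated in the source): the neutral criterion is checked
    first, and a video that meets neither criterion is returned as None.
    """
    emotional_sum = sum(emotion_pcts.values())  # sum of all expressed emotions
    if 70.0 <= neutral_pct <= 94.0:
        return "neutral"
    if emotional_sum > neutral_pct:
        return "emotional"
    return None


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(classify_video(82.0, {"happy": 10.0, "sad": 3.0}))   # neutral
    print(classify_video(35.0, {"happy": 40.0, "sad": 12.0}))  # emotional
```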

## Personality Disorders Screening Test: Salamanca Questionnaire

The Salamanca Questionnaire was recently published by Pérez Urdániz et al. (2011) as a screening tool for 11 personality disorders:

- Seven are defined according to the Diagnostic and Statistical Manual of Mental Disorders (DSM), version IV-TR: paranoid, schizoid, schizotypal, histrionic, antisocial, narcissistic, and dependent.
- Four are defined according to the International Classification of Diseases (ICD), version 10: emotionally unstable personality disorder-impulsive type; emotionally unstable personality disorder-borderline type (also known as limit); anankastic; and anxious.

The 11 traits are further grouped into three types:

- Type A, strange and extravagant: paranoid, schizoid, and schizotypal.
- Type B, immature: histrionic, antisocial, narcissistic, and both subtypes of emotionally unstable disorder (impulsive and limit).
- Type C, avoiding: anankastic, dependent, and anxious.

The questionnaire consists of a total of 22 questions. Each personality trait is evaluated through two questions on a 4-point Likert scale (false = 0 points; sometimes true = 1 point; usually true = 2 points; always true = 3 points). The cutoff score is set at three points for every trait. The questionnaire has been validated and correlated with the International Personality Disorder Examination, and is considered an adequate screening test, with a sensitivity of 100% and a specificity of 76.3% (Caldero-Alonso, 2009). It is a short self-assessment questionnaire (<10 min) that is easy to interpret.
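The scoring scheme described above (two items per trait, each scored 0-3, and a per-trait cutoff of three points) can be sketched as follows. Several details are assumptions not given in the text: the trait keys, that a trait is flagged when its score is at or above the cutoff, and that the Type A, B, and C scores are simple sums of their member traits. This is an illustration of the arithmetic, not the published scoring key.

```python
from typing import Dict, List, Tuple

# Hypothetical trait keys, grouped as described in the text (3 + 5 + 3 = 11 traits).
GROUPS: Dict[str, List[str]] = {
    "A": ["paranoid", "schizoid", "schizotypal"],
    "B": ["histrionic", "antisocial", "narcissistic", "impulsive", "borderline"],
    "C": ["anankastic", "dependent", "anxious"],
}
CUTOFF = 3  # per-trait cutoff from the text; treating it as ">= 3" is an assumption


def score_salamanca(answers: Dict[str, Tuple[int, int]]):
    """Return per-trait scores (0-6), flagged traits, and Type A/B/C totals.

    answers maps each trait to its two item scores
    (false=0, sometimes true=1, usually true=2, always true=3).
    Group totals as plain sums are an assumption.
    """
    trait_scores: Dict[str, int] = {}
    for trait, (item1, item2) in answers.items():
        for item in (item1, item2):
            if item not in (0, 1, 2, 3):
                raise ValueError(f"{trait}: item score {item} is outside 0-3")
        trait_scores[trait] = item1 + item2
    flagged = sorted(t for t, s in trait_scores.items() if s >= CUTOFF)
    group_totals = {g: sum(trait_scores.get(t, 0) for t in traits)
                    for g, traits in GROUPS.items()}
    return trait_scores, flagged, group_totals


if __name__ == "__main__":
    # Illustrative answers only: every trait answered "false" except two.
    answers = {t: (0, 0) for traits in GROUPS.values() for t in traits}
    answers["dependent"] = (2, 1)  # 3 points: reaches the cutoff
    answers["anxious"] = (1, 1)    # 2 points: below the cutoff
    scores, flagged, totals = score_salamanca(answers)
    print(flagged)  # ['dependent']
    print(totals)   # {'A': 0, 'B': 0, 'C': 5}
```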

#### Dependent Variables and Statistical Analysis

Regarding the Deceit Detection Test, seven raw dependent variables were considered in the analyses:

1. Total Hits: the number of statements (true or false) that the detector classified correctly, with a maximum of 12.
2. False Positives: the number of statements that the detector judged genuine but that were deceptive (the detector believed the girl, but she was lying), with a maximum of six.
3. False Negatives: the number of statements that the detector judged deceptive but that were genuine (the detector did not believe the girl, but she was telling the truth), with a maximum of six.
4. Deception-Hits: the number of false statements correctly identified, with a maximum of six.
5. Truth-Hits: the number of true statements correctly identified, with a maximum of six.
6. Emotional-Hits: the number of correct judgments for the videos classified by FaceReader as having emotional content, with a maximum of six.
7. Neutral-Hits: the number of correct judgments for the videos classified as having no emotional content, with a maximum of six.
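As a worked illustration of these definitions, the seven scores can be computed from one detector's answers. This is a sketch under stated assumptions: the function name `detection_scores` and the three parallel boolean lists are hypothetical, and the labels "false positive" and "false negative" follow the text's definitions. The definitions imply two identities. Total Hits equals Deception-Hits plus Truth-Hits, and also equals Emotional-Hits plus Neutral-Hits. Likewise, False Positives equal 6 minus Deception-Hits, and False Negatives equal 6 minus Truth-Hits.

```python
from typing import Dict, Sequence


def detection_scores(is_lie: Sequence[bool], judged_lie: Sequence[bool],
                     is_emotional: Sequence[bool]) -> Dict[str, int]:
    """Compute the seven raw Deceit Detection Test scores for one detector.

    is_lie[i]       -- True if video i is deceptive
    judged_lie[i]   -- True if the detector answered "lie" for video i
    is_emotional[i] -- True if FaceReader classified video i as emotional
    """
    if not (len(is_lie) == len(judged_lie) == len(is_emotional)):
        raise ValueError("all sequences need one entry per video")
    correct = [lie == judged for lie, judged in zip(is_lie, judged_lie)]
    return {
        "total_hits": sum(correct),
        # As defined in the text: a "false positive" is a lie judged genuine ...
        "false_positives": sum(l and not j for l, j in zip(is_lie, judged_lie)),
        # ... and a "false negative" is a truth judged deceptive.
        "false_negatives": sum(j and not l for l, j in zip(is_lie, judged_lie)),
        "deception_hits": sum(c for c, l in zip(correct, is_lie) if l),
        "truth_hits": sum(c for c, l in zip(correct, is_lie) if not l),
        "emotional_hits": sum(c for c, e in zip(correct, is_emotional) if e),
        "neutral_hits": sum(c for c, e in zip(correct, is_emotional) if not e),
    }


if __name__ == "__main__":
    # Hypothetical 12-video layout: 6 lies then 6 truths, 3 emotional per half.
    is_lie = [True] * 6 + [False] * 6
    is_emotional = [True, True, True, False, False, False] * 2
    always_lie = [True] * 12  # a detector who answers "lie" every time
    print(detection_scores(is_lie, always_lie, is_emotional))
```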

We also used signal-detection analysis for hypothesis testing (Stanislaw and Todorov, 1999). For each detector's ability to detect the lies, we calculated the discriminability index (d') and the participant-bias criterion index (C). The larger d', the better the discriminability; values near 0 indicate random performance. A C index of 0 indicates no bias in the judge. With the lie as the signal, a negative C index indicates a truth-bias and a positive one indicates a lie-bias. The Salamanca Questionnaire, for its part, yielded 14 scores: three main scores (for the Type A, B, and C scales) and 11 subscores, one for each of the personality disorders described above.
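A minimal sketch of the two signal-detection indices follows, using the formulas given by Stanislaw and Todorov (1999): d' = z(H) - z(F) and c = -[z(H) + z(F)]/2, where z is the inverse standard normal CDF. With the lie as the signal, H is the proportion of lies judged as lies (Deception-Hits / 6), and F is the proportion of truths judged as lies (the text's False Negatives / 6). With only six trials per type, a rate of 0 or 1 makes z infinite. The sketch therefore applies a log-linear correction, adding 0.5 to each count and 1 to each trial total. That correction is an assumption, because the text does not say how extreme rates were handled. Under these formulas, c is negative when the signal response ("lie") is given more often, and the sign reverses if c is defined as +[z(H) + z(F)]/2.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_indices(hits: int, n_signal: int, false_alarms: int, n_noise: int):
    """Return (d', c), treating the lie as the signal.

    hits         -- lies judged as lies (out of n_signal lies)
    false_alarms -- truths judged as lies (out of n_noise truths)
    The log-linear correction below is an assumption, used to keep
    the rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    c = -0.5 * (_z(hit_rate) + _z(fa_rate))  # c < 0: more "lie" responses
    return d_prime, c


if __name__ == "__main__":
    # Illustrative detector: 5 of 6 lies caught, 1 of 6 truths called a lie.
    d, c = sdt_indices(hits=5, n_signal=6, false_alarms=1, n_noise=6)
    print(f"d' = {d:.2f}, c = {c:.2f}")
```

In this example the two error types are balanced, so c comes out at zero and d' is positive.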

Raw scores were analyzed mostly with non-parametric statistics because of the distribution of the variables (Kolmogorov–Smirnov test, *p <* 0.05 in most cases). Thus, differences between related variables were tested with the Wilcoxon signed-rank test, gender differences with the Mann–Whitney *U* test, and associations between variables with a series of Spearman rank correlations. All analyses were run with SPSS 19. Data are presented as means, SD, confidence intervals, and Cohen's *d* for effect size where appropriate.
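To show what the Mann–Whitney *U* statistic measures, the sketch below derives it from the pooled ranks of both groups, with tied values receiving the average of their ranks. It is only an illustration (the analyses themselves were run in SPSS), and it leaves out the p-value step, which uses exact tables or a normal approximation with a tie correction.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 0-based span -> 1-based mean rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) from the rank sum of x in the pooled sample."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


# Hypothetical scores for two small groups (the two 2s share rank 2.5).
print(mann_whitney_u([1, 2], [2, 3]))  # → (0.5, 3.5)
```

The two values always sum to n_x · n_y. Software conventionally reports the smaller of the two.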

# Results

**Table 1** describes the 12 videos recorded: the percentage of each emotion, the total percentages of emotional and neutral expression according to the FaceReader analysis, the classification of each video as Neutral or Emotional (N vs. E) and as True or False content (T vs. F), and the Total Hits (raw score and percentage) observed for each video. There was a significant negative correlation between the intensity of the neutral expression (% Neutral/FaceReader) and the number of hits from the sample (ρ = −0.70; *p <* 0.01). The videos were then classified in a 2 × 2 table according to the T/F and E/N variables, forming four cells of three videos each. The mean percentage of Hits was 77.2% for the three Emotional-True videos, 84.9% for the Emotional-False videos, 56.7% for the Neutral-True videos, and 46.4% for the Neutral-False videos. Thus, the difference in correct classification of the true videos depending on emotional expression was 20.5%, and for the false videos it was 38.4%. This difference was tested with a chi-square test but was not significant [χ<sup>2</sup>(1) = 1.42; n.s.].

**Tables 2** and **3** show the means and SD for the sample (as a whole and separated into women and men) on the Deceit Detection Test, including the index of discriminability (*d'*) and the participant bias criterion (C), as well as the scores on the Salamanca Questionnaire described above (3 main scales and 11 subscales). The mean of Total Hits for the whole sample of videos (regardless of video content) was 7.98 (SD: 1.4), so the classification was correct for 66.5% of the recordings, significantly above the 50% chance level [*t*(103) = 14, *p <* 0.001]. The difference between False Positives (mean = 2.05; SD = 0.9) and False Negatives (mean = 1.96; SD = 1) was not significant, nor was the difference between Truth Hits (mean = 4.02; SD = 1.0) and Deception Hits (mean = 3.96; SD = 0.9). Interestingly, the difference between Emotional Hits (mean = 4.88; SD = 0.9) and Neutral Hits (mean = 3.11; SD = 1.1) was significant [Wilcoxon *Z* = −7.7; *p <* 0.001; 95% CI for the mean: Emotional Hits (4.69–5.05) vs. Neutral Hits (2.88–3.32); Cohen's *d* = 1.7].
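The summary statistics above can be checked with simple arithmetic. This sketch treats the comparison with chance as a one-sample *t* test of Total Hits against 6 of 12 correct. It also computes Cohen's *d* using the pooled SD of the two means, which is an assumption: the article does not say which variant of *d* it used.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic (df = n - 1) for H0: population mean equals mu0."""
    return (mean - mu0) / (sd / sqrt(n))


def cohens_d_pooled(m1, sd1, m2, sd2):
    """Cohen's d with the root mean square of the two SDs (equal-n pooling)."""
    return (m1 - m2) / sqrt((sd1 ** 2 + sd2 ** 2) / 2)


n = 104
print(round(7.98 / 12 * 100, 1))                        # → 66.5 (percent correct)
print(round(one_sample_t(7.98, 1.4, n, 6.0), 1))         # → 14.4 (chance = 6 of 12 hits)
print(round(cohens_d_pooled(4.88, 0.9, 3.11, 1.1), 2))   # → 1.76 (Emotional vs. Neutral Hits)
```

These values agree with the reported *t*(103) = 14. The effect size of about 1.76 matches the reported *d* = 1.7 within the rounding of the published means and SDs.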

Among the correlations between the Deceit Detection Test and the Salamanca Questionnaire, the Type C scale correlated significantly with the Neutral Hits score (ρ = 0.21, *p <* 0.02). The Dependent subscale (part of the Type C scale) was the most strongly related to the Deceit Detection Test, with significant positive correlations with the Hits score (ρ = 0.25, *p <* 0.008), the Deception Hits score (ρ = 0.26, *p <* 0.007), and the Neutral Hits score (ρ = 0.27, *p <* 0.003), as well as a negative correlation with the False Positives score (ρ = −0.25, *p <* 0.009).
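Spearman's ρ is the Pearson correlation computed on ranks. The Python sketch below implements it with tied values given average ranks, which is the standard treatment. The example inputs are hypothetical, and the significance test is not included.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Hypothetical dependent-subscale scores and Total Hits for five detectors.
dependent = [0, 1, 1, 3, 4]
hits = [6, 7, 8, 8, 10]
print(round(spearman_rho(dependent, hits), 3))  # → 0.921
```

Because ρ uses only ranks, it reflects any monotonic relation between the two scores and does not assume normality, which is consistent with the non-parametric approach described in the Methods.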

Despite the low number of men in the sample, we checked for gender differences (see **Tables 2** and **3**) and observed several tendencies in the Deceit Detection Test that approached significance: a higher mean for Truth Hits in women (Mann–Whitney *U* = 858, *p* = 0.08), a higher mean for Neutral Hits in women (Mann–Whitney *U* = 859, *p* = 0.08), and a higher mean for False Negatives in men (Mann–Whitney *U* = 853, *p* = 0.07). The differences in the Salamanca Questionnaire were more salient: men scored significantly higher on the Type A scale [strange and extravagant; Mann–Whitney *U* = 605, *p <* 0.001; 95% CI for the mean: men (3.8–5.2) vs. women (2.4–3.3), Cohen's *d* = 0.8] as well as on each of its subscales: paranoid [Mann–Whitney *U* = 770, *p <* 0.01; 95% CI for the mean: men (1.3–2) vs. women (0.9–1.3), Cohen's *d* = 0.46], schizoid [Mann–Whitney *U* = 795, *p <* 0.02; 95% CI for the mean: men (1.4–2.4) vs. women (1–1.6), Cohen's *d* = 0.49], and schizotype [Mann–Whitney *U* = 769, *p <* 0.008; 95% CI for the mean: men (0.5–1.2) vs. women (0.3–0.7), Cohen's *d* = 0.35]. Men also scored significantly higher on two subscales of the Type B scale (immature): antisocial [Mann–Whitney *U* = 873, *p <* 0.03; 95% CI for the mean: men (0.2–0.6) vs. women (0.1–0.3), Cohen's *d* = 0.34] and narcissist [Mann–Whitney *U* = 815, *p <* 0.03; 95% CI for the mean: men (0.7–1.6) vs. women (0.4–0.8), Cohen's *d* = 0.48]. Women, on the other hand, scored significantly higher on the dependent subscale of the Type C scale [Mann–Whitney *U* = 816, *p <* 0.04; 95% CI for the mean: men (1.2–2.4) vs. women (2.1–2.7), Cohen's *d* = 0.42].

TABLE 1 | General characterization of the videos: "Video/Girl" refers to the number of the video and the first initial of the name of each girl who was recorded.

*"Happy...Other" refer to the observed percentage of each emotion in the video from the FaceReader analysis. "% Emo" and "% Neu" refer to the observed total percentage of expressed emotions and of neutral expression from the FaceReader analysis. "E/N" refers to the classification of the videos into Emotional or Neutral. "T/F" refers to the classification of the videos into True or False content. Total Hits refers to the number of subjects (out of the total n = 104) who guessed the content of each particular video, transformed into a percentage in the last column, % Hits.*

*TH, Total Hits, regardless of whether the girl told the truth or a lie; FP, false positives (subject believed the girl, but she was lying); FN, false negatives (subject did not believe the girl, but she was telling the truth); TH, Truth Hits score: hits for true statements; HD, Deception Hits score: hits for false statements; EH, Emotional Hits: hits for videos with high emotional facial expression; HN, Neutral Hits: hits for videos with neutral facial expression; d', index of discriminability; C, participant bias criterion index.*

*SA-B-C, Salamanca Test, Type A-B-C scales. Subscales from the Salamanca: P, paranoid; SD, schizoid; ST, schizotype; HS, histrionic; AS, antisocial; N, narcissist; I, impulsive; L, limit; AN, anankastic; D, dependent; ANX, anxious.*

# Discussion

In light of the outcome of the present experiment, our detectors were quite successful in distinguishing the children's truths from their lies: the classification was correct in 66.5% of cases, significantly above the standard 50–54% accuracy level (Bond and DePaulo, 2006), with no significant difference between the detection of true and false videos, and with a moderately good overall index of discriminability. Studies that use a paradigm in which children choose to lie (or not) about a transgression by answering "no" (or "yes") to the question "did you peek?" show that adults are poor detectors, with both deception and truth detection near chance (Leach et al., 2004; Crossman and Lewis, 2006). Interestingly, some contextual variations of the task, such as pressing the children to consider the moral implications of deceit or to promise to be honest before the task, can raise subsequent deception detection above chance (Leach et al., 2004). Our children were pressed to perform the task well to satisfy their parents, but they told us far more than a monosyllable: in our paradigm they had to invent spontaneous stories (see transcriptions, Supplementary Material). One could therefore consider the content of the speech as a cue for the detection task. However, as explained before, care was taken in the selection of the videos to keep the children's discourse plausible even when they told a lie, so we should assume that factors other than the purely verbal content of the discourse were involved. Methods of veracity detection that use the linguistic differences between true and false stories (CBCA, reality monitoring) show correct classification rates of 65 to 90% in trained detectors, but these methods are problematic when applied to children because children's reports tend to contain fewer details and are generally shorter (Brunet et al., 2013). Most of our children's spontaneous reports were in fact too short to apply CBCA.
Another factor to consider could be the age of the children: were they too young to construct a convincing fake, and therefore easily detected? We do not think so because, from the point of view of the development of lying, a 7-year-old child has reached a sufficient level of Theory of Mind to lie successfully, with intentionality and conventionality (Talwar and Lee, 2002; Lee, 2013).

Rather than the children's age or the verbal content of their discourse, our interest was their facial expression as analyzed with an automated method, the FaceReader. The results were quite interesting: some videos were more easily guessed than others, regardless of whether the girl had told the truth or a lie (see **Table 1**). A significant inverse correlation appeared between detection accuracy and the children's neutral expression: the less expressive the child, the harder the detection. This was confirmed when comparing accuracy between the emotional and neutral videos: the mean success rate for the emotional videos was significantly higher (see **Table 2**). In addition, although it was not significant, there was a 39% difference in guessing between the lies told with and without emotional facial expression (the "poker face" was harder to read). This makes the present methodology promising for future studies with larger samples. Some authors have previously analyzed differences in facial expressiveness between liars and control children. For instance, in the study of Talwar et al. (2007) this variable was coded manually (non-automated) according to the Facial Action Coding System (FACS, Ekman et al., 2002) and revealed small but significant differences between liars and control children in both positive and negative facial expressions (unfortunately, the authors did not test its influence on the detectors). A decoding approach more similar to ours, a computer-based automatic vision system, recognized the facial expressions of faked pain in adults with 85% accuracy, whereas trained human detectors obtained just 55% accuracy (Bartlett et al., 2014). These data show that certain facial expressions associated with deception exist and can be identified through automated tools.
In the present experiment, the faces that were most emotionally expressive according to the automated coding were the most transparent to human detectors, which aided the detection of both the false and the true stories of the children. Some authors have suggested that emotional expressiveness in general is related to being judged as trustworthy (Boone and Buck, 2003); we instead found it related to being more easily understood (including hidden intentions to deceive). Following authors who suggest that lies can be detected more accurately when less-conscious mental processes are used (Reinhard et al., 2013; ten Brinke et al., 2014), it is plausible that such unconscious processes involving the perception of emotion in the face could facilitate the detection of deceptive or truthful information; this hypothesis remains to be evaluated. In addition, we observed a bias toward lies in our detectors: a positive C index, indicating that truth-tellers were labeled as liars. This supports the data of Crossman and Lewis (2006) on detectors' suspicion when evaluating children, whom they judge more prone to lie, and differs from research on the detection of adults' lies, which tends to demonstrate a truth bias (Eldestein et al., 2006).

An interesting finding of the present experiment was the relation between personality variables and lie detection. Among other personality traits studied, attachment anxiety, described as anxiety about separation and abandonment, has been related to good lie detection, as noted in the introduction. Attachment anxiety is related to the activation of an innate psychobiological system that motivates people to seek proximity to significant others when they need protection from threats, and it is known to be related to superior abilities to detect threats and dangers quickly and accurately (Mikulincer and Shaver, 2007; Ein-Dor et al., 2010). This raises the possibility that such individuals have an innate ability to detect deceit as a socially oriented threat. Ein-Dor and Perry (2014) demonstrated that attachment anxiety (but not other types such as social, avoidance, or security anxiety; DePaulo and Tang, 1994) predicted more accurate detection of deceitful statements and a greater amount of money won during a poker game. Here we found that higher scores on the dependent personality subscale were significantly associated with greater accuracy in the detection task and, intriguingly, also in detection without facial emotional cues (the "hard" situation), as well as with a lower tendency to believe that the statement was true when the girl was lying. The definition (according to DSM-IV-TR) of the dependent subject as "showing passivity so that others take responsibility over the subject's own decisions, along with subordination and inability to fend alone due to lack of confidence" is very close to that of attachment anxiety, and, in fact, most patients with a dependent personality disorder have suffered from attachment anxiety in their childhood (Silove et al., 2011). The present results, along with data from Ein-Dor and Perry (2014), suggest that attachment anxiety and dependent personality at a subclinical level could offer certain adaptive social advantages. This supports the view of Nettle (2006) that variations in personality are better described in terms of a mixture of costs and benefits for the individual, such that the optimal value for fitness may depend on the specific context. In fact, the literature shows that individuals with dependent personality disorder are very efficient at reading subtle social cues such as facial expression, presumably because of their need to behave in a way that maximizes the probability of care (Bornstein, 2012); the present data are in accordance with this view. No other scale was related to lie detection, so we cannot support the view of Green and Phillips (2004) about adaptive advantages of paranoia, at least at a subclinical level and in relation to children's lie detection.

Additionally, although our sample included few men, we were interested in testing gender differences. There is a general assumption that women are superior to men in interpreting other people's non-verbal behavior (Hall, 1978). Women have an advantage over men in reading facial expressions (DePaulo et al., 1993), although the literature indicates that they have an advantage in detecting lies only when the person whose lies they are trying to detect is close to them, for instance a romantic partner (Vrij, 2011). No statistically significant differences were found here, only a trend for women to discriminate better when judging a true statement and, interestingly, in the more difficult condition (neutral expression). This finding resembles the data of Wojciechowski et al. (2014) on the superiority of women in a deception task with inconsistencies between facial and verbal cues. Future studies with larger samples would be worthwhile to check whether women are better lie detectors than men in a variety of harder circumstances (for instance, in the total absence of facial emotional cues). In addition, men tended to be more suspicious than women when judging the children's veracity, and their lie bias was higher, supporting the suggestion of DePaulo et al. (1993) that women are more inclined than men to believe that they are being told the truth. The reader should note that these latter observations are based only on statistical trends and are mentioned only to encourage other authors to check for gender differences routinely. Additionally, the personality comparisons showed men scoring significantly higher on the paranoid, schizoid, schizotype, antisocial, and narcissist subscales and women scoring significantly higher on the dependent subscale, results that are in accordance with published data (Golomb et al., 1995; Bornstein, 1996).

In sum, we found that children telling deceptive or truthful stories with an unemotional facial expression, according to the FaceReader, were harder to catch. Thus, automated analysis of facial expression could serve as a tool for detecting deception in children. In addition, emotional expressiveness could have a stronger effect on people with particular personality traits who may process emotional information differently, specifically people with subclinical dependent personality disorder, making them better lie detectors. This study has a number of limitations that could be overcome in future work. The number of videos used was low and should be increased at least until a valid sample of videos for each of the basic emotions is reached. The sample of detectors should be increased and be gender-matched. In addition, the detectors should be asked about their perception of emotion in the children's facial expressions, how difficult they found the task, and whether they used emotional or other cues. For the moment, the present data suggest an implication for forensic psychology: it would be worth exploring whether inducing an emotion in a child during an interview could help evaluate testimony during legal trials. In any case, these results and their implications are specifically relevant to legal situations in which an adult pushes a child to tell a lie, but not necessarily to those in which the child spontaneously decides to tell a lie. It should also be noted that children who go through a real legal process have much more motivation and a much more complex emotional state, which could affect adults' ability to detect their lies. Nevertheless, further study of children's facial emotional expressiveness is of interest to forensic psychology.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01089


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Gadea, Aliño, Espert and Salvador. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **The effect of social categorization on trust decisions in a trust game paradigm**

*Elena Cañadas <sup>1</sup> \*, Rosa Rodríguez-Bailón <sup>2</sup> and Juan Lupiáñez <sup>3</sup>*

*<sup>1</sup> Department of Organizational Behavior, University of Lausanne, Lausanne, Switzerland, <sup>2</sup> Department of Social Psychology, Mind, Brain and Behavior Research Center, University of Granada, Granada, Spain, <sup>3</sup> Department of Experimental Psychology, Mind, Brain and Behavior Research Center, University of Granada, Granada, Spain*

This study investigates whether participants use categorical or individual knowledge about others to make cooperative decisions in an adaptation of the trust game paradigm. Specifically, participants had to choose whether or not to cooperate with unknown black and white partners as a function of the partners' expected reciprocity rates. Reciprocity rates were manipulated by associating three out of four members of an ethnic group (consistent black or white members) with high (or low) reciprocity rates, while the remaining member of the ethnic group was associated with the reciprocity rate of the other ethnic group (inconsistent member). The results show opposite patterns of performance for white and black partners. Participants seemed to categorize white partners, making the same cooperation decision with all of them; that is, they cooperated equally with consistent and inconsistent white partners. However, this effect was not found for black partners, suggesting a tendency to individuate them. The results are discussed in light of the implications of these categorization-individuation processes for intergroup relations and cooperative economic behavior.

#### *Edited by:*

*Andrew Bayliss, University of East Anglia, UK*

#### *Reviewed by:*

*Charles R. Seger, University of East Anglia, UK Danielle M. Shore, University of Oxford, UK*

> *\*Correspondence: Elena Cañadas elena.canadas@unil.ch*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 23 June 2015 Accepted: 28 September 2015 Published: 12 October 2015*

#### *Citation:*

*Cañadas E, Rodríguez-Bailón R and Lupiáñez J (2015) The effect of social categorization on trust decisions in a trust game paradigm. Front. Psychol. 6:1568. doi: 10.3389/fpsyg.2015.01568*

**Keywords: trust game, categorization, individuation, ingroup–outgroup perception, cooperation**

# **INTRODUCTION**

Every day we come across many people. The amount of information we can extract from these encounters can be so large that it needs to be organized before it can be used to make efficient decisions and plan our subsequent actions toward those people. Such organization provides us with general knowledge about the individuals we intend to interact with. At the same time, this knowledge helps guide our interactions, even with strangers.

When perceiving a person for the first time we may categorize him/her according to the salient features of his/her face such as sex, age, or ethnicity (Devine, 1989; Fiske and Neuberg, 1990). Research shows that people use stereotypes to attribute characteristics to others and consequently the impressions we form about them can be biased by those stereotypes. Interestingly, this process can take place outside the individual's awareness (Bargh and Williams, 2006; Cunningham and Zelazo, 2007).

Social perception may involve a decision-making process in which social agents decide whom to interact with and how. Perceivers try to predict the course of the interaction and whether its goals will be achieved. In social contexts, this decision-making process is influenced by certain salient features of the people we interact with, such as facial expressions (e.g., Scharlemann et al., 2001; Ruz and Tudela, 2011), physical attractiveness (e.g., Solnick and Schweitzer, 1999; Solnick, 2001), or ethnicity (e.g., Sommers, 2006, 2007), which may influence our beliefs and expectations about those with whom we have to interact (e.g., Ruz et al., 2011; Gaertig et al., 2012), especially when we know nothing about them. One of the most crucial aspects of interacting with others is the level of trust deployed in these relationships. Trust is essential for a secure and healthy social life (Dunning et al., 2014) and is considered a core social motive (Fiske, 2003).

Although essential to social life, trust is conceived as irrational by philosophers (e.g., Hobbes, 1997; Machiavelli, 1515/2003) or neoclassical economists (e.g., Berg et al., 1995; Bolle, 1998). Despite that, empirical evidence has shown that people trust strangers and reward that trust (for reviews, see Johnson and Mislin, 2011; Balliet and Van Lange, 2013).

Outside the lab, trust is present in interpersonal situations (trusting a confidant), in economic markets (trusting a financial advisor), and even in political elections (trusting a government). Knowing whom to trust is crucial for avoiding being deceived or taken advantage of by others, financial losses, and many other undesirable outcomes. When we have previous experience with our partner, the object of our trust, we can predict with varying levels of certainty whether we can trust him/her, and in fact minimal interactions can already influence trustworthiness judgments (Frank et al., 1993). However, when we lack such previous experience with somebody, it is difficult (although not impossible) to predict his/her behavior and, consequently, to decide whether to trust him/her<sup>1</sup>. Trust at zero acquaintance must therefore be influenced by factors other than experience with the trustee (Dunning et al., 2014).

Some research has focused on factors that may influence participants' trusting behavior. Among these, an emerging literature points to the role of shared group membership in promoting trust (Platow et al., 2012). Much of this work follows the theoretical claim by Brewer (1981, p. 356) that group membership "serves as a rule for defining the boundaries of low-risk interpersonal trust that bypasses the need for personal knowledge and the costs of negotiating reciprocity." This proposition has been supported by studies showing that participants trust others as a function of their group membership (e.g., Tanis and Postmes, 2005), although it is not supported by the results of other studies, as we describe later (Tortosa et al., 2013).

As in many other impression formation processes, when deciding whether to trust unfamiliar others, we can either (a) categorize them and interact with them according to the inferences that can be drawn from what we know (or have learned) about their category (i.e., their inferred group membership), or (b) individuate them and try to predict their behavior based solely on what we have specifically learned about them. The literature has repeatedly shown that categorization seems to be the default process, in particular for social stimuli (Brewer, 1988; Fiske and Neuberg, 1990; Kawakami et al., 1998; Cuddy et al., 2004; Nelson, 2005).

In the present study we were interested in evaluating whether people infer information about others (i.e., reciprocation rate) based primarily on their category membership (i.e., ethnicity) or on individuated perception, and whether their decision to trust them (e.g., to share money) consequently depends on these reciprocation inferences. To do so, we adapted a procedure developed to investigate the use of social categories for the allocation of attentional control (Canadas et al., 2013) to study the categorization-individuation processes underlying cooperation dynamics in a trust game context.

The procedure of Canadas et al. (2013) presented photographs of men and women as the context in which congruent or incongruent stimuli appeared while participants performed a flanker task. Three faces in each social category (i.e., consistent faces) were associated either with a high proportion of congruent trials (75% congruent–25% incongruent, i.e., a low-conflict context) or with a low proportion of congruent trials (25% congruent–75% incongruent, i.e., a high-conflict context), whereas a fourth face in each group (i.e., the inconsistent face) was associated with the proportion congruent of the other group. The extent to which inconsistent faces produced the same pattern of results as consistent ones, despite being associated with the opposite proportion of congruency to that of the social category they belonged to, was taken as an index of social categorization. A categorization pattern was in fact observed in the first study, supporting the above-mentioned idea that categorization seems to be the default process for social stimuli.

In a second study, we manipulated the instructions given to participants, asking them either to individuate (i.e., pay attention to the individual characteristics of the faces) or to categorize (i.e., pay attention to the category-based features of the faces). The previous pattern of results was replicated in the categorization-instructions group. However, in the individuation-instructions group, consistent and inconsistent faces produced different effects, reflecting the individual rather than the group association between faces and proportion of congruency. This pattern was taken as evidence for individuation even in social contexts when participants are motivated to individuate.

In conclusion, although categorization seems to play a dominant role in person perception, a wide range of variables has been shown to modulate the activation of categorical thinking, including instructions, motivation, goals, and strategies (e.g., Lepore and Brown, 1997; Castelli et al., 2004; Macrae and Cloutier, 2009), and the procedure of Canadas et al. (2013) proved to be a suitable method for investigating these processes. In the current study we aimed to extend this procedure to investigate individuation-categorization processes in a more direct and clear-cut social behavior: the decision whether or not to trust a partner in an economic game.

Wilson and Kayatani (1968) examined the effect of a partner's ethnicity on cooperative behavior in a prisoner's dilemma game. They found that participants were far more cooperative with ingroup partners than with outgroup partners. Also, Chen et al. (2010) performed several experiments using the prisoner's dilemma task in which participants identified themselves as members of the same university or of the same ethnicity, and they showed the same pattern of results as Wilson and Kayatani (1968): the sense of belonging to the same group played an important role in the participants' cooperation ratings. However, Tortosa et al. (2013) did not find evidence for this bias against the outgroup with the well-known Trust Game paradigm developed by Berg et al. (1995), originally constructed by Camerer and Weigelt (1988). In a first experiment, Tortosa et al. (2013) observed no effect of ethnicity on the cooperation rate, whereas in a second experiment they actually observed a small but reliable tendency toward a higher cooperation rate for outgroup partners (64.4%) than for ingroup ones (57.5%).

<sup>1</sup>Trusting behavior is the willingness of an individual to expose themselves to the actions of others, while trustworthy behavior is defined as rewarding trust through reciprocation.

In the current research we used a modified version of this trust game procedure and incorporated the manipulation of Canadas et al. (2013) to investigate categorization and individuation processes. We evaluated whether participants prefer to trust (cooperate/share money with) ingroup members (i.e., white partners) rather than outgroup members (black partners). The task typically involves two players, a trustor and a trustee. The trustor (the participant) is endowed with a sum of money and has to decide whether or not to share it with her/his game partner. If s/he decides to keep the money, the trustee gets nothing. If s/he decides to share the money, the trustee receives the initial endowment multiplied by a factor determined by the experimenter. If the trustee then reciprocates, the sum is divided between the two players; otherwise the trustor obtains nothing. In this game the trustor's decision is risky because the trustee's reciprocation is not enforced by the rules. Still, substantial amounts of trust are observed across studies (Berg et al., 1995). These effects are attributed more to "social preferences" such as fairness, altruism, and reciprocation (see, for example, Fehr and Schmidt, 1999; Charness and Rabin, 2002; Fehr and Fischbacher, 2002, 2003; Fehr and Camerer, 2007) than to rational self-interested choices.
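The payoff structure of a single round can be sketched as follows. Two details are assumptions made for illustration: the multiplier value, and an equal split of the reciprocated amount. The text states only that the experimenter sets the multiplier and that the sum is divided between the two players.

```python
def trustor_payoff(endowment, shared, reciprocated, multiplier):
    """Trustor's payoff for one round of the binary trust game.

    Assumes a reciprocated amount is split equally between the two players.
    """
    if not shared:
        return float(endowment)       # trustor keeps the money; trustee gets nothing
    pot = endowment * multiplier      # trustee receives the multiplied endowment
    return pot / 2 if reciprocated else 0.0


def expected_value_of_sharing(endowment, p_reciprocate, multiplier):
    """Expected trustor payoff from sharing with a partner who reciprocates with probability p."""
    return p_reciprocate * trustor_payoff(endowment, True, True, multiplier)


MULTIPLIER = 4  # hypothetical; the text only says the experimenter sets it
keep = trustor_payoff(1, False, False, MULTIPLIER)
for p in (0.75, 0.25):
    print(p, expected_value_of_sharing(1, p, MULTIPLIER), "vs. keeping", keep)
```

Under these assumptions, sharing beats keeping only when the reciprocation rate is above 2/multiplier (0.5 with a multiplier of 4). A 75% partner then returns 1.5 times the endowment on average and a 25% partner 0.5 times, which is why learning each partner's rate matters to the trustor.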

Importantly, in our adaptation of the procedure, each participant was presented, in a within-subject design, with the faces of supposed partners from two categories (i.e., blacks and whites), each category randomly assigned to a high (75%) or low (25%) proportion of reciprocation. Also, as in Cañadas et al. (2013), one individual in each group (the inconsistent member) was associated with the proportion of reciprocation of the other group. This allows us to examine different effects of impression formation (participants' cooperation bias). A further advantage of the within-subject design is that it allows us to explore the learning processes underlying participants' strategies for adapting their sharing behavior to high-reciprocation vs. low-reciprocation partners.

A second and more important aim of our study is to evaluate whether participants individuate or categorize, that is, the extent to which participants behave in the same way with all category members, irrespective of whether those members show a cooperation rate consistent or inconsistent with the rest of their category. In the case of categorization, the same decision (e.g., to cooperate with the members of one ethnic group) will also be displayed toward inconsistent members of the group (that is, even toward those whose reciprocation rate is opposite to that of their own ethnic category and equal to that of the other ethnic category). In contrast, participants will individuate to the extent that their decisions follow the reciprocation rate associated with each individual face rather than with its ethnic category. Therefore, in the case of individuation, inconsistent individuals will elicit different cooperation patterns from consistent ones.
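The two strategies contrasted above make different predictions only for inconsistent faces. A minimal sketch, assuming the first-phase assignment described in the Methods (black faces reciprocate at 75%, white faces at 25%); the function and variable names are hypothetical and only make the predictions explicit:

```python
# Which reciprocation rate would guide cooperation with a face under each strategy.
# Assumed assignment: first-phase order (black = high rate, white = low rate).
GROUP_RATE = {"black": 0.75, "white": 0.25}


def individual_rate(group: str, consistent: bool) -> float:
    """Rate a face actually shows: inconsistent faces take the other group's rate."""
    other = "white" if group == "black" else "black"
    return GROUP_RATE[group] if consistent else GROUP_RATE[other]


def guiding_rate(strategy: str, group: str, consistent: bool) -> float:
    """Rate a participant would rely on when deciding whether to share."""
    if strategy == "categorize":
        return GROUP_RATE[group]  # ignores the individual's own history
    if strategy == "individuate":
        return individual_rate(group, consistent)  # tracks each face separately
    raise ValueError(f"unknown strategy: {strategy}")


for group in ("black", "white"):
    for consistent in (True, False):
        label = "consistent" if consistent else "inconsistent"
        cat = guiding_rate("categorize", group, consistent)
        ind = guiding_rate("individuate", group, consistent)
        print(f"{group:5s} {label:12s} categorize={cat:.2f} individuate={ind:.2f}")
```

Only the inconsistent rows differ between the two strategies. This is why inconsistent faces are what allow categorization and individuation to be told apart.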

We expected that, across blocks of trials, participants would categorize individuals according to the most salient features of their faces (i.e., ethnic features). They would therefore decide whether to cooperate with them depending on the likelihood that their group would reciprocate. Thus, participants would share to a greater extent with individuals of the group more likely to reciprocate. However, we expected this to happen mainly for consistent individuals.

A different prediction was made for inconsistent individuals. On the one hand, in line with our previous research (Cañadas et al., 2013), inconsistent individuals might also be categorized. On the other hand, the task consists of repeated Trust Game rounds in which participants interact several times with the same partner, and previous interactions with that person influence the participant's decision to cooperate (King-Casas et al., 2005). Previous research on reinforcement learning (Sutton and Barto, 1998) has shown that people attempting to maximize their benefits should learn from the feedback received after each interaction with the environment. Consequently, in our study individuation is the more efficient strategy, and it should be learned quickly from the feedback of each interaction (Axelrod and Hamilton, 1981).
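The reinforcement-learning account can be illustrated with a simple delta rule. In this rule, the expected reciprocation of each partner moves toward every observed outcome by a fraction of the prediction error. This is a generic textbook sketch in the spirit of Sutton and Barto (1998), not the model used by the authors. The learning rate, the 0.5 starting value and the function names are assumptions. The 0.4 sharing threshold is the risk-neutral break-even point implied by the payoffs described in the Methods: sharing pays 2.5 euros with probability p, while keeping pays 1 euro.

```python
import random


def update(estimate: float, reciprocated: bool, alpha: float = 0.2) -> float:
    """One delta-rule step: move the estimate toward the observed outcome."""
    outcome = 1.0 if reciprocated else 0.0
    return estimate + alpha * (outcome - estimate)  # alpha * prediction error


def learn(outcomes, prior: float = 0.5, alpha: float = 0.2) -> float:
    """Expected reciprocation after a sequence of trials with one partner."""
    estimate = prior
    for reciprocated in outcomes:
        estimate = update(estimate, reciprocated, alpha)
    return estimate


def decide_to_share(estimate: float, threshold: float = 0.4) -> bool:
    """Share when the expected payoff of sharing (2.5 * p) exceeds keeping 1 euro."""
    return estimate > threshold


# Hypothetical simulated partners, one per true reciprocation rate, 40 trials each.
rng = random.Random(1)
for true_rate in (0.25, 0.75):
    history = [rng.random() < true_rate for _ in range(40)]
    est = learn(history)
    print(f"true={true_rate:.2f} learned={est:.2f} share={decide_to_share(est)}")
```

The sketch keeps a separate estimate for each face, so an inconsistent partner's own history, not its group's, drives the learned value. This per-face bookkeeping corresponds to what the text calls individuation.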

Taking together both the individualistic nature of this task and the explicit consequences of each decision (participants were informed in each trial about whether the partner reciprocated or not), we expected that participants would pay attention to each individual. They would therefore individuate inconsistent partners, updating first impressions based on previous interactions (Chang et al., 2010; Campellone and Kring, 2013). This individuation pattern was nevertheless expected mainly for ingroup individuals (i.e., white partners), consistent with previous findings on outgroup vs. ingroup social categorization (Judd and Park, 1988; Linville et al., 1989; Levin, 1996, 2000; MacLin and Malpass, 2001). Individuation here means a correlation between participants' cooperation rate and the individual reciprocation rate, rather than the group reciprocation rate.

# **MATERIALS AND METHODS**

# **Participants**

Twenty-six white undergraduate students from the University of Granada (one man; mean age 20.15 years, SD = 1.93) participated in exchange for course credit. The study conformed to the relevant regulatory standards approved by the local ethics committee of the University of Granada in the Department of Experimental Psychology. Participants signed consent forms and received 1% of their final payoffs (maximum 10 euros).

# **Stimuli and Procedure**

At the beginning of the session, participants were told that the experiment explored the cooperation patterns that emerge between people during the so-called *trust game*. During the task, participants played the role of "*trustors*." They received 1 euro and had to decide whether to keep it or share it with an alleged partner (i.e., the "*trustee*"). The trustee was a person unknown to the participant, whose picture was shown.

Each trial started with the euro symbol (€), indicating that the participant received 1 euro. The participant then had to decide whether to keep it (by pressing the *0* key on the keyboard) or share it with the partner (by pressing the *1* key). Deciding to keep the money yielded no earnings for the partner and ended the trial. If participants decided to share, the *trustee* received 5 euros and, in turn, decided whether:

(a) to reciprocate the cooperation, in which case each player received 2.5€; or
(b) not to reciprocate, in which case the *trustee* kept the 5€ and the participant received nothing.

Feedback about the *trustee*'s decision was displayed on the computer screen 500 ms after the participant's decision, and the trial ended after the feedback. The participants' goal was to maximize their payoffs in the game.
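The payoff rules above can be written out to show when sharing pays off. A minimal sketch, assuming a risk-neutral trustor who cares only about expected euros. This is an assumption; the text attributes real trust partly to social preferences. Sharing yields 2.5 euros with reciprocation probability p, so its expected value exceeds the 1-euro endowment only when 2.5p > 1, i.e. p > 0.4. Under this assumption, sharing is advantageous with the 50% and 75% partners used in the design, but not with the 25% partners.

```python
ENDOWMENT = 1.0  # euros given to the trustor on each trial
POT = 5.0        # euros the trustee receives when the trustor shares


def trial_payoffs(share: bool, reciprocate: bool) -> tuple[float, float]:
    """(trustor, trustee) payoffs in euros for one trial."""
    if not share:
        return ENDOWMENT, 0.0  # keeping ends the trial; trustee gets nothing
    if reciprocate:
        return POT / 2, POT / 2  # 2.5 euros each
    return 0.0, POT  # trustee keeps the 5 euros


def expected_share_value(p_reciprocate: float) -> float:
    """Expected trustor payoff from sharing, given a reciprocation probability."""
    return p_reciprocate * (POT / 2)


# Risk-neutral break-even: share only if p * 2.5 > 1, i.e. p > 0.4.
BREAK_EVEN = ENDOWMENT / (POT / 2)

for p in (0.25, 0.50, 0.75):
    print(f"p = {p:.2f}: share EV = {expected_share_value(p):.3f}, keep = {ENDOWMENT:.3f}")
```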

Participants were also informed that they were not playing with real people, but that the reciprocation behavior would mimic common patterns of play by real people. Participants were not informed about the manipulations included in the design: the ethnicity of the interaction partners and the partners' reciprocation rates. Therefore, they were unaware of the main goal of the study. This goal was to explore how the ethnicity of the partners can influence cooperation strategies, and to investigate whether participants categorize or individuate outgroup vs. ingroup trustees.

The general procedure was similar to that used by Tortosa et al. (2013). The task was presented on a PC running E-Prime software (Schneider et al., 2002).

Stimuli representing the *trustees* were frontal photographs of eight black people (four men and four women) and eight white people (four men and four women) from the NimStim face stimulus set (Tottenham et al., 2009). Faces were matched on attractiveness and trustworthiness as rated by 28 independent participants (10 men and 18 women, all white; mean age = 32.68, SD = 6.56) in an online questionnaire using Qualtrics®<sup>2</sup>. All stimuli were presented against a gray background (see **Figure 1**).

Each trial started with a 200 ms presentation of "€" (2.1 *×* 1.6° visual angle) to indicate the money given to the participant. This was replaced by a fixation point (+, 0.7 *×* 0.7° visual angle) for 500 ms, followed by the picture of the *trustee* for that trial (6.2 *×* 8.3° visual angle) for 1500 ms. During this time, participants had to indicate whether to keep (by pressing the "0" key) or share (by pressing the "1" key) the euro. After participants indicated their decision (or after 1500 ms if they did not respond), the picture was replaced by the fixation point for 500 ms. The fixation point was then replaced by a feedback symbol (1.0° *×* 1.0° visual angle) indicating the *trustee*'s decision for that trial.

Three symbols displayed in three different colors were used as feedback: a green "o", a navy "#", and a maroon "\*". Their meanings were:

- "You have decided to keep the money. You receive 1 euro. Your partner receives 0 euro"
- "You have decided to share and your partner has decided to reciprocate"
- "You have decided to share and your partner has decided not to reciprocate."

The association between specific symbols, colors, and their meanings was counterbalanced across participants<sup>3</sup>. On trials where participants did not enter their decision in time (1.5 s), they saw the message "¡tarde!" (late!). At the end of the trial, a larger fixation point (a "+" sign, 1.0° *×* 1.0°) remained on the screen for 1000 ms.

<sup>2</sup>There was no significant difference between the two ethnic groups of faces in attractiveness, *t*(27) = 1.64, *p* = 0.11 (black faces: α = 0.79, mean = 2.67, SD = 0.61; white faces: α = 0.81, mean = 2.51, SD = 0.57). Nor was there a significant difference in trustworthiness, *t*(27) = 0.19, *p* = 0.85 (black faces: α = 0.77, mean = 2.93, SD = 0.52; white faces: α = 0.71, mean = 2.91, SD = 0.50).

<sup>3</sup>This procedure was used because in a follow-up experiment we wanted to evaluate evoked potentials associated with the feedback, as previously studied in Tortosa et al. (2013).

Participants played a multi-round design with 16 different *trustees* over the course of the task. Participants played the game 40 times with each of the 16 *trustees* (for a total of 640 trials), divided into two phases of five blocks each.

Each phase was designed so that three faces of one ethnic group were associated with a high probability of reciprocation (75%), while three faces of the other group were associated with a low probability of reciprocation (25%). These were the consistent faces. The fourth face of each group in each phase reciprocated at the rate of the other group. These were the inconsistent faces. In the second phase, the group reciprocation rates were reversed, using four different faces for each ethnic group. The order in which black or white faces started reciprocating in 75% of the trials was counterbalanced across participants. Which face of each group acted as the inconsistent face was also counterbalanced across participants.

For instance, for a given participant, five blocks constituted the first phase. In block 1, the reciprocation rate was set at 50% for every face of both groups. In blocks 2–5, the reciprocation rate was set at:

- 75% for three black trustees (consistent faces) and 25% for one black trustee (inconsistent face);
- 25% for three white trustees (consistent faces) and 75% for one white trustee (inconsistent face).

In a second phase of five additional blocks, eight new faces were presented and the reciprocation rates of the ethnic groups were reversed. That is, in the sixth block the reciprocation rate was again set at 50% for both groups. In blocks 7–10, three white faces reciprocated at a 75% rate and three black faces at a 25% rate, while one white face reciprocated at a 25% rate and one black face at a 75% rate.
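The trial counts implied by this design can be checked with a small schedule generator. It is a sketch under stated assumptions, not the authors' E-Prime script:

- Each face is assumed to appear equally often in each of the five blocks of its phase (40 / 5 = 8 presentations per block, hence 64 trials per block).
- Reciprocation proportions are assumed to be exact within each block rather than drawn at random.
- The fourth face of each group is arbitrarily designated as inconsistent.
- Trial order is left unshuffled.

Face identifiers are hypothetical.

```python
from collections import Counter

BLOCKS_PER_PHASE = 5
FACES_PER_GROUP = 4
TRIALS_PER_FACE = 40
# Assumption: each face appears equally often in every block of its phase.
PER_BLOCK = TRIALS_PER_FACE // BLOCKS_PER_PHASE  # 8 trials per face per block


def face_rate(block_in_phase: int, consistent: bool, group_is_high: bool) -> float:
    """Reciprocation rate for one face in block 1..5 of a phase."""
    if block_in_phase == 1:
        return 0.50  # first block of each phase: no group manipulation
    # Inconsistent faces reciprocate at the other group's rate.
    is_high = group_is_high if consistent else not group_is_high
    return 0.75 if is_high else 0.25


def build_phase(phase: int, high_group: str) -> list[tuple]:
    """(block, face, group, consistent, reciprocates) tuples for one phase."""
    trials = []
    for group in ("black", "white"):
        for idx in range(FACES_PER_GROUP):
            consistent = idx != FACES_PER_GROUP - 1  # hypothetical: last face inconsistent
            face = f"{group}_p{phase}_{idx}"
            for b in range(1, BLOCKS_PER_PHASE + 1):
                rate = face_rate(b, consistent, group == high_group)
                n_recip = round(rate * PER_BLOCK)  # exact proportion per block
                block = (phase - 1) * BLOCKS_PER_PHASE + b
                for k in range(PER_BLOCK):
                    trials.append((block, face, group, consistent, k < n_recip))
    return trials


# Phase 1: black faces form the high-reciprocation group; phase 2 reverses it.
schedule = build_phase(1, "black") + build_phase(2, "white")

print(len(schedule))                        # 640
print(Counter(t[0] for t in schedule)[1])   # 64 trials in block 1
```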

Once participants finished the trust game task, they were presented with the 16 faces. They were asked to rate the extent to which each face was attractive and trustworthy, on a Likert scale ranging from 1 ("not at all") to 7 ("very much"). We also asked participants to indicate how distinctive each face was compared to the other members of its group, using a Likert scale ranging from *−*3 (very distinctive) to +3 (very undistinctive). They also indicated how frequently each face had been presented compared to the others (1 "less," 2 "the same," 3 "more"). We also included some general questions at the group level, including the perceived percentage of reciprocation and the perceived percentage of presentation of white and black faces.

# **RESULTS**

We analyzed participants' sharing/cooperation rates across conditions. First, we compared the cooperation rate toward black and white trustees in the first block of the first phase, where the group reciprocation rate was not manipulated (50%). There was no significant difference in participants' cooperation with black (mean = 0.68; SD = 0.16) vs. white trustees (mean = 0.63; SD = 0.19), *t*(25) = 1.45, *p* = 0.16.
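The block-1 comparison is a paired-samples t-test on each participant's cooperation with black and white trustees, so its degrees of freedom are n − 1 = 25 for 26 participants. A minimal pure-Python sketch of the statistic follows. The three-participant data are invented for illustration only and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Invented per-participant cooperation proportions (not the study's data).
black = [0.750, 0.500, 0.875]
white = [0.625, 0.500, 0.750]
t, df = paired_t(black, white)
print(f"t({df}) = {t:.2f}")
```

A p-value additionally requires the CDF of the t distribution. SciPy's `scipy.stats.ttest_rel` performs the same paired test and returns both the statistic and the p-value.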

To measure categorization or individuation strategies in participants' cooperation behavior, we analyzed cooperation separately for each ethnic group and for each level of face consistency (consistent or inconsistent with the respective category). Thus, cooperation rates were entered into an ANOVA with ethnicity (black, white), block (2–5), group reciprocation rate (25%, 75%), and face consistency (consistent, inconsistent) as within-subject factors. Contrary to the social categorization hypothesis, participants cooperated equally regardless of the trustee's ethnicity, *F*(1,25) = 0.69, *p* = 0.41, η<sup>2</sup><sub>p</sub> = 0.03; that is, they did not cooperate with white trustees more than with black trustees. The main effect of face consistency was not significant either, *F*(1,25) = 0.02, *p* = 0.90, η<sup>2</sup><sub>p</sub> = 0.00.

However, in line with our predictions, participants significantly preferred to cooperate with the group associated with high reciprocity (*M* = 66.7%, CI: 61.4–72.0) over the group associated with low reciprocity (*M* = 61.9%, CI: 56.0–67.9), *F*(1,25) = 13.39, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.35. Block and face consistency significantly moderated this effect of group reciprocation rate, as shown by the three-way interaction between these factors, *F*(1,25) = 4.23, *p* = 0.008, η<sup>2</sup><sub>p</sub> = 0.15. The interaction showed that the effect of group reciprocation rate (which was reversed for inconsistent faces) increased across blocks as learning progressed. This supports the reinforcement learning hypothesis (Chang et al., 2010), according to which participants update their previous impressions with the acquired knowledge of each face's reciprocity rate.
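For a within-subject effect, partial eta squared can be recovered from an F ratio and its degrees of freedom. Because F = (SS_effect/df_effect)/(SS_error/df_error), it follows that η²p = F·df_effect/(F·df_effect + df_error). The sketch below uses this standard identity to recompute the effect sizes reported for the three main effects in the Results. It is a consistency check, not part of the original analysis.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported eta_p^2), copied from the Results section.
reported = {
    "ethnicity": (0.69, 1, 25, 0.03),
    "face consistency": (0.02, 1, 25, 0.00),
    "group reciprocation rate": (13.39, 1, 25, 0.35),
}

for effect, (f, d1, d2, eta) in reported.items():
    print(f"{effect}: recomputed {partial_eta_squared(f, d1, d2):.2f}, reported {eta:.2f}")
```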

More interestingly, the Ethnicity by Group Reciprocation rate interaction was significant, *F*(1,25) = 5.17, *p* = 0.03, η<sup>2</sup><sub>p</sub> = 0.17. It was moderated by the Ethnicity *×* Group Reciprocation *×* Face Consistency three-way interaction, which was also significant, *F*(1,25) = 10.47, *p* = 0.003, η<sup>2</sup><sub>p</sub> = 0.30. Importantly, and contrary to our predictions, a significant Group Reciprocation rate by Face Consistency interaction was observed for black trustees, *F*(1,25) = 6.92, *p* = 0.01, η<sup>2</sup><sub>p</sub> = 0.22. The same interaction was not significant for white trustees, *F*(1,25) = 0.70, *p* = 0.41, η<sup>2</sup><sub>p</sub> = 0.03 (see **Figure 2**).

That is, with black trustees, participants' cooperation followed each face's individual reciprocation rate (which was reversed for inconsistent faces). With white trustees, cooperation was guided by the reciprocation rate of the group, independently of the individual rate (i.e., independently of face consistency). The same cooperation responses toward consistent and inconsistent faces can be taken as a sign of *categorization*, and opposite responses as a sign of *individuation*. These interactions therefore indicate that black trustees were individuated whereas white trustees were categorized.
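The logic of this inference can be expressed as an interaction contrast, computed per ethnic group from the four Group Reciprocation rate × Face Consistency cell means. A sketch with invented cell means, not the values plotted in Figure 2. Under pure categorization, inconsistent faces are treated like their group and the contrast is zero. Under pure individuation, the reciprocation effect reverses for inconsistent faces and the contrast is positive. The function name is hypothetical.

```python
Cells = dict[tuple[str, str], float]


def interaction_contrast(cells: Cells) -> float:
    """(high - low) for consistent faces minus (high - low) for inconsistent faces.

    Keys are (group_rate, consistency), with group_rate in {"high", "low"} and
    consistency in {"consistent", "inconsistent"}; values are mean cooperation.
    """
    consistent_effect = cells[("high", "consistent")] - cells[("low", "consistent")]
    inconsistent_effect = cells[("high", "inconsistent")] - cells[("low", "inconsistent")]
    return consistent_effect - inconsistent_effect


# Hypothetical cell means for the two idealized patterns.
categorizing = {("high", "consistent"): 0.70, ("low", "consistent"): 0.60,
                ("high", "inconsistent"): 0.70, ("low", "inconsistent"): 0.60}
individuating = {("high", "consistent"): 0.70, ("low", "consistent"): 0.60,
                 ("high", "inconsistent"): 0.60, ("low", "inconsistent"): 0.70}

print(f"categorizing: {interaction_contrast(categorizing):.2f}")
print(f"individuating: {interaction_contrast(individuating):.2f}")
```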

# **Trustees' Evaluations**

We also checked for differences in how the individual faces were rated. Specifically, we first wanted to evaluate how the trust game task might have affected judgments of the trustees' attractiveness and trustworthiness. We performed a repeated-measures analysis (group reciprocation rate × ethnicity × face consistency, two levels each) on each dependent variable. We did not find any significant main effect or interaction for attractiveness, *Fs*(1,25) *<* 2.8, *ps >* 0.11. Trustworthiness ratings revealed only a significant Ethnicity by Face Consistency interaction, *F*(1,25) = 4.57, *p* = 0.04, η<sup>2</sup><sub>p</sub> = 0.16. Consistent black trustees were evaluated as more trustworthy (mean = 3.9; SD = 1.61) than inconsistent black trustees (mean = 3.5; SD = 1.53). In contrast, inconsistent white trustees were evaluated as more trustworthy (mean = 3.7; SD = 1.10) than consistent white trustees (mean = 3.4; SD = 1.32). All other *Fs*(1,25) *<* 2.5, *ps >* 0.14.

We then analyzed how distinctive each individual face was in comparison with the other faces of its group, on a scale from *−*3 (very distinctive) to +3 (very undistinctive). The only significant effect was the Group Reciprocation rate by Ethnicity interaction, *F*(1,25) = 4.24, *p* = 0.05, η<sup>2</sup><sub>p</sub> = 0.15. Black trustees associated with low reciprocation rates were perceived as more similar to each other (mean = 1.07; SD = 1.47) than those associated with high reciprocation rates (mean = 0.89; SD = 1.41). White trustees, by contrast, were perceived as more similar to each other when associated with high group reciprocation rates (mean = 0.96; SD = 1.46) than with low group reciprocation rates (mean = 0.67; SD = 1.62). None of the other effects reached significance, *Fs*(1,25) *<* 1, *ps >* 0.35.

Next, we performed the same analysis on the perceived frequency of presentation of individual faces (1 "less," 2 "the same," 3 "more" compared to the rest). None of the effects were significant, *Fs*(1,25) *<* 1.5, *ps >* 0.22.

We next evaluated how participants perceived the faces at the group level. We first analyzed how frequently participants believed the two groups of faces (black and white) had been presented during the task. Their estimates of the overall presentation rates did not differ significantly between black trustees (mean = 56.15%; SD = 12.11%) and white trustees (mean = 51.92%; SD = 15.43%), *t*(25) = 1.10, *p* = 0.28, η<sup>2</sup><sub>p</sub> = 0.01. This result indicates that participants correctly estimated that black and white faces were presented equally often throughout the experiment.

We then evaluated participants' impressions of the reciprocation rates of black and white trustees. Interestingly, overall reciprocation judgments differed significantly depending on the ethnicity of the trustee, *t*(25) = 2.95, *p* = 0.005, η<sup>2</sup><sub>p</sub> = 0.26. Participants reported that black trustees reciprocated more often (mean = 60.58%; SD = 13.44%) than white trustees (mean = 48.50%; SD = 16.00%).

# **DISCUSSION**

The present study explored, in a multi-round trust game task, the effects of ethnicity and of reciprocation behavior that was consistent vs. inconsistent with the trustee's group. We wanted to explore whether ethnicity moderates the decision to cooperate with partners and, more importantly, whether social categorization or individuation processes underlie those decisions. Results revealed no general bias toward cooperating with white rather than black partners. Interestingly, however, participants used different strategies to decide whether to cooperate (share money) with white and black partners. The white ingroup trustees' faces were categorized: the same cooperation pattern was observed for consistent and inconsistent faces. The black outgroup trustees' faces were individuated: opposite cooperation patterns were observed for consistent and inconsistent faces.

A preference to cooperate with ingroup rather than outgroup members has been widely reported in previous research (Wilson and Kayatani, 1968; Tanis and Postmes, 2005; Chang et al., 2010). Other studies, however, point in the opposite direction, favoring outgroup members (see Allport, 1954; Monteith et al., 2002; Tortosa et al., 2013, study 2). Our results showed no bias toward either black or white trustees, as measured in block 1 (50% reciprocation rate for both black and white trustees). This finding is in line with previous results by Stanley et al. (2011). They showed that, unless participants had a strong pro-white or pro-black bias as measured with an implicit ethnic attitudes test, their evaluations of trustworthiness and their cooperation behavior (economic offers in a trust game) remained similar toward black and white partners.

We can rule out the possibility that black and white trustees evoked different trustworthiness impressions, as we controlled for this (among other variables, e.g., attractiveness) in the pretest for stimulus selection. The evaluation of the trustworthiness of the stimuli at the end of the trust game did not show overall differences between black and white trustees either. This is in line with the pretest and with other studies investigating ethnic attitudes (Phelps et al., 2000; Stanley et al., 2011).

A potential explanation for the similar cooperation toward partners of both ethnic groups is the use of both women and men as stimuli. Gender may be a confound here. Participants may prefer to cooperate with women because women are perceived as more trustworthy than men, independently of their ethnic categorization (Buchan et al., 2008), and this gender bias may have concealed the ethnic bias. Future research should analyze this potential confound carefully.

Interestingly, the manipulation of consistency significantly affected the evaluation of trustworthiness, which may help explain the current results. Inconsistent white trustees were rated as more trustworthy than consistent white trustees, whereas inconsistent black trustees were rated as less trustworthy than consistent black trustees. These different evaluations of inconsistent black and white faces may indicate that people more readily accept ingroup members (whites) who behave unexpectedly than outgroup members (blacks; Kosic et al., 2014).

Our main contribution to the literature on ethnic categories and decision-making concerns cooperation strategies related to categorization and individuation processes. Results showed that (white) participants used different strategies to decide how to cooperate (share money) with white and black partners. Specifically, they learned which face was behaving inconsistently with the rest of the group. They then decided how to cooperate with that person according to the specific reciprocation rate that he or she showed. That is, participants individuated each trustee they encountered. Interestingly, however, this individuation strategy applied exclusively to black faces (outgroup members). In contrast, decisions to cooperate with white trustees followed a categorization strategy. Participants' decisions to cooperate with inconsistent white trustees followed the proportion of reciprocation assigned to the majority of the white trustees (consistent trustees).

Why participants categorized whites and individuated blacks, contrary to our expectations and to previous findings (Hugenberg et al., 2010), is far from clear. One possibility is that participants care about ingroup identification (Castano and Yzerbyt, 1998). They may therefore be motivated to preserve the homogeneity of the ingroup (Castano, 1999), producing the categorization effect observed for white faces. Furthermore, according to interdependence theories, participants may have individuated black faces because their outcomes (the money they could earn during the task) depended on their sharing behavior (Ruscher and Fiske, 1990). Participants may have paid special attention to black partners to compensate for their dispositional tendency to categorize them, and consequently increased their attention to inconsistencies among black partners. This increased attention may have helped them to cooperate with each face according to its individual reciprocation rate rather than the group reciprocation rate.

Another explanation is based on the learning model of Collins et al. (2011), specifically on "blocking" (Kamin, 1968). It may account for why participants learned about black and white partners with different strategies. Blocking might occur for whites when a new proportion of reciprocity (the inconsistent cue) is introduced alongside a proportion of reciprocity (the consistent cue) whose meaning has already been learned for the majority of the group members. The perceptual information coming from the inconsistent partner (a white person) is redundant at the perceptual level, providing no additional information beyond the original cue. Learning about it may therefore have been blocked.
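Blocking itself can be illustrated with the classic Rescorla-Wagner learning rule, in which all cues present on a trial share a single prediction error. This is a swapped-in textbook model, not the model of Collins et al. (2011). The cue labels, learning rate and trial counts are assumptions. Cue A stands for an already learned cue (e.g., the group-level reciprocation rate) and cue B for a newly added, redundant cue. After A alone has been trained, B acquires almost no associative strength when presented together with A. Without pretraining, the two cues split the associative strength.

```python
def rescorla_wagner(trials, n_cues: int = 2, alpha: float = 0.3, lam: float = 1.0):
    """Return associative strengths after training.

    trials: sequence of (present_cues, outcome) pairs, outcome 1 or 0.
    Every present cue is updated by the same summed prediction error.
    """
    v = [0.0] * n_cues
    for cues, outcome in trials:
        error = lam * outcome - sum(v[c] for c in cues)  # shared prediction error
        for c in cues:
            v[c] += alpha * error
    return v


A, B = 0, 1
pretraining = [((A,), 1)] * 30     # phase 1: A alone predicts the outcome
compound = [((A, B), 1)] * 30      # phase 2: A and B together, same outcome

blocked = rescorla_wagner(pretraining + compound)
control = rescorla_wagner(compound)  # no pretraining on A

print(f"after pretraining on A: V(B) = {blocked[B]:.3f}")
print(f"without pretraining:    V(B) = {control[B]:.3f}")
```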

Interestingly, while blocking could explain the null effect for whites (related to categorization), highlighting could explain the individuation effect for blacks. Highlighting occurs when a person focuses extra attention on a cue that changes the meaning of a previously learned cue. This happens when a learned association is no longer correct once a new cue is added alongside a known one (Kruschke, 2009). A complementary, motivational account would be that, for white participants, learning about other whites offers little novelty. Learning about the outgroup, by contrast, is highly relevant for avoiding threats (highlighting).

Unfortunately, we do not have information about participants' previous experience with black individuals, so future studies should measure and control for it. Future research should also focus on explaining the mechanisms underlying the choice between individuation and categorization strategies, not only in economic games but also in other social interactions, such as prosocial behaviors. It will also be interesting to know whether bottom-up (perceptual information) or top-down (conceptual stereotypes) processes influence judgmental tasks. Previous research on gender-emotion stereotypes (Becker et al., 2007) shows that top-down and bottom-up processes can co-occur during person evaluation.

A further feature of our procedure is that a category other than ethnicity (i.e., gender) could have been salient, as half of the faces in each group were women and the other half were men. All but one of the participants in our study were women, and this might have affected the pattern of results. However, the gender composition was the same for both ethnic groups, so it seems unlikely that gender could explain the pattern of results. Nevertheless, future research should control more carefully for the presence or absence of important category features (ethnicity, gender, age, etc.).

Clearly, further research is necessary to replicate and consolidate the specific findings observed in the reported study, and to better explain the observed pattern of results. Importantly, the current procedure has proven to be a suitable tool for investigating the incidental generation and use of categorization and individuation processes in social cooperation. In a previous study (Cañadas et al., 2013), this general paradigm also proved suitable for investigating these categorization-individuation processes and their role in the implicit allocation of attentional control. We believe this paradigm could be extended to other situations where categorization vs. individuation processes play an important role in social interactions. Perhaps the individuation pattern observed for outgroup members might disappear whenever more than four members from each category have to be tracked. Our procedure might therefore be useful for investigating the interplay between two kinds of knowledge used to predict a particular individual's future behavior: specific knowledge about our interactions with that individual, and knowledge about our previous interactions with other members of the same group. It might also help establish the boundary conditions for the use of one process or the other.

# **ACKNOWLEDGMENTS**

This research was financially supported by the Spanish Ministry of Education, with research grants (PSI2013-45678-P and PSI2014-52764-P) to RRB and JL.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Cañadas, Rodríguez-Bailón and Lupiáñez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Visual attractiveness is leaky: the asymmetrical relationship between face and hair

#### *Chihiro Saegusa<sup>1,2,3</sup>\*, Janis Intoy<sup>4</sup> and Shinsuke Shimojo<sup>2,5</sup>*

*<sup>1</sup> R & D-Kansei Science Research, Kao Corporation, Tokyo, Japan, <sup>2</sup> Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA, <sup>3</sup> Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan, <sup>4</sup> Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, CA, USA, <sup>5</sup> Computation and Neural Systems, California Institute of Technology, Pasadena, CA, USA*

Predicting personality is crucial when communicating with people, and the perceived attractiveness or beauty of the face has been shown to be one cue for doing so. As shown by the well-known "what is beautiful is good" stereotype, perceived attractiveness is often associated with a desirable personality. Research on attractiveness has mainly used faces isolated from other body parts, yet the face is not always seen in isolation in the real world. Rather, it is surrounded by one's hairstyle and perceived as part of a person's total presence.

In human vision, perceptual organization/integration occurs mostly in a bottom-up, task-irrelevant fashion. This raises the intriguing possibility that a task-irrelevant stimulus that is perceptually integrated with a target may influence our affective evaluation. If so, the perceived attractiveness of the face and of the surrounding hair should influence each other, since they are assumed to share a strong and unique perceptual organization.

In the current study, we examined the influence of a task-irrelevant stimulus on attractiveness evaluation, using face and hair as stimuli. The results revealed asymmetrical influences when evaluating one while ignoring the other. When hair was task-irrelevant, it still affected the attractiveness of the face, but only if the same evaluator had never evaluated the hair itself. On the other hand, the face affected the hair regardless of whether the face itself had been evaluated before. This has intriguing implications for the asymmetry between face and hair, and for perceptual integration between them in general. Together with data from a *post hoc* questionnaire, the results suggest that both implicit non-selective and explicit selective processes contribute to attractiveness evaluation. The findings provide an understanding of attractiveness perception in real-life situations. They also offer a new paradigm to reveal unknown implicit aspects of information integration for emotional judgment.

Keywords: attractiveness, face perception, emotion, information integrality, eye movement

#### *Edited by:*

*Paola Ricciardelli, University of Milano-Bicocca, Italy*

#### *Reviewed by:*

*Peter Lewinski, University of Amsterdam, Netherlands*

*Daniele Zavagno, University of Milano-Bicocca, Italy*

#### *\*Correspondence:*

*Chihiro Saegusa, R&D-Kansei Science Research, Kao Corporation, 2-1-3 Bunka, Sumida-ku, Tokyo 131-8501, Japan saegusa.chihiro@kao.co.jp*

#### *Specialty section:*

*This article was submitted to Cognition, a section of the journal Frontiers in Psychology*

*Received: 11 November 2014 Accepted: 16 March 2015 Published: 09 April 2015*

#### *Citation:*

*Saegusa C, Intoy J and Shimojo S (2015) Visual attractiveness is leaky: the asymmetrical relationship between face and hair. Front. Psychol. 6:377. doi: 10.3389/fpsyg.2015.00377*

# Introduction

Past studies have revealed some seemingly irrational aspects of the human mind in decision-making tasks. One example is the influence of task-irrelevant information, as in the Simon effect (Simon and Craft, 1972). This influence is considered irrational because an ideally rational decision maker should not be affected by any task-irrelevant information. Another example can be found in the same–different task. In such a task, reaction times for "same" responses were longer when the stimuli differed in task-irrelevant dimensions than when they were the same in those dimensions. The task-irrelevant information thus could not be ignored completely (Egeth, 1966). Garner (1974) reported that either facilitation or interference could occur in non-emotional tasks (i.e., classification), depending on the nature of the combined dimensions. He also summarized the manner of information integration in visual spatial patterns and in auditory temporal patterns. The magnitude of the influence depends on the nature and the combinations of dimensions (Watanabe, 1988). A task-irrelevant influence can also be seen in emotional decision-making, such as visual attractiveness judgment, even when the task-irrelevant information is presented below the perceptual threshold (Murphy and Zajonc, 1993).

The specific question we address here is related to these findings: how are the emotional values of different types of objects integrated spatially? In particular, would such "attractiveness leakage" occur even when the observer intends to ignore surrounding objects and concentrate on a target object (Mier et al., 2011; Shimojo et al., 2011)? Answering this question matters both theoretically and for real-world applications (think of magazine advertisements or TV commercials, for instance). When one views an entire visual scene, perceptual organization/integration occurs mostly in a bottom-up, task-irrelevant fashion (Kanizsa, 1979). This raises the intriguing possibility that the attractiveness of task-irrelevant visual stimuli presented concurrently with task-relevant ones may affect the attractiveness of the latter, depending on the perceptual organization among them. With these motivations in mind, we chose human faces and hairstyles as the stimuli in the current study. The face–hair pair is of utmost interest in this regard because one may expect maximal leakage owing to their tight perceptual organization.

Humans have a well-developed ability to detect, recognize, and discriminate faces automatically, and to draw information from them. Needless to say, a face carries important social information. For instance, beauty is associated with goodness (Dion et al., 1972), earning potential (Elder, 1969), and advantages in mate choice (Thornhill et al., 1995). As for facial attractiveness, averageness, symmetry, and sexual dimorphism make faces more attractive (Langlois and Roggman, 1990; Grammer and Thornhill, 1994; Kowner, 1996; Perrett et al., 1998; see Rhodes, 2006, for a review). Holistic processing is important in facial attractiveness judgment (Abbas and Duchaine, 2008), and differences in eye movements between holistic and analytic processing of facial attractiveness have been noted (Schwarzer et al., 2005).

In modern perceptual studies of the face, particularly from the 1980s onward, stimuli have tended to be prepared by cropping the face to eliminate hair, or by using computer-generated graphics without hair. The influence of hair was considered a sort of laboratory artifact and was thus rarely examined. In reality, however, a face is typically accompanied by hair. It is therefore natural to assume that the impression of one's hair (i.e., hairstyle, hair color, etc.), or the lack of it, influences how the face looks. In fact, hair plays an important role in some aspects of face recognition in the real world, for example, in describing photos containing faces and in memory tasks (Davies et al., 1981). There is also evidence that one's hair influences how one looks, e.g., in terms of physical attractiveness, health, and fertility (Swami et al., 2008), as well as personality (Graham and Jouhar, 1981). However, relatively little research has been conducted on hair attractiveness and its influence on the face, partly for the reason mentioned above.

In the current study, we investigated how the attractiveness of task-relevant and task-irrelevant objects (face/hair, or hair/face) is integrated in attractiveness evaluation. We also tracked eye movements during the evaluation task to obtain an objective measure of the participants' overt attention. If our evaluation of facial or hair attractiveness is influenced by perceptual misattribution of task-irrelevant facial and hair information, this phenomenon might be found for both male and female faces and hair. However, past studies have suggested that the process of evaluating facial attractiveness differs between male and female faces. Since both facial and hair makeup have been shown to alter the appearance and attractiveness ratings of female models (Graham and Jouhar, 1981; Etcoff et al., 2011), we focused on the evaluation of female models in the current study.

# Experiments 1a,b

# Method

#### Participants

Thirty-one adults aged 19–33 years (*M* = 23.2 years, SD = 4.3 years, 14 females) participated in Experiment 1. Nineteen of them (*M* = 23.9 years, SD = 4.4 years, 9 females) participated in Experiment 1a, in which they evaluated the attractiveness of images of faces, hairstyles, and face–hair composites. The remaining 12 (*M* = 22.0 years, SD = 4.2 years, 5 females) participated in Experiment 1b, in which their eye movements were also recorded during the evaluation task. All were naive about the purpose of the experiment, had normal or corrected-to-normal vision, and were unfamiliar with the face and hair images used. The Caltech Committee for the Protection of Human Subjects approved the experimental protocol, and informed consent was obtained from all participants.

#### Materials and Stimuli

To simulate the diversity of faces in the real world, we included faces of multiple ethnicities and attractiveness levels in the stimulus set used in Experiments 1a,b. Eight face images covering four ethnicities (African, European, East Asian, and South Asian) and two attractiveness levels (attractive and less attractive) were selected from a larger, pre-rated set described in our past study (Park et al., 2010). All face images in the set were generated using FaceGen Modeller (Singular Inversions, Toronto, ON, Canada), and race categorization followed that of the software. The set of young female faces consisted of 32 African, 36 East Asian, 30 European, and 38 South Asian faces. From this set, we selected faces in the top 5% of attractiveness within each ethnic category as attractive faces, and those in the bottom 5% within each ethnic category as less attractive faces. In addition, one European face from the bottom 1% was added to this face set, because European faces in the bottom 5% were consistently evaluated as more attractive than the faces of other ethnicities. Sixteen hairstyle images covering two lengths (long and short), two textures (straight and wavy), and four colors (light blonde, darker blonde, light brown, and dark brown) were generated using the online software Hollywood Makeover (http://www.instyle.com/makeover) to include various hair colors and styles. Each face image was combined with each hair image in natural spatial alignment, yielding 144 face-and-hair composites. Experiments were written in Matlab using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007).
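The stimulus counts can be checked by enumeration. The 2 lengths × 2 textures × 4 colors give 16 hairstyles. The 8 selected faces plus the extra European face give 9 faces. Pairing every face with every hairstyle therefore yields 9 × 16 = 144 composites. The labels below are illustrative placeholders, not the authors' file names.

```python
from itertools import product

# Hair factors from the text: 2 lengths x 2 textures x 4 colors.
LENGTHS = ["long", "short"]
TEXTURES = ["straight", "wavy"]
COLORS = ["light blonde", "darker blonde", "light brown", "dark brown"]
hairstyles = list(product(LENGTHS, TEXTURES, COLORS))

# Faces: 4 ethnicities x 2 attractiveness levels, plus the extra
# bottom-1% European face. The tuple labels are illustrative only.
ETHNICITIES = ["African", "European", "East Asian", "South Asian"]
LEVELS = ["attractive", "less attractive"]
faces = list(product(ETHNICITIES, LEVELS)) + [("European", "bottom 1%")]

# Every face is paired with every hairstyle.
composites = list(product(faces, hairstyles))

print(len(hairstyles), len(faces), len(composites))  # 16 9 144
```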

#### Procedures

There were two main sessions. In session (i), face-only images (FO) and face–hair composites were shown in random order. In session (ii), hair-only images (HO) and face–hair composites were shown in random order. Participants were asked to evaluate the attractiveness of (i) the face only or (ii) the hair only on a 7-point scale (1: the least attractive, 4: neutral, 7: the most attractive), while ignoring the task-irrelevant hair or face in the composite stimuli (for a sample composite-image trial, see **Figure 1**). In an additional control session (iii), only composites were shown, and participants evaluated overall attractiveness. This third session was added to examine the possibility that the leakage effect in (i) or (ii) is due to a weighted average of face and hair. The images were presented on a 19-inch ViewSonic CRT screen at 1280 × 1024 pixel resolution with a 60 Hz refresh rate. The order of sessions was randomized across participants, and the order of images within each session was also randomized. In all three sessions, a scale bar was presented at the bottom of the screen, and participants rated attractiveness using a mouse. A chin rest was placed 57 cm from the screen. In Experiment 1a, each stimulus measured 381 × 500 pixels, with the face area subtending approximately 5.0° × 7.0° of visual angle.

In Experiment 1b, the stimuli were presented at a larger size to allow eye-movement recording: 534 × 700 pixels, with the face area subtending approximately 7.0° × 9.8°. Eye movements during the sessions were recorded with a head-mounted, video-based Eyelink-II eye tracker (SR Research Ltd., Ottawa, ON, Canada) at a 250 Hz sampling rate in pupil-tracking mode. Nine-point calibration and validation with the Eyelink-II settings were performed at the beginning of each session, and drift correction was performed at the beginning of each trial.
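A 57 cm viewing distance is a common choice because 1 cm on the screen then subtends about 1° of visual angle. The conversion between on-screen size and visual angle can be sketched as follows. The helper names are illustrative. The pixel sizes in the text cannot be converted this way without the monitor's pixel pitch, which is not reported.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Angle (degrees) subtended by an extent of size_cm viewed
    head-on from distance_cm: 2 * atan(size / (2 * distance))."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse: on-screen extent (cm) that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

DISTANCE_CM = 57.0  # chin-rest distance reported in the text

print(round(visual_angle_deg(1.0, DISTANCE_CM), 3))  # 1.005
# Approximate on-screen face size implied by 5.0 deg x 7.0 deg (Experiment 1a)
print(round(size_for_angle_cm(5.0, DISTANCE_CM), 2),
      round(size_for_angle_cm(7.0, DISTANCE_CM), 2))
```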

#### Analyses

The analyses used data on the 16 hairstyles and 8 faces shown in **Figure 2**. The European face from the bottom 5% was excluded so that the data set contained equal numbers of attractive and less attractive faces. For the analyses of self-reported evaluation scores, the scores given by participants in Experiments 1a,b were pooled. Rating scores (*x*) were converted to *z*-scores (*z*) within each participant, using the mean (μ) and SD (σ) of all the scores that participant gave across the three sessions [*z* = (*x* – μ)/σ]. Mean scores were then calculated within each participant for four conditions:

- face attractiveness evaluation on FO;
- face attractiveness evaluation on face-and-hair composites (FC);
- hair attractiveness evaluation on HO;
- hair attractiveness evaluation on face-and-hair composites (HC).

The scores of FO and FC, and those of HO and HC, were compared using dependent *t*-tests. These tests investigated whether task-irrelevant hair influenced FC and whether a task-irrelevant face influenced HC. Eye movements were analyzed using Eyelink Data Viewer (SR Research Ltd.). To examine whether gaze had remained within the task-relevant area, areas of interest were defined manually by the experimenter. We then calculated the dwell-time ratio in the hair area (DwRH): the time gaze dwelled in the hair area divided by the time it dwelled anywhere within the face-and-hair composite. DwRH was averaged over all samples within each experimental condition for each participant. Differences in DwRH between conditions were then tested with Friedman's rank test.
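The two derived measures defined in this paragraph can be sketched in a few lines of Python. The data layout is an assumption made for illustration: a flat list of one participant's ratings, and one area-of-interest label per gaze sample. The authors computed dwell times in Eyelink Data Viewer, not with this code. The paper also does not state whether σ is the population or sample SD; the population SD is assumed here.

```python
from statistics import mean, pstdev

def zscore_within(ratings):
    """Standardize one participant's ratings, pooled over all sessions,
    with that participant's own mean and SD: z = (x - mu) / sigma.
    Uses the population SD (an assumption; the paper does not say)."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    return [(x - mu) / sigma for x in ratings]

def dwell_ratio(aoi_labels, target="hair"):
    """Dwell-time ratio for one trial: gaze samples in the `target` area
    divided by samples anywhere on the face-and-hair composite.
    With a fixed sampling rate (250 Hz here), sample counts are
    proportional to dwell time. None marks off-stimulus samples."""
    on_stimulus = [a for a in aoi_labels if a is not None]
    if not on_stimulus:
        return float("nan")
    return sum(a == target for a in on_stimulus) / len(on_stimulus)

# Hypothetical data for one participant
print([round(z, 3) for z in zscore_within([1, 4, 7, 4])])
trace = ["face"] * 6 + ["hair"] * 2 + [None] * 2  # one label per 4 ms sample
print(dwell_ratio(trace))  # 0.25
```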

#### Results and Discussion

#### Attractiveness Ratings

Mean attractiveness ratings of faces were significantly higher when evaluating face-only stimuli [*M* = –0.049, SE = 0.61, 95% CI (–0.17, 0.076)] than when evaluating face-and-hair composites [*M* = –0.22, SE = 0.036, 95% CI (–0.30, –0.15); *t*(30) = 3.84, *p* < 0.01, *d* = 0.63, 95% CI of the difference (0.081, 0.27)], as shown in **Figure 3A**. Similarly, mean attractiveness ratings of hair were significantly higher when evaluating hair-only stimuli [*M* = 0.41, SE = 0.057, 95% CI (0.29, 0.52)] than when evaluating face-and-hair composites [*M* = 0.17, SE = 0.051, 95% CI (0.067, 0.28); *t*(30) = 4.97, *p* < 0.001, *d* = 0.79, 95% CI of the difference (0.14, 0.33); **Figure 3B**]. Thus, in both situations, the perceived attractiveness of the target (face or hair) was lower for face-and-hair composites than for face-only or hairstyle-only stimuli. The same faces were presented in the FO and FC conditions, and the same hairstyles in the HO and HC conditions. The difference in attractiveness scores between conditions therefore indicates that the presence of task-irrelevant hair or face influenced the evaluation of the target face or hair. Perceived attractiveness decreased when face and hair were combined, for both face and hair evaluations. This is rather paradoxical, since combining face and hair is more realistic than presenting face-only or hair-only stimuli. One possible cause of this decrease is incongruency between face and hair, since some combinations in our stimulus set (e.g., blonde hair with an East Asian face) might be perceived as artificial.
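The dependent *t*-test compares, within each participant, the mean *z*-score in two conditions (e.g., FO vs. FC). A minimal sketch is given below. It also returns d_z, the mean difference divided by the SD of the differences, which is one common effect size for paired designs. The paper does not state which variant of Cohen's *d* it reports, so d_z need not equal the published values. The input numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Dependent-samples t-test on per-participant condition means.
    Returns (t, df, d_z), where d_z = mean(diff) / sd(diff) = t / sqrt(n)."""
    if len(a) != len(b):
        raise ValueError("need exactly one value per participant in each condition")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)  # sample SD with n - 1 denominator
    t = m / (s / sqrt(n))
    return t, n - 1, m / s

# Hypothetical per-participant mean z-scores in FO and FC
fo = [0.10, -0.05, 0.20, 0.00, -0.10, 0.15]
fc = [-0.10, -0.20, 0.05, -0.15, -0.30, 0.00]
t, df, dz = paired_t(fo, fc)
print(f"t({df}) = {t:.2f}, d_z = {dz:.2f}")
```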

It is known that facial attractiveness is evaluated differently depending on the evaluator's gender. This suggests a possible gender difference in the influence of a task-irrelevant face or hair on the evaluation of hair or face. To investigate this, a two-way mixed-design analysis of variance (ANOVA) was performed on the mean attractiveness scores of faces. Presented stimulus type (FO or FC) was a within-participant factor, and participants' gender was a between-participant factor. The results indicated no interaction between stimulus type (face with/without hair) and evaluator's gender [*F*(1,29) = 0.792, *p* = 0.381, η<sup>2</sup><sub>p</sub> = 0.027]. Another ANOVA on the mean attractiveness scores of hairstyles revealed no gender difference in the influence of a task-irrelevant face on hair evaluation [for the interaction, *F*(1,29) = 0.376, *p* = 0.545, η<sup>2</sup><sub>p</sub> = 0.013].
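Because η<sup>2</sup><sub>p</sub> = SS_effect / (SS_effect + SS_error) and F = (SS_effect/df1)/(SS_error/df2), partial eta squared can be recovered from any reported F and its degrees of freedom: η<sup>2</sup><sub>p</sub> = F·df1 / (F·df1 + df2). The sketch below applies this identity to the two gender interactions above as a consistency check. It is not the authors' analysis code.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """eta_p^2 = F * df1 / (F * df1 + df2).

    Also valid for Greenhouse-Geisser-corrected (fractional) dfs, because
    the correction scales df1 and df2 by the same epsilon."""
    num = f * df_effect
    return num / (num + df_error)

# Gender x stimulus-type interactions reported above
print(round(partial_eta_squared(0.792, 1, 29), 3))  # 0.027
print(round(partial_eta_squared(0.376, 1, 29), 3))  # 0.013
```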

The dependent *t*-tests indicated that facial attractiveness evaluations differed between FO and FC, as did hair attractiveness evaluations between HO and HC. They do not, however, reveal how the influence arises. One possible explanation is misattribution of the attractiveness of the task-irrelevant stimulus to the target stimulus. We therefore tested this prediction with a correlation analysis, as an index of attractiveness leakage from face to hair. For each of the eight faces, we correlated its mean attractiveness score in the FO condition with the average score of the hairstyles presented with that face in the HC condition. The results revealed a significant positive correlation [*r*(6) = 0.631, *p* < 0.05, one-tailed]. In contrast, the corresponding index of attractiveness leakage from hair to face showed no significant correlation [*r*(14) = 0.275, *p* = 0.151, one-tailed]. The mean attractiveness ratings of the eight FO stimuli ranged from –1.13 to 1.28. A one-way repeated-measures ANOVA revealed that FO attractiveness differed significantly between faces [*F*(4.13,123.9) = 27.3, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.477]. Analogously, the mean ratings of the 16 hair-only stimuli ranged from –0.49 to 1.25 and differed significantly between hairstyles [Friedman's rank test, χ<sup>2</sup>(15) = 112.3, *p* < 0.001], indicating that the manipulation of hair attractiveness was also successful. The variety of faces and hairstyles in our stimulus set was rather small, so the interpretation of the findings should be limited to this stimulus set. Nevertheless, the results suggest a possible misattribution of facial attractiveness to the hair attractiveness evaluation. To summarize, the attractiveness evaluation of hairstyles was influenced by that of task-irrelevant faces. Although the attractiveness evaluation of faces was influenced by the presence of hair, it was unclear whether the attractiveness of the hairstyle was the source of that influence.
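The one-tailed *p* values for these correlations follow from the usual conversion of *r* to a *t* statistic with n − 2 degrees of freedom. As a consistency check, the sketch below recomputes them from *r* alone, using only the standard library. It integrates the *t*-distribution tail numerically; a statistics package would normally be used instead. The function names are illustrative, not the authors' code. The sketch gives *p* ≈ 0.047 (< 0.05) for the face-to-hair index and *p* ≈ 0.151 for the hair-to-face index.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t), using symmetry (P(T > 0) = 0.5)
    and composite Simpson's rule on [0, t]; `steps` must be even."""
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def r_one_tailed_p(r: float, df: int) -> float:
    """One-tailed p for H1: rho > 0, via t = r * sqrt(df) / sqrt(1 - r^2),
    where df = n - 2."""
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t_sf(t, df)

print(round(r_one_tailed_p(0.631, 6), 3))   # face -> hair (8 faces): 0.047
print(round(r_one_tailed_p(0.275, 14), 3))  # hair -> face (16 hairstyles): 0.151
```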

#### Eye Movement

A Friedman's rank test on DwRH indicated that the dwell-time ratio in the hair area differed between experimental conditions [χ<sup>2</sup>(4) = 39.9, *p* < 0.001; see **Figure 4**]. The test included five conditions:

- face attractiveness evaluation on FO;
- face attractiveness evaluation on face-and-hair composites (FC);
- hair attractiveness evaluation on HO;
- hair attractiveness evaluation on face-and-hair composites (HC);
- overall attractiveness evaluation of face-and-hair composites (TC).

A Wilcoxon signed-rank test comparing DwRH in HO and HC revealed that DwRH in HC [*Mdn* = 0.21, 95% CI (0.14, 0.36)] was significantly lower than that in HO [*Mdn* = 0.38, 95% CI (0.23, 0.48)], *z* = 3.06, *p* < 0.01, *r* = 0.88. This suggests that participants' gaze was automatically drawn to the face when the target hair was presented with a face. In the face evaluation session, participants' gaze stayed mainly in the face area regardless of whether hair was present. DwRH did not differ between FO [*Mdn* = 0.00, 95% CI (0.0005, 0.01)] and FC [*Mdn* = 0.01, 95% CI (0.001, 0.02)], *z* = 1.10, *p* = 0.27, *r* = 0.32. Thus, this effect was observed only in the hair attractiveness evaluation task. The sample size for the eye-movement data was rather small, but all participants in Experiment 1b showed the same tendency in their gaze behavior during hair attractiveness evaluation (**Figure 5**). There are two possible interpretations of this asymmetry. The first is a center-of-gravity effect in gaze behavior, which would in effect decrease DwRH in all cases; the second is the salience of the face relative to the hair. It is not yet clear which interpretation is more appropriate. Our choice of stimuli (frontal views of faces and hairstyles) does not allow these factors to be isolated from each other.
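The reported effect sizes are consistent with r = |z|/√N, with N = 12 participants (3.06/√12 ≈ 0.88; 1.10/√12 ≈ 0.32). The sketch below computes the normal-approximation *z* for the Wilcoxon signed-rank test and then *r*. It applies no tie or continuity correction; the paper does not specify either. The DwRH values are hypothetical, chosen so that all 12 participants look less at the hair in HC than in HO, as the text reports for the hair task. With every difference having the same sign, this approximation gives z ≈ 3.06, matching the reported value.

```python
from math import sqrt

def wilcoxon_z(x, y):
    """Normal-approximation z for the Wilcoxon signed-rank test on paired
    data. Zero differences are dropped and tied |differences| receive
    average ranks; no tie or continuity correction is applied.
    Returns (z, n_used)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda idx: abs(d[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mu) / sigma, n

def effect_size_r(z, n):
    """r = |z| / sqrt(N), with N the number of pairs."""
    return abs(z) / sqrt(n)

# Hypothetical DwRH values for 12 participants, all lower in HC than in HO
ho = [0.38, 0.45, 0.30, 0.52, 0.41, 0.36, 0.29, 0.48, 0.33, 0.40, 0.44, 0.35]
hc = [0.21, 0.30, 0.14, 0.36, 0.25, 0.18, 0.15, 0.33, 0.20, 0.22, 0.31, 0.19]
z, n = wilcoxon_z(ho, hc)
print(round(z, 2), round(effect_size_r(z, n), 2))  # 3.06 0.88
```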

# Experiment 2

The results of Experiment 1 suggested a possible asymmetrical relationship between face and hair in the leakage of attractiveness from one to the other. However, the variance in baseline attractiveness of the faces and hairstyles was insufficient to discriminate between different predictions. Moreover, in Experiment 1, the same hairstyles were perceived as less attractive when presented with a face than when presented in isolation, and the same held for the attractiveness evaluation of faces. This might be due to incongruence within the face-and-hair pairs in Experiment 1. Because the stimulus set included faces of several ethnicities, the match between ethnicity and hair color ranged from unnatural to natural. For example, blonde hair was perceived as natural on European faces but might be perceived as artificial on East Asian faces, which might have added noise to the results. Stimulus repetition was another potential source of noise, since each face or hair was shown repeatedly within a session (combined with different hair or faces). The design itself might also have introduced an artifact: randomizing two types of stimuli (e.g., face-only and face-with-hair) within a single session might have produced effects across trials, such as confusion over the task or a cognitive set in participants. To eliminate these factors and detect attractiveness leakage more sensitively, we conducted Experiment 2, in which only European faces were used. In addition, the attractiveness of both hair and face was exaggerated to maximize the leakage effect, and no face or hair was repeated within a session.

## Method

### Participants

Thirty-two adults aged 18–36 years (*M* = 23.3, SD = 4.0, 9 females) were divided into two groups. One group performed a task set investigating attractiveness leakage from face to hair, and the other performed a task set investigating leakage from hair to face. The experimental protocol was approved by the Caltech Committee for the Protection of Human Subjects, and informed consent was obtained from all participants.

### Materials and Stimuli

To create a stimulus set for investigating leakage from face to hair, 30 hairstyles of intermediate attractiveness were selected from a pre-rated set of 128 hairstyles. Ten attractive, 10 intermediate, and 10 less attractive faces were selected from a pre-rated set of 140 European female faces generated using FaceGen Modeller. The hairstyles were divided into three groups of 10. The groups had approximately equal mean attractiveness (according to pre-ratings by a different set of participants) and similar proportions of characteristics such as color, shape, and length. Each group of hairstyles was then combined with one attractiveness level of face images. This produced 10 composites of intermediate hair with attractive faces, 10 of intermediate hair with intermediate faces, and 10 of intermediate hair with less attractive faces. Similarly, 30 hair-and-face composites were prepared for investigating attractiveness leakage from hair to face: three levels of hair attractiveness, always combined with faces of intermediate attractiveness. These combinations were designed to maximize the potential leakage effect. **Figures 6A,B** show the sets of faces used in the experiments.
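The paper does not describe how the 30 intermediate hairstyles were divided into three groups with similar mean attractiveness. One simple possibility is a serpentine (snake-draft) assignment over hairstyles sorted by pre-rating, shown here only as a hypothetical sketch. It balances group means but, unlike the authors' procedure, ignores color, shape, and length. The pre-ratings are randomly generated.

```python
import random
from statistics import mean

def serpentine_split(items, scores, n_groups=3):
    """Hypothetical balancing: sort items by score, then deal them to
    groups in snake order (0, 1, 2, 2, 1, 0, ...), which keeps the
    group means close to one another."""
    ranked = sorted(items, key=lambda it: scores[it], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, it in enumerate(ranked):
        rnd, pos = divmod(i, n_groups)
        g = pos if rnd % 2 == 0 else n_groups - 1 - pos
        groups[g].append(it)
    return groups

rng = random.Random(0)
# Hypothetical pre-ratings (1-7 scale) for 30 intermediate hairstyles
pre = {f"hair{i:02d}": rng.uniform(3.5, 4.5) for i in range(30)}
groups = serpentine_split(list(pre), pre)
for g in groups:
    print(len(g), round(mean(pre[h] for h in g), 3))
```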

### Procedures

The task set for attractiveness leakage from face to hair consisted of three sessions. In the main session (Hmain), face-and-hair composites were shown in a pseudo-random order, pre-determined so that attractiveness levels were distributed evenly throughout the session. (Throughout this paper, "H" indicates that hair attractiveness rating was the main task, and "F" indicates that face attractiveness rating was the main task; see **Figure 7** for the list of conditions.) Two pre-determined pseudo-random orders were generated and randomly assigned to participants. Participants were asked to ignore the face and evaluate the attractiveness of the hair only on a 7-point scale (1: the least attractive, 4: neutral, and 7: the most attractive). In the two control sessions (hair-only session: HHO; face-only session: HFO), the hair or face images shown in the main session were presented alone, without face or hair. Participants rated their attractiveness on the same 7-point scale. For all three sessions, each image was viewed for 0.5 s before the rating. This procedure was meant to give the participants an idea of the possible range of attractiveness. The control session HHO was conducted to confirm that the three hair groups had the same attractiveness level. The other control session, HFO, was conducted to confirm the attractiveness levels of the "to-be-ignored" faces. As shown in **Figure 7**, half of the participants were assigned to the "main-first" group, with the task order Hmain, HHO, and then HFO. The other half were assigned to the "control-first" group, with the task order HFO, Hmain, and then HHO. These two orders were used to examine and balance order effects. Likewise, the task set for attractiveness leakage from hair to face consisted of a main session (Fmain) and two control sessions, FFO and FHO. In Fmain, participants saw composite images and rated the attractiveness of only the face while ignoring the hair. In FFO and FHO, they rated the attractiveness of face-only or hair-only images. As before, there were two session orders ("main-first" and "control-first").

FIGURE 6 | Sets of faces used in F sessions (A) and H sessions (B) of Experiment 2. Only the faces used in Experiment 2 are shown. The main task was face attractiveness rating in F sessions and hair attractiveness rating in H sessions. The list of sessions is shown in Figure 7.
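The exact pseudo-random orders are not reported. One hypothetical way to build a fixed order that spreads the three attractiveness levels evenly over a session is blocked randomization. Every consecutive block of three trials holds one composite from each level, in shuffled order. Two such orders, generated from different seeds, can then be randomly assigned to participants as described in the procedure. Labels, seeds, and participant IDs are illustrative.

```python
import random

def blocked_order(items_by_level, seed):
    """Hypothetical pseudo-random order in which every consecutive block
    contains one item from each attractiveness level (shuffled within
    the block), so the levels are spread evenly across the session."""
    rng = random.Random(seed)
    pools = {lvl: rng.sample(items, len(items)) for lvl, items in items_by_level.items()}
    n_blocks = min(len(pool) for pool in pools.values())
    order = []
    for b in range(n_blocks):
        block = [pool[b] for pool in pools.values()]
        rng.shuffle(block)
        order.extend(block)
    return order

# 10 composites per level of the to-be-ignored face (labels are illustrative)
levels = {lvl: [f"{lvl}_{i}" for i in range(10)]
          for lvl in ("attractive", "intermediate", "less")}
orders = [blocked_order(levels, seed) for seed in (1, 2)]

# Each participant receives one of the two fixed orders at random
rng = random.Random(42)
assignment = {f"P{p:02d}": rng.randrange(2) for p in range(16)}
print(orders[0][:6], assignment["P00"])
```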

Eye movements were recorded using the same equipment and settings as in Experiment 1. Because recording failed for one participant in the "main-first" group of the hair-to-face task set, the eye-movement analysis was based on data from 31 participants.

A *post hoc* questionnaire was completed after the experiment. It checked whether participants followed the instruction to ignore the task-irrelevant face or hair, and whether they noticed any influence from the task-irrelevant stimuli.

#### Analyses

All rating scores were standardized as described for Experiment 1. We then investigated the influence of the attractiveness level of the task-irrelevant hair (or face) on the attractiveness evaluation of faces (or hair). Two-way repeated-measures ANOVAs were performed on the attractiveness ratings in the main and baseline sessions. Stimulus type (layout condition) and the attractiveness level of the task-irrelevant face or hair were within-participant factors. As in Experiment 1, we used *post hoc* tests to examine gender differences in the attractiveness ratings. Two-way mixed ANOVAs were performed on the ratings (for Hmain and Fmain), with the attractiveness of the to-be-ignored face or hair as the within-participant factor and gender as the between-participant factor.
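As an illustration of the repeated-measures ANOVAs used in this section, the sketch below implements the one-way case: a single within-participant factor, with sphericity assumed. The two-way and mixed designs partition the sums of squares further along the same lines. The paper does not name its statistics software, and the ratings below are hypothetical.

```python
from statistics import mean

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[i][j] is the score of participant i in condition j.
    Returns (F, df_effect, df_error, partial_eta_squared)."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual
    df_c, df_e = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_c) / (ss_error / df_e)
    return f, df_c, df_e, ss_cond / (ss_cond + ss_error)

# Hypothetical mean z-scores: rows = participants; columns = attractive,
# intermediate, and less attractive to-be-ignored face
ratings = [
    [0.3, 0.1, -0.2],
    [0.2, 0.0, -0.3],
    [0.1, 0.1, -0.1],
    [0.4, -0.1, -0.4],
]
f, df1, df2, eta = rm_anova_oneway(ratings)
print(f"F({df1},{df2}) = {f:.2f}, partial eta^2 = {eta:.2f}")
```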

In the eye-movement analysis, data from the first trial of each session were excluded, because response times on that trial were longer than on other trials. The dwell-time ratio in the task-irrelevant face (or hair) area was then calculated in the same way as in Experiment 1. In addition, saccade amplitude during the sessions was calculated as another eye-movement index. This index was used to examine whether holistic/analytical processing of information is related to the attractiveness leakage phenomenon.
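Saccade amplitude is commonly taken as the angular distance between a saccade's start and end gaze positions. The sketch below assumes gaze coordinates in screen pixels and a single pixels-per-degree factor. The value used is hypothetical; the real factor depends on monitor size, resolution, and the 57 cm viewing distance. The sketch also drops the first trial of the session, as described in the analysis.

```python
from math import hypot
from statistics import mean

def saccade_amplitude_deg(start, end, px_per_deg):
    """Euclidean distance between start and end gaze points (pixels),
    converted to degrees with one pixels-per-degree factor (a
    simplification that is adequate near the screen center)."""
    return hypot(end[0] - start[0], end[1] - start[1]) / px_per_deg

def mean_amplitude(trials, px_per_deg):
    """trials: per-trial lists of (start, end) saccades, in presentation
    order. The first trial of the session is excluded."""
    amps = [saccade_amplitude_deg(s, e, px_per_deg)
            for trial in trials[1:] for s, e in trial]
    return mean(amps)

PX_PER_DEG = 36.0  # hypothetical calibration value
trials = [
    [((0, 0), (360, 0))],                                     # first trial: excluded
    [((100, 100), (136, 148))],                               # 60 px
    [((200, 200), (200, 272)), ((200, 272), (236, 272))],     # 72 px, 36 px
]
print(round(mean_amplitude(trials, PX_PER_DEG), 3))
```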

# Results and Discussion

#### Attractiveness Ratings

*Attractiveness leakage from face to hair*

A two-way repeated-measures ANOVA was performed on mean attractiveness ratings in Hmain and HHO. Layout condition (Hmain or HHO) and the attractiveness level of the task-irrelevant face shown with the hair (attractive, intermediate, or less attractive) were within-participant factors. There was a significant main effect of task-irrelevant facial attractiveness [*F*(2,30) = 5.57, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.27], indicating attractiveness leakage from face to hair. There was also a significant main effect of layout condition [*F*(1,15) = 7.21, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.33] and a significant interaction between layout condition and facial attractiveness level [*F*(2,30) = 8.91, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.37]. The main effect of layout condition showed that ratings in Hmain [*M* = –0.017, 95% CI (–0.076, 0.042)] were significantly lower than those in HHO [*M* = 0.12, 95% CI (0.053, 0.19); *F*(1,15) = 11.6, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.44, 95% CI of the difference (–0.25, –0.028)]. To interpret the interaction, *post hoc* repeated-measures ANOVAs were performed within Hmain and within HHO separately. In Hmain, the main effect of task-irrelevant facial attractiveness on hair attractiveness rating was significant [*F*(2,30) = 11.0, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.42; see **Figure 8A**].
Bonferroni-corrected *post hoc* comparisons revealed significantly lower hair attractiveness scores in trials with less attractive faces [*M* = –0.26, SEM = 0.052, 95% CI (–0.37, –0.15)] than in trials with attractive faces [*M* = 0.20, SEM = 0.065, 95% CI (0.062, 0.34); *p* < 0.001, 95% CI of the difference (–0.68, –0.24)]. Scores with less attractive faces were also lower than in trials with intermediate faces [*M* = 0.006, SEM = 0.070, 95% CI (–0.14, 0.16); *p* < 0.01, 95% CI of the difference (–0.42, –0.11)]. However, there was no significant difference between trials with attractive faces and those with intermediate faces [*p* = 0.106, 95% CI of the difference (–0.046, 0.43)]. There was no significant effect in HHO [*F*(2,30) = 0.11, *p* = 0.896]. This indicates that the hair groups themselves did not differ in attractiveness, as expected, because we controlled all hair stimuli to be of moderate attractiveness. Thus, we can conclude that the difference in hair ratings in Hmain was largely due to the attractiveness of the task-irrelevant, to-be-ignored face. This result is consistent with the attractiveness leakage phenomenon observed in Experiment 1.
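Bonferroni correction controls the family-wise error rate across the three pairwise comparisons among face-attractiveness levels. Each uncorrected *p* value is multiplied by the number of comparisons, which is equivalent to testing each comparison at α/3. A minimal sketch with hypothetical uncorrected *p* values:

```python
from itertools import combinations

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests m,
    capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

LEVELS = ("attractive", "intermediate", "less attractive")
pairs = list(combinations(LEVELS, 2))  # 3 pairwise comparisons
raw_p = [0.0002, 0.0021, 0.035]        # hypothetical uncorrected p values
for pair, p_adj in zip(pairs, bonferroni(raw_p)):
    print(pair, round(p_adj, 4))
```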

An ANOVA investigating gender differences revealed no significant interaction between participants' gender and the attractiveness of the to-be-ignored face [*F*(2,28) = 0.98, *p* = 0.39, η<sup>2</sup><sub>p</sub> = 0.065].

To further check the leakage from face to hair, we performed a regression analysis. The attractiveness rating of the target hair in Hmain was the dependent variable, and the other ratings (of the hair in HHO and of the face in HFO) were the independent variables. Both independent variables made a significant positive contribution to attractiveness ratings in Hmain. Standardized coefficients were β = 0.59 for HHO (*p* < 0.001) and β = 0.14 for HFO (*p* < 0.001), and the adjusted *R*<sup>2</sup> value for the model was 0.35. The explained variance is modest, and interpretations should be made carefully. Nevertheless, this result may support the ANOVAs performed on mean attractiveness ratings in Hmain and HHO.
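With two predictors, the standardized coefficients of an ordinary least squares model on *z*-scored variables have a closed form in terms of the pairwise correlations:

- β1 = (r_y1 − r_y2·r_12)/(1 − r_12²)
- β2 = (r_y2 − r_y1·r_12)/(1 − r_12²)
- R² = β1·r_y1 + β2·r_y2
- adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1)

The sketch below is illustrative only. It omits coefficient *p* values, the paper does not name its software, and the trial-level ratings are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardized_two_predictor(y, x1, x2):
    """OLS of y on x1 and x2 with all variables z-scored.
    Returns (beta1, beta2, R^2, adjusted R^2)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r12 ** 2
    b1 = (r1 - r2 * r12) / denom
    b2 = (r2 - r1 * r12) / denom
    r_sq = b1 * r1 + b2 * r2
    n, p = len(y), 2
    adj = 1 - (1 - r_sq) * (n - 1) / (n - p - 1)
    return b1, b2, r_sq, adj

# Hypothetical trial-level z-scores
hho = [0.2, -0.5, 1.1, 0.0, -1.2, 0.7, 0.4, -0.3]   # same hair rated alone
hfo = [1.0, -0.2, 0.3, -1.1, 0.5, 0.9, -0.6, 0.1]   # paired face rated alone
hmain = [0.5, -0.4, 0.9, -0.3, -0.8, 0.8, 0.1, -0.2]
b_hho, b_hfo, r2, adj_r2 = standardized_two_predictor(hmain, hho, hfo)
print(f"beta_HHO = {b_hho:.2f}, beta_HFO = {b_hfo:.2f}, adj R^2 = {adj_r2:.2f}")
```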

*Attractiveness leakage from hair to face*

An analogous two-way repeated-measures ANOVA was conducted on mean ratings in Fmain and FFO. The results showed a significant main effect of hair attractiveness [*F*(2,30) = 3.77, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.20], indicating leakage from hair to face. There was no significant main effect of layout condition [*F*(1,15) = 1.34, *p* = 0.26, η<sup>2</sup><sub>p</sub> = 0.082]. *Post hoc* repeated-measures ANOVAs were performed on the ratings in Fmain and in FFO separately. The main effect of task-irrelevant hair attractiveness was only marginally significant in Fmain [*F*(1.41,21.1) = 3.54, *p* = 0.061, η<sup>2</sup><sub>p</sub> = 0.19; degrees of freedom were corrected using Greenhouse–Geisser estimates of sphericity; see **Figure 8B**], and it was not significant in FFO [*F*(2,30) = 2.26, *p* = 0.122, η<sup>2</sup><sub>p</sub> = 0.13].
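A Greenhouse–Geisser correction multiplies both degrees of freedom by an estimate ε of the departure from sphericity, with 1/(k − 1) ≤ ε ≤ 1 for k conditions. For three hair levels, the uncorrected dfs are 2 and 30 (i.e., 16 participants). The reported *F*(1.41,21.1) therefore implies ε ≈ 1.41/2 = 0.705. A sketch of the standard ε estimate from a participants × conditions table follows; the data are randomly generated.

```python
import random
from statistics import mean

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon from a participants x conditions table:
    eps = tr(S*)^2 / ((k - 1) * sum_ij S*_ij^2), where S* is the
    double-centered sample covariance matrix of the k conditions."""
    n, k = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]
    means = [mean(c) for c in cols]
    cov = [[sum((cols[a][i] - means[a]) * (cols[b][i] - means[b]) for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]
    row_m = [mean(r) for r in cov]  # equals column means (cov is symmetric)
    grand = mean(row_m)
    s_star = [[cov[a][b] - row_m[a] - row_m[b] + grand for b in range(k)] for a in range(k)]
    trace = sum(s_star[a][a] for a in range(k))
    sum_sq = sum(v * v for r in s_star for v in r)
    return trace ** 2 / ((k - 1) * sum_sq)

def corrected_dfs(eps, k, n):
    """Scale the uncorrected dfs (k - 1, (k - 1)(n - 1)) by epsilon."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)

rng = random.Random(1)
# Hypothetical ratings: 16 participants x 3 hair-attractiveness levels,
# with unequal variances across levels (so sphericity does not hold)
data = [[rng.gauss(0, 1) + rng.gauss(0, 0.5) * j for j in range(3)] for _ in range(16)]
eps = gg_epsilon(data)
print(round(eps, 3), [round(v, 2) for v in corrected_dfs(eps, 3, 16)])
# Epsilon implied by the reported F(1.41, 21.1)
print(corrected_dfs(0.705, 3, 16))
```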

To interpret the marginal significance of the main effect of task-irrelevant hair attractiveness in Fmain, we investigated whether task order influenced attractiveness leakage from hair to face. We conducted a two-way mixed ANOVA on the ratings in Fmain, with task order as the between-participant factor and task-irrelevant hair attractiveness as the within-participant factor. The results demonstrated a significant interaction between task order and task-irrelevant hair attractiveness [*F*(2,28) = 4.59, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.25], as well as a significant main effect of task-irrelevant hair attractiveness [*F*(2,28) = 4.59, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.25]. We then conducted one-way ANOVAs on face ratings separately within the "main-first" and "control-first" groups, with the attractiveness level of task-irrelevant hair as the factor. The main effect of task-irrelevant hair attractiveness was significant in the "main-first" group [*F*(2,14) = 7.05, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.50] but not in the "control-first" group [*F*(2,14) = 0.005, *p* = 1.00, η<sup>2</sup><sub>p</sub> = 0.001; see **Figure 9**]. These results suggest that the significant main effect of task-irrelevant hair attractiveness in the first ANOVA is mainly due to the "main-first" group, which is free from sequential effects across sessions. The sample sizes for the *post hoc* ANOVAs were rather small. Nevertheless, the results suggest that the influence of task-irrelevant hair on facial attractiveness evaluation depends on whether participants were already familiar with the hairstyles before the main session.

There was no significant difference between male and female participants in the influence of task-irrelevant hair attractiveness on the attractiveness evaluation of faces; that is, the interaction between participant gender and hair attractiveness was not significant [*F*(1.44, 21.6) = 1.84, *p* = 0.19, η<sup>2</sup><sub>p</sub> = 0.11; degrees of freedom corrected using Greenhouse–Geisser estimates of sphericity].

We double-checked these findings using multiple regression analyses in which the attractiveness ratings in Fmain were predicted from the ratings in FFO and FHO. Because the ANOVA results suggested that attractiveness leakage might differ depending on task order, we analyzed the ratings from the control-first and main-first groups separately. The standardized coefficients in the regression models were as follows: main-first group, β = 0.34 for FFO (*p* < 0.001) and β = 0.20 for FHO (*p* < 0.01); control-first group, β = 0.35 for FFO (*p* < 0.001) and β = –0.012 for FHO (*p* = 0.84). Adjusted *R*<sup>2</sup> values for the models were 0.14 (control-first) and 0.17 (main-first). The difference between the models can be interpreted as an effect of task order on how faces are evaluated for attractiveness in the presence of task-irrelevant hair, which suggests that the mechanisms of facial attractiveness perception may differ between the two task orders (control-first and main-first). Although this interpretation should be made with caution (for reasons described in the previous section), the regression results appear to support the findings from the ANOVA performed on the ratings in Fmain.
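The standardized coefficients and adjusted *R*<sup>2</sup> of a two-predictor model can be computed directly from the three pairwise correlations among the outcome and predictors. The sketch below assumes that Fmain ratings are regressed on the FFO and FHO ratings of the same stimuli; it reproduces only the point estimates (the significance tests for each β are not included), and no data from the study are used.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_two_predictor_ols(y, x1, x2):
    """Standardized OLS coefficients and adjusted R^2 for the model y ~ x1 + x2."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    denom = 1 - r_12 ** 2  # zero if the two predictors are perfectly collinear
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    n, p = len(y), 2
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
    return beta1, beta2, adj_r_squared
```

With hypothetical per-trial lists `fmain`, `ffo`, and `fho` for one task-order group, `standardized_two_predictor_ols(fmain, ffo, fho)` would return the standardized β for FFO, the standardized β for FHO, and the adjusted *R*<sup>2</sup>, in that order.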

#### Eye Movement

One-way ANOVAs with the attractiveness level of the task-irrelevant stimulus as a within-subject factor were conducted on the mean dwell-time ratio in the hair area (DwRH) and on the mean saccade amplitude during Hmain. Likewise, a one-way ANOVA was conducted on saccade amplitude during Fmain. For the dwell-time ratio in the face area (DwRF) in Fmain, Friedman's rank test was performed because the scores were not normally distributed.
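Friedman's test ranks the conditions within each participant and compares the rank sums across conditions. The sketch below is a pure-Python illustration rather than the authors' implementation; it assigns average ranks to ties and applies the usual tie correction. With *k* = 3 conditions the statistic has 2 degrees of freedom, and in that case the chi-square upper-tail probability reduces exactly to exp(−χ<sup>2</sup>/2).

```python
import math
from collections import Counter


def average_ranks(row):
    """Rank the values in one participant's row (1 = lowest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = tied_rank
        i = j + 1
    return ranks


def friedman_chi2(data):
    """Tie-corrected Friedman statistic for n participants (rows) x k conditions (columns)."""
    n, k = len(data), len(data[0])
    ranked = [average_ranks(row) for row in data]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    chi2 = 12 / (n * k * (k + 1)) * sum(rs ** 2 for rs in rank_sums) - 3 * n * (k + 1)
    # Tie correction: sum of t^3 - t over every group of t tied values within a row.
    # (Undefined if every row is entirely tied, because the denominator becomes 0.)
    ties = sum(t ** 3 - t for row in data for t in Counter(row).values())
    return chi2 / (1 - ties / (n * (k ** 3 - k)))


def chi2_sf_df2(x):
    """Upper-tail p value of a chi-square variable with exactly 2 df (k = 3 conditions)."""
    return math.exp(-x / 2)
```

As a check against the value reported for DwRF in the next subsection, `chi2_sf_df2(0.32)` is about 0.852, consistent with the reported *p* = 0.85.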

# *Eye movements with attractiveness leakage from face to hair*

The results showed a significant main effect of task-irrelevant face attractiveness on saccade amplitude during Hmain [*F*(2, 28) = 4.11, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.23]. *Post hoc* Bonferroni comparisons revealed significantly smaller saccade amplitudes in trials with attractive faces [*M* = 3.31, SEM = 0.15, 95% CI (2.99, 3.63)] than in trials with intermediate faces [*M* = 3.49, SEM = 0.15, 95% CI (3.17, 3.81); *p* < 0.05, 95% CI of the difference (–0.34, –0.029)], whereas amplitudes in trials with less attractive faces [*M* = 3.41, SEM = 0.15, 95% CI (3.09, 3.73)] did not differ significantly from those in trials with either attractive faces [*p* = 0.468, 95% CI of the difference (–0.29, 0.082)] or intermediate faces [*p* = 0.728, 95% CI of the difference (–0.099, 0.26)]. This result admits two interpretations. First, attractive faces might automatically capture gaze and thereby lead to smaller saccade amplitudes. Second, the presence of an attractive face might have induced holistic processing, which in turn led to longer dwell times on the nose area of the face, as indicated in the literature (Schwarzer et al., 2005). However, the difference between the mean amplitudes is rather small (e.g., 0.18° between attractive and intermediate faces) and thus might not constitute a meaningful difference. Interpretations should also be made carefully because the sample size was rather small. No significant effect was observed in DwRH [*F*(2, 28) = 0.25, *p* = 0.781, η<sup>2</sup><sub>p</sub> = 0.018].
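The Bonferroni procedure used for these pairwise comparisons multiplies each uncorrected *p* value by the number of comparisons (here three, among attractive, intermediate, and less attractive faces) and caps the result at 1. The minimal sketch below illustrates this with hypothetical uncorrected *p* values, and also recomputes the pairwise differences between the reported condition means, including the 0.18° difference between attractive and intermediate faces.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def pairwise_mean_differences(condition_means):
    """Difference (first minus second) for every pair of conditions."""
    return {
        (a, b): condition_means[a] - condition_means[b]
        for a, b in combinations(condition_means, 2)
    }


# Mean saccade amplitudes (degrees) reported above for Hmain.
means = {"attractive": 3.31, "intermediate": 3.49, "less attractive": 3.41}
for pair, diff in pairwise_mean_differences(means).items():
    print(pair, round(diff, 2))

# Hypothetical uncorrected p values for the three comparisons, for illustration only.
print(bonferroni([0.01, 0.2, 0.5]))
```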

# *Eye movements with attractiveness leakage from hair to face*

There was no significant main effect of the attractiveness level of task-irrelevant hair on either saccade amplitude or DwRF [for saccade amplitude, *F*(1.44, 21.7) = 1.26, *p* = 0.30, η<sup>2</sup><sub>p</sub> = 0.077, with degrees of freedom corrected using Greenhouse–Geisser estimates of sphericity; for DwRF, χ<sup>2</sup>(2) = 0.32, *p* = 0.85].

# General Discussion

We examined in two experiments whether the attractiveness of a hairstyle (face) is implicitly affected by an accompanying face (hairstyle). Results from Experiment 1 provided evidence for "attractiveness leakage" from face to hair, but not from hair to face, when the correlation coefficient was used as the leakage index. In Experiment 2, we adjusted the attractiveness levels of the faces and hairstyles to maximize the leakage effect, and manipulated the session order to address a sequential order effect. The results showed significant bidirectional attractiveness leakage between face and hair in the "main-first" session order, and unidirectional leakage from face to hair in the "control-first" session order, which may explain the marginally significant result in Experiment 1 (i.e., no significant leakage from hair to face). In short, attractiveness leakage between face and hair is bidirectional, as long as there is no prior experience with the same stimuli. Our findings are consistent with findings from other stimuli and tasks (Egeth, 1966; Simon and Craft, 1972; Garner, 1974; Watanabe, 1988) in that (i) attractiveness evaluation was influenced by task-irrelevant visual information, and (ii) the influence was to some extent asymmetric in its direction. The asymmetry could plausibly be explained by a visual system that is automatically tuned to faces, but much less so to hairstyles.

Eye movement patterns in Experiment 1 were partly consistent with the asymmetric pattern of attractiveness leakage. DwRH (dwell-time ratio in the hair area) in the hair evaluation task was lower in trials where the hair was shown with a face, whereas DwRH in the face task was not influenced by the presence of hair. This is consistent with our hypothesis that the asymmetry in the leakage effect is due to the automaticity/priority of face processing in our visual system, assuming that overt attention shifts typically reflect covert attention shifts. Another way to interpret this finding takes into account the visual information processing strategy. Holistic processing is known to play an important role in facial attractiveness judgment (Abbas and Duchaine, 2008). As for the relationship between eye movements and visual processing, holistic processors tend to look more at the eye and nose area (Schwarzer et al., 2005). More interfeatural saccades are observed in the configural condition than in the featural condition, and participants who fixated the center of the face tended to perceive it holistically in the condition cued by an intact face, compared with the configural and featural conditions cued by blurred or scrambled faces (Bombari et al., 2009). Thus, the differences in DwRH in Experiment 1 could also be interpreted within the processing strategy framework. However, the influence of the attractiveness level of the task-irrelevant face or hair on gaze behavior during the evaluation of the task-relevant target's attractiveness remains unclear. It is also possible that the use of a head-mounted eye tracker in our experiment influenced participants' gaze behavior during the tasks. Thus, further research is required to determine the direction of causality.

A sequential effect observed in the face evaluation task in Experiment 2 might be interpreted along the same line. The control task in which participants evaluated the hair attractiveness of hair-only stimuli might prime a feature-based perceptual strategy (as in Bombari et al., 2009), thus interfering with a holistic perception in the subsequent main task with the face–hair composites. Another possible interpretation is the predictability of the range of hair attractiveness. A "control-first" task order might have enabled participants to learn the range of task-irrelevant hair attractiveness before proceeding to the main task, and such a cognitive set might have limited the implicit influence from the task-irrelevant stimulus in the subsequent session (Fmain).

In the context of affective states, Schwarz and Clore (1983) reported that participants' evaluations of their general well-being were influenced by the day's weather only when the interviewer had not drawn their attention to the weather. A similar tendency has been found for the mere exposure effect. In a meta-analysis of research on Zajonc's (1968) mere exposure effect, Bornstein (1989) showed that the degree of implicitness/explicitness of stimulus presentation affected the size of the effect: the more implicitly the stimulus was presented, the greater its influence. Jacoby and Whitehouse (1989) explained this phenomenon in terms of perceptual fluency, which is assumed to underlie the mere exposure effect, and proposed that participants misattribute perceptual fluency to liking in a subliminal condition but engage in a correction process in a supraliminal condition (Bornstein and D'Agostino, 1992). Relevant to the current findings, perceptual fluency has been reported to be involved in aesthetic evaluations (Reber et al., 2004). Thus, the sequential effect we observed might be explained either by the predictability of the range of hair attractiveness or by priming of hair attractiveness.

In the *post hoc* questionnaire in Experiment 2, most participants reported that they noticed some influence from the task-irrelevant stimuli (*M* = 3.95, SD = 0.89, on a 5-point scale ranging from 1, "did not notice any influence," to 5, "noticed influence"), even though they tried to follow the instruction to ignore them (*M* = 4.68, SD = 0.59, on a 5-point scale ranging from 1, "did not follow instruction," to 5, "followed instruction"). There were no noticeable differences in their answers depending on the target stimuli (face or hair) or session order. This suggests that, for face attractiveness evaluation in the "control-first" task order, participants noticed the influence of task-irrelevant hair and were able to suppress it, whereas in the other conditions (face attractiveness in the "main-first" task order and hair attractiveness in both task orders), they could not suppress the influence of the task-irrelevant stimuli.

The leakage effects we observed should still be considered partly "implicit" in the following ways: (1) ratings were affected by task-irrelevant stimuli, (2) participants' eye movement behavior indicated that they followed the instruction to ignore the task-irrelevant stimuli, and (3) even when participants were aware of the influence from the other part of the stimulus, they could not entirely cancel it through effort. However, most participants were aware of the influence from the other part (face or hair), and in one particular condition and session order they could suppress it. Thus, we conclude that both explicit and implicit processes contributed to the attractiveness ratings.

As a real-world application, the misattribution of information to irrelevant objects has been studied in relation to advertising. In particular, it is widely known that physically attractive models in advertisements influence consumers' perceptions of the advertisement itself and of the advertised product (Baker and Churchill, 1977). Here, we showed that such misattribution can be observed within a single person's appearance, between facial and hair stimuli, suggesting that such leakage could occur even at perceptual, rather than cognitive or contextual, levels. Facial attractiveness is an important impression factor for women applying facial makeup, and past studies have revealed how facial makeup can change the perceived impression of a face (Graham and Jouhar, 1981). Our findings suggest that how one's hair looks might influence how one's face looks, and vice versa. Further, although we focused on the possible misattribution of attractiveness between face and hair, our findings may open up questions about attractiveness integration in general. Thus, our findings may be relevant to several fields, including cognitive psychology, neuroscience, and behavioral economics (e.g., marketing and consumer research). Also, because the faces used in our experiment were computer-generated realistic images, possible differences between the perception of these faces and that of photographs of real faces should be addressed in future research.

In summary, the evaluation of hair attractiveness was influenced by task-irrelevant face attractiveness regardless of session order, whereas the evaluation of face attractiveness was influenced by task-irrelevant hair only when participants performed the task without influence from the prior task (rating the baseline attractiveness of hairstyles). In other words, leakage from hair to face occurs only in situations where the sequential effect is eliminated, i.e., situations that are more natural and consistent with typical real-world contexts. Gaze behavior was consistent with the results and this interpretation. The asymmetry between the leakage from face to hair and that from hair to face possibly indicates an asymmetry in perceptual and attentional processing between face and hair. Finally, taken together, the results from the *post hoc* questionnaire, eye movements, and behavioral data suggest that both implicit and explicit processes contribute to the leakage effect.

These findings reveal another notable case of influence from task-irrelevant stimuli, shedding new light on the implicit–explicit interplay in attractiveness judgment, especially for face–hair stimuli. As such, they provide a new paradigm for exploring the intricate relationship between perception and aesthetic decisions in various contexts.

In the current study, we investigated behavioral and perceptual aspects of attractiveness leakage from a task-irrelevant object, using face and hair as stimuli. We suggest that future fMRI or EEG studies may reveal the neural mechanisms underlying such integration processes in the assessment of visual attractiveness. Neuroimaging data could complement eye movement data to provide a better understanding of holistic processing. Because the middle fusiform gyrus, as well as the inferior occipital gyrus, has been reported to support a holistic representation of faces (Schiltz and Rossion, 2006), these brain areas would likely be involved in the leakage phenomenon. Another possible account is the misattribution of emotion to a false target. In this case, emotion associated with the unattended face would be automatically misattributed to the evaluation of the target stimulus. Indeed, some studies have reported linear neural responses to attractive faces in reward centers of the brain, such as the orbitofrontal cortex (O'Doherty et al., 2003) and nucleus accumbens (Aharon et al., 2001; Kim et al., 2007), as well as in the ventral occipital region (Chatterjee et al., 2009), even when people were performing an unrelated task, and non-linear responses in the amygdala, with the greatest responses to both the most attractive and the least attractive faces (Winston et al., 2007). This possibility seems consistent with participants' subjective experience that they could not cancel out the influence of unattended objects even when they were aware of it. Thus, applying the behavioral paradigm described in the present study in an MRI scanner would deepen our understanding of the underlying mechanisms.

# Acknowledgments

We thank Vikram S. Chib, Eiko Shimojo, Daw-An Wu, Daniela Mier, and others in the Shimojo psychophysics laboratory for their advice on this research. This research was supported by Kao Corporation, JST.ERATO, JST.CREST, and Tamagawa-Caltech gCOE.


**Conflict of Interest Statement:** This study was partly funded by Kao Corporation, where CS is employed as a researcher. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Saegusa, Intoy and Shimojo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*