# FACE PERCEPTION ACROSS THE LIFE-SPAN

EDITED BY: Bozana Meinhardt-Injac and Andrea Hildebrandt PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-114-2 DOI 10.3389/978-2-88945-114-2

# **FACE PERCEPTION ACROSS THE LIFE-SPAN**

Topic Editors:

**Bozana Meinhardt-Injac,** Johannes Gutenberg-Universität Mainz, Germany

**Andrea Hildebrandt,** Ernst-Moritz-Arndt-Universität Greifswald, Germany

Cover photo by Bozana Meinhardt-Injac

Face perception is a highly evolved visual skill in humans. This complex ability develops across the life-span, rising steeply in infancy, refining across childhood and adolescence, reaching its highest levels in adulthood and declining in old age. As such, the development of face perception comprises multiple skills, including sensory (e.g., mechanisms of holistic, configural and featural perception), cognitive (e.g., memory, processing speed, attentional control), and also emotional and social (e.g., reading and interpreting facial expression) domains. Whereas our understanding of specific functional domains involved in face perception is growing, there is a pressing demand for a multidisciplinary approach toward a more integrated view, describing how face perception ability relates to and develops with other domains of sensory and cognitive functioning. In this research topic we bring together a collection of papers that provide a snapshot of the current state of the art of theorizing and investigating face perception from the perspective of multiple ability domains.

We would like to thank all authors for their valuable contributions that advanced our understanding of face and emotion perception across development.

**Citation:** Meinhardt-Injac, B., Hildebrandt, A., eds. (2017). Face Perception across the Life-Span. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-114-2

# Table of Contents


*Neural systems and hormones mediating attraction to infant and child faces*

Lizhu Luo, Xiaole Ma, Xiaoxiao Zheng, Weihua Zhao, Lei Xu, Benjamin Becker and Keith M. Kendrick

### **Emotion Perception: From labeling unimodal stimuli to evaluating facial expressions**

*Age, gender, and puberty influence the development of facial emotion recognition*

Kate Lawrence, Ruth Campbell and David Skuse

*Aging and emotional expressions: is there a positivity bias during dynamic emotion recognition?*

Alberto Di Domenico, Rocco Palumbo, Nicola Mammarella and Beth Fairfield

### **Mutual influence of emotion and appearance on processing invariant and variant facial information**

Reginald B. Adams Jr., Carlos O. Garrido, Daniel N. Albohn, Ursula Hess and Robert E. Kleck

*Through a glass darkly: facial wrinkles affect our processing of emotion in the elderly*

Maxi Freudenberg, Reginald B. Adams Jr., Robert E. Kleck and Ursula Hess

# Editorial: Face Perception across the Life-Span

#### Bozana Meinhardt-Injac<sup>1</sup> \* † and Andrea Hildebrandt <sup>2</sup> \* †

<sup>1</sup> Department of Psychology, Johannes Gutenberg-Universität Mainz, Mainz, Germany, <sup>2</sup> Department of Psychology, Ernst-Moritz-Arndt-Universität Greifswald, Greifswald, Germany

Keywords: face perception, emotion perception, visual processing, life-span development, individual differences

#### **The Editorial on the Research Topic**

#### **Face Perception across the Life-Span**

Faces convey information that is of great importance for humans as social beings. The ability to process information from faces undergoes significant changes across the life-span (e.g., Germine et al., 2011), and shows considerable individual differences (e.g., Wilhelm et al., 2010). Average developmental trends and changes of individual differences in face perception across the life-span arise from multiple components. These include sensory (e.g., holistic, configural, and feature-based perception), cognitive (e.g., memory, processing speed, attentional control) and emotion-related (e.g., identifying facial expression) processing domains. Because research in recent decades has mainly advanced our understanding of rather isolated functional domains involved in face perception, for the present research topic we called for multi-component approaches toward an integrated view of facial information processing. We anticipated that such an approach may help to better describe how the multifaceted facial processing ability is composed and how its components relate to each other. Thus, we aimed to bring together a collection of papers that provide a snapshot of the current state of the art in developmental research and illustrate current trends in theorizing about and investigating the components of face processing in the context of related abilities. We were open to submissions focusing on average life-span trends, on changes of individual differences, or both.

Nineteen successfully published submissions contributed to this aim. Their findings suggest that faces are a special object category in many respects. In the Editorial, we aim at an integrated view of these contributions. Several papers link findings on different facial information processing abilities, and illuminate their relationships with other cognitive and socio-emotional ability domains in age heterogeneous samples. We will first give an overview of the papers published in the research topic, integrate their findings and derive conclusions for future life-span research on multiple components of face perception.

#### Edited and reviewed by:

Rufin VanRullen, Paul Sabatier University, France

#### \*Correspondence:

Bozana Meinhardt-Injac, meinharb@uni-mainz.de

Andrea Hildebrandt, andrea.hildebrandt@uni-greifswald.de

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

Received: 14 August 2016 Accepted: 22 August 2016 Published: 31 August 2016

#### Citation:

Meinhardt-Injac B and Hildebrandt A (2016) Editorial: Face Perception across the Life-Span. Front. Psychol. 7:1338. doi: 10.3389/fpsyg.2016.01338

### FACE PERCEPTION: WHERE DEVELOPMENT AND AGING MEET

Several regions of the human brain including the amygdala, the superior temporal sulcus (STS) and the fusiform face area (FFA) are tuned to different kinds of facial information (i.e., featural and configural; Golarai et al.). There is evidence on the heritability of face processing behavior (e.g., Wilmer et al., 2010) that implies the neural face-system to be inborn, at least to some extent. This system is responsive to faces or face-like configurations (e.g., up-down asymmetry) already at birth, and becomes more and more tuned to human faces during development as a function of visual experience (Simon and Di Giorgio). Cognitive specialization for processing faces occurs in the first months and years of life and still continues across childhood and adolescence. For example, protracted development may be reflected by an increased sensitivity to second-order configural information that refers to the representation of spatial relations between facial features (e.g., inter-eye distances; Meinhardt-Injac et al.; Joseph et al.). An impaired processing of the second-order configuration in faces is present in Williams syndrome already in infancy, while processing of facial features seems to be unaffected (D'Souza et al.). These findings may be interpreted as reflecting the particular role of adjustment of the human face-system to second-order configurations. In childhood and adolescence, however, there are also some functional commonalities in face and object processing that are typically not observed in adulthood (Joseph et al.; Jüttner et al.). Accordingly, the development of face perception could be understood as a process where domain-general and domain-specific mechanisms dissociate across childhood and adolescence as a result of increasing face-related expertise (e.g., Wang et al., 2016). The development of domain-specific processes can proceed at different rates for different modules of face processing (e.g., Weigelt et al., 2014).

The sensitivity to configural information in faces shows not only protracted development, but also an early decline, starting at about 50 years of age (Meinhardt-Injac et al.; see also Chaby et al., 2001, 2011). In a comprehensive review, Boutet et al. conclude that impairment in the processing of configural information may be one of the major factors of age-related decline in face processing ability. Other possible factors affecting face processing in older age are the decline in basic sensory abilities and faded context recollection. In line with this argument, Olderbak et al. demonstrate that common variance shared by vision, fluid cognitive ability, and immediate and delayed memory predicts some but not all age-related variance in face perception and face memory. Taken together, these studies suggest domain-specific aging of the face processing system that cannot be accounted for by domain-general aging processes (e.g., Hildebrandt et al., 2011).

Not only the age of the perceiver affects face perception; the age of the face also plays an important role in processing facial information. Face stimuli of different ages seem to trigger different perceptual mechanisms, where categorization of older faces depends more strongly on local texture-based information than is the case for young faces (Komes et al.). These face-age effects are salient not only in perceptual mechanisms, but also in memory. Fodarella et al. demonstrate an advantage in naming (i.e., memory) for older faces in older subjects (i.e., Own Age Bias, OAB) in facial-composite construction.

The impact of perceptual expertise with different stimulus domains (e.g., language, non-face visual objects) on higher order cognition has been well documented. Bulf et al. extended available knowledge to the social domain by showing how perceptual expertise with upright versus inverted faces affects rule learning in young infants. On the other hand, top-down influences on face perception are becoming more and more recognized. Luo et al. reviewed literature describing the neural systems and hormones involved in perceiving the cuteness of infant faces. The identified broad neural circuitry, comprising face and emotion processing, as well as reward and attachment related brain regions, demonstrates top-down influences on person perception and, specifically, on the perception of attractiveness.

### EMOTION PERCEPTION: FROM LABELING UNIMODAL STIMULI TO EVALUATING FACIAL EXPRESSIONS

The development of the ability to differentiate facial expressions has been extensively studied in the past. However, normative data on developmental trajectories are surprisingly scarce. Lawrence et al. provide comprehensive cross-sectional data that allow estimating the developmental trajectory of facial emotion recognition between the ages of 6–16 years. A distinctive feature of this study is that a standardized and unitary emotion labeling method was used across the whole age range. Children and adolescents labeled emotions expressed by adults from the Ekman-Friesen Pictures of Facial Affect. These emotion recognition data, controlled for IQ, allow differential comparisons related to basic emotion categories, showing that sadness and anger expression recognition is almost at maturity at mid childhood age (about 6 years), whereas a linear increase characterizes happiness, surprise, disgust, and fear recognition across the observed age range.

Decline of emotion recognition in older age cannot be studied without considering its interplay with cognition and emotion regulation. Research described by Di Domenico et al. using a dynamic facial expression recognition and a subsequent intensity evaluation task suggests a positivity bias during online emotion identification in older as compared with younger adults. This phenomenon may be driven by well-documented emotion regulation priorities in older age. When tested in isolation, older adults consistently prove to be impaired in recognizing emotions in several modalities. This has been observed for both faces and voices. However, research on the use of cross-modal integration for emotion recognition that argues for an advantage in older ages still needs scientific attention. Chaby et al. supply data comparing younger and older adults indicating similar benefit provided by multimodal information in older and younger adults.

In everyday affective communication humans often neutralize, mask and simulate emotional expressions. Thus, evaluating the authenticity of facial expressions is (beyond identification of basic emotions) a crucial ability for mastering social interactions. Dawel et al. describe data suggesting authenticity recognition to be characterized by late maturity. Whereas children were less skilled at identifying genuine smiles as compared with young adults, they performed above chance levels in happiness authenticity recognition. However, children could not differentiate genuine sadness and fear expressions from faked ones. Adults also failed to correctly identify the authenticity of fearful facial expressions.

### MUTUAL INFLUENCE OF EMOTION AND APPEARANCE ON PROCESSING INVARIANT AND VARIANT FACIAL INFORMATION

Neuro-cognitive models of person perception usually differentiate the processing routes of invariant (identity related) and variant (emotion, facial speech, gaze direction) facial information. However, mutual influences between the two routes are by now well recognized. The research topic includes three contributions on this issue. First, every face tends to convey an emotional expression even in a neutral state. These characteristic, so-called baseline expressions of faces influence person perception in adult perceivers. For example, adults perceive faces displaying anger as being more masculine as compared with faces displaying different emotions. Bayet et al. show the anger bias toward male categorization to be present already in children as young as 5–6 years. They also report computational simulations of gender categorization, which together with the developmental data neither refute nor confirm the mechanism behind the male bias associated with anger expressions, but emphasize the contribution of experience-based perceptual inferences and belief-based inferences (stereotypes) to this phenomenon. Second, it is conceivable that facial expressivity leaves long-term marks on faces across the life-span and that these will influence person and affect perception from a given face. Adams et al. describe data supporting this assumption and show that expressive ratings of neutral facial displays predicted self-reported positive affect of elderly women. Third, not only do expressions influence person perception, but facial appearance also has an impact on emotion recognition. Aspects of this phenomenon are illustrated by Freudenberg et al., who showed that misattributions of emotions to elderly faces impair facial emotion processing at several levels of performance.

### THEORETICAL INTEGRATION

Functional models (e.g., Haxby et al., 2000; Young and Bruce, 2011) postulated a hierarchical structure of facial information processing. Beyond identity processing, that is, the processing of invariant facial information, these models allow predictions about how the system deals with variable information provided by faces, including emotional expressions and gaze direction. Socially relevant information that can be further derived from the invariant face structure includes age, gender, and judgments about attractiveness. Early theorizing assumed independent streams of identity versus expression related information. This assumption was recently modified in favor of partial dependence views (e.g., Calder, 2011). However, it is not yet fully clarified how these systems interact. This research topic contributes some further knowledge about this interaction and aims to trigger future developmental research in this area.

### CONCLUSION

The research topic provides evidence on developmental trajectories and aging effects of processing identity and expression related information from faces. The life-span development of these abilities has been studied in its interplay with components of these abilities and a series of higher-order cognitive functions. Three further papers described research that has been dedicated to studying mutual influences of emotion and appearance on processing invariant and variant facial information. While the efficiency of face processing is clearly affected by age, we need extensive research to reveal and to separate effects that are domain-specific from those that are possibly domain-general. Moreover, possible cohort effects not only in cognitive, but also in socio-emotional domains need to be controlled for in future research.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Meinhardt-Injac and Hildebrandt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Distinct representations of configural and part information across multiple face-selective regions of the human brain

#### Golijeh Golarai <sup>1</sup> \*, Dara G. Ghahremani <sup>2</sup> , Jennifer L. Eberhardt <sup>1</sup> and John D. E. Gabrieli <sup>3</sup>

*<sup>1</sup> Department of Psychology, Stanford University, Stanford, CA, USA, <sup>2</sup> Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, USA, <sup>3</sup> Harvard-MIT Division of Health Sciences and Technology (HST) and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Boston, MA, USA*

Several regions of the human brain respond more strongly to faces than to other visual stimuli, such as regions in the amygdala (AMG), superior temporal sulcus (STS), and the fusiform face area (FFA). It is unclear if these brain regions are similar in representing the configuration or natural appearance of face parts. We used functional magnetic resonance imaging of healthy adults who viewed natural or schematic faces with internal parts that were either normally configured or randomly rearranged. Response amplitudes were reduced in the AMG and STS when subjects viewed stimuli whose configuration of parts was digitally rearranged, suggesting that these regions represent the 1st order configuration of face parts. In contrast, response amplitudes in the FFA showed little modulation whether face parts were rearranged or the natural face parts were replaced with lines. Instead, FFA responses were reduced only when both configural and part information were reduced, revealing an interaction between these factors, suggesting distinct representation of 1st order face configuration and parts in the AMG and STS vs. the FFA.

#### Edited by:

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### Reviewed by:

*Anthony Paul Atkinson, Durham University, UK Jane Elizabeth Joseph, Medical University of South Carolina, USA*

#### \*Correspondence:

*Golijeh Golarai ggolarai@stanford.edu*

#### Specialty section:

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

Received: *17 May 2015* Accepted: *23 October 2015* Published: *06 November 2015*

#### Citation:

*Golarai G, Ghahremani DG, Eberhardt JL and Gabrieli JDE (2015) Distinct representations of configural and part information across multiple face-selective regions of the human brain. Front. Psychol. 6:1710. doi: 10.3389/fpsyg.2015.01710*

Keywords: occipito-temporal cortex, FFA, amygdala, STS, face, configuration, parts, holistic representation

## INTRODUCTION

Human faces convey socially relevant information about emotion, intention and identity. Coordinated activity across a network of human brain regions underlies face processing, whereby core regions in this network are thought to be specialized in processing specific aspects of facial information (Haxby et al., 2002; Said et al., 2011). For example, the amygdala (AMG) responds to faces, especially to facial expressions of fear (Adolphs and Spezio, 2006). Face-selective regions along the superior temporal sulcus (STS) are involved in detecting facial movements associated with eye gaze, speech, and expression of emotions and intentions (Puce et al., 1998; Allison et al., 2000; Thompson et al., 2007; Cohen Kadosh et al., 2010; Esterman and Yantis, 2010). And face-selective regions along the fusiform gyrus (FG), collectively known as the fusiform face area (FFA), are implicated in face detection and identity recognition (Kanwisher et al., 1998; Golby et al., 2001; Grill-Spector et al., 2004; Kanwisher and Yovel, 2006). Much research on the face-processing network has focused on elucidating the distinct functional properties of each region, the interactions among these regions, and their common pathways. However, it remains unknown what specific facial cues differentially engage these brain regions in face processing.

Faces share a common set of parts (eyes, nose, etc.) arranged in a typical spatial configuration within the boundaries of the face (also known as the 1st order configuration: nose above the mouth, eyes above the nose), but vary in the appearance of the parts and the fine-grained spatial relations among those parts. Numerous behavioral experiments have shown that both configural and part information in faces contribute to accurate face processing. For example, disruption of the 1st order face configuration by inversion of face stimuli or rearrangement of facial features reduced subjects' performance during tasks involving emotion recognition (McKelvie, 1995; Collishaw and Hole, 2000; Prkachin, 2003; Lobmaier and Mast, 2007; Derntl et al., 2009; Schwaninger et al., 2009) and led to substantial decrements in performance during identity recognition tasks (Tanaka and Farah, 1993). Indeed, there is evidence that FFA responses to faces are based on the whole face (Rossion et al., 2000) and sensitive to subtle changes in the spatial relations among face parts (Rhodes et al., 2009). Thus, one hypothesis suggests that processing of the 1st order configural information in faces may be a common step during performance of various face-related tasks. Moreover, given that the 1st order configuration is a key characteristic among all natural faces, disruption of this information may lead to substantial signal decrements across several face-selective regions, such as the AMG, STS, and FFA. However, other studies suggest that the degree of reliance on configural and part information in faces varies depending on the task and brain region. For example, subjects correctly guessed the expressed emotion in single features, e.g., happiness in a smiling mouth (Leppänen et al., 2007), or direction of gaze in an eye. Likewise, viewing the white of the eyes in fearful vs. neutral faces was sufficient to evoke AMG responses (Whalen et al., 2004).
Thus, single facial features might be sufficient for accurate processing of expressive faces via the AMG or STS (Puce et al., 1998; Adolphs et al., 2005). In contrast, performance during identity recognition undergoes a substantial decrement when healthy adults relied on facial features (Tanaka and Farah, 1993; Schiltz and Rossion, 2006). Moreover, the FFA but not the STS showed sensitivity to subtle changes in the spatial relations among facial features (Rhodes et al., 2009). Indeed, poor face recognition performance in patients with acquired prosopagnosia following injury to the ventral stream is associated with feature-by-feature processing of faces (Busigny and Rossion, 2010; Van Belle et al., 2011; Busigny et al., 2014). Together these findings suggest an alternative hypothesis, namely that configural and part information in faces are differentially represented across brain regions involved in processing of expressive facial signals (e.g., AMG and STS) vs. regions involved in processing of face identity (e.g., FFA). Specifically, the AMG and STS may be more sensitive to the appearance of face parts, whereas the FFA may be relatively more sensitive to configural information. Such differential representations of configural and part information across face-selective regions would suggest the contribution of non-overlapping and perhaps local neural circuits in processing these types of facial information in each region.

Moreover, configural and part information may interact within each region. Indeed, in the macaque inferotemporal (IT) cortex, neural responses to facial features depend on their spatial position within the boundaries of the whole face (Freiwald et al., 2009), suggesting an interaction between part and configural information among face-selective neurons in the IT cortex. However, the relative contribution of configural and part information or the potential interactions among these factors within face-selective regions of AMG, STS, and FFA in humans is not clear.

In humans, electrophysiological studies have shown that disruption of the natural configuration of face parts by arbitrary rearrangement of internal parts within the frame of the face images altered the amplitude and timing of face-specific, temporal cortex responses (i.e., N170) to normal vs. rearranged face stimuli (Bentin et al., 1996; Rossion et al., 1999; Eimer, 2000; Halgren et al., 2000; Sagiv and Bentin, 2001; Liu et al., 2010). However, the regional localization of this signal modulation is not clear, as fMRI studies of configural and part processing have provided conflicting results. For example, early studies found no effect in the response amplitudes of face-selective regions along the FG when the overall face configuration was disrupted (Grill-Spector et al., 1998; Kanwisher et al., 1998; Haxby et al., 1999; Lerner et al., 2001; Joseph et al., 2006; Collins et al., 2012), although more recent studies provide evidence of signal reductions in the FG (Collins et al., 2012) or the FFA (Liu et al., 2010). Specifically, Collins et al. showed signal reduction in response to face stimuli after disruption of 1st order face configuration within the anatomical boundaries of the FG, but no signal modulations were found in the AMG or STS, consistent with the greater sensitivity to configural information within sub-regions of the FG, relative to AMG or STS (Collins et al., 2012). However, it is not clear from this study if the sensitivity to configural information within the anatomical boundaries of the FG overlaps with the face-selective regions of the FFA. Another study reported substantial reductions in the FFA responses to rearrangement of face parts while responses in the STS remained unchanged, also suggesting a unique sensitivity of the FFA to the 1st order configuration of face parts in contrast to a lack of sensitivity in the STS (Liu et al., 2010).
However, in this study response amplitudes of STS to images of natural faces were low and thus the lack of sensitivity in STS to the 1st order configuration may have been the result of low signal to noise ratio in this region. Thus, the relative sensitivity of the AMG, STS, and FFA to the normal configuration of face parts remains unclear.

A related question is whether or not face selective regions of the AMG, STS, and FFA are similar in representing the natural appearance of face parts. Separate studies have shown that all of these regions represent face parts, especially the eye region (Puce et al., 1998, 2003; Allison et al., 2000; Morris et al., 2002; Wheaton et al., 2004; van Belle et al., 2010; Issa and DiCarlo, 2012). However, the relative sensitivity of face-selective regions to the natural appearance of face parts or the potential interaction of configural and featural representations among the AMG, STS, and FFA remains to be determined.

Here we asked if face-selective regions in the AMG, STS, and FFA are equally sensitive to the 1st order configuration and appearance of face parts. We performed fMRI in two experiments while participants viewed images of natural faces, or face images that were digitally transformed to remove the 1st order face configuration by rearrangement of internal face parts (rearranged faces, in Experiments 1 and 2) or to remove the natural appearance of face parts by replacement of natural parts with simple lines (schematic faces, Experiment 2), or both manipulations. We expected that brain regions which represent the overall face configuration would respond more strongly to naturally configured faces than to rearranged faces, and that regions representing the natural appearance of face parts would respond more strongly to faces with natural parts than to schematic faces.

### METHODS

### Participants

Twenty healthy European American adults (8 females) ages 18–35 participated in Experiment 1. Two participants were removed from further analysis due to excessive motion during fMRI (see below). Eight (4 females) of the 18 also participated in Experiment 2. All participants were right-handed with normal or corrected vision and without any past or current neurological or psychiatric conditions, or structural brain abnormalities. Informed consent was obtained according to the requirements of the Panel on Human Participants in Medical Research at Stanford University.

### Stimuli and Pilot Behavioral Test

In Experiment 1, stimuli included 60 gray-scale photographic images for each of the following five categories: natural faces, rearranged faces (digitally rendered by moving the internal face parts to random positions within the normal hairline using Adobe Photoshop), novel objects (abstract sculptures), indoor and outdoor scenes, and textures (scrambled versions of the other categories; **Figure 1A**). In Experiment 2, participants viewed another set of natural and rearranged natural faces, and novel objects as in Experiment 1, as well as 60 schematic faces and 60 rearranged schematic faces (**Figure 1B**). Visual stimuli were not repeated between Experiments 1 and 2. All natural and rearranged-natural faces were of European American males, standardized to show a frontal view of the face above the neck, displaying neutral expressions with no eyeglasses or jewelry, and were placed against a uniform gray background.

Schematic faces consisted of two eyes, a nose and a mouth within the face outline. These face parts were represented by simple lines and ovals (blurred using a Gaussian function in Adobe Photoshop), which did not resemble faces or face parts when presented in rearranged configurations. This was confirmed by a pilot study where 10 participants (not involved in fMRI) saw five samples of the rearranged schematics followed by five correctly configured schematic faces and were asked to identify each picture presented one at a time in response to the question: "What is this?" Rearranged schematics were labeled as faces or face parts in 4 of 50 trials, and correctly configured schematic faces were labeled as faces in 50 of 50 trials. These results demonstrate that the schematic stimuli were perceived as faces only when configured as a face (i.e., they were perceived as faces purely on the basis of the configuration of the internal parts, and the parts alone were not interpreted as either a face or parts of a face).

### FMRI and Behavioral Task

During fMRI, each image category was presented during five pseudo-randomly ordered blocks. Blocks were 14 s long followed by 14 s of fixation background. Stimulus images were presented at 1 s intervals, each for 970 ms, followed by a 30 ms fixation baseline. Each image was presented only once, except for two randomly placed images in each block, which were presented twice successively for a one-back task. Thus, there were two instances of the 1-back task during each block and these were randomly located within each block. Participants were instructed to look at each image and press a button using their right index finger whenever they detected identical images that appeared successively (i.e., a 1-back task). Responses during the 1-back task were collected in 20/20 subjects in Experiment 1 and 7/8 subjects in Experiment 2.
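The block structure described above can be sketched in Python. This is an illustrative reconstruction, not the authors' Matlab/Psychtoolbox code: the function name `make_block` is hypothetical, and the figure of 12 unique images per block is an inference (60 images per category spread over five blocks, plus two repeats, fills fourteen 1 s slots).

```python
import random

def make_block(images, n_repeats=2, rng=None):
    """Return a presentation order for one block with 1-back repeats.

    images: unique image identifiers for this block (assumed 12 per block).
    n_repeats: number of images shown twice in immediate succession.
    """
    rng = rng or random.Random()
    order = list(images)
    rng.shuffle(order)
    # Choose which images are repeated; insert from the back so that
    # earlier insertion positions are not shifted by later ones.
    positions = sorted(rng.sample(range(len(order)), n_repeats), reverse=True)
    for pos in positions:
        order.insert(pos + 1, order[pos])  # duplicate right after the original
    return order

# One 14 s block: 12 unique images + 2 repeats, one image per second.
block = make_block([f"face_{i:02d}" for i in range(12)], rng=random.Random(1))
onsets_s = [float(i) for i in range(len(block))]  # 970 ms image + 30 ms fixation
```

Each adjacent identical pair in `block` marks one expected button press.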

Images were projected onto a mirror mounted on the MRI coil (visual angle ∼14°). Images were presented and responses were recorded via a Macintosh G3 computer using Matlab and the Psychtoolbox extensions (www.psychtoolbox.org). Average response times for each stimulus category were calculated as the group mean of subjects' median time for correct responses during the one-back task.
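The response-time summary described above (median within subject, then mean across subjects) might look like the following minimal sketch. The function name and the input layout are assumptions made for illustration only.

```python
from statistics import mean, median

def group_mean_of_median_rt(rt_by_subject):
    """Group mean of per-subject median response times for one category.

    rt_by_subject: dict mapping a subject id to a list of response times
    (in seconds) for correct 1-back detections. Subjects with no correct
    responses are skipped.
    """
    subject_medians = [median(rts) for rts in rt_by_subject.values() if rts]
    return mean(subject_medians)
```

Using the median within each subject limits the influence of occasional very slow responses before the group average is taken.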

### Scanning

Brain imaging was performed on a 3 Tesla whole-body General Electric Signa MRI scanner (General Electric, Milwaukee, WI) with a quadrature birdcage head coil. Participants used a bite bar (made of dental impression material) to stabilize the head position and reduce motion-related artifacts during the scans. First, a high-resolution 3D Fast SPGR anatomical scan (124 sagittal slices, 0.938 × 0.938 mm, 1.5 mm slice thickness, 256 × 256 image matrix) of the whole brain was obtained. Next, a T2-weighted fast spin echo in-plane structural image with a slice prescription identical to that of the functional scan was acquired. Functional images were obtained using a T2<sup>∗</sup>-sensitive gradient echo spiral-in/out pulse sequence using blood oxygenation level-dependent (BOLD) contrast (Glover and Law, 2001). Full brain volumes were imaged using 22 slices (4 mm thick plus 1 mm skip) oriented parallel to the line connecting the anterior and posterior commissures. Brain volume images were acquired continuously with a repetition time (TR) of 1400 ms, TE = 30 ms, flip angle = 70°, field of view = 240 mm, 3.75 × 3.75 mm in-plane resolution, 64 × 64 image matrix. Data for Experiments 1 and 2 were acquired during separate runs in the same session; each run was approximately 14 min.

### Data Analysis

Data were analyzed using the FSL (5.0.8) toolbox from the Oxford Centre for fMRI of the Brain (www.fmrib.ox.ac.uk/fsl) for group analysis (**Figure 2**) and the Statistical Parametric Mapping (SPM) software package (SPM2, Wellcome Department of Cognitive Neurology) for region of interest (ROI) analyses (**Figures 3**, **4**, Figures S1–S3). The first 10 functional volumes were discarded to allow for T1 equilibration. Functional scans were motion corrected (Jenkinson et al., 2002). As noted above, data from two participants were not used for further analysis due to excessive motion (>2 mm), leaving 18 subjects in Experiment 1, of whom 8 also participated in Experiment 2.

### Voxel-wise Analysis

Voxel-wise fMRI analyses were performed using the FSL (5.0.8) toolbox from the Oxford Centre for fMRI of the Brain (www.fmrib.ox.ac.uk/fsl). After motion correction, all non-brain matter was removed using FSL's brain extraction tool. Data were spatially smoothed using a 5 mm full-width half-maximum Gaussian kernel. Registration was conducted through a three-step procedure, whereby BOLD images were first registered to the in-plane structural image, then to the SPGR high resolution T1 structural image, and finally into standard [Montreal Neurological Institute (MNI)] space (MNI avg152 template), using 12-parameter affine transformations (Jenkinson and Smith, 2001). Registration from SPGR structural images to standard space was further refined using FNIRT nonlinear registration (Andersson et al., 2007a,b). Statistical analyses at the single-subject level were performed in native space, with the statistical maps normalized to standard space prior to higher-level analysis.

Whole-brain statistical analysis was performed using a multistage approach to implement a mixed-effects model treating participants as a random effects variable. Regressors of interest were created by convolving a delta function representing block onset times with a canonical (double-gamma) hemodynamic response function. Six motion parameters were included as covariates of no interest to account for variance associated with residual motion. Two additional metrics of motion were also included as covariates: frame-wise displacement and a combination of the temporal derivative of the time series and root mean squared variance over all voxels (Power et al., 2014). For all analyses, time-series statistical analysis was carried out using FILM (FMRIB's Improved Linear Model) with local autocorrelation correction (Woolrich et al., 2001) after high-pass temporal filtering (Gaussian-weighted LSF straight line fitting, with sigma = 33 s).
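To illustrate how a regressor of interest is built by convolution with a double-gamma response function, here is a minimal pure-Python sketch. It is not FSL's implementation: the gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used defaults assumed here, not values reported in this study, and all function names are hypothetical. With `duration_s=0` each block is modeled as a single event at its onset, as the text describes; passing `duration_s=14.0` gives the boxcar alternative.

```python
import math

TR_S = 1.4  # repetition time from the Scanning section

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1 / 6):
    """Positive response minus a scaled, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - under_ratio * gamma_pdf(t, under_shape)

def block_regressor(onsets_s, n_vols, duration_s=0.0, tr=TR_S, hrf_len_s=32.0):
    """Convolve block events with the HRF, sampled once per volume."""
    # Stimulus function: 1 at every volume covered by an event
    # (always at least the onset volume itself).
    stim = [0.0] * n_vols
    for onset in onsets_s:
        first = int(round(onset / tr))
        last = max(first, int(round((onset + duration_s) / tr)) - 1)
        for v in range(first, min(last, n_vols - 1) + 1):
            stim[v] = 1.0
    hrf = [double_gamma_hrf(k * tr) for k in range(int(hrf_len_s / tr) + 1)]
    # Discrete convolution, truncated to the run length.
    return [
        sum(stim[v - k] * hrf[k] for k in range(min(v, len(hrf) - 1) + 1))
        for v in range(n_vols)
    ]
```

In a full model, one such regressor per stimulus category would enter the design matrix alongside the motion covariates.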

For this group-level analysis, the FMRIB Local Analysis of Mixed Effects (FLAME1) module in FSL was used (Beckmann et al., 2003; Woolrich et al., 2004), and a one-sample t-test was performed at each voxel for each contrast of interest. Z (Gaussianised T) statistic images were thresholded using cluster-corrected statistics with a height threshold of Z > 2.3 (unless otherwise noted) and a cluster probability threshold of p < 0.05, corrected using the theory of Gaussian Random Fields (Worsley et al., 1992), either at whole-brain or within specified masks containing regions of interest. All data were subjected to robust outlier deweighting (Woolrich, 2008). For the contrast natural faces > rearranged natural faces (**Figure 2A**), we restricted the analysis to regions relevant for face processing, including bilateral ventral occipito-temporal cortex, STS, and AMG. A mask consisting of these regions, anatomically defined via the Harvard-Oxford Probabilistic Atlas, was applied to the contrast images prior to group-level statistical inference. We also examined the reverse contrast, rearranged faces > natural faces, without restricting to this mask, using a more exploratory approach (**Figure 2B**, also see Table S1).

Anatomical loci of all activations were verified using a sectional anatomy atlas (Duvernoy and Bourgouin, 1999).

## Functional Region of Interest (ROI) Analyses

#### Independent Analyses

We conducted independent analyses of percent signal change within functionally defined ROIs and generated two separate data sets: (i) defined functional ROIs using Experiment 2 and extracted signals from Experiment 1 (**Figure 3C**, Figure S1); (ii) defined functional ROIs using Experiment 1 and extracted signals from Experiment 2 (**Figure 3D**, Figure S3). None of the stimuli were repeated between Experiments 1 and 2. Both experiments included blocks of natural and rearranged natural faces, but only Experiment 2 included blocks of schematic and rearranged schematic faces.

To define face-selective regions, we used spatially smoothed (6 mm FWHM) functional images in each subject's native space and the contrast of natural faces > novel objects (at p < 0.001 uncorrected, cluster size > 3 voxels), and selected suprathreshold voxels within the anatomical boundaries of the AMG, the posterior superior temporal sulcus (STS) or the FG. These latter activations were centered in the FG and extended medially to the collateral sulcus. When more than one cluster of face activation was evident along the FG, we selected the more extensive activation.

### Constant Size, Peak, and Spherical ROIs

In each subject we selected three neighboring voxels at the peak of face-selectivity based on the highest T-value for the contrast (natural faces > novel objects) in the AMG, STS, and lateral FG in Experiment 1. We also defined two additional concentric spherical ROIs in the lateral FG, one matched to the size of the average FFA volume across all subjects, and another matched to 150% of the average FFA volume. We extracted the percent signal change to face-like stimuli and objects during Experiment 1 from all voxels within these ROIs. Then we calculated the relative selectivity for face-like stimuli as ([f – o]/[f + o]), where "f" is the percent signal change to natural or rearranged faces, and "o" is the percent signal change to novel objects (see **Figure 4**). Thus, these ROIs were all centered at the peak of individually defined face-selective regions, but the specific selection of voxels included in this analysis was not functionally defined and was independent of the signals that we extracted from these voxels.
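The selectivity index defined above is a simple contrast ratio. A minimal sketch follows, with a hypothetical function name; the guard for a zero denominator is an added assumption, since the text does not say how such cases were handled.

```python
def selectivity_index(face_psc, object_psc):
    """Relative selectivity ([f - o] / [f + o]).

    face_psc: percent signal change to natural or rearranged faces (f).
    object_psc: percent signal change to novel objects (o).
    """
    denom = face_psc + object_psc
    if denom == 0:
        raise ValueError("selectivity is undefined when f + o is zero")
    return (face_psc - object_psc) / denom
```

For example, a face response of 0.9% and an object response of 0.3% give an index of about 0.5. The index stays within −1 to 1 only when both responses are non-negative.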

#### Dependent Analyses

For Experiment 1, in one analysis we functionally defined ROIs using Experiment 1 data and extracted signals from the same experiment (see Figure S2).

**Percent BOLD signal change** for each stimulus category was determined by extracting the raw time-course data for each ROI. For each subject, if a given anatomical location showed <3 supra-threshold voxels for the contrast of interest, that ROI was not included in the analysis. Data were then band-pass filtered (high-pass = 0.0052 Hz cut-off; low-pass = SPM's synthetic hemodynamic response function, Gaussian temporal filter at 4 s FWHM cut-off), shifted in time by 6 s to account for the hemodynamic lag, averaged within each stimulus block (14 s), and then across blocks of each category. Individual time series data were converted to percent signal change relative to the mean activation during fixation blocks, and normalized to the mean activation during texture blocks.
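The lag-shifted block averaging and the conversion to percent signal change can be sketched as follows. This is an approximation under stated assumptions, not the authors' SPM2 pipeline: the 6 s lag is rounded to the nearest volume, temporal filtering is omitted, the final normalization to texture blocks is left out because the text does not specify its exact form, and all names are hypothetical.

```python
TR_S = 1.4      # repetition time from the Scanning section
BLOCK_S = 14.0  # block length
LAG_S = 6.0     # hemodynamic shift stated in the text

def block_mean(ts, onset_s, tr=TR_S):
    """Mean ROI signal over one block, shifted by the hemodynamic lag."""
    start = int(round((onset_s + LAG_S) / tr))
    stop = start + int(round(BLOCK_S / tr))
    window = ts[start:stop]
    if not window:
        raise ValueError("block window falls outside the time course")
    return sum(window) / len(window)

def percent_signal_change(ts, stim_onsets_s, fix_onsets_s):
    """Percent change of stimulus blocks relative to fixation blocks."""
    stim = sum(block_mean(ts, o) for o in stim_onsets_s) / len(stim_onsets_s)
    base = sum(block_mean(ts, o) for o in fix_onsets_s) / len(fix_onsets_s)
    return 100.0 * (stim - base) / base
```

Here `ts` is the ROI-averaged raw time course (one value per volume), and onsets are given in seconds from the first retained volume.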

### Statistical Analysis

Differences in the percent signal change between stimulus categories (repeated measure) and ROIs were evaluated using repeated measures analysis of variance (rmANOVA) or paired t-tests. All reported statistics are based on two-tailed tests, unless otherwise noted.
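For reference, the paired t statistic used in these comparisons can be computed as in the sketch below. The function name is hypothetical, and the p-value lookup is omitted because it requires a t-distribution routine outside the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples.

    a, b: per-subject values (e.g., percent signal change to natural and
    rearranged faces in the same ROI), in the same subject order.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

The degrees of freedom equal the number of subjects contributing an ROI minus one, which is why a test on seven subjects is reported as t(6).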

### RESULTS

### Behavioral Results

During fMRI participants (n = 18) performed a one-back task while viewing image categories that were presented in blocks. In Experiment 1, these categories included images of natural faces, natural face images after digital rearrangement of internal parts (rearranged faces), novel objects (abstract sculptures), scenes (outdoor, indoor, buildings), and scrambled images of the other categories (**Figure 1**). In Experiment 2, eight subjects (who also participated in Experiment 1) viewed different images of the same categories as in Experiment 1, in addition to simple line drawings of faces (schematic faces: ovals and lines within a large oval outline, arranged in the 1st order configuration of face parts) and schematic faces with internal parts randomly rearranged (**Figure 1**).

Repeated measures analysis of variance (rmANOVA) on response latencies (repeated measures) during the one-back task across visual stimuli showed a significant effect of visual stimulus category when we included all visual stimuli in Experiment 1, but not when we limited the comparison to face stimuli in a post-hoc analysis in Experiment 1 [**all stimuli**: F(4, 17) = 3.10, P = 0.03; **face stimuli**: F(1, 17) = 0.25, P = 0.63]. There were no category effects in the response latencies during Experiment 2 [**all stimuli**: F(5, 6) = 1.51, P = 0.25; **face stimuli**: F(3, 6) = 0.24, P = 0.65, **Figure 1A**; due to technical issues we did not record the 1-back responses in 1 subject during Experiment 2].

Accuracy in performing the one-back task was high (82–99%) across all stimulus categories, and did not differ significantly when we examined responses to all visual stimuli or only face stimuli in Experiment 1 [**all stimuli**: F(4, 17) = 1.41, P = 0.24; **face stimuli**: F(1, 17) = 0.30, P = 0.60] or in Experiment 2 [**all stimuli**: F(5, 6) = 2.52, P = 0.18; **face stimuli**: F(3,6) = 1.85, P = 0.19, rmANOVA, **Figure 1B**]. These findings suggest that participants paid equal attention to all stimuli during fMRI.

each stimulus type is shown below the corresponding bar graph. (A) Participants' response times during the 1-back task are plotted for Experiment 1 (*n* = 18) or Experiment 2 (*n* = 7) for each category of visual stimuli displayed below the bar graphs. *ns:* Response times to natural vs. rearranged faces were not statistically different. (B) Participants' accuracy in performing the 1-back task during Experiment 1 or Experiment 2 is plotted based on proportion correct (maximum = 1). *ns:* Accuracy in performance of the 1-back task for natural vs. rearranged faces was not statistically different.

### Imaging Results

### Differential fMRI Responses to Natural and Rearranged Faces: Voxel-wise Group Analysis

To determine regions across the brain that respond to the 1st order configural information in faces, we examined the contrast of natural faces > rearranged natural faces. After correcting for multiple comparisons (restricting to AMG, FG, and STS), we found bilateral AMG activation (**Figure 2A**). However, clusters within bilateral STS were only found using uncorrected thresholds (P < 0.01).

To our surprise, we found no activation in the FG or the wider ventral or occipital temporal cortex whether in the corrected or uncorrected group analyses (Table S1A). This lack of activation was not due to non-specific BOLD artifacts as we found a robust activation to the reverse contrast (rearranged natural faces > natural faces, cluster-corrected, P < 0.05, Z > 2.3) along the bilateral FG (**Figure 2B**) as well as several other cortical regions (Table S1A), mostly across occipito-parietal cortex, including the collateral sulcus, parietal and frontal cortices and the precuneus.

*P* < 0.01). (B) The contrast *"rearranged natural faces* > *natural faces"* showed activation along the medial FG and parietal regions (see Table S1). Whole-brain activations were cluster-corrected (cluster-threshold: *P* = 0.05, height threshold: *Z* > 2.3). L, left hemisphere. Color bar indicates z-statistic range.

These findings suggest that responses in the AMG and possibly STS, but not the FFA are sensitive to disruption of 1st order configural information in faces, supporting a differential representation of 1st order face configuration across these ROIs. Alternatively, the weak STS activation and lack of FFA activation may be due to greater between-subject variability in the location and spatial extent of face-selective regions along the length of the STS and FG respectively. We tested these possibilities by evaluating the response profiles of individually defined functional ROIs below.

### Independent Analyses of Percent BOLD Signal Change during Experiment 1 among Individually Defined Functional ROIs

To test the hypothesis that 1st order face configuration is differentially represented across the face-selective regions of AMG, STS, and FFA, we examined the response properties of these regions in an independent analysis. We used an independent experiment (Experiment 2, n = 8) as a localizer to functionally define these regions of interest (ROIs) in each subject's native brain space, based on the contrast of natural faces > novel objects (p < 0.001, see Methods). Next, we measured these regions' response amplitudes during Experiment 1 (**Figure 3B**).

As expected, in all three regions of AMG, STS, and FFA, response amplitudes to natural faces were higher than to objects (Figure S1). However, these ROIs varied in their sensitivity to the 1st order configural information in natural faces (**Figure 3B**). A two-way rmANOVA of response amplitudes to natural vs. rearranged face stimuli across the three face-selective ROIs showed significant main effects of ROI and face-type, and a significant ROI by face-type interaction in the right hemisphere [**right: ROI**: F(2, 18) = 50.47, P < 0.0001, **face type:** F(1, 18) = 48.84, P = 0.0001, **ROI X face-type**: F(2, 18) = 6.1, P = 0.009, **Figure 3B**]. In the left hemisphere, we also found a significant main effect of ROI, indicating variations among ROI responses, but there were no significant effects of face-type [**left: ROI**: F(2, 19) = 4.9, P < 0.02, **face type:** F(1, 19) = 2.25, P = 0.14, **face-type X ROI**: F(2, 19) = 1.83, P = 0.19, rmANOVA, Figure S1].

In a series of post-hoc analyses on the responses of each ROI, we found that rearrangement of face parts resulted in a significant reduction in response amplitudes in the right, but not in the left AMG [**right:** t(6) = 2.64, p = 0.034, **left:** t(7) = 0.97, p = 0.36, paired t-test]. Likewise, there was a significant reduction in response amplitudes in the STS bilaterally [**right:** t(5) = 7.73, p = 0.0001, **left:** t(6) = 2.6, p = 0.03, paired t-test]. This effect was highly consistent in the right hemisphere of all subjects and evident in every right AMG and STS ROI that we tested. In contrast, removal of configural information did not change responses in the FFA in either hemisphere (t < 1, p > 0.3, paired t-test). Together, these data support the hypothesis that the AMG, STS, and FFA differentially represent the 1st order configuration of faces.

FIGURE 3 | Face-selective functional ROIs were defined using data from one experiment and signals were extracted during an independent experiment. (A) Face-selective ROIs in the AMG, posterior STS and FFA were defined by the contrast of *natural faces* > *novel objects, P* < 0.001. Examples of individual t-maps with this contrast are overlaid on coronal slices of high-resolution T1 volume from a representative participant. Functional ROIs are highlighted by a red circle. (B) During Experiment 1, visual stimuli included natural faces with the normal configuration of face parts ("conf +") and face-like stimuli where internal parts were randomly rearranged within the face boundary ("conf –"). During Experiment 2 visual stimuli included face-like images that retained the natural appearance of face parts (images with red outline, "parts +") or face-like schematics (green outline, "parts –"). Each type of stimulus was presented either by retaining the 1st order configuration of internal face parts ("conf +"), or random rearrangement of internal parts ("conf –"). Independent analysis of response amplitudes during Experiment 1 to face-like stimuli in the right hemisphere from face-selective regions of AMG, STS, and FFA. Red lines show response amplitudes to face stimuli that retained the 1st order configural information ("conf +") and stimuli with internal parts randomly rearranged ("conf –"). Error bars show ± SEM. Right AMG: Removal of configural information significantly reduced response amplitudes in the right AMG. \*conf: *p* = 0.03, *n* = 7. Right STS: Removal of configural information significantly reduced response amplitudes in the right STS. \*\*conf: *p* = 0.0001, *n* = 6. Right FFA: Removal of configural information did not reduce response amplitudes in the right FFA in the presence of part information. *n* = 8.
Independent analysis of response amplitudes during Experiment 2 to face-like stimuli in the right hemisphere from face-selective regions of AMG, STS, and FFA. Red lines show response amplitudes to face stimuli that retained natural part information ("part +"). Green lines show response amplitudes to schematic faces ("part –"). Responses to face stimuli are plotted for the subtypes that retained the 1st order configural information ("conf +") and stimuli with internal parts randomly rearranged ("conf –"). Error bars show ± SEM. Right AMG: Removal of configural information significantly reduced response amplitudes in the right AMG in the presence (red line) or absence (green line) of part information. \*conf: *p* = 0.01, *n* = 7. Right STS: Removal of configural information significantly reduced response amplitudes in the right STS in the presence (red line) or absence (green line) of part information. \*\*conf: *p* = 0.0001, *n* = 6. Right FFA: Removal of configural information did not reduce response amplitudes in the right FFA in the presence of part information (red line), but did so in the absence of part information (green line), revealing a significant interaction between factors of part and configural information. †conf X part: *p* = 0.007, *n* = 8.

### Percent BOLD Signal Change among Individually Defined ROIs during Experiment 1 in a Dependent Analysis

We replicated these results in a dependent analysis of AMG, STS, and FFA responses during Experiment 1 (defined functional ROIs and extracted signals from the same data, n = 18, Figure S2). Thus, the lack of modulation to 1st order configural information in the FFA was not a result of variability in FFA localization between experimental runs.

### Percent BOLD Signal Change among Individually Defined ROIs of Constant Size during Experiment 1

Next, we tested the possibility that sensitivity to 1st order face configuration in the FFA may be evident among its more face-selective voxels, and similar to the responses of the AMG and STS. Thus, we selected three adjacent voxels including the peak of face-selectivity in each of the anatomical regions of the AMG, STS, and FFA, and extracted response amplitudes to face and face-like stimuli during Experiment 1 (n = 18, **Figure 4**). We found that response amplitudes to natural faces were significantly higher than responses to rearranged faces around the peak of selectivity in the AMG and STS (p < 0.001, paired t-tests), but not the FFA (p = 0.23).

FIGURE 4 | Measure of selectivity for natural ("Conf +") or rearranged face ("Conf –") stimuli is plotted for three adjacent voxels including the peak of the AMG, STS, and FFA, and two additional concentric ROIs in the FG. Selectivity was calculated based on the difference of % signal change for each type of face stimulus vs. objects (*[face – object]/[face* + *object]*). The ROIs were defined for each subject as three adjacent voxels including the peak selectivity for faces ("*peak"*), a concentric sphere matched in volume to the size of the average FFA across all subjects (*"matched FFA"*) and a sphere that was 50% larger in volume (*"50% larger"*). Response amplitudes to natural faces were significantly higher in the peak voxels of the STS and AMG ( \**P* < 0.001, *n* = 18). There were no significant differences in selectivity for natural vs. rearranged faces at the FFA peak or the sphere matched to FFA size (*ns, p* > 0.25). Selectivity was significantly higher for rearranged than natural faces in the "50% larger" ROI (\*\**P* = 0.048, *n* = 18).

To examine further the lack of sensitivity of FFA responses to rearrangement of face parts, we considered the converse possibility that voxels with lower selectivity for faces within the FFA may show a greater range of responses and more modulation to rearrangement of face parts. Thus, in each subject's FG we also defined two larger concentric spherical ROIs centered at the peak of face selectivity in FG, one matched in volume to the average size of FFA across all subjects and the other 50% larger in volume than the average FFA size. We found no significant difference in the selectivity to images of natural faces vs. rearranged face stimuli in the sphere overlapping the FFA in either hemisphere (p > 0.3, paired t-test, **Figure 4**).

Interestingly, there was a trend toward higher selectivity for rearranged faces in the larger sphere that extended outside the right FFA (**right:** p = 0.05; **left:** p > 0.09, n = 18, paired t-test, **Figure 4**), consistent with the extended activation along the FG to the contrast of [rearranged face > natural face] in the group analysis (**Figure 2B**).

Note that the selection of voxels was based on constant sized ROIs (three voxels in case of the peak ROIs, and based on the group averaged size of the FFA for the concentric spheres), providing an independent analysis of regional response profiles.

Together these findings indicate that in contrast to the responses of STS and AMG, neither the highly face-selective voxels at the peak of the FFA nor the FFA voxels surrounding the peak showed any signal modulation to removal of 1st order face configuration.

### Independent Analyses of Percent BOLD Signal Change during Experiment 2 among Individually Defined Functional ROIs

Next, we tested the possibility that FFA's potential sensitivity to removal of 1st order configural information may be masked by its high amplitude responses to the natural appearance of face parts. Thus, in Experiment 2 we manipulated the appearance of face parts and used schematic faces with internal parts that consisted of simple lines, arranged either in the normal face configuration or randomly rearranged within the boundaries of an oval. Then, we examined the responses of the face-selective ROIs (AMG, STS, and FFA) to four types of face-like stimuli: (i) natural faces, (ii) rearranged-natural faces, (iii) schematic faces with the normal face configuration or (iv) rearranged schematic faces (**Figure 3B**, also see Methods).

In the face-selective region of the AMG, a two-way rmANOVA with factors of configuration and part information on the repeated measures of response amplitudes to face and face-like stimuli showed a significant main effect of configural information and a significant interaction between configural and part information; however, the effect of part information did not reach significance [**right: configural information:** F(1, 6) = 71.62, P < 0.0001, **part information:** F(1, 6) = 0.20, P < 0.67, **configural X part information**: F(1, 6) = 16.99, P < 0.007, **Figure 3B**]. This interaction was due to a trend toward higher amplitude of responses to natural than to schematic faces only when the natural face configuration was preserved [t(6) = 1.9, p = 0.04, one-tailed paired t-test]. In the left AMG signal amplitudes to faces were close to baseline and differences between face-like stimuli did not reach significance (Figure S3). Thus, the right AMG responses were more sensitive to the presence of the 1st order face configuration than to the appearance of those parts.

Among face-selective regions of STS, a two-way rmANOVA with factors of configuration and part information on the repeated measures of response amplitudes to face and face-like stimuli showed a significant effect of configural information [**right STS: configural information:** F(1, 5) = 9.81, P = 0.01, two-way rmANOVA, **Figure 3B**], as rearrangement of internal face parts reduced STS responses [**right STS**: t(5) > 2.62, p < 0.05, paired t-test] regardless of the natural or schematic appearance of face parts. However, there were no effects of part information and no interactions between configural and part information [F(1, 5) < 1.45, P > 0.26, two-way ANOVA]. In the left STS there were similar trends toward an effect of configuration as well as a trend toward an effect of part information (P = 0.1, n = 6, two-way rmANOVA, Figure S3). These data confirm that face-selective regions in the right STS are sensitive to the configuration of internal face parts, but less sensitive to the natural appearance of those parts, analogous to AMG responses.

Distinct from the AMG and STS, a similar two-way rmANOVA on responses in the FFA revealed significant main effects of configuration, part information and an interaction between these factors [**right FFA: configural information:** F(1, 7) = 20.13, P = 0.001, **part information:** F(1, 7) = 4.10, P = 0.05, **configural X part information**: F(1, 7) = 10.36, P = 0.007, rmANOVA, **Figure 3B**]. These effects were due to a significant reduction in the response amplitudes to rearranged schematic faces (i.e., removal of both configural and part information) relative to other face-like stimuli, which preserved either or both types of information [t(7) > 3.71, p < 0.01, paired t-test]. Results were similar in the left FFA (Figure S3). Thus, FFA responses were generally unchanged by rearrangement of internal face parts in naturalistic face stimuli or by removal of the natural appearance of face parts in simple schematics, if these retained the 1st order configuration of internal parts. However, simultaneous rearrangement of internal parts and replacement of the parts with simple lines led to a substantial signal reduction in the FFA (**Figure 3B**), rendering these responses indistinguishable from FFA response amplitudes to objects (see Figure S3C).

### DISCUSSION

We used fMRI to examine brain responses while participants viewed images of natural faces and images of face-like stimuli that were digitally transformed by rearrangement of internal face parts, replacement of natural parts with lines, or both manipulations. We found evidence for different sensitivities to the 1st order face configuration and the natural appearance of face parts across the three face-selective regions of the AMG, STS, and FFA. Specifically, AMG and STS responses were primarily modulated by the presence of the 1st order configuration of internal face parts, and less so by the natural appearance of those parts. In contrast, FFA responses showed surprisingly little modulation by removal of either the 1st order face configuration or the natural appearance of those parts. Instead, FFA responses were substantially diminished when both types of information were removed. These findings reveal differential representations of configural and part information across face-selective regions of the AMG and STS vs. FFA, suggesting distinct neural mechanisms of configural and part processing among these regions.

Several of our findings support the above interpretations of the data. First, participants' performance on the 1-back task during fMRI showed that accuracy and response times were similar for all face and face-like stimuli, indicating that there were no substantial differences in global attention to these stimuli. Second, four different fMRI analyses converged on the same main findings: (i) Voxel-wise group analyses of fMRI signals in Experiment 1 (**Figure 2**) revealed that regions in the AMG represent the 1st order configural information in natural faces, as the AMG responded more to natural than to rearranged faces. A similar, but statistically weaker, activation was also evident in the STS. In contrast, no regions in the FG showed this sensitivity. (ii) Independent analyses of ROI responses—by functionally defining ROIs in one experiment and extracting signals from another experiment within each subject's native brain anatomy—confirmed that the AMG and STS differ from the FFA in representing configural and part information (**Figure 3**). Furthermore, this analysis revealed a unique interaction among these representations, specifically in the FFA. These regional variations in representation of 1st order configural information were consistent in our results from both iterations of independent analyses across the two experiments (using Experiment 1 as localizer and extracting signals from Experiment 2 and vice-versa). (iii) Also consistent were results from analysis of peak responses (in 3 adjacent voxels including the peak) in the AMG, STS, and FFA, and also spherical ROIs in the FFA, which were individually defined, fixed in size and centered at the peak of selectivity in each region and subject (**Figure 4**). The selection of two voxels adjacent to the peak (and the spherical ROIs in the FFA) was agnostic to the functional properties of these voxels. However, this analysis in the FFA showed no evidence of reduction in response to rearranged vs. 
natural faces. Importantly, the lack of response modulation to removal of the 1st order configuration in natural faces even in the vicinity of the peak of the FFA ruled out the possibility that this lack of sensitivity in the FFA is due to signal averaging at its boundary, with regions outside of the FFA. (iv) Finally, a dependent analysis of FFA responses during Experiment 1 (Figure S2) confirmed the lack of FFA modulation by 1st order configural information, ruling out the possibility of confounds related to between-run variability in localization of the FFA. Note that the latter two analyses on data from Experiment 1 had the advantage of higher statistical power due to the larger number of subjects (compared to the independent analyses). Yet, these analyses consistently showed signal modulation to removal of 1st order configural information in the AMG and STS and a lack of this modulation in the FFA, even among its peak face-selective voxels. Together, these findings reveal that AMG and STS are sensitive to both configural and part information, but a distinct response profile was found in the FFA responses, suggesting diverging neural pathways for configural and part processing across these regions during viewing of neutral faces.

### Face Selective Regions of AMG and STS Represent the Typical Face Configuration

The sensitivity of face-selective regions in the AMG and STS to the 1st order configuration of faces may be understood in terms of these regions' functional specialization in extracting specific types of facial information, which are depleted in the rearranged face-like stimuli, namely socially relevant facial information. For example, the AMG is involved in recognition of facial affect, and responds to emotionally salient stimuli (Adolphs and Spezio, 2006). Similarly the STS is associated with speech, eye gaze, and emotional expression (Puce et al., 1998; Allison et al., 2000; Hoffman and Haxby, 2000; Materna et al., 2008) and more generally biological motion (Puce and Perrett, 2003; Grossman et al., 2010). The STS is also implicated in inferences about intentions, beliefs, and feelings of other persons and more generally social perception (Yang et al., 2015). Thus, greater AMG and STS responses to natural faces than to rearranged faces may reflect participants' extensive prior experience with natural faces in social contexts, and the paucity of socially relevant information that is conveyed by the rearranged or simple schematic faces.

Second, there is evidence that the STS and AMG extract information from specific facial features. For example, AMG responses to facial expressions of fear are critically dependent on the appearance of the eyes (Morris et al., 2002; Rutishauser et al., 2015). Interestingly, the white regions of the eyes are sufficient to activate AMG responses (Whalen et al., 2004). Other studies have reported that the AMG (in contrast to the visual cortex) is specifically responsive to the low spatial frequency information in fearful facial expressions (Vuilleumier et al., 2003; Winston et al., 2004). Indeed, the low spatial-frequencies in faces retain a disproportionate amount of configural information while losing mostly local part information. Consistent with AMG representation of configural information, our data highlight that, during viewing of neutral faces, removing the overall configuration of face parts substantially reduced AMG or STS responses, but removing the natural appearance of face parts did not substantially modulate these signals.

The sensitivity of AMG and STS to configural information that we found during viewing of neutral faces does not contradict the significance of facial features during processing of affective or communicative facial signals. One possibility is that reliance on part information in AMG and STS may be greater during processing of expressive faces (compared to our findings during viewing of neutral faces). Another possibility is that configural information ensures the efficient detection of affective information from the relevant face parts (e.g., from the eyes) during observers' typical patterns of eye movements in scanning the internal features of face stimuli. Future studies of eye-movements during viewing of rearranged faces will be useful in determining the significance of 1st order configuration of internal face parts in automatic targeting of observers' gaze upon face parts during free viewing. Likewise, in our study we used neutral faces to define face-selective voxels in the AMG and STS. However, voxel selection criteria based on expressive faces may yield a different spatial spread across the STS and different functional properties. Thus, future experiments using expressive faces will be important in revealing the relative contributions of configural and part information to AMG and STS responses.

### FFA Responses to Naturalistic Face Parts and the Typical Face Configuration

In contrast to the STS and AMG, responses of the FFA were virtually identical when participants viewed natural faces or natural face parts that were randomly rearranged within the face outline, across two experiments and a number of analyses. This lack of modulation was not due to low sensitivity in our measurements, given that the reverse contrast revealed response modulation to these stimuli in nearby regions in the FG (**Figure 2B**). Indeed, our findings are consistent with a number of earlier fMRI studies that found small or no differences in FFA activations when face configuration was manipulated by inversion (Kanwisher et al., 1998; Beauchamp et al., 1999; Joseph et al., 2006), randomly fragmenting face images by up to 16 divisions (Grill-Spector et al., 1998; Lerner et al., 2001), or rearrangement of face parts (Collins et al., 2012). However, in these studies face inversion, fragmentation or rearrangement preserved some information on the spatial relations among the internal face parts, leaving open the possibility that the spatial relations among these parts may be critical in evoking FFA responses. Our results rule out this possibility.

A more recent study by Liu et al. found evidence for signal reductions in response to rearranged faces in the FFA but not the STS (Liu et al., 2010), in apparent contrast to our findings. However, this reduction was reported for a combination of rearranged faces with natural face parts and cartoon like face parts (i.e., internal face parts that were replaced with dark ovals, Figure 3 in Liu et al.). This signal reduction to the combined removal of face configuration and part information is in fact consistent with the reduced FFA responses to rearranged cartoon faces in our data. Based on our data, we hypothesize that the rearranged cartoon like faces, which lacked both the 1st order configuration and natural appearance of face parts, primarily drove Liu et al.'s reported findings. In turn, our data suggest a more complex scenario, and provide evidence for an interaction between 1st order configuration and part information in the FFA.

Our results pose an apparent paradox. Namely, behavioral studies have shown that rearrangement of natural faces slows face detection (Homa et al., 1976; van Santen and Jonides, 1978; Purcell and Stewart, 1988; Rolls et al., 1994) and hampers face recognition (Tanaka and Farah, 1993). Also, small variations in the shape and configuration of face parts across individual identities are readily detected during face recognition and identity discrimination, and failure to detect these small variations is associated with reduced face recognition performance (Le Grand et al., 2003). Furthermore, there is evidence for whole face processing in the right FFA (Rossion et al., 2000) and signal modulation in the FFA in response to subtle variations in the spatial relations among face parts (Rhodes et al., 2009). These data would suggest that the normal face configuration is critical for operation of the neural systems that are involved in face detection and recognition, such as the FFA (Golby et al., 2001; Ishai et al., 2002; Grill-Spector et al., 2004; Winston et al., 2004; Kanwisher and Yovel, 2006) and would specifically predict response reductions in the FFA for rearranged faces, contrary to our findings. Another hypothesis suggests that responsiveness to faces in the FFA depends on the extensive experience that most individuals have with natural faces (Gauthier et al., 2000; McGugin et al., 2014). This notion of "expertise" would also predict reductions in FFA responsiveness to rearranged faces, a category of visual stimuli with which participants had no previous experience. Our results counter these convergent predictions, showing that novel configurations of internal face parts were just as effective in activating the FFA as were natural faces.

Why were FFA responses reduced by the rearrangement of schematic but not natural face parts? One possibility is that the variability and salience of rearranged natural faces lead to higher activations among face-selective regions, compensating for any signal reduction due to loss of configural information. Indeed, the higher variability in the configuration of internal face parts might reduce the potential for adaptation effects in the FFA. Although our results were consistent when we examined FFA voxels at peak selectivity for faces, or voxels that included a wider range of selectivity across the FFA (**Figure 4**), we cannot rule out a signal reduction in the FFA due to adaptation to the 1st order configuration in natural and schematic faces in our data. Also, the bizarre appearance of the rearranged faces might increase their salience and face-selective regions' response amplitudes to these faces, compared to natural faces. In the case of the FFA, these effects might be sufficient to compensate for any signal reduction due to loss of the 1st order configuration. Testing these possibilities requires a systematic analysis of image similarity and adaptation responses in the FFA across the various face-like stimuli in future studies. However, the contrast between the unchanging response profiles of the FFA to these face-like stimuli compared to the AMG and STS, both of which showed substantial signal reduction to rearranged faces, indicates that the relative contribution of these opposing factors varies across these face-selective regions. These findings support the notion that configural and part information are processed along neural pathways that are distinct for FFA vs. AMG or STS.

A second possibility is that FFA responds to incomplete facial information in an all-or-none manner, perhaps involving pattern-completion mechanisms to compensate for missing facial information. Note that in our pilot behavioral studies, naïve observers categorized the normal schematic faces as faces, but not the rearranged schematics. These observations support the idea that FFA responses parallel the subjective experience of face perception (Hasson et al., 2001; Ishai et al., 2002; Grill-Spector et al., 2004). In our experiments, partial information of natural face parts or their correct configuration were each sufficient to activate the FFA well above the level of nonface objects. This responsiveness to incomplete face information resembles similar effects reported for object selective responses in the lateral occipital complex (Lerner et al., 2004) and may be a general property of the FFA when viewing face-like stimuli in the presence of contextual cues. The significance of these completion mechanisms in FFA's responsiveness to isolated facial information or contextual cues remains to be more systematically determined during face-identification tasks.

A limitation of our study was that we did not vary subjects' task during fMRI and only used face stimuli with a neutral expression. Future experiments that include a wider range of tasks and face stimuli are needed to determine the effect of 1st order configuration and part information during specialized processing of facial emotions, communicative expressions or identity by the AMG, STS, or the FFA, respectively. Also, we focused our ROI analyses on only three brain regions as we were: (i) guided by the results of the group analysis in Experiment 1, (ii) motivated to test the hypothesis that FFA responses are particularly sensitive to prior experience with face and nonface stimuli, and (iii) limited in terms of statistical power for a more comprehensive analysis (due to the small sample size in Experiment 2). Future studies of additional brain regions that are thought to be part of the core or extended face-processing network are needed for a more comprehensive view of how configural and part information in faces are represented across this network.

## CONCLUSION

Face perception is thought to involve the coordinated activity of a distributed neural system in humans that consists of multiple face-selective regions, including the AMG, STS, and FFA. It has been suggested that the AMG and STS represent changeable aspects of a face, extracting socially relevant meaning from faces, and that the FFA mediates the visual analysis of faces, representing their invariant aspects important in face detection and recognition. Our results show that during viewing of neutral faces, the STS and AMG responses are relatively invariant to removal of the natural details of the face as long as the typical face configuration is retained. In contrast, FFA responses are invariant to either removal of the typical face configuration or the natural details of the face parts, but sensitive to simultaneous removal of both types of information. These findings emphasize the distinct representations of the typical face configuration and natural appearance of parts in the AMG and STS vs. FFA, demonstrating each region's sensitivity to different visual information in the face.

### ACKNOWLEDGMENTS

This work was supported by NIH grants 5R21DA15893 & 1R21MH66747 to JG and JE, NIMH training grant MH15157-2 to DG, and a MIND institute training grant to GG. We thank Anders C. Greenwood for helpful input during all phases of this work and Kalanit Grill-Spector for useful discussions in designing the experiment.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01710

### REFERENCES


magnetic resonance imaging study. J. Cogn. Neurosci. 20, 108–119. doi: 10.1162/jocn.2008.20.1.108


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Golarai, Ghahremani, Eberhardt and Gabrieli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Face perception and processing in early infancy: inborn predispositions and developmental changes**

*Francesca Simion <sup>1,2</sup>\* and Elisa Di Giorgio <sup>3</sup>*

*<sup>1</sup> Department of Developmental and Social Psychology, University of Padova, Padova, Italy, <sup>2</sup> Center for Cognitive Neuroscience, University of Padova, Padova, Italy, <sup>3</sup> CIMeC, Center for Mind/Brain Sciences, University of Trento, Trento, Italy*

From birth it is critical for our survival to identify social agents and conspecifics. Among other stimuli, faces provide the required information. The present paper will review the mechanisms subserving *face detection* and *face recognition*, respectively, over development. In addition, the emergence of the functional and neural specialization for face processing as an experience-dependent process will be documented. Overall, the present work highlights the importance of both inborn predispositions and the exposure to certain experiences, shortly after birth, to drive the system to become functionally specialized to process faces in the first months of life.

#### *Edited by:*

*Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany*

#### *Reviewed by:*

*Andrew H. Bell, MRC Cognition and Brain Sciences Unit, UK; Naiqi G. Xiao, University of Toronto, Canada*

#### *\*Correspondence:*

*Francesca Simion, Department of Developmental and Social Psychology, University of Padova, Via Venezia 8, 35135 Padova, Italy francesca.simion@unipd.it*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 28 March 2015 Accepted: 28 June 2015 Published: 09 July 2015*

#### *Citation:*

*Simion F and Di Giorgio E (2015) Face perception and processing in early infancy: inborn predispositions and developmental changes. Front. Psychol. 6:969. doi: 10.3389/fpsyg.2015.00969*

**Keywords: face perception, face processing, early infancy, perceptual narrowing, visual experience**

## **Introduction**

The ability to detect social beings and to discriminate them from inanimate objects is of paramount importance for survival. Among other social cues in the environment, faces are probably the most important to us as humans, since they convey relevant social information, such as identity, age, gender, and emotions. Humans are experts at processing faces, and evidence from behavioral, brain lesion, and neuroimaging studies suggests that, in adults, face processing involves specific face processing strategies (i.e., *functional specialization*, Farah et al., 2000) carried out by dedicated brain areas (i.e., structural or *neural specialization*, Allison et al., 2000; Kanwisher, 2000, 2010). Together, these findings support the hypothesis that the adult brain is equipped with a neural circuitry specialized for preferentially processing faces (Haxby et al., 2002; Haxby and Gobbini, 2011).

With regard to *neural specialization*, according to the models proposed by Haxby (Haxby et al., 2000; Haxby and Gobbini, 2011), face processing in humans recruits a complex and distributed neural system comprising multiple regions. This system is formed by a "core system" and an "extended system" that work in concert. The core system comprises three functionally distinct regions of extrastriate cortex in both hemispheres: the inferior occipital region, which contributes to the early stage of face perception, provides input both to the lateral fusiform gyrus (including the fusiform face area, FFA) for the processing of invariant characteristics of faces, and to the superior temporal sulcus (STS) for the processing of changeable aspects. The authors suggested that, to analyze all the information embedded in a face, it is necessary to postulate reciprocal interconnections between the core system and the extended system, which comprises brain structures responsible for other cognitive functions (i.e., frontal eye fields, intra-parietal sulcus, amygdala). This distributed neural network maps, at a functional level, onto the cognitive model of face processing proposed by Bruce and Young (1986). This model suggested that face processing is divided into two different processes: *face detection*, which implies the capacity to perceive that a certain visual stimulus is a face, and *face recognition*, that is, the capacity to recognize whether a face is familiar (e.g., already seen before) or not and, subsequently, to determine the identity of a specific face.

With regard to *functional specialization*, evidence from studies with adults has shown that faces are special and are processed in a more holistic or configural way than objects (Tanaka and Farah, 1993; Farah et al., 1998; but see also Robbins and McKone, 2007). To recognize faces, we employ different strategies that require processing different kinds of information: the shape of single facial features (i.e., featural information), the spacing among inner facial features (i.e., second-order configural information) and the global structure of the face (i.e., holistic information; Maurer et al., 2002; Piepers and Robbins, 2012). The inversion effect, the composite face effect and the part-whole effect corroborate the notion of specific strategies in face processing as compared to the strategies adopted to process other objects.

The "*face inversion effect*" (FIE) refers to a greater impairment in processing configural information from inverted faces than from other classes of inverted objects (Yin, 1969; see Rossion and Gauthier, 2002, for a review). This effect has been considered the most critical marker of configural face processing in adults, even if some authors hypothesize that the inversion effect is a marker of the adult ability to process and recognize both the configural and featural information embedded in faces. Indeed, there is evidence that inverting a face affects the capacity to process featural as well as configural information (Rhodes et al., 1993; Malcolm et al., 2004; Riesenhuber et al., 2004; Yovel and Kanwisher, 2004).

The "*composite face effect*" refers to the phenomenon by which the recognition of the two halves of different faces is more difficult when they are horizontally aligned compared to when they are misaligned. In the aligned condition only, the two halves create the illusion of a novel face and therefore adults process it holistically. For this reason, this effect is considered a marker of holistic face processing (Young et al., 1987; Hole, 1994; Rossion, 2013), as is "*the part-whole effect*", in which subjects are more accurate in recognizing the identity of a face feature when it is embedded in the whole face (Maurer et al., 2002).

At first glance, the existence of specific brain areas and of specific strategies for face processing fits well with the idea that they are products of natural selection due to their survival value. For this reason, they are hypothesized to be domain-specific and likely innate (McKone et al., 2006; Wilmer et al., 2010; Zhu et al., 2010). Alternatively, as the experience-dependent hypothesis suggests, the existence of regions specialized for face processing might be the result of extensive experience with this category of visual stimuli over the lifetime (Gauthier et al., 1999; Tarr and Gauthier, 2000; Bukach et al., 2006). Within this open debate, a developmental approach becomes critical to answer the question about the origin of face specialization and whether the functional and structural specialization for face processing, found in adults, is present from birth or is the product of a progressive specialization attributable to visual experience.

Some data seem to contradict the hypothesis of a late and progressive specialization for face processing: the available evidence, from both humans and non-human species, demonstrates early predispositions to orient to faces. In effect, 2-day-old newborns, despite their lack of experience, orient preferentially toward faces or face-like configurations rather than to other, equally complex, non-face stimuli (Goren et al., 1975; Morton and Johnson, 1991; Valenza et al., 1996; Macchi Cassia et al., 2004). Newly hatched chicks attend to patterns similar to the head region of their caregivers (Rosa Salva et al., 2011). Similarly, newborn monkeys, without any visual experience with faces, manifest a preference for faces as compared to objects (Sugita, 2008).

In light of the above evidence, the present paper reviews empirical findings on the mechanisms that subserve face preference (i.e., *face detection*) and face recognition at birth, and on the progressive structural and functional specialization of the system for faces during development.

### **General or Specific Mechanisms Underlying Face Preference at Birth?**

Different interpretations have been proposed to account for human newborns' face preference, in terms of either domain-specific or domain-general underlying mechanisms.

Johnson and Morton (1991) proposed a two-process model of face processing, more recently updated (Johnson, 2005; Johnson et al., 2015), which hypothesizes that newborns possess a first, face-specific subcortical mechanism, named *Conspec*, to detect faces, selectively tuned to the geometry of a face, and a second, domain-relevant cortical mechanism, named *Conlearn*, that comes to specialize in face recognition. The subcortical mechanism guides the cortical one to acquire information about faces. In this model, face detection at birth is due to Conspec, the face-sensitive mechanism adapted for perceiving conspecifics (Johnson and Morton, 1991), later defined as a subcortical low-spatial-frequency (LSF) face-specific detector, shaped by evolutionary pressure and active throughout the life span (Tomalski et al., 2009). This subcortical detector would guide the cortical areas that, later during development, will constitute the face network. Specialization of the cortical face circuits would emerge from the interaction between the subcortical mechanism that biases infants' visual attention toward faces and experience with faces. Importantly, a recent neuroimaging study with newborns corroborated the idea that the visual cortex also contributes in part to the development of the face processing system starting from birth (Farroni et al., 2013), supporting the hypothesis that both subcortical and cortical mechanisms are present at birth (Acerra et al., 2002) and interact (Nakano and Nakatani, 2014). According to this model, the domain-specific mechanism supporting face detection allows newborns to orient to faces and, at the same time, biases the cortical circuits that will progressively become specialized for face processing.

The existence of a mechanism specifically devoted to detecting faces in the environment has been questioned by an alternative view (Simion et al., 2001, 2003, 2006; Turati, 2004), which explains newborns' preferences as due to domain-general attentional biases toward some structural properties present in faces as well as in non-face objects. According to this hypothesis, these general attentional biases are not specifically adapted for detecting faces; they likely derive from the functional properties of the immature newborn visual system and are applied in the same manner to faces and non-face stimuli. Indeed, they are *domain-relevant* because they allow newborns to successfully detect and identify faces when embedded among other non-face-like stimuli (Simion et al., 2001). This view is consistent with the notion that newborns' visual system is immature and is sensitive not only to a certain range of spatial frequency, as described by the contrast sensitivity function (CSF; see Acerra et al., 2002 for a computational model), but also to other structural higher-level Gestalt-like properties, as demonstrated by newborns' preference for horizontal versus vertical stripes (Farroni et al., 2000). From this point of view, faces would be preferred because they are a collection of perceptual structural properties that attract newborns' attention. In effect, faces are symmetric along the vertical axis, contain areas of high contrast (i.e., the eyes) and have more elements in their upper part, arranged congruently with the external outline. In addition, faces are three-dimensional, move and, importantly, manifest a behavior contingent upon the baby's activities. All these characteristics are present simultaneously in faces and render them probably the most interesting stimulus experienced by newborns.

Data from our lab showed that at least two non-specific structural properties can elicit newborns' preference both for faces (Turati et al., 2002; Macchi Cassia et al., 2004) and geometric configurations (Macchi Cassia et al., 2002, 2008; Simion et al., 2002). A first property, termed *up-down asymmetry* (or top-heaviness), "*is defined by the presence of higher stimulus density in the upper than in the lower part of the configuration*" (Simion et al., 2002; Turati et al., 2002; Macchi Cassia et al., 2004). In effect, newborns preferred geometrical stimuli with more elements in the upper part when contrasted with the upside-down version of them (Simion et al., 2002, see **Figure 1A**). The same results were replicated with face-like stimuli (Turati et al., 2002, see **Figure 1B**) and with real faces (Macchi Cassia et al., 2004, see **Figure 1C**) in which the geometry of the face was disrupted. These data suggest that this up-down asymmetry, rather than the face geometry or face structure, is the critical factor in eliciting newborns' preference. This visual preference for configurations with more elements in the upper part may originate from an upper-field advantage in visual sensitivity that renders those configurations more easily detectable (Simion et al., 2002). This sensitivity is attributed to the fact that a major role in visual exploration of the upper visual field is played by the superior colliculus (Sprague et al., 1973), which is thought to predominantly affect newborns' visual behavior (Atkinson et al., 1992).
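To make the notion of up-down asymmetry concrete, the sketch below computes a simple index from the positions of the inner elements of a configuration. This index is an illustrative assumption and not the measure used in the studies cited above: the function name `up_down_asymmetry`, the normalized coordinate system (y = 0 at the bottom of the outline, y = 1 at the top) and the example coordinates are all hypothetical.

```python
def up_down_asymmetry(elements, midline=0.5):
    """Illustrative top-heaviness index in the range [-1, 1].

    elements: (x, y) positions of inner elements, with y normalized so that
    0 is the bottom and 1 is the top of the configuration (an assumption).
    Positive values mean more elements in the upper half.
    """
    upper = sum(1 for _, y in elements if y > midline)
    lower = sum(1 for _, y in elements if y < midline)
    total = upper + lower
    return 0.0 if total == 0 else (upper - lower) / total


# Hypothetical face-like pattern: two "eyes" above, one "mouth" below.
face_like = [(0.3, 0.7), (0.7, 0.7), (0.5, 0.3)]
# The same pattern turned upside down.
inverted = [(x, 1 - y) for x, y in face_like]

print(up_down_asymmetry(face_like), up_down_asymmetry(inverted))
```

Under this toy definition the upright pattern is top-heavy and its inverted version is bottom-heavy by the same amount, which mirrors the upright versus upside-down contrasts described in the text.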

The second non-specific property is *congruency*, "*i.e., presence of a congruent or corresponding relationship between the shape and orientation of the contour and the spatial disposition of the inner features*" (Macchi Cassia et al., 2008). Faces are congruent because they display a greater number of features (the eyes) in the wider, upper portion of the face outline and only one feature (the mouth) in the narrower part (see **Figure 1D**). When congruent and non-congruent non-face geometrical configurations were compared (using both triangles and trapezoids, see **Figures 1E,F**), newborns looked longer at the congruent pattern (Macchi Cassia et al., 2008). There are at least two reasons why newborns may prefer congruent over non-congruent configurations. First, in line with some Gestalt-like principles, congruent visual stimuli are easily processed by the visual system from birth because they fit well with the criteria of figural simplicity and regularity (Palmer, 1991). Second, newborns perceive and detect configural information embedded in hierarchical stimuli better than featural information (Macchi Cassia et al., 2002; Simion and Leo, 2010).

Overall, since newborns' visual behavior was affected by the up-down arrangement of the inner features and by congruency, independently of whether such an arrangement was face-like or not, these findings support the hypothesis that general, non-face-specific attentional biases toward structural properties of the stimuli exist. Their presence at birth seems sufficient to cause the human face to be a frequent focus of newborns' visual attention, allowing the gradual development of a face representation and of a face processing system.

Intriguingly, top-heaviness and congruency are two important structural properties that play a role in shaping the response of adults' face-sensitive areas, echoing the findings obtained with newborns. An fMRI study showed that adults' face cortical areas (e.g., FFA) are tuned to patterns with more elements in the upper part, even if these patterns are not perceived as face-like stimuli (Caldara et al., 2006). This result corroborates the idea that up-down asymmetry is crucial in eliciting face preference not only at birth, but also in adulthood. In addition, the same structural properties (i.e., top-heaviness and congruency) modulate the latency and the amplitude of early face-sensitive ERP components in adults (e.g., P1 and N170). Crucially, the violation of both these structural properties modulates ERP components more than the violation of each property alone, demonstrating that they produce an additive effect in face preference (Macchi Cassia et al., 2006).

The existence of general attentional biases toward perceptual and structural properties to explain face preference is in line with a recent theoretical model, the Binocular Correlation Model (BCM), which proposes to explain the neonatal face bias as a result of a visual filtering mechanism related to the limited binocular integration possessed by newborns (Wilkinson et al., 2014). In that study, face-like and non-face-like stimuli were presented in the center of a robot's visual field and the salience value was recorded. A binocular model was compared to a monocular model. Results obtained from the binocular model resembled the face preference found in newborns. Although the BCM was able to generate a face preference, the authors suggest that "*it is not based on an innate internal representation of facial structure. It relies on generic binocular circuitry, not a specialist module*" (Wilkinson et al., 2014). In addition, the same model can explain both face preference at birth and other visual preferences that have nothing to do with faces. For example, the BCM suggests that horizontally oriented patterns are preferred because they generate more binocular correlation than vertical ones. The same account applies to stimuli with more elements in the upper part. Although further empirical studies are needed to confirm these hypotheses, the BCM seems to be a promising computational model for investigating the mechanisms underlying face preference at birth.
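The claim that horizontal patterns generate more binocular correlation than vertical ones can be illustrated with a toy computation. The sketch below is not the BCM of Wilkinson et al. (2014). It rests on a strong simplifying assumption, namely that the two eyes receive the same image offset by a purely horizontal shift with wrap-around, and the function names `interocular_correlation` and `stripes`, the image size, the bar width and the disparity value are all hypothetical.

```python
import numpy as np

def interocular_correlation(image, disparity_px):
    """Pearson correlation between a 'left-eye' image and a horizontally shifted 'right-eye' copy.

    Assumption: the right-eye view is the left-eye view rolled along the
    horizontal axis (with wrap-around); real binocular geometry is richer.
    """
    left = np.asarray(image, dtype=float)
    right = np.roll(left, disparity_px, axis=1)
    return float(np.corrcoef(left.ravel(), right.ravel())[0, 1])

def stripes(size, bar_width, horizontal):
    """Square-wave grating of shape (size, size) with bars of the given width."""
    bars = (np.arange(size) // bar_width) % 2 == 0
    if horizontal:
        # Luminance varies from row to row and is constant along each row.
        return np.repeat(bars[:, None], size, axis=1).astype(float)
    # Luminance varies from column to column and is constant along each column.
    return np.repeat(bars[None, :], size, axis=0).astype(float)

horizontal_corr = interocular_correlation(stripes(16, 4, horizontal=True), disparity_px=2)
vertical_corr = interocular_correlation(stripes(16, 4, horizontal=False), disparity_px=2)
print(horizontal_corr, vertical_corr)
```

Because a horizontal offset leaves horizontal stripes unchanged but displaces vertical stripes, the correlation between the two views is higher for the horizontal grating in this toy setting.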

The hypothesis that general biases explain face preference at birth has been challenged by a study highlighting that the contrast polarity of the stimuli is decisive in inducing such a preference (Farroni et al., 2005). The rationale was that, if the up-down asymmetry is crucial in determining face preference, then the contrast polarity of the elements should not interfere (i.e., face-sensitive view, see Johnson et al., 2015, for a discussion). Results demonstrated that in the negative polarity condition the preference for upright face-like stimuli disappears (see Rosa Salva et al., 2012, for a similar result in newly hatched chicks). Consistent with that, the authors proposed that the newborns' visual system has been shaped, by natural selection, to prefer faces in the environment under natural illumination conditions, in which light comes from above rather than from below.

Unfortunately, the absence of significant results (i.e., null results) under the negative contrast polarity condition between upright and inverted face-like patterns cannot be considered conclusive, because alternative explanations are possible. First, a large number of stimulus variables, as the sensory hypothesis proposed, can affect newborns' preferences. In particular, at birth, the attractiveness of a pattern is affected by the amplitude spectra (i.e., contrast, luminosity, spatial frequency) as well as by the phase spectra (i.e., structural properties; Slater et al., 1985). The reversal of contrast polarity can be described, in the spatial frequency domain, as a 180° shift in the phase angles of all spatial frequencies, and this shift could interfere with newborns' preferences for faces (Mondloch et al., 1999) and for both faces and objects in 6-week-old infants (Dannemiller and Stephens, 1988). Second, the phase spectra of certain patterns cannot be arbitrarily shifted without destroying the discriminability of the pattern (Kemp et al., 1996), since a change in polarity might affect the process of figure-ground segregation: black regions are more often perceived as figures. Future studies, which verify either whether the contrast polarity effect is limited to face-like patterns or whether the change in polarity decreases the discriminability of stimuli other than faces, are required to test the role of contrast polarity in determining newborns' preferences. Finally, a mechanism underlying face preference that is more face-related than previously supposed cannot explain the data demonstrating that an upright stimulus with three blobs randomly located in the upper part is always preferred over a face-like pattern (Turati et al., 2002) and that a scrambled face with more elements in the upper part is always preferred to a real face (Macchi Cassia et al., 2004, see **Figure 1G**).
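The description of contrast reversal as a 180° phase shift can be checked numerically. The following sketch is an illustration only: the function name `polarity_reversal_spectra` and the random test image are assumptions, and reversal is modeled as mirroring luminance around the mid-gray level. It shows that, apart from the mean-luminance (DC) term, reversal leaves the amplitude spectrum unchanged and shifts the phase of every spatial frequency by π radians.

```python
import numpy as np

def polarity_reversal_spectra(image):
    """Compare the Fourier spectra of an image and its contrast-reversed version.

    Returns the largest amplitude difference and the absolute phase shift
    (in radians) at every non-DC frequency with non-negligible amplitude.
    """
    image = np.asarray(image, dtype=float)
    # Contrast reversal: mirror luminance around the mid-gray level.
    reversed_image = image.max() + image.min() - image
    f_orig = np.fft.fft2(image)
    f_rev = np.fft.fft2(reversed_image)
    mask = np.abs(f_orig) > 1e-9  # phase is undefined where the amplitude is ~0
    mask[0, 0] = False            # the DC term only encodes mean luminance
    amp_diff = np.abs(np.abs(f_orig) - np.abs(f_rev))[mask].max()
    phase_shift = np.abs(np.angle(f_rev[mask] / f_orig[mask]))
    return float(amp_diff), phase_shift

# Arbitrary test image (an assumption, not a stimulus from the cited studies).
rng = np.random.default_rng(0)
test_image = rng.random((16, 16))
amp_diff, phase_shift = polarity_reversal_spectra(test_image)
print(amp_diff, phase_shift.min(), phase_shift.max())
```

In other words, a polarity-reversed stimulus is matched to the original in amplitude spectrum and differs only in phase, which is why an effect of polarity cannot be attributed to amplitude-based variables such as contrast energy.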

Consequently, if one takes into account all these considerations, it clearly appears that we are still left with two possible interpretations of face preferences at birth and that we are far from a conclusive answer to the question of whether general domain-relevant attentional biases or a specific LSF face detector explain face preference at birth. What we do know is that these attentional biases cannot explain face preferences later during development, because 3-month-old infants prefer to look at faces even when they are contrasted with scrambled face configurations with more elements in the upper part (Turati et al., 2002), corroborating the idea that 3 months of visual experience are sufficient to change and tune the face representation.

### **Developmental Changes in Face Representation**

Behavioral evidence supports the idea that face representation changes over development and that experience allows infants to build up a specific representation of experienced faces and to categorize faces within a face space (Valentine, 1991; Valentine et al., 2015).

The face space is "*defined as a multidimensional space, in which each individual face is coded as a point in a continuum where the average face lies at the center of the space*" (Valentine, 1991). This face space narrows over time as a function of experience, so that infants become expert in processing the most experienced faces as proposed by the *perceptual narrowing* view (Nelson, 2001, 2003). According to this view, infants begin life with general mechanisms dedicated to processing faces as well as other stimuli and subsequently become "tuned" to the experienced human faces, as a direct consequence of the exposure to this kind of visual stimuli present in the species-specific environment during the first months (Scott et al., 2007).
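The face-space definition quoted above lends itself to a minimal numerical sketch. The code below is an assumption-laden illustration, not a model from Valentine (1991): the function name `build_face_space`, the two hypothetical dimensions and the three toy faces are invented for the example, and Euclidean distance from the average face is used as a simple stand-in for distinctiveness.

```python
import numpy as np

def build_face_space(face_vectors):
    """Center a set of faces on their average and measure each face's distance from it.

    face_vectors: array of shape (n_faces, n_dimensions); each row is one face
    described by hypothetical measurements (e.g., eye spacing, nose length).
    """
    faces = np.asarray(face_vectors, dtype=float)
    average_face = faces.mean(axis=0)          # the center of the space
    centered = faces - average_face            # each face as a point relative to the center
    distinctiveness = np.linalg.norm(centered, axis=1)
    return average_face, centered, distinctiveness

# Three toy faces described on two made-up dimensions.
toy_faces = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
average_face, centered, distinctiveness = build_face_space(toy_faces)
print(average_face, distinctiveness)
```

In this sketch the average face sits at the origin of the centered space, and faces far from it count as more distinctive. Perceptual narrowing is not modeled here.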

Data from both human and non-human infants corroborate the hypothesis of the existence of a broad face perception system at birth. A large proportion of the literature on face perception at birth in both non-humans (Sugita, 2008) and humans (Kelly et al., 2005; Quinn et al., 2008) reveals clear evidence of a basic, coarsely tuned face-perception system in non-human primates as well as in humans that becomes tuned to the experienced faces. For example, newborns do not show any visual preference for faces from their own or other ethnic groups (Kelly et al., 2005); in contrast, such a preference is present a few months later (Kelly et al., 2005; Anzures et al., 2013). In the same vein, newborns do not respond differentially to the gender of faces (Quinn et al., 2008), but 3 months of experience are enough to elicit a differential response (Quinn et al., 2002). Furthermore, newborns do not prefer a human face when contrasted with a monkey face equated for all the low-level perceptual properties (i.e., high contrast areas or spatial frequencies; Di Giorgio et al., 2012; but see Heron-Delaney et al., 2011). This preference appears 3 months later (Heron-Delaney et al., 2011; Di Giorgio et al., 2013; Dupierrix et al., 2014).

Interestingly, Di Giorgio et al. (2012) also call into question the role of the eyes in triggering newborns' attention toward faces, since the contrast between the sclera and the iris, which is present in human eyes but not in non-human ones, does not determine any preference. Recently, Dupierrix et al. (2014) confirmed this result. Newborns who were simultaneously presented with a pair of non-human primate faces differing only in the eyes did not manifest any preference between a face with the original non-human primate eyes and the same face in which the eyes were replaced by human eyes. These results seem to contradict the idea that face preference reflects an attraction toward human eyes (Baron-Cohen, 1994; Farroni et al., 2005) and seem to contrast with previous studies showing that newborns preferred to look at faces with open eyes and with a direct gaze (Batki et al., 2000; Farroni et al., 2002, 2006). However, all these data need to be interpreted with caution because the stimuli were never matched for low-level variables. Consequently, all these preferences might be attributed to a difference in low-level variables, such as a difference in spatial frequency components.

An alternative explanation might be related to the processing of the overall configuration of the face. Possibly, the processing of the eyes might be limited, since newborns might pay more attention to the external parts of faces (Pascalis et al., 1995), especially when eyes are embedded in a non-human primate face with a salient external contour emphasized by fur. However, this explanation is unlikely because newborns attend equally to internal and external features of faces (Turati et al., 2006).

A more convincing explanation would be that newborns process faces holistically and that sensitivity to human eyes per se is not inborn but emerges later through extensive experience with conspecifics. This idea is supported by recent eye-tracking studies in which 3-month-old infants looked longer at the eyes of a human face when it was contrasted with a monkey face (Di Giorgio et al., 2013; Dupierrix et al., 2014). So, it appears that 3 months of exposure to human eyes is sufficient to drive infants' attention toward the more experienced human eyes (Dupierrix et al., 2014).

Overall, data are in line with the hypothesis that the face-perception system becomes tuned to human faces and human eyes during development as a function of visual experience (Nelson, 2001; Pascalis et al., 2002; Pascalis and Kelly, 2009; Di Giorgio et al., 2013; Dupierrix et al., 2014).

The presence of a perceptual narrowing process toward the most experienced faces is supported by eye-tracking studies that showed different patterns of exploration for different categories of faces (Liu et al., 2011; Di Giorgio et al., 2013). For instance, the visual scanning paths of 4- to 9-month-old Asian infants presented with same- and other-race faces differ as a function of the nature of the stimulus, demonstrating developmental changes in face processing strategies. In particular, with age, infants tend to look longer at the internal features embedded in same-race faces but not in other-race faces (Liu et al., 2011).

Altogether, these data corroborate, once more, the idea that newborns' visual attention is mainly triggered by the low-level perceptual properties of the visual stimuli, whereas, starting from 3 months of life, visual preferences become specific for faces and, in particular, for the more experienced faces, such as human faces or faces that belong to the infants' ethnic group.

From a neural point of view, the perceptual narrowing process consists of a progressive and gradual specialization and localization of the cortical brain areas involved in face processing (Johnson, 2000). Indeed, at birth these circuits respond to a wide range of visual stimuli but later, during development and thanks to visual experience, they become more and more selective to only some categories of visual stimuli, such as experienced faces, resulting in a more localized and specialized neural response. For instance, positron emission tomography (PET) studies suggested that the first signs of cortical specialization for faces are present by 2–3 months of age (Tzourio-Mazoyer et al., 2002). Moreover, ERP studies demonstrated that, at a neural level, 6-month-old infants differentiate faces from objects (de Haan and Nelson, 1999) and, interestingly, also human faces from monkey faces (de Haan et al., 2003). Further, near-infrared spectroscopy (NIRS) studies have provided new evidence of cortical regions in the infant brain already devoted to face processing (see Otsuka, 2014, for a review).

Overall, these findings are in line with the idea that the face-perception system is the product of a conjunction of evolutionary inheritance and of an experience-dependent process of learning after birth (de Schonen, 1989; Sai, 2005; Pascalis and Kelly, 2009; Slater et al., 2010) and that the system becomes finely tuned by visual experience in a species-specific environment. This specialization corresponds to an improvement in the discrimination of stimuli predominant in the environment and to a decline in the discrimination of stimuli not frequently experienced in the environment. What is currently less understood is the nature of the mechanisms responsible for perceptual narrowing and for maintenance or facilitation with experience. One possible neural mechanism that guides perceptual narrowing may be neural pruning (Scott et al., 2007). Indeed, early in life there is an exuberance of synaptic connections in the brain, which are pruned over time to reach adult levels. Therefore, it is plausible to hypothesize that the decline in face discrimination ability for certain stimuli coincides with this pruning process.

### **How Newborns and Infants Recognize Faces**

This part of the paper will discuss how faces are recognized and whether the computations to encode, store and retrieve information are special for faces since birth. From a developmental point of view, it is important to investigate whether infants from birth have the capacity to extract and process both the featural and the configural information present in a face, and how the face processing strategies change and become face-specific as a function of visual experience.

It is well established that newborns, despite their immature visual system, are able to recognize individual faces. After a habituation phase with a picture of a female stranger's face, newborns looked longer at a new face than at the familiar one, demonstrating their ability to learn a specific individual face to which they are repeatedly exposed (Pascalis and de Schonen, 1994). In addition, the mother's face is recognized and preferred over a female stranger's face within hours from birth (Bushnell et al., 1989; Pascalis et al., 1995; Bushnell, 2001; Sai, 2005). Despite this learning ability, the nature of the operations underlying face recognition at birth and in early infancy is still an open question.

Data collected in our lab employing face-like stimuli, real faces and geometric stimuli converge to suggest that, at least at birth, the operations involved in face processing are the same as those involved in processing any visual object. For instance, newborns are able to discriminate between arrays that are identical with respect to their global characteristics (i.e., columns of filled or unfilled elements), but differ in the shape of the filled elements contained within the two filled columns (i.e., square elements vs. diamond elements). This result shows that newborns are able to discriminate the individual elements of an array and can organize such elements into a holistic percept (Farroni et al., 2000). The same results have been obtained with face-like patterns, since newborns discriminated between schematic face-like patterns that differed exclusively in the shape of the internal local elements (Simion et al., 2002).

Together, these data support the hypothesis that newborns possess a general visual pattern-learning mechanism that enables them to encode, retrieve, and thus recognize as familiar, visual stimuli independently of whether they are faces or not. The learning mechanism responsible for face recognition is not specific for faces but, rather, operates in a similar fashion for all types of visual stimuli (de Schonen and Mancini, 1995; de Schonen et al., 1998; Johnson and de Haan, 2001).

In line with the presence of this general visual pattern-learning mechanism, active for both face and non-face stimuli, infants from birth are able to perceive and recognize the invariant perceptual characteristics of a wide range of visual stimuli. For instance, newborns are able to perceive objects and faces as invariant across the retinal changes due to modifications in slant or distance (Slater and Morison, 1985; Slater et al., 1990), both when physical objects (i.e., simple or complex geometrical patterns) and when social objects are available in the environment. In particular, it has been demonstrated that newborns are able to process the invariant features of a face regardless of changes in slant relative to the observer (Turati et al., 2008).

Overall, the general visual pattern-learning mechanism seems to operate on non-face-like configurations, face-like configurations and real faces, and is thought to be sensitive to those coarse visual cues of a face or non-face stimulus that are strictly dependent on LSF and convey configural information.

Indeed, evidence demonstrated that the visual information newborns use to process and recognize a face is conveyed by low- rather than high-spatial frequencies (de Heering et al., 2007b). Basically, this is due to the fact that configural information is processed mainly by the right hemisphere (de Schonen and Mathivet, 1989; Deruelle and de Schonen, 1991, 1998; de Schonen et al., 1993). Deprivation of early visual input to the right hemisphere, due to a bilateral congenital cataract, led to impaired configural processing (Le Grand et al., 2003). Since the right hemisphere matures earlier and at a faster rate than the left hemisphere, newborns and young infants are more sensitive to configural information than to features in both faces and non-faces (de Schonen and Mathivet, 1990). In effect, the same LSF range is critical in producing the global/local advantage found when newborns process hierarchical stimuli (Macchi Cassia et al., 2002). Employing hierarchical patterns in which larger figures (i.e., cross or rhombus) are constructed from the same set of smaller figures, it has been demonstrated that newborns are able to discriminate both the local and the global levels. However, recognition of the local features was impaired when information at the global level interfered with identification of the local features (Macchi Cassia et al., 2002). This asymmetrical interference might be used to interpret the inversion effect obtained in the inner features condition with faces. That is, when the face is in the upright orientation, newborns encode both levels (i.e., local and global) with a superiority of the global/configural one, which allows recognition of the face. In contrast, when the face is turned upside-down, newborns are impaired in using the global/configural information and, due to their sensitivity to LSF, cannot rely on featural information alone (Turati et al., 2006).

Collectively, the findings reported here demonstrate that newborns are sensitive to configural information in both face and non-face stimuli due to constraints of their visual system.

However, since in adults configural processing is specific for faces and has been attributed to extensive experience with faces during the lifetime, from a developmental point of view it seems crucial to investigate when faces start to become special and start to be processed differently from objects (see Hoel and Peykarjou, 2012). Some studies demonstrated that infants start to process upright and inverted faces differently within the first months of life, providing evidence for an early face inversion effect. For instance, Turati et al. (2004) showed that face inversion affected 4-month-olds' face recognition abilities. In the same vein, 4-month-old infants' visual scanning paths differ as a function of the orientation in which the face is presented (Gallay et al., 2006; see also Kato and Konishi, 2013). At a neural level, two ERP components (i.e., N290 and P400) are found to be indicative of a face processing ability in early infancy (de Haan et al., 2002; Halit et al., 2003; Scott and Nelson, 2006; Scott et al., 2006). ERP studies conducted with 6-month-old infants revealed that the P400, a precursor of the adult N170, was modulated by inversion already at this age: inverted faces elicited greater amplitude negativity than upright faces (Webb and Nelson, 2001; de Haan et al., 2002). Interestingly, although there are no behavioral studies that directly compare the inversion effect for faces vs. objects in infants, a recent NIRS study demonstrated that inversion of faces and of objects modulates brain activation differently in 5- and 8-month-old infants (Otsuka et al., 2007). Further studies demonstrated that, starting in early childhood, stimulus inversion affects faces disproportionately compared to objects (Picozzi et al., 2009), corroborating previous results with older children (Carey and Diamond, 1977; Teunisse and de Gelder, 2003).

As for the composite face effect, a recent study reported, for the first time, that 3-month-old infants, like adults, process faces holistically. Specifically, infants were shown to be more accurate in recognizing the familiar top half of a face in the misaligned condition than in the aligned condition (Turati et al., 2010). Interestingly, although both adults and infants showed the composite face effect, their performance differed in the misaligned condition. In effect, adults looked longer at the novel top half, whereas infants looked longer at the familiar top half. This result demonstrates that the tuning toward configural information appears very early in life, but that experience progressively refines early configural strategies in face processing. Employing the same composite face paradigm and extending previous findings (Carey and Diamond, 1994; Mondloch et al., 2007), some studies demonstrated that holistic face processing is fully mature at 4 years of age (de Heering et al., 2007a) and is selective for faces at 3.5 years of age (Macchi Cassia et al., 2009).

Intriguingly, all the studies reported here confirm that visual experience is critical for the typical development of face processing. However, at present how early visual experience shapes the neural mechanisms underlying face processing is not well understood. In light of this, future studies should be conducted to better understand what kind of visual experience is more effective to render the face processing system specialized and the sensitive periods during development (see Scott et al., 2007). A more recent ERP study conducted with infants from 6 to 9 months has attempted to answer this question.

In this study, a neural specialization, indexed by a different modulation of the P400 for upright compared to inverted monkey faces, was found in infants who had received 3 months of training with monkey faces labeled at the individual level (i.e., a single monkey face associated with a name). Infants in this group showed an inversion effect for monkey faces. In contrast, no effects were found in infants who received training with the same monkey faces labeled at the categorical level (i.e., "monkey" as the name for all faces presented), demonstrating that the different experiences (i.e., categorical vs. individual learning experiences) affected face processing and neural specialization for faces differently during development (Scott and Monesson, 2010).

Taken together, the studies reviewed here demonstrate that at birth, due to certain constraints of the visual system (e.g., sensitivity to LSF), newborns apply the same strategies to recognize and process both faces and non-faces, corroborating the idea of the existence of a general visual pattern-learning mechanism. Then, during development, thanks to specific visual experience with certain kinds of stimuli, the system becomes specialized to process objects and social stimuli differently.

### **Conclusion**

Overall, the studies carried out with newborns demonstrated the presence, from birth, of pre-wired domain-relevant attentional biases toward faces and the role of experience in shaping the face processing system.

As for face detection, here we suggest that faces are not special visual stimuli for newborns and that a specific face-sensitive mechanism is not required to explain face preference from birth. The reviewed evidence speaks in favor of the hypothesis that faces might be preferred at birth because they are a collection of preferred structural (i.e., up-down asymmetry, congruency, etc.) and configural properties that other stimuli may also possess. Nevertheless, the debate is still open and further studies need to be carried out to settle the question of whether general or specific biases underlie face preference at birth. Further, it seems relevant to investigate whether the subcortical route, putatively active throughout the lifespan (Tomalski et al., 2009), is elicited by the same visual stimuli in newborns and in adults, and to clarify the nature of the interaction between the cortical and subcortical routes in face processing across the lifespan.

In addition, future studies are needed on the nature of face representation at birth because we are far from a conclusive answer about the best stimulus that elicits face preference at birth. Some controversial studies about the effect of contrast polarity (Farroni et al., 2005) and the role of the eyes in triggering face preference at birth (see Dupierrix et al., 2014) suggest the need to further investigate, with both behavioral and neuroimaging studies, what low-level visual cues, such as the high contrast area of the human eyes and the pupil, may render the eyes so important in the first months of life and whether their relevance changes over time.

Furthermore, future studies should investigate the nature of the mechanisms responsible for the perceptual narrowing process that occurs during development and, even more importantly, which kind of visual experience is most effective in guiding the specialization of the system to process faces during the sensitive and/or critical periods of development. In particular, electrophysiological studies are needed to investigate how the infant brain works during development in response to faces.

In the same vein, how and when faces become special stimuli and start to be processed differently from objects are intriguing open questions. Future studies should directly compare visual processing strategies employed for faces and for objects by using the same paradigms at different time points during development in order to track a developmental trajectory of the face processing specialization.

One of the main purposes that guides such research should be to increase the knowledge about the typical developmental trajectories in order to identify infants who deviate from them (e.g., infants at high risk for autism) and to promote screening and intervention programs when the brain is more plastic and receptive to changes.

Overall, the evidence is consistent in demonstrating a progressive functional and neural specialization of the face-system. The data reviewed here speak in favor of the idea that, in order to develop in its adult-like expert form, the face-system may not require a highly specific input (i.e., a face-specific bias). Rather, it is plausible to hypothesize that the presence of some domain-relevant attentional biases at birth is sufficient to set up and to drive the system toward the gradual and progressive structural and functional specialization that emerges later during development thanks to the visual experience that infants have in their species-specific environment.

### **References**


Zhu, Q., Song, Y., Hu, S., Li, X., Tian, M., Zhen, Z., et al. (2010). Heritability of the specific cognitive ability of face perception. *Curr. Biol.* 20, 137–142. doi: 10.1016/j.cub.2009.11.067

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Simion and Di Giorgio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Developmental changes in analytic and holistic processes in face perception

#### *Jane E. Joseph<sup>1</sup>\*, Michelle D. DiBartolo<sup>1</sup> and Ramesh S. Bhatt<sup>2</sup>*

*<sup>1</sup> Department of Neurosciences, Medical University of South Carolina, Charleston, SC, USA, <sup>2</sup> Department of Psychology, University of Kentucky, Lexington, KY, USA*

Although infants demonstrate sensitivity to some kinds of perceptual information in faces, many face capacities continue to develop throughout childhood. One debate is the degree to which children perceive faces analytically versus holistically and how these processes undergo developmental change. In the present study, school-aged children and adults performed a perceptual matching task with upright and inverted face and house pairs that varied in similarity of featural or 2nd order configural information. Holistic processing was operationalized as the degree of serial processing when discriminating faces and houses [i.e., increased reaction time (RT) as more features or spacing relations were shared between stimuli]. Analytical processing was operationalized as the degree of parallel processing (or no change in RT as a function of greater similarity of features or spatial relations). Adults showed the most evidence for holistic processing (most strongly for 2nd order faces) and holistic processing was weaker for inverted faces and houses. Younger children (6–8 years), in contrast, showed analytical processing across all experimental manipulations. Older children (9–11 years) showed an intermediate pattern with a trend toward holistic processing of 2nd order faces like adults, but parallel processing in other experimental conditions like younger children. These findings indicate that holistic face representations emerge around 10 years of age. In adults both 2nd order and featural information are incorporated into holistic representations, whereas older children only incorporate 2nd order information. Holistic processing was not evident in younger children. Hence, the development of holistic face representations relies on 2nd order processing initially, then incorporates featural information by adulthood.

Keywords: holistic, configural, featural, similarity, face inversion, children, perceptual matching, serial, parallel

### Introduction

A wealth of research suggests that face recognition and identification improve with age throughout childhood and adolescence (Goldstein and Chance, 1964; Ellis et al., 1973; Kagan and Klein, 1973; Carey and Diamond, 1977; Carey et al., 1980; Ellis and Flin, 1990; Pascalis and Slater, 2003; Gauthier and Nelson, 2001; de Heering et al., 2012). Although numerous perceptual mechanisms have been examined, there continues to be debate as to which mechanism(s) are most critical for the proficient and expert-level face recognition demonstrated by adults. *Configural* processing refers to processing the spatial relations among facial features, with *1st order* configuration referring to the canonical ordering of facial features in an upright orientation (eyes above nose above mouth) and *2nd order* configuration referring to the spacing of the features relative to each other. *Holistic* processing refers to perceiving the individual features and their spatial relations as an integrated whole (Pascalis et al., 2011). *Analytical*, *featural*, or *piecemeal* processing of faces refers to perceiving, comparing, or analyzing specific face components, such as the eyes, nose, mouth.

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *Reviewed by:*

*Martin Juttner, Aston University, UK; Laurence Chaby, Paris Descartes University, France*

#### *\*Correspondence:*

*Jane E. Joseph, Department of Neurosciences, Medical University of South Carolina, 96 Jonathan Lucas Street, Clinical Sciences Building 325E, Charleston, SC 9425, USA josep@musc.edu*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 15 April 2015 Accepted: 24 July 2015 Published: 07 August 2015*

#### *Citation:*

*Joseph JE, DiBartolo MD and Bhatt RS (2015) Developmental changes in analytic and holistic processes in face perception. Front. Psychol. 6:1165. doi: 10.3389/fpsyg.2015.01165*

Diamond and Carey (1986) and Carey and Diamond (1994) suggested that perceptual expertise for faces is based on proficiently encoding and using 2nd order information. In their model, objects within a category are compared to a configural prototype in order to discriminate different exemplars. Computing 2nd order information supports rapid and accurate discrimination among the exemplars of the same category. Although faces are the only class of stimuli with which most adults have sufficient expertise to allow the use of 2nd order information (Carey and Diamond, 1994; Tanaka and Farah, 2003; Tarr and Cheng, 2003), the same processing may be used to support expertise with other visual categories (Diamond and Carey, 1986).

Carey and Diamond (1994) also suggested that younger children have not yet developed the perceptual capacity for 2nd order processing of faces and, instead, rely on a featural encoding strategy for identifying upright and inverted faces. They based this conclusion on the finding that 6-year-olds recognized inverted faces as well as upright faces, whereas 8- and 10-year-olds exhibited an inversion effect that is similar to that shown by adults. In other words, older children and adults demonstrate greater difficulty with face identification when faces are inverted but object identification is not as strongly affected (Yin, 1969). One interpretation of the face inversion effect is that configural and holistic processing, which may be more integral to faces than to other objects, is disrupted with inversion so that an inverted face becomes more like a collection of features rather than an integrated, holistic gestalt (Rossion, 2009). Individual features of objects (e.g., the mane of a horse) may be sufficient to uniquely identify an object at a basic-level of categorization, so inversion has little impact on object recognition. In support of this, many studies indicate that inversion affects relational processing more than featural processing (Thompson, 1980; Bartlett and Searcy, 1993; Rhodes et al., 1993; Freire et al., 2000; Murray et al., 2000, 2003; Barton et al., 2001; Le Grand et al., 2001).

This interpretation, however, has been questioned by findings that inversion may also disrupt processing of other nonface categories or face stimuli without internal features (Reed et al., 2003; Brandman and Yovel, 2012). Also, inversion disrupts featural processing of faces in addition to configural processing (Maurer et al., 2002; Riesenhuber et al., 2004; Sekuler et al., 2004). Debates continue about whether featural and configural processing of faces are independent components of face processing (Riesenhuber and Wolff, 2009) and whether featural processing of faces is equivalent to object processing (Peterson and Rhodes, 2003). For example, face inversion effects are much weaker if stimuli are perceptually very similar (Rhodes et al., 2006), and the differential effect of inversion on relational versus featural processing goes away under these conditions. Also, when faces are inverted, participants may use the same local information to discriminate faces, but they do this less efficiently compared with upright faces (Sekuler et al., 2004). Given this debate and the fact that face inversion has been a well used manipulation to study developmental changes in face processing, the present study will examine the effect of inversion on both featural and configural processing across different levels of similarity, in both children and adults.

One reason that face inversion effects have been intensely investigated in developmental studies is that many studies have replicated the findings by Carey and Diamond that younger children show weaker face inversion effects than older children and adults (Schwarzer, 2000; Brace et al., 2001; Joseph et al., 2006; Meinhardt-Injac et al., 2014). Presumably, if children perceive faces as a collection of features rather than as an integrated gestalt, then inversion will not disrupt processing of the individual features. In contrast, inversion will disrupt 2nd order configural processing because spatial relations cannot be as easily perceived when the canonical orientation is changed. Studies that have directly manipulated featural and 2nd order information in faces have also reported earlier development of (and reliance on) featural processing in children, compared to 2nd order processing (Schwarzer, 2000; Freire and Lee, 2001; Maurer et al., 2002; Mondloch et al., 2002, 2003, 2006). This delayed development of 2nd order processing, however, is debated (Gilchrist and McKone, 2003; Pellicano et al., 2006). McKone and Boyer (2006) argued that when baseline performance is accounted for, children as young as 4 years of age show sensitivity to 2nd order information in faces, similar to the sensitivity shown by adults. In addition, developmental delays in 2nd order processing may not be specific to faces (Robbins et al., 2011) suggesting a more domain-general mechanism at play. Although sensitivity to 2nd order information in faces can improve (Baudouin et al., 2010) or become more specific to faces (Cassia et al., 2011) with age, sensitivity to 2nd order information emerges as early as 5 months of age (Hayden et al., 2007). Nevertheless, if younger children (and infants) are sensitive to 2nd order information, then why do they show reduced face inversion effects compared to older children and adults?

The goal of the present study is to explore this question further by using a perceptual matching task and parametrically varying featural and 2nd order configural information (**Figures 1** and **2**). Importantly, the same perceptual processing will be examined in another class of objects (houses) which are well equated to the face stimuli in terms of the type of information manipulated and the level of differentiation required. In addition, the analyses will control for performance differences across adults, older children (9–12 years of age) and younger children (6–8 years of age) by using baseline performance as a covariate.

The experimental paradigm is illustrated in **Figure 1**. Two faces (or houses) were presented simultaneously and subjects decided if they were the same or different. If two identical stimuli were presented, the correct response was "yes" (as indicated by a button press). If the stimuli presented were different in any way, then the correct response was "no" (as indicated by a button press). In the featural condition (**Figure 1A**), the "different" pairs consisted of stimuli that differed in 1, 2, 3, or 4 internal features. In the example shown, the "sim0" pairs were different in all features (brows, eyes, nose, mouth); hence, similarity was 0. "Sim1" pairs had three features that were different, "sim2" pairs had two different features and "sim3" pairs differed by only one feature. The actual features that varied at each similarity level were different across pairs so that, for example, not all sim2 pairs differed by the nose and mouth (as shown in the figure); some sim2 pairs varied by the eyes and brows, etc.

*FIGURE 1 (caption, partially recovered): [...] for a pair. (B) Sample 2nd order face pairs are shown. Third row indicates the spacing relations that were different (distance of brows to top of head, distance of nose to top of head, distance of mouth to top of head and [...]). [...] sim0–sim3 trials. Identical pairs represent the maximum similarity two faces can share. In this case, holistic representations would lead to a serial exhaustive comparison process (green dotted line).*

Much of the evidence from studies using a similar paradigm of parametric manipulation of similarity (e.g., Collins et al., 2012) has shown that the more features that are similar, the harder it is to reject the two stimuli as the same. Or, the more features that are different, the easier it is to reject the two stimuli as the same. Consequently, reaction time (RT) and/or error rates increase as similarity increases for "different" trials (i.e., trials in which a correct response is "no"). This pattern of results would be expected if the stimuli are compared on a feature-by-feature basis. An increase in RT/errors that is linear and monotonic is often taken as a reflection of serial processing of the features (blue line in **Figure 1C**), as in the frameworks of visual selective attention (Treisman and Gelade, 1980) or short-term memory search (Sternberg, 1966); that is, each additional level of similarity will incrementally increase RT/errors because the stimuli are compared feature-by-feature until a difference is found. If more features are similar, then more comparisons would be made and RT would be longer. Here, we suggest that if the features of a face are integrated into a holistic percept, then access to individual face features will be more difficult. RT functions will show evidence for serial processing because a feature-by-feature comparison will take longer if more features are similar; that is, more features will need to be compared before finding a difference in features. Hence, if RT functions have a positive slope, then serial processing is assumed and this, in turn, is a reflection of holistic representations.

Conversely, if features can be processed independently, then the number of features that are different on any given trial will not affect performance. This is akin to parallel processing of features, much like the idea of preattentive processing and pop-out effects in visual selective attention (Treisman and Gelade, 1980). Parallel processing means that the individual features can be processed simultaneously so that a feature-by-feature comparison is not necessary. Hence, the ideal parallel processor would show no difference in RT/errors as a function of similarity (red line in **Figure 1C**) because detecting just one different feature is sufficient to make a "no" response and the number of additional different features will not significantly increase RT. We suggest that RT functions that do not have a significant positive slope reflect parallel processing. In turn, this pattern of results reflects underlying face representations that are more piecemeal and not integrated into a holistic percept.
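The contrast between the serial and parallel response functions can be made concrete with a small sketch. It assumes, purely for illustration, a self-terminating serial comparison that inspects four features in random order and stops at the first difference, set against an idealized parallel comparison whose cost does not depend on similarity. The function names and the random-order assumption are hypothetical and not part of the study's method. Under this assumption the serial prediction rises monotonically with similarity, although not strictly linearly.

```python
from fractions import Fraction

N_FEATURES = 4  # brows, eyes, nose, mouth (or four spacing relations)


def serial_comparisons(similarity, n_features=N_FEATURES):
    """Expected number of feature comparisons for a self-terminating serial
    search in random order (an illustrative assumption).

    `similarity` is the number of shared features: 0-3 for "different"
    pairs and n_features for identical ("same") pairs.
    """
    n_different = n_features - similarity
    if n_different == 0:
        # Identical pair: every feature must be checked (serial exhaustive).
        return Fraction(n_features)
    # Expected position of the first differing feature in a random ordering.
    return Fraction(n_features + 1, n_different + 1)


def parallel_comparisons(similarity, n_features=N_FEATURES):
    """Idealized parallel processor: cost is constant across similarity."""
    return Fraction(1)


for sim in range(N_FEATURES + 1):
    label = "same" if sim == N_FEATURES else f"sim{sim}"
    print(label, float(serial_comparisons(sim)), float(parallel_comparisons(sim)))
```

In this toy model the serial cost grows from sim0 to sim3 and is largest for identical pairs, whereas the parallel cost is flat, which mirrors the sloped versus flat similarity functions described in the text.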

Matching of two identical stimuli may involve a serial exhaustive comparison of all features (dotted green line in **Figure 1C**) if the features are integrated into a holistic percept and cannot be processed or analyzed independently. "Same" trials are depicted (and will be analyzed) separately from "different" trials because it has been suggested that "same" matching invokes more holistic processing whereas "different" matching relies on analytical processing (Taylor, 1976). Hence, the "same" pairs provide an upper bound on degree of holistic processing for a given experimental condition.

This analogy with serial and parallel processing is useful because the idea of a serial comparison process suggests that the individual features are not processed independently. If faces are indeed processed holistically, as suggested by an abundance of evidence, then processing individual face components will be difficult and will likely result in a serial response function as illustrated by the green line in **Figure 1**. If, however, features can be processed independently from each other in an analytical fashion, then the response functions will shift toward a function with slope of 0 (red line in **Figure 1**). Of course, similarity functions may emerge that are neither purely parallel nor purely serial exhaustive (e.g., blue line).

A second condition was also tested – 2nd order configural matching. Given the robust debate about whether inversion affects featural or 2nd order processing differently (Rossion, 2009) or in the same manner (Riesenhuber and Wolff, 2009), a comparison of the two different kinds of perceptual information in the present framework can speak to this controversy. In addition, other tests of holistic processing of faces such as the composite effect (Young et al., 1987) and part-whole effect (Davidoff and Donnelly, 1990; Tanaka and Farah, 1993) result in both featural and 2nd order changes in faces, making it difficult to assess the independent contributions of these changes to holistic face processing. As shown in **Figure 1B** the 2nd order face (or house) pairs all had the same features, but the pairs differed in the spacing of the features in a systematic manner (see methods and detail shown in the figure). Again, if the face is perceived as an integrated percept, the spacing relations cannot be easily accessed as individual elements which will result in a serial comparison of the two faces in a pair. But if the individual spacing relations can be perceived independently from the rest of the face, then the comparison process will show evidence for parallel processing.

The central thesis of the present study is that adults will show the most evidence for processing faces as an integrated percept whereas younger children will show the most evidence for processing faces analytically or in a piecemeal fashion. With an integrated percept, access to and comparison of individual features across two faces, whether they are components like eyes, nose, etc., or spacing features, will be more difficult and result in relatively more serial processing, or sloped RT similarity functions. In contrast, if the percept of a face is not highly integrated, decomposing the face into constituent features will be less difficult and features may be processed in parallel, as indicated by flat RT similarity functions. In the present study, the focus is on the relative change in slopes of these similarity functions across different categories (faces versus houses), orientations (upright versus inverted), processing types (featural and 2nd order) and age groups (adults, older children, younger children).

Given this conceptual framework the following hypotheses are tested:


### Materials and Methods

### Participants

For Dataset 1, 18 healthy adults (mean age = 23.6 years, nine males), nine older children (mean age = 10.6 years, six males) and 10 younger children (mean age = 6.9 years, five males) participated in the featural condition. Thirty-eight healthy adults (mean age = 19.2 years, 18 males), 12 older children (mean age = 10.6 years, seven males) and 13 younger children (mean age = 7.2 years, five males) participated in the 2nd order condition.

Dataset 2 was collected as part of a functional magnetic resonance imaging (fMRI) study. Fourteen healthy adults (mean age = 22.4 years, six males), 10 older children (mean age = 10.2 years, five males) and 23 younger children (mean age = 7.3 years, 12 males) participated in the featural condition. Eighteen healthy adults (mean age = 20.5 years, 10 males), seven older children (mean age = 10.7 years, four males) and five younger children (mean age = 6.7 years, two males) participated in the 2nd order condition. The behavioral data for adults from this fMRI study have been published (Collins et al., 2012) but the analyses used in the present study were different from the published study.

All subjects had normal or corrected-to-normal visual acuity and normal color vision. In Dataset 1, 22% of the child participants and 20% of the adults were left-handed. For Dataset 2 all subjects were right-handed (as required for the fMRI study). Children completed the Peabody Picture Vocabulary Test (PPVT; Dunn and Dunn, 2007) and Expressive Vocabulary Test (EVT; Williams, 1997) and all children scored in the normal range. No participants reported neurological or psychiatric diagnoses, learning disability, medical conditions, or pregnancy.

Children provided assent and a parent provided informed consent before participating. Children and adults were compensated for participation but some adults received course credit instead of compensation. All procedures were approved by the University of Kentucky's and Medical University of South Carolina's Institutional Review Boards.

### Design and Stimuli

For Dataset 1, the 2 × 2 × 4 × 2 × 3 design had five independent variables: category (face, house), orientation (upright, inverted), and similarity (four levels of graded similarity, as illustrated in **Figures 1** and **2**) manipulated within subjects and processing type (featural, 2nd order) and age group (younger children, older children, and adults) manipulated between subjects.

For Dataset 2, the 2 × 4 × 2 × 3 design had four independent variables: category and similarity manipulated within subjects and processing type and age group manipulated between subjects. Although Dataset 2 did not manipulate orientation, the data were used in a supplementary analysis to increase sample size and assess the reliability of the effects obtained with only Dataset 1.

Photo-realistic faces were constructed using FACES 4.0 software (IQ Biometrix, Redwood Shores, CA, USA) and house stimuli were created using Chief Architect 10.06a (Coeur d'Alene, ID, USA).

### Featural Changes

Twenty-four faces or houses were initially constructed so that none of the features overlapped across these 24 stimuli. These were used as the basis for making featural changes and constructing stimulus pairs that varied in similarity.

For each original face, distracter faces were constructed so that 1, 2, 3, or 4 features (eyes, nose, mouth, or eyebrows) were replaced, yielding four similarity (sim) levels (and 96 unique faces: 24 original faces × 4 variants). Sim0–sim3 faces respectively shared 0–3 common features with the target face. The feature changed for each sim level was counterbalanced across all stimulus pairs so that feature replacement was not confounded with sim level. The same procedures were used for house features (door, steps, lower-level and upper-level windows). Forty-eight "same" pairs of faces or houses included two identical faces or houses, which were randomly selected from the pool of 96 face or house stimuli.
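The relation between similarity level and the number of replaced features can be sketched as follows. The snippet enumerates every set of features that could be replaced at a given sim level. The source states only that feature replacement was counterbalanced across pairs, so the exhaustive enumeration and the function name are illustrative assumptions rather than the authors' actual assignment scheme.

```python
from itertools import combinations

FACE_FEATURES = ("eyes", "nose", "mouth", "eyebrows")


def replacement_sets(sim_level, features=FACE_FEATURES):
    """All feature subsets that could be replaced at a given sim level.

    A simN distracter shares N features with the target face, so
    len(features) - N features are replaced.
    """
    n_replaced = len(features) - sim_level
    return list(combinations(features, n_replaced))


for sim in range(4):
    options = replacement_sets(sim)
    print(f"sim{sim}: {len(options)} candidate replacement sets, e.g. {options[0]}")

# 24 original faces x 4 sim levels gives the 96 unique distracter faces.
N_ORIGINALS = 24
n_distracters = N_ORIGINALS * 4
```

Rotating through the candidate sets at each level is one way to keep the identity of the replaced feature from being confounded with sim level.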

### 2nd Order Configural Changes

Twenty-four faces or houses were initially constructed so that none of the features overlapped across these 24 stimuli. The 2nd order face changes were: (a) horizontal distance between the centroid of both eyes/brows (these features were moved together so that the brows were always aligned with the eyes), (b) vertical distance between centroid of nose and top of forehead, (c) vertical distance between centroid of mouth and top of forehead, and (d) vertical distance between center of two brows and top of forehead. For faces, an initial spacing of 2 SD from the norms published by Farkas (1994) was used, but was changed to a 3 SD spacing after 2 SD was identified as being too difficult to detect. The house changes were: (a) horizontal distance between the centroid of both lower windows, (b) horizontal distance between the centroid of both upper windows, (c) vertical distance between center of lower windows and bottom of roof, and (d) vertical distance between center of upper windows and bottom of roof. Again, the relation changed for each sim level was counterbalanced across all pairs to avoid confounding with sim level.

### Procedure

For Dataset 1 each participant completed 256 trials that required a response of "no" (i.e., different trials that varied across four similarity levels, sim0–sim3 trials in **Figures 1** and **2**) and 64 trials that required a response of "yes" (i.e., same trials that consisted of identical stimulus pairs, as in **Figure 1**). The 256 "no" trials consisted of 80 upright face pairs, 80 upright house pairs, 48 inverted face pairs, and 48 inverted house pairs. The 64 "yes" trials consisted of 20 upright face pairs, 20 upright house pairs, 12 inverted face pairs and 12 inverted house pairs. Some of the pairs used for upright trials (38 different pairs and 10 same pairs) were also used for inverted trials, with the remaining inverted trials consisting of unique stimulus pairs that were not used on upright trials. Each subject received a random order of the 320 trials, which were broken up into four blocks of 80 trials providing rest periods for the participants.
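The trial counts reported for Dataset 1 can be checked with a short tally. The dictionary layout and variable names below are illustrative; only the counts themselves come from the text.

```python
# Trial counts as reported for Dataset 1, keyed by (category, orientation).
different_trials = {
    ("face", "upright"): 80,
    ("house", "upright"): 80,
    ("face", "inverted"): 48,
    ("house", "inverted"): 48,
}
same_trials = {
    ("face", "upright"): 20,
    ("house", "upright"): 20,
    ("face", "inverted"): 12,
    ("house", "inverted"): 12,
}

n_different = sum(different_trials.values())
n_same = sum(same_trials.values())
n_total = n_different + n_same

N_BLOCKS = 4
trials_per_block = n_total // N_BLOCKS

# Ratio of "no" to "yes" trials within each Category x Orientation cell.
ratios = {cell: different_trials[cell] / same_trials[cell] for cell in different_trials}

print(n_different, n_same, n_total, trials_per_block)
print(ratios)
```

The tally shows that every Category × Orientation cell has the same 4:1 ratio of "different" to "same" trials, so response bias is equated across cells.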

On each trial, participants saw either two faces or two houses for 2900 ms followed by a fixation interval for 520 ms. Participants indicated whether the two stimuli were the same (index finger) or different (middle finger) using a serial response box. Participants could respond at any point during the trial. The duration and trial length were fixed because we conducted the behavioral and fMRI study in parallel and wanted to equate the designs of the two studies (and fMRI studies necessarily require a fixed interval for responding). We also wanted a brief period in between trials to present a blank screen; otherwise the stimuli would appear in a consecutive stream which would greatly increase the difficulty of the task. No feedback was given about performance because the major goal was to study perception of faces rather than learning.

For Dataset 2 each participant completed 256 trials broken up into four runs of 64 trials each. Within each run, the 64 trials were broken up into eight blocks (4 similarity levels × 2 repetitions) of eight trials (5 "no" and 3 "yes" trials). Hence, of the 256 trials, 96 trials required a "yes" response and 160 required a "no" response. Two of the runs were face matching and two runs were house matching. The order of the four runs was counterbalanced across subjects. Participants had rest breaks between blocks and between runs.
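The block structure described for Dataset 2 can be sketched as a small generator. The run and block composition follow the text. The random ordering of blocks within a run and of trials within a block, along with all names, are assumptions made only for illustration, because the source does not specify those orders.

```python
import random

SIM_LEVELS = ("sim0", "sim1", "sim2", "sim3")
REPETITIONS = 2                     # each similarity level appears in two blocks
NO_PER_BLOCK, YES_PER_BLOCK = 5, 3  # eight trials per block
N_RUNS = 4                          # two face runs and two house runs


def build_run(rng):
    """Return one run as a list of blocks; each block is a list of
    (trial_type, correct_answer) tuples."""
    block_levels = list(SIM_LEVELS) * REPETITIONS  # eight blocks per run
    rng.shuffle(block_levels)                      # block order: assumption
    run = []
    for level in block_levels:
        trials = [(level, "no")] * NO_PER_BLOCK + [("same", "yes")] * YES_PER_BLOCK
        rng.shuffle(trials)                        # within-block order: assumption
        run.append(trials)
    return run


rng = random.Random(0)
runs = [build_run(rng) for _ in range(N_RUNS)]
all_trials = [trial for run in runs for block in run for trial in block]
n_yes = sum(1 for _, answer in all_trials if answer == "yes")
n_no = len(all_trials) - n_yes
print(len(all_trials), n_yes, n_no)  # 256 96 160
```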

### Analysis of Reaction Time and Error Rate

Reaction time on each correct trial was log10 transformed (logRT) to meet the assumption of normality for multivariate tests. Outliers were determined separately for each age group and processing type and defined by logRTs that were more than 3 SD above or below the mean. Outliers accounted for 0.06% of the data in adults and 1.7% of the data in children. Errors were defined as incorrect responses or response omissions and the average error rate per condition was used in analyses.
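A minimal sketch of the transformation and trimming step is given below. It assumes the 3 SD criterion is applied to the log-transformed values within a single age group and processing type, and it uses the sample standard deviation; the source does not state which SD estimator was used. The function name and the example RTs are hypothetical.

```python
import math
import statistics


def trim_log_rts(rts_ms, n_sd=3.0):
    """Log10-transform correct-trial RTs (in ms) and split them into values
    kept and values dropped as outliers (more than n_sd SDs from the mean).

    Uses the sample SD, which is an assumption of this sketch.
    """
    log_rts = [math.log10(rt) for rt in rts_ms]
    center = statistics.mean(log_rts)
    spread = statistics.stdev(log_rts)
    kept = [x for x in log_rts if abs(x - center) <= n_sd * spread]
    dropped = [x for x in log_rts if abs(x - center) > n_sd * spread]
    return kept, dropped


# Hypothetical data: twenty typical RTs and one very fast anticipatory response.
example_rts = [900.0] * 20 + [200.0]
kept, dropped = trim_log_rts(example_rts)
print(len(kept), len(dropped))
```

Applying the function once per age group and processing type, rather than to the pooled data, keeps slower groups from having their ordinary trials flagged as outliers.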

Analyzing logRT across age groups (for Hypotheses 1 and 2) as a function of similarity needs to address the concern of interpreting scale-dependent interactions (Salthouse and Hedden, 2002). Specifically, differences in logRT as a function of age group or experimental condition cannot be interpreted unless those differences occur at the same parts of the RT scale. Given that children and adults (usually) perform at different parts of the RT scale, we addressed this in each analysis in the following ways.

First, in the analyses for Hypotheses 1 and 3, each age group and processing type was analyzed separately so concerns about age differences in RT did not need to be accounted for directly in the analyses. Second, in the analysis for Hypotheses 2a and 2b, which compared age groups directly, an ANCOVA approach was used in which logRT or errors in the sim0 condition served as the covariate, similarity (sim1–sim3) was the repeated factor and age group was the between-subjects factor. Sim0 is the best candidate for a covariate because it represents a baseline level of performance in which all features or relations are different between the stimuli, but the RT would still reflect other cognitive operations (such as orientation to the stimuli, response selection and response execution) that may differ across age groups. "Same" trials were analyzed in separate ANCOVAs from "different" trials: sim0 was the covariate and age group was the between-subjects factor. Sim0 served as the covariate for "same" conditions in order to control for the cognitive operations that were not specific to faces or houses.
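The logic of using sim0 as a covariate can be illustrated with the standard covariate adjustment for group means. The sketch below handles a single outcome measure and a single covariate, not the full repeated-measures ANCOVA used in the study. All names and data values are hypothetical.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group regression slope of the outcome on the covariate.

    `groups` maps a group name to a list of (covariate, outcome) pairs,
    e.g. (sim0 logRT, sim3 logRT) for each participant.
    """
    sxy = 0.0
    sxx = 0.0
    for pairs in groups.values():
        x_bar = mean([x for x, _ in pairs])
        y_bar = mean([y for _, y in pairs])
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


def adjusted_means(groups):
    """Group outcome means adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    return {
        name: mean([y for _, y in pairs])
        - slope * (mean([x for x, _ in pairs]) - grand_x)
        for name, pairs in groups.items()
    }


# Hypothetical logRT values: (sim0 baseline, sim3) per participant.
example = {
    "adults": [(2.8, 2.9), (2.9, 3.0), (3.0, 3.1)],
    "younger": [(3.0, 3.0), (3.1, 3.1), (3.2, 3.2)],
}
print(adjusted_means(example))
```

In this made-up example the raw sim3 means suggest that younger children are slower, but once the groups are equated on sim0 baseline the adjusted adult mean is the larger one, which is the kind of scale-dependent reversal the covariate approach is meant to expose.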

The design used in this study (Dataset 1) was a full factorial design with three within-subjects variables (category, orientation and similarity) and two between-subjects variables (age, processing type). However, we did not conduct a full factorial ANOVA for two reasons. First, there were not enough degrees of freedom to estimate the four-way and five-way interactions given the number of subjects in each age group (at least for the featural condition). Second, the ANCOVA approach used sim0 as the covariate for a given condition (such as upright faces or inverted houses). With the full factorial design it would not be clear how to specify a single covariate for all of the experimental conditions or how to map the sim0 condition to different Category × Orientation combinations. Therefore, each hypothesis was tested with analyses for some subset of the variables (described for each hypothesis below). When interactions with similarity were present, simple effects analysis (Keppel and Zedeck, 1989) of similarity was conducted. The simple effects analysis would indicate whether the similarity function was significant for a given condition. Polynomial contrasts were then conducted to indicate whether the similarity function followed a linear trend.
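One common way to test for a linear trend across equally spaced similarity levels is to compute a linear contrast score for each participant and test whether the mean score differs from zero. The sketch below uses the standard orthogonal polynomial weights for three or four levels. The one-sample t statistic and all names and data are illustrative assumptions, not a reproduction of the authors' analysis.

```python
import math
from statistics import mean, stdev

# Standard linear-trend weights for equally spaced levels.
LINEAR_WEIGHTS = {3: (-1, 0, 1), 4: (-3, -1, 1, 3)}


def linear_contrast_scores(data):
    """Return one linear contrast score per participant.

    `data` is a list of rows, each holding a participant's mean logRT at
    successive similarity levels (three or four levels).
    """
    weights = LINEAR_WEIGHTS[len(data[0])]
    return [sum(w * y for w, y in zip(weights, row)) for row in data]


def one_sample_t(scores):
    """t statistic testing whether the mean contrast score differs from 0."""
    return mean(scores) / (stdev(scores) / math.sqrt(len(scores)))


# Hypothetical logRT at sim0..sim3 for three participants.
example = [
    [2.80, 2.85, 2.92, 3.00],
    [2.90, 2.93, 2.99, 3.05],
    [2.75, 2.84, 2.90, 2.98],
]
scores = linear_contrast_scores(example)
print(scores, one_sample_t(scores))
```

A positive mean contrast score corresponds to a similarity function with a positive linear slope, and a flat function yields scores near zero.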

Although error rates are not necessarily subject to the same concern of scale-dependent interactions (but see Salthouse and Hedden, 2002), we used the same ANCOVA approach for the analysis of error rates to keep the analyses consistent. However, we used the RT measure in order to examine serial versus parallel processing as that is the most typical measure used to estimate these processes.

### Results

### Hypothesis 1: Adults and Older Children should Show Stronger Face Inversion Effects than Younger Children

Following other findings in the literature, adults and older children were expected to show a stronger face inversion effect than younger children. This analysis only used data from Dataset 1 as that was the only dataset with an inversion manipulation. In addition, featural and 2nd order conditions were analyzed separately given that initial inspection of error rates revealed that 2nd order matching was more difficult.

### Featural Processing

Repeated measures ANOVAs with logRT and errors as dependent variables and category (face, house) and orientation (inverted, upright) as within-subjects factors were conducted separately for adults, older children and younger children. The presence of a Category × Orientation interaction served as the main test of the hypothesis. As shown in **Figure 3A** adults and older children showed a trend for a greater inversion effect for featural faces than for featural houses with respect to errors, but younger children did not show this interaction. The Category × Orientation interaction was marginal in adults, *F*(1,17) = 3.2, *p* = 0.089, and older children, *F*(1,8) = 4.1, *p* = 0.076, but not significant for younger children for errors (*p* = 0.727). For logRT, the Category × Orientation interaction was not significant (*p*'s *>* 0.77).

### 2nd Order Processing

For 2nd order configural faces and houses, all three age groups showed a trend for a face inversion effect with respect to errors (**Figure 3B**). In adults, the Category × Orientation interaction was significant for errors, *F*(1,37) = 53.8, *p* = 0.0001. The interaction was marginal in both older children, *F*(1,11) = 3.9, *p* = 0.073, and younger children, *F*(1,12) = 3.6, *p* = 0.082 for errors. The interaction was not significant for any age group for logRT (*p*'s *>* 0.27).

### Analysis of Similarity for Errors

### Featural Processing

**Figure 4** shows errors as a function of similarity for each age group and each Category × Orientation (i.e., upright faces, inverted faces, upright houses, inverted houses) condition for featural matching for Dataset 1 (which manipulated orientation). Each age group's error function is adjusted based on sim0 error (the covariate) so this value is the same for all age groups and conditions on a given graph. Solid colored lines indicate error functions for "different" trials; dotted colored lines indicate error functions for "same" trials. The primary goal of this analysis was to determine whether there were age differences in the slopes of the similarity functions; hence, the Age × Similarity interaction was of primary interest.

For "different" trials, an Age × Similarity ANCOVA was conducted with sim0 as the covariate separately for each Category × Orientation combination. The Age × Similarity interaction was not significant for any condition for "different" trials (*p*'s *>* 0.48). For "same" trials, an ANCOVA was conducted with age as the independent variable and sim0 as the covariate. The main effect of age was marginal for upright faces, *F*(2,37) = 3.1, *p* = 0.062, and significant for inverted faces, *F*(2,37) = 6.2, *p* = 0.005, upright houses, *F*(2,37) = 3.9, *p* = 0.032, and inverted houses, *F*(2,37) = 5.4, *p* = 0.009.

### 2nd Order Processing

The same ANCOVAs conducted for featural processing in Section "Featural Processing" were conducted for 2nd order processing. The Age × Similarity interaction was significant only for upright faces, *F*(4,118) = 6.0, *p* = 0.0001, on "different" trials (**Figure 5**). On "same" trials, the main effect of age was significant for upright faces, *F*(2,63) = 24.7, *p* = 0.0001, inverted faces, *F*(2,63) = 3.3, *p* = 0.045, upright houses, *F*(2,63) = 21.0, *p* = 0.0002, and inverted houses, *F*(2,63) = 11.2, *p* = 0.0001.

In summary, although there were no specific hypotheses with respect to error rates, this analysis was presented to show that adults perform the task more accurately than children, as expected. However, there were few age differences in similarity functions for either featural or 2nd order faces. Only upright featural houses and upright 2nd order faces showed interactions with age. Age effects were much more pronounced on "same" trials, with adults showing lower error rates. One important point from this analysis was that, even though error rates were quite high for some conditions, the primary analysis used RT only on *correct* trials. Therefore, in subsequent analyses, speed-accuracy tradeoffs are not driving the effects.

### Hypothesis 2a: Adults and Older Children will Show More Evidence for Serial Processing than Younger Children

This analysis was conducted to test Hypothesis 2a, which predicts that adults and older children should engage a serial comparison process as a function of similarity of the face pairs (and show more sloped similarity functions with a positive linear trend) whereas younger children should show more evidence for parallel processing (and show flatter similarity functions and no positive linear trend). To test this hypothesis, an Age × Similarity ANCOVA was conducted with sim0 as the covariate separately for each Category × Orientation combination. The presence of a Similarity × Age interaction was the primary test of the hypothesis. When this interaction was significant, simple main effects (Keppel and Zedeck, 1989) of similarity for each age group were also examined to determine whether the similarity function was positive and linear as an indication of serial processing. The linear trend was assessed using planned polynomial contrasts. Results are presented first for Dataset 1, which manipulated orientation in addition to similarity and category.
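The planned linear contrast described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the logRT values are invented, the three similarity levels are assumed to be equally spaced, and the sim0 covariate adjustment of the full ANCOVA is left out. Each participant's logRTs are weighted by (-1, 0, 1), and the resulting scores are tested against zero within one age group.

```python
import math
import statistics

# Hypothetical adjusted logRT values (rows = participants, columns = sim1..sim3).
# These numbers are invented for illustration; they are not data from the study.
log_rt = [
    [6.80, 6.90, 7.05],
    [6.70, 6.85, 6.95],
    [6.95, 7.00, 7.20],
    [6.60, 6.75, 6.80],
    [6.85, 6.95, 7.10],
]

# Linear polynomial contrast weights for three equally spaced levels (assumption).
LINEAR_WEIGHTS = (-1, 0, 1)


def contrast_scores(rows, weights):
    """Return one weighted sum per participant (the linear-trend score)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def one_sample_t(scores):
    """Return (t, df) for a test of the mean contrast score against zero."""
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean / se, n - 1


scores = contrast_scores(log_rt, LINEAR_WEIGHTS)
t_value, df = one_sample_t(scores)
print(f"mean linear contrast = {statistics.mean(scores):.3f}, t({df}) = {t_value:.2f}")
```

A positive mean contrast score corresponds to a positively sloped similarity function, which is the pattern the hypothesis treats as an indication of serial processing; a score near zero corresponds to a flat function.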

#### Featural Processing

**Figure 6** shows logRT as a function of similarity by age group and by each Category × Orientation condition for featural faces. Each age group's RT function is adjusted based on sim0 logRT (the covariate) so this value is the same for all age groups on a given graph. Solid colored lines indicate RT functions for "different" trials; dotted colored lines indicate RT functions for "same" trials. Interestingly, across all Category × Orientation conditions, younger children show evidence for parallel processing, with similarity functions that are nearly flat. Parallel processing seems to persist across inversion and category manipulations. In contrast, similarity functions for adults have steeper slopes than those for children, especially for face stimuli. Older children show a pattern that is intermediate to adults and younger children for upright faces, but that is similar to younger children for inverted faces. Older children look similar to adults for house stimuli.

The ANCOVAs, however, revealed age group differences in RT functions only for face stimuli for "different" trials. For upright faces, the Similarity × Age Group interaction was significant, *F*(4,66) = 2.7, *p* = 0.044. However, the simple effect of similarity was not significant for any age group. For inverted faces, the Similarity × Age Group interaction was significant, *F*(4,66) = 2.5, *p* = 0.049, and the simple effect of similarity was significant only for younger children (*p* = 0.047) and the trend was marginally linear (*p* = 0.05). However, this effect was driven by sim3 having faster RTs than the other sim levels (**Figure 6B**) so the linear trend was in the negative direction, which is not consistent with serial processing. For house stimuli, the Similarity × Age interaction was not significant.

Age group differences seemed to be even more pronounced on "same" trials. Adults always showed the highest RT (but older children were similar to adults for inverted houses), a pattern suggesting a trend toward serial exhaustive search. Children show a trend toward serial processing for "same" trials in most conditions as indicated by a longer RT for same responses than for the highest similarity level on different trials, except for inverted faces, where younger children show a tendency toward parallel processing (i.e., RT for same is not longer than RT for different trials). For "same" trials, the main effect of age was significant for upright faces, *F*(2,37) = 8.3, *p* = 0.001, inverted faces, *F*(2,37) = 18.1, *p* = 0.0001, and upright houses, *F*(2,37) = 5.0, *p* = 0.013, but not for inverted houses.

(top) and logRT (bottom) for each age group in the featural condition. (B) Shows error rates (top) and logRT (bottom) for each age group in the 2nd order condition. Error bars are SE of the mean.

Data from Dataset 2 were combined with the data from Dataset 1 and analyses were rerun. As mentioned, these analyses only applied to upright stimuli as Dataset 2 did not manipulate orientation. The ANCOVAs revealed age group differences in RT functions only for featural face stimuli for "different" trials: the Similarity × Age Group interaction was marginal, *F*(4,160) = 2.0, *p* = 0.095, but the simple effect of similarity was significant for adults, *p* < 0.009 (linear trend, *p* = 0.003). For "same" trials, the main effect of age was significant for upright faces, *F*(2,84) = 12.4, *p* = 0.0001, and upright houses, *F*(2,86) = 8.8, *p* = 0.0001.

#### 2nd Order Processing

**Figure 7** shows logRT as a function of similarity by age group and by each Category × Orientation condition for 2nd order configural stimuli. Younger children again show flatter similarity functions, or even negative-going patterns for some conditions, compared to older children and adults. Older children show functions that have similar slopes to adults across all conditions. The ANCOVAs revealed age group differences in RT functions only for upright stimuli. For upright faces, the Similarity × Age Group interaction was significant, *F*(4,118) = 3.3, *p* = 0.017, and the simple effect of similarity was significant for adults (*p* < 0.031) and older children (*p* < 0.013), but the linear trend was only significant in adults (*p* = 0.016). For upright houses, the Similarity × Age interaction was significant, *F*(4,118) = 9.0, *p* = 0.0001, but the simple effect of similarity was only marginally significant for older children (*p* = 0.053, no linear trend) and not significant for adults or younger children.

For "same" trials, the main effect of age was significant for upright faces, *F*(2,63) = 9.8, *p* = 0.0001, inverted faces, *F*(2,62) = 4.6, *p* = 0.014, upright houses, *F*(2,63) = 6.7, *p* = 0.002, and inverted houses, *F*(2,63) = 4.3, *p* = 0.018. Similar to the finding for featural faces, adults always have a longer RT on same trials than on different trials and younger children have an RT on same trials that is comparable to or faster than different trials.

Data from Dataset 2 were combined with the data from Dataset 1 and analyses were rerun. For 2nd order upright faces, the Similarity × Age Group interaction was significant, *F*(4,178) = 4.0, *p* = 0.004, and the simple main effect of similarity was significant for adults (*p* < 0.006) and marginal for older children (*p* = 0.086), but the linear trend was only significant in adults (*p* = 0.005). For 2nd order upright houses, the Similarity × Age Group interaction was significant, *F*(4,178) = 11.1, *p* = 0.0001, but the simple effect of similarity was not significant for any age group. For "same" trials, the main effect of age was significant for upright faces, *F*(2,84) = 12.4, *p* = 0.0001, and upright houses, *F*(2,93) = 9.6, *p* = 0.0001.

#### Analysis of Untransformed Reaction Time

The interpretation of positively sloped similarity functions as evidence for serial processing may be questioned if log-transformed RTs are used, as in the present study. A log transformation is non-linear, so the relation between similarity and RT cannot necessarily be assumed to be linear, which is an important assumption for serial processing. To address this, we conducted the analyses for Hypothesis 2a using the raw, untransformed RT (only for correct responses and with outliers removed, as was the case for log-transformed RTs) and the results are fundamentally the same (see Supplement). Importantly, the log-transformed and untransformed RT values yield a similar pattern of similarity functions with respect to age group. Because the log-transformed RTs lead to the same conclusions we would have reached using untransformed RTs, the remaining analyses were conducted using log-transformed RTs.
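The concern about the log transformation can be illustrated with a small numerical sketch. The RT values below are hypothetical and chosen only so that raw RT rises by a constant amount per similarity level; they are not taken from the study.

```python
import math

# Hypothetical raw RTs (ms) that rise by a constant 200 ms per similarity level.
raw_rt = [800.0, 1000.0, 1200.0, 1400.0]

# Natural-log transform of each RT.
log_rt = [math.log(rt) for rt in raw_rt]


def successive_differences(values):
    """Return the step between each pair of neighbouring values."""
    return [b - a for a, b in zip(values, values[1:])]


raw_steps = successive_differences(raw_rt)
log_steps = successive_differences(log_rt)

print("raw steps:", raw_steps)
print("log steps:", [round(step, 3) for step in log_steps])
```

The raw steps are equal, whereas the log steps shrink as RT grows, so a function that is exactly linear in raw RT is compressed at the slow end after transformation. This is why rerunning the analysis on untransformed RT is a reasonable check on the linear-trend interpretation.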

### Hypothesis 2b: Serial Processing will be Weaker for Houses and Inverted Faces

Hypothesis 2b states that when serial processing is present for upright faces (indicating holistic representations), houses and inverted faces will induce a bias toward parallel processing or weaker serial processing. Serial processing was only evident for 2nd order upright faces (according to the analyses for Hypothesis 2a in Section "Hypothesis 2a: Adults and Older Children will Show More Evidence for Serial Processing than Younger Children"). However, that analysis compared similarity functions across age but did not directly compare categories or orientations. The analysis for Hypothesis 2b requires comparing similarity functions across categories or across orientation conditions. These analyses were thus conducted within each age group that showed some evidence for serial processing of upright faces; namely, adults and older children (but the effect in older children was marginal and the linear trend did not reach significance). Also, because different age groups were not compared with each other in this analysis, sim0 was not a covariate but instead was included as a level of the independent variable of similarity.

The Similarity (sim0-3) × Category (upright face, upright house) ANOVA revealed a significant interaction only in adults, *F*(3,111) = 4.0, *p* = 0.01, with houses invoking a bias toward parallel processing (**Figure 7**). Although a similar pattern is apparent in older children, this interaction was not significant. The Similarity × Orientation (upright face, inverted face) ANOVA also revealed a significant interaction only in adults, *F*(3,111) = 8.1, *p* = 0.0001, with inverted faces invoking a bias toward parallel processing. Again, older children showed the same pattern but the interaction was not significant. Hence, serial processing is only significant in adults for 2nd order upright faces. House and inverted face stimuli induce more parallel processing in adults (or weaken serial processing). Older children show a similar pattern as adults, but the effects do not reach significance. Younger children show no evidence for serial processing for any of the stimuli or conditions.

### Hypothesis 3: Inversion May Affect 2nd Order Processing More than Featural Processing

Hypothesis 3 states that inversion may affect 2nd order processing more than featural processing. Given that 2nd order upright faces were the only stimulus that invoked serial processing (in adults) and serial processing was weaker with inversion (the significant Similarity × Orientation interaction in adults), this hypothesis would be supported based on analyses above. However, a Similarity (sim0–sim3) × Orientation (upright, inverted) × Processing Type (featural, 2nd order) ANOVA was conducted separately in adults to directly compare similarity functions for different processing types and orientations. Although 2nd order faces appear to be more difficult, the similarity functions overlap on the RT scale. Therefore, a covariate was not used in this analysis. The Similarity × Orientation × Processing Type interaction was significant, *F*(3,162) = 3.6, *p* = 0.022. As shown in **Figures 6** and **7**, the similarity functions for featural faces have similar slopes for upright versus inverted faces, but the similarity functions for 2nd order faces are different, with serial processing being weaker with inversion.

### Discussion

The present study examined perceptual matching performance in children and adults to further characterize developmental changes in processing facial information. The experiments manipulated many important factors that have been examined in prior studies of face development, including featural versus 2nd order processing, inversion and category, but the novel contribution was considering how these factors impact serial versus parallel processing which is a marker of the degree to which holistic processing is engaged. In general, several prior findings were replicated but new insights into the development of face processing also emerged. Each hypothesis is discussed in turn.

### Findings for Hypothesis 1: Adults Show Stronger Face Inversion Effects than Younger Children

If adults and older children represent faces holistically, they should exhibit stronger face inversion effects (collapsed over similarity) than younger children. This hypothesis was somewhat supported. Face inversion effects were observed only with respect to errors and not logRT. For featural stimuli, adults and older children showed marginally significant face inversion effects but younger children did not. For 2nd order configural stimuli, adults showed a significant face inversion effect and older and younger children showed marginally significant face inversion effects. In summary, adults were the only group to show a statistically significant face inversion effect, and only in the 2nd order condition. The marginal inversion effects in children are not surprising given the many findings that younger children show lessened face inversion effects. In addition, inversion effects appeared to be more pronounced for 2nd order faces as reported by many others (see Rossion, 2009 for review). However, 2nd order faces were indeed more difficult to differentiate, as the upright conditions for featural and 2nd order faces did not appear to be equated. Given this, we suggest that the question of whether inversion differentially affects featural and 2nd order processing be answered in the context of similarity functions and parallel versus serial processing (see Hypothesis 3).

### Findings for Hypothesis 2a: Adults and Older Children Show More Evidence for Serial Processing than Younger Children

If adults and older children represent faces holistically, they should engage a serial comparison process as a function of similarity of the face pairs. Younger children should show more evidence for parallel processing, driven by more analytical processing. This hypothesis was largely confirmed. Similarity functions for younger children showed evidence for parallel processing whereas similarity functions for adults and older children showed evidence for serial processing, most strongly in the 2nd order condition. Similarity functions were different in adults and children for face stimuli in the featural condition and for upright stimuli in the 2nd order condition. Older children showed similar patterns as adults and significant simple effects of similarity for upright 2nd order faces, but the linear trend was not significant. In fact, the similarity effect was linearly increasing (indicating serial processing) only for adults, in the 2nd order and featural upright face conditions.

These findings suggest that older children show more adult-like processing of 2nd order information than featural information, with a holistic representation of faces that is more strongly linked to 2nd order information. Featural information is not as strongly integrated into a holistic representation in older children because they invoked a more "immature" strategy of parallel processing for featural faces. It seems, then, that (a) younger children show the weakest evidence for holistic representations, (b) older children show some evidence for holistic representations, but those representations incorporate 2nd order relations more than featural representations, and (c) adults show the strongest evidence for holistic representations that incorporate both 2nd order relations and, to some extent, featural information.

In some sense, these findings appear to be at odds with the conclusion from many studies that featural processing of faces develops sooner than 2nd order processing (see Mondloch et al., 2010 for review). We suggest that this apparent discrepancy likely reflects a transitional phase in older children in which the holistic representation includes both featural and 2nd order information but the degree to which that information is integrated is weaker compared to adults. Therefore, featural information is somewhat more accessible for analytical processing in older children, but at the same time, the 2nd order information is less accessible.

### Findings for Hypothesis 2b: Serial Processing was Weaker for Houses and Inverted Faces for Adults

When serial processing is present for upright faces (indicating holistic representations), houses and inverted faces will induce a bias toward parallel processing or weakening of serial processing. This hypothesis was confirmed only for adults. Older children showed patterns consistent with adults, but these patterns were not statistically significant. Younger children process upright faces in an analytical manner so neither inversion nor house stimuli could induce more parallel processing. These findings are indeed in line with some of the earliest studies showing that piecemeal or analytical processing of faces is predominant in young children and less so in adults (Carey and Diamond, 1977; Schwarzer, 2000). To our knowledge, however, this has not been demonstrated using a serial versus parallel processing framework. This finding is also consistent with attenuated face inversion effects in younger children both in the present study and the literature (Carey and Diamond, 1994; Schwarzer, 2000; Brace et al., 2001; Joseph et al., 2006; Meinhardt-Injac et al., 2014). Younger children process faces in a similar analytical manner as non-faces (Schwarzer, 2002); therefore, inversion has less effect on performance because inversion does not disrupt piecemeal processing.

### Findings for Hypothesis 3: Inversion Affects 2nd Order Processing More than Featural Processing

If inversion affects only 2nd order configural face processing, then inverted faces will show a bias toward parallel processing only in this condition and not in the featural condition. This hypothesis was examined to address the debate as to whether inversion affects 2nd order processing more than featural processing (Rossion, 2009) or whether inversion affects both kinds of processing equally (Riesenhuber and Wolff, 2009). The present findings are more consistent with the suggestion by Rossion (2009) that inversion affects 2nd order processing more. This was evident in the different slopes for similarity functions for 2nd order faces, but parallel slopes for featural faces, as a function of inversion. However, the finding that inversion induces a change in the intercept for featural faces (**Figure 6**) while preserving the slope is consistent with suggestions that inversion does not invoke qualitatively different processing (Riesenhuber et al., 2004; Sekuler et al., 2004), at least for featural faces. On the other hand, for 2nd order faces, inversion does not change the intercept but does change (i.e., weaken) the slope of the similarity function, indicating a shift away from serial to parallel processing. As noted by both sides of this debate, many of the findings depend on a range of different factors from defining what constitutes "features" or "face components" to different task demands. While the present findings do not resolve this debate, they do outline some conditions under which inversion induces a baseline shift in performance (featural information) versus inducing qualitatively different processing (2nd order information).

### Limitations of the Present Study

One alternative explanation for the finding of flat similarity functions in younger children is that this represents a ceiling effect such that the task was so difficult that younger children needed to take a maximal amount of time to make correct perceptual decisions. However, if ceiling effects (and flat similarity functions) reflected difficulty with the task then children should show ceiling effects for conditions that adults also found very difficult. In particular, adults showed the slowest responding on "same" trials for any given condition and these responses were even slower than the difficult sim3 condition. In contrast, younger children show "same" responses that are on par with the sim3 condition or even faster than the sim3 condition for 2nd order faces. This suggests that a different strategy is driving the similarity functions in younger children and adults, rather than a ceiling effect. Namely, because children are able to process features in parallel, they need not engage a serial exhaustive comparison process and can process the features simultaneously. Adults, in comparison, show evidence for a serial processing (and possibly serial exhaustive) strategy on "same" trials because RT is greater than or equal to RT in the sim3 condition.
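The contrast between the two strategies in the argument above can be made concrete with a toy model. Everything in this sketch is an assumption made for illustration: the number of features, the timing constants, and the mapping of similarity onto the number of differing features are hypothetical, and the parallel model is the simplest unlimited-capacity version. It is not the model or the stimulus design used in the study.

```python
from fractions import Fraction

N_FEATURES = 4           # hypothetical number of comparable features per stimulus
BASE_MS = 400            # hypothetical time for encoding and responding
PER_COMPARISON_MS = 150  # hypothetical time per feature comparison


def expected_serial_comparisons(n_features, n_different):
    """Expected comparisons for a serial search over randomly ordered features.

    With at least one differing feature the search stops at the first
    difference found (self-terminating); the expected position of that
    first difference is (n + 1) / (d + 1). With no differing features
    ("same" trials) every feature must be checked (exhaustive).
    """
    if n_different == 0:
        return Fraction(n_features)
    return Fraction(n_features + 1, n_different + 1)


def serial_rt(n_different):
    """Predicted mean RT (ms) under the serial model."""
    comparisons = expected_serial_comparisons(N_FEATURES, n_different)
    return BASE_MS + PER_COMPARISON_MS * comparisons


def parallel_rt(n_different):
    """All features compared at once: RT does not depend on similarity."""
    return BASE_MS + PER_COMPARISON_MS


# Fewer differing features means higher similarity; 0 is a "same" trial.
for n_different in (3, 2, 1, 0):
    print(n_different, float(serial_rt(n_different)), float(parallel_rt(n_different)))
```

Under these assumptions the serial model predicts RT that rises with similarity and is slowest on "same" trials, whereas the parallel model predicts a flat function with "same" responses no slower than the most similar "different" pairs. These are the two qualitative patterns the argument attributes to adults and younger children, respectively.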

Another potential limitation of the study was the relatively small sample size, especially for the older children group. There is potentially greater heterogeneity in this age range (10–11 years) if perceptual processes engaged for faces are transitioning from a more immature pattern to a more adult-like pattern. Although we attempted to maximize sample size by including a second dataset, the potentially greater heterogeneity in this age range would best be addressed with a larger sample. In this case, some of the adult-like patterns observed for older children may turn out to be significant.

### Summary and Conclusion

Using the conceptual framework of serial versus parallel processing as in other cognitive domains like selective attention and short-term memory scanning, the present study showed that holistic processing of faces matures during childhood. Younger children more often engaged parallel processing of individual face components and spacing relations than older children and adults. In contrast, adults more often engaged serial processing, which is an index of holistic perception of faces. Older children showed a transitional pattern: their similarity functions often resembled those of adults, but effects did not always emerge as significant. We suggest that the findings in older children are driven by heterogeneity in performance across subjects precisely because they are in a transitional stage. Some older children exhibit adult-like holistic processing whereas other older children still exhibit a more immature analytical or piecemeal processing approach.

Holistic processing of upright faces in adults was reduced by inversion, primarily for 2nd order faces. This finding maps onto the suggestion that inversion has a more pronounced effect on 2nd order (spacing) information processing than on featural processing (Rossion, 2009). We suggest that this more pronounced effect is driven by a shift from holistic to more analytical processing with inversion. However, inversion induced a baseline shift in processing featural faces, suggesting that the same process is engaged for upright and inverted featural processing (Riesenhuber and Wolff, 2009).

Development of face processing involves maturation of perceptual processes related to integrating featural and 2nd order information into a unified, holistic representation. Younger children had weak holistic representations given that they engaged parallel processing of individual face features and relations in all experimental conditions. Older children most often resembled adults, showing some evidence for holistic representations that integrated 2nd order information. These findings map onto prior research findings but also point toward future and continued investigations of the circumstances that drive the use of 2nd order and featural information for a given face task.

### Acknowledgments

We thank Christine Corbly, Grace Baik and Faraday Davies for their help with data collection and Serena-Kaye Kinley-Cooper, Davy Vanderweyen and Ghislain St-Yves for their help with manuscript preparation. Funding for this research was provided by the National Institutes of Health (R01 HD052724, R01 HD042451).

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01165

### References

children. *J. Exp. Child Psychol.* 107, 195–206. doi: 10.1016/j.jecp.2010.05.008


Tanaka, J. W., and Farah, M. J. (2003). "The holistic representation of faces," in *Perception of Faces, Objects, and Scenes: Analytic and Holistic Processes*, eds M. A. Peterson and G. Rhodes (New York, NY: Oxford University Press), 53–74.


Thompson, P. (1980). Margaret Thatcher: a new illusion. *Perception* 9, 483–484. doi: 10.1068/p090483


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Joseph, DiBartolo and Bhatt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The sensitivity to replacement and displacement of the eyes region in early adolescence, young and later adulthood

Bozana Meinhardt-Injac\*, Malte Persike, Margarete Imhof and Günter Meinhardt

*Department of Psychology, Johannes Gutenberg University Mainz, Mainz, Germany*

#### Edited by:

*Laurence T. Maloney, Stanford University, USA*

#### Reviewed by:

*Benjamin J. Balas, North Dakota State University, USA Rocco Palumbo, Schepens Eye Research Institute, Harvard Medical School, USA*

#### \*Correspondence:

*Bozana Meinhardt-Injac, Section for Developmental and Educational Psychology, Department of Psychology, Johannes Gutenberg University Mainz, Wallstr. 3, 55099 Mainz, Germany meinharb@uni-mainz.de*

#### Specialty section:

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

Received: *06 April 2015* Accepted: *24 July 2015* Published: *11 August 2015*

#### Citation:

*Meinhardt-Injac B, Persike M, Imhof M and Meinhardt G (2015) The sensitivity to replacement and displacement of the eyes region in early adolescence, young and later adulthood. Front. Psychol. 6:1164. doi: 10.3389/fpsyg.2015.01164*

Recent evidence suggests a rather gradual developmental trajectory for processing vertical relational face information, lasting well into late adolescence (de Heering and Schlitz, 2008). Results from another recent study (Tanaka et al., 2014) indicate that children and young adolescents use a smaller spatial integration field for faces than do adults, which particularly affects assessment of long-range vertical relations. Here we studied sensitivity to replacement of eyes and eyebrows (F), variation of inter-eye distance (H), and eye height (V) in young adolescents (11–12 years), young (21–25 years), and middle-age adults (51–62 years). In order to provide a baseline for potential age effects the sensitivity to all three types of face manipulations was calibrated to equal levels for the young adults group. Both the young adolescents and the middle-age adults showed substantially lower sensitivity compared to young adults, but only the young adolescents had selective impairment for V relational changes. Their inversion effects were at similar levels for all types of face manipulations, while in both adult groups the inversion effects for V were considerably stronger than for H or F changes. These results suggest that young adolescents use a limited spatial integration field for faces, and have not reached a mature state in processing vertical configural cues. The H–V asymmetry of inversion effects found for both adult groups indicates that adults integrate across the whole face when they view upright stimuli. However, the notably lower sensitivity of middle-age adults for all types of face manipulations, which was accompanied by a strong general "same" bias, suggests early age-related decline in attending to cues for facial difference.

Keywords: development, aging, face perception, configural processing, inversion effect, response bias

## 1. Introduction

The ability to perceive and to recognize faces develops continuously across the life-span, rising steeply in infancy and childhood, reaching its highest level in adulthood, and declining with age (Germine et al., 2011). Face perception is regarded as a special domain of ability, because there is no other object category with a comparable degree of part integration (Maurer et al., 2002). However, the high degree of interdependence among face parts is bound to the upright orientation. Turning faces upside down, or even rotating them, disrupts part integration, and sets up a part-wise access to facial features (Thompson, 1980; Tanaka and Farah, 1993; Tanaka and Sengco, 1997; Rossion and Boremanse, 2008). Further, face inversion dramatically affects the ability to judge spatial relations among facial features (Diamond and Carey, 1986; Barton et al., 2001). Some years ago Goffaux and Rossion found an asymmetry in the inversion effects for horizontal relational and vertical relational manipulations of the eyes region (Goffaux and Rossion, 2007). Manipulating vertical relations (changing eye height by moving the eyes and eyebrows region, V) produced large inversion effects, while manipulating horizontal relations (changing eye distance, H) produced small effects of inversion, which were of the same order of magnitude as those for featural changes (replacement of eyes, F). Sekunova and Barton (2008) proposed and validated a plausible explanation for this asymmetry. Judging eye distance is possible with just a pair of eyes, and without the embedding facial context. Hence, a local analysis of the highly salient eyes region is sufficient to judge eye distance. Eye height, in contrast, cannot be judged with a pair of eyes alone, but needs embedding context. Judging eye height gains precision if long-range spatial relations to multiple face regions (forehead, mouth, nose) are simultaneously taken into account. If inversion narrows the attentional window toward mostly the highly salient eyes region, local relational analysis should be maintained, but distal relational analysis should be affected. Hence, an asymmetry of H and V inversion effects should result. The authors obtained empirical support for their conjecture by testing the effects of moving eye but not eyebrow height. Doing so adds a valid local eye-height cue (eye–eyebrow distance), which can be handled in a small attentional window centered around the eyes. Indeed, this manipulation yielded small inversion effects for eye height (V), of the same order of magnitude as found for eye distance (H).

There is further evidence that the asymmetry in the inversion effects for H and V relational manipulations of the eyes region reflects that local and global configural information are analyzed in parallel by distinct routines when upright faces are viewed. Meinhardt-Injac et al. (2011) found that the timing prerequisites for H and V inversion effects are also quite different. Inversion effects for V appear already at brief timings, starting with the first 50 ms, while H inversion effects emerge later, needing exposure durations of at least 200 ms. Studying the influence of spatial scale, Goffaux (2008) found that H manipulations were detected best with high-pass filtered images above 32 cycles per face width (cpfw), while sensitivity to V manipulations was best in bandpass filtered images maintaining the optimal spectrum for faces in the range of 8–32 cpfw. These results indicate that mechanisms sensitive to H manipulations analyse on smaller spatial scales and have sustained temporal characteristics, while mechanisms sensitive to V manipulations analyse on larger spatial scales and respond instantly. Studying the interaction among mouth and eyes region with a context congruency paradigm, Goffaux (2009) found that the contextual interaction among these distal face regions was much stronger in the low spatial frequency range below 8 cpfw than in the high spatial frequency range beyond 32 cpfw. Inversion canceled the contextual interaction among both face regions. These results substantiate that the long-range interaction among face parts is critically bound to the upright orientation.

The parallel integration of diagnostic cues from short-range and long-range relations is a remarkable capability of mature adult face vision (Smith et al., 2005). In a recent developmental study by de Heering and Schlitz (2012) the developmental trajectory of the sensitivity for vertical relational image manipulations of the eyes and mouth region was studied in upright faces. The authors observed a gradual, steady increase in the ability to detect changes in eye height (eyes and eyebrows) from 6 to 16 years of age, while detection of different mouth–nose distances remained at low performance levels, improving at a marginal rate across age. These results suggest that judging vertical relations undergoes protracted development, and still does not reach adult levels during adolescence. Tanaka et al. (2014) studied the sensitivity to size changes of mouth and eyes, as well as to relational changes in both regions [eye distance (H) and nose–mouth distance (V)], in the age range of 7–12 years, and compared it to adult performance. They found that accuracy for children and young adolescents was remarkably worse than for adults for both featural and relational manipulations. Sensitivity for manipulations of the eyes region did not improve from 7 to 12 years, while sensitivity to manipulations of the mouth region smoothly increased, but at low absolute levels for mouth–nose distance (V). Both studies indicate that efficient use of vertical relational cues in less salient regions of the face is not at mature levels for adolescent observers. The studies add to the findings which support protracted development of encoding relational face properties, which is a key characteristic of the mature "expert" face system (Maurer et al., 2002; Mondloch et al., 2004). Currently, there is no study which has addressed whether the typical asymmetry in the inversion effects for H and V is found at younger ages. Because this asymmetry is highly diagnostic for short-range and long-range cue usage in faces (see above), a study which fills this gap is needed.

While the H–V asymmetry of inversion effects has not yet been addressed in childhood or adolescence, it has been studied at mature ages (Chaby et al., 2011). The authors used the same relational image manipulations as Goffaux and Rossion (2007), and obtained a large inversion effect for V and a small one for H with older subjects in the age range of 60–80 years (mean age 69.9 years). However, the authors found that variation of eye distance (H) for upright stimuli was detected by the elderly with an accuracy near chance. Therefore, the inversion effect for H was limited by a floor effect in the baseline, precluding a valid comparison of H and V inversion effects at mature ages.

In the present study the effects of featural and relational image manipulations of the eyes region were investigated with upright and inverted faces for young adolescents (11–13 years), young adults (21–25 years), and middle-age observers (51–62 years). The age groups were selected such that no ceiling or floor effects could be expected. Further, the image manipulations for all three change types were calibrated to yield equal performance for upright faces in the young adults group. This guaranteed an equal baseline for judging inversion effects across change types, as well as a standard for age-related effects. Doing so, we aimed at revealing relevant clues to the developmental state of young adolescents in handling short- and long-range spatial relations, as well as age-related decline in these abilities.

### 2. Materials and Methods

### 2.1. Experimental Outline

The sensitivity to featural, horizontal relational, and vertical relational image manipulations of the eyes region was measured by having subjects perform a same/different forced choice task on a sequence of two face images with equal duration. Same and different pairs were equally likely, and stimulus pairs with F, H, and V manipulations were presented in randomly interleaved trials. Further, the stimulus orientation was upright or upside down, in random alternation. Same pairs were also constructed from two manipulated stimuli in order to preclude that the deviation from the anthropometric face normal could be used as a cue to the difference of face pairs. Catch trials with manipulation of the mouth region were included to keep the observers from artificially narrowing their attentional focus to just the eyes region. Accuracy and response bias were analyzed using the signal detection paradigm.

### 2.2. Participants

This study was conducted with participants from three age groups: young adolescents (11–12 years), young adults (21–25 years), and middle-age adults (51–62 years). No subject had prior psychophysical experience. Young adolescents (N = 20, 10 female, mean age = 11.7 years) were sixth-grade students of a German grammar school. In the young adults group there were 25 participants (13 female, mean age = 23.3 years). All were students at the Johannes Gutenberg University Mainz, but were not students of psychology. The group of middle-age adults consisted of 20 participants (15 female, mean age = 55.5 years). All participants had normal or corrected-to-normal vision. Furthermore, there were no known psychological conditions present in the participants. Prior to the study, all potential participants, and in case of the young adolescents, also their parents, were informed about the general study aims, the experimental testing approach, and the kind of judgements which were required from them. Written consent for participation was received from all participants (or their parents in case of the young adolescents). All participants took part on a voluntary basis and were not paid for their participation.

### 2.3. Stimuli and Calibration for Equal Salience of F, H, and V Manipulations

Photographs of 16 Swiss male adults (mean age = 24.6 years, age span = 20–29 years), taken under controlled lighting conditions in a professional photo studio, were used for stimulus construction. The photographs were carefully selected from a larger database with the constraint that manipulation of the eyes and eyebrow region should be possible without evoking the impression of strong facial "oddity" in single stimulus instances. The photographs were converted to 300 × 400 pixel grayscale images and equalized in contrast. Image manipulations were done using Adobe Photoshop. For featural manipulations the eyes/eyebrows region of a face was replaced with the corresponding region of another face, assuring that no additional position or size cues were introduced by the replacement. In pilot experimentation with five student aids prior to the main experiment, exchange pairs were found that yielded about 90% correct same/different judgements. Various values were probed for the horizontal and vertical shift of the eyes/eyebrows region. Finally, moving the eyes and eyebrows 20 pixels apart (H) and 14 pixels upward (V) was found to yield the same proportion of correct judgements as the featural manipulations. These values were used in the main experiment. Stimulus examples of the three change types are given in **Figure 1**<sup>1</sup>.

### 2.4. Design

The experiment had a 3 (Change Type) × 2 (Orientation) × 3 (Age) factorial design. The same/different matching task comprised 16 same and 16 different trials in each condition. Each of the 16 face instances was presented with each of the 3 change types once as a same and once as a different pair. We added 24 catch trials where the mouth was replaced, or moved horizontally, or vertically. Combined with trial-by-trial acoustical feedback, catch trials were used to preclude that only the eyes region was attended. Each subject completed 216 trials, which lasted about 20 min.
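The trial count stated above follows directly from the within-subject factors. The following is a minimal bookkeeping sketch; the constant and function names are illustrative and not taken from the experiment software.

```python
# Trial bookkeeping for the within-subject part of the design.
# All names are illustrative; only the numbers come from the text.
CHANGE_TYPES = ("F", "H", "V")
ORIENTATIONS = ("upright", "inverted")
SAME_PER_CONDITION = 16
DIFFERENT_PER_CONDITION = 16
CATCH_TRIALS = 24


def total_trials():
    """Return the number of trials one subject completes."""
    conditions = len(CHANGE_TYPES) * len(ORIENTATIONS)
    main_trials = conditions * (SAME_PER_CONDITION + DIFFERENT_PER_CONDITION)
    return main_trials + CATCH_TRIALS


print(total_trials())  # 216
```

Six conditions with 32 trials each give 192 main trials; adding the 24 catch trials yields the 216 trials per subject.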

### 2.5. Apparatus

The experiment was executed with Inquisit 2.0 runtime units. Patterns were displayed on NEC Spectra View 2090 TFT displays in 1280 × 1024 resolution at a refresh rate of 60 Hz. Screen background was the same light gray as the face image background. The room was darkened so that the ambient illumination approximately matched the illumination on the screen. Viewing was binocular at a distance of 70 cm. Stimulus patterns subtended 12 × 15 cm of the screen. Subjects used a distance marker but no chin rest. They gave responses via the left and the right button of the computer mouse. The assignment of answers (same/different) to the left or right mouse button was counterbalanced across participants. Trial-by-trial acoustical feedback about correctness was given via light headphones. Non-annoying sounds were used: a "tack"-tone indicated a correct response, and a "tacktack"-tone signaled an error.
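The stimulus size is given in centimeters on the screen rather than in degrees of visual angle. As a rough conversion, assuming the nominal viewing distance of 70 cm was held (no chin rest was used, so this is only approximate), the standard formula 2·atan(size / (2·distance)) can be applied. The function name below is illustrative.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle in degrees of an extent viewed centrally at a distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Stimulus extent of 12 x 15 cm at the nominal 70 cm viewing distance.
width_deg = visual_angle_deg(12, 70)
height_deg = visual_angle_deg(15, 70)
print(round(width_deg, 1), round(height_deg, 1))  # 9.8 12.2
```

Under these assumptions the face images covered roughly 9.8° × 12.2° of visual angle.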

### 2.6. Procedure

The temporal order of events in a trial sequence was: fixation mark (300 ms)-blank (100 ms)-first stimulus frame (633 ms)-mask (350 ms)-blank (200 ms)-second stimulus frame (633 ms)-mask (350 ms)-blank frame until response. Different trials were formed by pairing an original face with a manipulated face, with the assignment of a stimulus to the first or the second place in the trial sequence chosen at random. Same trials were formed by pairing two original faces or two manipulated faces, each alternative with equal likelihood<sup>2</sup>. Masking of the stimulus frames was done with spatial noise patterns with a grain resolution of 3 pixels. The presentation positions of each of the two face images were shifted by 20 pixels away from the center in random direction in order to preclude focusing on the same image parts. Pairs with manipulations according to either change type were presented randomly interleaved. Faces were presented upright or upside down, in random alternation.

<sup>1</sup>We decided to use just male face models, because the attractiveness of the female faces was particularly affected by changing metric proportions. There is evidence that the range of acceptable facial proportions is narrower in attractive faces (Grammer and Thornhill, 1994; Green et al., 2008). Various studies have substantiated an own-age bias in face recognition (see Rhodes and Anastasi, 2012, for an overview). However, there are only a few studies on the cross-age effect for face perception tasks. In line with an expertise account of face perception, de Heering and Rossion (2014) observed that young adults had a (slightly) larger composite face effect for faces of their own age, as compared to child faces. The effect was not found with preschool teachers. Wiese et al. (2013) studied a potential own-age bias with the composite face task in young (mean age 22.4 years) and older (mean age 67.8 years) adults. They found that both age groups had a small but significant performance advantage with young adult faces. The behavioral effect corresponded to slightly larger N170 amplitudes. Hence, no evidence for a stimulus age × participant age interaction was found. In view of these results we conclude that there is currently no indication for a strong own-age bias in the ability to apply holistic and/or configural viewing strategies to faces when purely perceptual tasks without long-term memory load are used.

The setting for the duration of the stimulus presentations (633 ms) was found in pilot experimentation prior to the main experiment. Five student aids, two young adolescents and two middle-age adults were tested with various exposure durations, ranging from 300 to 1200 ms. The students reached saturating performance for timings beyond 400 ms. Accuracy of the two middle-age subjects and the two young adolescents did not further improve for values beyond 633 ms. We therefore decided to select this value for the exposure duration of the face stimuli in the main experiment.
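The value of 633 ms corresponds closely to a whole number of refresh cycles on the 60 Hz displays. The sketch below converts the reported event durations to frame counts and sums the fixed part of a trial. That durations were realized as whole frames is an assumption here; the text does not state how the presentation software scheduled them, and all names are illustrative.

```python
REFRESH_HZ = 60  # refresh rate reported for the displays

# Fixed events of one trial (label, duration in ms), in temporal order.
EVENTS_MS = [
    ("fixation", 300),
    ("blank", 100),
    ("stimulus 1", 633),
    ("mask", 350),
    ("blank", 200),
    ("stimulus 2", 633),
    ("mask", 350),
]


def frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of refresh cycles for a duration (assumed rounding)."""
    return round(duration_ms * refresh_hz / 1000)


fixed_duration_ms = sum(ms for _, ms in EVENTS_MS)
for label, ms in EVENTS_MS:
    print(f"{label:>10}: {ms:4d} ms = {frames(ms):2d} frames")
print("fixed part of a trial:", fixed_duration_ms, "ms")
```

Under this assumption a stimulus frame lasts 38 refresh cycles (about 633.3 ms), and the fixed part of a trial before the response interval sums to 2566 ms.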

The young adolescents were introduced to the experiment in greater detail. An outline of this study was presented to all grade six pupils at a German grammar school. The investigator explained the general study outline and presented examples of the stimuli on an overhead projector. She clarified in detail with different face image examples why stimulus pairs were the same or different. Each participant received an additional individual explanation prior to the experiment. Here, four faces on a piece of paper with the same face template but in all four conformations (original, F, H, and V) were shown. To ensure that the face manipulations were understood, the participants were asked to point out why the faces were different. Subsequently they were invited to start with the first test trial on the computer, using the computer mouse for the responses as in the later experiment. After a short introduction by the experimenter the subjects initiated each probe trial on their own. After that, each participant completed 36 probe trials in order to ensure that the instruction was understood and could be put into practice. For young and middle-age adults the same individual explanation procedure was used to ensure that subjects unfamiliar with psychophysical tasks were equally well instructed. Specifically, all participants were informed that two instances of the same basic face would appear in a sequence, either identical or slightly differing in the inner part of the face. Participants were also informed that occasional changes in the mouth region could occur, which should not be overlooked. Participants were told to give any answer if they were uncertain about the right alternative, but to try to be as correct as possible.

<sup>2</sup>We also paired two manipulated faces in same trials to preclude that perception of facial "oddity" (i.e., deviation from the anthropometric average) could be used as a cue to the difference of face pairs. This was used as a means of having the observers compare just the perceptual impressions of the faces, without referring to what is experienced as "normal." We do not use quantitative normative descriptions of facial anthropometry (Farkas, 1994), since at least the V manipulations would require referring to a variety of index measures which change in close correspondence to a variation of eye height (forehead length, nose length, eyes-to-nose distance, etc.). Potentially relevant contextual cues for judging eye height are discussed in this article.

### 2.7. Performance Measures

Accuracy was measured in terms of the proportion of correct judgments for each response alternative and then transformed to d′ using standard formulae, i.e., d′ = z(Hit) − z(FA) for the sensitivity measure and c = −½ [z(Hit) + z(FA)] for the response criterion on the standard axis, scaled such that 0 referred to no response preference, negative values indicated a "same" bias and positive values a "different" bias. Since the "same" response category is commonly defined as the target category in the recent face perception literature (e.g., Richler et al., 2011) we complied with this standard. Accordingly, hit-rate (Hit) was defined as the rate of correctly identifying "same" trials and correct rejection rate (CR) was defined as the rate of correctly identifying "different" trials. False alarm rate (FA) and the rate of misses (Miss) were defined as being the complementary rates to CR and Hit, respectively.
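The two measures defined above can be computed from a hit rate and a false alarm rate as in the following minimal sketch. It uses the inverse standard normal CDF from the Python standard library; the function name and the example rates are hypothetical and not data from this study. How rates of exactly 0 or 1 were handled is not described in the text, so no correction is applied here.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Return (d_prime, criterion) with "same" as the target category.

    hit_rate: proportion of "same" responses on same trials
    fa_rate:  proportion of "same" responses on different trials
    Rates must lie strictly between 0 and 1; no correction for
    extreme rates is applied in this sketch.
    """
    z = NormalDist().inv_cdf  # z-transform (inverse standard normal CDF)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical observer: 90% hits, 20% false alarms.
d_prime, criterion = sdt_measures(0.90, 0.20)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

For this hypothetical observer the criterion is negative, which on the scale defined above indicates a preference for "same" responses.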

### 2.8. Data Analysis

Both the d′ measure and the response criterion c were analyzed with ANOVA, having age group (Age) as grouping factor and change type (Change Type) and orientation (Orientation) as repeated measurement factors. To reveal effects of Change Type in the sensitivity measure, separate ANOVAs per age group were run, since Change Type was calibrated for performance equivalence in the younger adults group, and this might underestimate the true variance of the Change Type factor in the main effect and its interactions with Orientation and Age.

### 3. Results

### 3.1. Sensitivity Measure

**Figure 2** shows the average d′ scores for the three age groups and the three change types. The data for the young adults reflect equal performance for upright faces with F, H, and V manipulations at a level of 86% correct, which corresponded to a d′ score of 2.36. This value was slightly below the target calibration value of 90% correct, which was reached by a subgroup of experienced observers of the same age in pilot experimentation. Overall ANOVA confirmed that there were substantial main effects of Age, Change Type, and Orientation (see **Table 1**). As indicated by significant interactions, the effect of Orientation was modulated by Age and by Change Type, while the interaction among all three factors was marginally significant. The interaction of Change Type and Age was not significant. The assumption of normality was checked for the ANOVA data by analyzing normality of the within-cell residuals with the q–q plot correlation technique (Filliben, 1975). This test showed fairly good agreement of d′ residuals with the assumption of normality (see Appendix).
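The q–q plot correlation technique mentioned above can be sketched as follows: the sorted residuals are correlated with normal quantiles evaluated at Filliben's approximation to the order statistic medians, and a coefficient close to 1 indicates agreement with normality. This is a generic sketch of the technique, not the authors' analysis code; the function name and the residual values are hypothetical, and judging significance would additionally require tabulated critical values, which are not reproduced here.

```python
import math
from statistics import NormalDist, mean


def filliben_ppcc(residuals):
    """Probability plot correlation coefficient for normality (n >= 3)."""
    x = sorted(residuals)
    n = len(x)
    # Filliben's approximation to the uniform order statistic medians.
    m = [0.0] * n
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    for i in range(2, n):
        m[i - 1] = (i - 0.3175) / (n + 0.365)
    # Theoretical normal quantiles at those medians.
    q = [NormalDist().inv_cdf(p) for p in m]
    # Pearson correlation between ordered data and theoretical quantiles.
    mx, mq = mean(x), mean(q)
    sxy = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxy / math.sqrt(sxx * sqq)


# Hypothetical within-cell residuals of d' scores.
example_residuals = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.4, 0.6, 0.9]
print(round(filliben_ppcc(example_residuals), 3))
```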

Post-hoc testing with Fisher LSD tests showed that younger adults outperformed young adolescents and middle-age adults both with upright and inverted stimuli (all p < 0.001). With either orientation, performance was not significantly different among young adolescents and middle-age adults [p = 0.435 (upright), p = 0.129 (inverted)].

To further explore the marginally significant interaction among all three factors, separate ANOVAs were run for each age group. The results are shown in **Tables 2**–**4**. In the young adolescents group there were strong main effects of Change Type and Orientation, but no interaction among both factors. LSD post-hoc tests showed that performance in V was worse than in H (p < 0.001) and F (p < 0.002), while performance in H and F was at equal levels (p = 0.635). For both young and middle-age adults there were strong main effects of Change Type and Orientation and a strong interaction among both factors. LSD post-hoc tests indicated worse performance in V compared to H (young adults: p < 0.001; middle-age adults: p < 0.005) and F (young adults: p < 0.004; middle-age adults: p < 0.001), and no difference in performance between H and F (young adults: p < 0.43; middle-age adults: p = 0.533). Note that, for young adults, these differences just reflected the change type effects for inverted stimuli.

*The table shows source of variation, sum of squares (SS), degrees of freedom (df), variance estimate (σ̂²), F-ratio (F), significance level (p), and partial eta-squared (η²<sub>p</sub>).*

TABLE 2 | ANOVA results for the same/different matching accuracy for faces (d′ measure) in the young adolescents group.

*Conventions as in* Table 1*.*

TABLE 3 | ANOVA results for the same/different matching accuracy for faces (d′ measure) in the young adults group.

*Conventions as in* Table 1*.*

TABLE 4 | ANOVA results for the same/different matching accuracy for faces (d′ measure) in the middle-age adults group.


*Conventions as in* Table 1*.*

#### 3.2. Inversion Effects

The overall ANOVA indicated that inversion effects were strongly modulated by age. The specific age dependency of the inversion effects is best reflected in the separate ANOVAs for each age group (see **Tables 2**–**4**). For both young and middle-age adults the inversion effect was strongly modulated by Change Type (see **Tables 3**, **4**) while, for young adolescents, the inversion effect was independent of Change Type (see **Table 2**).

To better illustrate the effects of face inversion we calculated IEs at the level of individual data, and showed the results as Box–Whisker plots (**Figure 3**). The difference data were also fed into ANOVA in order to allow for post-hoc comparisons across conditions and age groups<sup>3</sup>. These analyses substantiated that both adult groups showed practically the same pattern of IEs, while young adolescents showed a different pattern.
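An individual-level IE can be sketched as the difference between a subject's d′ for upright and for inverted faces within one change type. The sketch below also summarizes the IEs by mean and standard error and tests the mean against zero with a one-sample t statistic, using the general identity F(1, n − 1) = t². Whether single-group IE tests in this study were computed in exactly this way is an assumption; the function names and the d′ values are hypothetical.

```python
import math
from statistics import mean, stdev


def inversion_effects(d_upright, d_inverted):
    """Per-subject inversion effect: d' upright minus d' inverted."""
    return [u - i for u, i in zip(d_upright, d_inverted)]


def summarize_ie(ies):
    """Mean, standard error, and one-sample test of the mean IE against 0."""
    n = len(ies)
    m = mean(ies)
    se = stdev(ies) / math.sqrt(n)
    t = m / se
    # For a single mean tested against 0, F(1, n - 1) equals t squared.
    return {"mean": m, "se": se, "t": t, "F": t * t, "df": (1, n - 1)}


# Hypothetical d' scores of four subjects in one change type.
d_upright = [2.0, 2.5, 3.0, 2.2]
d_inverted = [1.0, 2.0, 1.5, 1.7]
summary = summarize_ie(inversion_effects(d_upright, d_inverted))
print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in summary.items()})
```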

As expected from the nonsignificant Change Type × Orientation interaction for young adolescents, LSD post-hoc tests showed that inversion effects were at about the same levels for F, H, and V (F vs. H: p = 0.787; F vs. V: p = 0.674; H vs. V: p = 0.489). In contrast, for both adult groups the inversion effect was strongest for vertically manipulated faces (young adults: V vs. F and V vs. H both p < 0.001; middle-age adults: V vs. F and V vs. H both p < 0.03). The IEs for F tended to be larger than the IEs for H, but with just marginal significance for young adults (p = 0.079) and failing statistical significance (p = 0.323) for middle-age adults.

LSD post-hoc comparisons across age showed that the IE of young adults in V was significantly larger than any other IE (p < 0.005 for the test against the IE in V for middle-age adults and p < 0.001 for any other pairwise comparison). For F and H young adolescents and young adults reached IEs at comparable levels (all p > 0.25). Evaluating confidence intervals (see **Figure 3**) showed that, for middle-age adults, the IEs for F and H were moderate, failing significance for H [F(1, 19) = 1.22, p = 0.284] and reaching just marginal significance for F [F(1, 19) = 3.66, p = 0.071]. However, post-hoc comparison against the corresponding IEs for young adolescents gave nonsignificant results (F: p = 0.397; H: p = 0.138). Comparing against the IEs of young adults revealed a significantly larger IE of young adults in F (p < 0.04) but not in H (p = 0.140). This might reflect the limited testing power of post-hoc testing, particularly when a difference measure is used (DeGutis et al., 2013).

<sup>3</sup>Note that the main effects and interactions of the IE difference data are already included in the overall ANOVA.

### 3.3. Response Bias

Analysing the response criterion c as the indicator of response bias (see **Figure 4**) revealed significant differences between young adolescents, young and middle-age adults [Age: F(2, 62) = 11.28, p < 0.001]. Middle-age adults showed a strong general "same"-bias, while young adults and young adolescents did not [young adolescents: F(1, 19) = 0.97, p = 0.336; young adults: F(1, 24) = 1.82, p = 0.190; middle-age adults: F(1, 19) = 46.46, p < 0.001]. Stimulus orientation also modulated the subjects' response preferences [F(1, 62) = 10.14, p < 0.003], since inverted faces more often elicited "different" responses than did upright faces. This response pattern was most pronounced in young adults, while, in the other two age groups, this tendency was negligible [Age × Orientation: F(2, 62) = 3.16, p < 0.05; LSD post-hoc: p = 0.216 (young adolescents), p < 0.001 (young adults), p = 0.659 (middle-age adults)]. In all three age groups the preference for "same" responses increased in the order F, H, V [Change Type: F(2, 124) = 18.05, p < 0.001; Change Type × Age: F(4, 124) = 0.57, p = 0.687]. Analysis of the catch trials showed rather low percentage correct in each of the three age groups: 58.4% (young adolescents), 69.2% (young adults), 62.7% (middle-age adults). This indicates that the mouth region was not in the active window of spatial attention, although the catch trials were meant to alert the observers to attend to the lower face part as well.

FIGURE 3 | [...] The inner box represents the mean IE, the outer box the standard error, and the whiskers indicate the 95% confidence limits of the mean IE. Note that an IE is significant if 0 (dashed black horizontal line) is outside the confidence interval.

### 4. Discussion

We revisited the inversion effect for F, H, and V face image manipulations in the eyes region for young adults, and compared it with young adolescents and middle-age observers. The sensitivity for detecting changes according to the three change types was calibrated to an equal level (d′ = 2.36) in the young adults group. Both the young adolescents and the middle-age adults showed a sensitivity that was about 1 d′ unit lower. For young adolescents the decline was strongest for V relational manipulations. Inversion effects for young adults showed the typical H–V asymmetry, with strongest IEs for V and moderate ones for H, while IEs for F were at intermediate levels. This exactly replicated previous results (Meinhardt-Injac et al., 2011). For middle-age adults nearly the same IE results were found, but with a constantly smaller IE magnitude. Inversion effects for young adolescents, however, did not show the H–V asymmetry, and were at equal levels for F, H, and V. In the following, the findings are discussed for each age group. Finally, we give an outlook to current constraints for inversion effect measurement across age.

### 4.1. Young Adolescents Show Lowered Sensitivity to Vertical Relational Changes in the Eyes Region

The generally lowered sensitivity level of more than 1 d′ unit indicates that young adolescents are still far away from adult levels in their ability to judge featural and relational face image manipulations of the eyes region. Our results correspond to findings of Tanaka et al. (2014), who also found generally lowered sensitivity to featural and relational changes at younger ages up to 12 years. In our study sensitivity to changes in eye height (V) was particularly lowered, while sensitivity to eye distance (H) was higher, at the same level as for replacement of eyes and eyebrows (F). Mondloch et al. (2002) mixed manipulations of eye height and eye distance ("configural") and compared them to replacement of eyes and mouth ("featural"). They found that sensitivity to featural manipulations improved faster with age than sensitivity to configural manipulations. Disentangling H and V relational changes shows that eye distance and featural changes in the eyes region are detected equally well by young adolescents at 10–12 years of age (this study; Tanaka et al., 2014). Both de Heering and Schlitz (2008) and Tanaka et al. (2014) found that vertical relational manipulations in the mouth region were detected with relatively poor sensitivity in the age range of 7–12 years. Featural and relational manipulations of the eyes region were found to be detected much better. However, in both studies the locus of change (mouth region, eyes region) and the type of relational change (H, V) were not orthogonally varied, which makes it difficult to judge whether protracted development concerns V type relational changes, compared to H relational changes, or the mouth region compared to the eyes region. Because Tanaka et al. (2014) observed that the sensitivity to featural changes in the mouth region was as high as the sensitivity to H relational changes in the eyes region, one might conclude that sensitivity to V relational changes develops more slowly (de Heering and Schlitz, 2008).

### 4.2. No Asymmetry of the Inversion Effect for H and V Relational Changes in the Eyes Region for Young Adolescents

Adults show a pronounced IE asymmetry for H and V relational manipulations of the eyes region (Goffaux and Rossion, 2007; Sekunova and Barton, 2008; Meinhardt-Injac et al., 2011). We found that young adolescents do not show this typical asymmetry, but exhibit equal inversion effects for F, H, and V. According to Sekunova and Barton (2008) eye distance can be judged without further reference to distal contextual cues, while judging eye height necessarily relies on reference to other facial features and should therefore improve by integrating relational cues across the whole face. If inversion narrows the spatial window of cue integration to a region centered around the eyes, judgement of eye height should suffer more than judgement of eye distance. The sensitivity to changes in non-salient, distal face regions should also strongly decline. The "spatial narrowing" hypothesis of inversion is supported by findings which show that the IE for non-salient face regions is generally large, but declines substantially if the observers are cued to the region of interest, or a blocking design is used, or observers are given enough time to scan the face stimulus part by part (Barton et al., 2001; Sekunova and Barton, 2008). In line with this interpretation of the inversion effect, the observation of equal IEs for all change types indicates that the spatial window of cue integration of young adolescents is confined to a limited region centered around the eyes. Since judging eye height critically depends on cue integration from multiple face regions, the sensitivity of young adolescents to vertical relational changes is disproportionately lowered compared to adults, who integrate cues from the whole face in upright face vision. Inversion further shrinks the window of cue integration, but this should concern detection of F, H, and V changes to equal degrees if cue integration for upright stimuli is already confined to the eyes region.

Hence, both findings, the disproportionately lowered sensitivity to V changes in the eyes region and the lack of the asymmetry in the inversion effects for H and V relational changes support the conjecture of Tanaka et al. (2014) that the window of facial cue integration is centered to a confined region around the eyes during childhood, but widens during the course of development, ending in the ability of young adults to simultaneously integrate local and distal cues across the whole face.

### 4.3. The Effects of Featural Changes

The question whether there are distinct mechanisms tuned to "features" and "configurations" has raised serious disputes in the face processing literature (Riesenhuber et al., 2004; Rossion, 2008; Riesenhuber and Wolff, 2009). In a recent review of the magnitude of the inversion effect including 22 studies, McKone and Yovel (2009) reported that inversion effects for featural changes were small only when non-shape properties, such as color or brightness, were changed. For shape changes inversion effects were found to be in the same order of magnitude as for manipulations of feature spacing. Most critical for the size of the IE was involvement of facial context. These results confirm to us that a sound distinction of featural and configural processing is impossible (see also Discussion in Meinhardt-Injac et al., 2011). Shape changes do necessarily alter the relational description of a face stimulus, but what authors generally mean by "featural" changes are structural changes of features (scaling, replacement) and not changes of color, contrast or glare.

For both adult groups we found stronger inversion effects for replacement of the eyes region (F) than for manipulating eye distance (H). This indicates that replacement of eyes and eyebrows alters the relational description of face stimuli more strongly than moving eyes apart. Note that a change of eyes and eyebrows is usually accompanied by a change of personal identity, while moving eyes apart is not. The "featural" change in **Figure 1** is readily perceived for the upright face pair, but not for the inverted (upper row). The difference in eye distance is still salient for the upside-down pair (mid row), indicating the relative contextual independence of eye distance. The F change in the upright pair is salient because one sees two different persons, and not just two different pairs of eyes. The stronger inversion effect for F compared to H for adults results from holistic integration across the face, which suffers from shrinking the spatial focus due to inversion. Young adolescents do not show this effect but exhibit the same inversion effects for all three change types. This, again, corroborates that their spatial integration window is confined to the eyes region, while the area of integration spans the whole face in adults. Therefore, we conclude that the effective area of cue integration is a simple concept with potentially much higher explanatory power than the "featural-configural" dichotomy, which is not validated in terms of the inversion effect.

### 4.4. Sensitivity to F, H, and V Manipulations in Middle-age Adults

In a recent cross-sectional study Germine et al. (2011) found evidence for a late peak of face memory performance. Using the Cambridge Face Memory Test they found a performance peak at about 30 years, and continuous decline afterwards. Interestingly, face inversion effects showed an increase up to middle adult ages. In this study, middle-age adults performed at approximately the same level as young adolescents when comparing upright stimuli with featural and relational manipulations of the eyes region. This means that there is a remarkable age-related decline in this ability in the age range of 50–60 years. Chaby et al. (2011) compared the sensitivity to H and V manipulations of the eyes region among young adults and older participants (mean age 69.9 years). For young adults their results exactly correspond to our measurements, with a mean accuracy of slightly below 90% in upright presentation, a very large IE for V changes and a moderate one for H changes. For older adults, they obtained about 75% correct in upright presentations for V, which again corresponds to our results, but chance performance for H. In our study middle-age adults were able to handle H changes with at least the same accuracy as V changes. A further difference to our results is that V changes were detected at chance level for inverted stimuli in both age groups in the Chaby study, while, here, performance was well above chance in all experimental conditions. Chaby et al. (2011) claimed that their finding of a large IE for V, which was comparable to the IE of young adults, indicated that configural processing along the important vertical face axis encompassing eyes and mouth region is maintained at mature ages.

Besides the puzzling inability to judge H relations, the conclusion that vertical relations are preserved at mature ages is not fully supported by the measurements of the Chaby study, since chance level performance with inverted stimuli in both age groups implies that the true size of the inversion effect is not revealed. It can therefore not be excluded that the true inversion effect of young adults is larger. Because V relational changes were realized by manipulating both eye height and mouth height, shrinking the window of facial cue integration by inversion can account for the strong IE in both age groups. The confined window would no longer encompass the mouth region, and half of the spacing difference would stay unnoticed in upside-down stimuli.

The decline in sensitivity of about 1 *d*′ unit for upright stimuli observed in this study speaks against the claim that a full and flexible use of long- and short-range relational cues is maintained at mature adult ages. Indeed, we found the typical asymmetry in the inversion effects for H and V changes, but all IEs were smaller compared to young adults. As for the young adults, the H–V asymmetry of the inversion effect suggests that middle-age adults also integrate relational cues across a large face area for upright stimuli and use a confined integration window for upside-down faces. However, middle-age subjects performed notably worse than young adults with upright stimuli, while the performance difference with inverted stimuli was considerably smaller (see **Figure 2**). This suggests that there is an age-related difference in the efficiency of using diagnostic cues, which are in principle available, since the cue integration window is wide. These results correspond to findings of Daniel and Bentin (2012), who found that adults at mature ages show a decline in applying configural information in gender categorization based on internal features, a task that relies heavily on appropriately using local-configural cues. Studying the interaction of external and internal features with a congruency paradigm, Meinhardt-Injac et al. (2014b) found the same degree of contextual interaction for young and elderly adults, indicating holistic integration across the whole face for upright stimuli in both age groups. Older adults, however, suffered from a loss of precision when handling internal features. Roudaia et al. (2008) studied contour integration performance and obtained results which suggest that aging is accompanied by a loss in elementary local grouping mechanisms.
While there is increasing evidence that the general holistic nature of face perception is maintained at mature ages (Konar et al., 2013; Meinhardt-Injac et al., 2014b), recent findings suggest that adapting viewing strategies aided by feedback, coping with increased attentional demand and flexible handling of diagnostic cues are affected by aging (Meinhardt-Injac et al., 2014a).

### 4.5. Response Bias Effects Across Age

Analysis of response bias revealed an interesting age effect. Middle-age adults were strongly biased toward "same" responses, while young adults and adolescents had no global response preference. A global bias toward "same" responses was also reported for older subjects in the age range of 65–78 years for the composite face task (Meinhardt-Injac et al., 2014a). This indicates that the most frequent error of older adults in face comparisons is overlooking the difference. This tendency might result from a failure to attend to the relevant diagnostic features, and a loss of detail precision (see above). The type of image manipulation also modulated response bias. Vertical relational judgements were accompanied by the strongest "same" bias across all ages. For V changes the response criterion c was consistently lower than for F and H changes for all age groups, and it was also below the expected value 0, which indicates that there was an absolute bias for "same" responses, and not only a relative tendency compared to H and F (see **Figure 4**). Hence, for V changes there was an age-independent tendency to overlook the difference in feature spacings. The general "same" bias for V changes is a further hint that the cues that mediate detection of the difference are not all in the active window of spatial attention. We added catch trials with changes in the mouth region in order to prevent subjects from attending only to the eyes region. The poor accuracy in the catch trials is a hint that subjects nonetheless mostly concentrated on the eyes region. This indicates that relational cues from the distal mouth region entered with only minor weight into the same/different judgement of two faces<sup>4</sup>.

### 4.6. Studying Sensitivity to Relational Changes Across Age

The results of this study suggest that the distinction of H and V configural changes is much more relevant for hypothetical developmental trajectories than the "featural" and "relational" dichotomy. Compared to horizontal relations, the ability to judge vertical relations seemingly suffers from a developmental delay, which is not yet compensated in early adolescence. However, comparing sensitivity to H and V relations is confounded with the effective size of the spatial cue integration window. At present, it is unclear whether the poorer performance of young adolescents in judging V relations is due to a smaller area of facial cue integration, or whether the processing route for vertical configural information (Goffaux et al., 2009) has not yet fully matured, or, likely, whether both reasons apply. The results for young adolescents obtained here suggest both a smaller cue integration field and a specific developmental delay for processing V relations. The results for middle-age subjects suggest a wide cue integration field, but a general sensitivity decline for configural cues. Further research should address ways to disentangle the two hypothetical sources of less efficient facial cue integration by applying techniques that allow selective estimation of the area of facial cue integration. The bubbles technique (Gosselin and Schyns, 2001) would offer one possible approach.

<sup>4</sup>Note that the "spatial window of cue integration" is not necessarily identical with the attended face region. If the eyes region is attended, and the mouth region is not, cues from the mouth are also perceived, but with less precision. Context congruency paradigms exploit that there is a perceptual interaction among attended and non-attended face parts (Goffaux, 2009; Meinhardt-Injac et al., 2010).

### Ethics Statement

This study was carried out in accordance with the Declaration of Helsinki. The experimental procedures were approved by the local ethics committee at the Johannes Gutenberg University Mainz. All subjects participated voluntarily and were informed that they were free to stop the experiment at any time without negative consequences. Written informed consent was obtained from all participants; in the case of children, consent was also obtained from the parents. The data were analyzed anonymously.

### Author Contributions

All authors contributed equally to the conceptualization of the study. BI and MI set up the basic design. MP conducted the experiments and data preparation. GM contributed data analysis and interpretation. All authors were involved in writing, preparation of the manuscript and final approval. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are investigated and resolved appropriately.

### Funding

This study was supported by the university research fund of Johannes Gutenberg University Mainz. Funding was granted to BM for project "Visual perception across the life-span."


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Meinhardt-Injac, Persike, Imhof and Meinhardt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### Appendix

### Testing ANOVA Cell-residuals for Normality

In order to check the assumption of normality for the ANOVA data, the within-cell residuals were standardized and the agreement of observed z-scores (z<sub>o</sub>) and the z-scores expected from the standard normal distribution (z<sub>e</sub>) was assessed via the q–q plot correlation technique (Filliben, 1975). **Figure A1** shows the q–q plots for the six experimental conditions, including the Pearson correlation coefficient, R, and the proportion of explained variance for the parameter-free function z<sub>o</sub> = z<sub>e</sub>, denoted as η<sup>2</sup>. Comparing the correlation coefficients to the critical correlation value for N = 75 observations, R<sub>crit,5%</sub> = 0.984 (see Johnson and Wichern, 2003, p. 182), shows that there was no violation of normality in any of the six experimental conditions.
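The q–q correlation check described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function name `qq_correlation` is hypothetical, and the plotting positions (i − 0.5)/n used for the expected quantiles are an assumption (Filliben's original proposal uses order-statistic medians instead). Only the critical value 0.984 for N = 75 is taken from the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

R_CRIT_5PCT_N75 = 0.984  # critical value quoted in the text for N = 75


def qq_correlation(residuals):
    """Pearson correlation between ordered standardized residuals (z_o)
    and the standard-normal quantiles expected at the same ranks (z_e)."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    z_obs = sorted((r - m) / s for r in residuals)
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    z_exp = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mo, me = mean(z_obs), mean(z_exp)
    cov = sum((o - mo) * (e - me) for o, e in zip(z_obs, z_exp))
    var_o = sum((o - mo) ** 2 for o in z_obs)
    var_e = sum((e - me) ** 2 for e in z_exp)
    return cov / sqrt(var_o * var_e)


def normality_not_rejected(residuals, r_crit=R_CRIT_5PCT_N75):
    """Normality is retained when the q-q correlation reaches r_crit."""
    return qq_correlation(residuals) >= r_crit
```

Because the critical value depends on the sample size, `r_crit` would need to be looked up again for any cell with a number of observations other than 75.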

# **Face processing in Williams syndrome is already atypical in infancy**

*Dean D'Souza <sup>1</sup> , Victoria Cole <sup>2</sup> , Emily K. Farran <sup>3</sup> , Janice H. Brown <sup>4</sup> , Kate Humphreys <sup>5</sup> , John Howard <sup>1</sup> , Maja Rodic <sup>6</sup> , Tessa M. Dekker <sup>7</sup> , Hana D'Souza <sup>6</sup> and Annette Karmiloff-Smith <sup>1</sup> \**

*<sup>1</sup> Department of Psychological Sciences, Centre for Brain and Cognitive Development, Birkbeck, University of London, London, UK, <sup>2</sup> Department of Biostatistics, Institute of Psychiatry, King's College London, London, UK, <sup>3</sup> Department of Psychology and Human Development, Institute of Education, University College London, London, UK, <sup>4</sup> Department of Psychology, London South Bank University, London, UK, <sup>5</sup> Institute of Child Health, University College London, London, UK, <sup>6</sup> Department of Psychology, Goldsmiths, University of London, London, UK, <sup>7</sup> Department of Visual Neuroscience, Institute of Ophthalmology, University College London, London, UK*

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg-Universität Mainz, Germany*

#### *Reviewed by:*

*Brian W. Haas, University of Georgia, USA Bianca Jovanovic, Justus Liebig University Giessen, Germany*

#### *\*Correspondence:*

*Annette Karmiloff-Smith, Department of Psychological Sciences, Centre for Brain and Cognitive Development, Birkbeck, University of London, 32 Torrington Square, London WC1E 7HX, UK a.karmiloff-smith@bbk.ac.uk*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 27 March 2015 Accepted: 22 May 2015 Published: 15 June 2015*

#### *Citation:*

*D'Souza D, Cole V, Farran EK, Brown JH, Humphreys K, Howard J, Rodic M, Dekker TM, D'Souza H and Karmiloff-Smith A (2015) Face processing in Williams syndrome is already atypical in infancy. Front. Psychol. 6:760. doi: 10.3389/fpsyg.2015.00760*

Face processing is a crucial socio-cognitive ability. Is it acquired progressively or does it constitute an innately-specified, face-processing module? The latter would be supported if some individuals with seriously impaired intelligence nonetheless showed intact face-processing abilities. Some theorists claim that Williams syndrome (WS) provides such evidence since, despite IQs in the 50s, adolescents/adults with WS score in the normal range on standardized face-processing tests. Others argue that atypical neural and cognitive processes underlie WS face-processing proficiencies. But what about infants with WS? Do they start with typical face-processing abilities, with atypicality developing later, or are atypicalities already evident in infancy? We used an infant familiarization/novelty design and compared infants with WS to typically developing controls as well as to a group of infants with Down syndrome matched on both mental and chronological age. Participants were familiarized with a schematic face, after which they saw a novel face in which either the features (eye shape) were changed or just the configuration of the original features. Configural changes were processed successfully by controls, but not by infants with WS who were only sensitive to featural changes and who showed syndrome-specific profiles different from infants with the other neurodevelopmental disorder. Our findings indicate that theorists can no longer use the case of WS to support claims that evolution has endowed the human brain with an independent face-processing module.

**Keywords: infancy, Williams syndrome, Down syndrome, face processing, featural, configural, nativism, progressive modularization**

## **Introduction**

Faces provide us with important social information. We use them to guide our actions and to engage in social behavior. They are also ubiquitous in the environment. It is therefore not surprising that faces acquire a special status among visual stimuli. For instance, face recognition is more disrupted by stimulus inversion than is object recognition (Yin, 1969). There also exist adult neuropsychological patients who lose the ability to recognize objects but not faces (Moscovitch et al., 1997; Duchaine and Nakayama, 2005), and *vice versa* (Bodamer, 1990; Kanwisher, 2000; Busigny et al., 2010). Moreover, functional neuroimaging studies have revealed a region of cerebral cortex—the fusiform face area (FFA)—that is significantly more activated for faces than for non-face stimuli such as assorted objects (Kanwisher et al., 1997; McCarthy et al., 1997), strings of letters (Puce et al., 1996), animals without heads (Kanwisher et al., 1999), or the backs of human heads (Tong et al., 2000). Also, faces start to acquire their special status from a very early age. For example, neonates track moving face-like stimuli farther than other visual patterns of comparable complexity, contrast, and spatial frequency (Goren et al., 1975; Johnson et al., 1991). This, along with evidence that face processing is localized in the adult brain, has led to claims in the literature that evolution has endowed the human brain with an independent, minimally interactive, face-processing module (Kanwisher, 2000, 2010).

Further claims for this nativist, modular perspective call on a rare genetic disorder, Williams syndrome (WS: for full genotypic/phenotypic details, see Farran and Karmiloff-Smith, 2012). Adolescents and adults with WS are seriously impaired in a range of domains (e.g., spatial cognition, number, and problem solving; Donnai and Karmiloff-Smith, 2000), have an average IQ of 56 (Mervis and Bertrand, 1997), and yet they perform within the normal range on standardized face-processing tests (Bellugi et al., 1988a; Deruelle et al., 2003; Tager-Flusberg et al., 2003). This has led to claims in the literature of an "intact," "spared," or "preserved" face-processing module in WS (Bellugi et al., 1988b, 1994; Wang et al., 1995). Nonetheless, the question of whether face processing is "normal" in this population or calls on atypical neuro-cognitive processes remains hotly debated (Mills et al., 2000; Deruelle et al., 2003; Tager-Flusberg et al., 2003; Karmiloff-Smith et al., 2004; D'Souza and Karmiloff-Smith, 2011).

Observational studies revealed that infants and young children with WS are fascinated with faces and spend more time looking at them than at objects (Mervis and Bertrand, 1997; Bellugi et al., 2000; Laing et al., 2002). Experimental studies also found that adolescents and adults with WS perform within (or near) the normal range on standardized face-processing tasks, such as the Benton Facial Recognition Test (Benton et al., 1983) and the Rivermead Face Memory Task (Wilson et al., 1986). But could there be different (i.e., atypical) neuro-cognitive underpinnings to their success on these tasks? For instance, rather than using normal configural processing, it is possible to recognize "faces" on the Benton test by detecting specific features within the face stimuli (e.g., a nose; Duchaine and Nakayama, 2004).

One of the classic claims in the literature about how face recognition is special and differs from object recognition is that the former relies on holistic or configural processing (Tanaka and Farah, 1993). "Holistic" processing occurs whenever a system processes the emergent features of stimuli—e.g., the overall gestalt of a face or, for instance, the area of a square, rather than the lines that make up the square (Piepers and Robbins, 2012). "Configural" information, by contrast, refers to the relationship between features and involves two levels of processing: first-order and second-order configural processing. Specifically, first-order configural information refers to the basic configuration of features (eyes above mouth), while second-order configural information refers to the brain's computation of precise variations in the spacing between these features (see Piepers and Robbins, 2012, for discussion). An important study by Deruelle et al. (1999; see also Rossen et al., 1995) found that, relative to controls, individuals with WS are better at processing the featural than the configural information of a face. Children and adults with WS (from 7 to 23 years of age) had to decide whether two pictures of faces, presented in upright and inverted conditions, were the "same" or "different." TD controls usually process upright faces configurally and inverted faces featurally (Young et al., 1987; Leder and Bruce, 2000), and are more successful at upright than inverted faces: known as the *face inversion effect* (Yin, 1969). By contrast, Deruelle et al. (1999) discovered that the participants with WS were less subject to the inversion effect than chronological age (CA)- and mental age (MA)-matched controls. The researchers proposed that individuals with WS have a bias to process featural over configural information, even when faces are upright. 
This dovetails with studies that show a similar pattern in other visuospatial domains in WS, leading to the claim that individuals with the disorder are "featural processors" (Pani et al., 1999; but see D'Souza et al., 2015).

However, in a later study, Deruelle et al. (2003) argued that face processing is "preserved" in individuals with WS. Children and adolescents (6–17 years) with WS were instructed to match faces to either a low- or high-spatial frequency filtered target face, with the hypothesis that low-spatial frequency filters would call upon holistic processing and high-spatial frequency filters would require configural processing. The participants with WS as well as the CA- and MA-matched controls all found it easier to process low spatially filtered faces than high ones, and did not differ significantly from each other. No effect of age was found either. It seems reasonable to conclude that all three groups had developed the ability to process faces holistically and were at ceiling (i.e., by 6 years of age).

Tager-Flusberg et al. (2003) also presented findings on face processing in WS. It was an important study because of its large sample size: 47 adolescents/adults with WS, 39 CA-matched controls. These participants were tested on a number of tasks, including the Benton and a part-whole paradigm (Tanaka and Farah, 1993). Tager-Flusberg et al. (2003) found that the surrounding face context had the same effect on individuals with WS as it did for CA-matched controls. The authors concluded that face processing is normal in WS.

However, Karmiloff-Smith et al. (2004) argued that some researchers were conflating two different concepts: "holistic" and "configural" processing, and that the findings of Deruelle, Tager-Flusberg, and others did indeed provide evidence of relatively proficient "holistic" processing in WS, but not of second-order "configural" processing which develops later in TD (Maurer et al., 2002; Mondloch et al., 2002; Liu et al., 2013). As mentioned above, first-order holistic processing occurs when a face is processed directly as a "gestalt" (i.e., fusion between different elements in an array, a low-level visual phenomenon). The debate, according to Karmiloff-Smith et al. (2004), is not whether individuals with WS can process a face as a gestalt (they can), but whether they make use of featural or precise configural information (or both). And herein lies the crux of the issue, the focus of our paper: Is featural and/or configural face processing atypical in WS?

Holistic and configural face processing are both involved in normal face recognition. But they develop at different times and at different rates. Holistic processing, pace Carey and Diamond (1977), develops early in infancy (i.e., from at least 3 months of age; Turati et al., 2010), whereas configural and featural processing develop later and at a much slower rate (Liu et al., 2013). Karmiloff-Smith et al. (2004) identified both delay and deviance in face processing in adolescents and adults with WS, specifically with the processing of configural information. In other words, they found that although face processing is relatively proficient in WS, it develops atypically. This has also been confirmed by neuroimaging and event-related potential (ERP) studies of anomalous brain activation in WS during face recognition (Mills et al., 2000; Grice et al., 2003; Mobbs et al., 2004), as well as by recent developmental studies which revealed atypical developmental trajectories of configural face processing in older children with the disorder (Annaz et al., 2009; Dimitriou et al., 2014). The FFA has also been found to be larger in WS than in TD controls, which may also reflect atypical face processing (Golarai et al., 2010). Whether the WS brain starts out with a large FFA, or whether its unusual volume emerges as a result of overly focused face processing in young children (Karmiloff-Smith et al., 2012), remains an open debate. Nonetheless, these are important findings, because they highlight atypicalities in face processing in this syndrome.

In sum, there is currently no consensus on whether face processing is typical in WS. Yet evidence that face processing in WS is *prima facie* typical has been used to support the claim that evolution has endowed the human infant brain with independently functioning modules dedicated to specific functions, e.g., face processing (Kanwisher et al., 1997). So, when individuals with WS present with much more serious deficits in some domains (e.g., visuo-spatial) than others (e.g., face processing), it is taken as evidence of "impaired" and "spared" modules in WS (see D'Souza and Karmiloff-Smith, 2011, for discussion). Individuals with WS should not be seen as having a normal brain with impaired and spared parts, but rather as having a brain that is *developing* differently (Karmiloff-Smith, 1997, 1998). We hypothesize that the ability to perceive a face may appear "intact" when using basic standardized tests, but that more sensitive measures will reveal that it develops atypically in WS. Face perception involves three different levels of processing: holistic, configural, and featural. Empirical studies (hitherto mainly of adolescents and adults) provide strong evidence, and a broad consensus, that holistic processing is relatively proficient in WS. By contrast, there is also behavioral and neural evidence from several labs (Mills et al., 2000; Deruelle et al., 2003; Grice et al., 2003; Karmiloff-Smith et al., 2004; Mobbs et al., 2004; Dimitriou et al., 2014) that configural processing may develop atypically in WS, and a possibility that featural processing is also atypical (Karmiloff-Smith, 1997). However, these processes have hitherto been examined only in older children, adolescents and adults with WS. But what about infants with WS? Do they start with similar face-processing abilities to typically developing (TD) infants, with atypicality developing later, or are the atypicalities already evident in infancy? This is an important question, because even if a general consensus does emerge that face processing is atypical in older children, adolescents, and adults with WS, we would still need to know: (1) whether there is an atypicality in featural (as well as configural) processing (see Karmiloff-Smith, 1997), (2) whether the atypicality is present early in infancy or the outcome of a protracted developmental process that has been operating under atypical constraints, and (3) whether infants with WS show the same configural processing impairment observed in adolescents and adults with WS, or whether they also show a different impairment (i.e., featural).

To answer these questions, the present study compared featural and configural processing in infants/toddlers with WS with MA-matched TD control infants/toddlers. We also included a group of infants/toddlers with Down syndrome (DS) for two reasons. First, it was important to ascertain whether the WS profile was syndrome specific or simply due to low IQ, so the two neurodevelopmental disorder groups were matched on both CA and MA. Second, DS was selected as a comparison group because there is some evidence in the literature that whereas individuals with WS show a processing bias to featural over configural information, the opposite pattern obtains for DS (Bihrle et al., 1989; Bellugi et al., 1999; but see D'Souza et al., 2015). In the current study, we presented infants with two faces, a familiar face and either a (novel) featurally-changed face or a (novel) configurally-changed one.

We hypothesized that the infants/toddlers with WS would discriminate between the familiar face and the (novel) featurally-changed face, but not between the familiar face and the (novel) configurally-changed face. We predicted that the opposite pattern would hold for DS, and that the TD controls (who usually process upright faces configurally) would display proficiency in both conditions, albeit with stronger effects in the configural condition.

### **Materials and Methods**

### **Participants**

A total of 92 infants were tested: 29 infants/toddlers with WS, 20 infants/toddlers with DS, and 43 TD controls. The children with WS or DS had been tested either for a microdeletion of the ELN gene via *fluorescence in situ hybridization* or for full trisomy 21. All participants were assessed using the Bayley Scales of Infant Development (Bayley, 1993). Data from a total of 24 infants/toddlers (eight WS, nine DS, seven TD) were excluded from the study due to fussiness or drowsiness. **Table 1** shows the mean CAs and MAs for the remaining 68 participants. The groups did not significantly differ on MA, *F*(2, 65) = 0.86, *p* = 0.429 (see Results).

**TABLE 1 | Mean (SD) chronological age (CA) and mental age (MA) for each group.**


### **Stimuli and Apparatus**

The stimuli were schematic faces: 7 cm (2.75 inch) yellow circles, with four black elements (two "eyes," one "nose," and one "mouth") on a black background. This basic schematic face (**Figure 1**) was used as the familiarization stimulus. Two featurally-modified and two configurally-modified versions of this basic stimulus were also created. The featural changes were made by replacing the round eyes with similarly sized squares or diamonds; the configural changes were made by stretching or squashing the features toward or away from the centre by 20 pixels (see **Figures 1** and **2**, for examples). We opted for schematic rather than real faces for several reasons. First, it had already been shown that infants' gaze behaviors to naturalistic faces do not differ from their behaviors to schematic faces (e.g., Farroni et al., 2005). Second, several studies have shown that the mechanisms involved in processing schematic faces are the same as those involved in processing naturalistic faces (see Johnson et al., 2015, for review). We therefore decided to use schematic faces because they are simpler to control and manipulate, can be presented in a very large format, with very strong color contrasts that capture and hold infants' attention.

The infants were seated on their parent's lap 60 cm (2 ft) away from a 97 cm × 56 cm (38 inch × 22 inch) monitor screen, in a dimly lit room with blank, off-white walls. The parents were instructed to look straight ahead and not at the stimuli, and to refrain from interacting with their child during the experiment. A video camera focused on the infant's face was mounted just under the monitor. The camera was connected to a VCR and monitor screen where the experimenter, who was hidden behind curtains, could watch the infant live. For coding purposes, the experimenter used a "picture-in-picture" tool that showed the display of the infant's screen in the corner of the experimenter's monitor screen. The coder could therefore simultaneously see the infant's face and the display that the infant was looking at.

### **Design and Procedure**

Participants were presented with eight test trials. Each test trial was preceded by four familiarization trials, except for the first test trial, which was preceded by eight familiarization trials to be sure of familiarizing the infants with the model face. The familiarization trials consisted of one yellow schematic face (the *familiarized face*) on a black background (**Figure 1**). The test trials consisted of two faces presented side by side: one familiarized face, and one novel face (see **Figures 1** and **2**, for examples). While the familiarized face remained unchanged, there were four novel faces: two *configurally-changed faces* (one with the features of the face "squashed," the other "stretched") and two *featurally-changed faces* (one with square eyes, one with diamond eyes). Each of these faces was presented once on the left-hand side of the screen, and once on the right. The order of the eight test trials was randomized once and subsequently fixed (in the following order: featural, configural, featural, configural, featural, configural, configural, featural), so every participant was presented with the same sequence of trials. The fixed order was presented using E-prime (Psychological Software Tools, Pittsburgh, PA, USA).

Before the start of each trial, a noisy visual distractor was used to attract the child's attention to the screen. The trial started once the child was looking at the screen. Each familiarization trial lasted 2 s; each test trial, 4 s. The entire experiment lasted no longer than 3 min. All experimental procedures were in accordance with the Declaration of Helsinki, and were approved by the Departmental Ethics Committee, Department of Psychological Sciences, Birkbeck, University of London.
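As a rough check on the timing, the trial sequence described above can be laid out programmatically. The sketch below is illustrative only (the study itself used E-prime); the function and constant names are hypothetical, while the trial counts, the fixed condition order, and the 2 s and 4 s durations are taken from the text.

```python
FAMILIARIZATION_S = 2  # duration of one familiarization trial (from the text)
TEST_S = 4             # duration of one test trial (from the text)

# Fixed order of the eight test trials, as reported in the text.
TEST_ORDER = [
    "featural", "configural", "featural", "configural",
    "featural", "configural", "configural", "featural",
]


def build_schedule(test_order, first_block=8, later_blocks=4):
    """Return a list of (trial_type, condition) tuples in presentation order.

    The first test trial follows `first_block` familiarization trials;
    every later test trial follows `later_blocks` familiarization trials.
    """
    schedule = []
    for index, condition in enumerate(test_order):
        n_familiarization = first_block if index == 0 else later_blocks
        schedule.extend([("familiarization", None)] * n_familiarization)
        schedule.append(("test", condition))
    return schedule


def stimulus_time(schedule):
    """Total stimulus presentation time in seconds, excluding distractors."""
    return sum(
        FAMILIARIZATION_S if kind == "familiarization" else TEST_S
        for kind, _ in schedule
    )


schedule = build_schedule(TEST_ORDER)
total_seconds = stimulus_time(schedule)  # 36 * 2 s + 8 * 4 s
```

Under these figures the schedule contains 36 familiarization and 8 test trials, that is, 104 s of stimulus presentation. The remainder of the reported 3 min maximum would then be taken up by the attention-getting distractor and the wait for the child to look at the screen, although the text does not break this down.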

Preferential looking times were coded frame-by-frame using SuperCoder 1.5 (Hollich, 2005). The coder was blind to the experimental hypothesis. A second experimenter coded 10% of the trials. Inter-rater reliability was very high (*r* = 0.96).

### **Results**

### **Chronological Age (CA) and Mental Age (MA) Matching**

The CA data in the Control group were non-normal [*Z*<sub>Skewness</sub> = 0.09, *D*(36) = 0.16, *p* = 0.019]. Because the Control data had a (continuous) uniform distribution, rather than transforming the data, a non-parametric test (Independent-Samples Kruskal–Wallis) was used. As expected, the distribution of CA was significantly different across the three groups, *H*(2) = 34.81, *p* < 0.001. However, pairwise comparisons revealed that the DS (*Mdn* = 30.00) and WS (*Mdn* = 27.10) groups did not significantly differ on CA, *U* = 3.03, χ<sup>2</sup> = 0.41, *p* = 1.000. CA in the TD control group (*Mdn* = 14.22) differed significantly from CA in both the DS and WS groups, *U* = 30.26, χ<sup>2</sup> = 4.44, *p* < 0.001, and *U* = 27.23, χ<sup>2</sup> = 5.02, *p* < 0.001, respectively. Because the DS and WS data were normally distributed (i.e., *Z*<sub>Skewness</sub> within ±2, Kolmogorov–Smirnov, *p* > 0.05), an independent *t*-test was also carried out. It confirmed that the two groups were not significantly different on CA, *t*(30) = 1.38, *p* = 0.178.

Mental age data were normally distributed. A one-way ANOVA showed that the groups did not significantly differ on MA, *F*<sub>2,65</sub> = 0.86, *p* = 0.429, showing that the atypical groups were well matched to one another and to the TD controls.

### **Proportion of Target Looking**

For each participant and each test trial, the proportion of target looking (PTL) was calculated. PTL is the total amount of time spent looking at the "target" stimulus (i.e., the novel face) as a proportion of the total amount of time spent looking at both the target and the "non-target" stimuli (i.e., the novel face + the familiar face). No data were excluded from these analyses, because none were three standard deviations greater or smaller than the group mean.
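The PTL definition and the three-standard-deviation exclusion rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, looking times are assumed to be in milliseconds, and the sample standard deviation is assumed for the exclusion rule.

```python
from statistics import mean, stdev


def proportion_target_looking(target_ms: float, nontarget_ms: float) -> float:
    """PTL = time on the novel (target) face / time on both faces."""
    total = target_ms + nontarget_ms
    if total <= 0:
        raise ValueError("no looking time recorded on either face")
    return target_ms / total


def within_three_sd(values: list[float]) -> list[bool]:
    """Return True for each value lying within 3 SD of the group mean.

    Assumption: the sample standard deviation is used; the text does not say
    whether the sample or population SD was applied.
    """
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [True] * len(values)
    return [abs(v - m) <= 3 * sd for v in values]


# A PTL of 0.5 corresponds to equal looking at the novel and familiar faces.
print(proportion_target_looking(3000, 1000))
```

A PTL above 0.5 indicates more looking at the novel face, and a PTL below 0.5 more looking at the familiarized face.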

The PTL data were normally distributed. There were no main effects of Group, *F*<sub>2,65</sub> = 1.13, *p* = 0.328, or Condition, *F*<sub>1,65</sub> = 0.003, *p* = 0.955. Nor was there an interaction effect, *F*<sub>2,65</sub> = 1.17, *p* = 0.317. **Figure 3** illustrates the data from the PTL analysis.

### **Longest Look Difference**

Proportion of target looking represents an infant's relative interest over the course of an entire trial/experiment. It is one of the most common measures used by infant researchers to investigate cognitive phenomena. However, it does lack sensitivity. For instance, a participant might look for longer at one face (e.g., the target) and then, after building up an internal representation of it, switch to the other face, simply out of interest, before the trial has ended. This would reduce the likelihood of detecting a difference in looking behavior between the two faces. We therefore used another common, but more sensitive, measure—namely, longest look difference (LLD)—over the first four and last four test trials. We would expect a difference in the first four test trials but not in the last four test trials.
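The contrast between the two measures can be made concrete with a small sketch. It assumes that LLD is the longest single look at the novel face minus the longest single look at the familiarized face (so that 0 corresponds to chance); the text does not spell out the formula, and the function name and the look durations below are hypothetical.

```python
def longest_look_difference(looks: list[tuple[str, float]]) -> float:
    """Longest single look at the novel face minus longest look at the familiar face.

    `looks` is a list of (stimulus, duration_ms) pairs, where stimulus is
    either "novel" or "familiar". Assumed definition, for illustration only.
    """
    longest = {"novel": 0.0, "familiar": 0.0}
    for stimulus, duration in looks:
        if stimulus not in longest:
            raise ValueError(f"unknown stimulus label: {stimulus}")
        longest[stimulus] = max(longest[stimulus], float(duration))
    return longest["novel"] - longest["familiar"]


# One long look at the novel face, then several shorter looks at the familiar face.
looks = [("novel", 1500), ("familiar", 600), ("familiar", 700), ("familiar", 500)]

novel_total = sum(d for s, d in looks if s == "novel")
ptl = novel_total / sum(d for _, d in looks)

print(round(ptl, 2))                   # below 0.5: PTL suggests no novelty preference
print(longest_look_difference(looks))  # positive: the longest look favors the novel face
```

In this invented trial the infant switches to the familiar face after one sustained look at the novel face, so PTL falls below 0.5 while LLD remains positive, which is the scenario the paragraph above describes.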

### Test Trials 1–4

The first four trials (featural left familiar right, configural left familiar right, familiar left featural right, familiar left configural right) were analyzed. As before, data that were three standard deviations greater or smaller than the group mean were excluded from the analyses on a trial basis (data only from two trials from 2 TD participants needed to be excluded). The LLD data were sufficiently normal.

#### *TD children*

As expected for the first four trials, one-sample *t*-tests indicated that longer looks to the *configurally*-changed face were greater than the chance level of 0 in the TD group, *t*(34) = 2.69, *p* = 0.011, *r* = 0.42. This is considered the normal way for TD participants to process faces. We would, however, also expect TD controls to notice featural changes, albeit less strongly, and indeed a trend emerged with respect to the featurally-changed face, but the analysis did not survive a Bonferroni correction (*p >* 0.05).
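The effect size *r* reported with the *t*-test can be recovered from the *t* statistic and its degrees of freedom. The sketch below assumes the conventional conversion r = sqrt(t² / (t² + df)); the text does not state which formula was used, and the function name is hypothetical.

```python
import math


def effect_size_r(t: float, df: int) -> float:
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(t * t / (t * t + df))


# TD group, configurally-changed face: t(34) = 2.69
print(round(effect_size_r(2.69, 34), 2))
```

Under this assumption, the TD result of *t*(34) = 2.69 yields the reported *r* = 0.42.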

#### *Williams syndrome*

As predicted, a one-sample *t*-test indicated that longer looks to the featurally-changed face were greater than chance in the WS group, *t*(20) = 2.09, *p* = 0.050, *r* = 0.46, but not to the configurally-changed face.

#### *Down syndrome*

Longer looks were not significantly greater than chance in the DS group (*p >* 0.05) for either the featurally-changed or the configurally-changed faces.

#### *Intergroup analyses*

A 3 *×* 2 mixed-design ANOVA with LLD<sub>First</sub> (featural, configural) as a within-subjects factor and Group (TD control, WS, DS) as a between-subjects factor revealed no main effect of LLD, *F*<sub>1,63</sub> = 0.25, *p* = 0.617, or Group, *F*<sub>2,63</sub> = 0.88, *p* = 0.420. In other words, LLD<sub>featural</sub> did not differ from LLD<sub>configural</sub>, and the three groups did not differ on "LLD." However, there was an interaction between LLD (featural, configural) and Group, *F*<sub>2,63</sub> = 5.30, *p* = 0.007, η<sub>p</sub><sup>2</sup> = 0.14 (**Figure 4**). *Post hoc t*-tests revealed a significant difference between the WS and TD control groups on LLD<sub>Featural</sub>, *t*(54) = 3.05, *p* = 0.004, *r* = 0.38. No other result survived the Bonferroni correction.
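The partial eta squared reported for the interaction can be checked from the *F* ratio and its degrees of freedom. The sketch below assumes the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>); the function name is hypothetical and the authors' own computation is not described in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    explained = f_value * df_effect
    return explained / (explained + df_error)


# LLD x Group interaction: F(2, 63) = 5.30
print(round(partial_eta_squared(5.30, 2, 63), 2))
```

With *F*(2, 63) = 5.30 this gives a value that rounds to the reported 0.14.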

#### *Order effects*

To investigate order effects, we examined whether LLD for the first presentation of the configurally-changed face was significantly different from LLD for the second presentation of the configurally-changed face. If there were order effects, then we would expect LLD to change between the first and second presentations. Yet no significant changes were detected in the TD control group, *t*(36) = 1.48, *p* = 0.149, WS group, *t*(20) = 1.45, *p* = 0.162, or DS group, *t*(10) = 0.39, *p* = 0.707.

We also compared LLD for the first presentation of the featurally-changed face with LLD for the second presentation of the featurally-changed face. Again, if there were order effects, then we would expect LLD to change between the first and second presentations. Yet no significant changes were detected in the TD control group, *t*(36) = 0.23, *p* = 0.820, WS group, *t*(20) = 0.11, *p* = 0.914, or DS group, *t*(10) = 0.90, *p* = 0.387.

### Test Trials 5–8

The last four trials were also analyzed. Data that were three standard deviations greater or smaller than the group mean were excluded from the analyses on a trial basis (data only from four trials from 1 TD, 1 WS, and 1 DS participants were excluded). The LLD data were sufficiently normal.

As predicted for the last four trials, neither LLD<sub>featural</sub> nor LLD<sub>configural</sub> differed from chance in any of the groups (all *p >* 0.05). A 3 *×* 2 mixed-design ANOVA with LLD<sub>Last</sub> (featural, configural) as a within-subjects factor and Group (TD control, WS, DS) as a between-subjects factor revealed no main effect of LLD, *F*<sub>1,62</sub> *<* 0.01, *p* = 0.968, or Group, *F*<sub>2,62</sub> = 0.93, *p* = 0.400; nor was there an interaction effect, *F*<sub>2,62</sub> = 0.42, *p* = 0.662.

### **Discussion**

Typical face identification entails processing (1) the *features* of a face, (2) the *configuration* of these features or precise variations in the spacing between these features, and (3) the face *holistically* (i.e., as a gestalt). The latter (i.e., holistic face processing) develops in the first months of life in typical development and is relatively proficient in individuals with WS. This has led to claims in the literature that WS (which is characterized by an uneven cognitive profile) presents a unique case of "impaired" and "spared" cognitive modules, with face processing being an example of a spared cognitive module. However, although holistic face processing is proficient in this population, there is evidence that featural and/or configural face processing may be atypical in older children, adolescents, and adults with this syndrome. Indeed, our study revealed that this is the case in infants/toddlers with WS. As predicted, the TD controls showed a significant discrimination between the familiarized and configurally-changed faces, and a weaker discrimination between familiarized and featurally-changed faces. By contrast, and in accordance with our hypothesis, we found that the infants/toddlers with WS failed to discriminate between the faces in the Configural condition, yet showed a novelty preference for the featurally-changed face. This suggests that infants, like older children and adults with WS, have atypical face processing strategies and use predominantly featural rather than configural information to process upright faces.

In other words, although individuals with WS can process faces, our data reveal that they use an atypical strategy to do so. This is an important finding because it means that theorists can no longer argue for the existence of an "intact," "spared," or "preserved" face-processing module in WS.

Could theorists argue that face processing is "spared" in infancy and any failure in older children and adults is merely the outcome of a common developmental process that is operating under different (atypical) constraints? This is unlikely, because the present study demonstrates that both featural and configural face processing atypicalities are already evident in infancy. Thus, our data suggest that face processing in WS is already atypical in infancy.

This is a novel finding. It was once thought that face processing was intact in WS. However, evidence has been mounting that one aspect of face processing (configural) develops atypically in older children, adolescents, and adults with WS. Furthermore, a preliminary study hinted that young adults with WS succeed on face recognition tasks by focusing on the features of a face (Karmiloff-Smith, 1997). By manipulating the features of face stimuli and the configuration of these features, our data are the first to confirm that featural face processing is indeed atypical in this population, and that both featural and configural atypicalities are present early in childhood and are thus not the outcome of a protracted developmental process.

As for the infants/toddlers with DS, although they tended to look longer at the novel face in both the Featural and Configural conditions, this difference did not reach significance. It is possible that the DS sample was too small to detect significant differences. It is also possible that the infants/toddlers with DS required a greater number of familiarization trials (than TD infants or those with WS) to detect changes in the stimuli. As far as the authors are aware, this is the first study to investigate face processing in such a young population of children with DS, so information on the required number of familiarization trials was not available. Nonetheless, the fact that discrimination is more challenging for infants with DS is a novel finding.

Although our TD infants demonstrated differential looking in the Configural condition (as expected), it is unclear why they showed a trend toward the familiarized face in the Featural condition and a significant bias toward the novel face in the Configural condition. Although children at this young age often demonstrate a bias toward familiarized faces (a *familiarity preference*), infant preferences can be driven by both novelty and familiarity (Fantz, 1964; Zajonc, 1968; Berlyne, 1970; Slater et al., 1983; Bornstein, 1989; Fang et al., 2007). It is possible that the TD infants found the Configural condition easier than the Featural condition; hence a novelty preference was elicited in the former but not in the latter.

Alternatively, it is possible that the TD controls showed a trending familiarity preference for the featurally-changed faces because only local details (the eyes) had changed. Variability in people's eyes is something with which they already have experience. By contrast, the novelty preference for configurally-changed faces may have arisen because "squashed" and "stretched" faces were extremely novel to them. We hypothesize that the configural changes were so unexpected that they held the TD infants' attention for longer than the changes to the shape of the eyes did. This hypothesis fits with theories from the face-processing literature: it has been hypothesized that the more discrepant a stimulus is from the observer's state of knowledge (i.e., from their internal template of face stimuli), the more novel it is to the observer and the more likely it is to elicit a novelty preference (Dember and Earl, 1957; Berlyne, 1960; McCall and McGhee, 1977). In other words, if something is completely new and unknown, it attracts a relatively high level of attention. This would explain why a novelty preference emerged in the Configural condition and a trending familiarity preference was demonstrated in the Featural condition in TD controls.

Whatever the mechanism turns out to be, it is important to note that the TD controls were more sensitive to the configural changes than the featural changes. Furthermore, when we compare the findings from the TD controls with those from the WS group, it suggests that infants/toddlers with WS not only fail to notice configural changes but also that they process featural information atypically. This is because, unlike the controls, the participants with WS showed a novelty preference to the featurally-changed face. In other words, both featural and configural processing of faces is atypical already in infancy in WS.

There are several potential limitations to the study, which will be tested in future research. As mentioned, in this infant study we opted for schematic rather than real faces for several important reasons. First, it has already been shown that infants' gaze behaviors to naturalistic faces do not differ from their behaviors to schematic faces (e.g., Farroni et al., 2005). For instance, Farroni et al. (2005) found that infants look longer at upright faces than at inverted faces, as a function of contrast polarity *irrespective of whether the face stimuli were schematic or naturalistic*. Second, several studies have shown that the mechanisms involved in processing schematic faces are the same as those involved in processing naturalistic faces (see Johnson et al., 2015, for review). Our choice of schematic faces allowed us to control and manipulate their size and color contrasts, to make them as attractive as possible to infant participants. Additionally, although familiarization paradigms are frequently used in infancy research, one might have preferred a habituation paradigm allowing each infant to find her/his own time to fully encode the model face. However, habituation studies are more prone to subject loss than familiarization studies, and we were dealing with a rare syndrome where subject loss is critical. Moreover, as mentioned, since we used the same familiarized face throughout, all infants had ample time to encode the model face. Thus we opted for a familiarization study because of the rarity of WS and the difficulty in recruiting sufficient numbers of young infants. To our knowledge, this is indeed the first study to examine face processing in neurodevelopmental disorders at such a young age. Yet, to address fundamental questions in psychological theorizing in general, and in face processing in particular, it is crucial to trace developmental trajectories back to their origins in infancy.

Although further research is necessary, our study provides the first evidence that face processing atypicalities are already present very early in the developmental trajectory of individuals with WS. In other words, despite showing subsequent proficiency on standardized face processing tasks, infants/toddlers with WS do not process faces like TD young children. We have also demonstrated that while face processing is atypical in another neurodevelopmental disorder, DS, the two syndromes differ in their strategies and thus the findings with WS cannot be simply explained by low intelligence. In particular, our study highlights the importance of tracing socio-cognitive deficits from very early in development. Finally, our findings indicate that theorists can no longer use the case of WS to support claims that evolution has endowed the human brain with an independent face-processing module.

### **Acknowledgments**

This work was funded for AKS by a Wellcome Trust Strategic Award (grant number: 098330/Z/12/Z) conferred upon The London Down Syndrome (LonDownS) Consortium, and also supported by the Williams Syndrome Foundation, the Down Syndrome Association, and Down Syndrome Education International. We should also like to thank the families who participated in our study, without whom none of this would have been possible.

### **References**

Berlyne, D. E. (1960). *Conflict, Arousal, and Curiosity*. New York, NY: McGraw-Hill.

adults. *Neurosci. Biobehav. Rev.* 50, 169–179. doi: 10.1016/j.neubiorev.2014.10.009


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 D'Souza, Cole, Farran, Brown, Humphreys, Howard, Rodic, Dekker, D'Souza and Karmiloff-Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Developmental Commonalities between Object and Face Recognition in Adolescence

Martin Jüttner<sup>1</sup>\*, Elley Wakui<sup>2</sup>, Dean Petters<sup>3</sup> and Jules Davidoff<sup>4</sup>\*

<sup>1</sup> Department of Psychology, School of Life and Health Sciences, Aston University, Birmingham, UK, <sup>2</sup> School of Psychology, University of East London, London, UK, <sup>3</sup> Department of Psychology, Birmingham City University, Birmingham, UK, <sup>4</sup> Department of Psychology, Goldsmiths, University of London, London, UK

#### Edited by:

Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany

#### Reviewed by:

Natasha Sigala, University of Sussex, UK; Charlie Frowd, University of Winchester, UK

#### \*Correspondence:

Martin Jüttner m.juttner@aston.ac.uk; Jules Davidoff j.davidoff@gold.ac.uk

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

Received: 30 March 2015 Accepted: 04 March 2016 Published: 15 March 2016

#### Citation:

Jüttner M, Wakui E, Petters D and Davidoff J (2016) Developmental Commonalities between Object and Face Recognition in Adolescence. Front. Psychol. 7:385. doi: 10.3389/fpsyg.2016.00385

In the visual perception literature, the recognition of faces has often been contrasted with that of non-face objects, in terms of differences with regard to the role of parts, part relations and holistic processing. However, recent evidence from developmental studies has begun to blur this sharp distinction. We review evidence for a protracted development of object recognition that is reminiscent of the well-documented slow maturation observed for faces. The prolonged development manifests itself in a retarded processing of metric part relations as opposed to that of individual parts and offers surprising parallels to developmental accounts of face recognition, even though the interpretation of the data is less clear with regard to holistic processing. We conclude that such results might indicate functional commonalities between the mechanisms underlying the recognition of faces and non-face objects, which are modulated by different task requirements in the two stimulus domains.

#### Keywords: development, object recognition, face recognition, categorical, metric, part, configural, holistic

In the visual perception literature, the recognition of faces has often been contrasted with that of non-face objects. While object recognition has been characterized as being part-based (e.g., Biederman, 1987) the processing of faces has been described as being more holistic (e.g., Farah, 1996; Farah et al., 1998). The precise meaning of 'holistic' is a matter of debate (e.g., Maurer et al., 2002; Piepers and Robbins, 2012) but in its most extreme form it implies a representation of faces as undifferentiated wholes, or templates, which distinctly differs from the part-based representation postulated for objects. Such an assumed dichotomy between face and object recognition based on the nature of their putative representations has been particularly prominent in an early model by Farah (1996). It proposes that object and face perception are functionally independent and only share a stage of early visual processing. More recent variants of this model (e.g., McKone and Yovel, 2009; Piepers and Robbins, 2012) acknowledge the potential contribution of parts to the recognition of both objects and faces but continue to confine configural and holistic processing to face-like stimuli. In this paper we will discuss recent evidence from developmental studies that question Farah's view by highlighting the role of configural and holistic processing in non-face object recognition. We will review findings of that work and compare them with corresponding results in the – far more extensively studied – domain of face perception.

## CONFIGURAL OBJECT RECOGNITION

Configural processing can be broadly defined as "any phenomenon that involves perceiving relations between the features of a stimulus" (Maurer et al., 2002, p. 255). In the context of object recognition it is therefore equivalent to the processing of the relations that hold between the parts or components constituting an object. The importance of part relations has been highlighted in Biederman's influential Recognition-by-components (RBC) model (Biederman, 1987, 2000). According to this model complex objects are encoded as spatial arrangements, or configurations, of basic parts that come from a restricted reservoir of elementary shapes, the so-called geons. Geons are defined by categorical contour properties (like 'straight' vs. 'curved'). Similarly, the spatial configuration of geons is encoded in terms of certain categorical relations (like 'larger' vs. 'smaller,' or 'on top of' vs. 'besides'). Furthermore, Biederman contrasts coarse shape differences in terms of categorical properties with more subtle ones arising from variations of continuous, or metric, attributes. Again such attributes can be either part-specific (example: a part's aspect ratio) or part-relational (example: the distance between two parts).

Numerous studies on object processing by children and infants have been inspired by the RBC model. Most have focussed on the status of individual parts. Here parts have been shown to receive particular attention in the analysis of shape similarity (e.g., Tversky and Hemenway, 1984; Saiki and Hummel, 1996) or when categorizing or matching objects (e.g., Madole and Cohen, 1995; Smith et al., 1996). Whether the early primacy of parts in visual processing reflects a peculiar status of geons has, however, remained more contentious (cf. Abecassis et al., 2001; but note Haaf et al., 2003).

Unlike for parts, until recently relatively few studies considered the processing of part relations within the RBC framework. Mash (2006) examined similarity judgements of novel objects consisting of two parts; one of these parts was manipulated in terms of its cross-section (i.e., at part-specific level) and its location relative to the second (i.e., at part-relational level). Young children were found to have a strong bias for classifying objects on the basis of part-specific information only. With increasing age, participants came to use both part-specific and part-relational information in their classification judgements.

Jüttner et al. (2013) asked children aged 7–16 years and adults to judge the correct appearance of familiar animals and artifacts that had been manipulated either in terms of individual parts (for example, by exchanging the head of an animal against that of another animal) or part relations (here: relative size; for example, by changing an animal's proportions). Both types of manipulation were always calibrated for equal difficulty in adult observers. When detecting part changes even the youngest children performed close to adult levels. By contrast, it was not until 11–12 years that they achieved similar levels of performance with regard to relative size changes, i.e., altered part relations. The developmental dissociation between part-relational and part-specific processing was the same for both types of stimuli thus generalizing similar observations by Davidoff and Roberson (2002). In a further experiment, Jüttner et al. (2013) demonstrated that this dissociation only applied to the recognition of metric changes, not to those at categorical level. They used a set of novel multi-part objects, which permitted precisely controlled manipulations of parts and part relations at categorical or metric level, as defined within the RBC framework. Participants were first trained to associate the novel objects with labels (here: numbers). As in the experiments involving animals and artifacts they then had to judge the correct appearance of these objects when manipulated at part-specific level or that of part relations (here again: the object's proportions, i.e., the relative size of its parts). For metric manipulations of an object's proportions, recognition accuracy showed a similarly protracted development as in the case of animals and artifacts. By contrast, no such retardation was observed in the case of categorical changes of an object's proportions.

Using similar stimuli (**Figure 1A**), Jüttner et al. (2014) generalized these findings to the attribute relative position, the second core relational attribute of RBC. Again, even the youngest tested children performed similarly to adults when recognizing categorical changes of individual parts and relative part position (**Figures 1B,C**). By contrast, performance for detecting metric changes of relative part position was distinctly reduced in young children compared to recognizing metric changes of individual parts (**Figures 1D,E**). A similarly late maturation for the processing of metric positional information has been observed in the context of other work involving faces and objects (e.g., Mondloch et al., 2002, 2004; Jüttner et al., 2006; Mash, 2006; Robbins et al., 2011). It has been proposed that the retardation might reflect late developing general perceptual mechanisms (e.g., Crookes and McKone, 2009; Robbins et al., 2011). However, as demonstrated by Davidoff and Roberson (2002) in control experiments involving a paired-comparison task, children's inability to use part-relations for object recognition cannot be attributed to reduced perceptual discrimination skills. Thus, the reduced sensitivity to metric part-relational information appears to reflect a fundamental limitation concerning the way objects are represented in the memory of the developing mind.

The problems young children have with the detection of subtle positional changes of object parts are reminiscent of the well-documented difficulty they have when assessing spatial relations of facial features. Here it has been shown that children's sensitivity to detect manipulations of the distances between cardinal features (like eyes, nose and mouth) continues to improve until at least 14 years (Carey et al., 1980; Mondloch et al., 2002; de Heering and Schiltz, 2013). Such processing of spatial relations – also known as second-order processing – can be contrasted with the coarse assessment of the basic spatial layout of facial features – their so-called first-order relations. The sensitivity to the latter develops much earlier and may already be present in newborns (e.g., Goren et al., 1975; Johnson et al., 1991). On this basis it is tempting to draw a parallel between the developmental dissociation for first- and second-order relational processing of facial features on the one hand, and categorical and metric part-relational processing for non-face objects on the other. We will return to this possibility in the final section of our review.

two distracters (middle and bottom row). The distracters differed from the target in terms of either a categorical part change (left) or a categorical, configural change of relative part position (right). Participants had to choose the correct depiction of the previously learnt object. (C) Mean recognition accuracies as a function of age for the categorical part change and categorical configural change condition. (D) As in (B) but examples show multi-part objects used to compare recognition performance for manipulations of parts and part relations at metric level. (E) Mean recognition accuracies as a function of age for the metric part change and metric configural change condition. Error bars are standard errors. The dashed line at 0.33 indicates chance level.

## HOLISTIC OBJECT RECOGNITION

Image-based models of object recognition have been proposed in various forms (e.g., Ullman, 1989; Tarr and Bülthoff, 1995; Riesenhuber and Poggio, 1999). However, they generally assume "holistic" object representations that are "all-in-one" or view-like, and where object features are represented in a quasi-pictorial, two-dimensional coordinate system. While image-based accounts originally were presented as an alternative to structural, part-based approaches, later evidence from behavioral (e.g., Hummel, 2001; Hayward, 2003; Thoma et al., 2004) and neuroimaging (e.g., Vuilleumier et al., 2002; Thoma and Henson, 2011) studies suggests that structural and image-based representations might co-exist in the visual system. This idea has been most comprehensively formulated in Hummel's (2001) dual-route model. It proposes that objects are processed in two different formats – analytic and holistic – that are combined into a hybrid representation in long-term memory. The analytic pathway involves explicit structural descriptions, employing a dynamic, attention-driven binding mechanism that operates on an object's parts and their relations – similar to Biederman's RBC model. By contrast, the holistic pathway is view-like and involves a static, attention-independent binding of an object's local shape features via their relative location in a so-called surface-map. The surface map preserves topological relations of these features, resulting in a template-like, holistic representation.

So far, research assessing the relative extent to which the holistic and analytic route contribute to object recognition in children has been scarce. In the one known study, Wakui et al. (2013) tested holistic and analytic recognition performance for everyday objects in 7- to 12-year-old children and adults. They used a repetition priming paradigm that involved two briefly presented prime stimuli: one attended and the other ignored. Priming was assessed in terms of the facilitation for naming a subsequently presented probe stimulus. According to the dual-route model, holistic priming should in principle be observed both for the attended and the ignored prime stimulus. However, given the view-like object representation used by the holistic route the priming should critically depend on the pictorial identity of prime and probe. By contrast, analytic priming should result only from the attended prime stimulus. Due to the more abstract object format implied by the analytic route, such priming should tolerate image differences between probe and prime as long as those permit at least a partial matching of the underlying structural representations. In Wakui et al.'s (2013) study, adults showed both holistic and analytic priming, in accordance with previous work (e.g., Stankiewicz et al., 1998; Thoma et al., 2004, 2007). By contrast, the data for children only demonstrated analytic but no holistic priming, suggesting a developmental primacy for part-based over holistic object recognition.
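The qualitative predictions of the dual-route account described above can be summarized as a small truth table. The sketch below is only a schematic encoding of the verbal predictions, not an implementation of Hummel's (2001) model; all names, including the `holistic_route_available` switch used to mimic the pattern reported for children, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimingPrediction:
    holistic: bool
    analytic: bool


def predict_priming(
    attended: bool,
    same_image: bool,
    structural_match: bool = True,
    holistic_route_available: bool = True,
) -> PrimingPrediction:
    """Encode the verbal predictions for one prime-probe pair.

    Holistic priming: independent of attention, but requires that prime and
    probe are pictorially identical.
    Analytic priming: requires an attended prime, and tolerates image
    differences as long as the structural descriptions still match in part.
    """
    holistic = holistic_route_available and same_image
    analytic = attended and structural_match
    return PrimingPrediction(holistic=holistic, analytic=analytic)


# Adults, ignored prime, identical image: holistic priming only.
print(predict_priming(attended=False, same_image=True))

# Pattern reported for children: analytic priming without holistic priming.
print(predict_priming(attended=True, same_image=True, holistic_route_available=False))
```

Setting `holistic_route_available=False` is merely a convenient way to reproduce the children's pattern in the table; it makes no claim about why holistic priming was absent.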

A few other studies have assessed children's ability for holistic object perception by employing paradigms more typically used to test holistic processing of faces. Cassia et al. (2009) compared composite effects for faces and non-face objects in 3- to 5-year-old children and adults. The study involved a matching task between composites constructed from the top and bottom halves of faces and non-face stimuli (here: frontal images of cars). A composite effect, suggestive of holistic processing and indicated by an impaired matching performance when the stimulus halves were spatially aligned relative to a condition when they were not aligned, was found for faces in children as young as 3 years. By contrast, no evidence of holistic processing was observed for non-face objects in any of the tested age groups. Meinhardt-Injac et al. (2014) used a context congruency paradigm to compare the processing of faces and non-faces (here: watch faces) in children aged 8–16 years and adults. For both types of stimuli, observers had to make a same/different judgment regarding the internal features of two test stimuli while their (unattended) external features differed in terms of congruency – they could either agree or disagree. With increasing age task performance improved more slowly for faces than for non-face objects. However, holistic processing, as assessed by the impact of context congruency, was only observed for faces but not for watches.

The interpretation of the findings of Cassia et al. (2009) and Meinhardt-Injac et al. (2014) is complicated by the fact that for non-face stimuli no holistic processing was observed in adult observers. A possible explanation could be the requirement of structural long-term representations for holistic effects to become manifest (Davidoff and Donnelly, 1990; Donnelly and Davidoff, 1999). In the absence of such representations, as might be the case in non-experts for clock faces (Meinhardt-Injac et al., 2014) and fronts of cars (Cassia et al., 2009), adults – and children – may have predominantly relied on part-based information to perform the tasks.

Despite such methodological challenges, the current evidence suggests that holistic processing develops distinctly earlier for faces than objects. For the former, such processing has been reported for children as young as 4 (e.g., Carey and Diamond, 1994; Tanaka et al., 1998; Pellicano and Rhodes, 2003; de Heering et al., 2007; Cassia et al., 2009) even though its maturational progression remains controversial (e.g., Crookes and McKone, 2009; but note Schwarzer et al., 2010). For non-face objects, holistic processing so far has only been reported in adults; for children, this kind of processing appears not to emerge before late adolescence.

### TOWARD A COMMON FRAMEWORK FOR THE PROCESSING OF FACES AND NON-FACE OBJECTS

In this review we have discussed recent findings regarding configural and holistic object processing that suggest a more intricate relationship between the perception of objects and faces than previously postulated. As outlined in the introduction, over the last two decades the notion of a quasi-dichotomy of object and face perception, illustrated in **Figure 2A** by Farah's (1996) early model, has given way to more differentiated accounts. These acknowledge the potential contribution of parts to recognition in both stimulus domains, as demonstrated by the model of Piepers and Robbins (2012) in **Figure 2B**. Based on the evidence presented in the preceding sections we propose that this relationship may be even closer. Combining elements of Hummel's (2001) dual route model with those of the holistic/part-based account of Piepers and Robbins, **Figure 2C** shows the first sketch of a new, common framework for the processing of faces and objects.

The proposed framework comprises two parallel pathways: (1) a part-based route which in the case of objects encompasses a structural (analytical) description of parts and part relations at categorical level, in the case of faces a representation of the first-order relations of facial features; (2) a view-based route which both for objects and faces includes a metric, template-like representation supporting holistic processing. It is further assumed that the part-based and view-based routes interact and support each other. For example, part-based information may affect view-based processing as illustrated by the impact of feature shape on holistic face perception (McKone and Yovel, 2009). Conversely, view-based representations may augment part-based descriptions, facilitating the metric processing of parts and their relations in the case of objects, and second-order configural processing in the case of faces. At the level of holistic processing of objects and faces, such facilitation may underlie the part-whole effect, i.e., the superior identification performance for a part shown in the context of a complete stimulus than when shown in isolation (Davidoff and Donnelly, 1990; Tanaka and Farah, 1993; Donnelly and Davidoff, 1999).

Both object and face recognition are assumed to show a developmental transition from a coarse, categorical representation based on parts and their relations to a dual format that is augmented by a metric, view-based representation. However, the developmental trajectory of this transition differs between the two stimulus domains – possibly driven by different task demands: subordinate identification in the case of faces, basic-level recognition in the case of objects. For faces, categorical representations accounting for the very early, if not innate, sensitivity to first-order relations of facial features may soon be augmented by a view-based representation facilitating an onset of holistic face perception in early infancy (e.g., Tanaka et al., 1998; Cassia et al., 2009). By contrast, for non-face objects a categorical representation based on parts and their relations may remain the preferred format until late adolescence. This is suggested by part-primacy effects found in children for categorization and similarity judgements (e.g., Madole and Cohen, 1995; Smith et al., 1996) as well as the early maturation of categorical part-relational processing (Jüttner et al., 2013, 2014). For both stimulus classes, the spatial precision of view-based representations may improve throughout adolescence. The prolonged maturation for metric configural and holistic processing observed for faces (e.g., Schwarzer et al., 2010; Kadosh, 2012; Meinhardt-Injac et al., 2014) and objects (Jüttner et al., 2013, 2014; Wakui et al., 2013) supports this view.

The dual-route framework shown in **Figure 2C** does not necessarily argue for a neuro-functional isomorphism of face and object recognition. A category specificity for faces and objects in the adult brain could in principle imply separate dual representations within the well-established functional core regions of the respective stimulus domain, like the fusiform face area (FFA) and the occipital face area (OFA) in the case of faces, and the lateral occipital complex (LOC) in the case of objects. However, recent evidence from developmental neuroimaging studies also raises the possibility that the processing routes for faces and objects may overlap. In particular, the developmental trajectory of face specificity within the fusiform gyrus continues to be controversial. While a few studies have reported a mature activation of the FFA in children as young as 4 years (Pelphrey et al., 2009; Cantlon et al., 2011), the majority observed significant developmental changes through mid and late adolescence (e.g., Gathers et al., 2004; Golarai et al., 2007; Scherf et al., 2007, 2011; Haist et al., 2013). Thus, the face specificity of the FFA may emerge gradually as a consequence of the particular task demands of face identification (cf. also Scherf et al., 2011), leaving room for a potentially shared processing of faces and objects in categorization tasks at basic level.

Conceptually, such a partially shared processing of faces and objects could be placed at the structural encoding stage of Bruce and Young's (1986) classical model of face perception. According to Bruce and Young's original account, this stage encompasses part-specific and part-relational processing as well as the (basic-level) classification of a stimulus as a face. Based on the evidence presented in this review, we propose that it might be better described in terms of our dual-route framework, and underlie the basic-level categorization of both faces and objects. Information from that stage might then feed into separate, domain-specific modules that accommodate the different requirements of face and object recognition at subordinate level. Future work will need to further clarify the relative contribution of the two routes in our framework across tasks and stimulus domains, as well as their neurological basis.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### ACKNOWLEDGMENTS

This study was supported by the ESRC (grant RES-062-0167), and by the Heidehofstiftung (grant 50302.01/4.10).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Jüttner, Wakui, Petters and Davidoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **On the particular vulnerability of face recognition to aging: a review of three hypotheses**

*Isabelle Boutet<sup>1</sup>, Vanessa Taler<sup>1,2</sup> and Charles A. Collin<sup>1</sup>\**

*<sup>1</sup> School of Psychology, University of Ottawa, Ottawa, ON, Canada, <sup>2</sup> School of Psychology, Bruyère Research Institute, Ottawa, ON, Canada*

Age-related face recognition deficits are characterized by high false alarms to unfamiliar faces, are not as pronounced for other complex stimuli, and are only partially related to general age-related impairments in cognition. This paper reviews some of the underlying processes likely to be implicated in these deficits by focusing on areas where contradictions abound as a means to highlight avenues for future research. Research pertaining to the three following hypotheses is presented: (i) perceptual deterioration, (ii) encoding of configural information, and (iii) difficulties in recollecting contextual information. The evidence surveyed provides support for the idea that all three factors are likely to contribute, under certain conditions, to the deficits in face recognition seen in older adults. We discuss how these different factors might interact in the context of a generic framework of the different stages implicated in face recognition. Several suggestions for future investigations are outlined.

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *Reviewed by:*

*Guillaume A. Rousselet, University of Glasgow, UK Assaf Harel, Wright State University, USA*

#### *\*Correspondence:*

*Isabelle Boutet, School of Psychology, University of Ottawa, 136 Jean-Jacques Lussier, Ottawa, ON K1N 6N5, Canada iboutet@uottawa.ca*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 30 April 2015 Accepted: 22 July 2015 Published: 21 August 2015*

#### *Citation:*

*Boutet I, Taler V and Collin CA (2015) On the particular vulnerability of face recognition to aging: a review of three hypotheses. Front. Psychol. 6:1139. doi: 10.3389/fpsyg.2015.01139*

#### **Keywords: face recognition, aging, contrast sensitivity, familiarity, context recollection**

### **Introduction**

If you ask a layperson whether faces are special, most would not hesitate to answer "*yes*." Indeed, one does not have to be well versed in the intricacies of visual information processing to appreciate that faces carry a wealth of information relevant for social interactions: information about the emotional status of others, the locus of attention (i.e., via gaze direction), gender, ethnic identity, age, etc. But for most people, the specialness of faces is experienced in reference to the crucial role they play in defining an individual's identity. Indeed, the keen sense of identity derived from faces is well illustrated by striking examples of individuals who have had to acquaint themselves with a new identity following gross injuries to the face (e.g., Furr et al., 2007), by the bizarre experience elicited by faces whose spatial elements are denaturalized (e.g., Thompson, 1980; **Figure 1**), as well as by the profound impact that prosopagnosia, the inability to recognize faces, has on those affected as well as their relatives and friends (Yardley et al., 2008).

In the fields of visual perception and cognition, the question of whether there are unique or special visual mechanisms for processing the identity of a face is the topic of considerable scientific debate<sup>1</sup> (e.g., Damasio et al., 1982; Diamond and Carey, 1986; Gauthier et al., 1999; McKone and Robbins, 2011). Despite ongoing controversy, the layperson's intuition that faces are special is supported by empirical observations (reviewed by Haxby et al., 2002; Maurer et al., 2002; McKone and Robbins, 2011). First, faces are unique in the sense that they are the only homogeneous stimulus category for which most humans have developed expertise in distinguishing individual members at the subordinate level on a daily basis. Second, faces are unique because their recognition is more severely affected by certain manipulations, a finding that has been attributed to a specialized processing style tailored to the idiosyncratic properties of faces. Finally, faces are unique in that a network of brain areas preferentially activated by faces has been identified.

<sup>1</sup>There is considerable debate regarding whether a dedicated system exists for faces, or whether faces are simply an example of expert object recognition, and we direct the reader to other sources for more details on this issue (see, e.g., Gauthier et al., 2003; Wallis, 2013, for alternative views).

Multiple studies have reported that older adults have difficulty recognizing faces using a variety of experimental paradigms and stimulus formats. Experimental paradigms have included delayed matching-to-sample with various inspection times (e.g., Smith and Winograd, 1978; Grady et al., 1994; Boutet and Faubert, 2006; Habak et al., 2008; Hildebrandt et al., 2010, 2011, 2013; Obermeyer et al., 2012; Konar et al., 2013), delayed nonmatching to sample (Crook and Larrabee, 1992), simultaneous and sequential matching (e.g., Owsley et al., 1981; Chaby et al., 2011), yes/no recognition tests (e.g., Bartlett and Fulton, 1991; Searcy et al., 1999; Boutet and Faubert, 2006; Edmonds et al., 2012; Hildebrandt et al., 2013), as well as naming tasks (e.g., Maylor and Valentine, 1992; Lott et al., 2005). Stimulus formats have included line-drawn faces (Bartlett et al., 1989, 1991), face pictures (e.g., Boutet and Faubert, 2006; Chaby et al., 2011; Edmonds et al., 2012; Obermeyer et al., 2012; Konar et al., 2013), and so-called Mooney faces where color and grayscale information is dichotomized into white or black pixels (Carbon et al., 2013). Age-associated recognition deficits have been reported for test faces presented in the same view as at study (all studies cited above) as well as in different views (Habak et al., 2008; Meinhardt-Injac et al., 2014).

**discriminations that are equivalent for faces (A) and other complex objects (B,C).**

Three features of these face recognition deficits are particularly noteworthy. First, studies that have employed yes/no recognition paradigms indicate that this age-related deficit arises primarily from older adults having difficulty rejecting unfamiliar faces, with their ability to correctly recognize familiar faces being comparable to that of younger adults (for a review, see Searcy et al., 1999). Second, age differences are more pronounced for faces than for other comparable recognition tasks including individual recognition of other objects (chairs and houses: Boutet and Faubert, 2006; watches: Meinhardt-Injac et al., 2014; see **Figure 2**, for examples) as well as recognition of inverted faces (Grady et al., 1994; Boutet and Faubert, 2006; Chaby et al., 2011, Experiment I; but see Obermeyer et al., 2012). The finding for inverted faces is particularly relevant to our discussion because inverted faces contain the same low-level information as upright ones. Indeed, finding larger differences between younger and older participants for upright relative to inverted faces suggests the implication of higher-order processes involved in normal face recognition *per se*, rather than information demands or lower-level processes. Third, general impairments in cognitive function and object recognition do not completely account for age-related face recognition deficits (Hildebrandt et al., 2011), suggesting that in addition to general functioning impairments (in memory, for example), face-specific factors must also be implicated. The latter two sets of results suggest that faces may also be special in the sense that their recognition appears to be more vulnerable to the aging process than that of other object categories.
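The first feature, a deficit carried by false alarms rather than by hits, can be made concrete with standard signal detection measures. The sketch below is illustrative only: the response counts are hypothetical, the function name `sdt_measures` is our own, and the log-linear correction is one common convention rather than the procedure of any study cited here.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate, d', criterion c) for a yes/no test."""
    n_old = hits + misses                      # previously seen faces
    n_new = false_alarms + correct_rejections  # unfamiliar faces
    # Log-linear correction keeps both rates away from 0 and 1,
    # so the z-transform below always stays finite.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf                   # inverse standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)         # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return hit_rate, fa_rate, d_prime, criterion


# Hypothetical counts for 40 old and 40 new faces per group (not real data):
# identical hits, but more false alarms in the older group.
younger = sdt_measures(hits=32, misses=8, false_alarms=6, correct_rejections=34)
older = sdt_measures(hits=32, misses=8, false_alarms=16, correct_rejections=24)
```

With these made-up counts the two groups have the same hit rate, yet the older group shows lower sensitivity and a more liberal criterion, which is the pattern the yes/no studies describe.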

The notion that older adults may have particular difficulty recognizing faces may not come as a surprise to those who interact with them on a daily basis: alongside word finding difficulties, trouble with face recognition is one of the most commonly reported complaints in this population (e.g., Chaby and Narme, 2009). What is surprising is that despite multiple investigations into possible underlying mechanisms, researchers have yet to provide an account of the deficit that reconciles the extant literature on normal cognitive aging, the unique mechanisms involved in face processing, and the nature of age-related deficits in face recognition. The variability in the procedures employed to test face recognition deficits, and in the way that aging affects older individuals, further complicates our understanding of this issue. This paper will examine three hypotheses that have been proposed to explain these deficits, as well as relevant evidence and commentary. The presentation is couched within a generic framework of the different processing stages that are likely to be associated with face recognition, and our discussion begins with a brief presentation of this framework. We then review evidence that pertains to each of the three hypotheses separately and link them to the different stages proposed in the framework. Our intent is not to provide an exhaustive review of the literature on the impact of aging on face processing but to highlight promising explanations as well as define areas where more research is needed. We also endeavor to demonstrate that investigations into the factors that account for age-related face recognition deficits provide a unique opportunity to advance our understanding of both face-specific processes and aging in general.

### **Organizing Framework**

Several models of face recognition exist (e.g., Bruce and Young, 1986; Burton et al., 1990; Biederman and Kalocsai, 1997; Riesenhuber and Poggio, 1999; Haxby et al., 2002; Ishai et al., 2005; Wallis, 2013) and our goal is not to integrate these various models but rather to present a generic organizing context for the review. This framework is inspired by the seminal model of Bruce and Young (1986) but also borrows relevant elements from other models. The framework focuses on aspects of recognition that pertain to the identity of the face and the person it belongs to. As such, it bypasses the identification of a face as a face because our focus here is not on object categorization but rather on recognition of individual faces.

The framework begins with the perceptual analysis of the visual information present in a face for which a recognition judgment has to be made. The perceived face will be analyzed using increasingly complex levels of information that will eventually lead to the formation of a representation. This hierarchical process parallels the increasingly complex response properties of cells along the visual pathway. It is beyond the scope of this presentation to speculate on the exact nature of face representations and we will only mention two types of information that are relevant for this review. The first type of information likely to be extracted from a perceived face is low-level spatial information that corresponds to the filtering properties of cells in early visual areas such as the LGN and primary visual cortex. Second, faces will be decoded on the basis of the information necessary to discriminate this highly homogeneous stimulus category at the individual level. Even though there is an ongoing controversy regarding the exact nature of this information, there is substantial evidence that the recognition of individual faces relies on the processing of *configural* information (see reviews by Maurer et al., 2002; McKone and Robbins, 2011). According to Maurer et al. (2002), configural information can refer to the first-order relations that specify the basic configuration shared by all faces, or to second-order relations (i.e., spatial distances) between facial features, such as the separation between the eyes or between the mouth and the nose. Finally, configural information can refer to holistic information, meaning that when a face is processed, it is as a whole or a Gestalt. We will focus on second-order relations and holistic information in this paper because they are the hallmark of the specialized processing style purported to be associated with face recognition.
While different terminology exists in the literature, we will use the term configural when referring to both the second-order and holistic information utilized during face recognition. It is important to note that the exact nature of the information used during face recognition is the topic of considerable debate (e.g., Taschereau-Dumouchel et al., 2010; Wallis, 2013; Xu et al., 2014). In particular, the experimental manipulations used to tap into the processing of second-order relations have been heavily criticized for their lack of realism (Taschereau-Dumouchel et al., 2010). Furthermore, alternative explanations have been put forth to account for some of the findings cited as evidence for holistic face encoding (Xu et al., 2014). Despite this controversy, several studies have examined age-associated face recognition deficits using these same experimental manipulations because of their longstanding presence in the literature on face processing. We are adopting the terms configural, second-order relations, and holistic encoding in this paper to respect the original content of the studies we are reviewing.

Once a representation of the perceived face is formed, it is compared to stored representations for evaluation of the degree of resemblance. If there is a match, a feeling of familiarity will arise. Here, the framework posits that familiarity and recollection can be separated (e.g., Bartlett et al., 2009; Edmonds et al., 2012) in the final decision process. Recognition based on feelings of familiarity does not require the retrieval of additional information regarding the person the face belongs to, the context under which the face was previously encountered, etc. In contrast, a feeling of familiarity might lead to a search for, and retrieval of this information, if the face is indeed known (recollection).

In this review, we present three hypotheses that have been put forth to explain age-related deficits in face recognition. The first hypothesis focuses on low-level processes by stipulating that age-related face recognition deficits are attributable to perceptual deterioration. The second hypothesis focuses on mid-level processes by suggesting that older adults have difficulty recognizing faces because of a deficit in encoding configural information. In the context of the framework described above, both perceptual deterioration and impaired configural encoding would result in impoverished representations of perceived faces and lead to confusion when comparing perceived to stored representations. Finally, the last hypothesis focuses on later stages of the processing stream by stating that older adults have difficulty accessing the contextual information needed to correctly decide whether a face has actually been previously encountered or only feels familiar. We selected these three hypotheses because (i) they can account for different characteristics of the face recognition deficits seen in older adults, (ii) they map onto the different stages presented in the proposed framework, and (iii) they are promising both in terms of their plausibility and of their potential to generate further research. A detailed review of the studies conducted to investigate each of these hypotheses follows.

### **Perceptual Deterioration**

Impairments in basic sensory abilities have been reported in older adults across all modalities and it has been suggested that perceptual deterioration is a major determinant of age-related cognitive impairments (e.g., Sekuler et al., 1980; Lindenberger and Baltes, 1997; Schneider and Pichora-Fuller, 2000). Past research linking perceptual deterioration with face recognition has focused on visual loss (e.g., Dulin et al., 2011; Wallis et al., 2014), reduced visual acuity (e.g., Tejeria et al., 2002), and reduced contrast sensitivity (e.g., Owsley et al., 1981). Studies investigating the impact of visual loss on face recognition suggest that patients with foveal loss (Wallis et al., 2014), severe peripheral loss (Dulin et al., 2011), unstable fixations (Wallis et al., 2014), age-related macular degeneration (Barnes et al., 2011), and central scotomas (Dulin et al., 2011) display poorer face recognition than controls. It should be noted that these studies may have limited implications for the age-related face recognition deficits that are the focus of this review because participants with self-reported pathological conditions were usually excluded. Furthermore, studies on visual loss are limited by the heterogeneous pathological profiles of the participants, making it difficult to reach generalizable conclusions for the non-pathological aging population. Nonetheless, the finding that visual loss negatively impacts face recognition highlights the need for formal screening of pathological conditions when testing older adults on face recognition tasks.

Studies that have examined the relationship between acuity and face processing using regression analysis in older individuals have yielded inconsistent results (reviewed by Barnes et al., 2011). The results of several studies (e.g., Tejeria et al., 2002; West et al., 2002; Lott et al., 2005; Barnes et al., 2011) indicate low to moderate statistically significant correlations between performance on face recognition tasks and visual acuity. However, Barnes et al. (2011) reported that differences in face identification between younger and older adults disappeared after adjusting for acuity. Tejeria et al. (2002) found that the use of a magnification device improved face recognition abilities in patients with age-related macular degeneration, suggesting that acuity is a determining factor. However, whether these results can be generalized to participants with normal vision remains to be determined. On theoretical grounds, we suggest that reduced acuity is unlikely to contribute significantly to face recognition impairments *per se* because acuity measurements assess only the upper limit of the contrast sensitivity function, and face recognition has been shown to primarily rely on a band of middle relative spatial frequencies, which lie in the middle of the contrast sensitivity function at most common face viewing distances (e.g., Costen et al., 1996; Gold et al., 1999; Näsänen, 1999; Boutet et al., 2003; Collin et al., 2006; Keil, 2009).
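The argument about acuity rests on converting relative spatial frequency (cycles per face) into retinal spatial frequency (cycles per degree) at a given viewing distance. A minimal sketch of that conversion follows. The face width of 14 cm and the band of 8 to 16 cycles per face width are assumptions chosen for illustration, not values reported in this review, and the function names are our own.

```python
import math


def face_width_deg(face_width_cm, distance_cm):
    """Visual angle (degrees) subtended by a face of the given width."""
    return math.degrees(2 * math.atan(face_width_cm / (2 * distance_cm)))


def cycles_per_degree(cycles_per_face, face_width_cm, distance_cm):
    """Convert a relative frequency (cycles/face) to cycles/degree."""
    return cycles_per_face / face_width_deg(face_width_cm, distance_cm)


FACE_WIDTH_CM = 14.0       # assumed typical face width
BAND_CPF = (8.0, 16.0)     # assumed mid-range band, in cycles per face width

# Retinal frequency range of the assumed band at several viewing distances.
band_cpd = {
    distance_cm: tuple(
        cycles_per_degree(f, FACE_WIDTH_CM, distance_cm) for f in BAND_CPF
    )
    for distance_cm in (50, 100, 200, 400)
}
```

Under these assumptions the band falls at roughly 1 to 2 cycles per degree at 1 m and stays in the single digits even at 4 m, well below the high spatial frequencies that limit acuity.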

Others have suggested that reduced contrast sensitivity (Norton et al., 2009) impairs face recognition in older individuals. Several of the studies cited in the previous paragraph also included contrast sensitivity in their regression models and, as was the case for acuity, found low to moderate statistically significant correlations between contrast sensitivity and face recognition (Lott et al., 2005; but see also Tejeria et al., 2002). Barnes et al. (2011) have also found that differences in face identification performance between younger and older adults disappeared after adjusting for contrast sensitivity. Finally, Lott et al. (2005) reported contradictory findings whereby contrast detection and face recognition were not significantly correlated. Furthermore, contrast sensitivity did not explain more variance in face recognition than age and high-contrast acuity alone.

Owsley et al. (1981) provided a compelling demonstration of a link between reduced contrast sensitivity and face recognition deficits in older adults. In their study, contrast thresholds were measured by asking participants to adjust the contrast of pairs of faces until they could discriminate them. Older participants required significantly higher contrasts to perform the task. Additional results indicated that pairs of faces are equally discriminable by older and younger adults when the faces shown to the older adults are doubled in contrast. This study provides convincing evidence that a decline in contrast sensitivity impedes face recognition in older adults. Using a similar adjustment technique as in Owsley et al. (1981), Owsley and Sloane (1987) demonstrated that reduced contrast sensitivity can account for deficits in processing a variety of real world objects, suggesting that the link between contrast sensitivity and recognition deficits may not be unique to faces.
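To make the phrase "doubled in contrast" concrete, the sketch below scales a luminance profile about its mean and checks that root-mean-square (RMS) contrast doubles while mean luminance is unchanged. This is an illustration under our own assumptions: the luminance values are hypothetical, RMS contrast is one of several possible contrast definitions, and the code does not reproduce the adjustment procedure used by Owsley et al. (1981).

```python
from statistics import mean, pstdev


def rms_contrast(luminance):
    """RMS contrast: standard deviation of luminance divided by its mean."""
    return pstdev(luminance) / mean(luminance)


def scale_contrast(luminance, factor):
    """Scale deviations from mean luminance, leaving the mean unchanged."""
    m = mean(luminance)
    return [m + factor * (value - m) for value in luminance]


# Hypothetical luminance samples (cd/m^2) along one row of a face image.
profile = [40.0, 45.0, 55.0, 60.0, 50.0, 50.0]
doubled = scale_contrast(profile, 2.0)

original_contrast = rms_contrast(profile)
doubled_contrast = rms_contrast(doubled)
```

A real stimulus would also need to stay within the luminance range of the display, which limits how far contrast can be raised for a given mean luminance.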

The above-mentioned limitations notwithstanding, the contribution of low-level visual perception differences to age-related face recognition deficits warrants further investigation. For example, it would be interesting to examine whether older adults rely on the same spatial frequency information as younger adults during face processing. Studies conducted with young adults have shown that face recognition depends on a narrow critical band of relative spatial frequencies in the middle range (e.g., Costen et al., 1996; Gold et al., 1999; Näsänen, 1999; Boutet et al., 2003; Collin et al., 2006; Keil, 2009). It is possible that older adults' reduced contrast sensitivity for this range as well as for the higher range leads them to rely on lower spatial frequencies during face discrimination tasks. This compensatory mechanism could yield impairments in face recognition because the observers would not be making their judgment on the basis of the band most useful for the task at hand. In addition, such reliance on low spatial frequencies would be most pronounced for faces because object recognition is quite robust to variations in spatial frequency content (e.g., Biederman and Kalocsai, 1997; Collin and McMullen, 2005; Collin et al., 2012). Thus, the low spatial frequencies for which contrast sensitivity is relatively intact in older adults would suffice for object recognition but not face recognition. Another avenue of research would be to mimic the loss in contrast sensitivity associated with aging by presenting younger adults with faces that have been filtered in such a way as to reflect the perceptual experience of older adults. Finding that younger adults display similar impairments in face recognition, and similar brain activation, under such impoverished conditions would provide powerful evidence for the hypothesis that spatial vision loss contributes significantly to face recognition deficits in older adults.

In everyday life, it is likely that a host of perceptual problems are actually implicated in the common complaint of face recognition deficits in older adults. However, considering the heterogeneity of the functional deficits in vision that arise with aging, it is important that the factors that are more generalizable in this population, such as changes in the contrast sensitivity function, be dissociated from pathological conditions such as macular degeneration. While we have mapped the perceptual deterioration hypothesis to the first step in the face recognition stream of processing, it should be noted that the way in which perceptual deterioration contributes to cognitive deficits in aging, and whether the former causes the latter, remains to be determined (e.g., Schneider and Pichora-Fuller, 2000). Investigations of age-related deficits in face recognition might actually serve as a model to shed light on this issue.

### **Impaired Processing of Configural Information**

We began this paper by discussing the special role that faces play in social interactions and visual information processing. We also stated that face recognition appears to be particularly vulnerable to the aging process. Indeed, past research suggests that face recognition deficits are only partially related to other more general impairments (Hildebrandt et al., 2011) and are more pronounced for faces than for other complex stimuli even when equivalent identity-related tasks with comparable performance are employed (Grady et al., 1994; Boutet and Faubert, 2006; Chaby et al., 2011, Experiment I; Meinhardt-Injac et al., 2014). Although these findings need to be replicated, the emerging pattern is that of a special vulnerability for face processing in aging, rendering explanations based solely on global aging mechanisms less tenable. As a result, several researchers have proposed explanations tailored to the processes involved in face recognition. Even though the exact nature of the information employed during face recognition remains a topic of considerable debate, there is substantial evidence that the recognition of individual faces relies on a processing style specialized to deal with the idiosyncratic properties of this task (see reviews by Maurer et al., 2002; McKone and Robbins, 2011; but see also, e.g., Taschereau-Dumouchel et al., 2010; Wallis, 2013; Xu et al., 2014, for alternative views). More specifically, face recognition appears to rely on configural information. In contrast, object recognition appears to rely more heavily on information about distinctive features and first-order relations, even when comparable within-category tasks are used (e.g., Yin, 1969; Tanaka and Farah, 1993; Maurer et al., 2002).

Combining the idea that faces are processed on the basis of configural information with the finding that face recognition appears to be particularly vulnerable to aging has led researchers to test the hypothesis that age-related deficits in face recognition arise from a failure to encode configural information in this population. While a variety of experimental tests have been used to investigate this hypothesis, our review focuses on those tests that best capture the link between configural information processing and face recognition in aging (see Murray et al., 2010; Carbon et al., 2013, for other tests of configural processing in older adults).

The face inversion effect (FIE) provided one of the first suggestions that faces are encoded using a specialized processing style. The FIE refers to the finding that face recognition is more severely affected by inversion than recognition of other complex objects (Yin, 1969). Despite some controversy, the detrimental impact of inversion on face recognition is generally thought to arise from difficulties in encoding configural information in inverted faces (Farah et al., 1998; Rossion, 2009; but see also, e.g., Gauthier and Logothetis, 2000; Sekuler et al., 2004, for alternative views). Several studies have examined the FIE in older adults. First, Boutet and Faubert (2006) failed to find a difference between older and younger adults on the FIE in two separate experiments using two different non-face object categories and two different tasks with different mnemonic demands. Their results were partially replicated by Hildebrandt et al. (2010) with a large sample (*n* = 151) of older adults. They found that inversion impedes recognition of faces equally in younger and older adults.

Whereas these findings suggest that analysis of configural information is intact in older adults, other studies have produced contradictory results. Particularly noteworthy are the behavioral studies by Chaby et al. (2011) and Obermeyer et al. (2012), as well as the event-related potential (ERP) studies by Gao et al. (2009) and Daniel and Bentin (2012). Starting with the behavioral studies, both Chaby et al. (2011) and Obermeyer et al. (2012) reported a reduced effect of inversion on face recognition in older adults as compared to younger adults. However, these results should be interpreted with caution because a floor effect seems to exist in the results of the study by Chaby et al. (2011; **Figure 2**) and because the otherwise reliable effect of aging on false alarms was not replicated in Obermeyer et al. (2012). Furthermore, visual inspection of the results in the Obermeyer et al. (2012) study suggests that older adults' recognition was impaired by inversion but that the omnibus analysis they performed failed to detect such an effect. Unfortunately, they do not report the necessary *post hoc* statistical analyses to verify whether or not the FIE was significantly different across the two age groups.

It is important to note that none of these studies, with the exception of Boutet and Faubert (2006), included an equivalent non-face recognition task (see also Meinhardt-Injac et al., 2014). The original demonstration of the FIE included such a condition, and it is the finding that *inversion has a greater impact on faces than on other objects* that allows for the conclusion that face recognition relies on a specialized processing type. The finding that inversion has a greater impact on face recognition in younger adults relative to older adults might be due to (i) a failure to encode configural information in older adults, or (ii) difficulties in this population with the more complex and ecologically invalid task of recognizing inverted faces. Unfortunately, it is impossible to choose between these alternative interpretations without the inclusion of a non-face category condition as part of the study design.

Two ERP studies that have included inverted faces in their protocol (Gao et al., 2009; Daniel and Bentin, 2012) are relevant to the present discussion. These studies took advantage of the N170 effect to investigate face processing in older adults. The N170 effect refers to the finding that the N170 ERP is larger in response to faces as compared to other stimuli (see Rossion and Jacques, 2008, for a discussion of different interpretations of the N170 effect). Both studies found that the N170 effect is equivalent in older and younger adults, a finding that has been challenged elsewhere (Rousselet et al., 2010; Bieniek et al., 2013). Both studies also found some differences between older adults and younger adults with regards to the N170 signals elicited by inverted faces. Whether these findings have implications for the configural information hypothesis is debatable, however, because we have no indication that the ERPs recorded were actually related to the behavioral manifestation of the FIE; indeed, no performance data were reported in either study.

Bearing in mind the strengths of the studies by Boutet and Faubert (2006) and Hildebrandt et al. (2010), the former having included a non-face object category to measure the classical FIE, and the latter having a large sample size, together with the limitations inherent in the other studies, the balance of evidence seems to favor the idea that processing of configural information, as measured by the FIE, is intact in older adults. However, in interpreting all of these findings, it must be kept in mind that there is an ongoing debate about the exact mechanisms responsible for the FIE (e.g., Farah et al., 1998; Gauthier and Logothetis, 2000; Sekuler et al., 2004; McKone and Yovel, 2009; Rossion, 2009). In part for this reason, other investigators have focused instead on different tests of configural information processing. We turn to these now.

The composite effect (Young et al., 1987) and the whole-part advantage (Tanaka and Farah, 1993) have been used to investigate holistic face processing. The composite effect refers to the finding that composites made of two aligned half faces are more difficult to recognize than non-composites made of two misaligned face halves (**Figure 3**). The whole-part advantage refers to the finding that recognition of facial features is easier when they are presented in full faces rather than in isolation. Both these differences are reduced when faces are inverted, suggesting that recognition of upright faces is performed on the basis of a unitary holistic representation and that formation of this representation is impaired by inversion.

Inconsistent patterns of results have arisen from studies that have employed these tests to investigate holistic encoding in older adults. First, Konar et al. (2013) and Wiese et al. (2013) have provided evidence that the composite effect is present in older adults. In contrast, Boutet and Faubert (2006) and Hildebrandt et al. (2010) failed to find a significant composite effect in older adults, though a trend in the direction of a composite effect was found in the former study. Contradictions also exist for the whole-part advantage: whereas Boutet and Faubert (2006) found that the whole-part advantage is equivalent in older and younger participants, Hildebrandt et al. (2010) found no difference between recognition of parts in isolated vs. whole conditions in older adults. It thus appears that the literature to date provides inconclusive evidence with respect to holistic encoding of faces in older adults. Finally, to our knowledge, only one study has examined sensitivity to second-order relations in older adults. Specifically, Hildebrandt et al. (2010) measured older adults' sensitivity to changes in the spatial relation between facial features. Their results indicate that older and younger adults are equally sensitive to changes in second-order information.

**FIGURE 3 | An example of the composite effect**. The top row shows two unmodified faces **(A,B)**. The middle row shows a stimulus composed of the top half of B and the bottom half of A in an aligned condition **(C)** or a misaligned condition **(D)**. Recognition of the individual faces that make up the composite is significantly less accurate in the aligned composite **(C)** than the misaligned non-composite **(D)**. This difference is less pronounced when the images are inverted **(E,F)**. This is taken as evidence that faces are normally processed holistically, but that inversion disrupts this holistic processing.

It is difficult to reconcile these findings as they relate to the hypothesis that face recognition deficits in older adults arise from difficulties with processing configural information. Nonetheless, a number of general conclusions and suggestions can be made. The main pattern that arises from our review of the literature is that the same tests of configural information often yield inconsistent results in older adults. These discrepancies can arise either from the status of the sample of participants tested or from the test parameters employed. Starting with the status of the sample, it is now becoming increasingly obvious that substantial heterogeneity exists within "normal" aging (e.g., Ardila, 2007; Nyberg et al., 2012). To further complicate matters, the variability in older adults might arise from the fact that some of the adults tested may actually be on a trajectory of pathological aging. Indeed, the Mini-Mental State Exam, which is widely used to screen out such pathologies from the studies reported here, has poor sensitivity to the early stages of Alzheimer's disease and mild cognitive impairment (e.g., de Jager et al., 2009). As discussed in the previous section, old age also comes with a variety of vision problems that the participants may not be aware of but that can nevertheless have an impact on face recognition.

With respect to the test parameters employed, our review of the literature reveals a variety of stimulus manipulations and testing conditions that make it difficult to determine whether two different studies actually tap into the same processes. For example, the composite effect seems to be highly sensitive to specific stimulus and methodological parameters, and several modifications of the original paradigm (Young et al., 1987) have been published. The effect can be tested using a naming task, a short-term recognition task, or a simultaneous discrimination task, and it is often the case that differences in results are associated with these different paradigms (e.g., Boutet and Faubert, 2006 vs. Hildebrandt et al., 2010). Another point worth noting is that some studies have failed to include the critical inversion condition in their experiments. Indeed, the composite effect and the whole-part advantage are revealed via a significant interaction between the stimulus type and orientation (McKone et al., 2013). For example, the composite effect provides evidence that holistic information processing is unique to faces *because* the difference between recognition of composites and non-composites is reduced for inverted faces. Yet several studies have simply omitted the inverted condition, making it impossible to determine whether the expected interaction was present in older adults. The inclusion of a critical inversion condition is particularly important in this context because of the heterogeneity of function present in older adults. Indeed, conclusions that a given process is intact in older adults because age differences are not significant must be checked against the limitations of null findings, especially when variability is high. 
This problem can be partly avoided by testing for the presence of the effect itself (i.e., composite effect, whole-part advantage) in each age group because such effects are manifested via the significant finding of an interaction between stimulus type and orientation for each age group separately. Finally, much can be learned from simultaneously studying the functional and physiological processes underlying face recognition deficits in older adults. Future experiments adopting this approach should employ procedures that better match those that elicit deficits in older adults. Furthermore, all of the conditions necessary to replicate the effects that are the hallmark of face recognition should be included.
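
The logic of testing for the effect within each age group can be sketched in a few lines. What follows is a minimal illustration, not the analysis used in any of the studies reviewed: the accuracy values are hypothetical, the function names are ours, and a one-sample *t* statistic on per-participant interaction scores is assumed here as one simple way of testing the stimulus type by orientation interaction.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_score(up_aligned, up_misaligned, inv_aligned, inv_misaligned):
    """Composite effect for upright faces minus composite effect for inverted faces.

    Each argument is one participant's proportion correct in that condition.
    A positive score means the aligned-vs-misaligned cost is larger upright
    than inverted, which is the pattern taken as evidence of holistic processing.
    """
    upright_effect = up_misaligned - up_aligned
    inverted_effect = inv_misaligned - inv_aligned
    return upright_effect - inverted_effect


def one_sample_t(scores):
    """t statistic testing whether the mean interaction score differs from zero."""
    n = len(scores)
    return mean(scores) / (stdev(scores) / sqrt(n))


# Hypothetical data (illustration only): one tuple per participant, ordered as
# (upright aligned, upright misaligned, inverted aligned, inverted misaligned).
groups = {
    "younger": [(0.70, 0.90, 0.75, 0.80), (0.65, 0.85, 0.70, 0.70), (0.72, 0.88, 0.74, 0.78)],
    "older": [(0.60, 0.75, 0.60, 0.65), (0.55, 0.70, 0.60, 0.60), (0.62, 0.70, 0.61, 0.66)],
}

# The interaction is evaluated separately in each age group, rather than
# inferring intact processing from a non-significant age difference.
for name, data in groups.items():
    scores = [interaction_score(*participant) for participant in data]
    print(name, round(mean(scores), 3), round(one_sample_t(scores), 2))
```

A study without an inverted condition can only supply the first two values of each tuple, so the interaction score cannot be computed at all.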

We end this discussion by arguing that more research is needed to decipher the relation, if any, of tasks that measure configural encoding with each other and with general face recognition abilities in both younger and older participants. Indeed, contradictory results across different tests of configural information processing exist not only for older adults but also in studies focusing on early development (reviewed by Taylor et al., 2004; Johnson, 2011). Some efforts at comparing different tests have already been undertaken (Konar et al., 2010; Richler et al., 2011; DeGutis et al., 2013), notably the study by Hildebrandt et al. (2010) which focuses on individual differences in the behavioral performance of young, middle-aged, and older adults on a total of twelve face recognition tasks. Despite the difficulties inherent in testing such large numbers of participants on so many tasks, we believe that such studies should be replicated given that testing the same participants on several tests of face recognition eliminates the confound of participant variability in cognitive function. Moreover, studies focusing on individual differences give researchers the opportunity to examine whether common processes are recruited during different tests of configural information processing. Studies adopting this approach with older participants should include a condition that will replicate the high false alarm rate that is the signature of face recognition deficits in this population.

Both perceptual deterioration and impaired configural processing map onto the first stages of face recognition when a representation is extracted from a perceived face. The presence of one or both deficits might result in erroneous feelings of resemblance between perceived and stored representations. The decision regarding whether or not the face is actually familiar would then depend on access to information regarding previous encounters with the face. The following hypothesis maps onto this latter stage.

### **Decline in the Recollection of Contextual Information**

Age-related deficits in face recognition have also been attributed to a decline in recollecting contextual or source information when perceived faces trigger a feeling of familiarity. This hypothesis emerged from the robust finding that face recognition deficits in older adults are characterized by higher false alarms to unfamiliar faces (for a review, see Searcy et al., 1999). Within the context of the framework presented herein, a feeling of familiarity will arise when there is a match between a perceived representation and a stored representation. According to the contextual information hypothesis, additional information regarding the context under which the perceived information was previously encountered is necessary to correctly discriminate *seemingly* familiar from *truly* familiar faces. Therefore, correctly rejecting a face that appears familiar because it happens to resemble a stored representation requires the extra step of context recollection. However, if access to contextual information is impaired, then face recognition will be based mainly on familiarity judgments, leading to the high false alarm rates observed in older adults.

The following four studies have included manipulations aimed at testing the context recollection hypothesis directly, and all four support the notion that older adults are more likely than younger adults to base their recognition decision on familiarity. First, Bartlett et al. (1991) reported that older adults produced more false positives than younger adults when judging whether faces are those of celebrities or novel unknown faces. Most importantly, the age difference was particularly pronounced for faces that were likely to yield feelings of familiarity because they had previously been presented in the context of the experiment. Second, Bartlett and Fulton (1991) showed that, in older adults, perceived familiarity of new faces is significantly correlated with incorrectly stating that the face had previously been presented. Third, Searcy et al. (1999) demonstrated that older adults do not show higher false alarms to conjunction faces constructed from the inner and outer features of two different faces while still showing higher false alarms to non-manipulated faces. They reasoned that because the perceptual information in conjunction faces poorly matches representations stored in memory, conjunctions should not seem familiar and are therefore easily rejected by older adults without the need to rely on context recollection (see also Rhodes et al., 2008). Finally, Edmonds et al. (2012) have specifically manipulated the familiarity and context of faces by presenting faces as lures in a session where participants were asked to judge personality traits, followed by the presentation of study faces where participants were asked to remember the faces for a memory test. The memory test included both the studied faces and the familiarized lures. While older adults displayed similar hit rates for studied faces and correct rejection rates for new foils, their ability to correctly reject familiarized lures was significantly impaired in comparison to the younger participants. 
These results support the contention that older adults rely more heavily on familiarity when making yes/no decisions in face recognition tasks.
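
To make concrete how hit rates and false alarm rates jointly characterize yes/no recognition, the sketch below computes two standard signal detection measures, sensitivity (*d′*) and response criterion (*c*). It is offered only as an illustration under stated assumptions: the response counts are hypothetical, the log-linear correction is one common choice among several, and the studies reviewed above did not necessarily analyze their data in this way.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for a yes/no recognition test.

    A log-linear correction (add 0.5 to each count, 1 to each total) is
    assumed here so that rates of exactly 0 or 1 do not produce infinite z scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts (illustration only): both observers produce the same
# number of hits, but the second produces more false alarms to new faces.
younger = sdt_measures(hits=15, misses=5, false_alarms=5, correct_rejections=15)
older = sdt_measures(hits=15, misses=5, false_alarms=10, correct_rejections=10)

print("younger: d' = %.2f, c = %.2f" % younger)
print("older:   d' = %.2f, c = %.2f" % older)
```

With equal hits, the additional false alarms lower *d′* and shift the criterion in the negative (more liberal) direction, which is the kind of pattern one would expect if "old" responses were driven mainly by perceived familiarity.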

Studies on the bystander effect, whereby bystanders are often mistakenly identified as perpetrators, are also relevant to the context recollection hypothesis (Searcy et al., 1999). This effect is thought to arise because the face of the bystander is perceived as familiar, and therefore context recollection is essential to correctly reject his/her face during lineup identification procedures. If the context recollection hypothesis is correct, older adults should be more likely to incorrectly identify the bystander as the perpetrator because they will base their judgment on perceived familiarity without recollecting information regarding the context in which the face was encountered. Indeed, a number of studies have demonstrated that older adults are more prone to the bystander effect than younger adults (reviewed by Memon et al., 2002). However, Searcy et al. (2001) failed to find higher false alarms to bystanders in older adults and Memon et al. (2002) failed to demonstrate a positive impact of context reinstatement on the bystander effect in older adults. This evidence suggests that the link between the bystander effect and difficulties in recollecting contextual information is equivocal.

One possible avenue of future research is to determine whether problems with context recollection are face-specific. On the one hand, there is evidence that older adults display inflated false alarms for other types of stimuli (reviewed by Searcy et al., 1999), such as semantic stimuli (e.g., Dywan and Jacoby, 1990; Ozen et al., 2010). On the other hand, familiarity-based responding is much more likely to lead to practical difficulties in older adults for faces than for other stimuli because (i) faces are the only category for which correct individual recognition is frequently crucial during social interactions and (ii) there is an accrual of memorized faces with increasing age (Chaby and Narme, 2009). Unfortunately, to our knowledge, no study has measured age differences on both faces and other objects using comparable yes/no recognition tasks.

While we did not distinguish between familiar and unfamiliar faces in the organizing framework, there is some evidence to suggest that they trigger different forms of processing (e.g., Bruce and Young, 1986; Johnston and Edmonds, 2009). In their review on the topic, Johnston and Edmonds (2009) suggest that processing of familiar and unfamiliar faces might differ early in the processing stream because with an unfamiliar face, "*we are unable to know which characteristics or image properties will be key to representing the identity of an individual*" (Johnston and Edmonds, 2009, p. 591). For example, an observer might focus on distinctive features as a strategy to later remember an unfamiliar face. In contrast, a face triggering feelings of familiarity might be carefully analyzed to identify remembered aspects of the face (Bruce and Young, 1986). With some creativity, researchers might devise ways to examine the treatment of unfamiliar faces by older adults not only during context recollection but in earlier stages as well. On a related note, it will be important for future investigations to distinguish between the results of laboratory experiments, where the distinction between familiar and unfamiliar stimuli is induced artificially, and the real-life task of face recognition where the context is likely to be more elaborate.

We also suggest that future studies take advantage of modern image manipulation techniques to investigate recognition confusions in older adults. For example, instead of using conjunction faces like those employed in Searcy et al. (1999), future studies could use morphing techniques to systematically vary the resemblance of new faces and old faces encountered in different contexts. One recent study has used this approach to demonstrate that older adults have more difficulty than younger adults in discriminating morphed faces (Lee et al., 2014; see also Hildebrandt et al., 2011). It would be interesting for follow-up studies to employ these techniques in a recognition task to test whether faces with greater similarity yield more false alarms in older adults. It might also be possible to combine morphing techniques with experimental manipulations borrowed from studies on source memory (e.g., Brown et al., 1995; Skinner and Fernandes, 2009; Rahhal et al., 2002; Luo et al., 2007) to examine whether manipulations of the context under which a face was viewed have a differential impact on younger and older adults. Finally, because face recognition deficits in older adults have deleterious implications for eyewitness testimonies, further investigations into techniques that could improve recollecting contextual information via cues to source memories are worth pursuing.

### **Conclusion**

In this review, we surveyed evidence pertinent to our understanding of the processes implicated in age-related face recognition deficits. Our discussion began with the premise that this decline is special because it is not merely a manifestation of general impairments associated with aging (Hildebrandt et al., 2011) and because it does not generalize to recognition of other complex objects (Grady et al., 1994, Experiment I; Boutet and Faubert, 2006; Chaby et al., 2011; Meinhardt-Injac et al., 2014). We then presented a generic framework that served to organize our subsequent review of three hypotheses that have been proposed to explain the face recognition deficits seen in older adults.

The impaired sensory processing hypothesis states that older adults have difficulty recognizing faces because of impairments in low-level perceptual capacities such as acuity or contrast sensitivity. While several studies have established a link between processing of basic spatial frequency information and face recognition abilities in both younger and older adults, some of the evidence reviewed was equivocal. Suggested avenues for future research include mimicking age-related perceptual loss in younger adults, ascertaining the range of spatial frequencies critical for face processing in older adults, and distinguishing between perceptual deficits that produce idiosyncratic *versus* generalized impairments.

While perceptual deterioration is likely to contribute to age-related deficits in face recognition as well as in other cognitive abilities (e.g., Schneider and Pichora-Fuller, 2000), some have suggested that impairments in mechanisms specialized for face recognition must also be at play. The impaired configural processing hypothesis states that older adults have difficulties recognizing faces because of a deficit in encoding holistic and/or second-order information, both having been implicated in face recognition. This hypothesis provides the best match between the suggested vulnerability of face recognition in aging and the special processing style that underlies face recognition. Although the configural information hypothesis has received widespread attention in the literature in the past 10 years from both behavioral and imaging studies, the variety of procedures used and the contradictory effects reported, even sometimes in the same study (e.g., Boutet and Faubert, 2006; Hildebrandt et al., 2010), make it difficult to judge its validity. Nonetheless, it is clear that older adults can, under certain circumstances, encode configural information, suggesting that a failure to encode this type of information is unlikely to be the principal determinant of recognition impairments. Future research in this direction should recruit larger sample sizes to take into account the heterogeneity inherent to the cognitive changes that occur with aging and include the necessary controls to eliminate alternative explanations unrelated to the special processes triggered by faces. Investigations into the conditions under which older adults can, and cannot, extract configural information from faces are also warranted.

The context recollection hypothesis was born out of the finding that older adults exhibit inflated false alarm rates to unfamiliar faces. It maps onto the final stages of the face recognition stream by stating that older adults are more likely to base their recognition decision on perceived familiarity because of difficulties in recollecting contextual information. Research conducted within this framework is promising because it provides an opportunity to translate laboratory findings into the real-world situation of eyewitness testimonies. The evidence reviewed partially supports the context recollection hypothesis, suggesting that older adults recognize faces on the basis of perceived familiarity. Nonetheless, additional research using more modern image manipulation techniques (i.e., morphed images) as well as context reinstatement paradigms is needed to further establish its validity. Future studies should also distinguish between familiar and unfamiliar faces when comparing younger to older adults.

These three explanations are not mutually exclusive and all three may work in concert to result in the observed decline in face recognition in normal aging. Impaired contrast sensitivity, alongside diminished encoding of configural information (in certain conditions), would result in confusion when comparing perceived faces to stored representations in the first step of the face recognition process. As a result of this confusion, new faces are more likely to erroneously resemble stored representations and generate feelings of familiarity, especially because of the large number of faces that have been memorized by older adults (Chaby et al., 2001). Impaired access to contextual information would then prevent older adults from correctly rejecting new faces, leading to the high false alarm rate that is the signature of face recognition deficits in this population.

While the current review focuses on the functional aspects of face recognition deficits in older adults, past research points to a variety of plausible underlying physiological mechanisms. For example, reduced synaptic density (e.g., Hedden and Gabrieli, 2004; Kaup et al., 2011) could lead to reduced activation in low-level areas specialized in coding basic attributes such as contrast sensitivity, as well as in the areas of the face network specialized in processing configural information (Zhang et al., 2012). Deterioration of the hippocampus and the concomitant decline in episodic memory (Dickerson and Eichenbaum, 2010) would impede access to contextual information. These alterations could in turn lead to compensatory mechanisms such as activation in the prefrontal cortex, frontal cortex, and other associative areas (e.g., Reuter-Lorenz and Park, 2010). Finally, decreased efficiency in the different structures implicated in the face recognition process might also lead to *dedifferentiation* in the pattern of activation produced by faces in older adults (e.g., Grady et al., 1994; Lindenberger and Baltes, 1997; Park et al., 2004; Payer et al., 2006). More specifically, the brains of older adults may display diminished activation of the network of areas preferentially activated by faces and/or increased activation of areas implicated in more generic cognitive processes.

We have not attempted an exhaustive review of the literature on face recognition deficits in older adults in this paper and several explanations were omitted, either because they serve best to characterize the nature of the deficit (e.g., reduction in speed of processing: Salthouse, 1996; Pfutze et al., 2002; Rousselet et al., 2010; Rhodes and Anastasi, 2012; own-age bias: Fulton and Bartlett, 1991; Wiese et al., 2012; Verdichevski and Steeves, 2013) or because the proposed mechanisms are actually linked to one of the three hypotheses examined herein (e.g., processing of horizontal information: Narme et al., 2011; Obermeyer et al., 2012; context congruency: Meinhardt-Injac et al., 2014; changes in eye movements: Firestone et al., 2007; Chan et al., 2011). That is not to say that these other accounts should be ignored and in fact, we feel, like others, that the search for a single explanatory cause in aging studies is not likely to be fruitful. Instead, different techniques should be used to cast a broad net of investigation into several possible mechanisms that can then be eliminated or refined in light of the accumulated evidence. We hope that our review of three seemingly disparate and yet promising hypotheses illustrates the promise of this approach for our understanding of both aging and face recognition.

## **Acknowledgments**

The authors would like to thank Dr. Cary Kogan for proofreading.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Boutet, Taler and Collin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Examining age-related shared variance between face cognition, vision, and self-reported physical health: a test of the common cause hypothesis for social cognition

*Sally Olderbak<sup>1</sup>\*, Andrea Hildebrandt<sup>2</sup> and Oliver Wilhelm<sup>1</sup>*

*<sup>1</sup> Universität Ulm, Ulm, Germany, <sup>2</sup> Ernst Moritz Arndt University of Greifswald, Greifswald, Germany*

#### *Edited by:*

*Guillaume A. Rousselet, University of Glasgow, UK*

#### *Reviewed by:*

*Victoria Savalei, University of British Columbia, Canada Rogier Kievit, University of Amsterdam, Netherlands*

#### *\*Correspondence:*

*Sally Olderbak, Universität Ulm, Albert-Einstein-Allee 47, Ulm 89081, Germany sally.olderbak@uni-ulm.de*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 04 May 2015 Accepted: 27 July 2015 Published: 12 August 2015*

#### *Citation:*

*Olderbak S, Hildebrandt A and Wilhelm O (2015) Examining age-related shared variance between face cognition, vision, and self-reported physical health: a test of the common cause hypothesis for social cognition. Front. Psychol. 6:1189. doi: 10.3389/fpsyg.2015.01189*

The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented, with some research attributing this shared age-related decline to a single common cause (e.g., an aging brain). We evaluate the extent to which the common cause hypothesis predicts associations of vision and physical health with social cognition abilities, specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all, age-related shared variance, with domain-specific aging mechanisms also evident.

Keywords: face perception, common cause hypothesis, fluid intelligence, immediate and delayed memory, MIMIC model, physical health, vision, face memory

### Introduction

The decline in fluid cognitive abilities across the adult lifespan, including particular components like mental speed, fluid intelligence, and working memory, is a well-known phenomenon (see Lövdén and Lindenberger, 2005, for a review). The decline in fluid abilities is also associated with a decrease in other cognition-related indicators such as sensory functions (i.e., vision and hearing), and physical indicators like blood pressure and respiratory functioning. While it was initially suggested that worsening cognition could be fully mediated by deteriorating sensory functions (Lindenberger and Baltes, 1994), this could not be confirmed (e.g., Anstey et al., 2001). The decline in vision does not cause the decrease in cognitive performance (Lindenberger et al., 2001), and although there is a strong relation between both functions, their age-related declines are only moderately linked (Lindenberger and Ghisletta, 2009).

An alternative explanation for the proposed downward slopes in fluid cognition, sensory functions, and physical health is that each of these factors has a unique negative relation with age. This causes each function to decline and to appear subject to a general factor, which, however, is essentially merely a statistical artifact (Salthouse et al., 1998). This explanation was supported by research that showed controlling for age reduced the relation between these variables, indicating the common cause to be more of a statistical artifact than a genuine overall factor. However, this explanation has not been supported elsewhere (e.g., Christensen et al., 2001).

As an additional perspective, many researchers propose that the decline in fluid cognitive abilities, sensory functions, and physical health indicators is indeed due to a common cause, such as the aging brain, central nervous system, or the aging body as a whole, in what is referred to as the common cause hypothesis. A strict interpretation of the common cause suggests that a single common factor explains *all* age-related shared variance between the (latent) variables of interest, particularly fluid cognitive abilities, sensory functions, and physical health indicators. Importantly, this implies that no age effects are expected for first-order factors that are indicators of the common cause because age differences are completely explained by the higher-order common cause factor (cf. Borsboom et al., 2003). However, tests of the common cause hypothesis do not always take this strict interpretation, with researchers finding support for the common cause hypothesis even though additional relations are needed between age and domain-specific first-order factors indicating the postulated common cause (e.g., Christensen et al., 2001). That a common factor explains most of the age-related difference or decline in cognitive ability (e.g., fluid intelligence, working memory), sensory functions, and physical health is supported (e.g., Baltes and Lindenberger, 1997; Anstey, 1999; Anstey et al., 2001, 2003; Christensen et al., 2001; Li and Lindenberger, 2002; Valentijn et al., 2005), suggesting that the functions above are related to one another and decline with increasing age as a group. In addition, research suggests that the relations between indicators of fluid cognition, sensory functions, and physical health increase with increasing age (e.g., Baltes and Lindenberger, 1997; Li and Lindenberger, 2002).

The present paper evaluates the common cause hypothesis, and competing models, with respect to the relations between vision, physical health, and face cognition, a basic facet of social cognition. We will test two structures: the first allows all constructs to covary, and the second models a common factor upon which all factors load. Each structure will be tested with and without controlling for the direct effects of age on vision, physical health, and cognitive ability, and each structure will be tested controlling for fluid cognitive abilities in face cognition. All models will be presented, followed by an evaluation of what the models explain regarding the relations between vision, physical health, and face cognition.

### Face Cognition

Age is associated with a stronger decline in fluid abilities (i.e., memory, attention) than crystallized abilities (i.e., basic knowledge; Baltes and Baltes, 1990; Singer et al., 2003b). Similarly, physical health has a stronger relation with fluid than crystallized abilities (Bergman and Almkvist, 2013); in a study examining the effects of a mnemonic training program on very old individuals, participants showed an improvement in fluid but not crystallized abilities (Singer et al., 2003a). The common cause hypothesis is typically evaluated with fluid abilities (i.e., working memory), but is rarely applied to more specific fluid cognitive processes. In particular, it is of interest to evaluate the extent to which the common cause hypothesis explains variance in cognitive ability factors previously shown to be distinct from traditionally established fluid cognitive abilities. As a distinct ability factor, we refer to face cognition specifically, which includes two distinct abilities: the ability to perceive faces and the ability to remember faces (Wilhelm et al., 2010). Face cognition is considered an integral component in daily interactions and a key factor of social cognition (Beauchamp and Anderson, 2010). Face cognition has been identified as an ability that is distinct from, yet related to, fluid cognitive abilities (including working memory and reasoning, object cognition, and immediate and delayed memory; e.g., Wilhelm et al., 2010). This distinction remains present across the lifespan, with relations between age and face cognition separable from the relations between age and general cognitive abilities (Hildebrandt et al., 2011). Because of their relations with general cognitive abilities, face perception and face memory are sometimes modeled as nested factors under a general cognitive ability factor, in order to capture specific variance of face perception and face memory (e.g., Hildebrandt et al., 2011).
Further, both the ability to perceive faces and the ability to remember faces can be considered fluid abilities. Face perception involves the identification of particular aspects of a face, while face memory involves perceptual processing, memory encoding, and memory access. Both have been modeled as indicators of a broad fluid intelligence factor (Hildebrandt et al., 2011; Kiy et al., 2013).

While we know that face cognition abilities decline with age, it is unknown how this decline relates to the decline in sensory functions or physical health. Given that both face perception and face memory are fluid abilities, we would expect the common cause hypothesis – if it holds – to also apply to these factors. That means we would expect that face cognition is related to sensory functions and physical health and ultimately to a common factor indicated by cognitive ability latent variables, sensory functions, and physical health. However, the distinction of face cognition from general cognition indicates that this may not be true, and that face cognition may have distinct relations to vision and physical health, especially after we control for the shared variance of face perception and face memory with general cognitive abilities. Thus, it is important to model face cognition in a way that controls for the general cognitive component, consequently testing the relations of age, sensory functions, and physical health with specific face perception and face memory variance only – which is new in the literature. While it is expected that face cognition will be related to sensory functions and physical health, as it has been found with general cognitive abilities and working memory, this has not been tested and as of yet is unknown.

### An Aging Brain

An aging brain is typically considered the primary factor identified by the common cause hypothesis (e.g., Baltes and Lindenberger, 1997; Li et al., 2000) as responsible for the age-associated changes in cognition and sensory functions (e.g., Weale, 1982). The ability to perceive faces is linked with activation in the inferior occipital gyrus, the lateral fusiform gyrus, and the superior temporal sulcus (Kanwisher et al., 1997; Haxby et al., 2000; Haxby and Gobbini, 2011), each of which shows a decline in gray matter density and volume with increasing age (Raz et al., 1997; Sowell et al., 2003). Also, performance in face memory tasks, in addition to verbal memory tasks, is associated with left prefrontal cortical regions, which also are associated with a decline in volume with age (Hess, 2005).

Face perception is one of many abilities that, in younger adults, are primarily linked with activity of the ventral temporal cortex, which then shifts to the frontal regions, with reduced activation of the occipital lobe in older adults. This phenomenon is referred to as the posterior–anterior shift in aging (PASA; cf. Grady et al., 1994; Davis et al., 2008). PASA is supported by research showing that when viewing faces, houses, pseudo words, or chairs, young individuals had a high degree of neural specificity in the ventral visual cortex compared with older adults who had less neural specificity, indicating that the utilization of the fusiform gyrus for face perception was stronger for younger than older adults (Park et al., 2004). These findings were replicated by Payer et al. (2006) who also found increased activation in the middle and inferior frontal cortex in older adults.

There is ample evidence of the negative relation between age and face memory (e.g., Smith and Winograd, 1978; Grady et al., 1995; Anstey et al., 2002; de Frias et al., 2006; Hildebrandt et al., 2010); however, there is considerably less evidence regarding the relation of age with face perception. One exception is the study by Hildebrandt et al. (2011) who found that face perception, controlled for general cognitive functioning, did not show linear but instead negative quadratic age-related differences. That is, face perception abilities, when controlled for shared variance with general cognitive functioning, remained comparable between persons aged 18–60 but older persons performed worse.

### Vision

Lindenberger and Baltes (1994) found that both vision and hearing declined with increasing age and both were positively related with cognitive abilities. Vision has a stronger relation with cognition than hearing and is typically easier to measure. Common measures of vision include visual acuity, which refers to the spatial resolution of what one can see at high contrast (i.e., the sharpness of one's vision) and is often measured with the Snellen test, and contrast sensitivity, which refers to the ability to identify certain spatial frequencies at low contrast. Researchers have found that even when assessing visual acuity and contrast sensitivity in individuals with corrected vision (i.e., individuals using glasses or contacts), both functions show an age-related decline (e.g., Owsley and Sloane, 1987).

Vision is typically considered to be associated with an age-related decline in line with the common cause hypothesis (e.g., Christensen et al., 2001). The results regarding the relation of vision with face cognition are mixed. When controlling for age, only contrast sensitivity, and not visual acuity, was identified as a significant predictor of the perception of faces (Owsley and Sloane, 1987). Anstey et al. (2002) also found that when controlling for age, visual acuity was unrelated to face memory. Pfütze et al. (2002) found no relation between the speed of face recognition and contrast sensitivity.

### Physical Health

While others typically included specific, direct measures of physical health, such as grip strength (e.g., Christensen et al., 2001), we chose a more global measure: self-reported ratings of physical health measured by the SF12. The SF12 physical health scale can reliably distinguish between clinical groups, disease severity, and (when assessed) describe recovery trajectories for individuals with rheumatoid arthritis and osteoarthritis (Hurst et al., 1998; Gandhi et al., 2001), back pain (Luo et al., 2003), retinal disease (Globe et al., 2002), HIV (Delate and Coons, 2000), acute myocardial infarction, and unstable angina (Failde et al., 2010). We chose this measure because it offers a reliable and valid assessment of general health, and instead of focusing on specific physiological measures, we have a general estimate of overall health. However, the scale is based on self-reports and hence only a proxy for physiological measures.

Self-reported health is positively related with general cognitive functioning (Zelinski and Gilewski, 2003). Individuals without mild cognitive impairment, broadly defined, report fewer subjective health problems than those with mild cognitive impairment (Frisoni et al., 2000), and a meta-analysis of intervention programs showed that cognitive functioning in older individuals who engaged in physical fitness activities was better than in inactive control participants (Colcombe and Kramer, 2003). The effects of health on fluid cognitive ability, however, differ in magnitude depending on the type of cognitive function assessed. For example, Bergman and Almkvist (2013) found that physical health fully mediated the effect of age on fluid intelligence, but not on crystallized intelligence. Colcombe and Kramer (2003) found that exercise had a stronger impact on performance in executive tasks, that is, tasks that require planning and inhibition, when compared with speeded and visuospatial tasks. Comijs et al. (2002) found that poor health (which they defined as the number of chronic diseases) and age predicted bad memory, even after controlling for general cognitive functioning.

The relations between physical health and face cognition (including perception and learning/recognition) are not well established. One exception is a study by Bergman et al. (2007), who found a stronger relation of health with face memory than between age and face memory, with three health variables predicting 39% of the variance in face memory performance. This study suggests health is an important variable that is related with face cognition. However, this study did not control for age-related decline in general cognitive ability, thus they did not investigate specific effects of health on face memory that were not explainable through health effects on general cognitive functioning. In addition, the study was based on a relatively small sample size (*N* = 118 with persons ranging in age from 26 to 91), introducing a larger SE, and the results are not disattenuated for unreliability, suggesting the effect sizes might be higher. Nevertheless, we expected a positive relation between physical health and both face cognition factors. Furthermore, because about half of the variance in face cognition performance is explainable through general cognitive abilities (Wilhelm et al., 2010), we emphasize that the health effects on face cognition-specific variance need to be investigated after controlling for the shared variance with general cognitive abilities.

### Current Study

This paper presents a reanalysis of previously published data (see Hildebrandt et al., 2010, 2011, 2013). The relationships of face cognition abilities with health, which are the primary focus of this paper, have not been considered in any of the previous studies based on this dataset. With structural equation modeling, we examined whether general physical health and indicators of vision are positively related with the abilities to perceive and remember faces, both before and after controlling for the effects of age and shared variance with general cognitive ability. It should be noted that, after controlling for variance due to general cognitive ability, face memory shows both a linear and a quadratic effect of age, while face perception only shows a quadratic effect of age (Hildebrandt et al., 2011). For modeling simplicity, we will include only a linear effect of age, which, in the absence of the quadratic term, should capture the relative decline of face perception and face memory with increasing age.

This paper improves on the methodological shortcomings of previous studies by including multiple measures of cognitive abilities with a relatively large sample size, and the data are modeled at the level of latent variables, which are adjusted for measurement error and the specificity of the assessment method. Furthermore, we utilize structural equation modeling which allows us to model complex relations between the constructs and health-related variables.

### Materials and Methods

### Sample

Participants were 443 individuals (51% female), comprising young (*n* = 148, ages 17–35, *M*age = 24.5, SD = 4.7), middle-aged (*n* = 147, ages 36–64, *M*age = 49.0, SD = 7.9), and older individuals (*n* = 148, ages 65–88, *M*age = 72.0, SD = 4.7), with the sample on average 48.5 years old (SD = 20.3). The educational background of the sample was heterogeneous, with participants who had not completed a high school degree (8%), those who had completed high school (48%), and those who had some form of college or university education (44%). All older participants performed above the cut-off score of 24 on the Mini-Mental State Examination test (Folstein et al., 1975), indicating they did not show any signs of dementia.

### Procedure and Measures

Participants completed 5 h of cognitive testing during two sessions, separated by 5–9 days. Each cognitive test came with a practice trial, during which the participants received feedback; however, no feedback was given during the actual testing trials. All tasks were administered using Inquisit 2.0© with 17-inch color monitors, 85 Hz refresh rate, and 1280 × 1024 resolution. Self-report measures of health were completed at home between the two testing sessions. To this end, participants were given a printed questionnaire that they were asked to complete at home and return at the second testing session. This study received approval from the Humboldt-Universität zu Berlin Psychology Department ethics committee, and written informed consent was obtained from every participant.

### Health Measures

### *SF12 – physical health*

The SF-12 is the 12-item short-form version of the SF-36 (Ware et al., 1996). Half of the SF-12 items are for the assessment of physical health status, referred to as the Physical Health scale, which is composed of four subscales: Physical Functioning, Role Physical, Bodily Pain, and General Health. An example item for the Bodily Pain subscale is "During the past week, how much did pain interfere with your normal work including both outside the home and housework?" Response options varied depending on the question (e.g., response options for the aforementioned example item were "Extremely," "Quite a bit," "Moderately," "A little bit," and "Not at all"). The internal consistency of all Physical Health items was acceptable (α = 0.831). The Physical Health scale can reliably differentiate between groups with adequate and poor physical health (Ware et al., 1996). According to the authors of the test, subscale-level scores are created by summing the items within that subscale, and one does not need to take into account any weighting scheme (Lim et al., 2008; Montazeri et al., 2009).
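For readers less familiar with the internal consistency coefficient reported above, the following is a minimal sketch of how Cronbach's α is commonly computed from item-level scores. The function name and the example responses are hypothetical illustrations; they are not the study's data and do not reproduce the reported α = 0.831.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of lists: one inner list of scores per item,
    with respondents in the same order in every inner list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to three items (not study data).
example_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.3f}")
```

Higher values indicate that the items covary strongly relative to their individual variances, which is the sense in which the scale's items are described as internally consistent.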

### *Visual acuity and contrast sensitivity*

Visual acuity and contrast sensitivity were both assessed with the Freiburg Vision Test (FrACT; Bach, 2007). Both variables were assessed with the best possible optical correction, when applicable, such as the use of glasses or contacts. Visual acuity was measured according to Snellen's fraction decimal unit and contrast sensitivity was measured by averaging the luminance of the bright and dark parts of optotypes for those trials where the participant answered correctly. Better vision is indicated by higher visual acuity scores, indicating sharper vision, and higher contrast sensitivity values, indicating a better sensitivity to contrast.

### Face Perception (FP) Tasks

All face perception and face memory tasks were developed by Herzmann et al. (2008).

### *Sequential matching of part-whole faces—conditions part (FP 1) and whole (FP 2)*

In this task, participants were first presented with a target face, followed by a blank screen with an X in the middle, followed by two pictures. The two pictures were either (1) a part of the target's face (e.g., nose) and (2) the same part from another person's face (part condition, FP1), or (1) the target's original whole face and (2) the target's whole face with a particular part (e.g., nose) replaced by that feature from another face (whole condition, FP2). The task included 30 trials, and participants' scores were based on how often they correctly identified the original whole face or the original part of the target's face.

### *Simultaneous matching of spatially manipulated faces—conditions upright (FP 3) and inverted (FP 4)*

In this task participants were presented with a target face, followed by a blank screen with an X in the middle, followed by the same target face either in its original form or with the spatial relation between facial features (e.g., eyes and nose) altered. Half of the trials show faces upright (upright condition, FP3) and the other half presents them upside down (inverted condition, FP4). The task includes 60 trials and participants' scores were based on how often they correctly indicated whether the two faces of a given trial were identical or not.

### *Facial resemblance (FP 5)*

In this task participants were presented with a target face from a three-quarter view in the top half of the screen and two faces in the bottom left and right half of the screen. The bottom two faces are morphs of the target face, containing 20 or 40% of the target face. The task includes 48 trials and participants' scores were based on how often they correctly identified the morphed face that contained the higher percentage of the target face.

### Face Memory (FM) Tasks

### *Learning and immediate memory of faces (FM 1)*

This task has three phases – (1) study phase (45 s), (2) unrelated task (2 min), (3) recognition phase (unlimited length) – and the sequence is presented two times, each time with new faces. During the study phase, participants were presented with 15 faces and asked to remember each one. During the subsequent recognition phase they saw each face presented during the study phase; individual faces of the memory set were presented together with a distractor face. During each of these trials participants received feedback regarding whether or not they were correct at identifying the targets. The recognition phase included five runs, each time with new distractor faces, so participants saw the target faces five times during the recognition phase (the recognition phase included 75 trials). Participants' scores were based on how many times they correctly identified target faces during the recognition trials.

### *Delayed recognition of learned faces 1 and 2 (FM 2 and FM 3)*

This task is a continuation of the *Learning and immediate memory of faces* (FM1) task. Participants repeated the recognition phase of FM1, with new distractor faces. This was done at the end of the first test session, about 2.5 h after the initial learning phase (FM2) and at the beginning of the second test session (FM3). Both FM2 and FM3 have 30 trials each and scores were based on how often participants correctly identified the target face.

### *Eyewitness testimony (FM 4)*

In this task participants were presented with two faces, one of which was seen during an earlier face cognition speed task. This task consists of 46 trials and participants' scores were based on how often they correctly identified which face they had seen before.

### Working Memory (WM) and Reasoning (REA) Tasks

### *Memory Updating (WM 1)*

In this task, adapted from Oberauer et al. (2000), participants were presented with a 3 × 3 grid with single-digit numbers presented consecutively in each cell. Participants were asked to memorize those numbers and then, in a series of visual instructions, arrows pointing upward or downward appeared in each cell requiring participants to mentally update the numbers in the cells by either adding or subtracting 1, respectively. At the end of these instructions, participants were asked to type in the new numbers into each cell. This process was done 18 times and participants' scores were based on how many correct responses they had provided.

### *Rotation Span (WM 2)*

In this task, adapted from Kane et al. (2004), participants were presented with a sequence of arrows that they were to memorize (specifically the arrows' length and direction), while simultaneously completing a secondary task where they decided whether a letter was presented in mirror form or not. Participants' scores were based on the proportion of correctly recalled arrow positions and lengths.

### *Raven's Advanced Progressive Matrices (REA 1)*

Sixteen trials, from Raven et al. (1979), were presented (five trials differed between the participants, with older adults receiving easier trials compared to young and middle-aged participants). In each trial, a 3 × 3 matrix of symbols was presented, with the symbol in the bottom right missing. From eight options, participants selected the symbol that logically completed the matrix. One-third of the items differed between the participants depending on their age. Older participants completed only 10 of the 15 difficult items, compared with the younger and middle-aged adults who completed all 15 difficult items. Because the older participants worked on different items, the participants' scores were based on a linked 2-Parameter Logistic Model.
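As a brief illustration of why a two-parameter logistic (2PL) model allows scores to be compared when participants work on partly different items, the sketch below shows the standard 2PL item response function. The function name and all parameter values are hypothetical; the source does not report item parameters or the details of the linking procedure.

```python
import math


def two_pl_probability(theta, discrimination, difficulty):
    """Probability of a correct response under a 2PL model.

    theta          -- person ability on the latent scale
    discrimination -- item slope (how sharply the item separates abilities)
    difficulty     -- item location (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical items: an easier and a harder one with equal discrimination.
easy_item = {"discrimination": 1.2, "difficulty": -0.5}
hard_item = {"discrimination": 1.2, "difficulty": 1.0}

for theta in (-1.0, 0.0, 1.0):
    p_easy = two_pl_probability(theta, **easy_item)
    p_hard = two_pl_probability(theta, **hard_item)
    print(f"theta={theta:+.1f}: P(easy)={p_easy:.2f}, P(hard)={p_hard:.2f}")
```

Because each item carries its own difficulty and discrimination, ability estimates are expressed on one latent scale even when the item sets differ, provided some items are shared across groups to link the scales.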

### Immediate and Delayed Memory (IDM) Tasks

The following three tasks are based on the Wechsler Memory Scale (Härting et al., 2000).

### *Verbal memory – immediate (IDM 1) and delayed (IDM 2)*

This task consisted of three sequences of a learning and recall phase, containing the same eight word pairs each time, but ordered differently. During the learning phase, participants heard the eight word pairs that they should memorize (we used eight instead of the original six trials to avoid ceiling effects in the younger sample). During the recall phase, immediately following each learning phase, participants heard one word from the word pair and were asked to type the second word of the word pair (condition immediate, IDM1). About 1.5 h later, participants were again asked to complete the recall phase (condition delayed, IDM2). Participants' scores were based on how often they typed in the correct word pair.

### *Name memory – immediate (IDM 3) and delayed (IDM 4)*

This task has the same structure as the verbal memory task. However, instead of pairs of words, participants memorized written first and last name combinations and recalled the last name when the first name was presented in written form. Participants were asked to recall the last name both immediately (condition immediate, IDM3) and 1.5 h later (condition delayed, IDM4). Scores were based on how often participants typed in the correct word pair.

### *Address memory – immediate (IDM 5) and delayed (IDM 6)*

The structure of this task is the same as that of the name memory task, but instead of pairs of names, participants read and memorized street names and corresponding house numbers and recalled the house numbers when the street name was presented on the screen. Participants were asked to recall the house numbers immediately (condition immediate, IDM5) and again 1.5 h later (condition delayed, IDM6). Participants' scores were based on how often they typed in the correct house number.

### Results

### Missing Data

All data were visually screened for outliers in univariate and bivariate distributions and outliers were set to missing. Specifically, values more than 3.5 standard deviations from the mean and scale-level performance scores that were below guessing probability were set to missing. All missing data were imputed with the Expectation–Maximization algorithm available in SPSS. The accuracy tasks were missing 45 values out of 8064 (1% of the observations; see Hildebrandt et al., 2011 for details). The SF12 was missing 75 values out of 5316 (1% of the observations). Visual acuity and contrast sensitivity were missing for 20 persons (5% of the sample), thus 40 data points out of 886. Please see Appendix **Table A1** for the final covariance matrix.
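The standard-deviation rule described above can be sketched as follows. This is only an illustration of the 3.5 SD criterion, assuming the mean and SD are computed from the observed values of a single variable; it does not cover the visual screening, the below-guessing rule, or the Expectation–Maximization imputation performed in SPSS. The function name and example values are hypothetical.

```python
from statistics import mean, stdev


def screen_outliers(values, cutoff=3.5):
    """Return a copy of `values` with extreme scores set to None (missing).

    A score is treated as an outlier when it lies more than `cutoff`
    sample standard deviations from the mean of the observed scores.
    Entries that are already None are left untouched.
    """
    observed = [v for v in values if v is not None]
    m, s = mean(observed), stdev(observed)
    if s == 0:
        return list(values)
    return [
        None if v is not None and abs(v - m) > cutoff * s else v
        for v in values
    ]


# Hypothetical scores: thirty typical values and one extreme value.
scores = [9, 11] * 15 + [50]
screened = screen_outliers(scores)
n_missing = sum(v is None for v in screened)
print(f"{n_missing} of {len(scores)} values set to missing")
```

Note that with very small samples a single value cannot exceed 3.5 sample standard deviations, so the rule is only meaningful for samples of reasonable size such as the one used here.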

### Measurement Models

All analyses were performed within Mplus Version 7 and model fit was evaluated based on standards suggested by Hu and Bentler (1999) with SRMR ≤ 0.08, RMSEA ≤ 0.06, CFI ≥ 0.95, and TLI ≥ 0.95 indicating the model is a good fit to the data and SRMR ≤ 0.10, RMSEA ≤ 0.08, CFI ≥ 0.90, and TLI ≥ 0.90 indicating the model is an acceptable fit to the data. The chi-square was evaluated according to the ratio of the chi-square value to the degrees of freedom, with ratios of 2.5:1 or lower indicating acceptable fit. In order to investigate the effects of age on cognition we employed two measurement model structures. The first tested simple measurement models, with each construct indicated by its specific cognitive tasks. The second employed a nested model structure with a single factor representing general cognitive ability and nested face perception, face memory, and immediate and delayed memory. In this structure, the general cognitive ability factor was indicated by the working memory and reasoning tasks, with each indicator of face perception, face memory, and immediate and delayed memory also loading on the general cognitive ability factor. That general cognitive ability is based on these measures indicates that the factor can be considered a measure of fluid cognition. In both types of measurement models, we allowed the residuals between tests (i.e., the indicators) that shared similar assessment characteristics to covary: the upright and inverted conditions of the simultaneous matching of spatially manipulated faces task, and the verbal and learning recognition portions within each of the immediate and delayed memory tasks.
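The fit criteria listed above can be expressed as a simple decision rule. The sketch below is illustrative only: the function names are hypothetical, the example index values are invented, and requiring all four indices to meet their cutoffs at once is an assumption, since the text does not state how mixed results across indices are to be combined.

```python
def classify_fit(srmr, rmsea, cfi, tli):
    """Classify model fit using the cutoffs stated in the text.

    Assumption: a label applies only if all four indices meet
    the corresponding cutoff simultaneously.
    """
    if srmr <= 0.08 and rmsea <= 0.06 and cfi >= 0.95 and tli >= 0.95:
        return "good"
    if srmr <= 0.10 and rmsea <= 0.08 and cfi >= 0.90 and tli >= 0.90:
        return "acceptable"
    return "poor"


def chi_square_ratio_ok(chi_square, df, max_ratio=2.5):
    """True when the chi-square to degrees-of-freedom ratio is at most 2.5:1."""
    return chi_square / df <= max_ratio


# Hypothetical index values (not taken from the study).
print(classify_fit(srmr=0.03, rmsea=0.04, cfi=0.97, tli=0.96))
print(classify_fit(srmr=0.09, rmsea=0.07, cfi=0.93, tli=0.91))
print(chi_square_ratio_ok(chi_square=10.0, df=5))
```

In practice these cutoffs are rules of thumb, and a model may satisfy some indices while narrowly missing others, so the classification should be read as a summary rather than a strict test.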

The measurement model for face perception fit the data well [χ<sup>2</sup> (4) = 4.63, *p* = 0.33, SRMR = 0.015, RMSEA = 0.019, CFI = 0.998, TLI = 0.996, AIC = −3835.95, BIC = −3770.45], with loadings moderate to strong (λs ranged from 0.474 to 0.664). The measurement model for face memory also fit the data well [χ<sup>2</sup> (2) = 3.52, *p* = 0.17, SRMR = 0.008, RMSEA = 0.041, CFI = 0.999, TLI = 0.996, AIC = −3908.31, BIC = −3859.19], with all loadings strong (λs ranged from 0.690 to 0.918). And the measurement model for immediate and delayed memory fit the data well [χ<sup>2</sup> (6) = 12.33, *p* = 0.06, SRMR = 0.010, RMSEA = 0.049, CFI = 0.996, TLI = 0.991, AIC = −2279.86, BIC = −2193.90], with all loadings strong (λs ranged from 0.624 to 0.865). The fluid cognitive ability measurement model was indicated by only three indicators, thus exhausting all degrees of freedom so its fit cannot be tested in a separate measurement model; loadings were strong (λs ranged from 0.758 to 0.824). Finally, the measurement model with all cognitive ability factors modeled simultaneously in a nested structure had acceptable fit to the data [χ<sup>2</sup> (116) = 313.74, *p* < 0.05, SRMR = 0.062, RMSEA = 0.062, CFI = 0.958, TLI = 0.945, AIC = −7441.42, BIC = −7142.59], with general cognitive ability factor loadings moderate to strong (λs ranged from 0.317 to 0.790), weak to strong for the nested face perception factor (λs ranged from 0.170 to 0.538), moderate to strong for the nested face memory factor (λs ranged from 0.497 to 0.690) and moderate to strong for the nested immediate and delayed memory factor (λs ranged from 0.342 to 0.585).
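The fit criteria stated in this section can be written as a simple decision rule. The sketch below is only a restatement of those cutoffs for illustration; the function names `classify_fit` and `chi_square_ratio` are hypothetical and are not part of Mplus.

```python
def classify_fit(srmr, rmsea, cfi, tli):
    """Classify model fit using the cutoffs stated in the text."""
    if srmr <= 0.08 and rmsea <= 0.06 and cfi >= 0.95 and tli >= 0.95:
        return "good"
    if srmr <= 0.10 and rmsea <= 0.08 and cfi >= 0.90 and tli >= 0.90:
        return "acceptable"
    return "poor"


def chi_square_ratio(chi_square, df):
    """Ratio of the chi-square value to its degrees of freedom."""
    return chi_square / df
```

Applied to the indices reported above, `classify_fit(0.015, 0.019, 0.998, 0.996)` returns `"good"` for the face perception model and `classify_fit(0.062, 0.062, 0.958, 0.945)` returns `"acceptable"` for the nested model, in line with the descriptions in the text.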

Vision was modeled as a single latent variable indicated by visual acuity and contrast sensitivity. Because the construct was indicated by only two indicators, both variables were standardized and the loadings were equated (both loadings were 0.811). Again, this model is just identified.
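As a brief worked illustration of why this model is just identified: under the assumptions that both indicators are standardized, the factor variance is fixed to 1, and the two loadings are equated to a common value $\lambda$, the single observed correlation $r_{12}$ between visual acuity and contrast sensitivity determines the loading. The implied correlation below is derived from the reported loading and is not a value taken from the data.

$$
r_{12} = \lambda^2 \quad\Rightarrow\quad \lambda = \sqrt{r_{12}}, \qquad 0.811^2 \approx 0.658
$$

Three sample moments (two variances and one covariance) are matched by three free parameters (one common loading and two residual variances), leaving zero degrees of freedom.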

Finally, self-reported physical health was modeled based on the latent factor structure by Montazeri et al. (2009). Here, one latent variable, representing physical health, was indicated by all four subscales. The model fit was comparable to that described by Montazeri et al. (2009): χ<sup>2</sup> (2) = 7.95, *p* < 0.05, RMSEA = 0.082, CFI = 0.991, TLI = 0.972, with all loadings being moderate to strong in magnitude (λs ranged from 0.573 to 0.828). The RMSEA is considered poor fit by Hu and Bentler (1999), mediocre by MacCallum et al. (1996), and good fit by Steiger (1989). The RMSEA can be inflated when a model has incorrectly omitted a single covariance between residuals (Savalei, 2012). An examination of the modification indices suggested that the addition of a covariance between the residuals of Role Physical and Bodily Pain, which was weak in magnitude (*r* = 0.278, *p* < 0.05), resulted in a lower RMSEA value [χ<sup>2</sup> (1) = 0.22, *p* = 0.64, SRMR = 0.003, RMSEA = 0.000, CFI = 1.000, TLI = 1.007; note: the TLI can fall out of the range of 0–1; Kline, 2005]. However, we decided against including this covariance because it was not specified a priori (Steiger, 1990) or postulated in the model by Montazeri et al. (2009), and excluding this covariance should not impact the basic correlational pattern between latent factors (Newcomb and Bentler, 1988).

#### TABLE 1 | Model fit from restricted factor model (RFM) measurement invariance analyses.

∗*p* < *0.05; L, Loglikelihood value; SCF, scaling correction factor; FP, number of free parameters; AIC, Akaike's information criterion; BIC, Bayesian information criterion.*

#### Measurement Invariance

We employed restricted factor models (RFMs), a special condition of latent moderated structures analysis (Klein and Moosbrugger, 2000), where we included a measured variable as a moderator of measurement and structural model parameters to test measurement invariance across age. RFM works by creating an interaction variable between age and the moderator of interest (e.g., self-reported physical health) and regressing on this interaction term those variables that are predicted by age and whose age effect is suspected to be moderated. RFM allows us to test the effect of moderators in a latent variable context while keeping the components of the interaction variable continuous. RFM analyses, however, do not come with established fit indices such as the RMSEA; instead, they provide the −2 Log Likelihood, AIC, and BIC. The models are therefore usually first computed with just the main effects, and no interaction variables, with Maximum Likelihood estimation to establish the fit of the model. Then, the model is re-estimated within the RFM framework, to provide a baseline of model fit without the interaction variables, followed by a third model estimated within the RFM framework that includes the interaction variables. Nested models in RFM are compared based on likelihood ratio tests. It should be noted that RFM analyses cannot provide standardized estimates of path coefficients; instead, we present *t* values.
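The idea of an age-moderated loading can be illustrated with a small simulation. This is a deliberately simplified sketch and not the latent moderated structures estimation performed in Mplus: it assumes the factor scores are known, uses ordinary least squares, and all names and numeric values (`simulate_moderated_loading`, `lam0`, `lam1`, the noise level) are hypothetical.

```python
import numpy as np


def simulate_moderated_loading(n=20000, lam0=0.6, lam1=0.2, seed=0):
    """Simulate one indicator whose loading changes linearly with age.

    The indicator is generated as y = (lam0 + lam1 * age_c) * eta + e.
    Regressing y on eta and on the product eta * age_c recovers the
    baseline loading (lam0) and the age moderation of that loading (lam1).
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)      # factor scores, treated as known here
    age_c = rng.standard_normal(n)    # centered age (standardized for simplicity)
    y = (lam0 + lam1 * age_c) * eta + 0.5 * rng.standard_normal(n)

    # Design matrix: main effect of the factor and the factor-by-age product.
    X = np.column_stack([eta, eta * age_c])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])
```

A non-zero coefficient on the product term corresponds to a loading that changes with age, which is the pattern the interaction term in an RFM is meant to detect.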

The measurement invariance of face perception, face memory, immediate and delayed memory, and general cognitive ability across age was previously established and the results are presented in Hildebrandt et al. (2011). The measurement invariance of physical health across age was estimated with RFM analyses. First, the measurement model of physical health was re-estimated within the RFM framework with age as a direct predictor of physical health (model PH1; see **Table 1** for model fit). Then, an interaction term between physical health and centered age was created, and all indicators of physical health were regressed on this interaction (model PH2). Three of the four loadings onto the interaction term were statistically significant, indicating that the loadings of those indicators on physical health change with age (see **Table 2**). Thus, the relations of the subscales Physical Functioning, Role Physical, and Bodily Pain with the latent factor physical health increase with increasing age.

To test whether the inclusion of this interaction term significantly improved fit, we estimated χ<sup>2</sup> values from the earlier model (PH1) where the interaction term was not included. Because both models were estimated within the RFM framework, we used the following formula (e.g., Muthén and Muthén, 2015):

$$
\Delta\chi^2 = \frac{2\,(L_{\text{modelB}} - L_{\text{modelA}})}{\left[(\text{SCF}_{\text{modelA}} \cdot \text{FP}_{\text{modelA}}) - (\text{SCF}_{\text{modelB}} \cdot \text{FP}_{\text{modelB}})\right] / (\text{FP}_{\text{modelA}} - \text{FP}_{\text{modelB}})}
$$

where L is the log-likelihood, SCF is the scaling correction factor, and FP is the number of free parameters. The χ<sup>2</sup> difference between the two models was statistically significant, indicating that the inclusion of the moderation effects of age significantly improved model fit. Three of the four loadings were statistically significant, indicating that the relations of those indicators with the central construct of physical health, and essentially with each other, increase with age. This suggests that metric invariance is partly supported.
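The scaled difference test described above is straightforward to compute by hand from the log-likelihood (L), scaling correction factor (SCF), and number of free parameters (FP) of each model. The following is a minimal sketch, assuming model A is the more restricted model (e.g., PH1) and model B the less restricted one (e.g., PH2); the function name and the numeric values in the usage note are hypothetical and are not taken from **Table 1**.

```python
def scaled_chi_square_difference(l_a, scf_a, fp_a, l_b, scf_b, fp_b):
    """Scaled chi-square difference between nested models A and B.

    l_*   : log-likelihood of each model
    scf_* : scaling correction factor of each model
    fp_*  : number of free parameters of each model
    Returns the test statistic and its degrees of freedom.
    """
    if fp_a == fp_b:
        raise ValueError("Models must differ in their number of free parameters.")
    # Difference-test scaling correction (denominator of the formula).
    correction = (scf_a * fp_a - scf_b * fp_b) / (fp_a - fp_b)
    delta_chi_square = 2.0 * (l_b - l_a) / correction
    df = fp_b - fp_a
    return delta_chi_square, df
```

With hypothetical values L = −1000 and FP = 10 for model A, L = −990 and FP = 14 for model B, and both scaling correction factors equal to 1, the correction term is 1 and the statistic reduces to the ordinary likelihood ratio test value of 20 on 4 degrees of freedom.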

#### Structural Models

### Models 1A and 1B – Covariances between Cognitive Ability (Including Face Perception and Memory), Vision, and Physical Health

First, we tested whether there was convergence between the cognitive ability factors, modeled with their individual non-nested measurement models, vision, and physical health (Model 1A; see **Figure 1**). The model fit the data well (see **Table 3**), with all cognitive ability factors strongly related to one another and with vision. Also, the cognitive ability factors and vision were weakly to moderately related with physical health (**Table 4**). Next, we tested whether these relations decrease once we control for the effects of age. We included age as a direct predictor of each factor and correlated the factor residuals (Model 1B; see **Figure 1** and **Table 4**). The model fit the data well (see **Table 3**) and age had a moderate to strong effect on all of the factors. Overall, the relations between the constructs were reduced and, particularly for self-reported physical health, eliminated completely. Fluid intelligence, immediate and delayed memory, and face perception were moderately related with one another, while face memory was now weakly related with fluid intelligence, moderately related with immediate and delayed memory, and strongly related with face perception. The relations of cognitive ability with vision dropped from strong effect sizes to weak, with immediate and delayed memory no longer significantly related with vision. Finally, the only remaining significant relation with self-reported physical health was for fluid intelligence. Overall, these results suggest that there might be a common factor that includes face perception and face memory in addition to fluid intelligence, immediate and delayed memory, vision, and physical health. That factor should remain influential even after controlling for the effects of age; however, once age is controlled for, physical health would be expected to no longer be related to the common factor.

TABLE 2 | Effect sizes (expressed as *t* values) of the interacting effect of age on the loadings of the physical health indicators on the latent variable physical health.

*Bolding indicates statistical significance (*∗*p* < *0.05).*

FIGURE 1 | Schematic representation of Models 1A and 1B (residual variances are not displayed), with the cognition factors, vision, and physical health covarying and in Model 1B only a direct effect of age on all factors (indicated with bolded lines).

#### TABLE 3 | Model fit.

∗*p* < *0.05.*

TABLE 4 | Correlations between cognition, vision, and physical health before (Model 1A) and after (Model 1B) controlling for age, expressed as fully standardized **β** values.

*Values left of the / indicate coefficients from Model 1A, and values on the right of the / indicate coefficients from Model 1B, with the exception of the Age row, which only includes coefficients from Model 1B. Bolding indicates statistical significance (\*p* < *0.05;* +*p* = *0.05).*

### Models 2A and 2B – Covariances between Cognitive Ability (Modeled with Nesting), Vision, and Physical Health

Face perception and face memory might be related with vision because they are fluid abilities and not because of their content-related specificity; therefore, we next remodeled Model 1A with the nested cognitive ability factor instead of the separate measurement models for cognitive abilities (Model 2A, see **Figure 2**). The model had acceptable fit to the data (**Table 3**) with the general cognition factor significantly related to vision and physical health. The nested cognitive ability factors were not significantly related to vision or physical health with the exception of face memory, which was significantly related with vision, suggesting that the significant relations between face cognition and vision were mainly due to their general fluid ability component (**Table 5**). Next, we remodeled Model 2A with age as a direct predictor of all latent factors (Model 2B). This model had acceptable fit to the data (**Table 3**). Age was a significant predictor of all latent variables, with the exception of the nested face perception factor (**Table 5**). Controlling for the direct linear effects of age reduced the magnitude of almost all correlations. The general cognition factor was still significantly related with physical health and vision, and the nested face memory factor was still significantly related with vision, but all correlations were now weak in magnitude. Also, the relation between vision and physical health was no longer statistically significant.


TABLE 5 | Correlations between cognition, vision, and physical health before (Model 2A) and after (Model 2B) controlling for age, expressed as fully standardized **β** values.

*Values left of the / indicate coefficients from Model 2A, and values on the right of the / indicate coefficients from Model 2B, with the exception of the Age row, which only includes coefficients from Model 2B. Bolding indicates statistical significance (\*p* < *0.05).*

### Models 3A and 3B – Common Factor with Face Perception and Face Memory

Given that Models 1A, 1B, 2A, and 2B still presented significant correlations, even after controlling for age, we next modeled a second-order common factor structure. In the first model, the common factor was indicated by face perception, face memory, fluid intelligence, immediate and delayed memory, vision, and physical health (Model 3A; see **Figure 3**; **Table 6**). The model fit the data well (**Table 3**). All cognitive ability factors and vision were strongly related to the common factor, with self-reported physical health moderately related, and all factors had a significant proportion of variance accounted for by the common factor: 62% of fluid intelligence, 63% of immediate and delayed memory, 81% of face perception, 68% of face memory, 51% of vision, and 11% of physical health.
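For readers relating these percentages to effect sizes, a short worked equation may help. It assumes that, in Model 3A, the proportion of variance in a first-order factor explained by the common factor equals the squared fully standardized loading $\lambda$ of that factor on the common factor. The loadings below are back-calculated from the reported percentages under that assumption and are not values copied from **Table 6**.

$$
R^2 = \lambda^2 \quad\Rightarrow\quad |\lambda| = \sqrt{R^2}, \qquad \sqrt{0.81} = 0.90 \ \text{(face perception)}, \qquad \sqrt{0.11} \approx 0.33 \ \text{(physical health)}
$$

These magnitudes are consistent with the description of face perception as strongly, and self-reported physical health as moderately, related to the common factor.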

Next, we employed a Multiple Indicator Multiple Cause (MIMIC; Muthén, 1988) model to model the effects of age on the common factor and each of the common factor indicators, to see if the relations of any of the indicators with the common factor are reduced or eliminated with the inclusion of age (Model 3B; see **Figure 3**; **Table 6**). To identify the model we constrained the relation of age on face perception to zero, so that all additional effects of age on the common factor indicators can be compared relative to face perception. In other words, the effect of age on face perception is modeled through the relation of age on the common factor, and the additional effects of age on the other common factor indicators (e.g., face memory) are in addition to the effect of age on the common factor. Recommendations suggest choosing a reference factor that is not expected to have an additional effect of age outside of the relation mediated by the common factor. The direct effects of age in Model 1B would suggest using self-reported physical health as the reference variable, because it showed the weakest effect of age. However, the inclusion of age will most likely lead to self-reported physical health no longer being significantly related to the common factor. Therefore, we instead chose the performance indicator with the lowest effect of age, face perception, which also maintained its relation with the common factor even after the direct effects of age were modeled.

The model fit the data well (see **Table 3**). There was a strong negative effect of age on the common factor, with additional negative relations for face memory, fluid intelligence, immediate and delayed memory, and vision. These additional effects of age are similar to the pattern of age effects in Model 1B. There was a weak additional effect of age on physical health; however, as in Model 1B, self-reported physical health was now not significantly related to the common factor, hence this relation is not in addition to the effect of age on the common factor. Most importantly, face perception and face memory were still strongly related to the common factor, with fluid intelligence moderately related, immediate and delayed memory strongly related, and vision weakly related; however, the proportion of variance in all factors explained by the common factor was now reduced. The proportions were 16% of fluid intelligence, 27% of immediate and delayed memory, 93% of face perception, 52% of face memory, 1% of physical health, and 8% of vision. However, despite the reduced relations with the common factor, these results suggest that face cognition is part of the common factor, even after controlling for the effects of age.

*Values left of the / indicate coefficients from Model 3A, and values on the right of the / indicate coefficients from Model 3B, with the exception of the Age row, which only includes coefficients from Model 3B. Bolding indicates statistical significance (\*p* < *0.05).*

### Models 4A, 4B, and 4C – Common Factor with Nested Face Perception and Face Memory Factors

Next, we remodeled the common factor but with the nested cognitive ability factor instead of the separate cognitive ability measurement models (Model 4A; see **Figure 4**; **Table 7**). The model had acceptable fit to the data (see **Table 3**). Vision and general cognitive ability were strongly related to the common factor, self-reported physical health moderately related, and the common factor explained 55, 84, and 13% of the variance in the factors respectively.

Next, we tested the inclusion of paths from the common factor to the nested face perception, face memory, and immediate and delayed memory factors. To keep the structure stable, paths were tested one at a time. Only the nested face memory factor was significantly related to the common factor, with a strong relation to the common factor (66% of the variance explained), with this model having acceptable fit to the data (Model 4B; see **Table 7**). This model suggests that the relation of face perception and immediate and delayed memory with the common factor is fully mediated by the general fluid cognitive ability factor, with the nested face memory factor also having a direct relation with the common factor.

To see if any of these relations change with the inclusion of age, we again modeled age as a direct predictor of the common factor, and iteratively included direct relations of age on each of the cognitive ability factors (Model 4C; see **Table 7**). As with Model 3B, there was a strong negative effect of age on the common factor, with an additional moderate negative effect on the nested immediate and delayed memory factor (model fit was acceptable). There were no additional effects of age on the nested face perception or face memory factors, and the loadings on the common factor were similar in magnitude to those in Model 4B. These results suggest that the effects of age on face perception can be fully mediated by the effect of age on the common factor and the strong relation of the general cognitive ability factor (under which face perception is nested) with the common factor. That there are no additional significant effects of age on face memory indicates that when modeled as a nested factor, the relation of face memory with age is fully mediated by the common factor. The additional effect of age on immediate and delayed memory suggests that the effect of age on immediate and delayed memory cannot be fully explained by the common factor or by the general cognitive ability component of this factor.

bolded paths show the effects of age tested for Model 4C.

TABLE 7 | Common factor loadings with a nesting structure, before controlling for age (Models 4A and 4B) and after (Model 4C), expressed as fully standardized **β** values.

*Values left of both / indicate coefficients from Model 4A, values in between the / indicate coefficients from Model 4B, and values on the right of both / indicate coefficients from Model 4C, with the exception of the Age row, which only includes coefficients from Model 4C. NA indicates this value was not estimated. Bolding indicates statistical significance (\*p* < *0.05).*

### Discussion

### Summary

We presented a succession of models testing the relation between face perception, face memory, fluid cognitive ability, and immediate and delayed memory, vision, and physical health with each other and with age. These models differed in model fit, and the fit indices did not agree on the best-fitting model. According to the AIC and BIC, Model 2B was the best fit to the data, whereas according to SRMR, RMSEA, CFI, and TLI, Model 1B was the best fit. Nevertheless, all models had acceptable fit to the data according to the SRMR, RMSEA, CFI, and TLI, suggesting that while the structures differ, each offered an acceptable description of the data. Models 1A, 1B, 2A, and 2B, which presented strictly correlational structures, found significant correlations between the latent constructs, which is indicative of shared variance, although that shared variance was not directly modeled. Models 3A, 3B, 4A, 4B, and 4C presented a hierarchical latent factor structure, which did model the shared variance amongst the latent variables, labeling this shared variance as a common factor. While the structures differ, they are merely different approaches to modeling the variance shared amongst the latent factors, specifically cognitive ability, vision, and physical health. Next, we will present an evaluation of what these models indicate regarding the relations between fluid abilities, in particular face cognition, with vision and health.
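Because the information criteria and the approximate fit indices rank the models differently, it may help to recall how AIC and BIC are defined. The sketch below uses the standard definitions; the function names are hypothetical, and the example values in the test are illustrative rather than taken from **Table 3**.

```python
import math


def aic(log_likelihood, n_free_params):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_free_params - 2 * log_likelihood


def bic(log_likelihood, n_free_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_free_params * math.log(n_obs) - 2 * log_likelihood
```

Both criteria reward a higher log-likelihood and penalize additional free parameters, with BIC penalizing them more heavily as sample size grows. Lower values indicate the preferred model, which is one reason they can favor a different model than indices such as the RMSEA or CFI.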

### Effects of Age

Models 1B and 2B indicated a strong negative linear trend of age on vision, supporting existing findings (e.g., Owsley and Sloane, 1987; Lindenberger and Baltes, 1994) that vision declines with age. Model 3B indicated that when vision was modeled as part of a common factor, which was also indicated by physical health and all of the cognitive ability factors (modeled in a non-nested structure), age had an effect on vision in addition to the effect mediated through the common factor. The final model (Model 4C), however, illustrated that when the common factor is indicated by a stronger fluid cognitive ability factor (labeled general fluid cognitive ability), a nested face memory factor, physical health, and vision, age no longer had an additional effect on vision and instead the common factor fully mediated the effect of age on vision. That Model 3B showed a unique direct effect of age on vision, in addition to the effect mediated through the common factor, supports the findings of Christensen et al. (2001). The results of Model 4C suggest that if Christensen and colleagues had remodeled their common factor with a stronger fluid cognitive ability factor, they might not have found a direct effect of age on vision.

For physical health, Model 1B and 2B indicated a moderate negative linear trend of age, supporting existing findings that general health declines with age. Model 3B indicated that when the common factor is indicated by vision and all of the cognitive ability factors (modeled in a non-nested structure), age was a stronger predictor of the common factor than physical health. The final model (Model 4C), however, indicated that when the common factor was indicated by a stronger fluid cognitive ability factor, a nested face memory factor, physical health, and vision, that age was no longer a stronger predictor of physical health when compared with the common factor. Instead, the effects of age on physical health could be fully mediated by the common factor.

Model 1B and 2B illustrated a strong negative effect of age on fluid cognitive ability, immediate and delayed memory, face perception, and face memory, supporting research that fluid cognitive abilities decline with age. In Model 3B, in addition to the strong negative effect of age on the common factor, which was moderately to strongly indicated by each cognition factor, all of the cognitive ability factors (with the exception of face perception which will be discussed in more detail below) had additional effects of age that were not mediated by the common factor. In Model 4C, however, when we employed a nesting structure for the cognitive ability factors, the effects of age on general cognitive ability, face perception, and face memory were fully mediated by the common factor, with only immediate and delayed memory showing an additional effect of age.

The lack of an additional effect of age on face perception in Model 3B highlights the findings by Hildebrandt et al. (2011) that face perception (controlled for general fluid cognitive ability) only shows a significant quadratic trend of age and not a significant linear trend. While face perception did show a strong negative effect of age in Model 1B, it is important to note that in that model, face perception ability was estimated without the use of a nested structure, thus without controlling for age effects on general cognitive functioning. Models 2B and 4C employed a nesting structure and found no linear effect of age on face perception. Likewise, Model 3B, through the common factor that was indicated by fluid cognitive ability and other fluid abilities, essentially partialled out the fluid ability aspects of face perception, and the remaining face perception specific variance was unrelated to age. These models suggest that the age trend identified in Model 1B was essentially due to the negative effects of age on fluid cognitive ability.

With regard to face memory, Models 1B and 2B indicated a strong negative linear trend of age, supporting Hildebrandt et al. (2011) and others who found that face memory declines with age. Model 2B indicated that this negative trend remained even when face memory was nested under a general cognitive ability factor. Model 3B indicated that part of the negative relation with age was mediated by the common factor, but a specific direct effect of age on face memory remained. Model 4C, however, illustrated that when the common factor was indicated by a stronger fluid cognitive ability factor, the effects of age could be fully mediated by the common factor and no direct relation of age on nested face memory remained. The common factor estimated in Model 4C differed from the one estimated in Model 3B because the common factor in Model 4C included only the fluid cognitive ability components of immediate and delayed memory and face perception, essentially removing the non-fluid ability aspects of immediate and delayed memory and face perception from the common factor. In Model 4C, the common factor was indicated only by vision, fluid cognitive abilities, physical health, and the nested face memory factor. The remaining face memory variance that was unrelated to vision, general fluid abilities, and physical health did not have an additional negative effect of age. Overall, these results suggest that the age-related decline in face memory is fully related to the age-related declines in vision, general fluid abilities, and physical health, with no additional effects of age on face memory that are not explained through the common factor.

### Common Factor and Face Cognition

When ignoring the effects of age or the use of a nested structure for cognitive abilities, we found positive relations between face cognition, fluid cognitive ability, and memory with vision and physical health. However, these relations dropped in magnitude, or were eliminated completely, once we controlled for the linear effects of age. This suggests that in general, some of the relations between face cognition, fluid cognitive ability, and memory, with vision and self-reported physical health are due to age. After controlling for age, the relations between the cognition factors remained as well as the relations between the cognition factors (with the exception of immediate and delayed memory) with vision, suggesting that with increasing age, cognitive factors and vision are still related. However, the only variable still significantly related with physical health was fluid cognitive ability.
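The drop in the relations once age is controlled for can be illustrated with a first-order partial correlation. This is a simplified analogue of Model 1B, in which age predicts each factor and the factor residuals are correlated; it is not the latent variable model itself. The function name and the correlations in the usage note are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling out z (e.g., age)."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator
```

With hypothetical values, if two constructs correlate at 0.50 and each correlates 0.60 in absolute value with age in the same direction, the correlation remaining after controlling for age is about 0.22. If they had correlated at only 0.36, the relation would vanish entirely, meaning it was fully attributable to age.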

Because both fluid cognitive ability (or, in later models, the general cognitive factor, which was marked by fluid cognitive ability indicators) and vision consistently loaded on the common factor, the common factor established in this work is similar in nature to other models of the common factor (e.g., Christensen et al., 2001). Both face perception and face memory significantly loaded on that common factor, indicating that the common variance shared by vision, fluid cognitive ability, and immediate and delayed memory also predicted variance in face perception and face memory. The relations of face cognition with the common factor also remained after controlling for age. This suggests that the common factor includes face cognition.

Of notable importance is how the relations of face perception and face memory with the common factor changed once we controlled for general fluid ability in each construct. Employing a nested model structure, with immediate and delayed memory, face perception, and face memory nested under a general fluid cognitive ability factor, the nested face perception factor was no longer correlated with vision and physical health (Model 2B) or directly related with the common factor (Models 3A–4C). This suggests that the relation of face perception with the common variance of vision, general fluid cognitive abilities, and self-reported physical health is fully explained by general fluid cognitive ability. Face memory, on the other hand, when modeled as a nested factor, was still correlated with vision (Models 2A and 2B) and related to the common factor (Models 3A–4C). These relations held even after controlling for age.

#### Face Cognition and Physical Health

In the final model (Model 4C), the common factor was composed of general fluid cognitive ability, vision, physical health, and nested face memory. The occurrence of the first three variables is typically identified; our model for the first time also shows a loading of a nested face memory factor. The purely correlational models (Models 1A and 1B) showed a relation between face memory with vision and each of the cognitive ability factors, but after controlling for age, face memory was unrelated to physical health. Models 2A and 2B indicate that once the general cognitive ability component of face memory is partialled out, the remaining face memory-specific variance is unrelated to physical health, and Model 4C suggests that face memory is only related to physical health through the common factor.

Face perception, on the other hand, is essentially unrelated to self-reported physical health. The first models (Models 1A and 1B) suggest that age fully explains the relation between face perception and physical health. In addition, once we controlled for the general fluid components of face perception, face perception was not significantly correlated with physical health (Model 2B) and was related to the common factor, and thus to physical health, only through general fluid cognitive ability (Model 4C). Overall, these results indicate that face perception is unrelated to physical health. Instead, any relation found between the two will most likely be due to either age or to the positive relation between fluid cognitive ability and physical health.

### Face Cognition and Vision

Our results indicate that face memory is positively related with vision. Model 2B indicates that even after controlling for the general fluid component of face memory, the nested face memory factor is still significantly correlated with vision. Models 1B and 2B indicate that even once age is controlled for, both factors are still positively correlated with one another. The first set of common factor models (Models 3A and 3B) show that face memory is again related to vision through the common factor, even after controlling for age. Finally, the last model (Model 4C) shows that even once the general cognitive ability components of face memory are controlled for, the specific face memory variance is positively related to the common factor and consequently to vision.

The pattern for face perception, on the other hand, is different. Model 1B shows that after controlling for the effects of age, face perception is positively related with vision. However, Model 2A indicates that once we control for the general fluid cognitive ability component of face perception, the nested face perception factor is unrelated to vision. The first set of common factor models (Models 3A and 3B) show that face perception is positively related to the common factor and thus to vision, again after controlling for the effects of age. However, the last model (Model 4C) shows that only the general fluid aspects of face perception are related to the common factor, and thus to vision, suggesting that the specific face perception variance is unrelated to vision. Thus, the relations found in Models 1A, 1B, 3A, and 3B are most likely due to age and the general fluid aspect of face perception.

These models suggest that face memory is related to physical health and vision, even after controlling for age and after controlling for general fluid cognitive abilities. Face perception, on the other hand, after controlling for age or for general fluid cognitive ability, is unrelated to physical health and to vision.

### Implications for the Common Cause Hypothesis

Overall, our models illustrate significant shared variance between fluid cognitive abilities, vision, and self-reported physical health. In Models 1A–2B, these constructs were significantly correlated with one another, and in Models 3A–4C, a higher-order factor structure, which was indicated by the fluid cognitive abilities, vision, and self-reported health, had adequate fit to the data. That a single higher-order factor had adequate fit to our data supports findings by Christensen et al. (2001) and others who could model a common factor. However, we also found that controlling for the linear effects of age reduced these relations, supporting findings by Salthouse et al. (1998) and others who found that a portion of the shared variance amongst these constructs is attributable to age. We found, like many others, that there is shared variance amongst cognition, vision, and physical health, but that this does not fully explain the relations between these constructs. In other words, a common factor helps explain and model *most* of the shared variance amongst these factors, but not all. For example, Anstey et al. (2001) found that *most* of the age-related variance in cognitive ability could be mediated by vision and hearing, except for a weak direct effect (β = −0.27) of age on cognitive ability. Christensen et al. (2001) found that a common factor fully mediated the relation between age and cognition, but *not* between age and vision. We can conclude that a common factor explains most, but not all, of the shared variance, rejecting a strict interpretation of the common cause hypothesis as it applies to social cognition; instead, our results suggest domain-general and domain-specific aging mechanisms.

### Limitations

While this study has several methodological advantages over other studies (e.g., multiple measures of face cognition and its covariates within the structure of intelligence), there are some limitations. First, this study was based on a cross-sectional sample and did not follow individuals longitudinally. Consequently, a rigorous distinction between age effects and cohort effects was not possible, which could lead to overestimating the effects of age on cognitive abilities. In addition, while we had multiple measures of cognition, we did not include multiple measures of self-reported physical health or vision. A replication of this study that addresses those limitations is needed.

### Conclusion

Research on the common cause hypothesis suggests that we should find age-related declines in fluid cognitive abilities and a relation of fluid cognitive abilities with sensory functions and physical health. However, it is unclear whether this decline should be primarily related to just the fluid ability component of these abilities, or whether it is related to specific constructs themselves, after general fluid cognitive ability is partialed out. This study adds to the literature by examining the relation of vision and physical health with face cognition, which has been established as a specific human ability in previous work. We found that both face perception and face memory significantly loaded on the common factor, thus relating both constructs to physical health and vision. After controlling for the general fluid cognitive ability components of both face cognition variables, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by age and by the general fluid cognitive ability components of face perception.

### References

function in older adults. *J. Gerontol. B Psychol. Sci. Soc. Sci.* 56, P3–P11. doi: 10.1093/geronb/56.1.p3


internal consistency and construct validity. *BMC Public Health* 9:341. doi: 10.1186/1471-2458-9-341


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Olderbak, Hildebrandt and Wilhelm. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*



# Neural correlates of cognitive aging during the perception of facial age: the role of relatively distant and local texture information

#### Jessica Komes<sup>1,2</sup>\*, Stefan R. Schweinberger<sup>1</sup> and Holger Wiese<sup>1,2</sup>

*<sup>1</sup> DFG Research Unit Person Perception, Friedrich Schiller University of Jena, Jena, Germany, <sup>2</sup> Department of Psychology, Durham University, Durham, UK*

#### Edited by:

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### Reviewed by:

*Joseph M. DeGutis, Harvard University, USA; Isabelle Boutet, University of Ottawa, Canada*

#### \*Correspondence:

*Jessica Komes, DFG Research Unit Person Perception, Friedrich Schiller University of Jena, Leutragraben 1, 07743 Jena, Germany; jessica.komes@uni-jena.de*

#### Specialty section:

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

Received: *21 May 2015* Accepted: *07 September 2015* Published: *23 September 2015*

#### Citation:

*Komes J, Schweinberger SR and Wiese H (2015) Neural correlates of cognitive aging during the perception of facial age: the role of relatively distant and local texture information. Front. Psychol. 6:1420. doi: 10.3389/fpsyg.2015.01420*

Previous event-related potential (ERP) research revealed that older relative to younger adults show reduced inversion effects in the N170 (with more negative amplitudes for inverted than upright faces), suggestive of impairments in face perception. However, as these studies used young to middle-aged faces only, this finding may reflect preferential processing of own- relative to other-age faces rather than age-related decline. We conducted an ERP study in which young and older participants categorized young and old upright or inverted faces by age. Stimuli were presented either unfiltered or low-pass filtered at 30, 20, or 10 cycles per image (CPI). Response times revealed larger inversion effects, with slower responses for inverted faces, for young faces in young participants. Older participants did not show a corresponding effect. ERPs yielded a trend toward reduced N170 inversion effects in older relative to younger adults independent of face age. Moreover, larger inversion effects for young relative to old faces were detected, and filtering resulted in smaller N170 amplitudes. The reduced N170 inversion effect in older adults may reflect age-related changes in neural correlates of face perception. A smaller N170 inversion effect for old faces may indicate that facial changes with age hamper early face perception stages.

#### Keywords: face perception, N170, inversion effect, aging, own-age bias

## Introduction

Aging leads to a number of changes in humans. Two of these changes, which are often considered as particularly salient, relate to a decline in perceptual and cognitive abilities and to age-related changes in facial appearance. As we grow older, attention, working and episodic memory, language processing and executive functions undergo age-related modifications, often (but not always) resulting in less efficient performance in older relative to younger adults (see e.g., Craik and Salthouse, 2008). In addition, several aspects of face perception and memory have been described to become less efficient over the adult lifespan (Hildebrandt et al., 2010, 2011). At the same time, aging typically results in characteristic changes of the texture (e.g., wrinkling) and coloration of the skin's surface, as well as the shape of the face (e.g., Burt and Perrett, 1995), which allow the viewer to roughly estimate the age of another person. The research presented in this paper is at the intersection of these two types of age-related changes, as we examined the perception of facial age in young and older adult participants. More specifically, we were interested in whether the perception of facial age cues would be modulated by the age of the perceiver.

As noted above, a number of different characteristics allow the perception of facial age. One of these characteristics appears to be related to configural information, a term that is used with slightly different meanings by different authors (for a critical discussion, see Burton et al., 2015). Reviewing literature on face image matching tasks, which are often assumed to capture identity processing (but see e.g., Burton, 2013; Burton et al., 2015), Maurer et al. (2002) distinguish three types of configural processing: (1) the sensitivity to first-order relations, which reflects a basic configuration of features shared by all faces, with two eyes above a nose, which is above a mouth, (2) holistic processing, which refers to the integration of facial features into a Gestalt-like representation, and (3) the sensitivity to second-order relations, which reflects the perception of the detailed spatial layout of, and metric distances between, facial features. It is commonly reported that all types of configural processing are substantially disturbed by face inversion, i.e., the picture-plane rotation of the image by 180° (Yin, 1969), whereas the processing of local information is not disturbed to the same extent (Maurer et al., 2002; Rossion, 2008), although some authors observed results diverging from this widely accepted view (see e.g., Sekuler et al., 2004). In line with the former findings, it has been suggested that inversion results in a narrowing of the perceptual field, which only allows the analysis of relatively local information, but not the simultaneous processing of information from multiple features distributed over a large space of the face (Rossion, 2009). Following this suggestion, we will assume for the present manuscript that inversion disrupts the simultaneous processing of relatively distant information in faces, and not (or not to the same extent) the processing of relatively local information.

While the studies discussed in the preceding paragraph mostly report results from face matching tasks aimed at examining identity processing, previous research suggests that processing relatively distant information is also important for the perception of facial age. Although one might argue that perceiving relatively local qualities of the skin's texture (such as the presence vs. absence of wrinkles) would suffice for a rough and dichotomous age categorization, previous research demonstrated that deciding whether an adult face is young or old is slowed by face inversion (Wiese et al., 2012a). Moreover, using the composite face paradigm (Young et al., 1987), in which two halves from different faces are combined to form a novel whole face, Hole and George (2011) found that estimating the exact age of the upper half of a composite is systematically biased toward the age of the lower half, indicating an influence of the task-irrelevant part of the face and thus holistic processing during age estimations.

However, evidence for the simultaneous use of information from relatively distant parts of the face during age perception is not as clear-cut as it may seem from the two studies described in the last paragraph. For instance, estimates of the exact age of a face are similarly accurate for upright and inverted faces (George and Hole, 2000). Together with the above-discussed finding of slower age categorization for inverted faces, this finding suggests that age perception is less efficient but similarly accurate (i.e., more time-consuming processing is necessary to reach the same level of accuracy) when information from relatively distant parts is largely absent. Moreover, George and Hole (2000) reduced the availability of skin texture information by low-pass filtering the face images. Again, this manipulation did not lead to any impairment in the accuracy of age estimations. The authors concluded that facial age could be estimated from a number of different and independent cues, and that whichever cues are currently available can be flexibly used. It is unclear, however, whether this is similarly true for the efficiency of age categorizations.

Importantly, face perception depends to some extent on the amount of expertise the viewer has with a particular category of faces. For instance, it has been shown that own-race faces are perceived more holistically than other-race faces (Michel et al., 2006). In addition, advantages in part-based and second-order configural processing for own- relative to other-race faces have been observed (for a review, see Hayward et al., 2013). Moreover, it has been shown that young adults typically have more experience with young relative to older faces, whereas older adults either have balanced experience or a bias toward older faces (e.g., Wiese et al., 2012b). Presumably related to these differences in experience, an own-age advantage has been observed in recognition memory (e.g., Bartlett and Leslie, 1986; Rhodes and Anastasi, 2012; Wiese et al., 2013b), face matching tasks (Macchi Cassia, 2011; Verdichevski and Steeves, 2013), and in age estimations (Moyse and Bredart, 2012; Voelkle et al., 2012). Accordingly, young and older adults seem to perceive young and old adult faces differently. For instance, previous findings of faster ethnicity categorizations of other- relative to own-race faces (Valentine and Endo, 1992) could motivate the prediction that whereas own-age faces are remembered more accurately in a recognition memory task, other-age faces will tend to be processed more efficiently in an age categorization task.

Whereas the behavioral measures discussed so far can only depict the outcome of a cascade of different processing steps, time-sensitive measures of neural activity may be better suited to track the various sub-stages of stimulus processing. Given their high temporal resolution, event-related potentials (ERPs) appear particularly well-suited for this endeavor. ERPs are voltage changes in the electroencephalogram time-locked to a specific event, such as the presentation of a visual stimulus. ERPs largely reflect current changes at the postsynaptic membrane (Jackson and Bolger, 2014) and thus provide a measure of the brain's neural activity.

ERP studies on face perception have identified a negative occipito-temporal peak at approximately 170 ms after stimulus onset, the so-called N170 component, which is reliably larger for faces than for other objects (Bentin et al., 1996; Eimer, 2011). The N170 is typically assumed to reflect early stages of face perception, related to the detection of a face-like pattern (Schweinberger and Burton, 2003; Amihai et al., 2011), which may correspond to first-order configural processing in terms of Maurer et al. (2002), or structural encoding (see e.g., Eimer, 2011), a term which is derived from the model by Bruce and Young (1986) and which denotes perceptual processes prior to individual face recognition. Moreover, this component is sensitive to a number of the manipulations and facial characteristics discussed above: it has been reported to be (1) increased and delayed for inverted relative to upright faces (reflecting the so-called N170 inversion effect; e.g., Eimer, 2000a; Rossion et al., 2000; Itier and Taylor, 2002), (2) smaller for spatially low-pass filtered relative to full spectrum faces (Goffaux et al., 2003; Halit et al., 2006; but see Holmes et al., 2005), (3) increased for other- relative to own-race faces (e.g., Herrmann et al., 2007; Caharel et al., 2011; Wiese et al., 2014), at least when face category or identity is task-relevant (Wiese, 2013), and (4) larger for old relative to young adult faces (Wiese et al., 2008, 2013c; Wolff et al., 2012; for a related finding on the frontal P2, or VPP, see Ebner et al., 2011). Interestingly, at least some of the processes underlying N170 seem to be modulated by experience, as larger inversion effects for own- relative to other-race faces have been observed (Vizioli et al., 2010; Caharel et al., 2011; Wiese, 2013).

Moreover, several studies used the N170 to examine age-related changes in face perception. First, generic face sensitivity of N170, with larger amplitudes for faces vs. objects, was found similarly in young and older adults (Gao et al., 2009; Daniel and Bentin, 2012), suggesting preserved neural sensitivity for faces in higher age. Second, smaller N170 inversion effects have been observed in older participants (Gao et al., 2009; Daniel and Bentin, 2012). Finally, the typical lateralization of the N170, with larger amplitudes over the right relative to the left hemisphere (Bentin et al., 1996; Amihai et al., 2011; Eimer, 2011), has been found to be less pronounced in older adults (Pfütze et al., 2002; Gao et al., 2009; Daniel and Bentin, 2012), which may reflect an attempt to compensate for age-related decline (Komes et al., 2014b). Thus, evidence for age-related changes of early face perception on the basis of the N170 is mixed. Whereas the component's sensitivity to faces seems unchanged, both its lateralization and the N170 inversion effect seem affected by aging. It should be noted, however, that stimulus sets in previous studies showing reduced inversion effects in older adults were dominated by young and middle-aged faces, and that N170 inversion effects have been observed to be larger for own- relative to other-group faces (Vizioli et al., 2010; Wiese, 2013). This may have biased the results, as own-age faces were presented for young but not older participants, and it is thus unclear whether reduced inversion effects in older adults will also occur when old faces are presented.

Finally, ERPs subsequent to the N170 seem to be affected by aging. Whereas in younger adults a clearly defined positive-going peak, often referred to as the P2, occurs subsequent to N170, this component is clearly reduced in older adults (Wiese et al., 2008; Rousselet et al., 2009). At the same time, effects of face inversion or low-pass filtering on P2 have not been described in older adults. Some authors have associated the P2 with second-order configural processing (Latinus and Taylor, 2006), whereas others suggested that it is related to the distinctiveness of faces (Schulz et al., 2012). Moreover, P2 is strongly affected by spatial attention during face processing tasks (Neumann et al., 2015), and larger for young relative to old faces in both young and older adults (Wiese et al., 2008, 2012b). Overall, for the purpose of examining effects of the participants' age on age perception, an analysis of P2, in addition to N170, seems necessary.

In the present study we asked young and older adult participants to categorize young and older adult faces by age. The faces were presented in upright or inverted orientation as well as in unfiltered or low-pass filtered versions. Our aims were three-fold: First, we wanted to examine the relative importance of processing relatively distant vs. local texture-based information for age categorization. It has been suggested that inversion narrows the perceptual field, disturbing the simultaneous processing of relatively distant parts of the face (Rossion, 2009). Low-pass filtering, in turn, removes wrinkles and smooths locally restricted changes in skin coloration, and therefore hinders the processing of local surface texture cues (Kloth et al., 2015). As previous studies demonstrated less efficient age categorization of inverted faces (Wiese et al., 2012a), we were interested in testing whether filtering the images would result in an additional decrease in performance. We used a stepwise filtering approach with increasingly severe cut-off frequencies (unfiltered, 30 CPI, 20 CPI, 10 CPI) to more precisely identify the frequency range informative for age perception. Similarly, at the neural level, we were interested to see how the combination of filtering and inversion, which have been described to have opposite effects on N170 amplitude, would affect ERPs reflecting perceptual processing stages.

Second, we considered that age categorization may be easier for a certain age category (e.g., for other-age vs. own-age faces, or for young vs. old faces) and/or may be modulated by the viewer's experience. If so, any processing advantage for a specific category of faces in one participant group (such as more efficient categorization of old faces in young adults, which would parallel the above-described finding of more efficient categorization of other-race faces) should be absent or even reversed in the other participant group. Previous studies observed a larger N170 inversion effect (with more negative amplitudes for inverted relative to upright faces) for own- relative to other-race faces. If the early perceptual processing of facial age similarly relied on expertise, larger inversion effects for own- relative to other-age faces would be expected.

Finally, and related to this latter point, we tested whether older adults would be less efficient in face perception, and in age categorization specifically. Previous findings of smaller N170 inversion effects in older relative to younger adults may have been related to the use of young face stimuli. If early perceptual processing of facial age was modulated by expertise, using young faces may have resulted in an advantage for young participants. Consequentially, we considered the possibility that previous findings of reduced N170 inversion effects in older adults might not reflect less efficient face processing per se, to the extent that older adults would show similar inversion effects for old faces as young adults do for young faces.

### Materials and Methods

#### Participants

Twenty-four undergraduate students (mean age = 21.5 years, SD = 2.0, 16 female) and 24 older participants (mean age = 65.8 years, SD = 4.3, 13 female) participated in the study. Older adults were recruited in senior citizen groups and via a press release in a local newspaper, and were reimbursed with 7.50 Euro per hour. All participants were Caucasian, reported to reside in independent living conditions and were right-handed according to a modified version of the Edinburgh Handedness Inventory (Oldfield, 1971). None reported psychiatric or neurological disorders or received centrally acting medication, and all participants reported normal or corrected-to-normal vision. Furthermore, all participants gave written informed consent and the study was approved by the local Faculty ethics committee.

#### Stimuli

Stimuli consisted of 50 old (mean age = 77.5 years, SD = 6.7) and 50 young Caucasian faces (mean age = 22.1 years, SD = 2.42), 50% female in each age group, all taken from the CAL/PAL database (Minear and Park, 2004). All pictures displayed front views of neutral faces and were edited in Adobe Photoshop™ to remove all information (hair, clothing, background, etc.) apart from the face, which was subsequently pasted in front of a black background. All stimuli were framed within an area of 170 × 216 pixels (6.0 × 7.6 cm), corresponding to a visual angle of 3.8° × 4.8° at a viewing distance of 90 cm. Images were then filtered with the FourierImage software developed by Risto Näsänen (http://nasanen.info/Software.html) using an exponential low-pass filter with cut-off frequencies set to 30, 20, or 10 cycles per image (CPI). Furthermore, all stimuli were presented in both upright and inverted orientation, as well as in unfiltered and three low-pass filtered versions, resulting in eight images of each individual face (see **Figure 1** for stimulus examples).
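The reported visual angle follows from the stimulus size and the viewing distance. The following minimal sketch recomputes it with the standard formula θ = 2·arctan(size / (2·distance)); the function name `visual_angle_deg` is illustrative and not taken from the original study.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Stimulus frame of 6.0 x 7.6 cm viewed from 90 cm, as stated in the text
width_deg = visual_angle_deg(6.0, 90.0)
height_deg = visual_angle_deg(7.6, 90.0)
print(f"{width_deg:.1f} x {height_deg:.1f} degrees")  # 3.8 x 4.8 degrees
```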

#### Procedure

Participants were seated in a dimly lit, electrically shielded, and sound-attenuated chamber (400A-CT\_Special, Industrial Acoustics, Niederkrüchten, Germany) with their heads in a chin rest. Approximate distance between eyes and computer screen was 90 cm. Each experimental session began with a series of practice trials on different stimuli, which were excluded from data analysis. On each trial, a face stimulus was presented for 1000 ms, preceded by a fixation cross for 2000 ms.

The main experiment consisted of five blocks with 160 trials each, i.e., 800 trials in total. All 50 young and 50 old face identities were presented once in each of the eight stimulus versions. Within each block, 10 trials per experimental condition were presented, with a maximum of one repetition of facial identities per block. Blocks were presented in fixed order, and individual stimuli were presented in random order within each block. Participants were instructed to categorize each face according to age as fast as possible without compromising accuracy. Between blocks, participants were allowed a self-timed period of rest. Key assignment was counterbalanced across participants. Mean response times (RT, correct responses only) and accuracy were analyzed.
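The trial counts given above can be cross-checked by enumerating the design. The sketch below is only an arithmetic illustration of the 2 (face age) × 2 (orientation) × 4 (filter) structure described in the text; the variable names are hypothetical and it does not reproduce the actual randomization procedure.

```python
from itertools import product

face_ages = ["young", "old"]
orientations = ["upright", "inverted"]
filters = ["unfiltered", "30 CPI", "20 CPI", "10 CPI"]

# Every combination of the three within-subject factors is one condition
conditions = list(product(face_ages, orientations, filters))

identities_per_face_age = 50
n_blocks = 5
trials_per_condition_per_block = 10

versions_per_face = len(orientations) * len(filters)
total_trials = len(face_ages) * identities_per_face_age * versions_per_face
trials_per_block = len(conditions) * trials_per_condition_per_block
trials_per_condition = n_blocks * trials_per_condition_per_block

print(len(conditions), versions_per_face, trials_per_block, total_trials)
# 16 8 160 800
```

Each condition therefore contributes 50 trials per participant before artifact rejection, one for each face identity of the respective age group.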

### ERP Recording and Analysis

We recorded 32-channel EEG using a BioSemi Active II system (BioSemi, Amsterdam, Netherlands). The active sintered Ag/AgCl electrodes were mounted in an elastic cap. EEG was recorded continuously from Fz, Cz, Pz, Iz, FP1, FP2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, F9, F10, FT9, FT10, TP9, TP10, P9, P10, PO9, PO10, I1, I2, with a 512-Hz sample rate from DC to 155 Hz. Please note that BioSemi systems work with a "zero-Ref" set-up with ground and reference electrodes replaced by a CMS/DRL circuit (for further information, see www.biosemi.com/faq/cms&drl.htm).

Contributions of blink artifacts were corrected using the algorithm implemented in BESA 5.1 (MEGIS Software GmbH, Graefelfing, Germany). EEG was segmented from −200 until 1000 ms relative to stimulus onset, with the first 200 ms as baseline. Trials contaminated by non-ocular artifacts and saccades were rejected from further analysis. Artifact rejection was carried out using the BESA 5.1 tool, with an amplitude threshold of 100 µV, as well as a gradient criterion of 75 µV. Remaining trials were recalculated to average reference, digitally low-pass filtered at 40 Hz (12 dB/oct, zero phase shift), and averaged according to the 16 experimental conditions.
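To make the segmentation and rejection steps concrete, the following NumPy sketch cuts epochs from −200 to 1000 ms at 512 Hz, subtracts the pre-stimulus baseline, and flags epochs that exceed the two thresholds. It is not the BESA procedure: how BESA defines the amplitude and gradient criteria is not described in the text, so the peak-to-peak and sample-to-sample definitions used here, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

SFREQ = 512.0                # sampling rate in Hz, as reported in the text
TMIN_S, TMAX_S = -0.2, 1.0   # epoch limits relative to stimulus onset

def epoch_and_baseline(eeg_uv, onsets, sfreq=SFREQ, tmin=TMIN_S, tmax=TMAX_S):
    """Cut epochs around onsets (sample indices) and subtract the baseline mean.

    eeg_uv has shape (n_channels, n_samples); the result has shape
    (n_epochs, n_channels, n_times).
    """
    start = int(round(tmin * sfreq))   # negative: samples before onset
    stop = int(round(tmax * sfreq))
    epochs = np.stack([eeg_uv[:, o + start:o + stop] for o in onsets])
    baseline = epochs[:, :, :-start].mean(axis=2, keepdims=True)
    return epochs - baseline

def keep_mask(epochs, amp_uv=100.0, grad_uv=75.0):
    """Boolean mask of epochs passing both criteria on every channel.

    Assumption: amplitude = peak-to-peak range within the epoch,
    gradient = largest absolute difference between adjacent samples.
    """
    peak_to_peak = epochs.max(axis=2) - epochs.min(axis=2)
    max_step = np.abs(np.diff(epochs, axis=2)).max(axis=2)
    return (peak_to_peak.max(axis=1) <= amp_uv) & (max_step.max(axis=1) <= grad_uv)

# Synthetic demonstration: 32 channels of low-amplitude noise, three trials
rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 2.0, size=(32, 5120))
eeg[0, 2100] += 500.0                      # artifact inside the second epoch
onsets = [1000, 2000, 3000]

epochs = epoch_and_baseline(eeg, onsets)
keep = keep_mask(epochs)
print(epochs.shape, keep.tolist())
```

Re-referencing to the average reference and the 40-Hz low-pass filter are omitted here; in practice these steps would follow the rejection stage, as described in the text.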

In the resulting waveforms, mean amplitudes and peak latencies for N170 were determined at P9/P10 between 140 and 180 ms for young adults and between 155 and 195 ms for older adults. Mean amplitude for P2 was measured at the same sites between 200 and 300 ms for both younger and older adults. Statistical analyses were performed by calculating mixed-model analyses of variance (ANOVA), with degrees of freedom corrected according to the Greenhouse-Geisser procedure where appropriate.
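The window-based measures described above can be sketched as follows. The example computes the mean amplitude and the latency of the most negative point within a time window for a single averaged waveform; the helper `window_measures` and the synthetic Gaussian-shaped deflection are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def window_measures(erp_uv, times_ms, t_start, t_end):
    """Mean amplitude and latency of the most negative sample in a window."""
    mask = (times_ms >= t_start) & (times_ms <= t_end)
    segment = erp_uv[mask]
    mean_amplitude = segment.mean()
    peak_latency = times_ms[mask][np.argmin(segment)]
    return mean_amplitude, peak_latency

# Time axis for a -200 to 1000 ms epoch sampled at 512 Hz
times_ms = np.arange(-200.0, 1000.0, 1000.0 / 512.0)

# Synthetic waveform: a negative deflection peaking near 165 ms
erp = -5.0 * np.exp(-0.5 * ((times_ms - 165.0) / 15.0) ** 2)

# N170 window used for young adults in the text (140 to 180 ms)
mean_amp, latency = window_measures(erp, times_ms, 140.0, 180.0)
print(round(float(mean_amp), 2), round(float(latency), 1))
```

For older adults, the same function would be called with the 155 to 195 ms window, and for P2 mean amplitude with the 200 to 300 ms window.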

### Results

### Response Times

A mixed-model ANOVA on mean response times (see upper part of **Figure 2**) with the within-subject factors face age (young, old), orientation (upright, inverted), filter (unfiltered, 30 CPI, 20 CPI, 10 CPI) and the between-subjects factor group (young adults, older adults) resulted in main effects of face age, F(1, 46) = 5.97, p = 0.018, η<sub>p</sub><sup>2</sup> = 0.12, with faster responses for old as compared to young faces, orientation, F(1, 46) = 212.23, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.82, with slower RTs for inverted vs. upright faces, and filter, F(3, 138) = 145.65, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.76, indicating slower responses with increasing filter strength. As indicated by the effect of group, older adults responded slower than young adults, F(1, 46) = 44.45, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.49.
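Partial eta squared can be recovered from a reported F value and its degrees of freedom, because η<sub>p</sub><sup>2</sup> = SS<sub>effect</sub> / (SS<sub>effect</sub> + SS<sub>error</sub>) = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies this identity to three of the effects reported above as a consistency check; the function name is illustrative, and values recomputed from rounded F statistics may differ from published values in the last digit.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effects on response times as reported in the text
eta_orientation = partial_eta_squared(212.23, 1, 46)
eta_filter = partial_eta_squared(145.65, 3, 138)
eta_group = partial_eta_squared(44.45, 1, 46)

print(f"{eta_orientation:.2f} {eta_filter:.2f} {eta_group:.2f}")  # 0.82 0.76 0.49
```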

Most interestingly, interactions of face age by group, F(1, 46) = 9.18, p = 0.004, η<sub>p</sub><sup>2</sup> = 0.17, and orientation by group, F(1, 46) = 12.12, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.21, were further qualified by a three-way interaction of face age by orientation by group, F(1, 46) = 5.553, p = 0.023, η<sub>p</sub><sup>2</sup> = 0.11. Post-hoc tests in younger adults indicated a significant interaction of face age by orientation, F(1, 23) = 4.49, p = 0.045, η<sub>p</sub><sup>2</sup> = 0.16, with larger inversion effects for young relative to old faces. In older adults, a significant main effect of orientation, F(1, 23) = 123.50, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84, but no significant interaction with face age, F(1, 23) = 2.81, p = 0.107, η<sub>p</sub><sup>2</sup> = 0.11, indicated statistically similar inversion effects for young and old faces.

An interaction of filter by group, F(3, 138) = 12.56, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.21, indicated similar response times in young adults for unfiltered vs. 30 CPI images, F < 1, but progressively slower response times for 30 vs. 20 CPI faces, F(1, 23) = 5.03, p = 0.035, η<sub>p</sub><sup>2</sup> = 0.18, and 20 vs. 10 CPI faces, F(1, 23) = 126.02, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.85. Similarly, in older adults response times for the unfiltered vs. 30 CPI conditions were similar, F < 1, whereas slower responses were observed in the 20 relative to the 30 CPI conditions, F(1, 23) = 12.86, p = 0.002, η<sub>p</sub><sup>2</sup> = 0.36, and in the 10 relative to the 20 CPI conditions, F(1, 23) = 89.44, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.80. Please note that the interaction with group is probably due to the larger filter effect in older relative to younger adults from 30 to 20 CPI.

Finally, interactions of face age by filter, F(3, 138) = 34.74, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.43, and orientation by filter, F(3, 138) = 24.31, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.35, were further qualified by a three-way interaction of face age by orientation by filter, F(3, 138) = 6.69, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.127. Post-hoc tests (see **Table 1**) indicated similar response times for young upright faces in the unfiltered vs. 30 CPI, and in the 30 vs. 20 CPI conditions, but slower RTs in the 10 vs. 20 CPI conditions. Responses to inverted young faces were slower in the unfiltered relative to the 30 CPI condition, similar in the 30 vs. 20 CPI conditions, and slower in the 10 relative to the 20 CPI conditions. Response times for old upright faces were similar in the unfiltered relative to the 30 CPI condition, slower in the 20 relative to the 30 CPI condition, and slower still in the 10 relative to the 20 CPI condition. Similarly, for old inverted faces response times were equivalent in the unfiltered relative to the 30 CPI condition, but slower in the 20 relative to the 30 CPI conditions, as well as in the 10 relative to the 20 CPI conditions.

In sum, analysis of response times revealed effects of low-pass filtering the images over and above the effects of face inversion, which were particularly pronounced for old faces in the strongest filter condition. Moreover, in young adults, inversion effects were stronger for young relative to old faces, whereas no differential inversion effect for young vs. old faces was detected in older participants.

### Accuracies

A mixed-model ANOVA on accuracies with the within-subject factors face age, orientation, and filter, and the between-subjects factor group revealed a main effect of face age, F(1, 46) = 11.32, p = 0.002, η<sub>p</sub><sup>2</sup> = 0.20, with more correct responses to older compared to younger faces. Furthermore, upright as compared to inverted faces were more frequently correctly categorized, as indicated by the effect of orientation, F(1, 46) = 141.92, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.76. The main effect of filter, F(3, 138) = 37.10, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.47, revealed less accurate categorizations with increasing filter strength.

TABLE 1 | Post-hoc tests of the interaction of face age by orientation by filter in the analysis of response times.

In addition, several interactions were found. Most interestingly, face age interacted with orientation, F(1, 46) = 16.02, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.26, and separate post-hoc ANOVAs for young and old faces revealed that the inversion effect was stronger for young, F(1, 47) = 118.42, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.72, than for old faces, F(1, 47) = 19.94, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.30. However, only a trend for an interaction of face age by orientation by group was observed, F(1, 46) = 3.23, p = 0.079, η<sub>p</sub><sup>2</sup> = 0.07.

Furthermore, orientation interacted with filter, F(3, 138) = 20.96, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.31, which was further qualified by the group factor, F(3, 138) = 3.18, p = 0.026, η<sub>p</sub><sup>2</sup> = 0.07. In young adults, post-hoc tests (see **Table 2**) for upright faces revealed less accurate responses in the 10 relative to the 20 CPI condition only. Correct responses for inverted faces were less frequent in the 20 relative to 30 CPI condition, as well as in the 20 relative to 10 CPI condition. In older adults, less accurate responses for upright faces were detected in the 10 relative to the 20 CPI condition only. For inverted faces, less accurate responses were detected in the 20 relative to the 10 CPI condition. No main effect of group was detected in accuracies, F < 1.

Overall, analysis of accuracy data suggested detrimental effects of low-pass filtering the images on age categorization over and above the effect of face inversion, particularly for the strongest filter condition (filtering frequencies higher than 10 CPI). Moreover, the face age by orientation interaction suggested more pronounced processing of relatively distant information for young relative to older faces for both young and old participants.

TABLE 2 | Post-hoc tests for the interaction of orientation by filter by group in the analysis of accuracies.


### Event-related Potentials

A mixed-model ANOVA on N170 mean amplitudes (see **Figures 3**, **4**) with the within-subject factors hemisphere (left, right), face age, orientation, and filter, and the between-subjects factor group resulted in effects of orientation, F(1, 46) = 21.87, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.32, with more negative amplitudes for inverted as compared to upright faces, and filter, F(3, 138) = 4.02, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.08, with less negative amplitudes for increasing filter strength. Post-hoc tests revealed no difference for the unfiltered vs. the 30 CPI condition, F < 1, and for the 30 compared to the 20 CPI condition, F < 1, but significantly less negative amplitudes in the 10 compared to the 20 CPI condition, F(1, 47) = 7.08, p = 0.011, η<sub>p</sub><sup>2</sup> = 0.13. N170 amplitudes differed significantly between age groups, F(1, 46) = 4.40, p = 0.042, η<sub>p</sub><sup>2</sup> = 0.09, with more negative amplitudes for older relative to younger adults. We further detected a trend for an interaction of orientation × group, F(1, 46) = 3.48, p = 0.069, η<sub>p</sub><sup>2</sup> = 0.07, pointing toward larger inversion effects in the young as compared to the older group. Interestingly, orientation interacted with face age, F(1, 46) = 43.43, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.49. Post-hoc analyses for young and older faces separately revealed that inverted young faces elicited significantly more negative amplitudes than upright young faces, F(1, 47) = 34.17, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.42, whereas the corresponding pattern was not significant for old faces, F(1, 47) = 3.27, p = 0.077, η<sub>p</sub><sup>2</sup> = 0.065. The interaction of hemisphere by group was not significant, F(1, 46) = 2.287, p = 0.137, η<sub>p</sub><sup>2</sup> = 0.08. Moreover, no interaction of orientation by face age by group was observed, F(1, 46) = 2.01, p = 0.163, η<sub>p</sub><sup>2</sup> = 0.04.

In sum, only a trend toward larger inversion effects in younger relative to older adults was detected. Moreover, and in line with the behavioral results, only small and non-significant inversion effects were found for old faces, and this was the case both for young and older adults.

A mixed-model ANOVA on N170 peak latency with the within-subject factors hemisphere, face age, orientation, and filter, and the between-subjects factor group resulted in a main effect of face age, F(1, 46) = 13.44, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.23, with longer latencies for old than young faces. Moreover, inverted faces elicited longer latencies than upright faces, as indicated by the significant effect of orientation, F(1, 46) = 47.77, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.51. The filter factor reached significance, F(3, 138) = 4.01, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.08, indicating longer latencies for increased filtering strength. Furthermore, a three-way interaction of face age, hemisphere, and filter was detected, F(3, 138) = 3.22, p = 0.025, η<sub>p</sub><sup>2</sup> = 0.065. Separate analyses for the left and the right hemisphere and for old and young faces (see **Figure 5**) indicated the absence of a filter effect over the right hemisphere, both for young faces, F(3, 141) = 2.12, p = 0.101, η<sub>p</sub><sup>2</sup> = 0.043, and for old faces, F(3, 141) = 1.19, p = 0.315, η<sub>p</sub><sup>2</sup> = 0.03. By contrast, over the left hemisphere the filter effect reached significance for both young, F(3, 141) = 5.13, p = 0.002, η<sub>p</sub><sup>2</sup> = 0.10, and old faces, F(3, 141) = 3.07, p = 0.030, η<sub>p</sub><sup>2</sup> = 0.06, and was somewhat more pronounced in the former case. Planned comparisons did not reveal any significant differences between filter conditions for young faces; unfiltered vs. 30 CPI: F < 1, 30 vs. 20 CPI: F(1, 47) = 2.90, p = 0.097, η<sub>p</sub><sup>2</sup> = 0.06, 20 vs. 10 CPI: F < 1. By contrast, for older faces the unfiltered condition did not differ from the 30 CPI condition, F(1, 47) = 2.53, p = 0.12, η<sub>p</sub><sup>2</sup> = 0.05, and the 30 CPI did not differ from the 20 CPI condition, F < 1. However, filtering images at 10 CPI resulted in a delayed N170 peak relative to the 20 CPI condition, F(1, 47) = 7.44, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.14. The group factor did not reach significance, F(1, 46) = 1.72, p = 0.20, η<sub>p</sub><sup>2</sup> = 0.04. Similarly, no interaction of orientation by face age by group was observed, F < 1. In sum, both inversion and low-pass filtering resulted in delayed N170 peaks. The filter effect, however, was small and restricted to the strongest condition, old faces and the left hemispheric electrode site.
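The two N170 measures analyzed above, mean amplitude and peak latency, can be illustrated with a minimal Python sketch. The measurement window (140 to 190 ms), the 10 ms sampling step, the waveform values, and the function names are all assumptions made for illustration; none of them are taken from the present study.

```python
def mean_amplitude(times_ms, voltages_uv, start_ms, end_ms):
    """Average voltage over all samples inside [start_ms, end_ms]."""
    window = [v for t, v in zip(times_ms, voltages_uv) if start_ms <= t <= end_ms]
    return sum(window) / len(window)


def negative_peak_latency(times_ms, voltages_uv, start_ms, end_ms):
    """Time of the most negative sample inside [start_ms, end_ms]."""
    window = [(v, t) for t, v in zip(times_ms, voltages_uv) if start_ms <= t <= end_ms]
    return min(window)[1]


# Hypothetical averaged waveforms sampled every 10 ms (illustrative values only)
times = [130, 140, 150, 160, 170, 180, 190, 200]
upright = [-1.0, -2.5, -4.0, -5.0, -4.2, -3.0, -1.5, -0.5]
inverted = [-0.8, -2.0, -3.8, -5.2, -5.8, -4.4, -2.6, -1.0]

for label, wave in (("upright", upright), ("inverted", inverted)):
    amp = mean_amplitude(times, wave, 140, 190)
    lat = negative_peak_latency(times, wave, 140, 190)
    print(f"{label}: mean amplitude {amp:.2f} µV, peak latency {lat} ms")
```

In this toy example the inverted waveform is both more negative on average and peaks later than the upright waveform, which is the qualitative pattern of an N170 inversion effect.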

To test for a direct relationship between the inversion effects observed in RTs and in the N170 measured at P10, the differences between inverted and upright faces in the two measures were correlated. This analysis revealed a significant relationship when all participants were entered into the analysis, r = 0.395, p = 0.005, but not when young (r = 0.310, p = 0.141) and older adults (r = 0.323, p = 0.124) were analyzed separately. Similarly, a corresponding analysis using N170 latency differences at P10 did not result in significant effects (all r between −0.223 and 0.181, all p > 0.295).
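The logic of this analysis, per-participant difference scores followed by a Pearson correlation, can be sketched as follows. The data are invented for illustration, the direction of the difference (inverted minus upright) is an assumption, and the function names are hypothetical; the sketch does not compute p values.

```python
import math


def inversion_effect(inverted, upright):
    """Per-participant difference score: inverted minus upright (assumed direction)."""
    return [inv - up for inv, up in zip(inverted, upright)]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant means (illustrative values, not the study's data)
rt_upright = [620, 655, 700, 590, 640]          # ms
rt_inverted = [690, 700, 790, 640, 735]         # ms
n170_upright = [-4.1, -5.0, -3.8, -4.6, -4.4]   # µV
n170_inverted = [-4.9, -5.4, -5.0, -5.0, -5.5]  # µV

rt_effect = inversion_effect(rt_inverted, rt_upright)
n170_effect = inversion_effect(n170_inverted, n170_upright)
print(f"r = {pearson_r(rt_effect, n170_effect):.3f}")
```

Note that the sign of the resulting coefficient depends on how the difference scores are defined, so the sign in this toy example should not be compared with the reported value.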

Finally, a mixed-model ANOVA on P2 mean amplitudes with the within-subject factors hemisphere, face age, orientation, and filter, and the between-subjects factor group revealed main effects of face age, F(1, 46) = 48.93, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.52, with more positive amplitudes for young relative to old faces, and orientation, F(1, 46) = 13.66, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.23, with more positive amplitudes for upright than inverted faces. The group factor interacted with filter, F(3, 138) = 16.34, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.26. Subsequent analyses for the two groups separately (see **Figure 4**) revealed filter effects for both older, F(3, 69) = 11.45, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.33, and young participants, F(3, 69) = 6.17, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.21. In the older group, however, increasing filter strength elicited more positive amplitudes [unfiltered vs. 30 CPI: F(1, 23) = 6.12, p = 0.021, η<sub>p</sub><sup>2</sup> = 0.21; 30 vs. 20 CPI: F(1, 23) = 2.97, p = 0.098, η<sub>p</sub><sup>2</sup> = 0.11; 20 vs. 10 CPI: F(1, 23) = 6.09, p = 0.021, η<sub>p</sub><sup>2</sup> = 0.21], whereas in the younger group increasing filter strength elicited less positive amplitudes [unfiltered vs. 30 CPI: F < 1; 30 vs. 20 CPI: F < 1; 20 vs. 10 CPI: F(1, 23) = 8.85, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.28]. In addition, face age interacted with orientation, F(1, 46) = 5.73, p = 0.021, η<sub>p</sub><sup>2</sup> = 0.11. Post-hoc tests for younger and older faces separately (see **Figure 3**) resulted in an orientation effect for both face age conditions, which was, however, more pronounced for young, F(1, 46) = 19.62, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.30, relative to old faces, F(1, 46) = 6.02, p = 0.018, η<sub>p</sub><sup>2</sup> = 0.12. No significant interaction of face age by orientation by group was observed, F < 1.

Overall, similar to the analyses of our behavioral data and N170 amplitude, inversion effects were more pronounced for young relative to old faces. In addition, low-pass filtering resulted in less positive amplitudes in young adults, but more positive amplitudes in older participants.

#### Comparisons within the Older Participant Group

As the match between face and participant age was closer for young relative to older participants, we ran additional analyses within the older participant group. For that purpose, we conducted a median split of our older group based on age, which resulted in a young older adult (YOA) and an old older adult (OOA) group (N = 12 per group; YOA mean age = 62 years ± 2 SD; OOA mean age = 70 years ± 2 SD)<sup>1</sup>.
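A median split of this kind can be sketched in a few lines of Python. The ages below are invented for illustration, and splitting by rank order (so that ties at the median still yield two equally sized groups) is an assumption; how ties were handled in the present sample is not stated in this section.

```python
from statistics import mean


def median_split(ages):
    """Split participants into a younger and an older half by rank order of age."""
    ordered = sorted(ages)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


# Hypothetical ages of 24 older participants (illustrative, not the study's data)
ages = [60, 61, 61, 62, 62, 62, 63, 63, 64, 64, 59, 65,
        68, 68, 69, 69, 70, 70, 70, 71, 71, 72, 73, 67]

yoa, ooa = median_split(ages)
print(f"YOA: n = {len(yoa)}, mean age = {mean(yoa):.1f}")
print(f"OOA: n = {len(ooa)}, mean age = {mean(ooa):.1f}")
```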

A mixed-model ANOVA on RTs (see **Table 3**) with group (YOA, OOA) as a between-subjects factor and face age, orientation and filter as within-subject factors revealed no significant interaction of face age by orientation by group, F(1, 22) = 1.94, p = 0.178, η<sub>p</sub><sup>2</sup> = 0.081, and none of the other interactions with group resulted in significant effects. A corresponding ANOVA on accuracies (see **Table 3**) yielded a significant interaction of orientation by group, F(1, 22) = 4.50, p = 0.045, η<sub>p</sub><sup>2</sup> = 0.170, with larger inversion effects in the OOA group. Again, the interaction of face age by orientation by group was not significant, F(1, 22) = 2.24, p = 0.149, η<sub>p</sub><sup>2</sup> = 0.092. No further effects involving the group factor were significant.

<sup>1</sup>Please note that the chronological age of the face stimuli was still significantly higher than participant age in the OOA group (Mann–Whitney U = 87.0, p < 0.001).


TABLE 3 | Response times and accuracies (means and standard errors of the means) for Young Older and Old Older participants.

An ANOVA on N170 amplitude (see **Figure 6**) with an additional within-subjects factor hemisphere revealed a significant interaction of orientation by group, F(1, 22) = 7.80, p = 0.011, η<sub>p</sub><sup>2</sup> = 0.262, reflecting significant inversion effects in the YOA group, F(1, 11) = 12.39, p = 0.005, η<sub>p</sub><sup>2</sup> = 0.530, but not in the OOA group, F < 1. Furthermore, a trend toward a significant interaction of face age by orientation by group was detected, F(1, 22) = 3.52, p = 0.074, η<sub>p</sub><sup>2</sup> = 0.138. While both groups showed larger inversion effects for young relative to old faces, this pattern appeared less pronounced in the OOA group. No further effects involving the group factor were significant (all p > 0.1). A corresponding analysis on N170 peak latency revealed a trend toward a main effect of group, F(1, 22) = 3.53, p = 0.074, η<sub>p</sub><sup>2</sup> = 0.138, with numerically longer N170 latencies in the OOA relative to the YOA group, and a significant interaction of face age by group, F(1, 22) = 6.50, p = 0.018, η<sub>p</sub><sup>2</sup> = 0.228, with longer latencies for old relative to young faces in the YOA group, but no respective difference in the OOA group. The interaction of face age by orientation by group was not significant, F(1, 22) = 1.90, p = 0.182, η<sub>p</sub><sup>2</sup> = 0.080. No additional effects involving the group factor were detected (all p > 0.1).

Finally, a corresponding mixed-model ANOVA on P2 amplitude yielded a trend toward an interaction of face age by orientation by group, F(1, 22) = 3.00, p = 0.097, η<sub>p</sub><sup>2</sup> = 0.120, with larger inversion effects for young relative to old faces in the YOA group and no clear inversion effects in the OOA group. None of the other effects involving the group factor were significant.

In sum, the analyses reported in this section yielded no strong evidence for a processing advantage for old faces in the OOA group. It should be noted, however, that the sample size might have been too small to detect subtle effects, and therefore the absence of significant effects should be treated with caution.

### Discussion

The present study examined the categorization of young and old faces according to age in young and older adult participants. We were particularly interested in examining (1) whether and to what extent the simultaneous processing of information from relatively distant parts of the face and more local texture-based information contribute to the perception of facial age, (2) whether old and young faces are perceived similarly, and whether the perception of facial age is biased by participant age, and (3) whether older adults would be less efficient in early face perception, and more specifically whether they would show reduced N170 inversion effects. The following paragraphs discuss these questions on the basis of the present findings and the previous literature.

### Both Relatively Distant and Local Information Contribute to Efficient Age Categorization

Our behavioral and ERP data suggest that efficient age categorization depends on both relatively distant and local information. This interpretation is based on the finding that both inversion and low-pass filtering resulted in slower and less accurate responses. More interestingly, the two manipulations interacted, as revealed by the more pronounced costs of low-pass filtering for inverted relative to upright faces. It thus seems that a narrowing of the perceptual field in inverted faces can be partly compensated by using local texture-based information, such that if this information is additionally removed, additional costs apply. The stepwise filtering approach used in the present study, with four increasingly severe cut-off frequencies, allowed pinning down the frequency range most informative for this partial compensation to between 20 and 10 CPI. The finding of slowed responses for inverted faces is in line with a previous report of an inversion effect in age categorization (Wiese et al., 2012a). At the same time, neither inversion nor filter effects were found in a study in which the exact age of the face stimuli had to be estimated (George and Hole, 2000). Together with the present results, these previous findings indicate that the processing of facial age, although not impossible for inverted and low-pass filtered images, is substantially reduced in efficiency.
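To make the notion of a cut-off frequency in cycles per image (CPI) concrete, the following Python sketch removes all spatial frequencies above a given cut-off from a small synthetic image. It is a toy illustration under stated assumptions, not the filtering procedure of the present study: the ideal (hard-edged) circular filter, the 16 × 16 pixel image, the cut-off of 4 CPI, and all function names are hypothetical, and the naive Fourier transform is used only to keep the example free of external libraries.

```python
import cmath
import math


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def dft_2d(image, inverse=False):
    """Separable 2D transform: rows first, then columns."""
    rows = [dft_1d(row, inverse) for row in image]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def low_pass(image, cutoff_cpi):
    """Zero all frequencies above cutoff_cpi cycles per image (square image assumed)."""
    n = len(image)
    spectrum = dft_2d(image)
    for u in range(n):
        for v in range(n):
            fu, fv = min(u, n - u), min(v, n - v)  # frequency in cycles per image
            if math.hypot(fu, fv) > cutoff_cpi:
                spectrum[u][v] = 0
    return [[value.real for value in row] for row in dft_2d(spectrum, inverse=True)]


# Synthetic image: a coarse grating (2 CPI) plus a fine grating (6 CPI)
n = 16
coarse = [[math.cos(2 * math.pi * 2 * x / n) for x in range(n)] for _ in range(n)]
fine = [[math.cos(2 * math.pi * 6 * x / n) for x in range(n)] for _ in range(n)]
image = [[c + f for c, f in zip(rc, rf)] for rc, rf in zip(coarse, fine)]

filtered = low_pass(image, cutoff_cpi=4)
error = max(abs(filtered[y][x] - coarse[y][x]) for y in range(n) for x in range(n))
print(f"largest deviation from the coarse grating: {error:.2e}")
```

With a cut-off of 4 CPI the fine grating is removed and only the coarse grating remains. By analogy, lowering the cut-off from 30 to 20 to 10 CPI progressively strips fine texture from a face image while leaving coarser structure in place.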

Generally in line with the behavioral results discussed above, both face inversion and low-pass filtering affected N170 and P2. Also in parallel to performance measures, filtering effects were largely restricted to the most severe 10 CPI cut-off frequency filter condition. Interestingly, the two factors did not interact.

At first sight it appears plausible to assume that an inversion effect independent of filtering and a filter effect that is evident only in the strongest cut-off frequency condition add up to the interaction observed in the behavioral data. It should be noted, however, that inversion and filter effects go in opposite directions: in line with previous studies (e.g., Rossion et al., 2000; Goffaux et al., 2003), we observed larger N170 amplitudes for inverted faces and smaller amplitudes for severely low-pass filtered faces. This seemingly contradictory finding is reminiscent of the long-known apparent paradox that the N170 is larger for upright faces relative to objects, while it is at the same time smaller for upright relative to inverted faces (Itier et al., 2006; Eimer, 2011). It thus seems that the effect of low-pass filtering is related to the former effect of generic face sensitivity, and that blurring may make the faces appear less face-like, probably because the first-order configuration of facial features is harder to detect. The finding that N170 is similarly reduced when high frequency noise is added to the image (Jemel et al., 2003) additionally supports this interpretation. Overall, relative to the N170 for unfiltered upright faces, both reduced and enhanced amplitudes seem to hamper the efficiency of age categorization.

### Relatively Distant Information is Less Important for Categorizing Old Faces, but Age Categorization is not Modulated by Viewers' Age

A further interesting finding of the present study was that young and old faces were processed differently to some extent. While old faces were categorized generally faster, inversion effects were more pronounced for young faces in both accuracies and ERPs, indicating less processing of relatively distant information for old faces. Moreover, when simultaneous processing of relatively distant information was possible, i.e., when images were presented in upright orientation, low-pass filtering affected response times for young faces only in the strongest filter condition, whereas filtering frequencies higher than 20 CPI affected the categorization of old faces. It thus seems that frequencies between 30 and 20 CPI contribute to the efficient categorization of old but not young faces, which are more robust against low-pass filtering. At the same time, when processing relatively distant information was disrupted by face inversion, high frequency information was informative for the detection of young age, as indicated by a decrease in response times starting already in the 30 CPI condition. This was not the case for old faces, which showed similar patterns of response time decrease with stronger filtering in the upright and inverted conditions. It thus seems that different frequency bands are drawn on when categorizing young and old faces if information from relatively distant parts of the face cannot be used.

At the same time, these results suggest that categorizing young faces predominantly depends on information from relatively distant parts and low-frequency information. Accordingly, only strong low-pass filtering affects the categorization of upright young faces. However, if the former type of information is not available, a more demanding analysis of local texture is conducted, which more strongly depends on higher frequency information, and is therefore hampered by even moderate low-pass filtering. Categorizing older faces depends on the processing of relatively distant information to a lesser extent, which is reflected in relatively smaller inversion effects. The relatively stronger use of local texture information for old faces is further reflected in the similar effects of low-pass filtering for upright and inverted faces.

This interpretation is at least partly supported by the ERP data: Parallel to the accuracy results, clearly larger inversion effects for young relative to old faces were observed in N170 and P2 amplitude measures. Accordingly, old faces are processed more similarly in upright and inverted orientation than young faces, and these different inversion effects might reflect differential processing of first- (N170) and second-order configural information (P2; see Latinus and Taylor, 2006). Again, it appears that processing of relatively distant information is more pronounced for young relative to old faces. At the same time, the effect of low-pass filtering the face images on N170 amplitude was similar for young and old faces. Accordingly, inverting and low-pass filtering the images appear to affect independent processes that contribute to N170 amplitude, with the former being sensitive to face age whereas the latter is not. Moreover, a strong low-pass filter affected the N170 latency for old faces but not young faces. This finding is broadly in line with the somewhat stronger sensitivity of old faces to the filter manipulation in response times. It should be noted, however, that the exact pattern found in response times is not paralleled in N170 latency results. Whereas response times reflect the outcome of a cascade of sub-processes, including perceptual and decisional stages, N170 represents a more specific measure of early face perception. Therefore, it seems that later processing stages not reflected in our ERP analysis additionally modulated the pattern of results observed in response times.

As stated in the introduction, the own-race face recognition bias is at least partly based on more efficient perceptual processing of facial information (for a recent review, see Hayward et al., 2013). A neural correlate of this is seen in larger N170 amplitudes for other-race faces (e.g., Wiese et al., 2014) and larger N170 inversion effects for own-race faces (Vizioli et al., 2010; Caharel et al., 2011; Wiese, 2013). In the present study, we were interested in whether the previously described own-age recognition bias in young adults (Rhodes and Anastasi, 2012; Wiese et al., 2013b) was similarly paralleled by differences at early perceptual processing stages. The present results revealed only moderate evidence for this idea. On the one hand, young faces elicited larger RT inversion effects than old faces in young adults, whereas no differential inversion effect was observed in older adults. On the other hand, both N170 and P2 inversion effects were larger for young faces, and this effect did not interact with participant age. Similarly, in a previous study we observed a larger N170 misalignment effect (with larger amplitudes for horizontally misaligned relative to aligned face halves) for young relative to old faces in both young and older adults (Wiese et al., 2013a). Overall, assuming that the present young participants would show an own-age bias in face memory if so tested, the present results do not provide strong evidence for an early perceptual basis of this own-age memory bias. Our data instead suggest that the processing of relatively distant information is less important for old faces for both young and older adults. The absence of a stronger inversion effect for young faces in older adults in one out of four measures can hardly be interpreted as a strong argument against this suggestion.

As a potential limitation of the present study, we note that the match between stimulus and participant age was closer in young relative to older adults, and that the absence of a processing advantage for old faces in older adults might be partly related to this larger mismatch. It should be noted, however, that at least the own-age bias in adult participants' recognition memory does not depend on an exact match of stimulus and participant age (Wolff et al., 2012). Moreover, previous studies have demonstrated that the age of older faces is particularly hard to perceive (George and Hole, 2000; Voelkle et al., 2012), probably because differences in neurobiological and socio-environmental factors (such as sun exposure, smoking etc.) have more time to affect facial appearance with increasing age. Interestingly, these studies have further shown that the age of older adults' faces is systematically underestimated by 4–5 years. This indicates that even though the face images in the present study were de facto older than the OOA participants, this has likely not been perceived as clearly as suggested by the difference in chronological age. Nevertheless, future studies should be stricter when matching stimulus and participant age for older participants.

### Older Adults are Overall Less Efficient in Early Face Perception, but Process Young and Old Faces Similarly to Younger Adults

Although a number of differences between participant groups were detected in the present study, it appears important to point out that the results revealed only moderate effects of participant age on both behavioral and ERP data. Analysis of accuracy data hinted toward a slightly higher sensitivity to low-pass filtering facial information in younger adults. More precisely, when processing of relatively distant information was disrupted (i.e., in the inverted condition), a cut-off frequency at 20 CPI led to less accurate categorizations in younger but not older adults. This finding may indicate a somewhat stronger sensitivity to high frequency information in younger adults when information from relatively distant face parts cannot be used. It should be noted that this interpretation implies a link between configural and spatial frequency information, which has been found in some (Goffaux and Rossion, 2006), but not all studies (Boutet et al., 2003; Gaspar et al., 2008) examining this potential relationship in identity judgment tasks. Such subtle effects may also be related to slight differences in visual acuity between groups, which were not explicitly tested in the present study. Although all participants reported normal vision and wore their visual aids if necessary, previous studies have shown that age group differences in vision remain even under these circumstances (see e.g., Komes et al., 2014a). Moreover, in the present study the overall pattern of response time decreases with increasing filter strength was similar for both age groups, suggesting only moderate age-related change in the present task.

In line with slowing accounts of cognitive aging (e.g., Salthouse, 1996), older participants needed more time for age categorizations than younger adults. If slower age categorization were linked to slowed perceptual processing, and given that N170 reflects a perceptual processing stage (such as the processing of first-order configuration or structural encoding), one might assume that its peak would be substantially delayed in older adults. The present data, however, do not point toward a perceptual locus of this effect, as N170 latency was not significantly delayed in older adults. As a potential qualification it should be noted that some previous studies observed delayed N170 peaks with increasing age (Gazzaley et al., 2008; Wolff et al., 2012), and that a trend in this direction was observed in the comparison of relatively young and old older adults in the present study.

N170 was larger for older adults, a finding that replicated previous results from others (Gao et al., 2009; Daniel and Bentin, 2012) and our group (Wiese et al., 2008; Wolff et al., 2012). More interestingly, and similar to previous studies, the N170 inversion effect was more pronounced in younger relative to older adults (Gao et al., 2009; Daniel and Bentin, 2012), although in the present study the respective interaction was only observed as a statistical trend. However, the analysis of young vs. old older adults yielded smaller N170 inversion effects in the latter group, suggesting that this age-related change in neural processing occurred after the age of 62. Importantly, this effect was observed even though young and old face stimuli were used. This finding suggests that reduced inversion effects reported in previous studies presumably reflected moderate but clearly detectable age-related changes in neural correlates of face perception rather than an experience-based bias toward own-age faces in the younger participants.

Interestingly, the larger N170 for inverted relative to upright faces has been suggested to reflect the recruitment of additional neural mechanisms (related to feature-based object processing or processing eyes) rather than stronger activation of the neural mechanism for upright faces (Rossion and Gauthier, 2002; Itier et al., 2006; Sadeh and Yovel, 2010). Thus, one might assume that the reduced N170 inversion effect in older adults indicates a deficit in this additional recruitment of processes associated with analyzing more local information for inverted faces. However, in the present study the reduced inversion effect in older adults was at least partly related to more negative N170 amplitudes for upright faces (mean amplitude in young adults: −4.4 µV; older adults: −6.2 µV), and not to the same extent to age-related differences for inverted faces (young adults: −5.2 µV; older adults: −6.6 µV). Given that the upright N170 reflects the simultaneous processing of relatively distant information, this finding may be interpreted as reflecting more effort and thus reduced efficiency for this type of processing. Moreover, the smaller increase in negativity from upright to inverted faces in older adults may reflect less recruitment of additional local processing.
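The argument in this paragraph rests on simple differences between the four reported mean amplitudes, which can be made explicit with a short Python sketch. The dictionary layout and variable names are illustrative; the four values are those given in the text.

```python
# Mean N170 amplitudes (µV) reported in the text above
n170 = {
    "young adults": {"upright": -4.4, "inverted": -5.2},
    "older adults": {"upright": -6.2, "inverted": -6.6},
}

# Inversion effect per group: inverted minus upright (more negative = larger effect)
inversion = {
    group: round(values["inverted"] - values["upright"], 1)
    for group, values in n170.items()
}

# Group difference per orientation: older minus young adults
group_diff = {
    orientation: round(
        n170["older adults"][orientation] - n170["young adults"][orientation], 1
    )
    for orientation in ("upright", "inverted")
}

print(inversion)
print(group_diff)
```

The inversion effect amounts to −0.8 µV in young and −0.4 µV in older adults, and the group difference is larger for upright (−1.8 µV) than for inverted faces (−1.4 µV), which is the pattern the interpretation above relies on.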

At the same time, N170 effects of face age and filtering were similarly observed in younger and older adults. Whereas the more negative N170 for old relative to young faces in both groups is generally in line with a previous study (Komes et al., 2014b), which also found that the fine-tuning of N170 to faces from different ethnic groups is largely intact in older age, the present findings further suggest that N170 sensitivity to information from different frequency bands also seems to be largely preserved in older participants. Assuming that the effect of low-pass filtering on N170 amplitude is related to the generic face sensitivity of this component, our findings are generally in line with the conclusion that this aspect of N170 is largely intact in older adults (Daniel and Bentin, 2012). Overall, together with previous ERP studies, the present results indicate selective age-related effects of face inversion on N170, which at least partly reflects less efficient processing of relatively distant information in older adults. At the same time, clear inversion effects were observed in both groups in the behavioral results of the present study, which again support the interpretation of only moderate age-related change.

A number of previous behavioral studies tested the simultaneous processing of relatively distant information in older adults. Interestingly, both Boutet and Faubert (2006), who tested the inversion and composite face effect, and Konar et al. (2013), who examined the composite face effect only, concluded that configural processing is not reduced in older adults. Of note, however, Boutet and Faubert (2006) did not analyze response times. Therefore, they might have missed aging effects manifesting in less efficient processing. Moreover, Konar et al. (2013) did observe an effect of aging on response times, which together with similar accuracies may be interpreted as less efficient processing (see also Wiese et al., 2013a). Finally, Hildebrandt et al. (2010) did not observe a composite effect in older participants and found that effect sizes for the inversion effect were substantially smaller for older relative to both middle-aged and young participants. In conclusion, and generally in line with previous and the present ERP results, it appears that some age-related changes in the simultaneous processing of information from relatively distant parts of faces (which is measured in "configural" tasks; see Rossion, 2009) are typically observed, either reflecting less accurate or, more subtly, less efficient processing in older adults. Accordingly, future research should take both accuracy and efficiency of processing into account.

It should be noted, however, that in the present study age-related differences were observed in the N170 but not in the behavioral data. Similar apparent discrepancies between ERP and behavior have been observed in other research areas (e.g., in language processing; see e.g., Federmeier and Kutas, 2005; Federmeier et al., 2010). In principle, we see two possible underlying causes: (1) a specific processing stage (in the present case the processes reflected by N170) is affected by aging, but this deficit is compensated at a later processing stage, or (2) ERPs are more sensitive to age-related changes than behavioral measures, and therefore point to changes that will manifest at the behavioral level at a more advanced age. With respect to the first suggestion, we note that an increased recruitment of higher-order cognitive processes to compensate for age-related deficits in sensory and perceptual processes has been suggested by neuroimaging studies (see Dennis and Cabeza, 2008). However, as the present results do not provide direct evidence for such a compensation mechanism, this interpretation remains speculative. At the same time, the second suggestion is not in line with our finding of similar (or even larger) behavioral inversion effects in old older relative to young older adults. Further research is clearly needed to clarify the repeatedly observed mismatch with respect to age-related changes between ERP and behavioral data.

In addition, aging seems to clearly affect the neural processes reflected in ERP components following N170. In the present study, filtering face images had opposite effects on the P2 of younger and older adults (for potentially related findings in slightly later time windows, see Wiese et al., 2012b; Komes et al., 2014a). Whereas younger adults demonstrated less positive amplitudes with increased filter settings, more positive amplitudes were observed in older adults. This finding may point to a differential orientation and/or location of the underlying generators (e.g., Jackson and Bolger, 2014) as a consequence of age-related brain changes. Alternatively, differences in processing strategies may account for this finding. The occipito-temporal P2 is larger in attentionally more demanding conditions (Neumann et al., 2015). One way to explain the above pattern is to assume that older adults tried to compensate for the increased difficulty in the filtered conditions by enhancing attentional resources. Irrespective of the precise underlying cause, ERPs in time ranges following N170 appear to be modulated in older relative to younger adults in a qualitatively different manner, whereas age-related modulations of the N170 are quantitative.

A potential qualification of the present results may lie in the repetition of facial identities (although the same identity was never presented twice in the same condition). This, together with the presentation time of 1 s, may have encouraged participants to not only process the directly task-relevant age information, but also task-irrelevant identity information. Even if so, this appears unlikely to have affected our main results. First, with respect to our behavioral findings, one might assume that with increasing face familiarity over blocks, configural processing of the faces might increase. This, however, was not the case. An additional ANOVA on RTs, with the within-subject factors block (five levels) and orientation and the between-subjects factor group, revealed neither a significant two-way interaction of block by orientation, F(4, 184) = 1.34, p = 0.243, η<sup>2</sup><sub>p</sub> = 0.029, nor a three-way interaction of block × orientation × group, F(4, 184) = 1.51, p = 0.200, η<sup>2</sup><sub>p</sub> = 0.032. In addition, we note that several recent papers question the relationship between configural and identity processing (Taschereau-Dumouchel et al., 2010; Burton et al., 2015). However, independent of whether configural information is important for identity processing, a narrowing of the perceptual field, which may result from face inversion (Rossion, 2009), may slow down age categorizations, suggesting that the simultaneous processing of several relatively distant parts of the face is relevant for efficient age categorization. Second, with respect to our ERP findings we note that most researchers agree that the N170 reflects processes prior to the identification of individual faces (Bentin and Deouell, 2000; Eimer, 2000b; Schweinberger et al., 2002; Henson et al., 2003). Reports associating this component with identity processing typically show very small effects, and are inconsistent with respect to their direction (see Caharel et al., 2006; Marzi and Viggiano, 2007). It therefore appears unlikely that the present results in the N170 reflect the processing of identity rather than age information.
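The partial eta squared values reported for the block-related interactions can be related to the F ratios through the standard identity η<sup>2</sup><sub>p</sub> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The following is a minimal sketch of that identity only; the function name is illustrative and not part of the original analysis, and because the published F values are rounded, the recovered effect sizes may differ from the reported ones in the third decimal place.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the standard identity eta_p^2 = F*df1 / (F*df1 + df2).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F values and degrees of freedom as reported in the text (rounded values).
block_by_orientation = partial_eta_squared(1.34, 4, 184)
block_by_orientation_by_group = partial_eta_squared(1.51, 4, 184)

print(f"block x orientation:         {block_by_orientation:.3f}")
print(f"block x orientation x group: {block_by_orientation_by_group:.3f}")
```

Both recovered values are close to 0.03, which is consistent with the small effect sizes reported for these non-significant interactions.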

### Conclusions

The present study examined the interplay of two arguably fundamental age-related changes: the change in facial appearance and the change in perceptual and cognitive functioning with increasing age. We found that both information from relatively distant parts of the face and local information are used for the processing of facial age. Moreover, the simultaneous processing of information from distant parts seems to be relatively more important for perceiving young as compared to old faces. This effect was similarly observed in young and older participants, arguing against the idea of an own-age bias in young adults' early face perception. Finally, moderate effects of cognitive aging on face perception were detected in the present study, which is in line with previous research (Hildebrandt et al., 2010).

### Acknowledgments

This work was supported by a grant of the Deutsche Forschungsgemeinschaft (DFG) to HW (Wi 3219/4-2). SS and HW were additionally supported by a grant from the Bundesministerium für Forschung und Technologie (BMBF, Project IRESTRA, Section "Training social communication in elderly people"). The authors gratefully acknowledge help during data collection by Kathrin Rauscher.

### References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Komes, Schweinberger and Wiese. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Cross-age effects on forensic face construction**

*Cristina Fodarella <sup>1</sup> , Charity Brown <sup>2</sup> \*, Amy Lewis <sup>3</sup> and Charlie D. Frowd <sup>1</sup>*

*<sup>1</sup> Department of Psychology, University of Winchester, Winchester, UK, <sup>2</sup> School of Psychology, University of Leeds, Leeds, UK, <sup>3</sup> The School of Psychology and Neuroscience, University of St Andrews, Fife, UK*

The own-age bias (OAB) refers to recognition memory being more accurate for people of our own age than other age groups (e.g., Wright and Stroud, 2002). This paper investigated whether the OAB effect is present during construction of human faces (also known as facial composites, often for forensic/police use). In doing so, it adds to our understanding of factors influencing both facial memory across the life span and the performance of facial composites. Participant-witnesses were grouped into younger (19–35 years) and older (51–80 years) adults, and constructed a single composite from memory of an own- or cross-age target face using the feature-based composite system PRO-fit. They also completed the shortened version of the Glasgow Face Matching Test (GFMT; Burton et al., 2010). A separate group of participants who were familiar with the relevant identities attempted to name the resulting composites. Correct naming of the composites revealed the presence of an OAB for older adults, who constructed more identifiable composites of own-age than cross-age faces. For younger adults, age of target face did not influence correct naming and their composites were named at the same level as those constructed by older adults for younger targets. Also, there was no reliable correlation between face perception ability and composite quality. Overall, correct naming was fairly good across the experiment, and indicated a benefit for older witnesses for older targets. Results are discussed in terms of contemporary theories of OAB, and implications of the work for forensic practice.

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *Reviewed by:*

*Claus-Christian Carbon, University of Bamberg, Germany Jessica Komes, Friedrich Schiller University, Germany*

#### *\*Correspondence:*

*Charity Brown, School of Psychology, University of Leeds, Leeds LS2 9JT, UK psccbr@leeds.ac.uk*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 30 April 2015 Accepted: 04 August 2015 Published: 21 August 2015*

#### *Citation:*

*Fodarella C, Brown C, Lewis A and Frowd CD (2015) Cross-age effects on forensic face construction. Front. Psychol. 6:1237. doi: 10.3389/fpsyg.2015.01237*

**Keywords: own-age bias, face perception, facial memory, facial composites, PRO-fit, Glasgow Face Matching Test**

## **Introduction**

Individuals can effortlessly and accurately detect the age of a face across their life span (e.g., Rhodes and Anastasi, 2012). Age-indicative information can influence face-recognition accuracy, and lead to an own-age bias (OAB) where facial memory is stronger for those of our own than other age groups (Wright and Stroud, 2002; for a review, see Wiese et al., 2013a). Findings for the OAB have been replicated across ages (Rhodes and Anastasi, 2012) and contexts, such as eyewitness line-up studies (Wright and Stroud, 2002) and old/new decision tasks (Wiese, 2012).

It is worth mentioning that the own-*race* bias (ORB) is a separate but related phenomenon, whereby individuals are better able to remember faces belonging to their own race relative to another race (e.g., see Meissner and Brigham, 2001). This has led researchers attempting to explain the OAB to draw upon accounts originally put forward for the ORB: the assumption is that both the ORB and the OAB are examples of a more general underlying phenomenon.

Several accounts have commonly proposed that a social categorization mechanism contributes to explaining group biases (e.g., Sporer, 1991; Levin, 2000). For example, Hugenberg et al.'s (2010) *categorization–individuation model* (*CIM*) theorizes that during the processing of a face, individuals engage either in categorization or individuation. Categorization leads to faces being encoded in terms of the social category to which they belong. This is thought to hinder the ability to discriminate between faces during recognition. Conversely, individuation leads to faces being encoded with regard to individualistic characteristics, which would promote later recognition. In terms of the OAB, cross-age faces may be perceived with regard to the age category to which they belong (categorization), whilst own-age faces may be perceived with more unique and individuating information (individuation). The result is superior recognition of own-age faces.

The configural-feature hypothesis may also apply. This account proposes that highly familiar faces, identities which are encountered frequently, are recognized based to a greater extent upon the configural information they contain (i.e., via the encoding of spatial relationships between some or all facial features; see Peterson and Rhodes, 2003) compared to information about individual facial features (eyes, nose, mouth, etc.). Facial memory is generally stronger when faces are perceived configurally. Therefore, own-age faces may be processed configurally and thereby more effectively, whilst cross-age faces may be processed featurally, leading to an OAB (see Rossion and Michel, 2011, for a review). However, research comparing younger and older adults on holistic/configural processing of young and old faces is sparse. Wiese et al. (2013a) examined this issue using the well-known composite-face effect as an indicator of holistic processing. They found both younger and older adults were better at the task of discriminating two face halves when presented as misaligned compared to aligned, and this effect was particularly marked for young relative to old faces. This finding indicates that the effective application of holistic processing was determined by the age of the target face *per se*, rather than any effects of own-group bias. Nevertheless, using ERP measures, Wiese et al. (2013b) did observe an own-age advantage for younger, but not older participants when examining N250r, a component interpreted as reflecting enhanced access to face representations. More generally, there was evidence that holistic processing by older compared to younger adults was overall less efficient.

Further, it may be the case that increased contact with so-called "out" group members enables development of experience and expertise, both of which improve the ability to extract relevant information to aid recognition of out-group individuals (Meissner and Brigham, 2001). Normal ageing causes individuals throughout their life span to progress from one age group (e.g., younger adults) to another (e.g., older adults), and through the course of this process, older adults are likely to have gained considerable experience with faces of different age groups. However, it is debated whether accumulated contact over the lifespan influences the OAB, or recent daily-life contact only. There is evidence to support both views. For the former, there are many studies that show an OAB for younger but not older adults (e.g., Rhodes et al., 2008; Wiese et al., 2008; Havard and Memon, 2009). It could be theorized that this would be due to a difference in general experience with cross-age groups. Younger adults in general may not have had sufficient contact over their lifespan with older adults, leading to an OAB. In contrast, older adults must have had contact with all other age groups at some point during their lifetime, as they progressed through different age stages. Therefore, older adults have prior experience as a member of both age groups, leading to a lack of an OAB (see Wiese et al., 2008).

For the latter view, it is proposed instead that recent daily-life contact determines whether or not an OAB occurs. In support of this proposal, the OAB was not apparent when testing face recognition in young adult geriatric nurses (i.e., a job involving substantial contact with older adults) relative to young adult controls who as a group reported having low contact with an older adult population (Wiese et al., 2013c). Similarly, an OAB effect in older adults has also been shown to be mediated by different levels of contact. Wiese et al. (2012) included older adults who reported having either a high or a low level of recent contact with own-aged individuals relative to younger ones. Superior recognition of own-age faces (cf. cross-age faces) was apparent in those older adults reporting a high level of contact with an older adult population. In contrast, those with more balanced contact with both younger and older adults did not show such bias. These findings indicate that previous experience of having been a member of the younger age group was not sufficient contact to diminish the OAB in *all* older adults, thereby suggesting an influential role for recent contact. The recent-contact hypothesis is further supported by a meta-analysis (Rhodes and Anastasi, 2012).

This opens up the question as to why some previous studies have not observed an OAB in older adults (e.g., Rhodes et al., 2008; Wiese et al., 2008; Havard and Memon, 2009). This null effect may be due to older adults tending to process the face featurally (through processing of individual face features) rather than configurally (processing an object as a whole; Murray et al., 2010). There may also be an associated age-related decline in the processing of facial information, one which causes older adults to perform worse (cf. young adults) when detecting, remembering and recognizing faces (for a review, see Ruffman et al., 2008). The configural-feature hypothesis proposes that familiar stimuli are processed more configurally rather than featurally; here, configural processing strengthens face-recognition memory to a greater extent than featural (Wiese et al., 2013b). Overall, a featural style of processing may hinder older adults from benefitting from improved recognition afforded by a greater reliance on configural information.

In an applied setting, identifying factors which influence face memory is important within a legal system. For example, eyewitnesses (witnesses and victims) are asked by police to construct a picture of the person they have seen commit a crime, an image known as a facial composite, and/or to identify a potential suspect from a police line-up (identity parade). Both of these forensic tasks involve face processing to a great extent and so may be susceptible to an OAB (e.g., Wright and Stroud, 2002). Here, our focus is on the former activity, people's ability to construct identifiable facial composites. Composites provide evidence usually gathered in the early stages of a police investigation that can be crucial to locate potential suspects (see Frowd, 2015, for a general review of composites). They are usually constructed 2–3 days after the crime occurred, but are occasionally created on the same day. An OAB could occur here and, if so, could manifest itself in composites of own-age groups being more effective than those of cross-age groups. To date, no published research has formally explored this issue, and the current paper aims to fill this gap by including both a younger- and an older-adult sample who construct composites of same- and cross-age faces.

When including an older-adult sample, age-related memory decline may be relevant. Facial-composite construction using traditional "feature" systems involves detailed recall of facial features from memory (normally using cognitive-type interviewing procedures). Unfamiliar-face recognition is also involved as eyewitnesses are required to select individual facial features (e.g., eyes, nose, mouth) to build a face. Therefore, as both face recall (Wright and Holliday, 2007) and face recognition (Bartlett and Fulton, 1991) are impaired with advancing age, composites produced by older adults may be less effective than those produced by younger adults. Nevertheless, Komes et al. (2014) found that despite dividing their older adult sample into those exhibiting low versus high face recognition memory performance, both groups showed an equivalent bias toward recognizing own- versus other-race faces (i.e., an own-race bias). This suggests that even when memory is less effective individuals may still display a memory bias toward own-group faces, in this case, own-race faces. Similarly, in the present study we may find an OAB emerges over and above any more general age-related memory decline that becomes apparent in the task at hand. In this regard, it is worth noting that one study in the composite area (Frowd et al., 2005a) included older adults in their sample, but found no reliable relationship between age of face constructor and identification of resulting composite; however, while age of target varied considerably across the stimuli set, this property was a random variable and so the design may not have been sufficiently powerful to be able to detect an OAB, should one exist. The aim of the current study, therefore, is to formally assess whether age-related differences exist for composite-face production.

In summary, we investigated whether an OAB effect occurs during composite construction for both younger and older participants. These participant "witnesses" (face constructors) were grouped into younger and older adults and constructed a single feature-based composite of either an own- or cross-age target face. As there are fairly large individual differences in ability to construct faces from memory (e.g., Frowd et al., 2016), and face recognition is an important component for composite construction (e.g., Frowd et al., 2008), participant-witnesses also completed the shortened Glasgow Face Matching Test (GFMT; Burton et al., 2010), to initially investigate whether a relationship exists between face-perception ability and composite quality.

On the basis of the aforementioned face-recognition research, it was expected that an OAB effect would occur when constructing composites, and that this effect would be stronger for a younger than an older adult group. Also, given evidence of age-related decline in both recall and recognition, older (cf. younger) adults were expected to produce less effective composites in general.

### **Experiment**

To mirror real-life criminal investigations, participants who constructed the composites were required to be unfamiliar with the target faces, whilst participants who later attempted to identify the composites were required to be familiar with these identities. To satisfy this constraint on familiarity, previous research concerning composite construction has made use of target categories such as international sports players (e.g., snooker or football players; e.g., Frowd et al., 2007, 2010). This enables the recruitment of people who are not fans of the sport, and hence unfamiliar with the target identities, for face construction, and the subsequent recruitment of sports fans (familiar with the target identities) for naming the resulting composites. Here premiership footballers and international/premiership football managers were used on the basis that these two groups contain individuals that fall into two separate age groups (younger, 22–33 years; older, 49–72 years), allowing age of target to be treated as a categorical variable. This allowed recruitment of participants (constructors) who were unfamiliar with the targets (people who were not football fans) to create two groups that were mutually exclusive by age (younger, 19–35 years; or older, 51–80 years). These participants made a single composite of an unfamiliar target identity belonging either to their own or the other-age category. Subsequently, football fans were recruited as composite-namers who were familiar with the targets. Football fans are likely to know both football players and managers, which allowed us to adopt a more powerful within-participants design for composite naming: all naming participants attempted to name both the younger (football players) and older (football managers) target identities. The two stages of composite construction and naming required a quite different design and procedure, as described below. The research was approved by the School of Psychology Ethics Committee at the University of Leeds and complied with the relevant regulatory standards.

### **Composite-construction Stage**

### Design

Participants ("constructors") encoded a target face and then created a single composite from memory using the PRO-fit "feature" system in current police use (Frowd, 2015). This single-session design produces more-identifiable composites than designs involving a long retention interval (e.g., of 1 or 2 days) between target encoding and face production (Frowd et al., 2005b) and should improve the chances of observing an OAB, should one exist.

The two factors of target age and constructor age were each treated as categorical variables and implemented at two levels, "older" and "younger"; age groups in both cases were mutually exclusive. The targets were premiership footballers ("younger" targets) and premiership and international football managers ("older" targets). Thus, a 2 (constructor age: younger vs. older) *×* 2 (target age: younger vs. older) between-participants design was employed. The experimenter was unaware of the identities of the target photographs and the target-age conditions to which participants had been randomly allocated.

### Materials

The targets were front-facing color photographs located via an internet search: 10 white male premiership-level football players (age: 22–33 years; *M* = 28.0; SD = 4.0 years) and 10 international/premiership-level football managers (age: 49–72 years; *M* = 60.5; SD = 8.5 years). No one wore particularly distinctive features such as glasses, beard or jewelry. The size of each target image was approximately 6 cm (wide) by 8 cm (high). Each target was printed twice on single sheets of A4 paper in color, producing 40 targets in total: (i) 10 younger targets for younger participants, (ii) 10 younger targets for older participants, (iii) 10 older targets for younger participants and (iv) 10 older targets for older participants. Each target picture was shown to a different constructor to build a composite of that target face.
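The allocation of the 40 printed targets to the four cells of the design can be sketched as follows. This is an illustrative sketch only: the identifiers (`build_target_pool`, `draw_target`, the `target_id` labels) are hypothetical and not taken from the study materials, and the random draw simply assumes selection without replacement within a constructor group.

```python
import random
from collections import Counter


def build_target_pool(n_per_age=10):
    """Return the printed target sheets: every target once per constructor group."""
    pool = []
    for target_age in ("younger", "older"):
        for idx in range(1, n_per_age + 1):
            for constructor_age in ("younger", "older"):
                pool.append({
                    "target_id": f"{target_age}_{idx:02d}",  # hypothetical label
                    "target_age": target_age,
                    "constructor_age": constructor_age,
                })
    return pool


def draw_target(pool, constructor_age, rng):
    """Draw one sheet at random for a constructor and remove it from the pool."""
    candidates = [s for s in pool if s["constructor_age"] == constructor_age]
    sheet = rng.choice(candidates)
    pool.remove(sheet)  # without replacement: each sheet is used by one constructor
    return sheet


pool = build_target_pool()
cells = Counter((s["constructor_age"], s["target_age"]) for s in pool)
print(len(pool), dict(cells))

rng = random.Random(2015)  # fixed seed so the sketch is reproducible
first_sheet = draw_target(pool, "older", rng)
print(first_sheet["target_id"], len(pool))
```

Each of the four constructor-age by target-age cells holds 10 sheets, so every target identity is constructed once by a younger and once by an older participant.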

The shortened version of the Glasgow Face Matching Test (GFMT), a measure of individual face-processing ability, was administered to all constructors. Participants were presented with 40 pairs of faces, and asked to make same/different judgments. Scores were calculated out of 40, with one point awarded for each correct detection or discrimination.
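The scoring rule just described (one point per correct same/different judgment, out of 40) can be written as a short function. This is a sketch under stated assumptions: the response coding (`"same"` / `"different"`) and the example answer key are hypothetical and do not reproduce the actual GFMT items.

```python
def score_gfmt(responses, answer_key):
    """Count correct same/different judgments; one point per correct item."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(given == correct for given, correct in zip(responses, answer_key))


# Hypothetical 40-item key and one participant's responses (illustration only).
answer_key = ["same", "different"] * 20
responses = list(answer_key)
responses[0] = "different"   # a missed match
responses[1] = "same"        # a false match

print(score_gfmt(responses, answer_key), "out of", len(answer_key))
```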

Older adult participants were additionally screened using the Montreal Cognitive Assessment Tool (MoCA; Nasreddine et al., 2005). This cognitive-screening tool takes little time to administer and assesses potential mild cognitive impairment. Cognitively intact older adults typically score in the range of 26 or above. Therefore, adults scoring 26 or less on this assessment did not participate to completion in the study. This was to ensure that any effects of age on composite construction quality were not masked by the presence of abnormal cognitive decline within our older adult sample. PRO-fit software version 3.5 was used to construct the composites.

### Participants

The two age categories of participants for face construction were selected to be close to those of the younger and older target stimuli (see Materials and Methods). They were mutually exclusive and were in keeping with age categories used within previous OAB research. For example, within their meta-analysis Rhodes and Anastasi (2012) grouped participants aged 18–35 within a single "young" age group. Further, whilst previous research by Wolff et al. (2012) did split their participants into a "young" age group (19–29 years) and a "young middle" age group (30–44 years), they found comparable memory across these two groups for faces ranging from 18 to 44 years, with both age groups showing less accurate memory for older faces relative to young and young middle-aged faces. Similarly, our younger composite constructor group consisted of individuals spanning 19–35 years. Some researchers have also included a wide variety of ages within their older adult samples (e.g., *>*55 years, Rhodes and Anastasi, 2012; 63–92 years, He et al., 2011; 55–89 years, Anastasi and Rhodes, 2005; 64–86 years, Wilcock et al., 2007). However, some have distinguished between old (*>*75 years) and young-older adult participants (55–74), and there is evidence that old compared to young-older adults may perform differently on some memory tasks, including those that involve event recall (e.g., Wright and Holliday, 2007) or recognition of faces of different ages (e.g., Bäckman, 1991). In the present experiment, the older adult sample consisted of individuals aged 51 to 80 years, but included predominantly young-older adults (17 out of 20 participants were aged below 65). Whilst one may be inclined to suggest that the age difference between our younger (19–35) and older (51–80) age categories is small, Wolff et al. (2012) showed that there is indeed a performance difference between these age groups (19–44 vs. *>* 45 years) with regard to OAB.

The younger adult group (*N* = 20) was recruited from the University of Leeds via opportunity sampling. Their age ranged from 19 to 35 years (*M* = 24.6; SD = 4.0; Kurtosis = 1.61; Skew = 1.28). The older adult group (*N* = 20) was recruited through word of mouth (in the Leeds area, North East England). Their age ranged from 51 to 80 years (*M* = 60.7; SD = 7.6; Kurtosis = 1.94; Skew = 1.39).

Participants were recruited on the basis that they were unfamiliar with international football players and managers, spoke English as their first language, and did not have regular recent daily-life contact with people of the other age group. A high level of recent contact with people from the other-age group has previously been associated with a reduction in the OAB in recognition measures (Wiese et al., 2012, 2013a), and may militate against our observing any reduced effectiveness associated with constructing composites of other-age faces. We therefore asked participants whether they had regular contact, such as in an occupational capacity, with people from the other age category within the last 5 years. No participant constructing a composite of a target from the other-age category reported having pronounced job-related or other types of contact with people of the other-age group. All participant-constructors reported having normal or corrected-to-normal vision. Participants were paid £5 for their time.

### Procedure

Participants were tested individually throughout by the researcher. After giving informed consent, the older adult group completed the MoCA at least 30 mins prior to the experiment. Participants next attended a testing session lasting from 45 to 60 mins. They were told that they would create a composite of an unfamiliar target face. Participants then removed a target picture from an envelope (randomly selected by target and by condition, without replacement) and reported whether it was a known identity. If it was familiar, they were asked to select another at random. For the first face that was reported to be unknown, participants were given 60 s to remember it. One person reported that all available targets were familiar and was replaced by another person, to give the sample described in Participants. Following this procedure aimed to ensure that all participants constructed a composite of an unfamiliar face, as would be the norm for real witnesses.

The remaining procedure was self-paced. An Enhanced Cognitive Interview, as described by Fodarella et al. (2016), was administered to allow participants to recall detailed information about the appearance of their target face; this interview was initiated with rapport-building, and was followed by context reinstatement, free recall and cued recall. Constructors were also told that it was acceptable to state if they did not recall specific features during cued recall; this instruction was important as research suggests that older adults have a tendency to guess during recall tasks (Huff et al., 2011). The researcher operated the PRO-fit composite software to allow participants to construct a single composite of the target from memory. The procedure used to construct the composites is fairly detailed, and is described in full in Fodarella et al. (2016); in brief, participants selected individual features to match their description of the face, with each selected feature resized and positioned on the face with the aim of achieving the best likeness possible. Finally, participants completed the 40-item GFMT, and were debriefed.

### **Composite-naming Stage**

### Design

A separate group of participants was invited to name the composites, to give a method of assessing the effectiveness of composites which is similar to their use forensically (e.g., Frowd et al., 2005a). Participants were recruited on the basis of their reported familiarity with both footballers who play within the premiership in the UK and those individuals who manage international and premiership football teams. The design was a 2 (between-participants: constructor age) *×* 2 (within-participants: target age) mixed-factorial design. The 40 composites constructed in the previous stage of the experiment were separated into two equal sets by categorical age of constructor. Composites of two young-male and two older-male unfamiliar targets (so-called "foils") were added to each set; these additional composites were of unfamiliar identities in general (and not of football players or managers) and were included to limit naming by guessing and to increase ecological validity (e.g., Frowd et al., 2016).
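The assembly of the two naming sets described above (20 composites per constructor-age group plus four foils) can be sketched as follows. The record layout and the names `build_naming_sets`, `composites` and `foils` are assumptions made for illustration; they are not taken from the study materials.

```python
def build_naming_sets(composites, foils):
    """Split composites by constructor age and append the foils to each set."""
    naming_sets = {}
    for group in ("younger", "older"):
        own = [c for c in composites if c["constructor_age"] == group]
        naming_sets[group] = own + list(foils)
    return naming_sets


# Hypothetical records: 10 younger and 10 older targets, each constructed once
# by a younger and once by an older participant (40 composites in total).
composites = [
    {"target_age": t_age, "target_no": n, "constructor_age": c_age}
    for t_age in ("younger", "older")
    for n in range(1, 11)
    for c_age in ("younger", "older")
]
# Four foils: two young-male and two older-male unfamiliar identities.
foils = [
    {"target_age": t_age, "target_no": n, "constructor_age": None, "foil": True}
    for t_age in ("younger", "older")
    for n in (1, 2)
]

naming_sets = build_naming_sets(composites, foils)
for group, items in naming_sets.items():
    print(group, len(items))
```

Under these assumptions each naming participant sees 24 items: the 20 composites made by one constructor-age group and the four foils.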

The number of participants required in the naming stage was chosen to be able to detect a small effect size when their data were subject to a regression-type analysis. This was based on a G\*Power analysis (version 3.9.1.2; Faul et al., 2009) with a small effect size (Odds Ratio *OR* = 1.5). Alpha was set at α = 0.05, and power at 1–β = 0.95, with an equal number of participants viewing composites belonging to each constructor age group (Younger vs. Older adults; the between-participants factor). We assumed that at the very least a small amount of variance associated with age of constructor would be explained by the presence of the within-participants factor, age of target, and therefore estimated a squared multiple correlation coefficient of *R*<sup>2</sup> = 0.1 as an additional input parameter. A normal distribution was assumed for each predictor. Based on these relatively conservative parameters, the analysis indicated that about 10 participants per group would be sufficient. We exceeded this lower estimate by recruiting 16 participants in each group (total *N* = 32).

### Materials

The 40 actual composites and the four foil composites were proportionally sized to 15 cm (high) by 10 cm (wide) and printed in greyscale (the image mode of the composite system) on A4 paper. **Figure 1** below shows example items across conditions. The 20 color target photographs from the construction stage were also required.

### Participants

Thirty-two participants (1 female) were recruited via opportunity sampling in a local sports centre in the North East of England. Their ages ranged from 21 to 59 (*M* = 30.8, SD = 10.2) years. Participants were assigned equally to the between-participants factor of constructor age. Each person was paid £2 for their time.

The majority of participants within the sample are male. While it is tempting to suppose that such a gender bias might skew results, previous research suggests that target gender does not strongly influence face recognition (e.g., Shapiro and Penrod, 1986) and, more specifically, gender has not been found to influence composite naming (Frowd et al., 2005b).

### Procedure

Participants were tested individually. They were told that they would be shown a set of 24 composites to name, some of which were of premiership footballers or international or premiership football managers. It was also mentioned that some composites were of unfamiliar identities, to make the task more realistic. Participants were randomly assigned, in equal numbers, to one of the two constructor-age sets, and the relevant set of composites was then presented sequentially for naming. Next, the 20 target photographs were presented likewise for naming. This acted as a familiarity check to ensure participants were familiar with the majority of the target identities. According to an *a priori* rule, participants' data were excluded if fewer than 16 targets were named correctly. This situation occurred on five occasions. Data from these participants were not included in the analysis, and further participants were recruited as replacements in the relevant conditions (to give the sample described above). The naming task was completed in about 15 min per person, including debrief.

#### **TABLE 1 | The advantage of older constructors creating composites of older target faces.**

*Figures are percentage-correct accuracy calculated from responses in parentheses: summed correct responses (numerator) and total (correct and incorrect) responses (denominator). These data are for composites for which participants correctly named the relevant target (N* = *603 out of 640). The model converged with significant predictors for age of target (p < 0.05), interaction (p < 0.05) and the Coefficient [B* = *−2.5, SE(B)* = *0.2, p < 0.001, Exp(B)* = *0.1]. See text for more details. <sup>a</sup>p < 0.01; <sup>b</sup>p < 0.05.*

## **Results**

### **Spontaneous Naming**

Participant responses to composites were checked for missing data (of which no cases were observed) and scored for accuracy: a numeric value of 1 was assigned when the correct name was given and 0 otherwise. Overall, 52 responses were correct out of a possible 640. Responses to target photographs were handled in the same way, and 603 were correct. Target naming was thus considerably higher than composite naming, but this is the usual situation as composites are prone to error and are rarely named perfectly. However, failure to recognize a target does suggest that its corresponding composite could not be recognized either, and so, for each of these cases, the relevant composite was scored as missing data (i.e., not included in the subsequent analysis).

Composite-naming scores were subjected to Logistic Regression for age of target (younger vs. older adult) and age of constructor (younger vs. older adult). A full-factorial model was built and each predictor was subject to sequential removal (for *p >* 0.1) using Backward LR: age of constructor was removed in Step 1 (*p* = 0.61). The resulting model was reliable [*χ*<sup>2</sup>(2) = 9.4, *p* = 0.009, *R*<sup>2</sup> (Cox and Snell) = 0.015, *R*<sup>2</sup> (Nagelkerke) = 0.035] with a good fit (Hosmer and Lemeshow, *p* = 0.88).

Age of target was reliable [regression coefficient *B* = 0.6, SE*(B)* = 0.3, *p* = 0.039], with an advantage for older over younger targets [*Exp(B)* = 1.9]<sup>1</sup>. This predictor was qualified by age of constructor [*B* = 1.4, SE*(B)* = 0.6, *p* = 0.025, *Exp(B)* = 3.9] (**Table 1**) since (using two-tailed Fisher Tests) the advantage of target age was restricted to older constructors (*p <* 0.01, Odds Ratio *OR* = 3.4); also, for older targets, there was an advantage of older over younger constructors (*p <* 0.05, *OR* = 2.2).
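The relation between a logistic-regression coefficient and its *Exp(B)* value can be illustrated with a short sketch. It simply exponentiates the coefficients quoted above; the function name and dictionary labels are illustrative assumptions rather than part of the original analysis. Because the published *B* values are rounded to one decimal place, the results (about 1.8 and 4.1) only approximate the reported *Exp(B)* values of 1.9 and 3.9.

```python
import math

def odds_ratio(b: float) -> float:
    """Convert a logistic-regression coefficient B into an odds ratio, Exp(B)."""
    return math.exp(b)

# Coefficients as printed in the text (rounded to one decimal place).
# The labels are illustrative, not the authors' variable names.
reported_b = {
    "age of target": 0.6,
    "age of target x age of constructor": 1.4,
}

for label, b in reported_b.items():
    print(f"{label}: B = {b:.1f} -> Exp(B) is about {odds_ratio(b):.2f}")
```

An *Exp(B)* above 1 indicates higher odds of correct naming in the coded condition, and a value below 1 indicates lower odds.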

Participant responses to composites were also analyzed for mistaken names given, to provide an indication of willingness to offer any name (i.e., a guess); it is analogous to response *bias* in signal detection paradigms. After discounting correct responses to composites (*N* = 52) and screening out unfamiliar targets (*N* = 37), mean incorrect names were fairly frequent overall (*N* = 244/551, *M* = 44.3%)—a usual situation with composites (e.g., Frowd et al., 2016). Logistic Regression revealed that neither of the predictors (*p*s *>* 0.4) nor their interaction (*p* = 0.9) exerted a reliable influence on this DV.
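The counts behind these percentages can be checked with a few lines of arithmetic. The sketch below uses only figures reported in the text; the variable names are illustrative, and the breakdown of 640 responses as 32 participants each naming 20 target composites is an inference from the design rather than a statement made by the authors.

```python
# Counts taken from the Results text; variable names are illustrative.
participants = 32
target_composites_each = 20            # foils excluded (inferred from the design)
total_responses = participants * target_composites_each   # 640

correct_composite_names = 52
targets_named_correctly = 603
incorrect_names = 244

# Composites whose target photograph was not recognised are treated as missing.
unfamiliar_targets = total_responses - targets_named_correctly

# Responses that could have been a mistaken name.
eligible = total_responses - correct_composite_names - unfamiliar_targets

mistaken_name_rate = 100 * incorrect_names / eligible
print(unfamiliar_targets, eligible, round(mistaken_name_rate, 1))
```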

### **40-item Glasgow Face Matching Task**

A two-tailed *t*-test was run to compare scores on the GFMT across age groups. Previous findings indicate that there are no reliable age differences in task performance (Burton et al., 2010). Our findings replicate this null effect, *t* (38) = 1.1, *p* = 0.28.

As the GFMT is a measure of face-perception ability, it follows that those who are better at perceiving faces should also be better at constructing faces, as the latter should involve face perception. A one-tailed correlation between the correct-naming score for each composite and the relevant participant's GFMT score was not significant, *r* (38) = 0.18, *p* = 0.13.

## **Discussion**

The current study aimed to investigate whether an OAB effect occurs in facial-composite construction. Older and younger adults viewed an own- or cross-age target face, and created a single composite from memory. The resulting composites were then given to further participants to name. Results of correct names given partially supported one of the hypotheses: OAB was found for older constructors, but—against predictions—not for younger adults.

Own-age bias refers to facial-recognition memory being more accurate for faces of our own age group than for those of other age groups (Wright and Stroud, 2002). The literature reveals somewhat inconsistent findings. Some studies indicate that an OAB occurs for all age groups across the life span, that is, for both younger and older adults (e.g., Rhodes and Anastasi, 2012). Other studies find that it occurs only for younger adults, with no effect on older adults (Havard and Memon, 2009). The latter findings can be explained by the contact hypothesis, which predicts that face recognition of other-age faces improves as a function of general contact with other-age faces that is accumulated throughout the life span. The former findings, however, are in line with a recent-contact hypothesis, which indicates instead that it is recent daily-life contact with other-age faces (rather than contact gathered over the life span) that plays a role in mitigating OAB effects in face recognition (Wiese et al., 2012). The current study does not seem to fit with either hypothesis, with correct naming of composites suggesting an OAB for older but not younger adults.

The lack of OAB in younger adults is surprising given that previous research has consistently outlined the OAB effect in young adults (e.g., Wright and Stroud, 2002; Wiese, 2012). One possible explanation may be the fact that PRO-fit composite construction and face recognition rely upon different types of information, that is, featural versus configural information. PRO-fit being a feature system is likely to have led to featural processing, whilst limiting the ability to engage in configural processing during composite construction. In contrast, OAB may plausibly arise due to differences in the application of configural processing for own versus other age faces (e.g., Rossion and Michel, 2011). This may explain why younger adults showed no OAB. As configural processing is not suited to the feature-based PRO-fit task, younger adults did not benefit from having increased expertise and increased sensitivity to configural information for own compared to cross-age faces. Indeed, recent evidence from work using ERP measures suggests younger compared to older adults process holistic information from faces more efficiently (Wiese et al., 2013a). Further, research stresses the importance of transfer-appropriate processing, a match between encoding and retrieval processes, to enable successful task performance (McBride and Abney, 2012). It would seem that face construction using a traditional feature system may not be capitalizing on configural information; in fact, participants who may be less efficient at processing faces holistically, and may therefore rely more on featural information (as in the older adults with older target faces), appear to benefit in this face-perception task. One way to investigate this account further would be to replicate the current research using a holistic composite system such as EvoFIT (e.g., Frowd et al., 2010). This type of system requires constructors to repeatedly select from arrays of complete faces, rather than by selecting individual features, with the aim of maximizing construction of configural cues. It does seem to be the case that this system is a more effective method of accessing memory since mean naming of its composites has been reported to be around 50% correct following a 24 h retention interval (e.g., Frowd et al., 2013, 2016). If the above proposed account is correct, an OAB would be expected to occur for younger adults. This could be due to the holistic system enabling younger adults to use their expertise, leading to an OAB. In contrast, an OAB may now not occur in older adults.

<sup>1</sup>For readers unfamiliar with the *Exp(B)* measure of effect size, it is equivalent to the Odds Ratio (OR), that is, the number of times one condition is more effective than another.

So, our data indicate an OAB in older adults, which is in line with some research showing an OAB for this age group (Rhodes and Anastasi, 2012). As the ability to engage in configural processing declines with age, older adults may therefore engage more in featural processing as a consequence (Murray et al., 2010). This is likely to have been suited to the feature-based PRO-fit task, thereby enabling an OAB for older adults but not younger ones. However, this cannot be the only explanation, as the use of feature processing *per se* would have led to better-quality composites for both own- and cross-age faces in older adults.

Hugenberg et al.'s (2010) *CIM* may aid in explaining the effect further. The CIM proposes that individuating information about a face is encoded for own-group faces, thereby facilitating memory. Taking this into account, it may be that older adults were able to extract feature-based individuating information from own-age faces, which may have aided construction of good-quality own-age composites.

Taking into account age-related memory decline (Havard and Memon, 2009), and the fact that older eyewitness memory recall is less detailed and accurate (Wright and Holliday, 2007), it was hypothesized that older adults may produce less-identifiable composites than younger adults. However, no difference was found, and this is consistent with past research in the composite area (Frowd et al., 2005a). In fact, our data indicate a situation in which older adults actually outperformed younger adults. However, within the current study the older adult sample predominantly consisted of adults aged 51–65 years (17 out of 20 participant-constructors). Therefore, as memory declines steadily with age (Grady and Craik, 2000), it may be that age-related memory decline was not strong enough to be observed within our older-adult sample. Replicating this research with an older sample of a smaller age range (70–80 years) would be beneficial to firm up conclusions.

Our findings also indicated no difference across the two age groups in face perception ability as measured by the 40-item GFMT measure. This suggests that the ability to detect similarities/differences in faces does not decline with age. This is in line with previous research (Burton et al., 2010). However, we expected to find that those scoring high on the GFMT would produce more-identifiable composites, as face construction requires the ability to process faces. No significant positive correlation was found between composite quality and GFMT score. However, future research could also consider incorporating alternative individual difference measures. For example, recognition memory likely plays a role in face construction, and assessing the relationship between measures targeting face recognition memory ability and face construction would aid understanding of the extent to which face construction relies upon an individual's ability to effectively utilize information residing in memory (e.g., memory for configural versus feature information).

With regard to a real-life application, the data suggest a lack of age difference for constructors—that is, older eyewitnesses produce composites to a similar quality to those of younger witnesses. Identification of composites is likely to be better when older adults construct faces of a similar age to themselves, outperforming younger adults. Thereby, composites are likely to be more effective from an elderly witness (cf. younger witness) when the offender is also elderly. This, as we have argued, may differ depending on which composite system is used.

In summary, the current paper is the first to formally investigate whether an OAB occurs during composite construction. Findings indicate that an OAB occurred for older adults only. The mechanism for this OAB in older adults may simply be that these participants are better able to extract individuating feature-based information from targets of their own age, information which is beneficial for face production using the feature-based PRO-fit system.

### **Acknowledgments**

This work was supported in part by an award from the UK Economic and Social Research Council (RES-000-22-4150) to Dr Charity Brown and Dr Charlie Frowd. Datasets associated with the larger project and information concerning additional outputs can be accessed via the RCUK Gateway to Research and UK Data Service. We would like to thank Kate Herold for her assistance in data collection.

### **References**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Fodarella, Brown, Lewis and Frowd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Many faces, one rule: the role of perceptual expertise in infants' sequential rule learning

*Hermann Bulf<sup>1,2\*</sup>, Viola Brenna<sup>1</sup>, Eloisa Valenza<sup>3,4</sup>, Scott P. Johnson<sup>5</sup> and Chiara Turati<sup>1,2</sup>*

*<sup>1</sup> Dipartimento di Psicologia, University of Milano-Bicocca, Milano, Italy, <sup>2</sup> Milan Center of Neuroscience (NeuroMI), Milano, Italy, <sup>3</sup> Dipartimento di Psicologia dello Sviluppo e della Socializzazione, Università degli Studi di Padova, Padova, Italy, <sup>4</sup> Interdepartmental Center for Cognitive Science, Università degli Studi di Padova, Padova, Italy, <sup>5</sup> Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA*

Rule learning is a mechanism that allows infants to recognize and generalize rule-like patterns, such as ABB or ABA. Although infants are better at learning rules from speech vs. non-speech, rule learning can be applied also to frequently experienced visual stimuli, suggesting that perceptual expertise with material to be learned is critical in enhancing rule learning abilities. Yet infants' rule learning has never been investigated using one of the most commonly experienced visual stimulus categories available in infants' environment, i.e., faces. Here, we investigate 7-month-olds' ability to extract rule-like patterns from sequences composed of upright faces and compare their results to those of infants who viewed inverted faces, which presumably are encountered far less frequently than upright faces. Infants were habituated with face triads in either an ABA or ABB condition followed by a test phase with ABA and ABB triads composed of faces that differed from those shown during habituation. When upright faces were used, infants generalized the pattern presented during habituation to include the new face identities shown during testing, but when inverted faces were presented, infants failed to extract the rule. This finding supports the idea that perceptual expertise can modulate 7-month-olds' abilities to detect rule-like patterns.

#### Keywords: rule learning, face, infants, inversion effect, perceptual expertise

## INTRODUCTION

A central question in developmental research concerns how infants learn to detect relations between different elements and to generalize these relations to new elements that may have no surface features in common to those previously encountered. This learning mechanism, known as rule learning, is crucial to extraction of structure from our environment and its consistencies across space and time.

Rule learning was first investigated in the linguistic domain by Marcus et al. (1999). The authors assessed infants' ability to extract rules from a speech sequence, familiarizing 7-month-old infants to sequences of syllables that followed a particular grammar (e.g., *la ta ta, gai mu mu,* which is ABB). Given 2 min of exposure, infants were able to discriminate between novel sequences following the same pattern (e.g., *wo fe fe,* for ABB), and novel sequences following a different pattern (e.g., *wo fe wo,* which is ABA). The test syllables had not been used in training, suggesting that infants can extract a rule, generalize it to novel stimuli that may have no surface features in common with those presented in training, and discriminate it from other, similar patterns. In contrast to their success in learning rules from speech, 7-month-olds failed to learn rules from non-linguistic auditory stimuli, including animal sounds, pure tones, notes of different timbre (Marcus et al., 2007), and chords (Dawson and Gerken, 2009). These findings could be taken to suggest that rule learning is a mechanism specific for language acquisition, innately predisposed to process speech sounds (Marcus et al., 1999, 2007).

#### *Edited by:*

*Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany*

#### *Reviewed by:*

*Rubi Hammer, Northwestern University, USA*

*Markus Krüger, Ernst-Moritz-Arndt Universität Greifswald, Germany*

> *\*Correspondence: Hermann Bulf hermann.bulf@unimib.it*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 22 April 2015 Accepted: 02 October 2015 Published: 21 October 2015*

#### *Citation:*

*Bulf H, Brenna V, Valenza E, Johnson SP and Turati C (2015) Many faces, one rule: the role of perceptual expertise in infants' sequential rule learning. Front. Psychol. 6:1595. doi: 10.3389/fpsyg.2015.01595*

However, a number of studies have cast doubt on this claim. For example, using near-infrared spectroscopy, it has been found that newborns did not exhibit a rule-learning mechanism when they were exposed to linguistic sequences of syllables (Gervain et al., 2008). Newborn babies were able to detect adjacent repetitions such as ABB, but they failed to detect nonadjacent repetitions such as ABA, providing evidence that at birth infants may have a domain-general perceptual "repetition detector," instead of a true ability to extract rule-like patterns. Moreover, 4-month-old infants, but not 7-month-olds, learned rules from non-linguistic auditory stimuli, such as sequences of musical chords or tones, suggesting that 7-month-olds' rule learning would have been tuned to the linguistic domain as a consequence of the different experience with language and music acquired between 4 and 7 months of age (Dawson and Gerken, 2009). These lines of evidence appear to contradict the account that rule learning is an innate mechanism specific for speech processing.

The claim that rule learning is not specific to language acquisition is supported by recent studies that have investigated infants' ability to detect and generalize rules from visual stimuli. Saffran et al. (2007) have demonstrated that 7-month-old infants can learn sequential rules from visual stimuli that they can readily represent and categorize, such as images of dogs or cats. The authors argued that, instead of having evolved to subserve language learning, rule learning can be considered as a more general mechanism that is modulated by the familiarity and the categorizability of the stimuli to be learned: familiar stimuli, no matter whether they belong to linguistic or visual domains, enhance infants' ability to detect and generalize rules. This idea is supported by a recent study that has investigated 8- and 11-month-olds' rule learning abilities in the presence of unfamiliar visual shapes (Johnson et al., 2009). Indeed, when visual stimuli are unfamiliar, infants' rule learning abilities are weaker than with familiar stimuli. For instance, 7.5-month-old infants' ability to extract rules from a sequence of communicative but unfamiliar sign language-like gestures is limited to some patterns (i.e., ABB vs ABA, but not the reverse) (Rabagliati et al., 2012). Also, 5-month-old infants learn rules that are jointly instantiated in shapes and syllables, but not rules from shapes alone (Frank et al., 2009). Overall, these findings suggest that rule learning can be applied also to visual stimuli, and it is facilitated when the information presented is highly familiar to infants.

Yet, quite surprisingly, to our knowledge, infants' ability to detect rules has never been investigated using a salient stimulus category for which infants commonly accumulate extensive visual experience, such as faces. Faces are the stimuli that we likely encounter most often in the visual environment from early in life, being an important medium for the child's cognitive, emotional, and social development. Indeed, newborns preferentially attend to faces (Valenza et al., 1996; Macchi Cassia et al., 2004) and show surprisingly refined face discrimination capacities (e.g., Pascalis and de Schonen, 1994; Turati et al., 2006, 2008).

Faces convey a great amount of visual information, both transitional (i.e., emotions, gaze direction) and stable (i.e., species, race, gender), and face recognition is spontaneously, efficiently, and routinely performed by humans. This specialization in face processing is explained as a result of the perceptual expertise with this category of stimuli acquired through development (e.g., Diamond and Carey, 1986; Tarr and Gauthier, 2000; Gauthier and Nelson, 2001). Furthermore, there is mounting evidence that perceptual experience has a critical role in building face representation even in the first months after birth. For example, during the first year after birth, infants' face-processing skills tune around faces of the most experienced species (Pascalis et al., 2002; Di Giorgio et al., 2012), race (Kelly et al., 2007, 2009), and age (Macchi Cassia et al., 2014), providing evidence that the amount of early environmental exposure to different face types shapes infants' face processing abilities. Although a long developmental time course is required in order to achieve the adult level of expertise at discriminating individual faces, key aspects of adult face recognition (e.g., sensitivity to configural cues) develop in the first years after birth and, in many cases, in infancy (McKone et al., 2012). One traditional example of configural processing is the *inversion effect*, which refers to the disproportionate drop in performance for face recognition relative to object recognition due to stimulus inversion (Yin, 1969). Recent evidence has shown that infants process faces differently according to whether the stimuli are presented upright or inverted, providing evidence for the presence of an inversion effect during early infancy (e.g., Turati et al., 2004). Turati et al. (2004) showed that, under some experimental conditions, 4-month-olds' face recognition abilities are limited by stimulus orientation. 
This orientation difference in infants' face processing was confirmed by an eye movement study showing qualitative differences in the way that 4-month-olds explored upright and inverted faces (Gallay et al., 2006). Overall, these studies provide evidence for a crucial role of perceptual expertise in shaping face-processing skills during the first months after birth, rendering the use of faces particularly suitable in investigations of whether and how rule learning might be affected by stimulus familiarity at early stages of development.

The aim of the present study was to investigate 7-month-olds' ability to recognize and generalize rule-like patterns when constituent elements of the patterns are faces. Moreover, we examined the role of perceptual experience in 7-month-old infants' ability to detect rule-like patterns by presenting infants with sequences of images of upright and inverted faces. The images of inverted faces were identical to the images of upright faces except for the orientation. To our knowledge this is the first time that infants' rule learning abilities have been investigated by directly comparing a frequently experienced category of visual stimuli with an infrequently experienced category.

Using an infant-controlled visual habituation paradigm, 7-month-olds were habituated to triads of faces in an ABA condition (i.e., a face A was followed by a different face B, that was in turn followed by the face A), or in an ABB condition. In the test phase, ABA and ABB triads, composed of faces that differed from those shown during habituation, were presented. If infants learned the rule inherent in the face triplets presented during habituation and generalized it to the new face identities, they should look longer at the novel rule as compared to the familiar one during the test phase. Infants were presented with an upright condition, in which face triplets were composed of upright faces, or with an upside-down condition, in which face triplets were composed of inverted (upside-down) faces. We expected that upright faces, in contrast to inverted faces, would be treated as familiar stimuli by 7-month-old infants, leading to an advantage in the rule-learning task compared to inverted faces.

### MATERIALS AND METHODS

### Participants

Seventy-one 7-month-old infants (35 females, *M*<sub>age</sub> = 225 days; range = 209–244 days) were included in the final analyses. All participants were healthy and full-term, and they were all Caucasian. Twenty-three additional infants were excluded from the final sample because of fussiness (*N* = 19), preterm birth (*N* = 2), or medical problems in the first months (*N* = 2). Participants were recruited via a written invitation that was sent to parents based on birth records provided by neighboring cities. Infants were randomly assigned to the upright condition (*N* = 35) or to the inverted condition (*N* = 36), in which face sequences were composed of upright and inverted faces, respectively. The procedure was approved by the University Ethical Committee. Parents gave written informed consent for their infants' participation.

### Stimuli

Upright faces were color photographs of 12 female adult faces of Caucasian origin, all displaying a full-front neutral expression with open eyes. The images were taken from the Radboud Faces Database (Langner et al., 2010). Using the software Adobe Photoshop, face images were cropped maintaining some external features like ears and hair, and pasted on a gray background (**Figure 1**). When viewed from approximately 60 cm, adult faces measured 19° in height and 14° in width. Inverted faces were the same 12 female faces rotated by 180°.
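For readers who wish to relate these visual angles to on-screen sizes, the standard relation size = 2 × distance × tan(angle / 2) can be sketched as follows. The function name is illustrative and the physical stimulus sizes are not reported in the text, so the output (roughly 20 cm by 15 cm) is an estimate derived from the stated angles and the approximate 60 cm viewing distance.

```python
import math

def size_from_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) of a stimulus subtending angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

VIEWING_DISTANCE_CM = 60  # approximate distance reported in the text

height_cm = size_from_visual_angle(19, VIEWING_DISTANCE_CM)
width_cm = size_from_visual_angle(14, VIEWING_DISTANCE_CM)
print(f"height: {height_cm:.1f} cm, width: {width_cm:.1f} cm")
```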

### Apparatus

All infants were tested in a dedicated cabin, while seated in an infant-seat or on the parent's lap and positioned at a distance of approximately 60 cm from a 61-cm computer screen. The whole experiment was recorded through a video-camera, hidden over the screen, which fed into a TV monitor and a digital video recorder, both located outside the testing cabin. The TV monitor displayed the live image of the infant's face to allow the online coding of the infant's looking times through the E-Prime program by the experimenter, who was outside the testing cabin and blind to the condition to which the infant was assigned. The image of the infant's face was also recorded via a Mini-DV digital recorder for a frame-by-frame offline coding of looking times during test trials.

### Procedure

We adopted the general procedure used by Saffran et al. (2007). Eight different face identities were used to create the habituation triads, and four different face identities were used to create the test triads. For the habituation sequences, four face identities were assigned to the A group and four to the B group. The A and B images were randomly combined by the software to generate 16 different ABA triads (i.e., a face A was followed by a different face B that was in turn followed by the face A) and 16 different ABB triads. For the test sequences, triads were made up of four novel face identities, two assigned to the group A and two assigned to the group B.
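The construction of the triads can be sketched as below. This is a minimal illustration, not the authors' software: the face labels are placeholders, and pairing every A face with every B face is an assumption that happens to reproduce the 16 ABA and 16 ABB habituation triads described above.

```python
from itertools import product

def build_triads(a_faces, b_faces, pattern):
    """Pair every A face with every B face and arrange them as ABA or ABB."""
    if pattern not in ("ABA", "ABB"):
        raise ValueError("pattern must be 'ABA' or 'ABB'")
    triads = []
    for a, b in product(a_faces, b_faces):
        third = a if pattern == "ABA" else b
        triads.append((a, b, third))
    return triads

# Placeholder identity labels: four A faces and four B faces for habituation.
habituation_a = ["A1", "A2", "A3", "A4"]
habituation_b = ["B1", "B2", "B3", "B4"]

aba_triads = build_triads(habituation_a, habituation_b, "ABA")
abb_triads = build_triads(habituation_a, habituation_b, "ABB")
print(len(aba_triads), len(abb_triads))
print(aba_triads[0], abb_triads[0])
```

The same function applied to the two A and two B test identities would give four triads per pattern, although the text does not state how many distinct test triads were shown.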

A left-to-right sequential and simultaneous presentation of the face images within each triad was used. The first picture was displayed alone for 330 ms, toward the left edge of the screen. The second picture was then added in the middle of the monitor, to the right of the first picture; this two-face display was presented for 330 ms. Then the third picture was added, to the right of the second picture, and the full triad was displayed for 830 ms, for a total of approximately 1.5 s for each triad. A blank screen (500 ms) separated the triad presentations on each trial. In each condition (upright and inverted), half of the infants were randomly assigned to the ABB habituation condition, the other half to the ABA habituation condition (**Figure 1**).
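The timing of a single triad can be laid out as a simple schedule. The sketch below uses only the durations given in the text; the constant and function names are illustrative. It shows that the three display steps sum to 1490 ms (the roughly 1.5 s reported) and that, with the blank screen, a new triad begins every 1990 ms.

```python
# Durations taken from the Procedure text (milliseconds).
ONE_FACE_MS = 330      # first face shown alone
TWO_FACES_MS = 330     # first and second faces shown together
FULL_TRIAD_MS = 830    # all three faces shown together
BLANK_MS = 500         # blank screen between triads

def triad_schedule(start_ms=0):
    """Return ([(onset_ms, faces_visible), ...], onset_of_next_triad_ms)."""
    steps = ((1, ONE_FACE_MS), (2, TWO_FACES_MS),
             (3, FULL_TRIAD_MS), (0, BLANK_MS))
    events = []
    t = start_ms
    for faces_visible, duration in steps:
        events.append((t, faces_visible))
        t += duration
    return events, t

events, next_onset = triad_schedule()
triad_ms = ONE_FACE_MS + TWO_FACES_MS + FULL_TRIAD_MS
print(events)
print(triad_ms, next_onset)
```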

An infant-controlled habituation procedure was used. Testing began with a central animated cartoon image paired with a sound to catch the infant's attention. As soon as the infant fixated the screen, the experimenter turned off the cartoon and activated a trial, so that the habituation phase began. Each trial consisted of triads of faces, presented in a random order, organized in either the ABB or ABA pattern. The experimenter recorded the infant's fixations by holding down the mouse button whenever the infant fixated on the stimulus. If the infant looked away from the stimulus for more than 2 s, the trial ended and a cartoon animation reappeared on the screen to re-attract the infant's attention before a new trial was presented. The habituation phase ended when the sum of the infant's looking times on three consecutive trials was equal to or less than 50% of the total looking time from the infant's first three trials (Slater et al., 1985). When this habituation criterion was reached, the stimulus was automatically turned off and a new cartoon animation image was turned on. As soon as the infant's gaze was realigned to the animation, the test phase began. All infants received the same set of six test trials in which ABA and ABB triads, composed of faces that differed from those shown during habituation, were presented alternately, three times each. The order of presentation (i.e., novel or familiar first) was counterbalanced among infants.
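The habituation criterion described above can be expressed as a short function. This is a sketch under stated assumptions, not the E-Prime routine used in the study: the function name and example looking times are hypothetical, and it assumes a sliding window of three consecutive trials that does not overlap the three baseline trials.

```python
def habituation_trial(looking_times, window=3, ratio=0.5):
    """Return the 1-based trial number on which the criterion is first met.

    The criterion is met when the summed looking time over `window`
    consecutive trials is at most `ratio` times the summed looking time
    over the first `window` trials. Returns None if it is never met.
    Assumption: the criterion window does not overlap the baseline trials.
    """
    if len(looking_times) < 2 * window:
        return None
    baseline = sum(looking_times[:window])
    for end in range(2 * window, len(looking_times) + 1):
        recent = sum(looking_times[end - window:end])
        if recent <= ratio * baseline:
            return end
    return None

# Hypothetical looking times (s) for one infant, for illustration only.
example = [20, 18, 16, 12, 9, 8, 7]
print(habituation_trial(example))
```

In this hypothetical example the baseline is 54 s, so the threshold is 27 s. Trials 4 to 6 sum to 29 s and do not meet it, whereas trials 5 to 7 sum to 24 s, so habituation is reached on trial 7.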

Mean looking times (s) toward the novel or familiar pattern were the dependent variable. About one third of the infants (*N* = 20) were coded offline by a second independent observer who was blind to the experimental conditions. Interobserver agreement (Pearson correlation) between the two observers (i.e., the one who coded the data online and the one who coded from digital recording), as computed on total fixation times during test trials, was *r* = 0.97.
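As an illustration of how interobserver agreement of this kind can be computed, the sketch below implements the Pearson correlation between two coders' total fixation times. The function name `pearson_r` and the data are hypothetical and invented for illustration only; they are not the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Invented total fixation times (s) for five infants, coded online and offline.
online_coder = [41.2, 55.0, 38.7, 62.3, 47.9]
offline_coder = [40.5, 56.1, 39.0, 60.8, 48.4]
print(f"r = {pearson_r(online_coder, offline_coder):.2f}")
```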

### RESULTS

A repeated measures ANOVA was performed on looking times toward test stimuli, with Presentation (First, Second, Third) and Novelty (New, Familiar) as within-subjects factors and Orientation (Upright, Inverted) and Habituation sequence (ABA, ABB) as between-subjects factors. The analysis revealed a main effect of Presentation, *F*(2,134) = 6.95, *p* = 0.010, η<sup>2</sup><sub>p</sub> = 0.09, and an interaction between Novelty and Orientation, *F*(2,134) = 6.951, *p* = 0.010, η<sup>2</sup><sub>p</sub> = 0.09. As for the main effect of Presentation, infants' looking times were greater in the first (*M* = 10.28 s, *SD* = 7.8) than in the second presentation (*M* = 8.17 s, *SD* = 5.4) of the test trials, *t*(70) = 2.27, *p* = 0.027, Cohen's *d* = 0.31, and in the first than in the third presentation (*M* = 7.60 s, *SD* = 4.6) of the test trials, *t*(70) = 3.048, *p* = 0.003, Cohen's *d* = 0.41. This result might indicate a weariness effect in the last test trials. As for the Novelty × Orientation interaction, when infants were habituated to upright faces, in both habituation conditions (ABB and ABA) they looked more toward the novel (*M* = 11 s, *SD* = 6.7) than to the familiar sequence (*M* = 8.7 s, *SD* = 5.6), *t*(34) = 2.69, *p* = 0.011, Cohen's *d* = 0.37 (**Figure 2**). When infants were habituated to inverted faces, in contrast, looking times did not differ between the novel (*M* = 7.1 s, *SD* = 3.5) and the familiar sequence (*M* = 7.9 s, *SD* = 3.6), *t*(35) = 0.99, *p* > 0.3. The Habituation sequence × Novelty interaction was not statistically significant, *F* < 1, *p* > 0.6; there was no reliable difference in novelty preference between infants habituated to the ABB and the ABA sequences.
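The reported effect sizes can be approximately recovered from the reported means and standard deviations. The sketch below assumes that Cohen's *d* was computed as the mean difference divided by the root mean square of the two standard deviations; this is an inference from the published numbers, not a formula stated in the text, and the function name `cohens_d` is hypothetical.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Mean difference divided by the root mean square of the two SDs.

    Assumed formula; the text does not say how d was computed.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (mean_a, sd_a, mean_b, sd_b) as reported in the Results section.
comparisons = {
    "first vs. second presentation": (10.28, 7.8, 8.17, 5.4),
    "first vs. third presentation": (10.28, 7.8, 7.60, 4.6),
    "novel vs. familiar (upright)": (11.0, 6.7, 8.7, 5.6),
}

for label, values in comparisons.items():
    print(f"{label}: d = {cohens_d(*values):.2f}")
```

Under this assumption the three comparisons give approximately 0.31, 0.42, and 0.37. The first and third match the reported values; the second differs from the reported 0.41 by about 0.01, which may reflect rounding of the published means and standard deviations.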

FIGURE 2 | … were upright faces but not in the case of inverted faces. Error bars represent standard error of the means. ∗*p* < 0.05.

Results indicate that when infants were habituated to a sequence of upright faces that contained a rule-like pattern (ABB or ABA), they looked longer at the novel rule as compared to the familiar one during the test phase, providing evidence that they were able to extract the rule from the habituation sequence and to generalize it to the new face identities presented during the test phase. Conversely, when infants were habituated with triplets of inverted faces, they did not discriminate the familiar rule from the novel one in the test phase, suggesting that infants were not able to detect and to generalize the rule-like pattern when inverted faces were presented.

### GENERAL DISCUSSION

Rule learning is a mechanism that allows infants to detect and generalize rule-like patterns. While it was first proposed that the ability to extract rules from a sequence of elements might be specific to the linguistic domain (Marcus et al., 1999, 2007), it has more recently been pointed out that rule learning is not exclusive to language (e.g., Dawson and Gerken, 2009; Rabagliati et al., 2012). One of the factors that seems to modulate infants' rule learning abilities is familiarity with the stimuli, which may facilitate the types of comparison necessary to extract a rule (Saffran et al., 2007). On this account, rule learning is preferentially evoked by speech because speech is a highly salient and frequently experienced stimulus for infants.

In the present study we further investigated the role of perceptual expertise in 7-month-old infants' rule learning abilities using sequences of faces, a visual stimulus category that is pervasive in the infant's environment from birth. After being habituated to sequences of upright faces that contained a rule, infants were able to discriminate and generalize the rule to new face identities. In contrast, when inverted faces were used as elements of the sequences, infants were not able to detect the rule, as revealed by the lack of discrimination between the sequences that contained the familiar rule and the novel rule during the test phase. Face inversion might have disrupted infants' efficacy in processing face information and, in turn, the advantage in extracting the rule found for upright faces. In line with Saffran et al. (2007), this finding confirms that 7-month-olds' sequential rule learning is affected by perceptual expertise with the material to be learned: experience with upright faces might have enhanced infants' ability to detect and generalize the rule-like patterns by highlighting similarities between sequences that aid regularity learning.

This outcome is in line with previous evidence of 7-month-old infants' limited rule learning capacity when unfamiliar visual stimuli such as geometric shapes were presented (Johnson et al., 2009). In contrast to this previous study with geometric shapes, however, in which infants were able to learn an ABB rule but not an ABA rule, our data with upside-down faces provide evidence that infants were not able to detect either the ABB or the ABA pattern, suggesting that infants' rule learning from inverted faces is more fragile than infants' rule learning from geometric shapes. This difference between inverted faces and geometric shapes could be due to the higher perceptual complexity of the inverted faces, making this type of stimulus harder to process as compared to the simple geometric shapes used by Johnson et al. (2009). The same difficulties in extracting rules have been found when 7–8-month-olds were presented with unfamiliar sign language-like gestures (Rabagliati et al., 2012), or unfamiliar and sequential auditory stimuli, such as animal sounds, pure tones, notes of different timbre (Marcus et al., 2007), and chords (Dawson and Gerken, 2009).

It is worth noting that our data do not allow us to identify which processes underlie infants' ability to extract rules from upright-face sequences, and infants' failure to extract rules from inverted-face sequences. Further research is needed to understand which level of processing, featural or configural, is involved when infants extract a sequential rule from a face sequence, this factor being crucial in affecting visual category learning (Hammer, 2015). In addition, our study focuses on infants' ability to extract a rule from different individuals within a single frequently experienced category, leaving unresolved whether infants can extract rule-like patterns from different broad categories to which they have been exposed. The comparison between faces and non-face objects is critical for this purpose, as it has been proposed that face-processing specialization would be the result of general processes devoted to the highly expert identification of within-category exemplars from any object class (e.g., Gauthier and Logothetis, 2000; Tarr and Gauthier, 2000).

Overall, the present study suggests that perceptual experience is crucial in enhancing rule-learning abilities in 7-month-old infants, supporting the idea that rule learning might be a domain-general mechanism, instead of a mechanism specific to language acquisition. This claim seems to be confirmed by evidence that newborns are not able to detect rules from a stream of linguistic elements (but possess a general ability to detect perceptual repetitions, Gervain et al., 2008), as well as by evidence that 4-month-olds can learn rules from sequences of non-linguistic auditory elements, such as tones or chords (Dawson and Gerken, 2009). It has been claimed that domain-general cognitive biases and previous learning must be considered as potential sources of constraints on subsequent rule learning abilities (Dawson and Gerken, 2009).

We propose that rule learning abilities might be an emerging property of early biases, such as newborns' ability to detect perceptual repetitions and newborns' sensitivity to the statistical structure of a sequence of elements (Bulf et al., 2011). The early sensitivity to statistical information might provide a foundation for the acquisition of more complex relations, perhaps by directing infants' attention toward potential patterns on the basis of proximity in space and time (Johnson et al., 2009). With development, rule learning might then be tuned to those stimuli to which infants are most frequently exposed in their environment, such as speech and faces, providing an advantage in extracting rule-like patterns from these categories of stimuli as compared to those categories of stimuli with which infants have less experience, such as tones or chords. Notably, faces and speech are closely related in infants' environment: speech sounds come from speaking faces, providing infants with multimodal synchronous stimulation. Therefore, it is also possible to speculate that not only may faces and speech *per se* facilitate infants' rule learning abilities, but these two stimulus categories may also support each other in accounting for their rule learning advantage. This hypothesis is consistent with recent Bayesian proposals of cognitive development (e.g., Gopnik and Tenenbaum, 2007), for which a core feature is what the child brings to the learning task (Newcombe, 2011). For example, infants possess a rich set of learning mechanisms supporting pattern identification, including rule learning, and we have shown that such mechanisms are constrained by stimulus familiarity. Future research should explore which characteristics of the stimuli make a rule easy or hard to learn, as well as whether and how infants' rule learning abilities develop in early infancy.

### ACKNOWLEDGMENTS

We are deeply indebted to the infants who took part in the study and to their parents. We also thank Carlo Toneatto for programming the experiments. This research was supported by an ERC Starting Grant ODMIR No. 241176 (PI: CT), and by NIH grants R01-HD73535 and R01-HD082844 (PI: SJ).

### REFERENCES

Marcus, G., Fernandes, K., and Johnson, S. (2007). Infant rule learning facilitated by speech. *Psychol. Sci.* 18, 387–391. doi: 10.1111/j.1467-9280.2007.01910.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Bulf, Brenna, Valenza, Johnson and Turati. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neural systems and hormones mediating attraction to infant and child faces

Lizhu Luo, Xiaole Ma, Xiaoxiao Zheng, Weihua Zhao, Lei Xu, Benjamin Becker and Keith M. Kendrick \*

Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, China

#### Edited by:

Andrea Hildebrandt, Ernst-Moritz-Arndt-Universität Greifswald, Germany

#### Reviewed by:

Marta Borgi, Istituto Superiore di Sanità, Italy; Jennifer Streiffer Mascaro, Emory University, USA

#### \*Correspondence:

Keith M. Kendrick, Key Laboratory for NeuroInformation of Ministry of Education, Center for Information in BioMedicine, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave., West Hi-Tech Zone, Chengdu, Sichuan 611731, China; k.kendrick.uestc@gmail.com

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

Received: 15 April 2015; Accepted: 28 June 2015; Published: 17 July 2015

#### Citation:

Luo L, Ma X, Zheng X, Zhao W, Xu L, Becker B and Kendrick KM (2015) Neural systems and hormones mediating attraction to infant and child faces. Front. Psychol. 6:970. doi: 10.3389/fpsyg.2015.00970

We find infant faces highly attractive as a result of specific features which Konrad Lorenz termed "Kindchenschema" or "baby schema," and this is considered to be an important adaptive trait for promoting protective and caregiving behaviors in adults, thereby increasing the chances of infant survival. This review first examines the behavioral support for this effect and physical and behavioral factors which can influence it. It then provides details of the increasing number of neuroimaging and electrophysiological studies investigating the neural circuitry underlying this baby schema effect in parents and non-parents of both sexes. Next it considers potential hormonal contributions to the baby schema effect in both sexes and the neural effects associated with reduced responses to infant cues in post-partum depression, anxiety and drug taking. Overall the findings reviewed reveal a very extensive neural circuitry involved in our perception of cuteness in infant faces, with enhanced activation compared to adult faces being found in brain regions involved in face perception, attention, emotion, empathy, memory, reward and attachment, theory of mind and also control of motor responses. Both mothers and fathers also show evidence for enhanced responses in these same neural systems when viewing their own as opposed to another child. Furthermore, responses to infant cues in many of these neural systems are reduced in mothers who have post-partum depression or anxiety or who have taken addictive drugs throughout pregnancy. In general, reproductively active women tend to rate infant faces as cuter than men do, which may reflect both heightened attention to relevant cues and a stronger activation in their brain reward circuitry. Perception of infant cuteness may also be influenced by reproductive hormones, with the hypothalamic neuropeptide oxytocin being most strongly associated to date with increased attention and attraction to infant cues in both sexes.

Keywords: baby schema, infant face, neural circuitry, parental behavior, hormones

### Introduction

The faces of both infants and young children are potent cross-cultural emotive stimuli that adults find both very cute and highly likeable and that evoke feelings of protectiveness and care, which thereby serve to aid survival of these vulnerable individuals (Brosch et al., 2007; Luo et al., 2011, 2014; Proverbio et al., 2011a; Borgi et al., 2014). Konrad Lorenz (1943) defined this as the so-called "baby schema" or "Kindchenschema" effect.

The baby schema is considered to be a set of prominent infantile facial physical features, including large head, round face, high and protruding forehead, big eyes, small nose, small mouth, etc., that evoke rapid cognitive, affective and behavioral responses in adults. This may serve as an innate releasing mechanism, a fundamental social instinct which serves to initiate and maintain a parent/carer-infant relationship, particularly during the period of early development when a child is unable to care for itself and is therefore highly vulnerable (Parsons et al., 2010). As such, this mechanism helps infants develop secure and cooperative relationships, improves their adaptation to society and thereby enhances offspring/species survival (Darwin, 1872; Parsons et al., 2010). A number of studies have also reported correlations between perceived cuteness of infant faces and health (Yamamoto et al., 2009), including across cultures (Volk and Quinsey, 2002; Volk, 2009; Golle et al., 2015). Since adult facial attractiveness has often been associated with having "good genes" (Weeden and Sabini, 2005; Rhodes, 2006), it is possible that prominent baby schema may also signal that infants are genetically healthy. Thus, as has been proposed by Golle et al. (2015), cute babies, by being perceived as healthier, may promote greater protective and nurturing responses in carers, thereby serving to strengthen a community gene pool.

It is well established that viewing infant and child faces produces positive effects in adult observers both within and across different cultures in terms of automatically and rapidly capturing their attention (Brosch et al., 2007, 2008; Proverbio et al., 2011a), evoking smiling (Schleidt et al., 1980), protective reactions (Alley, 1983), close approach and exaggerated greeting responses (Eibl-Eibesfeldt, 1989). Studies demonstrating increased attentional allocation to infant faces have used a wide variety of paradigms, including a "key-press," or "wanting" task (as in Parsons et al., 2014), an attentional capture task (as in Thompson-Booth et al., 2014), eye-tracking (as in Borgi et al., 2014), and the dot-probe task (Brosch et al., 2007).

Adult Caucasian observers presented with a choice between a cute and less cute infant face exhibit a preference for giving a toy to, or adopting, the cute infant irrespective of whether the infant is Caucasian or African, or even a dog puppy (Golle et al., 2015). A recent study has also shown that faces with baby schema not only evoke positive emotions and caring behaviors in adults but can even evoke enhanced ratings of cuteness in young children aged 3–6 years old (Borgi et al., 2014). Overall, children whose faces display strong baby schema features are perceived as being cuter, friendlier, healthier, more attractive, trustworthy, and adoptable (Karraker and Stern, 1990; Ritter et al., 1991; Chin et al., 2006; Glocker et al., 2009a; Little, 2012; Golle et al., 2015). A recent study using an infrared thermography technique revealed that in both Italian and Japanese subjects facial temperature, a physiological index of arousal, was significantly increased during perception of infant faces from both in-group and out-group cultures, whereas with adult faces this was only the case for those from in-group members (Esposito et al., 2014). Finally, in terms of a potential direct influence of baby schema on parental behavior, a study has reported that mothers with cute infants showed greater affection and playfulness toward them than did mothers with infants who were less cute (Langlois et al., 1995).

While it was originally thought that the effects of baby schema on perceivers were limited to young infants of <1 year of age, it is now clear that they can also extend to faces of children of up to 4.5 years of age (Luo et al., 2011). Indeed, even adult faces with "babyish," immature features are considered more attractive, lovable, warm, submissive, physically weak and naive (McArthur and Apatow, 1984; Berry and McArthur, 1986). Imagined interactions with individuals exhibiting these babyish features are also associated with an increased feeling of social belonging in observers (Sacco et al., 2014). Thus findings suggest a generalization of attractiveness of infantile face features across both children and adults (Zebrowitz et al., 2009). The baby schema effect also appears to extend to our perception of cuteness in the young of other species (Golle et al., 2013, 2015; Lehmann et al., 2013; Borgi et al., 2014), and exhibits high-level perceptual "after effects" similar to observations for a number of key features in adult faces (i.e., prolonged perception of cute infant faces lowers attraction ratings given to less cute ones presented subsequently and vice versa; Golle et al., 2013).

The baby schema effect is not just dependent upon the presence of the relevant salient physical facial features (Glocker et al., 2009b; Komori and Nittono, 2013) and can, for example, be weakened by the presence of some form of facial disfigurement (Baeken et al., 2010a). A study has also investigated the influence of temperament on attractiveness of neutral expression infant faces by pairing them with happy or sad facial expressions and equivalent vocalizations. Infant faces paired with mostly happy faces and vocalizations are perceived as even cuter, and adult observers are prepared to exert greater effort to view them. On the other hand, observers of infant faces paired with mostly sad face expressions and vocalizations do not rate them as cuter and are less prepared to make an effort to view them (Parsons et al., 2014). Thus an infant's temperament can enhance or decrease their perceived attractiveness.

The baby schema effect can also be influenced by an observer's social experience. Adults raised together with siblings like infant and child faces more than those who were not, and the smaller the mean age difference between them and their siblings, the greater this effect is (Luo et al., 2014). Similarly, Caucasian children with older siblings had a higher accuracy in recognizing unfamiliar Caucasian child faces relative to Asian ones, while those without older siblings did not show any such recognition difference between the faces (Macchi Cassia et al., 2014). Parental experience is also influential, since mothers in an attention capture paradigm showed longer reaction times to infant vs. adult faces, suggesting that they were more attentive toward infant faces. Reaction times were also negatively correlated with self-reported parental distress in mothers (Thompson-Booth et al., 2014). However, another study did not find significant reaction time differences between mothers and non-mothers in judging infant face expressions (Nishitani et al., 2011).

Personality of the observer can also influence the effects of baby schema. Individuals with higher levels of trait empathy, interpersonal closeness and needing to belong rate infant faces more positively, while personality attributes such as narcissism and insecure attachment have no influence (Lehmann et al., 2013). Finally, the degree of resemblance of a child's face to that of the observer has also been reported to increase its perceived attractiveness in both men and women (DeBruine, 2004).

In recent years there has been increasing interest in establishing the neural (Nitschke et al., 2004; Glocker et al., 2009b; Malak et al., 2015) and hormonal influences (Bhandari et al., 2014; Hahn et al., 2014) underlying attraction to baby schema in infant and child faces. There is also an increasing focus on understanding changes in the early post-partum period which are crucial for the formation of mother-infant bonds and factors which may contribute to the development of postpartum depression. For example, studies suggest that an appraisal bias might underlie some of the difficulties mothers with postpartum depression have in responding to cues from their own infant's signals, since they are more likely to rate less cute infant faces negatively (Stein et al., 2010), and rate even average ones more negatively than other mothers (Gil et al., 2011). Substance abuse has also been associated with altered behavioral and neural responses to infant cues (Landi et al., 2011). In the present review we will therefore summarize and discuss the various neural and hormonal influences on infant and child facial processing and attraction established in healthy observers and also clinical research findings in disorders where there is some impairment involved.

### Neural Responses to Infant Faces: Baby Schema

As would be expected, emotive and salient social stimuli such as infant faces provoke widespread activation in brain systems involved in face perception, attention, emotion, empathy, memory, reward and attachment, theory of mind and also control of motor responses (see **Table 1**). The question of whether there is something special about brain processing of infant faces is therefore difficult to address given the involvement of so many different systems. Very few studies have investigated the influence of baby schema per se; most have simply investigated neural responses to infant or child faces either alone or in comparison to adult human faces or young and adult faces from other species (Barrett et al., 2012; Caria et al., 2012; Stoeckel et al., 2014). The experimental protocols used in these studies are summarized in **Table 1**, where it can be seen that the majority use a simple face viewing paradigm, although some have also used a one-back working memory task with faces, an oddball task design, face expression judgments or an affect rating task. Only Glocker and her colleagues have objectively quantified and parametrically manipulated the impact of baby schema content on patterns of brain activations (Glocker et al., 2009a,b). Thus, conclusions drawn from the present review are primarily based on the majority of findings reporting differences in neural responses between viewing infant as opposed to adult faces, or between viewing the faces of own as opposed to other infants.

In general neuroimaging studies, primarily using functional magnetic resonance imaging (fMRI), have shown that there is a degree of overlap in neural processing of infant and adult faces since both activate primary visual processing areas as well as those more specifically associated with face perception, such as the fusiform face area (FFA) (for abbreviations of brain regions see **Table 2**) (Kringelbach et al., 2008; Glocker et al., 2009b; Baeken et al., 2010a; Stoeckel et al., 2014). However, infant faces generally elicit more rapid responses and greater activity changes in these brain areas and additionally recruit other regions (Parsons et al., 2010; Caria et al., 2012; Hahn et al., 2014). In regions showing activation in response to both adult and infant faces stronger responses to infant/child faces have been reported in the fusiform gyrus [FFG—Brodmann area (BA) 41/37/19] (Leibenluft et al., 2004; Kringelbach et al., 2008; Caria et al., 2012; Stoeckel et al., 2014), middle occipital gyrus (MOG—BA 19/37) (Ranote et al., 2004; Caria et al., 2012), medial temporal gyrus (MTG—BA 21/37/39) (Leibenluft et al., 2004; Ranote et al., 2004; Caria et al., 2012) and superior temporal gyrus (STG—BA 38) (Ranote et al., 2004; Stoeckel et al., 2014) (**Figure 1A**). Of these regions, the right FFG in particular is of key importance to face processing (see Leopold and Rhodes, 2010) and may play a vital role in encoding baby schema facial features (Hoffman and Haxby, 2000; Glocker et al., 2009b; Stoeckel et al., 2014). These visual cortical areas may serve as an entry node to forward the processed baby face information to other brain regions associated with attention, emotion and memory for further processing and control of behavioral responses (Glocker et al., 2009b).

Infant and child faces also enhance attention, and this is reflected in stronger activation of parietal areas involved in both bottom-up and top-down processing of attention orientation (Shomstein, 2012), including the intraparietal sulcus (IPS—BA 19/7) (Leibenluft et al., 2004), precuneus (PCU—BA 7/31) (Leibenluft et al., 2004; Glocker et al., 2009b), and posterior cingulate cortex (PCC—BA 23) (Leibenluft et al., 2004) (**Figure 1B**). Greater activation of these parietal regions selectively helps allocate more automatic and cognitive attentional resources to faces with baby schema features, resulting in an attentional bias toward the infant faces (Brosch et al., 2007; Glocker et al., 2009b; Caria et al., 2012). An event-related potential (ERP) study has also reported stronger activation in response to neutral expression faces of unfamiliar infants at central–frontal (P3a) and occipital–lateral (N170) sites, providing further support for increased attention toward infant cues (Weisman et al., 2012c).

There is also evidence that infant faces elicit strong activation in brain regions associated with core aspects of emotion processing (see Lindquist et al., 2012) including the anterior cingulate cortex (ACC—BA 33/24) (Glocker et al., 2009b) and medial cingulate cortex (MCC—BA 31/23/24) (Caria et al., 2012), insula (INS—BA 48/47/13) (Leibenluft et al., 2004; Glocker et al., 2009b; Caria et al., 2012; Stoeckel et al., 2014), amygdala (AMY; Lenzi et al., 2009; Barrett et al., 2012; Wan et al., 2014), and orbitofrontal cortex (OFC—BA 11/47) (Leibenluft et al., 2004; Nitschke et al., 2004; Kringelbach et al., 2008; Minagawa-Kawai et al., 2009; Kuo et al., 2012; Wan et al., 2014) (**Figure 1C**). The OFC also plays an important role in learning the emotional value of information and tracking changing emotions (Goodkind et al., 2012), as well as in judgments of pleasantness (Bartels and Zeki, 2004). Thus, the left OFC (BA 11) is more strongly activated by happy vs. neutral infant faces, while the right OFC (BA 11) is more strongly activated by sad vs. neutral faces (Montoya et al., 2012; Stoeckel et al., 2014). Finally, a repetitive transcranial magnetic stimulation (rTMS) study has found that a single high frequency session with stimulation applied to the left dorsolateral prefrontal cortex enhanced processing of positive emotions on baby faces and reduced that for negative emotions (Baeken et al., 2010b). Thus infant faces elicit stronger responses in emotional brain circuitry involved with processing both valence and arousal. This would suggest that overall infant faces evoke both stronger arousal and enhanced responses to both positive and negative cues from the infant. The OFC and dorsolateral prefrontal cortex in particular appear to play a key role in mediating differential responses to positive and negative valence infant faces.

#### TABLE 1 | Neural responses to infant and child faces.

N, numbers of subjects; M, male; F, female; L, left; R, right. Abbreviations of brain regions: see Table 2.

Many of the core brain regions engaged in emotion processing are also involved with core processing of empathy, notably the MCC, INS, and OFC (Fan et al., 2011), and other regions in the empathy network also show strong activation in response to infant faces, notably the posterior superior temporal sulcus (pSTS; Leibenluft et al., 2004), supplementary motor area (SMA—BA 6) and precentral gyrus (preCG; Caria et al., 2012), STG (Ranote et al., 2004; Stoeckel et al., 2014), PCU (Leibenluft et al., 2004; Glocker et al., 2009b), right supramarginal gyrus (SMG—BA 40) (Leibenluft et al., 2004), and also cerebellum (Ranote et al., 2004; Glocker et al., 2009b) (**Figure 1C**). Overall this pattern of enhanced activity in empathy processing regions suggests that it may contribute to better identification of emotions being expressed (cognitive empathy) and enhanced empathic feelings toward the infant (affective empathy). Furthermore, increased activity in preCG and SMA might lead to increased motor empathy in terms of mimicry of facial expressions. Interestingly, a recent paper has shown that 18-month-old infants exposed to a higher level of mimicry by their mothers exhibited increased prosocial behavior (Carpenter et al., 2013).

TABLE 2 | The abbreviations and full names of activated brain regions.

Enhanced activity changes in response to infant faces in the pSTS, PCU and PCC (Leibenluft et al., 2004; Glocker et al., 2009b) may also reflect an impact on core processes for theory of mind (Schurz et al., 2014). Theory of mind refers to the capacity to attribute mental states to oneself or others, and to predict and account for other people's behavior based upon understanding of their intentions and mental states (Premack and Woodruff, 1978; Leibenluft et al., 2004). Greater activation in these areas may therefore help adults draw on previously memorized experience and all the skills they have to better understand and respond toward the infant and successfully manage their social communication and relationship. Linked to this, there is also evidence that infant faces evoke greater activation in regions associated with episodic memory including the hippocampus (HIPP; Stoeckel et al., 2014), PCU (Leibenluft et al., 2004; Glocker et al., 2009b) and thalamus (THA; Leibenluft et al., 2004; Caria et al., 2012; Stoeckel et al., 2014) (**Figure 1D**). Enhanced activation in FFG (BA 41/37/19) (Leibenluft et al., 2004; Kringelbach et al., 2008; Caria et al., 2012; Stoeckel et al., 2014) and STG (Ranote et al., 2004; Stoeckel et al., 2014) may additionally contribute through their role in social cognition.

A number of studies have addressed the question of whether infant faces are particularly rewarding. Findings have consistently shown that infant faces appear to evoke greater activation of regions involved in reward and attachment, including the OFC (Leibenluft et al., 2004; Nitschke et al., 2004; Kringelbach et al., 2008; Minagawa-Kawai et al., 2009; Kuo et al., 2012; Stoeckel et al., 2014; Wan et al., 2014), substantia nigra/ventral tegmental area (SNi/VTA; Stoeckel et al., 2014), nucleus accumbens (NAcc)/ventral striatum (Stoeckel et al., 2014), caudate (CAU; Glocker et al., 2009b; Kuo et al., 2012; Stoeckel et al., 2014), putamen (PUT) (Glocker et al., 2009b; Stoeckel et al., 2014), globus pallidus (GP) (Glocker et al., 2009b; Stoeckel et al., 2014) and the THA (Leibenluft et al., 2004; Caria et al., 2012; Stoeckel et al., 2014) (**Figure 1E**). Although the OFC is also involved in emotion processing, its contribution to enhanced rewarding properties of infant faces has been particularly emphasized. The NAcc and VTA are key interconnected regions in the dopaminergic brain reward system (Bourdy and Barrot, 2012) which are selectively activated during both overt perception and mental imagery of rewarding and reinforcing stimuli, especially pleasant and emotionally arousing ones such as infant faces (Costa et al., 2010). Furthermore, lesions in NAcc have been reported to impair the baby schema effect (Numan, 2007). The NAcc and VTA are important for reward-mediated attachment and affiliation (Stoeckel et al., 2014) and also appear to be important targets whereby prosocial hormones such as oxytocin and vasopressin exert their effects on social reward (Scheele et al., 2013).

One of the most notable effects of infant faces is that they evoke stronger activation in brain motor areas than adult faces, including the SMA (BA 6) (Caria et al., 2012), precentral gyrus (preCG; BA 6) (Glocker et al., 2009b; Caria et al., 2012), postcentral gyrus (postCG; BA 2/3/4) (Ranote et al., 2004; Glocker et al., 2009b), superior frontal gyrus (SFG; BA 6) (Glocker et al., 2009b; Caria et al., 2012; Kuo et al., 2012; Wan et al., 2014), THA (Leibenluft et al., 2004; Caria et al., 2012; Stoeckel et al., 2014) and cerebellum (Ranote et al., 2004; Glocker et al., 2009b) (**Figure 1F**). These interconnected areas form the core motor circuit for preparation, planning and execution of intentional movements and speech (Goldberg, 1985; Caria et al., 2012). The SFG contains a number of different sub-regions with different patterns of connectivity with regions involved in motor control, working memory, attention and self-awareness (Li et al., 2013), and may play an important role in integrating perception and action (Goldberg et al., 2006). Overall, the extensive pattern of activation in motor control circuitry seen in response to infant faces may be particularly important in mediating unconscious, intuitive and virtually unavoidable patterns of approach behavior in adults to protect infants and promote both physical and language interactions with them (Ackermann and Ziegler, 2010; Caria et al., 2012). As already discussed above in relation to motor empathy, adult observers may also show a greater propensity to mimic infant facial expressions, thereby aiding infant prosocial development (Carpenter et al., 2013).

Only a small number of studies have specifically investigated the impact of the intensity of baby schema features in infant faces either by comparing responses to the faces of infants (high baby schema) and children (low baby schema) or through direct manipulation of baby schema features in specific face stimuli. From these studies there is evidence that intensity of baby schema does influence neural responses in visual and face processing regions as well as in regions controlling emotional responses, attention and memory, theory of mind and reward. An ERP study found that N1 amplitude localized in the FFG was increased in response to infant faces compared to adult, although not child faces, in both males and females. On the other hand N2 amplitude localized in the medial occipital cortex and FFG, uncus and medial OFC was increased in response to infant compared with child and adult faces, although only in females (Proverbio et al., 2011b). An fMRI study which specifically manipulated the intensity of baby features found that infant faces with high baby schema features, and which were rated to be very cute, produced stronger activation than ones with low baby schema features in the right NAcc, left ACC (BA 24), left PCU (BA 7), and left FFG (Glocker et al., 2009b). Interestingly, another fMRI study showed that adult faces with a baby-faced appearance and infant faces evoked greater activation in the same brain regions (AMY and FFA) than more mature looking adult faces, and also stimulated greater functional connectivity between them (Zebrowitz et al., 2009). However, it should be noted that the enhanced impact of baby faces on AMY and OFC responses is disrupted if the infant face has physical anomalies, which disrupt the baby schema (Baeken et al., 2010a; Parsons et al., 2013). 
Overall, therefore, evidence from these studies indicates that infant faces with high baby schema are more likely to evoke greater responses in brain regions processing faces, attention, emotion and positive reward than those with low baby schema.

### Specific Neural Responses to Own Infant Faces: Parental Love

In addition to the general impact of baby schema on neural processing discussed above there is widespread evidence for further enhanced effects when parents view their own as opposed to other infants. Thus, when mothers view their own infants there is greater activation observed in many of the same brain areas discussed above associated with visual processing, emotion, empathy, theory of mind, reward processing, social cognition and motor control (Bartels and Zeki, 2004; Leibenluft et al., 2004; Nitschke et al., 2004; Noriuchi et al., 2008; Minagawa-Kawai et al., 2009; Laurent and Ablow, 2013; Stoeckel et al., 2014; Wan et al., 2014; Esposito et al., 2015).

There are also some additional brain areas activated when mothers view their babies, including visual processing regions such as the occipital and temporal cortices (BA 17/18/19/37) (Nitschke et al., 2004), the right anterior temporal pole (ATP; BA 38) (Ranote et al., 2004) involved in emotional processing, periaqueductal gray (PAG) (Bartels and Zeki, 2004; Noriuchi et al., 2008), lateral OFC and lateral prefrontal cortex (lPFC; BA 11/47/46/45) (Bartels and Zeki, 2004) involved in maternal responses and emotion/reward processing and anterior paracingulate cortex (APC; BA 9) (Leibenluft et al., 2004) involved in theory of mind and cognitive processing. Moreover, ERP studies have found that the amplitudes of both the parietal-distributed P300, involved in attention, and the temporal N170 component, implicated in face encoding, were increased during viewing of own infant/child but not of unfamiliar infants/children (Doi and Shinohara, 2012; Weisman et al., 2012c). These specific neural responses toward own infant/child faces may represent components of maternal love/attachment which are clearly very important for the socio-emotional and cognitive development of infants, especially during the early post-partum period.

One of the most consistent findings in terms of an own infant-specific neural response in mothers is increased activation in the OFC, which plays a key role both in emotion and reward processing (Bartels and Zeki, 2004; Nitschke et al., 2004; Kringelbach et al., 2008; Stoeckel et al., 2014). The OFC includes medial (BA 25/14/10) and lateral (BA 47/12/11/10) areas and, in accordance with its anatomy and connectivity, the lateral portion has been further subdivided into another three sub-regions: anterior, posterior and caudal (Elliott et al., 2000). The medial portion is particularly involved in monitoring reward value and making stimulus-reward associations while the lateral portion is more related to stimulus-outcome associations and reward-related response suppression (Elliott et al., 2000; Walton et al., 2010). Overall, studies have found that maternal love (in terms of maternal responses to own babies) is mostly associated with increased activation of the lateral OFC. Thus bilateral lateral OFC (BA 11/47) activation occurs when mothers passively view images of their own as opposed to another familiar infant (Bartels and Zeki, 2004), happy faces of their own compared to another unfamiliar happy infant (Nitschke et al., 2004) or video clips of their own compared to unknown infants (Noriuchi et al., 2008; Minagawa-Kawai et al., 2009). Another study has also reported greater activation in the left inferior OFC (BA 11) with own vs. other joyful infant faces in mothers with low self-reported current depressive symptoms (Laurent and Ablow, 2013). Finally, one study has also shown increased activation in the anterior portion of the OFC in response to viewing an own infant (Minagawa-Kawai et al., 2009).

An increasing number of studies have investigated the neural substrates of paternal love in recent years (Atzil et al., 2012; Lamb and Lewis, 2013; Leidy et al., 2013; Mascaro et al., 2014). Neuroimaging studies on fathers viewing own infant/child faces have found increased activity in brain areas including medial and lateral frontal cortex (IFG, MFG, SFG, OFC), SMG, MTG, INS, cingulate, striatum (CAU) and AMY (Atzil et al., 2012; Kuo et al., 2012; Mascaro et al., 2014). All of these regions also show increased activation in mothers viewing their own infant/child faces. The only study to directly compare neural responses in mothers and fathers to videos of their own as opposed to other young infants also reported a considerable similarity between them (Atzil et al., 2012). However, this study also found significantly greater activation in mothers in a number of regions in the right hemisphere (STG, postCG, FFG, MTG, AMY, lentiform nucleus, cuneus, and CAU). On the other hand, in fathers greater activation was found in the left MFG, inferior parietal gyrus, superior occipital gyrus and precuneus and in part of the right MTG and STG. This possibly suggests more right hemisphere dominated responses in mothers and left hemisphere dominated ones in fathers. Further evidence for differences between fathers and mothers has also been reported in response to own vs. other infant laughing or crying. In this case a deactivation response in the ACC, which is an important region in the control of emotion, was only found to occur in mothers (Seifritz et al., 2003). Somewhat surprisingly, one study has reported that greater parental sensitivity and reciprocity in fathers was negatively associated with the activation in the right OFC for own compared with other infants (Kuo et al., 2012). This seems to imply that more responsive fathers may find all infants more rewarding, not just their own.
However, only 10 subjects were included in this study and so some caution should be attached to interpreting this finding. Another recent study has also investigated changes in regional gray matter (GM) volume in fathers from 2–4 to 12–16 weeks postpartum as an indication of potential neural plasticity changes. Results showed increased GM volume in the hypothalamus, AMY, striatum and lateral frontal cortex, but a decrease in the OFC, PCC, and INS (Kim et al., 2014).

In summary, all the neuroimaging and electrophysiological findings described above support the conclusion that baby faces contain highly salient, affective and rewarding information that particularly engages the extensive neural processing systems involved in these functions in adult observers. These systems in turn mediate changes in motor preparation and response circuitry to promote approach, protection and nurturing behavior, and the whole system undergoes plasticity changes to further strengthen the social bond with the infant and facilitate subsequent behavioral responses. Evidence to date suggests that the neural circuitry involved in maternal and paternal responses to own infants/children is very similar, although there is some evidence for differential responses in some frontal, temporal, limbic, and brain reward regions. However, further studies are clearly needed to provide more extensive evidence for such parental sex differences.

### Hormonal Correlates of Attraction to Infant Faces

A number of cultural, experiential and physiological factors could potentially contribute to observed sex differences in responses to infant and child faces. In particular, differential responses in men and women as a result of cultural norms and expectations may play a significant role (Lytton and Romney, 1991), although this has not been systematically investigated in the context of the baby schema effect. However, there is also a growing amount of evidence for hormonal influences on responses to infant cues in terms of sex differences, effects of puberty, the menstrual cycle and pregnancy, sex hormones and also neuropeptides, such as oxytocin and vasopressin, involved in the control of parental behavior and social bonds.

There is increasing behavioral and neural evidence for sex differences in response to infant faces. Behavioral findings have shown that while both males and females find infant faces cute, females tend to be more sensitive to the cuteness of infant faces than males (Glocker et al., 2009a; Lehmann et al., 2013), have stronger reactions and attentional bias toward them (Seifritz et al., 2003; Cárdenas et al., 2013), exhibit a higher preference for and liking of them (Maestripieri and Pelka, 2002; Parsons et al., 2011; Charles et al., 2013) and make more effort and have a stronger motivation to view them (Hahn et al., 2013). However, it should be noted that one study has failed to find a gender difference in attraction to infant faces, although this might possibly reflect the young age of the male participants or possibly the rather limited 5-point Likert scale used (Sprengelmeyer et al., 2013).

This observed gender difference in the perceived attractiveness of infant faces may to some extent be attributed to increased responsiveness in brain reward regions in mothers compared to fathers (Atzil et al., 2012), and also to the larger size of the OFC in females relative to males (Gur et al., 2002; Proverbio et al., 2011b). Indeed, support for a bias in women finding infant faces more rewarding, rather than being more sensitive to recognizing them, comes from a study showing that females were only better than males at choosing which of a pair of infant faces was cuter, but not when deciding which was the younger or the happier one (Lobmaier et al., 2010).

Preferences for infant faces also vary in women across their life cycle, and particularly with regard to their reproductive status. Two studies have reported that while females exhibit an overall higher preference for infant faces than males, this preference varies with age. The first of these studies reported that during childhood (6–10 years) and adolescence (11–15 years) females exhibited the highest preference, but that this declined thereafter during early (19–35 years) and later (46–75 years) adulthood. Interestingly, males showed a relatively constant preference across all four age groups (Maestripieri and Pelka, 2002). A second study reported that young women (19–26 years) are more sensitive to infant cuteness than men aged 19–26 and 53–60 years old. Women aged 45–51 years were the same as younger women whereas those aged 53–60 years old showed a reduced cuteness sensitivity that was equivalent to men (Sprengelmeyer et al., 2009). Thus both studies suggest that female reproductive hormones may play an important role in increasing perceived cuteness of infant faces. This could explain the sex difference between young women and men and the decline seen in older women who are likely to have undergone menopause.

Puberty and age of puberty have also been shown to influence perception of infant cuteness. Post-menarcheal girls showed a higher preference for infant faces and rated them more positively than pre-menarcheal peers and boys, suggesting that the onset of menstruation may increase attention toward infantile features (Goldberg et al., 1982). Furthermore, girls who had an early menarche also exhibited a greater subsequent preference for infant faces than those who had a later onset menarche (Maestripieri, 2004; Maestripieri et al., 2004). This latter finding is perhaps a little surprising given that early puberty is generally more associated with a negative impact in terms of increased likelihood of depression and behavioral problems (Copeland et al., 2010). However, it was also found that both early menarche and increased perception of cuteness were associated with early paternal absence from the home, and so it is argued that this may represent an adaptation in terms of an earlier readiness for reproduction and parenting (Maestripieri et al., 2004).

Two studies to date have investigated potential differences in responses to infant faces across the menstrual cycle. One of these did not find any effects (Sprengelmeyer et al., 2013); however, the other, using a forced-choice paradigm where subjects indicated which of two infant faces was cuter, found that women were more likely to choose the cuter baby during the ovulatory than the luteal phase of the cycle (Lobmaier et al., 2015). Nevertheless, cuteness discrimination was not associated with saliva concentrations of oestradiol, progesterone or testosterone, leading the authors to speculate that this menstrual cycle phase effect might be associated with other relevant hormones which change during the cycle, such as oxytocin or prolactin. To date no studies have looked at changes across pregnancy. However, overall findings do suggest some potential links between female reproductive hormones and women's sensitivity to the cuteness of infant faces, and that this can therefore contribute to facilitation of parental caregiving in individuals who have the reproductive potential to produce children (Sprengelmeyer et al., 2010).

In fathers two studies have reported correlations between blood testosterone concentrations and neural responses to infant cues. Testosterone concentrations were found to be decreased in fathers compared to non-fathers, and negatively associated with activation of the MFG in response to pictures of young children (Mascaro et al., 2014). On the other hand a study has also reported a positive association between testosterone concentrations in fathers and activation of a brain reward area, the left CAU, following interaction with an infant (Kuo et al., 2012). Thus the relationship between testosterone and neural responses to infant cues in fathers is somewhat unclear.

### Effects of Sex Hormones on Attraction to Infant Faces

While no studies have systematically investigated the effects of exogenous treatments with either estradiol or progesterone on sensitivity to cuteness in infant faces, one study has reported that women using oral contraceptives (which contain estrogen and progesterone) are more sensitive than those who do not (Sprengelmeyer et al., 2009). However, the same group in a subsequent study failed to find an effect of oral contraceptives on the aesthetic and incentive salience of cute infant faces (Sprengelmeyer et al., 2013).

Another important sex hormone which influences parenting behaviors is testosterone (Bos et al., 2010; Kuo et al., 2012). Testosterone may play a role in regulating females' reward sensitivity since testosterone administration has been shown to increase the reward value of financial incentives (Hermans et al., 2010). While no studies to date have investigated effects of testosterone administration on viewing baby faces, several have looked for associations with salivary testosterone concentrations. Thus higher salivary testosterone concentrations in women were associated with greater reward scores given to cute infant faces and this effect was independent of progesterone and estradiol concentrations (Hahn et al., 2014). Higher salivary testosterone concentrations in fathers during interactions with infants have also been associated with greater activation in brain reward regions such as the CAU when processing own vs. other infant faces (Kuo et al., 2012).

### Effects of Oxytocin and Vasopressin on Attraction to Infant Faces

There has been considerable interest in the role of the evolutionarily conserved hypothalamic neuropeptides oxytocin and vasopressin in recent years and a large number of studies have investigated their importance for a wide range of human social and emotional behaviors. The majority of studies have focused on oxytocin (OXT) and its effects on trust, cooperation, face emotion recognition, empathy, in-group preferences and also on social bonds and maternal attachment (Bartz et al., 2011; Kemp and Guastella, 2011; Striepens et al., 2011; Weisman et al., 2012a,b; Scheele et al., 2014; Wigton et al., 2015). Oxytocin may play an important role in human parental responses, with higher plasma concentrations of maternal oxytocin across pregnancy being predictive of higher quality of postpartum maternal care (Feldman et al., 2007). Increased plasma concentrations of oxytocin across the first 6 months following the birth of a child have also been correlated with various, although differing, positive aspects of parental responsiveness in both mothers and fathers (Gordon et al., 2010). Associations between oxytocin receptor polymorphisms (Riem et al., 2011b; Feldman et al., 2012) and the vasopressin V1a receptor (Bisceglia et al., 2012) and sensitive parenting have also been reported. Oxytocin released during breast-feeding may also have stress-reducing effects (Heinrichs et al., 2001, 2002). In the context of the current review, associations between oxytocin and vasopressin and attraction to infant faces have been shown through correlations with plasma or saliva concentrations, associations with receptor polymorphisms, and neural and behavioral responses to exogenous treatment using intranasal application.

Salivary oxytocin concentrations have been found to be positively correlated to mood ratings of happy but not sad infant faces in women (Bhandari et al., 2014). In an ERP study, urinary oxytocin concentrations in foster mothers following a cuddle interaction with their infants were also shown to be positively correlated with P300 amplitude in response to viewing all infant faces (Bick et al., 2013). However, another recent study which failed to demonstrate differences in responses to infant compared with adult faces in a facial visual search task found that urinary oxytocin concentrations were positively correlated with performance on both types of faces (Saito et al., 2014). Higher concentrations of plasma oxytocin have also been found to be related to a stronger maternal response in terms of increased gaze toward the infant face in postpartum mothers (Feldman et al., 2007; Gordon et al., 2010), and greater activation in brain reward regions such as the ventral striatum, OFC and medial frontal cortex as well as in hypothalamic/pituitary regions in first-time mothers viewing own vs. unknown infant faces (Strathearn et al., 2009). Furthermore, those mothers with lower plasma oxytocin concentrations when viewing their own vs. unknown infant faces were found to be more likely to have insecure attachment and also reduced activation of the mesocorticolimbic dopamine reward system in response to infant face cues (Strathearn et al., 2008, 2009; Strathearn, 2011).

Thus while plasma, salivary and urinary concentrations of oxytocin may not necessarily always accurately reflect those in the brain (see Striepens et al., 2011) there does seem to be some association between higher endogenous concentrations of the peptide and enhanced responses to infant face cues, at least in post-partum mothers. In line with the potential role of oxytocin in influencing social reward and modulating activity in brain reward systems (Scheele et al., 2013; Striepens et al., 2014), increased peripheral concentrations also seem to be associated with greater responses to own vs. other infant faces and cues in dopaminergic reward pathways (Rilling, 2009; Strathearn et al., 2009; Strathearn, 2011). However, this relationship between oxytocin and enhanced activation in brain reward systems is not specific to parent-infant bonds since it has also been reported for romantic bonds in terms of men viewing the face of their female partner compared with another either familiar or unfamiliar woman (Scheele et al., 2013).

Intranasal oxytocin administration has been found to enhance responses to important infant cues such as crying (Riem et al., 2011a, 2014) and laughing (Riem et al., 2012). Oxytocin administration has also been reported to increase preference for infant faces in homozygous GG allele carriers for the rs53576 polymorphism of the oxytocin receptor, whereas rs53576A allele carriers showed the opposite pattern (Marsh et al., 2012). An ERP study has linked the rs53576 polymorphism with sensitivity to infant cues since in both mothers and nulliparous women who were GG allele carriers an early (∼100 ms) differential frontal ERP response to strong intensity infant face expressions was associated with faster emotion recognition performance (Peltola et al., 2014). The same study found that mothers exhibited modulation of the early posterior negativity component (EPN) by negative valence faces. Intranasal OXT administration has also been shown to enhance subjective arousal ratings for infant photos in nulliparous women, and their ratings were positively correlated with their AMY activation in the oxytocin but not placebo treatment group (Rupp et al., 2013). Another study using the Infant Facial Expressions of Emotions from Looking at Pictures (IFEEL) task showed that oxytocin increased activation in empathy-related brain regions such as the left inferior frontal gyrus (IFG), MTG, and STG when women judged the emotion vs. gender of the infant faces. However, surprisingly it decreased behavioral performance on the face emotion recognition task independent of the difficulty level (Voorthuis et al., 2014). Oxytocin has also been reported to increase activity in the VTA, but not accumbens, of both nulliparous women and postpartum mothers during viewing of infant faces (Gregory et al., 2015). Studies investigating both neural and behavioral effects of intranasal oxytocin on responses to infant faces are summarized in **Table 3**.

While fewer studies have been carried out on men/fathers, one has reported that intranasal oxytocin treatment actually reduced activation in reward- and attachment-related brain regions, such as the left GP, when biological fathers passively viewed their own vs. an unfamiliar child (3–6 years) or an unfamiliar vs. familiar child. Oxytocin also decreased functional connectivity within a fronto-pallido-hippocampal network for own vs. unfamiliar child (Wittfoth-Schardt et al., 2012). Therefore, oxytocin may have differential effects on mothers and fathers by selectively modulating functional brain responses and connectivity to infant faces in regions associated with emotion, attachment, novelty and reward processing (Wittfoth-Schardt et al., 2012). Another study has also reported different associations between plasma oxytocin concentrations in mothers and fathers and brain regions showing greater responses to videos of own vs. other infants (Atzil et al., 2012). Thus, while higher AMY activation in mothers was positively associated with plasma oxytocin concentrations, this was not the case in fathers. In mothers, oxytocin concentrations were positively associated with activity in the left INS, left inferior parietal lobule (IPL), bilateral temporal cortex (TC), left ventral ACC and left NAcc. In fathers, on the other hand, activation in the left IFG, SFG and medial prefrontal cortex (mPFC), left postCG and left ACC was negatively associated with oxytocin concentrations. These findings again support the possibility that oxytocin may be influencing brain regions associated with attention, emotion, reward and even motor processing differently in mothers and fathers, although some caution needs to be applied to simple correlational analyses of this kind.

A number of studies have reported effects of intranasal oxytocin on reducing AMY responses to negative emotional faces (see Striepens et al., 2011) and also to both laughing (Riem et al., 2012) and crying (Riem et al., 2011a) infants. However despite the fact that greater AMY activation has been reported in response to own infant faces in mothers (Atzil et al., 2012; Strathearn and Kim, 2013), effects of oxytocin on AMY responses to infant faces have so far not been found.

Vasopressin, which is closely associated with oxytocin, has also been shown to influence social behaviors (Hammock, 2015; Patel et al., 2015). In rats, for example, it plays a potent role in facilitating maternal behavior, independent of trait anxiety (Bosch and Neumann, 2008). However, to date few studies have investigated potential effects of vasopressin on the attractiveness of infant cues. One study has reported some overlapping but also different patterns of negative associations between plasma vasopressin concentrations and activity in brain regions responding more strongly to videos of own vs. other infants (Atzil et al., 2012). In mothers, associations were found in bilateral SFG, right MFG and right middle temporal gyrus (MTG), whereas in fathers they were found in the right IPL, right inferior and medial frontal gyri, left INS and right temporal lobe. Thus, as with oxytocin, there may be different responses to vasopressin in maternal and paternal brains, although this clearly needs more detailed confirmation. While at this stage it is unclear whether vasopressin may play an important role in influencing responses to infant cues by either males or females, one speculation might be that it could serve to enhance empathic responses, particularly in those individuals whose parental response sensitivity is high. For example, a recent study has reported that intranasal vasopressin, but not oxytocin, increased empathic concern in both male and female subjects. Interestingly this effect was strongest in individuals who had received higher levels of paternal warmth during their childhood (Tabak et al., 2014).

TABLE 3 | The effect of oxytocin administration on brain response to infant and child faces.

OXT, oxytocin; IU, international unit; WB, whole brain analysis; ROI, regions of interest analysis; L, left; R, right; ↑, increased; ↓, decreased. Abbreviations of brain regions: see Table 2.

### Altered Responses to Infant Cues in Post-partum Depression and Substance Abuse

Post-partum depression affects between 6.5 and 8.5% of mothers (Yonkers et al., 2001) and is associated with reduced positive interest and responses to infant cues, which in turn can lead to a weakening of the relationship between a mother and her child. Studies have shown that mothers suffering from postnatal depression are more likely than controls to rate infant faces with negative emotions more negatively (Stein et al., 2010), and even to rate those with neutral expressions more negatively (Gil et al., 2011). Mothers with post-partum depression are also less accurate when identifying unfamiliar happy infant faces than healthy mothers, although there were no differences found when identifying sad faces (Arteche et al., 2011). While paternal postpartum depression is also moderately and positively correlated with maternal depression (Paulson and Bazemore, 2010), no study to date has investigated altered responses of fathers with depression to infant cues. This should be an important area for future studies.

Neuroimaging and electrophysiological studies have found evidence for a reduced effect of infant faces in a number of the same brain regions involved in attention, emotional and empathic responses and reward discussed above. Thus, depressed mothers compared to healthy controls showed a slower response in the dorsal ACC when viewing the distressed face of their own infant. Also, those with higher levels of current symptomatology showed reduced responses in the OFC and INS toward their own infant's joyful faces. Symptom severity could also predict lower responses to their own infant in left prefrontal and insula/striatal regions (Laurent and Ablow, 2013). On the other hand an ERP study has reported that the face-sensitive N170 component elicited in response to infant face stimuli was positively related with depression symptom severity (Noll et al., 2012). This perhaps implies an increased initial automatic perceptual sensitivity to infant faces in mothers with a greater severity of post-partum depression, but a subsequent suppression of responses in brain regions controlling positive attentional, emotional and reward responses to infants.

Anxiety disorders can also impact negatively on maternal responses to infants, and one study has shown that mothers with generalized anxiety disorder are inclined to rate the intensity of happy infant faces lower than controls (Arteche et al., 2011). Furthermore, the babies of anxious mothers appear to be less willing to look at their mother's face, since maternal anxiety scores have been shown to be negatively correlated with the amount of time babies spend looking at their mother's face (Jones et al., 2013). Thus babies also appear to be sensitive to reduced interest in them by anxious, and probably also depressed, mothers, thereby further increasing the potential threat to the parent-infant bond.

Key regions exhibiting altered responses to negative emotional stimuli in patients with anxiety and depression disorders notably include those involved in responses to infant faces, such as the ACC, INS, and AMY (Jaworska et al., 2014; Oathes et al., 2015).

Drug addiction has also been shown to influence responses to infant faces. The National Survey on Drug Use and Health (NSDUH) in 2007 found that 5.2% of pregnant women reported using illicit drugs during pregnancy; an additional 11.6% reported using alcohol and 16.4% tobacco. Mothers using cocaine during pregnancy have been found to respond more passively and in a more disengaged way to their babies (Gottwald and Thurman, 1994), and similar patterns of reduced responsivity in substance-using mothers have been reported in terms of parenting children even beyond infancy (Johnson et al., 2002; Molitor and Mayes, 2010). An fMRI study has shown that mothers using drugs during pregnancy (tobacco, heroin, marijuana, opiates, cocaine, and alcohol) had reduced responses to neutral and emotional infant faces in many of the regions discussed above which show enhanced responses to infant faces. For happy faces, reduced responses were found in frontal regions involved in attention, salience and reward (ventromedial, dorsolateral and dorsomedial frontal cortex) as well as in early visual processing (occipital gyrus). For sad faces, similar reductions were seen in frontal regions (dorsolateral frontal cortex, inferior and medial frontal gyri and medial OFC), and additionally in sensorimotor regions, MTG, STG, and PCC, as well as the AMY and parahippocampal gyrus (PHG). For neutral expression faces, there were again extensive frontal reductions in responses (ventromedial, dorsomedial, and dorsolateral frontal cortex and inferior frontal gyrus) as well as in sensorimotor regions, PCC, GP, AMY, and PHG. There was also reduced responsiveness in a primary visual processing region, the cuneus (Landi et al., 2011). Thus overall, drug taking appears to have an even more pronounced effect in reducing responsiveness in brain circuitry to infant faces than either post-partum depression or general anxiety.

### Conclusions and Future Directions

In line with the potent impact of facial baby schema on adult attraction, protection and caregiving behaviors, neuroimaging and electrophysiological studies reveal an extensive neural circuitry involved in our perception of infant faces. Enhanced activation in response to infant compared to adult faces is found in cortical and sub-cortical brain regions involved in face perception, attention, emotion, empathy, memory, reward and attachment, theory of mind and also control of motor responses. Both mothers and fathers also show evidence for enhanced responses in these same neural systems when viewing their own as opposed to another child. Importantly postpartum depression, anxiety and drug-taking all tend to reduce responsivity in this neural circuitry involved in processing and responding to infant face cues, with the most extensive changes in this respect appearing to occur in women taking addictive drugs during pregnancy.

Reproductively active women tend to rate infant faces as cuter than men do, and this may be mainly a reflection of both heightened attention to relevant cues and a stronger activation in their brain reward circuitry. In both sexes perceived cuteness of infant faces is influenced by reproductive hormones, with women in particular showing an ovulatory peak in interest during their cycle and an apparent decline post-menopause. To date evidence does not support major roles for the gonadal hormones estradiol, progesterone and testosterone in influencing responsivity to infant faces, although there is increasing evidence linking oxytocin with facilitation of attention toward and attractiveness of infant cues in both sexes.

Future studies need to explore in more detail the functional relevance of specific components of the widespread neural circuitry associated with the enhanced responses to infant faces. To date, for example, only one study has demonstrated the functional importance of the dorsolateral prefrontal cortex in the processing of emotional baby faces using rTMS (Baeken et al., 2010b). A particular focus should be on the circuitry involved with face processing, attention, emotion, empathy and reward processing since this would appear to be affected in reduced responses observed in post-partum depression, anxiety and drug use. It is also important to establish the functional roles of the neuropeptides oxytocin and vasopressin in mediating enhanced neural and behavioral responses to infant faces and other salient cues since they could in future represent potential therapeutic agents.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Luo, Ma, Zheng, Zhao, Xu, Becker and Kendrick. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Age, gender, and puberty influence the development of facial emotion recognition

#### *Kate Lawrence<sup>1</sup>, Ruth Campbell<sup>2</sup> and David Skuse<sup>3</sup>\**

*<sup>1</sup> Department of Psychology, St Mary's University, Twickenham, London, UK, <sup>2</sup> Deafness Cognition and Language Centre, University College London, London, UK, <sup>3</sup> Behavioural and Brain Sciences Unit, UCL Institute of Child Health, London, UK*

Our ability to differentiate between simple facial expressions of emotion develops between infancy and early adulthood, yet few studies have explored the developmental trajectory of emotion recognition using a single methodology across a wide age-range. We investigated the development of emotion recognition abilities through childhood and adolescence, testing the hypothesis that children's ability to recognize simple emotions is modulated by chronological age, pubertal stage and gender. In order to establish norms, we assessed 478 children aged 6–16 years, using the Ekman-Friesen Pictures of Facial Affect. We then modeled these cross-sectional data in terms of competence in accurate recognition of the six emotions studied, when the positive correlation between emotion recognition and IQ was controlled. Significant linear trends were seen in children's ability to recognize facial expressions of happiness, surprise, fear, and disgust; there was improvement with increasing age. In contrast, for sad and angry expressions there was little or no change in accuracy over the age range 6–16 years; near-adult levels of competence were established by middle-childhood. In a sampled subset, pubertal status influenced the ability to recognize facial expressions of disgust and anger; there was an increase in competence from mid to late puberty, which occurred independently of age. A small female advantage was found in the recognition of some facial expressions. The normative data provided in this study will aid clinicians and researchers in assessing the emotion recognition abilities of children and will facilitate the identification of abnormalities in a skill that is often impaired in neurodevelopmental disorders. If emotion recognition abilities are a good model with which to understand adolescent development, then these results could have implications for the education, mental health provision and legal treatment of teenagers.

Keywords: emotion, social cognition, facial expression, emotion recognition, child development, face recognition

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *Reviewed by:*

*Olivier Pascalis, Centre National de la Recherche Scientifique and Université Pierre-Mendès-France, France*

*Kathrin Cohen Kadosh, University of Oxford, UK*

#### *\*Correspondence:*

*David Skuse, Behavioural and Brain Sciences Unit, UCL Institute of Child Health, 30 Guilford Street, London WC1N 1EH, UK d.skuse@ich.ucl.ac.uk*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 26 March 2015 Accepted: 22 May 2015 Published: 16 June 2015*

#### *Citation:*

*Lawrence K, Campbell R and Skuse D (2015) Age, gender, and puberty influence the development of facial emotion recognition. Front. Psychol. 6:761. doi: 10.3389/fpsyg.2015.00761*

### Introduction

Faces are of unrivaled significance to human social interactions. Not only do faces provide us with visual information that allows us to determine the sex, age, familiarity and identity of an individual, we also use faces to gather information about what other individuals might be thinking or feeling. Analysing and interpreting facial expressions of emotion is necessary to enable us to modify our social interactions appropriately. Over the decades since the 1970s, social and psychological research has established the universality of the six main facial expressions of emotion (Ekman and Friesen, 1971). Very young infants can discriminate some, but not all, facial expressions of emotion (Serrano et al., 1992; Peltola et al., 2008) and facial emotion recognition ability is impaired in numerous psychological disorders (Fairchild et al., 2009; Eussen et al., 2015; Evers et al., 2015; Maat et al., 2015; Mancuso et al., 2015; Taylor et al., 2015). However, our knowledge of the development of this ability throughout childhood and, in particular, adolescence is surprisingly sparse. This study aims to explore the quantitative and qualitative changes in facial emotion recognition accuracy across this period of development.

It has been suggested that, by 6 years of age, typically developing children are relatively accurate at discriminating several facial expressions of emotion (Izard, 1971), with some studies suggesting that near-adult levels of recognition are achieved before adolescence (Tremblay et al., 2001; Rodger et al., 2015). Other studies, however, suggest childhood difficulties in recognizing expressions of fear that persist into adolescence (Baird et al., 1999; Lenti et al., 1999). As a review by Herba and Phillips (2004) pointed out, no studies at that time had examined the developmental trajectory of emotion recognition throughout childhood and adolescence. Although there now exist a handful of reports focusing on this age range, the methodologies employed differ from a straightforward emotion recognition labeling paradigm (Thomas et al., 2007; Rodger et al., 2015) and the majority of studies focus their attention on the early childhood period, with few even considering 8–11 year olds (Gao and Maurer, 2010; Mancini et al., 2013 being notable exceptions).

Gao and Maurer (2010) used a paradigm which manipulated intensity of facial expression, comparing groups of children aged 5, 7, and 10 years with a group of adults. Children selected the appropriate emotion from a choice of four in two separate sets (either neutral, happy, surprised, scared; or neutral, sad, angry, disgusted). The intensity of the emotion displayed varied between 10 and 100%, with the threshold for emotion recognition calculated. Sensitivity to happy facial expressions was at adult levels in children as young as 5 years of age, but for the other emotions there was some increase in sensitivity between the youngest children and adulthood. This raises some interesting questions with regard to the development of emotion recognition, including, 'is improvement evident in adolescence?' (there were no participants between 10 years of age and adulthood) and, 'what would the results look like if participants had the six basic emotions to choose between?' This final point may have a particular bearing on the results, since Gao and Maurer (2010) report near-ceiling levels of accuracy for recognition of the emotions when intensity reached about 50%, even in the youngest of viewers. Such levels are achieved only for happiness when task-demands required participants to choose between 6 emotion labels (Lawrence et al., 2003b).

A more recent study (Mancini et al., 2013) included a paradigm that called upon participants (aged between 8 and 11 years) to choose between the six basic emotions, together with neutral, for each face shown. Recognition accuracy increased during this period of mid-childhood, except (as found by Gao and Maurer, 2010) for happy faces. The largest age-related increases were noted for neutral and sad faces. Employing a complex paradigm to assess the perceptual threshold for detecting different emotional expressions, Rodger et al. (2015) found that sensitivity to emotional expressions increased from 5 years of age up until adulthood, for all expressions except those of happiness and fear. It would seem that the young children possessed adult-levels of sensitivity for detecting happiness and fear in faces. The stability of happiness recognition across this age range, and the fact that we are sensitive to this emotion from a young age is consistent with the findings of Mancini et al. (2013).

Looking directly at emotion recognition during late childhood (7–13 years), adolescence (14–18 years), and adulthood (25–57 years), Thomas et al. (2007) found that there was increased sensitivity to subtle changes in emotional expression in adults compared to the younger age groups. This study morphed faces between expressions of anger, fear, and neutrality, reporting linear trends in sensitivity to the changes throughout these three stages of life. Notably, for facial expressions morphed between neutral and anger, a quadratic trend was identified, whereby sensitivity to anger was equivalent in older children and adolescents but showed a marked increase between adolescence and adulthood. It is unknown whether the other basic emotions (not looked at in this study) would also continue to develop throughout adolescence.

From the above studies, we see suggestions of improvements in facial emotion recognition during childhood and adolescence. However, the different methodologies and age groups used, together with the differing emotions included, make it difficult to comprehensively understand the quantitative and qualitative developments in emotion recognition during this period of life. The current study sets out to systematically assess recognition accuracy for the six basic emotional expressions throughout childhood and adolescence. The primary aim of the current study was to explore the developmental trajectory of explicit facial emotion recognition in a large sample of children and adolescents split into large year-band categories, using original photographs from the Ekman-Friesen Pictures of Facial Affect (Ekman and Friesen, 1976).

The Ekman-Friesen Pictures of Facial Affect test (Ekman and Friesen, 1976) has been used in hundreds of studies, over numerous decades, to assess the ability to recognize the six basic emotions within facial expressions: happiness, sadness, fear, surprise, disgust, and anger. The test consists of selecting which of these six emotions is best represented by each of a series of photographs. Ten images of each emotion (total 60 images) are shown in a random order and a mixture of male and female images is used. The test has been shown to have good reliability (Ekman and Friesen, 1976; Frank and Stennett, 2001), and also to be applicable for use with differing age groups from young children (Uljarevic and Hamilton, 2013) through to older adults (Calder et al., 2003). This test has not just furthered our academic understanding of emotion recognition, but is also used in clinical and educational settings to evaluate emotion recognition ability in children with developmental disorders and those with special educational needs.

In general, when emotion recognition is assessed in adult patients, the ability of a patient to recognize facial expressions is compared to standardized adult norms for such tests, but this is not possible for children since child norms do not exist. Although other tests have been developed in more recent years (including the use of morphed faces and those with cropped hairlines), very often in clinical and educational situations and for research with these groups these original Ekman faces are used as an assessment tool (Unoka et al., 2011; Cantalupo et al., 2013; Collin et al., 2013; Demirel et al., 2014; Gomez-Ibanez et al., 2014). There are differing merits to using different tests of emotion recognition. This study chose to use the basic set of faces for its ecological validity and also so that the normative data derived might be of use to clinical and educational psychologists using the original version of the task.

Several childhood neurodevelopmental disorders influence the ability to recognize facial expressions. Children with autism and Asperger syndrome encounter difficulties recognizing facial expressions (for example, Hobson et al., 1988; Howard et al., 2000; Sachse et al., 2014; Taylor et al., 2015). Children with psychopathic tendencies have also been reported to show selective impairments in the recognition of sad and fearful facial expressions of emotion (Stevens et al., 2001), which may extend to other emotions (Dawel et al., 2012). The lack of systematically gathered normative data on the ability of typically developing children to recognize facial expressions hampers our understanding of the nature and severity of such deficits. The correct interpretation of suspected impairments in childhood requires an understanding of the normal range of ability at any given age, and one aim of the current study was to establish such norms for children of school age. A newly developed facial identity recognition test for children [The Cambridge Face Memory Test for Children (CFMT-C)] presents norms for children from 5 to 12 years, shows a developmental improvement across this period, and is capable of detecting face recognition memory deficits in children with autism (Croydon et al., 2014). It is hoped that norms for the emotion recognition task could be equally successful in aiding the detection of deficits in children with atypical development. In order to understand the developmental normative data qualitatively, as well as quantitatively, the current study sets out to explore the effect of IQ, gender and puberty on emotion recognition ability.

Is the ability to recognize facial expressions of emotions associated with IQ? For children with autistic spectrum disorders and for psychiatric control children both verbal memory and Performance IQ have been found to predict emotion recognition ability (Buitelaar et al., 1999). Previous research by our group has shown that, within this same sample of children, recognition memory ability for facial identity was positively and significantly correlated with general cognitive ability (Lawrence et al., 2008). Another aim of the study was to assess the relationship between emotion recognition and IQ in typically developing children.

Both sexes are competent at recognizing facial expressions of emotion, with many studies finding that males and females perform at equivalent levels on a wide variety of emotion recognition tasks (Hall and Matsumoto, 2004). However, when differences are reported, they typically show a female advantage with women being more accurate decoders than men (Hall, 1978; Hall et al., 1999). One study, for example, found that females had a higher rate of correct classification of facial expressions, with males being more likely to have difficulty distinguishing one emotion from another (Thayer and Johnsen, 2000). This finding holds both for basic emotional expressions (Hall, 1978; McClure, 2000; Montagne et al., 2005; Biele and Grabowska, 2006; Mancini et al., 2013) and for more complex emotional and mental states (Baron-Cohen et al., 1997; Alaerts et al., 2011). Furthermore, males and females have been reported to show distinctive patterns of activation in neural regions involved in the processing of facial expressions of emotion including the amygdala and prefrontal cortex (Killgore et al., 2001), suggesting the possibility of different underlying mechanisms for processing them.

With respect to children specifically, Thomas et al. (2007) reported no sex differences in sensitivity to fearful and angry facial expressions, whilst other studies have suggested a constant female advantage during late childhood for facial emotion recognition (Montirosso et al., 2010). It is likely that any differences that do exist may be subtle, with research suggesting gender differentiated development of sadness and disgust recognition during childhood (Mancini et al., 2013) in the absence of differences in final levels of ability in the oldest children tested. It is unknown what happens to these developmental trends during adolescence. Not all studies of emotion recognition during childhood have explored sex differences (e.g., Gao and Maurer, 2010) and this is clearly an area that would benefit from greater exploration with a large sample across the childhood and adolescent period. One aim of the current study was to test for gender differences in the development of our ability to recognize different facial expressions of emotion across childhood and adolescence.

Given that recognition accuracy, even in young children, is relatively good, a question that needs addressing is whether there are qualitative changes in the development of facial expression recognition during childhood and adolescence and, if so, whether these changes themselves might influence recognition accuracy. In a very insightful review concerning the interplay between face recognition, adolescence and behavioral development, it is suggested that, as adolescents reorient away from their parents toward their peers, there is an increased drive for peer-acceptance and increased sensitivity to peer evaluation (Scherf et al., 2012). This may lead to qualitative changes in the type of information that is extracted from faces, with a greater emphasis than before being placed on appraisals of attractiveness and social status, together perhaps with greater sensitivity to displays of negative peer-evaluation. Scherf et al. (2012) suggest that, as a result of the changing way in which facial information is utilized, developmental differences in face processing abilities will emerge during this period of life. This notion is supported by the fact that facial attractiveness ratings undergo both quantitative and qualitative changes during late-childhood and early-adolescence (Cooper et al., 2006; Saxton et al., 2006). It has been suggested that the own-age bias in face recognition, representing superior recognition abilities for faces of a similar age to the viewer, may emerge as a result of social reorientation toward peers during this period of development (Scherf et al., 2012).

Is it possible that hormonal surges associated with puberty may influence the development of our ability to recognize facial expressions of emotion? There is evidence to suggest that hormonal fluctuations during the menstrual cycle influence fear recognition accuracy in females (Pearson and Lewis, 2005) and that emotion recognition abilities may be influenced by hormonal changes in late pregnancy (Pearson et al., 2009). Additional evidence that hormones may influence emotion recognition ability comes from studies of individuals with Turner syndrome, who have a single X chromosome and lack endogenous estrogen. Studies by our group have shown that these women have deficits in recognizing emotion and higher-order mental state information from facial expressions (Lawrence et al., 2003a,b). Furthermore, it has long been suggested that face-identity recognition may undergo qualitative changes around the time of puberty (Carey and Diamond, 1977; Carey et al., 1980), with children showing a decline in face identity recognition memory around the age of 12 years. A large-scale study of nearly 500 children and adolescents by our group is suggestive of a similar pattern, in that face recognition memory was found to improve from 6 to 16 years of age but with a plateau in performance in the mid-pubertal years of 10–13 (Lawrence et al., 2008). Indeed, as Mancini et al. (2013) point out, the differing hormonal development of boys and girls during puberty could influence emotion recognition, suggesting that future studies should seek to explore this directly. Areas within the social brain, such as the amygdala, are populated with testosterone receptors (Filova et al., 2013), suggesting a possible mechanism by which hormonal changes during puberty might influence emotion recognition abilities. We aimed to test the hypothesis that pubertal development would influence facial emotion recognition ability. This was done in an exploratory way using a subset of the adolescent sample and, as such, should be considered as indicative rather than definitive.

The objectives of the current study were: firstly, to assess the developmental trajectory of facial emotion recognition in school age children and to establish norms and developmental trends for these abilities; secondly, to ascertain whether general cognitive ability (IQ) correlates with overall emotion recognition accuracy; thirdly, to explore gender differences in these abilities; and finally, to assess whether pubertal development is related to emotion recognition accuracy.

### Materials and Methods

### Participants

Four hundred and seventy eight participants were recruited from six primary schools (ages 6–11 years) and eight secondary schools (ages 11–16 years) within the London area of the UK. Between 20 and 25 males and between 20 and 25 females were recruited within each 1-year age band. A full breakdown of gender and age group is given in **Table 1**. Schools were selected on the basis that their pupils' nationally assessed levels of performance (key stage test results) were within the average range. Parents provided informed consent for their child to participate in the study. Children were excluded from testing if they had known neurological or psychological difficulties.

The majority of the participants were White Caucasian (*n* = 333, 69.67%). Of the remainder, 77 (16.11%) described themselves as African/Caribbean, 34 (7.11%) Indian/Pakistani, 20 (4.18%) Asian, and 14 (2.93%) described themselves as 'Other,' typically being of mixed-ethnicity.

At all ages, Vocabulary and Matrix Reasoning *t*-scores and Full-scale IQ were within the average range for both boys and girls. Mean IQ scores ranged from 94.2 to 106.8. Independent sample *t*-tests revealed no significant differences between boys and girls at any age. The overall mean IQ score for males was 99.73 (SD 12.12, range 74–139) and for females 98.21 (SD 13.26, range 55–139). Mean Vocabulary *t*-score for males was 49.47 (SD 9.00, range 26–76) and for females 48.48 (SD 9.52, range 20–73). Mean Matrix Reasoning *t*-score for males was 49.91 (SD 8.82, range 21–72) and for females 48.76 (SD 9.78, range 20–77).

Information on pubertal status was available for a limited subset of the participants. The analysis with pubertal status as an independent variable was conducted on 173 participants over the age of 11 years.

### Task Descriptions

#### Facial Emotion Recognition

A computerized version of the Ekman-Friesen Pictures of Facial Affect test (Ekman and Friesen, 1976) was developed for this study. Sixty full-face, uncropped images (10 of each emotion) were presented individually on a computer monitor (see **Figure 1**). Participants were required to click the mouse on the emotion label (happy, sad, angry, fearful, disgusted, and surprised, presented in this standardized order) that best described what they thought the individual was feeling. Images were presented in a single block with gender and emotion inter-mixed. The ability to read the emotional labels and respond with a mouse-click was tested in all children. Three children were unable to do this and for these cases verbal responses were given and the mouse-click made by the experimenter. Faces remained on the screen until a response was made.
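The task structure described above (60 images, 10 per emotion, one randomly ordered block, six response labels) can be sketched as follows. This is only an illustrative sketch: the image identifiers, the function names `build_trials` and `score`, and the response format are hypothetical and are not taken from the software used in the study.

```python
import random
from collections import Counter

# Response labels in the standardized order given in the text.
EMOTIONS = ["happy", "sad", "angry", "fearful", "disgusted", "surprised"]
IMAGES_PER_EMOTION = 10


def build_trials(seed=None):
    """Return 60 (image_id, true_emotion) trials in a single shuffled block.

    The image_id strings are hypothetical placeholders.
    """
    trials = [(f"{emotion}_{i:02d}", emotion)
              for emotion in EMOTIONS
              for i in range(IMAGES_PER_EMOTION)]
    random.Random(seed).shuffle(trials)
    return trials


def score(trials, responses):
    """Return (percentage accuracy per emotion, overall percentage accuracy).

    `responses` maps image_id -> the label the participant chose.
    """
    correct = Counter()
    for image_id, truth in trials:
        if responses.get(image_id) == truth:
            correct[truth] += 1
    per_emotion = {e: 100.0 * correct[e] / IMAGES_PER_EMOTION for e in EMOTIONS}
    overall = 100.0 * sum(correct.values()) / len(trials)
    return per_emotion, overall
```

With six response options, a participant who always chose the same label would score 100% on that emotion, 0% on the others, and one-sixth correct overall, which is the guessing rate for this task format.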

#### Wechsler Abbreviated Scale of Intelligence (WASI)

Two subtests, one Verbal and one Performance, were administered according to standardized procedures (Wechsler, 1999). *t*-scores for the Vocabulary and Matrix Reasoning subtests were computed and an estimated IQ score derived for each individual.

#### Pubertal Development Scale (PDS; Petersen et al., 1988)

The PDS has been reported to be a reliable measure of pubertal development. Standardization of this self-report questionnaire suggested that interview ratings and questionnaire scores had a median correlation of 0.7 (Petersen et al., 1988). Ethical permission was granted to administer the questionnaire to children of secondary school age (ages 11–16 years) but not to primary school (6–11 years) children. Sixteen children chose not to complete the questionnaire or provided insufficient information. Complete data were obtained for 206 children. According to scoring criteria, children were classified as pre-pubertal, beginning-pubertal, mid-pubertal, advanced-pubertal, or post-pubertal. Owing to the restricted age range assessed, very few children fell into either the pre-pubertal (*n* = 4), beginning-pubertal (*n* = 22) or post-pubertal (*n* = 7) categories. Within the beginning-pubertal group, the majority of participants were male. The categories with sufficient numbers to be used in the final analysis were group 3 'mid-pubertal' (*n* = 73) and group 4 'advanced-pubertal' (*n* = 100).

*To assess the degree of deviance for any child, a simple z score can be calculated: z score = (child's score – mean score for age and gender)/SD. A positive z score indicates the child is performing above the mean score for their age and gender, whilst a negative z score indicates lower than mean levels of accuracy.*

### Results

#### Development of Emotion Recognition Norms

The data were inspected for outliers. One participant (an 8-year-old boy) was excluded on the basis that his recognition accuracy for the Pictures of Facial Affect was at chance level (11.67%) and much lower than the next lowest score of 40% accuracy (which was achieved by three individuals, with a further eight individuals achieving accuracy of between 41 and 45%).

**Table 1** shows mean accuracy scores as percentages for each gender within each age group and each emotion category. These data therefore permit any individual child or adolescent's score to be compared with the distribution for that age-band and gender.
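The comparison of an individual score with the normative distribution can be sketched in a few lines of Python. This is only an illustration of the z score definition, (child's score – mean score for age and gender)/SD; the function name `z_score` and the example mean and SD are hypothetical placeholders and are not values taken from **Table 1**.

```python
def z_score(child_score: float, norm_mean: float, norm_sd: float) -> float:
    """Return (child's score - normative mean for age and gender) / SD."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (child_score - norm_mean) / norm_sd


# Hypothetical norms for one age-band and gender (placeholders, NOT from Table 1).
example_mean = 75.0  # mean % accuracy
example_sd = 10.0    # SD of % accuracy

# A child scoring 60% would fall 1.5 SDs below this hypothetical mean.
print(z_score(60.0, example_mean, example_sd))
```

A negative result indicates accuracy below the mean for the child's age and gender; a positive result indicates accuracy above it.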

### IQ and Facial Emotion Recognition Abilities

Bivariate Pearson correlations were calculated for each respondent's total emotion recognition score and their IQ. These revealed a significant relationship between emotion recognition and IQ (*r* = 0.313, *n* = 474, *p* < 0.0001) across the whole sample, indicating that tested IQ was a factor in accuracy of labeling facial emotion categories. The relationship was then examined within each age-band by separate correlations. Since 11 correlations were calculated, a Bonferroni correction was applied to the significance level (0.05/11), re-setting it to 0.005. After this correction was applied, a significant relationship between IQ and emotion recognition held at age 8 (*r* = 0.424, *n* = 44, *p* = 0.004), age 10 (*r* = 0.609, *n* = 48, *p* < 0.0001), age 13 (*r* = 0.504, *n* = 44, *p* < 0.0001) and age 14 (*r* = 0.442, *n* = 40, *p* = 0.004). Because emotion recognition accuracy correlated significantly with general cognitive ability in typically developing children and adolescents, IQ was entered as a covariate into all subsequent analyses.
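The Bonferroni step described above can be illustrated with a short Python sketch. It assumes a family-wise significance level of 0.05 divided across the 11 age-band correlations, and uses the *p*-values reported in the text (values reported as *p* < 0.0001 are represented by their upper bound, 0.0001). The names `bonferroni_alpha` and `reported_p` are illustrative only.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# p-values reported in the text, keyed by age in years.
# "p < 0.0001" is represented by its upper bound (an assumption of this sketch).
reported_p = {8: 0.004, 10: 0.0001, 13: 0.0001, 14: 0.004}

threshold = bonferroni_alpha(0.05, 11)  # 0.05 / 11, roughly 0.0045
survivors = [age for age, p in reported_p.items() if p < threshold]

print(round(threshold, 3))
print(survivors)
```

The unrounded threshold (about 0.0045) is slightly stricter than the rounded value of 0.005; the four reported correlations fall below both.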

### Developmental Trajectory of Emotion Recognition from Facial Expression, by Gender and Age

Scores for the individual facial expressions were submitted to a repeated measures ANOVA. Percentage recognition accuracy scores for each emotion (happy, surprised, fearful, sad, disgusted, and angry) were entered as six levels of the repeated measure, with the 11 levels of age group (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16 years) and two levels of gender (male/female) as the between-subject factors and IQ as the covariate. There were significant main effects of gender [*F*(1,451) = 24.05, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.05] and age [*F*(10,451) = 18.39, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.29], qualified by a significant interaction between emotion and age group [*F*(50,2255) = 7.74, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.15] and a significant interaction between emotion and gender [*F*(5,2255) = 2.30, *p* = 0.04, η<sub>p</sub><sup>2</sup> = 0.005]. As can be seen from **Figure 2**, emotion recognition accuracy increased with age, with females outperforming males at all ages.
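As a reading aid, the partial eta squared values reported with each *F* statistic can be recovered from the *F* value and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>)/(*F* × df<sub>effect</sub> + df<sub>error</sub>). The Python sketch below applies this identity to the main effects of gender and age and to the emotion by age group interaction; the function name `partial_eta_squared` is illustrative, and the sketch is a consistency check rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect < 1 or df_error < 1:
        raise ValueError("F must be non-negative and both df must be at least 1")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom as reported in the text.
gender = partial_eta_squared(24.05, 1, 451)
age = partial_eta_squared(18.39, 10, 451)
emotion_by_age = partial_eta_squared(7.74, 50, 2255)

print(round(gender, 2), round(age, 2), round(emotion_by_age, 2))
```

Rounded to two decimal places, these agree with the effect sizes reported above (0.05, 0.29, and 0.15).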

Recognition accuracy for each emotion was then examined using separate univariate ANOVAs to identify the source of the interactions between emotion, age group and gender. Each emotion was entered separately as the dependent variable, with age group and gender as the fixed factors and IQ as the covariate. Age group was, in each instance, subjected to polynomial contrasts in order to identify any age trends in recognition accuracy. The different age trends for each of the individual emotions, by gender, have been plotted in **Figure 3**.

#### Happiness

Happy faces were accurately named by children of all ages. At 6 years of age children could accurately name 92% of happy faces. However, there was a significant main effect of age [*F*(10,451) = 2.84, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.059], with a significant linear improvement with age identified (*p* = 0.001). Females achieved significantly higher levels of accuracy than males, although the mean difference was small [Female mean 96.75, SD 6.44; Male mean 94.86, SD 9.90; *F*(1,451) = 7.67, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.017]. There was no significant interaction between gender and age.

#### Surprise

There was a significant main effect of age for the recognition accuracy of surprised faces [*F*(10,451) = 11.41, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.20]. Both boys and girls showed significant linear improvements with age in the ability to recognize facial expressions of surprise (difference = 29.89, *p* < 0.0001). A significant quadratic trend was also identified (difference = −15.14, *p* < 0.0001). This reflects the linear improvement up to age 10 or 11 years followed by an asymptote. Ten-year-olds achieved a mean accuracy score of 86.67% for surprised faces, nearly identical to the level of accuracy achieved by 16-year-olds (86.14%). There was a main effect of gender, with females achieving higher recognition rates than males [Female mean 82.09, SD 22.21; Male mean 77.35, SD 26.05; *F*(1,451) = 9.31, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.02].

#### Fear

There was a significant main effect of age on the recognition accuracy of fearful faces [*F*(10,451) = 7.16, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.14]. There was a significant linear trend in improvement of recognition accuracy for fearful faces with increasing age (difference = 28.20, *p* < 0.0001). There were no significant differences in accuracy according to gender and no interaction between age and gender.

#### Sadness

Young children were accurate at recognizing sad facial expressions. There was no effect of either age group or gender on the ability to recognize sad faces, and the interaction between these factors was non-significant. An independent samples *t*-test showed no significant difference in the recognition of sad faces between 6-year-olds (79.39%) and 16-year-olds (74.77%; df 91, *t* = 1.27).

#### Disgust

There was a significant main effect of age for the recognition of disgust [*F*(10,451) = 21.23, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.32], with a linear trend (difference = 54.99, *p* < 0.0001). Females recognized more disgusted facial expressions than males [Female mean 58.84, SD 30.50; Male mean 52.47, SD 29.56; *F*(1,451) = 13.63, *p* < 0.0001, η<sub>p</sub><sup>2</sup> = 0.029]. There was no interaction between age and gender.

#### Anger

There was no effect of age on the recognition of angry facial expressions. An independent samples *t*-test revealed that the scores achieved by 6-year-olds (76.53%) were similar to those achieved by 16-year-olds (77.50%; df 91, *t* = −0.246, n.s.). A small gender effect was found for this ability [Female mean 77.01, SD 17.40; Male mean 73.31, SD 18.13; *F*(1,451) = 5.81, *p* = 0.016, η<sub>p</sub><sup>2</sup> = 0.01], but there was no significant interaction between age and gender.

### Pubertal Development and Emotion Recognition

Emotion recognition accuracy was analyzed for respondents who were classified as being in the stages of mid-puberty or advanced-puberty. A multivariate analysis of covariance was performed with each of the six emotions entered as dependent variables, the two levels of pubertal development as the fixed factor, and gender, age group and IQ as covariates. Mean recognition accuracy for each of the emotions according to pubertal stage is shown in **Table 2**. There was a main effect of pubertal development on recognition accuracy for facial expressions of disgust [*F*(1,159) = 7.63, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.05] and anger [*F*(1,159) = 4.10, *p* = 0.04, η<sub>p</sub><sup>2</sup> = 0.03], but no significant effect for the other facial expressions. There were no significant interactions between pubertal development and gender. Recognition of facial expressions of anger and disgust was significantly more accurate in advanced-puberty than in mid-puberty, after the effects of age, gender and IQ had been taken into account. The means and 95% confidence intervals for recognition accuracy are presented in **Figures 4** and **5**.

A summary of the main effects for age, gender, and puberty is shown in **Table 3**.

### Discussion

This study establishes childhood norms for the recognition of the six basic facial expressions using the Ekman-Friesen Pictures of Facial Affect (Ekman and Friesen, 1976). Developmental trajectories for various emotions were strikingly different. We have demonstrated that the ability to recognize certain facial expressions of emotion, including fear, disgust, and surprise, improved considerably with age across childhood and adolescence, whilst for other emotions, notably happiness, sadness, and anger, levels of recognition were very similar for 6-year-olds and 16-year-olds.

Most previous studies have focused on a sub-set of these emotions (Thomas et al., 2007) or on part of this developmental period (Gao and Maurer, 2010; Mancini et al., 2013). We believe this study is unique in assessing all six basic emotions in 1-year age-bands in a large sample of school age children of both genders, from the age of 6 to 16 years, using the same methodology (simple expression labeling) and the same materials across the age range. These normative data will be useful for appraising this skill in any individual child or adolescent, and therefore for educational or clinical diagnosis. The Ekman-Friesen Pictures of Facial Affect test is widely used for assessing emotion recognition skill in children with developmental disorders and behavioral and learning difficulties (Unoka et al., 2011; Cantalupo et al., 2013; Collin et al., 2013; Uljarevic and Hamilton, 2013; Demirel et al., 2014; Gomez-Ibanez et al., 2014), but normative data have, so far, been lacking. The results of our study allow non-normative performance to be identified. For example, individuals with autism spectrum disorders (ASDs) have been reported to have sub-optimal recognition of certain facial expressions of emotion using the Ekman-Friesen Pictures of Facial Affect in both standardized and re-developed paradigms (for example, Humphreys et al., 2007; Wright et al., 2008a). Since we have identified the scale of variation in the performance of typically developing children on this test, *z*-scores for any individual child can be compared to normative data, reducing the need for a control study. Elsewhere, we have charted developmental curves for the recognition of individual facial expressions, and developed algorithms to compute centile scores based on age, recognition score and gender (Wade et al., 2006).

TABLE 2 | Emotion recognition (% accuracy) scores for the Ekman-Friesen test of Facial Affect according to pubertal stage and gender.

TABLE 3 | Overview table showing the main effects reported in this study for the individual emotions.

*A refers to a significant main effect for this variable being found. A* × *refers to no significant main effect for this variable being found.*

Previous research has suggested children may have disproportionate difficulties recognizing particular facial expressions. It has been suggested that surprise, fear, anger and disgust may present disproportionate problems (Camras and Allison, 1985). In the current study, children were substantially worse at identifying facial expressions of fear and disgust in these adult face images than they were at recognizing any other emotional expression. Disgust and fear were the least accurately recognized emotions in 6-year-olds and also showed the greatest linear improvements with age. By 16 years of age, respondents were as accurate at recognizing fearful and disgusted faces as faces depicting other emotions. In contrast, the recognition of sadness and anger showed no developmental age trends; 6-year-olds were as good as 16-year-olds at recognizing these emotions. A recent study by Rodger et al. (2015), consistent with the current study, found that recognition of sadness remained stable and accurate from an early age, with most other emotions showing some improvement during childhood. However, unlike our study, they also found this to be the case for expressions of surprise, which were found here to continue to show improvements until late childhood. Variations in the methodologies used may account for this discrepancy. Since the current study used emotion labeling, whereas the study by Rodger et al. assessed perceptual thresholds, it seems possible that age-related increments in accuracy were related to efficiency of recognizing and/or labeling surprise, rather than any perceptual developments.

More research is needed to understand the differences between emotions and why such contrasting developmental trajectories exist. The recognition of some emotions relies more on information from the upper-face (eyes) or the lower face (mouth) or the configuration of the whole face (Calder et al., 2000; Lawrence et al., 2003a,b) but these differences seem unable to explain the differing developmental trajectories observed in childhood. Similarly, there is no obvious differentiation in neural regions recruited to process these emotions that easily explains these developmental differences.

Why might it be the case that we acquire accuracy in labeling certain emotions (such as sadness, happiness, and anger) at a much younger age than we do for other emotions (notably fear, surprise, and disgust)? These findings would seem to be consistent with a theory put forward by Widen (2013), who has suggested that young children divide facial expressions into two categories ('feels good' or 'feels bad') and that only gradually does this system of classification undergo qualitative changes, enabling children to use increasingly specific, discrete categories. According to this account, the initial distinction made within the 'feels bad' category is between angry and sad faces, which is in line with our finding that both these emotions were recognized accurately by the youngest participants. A further test of this idea would be to analyze incorrect responses in relation to the hypothesis: more errors would be anticipated 'within' than 'between' the two feeling groups.

Another possible explanation for age-sensitive profiles for surprise and disgust comes from research in autism. Taylor et al. (2015) suggest that surprise and disgust may cause greatest difficulty to people with developmental impairments including ASDs and specific language impairment (SLI) because these are expressions which signal states of mind and intention. Since people with ASD often have difficulty in interpreting the intentions of others (theory of mind deficits), and since tests of theory of mind show developmental progression in normally developing children, it is plausible that these expressions may be inaccurately recognized in the youngest children, and show the greatest improvement with age. A rather different interpretation for the pattern of results is that the recognition of emotion in adult faces, by children, may be susceptible to an 'own-age bias' effect, as has been demonstrated for face recognition (Proietti et al., 2014), and also as a factor in the effects of emotion on brain activation (Wright et al., 2008b). That is, the present findings may underestimate children's ability to recognize certain emotions on adult faces, specifically. It is possible that those same emotions, seen on age-peer faces, may be better recognized by children. In this context it is worth noting that happiness, sadness, and anger are emotions displayed by adults to younger children which may be expected to elicit specific behaviors in the child. The 'later-developing' emotions (disgust, surprise, fear) may be more (age)-peer-specific in their effects.

### General Cognitive Ability and Emotion Recognition

Our study highlights the significant association between emotion recognition skills and general cognitive ability throughout childhood and adolescence, as has been previously demonstrated for facial recognition memory (Lawrence et al., 2008). This finding has been demonstrated within atypical populations; for example in children with autistic spectrum disorders and for psychiatric control children (Buitelaar et al., 1999) but our study confirms that this is also the case for typically developing children. This suggests two things. Firstly, that children with cognitive delay may have concomitant delays in their ability to accurately decode facial expressions of emotion. Secondly, when making deductions about domain specific impairments in emotion recognition accuracy within clinical populations, it is important to assess the level of general cognitive functioning in such individuals, using either a control group matched for cognitive ability or comparing scores against normative values based on mental age rather than chronological age.

### Gender Differences in Emotion Recognition

Previous research exploring sex differences in recognition accuracy for facial expressions of emotion has argued that females are more accurate decoders of emotional expressions than males (Hall, 1978; Hall et al., 1999; McClure, 2000; Montagne et al., 2005; Biele and Grabowska, 2006; Mancini et al., 2013). Our research supports and extends these findings, suggesting that during childhood and adolescence, girls are significantly more accurate than boys at recognizing facial expressions of emotion. This is consistent with the considerable popular science literature built on an assumption that females have superior empathy and emotion recognition abilities (e.g., Baron-Cohen and Wheelwright, 2004; Alaerts et al., 2011). It supports the findings of research with dynamic faces which demonstrate a female advantage throughout late childhood (Montirosso et al., 2010). Analyzing individual emotions, it becomes clear that this pattern reflects a slight female advantage for the recognition of happiness, surprise, disgust and anger but not for the recognition of fearful or sad faces. However, this small but significant female advantage has not been replicated in all studies (e.g., Hall and Matsumoto, 2004), which may relate to differences in the methodologies employed, together with the sample sizes and age ranges used.

The small gender differences identified may reflect sexually dimorphic processing of facial expressions. A study by our group has suggested that men and women may call upon different functional processes for face and emotion recognition (Campbell et al., 2002). For females there was a positive correlation between the ability to recognize fearful facial expressions and face identity recognition; this correlation was absent for males. From this it was suggested that females and males may rely on different psychological processes for these tasks. The small difference in accuracy of emotion recognition between boys and girls in the current study may be reflective of subtle differences in the underlying psychological processes recruited by males and females.

Although no age by gender interactions were statistically significant in our study, other studies suggest that gender differences decrease with age. Mancini et al. (2013) report that girls had high accuracy of sadness and disgust recognition by age 8, with boys not reaching these levels until the age of 11 years, when they surpass the performance of girls. Could this male lag in the development of emotion recognition be related to anecdotal reports and observational studies that boys are more emotionally immature than girls when they start school, exhibiting different temperament characteristics to girls (Yoleri, 2014)?

### Pubertal Development and Emotion Recognition

We found that anger and disgust were better recognized by respondents classified as late-pubertal compared with those who were mid-pubertal (age partialled out). These findings are similar to those reported by Thomas et al. (2007) who analyzed recognition of facial expressions morphed between neutral and anger (but did not look at recognition of disgust). They found that sensitivity to anger increased significantly between adolescence and adulthood. This is of particular interest for angry faces, as this *puberty-related* development is in stark contrast with the lack of *age-related* development of this ability in our sample. Angry and disgusted expressions could be conceived of as signaling disapproval and negative judgments of the viewer. In this sense, the finding would seem to be consistent with the observation that, as children enter adolescence, they become increasingly driven to seek the social acceptance of their peers, whilst becoming acutely sensitive to peer evaluation (Steinberg and Morris, 2001; Scherf et al., 2012). The synaptic reorganization that is evident in the adolescent brain (see Blakemore and Choudhury, 2006) may make regions dedicated to processing emotional information especially sensitive to environmental experience during this period of development. It might be hypothesized that hormonal changes during puberty differentially affect psychological processes and potentially neural circuits involved in the recognition of these facial expressions.

Due to insufficient numbers within the pre- and early-puberty groups, it was not possible to analyze the developmental trend for these recognition skills throughout the full range of the pubertal period. Future studies may benefit from collecting data across these stages of puberty in order to explore whether the linear improvement noted in our current study is preceded by any 'dip' in abilities in early puberty that may mirror the decline (or plateau) and subsequent improvement in facial recognition memory that has been observed in some (Carey and Diamond, 1977; Carey et al., 1980; Lawrence et al., 2008), but not all, studies. Puberty-related developments in facial emotion and facial identity recognition may be suggestive of qualitative changes in the way we process and extract meaning from faces that may in turn be under-pinned, or at the very least supported by, the structural and functional re-organization of brain circuitry recruited for such processing.

### Methodological Considerations

The Ekman faces task uses monochrome full-face photographs of adults taken during the 1970s. Even though emotions have been shown to be culturally consistent and universal (Ekman and Friesen, 1971), it is worth bearing in mind that the use of images from a different era might impact on the results of this study. Not only may there be an own-age bias (Proietti et al., 2014), but it is possible that gray-scale photographs of people wearing old-fashioned clothes, make-up and hairstyles, and who therefore seem to be from a different historical era, may elicit different responses than those of images seen to be of the respondents' own time. Another aspect of this set of photographs is that they show very little ethnic variation, being predominantly of light-skinned Caucasian-featured individuals.

There are limitations to using the PDS, in that it relies on self-report, which may be inaccurate. Furthermore, the sensitive nature of the questions asked renders it problematic to obtain ethical permission to administer, especially in younger children who are more likely to be pre-pubertal. For this reason, we have a very limited number of pre-pubertal children within our study, which makes it difficult to fully explore the developmental trend across all the stages of puberty. A more illuminating approach could be to adopt direct saliva testing of hormonal levels, which may not only give a more valid measurement of pubertal stage but also be ethically acceptable to use across the entire age span of childhood and adolescence.

### Conclusion

There is burgeoning interest in the apparent deficits associated with the recognition of facial expressions among children with pervasive developmental disorders (Bolte and Poustka, 2003; Hall et al., 2003; Unoka et al., 2011; Cantalupo et al., 2013; Collin et al., 2013; Demirel et al., 2014; Gomez-Ibanez et al., 2014). The ability of clinicians to detect such impairments in an objective way relies upon the establishment of quantitative norms of the emotion recognition abilities of typically developing children. Using a well-established set of emotional face photographs (Ekman and Friesen, 1976), this study has enabled us to ascertain the normal developmental patterns of emotion recognition abilities, which are surprisingly different for different emotional expressions. Girls were more accurate than boys at recognizing some facial expressions of emotion, and pubertal maturation appeared to influence the development of the ability to recognize expressions of anger and disgust. We found considerable variance in the recognition ability of typically developing children. Such variance may in part explain why studies of emotion recognition in neurodevelopmental disorders, based on relatively small samples of clinical and control participants, often report inconsistent findings. The normative data provided in this study will aid researchers in assessing degree of impairment with more accuracy. For typically developing children, recognition of sadness, anger and happiness from facial expressions is highly accurate in early childhood. However, the ability to recognize facial expressions of fear, disgust and (to a lesser extent) surprise matures significantly over the course of late childhood and adolescence.

If changing face and emotion recognition abilities serve as a good model to understand adolescent development more generally (as suggested by Scherf et al., 2012), then researching changes in these abilities may be instrumental to developing our understanding of behavioral and mental health vulnerabilities within the teenage years. Adolescence represents a time of particular vulnerability for developing difficulties that could be seen as being associated with emotion processing or emotional regulation. For example, mood disorders such as depression and generalized anxiety disorder become increasingly prevalent in adolescence (Zuckerbrot and Jensen, 2006; Beesdo et al., 2009) and the onset of schizophrenia is often seen toward the end of the teenage years (Gogtay et al., 2011). In addition, rates of antisocial behavior peak in adolescence (see Fairchild et al., 2013 for a review). Depression, anxiety, schizophrenia, and conduct disorder (which is common in those demonstrating antisocial behavior) have all been associated with deficits in facial emotion recognition accuracy (Demenescu et al., 2010; Ventura et al., 2013; Weightman et al., 2014; Sully et al., 2015). Potentially, assessing facial emotion recognition abilities in at-risk individuals might allow the detection of potential vulnerabilities, which, in turn, may have implications for intervention strategies that could provide experiential input that may encourage more appropriate emotional development during this sensitive period. As such, a fuller understanding of some of these issues could have implications for teenage mental health provision, secondary education and the remediation and legal treatment of young offenders.

### Acknowledgments

This research was supported by The National Alliance for Autism Research (NAAR) and the Nancy Lurie Marks Family Foundation. Thanks to Tim Cole for providing advice on sample size for this study. Many thanks to Paul Ekman for providing the stimuli for this study. Data collection was assisted by Deborah Bernstein, Sarah Brand, David Spektor, and William Mandy. Thanks to Lubna Ahmed for her comments on a revision of the manuscript. We thank the schools involved for their remarkable support in conducting the study. Finally, we especially thank all the participants of our investigation for their time, which was generously and enthusiastically given.

### References

Baird, A. A., Gruber, S. A., Fein, D. A., Maas, L. C., Steingard, R. J., Renshaw, P., et al. (1999). Functional magnetic resonance imaging of facial affect recognition in children and adolescents. *J. Am. Acad. Child Adolesc. Psychiatry* 38, 195–199. doi: 10.1097/00004583-199902000-00019

Alaerts, K., Nackaerts, E., Meyns, P., Swinnen, S. P., and Wenderoth, N. (2011). Action and emotion recognition from point light displays: an investigation of gender differences. *PLoS ONE* 6:e20989. doi: 10.1371/journal.pone.0020989


immediate memory (warrington RMF) from 6 to 16 years. *J. Neuropsychol.* 2(Pt 1), 27–45. doi: 10.1348/174866407X231074


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Lawrence, Campbell and Skuse. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Aging and emotional expressions: is there a positivity bias during dynamic emotion recognition?

*Alberto Di Domenico<sup>1</sup>\*<sup>†</sup>, Rocco Palumbo<sup>1,2†</sup>, Nicola Mammarella<sup>1</sup> and Beth Fairfield<sup>1</sup>*

*<sup>1</sup> Department of Psychological Sciences, University of Chieti, Chieti, Italy, <sup>2</sup> Schepens Eye Research Institute, Harvard Medical School, Boston, MA, USA*

In this study, we investigated whether age-related differences in emotion regulation priorities influence online dynamic emotional facial discrimination. A group of 40 younger and a group of 40 older adults were invited to recognize a positive or negative expression as soon as the expression slowly emerged and subsequently rate it in terms of intensity. Our findings show that older adults recognized happy expressions faster than angry ones, while the direction of emotional expression does not seem to affect younger adults' performance. Furthermore, older adults rated both negative and positive emotional faces as more intense compared to younger controls. This study detects age-related differences with a dynamic online paradigm and suggests that different regulation strategies may shape emotional face recognition.

#### *Edited by:*

*Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany*

#### *Reviewed by:*

*Nicola Jane Van Rijsbergen, University of Glasgow, UK Ahmed Megreya, Qatar University, Qatar*

#### *\*Correspondence:*

*Alberto Di Domenico, Department of Psychological Sciences, University of Chieti, Via dei Vestini 31, Chieti 66100, Italy alberto.didomenico@unich.it*

*†These authors have contributed equally to this work.*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 15 April 2015 Accepted: 20 July 2015 Published: 04 August 2015*

#### *Citation:*

*Di Domenico A, Palumbo R, Mammarella N and Fairfield B (2015) Aging and emotional expressions: is there a positivity bias during dynamic emotion recognition? Front. Psychol. 6:1130. doi: 10.3389/fpsyg.2015.01130*

Keywords: aging, positivity bias, emotion recognition, facial expression recognition, face perception

### Introduction

Face perception is one of the most highly developed visual skills in human beings. Moreover, it is a skill present from the very early stages of life (Johnson et al., 1991) and plays a crucial role in social communication (Haxby et al., 2000). Indeed, we describe feelings, intentions, motivations, impressions and, above all, emotions based on faces, which reveal a large amount of information to the perceiver, and at least six emotions are communicated through facial expressions. In fact, happiness, fear, surprise, anger, disgust, and sadness are typically identified with extreme precision whether shown in static or dynamic images (Howell and Jorgensen, 1970; Buck et al., 1972; Wagner et al., 1986; Ekman et al., 1987). Most importantly, face perception, although sensitive to aging and clinical conditions, plays an adaptive role (Zebrowitz et al., 2015).

Interestingly, contrary to this adaptive function, under which we would expect negative faces to have an advantage, the literature on emotional face recognition has consistently identified a behavioral recognition advantage for happy faces with respect to negative ones (Calvo and Beltrán, 2013). One of the reasons for this advantage may be that participants in these studies are generally asked to recognize only a final version of an emotional face. Here, we were interested in examining emotional biases and preferences in online recognition of emotional faces that may be more closely related to motivational preferences (Fairfield et al., 2015a).

Increasing evidence shows that face recognition may be impaired in older adults. Studies investigating the effects of aging on face perception using tasks such as face detection (Norton et al., 2009), face identification (Habak et al., 2008; Megreya and Bindemann, 2015) and emotion recognition (Calder et al., 2003) have shown that older adults are slower and less accurate on these tasks (Hildebrandt et al., 2011, 2013). More importantly, aging seems to be related to qualitative as well as quantitative changes [e.g., in reaction times (RTs) and accuracy] in face perception. Research in different fields of psychology, such as perception and memory, has shown that older adults seem to prefer positive emotional stimuli, a phenomenon referred to in the literature as the positivity effect. This effect on working memory is well documented in the literature (Mammarella et al., 2012; Fairfield et al., 2013), and many studies on the effects of aging on memory have highlighted that older adults, compared to younger adults, show enhanced memory for positive-valence autobiographical events (Kennedy et al., 2004) and for positive images (Mikels et al., 2005). In addition, studies on trait impression have shown that older adults tend to judge faces as more positive than younger adults and to perceive faces as more trustworthy as well as less hostile and less dangerous, especially for the most threatening-looking faces (Ruffman et al., 2006; Castle et al., 2012; Zebrowitz et al., 2013).

Traditionally, tasks that assess emotion perception use static facial stimuli representing happy, fearful, and neutral expressions, but a potentially important factor influencing visual emotion perception concerns the role of dynamic information. It has been reported that healthy controls show an improvement in emotion recognition for dynamic over static point-light displays (Atkinson et al., 2004). Dynamic stimuli therefore present an interesting case for investigating emotion perception in aging. In fact, few studies have assessed the threshold of intensity at which emotions are most consistently identified.

Previous studies have used dynamic emotion recognition tasks based on real videos (Banziger et al., 2009; Minardi, 2012), but here we adopted a new online task. Starting from two pictures of the "Karolinska Directed Emotional Faces" (Lundqvist et al., 1998) portraying the same actor, we generated several morphs and subsequently created videos of faces in which facial expressions changed their intensity from neutral to happy or from neutral to angry. In this way, we were able to examine whether normal aging is associated with reduced perceptual processing of emotional cues and to determine whether older adults require more intense stimuli to correctly label and discriminate emotional facial expressions. We recorded RTs in facial expression recognition in younger and older adults. In line with the facial expression recognition literature, we expected older adults to perform more slowly than younger adults. In addition, to investigate the direction of emotions (i.e., positivity bias for older adults), we asked participants to rate happy, angry and hybrid faces on a visual analog scale from positive to negative. In this case, we predicted that older adults would rate faces more positively than younger ones.

### Materials and Methods

### Participants

A group of 40 younger and 40 healthy older adults who scored high on the Mini-Mental State Examination (MMSE; Folstein et al., 1975; *M* = 28.75, SD = 1.1; maximum score = 30) participated in the experiment after giving written informed consent in accordance with the Declaration of Helsinki. The study was approved by the local departmental ethical committee. Participants' demographic and clinical characteristics are presented in **Table 1**. Exclusion criteria included history of severe head trauma, stroke, neurological disease, severe medical illness or alcohol or substance abuse in the past 6 months. All participants reported normal or corrected-to-normal visual and auditory acuity, and both younger and older adults reported being in good health.

### Stimuli

We created 20 dynamic videos from two versions of the same actor selected from the "Karolinska Directed Emotional Faces" (Lundqvist et al., 1998). The first version was neutral while the second was happy or angry (gender of the actors and emotions were balanced across trials). These two pictures were then morphed to obtain 98 hybrid faces with an increasing percentage of happiness or anger, and these 100 pictures were presented in sequence, from neutral to happy/angry, for 40 ms each in order to generate the video.
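The timing of these videos can be made explicit with a short sketch. It assumes that each of the 100 pictures was shown for 40 ms (which gives the 4000 ms video duration reported in the Procedure), that RTs are measured from video onset, and that morph intensity increased in equal steps from the neutral to the full expression. The function names are hypothetical and only illustrate the arithmetic; they are not the authors' code.

```python
# Illustrative sketch only: the names and the equal-step morph assumption are
# ours and are not taken from the original study materials.
FRAME_MS = 40    # presentation time per picture (from the text)
N_FRAMES = 100   # neutral picture + 98 morphs + full expression (from the text)


def video_duration_ms(n_frames=N_FRAMES, frame_ms=FRAME_MS):
    """Total video duration in milliseconds."""
    return n_frames * frame_ms


def frame_at(rt_ms, n_frames=N_FRAMES, frame_ms=FRAME_MS):
    """0-based index of the picture on screen rt_ms after video onset."""
    if rt_ms < 0:
        raise ValueError("rt_ms must be non-negative")
    return min(int(rt_ms // frame_ms), n_frames - 1)


def intensity_at(rt_ms, n_frames=N_FRAMES, frame_ms=FRAME_MS):
    """Expression intensity (%) on screen at rt_ms, assuming equal morph steps."""
    return 100.0 * frame_at(rt_ms, n_frames, frame_ms) / (n_frames - 1)


if __name__ == "__main__":
    print(video_duration_ms())      # 100 pictures x 40 ms
    for rt in (500, 1000, 2000):    # arbitrary example RTs in ms
        print(rt, frame_at(rt), round(intensity_at(rt), 1))
```

Under these assumptions, a shorter RT corresponds directly to a lower expression intensity at the moment of recognition, which is why RT can serve as a proxy for the intensity threshold.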

### Procedure

#### Recognition Phase

The recognition phase was split into two identical sessions to avoid fatigue. In each session, participants watched 10 videos in the center of the screen and then completed a forced choice recognition test.

During the videos, an initially neutral face gradually changed to assume an expression of happiness or anger. Each video, preceded by a 200 ms fixation point, lasted 4000 ms. Participants pressed the space bar as soon as they were able to identify the emotional expression the face was assuming. Participants subsequently pressed the "m" key if the face had assumed a positive expression or the "z" key if the face had assumed a negative one.

#### Rating Phase

Participants rated 24 new faces according to valence. Six faces were happy, six faces were angry and 12 faces were hybrid (**Figure 1**). Each hybrid face (50% happy and 50% angry) was created starting from two pictures of the "Karolinska Directed Emotional Faces" portraying the same actor; the first picture was happy and the second was angry (gender of actors was balanced across trials). Each face, preceded by a 200 ms fixation point, was presented in the center of the screen for 1000 ms. Participants were then instructed to evaluate, using a visual analog scale (i.e., a line presented horizontally in the center of the screen), how positive or negative the face seemed by moving a slider along the line with the mouse. The line represented a double-ended continuum where the two ends indicated the maximum value of positivity on one side and the maximum value of negativity on the opposite side. The direction of the continuum (positive/negative or negative/positive) was balanced across participants.

#### TABLE 1 | Participants' demographic characteristics.

*Values are means (SD) or as otherwise indicated.* ∗*p* < 0.001.

FIGURE 1 | Example of hybrid face.

### Results

First, the *t*-test on recognition accuracy did not show any significant differences between groups (*t* = 1.71, *p* = 0.09). This indicates that older and younger adults were equally able to process, label and discriminate faces.

Second, we submitted the accuracy scores (percentage) for facial expression changes to a 2 (*Emotion*: Happy, Angry) × 2 (*Group*: Younger vs. Older Adults) mixed-design analysis of variance. No significant effects were found, indicating that neither younger nor older adults differed in discriminating changes to happy versus changes to angry (the average accuracy was 96%).

Third, in order to evaluate differences between groups in the temporal processing of the facial expression changes, we submitted RTs to a 2 (*Emotion*: Happy, Angry) × 2 (*Group*: Younger vs. Older Adults) mixed-design analysis of variance. The mixed ANOVA revealed a main effect of group (*F*<sub>1,78</sub> = 9.99, *p* < 0.01) since younger adults were faster than older adults, a main effect of emotion (*F*<sub>1,78</sub> = 30.08, *p* < 0.001) because participants recognized changes from neutral to happy faster than changes to angry, and a significant two-way *Emotion* × *Group* interaction (*F*<sub>1,78</sub> = 31.19, *p* < 0.001). The *post hoc* analysis on the *Emotion* × *Group* interaction confirmed that older adults were slower to recognize changes from neutral to angry (*M* = 1208.5) compared to happy (*M* = 613.1, *p* < 0.001). No differences were found in the RTs of younger adults (*p* = 0.94; **Figure 2**).
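For a 2 × 2 mixed design such as this one, the *Emotion* × *Group* interaction can be checked by hand: it is equivalent to an independent-samples *t*-test on each participant's difference score (angry RT minus happy RT), with *F* = *t*² and *N* − 2 error degrees of freedom (78 for two groups of 40, matching the degrees of freedom reported above). The sketch below illustrates this identity with NumPy on simulated RTs. The simulated values are arbitrary and are not the study's data, and the function name is hypothetical.

```python
import numpy as np


def interaction_f(diff_group_a, diff_group_b):
    """F and df for the within x between interaction of a 2 x 2 mixed design.

    Inputs are per-participant difference scores (e.g., angry RT - happy RT).
    Uses the pooled-variance t-test identity F(1, N - 2) = t ** 2.
    """
    a = np.asarray(diff_group_a, dtype=float)
    b = np.asarray(diff_group_b, dtype=float)
    df_error = a.size + b.size - 2
    pooled_var = ((a.size - 1) * a.var(ddof=1)
                  + (b.size - 1) * b.var(ddof=1)) / df_error
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / a.size + 1 / b.size))
    return t ** 2, (1, df_error)


# Simulated example (arbitrary values, NOT the data reported in the article).
rng = np.random.default_rng(0)
n = 40
younger = rng.normal(loc=[700.0, 710.0], scale=150.0, size=(n, 2))  # happy, angry
older = rng.normal(loc=[800.0, 1100.0], scale=150.0, size=(n, 2))   # happy, angry

f_value, df = interaction_f(older[:, 1] - older[:, 0],
                            younger[:, 1] - younger[:, 0])
print(f"Emotion x Group: F({df[0]}, {df[1]}) = {f_value:.2f}")
```

Working with difference scores makes the interaction concrete: it tests whether the happy-versus-angry RT gap differs between the two age groups.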

Finally, in order to examine differences between groups in facial expression ratings, we submitted the face judgment ratings to a 3 (*Emotion*: Happy, Hybrid, Angry) × 2 (*Group*: Younger vs. Older Adults) mixed-design analysis of variance. The mixed ANOVA revealed a main effect of group (*F*<sub>1,78</sub> = 4.72, *p* < 0.05) because younger adults judged faces more negatively than older adults, a main effect of emotion (*F*<sub>1,78</sub> = 838.74, *p* < 0.001) and a significant two-way *Emotion* × *Group* interaction (*F*<sub>2,156</sub> = 15.59, *p* < 0.001). The *post hoc* analysis confirmed that older adults rated negative facial expressions more negatively (*M* = −34.53) than younger adults (*M* = −26.64) and positive facial expressions more positively (*M* = 37.09) than younger adults (*M* = 28.77). Older adults also rated the hybrid faces as more positive (*M* = 4.65) than younger adults (*M* = −0.85; **Figure 3**).

### Discussion

The aim of this paper was to examine which aspects of emotional facial recognition are impaired in older adults by using a novel emotional face recognition task that combines a dynamic recognition phase with a more general static facial rating. Accuracy data indicated that both groups are able to perform the task correctly. However, when we analyzed RTs, we found that older and younger adults showed different patterns of recognition based on face expression. Older adults detected happy expressions faster than angry expressions while younger adults did not show any differences in the time it took them to recognize facial expression. This pattern of performance seems to be linked to the emotional valence of the facial expression since we did not find any differences between the two groups when we asked them to complete a subsequent forced choice recognition phase to evaluate general recognition difficulties. Altogether, these results seem to suggest a positivity bias during dynamic emotion recognition in older adults. We did not find a happy face advantage typically found in younger adults. This may be because participants did not recognize a single final face in our study, but pressed a key as soon as they were able to detect the direction of the emotional change on a face. The recognition task in itself was very easy and led to ceiling effects in the younger adults that may have "hidden" the happy face advantage. In addition, we found that older adults evaluate unambiguous emotional faces of both valences more intensely than controls. Interestingly, when faces are ambiguous, as in the hybrid condition, only the older adults maintain more intense ratings for positive faces compared to younger adults.

Older adults exhibited enhanced recognition of happy expressions. This finding is consistent with literature showing that older adults prefer positive emotional stimuli (Mather and Carstensen, 2005; Mammarella et al., 2013; Di Domenico et al., 2014). It is possible that age-related motivational changes guide the processing of emotional information and subsequently lead to emotional effects. In fact, older adults often show enhanced memory for positive emotional information. Accordingly, they tend to focus less on negative information, a tendency linked to perceived time limitations that lead to motivational shifts and direct attention to emotionally meaningful goals (Carstensen, 1995; Fairfield et al., 2015b). In contrast, younger adults typically perceive time as more expansive and consequently prioritize goals related to knowledge acquisition.

However, our results might also be influenced by the fact that older adults favor different facial features (e.g., Wong et al., 2005). Indeed, specific parts of the face can drive emotional processing, for example, the mouth for happiness and the eyes for anger (e.g., Schyns et al., 2007). In future studies, we may want to investigate the scanning path of older adults compared to younger adults by manipulating experimental emotional faces.

### Conclusion

In our study, the age-related differences in emotional facial expression recognition evidenced how different regulation strategies shape preferences in emotion processing leading older adults to show a preference for positive information, while younger adults prefer negative information. These findings may have implications for developing new clinical treatments in terms of new emotional facial recognition training programs.

### Acknowledgment

We thank all the participants in our study.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Di Domenico, Palumbo, Mammarella and Fairfield. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Compensating for age limits through emotional crossmodal integration**

*Laurence Chaby <sup>1,2</sup>\*, Viviane Luherne-du Boullay <sup>3</sup>, Mohamed Chetouani <sup>2</sup> and Monique Plaza <sup>2</sup>*

*<sup>1</sup> Institut de Psychologie, Sorbonne Paris Cité, Université Paris Descartes, Boulogne-Billancourt, France, <sup>2</sup> Groupe Intégration Multimodale, Interaction et Signal Social, Institut des Systèmes Intelligents et de Robotique, CNRS UMR 7222, Paris, France, <sup>3</sup> Université Paris 8 Vincennes Saint Denis, Saint-Denis, France*

Social interactions in daily life necessitate the integration of social signals from different sensory modalities. In the aging literature, it is well established that the recognition of emotion in facial expressions declines with advancing age, and this also occurs with vocal expressions. By contrast, crossmodal integration processing in healthy aging individuals is less documented. Here, we investigated the age-related effects on emotion recognition when faces and voices were presented alone or simultaneously, allowing for crossmodal integration. In this study, 31 young adults (*M* = 25.8 years) and 31 older adults (*M* = 67.2 years) were instructed to identify several basic emotions (happiness, sadness, anger, fear, disgust) and a neutral expression, which were displayed as visual (facial expressions), auditory (non-verbal affective vocalizations) or crossmodal (simultaneous, congruent facial and vocal affective expressions) stimuli. The results showed that older adults performed slower and worse than younger adults at recognizing negative emotions from isolated faces and voices. In the crossmodal condition, although slower, older adults were as accurate as younger adults except for anger. Importantly, additional analyses using the "race model" demonstrate that older adults benefited to the same extent as younger adults from the combination of facial and vocal emotional stimuli. These results help explain some conflicting results in the literature and may clarify emotional abilities related to daily life that are partially spared among older adults.

#### **Keywords: aging, emotion, faces, voices, non-verbal vocalizations, multimodal integration, race model**

**Abbreviations:** PASA, Posterior–Anterior Shift in Aging; fMRI, functional magnetic resonance imaging; ERP, event-related potentials; RT, response times; STS, superior temporal sulcus; p-STC, posterior superior temporal cortex; BDI, Beck Depression Inventory; MMSE, Mini Mental State Examination; CDFs, cumulative distribution functions.

#### *Edited by:*

*Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany*

#### *Reviewed by:*

*Natalie Ebner, University of Florida, USA Cesar F. Lima, University College London, UK*

#### *\*Correspondence:*

*Laurence Chaby, Groupe Intégration Multimodale, Interaction et Signal Social, Institut des Systèmes Intelligents et de Robotique, CNRS UMR 7222, 4 Place Jussieu, Paris 75005, France laurence.chaby@parisdescartes.fr*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 01 April 2015 Accepted: 10 May 2015 Published: 27 May 2015*

#### *Citation:*

*Chaby L, Luherne-du Boullay V, Chetouani M and Plaza M (2015) Compensating for age limits through emotional crossmodal integration. Front. Psychol. 6:691. doi: 10.3389/fpsyg.2015.00691*

### **Introduction**

Emotion recognition is a fundamental component of social cognition. The ability to discriminate and interpret others' emotional states from emotional cues plays a crucial role in social functioning and behaviors (Carton et al., 1999; Adolphs, 2006; Corden et al., 2006; Frith and Frith, 2012). From early in life and throughout the lifespan, emotion recognition is an essential mediator of successful social interactions and well-being (Izard, 2001; Engelberg and Sjöberg, 2004; Kryla-Lighthall and Mather, 2009; Suri and Gross, 2012). Hence, impaired recognition of others' emotional states may result in severe social dysfunctions, including inappropriate social behaviors, poor interpersonal communication and reduced quality of life (Feldman et al., 1991; Shimokawa et al., 2001; Blair, 2005). Such difficulties have been observed not only in disorders characterized by prominent social-behavioral deficits (i.e., autism spectrum disorders, schizophrenia, neurodegenerative dementia; e.g., Chaby et al., 2012; and see for review Kennedy and Adolphs, 2012; Kumfor et al., 2014) but also in normal aging, which is frequently associated with social withdrawal and loneliness (e.g., Szanto et al., 2012; Steptoe et al., 2013).

Although older adults report high levels of satisfaction and better emotional stability with advancing age (Reed and Carstensen, 2012; Sims et al., 2015), they have difficulties processing some types of emotional information, which is often marked by a decline in emotion recognition (Ruffman et al., 2008; Isaacowitz and Blanchard-Fields, 2012). Most past studies have identified age-related difficulties in the visual channel, particularly when participants were asked to recognize emotion from posed facial expressions (see for review, Chaby and Narme, 2009; Isaacowitz and Stanley, 2011). These posed expressions were created to convey a single specific emotion, typically with exaggerated individual features, without any distracting or irrelevant features. However, emotions are not usually expressed solely by the face during daily social interactions; typically, voice (including non-verbal vocalizations) is also an important social signal, which needs to be processed quickly and accurately to allow successful interpersonal interactions. The rare studies that have explored how the ability to recognize vocal emotion changes with age have been conducted on speech prosody using words or sentences spoken with various emotional expressions. These studies concluded that advancing age is associated with increasing difficulties in recognizing emotion from prosodic cues (Kiss and Ennis, 2001; Paulmann et al., 2008; Mitchell et al., 2011; Lambrecht et al., 2012; Templier et al., 2015). However, vocal emotions could also be experienced via non-verbal affect bursts (e.g., *screams or laughter*; see Scherer, 1994) that typically accompany intense emotional feelings and that might be considered as the vocal counterpart of facial expressions. The processing of non-verbal vocal affects in aging individuals has rarely been studied (see Hunter et al., 2010; Lima et al., 2014); thus, this issue needs to be further investigated.

Altogether, the above studies showed evidence of an age-related decline in the recognition of some basic emotions via unimodal visual or auditory channels. These changes might start early, at approximately 40 years, for both facial (Williams et al., 2009) and prosodic emotions (Paulmann et al., 2008; Mill et al., 2009; Lima and Castro, 2011), and decline may occur linearly with advancing age (see Isaacowitz et al., 2007). In particular, compared to young adults, older adults could experience difficulties recognizing fear, anger and sadness from faces but experience no deficits recognizing happy or neutral faces (see for review, Isaacowitz et al., 2007; Ruffman et al., 2008). The recognition of disgust also seems highly preserved in older adults (e.g., Calder et al., 2003). Data from voices are less coherent, as difficulties have been found in older adults only for anger and sadness (Ruffman et al., 2008) or for almost all emotions (e.g., Paulmann et al., 2008).

Different mechanisms have been proposed to explain these age-related changes in emotion recognition. One preeminent explanation concerns structural and functional brain changes associated with age. Multiple interconnected brain regions are implicated in visual and auditory emotional processing. These regions include the frontal lobes, particularly the orbitofrontal cortex (Hornak et al., 2003; Wildgruber et al., 2005; Tsuchida and Fellows, 2012) and the temporal lobes, particularly the superior temporal gyrus (Beer et al., 2006; Ethofer et al., 2006). The amygdala is also involved in this processing (Iidaka et al., 2002; Fecteau et al., 2007). Prefrontal cortex atrophy (in particular atrophy of the orbitofrontal region; Resnick et al., 2003, 2007; Lamar and Resnick, 2004) is a known marker of normal aging and could explain the difficulties identifying some facial emotions, in particular anger. Moreover, although the amygdala does not decline as rapidly as the frontal regions, some studies have reported a linear reduction of its volume with age (Mu et al., 1999; Allen et al., 2005). When comparing elderly people with young adults, neuroimaging studies observed a less significant activation of this structure among the elderly during the processing of emotional faces, especially negative ones (Mather et al., 2004). This was coupled with increased activity in the prefrontal cortex (Gunning-Dixon et al., 2003; Urry et al., 2006; Ebner et al., 2012). Conversely, other studies found a decrease in functional connectivity between the amygdala and posterior structures, which may reflect a decline in the perceptual process (Jacques et al., 2009). Overall, these patterns of brain activity observed in neuroimaging studies during a variety of emotional tasks (including recognition) are consistent with the Posterior–Anterior Shift in Aging (PASA; for review, see Dennis and Cabeza, 2008), which reflects the effect of aging on brain activity.

Another explanation for older adults' lower performance on negative emotion recognition emerges within the framework of the socio-emotional selectivity theory (Carstensen, 1992). With advancing age, adults appear to concentrate on a few emotionally rewarding relationships with their closest partners, report greater emotional control, and reduce their cognitive focus on negative information. Based on these observations, it was suggested that "paradoxically," the recognition of negative emotion declines (Carstensen et al., 2003; Charles and Carstensen, 2010; Mather, 2012; Huxhold et al., 2013).

Losses in cognitive and sensory functions are also possible explanations for age-related changes in emotion recognition. Increasing age is often associated with a decline in cognitive abilities (e.g., Verhaeghen and Salthouse, 1997; for review, see Salthouse, 2009), as well as with losses in visual and auditory acuity (Caban et al., 2005; Humes et al., 2009), which could hamper higher-level processes such as language and perception (Sullivan and Ruffman, 2004). However, these sensory attributes are shown to be poor predictors of the age-related decline in visual or auditory emotional recognition (e.g., Orbelo et al., 2005; Mitchell, 2007; Ryan et al., 2010; Lima et al., 2014).

Previous research on age-related differences in the recognition of basic emotions has focused predominantly on a single modality, and thus little is known about age-related differences in crossmodal emotion recognition. However, in daily life, people perceive emotions through multiple modalities, such as speech, voices, faces and postures (e.g., Young and Bruce, 2011; Belin et al., 2013). This indicates that our brain merges information from different senses to enhance perception and guide our behavior (Ernst and Bülthoff, 2004; Ethofer et al., 2013). Evidence supporting this idea includes studies of brain-damaged patients, such as those with traumatic or vascular brain injuries or brain tumors. These studies found similar impairments in processing emotions from faces and voices in a single modality, but found that brain-damaged patients performed better when both facial and vocal stimuli were available (e.g., Hornak et al., 1996; Borod et al., 1998; Calder et al., 2001; Kucharska-Pietura et al., 2003; du Boullay et al., 2013; Luherne-du Boullay et al., 2014).

Some studies in young adults have demonstrated that congruent emotional information processed via multisensory channels optimizes behavioral responses, which results in enhanced accuracy and faster response times (RT; De Gelder and Vroomen, 2000; Kreifelts et al., 2007; Klasen et al., 2011). In older adults, audiovisual performances have been shown to be equivalent or even improved relative to younger adults (Laurienti et al., 2006; Peiffer et al., 2007; Diederich et al., 2008; Hugenschmidt et al., 2009; DeLoss et al., 2013), with rarer exceptions showing reduced multisensory integration in older adults (Walden et al., 1993; Sommers et al., 2005; Stephen et al., 2010). Some of these studies have explored the effects of age on crossmodal emotional processing and found evidence for preserved multisensory processing in older adults when congruent auditory and visual emotional information was presented simultaneously (Hunter et al., 2010; Lambrecht et al., 2012).

Multisensory integration refers to the process by which unisensory inputs are combined to form a new integrated product (Stein et al., 2010). This process has been studied in humans using neuroimaging techniques, which show that different regions of the human brain are implicated in the integration of multimodal cues, including "convergence" areas such as the superior temporal sulcus (STS; Laurienti et al., 2005; James and Stevenson, 2011; Watson et al., 2014; see for review, Stein et al., 2014). Neuroimaging techniques such as functional magnetic resonance imaging (fMRI) generally show greater activity in response to bimodal stimulation. More precisely, in a series of fMRI experiments conducted by Kreifelts and collaborators (e.g., Kreifelts et al., 2007; see for review, Brück et al., 2011) the posterior superior temporal cortex (p-STC) emerges as a crucial structure for the integration of facial and vocal cues. In event-related potential (ERP) studies (e.g., Giard and Peronnet, 1999; Foxe et al., 2000; Molholm et al., 2002), multisensory enhancement is measured by comparing the ERP from the multisensory condition to the sum of the ERPs from each unimodal condition. Multisensory enhancement is also commonly measured in behavioral studies by calculating a redundancy gain between the crossmodal stimulus and the more informative unimodal stimulus. Another interesting method used in studies measuring RT is to test whether the redundant target effect (shorter RT under the crossmodal condition) reflects an actual multisensory integrative process by comparing the observed RT distribution with the distribution predicted by the "race model" (proposed by Miller, 1982; see also Colonius and Diederich, 2006). The "race model" assumes that a crossmodal stimulus presentation produces parallel activation (i.e., in a separate way) of the unimodal stimuli. According to this model, the shortening of RT for crossmodal relative to unimodal stimuli derives from the fact that either unimodal stimulus can produce a response. Thus, any violation of the race model (i.e., if the observed RTs in crossmodal trials are shorter than those predicted by the race model) indicates that the stimuli are not processed in separate channels, which suggests an underlying integrative mechanism (see Laurienti et al., 2006; Girard et al., 2013; Charbonneau et al., 2013).
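The race model test described above is usually expressed as an inequality on cumulative distribution functions (CDFs): for every time *t*, P(RT ≤ *t* | crossmodal) should not exceed P(RT ≤ *t* | face) + P(RT ≤ *t* | voice). The following minimal sketch, written with NumPy, evaluates that bound on empirical CDFs and also computes a redundancy gain as the crossmodal score minus the better unimodal score. The function names, the toy RTs and the exact form of the gain are assumptions made for illustration; the authors' actual procedure (for example, per-participant quantiles or statistical testing) is not specified in this section.

```python
import numpy as np


def ecdf(rts, t_grid):
    """Empirical CDF: proportion of RTs less than or equal to each t in t_grid."""
    sorted_rts = np.sort(np.asarray(rts, dtype=float))
    return np.searchsorted(sorted_rts, t_grid, side="right") / sorted_rts.size


def race_model_bound(rt_face, rt_voice, t_grid):
    """Upper bound on the crossmodal CDF under the race model (Miller, 1982)."""
    return np.minimum(1.0, ecdf(rt_face, t_grid) + ecdf(rt_voice, t_grid))


def race_model_violation(rt_cross, rt_face, rt_voice, t_grid):
    """Observed crossmodal CDF minus the bound; values > 0 violate the model."""
    return ecdf(rt_cross, t_grid) - race_model_bound(rt_face, rt_voice, t_grid)


def redundancy_gain(score_cross, score_face, score_voice):
    """Crossmodal score minus the better of the two unimodal scores."""
    return score_cross - max(score_face, score_voice)


# Toy RTs in ms (made up for illustration, not data from the study).
rt_face = [620, 700, 760, 810, 900]
rt_voice = [650, 720, 790, 850, 940]
rt_cross = [540, 580, 640, 700, 780]
t_grid = np.arange(500, 1001, 50)

violation = race_model_violation(rt_cross, rt_face, rt_voice, t_grid)
for t, v in zip(t_grid, violation):
    print(t, round(float(v), 2), "violation" if v > 0 else "")
```

Positive values at the fast end of the distribution are the signature of integration: crossmodal responses occur earlier than two independent unimodal processes racing each other could produce.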

To date, the processing mechanisms responsible for multisensory enhancement in older compared to young adults remain unclear, and crossmodal emotional integration in aging evaluated by the race model has not been investigated. To characterize the age-related effect on emotional processing, we used emotional human stimuli (i.e., happiness, anger, fear, sadness and disgust) and a neutral expression in the form of unimodal (facial or vocal) or crossmodal (simultaneous congruent facial and vocal expressions) cues. Isolated facial expression was studied using pictures of posed facial expressions, and isolated vocal expression was studied using non-verbal affect stimuli. Our primary focus concerned crossmodal emotional processing in aging, and we aimed to explore whether older adults benefit from congruent crossmodal integration and to better understand the nature of this benefit. In line with recent studies of multisensory integration mechanisms during aging (e.g., Lambrecht et al., 2012; Freiherr et al., 2013; Mishra and Gazzaley, 2013), we hypothesized that older adults benefit from congruent crossmodal presentation when identifying emotions. To assess this hypothesis, we calculated redundancy gains for scores and used the race model for RTs to determine the nature of multisensory integration achieved by combining redundant visuo-auditory information.

### **Materials and Methods**

### **Participants**

The study participants consisted of 31 younger (20–35; *M* = 25.8, SD = 6.4; 16 females) and 31 older adults (60–76; *M* = 67.2, SD = 5.8; 17 females); see **Table 1**. The participants spoke French and reported having normal or corrected-to-normal vision and good hearing abilities at the time of testing. All participants were living independently in the community and were in good general physical health. None of the participants had any history of psychiatric or neurological disorders, which might compromise cognitive function. They also had a normal score on the Beck Depression Inventory (Beck et al., 1996; BDI II, 21 item version; a score of less than 17 was considered to be in the minimal range). All elderly adults completed the Mini Mental State Examination (Folstein et al., 1975; MMSE), on which they scored above the cutoff score (26/30) for risk of dementia. Grade level was calculated with the Mill Hill Vocabulary Scale (French adaptation: Deltour, 1993), and this did not differ between groups (*p* = 0.55).

**TABLE 1 | Participant demographic characteristics.**


The study was approved by the ethics committee of Paris Descartes University (Conseil d'Evaluation Ethique pour les Recherches en Santé, CERES, n IRB 2015100001072) and all participants gave informed consent.

### **Materials**

Examples of stimuli and the task design for each condition are illustrated in **Figure 1**.

*Visual stimuli*. Visual stimuli consisted of pictures of human facial expressions obtained from the Karolinska Directed Emotional Faces database (Lundqvist et al., 1998). This database was chosen because it provided good examples of universal emotion categories with a high accuracy of labeling. The faces of 10 models (5 females, 5 males) expressing facial expressions of happiness, sadness, anger, fear, disgust or neutral constituted a set of 60 stimuli. All stimuli (presented on a black background) were 10 cm in height and subtended a vertical visual angle of 8° at a viewing distance of 70 cm.
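The reported visual angle can be checked from the stimulus height and viewing distance. The following is a minimal sketch using the standard visual angle formula; the function name `visual_angle_deg` is hypothetical and is not part of the original protocol.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size at a given distance."""
    # Standard formula: angle = 2 * atan(size / (2 * distance))
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Stimulus height and viewing distance as reported in the text.
angle = visual_angle_deg(10, 70)
print(round(angle, 2))
```

With a height of 10 cm and a viewing distance of 70 cm, the result is slightly above 8°, which agrees with the rounded value reported above.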

*Auditory stimuli*. Auditory stimuli (**Figure 1**) consisted of nonverbal affective vocalizations (cry, laugh, etc.) obtained from The Montreal Affective Voices database (Belin et al., 2008). This database was chosen because it provided a standardized set of emotional vocalizations corresponding to the universal emotion categories without the potential confounds from linguistic content. The voices of 10 actors (5 females, 5 males) expressing happiness, sadness, anger, fear, disgust or a neutral vocalization constituted a set of 60 stimuli.

*Crossmodal stimuli*. Each emotional face was combined with an affective vocalization to construct 60 congruent expressions of faces and voices. The gender of the face and the voice were always congruent.

### **Procedure**

Participants were tested individually in a single session that lasted approximately 45 min. The protocol was run using E-prime presentation software (Psychology Software Tools). Prior to the experiment, short facial-matching and vocal-matching tasks were administered to control for basic visual and auditory abilities in processing faces and voices. The subjects were asked to match the identity of non-emotional faces (i.e., six pairs of neutral faces obtained from the Karolinska Directed Emotional Faces Database) and non-emotional voices (i.e., six pairs of neutral voices obtained from the Montreal Affective Voices database). The stimuli were different from those used in the main task.

Then, after a short familiarization period, the experiment began. The experiment consisted of three blocks (visual, auditory, crossmodal) of 60 trials. Each trial started with the presentation of a fixation cross for 300 ms and was followed by the target stimulus, which was presented or repeated until the subject responded. Participants were asked to select (by clicking with the computer mouse) one label from a list of choices that best described the emotion presented. The six labels were displayed at the bottom of the computer screen and were visible throughout the test. There was an inter-trial interval of 700 ms. The order of the three blocks was counterbalanced across participants, and the order of trials was pseudo-randomized across each block. During the session, resting pauses were provided after every 10 trials, and the participants could take breaks if necessary between blocks. No feedback was given to the participants.

### **Statistical Analysis**

Participants' accuracy (scores of correct responses) and corresponding RTs (in milliseconds, ms) were computed for each condition. To control for outliers, trials with an RT below 200 ms or more than two standard deviations above the mean of each condition (0.90% of the trials in young adults; 1.25% of the trials in older adults) were excluded.
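The exclusion rule can be illustrated with a short sketch. It assumes that the mean and standard deviation are computed over all trials of one condition before any trial is removed; the text does not specify this detail. The function name `trim_rts` and the example values are hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts, floor_ms=200.0, n_sd=2.0):
    """Keep RTs between floor_ms and (mean + n_sd * SD) for one condition."""
    # Assumption: mean and SD are taken over the untrimmed trials of the condition.
    upper = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if floor_ms <= rt <= upper]

# Hypothetical RTs (ms): one anticipatory response and one very slow response.
raw = [150] + [1000] * 10 + [5000]
kept = trim_rts(raw)
print(len(raw) - len(kept), "trials excluded")
```

In this example both the 150 ms and the 5000 ms trials are excluded, and the ten 1000 ms trials are kept.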

First, the data were entered into an overall analysis of variance (ANOVA), with age (young adults, older adults) as a between-subjects factor and with modality (visual, auditory, crossmodal) and emotion (neutral, happiness, fear, anger, sadness, disgust) as within-subjects factors. Effect sizes are reported as partial eta-squared (η<sup>2</sup><sub>p</sub>). ANOVAs were adjusted with the Greenhouse-Geisser non-sphericity correction for effects with more than one degree of freedom. To provide clarity, uncorrected degrees of freedom, the Greenhouse-Geisser epsilon (ε) and adjusted *p* values are reported. Planned comparisons or *post hoc* Bonferroni tests were conducted to further explore the interactions between age, modality and emotion. The alpha level was set to 0.05 (*p* values were corrected for multiple comparisons).

Second, to examine whether both groups showed redundancy gains, as reflected by the difference in the scores when the visual and auditory stimuli were presented together (crossmodal condition) compared to each modality alone (unimodal condition), we calculated a "redundancy gain" for each participant separately by subtracting the higher of the scores under the unimodal conditions from the score under the crossmodal condition [(*crossmodal score* − *best modality score*) × 100] (see Calvert et al., 2004; Girard et al., 2013). The significance of the difference in redundancy gain (in percent) between younger and older participants was tested using an independent samples *t*-test.
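As an illustration of the redundancy gain formula, the sketch below computes the gain for a single participant. It assumes that scores are expressed as proportions of correct responses between 0 and 1; the function name `redundancy_gain` and the example scores are hypothetical.

```python
def redundancy_gain(crossmodal, visual, auditory):
    """Gain (in percent) of the crossmodal score over the better unimodal score.

    Assumption: all scores are proportions of correct responses in [0, 1].
    """
    best_unimodal = max(visual, auditory)
    return (crossmodal - best_unimodal) * 100

# Hypothetical participant: 95% crossmodal, 88% visual, 90% auditory.
gain = redundancy_gain(0.95, 0.88, 0.90)
print(round(gain, 2))
```

A participant who performs equally in the crossmodal condition and in the better unimodal condition obtains a gain of zero. The per-participant gains of the two groups could then be compared with an independent samples *t*-test, for example with `scipy.stats.ttest_ind`.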

Finally, to further test the advantage of crossmodal over unimodal processing, we investigated whether the RTs obtained under the crossmodal condition exceeded the statistical facilitation predicted by the race model (Miller, 1982). In multisensory research, the race model inequality has become a standard tool to identify crossmodal integration using RT data (Townsend and Honey, 2007). To analyze the race model inequality, we used RMItest software (http://psy.otago.ac.nz/miller), which implements the algorithm described in Ulrich et al. (2007). The procedure requires four steps. First, participants' RTs in each condition (i.e., visual, auditory and crossmodal) are converted to cumulative distribution functions (CDFs). Second, the race model distribution is calculated by summing the CDFs of observed responses to the two unimodal conditions (visual and auditory) to create a "predicted" multisensory distribution. Third, percentile points (i.e., in the present study: 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th, and 95th) are determined for every distribution of RT. Finally, in each group, the mean RT for the crossmodal condition and the "predicted" condition are compared for each percentile using a *t*-test. If RTs in the crossmodal condition are significantly shorter than those in the predicted condition, we conclude that the race model cannot account for the facilitation of the redundant signal conditions, supporting the existence of an integrative process.
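The first three steps can be sketched for a single participant as follows. This is a simplified illustration, not the RMItest implementation: it uses step-function empirical CDFs and a basic quantile rule, whereas Ulrich et al. (2007) describe a specific percentile estimation procedure. All function names and RT values are hypothetical.

```python
from bisect import bisect_right

def ecdf(sorted_rts, t):
    """Empirical CDF: proportion of RTs less than or equal to t."""
    return bisect_right(sorted_rts, t) / len(sorted_rts)

def step_quantiles(rts, probs):
    """Smallest observed RT at which the empirical CDF reaches each probability."""
    s = sorted(rts)
    return [next(t for t in s if ecdf(s, t) >= p) for p in probs]

def race_bound_quantiles(visual, auditory, probs):
    """Quantiles of the race model bound min(1, F_V(t) + F_A(t))."""
    v, a = sorted(visual), sorted(auditory)
    grid = sorted(set(v + a))  # the bound only changes at observed RTs
    return [
        next(t for t in grid if min(1.0, ecdf(v, t) + ecdf(a, t)) >= p)
        for p in probs
    ]

# Hypothetical RTs (ms) for one participant; not data from the study.
visual = [400, 500, 600, 700]
auditory = [450, 550, 650, 750]
crossmodal = [350, 380, 420, 600]
probs = [0.25, 0.5]  # the study used the 5th to 95th percentiles in steps of 10

bound_q = race_bound_quantiles(visual, auditory, probs)
cross_q = step_quantiles(crossmodal, probs)
# A crossmodal quantile shorter than the bound is a candidate violation.
faster_than_bound = [c < b for c, b in zip(cross_q, bound_q)]
print(bound_q, cross_q, faster_than_bound)
```

In the fourth step, the per-participant crossmodal and bound quantiles would be compared across participants at each percentile with a *t*-test; a violation is inferred only where crossmodal RTs are reliably shorter than the bound.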

### **Results**

### **Age-related Difference in Emotion Recognition**

Mean performance and RTs for all conditions are presented in **Table 2**<sup>1</sup>. For both younger and older groups, the mean performance accuracy was greater than 80% for the visual, auditory and crossmodal conditions. However, we found significant main effects of age, indicating that older adults performed less accurately and more slowly than younger adults (85.23 *±* 1.24% vs. 92.58 *±* 0.51%, *F*(1,60) = 30.17, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.33 for scores; 3619 *±* 145 ms vs. 1991 *±* 68 ms, *F*(1,60) = 103.4, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.63 for RTs). Importantly, we found a significant effect of modality on the scores [*F*(2,120) = 137.54, *p <* 0.001, ε = 0.92, η<sup>2</sup><sub>p</sub> = 0.7] and the RTs [*F*(2,120) = 62.48, *p <* 0.001, ε = 0.88, η<sup>2</sup><sub>p</sub> = 0.51], indicating that participants responded more effectively under the crossmodal condition than under either unimodal condition (all *p <* 0.001). There was a significant effect of emotion on the scores [*F*(5,300) = 92.11, *p <* 0.001, ε = 0.62, η<sup>2</sup><sub>p</sub> = 0.60] and the RTs [*F*(5,300) = 36.91, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.38]. Furthermore, main effects were accompanied by several two-way interactions: between group and modality (see **Figure 2**) on the scores [*F*(2,120) = 6.98, *p* = 0.002, ε = 0.92, η<sup>2</sup><sub>p</sub> = 0.10] and the RTs [*F*(2,120) = 7.66, *p* = 0.001, ε = 0.88, η<sup>2</sup><sub>p</sub> = 0.11]; between group and emotion on the scores [*F*(5,300) = 8.13, ε = 0.61, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.12] and the RTs [*F*(5,300) = 8.62, *p <* 0.001, η<sup>2</sup><sub>p</sub> = 0.12], and between modality and emotion on the scores [*F*(10,600) = 27.01, *p <* 0.001, ε = 0.55, η<sup>2</sup><sub>p</sub> = 0.31] and the RTs [*F*(10,600) = 10.52, *p <* 0.001, ε = 0.56, η<sup>2</sup><sub>p</sub> = 0.15].
Importantly, there was a significant three-way interaction between group, modality and emotion on the scores [*F*(10,600) = 3.23, *p* = 0.005, ε = 0.55, η<sup>2</sup><sub>p</sub> = 0.05] and the RTs [*F*(10,600) = 2.7, *p* = 0.016, η<sup>2</sup><sub>p</sub> = 0.04]. This reveals the following (see **Table 2**): (a) in the visual and auditory modalities, older adults have lower scores than younger adults for the negative emotions only<sup>2</sup> (i.e., sadness, anger and disgust in the visual modality, *p <* 0.01; anger and fear in the auditory modality, *p <* 0.01), and (b) in the crossmodal condition, older adults perform more poorly than younger adults for anger only (*p <* 0.001). Concerning RTs, in the unimodal and crossmodal conditions, older adults were slower to identify all emotions (all *p <* 0.001) except for happiness (*p >* 0.1).

### **Integration of Crossmodal Emotional Information in Aging**

To explore the ultimate crossmodal gain in the scores, we calculated a "redundancy gain" (i.e., the difference between the crossmodal condition and the unimodal condition with the higher score) for each participant in the two groups (see Materials and Methods section).

For the scores, our analysis indicated that the redundancy gain was greater for the older (8.82%) than for the younger adults (5.86%, *p* = 0.007). In the older group, all but two subjects showed a redundancy gain (29/31; one performed equally between the auditory modality and the crossmodal condition, and the other performed slightly better under the visual condition compared to the crossmodal condition). Moreover, there was a significant difference between the unimodal and crossmodal conditions for all emotions (all *p <* 0.003). In the younger group, all subjects except for one (30/31; who performed equally between the auditory condition and the crossmodal condition) showed a redundancy gain. Our analysis showed a significant difference between the unimodal and crossmodal conditions for negative emotions only (fear, sadness, anger and disgust) (all *p <* 0.007); for the neutral emotion and for happiness, performance ceilings may explain the lack of significant effects (all *p >* 0.1).

For RTs, we used the race model to explore crossmodal integration and to determine whether the observed crossmodal behavioral enhancement (i.e., shorter RTs) was beyond that predicted by statistical summation of the unimodal visual and auditory conditions (**Figure 3**). In the younger group, we observed a violation of the race model prediction for the 5th, 15th, 25th, and 35th percentiles of the RT distribution (all *p <* 0.01), but not for the slowest percentiles (all *p >* 0.1). These results support the existence of a crossmodal integrative process. The temporal window in which this benefit was significant was from 1019 to 1410 ms. As in the younger group, the RTs of the older group were shorter than those predicted by the race model for the 5th, 15th, 25th, and 35th percentiles of the RT distribution (all *p <* 0.01). The temporal window in which this benefit was significant was from 1647 to 2300 ms. Although the maximal enhancement occurred at different absolute RTs in the two populations, this peak enhancement occurred at the same percentile of the cumulative distribution curve.

<sup>1</sup>To control for potential gender differences, this variable was initially entered as a between-subject factor in the analyses. However, gender failed to yield any significant main effects (*F <* 1) or interactions (*p >* 0.1) so we collapsed across gender in the reported analysis.

<sup>2</sup>Note that a ceiling effect was observed for happiness in both groups and for neutral in the younger group.

### **Discussion**

While a large body of evidence shows that older adults are less accurate than younger adults in recognizing specific emotions from emotional faces, fewer studies have examined vocal emotion recognition, and hardly any studies have investigated the recognition of emotion from emotional faces and voices presented simultaneously (Hunter et al., 2010; Lambrecht et al., 2012). The purpose of this study was to compare unimodal facial and vocal emotion processing in older and younger adults and, in addition, to test whether older adults benefit from the combination of congruent emotional information from different channels, which reveals crossmodal integration. Our results first confirm that older adults experience difficulties in emotion recognition. They were less accurate and slower overall than younger adults in processing emotion from facial or non-verbal vocal expressions presented alone. Second, the participants recognized facial and vocal cues similarly well, and both groups benefitted from the crossmodal condition. Third, age-related differences were modulated by emotion, as older adults were particularly affected in terms of accuracy with regard to processing negative emotions under both the facial and vocal conditions. Finally, our results provide compelling evidence for the multisensory nature of emotional processing in aging. The important finding of this study was that older adults benefit to the same extent as younger adults from the combination of information presented in the visual and auditory modalities. This suggests that crossmodal processing represents a mechanism compensating for deficits in the visual or auditory channels that often affect older adults.

### **Effects of Age on Emotion Recognition Based on Unimodal Stimuli**

Our findings indicated that emotion recognition based on unimodal stimuli changes with age. In the visual modality, our results support previous findings showing age-related difficulties in the ability to recognize emotion from facial cues (for a meta-analysis, see Ruffman et al., 2008). However, most of these studies used the collection of posed black-and-white photographs of human faces from the 1970s Ekman dataset (e.g., Orgeta and Phillips, 2008; Hunter et al., 2010; Slessor et al., 2010) that has been criticized for its lack of ecological validity, which leads to questions about the generalizability of the results (Murphy and Isaacowitz, 2010). The present study used emotional expressions consisting of static color photographs of faces (see also, Ebner et al., 2010; Eisenbarth and Alpers, 2011), and this study confirmed the robustness of age-related difficulties. The fact that the same results were found using dynamic facial expressions (Lambrecht et al., 2012) confirms that widespread difficulties in recognizing emotion from facial cues are encountered by older adults. In the auditory modality, the ability to recognize emotion from non-verbal vocal cues also becomes less efficient with age. This result is in accordance with that of Hunter et al. (2010), who used non-verbal affective vocalizations. It is also in line with some recent studies using spoken words in a neutral context that showed impairments in decoding emotional speech with advancing age (Paulmann et al., 2008; Mitchell et al., 2011; Lambrecht et al., 2012). As normal variations of prosodic emotion ability could be associated with depression, relationship satisfaction or well-being in younger populations (Noller and Feeney, 1994; Emerson et al., 1999; Carton et al., 1999), the question remains whether and how the age-related decline in emotional vocal processing influences social interactions.
However, it is important to note that in our study, the performance of the older group reached 80%, suggesting a relatively mild deficit. This suggests that non-verbal vocalizations, which are devoid of linguistic information, are nevertheless effective at communicating diverse emotions in aging.

### **Age-related Difference in the Responses to Different Sensory Modalities and Specific Emotions**

However, the main effect of age was tempered by a set of interactions, suggesting that age-related differences varied across modalities and across specific emotions. Specifically, in response to the visual and auditory stimuli, we found an age-related reduction in accuracy for negative expressions (i.e., fear, sadness, anger and disgust) and comparable performance for neutral and happy expressions. For the visual modality, this result is in accordance with individual studies using images of static faces showing different emotional expressions, which showed that certain discrete emotions, notably negative ones, are more sensitive to age-related variation (for a review, see Ruffman et al., 2008). Studies regarding the auditory channel are more inconsistent because they are based on diverse paradigms. The results differ inasmuch as the studies did not isolate specific emotions (e.g., Orbelo et al., 2005; Mitchell, 2007; Mitchell et al., 2011), they investigated negative emotions only (e.g., Hunter et al., 2010), they explored a few contrasting emotions (e.g., Lambrecht et al., 2012) or they included several positive and negative emotions (e.g., Wong et al., 2005; Lima et al., 2014). Wong et al. (2005) found that older adults poorly recognized only sadness and happiness in speech; in contrast, using non-verbal vocalizations, Hunter et al. (2010) found that older adults poorly identified negative emotions (fear, anger, sadness, disgust), whereas Lima et al. (2014) found that older adults performed poorly for all emotions (positive and negative ones). Note, however, that for scores, interpretations about age-related differences in responses to specific emotions are limited because of the presence of ceiling effects for happy and neutral expressions. Interestingly, for RTs the effects seem to be more general since older adults were especially slow to respond to all emotions.

These divergent results across aging studies may be due to the individual variability of the samples and the use of different types of emotional stimuli with varying presentation times, which might influence the identification of the given emotion. In the present study, the stimuli were presented or repeated until the subject provided a response. The observed slower RT for all negative emotions contrasts with the findings of recent studies (Pell and Kotz, 2011; Rigoulot et al., 2013) using verbal emotional stimuli, which showed that listeners are generally faster at identifying fear, anger, and sadness and slower at identifying happiness and disgust. This suggests that non-verbal affective vocalizations are processed at different rates. Interestingly, this time window is consistent with work by Pell (2005): when happy, sad, or neutral pseudo-utterances spoken in English were cut from the onset of the sentence to last 300, 600, or 1000 ms in duration, emotional priming of a congruent static face was only observed when vocal cues were presented for 600 or 1000 ms, but not for only 300 ms. Hence, vocal information lasting at least 600 ms is presumably necessary to activate the shared emotion knowledge responsible for multimodal integration. More importantly, our data show that the participants did not find it easier to identify emotion from isolated facial or non-verbal vocal cues. By contrast, Hunter et al. (2010), who used facial and vocal non-verbal emotions, found that emotion recognition was easier in response to facial cues than vocal cues. However, our experiment used not only negative but also happy and neutral expressions, which potentially improved the performance of older adults in both the visual and auditory modalities.

Overall, these results are consistent with the fact that age-related emotional difficulties do not reflect general cognitive aging (Orbelo et al., 2005) but rather a complex change affecting discrete emotions; notably, the same authors also suggest that the age-related decline in emotional processing is not explained by sex effects or age-related visual or hearing loss. Nevertheless, assessing hearing and seeing abilities objectively could have informed the pattern of our findings, and we consider the lack of these covariates a limitation of the study. For example, recent findings (Ruggles et al., 2011; Bharadwaj et al., 2015) suggest that despite normal or near-normal hearing thresholds, a significant portion of listeners exhibit deficits in everyday communication (i.e., in complex environments such as noisy restaurants or busy streets).

These results could also be interpreted in terms of the socioemotional selectivity theory, which states that aging increases emotional control, diminishes the impact of negative emotions and facilitates concentration on more positive social interactions (e.g., Charles and Carstensen, 2010; Huxhold et al., 2013). However, Frank and Stennett (2001) have noted that using only a few basic emotion categories allows participants to choose their response based on discrimination and exclusion rules, which is less likely to be the case in a real-life setting. In particular, if happiness is the only positive emotion, participants can make the correct choice as soon as they recognize a smile. Therefore, a ceiling effect can be an alternative explanation to the socioemotional selectivity theory. One way to examine possible valence-specific effects is to use a similar number of positive and negative emotions (see Lima et al., 2014).

### **Integration of Crossmodal Emotional Information in Aging Individuals**

The principal goal of the current study was to explore whether older adults benefit from congruent crossmodal integration and to better understand the nature of this benefit. In daily life, the combination of information from facial and vocal expressions usually results in a more robust representation of the expressed emotion (e.g., De Gelder and Vroomen, 2000; Dolan et al., 2001; De Gelder and Bertelson, 2003), which thus results in a more unified perception of the person (Young and Bruce, 2011).

In our study, emotional faces and voices come from different sensory modalities to build a unified and coherent representation of the same percept (i.e., an emotion) as defined by crossmodal integration mechanisms (Driver and Spence, 2000). We showed that whereas older adults exhibited slower RTs under the crossmodal condition, resulting in a different temporal window of multisensory enhancement, a multisensory benefit occurred to the same extent in the two groups. However, early studies of multisensory integration in aging individuals showed that compared to younger adults, older adults did not benefit from multisensory cues (Stine et al., 1990; Walden et al., 1993; Sommers et al., 2005) and experienced a suppressed cortical multisensory integration response that was associated with poor cortical integration (Stephen et al., 2010). By contrast, more recent studies point toward an enhancement of multisensory integration effects in older adults, notably reporting shorter RT in response to multisensory events (e.g., Mahoney et al., 2011, 2012; DeLoss et al., 2013).

Consistent with the latter works, the present study indicates that in younger and older adults, emotional information derived from facial and vocal cues is not reducible to the simple sum of the unimodal inputs and suggests that multisensory integration is maintained with increasing age and could play a compensatory role in normal aging. This is in accordance with a magnetoencephalography study (Diaconescu et al., 2013), which indicated that sensory-specific regions showed increased activity after visual-auditory stimulation in young and old participants but that inferior parietal and medial prefrontal areas were preferentially activated in older subjects. Activation of the latter areas was related to faster detection of multisensory stimuli. The authors proposed that the posterior parietal and medial prefrontal activity sustains the integrated response in older adults. This hypothesis is supported by the theory of the posterior-anterior shift in aging (PASA) and that of cortical dedifferentiation, stating that healthy aging is accompanied by decreased specificity of neurons in the prefrontal cortex (Park and Reuter-Lorenz, 2009; Freiherr et al., 2013). This could explain why the crossmodal RTs of older adults were longer than those of younger adults for each emotion.

Furthermore, a recent study using ERPs by Mishra and Gazzaley (2013) among healthy older adults (60–90 years old) suggested the existence of compensatory mechanisms capable of sustaining efficient crossmodal processing. The authors showed evidence that distributed audio-visual attention results in improved discrimination performance (faster RTs without any differences in accuracy in congruent stimuli settings) compared to focused visual attention. They noted that the benefits of distributed audio-visual attention in older adults matched those of younger adults. Interestingly, ERPs recorded during the task further revealed intact crossmodal integration in higher performing older adults, who had results similar to those of younger adults. As suggested by Barulli et al. (2013), attention, executive function and verbal IQ may play a role in the generation of a "cognitive reserve" that reduces the deleterious effects of aging and, thus, buffers against a diminished adaptive strategy (Hodzik and Lemaire, 2011). These results show the necessity of taking into account individual cognitive differences in aging. It is clear that significant cognitive decline is not an inevitable consequence of advancing age and that each cognitive domain is differentially affected. As aging can have diverse effects on cognitive functions, it is therefore important to emphasize the maintained functions rather than taking a customary approach that only underlines the loss of capacities among the elderly.

It should be noted, however, as a possible limitation of the current study, that our stimuli are quite unnaturalistic since they combine non-dynamic (photographs) and dynamic (sound) stimuli. Although our participants did not report any incongruent perception of crossmodal stimuli, it could be relevant to use truly multimodal emotional expressions (video and audio obtained from the same person) that are not posed but enacted using the Stanislavski technique (see the Geneva Multimodal Emotion Portrayals, GEMEP; Bänziger et al., 2012).

### **Conclusion**

In conclusion, our results suggest that despite a decline in facial and vocal emotional processing with advancing age, older adults integrate facial and vocal cues to yield a unified perception of the person. Given the changes in facial and vocal modality exhibited by older adults, it may be helpful for family members and caregivers to use multiple sensory modalities to communicate important affective information. Thus, supplementing facial cues with vocal information may facilitate communication, preventing older individuals from withdrawing from the community and reducing the development of affective disturbances such as depression. Future research is required to further examine whether crossmodal integration can benefit older adults who exhibit cognitive impairments (e.g., Mild Cognitive Impairments, Alzheimer's Disease). Such studies would be of particular interest in the context of recently developed assistive robotics platforms that prolong the ability of persons who have lost their autonomy to remain at home. For instance, serious games and socially aware assistive robots have actually been designed without considering the age-specific effects on social signal recognition. Therefore, improving the efficiency and suitability of these interactive systems clearly requires a better understanding of crossmodal integration.

## **Author Contributions**

Study concept and design were performed by LC, VL, MC, and MP. Data acquisition was conducted by VL. Data analysis was performed by LC, VL, and MC. All authors contributed to data interpretation and the final version of the manuscript, which all approved.

### **Acknowledgments**

This work was partially supported by the Labex SMART (ANR-11-LABX-65) under French state funds managed by the ANR within the Investissements d'Avenir program under reference ANR-11-IDEX-0004-02 and by the FUI PRAMAD2 project. We are also grateful to all of the volunteers who generously gave their time to participate in this study.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Chaby, Luherne-du Boullay, Chetouani and Plaza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Children can discriminate the authenticity of happy but not sad or fearful facial expressions, and use an immature intensity-only strategy**

*Amy Dawel<sup>1</sup>\*, Romina Palermo<sup>2,3</sup>, Richard O'Kearney<sup>2</sup> and Elinor McKone<sup>1</sup>*

*<sup>1</sup> Research School of Psychology and ARC Centre of Excellence in Cognition and its Disorders, The Australian National University, Canberra, ACT, Australia, <sup>2</sup> Research School of Psychology, The Australian National University, Canberra, ACT, Australia, <sup>3</sup> ARC Centre of Excellence in Cognition and its Disorders, and School of Psychology, University of Western Australia, Perth, WA, Australia*

#### *Edited by:*

*Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany*

#### *Reviewed by:*

*Sally Olderbak, Universität Ulm, Germany Eva G. Krumhuber, University College London, UK*

#### *\*Correspondence:*

*Amy Dawel, Research School of Psychology and ARC Centre of Excellence in Cognition and its Disorders, The Australian National University, Canberra, ACT 2600, Australia amy.dawel@anu.edu.au*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 23 January 2015 Accepted: 31 March 2015 Published: 05 May 2015*

#### *Citation:*

*Dawel A, Palermo R, O'Kearney R and McKone E (2015) Children can discriminate the authenticity of happy but not sad or fearful facial expressions, and use an immature intensity-only strategy. Front. Psychol. 6:462. doi: 10.3389/fpsyg.2015.00462*

Much is known about development of the ability to label facial expressions of emotion (e.g., as happy or sad), but rather less is known about the emergence of more complex emotional face processing skills. The present study investigates one such advanced skill: the ability to tell if someone is genuinely feeling an emotion or just pretending (i.e., authenticity discrimination). Previous studies have shown that children can discriminate authenticity of happy faces, using expression intensity as an important cue, but have not tested the negative emotions of sadness or fear. Here, children aged 8–12 years (*n* = 85) and adults (*n* = 57) viewed pairs of faces in which one face showed a genuinely felt emotional expression (happy, sad, or scared) and the other face showed a pretend version. For happy faces, children discriminated authenticity above chance, although they performed more poorly than adults. For sad faces, for which our pretend and genuine images were equal in intensity, adults could discriminate authenticity, but children could not. Neither age group could discriminate authenticity of the fear faces. Results also showed that children judged authenticity based on intensity information alone for all three expressions tested, while adults used a combination of intensity and other factor/s. In addition, novel results show that individual differences in empathy (both cognitive and affective) correlated with authenticity discrimination for happy faces in adults, but not children. Overall, our results indicate late maturity of skills needed to accurately determine the authenticity of emotions from facial information alone, and raise questions about how this might affect social interactions in late childhood and the teenage years.

**Keywords: facial emotion, genuine, posed, Duchenne, empathy**

### **Introduction**

Developmental studies of facial expression processing have focused almost exclusively on children's ability to label emotional facial expressions (i.e., as happy vs. sad etc.; for review see Widen, 2013). Yet being able to name the facial expression being displayed is not enough for successful social interaction. It is also important to be able to tell whether a facial display matches a person's underlying emotional experience (i.e., a genuine expression) or not (i.e., a posed expression). Concerning this ability, previous developmental studies have focused on happy facial expressions. None have investigated any negative facial expressions using stimuli for which it has been confirmed that adults can discriminate authenticity. Here we test for the first time children's ability to discriminate the authenticity of two negative facial expressions: sadness and fear, as well as happiness. We also provide the first test of the association between children's authenticity discrimination and perceived intensity of the expression across all three emotions. Finally, we provide the first evidence of correlations between individual differences in empathy and typical adults' authenticity discrimination ability, and test the same correlation in children.

### **Genuine and Posed Expressions**

Being able to tell the difference between genuine and posed facial expressions is crucial to social interaction because the two types of expression carry different meanings and imply different social responses. For example, if a person sees someone they know from school or work in a busy mall, a genuine smile might signal an invitation to approach and chat, whereas a posed only-being-polite smile might signal that further social interaction is not wanted at this time. In another example concerning sad expressions, not being able to tell the difference between genuine and posed sadness might increase vulnerability to manipulation: somebody showing a pretend sad expression could use it to elicit help from somebody who cannot tell the sadness is faked.

Genuine and posed expressions differ in several ways. The fundamental and critical distinction is that genuine expressions correspond with a congruent underlying emotion (e.g., smiling when feeling happy, frowning when feeling angry), whereas posed expressions do not. Here we investigate specifically the type of posed expressions that are *pretend*, in which there is no strong underlying experience of any emotion, such as smiling for a photograph, or playing pretend with a child whilst feeling emotionally neutral. (Note these potentially differ from posed expressions that are *masked*, in which the underlying emotion is *incongruent* with the facial display, e.g., masking anger using a smile; Gosselin et al., 2002a).

As a consequence of the differences in underlying emotional experience, genuine and posed facial expressions may also differ in their physical appearance, providing perceivers with some clues about emotional authenticity. Although the nature of these physical differences is not yet fully understood, some differences have been identified. One approach has been to use the Facial Action Coding System (FACS; Ekman et al., 2002), which is a tool for objectively measuring the degree of activation of different facial muscle groups, termed action units (AUs). Genuine expressions sometimes include so-called "reliable" AUs (Ekman, 2003), which occur less often in posed expressions (Ekman et al., 1988), and which people have less ability to control voluntarily (Mehu et al., 2012; although note that some people are able to voluntarily activate these AUs, Gunnery et al., 2012). The best established of these is AU6 for happy, or the "Duchenne" marker, which involves contraction of the orbicularis oculi muscle around the eyes to form wrinkles. AU6 has been associated with genuine happy expressions (e.g., Ekman et al., 1988; at least for Caucasians, Thibault et al., 2012). Reliable AUs for other emotions are less well established, but for sadness the AU1+4 combination (proposed by Ekman, 2003), which pulls the medial portion of the brow upward and together, has recently been empirically associated with genuine sadness (McLellan et al., 2010; although see Mehu et al., 2012, who found *observers* associated AU23 with authentic sadness, and not AU1+4). Note that for fear, it has not been empirically established in the literature what, if any, are the reliable AUs for genuine fear (Ekman, 2003). In addition to reliable AUs, other physical differences may include symmetry and signs of arousal. 
Genuine expressions are thought to be more symmetrical than posed expressions (Frank and Ekman, 1993; Ekman, 2003), and may include physical signs of arousal such as pupil dilation or skin "blushing" (Levenson, 2014), which are missing from pretend expressions because there is minimal underlying emotional arousal. Finally, intensity (how weak or strong the expression is) may potentially differ between genuine and posed expressions, particularly for happy where it has been suggested that a stronger underlying experience of happiness results in a more intense facial display (Hess et al., 1995).

Stimuli used in facial authenticity studies have been generated in several different ways. For happy in particular, some researchers have defined genuine happy expressions as any smile that includes AU6, and pretend happy expressions as any smile that does not include AU6 (e.g., Beaupré and Hess, 2003). Typically, these stimuli have been generated using actors who are able to voluntarily activate AU6. However, whether these actors were feeling underlying happiness is unknown and thus, although these stimuli mimic the muscle AU characteristics of genuine and pretend happiness, they may not include other physical markers of authenticity, such as signs of arousal. Given this, we suggest it is also valuable to test stimuli in which the emotional state of the photographed person is known to correspond to the assigned status of the stimulus as genuine versus posed.

For this reason, in the present study, we use genuine expression stimuli from McLellan and colleagues (e.g., McLellan et al., 2010; see **Figure 1** for examples). These stimuli were elicited in a laboratory setting using procedures developed by Miles (2005). For genuine expressions, emotions were elicited by looking at emotional pictures, listening to emotional sounds (e.g., baby laughing), or remembering an emotional event, and subsequently were verified by self-report of the people who had displayed the expressions to correspond with their underlying experience of emotion. The McLellan stimuli also include pretend versions of the same expressions, from the same models, which were elicited by instructing stimulus models to pose or pretend a sad or fearful face, fake a fearful reaction, or smile for a license photo, and which were subsequently verified by self-report to have been generated without any strong underlying experience of emotion. For the happy and sad expressions, the genuine versions include AU6 (happy) or AU1+4 (sad), while these reliable-AU markers are absent in the pretend versions. (For fear, reliable-AU status of the stimuli cannot be determined given that reliable-AUs for fear have not been established).

**FIGURE 1 | Examples of genuine and pretend (A) happy, (B) sad and (C) fear expressions (McLellan et al., 2010).**

### **Adult Authenticity Discrimination Ability for the McLellan Stimuli**

For the present study, we wished to select stimuli, and emotions, for which previous studies have found adult authenticity discrimination performance was above chance (thus allowing the examination of developmental trends). For adults, five published studies using stimuli from McLellan have tested whether observers can distinguish genuine from pretend expressions (McLellan et al., 2010, 2012; Johnston et al., 2011; Douglas et al., 2012; McLellan and McKinley, 2013; note that these studies did not all use exactly the same stimulus items). Participants were instructed, "Your job is to decide... whether or not they are [the person shown is] feeling each emotion. For instance, sometimes when people smile it does not necessarily mean that they are actually feeling happy." (McLellan et al., 2010, p. 1283). Participants were then asked to give a yes/no response to the question "Are the following people *feeling* [emotion]?" For happiness and sadness, all five studies (McLellan et al., 2010, 2012; Johnston et al., 2011; Douglas et al., 2012; McLellan and McKinley, 2013) found that adults were significantly above chance at authenticity discrimination on this task (i.e., more "yes" responses for genuine expressions than for pretend expressions as indicated by A', a non-parametric signal detection score that combines hits and false alarms). For fear, of the two studies that tested this emotion, the initial study found significant if weak discrimination (McLellan et al., 2010), although this was not replicated in a later study (which found no discrimination; Douglas et al., 2012). For other expressions, there have either been no tests of McLellan-type stimuli (anger, surprise), or no evidence of above-chance authenticity discrimination in adults (e.g., disgust; Douglas et al., 2012). Overall then, there is good evidence adults can discriminate the authenticity of the McLellan happy and sad expression stimuli, with equivocal evidence regarding fear expressions.
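As an illustration of the A' score mentioned above, the following is a minimal sketch assuming the commonly used Grier-style formula, in which a hit is a "yes" response to a genuine expression and a false alarm is a "yes" response to a pretend expression. The source does not state which variant of A' the cited studies computed, so the function name `a_prime` and the formula choice are assumptions for illustration only.

```python
def a_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Non-parametric sensitivity index A' (assumed Grier-style formula).

    0.5 indicates chance-level discrimination; 1.0 indicates perfect
    discrimination. Values below 0.5 arise when false alarms exceed hits.
    """
    h, f = hit_rate, false_alarm_rate
    if h == f:
        # Equal hit and false-alarm rates carry no discrimination information.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical rates, not values from any of the cited studies.
example = a_prime(hit_rate=0.8, false_alarm_rate=0.2)
print(f"A' = {example:.3f}")
```

Because A' combines both rates, an observer who simply answers "yes" to every face (hit rate and false-alarm rate both 1.0) scores 0.5 rather than appearing accurate.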

### **Previous Studies of Authenticity Discrimination in Development**

Turning to development, there is evidence even very young children have in place at least some of the abilities needed to determine the genuineness of others' emotional signals. Testing multimodal signals of emotions—specifically, adults communicating an emotion simultaneously through facial expression, body gestures, and voice—Walle and Campos (2014) showed that 19-month-olds can detect incongruency between the emotional display and the context of the rest of an event (e.g., a parent displaying pain, while hitting a hammer not on their finger but on the table nearby; although note that certain aspects of emotion–context interactions are not mature even by 12 years of age, Dawel et al., 2015) and can detect incongruency between two successive emotions (e.g., an actress displaying disgust followed immediately by happiness). These infants also showed sensitivity to whether a multimodal expression of fear was of normal versus exaggerated intensity. Importantly, however, in the Walle and Campos (2014) study all scenarios were acted (i.e., all facial expressions were likely posed rather than genuinely-felt) and the study concerned ability to discern authenticity-related information from multimodal stimuli, not facial information alone, which is the focus of the present investigation.

Studies that have tested specifically the ability to determine the authenticity of facial expressions, by contrasting genuine and posed versions of the expression, have tested children rather than infants. We are aware of six such studies. Five tested happy expressions, but only one tested any other emotion.

For happy, four studies varied authenticity by creating stimuli using the AU6 present–absent method (Gosselin et al., 2002b, 2010; Del Giudice and Colle, 2007; Thibault et al., 2009). One used happy faces created using the Miles/McLellan method where the subjective feelings of the photographed person are known (Blampied et al., 2010). Results of both methods agree that children can discriminate happy authenticity above chance from as young as 4 years of age, but do not reach adult levels of performance even by 16–17 years of age. Two of these studies also investigated the physical cues that children use to achieve authenticity discrimination in happy faces. Thibault et al. (2009) used smile stimuli that varied in intensity of the smile, and were also either with, or without, the reliable-AU marker for genuine happy AU6. From their data, Thibault et al. (2009) concluded that children from 4 years of age used intensity of the expression to judge authenticity, and also the presence of AU6. Del Giudice and Colle (2007) examined the relationship between 8-year-olds' judgments of smile authenticity and FACS-coded AU intensity. This study found that expressions were judged by 8-year-olds as more authentic if they included bare-teethed smiles (AU25), stronger activation of AU6, and/or stronger activation of the "lid tightener" (AU7), which is easily confusable with AU6. Their results suggest that it may not be the presence of AU6 per se that children use to judge smile authenticity, but rather the increased intensity of the expression that is associated with activation of AU6 (or AU7).

For other emotions, the only previous study videoed children's genuine reactions to "disliked" stimuli (most commonly producing expressions of disgust, e.g., tasting a salty drink; Soppe, 1988). Child observers (6–12 year olds) could not discriminate these above chance from pretend dislike reactions to neutral stimuli. Unfortunately, this finding is difficult to interpret as evidence regarding developmental trends because adults also could not discriminate authenticity of the same stimuli. Thus overall, there have been no studies that have tested children's ability to discriminate emotion authenticity of negative facial expressions for which adults have successfully demonstrated authenticity discrimination.

### **Present Study**

The primary aim of the present study was to provide the first test of children's ability to discriminate the authenticity of two negative facial expressions, sad and fear, as well as happy expressions, in 8–12 year olds relative to adults. We tested 8–12 year olds because, by 8 years of age, children have a good conceptual understanding of the difference between genuine and pretend expressions (Sidera et al., 2011). We tested sad and fearful expressions specifically because these were the only two negative expressions for which we were able to obtain genuine-pretend stimulus pairs from the same identity models (created using the Miles/McLellan method), and for which adults had already demonstrated ability to discriminate authenticity (for sad, consistently above chance in five studies), or at least some evidence of ability to discriminate authenticity (for fear, above chance in one out of the two previous studies). We also included happy faces because it is well established that children of the age tested here can discriminate their authenticity, which allowed us to use happy to validate our task (i.e., children's above-chance performance for happy expressions would help us to establish that children understood the task). We used a task that presents pairs of faces, rather than individual faces (e.g., as in Blampied et al., 2010; McLellan et al., 2010), to minimize task demands for children (Gosselin et al., 2010). In our two-alternative-forced-choice (2-AFC) paradigm, participants were shown pairs of genuine and pretend expressions from the same model and asked to decide which of the pair was "only pretending."

A second aim of the present study was to obtain information relevant to understanding the strategies children use to discriminate authenticity, and particularly the extent to which they rely only on intensity of the facial expression, or a combination of intensity and other factors. Previous studies of this question have examined only the expression of happy. Here, we examine relationships between authenticity discrimination and perceived expression intensity, across all three expressions of happy, sad, and fear, to determine the contribution of intensity to children's judgments of authenticity more broadly, beyond just the expression of happy. In addition, the analysis we use (see Results) allows us to determine whether children, and adults, demonstrate significant use of cues *beyond* intensity. Note that our stimuli do not, in general, allow us to define the specific nature of such cues (e.g., to what extent they include reliable-AUs versus other physical differences that might be present between the genuine and posed McLellan faces). Certain outcome possibilities, however, would allow us to draw some limited conclusions regarding other cues (e.g., for sad, the genuine but not posed expressions contain AU1+4; thus, a finding that, say, adults can discriminate authenticity of these stimuli above chance while children cannot would imply that children do *not* use this reliable-AU).

Our final aim was to examine associations between authenticity discrimination ability and individual differences in empathy. This is a largely novel question even in adults, and has not previously been tested at all in children. That there might be such an association is suggested by some theoretical models of empathy that link perception and action, so that by perceiving another's situation the observer creates some kind of simulation, either through emotional or motoric representation, of the other's situation that results in sharing of their emotional experience [e.g., theories that empathy is derived from emotional contagion, see Maibom, 2012; or the perception-action model of empathy (PAM), Preston and de Waal, 2002]. These theories do not specifically discuss a relationship between empathy and ability to determine *authenticity* of facial emotion, but do make it plausible that such an association could exist. For example, we suggest an association with authenticity discrimination might arise from a simulation process either because people with greater simulation abilities might be *better* at discriminating between genuine and pretend expressions because they experience an especially strong emotional experience in response to genuine expressions (predicting a positive correlation), or, in the opposite direction, that they might be *worse* at authenticity discrimination because they experience an indiscriminately strong emotional experience to both genuine and pretend expressions (predicting a negative correlation; Manera et al., 2013). These predictions regarding simulation appear to relate more specifically to the *affective* component of empathy—the extent to which a person is emotionally responsive to others' experiences (e.g., the extent to which they feel sad, sympathetic or distressed because a friend is crying). 
In the present study, we also examine *cognitive empathy*—the ability to infer what another person is thinking and feeling from physical cues in the face and body, contextual information, and knowledge of the person (e.g., using a frown to infer that someone is angry; see Maibom, 2012). As cognitive and affective empathy are at least partly independent facets (Jolliffe and Farrington, 2006; Dadds et al., 2008), we examined them separately. Concerning predictions for cognitive empathy, we suggest that, potentially, observers could use cognitive strategies (e.g., explicit knowledge of the AU6 marker or arousal cues) to infer authenticity, predicting a positive correlation between cognitive empathy and authenticity discrimination. Regarding previous empirical tests, we are aware of only one previous study that has examined associations between empathy and authenticity discrimination in adults (but cf. Manera et al., 2013, for study of associations between emotional contagion and authenticity discrimination). McLellan and McKinley (2013) found a positive correlation between empathy (affective and cognitive components combined) and authenticity discrimination within a clinical traumatic brain injury group; however, this correlation was not significant within neurologically healthy controls with a small sample size (*n* = 19). Here, we reexamine this association within typically developing adults and, for the first time, test it in children.

## **Materials and Methods**

### **Participants**

Participants analyzed were 85 children (*M*age = 10.0 years, SDage = 1.1, age range = 8.3–12.3, 46 females) and 57 young adults (*M*age = 19.3, SDage = 2.2, age range = 17–27, 40 females). All participants were Caucasian, to match the race of face stimuli, because there are race-related cultural differences in the perception of expression authenticity (e.g., some non-Caucasian cultures do not interpret AU6 as a sign of genuine happiness; Thibault et al., 2012). Adults were recruited via fliers posted around campus at the Australian National University. Children were from two local primary schools, and were recruited by having the schools send letters home to all parents in the class requesting their child's participation in the study. All participants were reported to have normal or corrected-to-normal vision. Informed, written consent was obtained from adult participants and from the parents of child participants. Verbal assent was also obtained from child participants. Adults were paid \$15 per hour for their participation, or given undergraduate course credit. Children were rewarded with certificates and stickers. This study was approved by, and conducted in accordance with the guidelines of, the Human Research Ethics Committee at The Australian National University. Additionally, for children, approval for the study was obtained from the ACT Government Education and Training Directorate.

We excluded data from seven additional participants who, on a screening questionnaire, reported major disorders that can affect face processing (e.g., brain injury, autism). These were five children reported by parents to have an intellectual impairment (1), ADHD or ADD (3), or Asperger's disorder (1), and two adults who reported epilepsy (1) or severe migraines with aura (1).

### **Session Structure and Order of Tasks**

Participants were tested in a single session lasting up to one hour (children) or one and a half hours (adults; extra time was for completing questionnaires and, for the first *n* = 26 adults tested, the emotion labeling and intensity rating tasks). Tasks reported in the present article were completed as part of a larger battery, but always in the following order: basic emotion labeling task (i.e., categorizing the facial expression as happy, sad, etc.); authenticity discrimination task; intensity rating task (adults only; note that children's ratings of affective stimuli tend to correlate highly with adults' ratings, *r*s > 0.82; McManis et al., 2001); and finally demographic, screening, and empathy questionnaires (for adults; for children the questionnaires were completed by parents prior to the session). Experimental tasks were run using Macintosh computers. Faces were presented on an attached ELO IntelliTouch touchscreen with screen size 15" and resolution 1024 × 768, using Superlab version 4.0 software. Participants responded by touching the screen.

### **Facial Expression Stimuli**

Examples of the facial expression stimuli are shown in **Figure 1**. Stimuli were genuine and pretend versions of happy, sad and fear expressions. In total, there were 12 genuine-pretend pairs: 4 happy, 4 sad, and 4 fearful (24 face images total). The stimuli were provided by McLellan (personal communication, 2011), comprising a set that largely overlapped with that used by McLellan et al. (2010), and all were created in the manner described in that article [i.e., following Miles (2005) as described in the final paragraph of "Genuine and Posed Expressions" in our Introduction]. Genuine and pretend versions of each emotion were displayed by four female stimulus models (all three expressions = 1 model; happy and sad = 2 models; happy and fear = 1 model; sad only = 1 model; fear only = 2 models). Stimulus models were from the general population, and did not have any specific training. Faces were displayed centrally, and subtended 5.5° × 7.3° of visual angle (4.8 cm wide × 6.4 cm high at the viewing distance of approximately 50 cm).
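The reported visual angle can be checked from the stated stimulus size and viewing distance. The sketch below uses the standard geometric relation (angle = 2 · arctan(size / (2 · distance))); the function name `visual_angle_deg` is introduced here for illustration and is not from the original article.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle in degrees subtended by an object viewed head-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Stimulus dimensions and viewing distance as stated in the text.
width_deg = visual_angle_deg(4.8, 50.0)
height_deg = visual_angle_deg(6.4, 50.0)
print(f"{width_deg:.1f} x {height_deg:.1f} degrees")
```

Rounded to one decimal place, the two values agree with the 5.5° × 7.3° given above. Because the viewing distance is only approximate, the angles are approximate too.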

### **Emotion Labeling**

The 24 happy, sad and fear face stimuli were presented individually in random order for each participant, intermixed with 16 additional images displaying genuine and pretend anger and disgust expressions to make the labeling task a 5-choice response (total stimuli = 40 faces). (Anger and disgust stimuli were also provided by McLellan and colleagues; note these were not included in our authenticity discrimination task). The task was to indicate what expression each face was displaying by choosing from the five emotion labels presented onscreen (angry, disgusted, scared, happy, sad). Faces were displayed until response. There were five practice trials (one for each emotion label) showing cartoon characters from The Simpsons. All children and a subsample of the first 26 of the 57 adults completed the labeling task (*M*age = 19.7 years, SDage = 2.8, age range = 17–27, 16 females). The task was exactly the same for children and adults.

Prior to starting the labeling task, we verified children understood the meaning of the emotion labels. Children were asked for each emotion, "Tell me what happy (*or* sad, etc.) means or when you might feel happy (*or* sad, etc.)." All children provided explanations that were consistent with the meaning of the emotion labels (e.g., for happy: "if something goes your way or if you get something that you like"). Children were then read five brief stories from Widen and Russell (2002) depicting scenarios that would be likely to elicit each of the five emotions, and asked "How do you think [the child in the story] is feeling?" All children verbalized the correct emotion label or a synonym for that emotion for each story (e.g., some children gave the label "afraid" instead of "scared" for the fear story).

### **Authenticity Discrimination Task**

The authenticity discrimination task is illustrated in **Figure 2**. Two images of the same person were presented one after the other for 2000 ms each, with a blank interstimulus interval of 500 ms. One image showed a genuine happy, sad, or fear expression and the other image showed a pretend version of the same emotional expression. Participants were instructed that the two images were of twins, and that "The twins are playing a trick on you. One twin really feels happy, scared, or sad, but the other is only pretending! Your job is to decide which twin is only pretending." The task was exactly the same for children and adults, with the exception that, following these initial instructions, children were also asked what "pretending" means. All children were able to give a synonym (e.g., "faking") and/or an example of pretend behavior.

Each trial started by asking, "Which twin is only pretending to be happy about the present (*or* sad about the rain, *or* scared of the spider)?" To ensure participants were looking at the screen they were required to press an object to start each trial (happy = present; sad = rain cloud; fear = spider). After viewing the two face images, participants responded by touching one of two boxes shown side-by-side onscreen, labeled "1st twin" and "2nd twin," to indicate which twin they thought was only pretending to be happy, sad, or scared. Prior to starting the task, all participants completed three practice trials (one each for happy, sad, and scared) showing cartoon characters from The Simpsons. The "twin" images were shown consecutively, rather than together, to ensure participants had the same amount of time to scan each face. The order of genuine and pretend versions of each emotion was counterbalanced across trials (e.g., so the 1st twin displayed genuine happiness in two of the four happy trials, and the 2nd twin displayed genuine happiness in the other two trials). Trial order was randomized for each participant (trials per emotion = 4, total trials = 12).
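The counterbalancing and randomization described above can be sketched as follows. This is an illustrative reconstruction only: the study ran the task in Superlab, and the names used here (`build_trials`, the dictionary keys, and the fixed seed) are hypothetical. In particular, which specific pairs showed the genuine face first is not stated in the text, so the sketch assigns this at random within each emotion.

```python
import random

EMOTIONS = ("happy", "sad", "fear")


def build_trials(rng: random.Random) -> list:
    """Build 12 trials: 4 per emotion, genuine face first on 2 of each 4."""
    trials = []
    for emotion in EMOTIONS:
        # Genuine expression is shown first on exactly half of the trials.
        genuine_first = [True, True, False, False]
        rng.shuffle(genuine_first)  # assumption: assignment to pairs is random
        for pair_id, first in enumerate(genuine_first, start=1):
            trials.append({
                "emotion": emotion,
                "pair": pair_id,
                "genuine_first": first,
                # The correct answer is the twin showing the pretend face.
                "correct_response": "2nd twin" if first else "1st twin",
            })
    rng.shuffle(trials)  # trial order randomized per participant
    return trials


trials = build_trials(random.Random(0))  # seed fixed only for reproducibility
for trial in trials[:3]:
    print(trial)
```

Balancing presentation order within each emotion means that a participant who always picks the same twin scores exactly 0.5 for every emotion, so a position bias cannot masquerade as discrimination.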

### **Intensity Rating Task**

Intensity ratings were from the same subsample of adults who completed the emotion labeling task (*n* = 26). Participants were instructed, "Your next task is to rate the intensity of each facial expression, from weak to strong." Each face was presented individually onscreen, in random order, with the statement, "Please rate the intensity of this facial expression," and a scale numbering from 1 (labeled "weak") to 9 (labeled "strong"). (Note the intensity rating task also included the anger and disgust expressions).

### **Empathy Questionnaires**

Empathy was measured for children using the Griffith Empathy Measure (GEM; Dadds et al., 2008) and in adults using the Basic Empathy Scale (BES; Jolliffe and Farrington, 2006). These measures were selected because: (1) they each have a two-factor structure in which one factor taps affective empathy and the other cognitive empathy, and (2) they are well matched across the child and adult measures in terms of the number of items that refer to negatively valenced emotions, positively-valenced emotions, and to "feelings" generally without specifying valence (see **Table 1**).

The GEM is a 23-item parent report measure, with items scored from *−*4 (strongly disagree) to +4 (strongly agree). Both factors demonstrate good to acceptable reliability (current sample: affective α = 0.81; cognitive α = 0.50; note the cognitive subscale only has six items, and Cronbach's α tends to underestimate reliability when there are small numbers of items; Schmitt, 1996). Concerning validity, parent ratings on the GEM have been demonstrated to correlate positively with direct observations of children's empathic behavior and with questionnaire measures of prosocial behavior (Dadds et al., 2008).
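For readers unfamiliar with the reliability coefficient reported here, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings from five respondents on a three-item scale.
toy_responses = [
    [2, 3, 3],
    [-1, 0, 1],
    [4, 3, 4],
    [0, 1, -1],
    [3, 2, 2],
]
print(f"alpha = {cronbach_alpha(toy_responses):.2f}")
```

The k/(k − 1) term and the ratio of summed item variances to total variance both depend on the number of items, which is why scales with few items, such as the six-item cognitive subscale, tend to yield lower α values.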

The BES is a 20-item self-report measure that uses a 5-point Likert scale, ranging from strongly disagree to strongly agree. Both factors demonstrate good reliability (current sample: affective α = 0.89; cognitive α = 0.77). Concerning validity, each factor correlates as would be expected with other questionnaire measures of empathy and of personality traits (e.g., positive correlation with agreeableness; Jolliffe and Farrington, 2006; Baldner and McGinley, 2014).

**TABLE 1 | Number of items by valence for the affective and cognitive subscales of the GEM (children) and the BES (adults).**

## **Results**

Given that we had no specific predictions for individual emotions, all significance tests throughout the Results that report individual emotions are *post hoc* and all *p*-values are Bonferroni corrected (for the three emotions). Also, all significance tests are two-tailed.

Authenticity discrimination scores (**Figure 3**) were calculated as proportion correct (i.e., the proportion of trials on which the participant correctly chose the pretend expression as the one "just pretending"). Initial ANOVA revealed a two-way interaction between age group (adult, child) and face emotion (happy, sad, fear), *F*(2,280) = 5.87, MSE = 0.08, *p* = 0.003, which established the pattern of authenticity discrimination ability across the three emotions was significantly different for adults and children. Thus we analyze adults and children separately.

### **Adults**

Results are displayed by the dark bars in **Figure 3A**. A one-way ANOVA on adults' authenticity discrimination scores revealed a significant effect of face emotion, *F*(2,112) = 42.28, MSE = 0.07, *p <* 0.001, which established that adults' authenticity discrimination ability differed across the three emotions. Follow-up *t*-tests revealed that adults' ability to discriminate the authenticity of expressions was significantly better for happy than for sad expressions, *M*happy = 0.89, *M*sad = 0.68, *t*(56) = 4.18, *p <* 0.001, and was also significantly better for sad than for fear expressions, *M*fear = 0.43, *t*(56) = 5.06, *p <* 0.001.

It was also theoretically important to establish whether authenticity discrimination was above chance (0.5). Using one-sample *t*-tests we found that, consistent with previous studies using similar stimuli (e.g., McLellan et al., 2010, 2012; Johnston et al., 2011; Douglas et al., 2012; McLellan and McKinley, 2013), adults were able to successfully discriminate the authenticity of happy, *t*(56) = 12.08, *p <* 0.001, and sad expressions, *t*(56) = 4.09, *p <* 0.001. For fearful expressions, authenticity discrimination was slightly below chance but not significantly so, *t*(56) = 2.22, *p* = 0.061. Overall, these results indicate that adults could not discriminate the authenticity of the fear expressions, but could discriminate the authenticity of the happy and sad expressions, and that they were better at this for happy than for sad.
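The comparison to chance can be illustrated from summary statistics alone. The sketch below assumes the standard one-sample *t* formula. The inputs are the adult happy-expression mean reported above (0.89), the adult happy-face SD reported in the Empathy subsection (0.24), and *n* = 57 adults. The function name is hypothetical.

```python
import math


def one_sample_t(mean, sd, n, chance=0.5):
    """t statistic and degrees of freedom for a sample mean tested against `chance`."""
    standard_error = sd / math.sqrt(n)
    return (mean - chance) / standard_error, n - 1


# Adults, happy expressions: rounded summary values taken from the text.
t_stat, df = one_sample_t(mean=0.89, sd=0.24, n=57)
print(f"t({df}) = {t_stat:.2f}")
```

With these rounded inputs the statistic comes out a little above 12, in line with the reported *t*(56) = 12.08. A small gap is expected because the mean and SD are rounded to two decimals.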

### **Children**

Results are displayed by the light bars in **Figure 3A**. A one-way ANOVA on children's authenticity discrimination scores revealed a significant effect of face emotion, *F*(2,168) = 32.37, MSE = 0.09, *p <* 0.001, which established that authenticity discrimination ability differed across the three emotions. Follow-up *t*-tests revealed that, like adults, children showed better ability to discriminate the authenticity of happy than of sad expressions, *M*happy = 0.77, *M*sad = 0.46, *t*(84) = 6.36, *p <* 0.001. Unlike adults, however, there was no significant difference between children's authenticity discrimination scores for sad and fear expressions, *M*fear = 0.46, *t*(84) = 0.227, *p* = 1.0.

Comparison of children's authenticity discrimination scores to chance (0.5) for each emotion using one-sample *t*-tests also showed that, like adults, children were able to successfully discriminate the authenticity of happy, *t*(84) = 8.93, *p <* 0.001, and not of fearful expressions, *t*(84) = 1.61, *p* = 0.224, but that, unlike adults, children were unable to discriminate the authenticity of sad expressions, *t*(84) = 1.12, *p* = 0.474.

Finally, it was also important to compare children to adults. This must be done separately for individual emotions (due to the original emotion *×* age group 2-way interaction, which indicates developmental improvement varies across emotions). Results showed there was no age-related change in authenticity discrimination accuracy for fear expressions, *t*(140) = 1.56, *p* = 0.602, but that children performed significantly more poorly than adults for happy and sad expressions, happy: *t*(140) = 2.46, *p* = 0.015; sad: *t*(106.1; equal variances not assumed) = 4.04, *p <* 0.001. Importantly, children's poorer authenticity discrimination for happy and sad expressions could not be explained by any inability to label these expressions (**Figure 4**). Children's labeling accuracy for sad expressions was statistically equivalent to that of adults, *M*child = 0.83, *M*adult = 0.86, *F*(1, 109) = 0.574, MSE = 0.060, *p* = 0.450, and for happy expressions was slightly better than that of adults, *M*child = 0.99, *M*adult = 0.97, *F*(1, 109) = 6.95, MSE = 0.004, *p* = 0.010, irrespective of whether expressions were genuine or pretend (no significant interaction between age group and expression authenticity for happy or sad expressions, both *p*s *>* 0.222).

Overall, results for children showed they were able to discriminate the authenticity of happy expressions, but not of sad (or fearful) expressions, and that even in the case of happy expressions children did not perform as well as adults.

### **Does Authenticity Discrimination Improve from 8 to 12 Years of Age?**

**Figure 3B** illustrates that, for all three emotions, children showed no improvement in their ability to discriminate the authenticity of expressions with age (i.e., the 8-year-old white bars on the left for each emotion are comparable with the darker grey bars for older children on the right). Supporting this conclusion, linear trend analysis on mean authenticity discrimination score across age in years (8 year olds, 9 year olds, 10 year olds, and 11 and 12-year-olds combined for sufficient sample size) showed no significant change with age for any of the three emotions, happy: *F*(1,81) = 0.98, MSE = 0.08, *p* = 0.326; sad: *F*(1,81) = 0.11, MSE = 0.08, *p* = 0.739, fear: *F*(1,81) = 1.25, MSE = 0.06, *p* = 0.266.

### **Intensity**

Our results show that children are not able to discriminate the authenticity of sad expressions at all, and are less able than adults to discriminate the authenticity of happy expressions. This raises the question of whether children and adults use different strategies to discriminate authenticity. We tested the extent to which children rely on expression intensity to judge authenticity, as opposed to also using additional cues in the face (e.g., reliable AUs; Thibault et al., 2009, 2012), across all three of the expressions we tested (happy, sad, fear; note the previous studies of Thibault et al., 2009, and Del Giudice and Colle, 2007, examined variations in intensity within happy only). Note that our method and analyses are designed to establish (1) the contribution of intensity to authenticity discrimination and (2) *if* additional cues contribute. They are not intended to establish *what* these additional cues are.

**TABLE 2 | Mean intensity ratings (***n* **= 26 adults) for genuine and pretend versions of each emotional expression, with SDs in parentheses.**


*Scale* = *1 to 9, with higher number indicating the expression is stronger (more intense). Scores are averaged over the four items in each category (e.g., the four genuine happy face items).*

We first examined the mean intensity ratings for our stimulus items from the three emotions, shown in **Table 2** averaged across the four items for each emotion and in **Figure 5** for individual stimulus pairs. These values are potentially consistent with the idea that children (and adults) judge authenticity based on the intensity of the expression, such that they perceive the more intense expression of each trial pair as the more genuine, and the less intense as less genuine (more likely to be pretend). Specifically, for happy (the expression for which children were able to reliably discriminate authenticity), comparison of mean ratings showed the genuine items we used were on average significantly more intense than the pretend items, *t*(25) = 7.65, *p <* 0.001. For sad, however, the genuine items and pretend items did not differ significantly in average intensity, *t*(25) = 1.54, *p* = 0.137; and for these stimuli children were not able to discriminate authenticity above chance (while adults could). Finally, for fear, the *pretend* items were significantly more intense overall than the genuine items, *t*(25) = 5.55, *p <* 0.001; and for these stimuli even adults could not discriminate authenticity above chance (indeed, they showed a trend in the opposite direction, i.e., toward perceiving the *pretend* item as more authentic than the genuine one).

Taking this intensity analysis one step further, we then examined correlations between authenticity discrimination performance and intensity of individual items. Note our use of paired stimuli (for 2-AFC) for the authenticity trials requires a somewhat complicated way to analyze the data (i.e., we cannot just plot discrimination accuracy against intensity of *the* face, since there were two faces presented on each trial). In **Figure 6**, we illustrate the format of our data plots. On the x-axis, we plot the *difference* in mean intensity ratings for each genuine-pretend pair. On this scale: a score of zero indicates that the genuine and pretend items on the trial were of equal intensity; a score to the right of zero indicates that the genuine face was more intense than the pretend face; and a score to the left of zero indicates that the pretend face was more intense than the genuine face. On the y-axis, we plot authenticity discrimination accuracy (mean across participants), for each individual face pair (12 pairs in total).

In **Figures 6A–C**, we illustrate how various results outcomes would correspond to evidence of using different types of strategies. In particular, we test whether a given age group is using: (a) intensity *only* (**Figure 6A**); (b) other strategies *only* (**Figure 6B**), such as making use of the reliable AUs (AU6 for happy and AU1+4 for sad) that are present in the genuine versions and absent in the pretend versions, or making use of affective empathy responses; or (c) a *combination* of intensity and other strategies (**Figure 6C**).

First, in **Figure 6A**, if intensity is an important driver of which member of the pair is perceived as authentic, we would expect to observe a positive-slope relationship. Here, pairs in which the genuine face is the more intense item would tend to produce higher authenticity decision accuracy (i.e., participants are more likely to correctly perceive the genuine face as the genuine one), and pairs in which the pretend face is the more intense item tend to produce lower authenticity decision accuracy (i.e., participants are more likely to incorrectly perceive the pretend face as the genuine one). Moreover, if participants are using *only* intensity to determine authenticity, then we would expect the line of best fit to pass through (0, 0.5): that is, when the two faces in the stimulus pair are equal in intensity (zero on the x-axis), we would expect participants to perform at chance (0.5) in authenticity discrimination.

Second, in **Figure 6B**, if only other strategies drive percepts of authenticity, with no role for intensity, then we would observe a flat line relationship with the line of best fit set at above-chance discrimination (y-intercept significantly above 0.5 on the authenticity discrimination scale). This is because the lack of a role for intensity would leave no association with that variable (i.e., a flat slope), and the contribution of the other factor/s would give above-chance discrimination (including when intensity is equal for the genuine and pretend faces, i.e., at the y-axis where intensity difference is zero).

Finally, if both intensity *and* other strategies were being used in combination, we would expect the pattern illustrated in **Figure 6C**. Here, there is a positive-slope relationship indicating a contribution of intensity, and simultaneously an above-0.5 y-intercept value indicating a contribution from other factor/s. Note this is different from the pattern in **Figure 6A**, in which the y-intercept is 0.5 indicating authenticity discrimination is related *exclusively* to intensity.
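The logic of the three patterns can be made concrete with a small least-squares sketch. This is an illustration under stated assumptions, not the authors' analysis: `fit_line` and `classify` are hypothetical names, the data are invented, the slope is tested against 0 and the intercept against chance (0.5), and both tests are applied in the predicted (positive) direction only.

```python
import math
from typing import NamedTuple


class LineFit(NamedTuple):
    slope: float
    intercept: float
    slope_t: float      # t statistic for the slope versus 0
    intercept_t: float  # t statistic for the intercept versus chance


def fit_line(xs, ys, chance=0.5):
    """Ordinary least-squares line through (xs, ys), with t statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(sse / (n - 2))
    se_slope = resid_sd / math.sqrt(sxx)
    se_intercept = resid_sd * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return LineFit(slope, intercept, slope / se_slope,
                   (intercept - chance) / se_intercept)


def classify(fit, t_crit):
    """Map a fit onto the patterns of Figures 6A-C (labels are illustrative)."""
    uses_intensity = fit.slope_t > t_crit
    uses_other_cues = fit.intercept_t > t_crit
    if uses_intensity and uses_other_cues:
        return "intensity plus other cues (6C)"
    if uses_intensity:
        return "intensity only (6A)"
    if uses_other_cues:
        return "other cues only (6B)"
    return "no reliable cue use"


# Hypothetical intensity differences (genuine minus pretend) and accuracies.
intensity_diff = [-2, -1, 0, 1, 2]
accuracy = [0.42, 0.48, 0.62, 0.68, 0.80]
fit = fit_line(intensity_diff, accuracy)
# 3.18 is the two-tailed 5% critical t for 3 degrees of freedom (five points).
print(classify(fit, t_crit=3.18))
```

The critical value is passed in explicitly because it depends on the number of stimulus pairs and on the degrees of freedom the analyst adopts.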

We present the actual results in **Figure 6D** (children) and **Figure 6E** (adults). For both age groups, authenticity discrimination was significantly positively related to expression intensity (slope of regression line for children: *b* = 0.105, β = 0.823, *t*(11) = 4.58, *p* = 0.001; for adults: *b* = 0.113, β = 0.756, *t*(11) = 3.65, *p* = 0.004), arguing that both children and adults used intensity as a cue to authenticity across the three expressions.

no significant contribution of other factors. **(E)** Results for adults (*n* = 57) indicate they also use intensity information, but that they combine intensity with some other cue/s to boost their performance above that of children. "Difference in intensity ratings" is the difference in mean rating, for each stimulus pair (*n* = 12 pairs), averaged across 26 adult participants (e.g., mean intensity rating for a genuine happy stimulus face minus the mean intensity rating for the corresponding pretend happy stimulus with which it appeared on the 2-AFC trial). Authenticity discrimination scores were averaged across participants for each stimulus pair. H1 = happy stimulus pair number 1, H2 = happy stimulus pair number 2, and so on for sad (S) and fear (F).

Moreover, comparison of slopes across the two age groups indicated that there was no evidence of any difference in children's compared to adults' sensitivity to intensity; that is, the child slope of *b* = 0.105 was not significantly less steep than the adult slope of *b* = 0.113, *t*(20) = 0.20, *p* = 0.838 (parallel slopes shown in **Figure 6E**).
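The slope comparison can be approximately reconstructed from the reported values. The sketch below is an assumption about the method, which the text does not spell out. It treats the two regressions as independent, recovers each slope's standard error as *b*/*t*, and uses *n*₁ + *n*₂ − 4 degrees of freedom. The function name is hypothetical.

```python
import math


def compare_slopes(b1, t1, n1, b2, t2, n2):
    """t statistic and df for the difference between two regression slopes."""
    se1 = b1 / t1  # standard error recovered from the reported slope and t
    se2 = b2 / t2
    t_diff = (b2 - b1) / math.sqrt(se1 ** 2 + se2 ** 2)
    df = n1 + n2 - 4  # two parameters are estimated for each regression line
    return t_diff, df


# Reported values: children b = 0.105, t = 4.58; adults b = 0.113, t = 3.65;
# 12 stimulus pairs in each regression.
t_diff, df = compare_slopes(0.105, 4.58, 12, 0.113, 3.65, 12)
print(f"t({df}) = {t_diff:.2f}")
```

Under these assumptions the result has 20 degrees of freedom and a *t* value of about 0.2, which agrees with the reported *t*(20) = 0.20 to within rounding.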

Where children differed from adults was in the y-intercept. For children, when the genuine and pretend expressions in the pair were of equal intensity, the regression line intercepted the y-axis at a point not significantly different from chance levels of authenticity discrimination [*intercept* = 0.557, *t*(11) = 1.68, with *p* = 0.122 for comparison to chance value of 0.5]. Thus in the absence of diagnostic intensity information children were unable to discriminate authenticity above chance (consistent with the earlier analysis for sad-expression items treated as a group). In comparison, **Figure 6E** shows that the regression line for adults is moved up relative to children's, and crosses the y-axis significantly above chance levels of authenticity discrimination [*intercept* = 0.657, *t*(11) = 3.41, *p* = 0.006 for comparison to chance value of 0.5]. This result argues that adults used, in addition to intensity, some other cue or cues to improve their discrimination of authenticity.

While the size of our stimulus set for the intensity ratings was small (*n* = 12 stimulus pairs) and we only used adults' estimates of intensity differences, overall these analyses are consistent with the view that children's percepts of authenticity were driven primarily by intensity of the expression, while adults judge authenticity using intensity in combination with other factors. Concerning the nature of these other factor/s, note that our adult results do not directly demonstrate that adults used the presence versus absence of reliable AUs, although the results are at least consistent with this view in that our stimuli had reliable AUs present in the genuine version (and absent in the pretend version) for the two emotions that adults could discriminate above chance (happy and sad), but not necessarily for the emotion adults could not discriminate (fear). For children, however, our results do directly support the view that this age group did *not* use the reliable-AU combination of AU1+4 as a cue to authenticity for sad faces: if they did so, then their mean performance for sad faces would have to be above chance, which was not the case (see Children).

*<sup>1</sup>For scales with >3 items we report Cronbach's* α*; however,* α *tends to underestimate reliability when there are small numbers of items (Schmitt, 1996) so for the 3-item scales we report the Spearman-Brown coefficient (Eisinga et al., 2013).*

### **Empathy**

We next examined correlations between individual differences in empathy and authenticity discrimination ability, for those emotions where performance was above chance. Reliability analyses are reported in **Table 3** and correlational results in **Table 4**. Note that for conditions where mean authenticity discrimination was at chance (fear for adults, sad and fear for children), individual differences in performance are not meaningful (i.e., they are taken to reflect merely guessing), and so correlations with empathy are not reported.

For adults, there were significant positive correlations (i.e., higher empathy was associated with better ability to discriminate authenticity) for happy expressions. This was true for both affective empathy, τ = 0.309, *p* = 0.004, and cognitive empathy, τ = 0.352, *p* = 0.001 (note all correlations involving happy in this article report the non-parametric Kendall's τ due to a skewed distribution of authenticity discrimination scores for this expression). In an additional analysis, we divided the adult empathy questionnaire (BES) into items that referred to negative emotions and positive emotions (excluding items in which the emotional valence was not specified; see **Table 4**). This was because Manera et al. (2013) reported that *better* authenticity discrimination was related to susceptibility to emotional contagion (one aspect of empathy; Maibom, 2012) only for contagion from *negative* emotions, while susceptibility to emotional contagion from *positive* emotions was related to *worse* authenticity discrimination (note the results were for happy-face authenticity only; other expressions were not tested). However, in the present study we found that both negative-valence and positive-valence BES scores showed significant positive correlations with authenticity discrimination for happy expressions (BESnegative*−*valence: τ = 0.342, *p* = 0.002; BESpositive*−*valence: τ = 0.383, *p* = 0.001). That is, we found in adults that better authenticity discrimination of happiness was associated with greater BES empathy, *irrespective* of the emotional valence of measurement items.
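For readers unfamiliar with Kendall's τ, the following sketch shows how the statistic is built from concordant and discordant pairs. It implements the tie-corrected τ-b variant, which is an assumption (the text does not say which variant was used), and the empathy and accuracy scores are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, with the standard correction for ties."""
    concordant = discordant = 0
    untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        untied_x += dx != 0  # pairs that differ on x
        untied_y += dy != 0  # pairs that differ on y
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Hypothetical empathy totals and proportion-correct scores for six adults.
empathy = [52, 60, 61, 70, 74, 80]
accuracy = [0.50, 0.75, 0.75, 1.00, 0.75, 1.00]
print(f"tau-b = {kendall_tau_b(empathy, accuracy):.3f}")
```

Because τ depends only on the ordering of scores, it is less affected than Pearson's *r* by the skew that arises when many participants score near ceiling.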

In contrast, for children we found no significant correlations with empathy (**Table 4**), specifically including trivially small correlations for happy (i.e., the expression for which significant correlations were present for adults). Note this lack of correlation cannot be attributed to uninteresting explanations such as lack of range: the children's happy-face authenticity scores had if anything more range than the adults' (SDchildren = 0.28, SDadults = 0.24), and the children's empathy scores covered a wide range of values compared to norms (for total GEM, *M* = 34.32, SD = 18.94, compared to *M* = 35.03, SD = 21.7 for *n* = 1034 7–10 year olds; Dadds et al., 2008).

Finally, we examined whether sex differences might play a role in the empathy correlations found in adults. This issue arose because (a) empathy was positively related to being female in our adult sample (affective empathy: *r* = 0.587, *p <* 0.001; cognitive empathy *r* = 0.299, *p <* 0.05; replicating previous findings, for review see Eisenberg and Lennon, 1983), and (b) at the same time, we found that authenticity discrimination was also better in females. Including participant sex in a global ANOVA [sex *×* facial emotion *×* age group (children vs. adults)] on authenticity scores revealed a significant interaction between sex and face emotion, *F*(2, 276) = 7.04, MSE = 0.076, *p* = 0.001. This interaction is illustrated in **Figure 7**, where it can be seen that females showed an advantage over males in authenticity discrimination for happy faces, but not sad or fear faces. Collapsing over age group [noting that the ANOVA showed no significant interactions involving sex and age group: 2-way sex *×* age, *F*(1, 138) = 0.56, MSE = 0.056, *p* = 0.456; 3-way sex *×* age *×* emotion, *F*(2, 276) = 2.29, MSE = 0.076, *p* = 0.103], there were no sex differences for either sad or fear expressions, both *p*s *>* 0.498. However, for happy, the female advantage was significant, *M*females = 0.90, *M*males = 0.70, *t*(75.6; equal variances not assumed) = 3.94, *p <* 0.001. This raises the possibility that the significant empathy correlations for adults were in fact driven by sex differences. To rule out this possibility, we re-ran correlations using only female participants. (There were insufficient males in our adult sample to look at males separately.)
These female-only analyses (right side of **Table 4**) replicated the finding of a significant relationship between authenticity discrimination for happy expressions and empathy in adults (affective empathy, τ = 0.277, *p* = 0.040, cognitive empathy, τ = 0.334, *p* = 0.015), and not in children, indicating that our empathy findings were not due to sex effects.
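The fractional degrees of freedom reported for tests with "equal variances not assumed" (e.g., 75.6) are characteristic of Welch's unequal-variance *t*-test. The sketch below assumes that test with the Welch-Satterthwaite approximation for the degrees of freedom. The function name and scores are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = variance(group_a) / n_a  # squared standard error of each mean
    var_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical happy-face accuracy scores for female and male participants.
females = [1.00, 0.75, 1.00, 1.00, 0.75, 1.00, 1.00]
males = [0.50, 1.00, 0.25, 0.75, 1.00]
t_stat, df = welch_t(females, males)
print(f"t({df:.1f}) = {t_stat:.2f}")
```

The degrees of freedom shrink toward the size of the noisier group as the two variances diverge, which is why they are generally not whole numbers.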

Overall, empathy results indicate that individual differences in empathy were correlated with ability to discriminate authenticity for happy expressions in adults, but not in children, and not for sad expressions in adults (with fear correlations in both age groups, and sad correlations in children, not analysable due to the chance performance).

#### **TABLE 4 | Correlations between empathy and authenticity discrimination scores (for emotions where mean performance was above chance).**

*Empathy was measured in children using the GEM and in adults using the BES. Correlations for fear for both age groups and for sad for children are not presented because mean performance was at chance and, correspondingly, internal reliability was extremely low. <sup>1</sup>Kendall's* τ *is reported for correlations involving happy authenticity discrimination scores, because this variable was strongly negatively skewed. Pearson's* r *is reported for all other correlations. <sup>2</sup>Correlations with the GEM cognitive subscale and the GEM positively-valenced subscale should be interpreted with caution due to the low reliability of these subscales, but are presented here for completeness. <sup>3</sup>Positively-valenced and negatively-valenced scores were calculated by summing together all items from the* total *scales (because there were too few items if we did so for the affective and cognitive subscales separately) that referred to positive and negative emotions respectively (see* **Table 1***), excluding all items for which emotional valence was not specified. \*p < 0.05, \*\*p < 0.01.*

### **Discussion**

The present study is the first to test children's ability to discriminate the authenticity of facial expressions for any basic emotions beyond happy, using stimuli in which the genuinely-felt or posed nature of the underlying emotion is known from the self-reports of the person appearing in the photograph. Overall, our results indicate that 8–12-year-olds have some ability to discriminate authenticity in facial expressions, but are immature relative to adults both in their performance level and in the strategies they use to achieve that performance. For happiness, in which genuinely-felt expressions were more intense than posed expressions, children were able to successfully discriminate expression authenticity from the youngest age tested (i.e., 8 year olds), but they did not perform as well as adults, and showed no improvement in this ability over the 8–12 year old age range. When genuinely-felt facial expressions were not more intense than the posed versions for our sad and fear stimuli, then children failed to discriminate authenticity, whereas adults could for sad expressions, arguing that the sad expressions included some other cue or cues to authenticity that children failed to use. Overall, children appeared to judge authenticity exclusively based on intensity of the expression, for all three expressions. In contrast, adults used intensity combined with other factor/s, which for happy expressions may include empathic responses.

### **Ruling Out Uninteresting Interpretations of Children's Poor Performance**

Before proceeding to discuss theoretically interesting interpretations of the differences between children and adults, it is important to rule out uninteresting possibilities. This arises particularly for the differences between children and adults in overall performance level (rather than for evidence concerning different strategies).

First, children's poorer authenticity discrimination, found for happy and sad, could not be explained by any inability to label the expressions. Children were as accurate at labeling happy and sad expressions as adults.

Second, children's poor performance cannot be explained by failure to understand the task instructions. We verified that all children understood the meaning of pretending as distinct from genuinely-felt emotion, plus children's above-chance performance for happy expressions shows they were able to correctly follow the instructions to choose the face that was "just pretending."

Finally, it is important to evaluate whether children's poor performance might be attributed to factors associated with general cognitive development that might lower laboratory task performance independent of difficulties in perceiving authenticity. As argued previously for other face tasks (e.g., McKone et al., 2012), such factors could include greater distractibility in younger children (i.e., poorer concentration on the task) or, in the present design, difficulties with remembering the *order* in which two items were presented. For the present findings, a number of observations argue against such an interpretation. Concerning order, previous studies have shown that the type of sequential task we used (participants have to remember which item was first and which second, and indicate their choice after a short delay) can easily be performed by children even at the younger end of our age range, when the perceptual discrimination between the two items is straightforward (e.g., accuracy *>*90% for 9-year-olds in remembering the order of faces displaying happy, sad and fear expressions; Pollak and Kistler, 2002). Concerning distractibility and other attention-related factors, if these factors were responsible for our children's poor performance, then we would have expected to see authenticity-task performance improve significantly across the 8–12 year age range (because it is plausible that distractibility decreases across this age range); yet this was not the case. Further, distractibility and other attention-related factors would be expected to lower children's slope in our plots of authenticity performance against relative intensity (**Figure 6D**), because lapses in attention would increase random errors in responses, pushing both ends of the line toward chance (0.5) and thus decreasing its slope; yet, again, this was not the case (i.e., children's sensitivity to intensity was no weaker than adults').

Overall, we argue that our results do not reflect difficulties with task demands, but instead indicate that children aged 8–12 years have poor actual ability to discriminate authenticity in the faces.

### **What Cues Do Children Use to Discriminate Authenticity?**

Our results argue that 8–12 year old children use immature strategies relative to adults to determine authenticity of facial emotion, relying only on intensity and not other additional cues.

Concerning intensity, we found that the children were as sensitive as adults to intensity as a cue to authenticity. This early sensitivity to intensity is in agreement with the two previous studies (Del Giudice and Colle, 2007; Thibault et al., 2009), which showed that children use intensity to judge facial expression authenticity (and rely on it as much as or more than adults) from as young as 4 years of age. Importantly, these previous studies tested only happy faces, and the present study has replicated and extended this result to also include negative-valence facial expressions.

Beyond intensity, we found no evidence that 8–12-year-olds could use any additional strategies. Concerning reliable-AUs, for sad AU1+4 was present in the genuine versions of our stimuli and absent in the pretend versions, yet children could not discriminate authenticity for sad faces above chance (contrasting with evidence that adults can use AU1+4 to determine authenticity; McLellan et al., 2010; but see Mehu et al., 2012). With respect to happy, our correlations with intensity across all three expressions (**Figure 6D**) are not inconsistent with ideas that children may potentially use AU6 but (in contrast to the conclusion of Thibault et al., 2009) suggest AU6 affects children's authenticity judgements only to the extent that *the presence of AU6 increases the intensity* of happy expressions. The idea that 8–12-year-olds do not yet use reliable-AUs effectively is also consistent with results for happy from Del Giudice and Colle (2007), who found 8-year-olds, unlike adults, interpreted a facial action that is similar to AU6, AU7 (the "lid tightener"), as signaling authentic happiness. We also found no suggestion that children used any strategies related to empathy (as potentially used by adults).

Overall, the results of the present study converge with previous findings to support a theoretical view in which intensity is the primary or only cue that elementary or primary-school aged children use to judge facial expression authenticity, and that the difference between adults and children in authenticity-discrimination ability arises because adults develop extra strategies in addition to intensity that emerge later in development (i.e., after 12 years of age).

An important question, then, becomes *why* is it that intensity would emerge earlier during development than other strategies? Concerning why intensity is learned *early*, one possibility might be that (a) intensity might have particular real-world value as an authenticity cue for happy expressions (i.e., more intense smiles are more likely to be genuine; Hess et al., 1995; Krumhuber and Manstead, 2009), combined with (b) children might have more opportunity to learn about the authenticity of happy than other emotions in early life, due to explicit instruction from parents (even young children are taught to pretend happiness in keeping with social norms, e.g., "smile for the camera") and/or having regular opportunities to observe genuine-pretend contingencies (i.e., their parents display intense happy expressions in response to something funny, and less intense happy expressions when politely greeting a disliked relative). Indeed, it may then be that initial learning of the value of intensity for determining genuineness of happy faces is extended by children (and adults) to a use of this cue for other emotions including, potentially, to emotions where intensity is in fact *not* a valid cue to authenticity (as for our sad and fear stimuli in the present laboratory study; and as indeed could occur in the real world where, for example, we know of no evidence as to whether genuinely-felt sad or fear expressions are typically more, or less, intense than their pretend counterparts).

Concerning why it is that other facial cues to authenticity (e.g., reliable-AUs, arousal cues) are learned *later* in development, a plausible possibility is that these are simply less physically obvious than intensity, resulting in young children either failing to *perceive* these more subtle cues or, perhaps, correctly perceiving them but failing to have learnt what they *mean* (e.g., 6–7 year olds do not consciously know the AU6 display rule; that is, "wrinkles around the eyes mean someone is really feeling happy," Gosselin et al., 2002b).

### **What Additional Cues Do Adults Use to Discriminate Authenticity?**

Our results argue that adults use extra strategies in addition to intensity information. Concerning the nature of these strategies, the present study investigated correlations with empathy, and found evidence potentially consistent with adults using empathy-related strategies for determining authenticity of happy faces (but not sad faces). Concerning the correlation with affective empathy, theoretically, we suggest that participants might be able to use awareness of their own empathic response to the faces (e.g., Manera et al., 2013) to help judge authenticity. This is on the assumption that affective responses to genuine expressions might be stronger than to pretend ones, and this increased response to genuine faces might be greater in individuals with high affective empathy than in individuals with lower affective empathy. This reasoning predicts a positive correlation between affective empathy and authenticity discrimination ability, as we found. Concerning the correlation with cognitive empathy, a positive correlation (as we found) is predicted if we assume that cognitive empathy might include knowledge of what physical cues are indicative of expression authenticity (e.g., that AU6 signals genuine happiness). (Importantly, we note that both these ideas make an assumption about the direction of causation, namely that empathy is causing strategies that assist with authenticity discrimination. It is, of course, equally possible that the correlation between the two variables could reflect an opposite-direction causality, in which individuals who are better at recognizing authentic emotions in others' faces go on to develop higher empathy as a result, at least by the time they are adults.)

Regarding other possible strategies, our results suggest that adults use reliable-AU1+4 for sad, given that: this AU was present in our genuine stimuli and absent from the posed versions; and adult authenticity performance was above chance at the same time that intensity of the genuine and pretend versions was equal and there was no association with empathy. This adds to earlier evidence that adults use reliable-AUs (AU6 for happy, Del Giudice and Colle, 2007, Thibault et al., 2012) or proposed reliable-AUs for other emotions (e.g., AU15 for panic fear, Mehu et al., 2012) to judge authenticity. Additionally, adults may also use other physical cues within faces that have been proposed to differ between genuine and posed expressions (e.g., signs of physical arousal such as pupil dilation or skin "blushing," Levenson, 2014; or symmetry, Ekman, 2003).

### **Why Can't Even Adults Discriminate the Authenticity of the Fear Expressions?**

In our study, neither adults nor children could discriminate the authenticity of the fearful expression stimuli. This finding agrees with one of the two previous (adult) studies of fear stimuli generated using the same Miles/McLellan method (Douglas et al., 2012, obtained A' = 0.48 where chance is 0.5) and is not very different from the other (which found significant but weak discrimination ability; A' = 0.61, McLellan et al., 2010). We suggest two possible explanations for poor authenticity discrimination of the fear expressions.

One idea is that, while it may be adaptive to discriminate authenticity of most emotions (including in the present context, happy and sad), for fear "the negative consequences of failing to detect (fear) and then avoid (the cause of that fear) perhaps render even close approximations of fear signals as real" (McLellan et al., 2010, p. 1285). That is, it may be that it is adaptive to treat all fear expressions as if they are genuine (i.e., in everyday life, this could allow a person to rapidly avoid danger without waiting for a more time-consuming analysis of authenticity to be completed).

Alternatively, the inability to tell apart genuine and posed fear could reflect physical characteristics of the particular Miles/McLellan stimuli. These might fail to match real-world genuine fear faces in at least two ways. First, as noted, reliable-AUs for fear have not been empirically validated (although suggestions have been made by Ekman, 2003), and thus it is not known whether the fear stimuli included reliable-AUs (should they exist). Second, the fear faces are probably only modest in terms of the underlying strength of the emotion felt by the person shown in the stimulus photograph, meaning that the potential for the presence of other physical cues to genuineness (particularly arousal cues, i.e., pupil dilation, skin tone changes, etc.) may be limited. This is an intrinsic limitation of any fear face stimuli created in a laboratory setting. It is difficult to invoke a very strong feeling of fear in the lab: for somebody to feel strong fear, they must believe there is real danger, and it is not ethical or practical to, for example, release a tiger into the lab, or to set off a bomb. By comparison, it is much easier to induce strong underlying emotions of happiness in the lab (e.g., there are no ethical problems with making somebody laugh hilariously).

### **Limitations**

The present study has a number of limitations on scope, with corresponding implications for generalisability of the results. Perhaps most significantly, we used a small sample of stimuli (from McLellan and colleagues, personal communication, 2011). We chose to do this because we are unaware of any other stimulus sets meeting the core criteria we wanted our stimuli to meet: a set containing happy, sad and fear; in which the same model displays both genuine and pretend versions; for which it has been verified by self-report of the people photographed that their underlying emotions were indeed genuinely-felt, or pretended, respectively; and for which FACS coding confirmed the presence of empirically-supported reliable-AU markers in the genuine version (i.e., AU6 for happy and AU1+4 for sad). This small set of stimuli, however, is limited in four important ways. Regarding *intensity*, the genuine sad and fear faces are only moderate in expression intensity (**Figure 5**), and likely correspond to substantially lower intensity of the underlying emotion than would occur in some real-world situations (e.g., bereavement; a terrorist attack); thus, we cannot rule out that above-chance authenticity discrimination might emerge for sad and fear expressions in children (and for fear in adults), if the genuine expression reflects a more intense underlying emotion (even if the pretend expression is also high-intensity to match).
Concerning *age of the faces*, all images showed adults, whereas children may have more experience with other children displaying genuine versus pretend sadness (e.g., siblings faking tears to get sympathy from a parent). We also used *static photographs*, whereas real-world facial expressions are dynamic (over a few hundred milliseconds, the expression begins to appear, reaches its maximum intensity, and then disappears again) and include additional physical differences between genuinely-felt and pretend emotions (e.g., genuine expressions have smoother onset and offset than posed ones; e.g., Schmidt et al., 2006). Thus we cannot rule out the possibility that authenticity discrimination could improve if movie-images, or children's faces, were used. Concerning the *number* of stimuli, it is possible that using a larger set may increase statistical power. Power alone seems unlikely to account for our finding that children could not discriminate sad authenticity above chance (given that the mean performance was 46% with a large number of child participants, i.e., *n* = 85). However, the y-intercept from our intensity analysis was numerically above 0.5 for children (i.e., y-intercept = 0.56). Potentially, additional stimuli might reduce the SE of the y-intercept value and thus increase the chances of finding evidence that children make some use of strategies additional to intensity (i.e., y-intercept significantly above 0.5).

It is also worth noting that the present study has only tested children's ability to make *explicit* decisions about authenticity. It would also be of interest to know whether children show differential *implicit* responses to genuine and posed facial expressions (as has been found in adults; e.g., Peace et al., 2006; Miles and Johnston, 2007). Potentially, differences in implicit behavior might emerge earlier than explicit knowledge; for example, even young children might show greater willingness to help a person displaying genuine than posed sadness.

### **Conclusion**

Our study has provided the first test of authenticity discrimination in children for facial expressions of basic emotions beyond happy (i.e., also sad and fear), including the first examination of use of intensity as a cue to authenticity across this broader range of emotions, plus the first test of relationships with empathy. Our results imply that authenticity discrimination from facial expressions matures surprisingly late in development, specifically some time during the teenage years, with children aged 8–12 having developed adult-like use of expression intensity as a cue to authenticity, but failing to show significant use of skills related to reliable-AUs (for sad) and empathy (for happy). This late maturity of authenticity discrimination ability for facial expressions suggests it will be important in future research to ascertain how its development impacts on social skills, such as friendship formation and maintenance, during the late primary school and teenage years.

### **Acknowledgments**

Funded by Australian Research Council Discovery Project grant to EM, RP, and RO'K (DP110100850), an Australian Research Council Queen Elizabeth II Fellowship to EM (DP0984558), and an Australian Postgraduate Award to AD (1166/2009). This work was also supported by the Australian Research Council Centre of Excellence for Cognition and its Disorders (CE110001021) http://www.ccd.edu.au. We thank Dr Tracey McLellan for supplying the stimuli, and Dr Jessica Irons and Emma Cummings for some participant testing.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Dawel, Palermo, O'Kearney and McKone. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Angry facial expressions bias gender categorization in children and adults: behavioral and computational evidence

Laurie Bayet<sup>1,2</sup>\*, Olivier Pascalis<sup>1,2</sup>, Paul C. Quinn<sup>3</sup>, Kang Lee<sup>4</sup>, Édouard Gentaz<sup>1,2,5</sup> and James W. Tanaka<sup>6</sup>

<sup>1</sup> Laboratoire de Psychologie et Neurocognition, University of Grenoble-Alps, Grenoble, France, <sup>2</sup> Laboratoire de Psychologie et Neurocognition, Centre National de la Recherche Scientifique, Grenoble, France, <sup>3</sup> Department of Psychological and Brain Sciences, University of Delaware, Newark, DE, USA, <sup>4</sup> Dr. Eric Jackman Institute of Child Study, University of Toronto, Toronto, ON, Canada, <sup>5</sup> Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland, <sup>6</sup> Department of Psychology, University of Victoria, Victoria, BC, Canada

#### Edited by:

Bozana Meinhardt-Injac, Johannes Gutenberg University Mainz, Germany

#### Reviewed by:

Irene Leo, Università degli Studi di Padova, Italy; Elisabeth M. Whyte, Pennsylvania State University, USA

#### \*Correspondence:

Laurie Bayet, Laboratoire de Psychologie et Neurocognition, Centre National de la Recherche Scientifique, UMR 5105, Université Grenoble-Alpes, Bâtiment Sciences de l'Homme et Mathématiques, BP47, Grenoble 38040, France laurie.bayet@upmf-grenoble.fr

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

> Received: 06 February 2015 Accepted: 11 March 2015 Published: 26 March 2015

#### Citation:

Bayet L, Pascalis O, Quinn PC, Lee K, Gentaz É and Tanaka JW (2015) Angry facial expressions bias gender categorization in children and adults: behavioral and computational evidence. Front. Psychol. 6:346. doi: 10.3389/fpsyg.2015.00346

Angry faces are perceived as more masculine by adults. However, the developmental course and underlying mechanism (bottom-up stimulus driven or top-down belief driven) associated with the angry-male bias remain unclear. Here we report that anger biases face gender categorization toward "male" responding in children as young as 5–6 years. The bias is observed for both own- and other-race faces, and is remarkably unchanged across development (into adulthood) as revealed by signal detection analyses (Experiments 1–2). The developmental course of the angry-male bias, along with its extension to other-race faces, combine to suggest that it is not rooted in extensive experience, e.g., observing males engaging in aggressive acts during the school years. Based on several computational simulations of gender categorization (Experiment 3), we further conclude that (1) the angry-male bias results, at least partially, from a strategy of attending to facial features or their second-order relations when categorizing face gender, and (2) any single choice of computational representation (e.g., Principal Component Analysis) is insufficient to assess resemblances between face categories, as different representations of the very same faces suggest different bases for the angry-male bias. Our findings are thus consistent with stimulus- and stereotyped-belief driven accounts of the angry-male bias. Taken together, the evidence suggests considerable stability in the interaction between some facial dimensions in social categorization that is present prior to the onset of formal schooling.

Keywords: face, emotion, gender, children, representation, stereotype

### Introduction

Models of face perception hypothesize an early separation of variant (gaze, expression, speech) and invariant (identity, gender, and race) dimensions of faces in a stage called structural encoding (Bruce and Young, 1986; Haxby et al., 2000). Structural encoding consists of the abstraction of an expression-independent representation of faces from pictorial encodings or "snapshots." This results in the extraction of variant and invariant dimensions that are then processed in a hierarchical arrangement where invariant dimensions are of a higher order than the variant ones (Bruce and Young, 1986).

Facial dimensions, however, interact during social perception. Such interactions may have multiple origins, with some but not all requiring a certain amount of experience to develop. First, they may be entirely stimulus-driven or based on the coding of conjunctions of dimensions at the level of single neurons (Morin et al., 2014). Second, the narrowing of one dimension (Kelly et al., 2007) may affect the processing of another. For example, O'Toole et al. (1996) found that Asian and Caucasian observers made more mistakes when categorizing the gender of other-race vs. own-race faces, indicating that experience affects not only the individual recognition of faces (as in the canonical other-race effect, Malpass and Kravitz, 1969), but a larger spectrum of face processing abilities. Third, perceptual inferences based on experience may cause one dimension to cue for another, as smiling does for familiarity (Baudouin et al., 2000). Finally, it has been suggested that dimensions interact based on beliefs reflecting stereotypes, i.e., beliefs about the characteristics of other social groups. For example, Caucasian participants stereotypically associate anger with African ethnicity (Hehman et al., 2014). This latter, semantic kind of interaction was predicted by Bruce and Young (1986), who postulated that (1) semantic processes feed back to all stages of face perception, and (2) all invariant dimensions (such as race, gender) are extracted, i.e., "visually-derived," at this semantic level. More generally, prejudice and stereotyping may profoundly influence even basic social perception (Johnson et al., 2012; Amodio, 2014) and form deep roots in social cognition (Contreras et al., 2012). Data on the development of these processes have reported an onset of some stereotypical beliefs during toddlerhood (Dunham et al., 2013; Cogsdill et al., 2014) and an early onset of the other-race effect in the first year of life (Kelly et al., 2007, 2009).

One observation that has been interpreted as a top-down effect of stereotyping is the perception of angry faces as more masculine (Hess et al., 2004, 2005, 2009; Becker et al., 2007), possibly reflecting gender biases that associate affiliation with femininity and dominance with masculinity (Hess et al., 2007). Alternatively, cues for angry expressions and masculine gender may objectively overlap, biasing human perception at a bottom-up level. Using a forced-choice gender categorization task with signal detection analyses and emotional faces in adults (Experiment 1) and children (Experiment 2), and several computational models of gender categorization (Experiment 3), we aimed to (1) replicate the effect of anger on gender categorization in adults, (2) investigate its development in children, and (3) probe possible bases for the effect by comparing human performance with that of computational models. If the bias is purely driven by top-down beliefs, then computational models would not be sensitive to it. However, if the bias is driven by bottom-up stimulus-based cues, then we expect computational models to be sensitive to such objective cues. To investigate the impact of different facial dimensions on gender categorization, both own-race and other-race faces were included as stimuli, the latter corresponding to a more difficult task condition (O'Toole et al., 1996).

### Experiment 1: Gender Categorization by Adults

To assess whether emotional facial expressions bias gender categorization, adults categorized the gender of 120 faces depicting unique identities that varied in race (Caucasian, Chinese), gender (male, female), and facial expression (angry, smiling, neutral). We hypothesized that the angry expression would bias gender categorization toward "male," and that this effect might be different in other-race (i.e., Chinese in the present study) faces that are more difficult to categorize by gender (O'Toole et al., 1996).

## Materials and Methods

### Participants and Data Preprocessing

Twenty-four adult participants (mean age: 20.27 years, range: 17–24 years, 4 men) from a predominantly Caucasian environment participated in the study. All gave informed consent and had normal or corrected-to-normal vision. The experiment was approved by the local ethics committee ("Comité d'éthique des centres d'investigation clinique de l'inter-région Rhône-Alpes-Auvergne," Institutional Review Board). Two participants were excluded due to extremely long reaction times (mean reaction time more than 2 standard deviations from the group mean). Trials with a reaction time below 200 ms or above 2 standard deviations from each participant's mean were excluded, resulting in the exclusion of 4.70% of the data points.
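The trial-level exclusion rule can be illustrated with a short sketch. The Python code below is an illustration only and is not the authors' code. The function name `filter_trials`, the reading of "above 2 standard deviations" as an upper bound at the participant's mean plus 2 SD, and the choice to compute the mean and SD over all of a participant's trials before any exclusion are assumptions.

```python
from statistics import mean, stdev


def filter_trials(rts_ms, floor_ms=200.0, n_sd=2.0):
    """Keep reaction times (ms) that pass both exclusion criteria.

    Assumption: the upper bound is mean + n_sd * SD, computed over all
    of one participant's trials before any trial is removed.
    """
    m = mean(rts_ms)
    sd = stdev(rts_ms)
    upper_ms = m + n_sd * sd
    # Drop anticipatory responses (< floor_ms) and very slow responses.
    return [rt for rt in rts_ms if floor_ms <= rt <= upper_ms]


# Hypothetical reaction times for one participant.
example_rts = [150, 500, 520, 480, 510, 490, 505, 495, 515, 2000]
kept = filter_trials(example_rts)
excluded_pct = 100 * (len(example_rts) - len(kept)) / len(example_rts)
print(kept, excluded_pct)
```

Excluded trials would then be treated as missing values rather than replaced, which is one reason a mixed-model analysis is convenient for such data.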

### Stimuli

One hundred twenty face stimuli depicting unique identities were selected from the Karolinska Directed Emotional Face database (Lundqvist et al., 1998; Calvo and Lundqvist, 2008), the Nim-Stim database (Tottenham et al., 2002, 2009), and the Chinese Affective Picture System (Lu et al., 2005) database in their frontal view versions. Faces were of different races (Caucasian, Chinese), genders (female, male), and expressions (angry, neutral, smiling). Faces were gray scaled and placed against a white background; external features were cropped using GIMP. Luminance, contrast, and placement of the eyes were matched using SHINE (Willenbockel et al., 2010) and the Psychomorph software (Tiddeman, 2005, 2011). Emotion intensity and recognition accuracy were matched across races and genders and are summarized in Supplementary Table 1. See **Figure 1A** for examples of the stimuli used. Selecting 120 emotional faces depicting unique identities for the high validity of their emotional expressions might lead to a potential selection bias, e.g., the female faces that would display anger most reliably might also be the most masculine female faces. To resolve this issue, a control study (Supplementary Material) was conducted in which gender typicality ratings were obtained for the neutral poses of the same 120 faces. See **Figure 1B** for examples of the stimuli used in the control study.

### Procedure

Participants were seated 70 cm from the screen. Stimuli were presented using E-Prime 2.0 (Schneider et al., 2002). A trial began with a 1000–1500 ms fixation cross, followed by a central face subtending a visual angle of about 7 by 7°. Participants completed a forced-choice gender-categorization task: they categorized each face as either male or female using different keys, and the mapping between keys and gender responses was counterbalanced across participants. The face remained on the screen until the participant responded. Participant response time and accuracy were recorded for each trial.

Each session began with 16 training trials with 8 female and 8 male faces randomly selected from a different set of 26 neutral frontal view faces from the Karolinska Directed Emotional Face database (Lundqvist et al., 1998; Calvo and Lundqvist, 2008). Each training trial concluded with feedback on the participant's accuracy. Participants then performed 6 blocks of 20 experimental trials, identical to training trials but without feedback. Half of the blocks included Caucasian faces and half included Chinese faces. Chinese and Caucasian faces were randomly ordered across those blocks. The blocks alternated (either as Caucasian-Chinese-Caucasian… or as Chinese-Caucasian-Chinese…, counterbalanced across participants), with 5 s mandatory rest periods between blocks.

### Data Analysis

Analyses were conducted in Matlab 7.9.0529 and R 2.15.2. Accuracy was analyzed using a binomial Generalized Linear Mixed Model (GLMM) approach (Snijders and Bosker, 1999) provided by R packages lme4 1.0.4 (Bates et al., 2013) and afex 0.7.90 (Singmann, 2013). This approach is robust to missing (excluded) data points and is more suited to binomial data than an Analysis of Variance, which assumes normality and homogeneity of the residuals. Accuracy results are presented in the Supplementary Material (Supplementary Figure 1, Supplementary Tables 2, 3). Inverted reaction times from correct trials were analyzed using a Linear Mixed Model (LMM) approach (Laird and Ware, 1982) with the R package nlme 3.1.105 (Pinheiro et al., 2012). Inversion was chosen over the logarithm as a variance-stabilizing transformation because it led to better homogeneity of the residuals. Mean gender typicality ratings obtained in a control study (Supplementary Material) were included as a covariate in the analysis of both accuracy and reaction times. Finally, signal detection theory parameters (d′, c-bias) were derived from the accuracies of each participant for each condition using the female faces as "signal" (Stanislaw and Todorov, 1999), and then analyzed using repeated measures ANOVAs. Because female faces were used as the "signal" category in the derivation, the conservative bias (c-bias) is equivalent to a male bias. Data and code are available online at http://dx.doi.org/10.6084/m9.figshare.1320891.
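As a rough illustration of the signal detection step, the sketch below derives d′ and the criterion c from response counts, treating female faces as "signal" so that a positive (conservative) c corresponds to a bias toward responding "male". It uses the standard formulas d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H is the hit rate and F the false alarm rate. The function name, the example counts, and the log-linear correction for rates of 0 or 1 are assumptions made for illustration; the text does not state which correction, if any, was applied, and this is not the authors' Matlab or R code.

```python
from statistics import NormalDist


def sdt_parameters(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, c) from response counts.

    hits:         female faces categorized as "female"
    false_alarms: male faces categorized as "female"
    Assumption: a log-linear correction (add 0.5 to each count and 1 to
    each total) keeps the z-transform finite when a rate is 0 or 1.
    """
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c


# Hypothetical counts for one participant in one condition.
d_prime, c = sdt_parameters(hits=6, n_signal=10, false_alarms=1, n_noise=10)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")  # c > 0 indicates a "male" bias
```

Separating d′ from c in this way is what allows a change in response bias to be distinguished from a change in the ability to tell male from female faces.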

### Results

#### Reaction Times

A Race-by-Gender-by-Emotion three-way interaction was significant in the best LMM of adult inverse reaction times (**Table 1**). It stemmed from (1) a significant Race-by-Emotion effect on male [χ<sup>2</sup>(2) = 6.48, p = 0.039] but not female faces [χ<sup>2</sup>(2) = 4.20, p = 0.123], due to an effect of Emotion on Chinese male faces [χ<sup>2</sup>(2) = 8.87, p = 0.012] but not Caucasian male faces [χ<sup>2</sup>(2) = 2.49, p = 0.288]; and (2) a significant Race-by-Gender effect on neutral [χ<sup>2</sup>(1) = 4.24, p = 0.039] but not smiling [χ<sup>2</sup>(1) = 3.31, p = 0.069] or angry [χ<sup>2</sup>(1) = 0.14, p = 0.706] faces. The former Race-by-Emotion effect on male faces was expected and corresponds to a ceiling effect on the reaction times to Caucasian male faces. The latter Race-by-Gender effect on neutral faces was unexpected and stemmed from an effect of Race in female [χ<sup>2</sup>(1) = 7.91, p = 0.005] but not male neutral faces [χ<sup>2</sup>(1) = 0.28, p = 0.600] along with the converse effect of Gender on Chinese [χ<sup>2</sup>(1) = 5.16, p = 0.023] but not Caucasian neutral faces [χ<sup>2</sup>(1) = 0.03, p = 0.872]. Indeed, reaction time for neutral female Chinese faces was relatively long, akin to that for angry female Chinese faces (**Figure 2B**) and unlike that for neutral female Caucasian faces (**Figure 2A**). Since there was no hypothesis regarding this effect, it will not be discussed further.

Importantly, the interaction of Gender and Emotion in reaction time was significant for both Caucasian [χ<sup>2</sup>(2) = 18.59, p < 0.001] and Chinese [χ<sup>2</sup>(2) = 19.58, p < 0.001] faces. However, further decomposition revealed that it had different roots in Caucasian and Chinese faces. In Caucasian faces, the interaction stemmed from an effect of Emotion on female [χ<sup>2</sup>(2) = 14.14, p = 0.001] but not male faces [χ<sup>2</sup>(2) = 2.49, p = 0.288]; in Chinese faces, the opposite was true [female faces: χ<sup>2</sup>(2) = 2.58, p = 0.276; male faces: χ<sup>2</sup>(2) = 8.87, p = 0.012]. Moreover, in Caucasian faces, Gender only affected reaction time to angry faces [angry: χ<sup>2</sup>(1) = 11.44, p = 0.001; smiling: χ<sup>2</sup>(1) = 0.59, p = 0.442; neutral: χ<sup>2</sup>(1) = 0.03, p = 0.872], whereas in Chinese faces, Gender affected reaction time regardless of Emotion [angry: χ<sup>2</sup>(1) = 25.90, p < 0.001; smiling: χ<sup>2</sup>(1) = 7.46, p = 0.029; neutral: χ<sup>2</sup>(1) = 5.16, p = 0.023].

TABLE 1 | Best LMM of adult inverse reaction time from correct trials.

The model also included a random intercept and slope for participants. Significant effects are marked by an asterisk.

FIGURE 2 | … (paired Student t-tests, p < 0.05, uncorrected). Top: Caucasian (A) and Chinese (B) female faces. Bottom: Caucasian (C) and Chinese (D) male faces.

The impairing effect of an angry expression on female face categorization was clearest on the relatively easy Caucasian faces, while a converse facilitating effect on male face categorization was most evident for the relatively difficult Chinese faces. The effect of Gender was largest for the difficult Chinese faces. The angry expression increased reaction times for Caucasian female faces (**Figure 2A**) and conversely reduced them for Chinese male faces (**Figure 2D**).

#### Sensitivity and Male Bias

A repeated measures ANOVA showed a significant Race-by-Emotion effect on both d′ (**Table 2**) and male-bias (**Table 3**).

Sensitivity was greatly reduced in Chinese faces (η<sup>2</sup> = 0.38, i.e., a large effect), replicating the other-race effect for gender categorization (O'Toole et al., 1996). Angry expressions reduced sensitivity in Caucasian but not Chinese faces (**Figures 3A,B**). Male bias was high overall, also replicating the finding by O'Toole et al. (1996). Here, in addition, we found that (1) the male bias was significantly enhanced for Chinese faces (η<sup>2</sup> = 0.35, another large effect), and (2) angry expressions also enhanced the male bias, as predicted, in Caucasian and Chinese faces (η<sup>2</sup> = 0.17, a moderate effect), although to a lesser extent in the latter (**Figures 3C,D**). Since Emotion affects the male bias but not sensitivity in Chinese faces, it follows that the effect of Emotion on the male bias is not solely mediated by its effect on sensitivity.

Tables 2 and 3: the ANOVA also included a random factor for the participants, along with its interactions with both Race and Emotion. Significant effects are marked by an asterisk.

Further inspection of the experimental effect on the hit rate (female trials) and false alarm rate (male trials) confirmed, however, that the overall performance was at ceiling on male faces, as repeated measures ANOVAs revealed a significant interactive effect of Race and Emotion on the hit rate [F(2, 42) = 12.71, p < 0.001, η<sup>2</sup> = 0.07] but no significant effect of Race, Emotion, or their interaction on the false alarm rate (all ps > 0.05). In other words, the effects of Race and Emotion on d′ and male bias were solely driven by performance on female faces. Accuracy results are presented in the Supplementary Material (Supplementary Figure 1, Supplementary Table 2).

#### Discussion

The effect of anger on gender categorization was evident on reaction time, as participants were (1) slower when categorizing the gender of angry Caucasian female faces, (2) slower with angry Chinese female faces, and (3) quicker with angry Chinese male faces. Interestingly, the angry expression reduced sensitivity (d′) of gender categorization in own-race (Caucasian), but not in other-race (Chinese) faces. In other words, angry expressions had two dissociable effects on gender categorization: (1) they increased difficulty when categorizing own-race faces, and (2) they increased the overall bias to respond "male."

The results are consistent with the hypothesis of a biasing effect of anger that increases the tendency to categorize faces as male. However, a ceiling effect on accuracy for male faces made it impossible to definitively support this idea. To firmly conclude in favor of a true bias, it should be observed that angry expressions both hinder female face categorization (as was observed) and enhance male face categorization (which was not observed). While a small but significant increase in accuracy for angry vs. happy Chinese male faces was observed (Supplementary Figure 1D), there was no significant effect on the false alarm rate (i.e., accuracy on male trials).

In contrast to the present results, O'Toole et al. (1996) did not report an enhanced male bias for other-race (Japanese or Caucasian) faces, although they did find an effect on d′ that was replicated here, along with an overall male bias. The source of the difference is uncertain. One possibility is that the greater difficulty of the task used in O'Toole et al. (a 75 ms presentation of each face followed by a mask) caused a male bias for own-race faces; another is that the enhanced male bias to other-race faces found in the present study does not generalize to all types of other-race faces. Finally, O'Toole et al. (1996) found that female participants displayed higher accuracy on a gender categorization task than male participants. However, the sample for the current study did not include enough male participants to allow us to analyze this possible effect.

### Experiment 2: Gender Categorization in Children

One way to understand the male bias is to investigate its development. There is a general consensus that during development we are "becoming face experts" (Carey, 1992) and that the immature face processing system present at birth develops with experience until early adolescence (Lee et al., 2013). If the angry-male bias develops through extensive experience of observing male aggression among peers, it follows that the bias should be smaller in children than in adults and should increase during the school years, a time period when children observe classmates (mostly males) engaging in aggressive acts, including fighting and bullying.

In Experiment 2, we conducted the same gender categorization task as in Experiment 1 with 64 children aged from 5 to 12. The inclusion of children in the age range from 5 to 6, as well as the testing of 7–8, 9–10, and 11–12 year-olds, is important from a developmental perspective. Experiment 2 should additionally allow us to (1) overcome the ceiling effect on gender categorization for male faces that was observed in Experiment 1 (as children typically perform worse than adults in gender categorization tasks, e.g., Wild et al., 2000), and (2) determine the developmental trajectory of the biasing effect of anger in relation to increased experience with processing own-race (Caucasian) but not other-race (Chinese) faces. While facial expression perception also develops over childhood and even adolescence (Herba and Phillips, 2004), recognition performance for own-race expressions of happiness and anger has been reported to be at ceiling from 5 years of age (Gao and Maurer, 2010; Rodger et al., 2015).

### Methods

### Participants and Preprocessing

Thirteen 5–6 year-olds (9 boys), 16 7–8 year-olds (3 boys), 15 9–10 year-olds (9 boys), and 14 11–12 year-olds (3 boys) from a predominantly Caucasian environment were included in the final sample. These age groups were chosen a priori due to the minimal need to re-design the experiment: children from 5 to 6 years of age can already complete computer tasks and follow directions. A range of age groups was then selected from 5 to 6 years old onwards, covering the developmental period from middle to late childhood, and the time when children begin formal schooling. The experiment was approved by the University of Victoria Human Research Ethics Board and informed parental consent was obtained. Six additional participants were excluded due to non-compliance (n = 1) or very slow reaction times for their age (n = 5). Additionally, individual trials were excluded if the reaction time was extremely short (less than 600, 500, 400, or 300 ms for 5–6 year olds, 7–8 year olds, 9–10 year olds, or 11–12 year olds, respectively) or more than 2 standard deviations away from the mean of the participant's own distribution. Such invalid trials were handled as missing values, leading to the exclusion of 11.35% of data points in the 5–6 year olds, 5.57% in the 7–8 year olds, 5.28% in the 9–10 year olds, and 4.88% in the 11–12 year olds. The cut-offs used to exclude trials with very short reaction times were selected graphically based on the distribution of reaction times within each age group.
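
The trial-exclusion rule described above can be sketched as follows. This is an illustrative Python sketch, not the authors' Matlab code. The function name `valid_trials`, the dictionary keys, and the order of the two steps (fast-response cut-off first, then the 2 SD criterion computed on the remaining raw reaction times) are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev

# Lower reaction-time cut-offs (ms) per age group, as reported in the text.
MIN_RT_MS = {"5-6": 600, "7-8": 500, "9-10": 400, "11-12": 300}


def valid_trials(rts_ms, age_group, n_sd=2.0):
    """Return the reaction times kept for one participant.

    Assumed order of operations: drop responses faster than the
    age-specific cut-off, then drop trials further than n_sd standard
    deviations from the mean of the participant's remaining trials.
    """
    floor = MIN_RT_MS[age_group]
    kept = [rt for rt in rts_ms if rt >= floor]
    if len(kept) < 2:
        return kept  # too few trials to estimate a spread
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= n_sd * sd]
```

Excluded trials would then be treated as missing values rather than replaced, as stated in the text.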

#### Stimuli, Procedure, and Data Analysis

Stimuli, task, procedure, and data analysis methods were identical to those of Experiment 1 except for the following: Participants were seated 50 cm from the screen so that the faces subtended a visual angle of approximately 11° by 11°. Due to an imbalance in the gender ratio across age groups, the participant's gender was included as a between-subject factor in the analyses. Data and code are available online at http://dx.doi.org/10.6084/m9.figshare.1320891.
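
For readers who wish to reproduce the display geometry, the relation between viewing distance and visual angle can be sketched as below. The function names are hypothetical, and the on-screen stimulus size is derived here from the reported angle and distance; it is not a value given in the text.

```python
import math


def stimulus_size_cm(visual_angle_deg, distance_cm):
    """Physical size of a stimulus subtending visual_angle_deg at distance_cm."""
    half_angle = math.radians(visual_angle_deg / 2.0)
    return 2.0 * distance_cm * math.tan(half_angle)


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Values from the text: 50 cm viewing distance, roughly 11 degrees per side.
side_cm = stimulus_size_cm(11.0, 50.0)  # a little under 10 cm per side
```

Under these assumptions, a face of about 9.6 cm per side viewed from 50 cm subtends approximately 11°.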

### Results

#### Reaction Times

There was a significant Race-by-Gender-by-Emotion interaction in the best linear mixed model (LMM) of children's inverse reaction times from correct trials (**Table 4**), along with a three-way Age-by-Gender-by-Participant gender interaction, an Age-by-Race-by-Emotion interaction, and a Participant gender-by-Gender-by-Emotion interaction.

#### TABLE 4 | Best LMM of children's inverted reaction times from correct trials.

The model also included a random intercept and slope for the participants. Significant effects are marked by an asterisk.

The interaction of Age, Gender, and Participant gender was due to a significant Gender-by-Participant gender interaction in the 11–12 year olds [χ<sup>2</sup>(1) = 6.19, p = 0.013], with no significant sub-effects (ps > 0.05). The interaction of Gender, Emotion, and Participant gender was due to the effect of Gender on angry faces reaching significance in female (female faces, inverted RT: (9.35 ± 3.67) × 10<sup>−4</sup> ms<sup>−1</sup>; male faces: (10.67 ± 3.51) × 10<sup>−4</sup> ms<sup>−1</sup>) but not male participants (female faces, inverted RT: (8.88 ± 3.24) × 10<sup>−4</sup> ms<sup>−1</sup>; male faces: (9.72 ± 3.26) × 10<sup>−4</sup> ms<sup>−1</sup>), although the effect had the same direction in both populations. Importantly, however, the overall Gender-by-Emotion interaction was significant in both male [χ<sup>2</sup>(2) = 7.44, p = 0.024] and female participants [χ<sup>2</sup>(2) = 52.41, p < 0.001]. The interaction of Race and Emotion with Age reflected the shorter reaction times of 5–6 year olds when categorizing the gender of Caucasian vs. Chinese smiling faces [χ<sup>2</sup>(2) = 7.40, p = 0.007], also evidenced by a significant Race-by-Age interaction for smiling faces only [χ<sup>2</sup>(3) = 10.11, p = 0.018]. Faster responses to smiling Caucasian faces by the youngest participants probably reflect the familiarity, or perception of familiarity, of these stimuli.

Finally, the interactive effect of Gender and Emotion on reaction times was significant in Caucasian [χ<sup>2</sup>(2) = 49.81, p < 0.001] but not Chinese faces [χ<sup>2</sup>(2) = 2.25, p = 0.325], leading to a Race-by-Gender-by-Emotion interaction. Further decomposition confirmed this finding: Race significantly affected reaction times for male [χ<sup>2</sup>(1) = 19.52, p < 0.001] but not female angry faces [χ<sup>2</sup>(1) = 1.86, p = 0.173], Gender affected reaction times for Caucasian [χ<sup>2</sup>(1) = 17.01, p < 0.001] but not Chinese angry faces [χ<sup>2</sup>(1) = 0.48, p = 0.489], and Emotion significantly affected the reaction times for Caucasian female [χ<sup>2</sup>(2) = 29.88, p < 0.001] but not Chinese female [χ<sup>2</sup>(2) = 3.82, p = 0.148] or male faces [χ<sup>2</sup>(2) = 5.13, p = 0.077].

Children were slower when categorizing the gender of angry vs. happy Caucasian female faces (**Figure 2A**), and slightly faster when categorizing the gender of angry vs. happy Caucasian male faces (**Figure 2C**). The interaction of Gender and Emotion was present in all participants but most evident in female participants. It was absent in Chinese faces. In other words, an angry expression slows gender categorization in own-race (Caucasian) but not in other-race (Chinese) faces.

### Sensitivity and Male Bias

ANOVAs with participant as a random factor showed a small but significant Race-by-Emotion interaction on sensitivity (d′, **Table 5**, η<sup>2</sup> = 0.02) and male-bias (c-bias, **Table 6**, η<sup>2</sup> = 0.03). The Race-by-Emotion interaction and its subcomponents did not interact with Age for either sensitivity or male-bias.
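
The two signal detection measures can be sketched as follows, assuming (as the later description of hit and false alarm rates suggests) that a hit is a "female" response on a female trial and a false alarm is a "female" response on a male trial. This is an illustrative Python sketch rather than the authors' Matlab code. The function name `sdt_measures` and the log-linear correction for extreme rates are assumptions, as the text does not state which correction, if any, was applied.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_measures(n_hits, n_female_trials, n_false_alarms, n_male_trials):
    """Return (d_prime, c) for one participant and condition.

    A hit is a "female" response to a female face; a false alarm is a
    "female" response to a male face. Positive c then indicates a bias
    toward the "male" response. The log-linear correction (add 0.5 to
    each count and 1 to each total) is an assumption made here to keep
    rates away from 0 and 1.
    """
    hit_rate = (n_hits + 0.5) / (n_female_trials + 1.0)
    fa_rate = (n_false_alarms + 0.5) / (n_male_trials + 1.0)
    d_prime = _z(hit_rate) - _z(fa_rate)
    c = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, c
```

With this convention, a condition that lowers both the hit rate and the false alarm rate raises c without necessarily changing d′, which is the signature of a bias toward the "male" response.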

Two additional effects on sensitivity (d′) can be noted (**Table 5**). First, there was a significant effect of Age as sensitivity increased with age (η<sup>2</sup> = 0.09). Second, there was an interactive effect of Emotion and Participant gender that stemmed from female participants having higher sensitivity than male participants on happy [F(1, 114) = 9.14, p = 0.003] and neutral [F(1, 114) = 18.39, p < 0.001] but not angry faces [F(1, 114) = 0.39, p = 0.533]. Emotion affected the overall sensitivity of both female [F(1, 102) = 21.07, p < 0.001] and male participants [F(1, 72) = 4.69, p = 0.014].
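
The effect sizes reported in Table 5 appear consistent with classical (non-partial) eta squared, i.e., the effect sum of squares divided by the total sum of squares. This is an inference from the reported numbers, not something the text states. A minimal check using sums of squares copied from Table 5:

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta squared: share of total variability attributed to an effect."""
    return ss_effect / ss_total


SS_TOTAL = 223.56  # "Total" row of Table 5

# Sums of squares copied from Table 5.
ss = {"Race": 28.32, "Emotion": 6.14, "Age": 21.04, "Race-by-emotion": 4.55}
effect_sizes = {name: round(eta_squared(v, SS_TOTAL), 2) for name, v in ss.items()}
# effect_sizes -> {'Race': 0.13, 'Emotion': 0.03, 'Age': 0.09, 'Race-by-emotion': 0.02}
```

The rounded values match the η<sup>2</sup> column for these four effects.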

#### TABLE 5 | ANOVA of d′ for children's gender categorization.

| Fixed effects | SS | d.f. | MS | F | p | η<sup>2</sup> |
| --- | --- | --- | --- | --- | --- | --- |
| Race\* | 28.32 | 1 | 28.32 | 80.59 | <0.001 | 0.13 |
| Emotion\* | 6.14 | 2 | 3.07 | 12.65 | <0.001 | 0.03 |
| Age\* | 21.04 | 3 | 7.01 | 6.40 | 0.001 | 0.09 |
| Participant gender | 4.15 | 1 | 4.15 | 3.79 | 0.057 | 0.02 |
| Race-by-emotion\* | 4.55 | 2 | 2.27 | 8.58 | <0.001 | 0.02 |
| Age-by-race | 2.56 | 3 | 0.85 | 2.42 | 0.076 | 0.01 |
| Age-by-emotion | 0.89 | 6 | 0.15 | 0.61 | 0.719 | <0.01 |
| Age-by-gender-by-emotion | 1.12 | 6 | 0.19 | 0.71 | 0.644 | 0.01 |
| Participant gender-by-race | 0.83 | 1 | 0.83 | 2.35 | 0.131 | <0.01 |
| Participant gender-by-emotion\* | 3.99 | 2 | 1.99 | 8.21 | 0.001 | 0.02 |
| Participant gender-by-gender-by-emotion | 0.36 | 2 | 0.18 | 0.68 | 0.511 | <0.01 |
| Age-by-participant gender | 3.63 | 3 | 1.21 | 1.10 | 0.356 | 0.02 |
| Error | 28.07 | 106 | 0.27 | | | |
| Total | 223.56 | 347 | | | | |

The ANOVA also included a random factor for the participants along with its interactions with both Race and Emotion. Significant effects are marked by an asterisk.

#### TABLE 6 | ANOVA of male-bias for children's gender categorization.

The ANOVA also included a random factor for participant, along with its interactions with both Race and Emotion. Significant effects are marked by an asterisk.

The pattern of the interactive effect for Race and Emotion was identical to that found in adults: anger reduced children's sensitivity (d′) to gender in Caucasian faces (**Figure 3A**), but not in the already difficult Chinese faces (**Figure 3B**). This pattern is remarkably similar to that found in reaction times. In contrast, anger increased the male-bias in Caucasian (**Figure 3C**) as well as Chinese faces (**Figure 3D**), although to a lesser extent in the latter category. In other words, the biasing effect of anger cannot be reduced to an effect of perceptual difficulty. Further analyses revealed that Race and Emotion affected the hit (female trials) and false alarm (male trials) rates equally, both as main and interactive effects [Race-by-Emotion effect on hit rate: F(2, 106) = 10.70, p < 0.001, η<sup>2</sup> = 0.02; on false alarm rate: F(2, 114) = 13.48, p < 0.001, η<sup>2</sup> = 0.03]. That is, the male-biasing effect of anger is evident by its interfering effect during female trials as well as by its converse facilitating effect during male trials. Accuracy results are presented in the Supplementary Material (Supplementary Figure 1, Supplementary Table 3).

These last observations are compatible with the idea that angry expressions bias gender categorization. The effect can be observed across all ages and even with unfamiliar Chinese faces, although in a diminished form. The biasing effect of anger toward "male" does not seem to depend solely on experience with a particular type of face and is already present at 5–6 years of age.

### Discussion

The results are consistent with a male-biasing effect of anger that is in evidence as early as 5–6 years of age and that is present, but less pronounced in other-race (Chinese) than in own-race (Caucasian) faces. The ceiling effect observed in Experiment 1 on the gender categorization of male faces (i.e., the false alarm rate) was sufficiently overcome so that the male-biasing effect of anger could be observed in male as well as female trials.

Participant gender interacted with Emotion on sensitivity and with Emotion and Gender on the reaction times of children. This finding partly replicates the finding by O'Toole et al. (1996) that female participants present higher face gender categorization sensitivity (d′) than male participants, particularly with female faces. Here, we further showed that in children, this effect is limited to neutral and happy faces, and does not generalize to angry faces.

It is perhaps surprising that anger was found to affect the male-bias on Chinese as well as Caucasian faces, but only affected sensitivity (d′) and reaction times on Caucasian faces. Two dissociable and non-exclusive effects of angry expressions may explain this result. First, angry expressions may be less frequent (e.g., Malatesta and Haviland, 1982), which would generally slow down and complicate gender categorization decisions for familiar (Caucasian) but not for the already unfamiliar (Chinese) faces. This effect is not a bias and should only affect sensitivity and reaction time. Second, angry expressions may bias gender categorization toward the male response by either lowering the decision criterion for this response (e.g., as proposed by Miller et al., 2010) or adding evidence for it. It naturally follows that such an effect should be evident on the male-bias (c-bias), but not on sensitivity. Should it be evident in reaction time, as we initially predicted? Even if a bias does not affect the overall rate of evidence accumulation, it should provide a small advantage on reaction time for "male" decisions, and conversely result in a small delay on reaction time for "female" decisions. While this effect would theoretically not depend on whether the face is relatively easy (own-race) or difficult (other-race) to categorize, it is possible that it would be smaller in other-race faces for two reasons: (1) the extraction of the angry expression itself might be less efficient in other-race faces, leading to a smaller male-bias; and (2) the small delaying or quickening effect of anger could be masked in the noisy and sluggish process of evidence accumulation for other-race faces.

Three possible mechanisms could explain the male-biasing effect of angry expressions: Angry faces could be categorized as "male" from the resemblance of cues for angry expressions and masculine gender, from experience-based (Bayesian-like) perceptual inferences, or from belief-based inferences (i.e., stereotype). Of interest is that the male-biasing effect of anger was fairly constant from 5 to 12 years of age. There are at least two reasons why the male-biasing effect of anger would already be present in adult form in 5–6 year olds: (1) the effect could develop even earlier than 5–6 years of age, or (2) it could be relatively independent of experience (age, race) and maturation (age). Unfortunately, our developmental findings neither refute nor confirm any of the potential mechanisms for a male-bias. Indeed, any kind of learning, whether belief-based or experience-based, may happen before the age of 5 years without further learning afterwards. For example, Dunham et al. (2013) evidenced racial stereotyping in children as young as 3 years of age using a race categorization task with ambiguous stimuli. Similar findings were reported on social judgments of character based on facial features (Cogsdill et al., 2014). Conversely, the resemblance of cues between male and angry faces would not necessarily predict a constant male-biasing effect of anger across all age groups: for example, the strategy used for categorizing faces based on gender may well vary with age so that the linking of cues happens at one age more than another because children use one type of cue more than another at some ages. For example, it has been established that compared to adults, children rely less on second-order relations between features for various face processing tasks, and more on individual features, external features, or irrelevant paraphernalia, with processing of external contour developing more quickly than processing of feature information (Mondloch et al., 2002, 2003).
Holistic processing, however, appears adult-like from 6 years of age onwards (Carey and Diamond, 1994; Tanaka et al., 1998; Maurer et al., 2002). Therefore, each age group presents a unique set, or profile, of face processing strategies that may be more or less affected by the potential intersection of cues between male and angry faces. Whichever mechanism or mechanisms come to be embraced on the basis of subsequent investigations, what our developmental findings do indicate is that the angry-male bias does not depend on children observing an association between male peers and aggression during the school-age years.

### Experiment 3: Computational Models of Gender Categorization

To determine if the effect of anger on gender categorization could be stimulus driven, i.e., due to the resemblance of cues for angry expressions and masculine gender, machine learning algorithms were trained to categorize the gender of the faces used as stimuli in Experiments 1–2. If algorithms tend to categorize angry faces as being male, as humans do, then cues for anger and masculinity are conjoined in the faces themselves and there should be no need to invoke experience- or belief-based inferences to explain the human pattern of errors.

### Methods

#### Stimuli

Stimuli were identical to those used in Experiments 1–2.

#### Different Computational Models

Analyses were run in Matlab 7.9.0529. The raw stimuli were used to train different classifiers (**Figure 4A**). The stimuli were divided into a training set and a test set that were used separately to obtain different measures of gender categorization accuracy (**Figure 4B**). Several models and set partitions were implemented to explore different types of training and representations (**Table 7**; **Figure 4A**).

Different types of representations (Principal Component Analysis, Independent Components Analysis, Sparse Autoencoder, and Hand-Engineered features; **Table 7**; **Figure 4A**) were used because each of them might make different kinds of information more accessible to the classifier; i.e., the cue-dimension relationship that drives human errors may be more easily accessible in one representation than another. Sparse auto-encoded representations are considered the most "objective" of these representations in contrast to other unsupervised representations (Principal Component Analysis, Independent Components Analysis) that use a specific, deterministic method for information compression. Conversely, hand-engineered features are the most "human informed" representation, since they were defined in Burton et al. (1993) using human knowledge about what facial features are (eyes, brows, mouth) and about the assumed importance of these features for gender categorization and face recognition. The choice of Principal Component Analysis as an unsupervised representation method (used in models A–C, and as a preprocessing step in models D–F) was motivated by the knowledge that PCA relates reliably to human ratings and performance (O'Toole et al., 1994, 1998) and has been proposed as a statistical analog of the human representation of faces (Calder and Young, 2005).

#### TABLE 7 | Representations, classifiers, and face sets used in the computational models of gender categorization.

All models included feature scaling of raw pixels as a first preprocessing step. Models based on Principal Component Analysis (PCA, models A–C) used the first 16 principal components for prediction (75% of variance retained). Models based on Independent Components Analysis (ICA, models D–F) used the Fast-ICA implementation for Matlab (Gävert et al., 2005) that includes PCA and whitening as a preprocessing step. Sparse representations (models G–I) were obtained using the sparse auto-encoder neural network implemented in the NNSAE Matlab toolbox (Lemme et al., 2012). A sparse autoencoder is a particular kind of neural network that aims to obtain a compressed representation of its input by trial and error. The hand-engineered features used in models J–L were the 11 full-face 2D-features and second-order relations identified in Burton et al. (1993) as conveying gender information (for example, eyebrow thickness, eyebrow to eye distance, etc.).

Most models used a logistic regression classifier because this method provides log-odds that were useful for human validation. Models D–F used the Support Vector Machine Classifier implementation from the SVM-KM toolbox for Matlab (Gaussian kernel, h = 1000, quadratic penalization; Canu et al., 2005) because in those models the problem was linearly separable (meaning that using logistic regression was inappropriate and would lead to poor performance).

Each model was trained on a set of faces (the training set, leading to the computation of training set accuracy), and then tested on a different set of faces (the test set, resulting in computation of test accuracy). Accuracy on the training sets was further evaluated using Leave-One-Out cross-validation (LOOCV), which is thought to reflect generalization performance more accurately than training accuracy. Accuracies at test and cross-validation (LOOCV) were pooled together for comparing the performance on (angry) female vs. male faces. See **Figure 4B** for a schematic representation of this set up.
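
The leave-one-out procedure can be sketched as follows. This is a minimal pure-Python illustration, not the authors' Matlab implementation: the gradient-descent logistic regression, its learning rate, iteration count, and small L2 penalty are assumptions chosen to keep the example self-contained, and the function names are hypothetical. Feature vectors stand in for any of the representations above (e.g., principal component scores or hand-engineered measurements), with label 1 for female and 0 for male.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_odds(w, x):
    """Linear evidence for class 1 ("female" here); negative favors class 0."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=500, l2=0.01):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(log_odds(w, xi)) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        for j in range(len(w)):
            penalty = l2 * w[j] if j > 0 else 0.0  # bias is not penalized
            w[j] -= lr * (grad[j] / len(X) + penalty)
    return w


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each face once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predicted = 1 if log_odds(w, X[i]) > 0 else 0
        correct += int(predicted == y[i])
    return correct / len(X)
```

Because each face is classified by a model that never saw it during training, the resulting accuracy is a less optimistic estimate of generalization than training accuracy.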

The partitioning of faces as training and test sets differed across the models (**Figure 4B**). The partitioning of models A, D, G, and J ("familiar") was designed to emulate the actual visual experience of human participants in Experiments 1–2. The partitioning for models B, E, H, and K ("full set") was designed to emphasize all resemblances and differences between faces equally without preconception. The partitioning for models C, F, I, and L ("test angry") was designed to maximize the classification difficulty of angry faces, enhancing the chance to observe an effect.

#### Human Validation

Gender typicality ratings from a control experiment (Supplementary Material) were used to determine how accurately each model captured the human perception of gender: the classifier should find the most gender-typical faces easiest to classify, and vice versa. Ratings from male and female faces from the training sets were z-scored separately, and the Pearson's correlation between those z-scored ratings and the linear log-odds output from each model at training was computed. The log-odds represent the amount of evidence that the model linearly accumulated in favor of the female response (positive log-odds) or in favor of the male response (negative log-odds). The absolute value of the log-odds was used instead of raw log-odds so that the sign of the expected correlation with gender typicality was positive for both male and female faces and one single correlation coefficient could be computed for male and female faces together. Indeed, the faces with larger absolute log-odds are those that the model could classify with more certainty as male or female: if the model adequately emulated human perception, such faces should also be found more gender typical by humans.
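
The validation measure described above can be sketched as follows. The function names and the toy inputs in the usage note are hypothetical, and the use of the sample standard deviation for z-scoring is an assumption (it mirrors the default of Matlab's `zscore`, but the text does not specify it).

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample standard deviation (assumed)."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def typicality_correlation(model_log_odds, ratings, is_female):
    """Correlate |log-odds| with ratings z-scored separately within each gender."""
    z = [0.0] * len(ratings)
    for flag in (True, False):
        idx = [i for i, f in enumerate(is_female) if f == flag]
        for i, zi in zip(idx, z_scores([ratings[i] for i in idx])):
            z[i] = zi
    return pearson_r([abs(v) for v in model_log_odds], z)
```

For instance, calling `typicality_correlation([2.0, 1.0, 0.5, -2.5, -1.0, -0.2], [6.5, 5.0, 4.0, 6.8, 5.5, 4.1], [True, True, True, False, False, False])` on these made-up values yields a strongly positive coefficient, because the faces classified with the most certainty are also the ones rated most gender-typical.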

Data and code are available online at http://dx.doi.org/10.6084/m9.figshare.1320891.

### Results

Results are summarized in **Table 8** below.

### Overall Classification Performance

Sparse-based models (**Table 8**, SAE, G–I) performed poorly (around 50% at test and cross-validation) and showed no correlation with human ratings, probably due to the difficulty of training this kind of network on relatively small training sets. Those models were therefore discarded from further discussion. PCA-based models (**Table 8**, PCA, A–C) on the other hand had satisfactory test (68.75–77.50%) and cross-validation (66.25–76.67%) accuracies, comparable to that of 5–6 year old children (Supplementary Figure 1). ICA- and SVM-based models (**Table 8**, ICA, D–F) performed, as expected, slightly better than models A–C at training (100%) and cross-validation (85%). However, performance at test (68.75–72.50%) was not better. Models based on hand-engineered features (**Table 8**, HE, J–L) had test and cross-validation performance in comparable ranges (62.50–76.67%), and their training accuracy (81.00–85.00%) was comparable to that of 85.5% reported by Burton et al. (1993) on a larger set of neutral Caucasian faces (n = 179). Most notably, the latter models all included eyebrow width and eye-to-eyebrow distance as significant predictors of gender.

#### Human Validation

Classification evidence (absolute log-odds) correlated with z-scored human ratings in 2 of the 3 models from the PCA-based model family (**Table 8**, A,B) as well as in 2 of the 3 models based on hand-engineered features (**Table 8**, K,L). The highest correlation (Pearson r = 0.46, p = 0.003) was achieved in model A that used PCA and a training set designed to emulate the content of the participants' visual experience ("familiar"). PCA-based representations might dominate when rating the gender typicality of familiar faces, while a mixture of "implicit" PCA-based and "explicit" feature-based representations might be used when rating the gender typicality of unfamiliar faces.

#### Replication of Human Errors

Only one of the models (**Table 8**, D) exhibited an other-race effect, and this effect was only marginal [Δ = −15.00%, p = 0.061, χ<sup>2</sup>(1) = 3.52]. Two models actually exhibited a reverse other-race effect, with better classification accuracy on Chinese than Caucasian faces [model C: Δ = 16.67%, p = 0.046, χ<sup>2</sup>(1) = 3.97; model K: Δ = 16.67%, p = 0.031, χ<sup>2</sup>(1) = 4.66]. Overall, the computational models failed to replicate the other-race effect for human gender categorization that was reported in Experiments 1–2 and in O'Toole et al. (1996).

The patterns of errors from PCA- or ICA-based models (**Table 8**, A–F) and feature-based models (**Table 8**, J–L) on female vs. male faces were in opposite directions. Four out of 6 PCA- and ICA-based models made significantly (**Table 8**, A,B,D) or marginally (F) more mistakes on male vs. female angry faces. Conversely, all 3 feature-based models (**Table 8**, J–L) made more mistakes on female vs. male angry faces, as did humans in Experiments 1–2. Similar patterns were found when comparing classification performance on all female vs. male faces, although the effect only reached significance in 2 out of 6 PCA- or ICA-based models (**Table 8**, A,D) and in 1 out of 3 feature-based models (**Table 8**, L). Hence, two different types of representations led to completely different predictions of human performance, only one of which replicated the actual data. Thus, the features of angry faces resemble those of male faces, potentially biasing gender categorization. However, this information is absent in PCA and ICA representations that actually convey the reverse bias.

Models used either Principal Component Analysis (PCA, models A–C), Independent Component Analysis (ICA, models D–F), features generated by a sparse auto-encoder (SAE, models G–I), or hand-engineered features (HE, models J–L). Correlations with ratings are Pearson correlation coefficients between absolute log-odds at training and z-scored gender typicality ratings from humans. Results from the sparse auto-encoder vary at each implementation as the procedure is not entirely deterministic; a single implementation is reported here.

Absolute log-odds obtained by the feature-based model J on familiar (neutral and happy Caucasian) faces significantly correlated with mean human (children and adults) accuracy on these faces in Experiments 1–2 (Spearman's r = 0.39, p = 0.013), while the absolute log-odds obtained by the PCA-based model A on those same faces correlated only marginally with human accuracy (Spearman's r = 0.28, p = 0.077). In other words, feature-based models also better replicated the human pattern of errors in categorizing the gender of familiar faces. See Supplementary Table 4 for a complete report of correlations with human accuracies for models A–C and J–L.

### Discussion

Overall, the results support the idea that humans categorize the gender of faces based on facial features (and second-order relations) more than on a holistic, template-based representation captured by Principal Component Analysis (PCA). In contrast, human ratings of gender typicality tracked feature-based as well as PCA-based representations. This feature-based strategy for gender categorization leads to a confusion between the dimensions of gender and facial expression, at least when the faces are presented statically and in the absence of cues such as hairstyle, clothing, etc. In particular, angry faces tend to be mistaken for male faces (a male-biasing effect).

Several limitations should be noted, however. First, training sets were of relatively small size (40–120 faces), limiting the leeway for training more accurate models. Second, the ratings used for human validation were obtained from neutral poses (control study, Supplementary Material) and not from the actual faces used in Experiment 3, and there were several missing values. Thus, they do not capture all the variations between stimuli used in Experiment 3. While a larger set of faces could have been manufactured for use in Experiment 3, along with obtaining their gender typicality ratings, it was considered preferable to use the very same set of faces as in Experiments 1–2. Indeed, it allowed a direct comparison between human and machine categorization accuracy. Finally, our analysis relied on correlations that certainly do not imply causation: for example, one could imagine that machine classification log-odds from feature-based models correlated with mean human classification accuracy not because humans actually relied on these features, but because those features are precisely tracking another component of interest in human perception, such as perceived anger intensity. A more definitive conclusion would require a manipulation of featural cues (and second-order relations) as is usually done in studies with artificial faces (e.g., Oosterhof and Todorov, 2009). Here, we chose to use real faces: although they permit a more hypothesis-free investigation of facial representations, they do not allow for fine manipulations.

That a feature-based model successfully replicated the human pattern of errors does not imply that such errors were entirely stimulus driven. Indeed, a feature-based strategy may or may not be hypothesis-free: for example, it may directly reflect stereotypical or experiential beliefs about gender differences in facial features (e.g., that males have thicker eyebrows) so that participants would use their beliefs about what males and females look like to do the task—beliefs that are reinforced by cultural practices (e.g., eyebrow plucking in females). In fact, a feature-based strategy could be entirely explicit (Frith and Frith, 2008); anecdotally, one of the youngest child participants explicitly stated to his appointed research assistant that "the task was easy, because you just had to look at the eyebrows." On a similar note, it would be inappropriate to conclude that angry faces "objectively" resemble male faces as representations from Principal Component Analysis may be considered more objective than feature-based representations. Rather, it is the case that a specific, feature-based representation of angry faces resembles that of male faces. This point applies to other experiments in which a conjoinment of variant or invariant facial dimensions was explored computationally using human-defined features (e.g., Zebrowitz and Fellous, 2003; Zebrowitz et al., 2007, 2010). It appears then that the choice of a particular representation has profound consequences when assessing the conjoinment of facial dimensions. Restricting oneself to one particular representation of faces or facial dimensions with the goal of emulating an "objective" perception may not be realizable. Evaluating multiple potential representational models may thus be the more advisable strategy.

### General Discussion

Overall, the results established the biasing effect of anger toward male gender categorization using signal detection analyses. The effect was present in adults as well as in children as young as 5–6 years of age, and was also evident with other-race faces for which anger had no effect on perceptual sensitivity.

The present results (1) are in accord with those of Becker et al. (2007) who reported that adults categorized the gender of artificial male vs. female faces more rapidly if they were angry, and female vs. male faces if they were smiling, and (2) replicate those of Hess et al. (2009) who reported that adults took longer to categorize the gender of real angry vs. smiling Caucasian female faces, but observed no such effect in Caucasian male faces. Similarly, Becker et al. (2007) found that adults were faster in detecting angry expressions on male vs. female faces, and in detecting smiling expressions on female vs. male faces. Conversely, Hess et al. (2004) found that expressions of anger in androgynous faces were rated as more intense when the face had a female rather than male hairline, a counter-intuitive finding that was explained as manifesting a violation of expectancy. Here, we complement the prior findings taken together by providing evidence for a male-biasing effect of anger using signal detection analyses, real faces, and a relatively high number of different stimuli.

We did not observe an opposing facilitation of gender categorization of female smiling faces, as could be expected from the results of Becker et al. (2007) and Hess et al. (2009), probably because in the present study, facial contours were partially affected by cropping. Furthermore, our results differ from those of Le Gal and Bruce (2002) who reported no effect of expression (anger, surprise) on gender categorization in 24 young adults, a null finding that was replicated by Karnadewi and Lipp (2011). The difference may originate from differences in experimental procedure or data analysis; both prior studies used a Garner paradigm with a relatively low number of individual Caucasian models (10 and 8, respectively) and analyzed reaction times only, while reporting very high levels of accuracy suggestive of a ceiling effect [in fact, 22 participants from Le Gal and Bruce (2002) who had less than 50% accuracy in some conditions were excluded; not doing so would have violated assumptions for the ANOVAs on correct reaction times].

The findings yield important new information regarding the development of the angry-male bias. In particular, the male-biasing effect of anger was fairly constant from 5 to 6 years of age to young adulthood; the extensive social observation gained during schooling does not seem to impact the bias. This result is in accord with recent reports by Banaji and colleagues (Dunham et al., 2013; Cogsdill et al., 2014) showing that even belief-based interactions in the categorization of faces appear in their adult form much earlier than expected and do not appear to require extensive social experience. For example, Caucasian children as young as 3 years of age (the youngest age studied) were as biased as adults in categorizing racially ambiguous angry faces as Black rather than Caucasian (Dunham et al., 2013), an implicit association usually understood to reflect stereotyping (Hehman et al., 2014). Similarly, children aged from 3 to 5 stereotypically associated maleness with anger in cartoon faces (Birnbaum et al., 1980). Such biases may begin to develop in early infancy, a developmental period characterized by the emergence of gendered face representations rooted in visual experience (Quinn et al., 2002). Indeed, studies of racial prejudice have demonstrated a link between the other-race effect, a perceptual effect developing in infancy, and belief-based racial biases that are apparent from early childhood through adulthood such as associating other-race African faces with the angry expression (Xiao et al., 2015). It is possible that similar trajectories from perceptual to social representations may be found for gender. 
For example, a recent, unpublished study found that 3.5-month-old infants preferred a smiling to a neutral female expression, but preferred a neutral to a smiling male expression (Bayet et al., manuscript under review), suggesting an early association between female faces and positive emotions that results from differential perceptual or social experience with female caregivers. Such an early association could be a precursor to the increased performance of 5–6 year old children on smiling female faces that was observed in Experiment 2. Future studies on the developmental origins of stereotypes should focus on (1) finding precursors of stereotypes in infancy, and (2) bridging the gap between infancy and early childhood, thus providing a basis for early intervention that could curtail formation of socially harmful stereotypes.

Here, the male-biasing effect of anger appeared to be at least partially mediated by featural (e.g., brow thickness) and second-order (e.g., brow to eye distance) cues. While children have been reported to be less sensitive than adults to second-order relationships in some studies (e.g., Mondloch et al., 2002) and are less accurate in identifying facial emotional expressions (Chronaki et al., 2014), their encoding of featural information appears already mature at 6 years of age (Maurer et al., 2002) and they can recognize angry and smiling expressions most easily (Chronaki et al., 2014). Thus, the stability of the male-biasing effect of anger does not contradict current knowledge about children's face processing skills.

As discussed above, neither our behavioral nor our computational findings allowed us to embrace a particular mechanism for the male-biasing effect of anger, i.e., whether it was stimulus driven (an inherent conjoinment of dimensions) or stemmed from belief-based inferences. The findings are, however, relevant to the ongoing debate about the nature of face representations in the human brain. As stated by Marr (1982), any type of representation makes some kind of information evident while obscuring other kinds of information, so that studying the nature and origin of representational processes is at the heart of explaining low, middle, and high level vision. Various types of face representations have been proposed. For example, an important study in rhesus macaques found face-specific middle temporal neurons to be tuned to particular features or their combination while being affected by inversion (Freiwald et al., 2009). Other studies in humans have (1) emphasized the role of 2-D and 3-D second-order relations in addition to features (Burton et al., 1993), and (2) argued for a double dissociation of featural and configural encoding (Renzi et al., 2013). An opposing line of argument has been advanced for a role of unsupervised representation analogs to Principal Component Analysis (Calder and Young, 2005) or Principal Component Analysis combined with multi-dimensional scaling (Gao and Wilson, 2013) or Gabor filters (Kaminski et al., 2011). All of those potential representations are fully compatible with the general idea of a face space (Valentine, 2001) since the face space may, in theory, present with any particular set of dimensions. 
Here, we provide additional evidence supporting the importance of features and second-order relations in the human processing of faces, and argue for the need to systematically consider various representational models of face processing when determining whether performance is stimulus driven, and to evaluate their respective contributions in perception depending on task, species, and developmental stage.

In conclusion, the present results indicate that the angry-male bias, whether stimulus- or belief-driven, does not require extensive social interaction with school-age peers to develop. It is in evidence as early as 5 years of age, and appears remarkably unaffected by experience during the primary grade levels, a developmental period that presumably includes observation of males engaging in aggressive acts.

### Author Contributions

Study design was performed by LB, KL, OP, PQ, and JT. Data acquisition was conducted by LB, OP, and JT. Data analysis was performed by LB. All authors contributed to data interpretation, approved the final version of the article, revised it critically for intellectual content, and agree to be accountable for all aspects of the work.

### Acknowledgments

This work was funded by the NIH Grant R01 HD-46526 to KL, OP, PQ, and JT, and a PhD scholarship from the French Department of Research and Higher Education to LB. The authors thank the families, adult participants, and research assistants that took part in these studies, and declare no conflict of interest.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2015.00346/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bayet, Pascalis, Quinn, Lee, Gentaz and Tanaka. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# What Facial Appearance Reveals Over Time: When Perceived Expressions in Neutral Faces Reveal Stable Emotion Dispositions

Reginald B. Adams Jr.<sup>1</sup> \*, Carlos O. Garrido<sup>1</sup> , Daniel N. Albohn<sup>1</sup> , Ursula Hess<sup>2</sup> and Robert E. Kleck<sup>3</sup>

<sup>1</sup> Department of Psychology, The Pennsylvania State University, University Park, PA, USA, <sup>2</sup> Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>3</sup> Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA

It might seem a reasonable assumption that when we are not actively using our faces to express ourselves (i.e., when we display nonexpressive, or neutral faces), those around us will not be able to read our emotions. Herein, using a variety of expression-related ratings, we examined whether age-related changes in the face can accurately reveal one's innermost affective dispositions. In each study, we found that expressive ratings of neutral facial displays predicted self-reported positive/negative dispositional affect, but only for elderly women, and only for positive affect. These findings meaningfully replicate and extend earlier work examining age-related emotion cues in the face of elderly women (Malatesta et al., 1987a). We discuss these findings in light of evidence that women are expected to, and do, smile more than men, and that the quality of their smiles predicts their life satisfaction. Although ratings of old male faces did not significantly predict self-reported affective dispositions, the trend was similar to that found for old female faces. A plausible explanation for this gender difference is that in the process of attenuating emotional expressions over their lifetimes, old men reveal less evidence of their total emotional experiences in their faces than do old women.

#### Keywords: face perception, emotional expression, person perception, aging, appearance

### INTRODUCTION

"Wrinkles should merely indicate where smiles have been."

∼Mark Twain

Given the importance of emotion recognition for smooth social interaction and interpersonal functioning (cf. Feldman et al., 1991; Carton et al., 1999; Niedenthal and Brauer, 2012) the ability of the elderly to accurately decode emotion expressions has been intensely studied (Ruffman et al., 2008). The question of how accurately the expressions of older individuals are recognized by other human observers, however, and of how emotion perceived in their neutral facial displays may reveal a lifetime of experience and expressed emotion, has received very little empirical attention.

General negative stereotypes may be one source of perceptual bias in reading expressions. Indeed, the most prevalent age-related stereotype is that the elderly are more emotionally negative than their younger counterparts, resulting in an overall negativity bias toward them (Kite and Johnson, 1988; Fabes and Martin, 1991; Ebner, 2008). A meta-analytic review of studies that examined general attitudes about the young and the old (Kite et al., 2005) found that this negative age bias decreases as information about the target person is learned. However, even when this bias is not explicit, it remains substantial when implicit evaluations are examined (see Hummert et al., 2002). Interestingly, such biases also appear to be pan-cultural. For example, in a large study of 26 different cultures, researchers found widespread agreement across cultures regarding negative elderly stereotypes, including physical and socioemotional areas of functionality (Löckenhoff et al., 2009). Further, when Chinese and American cultures were examined, researchers found that contrary to what was expected, both cultures exhibited negative views toward the elderly (Boduroglu et al., 2006). When these same participants were explicitly asked about their emotion expectations (i.e., rating "typical" young and elderly adults, without faces presented), however, these differences were not found.

#### Edited by:

Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany

#### Reviewed by:

Rachael Elizabeth Jack, University of Glasgow, UK; Laura Kaltwasser, Humboldt-Universität zu Berlin, Germany

> \*Correspondence: Reginald B. Adams Jr. regadams@psu.edu

#### Specialty section:

This article was submitted to Perception Science, a section of the journal Frontiers in Psychology

Received: 20 July 2015; Accepted: 14 June 2016; Published: 30 June 2016

#### Citation:

Adams RB Jr, Garrido CO, Albohn DN, Hess U and Kleck RE (2016) What Facial Appearance Reveals Over Time: When Perceived Expressions in Neutral Faces Reveal Stable Emotion Dispositions. Front. Psychol. 7:986. doi: 10.3389/fpsyg.2016.00986

This perceived negativity extends to the perception of specific expressions in elderly faces. Elderly faces are typically rated as expressing more negative emotions and as being less attractive than young adult faces. Such biases may stem from age-related stereotypes, but another likely source is the wrinkles and folds associated with aging, which can be misperceived as expressing negative emotions (Hess et al., 2012). As such, even an elderly neutral expression may contain incidental expressive features, such as downturned mouth corners, that disrupt and/or bias perception. Such emotion-resembling features then influence perception through a process of emotion overgeneralization (Zebrowitz et al., 2010; see also Todorov et al., 2008 and Adams et al., 2012). These perceptual biases can then serve as both a source of and fuel for general negative elderly stereotypes.

Other social categories such as gender and race have also been found to have facial appearance cues that are confounded perceptually with emotion expressions (Becker et al., 2007; Adams et al., 2015). One particularly compelling study, using a connectionist model trained to detect emotion, revealed that neutral male faces activated angry expression nodes more, and happy expression nodes less, than neutral female faces (Zebrowitz et al., 2010). Likewise, White faces were found to activate anger expression nodes more than African American or Korean faces, while African American faces activated happy and surprise nodes more than White faces. Critically, these findings were based purely on facial metric data, and therefore were necessarily uninfluenced by social learning or culture, thereby offering direct evidence for an objective structural resemblance of typical sex and race appearance to these expressions. Moreover, emotion-resembling cues such as these have been demonstrated to shape trait impression formation (Adams et al., 2012). Adding emotion-resembling cues (e.g., heightened brow, thinner lips) to otherwise neutral facial texture maps impacted a whole host of trait impressions that are otherwise seemingly independent of emotion (e.g., cooperativeness, honesty, naivety, trustworthiness, dominance, rationality). Despite a growing number of studies now pointing to age-related changes in facial appearance being confounded with expressive cues, little work has been conducted on emotion overgeneralization effects of elderly faces.

Early work conducted by Malatesta et al. (1987b) did suggest that morphological changes in the face due to aging can be misinterpreted as emotional cues due to their direct resemblance to expressions. For instance, drooping of the eyelids or corners of the mouth might be misinterpreted as sadness. In their study, they asked young, middle-aged, and older women to rate the videotaped emotion expressions of young, middle-aged, and older women. One result was that the ability to decode expressions varied with age congruence between encoder and decoder (i.e., the different age groups were better at decoding emotion in same-age faces). Most relevant to the current work is that they also found that the emotion expressions of older individuals were more difficult to decode (lower emotion recognition accuracy) due to age-related appearance changes in the face (Malatesta et al., 1987b).

More recently, Hess et al. (2012) followed up on this previous work using cutting-edge technological advances that offered more precision and experimental control. This work confirmed that advanced aging of the face does degrade the clarity of specific emotional expressions. In this study, identical expressions were applied to young and old faces using FaceGen, a state-of-the-art 3D facial modeling software (Singular Inversions, Vancouver, BC, Canada). The effects of aging were thereby examined while holding the underlying expression constant. Young faces were rated as expressing target emotions more intensely, whereas older faces were rated as more emotionally complex (i.e., they had higher ratings across a number of non-target emotions). In other words, the greater number of emotions present in elderly faces was associated with a reduced signal clarity for any given target emotion. Neutral old faces were also rated as more emotionally complex, particularly for anger and fear (Hess et al., 2012).

These findings are consistent with another earlier study conducted by Malatesta et al. (1987a), in which they asked 14 elderly models to pose 5 different emotions (anger, fear, sad, joy, and neutral). In this study they found that, aside from happy displays, all other photographic stimuli produced high error rates, suggesting again that wrinkles give rise to more complex and negative looking expressions. Notably, even for neutral faces over 60% of labels given represented negative emotions (note there was no "neutral" label offered): 15% sadness, 14% contempt, 11% anger, 8% fear, 7% disgust, 5% guilt, 4% shame/shyness. Matheson (1997) similarly found that, when focused on the perception of pain in the face, young adult observers were systematically predisposed to see more pain in the faces of the elderly, including in their neutral faces, again presumably due to misreading aging cues as expressive.

One particularly compelling finding in the Malatesta et al. (1987a) study was the correspondence between misperceived emotion displays in elderly faces and the models' self-reported emotionality. Before posing emotions, the fourteen elderly actors in this study also filled out a Differential Emotion Scale (DES; Izard, 1972) on the same emotions that independent raters later used to label their expressions based on their facial poses (these included anger, interest, sadness, joy, contempt, disgust, shame/shy, guilt, fear, and surprise). When judges' mean error rates (i.e., the average error rate for a particular emotion collapsed across all of the actor's posed expressions) for labeling expressions were examined, they found correlations between specific types of errors and the participants' own DES scores. For example, judges' errors for selecting a face as angry predicted participants' anger scores on the DES, as did sadness, contempt, and guilt. In all, 19 out of 100 correlations conducted were significant, beyond what would be expected by chance alone (i.e., p < 0.05). The authors concluded that when individuals make inferences about a face, the errors they make reveal something accurate about the actor's own emotional predisposition (Malatesta et al., 1987a).
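To make the "beyond chance" reasoning concrete, the following minimal sketch compares 19 significant results out of 100 tests with what a 0.05 false-positive rate alone would produce. It assumes the 100 tests are independent, which is a simplification (the correlations share the same 14 models and are unlikely to be independent), and the function name `binomial_tail` is introduced here purely for illustration; it is not the analysis reported by Malatesta et al. (1987a).

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests = 100        # correlations computed
alpha = 0.05         # per-test false-positive rate
n_significant = 19   # correlations reported as significant

# Number of "significant" results expected if every null hypothesis were true
expected_by_chance = n_tests * alpha

# Probability of 19 or more false positives under the independence assumption
tail_probability = binomial_tail(n_significant, n_tests, alpha)

print(f"Expected by chance: {expected_by_chance:.1f} of {n_tests}")
print(f"P(at least {n_significant} by chance) = {tail_probability:.2e}")
```

Under these assumptions about five significant correlations would be expected by chance, so observing 19 is very unlikely to reflect false positives alone; with dependent tests the true tail probability would be larger than this sketch suggests.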

To our knowledge, no study to date has followed up on these intriguing findings. Further, these findings only hinted at a possible connection between emotion perceived from neutral faces and the models' dispositional affect. It therefore remains an empirical question whether the neutral face alone, with all its appearance-related emotion cues including wrinkles, folds, and facial musculature sagging, can reveal something about the emotional nature of the individual. We thus sought to replicate and extend this previous work, and did so in three primary ways.

First, we sought to examine whether these effects generalize to elderly male faces as well. Because there are differences in expected expressivity in males and females, with men being expected (overall) to suppress emotional expression more than women (see Fabes and Martin, 1991), we might expect old men to show a similar, though reduced, effect to that found for women. Despite an overall expectation to suppress, however, men are also expected to express more power-oriented emotions such as disgust and anger than women (Fabes and Martin, 1991; Fischer, 2000), suggesting that the emotions revealed through a lifetime of expression might also be different for men than women.

Second, Malatesta et al. (1987a) examined misattributed emotion labels to five target displays, to examine whether emotion-resembling cues in the face lead to diagnostically accurate mistakes. Their conclusion was that there is something about neutral facial appearance driving these erroneous, yet accurate impressions. In the current work we wanted to take a more direct approach to this question by focusing on ratings of perceived emotions in neutral faces to assess if these would also predict self-reported ratings. To do this, we used a widely used and well-validated measure of affective disposition, which gauges trait positive and negative affect (i.e., PANAS; Watson et al., 1988). We examined this question using a variety of expression-related face ratings. In Study 1, we had faces rated on two simple scales, one gauging positive and one negative affect. Because these scales were conflated in Study 1, in Study 2 we had the faces rated on the same twenty items that the models had used to rate themselves, that is, on the PANAS items. In Study 3, we extended these findings by having participants rate the faces on a number of discrete indices including "basic" emotions (i.e., the "Big 6": anger, fear, sadness, joy, disgust, and surprise), as well as a variety of trait disposition ratings that have been previously linked to emotion-resembling cues in the face (Adams et al., 2012). In all three studies the question was the same: do perceived positive/negative expressions in otherwise neutral faces predict the models' own self-reported positive/negative affect?

The third way in which we sought to extend Malatesta et al.'s (1987a) previous work was to include a young adult sample to serve as a comparison group. If it is the case that age-dependent cues such as wrinkles and folds in the face drive these effects by resembling expressive cues in the face, then we would expect them to emerge most robustly in older faces. Having a young adult control condition then becomes an important baseline comparison to assess this possibility.

In light of research showing that certain age-related cues affect signal clarity by increasing the emotional content perceived in faces, we predicted that the same aging cues that otherwise obscure emotional displays will likewise contribute to perceptions of emotion in a neutral face, and that these emotion perceptions will predict the actual emotional disposition of the models. Below, we begin with a preliminary study that details our procedure for obtaining and validating our stimulus set. We also provide descriptive analyses on the models' own PANAS scores.

### PRELIMINARY STUDY: STIMULUS GENERATION

Our current research required that we generate a stimulus set of neutral faces for which we have corresponding self-reported emotion disposition ratings of the models. To do this, we obtained a set of 60 facial images that were captured from videos used in another study (see Huhnel et al., 2014). The photographs depicted White actors who varied in sex and age (30 young and 30 old; 15 of each gender/age group), all of whom also completed the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988).

### Participants

This study was carried out in accordance with the recommendations of the Humboldt-Universität zu Berlin Psychology Department ethics committee. The models in this study were recruited through an internal participant database. All models gave informed written consent and were compensated with 10 Euros for their participation. Models were 30 older (65–94 years; M = 72.37 years, SD = 6.48; 15 male, 15 female) and 30 younger (20–30 years; M = 24.47 years, SD = 3.17; 15 male, 15 female) adults who were screened for neurological or psychiatric disorders. Male and female faces were of equivalent age within each age condition.

### Stimulus Preparation

Photographic frames were captured from dynamic video recordings that featured the models looking directly at the camera as they narrated events in their lives. The models were told to act naturally as they narrated answers to questions that were specifically designed to be as neutral in valence as possible, such as what they ate for breakfast or what their wake-up routine was. The original videos varied in length, but were all approximately one minute. From these original videos of the actors, a trained assistant then selected a 20 s continuous segment that appeared to be the least expressive. From those shortened segments, co-author Dr. Ursula Hess, who is a gold standard rater (i.e., one of the original coders against whom new coders are tested for certification) in the Facial Action Coding System (FACS; Ekman and Friesen, 1978), selected the photographic frames that best perceptually represented each model's natural baseline display, selecting frames that also deviated as little as possible from a direct gaze pose. Selected photographs were then cropped and converted to gray-scale (see **Figure 1** for example images).

### FaceReader 6.1™

Because a major premise of the current work is that aging cues in the face can resemble emotional cues, we also used the 6th version of the software FaceReader (Noldus, 2015) to objectively confirm the neutrality of our stimuli. FaceReader has been well-validated through its use in a growing number of psychological studies showing a high degree of convergent validity with ratings made by human FACS experts (den Uyl and van Kuilenburg, 2005). Its accuracy level in classifying eight emotions (including neutral) is at an average of 89 percent, higher than the rate of emotion recognition by most human subjects (see Lewinski et al., 2014).

FaceReader (version 6.1) models the face using over 500 points, which are based on over 10,000 images that have been manually annotated by experts. Using these points, the face is reconstructed into a virtual mask. An artificial neural network is utilized to estimate which of the six basic emotions (plus neutral and contempt) the face best represents at any given point. The same procedure is used when determining the actor's age, ethnicity, and sex, which are subsequently taken into account when the algorithm estimates the emotionality present on the face. This work is largely based on Paul Ekman's FACS (Ekman, 1992a,b,c).

FaceReader is proprietary commercial software, so its source code is not openly accessible. However, FaceReader is well-developed, having been utilized in over 50 peer-reviewed publications to validate or enhance results, and spanning such diverse fields as psychology, marketing, and methodology. Having been trained on thousands of expressive faces, FaceReader works by detecting a face in an image, identifying 500 landmark points in the face, and then classifying the image according to how likely each emotion is to be present in the face (see van Kuilenburg et al., 2005 for a detailed algorithmic description of the FaceReader software). The output consists of coefficients that range from 0 to 1 for each image and for each emotion (including neutrality). Coefficients with higher values indicate a higher likelihood that the given face displays the given emotion (or neutrality).

Young adult images were analyzed using FaceReader's general module and the elderly adult images were analyzed using FaceReader's elderly face module, which controls for age-related changes in facial appearance (e.g., wrinkles, folds in skin, and facial musculature sagging). To validate that our stimuli represent baseline neutral poses across all our experimental conditions, we conducted a 2 (age) × 8 (emotion) mixed design ANOVA using the coefficients yielded by FaceReader as the dependent variable and emotion as the within subjects factor. The emotion factor comprised all eight expressive ratings (neutral, happy, sad, angry, surprised, fear, disgust, and contempt). We found a significant main effect of emotion, F(7,50) = 184.74, p < 0.001, η<sup>2</sup> = 0.99. No effects involving age were significant. Next, we ran a planned comparison of neutral against all other emotions, which revealed that overall the faces were perceived to be more neutral than expressive, F(1,59) = 389.9, p < 0.001, η<sup>2</sup> = 0.87. Direct comparisons between neutral and each of the seven emotions then revealed that neutrality in these faces, as coded by the FaceReader software, was the predominant display compared to all other possible emotions (all ts > 10, all ps < 0.001). Means and standard deviations of FaceReader's neutral coefficients, by condition, are as follows: elderly males, 0.76 (0.32); elderly females, 0.80 (0.29); young males, 0.83 (0.26); young females, 0.86 (0.20). As indicated by the above coefficients, FaceReader scored all the faces, regardless of age group and sex, as appearing highly neutral. Further, FaceReader is able to predict the actual age of faces with a high rate of accuracy. We used this to examine whether the faces here varied in age-related appearance across our gender conditions. FaceReader's predicted age and the models' actual age were highly correlated (r(58) = 0.76, p < 0.001), and critically neither varied across our gender conditions. From this we can conclude that our young and old models were matched for actual and perceived age across gender conditions.
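As a brief illustration of how a reported correlation such as r(58) = 0.76 maps onto a test statistic, the sketch below applies the standard conversion t = r·√(df / (1 − r²)). Only the values of r and the degrees of freedom come from the text; the function name `t_from_r` is an illustrative assumption and the snippet does not reproduce the authors' own analysis pipeline.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """Convert a Pearson correlation into its t statistic with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


r = 0.76   # correlation between predicted and actual age (from the text)
df = 58    # degrees of freedom, i.e., 60 faces minus 2

t_value = t_from_r(r, df)
shared_variance = r * r  # proportion of variance shared by the two age measures

print(f"t({df}) = {t_value:.2f}")
print(f"Shared variance (r squared) = {shared_variance:.2f}")
```

The resulting t value is roughly 8.9, comfortably in line with the reported p < 0.001, and the squared correlation indicates that a little over half of the variance in actual age is shared with the software's age estimate.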

### PANAS Scores

All models filled out the 20-item PANAS twice, once before the filming took place, and once after. The 10 positive and 10 negative PANAS traits were then combined to create standardized measures of positive (PA) and negative (NA) affective states for each of the two time points. PANAS scores obtained at both time points correlated highly with one another, PA (r = 0.57, p < 0.001) and NA (r = 0.78, p < 0.001). Because this scale is a highly reliable trait measure (see Crawford and Henry, 2004 for extensive evaluation of this widely used instrument), we combined scores to best approximate each individual's central tendency in rated emotional dispositions. We then ran a 2 (age: young/old) × 2 (gender: male/female) × 2 (affective type: PA/NA) repeated measures ANOVA to examine any possible differences between the stimulus groups on PA/NA scores. The only effect to reach significance was a main effect of affect type, such that participants across all groups reported more positive (M = 29.37, SD = 2.98) than negative affect (M = 12.9, SD = 2.01), F(1,14) = 254.74, p < 0.001, η <sup>2</sup> = 0.95. Thus, the individuals in our four conditions (old men, old women, young men, and young women) did not vary in their self-reported dispositional affect. The final set of 60 photographs and PANAS scores were then used in the three studies reported below, and because variation in expressive resemblance of facial appearance was the primary focus, all analyses reported in these studies are at the items level.
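The scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: each PANAS item is rated on the instrument's 1 to 5 scale, subscale scores are plain sums of the ten relevant items, and the two time points are averaged with equal weight. The text does not spell out the exact standardization the authors applied, so the helper names (`subscale_score`, `trait_scores`) and the example ratings are hypothetical.

```python
from statistics import mean

# The ten positive-affect and ten negative-affect items of the PANAS
PA_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NA_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]


def subscale_score(ratings: dict, items: list) -> int:
    """Sum the 1-5 ratings of the given items (possible range 10-50)."""
    return sum(ratings[item] for item in items)


def trait_scores(time1: dict, time2: dict) -> dict:
    """Average the PA and NA sums over the two administrations (assumed equal weights)."""
    return {
        "PA": mean([subscale_score(time1, PA_ITEMS), subscale_score(time2, PA_ITEMS)]),
        "NA": mean([subscale_score(time1, NA_ITEMS), subscale_score(time2, NA_ITEMS)]),
    }


# Hypothetical model: ratings before and after filming (illustrative values only)
before = {**{item: 3 for item in PA_ITEMS}, **{item: 1 for item in NA_ITEMS}}
after = {**{item: 4 for item in PA_ITEMS}, **{item: 2 for item in NA_ITEMS}}

scores = trait_scores(before, after)
print(scores)
```

Averaging over the two administrations is one simple way to approximate each model's central tendency; other choices, such as z-scoring within each time point before combining, would also be consistent with the description.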

### STUDY 1

The purpose of Study 1 was to examine whether independent ratings of our non-expressive models' faces on two scales, positive and negative affect, are positively associated with the models' own self-reported positive affect (PA) and negative affect (NA) as measured by the PANAS. Participants were specifically instructed to attend to expressive cues in the face. They were asked to rate how much each face displayed was currently expressing positive and negative affect. This was to ensure that our human raters were tuned to the emotion resembling aspects of facial appearance. Then we examined the association of these ratings with the models' self-reported emotion dispositions.

### Methods

#### Participants

Studies 1–3 were carried out with the approval of Penn State's Institutional Review Board for Human Subject Research. All participants gave informed written consent before participating and were compensated with partial course credit for their participation. In all studies we recruited undergraduate students (typically ranging from 18 to 24 years of age) who were enrolled in psychology classes, via the departmental participant pool's online recruitment system. For Study 1, forty participants (12 men) took part. Twenty-seven participants identified as White, 6 as Black, 2 as Latino, 4 as Asian, and 1 as multiracial.

### Design and Procedure

The purpose of the study was described as an examination of perceptions of people's mental states based on their faces. After completing the informed consent process, participants were instructed to read the instructions carefully before beginning the survey. Instructions informed participants that they were to rate 60 faces one by one and urged them to go with their first instinct and not to deliberate excessively over any of the faces or the expressive states they rated. In all three studies, images were displayed subtending a visual angle of approximately 7.6° × 5.1°. As is the case for all studies reported in this paper, participants completed the task in groups of up to six in order to maximize efficiency and reduce data collection time. However, each participant was assigned to a computer workstation and workstations were separated by partitions. First, participants were asked to rate the extent to which each of the 60 faces expressed positive and negative affect (on separate scales) using a scale ranging from "0" to "100," where 0 represented the lowest degree of the type of affect and 100 represented the highest degree. Positive affect was defined as "a mood dimension that consists of specific pleasant or positive emotions." Negative affect, on the other hand, was defined as "the full spectrum of negative or unpleasant emotions." Each participant rated all 60 photographs and each photograph remained on the screen until both positive and negative affect ratings were made. The order of presentation of the stimuli was randomized and the presentation of the affect scales (positive versus negative affect) was counterbalanced across participants. Lastly, participants provided demographic information and were fully debriefed.
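For readers who wish to reproduce the stated stimulus size, the visual angle subtended by an image follows from its physical size and the viewing distance: θ = 2 · arctan(size / (2 · distance)). The sketch below is illustrative; the viewing distance of 60 cm is a hypothetical value, since the article reports only the resulting angles, and the function names are our own.

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Visual angle (degrees) of an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def size_for_angle(angle_deg: float, distance: float) -> float:
    """Physical size needed to subtend a given visual angle at a given distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


# Hypothetical viewing distance; the article does not report one.
viewing_distance_cm = 60.0
first_dim_cm = size_for_angle(7.6, viewing_distance_cm)
second_dim_cm = size_for_angle(5.1, viewing_distance_cm)
print(f"{first_dim_cm:.1f} cm x {second_dim_cm:.1f} cm at {viewing_distance_cm:.0f} cm")
```

Any other viewing distance can be substituted; the two functions are inverses of one another, so the on-screen size scales accordingly.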

### Results

### Affective Perception of Faces

We first assessed the correlation between rated perceptions of positive and negative affect. Because the two scales correlated very highly (r = 0.96, p < 0.001), we reverse scored the negative affect scale to create a composite score that ranged from very negative (low numbers) to very positive (high numbers). We then conducted a 2(gender: male/female) by 2(age: young/old) factorial within-subjects ANOVA to examine differences of affective attributions among the groups. As predicted, there was a main effect of age, F(1,56) = 6.93, p = 0.01, η <sup>2</sup> = 0.11, such that elderly targets (M = 43.28, SD = 16.93) were rated as more negative/less positive than young targets (M = 54.57, SD = 15.89), t(57) = 2.63, p = 0.01. No other effects reached significance.
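The composite described above can be illustrated as follows. This is a minimal sketch, assuming that the composite is the simple average of the positive rating and the reverse-scored negative rating on the 0 to 100 scale; the article states only that the negative scale was reverse scored and combined. Function names are hypothetical.

```python
def reverse_score(rating: float, scale_min: float = 0, scale_max: float = 100) -> float:
    """Flip a rating so that the scale endpoints swap places."""
    return scale_max + scale_min - rating


def affect_composite(positive: float, negative: float) -> float:
    """Average of positive affect and reverse-scored negative affect.

    Assumption: equal weighting of the two scales.
    Low values mean very negative, high values mean very positive.
    """
    return (positive + reverse_score(negative)) / 2


# A face rated 70 on positive affect and 20 on negative affect
print(affect_composite(70, 20))  # 75.0
```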

#### Relation to Self-Reported PANAS Scores

Next, we computed correlations between the independent ratings of target facial expression and the targets' own self-reported affect using the composite PANAS scores. As predicted, independent ratings of perceived affect when viewing elderly target faces were positively associated with the targets' self-reported positive affect (r = 0.36, p = 0.05). This was not, however, the case for self-reported negative affect (r = –0.01, p = 0.97). Affective ratings of young adult faces predicted neither self-reported NA (r = 0.04, p = 0.84), nor PA (r = 0.03, p = 0.89). Finally, when analyzed separately for each gender, we found that the significant correlation between affective ratings of the elderly target faces and target self-reported PA scores was primarily carried by perceptions of elderly female faces (r = 0.59, p = 0.01); see **Figure 2**. Elderly male faces showed a positive association, but this did not reach significance (r = 0.18, p = 0.53).
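Because these analyses are at the items level, each correlation pairs one mean face rating with one self-report score per model. The sketch below shows how such a Pearson correlation and its t statistic (with n − 2 degrees of freedom) can be computed. The data are invented toy values for illustration, not values from the study, and the function names are our own.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Toy data (hypothetical): mean perceived affect per face and self-reported PA per model
perceived_affect = [40, 45, 50, 55, 60]
self_reported_pa = [28, 27, 30, 29, 31]

r = pearson_r(perceived_affect, self_reported_pa)
print(round(r, 2))  # 0.8
print(round(t_statistic(r, len(perceived_affect)), 2))
```

The p-value then follows from the t distribution with n − 2 degrees of freedom, which a statistics package would normally supply.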

### STUDY 2

Because positive/negative ratings scales were statistically conflated in Study 1, in Study 2 we had the same faces rated on the entire PANAS battery. The PANAS is a collection of 20 emotion items, half loading on positive and half on negative affect factors. This instrument was specifically designed, through the use of multiple item ratings, to differentiate and thus statistically separate positive/negative affect ratings. Thus, in this study we aimed to replicate and extend the findings in Study 1 by examining positive and negative affect ratings of the faces as distinct dimensions of perceived emotionality.

### Methods

### Participants

One hundred thirty-one undergraduate-aged participants (36 men) participated in exchange for partial class credit. Five participants identified as White, 8 as Black, 5 as Latino, 18 as Asian, and 5 as multiracial.

#### Design and Procedure

Participants rated the extent to which the 60 faces expressed each of the 20 emotion items included in the PANAS. Of the 20 affective states, 10 were positive in valence (e.g., interested, enthusiastic) and 10 were negative (e.g., distressed, scared). Due to the large number of state ratings, we collected data in three waves. Waves differed only in the stimuli presented to participants and each wave contained 20 of the 60 total stimuli. For each wave, stimuli selection was based on random assignment without stimulus replacement. Each wave was conducted subsequent to the end of the previous wave and the data collection period lasted less than a month. As in the previous study, the order of presentation of both the stimuli and the emotional expression scales was randomized, but the expression scales were presented on the same survey screen. Otherwise, instructions, procedures, and stimuli were identical to those used in Study 1.

### Results

#### Affective Perception of Faces

As prescribed for the PANAS, we averaged the scores of the 10 positive and 10 negative items to generate positive and negative affect scores. Both the positive (α = 0.95) and negative (α = 0.92) affect composites achieved high reliability. There was a high correlation between the two variables (r = –0.81, p < 0.001). Thus, we again reverse scored the negative affect ratings and combined them with the positive affect scores to create a single affect index. As before, these scores ranged from negative to positive (high scores indicate greater positivity). First, we conducted a 2(gender) by 2(age) factorial ANOVA on these scores. A marginally significant main effect of age emerged, F(1,56) = 2.95, p = 0.09. Consistent with the results of Study 1, the elderly faces (M = 51.12, SD = 7.70) received lower ratings of positive/higher negative affect than the younger faces (M = 54.69, SD = 8.30).

### Relation to Self-Reported PANAS Scores

First, we conducted correlations between attributions and self-reported affect by age group. Replicating the results of Study 1, ratings of positive affect for elderly targets were positively associated with self-reported PA (r = 0.42, p < 0.05). This association was not apparent for face ratings and self-reports of young targets (r = 0.04, p = 0.85). As in Study 1, when analyzing the data separately for each gender, we found that the significant association between perceptual ratings and self-reports of positive affect was primarily driven by elderly females (r = 0.62, p = 0.01); see **Figure 3**. Again, this association was positive for elderly male targets as well, but did not reach significance (r = 0.15, p = 0.59). No other correlations reached significance for any of the other target groups or affect type (positive or negative).

### STUDY 3

We aimed to replicate and extend this work to basic emotions (e.g., anger, fear, sad, and happy), as well as trait impression ratings known to be derived from emotion resembling features of the face (e.g., trustworthy, dominance; see Said et al., 2009; Adams et al., 2015). Because principal components analysis (PCA) revealed that both sets of items yielded a primary valence factor, we included composite results for each for comparison and conceptual continuity with the prior two studies. However, because these ratings are also widely used as independent predictors, we also report each item's association with the stimulus models' PANAS scores separately.

### Methods

#### Participants

Forty-two undergraduate-aged participants (15 men) enrolled in psychology classes participated in the study in exchange for class credit. Thirty participants identified as White, 2 as Black, 2 as Latino, 5 as Asian and 3 as multiracial.

#### Design and Procedure

Instructions, procedures, and stimuli used were identical to those used in the previous studies with the exception of the type of ratings participants performed. For this study, participants rated the extent to which each face appeared to express six primary emotions (happy, sad, joy, surprise, disgust, anger) and to possess five traits (affiliative, attractive, dominant, threatening, trustworthy). As in the previous study, presentation of the stimuli and ratings were randomized. As with all studies reported in this paper, participants made their ratings of the five emotions and five traits using a 0 "lowest degree" to 100 "highest degree" scale.

### Results

#### Affective Perception of Faces

We first examined correlations of all six primary emotions. With the exception of surprise, each emotion was highly correlated with all other emotions (all rs > ± 0.34, all ps < 0.01; see **Table 1A**). Given the high degree of intercorrelations between these emotions, with each associated with a clear positive or negative valence, we performed a PCA to determine whether valence was an explanatory factor. The scree-plot revealed a one-factor solution (i.e., only one factor emerged with eigenvalue greater than 1), which explained 71% of total variance. As expected, the component matrix yielded positive loadings for anger, disgust, fear, sadness, and a negative loading for joy on the principal component (see **Table 2A**). Thus, we converted all the emotion ratings to z-scores before reverse scoring the negative emotions and converting all five into one composite emotion score that, as in Studies 1 and 2, ranged in valence from negative to positive. We then ran a 2(gender) × 2(age) between subjects ANOVA on the distribution of z-scores. As expected, there was a main effect of age, F(1,56) = 9.82, p < 0.01, η <sup>2</sup> = 0.15, such that elderly targets (Z = –0.32) were rated as appearing less positive in their affect than were young targets (Z = 0.32), t(57) = 3.11, p < 0.01.
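The z-score composite described above can be sketched as follows. This is a minimal illustration under stated assumptions: each emotion is standardized across faces using the sample standard deviation, negative emotions are reverse scored by flipping the sign of their z-scores, and the composite is the unweighted mean. The article does not specify these details, and the toy ratings and function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of per-face ratings (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def valence_composite(positive_items, negative_items):
    """Combine per-face emotion ratings into one valence score per face.

    Each argument maps an emotion name to a list of per-face mean ratings.
    Negative emotions are reverse scored by negating their z-scores.
    """
    columns = [z_scores(v) for v in positive_items.values()]
    columns += [[-z for z in z_scores(v)] for v in negative_items.values()]
    n_faces = len(columns[0])
    return [mean([col[i] for col in columns]) for i in range(n_faces)]


# Toy ratings for three faces (hypothetical values)
composite = valence_composite(
    positive_items={"joy": [10, 20, 30]},
    negative_items={"anger": [30, 20, 10], "sadness": [50, 40, 30]},
)
print([round(c, 2) for c in composite])
```

Higher composite values indicate faces perceived as more positive, matching the direction of the indices used in Studies 1 and 2.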

#### Relation to Self-Reported PANAS Scores

Next, we examined whether ratings of positive/negative emotions to the targets correlated with self-reported affect on the PANAS for each of the two age groups. As in Studies 1 and 2, the correlation between the emotion valence index and self-reported positive affect was significant for elderly targets (r = 0.37, p < 0.05), but not for young targets (r = –0.03, p = 0.87). Again, there was no association between the emotional valence index of faces and self-reported negative affect either for the elderly (r = 0.00, p = 0.99) or the young (r = 0.03, p = 0.87). Lastly, splitting the analyses by gender of target revealed once again that the significant association between positive/negative emotion perceived from elderly faces and self-reported PA was driven primarily by elderly female targets (r = 0.58, p < 0.05); see **Figure 4** (see also **Table 3A** for independent correlations between each emotion and self-reported PANAS scores). Elderly male targets again showed a positive association, which did not reach significance (r = 0.20, p = 0.47).

#### Trait Perception of Faces

Because the emotions perceived in neutral faces have been directly implicated as influencing impression formation (e.g., Adams et al., 2012), we also performed a PCA on the five trait ratings using varimax rotation with Kaiser normalization (see **Table 1B** for intercorrelations of traits). The scree-plot revealed a two-factor solution (both factors with eigenvalues above 1). The first factor included items highly related conceptually to the construct of valence, and the second had items corresponding to a power/dominance dimension, another common factor found in the emotion and person perception literature (Todorov et al., 2008). The first factor explained 62% of total variance and formed a "positivity" dimension. The second factor explained 21% of the variance and included the other two traits (dominance and threatening), forming a "power" dimension (see **Table 2B**). Consequently, we z-scored and averaged corresponding traits to create an index of "positivity" and an index of "power."

#### TABLE 1 | Intercorrelations between Emotion (A) and Trait (B) ratings of stimulus items (Study 3).

<sup>∗</sup>p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001.

TABLE 2 | Principal components factor solutions for basic emotion (A) and trait impression (B) ratings (Study 3).

Because only one factor emerged for basic emotions there was no Varimax rotation. Trait extraction method involved Varimax with Kaiser Normalization. Variables loading on each factor are indicated in bold.

A 2(gender) by 2(age) factorial ANOVA using the positivity index as the dependent variable yielded a significant main effect of age, F(1,56) = 8.26, p < 0.01, η <sup>2</sup> = 0.13, such that the elderly were seen as less positive (M = 24.25, SD = 5.55) than the young (M = 29.5, SD = 8.86). There was also a marginal main effect of gender, F(1,56) = 3.14, p = 0.08, η <sup>2</sup> = 0.05, such that female faces were rated as more positive (M = 16.42, SD = 8.68) than male faces (M = 15.25, SD = 6.55). These main effects were qualified by a significant interaction, F(1,56) = 4.35, p < 0.05, η <sup>2</sup> = 0.07. Simple effects comparisons showed that young women (M = 33.02, SD = 8.66) were rated greater in positivity than both elderly women (M = 23.97, SD = 6.09), t(57) = 3.51, p = 0.001 and young men (M = 25.98, SD = 7.81), t(57) = 2.73, p < 0.01.

Using the power index as the dependent variable in the factorial ANOVA yielded a main effect of sex, F(1,56) = 8.58, p < 0.01, η <sup>2</sup> = 0.13, and of age, F(1,56) = 7.13, p = 0.01, η <sup>2</sup> = 0.11, but no interaction, F(1,56) = 0.64, p = 0.43. Elderly targets (M = 30.98, SD = 9.56) were rated as more powerful than young targets (M = 24.78, SD = 9.49), t(57) = 2.67, p = 0.01, and men (M = 31.28, SD = 9.25) were rated as more powerful than women (M = 24.48, SD = 9.59), t(57) = 2.93, p < 0.01.

### Relation of Trait Ratings to Self-Reported PANAS Scores

Overall, the correlation between the index of trait positivity and self-reported positive affect was not significant for elderly, r = 0.18, p = 0.33, or young, r = –0.10, p = 0.61, targets. However, conducting the analyses separately for each gender revealed a positive correlation between the trait positivity index and self-reported positive affect for elderly women, r = 0.60, p < 0.05 (see **Table 3B** for independent correlations between each related trait and self-reported PANAS scores). No significant correlations were observed between the power index and self-reported PANAS scores for any of the target groups (all rs < ± 0.15).

ratings of positive affect are the combined attributions of joy and reversed scored ratings of anger, fear, disgust, and sadness. Only the correlation for elderly female faces is significant, all other coefficients p > 0.05.

TABLE 3 | Correlations between emotion and trait ratings that loaded on the valence factor of the principal components analysis (PCA) and elderly female models' self-reported PA (Study 3).

†p < 0.1, <sup>∗</sup>p < 0.05, ∗∗p < 0.01.

### GENERAL DISCUSSION

Even though self-reported PANAS scores showed no age or gender differences in actual positive and negative affective styles for the models (Preliminary Study), across all three subsequent experimental studies we found a strong perception of more negative affect expressed by elderly faces compared to young faces. This bias was apparent for single item positive/negative ratings (Study 1), the full set of PANAS face ratings (Study 2), and emotion profile and trait impressions indices (Study 3). That elderly faces here were seen as expressing more negativity fits with what has been reported in the previous research (Malatesta et al., 1987a; Matheson, 1997; Hess et al., 2012) suggesting that age-related cues in the face (drooping around the eyes, wrinkles, folds) are at least partially confounded with emotion expressions, which in turn influence perception.

Importantly, the primary question addressed in the current research was whether these age-related appearance cues also convey accurate information about the target's actual emotional disposition. Studies 1–3 consistently revealed that perceptions of elderly faces did accurately predict the models' own self-reported affective dispositional styles, but only on the positive affect dimension of the PANAS, not on negative affect. In addition, in all three studies it was ratings of elderly female faces that drove the effects. Critically, this effect was not apparent in ratings of young adult faces.

The previous related work (Malatesta et al., 1987a) only included perceptions of elderly female faces. Thus, the current work replicates this previous research while at the same time delimiting its generality to female stimuli. To our knowledge this is the first replication of this work since it was originally reported. The current results extend these previous findings in a number of important ways. First, we included young adult faces, which do not have age-related changes such as wrinkles. Across all studies, we found no evidence that these faces convey information that is diagnostic of self-reported emotional disposition, lending additional credence to the conclusion that the effects we found are due to age-related changes in the face. Because we had raters in each study focus specifically on expressive aspects of the faces to make their ratings, these findings are consistent with previous suggestions that there are age/emotion cue confounds. We further extended this work by including a sample of elderly men. Across all three studies, ratings of old male faces did not significantly predict self-reported emotionality, though the trend was in the same direction as that found for old female faces.

Helping explain these differences, a prevailing gender stereotype across cultures is that women are more "emotional" than men in that they are expected to feel and express emotions more than men (Brody and Hall, 2000; Shields, 2000). Directly relevant to the current work is evidence that gender-based expectations are particularly pronounced for emotional expression (Fabes and Martin, 1991), and that these stereotypes drive gender differences in emotional expression. Based on this, Fabes and Martin posited a deficit model of male expressivity, which essentially underscores the tendency for males to be stoic even in the face of intensely felt emotion. They suggest that while males may experience a similar amount of emotion as females, they are expected to suppress or inhibit their expression of it. They conclude (1991, p. 539): "With few exceptions, it appears that the stereotype that females are more emotional than males is based on the deficit model of male expressiveness (i.e., a belief that males do not express the emotions they feel)." In the same vein, Shields (2005) describes emotional restraint as the culturally valued expressive mode for men, suggesting that even if emotions are shown, this should ideally be in a restrained form. The consequent reduced expressivity helps explain the lack of significant effects emerging for elderly men. If men are less likely to express emotion, expression is less likely to influence their aging cues in the face.

Whereas men are expected (and tend) to be less expressive, with a neutral mask being their default expression (Fabes and Martin, 1991; Fischer, 1993), women are expected to be emotionally expressive, particularly with regard to happiness, fear, and sadness (Fabes and Martin, 1991; Briton and Hall, 1995). There exists particularly strong evidence that women smile more often than men. Indeed, a study examining yearbook pictures shows that on average 80% of the women but only 55% of the men smiled (Dodd et al., 1999). These findings suggest that women may feel obliged to smile. A failure to smile socially may be met with disapproval, because it defies the affiliative role women are expected to adhere to (LaFrance et al., 2003). In fact, women expect more costs when not expressing positive emotions in an "other" oriented context (Stoppard and Gruchy, 1993) and are rated more negatively when they do not smile (Deutsch et al., 1987).

This same idea was well typified in the landmark paper "Perfidious Feminine Faces," by Bugental et al. (1971), who reported that children perceived verbal messages from their fathers, when delivered with a smile, as more positive than when delivered without a smile. Verbal messages from their mothers, however, were perceived as no more positive when coupled with a smile than when not. This finding is consistent with the conclusion that smiling is the default expression for women. Thus, for women to convey genuine positivity presumably would require far more intense and frequent smiling behavior. Perhaps, then, it should be no surprise that even young adult women have higher smile lines when smiling, and thicker zygomaticus major muscles than men (measured using ultrasound: McAlister et al., 1998).

Critically, this smiling behavior in women has been found to predict real-life outcomes as well. In one study examining photographs of women in yearbooks, it was found that smiles by women that appeared more genuine (i.e., Duchenne smiles) predicted life satisfaction scores up to 25 years after the picture was taken (Harker and Keltner, 2001). Specifically, women whose smiles appeared more genuine were more achievement focused, organized, more approachable, and less susceptible to negative emotions than those who showed less. Overall we found that aging cues in elderly faces are misperceived as expressing more negative emotion compared to young faces. That we then found that ratings of elderly women's faces predicted positive affect is interesting, but not contradictory. Notably, both ratings of negative and positive expressivity in elderly women's faces predicted their self-reported positive affect. Thus variability across faces is due to both. Perceived negative emotion in elderly neutral faces predicted low self-reported positivity, and vice versa. Such age-related appearance across elderly female models' faces was highly predictive of accurate self-reported positive disposition across all three studies.

The tendency for women to smile more often, and presumably to be obliged to smile even more intensely to convey genuine positivity, may help explain why it was only dispositional positive affect that was predicted from their faces in the current studies, not negative affect, as smile related wrinkles may temper the negative-resembling expressions caused by aging in the face. If men are prone to suppress outward expressions, then their expressiveness would not have as noticeable impact on facial appearance over time. Likewise, if women smile so much more often than men, this too would presumably reveal itself in the face over time. Notably, elderly women were not rated as looking any more positive than elderly men (or young men and women) overall, but variation in positivity that was perceived in their faces did reveal actual self-rated dispositional positivity.

Although there are gender differences in overall expectations for emotional expression, these expectations also appear to be divided along the dimensions of dominance and affiliation (Adams et al., 2015). Women are expected to express more overall emotion in comparison to men with some notable exceptions. That is, men are expected to inhibit "weak" emotional expressions, but to express powerful emotions, such as anger and disgust (Fabes and Martin, 1991; Fischer, 2000). The current set of studies did not focus primarily on power-oriented emotions, and so perhaps missed meaningful facial cues that would have been more predictive for men. In this vein, we conducted a post hoc analysis of face ratings of anger and disgust in Study 3, which did predict self-reported hostility on the PANAS (ps < 0.05). These two effects, however, do not survive corrections for multiple tests, but they are suggestive that future work focused more on male-oriented "power" emotions might be a fruitful avenue for continued work in this domain when examining aging in the male face.

With recent evidence that perceiving emotion-resembling cues in otherwise neutral facial appearance serves as a powerful mechanism of impression formation, these results highlight new vistas of exploration in person perception. Malatesta et al. (1987a; p. 68) suggested that "the 'misattributions' of decoders are also probably in part a consequence of the leakage of prepotent emotion response tendencies." It is also possible that the face carries with it emotional residue from one expressive experience to another that reveals current, previous, or chronic emotional states. Such insights could eventually help explain how it is that we can extract accurate perceptions of others just by viewing their faces. It has been shown, for instance, that people can accurately identify political-race winners (Todorov, 2005; Rule et al., 2010a), political affiliation (Rule and Ambady, 2010), business leaders' salaries (Rule and Ambady, 2011), sexual orientation (Rule et al., 2008), and religious affiliation (Rule et al., 2010b). Recently, Tskhay and Rule (2015) specifically implicated emotional processes underlying such effects, demonstrating that emotions are embedded in the very mental representations people have of certain social groups. Thus, if people are primed to signal or detect their social group members, emotional leakage or residue in the face may be a means through which that information is gleaned, even if unintentionally.

In sum, there may be partial truth to the age-old warning "if you don't stop making that expression, your face will freeze that way!" Our findings suggest that at least over the course of a lifetime, expression can become etched into the folds and wrinkles of the face and become a stable part of a person's neutral visage, offering diagnostic information of their dispositions to others. While it is certainly possible that extraneous idiosyncratic features (genetics, tanning, or plastic surgery) may also influence the perception of a neutral face, we believe that expressions—a behavior present since birth—hold equal, if not more powerful influence over the evolution of the face as it ages. Cicero said it well: "The face is a picture of the mind with the eyes as its interpreter." Our findings suggest that even when presumably neutral, the face may well reveal more to the eyes than previously realized.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

### ACKNOWLEDGMENT

Preparation of this manuscript was supported by a National Institute on Aging grant (Award # 1R01AG035028-01A1) to RK, UH, and RA, and a National Institute of Mental Health grant (Award # 1 R01 MH101194-01A1) to RA.

### REFERENCES



responding. J. Exp. Soc. Psychol. 50, 136–143. doi: 10.1016/j.jesp.2013. 09.011



Zebrowitz, L. A., Kikuchi, M., and Fellous, J. M. (2010). Facial resemblance to emotions: group differences, impression effects, and race stereotypes. J. Pers. Soc. Psychol. 98, 175–189. doi: 10.1037/a0017990

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer LK declared a shared affiliation, though no other collaboration, with one of the authors UH to the handling Editor, who ensured that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Adams, Garrido, Albohn, Hess and Kleck. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Through a glass darkly: facial wrinkles affect our processing of emotion in the elderly

### *Maxi Freudenberg1\*, Reginald B. Adams Jr.2, Robert E. Kleck3 and Ursula Hess1*

*<sup>1</sup> Department of Psychology, Social and Organizational Psychology, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>2</sup> Department of Psychology, The Pennsylvania State University, University Park, PA, USA, <sup>3</sup> Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA*

The correct interpretation of emotional expressions is crucial for social life. However, emotions in old relative to young faces are recognized less well. One reason for this may be decreased signal clarity of older faces due to morphological changes, such as wrinkles and folds, obscuring facial displays of emotions. Across three experiments, the present research investigates how misattributions of emotions to elderly faces impair emotion discrimination. In a preliminary task, neutral expressions were perceived as more expressive in old than in young faces by human raters (Experiment 1A) and an automatic system for emotion recognition (Experiment 1B). Consequently, task difficulty was higher for old faces relative to young faces in a visual search task (Experiment 2). Specifically, participants detected old faces expressing negative emotions among neutral faces of their peers less accurately and more slowly than young faces among neutral faces of their peers. Thus, we argue that age-related changes in facial features are the most plausible explanation for the differences in emotion perception between young and old faces. These findings are of relevance for the social interchange with the elderly, especially when multiple older individuals are present.

Keywords: face perception, emotional expression, face age, signal clarity, visual search task

## Introduction

Would you rather approach someone who looks happy or mad? Because individuals live in and depend on groups, we take note of each other's emotional expressions and react in accordance with the information provided by them. Hence, we are likely to approach someone who shows happiness, as this emotion signals that the person is pleased with the current situation. In contrast, if we perceive anger, we probably do not wish to spend time with the angry other, and therefore avoid interaction. Thus, emotional expressions can serve as an important means of communication (Parkinson, 1996) and convey distinct social signals that can have important effects on the social behavior of others (Van Kleef et al., 2010; Hareli and Hess, 2012; Weisbuch and Adams, 2012). Yet, for the perceiver to benefit from the information conveyed by the emotional displays of others and to react with the most adaptive social behavior, these displays first have to be decoded successfully.

Overall, people are rather good at decoding facial expressions of emotion, especially with the highly prototypical ones used in most research to date (Tracy and Robins, 2008; Hess and Thibault, 2009). Nevertheless, the relationship between the encoder, the person who displays an emotion, and the decoder, the one interpreting the facial expression, can impact decoding accuracy. Thus, emotions expressed by members of the same culture (see Elfenbein and Ambady, 2002, for a meta-analysis) or social in-group members (Thibault et al., 2006) are better recognized. Even bogus group membership operationalized as apparent personality type affects decoding accuracy (Young and Hugenberg, 2010).

#### *Edited by:*

*Andrea Hildebrandt, Ernst-Moritz-Arndt Universität Greifswald, Germany*

#### *Reviewed by:*

*Caroline Blais, Université du Québec en Outaouais, Canada Martin Junge, Ernst-Moritz-Arndt University of Greifswald, Germany*

#### *\*Correspondence:*

*Maxi Freudenberg, Department of Psychology, Social and Organizational Psychology, Humboldt-Universität zu Berlin, Rudower Chaussee 18, 12489 Berlin, Germany maxi.freudenberg@hu-berlin.de*

#### *Specialty section:*

*This article was submitted to Perception Science, a section of the journal Frontiers in Psychology*

*Received: 13 April 2015 Accepted: 14 September 2015 Published: 01 October 2015*

#### *Citation:*

*Freudenberg M, Adams RB Jr., Kleck RE and Hess U (2015) Through a glass darkly: facial wrinkles affect our processing of emotion in the elderly. Front. Psychol. 6:1476. doi: 10.3389/fpsyg.2015.01476*

The relative ages of the encoder and decoder also play a role in emotion recognition. The literature suggests reduced decoding accuracy for old faces relative to young faces by young decoders, but also by older participants (see Fölster et al., 2014, for a review). One explanation for this is that old encoders are simply poorer communicators, as their aged facial muscles constrain the ability to express emotions intensely (Malatesta et al., 1987b). Even though older participants generally self-report less expressive behavior for negative emotions, empirical evidence regarding emotional expressivity as a function of age is mixed (Gross et al., 1997). Age differences in emotional expressivity may have their origin in differences in emotional experience. Thus, when asked about the emotions they show in general, older participants report fewer negative emotions. For example, older nuns report less negative and more positive emotions relative to their younger peers (Gross et al., 1997). By contrast, when young, middle-aged and old women in another study were asked to recall a specific emotional experience, no age-related differences in the self-rated intensity level of their emotional arousal were found (Malatesta et al., 1987b). In addition, age is associated with increased emotional control (Gross et al., 1997). The authors distinguish between inner control, which targets emotional experience, and external control, which regulates expressive behavior. They further argue that individuals learn over a lifetime of experience to effectively regulate primarily the inner experience of emotion.

If the lower decoding accuracy for older faces were an artifact of age-related changes in facial musculature or emotional reactivity, it should disappear when expressions are equated for intensity. However, the effect persists when such artificial stimuli are used (Hess et al., 2012). The authors therefore propose that the difference in decoding accuracy is linked to the decreased signal clarity of older faces. Specifically, the complexity of facial features increases in the aging face due to age-related changes such as wrinkles and folds. Once developed, those features, independent of the situation or emotional context, can emphasize some emotional expressions but also obscure others. Hence, structural changes in the old face may blur the clarity of an emotional signal (Malatesta et al., 1987b). As a consequence, emotional displays in older faces are more ambiguous than in young faces, making it more difficult to identify the emotional state underlying the facial expression of older adults (e.g., Ebner and Johnson, 2009).

The present article aims to contribute to the understanding of the mechanisms that underpin differences in emotion perception as a function of face age in three experiments. Our general assumption is that the misattribution of emotions to elderly faces (Experiments 1A,B) creates impairment in emotion discrimination, particularly when multiple individuals are present (Experiment 2). To investigate emotion discrimination in young and old faces, researchers typically present facial expressions individually and ask participants to either judge the face using forced-choice scales or rate the emotion expression on multiple intensity scales. In Experiment 1A, we make use of the latter approach to complement findings on the misattribution of emotions in the elderly face. However, instead of presenting emotionally expressive faces, we investigate how emotion perception in solely neutral faces varies as a function of expresser age. In Experiment 1B, we collect emotion judgments on the same stimuli by a software tool for automatic facial expression recognition. For both procedures, we hypothesize that neutral faces of the elderly are perceived as less neutral than neutral young faces, because the wrinkles and folds of older faces will appear "emotional." If indeed the neutral faces of the elderly appear to convey emotions, then it should be the case that it is more difficult to distinguish an emotional old face from emotionally neutral old faces than to do the same task for young faces. A novel way to test this is to use a visual search task in which an emotional old or young face is embedded into an array of neutral faces of their peers. We do this in Experiment 2. An additional advantage of this paradigm is that participants are presented with a group of individuals. Thus, we can examine emotion discrimination in a way that is closer to real life situations, which provides more ecological validity.

In sum, as the correct identification of facial emotional displays is essential for a successful social life, the goal of the present article is to assess whether the impairment of emotion perception in elderly faces is due to a lack of signal clarity. The focus of Experiment 1, as a preliminary study, is to extend previous findings that aging degrades the signal value of emotional expressions, by examining whether elderly faces are perceived as less neutral than young faces even in their neutral displays. This notion is explored more fully in that we complement findings on human ratings (Experiment 1A) with the results of an automated system for facial expression recognition (Experiment 1B). The focus of Experiment 2, then, is to examine the effect of face age on emotion perception in a visual search task. If neutral old faces are perceived as more emotionally expressive, we predict that neutral expressions in older faces will be more distracting when the task is to find an emotional face within an array of neutral faces. Thus, we expect that older faces depicting happiness, anger or sadness will be identified more slowly and less accurately within old-age-groups relative to young faces in young-age-groups.

### Experiment 1A

Experiment 1A was designed to investigate how the age of a face influences the perception of emotions in neutral faces and show that neutral facial displays of the elderly are perceived as less neutral than neutral young faces. In this regard, we defined a neutral face as a facial expression that displays a neutral state, when no emotional expression is intended by the expresser. Despite this objective neutrality, participants are known to misattribute emotions to the elderly face, when a forced-choice format is applied (Malatesta et al., 1987a). For methodical reasons of assessing neutrality via a rating scale, we take an indirect route by asking participants to rate the expression intensity of the facial display on multiple emotion scales. If a face is perceived as indeed neutral, ratings on the emotion scales should be low. Specifically, we expected participants to attribute more emotionality to the elderly relative to young individuals.

### Materials and Methods

### Participants

The study was carried out in accordance with the procedures approved by the Humboldt-Universität zu Berlin Psychology Department ethics committee. Potential participants were invited to take part in an online survey through a newsletter from the Humboldt-Universität zu Berlin. In the introduction to the online survey, interested volunteers were informed concerning the objectives and procedures of the investigation. Preservation of their anonymity was guaranteed through the data collection methods that were employed. Individuals were told that completing the survey would constitute their informed consent to participate in the study. Upon completion of the online survey, participants had the option to demand the deletion of their data and thus to withdraw from the study. No individual made use of this option. A total of 45 (10 men) participants, primarily students, completed the survey. They had a mean age of 24.84 years (*SD* = 5.17), ranging from 18 to 47 years.

### Materials

Stimuli were chosen from a larger database of color photographs of faces (Ebner et al., 2010) of female and male actors varying in age. The subset of images selected for the present experiment included only neutral expressions by 36 either young (19–31 years) or old (69–80 years) men and women. Both sexes were equally represented. The order of presentation was randomized.

### Procedure and Dependent Variables

While seeing the neutral facial picture on a computer screen for as long as they wanted, participants rated the emotion expression on each of the following 7-point scales anchored between 0 ("not at all") and 6 ("very intense"): Anger, sadness, happiness, fear, disgust, surprise, and contempt. Although all images depicted individuals with neutral facial expressions, we expected the participants to perceive the expressions as emotional. Hence, participants were encouraged to inspect the images carefully as the face rating task would be harder for some faces than for others.

### Results and Discussion

To determine whether the perceived expressiveness of neutral faces varied as a function of face age, emotion scale, and face sex, mean intensity ratings for each scale were computed for each expresser type. A three-way analysis of variance (ANOVA), with face age (young vs. old), emotion (anger, sadness, happiness, fear, disgust, surprise, and contempt), and face sex (female vs. male) all as within-subjects factors was conducted. Where Mauchly's test indicated the violation of sphericity, Greenhouse-Geisser corrections were applied and degrees of freedom were rounded to the nearest integer.
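The Greenhouse-Geisser correction mentioned above can be illustrated with a minimal sketch. It is not the authors' analysis code: it covers only the textbook case of a single within-subjects factor with k levels, and the function names `covariance_matrix`, `gg_epsilon`, and `corrected_df` are hypothetical. Epsilon is estimated from the double-centered covariance matrix of the repeated measures and then multiplies both degrees of freedom.

```python
def covariance_matrix(data):
    """Sample covariance (n - 1 denominator) between the k columns of an n x k table.

    Each row holds one participant's scores on the k levels of the factor.
    """
    n = len(data)
    k = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix.

    Epsilon ranges from 1 / (k - 1) (maximal sphericity violation) to 1 (sphericity holds).
    """
    k = len(cov)
    row_means = [sum(cov[i]) / k for i in range(k)]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    # Double-center the covariance matrix (remove row, column, and grand means).
    centered = [
        [cov[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centered for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_df(epsilon, k, n):
    """Epsilon-adjusted degrees of freedom for a k-level within-subjects effect, n participants."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Uncorrected df for a seven-level factor and 45 participants.
print(corrected_df(1.0, 7, 45))  # (6.0, 264.0)
```

With seven emotion scales and 45 raters, the uncorrected degrees of freedom for the emotion effect would be (6, 264). An epsilon somewhere near 0.44 would shrink these to roughly (2.7, 117), which after rounding would be in line with the *F*(3,117) reported for the emotion main effect. This is an inference from the reported values, not a figure given in the article.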

As predicted, the analysis revealed a main effect of face age, *F*(1,44) = 25.56, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.37, such that overall, neutral old faces (*M* = 0.92, *SE* = 0.10) were rated as more expressive than neutral young faces (*M* = 0.77, *SE* = 0.10). However, this effect was qualified by a significant face age × sex × emotion interaction (see **Figure 1**), *F*(4,183) = 13.18, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.23. The main effects of sex and emotion as well as the face age × sex and face age × emotion interactions reached significance, but were also qualified by the three-way interaction [sex: *F*(1,44) = 12.74, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.23; emotion: *F*(3,117) = 30.03, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.41; face age × sex: *F*(1,44) = 9.64, *p* = 0.003, η<sub>p</sub><sup>2</sup> = 0.18; face age × emotion: *F*(4,155) = 12.96, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.23].

To decompose the three-way interaction, we conducted simple effects analyses in the form of separate ANOVAs on the individual emotion scales. For ratings of *happiness*, a main effect of face age emerged, *F*(1,44) = 90.36, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.67, such that participants rated neutral old faces (*M* = 0.68, *SE* = 0.07) as more intensely happy than neutral young faces (*M* = 0.28, *SE* = 0.06). For ratings of *anger* as well, a main effect of face age emerged, *F*(1,44) = 31.29, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.42, such that participants also attributed more anger to neutral old faces (*M* = 1.28, *SE* = 0.14) relative to neutral young faces (*M* = 0.85, *SE* = 0.14). However, there was also a significant main effect of sex, *F*(1,44) = 4.64, *p* = 0.04, η<sub>p</sub><sup>2</sup> = 0.10, which was qualified by a significant face age × sex interaction, *F*(1,44) = 17.26, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.28. Whereas neutral old women were perceived as angrier than old men, *t*(44) = 3.45, *p* = 0.001, there was no difference between neutral young women and young men. For ratings of *contempt*, there was also a significant main effect of face age, *F*(1,44) = 6.981, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.14, as well as a marginally significant main effect of sex, *F*(1,44) = 3.59, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.07. Both effects were qualified by a significant face age × sex interaction, *F*(1,44) = 4.97, *p* = 0.03, η<sub>p</sub><sup>2</sup> = 0.10, such that again old women were perceived as more contemptuous relative to young women, *t*(44) = 2.99, *p* = 0.01, but also relative to old men, *t*(44) = 2.52, *p* = 0.02.
Similarly, for ratings of *disgust*, there was a marginally significant main effect of face age, *F*(1,44) = 3.18, *p* = 0.08, η<sub>p</sub><sup>2</sup> = 0.07, and a significant main effect of sex, *F*(1,44) = 17.77, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.29, that were both qualified by a significant face age × sex interaction, *F*(1,44) = 5.76, *p* = 0.02, η<sub>p</sub><sup>2</sup> = 0.12. Again, neutral old women were perceived as more disgusted than young women, *t*(44) = 2.57, *p* = 0.01, or old men, *t*(44) = 4.86, *p* < 0.001. For ratings of *surprise*, a significant main effect of sex emerged, *F*(1,44) = 6.83, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.13, that was qualified by a significant face age × sex interaction, *F*(1,44) = 27.47, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.38. In line with the other emotion ratings, participants attributed more surprise to neutral old women relative to young women, *t*(44) = 3.57, *p* = 0.001, or old men, *t*(44) = 5.86, *p* < 0.001. Contrariwise, old men were rated as less surprised relative to young men, *t*(44) = −3.83, *p* < 0.001. Furthermore, compared to young men, young women were perceived as less surprised, *t*(44) = −2.66, *p* = 0.01. For ratings of *fear*, neither the main effects of face age or sex nor the face age × sex interaction reached significance. As a last point, for ratings of *sadness*, there was a marginally significant effect of face age, *F*(1,44) = 3.51, *p* = 0.07, η<sub>p</sub><sup>2</sup> = 0.07, qualified by a significant face age × sex interaction, *F*(1,44) = 25.53, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.37. In contrast to all the other emotion ratings, old women were perceived as less sad relative to young women, *t*(44) = −4.48, *p* < 0.001, and old men, *t*(44) = −2.58, *p* = 0.01.
Then again, more sadness was marginally attributed to neutral faces of old men relative to young men, *t*(44) = 1.68, *p* = 0.099. Furthermore, more sadness was attributed to neutral faces of young women compared to young men, *t*(44) = 3.52, *p* = 0.001. All other effects were not significant.

In sum, our results replicate other findings that neutral old faces are perceived as more emotionally expressive than are young faces (Hess et al., 2012), which is consistent with the conclusion that some of the wrinkles and folds that develop as a function of aging are perceived as emotional cues in the face. Given that participants judged neutral expressions, it is not at all surprising that the ratings on the individual emotions were generally low. In other research, when a forced-choice format was used in order to select a discrete emotional expression, accuracy rates for neutral facial displays were lower for old relative to young faces (Ebner and Johnson, 2009). This is in line with our overall finding that neutral old faces were perceived as more expressive than young faces. Further, for old faces Malatesta et al. (1987a) showed that most errors relate to the misattribution of sadness, contempt and anger. The same pattern occurred in our sample, but also extended to misattributions of these emotions in the young face. With regard to the individual emotion ratings, we replicated the finding that neutral faces of the old are perceived as angrier than neutral young faces (Hess et al., 2012), but also found higher intensity ratings for happiness for the elderly. Further, in the current sample, there was no difference between young and old neutral faces for fear. However, relative to young faces, more contempt, disgust, and surprise were misattributed to neutral faces of old women and also more sadness to faces of old men.

Though we are here advancing the notion that age-related changes in the face are responsible for how the age of the face may affect emotional decoding, alternative explanations need to be acknowledged (see Fölster et al., 2014, for a review). These include age-related differences in the production of facial expressions. However, as all faces depicted a neutral expression, less controllability of muscle tissues with age seems an unlikely explanation for the finding that human raters judge neutral faces of the elderly as more expressive than neutral faces of young individuals.

In Experiment 1A, people rated neutral faces of young and old individuals on emotion intensity. This procedure allowed us to investigate whether the elderly overall might be perceived as less neutral. However, one limitation of this approach is that age-related differences in a neutral expression are derived only indirectly through intensity ratings on discrete emotion scales. Hence, it seems desirable to assess neutrality directly in the form of a discrete value, which is realized in Experiment 1B. Further, Experiment 1A does not address whether the misattributions to neutral faces of the elderly indeed result from age-related changes in the face or, alternatively, relate to stereotype knowledge. Such stereotypes have previously been reported to bias emotion perception (see Fölster et al., 2014, for a review). Thus, in Experiment 1B, we used an approach that is independent of stereotype knowledge by submitting the faces to an automated facial expression recognition system.

### Experiment 1B

### Materials and Methods

The same neutral stimuli that were used in Experiment 1A were submitted to the Computer Expression Recognition Toolbox (CERT; Littlewort et al., 2011). CERT is a software tool for automatic facial expression recognition. The program registers the intensity of different facial action units in individual images. Given these as inputs, CERT also creates probability estimates for prototypical facial emotion expressions. Correlations between the human ratings and probability estimates by CERT can be downloaded as Supplementary Material.

### Results and Discussion

**Figure 2** shows the probability estimates for each emotion and a neutral facial expression based on the registered facial action units in young and old faces of men and women by CERT. We conducted two-way independent ANOVAs on the probability estimates for each emotion and the neutral facial expression with face age and face sex as between factors.

As predicted, the analysis on the probability estimates for *neutrality* revealed a main effect of face age, *F*(1,32) = 45, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.58, such that old faces were coded as less neutral than young faces. As such, even the automated emotion recognition system attributes less neutrality to old faces relative to young faces, when the face actually depicts a neutral expression. It should be noted that the probability estimates for each emotion are mutually dependent such that a low probability estimate of a correct neutral facial expression consequently is linked to higher probability estimates in all remaining erroneous cases. Thus, in case of an old face relative to a young face, the computational analysis estimates the probability that the neutral face expresses *disgust*, *F*(1,32) = 9.22, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.22, *sadness*, *F*(1,32) = 11.16, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.26, and *happiness* as more likely, *F*(1,32) = 7.42, *p* = 0.01, η<sub>p</sub><sup>2</sup> = 0.19. However, the computational analysis also reveals that neutral young faces express *contempt* more than neutral old faces, *F*(1,32) = 5.27, *p* = 0.03, η<sub>p</sub><sup>2</sup> = 0.14.
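As a reading aid, the partial eta squared values reported with each *F* test can be recovered from the *F* value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (*F* · df<sub>effect</sub>) / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The sketch below is illustrative only, and the helper name `partial_eta_squared` is hypothetical. For tests with Greenhouse-Geisser-corrected and rounded degrees of freedom, the result can deviate slightly from the reported value.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Face age effects on the CERT estimates reported above.
print(round(partial_eta_squared(45, 1, 32), 2))    # neutrality: 0.58
print(round(partial_eta_squared(9.22, 1, 32), 2))  # disgust: 0.22
```

Both values agree with the effect sizes given in the text, which offers a quick consistency check when reading the remaining tests.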

Further, the main effect of face sex reached significance for the probability estimates in neutral faces of *fear*, *F*(1,32) = 4.85, *p* = 0.04, η<sub>p</sub><sup>2</sup> = 0.13, *surprise*, *F*(1,32) = 4.95, *p* = 0.03, η<sub>p</sub><sup>2</sup> = 0.13, as well as marginally for *happiness*, *F*(1,32) = 3.45, *p* = 0.07, η<sub>p</sub><sup>2</sup> = 0.10, and *anger*, *F*(1,32) = 3.93, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.11. Specifically, it seems more likely that the neutral face of a woman signals more fear, surprise and happiness, but also less anger than a man's neutral face. All other effects, inclusive of all age × sex interactions, were not significant.

The findings from Experiment 1A indicated a higher overall expressiveness in the neutral old face relative to the neutral young face. Experiment 1B offered a complementary finding, such that even an automated emotion recognition system ascribes neutral old faces less neutrality than neutral young faces. Alternatively, perceiving emotions in neutral faces may be considered as misjudgments. Comparing the specific types of errors toward neutral faces, similarities between the two empirical procedures emerge, such as the finding that human raters as well as CERT misattributed more happiness to the neutral faces of the elderly relative to young faces. Moreover, most age-related differences in the probability estimates by the automated recognition tool in Experiment 1B are in line with the human ratings in Experiment 1A, at least in subgroups by face sex, e.g., the higher disgust rating in old relative to young women's faces. However, some differences in the misattribution of emotions to the neutral face occurred as well between the two experimental designs. This suggests that the wrinkles and folds render the face more ambiguous. Given such an ambiguous face, it remains possible that stereotypical knowledge about the elderly may be used as a cue. However, age information alone does not lead to an activation of stereotypic traits (Casper et al., 2011). More importantly, age-related beliefs about emotion expression are highly heterogeneous (Montepare and Dobish, 2013). As such, divergent findings between the two experiments may also be linked to two other possibilities. First, the procedural variations and accompanying scope varied between the two experiments. In Experiment 1A participants judged intensity levels of emotions. Thereby, the different ratings on the emotion scales are rather independent from one another. In contrast, the automated assessment is similar to a recognition task. 
Hence, the probability estimates of emotions computed by CERT depend on each other in an additive relation. Secondly, both experiments indicated that face sex also influences emotion recognition, either by itself or in interaction with face age. Generally, facial appearance, especially structural differences between male and female faces, is known to influence the perception of emotions such as anger and happiness (Hess et al., 2004; Becker et al., 2007; Adams et al., 2015). In the course of aging, women are then more affected by age-related changes (see Albert et al., 2007, for a review). Specifically, women tend to develop more and deeper wrinkles in the perioral region than men due to differences in skin appendages, namely fewer sebaceous glands, sweat glands, and blood vessels (Paes et al., 2009). Apart from wrinkles as the most prominent age-related change, loss of tissue elasticity and facial volume also follow age-related changes in bony support structure (see Albert et al., 2007, for a review). This might lead to an overall more concave look that might contribute to lower signal clarity, especially for old women. Then again, CERT as an automated tool for emotion detection in faces was developed to register the activation pattern of facial muscles that are associated with emotion expression. However, individual raters are not confined to this and process a wider range of facial signals, even perhaps the hairstyle of an individual. In this vein, facial cues of dominance and affiliation are also known to affect emotion attribution in human raters (Hess et al., 2010). Thus, the fact that the computational analysis is restricted to differential activation of facial muscles, whereas human raters process facial characteristics in their entirety, might possibly explain the face age by sex interactions for human raters, which were not significant for the automated assessment.

We expected to find that participants would perceive neutral old faces as less neutral than neutral young faces. This is supported by the finding of an overall main effect of face age (Experiments 1A,B) such that faces of the elderly were perceived as overall more emotional than those of young individuals. When it comes to the level of specific emotions, it can be argued that not all discrete emotions reveal an effect of face age, apparently questioning our proposition of impaired signal clarity due to age-related changes in a face. However, one important thing to note is that we did not hypothesize emotion-specific biases in the misattribution of emotions to neutral old faces, for several reasons. First, the overlap between age-related changes in a face and expressive facial markers varies highly between distinct emotions. As such, emotions are more likely affected by misattributions if the expressive markers of that emotion actually correspond to facial wrinkles and folds that develop in the course of aging, as for example the facial display of anger and frown lines on the forehead. In contrast, emotions like surprise and fear that are characterized by the opening of the mouth and eyes (Tomkins and McCarter, 1964) are less likely to be misperceived in young and old faces. Accordingly, neither the human rating nor the computational analysis revealed an effect of face age on the attribution of fear to neutral faces. On a different note, facial wrinkles and folds are an especially idiosyncratic feature. Normally, in one's early 20s fine facial lines start to emerge, for example horizontally across the forehead or vertically between the eyebrows (Albert et al., 2007). Given a certain age level, individuals naturally share the characteristic of manifested facial wrinkles in general. Yet, differences in the specific type of lines are apparent between individuals, as for example the nasolabial groove, glabellar lines, or crow's feet.
In fact, there is even evidence for some linkage between the emotion traits of the elderly and emotion resembling aging wrinkles and folds (Malatesta et al., 1987a).

In sum, despite some differences between the human ratings and the computational analysis in the specific misattributions to old and young faces that were found, both procedures found that neutral faces of the elderly are perceived as less neutral and thus more emotionally expressive than faces of young individuals. This renders stereotypical expectations as the sole underlying mechanism unlikely and supports our principal hypothesis that age-related changes in a face impair the signal clarity of emotion expression.

## Experiment 2

In Experiments 1A,B, we found that neutral old faces resemble emotional faces more than do young faces, probably as a function of age-related changes in facial appearance. This difference should impair performance in a visual search task, where participants have to find an emotional face among multiple neutral faces presented concurrently. Specifically, if the neutral faces of the elderly are perceived as more expressive, as was the case in Experiment 1, this should serve to increase the level of difficulty in a visual search task. As a variation on the standard paradigm employed in this sort of research, participants not only had to decide whether one expression is different from the others or whether all are the same, but were asked to locate the position of the one emotional stimulus among neutral faces of the same age and sex group. This strategy removes the need for non-target trials and thus reduces the number of trials a participant must complete. It also allows us to know which specific expression was perceived to be emotional. As all neutral expressions can be seen as expressing some low level of emotion, this reduces the risk of false positives. We then compared response times and accuracy rates to determine whether one stimulus type is recognized faster than another as a function of the target emotion. The literature on visual search paradigms of this sort so far shows that discrepant targets are found more quickly among neutral distractors than among emotional ones (Pinkham et al., 2010). Further, search efficiency in visual search tasks is improved the more similar the distractor items are and the more the target item deviates from distractor items (Duncan and Humphreys, 1989). Hence, as neutral old distractor faces appear less neutral and resemble emotional states more (Experiments 1A,B), we predict better overall expression identification for young faces. Thus, we predict for Experiment 2 that finding a young emotional face among multiple young neutral faces will be faster and more accurate than finding an old emotional face among multiple old neutral faces.

### Materials and Methods

### Participants

A total of 51 volunteers from the Berlin area were tested in groups of up to six. The study was approved by the Humboldt-Universität zu Berlin Psychology Department ethics committee. After verbal consent was given, participants worked on individually assigned computer workstations, separated by partitions. Subsequent to the experiment, participants were given feedback on their individual performance plus a short presentation on the current state of research as well as the object of this investigation. Participants were then asked for their written permission to use their data. Four participants refused permission and their data were deleted. Hence, the final sample consisted of 19 men and 28 women ranging in age from 14 to 65 years with a mean age of 35 (*SD* = 13.93) years. Data from an additional four participants were discarded from the analysis due to an error rate higher than 25% of trials in at least one of the three emotion blocks.

### Materials

As in Experiment 1, images were chosen from the FACES database (Ebner et al., 2010). The stimulus set for the visual search task consisted of 144 photos from 36 identities (nine per age and sex group), each displaying a happy, angry, sad, and neutral facial expression. Participants saw an equal number of young (19–31 years) and older (69–80 years) male and female faces.

### Design and Dependent Variables

Pictures were presented and responses recorded via E-Prime 2.0 software. To create a realistic "group," nine images of different identities from the same sex and age group were presented simultaneously in a 3 by 3 matrix on each trial. Within the 3 by 3 matrix, the target face was presented equally often at each position. All distractor faces showed neutral expressions. Overall, there were three blocks of trials that differed in the target emotion (happiness, sadness, or anger). Block presentation was counterbalanced. Within each emotion block, trials varied randomly in terms of target sex and age, such that each target group appeared nine times per emotion block. This resulted in a 3 (target emotion) × 2 (target age) × 2 (target sex) × 9 (target position) design. At the beginning of each emotion block, participants were informed about the type of target emotion, e.g., "Where is the happy face?" and completed four practice trials with different actors than those used in the main experiment. These practice trials served to introduce the task design and the target emotion to the participants. Following the practice trials, the respective emotion block was completed. Responses were made by moving and clicking the computer mouse. Participants were told to fixate a cross prior to each trial, which was displayed in the middle of the screen for 500 ms, and were encouraged to react as fast as possible after detecting the emotional target face.
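As a rough illustration of the 3 × 2 × 2 × 9 design, the sketch below generates one participant's trial list in Python. The experiment itself was run in E-Prime 2.0, so this is not the authors' implementation. It assumes that each target group appears once at each of the nine matrix positions within a block, and it replaces the counterbalancing scheme (which is not specified in the text) with a random block order. All names are hypothetical.

```python
import itertools
import random

EMOTIONS = ("happy", "angry", "sad")
AGES = ("young", "old")
SEXES = ("male", "female")
POSITIONS = range(9)  # cells of the 3 x 3 matrix, numbered row by row


def build_block(emotion, rng):
    """One emotion block: every age x sex group once at each position, shuffled."""
    trials = [
        {"emotion": emotion, "age": age, "sex": sex, "position": pos}
        for age, sex, pos in itertools.product(AGES, SEXES, POSITIONS)
    ]
    rng.shuffle(trials)  # target age and sex vary randomly within the block
    return trials


def build_session(seed=0):
    """Three emotion blocks in a random order (stand-in for counterbalancing)."""
    rng = random.Random(seed)
    block_order = list(EMOTIONS)
    rng.shuffle(block_order)
    return [build_block(emotion, rng) for emotion in block_order]


session = build_session(seed=1)
print(len(session), [len(block) for block in session])
```

Under these assumptions each block contains 36 trials and the session 108 trials, with every target group appearing nine times per block as described above.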

### Results

To determine whether young target faces were recognized more quickly and more accurately than old faces as a function of target sex and emotion expression, mean accuracies and log-transformed response times were computed for each combination of target age, target sex, and target emotion. For response time, only correct responses above 200 ms were log transformed and included in the analyses. However, raw values are given in the text and in the figures to facilitate interpretation of the data. Response time and accuracy were then analyzed in separate three-way analyses of variance (ANOVA), with target age (young, old), target sex (male, female), and target emotion (happy, angry, sad) as within-subjects factors. Where Mauchly's test indicated a violation of sphericity, Greenhouse-Geisser corrections were applied and degrees of freedom were rounded to integers.

It should also be noted that in an initial analysis, response time outliers, defined as deviating more than 3 *SD* from an individual's mean (1.7% of trials), were excluded. However, both analyses revealed the same effects; thus, the test statistics reported here are for response times including those outliers.
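The response time preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes a natural logarithm (the base is not stated) and applies the 3 *SD* criterion to raw response times around a single per-participant mean, which the text does not specify. Function names and data are hypothetical.

```python
import math
import statistics


def mean_log_rt(trials, min_rt_ms=200):
    """Mean log RT over correct trials with response times above min_rt_ms."""
    kept = [t["rt_ms"] for t in trials if t["correct"] and t["rt_ms"] > min_rt_ms]
    return statistics.fmean(math.log(rt) for rt in kept)


def drop_outliers(rts, n_sd=3.0):
    """Remove RTs deviating more than n_sd standard deviations from the mean."""
    mean = statistics.fmean(rts)
    sd = statistics.stdev(rts)
    return [rt for rt in rts if abs(rt - mean) <= n_sd * sd]


# Hypothetical trials from one participant in one design cell.
trials = [
    {"rt_ms": 150, "correct": True},    # dropped: not above 200 ms
    {"rt_ms": 1000, "correct": True},
    {"rt_ms": 2000, "correct": False},  # dropped: incorrect response
    {"rt_ms": 4000, "correct": True},
]
cell_mean = mean_log_rt(trials)
print(cell_mean, math.exp(cell_mean))
```

Back-transforming a mean of log values yields the geometric mean of the retained response times, which is less sensitive to slow responses than the arithmetic mean. This is a common motivation for the transformation, although the article does not state its rationale.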

### Accuracy

Overall, participants were very good at locating the emotional target face (*M* = 98%, *SD* = 2.40). A main effect of emotion, *F*(2,84) = 9.16, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.18, emerged, such that decisions were made more accurately for angry (*M* = 98%, *SE* = 0.57) and happy targets (*M* = 99%, *SE* = 0.19) relative to sad targets (*M* = 96%, *SE* = 0.78). As predicted, a significant main effect of target age, *F*(1,42) = 12.37, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.23, emerged. This main effect was qualified by a significant target age × sex × emotion interaction, *F*(2,67) = 3.67, *p* = 0.04, η<sub>p</sub><sup>2</sup> = 0.08 (see **Figure 3**). No other effect was significant. Simple effects analyses were conducted in the form of separate ANOVAs on angry, sad, and happy face blocks. When the target face expressed anger, decisions were made more accurately on young (*M* = 99.36%, *SE* = 0.42) compared to old faces (*M* = 97.31%, *SE* = 0.81), *F*(1,42) = 12.47, *p* = 0.001, η<sub>p</sub><sup>2</sup> = 0.23. Neither the main effect of sex nor the interaction between target age and sex reached significance for angry targets. For sad and happy targets, no main effect of target age or sex emerged, but a marginally significant interaction between target age and sex was found [sadness: *F*(1,42) = 3.45, *p* = 0.07, η<sub>p</sub><sup>2</sup> = 0.08; happiness: *F*(1,42) = 3.84, *p* = 0.06, η<sub>p</sub><sup>2</sup> = 0.08]. Post hoc paired *t*-tests revealed that sadness was recognized more accurately in young men (*M* = 98%, *SD* = 4.95) compared to old men (*M* = 95%, *SD* = 9.40), *t*(42) = −2.492, *p* = 0.02. In line with the age trend, happiness was also recognized more accurately in young women (*M* = 100%, *SD* = 0.00) compared to old women (*M* = 99%, *SD* = 3.57), *t*(42) = −2.351, *p* = 0.02.
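For readers who want to check the reported effect sizes, partial eta squared can be recovered from an *F* value and its degrees of freedom with the standard identity η<sub>p</sub><sup>2</sup> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies it to two of the values above. The function name is illustrative and the code is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of emotion on accuracy, F(2,84) = 9.16
print(round(partial_eta_squared(9.16, 2, 84), 2))   # 0.18
# Main effect of target age on accuracy, F(1,42) = 12.37
print(round(partial_eta_squared(12.37, 1, 42), 2))  # 0.23
```

For the Greenhouse-Geisser corrected interaction, *F*(2,67) = 3.67, the reported value of 0.08 is reproduced with the uncorrected degrees of freedom (2, 84). The correction scales both degrees of freedom by the same factor, which leaves this ratio unchanged, so the rounded, corrected values given in the text should not be entered directly.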

Thus, compared to old target faces, young target faces expressing anger were detected more accurately. Findings regarding sad and happy facial expressions were less consistent, but when there were age differences, young target faces were detected more accurately than old target faces. Therefore the hypothesis that emotional expressions are perceived more accurately in the young relative to the old face appears to receive strong support in terms of the accuracy criterion.

### Response Time

The analysis of the log-transformed response time revealed a main effect of emotion, such that in a group of neutral faces happy targets (*M* = 1,623, *SE* = 67.13) were detected earlier than angry faces (*M* = 2,883, *SE* = 128.00), which were recognized faster than sad targets (*M* = 3,583, *SE* = 220.30), *F*(2,84) = 190.36, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.82. The main effects of age and sex, as well as all two-way interactions, were qualified by the three-way interaction between target age, sex, and emotion, *F*(2,84) = 19.49, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.32. Specifically, the difference in response time for young and old targets varied as a function of target sex and emotion (see **Figure 4**) [age: *F*(1,42) = 96.73, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.70; sex: *F*(1,42) = 38.51, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.48; age × sex: *F*(1,42) = 24.90, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.37; age × emotion: *F*(2,84) = 30.81, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.42; sex × emotion: *F*(2,84) = 27.83, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.40].

Simple effects analyses were conducted. A 2 (age) × 2 (sex) ANOVA on angry faces revealed main effects for target age and sex, modified by the target age × sex interaction, *F*(1,42) = 20.37, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.33 [age: *F*(1,42) = 66.03, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.61; sex: *F*(1,42) = 76.95, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.65]. Follow-up paired *t*-tests revealed that young angry faces (men: *M* = 2,647, *SD* = 716; women: *M* = 2,344, *SD* = 675) were detected faster than old angry faces (men: *M* = 3,748, *SD* = 1,340; women: *M* = 2,793, *SD* = 954) and that this effect was larger for male faces [male targets: *t*(42) = 8.933, *p* < 0.001; female targets: *t*(42) = 4.230, *p* < 0.001]. Also, women's angry faces were detected faster than men's angry faces, and this effect was larger for old faces [old faces: *t*(42) = −9.207, *p* < 0.001; young faces: *t*(42) = −3.457, *p* = 0.001].
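The follow-up comparisons above are paired *t*-tests, in which each participant contributes one mean per condition and the test is computed on the within-participant differences. A minimal sketch using only the Python standard library is given below. The data are invented for illustration, and whether the authors ran these tests on raw or log-transformed response times is not stated.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


# Hypothetical per-participant mean RTs (ms) for old and young angry male targets.
old_rt = [3600, 3900, 3500, 4000]
young_rt = [2600, 2700, 2700, 2800]

t_value, df = paired_t(old_rt, young_rt)
print(round(t_value, 2), df)
```

With 43 participants the degrees of freedom are 43 − 1 = 42, matching the *t*(42) values reported above. The sign of *t* depends only on which condition is subtracted from which.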

The two-way ANOVA on sad faces revealed a main effect for target age, *F*(1,42) = 52.87, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.56, such that young sad faces were detected earlier than old sad faces. Although the main effect of target sex was not significant, the interaction between target age × sex indicated that the difference in response time between female and male sad targets varied as a function of age, *F*(1,42) = 26.90, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.39. Whereas correct decisions for sad old women (*M* = 3,569, *SD* = 1,411) were made faster than for sad old men (*M* = 4,395, *SD* = 1,961), *t*(42) = −4.972, *p* < 0.001, correct decisions for sad young women (*M* = 3,324, *SD* = 1,674) were made less rapidly than for sad young men (*M* = 3,043, *SD* = 1,158), *t*(42) = −2.949, *p* = 0.005.

The two-way ANOVA on happy faces revealed no significant main effects of target age or target sex. However, there was a significant interaction between target age and sex, *F*(1,42) = 7.78, *p* = 0.008, η<sub>p</sub><sup>2</sup> = 0.16. Whereas there was no difference in mean response time for old happy women (*M* = 1,650, *SD* = 425) compared to young happy women (*M* = 1,613, *SD* = 465), *t*(42) = 0.829, *p* = 0.412, happy old men (*M* = 1,571, *SD* = 488) were detected even faster than happy young men (*M* = 1,659, *SD* = 499), *t*(42) = −2.854, *p* = 0.007. Furthermore, old happy men were detected earlier than old happy women, *t*(42) = 2.332, *p* = 0.025, with no such difference for young happy targets, *t*(42) = −1.424, *p* = 0.162.

Thus, compared to old target faces, young target faces expressing anger or sadness were detected faster. However, this was not the case for happy facial expressions. Therefore, the hypothesis that emotional expressions are perceived faster in the young relative to the old face appears to receive strong support in terms of response time only for negative emotions.

### Discussion

The primary purpose of Experiment 2 was to assess whether face age impairs the recognition of emotional targets among a neutral set of faces of same-sex peers in a visual search task. Our results suggest that emotional young faces are detected more quickly than emotional old faces, as reflected by overall faster response times for an emotional young face among neutral young distractors relative to an emotional old face among neutral old distractors. In addition, participants were also better at identifying angry and sad targets in young neutral groups relative to old neutral groups. The age of the face did not consistently impair the identification of a happy face, which fits with research showing that happiness is the most easily decoded expression.

Despite high accuracies across all three facial expressions, the difference in response time varied considerably. Ever since Hansen and Hansen (1988) reported that angry facial expressions are found more efficiently than happy facial expressions, this design has been a focus of contention. The original study suffered from methodological confounds, but the implications of other methodological choices in this design have been contentious as well (Purcell and Stewart, 2010; Becker et al., 2011; Craig et al., 2014). Criticism has focused mainly on the degree to which the results of these studies represent responses to the displayed emotional expression *per se*, or rather are driven by low-level visual features that are unrelated to facial affect (Pinkham et al., 2010; Purcell and Stewart, 2010; Craig et al., 2014). In all, there is mixed evidence as to whether happy or angry facial expressions are processed more efficiently (see Shasteen et al., 2014; Lipp et al., 2015). Nonetheless, we decided to use this paradigm in the present research. In this regard, it should be noted that we were not interested in assessing whether affectively positive or negative stimuli are processed preferentially. Rather, we were interested in differences in emotion recognition as a function of face age. As happiness can be characterized by a single salient feature, anything but fast and accurate emotion recognition would be surprising (Adolphs, 2002). Thus, as noted above, it is likely that the low task difficulty for happy faces reduced the effect of face age. Specifically, the open smile is a salient feature that is easily discernible among the distractor faces without a detailed analysis of other facial features. Consequently, no search advantage for young relative to old faces would be expected in the recognition of happy faces, and this is what we found. However, when multiple facial features have to be extracted, as in sad and angry facial expressions, the level of task difficulty, and consequently the influence of distractor faces, increases. We argue that neutral young faces can easily be grouped together as distractors, whereas neutral old faces need additional processing resources, as facial morphological features have to be discriminated from facial expressive features.

Apart from low-level features as a source of pop-out effects, the literature on visual search tasks also suggests that familiarity influences visual search performance. A slowed search occurs especially when attention is captured by unfamiliar distractors (Wang et al., 1994; Malinowski and Hübner, 2001; Shen and Reingold, 2001). Thus, the level of contact with the elderly could account for age-related differences in visual search performance. Also, a differential motivation to respond to young relative to old individuals might explain differences in emotion recognition between young and old faces (Malatesta et al., 1987b; Lamy et al., 2008). However, as our sample included both younger and older individuals, neither explanation seems satisfactory.

The identification of the target emotion is also influenced by expectations. This is why standard visual search designs usually just demand a decision as to whether all faces are the same or one is different. One might therefore argue that informing participants about the target stimulus' type of emotional expression before the search serves as artificial priming that does not occur in real-world settings (Pinkham et al., 2010). However, emotion perception in an interpersonal situation does not occur in a social vacuum. Instead, there might be more information on the emotional state of the interaction partner, such as voice tone (Kappas et al., 1991), knowledge about private matters of the other, or even earlier emotional interactions with that person (Hess and Hareli, 2014). Those cues regarding the likely emotional state of others are typically available to the perceiver before emotional features in the face are noticed. Yet, although the repetition of the same emotion on successive trials speeds search (Lamy et al., 2008), the variation of target age within an emotion block also makes emotional priming an unlikely explanation for differences in emotional target discrimination in young and old face groups.

As slowness is a quality stereotypically associated with elderly people and the activation of elderly stereotypes can result in behavior in line with this stereotype such as slower walking speed (Bargh et al., 1996), the presentation of older target faces might activate such stereotypes and therefore prime slower responses in general. However, in our design the target age switched randomly from trial to trial to avoid accumulated priming effects. Further, no such slowing influence of target age was found for happy facial expressions, which also supports our interpretation focused on task difficulty.

One could question the practical significance of these results, especially the size of the age-related differences in the accuracy rates. Generally, a mean accuracy rate of 98% seems very good. On the other hand, the task difficulty was also comparatively low. Without time restriction, participants simply had to pick an emotional face among neutral faces. This was further facilitated in that all emotional photographs depicted individuals with a high level of emotion intensity. It should be noted that participants made more mistakes in the case of older relative to younger targets and were more than a second slower for the old targets in correct trials. In a situation closer to real life, the emotional display would probably appear less clear owing to lower emotion intensity and might occur in a more ambiguous context.

In sum, whereas previous studies on impaired emotion identification in the old face used individually presented still photos or videos with single actors, we investigated emotion recognition in a simulated social context. In each display, only one identity varied from the same sex peers by emotional content, whereas all faces differed from one another by virtue of face identity. Thus, distractor faces offered great heterogeneity and increased the difficulty of finding the discrepant stimulus. In such a design, we found slowed and less accurate visual search to detect emotional old faces relative to young faces for negative emotions. We conclude that the visual properties in the aged face, specifically the presence of wrinkles and changes in facial appearance that are age-related, are the most plausible explanation for this pattern of results.

### General Discussion

The current article addressed in three experiments how the age of a face impairs the perception of emotional expressions. Experiments 1A,B revealed that misattributions of emotions to neutral faces are more likely for old than for young faces. Specifically, participants rated neutral old faces not only as angrier and, for male faces, as sadder than neutral young faces, but also as happier. Further, Experiment 2 tested whether this perceived "emotionality" in old faces hindered emotion identification in a visual search task. In fact, we found a search disadvantage for emotional old faces among neutral same-sex peers, likely due to attentional capture by ambiguous facial displays, especially for negative emotions.

Setting both experiments in context, more anger was attributed to neutral faces of old individuals in the human ratings, and likewise participants spent more time and made more errors in trials when they had to find an angry old face among a set of same-aged neutral faces. Similarly, more sadness was attributed to neutral faces of old men, and sad facial expressions of old men were found less accurately than those of young men in groups of the same sex and age. With regard to response time, participants were generally slower for old compared to young faces.

Our results strike us as especially interesting in light of the large age range of the perceivers. Specifically, we regard our heterogeneous sample as advantageous, particularly for Experiment 2, in which participants were up to 65 years old. This allows us to conclude that the differences in emotion perception in young and old faces described in the current article did not result from an own-age advantage of young participants, as could be argued. Moreover, less contact with the elderly, which may lead to deficits in emotion decoding, also affects our findings less than would be the case with a student sample. For a further discussion on how the age congruence between an observer and a face might influence facial expression decoding, see Fölster et al. (2014).

A promising issue for future research regards the configural processing of faces. An increased motivation to process ingroup compared to outgroup faces explains, in part, the ingroup advantage in expression identification (Thibault et al., 2006; Young and Hugenberg, 2010). Young and Hugenberg (2010) further argue that configural processing of ingroup faces drives this advantage, which disappears after face inversion. Given that the recognition of happy faces is unaffected by face inversion (McKelvie, 1995; but see also Leppänen and Hietanen, 2007), it seems reasonable to ask whether the distracting wrinkles in the elderly face lead to a more feature-by-feature processing of the old face. If this were the case, we would indeed expect happy faces to be spared. Hence, it would be interesting to investigate whether impaired recognition in still photos of old relative to young faces occurs not only in upright but also in inverted faces, where configural processing of young faces would also be disturbed.

As noted earlier, the fact that emotional signals by older individuals are perceived with less clarity has potential implications for our social lives. Generally, it seems critically relevant to the quality of social interactions to reduce misinterpretations of emotional expressions. Emotions occur between people (Fischer and van Kleef, 2010) and influence social relationships. The accurate perception of emotion displays and emotional states helps to coordinate and facilitate interpersonal interaction and communication (Keltner and Haidt, 2001; Niedenthal and Brauer, 2012) and provides the necessary "affective glue" between individuals (Feldman et al., 1991). Given the literature on the relationship between loneliness and depressive symptoms in the elderly (see Hawkley and Cacioppo, 2010, for review), the impaired recognition of facial emotions expressed by elderly people is especially problematic, because the relationship between loneliness and depression in the elderly is mediated by social support (Liu et al., 2014). Following this line of argument, a lack of signal clarity in the elderly face can result in emotional misunderstandings and hence dysfunctional behavior from the environment, such that the intended social support will not be perceived as supportive.

In sum, the present research provides further evidence for the notion that the emotional expressions of the elderly may easily be misunderstood. Specifically, because neutral expressions already seem "emotional," two problems may occur. On the one hand, elderly people may be perceived as expressing (negative) emotions when in fact they are not. On the other hand, their (negative) emotions may not be perceived as such when shown. This may be the case especially when they are surrounded by others. This has implications for everyday life, especially in contexts where many elderly people are present, such as in nursing homes.

### References


### Acknowledgment

Preparation of this manuscript was supported by grant 1R01AG035028-01A1 from the National Institute on Aging to RK, UH, and RA Jr.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01476


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Freudenberg, Adams, Kleck and Hess. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*