**TOWARDS AN EMBODIED SCIENCE OF INTERSUBJECTIVITY: WIDENING THE SCOPE OF SOCIAL UNDERSTANDING RESEARCH**

**Topic Editors Ezequiel Di Paolo and Hanne De Jaegher**

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-529-9 DOI 10.3389/978-2-88919-529-9

# About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **TOWARDS AN EMBODIED SCIENCE OF INTERSUBJECTIVITY: WIDENING THE SCOPE OF SOCIAL UNDERSTANDING RESEARCH**

Topic Editors:

**Ezequiel Di Paolo,** Ikerbasque and University of the Basque Country, Spain and University of Sussex, UK

**Hanne De Jaegher,** University of the Basque Country, Spain and University of Sussex, UK

A considerable amount of research effort in psychology and neuroscience over the past decades has focused on the problem of social cognition. This problem is understood as how we figure out other minds, relying only on indirect manifestations of other people's intentional states, which are assumed to be hidden, private and internal. Research on this question has mostly investigated how individual cognitive mechanisms achieve this task. A shift in the internalist assumptions regarding intentional states has expanded the research focus with hypotheses that explore the role of interactive phenomena and interpersonal histories and their implications for understanding individual cognitive processes.

*Image: After the fireworks, San Sebastián, 2010, © Ezequiel Di Paolo*

This interactive expansion of the conceptual and methodological toolkit for investigating social cognition, we now propose, can be followed by an expansion into wider and deeply-related research questions, beyond (but including) that of social cognition narrowly construed.

Our social lives are populated by different kinds of cognitive and affective phenomena that are related to but not exhausted by the question of how we figure out other minds. These phenomena include acting and perceiving together, verbal and non-verbal engagement, experiences of (dis-)connection, management of relations in a group, joint meaning-making, intimacy, trust, conflict, negotiation, asymmetric relations, material mediation of social interaction, collective action, contextual engagement with socio-cultural norms, structures and roles, etc. These phenomena are often characterized by strong participation by the cognitive agent, in contrast with the spectatorial stance typical of social cognition research. We use the broader notion of embodied intersubjectivity to refer to this wider set of phenomena.

This Research Topic aims to investigate relations between these different issues, to help lay strong foundations for a science of intersubjectivity – the social mind writ large.

To contribute to this goal, we encouraged contributions in psychology, neuroscience, psychopathology, philosophy, and cognitive science that address this wider scope of intersubjectivity by extending the range of explanatory factors from purely individual to interactive, from observational to participatory.

**Citation:** Di Paolo, E., De Jaegher, H., eds. (2015). Towards an Embodied Science of Intersubjectivity: Widening the Scope of Social Understanding Research. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-529-9

# Table of Contents



Nicole Rossmanith, Alan Costall, Andreas F. Reichelt, Beatriz López and Vasudevi Reddy

*Embodied intersubjective engagement in mother–infant tactile communication: a cross-cultural study of Japanese and Scottish mother–infant behaviors during infant pick-up*

Koichi Negayama, Jonathan T. Delafield-Butt, Keiko Momose, Konomi Ishijima, Noriko Kawahara, Erin J. Lux, Andrew Murphy and Konstantinos Kaliarntas

*Playful expressions of one-year-old chimpanzee infants in social and solitary play contexts*

Kirsty M. Ross, Kim A. Bard and Tetsuro Matsuzawa

*Putting the "joy" in joint attention: affective-gestural synchrony by parents who point for their babies*

David A. Leavens, Jo Sansone, Anna Burfield, Sian Lightfoot, Stefanie O'Hara and Brenda K. Todd

*Proximity and gaze influences facial temperature: a thermal infrared imaging study*

Stephanos Ioannou, Paul Morris, Hayley Mercer, Marc Baker, Vittorio Gallese and Vasudevi Reddy


Tom Froese, Hiroyuki Iizuka and Takashi Ikegami

*Quantifying long-range correlations and 1/f patterns in a minimal experiment of social interaction*

Manuel G. Bedia, Miguel Aguilera, Tomás Gómez, David G. Larrode and Francisco Seron

*Assessing embodied interpersonal emotion regulation in somatic symptom disorders: a case study*

Zeynep Okur Güney, Heribert Sattel, Daniela Cardone and Arcangelo Merla


*Emotion in languaging: languaging as affective, adaptive, and flexible behavior in social interaction*

Thomas W. Jensen


John Z. Elias and Kristian Tylén


# Toward an embodied science of intersubjectivity: widening the scope of social understanding research

#### *Ezequiel A. Di Paolo\* and Hanne De Jaegher*

*Logic and Philosophy of Science, IAS-Research Centre, University of the Basque Country, Donostia/San Sebastián, Spain \*Correspondence: ezequiel@sussex.ac.uk*

## *Edited and reviewed by:*

*Eddy J. Davelaar, Birkbeck College, UK*

**Keywords: intersubjectivity, social interaction, embodiment, enaction, methodology**

The study of human social phenomena in their proper scope demands the integrated effort of many disciplinary traditions. This fact is widely acknowledged but rarely acted upon. It is in practice often difficult to cross disciplinary boundaries, to communicate across different vocabularies, research goals, theories and methods. The aim of this Research Topic has been to make some progress in stepping across these borders.

Not attempting this crossing in a subject as multi-faceted as intersubjectivity inevitably binds us to remain within self-enclosed conceptions. By this we mean a bundle of self-reinforcing perspectives, hypotheses, experimental methods, debates, communities and institutions. Traditional ways of thinking about social cognition frame the questions that are deemed worth researching. These all revolve around the issue of how we figure out other minds, assuming that other people's intentional states are hidden, private and internal. The proposed answers rely only on how the perceived indirect manifestations of other people's mental states are processed by individual cognitive mechanisms (Van Overwalle, 2009).

We would like to raise, instead, the question of what an embodied science of intersubjectivity would look like if we were to start from different premises than those that delimit classical approaches to social cognition. To do this, we thought the time was ripe for bringing together work that crosses disciplinary boundaries and informs us about different conceptions of how people understand each other and act and make meaning together.

The move is timely. The internalist assumptions in social cognition research are beginning to shift. We have more and better tools to explore the role of interactive phenomena and interpersonal histories in conjunction with individual processes (Dumas et al., 2010; Di Paolo and De Jaegher, 2012; Konvalinka and Roepstorff, 2012; Schilbach et al., 2013). This interactive expansion of the conceptual and methodological toolkit for investigating social cognition, we now propose, can be followed by an expansion into wider and deeply-related research questions, beyond (but including) that of social cognition narrowly construed.

Our social lives are populated by different kinds of cognitive and affective phenomena apart from figuring out other minds. They include acting and perceiving together, verbal and nonverbal engagement, experiences of (dis-)connection, relations in a group, joint meaning-making, intimacy, trust, secrecy, conflict, negotiation, asymmetric relations, material mediation of social interaction, collective action, contextual engagement with sociocultural norms, etc. These phenomena are often characterized by a strong participation by the cognitive agent, in contrast with the spectatorial stance of social cognition (Reddy and Morris, 2004; De Jaegher and Di Paolo, 2007). We use the broader notion of *embodied intersubjectivity* to refer to this wider set of questions.

Forty-two contributions to this Research Topic explore several of these themes. They combine ideas and methods from psychology, neuroscience, philosophy of mind, phenomenology, psychiatry and psychotherapy, social science, and language studies. The number of contributions confirms our suspicion that there is a genuine interest in embodied intersubjectivity.

All of the contributions in some way or other move beyond traditional cognitivist perspectives. Here we can simply highlight some of the most interesting ways in which this happens. As already mentioned, there is a recent trend to investigate the dynamics of actual interactive encounters between people. Several empirical studies in this Research Topic continue further along this line. They look at interactive encounters using methods such as thermal imaging, interactive virtual environments, or 1/f noise analysis, or combine existing methods with novel theoretical starting points.

Other work looks at aspects of embodied social understanding which are pertinent even in the absence of ongoing interaction. These include the richness of body kinematics, affect regulation, and life-story analysis. A few contributions focus on how embodied and interactive perspectives impact on developmental research. They study real-life interactions between infants and their caregivers in various contexts (infant pick-up, book sharing, pointing, cooperation, and expressiveness during play in chimpanzees). Aspects of psychopathology are explored also from an embodied intersubjective angle, inspiring research on intra- and inter-personal emotion regulation, social affordances, personal biography, and therapeutic play, and their effects on somatic symptom disorders, autism, and schizophrenia.

Broadening the scope of relevant questions for embodied intersubjectivity inevitably means including research on language. Many of the contributions make headway on this matter, questioning the notion of the common ground, the role of conformity in social understanding, the processes involved in the activity of reading texts, and the links between conversational coordination and meaning-making. Others investigate the participatory nature of understanding narratives, and the role of organizational, temporal, and inter-affective aspects in language. Similar advances can be made in the area of connecting the cognitive and the social sciences. This is a very fruitful but still largely unexplored territory. A discussion is offered along Marxist lines concerning the interaction between categories of understanding and modes of social exchange and production. And the lessons of embodied/enactive approaches to intersubjectivity are summoned to contribute to understanding the phenomenological and social effects of solitary confinement.

Finally, some contributions elaborate theoretical and methodological implications and concepts, and in this way contribute to shaping the core of an embodied science of intersubjectivity. Methodological issues include whether dynamical systems concepts can bridge the multiple scales involved in social understanding, from the biological and neural to the personal, interactive and societal, how second person perspectives in cognitive science can help psychopathology research, and whether techniques used in theater can refine intuitions and theoretical concepts about interactive experience. Theoretical advances include radically embodied accounts of intersubjectivity that bring together conceptions from enactivism and ecological psychology, the notion of intersubjective time, and a socially embodied notion of the human self. Other discussions offer links between interpersonal interaction and phenomenal experience, between social normativity and conceptual abilities, or unearth the importance of opacity, i.e., the secret, silent or hidden aspects of personal experience, for understanding each other.

It is noteworthy, and especially satisfying, that many novel themes and questions emerged, several of them in some way related to personal meaning. To name a few: joy, secrecy, solitude, the influence of the capitalist mode of production on cognition, book sharing in infancy, the search for comprehensiveness and integrity in interacting, literature and enactivism, ethics of care, shame in relation to interaction, and the interactive building blocks of culture and institutions.

Taken together, the contributions to this Research Topic demonstrate the richness of enquiry and research opened up by combining novel methods and bringing together fields that traditionally work in isolation from each other. They also show that criticisms of classical approaches as sometimes too narrow are not idle: they point to genuinely new perspectives on concrete, everyday intersubjectivity that are now open to investigation.

## **ACKNOWLEDGMENTS**

This work is supported by the Marie-Curie Initial Training Network, "TESIS: Towards an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 February 2015; accepted: 15 February 2015; published online: 02 March 2015.*

*Citation: Di Paolo EA and De Jaegher H (2015) Toward an embodied science of intersubjectivity: widening the scope of social understanding research. Front. Psychol. 6:234. doi: 10.3389/fpsyg.2015.00234*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Di Paolo and De Jaegher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Tackling the social cognition paradox through multi-scale approaches

#### *Guillaume Dumas<sup>1</sup>\*, J. A. Scott Kelso<sup>1,2</sup> and Jacqueline Nadel<sup>3,4,5</sup>*

*<sup>1</sup> Human Brain and Behavior Laboratory, Center for Complex Systems and Brain Sciences, FAU, Boca Raton, FL, USA*

*<sup>2</sup> Intelligent System Research Centre, University of Ulster, Derry, Northern Ireland*


*\*Correspondence: dumas@ccs.fau.edu*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Viktor Müller, Max Planck Institute for Human Development, Germany*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

**Keywords: multiscale, social interaction, developmental psychology, two-body neuroscience, second person perspective, hyperscanning, coordination dynamics, complex systems**

Recent debates regarding the primacy of social interaction versus individual cognition appear to stem from the lack of an integrative account of the multiple scales at play. We suggest that reconciling individual autonomy and dyadic interactive viewpoints requires taking into account different time scales (e.g., development, learning) and levels of organization (e.g., genetic, neural, behavioral, social). We argue that this challenge requires the joint development of tools for two-body and second-person neuroscience, along with the theoretical concepts and methods of coordination dynamics and systems biology. Such a research program may be particularly fruitful in deciphering complex sociodevelopmental diseases that are known to involve alterations at multiple levels.

## **THE ONTOGENY OF SOCIAL COGNITION: A CHICKEN-EGG ISSUE?**

Despite a propensity to interact with others, our ability to socialize seems neither given nor fixed once and for all (Dumas, 2011). As Sheets-Johnstone (2011) has pointed out, "we come into the world moving; we are precisely not stillborn." The question of the ontogeny of social cognition (mirror neurons included) is grounded in our propensity to move. This primacy of movement can even be observed before birth: motor neurons appear well before their sensory counterparts in the embryo, and a large repertoire of spontaneous (thus self-organized) movements (e.g., making a fist, kicking, sucking) already exists (Kelso, 2002; Piontelli, 2010). Even twin fetuses demonstrate distinctive movements directed toward each other (Castiello et al., 2010). At this stage, the "social events" are essentially movements. Does this mean, however, that there is no element of "social cognition" in such encounters? We think not.

Behavioral coordination acts as a powerful linkage between persons, even early in life. Infants are sensitive to contingent movements of the mother (Nadel et al., 1999), and the first dyadic interactions already exhibit co-regulation, "a continuous mutual adjustment of actions and intentions" (Fogel and Garvey, 2007). The disposition of human and monkey newborns to imitate (Meltzoff and Moore, 1983; Kugiumutzakis, 1993; Nagy et al., 2005; Ferrari et al., 2006; Soussignan et al., 2011) is not due to a passive coupling of perception and action. Rather, it is an active attempt to adapt and gradually refine their own movements with respect to others. When imitated, human infants and newborn macaques display affiliative behavior toward the imitator (Paukner et al., 2009), as do low-functioning children with Autism Spectrum Disorder (ASD) (Nadel et al., 2000). The two facets of imitation, imitating and being imitated, constitute dual roles that can be traded, thereby allowing turn-taking (Nadel-Brulfert and Baudonnière, 1982). All that is needed is anticipation of the partner's next movement.

Here it seems we arrive at a crossroads: key ingredients of social cognition already appear to be present very early. Co-regulation of synchrony, anticipation of the other's intentions, and joint attention to a physical target are central facets of social interaction. Does this mean they all emerge from the developing Mirror Neuron System (MNS)? Even if the early capacity to couple perception and action is associated with a proto-MNS (Lepage and Théoret, 2007), we appear to be confronted with a circular logic problem: you need an MNS for social interaction, but you need to interact to form an MNS. Although there is limited evidence for mirror neurons in early development (Catmur, 2013), sensorimotor experience may indeed be key to creating mirror neuron responses through Hebbian learning (Keysers and Perrett, 2004; see also Allen and Williams, 2011). See also the epigenetic view of Ferrari et al. (2013).
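The Hebbian route to mirror responses mentioned above can be illustrated with a toy model. This is a hypothetical sketch, not the model of Keysers and Perrett (2004). Its ingredients are all assumptions: invented "motor" and "visual" units, arbitrary action patterns, and a plain Hebbian rule under which a weight grows in proportion to the product of pre- and post-synaptic activity. An agent sees its own movements while producing them, so motor and visual codes are co-active during training. Afterwards, visual input alone activates the matching motor unit, a mirror-like response that emerges from sensorimotor experience.

```python
from typing import List, Sequence, Tuple

Pattern = Sequence[float]


def train_hebbian(pairs: List[Tuple[Pattern, Pattern]], n_motor: int, n_visual: int,
                  rate: float = 0.1, epochs: int = 5) -> List[List[float]]:
    """Learn visual-to-motor weights with the plain Hebbian rule dw[m][v] = rate * motor[m] * visual[v]."""
    weights = [[0.0] * n_visual for _ in range(n_motor)]
    for _ in range(epochs):
        # During self-produced movement the agent also perceives that movement,
        # so each motor pattern is paired with its own visual consequence.
        for motor, visual in pairs:
            for m in range(n_motor):
                for v in range(n_visual):
                    weights[m][v] += rate * motor[m] * visual[v]
    return weights


def motor_response(weights: List[List[float]], visual: Pattern) -> List[float]:
    """Motor activation evoked by visual input alone (observation without movement)."""
    return [sum(w * x for w, x in zip(row, visual)) for row in weights]


def strongest_motor_unit(weights: List[List[float]], visual: Pattern) -> int:
    """Index of the most activated motor unit (a simple winner-take-all readout)."""
    response = motor_response(weights, visual)
    return response.index(max(response))


# Hypothetical actions: one-hot motor codes paired with overlapping visual codes.
MOTOR_LABELS = ["grasp", "reach", "kick"]
ACTIONS = {
    "grasp": ([1, 0, 0], [1, 0, 0, 1]),
    "reach": ([0, 1, 0], [0, 1, 1, 0]),
    "kick": ([0, 0, 1], [1, 1, 0, 0]),
}

if __name__ == "__main__":
    w = train_hebbian(list(ACTIONS.values()), n_motor=3, n_visual=4)
    for name, (_, visual) in ACTIONS.items():
        # Observing an action now drives the motor unit that produced it.
        print(name, "->", MOTOR_LABELS[strongest_motor_unit(w, visual)])
```

The visual codes deliberately overlap, so observation also weakly activates non-matching motor units. The winner-take-all readout is one simple assumption for deciding which motor code is "mirrored."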

The idea that the MNS underlies not only motor exchanges but also high-level social cognition is now challenged by the proposal of a complementary role for the "mentalizing network" (Keysers and Gazzola, 2007; Uddin et al., 2007; Sperduti et al., 2014). A main task is to decipher possible top-down and bottom-up processes in social cognition. Such an endeavor requires, at the very least, joint investigation of behavioral and neural dynamics during real social exchanges (Hari and Kujala, 2009; Schilbach, 2014).

## **THE RISE OF TWO-BODY AND SECOND-PERSON NEUROSCIENCE**

Although social neuroscience has gathered extensive data on how individual human beings perceive social stimuli, a truly interactive social neuroscience still lags behind. The community seems to have reached a consensus on the importance of investigating social situations that involve reciprocal exchange and mutual engagement (Hari and Kujala, 2009; Schilbach et al., 2013). Technological developments such as hyperscanning (Tognoli et al., 2007; Dumas et al., 2010; Babiloni and Astolfi, 2012; Hasson et al., 2012; Konvalinka and Roepstorff, 2012) and human-machine interfaces (Kelso et al., 2009; Pfeiffer et al., 2011) have greatly helped operationalize various aspects of real-time social interaction, thereby narrowing the gap between what we know about off-line and online social cognition (Schilbach, 2014). Online social cognition not only involves the same brain structures identified in research on isolated individuals (Sperduti et al., 2014); its brain dynamics also vary according to social context, e.g., spontaneous vs. instructed interaction (Dumas et al., 2012a; Guionnet et al., 2012; Sänger et al., 2012), and social role, e.g., leaders vs. followers (Dumas et al., 2012a; Sänger et al., 2013; Konvalinka et al., 2014).

A further challenge concerns the structure and timing of inter-individual coordination and its relationship with intra-individual processes. Functional magnetic resonance imaging (fMRI) hyperscanning first showed strong anatomical and functional similarities across different individuals responding to the same perception, especially if it is social (Hasson et al., 2004). This finding extends to interactive contexts, where inter-brain synchronization emerges in multiple frequency bands (Dumas et al., 2010; Müller et al., 2013). The related symmetrical and asymmetrical inter-brain patterns reflect how social interaction goes beyond a simple mirroring of the other and relies both on grasping other individuals' motor goals and on inferring their intentions (Nadel and Dumas, 2014). Moreover, unlike intra-brain dynamics, which primarily involve high-frequency rhythms, inter-brain dynamics appear to operate at lower frequencies (Müller et al., 2013). Thus, the temporal interplay between brain networks involved in social interaction, such as the so-called mirror and mentalizing systems, may be modulated by dynamics at the dyadic level, as in turn-taking (Wilson and Wilson, 2005). Finally, social cognition cannot be understood only on the basis of intra- or inter-personal dynamics, but rather in their common hyper-brain space, which includes both intra- and inter-brain coupling dynamics (e.g., Montague et al., 2002; De Vico Fallani et al., 2010; Sänger et al., 2012, 2013; Müller et al., 2013).
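A common way to quantify phase synchronization between two signals, such as two participants' EEG in one frequency band, is the phase-locking value (PLV). It is used here purely as an illustration, without implying that it is the measure used in each study cited above. Assume the instantaneous phases φ_a(t) and φ_b(t) have already been extracted from each band-filtered signal (e.g., with a Hilbert transform, not shown). Then PLV = |(1/N) Σ_t exp(i(φ_a(t) - φ_b(t)))|. It equals 1 for a constant phase lag and approaches 0 when the lag drifts uniformly. The oscillator phases below are synthetic stand-ins, not recorded data.

```python
import cmath
import math
from typing import List, Sequence


def phase_locking_value(phases_a: Sequence[float], phases_b: Sequence[float]) -> float:
    """PLV between two equally long series of instantaneous phases (radians)."""
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    # Average the unit phasors of the phase difference; their resultant length is the PLV.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


def oscillator_phases(freq_hz: float, fs_hz: float, n_samples: int,
                      offset: float = 0.0) -> List[float]:
    """Phase of an ideal oscillator: a hypothetical stand-in for one band-filtered channel."""
    return [2.0 * math.pi * freq_hz * n / fs_hz + offset for n in range(n_samples)]


if __name__ == "__main__":
    fs, n = 250.0, 500  # 2 s of synthetic data sampled at 250 Hz
    brain_a = oscillator_phases(10.0, fs, n)
    locked_b = oscillator_phases(10.0, fs, n, offset=math.pi / 4)  # constant lag
    drifting_b = oscillator_phases(10.5, fs, n)  # lag drifts through one full cycle
    print(round(phase_locking_value(brain_a, locked_b), 3))
    print(round(phase_locking_value(brain_a, drifting_b), 3))
```

PLV ignores amplitude and depends only on the consistency of the phase difference. That is why a constant lag of a quarter cycle still yields perfect locking.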

## **SOCIAL DYNAMICS AS A BRIDGE BETWEEN SCALES**

Cognition is constantly evolving during interactions with the environment and others. In order to sustain covariation, members of a social interaction must engage in active co-regulation (Fogel, 1993) and co-anticipation (Nadel and Dumas, 2014), potentially leading to the co-ownership of the action (Dumas et al., 2012a). Such genuine sharing of the interaction with others has been conceptualized as participatory sense-making (De Jaegher and Di Paolo, 2007), in which social interaction plays a constitutive role for individual cognition (De Jaegher, 2009; Froese et al., 2014). The chicken-egg paradox here vanishes, since both interactive and non-interactive mechanisms co-develop and mutually shape each other's development (Di Paolo and De Jaegher, 2012). Although still debated (Gallotti and Frith, 2013), this proposal is now supported by both modeling (Froese and Di Paolo, 2010; Froese et al., 2013) and experimental research (Auvray et al., 2009; Froese et al., 2014). In studies that have assessed the emergence of collective intelligence through dialog (Bahrami et al., 2010; Bang et al., 2014), interaction has been shown to constrain individual information processing (Fusaroli et al., 2014).

Social cognition thus relies on a braiding of neural, behavioral, and social processes (Hari and Kujala, 2009; Kelso et al., 2013). Neurobiological models of socio-cognitive functions have already been proposed (Gallese et al., 2004; Keysers and Perrett, 2004; Friston et al., 2011), though the dynamical components of human interaction are still largely missing (Adolphs, 2003). The theoretical and empirical framework of coordination dynamics has shown that neural, behavioral, and social scales may be studied and understood from a common perspective (Kelso, 1995; Kelso et al., 2009, 2013). As in other theories that aim to elaborate mathematical formalisms for cognition (e.g., Tononi, 2008; Friston, 2010), the objective of coordination dynamics is to identify general principles, the mechanistic realizations of which may be found in a variety of different systems at multiple levels of description. To be more than just words, coordination dynamics had to establish experimentally that criterial features of self-organization (e.g., order parameters, control parameters, stability, instability) actually existed in human behavior and that they could be mapped explicitly onto a theoretical model of the self-organizing dynamics. Then it had to show how information (e.g., about goals, intentions, the environment, etc.) shapes and is shaped by the self-organizing dynamics. Coordination dynamics relies on the same concepts and mathematical formalisms across different time scales and organizational levels and thus potentially offers inroads into a multi-scale account of social cognition.
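The vocabulary of order parameters, control parameters, and (in)stability can be made concrete with the classic Haken-Kelso-Bunz (HKB) equation of coordination dynamics, used here only as an illustration. The relative phase φ between two rhythmic components is the order parameter and evolves as dφ/dt = -a sin φ - 2b sin 2φ. The ratio b/a acts as a control parameter; in bimanual experiments it decreases as movement frequency increases. Linearizing the equation around its fixed points gives slope -a cos φ - 4b cos 2φ. In-phase coordination (φ = 0) is therefore always stable for positive a and b. Anti-phase coordination (φ = π) is stable only while b/a > 1/4; below that value the system switches to in-phase. The sketch below checks this numerically. The parameter values, step size, and simple Euler integrator are arbitrary assumptions.

```python
import math


def hkb_rate(phi: float, a: float, b: float) -> float:
    """HKB relative-phase dynamics: d(phi)/dt = -a*sin(phi) - 2b*sin(2*phi)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def is_stable(phi_star: float, a: float, b: float) -> bool:
    """Linear stability of a fixed point: stable when the rate's slope there is negative.

    The slope d(rate)/d(phi) is -a*cos(phi) - 4b*cos(2*phi).
    """
    slope = -a * math.cos(phi_star) - 4.0 * b * math.cos(2.0 * phi_star)
    return slope < 0.0


def simulate(phi0: float, a: float, b: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration; returns the final relative phase wrapped to (-pi, pi]."""
    phi = phi0
    for _ in range(steps):
        phi += dt * hkb_rate(phi, a, b)
    return math.atan2(math.sin(phi), math.cos(phi))


if __name__ == "__main__":
    start = math.pi - 0.1  # begin close to anti-phase coordination
    for b in (1.0, 0.1):   # with a = 1, b/a lies above and below the critical ratio 1/4
        print(f"b/a = {b}: anti-phase stable? {is_stable(math.pi, 1.0, b)}, "
              f"final phase = {simulate(start, 1.0, b):.3f} rad")
```

Lowering the control parameter does not change the equation's form, only the stability of its solutions. This loss of stability is the kind of qualitative transition the paragraph above refers to.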

In physics, multi-scale approaches have already uncovered universal principles, especially when matter undergoes phase transitions (Wilson, 1979). At the neural level, non-linear cross-scale interactions have been demonstrated experimentally (Le Van Quyen, 2011; see also Plenz and Niebur, 2014). In social neuroscience, nonlinearities are omnipresent in the underlying neural and social dynamics. Since functional networks display similar behavior across time scales (Kelso, 1995; Bressler and Tognoli, 2006), a parsimonious account may be possible. Beyond the quest for parsimony and semantic clarity, having a mathematical formalism enables one to ask computationally relevant questions. For example, in the case of social neuroscience, neurocomputational modeling shows that the anatomical structure of the human brain favors both the complexity of intra-individual dynamics and the coupling in inter-individual dynamics (Dumas et al., 2012b). Regarding the debate about the constitutive role of social interaction, future computational studies can quantify macro-to-micro causal effects ranging from dyadic to individual processes (Hoel et al., 2013).

## **CONCLUSION**

Social interaction challenges both the boundaries between the fields of cognitive science and the way observations are divided across distinct time scales and organizational levels. Social neuroscience is taking up this challenge at both theoretical and methodological levels. Here we have argued that three major dimensions are of potential significance: integrating a developmental perspective, investigating real-time social interaction with a two-body or second-person neuroscience, and adopting a multi-scale approach through complex systems' perspectives, in particular the concepts, methods and tools of coordination dynamics. These developments have already begun and should help further an understanding of disorders of social interaction such as autism.

As Abney et al. (2014) have remarked, in cognitive science "multiple theories should interact when describing the same phenomenon." In social cognition, the case of autism provides a test bed for an integrative approach. Developmental psychopathology has uncovered a wide range of behavioral peculiarities of persons with autism (Burack et al., 2002); cognitive neuroscience has identified many biomarkers at both structural and functional levels; and systems biology has begun to relate genetic variants associated with cellular and metabolic pathways to individual behavior (Randolph-Gips, 2011). The next logical step is to bridge the gap between multiple levels (and disciplines). Two-body or second-person approaches have already drawn some connections between neural and social dynamics in neurotypical populations, and provide potentially powerful tools for the investigation of autism. Hyperscanning techniques, for instance, can be used to uncover relationships between phenotypes at the behavioral level and endophenotypes at neural levels. Inter-individual computational models combined with hyperscanning data could help elucidate causal relationships between structure and dynamics. Differences in brain anatomy may impact the ability of persons with autism to couple with others early in life, thus decreasing their propensity to develop social skills (Dumas et al., 2012b). Computational neurogenetic approaches can help model the relationship between the genetics of autism and brain dynamics (Benuskova and Kasabov, 2008). Such integration of neurogenetics and systems biology may soon aid in tackling the heterogeneity observed in autism across genotype, neural endophenotype, and socio-behavioral phenotype levels.

## **ACKNOWLEDGMENT**

Guillaume Dumas and J. A. Scott Kelso are grateful for the support of NIMH Grant MH 080838.

# **REFERENCES**


*Lett.* 540, 21–27. doi: 10.1016/j.neulet.2012.10.001

com/srep/2014/140114/srep03672/full/srep03672.html

in duets. *Front. Hum. Neurosci.* 6:312. doi: 10.3389/fnhum.2012.00312


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 24 July 2014; published online: 12 August 2014.*

*Citation: Dumas G, Kelso JAS and Nadel J (2014) Tackling the social cognition paradox through multiscale approaches. Front. Psychol. 5:882. doi: 10.3389/fpsyg.2014.00882*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Dumas, Kelso and Nadel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The visible face of intention: why kinematics matters

# *Caterina Ansuini<sup>1</sup>, Andrea Cavallo<sup>2</sup>, Cesare Bertone<sup>2</sup> and Cristina Becchio<sup>1,2</sup>\**

<sup>1</sup> Department of Robotics, Brain and Cognitive Sciences, Italian Institute of Technology, Genova, Italy <sup>2</sup> Department of Psychology, Centre for Cognitive Science, University of Torino, Torino, Italy

#### *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

James Kilner, University College London, UK

Janny Christina Stapel, Radboud University Nijmegen, Netherlands

#### *\*Correspondence:*

Cristina Becchio, Department of Robotics, Brain and Cognitive Sciences, Italian Institute of Technology, Via Morego 30, 16163 Genova, Italy e-mail: cristina.becchio@unito.it

A key component of social understanding is the ability to read intentions from movements. But how do we discern intentions in others' actions? What kind of intention information is actually available in the features of others' movements? Based on the assumption that intentions are hidden away in the other person's mind, standard theories of social cognition have mainly focused on the contribution of higher level processes. Here, we delineate an alternative approach to the problem of intention-from-movement understanding. We argue that intentions become "visible" in the surface flow of agents' motions. Consequently, the ability to understand others' intentions cannot be divorced from the capability to detect essential kinematics. This hypothesis has far-reaching implications for how we know other minds and predict others' behavior.

**Keywords: kinematics, reach-to-grasp, intention, action observation, social interaction**

Room H3 in King's College, Cambridge, was crowded that night. It was 25 October 1946, and Karl Popper and Ludwig Wittgenstein were battling over the very trajectory of their discipline, when Wittgenstein picked up a fire-poker. Did Wittgenstein brandish the poker to threaten Popper, or did he merely pick it up absentmindedly to give emphasis to his own remarks? (Edmonds and Eidinow, 2001).

When we observe others acting, what matters are their goals and intentions. In the above "poker incident," what matters – especially from Popper's point of view – is Wittgenstein's intention in picking up the poker. But how do we discern intentions in others' actions? What kind of information about intentions is actually available in the features of others' movements? (Baldwin and Baird, 2001).

The ability to interpret and predict the behavior of other people hinges crucially on judgments about the intentionality of their actions – whether they act purposefully (with intent) or not – as well as on judgments about the specific intentions guiding their actions. Until recently, however, direct investigation of these skills has been surprisingly rare. One obstacle to such investigation has been the framing of the problem as a problem of access to mental states which are hidden away in the other person's mind and therefore inaccessible to perception. As Gallagher (2008) puts it, the supposition has been precisely that intentions are "not things that can be seen."

Recent findings challenge this view by suggesting that intentions are specified at a tangible and quantifiable level in the movement kinematics (Becchio et al., 2010). "How" an action is performed is not solely determined by biomechanical constraints; it also depends on the agent's intention, i.e., "why" the action is performed. This raises the intriguing possibility that intentions – regarded as *covert* mental state dispositions by standard theories of social understanding – may become "visible" in a person's *overt* motor behavior (Runeson and Frykholm, 1983).

In this *Perspective* article, we discuss this hypothesis in light of recent kinematic and psychophysical evidence. An apt characterization of the ability to understand others' intentions, we argue, cannot abstract from a systematic assessment of how intentions translate into movements. In line with this, the first section shows how kinematic techniques can be applied to investigate the influence of intention on grasping movements. Intention is here defined at the level of "why" an actor is performing a specific action with an object, i.e., the distal goal of the action (Grafton and de C Hamilton, 2007). Following the demonstration that intention influences action kinematics, the second section reviews evidence that observers are capable of picking up intention information from movement patterns. The third and final sections discuss the implications of these findings for future research on action understanding.

# **WHAT DOES KINEMATICS TELL US ABOUT INTENTIONS IN ACTION EXECUTION?**

Research on hand kinematics has proven insightful in revealing how specific kinematic landmarks are modulated by object properties, including object size, shape, texture, fragility, and weight. As recently reviewed, all these factors influence the kinematics of grasping (Castiello, 2005). The way an object is grasped, however, does not depend exclusively on the properties of the object; it is also influenced by the agent's intention. This was first demonstrated by Marteniuk et al. (1987), who asked participants to grasp a disk and either fit it carefully or throw it. The *deceleration time* was longer for fitting than for throwing (see **Table 1**). Since this seminal work, a plethora of studies have investigated how intentions influence the execution of reach-to-grasp movements (e.g., Ansuini et al., 2006, 2008; Armbrüster and Spijkers, 2006). The logic of these studies has been to "manipulate" the intention while keeping the object to be grasped (i.e., goal) as well as the situational requirements (i.e., context) constant (see **Figure 1**). If, within the same context, the same object is handled differently depending on the agent's intention, this would indicate that the intention influences the grasping kinematics.

#### **Table 1 | A brief overview of the main kinematic variables traditionally used to describe reach-to-grasp movements.**

The proximal component refers to the "reaching" and is described by variables obtained from the radial aspect of the wrist. The distal component refers to the "grasping" and is described by variables obtained from the thumb and index finger. With three or more markers (a configuration classically used for reach-to-grasp movements), the distances and angles at joints can be measured, as well as the accelerations and velocities of hand and limb segments. Note that to compare movements with different absolute durations, time variables can be normalized with respect to movement duration (e.g., expressed as a percentage of normalized movement duration).
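To make the variables summarized in Table 1 more concrete, here is a minimal Python sketch (using NumPy) that derives a few of them from three marker trajectories. It assumes positions in millimetres, a constant sampling rate, and trajectories already trimmed to movement onset and offset. The function name `reach_to_grasp_variables` and the synthetic test movement are hypothetical illustrations, not taken from the studies cited. Deceleration time is computed here as the interval from peak wrist velocity to the end of the movement, which is one common definition but not necessarily the exact one used by Marteniuk et al. (1987).

```python
import numpy as np


def reach_to_grasp_variables(wrist, thumb, index, fs):
    """Compute a few classic reach-to-grasp variables from marker trajectories.

    wrist, thumb, index: arrays of shape (n_frames, 3), positions in mm,
    assumed to be trimmed to movement onset..offset already.
    fs: sampling rate in Hz.
    """
    wrist, thumb, index = (np.asarray(a, float) for a in (wrist, thumb, index))
    t = np.arange(len(wrist)) / fs
    duration = t[-1] - t[0]

    # Proximal ("reaching") component: tangential speed of the wrist marker.
    wrist_speed = np.linalg.norm(np.gradient(wrist, 1.0 / fs, axis=0), axis=1)
    # Distal ("grasping") component: grip aperture, the thumb-index distance.
    aperture = np.linalg.norm(thumb - index, axis=1)

    i_peak = int(np.argmax(wrist_speed))
    i_mga = int(np.argmax(aperture))
    return {
        "movement_duration_s": duration,
        "peak_wrist_velocity_mm_s": float(wrist_speed[i_peak]),
        # Time variables normalized to movement duration (% of movement).
        "time_to_peak_velocity_pct": 100.0 * t[i_peak] / duration,
        # Deceleration time: from peak wrist velocity to the end of the movement.
        "deceleration_time_s": duration - t[i_peak],
        "max_grip_aperture_mm": float(aperture[i_mga]),
        "time_to_max_aperture_pct": 100.0 * t[i_mga] / duration,
    }


# Synthetic example: a 300 mm minimum-jerk reach lasting 1 s, sampled at 100 Hz,
# with a grip aperture that peaks (80 mm) at 70% of the movement.
fs = 100.0
t = np.arange(101) / fs
tau = t / t[-1]
wrist = np.zeros((len(t), 3))
wrist[:, 0] = 300.0 * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
aperture = 20.0 + 60.0 * np.exp(-((t - 0.7) / 0.15) ** 2)
offset = np.column_stack([np.zeros_like(t), aperture / 2, np.zeros_like(t)])
thumb = wrist + offset
index = wrist - offset

result = reach_to_grasp_variables(wrist, thumb, index, fs)
print(result)
```

Because a minimum-jerk profile is symmetric, peak velocity falls at 50% of the movement here; real reach-to-grasp movements typically differ, which is precisely what makes these variables informative.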

This hypothesis has been tested in two-digit grasp studies as well as in multi-digit grasp studies that investigated how the whole hand is shaped during the unfolding of the reach-to-grasp movement. Ansuini et al. (2008), for example, asked participants to reach toward and grasp a bottle to accomplish one of four possible actions: pouring, displacing, throwing, or passing. Analysis of digit kinematics revealed that when the bottle was grasped with the intent to pour, both the middle and the ring fingers were more extended than in all the other considered intentions. Similarly, choice of hand placement on the object has been shown to adapt to the upcoming intention. For example, participants place their thumb and index finger in a higher position when they grasp a bottle with the intention to pour than when they grasp it with the intention to lift (Crajé et al., 2011).
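Table 1 notes that joint angles can be measured once three or more markers are available. The source does not say how finger extension was quantified in Ansuini et al. (2008), so the following is only a hedged illustration: it computes the angle at a joint from three hypothetical markers (one proximal to the joint, one on it, one distal). Under the convention assumed here, 180° means a fully extended joint, so "more extended" fingers would show larger angles.

```python
import numpy as np


def joint_angle_deg(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments toward `proximal` and `distal`.

    Each argument is a position (shape (3,)) or a trajectory (shape (n_frames, 3)).
    With this convention 180 degrees means the three markers are collinear,
    i.e. the joint is fully extended; smaller values mean more flexion.
    """
    joint = np.asarray(joint, float)
    u = np.asarray(proximal, float) - joint
    v = np.asarray(distal, float) - joint
    cos_theta = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    )
    # Clip to guard against rounding errors pushing the cosine outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


# Hypothetical markers (mm) on a finger's proximal segment, joint, and distal segment.
straight = joint_angle_deg([0, 0, 0], [30, 0, 0], [55, 0, 0])
flexed = joint_angle_deg([0, 0, 0], [30, 0, 0], [30, -25, 0])
print(straight, flexed)
```

Passing `(n_frames, 3)` arrays returns one angle per frame, which makes it possible to compare finger postures across intention conditions at matched normalized time points.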

Further studies have extended these effects to the domain of social intention. For instance, it has been shown that participants' *maximal finger aperture* is smaller and *grip aperture velocity* is higher when an object is reached and grasped with the intent to move it compared to when it is grasped with the intent to pass it to another person (Becchio et al., 2008a; see also Sartori et al., 2009; Quesque et al., 2013). At a higher level of abstraction, Becchio et al. (2008b; see also Georgiou et al., 2007) showed that the kinematics of grasping movements differed depending on whether the object was grasped with the intent to cooperate with a partner, compete against an opponent, or perform an individual movement at slow or fast speed. Despite similar task requirements, *movement duration* was shorter and *wrist velocity* was higher for "competitive" than for "individual fast" movements. Conversely, *movement duration* was longer and *wrist velocity* was lower for "cooperative" than for "individual slow" movements.

## **WHAT DOES KINEMATICS TELL US ABOUT INTENTIONS IN ACTION OBSERVATION?**

The above findings suggest that intentions influence action planning so that, although the to-be-grasped object is the same, different kinematic features are selected depending on the overarching intention. That intention information is available in the kinematic pattern of human action, however, is not to say that it can be perceptually appreciated. Are observers sensitive to differences in movement kinematics? Can they use them to discriminate between movements performed with different intentions?

One approach for probing the contribution of visual kinematics is progressive temporal occlusion, in which multiple occlusion points are used to provide selective vision of different time periods or events within an observed action sequence (Farrow et al., 2005). This paradigm has been used with a number of different sports to demonstrate superior attunement to advance kinematic information in experts compared with nonexperts (e.g., Abernethy and Zawi, 2007; Abernethy et al., 2008). For example, it has been shown that in racquet sports such as badminton, expert players predict the depth of an opponent's stroke using advance pre-impact kinematic information to which less skilled players are not attuned (Abernethy and Zawi, 2007).

**FIGURE 1 | Techniques used to quantify the influence of intention on movement kinematics. (A)** Example of experimental set-up employed in action execution studies. The participant sits at a table with his hand resting in a starting position, which is kept constant across participants. The task is to reach and grasp the object (i.e., a bottle) either to lift it or to place it inside a box. An optoelectronic system (Vicon Motion Systems Ltd., UK) equipped with nine infra-red cameras is used to quantify reach-to-grasp movements. This system relies on passive markers (retro-reflective material on a plastic sphere) placed on points of interest over the participant's hand. Infra-red light is transmitted toward the workspace and reflected back off the markers to a series of "cameras" that record their positions. These positions are then expressed in a coordinate system with either two (2-D) or three (3-D) mutually orthogonal axes, each passing through the origin. **(B)** A computer-generated stick figure representing the position of the markers placed over arm and hand joints during a reach-to-grasp movement toward the bottle. After the raw data are collected, tracking procedures make it possible to identify and track the markers' trajectories almost in real time.
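The final step in the Figure 1 caption, identifying and tracking each marker's trajectory, can be illustrated with a deliberately simple nearest-neighbour labelling scheme. This is not the procedure used by the Vicon software, which the source does not describe. It is a minimal sketch that assumes every marker is visible in every frame and that frame-to-frame displacements are small relative to inter-marker distances; the function name `label_markers` and the example coordinates are hypothetical.

```python
import itertools

import numpy as np


def label_markers(frames, labels):
    """Give unlabelled marker positions persistent labels across frames.

    frames: sequence of arrays of shape (n_markers, 3); the row order of each
        frame is arbitrary, except in frames[0], whose rows match `labels`.
    Returns a dict mapping each label to an (n_frames, 3) trajectory.
    Assumes every marker is visible in every frame (no occlusions or gaps).
    """
    prev = np.asarray(frames[0], float)
    tracks = {label: [prev[i]] for i, label in enumerate(labels)}
    for frame in frames[1:]:
        cur = np.asarray(frame, float)
        # Brute-force search over all pairings (feasible only for a handful of
        # markers): keep the one with the smallest total displacement.
        best = min(
            itertools.permutations(range(len(labels))),
            key=lambda p: np.linalg.norm(cur[list(p)] - prev, axis=1).sum(),
        )
        cur = cur[list(best)]  # reorder rows to match the previous frame
        for i, label in enumerate(labels):
            tracks[label].append(cur[i])
        prev = cur
    return {label: np.array(points) for label, points in tracks.items()}


# Two frames; in the second one the markers are reported in a different order.
frame0 = [[0, 0, 0], [100, 10, 0], [100, -10, 0]]
frame1 = [[105, -12, 0], [5, 0, 0], [105, 12, 0]]
tracks = label_markers([frame0, frame1], ["wrist", "thumb", "index"])
print(tracks["thumb"])
```

The permutation search grows factorially with the number of markers; for larger marker sets an optimal assignment solver, plus handling of occluded markers, would be needed.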

Adapting the same logic to intention anticipation, Sartori et al. (2011) tested whether observers use pre-contact kinematic information to anticipate the intention in grasping an object. To this end, they first analyzed the kinematics of reach-to-grasp movements performed with different intents: to cooperate, to compete against an opponent, or to perform an individual action at slow or fast speed. Next, they selected videos representative of each type of intention and prepared experimental video clips. Each clip started before reach onset and ended at the time the fingers contacted the object, so that neither the second part of the movement nor the interacting partner, when present, was visible. Participants watched these videos and judged the intention in a yes/no detection task. The results revealed that observers were able to judge the agent's intent by simply observing the initial reach-to-grasp phase of the action (Sartori et al., 2011; but see also Naish et al., 2013).
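Clips that stop at object contact, or at earlier points in a progressive temporal occlusion design, need some rule for locating reach onset and contact in each recording. The source does not state which rule Sartori et al. (2011) used. The sketch below assumes a simple speed threshold (5% of peak wrist speed, a hypothetical value) and treats the end of the wrist movement as a stand-in for the moment of contact; `occlusion_cut_frames` is an illustrative name.

```python
import numpy as np


def occlusion_cut_frames(wrist_speed, fractions, threshold_ratio=0.05):
    """Frame indices at which to end clips in a temporal-occlusion design.

    wrist_speed: 1-D array with one wrist-speed value per video frame.
    fractions: occlusion points as fractions of the reach phase (1.0 = end).
    threshold_ratio: hypothetical onset/offset criterion, as a fraction of the
        peak speed; the end of the reach stands in for the moment of contact.
    """
    speed = np.asarray(wrist_speed, float)
    moving = np.flatnonzero(speed > threshold_ratio * speed.max())
    onset, contact = int(moving[0]), int(moving[-1])
    return [onset + int(round(f * (contact - onset))) for f in fractions]


# Synthetic bell-shaped speed profile with 10 still frames before and after.
speed = np.concatenate([np.zeros(10), np.sin(np.linspace(0, np.pi, 101)), np.zeros(10)])
cuts = occlusion_cut_frames(speed, [0.25, 0.5, 0.75, 1.0])
print(cuts)  # → [36, 60, 84, 108]
```

A clip for the contact condition would then show frames from some point before onset up to `cuts[-1]`; in practice, contact is often detected from the hand-object relation rather than from wrist speed alone.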

But what specific cues did participants use to make their anticipation judgments? To examine the spatial location of anticipatory information, in a second psychophysical study, Sartori et al. (2011) combined temporal and spatial occlusion procedures to mask selected spatial areas of the agent's movement. Masking the upper part of the agent's body (i.e., from shoulders to head) caused no significant decrement in prediction accuracy, suggesting that observers were able to pick up useful information from the arm kinematics (Sartori et al., 2011).

The spatial occlusion method helps to determine how much information is lost when a specific spatial region of the display is masked. However, because other areas of the display can potentially provide compensatory or alternative information, it does not indicate how much information is carried in isolation by specific kinematic features (Abernethy et al., 2008). To obtain an analytic determination of the key kinematic features that provide useful advance information about the agent's intention, in a subsequent study Manera et al. (2011) rendered reach-to-grasp movements as point-light displays. Though the displays were reduced to only three disconnected points of light corresponding to the position of the markers on the wrist, the index finger, and the thumb of the agent's hand, participants were nonetheless able to discriminate between social and individual intentions from the unfolding movement kinematics.
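Manera et al. (2011) reduced the displays to three points of light (wrist, index finger, thumb). The source does not describe how these stimuli were rendered. A minimal sketch might look like the following; it assumes 3-D marker trajectories in millimetres, an orthographic projection that simply drops one "depth" axis (which axis that is depends on the lab's coordinate system), and a hypothetical 640 × 480 pixel frame. The name `point_light_frames` is illustrative.

```python
import numpy as np


def point_light_frames(markers, keep=("wrist", "index", "thumb"), depth_axis=2,
                       width=640, height=480, margin=20):
    """Reduce motion-capture data to 2-D point-light coordinates.

    markers: dict mapping marker name -> array (n_frames, 3), positions in mm.
    Returns an array of shape (n_frames, len(keep), 2) in pixel coordinates.
    """
    pts = np.stack([np.asarray(markers[name], float) for name in keep], axis=1)
    # Orthographic projection: drop the (assumed) depth axis.
    plane = np.delete(pts, depth_axis, axis=2)
    # One uniform scale for the whole clip, so relative motion is preserved.
    flat = plane.reshape(-1, 2)
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    span = np.maximum(hi - lo, 1e-9)
    scale = min((width - 2 * margin) / span[0], (height - 2 * margin) / span[1])
    px = (plane - lo) * scale + margin
    # Image rows grow downward, so flip the vertical coordinate.
    px[..., 1] = height - px[..., 1]
    return px


markers = {
    "wrist": [[0, 0, 0], [100, 0, 0]],
    "index": [[10, 20, 5], [110, 20, 5]],
    "thumb": [[10, -20, 5], [110, -20, 5]],
    "shoulder": [[0, 0, 500], [0, 0, 500]],  # discarded: not in `keep`
}
stim = point_light_frames(markers)
print(stim.shape)
```

Using a single scale factor for the whole clip, rather than rescaling each frame, keeps the velocity profile of the points intact, which matters if observers are meant to judge intention from kinematics alone.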

# **UNDERSTANDING OTHERS' INTENTIONS: IMPLICATIONS AND FUTURE DIRECTIONS**

Considered together, the studies reviewed above indicate that observers are capable of picking up and using kinematic information to make judgments not only about movement patterns but also about intentions. In this section, we consider some of the theoretical and the methodological issues raised by these findings and speculate on the ways in which they may be addressed by future research.

## **HOW DOES KINEMATICS COMBINE WITH OTHER SOURCES OF INFORMATION?**

How does movement kinematics combine with other sources of information in revealing others' intentions? There are situations in which the intention of an observed actor can be unambiguously estimated from one source of information, e.g., the type of grasp, the presence of a target object. Most often, however, combining different sources of information may lead to more accurate predictions. This is indeed what Stapel et al. (2012) demonstrated by asking participants to anticipate how an observed action would unfold. Participants observed an actor walking. After a few steps, they had to indicate how the action would continue, i.e., whether the actor would take another step walking or start crawling. A first experiment showed that observers were more accurate when they could base their predictions on the combination of movement kinematics, situational constraints
(e.g., the presence of a table), and target object position (a ball). In a second experiment, the target object was artificially moved to another location so that movement kinematics was incongruent with the target object position. Results revealed that, in this ambiguous situation, participants relied on movement kinematics rather than on object location in making their predictions. This suggests that in the presence of conflicting information from different sources, movement kinematics may be prioritized to disambiguate the agent's intention. A challenge for future research will be to understand the temporal course of information integration from different sources. A recent transcranial magnetic stimulation (TMS) study by Cavallo et al. (2013) demonstrated that, at movement onset, motor-evoked potential responses reflected the most probable motor program estimated from the situational context (e.g., whole hand grasp). During movement observation, however, the initial motor program was substituted by a new plan matching the specific features of the observed movement (e.g., precision grip). Thus, an intriguing possibility is that the contribution of movement kinematics is related to the specific stage of the observed action processing: before the to-be-observed action starts, observers rely on contextual factors to predict the course of the action; as the movement unfolds, however, action prediction might prioritize kinematic information. If confirmed, this would have implications for the interpretation of the so-called chain model of action organization (Bonini et al., 2013): modulation of mirror neuron discharge by end-goal might reflect not only (and not so much) the presence of contextual cues allowing the monkey to predict the experimenter's intention (Fogassi et al., 2005), but also sensitivity to intention-related differences in the movement kinematics.

## **"SECOND-PERSON" VS. "THIRD-PERSON" INTENTION UNDERSTANDING**

The studies reviewed above used spatial and temporal occlusion procedures to quantify pick-up of advance information. The advantage of using psychophysical methods is the high degree of control and statistical power they ensure. However, it is not clear how far this type of paradigm accounts for real-time interactions in which two or more individuals are set in a common social context. Social cognition has been proposed to be substantially different when we actively interact with others ("second-person" social cognition) rather than merely observe them ("third-person" social cognition; Schilbach et al., 2013). For third-person social cognition, observing body movement is merely a way of gathering data about the other person. For second-person social cognition, the knowledge of the other resides – at least in part – in the interaction dynamics "between" the agents (De Jaegher et al., 2010); it is thus plausible that interaction dynamics affect pick-up and use of advance kinematic information.

An initial investigation of this topic was made by Streuber et al. (2011), who adapted the spatial occlusion procedure to a social interaction task. Participants played a table tennis game in a dark room with only the table, the net, and the ball visible. The game could be played in a cooperative fashion, i.e., to play the ball back and forth as often as possible, or in a competitive fashion, i.e., to win the trial. The visibility of the players' rackets and body movements was manipulated with the following logic: if a specific source of information is important for playing table tennis, then rendering this source of information visible should positively affect the players' performance. Results revealed that when the game was played cooperatively, seeing the other player's racket had the largest effect on performance. In contrast, when the game was played competitively, seeing the other player's body resulted in the largest increase in performance. This suggests that online cooperative and competitive dynamics selectively modulate the use of visual information about others' actions. A question to be addressed by future research is whether a similar modulation is observed in offline tasks, in which participants are required to merely observe cooperative and competitive actions. More generally, it would be interesting to directly compare second-person and third-person social understanding with respect to the pick-up and the use of advance information: is attunement to kinematic features modulated by self-involvement? Do second-person and third-person intention understanding rely on the same kinematic characteristics?

## **WHAT IS THE NATURE OF THE MECHANISMS WHICH ALLOW US TO READ INTENTIONS IN OTHERS' ACTIONS?**

Ever since their discovery, mirror neurons have been proposed to underlie our ability to understand actions "transforming visual information into knowledge" about others' goals and intentions (Gallese and Goldman, 1998). But how exactly is this transformation achieved?

Rizzolatti and Craighero (2004) suggested a rather simple mechanism: "Each time an individual sees an action done by another individual, neurons that represent that action are activated in the observer's premotor cortex." This motor representation of the observed action "corresponds to that which is spontaneously generated during active action and whose outcome is known to the acting individual." In this way, mirror neurons would transform visual information into knowledge about another person's intention.

This model has been criticized on the grounds that "the same visual kinematics can be caused by different goals and intentions" (Kilner et al., 2007). Simulating the observed kinematics – it has been claimed – might allow an observer to represent what the agent is doing. However, given the non-specificity of the observed kinematics, it would not allow the observer to represent the agent's intention (Jacob and Jeannerod, 2005).

The findings reviewed above provide strong evidence to the contrary. First, in contrast to the "non-specificity assumption," they demonstrate that intention information is specified in the visual kinematics. Second, they indicate that observers are sensitive to this information and can use it to discriminate between different intentions. Evidence that the mirror system supports this ability comes from recent fMRI studies (Vingerhoets et al., 2010; Becchio et al., 2012). For example, Becchio et al. (2012) report that mirror areas are sensitive to kinematic cues to social intention. Participants observed isolated reach-to-grasp movements performed with the intent to cooperate, compete, or perform an individual movement, followed by a static test picture. They were required to judge whether the test picture depicted a continuation of the observed movement or not. Despite the lack of contextual information, observing grasping movements performed with a social intent, relative to grasping movements performed with an individual intent, activated mirror areas, including the inferior frontal gyrus and the inferior parietal lobule. Interestingly, comparison of social vs. individual movements also revealed differential activations at the temporo-parietal junction and within the dorsal medial prefrontal cortex, two regions traditionally associated with explicitly thinking about the mental states of other individuals (i.e., "mentalizing"). These findings shed some light on the neural mechanisms underlying intention-from-movement understanding. They leave, however, a number of crucial issues unanswered.

A first issue pertains to how observed actions are mapped onto one's own motor system. The mirror system is generally assumed to associate observed actions with "corresponding" motor programs of the observer. What exactly, though, is meant by "corresponding"? When we observe other individuals act, the very fact that our body differs from theirs introduces a disparity between the observed and the executed kinematics (for data on this issue see for instance Gazzola et al., 2007). It is thus difficult to envision how, at a computational level, the executed kinematics might be "coupled" with the observed kinematics (but see Press et al., 2011).

A second question concerns the exact contribution provided by the mirror and the mentalizing systems (Van Overwalle and Baetens, 2009). While some theorists have argued that these two systems are mutually independent (e.g., Jacob and Jeannerod, 2005; Saxe, 2005), a substantial number of authors support the notion that the mirror system might inform the mentalizing system (e.g., Keysers and Gazzola, 2007; Uddin et al., 2007). According to this view, people would use their own motor system to encode the intentionality of an action based on its visual properties and form a pre-reflective representation of the other person's intention. This representation would then serve as input to attributional processing within the mentalizing system (Keysers and Gazzola, 2007; see also Spunt and Lieberman, 2012). In line with this, de Lange et al. (2008) report that mirror areas, including the inferior frontal gyrus, process the intentionality of an observed action on the basis of the visual properties of the action, irrespective of whether the subject paid attention to the intention or not. In contrast, brain areas that are part of the mentalizing network become active when subjects reflect on the intentionality of an observed action, but are largely insensitive to the visual properties of the observed action. Alternatively, mirror neurons might discharge during action observation not because they are driven by the visual input but because they are part of a generative model that is predicting the sensory input (Kilner, 2011). Within this framework, the generative model starts with a prior prediction of the intention of the observed action. This prediction would be estimated in areas outside the mirror system (including mentalizing areas) and then conveyed to mirror areas, influencing the selection of a specific action intention.
Techniques for characterizing effective connectivity between brain areas can provide answers in this debate because they can demonstrate the influence one system exerts over the other.

## **CONCLUSION**

The view that "motor" is separated from "mental" has long been dismissed, yet traces of it remain in the way the problem of intention understanding is currently addressed. Based on the assumption that intentions are hidden away and therefore not accessible to perception, standard theories of social cognition have mainly focused on the contribution of higher level, inferential processes to intention understanding. We argue that reframing the relationship between intention and movement provides radically new insights into the psychology and neurobiology of how we know other minds and predict others' behavior.

Did Wittgenstein pick up the poker to threaten Popper or to give emphasis to his thoughts? As Popper's account of the episode proves, the way in which Wittgenstein brandished the poker clearly betrayed his intention.

## **ACKNOWLEDGMENTS**

This work received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 312919. The authors thank Laura Taverna for her help in figure preparation and Marco Jacono for his support in description of kinematics measures and techniques.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2014; paper pending published: 03 June 2014; accepted: 09 July 2014; published online: 24 July 2014.*

*Citation: Ansuini C, Cavallo A, Bertone C and Becchio C (2014) The visible face of intention: why kinematics matters. Front. Psychol. 5:815. doi: 10.3389/fpsyg.2014.00815*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Ansuini, Cavallo, Bertone and Becchio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Enacting a social ecology: radically embodied intersubjectivity

# *Marek McGann\**

Department of Psychology, Mary Immaculate College, University of Limerick, Limerick, Ireland

## *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Harry Heft, Denison University, USA

Cor Baerveldt, University of Alberta, Canada

*\*Correspondence:* Marek McGann, Department of Psychology, Mary Immaculate College, University of Limerick, South Circular Road, Limerick, Ireland e-mail: marek.mcgann@mic.ul.ie

Embodied approaches to cognitive science frequently describe the mind as "world-involving," indicating complementary and interdependent relationships between an agent and its environment. The precise nature of the environment is frequently left ill-described, however, and provides a challenge for such approaches, particularly, it is noted here, for the enactive approach, which emphasizes this complementarity in quite radical terms. This paper argues that enactivists should work to find common cause with a dynamic form of ecological psychology, a theoretical perspective that provides the most explicit theory of the psychological environment currently extant. In doing so, the intersubjective, cultural nature of the ecology of human psychology is explored, and the challenges this poses for both enactivist and ecological approaches are outlined. The theory of behavior settings (Barker, 1968; Schoggen, 1989) is used to present a framework for resolving some of these challenges. Drawing these various strands together, an outline of a radical embodied account of intersubjectivity and social activity is presented.

**Keywords: enactivism, ecological psychology, affordances, behavior settings, culture**

## **IN SEARCH OF THE PSYCHOLOGICAL ENVIRONMENT**

Many of the various flavors of embodied cognitive science describe the mind as "world involving." Psychological activity is "situated" or "embedded", dependent on or highly sensitive to environmental conditions. Enactive cognitive scientists quote the philosopher Merleau-Ponty to provide perhaps the most dramatic example of such thinking:

*The world is inseparable from the subject, but from a subject which is nothing but a project of the world, and the subject is inseparable from the world, but from a world which the subject itself projects.* (Merleau-Ponty, 1962, p. 430)

Given such a view, understanding the mind requires an account of the psychological environment as detailed and comprehensive as our accounts of the cognitive system. I believe that enactivists have yet to provide such an account.

In order to address this issue, in this paper I will advocate for a closer alliance between enactive thinking and ecological psychology as it has developed from the work of James J. Gibson. In doing so I endorse a similar call by Chemero (2009), and explore some of the ways in which these two approaches can be brought closer together to the benefit of both.

Primarily, I will argue that drawing on the theoretical resources of ecological psychology offers significant benefits for an enactive cognitive science, though I will also note where I consider enactivism has something to offer ecological psychology. Further, following arguments that human psychology in particular is embedded not only in a physical but also in a social and cultural surround, I outline how a combined approach enables a comprehensive account of the human psychological environment.

In the following sections I will outline first the extant enactive thinking on the psychological environment and the core tenets of the related but distinct ecological perspective. I will then examine the revisions of traditional ecological thinking that Chemero (2009) uses to bring these two approaches into closer alignment and suggest some resolutions to remaining tensions. With this groundwork laid I turn to the question of sociality and the shared environment. Following the work of Heft (2007, 2011), I suggest that the concept of behavior settings advanced by Barker (1968) and others can be used to understand social activity and suggest this as an example of the kinds of theoretical resources that an ecological psychology can provide for enactive thinking. I argue that an understanding of behavior settings, encapsulated within a radical embodied framework, can form a sound basis for a science of embodied intersubjectivity.

# **WHAT IS AN ENVIRONMENT BROUGHT FORTH BY ENACTION?**

The enactive approach posits a fundamental complementarity between the agent and its environment. As the quotation from Merleau-Ponty makes clear, the two are seen as deeply interdependent. Enactivists describe agents and their environments as arising together, as emergent phenomena (Varela et al., 1991; Weber and Varela, 2002; Thompson, 2007). For enactivists, it all begins with an autonomous, organisationally closed system (see Varela, 1979). Such systems are made of a set of processes in which each process depends on at least one other process in the system and supports at least one other. Once such a system arises in the world, it operates so as to implicitly make a distinction between things (processes) that are part of that system and those that are not. The system, the most basic form of agency, whose only purpose can be seen as continuing to produce itself (Weber and Varela, 2002; Thompson and Stapleton, 2009), will be structurally coupled to the world around it. Richer, more complex systems have richer, more complex potential interactions (Di Paolo, 2005; Barandiaran et al., 2009). Some aspects of the world are relevant to the agent's concerns and body, and can affect it in various ways, whereas large portions of the world are effectively absent or non-existent for the agent. The *environment*, then, is the world standing in various relations to the agent, relations that hold because of the agent's values, needs, capabilities and embodiment. As a relational phenomenon the environment emerges with the agent; the two are a complementary pair and neither can be fully specified without reference to the other.
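The minimal closure condition stated above (each process depends on at least one other process in the system and supports at least one other) can be made concrete as a property of a directed graph. What follows is a toy sketch only, assuming that "supports" can be represented as a directed edge between named processes; the function name and process names are hypothetical, and the check captures only this necessary structural condition, not the full notion of autonomy developed by Varela (1979).

```python
# Toy check of the minimal closure condition described in the text:
# every process must support at least one *other* process in the set
# and be supported by at least one other process in the set.
# The function name and process names are hypothetical illustrations.


def meets_closure_condition(supports: dict[str, set[str]]) -> bool:
    """Return True if each process supports, and is supported by, another member."""
    processes = set(supports)
    # Build the reverse relation: which members support each process?
    supported_by: dict[str, set[str]] = {p: set() for p in processes}
    for source, targets in supports.items():
        for target in targets:
            if target in processes and target != source:
                supported_by[target].add(source)
    for p in processes:
        supports_another = bool((supports[p] & processes) - {p})
        is_supported = bool(supported_by[p])
        if not (supports_another and is_supported):
            return False
    return True


# A hypothetical closed network: each process enables and depends on others.
cell = {
    "membrane": {"metabolism"},
    "metabolism": {"membrane", "repair"},
    "repair": {"membrane"},
}

# An open chain: "b" supports nothing in the set, and "a" is unsupported.
chain = {"a": {"b"}, "b": set()}

print(meets_closure_condition(cell))   # True
print(meets_closure_condition(chain))  # False
```

The check is purely structural. An enactivist would add that such a network must also actually sustain itself over time, which a static graph cannot show.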

Thinking in such terms means that encounters between an agent and its environment are normally achievements of the agent rather than impositions upon it. The world does not stimulate a passive agent; rather, the agent engages with its surround, and interaction is sought. Psychology is, by these lights, not a process of stimulus and response. There is no starting point for an organism's actions (a trigger stimulus to a patient organism) because the organism is already alive, already acting, already concerned. Simply being alive means that an agent is coordinating its own activity with that of its environment. Enactivists term this process *sense-making*. An event, process, or object in the world only exists for the agent insofar as it affects and can be brought into coordination with the agent's own ongoing activity – it is the world made sense of by the organism. A classic illustration of this kind of coordination often used in the enactive literature is that of a bacterium climbing a sucrose gradient (Varela, 1991).

The *Escherichia coli* bacterium has two modes of locomotion: one characterized by random tumbling, the other by coherent movement in a given direction. The chemical sucrose can interact with the bacterium's cell membrane and can be metabolized by the cell. As such, an *E. coli* can encounter sucrose, and what is more, tends to encounter it as food. When a tumbling bacterium encounters sucrose it tends to switch to a more coherent movement that brings it toward areas of higher concentration of food. This illustration outlines the mutual character of the agent and its environment – the sucrose can only be present for the organism because the organism's embodiment enables it. The agent simply cannot engage with many other aspects of the world (e.g., tectonic movements, most variations in the electromagnetic spectrum, most variations in atmospheric pressure). The example also makes the point that engagements between an agent and the environment involve the coordination of the agent's needs or values (in this case the need of continued material self-production to which sucrose can contribute, serving the value of continued existence) with the resources, opportunities, threats, and demands of an environment that matters to it.
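The switch between tumbling and coherent movement can be illustrated with a toy one-dimensional simulation. This is a sketch under stated assumptions, not a model of real bacterial physiology: the linear concentration gradient, the two tumble probabilities (0.1 while concentration is rising, 0.5 otherwise), and all names are hypothetical choices made for illustration. What it shows is that a purely local rule, keyed to whether conditions are improving for the agent, is enough to carry it up the gradient without any model of the environment.

```python
import random

# Hypothetical parameters for a toy run-and-tumble walker.
TUMBLE_IF_IMPROVING = 0.1  # low tumble probability while concentration rises
TUMBLE_OTHERWISE = 0.5     # higher tumble probability when it does not


def concentration(x: float) -> float:
    """Hypothetical linear sucrose gradient: more sucrose at larger x."""
    return x


def run_and_tumble(steps: int, rng: random.Random) -> float:
    """Simulate one walker on a line and return its final position."""
    x = 0.0
    heading = rng.choice([-1, 1])
    previous = concentration(x)
    for _ in range(steps):
        x += heading  # "run": one unit step in the current heading
        current = concentration(x)
        p_tumble = TUMBLE_IF_IMPROVING if current > previous else TUMBLE_OTHERWISE
        if rng.random() < p_tumble:
            heading = rng.choice([-1, 1])  # "tumble": pick a random new heading
        previous = current
    return x


rng = random.Random(0)
finals = [run_and_tumble(200, rng) for _ in range(200)]
mean_final = sum(finals) / len(finals)
print(f"mean final position: {mean_final:.1f}")
```

Setting both tumble probabilities to the same value removes any systematic drift, which makes plain that the movement up the gradient comes from coupling the walker's own activity to changes in what matters to it.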

The enactive description of psychology fits very closely with the notions Dewey (1896) set forth in his classic paper "The reflex arc concept in Psychology." Dewey argues that a "response" is never "triggered" by a "stimulus" because the stimulus is always encountered in the process of the agent's ongoing behavior. Rather than consider stimuli and responses, we do better to consider tensions that arise in the organism's encounters and that are resolved by coordinations. Psychology is not a process that occurs in the space between stimulus and response but in the engagement between an agent and its environment. It is a relational phenomenon that must be addressed in relational terms that acknowledge both aspects of the tensions and coordinations in question.

Many of the illustrations of the world-involving nature of cognitive activity by enactive researchers deal in rather fundamental biological terms, such as the chemical processes in living cells (Varela, 1991) or minimalist computational robotics models that illustrate proofs of concept (Di Paolo, 2003; Di Paolo et al., 2010; Egbert and Barandiaran, 2014; Egbert and Cañamero, 2014). The characterisation of the relationship between the agent and the world in stark physical, chemical, or dynamical terms of bodily processes coupled to environmental ones makes some important points. The environment does not stand outside of the agent, imposing stimuli upon it in tit-for-tat exchanges of trigger and movement. It remains something of an open task for enactivists, however, to characterize the psychological environment in terms that fit both the enactive attitude – acknowledging the relational, co-determined nature of the environment and psychological activity – as well as experience and activities more personally familiar to us human beings.

# **THE ECOLOGICAL PERSPECTIVE**

Perhaps the most clearly and systematically developed account of the psychological environment available is that of the ecological psychology that traces back to the perceptual psychologist Gibson (1966, 1986). Much like the enactivists who would come later, Gibson described a complementarity between the organism and its environment. He notes that the organism's environment is not defined by the kinds of purely objective measures of physics, but rather in terms relative to the agent – ecological terms. When being introduced to someone you do not stand, say, 80 cm from them; you stand within arm's reach to shake their hand. The psychological environment, then, should be described relative to the psychological agent who is engaged with it.

At first blush it might seem that this way of thinking could lead us very quickly into an unwanted solipsism, with each organism living in its own distinct environment. Gibson (1986, p. 43) resolves this concern with a single clear and seemingly obvious point: perceivers move. While no two observers can occupy precisely the same point of view at the same time, the environment they share can be moved around and explored. The same perspective can be taken by different observers at different times. The environment remains to be explored by all of the observers that share it over the duration of its existence. An environment is shared inasmuch as two agents can perceive and act on it in a similar manner, something that will be the case for almost all animals of the same species and indeed many animals of different species.

Understanding the psychological environment as described by ecological psychology, then, involves understanding the relationship between an animal and its ecological niche – those aspects of the physical world that are relevant to the animal's needs and capabilities and within which the animal will spend its life. This relationship between need, capability and the world around the organism brings out perhaps the most famous of concepts that Gibson put forward – affordances.

*The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill...It implies the complementarity of the animal and the environment.*

(Gibson, 1986, p. 127)

For us humans, for instance, flat ground generally affords walking on, while a cup affords grasping. The surface of water affords walking on to a pond-skater but not to us. Affordances are opportunities, allowing an animal to fit its actions to the world around it, or obstacles, demanding effective actions to be overcome.

Within the ecological literature affordances are commonly seen as properties of the environment. While they might be animal-relative (such as the affordance of a pond surface for walking) they remain proper to the environment. Reed (1996a) takes quite a strong stance on this position, holding that affordances are properties of the world ready to be engaged with by any animal and that they can impose selection pressures on species over evolutionary timescales. A more standard mode of thinking on the issue sees affordances as dispositional properties, properties of the world that can be instantiated just in those instances where an animal with the appropriate capacity interacts with it. This perspective is particularly associated with Michael Turvey, Robert Shaw and William Mace (Turvey et al., 1981; see also Turvey, 1992).

From an ecological perspective perception is generally perception of affordances. We perceive our environment in terms of what it affords. Crucially, this perception is direct – it needs no representations, computations or other "mental gymnastics" (Chemero, 2009). Direct perception is to a large extent a matter of successful coordination of our behavior with some relevant variable in the environment. Rather than the creation of a perceptual image, the activation of some encoded memory or the production of a mental model, perception is the ability to engage with the environment.

Acting and perceiving take place in a medium. For us land-living types that medium is generally the air, which is transparent and diffuse so as to allow light, sound and solid objects to move through it readily. In the case of vision, light, which typically suffuses the entire domain in which we are behaving, will move (reflect, bounce around) in a reliable manner that is given structure by the shape and texture of objects in the vicinity. By moving our eyes we can use the structure of the light to coordinate our movements with the objects, surfaces, and other things in our environment. The world is perceived directly via these structures in the ambient array of energy (light or sound, for instance) and chemistry (in the case of smells) rather than interpreted through the construction of representations or models.

This structure in light (or sound or other energy and chemical arrays around us) Gibson referred to as "ecological information." Quite different from how information is commonly discussed in cognitive science, it is structure in ambient energy that is formed due to the structure of the environment.

A classic example of ecological information is how the dynamics of optical flow specify, and thereby allow us to perceive, time to impact as we move toward something (Lee, 1974). As we approach an object, elements of its visual texture tend to spread apart in our visual field. The rate at which this happens bears a direct relationship to how long we have until we hit the object in question (if it is in the middle of our visual field and stays there). This information is present not in the form of some encoding but in the relationship between movement and structure in the ambient light. It is a relatively simple affair (which is to say that it requires no "mental gymnastics" nor a cognitive system capable of same) for an animal to guide various movement-based behaviors according to this easily sampled relationship. No representation of the actual time to impact is required because that can be perceived directly via the optic-flow variable in question.
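The optic-flow variable in question is usually formalized as Lee's tau: for an object approached head-on at constant speed, the ratio of the visual angle θ it subtends to the rate of change of that angle approximates the time remaining before contact, τ ≈ θ / (dθ/dt). The sketch below checks this numerically under assumed values (an object 0.5 m across, a starting distance of 20 m, and a speed of 2 m/s, all hypothetical). The estimate itself uses only the angle and its rate of change; distance and speed are used only to generate the optical change and to verify the answer.

```python
import math

# Hypothetical scenario: head-on approach at constant speed.
OBJECT_SIZE = 0.5      # metres (assumed)
START_DISTANCE = 20.0  # metres (assumed)
SPEED = 2.0            # metres per second (assumed)


def visual_angle(t: float) -> float:
    """Angle (radians) subtended by the object at time t."""
    distance = START_DISTANCE - SPEED * t
    return 2.0 * math.atan(OBJECT_SIZE / (2.0 * distance))


def tau(t: float, dt: float = 1e-3) -> float:
    """Estimate time to contact from the optical variable alone: theta / (dtheta/dt)."""
    theta = visual_angle(t)
    # Central finite difference approximates the rate of optical expansion.
    theta_dot = (visual_angle(t + dt) - visual_angle(t - dt)) / (2.0 * dt)
    return theta / theta_dot


true_ttc = START_DISTANCE / SPEED  # known to the modeller, not to the perceiver
print(f"tau estimate: {tau(0.0):.3f} s, actual: {true_ttc:.3f} s")
```

The small residual difference arises because θ / (dθ/dt) equals distance over speed exactly only in the small-angle limit; for a distant object near the centre of the visual field the approximation is very close.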

Ecological information is not "taken in" or processed with some model (however sparse or rich) of the environment. Rather, an animal is able to attune to it, to use it as a means of coordinating its behavior with the environment. The psychological environment is the set of affordances that exists for the agent. Ecological information is the means by which an animal perceives those affordances. Perceiving occurs not as a passive reception of stimuli but as an active part of perception-action cycles, coordinations between the agent and its environment.

Gibson thus shares with enactivists (and, indeed, Dewey) the notion that in perception the agent is already acting. Actions are coordinations with the environment, not responses to it. While the enactive and the ecological clearly have much in common, however, there are a few considerations that stall any straightforward adoption by enactivists of an ecological account of the environment.

# **A DYNAMIC RECONCILIATION OF ENACTIVE AND ECOLOGICAL ACCOUNTS, CHEMERO'S "RADICAL EMBODIED COGNITIVE SCIENCE"**

In *The Embodied Mind*, Varela et al. (1991, pp. 203–204) explicitly oppose their enactive view to Gibson's ecological one. They take issue with a seemingly fixed conception of affordances put forward by Gibson, arguing that such an approach does not adequately acknowledge the dynamic interdependence between the agent and its environment. They quote Gibson (1972, p. 239) as understanding affordances, and the ecological information that specifies them, as "there to be discovered."

The ontological priority here of not just the world but the environment (the perspectival, relational description of the world) is a form of philosophical realism that runs counter to the emergentist views of enactivists. A more observer-relative description of affordances put forward by Turvey, Shaw, Reed and Mace (Turvey et al., 1981; see also Turvey, 1992) is somewhat less objectionable (Varela et al., 1991). However, the idea that the world for the agent is exhaustively specified at any given moment by ecological information, thus rendering much of the texture and detail of the agent unnecessary in a description of a given engagement, remains counter to an enactive stance. Similarly, a proper explanation of the perception of (visual) affordances will require more than just an account of optics, however ecologically specified (Varela et al., 1991).

Over several decades of ecological research, however, there has been a long-standing debate as to just how affordances should best be conceived and how their relationship with the agent should be understood (Heft, 1989; Turvey, 1992; Chemero, 2003, 2009; Jones, 2003; Michaels, 2003; Stoffregen, 2003; Withagen and Chemero, 2012; Withagen et al., 2012).

Recently, Chemero (2009), in refining our understanding of affordances, has explicitly sought to reconcile the ecological and enactive viewpoints under a banner of "radical embodied cognitive science." In order to do this, Chemero has argued a number of points.

Firstly, he redoubles the emphasis on dynamic interaction with the environment that is part and parcel of an ecological approach. Chemero notes that while ecological psychologists have adopted dynamical thinking and the methods of dynamical systems science in a deep and thorough-going manner over the past decades, the orthodox conception of affordances (that associated with Turvey et al., 1981) does not show quite the same dynamic sensibility.

Now, affordances have always been dynamical concepts. A flying ball might afford catching, but only while in flight. A stationary or slow-moving cup affords grasping, but not one moving too quickly. But many affordances are sufficiently stable that they are often discussed simply as properties of the object in question – the flat rigid surface of the ground affords walking on, for instance. However, even something so basic as the rigidity required for walking on need only remain long enough for me to perform the action in question. Non-Newtonian dilatant fluids such as a suspension of starch in water, for example, can afford walking even though their rigidity lasts only as long as the impact of a person's foot with the surface (Custard, 2014).

Affordances are dynamic things whose presence describes an opportunity for effective action, a possibility of coordination. In being such they say as much about the agent acting as they do about the environment with which the agent is engaged. Chemero (2009, p. 140) follows Michaels (2000), who argued that an affordance to punch a falling ball is perceived as "it's time to flex the elbow." By this view, affordances are not properties of the environment. They are, rather, relations that hold between an agent and their environment. In making this claim Chemero removes a significant point of disagreement between ecological and enactive thinking and asserts a relational description of the psychological environment. Chemero is still a realist about affordances, because affordances really do exist, but it is, as he puts it, "not a simple form of realism" (Chemero, 2009, p. 150). It is a realism that seems quite consistent with the emergentist commitments of enactivism.

Chemero also addresses considerations about how an organism might perceive affordances. The orthodox Turvey et al. (1981) view on the matter requires a strict one-to-one relationship between the ecological information (e.g., structure in the ambient array of light) and the affordance that it specifies. These must be lawfully related (even if the laws in question are specific to an ecological niche). Chemero (2009) uses the situation semantics of Barwise and Perry (1981) to dilute the lawfulness requirement. Like the philosopher Millikan (2000), he argues that the relationship need not be exception-free; it just needs to be sufficiently reliable to guide behavior effectively under normal circumstances and, we might imagine, within normally recoverable bounds of likely perturbation or failure. This move offers some flexibility in the relationship between the agent and their environment that undermines the kind of objectivist pre-specification of relationships that Varela et al. (1991) considered counter to the enactivist emphasis on the role of the specific embodied agent with its own history of coupling with the world.

This perspective, more sensitive to individual histories and dynamics, is also present in Chemero's view as he argues for another dynamic aspect to affordances – the gradual transformation of affordance relations over various timescales. Traditional thinking on affordances links them to the organism's ecological niche, noting that over evolutionary time aspects of the environment become available for use by members of a species. Chemero points out that this also happens at a personal level over developmental time. It involves, in a sense, the construction of an individual, personal eco-niche, as a person develops certain skills or abilities and learns to engage with their environment in different ways.

Chemero (2009) refers to this niche for the individual organism as the phenomenological-cognitive-behavioral niche of the particular animal. It is a concept intended to enable a more fine-grained analysis of the animal-environment system. Rather than examining the effect of populations of animals on their shared environment, the focus is on the peculiarities of a single agent's effect on the world around it. This will include the agent's continually increasing sensitivity to the specific, particular details of that world, which give rise to a unique perspective. Such phenomenological-cognitive-behavioral niches will certainly be largely shared between animals with similar capacities, but will differ insofar as the particular histories and capabilities of those animals differ.

This fitting of the agent with its environment over time is achieved through another relation that holds between the two, one that is complementary to the affordance relation. Chemero (2009) describes *abilities* as relations that mediate changes in the animal (Chemero emphasizes the nervous system, but there is no *a priori* reason to limit the scope to that) that enable the organism to become sensitive to affordances.

In his outlining of the notion of abilities and the timescale over which they change, Chemero focuses primarily on developmental time, the kinds of periods over which we learn new skills and gradually change what we are capable of. These changes tend to occur over much shorter timescales too, though. Central to enactive theorizing is the notion that the agent-environment system is valenced, normative. Enactivists, to a much greater degree than mainstream ecological psychologists, emphasize the importance of motivation and intentions. In addition to more stable species-typical capabilities and even individually tuned skills, the immediate field of action of an agent will depend on the flow of its needs and intentions at a given time.

#### **THE ENGAGEMENT: THE FIELD OF ACTION OF INTENTIONAL AGENTS**

In a brief paper that provides an overview of enactivist psychology, McGann et al. (2013) claim that enaction begins with an engagement – a particular encounter between an agent and their environment. For enactivists psychology is to be found in the entire animal-environment system, and ecological psychologists hold to the same idea. Both points of view find an agent already dealing with being alive, already interacting with its environment, rather than waiting in passivity and darkness for stimulation. The ecological perspective (and Chemero's revision of it still shares this characteristic) examines how the process of the interaction unfolds or develops over time, the dynamics of the sensorimotor processes.

Enactivists have a similar interest, but also explore the various ways in which the biological dynamics of the living agent motivate or drive (and constrain) those sensorimotor processes (McGann, 2007; Di Paolo et al., 2010; Barandiaran and Egbert, 2013; McGann et al., 2013). Though substantial work remains to be done on this issue, particularly in "scaling up" to the complexities of human psychology, enactive theory makes salient considerations of intentionality (that is the formation and dynamics of intentions to act) about which ecological theorists have had comparatively little to say.

A notable exception on this front is the intentional theory of affordances advanced by Heft (1989). In an account that I believe is largely consistent with Chemero's, Heft argues that in order to understand affordances we must describe them relative not only to the body of the agent, but to that body in the process of intentional activity. This provides a much more dynamic and relational conception of affordances than the ecological psychology orthodoxy. As motivations wax and wane the relevance of different abilities varies and the engagement between agent and environment varies accordingly. Heft (1989) notes that intentions must themselves be considered in world-involving, relational terms – these are not mentalistic representations after all – and to leave them out of the description of the agent's relations with its surround is a mistake.

Chemero's description of abilities has no prominent role for these intentional, motivational aspects of the agent's activity, but no description of an engagement, an animal-environment system, can be complete without them. Along with the driven, valued, normative character of the engagement, they also highlight the short-timescale dynamics of abilities and affordances, which will arise and dissolve as relations as the agent finds its values challenged or facilitated, in conflict or coordination, in interaction with its environment. We might describe the general ecological niche of a given species, and even a particular animal's phenomenological-cognitive-behavioral niche, but animals don't interact with generalities. These increasingly specific descriptions of an environment provide progressively higher-resolution explanations for an animal's behavior. Understanding the finer-grained details of an organism's activity on a given occasion will need to include the kind of fast-moving intentional dynamics that are involved in the engagement in question.

The engagement, the field of action of an agent, is defined by a complement of ability/affordance relations, with the proviso that these relations have a normative, intentional aspect. These relations have value. Sense-making was described above as the process of an organism being sensitive to and integrating the world into its own activity (at the very base, the activity of continually producing itself, staying alive). Insofar as something in the world plays a role (is an opportunity or threat of some kind) in the agent's normative activity, the agent can make sense of it through the coordination of its behavior with the event, process, or object in question. Motivations and intentions are how we describe these normative aspects of an agent-environment system, and so sense-making is effectively a process of the coordination of an agent's values and intentions with its environment.

Of course things get a little more tricky when there's more than one agent in that environment.

## **SHARED FIELDS OF ACTION**

Where more than one agent is involved in a situation then the engagement is not just the coordination of one set of values or intentions with the environment, but a set of complex interactions between the various agents and their shared environment. Where the meaning or sense-making in the individual case is in the congruence between abilities and affordances that hold between agent and environment, in the social case there will be a set of relations that are negotiated between the agents. Whether another agent is an obstacle or resource, impediment, or aid to a given agent's intentions is often malleable, due to the adaptive responsiveness of both agents to each other.

The variability of agentive action is in theory a significant challenge for an ecological approach to understanding the environment. The range and variability of animals' behavior could be thought to undermine the reliable relationship between the structure of ambient energy at any time and the animal's activity. During any given period it is conceivable that the same person might engage in any one of numerous possible behaviors, some of which will share postures, gestures, or other physical attributes that give rise to structure in, for example, ambient light. Social interaction seems to our intuitions to be so pregnant with possibility that effective interpersonal engagement cannot be accounted for by the kind of direct, ecological mode of description I have been advocating here. Even allowing for Chemero's somewhat less stringent relationship between environmental structure and perceived event, there seems to be a want of reliability when dealing with other people, given just how diverse a single individual's repertoire of behavior can be.

Of course this is a straw man of variability, because human behavior is rarely if ever that arbitrary or unpredictable. The question arises, though, as to what provides the stability that human activity tends to have, and how it channelises behavior such that the logically conceivable problem rarely arises in practice.

Heft (2007) has argued that a completely realized ecological psychology will in fact be social to its core. He claims that social activity is a fundamental part of the fabric of human psychology and must be a fundamental part of a complete ecological psychology. Drawing on paleoanthropology he notes that sociality is not just part of our evolutionary heritage, but part of our evolutionary history. *Homo sapiens* evolved in culture, not the other way around (Heft, 2007; see also Donald, 1991, 2001a,b and Tomasello, 1999a for related arguments). The mutual influence between animal and environment over time is a central tenet of ecological psychology – the organism's ecological niche makes demands of and shapes the behavior of the organism, and in turn the organism over time affects the niche. Throughout the process of development, then, our behavior forms within and is shaped by our culture. Two facets of this process can be quickly identified.

The first facet is the process of behavior shaping that Merlin Donald has termed "deep enculturation" (Donald, 2001a). The idea is that during development a complex of standard ways of doing things is formed through which more intricate coordinations with our native culture are enabled. The ecological psychologist Reed (1993) put forward a distinct but related idea in what he terms the "field of promoted action." Societies tend to evoke some behaviors more than others and in doing so shape the habits and capabilities of their members over the course of development. This of course has the effect of stabilizing behaviors, constraining the innumerable (or at least very numerous) possible activities in which a person might engage within some reliable range.

One of the principal means by which the field of promoted action is produced is the careful design and structuring of the physical environment (Reed, 1996b). This cultivation and curation of the environment in which we behave is the second facet of development that makes social interaction more reliable.

Gibson (1986) discusses the notion of *places*. Places are areas of the environment with a set of functional properties – they enable affordances for various specific activities. Over evolutionary, historical, and developmental time the physical environment has been nurtured to given ends, and distinctions between places sharpened. Much of our social activity, our shared and inter-coordinated behavior, is conducted in physical environments that support it. Examining this interdependence of the social and physical in some depth, Heft (2001, 2007, 2011) has shone a particular spotlight on the theory of behavior settings developed by Barker (1968) and Schoggen (1989). Although that theory was developed independently of Gibson's work, Heft has argued that it is effectively a theory of Gibson's "places."

## **BEHAVIOR SETTINGS AS A THEORY OF PLACES**

A behavior setting involves a cohesive set of standing patterns of behavior and those patterns' physical surroundings (Barker, 1968). Though behavior settings are easily overlooked and underestimated because of their near omnipresence in our lives, we can nevertheless recognize examples of setting kinds immediately – a soccer game, a mathematics lesson, a religious service, a conference talk. They involve a set of physical resources, which often provide a spatial boundary to the setting (e.g., the walls of a classroom or church) as well as structuring the behavior of those within (perhaps by so blunt a means as a rigid arrangement of furniture). They also tend to have quite clear temporal boundaries. Specific instances of a behavior setting will form, evolve and dissolve at given times, often explicitly stated (e.g., a Wednesday, 10.30–11.10 mathematics lesson in classroom B6). Probably a majority of our lives is spent in different behavior settings (Heft, 2007).

In Barker's (1968) original work examining the natural flow of behavior of residents in a small town, he and his field team found that behavior tended to differ more for the same person across settings than between different people within the same setting. They also found that settings were at least as powerful as identified antecedent stimuli in predicting the behavior of a person in their natural environment.

The theory of behavior settings is a rich and detailed one, whose apparent power unfortunately seems matched by its obscurity (Scott, 2005). For our present purposes it serves as a means for illustrating how cultural practices are enmeshed with physical surroundings and how the stability of physical environments is used to help stabilize social interactions.

With behavior settings in mind we can conceptualize deep enculturation as a process of learning how to engage with and make use of resources in our environment that are shaped and made available by a history of cultural practice. Enculturation is the cultivation of abilities to use socially provided and promoted resources, opportunities for shared and sanctioned actions.

Heft (2001) argues that the physical settings (Barker uses the unfortunate term "synomorphs") which are complementary or similar in structure to the behaviors they support (they are "synomorphic" to the behavior) can be considered affordances for joint action. Many of the places in which we spend our lives are selected and designed to support the coordination of multiple people in some activity. Moreover, the character of the physical environment and the inertia of encultured habits can lead settings to coerce the behavior of their inhabitants. Heft (2001) puts it as follows:

*The relation between milieu and behavior is not contingent. It is not the case that because this room worked well as a classroom on previous occasions that it can be used for that purpose again. Rather it worked well on previous occasions (or not) because of its structure or form.*

*Because the meaning of the setting resides in the congruence between behavior and milieu, this relational structure has the potential to bring actions of individuals entering the setting in line with its functional character.*

(Heft, 2001, p. 288)

Enculturation, through the promotion of certain patterns of behavior, substantially reduces the kind of variability in behavior that might be conceived as challenging a radical embodied (enactive, ecological) account of social interaction. Our subjectivity is at any time constrained by our shared environment, shared histories and shared abilities or habits.

This capacity for cultural background and social activity to constrain and shape our behavior brings into focus a final complication, a quirk of social dynamics that has seen some significant discussion over the past few years within enactivist thinking: the autonomy of the social.

## **SITUATED PARTICIPATION: BEING DEEPLY ENGAGED WITH OTHERS**

De Jaegher and Di Paolo (2007) noted that there are occasions when a social interaction can be more than the sum of its parts – situations in which the interaction takes on something of a life of its own. These situations, in which the participants together find themselves coordinating with each other perhaps despite their individual intentions, or coordinating with their environment in a manner not possible for either individually, are examples of "participatory sense-making."

An important aspect of participatory sense-making is that the social dynamic is emergent. The social interaction is not merely a combination or aggregate of the behavior of its participants but is autonomous: it has a dynamic of its own that can constrain the behavior of the interactants just as much as facilitate it. The autonomous organization of the social dynamic gives it an inertia, making the interaction resistant to perturbation, perhaps even by the individuals enacting it. Whenever we have found ourselves in a conversation we couldn't get out of (even though both participants wanted it to end), or felt an interaction drawn along an unwanted trajectory despite the efforts of both parties to prevent it, we have been experiencing the autonomy of that interaction. An example used by De Jaegher and Di Paolo (2007) is that of two people trying to pass one another in a narrow corridor and being briefly unable to do so because of the way their behavior becomes coordinated – a brief back-and-forth "dance."

For our present purposes, what we take away from the idea of participatory sense-making is an admonition that engagement with a social situation is constrained not only by the ability/affordance relations of the participants but also by the inherent dynamics of the interaction itself. This over-riding dynamic, whether due to our culturally inherited resources, the inertia of habitual practice or our tendency to synchronize the rhythms of our actions with the environment (and the behaviors of others), can impose tensions and create perturbations in an agent's activities as much as it might enable or facilitate them (De Jaegher and Froese, 2009). In situations of participatory sense-making we will need to describe the shared engagement in terms that go beyond the aggregate of the individual engagements that compose it.

Participatory sense-making as it is currently theorized is an important phenomenon that occurs in some but not all social interactions. If the actions of individuals are explained by the evolution of the agent-environment systems in question, and by the tensions and coordinations that arise between agent and environment, there will be some circumstances in which the explanation of the actions of two or more interacting agents might leave a remainder – where their actions were in fact more than the sum of their parts, and the agents together formed a single entity engaged with their environment rather than an aggregate of individuals. Behavior settings and the notion of places remind us that participatory sense-making will not occur in a vacuum but often in a cultivated physical milieu. These concepts offer a first-pass theoretical account of how such over-arching dynamics can arise and can have functional effects. Barker (1968) and his colleagues have explored some of the ways in which settings coerce behavior, examining optimally- and under-inhabited settings and the different ways in which people respond to the requirements of a given place. These ideas have also been put to some practical use in, for example, promoting inclusiveness in school-aged children (Fuhrer, 1993). A sensitivity to the broader context of a given activity offers some possible value in predicting when participatory sense-making is likely to occur, and what the course of its dynamic over time is likely to be.

Participatory sense-making reminds us that social activity is not just *more* activity, but is different in kind from interaction with the inanimate environment. However, the ideas of behavior settings and the acknowledgment of the socially curated, designed nature of most of the places in which human activity takes place equally remind us that participatory sense-making and the other complexities of social interaction are both supported and constrained by a host of observable and investigable factors. Recent work by Froese et al. (2014) is an example of how the dynamics of social interaction have been examined explicitly in these terms in a minimalist virtual environment. The theory of behavior settings offers a means of analyzing environments to explore the issue in more naturalistic contexts.

# **RADICAL EMBODIED INTERSUBJECTIVITY**

As has already been noted, a radical embodied approach that combines enactive and ecological thinking sees perception and action as occurring within an already flowing stream of activity. A living agent is never entirely at rest (even sleep is an activity). Such a view thus adopts a Deweyan notion of tensions and coordinations of behavior in context. When we are considering human beings, the dynamics of tensions and coordinations are shaped by the practices and places of the surrounding culture.

Traditional, computational, or cognitivist models of psychology begin with a bare, decontextualized psychological system and layer context, in the form of interpretations or biased representations, over what are imagined as at least potentially faithful encodings of an external environment. On the view advanced here, perceiving is done within the flow of behavior, and so objects or the actions of others show up in that flow and are engaged with as concordant or discordant with it. Interpretation doesn't come after the fact; culturally formed cognitive activity is not an add-on or appendix to normal cognitive activity. Because in the human case abilities, habits, and practices are cultivated according to cultural norms from our earliest experiences, our culture does not introduce bias or add skew to our behavior, but inheres in the most basic forms of our activity from the get-go. Our acting and perceiving are done in cultural settings – in places – and our abilities (and their complementary affordances) develop accordingly.

Tomasello (1999a,b) has argued a similar point. He criticizes Gibsonian researchers for overlooking the cultural context in which objects are first encountered and the manner in which this affects people's sensitivity to those objects' affordances. He suggests the idea of "intentional affordances," which are the normal functions to which objects are put and will be primary for that object in the field of promoted action. Here, I point out that this mode of thought generalizes to the social activity itself.

Just as perceiving and acting occur within an on-going flow of activity, so people and their behavior are always present within a flow of cultural practice. We cannot identify and examine perceiving and acting separately from the context in which they show up, but must analyze them within the engagement between the agent and the environment. Similarly, we cannot pick out individual cognitive processes or actions separately from their cultural context and attempt to understand the whole as the sum of its parts. A radical embodied approach requires us always to address phenomena as wholes, with parts existing insofar as they stand in various relations within that identified system. In the case of psychology, parts arise from wholes, rather than the other way around.

This approach imposes some challenges on us as investigators. A cognitive science must specify the context of its observations at all times, making explicit the situation in which the processes of interest are arising. While there may be ways in which aspects of context can be held steady across observations and even experiments, we can never leave implicit the particular dynamics of the setting in which behavior is emerging and flowing.

A radical embodied understanding of both individual and joint activity places that activity out in public – in the observable interaction. Intentions, actions, emotions and other phenomena are not locked away in the heads of participants, needing a series of inferences to identify them. We perceive these things directly insofar as we can coordinate effectively with them, whether that activity involves scientific observation or just personal interaction. To that extent, science is a direct extension of the personal activity of making sense of things (and in fact, is a contextualizing support for sense-making for those of us who are practicing scientists).

An understanding of intersubjectivity is approached from precisely the same perspective, seeing the individuals show up within the engagement rather than seeing the engagement as the linear sum of the actions and interpretations of rigidly specified individuals as they meet.

There is, thus, a sense in which you are a different person in different interactions, but the stability of your bodily dynamics and the inertia of your habitual behaviors, cultivated over time within cultural contexts, mean that you are not created anew, without history, every time. The identity of individuals within interactions varies between situations, but neither arbitrarily nor entirely unpredictably. Your role in a behavior setting will shape your behavior, as will your personal history of experience with such settings and such roles. Many interactions will enable multiple social roles to be played and their associated skills exercised, while other roles will be suppressed or starved of opportunities.

Interacting with my undergraduate students, for instance, it is demanded of me that I play a didactic role and deploy a particular complex of skills in doing so. There is also the occasional opportunity to indulge in a little philosophical speculation, but little if any possibility or likelihood of passing a soccer ball or debating the merits of a science fiction novel. A classroom setting makes certain demands on me because of my history and skillset; it makes different demands on my students.

Emergent interpersonal engagements are not fully autonomous from their enabling conditions – they still occur between embodied agents who are coupled to their environments (including each other) through various sensorimotor abilities. The utterance of a promise, a protestation of love or a glint in the eye still produce structure in the ambient energy of the living medium with which attuned agents can coordinate their actions. There is nonetheless an important sense in which the social domain is autonomous: it depends on the recent history of the individuals' interaction in a way that the inanimate world does not. The same structure in the ambient array provides ecological information (supports effective coordination of action) under one history but not under another. What is more, because the relation is continually evolving, being negotiated on the basis of the actions of the agents involved, some affordances for joint action will only arise when other aspects of behavior have been effectively entrained and the two agents are involved in participatory sense-making.

Attempting to reduce participatory sense-making to the actions of individual participants is doomed to failure, but even autonomous social practice is still conducted by embodied agents in physical settings, and these emergent dynamics can be explored by examining those enabling and constraining features.

## **DIRECT SOCIAL PERCEPTION**

One of the concerns that critical readers might raise is whether direct perception is really possible in activity that is so heavily mediated by cultural processes. How can it be the case that I directly perceive, say, an insult, given the host of cultural and historical dependencies on which such an experience is based? Surely the cognitive system must use some representation to keep track of relationships and to enable the rich complexity of even momentary events in social interactions.

This kind of concern makes two mistakes.

First, direct perception is not a claim that what is perceived is unmediated. Cultural events and actions are mediated by tradition and practice, but those events can still be directly perceived. Cognitivist and computationalist models of psychology have perhaps trained our intuitions to consider that only the world as described by Physics, in its neutral, raw, brute form, can be perceived directly. On that intuition, perceiving culturally mediated phenomena such as social roles, symbols, and the social implications of actions *must* require mental gymnastics to infer the cultural import of a physical event.

Direct perception of the non-physical (in the "mere" or "brute" sense of physical) is a perfectly coherent notion, and all of ecological psychology is grounded in the idea. For ecological psychology the pickup of ecological information is done through physical interaction, of course (what else could it be?), but what that information enables perception of can be anything, so long as a sufficiently reliable relationship exists between it and the information in question. The glint in my wife's eye, the rudeness of the exclusionary orientation of a person's body, or the offensiveness of their utterances are perceived within the interaction, not built, LEGO-style, from the perception of their elements. They depend on my ability to engage effectively with social practices and with the individual people in question, but as I have noted those abilities are culturally shaped from the ground up. My movements and utterances are culturally structured, meaningful at their most basic level; cultural relevance and value are not added afterward.

Second, direct perception is not instantaneous (Bingham, 1995). It is unmediated by inference or representations, but it can still take time, sometimes quite a long time. Because of the dynamic nature of the relationship, at least some time (even if only a very short period) will be required to allow the agent to coordinate their behavior. Where the dynamics of the environment are slower, however, the process of perceiving might take relatively prolonged periods. It can take time to see another person's intentions, and different periods of time might make different aspects of the other person perceivable. Over increasingly long durations we may see first only the contours of the other's intentions, then their general thrust and tone, and finally their finer grain. Direct perception can be slow, and what is perceived can be vague. There is also no particular moment in time at which perceiving is "complete," because such perception always occurs in the flow of on-going behavior – activity does not have to wait for it.

For more cognitivist thinkers, any prolonged coordination will imply the existence of a representation capable of being updated so that the agent can keep track of details as they become apparent. This mode of thought, however, assumes that at any given time the agent's interactions with the environment are being built up from bare physical facts that need interpretation, and it overlooks the possibility of an on-going process of activity whose trajectory is amended as it is perturbed or otherwise constrained by the way in which it is coupled with the environment.

Historical dependency of processes is something that is inherent in a great many forms of dynamical system, with no need for representations to keep track of that history. Social relationships between agents are particularly sensitive to historical dependencies.
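The claim that a dynamical system can carry its history without a separate representation of it can be illustrated with a toy example. The sketch below is not drawn from the article and is only an illustration under stated assumptions: it assumes a one-variable bistable system, dx/dt = x − x³ + u, where u is an external input, integrated with a simple forward-Euler loop (the function names `settle` and `run_history` are hypothetical). Two runs start from the same state and end with the same input, u = 0, yet come to rest in different states, because the current state itself bears the trace of past inputs; nothing in the model stores a record of that history.

```python
"""Toy illustration of history dependence (hysteresis) in a bistable dynamical system.

Assumed model (hypothetical, not from the article): dx/dt = x - x**3 + u,
where x is the system's state and u is an external input.
"""


def settle(x: float, u: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate dx/dt = x - x**3 + u by forward Euler for steps * dt time units."""
    for _ in range(steps):
        x += dt * (x - x**3 + u)
    return x


def run_history(inputs: list[float], x0: float = 0.0) -> float:
    """Apply inputs in order, letting the state settle under each; return the final state.

    The only thing carried from one input to the next is the state x itself.
    """
    x = x0
    for u in inputs:
        x = settle(x, u)
    return x


# Both histories start from x = 0.0 and end with the same input, u = 0.0.
rising = [round(-0.5 + 0.1 * i, 10) for i in range(6)]   # -0.5, -0.4, ..., 0.0
falling = [round(0.5 - 0.1 * i, 10) for i in range(6)]   # 0.5, 0.4, ..., 0.0

x_after_rising = run_history(rising)
x_after_falling = run_history(falling)

print(f"rising history, final input {rising[-1]}: state {x_after_rising:.3f}")
print(f"falling history, final input {falling[-1]}: state {x_after_falling:.3f}")
```

Under these assumptions the rising history should leave the state near −1 and the falling history near +1, even though the final input is identical. Forward Euler and these parameter values were chosen only for simplicity; any bistable system would make the same point.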

## **SUMMARY AND CONCLUSION**

Dewey (1896) argued that no behavior occurred outside of the context of the animal's already on-going stream of activity. Perceiving and acting exist in a dynamic of tensions and coordinations that enable the continuity of a person's effective coping in the world. The "parts" of psychological activity emerge out of the "whole" of a living being's engagement with its environment, not the other way around.

Enactive and ecological approaches to cognitive science developed independently, but effectively extend and flesh out Dewey's insight. In doing so, they highlight the need for a characterisation of both the embodied psychological agent, and the environment, in terms that acknowledge their interdependent relationship. I have argued in the present paper that bringing enactive and ecological points of view together offers the best hope for such an account, over either perspective alone [and in this I offer an initial response to a call for their closer alignment by Chemero (2009)].

The "already acting" point of view that this account involves means that the environment is never encountered ahistorically. All acting and perceiving is done in a flow of activity that is continuous for living beings. For us human beings the fields of action, the engagements in which we find ourselves, have both personal and cultural histories. Our subjectivity is dependent on our intersubjectivity. Social activity mediates individual psychology but does so in a manner that is fundamental, not additional. Cultural activity does not sit on top of more basic forms of behavior. Rather, it evokes, shapes and transforms those basic actions. The environment in which we human beings live and act is cultural to its core.

The approach advocated here poses some challenges for empirical investigation, but can also draw effectively on established theoretical resources, particularly in the form of the theory of behavior settings of Barker (1968) and Schoggen (1989). As we look to the horizon of a more culturally sensitive embodied cognitive science it might also be possible to begin a process of integration with some aspects of cultural psychology (Bruner, 1990; Harré, 1998; Benson, 2000; Harré and Moghaddam, 2012) where the primacy of cultural practice in psychological activity is already acknowledged.

By these lights, a science of radically embodied intersubjectivity is not only possible, it is the only way in which we can adequately address the question of the nature of the human mind.

## **ACKNOWLEDGMENT**

I am grateful to Harry Heft and Cor Baerveldt for extensive comments on an earlier version of this paper.

## **REFERENCES**

Barandiaran, X., Di Paolo, E., and Rohde, M. (2009). Defining agency: individuality, normativity, asymmetry and spatio-temporality in action. *Adapt. Behav.* 17, 1–13. doi: 10.1177/1059712309343819


Reed, E. S. (1996b). *The Necessity of Experience*. New Haven: Yale University Press.


**Conflict of Interest Statement:** The Editor Hanne De Jaegher declares that, despite having collaborated with author Marek McGann, the review process was handled objectively and no conflict of interest exists. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 May 2014; accepted: 31 October 2014; published online: 18 November 2014.*

*Citation: McGann M (2014) Enacting a social ecology: radically embodied intersubjectivity. Front. Psychol. 5:1321. doi: 10.3389/fpsyg.2014.01321*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 McGann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# *Paul Lodder\*, Mark Rotteveel and Michiel van Elk*

*Department of Psychology, University of Amsterdam, Amsterdam, Netherlands*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Moritz M. Daum, University of Zurich, Switzerland*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *\*Correspondence:*

*Paul Lodder, Department of Psychology, University of Amsterdam, Weesperplein 4, Room 1.03, Amsterdam 1018XA, Netherlands e-mail: p.lodder@uva.nl*

Recently within social cognition it has been argued that understanding others is primarily characterized by dynamic and second person interactive processes, rather than by taking a third person observational stance. Within this enactivist view of intersubjective understanding, researchers differ in their claims regarding the innateness of such processes. Here we propose to distinguish nativist enactivists, who argue that studies on neonatal imitation support the view that infants already have a non-mentalistic embodied form of intersubjective understanding present at birth, from empiricist enactivists, who claim that those intersubjective processes are learned through social interaction. In this article, we critically examine the empirical studies on neonatal imitation and conclude that the available evidence is at best mixed for most types of specific gesture imitation. In the end, only tongue protrusion imitation appears to be consistent across different studies. If neonates imitate only a single gesture, then a more parsimonious explanation for the tongue protrusion effect could be put forward. Consequently, the nativist enactivist claim that understanding others depends on second person interactive processes already present at birth no longer seems plausible. Although other strands of evidence converge on the importance of intersubjective processes in adult social cognition, the available evidence on neonatal imitation calls for a more careful view of the innateness of such processes and suggests that this way of interacting needs to be learned over time. The available empirical evidence on neonatal imitation is therefore, in our view, compatible with the empiricist enactivist position, but not with the nativist enactivist position.

**Keywords: enactivism, neonatal imitation, intersubjectivity, action understanding, social cognition**

# **1. INTRODUCTION**

Humans are social in nature. Almost everything we do involves interacting with other human beings. An important prerequisite for social interaction is the understanding of others<sup>1</sup>. Take for instance a game with three people in which person A reads a message and has to transfer it to person B, who, after receiving the message, has to transfer it to person C. The difficulty in this game, however, is that persons A and C are not allowed to interact directly and no participant is allowed to use spoken language. They therefore have to transmit the message using only odd sounds and gestures. Often the receiver of the message imitates the gestures and sounds of the transmitter in order to better understand the transmission. In the end, the original message is compared to person C's interpretation of the message received from person B. Occasionally, person C's interpretation differs considerably from the original message, but surprisingly often the interpretation lies close to the original. This example not only illustrates that human interaction requires us to understand each other's actions, but also shows that we are pretty good at it, even in complex situations where we cannot use all available channels of communication. But how exactly are we able to understand the actions of other people?

Within the field of social cognition, there are two dominant theoretical approaches that explain our ability to understand other human beings from a cognitivist perspective. According to *Theory theory* (TT), we understand others by theorizing about their minds (Leslie, 1987; Gopnik and Wellman, 1994). On this account, the understanding of other minds relies on taking a theoretical stance and postulating the existence of mental states in others that can help us to explain their behavior. *Simulation theory* (ST), on the other hand, posits, broadly speaking, that we use our own experiences as an internal model for understanding others (Gordon, 1986; Goldman, 2002). We simulate the thoughts and/or feelings that we would experience if we were in the very same situation as the other person. TT and ST agree that we explain and predict others' behavior using mental state attributions by taking a third person observational stance. Because both theories use internal representations to explain how human beings understand others, they can be viewed as representational theories. The nature of these representations, however, differs between the two theories, and they therefore clearly disagree about the processes that let us understand others. TT claims that understanding others can be accomplished by using abstract theories about other minds, while ST claims that representations are instead based on sensorimotor experiences and involve simulating others' thoughts and actions.

<sup>1</sup>We realize that the word "understanding" has a strong cognitivist connotation when combined with words like "intention," but in our view the term understanding in itself can be used by cognitivists and enactivists alike, because understanding can also be interpreted in a non-cognitivist way. For instance, Gallagher and Hutto (2008) published an article titled "Understanding others through primary interaction and narrative practice." Carpendale and Lewis (2010) define social understanding as the "everyday thinking necessary to engage in social interaction." Because this definition could imply a cognitivist reading of social understanding, and we aim to remain agnostic regarding the debate on the role of representations in explaining social interaction, we propose to define social understanding as "the skills necessary to engage in social interaction." Social understanding from a cognitivist perspective would for instance involve skills like having mental representations of other people's intentions, while from an enactivist perspective it would for instance involve skills like an immediate perceptual understanding arising from a social interaction in which intentions are explicitly expressed in embodied actions (Gallagher and Hutto, 2008).

Recently, it has been argued that understanding others is not primarily characterized by taking such a third person stance involving representations of others' actions, but instead by a second person stance involving dynamic and interactive processes (Zahavi, 2001; Gallagher, 2005; Gallagher and Hutto, 2008; Fuchs and De Jaegher, 2009). This enactivist position proposes that the environment as well as an agent's body play an important role in shaping our cognition. According to enactivists, cognition is a sense-making process, emerging from a dynamic interaction between agents and the environment in which they are embedded (de Bruin and Kästner, 2012). Enactivist theories are for instance supported by studies on motor development in children, showing that their stepping behavior does not result from a cognitive programme present in the child, but instead self-organizes in a dynamic interaction between a child's spontaneous limb movements and a changing environment (Galloway and Thelen, 2004; Gershkoff-Stowe and Thelen, 2004). The enactivist proposal differs from both third person perspectives on social cognition (Theory theory and Simulation theory) in that the latter two use internal representations to explain our understanding of others, while enactivism is strongly anti-representational (Chemero, 2009). While this anti-representationalism is an essential characteristic of enactivism in general, enactivists still disagree about the origins of the intersubjective processes we use to understand others. Some argue that these processes are innate and therefore already present at birth (Gallagher, 2001, 2005; Gallagher and Hutto, 2008; Fuchs, 2009), a position we call *nativist* enactivism. *Empiricist* enactivists, on the other hand, claim that these intersubjective processes are not innate, but develop as a result of interpersonal interaction (Di Paolo and De Jaegher, 2012; Froese et al., 2012)<sup>2</sup>.

Nativist enactivism does not necessarily imply a rejection of the empiricist notion that infants develop intersubjective understanding through learning. A nativist enactivist could view the processes underlying social cognition as primarily innate, while allowing experience to play a secondary role. Consequently, learning could still influence human cognition as a trigger of innately determined intersubjective processes (Gallagher, 2005). A much stronger nativist claim would be to deny any influence of learning on human understanding whatsoever. However, such *final state* nativism (Meltzoff, 2002) is rare within enactivism, because it is incompatible with the central enactivist tenet that social cognition is shaped by experience in a dynamic interaction between an agent's body and the environment. To our knowledge, most nativist enactivists therefore still allow learning to play a role in shaping cognition (Zahavi, 2001; Gallagher and Hutto, 2008; Fuchs, 2009).

The nativist enactivist view on intersubjective understanding is supported by studies on intentionality detection (Meltzoff, 1995), eye direction detection (Baron-Cohen, 1997), and neonatal imitation (Meltzoff and Moore, 1977), which suggest that very young infants already have a non-mentalistic and embodied form of intersubjective understanding (Gallagher, 2008). Of these three strands of research, the studies on neonatal imitation are the most important to the nativist enactivist view. They could imply that a basic form of intersubjective understanding is already present at birth and therefore does not depend on learning, contrary to what, for instance, empiricist enactivists assume. More specifically, studies on neonatal imitation imply that a basic form of intersubjective understanding is reflected in the infant's ability to respond automatically and dynamically to observed actions by producing a similar gesture, which suggests an important role for an innate body schema guiding interaction with the world (Gallagher, 2005). Reviews of the neonatal imitation literature, however, have questioned the generality of neonatal imitation and have proposed more parsimonious alternative explanations of these findings (Anisfeld, 1991; Jones, 2009; Ray and Heyes, 2011).

In contrast, the empiricist view on enactivism puts more emphasis on the importance of sensorimotor and social learning for intersubjective understanding (Di Paolo and De Jaegher, 2012; Froese et al., 2012). In support of this account, it has been pointed out, for instance, that imitation in infants is experience-dependent and possibly mediated by the sensorimotor configuration of the so-called mirror neuron system (MNS). Furthermore, it is argued that rather than being equipped with an innate body schema, infants gradually acquire an implicit sense of their body through visuomotor and visuo-tactile experience (Zmyj et al., 2011).

<sup>2</sup>We realize that many enactivist positions are more nuanced than the nativist-empiricist distinction suggests, but we still consider the distinction useful because it provides a conceptual tool for classifying theories by their relative emphasis on learning or innate processes. That is, we argue that some enactivists explain social understanding partly by innate processes, while other enactivists deny the relevance of such processes because they claim that interactive or learning processes are sufficient. The distinction between nativist and empiricist enactivism therefore primarily serves an instrumental purpose: it illustrates the differing enactivist views on the origin of social understanding. A similar empiricist-nativist distinction appears to be a fruitful way to classify other developmental debates, such as those on the origin of knowledge (Spelke, 1998), language (MacWhinney, 1999), or spatial and quantitative processing (Newcombe, 2002). We propose to use a similar distinction to clarify the present debate on the origin of social understanding. Disentangling theories by their relative emphasis on learning or innate processes is especially relevant for discussing the evidence on neonatal imitation: if neonatal imitation exists, it provides strong evidence for the notion that basic forms of social interaction are already present at birth and do not have to be learned.

In the present paper, we investigate whether the available empirical evidence for neonatal imitation poses a problem for the nativist enactivist claim that understanding others depends on second-person interactive processes that are already present at birth. If neonates can imitate only a single gesture, a more parsimonious explanation could be put forward. We will therefore investigate the scope of neonatal imitation, because nativist enactivist theories rely on the generality of this phenomenon (Heyes, 2001). First, we clarify the basic concepts and theories of imitation, followed by a short review of the classic neonatal imitation experiments by Meltzoff and Moore (1977, 1983a, 1989, 1994). We then focus on some contradictory findings, followed by an examination of two systematic reviews (Anisfeld, 1991; Ray and Heyes, 2011). Finally, we summarize these findings and consider their implications for the enactivist approach to intersubjective understanding.

# **2. IMITATION**

One of the milestones in parent-child interaction is the moment a newborn imitates a parent for the first time. Examples of such mimicking behavior are the imitation of observed head movements, facial gestures, or even rudimentary speech. Imitation is not confined to human beings: researchers have demonstrated that birds and non-human primates are also able to imitate, even at a neonatal age (Carpenter and Tomasello, 1995; Custance et al., 1995, 1999; Akins and Zentall, 1996, 1998; Ferrari et al., 2006; Myowa-Yamakoshi, 2006; Bard, 2007).

## **2.1. DEFINITION**

A key issue in imitation debates is how genuine imitation is defined, and hence how the construct of imitation is validated in different empirical studies. All definitions of imitation have in common that an observer copies a body (part) movement of a model (Heyes, 2001). In other words, an observer receives visual information about an observed body movement and uses this information to perform a similar movement in response. Note that we exclude situations in which the model's movement and the imitator's movement merely co-occur spontaneously. We also do not count an act as imitative when it is caused by something other than the model and its behavior (Anisfeld, 1991).

Further, it is important to distinguish imitation from both emulation (Tomasello, 1996) and spatial compatibility (Brass et al., 2001). Emulation, like imitation, involves a person copying an action from a model, but the performed action matches the model's action only in terms of its goal, not in terms of the movements that lead to that goal. For instance, you might water the plants with a watering can, while I might achieve the same goal with a garden hose. In that case, the goal of the action is the same whereas the movements differ, and this is considered an instance of emulation rather than imitation. Thus, a prerequisite for genuine imitation is a match between the observed and the performed movements. Spatial compatibility, like imitation, involves a similarity between the relative positions of the imitator's and the model's actions, but with spatial compatibility the action's target is not necessarily the same. For instance, if a person standing opposite you asks you to raise your right hand while raising his own right hand, spatial compatibility will make you more likely to raise your left hand instead, because from your point of view it is on the same side as his raised hand. Both emulation and imitation can also serve to understand the actions of others (Takahashi et al., 2010). That is, being able to imitate another person's actions implies the ability to respond to the other's movements in a way that is socially and communicatively effective.

## **2.2. CURRENT DEBATES IN IMITATION RESEARCH**

Within the field of imitation research, three debates can be discerned, concerning the onset, the underlying mechanisms, and the automaticity of imitation. Although most scientists agree that human infants *are* able to imitate at some age, there is considerable disagreement about the *exact* age at which infants become able to imitate. Numerous studies indicate that infants are able to imitate other people in their second year of life (Piaget, 1946; Meltzoff, 1995; Carpenter et al., 1998; Nadel and Butterworth, 1999). Yet, when it comes to imitation at a *neonatal* age, the results are still contradictory (Meltzoff and Moore, 1977, 1983a; Koepke et al., 1983; McKenzie and Over, 1983).

The second dispute concerns the mechanisms underlying imitation and whether these differ between neonates and older infants or even adults. In a way, this debate also mirrors the nature-nurture debate, because the issue here is whether imitation is innate or depends on learning. If newborn infants can imitate, this supports the existence of an innate mechanism underlying imitation (e.g., an automatic coupling of observed actions to one's own behavioral repertoire). If, on the other hand, neonatal imitation proves not to be genuine and is not comparable to the imitation seen in older infants, this might indicate a dependence on additional learning, such as learning to couple observed actions to one's own behavioral repertoire (Anisfeld, 1991; Gallagher, 2001, 2005; Ray and Heyes, 2011)<sup>3</sup>.

Related to this debate is the third dispute: to what extent imitation in adults can be viewed as automatic (Heyes, 2011). Studies on automatic imitation in adults suggest that the mirror neuron system (MNS) provides a direct connection between the perception and the production of action (Kilner et al., 2003; Press et al., 2005; Longo et al., 2008; van Schie et al., 2008). This involvement of the MNS in imitation might imply that the system has evolved as a specialized mechanism for intersubjective understanding (Rizzolatti et al., 2001; Gallese et al., 2004). On the other hand, it has been argued that the MNS is not an innate mechanism but relies on sensorimotor learning and accordingly develops through experience (Ray and Heyes, 2011). Thus, a similar discussion regarding innateness and automaticity vs. the role of experience and learning can be observed in studies on infant imitation and on the development of the MNS.

<sup>3</sup>Some enactivists, however, do not necessarily view two qualitatively different forms of imitation (neonate vs. adult) as problematic (Froese and Leavens, 2014).

## **2.3. FUNCTIONAL AND COGNITIVE MECHANISMS**

An important functional mechanism underlying imitation concerns the mapping from observed movements to one's own body. More specifically, this *correspondence problem* entails that, when imitating someone, the imitator needs to know which observed body parts map onto his or her own body parts. In other words, it needs to be specified how visual information is translated into a corresponding motor act. If you see someone move their hand, you need to know that their hand looks similar to your own hand and that you are able to perform the same movement with it. This becomes much more complicated for body parts that you cannot easily see on yourself, such as your tongue. To solve the correspondence problem, cognitivist theories propose that infants imitate an observed movement by using an internal representation of the observed body part. Infants then associate this observation with a motor act by mentally matching this representation with proprioceptive information about their own body parts (Schaal, 1999; Heyes, 2002; Spaulding, 2010). Enactivist theories, on the other hand, propose that internal cognitive representations are not required to explain imitation. Instead, enactivists propose that we understand other people primarily by responding directly to their behavior in a dynamic interaction between the environment and our own perceptual experiences.

Within enactivism, two different explanations of imitation can be distinguished. First, *nativist enactivists* claim that an innate body schema enables children to map observed movements (e.g., facial gestures) directly onto their own movement repertoire. A body schema is defined as a system of sensorimotor processes that constantly regulates posture and movement, processes that function without reflective awareness or the need for perceptual monitoring (Gallagher, 2005). On this view, such an innate body schema is biologically based and already present prenatally (i.e., in the womb), where the child can already explore its own body through touch and proprioception (Butterworth, 1992; Gallagher, 2008). Nativist enactivist theorists claim that we understand other people primarily because of our *innate* capacity to respond directly to other people's behavior through a dynamic interaction between the environment, our own perceptual experiences, and our body schema (Gallagher, 2008). Support for the innateness of this process relies heavily on experimental studies showing that neonates already have a basic form of intersubjective understanding. If neonates can interact dynamically with the environment by directly matching their proprioceptive experience with other people's behavior, then the basic mechanisms that adults use to understand others are already present at birth and therefore do not need to be learned. According to one nativist enactivist, the "studies on newborn imitation suggest that there is at least a primitive body schema from the very beginning. This would be a schema sufficiently developed at birth to account for the ability to move one's body in appropriate ways in response to environmental, and especially interpersonal, stimuli" (Gallagher, 2005). Similarly, according to Gallagher and Meltzoff (1996), the evidence on neonate imitation "suggests that there exists an innate system that accounts for the possibilities of early infant imitation." This line of reasoning clearly shows that studies on neonatal imitation are of central importance to the nativist enactivist claim.

Nativist enactivists often refer to one particular set of studies on neonate imitation published by Meltzoff and colleagues (Meltzoff and Moore, 1977, 1983a, 1989, 1994). They use these studies to support the notion that the basic intersubjective mechanisms underlying adult social cognition are already present in neonatal infants. For instance, according to Fuchs (2009), the studies by Meltzoff and Moore show "that the capacity of imitation in human infants is essential for understanding others. From birth on, infants possess interpersonal body schemas for spontaneous facial imitation and emotional resonance. They experience the other's body as similar to their own, and thus, they also transpose the seen facial expressions and gestures of others into their own feelings. These schemas underlie the development of more sophisticated empathic abilities in the course of early interactions." In a similar vein, Gallagher and Hutto (2008) claim that the Meltzoff and Moore studies imply that "an intermodal tie between a proprioceptive sense of one's body and the face that one sees is already functioning at birth." In other words, these studies "confirm the existence of an innate body representation," allowing infants to "imitate some simple movements like protrusion of tongue" (De Vignemont, 2003).

The neonate imitation studies supporting the nativist enactivist claim (Meltzoff and Moore, 1977, 1983a, 1989, 1994) are, however, only a selective sample of all the studies conducted using the imitation paradigm; most other studies report results that are at best contradictory regarding the capacity of neonates for genuine imitation. To our knowledge, most nativist enactivists do not refer to these contradictory findings (Gallagher, 2000, 2001, 2005, 2008, 2011; Zahavi, 2001; Gallagher and Hutto, 2008; Fuchs, 2009). Furthermore, the nativist enactivist claim that neonates already have a basic form of intersubjective understanding relies heavily on experiments showing that neonates can imitate not just one specific gesture but different kinds of social gestures. This generality of neonatal imitation is important to nativist enactivists: if imitation is an innate mechanism used for intersubjective understanding, then one would expect this imitative mechanism not to be limited to a single type of gesture. Reacting to only one specific gesture would probably indicate that neonates do not understand action in social situations but merely imitate one particular gesture as a result of other, less specific biological, reflex-like, or learned mechanisms (Anisfeld, 1991, 1996; Heyes, 2001; Di Paolo and De Jaegher, 2012). As a consequence, the nativist enactivist claim regarding the innateness and automaticity of imitation and action understanding would no longer be valid.

*Empiricist enactivists*, on the other hand, claim that the processes underlying imitation are dynamically learned during social interaction (Di Paolo and De Jaegher, 2012; Froese et al., 2012; Froese and Leavens, 2014). These views are substantiated by studies showing that the mirror system is continuously shaped through sensorimotor learning and is therefore highly adaptive. This high plasticity allows the mechanisms underlying imitation to be adjusted constantly during interpersonal interaction (Catmur et al., 2007, 2009). We consider the distinction between nativist and empiricist enactivism important because it highlights the opposing views within enactivism on the origins of intersubjective understanding in humans. The studies on neonate imitation are central to this debate because they are used to support the nativist enactivist view that these intersubjective processes are already present at birth. Although most empiricist enactivists are well aware of the conflicting evidence on neonate imitation (Di Paolo and De Jaegher, 2012; Froese et al., 2012; Froese and Leavens, 2014), some nativist enactivists clearly treat neonate imitation as an indisputable phenomenon (Gallagher, 2005; Gallagher and Hutto, 2008; Fuchs, 2009). Therefore, in the following sections we critically examine the studies on neonate imitation and consider their implications for both the nativist and the empiricist enactivist view of intersubjective understanding.

# **3. EXPERIMENTAL EVIDENCE ON NEONATAL IMITATION**

Studies on neonatal imitation are important within the imitation debate because they could imply that a basic form of intersubjective understanding is already present at birth and therefore does not need to be learned. Neonate imitation had already been widely reported in the pre-experimental literature (Stern and Barwell, 1924; McDougall, 1926; Piaget, 1946), but the novelty of the Meltzoff and Moore (1977) studies was that they were the first to investigate neonate imitation experimentally and systematically, by studying infants in a hospital lab.

## **3.1. MELTZOFF AND MOORE'S SEMINAL STUDIES**

In one experiment, Meltzoff and Moore (1977) had a model present three different facial gestures to 12- to 17-day-old infants. The model first showed each infant a neutral, passive face for 90 s, which served as a baseline against which the imitation effect would be compared. Subsequently, the model demonstrated one of the three facial gestures (tongue protrusion, mouth opening, or lip protrusion), chosen at random, four times within a 15 s period. This was followed by a 20 s period during which the infant was allowed to respond. For all infants, responses to the model's gestures were videotaped. Afterwards, for each trial, six independent graduate students who were blind to the gesture the model had shown watched the video and ranked the facial gestures from most to least likely to have been imitated by the infant. For instance, a possible ranking of imitative responses to a modeled tongue protrusion could be (1) tongue protrusion, (2) mouth opening, (3) lip protrusion. For each modeled gesture, infants turned out to be significantly more likely to perform that specific gesture than no gesture or another gesture. This finding conforms to the definition of imitation as a non-random copy of an observed body (part) movement of a model, caused by nothing other than the observation of the model itself.

One limitation of this study, however, is that the researchers did not rule out the possibility of experimenter bias. During the experiment, neonates were often not paying attention to the model, because they were spitting or choking. To overcome this problem, the model sometimes repeated the facial gesture to make sure the neonate attended to it. This solution, however, might have led the model to repeat the gesture until a neonatal reaction happened to coincide with the demonstrated gesture. To address this considerable problem, Meltzoff and Moore designed another experiment (Meltzoff and Moore, 1983a) in which each gesture was presented for a fixed duration. Neonates in this experiment were even younger than those in the previous experiment: their ages ranged from 42 min to 71 h. Again, neonates consistently imitated the model's tongue protrusions and mouth openings. The lip protrusion effect, however, did not reach statistical significance this time.

An alternative account of this neonate imitation effect invokes an innate and evolutionarily old release mechanism that promotes the neonate's chances of survival (Jacobson, 1979; Bjorklund, 1987). Mouth openings and tongue protrusions could, for instance, simply be a reflex toward a suckable object, such as a mother's nipple. Neonate responses in the gesture imitation paradigm could thus be caused by the mere perception of the model's tongue as a suckable object, independent of any genuine imitation. According to the innate release mechanism account, the observed link between a model's tongue protrusion and the neonate's tongue protrusion could be merely coincidental and uninformative regarding genuine imitation.

Meltzoff and Moore (1994), however, reasoned that if such an innate release mechanism plays a role in neonate imitation, then the neonate's response to a suckable stimulus should occur shortly after the perception of that stimulus and not after a delay. To rule out the innate release account, they conducted an experiment similar to their previous ones, but with an additional condition in which the neonate's response was delayed by 24 h: the model demonstrated a randomly chosen gesture and, 24 h later, the neonates saw the same model again, but now only with a passive face. First, Meltzoff and Moore replicated their previous finding that neonates systematically imitated the model's tongue protrusions and mouth openings when they were allowed to respond directly after the model presented the gesture. Furthermore, after the 24 h delay, neonates showed significantly more tongue protrusions than other gestures if the model had demonstrated a tongue protrusion 24 h earlier. Interestingly, this effect was not found for the other gestures. This finding was interpreted as a specific effect of imitation, in which the observed action is imitated after a delay and therefore cannot be explained as a reflex triggered by an innate release mechanism<sup>4</sup>.

Several other studies found results very similar to those of Meltzoff and Moore (Jacobson, 1979; Field et al., 1983; Meltzoff and Moore, 1983b; Fontaine, 1984; Kugiumutzakis, 1985; Abravanel and DeYong, 1991), but an even larger number of studies failed to replicate these initial neonate imitation effects (Anisfeld et al., 1979; Hayes and Watson, 1981; Koepke et al., 1983; McKenzie and Over, 1983; Neuberger et al., 1983; Abravanel and Sigafoos, 1984; Fontaine, 1984; Lewis and Sullivan, 1985; Heimann et al., 1989). Several reviews of neonatal imitation have attempted to clarify and explain these mixed results; they are discussed in the next section.

<sup>4</sup>In our view, this experiment does not by itself provide evidence for the nativist enactivist claim that neonates are capable of intersubjective understanding, because all the dynamics between actor and observer are lost once a delay is introduced between the modeled gesture and the neonate's response.

## **3.2. REVIEWS OF NEONATAL IMITATION**

One review analyzed 26 experiments on neonatal imitation that together covered 15 different gestures in a total of 76 gesture conditions (Anisfeld, 1996). Tongue protrusion and mouth opening were the most commonly studied gestures, accounting for 23 and 16 gesture conditions, respectively. For each experiment, Anisfeld recorded whether or not an effect was found in a particular gesture condition. He defined an effect as present when the neonates showed significantly more correct imitations in the gesture condition than in the neutral comparison condition, and he required this difference to be significant on a two-tailed test with a *p*-value smaller than 0.05.

In total, an effect was present in 28 of the 76 gesture conditions (37%): in 12 of the 23 tongue protrusion conditions (52%), 3 of the 16 mouth opening conditions (19%), and 13 of the 37 remaining gesture conditions (35%). The tongue protrusion effect thus appears to be stronger than the other gesture effects in this review. Even so, 48% (11 of 23) of the tongue protrusion conditions did not show an effect at all. In all 11 of these conditions, the gesture was demonstrated for less than 40 s. Conversely, all conditions in which tongue protrusions were demonstrated for more than 60 s did show a significant effect. Anisfeld (1991) therefore concludes that a neonate imitation effect is present only for the tongue protrusion gesture and only under conditions of longer gesture presentation.
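The percentages in this paragraph follow directly from Anisfeld's counts as reported here. As a minimal arithmetic check, the Python sketch below recomputes them from those counts. The three categories mirror the text, and the script introduces no data beyond what is reported above.

```python
# Counts reported in the text for Anisfeld's review:
# (conditions with a significant two-tailed effect, total conditions).
counts = {
    "tongue protrusion": (12, 23),
    "mouth opening": (3, 16),
    "other gestures": (13, 37),
}

def percent(hits, total):
    """Share of conditions showing an effect, as a whole percentage."""
    return round(100 * hits / total)

for gesture, (hits, total) in counts.items():
    print(f"{gesture}: {hits}/{total} = {percent(hits, total)}%")

all_hits = sum(hits for hits, _ in counts.values())
all_total = sum(total for _, total in counts.values())
print(f"all gestures: {all_hits}/{all_total} = {percent(all_hits, all_total)}%")

# Tongue protrusion conditions without a significant effect.
tp_hits, tp_total = counts["tongue protrusion"]
tp_null = tp_total - tp_hits
print(f"tongue protrusion, no effect: {tp_null}/{tp_total} = {percent(tp_null, tp_total)}%")
```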

Based on the review, Anisfeld (1996) further argues that if neonate imitation were a general phenomenon, then neonates who showed a strong tongue protrusion effect should also imitate other facial gestures more strongly. In other words, if genuine neonate imitation is present, then imitation of different gestures should be positively correlated. This was, however, not the case for the 76 reviewed gesture conditions (Anisfeld, 1996).

Anisfeld also examined the frequency of tongue protrusions and mouth openings per minute after modeled tongue protrusions, mouth openings, or passive faces. He found that neonatal tongue protrusions were significantly more frequent after a modeled tongue protrusion than after modeled mouth openings or passive faces. This effect was not found for mouth openings: the frequency of mouth opening responses did not differ significantly whether tongue protrusions, mouth openings, or passive faces were modeled. This does not necessarily mean, however, that no genuine imitation of mouth openings was present; statistical power may simply have been too low. Anisfeld analyzed a total of 12 mouth opening studies, and the power of a two-tailed test to detect a medium effect (*d* = 0.50) with an alpha of 0.05 and a sample size of 12 is only 0.35 (Cohen, 1977).
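The power figure can be reproduced numerically under explicit assumptions. The text does not specify the exact design, so the sketch below assumes a two-tailed one-sample (equivalently, paired) *t* test with α = 0.05 and the 12 studies treated as 12 independent observations. To stay within the Python standard library, it evaluates the noncentral *t* distribution by numerical integration rather than calling a statistics package. Under these assumptions the result comes out close to the 0.35 cited above.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_pdf(v, df):
    """Chi-square density with df degrees of freedom, for v > 0."""
    k = df / 2.0
    return math.exp((k - 1.0) * math.log(v) - v / 2.0
                    - k * math.log(2.0) - math.lgamma(k))

def t_sf(c, df, ncp=0.0, steps=2000):
    """P(T > c) for T = (Z + ncp) / sqrt(V / df), with V ~ chi-square(df).

    Conditions on V and integrates the normal tail over its density with the
    midpoint rule. Adequate for df >= 2 (the density has no pole at 0 then).
    """
    v_max = df + 40.0 * math.sqrt(2.0 * df)  # 40 SDs above the mean of V
    h = v_max / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += chi2_pdf(v, df) * normal_sf(c * math.sqrt(v / df) - ncp)
    return total * h

def t_critical(df, alpha=0.05):
    """Two-tailed critical value c with 2 * P(T > c) = alpha (central t)."""
    lo, hi = 0.0, 20.0
    for _ in range(40):  # bisection; the tail probability decreases in c
        mid = (lo + hi) / 2.0
        if 2.0 * t_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_one_sample_t(d, n, alpha=0.05):
    """Power of a two-tailed one-sample (or paired) t test for effect size d."""
    df = n - 1
    ncp = d * math.sqrt(n)  # noncentrality parameter
    c = t_critical(df, alpha)
    # P(T > c) + P(T < -c); the lower tail equals the upper tail with -ncp.
    return t_sf(c, df, ncp) + t_sf(c, df, -ncp)

print(f"power = {power_one_sample_t(0.5, 12):.2f}")
```

Where SciPy is available, `scipy.stats.t.ppf` and `scipy.stats.nct.sf` give the critical value and the noncentral tail probability directly and can serve as a cross-check.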

Furthermore, because Anisfeld pooled data from different studies in his two-sided *t*-test, the observations of the neonates are nested within studies, which makes it likely that study-specific characteristics have an outsized influence on the estimated imitation effects (Hox, 2002). Anisfeld also used aggregated data, looking at the mean frequencies of neonatal gesture responses, and thereby ignored individual variation in gesture responses. In fact, even more variation is ignored, because the data actually have a multilevel structure with four levels: gestures nested within neonates, nested within experiments, nested within studies. Had a multilevel analysis been adopted instead, this unsystematic variation would have been addressed more appropriately. Not taking this variation into account can greatly increase the chance of a Type I error (Stevens, 2009; Hox, 2010), which also makes it more likely that the tongue protrusion effect is overestimated or is even itself a false positive.
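The consequence of ignoring nesting can be illustrated with a small simulation. All parameter values below (10 studies, 20 neonates per study, a study-level standard deviation of 0.3) are hypothetical, and only two of the four levels are modeled. The true imitation effect is set to zero, so every rejection is a Type I error. A test that treats all neonates as independent rejects far more often than the nominal 5%, whereas a test on the study means stays close to 5%. Analyzing study means is a deliberately crude stand-in for, not an implementation of, the multilevel analysis recommended in the text.

```python
import math
import random

def mean_and_se(xs):
    """Sample mean and its standard error (sample SD / sqrt(n))."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, math.sqrt(var / n)

def type_i_error_rates(n_studies=10, per_study=20, study_sd=0.3,
                       neonate_sd=1.0, t_crit_studies=2.262,
                       reps=2000, seed=1):
    """Rejection rates of two tests when the true imitation effect is zero.

    Each simulated neonate yields one hypothetical difference score (e.g.,
    responses after the modeled gesture minus baseline). All neonates in a
    study share a random study effect, so they are not independent.
    t_crit_studies must be the two-tailed 5% critical t value for
    n_studies - 1 df (2.262 for 9 df).
    """
    rng = random.Random(seed)
    pooled_rejections = 0
    study_rejections = 0
    for _ in range(reps):
        all_scores, study_means = [], []
        for _ in range(n_studies):
            shift = rng.gauss(0.0, study_sd)  # study-specific, mean zero
            scores = [shift + rng.gauss(0.0, neonate_sd) for _ in range(per_study)]
            all_scores.extend(scores)
            study_means.append(sum(scores) / per_study)
        # Pooled test: treats every neonate as independent (large-sample z test).
        m, se = mean_and_se(all_scores)
        pooled_rejections += abs(m / se) > 1.96
        # Study-level test: one mean per study, t test with n_studies - 1 df.
        m, se = mean_and_se(study_means)
        study_rejections += abs(m / se) > t_crit_studies
    return pooled_rejections / reps, study_rejections / reps

pooled_rate, study_rate = type_i_error_rates()
print(f"pooled neonates: {pooled_rate:.3f}, study means: {study_rate:.3f}")
```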

These statistical considerations make it difficult to draw clear conclusions about the presence or absence of neonatal imitation from the analysis of tongue protrusion and mouth opening frequencies. This leaves us with Anisfeld's counts of significant gesture effects: significance was found for only 52% (12/23) of the tongue protrusion conditions and 37% (28/76) of the gesture conditions in general. However, this analysis discards quantitative information by dichotomizing the data into either an effect or no effect. The size of each effect is thereby completely ignored, as is the variation of the data within each study. Therefore, we cannot draw any strong conclusions about the strength of genuine neonate imitation effects for each gesture. This would only be possible with a meta-analysis, but most of the reviewed studies did not even report standard deviations, which makes a proper meta-analysis impossible in the first place (Tabachnick et al., 2001)<sup>5</sup>.
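The cost of dichotomizing can be illustrated with a hypothetical scenario; all numbers are our assumptions, not properties of the reviewed studies. Suppose 23 studies of 12 neonates each all sample a population with the same true medium effect (*d* = 0.5). With a power of only about 0.35 per study, vote counting finds a significant effect in roughly a third of them and would suggest an unreliable phenomenon, whereas inverse-variance (fixed-effect) pooling of the effect sizes recovers a clear effect. The sketch also makes the practical obstacle visible: computing each study's standardized effect requires its standard deviation.

```python
import math
import random

def simulate_study(rng, n=12, true_d=0.5):
    """One hypothetical paired-design study: n difference scores with SD 1."""
    xs = [rng.gauss(true_d, 1.0) for _ in range(n)]
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    t = m / (sd / math.sqrt(n))           # one-sample t statistic
    d = m / sd                            # standardized effect size
    var_d = 1.0 / n + d ** 2 / (2.0 * n)  # common large-sample approximation
    return t, d, var_d

def vote_count_vs_pooling(k=23, n=12, true_d=0.5, t_crit=2.201, seed=7):
    """Compare vote counting with inverse-variance pooling over k studies.

    t_crit must be the two-tailed 5% critical t for n - 1 df (2.201 for 11 df).
    """
    rng = random.Random(seed)
    studies = [simulate_study(rng, n, true_d) for _ in range(k)]
    # Vote counting: proportion of studies that are individually significant.
    share_significant = sum(abs(t) > t_crit for t, _, _ in studies) / k
    # Fixed-effect meta-analysis: weight each effect size by 1 / variance.
    weights = [1.0 / var_d for _, _, var_d in studies]
    pooled_d = sum(w * d for w, (_, d, _) in zip(weights, studies)) / sum(weights)
    pooled_se = 1.0 / math.sqrt(sum(weights))
    return share_significant, pooled_d, pooled_d / pooled_se

share, pooled_d, z = vote_count_vs_pooling()
print(f"significant studies: {share:.0%}, pooled d = {pooled_d:.2f}, z = {z:.1f}")
```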

A more recent review corroborates the findings of Anisfeld (1996). Ray and Heyes (2011) reviewed 37 experiments on neonatal imitation covering a total of 17 different gestures. Eight of these gestures provided no support for the existence of genuine neonatal imitation. Eight of the remaining nine gestures showed mixed results, but the authors attributed these findings either to peculiar scoring criteria or to side effects of tongue protrusion. Peculiar scoring criteria include, for instance, categorizing each imitation as either present or absent rather than calculating response frequencies. Furthermore, gestures that involve mouth movements, such as mouth openings, can be viewed as side effects of an imitated tongue protrusion. In line with the results of Anisfeld (1996), the only gesture that reliably showed positive results was tongue protrusion (Ray and Heyes, 2011).

Because the reviews described in this paper lack proper meta-analytic techniques, a rigorous meta-analysis seems to be required to settle the question of whether neonatal imitation really exists. Additionally, one avenue for further empirical exploration of this matter could be to identify factors that may moderate the neonate imitation effects (e.g., differences in parental style and personality characteristics, the attractiveness of the experimenter's face, or the delay used in the experiment). Moderating factors might explain the large discrepancies among the experimental findings reported thus far. A proper meta-analysis would not only overcome the statistical problems of the systematic review by Anisfeld (1996), but could also serve as a tool to discover factors moderating the neonate imitation effects.

<sup>5</sup>Such a meta-analysis, however, was beyond the scope of the present paper.
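To illustrate how a meta-analysis could test a candidate moderator, the sketch below computes a fixed-effect subgroup comparison (the Q-between statistic) for a categorical moderator. The effect sizes and variances are invented for illustration and do not come from any reviewed study; the two hypothetical levels could stand for, say, short versus long gesture demonstrations. A Q value above 3.841, the 5% critical value of a chi-square distribution with 1 df, would indicate that the pooled effect differs between the two levels.

```python
def pooled_estimate(effects):
    """Fixed-effect (inverse-variance) pooled estimate and total weight.

    effects: list of (effect_size, sampling_variance) pairs.
    """
    weights = [1.0 / var for _, var in effects]
    total_w = sum(weights)
    est = sum(w * d for w, (d, _) in zip(weights, effects)) / total_w
    return est, total_w

def q_between(groups):
    """Q-between statistic for a categorical moderator (fixed-effect model).

    groups: dict mapping moderator level -> list of (effect_size, variance).
    Under the null of no moderation, Q is approximately chi-square with
    (number of levels - 1) df.
    """
    per_group = {level: pooled_estimate(e) for level, e in groups.items()}
    total_w = sum(w for _, w in per_group.values())
    grand = sum(est * w for est, w in per_group.values()) / total_w
    q = sum(w * (est - grand) ** 2 for est, w in per_group.values())
    return q, per_group

# Invented (effect size d, variance) pairs for two levels of a hypothetical
# moderator; NOT data from any reviewed study.
groups = {
    "group_a": [(0.05, 0.09), (-0.10, 0.10), (0.12, 0.08), (0.00, 0.09)],
    "group_b": [(0.55, 0.09), (0.40, 0.10), (0.70, 0.11), (0.48, 0.08)],
}
q, per_group = q_between(groups)
CHI2_CRIT_1DF = 3.841  # 5% critical value of chi-square with 1 df
for level, (est, _) in per_group.items():
    print(f"{level}: pooled d = {est:.2f}")
print(f"Q_between = {q:.2f}, moderation at 5%: {q > CHI2_CRIT_1DF}")
```

A random-effects model or a meta-regression would usually be preferable when studies are as heterogeneous as those in the neonatal imitation literature; the fixed-effect version is shown only because it is the simplest to follow.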

# **4. DISCUSSION**

The studies reviewed above indicate that there is no convincing evidence for neonatal imitation of different social gestures. Both reviews conclude that only the tongue protrusion gesture shows a reliable imitation effect (Anisfeld, 1991; Ray and Heyes, 2011). However, these reviews suffer from a number of statistical flaws that make it difficult to interpret their results decisively. Leaving this aside, the Anisfeld (1991) review shows that 63% of the investigated imitation conditions failed to show any effect, which at least indicates that the available evidence does not favor neonatal imitation in general. And although the strongest imitation effect appears to be found with tongue protrusion gestures, 48% of those conditions still failed to show an effect. Thus, neonate imitation is far from a well-established scientific phenomenon, and it seems misleading to present genuine neonate imitation as a robust finding (as, for instance, in Gallagher, 2000, 2001, 2005, 2008, 2011; Zahavi, 2001; Gallagher and Hutto, 2008; Fuchs, 2009; Varga and Gallagher, 2012).

## **4.1. ALTERNATIVE ACCOUNTS OF THE EMPIRICAL EVIDENCE ON NEONATAL IMITATION**

If neonates are really capable of genuine imitation, then nativist enactivists need to explain why the experimental evidence is so contradictory and why it seems to indicate that genuine neonatal imitation, if it exists at all, is restricted to tongue protrusions. If neonatal imitation is not a general phenomenon, then it is more parsimonious to explain tongue protrusions by, for instance, an underlying innate release mechanism (Anisfeld, 1996). According to this interpretation, a modeled tongue protrusion resembles an approaching nipple, thereby triggering an innate sucking reflex in the neonate. This interpretation cannot, however, explain the delayed tongue protrusions observed in one of Meltzoff and Moore's experiments (Meltzoff and Moore, 1994), because an innate release mechanism requires the reflex to occur directly after the observed tongue protrusion.

An even more parsimonious explanation, which also does *not* contradict Meltzoff and Moore's delayed-response finding (Meltzoff and Moore, 1994), proposes that tongue protrusions reflect a tendency to explore the world (Jones, 2009). One study showed, for instance, that neonates stick out their tongue not only in reaction to a tongue or a nipple-like object, but also in reaction to a human face or to inanimate stimuli such as bright lights or music (Jones, 1996a). Consequently, this theory explains the delayed tongue protrusion as oral exploratory behavior in reaction to non-specific visual stimuli, in this case the mere perception of the person who had modeled the tongue protrusion 1 day earlier. This implies that, to a neonate, modeled tongue protrusions are just one example of a wide range of stimuli that can arouse its interest in exploring the world. Additionally, a longitudinal study indicates that tongue protrusions decrease as soon as infants become able to grasp objects (Jones, 1996b). Therefore, according to Jones, the tongue protrusion effect is more parsimoniously explained as an innate reflex that enables neonates to explore the world until other modes of exploration become possible. The finding that tongue protrusions are directed not only at humans but also at inanimate stimuli like bright lights suggests that they do not necessarily have a communicative or social function. However, if tongue protrusions directed at humans differ in kind from those directed at inanimate objects, then a social function might still be possible alongside the explorative function proposed by Jones (2009).

Both alternative explanations described above propose that apparent neonatal imitation is caused by an innate, reflex-like mechanism and does not reflect genuine imitation as defined before. Although both explanations can account for the origin of tongue protrusion imitation in neonates, they cannot account for more complex instances of infant or adult imitation, such as intentional imitation. This naturally raises the question of how, and by what mechanisms, human beings develop the capacity to imitate. Recently, a new model has been proposed that explains imitation as a process learned through sensorimotor experience rather than as a purely innate biological mechanism (Heyes and Ray, 2000; Ray and Heyes, 2011). This *associative sequence learning* (ASL) model claims that associations between motor representations and sensory representations of an action are formed through experience via associative learning (Schultz and Dickinson, 2000). These associations can be formed not only through direct self-observation, but also by observing oneself in a mirror or by observing someone else imitating one's actions. In this way, the ASL model can explain how infants learn to imitate even actions that they cannot directly observe themselves performing, such as facial expressions.
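The core idea of the ASL model, that sensory and motor representations of the same action become linked when they are repeatedly experienced together, can be illustrated with a toy simulation. The sketch below is our own simplification, not the formal ASL model: it assumes one sensory and one motor unit per action and updates each sensory-to-motor link with a Rescorla-Wagner-style delta rule, so that a link's strength approaches the probability that observing an action co-occurs with executing it. The function names, action labels, and learning rate are hypothetical. Episodes of being imitated (the infant executes an action while seeing a partner perform the same action) suffice to build links for facial actions the infant cannot see itself perform.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (observed action, executed action)


def train_associations(episodes: List[Link], actions: List[str],
                       rate: float = 0.1, epochs: int = 50) -> Dict[Link, float]:
    """Learn sensory->motor link strengths from (seen, done) co-occurrences.

    Toy delta rule: on each episode, every link from the seen action moves a
    fraction `rate` toward 1 for the executed action and toward 0 otherwise.
    """
    w = {(seen, done): 0.0 for seen in actions for done in actions}
    for _ in range(epochs):
        for seen, done in episodes:
            for motor in actions:
                target = 1.0 if motor == done else 0.0
                w[(seen, motor)] += rate * (target - w[(seen, motor)])
    return w


def primed_action(w: Dict[Link, float], seen: str, actions: List[str]) -> str:
    """Return the motor representation most strongly activated by observing `seen`."""
    return max(actions, key=lambda motor: w[(seen, motor)])


# Hypothetical experience: a caregiver imitates the infant's facial actions,
# so the infant sees each action while performing it.
actions = ["smile", "mouth_opening"]
episodes = [("smile", "smile"), ("mouth_opening", "mouth_opening")]
w = train_associations(episodes, actions)
print(primed_action(w, "mouth_opening", actions), round(w[("smile", "smile")], 3))
```

Without any such episodes, every link stays at zero: on this toy account, a newborn lacking relevant experience would have no learned sensory-motor links to imitate with.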

Various studies support the notion that genuine imitation is acquired through learning rather than being innate. First, evidence from neuroimaging studies indicates that sensorimotor experience can influence the mirror neuron system (Calvo-Merino et al., 2005, 2006). For instance, expert dancers show more activity in their mirror neuron system when observing other people perform "their" dance than when observing a dance they have not mastered. This difference in mirror neuron system activity might imply that sensorimotor learning influences the development of the mirror neuron system. This connection between action experience and action observation is also found in young children. Sommerville et al. (2005) showed that a brief experience of using mittens to reach for distant objects changes infants' perception of other people's goal-directed actions, suggesting an important role for action experience in action observation. In support of this view, when babies perceived the actions of others, they showed stronger motor resonance for actions that were already present in their motor repertoire (e.g., crawling) than for actions that were not yet present in their repertoire (e.g., walking) (van Elk et al., 2008). Other studies also highlight the importance of visuomotor experience and associative learning for the imitation of observed actions (for a review, see Heyes, 2011).

If imitation is mediated by the mirror neuron system, then it might be possible to modify imitative effects through sensorimotor learning. This is exactly what Heyes and colleagues tested in several experiments (Heyes et al., 2005; Catmur et al., 2008). They showed that humans make imitative gestures faster than comparable non-imitative gestures, an effect believed to be mediated by the mirror neuron system. However, they were able to change this advantage of imitative over non-imitative gestures through sensorimotor training, in which participants were instructed to execute a particular action while observing a different action, thereby weakening existing imitative responses through interference. The finding that sensorimotor experience can cancel or even reverse automatic imitation has also been corroborated by several other studies (Catmur et al., 2007; Press et al., 2007; Gillmeister et al., 2008), underlining the learned nature of imitative processes.
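If imitation rests on learned sensory-motor links, the counter-imitative training described above can be illustrated with a toy simulation: repeatedly executing one action while observing another should weaken the matching link and strengthen the mismatching one. The sketch below assumes a Rescorla-Wagner-style delta rule, hypothetical starting link strengths, and hypothetical action labels. Its "imitative advantage" index (matching link minus strongest non-matching link) is our own stand-in for the reaction-time advantage measured in the experiments, not a quantity reported in them.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (observed action, executed action)


def delta_update(w: Dict[Link, float], seen: str, done: str,
                 actions: List[str], rate: float = 0.1) -> None:
    """One training trial: links from `seen` move toward 1 for the executed
    action and toward 0 for every other action (Rescorla-Wagner-style)."""
    for motor in actions:
        target = 1.0 if motor == done else 0.0
        w[(seen, motor)] += rate * (target - w[(seen, motor)])


def imitative_advantage(w: Dict[Link, float], seen: str, actions: List[str]) -> float:
    """Toy index: matching link minus the strongest non-matching link."""
    return w[(seen, seen)] - max(w[(seen, m)] for m in actions if m != seen)


def mirror_start(actions: List[str], strength: float = 0.9) -> Dict[Link, float]:
    """Hypothetical prior experience: strong matching links, no mismatching ones."""
    return {(s, m): (strength if s == m else 0.0) for s in actions for m in actions}


actions = ["action_a", "action_b"]  # hypothetical action labels
w = mirror_start(actions)
before = imitative_advantage(w, "action_a", actions)
# Counter-mirror training: observe action_a while executing action_b.
for _ in range(20):
    delta_update(w, "action_a", "action_b", actions)
after = imitative_advantage(w, "action_a", actions)
print(f"advantage before={before:.2f}, after={after:.2f}")
```

With only a few trials the advantage shrinks but stays positive; with more trials it becomes negative, which mirrors the "cancel or even reverse" pattern in a purely qualitative way.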

Although the ASL model can explain how infants learn to imitate through sensorimotor experience, it cannot explain the tongue protrusions found in neonates within 1 day after birth. Neonates who are only a few hours old lack the observational and action experience necessary for any imitative learning. Therefore, in line with Jones (2009), we propose to view such neonatal tongue protrusions not as genuine imitation, but as an innate tendency to explore the world. The ASL model can then still be used to explain the later development of genuine imitation in infants as the result of sensorimotor experience<sup>6</sup>.

## **4.2. IMPLICATIONS FOR THE ENACTIVIST THEORY OF INTERSUBJECTIVE UNDERSTANDING**

Based on the studies reviewed in this paper, we conclude that there is no strong evidence for innate and genuine neonatal imitation. In fact, imitation may be learned and shaped through sensorimotor experience rather than being automatic and innate. A neonate's tongue protrusion can be explained as an innate tendency to explore the world rather than as genuine imitation (Jones, 2009). This explanation, however, does not necessarily contradict the enactivist proposal that such tongue protrusions have a communicative or social function. Even if tongue protrusions turn out to be an innate reflex, this could still be a reflex that evolved with a social function, because such neonatal gestures might stimulate the neonate's bonding with its parents, who are likely to adore such gestures.

If we assume that genuine imitation is learned through sensorimotor experience rather than being innate, then what are the implications for enactivist theory in general and for the way it explains our intersubjective understanding? One implication would be that *nativist* enactivists are not warranted in claiming that neonatal imitation supports the existence of intersubjective understanding in neonates. However, they could still draw on other studies to support the existence of infant intersubjectivity. For instance, Baron-Cohen (1997) describes two mechanisms that point to a basic intersubjective understanding in young infants. First, the eye-direction detector allows infants to recognize where other persons are looking and to understand that a person is actually seeing something. Second, an intentionality detector allows infants to interpret bodily movement as goal-directed and intentional. One study showed that 18-month-old children could understand what another person intends to do and even complete the behavior if the observed person did not (Baldwin and Baird, 2001). Other evidence on infant intersubjectivity shows that infants between 2 and 5 days old prefer looking at human faces (Farroni et al., 2002). Furthermore, 2- to 3-month-old infants show awareness of their mother's emotional behavior by responding reciprocally (Murray and Trevarthen, 1985, 1986). The evidence described above, however, is based on studies that tested infants older than those in the neonatal imitation experiments. Because of this time gap, the infants could already have experienced interactions with other humans for at least a few days. One could therefore argue that these findings can alternatively (and more parsimoniously) be explained as resulting from learning through social interaction. Because the infants were not tested directly after birth, these findings cannot support an innate view as strongly as neonatal imitation studies would. In neonatal imitation studies, neonates are sometimes observed within minutes after birth, which precludes any prior experience with imitation. Therefore, anyone who claims that innate processes are causally powerful will have to rely on studies that rule out the possibility that those processes are shaped through learning.

The absence of evidence for neonatal imitation makes it more difficult for nativist enactivists to describe intersubjective understanding as an innate mechanism. It could still be the case, however, that these processes *are* present at birth, but then the nativist enactivist who relies on neonatal imitation studies will have to provide new empirical evidence to support the claim that our basic intersubjective mechanisms are innate. Innateness, however, is not a necessary component of enactivist theory in general. Empiricist enactivism, which proposes that the embodied processes underlying intersubjective understanding are learned rather than innate, is therefore not affected by the lack of evidence for neonatal imitation. Nativist enactivists use the body schema as a mechanism to explain imitation and our understanding of others (Zahavi, 2001; Gallagher, 2005). The validity of that proposal is not necessarily threatened if genuine neonatal imitation does not exist. We propose that mechanisms like the body schema and processes like imitation and social understanding are not innate, but need to be learned over time. The implication for enactivism would be that, rather than being innate, the body schema is acquired through exploration, sensorimotor experience, and learning from social interaction. Therefore, we claim that the available experimental evidence on neonatal imitation undermines only the nativist enactivist view of intersubjective understanding, while it does *not* contradict *empiricist* enactivist views (Di Paolo and De Jaegher, 2012; Froese et al., 2012).

<sup>6</sup>One shortcoming of all the explanations described above, however, is that they focus on individuals as units of analysis. This "methodological individualism" (Boden, 2006) is dominant not only in imitation research, but also in most areas of social neuroscience. Recently, a new model has been proposed (Froese et al., 2012) that explains imitation not only in terms of the individuals involved, but takes the social interaction itself as a unit of analysis. This theory actually bypasses the nativist-empiricist discussion, because instead of appealing to individual mechanisms (innate vs. learned), it explains imitation as emerging completely from the social interaction itself. Although this theory has been supported experimentally (Froese et al., 2012), it has not yet been complemented by brain imaging studies because of the challenges associated with second-person perspective neuroscience. A potential avenue for future research would therefore be to study the social interaction underlying imitation using promising new second-person perspective techniques such as dual EEG (Dumas et al., 2010; Naeem et al., 2012).

# **5. CONCLUSION**

Altogether, the generality of genuine neonatal imitation is not convincingly supported by the experimental evidence currently available. Despite the findings on tongue protrusion imitation, it cannot be concluded that neonatal imitation is a general phenomenon. This conclusion poses a potential problem for the nativist enactivist proposal that neonates already have a basic and innate form of intersubjective understanding at birth. It would be important for future theories regarding the innateness of social cognition and enactive understanding to address the contradictory findings and to consider more parsimonious explanations of the tongue protrusion effect. Nonetheless, the outcome of the neonatal imitation debate does not pose a threat to enactivism in general, because other strands of research provide converging evidence for the importance of intersubjective processes in adult social cognition. The available evidence on neonatal imitation does, however, call for a more careful view of the innateness of such processes and suggests that this way of interacting needs to be learned over time.

# **ACKNOWLEDGMENT**

This research was supported by a VENI grant no. 016.135.135 from the Netherlands Organization for Scientific Research (NWO).

## **REFERENCES**


Boden, M. (2006). Of islands and interactions. *J. Conscious. Stud.* 13, 53–63.


Schaal, S. (1999). Is imitation learning the route to humanoid robots? *Trends Cogn. Sci.* 3, 233–242. doi: 10.1016/S1364-6613(99)01327-3


imitation and joint action. *J. Exp. Psychol. Hum. Percept. Perform.* 34, 1493. doi: 10.1037/a0011750


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 13 August 2014; published online: 02 September 2014.*

*Citation: Lodder P, Rotteveel M and van Elk M (2014) Enactivism and neonatal imitation: conceptual and empirical considerations and clarifications. Front. Psychol. 5:967. doi: 10.3389/fpsyg.2014.00967*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Lodder, Rotteveel and van Elk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Embodiment of intersubjective time: relational dynamics as attractors in the temporal coordination of interpersonal behaviors and experiences

# *Julien Laroche<sup>1,2</sup>\*, Anna Maria Berardi<sup>2</sup> and Eric Brangier<sup>2</sup>*

<sup>1</sup> Akoustic Arts R&D Laboratory, Paris, France <sup>2</sup> PErSEUs, Université de Lorraine, Metz, France

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

*Reviewed by:* John G. Holden, University of Cincinnati, USA Wolfgang Tschacher, Universität Bern, Switzerland

#### *\*Correspondence:*

Julien Laroche, Akoustic Arts R&D Laboratory, 48 Rue René Clair, 75018 Paris, France e-mail: julien.laroche@akousticarts.com

This paper addresses the issue of "being together," and more specifically the issue of "being together in time." We provide an integrative framework inspired by phenomenology, the enactive approach, and dynamical systems theories. To do so, we first define embodiment as a living and lived phenomenon that emerges from agent-world coupling. We then show that embodiment is essentially dynamical, and we therefore describe experiential, behavioral, and brain dynamics. Both lived temporality and the temporality of the living appear to be complex, multiscale phenomena. Next, we discuss embodied dynamics in the context of interpersonal interactions and briefly review the empirical literature on temporal coordination between persons. Overall, we propose that being together in time emerges from the relational dynamics of embodied interactions and their flexible co-regulation.

**Keywords: coordination, intersubjectivity, dynamical systems, embodiment, attractor, enaction, phenomenology, complexity**

# **INTRODUCTION**

How can we "share a moment" and experience this sharing? How can we share some time, even if it is immaterial? How can we share the intimacy of a moment despite the distance that usually separates our bodies? How can we feel that we are together, if this means more than being in the same place or doing the same thing? Because time is often taken for granted as an objective, physical dimension of reality, the issue of sharing its lived experience has received little attention in the cognitive sciences. The aim of this paper is to provide a theoretically, phenomenologically, and empirically grounded framework that addresses this issue.

To do this, we rely on the complementary approaches of phenomenology, enaction, and dynamical systems (Froese and Gallagher, 2012). We take embodiment, temporality, and interactivity seriously, and it is on the basis of these three interrelated dimensions that we construct our proposal. More precisely, since being a body is necessary for us to live (and therefore share) experiences, we first define what we mean by embodiment. Second, we address the issue of time as embodied, that is, how time is experienced and what kind of temporality underlies our embodiment. We then address embodiment in the context of intersubjectivity, and more specifically the embodiment of a properly intersubjective time. Finally, we discuss our overall proposal.

# **WHAT IT IS TO BE EMBODIED**

The notion of embodiment has numerous meanings (e.g., Wilson, 2002). In this section, we specify our understanding of what it is to be embodied through the lens of the enactive approach (Varela et al., 1991; Di Paolo, 2005; Thompson, 2007; McGann et al., 2013; Di Paolo and Thompson, 2014) and its phenomenological background (Husserl, 1913, 1931, 1952; Merleau-Ponty, 1942, 1945). According to the enactive approach, mind is both a *living* (observable, biological) and a *lived* (experienced) phenomenon that emerges from agent∼world coupling<sup>1</sup>. Since the living and lived aspects are concretely intertwined (Thompson and Varela, 2001), they can only be distinguished from an observer's point of view. Abstracting them for the sake of exposition, we discuss them in turn; their entanglement will then become explicit.

## **EMBODIMENT AS A LIVING PHENOMENON**

At the roots of the enactive approach, living has been defined as the self-production and self-maintenance of a system's own organization, where "organization" means "the relations that exist among component processes of a system" (Varela et al., 1974; Varela, 1979; Maturana and Varela, 1987). A living system is thus a network whose operations are closed (i.e., each process has causes in and effects on other processes of the system). This interdependency enables the self-organized emergence of a coherent living unit. Emergence designates two complementary processes: the "local to global" formation of a new system (or pattern) out of the interactions between coupled components (i.e., out of the reciprocal effects they have on each other), and the "global to local" constraints that the newly formed system exerts on its components and the organization of their relations (Thompson and Varela, 2001). By producing itself, the living system actively "affirms" its own identity (it specifies what it is), and thereby defines its own intrinsic laws or norms of self-maintenance. In a word, the living system is auto-nomous (Varela, 1979; Di Paolo, 2005; Barandiaran and Egbert, 2014).

<sup>1</sup>The tilde sign used in this text is a reference to Kelso and Engstrom (2006). It denotes that paired concepts are dynamically related to each other: the separate understanding of each concept remains incomplete as long as its complementary aspect is not taken into account.

However, the autonomy of the living system is bounded by the domain of its viable relations with the environment. The boundaries of the living are therefore relational (rather than merely "skin-bounded"). Further, by generating itself, the living unit distinguishes itself from what it is not and thereby defines what the environment is from its own point of view (i.e., what counts as a significant environment and which value their relation has for the maintenance of its autonomous existence). The living unit thus constitutes an autonomous perspective on its own relations: interactions with the environment are asymmetrically anchored in its own, self-constituted perspective (Barandiaran et al., 2009). The phenomenological domain of the living is thus autonomous∼relational, which means both that the living system's interactions are autonomous and that the autonomy of that system is realized interactively.

When a living system has the ability to regulate its coupled relations with the environment (as a function of the values that emerge from its own norms), we speak of an embodied agent whose interactive autonomy is adaptive (Di Paolo, 2005; Barandiaran et al., 2009): something has to be done to bring forth a "difference that makes a difference" (to quote the slogan of Bateson, 1972), preferably in the right direction (i.e., in accordance with its self-constituted norms). Cognition is thus broadly defined as a sense-making activity (Weber and Varela, 2002; Di Paolo, 2005): it consists in the enactment of a world of significance and values through autonomous interactions with the environment. In short, cognitive experiences are enacted from an autonomous perspective that is intrinsically relational.

## **EMBODIMENT AS A LIVED PHENOMENON**

In contrast to the above definition, classical accounts attribute to cognition the role of mentally and internally representing information coming from the external world (Varela, 1988). Agent and world, organism and environment, subject and object, or inner and outer are thus defined as being *a priori* external to each other. From a phenomenological point of view, however, and as shown throughout this text, these boundaries are not given: these opposite poles only exist in the dynamics of their irreducible relations. Indeed, in lived experience, the (cognizing) subject and the (cognized) object are irreducible to each other (Husserl, 1952), just as we cannot distinguish the look from the thing that is seen. The detached, reflective stance is thus not our primary way of being in the world. Rather, our connection with the world is primarily corporeal and pre-reflective (Merleau-Ponty, 1945), as discussed below.

How our lived world appears obviously depends on our sensory structures, but motility directly affects how these sensory structures are perturbed by the environment: what we do changes what can be sensed. The lived world is thus imprinted by our sensorimotor embodiment and is constituted in the context of our ongoing activity (McGann, 2010; McGann et al., 2013). Sensorimotor coupling allows for the coherence of both the autonomous agent (its embodied experiences and its underlying internal dynamics) and its relations with the world. This is reflected in the agent's own active and sensitive way of inhabiting the world it enacts (Buhrmann et al., 2013).

To discuss our pre-reflective connection with the world, we refer to the phenomenological distinction between the living body and the lived body. The living body refers to the image one can have of a body (or of one's own body), observed and thematized as an object of perception. The lived body is the pragmatic, unthematized (hence pre-reflective) background of experience: it is what our body-in-the-world affords us to sense and do (Lenay, 2010). This bodily self-consciousness is necessary for our experiences to be and feel "for" us (Thompson, 2007). It is transparent to us: it is the pre-reflective background of our perspective, the point from which we see, do, and live. In turn, the affordances of the lived body are constantly reshaped by the ongoing activity of the living body: we enact the pre-reflective background of our perspective. Living and lived body thus co-constitute each other, and this is what defines embodiment (Thompson and Varela, 2001). It provides us with an autonomous perspective on our relations with the world (the phenomenal world that we enact and inhabit). Because both the co-constitution of lived and living body and the intertwining of autonomy and relations are dynamical, we now turn our attention to the temporality of embodiment.

# **THE EMBODIED MIND IN TIME**

To address the issue of embodiment and temporality, we first present a phenomenological account of time as lived, or time consciousness. We then address the temporality of the living. The co-constitution of lived temporality and the temporality of the living will then be made explicit.

## **TIME CONSCIOUSNESS**

Time consciousness is directed toward both the "outer" objects or events that have a temporal extension, and the "inner" experience of duration itself (i.e., the feeling of living enduring experiences with a temporal envelope; Thompson, 2007). This outer∼inner separation is only an abstract description from an external observer's point of view: these aspects are irreducible in concretely lived experience. Indeed, we do not have an experience of the temporal extension of objects or events on the one side, and a sensation of our own enduring temporal experiences on the other: these aspects manifest themselves as a whole in a unified way (Thompson, 2007).

Husserl (1928) and his commentators (e.g., Merleau-Ponty, 1945; Varela, 1999a; Zahavi, 2003; Thompson, 2007; Gallagher and Zahavi, 2014) proposed a descriptive structure that accounts for both outer and inner time consciousness as well as their non-separateness. This structure consists of three interrelated component processes: primal impression, retention, and protention. Primal impression designates the openness to the current "now-phase" of an object. This "now" is never lived in isolation from its temporal horizons, for there would be no time-extended perception (duration, succession, or change) if the present were lived as a succession of inarticulate moments (Varela, 1999a; Gallagher and Zahavi, 2014). Primal impression thus only exists in networked conjunction with retention and protention. Retention is the subjective holding of the just-elapsed phase of the object or event that is receding into the past. Protention intends the phase of the object or event that is just about to occur: it is the temporal horizon formed by the (implicit) anticipation of the unfolding of experience.

These component processes do not behave "additively" (Gallagher and Zahavi, 2014): in the fullness of concrete experience, they cannot be separated so as to manifest themselves as "retention + primal impression + protention" (i.e., they do not provide diachronic feelings such as a distinctly articulated past, present, and future). Indeed, primal impression is qualified by both retention and protention: "now" would be different in the context of another retention and implicit anticipation. In turn, primal impression shapes which temporal horizon might be anticipated, and (re)shapes the way its retentional background is felt (it puts, as it were, the retentional trace into perspective, such that when a surprise arises from an unfulfilled protention, its presentification transforms the felt quality of the retained experience). The component processes of time consciousness thus qualify each other: they are interrelated in a "multiplicative" way (Gallagher and Zahavi, 2014). These processes operate synchronically, and their interactive product manifests as a unified whole (Merleau-Ponty, 1945). It provides a complex temporal field, a "specious" present in whose thickness objects or events can be experienced with a time-extended quality (Varela, 1999a).

This threefold structure thus does not function as a mere sliding window (in which protentions would become primal impressions, which would in turn become retentions). Retention, for instance, is not the intentional aiming at an absent phase of the outer object or event, for it is not possible to directly aim at something that is not actually there. Rather, retention refers to the just-elapsed phase of the experience of that object or event (Thompson, 2007). Because this experience had a threefold (primal impression – retention – protention) structure, what retention holds is a full threefold structure. Protention also has a threefold structure, for it intends what is anticipated to be about to qualify as retention, primal impression, and protention. As each component process of the threefold (retention – primal impression – protention) structure "holds" the same threefold structure again (and so on), the dynamical flow of time consciousness can be said to have a fractal structure (Gallagher and Zahavi, 2014). Fractality captures the self-similarity of a structure: constituent parts resemble the whole they form across multiple scales of observation, or "zooms." Vrobel (2011) also proposed a fractal interpretation of Husserlian accounts, in which "nows" (threefold structures) are nested into each other and can be thought of as different timescales or "levels of description." Nesting nows provide nested nows with a (common) context in the light of which they are experienced. This multiscale structure is necessary for the current note of a melody to be meaningfully experienced not only in the narrow context of its predecessor, but also in the larger contexts of the melody or the whole piece it belongs to, or even of the evening on which it was heard. In turn, nested nows can affect the experience of the contextual background in which they are embedded, such that the current note can modify how its embedding retentional background and its protentional horizon are experienced (especially if that note is surprising). Time as experienced thus does not follow a unidimensional, linear chronology: the temporal texture of lived experience has a multiscale, fractal topology.

Time consciousness has a multiplicative, self-referential structure: it makes references to its own retained pasts and anticipated futures. It is thereby a self-constituted flow: it manifests itself to itself, enabling the experience of the enduring quality of its own dynamics (the so-called "inner" time consciousness). This flow is therefore the "absolute," irreducible, most fundamental level of time consciousness, and the necessary background out of which any experience can arise (Thompson, 2007). In other words, it is the pre-reflective structure of consciousness (Zahavi, 2003), the transparent background of our embodied perspective. This perspective is thus not just a point of view in the spatial domain: it is also a temporal perspective (Vrobel, 2011). The lived body thus has to be seen from the dynamical point of view of this flow. Because it presents itself as an affordance, the lived body is oriented toward what is anticipated to be about to be enacted. This orientation is underlain by the broken symmetry of time consciousness (to-be-fulfilled protentions intend what hasn't been yet, in contrast to retentions that hold what has actually been). The dynamical structure of consciousness is thus always incomplete and moves forward, toward the complementarity of afforded anticipations. In this sense, time consciousness is enactive (Gallagher and Zahavi, 2014), pragmatically oriented toward (what) perception and action (could be). In turn, because perception and action emerge from this flow, they are imprinted by its dynamics and therefore have a similar structure.

Finally, because components qualify each other dynamically through complex processes, the contents of experience affect its intrinsic temporality (Gallagher and Zahavi, 2014). Consider, for instance, the fulfillment (or lack thereof) of a retained protention, and how it shapes primal impressions, their retentional background, and their protentional horizon. The flow of time consciousness thus makes present both the temporal content of experience and the temporal experience itself (i.e., both the "what" and the "how"). Outer and inner aspects of time consciousness thereby co-constitute each other dynamically, and the intrinsic temporality of experience embodies the dynamics of the environment (Vrobel, 2011). Our dynamical perspective is thus relational as well.

Overall, embodiment constitutes an autonomous∼relational perspective whose dynamical background is self-referential, multiscale, and multiplicative. This forms a pre-reflectively lived background from which we can inhabit the world. How does temporality manifest itself in the domain of the living? More specifically, how does the temporality of a complex organism emerge in a unified, coherent, and coordinated way? Addressing this issue is important if we want to find out how time can be shared and what kind of temporality can be shared.

## **TEMPORALITY OF THE LIVING**

In this subsection, we discuss the processes that account for the features of the temporality of experience, namely its endogenous self-constitution, its non-linear, non-chronological unfolding, its multiscale, fractal nature, and its permeability to the temporality of the environment. We also introduce the dynamical concepts and models that will guide us toward a general understanding of how different temporalities can become coordinated and shared. We first refer to a simple, abstract model, in light of which we then discuss the temporality of both brain and behavioral dynamics.

In 1665, Huygens (Hugenii, 1673) observed, to his disappointment, that two pendulum clocks he had designed for increased precision actually drifted apart when they were placed in separate rooms. However, when they were placed on the same plank, their respective ticking converged until they reached synchrony, a state in which they then remained. Though the clocks oscillated autonomously, they were flexible enough to be mutually affected by the vibrations they transmitted to each other through the plank that coupled them. Because of the reciprocity of their interaction, the clocks' ticking became mutually dependent and was attracted toward a common pattern. This pattern can then persist by jointly constraining the ticking of both clocks. The stability of the collective system thus emerges from the interactions between its variable components. In dynamical systems terminology, such stability is captured by an "order parameter" (Haken, 1983) or a "collective variable" (Kelso, 1995), which measures the ordering of the relations among components. Emergent synchrony between coupled behaviors is in fact ubiquitous in nature, though it usually manifests in far more complex ways (Pikovsky et al., 2001; Strogatz, 2003). Some of the most fundamental issues in the brain and behavioral sciences are related to this phenomenon: how can large-scale coherent activity form in the brain out of its noisy basal functioning? How can coherent movements be performed despite the numerous degrees of freedom they involve? The hypothesis according to which temporal coordination, or "synergies" (Haken, 1983), emerges from the non-linear dynamics of interactions between coupled components (Kelso, 1995; Varela, 1995) has gained increasing support over the years. A brief look at brain and behavioral dynamics will help us naturalize the temporality of lived experience, as well as understand how different components can coordinate in time by interacting.
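
The convergence of the two clocks can be illustrated with a minimal sketch. It assumes (our choice, not Huygens' or the authors') that each clock is idealized as a phase oscillator and that the plank transmits a symmetric sine coupling, as in the Kuramoto model; the frequencies and coupling strength are hypothetical. The function returns a time-averaged order parameter R, one common way to quantify the "ordering of the relations among components" described above.

```python
import cmath
import math


def mean_order_parameter(omega1, omega2, coupling, dt=0.002, duration=200.0):
    """Euler-integrate two sine-coupled phase oscillators ("clocks").

    Returns the order parameter R = |(e^{i*theta1} + e^{i*theta2}) / 2|,
    averaged over the second half of the run. R stays near 1 when the
    clocks are synchronized and averages well below 1 when they drift.
    """
    theta1, theta2 = 0.0, math.pi / 2  # arbitrary, non-synchronous start
    n_steps = int(duration / dt)
    r_total, r_samples = 0.0, 0
    for step in range(n_steps):
        # each clock is pulled toward the other's phase (the "plank")
        rate1 = omega1 + coupling * math.sin(theta2 - theta1)
        rate2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += rate1 * dt
        theta2 += rate2 * dt
        if step >= n_steps // 2:
            r_total += abs(cmath.exp(1j * theta1) + cmath.exp(1j * theta2)) / 2
            r_samples += 1
    return r_total / r_samples


# Two slightly mistuned clocks (hypothetical values: 1.00 Hz and 1.02 Hz).
w1, w2 = 2 * math.pi * 1.00, 2 * math.pi * 1.02
print("separate rooms:", round(mean_order_parameter(w1, w2, coupling=0.0), 3))
print("same plank:    ", round(mean_order_parameter(w1, w2, coupling=0.5), 3))
```

With zero coupling (separate rooms), the relative phase drifts through all values and R averages roughly 0.64; once twice the coupling exceeds the frequency mismatch, the pair locks and R stays close to 1.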

As a result of non-linear interactions between the activities of neurons, brain oscillations can couple (Kelso, 1995; Varela, 1999a). Because brain signals are composed of a broad range of adjacent periodicities, oscillations whose frequencies are close enough can converge by reciprocally influencing each other (Buzsaki, 2006). This enables the emergence of large-scale synchronized patterns of activity, or assemblies. However, coupling is weak relative to the detuning between the intrinsic periodicities of neurons, so soft-assembled components quickly relax toward their intrinsic dynamics. Emergent assemblies are therefore transient and short-lived, and are followed by their own dismantling (Buzsaki, 2006). This continuous reorganization is a signature of metastability, a regime characterized by the coexistence of contrasting tendencies: the integrative tendency of neurons to "cooperate" (i.e., to align their behavior through reciprocal interactions) and their segregative tendency to return to their intrinsic, autonomous functioning (Tognoli and Kelso, 2014). This allows both for the emergence of patterns of activity that are stable enough to be sustained over a significant period of time and for their flexible dismantling to make room for new patterns, which is important in the face of rapidly changing environmental conditions. Fluctuations thus enable the emergence of new stable (but flexible) patterns of coordinated activity: the variability of processes is itself functional and adaptive.
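
A simple way to see how weak coupling and detuning yield transient, recurrent coordination is the relative-phase equation dφ/dt = Δω - K sin φ for two coupled oscillators (often called the Adler equation). This is our illustrative choice, not a model used in the cited works, and the numbers are hypothetical. When the detuning Δω is below the coupling K, φ locks; when Δω slightly exceeds K, φ lingers near π/2, where the locked state used to be, and then slips by 2π: a minimal form of the dwell-and-escape behavior associated with metastability.

```python
import math


def relative_phase_run(detuning, coupling, dt=0.001, duration=200.0):
    """Integrate dphi/dt = detuning - coupling * sin(phi) from phi = 0.

    Returns (slips, dwell): the number of 2*pi slips completed (possibly
    fractional) and the fraction of time phi spends within 0.5 rad of
    pi/2, the phase at which locking is lost when detuning == coupling.
    """
    phi, near = 0.0, 0
    n_steps = int(duration / dt)
    for _ in range(n_steps):
        phi += (detuning - coupling * math.sin(phi)) * dt
        # distance to pi/2 on the circle, wrapped into [-pi, pi]
        if abs(math.remainder(phi - math.pi / 2, 2 * math.pi)) < 0.5:
            near += 1
    return phi / (2 * math.pi), near / n_steps


def predicted_slips(detuning, coupling, duration=200.0):
    """Analytic slip count; the slip period is 2*pi / sqrt(detuning^2 - coupling^2)."""
    if abs(detuning) <= abs(coupling):
        return 0.0  # phase-locked: no slips once the transient has decayed
    return duration * math.sqrt(detuning**2 - coupling**2) / (2 * math.pi)


for detuning in (0.4, 0.51, 1.0):  # coupling fixed at 0.5 (hypothetical units)
    slips, dwell = relative_phase_run(detuning, 0.5)
    print(detuning, round(slips, 2), round(predicted_slips(detuning, 0.5), 2), round(dwell, 2))
```

Just above the locking threshold (Δω = 0.51), the relative phase spends most of its time near the "ghost" of the lost attractor; far above it (Δω = 1.0), it slips more regularly. Comparing the simulated slip count with the analytic prediction is a convenient check that the integration step is small enough.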

The timescale at which large-scale assemblies form (hundreds of milliseconds) correlates well with the subjective impression of nowness: their short-lived maintenance allows for the thickness of the specious present (Varela, 1999a). According to Varela, the order parameter that captures the coherence of these soft-assemblies reflects an ordering that constrains future assemblies, a correlate of protention. In turn, the dynamic flow of brain activity is constrained and imprinted by the traces of ongoing, and therefore previous, pattern formations: it thereby constitutes retentional dynamics (Varela, 1999a). Varela (1999a; see also Freeman, 2000) designated other neurodynamical timescales: the micro-level of sensorimotor events (tens of milliseconds) and the macro-level at which successive assemblies are coherently ordered (a few seconds). Interestingly, the many adjacent periodicities of brain signals exhibit a 1/f power law (the longer the period of an oscillation, the larger the amplitude of its contribution to the signal), a typical signature of fractal, metastable processes (Buzsaki, 2006; Werner, 2010). This encourages a view similar to Vrobel's theory (2011), in which activities at different timescales are nested into each other, with slower timescales of fluctuation constraining or enslaving the activity of faster ones (Penny et al., 2008). Overall, brain dynamics do not unfold according to a single timescale of operation. They evolve coherently thanks to the interactions between fluctuating processes whose operations span multiple timescales. Brain dynamics thus seem to shape the felt envelope of time (Lutz et al., 2002) as well as to account for its complex multiscale texture.
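
The 1/f power law can be made concrete on synthetic data. The sketch below, assuming NumPy, builds 1/f noise by spectrally shaping white noise and estimates the exponent β in P(f) ∝ 1/f^β by a least-squares fit of the raw periodogram in log-log coordinates. It is a toy estimator for illustration only, not the analysis of the cited studies; real brain recordings call for averaged spectra (e.g., Welch's method) and a careful choice of frequency range.

```python
import numpy as np


def pink_noise(n, rng):
    """Spectral synthesis: scale white noise so that power falls off as 1/f."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)          # DC component is dropped
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # amplitude ~ f^(-1/2), so power ~ 1/f
    return np.fft.irfft(spectrum * scale, n)


def spectral_exponent(signal):
    """Fit log(power) against log(frequency); return beta in P(f) ~ 1/f**beta."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal))
    mask = freqs > 0  # the log of frequency zero is undefined
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return -slope


rng = np.random.default_rng(0)
n = 2**14
print("white noise beta:", round(spectral_exponent(rng.standard_normal(n)), 2))
print("1/f noise beta:  ", round(spectral_exponent(pink_noise(n, rng)), 2))
```

White noise gives β close to 0 (all periodicities contribute equally), while the shaped signal gives β close to 1: slower components carry more power, as the text describes for brain signals.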

How can coherent behavior be achieved despite the numerous degrees of freedom it involves? Self-organization of component processes in a metastable regime would lead to "synergies" that are easier to guide (Bernstein, 1967; Haken, 1983). Bimanual rhythmic tasks support this hypothesis. Participants were asked to tap regularly with both hands in alternation, following the pace of a metronome (Kelso et al., 1981). When the metronome's frequency was increased up to a critical threshold, the movement pattern suddenly shifted to another organization: participants spontaneously, irreversibly, and abruptly began tapping with both hands in phase. The motor system bifurcated non-linearly from a bistable regime (two possible patterns of behavior coexist) to a monostable one (only one pattern can be stabilized under these circumstances). This metastable phenomenon can be modeled by the dynamics of a relational variable that measures the ordering of the relations among the components' activities (the relative phase between the limbs). Before the phase transition toward the uninstructed pattern occurred, this relational variable started to fluctuate. This reflects the loss of stability of the current pattern, which allows for a flexible reorganization of behavior (i.e., the sudden, emergent switch toward a more stable pattern). Behavioral dynamics thus seem to emerge from the self-organized interactions of components rather than from the properties of these components alone or from explicit central instructions (Kelso, 1995). In other words, and more generally, common or coherent temporal patterns can emerge from the relational dynamics between various components: these collective patterns manifest themselves as attractors that dynamically coordinate the components' behaviors in time.
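
The bimanual switch is classically modeled by the Haken-Kelso-Bunz (HKB) equation for the relative phase φ, derived from the potential V(φ) = -a cos φ - b cos 2φ, so that dφ/dt = -a sin φ - 2b sin 2φ. The sketch below assumes this textbook form without noise; a = 1 and the listed b/a values are hypothetical, with a decreasing b/a standing in for increasing movement frequency. Antiphase (φ = π) is stable only while V''(π) = 4b - a > 0, i.e., while b/a > 0.25.

```python
import math


def hkb_rate(phi, a, b):
    """Relative-phase velocity in the HKB model: -a*sin(phi) - 2b*sin(2*phi)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def antiphase_is_stable(a, b):
    """Linear stability of phi = pi: the curvature V''(pi) = 4b - a must be positive."""
    return 4.0 * b - a > 0.0


def settle(phi0, a, b, dt=0.01, steps=5000):
    """Euler-integrate the noiseless HKB equation from phi0 and return the final phase."""
    phi = phi0
    for _ in range(steps):
        phi += hkb_rate(phi, a, b) * dt
    return phi


for ratio in (1.0, 0.5, 0.3, 0.2, 0.1):  # b/a falls as movement frequency rises
    phi_end = settle(math.pi - 0.05, a=1.0, b=ratio)  # start slightly off antiphase
    print(f"b/a={ratio:.1f}  antiphase stable: {antiphase_is_stable(1.0, ratio)}  "
          f"cos(final phase)={math.cos(phi_end):+.2f}")
```

A final cosine near -1 means the antiphase pattern survived; near +1 means the system switched to in-phase. As b/a approaches 0.25, the recovery rate 4b - a of the antiphase pattern shrinks toward zero, so perturbations decay ever more slowly; in noisy versions of the model, this critical slowing down is one way to account for the enhanced fluctuations of relative phase that precede the transition.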

The multiscale, non-linear, fluctuating dynamics of brain and behavior are at odds with the classical view of time. Time is usually equated with its "objective" measurement and is accordingly described as a linear succession of isochronous units (Varela, 1999a). In the context of rhythmic behaviors, this view leads one to take stable, metronome-like regularity as the norm. Variability is then seen as a deviation from that norm, as an error in cognitive measurement or motor implementation (Wing and Kristofferson, 1973; Delignières and Torre, 2009). While the tempo of music is indeed felt as having a stable quality despite the inherent variability of musicians' performances (Large and Palmer, 2002), listeners experience these fluctuations as well, and not as errors or mere approximations. Rather, these fluctuations convey expressivity (Collier and Collier, 1996; Palmer, 1997; Iyer, 2002), a phenomenon also observed in mother-infant interactions (Gratier, 2003; Gratier and Apter-Danon, 2009). Variability of behavior thus makes sense. In fact, rather than being mere noise to be ignored (whether statistically or cognitively), fluctuations in rhythmic performance exhibit a highly structured complexity. Studies on pianists (Rankin et al., 2009), drummers (Hennig et al., 2011, 2012), and non-musicians (e.g., Delignières et al., 2004; Lemoine et al., 2006) show that human tempo fluctuations are fractal: they display similar structures across scales of observation, with their amplitude decreasing with their frequency according to a 1/f law. The resulting rhythmic behavior is thus composed of intertwined fluctuations of various amplitudes and periodicities, like waves enslaved in larger waves. Patterns of behavior are therefore organized at multiple timescales, even when the task's instructions target a single timescale, such as the pulse.
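
One widely used estimator of this kind of fractal structure in a series such as successive inter-tap intervals is detrended fluctuation analysis (DFA). The sketch below is a plain first-order DFA, assuming NumPy; the window sizes are arbitrary choices, and we do not claim it reproduces the exact procedure of the studies cited above. The exponent α is about 0.5 for uncorrelated noise, about 1 for 1/f noise, and about 1.5 for a random walk; it relates to the spectral exponent β of a 1/f^β process by β = 2α - 1.

```python
import numpy as np


def dfa_alpha(series, scales=(16, 32, 64, 128, 256)):
    """First-order detrended fluctuation analysis; returns the scaling exponent alpha."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrate the mean-centered series
    fluctuations = []
    for s in scales:
        t = np.arange(s)
        sq_residuals = []
        for w in range(len(profile) // s):  # non-overlapping windows of length s
            segment = profile[w * s:(w + 1) * s]
            trend = np.polyval(np.polyfit(t, segment, 1), t)  # local linear trend
            sq_residuals.append(np.mean((segment - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(sq_residuals)))
    # alpha is the slope of log F(s) against log s
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return alpha


rng = np.random.default_rng(1)
white = rng.standard_normal(8192)  # synthetic stand-in for inter-tap intervals
print("white noise alpha:", round(dfa_alpha(white), 2))
print("random walk alpha:", round(dfa_alpha(np.cumsum(white)), 2))
```

The synthetic series only bracket the range; a tapping series with α near 1 would sit between them, the "1/f law" regime the text describes.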

The fractal structure of human temporality has been observed in many situations and seems to be the norm rather than the exception (for a review, see Van Orden et al., 2009). The hypothesis according to which fractal properties are generated by component processes (Pressing and Jolley-Rogers, 1997; Wagenmakers et al., 2004) is therefore fragile. Alternatively, fractality is thought to emerge from multiplicative interactions between processes that operate at multiple timescales (Van Orden et al., 2003; see Torre and Wagenmakers, 2009 and Delignières and Marmelat, 2012, 2013, for debates about these hypotheses). The latter hypothesis is supported by recent studies showing that behavioral dynamics actually exhibit multifractal properties (Ihlen and Vereijken, 2010; Dixon et al., 2012). While monofractal measurements only point to the co-presence of multiple timescales of fluctuation, multifractality captures the presence of contingencies across timescales of behavioral dynamics: underlying processes therefore interact at multiple timescales (Kelty-Stephen et al., 2013). Fractality has also been observed at multiple scales of organization and in many different measurements of the same behavior: this "pervasiveness" of fractality has been linked to metastability and the emergence of soft-assemblies (Kello et al., 2008; Kello and Van Orden, 2009; Holden et al., 2011). Indeed, while metastability reflects the balance of processes' dependence and independence (their tendency to function in relation with each other versus autonomously), fractal fluctuations reflect the balance of temporal dependence and independence between processes through time and at different timescales. Fractality would thus be a signature of metastable dynamics. Relative dependence between processes and the enslavement of local dynamics by fluctuations at larger timescales can create (long-term) correlations that fractal measurements capture. In contrast to the uncorrelated fluctuations of independent processes, long-term correlations provide a dynamical coherence that allows for a more robust unfolding of behavior. However, overly rigid correlations between fluctuations (such as those introduced by strongly interdependent processes) would not leave enough room for fast reorganization of behavior when the demands of the environment change. Soft coupling of processes thus allows for a blend of stability and adaptive flexibility, and fractality illustrates optimal, healthy metastable dynamics whose complexity is often lost with pathology (Stergiou and Decker, 2011). The temporal baseline of biological dynamics is therefore complex, metastable, and (multi-)fractal rather than linear.

Brain and behavioral coordination thus does not start "from scratch": it does not require the explicit control of all the parameters or components involved in a specific pattern. Metastable dynamics provide a background that "does something" for coordination. These spontaneous endogenous dynamics constitute a dynamical landscape that orients behavioral trajectories toward stable attractors (Kelso, 2009). In support of this view, it is this underlying dynamical landscape as a whole that is affected by learning (Kostrubiec et al., 2012). The role of intentional agency would thereby be to actively modulate this complex background of ongoing dynamics, in order to stabilize or destabilize its intrinsic tendencies (Kelso, 2002; see also Tschacher and Haken, 2007). This metastable background thus shapes what we are afforded to do and sense: it dynamically orients behaviors and experiences and is therefore a correlate of the lived body. Because they are constituted by processes that interact at multiple timescales and are nested into each other, metastable dynamics carry a portion of the past in which they are embedded and prefigure part of their upcoming trajectories. Metastable, fractal dynamics thus have a retentional-and-protentional structure that correlates well with the complex texture of the temporality of experience (Vrobel, 2011). Our behaviors and the lived experiences they bring forth would be entangled in and shaped by these metastable dynamics. In turn, experiences and intentional agency can act as global constraints that modulate and guide local endogenous dynamics (Thompson and Varela, 2001; Kelso, 2002). Living and lived embodiment thus co-constitute each other and form a dynamical embodiment whose temporality is complex, multiscale, (multi-)fractal, and retentional-protentional, rather than linear and chronological. This embodied temporality emerges as a whole from a complex but flexible background of relational dynamics, wherein processes interact with each other at multiple timescales.

## **THE EMBODIED MIND IN THE TIME OF THE WORLD**

So far, we have considered the embodiment of time by subjects isolated from any environmental constraints (except the boundary conditions of experimental tasks). If embodiment is relationally constituted, its underlying dynamics should be imprinted by the environment's temporality, as we show below.

Entrainment to external temporalities happens spontaneously at multiple timescales. For example, if we were isolated from the outside world, our wake/sleep cycles would not last 24 h (Czeisler et al., 1980). At a much smaller timescale, body movements can be unintentionally entrained to the oscillations of a moving room (even one merely displayed on a screen; Dijkstra et al., 1994) or to even smaller stimuli (Lopresti-Goodman et al., 2007; Schmidt et al., 2007). Interestingly, synchronizing a limb in antiphase with a metronome whose frequency is increased brings forth the same dynamical features as tasks involving the synchronization of two limbs (Kelso, 1984). This isomorphism again suggests that patterns of coordination emerge from dynamics that exist at the level of the coupling (between limbs, or between limb and metronome) rather than from the intrinsic properties of the involved components alone.

Coordination with the environment happens simultaneously and interactively at multiple timescales. For example, we synchronize more stably to pulses that are embedded in larger patterns (Drake, 1993). Grouping pulses into larger patterns emerges spontaneously: participants do so while performing a mere pulse, without any intention or awareness of doing so (Parncutt, 1994), and they perceive larger patterns that have no counterpart in the objective information when they listen to identical, isochronous pulses (Bolton, 1894). Musicians' expressive fluctuations reflect the organization of larger patterns as well (Repp, 1997) and enhance listeners' coordination at these larger timescales (Drake et al., 2000). Synchronization to a pulse is also stabilized by the presence of subdivisions forming simple patterns (Repp, 2003) and destabilized when the fine-grained timing of these subdivisions is altered (Repp, 2008). More generally, the way one coordinates with a particular timescale of a stimulus reflects the temporal organization of that stimulus at other timescales (Large et al., 2002). We thus embody the stimulus' temporality at these timescales as well, and this constrains the dynamics that operate at the targeted scale.

We do not just embody plurifrequential rhythms (Toiviainen et al., 2010), however, but also the complex structure of their fluctuations. For instance, when participants synchronize to the tempo of a piece of music whose fluctuations are fractal, they produce taps whose variability quantifiably matches that fractal structure. Conversely, participants' taps do not exhibit a fractal structure at all when they synchronize with a metronomic version of the same performance (Rankin et al., 2009). Participants' taps also match the complexity of metronome pulses that fluctuate fractally (Hunt et al., 2014; Marmelat et al., 2014a) or chaotically (Stephen et al., 2008). Such tight coupling is not the result of a mere "imitation" of the fluctuations by means of local adjustments. Rather, the multifractal structure of taps indicates that the pattern of coordination is more complex and emerges out of the interactions between processes operating at multiple timescales (Stephen and Dixon, 2011). Coupling with the environment thus seems to modulate the whole multiscale complexity of internal dynamics, even when the stimulation is restricted to a narrow frequency band (e.g., a fluctuating pulse). As a result, multiscale patterns of coordination with the environment emerge as wholes. In this regard, Large and colleagues (Large and Jones, 1999; Large, 2001, 2008; Large and Palmer, 2002) proposed models that account for perceptual and motor coordination to expressive fluctuations as well as to multiscale patterns. Endogenous dynamics are modeled by coupled autonomous oscillators whose respective intrinsic frequencies span multiple timescales. Their non-linear interactions enable the emergence of coordinated patterns of internal activity that span multiple timescales as well. The rhythmic signal acts as a sensory perturbation of ongoing internal activity. Coordination to that signal is thus modeled by the subsequent entrainment of internal oscillators to the periodicities of the signal. However, because the oscillators are coupled with each other, the signal does not merely perturb them individually, or frequency band by frequency band. Rather, the stimulus modulates the complex organization of endogenous dynamics as a whole, a general model whose essence captures the aforementioned empirical observations and fits our theoretical construction well.
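
The core idea, that a stimulus can reach an internal oscillator only through its coupling to other internal oscillators, can be caricatured with two phase oscillators. This is a deliberately minimal sketch and not the oscillator equations Large and colleagues actually use: we assume a hypothetical "fast" oscillator (natural frequency 2.1 Hz) that receives a 2 Hz stimulus, and a hypothetical "slow" oscillator (0.95 Hz) that never receives the stimulus but is coupled 1:2 to the fast one. All frequencies and coupling strengths are illustrative.

```python
import math


def mean_frequencies(k_stim, c_internal=1.0, stim_hz=2.0, f_fast=2.1,
                     f_slow=0.95, dt=0.001, duration=40.0):
    """Return the mean frequencies (Hz) of the fast and slow oscillators.

    Only the fast oscillator is driven by the stimulus (1:1 coupling k_stim);
    the slow oscillator is coupled 1:2 to the fast one (strength c_internal).
    Frequencies are measured over the second half of the run.
    """
    w_stim, w_fast, w_slow = (2 * math.pi * f for f in (stim_hz, f_fast, f_slow))
    th_stim = th_fast = th_slow = 0.0

    def step():
        nonlocal th_stim, th_fast, th_slow
        d_fast = w_fast + k_stim * math.sin(th_stim - th_fast)            # entrainment to the signal
        d_slow = w_slow + c_internal * math.sin(th_fast - 2.0 * th_slow)  # internal 1:2 coupling
        th_stim += w_stim * dt
        th_fast += d_fast * dt
        th_slow += d_slow * dt

    half = int(duration / dt) // 2
    for _ in range(half):  # let transients decay
        step()
    fast0, slow0 = th_fast, th_slow
    for _ in range(half):
        step()
    to_hz = 1.0 / (2 * math.pi * half * dt)
    return (th_fast - fast0) * to_hz, (th_slow - slow0) * to_hz


print("with 2 Hz stimulus:", [round(f, 3) for f in mean_frequencies(k_stim=2.0)])
print("no stimulus:       ", [round(f, 3) for f in mean_frequencies(k_stim=0.0)])
```

Without the stimulus, the slow oscillator locks to its fast partner at about 1.05 Hz; with the stimulus, it settles at about 1.0 Hz. The 2 Hz signal thus reorganizes a timescale it never drives directly, which is the "as a whole" effect the paragraph describes, in its simplest possible form.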

On the one hand, multiscale patterns of coordination are constituted by an autonomous perspective: they emerge from the background of its *ongoing* endogenous dynamics (such that different patterns might emerge in the context of different ongoing internal dynamics, even when environmental circumstances are identical). On the other hand, patterns of coordination are constituted in relational dynamics: they are a product of the interactions with the world. Indeed, when sensory perturbations affect an agent's internal dynamics, they modify how these dynamics can later be modulated and what patterns can emerge out of them. In this way, as in lived experience, inner and outer temporal dynamics co-constitute each other irreducibly. Endogenous and relational dynamics are thus intertwined such that patterns of coordination are both autonomous and relational. Because they are constrained by the dynamical traces of what is going on endogenously, and thereby by the traces of agent∼world relational dynamics, patterns of coordination are retentional. Internal dynamics thus embody the regularities of the environment in their own fluctuating activity. Because sensory perturbations are experienced in light of this ongoing activity, this dynamical background provides implicit anticipations, or protentions. For instance, when internal dynamics are modulated and stabilized by a repeated pattern of perturbations, a sudden change in the stimulus introduces a difference in the agent∼world relation: it leaves the protention embodied in the agent's ongoing internal dynamics unfulfilled. The dynamical embodiment of external temporalities thus allows for a strong, multiscale coordination with the environment (Dubois, 2003; Stepp and Turvey, 2010). Dynamical models that blend internal and relational dynamics therefore provide a framework for both perceptual and motor coordination to the world.
In this regard, the relations between participants' patterns of activity and patterns of stimulation have been investigated [e.g., the relation between patterns of response times and the temporal patterning of successive stimuli (Holden et al., 2011), or the relative phase between participants' taps and the metronome they follow (e.g., Chen et al., 2001)]. In these cases, the dynamics of these relations exhibit fractal fluctuations as well, in a way that strongly depends on the temporal context of the task (Holden et al., 2011). This further indicates that soft-assembled, metastable patterns of coordination emerge at the level of the whole agent∼world coupling.
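
Relative phase between taps and metronome, the kind of relational variable mentioned above, can be computed directly from event times. The helper below is a hypothetical sketch assuming one convention: each tap is referred to the metronome interval that contains it, and the phase is wrapped to [-π, π) so that early taps come out negative. Published studies use several conventions, and we do not claim this matches Chen et al. (2001).

```python
import math


def relative_phases(tap_times, beat_times):
    """Relative phase (radians, in [-pi, pi)) of each tap with respect to the beats.

    Each tap is referred to the beat interval [start, end) that contains it;
    taps outside the beat sequence are skipped.
    """
    phases = []
    for tap in tap_times:
        for start, end in zip(beat_times, beat_times[1:]):
            if start <= tap < end:
                phi = 2 * math.pi * (tap - start) / (end - start)
                # wrap so that taps just before the next beat read as "early" (negative)
                phases.append((phi + math.pi) % (2 * math.pi) - math.pi)
                break
    return phases


beats = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical 120 bpm metronome onsets (s)
taps = [0.48, 1.02, 1.25]          # hypothetical tap times (s)
print([round(p / math.pi, 2) for p in relative_phases(taps, beats)])  # in units of pi
```

A series of such phases, collected over a whole trial, is the kind of input on which the fluctuation analyses discussed above (fractal or otherwise) would operate.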

Overall, interactions between processes operating at multiple timescales form an endogenous background of metastable dynamics. It is from this background that the temporal coordination of activity and experience can emerge. It is therefore the background of our autonomous perspective: it orients the dynamics of our embodiment (i.e., both experiences and behaviors). Because it is modulated by the dynamics of its relations with the world, this "dynamical landscape" embodies the environment. Relational dynamics thus shape the dynamical landscape of our "sensorimotor habitat" (Buhrmann et al., 2013). The coordinated way in which we inhabit the world is therefore autonomous∼relational. Embodiment is thus a dynamical phenomenon, and it is the temporality of the behaviors and experiences it gives rise to that can be shared in human interactions (i.e., it is in the course of these dynamics that we can be together). To address this issue in more depth, we first discuss how embodiment and intersubjectivity relate to each other. We then examine the temporality that emerges from the dynamics of their relation, and how this temporality is embodied by interacting subjects.

## **EMBODIMENT OF INTERSUBJECTIVE TIME**

## **EMBODIMENT AND INTERSUBJECTIVITY**

When we meet another person, "what" we interact with is a "who" (McGann and De Jaegher, 2009): another embodied perspective. This transforms the dynamics of our embodiment in two contrasting but complementary ways. On the one hand, because the sensorimotor affordances of our respective embodiments are similar, we are subtly sensitive to each other's behaviors and to a similar world. On the other hand, our very embodiment makes alterity persist indefinitely: our respective embodied perspectives always differ (especially when they are directed at one another). In this subsection, we detail the phenomenological implications of these two aspects in turn, and then present experiments that track their underlying dynamics.

During our mutual encounters, part of my transparently lived body (e.g., my looking eyes, my expressive face) becomes a visible living body for the other (Lenay, 2010, whom we closely follow in the next two paragraphs). Because the other is sensitive to my activity, the expression of my lived experience through my visible living body affects him and thereby changes his own lived experience. I can thus modulate and participate in the other's experience. The expression of his own experience is visible to me as well (especially the expression of the *changes* I induced in his experience). I am therefore also living experiences in which the other participates, just as I participated in his upstream. The other thus becomes part of my embodied coupling with the world: I do something to him that changes something for me. In this way, I can pragmatically experience the other; I can enact him (I bring forth an experience of the other that emerges from the consequences of my activity toward him). Through the reciprocity of this pragmatic link, we become part of each other's embodied coupling: our respective embodiments become dynamically contingent on each other (we dynamically co-determine each other's behaviors and experiences). When we interact, we thus mutually enact each other (Varela, 1999b; Thompson, 2001), so that we can participate in and mutually incorporate each other's embodied perspective (Merleau-Ponty, 1945; Fuchs and De Jaegher, 2009). It is thus by interacting that we can share experiences, activities, meaning, and, so to speak, points of view (De Jaegher and Di Paolo, 2007).

Whatever I do changes the other: he thus constantly escapes my intentions toward him (Lenay, 2010). In return, changing the other also affects me. During our interactions, I thus change myself as well, so that any of my intentions slips into the interaction process itself, where it gets remolded. By interacting, I therefore also escape myself (hence the difficulty of following a prepared plan of conversation once the actual encounter is unfolding). Our experiences of each other and of ourselves are thus always broken and incomplete; they escape us, so that our interactions keep moving forward. Because the visible effects we have on each other are transparently caused (by our pre-reflectively lived bodies), part of the very linkage of our respective embodiments escapes both of us as well. The dynamics of our relations thereby acquire an autonomy of their own (De Jaegher and Di Paolo, 2007). Because these relational dynamics affect us simultaneously, they can efficiently coordinate our respective embodiments and constitute our behaviors and experiences in a common fashion, from a common dynamical background (De Jaegher et al., 2010). Our dynamical embodiment is thus shaped by the dynamics of our relation: we embody collective dynamics. In this sense, not only do we incorporate each other's perspective, but we also transparently incorporate the dynamics of the interaction process itself (De Jaegher, 2009). In other words, the dynamical background of our embodied perspective is constituted in the process of interaction. The pre-reflectively lived landscape that orients us in our sensorimotor habitat is therefore interactively shaped (Kyselo and Tschacher, 2014).

Our respective embodiments thus become contingent on each other not only because of their congruence, but also because of their broken symmetry. On the one hand, the incompleteness of relational dynamics keeps the interaction moving forward. The resulting dynamical autonomy of the interaction process can thereby "bond" our respective embodiments. On the other hand, this incompleteness makes alterity persist. The interaction process thus always involves us personally and still implies our autonomous agency (De Jaegher and Di Paolo, 2012). While embodiment is constituted in and by relational dynamics, it is these very relational dynamics that have to be actively regulated. Because it depends on the other's complementary involvement in the process of interaction, the active modulation of interpersonal coupling escapes us. As an individual effort, it is always incomplete: it is a co-regulation of an irreducibly collective process. The co-regulation of our coupling entails a dynamical congruence such that an even more fine-grained sharing of embodied dynamics becomes possible. Further, because the process to be regulated is collective, sharing its modulation has a quality that is proper to the interpersonal domain: it makes sense in itself. Sharing experiences, activities, or meanings is thus not just about content. It involves an inter-enactive process whose dynamics have a proper quality that makes sense on its own. Because its underlying dynamics participate in our embodiment, and because we can experience the consequences of the co-regulation of these dynamics, this intersubjective quality can also make sense to us personally.

Auvray et al. (2009) empirically tracked the general dynamical structure of human interactions. Pairs of blindfolded participants manipulated a device that reduced their sensorimotor coupling to a strict minimum: each participant moved a mouse that displaced an avatar in a virtual environment, and participants received a single type of tactile stimulation whenever the receptor field of their avatar overlapped the position of an entity in that virtual environment (**Figure 1**). There was thus only one bit of information (0: no stimulation; 1: stimulation). In this context, participants could not tell whether the stimulations they received resulted from crossing their partner or from crossing a lure that imitated the partner's displacements. However, participants met each other far more often than they met the lure: they found each other without knowing they did. The difference between the two situations emerges at the collective level. The lure is disembodied: it does not receive any stimulation that modifies the internal dynamics of its behavior. Conversely, the partner is embodied, and the overlap with its receptor field leads to mutual stimulation. Even though participants are unaware of what they do for the other (Lenay, 2010), they affect each other's behaviors. They were thereby attracted toward a common pattern of behavior (a reversal of movements around the source of stimulation). In other words, they were oriented and coordinated by the mutual and common effects of the interaction process, without any awareness of the dynamical situation in which their behavior had become entangled. This illustrates how the incompleteness of the encounter (i.e., what I do for the other escapes me, as does what the dynamics of our patterns of relations do for us) allows the interaction to move forward *on its own*. The coordination of behavior that is observed externally can thus emerge from the process of interaction and/or its regulation (Froese et al., 2012; Lenay and Stewart, 2012; see Auvray and Rohde, 2012, for a review of replications of the above experiment with both human participants and artificial agents). Boker et al. (2009) captured this kind of phenomenon in a somewhat more ecological experiment. They reduced the visible expressivity of one of two conversational partners by resynthesizing the movements of his realistic avatar. This manipulation was transparent to him but visible to his partner, who increased the amplitude of his own movements, as if compensating for this lack of expressivity. The complementary regulation of coupling dynamics then became explicit as both partners ended up enhancing the expressivity of their movements, without any awareness of doing so. Their behaviors thus became entangled in the relational dynamics between their embodiments in a way that escaped them.
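
Why mutual stimulation can favor partner encounters over lure encounters can be explored with a toy simulation, under loudly hypothetical assumptions: a 1-D ring instead of the actual apparatus, no fixed object, agents that move at constant speed, reverse at random while scanning, and turn back as soon as a stimulation stops (a crude stand-in for the "reversal of movements around the source of stimulation" described above). It is not the Auvray et al. protocol nor any published agent model, and all parameters are illustrative.

```python
import random

RING = 600         # circumference of the 1-D virtual space (hypothetical units)
LURE_OFFSET = 150  # each lure copies its owner's movements at a fixed distance
FIELD = 2          # stimulation when the circular distance is <= FIELD


def circ_dist(x, y):
    """Shortest distance between two positions on the ring."""
    d = abs(x - y) % RING
    return min(d, RING - d)


def perceptual_crossing(steps=20000, p_turn=0.01, seed=0):
    """Count stimulation onsets caused by the partner vs. by the partner's lure.

    Both agents follow the same hypothetical rule: move at constant speed,
    reverse at random (scanning), and turn back as soon as an ongoing
    stimulation stops (i.e., after passing over whatever caused it).
    """
    rng = random.Random(seed)
    pos, vel = [0, 300], [1, -1]
    was_stimulated = [False, False]
    onsets = {"partner": 0, "lure": 0}
    for _ in range(steps):
        for i in (0, 1):
            pos[i] = (pos[i] + vel[i]) % RING
        for i in (0, 1):
            other = pos[1 - i]
            on_partner = circ_dist(pos[i], other) <= FIELD
            on_lure = circ_dist(pos[i], other + LURE_OFFSET) <= FIELD
            stimulated = on_partner or on_lure  # the agent only senses one bit
            if stimulated and not was_stimulated[i]:
                # bookkeeping for the observer; the agent cannot tell the sources apart
                onsets["partner" if on_partner else "lure"] += 1
            if was_stimulated[i] and not stimulated:
                vel[i] = -vel[i]  # turn back toward the lost stimulus
            elif rng.random() < p_turn:
                vel[i] = -vel[i]  # spontaneous scanning reversal
            was_stimulated[i] = stimulated
    return onsets


print(perceptual_crossing())
```

In this caricature, a partner crossing stimulates both agents, so both turn back and meet again, and a shared back-and-forth pattern tends to persist; a lure crossing turns only one agent around, which then travels alongside the lure and loses it. The model is far cruder than the experiment, and the counts it prints say nothing about human data.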

Relational dynamics can attract agents' internal dynamics toward behavioral regions that are neither reachable nor attracting outside of a mutually engaging situation (Froese and Fuchs, 2012; see Laroche and Kaddouch, 2014, in the domain of musical pedagogy). The process of interaction can thus transform individual repertoires of behaviors by shaping the underlying dynamical landscape that orients them. Relational dynamics thus modulate our affordances such that we embody collective dynamics (i.e., collective dynamics are part of our embodied coupling). In the experiment of Auvray et al. (2009), though, the embodiment of collective dynamics did not seem to entail a distinct experience (participants did not distinguish the lure from the partner). Using more precise measurements of lived experience and explicitly encouraging participants to collaborate, Froese et al. (2014a) observed that participants could distinguish each other from the lures. Partners relied on the dynamical complementarity afforded by their interaction and actively co-regulated their coupling. Their judgments were thus based on the enactive experience of irreducibly collective dynamics. In support of that interpretation, mutual recognition increased the clarity of the experience of the other's presence: collective patterns modulate personal experiences. Subjects thus embodied relational dynamics in the full sense of the term: their behavior was livingly oriented by the interaction process, and they had a distinct experience of the relational dynamics that they co-regulated and in which they were caught.

If, by interacting, we can participate in each other's embodiment, then we participate in each other's pre-reflective dynamical background. The temporalities of our respective embodiments should thus become coordinated as an effect of interacting. Because the process of interaction escapes us, it can bring forth a temporality of its own: a properly intersubjective time (Gratier and Apter-Danon, 2009) that emerges from interpersonal relational dynamics. Because the process of interaction coordinates us, we can also embody this temporality (it participates in our dynamical background). By actively regulating relational dynamics that affect us, we can experience this intersubjective temporality ourselves. It is precisely because this regulation partly escapes us and involves the complementarity of our respective activities that we can experience its intersubjective quality. It is in the course of such an intersubjective time that we can be together. This intersubjective quality has to be brought forth before it can be experienced, and thus shared, in a dynamical and embodied way. Being together (as experienced enactively) can therefore be hypothesized to be the experience of the coordination of our dynamical embodied perspectives that emerges from our relational dynamics and their co-regulation. More precisely, in light of the previous sections, intersubjective time should yield autonomous∼relational patterns of coordination underlain by multiscale metastable dynamics. In the next subsection, we discuss empirical results that support this hypothesis.

## **EMBODIMENT OF INTERSUBJECTIVE TIME**

In this subsection, we address the embodiment of intersubjective time. We briefly review the empirical literature that supports the hypotheses emerging from the framework built so far. We first point out that behavioral dynamics become coordinated during interpersonal interactions, leading to the emergence of a common, shared temporality of behavior. Next, we verify that this coordination emerges from the metastable relational dynamics of between-persons interactions. We also show that the mutuality of interaction plays a specific role in these dynamics. This leads us to point out that the experience of the intersubjective dimension of interpersonal timing is enacted through the co-regulation of the interaction process, and therefore requires the personal but flexible engagement of individuals. We then discuss the functional role of fluctuations in interpersonal coordination dynamics. Finally, we show that these dynamics and their co-regulation coordinate interacting persons in a multiscale and multiplicative way, and that this forms a shared dynamical background in which behaviors and experiences are entangled.

The temporal coordination of individual behaviors manifests spontaneously in our daily interactions (Condon and Ogston, 1966), most often in a rhythmic way (Condon, 1986; Bernieri and Rosenthal, 1991; Gill, 2012). For instance, both newborns and adults tend to synchronize their movements with the speech of their interlocutor (Condon and Ogston, 1971; Condon and Sander, 1974). Behavioral coordination is multimodal (Kendon, 1970; Barbosa et al., 2012; Louwerse et al., 2012; Bangerter and Mayor, 2013) as well as physiological (Guastello et al., 2006; Feldman, 2007; Feldman et al., 2011; Müller and Lindenberger, 2011). A tight temporal coupling is even observed in breathing during turn-taking (McFarland, 2001), and speech rates converge (Street, 1984). Although conversations seem to be structured by an alternation of roles (speaker vs. listener), behaviors are thus underlain by a common temporal framework. Relational dynamics seem to attract individual temporalities toward a shared timing (Deschamps et al., 2012; Froese et al., 2012). In laboratory settings, individually preferred tempi indeed tend to move toward a common ground even when people coordinate unintentionally and without being aware of doing so (Oullier et al., 2008).

If coordinating in time is not the aim of daily interactions as such, how does it arise? In light of the previous sections, we would expect the temporal coordination of behaviors to emerge spontaneously from the self-organization of between-persons relational dynamics. This hypothesis is supported by numerous studies (for reviews, see Oullier and Kelso, 2009; Delaherche et al., 2012; Schmidt et al., 2012; Dale et al., 2013; Lagarde, 2013). For instance, when pairs of participants oscillate their legs in anti-phase (opposite directions) at an increasing frequency, their coupling becomes unstable near a critical threshold; phase wandering between attractors or abrupt transitions toward more stable patterns are observed (Schmidt et al., 1990), a typical signature of self-organized dynamical systems that are modeled by non-linearly coupled oscillators (see also Schmidt and Turvey, 1994; Amazeen et al., 1995). Interpersonal patterns of coordination thus follow the same dynamical laws as bimanual or unimanual-metronome patterns (see also Mottet et al., 2001; Black et al., 2007; Richardson et al., 2007a). This isomorphism again suggests that coordination emerges from the dynamics of interaction rather than from the specific properties of the coordinated components. Such synergistic effects have also been observed in more ecological tasks such as martial arts and hand-clapping games (Riley et al., 2011), rocking chairs (Richardson et al., 2007b; Frank and Richardson, 2010), language games that involve turn-taking (Schmidt et al., 2011), and problem-solving tasks (Shockley et al., 2003; Richardson et al., 2005; Coey et al., 2011; see also Richardson et al., 2008; Shockley et al., 2009; Fusaroli et al., 2014).
During sport activities, whether players are opponents or not, the dynamics of their coupling spontaneously self-organize, and attractors emerge from their collective dynamics as well (Bourbousson et al., 2008, 2010a,b; Travassos et al., 2011; Yokoyama and Yamamoto, 2011; Okumura et al., 2012; Duarte et al., 2013; García et al., 2013). Whether intended or not, interpersonal coordination is thus underlain by a similar dynamical landscape constituted by attractors of collective dynamics (Schmidt and O'Brien, 1997; Richardson et al., 2007b; Oullier et al., 2008). The spontaneity of interpersonal dynamics is such that coordination also emerges when participants are specifically instructed not to make the same movements as their partner (Boker and Rotondo, 2002; Issartel et al., 2007). Movements become unintentionally coordinated even when participants attend to a different external pacer, to the point that the very reorganization of their own behavior tended to occur through simultaneous phase transitions (Varlet et al., 2011). The coordinative efficacy of the process of interaction is thus difficult to escape. Because it most often operates without any awareness on our part, it precedes its explicit experience and thereby its regulation. Individual behaviors thus seem to be entangled in the relational dynamics of their coupling. Intention and attention might then guide the regulation of this metastable background of collective dynamics in order to stabilize it (see Temprado and Laurent, 2004).
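To make "non-linearly coupled oscillators" concrete, the sketch below uses the Haken-Kelso-Bunz (HKB) relative-phase equation, dφ/dt = -a sin φ - 2b sin 2φ, a standard model of such transitions that the text cites only indirectly. It is a minimal illustration under stated assumptions: the ratio b/a stands in for movement frequency (it shrinks as frequency increases), and the parameter values, the Euler integration and the function names (`hkb_rate`, `antiphase_is_stable`, `settle`) are hypothetical choices, not the procedure of Schmidt et al. (1990). Anti-phase (φ = π) is stable only while b/a > 1/4; below that ratio, a slightly perturbed anti-phase pattern drifts to in-phase (φ = 0).

```python
import math


def hkb_rate(phi: float, a: float, b: float) -> float:
    """HKB relative-phase dynamics: d(phi)/dt = -a*sin(phi) - 2b*sin(2*phi)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def antiphase_is_stable(a: float, b: float) -> bool:
    """Linear stability of anti-phase coordination (phi = pi).

    The slope of hkb_rate at phi = pi equals a - 4b; the fixed point is
    stable when that slope is negative, i.e. when b/a > 1/4.
    """
    return a - 4.0 * b < 0.0


def settle(phi0: float, a: float, b: float,
           dt: float = 0.01, steps: int = 20_000) -> float:
    """Euler-integrate the relative phase from phi0; return it wrapped to [0, 2*pi)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * hkb_rate(phi, a, b)
    return phi % (2.0 * math.pi)


if __name__ == "__main__":
    start = math.pi - 0.1      # slightly perturbed anti-phase pattern
    for b in (0.5, 0.1):       # high b/a (slow) vs. low b/a (fast movements)
        print(f"b/a={b}: anti-phase stable={antiphase_is_stable(1.0, b)}, "
              f"final relative phase={settle(start, 1.0, b):.3f} rad")
```

In this toy setting, lowering b/a (speeding up) is what destabilizes anti-phase, which mirrors the abrupt anti-phase to in-phase transitions described above.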

Relational dynamics of interpersonal interactions involve two autonomous embodied perspectives and are thus bidirectional. Studies on interpersonal coordination dynamics have rarely taken this aspect into account: usually, the comparison is made between coupled and non-coupled situations. The enactive approach has emphasized the role of the very mutuality of interactions as a source of coordination (e.g., Froese and Di Paolo, 2008), which highlights the properly interpersonal dimension of this phenomenon. Murray and Trevarthen (1985) and Nadel et al. (1999) demonstrated the importance of the mutuality of the interaction process. Infants and their mothers interacted through a TV monitor until the live retransmission of the mother's behavior was replaced by a recording of her behavior made during the same interactional sequence. Though infants observed exactly the same behavior of their mother in both situations, they reacted very differently when they faced the recording, displaying anger and frustration. Probably because they could not experience their own contribution to the regulation of the relational dynamics, they lost interest in interacting with their non-responsive (recorded) mother. This happens even when her image is delayed by only three seconds (Henning and Striano, 2011). In adult video-conferences, slight delays in the transmission of information can also destabilize interpersonal coupling dramatically (Nijholt et al., 2008). The mutual, simultaneous sharing of the interaction process is thus critical to interpersonal coordination, which therefore cannot be reduced to purely individual processes.

Collier and Burch (1998) made the general prediction that bidirectional interactions between complex systems should yield "more effects for less effort" (i.e., enhanced coordination for less energy dissipation) than unidirectional interactions, where only one system can be affected by the other. Indeed, mutual interactions entail more accurate and/or stable coordination than unidirectional ones (Cummins, 2009; Konvalinka et al., 2010; Shikanai and Hachimura, 2012; Hart et al., 2014), or than interactions where participants had to follow a partner who had a metronomic cue in his headphones (Oullier et al., 2003). Moreover, when mutual interactions are compared to unidirectional ones, increased stability of coordination at the level of the interpersonal coupling is accompanied by decreased fluctuations at the individual level (Hart et al., 2014), confirming the general "more effects for less effort" hypothesis (Collier and Burch, 1998). It seems that relational dynamics enable the (potentially or partly self-organized) co-regulation of each other's variability, as if it were the variability of the coupled system as a whole. Our influence on the other, his responsiveness and the relational dynamics they entail thus do something for our coordination: they lay down a background of collective dynamics that orients our inter-actions. By interacting, we co-regulate this metastable background, and thereby co-organize the dynamics of each other's embodied background. This makes it possible to offload part of the coordinative process onto the dynamics of interactions themselves. Our embodiment is thus such that it can benefit from the (self-organized and co-regulated) complementary dynamics of each other's actions. Conversely, unidirectional coupling rigidifies the situation: variability cannot be organized collectively, and the entire inflexible variability of the unresponsive partner has to be accommodated by the other on top of his own fluctuations.
As already stated, stability (at the collective level) thus involves flexibility (at the individual level).
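The stabilizing effect of mutual coupling can be illustrated, under strong simplifying assumptions, with two noisy phase oscillators in a Kuramoto-style toy model. The function name, gains, noise level and shared natural frequency below are hypothetical and are not taken from the cited studies. Each partner corrects toward the other with its own gain (zero when it is unresponsive), so the relative phase is pulled back at a rate proportional to k12 + k21; mutual coupling therefore yields a smaller relative-phase spread than unidirectional coupling. The model contains no notion of energy dissipation, so it only illustrates the "more effects" half of Collier and Burch's (1998) prediction.

```python
import math
import random


def relative_phase_sd(k12: float, k21: float, noise: float = 0.3,
                      dt: float = 0.01, steps: int = 100_000,
                      seed: int = 1) -> float:
    """Standard deviation of the relative phase of two noisy coupled oscillators.

    k12: gain with which oscillator 1 adjusts toward oscillator 2.
    k21: gain with which oscillator 2 adjusts toward oscillator 1.
    Both oscillators share the same natural frequency (a hypothetical choice).
    """
    rng = random.Random(seed)
    omega = 1.0
    th1 = th2 = 0.0
    sq_dt = math.sqrt(dt)
    total = total_sq = 0.0
    for _ in range(steps):
        diff = th2 - th1
        # Euler-Maruyama step: drift toward the partner plus independent noise
        th1 += (omega + k12 * math.sin(diff)) * dt + noise * sq_dt * rng.gauss(0.0, 1.0)
        th2 += (omega - k21 * math.sin(diff)) * dt + noise * sq_dt * rng.gauss(0.0, 1.0)
        # Relative phase wrapped to (-pi, pi]
        psi = math.atan2(math.sin(th1 - th2), math.cos(th1 - th2))
        total += psi
        total_sq += psi * psi
    mean = total / steps
    return math.sqrt(max(total_sq / steps - mean * mean, 0.0))


if __name__ == "__main__":
    print("mutual (0.5, 0.5):        ", round(relative_phase_sd(0.5, 0.5), 3))
    print("unidirectional (0.5, 0.0):", round(relative_phase_sd(0.5, 0.0), 3))
```

Using the same seed for both runs (common random numbers) makes the comparison between coupling regimes less sensitive to sampling noise.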

Unilateral and mutual embodied coupling thus have distinct phenomenologies. However, during concrete interactions, these two typical situations are the extremes of a whole "spectrum of participation" (Di Paolo and De Jaegher, 2012). Different degrees of involvement can indeed be invested in the regulation of the interaction process. Interacting therefore implies participating in the modulation of the interaction process by modulating our participation in that process. Attention could thus be directed toward different aspects of autonomous∼relational patterns of coordination. Indeed, leaders (or socially dominant personalities) seem more focused on their own behavioral temporality: they display fewer fluctuations and thereby interact in a more rigid fashion than "followers" (Schmidt et al., 1994; Fairhurst et al., 2014; see also Sacheli et al., 2013). Followers pay more attention to the stability of the interaction process itself (Fairhurst et al., 2014). However, participants classified as "socially dominated" can be overresponsive (by taking too much charge of the interaction process; Schmidt et al., 1994). This might not leave enough room for the personal involvement of the other in the co-regulation of relational dynamics and in the variability of behaviors that underlies it (Repp and Keller, 2008). For instance, social anxiety disorders entail difficulties in intentionally leading a coordination task (Varlet et al., 2014).

The coordinated regulation of interactions thus implies moderate contingencies, that is, flexible deviations from strict synchrony (Gratier and Apter-Danon, 2009). Such flexibility of the interaction process is also observed in mother-infant interactions, where moderate contingencies are both preferred and preferable for communication and development (Jaffe et al., 2001; Gratier, 2003; Hane et al., 2003; Gratier and Apter-Danon, 2009). Interpersonal rhythmic structures facilitate and guide coordination by providing embodied coupling with anticipatory dynamics. The emergence of interpersonal rhythms thus allows the dynamical backgrounds of embodiment to converge and to be organized with congruent retentions and protentions. Flexible fluctuations are functional too. They provide surprises and make the interaction process incomplete (protentions are not entirely fulfilled). This incompleteness then requires the active engagement of the participating individuals in the co-regulation of their relational dynamics (Deckers et al., 2012). Further, flexibility makes it possible to repair coordination breakdowns by reorganizing the interaction process. Optimal relational dynamics are thus a balance of stability and flexibility, a compromise between random fluctuations and strictly metronomic rhythms. In other words, interpersonal relational dynamics are metastable. This regime of interpersonal coordination leaves enough room for autonomy, such that subjects can experience their interactions against the background of their own dynamical embodiment. It also leaves enough room for relational dynamics to bring forth a temporality of their own. The co-regulation of these dynamics provides a common dynamical background that modulates and coordinates autonomous embodiments. In this regard, spontaneous imitations of each other's behavior entail the temporal coordination of brain dynamics themselves (Dumas et al., 2010; for reviews of inter-brain synchronization studies, see Dumas et al., 2011, and Konvalinka and Roepstorff, 2012). Autonomous and relational dynamics thus co-constitute each other, such that, by interacting, we co-enact a time whose sharing can be experienced inter-actively.

If the interaction process entails metastable relational dynamics, these should exhibit multiscale, multiplicative dynamics. Coordination of multiple behavioral cyclicities has indeed been observed during conversational interactions (Newtson, 1993; Sadler et al., 2009). Moreover, relational dynamics observed in movements had significant interpersonal meanings such as dominance and affiliation (Sadler et al., 2011). Mother-infant interactions are also coordinated at multiple timescales (Malloch, 1999; Gratier, 2008; Gratier and Apter-Danon, 2009): they follow an implicit pulse, and form broader phrases as well as longer narrative cycles of vocal and behavioral exchanges. Interestingly, the behavioral timescales of microexpressivity, pulses and phrases correlate well with the neurodynamical scales described by Varela (1999a). Further, the dynamics at work at these behavioral timescales seem to interact with each other. For instance, a lack of expressive deviations from isochrony at the pulse level has long-term effects on the overall quality of coordination (Gratier and Apter-Danon, 2009). Likewise, perturbing the precise simultaneity of exchanges has deleterious effects on the overall temporal organization of adult interactions, including turn-taking (Ruhleder and Jordan, 2001). On top of being multiscale, the interaction process thus exhibits signs of multiplicative dynamics. Indeed, in interpersonal motor tasks, relational variables such as relative phase or cross-correlation of periodicities of behaviors exhibit fractal structures (Hennig, 2014). Further, Ashenfelter et al. (2009) observed that the head movements of conversational partners have a multifractal structure, consisting of two fractal scalings: one at the level of local dynamics (short timescales) and the other at a more macro level. Ashenfelter and colleagues interpret this result as an indication of the presence of both coordinative processes and role alternation (or symmetry formation and symmetry breaking). The dynamical background that underlies interpersonal interactions is thus metastable: it is characterized by a dynamical blend of stable integration and flexible segregation of individual behaviors (Kelso and Engstrom, 2006).
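Statements that a behavioral series "exhibits fractal structure" usually rest on a scaling analysis. One common method is detrended fluctuation analysis (DFA); the plain-Python DFA-1 below is a sketch under stated assumptions. The window sizes, the function name `dfa_alpha` and the synthetic test signals are illustrative, and we do not claim this is the exact procedure used by Hennig (2014) or Ashenfelter et al. (2009). The exponent α is the slope of log F(n) against log n: roughly 0.5 for uncorrelated noise, about 1 for 1/f ("fractal") fluctuations, and about 1.5 for Brownian motion. Because it fits a single slope, this sketch would not by itself reveal the two-regime scaling described above.

```python
import math
import random


def dfa_alpha(x, scales=(16, 32, 64, 128, 256)):
    """Estimate the DFA-1 scaling exponent of the series x (a list of floats)."""
    # 1. Integrate the mean-centred series into a "profile".
    mean_x = sum(x) / len(x)
    profile, acc = [], 0.0
    for v in x:
        acc += v - mean_x
        profile.append(acc)

    log_n, log_f = [], []
    for n in scales:
        # 2. Split the profile into non-overlapping windows of length n and
        #    remove a least-squares linear trend from each window.
        t_mean = (n - 1) / 2.0
        s_tt = sum((t - t_mean) ** 2 for t in range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            seg = profile[start:start + n]
            y_mean = sum(seg) / n
            slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / s_tt
            for t, y in enumerate(seg):
                resid = y - (y_mean + slope * (t - t_mean))
                sq_sum += resid * resid
                count += 1
        # 3. F(n) is the root-mean-square residual at scale n.
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))

    # 4. alpha is the least-squares slope of log F(n) against log n.
    mx = sum(log_n) / len(log_n)
    my = sum(log_f) / len(log_f)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_f))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    brown, acc = [], 0.0
    for w in white:
        acc += w
        brown.append(acc)
    print("white noise alpha (theory: about 0.5):", round(dfa_alpha(white), 2))
    print("Brownian alpha (theory: about 1.5):   ", round(dfa_alpha(brown), 2))
```

In practice, relative-phase or inter-response-interval series from an interaction task would replace the synthetic signals.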

If we participate interactively in each other's dynamical embodiment, then the whole complexity of our dynamically embodied perspectives should become coordinated. In general, interacting complex systems are expected to match the very complexity of each other's dynamical organization (West et al., 2008). Indeed, a flexibly fluctuating and responsive metronome (built on nonlinearly coupled oscillators) can restore the fractal dynamics of the gait of patients with Parkinson's disease to a normal level, whereas this "healthy" complexity is lost as a consequence of the pathology, as shown in the absence of a metronome or in the presence of an unresponsive one (Hove et al., 2012). Mutually coupled participants match each other's fractal dynamics of behavioral fluctuations as well (Marmelat and Delignières, 2012). Participants also match the fractal dynamics of their partner when they are unidirectionally coupled (Marmelat et al., 2014b), but to a far lesser extent than mutually coupled participants (Laroche, unpublished). Co-regulated relational dynamics thus entail an attraction of complex internal dynamics toward congruent patterns of coordination. Dynamically and actively shared patterns of coordination that are both autonomous and relational thus emerge as wholes.

Overall, the complex temporalities that underlie our behaviors can be strongly coordinated at multiple interacting timescales. As a consequence, the backgrounds of our respective embodiments are dynamically bonded in a very subtle way. It is as if we were mutually attracted toward a common manner of "inhabiting" and shaping the time in the course of which we live. This could hardly be explained by individual capacities that would seek to mimic such complex dynamical structures. This phenomenon rather seems to emerge from relational dynamics between dynamical embodiments whose respective complexities converge through attraction and co-regulation. As even chaotic signals can synchronize their complex behavior (Strogatz, 2003), this is ultimately not a surprising phenomenon.
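That even chaotic signals can synchronize can be checked with a very small sketch: two chaotic logistic maps, x ↦ r x (1 - x) with r = 3.9, coupled diffusively with strength ε. The map, the value of r, the ε values and the function names are illustrative choices, not taken from Strogatz (2003). Under this coupling the difference between the two maps is multiplied at each step by (1 - 2ε) times r(1 - x - y); with ε = 0.4 that factor never exceeds 0.78 in magnitude, so the two chaotic trajectories converge exactly, whereas with ε = 0.05 the synchronized state is unstable and the maps keep wandering apart.

```python
def logistic(x: float, r: float = 3.9) -> float:
    """One step of the logistic map, which is chaotic at r = 3.9."""
    return r * x * (1.0 - x)


def mean_sync_error(eps: float, x0: float = 0.2, y0: float = 0.7,
                    steps: int = 5000, tail: int = 1000) -> float:
    """Mean |x - y| over the last `tail` steps of two diffusively coupled maps."""
    x, y = x0, y0
    errors = []
    for i in range(steps):
        fx, fy = logistic(x), logistic(y)
        # Each map mixes its own update with its partner's (coupling strength eps)
        x, y = (1.0 - eps) * fx + eps * fy, (1.0 - eps) * fy + eps * fx
        if i >= steps - tail:
            errors.append(abs(x - y))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for eps in (0.05, 0.4):
        print(f"eps={eps}: mean |x - y| = {mean_sync_error(eps):.2e}")
```

Each trajectory stays chaotic in both cases; what coupling changes is whether the two chaotic behaviors become the same.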

If complex behavioral dynamics influence each other and are attracted toward collective patterns, their retentional and protentional structures should mutually orient and shape each other, and thereby be enactively shared. The pre-reflective dynamical background of experience should thus be shaped by the interaction process (Obhi and Hall, 2011). Interpersonal coordination dynamics are indeed experienced meaningfully (Gratier and Apter-Danon, 2009; Gratier and Magnier, 2012). Their co-regulation can lead to a coordination of personal experiences (Markey et al., 2010; Wiese et al., 2010) as well as to experiences of interpersonal connection (Hove and Risen, 2009; Marsh et al., 2009; Miles et al., 2009; Paladino et al., 2010; Ramseyer and Tschacher, 2011; Watanabe et al., 2011; Vacharkulksemsuk and Fredrickson, 2012). In turn, the embodiment of collective dynamics favors cooperative and pro-social behaviors (Wiltermuth and Heath, 2009; Kokal et al., 2011; Valdesolo and Desteno, 2011; Behrends et al., 2012). Unfortunately, precise first-person descriptions of the lived experience of being together in time are still lacking (but see Froese et al., 2014b). However, it is precisely because relational dynamics participate in each other's experience that the interaction process can be appropriated and co-regulated (Laroche and Kaddouch, 2014; Froese et al., 2014a). Being together in time is thus inter-enacted: by interacting, we embody collective dynamics that coordinate our behaviors and experiences, and we participate actively in the regulation of that process. By co-regulating our embodied relational dynamics, we can co-enact a shared world of significance in which to be together. With this final remark in mind, let us now summarize and conclude this paper.

# **CONCLUSIVE DISCUSSION**

In this paper, we proposed a dynamical, embodied and enactive framework for understanding and investigating the phenomenon of being together in time. We first defined embodiment as being both a living and a lived phenomenon that emerges from agent∼world coupling. Embodiment provides us with a perspective on our relations, a pre-reflective dynamical background on the basis of which we can enact the world through autonomous embodied interactions. This background is constituted by the self-organization of component processes whose interactions span multiple timescales. From the point of view of the living, temporality thus has a shape that is totally different from "physical time" (Bailly and Longo, 2008; Holden, 2013). As a result of an underlying metastable regime, the temporality of the living is multiscale, multiplicative and (multi-)fractal. Behaviors and experiences thus carry the imprint of these complex dynamics in which they are entangled. This dynamical background is at the same time co-constituted by the dynamics of our relations with the world. Whole autonomous∼relational patterns of coordination thereby emerge, so that inner ("subjective") and outer ("objective") temporalities co-constitute each other dynamically.

During between-persons interactions, relational dynamics can self-organize and escape us. This gives rise to attractors of behavior in the shared dynamical landscape that we enact and navigate, or inhabit, together. By exerting a mutual attraction on their underlying temporalities and by coordinating them in time, relational dynamics can constitute individual behaviors and experiences. In short, by interacting, we embody collective dynamics. Mutuality of interaction further allows for the co-regulation of each other's background of variability, as well as the emergence of a properly intersubjective time. The very complexity of our dynamical embodiments can thereby be inter-enactively shaped, and thus shared. This enables a strong coordination that is not a mere local synchrony (it is not a succession of synchronous states) but is extended in time at multiple interwoven scales. Since the intrinsic dynamics of temporal experiences and the content of these experiences co-constitute each other, by interacting we can participate in each other's pre-reflective dynamical flow. In other words, thanks to the inter-enactive process, retentions, protentions and their multiplicative interplay can be actively and dynamically shared (not in the sense that we have an informational duplicate of each other's dynamical flow, for such a flow always emerges from its own background, but rather in the sense that we mutually shape each other's pre-reflective dynamical background). Part of our experience is therefore embodied in each other's retentions and protentions. A co-enacted dynamical landscape thus emerges and forms a background of collective dynamics that brings forth a properly intersubjective time and coordinates its personal embodiment. Behaviors and experiences are thus entangled in this collective metastable background. By actively co-regulating these relational dynamics and by experiencing the effects of this co-regulation, we can experience the intersubjective dimension of this shared time as well as the sharing itself.

Overall, being together is neither a mere co-presence in physical space nor a mere temporal correlation of activities in physical time that can be observed from an external point of view. It is the co-regulated and skillful inhabitance of the complex, metastable dynamical landscape that emerges spontaneously from the meeting of our embodied perspectives. Being together thus has to be enacted; that is, it has to be brought forth actively, dynamically and autonomously, yet relationally. In short, we can only experience being together through our inter-enactive engagement. In turn, this experience carries the imprint of the collective dynamics that emerge from this inter-enactivity. However, precise phenomenological descriptions of being together in time are still lacking. Recourse to more fine-grained phenomenological methods (e.g., Petitmengin, 2001) could guide fruitful empirical and modeling research. Indeed, it is not yet clear how the temporal complexity of behaviors as measured gives rise to, is influenced by, or at least is correlated with clear and meaningful felt qualities (but see Lutz et al., 2002, in the intrapersonal domain).

Complex multiscale dynamics of interpersonal interactions have not yet been much addressed. Nevertheless, they offer a promising avenue of research. For instance, deficits in social coordination might be rooted in a loss of complexity, possibly at both the individual and the collective level (for recent dynamical studies, see Lazerges et al., 2011; Varlet et al., 2012, 2014; Lavelle et al., 2013; Marsh et al., 2013). If we take seriously both the interaction process and the complexity that underlies our dynamical embodiment, treatments of cognitive disorders might be improved. For example, rhythmic auditory stimulation improves the linguistic performance of children diagnosed with developmental language disorders (Przybylski et al., 2013). Further, fractal metrics can distinguish between dyslexic and normal readers in a word-naming task (Wijnants et al., 2012). Could a flexibly fluctuating and responsive rhythmic device improve performance even more, in the vein of the aforementioned work of Hove et al. (2012) on Parkinson's disease patients? If relational dynamics coordinate individual behaviors by modulating their underlying endogenous dynamics, responsive devices might entail healthier dynamics, while part of the burden of coordinating with such a device could be offloaded onto the interaction process itself.

Finally, coordinating in time leaves traces on embodied dynamics after the interaction itself (Oullier et al., 2008; Hove et al., 2012), in addition to explicit traces of the partner himself (Macrae et al., 2008; Miles et al., 2010). Recurrent interactions and the temporal coordination they entail might enable the stabilization of interactional repertoires as well as the emergence of long-term and large-scale bonding such as that found in cultural practices and habits (Gratier and Apter-Danon, 2009; Gratier and Magnier, 2012). Dynamical models of embodied interactions might thus also play a significant role in understanding socio-cultural phenomena that are observable at larger timescales (Aguilera et al., 2013; Cao et al., 2013).

## **ACKNOWLEDGMENTS**

The authors would like to thank the anonymous reviewers whose comments helped to improve this text significantly. They also thank Myriam Gillibert for her linguistic help. Finally, Julien Laroche personally thanks Ilan Kaddouch for giving him free time to work on this manuscript.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 May 2014; accepted: 29 September 2014; published online: 31 October 2014.*

*Citation: Laroche J, Berardi AM and Brangier E (2014) Embodiment of intersubjective time: relational dynamics as attractors in the temporal coordination of interpersonal behaviors and experiences. Front. Psychol. 5:1180. doi: 10.3389/fpsyg.2014.01180*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Laroche, Berardi and Brangier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# *Alba Montes Sánchez <sup>1,2</sup>\**

*<sup>1</sup> Center for Subjectivity Research, Department of Media, Cognition and Communication, University of Copenhagen, Copenhagen, Denmark*

*<sup>2</sup> Departamento de Humanidades: Filosofía, Lenguaje y Literatura, Universidad Carlos III de Madrid, Madrid, Spain*

*\*Correspondence: xjf783@hum.ku.dk*

#### *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

#### *Reviewed by:*

*Petr Urban, Academy of Sciences of the Czech Republic, Czech Republic*

*Phil Hutchinson, Manchester Metropolitan University, UK*

*Julien A. Deonna, University of Geneva, Switzerland*

#### **Keywords: shame, guilt, moral emotions, relationality, intersubjectivity**

In recent years, a view on two key moral emotions, shame and guilt, seems to be establishing itself in some sectors of psychology, based mainly on the research of Tangney and Dearing (2004) and their "Test of Self-Conscious Affect" (TOSCA). On this view, guilt is a productive force in our moral lives, while shame is morally counterproductive and psychologically harmful. Therefore, one should cultivate guilt and fight shame. But this conclusion is problematic for two main reasons, among others. On the one hand, the distinction that grounds it is too simplistic: the boundary between guilt and shame is far blurrier and more complex than this account acknowledges. On the other hand, it operates on a functionalistic definition of morality, where "moral" means "prosocial," which is ultimately insufficient to account for the moral role of these emotions. The functionalistic approach does justice neither to the self-conscious aspects of guilt and shame nor to the interactive dimensions of morality as a shared practice we engage in with others (Calhoun, 2004).

## **TANGNEY AND DEARING'S ACCOUNT**

According to Tangney and Dearing (2004), the main difference between shame and guilt lies in their objects of focus: shame focuses on the ashamed self, while guilt focuses on behavior. In shame we feel bad about the way we are, about some characteristic or feature of ours, while in guilt we feel bad about our actions or omissions, about having done something wrong, broken a norm or harmed somebody. On this view, because the self is perceived as much more difficult to change or undo than behavior, shame leads to antisocial tendencies (shunning contact with others, lashing out in anger), and ultimately to low self-esteem, depression and addictions. In contrast, guilt motivates prosocial efforts (apologizing, attempting to undo or compensate for the harm done), and it is not correlated with low self-esteem or addictions. Therefore, guilt is seen as productive and shame as counterproductive. However, another of Tangney's findings should give us pause. In a study of incarcerated offenders, Tangney and Stuewig claim that the only people who have no capacity for shame are psychopaths; they therefore conclude that in "extreme populations" some shame is better than the absence of any self-evaluative emotion, as it offers a ray of hope for social reintegration (Tangney and Stuewig, 2004, p. 327). But if shame is thus in some way connected to moral sensibility, why should this conclusion hold only for "extreme populations"?

## **PROBLEMS WITH THE DISTINCTION BETWEEN SHAME AND GUILT**

Let us take a closer look at the problems entailed by this account. First, although Tangney and Dearing's definitions of shame and guilt, based on the work of Helen Block Lewis (1971), are widely accepted and indicate a helpful distinction, they should be handled with care. Tangney et al. (1996) have shown that people tend to have trouble distinguishing between shame and guilt (while they find it much easier to distinguish between shame and embarrassment). Dearing and Tangney (2011, pp. 9–11) explain this as an error of judgment or a confusion on the part of therapists or clients, but I disagree. Dearing and Tangney present these emotions as two perfectly discrete processes that produce very different responses and have very different functions, but this is highly doubtful. Guilt and shame are complex self-conscious emotions, with a high degree of cognitive specification and wide variation from culture to culture. They occupy the same emotional territory, share a vast phenomenal ground, and work together in many ways. Some authors (Ortony, 1987; Elison, 2005) claim that they are two slightly different cognitive specifications of the same basic affective phenomenon, which would explain why they are sometimes hard to distinguish, and which should cast doubt on attempts to sharply differentiate their functions.

Tangney and Dearing's definitions of shame and guilt rely on a clear separation between self and behavior, where "self" refers to the set of features that define an individual. Instead, I believe that selfhood should be conceived as a dynamic process of self-conscious individuation that can rely on different dimensions in different contexts (see, e.g., Zahavi, 2005; Reddy, 2008; Rochat, 2009). According to Tangney and Dearing, in shame, self-individuation takes place in terms of a negative feature that is perceived as defining the self as a whole: for example, greed. I perceive myself as greedy and I am ashamed of myself as a result. In my view, this account overlooks several dimensions of the shame experience that play a crucial role in the process of self-individuation, namely embodiment, situatedness and temporality (Guenther, 2011; Zahavi, 2012; León, 2013): I apprehend myself not simply as *a* (any) greedy individual, but as *this* singular one, *me*, put on the spot here and now. As León (2013, p. 211) puts it, to feel shame is "to experience in intersubjective contexts the irreducibility of one's own particular subjective situation in the world." Admittedly, these phenomenological dimensions do not lend themselves easily to operationalization and testing. But my worry is not so much that descriptions of shame and guilt are inaccurate as that strong moral conclusions are drawn from them. Behavior often contributes crucially to dynamic and situational self-individuation, so the boundary between self and behavior is blurry and permeable. Let me be clear here: I agree that self and behavior are concepts that mark a helpful distinction.
But Tangney and Dearing further tell us that, in the interest of morality, we ought to *disconnect* them, that our emotions of shame and guilt do just that, and that a focus on behavior is morally preferable to a focus on self; indeed, it is not merely preferable, it is the morally good choice as opposed to the morally bad one (see Tangney and Dearing, 2004, esp. ch. 5 and 6). This entails that there are no situations in which shame might be the more appropriate moral response, which is questionable (ought we, as citizens of Western countries, to feel guilty, as opposed to ashamed, about our governments' failure to prevent the genocides in Rwanda and Bosnia, for example? See Hutchinson (2008) and Morgan (2008) on this issue).

A more serious concern is that Tangney and Dearing's very definitions of shame and guilt already imply many of the factors they are trying to test. In particular, the antisocial and destructive nature of shame and the prosocial and constructive nature of guilt are presupposed by and built into their TOSCA tests (see Ferguson and Stegge, 1998; Luyten et al., 2002; Giner-Sorolla et al., 2011; Nelissen et al., 2013, p. 358). Luyten et al. (2002) have shown that the original TOSCA overwhelmingly represents cases of mild, adaptive guilt related to reparation, and maladaptive aspects of shame related to low self-esteem. Drawing on these findings, Giner-Sorolla et al. (2011, p. 446) reach the conclusion that "TOSCA guilt measures the motivation to respond to one's own misdeeds with compensatory action, whereas TOSCA shame measures the tendency to experience intense emotions of guilt and shame from the appraisal of self-blame, and to a lesser extent the desire to withdraw from others." Thus, the test does not track shame and guilt, but two different ways of dealing with them.

This brings me to another worry: the TOSCA test is designed to measure a disposition or a character trait, *proneness* to feel shame or guilt in various situations, but in the subsequent interpretation of results, Tangney and Dearing extend their conclusions to individual episodes of these emotions. This is problematic because, as Nelissen et al. (2013, p. 359) explain, the characteristics of the people who are generally predisposed to feel a particular emotion in a wide array of circumstances tell us very little about the *function* and effects of isolated episodes of that emotion in just any person. From the finding that shame-*proneness* is *associated* with low self-esteem, one cannot conclude that all individual episodes of shame *lead to* low self-esteem. The conclusion of Tangney and Dearing's study should be that people with certain character traits or dispositions tend to deal with emotions of self-assessment in counterproductive ways, not that shame is destructive and guilt is constructive across the board.

## **INSUFFICIENT ACCOUNT OF THE ROLE OF OTHERS**

Further, some important elements in determining whether shame will have productive results or not are contextual and depend on interaction. Indeed, De Hooge et al. (2010) have found in their empirical studies that shame *can*, and actually *does*, lead to prosocial behavior in certain circumstances, namely in dyadic interactions where the partners have witnessed the shameful behavior. If somebody does something shameful in front of us, and we see this person react with shame, our opinion of the offender is likely to be much less negative than if this person acts shamelessly. This is so because, from a second-person perspective, shame reveals a concern for other people's opinions, as well as for shared norms and standards, which can counter the effects of a previous failing and partially restore other people's trust in the offending individual.

Tangney and Dearing disregard this. They combine their functionalistic understanding of morality (behavior is considered moral when it tends to favor others at the expense of oneself) with an agent-centered take on it, which overlooks interaction and group dynamics. Actions are judged as morally constructive if, from the agent's perspective, they are in any measure altruistic or other-regarding, and as morally counterproductive if the opposite is the case. But no attention is paid to other people's perceptions of and reactions to displays of these emotions, or to the intersubjective interactions that ensue, which can and often do have prosocial consequences. Those tendencies should be part of a functionalistic story about the role of these emotions in morality, but even this is not enough. In my view, this type of functionalistic and consequentialist approach is too narrow to fully account for the private aspects of morality (self-evaluation, self-transformation, deliberation and decision-making), and it oversimplifies the public ones, reducing them to action tendencies.

Moreover, the abovementioned studies of dyadic interactions only show a small fraction of the important role of others in shame. Rochat (2009) and Seidler (1996, 2000), among others, offer accounts of shame as crucial to the intersubjective development and sustainment of self-consciousness. On these accounts, shame is crucial precisely because it captures the experience of self in relation to others and is the product of a discrepancy between the first- and the third-person perspectives on oneself (Rochat, 2009, pp. 105, 108, 109). This role in self-constitution is also essential to morality in ways that Tangney and Dearing's account cannot do justice to: it is crucial for self-examination, learning and self-transformation. In my view, the intersubjectivity and social self-consciousness that shame entails constitute a ground from which morality can take off. A *capacity* to feel shame would therefore be morally productive in general, not only on the contingent occasions on which shame actually works to foster harmonious social relations. One of the standard, albeit controversial (see Deonna et al., 2011), claims about shame is that it is a social emotion. In my view, the correct way to interpret this claim is not that in every instance of shame I evaluate myself exactly as the other does (an interpretation that has its own share of problems), but rather that this emotion entails a widening of my perspective in which I recognize that a part of who I am escapes my control and depends on the other (see Sartre, 2003). Shame does not include all the elements that moral goodness requires, but it does attest to our openness to others, our "irreducible relationality" (Guenther, 2012, p. 71), and it can show that we take seriously the shared practice of morality (Calhoun, 2004, pp. 139–146).
Before we dismiss shame as morally counterproductive, its crucial role in intersubjective self-constitution needs to be studied in its full complexity (see, e.g., Schneider, 1977; Hutchinson, 2008; Reddy, 2008; Williams, 2008; Rochat, 2009; Guenther, 2011; Zahavi, 2012; León, 2013; Welz, 2014). TOSCA-based research programs overlook or flatten many of these issues, and can therefore offer only a limited picture of the role of shame and guilt in morality.

## **ACKNOWLEDGMENTS**

This work was supported by the Marie-Curie Initial Training Network, TESIS: Towards an Embodied Science of Inter-Subjectivity (FP7-PEOPLE-2010-ITN, 264828). I wish to thank Antonio Gómez Ramos and Dan Zahavi.

## **REFERENCES**


Psychological Association. doi: 10.1037/12326-000


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 09 July 2014; published online: 29 July 2014.*

*Citation: Montes Sánchez A (2014) Intersubjectivity and interaction as crucial for understanding the moral role of shame: a critique of TOSCA-based shame research. Front. Psychol. 5:814. doi: 10.3389/fpsyg.2014.00814*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Montes Sánchez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Invisible excess of sense in social interaction

# *Alice Koubová\**

Department of Contemporary Continental Philosophy, Institute of Philosophy of the Academy of Sciences of the Czech Republic, Prague, Czech Republic

## *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

Elena Clare Cuffari, University of the Basque Country, Spain

#### *\*Correspondence:*

Alice Koubová, Department of Contemporary Continental Philosophy, Institute of Philosophy of the Academy of Sciences of the Czech Republic, Jilska 1, Prague 1, 11000, Czech Republic e-mail: alicekoubova@seznam.cz

The question of visibility and invisibility in social understanding is examined here. First, the phenomenological account of expressive phenomena and key ideas of the participatory sense-making theory are presented with regard to the issue of visibility. These accounts argue for the principal visibility of agents in interaction. Although participatory sense-making does not completely rule out the existence of opacity and invisible aspects of agents in interaction, it assumes the capacity of agents to integrate disruptions, opacity and misunderstandings in mutual modulation. In this framework, invisibility is classified as the dialectical counterpart of visibility, i.e., as a lack of sense whereby the dynamics of perpetual asking, of coping with each other and of improvements in interpretation are brought into play. By means of empirical exemplification, this article aims to demonstrate aspects of invisibility in social interaction that complement the enactive interpretation. Without falling back into Cartesianism, it shows through a dramaturgical analysis of a practice called "(Inter)acting with the inner partner" that social interaction includes elements of opacity and invisibility whose role is performative. This means that opacity is neither an obstacle to be overcome with more precise understanding nor a lack of meaning, but rather an excess of sense, a "hiddenness" of something real that has an "active power" (Merleau-Ponty). In this way it contributes to ongoing social understanding as a hidden potentiality that naturally enriches, amplifies and in part constitutes human participation in social interactions. It is also shown here that this invisible excess of sense already functions on the level of self-relationship, due to the essential self-opacity and self-alterity of each agent of social interaction.
The analysis consequently raises two issues: the question of the enactive ethical stance toward the alterity of the other and the question of the autonomy of the self-opaque agent.

**Keywords: participatory sense-making, enactive theory, Merleau-Ponty, invisibility, opacity, (Inter)acting with the inner partner, performativity, dramaturgical analysis**

## **INTRODUCTION**

Starting from a basic agreement with key ideas of the enactive theory of social understanding, especially with the concept of participatory sense-making, this paper presents a detailed examination of the question of the visibility and invisibility of agents in social interaction. In order to avoid Cartesian homuncularity, enactive theory argues for the principal visibility of agents, their *a priori* given tendency to understand each other, and their capacity to integrate disruptions, opacity and misunderstandings in mutual modulation. While it does not fully deny opacity and invisible aspects of agents in interaction, it regards these as the dialectical counterpart of visibility, as a lack of sense that brings about the dynamics of social understanding in the form of asking, of coping with each other, and of improving interpretations. The aim of this article is to focus on an aspect of invisibility in social interaction that complements the enactive interpretation. Without lapsing into Cartesianism, I wish to exemplify several levels of invisibility in social interaction (physical, social, self-relational, and intersubjective). In particular, I wish to analyze a hypothesis according to which there is an aspect of invisibility that functions as a subtle *source* of ungraspable meaning, as an excess of sense, whose function is *performative*. This aspect, I argue, contributes to ongoing social understanding as a hidden potentiality that naturally enriches, amplifies and in part constitutes human participation in social interactions. In the second part of the research, I wish to draw upon experimental work to make it clear that an agent's invisibility (opacity) is already present in the self-relationship and grounds the non-trivial structure of self-alterity.

I wish to demonstrate these aspects of social understanding on the basis of qualitative research carried out within longitudinal studies of theatricality, performativity, and art-based practices at the Institute for Research and Study of Authorial Acting (IRSAA), at the Academy of Performing Arts in Prague and the Academy of Sciences of the Czech Republic. The research is limited to one experiment, called "(Inter)acting with the inner partner," and its potential to exemplify self-alterity and the invisible excess of sense. Methodologically, I mainly draw upon the research of the sociologist Goffman (1956) and his dramaturgical analysis (see also Hare and Blumberg, 1988), but the examination of symbolic interaction in everyday life, the phenomenological method, and participant observation are also important tools for the present study.

## **THEORETICAL BACKGROUND**

The problem of social cognition has been one of the most burning issues in psychology and cognitive science over the last several decades. The original "problem of other minds" that presupposes the existence of hidden mental, interior and private space represented secondarily through different linguistic, corporeal, and gestural manifestations has been widely criticized. The critique arose in phenomenological philosophy and has undergone further elaboration and development in the theory of embodied and enactive cognition.

As regards the phenomenological response to the problem of other minds, Scheler (1973 [1912], 232–234) already proposed the concept of expressive unity (*Ausdruckseinheit*). He attempted to show that expression (corporeal behavior, action, gestures, discursive articulation) is not merely a secondary visible manifestation of a *psyche*, but at the very least an integral part of it. This idea has been developed further by many other authors. Plessner (2003 [1941], 261) says that the unity of an expressive phenomenon is given by "indifference between content and form," as is evident in the case of primary intersubjective expressive phenomena, such as laughter or crying, where sign and meaning cannot be linked together arbitrarily. We may also refer to Maurice Merleau-Ponty, who writes: "We must reject the prejudice which makes 'inner realities' out of love, hate or anger, leaving them accessible to one single witness: the person who feels them. Anger, shame, hate, and love are not psychic facts hidden at the bottom of another's consciousness: they are types of behavior or styles of conduct which are visible from the outside. They exist on this face or in those gestures, not hidden behind them" (Merleau-Ponty, 1964a, 52–53). In this way, phenomenology assumes that social cognition has to do with perception (*aisthesis*) and that a human being is in principle "visible" to other human beings.

The enactive theory of embodied cognition moves in a similar direction. It attempts to overcome Cartesianism and its third-person paradigm, in which social cognition is understood as the passive observation of others' behavior. As many have shown (Varela et al., 1991; Thompson and Varela, 2001; Gallagher and Varela, 2003; Thompson, 2007; Hutto, 2013; Hutto and Myin, 2013), the mind cannot be reduced to brain processes or internal representations, but is "an ongoing and situated activity" (De Jaegher and Di Paolo, 2007, 486). This entails that social understanding should be conceived as "an interactional and intercorporeal process in which both partners are immersed and in which the process of interacting itself plays a leading role for the understanding... In short: social cognition emerges from embodied social interaction or, in Merleau-Ponty's term, from intercorporeality" (Fuchs and De Jaegher, 2009, 469).

This familiar context is important for the purpose of this paper in one particular respect, namely as regards the principal visibility of other minds. The enactive theory opposes the Theory Theory (Premack and Woodruff, 1978; Baron-Cohen et al., 1986; Antonietti et al., 2006) and the Simulation Theory (Gordon, 1996; Dokic and Proust, 2002; Goldman, 2006), which share the common presupposition of "homuncularity, the absence of body" (De Jaegher and Di Paolo, 2007, 485), in other words the idea that "[human beings] are hidden from each other in principle" (Fuchs and De Jaegher, 2009, 467). Enactivism explains this presupposition of the Theory Theory and the Simulation Theory by the fact that both stem from Cartesian dualism. In accordance with the phenomenological approaches mentioned above, the theory of enaction operates on the assumption that agents' actions can be understood "as exhibiting an inherent and 'visible' intentionality and as being related to each other in a meaningful way" (Fuchs and De Jaegher, 2009, 467). This is possible because the very substance of what one means in interaction is always embodied. "Bodily expression does not mean a simple subsequent externalization of what already is inside me, but rather expression is a realization of sense" (Waldenfels, 2000, 222); "For the enactivist the body is the ultimate source of significance; embodiment means that mind is inherent in the precarious, active... process of animation... Cognition simply cannot but be embodied" (Di Paolo et al., 2010, 42). The mutual visibility of agents is explained by the genealogy of social understanding, which has a firm basis in intercorporeality. This genealogy points to basic empathy as developed in the first month of human life on the basis of perpetual corporeal interaction between the child and her most intimate caregivers (Tronick et al., 1979; Murray and Trevarthen, 1985; Kelso, 1995). This relationship is based on "trust in others and bonding capacity," where "mismatches," i.e., "interactive errors," are followed "with quick reparations," and reparation becomes a key process of social understanding (Fuchs and De Jaegher, 2009, 479). This leads directly to the *primacy of visibility*, which is potentially able to integrate invisibility, disruption, and misunderstanding: "interactional experience continually increases the skillfulness of the participants... This is based on the 'visibility' of intentions-in-action" (Fuchs and De Jaegher, 2009, 471).
Visibility is further reinforced by the phenomena of "coordination" as a "ubiquitous phenomenon in physical and biological systems" (De Jaegher and Di Paolo, 2007, 490) and "mutual modulation" (De Jaegher and Di Paolo, 2007, 504) that both play a crucial role in the generation of meaning called "participatory sense-making" (De Jaegher and Di Paolo, 2007). The expressive phenomenon is understood as a *unity* that is fully at stake in the process of mutual understanding.

Obviously, enactive theory does not fully deny the relevance of opacity for social interaction and understanding; such an interpretation would be false and misleading. There are basically two strategies for dealing with the opacity or alterity of the other in social interaction. The first emphasizes the radical alterity and non-transparency of the other (e.g., Lévinas, 1979, 89; Theory Theory; Simulation Theory). However, as Zahavi (2005, 175) notes, "the difficulty with this view is that it often tends to emphasize the transcendence and elusiveness of the other to such extent that it not only denies the existence of a functioning intersubjectivity, but also the *a priori* status of intersubjectivity." This is indeed the reason why the enactive theory rejects such an approach. The second strategy tends to see opacity as a relevant aspect of the shared process of coping and social understanding (e.g., Husserl on the interplay between ipseity and alterity: Hua 8/62; 14/457; 13/263). This perspective characterizes the participatory sense-making approach as well. It is based on the assumption of the primary visibility of the other, as I mentioned above. On this view, the other is never "totally alien" (De Jaegher and Di Paolo, 2007, 504). Because the other appears due to "my own participation in the emergence and breakdown of joint relational sense-making" (De Jaegher and Di Paolo, 2007, 504), we cannot be completely alien to each other. The authors state that agents taking part in interaction do not "experience the other-in-interaction as totally obscure and inaccessible, nor as fully transparent, but... as protean pattern with knowable and unknowable surfaces and angles" (De Jaegher and Di Paolo, 2007, 504). However, the basic unity of the shared process enables "mutual modulation" and makes visible within the process of social interaction *only those aspects* that *make sense to me* as a participant. It is a continual process in which misunderstandings, as the "dialectical counterpart of understanding," *serve* to initiate the process and keep it going, "like questions that lead to answers in the subsequent course of the interaction" (Fuchs and De Jaegher, 2009, 471). The assumption of the primary visibility of the other and the phenomena of mutual modulation and coordination may be seen as characteristics of a primary readiness and tendency to understand the other, to interpret her as meaningful.

## **FORMULATION OF SPECIFIC FOCUS OF THE STUDY: THE INVISIBLE EXCESS OF SENSE**

While agreeing in principle with this way of interpreting social cognition, I wish nevertheless to focus in some detail on one particular and rather subtle aspect of invisibility. I hope that taking this phenomenon into consideration will enable us to uncover a new dimension of opacity in social interaction besides radical hidden transcendence (the absolute form of the intangibility of the other). First of all, I wish to follow Zahavi's (2005, 175) assumption that "the encounter with the other is, in any way, prepared and conditioned by an alterity internal to the self." I wish to demonstrate on the basis of my experimental study that there is a "form of alterity internal to the embodied self," and that the self-experience of subjectivity must contain a dimension of otherness, without which intersubjectivity would be impossible (Merleau-Ponty, 1945/1962, 400–401; Zahavi, 2005, 158). Secondly, I wish to demonstrate that this otherness is not only a dialectical counterpart of our unity, not only a lack of sense that calls for fulfillment or a question that calls for an answer, but rather a *performative* aspect of our existence that can obtain only as invisible and can contribute to social interaction only if we let it run its course. In other words, there is an aspect of invisibility that contributes to social interaction and represents something that *makes sense* in social interaction, although it is not within the agent's reach and comprehension. This is why I call this aspect the invisible excess of sense.

To demonstrate this sort of phenomenon and to support my thesis, I wish to refer briefly to Merleau-Ponty's description of opacity in two layers, one corporeal and one cognitive. First of all, he emphasizes that visibility is, already on the corporeal level, given under the condition of non-transparency, invisibility. Human beings are visible because they are not transparent. In other words, one can be visible to somebody only if one is invisible in some respect, if one offers resistance to perception. As Merleau-Ponty (1964b, 167) put it: "Experience is... contact of finite subject with impenetrable being... the flesh of what is perceived, this compact particle... stops exploration." Thus, the condition of the possibility of the visibility of the other seems to be not the willingness to be visible, but rather resistance to the gaze of the other. The hiddenness of perceived being is given in the density, the thickness of materiality, which impedes any further seizing hold of it. Merleau-Ponty urges us not to seize hold of the real, but to become accustomed to its mystery, its hiddenness, its "being in withdrawal." He warns that as soon as we create our possibilities, goals or ideas out of the real, we lose sight of a certain layer of things, a layer of inexhaustible riches opened to our gaze.

As regards the "cognitive" layer, Merleau-Ponty (1968, 150) speaks of ideas that can exist only in a hidden way: "It is as though the secrecy wherein they lie... were their proper mode of existence." By this Merleau-Ponty does not mean that these ideas are abstract and invisible, pure, and intangible. He points out that their visibility is necessarily covered and that they can exist only in a covered way: "there is no vision without the screen: the ideas we are speaking of would not be better known to us if we had no body and no sensibility; ... [they] are not exhausted by their manifestations." Merleau-Ponty thus reminds us of the complex relationship between the visible (perceptible) and invisible aspects of meaning. He states that there are "ideas" whose existence is necessarily embodied, although this embodiment makes them unavoidably opaque. There is nothing like an abstract idea subsequently transformed into a sign graspable by the receiver. These ideas can exist only as hidden and are never complete in their manifestation. Their being is embodied, but their meaning is not totally given by perceptible signs. It is precisely this invisibility that gives them their performative power, their authority in the shared perceptible world: "It is that they owe their authority, their fascinating, indestructible power precisely to the fact that they are in transparency behind the sensible" (Merleau-Ponty, 1968, 150). We should understand from this that visible meaning is sometimes accompanied by an invisible aspect that amplifies the "active power" of what is being perceived. Thanks to their opacity, ideas address their receivers, provoke them to action, and move them. They are performative. This view is not restricted to the thought of Merleau-Ponty. Ricoeur (2000, 337), for instance, mentions the existence of an unsignifiable element in human experience that is a source of meaning without ever being addressed.

## **METHODOLOGICAL BACKGROUND**

As I have already mentioned, the existence of the invisible excess of sense and the inner duality based on irreducible self-difference and self-opacity of the agent are to be shown through experimental investigation using art-based practices and methods of qualitative research.

Art-based and theater-based approaches to the study of social interaction have a long-standing tradition in academic research. The wider context of the present research can be seen in the work of the following researchers and their conceptions: the dramaturgical analysis of symbolic interaction in everyday life by the sociologist Goffman (1956; see also Hare and Blumberg, 1988); Berne's (1964) transactional analysis, inspired by theater practice, and his view of the games we all play in social interaction; sociodrama, sociometry, and psychodrama in the work of Moreno and Moreno (1975a,b, 1983); the anthropological account based on theater metaphors given by Blumenberg (1989); and the ontology of play developed by Fink (1960, 2012). If we understand individuals as agents or actors, then social interactions can be viewed as dramatic productions. The resonances and conflicts in social interaction, reflecting processes and pathologies, role play, the issue of the visibility and invisibility of an individual to the other, the exposure of the actor to the public, the relationship between labor and entertainment, and inter-corporeality in social relations are studied in this context by thoroughly examining the agent's experience on stage.

#### **"(INTER)ACTING WITH THE INNER PARTNER." SETTING**

My analysis is meant as a modest contribution to this field of investigation. It focuses on one experiment, which involves a detailed study of self-interaction and social interaction in an empty space (stage) under special conditions. The experiment "(Inter)acting with the inner partner" has been developed by Ivan Vyskočil, a Czech psychologist, pedagogue, philosopher, playwright, and writer, and it has been practiced at the Academy of Performing Arts in Prague for the last 20 years (Groenewald, 2004).

The practice is organized in the form of semester courses (September–January, February–June) for the general public, i.e., for people who take the course out of their own interest. Each year there are approximately 10 parallel courses of the practice. Participants differ in age, affiliation, gender, nationality, and professional background. Their initial motivations also differ; among them are curiosity based on the renown of the practice and its author, self-knowledge and self-development, and artistic inclination. "(Inter)acting with the inner partner" is not a method focused solely on the training of future actors; rather, it gives interested people the capacity to act openly in front of others and to develop their awareness of body, mind, and action. The experiment takes place in sessions. Each session lasts an hour and a half and is held once a week for a minimum of 4 months. A significant effect of the experiment, however, is usually achieved only after about a year of rehearsal. The group of participants is closed, and their number is limited to 15. Each session is led by one or two leaders, professionals with extensive experience both in the practice itself and in leading it. The sessions take place in a bright, empty classroom with a high ceiling, an empty space (stage), and the appropriate number of seats. The Academy of Performing Arts guarantees the ethical approval of the practice: the experiment is fully voluntary. If a participant concludes after several sessions that she does not wish to continue, she may choose to stay in the group without practicing or simply to leave. This is extremely rare, however, even though the practice is sometimes frustrating, especially in its initial phase. The favorable atmosphere helps people concentrate on what can be gained from the practice.

## **DESCRIPTION**

The experiment consists of entering the empty space (stage), where one is seen by the other participants, and experimenting there for 2 to 5 min. The experimenting participant is thus alone in a field of the onlookers' attention without any aids (e.g., music, props, costume). She is given no task in advance, no role to play, no object to deal with. She appears in a situation of so-called "public solitude." Public solitude is understood as a situation in which the participant does not contact the spectators in any way, especially visually or physically. It is "as if" they were not present. The spectators, however, are not fully detached observers. They are encouraged to support the actor with "favorable attention," which means that there is no loss of intersubjectivity despite the lack of discursive contact and direct eye contact.

Within this time frame, the participant is encouraged to "go out of oneself," "express oneself in a sufficiently intense way," and "come back toward oneself" through voice, body expression, and speech (Vyskočil, 2005, 4)<sup>1</sup>. This means that the performer expresses herself in such a way that the expression attains some sort of autonomy: it becomes a meaningful "figure" (i.e., an autonomous, unequivocal expression) in space and time that can be addressed back to the performer and make her react as someone else.

Each individual trial is ended after the preset time by an auditory signal. The participant then joins her seated colleagues and discusses observations and comments about what she did in the space: when exactly "intense" moments with potential for acting appeared, under which conditions the action could proceed, what exactly blocked the action, how psychosomatic balance or disorder affected her capacity to be present in the situation in front of other people, etc. The discussion is facilitated by the leader, who usually gives most of the comments. After the discussion, another participant volunteers to go into the empty space and practice. There are usually two or three rounds for each participant during a session. Participants are encouraged to note and articulate their experiences, discoveries, observations, and ideas in regular written reflections, which enable them to fix key moments in the development of the process.

## **DATA GATHERING**

The practice has been thoroughly developed and studied with respect to many features (self-consciousness, creativity, communication skills, psychological effects, group dynamics, etc.). The research is carried out at the IRSAA (https://www.damu.cz/cs/umeni-veda-vyzkum/ustavy/ustav-pro-vyzkum-a-studium-autorskeho-herectvi) at the Academy of Performing Arts in Prague. The institute provides systematic data gathering and outcome analysis of the practice (Vyskočil, 2005; Slavíková, 2009; Suda, 2009; Chrz, 2010a,b,c). The data gathered from 1995 to the present consist of:


<sup>1</sup>In order to emphasize and support this capacity, the leader of the session asks the participants at the outset to bring to mind, for the sake of guidance, the experience of being with oneself in interaction, talking to oneself, or playing by oneself in pure solitude. This usually happens when we are alone and either face an issue (as when we make a decision and hesitate, or remember an embarrassing situation) or are bored or relaxed (making faces in the mirror during a long elevator ride, singing and narrating in the bathroom). Such basic life situations already show a certain non-trivial interactive self-relationship of the human being in moments of emptiness, lacking any explicit interaction with other objects or subjects. The question is: what happens if this playful interaction with oneself as another is induced in front of others?

assorted, and accessible in the Archive of the IRSAA, Academy of Performing Arts, Prague, Czech Republic.

(3) Interviews with Ivan Vyskočil explaining the main principles of the practice, some of which are available online at http://www.ivanvyskocil.cz/ or http://www.interactingwiththeinnerpartner.org/Downloads\_&\_Links\_files/A%20Discussion%20with%20Ivan%20Vyskocil%20about%20IwIP.pdf, codified and accessible in the Archive of IRSAA, Academy of Performing Arts, Prague, Czech Republic.

## *Data used for presented study*

The results I present in this paper are based on my participatory observation and dramaturgical analysis of one closed group that met from September 2012 to June 2013. There were 36 sessions in total, each lasting an hour and a half. The group consisted of 15 participants (six men, nine women): 10 of them were 20–30 years old, 3 were 30–40 years old, and 2 were 40–55 years old. Seven participants were university students of philosophy, two were IT professionals, two were students of authorial acting, one was unemployed, one was a professional translator (Czech-Japanese), one was a professional anthropologist, and one was on maternity leave. The data I used to investigate the main idea of this paper consisted of:


## **RESEARCH METHODS**

For investigation of "(Inter)acting with the inner partner" I used the following methodology: dramaturgical analysis, participatory observation, and the phenomenological approach.

In accordance with Goffman's dramaturgical analysis, I understood participants as performers in a dramatic situation who present themselves at the beginning of the practice "as such and such" in order to satisfy or resist cultural norms, values, and expectations (Bochner, 2001; Spry, 2001; Jago, 2002). Vyskočil (1981) describes "(Inter)acting with the inner partner" as a "laboratory of (inter)action in dramatic situation." This convergence enables me to use dramaturgical analysis as a method for investigating in detail how identities, values, meanings, opacity, and relations are constituted and enacted under the stable conditions and protocols of the practice described in Section "Description." These features were studied through participatory observation and the observation of participation (Malinowski, 1922; Firth, 1985; Tedlock, 1991, 2000; Clough, 1992). I made use of the following: direct observation as a member of the audience; participatory observation during informal meetings with other participants and in collective discussions about the practice; analyses of personal written reflections by participants; and narratives on the development and transformation of other members of the group.

For participants, the phenomenological contribution to this methodology was very important. The setting of public solitude enables them to adopt Husserl's idea of *epoché* (for the relevance of this method, see Moustakas, 1994; Sadala and Adorno, 2001; Groenewald, 2004). *Epoché* means bracketing (withdrawing from personal consideration, Groenewald, 2004, 50) the participant's direct intentional reliance on other subjects and objects of the world. This bracketing concerns "pre-given coordination" (De Jaegher and Di Paolo, 2007, 495), i.e., the obvious ways of being and thinking in public through coded, normative forms of interaction. Participants were made aware that the practice enables them to distance themselves from their automatic, coded forms of performance in the public world, to become conscious of this way of acting in public, and to study the structures of their action without abandoning action as such (Creswell, 1998, 54 and 113; Moustakas, 1994, 90).

# **RESULTS**

In this section I describe the results of one year of participatory observation and dramaturgical analysis of a group of beginners, as described in Section "Data Used for Presented Study." As I mentioned in the Introduction, my research (inspired by Merleau-Ponty's ideas) focused on:


Because the effect of the practice is not only cognitive but also transformative, and develops over time, the observations relevant to the research changed significantly as the sessions progressed. For this reason I decided to divide the description of the observations into stages.

# **UNCANNY CHAOS**

The situation of being visible to others without any task to perform and without any *a priori* given role marks the initial period of the practice. This period typically lasts between 6 and 10 sessions. Participants usually describe it as a situation of the deepest confusion, chaos, uncanny experience, frustration, embarrassment, fear, anxiety, threatening exposure, and emptiness. As participants' personal notes and interviews document, the negative emotions stem, according to the actors, from the fact that they cannot use their obvious, codified ways of social behavior. One participant writes: "When I appeared in the space, I could not identify myself with anything particular. I was nobody suddenly. It was unbearable." Others add: "What is the most embarrassing is that I cannot use my usual tricks," "There is nowhere to hide, I am like infinitely exposed." The function of social roles becomes evident in the following comments: "If I should not have a role, I do not know what the others want from me." "I do not feel safe if there is no role for me." Participants agree that they do not know who they are if there is no "pre-given coordination," no coded game of roles, no possibility of being visible "as this and this" – as a clever guy, beautiful lady, rebel, bored intellectual, engaged socialist.

During the initial experiments, participants react to this situation either with a chaotic "overtension," expressed in unlimited speaking, chaotic movement on stage, and fighting with the situation, or, on the other hand, with a very remarkable "undertension": loss of effort, depressive behavior, physical resignation, flight, and freezing. Participants' written notes again document this feature. A young participant explains: "When I first tried the IwIP I stood in the corner of the room and could not move, it was like I was stuck." The reaction of another agent was different: "I only was able to run in the circles around the space faster and faster." Participants also had quite liminal reactions: "My first attempt looked like a very intense training in martial art," or "I just lay down on the floor and hid my head in my arms. It was like an overwhelming 'nothing' all around me."

This state can be designated with Vyskočil's term "state of insensibility," a sort of trance resulting from the individual's obvious dependence on public expectations and rule-governed social interactions. This very frustrating initial stage, however, does not discourage participants from continuing to practice. Their curiosity is greater than the negative feelings about the first rehearsals. As they comment: "It attracts me in spite of the fact that I do not absolutely know what this can bring," "There is something intriguing in it, I wish to find it out," "It looks like nonsense, but I kept thinking of it the whole week. I am extremely nervous before each trial but immediately after the session I wish to try it again."

This stage of the experiment does not reveal much with respect to our hypothesis that social interaction includes performative elements of opacity and invisibility, and that this invisible excess of sense already functions at the level of self-relationship owing to the essential self-opacity and self-alterity of each agent of social interaction. In terms of our hypothesis, this stage of the experiment is preparatory. Nevertheless, it shows important aspects of visibility and invisibility in social interaction in agreement with enaction theory:


then as an interplay of coded roles, habitus (Bourdieu, 1990, cit. op. De Jaegher and Di Paolo, 2007, 495).

(C) It is important to notice that participants are attracted by the practice despite the frustration it brings. The curiosity of initially frustrated participants testifies to the human ability, and even the will, to cope with non-obvious situations beyond pre-given coordination (De Jaegher and Di Paolo, 2007, 495), and to wish to act otherwise than in a pre-coded way.

## **PHYSICAL NON-TRANSPARENCY AS THE FIRST FORM OF INVISIBILITY**

The next stage of the experiment begins after about six sessions (a month and a half of practice). At this stage participants slowly allow themselves to calm down, concentrate, loosen up, perceive, and express themselves. In this period they very often combine a tendency to imitate, copy, accept, and produce various prefabricated patterns and standards (in order to amuse the observers, to avoid looking silly in front of them, to escape from the uncanny situation, etc.) with a sort of acknowledgment that they are simply here as they are. Their physical presence seems sufficient for being visible. They become conscious of their essential non-transparency, their resistance to the gaze of others. As one participant says: "For so many weeks I have tried to perform something interesting here, but I finally found this always so stupid. So now I decided just to stand up in the center of the room. I told myself: nobody can harm me, let them watch if they want to. This is me. I was standing there for a very long time. I felt like a rock, or statue, full of meaning suddenly."

The key observation in this stage consists in becoming *aware of oneself as of the other on the physical level.* This is documented by the following commentary: "I started touching myself with my hand and explored the boundaries of my body, those of my face, of my neck, of the other hand, of the back. It was like discovering myself in the space, physically. I realized I was there as a body. This calmed me down, it was sufficient to be there and feel my boundaries. I was there like this for the others as well."

The awareness of self-alterity sometimes even led to creative play: "It was for the first time I really stopped focusing on what the others think of me. I was uniquely interested in the way my hand was moving around my body. It was like a small butterfly touching me at different places. And at the moment I told myself it was a butterfly, my hand was more and more like this and my body changed into a flower. It was amazing, I could just play with it." Another form of self-alterity was found through vocal communication: "I had a problem with my voice. I could not speak loudly, I felt ashamed. But this time I told myself: well, the worst thing that can happen is that I will be stupid before them as usual. So I cried out loud and it really scared me. It was like the voice of a stranger. But I cried back, telling it that he scared me and that he should stop immediately. And the first answered he could not stop until I would calm down. And I answered I could not calm down while he kept scaring me. And then he proposed to me that we should cry together and show we are here. It was a sudden change and even very amusing one. I completely forgot about the fear in front of the audience and laughed a lot."

These comments already document some aspects of the research hypothesis:


## **BACK-STAGE AND FRONT-STAGE: SOCIAL INVISIBILITY**

After approximately 10 sessions, a new stage of experimentation emerges. Participants feel sufficiently assured by their physical resistance and by being in relation with themselves through physical contact that they start to observe the duality of roles and "non-coded behavior." What very often comes to their mind at this stage is the idea that they are hiding something very true ("backstage") behind their prefabricated roles ("front-stage"). This is a very personal part of the experiment, in which agents have the impression of uncovering the alleged "secret Self," a hidden sphere of themselves. A young participant's description demonstrates the effect of uncovering a secret that has not yet been embodied and enacted: "Up to now I have always controlled myself in order not to show the truth. But the experiments always brought me to this point. So this time I followed the impulse and transformed into a child. A child in the uterus. I had my eyes closed, was lying on the floor huddled and moved very slightly. I thought I would stay there forever because it was the most secret Self I had. But after some time, I do not know how long it took, it started to be somehow boring. It did not interest me anymore. It was very surprising for me. I stood up and told the child: it's time to be born, don't you think?" The way in which utterance and exposure transform the content of a secret idea is illustrated by the following example: "When I for the first time said loudly I was stupid and again stupid and stupid, it sounded suddenly not like the only truth about myself anymore. I had to react by saying that I objected. The stupid one still insisted on being stupid and the other figure tried to tell him it was a nonsense. The secret truth transformed into a good piece of a dual game." A more general view of the relationship between personal engagement and self-differentiation is offered by a third comment: "The more I am personal in the experiment, the more I see I have many different aspects or figures linked to each other. There is nothing like the only true Self. It is rather a dialog among different agents."

With respect to the hypothesis, we can note the following at this stage of the experiment:


## **DISCOVERY OF THE OPAQUE OTHER: SELF-RELATIONAL INVISIBILITY**

The fourth stage of the experiment starts after approximately 12 sessions. With an increasing number of attempts, supported by growing trust in the positive feedback from the audience, participants start not only to orient themselves in the situation but even to enjoy it in some respects. Enjoyment appears when the participant begins to relax, slow down, and become more curious about *what is happening* instead of focusing on herself, her exposure, and her visibility. This shift of focus moves from the alleged hidden self toward minuscule events that happen to the agent: for instance, a slight motion of the fingers, ideas unfolding in the mind, a fissure in the wall, the sound of steps. These events can be understood as so-called impulses, triggers of unknown expressive forms that are yet to be. Through expressive amplification, the trigger develops into a so-called figure, i.e., a discernible, unequivocal, and complex expression with clear contours and meaning (for example, a slight motion of the fingers slowly becomes a mother waving goodbye to her child going to school for the first time, or a dancer in a group around a fireplace).

The following example documents this process very clearly: "I started with a slight balancing on my feet. I was balancing in this way until I realized it was like being on a ship. I balanced a little bit more and it made me wave to people who, I imagined, were waving to me. "I will get back soon" I cried out to them. "Good luck, our hero!" They cried out. (I changed into them for a while) I waved three more times and then I had a sudden impression I was completely alone on the open ocean and my waving is useless. "What shall I now do with this waving hand?" I asked. But at the same moment the hand was already acting as a magic animal who tried to bite me. My reaction was to bite back at the beast. We fought for a moment and then the signal stopped the play." This example shows the function of imagination and playfulness at this stage of the experiment. The participant is no longer concerned with himself personally. He is capable of following the "logic of the play," which has its own rules. This capacity includes the readiness to change one's own stance and bodily schema at the right moment of the play. The key skill in "(Inter)acting with the inner partner" is thus to follow the order of the play, not that of one's own fixed form. It includes catching the moment when the intense expression receives "an answer from the other side" (Chrz, 2010a, 154). "The other side" names an answer that is not located *a priori*: the emergence of a response within a situation. The other side does not mean a deeper secret Self or an alter ego, but the situational opposite, a surprising emergent phenomenon that balances the hitherto monological way of being and acting (e.g., the change from balancing to waving, from waving to crying, from crying to reflecting, from reflecting to biting).

Another example shows that the dialogical structure can also appear in discursive form: "I was walking in the circles in the space. After quite a long time I told myself it was too boring to walk in this way around the room. 'Can you do something more interesting, so that I can react on it?' I asked. 'No' was the response, 'I am a very boring sophisticated philosopher who does nothing but walk in a boring way and produce boring ideas. Do you want to hear some?' 'Yes please' was the answer, 'it sounds very attractive in the end. Tell me the most boring one you have. Are you paid for this sort of thinking? How much do you get?'" This example again shows that the agent does not identify herself with one or the other expressive figure but follows the dialog between them. She performs not as an individual agent but as a *dividual* agent, i.e., an agent capable of existing as divided, in different aspects or identities. None of the aspects in this alternation is more essential than the others. This playful interaction with oneself must have a specific tempo-rhythm so that it neither fades out nor explodes. The practitioner should respect the rules of her own play.

In the third example, a participant directly points out the experience of surprise and fascination: "I walked very slowly in the space. Everything seemed to me boring. I told it aloud: 'How the world is boring... nothing happens at all, nothing, nothing.' But when I pronounced the word 'nothing' it started to interest me that it can be pronounced as a sound made by a barking dog. It was extremely surprising to me that I transformed from a bored IT into a dog, but it gave me so much energy at the same time. I was fascinated by each sound I pronounced and the situation began to clarify itself. I was so deeply immersed in the play that I even misheard the auditory signal." This example shows that subtle sensitivity to what is already happening in the situation brings new forms of meaning. The "nothing" was transformed into a source of meaning that gave the participant "energy" and "fascination." Such moments are usually very surprising because they are not given *a priori*. They accompany standard utterances and expressions as their marginal, even invisible, aspects.

The audience reaction at this stage of practice is very significant. The creative withdrawal of the participant from her personality has a paradoxical effect on observers. One observer commented as follows concerning the first example: "She is not speaking of her own accord, but yes, now she has a sparkle, something that is of her own. She is attentive and exact." The other commented: "She fascinated me by some unknown subtlety." The second example was commented on in the following way: "I do not know why, but his action was suddenly addressing me. It is precise and strong. I have to think on it intensively."; "He uncovers something general in his action that attracts my attention." The third example was commented on as follows: "There is something that influences me, fixes my attention all the time, a sort of secret that is extremely powerful."; "I so much like the moments of subtle concentration on the play when it happens, when it starts to make sense. I seem to come alive at this moment and reflect on what is going on."

The "subtlety," very often described through metaphors as an idiosyncratic "color," "taste," or "sparkle" of the action, is what I wish to denote as the invisible excess of sense. According to participants' descriptions, some ungraspable aspect of the action moves them, fascinates them, and makes them attentive and reflective. For them, the opacity of such aspects carries a performative force. This excess could even be understood as an artistic dimension of our expressivity.

This stage of the experiment is consistent with both points of the hypothesis:


## **EXPERIMENT FOR MORE AGENTS: INTERSUBJECTIVE INVISIBILITY AND ETHICAL ATTITUDE**

The final stage of the 1-year experiment consisted of the interaction of several participants in the space. The experiment had the same setting as the individual practice, except that there were now two participants in the space together. Their goal was to establish mutual contact using the same guidelines as in the individual practice (no roles, no eye contact with the audience, relaxed attention, expression). Because they had all been practicing for quite some time, their interaction had the character of relaxed and attentive improvisation that took into account the opacity of another human being in the space. As the following comment shows, participants were able to "give space" to each other: "The appearance of my colleague brought into play a new form of opacity. I did not know exactly what he meant and wanted, but I knew it was necessary to wait for a while. I made some movements with my hands to indicate a widening of space. Then he started to sing." However, they also agreed that their inner dialogical structure resembles that of interaction: "For me interacting with another person was not that different from the individual rehearsal. She was there as I am for me, as another. I interacted with her as with another inner partner. She surprised me, which created a good field of energy."

We can see that participants played along with the opacity of the other without needing to address it directly. This stance includes accepting that there are aspects of social interaction that are none of our business, even though they participate in the situation. This raises questions about the ethical attitude toward the invisibility of the other, or the ethical extent of intersubjective invisibility.

# **EXPERIMENT AND REAL SOCIAL INTERACTION**

"(Inter)acting with the inner partner" places people in an experimental situation that is relatively lacking in obviousness. Despite this, the experiment has direct implications for everyday intersubjectivity.


stepping out of rigid habits and contributing to participatory sense-making through their specific relaxed concentration. This concentration enables the development and performance of surprising, invisible, but profound aspects of the other. Comments collected through interviews demonstrate this: "'(Inter)acting with the inner partner' changed completely my view of my surroundings. I observe people differently, try to develop our interaction from other sources, not from the norms and this sort of stuff"; "When I take the metro, I observe how people interact when they have little space or when they are in a hurry. It is very funny to see it as a game. And it is even funnier if I propose some new way of behavior there, as I once made something like sport commentary about who will be the first to step into the wagon and people immediately relaxed and started to laugh"; "Every social situation can be creative. I feel like a part of a vast network where surprising things may happen, like gifts from nowhere. They are to be noticed merely."

# **CONCLUSION**

By means of an empirical exemplification, I hope to have demonstrated the following conclusion: social interaction includes elements of opacity and invisibility. These elements play a particular role in social interaction, and this role is performative. This means that opacity is neither an obstacle to be overcome through more precise understanding nor a lack of meaning, but an excess of meaning, a "hiddenness" of the real that has an "active power" (Merleau-Ponty). The description of the practice showed that we can sensitize ourselves to the invisible excess of sense on the physical level, on the level of social norms, and on the levels of self-relationship and intersubjective relationships. Aspects of invisibility are partially described in enaction theory and in phenomenology. My goal was mainly to underline two important aspects of invisibility that have not yet been developed in detail in participatory sense-making theory.

The first point concerns the self-opacity of each agent of social interaction and the dividual, dialogical character of her self-interaction. The self-relationship is characterized as the ability to see oneself as another. It occurs as an interaction among non-identical aspects that correspond with one another on the basis of a temporal rhythm and the regularity of their dynamics. This observation raises new questions for further research, especially concerning the autonomy of an agent in social understanding. The theory of participatory sense-making proposes the idea of a "multi-dimensional complex of identities that coexist in what we call a subject" (De Jaegher and Di Paolo, 2007, 503), but this element has not been developed in detail in terms of its unity. Instead, the autonomy of living systems is defined as "the property of operational closure" and "the virtue of their self-generated identity as distinct entities" (De Jaegher and Di Paolo, 2007, 487). Can this definition of unity be explained by, or even shifted toward, a self-interaction-based definition that includes a non-hierarchical multiplicity of different aspects? Can the agent be coherent on the basis of the rhythm of some inner "process," even "play," "dialogical order," or "dividual dynamics" (as opposed to individuality), and, exactly as such, take part in social interaction? How is detachment from oneself (inner non-identity) important for human autonomy?

The second point concerns the invisible excess of sense in social interaction, which represents neither a closed content of a disembodied mind nor a clearly embodied expression but must still be accepted as a source of meaning. The idea of an invisible excess of sense raises the question of what form the enactive approach to such phenomena should take. How do people appropriately approach the opacity of others? Does the only way consist in understanding the other, in the effort to catch what the other means, in coordination and coupling? Should a hint of hidden excess serve as a trigger to keep asking, to uncover it in interaction, to understand it better through action? Does invisibility take only the form of a question, a lack of sense that calls for an answer? Can we notice a certain dimension of silence, peace, pause, or shutdown of dynamics with respect to the invisible excess of sense in social understanding? Enactive ethics (Colombetti and Torrance, 2009) is usually characterized by its focus on the interactive and interpersonal dimensions of moral phenomena. This approach is particularly well suited to avoiding ethical individualism. Within this very propitious ethical context, I wish, however, to stress that social understanding may also imply a capacity for generous respect for alterity (stepping back, letting be), which is not passive but creative.

## **ACKNOWLEDGMENTS**

This article was written as a part of the international project No. M300091203 "Philosophy in Experiment" of the Academy of Sciences of the Czech Republic and as a part of the project No. P401/10/1164 "Philosophical investigations of corporeity – transdisciplinary perspectives" of the Czech Science Foundation.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 08 September 2014; published online: 30 September 2014.*

*Citation: Koubová A (2014) Invisible excess of sense in social interaction. Front. Psychol. 5:1081. doi: 10.3389/fpsyg.2014.01081*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Koubová. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Toward an expansion of an enactive ethics with the help of care ethics

# *Petr Urban\**

*Department of Contemporary Continental Philosophy, Institute of Philosophy, The Academy of Sciences of the Czech Republic, Prague, Czech Republic \*Correspondence: petr\_u@yahoo.com*

#### *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

*Reviewed by:*

*Antonio Casado Da Rocha, University of the Basque Country, Spain*

*Mason D. Cash, University of Central Florida, USA*

**Keywords: enactivism, enactive ethics, feminist relational theory, the ethics of care, socially extended mind**

# **INTRODUCTION**

An important and urgent way of widening the scope of embodied and situated approaches to intersubjectivity consists in exploring their implications for ethics<sup>1</sup>. Cash (2010, 2013) has recently argued for a rethinking of seminal ethical concepts against the background of the idea of *socially distributed* cognition. Colombetti and Torrance (2009) have proposed an ethics based on an *enactive* cognitive science of social life<sup>2</sup>. In this short paper, I want to focus mainly on the latter proposal and argue that recent developments in the enactive approach to social phenomena call for further expansion of an enactive ethics beyond its initial focus on face-to-face dyadic interactions. In this respect, I aim to draw attention to the so far underappreciated kinship between an enactive ethics and the ethics of care. I consider the alliance of the two remarkably well suited for abandoning the pitfalls of a widespread view of human autonomy in terms of the self-determination of individual rational agents, a view that has been systematically questioned from the perspective of care ethics over the last 35 years but which still exerts a strong influence on our thinking about the good life and morality<sup>3</sup>.

# **ENACTIVE ETHICS AND SOCIALLY EXTENDED MIND**

Colombetti and Torrance (2009) made the first attempt, and the only one made thus far, to show that the enactivist shift of attention from the individual to the interactional and relational domain (De Jaegher and Di Paolo, 2007) has profound repercussions for *ethics*<sup>4</sup>. According to the enactive view, what each of us does in relation to another is to be structured and characterized primarily in inter-individual and interpersonal terms. From this perspective, the ethical character of a given situation arises, at least in part, from the meanings that emerge out of the inter-relations between the participants. These ideas suggest several important shifts in moral theory. An enactive ethics invites us to explore "the deep ethical ramifications of the participatory, collective dynamics of human inter-relations *per se*, as opposed to the ethical significance of individual actions and their simple aggregations" (Colombetti and Torrance, 2009, p. 517). It recommends a de-emphasis of the notions of individual autonomy and responsibility. The main lesson to be taken from the proposal of an enactive ethics is, thus, that the inter-relational, interactional, and interaffective dimensions have to gain a central place in ethics, lest ethical theory overlook the very subject of its inquiry.

Colombetti and Torrance based their proposal of an enactive ethics on De Jaegher and Di Paolo's (2007) account and limited the scope of their inter-relational interpretation of moral phenomena exclusively to dyadic, face-to-face interactions. The most recent developments within the enactive approach to social life, however, transcend the narrow realm of dyadic interactions and open enactive research to a wider sphere of interactions with sociocultural *institutions* (Steiner and Stewart, 2009; Froese and Di Paolo, 2011; Torrance and Froese, 2011). Human "sense-makers" construct shared meanings in their ongoing interactions within the context of a vast array of social givens (Torrance and Froese, 2011, p. 45). The agent's entrance into an interactional and properly social domain requires abiding by a heritage of pre-established social and cultural norms, while at the same time expanding the possibilities of the agent's sense-making and agency (Torrance and Froese, 2011). If we want to take into account the wider social and institutional dimension of social life as approached from the enactive perspective, the following urgent question seems inevitable: What would an appropriate expansion of an enactive ethics look like?

<sup>1</sup> In this paper, the term *ethics* stands for moral theory, especially moral *philosophy*.

<sup>2</sup> In what follows, I will focus exclusively on the enactivist tradition whose philosophical foundations were laid by Varela, Thompson, and Rosch in *The Embodied Mind* (Varela et al., 1991) and which has been further developed by Thompson (2005, 2007), Di Paolo (2005, 2009), De Jaegher and Di Paolo (2007), Di Paolo et al. (2010), and Froese and Di Paolo (2011).

<sup>3</sup> For a discussion of the general influence of individualistic views of autonomy on the current social, cultural, and moral imaginary, see, e.g., Fineman (2004). Sass (2011) shows that the idea of individual autonomy (in terms of self-direction and volition) still plays the role of a widely recognized standard for assessing mental health. He characterizes it as an "extremely influential notion" in the field of psychopathology (Sass, 2011, p. 99). Ho (2008) puts forward a criticism of its predominant role in the field of bioethics, whereas Herring (2014) explores its current impact in the realm of law, in particular family law.

<sup>4</sup> It was not, however, the first or the only attempt to focus on the ethical implications of enactivism as such. Varela himself did the pioneering work in his 1999 book on ethics (Varela, 1999). Recently, DeSouza (2013) has developed Varela's idea of "ethical know-how" and connected it in an interesting way with current debates on second nature. Nishigaki (2006) has elaborated the link between enactivism and ethics from a different perspective, attempting to demonstrate an affinity between the enactive approach and Eastern ethical traditions. However, none of the above-mentioned authors has focused on the moral significance of the interactional and inter-relational domain as described in the enactive account of social life. Their focus has by and large been on putting forward a novel view of the relationship between the emotional and cognitive dimensions of moral sense-making.

The wider normative dimension of social life plays a central role in a recent parallel attempt to reinterpret fundamental ethical concepts against the background of a different<sup>5</sup> embodied and situated approach to cognition, which can be found in Cash (2010, 2013). Cash introduces the "third-wave arguments" for socially and culturally distributed cognition and distinguishes them from the individual-centered focus of the previous arguments for the extended mind hypothesis, which are based on Clark and Chalmers' pioneering work (Clark and Chalmers, 1998). Cash explores the implications of the idea of socially distributed cognition for seminal moral concepts, such as autonomy, agency, and responsibility. The main novelty of his answers, in my view, consists in the recommendation that the advocates of socially distributed cognition should avoid reinventing the wheel and avail themselves of extant arguments elaborated principally in feminist relational theory and its criticism of individualistic conceptions of self, agency, and moral autonomy. Cash refers in particular to the concepts of *relational autonomy* and *relational self* as introduced in the 1990s by feminist theorists and ethicists, such as Meyers (1989, 1997, 1998), Friedman (2000), Mackenzie and Stoljar (2000), and others, who argue, in general, that one's self and one's autonomy are decentralized and are relationally and socially constituted.

## **ENACTION AND CARE ETHICS**

I argue that the feminist relational theory to which Cash's arguments appeal, and in particular the closely related *ethics of care* [as developed by Gilligan (1982), Noddings (1982), Ruddick (1989), Held (1993, 2006), Tronto (1993), Kittay (1999), and many others], can be considered a rich source for further developing and expanding an enactive ethics. Both the enactive approach and the ethics of care attempt to rethink the concepts of autonomy, individuality, and agency in a way that enables a novel reading of human relations in terms of the irreducibility of the inter-relational and interactional domain. On both approaches, agents are conceived as essentially embodied, situated, and embedded in multiple relational networks at different levels, from the biological to the social and the cultural (e.g., Hamington, 2004). Concern and emotionality are central to both perspectives and are considered part and parcel of any agent's making sense of the world and others (e.g., Held, 2006, pp. 21–22). However, the ethics of care undertook the shift to interactive and interpersonal moral phenomena decades before an enactive ethics was first proposed. I argue that the conceptual and methodological toolkit of the ethics of care, with its elaborated accounts of human interdependency, mutuality, engagement with social and political institutions, etc., should serve as a well-suited means of arriving at an appropriately expanded enactive view of social and moral phenomena. The experiential knowledge of the ethics of care, its sensitivity to inequalities in power relations, and its developed views of complex structures and relations at various levels of human social life can provide useful tools for widening an enactive ethics to the broader domain of properly social life.

On the other hand, the enactive approach to social phenomena, based on the concept of participatory sense-making, provides a detailed description of the complex relations between persons, and between persons and institutions, which can help to account not only for the specific nature and dynamics of the social interdependence between persons (in terms of interactional autonomy), but also for the generation and persistence of social institutions. Human social interactions are essentially situated in a normative context and are governed by various social institutions that make these interactions possible. However, these norms and institutions "don't just exist in a special normative realm independently of the actual lives of people: they are embedded in the ways people conduct those lives—their continued existence requires that they be continually (inter-) enacted, in either word or deed" (Torrance and Froese, 2011, p. 46). Real social interactions involve interpretation, and sometimes even creative reinterpretation and modification, of the very norms that form the framework within which they take place. The enactive look at "the origin of and fluid changes in normativity" (De Jaegher, 2013, p. 22), with its corresponding focus on the bi-directionality of influence between social interactions and social institutions, can help us explain how criticism and transformation of social structures, institutions, and norms can materialize. And this is precisely what has been at stake in the ethics of care since soon after its conception (e.g., Held, 1993, 2006; Tronto, 1993, 2013; Sevenhuijsen, 1998; Engster, 2007; Barnes, 2012).

In this connection, De Jaegher (2013) aims to show that we should consider the enactive approach a better way of arriving at a full-blown picture of our interactions with social norms than the proposal of socially extended and distributed cognition (as developed by, e.g., Gallagher and Crisafi, 2009; Gallagher, 2013). In her view, the socially extended mind approach is limited to addressing rule-based, hierarchical institutions and interactions, and is unable to grasp fluid and more participatory aspects of society. She holds this view because she sees some aspects of the socially extended mind approach as being in line with the functionalism of mainstream cognitive science, which deals with cognitive agents that are primordially lone individuals instrumentally extending their "cognitive reach." This is why, on her reading, the socially extended mind approach tends to be one-sidedly focused on the functioning of ready-made, rigid normative systems, and therefore "would hardly tell us how institutions could be criticized or changed" (De Jaegher, 2013, p. 22).

This observation, if correct, indicates an important reason why the potential alliance between *enactivism* and care ethics may be seen as more promising and fruitful than the alliance between the theory of *socially distributed cognition* and feminist accounts of relational autonomy. However, we should proceed with caution and not overlook the fact that De Jaegher's criticism is aimed at the functionalist and individualist core of the notion of socially *extended* cognition (and at Gallagher and Crisafi's account only to the extent that some elements of this view are still present in it). Most of her points would clearly not apply to the aforementioned "third-wave arguments" for socially and culturally distributed cognition (Cash, 2013). I deem it plausible to claim that the expansion of an enactive ethics with the help of care ethics, which I have argued for in this paper, and Cash's proposal of an alliance between feminist relational theory and socially distributed cognition can and should be viewed as complementary rather than conflicting.

<sup>5</sup> It has been repeatedly argued that the extended mind hypothesis (even the socially extended one) and enactivism are incompatible for a number of important reasons (e.g., Di Paolo, 2009; Thompson and Stapleton, 2009; Wheeler, 2010; De Jaegher, 2013). However, there are also a number of commonalities between the two approaches that allow us to qualify both of them as embodied and situated accounts of cognition and that justify the next step of our argumentation.

# **ACKNOWLEDGMENTS**

Thank you, Virginia Held, Alice Koubová, and Martin Nitsche, for your warm intellectual support and inspiring comments on some of the ideas presented in this paper. I am also grateful to the editors of this Research Topic, Hanne De Jaegher and Ezequiel Di Paolo, for their kind willingness to share their ideas with me and to comment on my suggestions. Finally, I am indebted to three anonymous referees for their constructive remarks on a previous version of this paper. This work was supported by the Czech Science Foundation under the grant "Empathy: Between Phenomenology and Neurosciences" (P401/12/P544).

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 May 2014; accepted: 06 November 2014; published online: 27 November 2014.*

*Citation: Urban P (2014) Toward an expansion of an enactive ethics with the help of care ethics. Front. Psychol. 5:1354. doi: 10.3389/fpsyg.2014.01354*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Urban. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Shared intentional engagement through language and phenomenal experience

# *Christoph Durt\**

*University of Heidelberg, Phenomenological Section, Clinic for General Psychiatry, Heidelberg, Germany*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Jeffrey K. Yoshimi, University of California, Merced, USA*

*Andrew Christian Delunas, Gavilan College, USA*

#### *\*Correspondence:*

*Christoph Durt, TESIS Marie Curie Experienced Researcher, University of Heidelberg, Phenomenological Section, Clinic for General Psychiatry, Voßstr. 4, 69115 Heidelberg, Germany e-mail: christoph@durt.info; www.durt.de*

This article introduces the notion of shared intentional engagement and argues that the current debate around intersubjective interaction can profit from taking that notion into account. Shared intentional engagement holds between people when they relate together to the same meaningful entities. For instance, when people talk about something, they share intentional engagement as long as they don't talk past each other. But what if the entity talked about involves perceptual experience? Is the quality of one's experiences not something that cannot be conveyed to others through language? Against this widespread idea, this article takes up philosophical arguments for the intersubjectivity of, on the one hand, language, and, on the other hand, phenomenal experience. It contends that language and phenomenal experience both exhibit shared structures that enable shared intentional engagement. It then considers an example of how this result fits well with empirical research on "pop out" experiences. Because shared intentional engagement is fundamental to all kinds of human interaction, it calls for interdisciplinary investigations, which are frequently hindered by the assumption that the phenomenal experiences of humans are hidden from others.

**Keywords: interaction, engagement, intentionality, consciousness, phenomenology, phenomenal experience, experience, sensations**

Intersubjective interaction is becoming an increasingly important topic in the literature on cognitive science, for good reason. Intersubjective interaction is a pervasive feature of human life, and thinking about it is apt to reveal, and potentially overcome, the limits of the standard inferential approach to other minds. This article looks into several recent attempts to do so and contends that the notion of shared intentional engagement can contribute to a better understanding of intersubjective interaction. It considers the role of language and phenomenal experience in intersubjective interaction and argues that both provide the structures that enable shared intentional engagement.

An example of the inferential approach to other minds is "theory theory," according to which the participating subjects apply their own, possibly implicit, theories about the "mental states" of others by means of a folk psychology, which is then either falsified or confirmed in the interaction. Another example is "simulation theory," according to which one does not need a theory of the "mental states" of others, but rather employs one's "own mind as a model, with which we simulate—create 'as if' or pretend beliefs, desires, intentional states—and then project these mental states into the mind of the other person to explain or predict their behavior" (Gallagher, 2009, p. 290). Theory theory assumes that knowledge about the mental states of others is reached through a theory of their behavior. Simulation theory contends that this is done by relating them to one's own states of mind, perhaps through physiological mechanisms like those manifested in "mirror neurons."

The inferential approach attempts to explain intersubjective interaction through an observation-based model. Observation surely is important for intersubjective interaction. Yet it is a one-way relation: the observer is observing the actor, but the actor may not even know of the observation. Intersubjective interaction, in contrast, is never just a one-way relation. There are a number of recent attempts to understand what characterizes intersubjective interaction, such as the distinction between engagement and coupling drawn by De Jaegher et al. (2010). "Coupling" refers to exchanges that could take place between lifeless bodies, such as the exchange of heat. Engagement, in contrast, is the "qualitative aspect of social interaction as it starts to 'take over' and acquires a momentum of its own" (p. 441). Other authors, such as Schilbach et al., point out that social cognition does not happen between detached observers, and contend that there often is "emotional engagement" (Schilbach et al., 2013, p. 396).

While the details of these authors' proposals are quite different, they all make an important observation: coupling is not enough for intersubjective interaction; there also has to be engagement. The notion of engagement connects to that of the second-person approach, according to which "recognizing and being recognized by a *You* is primary for understanding other people" (Reddy, 2008, p. 233). Part of what engagement means is that the actors recognize each other. Engaged interaction is a second-person relation in that the other is recognized as an interactor. That means that she or he is recognized as somebody who not only acts but also reacts to the other's actions, who asks and responds, who has expectations, and who enters into obligations through her or his actions. Each action allows some future actions and forbids others, which is one reason why engagement has a "momentum of its own." Because such interactions are likely to involve emotion, emotional engagement is an important form of engagement.

In this article, I would like to draw attention to another form of engagement that is of fundamental importance for all human interaction. It may be dubbed *shared intentional engagement*. Shared intentional engagement is the engagement people are in when they relate together to an action, belief, idea, symbol, object, or other meaningful entity. For instance, when two people talk about an entity, they share intentional engagement. Of course, people also talk past each other. When that happens, they cease to engage in shared intentionality with regard to the meaning of their speech. That does not mean that in shared intentional engagement each subject has exactly the same understanding of the entities intended. Nor is the meaning of the intended entities up to the individuals; each actor can learn new things about the entity. Also, shared intentional engagement does not need to be part of a full-blown language; it can be mediated through a language or not. It can consist in pre-linguistic and simple linguistic activities, such as when children and their parents relate to an object in "joint attention" (cf. Tomasello, 1999)—which does not have to mean that either interactor needs to have a representation of that object in her or his mind (cf. Reddy, 2008, p. 86). In this article, "language" is used in a wide sense. It is not restricted to representations, and it is thought to be intertwined with prelinguistic behavior, which the interactors may or may not be able to verbalize.

Language and phenomenal experience are often thought to be the two constituents of a dichotomy. On the one side, language is thought of as structuring otherwise unstructured phenomenal experience, which in itself only provides raw material. For instance, what pain and colors are is thought to be due to the conventions of each language.<sup>1</sup> Phenomenal experience, in contrast, is thought to be independent of language. For instance, the sensations one has when perceiving a color are thought to have a quality that is merely named in language. Studies such as those on joint attention would then show that shared intentional engagement can be had before and without language. But this is by no means the only interpretation. Such studies may also show, on the one hand, that language itself is rooted in human behavior, and, on the other, that for normal speakers of a language, pre-linguistic forms of shared intentional engagement are shaped by that language. This paper argues that language and phenomenal experience both come together in shared intentional engagement.

Usually, shared intentionality is discussed under the heading of "collective intentionality." Collective intentionality mainly concerns intentions that obviously cannot be had by one individual alone, such as the task of carrying an object that is too heavy for one person. A paradigmatic question in the discussions of collective intentionality is whether "we-intentions" can be reduced to a sum of "I-intentions" (cf. Tuomela and Miller, 1988; Schmitz et al., 2013). The notion of shared intentional engagement, in contrast, is meant to draw attention to shared engagement in actions or entities that are typically done or had by only one individual. For instance, when a person talks about some pain she is feeling, she intends a pain that only she is having. That seems to speak against the above definition of shared intentional engagement, for apparently the meaning of the pain she refers to is not shared with others. But is this really so?

Let's first consider what language has to do with phenomenal experience. There is a sense in which one can say that only the person who has the sensation can know that she has an experience of pain: in theory, she could always pretend she is feeling pain. But can we deduce from the fact that only she is having that instance of pain that the meaning of that pain sensation can be known only to her? I think that such a conclusion would be preposterous. Wittgenstein gives strong reasons against it in the context of his thoughts on the possibility of a "private language" in *Philosophical Investigations*. He admits that there is a sense in which somebody can attend to an experience of hers that she could not describe to others, or to herself (Wittgenstein, 1999, p. 277). But the impression one has at one moment is different from what is meant by sensation terms; the meaning of these terms needs to be recognized in repeated instances, which is done with the help of rules and criteria. Even if "pain" were only a word for something like "this feeling," the deictic reference to "this feeling" would still be determined with the help of rules and criteria, which are at least potentially public. If there were no such criteria, the person having the pain would herself not know whether what she is having is a sensation, let alone that it is a sensation of pain rather than some other sensation. As a quality that can be recognized in other instances, the pain can be described to others and known to others.<sup>2</sup>

Wittgenstein's investigations into language match up well with the everyday experience of understanding other people's feelings. Of course, talking with somebody about her or his pain does not give us that person's pain. Since language and experience are different, there is always something about experience that cannot be conveyed by language. But speaking about somebody's pain can give us a pretty good idea of what the pain is like for the person. Our everyday experience is that of shared intentionality even when we refer to seemingly merely subjective feelings like pain. When doing so, we may make use of theory and simulation: we may theorize about the behavior of others, and we may try to relate it to sensations we know from our own experience. But the above consideration of the role of language for phenomenal experience suggests that phenomenal experience is not independent of rules and criteria that are expressed in language and pre-linguistic behavior. Because the rules and criteria of a language are shared between the speakers of the language, they enable shared intentional engagement.

The argument that experience is not independent of rules and criteria that are embedded in language and behavior is often misunderstood as the claim that language shapes otherwise unstructured experience. For instance, conventionalists claim that language carves out certain color experiences that could just as well be carved out differently by different languages. Under this view, which hue in the (physical or phenomenal) color spectrum is called "blue" is conventional, and color words could just as well be assigned to different hues. I think, however, that this is not only a simplistic view of language but also inconsistent with the phenomenology of sensations. I would now like to briefly outline how phenomenological investigations can show that phenomenal experience itself is structured in many ways, and that these structures are not up to the individual subject.

<sup>1</sup>Examples of sensations are very different from the main examples of the debate around internalism and externalism that emanated from Putnam and Burge. The latter usually concern scientific concepts rather than experiences, and involve something that is usually thought to be part of the external world, such as H<sub>2</sub>O.

<sup>2</sup>For further considerations of Wittgenstein's thoughts in this respect, see Rudd (1999) and Durt (2014).

There are some sensations that seem to force themselves upon us, or at least "pop out" from the stream of conscious experience. The experiential quality of a severe pain, for instance, demands attention, regardless of whether the pain has a serious cause or not. Other kinds of pain, such as a dull pain, are less prominent and less sharply distinguished. In a similar way, a typical red, blue, or green seems to pop out much more than mixtures of these colors. In this sense, they have a characteristic phenomenal quality. For instance, when looking at a rainbow that has an equal distribution of wavelengths from infrared to ultraviolet, one would expect the color gradient to have a smooth appearance. But the phenomenal appearance of a rainbow is different; it looks as if some colors were more prominent than others, and as if there were steps in the distribution of colors. This may be why sensations are often thought to be self-intimating, that is, to reveal themselves to the person who has the experience just by her having that experience. But this thought relies on the questionable assumption that individual phenomenal experiences are unaltered by such things as attention, the context of conscious experience, and learned distinctions. That assumption not only runs counter to the above considerations of language, but is also contradicted by the phenomenal structures of experience.

For instance, there is a structure to color sensations. One may imagine a subject who has inverted phenomenal experiences of yellow and blue, but such an inversion would at some point lead to different behaviors. When asked which experience looks brighter, the person with the inverted experiences would either have to answer that the blue looks brighter, or what she or he perceives as bright and dark would have to be inverted, too. Yet, due to the unequal distribution of hue, saturation, and brightness throughout the color spectrum, such inversions would become apparent with sufficient further intersubjective interaction.<sup>3</sup> Studying the actual structure of color sensations shows that if "inverted qualia" are possible at all, then only to a very limited degree. Most phenomenal experiences cannot be completely different from one individual to another, and the relations between such qualitative experiences are not up to the individual. Because the structure of phenomenal experience is not something completely individual, it enables shared intentional engagement. This result of phenomenological study fits well with the above remarks on language.

Philosophical investigations are often seen as relevant, at best, for meta-scientific considerations. But phenomenological discoveries such as that of pop-out colors fit well with empirical research. For example, Berlin and Kay (1969), in their famous study on basic color terms, claim that, rather than picking out arbitrary parts of the color spectrum, basic color terms throughout a wide array of languages are clustered around foci. This suggests that there is something non-conventional about color terms, a suggestion that may receive further impetus from a study of the physiology of color perception. After all, the physiology of our sense organs and our nervous systems is relatively similar across individuals, in spite of important variances, which can sometimes lead to typical variations and aberrations. One way in which the build of the perceptual system could influence color vision is that human cone cells and neural structures react especially well to specific stimuli, which may cause the perception of focal colors. The phenomenal pop-out experiences may, in turn, explain why there are foci for basic color terms in a number of different languages. In a similar way, future empirical research into language and physiology may explain why there are shared structures in sense perception. Synesthetic experiences are an example of an interesting subject of further study in which phenomenology and empirical research can enrich each other.

Even researchers who try to model basic color terms on a "purely cultural route" recognize that it is "driven, on its turn, by a non language-specific property of human beings," which they take to be physiological (Loreto et al., 2012, p. 4). But, even though Loreto et al. proclaim a "non language-specific property" as the basis of color perception, they nevertheless model color terms as otherwise detached rather than part of a shared phenomenal structure. As with many authors who write on this topic, they imply the dichotomy I was arguing against above. On the one side, it is assumed that if language determines the right use of sensation terms, they are purely conventional. On the other side, it is presupposed that if there is a phenomenal quality to sensations, it is only contingently connected to language and behavior. If this were true, investigations of language and phenomenal qualities could never be brought together in a unified account of intersubjective interaction.

The idea that sensations are detached from behavior and language often goes back to what Fuchs and De Jaegher call the "'inner world' hypothesis." They claim that it is presupposed by theory theory and simulation theory, both of which "conceive of the mental as an inner realm separated from others by an epistemic gulf that can only be crossed by inference or projection. We are hidden from each other in principle; therefore, we must infer or simulate the other's inner states in order to understand him" (Fuchs and De Jaegher, 2009, p. 467). But the above considerations of, on the one side, the role of shared language, and, on the other side, the shared phenomenal structure of experience, both suggest that we are not hidden from each other. Both show that already in repeatable phenomenal experience there is shared intentional engagement. We are thus not limited to theory and simulation when explaining other minds, although we may make use of both.

Because shared intentional engagement is fundamental for all kinds of human interaction, it is in need of interdisciplinary investigation, which has been hindered by the notion that the phenomenal experiences of humans are hidden from each other. Intentional engagement is conditioned by, amongst other things, language and its rules and criteria, forms of behavior, membership in cultures and social groups, the structure of phenomenal experience, the physiology of sense organs and neural structures, and much more. Scientific investigations into all of these can contribute to our understanding of how shared intentional engagement shapes intersubjective interaction. Investigations into intersubjective interactions thus need to integrate a number of diverse fields of research, such as psychology, psychiatry, neuroscience, and philosophy.

<sup>3</sup>Cf. e.g., Hilbert and Kalderon, 2000.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 May 2014; accepted: 26 August 2014; published online: 08 October 2014. Citation: Durt C (2014) Shared intentional engagement through language and phenomenal experience. Front. Psychol. 5:1016. doi: 10.3389/fpsyg.2014.01016*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Durt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The body social: an enactive approach to the self

# *Miriam Kyselo\**

Department of Logic and Philosophy of Science, University of the Basque Country, Donostia-San Sebastián, Spain

## *Edited by:*

Eddy J. Davelaar, Birkbeck College, UK

#### *Reviewed by:*

Colwyn Trevarthen, The University of Edinburgh, UK; Julian Kiverstein, University of Amsterdam, Netherlands

#### *\*Correspondence:*

Miriam Kyselo, Department of Logic and Philosophy of Science, University of the Basque Country, Avenida De Tolosa 70, 20018 Donostia-San Sebastián, Spain; e-mail: miriam.kyselo@gmail.com

This paper takes a new look at an old question: what is the human self? It offers a proposal for theorizing the self from an enactive perspective as an autonomous system that is constituted through interpersonal relations. It addresses a prevalent issue in the philosophy of cognitive science: the body-social problem. Embodied and social approaches to cognitive identity are in mutual tension. On the one hand, embodied cognitive science risks a new form of methodological individualism, implying a dichotomy not between the outside world of objects and the brain-bound individual but rather between body-bound individuals and the outside social world. On the other hand, approaches that emphasize the constitutive relevance of social interaction processes for cognitive identity run the risk of losing the individual in the interaction dynamics and of downplaying the role of embodiment. This paper adopts a middle way and outlines an enactive approach to individuation that is neither individualistic nor disembodied but integrates both approaches. Elaborating on Jonas' notion of needful freedom, it develops an enactive proposal for understanding the self as co-generated in interactions and relations with others. I argue that the human self is a social existence that is organized in terms of a back and forth between social distinction and participation processes. On this view, the body, rather than being identical with the social self, becomes its mediator.

**Keywords: enactive self, social self, embodied self, body-social problem, distinction and participation**

# **INTRODUCTION**

Models and conceptions of the self are diverse. It is considered a substance or a thing, a concept, a narrative, a system, a process or a function; some even argue that there is no such thing as the self (Hume, 1739; James, 1890; Dennett, 1992; Hayward, 1998; Tani, 1998; Perlis, 1999; Strawson, 1999; Dainton, 2004; Metzinger, 2004; Zahavi, 2008). This list is not exhaustive but it makes a point: there is no unifying concept of *the* self.

The lack of a coherent concept of self is not merely a philosophical armchair problem but remains an issue of general theoretical, as well as practical, concern. Here lies the main motivation for the present paper: to propose avenues for a philosophy of self that eventually aids in facilitating dialog and research on the self across the disciplines in cognitive science.

One desideratum for a cross-disciplinary approach to the self is that it acknowledges the diversity of phenomena associated with self and does not make an essentialist claim according to which the self is, for example, either neurological or phenomenal while other aspects are seen as irrelevant or added on. Shaun Gallagher has recently warned against such reductionism, which understands *the* self as essentially this or that "and nothing more." As an alternative, Gallagher proposes a pluralistic, so-called "pattern theory of self":

[W]hat we call a "self" is a cluster concept which includes a sufficient number of characteristic features. Taken together, a certain pattern of characteristic features constitute an individual self. (...) I propose that we think of these aspects as organized in certain patterns, and that a particular variation of such a pattern constitutes what we call a self. (Gallagher, 2013, p. 2)

Examples of aspects that could serve as constituents of a self-constitutive pattern are *minimal embodied*, *minimal* *experiential*, *affective*, *intersubjective*, *psychological/cognitive*, *narrative, extended,* and *situated*. According to Gallagher, adopting a pattern view of self helps in understanding different aspects of the self non-reductively "as compatible or commensurable instead of thinking them in opposition." He illustrates this for a particular conceptual tension in cognitive science, namely the question whether selfhood is best explained in terms of cortical midline structures, a particular brain region (Northoff and Bermpohl, 2004), or whether the necessary condition of selfhood is rather that all experiences acquire a first-person perspective (Légrand and Ruby, 2009). On Gallagher's pattern approach, resolving this conceptual tension is now pretty simple: do not side with either position, but allow both the 1st person perspective and particular neuronal activation patterns to count, each as one "among other aspects" (Gallagher, 2013), of an organized pattern of self, which in the present case is a pattern defined in terms of minimal embodied and experiential aspects.

I agree with Gallagher's plea for pluralism, but I also think that his radical openness might prove somewhat too laissez-faire: what makes any of the listed features part of a (meta-)theory of self, and what is it that makes a pattern of self acquire its particular organization? Once the diversity of self-related phenomena is acknowledged, we also need to understand how the elements of a collection of relevant self features interrelate.

A pattern approach to the self acknowledges diversity but lacks integration, offering no account of the individual as an explanatory whole. This poses more than a philosophical armchair problem, because what researchers in cognitive science believe the self to be has a very practical impact on the way they conduct research, from the choice of methodology in setting up experiments and forming hypotheses to the interpretation of results. It affects how a medical doctor assesses a person's state of consciousness and well-being, or how a psychologist conceives of pathologies of the self and thus whether she chooses to treat them with pharmaceuticals, body therapy, or social and dialogical intervention.

Understanding the self should therefore not consist only in composing lists of aspects according to standards of a given contextual convenience; we still need a notion of the self as a whole, something that can count as a distinguishable unit of explanation and eventually help to interrelate different aspects of the self. As Olson argued almost two decades ago:

Simply extending the list will only make matters worse. What we need is not just an account of self that would command wider assent than any of these, but one that would synthesize them and show them all to reflect a part of some larger, common idea (Olson, 1998, p. 651).

What I suggest in this paper is that such a larger, common idea exists and that we do not have to choose between either a pluralistic and laissez-faire or an essentialist and reductive approach to the self. A middle way that acknowledges diversity while also offering an integrative perspective on the self as a whole could be found in considering the self from the perspective of enactive cognitive science.

The enactive approach holds that biological and mental phenomena are continuous, which means that it characterizes the identity of cognitive beings by similar principles and concepts as the identity of living beings (Clark, 2001; Thompson and Varela, 2001; Di Paolo et al., 2010). It proposes the biologically based concept of autonomy to capture cognitive identity in terms of self-generated, self-determined precarious networks (Thompson, 2007; Di Paolo et al., 2010). The concept of autonomy has a fruitful link to the question of self since both are at heart about *individuation* and concerned with understanding what makes something – or, in the present case, somebody – a coherent unity. The enactive perspective on identity is neither reductionist nor essentialist but aims at a wide enough focus to accommodate the diverse aspects of cognition, while still being concise enough that it can provide constraints to interrelate them. For that reason I utilize the concept of autonomy to inspire a new perspective on theories of self. In this enactive approach, I take the fact that human life is genuinely social to be of crucial relevance. I argue that humans live not only in a world of others that affect them and that they relate to, but that *qua* being interactors in a social world, they also co-constitute each other's self. The human self is not only saturated by the social, but is also entirely inconceivable without it.

The paper involves two layers of novelty. First, it provides an elaboration of the notion of autonomy and of the higher levels of the life-mind continuity axis, which moves from basic sensorimotor cognition to psychological and socially mediated forms of human (cognitive) individuation. Second, it promises to help clarify a current conceptual tension associated with the bodily and social dimensions of self: while embodied cognitive science has recognized for a while that humans are not their brains but rather embodied and situated social beings, the field still faces another dichotomy, namely the split between individual selves and the social world of others. The social still plays the role of an outside and divided context: the external, independently given world into which these newly embodied, yet essentially isolated, selves parachute.<sup>1</sup>

The following elaborations of the enactive concept of autonomy are thus at the same time concerned with what I call (by analogy with the body-mind problem, or as a successor to the body-body problem) the body*-social* problem, i.e., the question for philosophy of cognitive science of how bodily and social aspects figure in the individuation of the human individual self as a whole (Kyselo and Di Paolo, 2013).<sup>2</sup>

The strategy for this paper is as follows: I begin by laying out the body-social problem. This is followed by an introduction to the enactive approach to cognition, focusing particularly on the notion of autonomy. In the next section I show that a version of the body-social problem also applies to recent work in enactive approaches to social cognition, in particular to participatory sense-making. Coming back to the logic of some early enactive philosophy by Hans Jonas, I then elaborate the notion of autonomy in terms of sociality and outline an enactive approach to the self that acknowledges diversity without being essentialist or reductive. Support for this proposal is provided by considering empirical evidence from research on social pain, quality of life reports in global paralysis, as well as some examples from everyday life.

## **THE BODY-SOCIAL PROBLEM IN COGNITIVE SCIENCE**

There is a conceptual problem arising for recent philosophy of cognitive science. It has to do with two important advances in the development of cognitive science and how they relate to the human self: firstly, the realization that cognition is not brain-bound, but embodied (the "embodied turn"), and secondly, the increasing awareness that cognition is not individualistic, but also social (the "social," or if you will, "interactive turn," De Jaegher et al., 2010). Each of these developments itself constitutes an answer to a previously noted conceptual dichotomy: the embodied turn concerned the dichotomy between brain and body, and the social turn, the gap between individual and others.

Let me explicate this tension beginning with the first insight that cognition is not in the head. Recent embodied and situated cognitive science seeks to overcome the brain-bound view of cognition and thereby the clear-cut separation between the individual cognitive system and the environment as an objective and independent given. Cognition is now considered a dynamic interplay of individual bodily and environmental processes, with the brain as a mediator of that interplay (Fuchs, 2011). In this view, cognition also entails subjectivity so that research on cognition is no longer restricted to third-person operational descriptions but also relies on subjective and phenomenological observations from a 1st and 2nd person perspective (Varela et al., 1993; Lutz, 2002; Lutz and Thompson, 2003; Petitmengin, 2006).

<sup>1</sup>I borrow this image from Varela et al. (1993) who used it not in a social sense but in support of the idea that the organism and its environment co-determine each other. The authors caricatured the cognitivist view as implying that the environment is a "landing pad for organisms that somehow drop or parachute into the world" (p. 198).

<sup>2</sup>The body-body problem is the question how a living body can bring about embodied experience (Hanna and Thompson, 2003; Thompson, 2007, pp. 235–237).

The embodied view in cognitive science has implications for understanding the self. While some people still argue that the self is found in the brain (e.g., Feinberg and Keenan, 2005; Churchland, 2013), there is now a much wider range of research on the embodied self that explores the role of bodily structures beyond the neuronal, and of action, for human identity (Gallagher, 2000; Fuchs et al., 2010). The self is investigated as a subjective and experiential bodily self (Zahavi, 2008). There are new investigations of the foundations of self and self-consciousness in terms of bodily processes, i.e., sensorimotor structures (Légrand, 2006; Gallese, 2014). The idea that the self is embodied has thus gained increasing acceptance.

As a consequence, we see new proposals for understanding disorders of the self (such as autism, schizophrenia, etc.) not simply as neurological dysfunctions, but rather as disturbances of sensorimotor capacities of this bodily subjectivity. Accordingly, there are also suggestions for new forms of body-based treatment and therapy (Fuchs, 2005; Drayson, 2009; Röhricht, 2009; Parnass and Sass, 2010). Perhaps it is here that it is most evident why cognitive scientists cannot merely adopt a pattern approach to the self, as Gallagher suggested. Explaining schizophrenia as a disorder of the embodied self, for example, cannot imply that the ordered self is considered to be a loose collection of neuronal, social, and bodily aspects. The way we reason about what goes wrong in a disorder of the self reveals that we already have implicit assumptions about what counts as the ordered self *as a whole*, a coherent explanatory unit: the body, in the present case.

While these considerations are not exhaustive, it thus seems fair to say that cognitive science is on a good track to move from the brain-bound to the embodied view of the self, where embodiment amounts to more than a conceptual add-on.

Consider now the second development in cognitive science: the growing acknowledgment of the idea that cognition involves the social and is, broadly construed, also concerned with intersubjectivity and with understanding others. This has become a subject of interest across the disciplines in cognitive science. The relevance of social interaction is, for instance, argued for in psychological studies on child development, particularly in neonatal imitation and early infant–mother relations (see e.g., Trevarthen and Aitken, 2001; Reddy, 2003; Rochat et al., 2009). The interpersonal approach has attracted increasing interest in neuroscience, in particular with regard to the question of understanding others, e.g., in research on the (in)famous mirror neurons (Gallese and Goldman, 1998; Gallese, 2013), and in simulation theory approaches (Frith and Frith, 2010; Gallotti and Frith, 2013). In more philosophical approaches we find the corresponding objections to brain-based accounts of social cognition (e.g., Gallagher, 2001) and developments emphasizing the social dimension of self in terms of narrative practices (Hutto, 2010, 2014). There have also been more general considerations about the relation between low-level embodied and social forms of cognition (De Jaegher and Froese, 2009) and new basic concepts that capture the essential role of intersubjectivity in structuring human cognition (De Jaegher and Di Paolo, 2007). In addition, we observe a flourishing dialog between cognitive science and the phenomenology of intersubjectivity, reconsidering authors such as Husserl, Merleau-Ponty, Gurwitsch, or Schütz (e.g., Thompson et al., 2005; Zahavi, 2008).

The question is how these two developments, the embodied and the social, go together; or better, how do the bodily and social dimensions figure in the individuation of the human self? From a pattern theory approach to the self à la Gallagher they seem compatible and could complete existing theories of the self, adding novel (e.g., sensorimotor and sociocultural) items to a list of (previously neuronal) aspects associated with the self. This perspective is mainly descriptive, which is why it also risks not adding much to understanding the self from a philosophical point of view. As already pointed out in the introduction, one of the reasons why it matters that we adopt more than a mere completion perspective is that (interdisciplinary) research cannot make do with a loose collection of aspects, but must refer to a coherent unity with which particular aspects, such as neuronal, bodily, or social ones, are then possibly associated.

I therefore suggest considering that embodied and social approaches to cognition entail the attempt to re-determine the boundaries of the individual. From this perspective, the embodied and social turns entail claims about what counts as the individual (agent, system, person, self) as a whole, each specifying an *individuating principle*, or the essential or minimal sense of this whole.

However, upon accepting that embodied and social cognitive science makes implicit assumptions about what counts as the individual in this sense, we will see that these developments are, as it were, in tension. The self as a whole can either be embodied or social, but it cannot be both.

Cognitive scientists might give one of the following two answers in response to this. According to the first, they might assume that the body is equated with the self. When speaking of the individual, and clearly no longer referring to the brain, they mean the lived and living *body* as a whole. According to this, there is an embodied core self, which is equated with the individual embodied or living organism (Parnass and Sass, 2010, p. 230). Other recent approaches associated with the idea of such an embodied core self are, for example, Albahari's (2007) concept of *perspectival ownership*, Damasio's (2006) *core consciousness*, and Zahavi's (2008) *minimal self*, which considers the self from a phenomenological viewpoint of bodily subjectivity. It is assumed that such a bodily, minimal self is present from birth (Krueger, 2011).

Even though proponents of this answer (the self is equal to the body) would probably agree that embodied and social aspects of self are closely interrelated, there seems to be a strong intuition that something about the self remains entirely independent from the question of sociality (Zahavi, 2008, 2010) and that this something, a core self if you will, can be associated with the body as an organic, separate, and individual entity. The social in this version is of course not irrelevant, yet because it provides the context in which the minimal bodily self is embedded, it figures non-constitutively in the individuation of self.<sup>3</sup> In other words, there can be a self as a whole without the social. I call this claim about the interrelation of body, social, and self the *social as contextual* claim.

<sup>3</sup>I rely on a recent distinction made by De Jaegher et al. (2010) between *contextual*, *enabling,* and *constitutive* roles of the social for cognition.

The other way to answer the question of how social and bodily dimensions relate with regard to the individuation of self as a distinguishable unity is to assume that the social, instead of the body, is the primary source of individuation. One might call this the *social as constitutive* claim. It states that the core self relies on social processes and that it could not be a self without them. On this account, in its most minimal sense, the self is not neuronal or bodily, but must be essentially a social self.

There are not many researchers in cognitive science who would currently adopt this position decisively, a notable exception being De Haan (2010), who criticized the notion of a *minimal bodily* self and claimed quite specifically that the self, in its minimal sense, is a social self. The idea of the self as social is of course not new; it can be traced back to the work of researchers such as Mead (1934), Buber (1947/2002), and Vygotsky (1986). Hermans et al. (1992) suggested over a decade ago that the self is social and dialogical in the sense that "other people occupy positions in the multivoiced self." However, it is not clear whether these approaches make an essentialist/constitutive or a contextual claim about the role of the social for the self. Moreover, arguing for a *constitutive* role of the social in the individuation of the self as a whole might not require any stronger statement about the status of the body, which leaves open the possibility that the relevant processes of self-individuation could be mediated by mere brain activity, thus trivializing the role of sensorimotor structures and other non-neuronal bodily structures. Hermans and Gieser (2012), for instance, locate the biological basis of the dialogical self in the orbitofrontal cortex and the subcortical limbic system, thus leaving the relation between self as bodily and self as social underspecified (Cresswell and Baerveldt, 2011). An emphasis on the role of the social in the constitution of the self as a whole might therefore risk downplaying the other achievement in cognitive science, the embodied turn.

It is possible to make a stronger statement about the role of the body for an essentially social self. But for a claim that the body plays a non-trivial role in the social constitution of the self as a whole to make sense, a clarification is required of what counts as a body. That is because embodiment, commonly understood, is still equated with organismic embodiment as well as with movement (for a discussion see Kyselo and Di Paolo, 2013), and there is nothing social about the organismic or the moving body *per se*. Nevertheless, whether or not the essentially social self is seen as neurally or bodily mediated, it would still be in tension with the *social as contextual* claim, according to which the body is the primary source of individuation.

This is the prevailing tension in cognitive science with regard to the individuation of the self. In reminiscence of the body-mind problem, or as a successor to the body-body problem, I will call it the *body-social problem*, i.e., the question, for the philosophy of cognitive science, of how bodily and social aspects figure in the individuation of the human individual self (Kyselo and Di Paolo, 2013). This tension exists for any approach in cognitive science that makes a claim about the self as a whole or coherent unity, thus implying a more-than-pluralistic notion of the self. Proponents of an embodied view of individuation risk paying lip service to the social, while those emphasizing the role of the social risk doing the same with respect to the body. The two approaches are mutually exclusive. Without due conceptual clarification, adopting either version, i.e., a primacy of embodiment or a primacy of the social, reduces the other. The assumption that the body individuates the self while the social remains merely context calls into question the second disciplinary development in cognitive science, the social turn, and would invite renewed accusations of methodological individualism. One could argue that while there no longer exists a dichotomy between the brain-as-individual and the world of others, there still exists a dichotomy between the body-as-individual and the world of others. Yet it remains unclear how to work an embodied perspective into an account that takes seriously the role of the social in individuation, when the relevant contribution could equally be made by the brain.

To see that this body-social problem is not an abstract theoretical issue, consider two empirical examples: social pain and locked-in syndrome. Firstly, Eisenberger (2011) has shown that the experience of social rejection (in her example, being excluded from participating in a game) leads to the same activation of neuronal circuitry as physical pain (in reaction to increased temperature). This arguably suggests that people who are socially rejected experience this as similarly distressing to bodily pain. Eisenberger argues that this has an evolutionary rationale: humans rely on "social connection" to ensure their survival, and social rejection hurts so that we avoid (life-)threatening situations in which we find ourselves separated from others. Here it seems that the body constitutes the core of human existence as a biological whole. Through pain signals it ensures its integrity, while the social is a means to the same end.

Secondly, consider locked-in syndrome, a case of global paralysis that leaves a person's entire body paralyzed (with the exception of minimal eye movement, such as blinking) yet her consciousness preserved. The patient's bodily capacities are drastically restricted. Yet inquiries into the quality of life of patients with locked-in syndrome reveal that their self-reported well-being does not differ significantly from that of "normal" subjects. These studies show that the patients' well-being is not equated with physiological capacities. What mattered was that they were able to engage with others, be recognized, and experience themselves as subjects. Locked-in syndrome was considered not a physiological but rather a social condition (Gosseries et al., 2009; Lulé et al., 2009). These findings seem counter-intuitive for an embodied approach to the self. If the self were equated with the body, and the bodily self is what grounds first-person subjectivity, then the patients' well-being should be worse, since locked-in syndrome affects the body as a whole.

How we interpret these empirical examples will in each case depend on which version of the body-social relation to the self we adopt. Should we explain bodily experiences (such as social pain) and self-experience (positive quality of life in locked-in syndrome) using a theory of the self seated in bodily or organic processes, or do these cases rather show that human nature, and thus the self, is genuinely social, and that the body plays an important but primarily enabling role?

One option for avoiding a pluralistic or pattern approach to the self (in which body and social co-exist as different aspects of the self), while still providing an alternative for a cross-disciplinary approach to the self, is to adopt an essentialist perspective, according to which the self as a whole is either embodied or social. But this option risks privileging one dimension while reducing the other to a contextual element. Either view remains problematic for the purposes of cognitive science. A pattern approach acknowledges diversity without integrating it, while an essentialist view offers a sense of unity but at the risk of being reductive and of trivializing either the social or the embodied turn in cognitive science. Does this mean we have to decide that one of the two is less relevant, or that both are merely dimensions of a loose pattern of self?

I do not think so. There is a way to argue for a more than pluralistic perspective that does not require one to assume an essentialist perspective on the self as being *either* embodied *or* social. I propose that the body-social problem can be resolved by adopting an enactive approach to the self. However, this point requires nuance and elaboration, since I think there is a version of enactivism that does address the role of bodily and social processes in the emergence of individual autonomy – namely, participatory sense-making – yet still gets us into the same trouble with the body-social problem.

## **THE ENACTIVE APPROACH TO COGNITION**

A central proposal of the enactive approach is that there is a continuity of mind and life, i.e., that mental phenomena can be understood based on the principles that describe the organization and behavior of all life, including the simplest life forms, such as single-celled organisms (Varela, 1997; Thompson, 2007). The philosopher of biology Hans Jonas provided some of the basic definitions of living and cognitive identity, which have been taken up by more recent research in the enactive tradition. The most important idea with respect to the present paper concerns how Jonas conceived of the relation between the individual organism and the world. According to Jonas, the boundary, i.e., that which allows us to identify the individual organism *as* individual, is an emerging distinction. He says:

Sameness, while it lasts ... is perpetual self-renewal through process, borne on the shift of otherness. This active self-integration of life alone gives substance to the term "individual" ... its very existence at any moment, its duration and its identity in duration is, then essentially its own function, its own concern, its own continuous achievement (Jonas, 1966/2001, p. 80).

Crucial to Jonas' idea is that the processes involved in the emergence of the organism are in principle not different from those of the organism's environment. These organic processes have a "double nature:"

the materials are essential to [the organism] specifically, accidental individually; it [the organism] coincides with their actual collection at the instant, but is not bound to any one collection in the succession of instants ... [d]ependent on their availability as material, it [the organism] is independent of their sameness as these; its own, functional identity, passingly incorporating theirs, is of a different order (ibid.).

This means that the individual organism creates its identity as an organism by negotiating a permanent tension between a need for material resources from the world that "it is made of" and the simultaneous drive to emancipate or free itself from *some* of the material processes, so it can exist as an independent individual. The organism's identity thus relies on organic matter that serves as "constructive material" on one side, and yet at the same time provides identity by being organized in a particular functional way ("a different order"). A fundamental tension exists at the heart of organic life, between a general dependence on material resources and a striving for emancipation from them. Jonas called this tension "needful freedom" (Jonas, 1966/2001, p. 80).

Needful freedom means that an organism's identity is ontologically relational and interactively constructed. Jonas sees the organism as a precarious being, remaining restless as long as it is alive. As Thompson has put it, the "organism has to change; stasis is impossible" (Thompson, 2007, p. 152). It is concerned with its own survival and with having to avoid conditions that lead to disintegration, i.e., its death. The organism is thus permanently in need because in order to survive it has to continuously interact with the environment. One can say that the organism is therefore relatively, but never fully, "in control" of the construction of its very identity.

Over the last decades Jonas' ideas have been elaborated and more formally expressed in various ways, which together ground an enactive view of cognitive individuation (e.g., Maturana and Varela, 1980; Varela et al., 1993; Varela, 1997; Weber and Varela, 2002; Di Paolo, 2005; Thompson, 2007; Di Paolo and Thompson, 2014). The basis for this view is the notion of *autopoiesis*, according to which living beings are defined as self-organized autonomous networks that produce and sustain themselves as a systemic whole – an *identity* within a particular domain (Varela, 1997; Maturana and Varela, 1980, 1987). The production and maintenance of such an identity requires that some relations between the processes of the network remain constant despite structural dependence on the environment. This characteristic of identity has been referred to as *operational closure* (Maturana and Varela, 1987; Di Paolo et al., 2010). More recently these ideas have been elaborated in order to understand not only biological but also cognitive individuation. Some enactivists propose that cognitive systems are best conceived as *autonomous* systems. According to this idea, a cognitive system's identity is a network of processes that self-produces and maintains the network as an *inter*connected network, i.e., each process in the network is not only enabling but also enabled by some other process. The identity is sustained under "precarious conditions," since without being organized in an interconnected way the individual processes making up the network would risk running down and the network as a whole could dissipate (Di Paolo and Iizuka, 2008). In line with Jonas, from the enactive perspective cognitive beings are thus considered intrinsically purposeful beings: they strive to maintain life, which is considered a natural property (Weber and Varela, 2002).
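The structural requirement just described, that each process is both enabling and enabled by some other process of the network, can be pictured as a property of a directed graph. The following is only a minimal sketch under stated assumptions: processes are treated as named nodes, "P enables Q" as a directed edge, and closure is read directly from the wording above rather than from any formal definition given by the cited authors. The function names and the toy "run-down" rule used to illustrate precariousness are hypothetical illustrations, not part of the enactive literature.

```python
from typing import Dict, Set

# Hypothetical representation (an assumption, not an enactivist formalism):
# each process name maps to the set of processes it enables.
Network = Dict[str, Set[str]]


def is_operationally_closed(network: Network) -> bool:
    """True if every process both enables and is enabled by at least one
    *other* process belonging to the same network."""
    members = set(network)
    enabled_by: Dict[str, Set[str]] = {p: set() for p in members}
    for source, targets in network.items():
        for target in targets:
            # Ignore edges to processes outside the network and self-loops.
            if target in members and target != source:
                enabled_by[target].add(source)
    for process in members:
        enables_others = any(t in members and t != process for t in network[process])
        if not enables_others or not enabled_by[process]:
            return False
    return True


def run_down_after_removal(network: Network, removed: str) -> Set[str]:
    """Toy illustration of precariousness (hypothetical rule): after
    `removed` stops, repeatedly drop any process left with no enabling
    process inside the network. Returns every process that has run down."""
    alive = set(network) - {removed}
    changed = True
    while changed:
        changed = False
        for p in sorted(alive):  # sorted() copies, so discarding is safe
            enablers = {s for s in alive if s != p and p in network[s]}
            if not enablers:
                alive.discard(p)
                changed = True
    return set(network) - alive


if __name__ == "__main__":
    ring = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
    dense = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print(is_operationally_closed(ring))               # True
    print(sorted(run_down_after_removal(ring, "a")))   # ['a', 'b', 'c']
    print(sorted(run_down_after_removal(dense, "a")))  # ['a']
```

In this toy reading, a simple ring of processes is closed but fragile: losing one process brings down all the others, whereas a more densely interconnected network survives the same loss. The sketch only illustrates mutual enabling and its precariousness; it says nothing about how the processes produce themselves or about the system's own perspective.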

Based on this concern for survival, cognitive beings develop a perspective on the world, from which environmental features and interactions are evaluated and acquire a meaning and a normative status. Not every aspect of the world matters. The normative status of environmental aspects and interactions depends on whether they count as threatening or beneficial to the basic goal of identity maintenance (Di Paolo, 2005; Thompson and Stapleton, 2009). Here lies, according to more recent proponents of the enactive approach, the difference between a mere living system and a living cognitive system. A cognitive system's perspective on the world does not depend only directly on its physical survival – the "mother-value of all values" (Weber and Varela, 2002, p. 111) – but also enlarges its action possibilities, from immediate reactions to existential perils to the recognition of more fine-grained ways of maintaining its existence. A cognitive system evaluates its interactions *adaptively*, thus flexibly regulating and changing its own conditions of identity maintenance (Di Paolo, 2005). Cognitive individuation in the autonomous self-production of identity entails a view of cognition as goal-directed, value-driven and purposeful. Cognitive systems have a basic intrinsic twofold goal: to create and maintain an identity and to generate sense or meaning.

For that reason, the cognitive identity of the autonomous system cannot be grasped solely from a third-person, operational definition of the processes involved in its individuation; it also requires a view from which the world is encountered and interactions are evaluated *by the system* itself. The enactive approach thus adopts a complementary perspective on cognition, one which also considers the perspective of the cognitive system itself. On this view, research on cognition also relies on subjective and phenomenological observations from first- and second-person perspectives (Varela et al., 1993; Lutz, 2002; Lutz and Thompson, 2003; Petitmengin, 2006). With regard to the question of the self, this means taking the first-person perspective (and therefore subjective experiences and phenomenological investigation) seriously when it comes to describing its basic structures.

## **THE BODY-SOCIAL PROBLEM IN ENACTIVISM**

Let me now consider how the aforementioned two shifts in contemporary cognitive science, the embodied and the social turn, are accounted for in current work in the enactive tradition. In the enactive approach, the body is what grounds a cognitive system's identity and individuates it as a living entity. It allows the autonomous system to differentiate itself from the environment (Di Paolo and Thompson, 2014) and it is also the means and reason for the cognitive system's engagement and evaluations of its interactions with the world (Kyselo and Di Paolo, 2013). On the one hand, the body is assumed to inform the cognitive system how it is faring with regards to its own intrinsic goals – for instance through emotions (Colombetti and Torrance, 2009; Colombetti, 2014) – but it is due to being embodied that, on the other hand, a cognitive system can have any goals at all. If bodily existence were not finite, nothing would matter to a cognitive system. The individuation of identity and sense-making – the adaptive regulation of interaction with the world – can be realized in various ways, including, for example, through the appropriation of non-physiological tools (Kyselo and Di Paolo, 2013). This is based on the life-mind continuity hypothesis, according to which autonomous self-individuation is not limited to biological processes but can be found at higher levels of cognition, too. This brings us to the second shift in contemporary cognitive science, the question of how, for researchers in the enactive tradition, the social figures in the individuation of cognitive identity.

That human life is not merely biological but also social has been taken seriously by some proponents of the enactive approach (Gallagher, 2001; Thompson, 2007; De Jaegher and Froese, 2009; De Jaegher et al., 2010; Di Paolo et al., 2010). One central example, on which I focus here, is "participatory sense-making" (De Jaegher and Di Paolo, 2007). Participatory sense-making reflects a classical idea from sociology and systems theory: that, based on the dynamical behaviors of (at least two) individual agents, an interaction process emerges that exhibits new properties irreducible to the individuals concerned, so that it can be described as a new systemic entity (Luhmann, 1992). It uses the concept of autonomy to characterize this new systemic entity as a *social* form of autonomy, an "interactive autonomy." Participatory sense-making elaborates on the idea that identity is not passively given but brought forth through interactions with the environment. But it is concerned with a form of autonomy in the relational processes based on coordinated *social* interactions of participants (De Jaegher and Di Paolo, 2007; Colombetti and Torrance, 2009, p. 32).

Recent proponents in the enactive tradition acknowledge that human cognition is not brain-bound, but a matter of embodied, sensorimotor engagement with the environment, as well as a matter of social interactions, as the example of participatory sense-making shows.

But with regard to the present issue, the body-social problem, I will now show that the enactive approach currently entails an ambiguity about the role of social interactions in the individuation of identity. Explaining this requires, as a first step, disentangling two senses in which social interactions appear to be relevant for proponents of participatory sense-making. Firstly, social interactions matter with regard to a group of (classically two) individuals jointly creating the autonomy of the interaction process. Here we look at an autonomous system whose identity as a whole is defined in terms of human social interactions. It is a group identity. Secondly, participatory sense-making also says something about the role of social interactions for the individual: they enlarge individual cognitive capacities.

Participatory sense-making thus implies that there are individuals involved in social interaction. But what can be said about their nature as individual identities? This question remains implicit within the theory. But let me point out some indications of what could count as an answer to what the individual is for participatory sense-making. There are at least two possible readings. One option would be to say that social interactions matter not only for augmenting the individual's cognitive capacities but also for its identity as such. This seems to be what De Jaegher and Di Paolo (2007, p. 492) have in mind when they write that their "perspective bypasses the circularity that arises from pre-conceiving individuals as ready-made interactors. Individuals co-emerge as interactors with the interaction." In this vein one might characterize the individual identity as essentially a (socially) relational and interacting being, in other words, as a participant.

This, however, raises a worry. Critics might argue that an identity defined as being relational, or as a participant in social interactions, runs the risk of dissolving in the interaction process, effectively becoming invisible *as* an individual identity (Hutto, 2010). But why should a focus on the interaction dynamics imply that there is no individual contribution or that the individual risks dissolving? One reason could be that, in participatory sense-making as it currently stands, the individual's nature as a relational being is underdetermined with respect to *its own identity*. It appears that the intrinsic purpose of participatory sense-making is directed not at the emergence and maintenance of the individual's identity but at that of an overall interaction dynamics, in other words, at the group identity. From this perspective, the individual participants of course make an important contribution. They act as constituents of a higher-order dynamics, in that they "sustain the encounter, and the encounter itself influences the agents and invests them with the role of interactors" (De Jaegher and Di Paolo, 2007, p. 492). The problem is that if being relational means being part of a process whose properties are irreducible to the individual (p. 494), and if, as De Jaegher and Di Paolo (2007, p. 493) say, "the regulation is aimed at aspects of the coupling itself so that it constitutes an emergent autonomous organization," then for the individual to be an individual it would have to adapt to an external norm. This norm has to do with the group's identity and the interaction dynamics of which the individual is part. If that were generally the case, then the individual would not actually be autonomous but rather heteronomous, as it would not be governed by its own laws of self-organization. The individual would risk dissolving because it merges with the social environment rather than emerging from it.

Note that De Jaegher and Di Paolo (2007, p. 492) try to avoid the worry that identity is lost in interaction dynamics. They emphasize that "the autonomy of the individuals as interactors must also not be broken ... [o]therwise the individual (as the other) would 'become a tool, [or] an object.'" They appear to defend their view by saying that a person is individuated from others *qua* being *embodied*. This is supported by quotations such as the following: "When we speak about cognitive agents in interaction, the basis for such a coupling can take various shapes and involve various perceptual systems, sensorimotor flows, neural, and physiological processes, external objects, and technological mediation." Co-regulation involves "bodily variables, such as relative positions and timing between movements, coordination between perceptual systems, and neuro-physiological variables" (ibid.). Such wording suggests that the individuals involved in participatory sense-making are bodily beings. If the mentioned processes and mechanisms of co-regulation ground the individual's identity, then it would be an individual that moves, has a brain and interacts with a material environment; in short, it is a body. However, it would also be, as it were, unsocial, because nothing in the definition of the body as such is social. The identity of the individual is then defined not in social terms but remains bodily. Ironically, in its very attempt to keep the individual from dissolving, participatory sense-making risks downplaying the role of the social. The body, while differentiating the individual from others, would be a locus of isolation, not a means of connection and engagement.

One way for proponents of participatory sense-making to avoid this second horn of the dilemma would be to admit that individuation of human identity is not fully determined in terms of bodies in isolation but requires that the body engages in *socially mediated* interactions with the world. This would permit a view according to which both claims come together: individuals are not merely embodied, but they are also interactors. This may be the view that proponents of participatory sense-making are actually arguing for. However, this position would suffer from the first horn of the dilemma of the body-social problem, for it implies that the social matters only as a context, in which bodily individuals relate to each other as otherwise ready-made identities. Participatory sense-making risks trivializing the role of social interactions as mere context, a position that stands in stark contrast to the original claims of the theory.

When it comes to defining the individual, the enactive approach thus currently gives an ambiguous answer to the body-social problem. With regard to embodiment and the role of social interactions for the self as a whole, it remains caught in a dilemma. With its identity heteronomously defined as being a participant, the individual either risks immersing and getting lost in the social interaction, or it becomes isolated, with its identity defined in terms of bodily processes. Like other research in embodied and social cognitive science that attempts to define the individual as a whole, participatory sense-making actually runs the risk of being individualistic, not in the sense that it implies a split between an objective, material world and the brain-bound individual, but rather in the sense of a split between a material and social world and *body-bound* individuals.

To conclude, while participatory sense-making is essential for understanding social cognition as a processual and interactive phenomenon, and will be important for understanding some of the underlying dynamics of group identity construction and the interrelations of individuals, its concept of the individual remains ambiguous. Further steps are still needed in the conceptual move from the low-level cellular to the higher, bodily and social levels of autonomy.

Without further conceptual clarifications and a definition of what counts as the individual, the concept of autonomy, which is considered a crucial building block for the enactive approach to human cognitive individuation, remains underspecified. If this remains the case, critics of the enactive approach might find it difficult to see how the notion of autonomy can help cognitive scientists address important questions at the intersection of individual and social cognition.

As we will see in the remainder of this paper, the notion of individual autonomy can be elaborated following the classical logic of the enactive position itself. In the next two sections I outline how one can account for the individual in cognition in a way that avoids the body-social problem without being reductive or essentialist. I propose an approach to the self that acknowledges plurality while also offering an idea of how it might form a coherent unity.

## **AN ENACTIVE APPROACH TO THE SELF**

In this section I outline an account of the individual self that avoids a tension between the role of bodily and social processes in cognitive individuation. From an enactive point of view, it is therefore crucial, on the one hand, to carefully distinguish between two different kinds of identity that the enactive approach refers to as autonomous systems: the identity of a group (autonomy in the interaction process) and the identity of the individual (individual autonomy). On the other hand, we also need to differentiate two kinds of organizational principles: one in terms of bodily and organismic processes, and the other in terms of social interaction processes. What I focus on here is how bodily and social processes matter for individual autonomy.

We must also acknowledge that, while the individualism entailed in an (essentially) embodied view of the self is reductive, it also has an important point: it introduces a distinction between the individual and the world and thus makes it distinguishable as what it is: an individual, and not the world. As I argue in the following, it is not the distinction between individual and world *per se* that we should give up, but the degree to which a brain- or body-bound view would force us to endorse it. Speaking about separation from the environment (and thus about the individual as an identifiable whole) does not rule out that social interactions are vital for cognition (as participatory sense-making has it) nor force us to assume that the individual is an isolated being parachuted into the social world. The solution is to reconcile both views by finding a common ground from which a middle way can emerge.

I propose that this common ground can be derived from the logic of the individuation of organismic identity entailed in Hans Jonas' notion of "needful freedom." The notion captures a principle that I believe is most useful for beginning to conceptualize the basic organization of the human self as a distinguishable unit of explanation. This principle is what I will call the individuation *through and from a world*: an individual identity reflects, in its structure and existential needs and concerns, the world *from* which it continuously emerges; but, in order to exist as an individual, it thereby also emancipates itself from the world *through* those very same processes.

This principle demands two things: first, that the processes defining an identity are in principle of the same kind as those of its environment and second, that there is not only interaction with the world but also *emancipation* from it. The two together ground the tension between needing the world (needful) and striving to emancipate from it (freedom).

In line with the hypothesis of life-mind continuity, I propose to use the principle of individuation *through and from a world* to inspire a new look at the individual self, which can be formalized in terms of the enactive notion of autonomy. The key idea is to temporarily free Jonas' notion from the realm of the bodily and organic and to ask what it would mean for a human *social* individual to be needful and free. The body-social problem for participatory sense-making (and cognitive science in general) arises when, while making the embodied and social turns, one does not fully endorse the principle of *through and from a world*. Freeing, I should emphasize, really means bracketing for a moment any role that the body might play in the individuation of human cognitive identity, and instead considering human individuation as a social process, rigorously and all the way down (the body does play a non-trivial role, but I will get to this in the next section). This means defining the human self *organizationally* as a whole in terms of social interactions and exchanges with the environment. In this context I refer to social interactions as virtual or actual interpersonal engagements of at least two individuals, but also as processes of self-relating and being related in social relationships<sup>4</sup>. The types of processes that individuate the self as an identity are therefore relational in nature (Tschacher and Rössler, 1996). This realization means that the self is never seen as given or as something that an individual just *has*: it is an achievement, constantly open to change and, at best, something *between* individuals. The self thus never just *is* but rather emerges continuously and jointly, relying on behavior and action and on doing and being together with others.

The next important step is to take seriously that, while the principle of individuation *through and from a world* entails the individual's emergence in dependence on the social world, it also requires its emancipation from it. Without this second aspect, that is, without a distinction, the individual would dissolve in social interactions, becoming invisible *as* an individual. Again, introducing this distinction does not require a shift to an ontologically different kind of identity, say the body (or brain). It can be achieved at the same level. It simply means that the social processes involved in individuation matter in different ways: in providing the "material" on which the individual's identity constructively relies, but also in forming its identity as that particular social individual standing out against the social relations of which it is made. I believe Mead captured the same idea in principle when he said, in *Mind, Self and Society*, that the self is "an eddy in the social current and so still a part of the current" (Mead, 1934, p. 182).

In this way we begin to expand Jonas' concept of needful freedom, from referring to biological individuation, to an individuation in terms of social interactions. However, to say that the individual emerges through social interactions is not quite enough to capture the idea of freedom and emancipation entailed in the principle of *through and from a world.* Individuation must also involve a particular flexibility and the possibility of ongoing emergence, not just of a one-time instantaneous independence. We have seen this in the case of the organism whose freedom is relative in the sense that it "coincides with their actual collection at the instant, but is not bound to any one collection in the succession of instants" (Jonas, 1966/2001, p. 80). The organism is always dependent on organic matter but what allows it to be an individual organism is that it is not always dependent on the same organic matter.

I propose that just as the organism's metabolism continuously exerts a choice by taking in only particular processes while avoiding others, so too the socially organized individual cannot incorporate all social interactions or relations, either at the same time or throughout time, but only particular collections of them at different instants in time. The basic idea is thus to transfer the temporal dimension entailed in Jonas' perspective on individuation to the level of the human individual, and to capture the tension of *through and from a world* by admitting that, while individuation always relies on social interactions and relations, these can vary and matter for the individuation of the self to different degrees. In principle, therefore, the individual does not depend on any single one of them. The construction of human identity occurs not in terms of organismic but of *social needful freedom.*

<sup>4</sup>My requirement for an interaction to count as social is therefore lower than typically assumed in enactive social cognition. A social interaction need not involve equal subjects. A relation between an infant and its caregiver, a prisoner and a guard, or an egocentric and an empathic person is social even when the recognition of the subjects as free and autonomous individuals varies in degree.

Social needful freedom would do more justice to the role of social interactions and relations than current models of the individual in cognitive science allow: social interactions and relations do not merely matter in that they constitute the individual's identity as a participant in an interaction or as belonging to a group. It is also through social interactions and relations that the individual can free itself and enable itself to move away from some interactions and/or to engage in certain others. Because at different instants in time the individual can engage in certain relations or disengage from others, it achieves a relative or functional degree of independence, a mobility that is social. In this way the individual frees and distinguishes itself through time, not merely by being a moving, separate body. Nevertheless, as long as it is an individual, it cannot free itself fully from social interactions and relations, since they are the general "relational material" that it is made of, and only against and through which the individual could ever be emancipated<sup>5</sup>.

Let me now indicate how the idea of social needful freedom can be used to elaborate the enactive notion of autonomy introduced in section "The Enactive Approach to Cognition," so that it can inspire an approach to the self that is integrative without being reductive or essentialist. I would like to emphasize that I aim to take the first steps toward rethinking the concept of autonomy in order to ground novel approaches to the self, not to provide a full-fledged theory of the self.

The model is basic in the sense that it conceptualizes the self at the most encompassing level required for understanding it as an organized unity, while abstracting from particular phenomena of self, from inter- and intra-individual variations, from development, disposition and enactment of self, and from the particular cultural context in which the self is embedded. Indications of how this abstraction can be used to illuminate the different manifestations and dimensions of self will be given later. For now, the goal is to help avoid the trap of thinking of the self either as individualistic and embodied, or as social and potentially lost in interaction.

The first step toward a definition of individual autonomy in terms of social cognition is to begin thinking of the individual as arising from a sea of social relational, not merely bodily, processes. In this way the autonomous network is not only a metabolically "self-generated identity" (Di Paolo et al., 2010) but also, and necessarily, an identity that remains open to structural change generated in interaction with others. It is a *self-other-generated* network. This means that the organizational processes that constitute the identity of the individual are defined in terms of interpersonal behavior and action<sup>6</sup>. Let me now determine, in a second step, just how these processes are minimally organized so that they bring about the individual as a self-other-generated autonomous network.

Capturing the idea of social needful freedom in terms of individual autonomy, the autonomous network that constitutes an individual's self must be organized such that, while principally relying on social relations, it can also resist and therefore free itself from some of these relations. I propose to use the term *distinction* to capture the individual's emancipation *from* certain social relations. Without emancipation there could be no identifiable entity (phenomenologically corresponding to a sense of ipseity, or of alterity in perceiving the other). Being distinct or emancipated, however, does not mean that this individual merely stands out, independently, against a vast and unchanging sea of social interactions and relations. In addition to distinction, social needful freedom also entails that the individual continuously becomes an individual *through* social interactions and relations. I thus suggest using the term *participation* to denote the other side of social needful freedom: the possibility of relying organizationally, at different moments in the succession of time, on different instantiations of social interactions and relations (see **Figure 1**)<sup>7</sup>.

Both kinds of network processes, those enabling distinction and those enabling participation, are required together to ensure social needful freedom and to bring about the individual as a network of autonomous self-other organization. Without distinction the individual would risk becoming heteronomously determined and forced to rely on the next best, or only a limited set of, social interactions. But without participation and its act of openness toward others, the individual eschews structural renewal, thus risking isolation and rigidity. This describes what some enactivists refer to as the "precarious conditions" of autonomy (see "The Enactive Approach to Cognition"). In this case, distinction and participation each keep the individual from a particular risk, namely isolation from others or dissolution in social interactions, and they enable each other in doing so.

<sup>5</sup>The idea of social needful freedom might be in tension with some phenomenological accounts of intersubjectivity (e.g., Henry, 1988 or Husserl, 1992/1930). The sense of self in our self-other relation involves a first-person givenness and therewith a sense of separation from others and, at the same time, there is something about the experience of the other that escapes my own experience. This seems to contradict my idea that the individual can never free itself from the social as the material of its own self-constitution. I am afraid I cannot do justice to the rich body of phenomenological inquiries into self and understanding others in this paper. But generally speaking I am convinced that the idea of social freedom is in principle compatible with phenomenological accounts of intersubjectivity. What it challenges are some deep intuitions about the structure of the first-person perspective and the extent to which experiences of ipseity and alterity are conclusive to it. The suggestion that, contra Henry's absolute immanence, it might be relationally co-enacted is meant not to replace but to complement phenomenological inquiries. My hope is that the *organizational* perspective proposed herein helps to specify just how the first-person perspective is constituted. This could be seen as part of an ongoing dialog between cognitive science and the phenomenology of intersubjectivity.

<sup>6</sup>Such overlap of behavioral and identity-constructing processes can already be seen in non-human animals. Fisher spiders, for example, store air bubbles trapped on the surface of their body, mechanically stabilizing the bubbles (so-called "plastrons") to prevent them from collapsing under water. They then use the oxygen contained in the bubbles to survive under the water surface for longer periods than they otherwise could (Flynn and Bush, 2008). Thus the survival of the spider under water relies not only on constructive processes (of its metabolism) but also on a particular behavioral and interactive strategy (collecting bubbles, storing them, etc.).

<sup>7</sup>Note that social relationality need not translate into actual engagement, or actual interaction, with others. The self in the mode of *distinction* is not suddenly socially unrelated; rather, it is socially related in two ways: first, in order to be distinguishable as an individual it relies on particular social relational processes that favor its distinctiveness, and second, it is related in that the social relational processes that do not favor or do not matter for its distinctiveness necessarily act as the system's environment, against which alone it is visible as a unity. Similarly, being in *participation* does not mean ceasing to exist as a separate individual; were this the case, no *one* would be distinguishable as a participant. For this reason participation is not equal to participatory sense-making. The latter adopts a narrower perspective on the individual, which is determined heteronomously, in terms of its contribution to a group identity.

**FIGURE 1 | Socially enacted autonomy.** The graphic illustrates the basic organization of the network of processes that constitute the self as an individual socially enacted autonomy. The network processes are social interactions and relations (the blue-red grid) that are spanned between two poles, distinction (blue ball, D) and participation (red ball, P). D and P are interconnected in that they enable each other. Together, the poles determine and qualify the overall tendencies of the network processes (indicated by the thin blue arrows on the left and the thin red arrows on the right) as having more or less distinction/emancipation and participation/openness. The network processes are in tension (the double arrow in blue and red). When social interactions and relations exhibit higher tendencies toward P, the "pull" from the opposite pole D ensures that the processes do not end up in an extreme degree of P. In this way the network avoids the risk of dissolution. Vice versa, when social interactions and relations have a higher degree of D, the network's organization tends to balance this with increasing tendencies toward P, thereby avoiding the risk of isolation.

I propose to capture these ideas in the following definition of human socially enacted autonomy of the individual:

Individual autonomy is a self-other generated network of precariously organized interpersonal processes whose systemic identity emerges as a result of a continuous engagement in social interactions and relations that can be qualified as moving in two opposed directions, toward emancipation from others (distinction) and toward openness to them (participation).

Because of the tension between a risk to dissolve or to become isolated, the individual, much like the organism, remains permanently concerned with the continuity of its own existence. But while mere living systems strive to survive by avoiding interactions with the environment that threaten their biological survival, the human self *qua* self-other generation has to avoid tendencies in social interactions leading to social death.

Just like the organism in its metabolic autonomy, the social human being follows an intrinsic existential norm guiding behavior and evaluations of interactions. The important difference is that the organismic identity as a bodily whole is secured by homeostasis, ensuring that the body remains stable throughout different interactions with the environment. In the case of the social self, the stability of the unity is not achieved by individual biological or bodily means, but through engaging with others: by first learning how to negotiate, and then continuously negotiating, the balance between the processes of distinction and participation. This balance is achieved by navigating a range between two extremes, total distinction and total participation, and thereby co-regulating, monitoring, identifying and seeking to avoid tendencies of falling into either of them. This could be the social version of what some enactivists refer to as adaptive regulation (**Figure 2**). The negotiation of distinction and participation can be seen as a co-enacted, quasi-homeostatic principle keeping the self relatively stable and alive as a socially organized and organizing existence.

Mere organismic systems adaptively evaluate their interactions with regard to the nutrition needed for the maintenance of metabolism. They seek the right kinds and amounts of food, avoiding poisonous food and preferring especially nutritious food. Humans need an additional kind of nutrition. Because human autonomy is co-generated with others, it is necessarily vulnerable to disturbances and conflict. Others can fail or refuse to contribute to a person's identity affirmation, which could ultimately interfere with the very organizational network that constitutes human autonomy. Particular interactions (or the lack thereof) could lead to problems, either with regard to the individual's experience *as* somebody individual or with her experience of being somebody who is *connected* with others. For humans, adaptively regulating their own states and interactions with the social environment means evaluating actions with regard to their contribution to a *socially defined boundary*. To this end, processes enabling or limiting *recognition* of the twofold need for emancipation from others (distinction) and openness to them (participation) can be relevant<sup>8</sup>. In line with the present suggestions, one could say that social recognition is vital throughout life (Ikäheimo, 2009). Recognition is the nutrient required to co-construct the boundary of the self. If this were not the case, solitary confinement would not be chosen as one of the harshest punishments. As studies with prisoners have shown, social isolation can lead to serious short-term and long-term psychiatric disturbances such as paranoia and hallucinations (Grassian, 1983; Haney, 2003; Guenther, 2013), and, as research on social exclusion and ostracism shows, human contact is needed to sustain a minimal social identity and prevent social death (Bauman, 1992; Williams, 2007).

<sup>8</sup>An important question for further elaboration is how processes of distinction and participation could be mediated in linguistic terms. To this end, it might be fruitful to relate the present argument to Maturana's work on languaging and the creation of consensual domains in which individuals co-structure their social, not merely organismic, identities (Maturana, 1978). A further crucial linkage exists to developmental psychology. Research showing the vital role of intersubjective engagement in early infant development (e.g., Trevarthen, 1993; Braten, 2004; Stern, 2009) could be relevant for specifying how processes of distinction and participation organize the initial development of socially enacted autonomy. The educational psychology of Bruner, who was also the first to use the term "enactive," could inspire further elaborations of how children continuously expand their self-reflexive capacities and their understanding of others through active, intersubjectively structured learning (Bruner, 1996).

**FIGURE 2 | Adaptive regulation of the twofold basic norm of distinction and participation.** The three graphics illustrate different degrees of distinction (D, blue ball) and participation (P, red ball) in different contexts. Graphic **(A)** illustrates an individual featuring a stronger experience of participation (e.g., when being in love, dancing tango, being immersed in the crowd at a concert). Graphic **(B)** illustrates an individual with an equally strong degree of distinction and participation (e.g., in the intimate encounter or during a fight with a close person). The third graphic **(C)** illustrates an individual who experiences a higher degree of distinction (e.g., during a conference talk, in non-transcendental states of meditation).

According to the present proposal, social death has two faces. It could occur when the individual gets stuck at the extreme of either of the two dimensions, distinction or participation. An extreme degree of distinction would mean that the individual has lost its connection to the very structures that it is made from (it risks dying from isolation), while an extreme degree of participation would mean that the individual has lost its individuality (it risks dying from dissolution). There are examples that approximate such extreme degrees in disorders of the self, and particularly in symptoms of schizophrenia (Parnas and Sass, 2010), such as social or self-isolation (extreme distinction) or loss of agency (extreme participation).

Recall from section "The Enactive Approach to Cognition" that the enactive approach also provides a route for integrating a third-person, organizational perspective with the subjective dimension and phenomenological perspective of the system itself. Though it is outside the scope of the present argument, a thorough and long-term investigation of how the processes of distinction and participation structure subjectivity is still required. In the remainder of this section I provide some examples to indicate how humans ensure their survival as social existences through interactions and relations that generate or prevent processes of distinction and participation.

The above definition of socially enacted autonomy proposes that humans co-generate their identity following a twofold norm. This can be used to structure the individual's perspective on the world in terms of subjective experiences that are evaluated according to whether and how they serve survival, i.e., in this case, the maintenance of the self.

Both distinction and participation are (experienced) types of social interactions and relations, though they say nothing about the amount or actuality of engagement. Distinction roughly means that a person experiences herself as emancipated and distinguished from certain social interactions and relations. It involves a sense of separation and of being someone in her own right. This can apply to a diversity of self-conscious experiences (whether positively, negatively or otherwise evaluated): doing yoga, nervousness in front of an audience, feeling disconnected from your partner, being proud of an achievement, being the stranger at a party, but also the joyful experience of finally being alone after having spent the entire day with other people. Such experiences mirror the basic structure of social autonomy as striving to maintain a particular degree of emancipation as an individual. Participation, then, generally refers to experiences of feeling both connected and open. It involves a sense of readiness to affect and to be affected by the other. Again, there are manifold examples: the sense of self as curious when falling in love with someone, the pull we feel when finding somebody sexually attractive, a feeling of letting go when dancing tango, being one with the crowd at a concert and so forth. Such experiences refer to the basic structure of social autonomy as striving to remain connected and open to particular types of social interactions and relations (see **Figure 2**).

I have given examples in which either a sense of distinction or a sense of participation is more prominent. However, these two qualities – of experiencing oneself as separate from others and as somebody willing to engage – precede or follow each other, and they can even overlap. There are situations in which we experience the shift from one quality to another quite clearly. If, e.g., in a difficult discussion our partner finally seems to understand what we want to say, a relief or a sudden relaxation may appear, upon which we begin to feel less separated from the other and begin to experience a readiness to be open again. Yet something of this readiness is already found in feeling separated and misunderstood – one can at the same time feel the need to overcome the conflict and to be in harmony again. Similarly, at a conference presentation we can experience both a sense of separation from the audience (for instance, because of nervousness in the face of criticism) and a sense of eagerness to engage with it (because we would like to discuss our ideas) at the same time. One of the clearest examples of the presence of these two basic kinds of experience is perhaps found in moments of emotional intimacy, or better, in the struggle therein. In an intimate encounter, experiences of wanting to engage and connect with the partner and fear of rejection or of losing oneself are situated very close to each other, and individuals can sometimes continuously oscillate between them. In such moments humans can struggle to find the fine attunement between a readiness to let go and be open to the other (participation) and, at the same time, owning oneself and remaining visible as another individual (distinction). Emotional intimacy is rare, perhaps because it is where the necessarily open and vulnerable self is at its greatest risk.

In contemplation of human existence, it is our task to remind ourselves of and "elucidate those fundamental aspects that are so familiar to us, so taken for granted, that we often fail to realize their true significance and even deny their existence" (Zahavi, 2008, pp. 127–128). According to the present proposal, what is so familiar to us is simply human life and how it continuously expresses itself to ourselves through sequences of experiences of being more or less separated and more or less connected. What we struggle to recognize until we are in a social or personal crisis, in non-transcendental meditation, or adopting a researcher's and philosopher's stance, is that *both* these experiential dimensions are shades of something fundamental to our nature: we need and want to be individuals in our own right, distinguished, able and free, but we thereby also need others and want to be connected, vulnerable, supported and receptive. It is when our standard self-other perception is challenged that we appreciate that these needs are probably never met independently of others. Being emancipated and being relational should not be treated independently; both condition the self at the same time.

This basic model of socially enacted autonomy could constitute an important conceptual move for an enactive approach to the self. It offers an organizational principle for approaching the self as a co-generated and co-maintained whole. On this view, the self is not just a loose collection of aspects but has boundaries that are generated through interacting and being related to others. The self, in its most minimal sense, thus escapes the body. It is never fully separable from the social environment, but is instead determined precisely in terms of the types of social interactions and relations of which it is, at the same time, a part. Without an ongoing engagement with other people, and without their contribution, there is no generation of self.

Yet that is not to say that the self is essentially social and "nothing more." The argument is not in favor of a disembodied conception of the self. On the contrary, as I show in the next section, in this organization of the self as social existence the body plays more than a trivial role.

## **TOWARD RESOLVING THE BODY-SOCIAL PROBLEM**

As a consequence of the above proposal, speaking of the embodied self cannot mean that the self *is* the body. Through birth we indeed become a bodily identity, as we "emancipate" ourselves to some extent as physiological entities in a material environment. However, to emancipate as a self, as an identity that differs not from organic bodies but *from other human subjects*, a further process of individuation is required (Mahler et al., 2000). This process of individuation, as I suggest in this paper, is achieved through social interactions and relations.

This proposal is fully compatible with the idea of an embodied self where the body, rather than being considered the seat of the self, changes its status and becomes the self's means and mediator.

The body is then non-trivial for the self as a whole to the extent that it functions as a matrix of co-constructed existence, helping (together with the brain, of course) to organize human social existence and to monitor and regulate the intrinsic goal and minimal purpose of the self: to be some*one*.

It is an open research question how bodily consciousness relates to the human (social) self from an enactive point of view. At this point I can only hint at it. For the enactive approach, the creation of a living and cognitive identity brings about a perspective, which is considered a minimal form of consciousness. This chimes well with the above-mentioned research on the bodily basis of self-consciousness. The idea is then to extend these ideas to the social domain. If, as I suggest, the self is not a bodily but a socially co-enacted identity, and if consciousness arises with the creation of identity, then an essential part of (bodily) self-consciousness may emerge through relations with others. Bodily self-consciousness, embodied emotions and existential feelings can then be seen as ways of informing an individual about its state of being in a world of others.

Conjoining the embodied turn with the social in a more than pluralistic sense, the idea of the self as socially enacted continues to do justice to the embodied turn in cognitive science, which recognizes the non-neuronal body but risks reducing it to a developmental role. It could also pick up where extended functionalist approaches to embodiment remain inflationary (Kyselo and Di Paolo, 2013). Acknowledging that (cognitive) identity is irreducible to the physiology of one's own body, while at the same time considering the body a matrix of an enacted social existence, provides the body with a more clearly defined status. It is not a rock or a remote island, but it is also not a random vessel. On the present account, being someone implies being an individual that one can connect to and that remains open to being affected by others. The body plays a major role in making this possible. It is an interface for connection. But the structure of that bodily interface to the world is not rigid. It is fluctuating, subject to permanent change – change that mostly happens in reaction to and in dependence on our relations with other beings. In continuation of Bernstein's theory of motor psychology, according to which bodily movement shapes the brain's motor system instead of being controlled by the brain (Thelen, 2004), the logic of the argument at hand suggests a further reversal regarding the relation between body and sociality. The body is not merely a means but also an imprint of social engagement. As a consequence, bodily consciousness alone would be insufficient to ground even the most minimal sense of the human self. Instead, it might be seen as a kind of sensor for monitoring social engagements and relations with the goal of social homeostasis. This sensor does not merely reside within the realm of the individual's body and actions; it is also co-constituted in and through the space created between individuals.

Of course, there is something quite crucial to insisting that a person feels that their very self changes when they change bodily aspects of their existence, be it when they become sick, suffer an accident leading to disability or even when they change only slightly, say with a new haircut or dress. But we can admit this without also arguing that body and self are ontologically the same. The point I want to make is that many bodily changes matter to someone because of what they mean with respect to this person's relation to the social world and how she fares in her relations to others. Bodily experiences acquire a social meaning, and I propose that this meaning is generally evaluated according to the twofold norm of distinction and participation. The new haircut is not merely a change to some biomaterial that grows out of my head. It is a change to the way I look, and thus to the way I relate to myself and to others, and of course to the way in which others relate to me. My partner might notice the difference in style and compliment me on looking fresher, more beautiful, etc. But if after my haircut I went to work for *Médecins Sans Frontières*, the change of style would probably not matter much. The point is that what I feel about my haircut depends on how I saw and now see myself and on how others have seen and now see me. It requires an implicit act of relationality to make this bodily change significant for my self.

Let me now come back to the two empirical examples introduced in section "The Body-Social Problem in Cognitive Science," where this point becomes more pressing: the possibility of a positive quality of life in LIS, and social pain. Recall first the case of LIS patients, who, despite being globally paralyzed, report a positive quality of life. One way of making sense of this is to adopt what I would call a cognitive adaptation strategy. In a recent study, Nizzi et al. (2012) conducted interviews with LIS patients to assess how the paralysis affected their sense of personal identity. They found that patients can adjust very well to the objective change in physiology and actually "feel the same as before the accident." According to the authors, this is because the patients maintained a positive subjective "bodily representation" (p. 435). If positive quality of life has to do with a positive self-representation, then this adjustment strategy can explain why patients feel well despite the paralysis. However, Nizzi et al.'s (2012) interpretation seems to presuppose a disembodied view of the self: whether or not the body is subject to severe objective change plays no role for the patient's self as long as she consciously decides that it does not. One of the problems with such an explanation of well-being in LIS is that it risks trivializing the role of the non-neuronal body for the self – all the necessary work could be done by a bodily representation, presumably located in the brain. For an (essentially) embodied approach to the self, this interpretation must seem counter-intuitive. The embodied self implies that there is a relation between objective physiological change and subjective experiences of self and well-being. Adopting this view, one would probably have to assume that LIS, being a global bodily paralysis, is in a sense also a disorder of the self and of (bodily) self-consciousness. If the self is equated with the body, and the bodily self is considered as grounding first-person subjectivity, then the patients' well-being should be affected. And yet, as the results of Nizzi et al.'s (2012) interviews and other qualitative studies on LIS patients seem to suggest, this is not the case. The embodied approach to the self (as a whole) would thus make a prediction that runs counter to the evidence.

The proposed model of the self as socially organized autonomy could provide an alternative to the cognitive adaptation story. On the enactive interpretation, the self remains non-trivially embodied in the sense that it is mediated by the body; the body is part of the interface organizing the individual's social existence. According to this perspective, the patient can adapt to the new situation precisely because she is not the physiological body, but a genuinely social self. The physiological change matters because it changes the ways in which the patient is able to relate to others and in which others relate to her. To the extent that these relations are still given, even the most minimal form of communication – as can be seen in the use of brain-computer interfaces – can suffice to enact the processes necessary for the individuation of self (distinction and participation) and thus for integrating bodily changes into a positive sense of self. This interpretation is also empirically supported by studies of less severe forms of disability. Babies with Moebius syndrome, for example, lack facial expressions and are unable to show their caregivers "that someone is home" (Cole, 2009, p. 351). This can affect how caregivers react to their children. They might respond with "reduced signals," which can in turn cause "emotional impoverishment" (Cole, 2009, p. 354). For patients with spinal cord injury, "disablement [ha]s nothing to do with the body. It is a consequence of social oppression" (Cole, 2009, p. 348). Paralysis is "not simply a physical affair ... but an ontology, a condition of our being in the world" (Murphy, 1990, p. 90). Despite global restrictions, the LIS patient is still "yearning for intersubjectivity" (Dudzinski, 2001, p. 43). Statements such as these suggest that it is through being related to others that bodily changes can affect, and be integrated into, our self.

The fact that the "quality of life often equates with social rather than physical interaction" (Gosseries et al., 2009, p. 199) makes sense when the boundaries of the self are determined not by bodily processes alone, but rather in terms of relational and co-enacted processes. LIS can be considered a disorder of the self to the extent that the body is restricted as the individual's means of social relationality, not as the seat or constitutive basis of the self. More accurately, like other cases of disability, LIS should be seen as a "disease of social relations" (Murphy, 1990, p. 4). This also means, for better or worse, that whether the patient is able to integrate severe bodily changes and lead a happy life depends not entirely on her, but also on the support and recognition of others.

An interpretation of well-being in LIS makes sense from a disembodied view, but the idea of the self as mediated by the body offers a non-reductive explanation, doing justice to both the embodied and the social turn in cognitive science.

The present proposal also makes sense in light of the fact that social rejection hurts (see "The Body-Social Problem in Cognitive Science"). One might be tempted to read this fact prima facie as evidence for the primacy of the organic body in individuating the self as a whole, and so as supporting the idea of the (essentially) embodied self. This is indeed what Eisenberger seems to have in mind when arguing that the pain is evolutionarily beneficial since it helps to ensure survival. On such a reading, the social matters only contextually, insofar as it allows an individual to survive as a biological identity (a minimal bodily self, if you will). Being excluded from participating in a game hurts because it indicates a risk, namely that others will not be there to help protect the biological self<sup>9</sup>.

The alternative would be to consider the evidence that the major source of concern for human existence does not stem from nuisances within the organic body itself, but rather from the fact that human existence is organized socially. Thus, instead of reducing sociality to a means to a biological end, why not take the evidence as direct support for the fact that humans are concerned about their existence as social beings? I would agree with Eisenberger that the pain of social rejection is beneficial for survival. But in light of the present considerations, this survival is not merely biological. Rather, the empirical example can be seen as support for the hypothesized relation between socially enacted autonomy and the fundamental role of social recognition in enabling the processes of distinction and participation. Social rejection constitutes a potential violation of my being recognized as someone others can connect to, or who can connect to others, but it also risks reducing my ability to be seen as a distinct individual. If we assume that the body mediates a socially enacted self, the pain of social rejection could be one of the body's clever ways of cautioning the self against the lack of recognition and its ultimate consequence, *social* death. I would thus reverse the standard argument: the social does not help the bodily self to survive as a whole; instead, the body helps the self to survive as a social whole.

<sup>9</sup>One might wonder whether and to what extent this can be extrapolated to human identity in general. It could invite an odd argument according to which humans must suffer physiological pain for every social activity in which they are not included. This is obviously not the case. Whether a certain interaction counts as a case of social rejection might be better determined by evaluating whether it means something to the person, and this depends on how she is related or wants to be related to the people involved in the interaction. If a person does not care to be included in the activity, then she would not feel rejected and therefore also not experience physiological pain. Even if the person desires to be included, if she reassures herself that the exclusion is temporary, she avoids interpreting the situation as a rejection and thus remains pain-free.

To conclude these considerations on the quality of life and the pain of social rejection, there is no logical reason that forces us to prefer one of the three possibilities of interrelating body, self and sociality (disembodied, essentially embodied or bodily mediated). The first example supports both a disembodied and a socially enacted view of the self, while the second example seems to be plausible on both an essentially embodied and a socially enacted, bodily mediated account of the self. I am thus not arguing that my approach is the only game in town. What I would like to suggest, however, is that it might be preferable for the purpose of cross-disciplinary dialog, since it rises to the challenge of the body-social problem without sacrificing either the embodied or the social turn in cognitive science. At the same time, it might have advantages over a pattern approach to the self, since it does not merely account for diversity but also provides an account of the self as a coherent unity and determines how other dimensions such as sociality and (neural and more than neural) embodiment might integrate as aspects of this unity.

## **CONCLUSION**

In this paper, I have introduced the body-social problem: the question for cognitive science of how bodily and social aspects go together in an account of the human self as a whole. I have discussed the problem in more detail with regard to research on social cognition in enactivism, where it translates into the question of how bodily individual autonomy and higher, socially enacted forms of autonomy are interrelated.

I proposed the principle of individuation through and from a world to extend Jonas' notion of needful freedom and to ground an integrative perspective on the embodied and social self. According to this principle, humans emancipate themselves not merely through organic but also through interpersonal interactions. Their identity emerges out of a tension concerning social freedom: humans strive to distinguish themselves from others as individuals, yet at the same time they also strive for connection with, and being affected by, others.

I elaborated on the enactive approach to individual autonomy and indicated how this discussion can inform an approach to human identity as co-generated and organized in terms of an adaptive regulation of social distinction and participation processes. I have argued that the enactive approach to the self can be a way for cognitive science to avoid the dilemma of the body-social problem. One does not have to choose between positing an isolated bodily individual and an individual as mere participant. The positive contributions of both horns of the body-social dilemma are brought together in an integrative way. In this view, humans participate and are therefore able to emancipate themselves, and because they emancipate themselves they are able to participate. The self is constitutively social, not merely developmentally, but throughout its life. The body's role is to mediate that social existence; it is the major key to ensuring the twofold goal of maintaining both distinction and participation, while leaving the possibility open for non-physiological forms of self-co-maintenance using tools and language-based technology.

The paper provides an alternative to a pattern approach to the self. It acknowledges diversity, but, as shown in the context of empirical examples such as the positive quality of life in patients with global paralysis and the pain of social rejection, it also offers some ideas about how the diverse aspects of the self integrate.

These considerations are not meant as a final word on the question of how self, body and sociality interrelate. The paper provides some novel and basic conceptual suggestions for cognitive science to integrate embodiment and sociality in a way that neither underestimates the role of interpersonal relations nor runs the risk of losing the individual through an overemphasis on group and interaction dynamics. I propose them as stepping stones toward a biologically based, yet social and experientially plausible, approach to human individuation. Further investigations to this end are required, including philosophical inquiries into self and intersubjectivity at the intersection of philosophy of mind and phenomenology, as well as philosophical anthropology. Also required are explorations of existing linkages to intersubjective approaches to self and subjectivity in other fields of cognitive science, especially developmental psychology, psychiatry, and social neuroscience.

# **ACKNOWLEDGMENTS**

I would like to thank Gabriel Levy, Mike Beaton, Elena Cuffari, and Ezequiel Di Paolo for their valuable comments on earlier versions of this paper. This work is supported by the Marie-Curie Initial Training Network, "TESIS: Toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

# **REFERENCES**


Dennett, D. (1992). "The self as a center of narrative gravity," in *Self and Consciousness*, eds F. Kessel, P. Cole, and D. Johnson (Hillsdale, NJ: Erlbaum).


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 19 August 2014; published online: 12 September 2014.*

*Citation: Kyselo M (2014) The body social: an enactive approach to the self. Front. Psychol. 5:986. doi: 10.3389/fpsyg.2014.00986*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Kyselo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Interaction and self-correction

# *Glenda L. Satne\**

Center for Subjectivity Research, University of Copenhagen, Copenhagen, Denmark

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

## *Reviewed by:*

Julian Kiverstein, Institute of Logic, Language and Computation, University of Amsterdam, Netherlands

Manuel De Pinedo-García, University of Granada, Spain

#### *\*Correspondence:*

Glenda L. Satne, Center for Subjectivity Research, University of Copenhagen, Njalsgade 140-142, 5th Floor, 25.5.20, 2300 Copenhagen, Denmark; e-mail: satne@hum.ku.dk

In this paper, I address the question of how to account for the normative dimension involved in conceptual competence in a naturalistic framework. First, I present what I call the naturalist challenge (NC), referring to both the phylogenetic and ontogenetic dimensions of conceptual possession and acquisition. I then criticize two models that have been dominant in thinking about conceptual competence, the interpretationist and the causalist models. Both fail to meet NC, by failing to account for the abilities involved in conceptual self-correction. I then offer an alternative account of self-correction that I develop with the help of the interactionist theory of mutual understanding arising from recent developments in phenomenology and developmental psychology.

## **Keywords: interaction, self-correction, naturalism, normativity, evolution, conceptual abilities**

# **INTRODUCTION**

Conceptuality has traditionally seemed to pose specific challenges to the possibility of a naturalistic account of mind. The issue I address in this paper is how to specify the normative abilities that are associated with conceptual competence in order to meet a very popular challenge in recent developments in philosophy of mind, what I call the naturalist challenge (NC). I do not intend to provide a complete or even general account of conceptuality. More modestly, I try to specify certain conditions that a naturalistic account of conceptuality should accommodate: conditions that define a framework of specific questions and concerns, in particular in relation to our capacities for conceptual self-correction, which lead us, I argue, to prioritize a certain approach *vis-à-vis* others: the interaction theory of mutual understanding. In the context of that general approach, I claim it is possible to account for self-correction in a way that is compatible with the challenge at issue.

Addressing the problem of conceptual competence within a naturalist framework makes it necessary to meet the NC, that is, to account for:

(1) the evolutionary path from creatures without language or thought to creatures with both abilities, without postulating any explanatory and/or evolutionary gap<sup>1</sup>.

(2) the capabilities of learning or acquiring conceptual contents – and a natural language – without producing or presupposing any explanatory and/or evolutionary gap or committing to the existence of non-natural entities.

And a further constraint:

(3) Answers to (1) and (2) must be able to justify the attributions of intentional attitudes to children and non-human animals<sup>2</sup>.

There are two main strategies that have been adopted toward this challenge. Both of them, when broadly construed, define two general models of conceptual abilities that may be described in terms of the adoption of a first-personal perspective or a third-personal one. The first, which can be called the first-personal model, includes those attempts to understand conceptual abilities that focus on the individual's brain states, conceiving them as dispositions or informational states that are related in appropriate ways to the environment such that they can be conceived as constitutive of the competence involving a specific concept. According to this model, NC is met because the explanatory work is done by a naturalistically specifiable notion, i.e., one that can be found pervasively in the natural sciences: the notion of *causation*. What makes a state constitutive of competence with a concept is its being properly *caused* by that to which the concept refers, or which it is about. In this sense, these approaches are causalist accounts of the nature of conceptual competence.

<sup>1</sup>J. Levine was the first to use the expression in the context of the discussion of reductivist accounts of the mind. He said: "In the end, we are right back where we started. The explanatory gap argument doesn't demonstrate a gap in nature, but a gap in our understanding of nature. Of course a plausible explanation for there being a gap in our understanding of nature is that there is a genuine gap in nature. But so long as we have countervailing reasons for doubting the latter, we have to look elsewhere for an explanation of the former" (http://cognet.mit.edu/posters/TUCSON3/Levine.html). Even if Levine was referring to another aspect of the mind, the point still applies in relation to the development and evolution of conceptual capacities. The expression "evolutionary gap" is meant to emphasize the need for an explanation of how certain capacities evolved from others, instead of postulating a gap in nature. "Explanatory gap" refers to what Levine calls a gap in our understanding, i.e., the insufficiency of a certain set of explanatory tools to infer or otherwise explain conceptual capacities.

<sup>2</sup>"Justification" in this condition is to be understood in broad terms. Thus, it is meant to cover a broad range of explanatory accounts of those attributions, not merely accounts that take those attributions to be literally true. There is nevertheless a minimal constraint that justification places on these explanations: it requires that the explanation of the attributions be based on the abilities displayed in the behavior of the organism to which the attributions are made.

The second approach I examine focuses not on the individual's brain states but on the attributive standpoint of an interpreter who can understand an individual's behavior conceptually, thus adopting a third-personal perspective. This strategy is known as an interpretationist account of conceptual abilities. NC is met – so the defenders of this position claim – because this perspective is not committed to there being any specific reality of concepts over and above the interpretational activity of taking the behavior at issue to be explained in terms of the attribution of the concepts in question.

My aim in this paper is twofold:


## **CONCEPTUAL ABILITIES: BASIC NOTIONS AND CONSTRAINTS**

There seem to be good reasons to think that, no matter how we define conceptual abilities or what position we take concerning the scope of conceptual content and its articulation with experience, being able to apply concepts presupposes as a necessary condition – though of course not a sufficient one – being able to distinguish between correct and incorrect applications of them in actual cases. This is what we may call the *normative constraint on conceptual abilities*<sup>3</sup>.

Such a constraint can be defined as follows:


Further specifications are required in order to understand the constraint correctly. As may be apparent, the normative dimension involved in the ability to apply concepts involves the possibility of error.

There are nevertheless two notions of error or mistake that must be distinguished. In particular, there are two different kinds of mistakes that we attribute to others in their use of concepts. On the one hand, we may attribute error to someone when she misapplies a concept. I call this *misapplication or conceptual mistake*. On the other hand, we may attribute lack of competence to a person regarding a concept when she lacks the concept or is simply not applying the concept at all. This is what I call *absence of application*. This distinction will prove especially fruitful when assessing whether a model of conceptual abilities can fulfill the normative constraint while accommodating the requirements of NC.

Consider the following cases:


In the first case, we attribute to John that he is adding incorrectly; in the second, that he simply is not adding. While (i) is a case of conceptual mistake, (ii) is just a case of absence of application. The crucial difference lies in the fact that in the former case the concept in question is relevant to the evaluation of the action, i.e., it is relevant to the way in which the performance is carried out, whereas in the second case the concept is not relevant for explaining his performance; it is simply absent<sup>4</sup>. Following our previous constraints, to account for the normative constraint specified in (1) and (2) above, it is necessary to account for the abilities that underlie the attribution to a subject that she is making a mistake in the use of a concept (conceptual mistake), and to distinguish that case from one in which the subject is simply not applying the concept (absence of application). That is, it is necessary to account for when and how someone who uses a concept commits and recognizes conceptual mistakes and accordingly self-corrects her use, and to distinguish that case from one in which the subject is not applying the concept at all.

How, then, should we understand self-correction in the application of concepts? Self-correction in the relevant sense seems to involve three dimensions of performance:


As will be shown in the following sections, both causalist and interpretationist accounts of conceptual abilities fail to account for the distinction between cases of misapplication or conceptual mistakes and cases of absence of application, and the consequence of this failure is their inability to meet NC.

## **THE CAUSALIST CONCEPTION OF CONCEPTUAL ABILITIES**<sup>5</sup>

The way in which competence regarding a specific concept X can be defined in causal terms is the following:

<sup>3</sup>For a full list of necessary conditions for the possession of conceptual abilities, see Camp (2009) and Scotto (2010).

<sup>4</sup>Of course there are cases like (ii) in which we say that John should have been adding and *ante* that there is then an *error of performance*. But in such cases, what we mean is that *he should have known the concept*: the problem resides precisely in the absence of application of that concept and not a misapplication of it.

<sup>5</sup>Forbes (1984), Ginet (1992) and Fodor (1998) are some of the advocates of this approach, although it is much more broadly accepted.

John is competent with respect to concept X iff, given certain conditions C, John is disposed to apply X to y iff X(y) is true<sup>6</sup>.

In this framework, conceptual mistakes are modeled in terms of the failure of a mechanism: conditions C are not given. The reason for this failure might be internal to the mechanism, that is, that the mechanism is malfunctioning or it might be the absence of one of the enabling conditions required for the mechanism to work.
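The structure of this definition, and the reason mistakes can only show up as failures of C, can be made explicit in a short formal sketch. It rests on two simplifying assumptions that are not made in the text: "given certain conditions C" is read as a material conditional, and "John is disposed to apply X to y" is written as a hypothetical predicate $D_J(X, y)$.

$$
\mathrm{Comp}_J(X) \iff \Big( C \rightarrow \forall y \, \big( D_J(X, y) \leftrightarrow X(y) \big) \Big)
$$

On this reading, a misapplication, i.e. a disposition to apply X to some $a$ that is not X, is compatible with competence only if conditions C fail:

$$
\mathrm{Comp}_J(X),\; D_J(X, a),\; \neg X(a) \;\vdash\; \neg C
$$

The step is elementary: $D_J(X, a) \wedge \neg X(a)$ falsifies the biconditional for $y = a$, so the universally quantified consequent is false, and modus tollens on the conditional yields $\neg C$.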

I claim that when assuming such a way of understanding conceptual competence, there is no non-question-begging way of distinguishing between conceptual mistakes and absence of application.

It is important to bear in mind that if John's mistakes could be accounted for equally as conceptual mistakes according to a concept or as a case of lack of application, there would be no way to account for the capacity to make conceptual mistakes. Say John says "5 + 6 = 12." We would be immediately inclined to think he was adding and adding wrongly. But he could equally be performing a different operation, say, +∗, and doing it correctly. If an account of conceptual capacities could not distinguish between both cases, it would fail to explain what it is for John to have any conceptual ability and to distinguish this from the case where this ability is merely absent.
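The point can be illustrated with a toy Python sketch. It is only an illustration, not part of the argument's own apparatus: the deviant rule `plus_star` (standing in for +∗) and John's record of answers are hypothetical. The sketch shows that one finite record of answers fits both rules, and that whether the answer to 5 + 6 counts as a mistake depends entirely on which rule is assumed in advance.

```python
# Toy illustration (hypothetical): one finite record of answers is
# compatible with two different rules, so the record alone does not
# settle which answers are mistakes.


def plus(x: int, y: int) -> int:
    """Ordinary addition."""
    return x + y


def plus_star(x: int, y: int) -> int:
    """Hypothetical deviant operation '+*': agrees with addition
    everywhere except on the pair (5, 6), where it yields 12."""
    if (x, y) == (5, 6):
        return 12
    return x + y


# John's observed answers (a hypothetical record, not data).
record = {(2, 3): 5, (4, 4): 8, (5, 6): 12}


def mistakes(rule, observations: dict) -> list:
    """List the inputs on which an observed answer departs from the rule."""
    return [pair for pair, answer in observations.items() if rule(*pair) != answer]


print(mistakes(plus, record))       # → [(5, 6)]
print(mistakes(plus_star, record))  # → []
```

Read as addition, John made one mistake; read as +∗, he made none. Choosing between the two readings requires already stipulating which concept John is applying, which is the circularity pressed against the causalist model below.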

The causalist model fails to provide a plausible distinction between conceptual mistakes and absence of application at least for two reasons:

The first reason is that, according to this model, the subject's reactions/dispositions to apply concepts can be described in terms of different concepts. So in this model it is not possible to distinguish between cases of conceptual mistakes and cases of lack of application. As Boghossian (1989)<sup>7</sup> famously pointed out, the same reactions can be described using different concepts. This further requires the model to distinguish different responses as appropriate or not in specific contexts, and in order to identify the proper set of responses we need to distinguish the good cases from the bad ones, conceiving the latter as cases in which conditions C fail; in the example at issue, conditions C would include John's cognitive mechanisms working fine, including normal functioning of attention, memory, etc. The problem is that we can only distinguish the two cases by using the concept we want to reconstruct, stipulating which concept is in question, for example, stipulating that when John says that "5 + 6 = 12," he is using the concept of addition. But this means that we have to presuppose its content without accounting for it in terms of reactions, opening an explanatory gap. Importantly, there is no distinction between absence of application and misapplication that does not depend on stipulating the concept at issue and thus presupposing the pertinence of that very distinction. It is important to bear in mind that this problem arises independently of whether the account takes these processes to occur at the subpersonal level or at the personal one. In either case, there is no non-question-begging way of establishing that the behavior accords with one concept, and thus is a case of conceptual mistake and not mere absence of application of that concept<sup>8</sup>. Thus, the proposal fails to meet NC<sup>9</sup>.

The second reason why this view fails to make the distinction between misapplication and absence of application is that it does not give a proper account of self-correction. According to this kind of theory, the source of error is a failure in conditions C, but this kind of error is independent of the subject's being able to identify it in practice. The mistakes are of such a nature that the subject may be unable to identify them (direct access to them could even be impossible for the subject) and to modify his use of concepts according to the identification of error and its sources.

In fact, conditions C are not conceptually linked to the concepts the subject is applying or trying to learn. But self-correction seems to be a key ability in accounting for the process of learning new conceptual contents through training. Can this theory account for the connection between the identification of mistakes and the conceptual abilities that seem constitutive of the process of learning conceptual contents and the linguistic terms associated with them? As shown before, it cannot. This amounts to a failure to meet NC, since there is an explanatory/evolutionary gap concerning how new concepts are learnt; from this perspective, the fact that concept users are able to apply concepts correctly and to correct themselves when mistaken seems to be a complete mystery.

However, someone may hold that there are second-order dispositions to evaluate reactions (corresponding to the component (b) of self-correction described above). The idea would then be that by positing them it is possible to account for self-correction and still defend a purely dispositional account of conceptual competence<sup>10</sup>.

But a similar problem arises: if those (second-order) dispositions were fallible and learnt, they would require dispositions of a higher order to be learnt. This involves a vicious regress. If, on the contrary, those dispositions are not fallible and learnt, they are some kind of *sui generis* dispositions. This leaves their nature unexplained: are they to be conceived in causal terms? It seems that they must not be, in order to avoid the previous difficulties, but then another notion of conceptual ability must do the work here. This leads to an explanatory gap. Thus, the theory fails to meet NC (2), since it cannot explain the learning and acquiring of conceptual contents in a naturalist way (it fails by opening an explanatory gap when introducing the *sui generis* dispositions involved in self-correction). It also fails to meet NC (1), since its inability to account for self-correction shows a corresponding failure to draw crucial distinctions between the capabilities of artifacts and other sorts of entities, some of them capable of self-correcting in ways that others are not. There is, according to this model, only one basic kind of mechanism that explains all of these *prima facie* different phenomena. But then the proposal fails to explain the nature and complexity of different abilities in terms of more basic or previous ones, and so fails to draw the relevant distinctions between abilities and capabilities of different complexity on a natural and gradual scale<sup>11</sup>.

<sup>6</sup>Conditions C will vary according to the kind of concept. They may, for instance, include normality in the subject's cognitive functions as well as proper external conditions; for example, were the concept a perceptual one, proper conditions of illumination would be included, as well as the proper functioning of the visual system.

<sup>7</sup>Kripke (1982) and Wright (1989) have also argued for the same conclusion. The main claim, as we will see, is that the causalist way of specifying conceptual competence is circular, insofar as it presupposes the very concept that it is supposed to be specifying through the identification of the relevant dispositions. For a discussion of this, see Satne (2005, chapter 3).

<sup>8</sup>One might think that I am presupposing that self-correction as I define it is a personal-level concept and thus unable to challenge subpersonal accounts of conceptual abilities. On the contrary, the definition is neutral with respect to this. I thank one of the anonymous referees for pressing this point.

<sup>9</sup>Fodor (1990a) specifies the concept in question in terms of higher-order relations of asymmetrical dependency between causal relations of this sort. But the problem reappears in a slightly different form: postulating asymmetrical relations between causal relations in the absence of a naturalistic explanation of why those relations should hold merely restates the problem at issue (Hutto, 1999, 2009, pp. 47–48, p. 22; Cummins, 1989).

<sup>10</sup>Again, the account could sensibly hold that this mechanism is to be understood as operative at a subpersonal level.

# **THE INTERPRETATIONIST ACCOUNT OF CONCEPTUAL ABILITIES**<sup>12</sup>

I have presented three dimensions that are involved in self-correction:


If causalism thinks of level (b) by analogy with (a) and fails to account for (c), interpretationism stresses level (b).

Briefly sketched, according to this model to be a conceptual creature is to be a language user. Both notions are accounted for in terms of interpretation: to be a conceptual creature is to be able to interpret other creatures' actions as meaningful. The interpretation of language is just a part of the global task of attributing meaning to other creatures' behavior. To interpret someone is to attribute meaning to their conduct conceiving it as oriented by wishes and beliefs in the context of a common perceived world. In sum, to interpret someone is to implicitly construct a theory about the content of their beliefs, wishes and the like, in the context of a world where both the interpreter and the interpretee are commonly situated.

The emphasis in this view lies then on component (b), the evaluation of the actions of a subject according to concepts. Accordingly, the model defines conceptual competence as follows:

John is competent with respect to a concept X iff John applies X to y only when the interpreter would apply X to y, or y is such that the interpreter would have applied X to it, had his beliefs been slightly different in a way that matches John's (assuming that the attribution of the belief that y is X to John respects principles of rationality, charity, humanity and causality regarding the interpretation of John's behavior as a whole)<sup>13</sup>.

The attribution of error – in the sense of conceptual mistakes – is captured as a difference between the perspective of the interpreter and the perspective of the interpretee regarding a particular case of application. This may happen in a number of ways. It might be the case that the subject makes a perceptual judgment about something that is openly accessible to both the interpreter and the speaker, or it might be that the claim involves a judgment that is not immediately connected to the perceptual evidence commonly available to both speaker and interpreter. Both cases are structurally similar according to this theory, even if they are distinct in terms of the role that each kind of judgment plays in the interpreter's construction of an ongoing understanding of the speaker's discourse. While the former constitutes the beginning of the interpretational process, the latter depends on previous judgments concerning what the speaker is taken to believe, intend and desire.

The structural similarity resides in the fact that, for the interpreter to be able to interpret the speaker's judgment, she would have to assume, so as to optimize agreement, that the speaker shares with her a vast majority of true beliefs. Against the background of a general theory about what the speaker is trying to convey at that particular moment, the interpreter can then attribute local mistakes to what is asserted. The difference between the two cases is then that, in order to make sense of what is being asserted, the interpreter would start by attributing to the speaker that he is related to the same environment as she is and, by that token, that he perceives that environment and holds true the same beliefs about it that she herself holds. It is only with specific evidence to the contrary that the interpreter will withdraw this particular attribution and then attribute to the speaker an error of judgment regarding what both are commonly perceiving. Error will then be explained as a matter of difference between what the interpreter takes to be the case and what she can make sense of the speaker trying to convey, taking into account all the other evidence she has about his beliefs, desires, and the like. The cost of attributing error to commonly held judgments is so high that rationality constraints on the interpretation dictate attributing a difference between her perspective and the speaker's regarding some other judgment. This is all left in the hands of the interpreter, who can then make sense of the behavior in different ways, all compatible with the evidence. The rule is always to attribute the least possible error, which is just the content of the principle of charity that governs interpretation.
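As a rough illustration only, the charity rule just described can be caricatured in Python as choosing, among candidate readings of a speaker's sentences, the one that attributes the fewest false beliefs. The sketch assumes, for simplicity, that beliefs are finite sets of sentences and that a "reading" is a lookup table; the sentences and the two readings are hypothetical and come neither from Davidson nor from the argument above.

```python
# Toy sketch (hypothetical): the principle of charity modeled as choosing
# the reading of a speaker's sentences that attributes the fewest false beliefs.

# What the interpreter herself takes to be true about the shared scene.
interpreter_beliefs = {"the cup is red", "the door is open"}

# Sentences the speaker holds true, in the speaker's own words.
speaker_sentences = ["la tasse est rouge", "la porte est ouverte"]

# Two candidate readings of those sentences (hypothetical translation manuals).
candidate_readings = {
    "rouge = red": {
        "la tasse est rouge": "the cup is red",
        "la porte est ouverte": "the door is open",
    },
    "rouge = green": {
        "la tasse est rouge": "the cup is green",
        "la porte est ouverte": "the door is open",
    },
}


def attributed_errors(reading: dict) -> int:
    """Count the speaker's sentences that come out false under a reading."""
    return sum(1 for s in speaker_sentences if reading[s] not in interpreter_beliefs)


def most_charitable(readings: dict) -> str:
    """Return the name of the reading that attributes the fewest errors.
    Ties are broken by insertion order, which the toy model leaves arbitrary."""
    return min(readings, key=lambda name: attributed_errors(readings[name]))


print(most_charitable(candidate_readings))  # → rouge = red
```

In the toy model, as in the text, whether a given sentence counts as an error is settled only relative to the reading the interpreter selects.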

This model turns out to be problematic when trying to distinguish between conceptual mistakes and absence of application – and hence to account for conceptual abilities. There are at least three difficulties worth mentioning:

(1) Following the principles of interpretation, the conduct of the interpretee can be described either way, as a case of misapplication of a particular concept or as a case of absence of application. The concept of error is just a tool for interpreting another person's behavior, an attribution that can be canceled by a better interpretation. Hence, this theoretical reconstruction does not distinguish between conceptual mistakes and absence of application.

<sup>11</sup>Another relevant candidate to account for the normativity of conceptual abilities is teleosemantics, a model that appeals to the notion of biological function and the evolutionary history of organisms to explain representational content. I will not consider this proposal in detail in this paper. The main reason is that, as Fodor (1990b) has argued, biological function is not sufficient for intensionality: we can explain the behavior at issue according to one concept or another as long as they are co-extensional in the relevant *de facto* situations. In the present context this would amount to a failure to distinguish between conceptual mistakes according to a concept and absence of application of that concept. For a detailed treatment of teleosemantics and the problems it raises for explaining conceptual content, see Hutto and Satne (2014), where I argue that a story of that sort is part of the explanation of the relevant capacities but not yet sufficient to account for the normativity of conceptual content.

<sup>12</sup>Davidson (1975, 1982, 1984, 1986, 1992, 1994, 2001, 2005), Stalnaker (1984), Dennett (1991), and Brandom (1994) are some of the main advocates of this approach. Further specifications are required to distinguish their positions. I can dispense with introducing such distinctions here, since nothing especially important for the arguments presented in this section follows from drawing them.

<sup>13</sup>I will be following mainly Davidson's presentation of the central traits of the theory, although a similar case, with corresponding adjustments, can be made for Dennett's, Stalnaker's, and Brandom's accounts.


In sum, the model fails to meet NC (1), since it cannot explain the learning of conceptual abilities as a gradual process. This implies an explanatory gap regarding the acquisition of language, in particular the acquisition of the concept of error to be attributed to oneself and others. For these reasons, the model cannot account for continuity in nature either, i.e., for the way in which complex abilities of some natural entities emerge through gradual changes and combinations of more basic capabilities exhibited by other natural entities, and this is a failure to meet NC (2). This also means that this kind of theory cannot explain our attribution of thought to animals and children: such attributions would at most be mere "ways of talking"<sup>14</sup>, not justified in terms of the abilities exhibited by the behavior of such agents, i.e., the theory cannot meet NC (3). This leaves unexplained the nature of their capacities and the connection between their ways in the world and ours.

# **MY STRATEGY TO MEET NC: CONCEPTUAL MISTAKE AND STANDARDS OF CORRECTION**

The above considerations have shown that both causalist and interpretationist accounts fail to account for component (b) of self-correction, i.e., the ability to evaluate the performance (a). Thus, in order to overcome their difficulties, we need to offer an explanation of level (b) of self-correction that, first, (i) is not reduced to mere causal reactions, as in causalist models; the strategy is to include an evaluative component that is not conceived in terms of level (a). Second, the account of (b) must (ii) not presuppose articulated contentful thought, as interpretationist accounts do. Finally, as in the previous cases, the account of (b) needs to (iii) have the relevant consequences for (c).

Before presenting my strategy, there are some distinctions and clarifications worth making. The aim of giving an account of conceptual competence is a highly ambitious one, and there are of course a number of different proposals, all of which would deserve to be taken seriously when analyzing what the correct answer to NC might be. One issue of particular relevance in this domain is the distinction between conceptual and non-conceptual content. As is well known, many current theories of conceptual competence attempt to address what I am calling NC precisely by drawing that distinction. Nevertheless, in this paper I neither address this specific topic nor explore alternative attempts to bridge the gap between the conceptual and non-conceptual domains<sup>15</sup>. I can dispense with doing so, since what I am arguing for is neutral with respect to those further worries. It should be noted that my claim is not that all cognition should be conceptual, but rather that, to account for conceptual abilities while meeting NC, the account needs to meet the normativity constraint. So my point is the following: no matter where you draw the line between the conceptual and the non-conceptual, meeting NC requires giving an account of some sort of basic cognition that cannot be reduced to mere dispositions but that, at the same time, can be accounted for in terms that do not presuppose the grasping of propositional fine-grained thoughts.

My proposal is to think of this more basic competence as a normative one and to model the minimal conceptual ability at issue as an ability to respond to standards of correct behavior in a way that suffices to distinguish between cases of absence of application and cases of misapplication of the standard<sup>16</sup>. The proposal is then to describe that behavior as responding to specific standards of correction (hence being assessable as right or wrong according to those standards). Such an account must conceive of conceptual abilities in terms of more than mere causal mechanisms without thereby committing to an explanatory gap concerning the emergence of propositional, fine-grained, articulated thought.

We can now define more precisely our question concerning the possibility of accounting for the normative constraint on conceptual abilities while accommodating NC, in the following terms: what features must a behavior have in order to count as conduct that is sensitive to correctness patterns (unlike a behavior describable in merely dispositional terms), without thereby committing us to explaining it as dependent on propositionally articulated thought, which would lead to an evolutionary and explanatory gap?

Surprising as it might appear at first glance, I suggest that the crucial move to answer this question is to focus our attention on the kinds of interactions that basic intelligent creatures are able to deploy. This move is not completely novel in the literature. It was perhaps Dewey (1929) who first emphasized that second-personal interaction is key to the learning of language, and this is a tradition that one can find exemplified in the later Wittgenstein as well as in Davidson's and Brandom's writings<sup>17</sup>. The crucial point to get clear about, though, is what *kind* of interaction we are referring to. In particular, we need to specify what features of the behavior at stake, if any, (1) display sensitivity to standards of correction and (2) are both basic and at the same time sophisticated enough to meet NC.

<sup>14</sup>For a proposal exactly along these lines, see Hutto (2008).

<sup>15</sup>For an overview of the main views that endorse non-conceptual content and discussions thereof, see York (2003).

<sup>16</sup>Some may think that responding to a specific standard of correction should not be classified as conceptual behavior but as representational, and that we should reserve the term "conceptual" for propositionally articulated thought and behavior. At this point, this is perhaps merely a terminological issue. For a proposal along those lines, see Schmitz (2012, 2013).

A further constraint on a proposal of this sort is that it accommodate the available empirical evidence concerning language and concept acquisition. A first step could then be to take a look at the available evidence concerning language acquisition. The empirical study of the way in which such abilities are learned and deployed may help us identify the nature of the capacities involved. Furthermore, it is obvious from an empirical point of view – or at least denying it would be highly implausible – that small children do not have fine-grained articulated thought from the start, so the study of children's development should exhibit the possibility of acquiring the capacity to grasp propositionally articulated thoughts starting from the earlier nonpropositional capacities that characterize the child's first stages of development.

I propose that a natural candidate for the right kind of behavior, capable of accommodating the normative constraint, is what I call *sensitivity to correction*, that is, the disposition to modify one's behavior in the light of salient assessments of others with whom one is interacting. This claim still needs support on empirical as well as conceptual grounds, and I try to provide such support in the remaining sections of this paper. Available evidence from developmental psychology will also provide some interesting cases of how this second-personal interaction can be conceived. Hence, in looking at the empirical evidence, I expect to back up both my claim that a middle path between dispositionalism and interpretationism is in order and my claim that such a middle path is to be thought of in terms of a second-personal kind of interaction.

# **EXAMINING THE EMPIRICAL EVIDENCE FROM DEVELOPMENTAL PSYCHOLOGY**

As I said, one natural place to look for an answer to this question, framed with NC in mind, is the way children learn concepts.

Csibra and Gergely (2009) have argued that adult–child interaction is essential to the learning of conceptual content. They have conducted a number of experiments suggesting that there is a crucial difference in the subsequent behavior of infants depending on whether they have learnt merely by observation – when the children are just observing the behavior of adults – or through being explicitly taught – i.e., when there is explicit demonstrative reference, through the use of language, to the objects the concepts apply to, in a context in which the child is addressed. What they noted is that only in the latter case do children generalize the result to all similar cases, while in the former they conceive of the case as contextually and situationally bound. This provides us with a first indication that interaction plays a crucial role in learning and displaying conceptual abilities, as opposed to other kinds of learning, where no language is involved.

A second indication that the sort of interaction humans are capable of might be key to the development of their conceptual abilities comes from primatology. Tomasello (1999) and Tennie et al. (2010) have claimed that chimpanzees are capable of emulating behavior but not of abstracting this conduct from the situationally bound contexts in which they first perceive it. This means that while they are capable of imitating the use of tools in performing a specific task governed by their own interests and goals, they grasp neither the general meaning of the object nor the end displayed in the behavior in a way that can be detached from the context and the objects they are observing and using on that specific occasion. This fits well with Csibra and Gergely's (2009) studies suggesting that the interactive aspect of learning in humans involves a capacity to grasp the general, rulelike content of linguistic terms and behavior in a way that is not available to other creatures, and that this specific learning of general meanings takes place through particular training instances in the context of adult–child interactions, and is not possible for children isolated from those interactions or for nonhuman primates, who are not capable of those sorts of interactions (ibid.)<sup>18</sup>.

Furthermore, Tomasello and Rakoczy (2003) and Schmidt and Tomasello (2012) have studied the conduct of children regarding the enforcement of norms, and they observed that at two years of age children not only assess their behavior according to norms, accompanying what they do with statements of the sort "this is what we do" or "this is how it is done," but also teach others (puppets, but also adults whom they identify as outsiders to the community) and complain when others do not conform to what they understand the social norm to dictate in that particular situation. This means that children are ready to understand normative standards of behavior and to teach them to others at a very early stage in the development of their conceptual capacities, and that they generalize the appropriateness of what they tend to do to all others with whom they are interacting, expecting them to act as they do and complaining if they refuse to do so.

How, then, can this help us address NC, given that such behavior is exhibited by young children but not by other primates?

As I said before, there are a number of philosophical theories that have focused on the nature of human intersubjective exchanges to account for our capacity to grasp linguistic meanings. Haugeland (1990) and Brandom (1994), for example, have suggested that it is our attitude of treating a performance as right or wrong in particular contexts that makes that conduct right or wrong, and that this is a socially structured practice in which we treat each other as committed and entitled (or not) to further actions, as if we were playing a social game whose rules get specified by our treating the different moves as appropriate or not. Wittgenstein (1953) has also been read as defending a view according to which language should be thought of as a cluster of games that we play together, and according to which it is internal to those games that certain moves are allowed or forbidden. The moves would then be correct or incorrect according to the game in the context of which they are assessed. Nevertheless, these theories are problematic if, as in Brandom's theory, the moves of the game are thought to be propositionally articulated, or if they imply interpretational stances on the part of the participants, as interpretationist accounts do. As I have argued before, such positions, if taken to be the whole story, turn out to be unable to meet NC. So I suggest that the right place to look is not the domain of interpretational theory but rather a different kind of interactionism, in particular phenomenologically based interactionist theories<sup>19</sup>.

<sup>17</sup>Wittgenstein (1953), Davidson (1984, 2001), and Brandom (1994). Also Hutto and Myin (2013).

<sup>18</sup>Csibra and Gergely (2009) have called this specific aspect of the way human beings teach and learn from each other "natural pedagogy." Tomasello (1999, 2014) argues that nonhuman primates are incapable of engaging in joint action with other primates or humans because they lack the ability to form intentions about other individuals' intentions. Here I am committing neither to the particular explanation Csibra and Gergely (2009) give of the abilities on which these sorts of interaction are based nor to Tomasello's explanation; in both cases, highly sophisticated Theory of Mind abilities seem to be required. Regardless of their explanations, the evidence points toward a key role for interaction in the ability to learn and apply conceptual contents. With a view to meeting NC, I provide a different and less demanding understanding of what is at issue in interaction that accounts for these differences.

Such theories start from one basic insight about the nature of social cognition: the fact that, from our very first encounters with others, we are able to understand directly and correctly the emotions on others' faces and to perceive their behavior as intentional and goal-oriented. This has been called "primary intersubjectivity." It involves a kind of recognition of others that is displayed by newborns and that is characterized precisely by involving neither any kind of inferential cognitive mechanism nor any mediation through articulated thoughts, such as attributing states to others. That notwithstanding, it involves more than mere reactions to stimuli. More precisely, it involves grasping the meaning of the other person's reactions. As Scheler famously described it: "that experiences occur there [in the other person] is given for us in expressive phenomena – [...] not by inference, but directly, as a sort of primary 'perception.' It is in the blush that we perceive shame, in the laughter joy" (Scheler, 1954, p. 10).

Phenomenology, then, provides us with a different route to understanding the empirical findings of developmental psychology on the nature of normative behavior. It allows us to understand in what sense we are able to grasp the rightness or wrongness of what we are doing without committing us to thinking of this in a propositionally loaded way. According to these theories, based both on early developmental psychology studies and on a phenomenological explanation of them, there is, from the very beginning of our lives, a way of tuning in to the other person's emotions, and it is that tuning, we might think, that first teaches us the distinction between right and wrong, good and bad, this way and not-this-way.

Having taken a brief look at some recent works in phenomenology and developmental psychology, we have found converging support for the need to abandon the third-person perspective characteristic of interpretationism, but also the confinement within the first-person perspective characteristic of causalism. Such works suggest the value of prioritizing interlocutors' interactions in face-to-face encounters, in which the recognition of the emotions of others might play a key role in our entry into language. It is in this domain, I argue, that we find the kind of behavior that allows us to distinguish between conceptual mistakes and absence of application in a way that does not yet imply the reflective and explicit grasping of the standard to which we are nevertheless responding. In particular, I argue that it is our emotional response to the attitudes of approval and disapproval expressed in our interlocutors' emotional behavior that allows us to learn from others a language and the criteria of correct use for words in contexts of use. Thus, this responsive behavior constitutes a kind of minimal conceptual competence vis-à-vis naturalist and normative constraints. How this allows us to accommodate the normative constraint while at the same time answering to NC will be the topic of the next and final section.

## **INTERACTION AND SENSITIVITY TO CORRECTION**

As I have claimed, if the problems of interpretationism and causalism are taken seriously, what we need to find is a form of behavior that is not reducible to causal reactions but does not presuppose the ability to entertain articulated thoughts. Furthermore, I have shown that, taking into consideration the evidence from developmental psychology regarding the learning of language and norms, the right kind of behavior seems to be essentially interactive.

Advocates of the phenomenologically based interactionist theory usually draw a distinction between two different kinds of intersubjectivity that characterize capacities displayed at different stages of the child's development. First, primary intersubjectivity (found from birth) is constituted by the ability to recognize emotions and reactions in other people's faces in face-to-face encounters, without the use of any theoretical tool. It is a primary capability: not acquired but innate. The conduct of others is recognized as intentional, as directed toward an end. It involves temporal, auditory, and visual coordination with someone else with whom the baby is interacting. It is not replaced by other types of interaction but coexists with them, as a precondition for other abilities and as a complement to them. Later on<sup>20</sup>, children engage in secondary intersubjectivity, a kind of interaction characterized by the ability to identify objects and events in pragmatically meaningful contexts by means of shared attention mechanisms (based on the abilities gained through engaging in the previous kind of intersubjectivity). At this stage, children refer to the adult's gaze when the meaning of an object is ambiguous or unclear. It is in the context of this kind of engagement with others that children learn a natural language, by being taught and exposed to it in all sorts of interactions<sup>21</sup>.

<sup>19</sup>Trevarthen (1978, 1979), Hobson (2002), Reddy (2008), and Rochat (2012) have defended and developed this theory from a psychological point of view. Gallagher (2001, 2004, 2007), Gallagher and Hutto (2008), and Gallagher and Zahavi (2008) have provided reasons in favor of it from a philosophical one.

<sup>20</sup>There is some debate among advocates of the interactionist theory about when exactly this happens, with estimates ranging from 6 to 18 months of age depending on the author.

<sup>21</sup>Gallagher and Hutto (2008) have claimed that narratives play a crucial role in the way in which children learn different perspectives and build a conception of themselves and of others that is enriched *vis-à-vis* the primary and emotional sort of engagement characteristic of the initial encounters with others. Even if this is so, a prior question to be asked, following our previous considerations, is how children learn to respond to concepts as standards for assessing their own conduct.

My suggestion is that the right place to look for the ability of self-correction is in the context of the capability of engaging in primary intersubjectivity<sup>22</sup>. It is in that domain that children display a disposition to respond to others, characterized by an attunement to their expectations and an ability to shape their behavior as a way of responding to and satisfying the demands of others, paying special attention to the kind of response that their behavior elicits in the adult. This kind of exchange is possible through common engagement in face-to-face encounters where the emotions of each are directly perceptible to the other. The common contexts in which those interactions take place include objects and their properties, which, as the interaction evolves and the responses become more stable, begin to be understood as independently standing qualities and objects. Throughout this process, joint attention mechanisms, among other capacities, come into play and help to develop an early conceptual understanding and a primitive form of using concepts that will later become much more sophisticated, gaining independence from particular assessments and responses. Nevertheless, concepts will never lose their connection with actual uses and assessments of others.

How, then, can we distinguish between conceptual mistakes and absence of application at this early stage of development? In the previous section, I examined some relevant work in developmental psychology on the nature of normative behavior and learning. Those studies suggest that interactions are key in that they elicit and display normatively informed behavior, which is exhibited in the way children respond to adults in learning through two basic attitudes: *generalizing* (what they take to be *correct*) and *enforcing* the norm on others (actively correcting each other, showing that they are not only passively responding to the environment but spontaneously conceiving of what they are doing as a *standard of correction* to which they themselves and all others are *supposed to conform*). Accordingly, in the context of the kind of interaction just described, I suggest there is a specific ability that constitutes a better candidate than mere reactions or articulated thought to meet NC. I call this ability *sensitivity to correction*. It can be defined as the disposition to modify one's own behavior regarding the application of a specific concept in the light of the consent and dissent of others with whom one is interacting in face-to-face encounters. Sensitivity to correction so defined is precisely the feature of human behavior that allows us to accommodate the normativity constraint without abandoning the naturalistic conditions of adequacy that constitute NC.

When characterizing the different levels involved in self-correction (a pervasive feature of normative behavior), I mentioned: (a) the application of concepts (the actions of applying or misapplying a concept), (b) the ability to evaluate (a), and (c) the modification of (a) according to the results of (b). Both causalist and interpretationist accounts of conceptual capacities fail to provide a consistent account of the difference between conceptual mistake and absence of application, because each overemphasizes one of the elements: (a) as a model for (b) in the case of causalism, and (b) as the all-encompassing interpreter's perspective in the case of interpretationism. My proposal, on the contrary, is to think of level (b) as constituted by *sensitivity to correction*, that is, the ability to correct and monitor our own action in the light of the reactions of others toward those very actions<sup>23</sup>. In this case, (a) corresponds to a kind of behavior that displays intentionality, being directed toward an object to which the behavior is responding, and (b) corresponds to the dimension in which we self-monitor our reaction to the object by tuning it to the way others react to us and to our directed behavior. Sensitivity to correction is a social disposition, that is, a disposition to tune our behavior to the assessments and normative feedback we get from others in particular interactions. It is, then, an evaluative attitude that involves perceiving, and attuning to, the approval or disapproval of others. Finally, corresponding to (c), the way in which we apply concepts is of course modified through the assessments involved in (b): actually, we may say, assessing our conduct amounts (at least in the earliest stages of the acquisition of language and conceptual abilities) to modifying it according to the approval or disapproval of others.

We may now characterize the difference between conceptual mistakes and absence of application within the framework I have just presented. This distinction will take different shapes along the different stages involved in learning and grasping concepts. It will first consist in the ability to correct ourselves by tuning in to the other person's assessments (monitoring myself through you, trying to make my own the perspective of the other with whom the interaction is taking place). It is a self-monitoring mechanism based upon the convergence of joint attention mechanisms that identify what is salient in the context and the other's monitoring of my own performance: the individual monitors her conduct, taking into account both what she is directed to (level a) and assessing it in accordance with the assessment of others (level b), and then modifies the behavior accordingly (level c). It is precisely through responding to the other's gaze and his attitudes of approval or disapproval that a criterion for the application of a concept in practice can be thought to be in place, as a standard of correction, hence distinguishing the case at stake from one in which the concept is not relevant at all, a case of absence of application. The concept in question would be poor in content at this point and its boundaries blurry. Thus, conceptual competence at this stage is understood as a minimal conceptual understanding; but that minimum is exhibited precisely by the fact that the behavior is sensitive to a distinction between right and wrong ways of acting according to specific standards of correction (concepts), and this in turn is equivalent to there being a right way of acting in the world that the other and I share. Sensitivity to correction is, we may say, the phenomenological exhibition of the normativity of concepts. We can thus distinguish conceptual mistakes from cases of absence of application in that the subject responds to the assessment of his behavior by modifying it accordingly, as would not be the case if it were a case of absence of application. So what makes the crucial difference is sensitivity to correction, a sensitivity that is displayed in actual interactions. Now, as learning progresses, self-correction gains independence from the presence of actual assessors, and the subject corrects herself according to different actual or imagined scenarios and perspectives that she can reenact. Sociability is still a pervasive and crucial element of self-correcting behavior, but it is now exhibited as the very idea that I can be wrong according to different standards (which equates to the idea that there are other perspectives)<sup>24</sup>.

<sup>22</sup>Varga and Gallagher (2012) have claimed that the notion of recognition as an interpersonal demand, which occupies a central role in discussions of moral normativity, should be traced back to its primary location in this first, strongly psychologically based kind of interaction with others. I am claiming that this recognitional competence plays a role in conceptual normativity as well.

<sup>23</sup>According to this view, what is directly perceived are emotions, associated with positive and negative reactions toward others' behavior when it is regarded as correct or incorrect. So, by extension, understanding such assessment can be thought of as based on the ability to perceive these positive and negative emotions and to tune to them by changing one's behavior accordingly. The intentionally directed behavior of adults or peers, which is also perceived, will also play a key role in understanding what kind of performance is expected. I am grateful to one of the anonymous reviewers for pressing this point.

Finally, it is time to consider whether the tools just introduced are capable of properly meeting NC when accounting for the normative dimension involved in concept use. I cannot provide in this paper a detailed and all-encompassing answer to NC but, as will be shown next, this proposal can provide a proper general strategy to meet NC. This general strategy consists in identifying sensitivity to correction as the middle step between mere causal responses to the environment and contentful propositional attitudes. While the latter imply complete independence, flexibility, detachability, and general inferential articulation, the former only amount to nomological covariances between states and objects that may fail given an open number of contextual variations. The important point is that between these two ends of the invisible line of development and evolution there are also various intermediate stages.

Following this strategy, we can then give a general outline of the evolutionary path from creatures without language or thought to creatures with both abilities. At a first, very elementary level there may only be reactions to stimuli, error being just a failure in causal mechanisms. The true normative dimension emerges precisely when sensitivity to correction comes into play, displaying the ability to interact with others (of the same or of other species) in a primary-interaction sort of exchange. This hypothesis is supported by the fact, underlined by many evolutionary theories (Tomasello, 1999, 2014; Tomasello and Rakoczy, 2003), that the main evolutionary step that distinguishes humans from other species is the ability to engage in social interactions of a highly sophisticated nature. Accordingly, at this stage subjects are capable of applying concepts independently of stimuli and of applying the same concept to different objects and different concepts to the same object<sup>25</sup>, ultimately gaining the capacity to associate language items with meanings (norms of use of sounds and marks). Thus, the well-acknowledged idea of sociality as the trait characteristic of the emergence of the human<sup>26</sup>, when understood in terms of sensitivity to correction, can also explain the emergence of normative behaviors without any explanatory gap. The possibility of interpreting others and ourselves explicitly as following or failing to follow certain norms or rules, an ability that already involves propositionally articulated thoughts, is to be gained by engaging in earlier forms of sociality<sup>27</sup>.

A similar point can be made regarding ontogenesis, where practical engagements with others in face-to-face encounters (primary intersubjectivity) that display a primitive form of sensitivity to correction progressively lead to secondary intersubjectivity, a form of interaction involving shared attention mechanisms, monitoring and correcting, in the context of which language is learned. Learning is a process through which the child eventually becomes a competent user. At first she may need guidance and self-correct mainly when assessed negatively; later on, she will herself reproduce this correcting behavior, thereby generalizing what she has learned and gaining autonomy in assessing her own behavior. Once again, the third-personal interpretative stance can only enter the picture much later, once full inferential capacity and the capability for complex interpretation processes are in place.

# **CONCLUDING REMARKS**

I have claimed that two of the most popular theories of conceptual competence fail when considered against the background of both the NC, i.e., the challenge of accounting for both the ontogeny and phylogeny of conceptual thought without explanatory or evolutionary gaps, and the normative constraint, i.e., the distinction between conduct that is guided by a standard of correction and conduct that can only be externally assessed as responding to concepts.

Following some insights from developmental psychology and phenomenology, I have presented an alternative framework, Interactionist Theory, within which the normativity constraint is accommodated in the domain of actual interactions with others in the learning of language and concepts. My central claim was that sensitivity to correction is a social, evaluative disposition that tunes us to other people's assessments of our behavior in actual interactions and allows us to learn from them standards of correction for our actions. This kind of disposition is what makes the difference, both evolutionarily and in terms of individual development. That human sociality is the main difference between us and other species is widely accepted and has independent grounds in evolutionary studies. If we can make sense of the connection between conceptually informed behavior and social behavior, as we have proposed we can, then this gives indirect support to the idea that this might be the crucial step in the evolutionary story of the human species. As for human learning, I argued that recent studies in developmental psychology suggest that it is precisely our ways of engaging with and understanding others that underlie our capacity to learn from each other the kind of general and abstract meanings that we then deploy in our social lives. The often-underlined social character of human life may find in the idea of sensitivity to correction a further specification capable of illuminating the way in which language and thought emerge.

<sup>24</sup>It is important to note that, contrary to Hutto's (1999) and Davidson's (2001) view, the idea is not that perceiving other perspectives as such gives a normative dimension to what I am doing, but that I first *attune* my behavior to what others *expect* from me, and only later can the difference of perspectives become salient and an object of my own reflection. This last possibility is only present when there is also the capability of grasping explicitly the standards that these other perspectives represent and how they stand to the behavior being assessed.

<sup>25</sup>This is the satisfaction of a simplified version of the Generality Constraint (see Camp, 2009). All these abilities together amount to the acquisition of minimal conceptual capacities (for conditions on minimal conceptuality, see Camp, 2009; Scotto, 2010).

<sup>26</sup>See Sterelny (2012) and Tomasello (2014).

<sup>27</sup>I am distinguishing three paradigmatic and different abilities: (i) causal responses to the environment; (ii) sensitivity to correction in interaction; (iii) entertaining propositionally articulated thoughts. This distinction is schematic and is meant to mark important milestones in development and evolution. But this threefold classification should not be taken to characterize one stage in development as opposed to others. On the contrary, in Interactionist Theory these abilities appear only as paradigmatic of some stages that give rise to the others (and to multiple other intermediate ones in between) by way of progressive complexity. Accordingly, each stage in evolution and development integrates previous stages in different ways, not by replacing them but by complementing them with new abilities.

## **ACKNOWLEDGMENTS**

The research for this paper was supported by the Marie-Curie Initial Training Network, "TESIS: Towards an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828), the National Council for Scientific Research (Argentina) and PICT 02344-2011, National Agency for the Promotion of Science, Argentina.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 07 July 2014; published online: 23 July 2014. Citation: Satne GL (2014) Interaction and self-correction. Front. Psychol. 5:798. doi: 10.3389/fpsyg.2014.00798*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Satne. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Navigating beyond "here & now" affordances—on sensorimotor maturation and "false belief" performance

# *Maria Brincker\**

*Department of Philosophy, University of Massachusetts Boston, Boston, MA, USA*

#### *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

#### *Reviewed by:*

*Leon de Bruin, VU University Amsterdam, Netherlands*

*James M. Dow, Hendrix College, USA*

#### *\*Correspondence:*

*Maria Brincker, Department of Philosophy, University of Massachusetts Boston, 100 Morrissey Blvd., MA 02125, Boston, USA*

*e-mail: mariabrincker@gmail.com*

How and when do we learn to understand other people's perspectives and possibly divergent beliefs? This question has elicited much theoretical and empirical research. A puzzling finding has been that toddlers perform well on so-called implicit false belief (FB) tasks but do not show such capacities on traditional explicit FB tasks. I propose a navigational approach, which offers a hitherto ignored way of making sense of the seemingly contradictory results. The proposal involves a distinction between how we navigate FBs as they relate to (1) our current affordances (here & now navigation) as opposed to (2) presently non-actual relations, where we need to leave our concrete embodied/situated viewpoint (counterfactual navigation). It is proposed that whereas toddlers seem able to understand FBs in their current affordance space, they do not yet possess the resources to navigate in abstraction from such concrete affordances, which explicit FB tests seem to require. It is hypothesized that counterfactual navigation depends on the development of "sensorimotor priors," i.e., statistical expectations of own kinesthetic re-afference, which evidence now suggests matures around age four, consistent with core findings of explicit FB performance.

**Keywords: affordances, false belief test, metacognition, sensorimotor priors, developmental psychology, embodied cognition, theory of mind, social cognition**

# **FALSE BELIEF TESTS AND CONFLICTING EXPLICIT AND IMPLICIT FINDINGS**

The question of how and when children learn to understand other people's beliefs and perspectives has long been an object of empirical study and philosophical debate. Many experimental paradigms use false belief (FB) scenarios to test this development. Typical FB paradigms set up a discrepancy between a test subject's accurate information about a scenario and another agent's divergent perspective, which is then used to probe whether the subject takes the other's false perspective into account. Interestingly, so-called "explicit" and "implicit" categories of FB tests each consistently point to very different minimal ages for the development of FB abilities.

Explicit FB tests can be exemplified by the Sally-Anne task (Baron-Cohen et al., 1985), which involves a story with two protagonists, Sally and Anne. The FB discrepancy is set up as the test subject watches Sally put a marble in location A and leave, whereupon Anne, unbeknownst to Sally, transfers the marble to location B. The experimenter/story-teller then prompts the test subject to anticipate an action by the misled Sally, e.g., "Where do you think Sally *would* look for her toy?" Researchers have overwhelmingly found that typically developing (TD) children do not "pass" this kind of FB test before about age four. Younger children have a strong tendency to indicate the object's current location rather than where Sally left it (Wellman et al., 2001). If one assumes the test tracks abilities to understand "beliefs," then the results indicate that TD children cannot handle others' beliefs before age four. Accordingly, failing to produce the correct answer has been interpreted as revealing either a categorical inability to understand minds (Baron-Cohen, 1995), or performance difficulties with linguistic aspects and/or with prioritizing and using FB information executively over current factual information (Moses et al., 2005; Carruthers, 2013).

Conclusions of relative "mind-blindness" in toddlers have been challenged by various so-called "implicit" experimental paradigms, which show that children seem to track and form FB expectations much earlier. These studies use non-verbal/active-participation FB tasks, relying on looking time (Onishi and Baillargeon, 2005), communicative reference (Southgate et al., 2010) or helping paradigms (Buttelmann et al., 2009; Knudsen and Liszkowski, 2010, 2012). These paradigms do not explicitly ask the child about *expected actions of another agent*, but rather measure whether the child's *own action selections vary* significantly with the true or false belief conditions of observed others. For example, in Buttelmann et al.'s helping paradigm the test subject is invited to open either of two boxes, given a manipulation in which, during the crucial toy transfer, an experimenter is observed either staying in the room (true belief) or leaving (FB). The remarkable finding was that even 18-month-olds varied their box choice significantly with the manipulation of the experimenter's "belief scenario." Thus, the results suggested that toddlers might indeed understand FBs. Looking-time paradigms indicate FB understanding perhaps as early as 9–15 months (Baillargeon et al., 2010; see also Reddy, 2008), but I focus exclusively on the helping paradigm and on the specific puzzle of why 18- to 30-month-old toddlers appropriately vary their behavior according to perceived FBs (and also handle object permanence and pretense play), and yet consistently fail Sally-Anne style FB tests.

In addition to the age disparities between Sally-Anne and helping-paradigm FB findings, the labels implicit/explicit and the core experimental differences they mark are not easily conceptualized, as discussed by Carruthers (2013). Existing theories often distinguish *information types* of FB tasks, i.e., mental, counterfactual, linguistic, conceptual, representational, etc., and theorize the *domain-specific* or *domain-general capacities* needed to handle these kinds of information. Further, debates often center on whether the observed infant/toddler capacities should be interpreted as based on mindreading or on behavioral rules (Knudsen and Liszkowski, 2010; Perner, 2010; Carruthers, 2013).

This perspective article points to an alternative framework for theorizing the Sally-Anne and helping paradigms and the developmental processes underlying the discrepancy in their findings. Given the short format, I do not argue against any existing theory, but rather propose that a sensorimotor maturation-based explanation would expand the existing space of interpretive possibilities, as it models mental processes differently than current categorizations imply.

The core proposal is that the age discrepancy in FB findings might be rooted in sensorimotor maturation processes taking place around age four, as these might ground the relevant cognitive developments documented at this age. It is suggested that the helping and Sally-Anne paradigms might require different kinds of "navigation," which in turn depend on different levels of sensorimotor maturation. More precisely, a distinction is drawn between the ability to navigate presently available "here & now" perceptual space and the ability to navigate an imagined, remembered or otherwise currently counterfactual bodily space. The latter ability to "navigate beyond the here & now" is hypothesized to depend on the late-developing predictability of movement variations, and such abilities are further hypothesized to support successful Sally-Anne FB performance. By contrast, it is proposed that helping-paradigm FB tasks are based on the child's understanding of the "here & now" social affordance space and thus can be navigated without this aspect of sensorimotor maturation. To explain this unexplored possibility and theorize the distinction between here & now and counterfactual navigation, we need to look to neuroscientific evidence for (1) affordance tracking and decision-making, (2) default systems and self-projection, and (3) the maturation of sensorimotor priors.

## **DECISION-MAKING AND NAVIGATION OF "HERE & NOW" AFFORDANCES**

The idea that we track multiple affordances, i.e., perceived action possibilities in the "engageable" space around us, is not new (Gibson, 1977). Recently, the affordance notion has attracted renewed attention, both through theoretical work (Heft, 2003; Gallagher, 2005; Rietveld, 2008; De Jaegher et al., 2010) and through neuroscientific discoveries about cortical fronto-parietal sensorimotor processes and massively parallel, dynamic processing mediated by cortical and subcortical circuits (Cisek and Kalaska, 2010), as well as through advances in, e.g., robotics (Horton et al., 2012), where these kinds of ecological agent-environment relations have begun to replace traditional input-output representational frameworks. Cisek and Kalaska point to findings that sensorimotor processes are engaged early and in parallel, and support not only action execution but also decision-making and action selection among multiple tracked options. They argue that these findings of early and parallel affordance tracking are inconsistent with modular input-output information-processing frameworks. Further, mirror neuron research has shown that we track object affordances as they relate to perceived others, as well as complex and dynamic social affordances between self and other (Casile et al., 2011; Sartori et al., 2012).

Thus, in opposition to classic notions of the mind as entirely hidden or "sandwiched" between action and perception (Hurley, 2001), a relational affordance story lets decision-making processes partially reveal themselves through not only actual but possible engagements with our environment. On a theoretical level such findings complicate our notion of social perception, as we see not only the actual behaviors of others, but their potential and afforded action targets, and how these relate to our own current action in overall shared affordance space. The key is that affordances alert to potential outcomes as they relate to actual objects and agents in the spatial environment.

Carruthers (2013) hypothesizes that FB abilities require a "domain specific mindreading module" (tracking others' goals, beliefs, etc.), domain-general planning and decision-making abilities (own action selection), plus belief attribution processes. But the question is whether we need to postulate a separate "mindreading module" and attribution processes for FB understanding. If sensorimotor processes ground a complex and dynamic tracking of current affordance relations of self and other (Trevarthen, 1979; De Jaegher and Di Paolo, 2007; Gallagher, 2012), then might we not often understand the goals and FBs of interaction partners through this tracking of their actions and affordances, and of how these differ from and dynamically modulate our own?

Fronto-parietal processes have been found to support not only action planning and decision-making but also perception of the affordances of others. Notably, these circuits show complex and dynamic properties, as affordances can be social, visible or hidden (e.g., Umilta et al., 2001). Further, the complexities of our context relations undergo various crucial maturation processes, particularly in the first years of life, as evidenced by "A-not-B error" (Smith and Thelen, 2003) and pretense play studies (Leslie, 1987), among others. For our present purposes we should note that the affordances, and thus the action planning, of 2- to 3-year-old toddlers are not necessarily restricted to what is presently *sensed*, but rather to what offers our spatially situated bodies sensorimotor engagement. Thus, the door behind us might still be tracked as an afforded "escape route" even if currently unseen. Similarly, a pretend banana, which is not actually there, can, as something spatially actable, be part of the shared affordance space and must comply with certain rules of engagement. Research on mirror neuron circuits also indicates that others' actions and affordances are dynamically integrated into this understanding of affordance space (Caggiano et al., 2009; Sartori et al., 2012). The "here & now" affordance space might thus contain counterfactual and prospective teleological *elements* (such as unseen and pretense affordances, or others' falsely maintained affordances) as long as these are placed in relation to embodied agents (Gibson, 1977; Brincker, 2010, 2012). A source of complexity is that the limits of the current affordance space are fluid, but it is an empirical question how far from our current position our skilled, cultured, and tool-enhanced bodies can track their own and shared potentialities (Iriki, 2006).

In sum, the proposal is that we reach boxes, answer questions, and point to hidden marbles via a "here & now" affordance space, and further that FBs of others might, as long as they relate to the space we concretely inhabit, be understood, tracked, and engaged through our affordance space understanding.

Typical helping paradigms allow the child to incorporate relevant past and present perspectives of others into the present scenario without requiring the child to abstract from their embodied relation to it. For example, the 18- and 30-month-olds in the Buttelmann et al. study must integrate contrasting perspectives and "false beliefs" of others within their affordance space. However, they need not let go of their pragmatic relations to this current space to perform the task. Thus, their performance can be interpreted as "here & now" social navigation incorporating FB *content* into their actual body *space*.

In Sally-Anne style tests, on the other hand, toddlers might be aware that Sally doesn't know and yet still not pass the test. Perhaps they even have extra difficulty not sharing the true location of the marble precisely *because* they know Sally doesn't know where it is. In navigational terms, passing the Sally-Anne test requires a child, beyond linguistic skills, to be able to handle the *conflicting pragmatic contexts* of the marble-hunt story and the experimenter's meta-question. To pass this test one might need to (1) understand and remember Sally's perspective, but also (and this is where we seem to move beyond the here & now) (2) navigate the situation from her counterfactual vantage point, which involves momentarily setting aside one's current position in relation to the marble, and finally (3) return to the verbal prompt and respond with what Sally would have done (had we not been there to help her). Each of these successive aspects contributes to the complexity of this highly non-cooperative scenario, where, in addition to *including the other's perspective*, one must also *exclude one's own* embodied knowledge, and *navigate* both via the current affordance space and beyond it.

## **SELF-PROJECTION AND NAVIGATING BEYOND THE "HERE & NOW"**

In contrast to here & now navigation, we sometimes, typically in our thoughts, engage in what we might call *counterfactual navigation*, i.e., we place ourselves in remembered, imagined or otherwise not-actually-bodily-inhabited spaces and use the resulting relational body-space understanding for various deliberations. The key is that it is the *relation* that is counterfactual, not the information or the objective existence of the space. Thus, such navigation goes beyond merely including *counterfactual information*, i.e., memories, perspectives or pretense objects, in our actually inhabited space and situated action choices. Rather, it is about "placing oneself" and "making moves" via a space which, although perhaps factually existing, is *pragmatically counterfactual* and requires one to abstract from actual embodied relations to the current affordance space.

In terms of Sally-Anne tasks, one might thus be able to track Sally's FB in the current affordance space, but be unable to plan a response from Sally's perspective and/or shift back and verbalize it in the prompt context, which requires one to ignore the current position of the marble as it relates to both oneself and Sally (see Bloom and German, 2000; Rubio-Fernández and Geurts, 2013 for Sally-Anne variations that probe some of these behavioral complexities and age limitations). One's own prior beliefs and misperceived affordances might also, under some conditions, require counterfactual navigation, as in, e.g., the Smarties box/appearance-reality paradigms (Gopnik and Astington, 1988). In short, task aspects of planning, deciding and inferring via remembered/projected scenarios can all be interpreted as involving navigation beyond the current affordance space, as one needs to relate to options that cannot be interpreted through the sensorimotor affordance space our situated bodies are actually dwelling in.

The idea of such counterfactual navigation differs from the traditional modular, non-relational, and knowledge-focused aspects of "theory of mind" theories (e.g., Leslie et al., 2004; Carruthers, 2013), as no domain-specific mindreading or ToM module is postulated. Rather, the crucial distinction of the navigational hypothesis pertains not to other-mind *content per se* but to how it is presented, assessed or navigated by the understanding subject. Similarly, though there are parallels between the idea of counterfactual navigation and that of off-line simulation (e.g., Heal, 1996; Goldman, 2006), an important contrast is that counterfactual content or "pretense" can play a role in both kinds of navigation. Thus, there are specific overlaps with, and contrasts to, aspects of traditional theory, ToMM, etc.

Interestingly, the core cortical areas implicated in much social cognitive research on "theory of mind" have been found to overlap greatly with the default-mode network (Schilbach et al., 2008). The default-mode network was isolated precisely because of its sustained activity in the absence of externally directed attention and stimuli (Gusnard and Raichle, 2001), and has been interpreted as supporting various kinds of "self-projection" (Buckner and Carroll, 2007) or "internal mentation" (Andrews-Hanna, 2012), whether these pertain to social cognition, memory or future projection. In other words, its profile fits a notion of navigation beyond the here & now.

Another line of social cognitive research that could be reinterpreted within a navigation framework concerns the role of the rTPJ in thinking about other minds. This cortical region has been consistently implicated in imaging studies and has thus, on a modular account, been proposed to support "theory of mind" (Saxe and Wexler, 2005). However, another possibility is that the region is important for *shifting* between here & now navigation and counterfactual navigation. This hypothesis would fit with broader evidence regarding the role of the rTPJ in attentional shifts (Mitchell, 2008).

These preliminary notes are meant simply to highlight that the navigation hypothesis throws a new light on existing imaging findings and most importantly is empirically tractable.

## **MATURATION OF SENSORIMOTOR PRIORS AND COUNTERFACTUAL NAVIGATING**

The distinction between these two kinds of navigation is proposed as a new interpretation of the age discrepancy between implicit and explicit FB tests. The idea is that what matures around age four might not be the ability to represent minds, non-current facts, or the fact that others have counterfactual beliefs. Rather, the proposal is that a counterfactual use of one's embodied learning becomes possible, which in turn allows one to ignore "here & now" perspectival affordances and to act on the basis of navigating a counterfactual space. Toddlers are typically capable of remembering, making predictions, telling us what we don't know, etc., and they actively do so in present contexts (Poulin-Dubois et al., 2007; Apperly and Butterfill, 2009). What younger children might not be able to do is pragmatically navigate counterfactual terrains. The hypothesis is thus that, for toddlers, non-current information can take on the affordance organization needed for action choice only in relation to their current body.

But why would counterfactual navigation not be available to toddlers? In other words, according to the navigation hypothesis, what is the crucial maturation that happens around age four? Recent experimental data from Elizabeth Torres' sensorimotor lab point to a fascinating answer. She found that TD 3-year-olds do not yet show statistical predictability in the temporal features of their limb movements (Torres et al., 2013). One might say that they do not yet have "sensorimotor priors" with respect to their own bodily movements and the ensuing reafferent sensations (Von Holst and Mittelstaedt, 1950). Notably, Torres found that hand movement variations go through a crucial maturation precisely around age four. More specifically, the variations are noisy and random in toddlers but begin to show predictability and better signal-to-noise ratios in 4-year-olds.

These are remarkable findings, and their potential reaches beyond the current project. The hypothesis I would like to draw attention to here is that such "sensorimotor priors" can be seen precisely as a kind of predictable probabilistic body, an abstract body that we can "bring into" counterfactual scenarios and thus use to navigate and make decisions in spaces to which we do not stand in current embodied relation. The idea is that only when we have established a reliable and predictable baseline expectation about our own re-afferent movements can we use these expectations hypothetically. In other words, sensorimotor priors, as such embodied expectations, might ground abstract navigational relations to non-present spaces, and thus allow us to navigate beyond the pragmatic relations of our actually situated bodies. Torres further found that this sensorimotor maturation does not follow the usual trajectory in individuals with autism, which suggests that they have to rely on their "here & now" body and world sensation in very different ways than TD children, and offers a new perspective on their social interaction and FB task difficulties (Brincker and Torres, 2013; Torres, 2013; Torres et al., 2013).

Thus, the proposal is that typical 2- to 3-year-olds can engage in elaborate pretense, incorporating hypothetical content such as memories, fantasy objects and alternative perspectives into their present affordance space in relation to their bodies and sensorimotor capabilities. But they might not have the statistical body expectations needed to navigate counterfactual spaces "in their head," so to speak. The proposal is thus that the helping paradigm does indeed show early understanding of FBs, but not the ability to go beyond current affordance relations to counterfactual spaces or mental navigation, which seems to be needed in the traditional Sally-Anne task. Under this framework, the development of sensorimotor priors around age four might transform some here & now social knowledge and interactions, but it does not move the child from "mind-blindness" to "mindreading."

## **REFERENCES**


Gibson, J. J. (1977). *The Theory of Affordances*. Hillsdale, NJ: Lawrence Erlbaum.


Reddy, V. (2008). *How Infants Know Minds*. Cambridge: Harvard University Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 July 2014; accepted: 24 November 2014; published online: 15 December 2014.*

*Citation: Brincker M (2014) Navigating beyond "here & now" affordances—on sensorimotor maturation and "false belief" performance. Front. Psychol. 5:1433. doi: 10.3389/fpsyg.2014.01433*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Brincker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# We can work it out: an enactive look at cooperation

# *Valentina Fantasia<sup>1</sup>\*, Hanne De Jaegher<sup>2,3</sup>\* and Alessandra Fasulo<sup>1</sup>*

<sup>1</sup> Centre for Situated Action and Communication, Department of Psychology, University of Portsmouth, Portsmouth, UK

<sup>2</sup> IAS-Research Centre for Life, Mind, and Society, Department of Logic and Philosophy of Science, University of the Basque Country, San Sebastián, Spain

<sup>3</sup> Centre for Computational Neuroscience and Robotics, University of Sussex, Brighton, UK

## *Edited by:*

Eddy J. Davelaar, Birkbeck College, UK

#### *Reviewed by:*

Joachim Funke, Ruprecht-Karls-Universität Heidelberg, Germany

Nicola Yuill, University of Sussex, UK

#### *\*Correspondence:*

Valentina Fantasia, Centre for Situated Action and Communication, Department of Psychology, University of Portsmouth, King Henry 1 Street, Portsmouth, Hants PO1 2DY, UK e-mail: valentina.fantasia@port.ac.uk; Hanne De Jaegher, IAS-Research Centre for Life, Mind, and Society, Department of Logic and Philosophy of Science, University of the Basque Country, Avenida de Tolosa 70, 20018 San Sebastián, Spain e-mail: h.de.jaegher@gmail.com

The past years have seen an increasing debate on cooperation and its uniquely human character. Philosophers and psychologists have proposed that cooperative activities are characterized by shared goals to which participants are committed through the ability to understand each other's intentions. Despite its popularity, this approach to cooperation raises some serious issues. First, one may challenge the assumption that high-level mental processes are necessary for acting cooperatively. If they are, then how do agents that lack such abilities (preverbal children, or children with autism, who are often claimed to be mind-blind) engage in cooperative exchanges, as the evidence suggests they do? Second, defining cooperation as the result of two decontextualized minds reading each other's intentions may fail to fully acknowledge the complexity of situated, interactional dynamics and the interplay of variables such as the participants' relational and personal history and experience. In this paper we challenge such accounts of cooperation, calling for an embodied approach that sees cooperation not only as an individual attitude toward the other, but also as a property of interaction processes. Taking an enactive perspective, we argue that cooperation is an intrinsic part of any interaction, and that there can be cooperative interaction before complex communicative abilities are achieved. The issue then is not whether one is able to read the other's intentions, but what it takes to participate in joint action. From this basic account, it should be possible to build up more complex forms of cooperation as needed. Addressing the study of cooperation in these terms may enhance our understanding of human social development, and foster our knowledge of different ways of engaging with others, as in the case of autism.

**Keywords: cooperation, development, autism, infancy, social interaction, participatory sense-making**

# **INTRODUCTION**

The ability to cooperate has received increasing attention over the past years, particularly from researchers in analytical philosophy and in developmental and comparative psychology. Cooperation has been described as the "coordinated, synchronous activity that is the result of a continued attempt to construct and maintain a shared conception of a problem" (Teasley and Roschelle, 1993) or, more basically, as consisting in "(i) acting or working together and (ii) a common or the same end or purpose" (Tuomela, 2000, p. 3). One of the reasons why cooperation has been considered such an important topic in the past two decades is its apparent importance for exploring differences between humans and other animals (especially great apes; Tomasello et al., 2005; Tomasello, 2009). Moll and Tomasello (2007, p. 1) have argued that "among primates, humans are by far the most cooperative species, in just about any way this appellation is used (...) constituted by all kinds of cooperative institutions and social practices with shared goals and differentiated roles."

Despite extensive exploration by philosophers and psychologists, what makes an activity cooperative remains controversial. This is because mainstream accounts often describe cooperation as depending on high-level social skills, and this, as Butterfill (2012) puts it, "already presupposes too much sophistication in the use of psychological concepts" to be applicable to the investigation of more basic forms of cooperation. Indeed, most empirical studies on children's cooperation are based on inferential and mentalistic theoretical accounts, which may not be the most adequate framework for studying how cooperation emerges in typical and atypical developmental paths.

In this paper we challenge these theoretical models and propose to widen the exploration of what cooperation is, what kinds of experience may support someone's cooperative participation in joint actions, and how this participation may develop over time. By widening the concept of cooperation, we aim to explore the different interactional modalities through which it can work, including those that are not explicitly or previously agreed on as cooperation. We end by drawing some implications of this change in perspective for cooperation in infancy and in autism.

## **PHILOSOPHICAL ACCOUNTS OF COOPERATION**

Current theories of joint action have attempted to describe cooperation as a phenomenon primarily based on cognitive abilities. These theories depict social encounters (and cooperative actions) as encounters of minds, where participants have to infer each other's beliefs and desires to understand and predict the other's intentions and moves. Central to these theories is the concept of shared intentionality. Many philosophical theories propose that joint actions require the creation of shared (or collective, or joint) intentions<sup>1</sup> (Gilbert, 1989; Bratman, 1992, 1993; Searle, 1995; Tuomela, 1995). Sharing intentions is possible when partners make individual plans for achieving a common goal, and then formulate predictions about the other's intention to achieve the same goal (Gilbert, 1989, 2000; Bratman, 1992; Tuomela, 1993, 2005; Pacherie, 2006). Shared intentions, according to Bratman (1992), are defined as a set of interrelated individual intentional states. In shared activities, he claims, "each agent intends that the group perform the joint action in accordance with and because of meshing subplans of each participating agent's intention that the group so act." (Bratman, 1992, p. 333). According to this view, a joint activity is the result of a shared intention, and a shared intention is simply a pattern of "interlocking" plan-intentions of the participants about which they have common knowledge. Essentially, for cognitivist philosophical approaches, partners engage in cooperative actions if they are able to infer each other's thoughts and plans, and combine them to build their co-actions in some shared way.

As interest in exploring joint intentionality and joint actions has grown, further theorization has followed the original descriptions of cooperation. Building on Bratman's account, for example, Tummolini (2013) suggests that, along with mindreading, representing one's own goals and those of others from a third-person observational perspective is also a necessary cognitive ability for collaboration. Thanks to this allocentric representation of goals (as he names it), individuals are endowed with both "an intention in favor of the joint action and one in favor of a joint mode of reasoning," which enables them to coordinate in a joint action. Other researchers have attempted to formulate less cognitively demanding accounts of shared intentionality, while still placing the representation of intentions at the very ground of any joint action. Sebanz et al. (2006, p. 70), for instance, proposed that a successful joint action "depends on the abilities (i) to share representations, (ii) to predict actions, and (iii) to integrate predicted effects of own and others' actions."

Because they presuppose the presence of high-level sociocognitive capacities, standard accounts of cooperation hardly apply to those who do not possess propositional knowledge about others' intentions, such as young children or animals, and some philosophers have already questioned this assumption. Tollefsen (2005), for instance, argued that awareness of another's intention may not depend on inferring it, but on the ability to track intentions-in-action. She argues that attending to each other's actions provides participants with a shared perceptual space constructed through joint attention dynamics. In this shared space, intentions-in-action are perceptually overt and identifiable, so that even young children without a "robust theory of mind" (p. 81) can in principle engage in cooperative activities. Despite these developmental concerns, the author explicitly declines to address how this perspective can be applied from very early in development, saying that "[p]rior to the first year, young infants are like windowless monads" (p. 80), implying that they cannot yet interact. By stressing the importance of joint attention and social referencing mechanisms (as defined by Tomasello, 1995) for building a shared space, she neglects the possibility of earlier forms of cooperation, e.g., in infancy. Similarly concerned with understanding the role of joint action in development, Butterfill (2012) proposed to replace the concept of shared intentions with that of shared goals. Sharing a goal, in his view, only requires agents' goal-directed actions to be coordinated, and does not imply knowledge. This move is meant to make cooperation possible in early development. However, he also claims that possessing a shared goal requires representing goal-directed actions, and his proposal does not make entirely clear how young children achieve this.

We find that all these arguments reflect a general problem with the cooperation research reviewed so far: cooperation is framed in its full-blown, adult form, and it is therefore difficult to see how those who lack high-level socio-cognitive skills (including representing goal-directed actions) or extensive experience could possibly cooperate. This is our main concern in the present paper.

## **COGNITIVE DEVELOPMENTAL ACCOUNTS OF COOPERATION**

Defining what it is to cooperate from a developmental point of view is challenging. Recent developmental research in psychology has endorsed a cognitivist account of shared cooperative activities, suggesting that a major step in children's social cognitive development occurs at around 12–14 months, when children begin to engage with adults in cooperative activities involving an understanding of interdependent roles (Tomasello et al., 2005), and are generally motivated to help the other accomplish her role if needed (Moll and Tomasello, 2007). Accordingly, in order to cooperate, it seems that "children must be able to represent, monitor, and regulate both their own and the partner's behavior relative to their relation to a single, common goal" (Brownell and Carriger, 1990, p. 1165).

To empirically investigate early cooperative skills through abilities such as perspective-taking and understanding of the other's intentions and goals, most studies on young children have adopted specifically designed lab tasks involving role reversal or simultaneous coordination of movements (Brownell and Carriger, 1990; Warneken et al., 2006, 2012). In the majority of these studies, successful performance on joint tasks sets the age threshold for attributing cooperative abilities and instrumental helping to children.

For example, Brownell et al. (2006) observed children at 19, 23, and 27 months of age engaging in peer cooperative problem-solving tasks. In these tasks, each child had to pull one handle of a wooden box, simultaneously or sequentially, to activate a musical toy mounted on the box. Successful performance of the task consisted in activating the toy by coordinating each other's timing and movements. The researchers found that 1-year-old children coordinated their actions more by coincidence than in a cooperative way, whereas older children appeared to be more actively cooperating toward a shared goal. They took these results to confirm their view that the ability to cooperate depends on "being able to represent and to share goals and intentions with a partner" (p. 806); an ability that, according to the study, could only be seen over the second and third years of life. Another example is a study in which Warneken and Tomasello (2007) investigated instrumental helping and cooperation in 14-month-old children. Instrumental helping was defined as providing help to people in completing a task, e.g., picking up an out-of-reach object, whereas cooperation was measured through a series of tasks to be solved jointly, such as retrieving an object from a vertically movable cylinder embedded in a platform. Results showed that at 14 months children reliably helped a partner who could not achieve a goal, but cooperated successfully only in tasks demanding low coordination. The authors concluded that "Helping might be easier for children than cooperating because it requires the understanding of what another individual intends to do (...), whereas cooperation requires the ability to form a shared goal and to mesh plans of action toward that goal" (ibid. p. 291). In other words, helping would only require reading another's intention, whereas cooperation would also require one's own and the other's intentions to be co-dependent and to converge.

<sup>1</sup>We will not go into the debate here about specific differences between shared or collective intentionality or other denominations, as it is not relevant to our argument. For an overview of analytic standpoints on the terms, see Schweikard and Schmid (2013).

In sum, developmental research has attempted to define the beginning of cooperation by setting tasks based on similar premises, designing practical tasks that can only be completed by not only inferring but also mobilizing well-formed intentions. These premises derive from the mainstream philosophical accounts of cooperative action, which propose that engaging in a cooperative action requires mind-reading abilities and the ability to align one's own intentions and beliefs with the other's, although milder, less cognitively weighted positions have also been proposed. In the next section we discuss what we believe are some pitfalls of both the existing theoretical and methodological approaches to the study of cooperation.

## **METHODOLOGICAL AND THEORETICAL ISSUES WITH STANDARD APPROACHES**

To put shared intentionality at the very basis of shared cooperative action raises the question of how humans come to know others' intentions and goals. On the standard accounts, this is done by means of a theory of mind or a simulation mechanism, which is "any cognitive system ... that predicts or explains the behavior of another agent by postulating that unobservable inner states particular to the cognitive perspective of that agent causally modulate that agent's behavior" (Penn and Povinelli, 2008, p. 394). This cognitive system is often thought to be supported by the so-called social brain (Frith and Frith, 2003; Frith, 2007).

## *If intentions are hidden, are joint intentions hidden too?*

Within mind-reading approaches, social understanding requires, among other things, being able to access another's intentions or, more generally, the contents of another's mind. The "problem" of understanding others' minds is based on the premise that intentions are hidden and private, that is, that others' intentions (like thoughts, ideas, beliefs) need to be inferred through complex representational operations (Apperly, 2011). Now, how are such intentions shared? On standard representationalist accounts, this is often proposed to happen through some form of mental alignment, for instance by simultaneous mirror system activation (Gallese, 2003; Pacherie, 2006; Sebanz et al., 2006). In this view, everyone has her own understanding of others' intentions to jointly perform an action, but how these understandings become shared remains unclear. For example, Knoblich and Sebanz (2008) have attempted to explain in three steps how people can form intentions to act together. First, they need to be able to derive the other person's intentions behind her object-directed actions or actions directed to her partner. Then, actors need to be able to keep knowledge of these intentions separate from their own intentions. Finally, "There needs to be an intentional structure that allows an actor to relate his/her own intention and the other's intention to an intention that drives the joint activity" (Knoblich and Sebanz, 2008, p. 2025). Although it may seem very basic, this definition is still quite cognitively demanding, and does not solve the main problem of how such an "intentional structure" works. Is it individual or shared, implicitly or explicitly created?

There seems to be a gap here, in the form of an empty space *in between* people: these approaches have explained shared intentionality from an observer's perspective, but not from a participant's. This is in line with criticisms of the standard approach to social cognition (e.g., Gallagher, 2001; Leudar and Costall, 2009) and with views on interpersonal alignment as primarily based on embodied engagement (Macmurray, 1991; Braten, 2003; De Jaegher and Di Paolo, 2007; Fuchs and De Jaegher, 2009; Reddy and Morris, 2009). Shotter (1983, p. 39) nicely summarized these alternative positions: "Motives, intentions, sentiments are (. . .) directly perceived by those directly involved in [a joint action] as first person actors and second person recipients in that activity. Only third person observers have to make inferences."

Another consideration is whether we need *to know* that we are cooperating in order to be able to cooperate. Cooperation is often presupposed to be something we set out to do, so that actions are either clearly cooperative or not – a separate and identifiable type of action altogether. This may indeed sometimes be the case, for example when two people meet to perform a certain shared task, like bathing a very agitated dog. But taking this idea as the starting point for understanding cooperation presupposes that we already know what it is, and so need not define the elements out of which it could arise. It precludes, for example, the possibility that cooperation arises without a predefined intention or motive to cooperate, although this may be key to understanding how people get to cooperate in the first place. Shared goals may emerge during the course of an interaction, and so participants can "roll into" cooperation without prior awareness of it. For instance, making space for someone who enters a crowded bus is achieved by the new and old passengers together, each adjusting movements and postures. Here, a common goal emerges out of the interaction and in the context of a small space to be shared as smoothly as possible. Understanding this kind of emergent phenomenon will give us further insights into what cooperation is and how it works.

## *Where is development?*

We may question to what extent we can explain the role of cooperative actions in children's development if we conceive of cooperation as relying heavily on high-level cognitive skills and long experience with social interactions. As Butterfill (2012, p. 24) wrote:

If the leading account were the whole truth about joint action, engaging in joint action would presuppose, and therefore not explain, much of the development of reasoning about others' mental states. (. . .) We need a further account of joint action, one that is compatible with the premise that joint action plays a role in explaining how humans develop abilities to think about minds.

Furthermore, developmental research on cooperation is based on a rather restricted pool of tasks, designed to assess cooperative problem solving and related abilities like role reversal, perspective-taking, and joint attention. These do not necessarily cover the whole range of possible cooperative interactions in a child's life, as there are many situations (some of which we discuss below) in which a clear, explicit division of roles and statement of goals is not needed. Moreover, the structure of these tasks implies a "pass or fail" evaluation, and therefore seems more appropriate for detecting when cooperative skills are already present than for telling us how they emerge or develop over time (Thelen and Smith, 1994).

Which view of cooperation one adopts is likely to have serious consequences for studying cooperative exchanges in typical and atypical development. This is the case, for example, with research on cooperation in autism. Studies on cooperation in autism that are based on mind-reading and perspective-taking abilities<sup>2</sup> find that children with autism are less successful than children with developmental delay (Sally and Hill, 2006; Liebal et al., 2007). However, this does not mean that they are completely incapable. For instance, they seem able to help an adult as needed (Liebal et al., 2007), particularly when they understand the other person's goals toward an object (Aldridge et al., 2000; Carpenter et al., 2001). Liebal et al. (2007) explained these findings in terms of a specific impairment in understanding the partner's role within the cooperative task, which would not apply when the situation does not require knowledge of and agreement on each partner's role. Thus, it may be that children with autism can succeed in cooperative tasks as long as these do not entail an explicit understanding of, and prior agreement on, each partner's role. Similarly, if they are given appropriate interactive support, e.g., if they are helped to be aware of the other person in the interaction, they can cooperate in a dual-control technology task (Holt and Yuill, 2014).

In conclusion, to study cooperation as it develops, and in conditions involving impairments in social skills, we need to investigate it at a more basic level than has been done so far. In the next section, we put forward our proposal, which looks at cooperative interactions from the point of view of what is at stake for the individuals participating in them, and of the organization of cooperative interaction processes. To do this, we use the concepts and research tools of *enaction*, a specific approach to cognition within the embodiment movement in cognitive science (Varela et al., 1991; Thompson, 2007; Di Paolo et al., 2010).

# **THE ENACTIVE PERSPECTIVE ON SENSE-MAKING AND SOCIAL INTERACTIONS**

Enaction is a non-reductive naturalistic approach that proposes a deep continuity between living and cognitive processes. It is a scientific program that explores several phases along this life-mind continuum, based on six mutually supporting operational concepts: *autonomy*, *sense-making*, *embodiment*, *emergence*, *experience*, and *participatory sense-making* (Varela et al., 1991; Thompson, 2005, 2007; De Jaegher and Di Paolo, 2007; Di Paolo et al., 2010). Here, we first introduce two of its main concepts: *sense-making*, the enactive notion of cognition in general; and *participatory sense-making*, enactive social cognition. In section 3, we start applying these ideas to understanding cooperation.

## **SENSE-MAKING**

For enaction, "the mind is seen not as inhering in the individual, but as emerging, existing dynamically in the relationship between organisms and their surroundings (including other agents)" (McGann et al., 2013). Or, as Merleau-Ponty (1962, p. 430) already put it:

The world is inseparable from the subject, but from a subject which is nothing but a project of the world, and the subject is inseparable from the world, but from a world which the subject itself projects.

In this view, the paradigmatic cases of cognizers are living organisms (Varela, 1997; Thompson, 2007). One of their crucial properties is their constitutive and interactive *autonomy*: an autonomous system is a network of dynamical processes (metabolic, immune, neural, sensorimotor, etc.) that actively generates and sustains an identity under precarious conditions (Di Paolo, 2005). Such a system constantly produces itself physically, and regulates its interactions with the world to satisfy the needs created by its precarious condition (Di Paolo, 2005). The living organism spontaneously generates its own goals and responds to the environment (McGann, 2007) in accordance with its self-organization. The cognizer is therefore always situated in a world that is significant for it, from this need-based perspective. Its world is not pre-given but largely *enacted*, i.e., shaped as part of its autonomous activity. For the enactive approach, cognition is *embodied*, meaning that a cognizer's activity depends non-trivially on the body. The body is more than just anatomical or physiological structures and sensorimotor strategies; it *is* the precarious combination of various interrelated self-sustaining identities (organic, cognitive, social), each interacting with the world in terms of the consequences for its own viability (Di Paolo, 2005).

These ideas together ground the enactive characterization of cognition as *sense-making*: a cognizer's adaptive regulation of its states and interactions with the world, with respect to the implications for the continuation of its own autonomous identity. The concept of sense-making describes the relation between an autonomous agent and the world of significance it enacts. It therefore does not conceive of cognitive processes as representational, and avoids the known problems of cognitivism. Organisms do not passively receive information from their environments, which they then translate into internal representations whose significant value is to be added later. Natural cognitive systems participate in sense-making as a relational and affect-laden process grounded in biological organization (Jonas, 1966; Varela, 1991, 1997; Weber and Varela, 2002; Di Paolo, 2005; Thompson, 2007). Sense-making, thus, is *valued* or *concerned* acting and interacting, leaving no gap between affect and cognition: they are one in the relation of significance between cognizer and world.

<sup>2</sup>Mainstream accounts of autism have long proposed that people with autism have difficulties with mind-reading (Baron-Cohen, 1989; Dinishak and Akhtar, 2013), joint attention (Loveland and Landry, 1986), or turn-taking skills (McEvoy et al., 1993), although these findings are not uncontroversial, and even their primary proponents recognize that there are always a number of participants who do pass the tests (Happé, 1994; but see also Boucher, 1989, 1996, 2012; Gernsbacher et al., 2008).

## **PARTICIPATORY SENSE-MAKING**

Having briefly explained what enactive cognition is, and sense-makers' inherently meaningful perspective on and interactions with the world, let us now take a closer look at social encounters, the second main element in our enactive sketch of cooperation. The enactive approach considers sociality in its broadest form, namely as *intersubjectivity*, or the meaningful engagement between subjects (Reddy, 2008), in which three aspects are crucial: engagement, meaning, and subject. Meaning and subjectivity have been explained above in terms of sense-making, namely as the way living (cognizing) systems always meaningfully engage with their environment, because they are self-organizing and self-maintaining. In this section, we turn our attention to engagement between such concerned subjects.

Crucial to the enactive approach is the focus on *social interaction processes*, which are complex phenomena involving different dimensions of verbal and non-verbal behavior, varying contexts, numbers of participants, and technological mediation. They impose strict timing demands, involve reciprocal activity, exhibit a mixture of discrete and continuous events at different timescales, and are often robust against external disruptions. Essential to interaction is that it involves *engagement* between agents. Engagement (Reddy, 2008; Reddy and Morris, 2009) captures the qualitative aspect of social interactions once they start to "take over" and acquire a momentum of their own. It also reflects the way this experience is described in everyday language (e.g., "being in sync with someone"). Experientially, engagement consists in fluctuating feelings of connectedness with one another, including that of being in the flow of an interaction.

In order to capture this taking-over aspect of engagement, enaction defines social interaction in terms of the autonomy (as defined above) of the interaction process and that of the individuals involved, as:

a co-regulated coupling between at least two autonomous agents, where: (i) the co-regulation and the coupling mutually affect each other, constituting an autonomous self-sustaining organization in the domain of relational dynamics and (ii) the autonomy of the agents involved is not destroyed (although its scope can be augmented or reduced; De Jaegher et al., 2010, pp. 442–443; also De Jaegher and Di Paolo, 2007, p. 493).

Apart from each agent involved in such a coupling contributing to its co-regulation, the interaction process itself also self-organizes and self-maintains. To illustrate this, think of how sometimes, when you encounter someone coming from the other direction in a narrow corridor, you end up in front of each other, and then each of you steps aside to the same side at the same time, preventing both of you from continuing on your way. This simple example shows how the interaction process can become autonomous or "take on a life of its own." At the same time, the interactors also maintain their autonomy as participants. This is a necessary condition for calling an interaction social, because if one of the participants lost their autonomy, for the other it would be like interacting with an object or a tool, and thus no longer a social interaction (De Jaegher and Di Paolo, 2007).

Social interactions are sustained by processes of embodied coordination, including its breakdowns and repairs (De Jaegher and Di Paolo, 2007; Di Paolo and De Jaegher, 2012). Coordination does not necessarily require cognitively complex skills. Analyses of social interactions and conversations in social science show that participants can unconsciously coordinate their movements and utterances, and this is already the case in mother-infant interactions (Condon and Sander, 1974; Stern, 1977/2002; Condon, 1979; Scollon, 1981; Davis, 1982; Tronick and Cohn, 1989; Kendon, 1990; Grammer et al., 1998; Malloch, 2000; Jaffe et al., 2001; Issartel et al., 2007; Malloch and Trevarthen, 2009). With the concept of coordination and other dynamical systems tools, interaction dynamics can be measured (see e.g., Kelso, 2009). Moreover, they can be related to neural activity (see e.g., Lindenberger et al., 2009; Dumas et al., 2010, 2012; Cui et al., 2012; Di Paolo and De Jaegher, 2012; Konvalinka and Roepstorff, 2012; Schilbach et al., 2013).

Based on this definition of social interaction, and the notions of sense-making and coordination, we can now characterize social understanding as *participatory sense-making*: if, as indicated above, we make sense of the world by moving around in and with it, and we coordinate our movements with others when interacting with them, then we can coordinate our sense-making activities. That is, we literally *participate in each other's sense-making activities*. Thus, on the enactive account, social understanding is conceived as the generation and transformation of meaning together in interaction (De Jaegher and Di Paolo, 2007; De Jaegher, 2009; Fuchs and De Jaegher, 2009). Participants co-create the interactive situation, *but* the interaction process as such *also* influences the sense-making that takes place. If social interaction is as characterized here, then people can act together for no apparent end or purpose of their own, or even against their individual ends (e.g., the corridor encounter). Even when there is no shared intention to start with, or when participants enter an interaction against their will, interacting can change or affect their ends or purposes.

This has an interesting consequence for understanding intentions, namely that they are truly generated and transformed interactionally, and that interacting with each other opens up new domains of sense-making that we would not have on our own. This contrasts with the way intentions are conceived in cognitivist approaches to cooperation, as introduced above, namely as hidden, and shareable only by high-level cognitive mechanisms. On our account, intentions are not first formed individually; rather, they emerge as the interaction goes on (Di Paolo, under review). Therefore, intentions are visible and understandable to each participant, also in cooperative interactions, as they are contextualized and stem from that specific ongoing interaction.

This makes understanding and aligning with the other's intentions unmysterious: it happens in doing things together, that is, in moving together, since movements are always already imbued with meaning for sense-makers (Johnson, 2007; Sheets-Johnstone, 2011; Merritt, 2013). On this basis, we can see how intentions can evolve in their jointness, meanings, and specificity for those involved throughout interactions, including cooperative ones.

## **COOPERATION AS A PROCESS**

Here, we start from the most rudimentary or minimal form of cooperation, in order to make it understandable from a developmental point of view. With the enactive concepts of sense-making and participatory sense-making in hand, let us now look again at cooperation, starting from its basic definition as "(i) acting or working together and (ii) a common or the same end or purpose" (Tuomela, 2000, p. 3). Now, considering social interactions as already cooperative in a basic sense (in line with our enactive approach), we want to characterize our approach to cooperation starting from this definition by Hubley and Trevarthen (1979, p. 58):

cooperation means that each of the subjects is taking account of the other's interests and objectives in some relation to the extrapersonal context, and is acting to complement the other's response.

In our view, "taking account of the other's interests and objectives" does not need inferencing, as we have argued, but happens through embodied interactions that are meaningful in the given situation and in the interactional history. These actions are complementary in that they fit each other in some form. This is not only the case for positive cooperation but also for situations in which we argue and disagree about something, where some complementarity is still needed for the disagreement even to be played out. This means that there are different forms, layers, and aspects of cooperation: embodied, in time, in space, in topic, imitative or complementary, etc. The fact that we are interacting guarantees that some basic cooperative layer is present (e.g., in the corridor situation, we cooperate to stop cooperating) and therefore, every time we interact, we cooperate, in a basic sense. Also, since sense-making always involves affect, this view of cooperation is less intellectualistic and opens the way to investigating how affective processes may be involved in cooperation. The challenge, then, is to investigate what further levels of cooperation are present in a specific interaction or situation, over and above the basic interaction process. This can involve different, increasingly complex levels of sense-making.

Like the enactive approach, interactionist approaches such as ethnomethodology and conversation analysis have based their empirical programs on a theory of social interaction as a dynamical construction and on a view of others' intentions as mutually accessible and accountable. Ethnomethodology was originally developed by Garfinkel to "discover the methods that persons use in their everyday life (. . .) in constructing social reality" (Psathas, 1968, p. 509), and thus to study how this reality is constructed, produced and organized in social encounters. Derived from phenomenology, it shares with it an interest in exploring the participants' embodied experience of being engaged in mundane interactions; these interactions are seen as phenomena in their own right, yet situated in specific cultural contexts and practices (see, for instance, the work of Schütz, 1967/1932). Inspired by ethnomethodology and by Goffman's (1983) work on the interaction order, conversation analysis (Sacks et al., 1974; Sacks, 1992; Schegloff, 2007) investigates the systematic features of naturally occurring conversations. In a large body of work now spanning over five decades, it has revealed the fine, moment-by-moment coordination of speakers and the sequential structuring that enables the orderly participation of different interactors across turns-at-talk and within complex activities. Central to this approach is a view of human communication as multimodal, in which different but integrated communicative resources (verbal and non-verbal) contribute to establishing the interactional context and to anticipating, co-constructing, and if necessary repairing the emergent definition of what is going on (Kendon, 1990; Streeck et al., 2011; Tulbert and Goodwin, 2011). Thus, interactions are always cooperative, inasmuch as participants orient to, monitor and support their interlocutors' understanding and act so as to enable their successive moves (Goodwin, 1995, 2013).

Intentions and goals are not sought before or behind the communicative action as its "cause," but are manifest in speakers' behavior, shaped and adjusted as the interaction unfolds. Within this framework, and in convergence with enactivism, cooperating is possible even for those, such as young children, who do not possess a robust capacity to "read" others' intentions or plans but can nevertheless participate in joint, situated interactions (Forrester, 2008; Lerner et al., 2011; Mehus, 2011).

## **COOPERATION IN INFANCY**

We can now ask what this view implies for understanding cooperation in infancy. Since infants cannot survive alone, they need others to help them with nourishment, shelter, hygiene, and social interaction. On our account, infants can be expected to contribute actively to this care, because they are themselves sense-makers, generating and maintaining their own living identity and also, quite possibly, already their social identity (Stern, 1985/2000; Delafield-Butt and Gangopadhyay, 2013).

Hubley (1983) defined cooperation in infancy as the joint management of objects, actions or ideas to fulfill a purpose that two interactors share. She identified some minimum requirements for cooperative actions in infancy: (1) a shared plan of action within mutual orientation, with the infant attending to and acting with reference to the partner's indicated purposes; (2) active contributions to a single coordinated event, which, on the infant's part, is seen as a clearly identifiable and oriented action to influence the behavior of the partner and then mesh with the partner's action to complete a shared purpose; (3) willing participation. On the one hand, such a definition seems to fit the infant's limited communicative resources, as it does not imply that the partners should verbally agree on a shared plan or goal. On the other hand, it presupposes that some shared plan has somehow been established, and requires that each partner understand the interests or purposes of the other regarding the shared action. As we have already argued, such an explicit agreement may not be required in all forms of cooperative interaction.

The fundamental contribution of past developmental research has been to reveal how early communicative interactions are created out of contributions of both the infant and the caregiver (Hubley and Trevarthen, 1979; Trevarthen, 1979). Bruner (1977) recognized shared reference and role-taking as cooperative features of communicative interactions involving giving and receiving objects before 1 year of age. More recent observations have demonstrated how, from very early in life, infants adjust to and facilitate actions directed at them, especially in daily routines such as when the caregivers pick them up, change their nappy, or play a social game with them (Service, 1984; Nomikou and Rohlfing, 2011; Reddy et al., 2013; Rączaszek-Leonardi et al., 2013; Fantasia et al., 2014). From a perspective that considers social interactions as basic forms of cooperation, infants, by participating in shared, meaningful interactions, practice their ability to make sense of and coordinate with the caregiver's actions, becoming increasingly skilled in their social participation.

One of the criticisms we made of existing studies was that they measured children's cooperative ability by their success at a joint, pre-fixed task; regarding cooperation as a cognitive skill that can be switched on and off neglects the importance of learning processes that sprout *from* and *within* cooperative interactions. In contrast, from a here-and-now perspective, the process of cooperating enables children to build up their actions moment by moment through a sequence of relational adjustments and (dis-)engagements toward a joint goal. Thanks to its structuring and structured nature, cooperation may be seen both as a framework in which development occurs and as a mode of being with others learnt during development. If we take seriously what has been proposed so far (that any interaction requires some basic cooperation, followed in some cases by a process of co-negotiation toward a more or less explicit goal that matters to those involved), then we may also explain how cooperation develops. At the same time, we may be able to see how participating in goal-directed joint actions supports and shapes infants' development.

## **COOPERATION IN AUTISM**

A different theoretical perspective may also open up new possibilities for investigating cooperation in autism. As reported above, empirical findings suggest that some children with autism (at different chronological ages) perform poorly on high-level cooperative tasks and on related abilities, such as joint attention, imitation, perspective taking, and role-reversal (see Colombi et al., 2009). Yet performing "poorly" does not mean that the capacity is absent, and indeed some children with autistic spectrum condition (ASC) do pass the cooperative tasks. This result is not consistent with the theoretical premises informing the design of the tests or with the difficulties of children with autism as classically understood. One way to explain this (controversial) evidence may lie in changing the premises we start from, rather than in making *post hoc* adjustments to the interpretation.

Studies of the verbal production of children with autism that do not start from a deficit, but instead try to understand the children's spontaneous interactional behavior, can help to illustrate and support this shift of perspective. Conversation analysis studies, for example, allow us to observe how even echolalic productions (the repetition of utterances with no apparent relation to prior talk from other speakers), often seen in children with autism, are in fact responsive moves (Local and Wootton, 1995; Wootton, 1999; Stribling et al., 2005/2006; Sterponi and Shankey, 2014). The repetition of available utterances helps children to stay in the conversation despite their difficulty with improvising a newly designed turn. Sometimes these stereotypical contributions take the form of questions and feed the progression of an interaction, supporting the child's continued participation in a social exchange (Sterponi and Fasulo, 2010).

Dickerson et al. (2007) also show that observing what children actually do reveals capacities for cooperation that cannot emerge in pre-defined tasks, because the tasks do not always accommodate the ways in which children find solutions to their difficulties. They investigated classroom interactions between two autistic children and their tutors. The children were asked to answer questions using answer-cards. During the session, each of the children tapped the answer-cards, an action which at first sight seemed meaningless. However, using conversation analysis, Dickerson et al. (2007) could show that the children tapped on the cards just before they started answering, sometimes continuing to tap while answering. This seems to indicate that the tapping is a way of engaging and of "projecting a relevant forthcoming response on the part of the child" (Dickerson et al., 2007, p. 297). In other words, the children found means to signal their ongoing engagement when their verbal production was delayed, thus cooperating in maintaining the interactive plane.

With fine-grained observational methods, the actions of all participants can be studied and analyzed in interaction, making it possible to pick up the forms of cooperation that infants and people with autism are capable of (see also Stribling et al., 2009). These examples show how transcripts of the interactions make it possible to build a co-participatory model of how the child and teachers work together, one that includes their use of non-verbal and non-vocal resources. In this way, not only the participants' talk but also a number of non-verbal activities that are salient for the interaction are acknowledged. These results fit well with the Vygotskian idea that collaborative work leads to learning (Vygotsky, 1978; see also Goodwin, 2013). Furthermore, these studies suggest that ways to observe cooperative interactions in autism exist, if only we consider interactions and autism from a different perspective. During everyday interactions at home or school, in the car or at the park, children with autism are involved in many simpler, not-always-explicit cooperative exchanges. Not only are the children part of these exchanges, but they also grow into them; that is, they learn to be active partners out of everyday cooperative interaction, just as every other child does. This is not to say that there are no difficulties or differences, but social understanding in autism may be more fruitfully studied from the basic and positive perspective we put forward here.

# **IMPLICATIONS**

In summary, the perspective shift we propose has implications for understanding both development and autism. Firstly, our approach supports a developmental stance on cooperation in that it explores how we become cooperative interaction partners in the first place. If we assume high-order mental skills (or a great deal of "social experience") to be prerequisites for cooperating, we will not be able to see how infants grow into social interactions and gradually learn to engage with the social world around them; we will instead have to wait until much of development has already happened. If, on the other hand, we propose that cooperation is a form of interacting and understanding each other, it becomes possible to investigate how cooperation can emerge and be learnt even in early interactions. From this perspective, cooperation in infancy is a product of development as well as a process in which development occurs.

An interesting aspect to consider regarding development is how to conceive of cooperation in asymmetrical interactions. Infants seem to be able to coordinate cooperatively with caregivers from very early on (see e.g., Reddy et al., 2013; Fantasia et al., 2014), but, as some research suggests, they may not do so with peers until later (Warneken and Tomasello, 2006, 2007). From an enactive point of view, it is not surprising that infants are better able to cooperate with a caregiver than with a peer, since the presence of someone with more interactive experience makes the overall interaction more competent. This relates to Vygotsky's (1978) notion of the zone of proximal development, in which someone can be scaffolded in interaction so that, jointly, the partners are capable of activities the less experienced one cannot yet do alone. What is needed for an interaction to be cooperative if the relation is asymmetric? In a pick-up situation, the adult does the major part by actually holding the infant and lifting her up. Yet infants are not passively waiting for it to happen. They make specific preparatory bodily adjustments that facilitate the mother's movements, and thus the pick-up sequence (Service, 1984; Reddy et al., 2013). At the same time, when the adult fails to complete the expected pick-up sequence, infants seem to stop being cooperative, dropping their body tension and participation (Fantasia et al., 2014). In this case, although the mother has the main role in making the pick-up sequence effective, the infant's role is essential insofar as it is clearly oriented toward the joint achievement of the interaction. Obviously, asymmetry may play a stronger or weaker role depending on the task.

As a second point, if we are to understand autism in general, and specifically the capacity of people with autism to cooperate (which is first of all a particular form of social interaction), the change of perspective we propose here may also be helpful. We may try to forsake a typical-development perspective and, as Petra Björne and other authors have already suggested, reverse our glasses, paying more attention to what people with ASC can do and to the way they describe their own experiences (Björne, 2007; Robledo et al., 2012; De Jaegher, 2013; Donnellan et al., 2013). As shown by the studies on autism presented in the previous section, if we consider actions in their interactional context and in their significance for all participants, it becomes possible to understand the emergence of cooperation also in the interactions of and with people with autism. Exploring cooperation in children with autism from an observer's or third-person perspective not only fails to take into account the child's experience of cooperating as an engaged partner; it also leaves out how the other person feels about and experiences the child as a partner. In cases like autism, in which social interactions run a different course and jointly attending to an object may not be at the core of the interaction, approaching cooperation from a second-person perspective can make all the difference.

We thus suggest that future studies on cooperation and autism should include more ecological observations and parental reports. We expect that taking an enactive approach will yield more detailed knowledge of what infants and children with autism can do cooperatively in early goal-directed interactions. This involves finely studying the interaction (e.g., through ethnomethodology or conversation analysis), taking into account the context or environment (using, for instance, parental reports or ecological observations), and studying what is at stake for the individuals involved (i.e., asking how they make sense in and of the interaction).

# **CONCLUSION**

We hope to have shown that it is possible to encompass a wider range of cooperative interactions, not only those in which interactors explicitly agree upon and set rules and roles for a specific shared task to be performed. This is not to deny that in some particular scenarios participants do need to make efforts to make sense of the other's intentions, and that goals do need to be set out and agreed beforehand. However, this is not always the case, as cooperation is a multi-layered process that may take different forms. In this perspective, we share Tollefsen's view that intentions-in-action can emerge out of ongoing interaction (Tollefsen and Dale, 2012), with the minimum requirement that interactors share an interactional space. Cooperation is a form of participating in each other's sense-making, in which we may form a goal or purpose together while interacting. It is not a skill that one either has or lacks, but rather a way of being with others that can be learnt. Learning to cooperate then becomes understandable as an important aspect of typical and atypical development. For this reason, we think that future developmental research on cooperation (and social cognition in general) could benefit from more ecological observational methods and less adult-centric approaches (Donaldson, 1978). As the adult's way of cooperating is already fully blossomed, one in which the picture is complete (and intentions can easily be inferred if needed), we need instead to observe infants in their daily lives and discover the basic, emerging ways in which cooperation develops.

# **ACKNOWLEDGMENTS**

We are greatly thankful to Alan Costall, Beatriz López, Ezequiel Di Paolo, Vasu Reddy, Stephen Butterfill, the two reviewers and the researchers who attended the presentation of this paper at the Children and Technology Lab, Developmental and Clinical Psychology Group, University of Sussex for their suggestions, support and inspirational discussions. This work is supported by the Marie-Curie Initial Training Network, "TESIS: Towards an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

## **REFERENCES**

Aldridge, M. A., Stone, K. R., Sweeney, M. H., and Bower, T. G. R. (2000). Preverbal children with autism understand the intentions of others. *Dev. Sci.* 3, 294–301. doi: 10.1111/1467-7687.00123


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 May 2014; accepted: 22 July 2014; published online: 08 August 2014.*

*Citation: Fantasia V, De Jaegher H and Fasulo A (2014) We can work it out: an enactive look at cooperation. Front. Psychol. 5:874. doi: 10.3389/fpsyg.2014.00874*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Fantasia, De Jaegher and Fasulo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Going along with or taking along with: a cooperation continuum in autism?

# *Nicola Yuill\**

*ChaTLab, School of Psychology, University of Sussex, Brighton, UK \*Correspondence: nicolay@sussex.ac.uk*

## *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Alessandra Fasulo, University of Portsmouth, UK*

**Keywords: cooperation, development, autism, social interaction, collaboration**

## **A commentary on**

**We can work it out: an enactive look at cooperation**

*by Fantasia, V., De Jaegher, H., and Fasulo, A. (2014). Front. Psychol. 5:874. doi: 10.3389/fpsyg.2014.00874*

We welcome Fantasia et al.'s (FDF's) embodied perspective on cooperation and agree that the definition and varieties of cooperative activities need unpicking. We commend FDF's multidisciplinary approach, offering diverse methods and standards of support, from controlled experimental set-ups and ethological observations to rich descriptions of interaction. Can these approaches work together, or are they incommensurable? We think work in technology-supported collaboration in autism offers new insights.

FDF argue that autistic children are involved in many cooperative exchanges and suggest working from these "positive" perspectives. We propose distinguishing such "cooperative" exchanges from the carefully-constructed cooperative tests used in research with toddlers, by drawing on research which compares different designs of collaborative technology. Such technology is nothing artificial or exotic, just a means of understanding and altering environments to make it easier or harder for organisms to engage in cooperative interactions, e.g., co-working with an interactive surface placed horizontally vs. vertically. Studying people's interactions with different designs tells us about human interactional capacities and processes.

FDF define cooperation broadly, e.g., including echolalic productions and everyday interactions in autism. They claim that such behaviors show "ways of engaging" or signal ongoing engagement. Do we take these behaviors as deliberate signals of engagement, or as merely interpreted by the other as engagement, which might then bootstrap the development of intentional engagement in the child (a strong tradition in developmental psychology: Kaye, 1982)? A useful heuristic is to see cooperation on a continuum between "going along with" and "taking along with," which commits us to considering the interaction, rather than the individual, as the unit of analysis.

FDF suggest children with autism are involved in cooperation in everyday social interactions. There just *is* a degree of cooperation between parent and child: it might be difficult to get your child through the regimented routine of a school day, but it would be impossible if the child offered no cooperation. Parents and schools typically work hard to scaffold this basic cooperation, e.g., using visual timetables representing each step, or shaping behavior through reward regimes. But these examples of cooperation, which we term "going along with," are asymmetrical, with the child often required to comply with the needs of the adult world, rather than having shared goals. An example at the other end of the continuum lies in therapies such as Intensive Interaction, which focus on an adult following and adapting to the child's actions, in the hope of the child recognizing the therapist's behavior as a response contingent on the child's behavior: the child is given the power of eliciting such responses, "taking the adult along with" them. The fact that these interventions produce "engagement" by observer judgment (Escalona et al., 2002) suggests that cooperation (=taking along with) is possible for many autistic children. Synchronization is primarily therapist-driven, but the direction is determined more by the child.

The facet of cooperation that involves engaging the other in joint action seems minimal or absent, both in experimental studies of cooperation in autism and in descriptions of the everyday behavior of children with autism. We could interpret this as a "deficit" of autism, a lack of spontaneous intrinsic social motivation, unlike typically-developing children, who quickly adopt ideas about what role the other should play in an interaction and enforce even relatively novel norms of behavior (Schmidt and Tomasello, 2012). Cooperative novice-expert interactions are typically smooth, but their mechanics can be revealed by observing breaches, e.g., in still-face paradigms or with participants with autism. This should help in investigating dynamical "taking along with," given that novice-expert pairings work together so invisibly smoothly. Our studies (Holt and Yuill, 2014) of paired children who were *both* on the autism spectrum enabled us to address both "going along with" and "taking along with," given that the children had to collaborate with each other, rather than with a compliant and strongly scaffolding other. We demonstrated contingent action between such pairs using a dual-control game that stalled progress until participants' responses matched. Active other-awareness occurred here, but not in a similar setting without constraints to support contingency. Thus, the children showed collaborative capacity only in environments constrained to support it. Subsequent work-in-progress with touch-technology further clarifies three prerequisites for any collaboration to occur: understanding the activity (i.e., criteria for performing tasks), coordinating action with the partner (going along with), and fostering coordination of the other's activity with one's own (taking along with). For pair success, both children must understand the goal of the activity itself, and at least one child must be able to coordinate his behavior with the partner's, even if the partner cannot reciprocate. Thus, children could successfully play together if just one child could follow or match his behavior to that of a less able child not displaying any contingent behavior. However, a more complex form of collaboration is required if there is a further constraint, that shared solutions be correct. With such a constraint, at least one partner needs to realize this and to bring his partner along to the right solution, necessitating "mutual engagement... in a coordinated effort" (Roschelle and Teasley, 1995, p. 70).

We argue that "going along with"/"taking along with" marks a useful *gradation* in cooperation, encompassing both FDF's "everyday cooperative interaction" and the more structured requirements of lab-based cooperative tasks. A sharp dichotomy between the broader idea of engagement as a prerequisite of cooperation and the narrower focus on agreed, planful, outcome-directed joint working loses the benefit of the enactive approach in uniting literature across paradigms and blurring the classical motivation–cognition divide. We must reconcile top-down theoretical claims about prerequisites of "true" collaboration with questions driven by observation of everyday behavior, in order to consider similarities and differences in cooperative encounters in different groups of participants (e.g., toddlers, people with autism) and to characterize the place, in collaborative activity, of a sense of joint engagement, from second- and third-person perspectives. The debate underlines the need, in studying cooperation, to consider the behavior of both participants in relation to each other; it is the interaction that is cooperative, not only the participants.

# **ACKNOWLEDGMENTS**

I am grateful to members of the ChaTLab for useful suggestions and very stimulating discussions about this response.

# **REFERENCES**

Escalona, A., Field, T., Nadel, J., and Lundy, B. (2002). Brief report: imitation effects on children with autism. *J. Autism Dev. Disord.* 32, 141–144. doi: 10.1023/A:1014896707002


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 September 2014; accepted: 19 October 2014; published online: 05 November 2014.*

*Citation: Yuill N (2014) Going along with or taking along with: a cooperation continuum in autism? Front. Psychol. 5:1266. doi: 10.3389/fpsyg.2014.01266*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Yuill. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Jointly structuring triadic spaces of meaning and action: book sharing from 3 months on

#### *Nicole Rossmanith<sup>1</sup>\*, Alan Costall<sup>1</sup>, Andreas F. Reichelt<sup>2</sup>, Beatriz López<sup>1</sup> and Vasudevi Reddy<sup>1</sup>\**

*<sup>1</sup> Centre for Situated Action and Communication, Department of Psychology, University of Portsmouth, Portsmouth, UK <sup>2</sup> Cognition and Action Lab, Centre for Neuroscience Studies, Queen's University, Kingston, ON, Canada*

## *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Jonathan T. Delafield-Butt, University of Strathclyde, UK; Patricia Zukow-Goldring, University of California, Los Angeles, USA; Kaya De Barbaro, Medical Research Council, UK*

#### *\*Correspondence:*

*Nicole Rossmanith and Vasudevi Reddy, Centre for Situated Action and Communication, Department of Psychology, University of Portsmouth, King Henry Building, King Henry 1st Street, Portsmouth, Hampshire PO1 2DY, UK e-mail: nicole.rossmanith@port.ac.uk; nicole.rossmanith@univie.ac.at; vasu.reddy@port.ac.uk*

This study explores the emergence of triadic interactions through the example of book sharing. As part of a naturalistic study, 10 infants were visited in their homes from 3–12 months. We report that (1) book sharing as a form of infant-caregiver-object interaction occurred from as early as 3 months. Using qualitative video analysis at a micro-level adapting methodologies from conversation and interaction analysis, we demonstrate that caregivers and infants practiced book sharing in a highly co-ordinated way, with caregivers carving out interaction units and shaping actions into action arcs and infants actively participating and co-ordinating their attention between mother and object from the beginning. We also (2) sketch a developmental trajectory of book sharing over the first year and show that the quality and dynamics of book sharing interactions underwent considerable change as the ecological situation was transformed in parallel with the infants' development of attention and motor skills. Social book sharing interactions reached an early peak at 6 months with the infants becoming more active in the coordination of attention between caregiver and book. From 7 to 9 months, the infants shifted their interest largely to solitary object exploration, in parallel with newly emerging postural and object manipulation skills, disrupting the social coordination and the cultural frame of book sharing. In the period from 9 to 12 months, social book interactions resurfaced, as infants began to effectively integrate manual object actions within the socially shared activity. In conclusion, to fully understand the development and qualities of triadic cultural activities such as book sharing, we need to look especially at the hitherto overlooked early period from 4 to 6 months, and investigate how shared spaces of meaning and action are structured together in and through interaction, creating the substrate for continuing cooperation and cultural learning.

**Keywords: infant development, intersubjectivity, triadic interaction, action coordination, joint-attention, participatory sense-making, picture book, longitudinal studies**

# **INTRODUCTION**

How do we arrive at a shared world? We jointly act in, communicate about, transform and co-create our world. In the process, we smoothly navigate and build complex networks of meaning-making involving persons, objects, and symbols. How do children grow in and into culture? How do they become competent participants in cultural practices, in networks of meaning-making including people and artifacts?

Researchers interested in cultural and social learning mostly start looking at the end of the first year, a period often characterized as a major shift, even a revolution, in development ("secondary intersubjectivity," Trevarthen and Hubley, 1978; "9-month revolution," Tomasello, 1999), when infants engage in a number of qualitatively new ways of interacting, such as jointly labeling things, following instructions, imitating acts on objects, or frequent gaze checking with their parents. At this point infants are credited with engaging in true triadic interactions, and are considered capable, for the first time, of coordinating their engagement with objects and their engagement with people. The transition is often seen as the convergence of two lines of development considered to be separate before this point: dyadic infant-caregiver communication and infant-object interaction. This convergence is supposedly mediated by a newly emerging capacity for visual joint attention, which only then gives rise to conventional labeling and language use, conventional object use and symbolic activities in general, often associated with cultural learning. Interestingly, the seminal studies that constitute much of the empirical basis of this developmental narrative (Trevarthen and Hubley, 1978; Hubley and Trevarthen, 1979; Bakeman and Adamson, 1984) document early modes of combined social and object engagement, termed joint praxis and passive joint engagement, respectively. The data reported in these studies actually show a gradual rather than revolutionary shift toward active triadic engagement on the part of the infant. Hubley and Trevarthen describe how caregivers first introduce their own body (games of the person) and later objects (marking and animating them) as a third pole into their social engagement with their infants. Adamson and Bakeman (1984) document how caregivers change their marking of objects over the course of the first year toward more conventional forms.
These data have only very recently begun to be taken up (De Barbaro et al., 2013; Nomikou et al., 2013; see also Moro and Rodríguez, 2004; Zukow-Goldring, 2012). The standard narrative has also recently been challenged by experimental studies documenting aspects of labeling and joint attention in infants as young as 6 months (Striano and Reid, 2009; Bergelson and Swingley, 2012).

Here we take book sharing as a model activity to explore the development of triadic infant-caregiver-object interactions. In a longitudinal study of infants' everyday activities from 3 to 12 months, book sharing turned out to be one of the earliest social interactions involving a complex object, occurring from as early as 3 months.

This early occurrence raises the question: how can infants who are preverbal, do not yet understand the referential character of pictures, and—supposedly—do not have command of joint attention, meaningfully participate in a book sharing activity? As one of the earliest jointly practiced cultural object routines, book sharing provides an excellent model for exploring (1) how a joint object activity is practiced and sustained between asymmetric interaction partners, and (2), since it is an inherently semiotic activity involving the guiding and mutual orienting of attention as well as shared meaning, how triadic interactions involving mutual coordination and orientation toward common points of reference develop over the first year of life.

While there is an extensive literature on picture book sharing, most studies start looking toward the end of the first year (Ninio and Bruner, 1978; Fletcher and Reese, 2005; but see Van Kleeck et al., 1996) and primarily focus on educational achievements associated with the cultural technology of book reading, such as labeling and word learning, picture understanding, and literacy skills.

Here we focus on how the activity of book sharing unfolds and how caregiver, infant, and book respectively guide, sustain, and constrain the unfolding interaction. Taking the interaction as our level of analysis, we draw—in addition to approaches from developmental psychology—on concepts from embodied, situated, dynamical, and enactive cognitive science (Fogel, 1993; Thelen and Smith, 1994; De Jaegher and Di Paolo, 2008), adapt methods from ethnography, conversation analysis, and interaction analysis (e.g., Goodwin, 2000; Alač, 2005; Streeck et al., 2011; Deppermann, 2013), and use qualitative micro-analysis to explore how, from the interplay of multiple modalities, shared spaces of meaning and action are created around objects and change over time.

# **MATERIALS AND METHODS**

The book sharing activities documented in this paper were collected as part of a naturalistic longitudinal study investigating the development of triadic infant-caregiver-object interactions over the first year of life, with a particular focus on conventional practices and encounters with everyday objects. Ten infants were visited in their homes once a month from 3 to 9 months of age, and 7 of them until the age of 12 months. A smaller pilot study with 6 infants at 3, 4, 5, and 9 months of age (3 located in Vienna, 3 in the UK; 4 girls; 4 first-born, 2 of whom were girls) was conducted in advance of the main study.

## **PARTICIPANTS**

Of the 10 families participating in the study, 7 were from the UK and 3 from Austria. They were recruited from a wider circle of friends and family acquaintances, from mother and infant groups, as well as through word of mouth and flyers. All infants were living in middle-class households with two caregivers and were raised in a monolingual (English or German) environment, except one boy raised bilingually in German and Russian. The primary caregivers (mothers in all cases) all had tertiary education and took an active interest in supporting their infants' education. Six of them (all in the UK) returned to either part-time or full-time work during the course of the study. Of the 10 infants, 5 were female and 3 (2 boys and 1 girl) were first-born. None of them had medical or cognitive problems.

## **HOME VISIT OBSERVATION PROCEDURE AND DATA COLLECTION**

A typical home visit lasted 3–4 h, spanning 1–2 sleep-wake cycles of the infants. One or two observers accompanied infants and caregivers with a video camera (Panasonic HC-V500 in iFrame format: 960 × 540 pixel resolution, 25 frames per second), documenting their everyday activities as they unfolded. For static situations a tripod camera mount was used, but in many cases we switched to a handheld camera to capture dynamic scenes, especially after infants became mobile. Field notes were also taken detailing the behavior of the infants, caregivers, and siblings, including object-directed and socially directed behavior, the layout of the environment, and the availability of objects such as toys and tools. In addition, reports from parents were collected, giving additional background information on object use. The study was approved by the Psychology Research Ethics Committee of the University of Portsmouth and was conducted in accordance with the 1964 Declaration of Helsinki and the Code of Human Research Ethics of the BPS. Parents provided written informed consent for the study.

## **DATA MANAGEMENT AND ANALYSIS**

From these raw data (more than 300 h of video recordings), a video library was constructed in Final Cut Pro X (Apple Corporation). Episodes were tagged with keywords organizing activities into basic ecological activity categories, including *(breast) feeding, diaper change, "witnessing," soothing, social and/or object play, book sharing, sibling interaction, watching TV*. In addition, infant-caregiver-object interactions as well as mutual coordination and orientation episodes were marked. For the purposes of this paper, "book sharing" was selected as a model activity for investigating the development of participation in joint cultural activities and the coordination of triadic engagements.

In total, 124 book interaction episodes (excluding 15 infant-researcher interactions) were identified and described. For an episode to be counted as a book interaction, infants needed to be engaged with a book for at least 30 s. If, after a period of disengagement—seen here as an integral part of (especially joint) activities (Stern, 1971; Brazelton et al., 1974; Tronick, 1989; De Jaegher and Di Paolo, 2007)—re-engagement did not occur within 30 s, the book interaction was considered to have ended at the point of disengagement. For all episodes, the actors (infant, mother, father, sibling, etc.), actions and objects used (types of books), as well as the spatial configuration were cataloged.
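The episode-delimiting rule just described can be read as a simple interval-merging procedure. The Python sketch below is illustrative only and rests on several assumptions not stated in the text: engagement has already been coded as (start, end) times in seconds, a gap of exactly 30 s still counts as re-engagement "within 30 s", and the 30 s minimum is measured from first engagement to final disengagement. The function and constant names are hypothetical and are not taken from the study's analysis scripts.

```python
from typing import List, Tuple

# (start_s, end_s) of one continuous stretch of book engagement (hypothetical coding)
Interval = Tuple[float, float]

MAX_GAP_S = 30.0      # re-engagement within this gap continues the same episode
MIN_EPISODE_S = 30.0  # minimum episode length for it to count as a book interaction


def segment_episodes(engaged: List[Interval],
                     max_gap: float = MAX_GAP_S,
                     min_len: float = MIN_EPISODE_S) -> List[Interval]:
    """Merge engaged stretches into episodes, then drop episodes shorter than min_len."""
    episodes: List[Interval] = []
    for start, end in sorted(engaged):
        if episodes and start - episodes[-1][1] <= max_gap:
            # Re-engagement within max_gap: the current episode continues.
            prev_start, prev_end = episodes[-1]
            episodes[-1] = (prev_start, max(prev_end, end))
        else:
            # First stretch, or gap too long: the previous episode ended at its
            # last point of disengagement, so a new episode starts here.
            episodes.append((start, end))
    return [(s, e) for s, e in episodes if e - s >= min_len]


# Hypothetical engagement coding for one recording, for illustration only.
print(segment_episodes([(0, 20), (40, 70), (150, 160), (300, 340)]))
# -> [(0, 70), (300, 340)]
```

Here the 20 s pause between the first two stretches is bridged, whereas the isolated 10 s stretch at 150 s is too short to count on its own.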

We distinguished between two types of book interactions: (1) social book sharing (72 episodes) and (2) solitary book exploration (52 episodes). For a book interaction to count as social book sharing, the participants each had to be engaged with the book (via gaze or other book-oriented actions, e.g., grasping, pointing to, or verbally referencing a page) and to coordinate their engagement, that is, to adjust their behavior in response to and in anticipation of each other's book- or partner-directed actions (Bühler, 1927/2000; Fogel, 1993; De Jaegher and Di Paolo, 2007). For each type of book interaction, the number of occurrences and the duration of the episodes were determined across ages and families, and basic analysis and visualization were performed using Python (NumPy, SciPy, and matplotlib packages, free software).
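As an illustration of the kind of tally described above (episode counts and mean durations per age group and interaction type), the following minimal sketch uses NumPy, one of the packages named in the text. The input layout (one entry per episode), the function name, and the convention of returning `None` as the mean for an empty cell are assumptions made for this example; it is not the study's actual analysis code, and the numbers are invented.

```python
import numpy as np

KINDS = ("social", "solitary")


def summarize_by_age(ages, kinds, durations_s):
    """Return {(age_months, kind): (episode_count, mean_duration_s or None)}."""
    ages = np.asarray(ages)
    kinds = np.asarray(kinds)
    durations_s = np.asarray(durations_s, dtype=float)
    summary = {}
    for age in np.unique(ages):
        for kind in KINDS:
            mask = (ages == age) & (kinds == kind)
            n = int(mask.sum())
            # The mean is undefined when no episode of this type occurred at this age.
            mean = float(durations_s[mask].mean()) if n else None
            summary[(int(age), kind)] = (n, mean)
    return summary


# Hypothetical episodes (age in months, type, duration in s), for illustration only.
example = summarize_by_age(
    ages=[3, 3, 6, 6, 6],
    kinds=["social", "social", "social", "solitary", "social"],
    durations_s=[120, 180, 400, 60, 320],
)
print(example[(6, "social")])  # -> (2, 360.0)
```

Keeping empty cells explicit (count 0, mean `None`) mirrors the situation in the data, where solitary exploration simply does not occur at the youngest ages.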

## **QUALITATIVE MICRO-ANALYSIS OF SELECTED EPISODES**

Of the 72 social book sharing episodes, 20 were selected for further qualitative analysis using the following criteria: (a) only caregiver-infant interactions without siblings, to reduce complexity, (b) sampling of interactions from every age group, and (c) richness of interactions, including attention and action coordination and communication. These selected episodes were transcribed and analyzed drawing on methods from conversation analysis and interaction analysis, adapted to the study of preverbal infants, with a special focus on embodiment and multimodality (Goodwin, 2000; Alač, 2005; Demuth, 2012; Deppermann, 2013). The analysis was performed in ELAN (free software, The Language Archive, Max Planck Institute for Psycholinguistics, Nijmegen; Brugman et al., 2004), with audio pitch and intensity extraction performed in Praat (free software, by Paul Boersma and David Weenink, University of Amsterdam).

The videos were repeatedly viewed and described in an iterative process, looping back and forth between video and transcript (using ELAN), including a gross description and separate tiers for vocalization, audio pitch and intensity, and the action and gaze of caregiver and infant. Thus a multi-tiered, parallel record of each episode was constructed and visualized, similar to a musical score, mapping a range of descriptors to the video stream and relating them to each other in time. Using these visualizations, we analyzed the sequential organization of the actions and how the various strands of an action, spanning multiple modalities, relate to each other and play together in the coordination of action. Transcripts were compared across infants and ages. Some transcripts and video stills from ELAN are also used for illustration.

# **RESULTS AND DISCUSSION**

# **GENERAL RESULTS: POPULATION LEVEL RESULTS, THE "UMWELT" OF THE INFANTS AND THREE BOOK SHARING EXAMPLES**

# *Population level results*

Book sharing was practiced in all 10 participating families (ranging from 2 to 20 episodes per infant). We documented the activity from as early as 3 months (4 families), right from the beginning of the observation period, and by 6 months at the latest for all families. To our knowledge, this is the first time book sharing interactions at this early age have been described in the literature.

Social book sharing provided the context for infants' first encounters with books. Later, in the second half of the first year, they also began to approach and interact with books on their own in solitary book exploration. **Figure 1** (top) shows the number of occurrences of book interaction episodes for all infants observed in the longitudinal study, by age group and type (social or solitary). Note that we include these data to give an overview of the distribution of episodes forming the basis for the qualitative study. Note also the small overall sample size, and that key variables such as the frequency of book sharing offers and the presence and comparability of books in the environment were not controlled for in this naturalistic study, as they would have been in an experimental study. Throughout, we focus on two relatively robust measures to complement the insights about the changing nature of book interactions gained from qualitative analysis: (1) the relative prevalence of social vs. solitary book interactions, and (2) the changes in mean episode duration over the course of the first year. While social book sharing interactions occurred from as early as 3 months, solitary book exploration episodes started to occur at around 6 months, displacing social book sharing as the dominant type of interaction at 8–9 months. From around 10 months on, social book sharing interactions became dominant again until a balance was reached at 12 months. **Figure 1** (middle) shows the mean durations (in seconds) of book sharing episodes for all infants, by age group and type. Starting from around 2.5 min at 3 months, mean durations increased considerably from 4 months, reaching a peak of over 6 min at 6 months. From 7 months on, mean durations decreased sharply, dropping by more than half to around 3 min, and then stayed relatively constant. 
On average, social and solitary book interactions together accounted for between around 1% (at 3 months) and around 5% (at 6 months) of the infants' total recorded waking time each month, as shown in **Figure 1** (bottom). Their distributions largely reflect the overall trend from social to solitary to balanced book interaction, and the reduction in mean episode duration after 6 months.

# *The "Umwelt"<sup>1</sup> of infants at 3–4 months of age*

Before turning to the book sharing interactions in detail, we provide a sketch of the larger context of everyday life with a 3–4-month-old infant, as it presented itself in the study and as it is described in the literature. How do infants engage with their world at 3–4 months, and what does their world look like at this age? At 3 months of age, infants become increasingly interested in their surroundings. They have good control over their gaze (with a well-developed oculomotor system) and increasingly look at and track objects in their environment (Von Hofsten and Rosander, 1997). Apart from that, however, their possibilities for effectively interacting with their world are quite restricted: they are able to hold and move their head, but are not yet able to support their body, turn, or move about. Accordingly, the infants in the study at this age spent a lot of time either in a supine position, lying on their backs, or in a reclined sitting position with their backs supported in a baby rocker. In accordance with their postural capacities, they were able to perform coordinated whole-body movements and to reach toward and begin hitting objects, but were not yet able to effectively grasp, mouth, or manipulate objects (for a review of the developmental trajectories of motor skills, see Adolph and Berger, 2011).

<sup>1</sup>Notion by Jakob von Uexküll (Uexküll, 1921; Uexküll et al., 1956). An interpretation in contemporary terms: *Umwelt* refers to those aspects of the environment an organism can interact with—i.e., effectively perceive, distinguish, and act on (= the sum of prospective functional action-perception cycles)—and which hence constitute the organism's meaningful world. This world is subjective: different organisms/subjects, who have different histories and possibilities of interaction, live in/enact different worlds.

At 3–4 months infants are, however, already fluent conversation partners: by then, they have actively participated in dyadic proto-conversations with their caregivers for several weeks, fully utilizing and practicing all their capacities, including gaze and facial expressions, vocalizations, and rhythmic, coordinated whole-body movements (Trevarthen, 1974; Bateson, 1975, 1979; Snow, 1977; Bullowa, 1979; Masataka, 2003). Not only are they aware of the dialogical, mutual give-and-take character of the interaction—getting upset when the mother's face becomes unresponsive (Tronick et al., 1978) or when confronted with a friendly but non-contingent (playback) response (Murray and Trevarthen, 1985)—but they are also able to regulate their own state of arousal as well as the course of the interaction by turning their gaze and head toward or away from the caregiver (Stern, 1971), and even seem to be able to place their own vocalizations at exactly the right time and pitch in jointly created vocal phrases (Malloch, 2000; Malloch and Trevarthen, 2009).

Infants now take a wider interest in their surroundings (Trevarthen and Hubley, 1978), in tandem with their longer waking and attentional periods, while still lacking the means to pursue these interests actively by exploring or manipulating the world on their own. They thus pose a new set of challenges and opportunities to caregivers. Therefore, at this stage a large part of the caregiving activities observed in the longitudinal study—apart from feeding, diaper change, and putting infants to bed—was keeping infants content and "entertained." The caregivers in the study responded to this challenge both by taking the infant to the world and by bringing the world to the infant. They did the former by taking the infants along with them when doing their daily chores, e.g., placing them in a baby rocker so they had a good view of the activities, regularly addressing them, and bringing household objects or food items to their attention (e.g., rhythmically moving and labeling them) and occasionally also within their reach. They did the latter by presenting, looming, and animating everyday objects as well as specifically designed toys. Caregivers also placed infants in specifically designed environments such as activity mats and baby gyms, where they were able to interact with objects dangling from toy bars. Whereas infants had previously been exposed to only a small range of objects, a whole range of new, manipulable objects now entered their world.

Thus infants were introduced to objects very early, at 3–4 months, in the context of social interactions. This was also the context in which infants first encountered picture books and book sharing, which took two forms: (1) their caregivers directly engaged them with books, often specifically designed for young infants; (2) they took part in the picture book reading activities of older siblings and caregivers.

# *Three examples of early book sharing interactions*

**Figure 2** shows three instances of very early book sharing with 3-month-olds. Example A shows a 3-month-old boy vocalizing toward a black and white high-contrast face pattern in a book specifically designed to engage very young infants, even newborns, by catering to their particular skills, needs, and interests. In the second example, B, a mother is rustling the crinkly pages of a brightly colored book to soothe her crying 3-month-old daughter. As the infant abruptly stops crying, the mother begins to engage her daughter in more conventional book sharing, drawing attention to pictures, turning pages, and inviting participation. Every now and then the infant grasps, holds onto, and crumples the soft pages, producing more crinkling noise. In example C, after page turning has been demonstrated as an action of suspense and release (when a new page is revealed), the book is presented and held in place within the infant's reach. The book, with its rigid pages solidly bound together at one end, provides a stable structure to interact with that is still highly flexible, with easily movable parts along a single degree of freedom. This allows the infant, who is not yet able to grasp objects properly, to turn pages effectively nonetheless, thus exerting control over his sensory stimulation.

These three book sharing episodes are examples of early infant-caregiver-object interactions in everyday life, where the object—the book—plays a central role in the interaction. These books have been specifically designed to meet the infants' needs: their physical properties are adapted to the infants' perceptual capacities (high-contrast patterns, crinkly pages) and serve as a scaffold for their rudimentary motor skills (rigid pages). In contrast to conventional books, this design emphasizes effective interaction with the medium, that is, with the physical properties of the book and the pragmatic actions performed on it. The specifically designed books serve as a bridge between the capacities and needs of infant and caregiver, as well as between caregiving and the cultural practice of reading. Indeed, in all three examples specific material aspects of the book also capture and afford some of the general, mainly pragmatic aspects of conventional book reading: the format of the book itself is present, as is the format of the activity, which has a definite beginning and end corresponding to working through a book from cover to cover, as well as the activity of page turning. Moreover, already at 3–4 months, infants regularly experienced episodes involving the full range of book sharing activities typical of older children, including more conventional, complex, and semiotic aspects such as pointing, content labeling, and reading and narration (Fletcher and Reese, 2005), as will be discussed in more detail in the next section.

**FIGURE 2 | Three examples of book sharing with books specifically designed for young infants. (A)** Visually engaging a 3-month-old with high contrast patterns. **(B)** Soothing a 3-month-old with crinkly pages. **(C)** Scaffolding a 3.5-month-old's motor skills with rigid pages.

# **EARLY OCCURRENCE OF SMOOTHLY COORDINATED BOOK SHARING INTERACTIONS AT 3–4 MONTHS OF AGE**

Given young infants' inability to interact with objects on their own (in contrast to their active role in proto-conversations) and the widely held theoretical view that they are not yet able to coordinate their engagement between people and objects (Hubley and Trevarthen, 1979; Bakeman and Adamson, 1984; Carpenter et al., 1998; Tomasello et al., 2005), the question now arises: how do book sharing interactions work at a micro-level, and how do they unfold over time? How are they initiated and sustained, and what are the respective roles of the participants?

## *The contribution of the caregivers: establishing contact, carving out interaction building blocks, patterning and shaping actions*

*Establishing contact.* As shown above, caregivers were instrumental in introducing objects to very young infants, who are as yet unable to approach or handle them on their own. Caregivers often took their cue from the infants' behavior: either following up on infants' gaze or action impulses or, conversely, trying to divert them from their current state (e.g., pain), caregivers moved to establish contact between the infant and an object, in order to engage with it and build up a shared activity around it.

In the example shown in **Figure 3**, the mother visually presents a book to her 4-month-old son, who is sitting between her legs leaning against her, and puts it within his reach. She starts with a sharp intake of breath indicating surprise (".h") (Zukow, 1982), then, pointing dynamically by moving her left index finger up and down over the pictures on the book cover, follows this up with "Look at the cats," while the infant looks at the book continuously. (For transcription conventions, see the glossary.)

As shown in this example, establishing contact between infant and object often involved visual presentation, ranging from static "offering," placing an object in the infant's view and reach, to more dynamic actions including "animating" the object, such as moving it to and fro, looming, or acting on the object. In the case of books, which mothers seldom animated, this prominently included performing dynamic pointing gestures, as in the example above. In addition, caregivers produced a number of different vocalizations, ranging from general and unspecific exclamations of surprise (".h"), via imperatives ("Look!") and questions ("What's that?"), to specific labels for objects or object parts ("a book!") and for content such as pictures ("an elephant!"). Among these, the most frequently used in the dataset was a sharp intake of breath indicating surprise (".h") combined with raised eyebrows, wide eyes, and an open mouth.

Functionally speaking, caregivers are doing two things at once. First, they are capturing and directing the infant's attention, often utilizing the auditory domain to highlight and mark the visual presentation of an object. Second, they are making an object available for the infant to interact with "as a unit," in this case the book itself or one of its parts. Such actions actively foreground—or even create—the object for the infant to interact with "as a unit" by "carving it out" against the background and against the various other ways of parsing a scene (compare Zukow-Goldring's notion of "educating attention"; Zukow-Goldring, 1997, 2006, 2012).

**FIGURE 3 | Mother multimodally presenting a book, holding it within reach of her infant: Introducing the book to the infant (A), marking the animals on the title page by dynamical pointing and vocal labeling (B–D), opening the book with the infant attending (E,F), more dynamical pointing drawing the infant's attention (G,H), who subsequently acts on the book (I,J).** Below the camera stills, an ELAN analysis detail documents, from top to bottom: audio traces (pitch in red and intensity in green), and annotation tiers. Tier label abbreviations used (from top to bottom): mothervoc: mother vocalizations, motheract: mother (manual) actions, babyact: infant actions, babygaze: infant gaze, and babyvoc: infant vocalizations. r: right, l: left.

Thus guiding the infant's attention and foregrounding or "carving out" "building blocks" to interact with are two partly overlapping processes. They often involve performing a variety of activities composed of various strands of action, which appeal to one or another of the infant's modalities and which can be used either (a) in close succession or (b) simultaneously, layered on top of one another into a complex multimodal action. It is especially this multimodal structure of the activity, in particular the invariant relations across modalities, that provides infants with opportunities to extract coherent perception and action units (Zukow-Goldring, 1997; Bahrick and Lickliter, 2012).

*Carving out interaction building blocks and embodying meaning.* Book sharing, with its wide range of semiotically rich materials (physical spine-and-page structure, pictures, spoken words, printed text, rhymes, narratives, and referential acts), is mostly about learning about, sharing, and negotiating "units" or "building blocks" to interact with, which form the public cultural interaction space. That is, these book-related actions are very similar to the "guiding of attention and making objects available for interaction" described above, except that many of the "units" forming the cultural interaction space are more abstract and not directly graspable. Children become familiar with those "units," with how they relate to each other (pictures to pictures, words to words, pictures to words) and how all of these potentially map onto actions and relations in the world outside, and, above all, with how to jointly manipulate and act upon them.

So how is book sharing practiced with an infant, who is preverbal, does not yet understand the referential character of pictures (DeLoache et al., 2003) and—supposedly—does not have command of joint attention either? While, as described above, the books designed for infants highlight particular physical properties adapted to their sensorimotor needs and interests, book sharing even at an early age is not at all restricted to interacting with an "interesting stimulus" or "object for manipulation." Instead, young infants already encounter the whole range of book sharing actions.

In **Figure 3** the mother is sitting on the floor supporting her 4-month-old infant boy between her outstretched legs. Throughout, she closely follows the prototypical book sharing protocol: reading out rhymed text, accompanied by additional pointing and labeling, as well as making comments relating the story to the infant's life. For his part, the infant looks intently at the pictures, his gaze drawn by dynamic pointing, and from time to time acts on the book, either by banging or by grasping the pages, actions that are transformed into page turning with his mother's support.

Neither is the infant in this interaction merely exposed to an arbitrary set of interesting stimuli and action affordances, nor does the mother blindly follow the cultural conventions. Rather, at key points in the activity, the mother is making selected parts and aspects of content and the overarching narrative accessible to the infant, making them meaningful to him through embodying and enacting them and giving them patterns of affective salience and arousal.

**Figure 4** shows the mother making characteristic animal actions "come alive" and accessible to her 4-month-old son by enacting the essence of "leaping" and "jumping" (a rising motion) through a rising intonation contour: "This is the speedy kangaroo, she jumps and she LEAPS," "here's a smooth gray dolphin jumping in the Air."

Whereas in the above example the enactment takes place solely within the action medium of speech—typically utilized in picture book sharing—there are also much more extensive and thorough forms of enactment and embodiment.

In **Figure 5** the mother tells her son, by now 5 months old, about baby Humphrey having "a BI::g YA:::wn and a STREtch, going 'UAAAHHH.'" First, she again utilizes prosody, drawing out the words "BI::g YA:::wn," thus temporally expressing the extension of "bigness" while already enacting the yawn. But then, as the text itself goes on to illustrate the yawn onomatopoeically ("going UAAAHHH"), she adds another layer: turning to the infant, grasping first one hand and then the other, and gently pulling them into a stretch while performing the yawn, she embodies and enacts the meaning directly with the baby's body.

**FIGURE 4 | ELAN analysis detail showing pitch (red) and intensity (green) curves.** The mother is reading a picture book about animal actions to her 4-month-old son enacting the essence of "leaping" and "jumping" (a rising motion) through a rising intonation contour (highlighted).

In this case, expressing "meaning" is no longer simply "talking about" or "depicting" something, but rather encompasses fully realizing the action itself. In this special case, however, the action of yawning and stretching referenced in the book is happening in a different context than it usually would (i.e., when the infant is tired or being put to bed); rather, this context is created and defined by the book. And as the mother gently acts on her infant's body, taking him through the motions of stretching while performing the yawn, mother and infant closely share the meaning and the action, in the sense of taking part in and realizing it together (Alač, 2005; Zukow-Goldring, 2006, 2012; Zukow-Goldring and Arbib, 2007).

*Patterning actions and shaping actions into action arcs.* Describing how objects or rather "units for interaction" are carved out to form the building blocks of a shared meaning and action space covers only one aspect of how such a space is created. This section will explore how the actions the partners perform are themselves structured in the course of interaction, highlighting the dynamic form of the jointly structured interaction space.

Two aspects of the "structuring of actions" can be distinguished. The first is the temporal patterning, punctuation, and "chunking" of actions, which also leads to the creation of "events" in the flow of action (Nomikou and Rohlfing, 2011). Examples include the rhythmic multimodal performance of a monkey noise ("Ooh-Ooh-Ooh-Ooh-Ooh"), the marking and highlighting of action parts by exclamations (".h!," "Look!"), the labeling of action parts ("now we TURN the page"), and direct invitations ("Can you turn the page?"). Second, beyond patterning and chunking, caregivers structure actions by continually shaping parts of activities into bigger or smaller dynamic "action arcs" with a beginning, build-up, climax, and resolution (compare Brazelton et al., 1974; and the notions of "vitality contour," Stern, 2010, and "narrative" or "shared project," Delafield-Butt and Gangopadhyay, 2013; Trevarthen and Delafield-Butt, 2013).

To illustrate this, we will look at the example of page turning (**Figure 6**). The mother sets the stage by drawing attention through the surprise exclamation ".h!" and announcing the action of page turning with the question "What's on the next page?" Then she starts developing the action arc: leaning forward and repeating the question, followed by two more ".h!" surprise exclamations of increasing intensity and pitch, she builds up tension, which is mirrored in the growing arousal of the infant, as indicated by the infant's increasing movement, body tension, and facial expression, culminating in her mouth dropping open and a sharp intake of breath just before the climax. After a short hesitation, which draws out the tension still further, a sudden quick page turn releases the tension, and the arc levels off and comes to a close in a soft, whispered "There we go," coinciding with the infant relaxing and closing her mouth again.

This shaping of action arcs is found across all kinds of actions and at different levels and multiple timescales within an activity, nested into one another. At a high level, the activity of book sharing as a whole can be considered an "overarching" action arc structure, defined by the physical arrangement of the pages to be turned from cover to cover as well as by the organization of the narrative. A smaller-scale action arc is defined by each double page, the unit visible at a given time, which is often structured by a (rhyming) pair of lines: the first ends in a slight rise and continues in one breath (enjambment) into the second, which comes to a close in a fall in pitch and intensity. At the basic level, action arcs recur with any interaction unit, be it the turning of the page itself (a literal rise and fall), the labeling of a picture, the posing of a question, etc. Relevant words were typically placed at the peak of an action arc, and infants often looked at the caregiver's face at the peak of an action arc, as well as in a pause after an action arc's closure.

## *What about the role of the infant?*

To what extent do infants actively participate in early book sharing interactions?

As briefly discussed above, it was often the infants' behavior that prompted the caregiver to introduce an object into the interaction, which, if the infant allowed him- or herself to be engaged, then led to a shared object activity. Such "active interest," that is, staying content and maintaining attention on the activity, might already be considered a form of "active participation." Although at this age attention could easily be captured, especially by moving stimuli, and also wandered away from time to time, infants were already able, to some extent, to control their gaze actively and hence their engagements. That the shared activity indeed requires an active contribution on the part of the infant became evident in cases where infants withheld participation, which happened not only when they got fussy but also when they lost interest and kept looking away; in such cases there simply was no shared activity.

When successfully engaged, infants typically were alert and showed "serious intent," with knit brows and widely opened eyes: the type of engagement Piaget (1962) described for the adaptive mode of being absorbed in object exploration and letting oneself be "informed" by it. Thus, at least for the youngest infants in the study, this shared activity looked somewhat different from other social interactions (e.g., social games) of the same infants at the same age, in which more explicit expressions of joy such as laughter were observed.

However, even though not a single case of laughter in relation to a book was observed before 6 months, there was some affective communication going on in book sharing at this age: besides serious intent, a neutral expression, and occasional cases of overall fussiness, there were several instances of infants and caregivers engaging in a mutually attuned build-up of arousal in which infants showed great excitement through their bodily movements (e.g., the example of page turning discussed above, see **Figure 6**). Later, from around 6 months, laughter and a whole range of facial expressions were observed in an intricate emotional interplay going on between book or story, mother and infant (see Section "Ecologies in transformation").

While caregivers significantly shape book sharing activities with 3–4-month-old infants by guiding attention and by inviting and scaffolding actions, infants actively participate by showing "active interest" and being responsive: they are amenable to their caregivers' lead, let their attention and actions be guided, and readily accept the caregivers' invitations to engage with the objects offered (De Barbaro et al., 2013).

Young infants also showed active participation in a more conventional sense through their active movements, especially manual object manipulation, insofar as it lay within their range of action. Whenever possible, the caregiver interpreted such actions (e.g., getting hold of the edge of a page) in terms of the culturally established book sharing framework ("Do you want to hold the book?," "Can you turn the page?") and shaped them into the frame of the book sharing activity as far as possible. These actions, however, also sometimes got in the way of the activity, especially when they could not be made to fit the book sharing frame, as when infants would not let go of a page and their own actions became their primary focus of attention (see Section "Ecologies in transformation").

## *The interaction unfolding in the interplay between infant, caregiver, and object*

After discussing the roles of mother and infant separately, let us now look at one example in more detail in order to see how infant, caregiver, and artifact come together and how an interaction arises out of this interplay.

In this 13 s sequence (see **Figure 7**) the mother is sitting on the couch with her 4-month-old boy sitting on her knee, facing away from her. Both are looking at an open picture book featuring brightly colored cat pictures and "touchy-feely" textures, which the mother is holding in front of the infant. The sequence begins with the mother rhythmically reading out a line in verse, "I love THIS friendly kitten with the VE:::Lvety so::ft NO::::::::se," thus turning it into a two-arc structure. The first arc is dominated by the deictic "THIS," which, with a sudden increase in intensity and a slight ascent in pitch, stands out as a single accentuated peak (accompanied by a slight movement of the left thumb). Thereupon the infant focuses more closely on the left page of the book. The second arc is more pronounced, with a gradual rise in pitch peaking in "VEL-vety," followed by a slow fall in pitch and a gradual decrease in the intensity of the mother's vocalizing, during which she turns her head toward the infant. After his mother's turn toward his face, just as she arrives at the end of an elongated, soft "NO::::::::se" forming the coda of the action arc, the infant turns his head and raises his gaze toward his mother's face. As his gaze arrives at her face with a slight delay, her gaze has already moved on to the next page, where her right index finger is now performing a dynamic pointing gesture, moving up and down on the velvety textured nose, and the infant's eyes follow there soon after.

A sustained social interaction is going on, revolving around an object. Both mother and infant, acting as autonomous agents, co-regulate each other and the activity, while also being shaped by the object and the cultural activity frame, in ways that sustain the interaction itself (in the sense of De Jaegher and Di Paolo, 2007). The interaction is asymmetric, with the infant's attention and gaze responding to and following the mother's (object-related) actions, and the mother guiding the interaction, checking back with the infant, and adapting her actions to the infant's response. The interplay of actions has an overall smooth and orderly quality, even though the infant is lagging slightly behind in time; still, the order of events in the activity is retained and meaningful for the participants, as the actions of each effectively serve as an affordance for the other's next action (Zukow-Goldring, 2006, 2012; Rączaszek-Leonardi et al., 2013). The infant's actions are also recognizable to the mother as turns in the context of a (culturally structured) conversation (Schegloff, 2007). The mother interprets and shapes the spontaneous behaviors of the infant to fit the cultural frame.

Like the earlier example interaction involving page turning (see **Figure 6**), this interaction is organized into action arcs, again clearly illustrated by the intonation curve (pitch and intensity). The relevant deictic "THIS" is placed at the peak of the arc; the infant shifts his gaze at that peak, as well as in the pause after the closure of the arc after "NO:::::::::se." It is well known from the literature on infant-directed speech that the rise in pitch approaching the peak of the arc makes it more likely that infants shift their gaze, and that it is often used as an invitation for turn-taking (Ryan, 1978; Stern et al., 1982; Ferrier, 1985; Papoušek et al., 1991). As infants and caregivers repeatedly move through action arcs together, they co-regulate and share arousal and excitement, and act out and experience the structure, shape, and dynamics of actions together.

## **ECOLOGIES IN TRANSFORMATION: SKETCHING A DEVELOPMENTAL TRAJECTORY OF BOOK SHARING OVER THE FIRST YEAR**

Over the first year, the quality and dynamics of book sharing interactions underwent considerable change in tandem with motor development, amounting to transformations of the whole ecological setting, including spatial configurations, the strategies and behavior of the caregivers, and the objects used. Some aspects of these changes have already been described in the first section, as they became manifest in gross measurements at the population level: book sharing episode durations increased slightly until 6 months, then declined sharply at 7 months. From around 6 months on, solitary interactions emerged and became the dominant type of book interaction at 8 months, until social book sharing took over again at 10 months, finally reaching a balance at 12 months (see **Figure 1**). These results closely match a series of qualitative changes observed in the course of the longitudinal study. This section sketches a developmental trajectory of book sharing over the first year based on these changes. For this purpose, the data samples are pooled into four age groups in accordance with the newly observed interaction qualities in each period:


- 3–4 months
- 5–6 months
- 6–9 months
- 9–12 months: […] become increasingly conventional, with the socially shared activity.

Each sub-section begins with a description of the newly observed interaction qualities in terms of the infant's activities as well as the overall ecological setting. Selected example episodes are then described and analyzed in more detail to explore and discuss attention and action coordination processes. For an overview of the changing characteristics of book sharing over the first year of life see **Figure 12**.

## *5–6 months: an early peak in social book sharing interactions*

From 5–6 months, the 2 months immediately following the early phase described in the previous sections, book sharing activities became richer, smoother, and more sophisticated, in parallel with the infants' developing motor and attention skills and the increasing routine and attunement between the partners. During active participation, infants used manual manipulation more extensively, showed improved aim when grasping pages, and flipped pages more fluently. The repertoire of book interactions was extended by newly emerging actions: motor schemes such as banging, rapid opening and closing of the fingers ("scratching") on the surface of the pages, and mouthing objects (which also began to have slightly disruptive effects on the otherwise smooth interaction). Still, these actions were largely shaped into the cultural frame by caregivers. Infants now coordinated and switched attention between object and caregiver with less effort: they followed the caregiver's lead more fluently, with faster, better-aimed gaze shifts from the object to the caregiver's hands or face (following his or her voice), and then looked back to the book again spontaneously, without necessarily being prompted by local, dynamical events created by the caregiver (see **Figures 8**, **9** below).

In accordance with infants' improving postural control and new ability to maintain a sitting position with only slight support, spatial configurations with the interaction partners facing each other at a 90° angle became more frequent. At the same time, mothers less frequently acted on the infants' bodies (putting them through the motions of a specific action); rather, mothers used their own body and voice, especially their hands, to enact meaning and perform lively visual demonstrations (including first uses of baby signs). In line with the increasing frequency and skill of infants' object manipulations, books with touchy-feely textures and attached graspable objects became prominent, as did books made of real paper with an audio-haptic crinkle.

**Figures 8**, **9** illustrate the new quality and range of book sharing interactions at 5 and especially 6 months, with a focus on the coordination of attention and action.

In the first example (see **Figure 8**) the mother is sitting cross-legged on the couch with her 6-month-old daughter placed at a 90° angle in the hollow formed by the mother's left leg, her back supported by the mother's left thigh and a sofa cushion. They are both facing a small square paperback "Mr. Men and Miss Little" book with thin paper pages, which the mother is holding. Immediately after a sharp rise in the intonation curve ("er ist SO::stark" ["he is SO::strong"]), the infant turns her gaze upward toward her mother's face; the mother in turn responds with an eye-greeting and a more pronounced facial expression and affective intonation. They share and reinforce each other's expression of surprise and amazement in voice and facial expression before first the infant and then the mother turn their gaze back to the book again.

In the second example (see **Figure 9**), the mother and her 6-month-old son, sitting on her lap at a 90° angle, are sharing a book about animal noises and have just arrived at the last page. After setting the scene with "Who's your favorite?" the mother starts moving her right hand, fingertips pressed together, in a curve through the air toward the infant, accompanied by "a bzzzzzz bzzzzy bee," with her eyes fixed on the infant, who is still involved with the book, his left hand reaching for and touching the animal picture in the upper right corner of the right page. When the mother's hand finally touches the infant's belly, he turns his gaze and head to her hand and begins tracking it as she starts moving it, fingers joined side by side, in up-and-down waves, acting out ". . . or a ssSSSSSSSSSssssssssssssnake." As the mother concludes her enactment of the snake, the infant looks up, first at the mother's mouth and then at her eyes, beginning to smile. He then turns his gaze back to the book, his smile broadening, and the mother shortly follows, returning her gaze to the book.

*Infants' attention coordination becoming more fluent and guided by routine.* In both examples the infant is responding to an aspect of the mother's behavior related to the book, e.g., the intonation curve going up as part of the mother's interpretation of the narrative. In the previous example at 4 months (**Figure 7**), the infant was responding to and following the mother's salient actions but kept lagging slightly behind, so that the mother's gaze had already moved back to the book by the time the infant had shifted his gaze to his mother's face. In contrast, this time the eyes of mother and infant meet, facilitated by the 90° configuration and the infant's more fluent movement. The infant thus elicits a communicative exchange of affect, including mutual acknowledgement and reinforcement. Also in contrast to the previous interactions, in both these cases it is now the infant who first turns his/her gaze back to the book, before the mother does. . . .

While infants, despite their growing motor skills, are still unable to move autonomously in and explore the world of objects, they now turn their gaze and head more fluently from the book to the caregiver's hand or face and back again. They do so spontaneously, without necessarily being cued by dynamical movements but arguably guided by routine, at times even arriving back at the book first and thus taking the lead in coordinating attention. Within these interactions, then, infants demonstrate a basic understanding of the activity as shared and of the spatiotemporal structure and format of the book sharing activity at hand. The examples at 6 months also invite us to consider how small changes in the temporal dynamics of the interaction can lead to profound qualitative shifts: infants' more fluent gaze coordination enables episodes of affective communicative exchange, and thus increases the infants' ability to effectively shape the interaction dynamics of the book sharing activity.

*Interspersed affective communicative exchanges related to the book.* Whereas at 3–4 months infants showed "serious intent" when engaging in book sharing interactions, they now show pronounced affective exchanges along with these novel communicative exchanges.

While the mother narrates the story, within a short span of 5 min the infant displays and moves through a whole range of emotions in rapid succession, in concordance with the mother's tone of voice, gestures, and movements: from surprise and amazement to amusement, and from being "staggered" to concern and sadness (see **Figure 10**). The emotions build up and develop in the flow of the interaction. In response to the mother's voice and actions, the infant looks up to her face with an expression of surprise. For example, when this happens after an abrupt rise in pitch contour in "SO::strong," the mother takes up her daughter's expression and responds to it with widely opened eyes, raised eyebrows, and a sharp intake of breath indicating surprise (.h). She then repeats the passage that had drawn her daughter's attention to her, "SO::strong," again with an exaggerated pitch contour, reinforcing and further shaping her daughter's emotion, so that mother and daughter acknowledge and reinforce each other (compare Stern, 1985; Jensen, 2014, this issue).

Mother and infant were thus moving through the emotions together, without either seeming to be seriously upset or sad. Importantly, these communicative exchanges are situated in the book sharing context, immediately following and leading back into attentional engagement with the book. Thus, the exchange of emotions appears clearly linked to the book, and may even constitute a joint relating to, and negotiating "about," the book (see general discussion below).

## *6–9 months: shifting attention to object exploration*

During the next few months, however, roughly between 6 and 9 months of age, the interaction dynamics of infant-caregiver-object interactions underwent a significant transformation, and the developmental trajectory took a sharp turn: infant-object-caregiver interactions decreased in number relative to solitary book exploration, and book sharing interactions decreased considerably in duration and appeared generally less smooth than in the period before, despite the infants' further developing capacity to sustain attention (see **Figures 11B,E**).

These changes occurred in a period when the infants' developing strength and postural control allowed them to adopt and maintain a stable sitting position for longer periods of time, enabling them to reach, grasp, and bimanually manipulate objects without falling over. Also, many infants at this age started locomoting by rolling and ("army") crawling, and actively initiated interactions in a clearly visible way. The 7-month-old girl in **Figure 11A**, for example, noticing a book sharing interaction taking place between her mother and sister, glances over her shoulder, rolls over from back to belly, and crawls across the room toward the book (still held by her mother but by now abandoned by her older sibling), thereby prompting her mother, albeit without explicit social signals, to start a book sharing interaction. Infants were also better able to focus and maintain their attention: see the 6-month-old boy in **Figure 11B** intently watching his mother stroking a texture and closing in to see better. However, they were also more likely to terminate interactions quickly, as their newly developed autonomous object exploration and locomotion drew them into new attentional engagements. In **Figure 11C** the same 6-month-old, after sitting back up again, accidentally touches a toy ring, subsequently grasps it and, with his eyes still on the book, brings it to his mouth, at which point his gaze is finally drawn away from the book and he becomes preoccupied with exploring the ring, bringing the book sharing activity to a halt.

**FIGURE 10 | Book sharing interaction with 6-month-old infant sitting at a 90° angle on mother's lap including extensive voice and hand acting.** Still images showing sequence of emotional exchanges: going in rapid succession and hand in hand with the mother's tone of voice and movement when narrating the story, the infant moves from surprise and amazement to amusement, and from being staggered to concern and sadness.

In this period, facilitated by the now stable sitting posture, infants at times became deeply involved with objects, e.g., banging, mouthing, and manipulating books or other objects in solitary play, to the extent of seemingly ignoring people: immediately after escaping from a book sharing interaction after barely 2 min, the boy in **Figure 11E** engages in manipulating a single object for nearly 6 min without interruption. Infants did, however, from time to time look up at people's faces, e.g., when introduced to an object, or in what might be early forms of instrumental looking: after having pushed a book out of reach, a 6-month-old girl lying on her belly turned her head up to her mother's face and vocalized.

These changes were also reflected in the caregivers' behavior: they were now often content to leave the infants to their solitary play. When they did try to engage them in book sharing, their efforts to direct attention became more vigorous: for example, they called their infant's name repeatedly with increasing intensity to get the infant's attention, and resorted to acting on the infant's body again, but now in an exaggerated fashion to keep the infant entertained. Caregivers also adapted by changing the situational context: for example, they tried to engage infants in book sharing interactions before bedtime, when infants were already tired, or changed the spatial configuration by placing infants on their lap, thereby actively constraining their action possibilities.

Books chosen by caregivers during this period had more interactive elements: in addition to touchy-feely textures, flaps, and small graspable objects, they now included buttons producing various animal noises, and movable parts set on massive plastic pages that set off blinking lights and nursery rhymes when operated correctly (**Figure 11D**). Such books are thus designed to invite manual exploration and multimodal interaction, drawing in infants who are now able to approach and engage with books on their own. For their part, caregivers included these highly salient object interaction opportunities in their social interactions to make them more interesting to their infants again, with mixed results (**Figure 11E**).

## *9–12 months: putting books, caregivers and world back together*

At 9–12 months, infants continued to engage in many solitary book interactions. In contrast to the previous months, however, when they had primarily been exercising various motor schemes (banging, scratching, and mouthing the book) as well as bimanually exploring books, they now started showing many more behaviors associated with conventional book interactions, such as sitting still and looking at the pictures, turning pages, opening flaps, pointing at pictures, touching textures, and vocalizing.

Also in contrast to the previous period, the proportion of social book sharing episodes relative to solitary ones increased again. Both solitary and social book interactions varied considerably in duration. Although the majority of the interactions were short, at times infants engaged in book interactions for extended periods lasting up to 7 min, and chained several episodes together into much longer book activities. For example, they would ask for another round of looking at a specific book several times in a row or, according to the mothers' reports, entertain themselves during car journeys by looking at books and turning pages for extended periods of time.

**FIGURE 11 | (A)** 7-month-old infant initiating book sharing by crawling toward the book. **(B)** 6-month-old, sitting freely, focusing on mother's dynamical pointing and further closing in. **(C)** The 6-month-old in the same interaction getting distracted after accidentally touching and subsequently grasping and mouthing a toy ring. **(D)** 7-month-old absorbed in solitary play: correctly operating an interactive device, resulting in music and blinking lights. **(E)** 9-month-olds escaping from the book sharing activity despite their mother's attempts to engage them. **(F)** 11-month-old proactively performing appropriate actions for "Pat the bunny": putting his finger through the ring, sharing affect with his mother while making dolly's ball squeak by banging on it, and "waving bye-bye" directed at the researcher, thus connecting the book sharing context with the visitor context. **(G)** Mother naming, pointing at, and signing "bird"; infant turning head and looking out of the window while the mother is still involved with the book, before the mother turns her head, recounting how they saw a bird out there the day before.

Book sharing episodes, even short ones, encompassed an increased number of action turns and showed a new quality and a larger degree of integration: between interactions with the caregiver and with objects, between book and world, and across time and space. Infants now more actively integrated manual object actions into their social engagements (e.g., approaching the mother with a book, laughing) and, when engaged with objects, integrated social interactions (pointing, requests, etc.), which may or may not have included gaze alternations. Moreover, they were now actively bidding for and directing others' attention.

Infants now moved proactively within the spatiotemporal attention-action framework of an activity: they spontaneously performed appropriate actions in a specific context independent of temporal order, e.g., performing an action corresponding to a specific book page ("pat the bunny," "put the finger through mommy's ring," "wave goodbye"; see **Figure 11F**), and were also able to anticipate what came next. The infants' actions extended much further over space and time, between the book and the world, while still being part of and coming back to the shared activity. For example, one boy interrupted his immediate engagement with the book, ran off, found the object depicted in the picture book, and returned to mother and book. Similarly, when the mother in **Figure 11G** points out and signs "bird," referring to the picture in the book, the infant turns and looks out of the window. Not realizing this, the mother first finishes her signing and then turns to look at the window herself, recounting how they had encountered a bird there on the previous day.

## **CONCLUSIONS, GENERAL DISCUSSION, AND OUTLOOK**

Our 3 main findings were:


## *Development of triadic interactions*

With regard to various theoretical accounts concerning the development of triadic interactions our observations suggest that:

Interactions with objects and interactions with people are not separate during the first year, as is often suggested in the literature (Bakeman and Adamson, 1984; Tomasello et al., 2005). On the contrary, at around 3 months, when infants' interests start to reach beyond the dyad but they still lack the means to interact effectively with the material world on their own, objects are introduced by their caregivers in the context of social interactions.

Instead of a late, sudden appearance of triadic interactions at the end of the first year, we report a much more gradual development (compare Striano and Reid, 2009; De Barbaro et al., 2013), albeit one following a non-linear trajectory, characterized by an apparent dip after around 6 months followed by a recovery starting from 9 months; this dip would also explain why the earlier interactions have been largely overlooked in the literature.

The qualitative changes in the period between 9 and 12 months need a more differentiated conceptual framework, as many of the criteria for triadicity (active contribution of the infant, coordination of attention and action between caregiver and object, etc.) already seem to be met by earlier interactions. Key notions need to be clarified and re-conceptualized, including the nature of the infant's active contribution, infants' coordination of attention/orientation actions in relation to their coordination of manual actions, and, in particular, the concept of joint attention.

*3–4 months.* At 3–4 months the infants showed active interest in the activity. They were responsive, amenable to and following the caregiver's lead, and effectively coordinated their engagement between caregiver and object; their attention was drawn by local dynamical cues created by the caregiver (though they followed with a slight delay), and their (rudimentary) manual actions were shaped into cultural frames by the caregiver. Thus the interaction was coordinated but asymmetric, smooth and orderly but slightly offset (see **Figure 12**).

Accounts of infants' (lack of) triadic behavior at this early age do not begin to capture the intricacies revealed through the qualitative micro-analysis. For example, in Bakeman and Adamson's (1984) notion of passive joint engagement, the caregiver establishes and sustains the (passive) triadic interaction essentially all by herself. By turning to whatever the infant is engaged with, or by directing the infant's attention to a specific target, she ensures that infant and caregiver are "actively involved in the same object, but the baby evidences little awareness of the other's involvement or even presence" (p. 1281). In early book sharing, however, the infants were clearly not oblivious to the caregivers' presence, as evidenced by, e.g., their regular gaze shifts between caregiver and object, drawn by the caregiver's voice and movements. Rather, early book sharing already comes close to their description of *coordinated joint engagement*, characterized by the infant being "actively involved with and coordinating his or her attention to both another person and the object that person is involved with."

While it is debatable whether the responsive nature of 3–4-month-old infants' engagement fully meets this set of criteria, which was introduced to describe the behavior of infants 9 months and older, by 5–6 months infants' active involvement was pronounced, especially with respect to their attention coordination.

*5–6 months.* At 5–6 months infants coordinated their engagement between caregiver and object more fluently, and shifted their gaze back to the book by themselves without needing a prompt, arguably guided by routine. Their gaze often arrived back at the book first, so that at times they led the interaction. As faster gaze shifts led to meeting the caregiver's eyes, infants now entered into affective exchanges and sequentially coordinated these exchanges with periods of shared object involvement. Despite their improved motor skills, infants were still unable to move in and explore the world of objects on their own. In book sharing, their range of manual contributions expanded, including both helpful and disruptive actions, which were still mostly shaped into the cultural frame by their caregivers. Thus the interaction was coordinated and more symmetric with regard to attention, but asymmetric in terms of action, and overall orderly and fluent (see **Figure 12**).

Due to the interspersed affective exchanges, the interaction already resembles Hubley and Trevarthen's concept of *secondary intersubjectivity*, characterized by integrating "acts of joint praxis" around objects with "interpersonal communicative acts" (Hubley and Trevarthen, 1979). On the other hand, infants may not yet show enough manual object actions, and alternating back and forth sequentially between shared book involvement and communicative affective exchanges (see **Figure 9**) may not be "integrated" enough to match criteria that, again, were set to describe the behavior of infants around 9 months and older.

**FIGURE 12 | Ecologies in transformation.** The table gives an overview of book sharing as it changes over the first year. The columns list relevant characteristics for the respective participants: infant (inf): motor skills and book sharing actions, sorted into attentional, manual, and affective; caregiver (cg): book sharing actions in terms of function and the modalities they are implemented in; books: type of book used; and, for the interaction as a whole, the spatial configuration of the participants and the quality of the resulting interaction. The rows list the pooled age groups (3–4, 5–6, 6–9, 9–12 months).

Whatever the verdict on its "triadic" status, this alternation between engagements may constitute a basic form of "joint aboutness" (jointly communicating about something), which plays an important role in secondary intersubjectivity. It is also reminiscent of a crucial notion in Carpenter and Liebal's account of joint attention: one of its central features, "knowledge of knowing together," is held to be established via what they call "sharing looks." These looks close the triangle of the triad, turning "not-yet-shared attention into truly joint, shared attention," confirming that attention is shared, with the goal of bringing about "an alignment of attitudes" (Carpenter and Liebal, 2011; compare Hobson, 2005). Their account, again, refers to infants around 9 months and older and was not intended to capture the behavior of younger infants. Notably, social book sharing interactions at 6 months seem already to constitute a basic comment structure, in Bruner's (1975) terms, in that infant and caregiver exchange affect in relation to, or even "jointly negotiate about," the book. Thus the affective exchanges, in conjunction with the joint involvement with the book, its pictures, and the vocal narrative, might constitute a basic form of "content," and the succession of emotional exchanges may build up toward a basic form of "emotional narrative."

*6–9 months.* At 6–9 months, infants were actively seeking out and autonomously manipulating books, mostly engaging in solitary book exploration, with their attention drawn primarily to their own manual object actions and only at times looking up at their caregivers. The social book sharing episodes were therefore shorter, as the infants did not keep up their engagement with the caregiver long enough to sustain the interaction. Though the interactions were now more symmetric, due to the infants' more autonomous object manipulation, they were also less coordinated, at times discoordinated: when their caregivers attempted to guide them, infants were frequently already involved in an action, putting them at cross purposes (compare De Barbaro et al., 2013), and their manual actions could no longer easily be shaped into the cultural frame of book sharing (see **Figure 12**).

Looking at the period between 6 and 9 months revealed that the configuration commonly described in the literature for most of the first year does indeed occur: there was little joint or shared action, as infants were drawn into deep object involvement to the point of seemingly "ignoring people" (e.g., Tomasello, 1999). However, when viewed more closely within the wider ecological context, the apparent dip in triadic interactions at this point is not the beginning of the story but only a temporary phase, following a period of already well-coordinated infant-caregiver-object interactions.

Rather than reflecting an enduring lack of cognitive capacities, the relative paucity of triadic interactions compared to solitary book sharing interactions between 6 and 9 months can hence be understood as a change in interaction dynamics due to new achievements (developing object manipulation, posture, and mobility) and accordingly shifting interests. This shift of interest toward objects has long been known in the literature (Trevarthen and Hubley, 1978; Bakeman and Adamson, 1984). To characterize it further (beyond noting basic correlations with infant postural and motor development), investigations at the micro-developmental level are required (see De Barbaro et al., 2013). The literature's primary focus on the development of triadic interactions in terms of underlying cognitive capacities "coming on line" only later explains why the diminished and discoordinated social object interactions in this age range are ignored, and why the significance of early triadic interactions has so often been neglected or even overlooked (Tomasello et al., 2005; compare Reid and Striano, 2007).

*9–12 months.* At 9–12 months infants' attention and action were guided not only through dynamical cues and routines but also by indirect and conventional means (words, instructions, demonstrations). Infants' fluent coordination at this age incorporated manual object actions into social actions and social actions into manual object actions across different cultural activity frameworks, across time and space. Infants increasingly shaped and adapted their now versatile locomotion and object manipulation actions according to the conventional frame and to communicative exchanges, and were themselves actively directing others' attention and action. The episodes were of varying duration, with a high frequency of action turns, and often chained together. The interactions were mostly coordinated and symmetric, orderly and fluent (see **Figure 12**).

This period clearly encompasses significant qualitative changes in the interactions. Rather than appearing suddenly, supposedly mediated by a newly emerging capacity for joint attention, these changes can be seen as part of a gradual development (compare De Barbaro et al., 2013), arising from the interplay of multiple strands of development in interaction with the social and cultural environment and the entire ecology of the activity.

To further explore and better understand the interplay of these multiple strands of development, we need to reframe, refine, and expand key notions such as (visual) joint attention, creating conceptual frameworks that likewise allow for an interplay of multiple concepts capturing different aspects of the interactions, cultural activities, and their ecologies. For example, whereas the concept of joint attention, which developed in the context of experiments on gaze following and gaze checking (Scaife and Bruner, 1975), is primarily focused on the visual domain, processes such as sharing of experience, attention coordination, and mutual orienting can rely on multiple modalities bound together in structured actions. The role of gaze within this interplay of modalities is only beginning to be explored in more detail (e.g., the relation of social gaze to eye-hand coordination in caregiver-infant-object interactions; Yu and Smith, 2013).

## *Jointly structuring shared spaces of meaning and action*

The richness of early infant-caregiver-object interactions in naturalistic contexts invites an expansion of focus from the supposedly late emerging triadic interactions primarily associated with visual (joint) attention to studying how shared spaces of meaning and action are multi-modally structured together from early on.

The infants' situation at 3–6 months (showing interest in their surroundings but not yet being able to explore the object world on their own) makes this age window particularly interesting for learning socially (including learning "about objects and the world"), as the infants readily engage in the highly structured and experientially rich joint activities offered by their caregivers.

Book sharing is such an activity. It serves as a "container" holding infant, caregiver, and world together in a small confined space opening up possibilities for shared experience and action and fostering learning (Wood et al., 1976; Vygotsky, 1978). In pointing actions, for example, rather than having to follow a pointing finger to a distant target, the close encounters of early book sharing allow the finger pointing and the object pointed at to meet in immediate vicinity and within the infant's reach, often accompanied by salient, dynamical gestures and actual, audible contact events. The container offers a rich reservoir of—and substrate for creating—interaction structures which are easily accessible to learn from and act upon together (Shotter, 1983; Goodwin, 2013). Part of this (spatial as well as temporal) structuring is provided by the cultural book sharing framework created around and manifested in the artifact book. Not only does the book invite the infants to physically engage with it (scaffolding their manual actions), it also embodies and reliably reproduces a stable, recognizable and predictable sequence of actions. What makes the activity come alive is the caregivers' active moment-to-moment structuring as they dynamically enact and carve out "building blocks" of interaction, pattern actions, and shape actions into action arcs in dialog with the infants.

The wealth of information available in infants' natural environments has been emphasized by computational approaches to explain the impressive early achievements of infant learners, focusing primarily on the problem of word-reference learning (Smith et al., 2014). Statistical learning models have also shown the statistical validity of social cues (caregivers' action and gaze directions) for finding and disambiguating meaning in complex, cluttered streams of objects, actions, and events, as well as words (Frank et al., 2012). Caregivers in real-world activities actively select and structure their infant-directed speech, performing "auditory packaging" closely coupled to the relevant actions and creating crossmodal invariances, thus simplifying learning by highlighting relevant aspects within the interaction (Nomikou and Rohlfing, 2011; Bahrick and Lickliter, 2012; see also Leavens et al., 2014, this issue).

The present study invites us to take a step beyond the structuring of "perceptual input," and consider the infant's active, embodied participation and engagement in joint practices. Infants experience the activity first hand, actively seeking out and probing their environment through active vision and active touch. They are fully immersed and emotionally invested in coordinated interactions with their caregivers and the book, actively structuring shared spaces of meaning and action together. To describe this structuring in more detail we used the notion of "action arcs." The basic arc structure with a beginning, build up, climax, and resolution is ubiquitous in physiological processes, e.g., breathing, and is fundamental to action, with different actions following different dynamic trajectories (Stern, 2010; Trevarthen and Delafield-Butt, 2013).

As infants and caregivers repeatedly move through action arcs together, they co-regulate and share arousal and excitement, as well as act out and experience the structure, shape, and dynamics of actions together. These types of co-regulation could be regarded as merely coordination of behavior with sharing of affect (Tomasello et al., 2005). However, in moving through these arcs together, sharing of affect goes hand in hand with, and is inseparable from, learning about the structure of the action: infants become familiar with the dynamic trajectories as they are led through the motions, providing an opportunity to learn about structure and dynamics of actions, about themselves, their partner, the object involved, and their relation. Moreover, they get to experience and learn about the effects their own actions have on the partner and the unfolding of the activity.

Through such immersion in participation, infants are able to learn specific routines and practices, and more generally, "ways of interacting," following the implicit norms of their culture (Mauss, 1973; Rietveld, 2008). It also provides the opportunity to learn about other people as social agents, whose actions significantly shape the unfolding of the activity. Through being drawn repeatedly by cues and movements to the relevant locations—hands, faces, objects—"where the action takes place"—infants become accustomed to and learn to anticipate the specific sequences of action trajectories (e.g., Hunnius and Bekkering, 2010), and the interplay of gaze, hand actions, and object use—in short how people act.

Crucially, infants are learning how to learn: when to look, where to get important information, and when to join in with an appropriate action (e.g., after a rising action at the peak of an action arc). Once established as interpersonal routines, action structures lend themselves to being played with, e.g., by introducing temporal variations that violate expectations (as in teasing), thus highlighting and making explicit mutual coupling and co-regulation, potentially helping to develop action coordination skills and cooperation (Reddy, 2008; Reddy et al., 2013). As active participants even in early interactions, infants become familiar with how to jointly structure activities and begin to learn how to negotiate and modify this shared structuring. This skill, developed further, may be characteristic of how infants coordinate triadic interactions at 9–12 months, and crucial for cultural learning and culture creation.

## **ACKNOWLEDGMENTS**

We are grateful to all the participating families and infants for sharing this precious time. We would like to thank Tina Reichelt and Elisabeth Zimmermann for support in data collection, and Hanne De Jaegher, Valentina Fantasia, Alessandra Fasulo, Joanna Rączaszek-Leonardi, Thomas Wiben Jensen, and Michael Schmitz for discussion, as well as the participants of data sharing sessions and research seminars in Portsmouth and Vienna. Many thanks also to the three reviewers for their helpful comments and suggestions. This work is supported by the Marie-Curie Initial Training Network, "TESIS: Toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 May 2014; accepted: 13 November 2014; published online: 10 December 2014.*

*Citation: Rossmanith N, Costall A, Reichelt AF, López B and Reddy V (2014) Jointly structuring triadic spaces of meaning and action: book sharing from 3 months on. Front. Psychol. 5:1390. doi: 10.3389/fpsyg.2014.01390*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Rossmanith, Costall, Reichelt, López and Reddy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **GLOSSARY**

In the micro-analytic descriptions and ELAN illustrations, some transcription conventions from conversation analysis were used where appropriate (see Zukow, 1982; Jefferson, 2004).


! Exclamation mark: animated tone

AIr Upper case: increased loudness relative to surrounding sound

.h Period preceding h: audible inhalation, in particular: sharp intake of breath indicating surprise

BI::g Colons: lengthening of preceding sound, the more colons, the longer.

# Embodied intersubjective engagement in mother–infant tactile communication: a cross-cultural study of Japanese and Scottish mother–infant behaviors during infant pick-up

# *Koichi Negayama<sup>1</sup>\*, Jonathan T. Delafield-Butt<sup>2</sup>, Keiko Momose<sup>1</sup>, Konomi Ishijima<sup>1</sup>, Noriko Kawahara<sup>3</sup>, Erin J. Lux<sup>2</sup>, Andrew Murphy<sup>4</sup> and Konstantinos Kaliarntas<sup>4,5</sup>*

<sup>1</sup> Faculty of Human Sciences, Waseda University, Tokorozawa, Japan

<sup>2</sup> Faculty of Humanities and Social Sciences, University of Strathclyde, Glasgow, UK

<sup>3</sup> Faculty of Home Economics, Kyoritsu Women's University, Tokyo, Japan

<sup>4</sup> Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK

<sup>5</sup> School of Life, Sport and Social Sciences, Edinburgh Napier University, Edinburgh, UK

#### *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Gabriela Markova, University of Vienna, Austria

Martine Van Puyvelde, Vrije Universiteit Brussel/Royal Military Academy, Belgium

Monica Birgitta Hedenbro, Hedenbro Institutet, Sweden

#### *\*Correspondence:*

Koichi Negayama, Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, Saitama 359-1192, Japan e-mail: negayama@waseda.jp

This study examines the early development of cultural differences between Japan and Scotland in a simple, embodied, and intersubjective engagement: mothers putting down, picking up, and carrying their infants. Eleven Japanese and ten Scottish mothers participated with their infants, first at 6 and then at 9 months of age. Video and motion analyses were employed to measure motor patterns of the mothers' approach to their infants, as well as their infants' collaborative responses during the put-down, pick-up, and carry phases. Japanese and Scottish mothers approached their infants with different styles, and their infants responded differently to the short separation during the trial. A greeting-like behavior of the arms and hands was prevalent in the Scottish mothers' approach, but not in the Japanese mothers' approach. Japanese mothers typically knelt before making the final reach to pick up their children, giving a closer, apparently gentler final approach of the torso than Scottish mothers, who bent at the waist with larger movements of the torso. Measures of gap closure between the mothers' hands and their infants' heads revealed variably longer gap closures, in both duration and distance, with greater velocity, for the Scottish mothers than for the Japanese mothers. Further, the sequence of Japanese mothers' body actions on approach, contact, pick-up, and hold was more coordinated at 6 months than at 9 months. Scottish mothers were generally more variable on approach. Measures of infant participation and expressivity indicate more active participation by Scottish infants in the negotiation during the separation and pick-up phases. Thus, this paper demonstrates a culturally different onset in the development of joint attention during pick-up. These differences reflect cultures of everyday interaction.

**Keywords: embodied intersubjectivity, cultural learning, development, Japan and Scotland, mother–infant relations, motor control, anticipation, peri-personal space**

## **INTRODUCTION**

Human culture is marked by social expectation and patterns of engagement. Differences between cultures are constituted by implicit differences in the patterns and style of social expectation and engagement, as well as by explicit differences in adornment, language, and values. For example, greeting styles consist not only of gestural codes, e.g., bowing in Japan or hand-shaking in Scotland; they also differ markedly in their course of social expectation and engagement, constituted by psychological values that place one's actions correctly within an acceptable cultural context. These expectations produce a culture's narrative (Bruner, 1990). Such implicit knowledge of expectation and of patterns of affect, arousal, and interest co-regulation is an important contributing element of learned cultural knowledge (Merker, 2009; Frank and Trevarthen, 2012).

Patterns of engagement are learned in embodied social interaction in infancy, from birth (Delafield-Butt and Trevarthen, 2013; Trevarthen and Delafield-Butt, 2013; Kugiumutzakis and Trevarthen, 2015). At this early age, infants are able to guide their movements purposefully to achieve desired sensory effects (van der Meer et al., 1995), including social responses from caregivers (Nagy and Molnar, 2004). Such early self-generated action, made in anticipation of its sensory contingencies, is a fundamental marker of intentionality; it is expressed early in life, before conceptual and reflective development has become established, as a primary form of intentionality (Delafield-Butt and Gangopadhyay, 2013). Primary intentional actions generate sensory consequences that give knowledge of the world, and although the origins of intention have been controversial in psychology (Zeedyk, 1996), it is clear that infants actively contribute to social engagements and learn from these to anticipate their outcomes, within primary experience (Trevarthen and Reddy, 2007; Gallagher, 2008; Trevarthen, 2009; Panksepp, 2011).

Self-generated action repeated regularly over cycles of activity, in what Baldwin (1895) called the 'circular reaction,' forms the basis of knowledge and understanding, generating reliable patterns, or 'schemas,' of sensorimotor knowledge (Piaget, 1953, 1954; Trevarthen and Delafield-Butt, 2015). Shared between persons, regularly patterned acts of common purpose form the foundation of cultural understanding; they co-create meaning (Delafield-Butt and Trevarthen, 2013). These regular action patterns, together with the exchange of the motives and feelings that guide them, form the basis of an intersubjective, socially generated, and embodied knowledge of a culture (Donaldson, 1978; Halliday, 1978; Bruner, 1996; Rogoff, 2003; Legerstee, 2005; Gratier and Trevarthen, 2008; Reddy, 2008; Frank and Trevarthen, 2012).

Even everyday embodied interactions during practical tasks, such as picking up the infant for feeding, enable the communication of expectations, with their affects and intentions made manifest within simple acts. Self-other intentionality within these acts can be read by direct neural resonance of their motor patterns (Gallese, 2003; Ammaniti and Gallese, 2014), giving implicit meaning within a 'direct' and intrinsically 'smart' social perception (Gallagher, 2008). The psychologist Daniel Stern recognized that these bodily projects are patterned with a narrative form structured by their intended outcome, which enables learning the consequences of expression in social projects (Stern, 1985). In intimate engagements, attunement to each other's affects and intentions is conveyed by an inter-modal fluency of action, voice, and touch (Stern et al., 1985; Trevarthen et al., 2011; Trevarthen and Delafield-Butt, 2013). The timing and particular kinematic form of these actions transmit affective value, giving a particular expressive, poetic feeling that holds meaning for those with whom they are shared (Stern, 2010). The timing, form, and energetics of body movements can be specific to a culture and learned in early adult–infant engagement (Gratier and Trevarthen, 2008; Gratier and Apter-Danon, 2009). Feelings conveyed in body movement, in the choice or form of action, form a basis of cultural knowledge and evolution (Rogoff, 2003; Hrdy, 2009; Packard and Delafield-Butt, 2014).

The present study examines the early development of cultural differences in communication between mothers and infants in a 'pick-up and carry' paradigm, applying high-precision motion capture, together with video micro-analysis, to record the actions of mother and infant accurately during this task. Special attention is paid to the timing and structure of the mother's movements, because we reason that they provide a culturally specific framework for the full sequence of the 'approach,' 'pick-up,' and 'carry' phases and thus act as a determinant of culture-specific behavioral development. Mother–infant interactions were examined by video micro-analysis of data obtained from cameras set alongside the motion capture system. Together, motion capture and video data afforded a comprehensive analysis of both kinematic style and quality of expressive behavior, giving a precise measure of the inter-body relationship between mother and infant from two different cultures (Japan and Scotland) at two developmental ages (6 and 9 months).

## **TEMPORAL COORDINATION IN INTERSUBJECTIVITY**

Mothers are commonly understood as the principal drivers of an interaction, structuring and framing the encounter. However, infants are also active participants in social interaction from birth (Nagy, 2011), both soliciting interaction from others (Nagy and Molnar, 2004) and patterning these interactions to form intersubjective dialogs of meaning-making (Trevarthen and Delafield-Butt, 2013; Kugiumutzakis and Trevarthen, 2015). A wealth of detailed mother–infant analyses shows that adjusting the timing of the actions that make up behavior is facilitated by awareness of each partner's intention; mother and infant read the intentions inherent in each other's actions, coordinating their activity and expressions and forming the basis of embodied intersubjectivity (Brazelton, 1979; Stern, 1985; Trevarthen, 2001).

Mother and infant communicate with each other to generate shared meaning. Their communication has components of pulse, quality, and narrative, with a four-part structure of introduction, development, climax, and resolution. Mother–infant interaction thus generates what Malloch and Trevarthen (2009) identify as '*communicative musicality.*' In this view, synchrony is not only shared dynamically between individuals but is also contextual. Such context-based interactions enable participants to anticipate each other's behaviors dynamically and to attune their own behaviors to them sequentially, as in jazz improvisation (Schögler and Trevarthen, 2007). This kind of successive anticipation, intention-reading, and resulting sequence of joint engagement promotes a sense of belonging (Gratier and Apter-Danon, 2009). These interactions require a complex reciprocity in the behaviors of mother and infant. Nine months of age marks the emergence of joint attention and is interpreted as a time of significant development in intention-reading within a triadic relationship (Tomasello, 1993), but younger infants are nevertheless aware of the social context and adapt their actions to it according to their particular feelings and motivations (Legerstee and Markova, 2007; Ishijima and Negayama, 2013).

Trevarthen (1998) identified two types of intersubjectivity: primary and secondary. Primary intersubjectivity involves direct social attention and attunement evident from birth, while secondary intersubjectivity, characterized by the inclusion of objects into the primary mother–infant intersubjective interactions, is evident from 9 months (Trevarthen and Hubley, 1978). Joint attention of mother and infant to an object of shared interest is considered a mutual inclusion of the other's perspective into their shared experience, forming a true triadic relationship (Tomasello et al., 1993). Mutual intention-reading between mother and infant enables advanced, fine temporal coordination between them at 9 months. However, mothers and infants take part in shared attention and engagement involving their body parts in games and rituals, such as tickling play, which suggests that an earlier form of proximal triadic relation, with a body part as the target, may exist (proto-triadic relation; Negayama, 2011).

Another important concept closely related to intersubjectivity is parent–infant interactional synchrony, which requires mutually adaptive timing. Feldman (2007) identified synchrony as a construct that denotes intersubjectivity. Synchrony has several developmental phases, from a basic biological clock and autonomic physiological system through to voluntary, behaviorally mediated interactions and symbol use. Among such interactions, touch is a strong inducer of synchrony. Touch is a significant modality of agent engagement with strong, direct sensory consequences that can be life-affirming, or the opposite. It simultaneously brings a bilateral experience of 'touch' and 'being touched' to the participants (Rochat, 2001), and this bilaterality of experience is a significant mediator of synchrony. The experience of contact is always mutual and intensely personal; it is not shared by a third party, and it generates vital, affective appraisals of its value as benefit or threat.

For example, hugging and kissing mediate affection, but hitting and kicking are aggressive attacks. Thus contact can elicit a broad range of emotions. Moreover, because the body is isomorphic between the persons engaged in mutual touch, the sensation of one's body being touched is simultaneously and sympathetically perceptible to the one making the touch. Such unique characteristics of symmetry and simultaneity in tactile experience favor the conveying of shared feelings of oneness between mother and infant. Touch is not simply a tactile experience of texture and pressure; it involves different types of receptors all over the body, including those for temperature and pain (McGlone and Spence, 2010). Bodily communication through touch gives a rich and intimate experience to both participants, of which holding and being carried is one important everyday example.

## **HOLDING BEHAVIOR AND ITS DEVELOPMENT**

Opportunities to learn and practice synchronizing one's own behavior with another's are richly embedded in everyday tactile interactions. Mother–infant holding is a behavior of this kind, because it is a major joint behavior of mother and infant that requires fine tactile attunement of movements in the arms, hands, legs, and trunk (Negayama et al., 2010). Holding behavior is clinically known to reflect the quality of the mother–infant relationship (Massie, 1975; Weatherill et al., 2004), possibly because this complex mother–infant coordination requires intimacy.

Hand-aiming, clutching, and lifting by the mother, and arm-reaching and grabbing by the infant, are likely to be included in the paradigmatic sequence of the 'put-down' and 'pick-up' phases. When the mother walks, she and her infant dynamically adjust their behaviors in harmony with each other to maintain a secure hold. For all of this to be performed smoothly, mother and infant must attune their movements to each other precisely.

The 'put-down' phase is a separation of the infant, previously held securely by the mother, from the mother, and the 'pick-up' phase is thus a reunion with the mother who has just left the infant alone. Mother–infant interactions within these processes may be even more dynamic than during the simple act of holding, so the mutual adjustment of mother and infant behaviors for synchrony and reciprocity is worth examining across the paradigmatic sequence of put-down, pick-up, and carry.

Reddy et al. (2013) demonstrated that even 2-month-old infants are able to adjust their behaviors in anticipation of, and in preparation for, being picked up. This responsiveness develops over the following few months. As noted above, 9 months of age marks an upsurge in intention-reading. Thus, interactions during the pick-up phase ought to differ between 9-month-olds and younger infants. Successful pick-up requires a precise attunement of timing in embodied communication, which might require a long developmental and learning process that precedes the 9-month revolution and the transition to true secondary intersubjectivity.

## **CULTURAL DIFFERENCE IN PARENTING**

Two types of parenting have repeatedly been identified as reflecting cultural difference: the regulator, or authoritarian, type and the facilitator, or authoritative, type. Japanese parenting is classified as the latter (facilitator) type, which relies more on affective ties and empathy than on parental control. In this authoritative style of childcare in Japan, children are expected to infer the parent's intention on their own and to control their behavior accordingly. In addition, Japanese mothers are characterized by a tendency to follow, rather than control, the infant (Azuma, 1994).

Keller et al. (2009) proposed another dichotomy in parenting types: distal and proximal. Japanese mothers and infants engage in more bodily contact than their U.S. counterparts (Rothbaum et al., 2000) and can be classified as the proximal type. Japanese parents feel less averse to their infants' bodily waste than French parents do (Negayama and Norimatsu, 2009), which supports a stronger psychological closeness to the infant's body in Japanese culture. As mentioned previously, bodily contact brings a feeling of oneness, and Japanese authoritative parenting relies on this mutually minded attunement even when mother and infant are physically separate.

Japan and Scotland also differ culturally in the structure of childcare in day nurseries, as observed in, e.g., feeding behaviors (Negayama, 1998–1999) and the caregiver–infant relationship when putting children to bed (Negayama and Kawahara, 2010). These studies showed a stronger Japanese motivation to comfort infants patiently and with greater bodily contact. This difference may be reflected in the process of pick-up and holding studied in the present paper. Mother–infant attachment patterns have been classified into four types (secure, avoidant, ambivalent, and disorganized) using the standardized Strange Situation paradigm (Solomon and George, 2008), and classifications obtained with this paradigm also show remarkable cultural differences (IJzendoorn and Sagi-Schwartz, 2008).

All these findings relate to cultural differences in mother–infant intersubjectivity, and we expect these differences to result in differences in the timing and organization of behavioral patterns observed in the put-down, pick-up, and carry paradigm. Fine kinematic analysis of interaction during these phases is therefore likely to be a sensitive and promising window through which to explore the development of these cultural differences in the mother–infant relationship.

## **SUMMARY OF THE AIM**

The aim of this paper is to identify and define culturally and developmentally specific patterns of motor timing and form through kinematic analysis of maternal movements and mother–infant behaviors. Mother–infant pairs in Japan and Scotland were observed once at 6 months and once at 9 months. An identical procedure was employed at both sites and at both ages to allow comparison of action and interaction timing and forms before the onset of secondary intersubjectivity and at its onset (Trevarthen and Hubley, 1978), the so-called 9-months revolution (Tomasello, 1995).

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

Eleven Japanese and ten Scottish healthy infants aged between 5 and 6 months participated with their mothers. They were recruited at local nursery schools in Japan and through word of mouth, parent groups, and nurseries in Scotland. Each mother–infant pair participated in the experiment twice: first at an infant age of ca. 6 months, and again at ca. 9 months. This study is part of a larger project on the development of mother–infant gap closure using three different approaches: picking up, feeding, and playing. Six months was chosen as the typical age at which solid food is introduced. Background information on the participants is shown in **Table 1**. This study was approved by the Ethical Review Board of Waseda University (No. 2012-273). Written informed consent was obtained from each mother or father.

**Table 1 | Japanese and Scottish mothers and infants who participated in the study.**

## **PROCEDURE AND DATA RECORDING**

Data recordings of Japanese and Scottish participants were carried out at a laboratory at Waseda University and another at the University of Strathclyde, respectively. Participants' visit schedules were tailored to fit the infants' eating and sleeping patterns so the infants arrived in an awake and alert phase some 30 min prior to typical feeding time.

All mother–infant pairs performed pick-up and carrying tasks at 6 and 9 months. As part of the larger research project, the present task was the first of four different tasks: (i) mothers put down, then picked up their infant from the floor, (ii) mothers fed their infant solid food with a spoon, (iii) mothers tickled their infant in free play of about 15 min, and (iv) mothers and infants played an action-word game. During the tasks, mothers' and infants' body movements and interactions were audio-video recorded with two or three standard consumer digital video cameras, and their movements were recorded by optical motion capture systems. Each digital video camera was either mounted on a tripod to record the mother's and infant's whole bodies within the frame, or hand-held to allow focus on eye gaze and specific facial expressions.

For motion capture, comparable 3D motion analysis systems were employed: a 12-camera Optitrack system (NaturalPoint Inc., USA) at Waseda University in Japan, and a 12-camera Vicon Nexus system (Oxford, UK) at the University of Strathclyde in Scotland. Reflective markers were attached to the mother's head, shoulder, back, arm, hand, and waist, and to the infant's head, using a similar configuration. In the Scottish sessions, additional markers were attached to the infant's shoulder, back, arm, hand, waist, and leg. Optical motion capture data were collected at 100 Hz. The motion capture floor space was marked with nearly identical soft brown carpets measuring 2600 mm × 2000 mm and 2300 mm × 3300 mm in the Japanese and Scottish settings, respectively. The room itself was larger in Scotland and allowed greater freedom of movement: the floor measured ca. 3.9 m × 9.9 m with a ceiling height of 3.0 m in Japan, and ca. 8.0 m × 18.0 m with a ceiling height of ca. 4.0 m in Scotland.

After all tasks were completed, information on mothers' and infants' birth dates, number of siblings and parity, maternal education and employment, and health was collected by brief interview (see **Table 1**).

## **MOTION ANALYSIS**

The motion of each mother and infant during the 'approaching' phase of the pick-up and carrying task was analyzed to better understand their intersubjective engagement (**Figure 1**). Body-part trajectories were obtained either by averaging the positions of three or four markers mounted on a rigid surface attached to each body part (Japan), or from the displacements of single markers placed directly on the body (Scotland).

Each trajectory was time-shifted so that *t* = 0 at the mother's contact point with her infant. The contact point was determined from 5% of the maximum velocity of the distance-gap closure between the mother's hand and the infant's head in the following steps: (i) the distance sequence between the positions of the mother's dominant hand and the infant's head was calculated, (ii) a velocity sequence was obtained by calculating differences between adjacent values of the distance-gap trajectory, (iii) the velocity trajectory was smoothed with a Gaussian filter with a cut-off frequency of 9 Hz, (iv) the two time points at which the velocity reached 5% of its maximum were obtained for each approach, and (v) the second of these 5% points was taken as the contact point. This contact point was used to assess the mothers' and infants' coordination of body movement, kinematics, behavioral and affective expressivity, and motor anticipation during the mother's approach to pick up her child.
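The five-step contact-point procedure can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code. The text does not give the Gaussian filter's cut-off convention, so the sketch assumes the -3 dB amplitude convention f_c = sqrt(ln 2) / (2πσ). It treats the closing speed (the negative rate of change of the hand–head distance) as the "velocity". The helper names `gaussian_kernel` and `find_contact` and the synthetic minimum-jerk reach are hypothetical.

```python
import numpy as np


def gaussian_kernel(cutoff_hz: float, fs: float) -> np.ndarray:
    """Unit-sum Gaussian kernel with a -3 dB amplitude cut-off at cutoff_hz.

    Assumption: f_c = sqrt(ln 2) / (2 * pi * sigma_t); the source does not
    state which cut-off convention was used.
    """
    sigma_t = np.sqrt(np.log(2.0)) / (2.0 * np.pi * cutoff_hz)  # seconds
    sigma = sigma_t * fs                                          # samples
    half = max(1, int(np.ceil(4.0 * sigma)))                      # +/- 4 sigma support
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def find_contact(hand_xyz, head_xyz, fs=100.0, cutoff_hz=9.0, frac=0.05):
    """Return (onset_idx, contact_idx) of the hand-to-head gap closure.

    Indices refer to the velocity sequence (one sample shorter than the
    position data). Hypothetical helper, not the authors' implementation.
    """
    # (i) distance between the mother's dominant hand and the infant's head (mm)
    dist = np.linalg.norm(np.asarray(hand_xyz, float) - np.asarray(head_xyz, float), axis=1)
    # (ii) differences of adjacent samples; sign flipped so closing speed is positive (mm/s)
    speed = -np.diff(dist) * fs
    # (iii) Gaussian smoothing; edge padding keeps the output length unchanged
    k = gaussian_kernel(cutoff_hz, fs)
    pad = len(k) // 2
    speed = np.convolve(np.pad(speed, pad, mode="edge"), k, mode="valid")
    # (iv) the two points where speed is at 5% of its maximum, either side of the peak
    peak = int(np.argmax(speed))
    below = speed <= frac * speed[peak]
    before = np.nonzero(below[:peak])[0]
    after = np.nonzero(below[peak:])[0]
    if before.size == 0 or after.size == 0:
        raise ValueError("closure does not cross the 5% threshold on both sides of the peak")
    onset = int(before[-1])
    # (v) the second 5% point is taken as the contact point
    contact = peak + int(after[0])
    return onset, contact


# Synthetic example: a 1-s minimum-jerk reach starting at t = 1 s, 100 Hz sampling.
fs = 100.0
t = np.arange(0.0, 3.0, 1.0 / fs)
tau = np.clip(t - 1.0, 0.0, 1.0)
s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5      # normalized position, 0 -> 1
gap = 2000.0 - 1900.0 * s                       # hand-head gap shrinks 2000 -> 100 mm
hand = np.column_stack([gap, np.zeros_like(t), np.zeros_like(t)])
head = np.zeros_like(hand)
onset, contact = find_contact(hand, head, fs)
print(f"onset {onset / fs:.2f} s, contact {contact / fs:.2f} s")
```

With this synthetic reach, the second 5% crossing falls shortly before the end of the movement, which is the behavior the procedure relies on: the hand has almost stopped closing the gap at touch.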

In this approaching phase, the mother's typical motions were annotated by observing the video. The time points at which the mother (i) stopped walking (both feet at their final position), (ii) touched her infant's body (contact), (iii) lifted the infant's entire body from the floor (lifting), (iv) finished picking up and began holding the infant in a stable manner (stable holding), and (v) started to walk by lifting a foot (walking) were identified. These time courses were analyzed together with the motion capture data.

One trial was selected for each pair at 6 and at 9 months on the basis of the infant's emotional stability and the visibility of behavior. If more than one trial met these criteria, the earliest such trial was selected. The trajectories of 11 Japanese and 10 Scottish mothers were then evaluated, and their kinematic parameters were compared within the 6- and 9-month groups.

## **BEHAVIORAL ANALYSIS**

Because the present put-down/pick-up procedure can be regarded as a simplified separation–reunion situation, "leaving" and "approaching" phases were annotated in ELAN (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands). The "leaving" phase was defined as starting when the mother placed her infant on the floor and ending when she stopped stepping back. The "approaching" phase was defined as the interval from 2.5 to 0.5 s before touch.

Four infant behaviors were coded for each phase: eye gaze, negative reaction, positive reaction, and arm reaching. Eye gaze was scored as present when the infant's visual or facial orientation was toward the mother, irrespective of frequency or intensity. Negative and positive reactions were coded on the basis of facial expression, tone of voice, and body movement. Arm reaching was scored as present when the infant extended an arm toward the mother, irrespective of frequency or intensity. Overall inter-rater reliability between two experienced coders, who independently coded all the Japanese data, was within an acceptable range (kappa = 0.75). Kappa values for eye gaze toward the mother, negative reaction, positive reaction, and arm reaching, calculated by independently coding a randomly chosen 30% of the data, were 0.92, 0.89, 0.67, and 0.91, respectively.
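The text reports kappa values without naming the variant. Assuming Cohen's kappa for two coders, the agreement statistic can be sketched as follows: observed agreement is corrected for the agreement expected by chance from each coder's marginal frequencies. The function name `cohen_kappa` and the example codes are hypothetical illustrations, not data from the study.

```python
import numpy as np


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' labels of the same items (hypothetical helper)."""
    a, b = np.asarray(codes_a), np.asarray(codes_b)
    if a.shape != b.shape or a.size == 0:
        raise ValueError("both coders must label the same, non-empty set of items")
    p_obs = np.mean(a == b)                                        # observed agreement
    cats = np.union1d(a, b)
    p_exp = sum(np.mean(a == c) * np.mean(b == c) for c in cats)   # chance agreement
    if np.isclose(p_exp, 1.0):
        raise ValueError("kappa is undefined when chance agreement is 1")
    return float((p_obs - p_exp) / (1.0 - p_exp))


# Hypothetical presence (1) / absence (0) codes for ten trials
coder1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(coder1, coder2)   # (0.8 - 0.5) / (1 - 0.5) = 0.6
```

Here the coders agree on 8 of 10 trials, and each marks "present" on half the trials, so chance agreement is 0.5 and kappa is 0.6.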

Further, greeting-like behaviors of mothers (hand/arm opening and vocalization) and infants (leg flailing and vocalization) were annotated, and the timing of their occurrence was measured in ELAN. Mothers' hand/arm opening was defined as a quick extension of the fingers and/or arms just before pick-up, and infants' leg flailing as a jerky movement of the lifted legs. Inter-rater reliability for each behavior was calculated by having a second experienced coder code 14 randomly chosen pairs, and was within an acceptable range: kappa values for mothers' hand/arm opening and vocalization and infants' leg flailing and vocalization were 0.96, 0.81, 0.92, and 0.85, respectively. The gap between the mother's hand and the infant's head at the moment each behavior occurred was then calculated from the motion capture coordinates of the markers.
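A minimal sketch of the last step, looking up the hand–head gap at annotated event times, is given below. It assumes, hypothetically, that the ELAN annotation times have been converted to seconds on the motion capture clock apart from an optional constant offset `offset_s`; the text does not describe how the video and motion capture streams were synchronized. Linear interpolation between 100 Hz frames and the helper name `gap_at_events` are also assumptions.

```python
import numpy as np


def gap_at_events(dist_mm, event_times_s, fs=100.0, offset_s=0.0):
    """Hand-head gap (mm) at each annotated event time (hypothetical helper).

    dist_mm:        hand-head distance sampled at fs (Hz), starting at t = 0 s
    event_times_s:  annotation times in seconds (e.g. exported from ELAN)
    offset_s:       assumed constant offset from annotation clock to mocap clock
    """
    dist_mm = np.asarray(dist_mm, dtype=float)
    t_mocap = np.arange(dist_mm.size) / fs
    t = np.asarray(event_times_s, dtype=float) + offset_s
    if np.any((t < t_mocap[0]) | (t > t_mocap[-1])):
        raise ValueError("event time lies outside the motion-capture recording")
    # Linear interpolation between neighbouring frames
    return np.interp(t, t_mocap, dist_mm)


# Example: a gap closing linearly from 2000 mm to 0 mm over 2 s (201 frames at 100 Hz)
dist = np.linspace(2000.0, 0.0, 201)
gaps = gap_at_events(dist, [0.5, 1.234])
```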

Holding is a joint behavior requiring active participation of both mother and infant once the infant has acquired motor control (Negayama et al., 2010). The style of maternal holding was analyzed by recording where the mother's hands held her infant, which fell into two categories: the infant's back/armpit and the bottom. Maternal two-hand positions were thus classified into two patterns: bottom–back/armpit and bottom–bottom. The latter pattern requires autonomous postural control by the infant and allows the infant freer body movement, both signs of advanced development. Kappa for the judgment of maternal hand position was 0.72.

# **RESULTS**

## **ACTION PATTERNS AND KINEMATICS OF MOTHERS' APPROACH**

Picking up the infant from the floor required transporting the hands to the grasping points on the infant's trunk, under the arms. In this paradigm, mothers began their approach several paces away from their child, so control of gait, leaning, and/or kneeling or squatting was required to move close to the child, together enabling the hands to be brought to the point of best purchase on the infant's trunk.

Trajectories of the distance between the mother's dominant hand and the infant's head for each mother–infant dyad at 6 months are given in **Figure 2A**. The curves are very smooth, indicating that the approach generally proceeded without interruption. In spite of this smooth approach, the speed of the mothers' dominant hand as it reached toward the infant was highly variable, as shown in **Figure 2B**. The smooth gap closure was achieved through this moment-by-moment adjustment of hand velocity, which absorbed the different speeds of movement of different body parts so that the hand reached its goal efficiently.

Japanese and Scottish mothers approached their infants in different styles. Changes in waist height (**Figure 3**) show that at 6 months the Japanese mothers more often stepped forward and crouched or knelt at their infants' feet than the Scottish mothers did, achieving close proximity earlier in the approach. In the Japanese pairs, the duration of squatting at 6 months was significantly correlated with the infants' age in days (i.e., longer squatting for younger infants; Pearson's *r* = –0.633, *p* = 0.037). The Scottish mothers, on the other hand, were characterized by a higher waist position at 9 months than the Japanese mothers and by a lack of kneeling and squatting, with the exception of a one-knee kneel by one mother who had difficulty picking up her infant. The Scottish mothers picked up their infants simply by bending the torso forward.
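As a consistency check on the reported squatting correlation, a Pearson *r* can be converted to a *t* statistic with n − 2 degrees of freedom. The sketch below assumes n = 11 (all Japanese pairs at 6 months), which the text implies but does not state explicitly:

$$
t = \frac{|r|\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.633\sqrt{9}}{\sqrt{1-0.633^{2}}} \approx \frac{1.899}{0.774} \approx 2.45, \qquad \mathrm{df} = n-2 = 9
$$

Because 2.262 < 2.45 < 2.821, the two-tailed critical values of *t* with 9 degrees of freedom at *p* = 0.05 and *p* = 0.02, the two-tailed *p* lies between 0.02 and 0.05, consistent with the reported *p* = 0.037.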

We reasoned that the kinematics of the continuous gap closure between the mother's hand and the infant's head on approach to contact (**Figure 4**), indexed by the duration, velocity, and distance of the approach, were a good marker of the quality of the approach. Of the approaches analyzed, some exhibited discontinuous gap closures, due to long pauses in the kneel/squat phase for Japanese mothers (one at 6 months, three at 9 months) or to clapping and arm waving for Scottish mothers (one at 6 months, two at 9 months); these were excluded. The remaining movements were comparable in producing a single continuous velocity profile up to contact with the infant.

Analyses of variance (Welch's test) of these continuous closures of the hand to the point of contact with the child (**Figure 4**) revealed significant kinematic differences between Japanese and Scottish mothers' movements at 6 and at 9 months. The duration of the final continuous closure was significantly longer in Scottish mother–infant pairs than in Japanese pairs at 6 months, but not at 9 months (**Figure 4A**). The distance of the closure was significantly longer in Scottish pairs than in Japanese pairs at both 6 and 9 months (**Figure 4B**). Finally, the average velocity of Scottish mothers at 9 months was significantly greater than that of their Japanese counterparts (**Figure 4C**), but this was not the case at 6 months.
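The Welch statistic behind these comparisons can be sketched as below. With only two groups (Japanese vs. Scottish), Welch's one-way ANOVA reduces to Welch's unequal-variance *t*-test, with F = t² and the same Welch–Satterthwaite degrees of freedom. The sketch computes *t* and df only; a *p*-value additionally requires the *t* distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name `welch_t` and the example values are hypothetical, not the study's data.

```python
import numpy as np


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size          # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size          # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return float(t), float(df)


# Illustrative values for two small groups (not study data)
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]
t, df = welch_t(group_1, group_2)
F = t**2   # equivalent Welch ANOVA statistic for two groups
```

Welch's version is the natural choice here because the two groups differ in size and need not share a variance.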

## **SEQUENTIAL TIMING OF PICK-UP BEHAVIORS**

The sequence of movements made by mothers to pick up their infants was mapped onto a sequence of time points: first, both feet reaching their final position before contact; next, contact with the infant; then onset of lift; and finally onset of walking (i.e., carrying). Correlation analysis of these time points across mothers at each age revealed higher correlations between behaviors at 6 months than at 9 months for both Japanese and Scottish mother–infant pairs (**Table 2**). Further, Japanese mothers at 6 months showed stronger correlations, across more behaviors, than their Scottish counterparts, indicating more regular, structured coordination in their sequence of actions and greater similarity among mothers. Interestingly, among the correlated behaviors, there was no significant correlation between the timing of final foot position and contact in Scottish mothers, whereas for the Japanese pairs the timing of final foot position was significantly correlated with that of contact and lift, and the timing of contact was significantly correlated with that of all other behavioral markers. This suggests a sequentially programmed engagement that purposefully accounted for foot position in Japanese but not in Scottish pairs, and a more varied and flexible patterning in Scottish mothers. Significant correlations were seldom observed at 9 months in either population.
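The correlation analysis of pick-up time points can be sketched with NumPy as below, with one row per mother and one column per event. The column order, the helper name `event_correlations`, and the onset values are hypothetical illustrations, and the sketch assumes onsets are measured from a common reference point within each trial, which the text does not specify. The significance of each *r* would then be assessed with n − 2 degrees of freedom.

```python
import numpy as np

EVENTS = ("final_foot", "contact", "lift", "walk")   # hypothetical column order


def event_correlations(onsets):
    """Pearson r between event onset times across mothers (hypothetical helper).

    onsets: array of shape (n_mothers, n_events); each row holds one mother's
    onset times (s) from a common, assumed reference point in the trial.
    """
    onsets = np.asarray(onsets, dtype=float)
    # rowvar=False: columns are variables (events), rows are observations (mothers)
    return np.corrcoef(onsets, rowvar=False)


# Made-up onset times for five mothers, for illustration only
foot = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
contact = foot + 0.8
lift = contact + np.array([0.3, 0.1, 0.4, 0.2, 0.5])
walk = lift + np.array([1.2, 0.9, 1.5, 1.0, 1.1])
r = event_correlations(np.column_stack([foot, contact, lift, walk]))
for i in range(len(EVENTS)):
    for j in range(i + 1, len(EVENTS)):
        print(f"{EVENTS[i]} vs {EVENTS[j]}: r = {r[i, j]:.2f}")
```

In this made-up data, contact always follows the final foot position by the same interval, so that pair is perfectly correlated; a tightly "programmed" sequence of the kind described for the Japanese mothers at 6 months would show up as high off-diagonal values.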

## **CONTACT THROUGH EYE GAZE, VOCAL, AND GESTURAL COMMUNICATION**

Japanese and Scottish infants differed markedly in the amount of expressive gestural action and vocalization made during their mothers' approach to pick-up, with Scottish infants more active and expressive than the Japanese ones (**Table 3**). The paradigm involves the mother placing the infant on the floor and withdrawing a few steps before approaching again, presenting a mild separation and reunion between mother and infant.

The difference between Scottish and Japanese infants was tested with Fisher's exact test. Scottish infants looked at their mothers more often than Japanese infants did. At both 6 and 9 months, most Scottish infants (8/8 and 9/10, respectively) maintained eye gaze with their mothers as they withdrew, whereas a smaller proportion of Japanese infants did so (4/11 and 6/11; *p* = 0.007 and 0.094, respectively). At 6 months, all Scottish infants (8/8) held their gaze on their mothers as they approached, but only about half of the Japanese infants did (6/11; *p* = 0.040). Scottish infants reached with their arms and hands more often than Japanese infants did in the approach phase at 6 months (0/11 and 4/8 for Japanese and Scottish, respectively; *p* = 0.018), but at 9 months no significant difference was observed (4/11 and 6/10 for Japanese and Scottish, respectively; *p* = 0.260).
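An exact test for a 2 × 2 table of counts can be sketched with integer arithmetic as below. Recomputing the counts reported in this section with this sketch, the one-sided ("less") *p*-values reproduce the reported values: for example, 4/11 vs. 8/8 gives 330/50388 ≈ 0.0065 (reported 0.007), whereas the two-sided value is 646/50388 ≈ 0.013; 6/11 vs. 9/10 gives ≈ 0.094 one-sided. This suggests, although the text does not say so, that one-sided tests were reported. The helper name `fisher_exact_2x2` is hypothetical; `scipy.stats.fisher_exact` offers the same test with an `alternative` argument.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d, alternative="two-sided"):
    """Exact p-value for the 2x2 table [[a, b], [c, d]] (hypothetical helper).

    Rows are groups (e.g. Japanese, Scottish); columns are behavior present / absent.
    alternative: "less"      -> P(X <= a), i.e. group 1 shows the behavior less often
                 "greater"   -> P(X >= a)
                 "two-sided" -> total probability of tables no more likely than observed
    """
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Integer hypergeometric weights; they sum to comb(r1 + r2, c1) (Vandermonde).
    w = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    if alternative == "less":
        num = sum(v for k, v in w.items() if k <= a)
    elif alternative == "greater":
        num = sum(v for k, v in w.items() if k >= a)
    elif alternative == "two-sided":
        num = sum(v for v in w.values() if v <= w[a])
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return float(Fraction(num, comb(r1 + r2, c1)))


# Eye gaze during withdrawal at 6 months: Japanese 4/11, Scottish 8/8.
p_one = fisher_exact_2x2(4, 7, 8, 0, alternative="less")   # 330/50388
p_two = fisher_exact_2x2(4, 7, 8, 0)                        # 646/50388
```

Working with integer weights avoids floating-point ties when deciding which tables count as "no more likely" in the two-sided sum.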

Mother and infant 'greeting' behaviors in the approach phase differed between cultures (**Table 4**). Most Scottish infants vocalized at approach at 6 months, whereas only one Japanese infant did (1/11 and 6/8 for Japanese and Scottish, respectively; *p* = 0.006). Further, at 9 months, a greater proportion of Scottish than Japanese infants tended to flail their legs as their mother approached (1/11 and 5/10 for Japanese and Scottish, respectively; *p* = 0.055). On the mothers' side, there was also a tendency for a greater proportion of Scottish than Japanese mothers to show greeting-like arm and hand gestures at 9 months (2/11 and 6/10 for Japanese and Scottish, respectively; *p* = 0.063), although at 6 months such gestures were infrequent in both populations (1/11 and 3/8 for Japanese and Scottish, respectively). Finally, the proportion of mothers who vocalized on approach to their child appeared balanced across populations and ages.

**Table 2 | Pearson's *r* between onset times of sequential pick-up behaviors for Japanese and Scottish mother–infant pairs at 6 and 9 months (\*p < 0.05, \*\*p < 0.01).**

**Table 3 | Incidence of infants' expressiveness and sensory contact between mother and infant during withdrawal and approach.**

ᵃFisher's exact test.

Altogether, it appears that at 6 months the Scottish infants monitored their mothers more carefully and were motivated to react to the situation more actively than Japanese infants, and Scottish mothers also tended to gesture more frequently. However, there were no significant differences in positive or negative expression of affect *per se*: although Scottish mothers and their infants produced and maintained more overt and direct sensory contact, their affective experiences did not appear to differ.

## **INFANT AND MOTHER INTIMATE SPACE**

The distance between the mother's hand and the infant's head at the moment of each style of greeting was measured between the markers on the mother's hand and the infant's head. Greetings from both mother and infant occurred in a zone between 400 and 1,800 mm from the infant's head to the mother's hand, with a stable median of around 1 m, as the mother approached (**Figure 5**).

The median (quartile deviation) gaps across all greeting behaviors by mothers and infants were 787 (57) mm at 6 months and 948 (20) mm at 9 months for the Japanese pairs, and 1302 (288) mm and 1139 (515) mm, respectively, for the Scottish pairs. This similarity at 9 months, in spite of the difference in experimental floor space between Japan and Scotland, suggests a common border at about 1 m between the mother's hand and the infant's head, separating the intimate and outer spaces active for both mother and infant at 9 months.
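The summary statistic reported here can be sketched as follows, taking the quartile deviation as half the interquartile range, QD = (Q3 − Q1)/2. The quartile estimator is an assumption: the sketch uses NumPy's default linear interpolation, and other estimators give slightly different quartiles for small samples such as these. The helper name `median_qd` and the example gaps are hypothetical.

```python
import numpy as np


def median_qd(values):
    """Median and quartile deviation, QD = (Q3 - Q1) / 2 (hypothetical helper).

    Quartiles use NumPy's default linear interpolation, which is an assumption.
    """
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return float(med), float((q3 - q1) / 2.0)


# Hypothetical gaps (mm) at which greeting behaviors occurred in one group
gaps_mm = [620, 700, 760, 790, 810, 850, 930, 1010, 1200]
med, qd = median_qd(gaps_mm)   # median 810 mm, QD = (930 - 760) / 2 = 85 mm
```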

**Table 4 | Mothers' and infants' 'greeting' behavior incidence in the approach phase for reunion.**

ᵃFisher's exact test.

Finally, the timing of occurrence of these behaviors was compared between mother and infant within each pair to determine whether one or the other initiated expressive participation. The analysis failed to find any consistent initiator–follower relationship between them at either age (*p* = 1.00 at both 6 and 9 months; binomial test applied to the greeting behaviors of mothers and infants).
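The initiator–follower comparison can be sketched as an exact two-sided binomial (sign) test against p = 0.5, counting, for example, the pairs in which the mother's greeting preceded the infant's. How initiations were counted is an assumption, since the text does not give the counts. As the example shows, *p* = 1.0 arises whenever the split is as even as possible, such as 3 mother-first vs. 3 infant-first pairs. The helper name `binomial_two_sided` is hypothetical.

```python
from math import comb


def binomial_two_sided(k, n):
    """Exact two-sided binomial (sign) test of k 'successes' in n trials, p = 0.5.

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    ref = comb(n, k)
    return sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= ref) / 2 ** n


# Hypothetical: the mother's greeting came first in 3 of 6 pairs
p = binomial_two_sided(3, 6)
```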

## **MATERNAL HOLDING AND INFANT PARTICIPATION**

The infant's autonomous orientation was enabled by a change in the mother's holding style. At 6 months, most mothers held their infants with one hand on the infant's bottom and the other on the infant's back or armpit (73% of Japanese pairs and 88% of Scottish pairs), rather than using both hands to support the bottom. At 9 months, only about half of the mothers held their infants this way (55% of Japanese pairs and 60% of Scottish pairs); the other half used both hands to support the bottom, a more advanced style.

# **DISCUSSION**

Complex mother–infant interactions during the phases of approaching, picking up, and holding the infant were analyzed at two developmental ages, 6 months and 9 months, which correspond to the periods immediately before and immediately after the onset of secondary intersubjectivity and joint attention. Motion analysis together with video micro-analysis revealed both age-related and cultural differences.

## **MOTHER–INFANT MOTION ATTUNEMENT IN PICK-UP AND HOLDING**

The smooth approach of the mothers' hands was produced by moment-by-moment adjustment of hand speed (Bernstein, 1967). The degree of smoothness of the approaching hand may affect the infant's anticipation and adaptive motor response. Reddy et al. (2013) found that infants as young as 2 months prepared to be picked up by their mothers, both with postural adjustments of muscle tension in the back, neck, hands, and legs and with expressive gestural communication. The developmental increase in muscular strength and regulatory autonomy appears to establish a behavioral-biological pattern that may facilitate more autonomous infant attunement with the mother's behavior, enabling a common, shared goal orientation. These behaviors also help explain the more efficient pick-up at 9 months as a more active, joint collaboration between mother and infant than at 6 months.

Holding is a complex joint action between mother and infant (Negayama et al., 2010), and is also part of a wider context that includes the approach before pick-up. Mothers and their infants under 1 year of age cooperated to make a smooth pick-up possible. Our data suggest that early social action anticipation continues to develop and improve over the first year of life, and is achieved collaboratively by mother and infant to produce culturally different fine attunements for an efficient and smooth pick-up.

Ishijima and Negayama (2013) observed Japanese mother–infant tickling interactions longitudinally and found expectant ticklishness in infants before actual touch by the mother at 6½ months of age. Such anticipation of contact in everyday interactions (e.g., tickling or holding) is a sign of awareness of the other's intention. This awareness may support social cognitive development as it progresses from primary to secondary intersubjectivity, developing 'mind-reading' of the other's intentions in anticipation of the consequences of their actions (Sinigaglia and Rizzolatti, 2011), especially during regularly patterned episodes of inter-body interaction (Delafield-Butt and Trevarthen, 2013; Trevarthen and Delafield-Butt, 2013).

The disturbance in the organization of action at 9 months in both countries, reflected in the scarcity of significant correlations among the pick-up time points, was possibly caused by the infants' greater initiative and participation at this age, which demanded compensations and adjustments from the mother. At 6 months, infants were lighter and required less strength to hold, potentially freeing one of the mother's hands for additional support. Improved postural control and self-regulated stability of the infant's upper body at 9 months may have allowed mothers to use both hands to support the bottom, so that the infant's increased weight could be supported safely. It also gave 9-month-olds more freedom to turn and face the same direction as their mothers, sharing their perspective while walking.

## **CULTURAL DIFFERENCES IN MOTHER–INFANT INTERACTIONS AT PUT-DOWN AND PICK-UP**

The paradigm of the present study can be taken as a milder version of the separation–reunion situation employed in the Strange Situation paradigm. Studies using the Strange Situation indicate that Japanese infants protest at separation (Ujiie and Miyake, 1985; Takahashi, 1990). However, Takahashi (1990) reported a reduction in the distress response of Japanese children in a situation more familiar and typical than the standardized Strange Situation. Separation in the present study, with the mother kept within sight and within a few steps of the infant, further ameliorated the distress of separation for the Japanese infants, who were calmer than their Scottish counterparts.

Scottish and Japanese mothers showed different ways of actualizing embodied intersubjectivity, with different reactions to the mild separation and reunion grounded in different cultural frameworks of the mother–infant relationship. The measured response in the reunion phase somewhat parallels the Strange Situation (Ainsworth et al., 1978), which assesses how an infant deals with separation anxiety by measuring affective expression and behavior in the reunion phase. Affect is not always expressed overtly; we are therefore interested in the reunion phase as indicative of mild separation affectivity, and in how a culture negotiates these everyday feelings of separation and reunion. In another study, Japanese mothers typically stayed with their infants when putting them to bed, until they fell asleep, and were reluctant to leave them alone. By contrast, Scottish mothers more often left their infants alone to fall asleep, even when crying (Negayama, 1997). Such cultural differences in the negotiation of separation may produce generalized, lasting differences in affective expectations in the to-and-fro dynamic of social relations.

In this study, Scottish mothers and infants explicitly tried to interact with each other, whereas Japanese mothers and infants were much less active. Almost all the Scottish infants looked at their mothers before being picked up, while considerably fewer Japanese infants did so. At 6 months, a greater proportion of the Scottish infants than of their Japanese counterparts reached their arms toward their approaching mothers and vocalized. These results suggest that the Scottish infants monitored their mothers carefully, anticipated the timing of contact, tried to interact with their mothers, and then reached out to cooperate with their mothers in the pick-up, especially at 6 months. Such expectation, cooperation, and behavioral adjustment at 6 months in Scotland, before the so-called 9-months "miracle" or "revolution" (Tomasello, 1995), is evidence of active anticipation of the patterns of participation made in regular patterns of embodied intersubjective engagement. This feature of social knowledge and awareness is clearly evident even at 2 months of age (Reddy et al., 2013), and data indicate that it is active even at birth (Nagy and Molnar, 2004; Nagy, 2011; Kugiumutzakis and Trevarthen, 2015), with a first rudimentary social awareness emerging in mid-gestation fetal life (Castiello et al., 2010). The fact that Japanese infants exhibited less active expressivity at 6 and 9 months raises important questions about the cultural nature of social anticipation and its communicated affective expression, features that underpin attachment style classification. It is possible that the Japanese infants were unaware of, or uninterested in, the social patterns of engagement, but given the studies cited above we find this unlikely. Rather, it appears that Japanese infants hold their social expectations differently, with a different impulse to share their affectivity expressively.

Several Scottish mothers kept a higher waist position, without squatting, at contact. This produced a greater mother–infant inter-body gap, significant at both 6 and 9 months (**Figure 4B**). Scottish caregivers and infants have been interpreted as more distal than proximal (Keller et al., 2009; Negayama and Kawahara, 2010), and some Scottish mothers' higher waist position might be a reflection of, or a contribution to, this greater inter-body distance. Overall, kinematic measures revealed that the final gap closure between the mother's hand and the infant's head was significantly longer in duration in Scottish pairs than in Japanese pairs at 6 months (**Figure 4A**), which also supports this notion of a more distant, or 'distal', care-giving style in the former. In contrast, the Japanese mothers brought their torso closer to the infants before final contact with their infant. This gave a longer duration, shorter distance approach with slower speed that altogether produced a gentler, more intimate closure, supporting the notion of a more 'proximal' care-giving style by Japanese mothers.

The situation was also playful: the approaching mothers and waiting infants interacted through arm/hand opening by the mothers and leg flailing by the infants, together with vocalizations. These behaviors appeared to be "greetings" expressing expectant arousal and interest in each other just before the moment of reunion, as the two were beginning to come together. The Scottish pairs appeared to be more strongly motivated to interact with each other.

These greeting behaviors occurred in a zone at around 1 m between the mother's hand and the infant's head in both countries. The similarity in this distance among participants at 9 months, in spite of the difference in experimental floor space between Japan and Scotland (see Materials and Methods), suggests the existence of a common psychological border between two different spaces at about ½ m: because the infants were placed on the floor with their legs pointing toward their mother, the 1 m head–hand distance was equivalent to an inter-body distance of about 0.5 m. Mothers and infants greeted each other when the mother crossed this border on approach. This finding is consistent with the notion of 'peri-personal space', the region immediately around the body in which multisensory interaction, including perceived illusions integrating tactile and visual sensations of the hand, occurs (Maravita et al., 2003; Lloyd, 2007). It may be that within peri-personal space the mother can achieve intimacy and security with her infant even when apart from her. The explicit greeting behaviors might also have helped mutual adjustment of the timing of behaviors for an effective pick-up.

For the Scottish pairs, this was a playful, game-like situation of mutually reading the partner's intentions in their actions and adjusting or attuning one's own behavior during the welcoming return phase, as the mother approached to pick up her infant. The situation resembles the everyday experience of putting an awake infant to bed and retrieving the infant after sleep, whereas Japanese infants are almost never separated while awake and compelled to sleep in this way (Negayama, 1997). Scottish children in a day nursery were put to sleep while crying, with much less bodily contact than in Japan (Negayama and Kawahara, 2010). Scottish mothers and infants may therefore be accustomed to greeting each other at reunion, and the infants may have developed a habit of complaining more noisily at a forced separation.

Previous evidence indicates that Japanese mothers were more empathetic with their infants than their Scottish counterparts during feeding (Negayama, 1998–1999). Given their more accommodating and less individualistic traits (Rothbaum et al., 2000), the more contact-seeking Japanese pairs might have been inclined toward a more proximal sharing, rather than a distal exchange, of positive emotion. At the same time, it is quite normal for Japanese infants to be laid on a tatami floor and picked up frequently outside the sleep context in the course of everyday life, which is similar to the procedure of the present study. Thus, the separation–reunion by put-down and pick-up may not have been particularly arousing for Japanese infants, and did not call for any special motivation toward an energetic, positive mutual engagement.

This paper expands on how differences in co-regulation and shared expressions of affect, made within culturally specific, embodied and enacted patterns of daily engagement such as those identified here, establish the early foundations of a cultural anticipation and regulation of affective expressivity within an individual, which is then propagated and adapted throughout later childhood and adult life. Learning the expectations and patterns of co-regulation of feelings, and their expressive forms manifest in play and everyday rituals of companionship, defines and builds the character of a community and its cultural forms of expression (Bruner, 1990; Frank and Trevarthen, 2012).

## **CONCLUSION**

In this paper, we have identified and measured early features of the regulation of affect, expression, and motor pattern in an everyday embodied intersubjective engagement. We have provided evidence of the participatory nature of the interaction from both sides, mother and infant. Future studies will help to map the ontogenesis of a culture in more detail, to discern differences in the elements of social expectation, affectivity, and expressivity, and to clarify the contribution of both parents and infants to its specific form. Such studies may help to elucidate not only the genesis of the cultural form of nations, but also differences in patterning during distress or in cases of pathology. Elucidating cultural patterns of development is an important route to understanding cognitive as well as socio-emotional development.

## **ACKNOWLEDGMENTS**

We kindly thank the mothers and their infants for their time and participation, without whom this work would not have been possible. This project was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 23653193 to Koichi Negayama, MEXT KIBANKEISEI Grant Number S1001028 to Dr H. Kumano, a Visiting Fellowship (S-1307) from the JSPS to Jonathan T. Delafield-Butt, and a Bridging the Gap grant from the University of Strathclyde to Jonathan T. Delafield-Butt and Andrew Murphy. We also thank Kerry Gunn for her assistance establishing the Scottish side of the study.

## **REFERENCES**


Stern, D. N. (1985). *The Interpersonal World of the Infant*. New York: Basic Books.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 May 2014; accepted: 13 January 2015; published online: 27 February 2015.*

*Citation: Negayama K, Delafield-Butt JT, Momose K, Ishijima K, Kawahara N, Lux EJ, Murphy A and Kaliarntas K (2015) Embodied intersubjective engagement in mother–infant tactile communication: a cross-cultural study of Japanese and Scottish mother–infant behaviors during infant pick-up. Front. Psychol. 6:66. doi: 10.3389/fpsyg.2015.00066*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Negayama, Delafield-Butt, Momose, Ishijima, Kawahara, Lux, Murphy and Kaliarntas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Playful expressions of one-year-old chimpanzee infants in social and solitary play contexts

#### *Kirsty M. Ross<sup>1,2</sup>, Kim A. Bard<sup>2</sup>\* and Tetsuro Matsuzawa<sup>3</sup>*

*<sup>1</sup> Department of Psychology, University of Winchester, Winchester, UK*

*<sup>2</sup> Department of Psychology, Centre for Comparative and Evolutionary Psychology, University of Portsmouth, Portsmouth, UK*

*<sup>3</sup> Department of Behavioral and Brain Sciences, Primate Research Institute, Kyoto University, Kyoto, Japan*

#### *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

#### *Reviewed by:*

*Jonathan T. Delafield-Butt, University of Strathclyde, UK*

*Elisabetta Palagi, University of Pisa, Italy*

#### *\*Correspondence:*

*Kim A. Bard, Department of Psychology, Centre for Comparative and Evolutionary Psychology, University of Portsmouth, King Henry Building, King Henry I Street, Portsmouth PO1 2DY, UK; e-mail: kim.bard@port.ac.uk*

Knowledge of the context and development of playful expressions in chimpanzees is limited because research has tended to focus on social play, on older subjects, and on the communicative signaling function of expressions. Here we explore the rate of playful facial and body expressions in solitary and social play, changes from 12 to 15 months of age, and the extent to which social partners match expressions, which may illuminate a route through which context influences expression. Naturalistic observations of seven chimpanzee infants (*Pan troglodytes*) were conducted at Chester Zoo, UK (*n* = 4), and the Primate Research Institute, Japan (*n* = 3), at two ages, 12 months and 15 months. No group or age differences were found in the rate of infant playful expressions. However, modalities of playful expression varied with type of play: in social play, the rate of play faces was high, whereas in solitary play, the rate of body expressions was high. Among the most frequent types of play, mild contact social play had the highest rates of play faces and multi-modal expressions (often play faces with hitting). Social partners matched both infant play faces and infant body expressions, but play faces were matched at a significantly higher rate that increased with age. Matched expression rates were highest when playing with peers, even though infant expressiveness was highest when playing with older chimpanzees. Given that playful expressions emerge early in life and continue to occur in solitary contexts through the second year of life, we suggest that the play face and certain body behaviors are emotional expressions of joy, and that such expressions develop additional social functions through interactions with peers and older social partners.

**Keywords: play face, communication, emotion, development, chimpanzee, infancy**

# **INTRODUCTION**

Chimpanzee playful expressions have typically been studied within social contexts, driven primarily by an interest in communicative function. However, solitary play is a distinctive feature of chimpanzee infancy, and playful expressions have been reported during solitary play (Cordoni and Palagi, 2011). Therefore, the study of playful expressions is incomplete without considering their occurrence in a variety of social and solitary contexts. Comparisons across contexts are essential in evaluating the extent to which these expressions function as social signals, as expressions of an individual's emotional state, or as some combination of the two (Seyfarth and Cheney, 2003; Gaspar, 2006). Moreover, social partners sometimes match playful expressions, which prolongs play bouts (Davila-Ross et al., 2011). Here we explore the rate of playful facial and body expressions in solitary and social play, and the extent to which social partners match expressions, which may illuminate a route through which context influences expression.

Chimpanzee play is punctuated by a variety of facial, vocal, and body expressions. These expressions convey information about an individual's motivations, intentions, and emotions, which may influence the recipient's behavior (see Owren et al., 2010; Seyfarth et al., 2010, for debate on the importance of information vs. influence in communicative signals). Play faces (relaxed open mouth displays with the teeth either covered by the lips or exposed to varying degrees) and the laughter-like vocalizations which sometimes accompany play faces (soft, breathy pants or grunts) appear almost exclusively during play (van Hooff, 1973; Parr et al., 2005; Davila-Ross et al., under review). Play faces can help to initiate and maintain play (Tomasello, 2008), and matching of play faces and laughter by social partners prolongs the duration of play bouts (Waller and Dunbar, 2005; Davila-Ross et al., 2011).

Many expressive body behaviors are observed during chimpanzee play, including hitting and kicking, raised arms, ground slaps, foot stomps, pokes, head bobs, hand claps, and throwing (Tomasello et al., 1994; McCarthy et al., 2013). These behaviors are not exclusive to play and can also occur in more aggressive contexts. Play faces, when combined with such potentially ambiguous behaviors, may function to modify the meaning of these behaviors and clarify to social partners and observers that these behaviors are playful rather than aggressive (Pellis and Pellis, 1996; Bekoff and Allen, 1998; Palagi and Mancini, 2011). Chimpanzees may use certain behaviors, such as throwing objects and hand clapping, to draw attention to the play face or other visually perceived gestures (Leavens et al., 2004; Liebal et al., 2004a,b; Tomasello, 2008). Juveniles have been observed to adjust the frequency of their play face displays during high-intensity rough and tumble play according to the age of their social partner and the audience, providing evidence that play faces, in combination with other behaviors, act as signals that reduce the uncertainty of play partners and observers (Flack et al., 2004). However, play faces are neither necessary nor sufficient to determine whether behaviors are playful; situational cues and behavioral sequences also contribute to the interpretation of playfulness (Pellis and Pellis, 1996; Bekoff, 1998).

Chimpanzee infants are capable of using a large repertoire of playful expressions by the end of their first year. Play faces and laughter appear within the first 2–3 months of life, often in response to gentle tickling by mothers (van Lawick-Goodall, 1968; Bard, 2002; Bard et al., 2011). Tickle request gestures, in which the arms reach backwards over the shoulders, develop over the first year. Although Plooij (1978, 1979) argued that this communicative gesture emerged from a defense mechanism, Bard et al. (2014b) demonstrate that it develops gradually and is grounded in intersubjective meaning-making. There is general agreement, however, that this gesture is used to initiate and maintain play with mothers and other adults. Other forms of playful body expression also appear around the end of the first year, coinciding with infants exploring farther away from their mothers and interacting with other social partners (Schneider et al., 2012).

The emotional aspect of chimpanzee playful expressions has been somewhat neglected because of the focus on their communicative value. However, expressiveness of chimpanzees develops in interaction with their early socio-emotional environments (Bard, 2005; Bard and Leavens, 2009). Emotional tone cannot be separated from playful expressions and indeed emotions may be an integral component of successful communication (Bard et al., 2004; Parkinson, 2005; Gaspar, 2006; Bard et al., 2014b) with further links between flexibility in expressiveness, attractiveness, social cognition, and social popularity (Bard et al., 2011, 2014a). Chimpanzees are sensitive to the emotional tone of facial expressions, and can match facial expressions to emotional video scenes, beyond prototypical associations, in experimental settings (Parr, 2003). Furthermore, asymmetries in chimpanzee facial expressions suggest right hemisphere lateralization consistent with emotional signals (Fernández-Carriba et al., 2002).

The basic emotional systems in the brain are similar across all mammals, both neuroanatomically and neurochemically, yet the capacity of non-human animals to experience emotion is denied or overlooked in much behavioral research (Panksepp, 2011). Panksepp (1998) has identified seven emotional operating systems in the mammalian brain (denoted by upper-case letters); some of these systems are evident from birth, whereas others, such as the PLAY system, are engaged at appropriate times in ontogenetic development. The emotional system for PLAY is primarily engaged in the infancy and juvenile periods, with remarkable similarity across mammalian species in the motivation to engage in physical rough and tumble play. Playful activity is often accompanied by expressive behaviors indicative of joy (such as the high-pitched chirping "laughter" of rats, or the smiles and laughter of human infants) (Panksepp, 1998; Panksepp and Biven, 2012). The open-mouthed smiles expressed by human infants are indicative of excited arousal, playfulness, and joy, and they are similar morphologically and functionally to the chimpanzee play face (Messinger and Fogel, 2007). Several parallels are evident in the development of play behaviors in human and chimpanzee infancy: social smiles appear in the first few weeks, typically during gentle play with the mother; laughter follows at around 3 to 4 months, often in response to tactile stimulation such as tickling; mothers are sensitive and responsive to infant expressions; and increasingly varied types of play appear later in the first year as socio-cognitive and motor skills develop and infants begin to explore opportunities for social and solitary play with their mother as a secure base (van Lawick-Goodall, 1968; Plooij, 1979; Bard, 2002; Messinger and Fogel, 2007; Bard et al., 2011, 2014b).

If we accept that human infants experience and express joy during these playful behaviors, then it seems fair to assume that chimpanzee infants also experience and express joy during similar playful behaviors. A contextual approach to the examination of chimpanzee playful expressions may help to illuminate the flexibility of their communicative and emotional functions, and identify those aspects of expression that are particularly influenced by the socio-emotional environment.

Chimpanzee infancy is a particularly interesting period for the contextual examination of playful expressions, since play is more frequent and more diverse than at any other age. The frequency of chimpanzee play peaks around late infancy (van Lawick-Goodall, 1968; Savage and Malick, 1977; Lewis, 2005), with solitary play, object play, and locomotor play being particularly characteristic of infant play (Markus and Croft, 1995; Mendoza-Granados and Sommer, 1995; Nishida and Inaba, 2009; Cordoni and Palagi, 2011; Myowa-Yamakoshi and Yamakoshi, 2011). Social play behaviors develop rapidly during infancy. Tickle play and chase play have different developmental chronologies and require different gestural skills, even in infancy (Bard et al., 2014b). Infant rough and tumble play does not fully resemble the play fighting of juveniles and older chimpanzees, but ranges from mild sparring in early infancy to more boisterous behaviors in later infancy (van Lawick-Goodall, 1968). Moreover, infant social play is less complex than that of juveniles, being characterized by a few highly repeated behaviors and greater asymmetry between play partners (Cordoni and Palagi, 2011). Infant social play may be functionally different from juvenile social play; infant play may help to develop social and motor skills, whereas juvenile and adolescent play may influence social dominance relationships (Byers and Walker, 1995; Burghardt, 2006; Palagi and Cordoni, 2012).

The context of play may influence the presence of an expression and the rate of expression. Play faces have been observed during infants' solitary play, though at a lower rate than during social play (Spijkerman et al., 1996; Cordoni and Palagi, 2011). Thus, the signal function of play faces may have even greater complexity than suggested by studies which concentrate on social play with predominantly older age groups. Less is known about the appearance of body expressions and multimodal expressions across the diverse contexts of infant play and the appearance of matched expressions in the context of social play. Comparisons of the modality of playful expressions across diverse types of infant play, and the matching of different modalities by social partners, can add to discussions about the functions of these expressions.

The purpose of the current study was to explore playful expressions across the diverse contexts of chimpanzee infant play to get a broad perspective on the communicative and emotional aspects of playful expressions. Infants were observed at the beginning of their second year to coincide with increased exploration at distances beyond arm's reach of their mothers, which broadens the range of social and solitary playful activities available to the infants (van de Rijt-Plooij and Plooij, 1987; Schneider et al., 2012). The whole-body nature of playful expressions was considered, with attention given to play faces, playful body expressions, and multimodal facial and body expressions. Our approach was based upon studies of joyful emotional expression in human infancy, in which researchers code multiple behaviors as indicative of joy, including smiles, vocalizations, and positive motor activity (e.g., Aksan and Kochanska, 2004; Messinger and Fogel, 2007; Langerock et al., 2013).

There were two hypotheses. The first hypothesis was that rates of playful expressions would vary both by modality of expression and by play context. This prediction was based on our expectation that different modalities of expression would have different functions. For example, rates of play faces were expected to be higher during social play than solitary play, in line with previous research. Few studies have considered body expressions and multimodal expressions, but we thought that they might be differentially evident in different types of social play; e.g., multimodal expressions might occur more often during play fighting, since play faces are thought to clarify the meaning of potentially ambiguous body expressions such as hitting. The second hypothesis was that social partners would match playful expressions of infants, as this would be one developmental process by which the communicative meaning of expressions might become established.

The influences of age and group setting were examined in addition to the two hypotheses stated above. Infants were observed at two ages, 12 and 15 months. Since the frequency of play increases steeply during infancy it was important to consider any age effects. We collected data from chimpanzee infants living in two group settings; all infants had similar experiences of good maternal care and interactions with non-maternal social partners, but the groups differed in size, in composition, and in daily routines, so it was important to examine group differences.

# **METHODS**

The study was approved by the Department of Psychology Ethics committee at the University of Portsmouth and permission to collect videotaped observations was granted by Chester Zoo, England, and the Primate Research Institute, Kyoto University, Japan. The research adhered to the legal requirements of the countries in which it was conducted; to the Guide for the Use and Care for Non-human Primates by the Primate Research Institute; and to the American Society of Primatologists (ASP) Principles for the Ethical Treatment of Non-Human Primates.

# **SUBJECTS**

Seven chimpanzee infants were observed at the beginning of their second year at Chester Zoo (CZ), England, and the Primate Research Institute (PRI), Kyoto University, Japan. See **Table 1** for demographic details. Infants within each group were born within 6 months of each other, and received good maternal care. Thus, there was opportunity for peer play and mother-infant play, alongside other types of play. During the day, both groups had access to a large outdoor garden, an indoor area, and climbing frames.

# *Chester Zoo, England*

The subjects were four infant chimpanzees living in a group with 27 other chimpanzees. Other group members were five adult males (18–40 years old), nineteen adolescent and adult females (8–35 years old), an older female infant (1.5 years old, born 3 months before the oldest focal infant), and two juvenile males (6 years old and 2.5 years old). All infants were raised by their mothers without intervention from the keepers. Mothers had been raised by their own mothers at Chester Zoo. The group had minimal interaction with keepers apart from daily health checks through bars and the supply of food.

# *Primate Research Institute, Kyoto University, Japan*

The subjects were three infant chimpanzees living in a group with 11 other chimpanzees. Other group members were three adult males (19–35 years old) and eight adult females (18–35 years old). Infants were raised successfully by their mothers despite their mothers' early rearing histories involving human caregivers. Prior to giving birth, mothers had received training in infant care by watching videos of wild chimpanzee mothers and infants and by practicing with a chimpanzee baby doll. The PRI group had daily interactions with human researchers in testing areas, where they were given experimental tasks and had the opportunity to manipulate a variety of objects. Infants had been attending these sessions with their mothers since shortly after birth (Matsuzawa et al., 2006).

# **OBSERVATIONAL PROCEDURE**

Observations took place from April to November 2001 (PRI) and from December 2005 to August 2006 (CZ) using the method of focal animal sampling (Altmann, 1974). Infants were observed at two ages (first observation: mean = 12.1, range: 11.4–12.5 months; second observation: mean = 15.0, range: 14.4–15.5 months). Observations were videotaped for later analysis. The PRI infants were observed during times when the infants and their mothers were engaging in everyday activities in their indoor and outdoor enclosures without any interaction with human observers (i.e., typically on Saturdays when there was no morning testing). Two to three hours of video were available for each PRI infant (1–2 h at each age). The CZ infants were observed during zoo opening hours (typically 10.00–16.00 h). Six hours of video were selected for each CZ infant (3 h at each age) as a representative sample of all observations.

*Age of mother was determined at the start of the observations. <sup>a</sup>Died in early infancy. <sup>b</sup>Age is approximate as birth date unknown.*

The first author pre-screened the videos for playful behavior using INTERACT coding software from Mangold International. The behavior of focal infants was coded in 30-s intervals as playful, not playful, or not visible. Infants were judged to be exhibiting playful behavior when they were relaxed, alert, and positively engaged in an activity that did not meet any immediate physical need such as sustenance or comfort. Some exploratory behaviors were included within this definition. Reliability was tested by a second coder who coded 13% of the videos (396 min). Observed agreement was 92%, Cohen's kappa = 0.83. Playful behavior was observed in 55% of intervals on average (±*SD* 9%). The total time spent engaging in playful behaviors was 1006 min (mean per infant = 287 ± *SD* 112 30-s intervals), and these minutes were subject to further coding.
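Cohen's kappa adjusts observed agreement for the agreement two coders would reach by chance given how often each uses each code: κ = (p<sub>o</sub> − p<sub>e</sub>)/(1 − p<sub>e</sub>). The sketch below computes both quantities from two coders' interval-by-interval codes. The function name and the example codes are hypothetical and are not the study's data. The authors coded in INTERACT and do not say how kappa was computed, so this shows only the standard two-coder formula.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length code sequences."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of intervals")
    n = len(coder_a)
    # Observed agreement: share of intervals given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    codes = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in codes) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical 30-s interval codes from two coders (not the study's data).
first = ["playful"] * 4 + ["not playful"] * 3 + ["playful", "not playful", "playful"]
second = ["playful"] * 3 + ["not playful"] * 4 + ["playful", "not playful", "playful"]
p_o, kappa = cohens_kappa(first, second)
print(f"observed agreement = {p_o:.0%}, kappa = {kappa:.2f}")
```

In this toy example the coders agree on 9 of 10 intervals (90%). Each coder's marginal frequencies give a chance agreement of 0.5, so kappa is 0.80. Kappa therefore falls well below raw agreement when one code dominates.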

## **CODING PROCEDURE**

General playful behaviors (as identified through the pre-screening of videos) were micro-analyzed in 5-s intervals by the first author to identify play types, play partners, playful expressions, and playful expression matching by social partners. The coding schemes are described below.

## *Play context*

The playful behavior of the focal infant was coded as *social play, solitary play, not playful*, or *not visible*. Social play was coded when the infant directed playful behaviors toward another chimpanzee, regardless of response of the partner. Solitary play was coded if the infant was playing alone without visually attending to, or having any other playful contact with, another chimpanzee. For some analyses, social and solitary play were subdivided into 10 mutually exclusive and exhaustive sub-types of play (see **Table 2** for descriptions).

# *Social partners*

The partners of the focal infant during social play intervals were coded as *mother*, *adolescent/adult* (8 years old or older), *peer* (any other infant), *juvenile* [any chimpanzee between 2.5 and 6 years old], or *not visible.* Juveniles were only present at CZ and not at PRI. The 2.5-year-old male at CZ was classed as a juvenile in the present study because he was highly independent from his mother in terms of body contact and transportation, at least during daytime observations (Goodall, 1965; van Lawick-Goodall, 1968, 1972; Bard, 2002). The codes mother and adolescent/adult were combined into *older chimpanzees* because two infants were rarely observed to play with their mothers but were observed playing with other adults when in close proximity to their mothers.

**Table 2 | Description of social and solitary play types.**


*<sup>a</sup>Infants were never observed to tickle another individual.*

## *Playful expressions*

Infant facial expressions and body expressions were coded for all 5-s playful intervals in which the face and body of the focal infant were fully visible (67% of all playful intervals). The facial expressions of focal infants were coded as *play face*, *no play face*, or *not visible*. Play face was coded when the mouth was partly or fully open, the lower jaw was relaxed and dropped, and the teeth could be either visible or not visible. The body expressions of focal infants were coded as *playful body*, *no playful body*, or *not visible*. Playful body was coded when limb or body movements in the context of play were quick, exaggerated, deliberate, and often repetitive. For some analyses, playful body was subdivided into five mutually exclusive and exhaustive codes: *acrobatics* (spins, rolls, tumbles, swings), *bouncing* (repetitive up and down body movements), *flailing limbs*, *hitting*, and *tickle request gestures*. Unfortunately, play laughs were not detectable under these observational conditions.

## *Matched playful expressions*

The expressions of play partners were coded for all intervals where a focal infant displayed a playful facial or body expression. This was a measure of the co-occurrence of playful expressions between the infant and a play partner. Intervals with infant play faces were coded as *play face match* (both the infant and the play partner display a play face), *no play face match*, or *not visible*. Intervals with infant body expressions were coded as *body match* (both infant and play partner display a body expression of the same type), *no body match*, or *not visible*. A time-series analysis of expression synchrony was not attempted since observations in captive group enclosures meant that the view of the focal infant or their play partner was often obscured.

## *Reliability*

Reliability was tested by comparing the codes of a third coder to the codes of the first author for 14% of the 5-s intervals available for microanalysis (1646 intervals, taken from 4 h of observation of one chimpanzee). Good to excellent reliability (Bakeman and Gottman, 1997) was found for each coding scheme (observed agreement and Cohen's kappa scores, respectively): *play context*, 91%, kappa 0.89; *infant facial expression*, 87%, kappa 0.79; *infant body expression*, 94%, kappa 0.85; *matched play faces*, 88%, kappa 0.82; *matched body expressions*, 93%, kappa 0.85. Reliability was not tested for the social partner coding scheme since this was based on identification of individuals rather than judgments about behavior.

# **DATA ANALYSIS AND STATISTICS**

Statistical analyses were conducted using mean proportions of play time, mean rates of playful expression, and mean rates of playful expression matching (*N* = 7 unless otherwise stated). See **Table 3** for descriptions of how mean rates were calculated. The maximum possible rate of playful expression and matched playful expression was 12 intervals per minute (ipm) given that a minute of play consisted of 12 × 5-s intervals.
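The contents of Table 3 are not available in this text. The sketch below therefore assumes that an infant's rate is the number of 5-s intervals containing an expression divided by that infant's minutes of play (12 intervals per minute), and that these per-infant rates are then averaged across infants. The constant and function names are hypothetical and chosen for illustration only.

```python
INTERVAL_S = 5                          # length of a microanalysis interval (s)
INTERVALS_PER_MIN = 60 // INTERVAL_S    # 12 intervals make up one minute of play


def rate_ipm(expression_intervals, play_intervals):
    """Intervals containing an expression per minute of play (ipm); at most 12."""
    if play_intervals <= 0:
        raise ValueError("need at least one interval of play")
    if not 0 <= expression_intervals <= play_intervals:
        raise ValueError("expression intervals must lie between 0 and the play intervals")
    play_minutes = play_intervals / INTERVALS_PER_MIN
    return expression_intervals / play_minutes


def mean_rate(per_infant):
    """Mean of per-infant rates, given (expression_intervals, play_intervals) pairs."""
    rates = [rate_ipm(e, p) for e, p in per_infant]
    return sum(rates) / len(rates)
```

An expression in every interval gives `rate_ipm(12, 12) == 12.0`, the ceiling stated above. Pooling the 1298 expression intervals over all 5059 visible play intervals gives about 3.08 ipm. The Results report 3.04 ipm as a mean across infants, and under this assumption a mean of per-infant rates need not equal the pooled value.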

Repeated measures ANOVA was the main statistical tool unless otherwise stated (*N* = 7). Greenhouse-Geisser corrected values were reported when the assumption of sphericity was violated. Where there were comparisons of two means, repeated measures ANOVA was preferred to the equivalent *t*-test since it allowed examination of effect sizes (partial eta-squared). Mann-Whitney *U*-tests were used when making comparisons between groups because of the small and uneven sample sizes. The null hypothesis was rejected at an alpha level of 0.05.
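Partial eta-squared, SS<sub>effect</sub>/(SS<sub>effect</sub> + SS<sub>error</sub>), can be recovered from an *F* ratio and its degrees of freedom as *F*·*df*<sub>effect</sub>/(*F*·*df*<sub>effect</sub> + *df*<sub>error</sub>). The sketch below applies this standard identity to several *F* ratios reported in the Results as a consistency check. It is not the authors' analysis pipeline, and the function name is hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    if f < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f * df_effect) / (f * df_effect + df_error)


# F ratios and df as reported in the Results section.
reported = {
    "solitary vs social play time": (48.52, 1, 6),
    "play context effect on expression rate": (81.12, 1, 6),
    "modality (Greenhouse-Geisser df)": (14.28, 1.14, 6.82),
    "modality x context (Greenhouse-Geisser df)": (28.62, 1.04, 6.25),
}
for label, (f, df1, df2) in reported.items():
    print(f"{label}: {partial_eta_squared(f, df1, df2):.2f}")
```

The loop prints 0.89, 0.93, 0.70 and 0.83, matching the reported effect sizes. The Greenhouse-Geisser correction multiplies both degrees of freedom by the same ε, and ε cancels in this ratio. With three modalities and seven infants, the uncorrected *df* of 2, 12 therefore give the same 0.70 for the modality effect.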

# **RESULTS**

# **HYPOTHESIS 1: ARE THERE DIFFERENCES IN PLAYFUL EXPRESSIONS AS A FUNCTION OF GROUP, AGE, CONTEXT, OR TYPE OF SOCIAL PARTNER?**

Microanalysis in 5-s intervals identified 5059 intervals of play in which the face and body of the focal infants were visible. Playful expressions were present in 26% of these intervals (1298 intervals), resulting in a mean playful expression rate of 3.04 intervals per minute of play (±*SD* 0.81 ipm). Most playful expressions were classified as either play faces (49%) or playful body expressions (38%); multimodal play face and body expressions accounted for a small proportion of expressions (13%).

## **Table 3 | Description of mean rate calculations.**

## **GROUP SETTINGS**

The two settings differed in the size and composition of their social groups and in their daily routines. Differences across settings in the mean proportion of infant play time spent engaging in different types of play and with different social partners were examined using Mann-Whitney tests. The CZ infants engaged in social rough and tumble play to a greater extent than the PRI infants (CZ: 6% of play time ± *SD* 4%; PRI: 2% ± *SD* 2%; *Z* = 2.12, *p* < 0.05), and they had opportunity to engage with juveniles (28% of social play time ± *SD* 8%; no juveniles at PRI; *Z* = 2.20, *p* < 0.05). The PRI infants engaged in social tickle play to a greater extent than the CZ infants (PRI: 7% of play time ± *SD* 2%; CZ: 1% ± *SD* 1%; *Z* = 2.12, *p* < 0.05). For all other types of social and solitary play and social partners, there were no significant group differences (*Z*s < 1.78, *p*s > 0.07).

A comparison of the rate of playful expressions in the CZ group (mean rate = 2.98 ± *SD* 1.13 ipm, *n* = 4) and in the PRI group (mean rate = 3.11 ± *SD* 0.17 ipm, *n* = 3) showed no significant difference (Mann-Whitney *U*-test: *Z* = 0.35, *p* = 0.72). Group had no significant effect on play face rate, body rate, or multimodal rate during either social or solitary play (Mann-Whitney *U*-tests: *Z*s < 1.41, *p*s > 0.16).
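The article does not state which variant of the Mann-Whitney test produced these *Z* values. The sketch below assumes the large-sample normal approximation, with a correction for tied ranks and no continuity correction. This is one standard form and not necessarily the authors' software. The function names and the example proportions are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_z(group_a, group_b):
    """U for group_a and its normal-approximation Z (tie-corrected, no continuity correction)."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("Z is undefined when every value is tied")
    return u, (u - n1 * n2 / 2) / math.sqrt(var_u)


# Hypothetical proportions of play time (not the study's data): CZ (n = 4) vs PRI (n = 3).
cz = [0.06, 0.04, 0.10, 0.05]
pri = [0.02, 0.01, 0.03]
u, z = mann_whitney_z(cz, pri)
print(f"U = {u:.0f}, Z = {z:.2f}")
```

With *n* = 4 and *n* = 3 and no ties, complete separation of the two groups gives *U* = 12 and *Z* = 6/√8 ≈ 2.12. This is the largest value attainable without ties, and it matches the *Z* reported above for rough and tumble and tickle play. For juvenile partners the three PRI scores are necessarily tied at zero. If the CZ scores are distinct, the tie correction raises *Z* to about 2.20, which is consistent with the reported value.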

## **AGE**

Since play behaviors were broadly similar in the two group settings, the groups were collapsed for the age analyses. Age had no significant effect on the proportion of infant play time that was social or solitary (*F* = 1.94, *df* = 1, 6, *p* = 0.21, η<sup>2</sup><sub>p</sub> = 0.24). The effects of play sub-type and age on infant play time were examined; there was no significant effect of age (*F* = 0.00, *df* = 1, 6, *p* = 1.00, η<sup>2</sup><sub>p</sub> = 0.00) and no significant interaction (*F* = 1.61, *df* = 8, 48, *p* = 0.15, η<sup>2</sup><sub>p</sub> = 0.21). Infant expression rate was examined by age, modality, and play type (social, solitary); there was no significant effect of age (*F* = 0.00, *df* = 1, 6, *p* = 0.96, η<sup>2</sup><sub>p</sub> = 0.00) and no significant interaction involving age (*F*s < 1.34, *p*s > 0.29, η<sup>2</sup><sub>p</sub>s < 0.18).
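Each partial eta squared reported in these results can be recovered from its *F* value and degrees of freedom through the standard identity η<sup>2</sup><sub>p</sub> = *F* × df<sub>effect</sub> / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The short check below (the function name is hypothetical) reproduces the two non-zero values from the age analyses above.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# F values and df reported in the age analyses
for f, df1, df2 in [(1.94, 1, 6), (1.61, 8, 48)]:
    print(f"F({df1}, {df2}) = {f}: partial eta^2 = {partial_eta_squared(f, df1, df2):.2f}")
# F(1, 6) = 1.94: partial eta^2 = 0.24
# F(8, 48) = 1.61: partial eta^2 = 0.21
```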

Age and group setting had no significant effects on rates of infant expressions, so the two ages and the two group settings were collapsed for the following analyses by play context and by type of social partner.

## **SOCIAL vs. SOLITARY PLAY CONTEXT**

Infant play time consisted of a higher proportion of solitary play than social play (mean solitary = 66% ± *SD* 6%; mean social = 34% ± *SD* 6%; *F* = 48.52, *df* = 1, 6, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.89). Playful expression rate was examined by play context and by modality, and there was a significant effect of play context (*F* = 81.12, *df* = 1, 6, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.93), a significant effect of modality (*F* = 14.28, *df* = 1.14, 6.82, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.70), and a significant interaction between modality and play context (*F* = 28.62, *df* = 1.04, 6.25, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.83). *Post-hoc* comparisons (see **Figure 1**) showed that play face rate and multimodal rate were significantly higher during social play than during solitary play, while body rate did not differ by play context. During social play, play face rate was significantly higher than body rate and multimodal rate. During solitary play, play face rate was significantly lower than body rate and significantly higher than multimodal rate. All six expression rates shown in **Figure 1** were significantly higher than 0 (i.e., the 95% confidence interval surrounding the intercept did not include 0; *t*s > 2.95, *p*s < 0.03).
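The non-integer degrees of freedom (e.g., *df* = 1.14, 6.82 for modality, which has three levels and therefore uncorrected *df* = 2, 12 with seven infants) are consistent with a sphericity correction such as Greenhouse-Geisser, in which both *df* are multiplied by an estimate ε (here roughly 0.57). The text does not name the correction, so this is an assumption. The sketch computes the Greenhouse-Geisser ε from a subjects × conditions table using only the standard library; the function names and data are hypothetical. Because ε scales both *df*, it leaves η<sup>2</sup><sub>p</sub> unchanged (14.28 × 2 / (14.28 × 2 + 12) ≈ 0.70, the same as with 1.14 and 6.82).

```python
def covariance_matrix(data):
    """Sample covariance (n - 1 denominator) between the columns of a
    subjects x conditions table given as a list of rows."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    s = covariance_matrix(data)
    k = len(s)
    row_means = [sum(s[i]) / k for i in range(k)]
    col_means = [sum(s[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    c = [[s[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(k)]
         for i in range(k)]
    trace = sum(c[i][i] for i in range(k))
    sum_sq = sum(c[i][j] ** 2 for i in range(k) for j in range(k))
    if sum_sq == 0:
        return 1.0  # no variability in the differences: treat sphericity as met
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(epsilon, k, n):
    """Scale the within-subject df (k - 1, (k - 1)(n - 1)) by epsilon."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Hypothetical rates (ipm) for 7 infants x 3 modalities; not the study's data.
rates = [
    [4.1, 1.0, 0.9],
    [3.2, 1.4, 0.5],
    [5.0, 0.8, 1.2],
    [2.7, 1.9, 0.4],
    [3.9, 1.1, 0.7],
    [4.4, 0.6, 1.0],
    [3.0, 1.5, 0.6],
]
eps = greenhouse_geisser_epsilon(rates)
df1, df2 = corrected_df(eps, k=3, n=len(rates))
print(f"epsilon = {eps:.2f}, corrected df = {df1:.2f}, {df2:.2f}")
```

Epsilon always lies between 1/(k - 1) and 1, so with three modalities the corrected *df* can shrink to at most half of the uncorrected values.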

### *Body expressions*

Body expressions were subdivided into five types: hitting (32%), acrobatics (28%), flailing limbs (22%), bouncing (15%), and tickle requests (2%). Expression rate was examined by body type and play context. Body type had a significant effect on expression rate (*F* = 5.80, *df* = 4, 24, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.49), and there was a significant interaction between body type and play context (*F* = 5.01, *df* = 1.86, 11.17, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.46) (**Figure 2**). Pairwise comparisons (Bonferroni adjusted) found that the rates of hitting and acrobatics were higher than the rate of tickle requests. Comparisons of expression rates for each body type across social and solitary play found no significant differences despite some moderate effect sizes (*F*s < 5.97, *df* = 1, 6, *p*s > 0.05, η<sup>2</sup><sub>p</sub> range = 0.12–0.50). Note that although tickle request expressions were observed only during social play, four infants never displayed this expression. During social play, the rates of acrobatics, hitting, and flailing limbs were significantly higher than 0 (i.e., the 95% confidence interval of the intercept did not include 0; *t*s > 2.91, *p*s < 0.03). During solitary play, all expression rates were significantly higher than 0 (*t*s > 3.02, *p*s < 0.03), with the exception of tickle requests.

**FIGURE 1 | Mean rate (intervals per minute of play, with SE) of chimpanzee infants' playful expressions, as a function of modality of expression and type of play.** The modality × play type interaction was examined by comparing playful expression rates for each modality across social and solitary play contexts (paired *t*-tests) and by comparing the playful expression rates for each modality within each play context (one-way ANOVA with simple contrasts). ∗*p* < 0.05, ∗∗*p* < 0.01, ∗∗∗*p* < 0.001.

### *Multimodal body and play face expressions*

Multimodal expressions were subdivided into five types: play face with hitting (48%), play face with flailing limbs (20%), play face with tickle request (13%), play face with acrobatics (12%), and play face with bouncing (6%). Expression rate was examined by multimodal type and play context. Rates differed significantly by multimodal type (*F* = 7.37, *df* = 4, 24, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.55), and there was a significant interaction between multimodal type and play context (*F* = 6.01, *df* = 4, 24, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.50) (**Figure 3**). Pairwise comparisons (Bonferroni adjusted) showed that the rate of play face with hitting was significantly higher than the rate of play face with flailing limbs (mean difference = 0.179, *p* < 0.05). The multimodal type × play context interaction was examined by comparing the expression rate for each multimodal type across social and solitary play. One type of expression, play face with hitting, was displayed at a significantly higher rate during social play than during solitary play (*F* = 16.57, *df* = 1, 6, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.73); none of the other types of multimodal expression differed significantly by play context. Note that although play face with tickle request expressions were observed only during social play, three infants never displayed this multimodal expression. Only two types of multimodal expressions occurred at rates significantly higher than 0: play face with hitting during social play (*t* = 4.28, *p* < 0.01) and play face with acrobatics during solitary play (*t* = 3.485, *p* < 0.05).

The rates of most body expression types accompanied by play faces were significantly lower than the rates of the same body expressions without play faces [bouncing, *F* = 9.14, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.61; acrobatics, *F* = 19.44, *df* = 1, 6, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.76; hitting, *F* = 6.85, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.53; and flailing limbs, *F* = 10.97, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.65]. The rate of tickle request with play face, however, did not differ from the rate of tickle request without play face (*F* = 0.88, *df* = 1, 6, *p* = 0.38, η<sup>2</sup><sub>p</sub> = 0.13).

### *Sub-types of play*

Social and solitary play were divided into seven sub-types of play: locomotor solitary (48%), object solitary (19%), locomotor social (9%), mild contact social (14%), rough and tumble social (5%), invite social (1%), and object social (1%) (other solitary and social play < 0.5%).

Four sub-types of play (solitary locomotor play, solitary object play, social mild contact play, and social locomotor play) occurred with sufficient frequency to allow expression rates to be calculated for all infants. Expression rate was examined by play sub-type and modality. The effect of play sub-type was significant (*F* = 30.82, *df* = 3, 18, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.84), the effect of modality was significant (*F* = 17.86, *df* = 1.10, 6.62, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.75), and the interaction between modality and play sub-type was also significant (*F* = 8.06, *df* = 2.30, 13.81, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.57) (**Figure 4**). To examine the interaction effect, expression rate was examined by play sub-type for each modality. Play face rate and multimodal rate differed significantly across the four play sub-types, while body rate did not (play face, *F* = 16.47, *df* = 3, 18, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.73; multimodal, *F* = 17.57, *df* = 3, 18, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.75; body, *F* = 2.16, *df* = 3, 18, *p* = 0.13, η<sup>2</sup><sub>p</sub> = 0.27). Simple contrasts showed that play face rate and multimodal rate were significantly higher during mild contact play than during the other play sub-types. Play face with hitting accounted for 73% of multimodal expressions during mild contact social play.

The other sub-types of play were relatively infrequent, and not all infants engaged in them; therefore, only descriptive data are available. Tickle play (*n* = 5) had the highest play face rate of all play sub-types (mean rate = 7.45 ± *SD* 1.61 ipm), a low body rate (mean rate = 0.58 ± *SD* 0.75 ipm), and the second highest multimodal rate (mean rate = 2.14 ± *SD* 1.21 ipm). Play face with tickle request gestures accounted for 71% of multimodal expressions during tickle play. Rough and tumble play (*n* = 3) had the second highest play face rate of all play sub-types (mean rate = 6.34 ± *SD* 2.44 ipm), a low body rate (mean rate = 0.56 ± *SD* 0.43 ipm), and a moderate multimodal rate (mean rate = 0.68 ± *SD* 0.62 ipm). Invite play had a moderate play face rate (mean rate = 1.14 ± *SD* 1.01 ipm), the highest body rate of all play sub-types (mean rate = 7.21 ± *SD* 2.59 ipm), and the highest multimodal rate of all play sub-types (mean rate = 2.70 ± *SD* 2.27 ipm). Flailing limbs accounted for 70% of body expressions during invite play, and 57% of multimodal expressions during invite play were play faces with flailing limbs.

## **TYPE OF SOCIAL PARTNER**

Social play was subdivided according to the partner of the focal infant: peer (51% of social play time), mother (15%), other adult (15%), and juvenile (19%). One infant was never observed to play with her mother; therefore, the mother and other adult categories were combined into a single "older" category. The mean proportion of time that infants engaged in social play was examined by partner (older, peer) and social play type, and there was a significant interaction between partner and play type (*F* = 10.66, *df* = 1.30, 7.80, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.64). Infants spent more time engaged in locomotor play and rough and tumble play with peers than with older chimpanzees, and they spent more time engaged in tickle play with older chimpanzees than with peers (tickle play with peers was never observed) (**Table 4**).

Playful expression rate was examined as a function of social partner and modality. There was a significant effect of partner (*F* = 12.64, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.68): infants' playful expression rate was higher with older chimpanzees (mean rate = 7.20 ± *SD* 2.36 ipm) than with peers (mean rate = 4.20 ± *SD* 1.02 ipm). The interaction between modality and partner was not significant (*F* = 1.54, *df* = 1.14, 6.86, *p* = 0.26, η<sup>2</sup><sub>p</sub> = 0.21). Descriptive data on the CZ infants' playful expression rate with juveniles showed that this rate was intermediate between the rates with older chimpanzees and with peers (mean juvenile rate = 5.92 ± *SD* 2.19 ipm, *n* = 4).

# **HYPOTHESIS 2: ARE EXPRESSIONS MATCHED?**

Matching of play faces was frequent: infant play faces were present in 424 intervals with a visible social partner, and the partner displayed a play face in 34% of these intervals. Matching of playful body expressions was also found: playful body expressions were present in 335 intervals with a visible social partner, and the play partner displayed the same playful body expression in 9% of these intervals. Multimodal expressions were not included in the matching analysis because infant multimodal expressions were present in only 84 intervals of social play with a visible play partner.
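Under this interval-based definition, a match is counted when the visible partner shows the same expression type in an interval in which the infant shows it. The sketch below tallies that proportion from hypothetical coded intervals; the record layout, the single-expression-per-interval simplification, and the function name are assumptions rather than the authors' coding scheme. Applied to the reported figures, 34% of 424 play face intervals corresponds to roughly 144 matched intervals, and 9% of 335 body expression intervals to roughly 30.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CodedInterval:
    """One 5-s interval of social play (hypothetical record layout)."""
    infant_expression: Optional[str]  # e.g. "play_face", "hitting", or None
    partner_visible: bool
    partner_expressions: Set[str] = field(default_factory=set)


def matching_proportion(intervals, expression):
    """Share of eligible intervals (infant shows `expression`, partner visible)
    in which the partner shows the same expression."""
    eligible = [iv for iv in intervals
                if iv.infant_expression == expression and iv.partner_visible]
    if not eligible:
        return 0.0
    matched = sum(expression in iv.partner_expressions for iv in eligible)
    return matched / len(eligible)


sample = [
    CodedInterval("play_face", True, {"play_face"}),
    CodedInterval("play_face", True, set()),
    CodedInterval("play_face", False, {"play_face"}),  # partner not visible: excluded
    CodedInterval("hitting", True, {"hitting"}),
]
print(matching_proportion(sample, "play_face"))  # 0.5
print(matching_proportion(sample, "hitting"))    # 1.0
```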

**Table 4 | Mean proportion of social play time spent engaged in different sub-types of play, as a function of play partner.**

*The proportion of older partner play time spent engaged in different types of play was compared to the proportion of peer play time spent engaged in different types of play (One-Way ANOVAs). \*p < 0.05, \*\*p < 0.01, \*\*\*p < 0.001.*

A comparison of the matched play face rate by group found a significantly higher rate in the CZ group (mean rate = 1.45 ± *SD* 0.68 ipm, *n* = 4) than in the PRI group (mean rate = 0.46 ± *SD* 0.37 ipm, *n* = 3; Mann-Whitney *U*-test: *Z* = 2.12, *p* < 0.05). This difference was examined further by comparing the matched play face rate for the two groups across play types and play partners, but there were no significant differences after applying the Bonferroni correction (Bonferroni-corrected threshold for significance: *p* < 0.025; mild contact, *Z* = 2.12, *p* = 0.03; other *Z*s < 1.76, other *p*s > 0.08). Older chimpanzees at PRI were never observed to match infant play faces, while older chimpanzees at CZ were observed to match infant play faces (for three of the four infants), albeit at a relatively low rate (mean rate = 0.60 ± *SD* 0.70 ipm, *n* = 4). A comparison of the matched body rate by group found no significant difference between the CZ group (mean rate = 0.17 ± *SD* 0.08 ipm, *n* = 4) and the PRI group (mean rate = 0.15 ± *SD* 0.14 ipm, *n* = 3; Mann-Whitney *U*-test: *Z* = 0.00, *p* = 1.00).
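A Bonferroni threshold divides the familywise α of 0.05 by the number of comparisons, so a threshold of 0.025 corresponds to two comparisons (an inference from the stated value). The two-tailed *p* from the normal approximation for *Z* = 2.12 is about 0.034, which passes α = 0.05 but not 0.025; this is consistent with the mild contact comparison being reported as non-significant. In the minimal sketch below, the helper names and the second comparison (*Z* = 1.75) are hypothetical.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value against the Bonferroni-adjusted threshold alpha / m."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


p_mild = two_tailed_p(2.12)  # mild contact comparison reported in the text
threshold, flags = bonferroni_significant([p_mild, two_tailed_p(1.75)])
print(f"threshold = {threshold}, p(mild contact) = {p_mild:.3f}, significant = {flags}")
# threshold = 0.025, p(mild contact) = 0.034, significant = [False, False]
```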

Matching of infant expressions was examined across ages. The matched play face rate was higher at 15 months (mean rate = 1.45 ± *SD* 0.95 ipm) than at 12 months (mean rate = 0.68 ± *SD* 0.56 ipm; *F* = 8.62, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.59). However, matched body rate did not differ by age (12 months, mean rate = 0.12 ± *SD* 0.16 ipm; 15 months, mean rate = 0.22 ± *SD* 0.21 ipm; *F* = 0.79, *df* = 1, 6, *p* = 0.41, η<sup>2</sup><sub>p</sub> = 0.12).
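With seven infants each observed at both ages, an age effect tested with *df* = 1, 6 behaves as a within-subject factor with two levels, and for such a factor the repeated-measures *F* equals the square of the paired *t* statistic. This is a general identity rather than a description of the authors' software; the sketch below uses hypothetical rates and a hypothetical function name.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and df for two repeated measurements per subject."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two subjects")
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical matched play face rates (ipm) for 7 infants; not the study's data.
at_12_months = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
at_15_months = [2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
t, df_error = paired_t(at_12_months, at_15_months)
f = t ** 2  # F(1, n - 1) for a two-level within-subject factor
print(f"t({df_error}) = {t:.2f}, F(1, {df_error}) = {f:.2f}")
# t(6) = 7.07, F(1, 6) = 50.00
```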

Overall, after collapsing the data by group and age, the matched play face rate was significantly higher than the matched body rate (mean matched play face rate = 1.02 ± *SD* 0.75 ipm; mean matched body rate = 0.16 ± *SD* 0.10 ipm; *F* = 9.26, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.61). It is noteworthy that the rates of both matched play face expressions and matched body expressions were significantly higher than zero (i.e., the 95% confidence intervals did not include 0; matched play face rate, *t* = 3.63, *p* < 0.05; matched body rate, *t* = 4.41, *p* < 0.01).

Matched expressions were examined by social partner (older, peer) and modality. Matching rates were higher with peers than with older partners (*F* = 6.09, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.59). The effect of modality was significant (*F* = 8.65, *df* = 1, 6, *p* < 0.05, η<sup>2</sup><sub>p</sub> = 0.59), but the interaction between partner and modality was not (*F* = 2.22, *df* = 1, 6, *p* = 0.19, η<sup>2</sup><sub>p</sub> = 0.27). Only matched play faces by peers occurred at a rate significantly above zero (mean matched play face rate = 1.20 ± *SD* 1.00 ipm, *t* = 3.172, *p* < 0.05). Descriptive data on matching by the CZ infants' juvenile partners showed that the rates of matching by juveniles were relatively high (mean matched play face rate = 2.81 ± *SD* 1.30 ipm; mean matched body rate = 0.31 ± *SD* 0.10 ipm) and significantly above zero (matched play faces: *t* = 4.32, *p* < 0.05; matched body: *t* = 6.50, *p* < 0.01).
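The tests against zero can be checked from the reported summary statistics, assuming each is a one-sample *t*-test on the per-infant rates: *t* = mean / (*SD* / √*n*), with the 95% confidence interval mean ± *t*<sub>crit</sub> × *SD* / √*n*. For matched play faces by peers (1.20 ± 1.00 ipm, *n* = 7) this gives *t* ≈ 3.17, close to the reported 3.172 given that the summary values are rounded, and a 95% interval of roughly 0.28 to 2.12 ipm, which excludes 0. The critical values are hard-coded two-tailed 0.05 values for df = 6 and df = 3; the function name is hypothetical.

```python
import math

# Two-tailed 95% critical values of Student's t, keyed by df (only those needed here)
T_CRIT_95 = {3: 3.182446, 6: 2.446912}


def one_sample_t(mean, sd, n):
    """t statistic and 95% CI for H0: population mean = 0, from summary statistics."""
    df = n - 1
    se = sd / math.sqrt(n)
    t = mean / se
    half_width = T_CRIT_95[df] * se  # KeyError if df is not tabulated above
    return t, (mean - half_width, mean + half_width)


for label, m, sd, n in [("matched play faces, peers", 1.20, 1.00, 7),
                        ("matched play faces, juveniles", 2.81, 1.30, 4)]:
    t, (lo, hi) = one_sample_t(m, sd, n)
    print(f"{label}: t({n - 1}) = {t:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# matched play faces, peers: t(6) = 3.17, 95% CI [0.28, 2.12]
# matched play faces, juveniles: t(3) = 4.32, 95% CI [0.74, 4.88]
```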

For two social play sub-types, mild contact and locomotor, there were sufficient observations of all seven infants to allow analysis of matched expressions by sub-type of play and modality. There was no significant effect of sub-type of play (*F* = 2.89, *df* = 1, 6, *p* = 0.14, η<sup>2</sup><sub>p</sub> = 0.33) and no significant interaction of sub-type of play and modality (*F* = 0.45, *df* = 1, 6, *p* = 0.53, η<sup>2</sup><sub>p</sub> = 0.07). However, as for social play overall, matched play face rate was significantly higher than matched body rate (*F* = 26.05, *df* = 1, 6, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.81).

# **DISCUSSION**

This study found that infant chimpanzees, 12–15 months of age, exhibited characteristic facial and body movements during both solitary and social play, suggesting that joy may be expressed even in the absence of a social partner. Unfortunately, vocal expressions could not be detected under these observational conditions, although they would clearly add another dimension to playful expressions. Infant chimpanzees spent significantly more time in solitary play; however, they exhibited significantly higher rates of facial expressions and multimodal expressions in social play. This suggests that something about the social context encourages or enhances the appearance of facial expressions. Infant chimpanzees exhibited playful expressions significantly more often with older chimpanzees, but playful expressions were matched significantly more often by peers. Since we found that social partners matched facial expressions significantly more often than body expressions, we propose that partner matching of facial expressions is at least one likely route by which social engagement modifies infant behavior. Moreover, the rate of matching of facial expressions increased with infant age, even though the rate of infant facial expressions did not change. Although our observational study cannot definitively distinguish communicative and emotional aspects of playful expressions, we suggest that joyful emotion is the core of playful expressions (Panksepp, 1998; Panksepp and Biven, 2012) and that it underscores the meaningfulness of early social communication (Bard et al., 2014b; see also Di Paolo et al., 2010; Scott and Pika, 2012, for further discussion of the importance of communication meaning). Given that playful expressions emerge early in life and continue to occur in solitary contexts through the second year of life, we suggest that the play face and certain body behaviors are emotional expressions of joy, and that such expressions develop additional social functions through interactions with peers and older social partners.

## **COMMUNICATIVE SIGNALS OR EMOTIONAL EXPRESSIONS?**

In recent years, there has been some resolution of the dichotomous position that facial behavior is either emotional or communicative (Russell et al., 2003; Seyfarth and Cheney, 2003). This has coincided with increasing recognition of the whole-body nature of emotional expression and communication (e.g., de Gelder, 2009; Zieber et al., 2014) and an understanding that many expressive behaviors, rather than being unambiguous markers of emotion, are interpreted according to the situational context (Camras et al., 2002). In this study, it is important to note that, although play faces and some multimodal expressions were more frequent during social play, they were still observed during solitary play. Rates of several expressive behaviors during solitary play (play face, bouncing, acrobatics, hitting, flailing limbs, and play face with acrobatics) were significantly above 0, supporting the conclusion that these playful expressions do not have an exclusively social function. Given that solitary play accounts for two-thirds of infant play time, the function of these expressions in solitary contexts deserves further consideration. Play was defined as solitary when infants were not in active physical contact with another chimpanzee and when the infant's gaze was not directed toward any other individual. While it is possible that the play face may still serve a social function (e.g., to reassure mothers that they do not need to intervene, particularly when an infant's solitary play becomes more vigorous or excitable), it is more parsimonious to argue that the significant rate of play faces during solitary play reflects a non-social function (e.g., Bard et al., 2004).
Functional approaches to the study of human emotional expression suggest that, in addition to a communicative function, expressions may regulate internal feelings and behaviors (Barrett, 1993), while from a dynamic systems perspective, infant smiling may be "an emotional signal to the self" as well as to others (Messinger and Fogel, 2007, p. 330). Given the early emergence of playful expressions in chimpanzees and the fact that expressions continue to occur in solitary contexts through the second year of life, we suggest that the play face, certain bodily movements, and certain multimodal expressions are expressions of joy.

One particular type of play, mild contact social play, was associated with infants displaying play faces and multimodal expressions at significantly higher rates than during the other predominant types of infant play (i.e., locomotor play, solitary object play). This suggests that the higher rates of play faces and multimodal expressions during mild contact social play were not a result of higher emotional arousal, since this context was not the most intense play and higher rates of body expression did not occur during this type of play. Instead, we suggest that play faces and multimodal expressions were displayed at higher rates during mild contact social play because their communicative value was greatest during this type of play (for infants of this age). Social mild contact play is a gentle form of sparring, a context in which infants may take the opportunity to develop communicative skills, a foundation that will become more necessary during boisterous rough and tumble play later in life (e.g., Flack et al., 2004; Palagi, 2006).

The prevalence of the multimodal play face with hitting expression during social play, in contrast to the other multimodal expressions, supports the idea that play faces can sometimes function as signals of playful intentions (e.g., benign intent) even in young chimpanzees (Waller and Dunbar, 2005). Hitting can be a playful act or an aggressive act, so displaying a play face while hitting may reassure the play partner that the hit is playful rather than aggressive (Palagi, 2008). Nevertheless, in young chimpanzees the hitting rate without an accompanying play face expression was significantly higher than the rate of hitting with a play face expression, suggesting that communicative skills are still developing in infant chimpanzees (Bard et al., 2014b). Other playful body expressions, such as bouncing and acrobatics, have fewer associations with aggression and were displayed in combination with play faces at low rates and with no significant bias toward social play. Therefore, by the beginning of the second year, the chimpanzee infants appear to be learning that it is appropriate, at least in some instances, to disambiguate their playful hits during social play with play face expressions. The infants' immaturity could be a factor in the high rate of hitting without an accompanying play face during social play. This could be determined with further longitudinal studies of these types of expressions during the play of older infants and juveniles.

## **EMOTIONAL ENGAGEMENT AND COMMUNICATIVE DEVELOPMENT**

Chimpanzee infants seem to be sensitive to the characteristics of their social partners during play, being more expressive when playing with older chimpanzees (mothers, other adults, and adolescents) than with peers. The prevalence of tickle play was the main difference between infant play with older chimpanzees and infant play with peers: tickling was observed only during play with older chimpanzees and was associated with descriptively high rates of infant expressions, particularly play faces and multimodal expressions (see Goodall, 1986, for further discussion of tickling). In other words, mother chimpanzees and other adults seemed to be very effective at eliciting infant joy, since infants' playful expression rate with older partners was substantially higher than with peers (7.20 vs. 4.20 ipm). Older chimpanzees seemed to be particularly skilled at using tickling to elicit playful expressions from infants. However, play partners had no effect on infant expressiveness during the predominant type of social play, mild contact play. Therefore, it seems that infants are learning, through engagement, about the different characteristics of play with a variety of social partners (e.g., Bard et al., 2014b).

Play faces were matched by social partners at a higher rate than body expressions, suggesting that play faces may have greater communicative value. Nevertheless, body expressions were matched at rates significantly above zero. Matching expressions could have multiple functions, including emotional engagement and responsive communicative signaling. Here, analysis of the contextual nature of matching was limited, but research with orangutans suggests that play face mimicry, albeit automatic in many instances, may be influenced by socio-emotional factors (Davila Ross et al., 2008). The social partner influences emotional synchrony in human infant interactions: mother-infant interaction is characterized by coordination of socially oriented expressions, and father-infant interaction by sudden peaks of high emotional intensity (Feldman, 2003). Here, peers matched infant play faces at a higher rate than older chimpanzees did, and matching may be particularly relevant during peer play, as both infants are developing their social skills and exploring the rules of social interaction (van Lawick-Goodall, 1968; Savage and Malick, 1977; Cordoni and Palagi, 2011). Based on descriptive data from CZ infant-juvenile play, notable matching of infant body expressions occurred only with juvenile partners. Play between infants and juveniles was marked by a high frequency of rough and tumble play, and matching may be one means by which juveniles demonstrate sensitivity to infants' developing abilities (Mendoza-Granados and Sommer, 1995; Pellis and Pellis, 1996; Flack et al., 2004). We expect that further analysis of playful expression matching, with a focus on matching of both facial and body movements, may reveal further variation by play context and play partner and allow specification of the mechanisms underlying matching behaviors.

It is interesting to note that playful expressions occurred more than once per minute of solitary object play, but rarely during social object play. We know that chimpanzee infants' interest in objects varies with their early socio-emotional experiences, from wariness when infants are raised in isolation (e.g., Menzel, 1964) to engagement when infants are raised with typical western human interactants (e.g., Bard and Vauclair, 1984; Bard et al., 2014a). Socialization experiences may support representational and pretend play with objects, even in apes (e.g., Jensvold and Fouts, 1993; Lyn et al., 2006). Here, all partners of the infant chimpanzees were conspecifics, and relatively few non-food objects were available in their enclosures. Nevertheless, 20% of play included an object (typically vegetation or ropes), although on all but a few occasions object play was solitary. This study supports the conclusion that infant chimpanzees in the Zoo and PRI settings do not receive much emotional nurturing of joint interest in objects. That is, without emotional encouragement, for instance matching of playful expressions during object play, there may be relatively little increase in joint social attention with objects as these infant chimpanzees grow up. Even at this young age, infant chimpanzees are sensitive to the emotional engagement patterns of their social partners, and their developmental outcomes are influenced by these patterns (e.g., Bard et al., 2013, 2014a; Bard and Leavens, 2014).

## **GROUP DIFFERENCES**

Play face matching differed by group membership, with a significantly higher matching rate in the Chester Zoo group than in the PRI group. This suggests that group members have an important role to play in shaping the expressive behaviors of young chimpanzees. The mechanism by which this influence is exerted deserves more study, though differences in group size and composition may have an effect (see Aureli and de Waal, 1997; Brosnan et al., 2005 for studies of group influences on chimpanzee social behaviors). Group size was larger at Chester Zoo, infant rough and tumble play was more frequent in this setting, and juveniles were present, all of which may have affected the nature of infants' joyful interactions with others. Group differences in mutual engagement between chimpanzee mothers and their young infants suggest that the modalities of engagement (visual, tactile) are interchangeable (Bard et al., 2005). Although we found no evidence of an increase in body expression matching in the PRI group to compensate for its lower rate of play face matching, the PRI group did engage in higher levels of tickle play. Facial expression matching may be less relevant when the interacting chimpanzees are in close body contact.

Our preliminary analyses found that group membership had no significant effect on the rates of infants' playful expressions. Our sample size across groups was very small, and although large effects could have been detected, more subtle ones could not. Behavioral flexibility within the chimpanzee species has been well documented (Whiten et al., 1999), and social dynamics are thought to be a critical factor in expressive behavior patterns (see Smith and Delgado, 2013 for further discussion), so we predict that group differences in the rate of infant expressions will be found with larger sample sizes. Although larger samples are likely to sacrifice a narrow age focus, they will allow closer examination of the behavioral characteristics of groups.

## **DEVELOPMENTAL TRENDS**

Play face matching increased in rate from 12 to 15 months, indicating developmental progression of infant chimpanzees' emotional communication skills. We were surprised to find no significant differences in rates of infant expressions from 12 to 15 months: we had expected a shift from relatively more expressions in solitary contexts at 12 months to relatively more expressions in social contexts at 15 months. Other studies have found age-related trends in play types during the infancy and juvenile period (Markus and Croft, 1995; Mendoza-Granados and Sommer, 1995; Cordoni and Palagi, 2011). However, there is limited knowledge about the developmental progression of playful expressions. Playful body expressions continue to develop beyond this age, since the infants in this study were not yet displaying body expressions such as the play walk, ground slaps, and pirouettes, which would be expected to emerge in juvenile and adolescent chimpanzees (Goodall, 1986; Tomasello et al., 1994; Liebal et al., 2004a; Nishida and Inaba, 2009; McCarthy et al., 2013). Further research across a wider age range is needed to understand how the changing contexts of infant play interact with the display of playful expressions. The developmental trajectory of multimodal playful expressions would be particularly interesting to examine, given that the gestural repertoire of chimpanzees increases throughout infancy and into the juvenile period (Hobaiter and Byrne, 2011).

# **CONCLUSIONS**

Infant chimpanzees exhibited a variety of characteristic facial and body movements in both solitary and social play. Although chimpanzees also express playfulness through laughter, these vocal expressions were not available here due to the constraints of our observational settings. The playful expressions of infant chimpanzees varied in rate across different play contexts and different social partners. Play faces and play face-hitting combinations occurred at elevated rates during mild contact social play, indicating that young infants, whether playing with peers or older chimpanzees, are capable of using these expressions to communicate benign intentions during ambiguous or vigorous play. However, the presence of these expressions and certain other body expressions during other social and solitary play types supports the idea that playful expressions are also an expression of joy during play. Similarly, playful expression matching can be regarded as emotional engagement or communicative signaling. The multimodal nature of playful expressions deserves greater attention, given the evidence that certain body expressions, either alone or in combination with play faces, are significant features of social play in infancy. The effect of the social group on playful expression rate remains unresolved, but we predict that the presence or absence of certain play partners will affect the prevalence of certain play types, which in turn will affect rates of playful expressions and matching. The developmental trajectory of infant playful expressions deserves further study across a wider age range, which was beyond the scope of the current study. However, the advantage of the narrow age focus here was the emergence of an unusually detailed picture of the context of infant playful expressions at a particular stage of development.

## **ACKNOWLEDGMENTS**

The first author was supported by a research studentship from the Economic and Social Research Council and funds from the Department of Psychology, University of Portsmouth. Additional funding was provided by grants from the European Commission (FP6 IST-045169 to L. Cañamero), The Leverhulme Trust (F/00 678/O Research Project Grant to K. Bard) and the Ministry of Education, Culture, Sports, Science, and Technology in Japan (MEXT 24000001 to T. Matsuzawa). Grateful appreciation is extended to Chester Zoo for allowing observations of the chimpanzee group and to Yuu Mizuno, who helped with the video observations of the PRI chimpanzees. We thank Kate Thorsteinsson and Marina Davila-Ross for reliability testing.

# **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 25 June 2014; published online: 24 July 2014.*

*Citation: Ross KM, Bard KA and Matsuzawa T (2014) Playful expressions of one-year-old chimpanzee infants in social and solitary play contexts. Front. Psychol. 5:741. doi: 10.3389/fpsyg.2014.00741*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Ross, Bard and Matsuzawa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Putting the "Joy" in joint attention: affective-gestural synchrony by parents who point for their babies

# *David A. Leavens<sup>1</sup>\*, Jo Sansone<sup>1</sup>, Anna Burfield<sup>1</sup>, Sian Lightfoot<sup>1</sup>, Stefanie O'Hara<sup>1</sup> and Brenda K. Todd<sup>1,2</sup>*

<sup>1</sup> School of Psychology, University of Sussex, East Sussex, UK

<sup>2</sup> Department of Psychology, City University London, London, UK

## *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Valentina Fantasia, University of Portsmouth, UK

Heidi Keller, Osnabrueck University, Germany

#### *\*Correspondence:*

David A. Leavens, School of Psychology, University of Sussex, Falmer, East Sussex, BN1 9QH, UK e-mail: davidl@sussex.ac.uk

Despite a growing body of work examining the expression of infants' positive emotion in joint attention contexts, few studies have examined the moment-by-moment dynamics of emotional signaling by adults interacting with babies in these contexts. We tested 73 parents of infants (three fathers) in our laboratory, comprising parent-infant dyads with babies at 6 (n = 15), 9 (n = 15), 12 (n = 15), 15 (n = 14), and 18 (n = 14) months of age. Parents were asked to sit in a chair centered on the long axis of a room and, while holding their children in their laps, to point to distant dolls (2.5 m) when the dolls were animated. We found that parents displayed the highest levels of smiling at the same time that they pointed, thus demonstrating affective/referential synchrony in their infant-directed communication. There were no discernible differences in this pattern among parents with children of different ages. Thus, parents spontaneously encapsulated episodes of joint attention with positive emotion.

**Keywords: pointing, smiling, embodied cognition, intersubjectivity, affective-gestural synchrony**

## **INTRODUCTION**

Joint attention is the ability to capture and re-direct the attention of a social partner, and to follow another's communicative cues to a specific locus (e.g., Moore and Dunham, 1995; Bard and Leavens, 2009; Leavens and Racine, 2009; Seemann, 2012). Joint attention refers to a suite of triadic communicative skills that typically develop in humans and apes late in their infancy periods, near the end of the first year of life, and includes such behavioral developments as pointing, following the pointing and gaze direction of others, using emotional information from a caregiver to regulate one's response to novel objects (social referencing), and other tactics involving the coordination of attention to a common focus (e.g., Carpenter et al., 1998; Butterworth, 2001; Striano and Bertin, 2005; Racine and Carpendale, 2007; Bard et al., 2014). Under existing strictures in some contemporary psychological theories, this kind of coordination requires that babies develop a reasoning capacity, based on an ability to represent the invisible contents of others' minds; pre-verbal human babies are held to point to things because they can represent the perceptions, even knowledge, of their social partners and wish to manipulate those perceptions and those knowledge states (see, e.g., Racine and Carpendale, 2007, for a review and critique).

Joint attention in human infants occurs in social contexts characterized by dynamically changing emotional contours. There is a growing body of work examining the dynamic expression of infants' positive emotion in joint attention contexts (e.g., Adamson and Bakeman, 1985; Hobson, 1993; Messinger and Fogel, 1998; Jones and Hong, 2001, 2005; Reddy, 2001, 2003; Striano and Bertin, 2005; Carpenter and Liebal, 2012). For example, Adamson and Bakeman (1985) reported high rates of positive affect when infants from 6–18 months of age were jointly engaged around objects with their mothers. Jones and Hong (2001) reported that, late in the first year of life, infants begin to incorporate their own smiling behavior into intentional communication with their mothers (and see Jones and Hong, 2005). Reddy (2001, 2003) outlined a developmental pathway into triadic reference grounded in infants' experiences of themselves as objects of attention and intentional actions. In particular, Reddy's account specifies the affective qualities manifested during infants' early engagements with others as a field of experience that can be generalized to objects later in the first year of life. Recently, Carpenter and Liebal (2012) have described, in conceptual terms, the role of mutual visual regard with positive affect between babies and their parents as a kind of acknowledgment of the mutual awareness of the jointness of the interactions, the idea being that babies and their mothers acknowledge the shared nature of these joint attention episodes with mutual gaze and smiles. These findings and conceptual advancements were presaged by Hobson (1993), who speculated that "the development of a child's awareness of propositional attitudes might begin with more or less direct perception of other people's affective attitudes" (p. 240). Thus, according to Hobson, affective awareness scaffolds or bridges later-developing conceptions of mental attitudes.
Few studies, however, have examined the moment-by-moment dynamics of emotional signaling by adults interacting with babies in these triadic contexts. These affective landscapes may have significant bearing on infants' motivations to follow into another person's focus of attention, for example, following their pointing gestures or their line of regard.

The present study was originally designed by Leavens and Todd to examine how parents coordinated the hand they used to point to distant dolls arranged in an arc with the hand they used to support their children in their laps; the question was at what angular displacement to the left or right parents would switch the hands used for supporting their babies and for pointing (Todd and Leavens, in preparation). Upon initial examination of the videotaped footage, it seemed to be the case that the parents were marking their own pointing gestures with bursts of positive emotion. This has significant bearing on a longstanding debate in developmental psychology: are human children evolutionarily prepared for engaging in joint attention, as argued by Tomasello et al. (2007), or do parents shape infants' attention-oriented behavior with social reinforcement, as long argued by Moore (e.g., Moore and Corkum, 1994)?

In considering the different predictions of these two classes of theory, we reasoned, following Leudar and Costall (2004), that nativist accounts like that of Tomasello et al. (2007) assume that there is an epistemological gap between a communicative behavior and its psychological underpinnings; i.e., there is a theoretical commitment to the idea that invisible psychological processes cause communicative behavior, and it is the role of the developing infant to discover this relationship (see also Leavens et al., 2005; Froese and Leavens, 2014, for discussions of this issue). As a consequence of this embedded assumption, external features of the ontogenetic contexts in which children develop their social skills are assumed to be typical for the species. Therefore, we could not specify any pattern of behavior, in advance, that could falsify a theoretical claim of evolutionary preparedness for joint attention in humans (see also Bard and Leavens, 2014, for a review of theoretical positions that omit developmental experience as an explanatory factor in the development of social skills; also see Bateson, 1972; Churcher and Scaife, 1982; Zukow-Goldring, 1997). In contrast, learning- or experience-based accounts of the development of joint attention do make some global predictions about the patterns of reward in the lived experiences of children who are developing these skills (see, esp., Moore and Corkum, 1994; Reddy, 2003). In particular, if children are to learn to attend to deictic signals, then there must be some way that these physical acts are marked as being, somehow, special-in-relation-to external objects and events.

Accordingly, we set out to characterize the smiling behavior of the parents in this study in temporal relation to key events in each of the trials: at two time points before the doll was animated, at the moment the doll was activated, at the moment of maximum extension of the parents' pointing hands, at the moment the pointing hand was maximally retracted, at the moment the doll's activation ceased, and at two subsequent time points. Our reasoning was that if parental smiling behavior was paired with their referential signals (their points), then this would provide evidence relevant to at least two broad classes of theoretical axioms: first, as a kind of affective-referential precursor to the affective-conceptual links described by Hobson (1993) and Reddy (2001, 2003) and, second, as a pattern of contingencies in social reward that could, in principle, support the kind of socially grounded attention-shaping processes required by Moore and Corkum's (1994) theory. Alternatively, if we did not find a close temporal association between pointing and smiling behavior, this would have some bearing on the generality of developmental process models grounded in environmental factors, like social reinforcement; in other words, because learning models require contingent social reinforcement, the present study constitutes a direct test of the hypothesis of socially based reward contingencies in parent-infant interaction.

Of particular relevance to the emerging science of embodied intersubjectivity is that the interactive phenomena we describe here comprise bodily manifestations of the interactive accompaniments to pointing; thus, this experimental context is an ideal test bed for exploring behavioral coordination in intersubjective activities. As Froese (2011) recently emphasized, the rapidly emerging strands of embodied approaches to understanding cognitive development, including enactivist and dynamic systems theories, markedly expand the kinds of questions we can ask about intersubjective engagement (e.g., Zukow-Goldring, 1997; De Jaegher and Di Paolo, 2007; Wilson and Golonka, 2013). The present context, in which parents point for their young children, is ideal for examining the bodily vehicles of attention scaffolding.

# **MATERIALS AND METHODS**

## **PARTICIPANTS**

We recruited parents with young babies through advertising posters with tear-off cards on which were printed the contact details of the infant study unit at the University of Sussex. We invited 76 parents of infants (3 fathers) to our laboratory, of whom 73 completed testing, comprising parent-infant dyads with babies at 6 (*n* = 15), 9 (*n* = 15), 12 (*n* = 15), 15 (*n* = 14), and 18 (*n* = 14) months of age (two of the three remaining dyads were excluded due to infant fussiness, and one because of experimenter error: specifically, the videotape was accidentally overwritten).

## **PROCEDURE**

Parents were asked to sit in a chair centered on the long axis of a 5×4 m room with symmetrical illumination and a beige curtained backdrop (**Figure 1**). The parents held the children in their laps. Four mechanical dolls were arranged in an arc around the dyads, 2.5 m from their chair, at symmetrical angular displacements of 20° and 60° to the left and right of their midlines. Two video cameras were placed centrally and 45° to the right of the dyads, respectively; their images were mixed into a split screen, which was recorded on Super VHS video. Each dyad was randomly assigned a sequence of doll activation in which each of the four dolls was animated once over the first four trials; this same sequence was then repeated for an additional four trials, yielding eight trials per dyad. As each doll was animated from a control room adjacent to the laboratory, its "arms" and "legs" oscillated up and down while an auditory signal (a recorded female voice repeating the phrase, "Hey, baby!") was emitted from a speaker mounted behind the doll's "head" for a duration of 5 s. Parents were asked to point to the dolls when they were animated. No other specific instructions were given.
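The randomization scheme described above (a random order of the four dolls, then the same order repeated) can be sketched as follows. This is an illustration only, not the authors' actual procedure: the doll labels (signed angles, negative meaning left of midline) and the function name `doll_schedule` are hypothetical, and Python's `random` module stands in for whatever randomization method was actually used.

```python
import random

# Hypothetical doll labels: signed angular displacement (degrees) from the
# dyad's midline, negative = left. The labels are illustrative only.
DOLLS = (-60, -20, 20, 60)


def doll_schedule(rng: random.Random) -> list[int]:
    """Return an eight-trial activation schedule for one dyad.

    Trials 1-4 animate each doll once in a random order; trials 5-8
    repeat that same order.
    """
    order = list(DOLLS)
    rng.shuffle(order)   # a fresh random order for this dyad
    return order + order


# Usage: a reproducible schedule for one (hypothetical) dyad.
schedule = doll_schedule(random.Random(2014))
print(schedule)
```

Repeating the first four trials verbatim, rather than drawing a second random order, keeps each doll's position in the first and second halves identical, which matches the description of "this same sequence" being repeated.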

## **CODING AND ANALYSES**

The onsets of eight 1-s intervals were defined for each trial: (a) 6 s prior to doll activation, (b) 3 s prior to doll activation, (c) the instant the doll was activated, (d) the time at which the parent's point reached its maximum extent, (e) the time at which the pointing hand was maximally retracted (in every observation, the hand used to point was retracted and brought to a clear, unambiguous resting position), (f) the moment the doll was inactivated (5 s after doll activation), (g) 3 s after the doll was inactivated, and (h) 6 s after the doll was inactivated. On each of the eight trials, for each of these eight 1-s intervals, parents were dichotomously classified as either "smiling" or "not smiling" during that interval. On 27% of the 4,672 observation intervals it was not possible to see the parents' faces. Because some infants became fussy, the total number of trials per parent–infant dyad ranged from 4 to 8 (we included all dyads that had completed at least four trials). Because neither the number of trials nor the number of intervals with a visible face was constant across dyads, the dependent variable, for each of the eight intervals, was the proportion of trials in which the parent smiled, counting only trials in which the parent's face was clearly visible during that interval.
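A minimal sketch of the interval scheme and the dependent variable, assuming each parent's codes are stored per trial as `True` (smiling), `False` (not smiling), or `None` (face not visible). The data layout and the function names `interval_onsets` and `smile_proportions` are hypothetical; in practice the point-peak and retraction times (intervals d and e) would be read off the video for each trial.

```python
from typing import Optional


def interval_onsets(point_peak: float, point_retracted: float) -> dict[str, float]:
    """Onsets (s, relative to doll activation) of the eight 1-s intervals.

    Intervals (d) and (e) depend on the parent's own point, so they are
    passed in; the remaining onsets are fixed by the procedure.
    """
    doll_off = 5.0  # doll inactivated 5 s after activation
    return {
        "a": -6.0,               # 6 s before activation
        "b": -3.0,               # 3 s before activation
        "c": 0.0,                # doll activated
        "d": point_peak,         # maximum extension of the point
        "e": point_retracted,    # pointing hand maximally retracted
        "f": doll_off,           # doll inactivated
        "g": doll_off + 3.0,     # 3 s after inactivation
        "h": doll_off + 6.0,     # 6 s after inactivation
    }


def smile_proportions(trials: list[list[Optional[bool]]]) -> list[Optional[float]]:
    """Per interval, the proportion of visible trials in which the parent smiled.

    trials[t][i] holds the code for trial t, interval i. Intervals in which
    the face was never visible yield None rather than a proportion.
    """
    proportions = []
    for i in range(len(trials[0])):
        visible = [trial[i] for trial in trials if trial[i] is not None]
        proportions.append(sum(visible) / len(visible) if visible else None)
    return proportions


# Usage with a hypothetical dyad: four trials, first two intervals only.
print(smile_proportions([[True, None], [False, None], [True, True], [None, False]]))
```

Dividing by the number of visible trials, rather than by the number of completed trials, is what lets dyads with 4 to 8 trials and different amounts of occlusion contribute on a common scale.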

Two coding teams, each comprising independent pairs of researchers, performed separate passes through the entire corpus, each team coding to a consensus. Because we were initially interested in characterizing the intensity of smiling on each trial, the first coding team (Burfield and O'Hara) applied a three-category scheme to only some of the observational intervals described above: parents were categorized as (a) not smiling, (b) weakly smiling, or (c) strongly smiling. We found it difficult, however, to define the boundary between weakly smiling and strongly smiling in objective terms for the second coding team (Lightfoot and Sansone). Therefore, the second coding team scored an expanded number of intervals, dichotomously classifying parents as either (a) not smiling or (b) smiling, as described above; these are the data used for the analyses reported here. Smiles were coded when the corners of the mouth could be seen to be raised (Ekman and Friesen, 1978). Due to a late-discovered technical problem with the microphone, few recorded video clips contain audible speech; this was not considered problematic with respect to the hypotheses the study was originally designed to test, pertaining to cradling and handedness, but it does restrict our present analyses and conclusions entirely to visual information.

### **RELIABILITY**

As noted above, there were two coding passes through the data, using slightly different coding schemes. For purposes of reliability assessment, we collapsed the first team's codes of weakly smiling and strongly smiling into a single category of "smiling" and then directly compared these data with the inherently dichotomous data of the second coding team. Reliability was assessed as the agreement on parental smiling in intervals coded by both teams (25% of the corpus): Cohen's κ = 0.64. Given that the probability of a 1-s interval being coded as smiling was highly variable across intervals, and that the coding system was very simple, this represents a very good level of agreement (see discussion in Bakeman and Quera, 2011, pp. 65–68). Landis and Koch (1977) characterized κ values between 0.61 and 0.80 as "substantial agreement" (p. 165).
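The reliability computation described above (collapsing the three-category codes, then computing Cohen's κ on the doubly coded intervals) can be sketched as follows. The category strings, the `COLLAPSE` mapping, and the function name are hypothetical conveniences; κ itself is the standard statistic, observed agreement corrected for the agreement expected from each coder's marginal proportions.

```python
from collections import Counter

# Hypothetical mapping from the first team's three-way codes onto the
# second team's dichotomy.
COLLAPSE = {
    "not smiling": "not smiling",
    "weakly smiling": "smiling",
    "strongly smiling": "smiling",
}


def cohens_kappa(codes_a: list[str], codes_b: list[str]) -> float:
    """Cohen's kappa for two coders' categorical codes of the same intervals."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(codes_a)
    p_obs = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over every category either coder used.
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Usage with four hypothetical intervals coded by both teams.
team1 = [COLLAPSE[c] for c in ["weakly smiling", "not smiling", "strongly smiling", "not smiling"]]
team2 = ["smiling", "not smiling", "not smiling", "not smiling"]
print(cohens_kappa(team1, team2))
```

Because the expected-agreement term depends on the marginal base rates, the same raw percentage agreement yields different κ values when smiling is rare versus common, which is why the text interprets κ = 0.64 in light of the variable base rates and the two-category scheme.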

# **RESULTS**

An 8 (intervals) × 5 (age group) mixed ANOVA revealed that parents' smiling varied systematically across the eight intervals of the trials, *F*(7,476) = 55.67, *p* < 0.001. Systematic pairwise comparisons, with Bonferroni corrections for multiple tests, revealed a general pattern of three "levels" of parental smiling: a LOW level of smiling at all time points preceding the doll activation up to and including the moment of activation (i.e., the first three time points in the trials); a HIGH level of smiling starting from the maximum extension of the parental points and ending 3 s after the doll had stopped moving (i.e., the next four time points in the trials); and, finally, an INTERMEDIATE level of smiling at the last time point measured in each trial, 6 s after the doll's animation ceased, as smiling declined toward baseline levels (see **Figure 2**; **Table 1**). In other words, smiling levels within the HIGH epoch did not differ statistically from one another in pairwise comparisons, but they did differ from smiling levels in the INTERMEDIATE and LOW epochs, and this pattern held for all three levels, with only one exception: within the LOW category, smiling during the second interval, DOLL ON −3 s, was statistically lower than in both of the immediately adjacent intervals, also labeled LOW, although, like those adjacent intervals, it showed statistically less smiling than all intervals labeled HIGH and INTERMEDIATE. Because our minimum intertrial interval was 12 s, we could not extend our observations later in each trial without overlapping with the successive trial in many instances; this is why our analyses do not capture a full return to baseline levels by the end of the trials. Thus, parents encapsulated these episodes of joint attention, in which they pointed to distant targets, with an envelope of positive emotion.
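To make the scale of the correction concrete: with eight intervals there are 28 pairwise comparisons, so a Bonferroni correction at α = 0.05 tests each comparison at roughly 0.0018. The sketch below assumes raw p-values from some pairwise test are already available; the text does not report those p-values or name the pairwise test used, so the interval labels (a to h, as in the Methods) and the example p-values are hypothetical.

```python
from itertools import combinations

# The eight 1-s coding intervals, labeled (a)-(h) as in the Methods.
INTERVALS = "abcdefgh"

# Every pairwise comparison between intervals: C(8, 2) = 28 tests.
PAIRS = list(combinations(INTERVALS, 2))


def bonferroni(p_values: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Bonferroni-adjust a family of p-values.

    Each p-value is multiplied by the number of tests in the family and
    capped at 1; equivalently, each raw p is compared with alpha / m.
    """
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


alpha = 0.05
# Number of comparisons and the per-comparison threshold for this family.
print(len(PAIRS), alpha / len(PAIRS))
```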

There was no main effect of age group, *F*(4,68) = 0.41, *p* = 0.799, nor was there an interaction between interval and age group, *F*(28,476) = 1.18, *p* = 0.242 (see **Figure 3**). Parents encapsulated joint attention episodes with positive emotion across the entire age range of our infant subjects, from 6 to 18 months of age. There was modest, but statistically significant, variability across trials in the number of intervals in which parents smiled (Greenhouse–Geisser corrected *F*(4.42, 317.86) = 55.91, *p* < 0.001). To determine whether there was any evidence of parental habituation of smiling to the doll animations, we summed the number of intervals in which parents smiled in the first four trials and compared this to the corresponding sum for the last four trials. The difference was significant [paired samples *t*(72) = −2.09, *p* = 0.041], but in the direction opposite to habituation: there were more intervals with smiling in the second half of the experiment (mean = 13.3, SD = 8.9) than in the first half (mean = 12.1, SD = 7.1), indicating that, if anything, the experiment elicited more smiling with the passage of time.
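The habituation check is a paired-samples t test on per-dyad sums (first four trials versus last four). A minimal stdlib sketch, assuming one pair of sums per dyad; the function name and the toy inputs are hypothetical, not the study's data. With differences taken as first minus second, a negative t means more smiling in the second half, the direction reported above, and 73 dyads give df = 72.

```python
import math
import statistics


def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic (first minus second) and its degrees of freedom."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    if se == 0:
        raise ValueError("t is undefined when all differences are identical")
    return statistics.mean(diffs) / se, n - 1


# Usage with three hypothetical dyads' smiling-interval sums.
t, df = paired_t([10, 14, 9], [12, 15, 11])
print(t, df)
```

A p-value would additionally require the t distribution (e.g., from a statistics library); the sketch stops at the statistic to stay within the standard library.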

We found no influence of infant birth order on parental smiling behavior [*F*(2,70) = 1.41, *p* = 0.252], nor did we find a relationship between parental age and smiling behavior (Pearson's *r* = 0.05, *p* = 0.681). Finally, parents did not smile differentially as a function of infant gender: *t*(71) = −0.47, *p* = 0.642.

## **DISCUSSION**

There are two substantive findings from this study of 73 parent-infant dyads. First, parents displayed peak positive emotion, as evidenced by smiling behavior, that was temporally synchronized with their pointing gestures and their immediate aftermaths. Second, this pattern characterized the entire sample of children from 6–18 months of age. This distinctive pattern of positive emotional display while pointing to entities has significant relevance for contemporary theoretical interpretations of infant pointing. The dominant, internalist (or telementational – see the detailed analysis and critique of internalist theories of development by Leudar and Costall, 2004) perspective on human communicative development interprets infants' abilities to triangulate with others on a common focus as evidence for infants' developing abilities to represent the abstract visual perspective of others, along with the developing appreciation of others as psychological entities (e.g., Povinelli et al., 1997; Tomasello et al., 2007). Thus, in mainstream cognitive psychology, there is, arguably, an overweening concern with computational models of human cognitive development; or, as Shotter and Newson (1982, p. 37) put it: "[t]raditional cognitive psychology has now set its sights upon discovering the nature of the 'inner computer' ... people use in achieving their actions." We think that our findings draw attention to the external, ecological features of the communicative environments in which children necessarily construct their habits of response to the communicative bids of their caregivers.

This synchronization of parents' positive emotional signaling with the peak extensions of their own pointing gestures highlights the environmentally situated placement of key affective information about the nature of these joint attention interactions. This pattern raises the possibility that, in accordance with the analyses of Moore and Corkum (1994), Zukow-Goldring (1997), and Rader and Zukow-Goldring (2012), parents may actively, if apparently unconsciously, shape the attention-deployment patterns of their children, at least in some cultural contexts. If these patterns of parental affective signaling do influence the development of infants' responses in joint attentional social frames, then we would predict substantial cross-cultural variation in these developmental profiles (Schieffelin and Ochs, 1989; Triesch et al., 2006; Keller, 2007, 2012). Although directly relevant literature is scarce, the evidence that exists is consistent with the idea that both the amount of time parents spend in coordinated joint engagement around objects with their babies and the emotional tones of those interactions differ substantially across settings. For example, Bakeman et al. (1990) reported that aboriginal !Kung infants spent only 1.6% of observed intervals engaged in joint object involvement, compared (with some qualifications) with a North American sample (Clarke-Stewart, 1973), in which about 4.5% of intervals involved joint object involvement between babies and their mothers. Abels et al. (2005) reported relatively low levels of joint object involvement between mothers and their babies in both rural and urban settings in a study from India. Vogt and Martin (2013) reported substantially fewer co-speech gestures by parents of young children in rural Mozambican communities than in urban communities in Mozambique. Salomo and Liszkowski (2013) observed that Mayan babies pointed with their index fingers at much lower rates than both Dutch and Chinese children, and also spent significantly less time in triadic joint action than Dutch and Chinese children.

Thus, cross-cultural differences in the incidence of object-centered joint engagement are well established, and the present findings suggest that these differences may be accompanied by cross-cultural differences in maternal affective tone in relation to object-centered coordination of attention.

*Note to* **Table 1**: *The number of parents whose faces were visible at any given 1-s interval, on any given trial, ranged from 38 to 69 (mean = 52). Values in parentheses are standard deviations. Tabled values in bold are the minimum and maximum values in each row (ties are both bolded). Percentages reported exclude parents whose faces were not visible.*

The absence from the present study of any apparent influence of infant age on parental smiling behavior suggests that this pattern of gestural/affective synchrony may characterize intersubjectivity across a wide swathe of infancy and infant competencies. Even parents of our youngest infants (6 months of age) smiled most frequently at the peak of their pointing gestures. There is little evidence of point- or gaze-following ability in Western children at this age (e.g., Butterworth and Grover, 1988; Butterworth, 2001; Deák et al., 2008), so if parents display this pattern of pairing high positive affect with pointing gestures outside the laboratory, then this could provide a stable emotional contingency contour around parent-initiated joint attention long before babies evidently can use these kinds of referential signals, continuing well into the second year of life. In other words, if these patterns of affective/referential synchrony are manifested in the home environments of these babies, then both the babies' attentional deployments and their attitudes about novel objects or events may be developmentally shaped into a typical Western pattern of joint object involvement (see, e.g., Moore and Corkum, 1994; Rossmanith et al., in press). Learning- and ecologically based theoretical accounts of the development of joint attention ability in humans, like those of Shotter and Newson (1982), Moore and Corkum (1994), and Triesch et al. (2006), require this kind of stability in contingent social reward. Thus, the present study, despite its *a posteriori* approach, was sufficiently powerful in design that it could have seriously challenged learning-based accounts of sociocognitive development, had it found either (a) that parents did not pair their referential gestures with smiles or (b) that parents displayed this pattern only for a minority of our age groups.
In accordance with environmentally oriented theoretical accounts, the parents in this study paired their own pointing gestures with smiles across the entire age range of our sample, with infants from 6 to 18 months of age. If the present findings can be extended to the rearing environments of children with their families, outside a laboratory context, then these data suggest that affective-referential synchrony might occur across a vast swathe of human infancy, at least in Western, post-industrial cultural environments. Thus, joint attention in humans is situated in a social landscape of emotional markers for key intersubjective experiences (Churcher and Scaife, 1982; Shotter and Newson, 1982). Churcher and Scaife (1982), for example, noted that children learning to follow the gaze and pointing cues of their caregivers may be motivated not only by the potentially rewarding sight of the indicated entity, but also by the "social reactions" of their caregivers (p. 127; see Bard et al., 2014, for evidence of the association between affect and joint attention in infant chimpanzees). Shotter and Newson (1982) noted that a human child, "although perceptually distinguishable from her environment as an individual ... is not as such physically isolable from it; she exists (as an open system) only in mutual relation to it" (p. 34). Thus, our findings are consistent with developmental accounts that emphasize the non-computational, distributed concomitants of joint attention, insofar as these babies' social environments displayed distinctive envelopes of dynamic changes in the expression of positive emotion, peaking at the time of parents' pointing gesture extensions.

In contrast, nativist accounts of the development of joint attention in human children, such as those of Butterworth (2003), Povinelli et al. (2003), and Tomasello et al. (2007), all posit a species-unique human specialization for triadic joint engagement, based on hypothetical cognitive and/or motivational capabilities that are also allegedly unique to our species. What holds these disparate nativist perspectives together as a class of theoretical speculation is the postulate that human capacities to follow into and to direct the attention of others are predicated on evolutionary adaptations of cognitive and/or motivational systems in our lineage, shared by all extant humans. As Racine (2012; and see, e.g., Racine and Carpendale, 2007) has pointed out, the hypothesis of a human biological adaptation for joint attention is necessarily an assumption without empirical foundation. It is, at best, an interpretive stance on the manifold interactive phenomena of human caregiver-infant interactions. Importantly, for purposes of the present argument, these adaptationist approaches to understanding the development of joint attention in humans do not predict (a) the emotional features of the environmental contexts in which human signaling develops or (b) the cross-cultural variability displayed in the development of joint attention. As such, our finding that adult caregivers pair positive emotional signals with referential gestures neither confirms nor disconfirms an adaptationist interpretive stance; in other words, adaptationist theories are not falsifiable on the basis of our findings. Thus, the theoretical significance of our findings, in our view, is that we were able to test a key tenet of learning- and ecologically based environmental accounts of the development of joint attention in humans (that the environment must provide a differential reward structure), and the social learning approach survived this test of its predictions.
Given that this gestural/affective synchrony in parental signaling was distributed across a wide span of infancy, future studies could add substantially to our understanding of the integration of emotional and referential signaling in the early lives of children. For example, this kind of analysis could be extended to infants' home environments, like the seminal studies of Clarke-Stewart (1973; and see Rossmanith et al., in press). Moreover, future studies should explore the auditory/verbal concomitants of referential gestures in caregiver–infant interactions. Coding archival and future footage of parent-infant interactions in a range of cultural contexts, using these relatively simple measures of smiling and gestures, could provide valuable insight into cross-cultural patterns of similarity and difference in affective/gestural synchrony. Hence, the essential finding that parents pair their deictic gestures with smiles has significant relevance for theory development in this area. For example, the kind of affective/referential synchrony we report here, throughout the infancy period, might complement the dynamic-gesture/word (visual/auditory) synchrony that figures prominently in Zukow-Goldring's (1997) theory of attention-shaping through perception of amodal invariants.

With respect to the specific theoretical concerns of the present special issue, the present findings are consistent with environmentally situated accounts of child cognitive development. The episodes of joint attention that we elicited in the laboratory were framed by positive affective expression, even though the parents were given no instruction to express positive affect. Their spontaneous display of positive emotion is consistent with Hobson's (1993) postulate of affective bridges to conceptually based social awareness, a view that highlights the embodied, situated nature of infants' developing social competencies. The contemporary practice of attributing developmental change solely to hypothetical, hidden changes in psychological processes can direct researchers' attention away from the empirical, psychologically relevant bodily realities of human parent-infant engagement patterns (e.g., Shotter and Newson, 1982; Zukow-Goldring, 1997; Reddy, 2001, 2003; Leudar and Costall, 2004; Striano and Bertin, 2005; Leavens and Bard, 2011; Bard and Leavens, 2014). This study suggests that greater awareness of the affective components of deictic communication may reveal previously underappreciated public information available not only to researchers but also to parents and young children tasked with building routines of meaning.

## **ACKNOWLEDGMENTS**

This research project was funded by an internal grant from the former Psychology Group, School of Cognitive and Computing Sciences (COGS), University of Sussex, and prior to data collection, this study was scrutinized for adherence to the ethical standards of the British Psychological Society by the research committee of the Psychology Group, COGS. We thank Kim Bard for helpful discussion of these findings, and the many kind and generous parents who brought their babies into our laboratory. We would like to thank the late George Butterworth for his mentorship and collegiality; he left us too soon. Finally, we thank the reviewers and editor for their helpful and constructive comments and advice.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2014; accepted: 23 July 2014; published online: 12 August 2014. Citation: Leavens DA, Sansone J, Burfield A, Lightfoot S, O'Hara S and Todd BK (2014) Putting the "Joy" in joint attention: affective-gestural synchrony by parents who point for their babies. Front. Psychol. 5:879. doi: 10.3389/fpsyg.2014.00879*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Leavens, Sansone, Burfield, Lightfoot, O'Hara and Todd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Proximity and gaze influences facial temperature: a thermal infrared imaging study

#### *Stephanos Ioannou<sup>1,2</sup>\*, Paul Morris<sup>2</sup>, Hayley Mercer<sup>2</sup>, Marc Baker<sup>2</sup>, Vittorio Gallese<sup>1</sup> and Vasudevi Reddy<sup>2</sup>\**

*<sup>1</sup> Section of Human Physiology, Department of Neuroscience, Parma University, Parma, Italy*

*<sup>2</sup> Department of Psychology, Centre for Situated Action and Communication, University of Portsmouth, Portsmouth, UK*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*David Perrett, University of St Andrews, UK*

*Ioannis Pavlidis, University of Houston, USA*

#### *\*Correspondence:*

*Stephanos Ioannou and Vasudevi Reddy, Department of Psychology, Centre for Situated Action and Communication, University of Portsmouth, King Henry Building, King Henry 1st Street, Portsmouth PO1 2DY, UK e-mail: ioannoustephanos@gmail.com; vasu.reddy@port.ac.uk*

Direct gaze and interpersonal proximity are known to lead to changes in psycho-physiology, behavior, and brain function. We know little, however, about subtler facial reactions such as rises and falls in temperature, which may be sensitive to contextual effects and functional in social interactions. Using thermal infrared imaging cameras, 18 adult female participants were filmed at two interpersonal distances (intimate and social) and in two gaze conditions (averted and direct). The order of variation in distance was counterbalanced: half the participants experienced a female experimenter's gaze at the social distance first and then at the intimate distance (a socially "normal" order), and half experienced the intimate distance first and then the social distance (an odd social order). At both distances, averted gaze always preceded direct gaze. We found strong correlations in thermal changes between six areas of the face (forehead, chin, cheeks, nose, maxillary, and periorbital regions) for all experimental conditions and developed a composite measure of thermal shifts for all analyses. Interpersonal proximity led to a thermal rise, but only in the "normal" social order. Direct gaze, compared to averted gaze, led to a thermal increase at both distances, with a stronger effect at intimate distance, in both orders of distance variation. Participants reported direct gaze as more intrusive than averted gaze, especially at the intimate distance. These results demonstrate the powerful effects of another person's gaze on psycho-physiological responses, even at a distance and independent of context.

**Keywords: gaze, interpersonal distance, thermal imaging, autonomic nervous system, emotion regulation**

# **INTRODUCTION**

The way that people communicate and engage in emotional and intentional exchanges requires recognizing the subtle nonverbal cues that conspecifics generate (Freeth et al., 2013). Gaze (Frischen et al., 2007) and interpersonal distance (Bailenson et al., 2001) are important sources of social meaning, conveying a range of information regarding intentions (Nummenmaa and Calder, 2009), interpersonal relationships (Little, 1965; Evans and Howard, 1973), character (Argyle et al., 1974; Sodikoff et al., 1974), culture (Hall, 1966; Watson, 1970), as well as mental health and emotional state (Oliver et al., 2001; Aziraj and Ćeranić, 2013; Freeth et al., 2013). Competence in interpersonal interaction is important for reproduction and survival and therefore important from an evolutionary perspective. At the cognitive level, this competence is achieved through the recognition of opportunities and threats; at the behavioral level, it is achieved through the selection of social strategies for exploitation or avoidance (Bodenhausen and Hugenberg, 2009). The autonomic nervous system (ANS) is an integral part of the social engagement process and alters its activity to foster behavioral strategies of threat engagement or non-emergency vegetative states (Porges, 2001).

Gaze, characterized as affording a "language of the eyes" (Frischen et al., 2007, p. 694), can communicate to the receiver a range of mental states, such as intentions, emotions, desires, and beliefs. Emery (2000) proposes that gaze perception evolved as a form of warning, informing the organism that a predator is attending to it. Many animals respond to being stared at with displays of fear and submissive behavior, suggesting that direct gaze is perceived as a warning (Schwab and Huber, 2006). Neuroimaging data have shown that the amygdala, a major structure for emotional processing, responds when individuals observe images of others directing their gaze at them, rather than looking elsewhere (Kawashima et al., 1999). Furthermore, the peripheral nervous system seems to be affected by gaze direction. Participants' skin conductance increases when they are observed by another person, an indication that long periods of eye contact are perceived as threatening or aggressive (Nichols and Champness, 1971; Hoffman et al., 2007; Hietanen et al., 2008). People seem to be highly sensitive to being looked at by others, showing a finely tuned ability to detect others' gaze (Baron-Cohen, 1995). Visual search experiments have shown that less time is taken to find eyes that are directed toward the observer than eyes that are looking at another target (Conty et al., 2006). Single-cell recordings have shown that cells in the anterior part of the superior temporal sulcus code the social significance of visual stimuli. Jellema et al. (2004) exposed rhesus macaques to a live 3-D presentation of a human walking away from or toward the subject in both a compatible manner (e.g., walking forward with posture and head in the same direction) and an incompatible manner (e.g., walking backward with head and body in the opposite direction). The researchers concluded that specialized cells in the temporal lobe analyze the intentions and goals of others' actions.
Moving from the typical population to clinical disorders, people with autism are reported to find eye contact aversive (Dalton et al., 2005). In fact, compared with controls, people with autism show increased skin conductance, as well as greater activity in the amygdala and fusiform gyrus, during tasks in which they are asked to explore the eye region of the face. Their avoidance or dislike of eye contact suggests a strategy of physiological regulation (Dalton et al., 2005). Similar preferences and strategies are observed in people with social phobia (Horley et al., 2003).

The study of proxemics dates back more than four decades (Hall, 1959; Sommer, 1959). Hall (1966) divided interpersonal space into four categories: Intimate (0–46 cm), Personal (46–120 cm), Social (1.2–3.5 m), and Public (3.5 m+). Interpersonal space seems to be affected by a variety of individual and cultural differences (Hall, 1966). Women keep a smaller personal space when interacting with other women (Larsen and LeRoux, 1984) and, in contrast to males, dislike lateral intrusions into their personal space (Fisher and Byrne, 1975). Sanders (1978) showed that women maintain a larger personal space during menstruation. Moreover, in the United States, Malaysia, Spain, and Chile, people, irrespective of their country of origin and gender, preferred to be touched by a female rather than by a male (Willis and Rawdon, 1994). To some extent, Spanish men were the most tolerant of being touched by other males, whereas Malaysians, irrespective of gender, were the least tolerant of being touched. Women from the United States had the highest tolerance for being touched by other women. In the same line of research, Little (1968) asked Americans, Swedes, Greeks, Italians, and Scots to place a doll at the distance at which they would normally interact with another individual. Scots placed the doll at the greatest distance and Greeks placed it nearest. Age, as well as prior knowledge about the forthcoming experience, also seems to play a defining role in interpersonal distance. Older individuals prefer to sit farther away from the interviewer regardless of their expectations about the pleasantness or unpleasantness of the situation (Feroletto and Gounard, 1975), whereas younger individuals are affected mainly by their expectations about the situation.
Perceived violence and criminality also affect personal space: people were generally less reluctant to sit next to an individual who had never committed a crime than next to violent or non-violent offenders (Skorjanc, 1991). Studies with clinical populations have shown that patients with schizophrenia, compared to controls, require a larger personal space (Deus and Jokic-Begic, 2006) and that people with anxiety, compared to individuals with psychotic disorders, left more space between themselves and the experimenter (Aziraj and Ćeranić, 2013).

Personal space seems to expand or shrink according to the intimacy level of the participants. According to equilibrium theory (Argyle and Dean, 1965), mutual gaze and personal space are two inversely related social behaviors modulated by intimacy: interpersonal space increases in order to balance out the effects of direct gaze. When interacting with avatars in a virtual environment, people leave more space between themselves and the virtual agent when direct gaze is involved; when the avatar invades their personal space, participants move farther away (Bailenson et al., 2003). Data collected with electroencephalography and other psychophysiological measures are consistent with these behavioral findings. When a male experimenter looked at a male participant from a close distance, arousal was highest compared to when gaze was averted. In addition, arousal diminished as distance increased; nevertheless, direct gaze always caused greater arousal than averted gaze, independent of distance (Gale et al., 1975). McBride et al. (1965) found that galvanic skin response (GSR) increased as a function of proximity and frontal confrontation. Similar results were observed by Nichols and Champness (1971). Finally, in a study that examined these effects in a clinical population, highly anxious women avoided gaze contact and exhibited backward head movements in response to male avatars who showed direct gaze. These behaviors were exhibited independent of distance (Wieser et al., 2010).

Porges' Polyvagal theory is one of the most influential accounts of the role of the ANS in social engagement (Porges, 2001). Through evolution, the ANS retained three neural pathways whose hierarchy reflects their phylogenetic origins. At the top of this hierarchy is (a) the social engagement system, which is part of the parasympathetic nervous system (PNS) (Cannon, 1928) and inhibits the more primitive structures of (b) mobilization (e.g., fight-or-flight) and (c) immobilization (e.g., feigning death, behavioral shutdown, "syncope"). With its main control component in the cortex, the social engagement system controls brainstem nuclei responsible for the motor movements of communication, such as eyelid opening, facial muscles, head turning, pharyngeal and laryngeal muscles, middle ear muscles, and the muscles of mastication. These facial muscles have been reported to be dysfunctional in several types of psychopathology, such as depression, autism, antisocial personality disorder, and posttraumatic stress disorder. Besides the head muscles, which are the "beacon" of human interaction, other structures such as the cardiopulmonary and sympathetic-adrenal systems alter their activity to support social demands and physiological relaxation. In situations of threat, the 10th cranial nerve (vagus), which controls the heart, is disengaged, providing an immediate increase in cardiac output for mobilization of the organism without the need to activate the costly sympathetic-adrenal system. It is only after prolonged challenges that the sympathetic nervous system takes action. Mammalian evolution allowed the rise of this efficient mechanism, which enables not only fast mobilization but also fast physiological restoration, as re-engagement of the vagus nerve inhibits sympathetic inputs to the heart (Vanhoutte and Levy, 1979).
People with social or affective disorders do not deal as efficiently with environmental stressors, as emotional arousal seems to engage lower, more primitive structures associated with immobilization and energy saving rather than primary physiological responses such as heart mobilization and sympathetic engagement. Low cortisol reactivity has been linked to posttraumatic stress disorder (Yehuda et al., 1996), schizophrenia (Jansen et al., 2000), and child neglect and abuse (De Bellis et al., 1994).

The majority of studies that have examined social attention and proximity were conducted decades ago, and only a few managed to exercise full experimental control over the variables of interest. In fact, to date, social attention research has largely been conducted in non-realistic experimental settings (Freeth et al., 2013) without the agent being physically present (Risko et al., 2012). Furthermore, only a few studies have addressed the physiological elements of social arousal, despite the fact that somatic arousal defines behavioral engagement strategies (Damasio, 1996). The current study aims to measure physiological responses in a more ecological experimental setting using high-sensitivity functional thermal infrared imaging (fTII).

Thermal imaging is an emerging physiological technique that records cutaneous temperature variations without contact, and therefore without interfering with the experimental procedure or the participant's movements (Pavlidis et al., 2012). Thermal imaging offers recording efficiency similar to that of GSR in reflecting autonomic effects in experimental procedures, without GSR's problem of reaching ceiling levels of reaction (Kuraoka and Nakamura, 2011). Physiological observations of an affective nature are primarily related to subcutaneous vasoconstriction or vasodilation, as well as to heart rate and blood flow (Kistler et al., 1998). The validity of this technique for measuring various types of arousal has been demonstrated by simultaneous recording of established measures such as GSR and laser Doppler flowmetry (Kistler et al., 1998; Pavlidis et al., 2012).

The effects of social attention (direct gaze vs. averted head gaze) and proximity (social space at 4 m vs. intimate space at 0.5 m) on facial temperature are examined. Most studies measuring facial temperature have looked at only one site; in the current experiment, multiple sites are examined in order to obtain a more accurate index of temperature fluctuations in the face as a function of condition. Temperature is measured from six regions of interest (ROI) on the face, selected on the basis of previous research: (1) the nose (Nakayama et al., 2005; Kuraoka and Nakamura, 2011; Ioannou et al., 2013), (2) chin, (3) cheeks (Nakanishi and Imai-Matsumura, 2008), (4) periorbital region (Pavlidis et al., 2001, 2002; Hahn et al., 2012), (5) maxillary area (Shastri et al., 2012), and (6) forehead (Zhu et al., 2008). It is expected that physiological arousal will be highest when the experimenter looks at the participant from an intimate rather than a social distance. In addition, being at a social distance will have a greater effect when the experimenter's gaze is directed toward the participant than when it is averted. Finally, being at an intimate distance without looking at the participant will have a greater effect than being at a social distance, independent of gaze.

# **METHOD**

## **ETHICS**

The Research Ethics Committee of the Faculty of Science of the University of Portsmouth gave approval for the study. Experimental procedures were in line with the Declaration of Helsinki and the Code of Human Research Ethics of the British Psychological Society.

## **PARTICIPANTS**

Eighteen female participants aged 19–21 years (*M* = 19.83, *SD* = 1.30) were recruited for the study. Exclusion criteria were being male, neurological or mental illness, and psychophysiological disorders. To improve the reliability of the physiological observations, consumption of vasoactive substances (nicotine, caffeine, alcohol) was prohibited for at least 2–3 h prior to participation. The participants came from a range of cultural backgrounds and were recruited through personal contacts and the University of Portsmouth recruitment database.

## **DESIGN**

A 2 × 2 × 2 mixed factorial design was employed. The within-subjects factors were gaze (direct gaze vs. averted gaze) and distance (social space vs. intimate space). The independent-groups factor was order (intimate space then social space vs. social space then intimate space). The order of gaze vs. gaze aversion was fixed, with the gaze aversion condition always occurring first. The dependent variable was facial skin temperature at six sites on the face.

## **PROCEDURE**

Upon arrival, participants completed an Informed Consent Form and then the BIS/BAS questionnaire (Carver and White, 1994). They were then escorted to the test laboratory, where they were instructed to sit comfortably on a chair. Before the experimental procedure began, the participants were familiarized with a buzzer that was an integral part of the experimental protocol: they heard the sound of the buzzer, held the buzzer as it produced the sound, and were fully informed about why it would be used. Prior to any recordings, the participants spent at least 10–15 min in the test laboratory. During the experimental procedure, the female experimenter moved from intimate (0.5 m) to social space (4 m) or from social to intimate space. Floor markers showed the experimenter the precise distance from the participant, and additional filler marks were placed so that the experimental order could not be predicted. The transition from one experimental phase to another was signaled every 40 s by the buzzer. The buzzer sounded six times, marking (a) the start of the experiment, (b) the second experimental phase, (c) the transition period from one interpersonal space to the other, (d) the third experimental phase, (e) the final phase, and (f) the end of the experiment. Once the experiment was completed, participants were given a self-report questionnaire regarding how uncomfortable or comfortable they found the four experimental conditions.
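The buzzer schedule above implies a fixed session timeline. The Python sketch below lays it out under two assumptions that the text does not state explicitly: that each of the five intervals between the six buzzes (four experimental phases plus one distance transition) lasted exactly 40 s, and that the labels (`"social/averted"` and so on) and the `ORDERS` table are hypothetical names chosen for illustration. Averted gaze precedes direct gaze at each distance, as specified in the Design section.

```python
from dataclasses import dataclass

PHASE_S = 40  # interval between buzzes reported in the text (seconds)

# Hypothetical labels: which distance comes first in each order group.
ORDERS = {
    "normal": ("social", "intimate"),  # social distance first
    "odd": ("intimate", "social"),     # intimate distance first
}


@dataclass(frozen=True)
class Segment:
    label: str
    start_s: int
    end_s: int


def timeline(order: str) -> list[Segment]:
    """Five intervals delimited by the six buzzes (assumed to be equal, 40 s each)."""
    first, second = ORDERS[order]
    labels = [
        f"{first}/averted", f"{first}/direct",    # averted gaze always comes first
        "transition",                              # experimenter changes distance
        f"{second}/averted", f"{second}/direct",
    ]
    return [Segment(lab, i * PHASE_S, (i + 1) * PHASE_S) for i, lab in enumerate(labels)]


def buzz_times(order: str) -> list[int]:
    """Start of the first interval plus the end of every interval: six buzzes."""
    segments = timeline(order)
    return [segments[0].start_s] + [seg.end_s for seg in segments]


for seg in timeline("odd"):
    print(f"{seg.start_s:>3}-{seg.end_s:<3} s  {seg.label}")
```

Under these assumptions a session lasts 200 s, with buzzes at 0, 40, 80, 120, 160, and 200 s in both order groups; only the assignment of distances to phases differs.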

## **MATERIALS AND DATA ACQUISITION**

## *Data acquisition*

To record skin temperature, a digital Guide Infrared TP8 camera (ThermoPro™) with an uncooled FPA microbolometer (384 × 288 pixels) was used. The TP8 provides a temperature sensitivity of 0.08 K with an accuracy of ±1°C and a sampling rate of 1 frame per second. Prior to recording, the camera was placed 50 cm away from the participant, automatically calibrated, and manually focused on the individual's face. The sampling rate was set at 50 Hz. The experimental room was kept at a normal temperature of 20–21°C and 60–65% humidity, with no direct sunlight, ventilation, or airflow. Prior to the experimental procedure, the participants were left for 15 min in the experimental room to acclimatize. All experimental recordings took place in the afternoon, between 2 and 4 p.m. In addition, behavioral recordings were made at a frame rate of 50 Hz by two radio-controlled cameras, both connected to a DVD recorder. The two video signals were combined using a Pinnacle system to produce a split-screen video.

## *Questionnaires*

To control for any personality variables that might have affected autonomic arousal (Critchley et al., 2001; Gaynor and Baird, 2007; Hughes et al., 2012), the BIS/BAS scale (Carver and White, 1994) was administered. For the current study, a two-factor model of the BIS scale was used, as suggested by Heym et al. (2008), in which BIS is separated into BIS-anxiety (4 items), related to conflicts, negative criticism, etc., and the fight/flight/freeze system (FFFS-fear), which relates to fear responsiveness to punishment (3 items). The BAS scale is divided into three subscales: (a) Drive for achieving goals (DR, 4 items), (b) Fun-Seeking or Sensation Seeking (FS, 4 items, α = 0.73), and (c) Reward Responsiveness (RR, 5 items). In the current study, Cronbach's alpha values were 0.57 for BIS-anxiety, 0.63 for FFFS-fear, 0.71 for DR, 0.46 for FS, and 0.64 for RR. The relatively low alpha values obtained may be due to the small number of items in each subscale. Nevertheless, this psychometric scale is widely used and has good psychometric properties, as well as good convergent and discriminant validity (Campbell-Sills et al., 2004). Furthermore, participants answered four questions regarding the pleasantness or unpleasantness of their experience. Ratings were provided on a five-point Likert scale ranging from 1 = not uncomfortable to 5 = very uncomfortable. The questions were the following: (a) How uncomfortable/comfortable did you feel when the experimenter was looking at you from the back of the room? (b) How uncomfortable/comfortable did you feel when the experimenter was looking at you from a close distance? (c) How uncomfortable/comfortable did you feel when the experimenter was **not** looking at you from the back of the room? (d) How uncomfortable/comfortable did you feel when the experimenter was not looking at you from a close distance?
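The link between subscale length and internal consistency can be made concrete. The Python sketch below computes Cronbach's alpha from an items × respondents score matrix and applies the Spearman–Brown prophecy formula, which predicts reliability when a scale is lengthened with parallel items. The function names are illustrative and are not taken from SPSS or any specific package. As an example of the formula, doubling the four-item FS subscale (α = 0.46) would be predicted to raise alpha to about 0.63, assuming the added items behave like the existing ones.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items; items[i][j] is respondent j's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variance = sum(pvariance(item) for item in items)
    # Population variances are used throughout; the n/(n - 1) factor cancels in the ratio.
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale is lengthened by `length_factor` with parallel items."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


print(round(spearman_brown(0.46, 2), 2))  # 0.63: FS subscale doubled from 4 to 8 items
```

The prediction is only as good as the parallel-items assumption; it illustrates why short subscales tend to yield low alpha values rather than estimating what a longer FS subscale would actually achieve.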

## *Thermal data analyses*

Prior to the analyses, the behavioral and thermal videos were synchronized so that they represented the same frame in time. Frames were extracted every 5 s using Launch GuideIR analyser by Wuhan Infrared Technology (http://www.guide-infrared.com). This could be performed consistently across frames because participants' movements were minimal, given the nature of the experiment. For temperature extraction from the ROI, different shapes were used as indicated by Ioannou et al. (2014): circular shapes for the nasal tip, cheek, and periorbital regions; rectangular shapes for the maxillary area and forehead; and oval shapes only for the chin. The shapes did not vary in size across frames, and temperature was extracted only when the face was directly facing the camera, as variation in these factors has previously been suggested to introduce noise (Ioannou et al., 2013). On average, 37 frames were extracted for each participant, approximately nine for each phase. Analyses were performed using the Statistical Package for the Social Sciences, version 17 (SPSS, Chicago, IL). Data were screened to ensure they were suitable for parametric analysis. A reliability check was conducted: results from five participants were analyzed by a second rater naïve to the purpose of the study, who performed temperature extraction on the same frames that had originally been selected for these five individuals (× 37 frames). Before computing a kappa measure of agreement, the degree of temperature change from one condition to the other was calculated for both coders (see **Table 1**). Kappa values (*p* < 0.05) ranged from moderate (0.64; cheek) through good (0.70; forehead, periorbital) to very good agreement (>0.8; nose, chin, maxillary). To eliminate the possibility that the results from the two raters differed, a 2 × 5 between-groups analysis of variance was conducted. No significant difference was observed between the two groups (*p* > 0.05).

**Table 1 | The degree of temperature change from one condition to another based on the coding of two independent raters.**
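Cohen's kappa corrects raw agreement between two raters for the agreement expected by chance from each rater's category frequencies. The text does not state how the continuous temperature changes were categorized for the kappa, so the Python sketch below assumes, purely for illustration, that each change was coded as a rise, fall, or stable reading using a hypothetical tolerance of 0.05 °C; `code_change` and its `tol` parameter are not from the study.

```python
from collections import Counter


def code_change(delta_c: float, tol: float = 0.05) -> str:
    """Hypothetical coding of a temperature change (deg C) as rise/fall/stable; tol is an assumption."""
    if delta_c > tol:
        return "rise"
    if delta_c < -tol:
        return "fall"
    return "stable"


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if chance == 1:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (observed - chance) / (1 - chance)


# Toy changes from two raters (not study data).
a = [code_change(d) for d in (0.20, 0.15, -0.10, -0.30)]
b = [code_change(d) for d in (0.18, 0.01, -0.12, -0.25)]
print(a, b, cohen_kappa(a, b))
```

In this toy case the raters agree on three of four changes (0.75), chance agreement is 0.375, and kappa is therefore 0.6, lower than raw agreement because part of the agreement is expected by chance.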

# **RESULTS**

## **CORRELATIONS BETWEEN TEMPERATURES ON THE SIX SITES ON THE FACE**

We correlated the temperature values from each site on the face with those from all other sites for each condition. Six sites give 15 pairwise correlations per condition, and with four conditions this gives a total of 60 correlations. All 60 correlations were significant, as were the means of the correlations for each condition (intimate space, gaze aversion: *M* = 0.65, *SD* = 0.13; intimate space, gaze: *M* = 0.67, *SD* = 0.13; social space, gaze aversion: *M* = 0.71, *SD* = 0.12; social space, gaze: *M* = 0.66, *SD* = 0.12). This is strong evidence that the different sites on the face are measuring a similar underlying construct.

## **FACIAL TEMPERATURE ANALYSES**

To obtain a clearer and more robust pattern of the effects of interpersonal distance and gaze on facial skin temperature, all ROI were averaged (see **Table 2**, **Figure 1**) and a 2 × 2 × 2 mixed factorial ANOVA was performed on the averaged data. No significant interaction effects were observed between interpersonal distance, gaze, and order, Wilks' Lambda = 0.96, *F*(1, 16) = 0.66, *p* = 0.429, η<sup>2</sup><sub>*p*</sub> = 0.04, or between order and gaze, Wilks' Lambda = 0.80, *F*(1, 16) = 3.89, *p* = 0.066, η<sup>2</sup><sub>*p*</sub> = 0.19. There was a significant interaction between interpersonal distance and order, Wilks' Lambda = 0.41, *F*(1, 16) = 22.68, *p* < 0.001, η<sup>2</sup><sub>*p*</sub> = 0.58 (see **Figure 2**). **Figure 2** suggests that the interaction arises because temperature increases when the experimenter moves from social space to intimate space but is relatively unaffected by distance when the experimenter moves from intimate space to social space. This interpretation is supported by simple main effects analyses (with Sidak adjustment): there was a significant increase in temperature when the experimenter moved from social space (*M* = 33.20, *SD* = 1.05) to intimate space (*M* = 33.62, *SD* = 1.01), *p* < 0.001. However, no significant difference in temperature was observed when the experimenter moved from intimate space (*M* = 34.25, *SD* = 1.22) to social space (*M* = 34.32, *SD* = 1.18), *p* = 0.054 (see also **Table 3**). There was also an interpersonal distance × gaze interaction, Wilks' Lambda = 0.76, *F*(1, 16) = 5.03, *p* = 0.039, η<sup>2</sup><sub>*p*</sub> = 0.24 (**Figure 3**). **Figure 3** suggests that this interaction arises because direct gaze increases temperature more in the intimate space condition than in the social space condition. Simple main effects analyses provide some limited support for this interpretation: temperature was significantly higher in the intimate space, direct gaze condition (*M* = 34.02, *SD* = 1.14) than in the intimate space, gaze aversion condition (*M* = 33.84, *SD* = 1.75), *p* < 0.001. Temperature was also significantly higher in the social space condition when the experimenter engaged in direct gaze (*M* = 33.79, *SD* = 1.21) than in averted gaze (*M* = 33.72, *SD* = 1.29), *p* = 0.014. However, the difference was greater in the intimate space condition (see also **Table 3**). There was a significant main effect of gaze, with a higher temperature for direct gaze (*M* = 33.90, *SD* = 1.21) than for gaze aversion (*M* = 33.78, *SD* = 1.16), Wilks' Lambda = 0.36, *F*(1, 16) = 28.35, *p* < 0.001, η<sup>2</sup><sub>*p*</sub> = 0.64, a large effect size. Given that the interaction between gaze and distance is ordinal, it is reasonable to interpret this main effect and conclude that direct gaze produced a large effect on facial temperature at both distances.
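The Sidak (Šidák) adjustment used for the simple main effects controls the familywise error rate across m comparisons, assuming the tests are independent. The Python sketch below gives both forms: the adjusted per-comparison alpha and the adjusted p-value. The number of comparisons in each family is not reported, so m = 2 and the p-value of 0.02 in the example are illustrative only and are not values from the study.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha keeping the familywise error rate at `alpha` over m independent tests."""
    if m < 1:
        raise ValueError("m must be >= 1")
    return 1 - (1 - alpha) ** (1 / m)


def sidak_p(p: float, m: int) -> float:
    """Sidak-adjusted p-value for one of m comparisons (capped at 1)."""
    if m < 1:
        raise ValueError("m must be >= 1")
    return min(1.0, 1 - (1 - p) ** m)


# Illustrative family of m = 2 comparisons (hypothetical values).
print(round(sidak_alpha(0.05, 2), 4))  # 0.0253
print(round(sidak_p(0.02, 2), 4))      # 0.0396
```

The two forms are equivalent: a raw p-value falls below `sidak_alpha(alpha, m)` exactly when its adjusted value `sidak_p(p, m)` falls below `alpha`. The Sidak correction is slightly less conservative than Bonferroni (alpha / m) for the same m.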
There was also a significant main effect of interpersonal distance, with higher temperature in the intimate space condition (*M* = 33.93, *SD* = 1.15) than in the social space condition (*M* = 33.76, *SD* = 1.23), Wilks' Lambda = 0.58, *F*(1, 16) = 11.66, *p* = 0.004, η<sub>p</sub><sup>2</sup> = 0.42, with a large effect size. Given the interaction results, we can again be relatively confident that temperature is pervasively and robustly elevated in intimate space. Finally, to illustrate the effects of the experimental protocol on facial temperature, four images were created from two randomly selected individuals for each experimental order (**Figure 4**). The images were taken 10 s before the end of each phase, to allow enough time for large temperature effects to develop on the skin surface and thus provide a clear visual illustration in the infrared image.
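For effects with a single numerator degree of freedom, as in all the tests above, Wilks' Lambda and partial eta squared can both be recovered from *F* and the error degrees of freedom: η<sub>p</sub><sup>2</sup> = *F*·df<sub>effect</sub>/(*F*·df<sub>effect</sub> + df<sub>error</sub>) and Λ = 1 − η<sub>p</sub><sup>2</sup> = df<sub>error</sub>/(*F* + df<sub>error</sub>). The sketch below, with hypothetical function names, applies these identities to the reported *F* values as a consistency check; it is not part of the original analysis.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


def wilks_lambda_single_df(f: float, df_error: int) -> float:
    """Wilks' Lambda for an effect with one numerator df: df2 / (F + df2)."""
    return df_error / (f + df_error)


# Significant F values reported above, all with df = (1, 16).
reported = {
    "distance x order": 22.68,
    "distance x gaze": 5.03,
    "gaze": 28.35,
    "distance": 11.66,
}
for label, f in reported.items():
    eta = partial_eta_squared(f, 1, 16)
    lam = wilks_lambda_single_df(f, 16)
    print(f"{label:17s} eta_p^2 = {eta:.3f}  Wilks' Lambda = {lam:.3f}")
```

The recomputed values agree with the reported ones to within about 0.01. The same formula applied to the questionnaire ANOVA, *F*(3, 68) = 10.10, gives approximately 0.31.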

## **INDIVIDUAL REGION ANALYSES**

The mean values for each of the six ROI in each condition were calculated across all 18 individuals, and a 2 × 2 × 2 mixed factorial ANOVA was performed on the data for each ROI. The repeated measures factors were gaze (eye contact vs. head gaze aversion) and proximity (social space, 4 m, vs. intimate space, 0.5 m); the independent groups factor was order (social space then intimate space vs. intimate space then social space). The three-way interaction was not significant for any region. However, two-way interactions between interpersonal distance and order were observed for the nose, cheek, chin, and maxillary area. In addition, a significant interaction between distance and gaze was observed for the chin. In the "normal" order the pattern was very consistent: temperature increased in all ROI when the experimenter moved from social space to intimate space. In the "odd" order, however, the change in temperature was much less consistent. Main effects of distance and gaze were observed for the nose, maxillary area, cheeks, and forehead. Temperature was higher in intimate than in social space, and higher under direct than under averted gaze. However, the latter effect could only be observed within the same interpersonal space (e.g., Intimate Space-Averted Gaze vs. Intimate Space-Direct Gaze). There was no significant main effect of order.

### **Table 3 | Results of the simple main effect analyses between distance and order as well as distance and gaze.**

*Order 1, Social space* → *Intimate space; Order 2, Intimate space* → *Social space.*

## **QUESTIONNAIRES**

## *Questionnaire analyses*

There was no significant correlation between BIS/BAS scores and temperature in any of the six ROI. A one-way ANOVA examining the effect of the four experimental conditions (averted gaze in intimate space, averted gaze in social space, direct gaze in intimate space, direct gaze in social space) on the subjective unpleasantness ratings showed that unpleasantness scores on the four subjective questions differed significantly across conditions, *F*(3, 68) = 10.10, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.3, with a medium effect size. *Post-hoc* comparisons using Tukey HSD indicated that unpleasantness scores were significantly higher when the experimenter was in intimate space and engaged in direct gaze (*M* = 4.17, *SD* = 1.1) than for direct gaze in social space (*M* = 3.06, *SD* = 0.87), averted gaze in intimate space (*M* = 2.89, *SD* = 1.18), and averted gaze in social space (*M* = 2.89, *SD* = 1.18). No other significant differences between groups were observed.

## **DISCUSSION**

In the present study we explicitly modulated the social context in which gaze and proximity occurred by using two different experimental sequences. One sequence involved what would be considered a socially normal shift from a social distance to an intimate distance. The other involved a socially odd shift from an intimate distance to a social distance. At each distance the two gaze conditions followed a fixed sequence: direct gaze always followed averted gaze. Analyses showed that when the experimenter moved from social to intimate distance, facial temperature rose on average by 0.42°C. However, no significant temperature change between the two distances was observed in the socially odd sequence. In contrast, the effect of direct compared to averted gaze was significant independent of order: direct gaze led to a higher temperature than averted gaze at both the intimate distance (a difference of 0.17°C) and the social distance (a difference of 0.10°C). The participants' ratings on the self-report unpleasantness scales supported the thermal findings: the highest "uncomfortable" scores were obtained for direct gaze at intimate and social distance, followed by averted gaze at intimate and social distance.

Previous studies suggest that gaze and distance have consistently robust effects on a range of psychophysiological measures. The current findings obtained using fTII agree with results obtained with GSR (McBride et al., 1965; Nichols and Champness, 1971) and electroencephalography (Gale et al., 1975). As previously observed, direct gaze not only increases arousal, but the intensity of this effect also seems to be modulated by interpersonal distance. Although in the current study the participants could not alter their interpersonal distance, the findings support Argyle and Dean's (1965) equilibrium theory, which holds that interpersonal distance and gaze interact to modulate arousal. In the present study, as interpersonal distance decreased, the arousal effects of direct gaze became greater. Overall, interpersonal distance and gaze appear to interact to produce strong effects on human physiology, with facial temperature responding differently to each experimental condition. These bodily signs of autonomic arousal, picked up by thermal imaging, reveal the preparedness of the organism to support behavioral engagement strategies, whether these involve social interaction or mechanisms of avoidance (Porges, 2001; Bodenhausen and Hugenberg, 2009).

The increase in facial temperature creates a "physiological paradox." Although the participants rated personal space intrusion, as well as eye contact independent of distance, as uncomfortable, facial temperature rose rather than dipped. Previous thermal imaging research has found that negative emotions such as fear (Kistler et al., 1998; Nakayama et al., 2005; Kuraoka and Nakamura, 2011), stress (Pavlidis et al., 2012), and guilt (Ioannou et al., 2013) lead to a drop in the temperature of the nose, the maxillary area, the forehead, and the fingers as a result of peripheral vasoconstriction. The present results suggest that the experience of interpersonal proximity and gaze does not fall physiologically into the same category. Increases in facial temperature have been observed in experiments on social contact (Hahn et al., 2012) and anxiety (Pavlidis et al., 2002; Tsiamyrtzis et al., 2007; Zhu et al., 2008). In Hahn et al. (2012), participants were touched on various parts of the body by female and male experimenters using a handheld light-flashing device. The body parts touched were the face and chest (high intimacy) and the arm and palm (low intimacy). When participants were touched on high-intimacy areas, temperature increased, with an even greater increase when the touch was performed by an experimenter of the opposite sex. These increases in temperature were localized on the nose, lip, and periorbital regions of the face. Anxiety seems to have a similar effect on facial temperature. Participants who were interrogated about a mock crime that they had just committed, and who tried to defend their innocence, showed an increase in temperature near the periorbital region (Tsiamyrtzis et al., 2007) and the supraorbital vessels of the forehead (Pavlidis et al., 2002; Zhu et al., 2007). The results obtained by Pavlidis et al. (2002) are consistent with traditional polygraph tests, which use physiological measures of pulse, blood pressure, perspiration, and skin conductivity to draw conclusions about the honesty of the individual. Behaviorally, what the above experiments have in common is a challenging social situation. However, although physiological evidence exists to explain the temperature rise in the studies by Pavlidis et al. (2002), Zhu et al. (2007), and Tsiamyrtzis et al. (2007), no evidence exists to explain it in the study by Hahn et al. (2012). Pavlidis and Levine (2002) suggested that the temperature increase results from increased blood perfusion to the surface of the skin. Increased perfusion results from increased delivery of blood to body tissue, a function sustained by the heart (Kreibig, 2010). Thus, judging from the previous thermal imaging literature, increased blood flow to the skin surface, and hence the rise in skin temperature, appears to result from increased heart rate. The current physiological findings, as well as the observations of previous research, support Polyvagal theory (Porges, 2001). According to this theory, mammalian evolution favored the development of an efficient neural control system, which provided increased control of the heart via the myelinated vagus, the 10th cranial nerve. When needed, withdrawal of vagal tone increases cardiac output and supports transitory mobilization states without activating the costly sympathetic-adrenal system, so that vagal disengagement alone can provide safety from short-lived stressors. Furthermore, the findings of the current study do not agree with the "stress theory" proposed by Bell et al. (1996), which would predict the temperature drop associated with stress: in the current set-up, a rise in temperature was observed instead of the drop reported for stress and other negative states (Kistler et al., 1998; Pavlidis et al., 2012; Ioannou et al., 2013).

Although no significant main effect of order was observed, **Figures 1** and **4** both show that in the "odd" sequence no significant temperature change took place from one condition to the next, and only minor temperature changes can be observed. Approaching the individual initially at intimate distance and then moving to social space yielded no significant temperature changes. Although at the group level direct gaze produced significantly higher temperatures than averted gaze independent of sequence, this outcome might have reached statistical significance because of the strong effects in the normal experimental order (Social distance → Intimate distance). The design of the experiment seems to favor the approach from social distance to intimate distance, since in this order a linear increase in temperature was observed and physiological effects seemed to intensify from one condition to the next. When moving from intimate to social distance, in contrast, temperature did not change in the same manner. The "odd" experimental sequence shows an overcompensation effect in which temperatures started off higher and decreased less. Although at intimate distance temperature increased from averted to direct gaze, no temperature change took place during the transition from intimate to social distance. Within the 40 s interval, facial temperature did not have the opportunity to reverse the physiological changes that had taken place at intimate distance with direct gaze before the move to social distance and averted gaze. We believe that these results show a physiological temperature "spill-over" effect from the most arousing condition to the next, as can be seen during the transition from intimate to social space.

The thermal imaging literature does not provide evidence on the time needed for facial temperature to return to baseline after it rises. However, some evidence exists on the time course of recovery after temperature decreases. Nakayama et al. (2005) reported that after stimulation, 220–280 s were needed for the temperature of the nose to return to baseline values. Moreover, Kuraoka and Nakamura (2011) reported that changes on the nose lasted on average 60 s before returning to baseline values. Evidence from rodents suggests that the recovery delay also depends on the region of interest (Vianna and Carrive, 2005): the back, head, and body of the animal took approximately 60–75 min to return to baseline, whereas the eyes, tail, and paws took 14, 10, and 15 min, respectively. Although changes in heart rate take place much more rapidly than changes in vasoconstriction (Kistler et al., 1998; Vianna and Carrive, 2005), and although the two physiological phenomena have different underlying mechanisms, we believe that the transition from intimate to social distance in the present study did not leave enough time for adequate heat changes to be observed between conditions. Finally, although temperature changes in the direction of physiological excitation can be observed rapidly, within 15–20 s of an affective stimulus (Kistler et al., 1998; Nakayama et al., 2005; Kuraoka and Nakamura, 2011), restoration of temperature takes substantially longer.
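To make the timing argument concrete, the sketch below assumes that a facial temperature change decays exponentially toward baseline, and treats "return to baseline" as the point at which only 5% of the initial change remains. Both the exponential model and the 5% criterion are illustrative assumptions, not findings of the cited studies, and the recovery times quoted above refer to temperature decreases rather than increases. The function names are hypothetical.

```python
import math


def residual_fraction(elapsed_s: float, recovery_s: float, tolerance: float = 0.05) -> float:
    """Fraction of an initial temperature change still present after elapsed_s.

    Assumes first-order (exponential) recovery, with recovery_s defined as the
    time at which the residual has fallen to `tolerance` of the initial change.
    """
    tau = recovery_s / math.log(1.0 / tolerance)  # time constant implied by recovery_s
    return math.exp(-elapsed_s / tau)


def interval_needed(recovery_s: float, target_residual: float, tolerance: float = 0.05) -> float:
    """Seconds to wait until only target_residual of the change remains (same model)."""
    tau = recovery_s / math.log(1.0 / tolerance)
    return tau * math.log(1.0 / target_residual)


# Recovery times quoted above for the nose: about 60 s and 220-280 s.
for recovery in (60.0, 220.0, 280.0):
    left = residual_fraction(40.0, recovery)
    print(f"recovery {recovery:.0f} s -> residual after a 40 s phase: {left:.2f}")
```

Under these assumptions, a 40 s interval would leave roughly 14% of the change when recovery takes 60 s, and more than half of it when recovery takes 220–280 s, which is in line with the proposed spill-over effect.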

The current experiment demonstrates that this novel, wireless physiological technique is sensitive enough not only to pick up changes in stimulus intensity but also to replicate results observed in previous studies using other widely established physiological measures. This experimental model lays a foundation for further studies in the clinical domain, whether these concern diagnosis or the efficacy of treatment (De Bellis et al., 1994; Yehuda et al., 1996; Jansen et al., 2000; Horley et al., 2003; Dalton et al., 2005). Thermal imaging is a valuable tool in studies in which participants cannot express their emotions verbally (Nakanishi and Imai-Matsumura, 2008; Kyselo and Di Paolo, 2013; Uithol and Paulus, 2013) or in which emotional arousal can only be inferred by coding behavior and measuring physiological responses (Vianna and Carrive, 2005).

Functional thermal imaging has the potential to identify subtle psychophysiological changes that take place on the surface of the skin as a result of underlying vasoconstriction or heart rate variability. Although a rise in temperature was observed in the current experiment, no other physiological measures were obtained that could directly explain why such changes took place. The literature provides some evidence as to why this might have happened, and on that basis we can only speculate that the rise is related to increased cardiac output (Pavlidis and Levine, 2002). Future studies of temperature changes should employ cardiac measures to explain temperature-related physiological observations. In addition, within the framework of the current study, the two remaining distances, "personal" and "public," would also be worth investigating (Hall, 1966). Furthermore, since in the current experiment female participants were exposed to female experimenters, mixed-gender dyads as well as mixed-gender groups could be added. Some of the temperature changes observed might have resulted from heat reflected from the experimenter. However, given the strong psychophysiological effects of the experimental conditions and orders, this is unlikely to account for the changes measured. Future research could attempt to measure and exclude such effects. Finally, despite its sensitivity to small temperature fluctuations, thermal imaging is limited by long response latencies. This is not because of any inadequacy of the technique but because of the time the skin needs to exhibit physiological changes, whether these result from vascular constriction or from increased blood flow. Thus, in experiments in which stimulus intensity does not increase linearly from one condition to the next, it is important to leave adequate time between conditions for temperature to return approximately to baseline values.

## **CONCLUSIONS**

Interpersonal distance and perceived gaze are two related social constructs, each of which affects the physiological reactions of the receiver. The current observations suggest that direct compared to averted gaze affects autonomic reactions and facial temperature, and that these effects persist independent of the distance from which gaze occurs. In terms of interpersonal distance, intruding into an individual's intimate space led to a marked increase in temperature. This result, however, was only evident when the experimenter approached from social to intimate distance. No difference in temperature was observed when the experimenter was first at intimate distance and then moved to social distance. This phenomenon of "physiological spill-over" represents an effect that lasted longer than the predefined duration of the experimental phase: skin temperature did not recover after exposure to the most arousing, intimate condition, and the effect persisted after the transition to the least arousing condition in social space. Beyond the methodological significance of the study, the results show that gaze and interpersonal distance each play their own part in social interaction. The physiological reactions captured by facial skin temperature suggest that the organism prepares for engagement or avoidance when gaze is engaged and when intimate space is violated. However, among conspecifics, and as suggested by the participants' physiological reactions, the social elements of space and gaze are not treated as threatening; if they were, a drop in temperature showing the full-blown effects of threat would have been observed. Rather, these results suggest a preparatory physiological response by the organism for what will follow, whether this is an attack or a pleasant social interaction.

## **ACKNOWLEDGMENTS**

This work was supported by the Marie-Curie Initial Training Network, TESIS: Toward an Embodied Science of Inter-Subjectivity (FP7-PEOPLE-2010-ITN, 264828).

# **REFERENCES**


Deus, V., and Jokic-Begic, N. (2006). Personal space in schizophrenia patients. *Psychiatr. Danub.* 18, 150–158.



Sommer, R. (1959). On writing "little papers." *Am. Psychol.* 14, 235–237.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 May 2014; accepted: 16 July 2014; published online: 04 August 2014. Citation: Ioannou S, Morris P, Mercer H, Baker M, Gallese V and Reddy V (2014) Proximity and gaze influences facial temperature: a thermal infrared imaging study. Front. Psychol. 5:845. doi: 10.3389/fpsyg.2014.00845*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Ioannou, Morris, Mercer, Baker, Gallese and Reddy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Thermal expression of intersubjectivity offers new possibilities to human–machine and technologically mediated interactions

# *Arcangelo Merla\**

Infrared Imaging Lab, Department of Neuroscience, Imaging and Clinical Sciences–Institute of Advanced Biomedical Technologies, G. d'Annunzio University, Chieti–Pescara, Italy

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

#### *Reviewed by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

Ricardo Angelo Rosa Vardasca, Instituto de Engenharia Mecânica – Faculty of Engineering, University of Porto Campus, Portugal

#### *\*Correspondence:*

Arcangelo Merla, Infrared Imaging Lab, Department of Neuroscience, Imaging and Clinical Sciences–Institute of Advanced Biomedical Technologies, G. d'Annunzio University, Via dei Vestini, 31 Chieti–Pescara, Italy e-mail: a.merla@itab.unich.it

The evaluation of the psychophysiological state of the interlocutor is an important element of interpersonal relationships and communication. Thermal infrared (IR) imaging has proved to be a reliable tool for the non-invasive and contact-less evaluation of vital signs, psychophysiological responses, and emotional states. This technique is quickly spreading in many fields, from psychometrics to social and developmental psychology, and from the touch-less monitoring of vital signs and stress to human–machine interaction. In particular, thermal IR imaging promises to be useful for gathering information about affective states in social situations. This paper presents the state of the art of thermal IR imaging in psychophysiology and in the assessment of affective states. The goal is to provide insights into its potential and limits for use in human–artificial agent interaction, in order to contribute to a major issue in the field: the perception by an artificial agent of human psychophysiological and affective states.

**Keywords: human–machine interaction, psychophysiology, thermal infrared imaging, emotions, intersubjectivity**

# **INTRODUCTION**

We routinely interact with machines, since they pervade our lives. Over the centuries, the way we interact with them has changed dramatically as machines have evolved from purely mechanical tools to complex robots endowed with humanoid capabilities. If we define a machine as any non-human, non-biological actor able to interact passively or actively with humans, the fields of human–machine interaction (HMI), human–computer interaction (HCI), and human–robot interaction can be unified into the general field of human–artificial agent interaction (HAI).

A key challenge common to all types of artificial agents (AA) is to set up a contingent interaction. This means not only that AA must react to human actions, but also that they must (or should) react in ways that are congruent with the emotional and psychophysiological state of the human user or interlocutor. The latter aspect is especially relevant for social and affective robots, which are designed to interact with human users in a variety of social contexts and over long periods of time. Such AA need to communicate with people in ways that are promptly understood and accepted (Kirby et al., 2010).

Affective state, mood, and emotion play an important role in social interaction. Emotional responses are triggered by social interactions, influenced by cultural and societal patterns, and used to communicate desires to other people (Parkinson, 1996). Emotions add content to conversation, allowing conversational partners to increase the effectiveness of their communication (Clark and Brennan, 1991). For example, the desire or need to be comforted may be expressed through a manifestation of sadness that may be facial, vocal, or behavioral. Moreover, a person's current mood may affect the way that person interacts with others (Forgas, 1999). People who are interacting may unconsciously tune their moods and emotions to match those of their conversational partner (Wild et al., 2001). Suppressing emotions can be highly disadvantageous for forming relationships and disrupts conversations (Butler et al., 2003). Indeed, the principal reason for social interaction is to experience emotions, which help to develop a "sense of coherence with others" (Frijda, 2005).

People tend to treat AA as they treat other people, attempting to establish a social relationship with them (Reeves and Nass, 1996). Therefore, the above-mentioned "sense of coherence with others" lies at the core of the need for congruency in HAI. Understanding the psychophysiological state of other individuals plays an essential role in planning or adopting congruent strategies in social interactions. This innate capability is at the basis of empathetic sharing among humans. Giving AA this capability is one of the most important challenges in the field of HAI (Pantic and Rothkrantz, 2003). However, the recognition and instrumental measurement of affective states is also one of the most challenging research activities in the field of applied psychophysiology.

# **ASSESSMENT OF PSYCHOPHYSIOLOGICAL STATES THROUGH THERMAL INFRARED IMAGING**

To date, psychophysiological and emotional states are usually monitored by measuring several autonomic nervous system (ANS) parameters, such as skin conductance response, palm temperature, heartbeat and/or breathing rate modulations, and peripheral vascular tone. Assessment is also performed through behavioral channels, such as facial expression recognition and electromyographic activity. Classical technology for monitoring autonomic activity usually requires contact sensors or devices, which are somewhat invasive and may bias the estimate of the state, since the compliant participation of the individual is required.

Thermal infrared (IR) imaging was proposed as a potential solution for non-invasive and ecological recording of ANS activity (Merla et al., 2004). Thermal IR imaging, in fact, allows the contact-less and non-invasive recording of the cutaneous temperature through the measurement of the spontaneous thermal irradiation of the body.

The autonomic nervous system is fundamentally involved in bioheat exchange, unconsciously controlling heart rate, breathing, tissue metabolism, perspiration, and cutaneous blood perfusion. Its activity therefore provides an effective window onto emotional responses and states. Previous research in this field has demonstrated that thermal IR imaging (also referred to as functional infrared imaging, fIRI) can characterize competing subdivisions of the ANS (Murthy and Pavlidis, 2006; Garbey et al., 2007; Merla and Romani, 2007; Pavlidis et al., 2007; Shastri et al., 2009; Merla, 2013; Engert et al., 2014). Since the face is usually exposed and is central to social communication and interaction, thermal imaging for psychophysiology is performed on the subject's face. With a proper choice of IR imaging systems, optics, and solutions for tracking the regions of interest, it is possible to avoid any behavioral restriction of the subject (Dowdall et al., 2006; Zhou et al., 2009).

The reliability and validity of this method were demonstrated by comparing data recorded simultaneously by thermal imaging and by gold-standard methods, such as ECG, piezoelectric thorax strips or nasal thermistors for breathing monitoring, and skin conductance or galvanic skin response (GSR). As for the latter, studies have demonstrated that fIRI and GSR have similar detection power (Coli et al., 2007; Shastri et al., 2009; Pavlidis et al., 2012; Di Giacinto et al., 2014; Engert et al., 2014).

Non-invasiveness is an almost exclusive feature of thermal IR imaging in stress research. In a recent study, Engert et al. (2014) explored the reliability of thermal IR imaging in the classical setting of human stress research. Thermal imprints were compared to established stress markers (heart rate, heart rate variability, finger temperature, α-amylase, and cortisol) in healthy subjects participating in two standard, well-established laboratory stress tests: the cold pressor test (Hines and Brown, 1932) and the Trier Social Stress Test (Kirschbaum et al., 1993). The thermal responses of several facial regions proved to be sensitive to change in both tests. Although the thermal imprints correlated only weakly with the established stress markers, the thermal responses correlated with stress-induced mood changes, whereas the established stress markers did not. These results suggest that thermal IR imaging provides an effective technique for estimating sympathetic activity in stress research.

The maturity and feasibility achieved by thermal IR imaging suggest its use even in psychiatry and psychophysiology (Merla, 2013). Recently, thermal IR imaging was used together with standard GSR to examine fear conditioning in posttraumatic stress disorder (PTSD; Di Giacinto et al., 2014). The authors examined fear processing in PTSD patients with mild symptoms and in individuals who did not develop symptoms (both groups consisting of victims of a bank robbery) by studying the fear-conditioned response. They found: (a) a change in physiological parameters relative to the baseline condition in both control subjects and PTSD patients during the conditioning phase; (b) persistence of the conditioning effect in the maintenance phase in both control subjects and PTSD patients; and (c) that patients and controls differed in how the physiological parameters varied across phases rather than in their absolute values, with PTSD patients showing prolonged excitation and a higher tonic component of autonomic activity. These results indicate that analysis of the facial thermal response during the conditioning paradigm is a promising psychometric method, even when PTSD symptoms are of low severity.

Thermal IR imaging has been proposed, given suitable classification algorithms, as a potential tool for creating an atlas of the thermal expression of emotional states (Khan and Ward, 2009; Nhan and Chau, 2010). Such an atlas would be based on characterizing the thermal signal in facial regions of autonomic valence (the nose or nose tip, the perioral or maxillary areas, the periorbital and supraorbital areas associated with the activity of the periocular and corrugator muscles, and the forehead) in order to monitor the modulation of autonomic activity.

The above-mentioned studies were made possible by impressive advances in thermal IR imaging technology. Modern devices offer high spatial resolution (up to 1280 × 1024 pixels, with up to a few milliradians in the field of view), high temporal resolution (full-frame rates up to 150 Hz), and high thermal sensitivity (as fine as 15 mK at 30°C) in the 3–5 μm spectral range (Ring and Ammer, 2012). The commercial availability of 640 × 480 focal plane arrays of uncooled, stabilized sensors (spectral range 7.5–13.0 μm; full-frame rate around 30 Hz; thermal sensitivity around 40 mK at 30°C) permits extensive use of this technology in the psychophysiological arena.
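The specifications above translate into concrete spatial and data-rate requirements. The sketch below uses the small-angle approximation (one milliradian subtends about 1 mm at a distance of 1 m) and assumes, for illustration only, a per-pixel instantaneous field of view of 1 mrad, a 1 m camera-to-face distance, and 2 bytes of storage per radiometric pixel; none of these values come from the cited sources, and the function names are hypothetical.

```python
def pixel_footprint_mm(ifov_mrad: float, distance_m: float) -> float:
    """Side length (mm) of the skin patch seen by one pixel.

    Small-angle approximation: footprint = IFOV (rad) * distance (m).
    Because 1 mrad at 1 m is 1 mm, mrad * m gives mm directly.
    """
    return ifov_mrad * distance_m


def raw_data_rate_mb_s(width: int, height: int, fps: float, bytes_per_pixel: int = 2) -> float:
    """Uncompressed data rate in MB/s (1 MB = 1e6 bytes)."""
    return width * height * fps * bytes_per_pixel / 1e6


# Hypothetical IFOV of 1 mrad viewed from 1 m.
print(f"pixel footprint: {pixel_footprint_mm(1.0, 1.0):.1f} mm")

# The two camera classes described in the text (2 bytes/pixel is an assumption).
cameras = {
    "cooled, 3-5 um": (1280, 1024, 150),
    "uncooled, 7.5-13 um": (640, 480, 30),
}
for name, (w, h, fps) in cameras.items():
    print(f"{name}: {raw_data_rate_mb_s(w, h, fps):.1f} MB/s")
```

With these assumptions, the high-end 1280 × 1024 camera at 150 Hz produces about 393 MB/s of raw data, against about 18 MB/s for the 640 × 480 uncooled camera at 30 Hz.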

However, several limitations exist for using thermal IR imaging in real-world, everyday scenarios. Because of homeostasis, cutaneous temperature is continuously adjusted to environmental conditions. Precautions and countermeasures must therefore be adopted to avoid attributing psychological valence to purely thermoregulatory or acclimatization processes (Merla et al., 2004).
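One common way to reduce the risk of reading thermoregulatory drift as a psychological response, offered here only as an illustrative assumption and not as the procedure of Merla et al. (2004), is to fit a linear trend to a resting baseline window of each ROI trace and subtract its extrapolation from the whole recording. The function name, its parameters, and the toy trace below are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def drift_corrected(signal: Sequence[float], fs_hz: float, baseline_s: float) -> list[float]:
    """Remove a linear trend estimated from the initial resting baseline.

    The first `baseline_s` seconds of the ROI trace (sampled at `fs_hz`) are
    assumed to contain no stimulation, so any slope there is treated as
    thermoregulatory drift and extrapolated over the whole trace.
    """
    n_base = int(round(baseline_s * fs_hz))
    if not 2 <= n_base <= len(signal):
        raise ValueError("baseline window needs at least 2 samples and must fit in the trace")
    t = [i / fs_hz for i in range(len(signal))]
    tb, yb = t[:n_base], list(signal[:n_base])
    t_bar, y_bar = fmean(tb), fmean(yb)
    # Ordinary least-squares line through the baseline samples.
    sxx = sum((ti - t_bar) ** 2 for ti in tb)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(tb, yb)) / sxx
    intercept = y_bar - slope * t_bar
    # What remains is the change relative to the extrapolated baseline trend.
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, signal)]


# Toy trace (illustrative numbers only): slow warming of 0.01 degC/s plus a
# 0.5 degC step that starts at t = 60 s, sampled at 1 Hz.
trace = [30.0 + 0.01 * i + (0.5 if i >= 60 else 0.0) for i in range(120)]
corrected = drift_corrected(trace, fs_hz=1.0, baseline_s=30.0)
print(round(corrected[10], 3), round(corrected[-1], 3))
```

A linear trend is the simplest drift model; longer recordings or changing room conditions may call for a different correction.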

## **THERMAL EXPRESSIONS OF INTERSUBJECTIVITY**

According to Kappas (2013), "emotions are evolved systems of intra- and interpersonal processes that are regulatory in nature, dealing mostly with issues of personal or social concern." Emotions regulate social interaction and the social sphere; conversely, social processes affect and regulate emotions (Kappas, 2013). This means that "intrapersonal processes project in the interpersonal space, and inversely, interpersonal experiences deeply influence intrapersonal processes." These reciprocal connections between interpersonal and intrapersonal emotions and processes are important elements for achieving interaction awareness.

However, as outlined above, emotions may possess a thermal signature, or may be characterized by regulatory activity of the autonomic nervous system, which in turn has a thermal imprint through which it can be detected. In addition, the thermal modulation of real, natural social interaction among individuals can be studied non-invasively with thermal IR imaging, even recording thermal signatures from several individuals at once (**Figure 1**). It is therefore plausible to speak of the thermal expression of emotions and interaction as a channel for studying intersubjectivity, understood as the psychological relation between people. Studies in this field have mostly addressed maternal empathy and social interaction (Ebisch et al., 2012; Manini et al., 2013).

Early infant attachment was studied using thermal IR imaging in infants exposed to three different experimental phases: (i) separation from the mother; (ii) a short-lived replacement of the mother by a stranger; and (iii) infant in the presence of the mother and the stranger. By observing temperature changes on the infants' forehead, the researchers concluded that infants are aware of strangers and that infants form a parental attachment earlier than previously thought, specifically from 2 to 4 months after birth (Mizukami et al., 1990).

Maternal empathy is considered fundamental for developing affective bonds and for healthy socio-emotional development. Ebisch et al. (2012) demonstrated a situation-specific parallelism between mothers' and children's facial temperature variations (**Figure 1**). This study was the first to provide evidence, in a purely natural context, of direct affective sharing involving autonomic responses.

An extension of this study, with an additional group of female participants, showed that mother–child dyads, in contrast to other-woman–child dyads, have faster empathic reactions to the child's emotional state (Manini et al., 2013). For adults, fewer studies of social interaction with thermal IR imaging are available.
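The analysis pipelines of Ebisch et al. (2012) and Manini et al. (2013) are not detailed here. As a generic sketch of how parallelism and response speed between two facial temperature traces could be quantified, the Python below computes a lagged Pearson correlation and reports the lag (in samples) at which the adult's trace best follows the child's; a shorter best lag would be one way to operationalize a faster reaction. The function names, the assumption that both traces share one sampling rate, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def lagged_xcorr(child, adult, max_lag):
    """Correlate child[t] with adult[t + lag] for lag = 0..max_lag samples.

    Both traces are assumed to be sampled at the same rate and aligned in time.
    """
    return {
        lag: pearson(child[: len(child) - lag], adult[lag:])
        for lag in range(max_lag + 1)
    }


def best_lag(child, adult, max_lag):
    """Lag (in samples) at which the adult trace best follows the child's."""
    scores = lagged_xcorr(child, adult, max_lag)
    return max(scores, key=scores.get)


# Toy temperature changes (°C): the adult trace repeats the child's two samples later.
child = [0.0, 0.0, -0.1, -0.3, -0.2, 0.0, 0.0, 0.0, 0.1, 0.0]
adult = [0.0, 0.0] + child[:-2]
lag = best_lag(child, adult, max_lag=4)  # 2
```

Converting the lag to seconds only requires dividing by the frame rate. With real data the peak correlation would be well below 1 and would need a significance test against shuffled or surrogate pairings.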

Merla and Romani (2007) exposed participants to the attention of unknown people while they performed a stressful task (a Stroop test). The study was designed to elicit feelings of embarrassment and mild stress when participants failed to perform the task correctly in the presence of others. Temperature decreases associated with emotional sweating were observed on the palm and the face, especially around the mouth and over the nose tip. The authors reported that the largest temperature variations occurred in the subjects most influenced by the presence of unknown people, while smaller variations occurred in subjects less concerned with the judgment of others.

**FIGURE 1 | Facial thermal imprints of a mother–child–other mother triad synchronization during a distressing situation (adapted from Manini et al., 2013).** The picture shows many of the features of thermal imaging in psychophysiology, especially the possibility of simultaneously recording several individuals sharing an experimental condition or a social interaction. Facial temperature variations are shown across experimental phases. Such variations, which express sympathetic activity, may concern not only the average value but also the spatial distribution of temperature across the regions of interest.

Given the capability of thermal IR imaging to capture emotional states, a variety of studies examined the potential of this technique in the context of deception detection (Pavlidis et al., 2002; Tsiamyrtzis et al., 2006; Zhou et al., 2009). Often, individuals who commit a crime show involuntary physiological responses when remembering details of that crime. By capitalizing on the thermal imprint of such responses, Pollina et al. (2006) found significant facial temperature differences between deceptive and non-deceptive participants.

Sexual arousal is clearly and markedly interrelated with ANS activity. Merla and Romani (2007) studied the facial thermal response of healthy males viewing erotic video clips, in contrast with sports videos. Using bioheat models, these facial temperature variations were converted into cutaneous perfusion variations and compared with the penile response, measured with a pneumatic device. Cutaneous perfusion of specific facial regions (nose, lips, and forehead) increased markedly more during sexual video content than during non-sexual stimuli.

Hahn et al. (2012) examined social contact and sexual arousal during interpersonal physical contact, investigating the facial temperature changes it produced. The stimulus was a standardized interaction in which a same- or opposite-sex experimenter touched the subject on the face and chest (high-intimacy contact) or on the arm and palm (low-intimacy contact). Facial temperatures increased significantly from baseline during high-intimacy contact, and these increases were larger when an opposite-sex experimenter touched the subject. The study showed that facial temperature changes were reliable indicators of arousal during interpersonal interactions.

## **THERMAL IR IMAGING AND ARTIFICIAL AGENT PERCEPTION**

In recent years, the robotics community has increased the availability of social robots, that is, robots devoted primarily to interacting with human interlocutors. Museum tour-guide robots (Nourbakhsh et al., 1999) and robots that interact with the elderly (Montemerlo et al., 2002) illustrate the advantages of social robots. However, they also highlight the need for natural and ecological interactions. Many of these robots incorporate some rudimentary emotional behaviors. Robots with infant-like interaction abilities have been presented (e.g., Kismet; Breazeal, 2003) and used to demonstrate that people can understand and respond correctly to a robot's display of emotions. An emotionally expressive graphical face encourages people to interact with a robot (Bruce et al., 2002).

There are therefore several potential advantages to using thermal IR imaging for human–machine interaction (HMI). From the point of view of computational physiology, there is the concrete possibility of monitoring, in a realistic environment, at a distance and unobtrusively, several physiological parameters and vital signs such as pulse rate, breathing rate, cutaneous vasomotor control, and an indirect estimate of electrodermal activity. This opens the way to remote monitoring of individuals' physiological state without requiring their collaboration and without interfering with their usual activities, thus favoring the use of assistive robots. Another relevant possibility is to use thermal IR imaging to give artificial agents (AA) the capability of adopting behavioral or communicative strategies contingent on the actual psychophysiological state of the human user. Although not yet fully available, this capability could be particularly effective for affective robots and automatic agents designed to improve and personalize learning or treatment strategies on the basis of the user's measured psychophysiological feedback.
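Breathing rate is one of the vital signs listed above. A common approach, assumed here rather than described in this article, exploits the fact that the temperature around the nostrils rises with warm exhaled air and falls with inhalation, so the dominant frequency of that trace within a plausible breathing band gives the rate. The sketch below uses a plain discrete Fourier transform; the band limits (0.1–0.7 Hz), the 30 Hz frame rate in the usage example, and the function name are assumptions.

```python
import math


def breathing_rate_bpm(signal, fs, f_lo=0.1, f_hi=0.7):
    """Dominant frequency of a nostril-ROI temperature trace, in breaths/min.

    signal: mean nostril temperature per frame; fs: frame rate in Hz.
    Only DFT bins inside the assumed breathing band [f_lo, f_hi] are examined.
    """
    n = len(signal)
    m = sum(signal) / n
    x = [s - m for s in signal]  # remove the DC component
    best_f, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if not (f_lo <= f <= f_hi):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return None if best_f is None else 60.0 * best_f


# Synthetic 60 s trace at 30 Hz: 0.25 Hz oscillation (15 breaths/min).
fs = 30.0
signal = [34.5 + 0.2 * math.sin(2 * math.pi * 0.25 * i / fs) for i in range(int(60 * fs))]
rate = breathing_rate_bpm(signal, fs)  # 15.0
```

Frequency resolution is fs/n, here 1/60 Hz, i.e. one breath per minute; shorter windows trade resolution for responsiveness. A production system would typically use an FFT library and track the nostril region across frames.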

Technologically mediated interaction could also be redesigned around the possibilities offered by thermal IR imaging, as it has been shown that collective emotions in cyberspace can be recorded and classified (Kappas, 2013). Participants communicating in real time via computer exhibited expressive and electrodermal activations that depended on how well they became acquainted with each other during these interactions. They were physically separated but connected online via text-based computer-mediated communication (Kappas et al., 2012). These processes emerge in real time and apparently apply to e-communities of considerable size (Chmiel et al., 2011).

Of course, thermal IR imaging is neither the first nor the only attempt to endow AA with the capability of understanding the affective and emotional state of the human interlocutor. This problem is well known to the robotics community (Pantic and Rothkrantz, 2003), and multimodal user-emotion detection systems for social robots have been presented. Alonso-Martín et al. (2013) recently proposed the robotics dialog system (RDS), which uses two channels of information to detect emotional state: voice analysis and facial expression analysis. For emotion detection in facial expressions, the authors developed the gender and emotion facial analysis (GEFA) system. GEFA integrates two third-party solutions: the first recognizes objects in the field of view (SHORE, Sophisticated High-speed Object Recognition Engine) and the second recognizes facial expressions (CERT, Computer Expression Recognition Toolbox). The outputs of these components feed a decision rule that combines the information from both to determine the detected emotion.
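The decision rule used in RDS is not spelled out here. Purely as a hypothetical illustration of decision-level fusion of two emotion channels, the sketch below accepts a label when both channels agree and otherwise falls back to the more confident channel if it clears a threshold. `ChannelEstimate`, `fuse`, and `min_conf` are invented names; they are not part of RDS, GEFA, SHORE, or CERT.

```python
from dataclasses import dataclass


@dataclass
class ChannelEstimate:
    label: str         # e.g. "happiness", as reported by one channel
    confidence: float  # assumed to lie in [0, 1]


def fuse(voice, face, min_conf=0.5):
    """Hypothetical decision rule for two emotion channels.

    - If both channels agree, return the shared label.
    - Otherwise return the label of the more confident channel,
      provided its confidence reaches min_conf; else "undetermined".
    """
    if voice.label == face.label:
        return voice.label
    best = max(voice, face, key=lambda e: e.confidence)
    return best.label if best.confidence >= min_conf else "undetermined"


agree = fuse(ChannelEstimate("happiness", 0.6), ChannelEstimate("happiness", 0.4))
conflict = fuse(ChannelEstimate("anger", 0.3), ChannelEstimate("sadness", 0.8))
weak = fuse(ChannelEstimate("anger", 0.3), ChannelEstimate("sadness", 0.4))
# agree == "happiness", conflict == "sadness", weak == "undetermined"
```

Decision-level fusion keeps each channel's recognizer independent, which suits systems built from third-party components; a thermal channel could in principle be added as a third `ChannelEstimate`.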

Cid et al. (2014) presented Muecas, a multi-sensor humanoid robotic head for human–robot interaction. Muecas uses mechanisms for perceiving and imitating human expressions and emotions. These mechanisms allow direct interaction through different natural modalities: speech, body language, and facial expressions. Muecas can be controlled directly through the Facial Action Coding System (FACS), which the authors describe as "practically the standard for facial expression recognition and synthesis."

The use of behavioral responses, such as speech, body language, and facial expressions, appears to be the most natural way to classify the affective state of the human interlocutor. However, the information about the interlocutor's physiological state that can be derived from his/her behavioral responses is limited or altogether absent. In this perspective, thermal IR imaging provides an extraordinary opportunity to add physiological information to behavioral responses for a better classification of affective states and emotional responses (**Figure 2**).
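One simple way to add physiological information to behavioral responses is feature-level fusion: concatenate behavioral and thermal features and classify the joint vector. The sketch below uses a nearest-centroid classifier, chosen only for brevity; the feature names and the training values are hypothetical toy numbers, not measured thermal signatures. In the toy example the behavioral features alone sit equally far from both class centroids, and the thermal features decide the label.

```python
import math


def fuse_features(behavioral, thermal):
    """Feature-level fusion: concatenate behavioral and thermal features."""
    return list(behavioral) + list(thermal)


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical features: behavioral = (smile intensity, brow lowering),
# thermal = (nose-tip change in °C, forehead change in °C). Toy numbers only.
training = [
    (fuse_features([0.9, 0.1], [0.2, 0.0]), "happiness"),
    (fuse_features([0.8, 0.2], [0.1, 0.1]), "happiness"),
    (fuse_features([0.1, 0.7], [-0.4, 0.1]), "disgust"),
    (fuse_features([0.2, 0.8], [-0.5, 0.0]), "disgust"),
]
centroids = train_centroids(training)

label_a = classify(fuse_features([0.5, 0.45], [-0.45, 0.05]), centroids)  # "disgust"
label_b = classify(fuse_features([0.5, 0.45], [0.15, 0.05]), centroids)   # "happiness"
```

In practice the features would need to be standardized, since temperature changes and expression intensities live on different scales, and a stronger classifier would replace the nearest-centroid rule.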

The above-mentioned studies, together with the capability of thermal IR imaging to provide computational physiology data (Merla and Romani, 2007; Shastri et al., 2009; Merla, 2013), make this technique a powerful tool for studying the psychophysiology of interpersonal relationships and intersubjectivity.

As automatic recording and real-time processing of thermal IR imaging data for psychophysiology in realistic scenarios is possible (Buddharaju et al., 2005; Dowdall et al., 2006; Merla et al., 2011), this technology, in combination with or in addition to other existing technologies, could potentially contribute to endowing AA with the capability of monitoring the psychophysiological state of the human interlocutor. The technology and knowledge needed to achieve this are available and already implemented in patient care and other applications (Merla, 2013).

**FIGURE 2 | Visible and thermal facial imprints of happiness (upper panel) and disgust (lower panel).** Thermal infrared (IR) imaging provides physiological responses in addition to the behavioral ones measured through facial expression. Changes in the temperature distribution associated with the two conditions could help in classifying affective states.

Real-time processing and classification of thermal IR imaging data for psychophysiological applications is feasible, as the computational demand is no larger than that required for 640 × 480 pixel visible-band imaging data (Buddharaju et al., 2005; Dowdall et al., 2006; Merla, 2014).
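The comparison with 640 × 480 visible-band data can be made concrete by counting the raw samples per second a pipeline must ingest, using the sensor figures quoted earlier in this article. The sketch counts samples only: bit depth and the cost of the actual algorithms are ignored, and the 30 Hz RGB visible camera is an assumption for comparison.

```python
def samples_per_second(width, height, fps, channels=1):
    """Raw samples a pipeline must ingest per second (ignores bit depth)."""
    return width * height * fps * channels


uncooled = samples_per_second(640, 480, 30)                 # single-channel thermal
visible_rgb = samples_per_second(640, 480, 30, channels=3)  # assumed 30 Hz RGB camera
cooled_full = samples_per_second(1280, 1024, 150)           # top-end cooled device

print(uncooled, visible_rgb, cooled_full)  # 9216000 27648000 196608000
print(round(cooled_full / uncooled, 1))    # 21.3
```

On this count an uncooled 640 × 480 thermal stream carries one third of the samples of a same-size RGB stream, while a cooled camera running at its full 1280 × 1024, 150 Hz specification produces roughly 21 times more data than the uncooled one. The comparison with visible-band data therefore applies most directly to the uncooled class of devices.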

A major issue to be addressed before thermal IR imaging can actually be used in HMI is how specific the method is in identifying particular emotional states at the individual level. No studies are currently available to answer this important question, which remains a matter for further research. A general limitation derives from the fact that cutaneous thermal activity is intimately linked to autonomic activity. The question therefore becomes: "how specific and descriptive of each emotion are the autonomic responses?" A universally accepted answer is currently not available. Nor are there extensive studies on the fascinating possibility of merging physiological information with automatic recognition of facial expressions to provide an atlas of the thermal signatures of emotions.

To date, no attempts are known to have been made to integrate thermal IR imaging into any available system for robotic recognition of human affective states. This opportunity therefore remains a fascinating but still speculative possibility that needs to be validated with field studies.

## **ACKNOWLEDGMENTS**

This work is supported by the Marie-Curie Initial Training Network "TESIS: Towards an Embodied Science of Inter-Subjectivity" (FP7-PEOPLE-2010-ITN, 264828).

The Department of Neuroscience and Imaging, G. d'Annunzio University, is an associated partner of the TESIS network.

## **REFERENCES**


stress disorder. *Neuroscience* 266, 216–223. doi: 10.1016/j.neuroscience.2014.02.009


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 07 July 2014; published online: 23 July 2014.*

*Citation: Merla A (2014) Thermal expression of intersubjectivity offers new possibilities to human–machine and technologically mediated interactions. Front. Psychol. 5:802. doi: 10.3389/fpsyg.2014.00802*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Merla. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Using minimal human-computer interfaces for studying the interactive development of social awareness

# *Tom Froese<sup>1,2</sup>\*, Hiroyuki Iizuka<sup>3</sup> and Takashi Ikegami<sup>4</sup>*

*<sup>1</sup> Departamento de Ciencias de la Computación, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico*

*<sup>2</sup> Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico*

*<sup>3</sup> Laboratory of Autonomous Systems Engineering, Graduate School of Information Science and Technology, Hokkaido University, Hokkaido, Japan*

*<sup>4</sup> Ikegami Laboratory, Department of General Systems Studies, Graduate School of Arts and Sciences, University of Tokyo, Tokyo, Japan*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Miguel Aguilera, Universidad de Zaragoza, Spain*

*Julien Laroche, Akoustic Arts, USA*

#### *\*Correspondence:*

*Tom Froese, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Apartado Postal 20-126, Col. San Ángel, DF 01000, Mexico e-mail: t.froese@gmail.com*

According to the enactive approach to cognitive science, perception is essentially a skillful engagement with the world. Learning how to engage via a human-computer interface (HCI) can therefore be taken as an instance of developing a new mode of experiencing. Similarly, social perception is theorized to be primarily constituted by skillful engagement between people, which implies that it is possible to investigate the origins and development of social awareness using multi-user HCIs. We analyzed the trial-by-trial objective and subjective changes in sociality that took place during a perceptual crossing experiment in which embodied interaction between pairs of adults was mediated over a minimalist haptic HCI. Since that study required participants to implicitly relearn how to mutually engage so as to perceive each other's presence, we hypothesized that there would be indications that the initial developmental stages of social awareness were recapitulated. Preliminary results reveal that, despite the lack of explicit feedback about task performance, there was a trend for the clarity of social awareness to increase over time. We discuss the methodological challenges involved in evaluating whether this trend was characterized by distinct developmental stages of objective behavior and subjective experience.

**Keywords: social cognition, joint action, social interaction, intersubjectivity, second-person perspective, consciousness, developmental psychology**

# **INTRODUCTION**

Theories about the primacy of embodied interaction over detached social cognition have grown in popularity. Examples include interaction theory and the narrative practice hypothesis (Gallagher and Hutto, 2008), the concepts of participatory sense-making (De Jaegher and Di Paolo, 2007) and self-other co-determination (Thompson, 2001), the formal methods of interpersonal synergies (Riley et al., 2011) and social coordination dynamics (Oullier and Kelso, 2009), and the second-person approach to neuroscience (Schilbach et al., 2013). Closely related to this emphasis on embodiment and social interaction is the hypothesis of direct perception of other minds (Gallagher, 2008a; Krueger, 2012; Stout, 2012), which holds that perceptual social experience normally takes precedence over, and provides the concrete basis for, reflective social cognition such as simulating and theorizing. These theories thereby doubly break with psychology's traditional emphasis on an individual's thinking as the essential basis of social awareness (e.g., Wegner and Giuliano, 1982).

These theories, which accord primacy to social perceptual interaction in adult social cognition, are naturally complemented by theories that accord primacy to this social perceptual interaction in the development of social cognition in infancy (Gallagher, 2008b). For example, preverbal infants' understanding of other minds is argued to originate and develop within mutual engagement (Reddy and Morris, 2004), second-person interaction (Fuchs, 2013), and primary intersubjectivity (Trevarthen, 1979). Within this context of co-regulated activity an infant's intentions can emerge and be realized in joint action (Fogel, 1993). Moreover, this kind of embodied interaction is not conceived as a purely unconscious phenomenon, since an infant's movement always already implies a certain form of animation and affectivity (Sheets-Johnstone, 1999). Rather, embodied interaction is seen as going hand in hand with the development of what has been called dyadic states of consciousness (Tronick, 2004) and self-other conscious affect (Reddy, 2003). A similar emphasis on the developmental precedence of communal embodied coupling before self-other differentiation can be found in the phenomenological psychology of Merleau-Ponty (1964). By extending his account to prenatal development, it can even be argued that the maternal body and fetal body are already situated in an embodied interaction that is affectively structured through the negotiated movements themselves (Lymer, 2011).

The primacy of embodied-social-perceptual interaction is therefore supported by a variety of empirical and theoretical traditions that are progressively being integrated into a cohesive research program (Froese and Gallagher, 2012). However, while this emerging framework is compelling for many, from the mainstream perspective it still needs to further prove its worth compared to the traditional framework by making unique predictions that are experimentally verified.

Following the enactive approach, we have recently provided evidence for the interactive constitution of intersubjective awareness in pairs of adults using a minimal haptic human-computer interface (HCI) (Froese et al., 2014), namely an experimental setup which is known as the "perceptual crossing paradigm" (Auvray and Rohde, 2012). Subsequently, given the close theoretical link between interactive approaches in cognitive science and developmental psychology, we hypothesized that the same kind of setup could also provide insights into the early development of intersubjective awareness. Promisingly, related research with pairs of interacting adults has shown that it is indeed possible to study the development of new communication systems (Galantucci and Garrod, 2011), including on the basis of purely embodied interactions (Iizuka et al., 2013). We therefore wanted to determine whether preliminary evidence supporting this hypothesis could be found by extending the analysis of our original experiment to include diachronic aspects of the interaction process. Given a trial-by-trial analysis, would we find indications of a sequence of developmental stages of social awareness, such as those proposed by interaction-oriented developmental psychologists? We derived the specific form of our hypothesis on the basis of the following considerations.

It has been argued that the phenomenal quality of perception is largely constituted by the specific dynamical form of its underlying sensorimotor skill, rather than just by a dedicated biological organ and/or neural system (e.g., O'Regan and Noë, 2001; Noë, 2004; Mcgann, 2010). It follows that if perceptual experience is indeed constituted by skillful sensorimotor interaction, then incorporating some form of *mediation* into that embodied interaction will result in a corresponding *modulation* of that experience. Learning how to practically engage the world via new tools, HCIs, and other mediating systems<sup>1</sup> is associated with the emergence of new ways of being in and experiencing the world; that is, technology is conceived as anthropologically constitutive (Havelange, 2010). Some modulations are relatively subtle changes in perceptual experience (e.g., Davoli et al., 2012), while other phenomenological changes, such as those induced by mastering sensory substitution systems, can be more profound (Lenay et al., 2003; Auvray and Myin, 2009). As we have observed in our research with various kinds of HCIs, the fact that skillful usage of an HCI must first be learned provides us with an opportunity to systematically investigate the development of new modes of perceptual experiencing (Froese et al., 2012). An added methodological bonus is that this development can happen long after infancy, i.e., at a time when adult participants' standard perceptual modalities have normally long since formed. This idea that learning can recapitulate ontogeny is supported by a tradition in psychology centered on the "microgenetic" method, which has also observed that older individuals sometimes regress to the strategies and developmental trajectories of younger individuals when they are learning an unfamiliar task (Miller and Coyle, 1999).

We were therefore led to the following hypothesis: if we accept the enactive theory that social experience is constituted by skillful interactions with others (Mcgann and De Jaegher, 2009; Froese and Di Paolo, 2011), and our proposal that learning how to co-regulate a mutual interaction by means of an unfamiliar HCI is tantamount to re-acquiring such a social skill, then our original perceptual crossing study should have provided the conditions for the recapitulation of typical infants' developmental stages of social awareness during repeated embodied interaction by pairs of adults. In order to determine whether this hypothesis is worthy of further systematic consideration, we re-analyzed the objective and subjective data from our original study in a diachronic manner. Given the post hoc nature of this analysis, the results are only preliminary. And, while they already look promising in some respects, they also serve to highlight areas where more methodological fine-tuning is still needed.

# **THEORY AND METHODS**

Using technological interfaces has been indispensable for providing support for an interactive approach to social development. For example, evidence for Trevarthen's (1979) notion of primary intersubjectivity has been obtained on the basis of his double TV monitor paradigm, which allowed the insertion of recorded video footage into a live face-to-face interaction (Murray and Trevarthen, 1985). Using this kind of setup, it has repeatedly been demonstrated that infants are sensitive to the co-regulation of social interaction (Nadel et al., 1999). Although cognitivist interpretations of these findings are possible, agent-based modeling of Trevarthen's experimental paradigm has contributed to the formalization of this sensitivity in terms of dynamical systems theory (Di Paolo et al., 2008). And while such modeling can lend formal support to a phenomenological analysis of the structures of intersubjectivity (Froese and Fuchs, 2012), what we are still lacking is an experimental paradigm that allows researchers to systematically investigate the development of social awareness as it is experienced from the first-person (or second-person) perspective.

Indeed, the scientific study of the development of social awareness is confronted by serious methodological challenges. Only in recent decades has there been a growing appreciation of infant consciousness (Trevarthen and Reddy, 2007), and infants' social experience has been investigated from the second-person perspective, that is, based on the concrete experiences of developmental psychologists who frequently interact with infants (Reddy, 2003; Reddy and Morris, 2004; Tronick, 2004). Clearly, theories about infant phenomenology devised through such engagement are valuable, but it would still be desirable to verify them from the infant's perspective. However, in the absence of verbal skills it is difficult if not impossible to apply the usual first- and second-person methods used in the science of consciousness (e.g., Froese et al., 2011). And while adult investigations of the phenomenology of intersubjectivity provide detailed insights into how we experience others (Ratcliffe, 2007), adults take social awareness for granted and can no longer remember how it originally developed<sup>2</sup>.

<sup>1</sup>In the category of mediating systems we may also include cultural modulators of experience such as language (Bottineau, 2010), norms (Merritt, 2014), and institutions (Gallagher, 2013).

<sup>2</sup>At least this is the case for people who do not suffer from a mental disorder or some other unusual condition. For example, people with schizophrenia or with an autism-spectrum disorder tend to lack the capacity for direct social perception (Froese et al., 2013). Yet if our hypothesis were correct, it would suggest the intriguing possibility that this perceptual capacity could be partially recovered by engaging in some form of embodied practice that enhances skillful engagement with others, which is consistent with the aims of movement therapies (Fuchs and Koch, 2014).

**FIGURE 1 |** … interface consists of two parts: a trackball mouse that controls the linear displacement of their virtual avatar, and a hand-held haptic feedback device that vibrates at constant frequency for as long as the avatar overlaps with another virtual object and remains off otherwise. Three small lights on each desk signal the start, halftime (30 s), and completion of each 1-min trial. Figure adapted from Froese et al. (2014).

**FIGURE 2 |** … a body object. Unbeknownst to the players a "shadow" object is attached to …

To overcome this problem we took advantage of the perceptual crossing paradigm in psychology (Auvray et al., 2009), which has enabled researchers to systematically investigate the real-time self-organizing dynamics of dyadic interaction by mediating the embodied interactions of pairs of adults over a minimal HCI (**Figure 1**). Participants are embodied as avatars in a 1D virtual environment (**Figure 2**). They can move their avatar left and right, and they receive haptic feedback in the form of a constant vibration to their hand for as long as their avatar overlaps with any other virtual object (otherwise the feedback remains off). Each participant can encounter three objects: their partner's avatar; an exact copy of the partner's avatar that moves at a constant distance from it (which we call the "shadow" object); and a simple static object (one for each player, at distinct locations). All objects have the same size and provide the same haptic feedback; they can only be distinguished by means of their differing affordances for interaction.
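The logic of this setup can be sketched in a few lines of Python. Feedback is a single on/off signal that is identical for all three objects, so which object is being touched is not available to the player. The space length, object width, shadow offset, and the circular (wrap-around) topology are illustrative assumptions, not parameters reported here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative assumptions, not the parameters of the original experiments.
SPACE = 600          # length of the 1D space, treated here as a ring
WIDTH = 4            # every object has the same width
SHADOW_OFFSET = 150  # fixed distance between an avatar and its shadow


def overlaps(a, b):
    """True if two objects of width WIDTH centered at a and b overlap on the ring."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d) < WIDTH


@dataclass
class Player:
    position: float       # center of this player's avatar
    static_object: float  # this player's own static object

    @property
    def shadow(self):
        """Copy of this avatar, encountered only by the partner."""
        return (self.position + SHADOW_OFFSET) % SPACE


def feedback(me, partner):
    """Return (vibrating, cause) for player `me`.

    The player only ever feels the on/off vibration; `cause` is a label for
    the experimenter's log and is not available to the participant.
    """
    for cause, pos in (("partner", partner.position),
                       ("shadow", partner.shadow),
                       ("static", me.static_object)):
        if overlaps(me.position, pos):
            return True, cause
    return False, None


b = Player(position=102, static_object=450)
print(feedback(Player(100, 300), b))  # (True, 'partner')
print(feedback(Player(252, 300), b))  # (True, 'shadow'): b's shadow is at 252
print(feedback(Player(299, 300), b))  # (True, 'static')
print(feedback(Player(500, 300), b))  # (False, None)
```

Because `feedback` is symmetric for the two avatars, mutual contact switches on both players' vibration at once, whereas touching a shadow or a static object produces feedback for one player only; this asymmetry is what makes the partner's avatar distinguishable through interaction alone.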

Participants are instructed to click to signal to the experimenters when they have recognized that the object with which they have been interacting is their partner's avatar. Participants cannot directly perceive each other's clicks, and no feedback is provided until after the experiment. In other words, in order to establish an embodied communication system they must learn to distinguish sensations generated by their own actions from those generated by the movement of external objects (the problem of separating self from non-self), and to distinguish external movements that express a communicative intention from those that do not. The latter challenge involves not only finding a responsive object as such (the problem of detecting social contingency), but also learning to differentiate between movements made to change location and movements made with specific communicative intent (the problem of signaling signalhood). And since there is no external feedback of any kind, learning can only be guided by impressions obtained via the interactions themselves. It is a formidable task indeed.

**FIGURE 2 (continued) |** … engagement. Figure adapted from Froese et al. (2014).

Methodologically, this kind of experimental approach shares notable similarities with the microgenetic method of developmental psychology (Siegler and Crowley, 1991). According to Rosenthal (2004), the latter draws on a long tradition with two key methodological aims, namely "to provide the means of externalizing the course of brief perceptual, or other cognitive processes by artificially eliciting 'primitive' (i.e., developmentally early) responses that are normally occulted by the final experience" and "to construct small-scale, *living models* of large-scale developmental processes in such a way as to 'miniaturize' (i.e., accelerate and/or telescope) the course of a given process and bring it under experimental control" (p. 221). In the perceptual crossing paradigm, the choice of asking participants to interact via a novel, minimalist HCI is motivated by the same aims of externalizing and recapitulating the processes underlying the constitution of otherwise already formed perceptual experiences, thereby making these processes available for scientific investigation (for another example of this approach, see Lenay and Steiner, 2010). Although it could be argued that these methods confuse development with learning, the distinction between these processes of individual change is not that clear-cut. In addition, the hypothesis that processes of change occurring on different time scales share important commonalities has long been a useful working hypothesis in developmental psychology<sup>3</sup>.

In the original experiment by Auvray et al. (2009), as well as in several subsequent variations (for a review, see Auvray and Rohde, 2012), it was found that differences in the relative stability of interactions ensured that participants managed to locate each other while avoiding the shadow and static objects. Interaction with the static object is too stable and predictable to be human, whereas the shadow object is too unstable since it moves but does not respond; only the other's avatar can respond to contact by reacting and sticking around. Yet this interactive self-organization of a situation of mutual tactile interaction apparently did not generally coincide with the emergence of an individual awareness of the actual presence of the other participant. While participants signaled recognition more frequently during mutual interaction, thereby objectively solving the task, this could also have simply been a statistical consequence of the fact that they spent more time in mutual interaction. Importantly, Auvray et al.'s statistical analysis revealed that the probability of clicking was not significantly higher after making contact with the other player when compared with its unresponsive shadow copy. Although it is possible that participants were genuinely aware of having engaged with their responsive partner in some cases, this could not be shown with the data. The results therefore fell short of conclusively demonstrating an interactive constitution of social cognition, where social cognition is conceived as resulting in a personal-level insight (Michael and Overgaard, 2012).
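The key comparison here, whether clicks are more probable after contact with the partner than after contact with its shadow, can be sketched as a conditional proportion. Auvray et al.'s (2009) own statistical procedure is not reproduced; the 2 s window, the function name, and the toy event log are assumptions, and contact labels are taken to come from the experimenter's log rather than from the participant.

```python
def click_probability(contacts, clicks, cause, window=2.0):
    """Fraction of contacts with `cause` that are followed by a click within
    `window` seconds.

    contacts: list of (time_s, cause), cause in {"partner", "shadow", "static"}
    clicks:   list of click times in seconds
    Returns None if there were no contacts of that kind.
    """
    times = [t for t, c in contacts if c == cause]
    if not times:
        return None
    hits = sum(any(0.0 <= k - t <= window for k in clicks) for t in times)
    return hits / len(times)


# Toy event log pooled over trials (not data from the cited studies).
contacts = [(1.0, "shadow"), (5.0, "partner"), (9.0, "shadow"),
            (12.0, "partner"), (20.0, "partner"), (30.0, "shadow")]
clicks = [6.0, 21.5, 31.0]

p_partner = click_probability(contacts, clicks, "partner")  # 2/3
p_shadow = click_probability(contacts, clicks, "shadow")    # 1/3
```

A difference between the two proportions is only suggestive on its own: as the text notes, more time spent in mutual interaction can inflate click counts, so an actual analysis would compare these probabilities per participant and test whether the difference is significant.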

On the basis of agent-based models and theoretical considerations, Froese and Di Paolo (2011) hypothesized that this lack of personal recognition of the other was to be expected given that the experimental task was not genuinely social, at least if the mark of the social is conceived specifically as the co-regulation of mutual interaction. Through their coupled behavior the pairs of participants in these studies were forming a multi-agent system of sorts, but without any additional incentive to engage in co-regulated joint action there was little opportunity for social experience, and with it individual recognition of the other's presence, to emerge consistently. The original task of clicking whenever encountering the other also allowed purely individualistic strategies to succeed. For example, a player could simply wait until an object repeatedly made contact, which indicates that it must be the other because she is sensitive to one's presence as an object in the virtual space, and then click. However, from the perspective of the searching other, this kind of unresponsive strategy makes it impossible to distinguish the partner as such. For a genuinely social, that is, shared situation to emerge, there has to be mutual engagement.

Froese et al. (2014) tested this hypothesis by running a perceptual crossing experiment in which participants formed teams in a tournament game and were explicitly instructed to help each other with the task of locating each other. In this study 17 pairs of adults completed a sequence of 15 one-minute trials. For each trial they were asked to click once (and once only) as soon as they became aware of the other player's presence. After each trial in which a participant had clicked they were asked to rate the clarity of their experience of their partner on a Perceptual Awareness Scale (PAS) that was adapted for this purpose from Ramsøy and Overgaard (2004), and to give a short free-text description of that experience and their strategy. Specifically, players were asked to give a PAS rating between 1 and 4: "Please select a category to describe how clearly you experienced your partner at the time you clicked: (1) No experience, (2) Vague impression, (3) Almost clear experience, (4) Clear experience." The hypothesis was confirmed: clicks were significantly more probable after contact with the other, most trials led to accurate identification of each other, and such joint success was correlated with high ratings of clarity of the other's presence.

Although that study was not designed to investigate the development of social awareness specifically, our interest in conducting such a diachronic analysis of the results was provoked by some of the first-person reports provided by the participants. As we expected, there were many reports describing forms of joint attention and joint action, for example turn-taking and imitation. Surprisingly, however, there were also quite a few individual-centered reports in which participants described their experience of the other's presence in terms of the other's actions toward themselves. This is a specific kind of second-person awareness that is familiar from the developmental psychology literature. Reddy (2003) has argued that social awareness in the first couple of months of an infant's life primarily consists in being the object of the other's attention, while more advanced forms of mutual attention, including joint attention on aspects of the social interaction itself, develop in subsequent months. In retrospect this finding of a possible recapitulation of the development of social awareness is not that surprising; it follows quite naturally from enactive theories of perception and social interaction, as we argued in the introduction.

## **DIACHRONIC ANALYSIS AND RESULTS**

In the following we present a diachronic analysis of the perceptual crossing study first described in Froese et al. (2014). First, we were interested in determining whether there was an effect of implicit learning in terms of changes in the objective results. Although participants were not given any information about the success of their clicks during the experiment, there might still have been tendencies toward improvement over the sequence of trials. Second, we wanted to see whether we could find any qualitative transformation of user experience over trials, both in terms of PAS ratings and in the brief first-person reports written by participants.

<sup>3</sup>Some developmental psychologists have even argued that, just as ontogeny was supposed to recapitulate phylogeny (i.e., Haeckel's biogenetic law), there are important parallels between cognitive development and the history of science (for a critical discussion of Piaget's theory, see Franco and Colinvaux-De-Dominguez, 1992). Here we restrict ourselves to comparing two processes of change that can take place within the time scale of one individual's lifetime.

## **EVIDENCE FOR IMPLICIT LEARNING**

Right at the beginning of the diachronic analysis we noticed a potential confounding factor in the way we had designed the original study. Although we had randomized the starting configurations of the 15 trials, we had neglected to randomize these configurations across teams, so every team encountered the same sequence. This makes no difference if we are interested in aggregate performance only (as in the original study). However, when analyzing performance over trials, there is a possibility that trends in the results were influenced by accidental trends in the starting positions, in particular players' initial distance to each other and to their static objects. Although such influence cannot be ruled out in principle, we ran several tests and did not find any compelling dependency on starting positions (for details see Supplementary Information, Section S1). It is likely that the 60 s available during each trial were sufficient for starting positions to have little influence on the final outcome.

As a first step toward detecting the effects of implicit learning, we can consider how the frequencies of clicks on each object type change over the 15 trials (**Figure 3**). During the first half of trials there is a consistent tendency toward an increasing number of clicks on the other's avatar. This upward trend generally continues during the second half of trials, three of which yield the three highest numbers of avatar clicks (trials 10, 11, and 13). But there is also a notable lack of consistency: all of the other later trials resulted in notably fewer avatar clicks, although never fewer than in the first couple of trials. Interestingly, in most cases these later reduced successes cannot be explained by corresponding increases in wrong clicks. Rather, it is the total number of clicks that temporarily decreases (see especially trials 8 and 12). In other words, these later fluctuations seem to be partially the result of more conservative choices: the players seem to have implicitly learned how to identify their partner, but perhaps the opportunity to do so did not present itself clearly enough in those trials to warrant a click.<sup>4</sup> Even so, it remains an open question why these later moments of increased uncertainty consistently arose across the 17 teams.

**FIGURE 3 |** … possible clicks per trial number is 34 (2 players × 17 teams). The linear trend line refers to avatar clicks only. For details of how the different virtual objects were determined to be the target of a click, see the methods section in Froese et al. (2014).
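The caption of Figure 3 mentions a linear trend line fitted to avatar clicks. As a minimal sketch, assuming this is an ordinary least-squares fit of per-trial avatar-click counts against trial number (the exact fitting procedure is not stated here), the slope of such a trend can be computed as follows. The function name `ols_trend` and the example counts are hypothetical, not data from the study.

```python
from statistics import mean


def ols_trend(counts):
    """Least-squares slope and intercept of per-trial counts against trial numbers 1..n."""
    trials = range(1, len(counts) + 1)
    mx, my = mean(trials), mean(counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(trials, counts))
    sxx = sum((x - mx) ** 2 for x in trials)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical avatar-click counts for trials 1-15 (each at most 34 = 2 players x 17 teams).
example_counts = [8, 10, 12, 13, 15, 16, 18, 14, 17, 22, 23, 15, 24, 18, 19]
slope, intercept = ols_trend(example_counts)
print(f"slope = {slope:.2f} avatar clicks per trial, intercept = {intercept:.2f}")
```

A positive slope only summarizes the average direction of change; as the text notes, it hides the non-monotonic dips in particular trials.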

Stewart (2010) has noted that when we are talking with others we tend to give them the benefit of the doubt that any uncertainties we may have about what they meant to say will be resolved as our interaction proceeds. After a while we may become active participants in this process of resolution by asking: "what did you mean when you said that. . . ?" The temporary decreases in avatar clicks may thus reflect attempts to gain more certainty by renegotiating the interaction process. Cuffari (2014) has argued that jointly overcoming breakdowns of sense-making is intrinsic to the emergence of shared meaning. On the basis of the first-person reports we can see that something similar is going on here in some cases, as forms of co-regulation emerge, stabilize, become questioned, and dissolve again.

For example, in one session (experiment 18) two players were trying to co-create a shared signal. After trial 3, player "b" wrote: "I collided with a moving object but the first and second periods of the appeal were different so I recognized it was the simple moving object and searched again" (E18T3Pb).<sup>5</sup> Eventually the players reached an agreement about the shape of their signal, which is why the same player wrote self-assuredly after trial 9: "Receiving and sending. Do either role alternately" (E18T9Pb). However, later on doubts began to creep in about whether a meaningful connection had actually been established. After trial 14 the player explained: "Appeal and wait. But the object that I touched generates clear three-times-signal with constant period and it happens twice. So I did not click because I felt it was so mechanical" (E18T14Pb).

It is interesting to note the shifting conditions of communication: the same player who earlier rejected an interaction because the repeated appeal was too "different" later ends up rejecting an interaction because the already established appeal was repeated "twice." Of course, the other player noticed that the signal failed to elicit the desired response: "I could not get the good response. I felt that the partner ran away during the trial" (E18T14Pa), and was left wondering about the reasons for this breakdown: "I felt that there was an interruption while communicating. It might be because a very fast object passed or I made a mistake" (E18T14Pa). Although it could be debated whether we can trust subjects to report accurately about their experience and about what is objectively going on (Jack and Roepstorff, 2003), here we decided to give participants the benefit of the doubt. There is no reason to assume that their reports are systematically misleading; see, e.g., **Figure 4**.

<sup>4</sup>There are two reasons for the increased conservatism of clicks compared with the study by Auvray et al. (2009). First, players were allowed to click at most once per trial rather than as many times as they wished. Second, the experiment was run as a team-based tournament game, and a wrong click cost the team a point (a correct click gained the pair of participants a point, whereas no click left the score unchanged).

<sup>5</sup>This code uniquely identifies the first-person report. In this case it was written during (E)xperiment 18, after (T)rial 3, by (P)layer 'b'. Since the original perceptual crossing experiment was conducted at the University of Tokyo, many first-person reports were originally written in Japanese, including those of experiment 18 discussed here. They were translated into English by HI. The experiment numbers go up to 18, even though there were only 17 teams, because the numbering includes a test experiment (E4) between TF and TI that was removed from the analysis.
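The report codes described in footnote 5 follow a regular pattern, so they could be parsed mechanically when tabulating reports by experiment, trial, and player. The following is a hypothetical helper (the names `ReportId` and `parse_report_id` are ours, not part of the study's materials), assuming every code takes the form E‹experiment›T‹trial›P‹player› with player 'a' or 'b'.

```python
import re
from typing import NamedTuple


class ReportId(NamedTuple):
    experiment: int
    trial: int
    player: str


# Assumed pattern: E<experiment>T<trial>P<player>, e.g. "E18T14Pa".
_REPORT_RE = re.compile(r"E(\d+)T(\d+)P([ab])")


def parse_report_id(code: str) -> ReportId:
    """Split a first-person report code into experiment, trial, and player."""
    match = _REPORT_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a report code: {code!r}")
    experiment, trial, player = match.groups()
    return ReportId(int(experiment), int(trial), player)


print(parse_report_id("E18T3Pb"))
# ReportId(experiment=18, trial=3, player='b')
```

Reports from the excluded test experiment (E4) could then be filtered out by checking the `experiment` field.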

We discussed this example at length because it shows the complexity of the development of human communication. We should not expect to find linear or even smooth developmental progress, since we are not dealing with a form of machine learning such as a hill-climbing algorithm. If there had been further trials, this pair might have resolved their crisis and established another communication system with renewed, and perhaps even increased, confidence. For example, they could have meaningfully incorporated the repetition of the "three-times-signal" or dropped it altogether. There is no reason to expect interactive alignment or convergence of behaviors, because progress in a coordinated dialogue requires differentiation of interlocutors' turns (Mills, 2014). Too much repetitive imitation may be interpreted as a failure to communicate, as we saw in this example. Both the means and goals of social interaction change over time, and these dialogical changes can go beyond the intentions of the individuals (Fusaroli et al., 2014). Relatedly, two common findings of microgenetic studies of learning, which are consistent with our analysis, are the halting and uneven use of newly acquired competencies and, more surprisingly, that changes in strategy are often initiated following successes rather than just failures (Siegler and Crowley, 1991).

Another way of measuring implicit learning is to evaluate whether the amount of co-regulated activity changed over trials. For example, clicking success may come from a lucky guess, it may be the result of an individualistic strategy such as waiting for the "prey" to trigger the sensor without moving oneself, or it may be the outcome of reciprocal interaction and joint action. While it is difficult to differentiate objectively between these possibilities, a useful heuristic is at least to distinguish trials in which both players clicked successfully ("Joint Success") from trials where only one of the players clicked successfully ("Single Success"). Both of these cases can be contrasted with clicks that were simply wrong ("Wrong Click").<sup>6</sup> **Figure 5** shows how the number of each of these three categories changed over the sequence of 15 trials. It reveals a tendency for trials with jointly correct clicks to increase in frequency.

This tendency toward more Joint Success could be a sign that the players were able to develop better ways of mutually identifying each other. However, it could arguably be contingent simply on an increase in successful individualistic strategies, because more individual successes would independently add up to more cases in which both players click successfully, even if they did not directly facilitate each other. Yet while this could explain some cases of Joint Success, it is unlikely to be the whole story, because a strategy of trying to detect the other without actively making oneself detectable to the other is less likely to lead to Joint Success.

<sup>6</sup>Note that the number of Wrong Clicks is not identical to the number of Wrong Click trials, because there were 1, 1, 1, and 2 trials with jointly wrong clicks in trial numbers 5, 9, 14, and 15, respectively.

**FIGURE 4 | Virtual trajectories over 60 s of three representative trials.** Player a (Pa) is shown in blue, while player b (Pb) is green (see **Figure 2**). Solid and dashed lines represent positions of avatar and shadow objects, respectively. Light blue and light green solid lines show the positions of the static objects detectable by Pa and Pb, respectively. The bottom of each plot shows the haptic feedback (on/off) received by each player. **(A)** In trial 3 players find each other quickly, but Pb can be seen to break off their interaction. At no point is Pb interacting with the shadow object (a "simple moving object"), but the unexpected irregularity of responses he describes could be attributed to interference caused by Pa's static object. **(B)** Trial 9 begins with some difficulties as Pb briefly interacts with Pa's shadow object and Pa becomes distracted by his static object. Eventually they find each other and start "receiving and sending" tactile stimuli while adopting either role alternately. Note that their exchanged activity consists of varying frequencies and durations. **(C)** In trial 14 we see two periods of turn-taking activity. In both cases Pa keeps sending a slow and regular "three-times-signal" while Pb's activity is faster and more irregular. Both times Pb abruptly departs from the interaction after a few exchanges, thus explaining why Pa is left feeling that "the partner ran away during the trial."

**FIGURE 5 |** … number of trials where one player clicked correctly while the other player clicked wrongly or not at all. "Wrong Click" shows the number of wrong clicks.
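The three outcome categories can be made concrete with a small tally. This is a sketch under our own assumptions: each player's outcome in a trial is encoded as `"correct"`, `"wrong"`, or `None` (no click), and, following footnote 6, Wrong Clicks are counted per click rather than per trial, so a jointly wrong trial contributes two. The encoding and the function names `tally_trial` and `tally_trials` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Outcome = Optional[str]  # "correct", "wrong", or None for no click


def tally_trial(pa: Outcome, pb: Outcome) -> Counter:
    """Categorize one team's trial and count its wrong clicks."""
    counts = Counter()
    n_correct = [pa, pb].count("correct")
    if n_correct == 2:
        counts["Joint Success"] += 1
    elif n_correct == 1:
        # The other player clicked wrongly or not at all.
        counts["Single Success"] += 1
    # Wrong clicks are counted per click, so a jointly wrong trial adds 2.
    counts["Wrong Click"] += [pa, pb].count("wrong")
    return counts


def tally_trials(trials: Sequence[Tuple[Outcome, Outcome]]) -> Counter:
    """Sum the categories over several trials (e.g., one trial number across teams)."""
    total = Counter()
    for pa, pb in trials:
        total += tally_trial(pa, pb)
    return total


# Hypothetical outcomes of five teams in one trial.
print(tally_trials([("correct", "correct"), ("correct", None),
                    ("wrong", "wrong"), (None, None), ("correct", "wrong")]))
```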

An indication of the co-dependence of correct clicks can be gained by analyzing their temporal relationship within a trial. At first sight the delays between jointly successful clicks support a more interactive interpretation of the results. In most trials where both players correctly clicked on the other, they did so within seconds of one another (23% co-occurred within 3 s), which suggests that we are dealing with cases of mutual attention that led to mutual clicking (see Figure 4 in Froese et al., 2014). Yet when we look at the distribution of clicking delays over the sequence of trials (Figure S5), the picture becomes more complex: the increase in the number of Joint Success trials is largely due to an increase in Joint Success trials with mutual clicking delays longer than 10 s. Presumably this is because participants developed the capacity for more sustained interactions, which removed the need to click as soon as they detected the other's presence. The interaction process may also have become an interesting end in itself, rather than just a secondary means of solving the clicking task. Admittedly, these interpretations are difficult to verify objectively.
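The delay analysis rests on a simple quantity: the absolute time between the two correct clicks of a Joint Success trial, binned by the 3 s and 10 s thresholds mentioned above. A minimal sketch, assuming click times are given in seconds from trial onset (the function names and example times are hypothetical):

```python
def clicking_delays(click_pairs):
    """Absolute delay (s) between the two correct clicks of each Joint Success trial."""
    return [abs(t_a - t_b) for t_a, t_b in click_pairs]


def delay_profile(delays, short=3.0, long=10.0):
    """Fractions of delays that are at most `short` seconds and longer than `long` seconds."""
    if not delays:
        raise ValueError("no Joint Success trials")
    n = len(delays)
    within_short = sum(d <= short for d in delays) / n
    beyond_long = sum(d > long for d in delays) / n
    return within_short, beyond_long


# Hypothetical click times (s) of Pa and Pb in four Joint Success trials.
pairs = [(10.0, 11.5), (20.0, 35.0), (5.0, 7.0), (40.0, 52.0)]
print(delay_profile(clicking_delays(pairs)))  # (0.5, 0.5)
```

Computing this profile separately for each trial number would show whether long-delay Joint Successes become more common over the session.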

As a first step toward a personal-level explanation for the tendency toward increasing joint clicking delays, we can evaluate whether there are corresponding qualitative changes in the participants' experience. As shown in **Figure 6**, there does indeed seem to be a correlated change in the reported clarity with which the other's presence is perceived. While low to medium levels of clarity predominate during the first few trials, the number of reports of maximum clarity increases until these reports come to predominate. Given that clicks in Single Success trials are most frequently associated with low to medium levels of clarity (see Figure 5 in Froese et al., 2014), this suggests that we are dealing with a qualitative change in the kind of mutual interaction that players engage in. Their engagements develop not only to be longer, as suggested by the increase in Joint Success clicking delays, but also to be more clearly social.

We expected that the nature of this qualitative shift in interaction had something to do with the emergence of more structured forms of co-regulated interaction, especially cases of turn-taking and mutual imitation. However, applying the objective measure of turn-taking described in our original study (see Supplementary Information Section S3 for details), which we had used to demonstrate that clearer experiences of the other player are preceded by more pronounced turn-taking interaction, did not reveal a marked upward trend across trials, at least not when the turn-taking measure is averaged across all 17 teams (Figure S6). It may be that this measure is too crude to detect an increase in the co-regulation of interaction. It is also possible that there are no general trends in turn-taking across teams; pairwise developments of mutual interaction may be too idiosyncratic for such averaging to be meaningful.

The second possibility is supported by a comparison of developments in each team's clicking performance, which reveals that there are indeed different clusters of expertise (**Figure 7**). Future research may therefore be better served by focusing diachronic analyses on selected teams. For instance, if we examine the changes in turn-taking performance of the best team alone, we do find a notable upward trend over time, which remains consistent at least for one of the players (**Figure 8**). This is not the only case with such an upward trend but, as already indicated by Figure S6, it cannot be generalized. Many teams show no discernible trend, and there is even an example of a downward trend. Moreover, even this exemplary best case shows that the regular turn-taking interactions that had slowly been established during the first half of trials lose some of their regularity during the last 5 trials.

We note that this kind of transition is consistent with the findings of research on dialogic joint activity: "since one of the hallmarks of coordinated dialogue is its progressivity, the development of procedural coordination necessarily involves the *differentiation* of interlocutors' turns as coordination increases" (Mills, 2014, pp. 161–162). This increasing differentiation may also help to account for the facts that players click more conservatively during the second half of trials, and that they click in a less synchronized manner. As players implicitly learn how to co-regulate their interaction, simple interactive synchrony changes into more complex interpersonal synergy (Fusaroli et al., 2014).

**FIGURE 7 | Changes in team performances over 15 trials.** In each trial a player can make a correct click (+1 point), a wrong click (−1 point), or no click (no change). The final maximum possible team score is 30 (15 trials × 2 correct clicks).
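The scoring rule stated in the Figure 7 caption (+1 for a correct click, −1 for a wrong click, 0 for no click) determines each team's running total. A minimal sketch, assuming each trial is recorded as a pair of player outcomes encoded as `"correct"`, `"wrong"`, or `None`; the encoding and the function name `cumulative_team_score` are hypothetical.

```python
from itertools import accumulate

# Points per player and trial, following the rule in the Figure 7 caption.
POINTS = {"correct": 1, "wrong": -1, None: 0}


def cumulative_team_score(trials):
    """Running team score after each trial; `trials` holds (Pa, Pb) outcomes."""
    per_trial = [POINTS[pa] + POINTS[pb] for pa, pb in trials]
    return list(accumulate(per_trial))


# With 15 trials and two correct clicks in each, the maximum final score is 30.
print(cumulative_team_score([("correct", "correct")] * 15)[-1])  # 30
```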

**FIGURE 8 |** … correct clicks by Pa and Pb, respectively. This team (E14) managed to score 27 points (see top line in **Figure 7**).

## **EVIDENCE FOR DEVELOPMENTAL STAGES OF SOCIAL AWARENESS**

Following Reddy's (2003) work in developmental psychology, we hypothesized that participants' social awareness emerges in situations of mutual attention, in which one's awareness of the other's presence is first framed in terms of the other's attention to one's self in general, followed by mutual attention to what one's self specifically does. We did not consider the further progression to triadic joint attention.

An initial evaluation of participants' first-person reports suggested that there could be two distinct forms of awareness of being the object of the other's attention, depending on whether this awareness is mutually shared or not. In some cases people described awareness of being the object of the other's attention, but without thematizing the other's awareness of this awareness. Such descriptions of an individualistic awareness of being the other's object of attention may simply be a consequence of the technical specificities of the perceptual crossing setup. An actively searching participant cannot in principle distinguish between a completely immobile (or nonresponsive) participant and the static (or shadow) object. This means that one participant may be aware of the other's attending presence without the other sharing in that awareness of attention.

Nevertheless, we highlight that analogous situations exist in human development. As Tronick discusses at length, a newborn lacks control over its own movements to the point that "what he is doing is messy – variable, unstable, disorganized" (2004, p. 307). And Reddy considers non-responsiveness to be an intentional action with which infants sometimes counter being the object of others' attention: "Infants can also be indifferent to others' visual attention, as anyone knows who, trying to engage a 2-month-old, has had the infant glance expressionlessly at them and turn away" (2005, p. 97). We can also consider cases of pathological development. For example, Tronick (2004, p. 304) examines the pathological apathy exhibited by chronically deprived orphans. When attending to such individuals we may remain unaware of the extent of their awareness of being the object of our attention, even though they might actually be aware of our attending presence.

We therefore defined three categories of experiencing the other's presence, which incrementally build on each other: (A) individual awareness of being the object of the other's attention, (B) mutual awareness of being each other's objects of attention, and (C) mutual awareness of specific aspects of the interaction being the object of joint attention. The categories overlap to some extent, but essentially category A includes only reports of awareness of the other's actions directed at oneself, B additionally requires awareness of mutually responsive interaction, and C additionally requires awareness of joint attention on something specific other than the selves, for example an arbitrary pattern of mutual contacts that had acquired special communicative significance.

After each trial, participants could write as little or as much as they wanted within 2 min until the next trial started. The questionnaire sheet asked them to describe the sensation of having encountered the other at the time of the click, and more generally to describe the strategy they had used during the trial. There were 472 instances of a participant having voluntarily written at least some text after a trial. Mostly these were fragmentary statements, with only very few responses consisting of several sentences.

Each of these responses was coded as belonging to one of the three social awareness categories (A, B, or C) or as not assigned to a category (N/A). Categorizing the responses was quite a challenge. Wherever possible we based our categorizations not only on the brief description of the experience, but also on the brief description of the strategy, as well as on descriptions provided for preceding and subsequent trials (e.g., participants often abbreviated by writing "same as above"). In cases where a description of an experience implied a different category than the stated strategy, for example if a participant only reported individual awareness of being the object of the other's attention although an interactive strategy was described, the category of the experience took precedence. In order to estimate interobserver reliability, two of us (TF and HI) independently did the coding. The results are shown in **Table 1**.

In total there were 308 coding agreements, which is 65% of all responses. Given the frequency distribution of the four types of codings (A, B, C or N/A), the expected percentage of agreement is 29%. This gives an interobserver-reliability kappa of 0.51 (see Supplementary Material Section S3 for calculations), which can be interpreted as moderate agreement. Given the sparse responses collected during the original study, this is probably all that can be hoped for at this point. Moreover, it is encouraging that disagreements tended to occur more frequently between consecutive stages of awareness (i.e., between A and B, or B and C, rather than A and C), which is to be expected given that the three categories build on each other. In the following we limit our analysis to only those responses for which there was an agreement between the two coders. First, in order to illustrate how people responded and how we coded, we provide 10 examples for each of the three categories in **Tables 2**–**4**, respectively.
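The reported reliability follows the standard Cohen's kappa formula, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed and $p_e$ the chance-expected proportion of agreement. Since the full cell counts of Table 1 are not reproduced in this text, the sketch below recomputes kappa from the summary figures given above (308 agreements out of 472 responses, 29% expected agreement), and also includes a general version that works from a square coder-by-coder confusion matrix. The function names are our own.

```python
def cohens_kappa(p_observed: float, p_expected: float) -> float:
    """Cohen's kappa from observed and chance-expected agreement proportions."""
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_from_matrix(matrix):
    """Cohen's kappa from a square confusion matrix (rows: coder 1, columns: coder 2)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return cohens_kappa(p_o, p_e)


# Summary figures from the text: 308 of 472 responses agreed; 29% expected by chance.
kappa = cohens_kappa(308 / 472, 0.29)
print(f"kappa = {kappa:.2f}")  # kappa = 0.51
```

With the four codes A, B, C, and N/A, `kappa_from_matrix` would take a 4 × 4 matrix whose diagonal holds the agreed codings.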

TF and HI agreed in classifying 29, 58, and 70 responses as belonging to categories A, B, and C, respectively. Given that these three categories can be interpreted as analogous to the first stages in the development of social awareness, from passive individuality to active mutuality to co-regulated triangulation on a third element, we expected a corresponding increase in the reported clarity of perceiving the other's presence. To put it differently, following the hypothesis formulated by Froese and Di Paolo (2011), we expected a correlation between the extent of co-regulation and the sense of sociality in the experience. The increasing number of reports found for each category already suggests this trend, since having a clearer experience of the other makes it easier to report it. We further tested this hypothesis by evaluating the perceptual awareness scale (PAS) ratings associated with each category.

**Table 1 |** *There were 510 opportunities for giving free-text responses (15 trials* × *17 teams* × *2 players), out of which 472 resulted in some written text. Two experimenters went through these responses with the aim of coding each into one of three categories: (A) individual awareness of being the object of the other's attention, (B) mutual awareness of being each other's objects of attention, and (C) mutual awareness of specific aspects of the interaction being the object of joint attention. If no category was applicable or there was not sufficient text to make a decision, the response was coded as N/A. Bold numbers represent the number of responses for which the coders were in agreement.*

In order to determine whether there was a significant difference between the average PAS ratings reported for the categories, we applied one-tailed, two-sample equal-variance *t*-tests. There were 24, 56, and 68 PAS scores associated with the agreed categories A, B, and C, respectively. The equality of variances was verified using an *F*-test for each comparison. The average reported clarity of experiencing the other's presence for category B experiences was slightly but not significantly higher than for category A ($\mu_A = 2.83$; $\mu_B = 3.05$; $P = 0.15$), but the average clarity for category C was significantly higher than for category B ($\mu_C = 3.62$; $P = 3.71 \times 10^{-6}$).

**Table 2 | Category A: individual awareness of being the object of the other's attention.**

*Ten exemplary first-person reports (emphasis added).*

**Table 3 | Category B: mutual awareness of being each other's objects of attention.**

*Ten exemplary first-person reports (emphasis added).*

**Table 4 | Category C: mutual awareness of specific aspects of the interaction being the object of joint attention.**

*Ten exemplary first-person reports (emphasis added).*
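The comparison of PAS ratings uses a pooled-variance (equal-variance) two-sample *t*-test with a one-tailed *P* value, preceded by a variance-ratio (*F*) check. The sketch below implements these steps in plain Python so that it runs without extra libraries; the one-tailed *P* value is obtained by numerically integrating the Student *t* density with Simpson's rule, which is our own choice and not necessarily how the original analysis was done. The raw PAS ratings are not reproduced here, so the example ratings are hypothetical and the reported *P* values cannot be recomputed from them.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance two-sample t statistic (positive if mean(b) > mean(a)) and df."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """One-tailed P(T > t), via Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        return 1 - t_sf(-t, df, steps)
    h = t / steps
    s = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3


def variance_ratio(a, b):
    """F statistic for the equal-variance check (larger sample variance on top)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical PAS ratings (1-4) for two categories.
ratings_b = [3, 2, 4, 3, 3, 2, 4, 3]
ratings_c = [4, 4, 3, 4, 4, 3, 4, 4]
t, df = pooled_t(ratings_b, ratings_c)
print(f"t = {t:.2f}, df = {df}, one-tailed P = {t_sf(t, df):.4f}")
print(f"F = {variance_ratio(ratings_b, ratings_c):.2f}")
```

In practice a statistics library would normally be used for the *P* value; the numerical integration here only keeps the sketch self-contained.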

The fact that the clarity of social awareness associated with categories A and B was not significantly different suggests that these categories may not be experienced as qualitatively distinct situations from the first-person perspective. This is consistent with Reddy's (2003) approach, which does not allow for a purely individualistic awareness of being the object of the other's attention but treats such awareness as always already mutual to some extent. Indeed, from what we have observed while running the study, it does seem highly unusual for participants to remain completely nonresponsive while being their partner's object of attention. Typically, after having received a few touches the subjects of attention quickly get pulled into a mutually responsive interaction. In the following we therefore collapse categories A and B into a single category of mutual awareness, category AB.

The final step of our analysis was to determine if experiences belonging to categories AB and C actually followed a sequence. Given the developmental sequence from AB to C, we expected responses categorized as AB to be more frequent than C during the initial trials. We may also expect that category C is more frequent during later trials, although it does not necessarily have to displace category AB since C can be seen as a more specific articulation of AB. These predictions are partially supported by the data (**Figure 9**). In the first couple of trials there are indeed more cases of AB than C. The frequency of C tends to increase over the subsequent trials, but it never fully becomes the dominant category. These findings are suggestive, but the trends are not that well pronounced and may be biased by the small sample size.
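The per-trial counts behind this comparison can be sketched as follows: keep only responses on which both coders agreed on A, B, or C, merge A and B into AB, and count AB and C per trial number. The input format (trial number plus the two coders' codes) and the function name `category_trends` are hypothetical assumptions for illustration.

```python
from collections import Counter, defaultdict


def category_trends(codings):
    """Count agreed AB and C codings per trial.

    `codings` is an iterable of (trial, code_coder1, code_coder2) tuples with codes
    "A", "B", "C", or "N/A". Disagreements and agreed N/A codes are dropped, and
    categories A and B are merged into AB.
    """
    per_trial = defaultdict(Counter)
    for trial, code_1, code_2 in codings:
        if code_1 != code_2 or code_1 not in {"A", "B", "C"}:
            continue
        per_trial[trial]["AB" if code_1 in {"A", "B"} else "C"] += 1
    return {trial: dict(counts) for trial, counts in sorted(per_trial.items())}


# Hypothetical codings for two trials.
example = [(1, "A", "A"), (1, "B", "B"), (1, "C", "B"),
           (2, "C", "C"), (2, "N/A", "N/A"), (2, "B", "A")]
print(category_trends(example))  # {1: {'AB': 2}, 2: {'C': 1}}
```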

Clearly, a proper evaluation of our hypothesis that the developmental stages of social awareness can be recapitulated using this kind of experimental setup requires a more systematic collection and analysis of subjective reports. Developmental studies using the microgenetic method have long emphasized the need for dense observations of individual cases (Siegler and Crowley, 1991). Due to the limited number of usable free-text responses, and the even smaller number of agreed-upon codings, we averaged categorizations across all 17 teams, which may have further obscured any idiosyncratic team-based trends. Nevertheless, these tentative results at least hold out the prospect that more distinguishable developmental trends in social awareness could be discovered by studies specially designed to elicit detailed first-person reports.

Participants could also be phenomenologically trained beforehand to become more aware of their different kinds of experience (Lutz, 2002). Another possibility is to interview them about their experience using a specialized method (e.g., Petitmengin, 2006; Hurlburt, 2011). Biases associated with experimenters' classifications of the written reports could be avoided by asking participants to define and select categories that best describe their own experiences (Lutz et al., 2002).

## **DISCUSSION**

We have proposed that a suitably implemented perceptual crossing paradigm can fill a gap in experimental psychology. Following enactive theory, we hypothesized that we should find something akin to the main stages of development of social awareness in infants recapitulated in adults if they are forced to implicitly relearn the skill of social perception. A sequence of three categories was defined: (A) individual awareness of being the object of the other's attention, (B) mutual awareness of being each other's objects of attention, and (C) mutual awareness of specific aspects of the interaction being the object of joint attention. The preliminary results we have presented suggest that our hypothesis has merit, although the methods still need refinement. We found an average increase in reported clarity of social awareness over trials, but it is challenging to find an objective explanation of this phenomenon. Turn-taking is only partially responsible, and a team-based measure may be more appropriate. We also found no significant difference between categories A and B in terms of the associated clarity of social awareness, with only C being significantly clearer, which we argued is in line with Reddy's (2003) original proposal, although evidently more precise phenomenological work is needed. At the very least, these diverse and complex results warn us against idealizing development as a linear sequence of independent stages.

**FIGURE 9 | Changes in how participants described their social awareness.** Categorizations were based on brief free-text first-person reports. Only cases where both coders agreed were considered. Category AB: mutual awareness of being each other's object of attention (combining categories A and B). Category C: mutual awareness of joint attention to aspects of the interaction.

Although it was difficult to find general trends across all teams, many participants were able to sense the other's attention to their objective presence and to engage in co-regulated interactions that involved mutual attention, such as the feeling of being chased or being led (see **Table 3**, rows 1 and 2, for descriptions by one team). Some participants were able to further develop these co-regulated engagements into communication games involving turn-taking and mutual imitation, such as passing patterns of activity between each other (see **Table 4**, rows 1 and 2, for reports by one team). On the basis of such coordinated interactions, it apparently even became possible to perceive the other's emotional state across the HCI, as predicted by Lenay (2010).

For example, after trial 10 one player somewhat confidently remarks: "I think I am pretty sure that I could communicate about my intention" (E10T10Pa), and two trials later he writes: "Same as before, but I felt that the partner is anxious" (E10T12Pa). Did this player correctly make sense of his partner's emotional state of mind? Given the unfortunate scarcity of free-text descriptions that were generated by the original experiment, it was usually impossible to evaluate this kind of question. However, here we happened to be lucky because after the next trial his partner writes: "I think my click was correct but if this response was autonomous object's, I will get anxious" (E10T13Pb). In other words, despite the extreme poverty of the stimulus provided by this minimal HCI, namely a sequence of binary (on/off) tactile sensations, one player seems to have correctly noticed some anxiety in the other's style of engagement.

This finding is consistent with studies showing our propensity to discern intentional states on the basis of minimal movement information (Blake and Shiffrar, 2007), such as detecting other people's emotions from the point-light displays of their dances (Brownlow et al., 1997). It is still debated whether this ability is best explained as a direct perception of the other's intentional state in their behavioral expression (Stout, 2012), or whether perception just presents us with meaningless "surface behavior" that needs to be cognitively penetrated to gain access to the underlying intentions (Baldwin and Baird, 2001). We suggest that with this experimental setup social understanding might be productively analyzed as a case of direct perception by interaction (De Jaegher, 2009). Due to the constraints of the HCI it is impossible to discern the other's intentions without at the same time interacting with them, and this interaction can evoke a felt sensation of the other's mental state. Using the terminology introduced to developmental psychology by Stern (1998), we can describe this encounter as an amodal perception of the other's vitality affect in the activation contours traced by their movements-in-interaction. The result is a felt impression of the other's state, e.g., "She likes me!" (E1T9Pb), which in turn will modulate the perceiver's expression, thereby making an impression on the other, and so forth. Interacting players are thereby able to create an intertwinement of embodied affectivity, which is a form of embodied communication (Fuchs and Koch, 2014). Movement and being moved both have spatial and emotional components.

From this perspective we can also better understand why a player would terminate an interaction that is too repetitive and "mechanical" (see **Figure 4C**). The other player might keep faithfully replicating an already established signal, but without at the same time allowing their movement to resonate with the other's changing expressions they fail to participate in a shared affective space. When the signal stops being grounded in a mutually affecting situation it loses its communicative value; it becomes an empty form that obscures rather than expresses the other's subjective presence. This example nicely shows the primacy of embodied communication via interbodily resonance, in contrast to the traditional starting premise of sending and receiving symbolic signals across pre-defined channels. The importance of common ground for the emergence of an embodied communication system has been observed before (Scott-Phillips et al., 2009). Here we saw that it continues to be crucial even after the establishment of that system, in the interactive maintenance of its meaningfulness.

However, we acknowledge that this interactive-perceptual strategy is not the only way of realizing the task of locating the partner. As we discussed previously (Froese et al., 2014), one outstanding participant managed to get nearly perfect clicking scores while never reporting any direct perceptual awareness of the other's presence. Leaving the free-text boxes asking for descriptions of his felt sensations entirely blank, he only provides a few statements of his strategy that reveal the perspective of a detached observer: "Because the partner generated intermittent stimulation, I also reply the intermittent stimulation" (E15T2Pa). Similarly, another participant insisted on relying more on an individualist cognitive strategy: "Felt like it was him. But every time I say feel, I must say I rely much more on thinking about my strategy and sticking to it." (E2T14Pb). But then again, the fact that at least some more cognitivist strategies were employed is not all that surprising. After all, participants were adults who in real life already had fully developed social skills and who were confronted with a breakdown of these skills, a breakdown that could be expected to elicit more reflective awareness and cognitive compensatory strategies (Dreyfus, 1991).

We note that accepting the importance of the individual is not a problem for this framework, because the interactive turn in cognitive science is not a return to the old days of behaviorism or to some kind of extreme externalism. The internal organization of agents is a central concern of the enactive approach to social cognition (Froese and Di Paolo, 2011). Nor is this concession to the individual and its internal milieu a return to the classical internalism of cognitivism, since all behavior is conceived of as a dynamical property of embodied and situated minds (Beer, 2000). The perceptual crossing paradigm thus provides a platform for gaining a better understanding of the diversity of individual and interactive styles. These differences were largely set aside in the current analysis because we were looking for statistically significant trends averaged over players and teams.

Apart from confirming the preliminary results presented here, it would be interesting to use this approach to investigate other hypotheses about the development of social awareness. For instance, Stern (1998, pp. 56–57) assigns developmental primacy to amodal perception of the other's vitality affects over modal perception of overt acts and objects. Future work could use the current approach to study the developmental trajectory from the former to the latter. In addition, studies have found differences in phenomenology between people from Eastern and Western cultural backgrounds, including divergences in their development of social experiences (Cohen et al., 2007). Although our study included participants from both backgrounds, we did not distinguish between these groups. A between-group experiment might reveal differences in their development of social awareness. Finally, it is an interesting open question whether the perceptual crossing paradigm can be modified so as to allow for the emergence of secondary intersubjectivity (Trevarthen and Hubley, 1978), including the triangulation of joint attention on external objects, which is predicted to follow the stages of mutual awareness that we have described here (Reddy, 2005).

## **ACKNOWLEDGMENT**

This study was partially funded by KAKENHI grant number 25560430.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01061/abstract

## **REFERENCES**


*Cognition at the Edge of Sense-Making: Making Sense of Non-Sense,* eds M. Cappuccio and T. Froese (Basingstoke, UK: Palgrave Macmillan), 207–237.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 May 2014; accepted: 04 September 2014; published online: 26 September 2014.*

*Citation: Froese T, Iizuka H and Ikegami T (2014) Using minimal human-computer interfaces for studying the interactive development of social awareness. Front. Psychol. 5:1061. doi: 10.3389/fpsyg.2014.01061*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Froese, Iizuka and Ikegami. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Quantifying long-range correlations and 1/*f* patterns in a minimal experiment of social interaction

# *Manuel G. Bedia\*, Miguel Aguilera, Tomás Gómez, David G. Larrode and Francisco Seron*

*Department of Computer Science and Engineering Systems, University of Zaragoza, Zaragoza, Spain*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Sebastian Wallot, Aarhus University, Denmark*

*Takashi Ikegami, The University of Tokyo, Japan*

#### *\*Correspondence:*

*Manuel G. Bedia, Department of Computer Science, School of Engineering and Architecture (EINA), University of Zaragoza, Maria de Luna s/n, 50018 Zaragoza, Spain e-mail: mgbedia@unizar.es*

In recent years, researchers in social cognition have found the "perceptual crossing paradigm" to be both a theoretical and a practical advance toward meeting particular challenges. This paradigm has been used to analyze the type of interactive processes that emerge in minimal interactions, and it has enabled progress toward understanding the principles of social cognition. In this paper, we analyze whether some critical aspects of these interactions may have gone unobserved in previous studies. We consider alternative indicators that could complement, or even lead us to rethink, the current interpretation of the results obtained from both experiments and simulation models of social interaction and minimal perceptual crossing. In particular, we discuss the possibility that previous experiments have been analytically constrained to the short-term dynamics of player responses. We propose instead to consider these experiments within a more suitable framework based on the analysis of long-range correlations and fractal dynamics. We also present evidence supporting the idea that social interactions unfold across many scales of activity. Specifically, we propose that the fractal structure of the interactions could be a more adequate framework for understanding the social interaction patterns generated in social engagement.

**Keywords: perceptual crossing, social engagement, long-term correlations, multiscale interaction, 1/*f* noise, multifractality**

# **1. INTRODUCTION**

There are emergent social processes in collective online situations (when two persons are engaged in real-time interaction) that cannot be captured by a traditional offline perspective, in which the problem is considered from the standpoint of an isolated individual who acts as an observer, exploiting their internal cognitive mechanisms to understand other people. Although how people process social information is an old problem, social cognitive processes have attracted significant interest in recent years. On the theoretical side, a promising proposal has been developed about the possible "constitutive" role of social interaction for social cognition (De Jaegher, 2009; De Jaegher et al., 2010), which suggests that the interactive capabilities of the "second-person perspective" (Gomila, 2013) underlie the "first- and third-person approaches" and their related structure of mental states (Reddy, 2008; Wilms et al., 2010). On the experimental side, a minimal and simple framework known as the "perceptual crossing framework" has recently been developed for studying online social interactions and for understanding the mechanisms that support minimal social capabilities (Auvray et al., 2009). This framework makes it possible to study online dyadic interactions and to analyze the perception of someone else's agency in different situations implemented in a minimal virtual world. Hypotheses about the human capacity for social cognition can then be derived from the self-organized collective patterns that emerge from the interactions (such as emergent coordination or turn-taking).

The perceptual crossing paradigm is a simple framework for studying online social interactions in their most basic form. It consists of a minimal scenario in which two participants, sitting in different rooms, interact with each other by moving a sensor along a shared virtual line using a computer mouse. The subjects can only move laterally in a one-dimensional world and perceive their collisions with other human subjects or artificial agents. In the last few years, the perceptual crossing paradigm has become a promising experimental tool for analyzing the dynamics of human social interaction. Work with the paradigm falls into two types: (i) behavioral experiments and (ii) simulated agent modeling. Regarding the former, numerous experiments in which real subjects try to identify each other in a virtual world have been carried out, and researchers have analyzed the types of behavior that seem to support social coordination patterns [for example, in one-dimensional experiments (Auvray et al., 2009) and in their extensions to two dimensions (Lenay et al., 2011)]. In some cases, experiments with humans were combined with phenomena previously tested in simulations, for example in Iizuka et al. (2009, 2012), where the authors explored how participants modulated the interaction dynamics to work out whether an interaction was live or not. Regarding the latter, in the computational modeling context (for example Iizuka and Paolo, 2007; Iizuka et al., 2012), virtual agents have been evolved to locate each other in an experimental setup analogous to that of Auvray et al. (2009), and mathematical analyses have explained how the agents regulate their own variables, such as size or velocity, to coordinate with others in an extremely robust way. Simulation models of "social software agents" have demonstrated that this kind of behavior can emerge from very simple structures without explicit social reasoning capabilities (Froese et al., 2014a). In general, simulation studies have complemented experimental work with humans, sometimes providing proofs of concept and a methodological alternative for exploring social interactions.
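To make the setup concrete, the tactile feedback rule of a one-dimensional perceptual crossing world can be sketched as a collision test. The sketch assumes, for illustration only, that the shared line wraps around into a circle, and it uses hypothetical values for the line length and the avatar width; it is not the implementation used in any of the cited studies. A player receives on/off stimulation whenever their point-like sensor overlaps the other's avatar.

```python
# Hypothetical parameters for illustration; not taken from any cited study.
RING_LENGTH = 600.0  # length of the shared line, assumed to wrap around
AVATAR_WIDTH = 4.0   # width of each player's avatar


def ring_distance(a, b, length=RING_LENGTH):
    """Shortest distance between two positions on a circular 1D space."""
    d = abs(a - b) % length
    return min(d, length - d)


def touches(sensor, avatar, width=AVATAR_WIDTH, length=RING_LENGTH):
    """True when a point-like sensor lies within an avatar of the given width."""
    return ring_distance(sensor, avatar, length) <= width / 2


def stimulation(sensor_trajectory, other_trajectory):
    """On/off tactile feedback for one player, sample by sample."""
    return [touches(s, o) for s, o in zip(sensor_trajectory, other_trajectory)]


print(stimulation([0.0, 100.0, 300.0], [599.0, 50.0, 301.0]))  # [True, False, True]
```

The first sample shows why the wrap-around matters: positions 0 and 599 are only one unit apart on the circle.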

The results in both lines of work cited above (behavioral experiments and simulated agents) share a common feature: the participants' behavior is analyzed only *on a short time scale* (this point is explained in detail in the next section). In this paper we propose a quantitative indicator that complements the analyses carried out in previous perceptual crossing experiments. The indicator characterizes the cross-scale nature of the interaction through *fractal and multifractal analysis* (Van Orden et al., 2003) of the collective dynamics. We argue that it can help shed light on social constitutive processes and related questions, and that it will be useful for characterizing the *genuine constitution of social interactions*.

The paper is organized as follows. In Section 2, we detail our working proposal and argue that a multiscale analysis is needed to identify the type of pattern that emerges in a social interaction. We explain the notions of 1/*f* patterns and fractal measures to support the idea that "1/*f* noise analysis" (Van Orden et al., 2003, 2005) can be a genuine indicator able to discriminate between "human-human" and "human-software agent" interactions in a perceptual crossing experiment. In Section 3, we describe the experiments that we carried out and the fractal and multifractal analysis of the results obtained. We also consider whether our results offer new insights into the characterization of social interactions. In Section 4, we discuss whether the results are statistically significant. Finally, Section 5 reviews the most notable points of our analysis and outlines future lines of research. An Appendix of Supplementary Material is included at the end of the paper: the first part (S.1) describes the software platform and the experimental protocols, and the second focuses on the statistical foundations that support the results obtained.

# **2. THEORETICAL FRAMEWORK**

Studies of the perceptual crossing experiment have provided insightful evidence about the importance of inter-individual coordination for the emergence of social cognition and agency detection. However, we think that further advances are needed to characterize and better understand how coordinated interactions may give rise to collective social processes. Recently, some authors have emphasized the importance of understanding how distinct time scales and organizational levels are intertwined in the emergence of social cognition (Dumas et al., 2014). At the neural level, there is experimental evidence for the importance of non-linear cross-scale interactions in brain organization (Le Van Quyen, 2011). In social neuroscience, inter-brain synchronization in multiple frequency bands has been found during imitation of hand movements (Dumas et al., 2010), and synchronization patterns showing a complex interplay of different frequencies have been found during guitar improvisation (Müller et al., 2013). To our knowledge, a detailed analysis of this kind of phenomenon at the behavioral level is still missing. These examples show the potential of a multi-scale account of social cognition and suggest that the analyses developed so far to understand perceptual crossing dynamics may fall short in characterizing the emergent multi-scale nature of social interaction.

As stated above, we contend that some of the conclusions drawn from the original perceptual crossing experiments focus only on the reaction to short-term interactions. For example, in Auvray et al. (2009) the analysis is limited to the probability of clicking within a 2 s window after the subject encounters another subject or object, and to standard descriptive statistics of some system variables (frequency of crossings, correlation between velocity and acceleration, etc.). It is thus implicitly assumed that the emergence of social engagement can be reduced to a single scale of short-term activity, and that no other scales or inter-scale correlations are relevant for the subject's behavior (for example, it is assumed that the subject's previous collisions with different kinds of agents do not interfere with the decision to click or not). A similar assumption is found in the agent modeling field, for example in Di Paolo et al. (2008), where the simulation model focuses on finding which short-term dynamics (modeled in terms of delays between the agent's perceptual stimulation and its motor response) can create the stable pattern of social interaction from which a dynamic of co-regulation emerges. Again, inter-scale correlations in the social interaction process are left out of the analysis and modeling.
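For comparison, the short-term measure described above, the probability of a click within a 2 s window after an encounter, can be sketched as below. The function name and the input format (crossing and click times in seconds, for a single kind of object) are assumptions for illustration; in practice the measure would be computed separately for each kind of object the participant encounters.

```python
import numpy as np


def click_probability(crossing_times, click_times, window=2.0):
    """Fraction of crossings followed by at least one click within `window` seconds."""
    crossings = np.asarray(crossing_times, dtype=float)
    clicks = np.sort(np.asarray(click_times, dtype=float))
    if crossings.size == 0:
        return float("nan")
    # Index of the first click at or after each crossing
    idx = np.searchsorted(clicks, crossings, side="left")
    has_next = idx < clicks.size
    followed = np.zeros(crossings.size, dtype=bool)
    followed[has_next] = clicks[idx[has_next]] - crossings[has_next] <= window
    return float(followed.mean())


# Toy example: crossings at 1, 5 and 10 s; clicks at 2.5, 11.9 and 20 s
p = click_probability([1.0, 5.0, 10.0], [2.5, 11.9, 20.0])
print(p)  # 2 of the 3 crossings are followed by a click within 2 s
```

Because each crossing is scored only by what happens in the following 2 s, a measure of this kind is by construction blind to dependencies across longer stretches of a trial, which is the limitation discussed above.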

In this context, we propose that it may be useful to analyze the dynamics of the perceptual crossing experiment within a conceptual framework that is not constrained by the assumption of one dominant scale of behavior. Despite its apparent simplicity, the perceptual crossing paradigm could comprise several embedded levels of dynamic interaction, resulting in correlations of the signals across different time scales. In the next section we therefore propose a framework of analysis capable of capturing the multiple relations between different scales of behavior: specifically, the analysis of fractal and multifractal patterns as evidence of the multi-scale nature of social interaction in the perceptual crossing experiment.

## **2.1. 1/*F* NOISE AND MULTIFRACTALITY FOR CHARACTERIZING SOCIAL INTERACTION**

During the last two decades, the 1/*f* noise approach to cognitive science has made considerable progress toward conceptualizing cognitive and mental organization (Dixon et al., 2012). Dynamical systems concepts such as self-organized criticality or scale-free patterns have provided new insights into how the brain and the mind operate in a non-linear dynamic manner, self-organizing their activity at the brink of criticality. The concept of self-organized criticality (SOC) (Jensen, 1998), one of the main exponents of this approach, was proposed by Bak et al. (1987) to define certain classes of dynamical systems that have a critical point as an attractor, displaying critical behavior without any significant "tuning" of the system from outside<sup>1</sup>. Critical systems present very interesting properties, the most notable of which is the lack of a dominant scale of activity. They show complex dynamical responses, and their statistical properties have to be described by power laws. Thus, critical systems typically display temporal and spatial scale invariance in the form of fractals and 1/*f* noise, reflecting the propagation of long-range correlations based on local effects. Long-range correlations refer to long-term dependencies in a signal between the current observation and a large set of previous observations, which appear as a slow (typically power-law) decay of the autocorrelation function, in contrast to the fast exponential decay of short-range correlated processes. The presence of long-range correlations therefore suggests multiple, intertwined timescales in the system that are responsible for the emergence of patterns or regularities. For a multi-scale approach to cognitive science, SOC is appealing because it allows us to imagine systems that self-regulate coordinated behaviors at different scales in a distributed manner and without a central controller.
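The contrast between short- and long-range correlations, and its link to 1/*f* noise, can be summarized with standard scaling relations for a stationary signal with autocorrelation function $C(\tau)$ and power spectrum $S(f)$. These are textbook relations stated here for orientation, not results of this study:

$$C(\tau) \propto e^{-\tau/\tau_0} \quad\Longrightarrow\quad S(f) \propto \frac{1}{1 + (2\pi f \tau_0)^2} \qquad \text{(short-range: flat spectrum for } f \ll 1/(2\pi\tau_0)\text{)}$$

$$C(\tau) \propto \tau^{-\gamma},\ 0<\gamma<1 \quad\Longrightarrow\quad S(f) \propto f^{-\beta},\quad \beta = 1 - \gamma \qquad \text{(long-range)}$$

$$F(n) \propto n^{\alpha}, \qquad \alpha = \frac{1+\beta}{2}$$

where $F(n)$ is the fluctuation function of detrended fluctuation analysis at window size $n$. White noise has $\beta = 0$ ($\alpha = 0.5$), 1/*f* (pink) noise has $\beta = 1$ ($\alpha = 1$), and Brownian motion has $\beta = 2$ ($\alpha = 1.5$); the last relation also covers such non-stationary signals.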

1/*f* patterns have also been widely found in cognitive science and psychology. For example, 1/*f* noise is present in performance time series (Gilden, 2001). More recently, Van Orden et al. (2003, 2005) used 1/*f* noise measures in different tasks to argue that certain systems are not modular and decomposable but "softly assembled" systems sustained by *interaction-dominant dynamics* (IDD hereafter), as opposed to *component-dominant dynamics* (Van Orden et al., 2003). That is, IDD systems do not consist of additive interactions of their components, but of multiplicative interactions that imply coordination between the different timescales in the system. However, 1/*f* noise is not a unique and exclusive property of SOC or IDD systems (see Wagenmakers et al., 2004, 2012), since it has been shown to arise from a linear superposition of many random inputs with different time scales (Hausdorff and Peng, 1996). To avoid uncertainty about the true origin of 1/*f* noise, some authors have suggested complementing it with a measure of multifractality as a quantitative indicator of coordinated intermittency in the system's activity (Ihlen and Vereijken, 2010). Ihlen and Vereijken propose that intermittency is displayed within the series as distinct periods of large and irregular performance variability, prompted by emergent changes in the participant's commitment, attention to stimuli, or intention in a cognitive task. The width of the multifractal spectrum quantifies the difference between the intermittent and the laminar periods, so it provides further evidence of interaction between different timescales in the system.
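To illustrate how generalized scaling exponents and a multifractal spectrum width can be estimated, the following is a minimal sketch of multifractal detrended fluctuation analysis (MFDFA), the standard procedure introduced by Kantelhardt and colleagues in 2002. It is one common estimator and not necessarily the procedure used in this paper; the function names, the scale range and the set of q values are assumptions. For q = 2 it reduces to ordinary DFA, and a deterministic binomial cascade is included as a known multifractal reference signal.

```python
import numpy as np


def mfdfa(x, scales, qs, order=1):
    """Generalized Hurst exponents h(q) by MFDFA; h(2) is the ordinary DFA exponent."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    log_fq = np.empty((len(qs), len(scales)))
    for j, n in enumerate(scales):
        n_seg = profile.size // n
        segments = profile[: n_seg * n].reshape(n_seg, n).T  # shape (n, n_seg)
        vander = np.vander(np.arange(n, dtype=float), order + 1)
        # Least-squares polynomial trend of every segment, fitted at once
        coeffs, *_ = np.linalg.lstsq(vander, segments, rcond=None)
        f2 = np.mean((segments - vander @ coeffs) ** 2, axis=0)  # variance per segment
        for i, q in enumerate(qs):
            if abs(q) < 1e-12:
                log_fq[i, j] = 0.5 * np.mean(np.log(f2))  # q = 0 limit
            else:
                log_fq[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q
    log_n = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_n, row, 1)[0] for row in log_fq])


def spectrum_width(qs, h):
    """Singularity spectrum width via tau(q) = q*h(q) - 1 and alpha = dtau/dq."""
    qs = np.asarray(qs, dtype=float)
    tau = qs * h - 1.0
    alpha = np.gradient(tau, qs)
    return float(alpha.max() - alpha.min())


def binomial_cascade(levels, p=0.25):
    """Deterministic binomial multiplicative cascade (multifractal test signal)."""
    x = np.ones(1)
    for _ in range(levels):
        x = np.column_stack([x * p, x * (1.0 - p)]).ravel()
    return x


scales = 2 ** np.arange(4, 11)  # window sizes 16 ... 1024 samples
qs = np.linspace(-3.0, 3.0, 7)
rng = np.random.default_rng(0)
h_noise = mfdfa(rng.standard_normal(2 ** 14), scales, qs)  # monofractal reference
h_casc = mfdfa(binomial_cascade(14), scales, qs)  # multifractal reference
print(spectrum_width(qs, h_noise), spectrum_width(qs, h_casc))
```

White noise gives an almost constant h(q) near 0.5 and a narrow spectrum, whereas the cascade gives a strongly q-dependent h(q) and a wide spectrum, which is the kind of contrast the spectrum width is meant to capture.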

## **2.2. OUTLINE**

In this paper we explore the presence and relevance of multi-scale and inter-scale (long-range) correlations in the perceptual crossing experiment. We hypothesize that genuine social interaction will display long-range correlations and coordinated intermittency in the form of 1/*f* scaling and a multifractal spectrum. Moreover, multi-scale interactions should be present in collective variables, and not only in individual variables, as an indicator of the emergence of a social domain of interaction.
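A minimal sketch of how a 1/*f* scaling exponent might be estimated for an individual or a collective variable is given below. It fits a line to the low-frequency part of the log-log periodogram, which is one common (and fairly crude) estimator. The fitting range, the linear detrending step, and the choice of the velocity difference as the collective variable are assumptions for illustration; positions are not unwrapped around the line.

```python
import numpy as np


def spectral_exponent(x, fmax_fraction=0.1):
    """Estimate beta in S(f) ~ 1/f**beta from a log-log fit of the periodogram."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)  # remove mean and linear trend
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)  # in cycles per sample
    k = max(2, int(fmax_fraction * (freqs.size - 1)))
    f, p = freqs[1 : k + 1], power[1 : k + 1]  # skip the zero frequency
    return -np.polyfit(np.log(f), np.log(p), 1)[0]


def velocity_difference(pos_a, pos_b, dt=0.001):
    """One possible collective variable: difference of the two players' velocities."""
    va = np.gradient(np.asarray(pos_a, dtype=float), dt)
    vb = np.gradient(np.asarray(pos_b, dtype=float), dt)
    return va - vb


rng = np.random.default_rng(1)
white = rng.standard_normal(2 ** 14)  # beta close to 0
brown = np.cumsum(white)  # beta close to 2
print(spectral_exponent(white), spectral_exponent(brown))
```

Applied to the velocity difference of a dyad, an exponent near 1 would indicate 1/*f* scaling in the collective variable; comparing it with the exponents of each player's individual velocity addresses whether the scaling is specific to the interaction.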

We propose a modified version of the original perceptual crossing experiment in which the player faces only one opponent, which may be another human player or a programmed agent with one of two possible kinds of behavior: a simple oscillatory behavior, or a "shadow" behavior that repeats the movement of the player (more information is given in the next section and in Supplementary Material Section S1). Thus, our experimental setup includes different kinds of social interaction: humans recognizing each other as such, humans interacting with programmed agents with artificial behavior, humans failing to recognize other humans, bots tricking humans, etc. Can we characterize when genuine social interaction emerges? And if so, where does it lie?

In Auvray et al. (2009), the authors propose that the sensitivity for recognizing other intentional subjects, rather than residing in each participant individually, arises from the dynamics of the interaction itself. In their experiment, the distribution of clicks suggested that social recognition arose from a combination of (i) the ability to discriminate between mobile (human player, shadow) and immobile objects and (ii) the stability of mutual interaction patterns between two human partners or between a human and an immobile object. This interpretation was inspired by the results of a simulation model that showed the importance of the stability of coordinated behavior (Di Paolo et al., 2008). However, we think that further evidence is needed to support the claim that social recognition emerges from interaction dynamics rather than from individual sensitivity. In fact, the model presented in Di Paolo et al. (2008) could be interpreted as showing that relatively simple behaviors can account for a click distribution in which agents appear to "recognize" each other, without a genuine underlying process of social recognition. We propose that genuine social interaction should arise from the emergence of a complex web of interactions across different timescales between the activities of different agents. As a first approach to supporting this claim, we propose the following schema:


<sup>1</sup>All the data used in this experiment is available at https://github.com/Isaac Lab/datasets/tree/master/PerceptualCrossing/data-28-03-2014

and the movement of the opponent (using their individual speeds) and conducting linear mixed effects models to assess if the different variables analyzed (difference of speeds, speed of the player and speed of the opponent) can discriminate between the type of interaction, finding that only the collective variable of the relative speeds can discriminate the two types of programmed agents from genuine human interaction (Section 4.3).

## **3. MATERIALS AND METHODS**

## **3.1. EXPERIMENTAL PROCEDURE**

In this experiment, human participants were assigned computers and interacted in pairs within a shared perceptual space. Some opponents were other human participants and some were computerized agents (bots), but participants were unaware of the nature of their opponents.

Our intention was not to replicate Auvray's experiment, in which each participant simultaneously encounters a human partner, a mobile agent and a static one. In our case, each participant received only a single stimulus in one of the following scenarios: human vs. human, human vs. "oscillatory agent," and human vs. "shadow agent." The "oscillatory agent" was programmed to display a predictable, deterministic sinusoidal behavior (a sinusoidal trajectory with a frequency of 0.5 Hz and an amplitude of 200 pixels). In contrast, the "shadow agent" could show an irregular pattern because it is a "shadow image" of the participant (i.e., a bot that generates a movement strictly identical to the participant's trajectory but delayed by 400 ms in time and shifted by 125 pixels in space). Participants were instructed to try to detect whether their opponent was human or not and were asked to fill in a questionnaire (although the analysis of the participants' responses is beyond the scope of this paper).
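The two programmed opponents can be sketched as trajectory generators sampled every 1 ms. Only the 0.5 Hz frequency, the 200-pixel amplitude, the 400 ms delay and the 125-pixel offset come from the text; the center position, the zero initial phase, how the shadow behaves during its first 400 ms, and the function names are hypothetical, and wrap-around of positions on the line is ignored.

```python
import numpy as np

DT = 0.001  # sampling period of 1 ms


def oscillatory_agent(n_samples, center=0.0, amplitude=200.0, freq_hz=0.5, dt=DT):
    """Sinusoidal bot trajectory (center and zero initial phase are assumptions)."""
    t = np.arange(n_samples) * dt
    return center + amplitude * np.sin(2.0 * np.pi * freq_hz * t)


def shadow_agent(player_pos, delay_s=0.4, offset_px=125.0, dt=DT):
    """Player's own trajectory, delayed by delay_s and shifted by offset_px."""
    player_pos = np.asarray(player_pos, dtype=float)
    lag = int(round(delay_s / dt))
    if lag == 0 or player_pos.size == 0:
        return player_pos + offset_px
    shadow = np.empty_like(player_pos)
    shadow[:lag] = player_pos[0]  # hypothetical: hold the start position until history exists
    shadow[lag:] = player_pos[:-lag]
    return shadow + offset_px


bot = oscillatory_agent(40_000)  # one 40 s session at 1 ms resolution
shadow = shadow_agent(np.linspace(0.0, 100.0, 40_000))
```

Because the shadow copies the player's own movement, its irregularity is entirely inherited from the player, whereas the oscillatory agent is fully predictable.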

When participants arrived at the laboratory they were randomly assigned to a workstation and provided with headphones. They were informed that the study involved two independent parts: the first (training stage) would take approximately 3 min and the second (evaluation stage) a further 10 min. In order to guarantee confidentiality during the study, participants chose identification codes/nicknames. Throughout the experiment, participants were provided with verbal instructions regarding the structure of the experiment and its sections.

In the training stage, the participants were informed that it was a simple "proof of concept" stage and that the purpose was only to learn how the platform worked. Participants were free to move the mouse as they pleased during three sessions of 1 min each with a short break between them. They played consecutively against three bots of increasing difficulty in the interaction: a static bot, a bot moving at a constant low speed and a bot moving at a constant medium speed.

After that, they were informed of the aim and rules of the evaluation part of the experiment, which consisted of 10 sessions of 40 s each. In each session: (i) each participant was randomly assigned an opponent (human-human or human-bot) with whom to explore the virtual space; (ii) participants were asked to move their mice in order to detect the movement of their assigned opponents; (iii) after each session, participants were asked to choose between the two options displayed on the screen to guess whether their opponent was a human or a bot; and (iv) finally, participants were informed on the screen whether or not they had guessed correctly. After the 10 sessions were completed, the experiment was declared finished.

A total of 13 participants (8 females and 5 males) took part in this experiment. Their ages ranged from 16 to 19 years. However, due to a problem with the computer of one participant, some data were not recorded and were therefore not included in the study. We also removed a few samples in which no interaction between the players was detected. The final dataset used in the analysis comprises a total of 106 samples, each consisting of the cursor positions of each participant over time, recorded with a sampling period of 1 ms.

More detailed information related to experiment protocols (study sample, characteristics of participants, experimental stages, number of sessions, etc.), information about how the technological platform was built (network properties, latency estimation, etc.) or how software requirements were programmed (virtual environment conditions, experimental devices, sensor stimuli, etc.) can be consulted in the Supplementary Material Section S1.

## **3.2. FRACTAL AND MULTIFRACTAL ANALYSIS**

In order to analyze the interaction between the subjects, we take the time series of the distance between the two players (or between the player and the bot agent). We compute the players' relative velocity (i.e., the first derivative of the distance between the player and their opponent) to determine whether the players are approaching or moving away from each other at each moment in time. Then we use the detrended fluctuation analysis (DFA) algorithm (Peng et al., 2000) to compute the statistical self-affinity of the series of distance variations, and, in order to verify whether the involved cognitive processes present an intermittent nonlinear structure, we also analyze the multifractal spectrum with the multifractal DFA (MFDFA) algorithm (Ihlen and Vereijken, 2010).
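The collective variable used here, the relative velocity, can be sketched as follows, assuming one-dimensional cursor positions in pixels sampled every 1 ms. The optional `world_size` argument is hypothetical: it covers the case of a circular (wrap-around) virtual space, which this section does not specify.

```python
import numpy as np


def relative_velocity(pos_a, pos_b, dt=0.001, world_size=None):
    """First derivative of the inter-cursor distance (px/s).

    Negative values: the players approach each other; positive: they move apart."""
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    dist = np.abs(pos_a - pos_b)
    if world_size is not None:
        # Hypothetical circular space: shortest distance around the loop
        dist = dist % world_size
        dist = np.minimum(dist, world_size - dist)
    return np.diff(dist) / dt
```

Working with the derivative rather than the raw distance keeps the analysis focused on coordinated approach and withdrawal instead of absolute positions.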

In a nutshell, the DFA algorithm removes the mean of the analyzed time series *x*(*i*), integrates it (cumulative sum) into *y*(*k*), and then divides it into segments of equal length *n* (i.e., of a certain time scale). For each segment, a least-squares line (the trend of the signal within that segment) is fitted to the data, yielding a local linear approximation *y<sub>n</sub>*(*k*). The characteristic size of the fluctuation *F*(*n*) is computed as the root-mean-square deviation between the integrated signal and its trend in each segment. This computation is repeated for every value of *n*.

$$y(k) = \sum_{i=1}^{k} \left[x(i) - \langle x \rangle\right] \tag{1}$$

$$F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[y(k) - y_n(k)\right]^2} \tag{2}$$

where *N* is the total length of *x*(*i*). Typically, *F*(*n*) increases with *n*. A linear relationship on a log-log plot with slope α indicates the presence of fractal scaling in the analyzed signal, where α is a generalization of the Hurst exponent and is related to the exponent β of the Fourier power spectrum by β = 2α − 1.
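The procedure in Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: it assumes linear (first-order) detrending, non-overlapping segments taken from the start of the series (the remainder is discarded), and segment lengths given in samples (at 1 ms sampling, a length in seconds times 1000).

```python
import numpy as np


def dfa(x, scales):
    """Fluctuation function F(n) for each segment length n (in samples)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                    # Eq. (1): remove mean, integrate
    F = np.empty(len(scales))
    for j, n in enumerate(scales):
        n = int(n)
        n_seg = y.size // n                        # non-overlapping segments
        segs = y[: n_seg * n].reshape(n_seg, n)
        k = np.arange(n)
        # Least-squares line per segment; polyfit fits each column of segs.T
        slope, intercept = np.polyfit(k, segs.T, 1)
        trend = np.outer(slope, k) + intercept[:, None]
        F[j] = np.sqrt(np.mean((segs - trend) ** 2))  # Eq. (2)
    return F


def dfa_alpha(x, scales):
    """Scaling exponent alpha: slope of log F(n) against log n."""
    F = dfa(x, scales)
    alpha, _ = np.polyfit(np.log10(scales), np.log10(F), 1)
    return alpha
```

As a sanity check, uncorrelated noise should give α close to 0.5 and its cumulative sum (a random walk) α close to 1.5.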

The DFA has some advantages over spectral analysis such as the Fourier transform. While the Fourier transform is only well suited to stationary signals, the DFA has been reliably used on non-stationary signals (Kantelhardt, 2011). A visual inspection of the data revealed abrupt transitions at different moments, so we decided to use DFA instead of the Fourier transform. Pink or 1/*f* noise is usually considered to correspond to values of β between 0.5 and 1.5. Similarly, values of β close to 0 correspond to white noise (uncorrelated processes) and values close to 2 to brown noise (processes driven by slow timescales showing short-term predictability). Only processes with β around 1 and a wide multifractal spectrum are considered to display self-organized criticality (SOC) (Jensen, 1998; Ihlen and Vereijken, 2010).
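The mapping β = 2α − 1 and the noise ranges quoted above can be made concrete with a small helper. The label boundaries are an assumption: the text gives 0.5 to 1.5 for pink noise and only says "close to" 0 or 2 for white and brown noise.

```python
def alpha_to_beta(alpha):
    """Power-spectrum exponent from the DFA exponent: beta = 2*alpha - 1."""
    return 2.0 * alpha - 1.0


def noise_label(beta):
    """Coarse label using the ranges quoted in the text (boundaries assumed)."""
    if 0.5 <= beta <= 1.5:
        return "pink (1/f)"
    if beta < 0.5:
        return "closer to white"
    return "closer to brown"


for alpha in (0.5, 1.0, 1.5):
    beta = alpha_to_beta(alpha)
    print(f"alpha={alpha:.1f} -> beta={beta:.1f}: {noise_label(beta)}")
# alpha=0.5 -> beta=0.0: closer to white
# alpha=1.0 -> beta=1.0: pink (1/f)
# alpha=1.5 -> beta=2.0: closer to brown
```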

On the other hand, the multifractal spectrum is computed by the MFDFA algorithm, a variation of DFA in which the exponent 2 of the root-mean-square deviation is replaced by a variable *q*, allowing calculations beyond the standard Euclidean norm defined by the root mean square. Following this procedure, positive *q*-values describe the scaling behavior of the segments with large variance, because the large deviations from the corresponding fits dominate the average *F*(*n*). Conversely, negative *q*-values describe the scaling behavior of the segments with small variance, because the large deviations from the corresponding fits are largely attenuated in the average *F*(*n*) (Kantelhardt, 2011). This behavior describes the regularity of laminar periods of little performance variability vs. the regularity of intermittent periods of large performance variability, and can be quantified as the difference between the maximum and minimum values obtained along the different *q*-values, namely the width of the multifractal spectrum. A multifractal signal is characterized by the presence of intermittent periods of large and irregular fluctuations, denoting the interaction among timescales within the signal. Since the width of the multifractal spectrum measures these intermittent periods, it serves as an index quantifying the structure of interactions between temporal scales (Ihlen and Vereijken, 2010).
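A compact MFDFA sketch, again with linear detrending and non-overlapping segments, shows where *q* enters: the per-segment squared fluctuations are averaged with exponent *q*/2 before taking the *q*-th root, and *q* = 0 uses the usual logarithmic limit. Returning the generalized exponents h(q) and measuring their range over the *q* grid is one reading of the width described in this paragraph; it is an illustration, not necessarily the exact computation used in the study.

```python
import numpy as np


def mfdfa_hq(x, scales, qs):
    """Generalized Hurst exponents h(q) via MFDFA with linear detrending.

    scales: segment lengths in samples; qs: moments (q = 2 reproduces DFA)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                      # profile, as in Eq. (1)
    log_fq = np.empty((len(qs), len(scales)))
    for j, n in enumerate(scales):
        n = int(n)
        n_seg = y.size // n                          # non-overlapping segments
        segs = y[: n_seg * n].reshape(n_seg, n)
        k = np.arange(n)
        slope, intercept = np.polyfit(k, segs.T, 1)  # one line per segment
        trend = np.outer(slope, k) + intercept[:, None]
        var = np.mean((segs - trend) ** 2, axis=1)   # squared fluctuation per segment
        for i, q in enumerate(qs):
            if np.isclose(q, 0.0):
                # q -> 0 limit: logarithmic averaging
                log_fq[i, j] = 0.5 * np.mean(np.log(var))
            else:
                # F_q(n) = [mean(var^(q/2))]^(1/q): the exponent 2 becomes q
                log_fq[i, j] = np.log(np.mean(var ** (q / 2.0))) / q
    log_n = np.log(np.asarray(scales, dtype=float))
    # h(q): slope of log F_q(n) against log n for each q
    return np.array([np.polyfit(log_n, row, 1)[0] for row in log_fq])


def hq_range(hq):
    """Range of h(q) over the q grid (max - min)."""
    return float(np.max(hq) - np.min(hq))
```

For a monofractal signal such as white noise, h(q) stays close to 0.5 for all *q* and the range is small; segments with zero variance would break the negative-*q* moments, which real cursor data are unlikely to produce.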

DFA bins for the parameter *n* have been defined logarithmically from 2<sup>6</sup> s to 1/4 of the length of the time series, with intervals of 2<sup>0.01</sup> s. For the MFDFA we have used the same values for the *n* bins, and *q* takes values from −3 to 3 in steps of 0.25.
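A grid of this kind can be generated as below. We read "intervals of 2<sup>0.01</sup>" as the ratio between consecutive bin lengths, which is an interpretation; the lower bound is left as a parameter, and the value of 64 samples in the example is a placeholder rather than the study's setting.

```python
import numpy as np


def log_scales(n_min, n_total, ratio=2 ** 0.01):
    """Logarithmically spaced segment lengths (samples) from n_min to n_total / 4.

    Consecutive candidate lengths grow by `ratio`; after rounding to whole
    samples, duplicates are removed."""
    n_max = n_total / 4.0
    n_steps = int(np.floor(np.log(n_max / n_min) / np.log(ratio))) + 1
    candidates = n_min * ratio ** np.arange(n_steps)
    return np.unique(np.round(candidates).astype(int))


# q from -3 to 3 in steps of 0.25 (25 values, including q = 0)
Q_VALUES = np.arange(-3.0, 3.0 + 0.125, 0.25)

# Example: a 40 s round sampled at 1 ms; n_min = 64 samples is a placeholder
scales = log_scales(64, 40_000)
```

With such a fine ratio, many candidate lengths round to the same integer at small *n*, so the deduplication step matters.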

## **3.3. STATISTICAL APPROACH**

The design of this experiment involves repeated measures per subject, and in order to account for this characteristic, linear mixed-effects models have been computed. In a nutshell, mixed-effects models are regression models that incorporate both fixed and random effects. Fixed effects are the independent variables of interest, while random effects reflect the grouping structure of the data (i.e., games within players in this case). As a consequence, the unexplained variation can be split into the variation between players and the residual variation between games within players. In this experimental design, the variable "type of opponent" ("human," "shadow agent," or "oscillatory agent") acts as the only fixed effect. Each player performs the experiment several times, so we include the variable "player" in order to account for the potential lack of independence of the repeated measures for each participant. In the next section, these techniques are applied to the results of the experiment to assess the statistical significance of the observed differences. A more detailed description of the method can be found in the Supplementary Material Section S2.
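A random-intercept model of this kind can be sketched with the `statsmodels` formula interface (`smf.mixedlm`). This is an illustration on synthetic numbers only: the column names `player`, `opponent`, and `beta` are hypothetical, and the Supplementary Material may specify a different model or software. Because "human" sorts first alphabetically, it becomes the reference level, so the two fixed-effect coefficients contrast each bot with the human opponent.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_opponent_model(df, response="beta"):
    """Fixed effect of opponent type, random intercept per player
    (games nested within players)."""
    model = smf.mixedlm(f"{response} ~ opponent", data=df, groups=df["player"])
    return model.fit()


# Synthetic numbers for illustration only (not the experimental data)
rng = np.random.default_rng(1)
true_means = {"human": 1.0, "oscillator": 1.5, "shadow": 0.2}
opponents = sorted(true_means)                  # 'human' is the reference level
rows = []
for player in range(12):
    player_offset = rng.normal(0.0, 0.1)        # between-player variation
    for game in range(9):
        opp = opponents[(player + game) % 3]
        rows.append({"player": player, "opponent": opp,
                     "beta": true_means[opp] + player_offset + rng.normal(0.0, 0.1)})
df = pd.DataFrame(rows)

result = fit_opponent_model(df)
print(result.params)  # Intercept, opponent[T.oscillator], opponent[T.shadow], Group Var
```

The "Group Var" entry is the estimated between-player variance, i.e. the part of the unexplained variation attributed to players rather than to games within players.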

# **4. RESULTS**

Above we questioned the assumptions made in some previous analyses about the scale at which the dynamics of the perceptual crossing should be considered, and proposed instead that multi-scale analysis is better suited to unveiling the structure of social interaction. In this section we perform different tests to explore the possibility of multi-scale interactions shaping the dynamics within the perceptual crossing experiment. We start by analyzing our results with measures similar to those used in previous analyses and argue for the necessity of complementing them with other measures that are not constrained to one particular scale of behavior.

## **4.1. PRELIMINARY ANALYSIS**

Typically, analyses of the interaction dynamics in the perceptual crossing have not been concerned with the distribution of activity across different scales. For example, in Auvray et al. (2009) the two variables that explain the detection of another human player are the frequency of stimulation (the number of times a player receives an input from their opponent) and the probability of clicking (the probability of the player clicking their mouse in a 2 s interval after a stimulation). Our task differs in that the players are not asked to click when they recognize a human player. Instead, we replace the probability of clicking with the probability of a new stimulation occurring within a given window after a previous stimulation. This measure is intended to capture the probability of engagement in an ongoing interaction between the two players. Unlike Auvray et al. (2009), we do not use a single value for the window length and instead test the values 0.25, 0.5, 1, and 2 s (around 95% of stimulations happen within a window of 2 s after the previous stimulation). We refer to the frequency of stimulation as *F<sub>s</sub>* and to the probability of consecutive stimulations in a window of length *L* seconds as *P<sub>s</sub><sup>L</sup>*.
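Both measures can be computed from the onset times of stimulations in a round. The sketch below assumes a hypothetical boolean "contact" signal sampled every 1 ms (True while the player receives input from the opponent) and defines *P<sub>s</sub><sup>L</sup>* as the fraction of stimulations followed by another one within *L* seconds; the platform's exact definition of a stimulation event may differ.

```python
import numpy as np


def stimulation_onsets(contact, dt=0.001):
    """Onset times (s) of stimulations: False -> True transitions of a
    hypothetical boolean contact signal sampled every dt seconds."""
    contact = np.asarray(contact, dtype=bool)
    onsets = np.flatnonzero(contact[1:] & ~contact[:-1]) + 1
    if contact.size > 0 and contact[0]:
        onsets = np.concatenate(([0], onsets))   # contact already active at t = 0
    return onsets * dt


def stimulation_frequency(onsets):
    """F_s: number of stimulations received in the round."""
    return len(onsets)


def prob_consecutive(onsets, window_s):
    """P_s^L: fraction of stimulations followed by a new one within window_s."""
    intervals = np.diff(np.sort(onsets))
    if intervals.size == 0:
        return float("nan")
    return float(np.mean(intervals <= window_s))


# The window lengths tested in the text
WINDOWS_S = (0.25, 0.5, 1.0, 2.0)
```

Evaluating `prob_consecutive` over `WINDOWS_S` for the same round makes the dependence on *L* explicit: each window integrates all inter-stimulation intervals shorter than *L*.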

We conduct linear mixed-effects modeling of the series corresponding to each measure and obtain the results shown in **Table 1**. Here we show the *p*-values resulting from the comparison of the distributions corresponding to players playing against another human player and against each type of bot. We can observe in the table that the frequency of stimulation *F<sub>s</sub>* does not discriminate between the different types of opponents. This result differs from the classical perceptual crossing results, and may be caused by the fact that the participants play individually against each type of opponent. For the probability of consecutive stimulations *P<sub>s</sub><sup>L</sup>*, we observe that the result depends largely on the chosen value of *L*. For example, for the extreme values of 0.25 and 2 s we cannot discriminate human opponents from either of the two bots (setting the statistical significance level at 5%). Oscillator opponents, however, can be discriminated for windows of 0.5 and 1 s, and shadow opponents only for windows of 0.5 s. Thus, choosing a value of 0.5 s would give us a variable that allows us to statistically differentiate the different opponents, showing that at that particular scale some opponents have more consecutive stimulations with the player than others (in this case, the shadow agent presents a higher probability of consecutive stimulations).

To assess the significance of the statistical results without the bias of choosing particular windows of analysis, we compute the distribution of inter-stimulation intervals *t*, that is, the time between one stimulation and the next. However, since the data for each player and round are scarce (around 40 stimulations per game on average, depending on the type of agent), we aggregate the data of different players and rounds (although this could entail losing some information about the data structure). The resulting cumulative distribution can be observed in **Figure 1**, where the windows of discrimination in **Table 1** roughly coincide with the intervals in which the cumulative distribution functions overlap. This may indicate that when we just take the probability of stimulation (or clicking), we are integrating the density distribution of a process that unfolds over different scales (in our case ranging from 0.1 to 10 s).
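The pooling and the cumulative distribution can be sketched as follows, taking per-round lists of stimulation onset times (in seconds) as input. This is an illustrative helper, not the study's code; as noted above, pooling discards the player and round structure.

```python
import numpy as np


def pooled_intervals(onsets_per_round):
    """Pool inter-stimulation intervals t (s) across rounds and players."""
    parts = [np.diff(np.sort(np.asarray(o, dtype=float))) for o in onsets_per_round]
    parts = [p for p in parts if p.size > 0]
    return np.concatenate(parts) if parts else np.empty(0)


def ecdf(samples, grid):
    """Empirical cumulative distribution P(t <= g) evaluated at each g in grid."""
    s = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size


# Evaluate on a logarithmic grid spanning roughly 0.1 to 10 s, as in the text
grid = np.logspace(-1, 1, 50)
```

The value of the cumulative distribution at *L* is exactly the pooled analogue of *P<sub>s</sub><sup>L</sup>*, which is why a single window summarizes only one point of this curve.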

**Table 1 | Results of the linear mixed-effects models comparing stimulation frequency** *F<sub>s</sub>* **and probability of consecutive stimulations** *P<sub>s</sub><sup>L</sup>* **between the rounds where the player was facing another human player and the two cases of programmed agents (oscillatory and shadow agents).**


**participants and trials.** Values for the regions illustrated are: (dotted line) human vs. oscillatory agent, (dashed line) human vs. shadow agent, (solid line) both participants are human players.

Here we may ask whether the fact that the results for a particular window discriminate between agents is the consequence of something relevant happening at that timescale, or is instead caused by the different underlying structures of the temporal density distributions. To shed some light on this question, we have represented the aggregated probability density functions of the time between stimulations *t* for the three types of opponents (**Figure 2**). In the figure we can observe the presence of long tails that start around 0.5 s in the case of the shadow and human opponents, and that these long tails have different slopes in a logarithmic plot. This might indicate that the statistically significant differences in the activity between the different 0.5 s windows are not the result of something happening at that scale, but the product of a deeper change in the temporal structure of the interaction. In that case, the statistical difference at windows of 0.5 s may appear because we are integrating along all the smaller timescales.
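One way to compare the tails in Figure 2 is to histogram the pooled intervals on logarithmic bins and fit a straight line to log density against log *t* above 0.5 s. This is a hedged sketch: the bin count, the 0.1 to 10 s range, and least-squares fitting in log-log space are our choices, and such fits are only a rough estimator of tail exponents.

```python
import numpy as np


def log_binned_density(samples, t_min=0.1, t_max=10.0, n_bins=30):
    """Histogram density on logarithmic bins (normalized over [t_min, t_max])."""
    edges = np.logspace(np.log10(t_min), np.log10(t_max), n_bins + 1)
    counts, _ = np.histogram(samples, bins=edges)
    density = counts / (counts.sum() * np.diff(edges))
    centers = np.sqrt(edges[:-1] * edges[1:])       # geometric bin centres
    return centers, density


def tail_slope(samples, t_start=0.5, **kwargs):
    """Slope of log10(density) vs log10(t) for t >= t_start; empty bins skipped."""
    centers, density = log_binned_density(samples, **kwargs)
    mask = (centers >= t_start) & (density > 0)
    slope, _ = np.polyfit(np.log10(centers[mask]), np.log10(density[mask]), 1)
    return slope
```

For a power-law density p(*t*) ∝ *t*<sup>−γ</sup> the fitted slope approaches −γ, so different slopes for shadow and human opponents would show up as different values of this estimate.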

To illustrate this point we offer the following example (**Figure 3**): imagine a system in which we can access two components, *x*<sub>1</sub> and *x*<sub>2</sub>, each active at a different timescale. The same system may display different behaviors. We can imagine that by stimulating *x*<sub>2</sub> the system switches from behavior 1 to behavior 2.*a*. As a result of the behavior change, we can find statistical differences between the distributions of *x*<sub>2</sub> in behaviors 1 and 2.*a*. Alternatively, we can consider a different condition in which we enhance the influence of variable *x*<sub>1</sub> over *x*<sub>2</sub> (in a process of phase modulation), making the system switch from behavior 1 to behavior 2.*b*. Again, we find statistical differences between the distributions of *x*<sub>2</sub> in behaviors 1 and 2.*b*. The important point is that, while in the first case the statistical difference in *x*<sub>2</sub> is provoked by a direct change in the activity of this variable (directly stimulating the component that produces it), in the second case the statistical difference in *x*<sub>2</sub> can only be explained by a change in the interaction between variables *x*<sub>1</sub> and *x*<sub>2</sub>. Similarly, significant statistical changes at a timescale of 0.5 s might be the result of something relevant happening at that scale, or they may be the result of a reconfiguration of the whole temporal structure relating different scales of behavior.

The example in **Figure 3** indicates that by analyzing just one particular scale of the system we may fail to capture the causes of a change in the system's behavior, even if we are able to find a statistical discrimination of the distribution of a variable. In the case of the perceptual crossing, we propose that previous analyses may be extended with analyses of the activity at different scales and of the relations between these scales. We contend that taking into account the changes in the temporal structure of inter-stimulation times allows a fuller explanation of the statistical discrimination offered by simple indices such as the number of clicks or consecutive stimulations within a given window. Nevertheless, the analysis of the density distribution of aggregated data is too coarse to test this claim. We need a more detailed analysis of the temporal structure of the individual interaction dynamics in each round to provide more conclusive results. We propose that statistical analysis of fractal and multifractal time series may be a better-suited tool for this kind of problem.

**FIGURE 2 | Probability density function of the time between stimulations for different types of opponents aggregated among participants and trials.** Values for the regions illustrated are: **(A)** human vs. oscillatory agent ("vs. oscillator"), **(B)** human vs. shadow agent ("vs. shadow"), **(C)** both participants are human players ("vs. human").

## **4.2. FRACTAL DYNAMICS IN THE INTERACTION PROCESS**

In this section we seek a more detailed analysis of the temporal structure of the interaction between the two players for the three kinds of opponent. To do so, we need to extract the movements of the two players. We take the time series of the distance between the two players (or between the player and the bot agent) and (i) compute the first derivative of the distance in order to obtain the variations in the distance (whether the players are approaching or moving apart at each moment in time), given that we are interested in the coordinated movements of the players, not their positions; (ii) use the DFA and MFDFA algorithms to compute the structure of correlations across scales in the data series; and (iii) perform linear mixed-effects modeling in order to observe whether the DFA and MFDFA exponents are capable of differentiating between the interaction dynamics depending on the type of opponent the player is facing (oscillatory, shadow, or human).

As a first step in the analysis, we observe the results of individual DFAs in different rounds. In **Figure 4** we show some representative examples of the types of temporal structures displayed by the interactions with each type of agent. Since the slope of the fluctuations in a logarithmic plot is not always linear at all scales, we check whether there is any cutoff value at which the linear relationship is truncated. We do this by searching for negative peaks in the second derivative of *F*(*n*). The search for cutoff values is only performed in the right half of the *n* axis, in order to find only the cutoffs at larger scales. Once the cutoff value is found, we analyze the slope of *F*(*n*) for the values of *n* in the decade just below the cutoff value (e.g., **Figures 4A,B**). In the cases where there is no cutoff value (as in **Figure 4C**) we analyze the interval *n* ∈ [10<sup>−0.5</sup>, 10<sup>0.5</sup>].
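The cutoff procedure can be sketched on a DFA curve given as log<sub>10</sub> *n* and log<sub>10</sub> *F*(*n*). Two simplifications are assumptions on our part: only the most negative value of the numerical second derivative in the right half is considered, and it counts as a cutoff only if it falls below a hypothetical `threshold`, since the text does not state how "no cutoff" is decided. The returned slope is α; β follows as 2α − 1.

```python
import numpy as np


def fit_scaling_region(log_n, log_F, threshold=-0.5):
    """Return (slope, (lo, hi)) for the fitted region of a log-log DFA curve.

    log_n, log_F: log10 of segment length (s) and of F(n), ascending in log_n.
    threshold: hypothetical level below which a negative peak of the second
    derivative counts as a cutoff."""
    log_n = np.asarray(log_n, dtype=float)
    log_F = np.asarray(log_F, dtype=float)
    d2 = np.gradient(np.gradient(log_F, log_n), log_n)   # second derivative
    half = log_n.size // 2
    right = d2[half:]                     # search only the right half (large n)
    i = int(np.argmin(right))
    if right[i] < threshold:
        hi = log_n[half + i]              # cutoff
        lo = hi - 1.0                     # the decade just below the cutoff
    else:
        lo, hi = -0.5, 0.5                # default interval 10^-0.5 ... 10^0.5
    mask = (log_n >= lo) & (log_n <= hi)
    slope, _ = np.polyfit(log_n[mask], log_F[mask], 1)
    return slope, (lo, hi)
```

On a curve that rises with slope 1 and then flattens, the kink is picked up as the cutoff and the fit is restricted to the decade below it; on a straight line, the default interval is used.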

For the oscillatory agent, we can observe in **Figure 4A** a flat region at higher values of *n*, followed by a steep linear slope with a β parameter around 1.5. For lower values of *n* the linear slope disappears. This kind of fluctuation is characteristic of oscillatory dynamics, with the transition from flat to slope occurring at the period of the oscillations. In contrast, for the shadow agent, **Figure 4B** presents something similar to a linear slope in the middle of the fluctuation spectrum, although the linearity breaks down at the extremes. The slope of the fluctuation gives an exponent slightly higher than β = 0. This suggests that weak short-range correlations exist (close to a white noise structure), but they do not hold for longer timescales. Finally, in **Figure 4C**, when a player faces another human player, the fluctuation spectrum displays a linear slope with a β exponent close to that of pink noise (β = 1). In a large part of the series, the fractal slope reaches the largest timescales, showing that the correlations of the interaction dynamics cover a wide range of the spectrum. **Figure 4C** illustrates fractal relations covering the whole spectrum, although many other cases present a cutoff point at large scales that breaks the linear relation. We propose that the existence of fractal 1/*f* patterns covering the whole analyzed spectrum in only some cases of human-human interaction may be related to the fact that in some cases the interaction is successful throughout the round, whereas in other cases it breaks down, disrupting the correlations at longer timescales.

**FIGURE 4 | Fractal analysis calculated on interactive patterns between two participants.** Values for the regions illustrated are: **(A)** human vs. oscillatory agent ("vs. oscillator"), **(B)** human vs. shadow agent ("vs.

**Figure 5** shows the results for the three kinds of populations in the experiment. In particular, in **Figure 5A** we can observe the boxplots of β for the different types of interaction. When the opponent is the oscillatory agent, the values of β in the time series are around 1.5. This means that the interactions are closer to a brown noise structure, signifying that the interaction is more rigid and structured than in the other cases. This makes sense, since the movement of the oscillatory agent constrains the interactions into its cyclic movement structure. On the other hand, when the opponent is the shadow agent, we have the opposite situation: the interaction dynamics tend to display values of β slightly greater than 0, meaning that the history of interaction is less correlated. We assessed whether the three distributions of β differ using linear mixed-effects models of the three types of opponents (oscillatory agent, shadow agent, and human): β is a significant parameter for distinguishing the different kinds of interactions depending on the type of opponent [*F*(2, 93) = 258.350, *p* < 0.0001].

As mentioned above, fractal analysis is a mathematical procedure for determining scale-invariant structures in a dataset. Monofractal signals have the same scaling properties throughout the entire signal and can therefore be indexed by a single global exponent (the Hurst exponent; see Section 3.2). Alternatively, when spatial and temporal variations in a scale-invariant structure appear, we obtain a "multifractal structure" that can be decomposed into different subsets characterized by different local Hurst exponents (denoted *h*), which quantify the local scaling of the time series. This collection of exponents characterizes the scaling properties of the signal: the multifractal spectrum, denoted *D*(*h*), is represented as an arc [*D*(*h*) vs. *h*], and any deviation from the average fractal structure for segments with large and small fluctuations is captured by its width, defined as the difference between the maximum and minimum values of the local Hurst exponent. Thus, the width of this spectrum is a measure of the degree of multifractality and is zero for a monofractal series: the larger the width, the more multifractal the signal.
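Going from generalized exponents h(q) to the arc *D*(*h*) vs. *h* is commonly done with a Legendre transform: τ(q) = q·h(q) − 1, *h* = dτ/dq, *D*(*h*) = q·*h* − τ(q). The sketch below applies it numerically with `np.gradient` on the *q* grid, using second-order edges (so at least three *q* values are needed). It illustrates the standard transform and is not presented as the study's exact pipeline.

```python
import numpy as np


def singularity_spectrum(qs, hq):
    """Legendre transform of h(q) into the multifractal spectrum (h, D(h))."""
    qs = np.asarray(qs, dtype=float)
    hq = np.asarray(hq, dtype=float)
    tau = qs * hq - 1.0                       # mass exponent tau(q)
    h = np.gradient(tau, qs, edge_order=2)    # local (singularity) exponents
    D = qs * h - tau                          # spectrum D(h)
    return h, D


def spectrum_width(h):
    """Width of the arc: max(h) - min(h); zero for a monofractal series."""
    return float(np.max(h) - np.min(h))
```

A constant h(q) collapses the arc to a single point (width 0), while an h(q) that decreases with *q* spreads the local exponents over an interval whose length is the width.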

In order to verify the nonlinear intermittent structure of the processes behind the patterns analyzed above, we also analyze the width of the multifractal spectrum of the derivative of the distance between players. For each case, we calculate the width of the multifractal spectrum using the MFDFA algorithm and plot the distributions of the obtained values depending on the type of opponent (**Figure 5B**). The probability distribution of the multifractal spectrum width *h* for the oscillatory agent is concentrated around small widths, indicating little interaction between the timescale of the agent's oscillation frequency and the timescales of the movement of its human opponent. Larger values in the distribution for the shadow agent indicate stronger interaction between timescales. Finally, the distribution for the human opponent reaches the largest values of the multifractal spectrum width, suggesting rich timescale dynamics prompted by the interaction between the timescales of the movements of a pair of human players. Again, a linear mixed-effects model shows that the distributions of values of *h* differ depending on the type of opponent [*F*(2, 93) = 258.350, *p* < 0.0001].

The fractal and multifractal results show that the relative velocity of the player with respect to their opponent presents different distributions depending on whether genuine social interaction is happening or the player is interacting with an artificial agent with trivial (oscillatory) or complex (shadow) patterns of movement. Interestingly, 1/*f* noise emerges for a collective variable (the derivative of the distance) only in the case of human-human interaction, suggesting that long-range correlations emerge in the shared space of social interactions and that genuine social interaction is characterized by the collective evolution of the dyadic exchange. In those cases where the interaction between the players is too rigid or too weak, the emergent multiscale phenomenon disappears. The multifractal analysis seems to support this claim. To further test this proposal and determine whether the same results can be obtained from non-collective variables, we compare these results with the behavior of the individual variables of the player and their opponents.

## **4.3. COMPARING FRACTAL EXPONENTS IN INDIVIDUAL AND COLLECTIVE VARIABLES**

One of the ideas behind much of the work in the perceptual crossing paradigm is that the interaction between subjects is a constitutive element of social cognition (Auvray et al., 2009). If that is true, the characteristics of a genuine social interaction should appear only in dyadic variables such as the relative velocity between subjects and should be absent in individual variables such as the individual movement of the player or their opponent. To test to what extent this is true, we repeat the fractal and multifractal analysis above using the velocity of the player and the velocity of their opponent, instead of the relative velocity between the two. Thus, we can test whether the differences in the emergent fractal structure take place in the shared space of social interaction or are instead phenomena that may be accounted for by changes in individual dynamics alone.

In **Figure 6** we can see that in this case the boxplots of β and of the multifractal spectrum width *h* show more overlap among the distributions corresponding to the different opponents.

We tested this proposal using linear mixed-effects models of the three types of opponents (oscillatory agent, shadow agent, and human) to assess the presence of statistically significant differences between the density distributions of β (**Table 2**) and *h* (**Table 3**) for three different cases: (i) the relative velocity between the player and their opponent (labeled in the tables as the "interaction" case), (ii) the individual velocity of the player (labeled "player"), and (iii) the individual velocity of the opponent (labeled "opponent"). Both tables include the corresponding *p*-values resulting from the modeling.

Given the results shown in both tables and setting the significance level at 5%, we can conclude that only for the relative velocity between the agents ("interaction" columns) are all three distributions statistically significantly different for both β and *h*.

For the velocity of the player, we find no statistically significant difference between the distributions of β and *h*. For the velocity of the opponent, we find evidence of statistically significant differences only between the oscillatory agent and the other two kinds of opponents, but not between the human opponent and the shadow agent.

The obtained results show that individual variables are not suitable for discriminating the kind of interaction going on in the case of the shadow agent. This suggests that when the individual behaviors have some degree of complexity, what is relevant for the emergence of social interaction is what is going on in the interaction between the two subjects, not the complexity of their individual behaviors.

# **5. DISCUSSION**

In this paper we have revisited some of the results of the research program around the perceptual crossing paradigm. As we have seen, in recent years this paradigm has allowed the study of social interaction in its simplest form and has offered very interesting experimental results for understanding what kind of processes underlie the emergence of social engagement. In particular, we have addressed a new version of the experiment in which the player faces only one opponent per round, either a human player or an artificial agent that (i) shows an oscillatory movement or (ii) behaves as a temporal "shadow" of the player. After analyzing the different kinds of social engagement dynamics generated, we have found that a fractal 1/*f* structure (with high multifractal indices) at many timescales of the history of collective interactions only emerges in the case of genuine social interaction (i.e., the "human vs. human" case) and not in the other cases ("human vs. agent"). In this respect, our results offer a new interpretation of the results obtained in previous perceptual crossing experiments: there could be some limitations in the approach taken in previous analyses of the social engagement process, which have often been restricted to studying a single temporal scale and consequently fall short of capturing the complex unfolding of the different levels of cognitive and social interaction.

case when we analyze the velocity of the opponent. Values illustrated refer to interactions between: a human and an oscillatory agent ("vs. oscillator"), a human and a shadow agent ("vs. shadow") and two human participants ("vs. human").

**Table 2 | Results of the linear mixed-effects models comparing the fractal** *β* **exponent from DFA results between the rounds where the player was facing another human player and the two cases of programmed agents (oscillatory and shadow agents).**

*The left column (interaction) reflects the results when the relative velocity between the players is analyzed, the central column (player) shows the results when the velocity of the player is analyzed, and the right column (opponent) the velocity of the opponent.*

**Table 3 | Results of the linear mixed-effects models comparing the fractal** *h* **exponents from MFDFA results between the rounds where the player was facing another human player and the two cases of programmed agents (oscillatory and shadow agents).**

*The left column (interaction) reflects the results when the relative velocity between the players is analyzed. The central column (player) shows the results when the velocity of the player is analyzed and the right column (opponent) the velocity of the opponent.*

This interpretation suggests new directions on which to focus attention: given the results shown in this paper, it is possible that the emergence of social engagement does not depend solely on the stability of coregulative dynamics between two participants, as suggested in previous perceptual crossing experiments and simulations (Di Paolo et al., 2008; Auvray et al., 2009). Furthermore, the results obtained lead us to propose that genuine social engagement might be better characterized by a structure of cross-scale interactions, which we try to capture by analyzing fractal 1/*f* scaling and the multifractal spectrum. Moreover, unlike the relative velocity between the players, the fractal and multifractal exponents of the velocity of the player or of their opponent did not discriminate between all three types of opponent, leading us to conclude that the emergence of a 1/*f* structure in genuine social interaction happens only in the shared space between the two subjects and cannot be reduced to the individual dynamics of either of them.

However, this work leaves several questions unanswered. The first concerns what an adequate framework of analysis might be and how previous and new insights can be integrated into a larger framework. The framework presented here still needs to be extended, since 1/*f* scaling and the multifractal spectrum reduce the complexity of multiscale dynamics to a single exponent that, although it detects the presence of activity at different scales, falls short of characterizing the nature of cross-scale interactions in detail. Multiscale synchronization analysis, as employed to measure inter-brain synchronization in social tasks (Dumas et al., 2010; Müller et al., 2013), appears to be a suitable candidate for extending the present analysis to the multiscale synchronization of behavioral dynamics. More detailed analysis may also offer new points of connection with previous work and alternative explanations for the phenomena observed in the perceptual crossing experiment.
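As one illustration of what a scale-by-scale analysis of behavioral coupling could look like, the Python sketch below computes a detrended cross-correlation coefficient between two players' velocity series for each window size. This is a swapped-in technique chosen for simplicity, not the inter-brain synchronization method of Dumas et al. (2010) or Müller et al. (2013); the function name `rho_dcca`, the non-overlapping windows and the linear detrending are assumptions. Unlike a single scaling exponent, it returns one coupling value per timescale.

```python
import numpy as np

def rho_dcca(x, y, scales):
    """Detrended cross-correlation coefficient for each window size (sketch).

    Uses non-overlapping windows and linear detrending of the integrated
    signals. Values lie in [-1, 1]; values near +1 mean the two series
    co-fluctuate at that timescale, values near 0 mean no coupling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    px = np.cumsum(x - x.mean())   # integrated profiles
    py = np.cumsum(y - y.mean())
    rhos = []
    for s in scales:
        t = np.arange(s)
        f_xx = f_yy = f_xy = 0.0
        for v in range(len(px) // s):
            seg = slice(v * s, (v + 1) * s)
            rx = px[seg] - np.polyval(np.polyfit(t, px[seg], 1), t)
            ry = py[seg] - np.polyval(np.polyfit(t, py[seg], 1), t)
            f_xx += np.mean(rx * rx)   # detrended variance of x
            f_yy += np.mean(ry * ry)   # detrended variance of y
            f_xy += np.mean(rx * ry)   # detrended covariance
        rhos.append(f_xy / np.sqrt(f_xx * f_yy))
    return np.array(rhos)


# Usage on synthetic data: identical, inverted and independent signals.
rng = np.random.default_rng(1)
a = rng.standard_normal(2 ** 15)
b = rng.standard_normal(2 ** 15)
scales = [16, 32, 64]
rho_same = rho_dcca(a, a, scales)
rho_indep = rho_dcca(a, b, scales)
```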

Another way forward may lie in modifications of the perceptual crossing experiment, which may prove helpful in better understanding cross-scale interactions in minimal social interaction. Interesting advances following this approach include the work of Iizuka et al. (2013), which studies the emergence of a communication system between two participants using the perceptual crossing setup to collectively categorize different symbols. Also, Froese et al. (2014b) explore the evolution of the interaction of fixed pairs of players over several rounds, with the objective of working as a team to find each other, and observe that at some point the players simultaneously become aware of each other. This kind of extended experiment may allow the study of correlations at larger scales than just instantaneous online recognition, allowing us to analyze interesting dynamics such as learning, the development of shared patterns and the joint development of the players' mutual dynamical entanglement.

Finally, it could also be interesting to apply some of these ideas to the simulation domain. Some of the attempts to model agents that could perform the perceptual crossing task were based on agent vs. agent joint evolution using a genetic algorithm that maximized the number of interactions between the agents. We are concerned that minimalistic scenarios such as the perceptual crossing experiment may bias the co-evolution of agents toward simple behaviors that exploit only one scale of behavior to maximize the outcome (e.g., simple oscillatory behavior). Other evolutionary strategies could be used, for example, trying to evolve turn-taking behavior (Iizuka and Ikegami, 2004). Another interesting extension to tackle this problem could be to explore mixed environments shared by human and robotic agents, in order to allow a richer repertoire of dynamics that could be exploited for learning and tuning of the modeled agents.

## **ACKNOWLEDGMENTS**

This research has been partially supported by the project TIN2011-24660, funded by the Spanish Ministry of Science and Innovation, and the project FCT-13-7848, funded by the Spanish Foundation for Science and Technology. Miguel Aguilera holds an FPU predoctoral fellowship with reference AP-2010-6036. The authors would like to thank Xabier Barandiaran for comments that helped improve the manuscript, all the study participants for giving up their time and, finally, Altea Lorenzo and Dominic Duckett for their valuable assistance in language editing.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.01281/abstract

## **REFERENCES**


in *Self-Organized Biological Dynamics and Nonlinear Control*, ed J. Walleczek (Cambridge: Cambridge University Press), 66–96. doi: 10.1017/CBO9780511535338.006

Reddy, V. (2008). *How Infants Know Minds.* Cambridge: Harvard University Press.

Van Orden, G. C., Holden, J. G., and Turvey, M. T. (2003). Self-organization of cognitive performance. *J. Exp. Psychol. Gen.* 132, 331–350. doi: 10.1037/0096-3445.132.3.331


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 19 May 2014; accepted: 22 October 2014; published online: 12 November 2014.*

*Citation: Bedia MG, Aguilera M, Gómez T, Larrode DG and Seron F (2014) Quantifying long-range correlations and 1/f patterns in a minimal experiment of social interaction. Front. Psychol. 5:1281. doi: 10.3389/fpsyg.2014.01281*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Bedia, Aguilera, Gómez, Larrode and Seron. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Assessing embodied interpersonal emotion regulation in somatic symptom disorders: a case study

# *Zeynep Okur Güney<sup>1,2</sup>\*, Heribert Sattel<sup>1</sup>, Daniela Cardone<sup>3,4</sup> and Arcangelo Merla<sup>3,4</sup>*

<sup>1</sup> Department of Psychosomatic Medicine and Psychotherapy, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany


#### *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Timo Partonen, National Institute for Health and Welfare, Finland

Patrick Luyten, University of Leuven, Belgium

Claudia Subic-Wrana, University Medical Center of Johannes Gutenberg University Mainz, Germany

#### *\*Correspondence:*

Zeynep Okur Güney, Department of Psychosomatic Medicine and Psychotherapy, Klinikum rechts der Isar, Technical University of Munich, Langerstraße 3/1, 81675 Munich, Germany; e-mail: z.okur@tum.de

The aim of the present study was to examine the intra- and interpersonal emotion regulation of patients with somatic symptom disorders (SSDs) during interactions with significant others (i.e., romantic partners). We present two case couples for analysis. The first couple consisted of a patient with SSD and his healthy partner, whereas the second couple consisted of two healthy partners. The couples underwent an interpersonal experiment that involved baseline, anger and relaxation tasks. During each task, the partners' cutaneous facial temperature, heart rate and skin conductance levels were measured simultaneously. Participants' trait emotion regulation, state-affect reports for self and other, and attachment styles were also examined. The experimental phases were successful in creating variations in physiological processes and affective experience. As expected, emotion regulation difficulties predicted a greater temperature increase over the course of each phase. In addition, the patient showed restricted awareness of and reflection on emotions despite higher autonomic activity than the healthy controls. Both partners of the first couple showed a limited ability to understand the other's emotions, whereas the second couple performed relatively better in that domain. The temperature variations of the patient and his partner were significantly correlated, while the correlations between the temperature changes of the second couple were negligible except during the anger task. The study supports the merits of an embodied interpersonal approach in clinical studies. The tentative results of the cases are discussed in the light of findings in emotion regulation and attachment research.

## **Keywords: emotion regulation, somatic symptom disorders, interpersonal interactions, embodiment, anger, relaxation**

# **INTRODUCTION**

Somatic symptom disorder (SSD) is characterized by persistent somatic disturbances that cause severe impairment in patients' daily lives (DSM-5). The disturbances are accompanied by excessive and dysfunctional thoughts, affects, behaviors or health concerns. Psychological factors contribute to the development, course and treatment of these disorders (Henningsen et al., 2003; Sattel et al., 2012). The overlap of multiple somatic symptoms, comorbidity with psychiatric and psychosocial disturbances, the absence of clear diagnoses and ineffective treatments make SSD both difficult to treat and costly for society (Wessely et al., 1999; Henningsen et al., 2007). Such an overlap of multiple physical and psychological symptoms renders SSD neither purely physical nor purely mental but truly psychosomatic (Wessely et al., 1999; Henningsen et al., 2007).

An increasing number of studies highlight the presence of emotion regulation disturbances in SSD, such as emotion suppression (Burns et al., 2011; Gul and Ahmad, 2014), rumination, catastrophizing (Hadjistavropoulos and Craig, 1994; Garland et al., 2011), a decreased ability to up-regulate positive emotions (Zautra et al., 2001), imbalance in physiological arousal (Pollatos et al., 2011a,b), and diminished emotional awareness (Waller and Scheidt, 2004; Subic-Wrana et al., 2010) and emotion recognition (Beck et al., 2013). In addition, difficult transference and counter-transference in psychotherapy, related to patients' resistance to experiencing emotions, have been reported (Yasky et al., 2013).

## **COHERENCE BETWEEN EMOTION RESPONSE SYSTEMS IN SSD**

Theories that explain the nature and development of SSD emphasize the role of emotion regulation disturbances (see Waller and Scheidt, 2006 for a thorough review of the theoretical models). For example, early psychodynamic theories depicted somatic symptoms as defenses against unconscious, unresolved affective conflicts (Freud, 1961). Alexander (1950), "one of the founders of psychosomatic medicine," posited that affect-related physiological arousal that is not translated into action is, over time, experienced as disturbing physiological states. Deficits in symbolic affect representation, such as limited emotional awareness and a limited ability to reflect on and describe emotions (i.e., alexithymia), were identified as typical of SSD (Sifneos, 1973; De Gucht and Heiser, 2003; Subic-Wrana et al., 2010). Similarly, an impaired integration of symbolic (language, imagery) and subsymbolic emotion schemas (sensory, somatic, and motoric forms) was proposed to characterize SSD (Bucci, 1997). Attachment theories also point to the disequilibrium among stress-regulating networks associated with an insecure attachment style (Luyten et al., 2012). It is posited that, having internalized certain dysfunctional attachment patterns and regulation strategies, patients with SSD tend to employ these strategies to regulate stress later in life. This may lead to an imbalance between stress response networks, which is associated with impairments in patients' capacity for embodied mentalization (i.e., understanding one's own and others' feelings and intentions, and linking these internal processes with the body; Luyten et al., 2012).

<sup>2</sup> Department of Psychology, University of Kassel, Kassel, Germany

The theoretical models mentioned above, as well as existing empirical research, imply a pattern of emotion regulation in SSD that is characterized by incoherence between emotion constituents. Supporting the postulate of incoherent emotional processing, a systematic review on emotion regulation in SSD (Okur et al., in revision) revealed that patients with SSD tend to detach from the emotion by disengaging the cognitive-behavioral components of emotion from the emotional perturbations. For instance, patients were shown to have higher levels of alexithymia and a reduced ability in emotion recognition and affective theory of mind (Subic-Wrana et al., 2010; Beck et al., 2013; Castelli et al., 2013; Haas et al., 2013; Stonington et al., 2013). On the other hand, the few available studies that examined somatic components of emotions demonstrated aberrant or vigilant somatic reactivity, such as greater startle responses, paraspinal muscle reactivity, sympathetic activity or stress sensitivity in SSD (Seignourel et al., 2007; Burns et al., 2008; Twiss et al., 2009; Luyten et al., 2011; Pollatos et al., 2011a,b).

Emotion theories generally agree that the emotion response system has multiple components that coordinate with each other (Hollenstein and Lanteigne, 2014). The concordance among these physiological, behavioral and experiential response systems, which facilitates adaptive and coordinated responses as the emotion unfolds over time, is described as emotional coherence (Mauss et al., 2005). Although almost all emotion theories assume some degree of coherence between the emotion response systems, empirical studies show quite mixed findings (Mauss et al., 2005; Hollenstein and Lanteigne, 2014). Recently, several theoretical and methodological issues related to emotional coherence were addressed in a special issue (Hollenstein and Lanteigne, 2014). It was argued that the inconsistent findings might be related to methodological errors, such as non-corresponding timing of measures, or to individual differences, such as emotion regulation, that obscure concordance (Mauss et al., 2005; Butler et al., 2014; Hollenstein and Lanteigne, 2014). When precautions were taken against these errors, moderate to high coherence could be shown.

We also argue that, since the emotional process is a continuous, inseparable regulating and regulated system (Davidson, 1998; Kappas, 2011), a person's own emotion regulation patterns constantly influence emotional coherence. Therefore, the level of coherence is likely to vary between people with distinct patterns of emotion regulation, as would be the case in patients with SSD. In fact, some studies support the effect of emotion regulation on coherence. For example, a study comparing participants with different levels of body awareness showed that experienced Vipassana meditators (awareness of visceral sensations) had the highest coherence between physiological changes and subjective experience, followed by experienced dancers (awareness of somatic sensations) and then by controls with no experience of bodily exercises (Sze et al., 2010). Deliberately employed emotion regulation strategies affect coherence as well. Emotion suppression was found to decrease the coherence between physiological, behavioral and experiential subsystems, whereas acceptance of emotions was not (Dan-Glauser and Gross, 2013). Lending further support to these findings, reappraisal was reported to increase concordance for positive emotions but to decrease it for negative ones (Butler et al., 2014).

These findings illustrate the potential effects of emotion regulation on the concordance between emotion response systems. The emotion regulation patterns that patients with SSD unconsciously or deliberately deploy might affect the coherence between emotional constituents. Hence, in the light of the literature on emotion regulation in SSD, we hypothesize that incoherence in the emotional process characterizes the regulation patterns of patients with SSD, and that this incoherence is moderated by attachment and trait emotion regulation styles. This incoherent process is characterized by the disengagement of cognitive components from the emotional perturbations, together with greater physiological stress responses marked by higher activity or vigilance in the somatic components of emotion. Our proposed assessment of intrapersonal emotional incoherence relies on the extent of discrepancy between emotional responses, manifested as restricted expression of and reflection on emotions. Simultaneously, we expect an aberrant and reactive sympathetic nervous system response.

## **INTERPERSONAL REGULATION OF EMOTIONS IN SSD**

Interpersonal factors, which are proposed to play a role in the development of emotion regulation disturbances in SSD, continue to trigger and maintain the psychosomatic symptoms later in life. There is considerable consensus on the role of interpersonal interactions, attachment and trauma history in the dysregulated affect of SSD, which is linked to alterations in the endocrine, immune, and pain-regulating systems (Henningsen, 2003; Luyten et al., 2013). Lending support to this linkage, a shared neural system for social pain (such as rejection, exclusion or loss) and physical pain has been acknowledged (Kross et al., 2011; Eisenberger, 2012; Landa et al., 2012).

In the developmental history of SSD, an "emotional avoidance culture" with significant adults has been described, which was associated with patients' disconnection of awareness from stress reactions in the body (Bondo-Lind et al., 2014). In addition, an insecure attachment history and related impairments in interpersonal emotion regulation between caregiver and child, such as non-expression of emotions, are commonly reported in SSD (Waller and Scheidt, 2006). Later in life, patients with SSD were reported to regulate stress through deactivating or hyperactivating attachment strategies that have adverse metabolic and interpersonal consequences (Luyten et al., 2012). For example, denial of attachment needs (Luyten et al., 2012), minimization of affective experience or expression (Waller and Scheidt, 2004) or impaired embodied mentalization (Luyten et al., 2012), as well as overexpression of negative affect with respect to bodily complaints and clinging behavior (Waller and Scheidt, 2006), can govern the patients' interpersonal interactions. These dysfunctional strategies in turn generate a vicious cycle in which further interpersonal distress exacerbates the symptoms, producing further stress and symptoms (Luyten et al., 2012). Such regulation strategies can be linked to incoherence among emotion response systems. For example, in subjects with high avoidant attachment, a discordance between psychological and endocrine stress measures was found, whereas in subjects with low avoidant attachment these measures were significantly correlated (Ditzen et al., 2008).

Although some studies have examined perceived social interactions with significant others in SSD, there is scarce literature on how ongoing affect is co-regulated during patients' interactions with significant others. Self-report studies show a less supportive and cohesive family environment, conflicts in the marital relationship (Mullins and Olson, 1990), frustration and helplessness of physicians, and rejecting behavior from significant others (Stuart and Noyes, 1999). A few available studies focusing on the dynamic interaction between couples have shown that interpersonal emotion regulation, such as validation or invalidation of a partner's affective experience, plays a predictive role in the experience of pain (Cano et al., 2008; Leong et al., 2011). A psychotherapy study has also demonstrated how the affective experiences of patients and therapists influence each other, resulting in an increased expression of negative affect (Merten and Brunnhuber, 2004). To our knowledge, no previous study has examined the dynamic coordination of the physiological, experiential, and behavioral emotion response systems of patients with SSD and their interaction partners. In this study, we aim to fill this gap by examining the relationship between affective experience, autonomic activity and trait emotion regulation in both interacting partners. Such a paradigm would help bring psychosomatic research together with an embodied, dynamic and interpersonal approach.

We believe that studies from social cognition and developmental research on intersubjectivity can provide much insight to clinical research by introducing the constitutive aspects of social interaction, such as coordination or reciprocity. In fact, it has been highlighted that the process of social interaction cannot be sufficiently grasped by examining the mere static interaction of individual elements, since social interactions possess dynamic features such as self-organization and autonomy (Di Paolo and De Jaegher, 2012). In line with such developments in social cognition, emotion regulation research has incorporated dynamic parameters of interpersonal interactions, such as emotional contagion, reciprocity, coupling, synchronicity or co-regulation, which describe the temporal emotional exchange and covariation between persons (Butler, 2011). These aspects can also uncover implicit emotion regulation patterns, which are described as processes operating free of conscious supervision (Koole and Rothermund, 2011).

In the context of these recent developments in social cognition and emotion research, we ask how affect dysregulation takes place in the interactions of patients with SSD with significant others (i.e., a romantic/life partner). We propose that: (1) Intrapersonal emotional incoherence in SSD is more likely to be reciprocated by emotional incoherence in the interaction partner. This may leave the affective exchange dysregulated and generate a system of incoherent interpersonal emotional processing. This persisting dysregulated affect at the intra- and interpersonal levels might exacerbate bodily disturbances. Here, we define interpersonal emotional coherence as the correlation between the interaction partners' physiological and subjective affective response systems. (2) The parameters of autonomic nervous system activity will be less concordant during emotional interactions in dyads with SSD than in healthy control dyads. This concordance will be moderated by the attachment and trait emotion regulation styles of the partners.
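The definition of interpersonal emotional coherence above can be illustrated with a minimal Python sketch, assuming both partners' signals (e.g., nose temperature) have already been aligned to a common time base. The helper `interpersonal_coherence` is hypothetical and is not the study's analysis code. The optional first-differencing is an added design choice: two series that merely drift in the same direction (for instance, both warming over a session) can show a high raw correlation even when their moment-to-moment fluctuations are unrelated.

```python
import numpy as np

def interpersonal_coherence(a, b, difference=False):
    """Pearson correlation between two partners' time series (hypothetical helper).

    a, b       : equally sampled, time-aligned series of the same length
    difference : if True, correlate first differences, which reduces
                 correlations driven only by shared slow drifts
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("series must be time-aligned and equally long")
    if difference:
        a, b = np.diff(a), np.diff(b)
    return float(np.corrcoef(a, b)[0, 1])


# Usage: two synthetic series sharing only a slow linear drift.
rng = np.random.default_rng(2)
t = np.arange(600)
drift = 0.01 * t                                 # shared slow warming trend
partner_a = drift + rng.normal(0.0, 0.05, t.size)
partner_b = drift + rng.normal(0.0, 0.05, t.size)
r_raw = interpersonal_coherence(partner_a, partner_b)
r_diff = interpersonal_coherence(partner_a, partner_b, difference=True)
```

In this synthetic case the raw correlation is close to 1 because of the shared drift, while the differenced correlation is close to 0, since the fluctuations around the drift are independent.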

We deemed it necessary to employ a paradigm involving real-time dyadic emotional interaction tasks (i.e., dyadic stress interview paradigms) that allow for the measurement of temporal affective exchange between persons. A baseline interpersonal task without an emotional manipulation would enable comparison between different affective states as well as the acclimation of the participants to the experiment. Subsequently, an emotional interaction task that elicits a high level of arousal and negative valence, followed by a relaxation task low in arousal and positive in valence, would permit us to examine the down- and up-regulation of emotions.

Concerning participants, comparing dyads consisting of a patient and a healthy partner with dyads of two healthy partners would help elucidate the affective interaction patterns that may exacerbate the symptoms, such as the reciprocal nature of dysregulated affect. In order to provide homogeneity in the samples of forthcoming studies, we focused on a specific group of SSD: somatoform pain disorder.

Anger has been reported as both a particular predictor and an outcome of chronic pain (Fernandez and Turk, 1995; Burns et al., 2008; van Middendorp et al., 2010). Patients' appraisals of their chronic experience of pain, together with persistent treatment failures and not being heard by significant others and health professionals, generate habitual anger in patients (Fernandez and Turk, 1995). Furthermore, high trait anger, as well as anger suppression, was shown to exacerbate pain through activation of the endocrine and muscular systems of the body (Bruehl et al., 2007; Burns et al., 2011). Therefore, anger was chosen as the central theme of the dyadic interaction task. In addition, dysfunctional regulation of positive affect was reported to be a distinctive feature of somatoform pain as opposed to "medically explained" pain (Zautra et al., 2001, 2005). In line with these findings, we aimed to examine the interplay of both the down-regulation of anger and the up-regulation of positive affect in somatoform pain patients during interpersonal interactions. In order to activate attachment styles that would elicit characteristic emotion regulation patterns, the interaction partner was chosen to be a significant other of the patient (i.e., the romantic partner). In order to measure emotional coherence and affective exchange, multiple components of emotion were assessed, namely state and trait subjective reports of emotion regulation as well as autonomic nervous system measures. Below, we present the two case studies that we conducted employing our proposed paradigm.

## **MATERIALS AND METHODS**

## **PARTICIPANTS**

This study was approved by the Ethics Commission of the Faculty of Medicine of the Technical University of Munich (TUM). The first couple invited to participate in the study consisted of one patient and her partner. The patient was admitted to the Department of Psychosomatic Medicine at TUM and fulfilled the diagnostic criteria of persistent somatoform pain disorder (ICD-10 F45.40). As a comparison case, a healthy control couple, found through the internal communication network of TUM, was also recruited. The partners of the first couple were between 40 and 50 years old and those of the second couple between 30 and 40 years old.

## **PROCEDURE**

The experiment appointments were arranged in telephone interviews, in which participants were screened for any medical or psychological disturbance, as well as for the use of painkillers or any other medication, particularly for control purposes. Participants were asked not to consume any stimulants (e.g., coffee, tea, nicotine) within 2 h prior to testing. Upon arrival at the laboratory, couples were given an oral and written briefing about the experiment, and informed consent was obtained. All participants were screened for medical and psychological health status, use of medication, pain or any treatment received with an anamnestic questionnaire. The control couple and the partner of the patient did not report any health-related disturbances. Following the demographic and health screening, both partners filled in questionnaires on emotion regulation and pain experience. Thereafter, participants were invited to the experiment room and prepared for the physiological measurement. Couples underwent three experimental phases, each composed of interactions with their partners. A trained interviewer, who was blind to the study hypotheses, led the couple interactions. Throughout all three phases of the experiment, both partners were video recorded and their physiological responses were measured. Immediately after each phase, participants reported their emotional experience and their perceptions of their partner's emotional experience. In addition, after the dyadic anger induction task, participants were given questionnaires to assess attachment styles and state-anger experience (see **Figure 1** for a schematic plan of the study process).

## *Emotion induction tasks*

*Baseline.* For the baseline assessment, the interviewer facilitated a 5–7 min dialog between the partners about an emotionally neutral event, such as the trip to the lab, the events of the day or the weather, as suggested by previous studies (Gottman and Levenson, 1999).

*Real-time dyadic interaction phase for anger induction.* Compared to other methods such as movie clips or punishment tasks, interview methods have been shown to be more effective in eliciting emotions and creating physiological variations (Lobbestael et al., 2008). Furthermore, in comparison with other methods, such as showing participants pictures or videos, autobiographical recall and reliving past experiences are more effective in eliciting emotions, particularly because they are self-relevant (Ellsworth and Scherer, 2003; Kross et al., 2009). Therefore, the interview method was used to elicit a dynamic emergence of anger in the couples. For this task, the couples were instructed to identify a mutual past event that had generated a strong feeling of anger and that they could recall well. One of the partners was instructed to recall and verbally describe the event. Then, both partners were invited to talk about the event, the nature of the stressor, and their thoughts and feelings as genuinely as possible (see Dimsdale et al., 1988). Both couples chose a topic of conflict between them, which opened up further discussions during the conversation. The interviews lasted between 15 and 20 min.

*Relaxation phase.* After the anger induction task, participants were instructed to release themselves from the negative state by following an audio-guided progressive muscle relaxation and imagination exercise that lasted ∼12 min.
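Comparing the baseline, anger and relaxation phases typically involves summarizing each physiological signal per phase relative to baseline. The following Python sketch shows one way of doing so; the function name `phase_changes`, the dictionary of phase boundaries and the use of simple phase means are illustrative assumptions rather than the procedure reported by the authors.

```python
import numpy as np

def phase_changes(times, values, phases, baseline="baseline"):
    """Mean of a signal in each phase, expressed relative to the baseline mean.

    times    : time stamps in seconds
    values   : signal values at those time stamps
    phases   : dict mapping phase name -> (start_s, stop_s); hypothetical
               boundaries that would come from the session protocol
    baseline : name of the reference phase
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    means = {}
    for name, (start, stop) in phases.items():
        mask = (times >= start) & (times < stop)   # half-open interval [start, stop)
        if not mask.any():
            raise ValueError(f"no samples in phase {name!r}")
        means[name] = float(values[mask].mean())
    ref = means[baseline]
    return {name: m - ref for name, m in means.items()}


# Usage with a toy 1 Hz signal and three 10 s phases.
t = np.arange(30.0)
signal = np.concatenate([np.zeros(10), np.full(10, 2.0), np.full(10, -1.0)])
bounds = {"baseline": (0, 10), "anger": (10, 20), "relaxation": (20, 30)}
changes = phase_changes(t, signal, bounds)
```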

## **MEASURES**

## *Physiological recordings*

Continuous thermal imaging recordings of the face and measures of heart rate (HR) and electrodermal activity were taken from both partners simultaneously throughout the baseline, anger, and relaxation phases.

*Thermal imaging.* Thermal imaging is a contact-free method for measuring autonomic activity, manifested as variations in cutaneous temperature, by recording thermal infrared signals. This method has been shown to be a non-invasive and robust way of measuring autonomic activity during emotional interactions (Ebisch et al., 2012; Ioannou et al., 2014). Thermal imaging was performed using two digital cameras: a FLIR SC660 (640 × 480 bolometer FPA, sensitivity < 30 mK at 30°C) and a FLIR SC655 (640 × 480 bolometer FPA, sensitivity < 50 mK at 30°C). The cameras were positioned behind and just above the head of each partner, so that each camera recorded the partner opposite. The sampling rate was 5 frames/s. Variations in the cutaneous temperature of the facial regions of interest were analyzed using customized Matlab programs (http://www.mathworks.com). Our primary regions of interest, the nose and the forehead, were selected based on previous studies in primates and humans (Merla and Romani, 2007). After the thermal recordings were visually inspected for recording quality, the thermograms were corrected for movement artifacts.
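The extraction of a regional temperature signal from the thermal video can be sketched as follows, in Python rather than the authors' customized Matlab programs. The rectangular region of interest, its coordinates and the function name `roi_mean_series` are hypothetical; the sketch assumes the frames have already been corrected for movement, so that a fixed box stays on the same facial region (e.g., the nose tip) throughout. Only the 5 frames/s sampling rate is taken from the text.

```python
import numpy as np

FPS = 5.0  # thermal sampling rate reported in the text (frames per second)

def roi_mean_series(frames, roi, fps=FPS):
    """Mean temperature inside a rectangular ROI for every frame (sketch).

    frames : array of shape (n_frames, height, width), temperatures in °C,
             assumed already motion-corrected
    roi    : (row_start, row_stop, col_start, col_stop), a hypothetical fixed box
    fps    : sampling rate used to build the time axis
    """
    frames = np.asarray(frames, dtype=float)
    r0, r1, c0, c1 = roi
    series = frames[:, r0:r1, c0:c1].mean(axis=(1, 2))   # one value per frame
    times = np.arange(series.size) / fps                 # seconds since onset
    return times, series


# Usage with 10 synthetic frames whose "nose" box warms by 0.1 °C per frame.
frames = np.full((10, 48, 64), 34.0)
frames[:, 10:20, 20:30] = 35.0 + 0.1 * np.arange(10)[:, None, None]
times, nose = roi_mean_series(frames, roi=(10, 20, 20, 30))
```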

*Heart rate.* Heart rate was assessed with a continuous electrocardiogram (ECG) recorded with Nexus-10 equipment (Biotrace, Mind Media BV). Signals were sampled at 256 Hz and analyzed with the computer-based Biotrace software system. A three-electrode array was used for each partner, so that the HR of both partners was recorded simultaneously. One electrode was placed on the left and another on the right shoulder of the participant. The third electrode was placed on the left side, below the lead on the left shoulder, under the 10th rib. Before the electrodes were placed, the skin was cleaned to improve signal quality. Data acquisition started once the signal had stabilized. After data collection, the ECG traces were visually inspected for possible movement artifacts, and no abnormalities were detected in any participant.
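Converting the ECG into a heart-rate series can be sketched in Python as below, assuming R peaks have already been detected (for instance by the acquisition software) and are available as sample indices; peak detection itself is not shown. The function name `heart_rate_from_r_peaks` is hypothetical, and only the 256 Hz sampling rate comes from the text.

```python
import numpy as np

FS_ECG = 256.0  # ECG sampling rate reported in the text (Hz)

def heart_rate_from_r_peaks(r_peak_samples, fs=FS_ECG):
    """Instantaneous heart rate (beats/min) from R-peak sample indices.

    Returns (times, hr): each HR value belongs to one R-R interval and is
    time-stamped at the R peak that ends that interval.
    """
    r = np.asarray(r_peak_samples, dtype=float)
    if r.size < 2 or np.any(np.diff(r) <= 0):
        raise ValueError("need at least two strictly increasing R-peak indices")
    rr = np.diff(r) / fs          # R-R intervals in seconds
    return r[1:] / fs, 60.0 / rr  # time stamps (s) and beats per minute


# Usage: two 1 s intervals (60 bpm) followed by two 0.75 s intervals (80 bpm).
t_hr, hr = heart_rate_from_r_peaks([0, 256, 512, 704, 896])
```

The resulting series is irregularly sampled, so it has to be interpolated or binned before it can be compared sample by sample with the other signals.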

*Skin conductance level (SCL).* Skin conductance level was recorded using the Nexus-10 device of the Biotrace system, following standard published guidelines (Boucsein, 2012). Velcro-strap electrodes were attached to the second and third fingers of the participants' non-dominant hand. Before the electrodes were placed, the skin was scrubbed to improve signal quality. Once the signal had stabilized, data were recorded at a sampling rate of 32 Hz.
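Because the thermal (5 Hz) and skin conductance (32 Hz) signals are sampled at different rates, correlating them within or between partners requires a common time base. A minimal Python sketch averages each uniformly sampled signal into one-second bins; the bin width, the function name `bin_average` and the choice to drop an incomplete final bin are assumptions. Heart rate derived from R peaks is irregularly sampled and would need interpolation or event-based binning instead, which is not shown.

```python
import numpy as np

def bin_average(signal, fs, bin_seconds=1.0):
    """Average a uniformly sampled signal into consecutive, equal time bins.

    signal      : 1-D sequence sampled at fs Hz
    fs          : sampling rate in Hz (e.g., 5 for thermal, 32 for SCL)
    bin_seconds : bin width; fs * bin_seconds is assumed to be (close to) an
                  integer number of samples. An incomplete final bin is dropped.
    """
    samples_per_bin = int(round(fs * bin_seconds))
    if samples_per_bin < 1:
        raise ValueError("bin is shorter than one sample")
    x = np.asarray(signal, dtype=float)
    n_bins = x.size // samples_per_bin
    trimmed = x[: n_bins * samples_per_bin]
    return trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)


# Usage: 2 s of a 5 Hz signal and 60 s of a 32 Hz signal on a 1 s grid.
thermal = np.arange(10.0)
scl = np.ones(32 * 60)
thermal_1s = bin_average(thermal, fs=5)
scl_1s = bin_average(scl, fs=32)
```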

## *Subjective reports*

Before the experiment, participants' trait emotion regulation patterns were assessed with measures of emotional awareness (Level of Emotional Awareness Scale, LEAS; Subic-Wrana et al., 2001), alexithymia (Toronto Alexithymia Scale-20, TAS-20; Bach et al., 1996), and anger regulation (Spielberger State-Trait Anger Expression Inventory, STAXI; Schwenkmezger et al., 1992). The TAS-20 is the most commonly used self-report measure of alexithymia and distinguishes three areas of emotion regulation difficulty: difficulty identifying feelings, difficulty describing feelings, and externally oriented thinking. Although it is the best-validated instrument for alexithymia, its use may be biased because it paradoxically relies on patients' insight into their own capacity for emotional self-reflection (Waller and Scheidt, 2004). The LEAS, by contrast, is a performance-based instrument consisting of twenty emotion-eliciting scenarios, for each of which participants describe how they and the other person in the scene would feel (Lane et al., 1990). The advantage of this scale is that it allows both conscious and subconscious levels of awareness of one's own (LEAS-self) and others' (LEAS-other) emotions to be assessed (Subic-Wrana et al., 2014). The LEAS has been shown to be related to the capacity for mentalization, that is, the ability to interpret one's own and others' feelings, thoughts, and intentions (Subic-Wrana et al., 2010).
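The TAS-20 cut-off classification and the per-scenario LEAS mean used in the case descriptions can be sketched as follows. The cut-offs (61 or above: alexithymic; 52 to 60: possible alexithymia; 51 or below: non-alexithymic) are the commonly used TAS-20 thresholds and are an assumption here, because the text does not state which values were applied. The LEAS mean simply divides the sum score by the twenty scenarios, which reproduces Mr 1A's reported values (sum = 47, *M* = 2.35). Function names are hypothetical.

```python
LEAS_SCENARIOS = 20  # number of LEAS scenarios stated in the text


def tas20_category(raw_score):
    """Classify a TAS-20 total raw score (possible range 20-100).

    Assumed cut-offs: >= 61 alexithymic, 52-60 possible, <= 51 non-alexithymic.
    """
    if not 20 <= raw_score <= 100:
        raise ValueError("TAS-20 raw scores range from 20 to 100")
    if raw_score >= 61:
        return "alexithymic"
    if raw_score >= 52:
        return "possible alexithymia"
    return "non-alexithymic"


def leas_mean(sum_score, n_scenarios=LEAS_SCENARIOS):
    """Mean LEAS score per scenario."""
    return sum_score / n_scenarios


# TAS-20 raw scores reported in the case analyses.
for person, score in {"Mr 1A": 65, "Ms 1B": 37, "Mr 2A": 44, "Ms 2B": 38}.items():
    print(person, tas20_category(score))
print(leas_mean(47))  # 2.35, the mean reported for Mr 1A
```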

Participants' experience of pain intensity and pain sensations was examined with the Brief Pain Inventory (BPI; Radbruch et al., 1999) and the Pain Experience Scale (SES; Geissner, 1995). To assess participants' affective experience and pain during each experimental phase, participants completed a pain and affect rating immediately after each phase. This rating consisted of a visual analog scale for pain and a non-verbal, pictorial scale that assesses the pleasure, arousal, and dominance dimensions of affective experience (Self-Assessment Manikin, SAM; Fischer et al., 2002). SAM has been proposed as a quick and relatively implicit way of measuring affective experience, particularly because it is a non-verbal, cartoon-like graphical assessment of affect (Fischer et al., 2002). It uses a nine-point scale for pleasure (unhappy to happy), arousal (calm to excited), and dominance (controlled to controlling). Arousal describes perceived vigilance as a psychological and physical state, pleasure describes positive or negative feelings, and dominance describes how much a person feels in control of a situation. In addition to SAM, immediately after the anger task participants completed the state anger subscale of the STAXI and the Experiences in Close Relationships-Revised scale (ECR-R; Ehrenthal et al., 2009). The ECR-R is a validated self-report instrument that assesses attachment anxiety and attachment avoidance in adults (Fraley et al., 2000). Validated German translations were used for all scales.
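Because participants rated both their own affect and their partner's affect (see **Table 5**), the SAM ratings lend themselves to a simple appraisal index: the difference between how one partner rates the other and how that partner rates themselves. The sketch below assumes ratings coded 1 to 9, with higher values meaning happier, more excited, and more controlling; the class `SamRating`, the function `appraisal_bias`, and the example ratings are hypothetical and are not taken from Table 5.

```python
from dataclasses import dataclass

DIMENSIONS = ("pleasure", "arousal", "dominance")


@dataclass(frozen=True)
class SamRating:
    """One nine-point SAM rating (assumed coding: 1 = low, 9 = high)."""
    pleasure: int   # 1 = unhappy ... 9 = happy
    arousal: int    # 1 = calm ... 9 = excited
    dominance: int  # 1 = controlled ... 9 = controlling

    def __post_init__(self):
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not 1 <= value <= 9:
                raise ValueError(f"{name} must be between 1 and 9, got {value}")


def appraisal_bias(rating_of_partner, partner_self_rating):
    """Rating given to the partner minus the partner's own rating.

    Positive values: the partner is over-rated on that dimension;
    negative values: under-rated.
    """
    return {
        name: getattr(rating_of_partner, name) - getattr(partner_self_rating, name)
        for name in DIMENSIONS
    }


# Hypothetical ratings for one phase (not the values in Table 5).
a_rates_b = SamRating(pleasure=3, arousal=4, dominance=7)
b_rates_self = SamRating(pleasure=6, arousal=4, dominance=5)
print(appraisal_bias(a_rates_b, b_rates_self))
```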

## **RESULTS**

## **DATA ANALYSIS PROCEDURE**

For the thermal imaging data, the temporal course of temperature change was entered into the statistical analyses. For heart rate and skin conductance level, the arithmetic mean of all data within each experimental phase was computed; these values are described in detail for each couple.
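The per-phase summaries reported in the case analyses (*M*, SD, Min, Max) can be computed from phase-segmented samples with the Python standard library. This is a minimal sketch; whether the original analysis used the sample (n − 1) or the population SD is not stated, so the use of `statistics.stdev` is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def describe_phase(samples):
    """M, SD, Min and Max of one phase, as reported for each participant."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return {
        "M": mean(samples),
        "SD": stdev(samples),  # sample SD (n - 1); an assumption
        "Min": min(samples),
        "Max": max(samples),
    }


def describe_session(phases):
    """Apply describe_phase to every phase of one participant's recording."""
    return {phase: describe_phase(values) for phase, values in phases.items()}


# Hypothetical HR samples (bpm) for two phases.
summary = describe_session(
    {"baseline": [70.0, 72.0, 74.0], "anger": [75.0, 80.0, 85.0]}
)
print(summary["anger"])
```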

First, we tested whether experimental condition and participant status (i.e., patient, partner of the patient, and healthy controls) predicted the temporal change in nose tip and forehead temperature. We applied hierarchical linear models with experimental condition, participant status, and temporal course (i.e., number of frames) as fixed effects and participant as a random factor (Singer, 1998). To estimate the characteristics of the temporal course for each participant, we added a condition × temporal course × participant interaction term to the model. Individual temperature changes were estimated by analyzing each participant separately, and a slope was computed for each condition. To compare slopes between the patient and the healthy controls, a confidence interval was computed for each slope.
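The per-participant slope estimation and the confidence-interval comparison can be illustrated with ordinary least squares on one participant's series for one condition. This is a simplified sketch, not the hierarchical model the authors fitted: it assumes a constant, hypothetical thermal camera frame rate to convert frame indices into minutes (so that slopes are in °C per minute, as in Table 1) and uses a normal approximation (z = 1.96) for the 95% interval, which is reasonable only for long series. Function names are hypothetical.

```python
import math


def slope_with_ci(temps, frame_rate_hz, z=1.96):
    """OLS slope of temperature on time (deg C per minute) with an approximate CI.

    frame_rate_hz is a hypothetical, constant camera frame rate used to turn
    frame indices into minutes; z = 1.96 gives a normal-approximation 95% CI.
    """
    n = len(temps)
    if n < 3:
        raise ValueError("need at least three frames")
    minutes = [i / frame_rate_hz / 60.0 for i in range(n)]
    t_mean = sum(minutes) / n
    y_mean = sum(temps) / n
    sxx = sum((t - t_mean) ** 2 for t in minutes)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(minutes, temps))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(minutes, temps))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, (slope - z * se, slope + z * se)


def cis_overlap(ci_a, ci_b):
    """True if two confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical 2-minute series at 1 frame per second, rising 0.1 deg C per minute.
series = [34.0 + 0.1 * (i / 60.0) + (0.01 if i % 2 else -0.01) for i in range(121)]
slope, ci = slope_with_ci(series, frame_rate_hz=1.0)
print(round(slope, 3), [round(bound, 3) for bound in ci])
```

Consecutive thermal frames are serially correlated, so an interval computed this way is likely too narrow; requiring non-overlapping intervals before calling two slopes different is, by contrast, a conservative criterion.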

We examined the relationship of emotion regulation and anger regulation with thermal changes by entering the corresponding questionnaire scores (i.e., LEAS, TAS-20, and STAXI) into the model as fixed-effect covariates. In addition, we included a condition × course × psychological measure interaction to allow condition-specific analyses of their association with temperature change. Each psychological domain was tested separately to avoid effects of multicollinearity. Attachment scores were not included in the model because of missing data for Couple 1.
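One way to see what the condition × course × psychological measure interaction captures is a two-stage approximation: estimate each participant's temperature slope within a condition, then regress those slopes on a questionnaire score. The sketch below assumes that simplification; it is not the mixed model used in the study, and with only four participants such a regression is purely illustrative. Function names, participant IDs, and example values are hypothetical.

```python
def simple_regression(x, y):
    """Return (intercept, slope) of an ordinary least-squares fit of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    return y_mean - b * x_mean, b


def score_effect_on_slope(slopes, scores):
    """Second stage: regress one condition's temperature slopes on a score.

    slopes and scores map a (hypothetical) participant ID to a value.
    Returns b, the predicted change in slope per questionnaire point.
    """
    ids = sorted(slopes)
    return simple_regression([scores[p] for p in ids], [slopes[p] for p in ids])[1]


# Hypothetical anger-phase slopes (deg C/min) and TAS-20 scores.
anger_slopes = {"P1": 0.10, "P2": 0.20, "P3": 0.30}
tas_scores = {"P1": 40, "P2": 50, "P3": 60}
print(score_effect_on_slope(anger_slopes, tas_scores))  # approximately 0.01
```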

To examine the relationship of physiological processes between partners, we followed a previous study (Ebisch et al., 2012) and computed Pearson correlations of nasal tip and forehead temperature between partners for each condition.
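The partner correlations amount to a Pearson coefficient computed per condition on time-aligned temperature series. The sketch below also includes the usual t statistic for testing *r* against zero. It assumes both partners' series are sampled on the same frames; the function names and example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long, time-aligned series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long series with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (n - 2 df) for H0: rho = 0; assumes independent observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def partner_correlations(partner_a, partner_b):
    """Per-condition r between partners; inputs map condition -> temperature series."""
    return {cond: pearson_r(partner_a[cond], partner_b[cond]) for cond in partner_a}


# Hypothetical nose tip series for one condition; b is a shifted copy of a.
a = {"anger": [34.0, 34.2, 34.1, 34.5, 34.6]}
b = {"anger": [33.1, 33.3, 33.2, 33.6, 33.7]}
print(partner_correlations(a, b))  # r is 1 up to floating-point rounding
```

Because successive frames are not independent, the nominal degrees of freedom (n − 2) overstate the effective sample size, so *p* values from this test should be read with caution.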

The results of the statistical analyses are presented first, followed by a detailed description of the heart rate, skin conductance level, and subjective report measures for each couple.

## **TEMPORAL THERMAL CHANGES ON THE NOSE TIP AND FOREHEAD**

During the experiment, the average nose tip temperature rose for all participants except the patient's partner, whose nose tip temperature decreased slightly (see **Figures 2** and **3**). Forehead temperature did not show a comparable pattern, and the observed changes were comparatively small.

Over the whole session, the temporal courses differed significantly for each participant (see **Table 1**); this was also the case for forehead temperature. The full model, again including all participants, confirmed individually different slopes for each condition and each participant. When we compared the slopes of temperature change between participants, we found that the forehead temperature of the patient increased significantly in the anger and relaxation phases.

When the psychological factors (i.e., TAS-20, LEAS, STAXI) were included in the model, condition-specific associations of these factors with the thermal variations were observed. Although the relationship of the psychological factors with overall temperature was negligible, strong associations were found between these measures and condition-specific temperature changes (see **Table 2**). All psychological factors were significantly associated with temperature changes in each condition, but not with absolute temperature levels. Changes in the relaxation phase tended to be smaller than in the initial phases. Higher STAXI and TAS-20 scores were associated with more pronounced temperature changes. Likewise, lower emotional awareness scores on the LEAS were associated with greater temperature changes.

**FIGURE 2 |** The first illustration belongs to Mr 1A (wears eye glasses) and the second to Ms 1B.

**FIGURE 3 | Graphical and pictorial representations of variations in the facial thermal imprints of Mr 2A and Ms 2B, respectively.** The first illustration belongs to Mr 2A and the second to Ms 2B.

**Table 1 | Temporal course of the changes in cutaneous temperature of the participants.**

\*Temperature change: degrees Celsius per minute. \*\*Temperature changes were significantly higher for the patient than for the partners of the healthy couple.


**Table 2 | Psychological predictors of change in nose tip temperature within experimental conditions.**

STAXI-control 0.311 0.41 0.0012 −0.0029 −0.0067

\*Regression coefficient b; mean temperature predicted by the respective psychological measure. \*\*Regression coefficient b; change in temperature per minute (°C/min) predicted by the respective psychological measure. For all condition-specific associations with psychological measures, *p* < 0.001.

## **CORRELATION OF TEMPERATURE CHANGES BETWEEN PARTNERS**

Correlation analysis of the dyads' nasal tip temperature showed significant relationships at *p* < 0.01 (**Table 3**). Because forehead temperature showed little variance across phases, it was not included in this analysis. At baseline, nasal tip temperature was positively correlated between the partners of Couple 1 (patient-partner; *r* = 0.89), whereas no correlation was found for Couple 2 (healthy control-partner). In the anger phase, nose tip temperatures were positively correlated between the partners of both Couple 1 and Couple 2 (*r* = 0.62 and 0.84, respectively). In the relaxation phase, a strong negative correlation was found for the first dyad (*r* = −0.71) and a weak positive one (*r* = 0.20) for the second dyad.

## **CASE-BASED ANALYSES**

## *Couple 1: patient and partner*

*Pain and psychological symptoms.* Mr 1A (patient) suffered from somatoform pain disorder. His pain comprised chronic widespread pain and local pain elicited by stimuli that do not normally provoke pain (i.e., allodynia). The pain, concentrated especially on his back, arms, legs, and joints, had strongly impaired his life for more than 5 years. In the preceding 2 weeks, he had experienced very intense pain that affected his overall activity, his work, and his relationships with others. His level of affective pain, that is, his evaluative and emotional reaction to pain, was very high and fell within the 100th percentile of the normative sample of pain patients. His level of sensory pain, that is, his perceptual rating of pain intensity, corresponded to the 46.2nd percentile of the normative pain patient sample. The patient described a moderately depressed state characterized by sadness, hopelessness, and little interest or joy in life. He was taking duloxetine (a serotonin-norepinephrine reuptake inhibitor), amitriptyline (a tricyclic antidepressant), and quetiapine (a short-acting atypical antipsychotic).

**Table 3 | Correlation coefficients of the relationship between partners' nasal tip temperature during each experimental phase.**

\**p* < 0.01.

Ms 1B did not report experiencing pain, apart from minor pain in some body parts that affected her only minimally. She described her health as very good, although she reported a little general life stress and some relationship difficulties with her partner.

*Emotion regulation reports.* Mr 1A's TAS-20 score (raw score = 65) classified him as alexithymic according to the cut-off scoring method, indicating difficulties in identifying and describing his feelings. Supporting this finding, his total emotional awareness score on the LEAS (sum score = 47, *M* = 2.35) placed him around the 12th percentile of the healthy male normative sample (Subic-Wrana et al., 2001). This mean LEAS-total score was within the range of scores of somatoform patients in a previous study (*M* = 1.93, SD = 0.58; Subic-Wrana et al., 2010). Moreover, according to a recent evaluation criterion for the four-item LEAS (Subic-Wrana et al., 2014), his mean LEAS-total score denoted emotional awareness at an implicit level (i.e., a preconscious level of emotional awareness at which affective arousal is expressed as bodily sensations or action tendencies). Similarly, his mean scores for awareness of his own emotions (LEAS-self) and of others' emotions (LEAS-other) were both 2.2, which again indicated an implicit level of emotional awareness (see **Table 4** for the subjective reports of the participants).

In terms of anger regulation, he reported high trait anger, that is, a general disposition to become angry (99th percentile of the male normative sample). He reported both expressing anger in a poorly controlled manner (99th percentile) and suppressing his anger (99th percentile). Yet his expenditure of energy to monitor and control his anger was moderate to high (70th percentile).

Ms 1B's TAS-20 alexithymia score (raw score = 37) indicated a good ability to be aware of, identify, and describe her feelings. Similarly, her total LEAS score (sum score = 59, *M* = 2.59) placed her at the 35th percentile of the female normative sample and almost at the level of explicit emotional awareness, indicating an ability to experience emotions consciously and express them verbally (Subic-Wrana et al., 2014). Interestingly, her mean LEAS-self (*M* = 2.85) and LEAS-other (*M* = 2.25) scores were more discrepant from each other than those of the other participants in our study. Her LEAS-other score was almost at an implicit level of emotional awareness.

Her anger scales showed a moderate to high level of trait anger (75th percentile of the female normative sample). She reported a high tendency to suppress anger (80th percentile) and a low-to-moderate tendency (50th percentile) to express anger in an outwardly negative and poorly controlled manner. She also reported a moderate to high level of effort (70th percentile) to monitor and regulate her anger.

**Table 4 | Participants' scores for measures of emotion regulation and attachment styles.**

TAS-20, Toronto Alexithymia Scale-20; LEAS, Level of Emotional Awareness Scale; STAXI, Spielberger State-Trait Anger Expression Inventory; ECR-R, Experiences in Close Relationships-Revised.

*Heart rate (HR) and skin conductance levels (SCL).* The mean HR of Mr 1A, which was higher than that of his partner, did not change much from baseline (*M* = 101.8, SD = 4.7, Min = 83.5, Max = 112.1) to anger (*M* = 101, SD = 5, Min = 83.5, Max = 110.5) but decreased in the relaxation phase (*M* = 90.3, SD = 2.7, Min = 81.7, Max = 97.8), whereas the HR of Ms 1B remained relatively stable across all phases (Baseline: *M* = 71.3, SD = 5.4, Min = 56, Max = 86.2; Anger: *M* = 72.9, SD = 6.3, Min = 53.3, Max = 91.9; Relaxation: *M* = 69, SD = 6, Min = 56.9, Max = 101).

With regard to SCL, Mr 1A showed a slight increase from baseline (*M* = 3.9, SD = 0.1, Min = 3.7, Max = 4.7) to the anger phase (*M* = 4.0, SD = 0.2, Min = 3.7, Max = 4.9) and then a decrease in the relaxation phase (*M* = 3.8, SD = 0.2, Min = 3.4, Max = 6.5). Ms 1B, on the other hand, showed a slight decrease from baseline (*M* = 3.5, SD = 0.2, Min = 3.05, Max = 4.04) to the anger phase (*M* = 3.4, SD = 0.2, Min = 3.1, Max = 4.1) and a more pronounced decrease in the relaxation phase (*M* = 3.03, SD = 0.5, Min = 2.68, Max = 5.69; see **Figure 4**).

*State-affective experience.* Mr 1A reported a pronounced increase in pain in the relaxation phase compared with the other phases. In terms of pleasure, arousal, and dominance, he rated himself as quite unhappy, a bit aroused, and a bit controlled in almost all phases, with little variation (see **Table 5** for the affective experience ratings for self and other). He rated his partner's affect similarly to his own, as quite unhappy and a bit aroused. In terms of dominance, he rated Ms 1B as quite controlling at baseline, but his ratings of her dominance decreased in the anger and relaxation phases.

Ms 1B reported more variable affective experiences for herself and her partner across phases. At baseline, she reported being happy, with low levels of arousal and dominance, whereas in the anger phase she reported feeling quite unhappy, a bit aroused, and a bit controlled. In the relaxation phase, she reported feeling happy, relaxed, and a bit controlled. Consistent with her own ratings, she rated her partner as quite happy, relaxed, and controlled at baseline. In the anger phase, she appraised Mr 1A's affective experience as similar to her own (quite unhappy and a bit aroused) but rated him as quite controlling. In the relaxation phase, she again rated her partner as relaxed, with a higher level of pleasure and an average level of dominance.

## *Couple 2: healthy control and partner*

*Pain and psychological symptoms.* The couple presented no pronounced complaints about pain, depression, anxiety or stress. In addition, they described their health condition as very good. They did not report any chronic disease or use of medication.

*Emotion regulation reports.* According to their TAS-20 scores, both partners had a good ability to identify and describe feelings and fell in the non-alexithymic range according to the cut-off scoring. Mr 2A had a slightly higher score than his partner (TAS-20 total raw scores were 44 and 38 for Mr 2A and Ms 2B, respectively; see **Table 4**). Consistent with his TAS-20 score, Mr 2A's total emotional awareness was around the 45th percentile of the healthy male normative sample (LEAS-total raw score = 61). However, Mr 2A was the only participant whose LEAS-self score (*M* = 2.45) was lower than his LEAS-other score (*M* = 2.65); his LEAS-self score indicated an almost implicit level of emotional awareness. Ms 2B's LEAS-total score was also consistent with her TAS-20 score and placed her at the 52nd percentile of the healthy female normative sample. In addition, her LEAS-self and LEAS-other scores placed her at an explicit level of emotional awareness (*M* = 3.1 and 3.0, respectively).

Regarding anger regulation, Mr 2A reported a high level of trait anger (80th percentile). His tendencies to suppress anger, to express anger negatively, and to control and modulate his anger were moderate (55th, 55th, and 60th percentiles, respectively). Ms 2B, on the other hand, had moderate levels of trait anger (55th percentile) and anger modulation (50th percentile), but low levels of anger suppression (15th percentile) and negative anger expression (40th percentile).

According to the ECR-R, which assesses attachment styles, Ms 2B had a high score for anxious attachment (*M* = 3.44), within the range of the clinical sample in a previous German validation study (*M* = 3.08, SD = 1.27; Ehrenthal et al., 2009). Mr 2A's ECR-R scores were within the range of healthy controls.

*Heart rate and skin conductance levels.* The mean HR of both partners decreased from baseline (Mr 2A: *M* = 77.6, SD = 6.3, Min = 60, Max = 101; Ms 2B: *M* = 112.6, SD = 5.2, Min = 96, Max = 120) to anger (Mr 2A: *M* = 70.4, SD = 6.5, Min = 56, Max = 89; Ms 2B: *M* = 100.4, SD = 6.5, Min = 76, Max = 112) and decreased further in the relaxation phase (Mr 2A: *M* = 66.8, SD = 5.6, Min = 52, Max = 88.8; Ms 2B: *M* = 97.2, SD = 5.4, Min = 76.8, Max = 109.7). In contrast, mean SCL increased in both partners from baseline (Mr 2A: *M* = 2.4, SD = 0.03, Min = 2.37, Max = 2.53; Ms 2B: *M* = 3.02, SD = 0.02, Min = 2.86, Max = 3.76) to anger (Mr 2A: *M* = 2.50, SD = 0.04, Min = 2.38, Max = 2.59; Ms 2B: *M* = 3.25, SD = 0.17, Min = 3.01, Max = 3.7), with a more pronounced increase in the relaxation phase (Mr 2A: *M* = 3.37, SD = 0.11, Min = 3.28, Max = 3.68; Ms 2B: *M* = 3.49, SD = 0.07, Min = 3.04, Max = 3.72; see **Figure 4**).

**Table 5 | Affective experience ratings of the participants for self and other.**

\*"Self" stands for the participant's report of their own affective experience. "Other" stands for the participant's evaluation of their partner's affective experience.

*State-affective experience.* Mr 2A reported that his pleasure decreased in the anger phase. His arousal and dominance ratings, however, did not vary much across phases and depicted a largely relaxed and dominant state. By contrast, his ratings of his partner's pleasure and arousal changed considerably and in line with the experimental phases: he rated her pleasure as decreasing and her arousal as increasing in the anger phase, and the reverse in the relaxation phase. As in Couple 1, the anger task was the only phase in which he felt more dominant than his partner.

Ms 2B reported very little variation in her own and her partner's pleasure, which remained almost stable across phases; she rated both herself and her partner as quite happy. She rated both herself and her partner as a bit aroused in the baseline and anger phases and as relaxed in the relaxation phase. Consistent with her partner's appraisal, she felt less dominant than her partner in the anger phase.

## **DISCUSSION**

Theoretical accounts of SSD emphasize a network of bidirectional relationships between interpersonal interactions, emotion regulation, and bodily disturbances (Waller and Scheidt, 2006; Henningsen et al., 2007; Subic-Wrana et al., 2010; Luyten et al., 2012). Despite this close linkage, only a few studies have examined the real-time affective interpersonal interactions of patients with SSD (e.g., Merten and Brunnhuber, 2004; Cano et al., 2008; Leong et al., 2011). These studies have shown that both partners in an ongoing interaction reciprocally contribute to the emotion regulation process, which becomes a precipitating and maintaining factor for the somatic symptoms. However, empirical research examining the coordination of multiple components of emotion (i.e., physiology, behavior, experience) in both parties of a real-time dyadic interaction is scarce.

In this case study, we aimed to examine how intra- and interpersonal emotion regulation at the physiological and experiential levels is related to SSD. Previous studies suggest a discordance between the physiological, experiential, and behavioral components of the emotional process in SSD (Ditzen et al., 2008; Luyten et al., 2011; Pollatos et al., 2011b; Bondo-Lind et al., 2014; Okur et al., in revision). In line with these studies, we hypothesized that the patient would show intrapersonal incoherence among emotion response systems, characterized by higher autonomic activity but more restricted affective experience than the healthy controls. Moreover, we expected trait emotion regulation patterns to affect physiological changes during the affective interactions. At the interpersonal level, we predicted that emotional incoherence would be more likely to be reciprocated by a complementary incoherence of emotional processing in the partner. This pattern would generate interpersonal emotional incoherence, represented by low correlations between partners in physiological and experiential emotional processing.

In this paper, following an introduction to accounts of emotion regulation in SSD, we presented an interpersonal experimental paradigm involving two couples: a patient with somatoform pain and his partner, and a couple of healthy controls. We chose anger and positive affect as central affects because these have been reported to play particular roles in chronic pain (Fernandez and Turk, 1995; Zautra et al., 2005; van Middendorp et al., 2010). We measured participants' cutaneous temperature, heart rate, and skin conductance levels as imprints of autonomic activity during the interaction. In addition, we examined self-reported and performance-based emotion regulation, affective experience, and attachment styles. We investigated not only participants' own affect but also their perception of their partner's affective experience.

The paradigm succeeded in generating physiological and experiential changes in an ecologically valid yet structured interpersonal setting that allowed for dynamic emotional interaction. Trait emotion regulation, namely alexithymia, level of emotional awareness, and anger regulation, predicted the course of cutaneous temperature changes across phases. The patient, his partner, and the healthy couple also showed distinctive patterns of emotion regulation. However, these results should be interpreted cautiously, as only two couples were examined.

The temporal analysis of temperature changes on the nose tip and forehead showed significant variation across phases, pointing to the effectiveness of the experimental manipulation. Nasal tip temperature increased from baseline to relaxation in all participants except the patient's partner, whose nasal tip temperature decreased slightly. This pattern might suggest a complementary down-regulation of physiology in her interaction with the patient, who showed higher autonomic activity. Indeed, as predicted, the patient showed higher stress responses than his partner and the healthy controls, reflected in a significant increase in forehead temperature in the anger and relaxation phases. In addition, his mean SCL and HR were higher than his partner's throughout the experimental phases. Such heightened autonomic activity in SSD has also been shown in previous studies (Seignourel et al., 2007; Burns et al., 2008; Twiss et al., 2009; Luyten et al., 2011; Pollatos et al., 2011a,b).

Trait emotion regulation patterns also predicted the course of temperature changes. Higher alexithymia, greater anger regulation difficulties, and lower emotional awareness predicted larger changes in nasal tip temperature. This result supports previous findings linking emotion regulation deficits with aberrant and heightened physiological stress responses (Luyten et al., 2011; Pollatos et al., 2011a,b). In line with this finding, and as expected, the patient showed more restricted awareness of and reflection on his own and others' emotions, as well as high trait anger and poor anger regulation. The patient's partner also showed a moderate to high level of trait anger. This prevailing anger in both partners may reflect the contagious nature of affect in interpersonal interactions (Hatfield et al., 1993). The patient's affective pain, which reflects the evaluative and emotional reaction to pain, was also very high, although his sensory pain was at a moderate level. This illustrates that somatoform patients' affective appraisals of their symptoms may contribute to symptom amplification (Hadjistavropoulos and Craig, 1994).

The second couple, consisting of healthy partners, showed signs of relatively better emotion regulation. Both partners showed a greater ability to be aware of, identify, and describe their own and others' emotions. However, both partners reported some degree of trait anger: Mr 2A had high trait anger and a moderate level of anger regulation difficulties, and Ms 2B had a moderate level of trait anger.

The relationship between state affective experience and accompanying physiological changes differed considerably between participants. Although the patient's cutaneous temperature, HR, and SCL varied noticeably across experimental phases, he reported fairly stable and moderate levels of arousal and pleasure, which were inconsistent with his higher autonomic reactivity. This discrepancy points to an incoherence between his affective experience and its somatic concomitants. The patient's high alexithymia and low emotional awareness scores could explain his restricted access to his feelings and the accompanying autonomic changes. The subjective reports of Ms 1B, on the other hand, were, as expected, much more consistent with her physiological changes, except at baseline. This inconsistency at baseline might be due to performance stress at the beginning of the session as well as an attempt to give a socially desirable response suited to a neutral baseline task.

For the partners of the control couple, the concordance between subjective reports and physiological changes seemed somewhat better than for the patient. In the anger phase, nasal tip temperature and mean SCL increased in both partners, and both reported a decrease in pleasure. Ms 2B reported that her arousal rose in the anger phase, which was accompanied by a rise in nasal tip temperature and mean SCL, although her mean HR declined. In the relaxation phase, she reported lower arousal, but her physiological measures, apart from her declining mean HR, continued to increase. Mr 2A, however, reported few changes in arousal and pleasure, despite his declining mean HR and his increasing mean SCL and thermal imprints from baseline to relaxation. A possible explanation for this discordance is his low LEAS-self score, which indicates some difficulty in consciously experiencing and describing his own emotions.

Analyses of the interpersonal level of emotion regulation yielded more complex results than we had predicted. The graphical trends of temporal changes in nasal tip temperature suggested discordance within the patient-partner dyad (Couple 1) and concordance within the healthy control-partner dyad (Couple 2). However, correlation analysis of these temporal courses between partners, which is presumably more sensitive to change than visual inspection, suggested more concordance in the first dyad than in the second. At baseline, a positive correlation between the partners' nose tip temperatures was found only in the first couple. In the anger phase, the partners of both Couple 1 and Couple 2 showed strong positive correlations in nose tip temperature. In the relaxation phase, a strong negative correlation of temperature change was found only for the first couple. These findings might suggest a pattern of interpersonal emotion regulation in patients with SSD that is the reverse of our predictions: the patient and his partner seem to show more interrelated temperature changes than the control couple.

The strong temperature correlations within the first couple might be explained by the reciprocal nature of social interactions, which implies adaptive and complementary behavior of the interaction partner. It might be speculated that, by down-regulating her physiological responses, the patient's partner complemented the patient's higher autonomic activity, and vice versa. The couple's affective reports for self and other lend some support to this complementarity. While the patient reported experiencing similar levels of pleasure and arousal across phases, his partner reported more variation in these domains. Moreover, the patient underrated his partner's pleasure and arousal, while his partner overrated these aspects of his affective experience. The couple's poor performance in recognizing each other's affective experience is consistent with previous studies reporting emotion recognition difficulties in patients with SSD (Pedrosa Gil et al., 2008; Beck et al., 2013). Supporting these findings, both the patient and his partner had low LEAS-other scores, implying difficulties in understanding others' emotions at an explicit level (Subic-Wrana et al., 2014).

The second couple, with the healthy partners, performed well on the LEAS-other subscale, implying a better ability to consciously recognize others' emotions than the first couple. They were also relatively better at perceiving the direction of affective change in the partner. They correctly appraised each other's arousal as decreasing in the relaxation phase, and each other's dominance as decreasing in the anger phase and rising in the relaxation phase. Mr 2A also accurately perceived the rise in his partner's arousal in the anger phase, whereas Ms 2B was not accurate in this respect. Neither partner, however, accurately evaluated changes in the other's pleasure. Mr 2A seems to have attributed some emotionality and fluctuating emotional responses to his partner. It may be speculated that Ms 2B's anxious attachment style contributed to her partner's attributions.

Our study has a number of limitations. Although it demonstrates how embodied and intersubjective emotion models can be integrated into psychosomatic research, it involves only two cases and therefore provides only limited evidence for our hypotheses. Future research with larger samples and robust statistical methods should examine the affective processes of interacting couples empirically. Second, despite previous recommendations (Mauss et al., 2005), we did not include continuous measures of subjective experience, in order not to interrupt the couples' interactions. Third, since we included only two couples, we did not statistically analyze the continuous temporal changes of SCL and ECG within and between partners. Nevertheless, we provided a tentative example of how to examine the relationship between emotion regulation and the temporal course of cutaneous temperature changes at the intra- and interpersonal levels. Forthcoming research should adopt statistical approaches with high temporal sensitivity (e.g., time sequence analysis, cross-correlation analysis, actor-partner interdependence models) to examine the course and coordination of multiple emotion response systems at these levels (Hollenstein and Lanteigne, 2014). Likewise, we did not use observational measures of emotional interaction that would allow temporal analyses linking observational and physiological data. We plan to employ observational measures for assessing emotion regulation and affective interactions in our ensuing study. Finally, future research should statistically control for sex differences and medication use, as these can affect emotional processing and physiology. Also, because factors such as pain and alexithymia can be confounded with patient status, ceiling or floor effects are possible. Therefore, causal inferences should be drawn only tentatively.
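The cross-correlation analysis recommended above can be sketched as a series of Pearson correlations between one partner's signal and the other partner's signal shifted by a range of lags, which indicates whether one partner's changes tend to precede the other's. This is a minimal sketch, assuming equally long, time-aligned series; `lagged_cross_correlation` is a hypothetical helper, not part of the study's analysis, and the example data are invented for illustration.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag].

    A peak at a positive lag means changes in x tend to precede those in y.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    if not 0 <= max_lag <= n - 3:
        raise ValueError("max_lag leaves fewer than 3 overlapping points")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = _pearson(xs, ys)
    return result


# Hypothetical series: partner B repeats partner A's values one sample later.
partner_a = [0.1, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4, 0.8, 0.6, 0.0]
partner_b = [0.5] + partner_a[:-1]
ccf = lagged_cross_correlation(partner_a, partner_b, max_lag=2)
print(max(ccf, key=ccf.get))  # 1: B follows A by one sample
```

As with the zero-lag correlations, serial dependence within each series inflates apparent significance, so lagged coefficients are best read descriptively unless the series are pre-whitened.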

Our study illustrates the scientific yield of an embodied interpersonal paradigm for studying emotion regulation in SSD, in particular for the regulation of anger and positive affect. A better understanding of this intra- and interpersonally, and dynamically, regulated phenomenon may help to optimize clinical management and psychotherapy.

## **ACKNOWLEDGMENTS**

This work is supported by the Marie-Curie Initial Training Network, "TESIS: Toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

## **REFERENCES**


autonomic contagion. *Biol. Psychol.* 89, 123–129. doi: 10.1016/j.biopsycho.2011.09.018


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 May 2014; accepted: 13 January 2015; published online: 10 February 2015.*

*Citation: Güney ZO, Sattel H, Cardone D and Merla A (2015) Assessing embodied interpersonal emotion regulation in somatic symptom disorders: a case study. Front. Psychol. 6:68. doi: 10.3389/fpsyg.2015.00068*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Güney, Sattel, Cardone and Merla. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The intersubjective endeavor of psychopathology research: methodological reflections on a second-person perspective approach

# **Laura Galbusera<sup>1</sup>\* and Lisa Fellin<sup>2</sup>**

<sup>1</sup> Clinic for General Psychiatry, University of Heidelberg, Heidelberg, Germany

<sup>2</sup> Division of Psychology, University of Northampton, Northampton, UK

#### **Edited by:**

Hanne De Jaegher, University of the Basque Country, Spain

#### **Reviewed by:**

Paul Verhaeghe, Ghent University, Belgium

Paul Lysaker, Indiana University, USA

#### **\*Correspondence:**

Laura Galbusera, Clinic for General Psychiatry, University of Heidelberg, Vossstraße 2, 69115 Heidelberg, Germany

e-mail: 8laura.galbusera@gmail.com

Research in psychopathology may be considered an intersubjective endeavor mainly concerned with understanding other minds. Thus, the way we conceive of social understanding influences how we do research in psychology in the first place. In this paper, we focus on psychopathology research as a paradigmatic case for this methodological issue, since the relation between the researcher and the object of study is characterized by a major component of "otherness." We critically review different methodologies in psychopathology research, highlighting their relation to different social cognition theories (the third-, first-, and second-person approaches), and we outline the methodological implications arising from each theoretical stance. Firstly, we critically discuss the dominant paradigm in psychopathology research, based on the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013) and on quantitative methodology, as an example of a third-person methodology. Secondly, we contrast this mainstream view with phenomenological psychopathology, which, by rejecting the reductionist view exclusively focused on behavioral symptoms, takes consciousness as its main object of study and therefore attempts to grasp patients' first-person experience. But how can we speak about a first-person perspective in psychopathology if the problem at stake is the experience of the other? How is it possible to understand an experience from "within" if the person having this experience is another? By addressing these issues, we critically explore the feasibility and usefulness of a second-person methodology in psychopathology research. Notwithstanding the importance of methodological pluralism, we argue that a second-person perspective should inform the epistemology and methods of research in psychopathology, as it recognizes the fundamentally circular and intersubjective construction of knowledge.

**Keywords: intersubjectivity, social understanding, psychopathology research, methodology, second-person perspective**

## **INTRODUCTION**

Psychology, as a discipline, is mainly concerned with knowing others' *minds*<sup>1</sup>. The problem of social cognition is therefore crucial to any psychological research enterprise, and the way we conceive of social understanding influences the way we do research in psychology (Reddy, 2008). Questions regarding the possibility of understanding other persons, the way social understanding works, and the factors that influence this process are tightly related to epistemological and methodological issues such as: the validity of our claims in psychological research; the development of a proper methodology to understand our object of study; and the way we should frame and interpret our results according to the context in which they arise. These questions take a particular turn in psychopathology research, where we deal not only with other minds but with minds experienced and constructed as *different*<sup>2</sup> because of their distress and non-ordinary experience.

Psychopathology is a controversial field of research, often characterized by highly polarized views. Mainstream studies focus on "mental disorders," considered inherent to individual "minds"

<sup>1</sup>*Mind* is here not conceived in the narrow sense of a computational mind or brain but rather, in a broader sense, as the subject matter of cognitive science. Similarly, we refer here to social understanding as an epistemological "problem," not in the narrow sense of a gap between persons that needs to be filled; irrespective of the theoretical framework we embrace, understanding others remains a complex phenomenon that needs to be better understood and therefore, in this broader sense, it does remain a dilemma.

<sup>2</sup>*Different* is here understood not in an ontological or moral sense but simply in a phenomenological sense: we perceive and experience others as different, especially in the case of what we categorize as psychopathology. Making sense of, defining, or deconstructing this difference is the very starting point of psychopathology research.

(brains) suffering from psychological distress or impairment (or even biochemical or genetic deficits). At the opposite end of this continuum, other approaches try to describe and understand the contextualized and embodied meaning of the distress (as, for instance, phenomenological psychopathology) and look for its socio-relational and interpersonal features, origins, and functions<sup>3</sup>; the most radical positions (as, for instance, anti/critical psychiatry and many systemic theorists) may even deny the existence of individual psychopathology as such, in favor of a more social and relational understanding of distress<sup>4</sup>. Either way, psychopathology research has to make sense of the experience of otherness, difference, and alterity, even when the aim is to deconstruct what is considered a labeling process. Before defining, classifying, or constructing etiological theories, psychopathology research therefore needs to deal with the primary task of understanding others and, even more, *different* others. The core methodological question at stake here is therefore: how can we understand others in their difference? To explore this question, it seems logical to bring together theories of social understanding and approaches to psychopathology research, which is what we do in the present paper.

For decades, the problem of social understanding has been at the core of the contemporary debate in the cognitive sciences. Different theories and frameworks have been proposed to account for this phenomenon, yet the debate remains controversial (see, for instance, Gallagher, 2012; Dullstein, 2012). These theories look at how we understand others from three different perspectives: a third-person perspective (Theory-Theory), a first-person perspective (Simulation Theory), and a second-person perspective (e.g., Interaction Theory, IT). We critically review these perspectives on social cognition, highlighting and discussing their core claims. However, it is important to stress that our aim is not to offer an additional contribution to this debate but rather to take it as a starting point for some methodological reflections relevant to psychology and its cognate disciplines. In particular, we look at the kinds of encounters that take place in psychopathology research as a paradigmatic case of the methodological issue at stake, because here, as mentioned above, the relation between the researcher and the object of study is characterized by a major component of "otherness." We then highlight a number of methodological implications for psychopathology research that necessarily arise from our critical discussion of approaches to social cognition.

Although we generally acknowledge the need for methodological pluralism and do not see these perspectives as mutually exclusive, we adopt a rather critical stance: in what follows, we outline some shortcomings of the third- and first-person perspectives. If these shortcomings hold for a general theory of social understanding, they should be relevant to the methodological issues of psychopathology research as well. Dixon (in Stanghellini, 2007) identified a dilemma inherent in the science of the mind: "Is the science of the mind in fact to be a science of the mind, or a science of something else, such as the brain or behaviour? Is it to be 'science by analogy' or 'physical science proper'?" (p. 69). The same question may be posed for a "science of suffering minds/souls" (following the Greek etymology of the word "psychopathology"). In exploring these questions, we therefore point to a second-person alternative as a promising (although not unproblematic) methodology for psychopathology research. Following Stanghellini's (2007) answer to Dixon's question:

The phenomenological perspective, and specially the second-person mode, advocates that the context of the clinical encounter<sup>5</sup> should be one of co-presence (and not of dominance) with the aim of understanding (and not labelling), i.e. negotiating intersubjective constructs, and looking for meaningfulness through the bridging of two different horizons of meanings. (p. 70)

# **SOCIAL UNDERSTANDING AND METHODOLOGY IN PSYCHOPATHOLOGY: FROM A THIRD-PERSON PERSPECTIVE TO A FIRST-PERSON PERSPECTIVE**

## **THE THIRD-PERSON APPROACH OF MAINSTREAM PSYCHOPATHOLOGY RESEARCH**

Research in psychopathology mainly focuses on understanding the causes, correlates, and consequences of psychological disorders. It is commonly based on the *Diagnostic and Statistical Manual of Mental Disorders* (DSM; American Psychiatric Association, 2013) and its diagnostic categories, which consist mainly of lists of symptoms. Despite its claim of a-theoreticity (since its third edition), the DSM diagnostic system (and therefore mainstream research in psychopathology) relies on an epistemology of logical empiricism and on a physicalist ontology (Schwartz and Wiggins, 1986; Parnas and Bovet, 1995; Parnas et al., 2013; Parnas and Gallagher, in press). Symptoms and mental states are reified and seen as ontologically independent atomic entities, as material thing-like objects. Psychological reality is constructed as existing "out there," independent of any human perspective, as if it could be known objectively through empirical observation (e.g., medical tests) and logical thinking by an "external objective expert" (Parnas and Bovet, 1995; Parnas et al., 2013). By a deceptive mistake that Husserl (1970) would call the "naive objectivism" of the life sciences [or a "game of semantics," in Timimi's (2013) words], symptoms

<sup>3</sup>The idea that psychopathology can be assimilated to a "science of meaning" was originally formulated by Guidano (1991, p. 59) and is at the core of Ugazio's (1998, 2012/2013) socio-constructionist model. This model lays the foundations for a systemic theory of personality development that explains the transition from "normalcy" to psychopathology by the reciprocal positioning that the individual and the persons meaningful to him/her take within the critical meaning.

<sup>4</sup>According to these more critical stances, psychopathology is a stigmatizing label used to categorize, reify and medicalize the diversity and alterity of the other with the main outcome of pathologizing, alienating, and segregating them (see, for example—among the many—the criticisms put forward by classic authors such as Goffman, Laing, Foucault, Szasz, and more recently, Newnes, Parker, Timimi).

<sup>5</sup>Despite the fact that clinical and research encounters are two different joint hermeneutic endeavors and thus have many different characteristics and features, they also share many epistemological and methodological aspects and dilemmas (e.g., power and knowledge gap, reciprocal positioning), especially if we consider more recent collaborative–participative research designs employing in-depth qualitative and creative interview methods.

become much more than descriptive constructions: they are reified, creating the illusion that the disorder itself exists as a natural object. This easily leads to etiological theories that link mental distress to supposed biochemical or genetic causes (and therefore to mostly pharmaceutical interventions). In this approach to diagnosis, individuals are equated with their diagnostic label and are therefore stigmatized or even alienated and dehumanized. To borrow Simblett's (2013) words, it is possible "to understand DSM as a textual codification of power/knowledge that creates a version of reality, individuality and what is known about the nature of mental illness. But only one possible version" (p. 116). Despite attempts to unpack and deconstruct this discourse, revisions to the DSM are concerned almost exclusively with its criteria and thresholds. This also shapes and limits the field of possible research in psychopathology: only approaches constrained within the boundaries of mainstream research are viable, which reinforces its power and the knowledge imbalance in the research encounter (Irarrázaval and Sharim, 2014).

With some notable exceptions among systemic, psychodynamic, constructivist, and phenomenological authors, psychopathology research is mainly based on quantitative methods: symptoms, mental states, performances, personality traits, neurological features, and so on are operationalized as measurable variables to be statistically correlated with specific diagnoses (Sher and Trull, 1996). The main sources of diagnostic data are structured interviews (or even self-reports), which limit the person's freedom of expression by severely restricting their possible responses. These instruments rest on the same epistemology: they de-contextualize and fragment the other's experience into a list of internal mental states and external behaviors that may be counted as present or absent or rated by their intensity/severity. The aim is to fit these behaviors and mental states into a rigid, prepackaged diagnostic classification, which is then itself treated as a variable for research purposes. Attention is mainly on the verbal and cognitive level; experience is treated as dis-embodied and de-contextualized rather than socially situated (Cromby, 2012).

If we look at this research paradigm from the perspective of cognitive science, we may notice many parallels with the Theory-Theory (TT) of social understanding, also referred to as a third-person approach. TT is based on the following main assumptions: others' mental states<sup>6</sup> are hidden, so we do not have direct access to them (mind–mind gap or "inner world hypothesis"), and we therefore need additional cognitive processes in order to infer the mental state of the other (mentalizing supposition)<sup>7</sup>. In inferring and theorizing about other minds, we need to refer to common sense, i.e., folk psychological theories about how mental states (beliefs, desires, intentions) inform the behaviors of others (Malle, 2004). Observation becomes the evidence for theorizing, and this constitutes our everyday stance toward others (spectatorial supposition): we always observe others' behavior with some degree of detachment, trying to infer their mental states from a third-person stance (Gallagher, 2001).

If we now look back at mainstream methodology in psychopathology research, we may notice similar assumptions at the basis of this paradigm. Variables such as symptoms, behaviors, and performances are considered an objective reality that can be observed by a detached researcher (expert); the mental states of other persons are often inferred from behavioral cues or even neurological features according to existing theories. This kind of approach is therefore prone to the errors and biases widely illustrated by decades of attribution research.

The experience of the other person is usually directly accessed (or "assessed") from the expert position through structured interviews, in which the researcher is considered detached from the patient and his or her task is to infer the patient's state of mind, which is assumed to be objective and a-contextual. A paradox seems to be entailed in this approach, as Reddy (2008) pointed out in her criticism of TT: if, on the one hand, we adopt an empiricist view in which the only way to know things is through experience given to our senses and, on the other hand, we claim that other minds are not accessible to our experience but rather hidden behind behaviors (mind–mind gap), then knowing other minds becomes a logical impossibility. Similarly, how can we claim to adopt an empiricist methodology if the object of study (the patient's mental states) needs to be inferred?

## **REDISCOVERING SUBJECTIVITY AND PATIENTS' FIRST-PERSON PERSPECTIVE**

Parnas et al. (2013) strongly criticize the third-person approach in psychopathology research, starting from a critical discussion of its subject matter: "The object of psychopathology is the 'conscious<sup>8</sup> psychic event', and psychopathology involves and requires an in-depth study of experience and subjectivity" (p. 271). They stress the importance of focusing on first-person experience and subjectivity in the study of psychopathology, without denying the usefulness and necessity of methodological pluralism. Indeed, although useful for understanding mental phenomena, the study of neural substrates, behavioral descriptions, and task performances always derives its relevance from its relation to the conscious level: the researcher's interest is never in brain events and behaviors *per se* but in their relationship with mental phenomena and experience (Nordgaard et al., 2012). The call for a first-person approach (Parnas et al., 2013) therefore does not merely point out that first-person experience should be the object of study for psychopathology research (ultimately, the third-person approach is also an attempt to understand others' experience, albeit in a reductionist and fragmented way); rather, it focuses on the nature of this object and on how it can be understood. The issue at stake is thus primarily ontological and epistemological.

From a phenomenological point of view, consciousness cannot be ontologically considered atomistic in nature because it is

<sup>6</sup>In the TT approach, minds and mental states are considered in a "software–hardware" relation to the body (Thompson, 2007); cognitive processes therefore happen within Cartesian minds, conceived as radically separated from the body (body–mind dualism) and from other minds (Western individualism).

<sup>7</sup>According to TT, these mentalizing processes constitute our primary and pervasive way of understanding others (supposition of universality).

<sup>8</sup>Here the concept of consciousness is not meant in a psychodynamic sense, i.e., as opposed to the unconscious, but in a phenomenological sense: see quote from Parnas et al. (2013) below for a more detailed definition.

an ever-changing flow of mutually interdependent phenomena. Consciousness is a Gestalt, a meaningful whole that cannot be reduced to an aggregate of parts, a sum of "mental objects." Symptoms and behaviors are not meaningless entities from which we can infer hypotheses about mentality and create a-contextual definitions and quantifications; they always have a meaning<sup>9</sup> that derives from the total state of consciousness, embodied and embedded in the environment. Therefore, as maintained by Parnas et al. (2013):

It is crucial to understand phenomenal consciousness (subjectivity) as the overall field, ground, or horizon within which all "manifestation" or "presencing" of the objects of our awareness occurs. Consciousness, the phenomenal manifestation of thoughts, feelings, and perceptions, is not some kind of complex spatial, 3 dimensional object but a lived reality, a presence to itself and the world: "psyche," writes Jaspers, is "not […] an object (…) but 'being in one's own world,' the integrating of an inner and outer world." Consciousness manifests itself as a "becoming," a temporal "streaming" of a unity of intertwined experiences. (p. 274)

The phenomenological method introduced in psychiatry by Jaspers and other influential psychiatrists, such as Binswanger, Minkowski, and Blankenburg (and later expanded toward a more interpersonal perspective by Laing), was the primary instrument for investigating and describing the first-person experience of patients (Bürgy, 2008). It is therefore often referred to as a first-person approach, mainly because of the clear shift in how the psychiatric object is conceived: the focus is on consciousness as a whole, grasped through an in-depth exploration of patients' first-person experience (Fuchs, 2010). As Parnas et al. (2013) maintained, notions such as self, ownership, reality, and rationality are of core importance for psychopathology research, which makes it necessary to focus on subjectivity and the first-person perspective.

Yet, epistemologically, one may wonder what a first-person approach might mean in a context where the object of study is actually the experience of the other. As Parnas et al. (2013) recognized: "(…) A second domain concerns how and to what extent is a psychiatrist able to access the patient's mind and reconstruct his experience" (p. 274). When shifting from the issue of the object to that of the method, can we still speak of a first-person perspective? What, then, does it mean to have a first-person understanding of other minds? And how should we conceive of a first-person methodology? In what follows, we try to clarify this issue and shed some light on terminology by referring back to the social cognition debate.

## **A FIRST-PERSON METHODOLOGY FOR UNDERSTANDING OTHERS**

The first-person perspective on social understanding in cognitive science has been defended by Simulation Theory (ST). Although it shares the basic assumptions of TT, ST differs from it in the way the gap between two minds is filled. As in TT, others' mental states are considered hidden: we lack direct access to them, and our everyday stance toward the other is still an observational one. The problem is therefore framed in the same way: when observing others, how can we figure out their hidden mental states?

Instead of inferring mental states on the basis of folk psychological theories, ST claims that we need to simulate within ourselves the mental state of the other, as if we were them, in order to understand it.

In philosophy, this process has also been called the argument from analogy: by analogy with our own experience, we infer that other bodies must also experience the same sort of mental states that we have.

Although some (e.g., Goldman, 2005) conceive of this process of simulation as a mentalizing one, other approaches (Implicit Simulation Theories) maintain that we implicitly attune with others at much more basic levels. For instance, drawing on research on mirror neurons, some ST proponents claim that, through the implicit recognition of similarity between our own actions and those of others, we are immediately able to reproduce the mental state of another person when we see the action they perform (Gallese and Goldman, 1998).

Within the social cognition debate, ST has already been widely criticized on many counts. Gallagher (2012), for instance, pointed out the contradiction in placing the very notion of simulation at the basis of social understanding:

One can see the starting problem clearly, for example, in Goldman's description of the first step involved in running a simulation routine. "First, the attributor creates in herself pretend states intended to match those of the target. In other words, the attributor attempts to put herself in the target's 'mental shoes'" (Goldman, 2005, p. 80). This first step seems tricky. How do I know which pretend state (belief or desire) matches what the other person has in mind. Indeed, isn't this what simulation is supposed to deliver? If I already know what state matches the target, then the problem, as defined by ST, is already solved. (p. 207)

As we discuss later, when describing Gallagher's own theoretical proposal for social understanding, what he finds missing in first-person accounts is the recognition of contextual knowledge and interactive processes as necessary and constitutive parts of social understanding. Reddy (2008) further argued that a simulationist account does not even solve the problem of the gap between two minds, as it basically relies on an overgeneralization from a single case (one's own experience). Although ST focuses more on experiencing than on theorizing, the experience on which I base my knowledge of the other can only be my own: it is still an attribution based upon the self (Reddy, 2008).

The argument from analogy as an explanation of social understanding is considered controversial in the phenomenological literature; as we argue in the next section, ever since Husserl's account of empathy as the primary mode of social understanding, phenomenological theories have been more consistent with a second-person mode. Although a simulationist understanding of empathy, as an "as if" awareness of the other

<sup>9</sup>And also interpersonal functions or purposes, according to a more systemic–relational perspective.

person, has been repeatedly rejected in phenomenological theories, it can still, in some cases, inform the methodology of phenomenological psychiatry, which is nowadays sometimes referred to as first-person in this sense (Stanghellini, 2007, 2010; Fuchs, 2010).

For instance, in Jaspers' (1997) *General Psychopathology*<sup>10</sup>, the process of understanding the patient has often been described as an "imaginative actualizing" of the other's experience (Fuchs, 2010; Wiggins and Schwartz, 2013): in order to understand others, we need to relive (*nachleben*) their experiences in ourselves. Starting from the assumption that the best evidence of mental life is self-reflection, the best way to access what cannot be immediately present to us (others' experience) is to make it present through a process of imaginative identification (Wiggins and Schwartz, 1997, 2013). Therefore, by intuitively representing the other's psychic states, we can grasp what it is like to be him/her: a transpositional movement that actually follows the structure of analogy (Stanghellini, 2007). This process of empathically putting oneself in the other's place in order to understand him/her presupposes a "bracketing" of one's own assumptions and prejudices, in order to get as close as possible to the original experience of the other. Although we acknowledge the importance of this methodological step, the epistemological concern related to a first-person methodology (as with the criticism of ST in social cognition) is whether I am projecting my own experiences onto the other, which risks transforming understanding into mere speculation (Stanghellini, 2007; Wiggins and Schwartz, 2013) or determining, rather than understanding, the other (Reddy, 2008).

This leads us to what has been proposed as an alternative in cognitive science: a second-person perspective. Before entering into the core of the methodological discussion in this regard, it is worth looking at how this approach has been defined and constructed in the cognitive sciences through different contributions. We do this in the following section, before moving, in Section "Methodological Implications for a Second-Person Psychopathology," to the methodological discussion, where we draw some methodological implications for psychopathology research directly from each main claim of the second-person approach in cognitive science.

# **A SECOND-PERSON APPROACH TO UNDERSTANDING OTHERS**

The second-person approach offers an alternative explanation of social cognition based on a firm rejection of the body–mind gap and the mind–mind gap. It is often referred to as Interaction Theory (Gallagher, 2001), which draws on a phenomenological understanding of social cognition. However, several authors have contributed to defining this perspective, making it more elaborate and complex.

Phenomenological approaches challenge the basic assumptions of TT and ST, emphasize the role of the body in the processes of human understanding, and reject the Cartesian dualism of body and mind: the basis for understanding already lies in the pre-reflective intentional connection between bodies; personal emotions and intentions are already present in any expressive behavior, which is therefore considered meaningful from the very start (Thompson, 2007; Gallagher, 2001). Consistent with this perspective, Gallagher's (2008b) notion of direct perception rejects the mind–mind gap (and therefore the mentalizing supposition) by claiming that other minds are directly perceivable in interaction: we can see grief or fear in the expression of another person without the need to infer or theorize. Perception is "smart": when perceiving, we already grasp the meaning of things in relation to us and our possibilities for action and response; this constitutes the basis of social understanding, which therefore mostly happens at the pre-reflective, embodied level (Gallagher, 2008b). As Reddy (2008) puts it, emphasizing the role of emotional engagement, "we see feelingly."

The idea that we need to develop a Theory of Mind (ToM) in order to understand others has been challenged from a developmental perspective as well, since early processes of embodied intersubjective understanding have been shown to be present during the first months of infants' life (Trevarthen and Hubley, 1978; Trevarthen, 1979; Fivaz-Depeursinge and Corboz-Warnery, 1999; Reddy, 2008) and even in newborns (see Fivaz-Depeursinge and Philipp, 2014). This evidence stresses the role of emotional and pre-reflective engagement in social cognition (Reddy and Morris, 2004): the baby's world is non-verbal. Developmental studies have clearly shown that infants learn to understand others not by mindreading other persons' independent qualities but through interactive engagement with them; for instance, the rhythmic attunement between mother and baby during breastfeeding is crucial for the development of mind and communication (Kaye, 1982; Trevarthen and Aitken, 2001).

Another core assumption of the second personal stance lies in the rejection of the spectatorial supposition: we understand others in our everyday interactions with them, in the perception–action loops in which we are directly involved when interacting (Gallagher, 2008b). To believe that social cognition is based on an observational stance, in which we try to figure out the mental states of others as detached scientists would, does not do justice to our social reality. A second-person approach recognizes the intrinsic circularity of knowledge as a situated practice: "what we know of other minds must depend on our engagements with them, but these engagements must depend on what we know of them" (Reddy, 2008, pp. 31–32).

The process of social understanding therefore cannot be achieved by the effort of one person alone; it arises in the in-between of interaction, is constituted by social interaction, and is shaped by emotional engagement (De Jaegher and Di Paolo, 2007; De Jaegher et al., 2010; Schilbach et al., 2013). Moreover, as it takes two to tango, for an interaction to happen the autonomy of both interactors needs to be maintained (De Jaegher and Di Paolo, 2007).

With the concept of participatory sense-making (PSM), De Jaegher and Di Paolo (2007) emphasize the constitutive role of the interaction for social understanding, an aspect that has become a ground for criticizing and integrating Gallagher's IT<sup>11</sup> (De Jaegher and Froese, 2009; De Jaegher et al., 2010). In Reddy's (2008) words, engagement in the interaction does not only provide information about minds but creates them. A similar stance is taken in Ugazio's (1998, 2012/2013) constructionist approach, which claims that conversational processes are not only the context in which individual identities develop but also what constitutes them in the first place.

<sup>10</sup>We are aware that the view presented here is just one possible interpretation of Jaspers' General Psychopathology. Indeed, our aim here is not to engage in a critical discussion of Jaspers' work or to identify this particular reading with phenomenological psychopathology in general, but to present an example of what a first-person methodology in psychopathology research may mean.

In stressing the importance of social interaction, however, we must note that a second personal stance is not just a social constellation or the mere use of "you," but rather an attitude of openness that involves recognizing and acknowledging the other as a person; it requires that we directly address the other as someone who can respond and understand (Reddy, 2008). Drawing on Buber's (1937) distinction between the *I–Thou* relationship (second personal stance) and the *I–It* relationship (third personal stance), Reddy (2008) notes that, even when interacting with someone, we may still regard him or her as an object, an instance of a category; with this stance, we do not take the ongoing interaction seriously and in fact remain in an observational, detached position.

As Fuchs (2012) also emphasizes, drawing on phenomenologists such as Husserl and Scheler, what distinguishes object perception from the perception of another person is a radically different attitude toward what is perceived. Object perception is an enactive and dynamic process in which we immediately perceive things according to their affordances, that is, in terms of their predictability and the possibilities for action they offer us<sup>12</sup> (Gibson, 1979). However, when we are directed toward other persons, our perception is not just driven by Gibsonian affordances: we relate to others in a "personalistic attitude," which means that we engage and resonate with them and are responsive to their behavior, emotions, and intentions (Fuchs, 2012). Engagement, resonance, and responsiveness are therefore core defining aspects of a second-person perspective.

Importantly, this attitude toward others implies not only the recognition of similarity (as stressed by a first-person, simulationist approach) but also the acknowledgment of difference. In fact, in order to experience the other as a particular other to whom we are responsive, we need to recognize his or her difference from us; otherwise we would simply reduce their experience to our own (Reddy, 2008). This becomes clear from a phenomenological point of view (Zahavi, 2010, 2011) when considering the notion of empathy, which is regarded as the core of pre-reflective social understanding. Empathy grounds an unmediated and non-inferential access to others' experience; still, it differs from the direct experiential access we have to our own mind (the focus is on the other, not on ourselves or on what it would be like to be in the other's place). The notion of empathy, as understood in phenomenology, is truly second personal: we encounter others as embodied subjects, we are able to empathically grasp their experience, and yet the experience we have of them differs from their original experience (Zahavi, 2010)<sup>13</sup>. Indeed, as noted by Murray and Holmes (2014):

Husserl (1989: 170–180) characterizes intersubjectivity as Einfühlung—empathy—and Heidegger (1962: 153–163) writes of an ontological or prepredicative Mit-sein—'being-with' others—a hyphenated formulation that points to the prereflective experiential inseparability of these terms. "He [sic] who speaks enters into a system of relations which presuppose his presence and at the same time make him open and vulnerable" (Merleau-Ponty 1973: 17). (p. 13)

The different contributions to a second-person approach focus mainly on the pre-reflective, implicit level of experience, that is, on the way we intuitively grasp others' states of mind by engaging with them in here and now encounters. Rejecting the TT and ST suppositions of universality, the second-person approach therefore maintains that social understanding happens primarily at the embodied, pre-reflective level of experience; in Fuchs and De Jaegher's (2009) terms, it is based on a dynamic process of "mutual incorporation."

In this regard, Dullstein (2012) critically pointed out that the second-person approach, by emphasizing the role of pre-reflective processes of understanding, may not yet answer the question of how we actually understand others' mental states. In her comment on Reddy's and Gallagher's theories, she questions the extent to which these theories explain the phenomenon of social understanding as it is conceived in the cognitive science debate. As she states regarding Reddy's account:

The phenomena Reddy points to are well-known and hard to deny. Emotions do shape the way we experience each other. But the question is as to whether these phenomena help us to give new answers to the questions which the ToM debate is about: Do they allow us to acquire knowledge about the other's feelings or beliefs? (p. 236)

Similarly, she criticizes Gallagher for confusing two different notions of understanding: namely, understanding others in terms of their mental states and understanding as basically engaging or interacting. Although engagement and interaction are important and constitutive for social understanding, they cannot be confused with it; contrary to what Gallagher (2008b) claimed, social cognition is not the same as social interaction (Dullstein, 2012).

These questions are particularly relevant to the issue at stake in this paper: although (as we shall argue later) interaction and engagement with research participants are of core importance for methodological reflection, research in psychopathology aims at understanding patients' meanings, beliefs, and motives, not just at empathically grasping them.

<sup>11</sup>As Schilbach et al. (2013) also noted, Gallagher's initial notion of direct perception may still fall into an observer epistemology: knowing other minds means perceiving them.

<sup>12</sup>Indeed Gallagher (2008b) maintained that perception is "smart" even when directed to material things: "I do not see red mass, shape, and color, and then try to piece all of that together to make it up to my car. I simply and directly see my car. (…) I see the car not just as some object among others, but as an object that I can use—that I can climb into and drive." (pp. 356–357)

<sup>13</sup>Because it upholds the recognition of an irreducible otherness, the concept of empathy cannot be conflated with a first-person, simulationist approach. For a more detailed discussion of the topic, see Jardine (forthcoming).

As Zahavi (2010) clearly outlines, drawing on Schutz's insights:

Although on Schutz's view it is permissible to say that certain aspects of the other's consciousness, such as his joy, sorrow, pain, shame, pleading, love, rage and threats, are given to us directly and non-inferentially, he denies that it should follow from the fact that we can intuit these surface attitudes that we also have a direct access to the *why* of such feelings. But when we speak of understanding (the psychological life of) others, what we mean is precisely that we understand what others are up to, why they are doing what they are doing, and what that means to them. To put it differently, interpersonal understanding crucially involves an understanding of the actions of others, of their whys, meanings and motives. And in order to uncover these aspects, it is not sufficient simply to observe expressive movements and actions, we also have to rely on interpretation, we also have to draw on a highly structured context of meaning (Zahavi, 2010, p. 297).

By emphasizing the role of pre-reflective understanding, in which we can transparently grasp intentions and emotions of others, most exponents of the second-person approach (Gallagher, 2008b; Fuchs and De Jaegher, 2009; Fuchs, 2012) see this intersubjective endeavor as mostly unambiguous: "in our everyday engagements we do not constantly go around trying to solve puzzles" (Gallagher, 2008a, p. 169). However, they do not deny that behavior may actually become ambiguous in many situations and in these cases, since we cannot rely on primary embodied understanding, we need to start reflecting on the other's mental states, motives, and intentions. This is the place where TT and ST still play a role in understanding: we may in fact need to assume a more detached stance toward others and try to infer or simulate their mental states in order to understand them (Gallagher, 2008a,b; Fuchs, 2012).

Ratcliffe (2006) argued against the need to return to a first- or third-person perspective in order to explain higher-level processes of understanding: "all instances of interpersonal understanding are interactive. A wholly detached, theoretical I-he/she/it stance is something that is never adopted towards persons. Even third person stances are interactive and should not be identified with the impersonal stance of scientific enquiry" (p. 42; see also Di Paolo and De Jaegher, 2012). Taking seriously the constitutive role of the interaction process, which is one of the core assumptions of the second-person approach, Ratcliffe (2006) denies that even more reflective processes of understanding can be seen as one person attributing mental states to, or unidirectionally interpreting, another person: "B is not just interpreted by A but is also constitutive of the process through which A interprets A, B and the relationship between them" (p. 40)<sup>14</sup>. Therefore, social cognition should rather be seen as a collaborative enterprise of mutual understanding about the persons involved, their beliefs, experiences, and emotions (Dullstein, 2012). At the linguistic, conversational level, this process could be described in terms of Gadamer's (2004) hermeneutic circle: a mutual agreement, co-constructed in the interaction, about an object, which in this case is one of the persons involved. Similarly, at the implicit level, the same process may be understood, with Waldenfels (1979), as a mutual tuning of the two partners involved, as happens, for example, in caregiver–infant proto-conversations (Dullstein, 2012).

As is clear from Zahavi's (2010) words, to understand others we rely not only on pre-reflective processes of perception in the here and now encounter but also on interpretation and on "highly structured contexts of meaning" (p. 297). Social understanding and meaning-making do not happen in a vacuum: as the British anthropologist and cyberneticist Bateson (1979) put it, "without context, words and actions have no meaning at all" (p. 15). Therefore, depending on the context we are in, our behaviors, our beliefs, and the meanings we attribute to our own and other people's experiences and relationships may vary; thus we may position ourselves, and be positioned by others, in different ways. In their Coordinated Management of Meaning (CMM) theory, Cronen et al. (1982) showed how, in the here and now situation, different levels of meaning intertwine and are coordinated in mutual interaction with others, from the episode up to the personal history, the history of the relationship, and the cultural framework. All these aspects play a constitutive role in social understanding and come into play in every social encounter.

# **METHODOLOGICAL IMPLICATIONS FOR A SECOND-PERSON PSYCHOPATHOLOGY**

If we adopt a second-person perspective in understanding social cognition, what are the implications for the particular kind of interaction that is the focus of this paper, namely the relation between a researcher and a person presenting with a psychopathology? How may the insights coming from the social cognition debate inform the methodological process of research in psychopathology?

If we start from Ratcliffe's (2006) last (and strongest) claim, that any kind of interpersonal understanding is always constituted and influenced by the interaction in which it arises, we begin to see that the research process is not as linear as it might seem. There is no epistemic subject (the researcher) gathering information about an epistemic object (the patient), but a dynamic process of sense-making in which both participants, as well as the interaction and its context (or setting), have a constitutive (although different) role. Interpersonal understanding conceived as a collaborative enterprise points to the active role of research participants in the constitution of knowledge and to the relational nature of the elicited data; even in experimental studies in psychology, participants' behavior is always an answer to a question posed by the researcher (Rommetveit, 2003). Indeed, especially in psychopathology research, one needs to acknowledge that patients are not passive objects to be analyzed; according to a second-person approach, they always contribute to the process of understanding. As Rommetveit (2003) puts it: "Coauthorship of psychological theory on the part of the human informant is an epistemologically unique and distinctive feature of the psychology of the second person as a communicative genre" (p. 212).

<sup>14</sup>Ratcliffe's claim touches upon the core underpinning of social constructionism—although phenomenology differs from social constructionism in its ontological and epistemological claims—that is, the role of conversational processes as constitutive for social understanding; these claims are therefore also tightly linked to ideas of circular causality as put forward by cybernetics and systemic thinking.

These considerations necessarily raise the issue of validity in psychopathology research: are our descriptions and theories actually about what we claim to be the object of our research (i.e., the patient's experience)? If, as Rommetveit (2003) claims, psychology is a communicative genre, the data we elicit always contain information not only about the other but also about ourselves. Moreover, drawing on Reddy's (2008) account, we may push this argument even beyond the level of communication, into the very pre-reflective process of perception:

Our perceptual experience of another person's frown or smile or tears, therefore, must always include in it our proprioceptive experience of our own bodily state and, most importantly, our affective and motivational state. Conversely, our proprioceptive experience of our own acts and reactions and feelings always involves the perception of what relevant others are doing, saying or feeling. As the psychologist John Shotter put it, there is a constant intertwining and intermingling of the two (p. 30).

Although Reddy (2008) argues that within active emotional engagement this link between proprioceptive experience (of the self and of self-feelings-for-the-other) and perceptive experience (of other-feelings-for-the-self) is much tighter than in uninvolved observation, she also acknowledges that this intertwining still happens in more disengaged stances. Methodologically, it is therefore necessary to acknowledge this link and, for the sake of validity, to find ways to disentangle it.

In contrast to quantitative research methods that postulate a neutral observational position for the researcher, qualitative methods in psychology (and therefore in psychopathology research) acknowledge reflexivity: the researcher, in gathering the data and producing the analysis, is always a constitutive and influential part of the research process (Dallos and Vetere, 2005; Lyons and Coyle, 2007). This core methodological concern in qualitative research is dealt with through different strategies. These range from an explicit consideration of the researcher's "speaking position" (i.e., the epistemological framework), which is common to all qualitative methods; through finer techniques that allow a thoughtful inclusion of the researcher's feelings, impressions, and assumptions in the analysis process (as is common in, e.g., Interpretative Phenomenological Analysis, Grounded Theory, or Narrative Analysis); to actual cooperative (or co-authoring) research designs in which the participant becomes actively involved in checking validity, for example, through respondent validation (Dallos and Vetere, 2005; Lyons and Coyle, 2007)<sup>15</sup>.

We consider reflective practice, in its different forms and techniques, a very important methodological step in the research process. Reflecting on one's own theoretical assumptions and research questions, but also on one's own personal motivation and personal history, is a way of acknowledging the intersubjective character of the research endeavor, which includes the researcher as a constitutive part. Di Maggio et al. (2008) have argued that autobiographical memory plays an important role in understanding others (especially dissimilar others), and they therefore suggested that self-reflection may enhance the possibility and accuracy of social understanding<sup>16</sup>.

Another methodological implication of a second-person perspective, which again seems to be consistent with qualitative research methods, has to do with idiography. As briefly mentioned in the previous section, Reddy (2008) highlighted the importance of acknowledging the other in his or her difference, rather than reducing the other to a category or to his/her similarity to ourselves. From a second-person perspective, we see the other as a particular other:

A second person perspective pluralizes the other: there is no such 'the other' but different others depending of different degree and type of engagements. Engagement in the second person allows us to experience others within our emotional responses to them as particular others—an experiencing which is more than simply a recognition of their similarity to ourselves. (p. 27)

Similarly, idiography is concerned with the particular person: in contrast to nomothetic approaches, which are concerned with making claims at the population level and demonstrating general rules, idiographic approaches value the in-depth and detailed analysis of particular cases. There is no general "other" that may be equated to an average or a category, but single persons and single encounters to be understood in their own right. This is not to say that idiography eschews generalization; rather, its strategies for generalizing differ, and the methodological focus is on validity rather than on reliability (Smith et al., 2009). A focus on in-depth analysis of single cases has also been stressed by phenomenology: "It is not so much the number of cases seen that matters in phenomenology but the extent of the inner exploration of the individual case, which needs to be carried to the furthest possible limit" (Parnas et al., 2013, p. 273). Here, generalization is based not on a statistical average but on the typicality of a case (a prototype). In fact, the most illuminating cases are often not the most common ones (statistically speaking) but rather the exceptional ones; in this sense, generalizing from these cases qualitatively expands our understanding of the phenomenon studied (Parnas et al., 2013).

As we have seen, the recognition and acknowledgment of the other person in his or her difference and uniqueness, the active role of the other person in the process of interaction, and the constitutive influence of the interaction process itself on social understanding are core claims of a second-person perspective that have important methodological implications. However, a second-person approach makes us aware not only that our knowledge about the epistemic object comes from our relationship with it, but also that this relationship is mainly played out at the embodied level of engagement and empathy, which constitutes the core of social understanding. As we have outlined, phenomenological approaches have contributed to the social cognition debate by highlighting the role of direct, pre-reflective processes of understanding that take place in the actual encounter between embodied subjects.

<sup>15</sup>Importantly, as Davidson (2003) noted, in psychopathology research, including the patient as a partner in the research enterprise does not mean considering him/her a fellow scientist: "the role of the participant in our research is not to be a fellow psychologist but to be precisely what she or he is: (e.g.) a person experiencing life with schizophrenia." Cooperation with research participants is indeed based on the recognition of differences and of different roles.

<sup>16</sup>We do not agree with the simulationist approach proposed by Di Maggio et al. (2008) or with the emphasis they place on processes of mentalizing and mindreading. Yet we believe that some of the insights proposed in their paper may be interesting even when viewed from a different theoretical framework and applied to methodological issues.

Consistently with this, within the tradition of phenomenological psychiatry, the emphasis on pre-reflective engagement and on the importance of empathy for understanding others emerged in techniques such as "the feeling diagnosis," in which the clinician's emotional reaction to the patient was considered a way to understand psychopathology (Reddy, 2008).

The relevance of the embodied here and now situation of the clinical interview has also been stressed by the more recent phenomenological approach of Parnas et al. (2013). They contrasted a phenomenological method of interviewing with standard structured assessments, underlining how the interaction between the interviewer and the patient should be structured as a mutually interactive reflection: a dialogical I–Thou situation in which the interpersonal rapport is crucial for eliciting the patient's experience in its full complexity and for understanding meaningful connections (Nordgaard et al., 2012). This stance is first of all based on a "phenomenological reduction":

What a phenomenological interviewer attempts to do is to suspend the standard presuppositions of the shared, commonsense world, the unquestioned, commonsense background with its assumptions about time, space, causality, and self-identity, and about what does and does not exist as "real." (Nordgaard et al., 2012, p. 360).

This first step allows the interviewer to be open toward the other and to engage in a truly second personal and dialogical process of exploration, rather than monologically leading the interview according to predefined assumptions.

Notwithstanding the importance of this methodological shift, a second-person method cannot be limited to the here and now encounter between two embodied subjects. The intersubjective endeavor of the research process does not end with interviewing but continues throughout the whole process of analysis, and a thorough methodological reflection on this process seems to be missing in contemporary phenomenological psychiatry.

As Dullstein (2012) noted, pre-reflective engagement and the acknowledgment of the other person in a truly second personal stance do not yet answer the question of how we understand the other person's beliefs, intentions, and motives. Similarly, even if a phenomenological interview allows a much more detailed and coherent description of the other's first-person experience, the question of how to understand these data still remains unanswered. In the here and now moment of encounter with the patient, the researcher, by bracketing his own assumptions, opens a space in which the other's experience can be freely elicited in its full complexity and, by taking an I–Thou stance in the interaction, he can have an implicit, direct grasp of the patient's experience. But how can we understand what we cannot immediately empathically grasp in the interaction? How can we make sense of ambiguous or bizarre behaviors<sup>17</sup> (which often lead to the diagnosis of a psychopathology) that do not actually appear meaningful to most of us?

As mentioned above, in cognitive science, the problem of how to understand the other in ambiguous situations, when primary and pre-reflective intersubjective processes of understanding are not enough, was often solved through a shift from an implicit second-person stance to an explicit third or first personal, reflective stance. The same shift can be often witnessed in psychopathology research, when moving from the here and now interview situation to the actual process of data analysis.

For instance, the EASE interview (Parnas et al., 2005), created for exploring anomalous self-experience in schizophrenia, is based on a phenomenological second-person understanding of the interview process which allows a thorough exploration of the patient's experience. Yet, the way this experience is accounted for in the analysis process seems to fall back into a third-person approach, since a checklist is used for evaluation. In fact, by using a checklist, the researcher reads the data (the elicited experience of patients) according to a "normative" theory, i.e., s/he looks for and selects the patient's words that fit into his/her theoretical framework, which is defined *a priori*. By doing this, the researcher assumes an independent and neutral third personal stance. Although the EASE checklist is inspired by a phenomenological theory of schizophrenia, this does not ensure that the methodology is truly phenomenological or second personal.

We do not deny the usefulness of checklists and of third-person approaches in general. Sometimes they constitute a necessary step for the research process, which should ideally combine different methods or tools; we believe that methodological pluralism is the way to go. Nevertheless, when applying a third-person method, it is important to be aware of its implications and, as highlighted above, of the problems that come with it. Using a checklist to read through empirical data may indeed be a useful way to validate a theory; on the other hand, though, if the authority of the analysis process remains with the theory (as in the case of third-person methods) the risk is to fall into a tautological process, where a theory is built on a reading of empirical data according to the same theory. In order for a theory to develop further, we believe that a second-person stance is necessary (at least as a step in the research process) to re-allocate the authority of the analysis process to the other's experience (see the end of this section for a further elaboration on this point).

Another example of this methodological issue is Davidson's (2003) qualitative phenomenological analysis of interviews with persons with schizophrenia. As in the case of Parnas' studies, Davidson's interviewing technique is phenomenological, i.e., based on phenomenological reduction and on a dialogical second-person stance toward the other. The process of analysis, though, seems to be rather first personal in the method applied to understanding the elicited narratives.

<sup>17</sup>By this expression we refer to those experiences that in most cultures are perceived and/or defined as extraneous to common sense understanding, e.g., psychotic experiences, hallucinations, and delusions, although this definition is itself at the core of an animated debate.

This process is in fact mainly based on the concept of empathy, here conceived as an imaginative transposition of oneself into the other's place:

In cultivating empathy for another person's experiences, we have found it useful to build imaginative bridges between his or her experiences and our own. We do this—especially in cases in which the meaning of the experience is far from obvious—as one might do in certain acting classes, by recalling experiences in our own lives that have similarities to the experiences in question (Davidson, 2003, p. 123).

Although "stepping into the other's place" is methodologically very important if we are to get as close as possible to the participant's original experience, the worry with a first-person method is whether this is enough to grasp his/her "otherness," i.e., the aspects of his or her experience that I would not grasp even if I were in his or her shoes, because I am a different person.

As highlighted in the above discussion of third- and first-person methodology, the Procrustean risk of going down these routes is that we either try to fit the patient's experience into our own theories (eventually leading to tautology) or reduce it by analogy to our own experience. Although we acknowledge the importance and value of both Parnas' and Davidson's work, with these two examples we wanted to show how, by grounding the validity of our understanding only on the here and now engagement with the patient (e.g., in the interview method), we may fail to account for his/her "otherness," the aspects of his/her experience that we may not immediately grasp or empathically understand.

In order to overcome this methodological problem, Stanghellini and Rosfort (2013) proposed the notion of "second-order empathy," as a valuable alternative that goes beyond both the phenomenological notion of primary non-conative empathy and the conative notion of empathy. Non-conative empathy is the most basic form of empathy: the pre-reflective resonance between my own and the other's lived body that allows a direct, implicit understanding. Conative empathy is a more reflective and cognitive task that requires more than implicit attunement at the level of the lived body. Conative empathy is based on one's personal past experiences and knowledge of commonly shared experiences (common sense), and it consists in an active reflective act of understanding by analogy: "I look inside myself for stored experiences to make them resonate with those of the other" (Stanghellini and Rosfort, 2013, p. 342). By contrast, second-order empathy does not rely on similarity or analogy with the other, rather being based on the recognition of the other's autonomy: "In order to empathize with these persons, I need to acknowledge the existential difference, the particular autonomy, which separates me from the way of being in the world that characterizes each of them" (Stanghellini and Rosfort, 2013, p. 343).

Through the recognition of difference, the process of interpersonal understanding takes the form of a hermeneutic circle of negotiation of meaning between two autonomous subjectivities. Stanghellini (2010) therefore proposed hermeneutics as a framework for understanding psychopathology, which may be coherent with a second-person stance:

Second-person understanding, which requires an involvement (engagement) of the researcher (interviewer), but not of the kind that may obstruct the reliability of results, complements the first-person approach. It envisions understanding not as the effect of the empathy or the internal actualization of the other's experience, but as an open cycle of questions and answers between interviewer and interviewee. Dialogue, seeking corroboration of the interviewer's constructs and the interviewee's self-understanding, is the major method of inquiry for structural psychopathology. (pp. 323–324)

As Blankenburg (1980) stressed, although from a phenomenological stance the researcher tries to bracket his own assumptions in order to get as close as possible to the other's experience (trying to grasp it in its own autonomy), it is inevitable that one's own subjectivity enters into the process of interpretation.

An integration of phenomenology and hermeneutics has already been recognized as pointing in the direction of a second-person methodology, although the combination of the two has so far been rather unsatisfactory: hermeneutics has been considered mainly for its role in interviewing techniques (Stanghellini, 2007, 2010) or in psychotherapeutic praxis (Fuchs, 2010). Instead, we argue that hermeneutics (together with phenomenology) should be taken seriously as a methodological grounding for the process of understanding at play in psychopathology research.

Integrating phenomenology and hermeneutics, Smith et al. (2009; see also Smith, 2004) developed a method of analysis, Interpretative Phenomenological Analysis (or IPA), that we propose here as an example of a second-person methodology that may be a valuable tool for psychopathology research. Without going into the technical details of IPA, we consider IPA as a valid and non-reductive attempt to grasp the other's experience: namely, a dynamic understanding that goes from the *within* (the patient's experience) to the *between* (the researcher and the patient) and back.

The dual process of understanding in IPA unfolds through a *double hermeneutics*, i.e., a circular movement like a dance, where, on one hand, we (try to) bracket our own prejudices and we empathically engage with the other, taking on an insider stance led by a *hermeneutics of empathy* (Smith et al., 2009); on the other hand, we use our own impressions, feelings, theoretical assumptions, and even critical stance for interpretation (*hermeneutics of questioning*). This accords well with what Reddy (2008) has argued, namely that a second-person methodology needs to be balanced between engagement and disengagement, being involved and at a distance, stepping into and out of the frame to explore it better.

In this dual process, we temporarily try to suspend (or better, bracket away, in the sense that they are acknowledged and tracked down, not ignored) our own personal lens to become more sensitive to the experiences of the other during both interviewing and analysis. When reading the transcripts, we note different kinds (descriptive, linguistic, interpretative, and self-reflexive) of comments at both margins of the text and we make use of a research journal to track and bracket our thoughts that may be later integrated in the interpretation. To put it in Smith et al.'s (2009) terms:

By focusing on attending closely to your participant's words, you are more likely to park or bracket your own pre-existing concerns, hunches, and theoretical hobby horses. It is not that you should not be curious and questioning; it is that your questioning at this phase of the project should all be generated by attentive listening to what your participant has to say. (p. 64)

The second step of the analysis process is rather interpretative: we do make sense of the other's experience from our personal stance and theoretical framework. However, if we are to avoid a third-person theorizing stance, interpretation cannot be based on a hermeneutics of *suspicion*<sup>18</sup> (Smith et al., 2009), where we understand the other's experience according to a theoretical perspective from the outside (an outsider expert stance, as for instance in psychoanalysis): the authority that should give validity to our claims is the experience of the other (Smith et al., 2009).

Interpretation is therefore here a reading from *within* the participant's experience<sup>19</sup>; yet it emerges out of a continuous process of interaction *between* the researcher and the participant in the situated context, as meaning making does not happen in a relational void.

Coherently with a second-person perspective, Brown et al. (2011) contend that IPA provides a valuable alternative to various research methodologies that fail to account for the lived totality of individual experience, which is often either fragmentized and broken into separate components (e.g., cognition, emotion, memory, personality) or reduced to other analytic frames at broader social levels (e.g., discourse analysis).

However, this approach also has its limitations. First of all, it often fails to grasp the embodied level of meaning-making which lies at the core of any phenomenological encounter: what Brown and colleagues have called "the methodological problem of body in psychology" (Brown et al., 2011, p. 496; Cromby, 2012). To borrow Murray and Holmes' (2014) words:

And yet our impression of the IPA literature was that the body itself is often absent, or simply presumed to exist behind straightforward descriptions (or spoken testimony) from research participants, as if these descriptions straightforwardly conveyed what is called the lived-experience of the subject, his/her body, and his/her intersubjective relations with others. (p. 6)

Although a detailed methodological discussion of IPA is outside of the scope of this paper, this criticism is worth mentioning here as it touches upon one of the core aspects of a second-person approach: the primary embodied and pre-reflective processes that are always at play in social understanding.

Murray and Holmes (2014) recall Merleau-Ponty's (1973) original concepts of the embodied *parole parlante* (speaking speech) as opposed to *parole parlée* (spoken speech). Whereas the focus on "spoken speech" may seem to embrace the Cartesian reduction of the body to a lifeless object/matter (i.e., Husserl's *Körper*), Merleau-Ponty's phenomenology aims at understanding the embodied language rather than the abstract and decontextualized text: body and language are intertwined and inseparable. The participant's text is always embedded in the lived experience, its original context(s), and in the context of the intersubjective interview itself.

In most qualitative methods for analyzing interviews (IPA included) the "speaking speech" is often accounted for through the use of meticulous and accurate transcription procedures, which typically include taking notes on the participant's most evident para- and non-verbal behaviors (e.g., pauses, smiles, and crying) during the interview by inserting them into square brackets and, where relevant, commenting briefly on the episode. This practice has been criticized for failing to grasp the full embodied and intersubjective experience as situated:

It remains a (formalized, methodologically constrained) way of translating embodied experience into language: as such, it is just as likely to omit something of its ineffable quality as any other such attempt (…) it leaves the gulf between language and embodied experience intact whilst nevertheless giving the superficial appearance of bridging it. In this instance, then, it can appear as though embodiment has been addressed through the technical accumulation and management of detail (Brown et al., 2011, p. 499).

Brown et al. (2011) suggest that rather than seeking the solution in transcription techniques, the methodological issue of the body needs to be addressed differently. In this regard, we think it worth noting a particular technique often implemented in qualitative methods: the recollection of interview (otherwise also referred to as diary of interview or research journal). The recollection of interview is the first stage of IPA, where the researcher writes down all his or her immediate impressions, feelings, and thoughts that arose in the embodied encounter of the interview situation. If then integrated in the analysis process<sup>20</sup>, the recollection of interview may be seen as a better way to account for the "speaking speech" as well as for the intersubjective context of the participant's words.

Interesting alternative ways of analyzing lived experience in its full complexity (and not just as straightforward description of experience) may be found in attempts to look not only at the content level of what is narrated but also at the way contents are talked about in the situated interaction. For instance, Lysaker et al. (2002, 2003, 2005) put a particular focus on aspects like the coherence and quality of narratives for understanding patients' experience. Similarly, Seikkula et al. (2011) focused on the dialogical quality of therapeutic conversations for investigating the experience of change. A further remarkable example of this research strand is put forward by Ugazio et al. (2009, in press): in their analyses of therapeutic conversations they not only looked at the "narrated" meanings but also at the "narrating" and "interactive" levels, which refer to the more implicit, embodied and interactive dimensions.

<sup>18</sup>Smith et al. (2009) draw on Ricoeur's (1970) distinction between two opposed interpretative positions: the hermeneutics of empathy and the hermeneutics of suspicion. Whereas the first attempts to reconstruct the original experience in its own terms, the latter is based on theoretical perspectives from the outside for understanding the phenomenon. Smith et al. (2009) therefore argue for a center-ground position that combines the two.

<sup>19</sup>What Smith et al. (2009) call "a reading from within" was already mentioned by Blankenburg (1980) as "immanent interpretation." (p. 67)

<sup>20</sup>Although an integration of the recollection of interview in the analysis process is coherent with the IPA guidelines, this technique is not always implemented. In fact, because the IPA guidelines are quite flexible, many methodological decisions are left to the researcher's judgment.

Other methods have tried to include the embodied aspect of communication in the analysis process, as for instance the PRISMA method (Pieper and Clénin, 2010): a video-supported analysis method that uses the sensations, emotions, and thoughts of the researchers as tools for understanding.

Although some steps have already been made in this direction, the "methodological problem of the body" in understanding the other's experience still seems to be an open issue that needs to be accounted for, especially with regard to methodologies coherent with a second-person approach.

## **CONCLUSION**

In this paper, we have critically reviewed the main theories at the heart of the social cognition debate: looking into the core principles of the third-, first-, and second-person approaches, we have highlighted the implications and limitations of each theoretical stance. Moreover, we have outlined how the second-person perspective addresses and tries to solve different problems related to third- and first-person theories of social cognition. Following the different contributions to this debate we have also stressed how, even within a second-person proposal, some issues still remain controversial; indeed, the second-person approach does not yet provide a definitive answer to the dilemma of social understanding, but in our opinion it represents the most convincing account of social cognition put forward so far within this field.

We followed Reddy (2008) in maintaining that the problem of understanding others constitutes the core methodological issue of psychology research and that therefore the theoretical frameworks accounting for this problem should inform the very process of research in its methodological concerns: we do try to understand others when doing research in psychology. Thus, we decided to focus this paper on psychopathology research, making it a paradigmatic case of this methodological issue.

Accordingly, linking social cognition theories with research methods based on similar assumptions, we underscored how the shortcomings and implications of each theoretical stance could also be viewed as methodological problems in psychopathology research. Once the epistemological and theoretical frames are recognized and explicated, third- and first-person methods can be criticized according to the same arguments that deconstructed these perspectives in the social cognition debate: i.e., mainly, the assumption that, for understanding others, the researcher starts from an observational stance, which is detached and independent from the object of study; that this stance is in the first place observational and that therefore processes of understanding occur within the observer (denying the primary and founding role of the interaction in meaning making); that the primary processes of understanding are already based on theorization and inference, leaving out the immediate embodied level of engagement and direct perception (instead of emerging from the dynamic intersection between both levels); and (for a first-person approach) that social understanding is based on my own individual experience, in analogy with the other's, but disconnected from it.

By discussing and challenging these assumptions, we outlined what we consider to be the core principles of the second-person approach, drawing on the different contributions that constitute it: i.e., mainly, the recognition of embodied and more direct processes of social understanding as primary (and therefore the importance of non-conative empathy and engagement for understanding); the assumption that our everyday stance toward others is not observational but interactive; the importance of the social interaction process as constitutive for social understanding; the fundamental personalistic attitude we assume toward others as soon as we recognize them in their difference and we acknowledge them as responsive others (an attitude that is here seen as a pre-condition for social understanding). Furthermore, we support the claim that the intersubjective matrix of social understanding does not simply draw on the implicit immediate level of interaction but also on higher and reflective intertwined levels of meaning, which are therefore seen as unfolding in the form of a hermeneutic circle.

Finally, we looked into how second personal theories of social understanding can inform the epistemology and methodology of psychopathology research, by reviewing research principles, techniques, and methods that are coherent with this perspective. We do believe that a second-person perspective is the most convincing methodological framework so far put forward for psychopathology research as it best accounts for the validity of our claims about the other.

The aim of this paper, though, is not to defend one particular research approach against another, but rather to point out the different theoretical and epistemological assumptions supporting each methodological stance; therefore we critically discussed the limits and implications of different research methods. Although an integration of different techniques is needed and useful in research, and even first- and third-person methods should not be totally rejected, the problem of methodological pluralism centers on how to integrate methods in a complementary and meaningful way, so that we can preserve the validity of our final claims.

First of all, we believe that, in order to integrate different methods properly, a stronger critical awareness of their epistemological underpinnings, and their different targets and limits is needed. The reflections and discussions outlined in this paper are aimed at drawing the attention to this important issue, to enhance this awareness, or at least to offer some inputs for further debate. Secondly, if we are to avoid a view of methodological pluralism as a clash of (sometimes even contradictory) methods, research in psychopathology should be conceived within a broader theoretical framework addressing the problem of how we get to know others in the first place.

For instance, Reddy (2008) maintained, with respect to a second-person approach:

Disengagement is not only inevitable, it provides a valuable dimension to knowledge that is born within engagement. Buber, comparing the intense intimacy of the I-Thou way of knowing with the I-it way, pointed out (albeit poetically) the inevitability of the latter: genuine engagement for him was a time-limited phenomenon. (…) But this is not detachment; it is disengagement born within and alternating with, engagement. What psychological science need is a balance—engagement first and disengagement second—between the two. (pp. 34–35)

Similarly, according to a second personal framework, we can argue for the need to integrate quantitative methods (third-person stance)<sup>21</sup> with qualitative ones (first- or second-person stance) in psychopathology research; but, for this integration, the validity of the results should rely on the latter, rather than on an illusory objectivity of the former; as Reddy wrote for a second-person perspective, engagement comes first.

In this regard, we believe that a second-person framework should always inform psychopathology research, as in the end we can only know others intersubjectively, from the more embodied levels of participatory sense making in the here and now encounter, to the hermeneutic circles of interpretation where different contextual levels of meaning come into play.

## **ACKNOWLEDGMENTS**

We would like to thank Michela Summa, Zeno van Duppen, Stefano Micali, and Thomas Fuchs for the inspiring discussions and for the helpful comments on earlier versions of the paper. We also gratefully acknowledge the editor and the two reviewers. This work is supported by the Marie-Curie Initial Training Network, "TESIS: Towards an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

## **REFERENCES**


<sup>21</sup>Seen here as relying on a stance of "disengagement" rather than "detachment," according to Reddy's (2008) distinction.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 23 September 2014; published online: 17 October 2014.*

*Citation: Galbusera L and Fellin L (2014) The intersubjective endeavor of psychopathology research: methodological reflections on a second-person perspective approach. Front. Psychol. 5:1150. doi: 10.3389/fpsyg.2014.01150*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Galbusera and Fellin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Enactive account of pretend play and its application to therapy

# **Zuzanna Rucinska<sup>1</sup>\* and Ellen Reijmers <sup>2</sup>**

<sup>1</sup> Department of Philosophy, University of Hertfordshire, Hatfield, UK; <sup>2</sup> Interactie-Academie VZW, Antwerp, Belgium

#### **Edited by:**

Hanne De Jaegher, University of the Basque Country, Spain

#### **Reviewed by:**

Frank Röhricht, University of Essex, UK

Alemka Tomicic, Universidad Diego Portales, Chile

Maria E. Molina, Universidad del Desarrollo, Chile

#### **\*Correspondence:**

Zuzanna Rucinska, Department of Philosophy, University of Hertfordshire, De Havilland Campus, Hatfield, Hertfordshire AL10 9EU, UK e-mail: z.rucinska@hotmail.com

This paper informs therapeutic practices that use play by providing a non-standard philosophical account of pretense: the enactive account of pretend play (EAPP). The EAPP holds that pretend play activity need not invoke mental representational mechanisms; instead, it focuses on interaction and the role of affordances in shaping pretend play activity. One advantage of this re-characterization of pretense is that it may help us better understand the role of shared meanings and interacting in systemic therapies, which use playing to enhance dialog in therapy rather than to uncover hidden meanings. We conclude by bringing together findings from therapeutic practice and philosophical considerations.

**Keywords: enactivism, pretend, play, systemic therapy, dialog**

# **INTRODUCTION**

This paper explores one relationship between the philosophical understanding of pretend play and therapies that include symbolic play with objects in their repertoire.

In traditional therapies (and particularly psychodynamic therapies), play has been used to "uncover" problems of clients to allow therapists to "analyze" them. In those therapies, play in general (pretend play above all, but also playing with objects) is often seen as symbolic. Similarly, in philosophical works, pretend play, traditionally seen as symbolic play, is often characterized as a representational capacity whereby an object or behavior "stands in for" or represents another (see Mitchell, 2002). Mental representational structures dominate both the characterization and the explanations of pretense activities. Such a description of pretend play goes hand in hand with how playing is seen in therapy, namely as representing or denoting something true about the person who is playing.

Systemic therapy, however, is an approach that tends to focus on interaction and on maintaining a dialog between a therapist and a client (Watzlawick et al., 1967; Watzlawick and Jackson, 2009). It calls for a different view of play, in which play is not a tool for uncovering and interpreting meanings, but is seen as part of a "here and now" dialog that allows new meanings to be discovered with a client in order to facilitate his/her development of novel perspectives. The novel enactive account of pretend play (EAPP) proposes just such a view of play. Drawing on the functional–ecological approach to pretense (Szokolsky, 2006) and on a vast literature about the importance of interacting with objects in the development of cognition and in establishing pretense relationships (e.g., Piaget, 1962; Vygotsky, 1978), and motivated by the emergence of novel embodied, intersubjective and (radically) enactive approaches to cognition (Gallagher, 2009; Hutto and Myin, 2013), the EAPP highlights the role of interaction in pretense. It further focuses on the key role that the notion of *affordances* may serve in shaping pretense activities when playing with objects (and other people), and suggests that even symbolic play need not invoke mental representations (Rucinska, 2014a,b). The advantage of the EAPP is that it looks better placed to provide an understanding of the role that shared meanings and interacting serve in therapy that uses play. As such, it may fit better with the goals of systemic therapies, which focus on interaction.

In this article we aim to show that we can broaden the scope of what playing may be used for in therapy. We suggest a different function for engaging in play in therapy: one of creating a dialog, instead of being a mirror of reality. The EAPP gives further reasons why the "staying within play" approach (Rucinska and Reijmers, 2014) is beneficial in therapy, as it already pays special attention to engagements (active exploration of objects in relevant intersubjective contexts), finding mentality in the interactions and not in encapsulated mental representations.<sup>1</sup> Understanding the possibility of a different account of pretend play as proposed by the EAPP makes an interesting case for therapists to reflect on their therapeutic practice.

# **PLAY IN THERAPY**

In traditional therapies (particularly psychodynamic therapies), play has been used to "uncover" problems of clients to allow therapists to "analyze" them. Drawing, playing with building blocks or puppets, pretend playing, and role-playing are often used in various therapies, but as Russ and Fehr (2013) point out, "play therapy continues to be most associated with psychodynamic approaches." In these approaches, play expressions are seen as manifestations of hidden or repressed longings, fears, or conflictive attachments, and are to be interpreted by the psychoanalyst. Verbalization and active labeling of the feelings are said to help the child understand and deal with the causes of his/her feelings and behavior (Freud, 1966; Dolto, 1985; Axline, 1989).

<sup>1</sup>By "mentality," we broadly refer to kinds of mental or cognitive aspects of life.

Psychodynamic approaches are the dominant, but not the only, approaches to play therapy. In systemic therapies, for example, play is used as a vehicle for communication and for enhancing the dialog between the therapist and the client(s). The focus on interaction and communication in systemic therapy (see Watzlawick et al., 1967; Bateson, 1972; Watzlawick and Jackson, 2009) calls for a different view of play, seen as communication in context, and not as an expression of individual behaviors, thoughts, or feelings that are projected onto the play or play materials, as in more traditional play therapy theories. The focus is not on what the play means or what the play expressions stand for, but on how the therapist can engage with the client in play in such a manner that it will enable the client's change or shift of perspectives. As such, playing gives the therapist a new role.

We will highlight two challenges for systemic therapists that arise from seeing therapy as such a dialog. The first is to hold on to a "not-knowing stance" (Anderson, 2005, 2012). The second is *not* to attribute fixed meanings to play. Both challenges concern the therapist's temptation to analyze play from outside of the play and to take an expert position detached from the interaction (Cecchin, 1987; Cecchin et al., 1992; Bertrando, 2007).

Maintaining a "not-knowing stance" may be challenging for the therapist because refraining from interpreting play, especially violent play, is not easy. Extreme behavior of children during therapeutic sessions, for example, can create situations where a systemic therapist cannot see the play as an ongoing interaction and is tempted to seek a foothold in a "knowing stance." When a therapist is overwhelmed, he/she is likely to see a negative situation as a mirror of reality, not as the creation of a reality in an ongoing dialog. On the basis of experience and intuition, but probably also under the pressure of dominant play theories that stress the idea of play as an individual expression of feelings and longings, a therapist may determine a person's problems *ex ante*, without exploring them further. For instance, impressed by the destructive way a boy behaved at a therapeutic session, a therapist at the *Interactie Academie* made a direct and determinate link between the aggressive moments in the game and the absence of a father figure. The boy's aggression was no longer seen as a meaningful part of the game, but was perceived as an ever-present personal trait. At that moment, the therapist lost her creativity and the game, with its playfulness, stopped. However, when the therapist decided to take a different approach and introduce role-playing (where the players chose their characters and negotiated their roles), there was mutual engagement in the therapeutic session. It seems plausible to suggest that this positive effect was, in part, a result of not stigmatizing the boy's behavior and not attributing blame. This example shows that the knowing position of the therapist, linked with her interpretation of the boy's aggression as a hidden longing, can block creativity in play, whereas her focus on play without interpreting it created a different dynamic.

A related challenge for therapists is *not* to attribute fixed meanings to play, but to understand that play can carry a variety of meanings. Consider another case from our practice: a young boy playing with a dollhouse during a therapeutic session. The 9-year-old boy tidied the house, correctly arranged its furniture, swept the floors, and played the piano in the play. He did this without saying a word. Then, choosing carefully, he placed every object in one room of the dollhouse. Finally, he locked that room, leaving only empty rooms. The play appeared to be finished. In this example, the therapist is again at risk of searching for the meaning behind the boy's play, using dominant therapeutic theories and culturally embedded stories to analyze it. The dollhouse can be taken to stand in for the boy's home, or the play to stand in for his feelings toward his family, but one way or another, it is taken to represent an actual state of affairs.

We suggest a different approach to understanding play in therapy. In this case, the dollhouse need not stand in for the boy's specific feelings or family relationships; we have no way of knowing whether the play refers to the boy's home unless the boy explicitly says that the dollhouse is like his home. We suggest that therapists should not pay attention to *what could be* the hidden meanings behind play, but to *how* a client is playing at the time, and how the therapist can in turn play with a client to further influence and negotiate the play. We base this suggestion on the idea that the meaning of the play need not be seen as *hidden behind* the action, but as *being in* and *emerging out of* the action. Play—even pretend play—need not be seen as representing meanings, fixed by mental representations. To support this idea, we turn to the EAPP.

## **THE ENACTIVE ACCOUNT OF PRETEND PLAY**

In standard philosophical approaches, pretend play, traditionally seen as symbolic play, is often characterized as a representational capacity whereby an object or behavior "stands in for" or represents another. That is because pretense itself is taken to be a type of mental state that enables one to act as if one thing were another. The recurring aspect that underlies present pretense theories [whether metarepresentationalist (Leslie, 1987), behaviorist (Perner, 1991; Lillard, 1994; Harris, 2000; Nichols and Stich, 2003), or intentionalist (Rakoczy et al., 2004, 2005)] is the positing of mental representations. There are many ways to characterize mental representations, ranging from a stronger cognitivist reading, in which mental representations involve internal symbol-processing mechanisms with semantic information-bearing structures that store mental contents (Leslie, 1987), to weaker, action-oriented representations or some form of motor representations (Wheeler, 2005). However, Leslie's (1987, p. 414) definition seems best to capture the theoretical models of pretense mentioned above: "The basic evolutionary and ecological point of internal representation must be to represent aspects of the world in an accurate, faithful, and literal way, in so far as this is possible for a given organism." To explain the capacity to pretend, these theorists then postulate various kinds of internal cognitive mechanisms that manipulate the veridical mental representations to create new *pretense* representations (albeit through different means) that direct pretend play (Leslie, 1987; Harris, 2000; see Nichols and Stich, 2000, for the most elaborate mechanism).<sup>2</sup>

Such a description of pretend play goes hand in hand with how playing is seen in therapy, namely as representing or denoting something true about the person who is playing. Here we propose an alternative account of pretense, in which higher cognitive capacities such as offline symbol swapping need not be invoked. The EAPP proposes that basic cases of pretend play, like treating one object as another, require only active exploration of objects in a playful context, as supported by the theory of (social) affordances (Gibson, 1979; Noë, 2004; Chemero, 2009) and by agents' sensorimotor skills (O'Regan and Noë, 2001).

Affordances are to be understood as possibilities for action. To quote Noë (2004, p. 105), "Things in the environment, and properties of the environment, offer or afford the animal opportunities to do things (find shelter, climb up, hide under, etc.). (. . .) When you see a tree, you not only directly perceive a tree, but you directly perceive something up which you can climb." As such, the immediate environment can solicit certain actions and resist others. Applied to understanding pretense, we can think of objects as affording novel possibilities in and through the play. These possibilities depend on the actor's sensorimotor skills and dispositions, as well as on the object's properties, and the novelty and creative use of objects emerge through their interaction. Setting the interaction in a playful context also provides further flexibility to the actions, affecting the use that the objects solicit.

There is still considerable debate about what affordances actually *are*, that is, whether they count as relational properties or dispositions, or whether they are inherently social (elicited by interacting with other people) or canonical (elicited by a wider social context and narrative practices; see Costall, 2012).<sup>3</sup> Nevertheless, they are useful alternative constructs, for both philosophy and therapy, as affordances can take over some of the purposes that mental representations were supposed to serve. It is likely that in acting upon a prop (as in the banana–phone game), the player does not act independently of what is seen, but is guided by the prop (the banana) and perceives in action what the prop affords (making a call by holding it to the ear). Thus, affordances are strongly related to our sensorimotor capacity to interact with objects; they are best understood as the possibilities of action that come about in the interaction, as suggested by the action–perception–action loop: acting in the world brings about new affordances that further shape how you perceive and act on the world.<sup>4</sup>

This view reflects earlier, ecological approaches to pretense, where the nature of cognition is seen as dynamic and fluid, flexible and adaptable, and "pretend play is an especially good example of the fluid and dynamically intertwined presence of perception, action and cognition" (Szokolsky, 2006, p. 82). While more work is required to secure the EAPP, we provide here a first attempt at showing its benefits and relevance to therapy.

# **APPLYING THE NEW PLAY METHOD**

The EAPP can help us understand how to counter the two problems of systemic therapy: refraining from attributing prescribed meanings to behaviors, and taking the not-knowing stance. The notion of affordance can be useful for understanding that objects may not "denote" meanings but can instead "create" meanings by affording flexible actions to the actor. Regarding the not-knowing stance, following Costall, Szokolsky (2006, p. 68) explains: "Any object has an immense number of action possibilities, but these cannot be known in advance, in separation from the actor and the action." As we cannot know in advance what the objects can solicit in play (since their meanings are relatively flexible when negotiated in interaction), we should not fix our interpretations on them.

Taking an affordance-based view could give therapists a different way of making use of play in therapy sessions. Consider an example of the "staying within play" approach, which uses play as a dialog that enables the creation of new meanings (Rucinska and Reijmers, 2014). This approach relies on using objects to create a playful dialog and an embodied experience. For example, one client (John) was asked to pick an object that would stand for the problematic relationship he wanted to deal with (the object happened to be a flexible snakelike ornament), as well as objects to stand for the different feelings he had regarding this relationship (he picked a book, an eraser, and a colorful flower for his own feelings, and a sharpener, a feather, and a postcard for the feelings of his son). John was then asked to put every object somewhere in the room, giving it a place in relation to the snakelike figure. Afterward, the therapist started a dialog with John about the form, shape, and colors of the snakelike ornament and the way the other objects were placed around it. Further, the therapist asked John to reposition the objects, as well as to swap seats with the therapist, and inquired further about how the relationships between the objects made John feel, which arrangements made him feel comfortable, and what bodily and emotional changes he experienced when he moved the objects around.

<sup>2</sup> It can be argued that this notion of mental representation underlies the commitments even of theorists of pretence other than Leslie. Even the so-called "behaviorists" and "intentionalists" about pretence, who say that pretending is "merely acting as if," commit to the view that one is "acting as if" *a proposition is true*. For example, Harris (2000) claims that to successfully play banana–phone, a child must act according to a rule (or, as he calls it, a "flag") that "this banana is not a phone" and edit these rules to generate new flags through a propositional model, with "statements written on the various flags" (p. 66), while Rakoczy et al. (2005, p. 81) claim that "in pretending to pour the actor symbolizes 'there is water coming out of this container,' he acts as if it was true." This clearly indicates that, whether explicitly intended or not, these theorists too commit to the notion of mental representation of the stronger, semantic kind.

<sup>3</sup>For example, adults initiate and guide children's play by showing how to play, which the child imitates, and encourage pretend play through various forms of verbal and nonverbal feedback. Immediate dialogical interactions afford others as potential co-operators. Intersubjective contexts can allow new ways of understanding to be established in the interaction [in what De Jaegher and Di Paolo (2007) call "participatory sense-making"]. Social context determines whether there is a breakdown in the play (such as when "flying movements" are used in "elephant" play) or whether it is accommodated (as in "Dumbo the flying elephant" play). Such co-creation of meanings suggests that sensitivity to others' understandings, stemming from engagement in joint activity, allows new, shared understandings to develop.

<sup>4</sup>That action and perception are tightly bound has been proposed and defended extensively in the literature (Noë, 2004), and can be seen in empirical findings. For example, Held and Hein's (1963) famous "Kitten Carousel" experiment showed a significant difference in how the active kitten, which controlled its own locomotion, responded to its environment (avoiding visual cliffs, bracing itself when being placed on the visual cliff, or avoiding looming objects), as opposed to the passive kitten, which did not engage in such behaviors. This finding suggests that there is an action–perception–action loop, whereby engagement in moving around afforded the avoidance of visual cliffs. Thus, quoting Chemero (2009, p. 145), it is "more appropriate to understand affordances as being inherent not in animals, but in animal-environment systems."

John and the therapist stayed, so to speak, *in* the play situation and *in* the play language. While this did not mark the end of the therapy sessions, there was a clear positive gain stemming from this form of interactive communication and hands-on engagement with objects; as John mentioned afterward, "he enjoyed the session, felt less depressive, and had a more hopeful feeling about the relationship that troubled him." We believe this method allowed John to "position" himself differently with respect to the problem. It suggests that offloading the problem onto objects that one can literally manipulate (have a hands-on, embodied experience with) can have a great impact, allowing one to gain new perspectives and shift one's own attitudes (for more details on John's case, see Rucinska and Reijmers, 2014).

## **CONCLUSION**

In this article we have suggested a different function for playing in therapy: creating a dialog, rather than mirroring reality. This shows that a therapeutic conversation is more than words. Playing, as an embodied activity, adds to and reinforces the narratives, allowing new meanings to be created through object use and interaction with the therapist. Thus, while the use of creative methods and play is not new to systemic therapy, we believe that in the case above play served a special role: not only did it enrich the repertoire of the therapist, but it also allowed an embodied dialog to emerge. In this dialog, objects did not serve as "stand-ins" to be further analyzed; rather, meanings attributed to the objects were "offloaded" onto them to be further manipulated.

We also aimed to show that the EAPP, involving a concept of affordances, can help us further understand the effects of the "staying within play" approach. It can be useful for therapists to understand that the traditional way play is characterized (as representational) may be consequential and skew the focus of the therapy, as therapists tend to look for inherent meanings in play behaviors of clients and concentrate on what they symbolize. This takes away from focusing on the interaction itself, where new meanings and understanding between therapists and clients can emerge.

As mentioned earlier, dominant cultural and therapeutic narratives make it difficult to see interaction and use of objects in a play situation as a way of creating meaning; meaning is supposedly already established or assumed. Thus, if we were to *operationalize* what is going on in these therapies, we would introduce mental representations of the semantic kind. The practice of using play to "uncover meanings" of "suppressed feelings" would be best characterized as involving represented "meanings," "inherent" in the subject, whereby theorizing about them would be the right kind of practice to get to the mental life of the client.

Acknowledging the possibility of non-representational pretense motivates careful consideration of how play in therapy is to be understood, and broadens the spectrum of possibilities for therapists to use play in their therapy sessions. While taking a representational stance is tempting, it is not a necessary move, as an alternative is available. What is safe to say is that there seems to be a good fit between the EAPP and systemic therapy, in the sense that both focus on the interaction, where they locate mentality. As the EAPP clarifies, interaction is a basis for mentality and is already in some sense meaningful, so no extra level of mentality may need to be "uncovered." The EAPP also gives an alternative account of how the relevant changes in attitudes (shifts of perspective) can come about through engagement with objects, rather than through focusing on interpretations and thinking counterfactual thoughts.<sup>5</sup>

Ultimately, with this paper we have aimed to promote more interdisciplinary research to shed further light on the implications that theories (and theoretical jargon) may have for practice, within and between various disciplines. We hope to invite further research in psychotherapy and cognitive psychology, from developmental as well as cultural perspectives, using the notion of affordances and the EAPP.

# **ACKNOWLEDGMENTS**

This work was funded by the Marie-Curie Initial Training Network TESIS: "Towards an Embodied Science of Intersubjectivity" (FP7-PEOPLE-2010-ITN, 264828). With special thanks to Dan Hutto, Alan Costall, Vasu Reddy, colleagues from the University of Hertfordshire and the *Interactie Academie* for support received in relation to the work presented in the paper.

<sup>5</sup>The claim that psychodynamic therapy mostly involves a representational model should be further substantiated and tested against new developments within the wider field of psychodynamic therapy, such as the earlier-mentioned relational depth psychology approaches, which assign significant importance to the "here-and-now" interpersonal realities unfolding within the therapeutic setting (see Stern, 2004). But while it cannot be ruled out, for example, that psychodynamic approaches to uncovering meanings in therapy could be accommodated within the EAPP (under a different description of "meanings"), there seems to be a more natural fit between these approaches and the representational account of pretence.

# **REFERENCES**

Chemero, A. (2009). *Radical Embodied Cognitive Science*. Cambridge: MIT Press.

Costall, A. (2012). Canonical affordances in context. *Avant* 3, 85–93.

De Jaegher, H., and Di Paolo, E. (2007). Participatory sense-making: an enactive approach to social cognition. *Phenomenol. Cogn. Sci.* 6, 485–507. doi: 10.1007/s11097-007-9076-9

Dolto, F. (1985). *La Cause des Enfants*. Paris: Robert Laffont.

Freud, A. (1966). *Normality and Psychopathology in Childhood: Assessments of Development*. London: Hogarth Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 03 February 2015; published online: 02 March 2015.*

*Citation: Rucinska Z and Reijmers E (2015) Enactive account of pretend play and its application to therapy. Front. Psychol. 6:175. doi: 10.3389/fpsyg.2015.00175*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright* © *2015 Rucinska and Reijmers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**PERSPECTIVE ARTICLE** published: 04 August 2014 doi: 10.3389/fpsyg.2014.00850

# Understanding social engagement in autism: being different in perceiving and sharing affordances

# *Annika Hellendoorn\**

Department of Special Education, Centre for Cognitive and Motor Disabilities, Utrecht University, Utrecht, Netherlands

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

#### *Reviewed by:*

Hanne De Jaegher, University of the Basque Country, Spain

Maria Brincker, University of Massachusetts Boston, USA

#### *\*Correspondence:*

Annika Hellendoorn, Department of Special Education, Centre for Cognitive and Motor Disabilities, Utrecht University, Heidelberglaan 1, P. O. Box 80.140, 3508 TC Utrecht, Netherlands e-mail: A.Hellendoorn@uu.nl

In the current paper I will argue that the notion of affordances offers an alternative to theory of mind (ToM) approaches in studying social engagement in general and in explaining social engagement in autism spectrum disorder (ASD) specifically. Affordances are the possibilities for action offered by the environment. In contrast to ToM approaches, the concept of affordances implies the complementarity of person and environment and rejects the dualism of mind and behavior. In line with the Gibsonian idea that a child must eventually perceive the affordances of the environment for others as well as for herself in order to become socialized, I will hypothesize that individuals with ASD often do not perceive the same affordances in the environment as other people do and have difficulties perceiving others' affordances. This can lead to a disruption of interpersonal behaviors. I will further argue that the methods for studying social engagement should be adapted if we want to take interaction into account.

**Keywords: social cognition, theory of mind, embodied cognition, affordances, autism spectrum disorder**

How people are able to interact successfully with each other is a question raised and answered by researchers from different disciplines. While this question can be answered in numerous ways, the answer that emerges from a significant part of the literature is that people employ a "Theory of Mind" (ToM). Although there are different definitions of this concept, the term "ToM" generally refers to the ability to attribute mental states to the self and other people in order to explain and predict behavior (Premack and Woodruff, 1978; Baron-Cohen et al., 2000). ToM approaches assume that people have a ToM that enables them to *infer*, either explicitly or implicitly, the mental state of a person from that person's behavior (Van Overwalle and Vandekerckhove, 2013). This implies that ToM theory separates the (supposedly meaningless) observable behavior from the (meaningful) private mind in a Cartesian way, and ToM approaches have been criticized for this way of thinking (Gallagher, 2004; Reddy, 2008; Leudar and Costall, 2009/2011). From this perspective, one needs a ToM in order to interact successfully with other people. In addition to the criticism of Cartesian dualism, ToM approaches have also been criticized for isolating social understanding from actual engagement (De Jaegher and Di Paolo, 2007; Fuchs and De Jaegher, 2009). According to ToM approaches, meaning is constructed in the minds of social participants; the idea that meaning is created in the ongoing active interaction between persons is not taken into account (Fuchs and De Jaegher, 2009).

In contrast to ToM approaches, more embodied approaches assume that mind and behavior are not separate. People directly perceive other persons' intentions in their actions without the need for an indirect, implicit or explicit, process of inference and theory (Gallagher, 2001, 2004; Good, 2007). This is consistent with the concept of "affordances." Affordances are the action possibilities that the environment offers to an animal or person (Gibson, 1986). It is assumed that affordances are perceived directly, i.e., without the intervention of cognitive operations such as ToM (Gibson, 1986; Barrett, 2011). "Directly" does not mean that every affordance in the environment is automatically perceived and acted upon. The perception of affordances depends on the particular information that is picked up by the perceiver, and this information pick-up in turn depends on the characteristics and capabilities of the perceiver (e.g., the central nervous system, perceptual system, motor skills) and on the interaction that the perceiver has with the environment (Gibson and Pick, 2000). Thus, an affordance is inherently specific to a particular perceiver. What an object or the action of another person affords one person may be different from what the same object or action affords someone else. What is relevant in the environment cannot be separated from the perceiver; it is not pre-given. In addition to individual differences, different groups may also show differences in the perception of affordances. Differences have been found between novices and experts (Charness et al., 2001), between children with different motor skills (Adolph et al., 1993), between children and adults (Thelen, 2008), and between typically developing persons and persons with a physical or mental impairment (Loveland, 1991).

Gibson (1986) explicitly states that the perception of social affordances, which may be defined as the affordances provided by other people's behavior, is just as direct and based on the pick-up of information as the perception of affordances in the physical environment: "It is just as much based on stimulus information as is the simpler perception of support that is offered by the ground under one's feet. For animals and other persons can only give off information about themselves insofar as they are tangible, audible, odorous, tastable and visible" (Gibson, 1986, p. 135).

However, social affordances are also different from affordances of the physical environment: "They are so different from ordinary objects that infants learn almost immediately to distinguish them from plants and non-living things. When touched they touch back, when struck they strike back; in short they interact with the observer and with one another. Behavior affords behavior..." (Gibson, 1986, p. 135). This means that the actions of persons in social interaction are not only dependent upon the attunement (the particular information that is picked up) of both persons individually, but their actions are also dependent on the action of the other person. "What the infant affords the mother, is reciprocal to what the mother affords the infant" (Gibson, 1986, p. 135). Thus, social affordances are actively created and maintained by the joint action of two or more persons (Good, 2007). This is consistent with the idea that interactors' perception-action loops are coupled and interlaced with each other and that in social interaction agents participate in each other's sense-making (De Jaegher and Di Paolo, 2007; Fuchs and De Jaegher, 2009).

# **UNDERSTANDING SOCIAL ENGAGEMENT IN AUTISM**

It has been claimed that ToM theory can explain the social-communicative impairments of autism spectrum disorders (ASD; Baron-Cohen et al., 1985): "We have reason to believe that autistic children lack such a "theory." If this were so, then they would be unable to impute beliefs to others and to predict their behavior" (Baron-Cohen et al., 1985, p. 37). I have introduced the concept of social affordances as an alternative to ToM. Since affordance perception is based on the pick-up of information, the explanation for the social-communicative impairments in ASD from an ecological perspective should be sought in differences in information pick-up between people with and without ASD, and in the cascading effects these differences have on the interaction. Several theories and studies have indicated that both children and adults with ASD pick up different information compared to people without ASD (Mottron et al., 2006; Gepner and Féron, 2009; De Jaegher, 2013; Donnellan et al., 2013). Emotion perception is one example. Emotions can be viewed as social affordances in the sense that they call forth various interpersonal behaviors. For example, anger is likely to provoke avoidance, whereas joy is likely to encourage approach (McArthur and Baron, 1983). Studies show that the information that specifies facial expressions is a specific spatial integration of different facial features changing in a characteristic way. Perceivers respond to changes in the whole facial configuration. That information is critical and sufficient for face recognition and emotion perception (Tanaka et al., 1998; Behrmann et al., 2006a; Pellicano et al., 2006), and is largely supported by low spatial frequency information (Goffaux and Rossion, 2006).
Studies indicate that individuals with ASD are less sensitive to configurations than people without ASD and show enhanced sensitivity to high spatial frequency (fine perceptual detail, sharp edges) relative to low spatial frequency (general shape and large contour) stimulus information, compared to typically developing and developmentally delayed children, for both neutral and socially relevant stimuli (Deruelle et al., 2004; Vlamings et al., 2010). This is in accordance with personal accounts: "*I did not see the whole. I saw hair, I saw eyes, nose, mouth, chin,* ... *not face*." (Alex in Williams, 1999, p. 180). These studies suggest that the facial expression may not afford the "typical" social behavior for people with ASD, because the facial expression, specified by configural information, may be difficult for persons with ASD to perceive. Studies on biological motion support the idea that affordances are specified by a particular type of information that is detected by typically developing individuals, but not by individuals with ASD. Johansson (1973) designed experiments in which a few light spots show the motions of the main joints of a person. When a moving presentation of this minimal information is shown to typically developing persons, they can tell whether the point-light figure is walking, dancing, fighting, etc. Studies show that children with autism have difficulties recognizing biological motion and emotion from point-light displays, while typically developing children and children with spatial deficits and a degree of mental retardation are able to do so (Jordan et al., 2002; Blake et al., 2003; Annaz et al., 2010; Nackaerts et al., 2012). Children with ASD also show a different pattern of eye movements while viewing point-light displays (Nackaerts et al., 2012). Other studies that have tested information pick-up through eye-tracking methods confirm that there are clear differences in information pick-up between people with and without ASD (Klin et al., 2002). This means that what a situation affords a person with ASD is often different from what the same situation affords a person without ASD.

In addition, as stated before, behavior affords behavior. Therefore, the different information pick-up of a person with ASD will affect not only that person's actions, but also the actions of the other person(s) in the interaction. Typically, although there may be many individual differences in the affordances that people perceive, there is some common ground, in the sense that persons who are somewhat similar in capabilities, experience and culture perceive the same affordances in social interaction, i.e., they will act alike in a similar social context. However, it is well known that a person with ASD will often act differently from a person without ASD in the same context, in relation to both the physical and the social environment. Gibson (1986) notes that in order to become socialized a child has to perceive the affordances for herself as well as for others. In an interaction between a person with and a person without ASD, the dyadic partners may not be able to perceive the affordances of the other person, and this may disrupt the rhythm of interaction. Trevarthen and Daniel (2005) have, for instance, shown that parents of children with ASD have difficulties in engaging with their child and that these interactions are characterized by less rhythmic interaction. In triadic interactions involving an object, the fact that two partners perceive different affordances of the object may also lead to less smooth interactions. A preference for producing or observing spinning or rotating movements (spinning objects, watching a washing machine rotate, spinning the wheels of toy cars) is, for instance, common in children with ASD (Bracha et al., 1995). If one child is continuously spinning the wheels of a toy car while the other child is "driving" the car, this will probably decrease the amount of social interaction between the children.
Consistent with the idea that a different affordance perception in ASD may underlie their social-communicative impairments, several studies have indicated that disturbances in basic perception-action processes may underlie and are related to social-communicative impairments (Mottron et al., 2006; Gepner and Féron, 2009; De Jaegher, 2013; Donnellan et al., 2013; Kapp, 2013; Hellendoorn et al., 2014).

Although it should be taken into account that ASD is a pervasive developmental disorder which affects many developmental domains (Yirmiya and Charman, 2010), it is important to explain why there are *more* differences between people with and without ASD in the perception of social affordances than in the perception of affordances in the physical environment. Gibson (1986) already notes that: "The richest and most elaborate affordances of the environment are provided by other people. They move from place to place, changing the postures of their bodies... . The perceiving of these mutual affordances is enormously complex" (p. 135). Thus, there may be two explanations as to why social-communicative impairments are so pronounced in individuals with ASD. First, the perception of social affordances is different from the perception of affordances of objects because of *the nature of* social information. Social information consists of many features, and is dynamic and multimodal (McArthur and Baron, 1983). Several studies show that children with ASD have specific difficulties, both delays and impairments, with perceiving dynamic and configural information, also in non-social situations (Deruelle et al., 2004; Behrmann et al., 2006b; Gepner and Féron, 2009; Annaz et al., 2010; Vlamings et al., 2010; Weisberg et al., 2014). This implies that the differences in perceiving social affordances between people with and without ASD cannot be attributed to the information being *social per se*, but rather to the fact that many social affordances are specified by information that is difficult for people with ASD to pick up.
A second reason why people with ASD have the most difficulties in the social-communicative domain may be related to the aforementioned idea that the different affordance perception of a person with ASD has cascading and possibly disruptive effects on the whole interaction, since social affordances are actively created and maintained by the joint action of the actors in the interaction (Good, 2007). Thus, the affordances in the interaction between a person with ASD and a person without ASD will be different from the affordances in the interaction between two persons without ASD. It may even be that the social interaction affords disengagement to a person with ASD, because the different perception of affordances makes the interaction very unpredictable, uncontrollable and stressful for them. Without interaction with other persons, the person with ASD will never learn to perceive the affordances; moreover, disengagement prevents the creation of affordances in interaction with other persons.

For some people with ASD, social interaction affords a kind of disengagement in the sense that they explicitly theorize about social interaction instead of engaging in it (Williams, 2004). Below are a few examples of how high-functioning people with ASD describe what they are doing in the social environment: "I was a scientist trying to figure out the ways of the natives. I wanted to participate, but I didn't know how" (Grandin, 1996, p. 132; cited in Williams, 2004). "By studying an individual's posture, actions, voice tone, and expression, I can now usually work out what they are feeling." (Lawson, 2001, pp. 8–9; cited in Williams, 2004). The fact that high-functioning individuals with ASD can and do reason about social behavior does not imply that persons with ASD use ToM-style operations. On the contrary, the fact that people with ASD act in this way in social interaction, while typically developing people do not theorize about social behavior in most social interactions, may actually show that they do not perceive the same affordances in a social environment. While the social environment affords engagement for typically developing persons, it affords detached theorizing for these high-functioning persons with ASD. As Reddy (2008) notes, any "theory theory" is a very different kind of understanding from skilled interaction with the environment: more like the understanding of a bystander than that of a participant. This is also supported by studies showing that people's performance on ToM-like operations is not related to their skills in real-life engagement with other persons (Ozonoff and Miller, 1995).

# **TOWARD DIFFERENT RESEARCH METHODS**

Investigating social competence from an affordance approach requires research methods different from those that have been used to investigate social skills within a ToM paradigm. Research within the affordance approach should describe the information people respond to in social interaction, i.e., which information people use to inform their actions. People with typical and atypical development, but also, for instance, children and adults, could then be compared to examine whether they differ in the information they pick up and use in social interaction. The study of Klin et al. (2002) already provides an interesting example of such a design, comparing people with and without autism with regard to their focus of attention while viewing a social scene; however, the participants in that study were still rather passive and detached from the interaction, because they watched the social scenes on a video screen rather than being immersed in them. In line with the idea that cognition emerges in interaction, in a continuous perception-action cycle wherein behavior affords behavior, a study design with mobile eye tracking and coding of the behavior of all participants in a real-time social interaction could provide data that fit within the affordance approach to social perception. Since the actions of one person shape the actions of the other (i.e., behavior affords behavior), more attention should also be given to research methods that measure variables of the interaction (*inter*-personal variables) instead of focusing only on *intra*-personal variables. For instance, De Jaegher (2006) states that timing is a foundational aspect of successful social interaction and that it is disturbed in ASD.

In conclusion, "overcoming the myth of the mental," as Dreyfus (2006) puts it, is difficult, as indicated by the popularity of ToM approaches and other approaches that fit within a cognitivist tradition. An embodied ecological perspective may offer a fruitful alternative to these approaches for studying both social and non-social cognition. The concept of affordances does justice to the idea that mind and behavior cannot be separated. People with ASD are not attuned to the same information as people without ASD. This leads to the specification of different affordances and may have cascading effects on the interaction with other persons. In short, not only do people with autism experience or understand the world differently from other people; the environment (including other persons) really affords different behavior for them, simply because they are in it (Loveland, 2001).

# **REFERENCES**


face processing at an early age in autism spectrum disorder. *Biol. Psychiatry* 68, 1107–1113. doi: 10.1016/j.biopsych.2010.06.024


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 17 July 2014; published online: 04 August 2014. Citation: Hellendoorn A (2014) Understanding social engagement in autism: being different in perceiving and sharing affordances. Front. Psychol. 5:850. doi: 10.3389/fpsyg.2014.00850*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Hellendoorn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Intersubjectivity in schizophrenia: life story analysis of three cases

# *Leonor Irarrázaval<sup>1</sup>\* and Dariela Sharim<sup>2</sup>*

<sup>1</sup> Centro de Estudios de Fenomenología y Psiquiatría, Facultad de Medicina, Universidad Diego Portales, Santiago, Chile

<sup>2</sup> Escuela de Psicología, Facultad de Ciencias Sociales, Pontificia Universidad Católica de Chile, Santiago, Chile

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain

#### *Reviewed by:*

Paul Lysaker, Indiana University, USA

Mads Gram Henriksen, University of Copenhagen, Denmark

#### *\*Correspondence:*

Leonor Irarrázaval, Centro de Estudios de Fenomenología y Psiquiatría, Facultad de Medicina, Universidad Diego Portales, Av. Manuel Rodríguez Sur 253, Oficina 206, Santiago CP 8370057, Santiago de Chile, Chile e-mail: leonor.irarrazaval@udp.cl

The processes involved in schizophrenia are approached from a viewpoint of understanding, revealing those social elements susceptible to integration for psychotherapeutic purposes, as a complement to the predominant medical-psychiatric focus. Firstly, the paper describes the patients' disturbances of self-experience and body alienations manifested in acute phases of schizophrenia. Secondly, the paper examines the patients' personal biographical milestones, and the acute episode is consequently contextualized within the intersubjective scenario in which it manifested itself in each case. Thirdly, the patients' life stories are analyzed from a clinical psychological perspective, meaningfully connecting symptoms and life-world. Finally, it will be argued that the intersubjective dimension of the patients' life stories sheds light not only on the interpersonal processes involved in schizophrenia but also on the psychotherapeutic treatment best suited to each individual case.

**Keywords: schizophrenia, phenomenology, hermeneutic, intersubjectivity, life stories, clinical psychology**

## **INTRODUCTION**

Pathological experiences are usually described as phenomena divorced from the life context in which they manifest. In the field of phenomenological psychopathology, however, symptoms have traditionally been considered from a more comprehensive perspective: they are embedded in the person's life, so their contents and meanings can only be understood within the context of that life. In themselves "unhistorical," symptoms become meaningfully connected only within the comprehensive picture of the patient's life as a whole (Jaspers, 1997).

An even stronger claim could be made, namely that "no mental illness can be diagnosed, described, or explained without taking account of the patients' subjectivity and their interpersonal relationships" (Fuchs, 2012, p. 342). It is clear that psychopathological manifestations cannot simply be reduced to the workings of the nervous system (Fuchs, 2011). For that reason, the recommendation here is not to establish linear or "cause/effect" relationships, but to approach mental illnesses with the notion of a "circular" mode of causality, regarding them as emerging from subjective, neural, social, and environmental influences that continuously interact with each other (Fuchs, 2012).

Contemporary psychopathological phenomenology regards schizophrenia as a paradigmatic disturbance of embodiment and intersubjectivity (Dörr, 1970, 1997, 2005, 2011; Blankenburg, 2001, 2012; Fuchs, 2001, 2005, 2010a; Sass and Parnas, 2003; Stanghellini, 2004, 2009, 2011). From this perspective, it seems appropriate to use methods that characterize not only the patients' symptomatic disturbances but also the interpersonal processes involved, broadening the scope of exploration to areas not taken into account in the criteriological manuals of the diagnostic systems, the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD) (Fuchs, 2010b).

This paper presents the life story analysis of three cases that form part of the corresponding author's doctoral dissertation entitled "Study of disorders of the pre-reflexive self and of the narratives of first admitted patients with schizophrenia" (unpublished), covering a total of 15 patients with schizophrenia during their first psychiatric hospitalization.

The processes involved in schizophrenia are approached from a viewpoint of understanding, revealing those social elements susceptible to integration for psychotherapeutic purposes, as a complement to the predominant medical-psychiatric focus. Firstly, the paper describes the patients' disturbances of self-experience and body alienations manifested in acute phases of schizophrenia. Secondly, the paper examines the patients' personal biographical milestones, and the acute episode is consequently contextualized within the intersubjective scenario in which it manifested itself in each case. Thirdly, the patients' life stories are analyzed from a clinical psychological perspective, meaningfully connecting symptoms and life-world. Finally, it will be argued that the intersubjective dimension of the patients' life stories sheds light not only on the interpersonal processes involved in schizophrenia but also on the psychotherapeutic treatment best suited to each individual case.

Here, "life-world" refers to the person's subjectively experienced world, which emerges in the process of conceiving oneself and others through a history of social interactions (Husserl, 1970; Schutz and Luckmann, 1973; Varela, 1990; Varela et al., 1991; Maturana and Varela, 1996).

## **MATERIALS AND METHODS**

## **STUDY DESIGN**

The study was developed within the qualitative paradigm as an exploratory-descriptive study. Studies of this type proceed by inductive logic: in other words, both hypotheses and analysis categories are developed as the study progresses and emerge from the data themselves (Danhke, 1989, quoted in Hernández et al., 2003).

The so-called "critical case sampling" criterion was used, in which the interest in an in-depth approach to the phenomena means working with few cases, representativeness not being of key importance for these purposes. Thus, the significance and understanding that emerge from qualitative inquiry have more to do with the richness of the cases chosen and with the observational and analytical abilities of the researcher than with the size of the sample (Patton, 1990; Schwartz and Jacobs, 1996; Creswell, 1998).

## **PARTICIPANTS**

The broad research covered a total of 15 patients with schizophrenia during their first psychiatric hospitalization. All of them were male, aged between 18 and 25. These additional inclusion criteria were based on the following considerations: (1) accessibility of the sample, (2) sample homogeneity (Halbreich and Kahn, 2003), and (3) the earlier first onset and higher risk of developing schizophrenia in men (Aleman et al., 2003).

The three cases were selected because they represent different subtypes and can thus illustrate the interpersonal processes involved in schizophrenia, taking the intersubjective dimension of the patients' life stories into consideration. Cases 1, 2, and 3, as they appear in the paper, correspond to patients with diagnoses of disorganized-type, paranoid-type, and catatonic-type schizophrenia, respectively.

## **INSTRUMENTS**

#### *In-depth interviews*

In-depth interviews were used to gather qualitative data from the first encounter with the patients and from their life stories. These interviews had open questions aimed at allowing for a natural manifestation of the patients' accounts. For the first encounter, the recommendations on interviews for the phenomenological diagnosis of schizophrenia were taken into account (Dörr, 2002), and clinical biographical focus criteria were used to perform the life story interviews (Sharim, 2005).

## *Positive and Negative Syndrome Scale*

The Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987) is a rating scale used for measuring symptom severity of patients with schizophrenia. The name refers to the two types of symptoms: positive, which refers to an excess or distortion of normal functions (e.g., hallucinations and delusions), and negative, which represents a diminution or loss of normal functions.

## *The Examination of Anomalous Self-Experience*

The Examination of Anomalous Self-Experience (EASE; Parnas et al., 2005) is a semi-structured interview for the phenomenological examination of disorders of the pre-reflexive self, postulated as early markers or basic phenotype of the schizophrenic spectrum (Raballo et al., 2011). The EASE explores a variety of anomalous self-experiences, which typically precede the onset of positive symptoms and which also often underlie negative and disorganized symptoms (Parnas and Handest, 2003).

## **PROCEDURES**

Data gathering was performed by means of semi-structured interviews, which are characterized by the use of eminently "open" research questions. Less structured methods allow for the emergence of idiographic descriptions, personal beliefs, and meanings, focusing on "how" the psychological processes occur (Barbour, 2000).

Five encounters with each patient were carried out, distributed across the following three phases:

**Phase I:** A first encounter to record the patients' accounts of the disturbances of self-experience and body alienations manifested in the acute episode (30–45 min interview carried out 1–2 weeks after hospitalization), following confirmation of the diagnosis of schizophrenia by expert judgment and the standard diagnostic criteria of DSM-IV-TR (American Psychiatric Association, 2003) and ICD-10 (Organización Mundial de la Salud, 2003).

**Phase II:** Two subsequent encounters to carry out the EASE (Parnas et al., 2005; 30–45 min per interview carried out 1 month after hospitalization), once the patients no longer scored for "positive" symptomatology on the PANSS (Kay et al., 1987).

Note: The results of Phase II of the broad research have not been included in this paper. The results from the EASE exploration will be published in a complementary paper focused on basic self-disorders entitled "The lived body in schizophrenia" (in preparation).

**Phase III:** Finally, two further encounters were held to perform the life story interviews (30–45 min per interview carried out 1–2 months after hospitalization). The first encounter started with the open instruction "tell me about your self," "tell me about your life," while the second one was focused mainly on the patients' significant social interactions and personal meanings, also including their first image in life, their early dreams (hopes), their self-definition, and their expectations about the future.

All the interviews were video-recorded and fully transcribed for subsequent analysis. Extracts from the patients' accounts are quoted verbatim.

## **ANALYSIS**

## *First encounter (Phase I)*

The patients' accounts of the disturbances of self-experience and body alienations manifested in the acute episodes were summarized in descriptions containing the essential structure of the transcripts. These descriptions were obtained with the "Descriptive Phenomenological Method in Psychology" (Giorgi, 2009), which follows five steps: (1) the researcher reads the entire transcript in order to gain an overall sense of it; (2) the same transcript is then read more slowly, and a mark is made every time a transition in meaning is perceived, yielding a series of meaning units; (3) the researcher eliminates redundancies and clarifies the meaning of the units, connecting them to obtain a sense of the whole; (4) the resulting units are expressed essentially in the language of the subject, revealing the essence of the situation for him; and finally, (5) the understanding achieved is summarized and integrated in a description containing the essential structure of the transcript.

## *Life story interviews (Phase III)*

The life story analysis followed the criteria of the clinical biographical focus, which is part of the so-called "clinical human sciences" paradigm (Legrand, 1993; Sharim, 2005, 2011). This approach stresses the life story method, in which the clinical dimension is constantly present, working primarily on singularity: case-by-case, story-by-story.

At the same time, the examination of singularity and heterogeneity of individual situations allows the progressive appearance of common processes that structure behavior and organize these situations (Sharim, 2005, 2011; Cornejo et al., 2008). This method highlights the role of the subject in recounting his life story, making it possible to analyze the reciprocal relationship between the subject's determination by his history and his potential to create his own existence (De Gaulejac, 1999; De Gaulejac et al., 2005).

The in-depth analysis of the life stories was developed within a course led by the co-author of this paper. The course was called "Hermeneutic analysis of biographical material for the study of patients with schizophrenia" and took place during one academic semester at the Catholic University of Chile. The analysis focused on the personal meanings (Fuchs and De Jaegher, 2009) by following the patients' history of significant social interactions.

The transcripts were analyzed by two peer researchers (the corresponding author and the co-author of this paper), both clinical psychologists specializing in psychotherapy. To avoid bias, each researcher first carried out a separate analysis, and the two then met for a joint analysis, a procedure intended to ensure the validity of the qualitative research (Maxwell, 1996; Morrow, 2005; Fischer, 2009).

First, an individual (case-by-case) in-depth analysis of each narrative was carried out using a hermeneutic approach. In this analysis, each life story was reconstructed by ordering it thematically and chronologically, which enabled the identification of "biographical milestones" as well as the analytical axes of each life story. Second, a cross-sectional analysis considered all the stories together, revealing their differences, similarities, and shared structural dimensions.

## **ETHICAL ISSUES**

The broad research, covering 15 patients with schizophrenia during their first psychiatric hospitalization, was judged to entail no physical, psychological, or social risks for the participants by the following Ethics Committees, on the basis of the principles of the Declaration of Helsinki, the Council for International Organizations of Medical Sciences (CIOMS) 1992 International Ethical Guidelines for Biomedical Research Involving Human Subjects, and the 1996 International Conference on Harmonisation (ICH) Good Clinical Practice guidelines: (1) the Research into Human Beings Ethics Committee of the University of Chile's Medical Faculty, dated January 19, 2011; (2) the Research Ethics Committee of the Psychiatric Hospital, dated August 2, 2012; and (3) the Research Ethics Committee of the North Metropolitan Health Service (Santiago, Chile), dated August 16, 2012.

The Ethics Committees also approved the consent documents for the patients and their tutors (legal representatives). In this regard, the following ethical aspects were taken into account: (1) at Phase I of the study, informed consent was obtained from the patients' tutors by the attending doctor, since a patient affected by an acute episode of schizophrenia has diminished competence and must be authorized to participate; (2) consent was obtained directly from the patients at Phase II of the study; and (3) pseudonyms were employed to protect the identity of the patients and ensure confidentiality (internal codes replaced each patient's original name).

Note: Careful attention was paid in this paper to the protection of the patients' anonymity. Identifying information such as dates, locations, hospital numbers, etc., was avoided.

# **RESULTS**

## **INDIVIDUAL ANALYSIS (CASE BY CASE)**

## *Case 1*

Santiago (Santi) is an 18-year-old patient, diagnosed with disorganized-type schizophrenia. He has completed 8 years of basic school education. His father died of cancer 1 month before his hospitalization: until then, he lived with him and his two brothers. He is the middle brother. The patient's mother left home when he was 12 years old.

*First encounter.* A first interview was carried out after 2 weeks of hospitalization. In this encounter, the patient indicates that although he considers himself a "normal" person, he has begun to recognize a "*repetitive failure*." It is primarily the mediating process of thinking that has become the main impediment in this case.

The patient indicates that he hears voices, which are as if his own thoughts were repeated inside his head, like an echo, "*as if I was reading them aloud but with my mouth closed*." Most of the voices repeat meaningless things that he does not understand. He also hears voices on the radio, repeating what he is thinking: these are voices of unknown people who seem to be talking to him. Additionally, it sometimes seems to him that some television personalities repeatedly say all sorts of nonsense to him. He does not know how or why they do so.

There are periods in which the "repetitive failure" intensifies to the extent that it prevents him from leaving home, and only by going to bed to sleep is he able to take a break from these thoughts. This has made it difficult for him to concentrate or progress with his studies. He finds this situation annoying and harmful because it prevents him from living a normal life.

At first, the patient thought it was a sort of game, playing with the voices and thoughts, but he could not control it or stop it, and he kept on playing. At times this was unbearable for him, and it has even made him want to hang himself.

*Biographical milestones.* The life story interviews were carried out after 2 months of hospitalization. The patient was receiving the usual pharmacological treatment and had recently completed 12 electroconvulsive therapy sessions.

## *"My mum left me when I was 12"*

Santi begins his account by indicating that he has had a hard life. He refers to his parents' divorce, and particularly to when his mother left him alone with his brothers when he was 12. His mother moved away from the city and got married again. "*It was very hard, when she wasn't there and we lacked a mother's love*."

In addition to being angry with his mother when she left home, Santi also points out that he did not get on with her as a child. He remembers that she used to get very annoyed with him when he and his father sometimes made fun of her.

After 2 years, the mother returned for her children. Santi's brothers agreed to go with her, but he preferred to remain with his father, so from the age of 14 he lived alone with him. However, his brothers returned 2 years later, when he was 16, because of the serious situation with their mother's new husband, who beat them.

Santi states that he got on well with his brothers; they had an affectionate relationship, like that of friends. They helped each other out and shared the housework.

## *"I died in high school"*

Santi acknowledges that a significant change took place in his life at school. As a young child, he was a very good pupil and wanted to study medicine, but at the age of 12 he lost interest in his studies, skipped school, and began taking drugs. He had to repeat the last school year twice due to absenteeism. He liked the typical tools of the medical trade and wanted to have a stethoscope. "*Now that I have them here* (at the psychiatric hospital), *I ask myself, why can't I, if everyone else can?*"

He stopped taking drugs at the beginning of this year and returned to his studies. He wanted to study accountancy to earn money. He had recently started the first year of high school when he was hospitalized.

## *"My dad passed away recently"*

Santi states that his first memory is one of being with his family, when he was 7. It is a memory of the time when they were still living with their mother. He recalls it was his father who took them to a pretty square at the center of the city. "*Nice memories, everything was nice with my dad.*"

The father worked in the public sector and had taken early retirement, the reason for which is unknown. He did not remarry or have a relationship with another woman. Santi has a very positive image of him. He describes him as a hard worker and a good father who liked to go out and play ball with him and his brothers.

Santi displays an empathetic attitude toward his father, even a certain loyalty, which is made clear when he recounts the time when his mother left home, and later when his brothers left. In fact, he decided to stay alone with his father, despite the pain caused by the separation from his mother and brothers. "*My dad went through an extremely painful time, to put it one way, he didn't show it but, inside, he was feeling bad*."

The father passed away 3 months ago, from cancer, at the age of 65. He became ill a month before dying, and had immediately told his sons of his disease, so they were aware of how much longer the doctor had given him. The father was hospitalized at the time of his death.

Santi recognizes that he was very attached to his father, "*even too much*," as he puts it. He realizes that he still has not gotten over his father's death: "*because of my illness, I still have not gotten over it. I haven't realized what it all really means*."

## *"I see the future as nothing"*

Over the last 4 years, Santi has become more and more distanced from the world, to the point where he is extremely isolated. He has no friends, does not study or work, takes no part in social activities, and has not embarked on any romantic relationship.

During the week, he did nothing special beyond helping with some household chores, such as making lunch. Nor did he do anything special at the weekend, except go out to the square with his brother. He spent a lot of time in his room playing on his PlayStation. "*I see the future as nothing, the way I'm going. Not doing anything, not studying, because where will I get like this? It's looking bad, isn't it? I'm worried*."

*Life story analysis.* The patient took part in the interviews without any problems. He appeared interested in obtaining more information on his state of health and motivated to seek help to secure a speedy discharge. He interrupted the interviews on a number of occasions to ask what his illness was, if it was very serious and when his attending doctor would discharge him. Generally, he appeared constantly concerned about his state and anxious to put an end to his confinement.

His life story contains a series of events that could be regarded as stressful. It is certainly possible to establish a connection between the death of his father (i.e., the patient's state of grief) and the emergence of the first acute episode, and also to identify his mother's leaving home as the crucial biographical milestone in the development of the prodromal stage of schizophrenia. Somehow, the sense of abandonment in the world has come to dominate the patient's life.

The scale of the emotional impact of the recent loss of a father is obvious; nevertheless, the patient at no time displays any signs of sadness and does not cry. Instead of expressing emotion spontaneously, he rationally discerns the seriousness of the situation and, like a "witness," testifies to the tremendous impact this must have on his life.

He manifested an initial perplexity, conveyed with a degree of humor, in light of the apparent oddness and incomprehensibility of the account of his anomalous experiences ("the repetitive failure"). Nevertheless, although he recounts sad events in his life, any actual sadness can only be assumed. To put it one way, it is possible to "intuit" the patient's suffering, through the loneliness, abandonment and lack of support in his life, rather than by means of an explicitly emotional manifestation on his part.

The patient notices the paradox of his situation (being hospitalized) when he states that he regards himself as a "normal" person, except for his "repetitive failure." Far from being merely a game, as he previously regarded it, it is now called schizophrenia, a diagnosis that defines him as a seriously ill patient and justifies his compulsory commitment to a hospital. This has led him to realize that what is happening to him is not socially acceptable, and he therefore now judges it to be more serious.

## *Case 2*

Angel is a 22-year-old patient, diagnosed with paranoid-type schizophrenia. He has 11 years of school education and lives with his parents and the eldest of his three sisters. He is the youngest of the siblings and the only brother. His family are evangelical Christians.

*First encounter.* A first encounter was carried out a week into his hospitalization. The patient has not been able to find a convincing explanation for the fear he feels, which he recognizes as his major impediment. He thinks he could be delivered over to the Tribulation, a biblical time of suffering.

About 3 months ago, he began to feel persecuted by people. His house was the only place he felt safe, but for a few weeks now he has begun to feel unsafe even at home. The idea that somebody could hurt him comes from the fear he feels, and he thinks that the worst thing would be for somebody to kill him, by stabbing him, for example. This fear is a distressing feeling, a wish to escape, that comes over him when he suddenly feels that something bad is going to happen to him.

He is quite concerned about his problem and thinks a lot about it and about how to solve it. He wants to find a way to overcome the fear. He would like to find a "*clear and precise*" answer to what he should do, how he should live, and how to face up to his fear. He wishes that the Bible could tell him what to do in the Tribulation: "*if I was in that time, that it told me in light of this fear to do this or that, to face up to it, don't be afraid, I'll be with you*."

*Biographical milestones.* The life story interviews were carried out 1 month into the patient's hospitalization. He was receiving the usual pharmacological treatment, and his suitability for electroconvulsive therapy was being assessed.

## *"When I was a kid I went to school"*

Angel woke up one night and found himself alone at home: it was very dark and he started crying. This is the earliest image that he recalls from his childhood. He also remembers that he would sometimes run up the stairs because he thought that someone, "*perhaps the bogeyman*," was after him.

He remarks that his grades were not great but things went well for him at school. During his childhood, he felt good because he went out to play and climb trees. He also liked to fix televisions and take apart toy cars. He stresses the fact that he was more outgoing and playful as a child.

His family was always good to him, and he notes that he had a happy childhood. He was closest to his mother, as she stayed at home and was very attentive and loving toward him. His mother was of good character, and only punished him on a couple of occasions, "*because once I hit my sister with a hammer, when I was playing, and my mum punished me, she gave me a slap on the behind*."

## *"My sisters were very critical of me"*

Angel has three older sisters. He has had a difficult relationship with them, and particularly with the eldest. He points out that his sisters criticized him a great deal and made fun of him. Therefore, even as a child, he took great care to say the right thing, so as not to make a fool of himself and feel embarrassed.

He was concerned not only with saying the right thing but also with his personal appearance. He was very sensitive to the comments his sisters made about him. He states that he was very shy as a child, and when he was embarrassed by something he would run away and not want to come back.

"*Then I went to high school*"

At high school, Angel was unable to make friends. He notes that he changed, became less playful, less "chatty" and more reclusive. He did not play ball so much or join in with classmates as often.

He also comments that he found it difficult to appear in front of his classmates, and skipped school when he had to give a talk to the class on a subject. This got worse when he started to suffer from acne, which made him feel that people were looking at him too much, and left him feeling a little persecuted.

It was because of the acne that Angel began to skip school, until he stopped going completely and became totally isolated. "*By this point, the acne wasn't as bad, but it was the fact I missed school, I skipped class a lot, I was embarrassed that I skipped school so much, and that's why I stopped studying*."

"*Then I went out to work. That's when it all went wrong*"

Angel does not think that his acne is any better, but somehow he learned to come to terms with this concern. He has spent a lot of time at home, in his room playing on his PlayStation. This is what he has mostly done over the last 4 years, as he admits. "*I didn't see anyone except for my family, not friends, because it's a bit solitary on the PlayStation, you get closed in on yourself when you're on it*."

After 4 years, Angel went out to work. He notes that this is when everything went wrong. He had spent a lot of time at home, without going out, and was perhaps unprepared to go out and experience life like that all of a sudden. It was then that he began to feel that people were after him.

"*Now, as a person*"

In adolescence, Angel wanted to be an air force pilot but he could not apply because he did not finish his studies and was under the required height – "*it came as quite a blow, but I was still interested in mechanics*."

Angel does not have a clear vision of what the future holds, principally because he has not overcome the fear of being harmed and the thought that "somebody" will kill him, which is his most serious affliction. Nevertheless, he indicates that, if he can overcome his fear, he would like to work and study mechanics and electronics, which have been interests of his since childhood.

*Life story analysis.* The patient was very willing to take part in the interviews, although he generally appeared tired and dispirited. He seemed not to have much to say, or not to be ready to recount his story. He is of a religious disposition and a frequent reader of the Bible where, above all, he hoped to find an explanation for the problem affecting him: his fear.

His account is mainly based around the fear of being harmed, which is the subject of his delusion. He even appears, in a way, excited when talking about the problem of his fear and about the different explanations he uses to understand what is happening to him. Aside from this core problem afflicting him, his account barely touched on other aspects of his life, and he appeared to become dispirited, tired, and uninterested when moving away from the subject of his delusion.

He seems concerned that he is unable to find certainty in things, above all with regard to explaining his fear. He feels prey to a fear that is completely restrictive, and is unable to find a satisfactory explanation that would allow him to understand what is happening to him or to give a completely convincing response to overcome the situation. He is aware of the extent of the fear and the significant limitations it causes in his life, and of the lack of any clear orientation as to how to overcome it.

The patient conveys a feeling of "ontological" uncertainty or insecurity. From an early age in his life, the world (and others) acquired a sense of unreliability or threat. Shame and fear of ridicule are the predominant emotional aspects of his experience in childhood. Somehow, later on in adolescence these emotions led to the fear of persecution. Persecution progressively became a fear of being hurt until it reached the extreme point of a fear that he would be killed, which manifested itself in the first acute episode.

# *Case 3*

Salvador (Salva) is a 25-year-old patient, diagnosed with catatonic-type schizophrenia. He has completed 12 years of compulsory school education and lives with his father and older brother. His parents divorced 2 years ago.

*First encounter.* The first interview was carried out when the patient had been hospitalized for close to 2 weeks. He explains that his illness began 2 years ago with an episode of mental illness: "*I was getting cramps in the back of my brain*." It was because of the confusion these cramps caused in his brain that he went to the psychiatrist. At that time, he was diagnosed with depression and treated with medication for a year, but the problem persisted.

He feels mental pressures, and indicates it is as if they squeeze his brain. His thoughts are jumbled up, all messed up with ideas. Reality gets distorted for him as well, as if he were in a constant dream. In addition, he has felt someone possessing his body and explains it as "demonic possession." He thinks that spirits get in when someone is depressed. It is something he cannot control, something unpredictable, imminent.

The patient is worried about the state of his mental health. It worries him to "live like this," and he feels a deep-seated desperation. He does not want to do anything and feels depressed, downcast, dispirited, and powerless. Before he was hospitalized, his desperation led him to want to commit suicide by jumping off a hill.

*Biographical milestones.* When the life story interviews were carried out, the patient had been hospitalized for a month and a half. He was receiving the usual pharmacological treatment.

"*My interest in religion began at the age of 8*"

Salva completed his primary education at a Christian school. He liked the religious part of school because religion was taught in a fun way. When he was a child, he used to go to church with his family. "*I liked the teachings about love, love for one another, love for one's neighbor*."

He points out that he was a very good student and got very good grades. He wanted to be a vet when he was a child, because he liked animals. He describes himself as a gentle, playful, brotherly, sweet boy.

"*They moved me to a worldly high school*"

The change of school had a negative impact on Salva. His performance suffered, and he went from being an outstanding student to being just an average one. He notes that students at the new school were treated more coldly.

He had wanted to be a vet since childhood but he could not go to university, as he did not pass the entrance exams. He therefore chose to study architectural drawing at a college, but did not manage to complete his first year there.

"*My mum was sweet to me when she was Evangelical*"

Salva had a good relationship with his mother as a child. He points out that his mother was very loving toward him whilst she was Evangelical. Later, however, for reasons unknown to him, she distanced herself from church. Their relationship deteriorated when he was a teenager.

He got on badly with his mother because, he explains, of their very different characters. His mother ill-treated him and frequently insulted him. This made him feel powerless. "*She was really aggressive, and punished and hit me for anything. She used to insult me in all kinds of ways, she called me mentally ill*."

His mother also fought with his father and brother. She drank, and when she did so she became more violent.

"*I went through a lot in 2010*"

Salva states that he had his first episode of "mental illness" 2 years ago, and has not been able to work or study since then. "*I did nothing at home, just playing games on the computer; I'd play on it, football games and PlayStation. I spent a load of time doing that*."

It was in this same year that his mother left home and his father fell ill with diabetes. His brother had had a heart attack at the end of the previous year.

His mother left home to live with a new partner, saying she wanted her independence. At first he missed her, but was also angry. He did not want to see her or be with her after she left.

Salva continued to live with his father and brother. He feels very attached to them, and is concerned about their health. He feels he has a really great father, because his father has had to play a double role. He gets on well with his brother too, whom he regards as a second father.

"*It's great at church, they treat me really well*"

Salva's current friends are evangelicals and he joins them at church. He likes going to church because there he has got to know beautiful people and has developed a much closer relationship with God.

"*I like being in communion with God, praying, singing, that's how I look for protection*."

He has had four episodes of "demonic possession," all of which happened at church. It was at church that he was told his bodily experiences were "possessions" and that they were somehow "normal." However, the treatment he was given there was unsuccessful. They carried out "deliverances," which are a way of getting the devil out of the body through prayer.

At the moment, Salva does not know why these episodes have happened to him, or whether they are due to an illness, and has not even talked much about the matter with his attending doctor.

"*In the future, I want to study massage therapy*"

Over the course of the last 7 years, Salva worked on and off in a number of fields. He took jobs as a shelf stacker in a supermarket, a cleaner at a cinema and a shop assistant. His last job was 2 years ago selling fragrances in a street market.

He has remained socially isolated over the last 2 years, only keeping in touch with his evangelical friends at church sporadically. "*I've found it difficult to relate to people in recent years. I haven't worked much or had much of a social life. I've been isolated*."

In the future he would like to have a wife and children, and to work giving massages, although he acknowledges that he remains scared about his mental state and feels vulnerable.

*Life story analysis.* The patient took part in the interviews willingly, although he did appear very tired and sleepy (he was constantly yawning). The disordered thoughts persist, as do his low spirits, mental pressures and the uncertainty in the face of possible new "possessions." He talks about himself and his life quite candidly, and his account seems naïve, as if told by a small child. He speaks calmly, slowly, with little verve. It is a story with few elements told at a basic level of articulation.

He is very religious, a habitual reader of the Bible and a regular churchgoer. However, although he describes the episodes as "demonic possessions," fear does not appear to be the predominant or explicit emotion: rather, it is the loss of control over his bodily experiences and the unpredictable nature of these episodes that make the patient desperate. In other words, his desperation stems from his inability to feel normal or healthy once again.

He left school 7 years ago and has not developed a specific plan for his life. Although he wishes to have a "normal" life, his life project faces a vacuum. However, the lack of a plan does not seem to concern him at all. Instead, what most worries the patient at present is the state of his mental health, that is, the anomalous bodily experiences he is not able to control.

It is possible to make a connection between the emergence of the first acute episode and a series of stressful events that occurred in the patient's life at that time: his mother left home, his father fell ill with diabetes and his brother had heart problems, all in the same year. However, the negative impact of the change of high school and the deterioration of his relationship with his mother during adolescence are the crucial biographical milestones identified in the development of the prodromal stage of schizophrenia.

Moreover, what the patient describes as "spirits getting into" him does not seem to correspond to a typical clinical depression (as initially diagnosed), but rather to a severe "passivity" of his own existence, which finds concrete form in his disembodied experiences.

## **CROSS-SECTIONAL ANALYSIS**

The cross-sectional analysis shows that a severe disorder of intersubjectivity starts developing in early adolescence. Beginning at an early stage, the patients progressively distance themselves from the social world. This distancing becomes a structural element, a key part in the prodromal stage of schizophrenia.

It is not an active deliberate distancing, but rather an overall difficulty that hampers the living of a normal life. It implies a progressive "passiveness" of the patients' own existence, which manifests itself not only in the disturbances of self-experience and body alienations of the acute phases, but also in the patients' radical withdrawal from the social world.

For several years, the patients have not worked or studied, have had no social life, and have stayed shut in at home watching television or playing on their PlayStation for hours at a time. Here, it is important to notice that the acute episode occurred at a time when they were planning to return to their studies or the world of work after a number of years of extreme isolation.

It is possible to make a connection between the prodromal stages of schizophrenia and several stressful events that occurred in the patients' lives. It is also possible to follow a continuity in the experience of vulnerability regarding the main personal meaning configured early in life: the feeling of abandonment, the fear of ridicule and the feeling of powerlessness, corresponding to Cases 1, 2, and 3, respectively.

Nevertheless, the patients' withdrawal from the social world is what eventually leads to the manifestation of their psychosis. Somehow, in their attempts to return to intersubjectivity, the patients are suddenly confronted with their own "vulnerability" of being in the world.

Although they have some ideas about what to do in the future, the patients are insufficiently prepared, and lack a specific plan to implement them properly. Their life project faces a vacuum. This is what makes their condition so severe: there is an interruption in the patients' normal unfolding of life.

The patients do have a concept of what a "normal life" should be (basically, to study, to have a job, to marry, and to have a family), but they do not seem to possess the factual grounding needed to deal with the world, as if they were lacking the implicit "know how" to carry out the normal life they wish to live.

It should be noted that the patients' life stories feature a series of healthy elements or personal qualities that reflect a certain nobility of character: sensitivity, authenticity, naivety, empathy, and innocence. There does not appear to be any secondary gain associated with the symptoms.

# **DISCUSSION**

## **KEY FINDINGS**

In acute phases of schizophrenia, patients' accounts concentrate on (or are limited to) the disturbances of self-experience or body alienations. In other words, patients' accounts lie outside the time-space dimension of the social context and exclude personal history. Body alienation appears to be the way in which the de-subjectivized accounts find concrete form (or are materialized).

The assessment of the life stories complements the symptomatic descriptions by embedding them in the patients' life-worlds, thus incorporating a social horizon. In this way, the dimension of intersubjectivity is illustrated in the patients' history of significant social interactions, revealing interpersonal elements that can be integrated into psychotherapeutic and prevention models.

The articulation of the patients' life stories makes it possible to follow the patients' progressive withdrawal from the social world, and also to identify the interpersonal conditions involved at the time the acute episode emerged. Thus, the spatiotemporal dimension of the personal history makes the interpersonal processes involved in schizophrenia understandable from a broader perspective.

From the individual analysis of the life stories, it is possible to identify the patients' biographical milestones, the personal meanings involved in their significant social interactions, and also continuity in their experience of vulnerability of being in the world, which are useful elements to consider for psychotherapeutic treatment.

The cross-sectional analysis of the life stories shows that a severe disorder of intersubjectivity starts in early adolescence, which should be a useful element to consider for early detection and prevention. Beginning at an early stage, the patients progressively distance themselves from the social world, ending in a radical withdrawal. This distancing becomes a structural element, a key part of the prodromal stage of schizophrenia, as was found in every case of the broader sample covering 15 patients with schizophrenia.

Social interactions are interrupted prior to the emergence of acute symptoms, possibly due to threatening or anxiety-provoking encounters with others. Nevertheless, the underlying anguish was not measured in this study. Instead, the study shows the personal vulnerability that leads to a psychotic break (or to the culmination of the intersubjective interruption).

## **CLINICAL IMPLICATIONS**

Psychotherapeutic interventions for patients with schizophrenia have been widely neglected in general. Current treatments rely primarily on medication, including electroconvulsive treatment in acute phases, thus following a medical-biological model that has not been questioned sufficiently. In this context, the intersubjective dimension seems extremely relevant both for the development of psychological treatments and for the understanding of the interpersonal processes involved in schizophrenia (as an interruption in intersubjectivity).

From the very start of hospitalization, psychotherapeutic support would appear to be of fundamental importance. The patients should be accompanied on their return to intersubjectivity, and efforts should be made to provide proper emotional support as they come to realize the overall problem affecting them. Prior to interventions focused on tasks (for example, successfully performing a social role, such as studying or working), the patients need to experience being in the world with another person, in a synchronous accompaniment of affective reciprocity.

In other words, the intersubjective dimension should be integrated into psychotherapeutic models focusing on the patients' social interactions. These models should be oriented toward developing a collaborative encounter between the patient and the therapist, as well as enhancing metacognitive capacities, which several case studies have shown to be helpful for the recovery of patients with schizophrenia (Dimaggio et al., 2008; Harder and Folke, 2012; Lysaker et al., 2013).

The process of recovering understandability would be a key aspect in overcoming the patients' alienation. Therefore, special consideration should be given to psychotherapeutic approaches that focus on encouraging patients' self-understanding and establishing a common communicative base between patient and psychotherapist (Holma and Aaltonen, 1997, 2004a,b; Seikkula and Olson, 2003; Seikkula et al., 2006). The idea is that the patient's experience can be explicitly shared on the basis of a common meaning through a dialog process that takes into account the other's point of view (or second-person perspective; Stanghellini and Lysaker, 2007).

Patients' narrativity should improve along different levels of articulation, by the recognition of beliefs, the incorporation of emotions and the reconstruction of different meaningful life events. However, during acute phases delusional beliefs constitute the patients' only available form of cognitive and interpersonal organization, so instead of confronting them, the focus should be placed on the difficulty in pragmatically comprehending others and on the experience of vulnerability (Lysaker et al., 2011a,b,c; Salvatore et al., 2012a,b; Henriksen and Parnas, 2013; Škodlar et al., 2013).

Moreover, acute psychosis in schizophrenia manifests itself in a collapse of the temporal dimension of the narrative plot, which leads to a de-contextualization of self-experience (Holma and Aaltonen, 1997, 2004a; France and Uhlin, 2006). From the perspective of the so-called "literacy hypothesis" (Havelock, 1980, 1991), which belongs to studies that follow the transition from orality to literacy in the development of thematic consciousness, it could be noted that in the acute phase the patients lose the capacity to order their experience in consensual logical sequences, displaying a narrativity with epic or poetic characteristics (Guidano, 1999).

The re-establishment of the consensual ordering given by the locational/situational aspects of the life story (by articulating the self-experience in thematic/chronological sequences; Havelock, 1980, 1991; Bruner and Weisser, 1991; Narasimhan, 1991; Guidano, 1999; Irarrázaval, 2003; Bruner, 2004; Holma and Aaltonen, 2004a) makes it possible to follow the patients' progressive withdrawal from the social world, and also to identify the interpersonal conditions involved at the time the acute episode emerged.

In this sense, the articulation of the patients' life stories, expressed as narrative creations of their own subjectivity (and meanings), allows for the "re-ordering" of the spatiotemporal dimension, as well as for the understanding of the interpersonal processes involved in schizophrenia from a broader perspective. This psychological understanding reveals the intersubjective dimension that connects the emergence of the acute episode with the patients' biographies, taking into account the personal meaning at play in each case.

In the case of Santi, there appears to be a need for emotional support aimed at accompanying him in becoming aware of the magnitude of the loss caused by the recent death of his father and, subsequently, to help him to develop strategies to deal with his feeling of abandonment in the world.

With Angel, his fear of ridicule is a structural emotional trait that dominates his life and is becoming a fundamental part of his worldview. Here, it is most important to deal with his sense of embarrassment and help him to accept himself. The aim is to provide a new, positive meaning to the sense of himself, overcoming his fear of ridicule in his encounters with others, or in other words, recovering the legitimacy of the sense of himself.

Salva requires an intervention in terms of developing a more basic sense of self-embodiment, which would be aimed at reflecting the feelings of "the other," to re-establish primordial reciprocity. Additionally, space needs to be created in which the patient can recover a feeling of protection in the world, overcoming the feeling of powerlessness.

From this viewpoint, taking into consideration the story the patient tells of himself improves the articulation of self-narrative, which should gradually be extended toward diverse areas of his life whose elaboration appears important for him to make his way back to daily life. It would be important to articulate the present by considering the experience that takes place in the current interpersonal context, and from there to articulate the future as a horizon of possibilities.

Therefore, reconstructing the intersubjective dimension of the patients' life stories sheds light not only on the interpersonal processes involved in schizophrenia, but also on the psychotherapeutic intervention best suited to each individual case. Moreover, when intervention in acute phases of schizophrenia focuses mainly on reducing "positive" symptomatology, without assessing the psychological and social elements that are part of the overall situation affecting the patient, relapse seems highly likely.

#### **LIMITATIONS OF THE STUDY**

Regarding the limitations of the study, mainstream scientific research in mental health has been dominated by quantitative methodologies and statistical analyses of big samples (representativeness), while the value of in-depth psychological analyses has been underestimated.

There is a prevailing overconfidence in the accuracy of numbers, as if they could not easily be manipulated in data analyses. This tendency has been supported by the illusion that numbers, rather than the patients' own stories, represent the experience of the subject exactly (as a mathematical formula would).

While qualitative methodology has been the tradition for research in the humanities and social sciences, psychotherapy research has been developed using the methodologies of the medical sciences, which are mostly quantitative, with the randomized controlled trial being the favored design.

Nevertheless, research in psychotherapy should be guided by questions that are relevant to clinical practice. It should not be forgotten that methodologies are only means of carrying out scientific research and should not be ends in themselves. Thus, in this field of research it seems necessary to start from the questions psychotherapists need to answer to improve the practice of psychotherapy (to help patients), and then to choose the most appropriate methodologies.

Moreover, one of the main advantages of qualitative studies is the open, mindful and detailed assessment of subjective experience, enabling the emergence of the patients' worldview and personal meanings, which cannot be obtained by means of superficial assessments. Therefore, psychotherapists should also have a voice in the debate about which methodology is best suited to improving the practice of psychotherapy.

#### **FUTURE DIRECTIONS**

Certainly, it would be important to systematize the results of this study in a model of psychotherapeutic treatment for persons with schizophrenia, which should include the intersubjective dimension, starting from the hermeneutic analysis of the patients' life-worlds toward a meaning-based psychotherapeutic practice. This model would eventually require evidence of effectiveness.

Moreover, it would be interesting to explore gender differences in the processes involved in schizophrenia, investigating prodromal and acute stages, as well as life stories of women with schizophrenia. In addition, improvement is needed regarding the differential diagnosis between acute phases of schizophrenia and acute phases of other severe mental disorders, such as major depression and bipolar disorder.

Finally, the future challenge in the field of phenomenological psychopathology would be to develop a comprehensive/unified philosophical framework for an embodied science of intersubjectivity. And, consistently, to continue developing coherent methodologies for empirical research, since this is the closest we can get to the patients' life-worlds.

## **AUTHOR CONTRIBUTIONS**

Co-author Dariela Sharim made substantial contributions to the analysis and interpretation of data to include in the paper; she revised the paper critically for important intellectual content; she made a final approval of the actual version of the paper to be published; she agreed upon the accuracy and coherence of the development of the sections for the paper.

## **ACKNOWLEDGMENTS**

We would like to thank Thomas Fuchs from Heidelberg University, the Reviewers and the Editor for their helpful comments to improve the manuscript. Leonor Irarrázaval would like to thank Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) for the grant "Beca Doctorado Nacional" (Doctorado en Psicoterapia UCH/PUC) and German Academic Exchange Service DAAD for the grant "Short duration research scholarships for doctoral students and young researchers."

## **REFERENCES**


in open-dialogue approach: treatment principles, follow-up outcomes, and two case studies. *Psychother. Res.* 16, 214–228. doi: 10.1080/10503300500268490


Stanghellini, G. (2009). Embodiment and schizophrenia. *World Psychiatry* 8, 56–59.


the recovery of first- and second-person awareness. *Am. J. Psychother.* 61, 163–179.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 December 2013; accepted: 24 January 2014; published online: 12 February 2014.*

*Citation: Irarrázaval L and Sharim D (2014) Intersubjectivity in schizophrenia: life story analysis of three cases. Front. Psychol. 5:100. doi: 10.3389/fpsyg.2014.00100*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Irarrázaval and Sharim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Keep meaning in conversational coordination

# *Elena C. Cuffari\**

*Department of Philosophy, Worcester State University, Worcester, MA, USA*

#### *Edited by:*

*Eddy J. Davelaar, Birkbeck, University of London, UK*

#### *Reviewed by:*

*Eddy J. Davelaar, Birkbeck, University of London, UK*

*Joanna Raczaszek-Leonardi, University of Warsaw, Poland*

#### *\*Correspondence:*

*Elena C. Cuffari, Department of Philosophy, Worcester State University, S-316, 486 Chandler St., Worcester, MA 01606, USA e-mail: elena.clare.cuffari@gmail.com*

Coordination is a widely employed term across recent quantitative and qualitative approaches to intersubjectivity, particularly approaches that give embodiment and enaction central explanatory roles. With a focus on linguistic and bodily coordination in conversational contexts, I review the operational meaning of coordination in recent empirical research and related theorizing of embodied intersubjectivity. This discussion articulates what must be involved in treating linguistic meaning as dynamic processes of coordination. The coordination approach presents languaging as a set of dynamic self-organizing processes and actions on multiple timescales and across multiple modalities that come about and work in certain domains (those jointly constructed in social, interactive, high-order sense-making). These processes go beyond meaning at the level that is available to first-person experience. I take one crucial consequence of this to be the ubiquitously moral nature of languaging with others. Languaging coordinates experience, among other levels of behavior and event. Ethical effort is called for by the automatic autonomy-influencing forces of languaging as coordination.

**Keywords: coordination, meaning making, languaging, ethics, social interaction, enaction, distributed cognition, experience**

It is an exciting time to be a philosopher of language, as long as one is willing to look to what is happening in the language sciences. Here one finds confirmation of the deep skepticism that loomed throughout twentieth-century reflections on language: there is no such (simple) thing. Language cannot be studied as a phenomenon that is in any way separate from sensing, acting, interacting physical bodies and complex material and social worlds. What can a growing empirical and theoretical focus on dynamic conversational behavior mean for meaning? One consequence takes the form of a philosophical question: how can we account for the inherently moral character of human interactions, even as some aspects of our interactions are well explained by self-organizing mechanisms?

In notoriously deflationary style, Richard Rorty sums up a perennial philosophical view (shared by Wittgenstein (1953) and Mead (2009), among others) when he describes language as "noises and marks," which work by provoking other noises and marks. "To say that it [a given creature] is a language user is just to say that pairing off the marks and noises it makes with those we make will provide a useful tactic in predicting and controlling its future behavior" (Rorty, 1989, p. 15). Following Davidson, Rorty insists that language is not a medium, neither for expression nor representation (Rorty, 1989, p. 10). By seeing language as just another coping behavior with social consequences, he suggests, philosophers can get off the realism/idealism "see-saw" and thereby get to ask more practical and politically interesting questions. In particular, the upshot is that this view ". . . naturalizes mind and language by making all questions about the relation of either to the rest of the world *causal* questions, as opposed to the adequacy of representation and expression" (Rorty, 1989, p. 15).

Although this view is meant to espouse a "non-reductive behaviorism" (presumably with emphasis on the modifier), it can come off sounding somewhat emaciating. The "noises and marks" phrasing calls to mind Morse code, while the idea of predicting and controlling a fellow conversant evokes Terminator-type hyper-analytical visual perception that superimposes scrolling lines of data on the target object in sight. (It was the 80s, after all.) One can contrast this hollowing out of linguistic activity with a different account that was developing in the same decade—that of embodied cognitive linguistics. This research painted a radically alternative picture, that of the richly imagistic and fleshy inner life of metaphors and morphemes, all traceable to bodily structures and experiential patterns (e.g., Lakoff and Johnson, 1980; Johnson, 1987; Wierzbicka, 1988, 1996).

Interestingly, work in cognitive science today, specifically in the newly emerging paradigms of enaction, distributed cognition, and dynamical systems approaches, indicates a return of the Rortyan perspective. Throughout this *social* cognitive science, the language of *coordination* is increasingly used to characterize not only social interaction dynamics and communication processes, but also the workings of language itself (Clark, 1996; Fowler et al., 2008; Fusaroli et al., 2012; Dale et al., 2013, *inter alia*).

Different kinds of coordination are measured in research on language in interactional contexts. Some discuss coordination as the *alignment* of cognitive representations or conceptual schemes (Pickering and Garrod, 2004, 2014; Garrod and Pickering, 2009; Tylén et al., 2013). Conversation participants converge on representations by aligning "at many different levels, from basic motor programs to high-level aspects of meaning" (Garrod and Pickering, 2009, p. 293). Coordination understood as *physical entrainment* is also studied as potentially significant for language in its own right (Cowley, 2007; Fowler et al., 2008; Shockley et al., 2009; Riley et al., 2011). For example, Richardson et al. showed that visual attention (where people look and when) can "be coordinated on the basis of verbal contact alone" (Richardson et al., 2007, p. 407). Unintentional synchrony in seemingly non-linguistic phenomena such as posture and sway (Shockley et al., 2003), as well as speech rate (Street, 1984), vocal intensity (Natale, 1975), and pausing (Cappella and Planalp, 1981), invites analysis of linguistic interactors as constituting "joint-action systems" that can be studied as "non-decomposable units," or "self-organized dynamical systems that emerge from the nonlinear interactions and couplings that exist between and among individuals and the environment" (Fowler et al., 2008, p. 265). Fowler et al. (2008), for example, find equivalence between interpersonal and intrapersonal rhythmic coordination: whether the limbs in question belong to the same person or to different people, and whether they are coupled by sight or by neuromuscular tissue, "the same dynamical entrainment processes" operate (Fowler et al., 2008). By attending to the sub-personal processes of coordination dynamics, a *supra-personal* "dialogical system" (to borrow from Steffensen, 2012) comes into view.

Recent work refines the synchrony model of coordination by introducing the idea of *synergy* (for a review, see Fusaroli et al., 2014). A synergistic notion of coordination importantly distinguishes *complementarity* rather than simultaneity as a key characteristic of successful languaging. It also emphasizes the emergent dynamics of interpersonal dyadic systems, now understood not simply as dynamically orchestrated complex machines, but as sites of social cognition. "Crucial to this synergistic model is the emphasis on dialog as an emergent, self-organizing, interpersonal system capable of functional coordination" (Fusaroli et al., 2014, p. 147).

The synergistic approach to conversational coordination dovetails well with the enactive theory of social interaction, *participatory sense-making*, which likewise puts central explanatory weight on interpersonal coordination processes and thus "allows us to claim that social interaction constitutes a proper level of analysis in itself," one that enjoys its own autonomy or "life of its own" beyond the intentions of involved participants (De Jaegher and Di Paolo, 2007, p. 491; see also p. 494). Tracing the contours of coordination patterns and breakdowns, De Jaegher and Di Paolo describe human sociality as arising precisely in the interplay of influences between emergent interaction dynamics and the agents temporarily entrained by them (De Jaegher and Di Paolo, 2007, p. 492; see also Di Paolo and De Jaegher, 2012). Currently rounding out this coordination chorus, the distributed language approach (e.g., Thibault, 2011) pairs the early enactive autopoietic notion of languaging with the affordance paradigm of ecological psychology. "Languaging involves a complex coordination of multiple activities emphasizing the dynamics of real-time behavioral events that are co-constructed by co-acting agents" (Jensen, 2014, p. 2, this issue).

The move to complementarity, synergy, and supra-individual interaction dynamics arising from participatory coordination brings with it a slew of critical consequences for traditional analyses of conversational meaning-making, be they of a philosophical or a more applied linguistic stripe. The most radical implication of the coordination research is an overhaul in the definition of language itself. Language is now to be seen as a set of dynamic self-organizing processes and actions on multiple timescales and across multiple modalities that come about and work in certain domains (those jointly constructed in social, interactive, high-order sense-making). This is a very radical turn, one with many meanings. For example, on the basis of work in close kinship with these approaches, we are poised to appreciate language as multimodal (McNeill, 1992, 2005, 2012; Kendon, 2004; Streeck, 2009), and as a doing, i.e., as a "pragmatic and phonetic" rather than propositional or abstract issue (Hodges et al., 2012, p. 501). Furthermore, as Fusaroli et al. (2014) point out, taking this perspective is not merely a matter of stacking up new findings, but of clearing out old attitudes. In order to make space for proper appreciation of conversational synergy, they say we need to reject

two commonly assumed views: (1) the ultimate function [of conversational languaging] is not necessarily to reach deep mutual understanding of each other nor to converge internal representations; it is rather to realize an activity together which might or might not require deep mutual understanding (2) the function of a conversation cannot be defined on the level of the individual: the role of each individual component in a system. . . makes sense only within the functional organization of the dyad. (2014, p. 150)

Several key shifts are thus advocated by the synergy approach: a shift from individual to dyad in order to determine the functional teleology of a conversational interaction, and a shift from understanding the *meaning* of conversational action in terms of "deep mutual understanding" to the realization of a given and shared purpose or task.

Such shifts imply major philosophical and ethical consequences and raise a host of pressing questions (some to follow). Yet notice that these pivotal implications were more or less already there in the first-generation synchronous mechanism approach to conversational coordination. At root, the problem of coordination is "how a device of very many independent variables might be regulated without ascribing excessive responsibility to an executive subsystem" (Turvey, 1990, p. 938). Coordination means law-like patterns of movement that are emergent and self-organizing. Seeing language in this way brings about the gestalt switch that Rorty was after: language becomes a causal phenomenon, or better, a set of causal phenomena, fully on par with forces and events in the natural physical world.

The common heritage of coordination accounts of languaging interaction is, then, the site of a significant tension. Precisely because language is a doing, a practical and physical as well as social and cultural activity, it finds a ready place on a continuum view of sense-making or fully embodied meaning generation, a view trained on the intrinsic normativity of always-caring, never neutral life in pursuit of life (Jonas, 1966; Di Paolo, 2005; Thompson, 2007). Recent work in enactive, distributed, and dynamic proposals takes this vantage point when it promotes a nuanced and social picture of meaning-sharing. However, the paradigmatic resources of mechanism, movement, and even self-organization may be too thin, both epistemologically and ethically speaking, to account for the full significance and irreducible complexity of everyday human conversing. They only give us a bird's-eye view of the story, one somehow beyond the system under observation.

I suggest that an adequately rich sense of meaning may be missed even in the synergistic coordination accounts, not because we lose track of skull-bound representations in adopting this perspective, but because we lose sight of the consequences a conversation can have for individual lives and selves (see also Kyselo, 2014, this issue). It remains to be seen if synergistic coordination gets us any further in our ability to explain how such consequences follow from the marks and noises we so perfunctorily get each other to make. (One may even notice that the coordination view of social cognition outruns the Rorty–Davidsonian dream of charitable anthropologists in the field: we are no longer predicting each other's moves, but are each and every one of us swept up in a smaller-and-larger-than-self tide of constraining and entraining languaging).

The challenge and the solution are the same: those of us interested in pursuing a radically non-representational, distributed, participatory, and behaviorally-attuned account of human languaging must work toward a better understanding of human embodied intersubjectivity as such. We are not pendulums. A conversation is more than a multimodal juggling act. But we do, in some ways, work like pendulums, and our conversations do fall into observable patterns and flows that may delight onlookers, especially those with access to multiple regression plots. It is exactly because as human social creatures we are remarkably adept at synchrony and synergy, turn-taking and rule-following, entraining each other and getting our movements hijacked in stalled hallway face-offs, that we must pay closer attention to what our bodies always already know how to do in conversational interactions. Empirical work supports the suspicion that just because a conversation runs like a well-oiled machine, it does not follow that interlocutors have jointly made or experienced any good sense (see Galantucci and Roberts, 2014). One possibility for paying better attention to our conversational co-enactings would be to investigate underexplored but highly relevant dimensions of our embodiment, including bodily protest, dissonance, discomfort, difference, and betrayal.

Richly intelligent and culturally elaborated as they are, our bodies can and do betray us. Frequently this betrayal comes in the form of habit. In 2007 I attended a talk that philosopher Shannon Sullivan gave on race. She spoke of one dimension of her experience of being a white southern woman: when people get verbally aggressive with her or are rude to her, she smiles. Seemingly against her will, her bodily practices carry and enact stark traces of a specific social-cultural upbringing. Despite her own frustration or discomfort, she habitually and automatically carries forward specific norms of how to be with others.

Social settings and scripts function similarly, assigning roles that play out as counterintuitive bodily actions. When I was working as a waitress in an upscale fusion restaurant about a decade ago, I once had a customer berate me and criticize my work in a way that was nonetheless perfectly polite in word choice and even in tone. But even as my body "took sides" with the insulting customer, obediently clearing his unwanted food, nodding, stepping back with a lowered head and then calmly walking away, a dissonance began to arise as a *creeping feeling*, the unsavory sense of needing to shake something off my back and shoulders, a hot tingle of anger as tears welled. There was a bodily knowledge that something in that outwardly smooth interaction had gone awry. I am not a mere billiard ball; my reactions are complex; and I do not "process" the emotional consequences of interactions immediately. With varying degrees of reflection and compassion, I can learn from experiences of bodily-emotional dissonance as I sort out the intra-individual tensions and unfold a broad range of meaning in what has transpired.

I do not know how the customer felt after this interaction on his side of things. One might imagine he felt smug and satisfied: he ultimately (and without much waiting) got what he wanted from his dining experience, and he imparted an important lesson to an ignorant girl. He sat back, comfortable, sure. He folded his hands on his belly. We both played our parts in the highly scripted ritual. We had coordinated well. But the meaning of the interaction was in no way the same for both of us.

There can be no denying the gendered and classed aspects of these examples, the distinctive contributions of personal as well as community histories. Our flesh-and-blood, inherently vulnerable, defensive embodiment senses and partially dictates the meanings that interactions have for us—consequences in terms of emotional experience, our possibilities for response and other action, our understanding. Evaluative reactions are conditioned by contexts, histories, and concern that can function as trigger points [Damasio's somatic marker hypothesis is one route toward linking social events and physiological reactions (1996)]. Nonsimilarity and non-identity in human embodiment thus act as content-generating resources, as each unique intelligent bodyself enacts its own dance with symbols, second-order language constraints, and situational dynamics (Cuffari and Jensen, 2014). That each of us interprets events or sentences differently is a basic motivator for communication and an on-going source of meaning-granting normativity. It is through conflict, argument, and negotiation that "deep mutual understanding" gets a chance to occur (Cuffari, 2014).

From these examples we must also note and take seriously the significant *temporal* dimensions of meaning unfolding, spilling beyond the boundaries of a dyadic episode. Studies in dynamic systems may be very useful here. Language is now understood as including many timescales "from milliseconds of brain activity to hundreds of milliseconds of individual cognitive processing, seconds and minutes of interaction, months and years of language acquisition, and hundreds of years of cultural language evolution" (Rączaszek-Leonardi, 2010, p. 269). While new developments in mobile measuring technology and statistical analysis enable researchers to track the more micro of these timescales, many of these are arguably beyond the reach of what is available to phenomenological, first-person conscious experience or awareness during a languaging event. "If one agrees that language has an important function of interindividual coordination, some variables will pertain to this level, that is, the level of interaction, and may not be easily accessible to individual experience" (Rączaszek-Leonardi, 2010, p. 275). But interactive coordination, however quickly or expertly it may take place, nonetheless has complex consequences for individual experience. Proper investigation of these unfolding consequences will likely require identifying the right timescale for this sort of meaning. Merlin Donald's "slow process" hypothesis proposes an "intermediate" time zone for "complex events that extend over several hours (for example, a game or conversation)" and points out that "adult humans typically live, plan, and imagine their lives in this time range" (Donald, 2007, p. 214). Donald sees the slow process as a uniquely human capacity that co-evolved with cultural developments precisely to "handle the cognitive demands imposed by increasingly complex distributed systems" (Donald, 2007, p. 214).
The slow process hypothesis implies a "deeper background vantage point," constituted by "a vastly extended working memory that serves as the overseer of human mental life" (Donald, 2007, p. 220). This presents a plausible physiological explanation for how individuals are simultaneously players on the great, shared stage of life while still maintaining a concrete experience and narrative sense of *my life*.

In this article I have been endorsing or performing a particular view of meaning as having to do with consequences. For example, I have used phrases like "what does this approach *mean* for language research" or "the interaction did not have the same *meaning* for both of us." This sense cannot exhaust the rich notion of the meaning of languaging, which, as we have seen, is an always-on-the-move, dynamic phenomenon unfolding across timescales and participants. Depending on the timescale one uses in observing languaging, meaning may be apparent at the level of an event, as in the way that mothers and babies complete each other's actions (Rączaszek-Leonardi et al., 2013). In deconstructing a politician's television ad campaign, meaning may be seen at a social-systemic level. Nevertheless, the sense of meaning as "carrying forward" (Gendlin, 1962, 1997), as a series of changes or implications in phenomenologically available felt sense and action possibilities, is an important one that can and should be integrated into the social-interactive turn in cognitive science. Mark Johnson summarizes this pragmatist view of meaning, writing that human meaning is that which "concerns the character or significance of a person's interactions with their environments,"

. . . the meaning of a thing is in its consequences for experience, how it "cashes out" by way of experience, either actual or possible experience. Sometimes our meanings are conceptually and propositionally coded, but that is merely the more conscious selective dimension of a vast, continuous process of immanent meanings that involve structures, patterns, qualities, feelings and emotions. (Johnson, 2007, p. 10).

For some current proposals of how personal histories of culturally situated embodied experience can inform the meaning of languaging acts (wordings, gestures, improvisational performances, etc.), see Jensen, 2014; Koubová, 2014 (this issue); and Cuffari et al. (2014).

As De Jaegher and Di Paolo point out, because sense-making is "essentially embodied in action" it is "directly affected by the coordination of movements in interaction" (De Jaegher and Di Paolo, 2007, p. 497). This suggests that meaning (in the sense I mean it) can be coordinated, or more precisely, that interacting coordinates processes of meaning making (e.g., responsive embodied activities in the interaction). This observation presses the importance of ethical attunement. It tells us an important thing: language approached as coordinating also means that in conversational exchanges, emails, and elevator rides, we are constantly getting coordinated and constrained, and doing the same to others, whether or not we are aware of it. But it does not say whether coordinating another's sense-making will have good or bad outcomes, or how we are to discriminate between them.

Immanent, embodied dimensions of our interactions (personal experience, social position, habituated reactions, emotional and physical vulnerabilities, and temporality) are our sources of caring and evaluating. As ecological psychologist Bert Hodges tells us, "The pragmatics of languaging and language can thus largely be summarized as, learning how to be *caring* and *careful* in our speaking and listening to each other. To care and to be careful is to evaluate and select better and worse ways to move" (Hodges et al., 2012, p. 503). What will count as better or worse is sometimes immediately obvious and often an *emergent* inter-personally produced or discovered quality. But this is not always the case. The call to learning is a call to growth, improvement, and change; it does not suggest that merely going on interacting as we always do will suffice. Perhaps not every interaction "task" requires deep mutual understanding. But if it is true that "we converse in order to explore and create possibilities for doing something good together" (Hodges et al., 2012, p. 503), it seems that mutual understanding is an important element of conversational interaction. What will serve as our teacher in this crucial learning process, what can act as our normative guide, is our individual yet intersubjectively engaged embodiment.

## **ACKNOWLEDGMENTS**

The author wishes to thank Hanne De Jaegher, Ezequiel Di Paolo, George Fourlas, and a reviewer for sharp reading and inspiring comments at various stages of composing this commentary. This work is supported by the Marie-Curie Initial Training Network, "TESIS: Toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 May 2014; accepted: 15 November 2014; published online: 03 December 2014.*

*Citation: Cuffari EC (2014) Keep meaning in conversational coordination. Front. Psychol. 5:1397. doi: 10.3389/fpsyg.2014.01397*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Cuffari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Emotion in languaging: languaging as affective, adaptive, and flexible behavior in social interaction

# *Thomas W. Jensen\**

*Centre for Human Interactivity, Institute of Language and Communication, University of Southern Denmark, Slagelse, Denmark*

#### *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

#### *Reviewed by:*

*Ankita Sharma, Indian Institute of Technology, Jodhpur, India; Giovanna Colombetti, University of Exeter, UK*

#### *\*Correspondence:*

*Thomas W. Jensen, Centre for Human Interactivity, Institute of Language and Communication, University of Southern Denmark, Sdr. Stationsvej 28, 2000 Frederiksberg, Slagelse, Denmark e-mail: twj@sdu.dk*

This article argues for a view of languaging as inherently affective. Informed by recent ecological tendencies within cognitive science and distributed language studies, it puts forward a distinction between first order languaging (language as whole-body sense making) and second order language (language as system-like constraints). Contrary to common assumptions within linguistics and communication studies that separate language-as-a-system from language use (resulting in separations between language vs. body language, verbal vs. non-verbal communication, etc.), the first/second order distinction sees language as emanating from behavior, making it possible to view emotion and affect as integral parts of languaging behavior. Likewise, emotion and affect are studied not as inner mental states but as processes of organism-environment interaction. Based on video recordings of interaction between (1) children with special needs, and (2) a couple in therapy and their therapist, patterns of reciprocal influence between interactants are examined. Through analyses of affective stance and patterns of inter-affectivity, the article illustrates how language and emotion should not be seen as separate phenomena combined in language use, but rather as completely intertwined phenomena in languaging behavior constrained by second order patterns.

**Keywords: first order languaging, second order language, affective stance, ecological naturalization, interaffectivity, emotion, sense making**

# **INTRODUCTION**

Emotion and language belong together. Indeed, in this article it will be argued that emotion in fact lies at the heart of language if language is viewed as an embodied dialogical activity. Still, within mainstream linguistics as well as in communication studies, language and emotion have so far been categorized as belonging to two separate domains that must be kept apart: Language, on the one hand, belongs to the structures of thought comprising an abstract "language system"; it is based on words and representations and it is communicatively deliberate, while emotion, on the other hand, belongs to the body; it is associated with unintentional reactions, sensations and actions visible in a non-abstract and separate "body language." This article, however, aims to show the inadequacy, and ultimately the false nature, of these dichotomies while pointing to a new way of looking at the relationship between language, emotion, action, and intersubjectivity. It is about time to put an end to the unfruitful divorce between language and emotion. They need to be brought back together.

## **FIRST ORDER LANGUAGING AND SECOND ORDER LANGUAGE**

New developments in language studies have now made it possible to investigate emotion as an integral part of our language activity rather than studying emotion as a somehow separate phenomenon added to speaking. The recent theoretical developments carving the way for such a proposal have taken place within a variety of new approaches to language, cognition and social interaction such as *distributed language* *and cognition* (Thibault, 2008, 2011; Kravchenko, 2009; Cowley, 2011a; R ˛aczaszek-Leonardi, 2011; Pedersen, 2012; Steffensen, 2012; Cowley and Vallée-Tourangeau, 2013; Jensen, 2014), *dynamical systems* and *interpersonal coordination* (Bickhard, 2007; Fusaroli et al., 2013b; Fowler, 2014), *dialogism* (Linell, 2005, 2009), *ecological psychology* (Gibson, 1979; Hodges, 2009, 2011), and *embodied and enacted cognition* (Chemero, 2011; De Jaegher and Di Paolo, 2007; Anderson et al., 2012; Di Paolo et al., 2013).

The key notion in the present work is the term *languaging*. It originally stems from the early works of Maturana (1970) and has recently been revived and redeveloped by a number of scholars working within the distributed language group (Love, 2004; Linell, 2009; Cowley, 2011a; Pedersen, 2012; Steffensen, 2012). In particular the term has been elaborated in various works by Thibault (2005, 2008, 2011), and it is this particular version of the notion of languaging that will be adopted in this article<sup>1</sup>. In his 2011 article Thibault argues that the recent developments within distributed language studies represent:

...a renewed attempt to better understand the materially embodied, culturally/ecologically embedded, naturalistically grounded, affect-based, dialogically coordinated, and socially enacted nature of languaging as a form of whole-body behavior or whole-body sense making (p. 211).

<sup>1</sup>Hence, when referring to languaging as behavior or whole-body sense making it is in the Thibault version of the term. For an overview of the various positions on languaging and the first and second order distinction, see Steffensen (2014).

This view attempts to capture the activity-bound character of language as its primordial feature. Languaging involves a complex coordination of multiple activities emphasizing the dynamics of real-time behavioral events that are co-constructed by co-acting agents. For that reason languaging (language as an activity) is promoted as a *first-order* phenomenon, whereas what is usually referred to as language within linguistics (language as a symbolic and rule-governed system) is seen as a *second-order* construct or constraint on languaging behavior. The term "language" therefore becomes an umbrella term encompassing both first and second order as two different but intimately related dimensions of this specific kind of behavior.

Importantly, this approach entails an *inversion* of the traditional ontological order of language, according to which there is first a "language system" that is then put to use by "language users." This order is rejected: first of all there is *activity*, and out of this languaging activity "grows," on longer evolutionary as well as socio-cultural timescales, language as a symbolic system-like constraint that strongly influences languaging behavior. This shift is crucial because it re-conceptualizes our general understanding of "language." Traditionally, within folk understandings as well as within linguistics, we look upon and comprehend language as a combination of system and use (with the system as the primary ontological phenomenon and use as an epiphenomenon). From a distributed perspective, however, we can see language as an activity system, that is, one comprised of first order activity and second order constraint: "we depend on dynamics first and symbols afterwards" (Cowley, 2011a, p. 11). In that sense the term "language use" implies a pre-established system, whereas *languaging* designates activity or behavior as the primary ontological feature of language while also acknowledging the socio-cultural constraints that make this activity distinct from other types of activity or behavior.

This article is chiefly an examination of the affective and emotional dimension of the languaging dynamics of face-to-face interaction (i.e., speaking, hearing, gazing, gesturing, mimicry, postural sway, and so forth) while also considering how these types of activity are constrained by second order patterns<sup>2</sup>. The theoretical claims put forward in this work are developed on the basis of thorough analyses of empirical data consisting of video recordings of different situations and subsequent transcriptions that allow for detailed investigation of the inter-bodily dynamics of human dialog.

## **EMOTION AS PART OF LANGUAGING**

Within the growing literature on distributed language and cognition (Thibault, 2008, 2011; Kravchenko, 2009; Cowley, 2011a; Rączaszek-Leonardi, 2011; Pedersen, 2012; Steffensen, 2012) the close relationship between emotion and languaging has often been implied (e.g., languaging as "affect-based" in the Thibault quote above). Still, a more thorough investigation of the intricate connections between emotion and languaging has yet to be undertaken. This article takes a first step in this direction by relating and specifying the languaging approach in terms of emotions in social interaction. It will be argued that emotion is not separate from language (as an independent non-verbal component of verbal communication, as it is often laid out), nor can emotion be regarded as merely a secondary function of language. Instead, *emotion and affect are integral parts of languaging behavior*<sup>3</sup>, or rather, languaging is whole-body activity *including* emotion.

On a fundamental level we feel in conjunction with the movements of ourselves as well as of other people: We see, hear, and experience other people's emotions in and through their whole-body movements (facial, gestural, postural, and vocal), and likewise we enact emotions by altering our voices, moving our bodies, using our facial muscles, making gestures, or touching each other (Colombetti, 2014). Thus, emotions and emotional experiences are inherently tied to bodily sensations. Indeed, it is virtually impossible to imagine an emotion without a bodily sensation, as William James famously argued in relation to fear:

What kind of emotion of fear would be left, if the feelings neither of quickened heart-beats nor of shallow breathing, neither of trembling lips nor weakened limbs, neither of goose-flesh nor of visceral stirrings, were present, is quite impossible to think (James, 1884, pp. 193–194).

Furthermore, a fundamental quality of emotions is their "ability" to ascribe value to experiences. Through emotions we experience something as "something": fearful, exciting, boring, scary, attractive, or repulsive. As several neuroscientific studies of people with brain damage have shown, without emotion the world appears "gray" and uniform, with no appeal to act upon it (Damasio, 1996). Within such studies emotions are examined in relation to the human brain "as complex collections of chemical neural responses forming a distinctive pattern" (Damasio, 2003, p. 53). In short, emotions can be seen as complex neural, chemical, and behavioral patterns functioning as feedback on encounters or situations: processes by which our bodies assess their state and make adjustments to maintain homeostasis. Thus, in this sense, which is the position taken in this article, emotions *are* in fact movements; not just within us, however, but also movements that connect experiences with situational affordances:

<sup>2</sup>The specific focus on emotion in situ, in the observable here and now of social interaction, entails that what is often referred to as "autobiographical" emotional experience (Damasio, 2003), that is, emotional memories and knowledge about the past, will for the most part play a less prominent role in the following. By contrast, what is commonly called "procedural" emotional experience (Tulving, 1984), or *emotional episodes* (Colombetti, 2014), that is, momentary emotional experiences and actions embodied in a person's behavior, plays a much more prominent role in the analyses as well as in the theoretical sections.

<sup>3</sup>Some scholars distinguish between affect (or mood), as a more primary and pervasive phenomenon, and emotion, as more experientially specific and distinct (see for instance Colombetti, 2014, pp. 2–15). However, for the sake of simplicity and space this discussion will not be pursued in the present work. The terms "emotion" and "affect" are thus used more or less interchangeably, which also reflects their respective uses within linguistics and psychology: "affect" is the more common term in linguistics, whereas "emotion" is more widespread in the social sciences and psychology. Both terms are therefore used here, with "emotion" as the more prevalent one.

Emotions are processes of organism-environment interactions. They involve perceptions and assessments of situations in the connected process of transforming those situations. The body states connected with feelings are states of both response and remaking of experience. I say, "I'm fearful," but this really means "The *situation* is fearful"; fearfulness might appropriately be described as an objective aspect of the situation *for me at this moment*. (..) In short, emotions are both *in us* and *in the world* at the same time. They are, in fact, one of the most pervasive ways that we are continually in touch with our environment (Johnson, 2007, p. 66).

However, relating these processes directly to language calls for a re-specification of our conception of language, and this is what the notion of languaging offers. As part of our languaging behavior, that is, of our whole-body sense-making, emotions are enacted as evaluative processes, intersubjective positions, and possibilities for action<sup>4</sup>. In that sense emotions are part of a human-environment system. They are part of our ecology as properties of whole situations, including individuals and environmental structures. To sum up, given that emotions are seen not as individual inner states but as processes of organism-environment interaction, and given that languaging is seen not as an abstract semiotic system but as dynamic adaptive behavior, emotion is to be seen as an intrinsic part of languaging itself. Indeed, it is impossible to fully understand languaging as behavior without considering emotion.

## **STRUCTURE OF THE ARTICLE**

Overall the article can be divided into five major parts. Following this introduction, there is a critical examination of the way emotion has been separated from language within linguistics and communication studies (section Traditional Obstacles in Integrating Language and Emotion). This is followed by a more elaborate treatment of a combined dialogical/ecological approach to language and cognition, with a specific focus on how emotion can be seen as part of languaging (section Languaging). Section Analyses is the empirical part, consisting of analyses of video recordings of real-life social interactions that investigate the claims put forward in the previous section. Finally, in section 5 the analytical findings and theoretical claims are put in perspective in relation to the study of emotion and cognition, and the methodological challenges of this new approach are discussed.

## **TRADITIONAL OBSTACLES IN INTEGRATING LANGUAGE AND EMOTION**

Why is it then that the phenomenon we call language is commonly understood as something separate from emotion? Or rather, what is it in our understanding of the notion of "language" that makes it separate from that of emotion? An attempt to answer these huge questions, while staying within the space limitations of this article, has to operate with a strict focus. Let us therefore limit our focus to four widespread views on language and communication that indirectly have come to function as obstacles for a more integrated view on emotion and language: (1) A view on language as a *code-like system*. (2) A conception of language as a phenomenon first and foremost based on *words* resulting in distinctions between *language* vs. *body language,* and *verbal* vs. *nonverbal* communication. (3) A view on language and communication as a *transfer of information* from a sender to a receiver. (4) A view on language as a *social phenomenon* through and through that can be treated without any consideration of its biological dimensions.

Let us now take a closer look at these obstacles.

## **OBSTACLE 1: LANGUAGE AS A CODE-LIKE SYSTEM**

Twentieth-century linguistics was dominated by powerful form-based theories of abstraction, like structuralism and generative grammar, that ended up excluding the dynamics of real-time language behavior as a relevant object of study (Harris, 1987; Linell, 2009). As has often been noted in the history of linguistics (Lyons, 1981), the two major components of Saussurean linguistics, *langue* and *parole*, share many similarities with the Chomskyan notions of *competence* and *performance*, in the sense that the proper object of study became "language" as a hidden set of structured forms underlying the various kinds of language use. The language system is conceived either as an autonomous system (langue) or as a specific module in the brain (competence). In both cases the key is that the *language faculty* is separate and must be studied in its own right, apart from the messy dialectics of real-time speech production and comprehension. As a consequence, the focus on an idealized system of linguistic knowledge left no room for the role of emotion or affect; emotion was categorized as a phenomenon that is *by definition* excluded from (the study of) language.

Looking back, these abstract theories of language have been heavily criticized for losing sight of the way language is actually used and for completely neglecting the role of context (Levinson, 1983; Chafe, 1994). Consequently, a wide variety of usage-based approaches to language have appeared since their heyday. There is, however, still a highly prevalent tendency to think of language in terms of system and use respectively<sup>5</sup>; the premise being that a student of language can choose to focus on one or the other, while the fundamental division between system and use remains almost unquestioned. The problem in accepting this division, even for usage-based theories, is the underlying assumption that the system is the foundation (or the essence) while use is a changeable epiphenomenon. The theoretical consequence is that emotion can never be part of language itself; it can only be added as an extra, non-linguistic device in language use.

<sup>4</sup>An alternative to viewing emotion and affect as inherently evaluative can be found in recent enactive approaches to emotion: "From the enactive standpoint defended here, bodily arousal is not merely a response to the subject's evaluation of the situation in which he or she is embedded. It is rather the whole situated organism that subsumes the subject's capacity to make sense of his or her world" (Colombetti, 2010, p. 157).

<sup>5</sup>See for instance this introductory line presenting the study of pragmatics: "Those aspects of language use that are crucial to an understanding of language as a system, and especially to an understanding of meaning, are the acknowledged concern of linguistic pragmatics" (Levinson, 1983, Back cover).

## **OBSTACLE 2: LANGUAGE AS FIRST AND FOREMOST BASED ON WORDS**

In his 2005 book, Per Linell describes a *written language bias*: a strong tendency in linguistics to describe and understand spoken language in terms of written language, resulting in a fatal lack of awareness of the distinct characteristics of spoken language. This bias has given rise to the common assumptions that writing and speaking are only different external manifestations of the same underlying "language" (langue, competence, conceptual system, etc.), and thus that writing and speaking basically share the same task of expressing human thought, albeit in different ways. A further consequence has been a *reification* of language. Language is seen as a phenomenon that by definition is based on words (or other lexical items), and subsequently on sentences and grammar, as in written language<sup>6</sup>. That is, words or other lexical items function as designators of fixed and well-defined meanings (except when deployed in metaphorical or indirect ways). Words are treated as separate entities that function as representations of meaning. As a consequence, there is a separation between what is intrinsic to the meaning of words and what is somehow seen as lying outside this confined linguistic meaning.

This view lives on in the popular and widespread (common-sense) distinctions between *language* vs. *body language* or *verbal* vs. *non-verbal* communication. The former are based on words, the latter on something other than words (bodily practices). Body language or non-verbal communication is by definition something separate from language, concerning unintentional sensations or feelings that contain an "unspoken meaning"<sup>7</sup>. Whereas body language is exclusively defined as *behavior*, not language (Boyes, 2005), the concept of *paralanguage* is defined as metacommunication more directly related to language (Poyatos, 1993; Van Berkum et al., 2008). Still, it relies on a distinction between the linguistic content itself (*what is said*) and the variety of ways, typically involving prosody, pitch, volume, intonation, etc., in which something is said or communicated (*how it is said*) (Thibault, 2008).

The theoretical consequence is, again, that the numerous, affect-laden ways in which words are deployed (negotiated, interpreted, explored, enriched, etc.) in the meaning dynamics of actual talk become detached from "language itself." Emotion and affect are therefore treated as something that can only modify, emphasize, or nuance meaning by virtue of not being language.

## **OBSTACLE 3: COMMUNICATION AS TRANSFER OF INFORMATION**

The classical idea within communication studies is still that communication can be captured as a transfer of information between individuals (Weaver and Shannon, 1963). This notion rests on the idea that *something* is communicated and, furthermore, that this "content" is of a somewhat stable character. The idea has been analyzed in terms of *the conduit metaphor*, in which language is viewed as a "conduit" conveying mental content between people (Reddy, 1979). Communication is metaphorically construed as if, whenever people communicate, they "insert" their mental contents (meanings, thoughts, concepts, etc.) into "containers" (words, phrases, sentences, etc.) whose contents are then "extracted" by listeners. Again, it is worth noticing that this conceptualization rests on the highly problematic notion that the meanings of utterances are somehow internal and distinct from their unfolding or deployment<sup>8</sup>.

Interestingly, there is a strong parallel to the way emotions, or rather emotional expressions and emotional communication, have been studied. The most obvious example is the way the human face has often been described as a sort of "mirror" of our emotional states. Facial expressions are thus widely considered the most reliable source for studying emotions, a view dating all the way back to Charles Darwin's seminal work *The Expression of the Emotions in Man and Animals* (1998/1872). More recently, the psychologist Paul Ekman has conducted several studies on the alleged universal correspondence between basic emotions and specific facial expressions (Ekman, 2006, 2007)<sup>9</sup>. However, this type of research on facial expressions rests heavily on a Cartesian division between the inner emotional state and the outer emotional expression: emotions are hidden inside us, and sometimes our facial expressions reveal this "inner landscape." The expressive or communicative part thereby becomes merely an outer byproduct of the inner source, the emotions themselves. Furthermore, there is a tendency to view emotions as revealing as well as "real." They can be trusted (unlike language) precisely because they are "involuntary not intentional" (Ekman, afterword in Darwin, 1998/1872, p. 372)<sup>10</sup>. They disclose our inner motives and desires and thereby send an unintentional "message": "We don't make an emotional expression to send a deliberate message, although a message is received" (ibid., p. 373).

<sup>6</sup>The claim is that with the invention of writing the notion of language underwent a process of reification and objectification due to the permanent and visible signs on paper. The conception of language was transformed from an embodied activity into an object (of study). The view of language as a structured set of abstract forms used to represent things in the world evolves from this written dimension, and its embodied dialogical nature is backgrounded or treated as irrelevant (Linell, 2005).

<sup>7</sup>This is mostly true of communication studies, whereas other fields, such as gesture studies, to a much larger degree see verbal utterances and gesture as one communicative whole and therefore gesture as part of language (Kendon, 2004). Still, surprisingly, only recently has gesture (arm and hand movements) been directly related to emotion and affect. More about this in section Languaging, primary intersubjectivity, and language.

<sup>8</sup>It is of course important to mention that the information transfer model has been heavily criticized for exactly this and has now been abandoned by many positions within the social sciences. The more up-to-date alternative is a view of meaning in interpersonal communication as a co-constructed sense-making that is accomplished within the interaction itself (Jensen, 2014).

<sup>9</sup>Today there is, to a large extent, agreement among emotion researchers on the cross-cultural validity of facial expressions and their correspondence to basic emotions, at least when it comes to the universality of distinguishing between negative emotions such as sadness, fear, and anxiety on the one hand and positive emotions such as happiness, surprise, and joy on the other. It is, for instance, generally acknowledged as a universal phenomenon that eyes widen with surprise and joy and narrow with anger. Still, it has proven more difficult to distinguish among different negative emotional expressions (such as sadness, anger, fear, and disgust) than between positive emotions such as joy and happiness (Planalp, 1998).

<sup>10</sup>Evidently, this problem also concerns the distinction between verbal and nonverbal communication: "A long standing debate concerning verbal and nonverbal communication has been whether verbal communication can be trusted at all in terms of its 'truthfulness.' In almost any introductory textbook to nonverbal communication, students learn that words may lie, and nonverbal signals do not" (Sandlund, 2004, p. 84).

For this reason this approach has also been criticized and rethought within communication studies:

The fact that people can and do alter the expressions of even the primary emotions suggest that emotion display or *emotion expression* may be more aptly termed *emotional communication*, in the sense that emotional information, like other types of information, is shaped for audiences. (..) Emotions may (or may not) be activated internal states, but when they are communicated, they are packaged in ways that are consistent with other communication practices (Metts and Planalp, 2003, pp. 348–49).

Still, even though the authors attempt to free themselves from the dualistic tension in the term "emotional expression," they get caught up in the transfer model of communication. Emotional communication is still understood and conceptualized in terms of a sender and a receiver. Indeed, the whole argument behind dividing emotional communication into different "channels" or "cues" (physiological, bodily, vocal, and facial cues) is undermined by its own terminology. The very notion of emotional cues still entails a view of emotions as encapsulated entities originating within the individual and then brought into public light through different devices. Emotions are described as "information" which is then "shaped for audiences" when communicated, exactly as linguistic meaning is described within communication models. To sum up, the notion of emotional communication is only possible by means of dualistic separations: of "inner emotional states" from the outer social communication of those states, and of specific "emotional cues" (body language) from "real language."

## **OBSTACLE 4: LANGUAGE AS A PURELY SOCIAL PHENOMENON**

This last obstacle reflects a tendency, present in varying degrees within different contemporary language studies such as *linguistic anthropology* (Wilce, 2009), *discourse analysis*, *discursive psychology* (Potter, 1996), and so-called *third wave sociolinguistics* (Eckert and Rickford, 2002), to postulate that most, if not all, aspects of reality are constituted, embedded, and maintained in and through language. What we call "reality" is socially negotiated and linguistically constituted, which means that we do not have access to any kind of reality outside of our linguistically determined experience. This view rests on the assumption that language does not represent a given reality "out there" but rather constitutes our experience of reality.

The basic idea that language is first and foremost a practice and cultural resource which gains its meaning, not from representing thoughts or ideas, but from what it does in contextually defined situations, actually has many points in common with a distributed "languaging approach." Still, this purely social, or constructionist, view often comes with an unfortunate tendency to deny that natural or biological phenomena have any meaning outside of conceptual treatments. Put a bit crudely, it implies that language defines the scope of our experience and that we therefore only have access to "natural" phenomena in and through our language use; or rather, such phenomena only gain meaning by being conceptualized through language. This creates a focus on *language ideologies* (Bauman and Briggs, 2003), among them how emotions are conceptualized in our language use. Despite the relevance and interesting findings of such studies, there is a tendency to reduce emotion to a matter of words or ways of talking:

Discursive psychology, for example, examines emotion vocabularies and refers to emotion discourse as a "way of talking." "Instead of asking the question, 'What is anger?'," Harré writes, "we would do well by asking, How is the word 'anger' actually used in this or that cultural milieu and type of episode?" (Maynard and Fresse, 2012, p. 93)

The premise of such studies lies in the constructionist assumption that our access to emotion is mediated and constituted by our language use. Emotions are only "emotions" when called "anger," "joy," "embarrassment," and so forth. Thus, emotions become intellectualized as a matter of words and concepts and the result is that there is no independent (emotional) reality outside of language. Instead of widening, or redefining, the notion of language, as inherent in the notion of languaging, language becomes detached from its embodied characteristics and emotion is locked in the confined room of emotion words. Likewise, bodily actions and movements are in many constructionist analyses (Harré, 1986; Gergen, 2009) treated as first and foremost a by-product of verbal discourse and social conventions which, in the end, results in a *social reductionism* that leaves the embodied biological dimensions of emotions fundamentally unexplained.

Now, from the vantage point of this article it is vital to avoid all of these obstacles separating emotion from language and instead strive toward an ecological naturalization that sees language "as fully integrated with human existence" (Cowley, 2011a, blurb), implying, among other things, that emotion and affect can be embraced as integral parts of languaging behavior. Let us now have a closer look at such an approach.

## **LANGUAGING**

## **AN ECOLOGICAL NATURALIZATION**

First of all, it is important to clarify that an ecological naturalization (Steffensen and Cowley, 2010; Thibault, 2011; Steffensen, 2014) is by no means an attempt to reduce culture, sociality, and language to biology, neurology, or physics, as implied in some previous attempts at naturalization (Pinker, 2003). On the contrary, an ecological naturalization goes against any sharp distinction between the socio-cultural and the natural sphere. In relation to the present work, the key ambition is to present a study of how emotions can be analyzed in situ without committing to *either* a biological *or* a social standpoint that excludes the other. Instead, inherent in the notion of languaging proposed here is the tenet that language is at the same time a cultural organization of processes and naturalistically grounded in human biology, implying that:

..there is no inherent contradiction between seeing language as biogenic and as social, simply because sociality is our human way of being nature. This assumption both precludes the bioreductionism that ignores supra-individual (i.e., social or cultural) dynamics and the socio-reductionism that ignores the metabolic and ecological foundations of human existence (Steffensen, 2014).

Secondly, this ecological viewpoint crucially affects the rethinking of the notion of language, conceptualized as first-order dynamics and second-order patterns, as mentioned in the introduction. Real-time, adaptive, flexible behavior and coordinated activity is referred to as first-order languaging (putting weight on the fact that language arises from activity); this activity, however, presents itself (on a phenomenological level) as words and utterances with meanings, connotations, and so forth, i.e., as second-order language. Contrary to a representational view of language, however, it is crucial to bear in mind that "speaking does not refer to the world; it *causes* an experience that happens to coincide or not with the narrow situation or the larger reality such as it is enacted" (Bottineau, 2010, p. 278). Thus, the meaningful patterns and configurations of speaking arise because we, as bio-social beings enmeshed in specific social realities, are accustomed to taking what Stephen Cowley has termed a *language stance* (Cowley, 2011b). We learn to scrutinize and discriminate between different sounds (and movements) so that we hear vocalizations as words in the process of being enrolled in an ecological reality. In a complex bio-social environment, bodies, physical artifacts, words, embodied movements (gestures, gazes, mimicry, postural sway, etc.), social norms, and other sociocultural resources all function as enabling conditions or *affordances* (Gibson, 1979; Hodges, 2009) for human action. Thus, put a bit crudely, the focus shifts from abstract forms (as in traditional linguistics) to a reconsideration of how "we perceive bodily events as wordings. Emphasis on coordination allows due weight to be given to the fact that languaging predates literacy by tens-of-thousands of years. By hypothesis, all linguistic skills derive from face-to-face activity or languaging" (Neumann and Cowley, 2013, p. 18).

This ecological approach does not need to draw a sharp line between (what is usually called) a natural and a social/cultural reality. Instead, the dualisms between the biological and the social, and between the here-and-now and grand-scale formations, are challenged by grounding languaging in bodily co-experience while remaining sensitive to overarching cultural and social constraints on language.

## **LANGUAGING, PRIMARY INTERSUBJECTIVITY, AND LANGUAGE**

As laid out by Paul Thibault, recent movements within distributed language studies position languaging as intimately related to intersubjectivity and affective attunement:

"Human language is seen more and more as a suite of flexible and adaptive behaviors that are based upon a naturalistically grounded intersubjective sensitivity to the bodily dynamics (movement) of others and the sensorimotor coupling relations between persons and their worlds that result from this in the intersubjective matrix" (Thibault, 2011, p. 212).

In the same vein, in a recent publication within embodied and social cognition, Joel Krueger refers to an older study of breastfeeding (Kaye, 1982): "the infant's earliest and most complex form of social interaction. The rhythmic cycles and back-and-forth interplay of breastfeeding appears to play an important role in the infant's social cognitive development*...* Within the dynamics of this exchange, mothers sculpt the infant's attention: their behavior is organized by the mother's touch and physical prompting. The infant is guided to notice salient environmental affordances by the jiggling (e.g., the nipple affording feeding) that, in light of her underdeveloped endogenous attention and lack of behavioral organization she might not otherwise pick up" (Krueger, 2013, p. 43).

It seems obvious that the contours of languaging, in its most basic form, are grounded in such early intersubjective behaviors. Of course, later in life languaging expands and gains enormous complexity by being enmeshed in sociocultural reality, as described in the previous section. Thus, what is referred to in the present work as "languaging" overlaps to some extent with what other scholars, primarily concerned with bodily behaviors only (Gallagher, 2005; De Jaegher and Di Paolo, 2007; Gallagher and Zahavi, 2008), call *primary intersubjectivity* (Trevarthen, 1979). However, "languaging" is put forward in this work because the specific research interest and focus are different. The aim is to show the continuity between bodily engagements and activities that include speaking and verbal behavior, and thus second-order language: that is, bodily activity in the here-and-now which is always already constrained by situation-transcendent elements emanating from larger socio-cultural timescales. The commonly learned second-order language shows up in the flow of first-order languaging, shaping and constraining the possibilities for sense-making therein, though not exhaustively determining or explaining them. In that sense languaging behavior is infused with second-order patterns; thus the first/second-order distinction is not a clear-cut separation like the traditional distinction between system and use.

Furthermore, there is a tendency within both primary intersubjectivity approaches (Trevarthen, 1979; Gallagher and Zahavi, 2008; Krueger, 2013) and embodied and extended approaches to cognition (Clark, 2008; Chemero, 2011) to *underthematize language*, and thereby to leave unexplained how language more specifically relates to our bodily engagements. Many scholars who seem quite progressive in relation to cognition, perception, emotion, etc. still maintain a somewhat traditional view of language as "a tool for thinking" (in traditional views) or (in more modern versions, see Clark, 2008) as a way of extending our minds into the world, thereby neglecting the activity-bound character of language (see Steffensen, 2009; Fusaroli et al., 2013a, for a similar critique). By contrast, the languaging approach allows us to see language as first and foremost an activity; it "is a doing" (Cuffari, 2014, present volume), intimately tied to affective attunement while also being constrained by second-order patterns.

## **AFFECTIVE STANCE AND INTER-AFFECTIVITY**

In opposition to traditional dualistic conceptions of emotion as "inner states" and behavior as "outer conduct," there is a long and rich phenomenological tradition of dealing with perception, action, and emotion as intertwined phenomena, represented by, among others, Maurice Merleau-Ponty:

I do not see anger as a psychic fact hidden behind the gesture (..) The gesture does not make me think of anger, it is anger itself. I perceive the grief or the anger of the other in his conduct, in the face or his hands, without recourse to any inner experience (Merleau-Ponty, 1964/1992, pp. 48–49).

The important point, made more than half a century ago, is that we do not, as commonly thought, infer inner emotional states on the basis of (an interpretation of) outer behavior; rather, we perceive emotions directly in our interlocutors. Emotions come about as behavioral patterns; put another way, they are in the behavior, not a product of it or something to be drawn out of it. Relating this to languaging, we can say that in interaction we perceive emotions directly in order to do things. Gestures, facial displays, posture, wordings, or simply whole-body languaging acts generate affordances for trajectories of further action in human dialog (Hodges, 2009). Interaction is constantly pushed forward by actions that invite or afford further actions, and here emotions play a crucial role as the "grease" keeping these dynamics going. In that sense human dialog is often, in varying degrees, infused with what Karl Bühler called "communicative valence" (*kommunikative Valens*; Bühler, 1934, p. 31, cited in Caffi and Janney, 1994):

During interaction, we tend to perceive others as "opening up" or "closing down," being responsive or reticent, making signs of approach or withdrawal; we perceive their relative strength or weakness, their fuller or lesser presence, their attentiveness or disinterest. All such perceptions are rooted in, and depend on, emotive displays. (..) It is the capacity, for example, to view "positive" behavior as a possible starting point for agreement or cooperativeness, "negative" behavior as a possible starting point for disagreement or conflict. (..) In all cases, the interpretation of emotive activities involves an appreciation of interpersonal relations and self-presentation (Caffi and Janney, 1994, p. 329).

This aspect of human interaction is often described in terms of stance taking (Du Bois, 2007; Goodwin, 2007; Goodwin et al., 2012). According to John Du Bois, when we express opinions and/or display affect, three dimensions are at stake simultaneously: *evaluating* the topic we are talking about, *positioning* ourselves with respect to the topic and others, and *aligning* or *dis-aligning* with our interlocutors. There is, however, a rather narrow focus on words and a somewhat individualistic point of view in (parts of) the stance literature<sup>11</sup>; consider, for instance, these lines from Du Bois' *The stance triangle*: "One of the most important things we do with words is take stance (..) Stance can be approached as a linguistically articulated form of social action" (Du Bois, 2007, p. 139). From the vantage point of this article, it is crucial to widen the scope of stance so as to investigate affective stance as part of (whole-body) languaging behavior, intertwined with the dynamics of human-environment-systems. Stance is the perfect example of languaging as whole-body sense making: processes of evaluating, positioning and/or aligning/dis-aligning are by no means restricted to "the use of words" (even though words often play a part) but involve whole bodies engaging in adaptive, flexible behavior.

Affective stance is crucial to understanding languaging as attunement to the environment in and through coordination of behavior (Bickhard, 2007; Fusaroli et al., 2013b). Languaging is about coordinating dynamics; it is "something we *do together*" (Fusaroli et al., 2013a, p. 2). Taking this perspective a step further, in a recent article on gesture in interaction, Böhme and colleagues investigate how we accomplish affective coordination together, a phenomenon they term *inter-affectivity*:

Affect in face-to-face communication is assumed to manifest itself as embodied inter-affectivity. Our analyses will document that affect is in fact a dynamic and shared "in-between" phenomenon, jointly created by the participating interlocutors. Therefore, an *interactive expressive movement unit* is a sequentially organized product of joint gestural activities of co-participants in an interaction, which, by definition, entails more than one gesture unit (Böhme et al., 2014, p. 2116—italics in original).

This notion of inter-affectivity, challenging the idea of affect and emotion as properties of individuals, in turn makes it possible to question the traditional clear-cut distinction between Self and Other as two separate entities that can only communicate by means of "emotional cues" or "channels." Rather, human interaction can be seen as the unfolding of a "temporarily coordinated functional whole, consisting of two sub-systems" (Rączaszek-Leonardi, 2011). A consequence of this is that the unit of analysis shifts from the interpretation of individual doings and the causal links between separate actions to a more *systemic view* that considers human interaction as a *dialogical system* (Steffensen, 2012), which can be seen as "systems of co-present human beings engaged in interactivity that bring forth situated behavioral coordination (or a communicative, structural coupling)" (Steffensen, 2012, p. 513). Such behavioral coordination is infused with affective valence and emotion from the very outset. Adaptive, flexible behavior is all about adjusting, attuning, directing, opposing or contrasting behavior within a human-environment-system, or a human-human-environment-system. Put another way, emotions can be seen as the glue of dialogical systems.

# **ANALYSES**

## **METHOD AND TRANSCRIPTION**

Central to the notion of languaging, as previously described, is the inclusion of embodied actions of all sorts: posture, gaze, gesture, facial movements, voice quality, in- and out-breaths, etc., are all important parts of first order languaging. This of course needs to be reflected in the methodological praxis in general, and specifically for this work, in the transcribing and notation of interactional data.

However, there can be no such thing as an all-encompassing transcription; for instance, the notation of facial movements and gesture in the present work is by no means as detailed as in studies focusing solely on these phenomena, for example by using close-ups of each participant's face and hands. In this case, only one camera was used for each recording. Still, as mentioned earlier, the primary research questions for this work concern languaging behavior in its totality, not the specific role of facial movements or gesture as such. A basic model of the transcription system developed by the conversation analyst Jefferson (2004) is employed here, which includes notations of basic prosodic features such as pitch, volume, speed, intonation, and tone of voice (i.e., smiley or crying voice). In many conversation-analytic studies the verbal and vocal activities are supplemented with comment lines describing embodied activities. Still, a serious challenge for developing a specific methodology for analyzing languaging in situ is the traditional point of departure in words and individual talking turns, inherent both in the notion of speech acts and (to some degree) in conversation analysis (Searle, 1969; Hutchby and Wooffitt, 1998). As noted by, amongst others, Per Linell and Sarah Bro Pedersen, a word- and line-based transcription (with bodily movements only appearing as comments) can in itself be seen as proof of a written language bias (Linell, 2005; Pedersen, 2012). Furthermore, to some extent this procedure (involuntarily) reflects a tradition in linguistics that endows words and verbal behavior with a privileged status. Nevertheless, since we cannot go back in time and be present in the flow of interaction as it took place, we need to be able to capture and represent what went on. For the sake of recognizability, this often means reading a word-based transcription, perhaps combined with notations of bodily movements.

<sup>11</sup>See Goodwin et al. (2012) for a broader approach to stance that also includes bodily behavior.

Another way to go about it, however, is to combine word-based transcriptions with images. Images have the advantage of favoring an in situ impression of the interaction instead of a retrospective description; they show the dynamics instead of trying to explain them. For these reasons, the verbal transcriptions are combined with images, paving the way for an analysis of these conversations as instances of whole-body languaging behavior. The verbal utterances are presented first in the Danish original and then translated into English in the following line (in italics). A complete overview of the transcription symbols is attached as an appendix to the article. Still, it needs to be said that there is a tension between the notion of languaging as whole-body sense making and this CA-inspired model of transcription, which is in need of clarification and further development in future work<sup>12</sup>.

## **ANALYSIS: AFFECTIVE STANCE IN LANGUAGING**

The following example is taken from a larger recording made at a Danish school for children with special needs<sup>13</sup>. M and E, a pair of twins diagnosed with intellectual disabilities, and a speech and language therapist (S) are sitting around a table playing a card game. It is a board game with different cards depicting various objects, animals and social situations, and its objective is to train the verbal skills and social knowledge of the children. Leading up to the sequence below, M has drawn a card and is now supposed to say what it depicts.

In the middle of the sequence something unexpected happens: instead of delivering a verbal answer to the two questions posed by S (in lines 1 and 3), M suddenly performs a variety of (bodily) languaging actions (see second picture). Up to that point M has been sitting still while holding out the card with his right hand for both himself and the other participants to see (see first picture). But all of a sudden the intensity of the inter-bodily dynamics between M and S changes. A series of affective movements starts unfolding in line 4, with M becoming highly energetic: throwing his torso back and forth, kicking under the table, smiling and moving his head while at the same time uttering, at high volume, two distinct sounds (RRCH RRCH) resembling the sounds of pigs. Immediately the activity level of S changes as well. In the first half of line 5 her eyes widen significantly while she gazes directly at M; she smiles and starts speaking in a distinctly smiley voice with high volume, emphasis and rising intonation (see third picture). Together, these rapidly evolving and tightly coordinated inter-bodily dynamics of M and S build an affective alignment. This alignment emerges from the totality of the inter-actions, not just as a result of separate individual actions, but as an overall pattern or configuration of expressive movements (Böhme et al., 2014), vocal sounds and wordings, giving rise to shared *inter-affective experiences* of intense involvement, joy, and excitement. As depicted by the yellow circle in the last picture, M and S are completely engaged in their inter-affective movement dynamics (gesturing, moving their upper bodies, smiling, grimacing, and gazing at each other), which, taken together, build a shared affective alignment encapsulated by the yellow circle. Thus, the yellow circles are meant to depict the affective development from M's (individual) gestures and whole-body movements to the inter-affective coupling between S and M in the last picture<sup>14</sup>.

Furthermore, it is crucial to pay attention to the sequential placement of M's initial languaging actions. They are embedded in the ongoing structure of the interaction and performed in line 4, at exactly the point at which a traditional verbal answer would be expected. But instead of stating verbally what is on the card, M is *acting out* the depicted content by uttering pig-like sounds, kicking under the table and throwing his torso back and forth. It can be seen as a whole-body languaging act of *showing* instead of *telling*. Indeed, these whole-body movements are an instance of *affective stance taking* embedded in the immediate environment and arising from ongoing processes of interaction. As described previously, stance is traditionally understood and described within the framework of words. In this case, however, by letting whole-body actions replace wordings, M takes a stance that immediately affords an alignment by S. By acting out the answer instead of just saying it, M indirectly *evaluates* the object as well, i.e., the predefined task at hand and the way the answer is meant to be delivered. Thus, this whole-body languaging behavior redefines the rules in a creative way and thereby *positions* M in relation to the game activities, which in turn enacts an inter-affective space between M and S that *aligns* their stance taking and enhances an immediate intersubjective understanding between them.

<sup>12</sup>For other ways of capturing and transcribing languaging behavior see Steffensen (2012), Pedersen and Steffensen (in press), and Böhme et al. (2014).

<sup>13</sup>This recording was made available to me by Professor Gitte Rasmussen, on condition that the anonymity of all the persons involved was upheld. I would like to express my deepest gratitude to Gitte Rasmussen for the opportunity to work on these data.

<sup>14</sup>At the same time, however, for a brief moment this structural-emotional coupling isolates E as not being part of this alignment, which is apparent in the way he looks down at his own cards, disengaging from the shared activities between S and M.

## *First order languaging constrained by second order language*

Focusing on the second half of S's response, however, reveals the short-lived character of this intersubjective alignment: S's confirming response at the beginning of line 5 is quickly repeated, only this time without any of the initial prosodic features such as smiley voice, high volume and rising intonation (YES WHAT IS IT CALLed↑ (.) >what is it called↑<), i.e., this repetition works as a more straightforward request for a verbal answer. In other words, the first-order whole-body stance taking is quickly constrained by a verbalized (second order) request. The here-and-now languaging behavior becomes enmeshed in the prerogative, or second order constraint, of the socio-cultural function of the game: to train the verbal skills of the children. The initial acknowledgement by S had a function: it helped establish an intersubjective alignment. This opened renewed possibilities: room for trying things that are hard, namely verbal depiction, which is the aim of the game and which becomes possible because S is willing to redefine the rules to achieve that goal. For a brief moment S had acknowledged that whole-body languaging is indeed language, meaningful and even powerful. At the same time, however, verbal language is needed in this social learning activity, as well as in society in general, to accomplish certain tasks.

In relation to this example the consequence is that an embodied emotional languaging response needs to be enrolled in second order norms and patterns in order to gain recognition and acknowledgement, i.e., it needs to be verbalized. Thus, in the last part of line 5, after a mini pause of 0.2 s, S provides this requested verbal answer herself. She "takes a language stance" and thereby transforms the bodily actions of M into a recognizable verbal pattern naming and categorizing the action of M as depicting "*a: (.) pig but it's actually a wild boar.*"

## *Summary*

This example explicated how:



In the next example we will investigate further how affect and emotion are built into languaging behavior in the phenomenon of laughing, while also being constrained by second order patterns.

# **ANALYSIS: THE ECOLOGY OF LAUGHTER**

Laughter in interaction is an intriguing phenomenon in relation to emotion and affect. It is tempting, and therefore common, to consider laughter as a spontaneous and individual phenomenon; a force of nature that sometimes gets the better of us, resulting in individual single outbursts of laughter. On the other hand, laughter is commonly experienced as contagious. It rapidly spreads among interlocutors<sup>15</sup>, and in this regard it can be seen as a shared phenomenon that evolves in the intersubjective space between people. Furthermore, in a number of studies the conversation analyst Gail Jefferson has shown how laughter in interaction can be regarded as an activity that invites participation: *"speaker himself indicates that laughter is appropriate, by himself laughing, and recipient thereupon laughs"* (Jefferson, 1979, p. 80, italics in original). Thus, a speaker invites others to participate by the act of laughing itself; furthermore, if the recipient does not join in, or only laughs momentarily, the first speaker's laughter is significantly shorter (Jefferson, 1984). In this sense, laughing in interaction is by definition something we do together, and for that reason solo laughter is neither common nor acceptable for very long in social interaction. This reminds us that there is much more to laughing than spontaneous and individual outbursts; on a fundamental level laughing is grounded in an ecology of inter-affectivity. It is integrated into languaging behavior, profoundly tied to the biosocial interworld (Linell, 2009) of perceptions, bodily actions and social attitudes of interlocutors, and embedded in interactional structures.

This longer sequence comes from a larger set of recordings of couples therapy sessions featuring a therapist and married couples<sup>16</sup>. As an introductory exercise, this couple is asked to mention one thing about the other that they appreciate and value. This request, however, is followed by a considerable pause of 3 s in line 1, which is subsequently broken by the onset of laughter-and-talking. Apparently, the silence following what ought perhaps to be an easy task for a married couple creates a contrast that provokes laughter, even though it might also appear problematic<sup>17</sup>:

<sup>15</sup>For instance, the *emotional contagion* approach explicates how emotional reactions such as laughter rapidly spread in groups of both mammals and humans. It is a process consisting of three steps (mimicry, feedback, and contagion) enabling people "from moment-to-moment to catch others' emotions" (Hatfield et al., 1993, p. 99).

*Laughing as a gestalt of shared expressive experience*

In this sequence the laughing emerges gradually: from initial outbreaths and "laugh particles" interpolated within wordings in line 1, through the increase in volume, stress and smiley voice in line 2, to the eruption and flow of full-fledged laughter in lines 3–7 (see second picture), until it suddenly stops in the overlaps of lines 7 and 8 (see third picture). It lasts almost 8 s and has a clear trajectory. The distinct in- and outbreaths evolve in a rhythmical pattern that is completely intertwined with the inter-bodily dynamics of speaking, tone of voice, gesturing, postural sway, facial displays, gazing at each other or into the room, closing one's eyes and even tactility (gathering hands and touching one's face).

In line 1 the pause is suddenly disrupted by M moving his shoulders up and down in small rhythmical movements while making audible outbreaths surrounding and interwoven with the articulation of "no(h)w." These actions are immediately reflected by a change in W's behavior from sitting still and looking into the distance to a distinct *smiling-and-gazing-behavior* directed toward M. In a flash, through these movements, they share emotions and build inter-affectivity. It is the totality of their "interactive expressive bodily behavior" that, taken together, appears as "one gestalt of shared affective experience" (Böhme et al., 2014, p. 2116). Thus, the initiation of this "laughing behavior" is built into whole-body sense making, inseparable from first order languaging behavior. Furthermore, the ending of this gestalt unit of laughing in lines 6–7 comes about within a similarly tight coordination of actions. Suddenly M and W's inter-bodily affective dynamics are replaced by a quiet position of sitting still with their heads bowed and hands in their laps (third picture). In order to understand this sudden change we need to look at the behavior of the therapist. At the end of line 5 T starts changing her posture (see small yellow circle in second picture); she gathers her arms behind her back and then, just after M's speaking turn in line 6 (while M and W are still engaged in their laughing behavior), T closes her eyes and lets her head fall onto her chest. It is an action by which T visibly withdraws from the ongoing laughing behavior while displaying concentration and introversion, as opposed to the extroverted mutual laughing exhibited by M and W. It is striking how this silent, yet overt, bodily demonstration achieves a change in the dialogical system that ultimately stops the ongoing laughing.

Laughing brings forth a "sharedness" by engaging people, which is exactly why it is also highly sensitive to actions of disengagement. This is illustrated by the impact of the therapist's silent withdrawal from the laughing activity: it brings the laughter to an end, pointing to the fact that laughing itself *requests* participation in order to be sustained within a dialogical system. Like other languaging acts, laughter is profoundly other-oriented; it requires a response in the form of more laughter to be maintained. Thus, what this analysis points to is that laughing is not only tightly bound to the inter-affective sharing and exploration of joy and amusement; it is also integrated into the overall languaging behavior, and can therefore easily be restructured and "toned down" by other languaging acts.

## *Employment of second order patterns in laughing*

Looking closer at the trajectory of the laughter reveals two significant "peaks" of laughing in terms of volume, intensity, duration and postural sway: in lines 3–4 (overlapping) and in lines 6–7 (also partially overlapping). Common to these peaks is their sequential placement right after verbal and gestural actions; i.e., they seem to function as multimodal responses to what has just been said (and done by means of gesture), suggesting that these actions are not only built into the very structure of laughing, but even contribute significantly to its development. Now let us take a closer look at these actions.

In line 2 W makes a very distinct gesture-and-posture (see first picture) exactly at the point when M says *PAUSE FOR REFLECTI::ON:*, thereby providing visual feedback and an image reflecting the wordings. Likewise, in line 6 a similar (albeit not identical) gesture-and-posture is performed by M simultaneously with his own speech on *an amazi(h)ng PRESSURE*-. We can call these repeated gestures-and-postures an emblematic thinking-gesture-and-posture. They are characterized by placing the right hand or fingers either on one's cheek (first instance; see picture) or in front of the mouth (second instance) while wrinkling one's brows and looking downwards (somewhat like the famous sculpture "The Thinker" by Auguste Rodin). These gestural actions arise from and are integrated into the whole gestalt unit of laughing, in which they have a complementary function to the ongoing speech. Both of them complement the meaning of the verbal actions of having to think hard whilst under pressure; i.e., they provide an image of "concentration" that in turn can be mutually elaborated, adding to the sharing of affective experiences, and thus again contribute to the humorous effect, as witnessed by the subsequent increase in laughter following them.

<sup>16</sup>The recordings were undertaken in relation to my Ph.D. dissertation in 2008 in collaboration with the Danish Imago Center.

<sup>17</sup>As investigated by Gail Jefferson, laughter in interaction sometimes has the social effect of dealing with sensitive topics. It can be seen as a way of managing troubles-talk, "exhibiting that, although there is this trouble, it is not getting the better of him [the speaker]; he is managing; he is in good spirits and in a position to take the trouble lightly" (Jefferson, 1984, p. 351). Something similar seems to be the case here. By engaging in laughing behavior the couple mutually deals with the fact that they were not, on the spot, capable of recalling something valuable about each other.

Thus, we can see how the first order activities of shared laughing are constrained and enriched by *second order patterns*. The utterances themselves are at the same time first order embodied actions (smiley voice, high volume, laugh particles within the wordings, postural sway, etc.) and second order manifestations affording a view from the outside (e.g., "here we are, a couple in therapy without even being able to (immediately) come up with something nice to say about each other"). This illustrates how languaging activity can be seen as multi-scalar, since it involves a coupling with other timescales transcending the here-and-now of situational activities. This dimension concerns the second order patterns that originate from the dynamics of interacting agents on larger (and longer) socio-cultural timescales. In dialogical terms it enacts "other voices" (Linell, 2009), i.e., in human interaction we do not just interact with each other, but also with an array of third parties emanating from cultural traditions, societal norms and so forth. As famously pointed out by Bakhtin: "The word in language is half someone else's. It becomes one's 'own' only when the speaker populates it with his own intentions, his own accent, when he appropriates the word" (Bakhtin, 1982, p. 294). Thus, sense-making and meaning in interaction cannot be reduced to individual activity; they are, at once, inter-bodily, interactional, situated, and situation-transcendent, and in that sense fundamentally *co-authored* (Linell, 2009; Steffensen, 2012; see also Cuffari, 2014, this volume):

Sense making re-enacts multiple voices, defined as *silent others* that affect what we think, say, do and not do in situated dialogue. Sense-making, thus, unfolds as *double dialogicality* that links socio-cultural history (norms, knowledge, rules etc.) with real-time dynamics as we orient toward each other and use cultural artefacts (including verbal patterns) (Pedersen and Linell, in press).

The verbal and gestural actions in lines 2 and 6 "comment" on the situation by evoking a position that views and evaluates this specific couple therapy interaction in the here-and-now from a larger "outside." The wordings, gesture and posture invite such an outside view of socio-cultural norms, creating a doubleness (Jensen and Cuffari, 2014) that seems to furnish and elaborate the humorous effect. The second order view from the outside may add to a feeling of absurdity, which, in this case, makes the situation even funnier and, paradoxically, contributes to the inter-affective sharedness of laughing together. In this way, a closer look at the trajectory of the laughter illustrates how laughing is a rich and complex affective phenomenon, deriving from first order activities while being constrained by second order patterns.

## *Summary*

In sum, the affective quality of laughter as an integral part of languaging can be characterized as follows:


# **FINAL REMARKS**

This article offers a re-specification of the traditional distinction between "language system" and "language use" as first order languaging and second order language. It is a re-conceptualization that in turn offers an opportunity to see affect and emotion as part and parcel of languaging behavior while also being constrained by second order language. In that sense, emotion and affect need not be separated from language; they need not be treated as "non-linguistic elements" that are added to language. Instead, languaging behavior is presented as inherently affective and at the same time enmeshed in second order patterns. An obvious advantage of such an approach is that language can be studied as part of human action as such, which in turn allows us to see aspects of that action hitherto separated from language, such as affect and emotion, as part and parcel of language as it evolves from *human life* (Cowley, 2011a; Steffensen, 2014), not just as an "instrument" that we use for "communication." This entails that language is not first and foremost seen as a system, it is not just about words, and it is not conceived of as a channel that transfers information; nor is language understood as merely a social phenomenon devoid of a biological dimension. On the contrary, it is grounded in a *naturalistic* approach that sees language as evolved from and completely intertwined with the complexity of *human behavior*.

However, this approach to language also raises serious conceptual and methodological challenges. One of them is this: if language is re-specified as whole-body sense making, or behavior, how can we, as researchers interested in language, specify, delimit and measure our object of study? Put simply, where does languaging begin and where does it end? In a recent review article Sune V. Steffensen discusses this problem, arguing that Thibault's definition of languaging is indeed too broad:

While first order linguistic interaction and coordination is indeed a whole-bodied achievement, the definition may seem too broad, as it can be read as suggesting that each and any "whole-bodied achievement" is an instance of first-order languaging. But describing my boiling an egg or preparing an omelette as first-order languaging intuitively seems to stretch the term. On the other hand, Thibault's definition would be applicable if wordings played a part in recalling my mother's instructions of how to make an omelette, or if I elicit my family's preferences for hard-boiled or soft-boiled eggs (Steffensen, 2014).

It is true that preparing an omelet does not intuitively seem to be part of languaging. We need to be able to discriminate between languaging behavior and other types of behavior. As suggested in the introduction, one way to define languaging behavior more precisely is to see it as *coordinated actions constrained by second order patterns*. Such an approach is also implied in the quote above, which suggests that the inclusion of wordings in recalling a recipe is a possible way of viewing cooking as an instance of languaging.

Still, such a tentative definition does not solve all the problems in conceptualizing languaging as whole-body behavior or sense making. First of all, it does not sufficiently address the question of intentionality and meaning. Many types of behavior are carried out not with any intention of influencing the behavior or experiences of others, but for practical purposes: we make an omelet or prepare dinner for our family in order to get something to eat, not because we want to "convey a message" (even though that might sometimes be the case). Clearly, such an activity would not count as languaging behavior; on the other hand, we might imagine a very distinct way of preparing a meal, with a clattering of crockery and cutlery, i.e., a hectic, hasty, and perhaps even angry way of cooking that (granted the presence of others) may indeed be orchestrated in a way such as to cause "an experience that happens to coincide with the narrow situation or the larger reality such as it is enacted, and has to be mapped against the environmental medium, including the psychological environment" (Bottineau, 2010, p. 278). Even if such behavior is performed without the use of words, it might still be (partially) communicative and deliberate by virtue of doing, acting and manipulating the environment in certain ways that transcend mere practical purposes. Furthermore, "making an omelet" or "preparing a meal for a family dinner" are practices that are only possible within a specific ecological niche with certain historical, social and cultural horizons of significance, i.e., they are by no means detached from second order patterns. Likewise, cooking activities often require a certain culture-specific training; they often have a social character and perhaps even an emotional significance for the people involved. Does cooking count as first order languaging behavior, then? There is no easy answer to this, and many further studies are needed to investigate how languaging is enmeshed in human practices.

This article presents an ecological approach to language and emotion. One implication of such a point of departure is that the distinction between what is considered biological and what is considered social is fundamentally challenged. Is preparing a meal a social or a biological act? Or, for that matter, engaging in learning activities with a speech and language therapist or participating in couples therapy with your spouse? From the perspective of this work, posing these questions makes little, if any, sense. This article argues for a reconsideration of the unreflective rift between the biological (individual) and the social (collective). Mainstream linguistics and cognitive science generally take biology to be first and foremost an individual phenomenon, while sociality is understood as something purely collective and public. Correspondingly, emotion and cognition are construed as individual, internal, and private processes, while communication is conversely conceived as purely social, public, and outer. The problem arises when these distinctions are treated as mutually exclusive. On a dichotomous reading, what is social is by definition that which does not belong to nature or biology, and vice versa (Cuffari and Jensen, 2014). However, the notion of ecology rests on a fundamentally bio-social foundation, unlike the more familiar, and wholly social, concept of context:

The ecology is not an outer frame that just surrounds or contains the individual agents and it cannot be captured in the simple outer-inner dichotomy. Rather, the ecology emerges from the active sense-making of agents employing the physical materials and socio-cultural resources of the environment (Cuffari and Jensen, 2014).

In the same vein, we need to transcend the dichotomy between viewing emotions as either primarily biological or primarily social phenomena. Emotions are at once rooted in neurological structures and embodied sensations, subjectively felt, socially embedded, and integrated with action and languaging. In that sense, emotions are part and parcel of our ecology in the way they are intertwined with our languaging behavior in the animal (human)-environment system. Embodied emotional actions are enacted in languaging as affordances that locate and orient us to the possibilities we encounter. Thus, emotions help us to build an interpersonal "geography" that we can share, participate in, or confront.

## **ACKNOWLEDGMENTS**

I would like to thank Elena Clare Cuffari, Sarah Bro Pedersen and Sune Vork Steffensen for their critical comments and valuable contributions to earlier drafts of this article.

## **REFERENCES**


Bottineau, D. (2010). "Language and enaction," in *Enaction: Toward a New Paradigm for Cognitive Science*, eds J. Stewart, O. Gapenne, and E. Di Paolo (Cambridge, MA: MIT Press), 267–306.

Boyes, C. (2005). *Need to Know Body Language*. HarperCollins.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 April 2014; accepted: 22 June 2014; published online: 16 July 2014.*

*Citation: Jensen TW (2014) Emotion in languaging: languaging as affective, adaptive, and flexible behavior in social interaction. Front. Psychol. 5:720. doi: 10.3389/fpsyg.2014.00720*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Jensen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Why call bodily sense making "languaging"?

# *Giovanna Colombetti\**

*Department of Sociology, Philosophy and Anthropology, University of Exeter, Exeter, UK*

*\*Correspondence: g.colombetti@exeter.ac.uk*

## *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

*Reviewed by:*

*Thomas Wiben, University of Southern Denmark, Denmark*

**Keywords: primary intersubjectivity, sense making, languaging, affectivity, language**

## **A commentary on**

**Emotion in languaging: languaging as affective, adaptive, and flexible behavior in social interaction**

*by Jensen, T. W. (2014). Front. Psychol. 5:720. doi: 10.3389/fpsyg.2014.00720*

I am sympathetic to Jensen's aim to "bring language and emotion back together." To speak is, among other things, to communicate one's affective state to others, and this communication is typically effected by embodied agents whose affective state also manifests in their face, posture, gestures, facial expressions, and tone of voice. When I talk to someone I usually look at them in a more or less engaged way; I may also smile or frown, nod sympathetically or shake my head in disapproval, giggle, laugh, gesticulate, alter the volume and pace of my voice, and so on. These actions are partly responses to what the other says and how he says it, and often have the function of affecting how the interaction continues (a nod may communicate approval of what is being said as well as encouragement to carry on). The interactions analyzed by Jensen nicely illustrate clear instances in which language is continuous and integrated with other types of bodily engagement with other people. In addition to being responses to others, my actions, when I speak, are also often related to the meaning of my words. As I am telling my friend about the climb I did on the weekend, I move my head down and close my eyes when I tell her how scared I was of the height and that I did not want to look down; I reproduce climbing movements with my hands or even the rest of the body when I tell her about a difficult passage; I spread my arms when I tell her about the 360-degree view from the top of the mountain, etc. Here as well we can see a continuity between language in the sense of well-formed word-based speech and a variety of communicative bodily gestures.

So, I agree with much of what Jensen says in his article. However, I remain unclear about his use of the notion of "languaging," particularly about its relationship to bodily sense making. I understand the point of talking of "languaging" to denote "language as an activity" (p. 2),<sup>1</sup> namely as a process and as a behavior rather than as a static system of symbols and rules. But the notion of languaging in Jensen's paper *also* appears to be stretched to include *all* instances of bodily sense making, which I think is problematic. For example, at the beginning of the article Jensen writes that languaging is first-order "behavior or whole-body sense making" (p. 1, footnote 1); and at the end he suggests that preparing a meal in the presence of others, yet without using words, can also be seen as an instance of languaging—if done "in a very distinct way" (p. 12) that communicates some kind of affect ("a hectic, hasty, and perhaps even angry way of cooking," p. 12).

Now, "whole-body sense making" may well be what grounds word-based linguistic phenomena, but it can also occur *before* language is acquired, so it is not clear that we should call all cases of bodily intersubjective sense making "languaging." We know from developmental psychology that shortly after birth infants already interact with their caregivers by responding to bodily contact, vocalizations, and gaze direction (e.g., Tronick et al., 1979; Tronick, 2003). In the first year of life, infants engage in progressively richer interactions with the caregiver, in what is known as "affect attunement," i.e., the cross-modal matching of vocalizations and bodily movements in terms of rhythm and intensity (e.g., Stern, 1985; Legerstee et al., 2007). The term "primary intersubjectivity" (Trevarthen, 1979), which Jensen mentions, refers to these and other skills that are present very early in development, such as imitation, a capacity to distinguish between inanimate objects and people, and a responsiveness to others' facial expressions. These skills arguably embody a pragmatic form of understanding others (e.g., Gallagher, 2001), also dubbed "participatory sense making" (De Jaegher and Di Paolo, 2007). Although these forms of bodily attunement do not disappear once language is acquired, and may be necessary for language acquisition (including systematicity and compositionality), in infants they seem to be best characterized as *pre*linguistic, as they do not require the capacity to utter words and meaningful sentences. Thus, to characterize them as instances of languaging, where "languaging" is (also) taken to denote "language as an activity," seems misleading. Moreover, to do so may even convey the message that forms of intersubjective bodily attunement are immature forms of sense making, waiting to be fully realized once language is acquired, rather than complete and autonomous stages of development.
Incidentally, I do not think that Jensen believes this is the case, given that at some point he writes that "the contours of languaging, in its most basic form, are definitely grounded in such early intersubjective behaviors" (p. 6); that is, he seems to think that not all instances of bodily intersubjectivity are forms of languaging. But then we are left with the question of what distinguishes the two.

<sup>1</sup>Page numbers refer to the online version of Jensen's article.

I thus agree with Jensen when he acknowledges, at the end of his article, that his approach raises serious conceptual challenges. As *conceptual*, however, they will not be answered by performing "many further studies" (p. 12), but only by clarifying one's theoretical framework and adopting a consistent terminology.

## **ACKNOWLEDGMENT**

Giovanna Colombetti is supported by a grant from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013), ERC grant agreement nr. 240891 (EMOTER).

# **REFERENCES**


Tronick, E., Als, H., and Adamson, L. (1979). "Structure of early face-to-face communicative interactions," in *Before Speech: the Beginning of Interpersonal Communication*, ed M. Bullowa (Cambridge: Cambridge University Press), 349–370.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 September 2014; accepted: 23 October 2014; published online: 07 November 2014.*

*Citation: Colombetti G (2014) Why call bodily sense making "languaging"? Front. Psychol. 5:1286. doi: 10.3389/fpsyg.2014.01286*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Colombetti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Rethinking conformity and imitation: divergence, convergence, and social understanding

# *Bert H. Hodges<sup>1,2</sup>\**

<sup>1</sup> Department of Psychology, Gordon College, Wenham, MA, USA

<sup>2</sup> Department of Psychology, University of Connecticut, Storrs, CT, USA

## *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Vasudevi Reddy, University of Portsmouth, UK

Kathleen H. Corriveau, Boston University, USA

#### *\*Correspondence:*

Bert H. Hodges, Department of Psychology, Gordon College, Wenham, MA 01984, USA; Department of Psychology, University of Connecticut, Storrs, CT 06269, USA e-mail: bert.hodges@gordon.edu; bert.hodges@uconn.edu

Social and developmental psychologists have stressed the pervasiveness and strength of humans' tendencies to conform and to imitate, and social anthropologists have argued that these tendencies are crucial to the formation of cultures. Research from four domains is reviewed and elaborated to show that divergence is also pervasive and potent, and that it is interwoven with convergence in a complex set of dynamics that is often unnoticed or minimized. First, classic research in social conformity is reinterpreted in terms of truth, trust, and social solidarity, revealing that dissent is its most salient feature. Second, recent studies of children's use of testimony to guide action reveal a surprisingly sophisticated balance of trust and prudence, and a concern for truth and charity. Third, new experiments indicate that people diverge from others even under conditions where conformity seems assured. Fourth, current studies of imitation provide strong evidence that children are both selective and faithful in whom they follow, what they follow, and why. All of the evidence reviewed points toward children and adults as being engaged, embodied partners with others, motivated to learn and understand the world, others, and themselves in ways that go beyond goals and rules, prediction and control. Even young children act as if they are in a dialogical relationship with others and the world, rather than acting as if they are solo explorers or blind followers. Overall, the evidence supports the hypothesis that social understanding cannot be reduced to convergence or divergence, but includes ongoing activities that seek greater comprehensiveness and complexity in the ability to act and interact effectively, appropriately, and with integrity.

**Keywords: conformity, dissent, imitation, pragmatics, social learning, trust, truth, understanding**

# **INTRODUCTION**

One of the deepest assumptions of social psychology, and many allied disciplines, is that people have strong tendencies to conform, to imitate, to mimic, and to obey. These tendencies are claimed as the basis for coordination, communication, and culture (e.g., Richerson and Boyd, 2005; Mesoudi, 2009). The available evidence, though, reveals a far more complex and interesting set of dynamics: divergence is as pervasive as convergence, but its appearance is often unnoticed or minimized (Berger and Heath, 2008; Haslam and Reicher, 2012; Reicher et al., 2012; Hodges, in press). The story that needs to be told, however, is not simply that we need to pay more attention to divergence, but that we need to appreciate a larger set of dynamics: social understanding cannot be reduced to convergence or divergence.

Why have researchers and theorists been so slow to recognize the importance of divergence, disagreement, diversity, and dissent in the dynamics of social interaction? There are many reasons, but chief among them are theoretical and methodological biases that focus on what might be called "Cartesian individuals"—isolated individuals, separated from the world and others, thinking about how to achieve egoistic goals (e.g., to be accurate, to belong). From this perspective, others come to be treated as means to individually determined ends, rather than partners who must act together to learn and to care for each other and the larger ecosystems of which they are a part. Cartesian thinkers must either try (1) to infer what other isolated individuals are thinking or (2) to project their own thoughts onto others, and simulate what they might do in their situation. The first possibility is chancy at best; social understanding becomes a guessing game that is prey to the constant worry that one has guessed wrong. The second possibility, which initially inspires more confidence, is based on a crucial assumption that the other is similar to the self. The risk is that the assumption is presumptuous, that it hides real and important differences (Reddy, 2008).

The fascination of social psychologists with conformity and other forms of convergence is a consequence of their having begun their work with assumptions of individualism, independence, and isolation (Shotter, 2001). Given these assumptions, it might seem important to show how people influence each other, form common bonds, and productively pursue common goals. Divergence, in contrast, would be seen as relatively uninteresting, since it is a natural consequence of the independence and isolation of individual thinkers. However, I will argue that social understanding is more about embodied joint activity among people across time and task than it is about one individual generating ideas about another in order to predict and control outcomes. Rather than controlling outcomes, joint activities make participants more vulnerable to others and more dependent on the environment; nevertheless, they also increase the flexibility and integrity of participants' actions and choices.

An array of studies will be described indicating that people act less like Cartesian thinkers and more like social, embodied, dialogical partners, working together to learn how to act in ways that are good for themselves and their ecosystems. The evidence suggests that people are motivated to *understand* situations, others, and themselves, not simply to seek predetermined goals. Social understanding, as it will be addressed in this article, is not about trying to make the other like the self, or the self like the other, but is about jointly exploring a more comprehensive and complex field of action than any of its participants could have predicted or imagined alone.

The evidence to be presented comes primarily from studies in developmental and social psychology, but it has implications for studies in anthropology, language, learning, and many other domains. The four topics that will be addressed are (1) reinterpretations of classic studies in conformity that shift the focus to truth, trust, and social solidarity; (2) recent studies of the role of testimony and trust in children's actions and choices; (3) new experiments showing that people do not always conform, even when it is normatively expected; and (4) studies of imitation showing that children are surprisingly selective and careful in following the lead of others. The evidence suggests that divergence is as newsworthy as convergence, and that there is much yet to learn about how these dynamics interact and play out in social understanding.

## **THREE THEMES AND A HYPOTHESIS**

Before the evidence itself is presented, three themes should be noted—social understanding, embodiment, and intersubjectivity. These are provided by the Research Topic (*Towards an embodied science of intersubjectivity: Widening the scope of social understanding research*) to which this article contributes. What I take those terms to mean will become increasingly clear in the ways that I make use of them, but it may help to sketch briefly their potential before delving into the details of divergence and convergence in social interaction.

## **SOCIAL UNDERSTANDING**

The working hypothesis explored in this paper comes from research applying values-realizing theory to social cognition (Hodges and Geyer, 2006; Hodges et al., 2014), perception–action (e.g., Hodges and Lindhiem, 2006; Hodges, 2007b), language (Hodges, 2007a, 2009), and developmental psychology (Hodges and Baron, 1992; Hodges, in press). Values-realizing theory claims that perception, action, and cognition are motivated by values, including clarity, coherence, comprehensiveness, and complexity (Hodges, 2009). The hypothesis is this: *understanding* is the ongoing activity of seeking comprehensiveness and complexity in our knowing and doing. While the values of clarity and coherence point to the need to differentiate and organize our experiences in meaningful ways, comprehensiveness and complexity pull activity toward larger, differing contexts that lead to continuities and discontinuities with prior experience. In an important sense, understanding enlarges and complicates our views and actions rather than satisfying and simplifying them. More specifically, this hypothesis suggests that social understanding is the ongoing activity of divergence, not just convergence, of opening up new possibilities, not simply closing in on predetermined goals. Different people in different positions at different times interacting on common ground provide the basis for exploration, as well as a surer grasp of "this place and time and our identity in it" than when one person guesses or simulates.

## **EMBODIMENT**

As Wilson and Golonka (2013, p. 1) have suggested, embodied cognition is not the claim that bodies affect minds, but that skilled (mindful) action is a distributed set of physical relationships over time: "brain, body, and environment, coupled together via our perceptual systems." The examples to be considered will illustrate how social understanding involves multiple bodies interacting together, and how actions made by any one body are dependent on the presence, placement, and activity of other bodies over time. Following the lead of another person (or not) is more than guessing or projecting. It is a search to find the integrity of relationships, physical and social; it is also a search for social solidarity and truth. If so, how is that search carried out?

## **INTERSUBJECTIVITY**

As the earlier discussion of Cartesian perspectives on social knowledge suggested, intersubjectivity is usually taken to be the relation among independent, disembodied minds. If, however, it is an embodied social activity that pulls us beyond the common ground on which we stand toward a richer appreciation of the larger environment and the broader community within which we dwell, then we have the beginnings of an alternative approach, what might be called interaction theory (De Jaegher et al., 2010) or dialogical theory (Linell, 2009). That is, the way in which we come to know and understand others and ourselves is through engagement with each other.

Engagement takes the situated community as fundamental. There is no gap between individuals requiring a theory of or a simulation of "other minds," and there is no dispassionate observation of the world from a distance (Reddy, 2008). Rather, humans interact with each other and their environment in a variety of meaningful ways, and in doing so they come to learn what the world is, who others are, and who they themselves are. Perceiving oneself, others, and the world are interwoven activities and may be direct (rather than inferred), but only over time, and in ways that require participants to be committed as embodied, engaged presences (Gallagher, 2008; De Jaegher, 2009; De Jaegher et al., 2010).

A central issue to be addressed in this paper is how the perceptions and actions of others are integrated with one's own actions and perceptions, and how this is constrained by embodied locations and specific social understandings. This integration is necessary for learning and language to occur, and it emerges from the values that make these activities possible (Hodges, 2009). This interactive, dialogical, engaged way of enacting social understanding has been challenging, both theoretically and empirically, for psychologists and other researchers to address adequately.

Nevertheless, even traditional methods of investigation, which focus on particular individuals, have revealed the social, ecological, and dialogical nature of social understanding. Thus, we now turn to how social understanding, embodiment, and intersubjectivity emerge from studies on conformity, imitation, and trust.

# **DISSENTING FOR TRUTH**

One of the most famous studies in social psychology is Asch's (1951, 1956) experimental dilemma in which he had confederates answer clear factual questions about lengths of lines incorrectly some of the time. Having heard the same wrong answer multiple times, the real participant was in an awkward position: he could say what he thought was false, or he could dissent from a unanimous majority. Asch's work is the *locus classicus* for claims about conformity among humans because people agreed with the confederates' wrong answers about 1/3 of the time, far more often than Asch expected. That is an impressive finding, but even more impressive is how often a lone participant told the truth about what he saw in the face of a unanimous consensus to the contrary. Unfortunately, the former finding has attracted virtually all the attention. Despite being the most cited reference to support the power of conformity, Asch's experiments are a powerful testament to divergence (Hodges and Geyer, 2006). Participants disagreed 2/3 of the time with a 100% consensus, and 95% of the time with a consensus over 80% (Asch, 1956).

What prompted this stunning display of dissent? The simple answer is truth-telling: it was when the majority answered incorrectly that participants disagreed a large majority of the time. How did a story of divergence turn into one of convergence? One crucial reason is that the experiments are not framed in terms of pragmatic actions, multiple relationships, and temporal dynamics, but rather in terms of an isolated Cartesian knower guessing and worrying. One explanation of Asch's (1956) results assumes that individuals in the experiment are confused by the misleading information and are unsure what is correct, so they *guess* that it is best to follow the lead of others. A second explanation claims that people realize what the correct answer is, but *worry* that if they disagree with others, they will be ostracized or embarrassed in some way; thus, in order to be liked by others, they agree with their wrong answers (Campbell and Fairey, 1989). Neither of these explanations actually explains the data.

These accounts do not even try to explain all the data, but focus only on incorrect, agreeing answers. This is startling on three counts. First, they take accuracy and dissent to be obvious and psychologically uninteresting. Second, they ignore the diversity of responses to the situation, which ranged from never conforming (26%) to conforming a majority of the time (28%). Third, they completely overlook the most obvious group to describe, the typical participants (the middle 46%), who dissent nine times and agree three times (i.e., the median) on critical trials (Hodges and Geyer, 2006). If Asch's participants were worried about being liked or being correct, why would they have disagreed so often, or agreed so little?

## **ENGAGEMENT, EMBODIMENT, AND UNDERSTANDING**

Hodges and Geyer (2006) proposed a new approach to understanding the Asch dilemma that attempts to address weaknesses of earlier interpretations. First, they suggested that Asch's participants were mostly neither the cowardly conformists that many social psychologists have portrayed, nor the independent truth-tellers Asch was looking for; rather, they were ecologically sensitive, pragmatically astute individuals who were trying to be truthful and cooperative in a complex and awkward situation.

Second, Hodges and Geyer (2006) argued that there are multiple values—truth, trust, and social solidarity—that properly constrain Asch's participants. As Asch (1990) realized, truth matters to people, and he chided his social psychological colleagues for not acknowledging this fundamental fact. On the other hand, he saw the social influence of consensus as a danger (Asch, 1952). Despite Asch's reservations, trusting others and expressing social solidarity are not wrong: without them, the recognition and expression of truth itself would be hampered (Campbell, 1990). Asch's situation is not a simple choice between good and bad, between truth-telling and cowardice, but a delicate task of coordination: how can a participant speak truthfully in a way that honors his/her own view, that respects the views of others, and that answers appropriately to the experimenter?

Third, to pull off this coordination Hodges and Geyer (2006) hypothesized that many participants are engaging others in a nuanced, respectful way, varying their answers over trials rather than being trapped by an all-or-nothing choice. The 9/3 disagree/agree pattern of typical participants indicates clearly and truthfully their dissent from the consensus, yet it also respectfully acknowledges that consensus by repeating it occasionally. Hodges and Geyer (2006) claim that participants make local errors in an attempt to express a larger truth; that is, that they are in an awkward, frustrating situation in which there are tensions among multiple obligations. Although participants appear to be inconsistent, it is more likely that they are working to realize multiple values that are in tension. Almost certainly this is not a conscious strategy, but rather a product of a continuously evolving dynamical system in which prior choices constrain current ones (cf., Thelen et al., 2001).

Fourth, intersubjectivity appears in the Asch (1956) studies in the pragmatics of the quasi-conversation that constitutes the experiment. Hodges and Geyer (2006) suggest that Asch's account is not sufficiently sensitive to participants' multiple obligations and to the conversational nature of the interaction among the real participant, their peers, and the experimenter. If a person always dissented from a group's expressed views (i.e., what Asch hoped would happen), that person could easily be seen as arrogant or dismissive. If, on the other hand, one agrees some of the time with incorrect answers, this functions as a pragmatic signal of one's commitment to taking others' views seriously (i.e., social solidarity) and of one's openness to further engagement (i.e., trust) in the strange situation in which participants find themselves.

Fifth, social solidarity must be maintained if one is to be taken as a serious witness to the truth of matters. If there is a lack of trust between parties, truth telling becomes much more delicate and difficult. Dissent cannot function if it is directed toward people who do not care what others think, or if there is no concern for those to whom the dissent is addressed. Dissent implicitly appeals to some sense of shared concern for truth and other values that provide the common ground for communicative discourse and social interaction.

Sixth, regarding embodiment, there is some evidence that it matters that participants are physically present and confronting each other as well as the experimenter. Attempts that soon followed Asch (1951) tried to isolate the participant in a literal Cartesian room to see how virtual group members (simulated by the experimenter) would create the social pressure that was assumed to produce Asch's results. The Crutchfield (1955) procedure generally yields less agreement with wrong answers (Bond and Smith, 1996). This suggests that the physical–moral presence of others who speak to the participant, and the participant to them, contributes to the nature of the dilemma itself, as well as to the common ground necessary to address it.

To summarize, Asch's (1956) participants were not simply facing an epistemic quandary, about which they might guess or worry. Rather, they were in a social-moral dilemma: what does one say in a frustrating situation when facing two bad choices, either to speak truthfully and forcefully, but in a way that risks being perceived as disrespectful, or to speak with greater tact and humility, but at the risk of denying one's own convictions? Both of these options were chosen, but far more often Asch's participants varied dissents and agreements over time. Thus, dynamics of divergence and convergence were intertwined, revealing an embodied engagement with others that worked to honor truth, while also being sensitive to multiple relationships and multiple obligations. The hope of participants seems to be that if they say what they see, but also take account of others, perhaps together they can learn what kind of situation they are in and what to make of their disagreement. Engagement and dialogical interaction seek social understanding (i.e., a larger, richer appreciation of oneself, others, and the setting), rather than simply predicting or projecting.

# **TRUST AND GUIDANCE**

If there is any place where we expect to find widespread tendencies to follow the lead of others and to conform to observed practices, it is among children. What patterns of convergence and divergence have emerged in studies of social development? Young children—widely believed to be gullible conformists by some, and independent investigators by others—show surprising sophistication in terms of evaluating the worth of others' testimony about events in the world (Kuczynski and Hildebrandt, 1997; Harris, 2012). As is true of adults, children take account of their own perceptual experience of events and possibilities, but they also are guided by the perceptions and actions of others. Their actions and choices suggest they have social understanding, founded in embodied interactivity with others over time.

Recent research indicates that children's epistemic judgments reveal both more vigilance (Sperber et al., 2010) and more trust (Harris, 2012) than developmental psychologists generally have been willing to grant. For example, children trust those who have shown themselves reliable in the past, but they are not indiscriminate in that trust. If the more reliable informant is in a bad position to see the relevant information, children tend to trust a less reliable but better positioned informant (Corriveau and Harris, 2009; Brosseau-Liard and Birch, 2011). They even seem to operate on a principle of charity: they are willing to learn from an informant who had previously been incorrect, if the informant's position had prevented him or her from seeing the relevant information. However, they discount information from someone who previously had been in a good position but was inaccurate (Nurmsoo and Robinson, 2009).

Children tend to choose other children to learn the affordances of novel toys, but they prefer adults as the best sources for names of new objects. In short, they respect the relevance of interactivity: they prefer to use guides more likely to have had relevant experience (e.g., VanderBorght and Jaswal, 2009; Rakoczy et al., 2010; Sobel and Corriveau, 2010; Koenig and Jaswal, 2011). They show a preference for first-hand testimony over second-hand evidence (Einav and Robinson, 2011), and they also show a preference for information that is consensually agreed upon by several adult witnesses, compared to a dissenter's claim (Corriveau et al., 2009). However, if an adult makes a claim that contradicts the child's own direct experience, children tend to question or correct the adult, rather than accepting the adult's mistaken claim (Koenig and Echols, 2003). If multiple adults make false statements (e.g., about the color of a toy), most children state the correct color, but a minority follows the lead of the adults (Clément et al., 2004).

Two recent studies adapted the Asch (1951) dilemma for 3- and 4-year-old children, one with a consensus of peers (Haun and Tomasello, 2011) and one with a consensus of adults (Corriveau and Harris, 2010). The most striking finding was how often children dissented from unanimous majorities: for example, 76% of 4-year-olds and 58% of 3-year-olds answered correctly every time in Corriveau and Harris (2010). In the same study, children increasingly dissented from incorrect majorities over successive answers, and when some clear, relevant good was at stake (e.g., the child could win a prize by picking a bridge of the right length to cross a river in a game), they never agreed with incorrect adults. This is dramatic evidence that children trust their own eyes and are willing to disagree with a consensus of adults who answer incorrectly. On the other hand, they also show sensitivity to social consensus, at least when decisions do not appear particularly consequential.

One feature related to embodiment is noteworthy. Corriveau and Harris used videotaped adults as their majority. Haun and Tomasello (2011) believed that stronger evidence of conformity in children could be found if there was face-to-face contact and if the others involved were age-peers, not adults. They devised a procedure with four children, each looking at a book that was presumably the same for all; however, one child's book differed on selected pages. They found somewhat more conformity than Corriveau and Harris did (about 34%), but otherwise the picture that emerges is almost identical. Haun and Tomasello refer to the willingness to say publicly things that one does not find personally convincing as "strong conformity," and they claim their experiments show that children do this. However, the studies provide far more compelling evidence of children's clarity and conviction. Children, like adults, appear to be truth-tellers who are sensitive both to the information value of others' claims and to the pragmatic complexities of dissent and agreement with others.

Other evidence from developmental studies yields the same pattern: cooperative engagement with others, combined with a strong tendency of children to trust their own perception-action capabilities. For example, when children are deciding whether to step across a gap in their surface of support that is wide and deep enough to make them hesitate, they often look to a parent for clarification, checking whether the parent is smiling, frowning, or looking uncertain. What happens when the child perceives that the gap is crossable, but the parent discourages the action? Individual differences are considerable, but most children take the step, as if they were saying, "Mother knows best, but sometimes I know better" (Feinman, 1992, p. 252). Other studies using multiple sources of information have found that children generally look to knowledgeable sources more than to attractive ones to clarify the situation. Children confronted by an unexplained object in the room look more readily at a stranger who appears confident about the object's meaning than at a more familiar and attractive person (e.g., their mother) who appears puzzled (Feinman, 1992).

Research on children's reactions to parental commands and instructions (Kuczynski and Kochanska, 1990; Kuczynski and Hildebrandt, 1997) indicates that children generally are cooperative, but they also engage in a number of actions that exhibit their own agency (e.g., complaining, arguing, partial compliance). Matas et al. (1978, p. 554) argued that "the competent 2-year-old ... is not the child who automatically complies ... when requested to stop playing and clean up the toys, but who gradually cooperates with the mother." Overall, children care about truth, not just approval, and engage in more dissent than is generally appreciated. Furthermore, their concern for truth and dissent is not so much a denial of their involvement in social relationships, as it is a sign of their commitment to them (Kuczynski and Hildebrandt, 1997). Reddy (1991, p. 144) provides evidence that this paradox of commitment and divergence begins prior to the end of the first year, when children initiate opposition to caretakers' actions and directives in a manner that can only be described as teasing. She observes that teasing is not so much a particular pattern of action but "is an element in a relationship," one that can bring its members closer together.

Overall, the picture that emerges from studies of children's trust in and use of testimony and advice from others suggests a developing sophistication that is surprisingly comprehensive and complex. Children mostly attend to the embodied interactions of others and to how likely those others are to have observed or encountered relevant information. They do not seem to be primarily guessing or projecting, but interacting and acting in ways that are engaged, trusting, and vigilant. Even young children have a remarkably subtle understanding of relationships, timing, location, and how to find integrity. For the most part, children appear to act as dialogical partners, rather than as blind followers or solo explorers.

## **SPEAKING FROM IGNORANCE**

It is often assumed that children are in a position of ignorance, in need of guidance from adults and older children to direct their efforts. As the research just reviewed indicates, children seem to share that conviction, but they also show a surprising confidence in their own abilities to see and know, and considerable flexibility in how they integrate their own perspectives with those of various others. Acting from ignorance, however, is not confined to children. Adults are learners too, and they often find themselves in a position of ignorance with respect to others who know more. Do they trust and follow others' lead, or do they ignore others and follow their own counsel?

Hodges et al. (2014) explored this question by placing people in different positions relative to a screen so that two of them (A and B) could see information clearly, and one (C) could not. Furthermore, participants at C could easily see that A and B were better positioned than they were. They were then asked about information projected on the screen (e.g., superimposed words embedded in patterns). On critical trials, participants at C had no definitive information with which to answer independently (e.g., they could see isolated letters but not the particular word about which they were questioned). However, they heard two other people (A and B) confidently give the correct answer before it was their turn.

Asch was surprised that people ever agreed with others' wrong answers. The Hodges et al. (2014) experiment inverts the Asch situation: here, agreeing with others' answers appears to be the only sensible thing to do. However, Hodges et al. (2014) predicted that participants would surprisingly often violate this expectation, making up their own, incorrect answers rather than repeating the correct answer given by A and B. This disagreeing with correct answers, which they called the speaking-from-ignorance (SFI) effect, occurred about 30% of the time in several experiments. Further evidence indicated that participants were knowingly choosing not to agree with answers they believed were correct.

This result seems quite implausible at first. Unlike the Asch situation, where there is a contradiction between perspectives, there is no contradiction in the SFI situation; thus, it seems there should be no dilemma. However, Hodges et al. (2014) found that participants do experience the situation as a dilemma. The reasons they do can be framed in terms of intersubjective engagement and embodiment. If the SFI situation, like the Asch situation, is seen as a sort of conversation, then pragmatic constraints come into play. Pragmatic cooperativeness usually entails saying neither what you believe to be false nor that for which you lack adequate evidence (Grice, 1975). However, an SFI situation pulls and twists these two aspects of cooperation inside out, creating a frustrating tension. While it is perfectly possible and appropriate to repeat what other, better-informed people have told you (it seems a simple matter of trust), many participants feel it is not quite right. "It feels like it's cheating," is how some expressed it. The embodied location of each participant and the timing of the answers matter, and many participants feel a sense of obligation to be true to their position, as well as to the timing of their answer. Answering last affords them the option of answering correctly with considerable confidence, and about 50% of all participants always do so. However, their embodied position makes this awkward. The SFI effect reveals an understanding of the situation that is truthful and pragmatic: I cannot see from my position, so it is difficult for me to answer correctly and to do so with pragmatic warrant. This understanding of the situation, both in terms of dialogical relationships and in terms of embodied locations, constrains many participants to go beyond immediate tendencies to "be correct" or "be agreeable."

Hodges et al. (2014) propose that the same dynamics at work in the Asch situation are also at work in the SFI situation: truth, social solidarity, and trust. Answering incorrectly, and disagreeing with better-informed others, may seem irrational, but doing so truthfully acknowledges one's ignorance, concretely expressing one's commitment to truthfulness, not simply to being correct. It is also an expression of vulnerability, and it therefore indicates trust in others' ability and willingness to appreciate the awkwardness of one's position and to continue to share their knowledge. Although social solidarity generally leads toward agreement, it goes beyond uniformity and consensus: it encourages each participant in a group to make his or her unique contribution to the integrity and well-being of the group as a whole. Thus, at the level of conversational pragmatics, social solidarity leads each participant to want to make a distinctive contribution to the conversation, rather than blindly repeating what others have said. It is not wrong, of course, to repeat others when one is in a position of ignorance. For example, we generally expect students to repeat what their teachers tell them. However, we also expect students to offer their own answers, even when those answers are awkward or incorrect: an everyday exemplar of an SFI effect.

To test the hypothesis that pragmatic constraints to speak truthfully and with epistemic warrant sometimes lead participants to disagree with correct answers, Hodges et al. (2014, Experiment 3) compared groups, one of which was primed to be particularly sensitive to the demands of honesty. Even though participants could win a monetary prize by answering correctly, participants in the honesty-prime condition chose not to agree with correct answers given by others 49% of the time, compared to 19% in the no-prime condition. Along with findings from other experiments, the results suggest that the observed incorrect, non-agreeing answers were "not a speaking-last effect, a speaking-from-a-different-position effect, a speaking-to-differentiate [oneself from others] effect, or a self-presentation effect (e.g., drawing attention to oneself as unique or creative)" (Hodges et al., 2014, p. 228). Rather, it is a speaking-from-ignorance effect that arises from the dynamics of truth, trust, and social solidarity.

Engagement in the SFI situation requires attending to embodied selves. Participants can see that others are better positioned than they are, yet they do not always agree, because they sense a responsibility to their own physical, social, and moral location in the experimental setup. Answers reflect the layout of the situation as a whole and the interdependence among positions, not simply a choice of one perspective or another. Even when participants gave agreeing answers, which they did most of the time, many exhibited bodily tension while doing so (as informally observed by the author): they lowered their voice as if embarrassed, jiggled their pencil, hesitated, or tried to sound as if they were saying something novel rather than repeating others. Most likely, this tension emerged because they were aware that their position both did and did not warrant their correctness.

To appreciate how social understanding is operative in the SFI effect, one needs to think of social learning at the communal and historical levels. What is necessary for cultures to function effectively in terms of learning and sharing knowledge? Much attention has been paid of late to the importance of agreement, conformity, and faithful replication in the constituting of cultures (Richerson and Boyd, 2005; Mesoudi, 2009). However, there is also a need for innovation, creativity, and the ability to share and elaborate those discoveries. Cultures necessarily embody a tension between sharing common practices (i.e., homogeneity) and the production of new variations (i.e., heterogeneity) from which better tools and skills can emerge (Hodges, in press).

The SFI experiments suggest that it is better if not everyone agrees with expert opinion or the consensus judgment all the time. The general wisdom embodied in this tendency is that it may be better not to follow others blindly, even if they seem to be in the position of the expert. Scientists are often annoyed when others do not follow their lead, but there is good reason for people to be cautious. People know that things are often more complicated than even experts can appreciate, and they know that science itself depends on people willing to challenge the consensus and to propose ideas that may seem crazy or impossible, at least at first. In any event, the SFI effect shows that people's use of others' testimony is not simply a goal-driven, rule-following activity, but engages the dynamic interplay of divergence and convergence to realize values that may be more complex and further afield than answering the next question correctly.

# **SELECTIVE, FAITHFUL IMITATION**

Imitation, "matching the behavior of a model after observing it" (Over and Carpenter, 2012, p. 183), is a kind of conformity, although it is rarely treated as such. The main difference is whether a group or an individual is being imitated. One of the most basic facts of imitation, although often overlooked, is that it is selective: who and what is copied, when and how, are basic questions. Behind these questions is a still deeper one: why does imitation occur?

## **WHAT IS IMITATED AND HOW?**

Despite the intentional character of imitation, Horowitz (2003) has argued that what counts as imitation is vague. She studied chimpanzees and adult humans and found that both tended to copy a complex series of actions only partially. Both noticed easier ways to solve the puzzle she presented, so that even adult humans who explicitly claimed to be imitating exactly failed to do so. A crucial issue is that it is experimenters who decide what counts as relevant to the action to be imitated. Must the imitator use the same hand as the model, use the yellow ball rather than the blue ball, and so on, for the action to count as matching the model? The relevance question is, of course, one of the most challenging in psychology. Deciding what is relevant demands a larger context of history, function, and purpose; it raises the question of why imitation exists and what it does in the larger scheme of things. One of the most active discussions among researchers in this regard is whether imitation is primarily a way of learning from others about the world, or whether its focus is more on developing relationships (i.e., identifying or communicating with the model; Over and Carpenter, 2013).

Tomasello (1999) claimed that children imitate much more faithfully than chimpanzees, and subsequent work has substantiated that children are far more likely to copy causally irrelevant actions performed by an adult model in solving a puzzle (e.g., getting a piece of food) than chimpanzees who choose more efficient means of solving the puzzle (Whiten et al., 2009). While this has led some social anthropologists to refer to children's close copying as *over-imitation* (Lyons et al., 2007), implying that it is excessive or "blinkered" (Whiten et al., 2009, p. 2425), others have taken a far more positive view of the tendency, considering it *faithful* or high fidelity imitation (Nielsen and Blank, 2011; Over and Carpenter, 2012). The latter have seen it as contributing to the human propensity to transmit cultural patterns faithfully, allowing those patterns to spread and survive (Richerson and Boyd, 2005; Nielsen and Tomaselli, 2010).

How are children's imitative actions both selective and faithful? How and why do children sometimes imitate quite precisely and at other times much more selectively? These are central questions now being addressed by researchers, and how they should be answered is a matter of ongoing discussion and debate (Nielsen and Blank, 2011; Over and Carpenter, 2012). I will not try to resolve all the difficulties, but it is interesting that imitation researchers are now appealing to social psychology and its views of conformity and mimicry to argue their cases (e.g., Over and Carpenter, 2013). Perhaps the more complex views of trust and prudence, and of agreement and dissent, discussed earlier can provide fresh perspectives on imitation as well.

The possibility explored in this section is that children's imitative acts are seeking understanding, rather than simply being acts of learning or acts of affiliation. I will argue that imitative actions are both selective and faithful, not one or the other, but that they also go beyond what these two terms suggest. A powerful exemplar of this claim is that children tend to copy the intentional actions of others, but not others' mistakes or their failed attempts. If adult models begin but do not complete an action (e.g., pulling a top off), children tend to complete the action they saw partially done (Meltzoff, 1995; Nielsen, 2009). If they hear a puppet make a mistake in saying a sentence, repeating a word that is unnecessary, they tend to omit the word when they repeat the sentence (Over and Gattis, 2010). Children's perception of agency appears to be crucial: if the action is "modeled" by a machine or an inanimate toy, they imitate its movements more literally and less often. This replication of intention rather than repetition of observed action, which begins in the first year (Nielsen, 2009), indicates that what is being matched is ecological and prospective. It suggests that what motivates imitation is larger than simply learning about things, or simply affiliating with the model who has served as demonstrator.

## **WHEN DOES IMITATION OCCUR?**

A variety of conditions affect how precisely and completely children imitate, and thus how selective or faithful their imitation is. One is the transparency of intentions. If an adult turns on a light switch with her head instead of her hands, children will imitate her action, but only if the adult's hands are empty. If the adult's hands are occupied, then the children imitate turning on the light, but they do it with their hands (Gergely et al., 2002), illustrating both selectivity and faithfulness. More generally, children tend to imitate less faithfully in tasks that have a clear goal (e.g., extracting a prize from a puzzle box): they tend to omit extra motions and actions that do not contribute directly to extracting the prize (Horner and Whiten, 2005; Kenward et al., 2011). However, if the causal mechanisms of the puzzle are opaque, then the model's movements are followed more closely (e.g., Lyons et al., 2007). Thus, a second condition constraining selectivity is the transparency of the goal and of the means of achieving it. A third trend is the increasing faithfulness of replication as children become older. In fact, adults sometimes imitate more completely and accurately than children: with no instructions to imitate, adults imitated more than 5- and 3-year-olds did, and the older children included more causally irrelevant actions than the younger ones did (McGuigan et al., 2011). Fourth, children who are uncertain about how to solve a problem, or who have tried previously and failed at a task, tend to copy a model's actions much more faithfully than children who have not had difficulty (Williamson et al., 2008). Fifth, children who have been primed with social exclusion tend to imitate models more closely (Over and Carpenter, 2009).

Finally, there are two other situations that tend to yield more faithful imitation by children. One is when adults signal that they are intending to teach the child (Brugger et al., 2007; Bonawitz et al., 2011), and the other is when models demonstrate competence rather than ineptness (DiYanni and Kelemen, 2008). If children see an adult demonstrate a puzzle solution several times, they tend to imitate the demonstrator's actions, even if those actions do not appear to be necessary, but only when that particular demonstrator is present (Nielsen and Blank, 2011). This tendency of the child to take into account a demonstrator's particular way of achieving an outcome, rather than simply taking the shortest, most direct route to an outcome, is one that Nielsen and Blank argue is important for the development of cultural groups, including their diversity and richness. Nielsen and Tomaselli (2010) suggest that this tendency to attend to particular cultural ways of doing tasks appears in all kinds of cultures, and leads children (and later, adults) to engage in actions that may interfere with what they as individuals might desire or believe. They claim that this tendency to follow others' lead is neither blind nor maladaptive. Rather, it is a mark of humans' tendency to trust others to alert them to complexities of physical causality that are not easily observed, and it also helps to enhance social solidarity with other members of their culture.

These last two factors affecting faithfulness lead us to consider more carefully who the model is in relation to the child. Are some models imitated more than others?

## **WHO IS IMITATED?**

Human infants and children tend to choose as models those who have imitated them (Over et al., 2013), who are warm and friendly (Nielsen, 2006), who have acted reliably in the past (Clément et al., 2004), and who are ingroup members (i.e., who use the child's native language rather than another; Kinzler et al., 2011). Perhaps the broadest pattern that emerges is that embodied engagement and dialogical interactivity lead to greater imitation. Children are more likely to imitate faithfully if there is intersubjective engagement of adult and child prior to the demonstration that will serve as the test of imitation. Imitation is increased if the adult plays with the child prior to the test or talks with them, and if the child is particularly tuned to interacting with others (Nielsen, 2006; Brugger et al., 2007; Hillbrink et al., 2013). In fact, one way a child and an adult can interact with each other is by imitating together (Nielsen et al., 2013).

The increased imitation is tied to the specific individual who has engaged and interacted with the child previously or in the larger context in which the imitation task per se is embedded (Yu and Kushnir, 2014). Embodiment, as well as specificity, matters: imitation occurs markedly less when videotaped demonstrators are presented rather than live demonstrators (McGuigan et al., 2007; Nielsen et al., 2008). Thus, intersubjective engagement seems to encourage imitative behavior, and it is not due to some general increase in arousal, attention, or receptivity. The engagement is dialogical, concerted, and embodied: children imitate *with* others, not simply as a response to an action or a movement, but as a dialogical activity with a particular other person with whom they are engaged socially and physically.

The large-scale picture that emerges from these studies is that children do not simply converge with those whom they observe, nor do they diverge as if alienated. Children have a natural affinity for convergence, but not with just anyone, or anything, or under any circumstance. They seem to be attuned to others who care about them, and to situations in which there is something to learn and something to share.

## **WHY DOES IMITATION OCCUR?**

One possibility, still widely taken for granted, is that imitation in infants and young children is some hard-wired tendency to repeat what they observe, and should not be taken as intentional action (Lyons et al., 2007). All the evidence reviewed above suggests otherwise. Imitation is far too selective and varies too much in its fidelity to be some form of automatic motor mimicry (if such a thing exists at all). Over and Carpenter (2012) proposed that imitation is motivated in three ways. First, children are motivated to learn about the world, and to use others to do so. Second, they are also motivated to identify with the person being imitated and the larger social activities they embody. Third, children are sensitive to social pressures that encourage particular ways of doing things. It is the latter two motivations, they propose, that encourage more specific, detailed, and complete copying. Finally, they claim that no existing theory of imitation does a good job of accounting for existing evidence along these three motivational axes.

The challenges to imitation researchers go even deeper, though, than Over and Carpenter's (2012) critique. Consider, for example, two recent experiments. Buttelmann et al. (2013), as well as Yu and Kushnir (2014), find a substantial number of children, sometimes a majority, who do not choose to follow either of two models that are presented, or who engage in an action other than the two options in which the experimenter was interested. For example, 14-month-old children watch a model, who has previously spoken either German (the child's language) or Russian to them, turn a light on with his head. There is more imitation of the German speaker's action, but an even more interesting finding is that a majority of children do not imitate either speaker, but turn the light on in their own way, usually with their hands. When presented with a model who chose one of two objects and acted pleased with his choice, children later showed no preference for the model's choice in making their own choice. These results seem similar to the frequent finding in social anthropology and psychology that people tend to trust their own judgments and experience (Eriksson and Coultas, 2009; Eriksson and Strimling, 2009; Hodges, in press), and do not follow the lead of others too readily. The irony is that procedures and choices of just the sort these two studies consider are assumed to be the most vulnerable to conformity effects.

## **SEEKING UNDERSTANDING IN IMITATION**

Bråten (2000), Nagy (2006), and Reddy (2008) outline a larger context for understanding imitation, suggesting that it is a primitive dialog, not simply a conduit for passing on expertise, as cultural anthropologists often treat it. Infants initiate actions in an apparent attempt to provoke parents into reacting. These provocations are marked by heart-rate deceleration (symptomatic of anticipation), unlike imitative responses, which show heart-rate acceleration (Nagy and Molnar, 2004). Child and adult each see the other as caring what the other does, and as being open to what the other has to offer. Infants are sensitive to whether others are looking at them or away, and prefer direct visual engagement (Farroni et al., 2002; Rigato et al., 2011). Adult and child have to sense an openness and obligation to each other that is emotional, that indicates "I take you the way you are," and that anticipates what the other might do next (Bråten, 2009). It is this promise of learning together that encourages people to conform to parents, teachers, and colleagues, as well as to challenge and test them in a dialog that appears to begin even before children can speak (Meltzoff and Williamson, 2010).

There is a newfound appreciation among imitation researchers for its social nature (e.g., Over and Carpenter, 2013). However, it appears that they have slipped into the same sorts of dichotomies that befuddle standard explanations of conformity in social psychology (Hodges et al., 2014). One explanation given for faithful imitation in children is that they are predisposed to see any purposeful action by adults as causally relevant (Lyons et al., 2007; Whiten et al., 2009). Another explanation is that faithful imitation arises from children's increasing sensitivity to cultural norms and their desire to learn the socially approved way to do things (Kenward, 2012). The former is similar to what Deutsch and Gerard (1955) called informational influence (i.e., we conform in order to be correct), and the latter is similar to normative influence (i.e., we conform in order to belong and be liked). However, as was true in the case of conformity, this dichotomy cannot capture the subtlety and sophistication of children's selective faithfulness.

It appears that children are trying to be faithful to more than norms or causes. Bannard et al. (2013) claim that children act in ways that are precocious, as if they can do more and know more than they are able to achieve and complete. Perhaps, imitation by children is not simply about copying what exists, but more about trying to explore what is promising in the actions of other people and in the events of their environment. The intentional activities of children appear to be exploring something larger and more complex than physical causality and social identity. Hillbrink et al. (2013) suggest that we should look at imitation, not just as an instrumental act, but also as a communicative act that involves reflection on the significance and values of others. If, however, values are not personal and social preferences, but are rather the "global constraints on an ecosystem" (Hodges and Baron, 1992; Hodges, 2007a, 2009), it may be that imitation is a precocious search for the integrity of those ecosystems as a whole.

## **CONCLUSION: UNDERSTANDING, DIALOG, AND SURPRISE**

All of the phenomena explored in this paper yield the same fascinating and deep pattern: the Asch effect, social referencing effects (i.e., children seeking and using information from others), the speaking-from-ignorance effect, and imitation effects. People seek and respond in ways that show their propensity for truthful information, for effective action, for social appropriateness, and for trust and prudence.

The evidence from all these domains suggests that people, including children, participate in social–physical encounters as engaged partners, intending to learn about the world, about others, and about themselves in a way that allows them to act appropriately and effectively. In these encounters people pay close attention to the embodied location of themselves and others in judging the worth of testimony by others and in deciding what they themselves should say and do. Furthermore, they show considerable sensitivity to historical patterns: people who have indicated their interest and care previously, and who have provided accurate and useful guidance in the past, are accorded greater deference than those who have been less caring and accurate, or are unknown. Overall, adults and children show considerable sophistication in their ability to integrate information from a variety of sources over time in ways that are appropriate to their immediate physical and social well-being, but that also give promise of their being able to continue to learn about their social and physical locations and obligations.

The larger picture that emerges is that people are less concerned with predicting and controlling than with understanding the world, others, themselves, and how they all fit together. The evidence that has been reviewed suggests that people's actions reveal that they are seeking something more comprehensive and complex than most theories of conformity and imitation can countenance. Much of the burden of this article has been to show that divergence is far more common and powerful than social and developmental psychologists have acknowledged. When truth is on the line, adults and children defend it against majorities and models that would lead them astray. Nevertheless, across all these domains children and adults show themselves to be sensitive to the worth of others' perspectives and the need to acknowledge that worth. In the sharpest dilemmas, the diversity of action and choice is considerable, but it indicates that people generally work to find some accommodation that maintains social, physical, and moral integrity.

Finally, children and adults rarely act as if they are Cartesian thinkers, trying to figure out the world on their own. They show ample evidence of being guided by others, but they show a limited appetite for following others blindly or completely. Rather than being independent learners or conformist imitators, they act selectively and prudently to be faithful to the world, to their own perceptions and actions in it, as well as to the perceptions and actions of others. They seem to be looking for a larger, richer understanding that holds these together.

This search can be characterized as a dialog, a conversation among self, others, and the world. Theory and research on conversations and dialog have tended to emphasize alignment: speakers converge on vocabulary, pronunciation, syntax, and many other aspects of language as they talk together. This has led to claims that alignment is necessary to be able to predict what others will say and to control one's own replies (Pickering and Garrod, 2013). This is the same impulse that has allowed psychologists to minimize divergence and selectivity in conformity and imitation. Fusaroli et al. (2012) observed that people who are conversing engage in selective alignment; in fact, they noted that indiscriminate alignment undermined effective performance on the task. Although it is rarely noted, speakers diverge as much as converge when it comes to what they say and how they say it, varying on virtually every dimension measured by linguists (Strigul, 2009). Perhaps the most profound fact about dialog is that "it is the things that we cannot predict that are the most important parts of conversation. Otherwise, it is hard to see why we should speak at all" (Howes et al., 2013, p. 359). It is the larger, richer dialog of convergence and divergence that is needed for language, learning, and life to continue. Perhaps what is most needed for researchers and theorists is to be surprised once again by the dynamics of this dialog.

## **ACKNOWLEDGMENTS**

Portions of this project were supported by an Initiative Grant from Gordon College. The author is grateful to Jerry Burger for encouragement to begin this project, and for helpful comments by Katharine Adamyk, Kelly Burton, Ben Meagher, Zsolt Palatinus, and Colwyn Trevarthen.

## **REFERENCES**


of Everyday Mindreading, ed. A. Whiten (Cambridge, MA: Basil Blackwell), 143–158.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; paper pending published: 04 June 2014; accepted: 23 June 2014; published online: 08 July 2014.*

*Citation: Hodges BH (2014) Rethinking conformity and imitation: divergence, convergence, and social understanding. Front. Psychol. 5:726. doi: 10.3389/fpsyg.2014.00726 This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Hodges. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Social interaction, languaging and the operational conditions for the emergence of observing

## *Vincenzo Raimondi\**

Linguistique Anthropologique et Sociolinguistique – Institut Marcel Mauss, École des Hautes Études en Sciences Sociales, Paris, France

#### *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

John Joseph McGraw, TESIS Network, Denmark

Talbot J. Taylor, College of William and Mary, USA

#### *\*Correspondence:*

Vincenzo Raimondi, Linguistique Anthropologique et Sociolinguistique – Institut Marcel Mauss, École des Hautes Études en Sciences Sociales, 190-198 Avenue de France, 75013 Paris, France e-mail: vincenzoraimondi@hotmail.com

In order to adequately understand the foundations of human social interaction, we need to provide an explanation of our specific mode of living based on linguistic activity and the cultural practices with which it is interwoven. To this end, we need to make explicit the constitutive conditions for the emergence of the phenomena which relate to language and joint activity starting from their operational-relational matrix. The approach presented here exposes the inadequacy of mentalist models to explain the relation between language and interaction. Recent empirical studies concerning joint attention and language acquisition have led scholars such as Tomasello et al. (2005) to postulate the existence of a universal human "sociocognitive infrastructure" that drives joint social activities and is biologically inherited. This infrastructure would include the skill of precocious intention-reading, and is meant to explain human linguistic development and cultural learning. However, the cognitivist and functionalist assumptions on which this model relies have resulted in controversial hypotheses (i.e., intention-reading as the ontogenetic precursor of language) which take a contentious conception of mind and language for granted. By challenging this model, I will show that we should instead turn towards a constitutive explanation of language within a "bio-logical" understanding of interactivity. This is possible only by abandoning the cognitivist conception of organism and traditional views of language. An epistemological shift must therefore be proposed, based on embodied, enactive and distributed approaches, and on Maturana's work in particular. The notions of languaging and observing that will be discussed in this article will allow for a bio-logically grounded, theoretically parsimonious alternative to mentalist and spectatorial approaches, and will guide us towards a wider understanding of our sociocultural mode of living.

**Keywords: social interaction, recursive consensual coordination, languaging, observing, bio-logical approach, Maturana, Tomasello, intention-reading**

## **SOCIAL COGNITION AND LANGUAGE**

Over the last decades, "social cognition" has become the object of intense interdisciplinary research. Many theoretical and empirical efforts have been dedicated to understanding the specific conditions on which human interaction and the ontogenetic development of our socio-interactional skills rely. In this context, explaining how individuals involved in interaction solve the "problem of other minds" in order to conduct effective coordination stands out as a major challenge for many scholars. However, a debate has flourished concerning the validity of supposing some kind of "mindreading" to account for social interaction. Whereas the cognitivist accounts view this as a crucial issue (e.g., Frith, 2008) and propose several models to resolve it, the embodied and enactive approaches consider representational and spectatorial explanations of human interactivity to be inadequate. According to the latter, social engagement with others does not fundamentally constitute a cognitive problem to be solved through the mutual detection of mental states by the interacting individuals; rather, it is the result of embodied, ecologically embedded, intersubjective dynamics (De Jaegher and Di Paolo, 2007; Gallagher, 2008a,b; Hutto, 2009; De Jaegher et al., 2010; Di Paolo and De Jaegher, 2012).

Consistent with non-mentalist approaches to interaction, I would like to direct our attention to how the explanation of linguistic activity can broaden our understanding of human interaction and sociality. Up to the present, theories in the cross-disciplinary domain of social cognition have not privileged the investigation of the linguistic phenomenon, or have taken traditional views of language for granted. A partial exception to this is Tomasello's influential research conducted on joint activity, leading to the author's hypothesis of a *functional* relation linking intention-reading to language, and language acquisition in particular. However, this hypothesis is questionable, as is Tomasello's conception of language.

A major obstacle for understanding the *constitutive* relation that links language to social interaction is the fact that the linguistic phenomenon is still frequently conceived in inadequate terms. Here I will propose an alternative explanation of both language and social interaction using a different epistemological framework. To this end, I will first draw on Tomasello's model to discuss the limits of cognitivist approaches, including those that are more "socioculturally oriented." I will subsequently show how these limits can be overcome.

Building on developmental and comparative research, Tomasello et al. (2005) offer an interdisciplinary approach in order to explain language and culture by tracing them to the foundational conditions of social engagement and joint activity (e.g., Carpenter et al., 1998; Tomasello, 1999, 2003). According to Tomasello, both human collaborative activities and communication – conceived as a special activity based on the utilization of "linguistic symbols" as cultural artifacts – are possible thanks to our prosocial dispositions and certain unique cognitive skills. Modified over the years, the most recent version of this theory downplays the simulationist positions previously held by Tomasello (1999) and postulates that a species-specific *sociocognitive infrastructure* provides humans with the capacity for "shared intentionality"<sup>1</sup> (Tomasello et al., 2005; Tomasello, 2008). Along these lines, Tomasello puts forth the theory of a universally inherited infrastructure which would include skills for imitative learning and role-reversal, a disposition for cooperation and the uniquely human skill of recursive intention-reading, allowing us to understand communicative intentions cooperatively. In language sciences, similar arguments have been proposed by Levinson, among others, in his hypothesis of an innate and universal "interaction engine" (Levinson, 2006a,b).

Supported by a host of experiments, Tomasello's theory is supposed to account for, among other things, the ontogenetic emergence of "joint attention" in infants' early interactions. Beginning around nine months of age, infants start to jointly attend to objects with others in interactive settings, following the other's gaze (Scaife and Bruner, 1975; Bruner, 1977), and starting to respond to and initiate pointing gestures (Bates et al., 1975). While the explanation of the emergence of such "triadic" interactions is the object of fierce debate (see, e.g., Eilan et al., 2005; Seemann, 2011), Tomasello, in agreement with Bruner's (1995) conception of just such a developmental step as the first "meeting of minds," argues that the emergence of joint attention reveals the development of intention-reading skills, permitting the child to "know together" with his caregivers that they are attending to the same thing (Tomasello, 2008). This is supposedly the first step in the subsequent development of full-fledged mindreading (Lohmann et al., 2005; Tomasello et al., 2005).

What then is the impact of this hypothesis on our understanding of language? Tomasello argues that not only could the hypothesis of a sociocognitive infrastructure explain language acquisition, it could also offer important insights for comparative research as well as phylogenetic investigation into the origins of language. The crucial point here is that the conventionalized symbolic system which we use to coordinate with each other in joint activities, or "linguistic code" as it is labeled by Tomasello, "(...) rests on a nonlinguistic infrastructure of intentional understanding and common conceptual ground, which is in fact logically primary" (Tomasello, 2008: 58). By discovering the communicative intentions of others, the child ontogenetically acquires skills for communication, typically by first understanding and initiating activities based on joint attention (for example, by pointing at objects in order to request them), and then by appropriating intention-based expressions addressed to him by adults. In this manner, precocious intention-reading gradually allows the child to grasp the meaning and function of conventional symbols, which can then be mapped into usage-patterns (Tomasello, 2003). In other words, Tomasello's model supposes that shared understanding of goals and recursive intention-reading are already in place when children begin to speak. According to this model, the sociocognitive infrastructure is a prerequisite for language acquisition and is, in fact, its developmental precursor. In line with this, Tomasello recommends that studies on the phylogenetic origins of both language and cultural life should include an inquiry into the evolution of this sociocognitive infrastructure as a necessary preadaptation for the emergence of language and culture. Moreover, he argues that qualitative differences between contemporary primates with regard to social engagement and symbolic communication would be explained by the hypothesis that non-human primates lack just such a species-specific skill enabling the detection of communicative intentions in a cooperative goal.

It is beyond the scope of this paper to offer an exhaustive analysis of Tomasello's theory, so I will not be able to address all of its important insights concerning cooperation and human sociality (e.g., Tomasello, 2009, 2011). I will restrict myself to discussing the explanation provided for interaction and language through the notion of intention-reading, in order to present a non-mentalist approach to the same questions.

*Prima facie*, looking to social interaction and joint activity in order to seek out the *raison d'être* of language may not seem problematic in itself; quite the contrary. As opposed to formalist and nativist views of language, the conception of linguistic phenomena as inherently social and activity-grounded can be linked to several long-standing positions held both in linguistics and philosophy. Undoubtedly, any theorization about the precise conditions necessary for language to emerge within interactional real-time dynamics – which is admittedly one of the principal aims of Tomasello's work – is a precious contribution.

However, when it comes to the hypothesis provided, Tomasello's model remains highly contentious. First of all, Tomasello's position has garnered criticism concerning the postulated precocious emergence of intention-reading, as well as the complex meta-representations and recursivity it would entail (Griffin and Dennett, 2008; Moore and Barresi, 2010; Reboul, 2010). Another controversial issue concerns the idea that a communicative intention could be understood independently from the precise linguistic forms that express it; by definition, one cannot come without the other (for a similar argument, see Taylor and Shanker, 2003). Tomasello actually argues in favor of a causal relation between a communicative intention and its linguistic form, in that the grasping of the former leads to the subsequent appropriation of the latter. However, although Tomasello claims to draw on philosophy of language for such notions as "non-natural meaning" (Grice, 1989) and communicative intention, it should be observed that the theories to which he refers do not imply the "developmental claim that an understanding of intentions comes before communication" (Racine, 2011: 33). In addition to this, and more importantly, Tomasello offers no operational explanation for the emergence of any mechanism of intention-reading; it is merely assumed to exist, as though it were an "X-ray perception" of intentions (Cowley, 2004). For this reason, I contend that this mechanism is not at all operationally grounded. The emergence of such a functional skill remains *unexplained*, although seemingly *justified* by its putative function in bio-logical heritage as a sort of cognitive leap separating humans from other primates (Raimondi, 2013). Based on our knowledge of living beings, what operational foundation would allow the assumption that a human organism could develop such a mechanism by the age of nine months? One of the main limits of the hypothesis is that an intention-reading mechanism ought to be explained starting from its own conditions of possibility. However, as soon as we try to show its emergence, we become aware that precocious intention-reading is neither operationally possible nor necessary.

<sup>1</sup>According to Tomasello, "shared intentionality," presented as the mutual acknowledgment of joint commitment and joint intentions between interacting individuals, is a necessary condition for the realization of human practices, since they all supposedly involve "sharing of psychological states" in a cooperative goal.

While Tomasello rejects the existence of a Chomskian linguistic faculty, he proposes a sociocognitive infrastructure based on a similar conception of organism and ontogenetic development. Ultimately, Tomasello's model relies on highly questionable assumptions about the status of language as a symbolic conventional tool and the role of mind in the explanation of interaction. The hypothesis of intention-reading as a precursor to linguistic learning is therefore dependent on a controversial epistemological background.

I would therefore suggest a shift in focus to address the issue of the constitutive relation between interaction, joint activity and language on radically different epistemological bases. On the one hand, I will challenge Tomasello's conception of mind, interaction and language. On the other hand, I will propose alternative theoretical arguments to show that language and human interaction are not functionally but constitutively related as they take place in the same operational-relational matrix. This means that we need to show how individuals, through the operation of mutual coupling, generate the interindividual domain to which linguistic and interactional phenomena should be traced in order for them to be explained. By the same token, it will become possible to understand why we cannot consider such phenomena to be the product of any faculty or property of the mind, precluding any mentalist explanation to account for their generation.

# **INTERACTION, SEE UNDER MIND**

Along with other scholars (De Jaegher and Di Paolo, 2007; Gallagher, 2008a,b; Leudar and Costall, 2009; De Jaegher et al., 2010; Di Paolo and De Jaegher, 2012), I argue that cognitivist approaches are inadequate to provide an explanation of social interaction. I discuss some of the issues related to such approaches by drawing on Tomasello's model. After all, the sociocultural approach which Tomasello seeks to provide does not prevent him from relying on a conception of "mind" that, however "socially oriented," remains committed to some traditional cognitivist assumptions about mind and behavior. Epistemologically, this model endorses mentalist and folk-psychological views of organism as well as a spectatorial conception of interaction.

Mentalist assumptions include the idea that all phenomena related to the individual's interactions with his environment could be explained by the presence of a mental mechanism which would be functionally responsible for the generation of said phenomena (in the present case, Tomasello's recursive intention-reading is such a mechanism). This supposes a hierarchical organization inherent to the organism whereby phenomena belonging to the behavioral level arise *as specified* by processes taking place at another level, whether the latter be mental mechanisms or the neurobiological implementation of these mechanisms. Cognitive mechanisms are therefore assumed to be endowed with causal powers in the generation of behavior. Accordingly, they determine the adaptive competence of the organism that interacts with its medium. Such a hierarchical relation between mind and behavior is thus viewed as fundamental. This is consistent with the representationalist conception of cognition as an internal process that generates a representation of the environment in order to produce an adequate response to it. Within this tradition of thinking, since subpersonal operations supposedly explain the organism's "know-how," mentalist explanations seem to be a suitable way to account for interactional phenomena.

By folk-psychological characterizations of mind, I refer to the pervasive idea that intentions and other mental states, normally ascribed to agents in daily life, are entities that exist on a more fundamental level than the behaving agents themselves. For example, Tomasello et al. (2005) endorse a mentalist and folk-psychological view of cognition in assuming that intentions and goals drive the genesis of behavior that is adaptive to the sociocultural niche. From this perspective, "intention" is actually conceived as an "internal entity that guides the person's behavior" (Tomasello et al., 2005: 676).

Mentalist and folk-psychological views of cognition are intimately connected to an intellectualist postulate which assigns a spectatorial position to interacting individuals. According to such a view, these interacting individuals are constantly faced with the problem of mutually detecting and predicting the mental states underlying the other's behavior. Because of this assumption, Tomasello argues that shared intentionality, as the foundation of joint activity and communication, can only be achieved through special skills allowing the comprehension of others' cooperative intentions. The spectatorial view implies that the agent needs to represent others' minds in order to achieve intersubjectivity with them. Since intentions are supposedly internal entities that cause behavior, a child is immediately faced with the problem of making sense of the behavior of adults. Before he can grasp intentions, "(...) from the infant's point of view the adult is just making noise (for whatever reason)" (Tomasello, 2003: 23). Therefore, bridging the *self/other gap* requires an *ad hoc* infrastructure. However, this functionalist explanation relies on the creation of a mechanism tailored to a problem that the analyst himself has posited as such.

By drawing on Tomasello's model, I have briefly illustrated some of the epistemological reasons why many studies of social cognition consider human beings to be spectators of others' behavior, and focus on individual mechanisms in order to explain how we act together and understand each other in interactive settings. However, I contend that these assumptions are based on an inadequate conception of organism, and that cognitivist heuristics unavoidably lead to a one-dimensional, individually-grounded notion of interaction. It should be remarked that the conflation of interactional and individual in the cognitivist approach causes us to lose sight of the interactional as a distinct domain.

# **THE EPISTEMOLOGICAL BACKGROUND FOR A BIO-LOGICAL EXPLANATION OF INTERACTION**

As an alternative epistemological paradigm, I will rely on Maturana's "Biology of cognition" (Maturana, 1978, 1988, 2002), and on some assumptions shared by embodied and enactive approaches. In the interest of brevity I will only highlight certain aspects of Maturana's theoretical contribution and I will assume that most of its core features (e.g., autopoietic organization, structural determinism, the nervous system's operational closure, etc.) are already familiar to the reader, as well as its similarities and differences with regard to the enactive and embodied approaches.

What I define hereafter as a "bio-logical approach" is based on just such a non-reductionist epistemological framework. In a nutshell, taking a bio-logical stance to account for interaction means seeking out the *conditions of possibility* for all phenomena related to interacting individuals by drawing on our understanding of living beings. To this end, we need to make explicit the systemic conditions under which social interaction exists, clarify its relation with the constitution of living beings, and provide it with a generative explanation. By "generative explanation," I mean an explanation that first traces the phenomena requiring explanation to the existential domain where they belong, and then proposes a mechanism that generates the *explanandum*. In this case, the phenomena to be explained are social interaction and language.

The bio-logical approach challenges the traditional cognitivist view of living beings. Whereas the latter takes for granted a hierarchical organization (wherein the neurobiological level determines and controls the behavioral level, as we have seen above), the former posits two non-hierarchically related domains: on one hand, the domain of the living being's structural components, and on the other, the domain in which the living being exists as an organism. Like every system, a living being exists as such in two co-occurrent domains: one in which it can be seen as an organism operating *as a whole* in interaction with its medium; and one in which it exists as a *composite entity* which can be deconstructed in order to observe its molecular and supramolecular components, its internal dynamics, and its structural changes. As Maturana argues, these two domains "do not intersect": they constitute two radically different domains of phenomena that cannot be reduced to each other. Consequently, any attempt to explain the phenomena of one domain in terms of the other is inadequate. There is, however, a dynamic generative relation between them arising from the structural changes that the living being and its medium trigger in each other during the course of their "structural coupling" (see, e.g., Maturana et al., 1995).

Let us examine what adopting this view implies. On one hand, neurobiological processes belong to the domain of structural components. On the other hand, the apparent and non-apparent *dimensions* of the *relational operation* of the living being with its medium, such as behavior, mind, and emotions, constitute the "operational sphere" of the organism as a whole, and cannot be traced to the domain of components. Although the structural dynamics that take place in the domain of the components participate in the systemic process, these dimensions pertain to the organism as a whole and denote classes of phenomena that take place in the operational domain in which the individual exists as such. Strictly speaking, such dimensions are determined neither by the system's structure (the "inside") nor by the medium's structure (the "outside"), but are dependent on the dynamic interplay between the two. However, this co-modulation is constrained by the structures of both the organism and the medium. The result of this structurally determined dynamic is the generation of the operational-relational matrix in which the organism exists at every moment in the course of its living as a spontaneous outcome of both a phylogenetic and ontogenetic history. The *organism's existential domain* is therefore *inherently operational and relational*.

Several conclusions can be drawn from this approach. First, it prevents us from assuming a neurocentric conception of cognition. Cognition concerns the organism as a whole, not its components. Maturana and Varela (1980, 1992) have shown that the neural network operates as a closed system and does not have inputs and outputs, properly speaking. For that reason, the nervous system does not and cannot pick up information from the environment in order to compute a representation of it, nor can it specify the phenomena taking place in the domain of the organism as a whole. The role and the adaptive character of neurobiological processes in the generation of the organism-as-a-whole's relational operation are to be understood as part of a systemic, dynamic process that involves both the operations of the organism and the medium (see, e.g., Maturana, 2000). This dynamic triggers structural changes in both the living being and its medium in such a manner that they cannot be anything but congruent to each other until the living being dies.

Second, this approach prevents us from accepting mentalist explanations. Unlike the traditional cognitivist position, the bio-logical framework allows the relation between different dimensions of the individual's operational sphere, such as those of behavior and mind, to be understood in terms of systemic solidarity; that is to say, one dimension does not specify the features of another, nor do the different dimensions "exert a control" over each other. In other words, within the organism's operational sphere, no dimension is to be considered as more fundamental than the others. However, the multidimensional architecture of the organism's operational sphere and its constitutive systemic dynamics allow us, as observers, to establish correlations between its different dimensions. As a matter of fact, if behavior, mind and emotion are different yet interdependent dimensions of the organism's operational sphere, they could be conceived as Borromean rings, simultaneously distinct and interlocking.

Finally, since the mind is a dimension of the operation of the organism as a whole (and therefore does not coincide with neurobiological processes), and since the nervous system cannot be said to determine the generation of the organism's operation, no linear causal power concerning the generation of behavior can properly be assigned to brain or mind, as is the case in mentalist approaches. Furthermore, intentions and goals belong to our description of the organism's operational sphere in relation to its medium, and not to neurobiological processes. At the same time, it is clear that rejecting the Cartesian conception of mind does not imply that one subscribes to any kind of eliminativism or physicalism. Rather, it suggests that, as Keijzer (2001: 33) argues, "mind applies at a personal level and does not provide a conceptual framework which specifies how subpersonal processes operate to bring a person's behavioral capacities into being." To this we can add that operational-relational capacities are brought into being not by neurobiological processes alone, but by the dynamic interplay between these processes and the medium.

By understanding that the organism's existential domain should be regarded as inherently operational and relational, it becomes possible to see all phenomena related to an organism's relational operation as belonging to the domain of its realization as a whole. Social interaction, joint activities and language are not explainable as products of neurobiological dynamics or other inner mechanisms, since they take place in the relational domain. Thus, their emergence and specific features can only legitimately be explained with reference to the human operational–relational matrix.

## **THE DOMAIN OF INTERACTION AND COORDINATION**

Based on the bio-logics of living beings, what are the conditions through which human social interaction emerges and how are these conditions linked to language? Concerning interaction, I would like to emphasize that the bio-logical approach allows us to shift from an explanation of interaction centered on individuals to an explanation of interaction within its own domain as such. In focusing on the relational domain of interaction, we are aware that although this domain is brought forth through the operation of two or more organisms conserving their independent identities, it possesses its own organization. This approach radically challenges the individualist understanding of interactivity, and puts the interactional process at the heart of the present inquiry.

Let us begin by developing an explanation of interaction that draws on the bio-logical standpoint. As seen before, the organism as a whole is structurally coupled to its medium, and the mutually adaptive relation between the two is an existential condition that results from a specific ontogenetic and phylogenetic history. Most importantly, the organism as a whole exists precisely through the *relational operation* of coupling. The relational operation is thus not episodic – rather it is brought forth by an ongoing, necessarily continuous dynamic. Interaction between organisms can therefore be better understood as a spontaneous and inevitable consequence of structural coupling; that is to say, as a recurrent event in the ontogenetic history of living beings<sup>2</sup>. It follows that our understanding of interaction is logically subordinated to our understanding of the constitutive conditions of structural coupling. In accordance with Maturana and Varela, we can say that interaction is subordinated to the conservation of the invariant conditions of living: the *autopoietic organization* of the living being (which takes place in the domain of components) and the organism's *relation of adaptation* to its medium (which takes place in the domain of the organism as a whole). Consequently, we do not need to provide any justification for the fact that interactions happen all the time throughout the biosphere, nor for the effectiveness of these interactions. What is needed instead is to identify the conditions that generate different interactional phenomena among different species in general, and joint activity amongst human beings in particular.

It is clear that from a non-representationalist point of view, interaction can often be analyzed as a bi-directional, co-regulated dynamic of coordination, as shown by theorists of both dynamical systems and enactive approaches (e.g., Fogel, 1993a,b; De Jaegher and Di Paolo, 2007; Fogel and Garvey, 2007). In line with Maturana's definition, I argue that we can speak of *consensual coordination* when:


Thus defined, consensual coordination is similar to the ethological notion of "ontogenetic ritualization," which is frequently observed in several species and in non-human primates in particular (see Tomasello, 1999). By emphasizing the *consensual* character of this coordination I highlight two key aspects: first, that the relation between the observed interdependent behaviors would not be observed without a specific ontogenetic history, and second, that this coordination occurs as the *spontaneous* consequence of coupling. Although the term "consensual" employed by Maturana can evoke agreement and may therefore be perceived by some as ambiguous, the proposed definition should clarify its meaning in the context of a bio-logical approach. Furthermore, it should be clear that the emergence of consensual coordination is not a consequence of a deliberate, planned strategy, nor does it include goal-directedness; rather, the establishment of consensual coordination allows individuals to subsequently draw on an already established "consensual domain" of coordination patterns, in order to operate "strategically." Taking this definition into account, "coordination" will hereafter refer only to consensual coordination.

With that said, if we focus on interaction and consensual coordination alone, we cannot entirely explain how language and complex human sociocultural practices can emerge. This becomes clear as soon as we note that, from a bio-logical viewpoint, coordination cannot be seen as a communicative setting or "information transmission." It would be misleading to speak of "communication" in order to account for animal coordination. This would mean that the conduct of the individuals involved "conveys a message" which refers to circumstances related to the message's emission, "as if what determines the course of the interaction were the meaning and not the dynamics of structural coupling of the interacting organisms" (Maturana and Varela, 1992: 207). Consensual coordination does not rely on this informational model. No "information" is exchanged and no object can be denoted or *observed* by the interacting individuals. Any alleged exchange of signals between coordinating individuals is only a description of the interaction made by the observer (Maturana and Varela, 1980).

<sup>2</sup>In this paper I will maintain a distinction between the terms "interaction" and "structural coupling"; while employing the latter to refer to the bidirectional, constant mutual triggering between the organism and its biotic and abiotic medium, I reserve the former for delimited events in which a given sequence of interlocked operations is distinguishable between two or more organisms.

We must still ask what it is about human coupling, compared with other modes of living in the biosphere where language is apparently absent, that gives rise to language. To explain the emergence of human cultural and linguistic phenomena, it is therefore necessary to make explicit the specific feature of the human domain of consensual coordination.

# **RECURSIVE CONSENSUAL COORDINATION: LANGUAGE AND HUMAN JOINT ACTIVITIES**

Given this definition of consensual coordination between interacting individuals, I would argue that a bio-logical explanation of language and joint activity can be provided. In line with the previous considerations, this explanation must trace language's constitutive conditions to the bio-logics of living systems. In keeping with Maturana (1988), our question could be formulated as follows: under which circumstances within the history of interactions between living beings can language emerge? Or, in other words, how can we explain linguistic activity as a class of phenomena related to structural coupling, and therefore as a consequence of a specific history of coexistence between living beings? This is an epistemological question that must first be answered from a theoretical standpoint.

Social interaction is fundamental in species for which individual ontogeny occurs as a part of a network of co-ontogenies brought about through consensual coordination. In human interactions, it is the emergence of *recursion* within the consensual domain that gives rise to the *classes* of inherently social phenomena that we distinguish as language, communication, and more generally, human sociocultural practices. *Recursive consensual coordination* is, in effect, the generative mechanism we were looking for. Building on Maturana's work, I choose to define *languaging* as *a process based on recursive consensual coordination of individuals' interrelated operations, taking place in the interindividual relational domain.* Minimal languaging appears in the domain of interaction as soon as individuals operate a coordination which takes place, recursively, "at the top" of their historically established domain of coordination. The new classes of operations that one can thereby distinguish still consist of consensually interrelated operations. However, they differ from those based on "flat" consensual coordination in that they only take place through a recursive process which draws on the history of other coordinated operations brought about by the individuals in prolonged, intimate coexistence.

To clarify the power of recursive coordination, it is best to see an example of how it functions. Let us consider a "flat" human coordination such as the passing of toys between an infant and his caregiver. This activity presents many aspects of a coordination framework that we can observe in other species. However, a new framework appears if the infant and his caregiver bring about a new coordination by recursively drawing on the pre-established one as an operational basis; i.e., when an activity such as the game of passing toys allows the emergence of a new activity that includes the *request* to pass said objects. The circumstances are similar but we can now observe a new class of phenomena. Vocalization, gestures, movements, and the other interrelated operations are now elements of a recursive consensual coordination that is identifiable as a new activity. This new class of doing things together cannot be reduced to the previously established class; however, its possibility relies precisely on this previously established class.

This basic example shows that the process of languaging constitutes an astounding expansion of individuals' operational-relational matrix, and that it allows the generation of new classes of interrelated operations that are bio-logically possible only through recursion. Importantly, these classes of operations constitute our human doings; they coincide with our "doing things together" in coexistence as different types of joint activities. Moreover, because of the multiplying character of recursivity, new coordinations can occur recursively in the flow of "doing things with others." The flow of languaging should therefore be understood from within the mutual operational-relational interdependencies which it brings about. This flow of coordination extends beyond isolated occurrences of coordination: individuals' respective operational spheres (including their behavioral, mental and emotional dimensions) remain interdependent beyond the event of coordination. Ontogenetically, the languaging flow sets a matrix of interdependence within which all our operations as human beings exist. "Doing things with others" through recursive consensual coordination can therefore be considered the invariant organization of the systemic dynamic of human structural coupling. In other words, languaging constitutes a species-specific feature of the mode of living through which we human beings exist as a distinct class of organisms. This mode of living constitutes the human "ontogenetic phenotype" (a notion introduced by Maturana and Mpodozis, 2000; see, e.g., Maturana and Verden-Zöller, 2008); or, to put it another way, the core feature of our "developmental system" (Oyama, 2000; Oyama et al., 2001).

Although it is not possible to develop these notions at length within the limits of this article, it is important to show the theoretical implications of an approach in terms of languaging compared to other conceptions of language. What we call "language" coincides with constitutive *elements of coordination* within languaging. Language therefore belongs to the process of languaging and can be considered a multi-scalar system of discriminant differences which allow us to bring about different forms of activities. As regards the complex systems of dynamic operational configurations brought about by each event of recursive coordination, these elements can be considered "semiotic elements" precisely in that they specify different configurations of coordination. By the same token, aspects of our operation that do not result in a difference of coordination are not "semiotic elements" in relation to a given contingent, consensual domain. Undoubtedly, we can distinguish some of the more salient classes of semiotic elements within our present cultures, and we can study them using the most thorough and sophisticated systems of analyzable regularities (lexical, grammatical and phonological). At the same time, other systems of regularities relating to the event of coordination can now be taken into account: gesture, prosody, conversational turns, etc. (e.g., Kendon, 1990; McNeill, 1992; Schegloff, 2007). Nevertheless, none of these systems of regularities explains languaging by itself, nor do they exhaustively describe the operational architecture underlying recursive coordination.

In several aspects, the explanation of languaging allows us to embrace the dialogical, actional view of language as opposed to an internalist, monological view (Linell, 2009). In keeping with the distributed approach to language (Cowley, 2007, 2011; Thibault, 2011), it should be noted that the event of coordination is a co-constructed dynamic that engages the embodied organism and occurs in real-time interactivity. Such a dynamic unfolds on extremely fast time-scales, measurable in fractions of a second. Meaning is directly inherent to the flow of recursive coordination and to its contextual operational architecture within each interactive situation.

Here I would like to emphasize that by identifying "recursive consensual coordination" as the generative mechanism underlying such a real-time, interactional process, we can understand what makes it unique in comparison to other kinds of "flat" coordination. Importantly, since it is operationally grounded on the bio-logics of structural coupling, languaging can be traced to interaction and coordination, yet it constitutes phenomena whose properties are not reducible to them. Moreover, this process takes place in a flow of operational interdependence that goes beyond the setting of any single event of coordination, and whose result is the network of human practices. Also, it is clear that language cannot be considered as being either logically primary or secondary to sociocultural activities, because language and recursive coordination are necessarily co-occurrent. Although they can be analytically distinguished, human joint activity and language arise from the same process; one is not the cause of the other.

As we have previously examined, the emergence of recursive consensual coordination does not require any previous agreement between interacting individuals. Rather, such coordination relies on the congruent transformation of our operational spheres during the process of living together, and it is a systemic, spontaneous result of this process. Recursive coordination therefore does not require agreement or previous understanding; on the contrary, it is the condition by which agreement and understanding can arise. In fact, coordination does not even presuppose cooperation, since cooperation refers to the configuration of emotioning within which a given coordination is brought about. Even though cooperative coordination is crucial to the human mode of life, what is proposed here is not an irenic vision of interaction; it includes all antagonistic forms of coordination (negotiations, conflicts), inasmuch as all these forms do not invalidate but rather intrinsically confirm the consensual character of coordination, along with the constitutive interdependence between individuals' operational spheres. This occurs as *conversation*. What I refer to as "conversation" is a flow of languaging where individuals operate a recursive coordination which draws on the consensual distinction of the configuration of interrelated operations brought about by a previous occurrence of recursive coordination. For example, in conversation we can refuse or negotiate the "communicative actions" enacted (or "projected") by others, actions that by definition specify a certain immediate or future effective interrelation between the operational sphere of others and our own. As a result, conversation allows us, by operating in languaging, to modulate or change the course of the dynamic flow of our operational interdependence. Since this shift in the flow of languaging occurs through recursive coordination, it does not disintegrate the interrelation between our operational spheres, but allows an expansion of it while remaining within the realm of languaging. The same is true for such events as misunderstandings (or lack of understanding), which can be "repaired" through recursive coordination. Conversation provides the possibility of a fully human reciprocity, which in turn makes it possible to preserve languaging by languaging. Without conversation, our interactions would only be the accumulation of simple sequences of recursive coordination. Finally, conversation represents an immensely complex evolution compared with the phenomena brought about by "flat" coordination. I would go so far as to say that conversation is one of the fundamental aspects of our living-through-languaging.

# **INTEROBJECTIVE DISTINCTIONS AND THE EMERGENCE OF OBSERVING**

Having introduced recursive consensual coordination as the generative mechanism of language and joint activity, I need to make explicit another fundamental aspect of languaging. The following should further clarify the relevance of the bio-logical approach in order to overcome cognitivist accounts of the emergence of social interaction and joint activity, such as Tomasello's. The spectatorial position that cognitivists ascribe to interacting individuals implies that they engage in the *observation* of objects, persons, intentions, "shared knowledge" and "common ground." However, this observation cannot bio-logically precede recursive coordination and therefore cannot be a precondition of language and joint activity. To the contrary, I will show that such an operation of *observing* is generated precisely through languaging.

As claimed earlier in this paper, non-human animal interactions do not and could not take place by "referring to objects." However, we should now explain how we as human beings refer to the circumstances related to our operation. To this end, it is necessary to define what is meant here by an object. Within the presented epistemological framework, objects are dynamic operational configurations related to recursive coordination and therefore to our relational operation. While objects are admittedly constituted through the operations of each of us as single individuals, their constitution relies on recursive coordination with others. More specifically, I consider that objects are the *sine qua non* operational condition for recursive coordination. Recursive coordination is brought about by taking a given configuration of interrelated operations as the operational basis for a further coordination. These configurations of operations remain obscured to the individuals, who only operate different kinds of distinctions: "Objects arise in language as operations of coordinations of coordinations of doings that stand as coordinations of doings about which we recursively coordinate our doings as languaging beings" (Maturana, 2002: 28).

From a cognitive point of view, objects depend on operating consensual "interobjective" distinctions, that is to say, distinctions related to the configuration of interrelated operations which bring about a recursive consensual coordination. Ontogenetically, the process of languaging leads to the routinization of distinguishing objects (entities, relations, processes). This epistemological explanation implies that, for the individuals, objects are as experientially present and real as the operations that allow them to arise, independently of the domain – physical, relational, abstract, imaginary – in which they can be classed by an observer thereafter. With regard to individuals operating recursive coordination, objects exist first as immediate configurations of operation and can then be *observed* as objects through a subsequent recursive operation, as distinctions of distinctions of distinctions.

Let us explore what I mean by *observing*. If the previous considerations are clear, we can go a step further and consider what happens when individuals start distinguishing their own interobjective distinctions through recursive coordination. "Observing" then becomes possible: recursively operating on interobjective distinctions is equivalent to being mindful of the objects that are distinguished through coordination. In this regard it is important to note that observing is a process that relies on the bio-logics of living beings, to the extent that observing is a possibility inherent to the operation of the organism as a whole, provided that it can operate through recursive consensual coordination. In this light, while observing is admittedly possible only under some specific conditions (with a given phylogenetic trajectory and an ontogenetic history of coexistence while doing things together through languaging), it can be explained as a bio-logical operation without basing it on any other principle or functional device. By making us distinguish our own distinctions in terms of entities, experiences and feelings, observing is therefore another key element in the explanation of the sociocultural practices that characterize the human mode of life. In effect, it is through the operation of observing that description-making, the development of narrative skills and reflection become possible. These operations draw on the process of distinction of objects arising in recursive coordination, and on its increasing recursive complexity. Furthermore, as we learn to operate distinctions through the practices within which objects exist, these objects can be operated independently of the single occurrences of interaction. This means that they are gradually embodied in the relational operation of structural coupling to our medium and are operated recurrently in the process of making sense of daily human life, even during solitary activities.

Virtually all configurations of operations can become objects in the process of languaging and therefore expand the interindividual domain of objects and practices. More generally, we are dealing with what Maturana would call an "interobjective domain" (2000, 2005), which includes both observed and non-observed objects, and is constitutively open to dynamic expansion and change, since it is strictly contingent on historical and situated circumstances of coordination. This being said, it is clear that the term "interobjective domain" relates to an abstraction that one can make of a network of dynamic languaging flows. These flows always take place in an ever-changing present during the course of interactions within a given network of human beings, and follow a drift that is not pre-established, drawing on an inherently peculiar cultural history of recursive coordination. It should be remarked that the notion of "interobjective domain" can be partially assimilated to that of "common ground" (Clark, 1996; Tomasello, 2008), meaning the common knowledge, assumptions, and norms "shared" by individuals; but only if we consider the latter from a non-intellectualist, non-spectatorial standpoint. The notion of "interobjective domain" refers to the matrix of potential configurations of coordination operable by individuals through languaging, at a given moment in their ontogenetic history.

We can now understand why languaging makes it possible for human beings to reference entities and events. Since objects are the operational condition for languaging, it follows that interactions not relying on recursive consensual coordination (such as the interactions existing between individuals of other species) also do not entail the constitution of interobjective domains. This should not be surprising, as modes of living which do not include "operating and observing objects" are clearly just as viable and adaptive for those organisms which preserve structural coupling with their medium. Where there is languaging, there are language, objects and human sociocultural activities. Where one does not exist, neither can the others. Language, objects and human joint activities arise together through languaging.

Logically, some epistemological consequences follow. First, there is no *original* "linking problem" which individuals would have to face in their supposed efforts to "connect" languaging to objects. Thus, we cannot ascribe to infants the putative task of connecting linguistic symbols to the entities existing in the world, a task that Tomasello's hypothesis of intention-reading skills is meant to resolve. Human beings do not resort to language as though it were a system of symbols denoting entities that exist beyond their recursive operation. The flow of interrelated operations in languaging allows us to constitute, conserve and multiply objects over generations. This argument challenges the representationalist function of language and its status as a system of "symbolic tools" that we "use,"<sup>3</sup> although symbolic thinking does take place in languaging. We will later see the importance of this for language acquisition.

Second, any spectatorial account of language acquisition is inadequate. We have seen that Tomasello considers intention-reading as logically and ontogenetically primary. However, not only does the bio-logical conception of the organism challenge both the mentalist and the folk-psychological assumptions behind this hypothesis (see §3); but also, based on the explanation of observing, infants cannot be spectators of any "communicative intention," mental state or any other type of object before they operate interobjective distinctions. Since observing takes place in languaging as a condition for the establishment of complex forms of joint activity, it follows that observing can take place neither outside of nor before recursive coordination. The infant cannot observe any object before he begins to participate with others in specific kinds of doings and recursive coordination. When individuals observe, that is to say when they consensually distinguish objects related to the circumstances of coordination, they are already languaging.

Finally, and most importantly, this approach allows us to reconcile a non-representational conception of neurobiological processes (since, bio-logically, the nervous system does not work with symbols, representations or content) with the possibility of our human "contentful mindedness." We, as human beings, operate objects as our cognitive way of living through languaging, often simultaneously observing some of these objects. However, it should be remarked that observing and consciousness constitute only one aspect of our otherwise non-contentful moment-to-moment operation within the flow of living. Interestingly, this explanation is congruent with Hutto and Myin's (2013) Scaffolded Mind Hypothesis and Developmental Explanatory Thesis, according to which "(...) all the mentality-constituting interactions are grounded in, shaped by, and explained by nothing more, or other, than the history of an organism's previous interactions."

<sup>3</sup>As Maturana argues: "It is because we human beings find ourselves operating in language as our natural manner of being that we live *language as if this were a transparent instrument by means of which we coordinate our behaviors in the distinction and handling of objects* – as if these existed independently from what we do with them – and we do not see what we are doing as we language. Because we live without seeing what we do as we language, we do not see that what constitutes our languaging is our living in a recursive flow in coordinations of coordinations of doings, and that objects arise as tokens of coordinations of doings that obscure the doings they coordinate in this recursive flow." (Maturana, 2000: 462; italics are mine).

# **ONTOGENETIC IMPLICATIONS OF THE BIO-LOGICAL APPROACH**

Let us now consider ontogenetic development, language acquisition and the emergence of sociocultural skills from a bio-logical standpoint. The key theoretical proposal is that children learn to speak by languaging. This means that children actually language before they are able to emit their first words. In some respects, this turns Tomasello's theory on its head.

First of all, I suggest that a clean separation between the prelinguistic and the linguistic stage does not allow us to fully grasp the trajectory across which the operational-relational, interindividual domain of the infant and his caregivers expands through recursive coordination. By beginning to operate in recursive coordination with them through joint activities very early on in his ontogeny, a child starts participating in the network of doings that constitute the culture within which his caregivers exist as human beings. This ontogenetic process opens up a multiplicity of further joint activities in daily coexistence.

A large body of research has shown that coordination arises very early in infant-caregiver interactions, starting as mutual co-orientation and emotional attunement (Stern, 1977; Trevarthen, 1979; Fogel, 1993b; Beebe and Lachmann, 2002; Greenspan and Shanker, 2004). As a relational process, early interactions establish the first domains of interrelation between the operational spheres of the child and his caregivers. The emotional and behavioral attunement thus generated becomes a consensual domain open to expansion in the course of recurrent interactions, including care practices and play. This consensual domain, although very rich, remains a domain of "flat" coordination, in some ways similar to that which we observe in other primates' interactions.

However, it is precisely with the phenomena arising from joint attention episodes that the first events of languaging appear, bringing new possibilities to joint activity. The child can then coordinate his attentional focus with that of the caregiver, follow objects with his gaze in dyadic settings, and transform routines of manipulation into new classes of coordinated operations. By distinguishing objects related to patterns of coordination, he can start participating in new joint activities. To repeat what I have previously stated concerning the example of the passing of toys, satisfying a request pertains to a new class of interrelated actions that cannot be assimilated into the previously established configurations of coordination on which they depend.

The development of the child's responsiveness to others' doings, as well as of his own disposition to initiate an event of coordination, is to be understood as the spontaneous result of an ontogenetic trajectory. Across this trajectory, the variety of configurations of coordination in which he is able to participate gradually increases, while at the same time his structure changes in the course of his living. This challenges the idea of a developmental discontinuity represented by Tomasello's "nine-month revolution," the point in a child's life at which intention-reading skills supposedly emerge. Although episodes of recursive coordination establish a new step in the history of coexistence, what we have here is a single process, and a single generative mechanism to explain its historical trajectory. In fact, sequences of pointing (Bates et al., 1979; Tomasello, 2008) are among the first events of recursive coordination initiated by an infant, building on the consensual domain of activities already established. On the one hand, pointing is an operational element of recursive coordination that relies on an operational basis of pre-established patterns of coordination. These patterns ensure the interrelation between operational spheres in certain circumstances. On the other hand, pointing provides the possibility of establishing a new class of coordination that includes reorienting the attention of the other. The latter results in the constitution of a new class of coordinated operations, meaning that when the child points, he is languaging, since recursive consensual coordination is brought about by all the operational elements that can possibly give rise to it, whether "verbal" or "non-verbal." This initially sporadic participation in recursive coordination gradually allows the child to expand his range of activities by operating on the consequences of recursive coordination with his close circle of relations.
From this point on, the gradual distinction of new elements of coordination and objects occurs together with new events of recursive coordination. This process allows the child to acquire operational experience specific to languaging, and to make joint activity his domain of existence as a human being. The child himself then becomes a sociocultural *agent*.

Tomasello seems to have this process in mind when he speaks of non-verbal, prelinguistic communication as "natural communication" (Tomasello, 2008). However, the mentalist and spectatorial reformulation of events remains problematic in that it introduces intention-reading as an explanatory mechanism that not only lacks bio-logical grounding but also prevents us from grasping the fact that we are dealing with one single process: languaging. Moreover, the process that gives rise to language acquisition and sociocultural learning can be explained bio-logically without appealing to representationalist and spectatorial accounts. Although we as observers can contemplate a metadomain in which we associate elements of coordination with circumstances of interaction, we cannot ascribe to the child the cognitive task of matching objects in his world to "symbols," a problem to which intention-reading would provide a solution. Not only does this solution require us to presuppose an inadequate epistemological framework, it also causes us to lose sight of the *interaction itself*. We then fail to fully understand language and joint activity as constitutively belonging to the same process. As Maturana argues, "Part of the difficulty in understanding the relation between language and existence rests on the view of language as a domain of representations and abstractions of entities that pertain to a different concrete domain. Yet language is not so, languaging occurs in the concreteness of the doings of the observer in his or her actual living in the praxis of living itself" (Maturana, 2002: 32).

# **OBSERVING COMMUNICATIVE INTENTIONS**

I have shown, based on Maturana's work, that observing is the result of a history of interaction through languaging, and is a necessary operation for our mode of living in recursive coordination. This means that I do not need to posit any functional device for it, but only assume that our neurobiological processes are adequate for the relational–operational domain in which we human beings exist.

With regard to one of the most debated subjects of social cognition, it should now be clear why folk-psychology (understanding others' beliefs and mental states) requires the operation of observing, and relies on the emergence of different objects that are operated gradually in infancy as the result of an ontogenetic history of coexistence in languaging. Different objects and different classes of recursive coordinated operations emerge gradually: self-consciousness and reflection (Maturana, 2005), meta-discursive skills (Taylor and Shanker, 2003; Taylor, 2012) and a language stance (Cowley, 2011), as well as the understanding of narrative practices (Hutto, 2008). All this allows the child to operate in an interobjective domain of beliefs and mental states. The important factor to be taken into account is therefore the process leading to the ontogenetic establishment of such a domain.

In this context, we can add a few words about intention-reading as presented by Tomasello. I have already made clear that the functional intention-reading infrastructure as presented by Tomasello is neither bio-logically grounded, nor required to account for "language acquisition." The explanation for the ontogenetic emergence of social interaction, joint activity, language and objects has been provided by drawing on the bio-logical understanding of structural coupling and the process of recursive consensual coordination. However, another crucial point here is that while I have argued that intentions are not internal entities causing behavior, it remains true that adults constantly attribute intentions to each other in their daily life. From an epistemological standpoint, how should we actually explain this mutual attribution of communicative intentions?

Since intentions are not components of the living being's structural domain, they should belong to the operational domain of interaction. If we draw on the explanation of objects and of the operation of observing, a rather different definition of communicative intention can be provided in place of the one presented in many mentalist approaches. I argue that communicative intentions are related to one of the previously introduced key features of languaging: conversation. I propose that what Tomasello, drawing on philosophy of language and pragmatics, calls a communicative intention is not an internal entity causing action, but can instead be explained as a class of objects constituting the *sine qua non* condition for conversation. These objects coincide with the interobjective distinction of the specific way in which individuals' operational spheres would be interrelated by a given recursive coordination. In other words, "communicative intention" refers to the consensual distinction of the operational result to which a prefigured coordination would lead. For example, when a caregiver asks a child to fetch a toy, the communicative intention is the particular interrelation between the caregiver's and the child's operational spheres that must be brought about in order for that specific event of coordination to be realized. However, for a communicative intention to exist, it has to be operated. In the present case, the communicative intention arises as an immediate interobjective distinction when the child and his caregiver consensually operate a recursive coordination (i.e., the negotiation of the request) that modifies the prefigured trajectory of the operational interrelation (the request projected by one of them). The interobjective distinction of communicative intention is therefore the operational basis for the emergence of conversational classes of coordinated operations, such as negotiation.

Put differently, as an observer, I use the term "communicative intention" to identify a contingent interobjective distinction that is not required for a single sequence of coordination, but that rather makes possible a flow of recursive coordination (such as a conversation). These distinctions, initially operated in an immediate way by the child during his conversation with others, and only later recursively observed, can subsequently be named through a new recursion (for example, in the case of a given communicative action that individuals ascribe to each other during discourse). Finally, if communicative intentions can be "objects of observing," could intention-observing (as defined above), rather than intention-reading (as detection of mental states), be a precursor to language, or at least to conversation? The answer is logically negative. From a logical and operational point of view, an infant cannot observe any object before operating recursive coordination. No prior intention-observing is necessary to bring about the developmental structural transformation that allows a child to converse; on the contrary, it is only through the operational experience that each individual already has of his domain of languaging that he can begin to converse. Again, observing neither precedes nor causes recursive coordination: it does not provide individuals with the know-how for the coordination, but is rather a concomitant operational condition for several classes of activities enacted through languaging. This means that intention-observing is not a precursor to language; at the same time, we can ascribe communicative intentions to others while languaging.

# **CONCLUSION**

The principal aim of this paper has been to contribute to studies in the domain of social cognition and interaction by introducing some considerations on the constitutive conditions of language. From an epistemological point of view, I have focused on the domain of human interaction itself and have shown that human social interaction, language and sociocultural activities arise from the same operational-relational matrix.

What I have defined as a "bio-logical" approach challenges cognitivist accounts of social engagement and coordination. In opposition to the cognitivist hypothesis proposed by Tomasello to explain language acquisition and joint activity, which he considers to be warranted by a Cartesian infrastructure, I have suggested that we turn our attention towards the bio-logical conditions through which the operation of observing arises. As previously stated, a generative explanation for human interactional phenomena is needed. This implies, on the one hand, identifying the domain to which we can trace the phenomena to be explained (in our case, linguistic activity and sociocultural practices), and, on the other hand, proposing a mechanism that would allow these phenomena to occur. Such a domain is that of structural coupling between living beings, wherein interaction plays a fundamental role. A bio-logical framework allows us to see the interactional domain itself as the appropriate domain for explaining human interactivity through the lens of "consensual coordination." In keeping with the work of Maturana, the proposed mechanism is that of recursive consensual coordination, which can be seen as the organization underlying all linguistic activity and, more generally, all human doings. By the same token, it has been possible to show the emergence of the operation of observing along with its implications for human development. Observing, self-consciousness and mindedness are human forms of existing in the operational-relational domain, and they therefore cannot be reduced to any subpersonal infrastructure.

Throughout this paper, I have also summarized the reasons for avoiding the assumption that, ontogenetically, intention-reading is a prerequisite for engaging with others in social and linguistic activities, and have provided arguments precluding such a characterization. Along with the arguments for a bio-logical understanding of language and interaction, I have developed arguments against Tomasello's hypothesis of intention-reading as the precursor of language. On the one hand, I have argued that the bio-logical understanding of the organism allows us to reject both mentalist explanations and folk-psychological assumptions (see §2 and §3). On the other hand, I have shown that language is not a symbolic toolset and cannot be considered secondary to the establishment of joint activities, because it is a constitutive element of each event of recursive coordination (§5 and §7). Furthermore, the spectatorial stance implied by any sort of intention-reading skill would ultimately require the operation of observing, which can arise only through languaging and therefore cannot be its precursor (§6 and §8).

The bio-logical approach has some implications for the study of social interaction and joint activity. First, it is precisely because of our ontogenetic trajectory of structural transformation that we, as individuals developing in languaging, can operate congruently with what an observer could describe as the properties of our culturally situated system of coordination, and then, recursively and through reflection, elaborate strategies and pursue individual or joint goals congruent with our coordination experience. Second, in order to explain coordination we cannot trace it to notions such as communication, cooperation, symbols or intentions: we use these notions to refer to aspects of the process of coordination itself, and they therefore cannot give rise to it. Rather, it is necessary to reveal the bio-logical framework within which the phenomena related to these notions take place. This is one of the reasons why we cannot rely on a functionalist conception of language as a tool used for extra-linguistic transactions, that is, for activities that could occur without or before languaging; this manner of proceeding confuses the way we make sense of our doings in languaging with the genesis of languaging. Third, it is not so much that language has an important impact on human agency and cultural life, but rather that languaging *is* human agency. As said before, the operations that give rise to recursive coordination are the constitutive, discriminant elements that configure a given event of coordination as such. We do not "use" these elements; rather, we enact them throughout the operational flow of coordination, although in some cases, by observing and therefore by constituting them as objects, we can consider that we are using them to produce a certain effect.

Finally, by recognizing recursive consensual coordination as an invariant organization of human interactional dynamics, it becomes possible to understand different classes of phenomena, from language acquisition to all kinds of sociocultural practices, as resulting from a single process. These phenomena remain to be studied in detail within their own domains, but the bio-logical explanation of languaging steers us towards a wider scope of understanding social interaction, and our specific mode of "doing things with others."

# **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 29 July 2014; published online: 14 August 2014.*

*Citation: Raimondi V (2014) Social interaction, languaging and the operational conditions for the emergence of observing. Front. Psychol. 5:899. doi: 10.3389/fpsyg.2014.00899*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Raimondi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Narrativity and enaction: the social nature of literary narrative understanding

# *Yanna B. Popova\**

*Department of Cognitive Science, Case Western Reserve University, Cleveland, OH, USA*

#### *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Marco Caracciolo, University of Groningen, Netherlands; Marco Bernini, Durham University, UK; Dirk Van Hulle, University of Antwerp, Belgium*

## *\*Correspondence:*

*Yanna B. Popova, Department of Cognitive Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA e-mail: yannapopova7@gmail.com*

This paper proposes an understanding of literary narrative as a form of social cognition and situates the study of such narratives in relation to enaction, a new and comprehensive approach to human cognition. The particular form of enactive cognition on which narrative understanding is proposed to depend is participatory sense-making, as developed in the work of Di Paolo and De Jaegher. Currently there is no consensus as to what makes a good literary narrative, how it is understood, and why it plays such an irreplaceable role in human experience. The proposal thus identifies a gap in the existing research on narrative by describing narrative as an intersubjective process of sense-making between two agents, a teller and a reader. It argues that making sense of narrative literature is an interactional process of co-constructing a story-world with a narrator. Such an understanding of narrative makes a decisive break both with the text-centered approaches that have dominated structuralist and early cognitivist studies of narrative, and with pragmatic communicative approaches that view narrative as a form of linguistic implicature. The interactive experience that narrative at once affords and necessitates, I argue, serves to highlight the active yet cooperative and communal nature of human sociality, expressed in the many forms in which human beings interact, including literary ones.

**Keywords: narrative, narrative understanding, literature, participatory sense-making, social cognition**

# **SETTING THE STAGE: HOW DO NARRATIVES MEAN?**

Stories are everywhere in human lives, and storytelling is part of all human cultures. We think in narrative, remember in narrative and interact in narrative. People tell stories in words, in pictures and in movement, in musical forms, and through increasingly diverse multimodal means. We learn through stories told in the news and in history books, we make decisions based on stories reported in criminal trials, and we find it effortless to engage with the fictional stories revealed in our favorite novels and films. As the semiotician Barthes noted, "narrative is international, transhistorical, transcultural: it is simply there like life itself" (Barthes, 1977, p. 79). The question remains, however: why and how are human experiences best organized by stories?

Stories have been studied for centuries from a variety of perspectives and with distinct questions in mind. Although narrative is a much-scrutinized subject and the topic of many volumes, the field of narrative research is still an open one. That narratives play an irreplaceable role in the organization of human knowledge is undeniable, yet the reasons for that very fact remain elusive and ultimately dependent on the orientation of the research paradigm asking the questions. Most broadly, work on narrative can be divided between positivistic (scientific) and hermeneutic (humanistic) approaches, although that very division often cuts across individual disciplines and even theorists. Therefore, as I will argue in this article, narrative is best studied from the point of view of a new and emerging approach to the study of the mind, as developed in the enactive paradigm. While cognitive science has from its inception aspired to represent the true marriage of humanistic and scientific ways of understanding, this merging of aims is only just beginning to be realized in what is termed "enactive cognitive science." This article also attempts to frame some common research topics between the theoretical study of narrative, as undertaken historically, and current cognitive science. In a book-length study (Popova, in press), I have developed a model of narrative understanding as a cognitive process reliant on perceptual causality, a phenomenon that is distinct from mere temporal succession and experienced as inherently meaningful, thus linking it to the important work of Michotte and his intellectual descendants (Michotte, 1963). The experiential notion of perceptual causality is used to flesh out an understanding of narrative causality and of our conception of action sequences in stories: their intentional nature and their telicity (the fact that they have beginnings and endings).
This is in tune with a broadly phenomenological understanding of narrative as strongly implying a meaningful causal structuring, a teleological grasping of the events of a story in a particular way. This proposal helps to explain narrative's acknowledged ubiquity as a form of knowledge organization in a principally non-representationalist or functionalist way. Definitional of the enactive approach is the claim that cognition bears a constitutive relation to its objects. In a similar vein, in my account, story is further defined as a relational domain constituted, or enacted, in the very interaction between the reader and an autonomous agency, most commonly known as a narrator, that is responsible for the causal contingencies of the narrative. The presence of such a narrating consciousness, which relays the narrative events and thereby shapes them in the process of telling, and the way the story develops in interaction with the reader, will be explained through the notion of "participatory sense-making" as proposed and elaborated in the enactive view of human cognition (De Jaegher and Di Paolo, 2007).

## **INTENTIONALITY IN NARRATIVE UNDERSTANDING**

Human lives are driven by living in a world where actions take both practical and theoretical priority. From the events of everyday life, to participation in cultural acts, to just being in the world, our primary way of interacting with the world is through practical action. Action is typically the result of coordinated movement, but it is widely accepted that not all movement constitutes an action. Most philosophers and others deliberating on these problems would agree that it is human intention or purpose that transforms a movement into a deliberate action, the latter being understood as involving both the self-awareness of pursuing a specific goal and the recognition by others that an agent's actions are also deliberate or goal-directed. As some phenomenologists have argued, the very experience of one's own intentionality is linked to the agent's own self-reflexive consciousness of agency: the awareness that I know that I can cause something to happen<sup>1</sup>. Such a phenomenology of agency, which we possess and which we reciprocally understand others to possess, has been plausibly linked to the evolutionary and cognitive advantages afforded to our ancestors by the ability to voluntarily control the body as a means of communicating meaning<sup>2</sup>. Using the body in this way, as an instrument or as a representational mechanism of sorts, has provided our ancestors, but also any normally developing infant, with a bodily based sense of agency. Accepting that human beings are regularly driven by intention, and that intention is to some extent readable for the people who surround them and share their social and perceptual world, also leads to another fundamental aspect of human consciousness. As understood in phenomenology, this view describes the understanding that all consciousness (all perceptions, imaginings, memories, etc.) is intentional: it has directedness toward an object or person; it is "about or of something"<sup>3</sup>.
Such an understanding immediately calls attention to an inevitable consequence, namely, that human thought is intrinsically tied to the world, be it in the form of physical objects or other living beings. This also means that human actions are always already understood by other human beings within a context of intentions, motives and goals, and not as mere physical movements or random events. In the context of action, human movements are grasped together, holistically, as an action or a series of actions. Our lived experience, as embodied creatures within a social world, is therefore intrinsically meaningful to ourselves and to others. Furthermore, merely unreflective, instinctive behavior is to be distinguished from true agency. Thus, my sitting at the computer with the intention of writing an article is an action, but a bird's singing outside my window to attract a mate is better described as an instinctual response to a possible physiological need. The reason for this distinction is that my purpose in writing an article may not be narrowed down to just one thing and thus may not be uniquely determined or understood by others, or even by myself; it covers instead a wide range of goals, motivations, and circumstances, all of which far surpass an animal's more narrowly understood series of actions and their expected, because ultimately predetermined, outcome. Human agency thus covers many reasons for acting, which is precisely what cannot be said of non-human agents. What matters for human intentionality, then, including how we understand it when applied to text interpretation, is that intention itself should not be understood as always uniquely determined, or as initially hidden and then discovered or discoverable, but as emerging from a process of interaction between agents.

The purpose of the above interlude has been to situate the discussion of narrative understanding that follows in the same context of agency, intentionality and dynamic interaction that has characterized more recent developments in the study of human action, perception and consciousness. In its initial description, the enactive approach (Varela et al., 1991) emphasized the indelible link between cognitive processes and an organism's embedded activity. Sensorimotor enactivism, as subsequently developed in the work of Noë and colleagues (Noë, 2004, 2010; see also Hutto and Myin, 2013), explains the practical knowledge characteristic of perception, understood as a process of interaction between an organism and its environment. But social interactions, rather than sensorimotor ones, dominate certain human practices, specifically the production and reception of narratives. We act in the world in no small measure because we expect our actions and intentions to be understood as meaningful, to be made sense of, by other people. Human lives in all their inherent complexities take place in the open space of shared realities and shared meanings, not within isolated individual brains. More importantly still, while the agency of an individual is of great importance for sociality, it is acting for and through one another (interacting) that ultimately defines who we are. Our human world is a social world, and it takes place in large measure outside our brains, in the common shared activity that is life. If we take this view and apply it in a wider framework, as I will do here, we can see the reading and understanding of books as essentially not that different from other forms of interaction within a social world: through a careful and deliberate process of intersubjective sense-making.

<sup>1</sup>See Gallagher and Zahavi (2008, p. 158). As the authors explain, this kind of conscious awareness does not have to be of a very high order; very often it is just a case of a pre-reflective awareness. At other times, there may be explicit awareness of acting for a reason, as in more complex decision-making processes.

<sup>2</sup>Merlin Donald's theory of "mimesis" as a form of representing reality that is intentionally controlled because bodily based goes a long way toward explaining a fundamental difference in the way human beings represent reality, in contrast to other forms of life (see Donald, 2004). Others have similarly argued that humans are unique in using the body as an instrument (a tool) for achieving understanding in the public sphere of social life where we generally dwell (see Tallis, 2003). The main argument behind both Donald's and Tallis' proposals is that by being able to see, rehearse and refine various "mimetic skills" (Donald) or the use of the visible hand (Tallis), human beings have evolved as the embodied and enactive agents that we are, living and communicating in a public, shared and visible world.

<sup>3</sup>See Gallagher and Zahavi (2008, p. 7).

Existing characterizations of the reading process of fictional narratives foreground the nature of meaning in human communication in general, irrespective of disciplinary affiliation. How do narratives mean? How do readers make sense of written stories? How can this process best be described and explained? These are the questions guiding this research. The reading of fiction has been theorized and studied in many ways, mainly by literary scholars, but also by discourse specialists, psychologists and linguists. With some degree of simplification, it can be stated that, despite their differences, the vast majority of existing approaches see narrative understanding as a process of communication in which the written text offers meaning and leads to interpretation through some degree of involvement on the part of the reader. These approaches can thus be classified as generally contributing to the explication of a process of "narrative transmission" between an addresser and an addressee in a given act of communication. From early literary theory (Jacobson, 1960), through speech-act theory (Searle, 1975) and relevance theory (Sperber and Wilson, 1995), to rhetoric (Booth, 1961) and studies of discourse (Graesser et al., 1994), literary communication has been assumed to take place between the multiple identities and functions of the person believed to be sending the message ("real author," "implied author," "narrator") and the equally multiple assumed identities of the "addressee" ("real reader," "implied reader," "narratee"). Within this basic communicative set-up, many distinctions have been drawn with respect to, on the one hand, the degree to which the process of narrative transmission is mainly text-centered or reception-centered, and, on the other, who the main participants in the process itself are. I will deal with each of those distinctions briefly and under separate rubrics in the next few sections.
My own hypothesis about narrative understanding as participatory sense-making will be developed in Sections Narrative Enaction: Changing the Assumptions of Narrative Understanding, Narrative Enaction and Participatory Sense-Making, and Narrative Enaction: Current Empirical Data and Future Possibilities below.

# **NARRATIVE AS INHERENT STRUCTURE: TEXT-CENTERED APPROACHES**

This group includes theories that examine textual features, properties and characteristics of the narrative text itself as the most significant aspect of the meaning construal process. The definitional criteria of narrative proposed in formalist and structuralist theories have centered on, among others, temporal and causal ordering, plot and action structure, and orientation toward human agents and their purposeful actions, all of which are seen as text-internal and therefore as pertaining to issues of form and content. The structuralists' project was a deductive, and ultimately a reductionist, method of identifying the features of narrative structure independently of the intentions or construal of the teller or reader of any story. Although classical narratologists are the main proponents of text-internal views, there is also a significant amount of psychological and early cognitive science work that similarly distinguishes narrative from other forms of thought organization on text-internal grounds. Thus, even Jerome Bruner (1986, p. 11), rightly considered the father of "folk psychology" and narrative reasoning, speaks about the "narrative" and the "logico-scientific" or "paradigmatic" modes as two distinct modes of cognitive functioning, each with its own specific operating principles and criteria of well-formedness that are manifestly text-specific. On his account, people employ the paradigmatic type of reasoning when they think about scientific or logical matters, while narrative thought serves the purpose of explaining the changing directions of human action. Crucially, Bruner sees narrativity as a structural property, a cognitive invariant of sorts, that only later, in different discourse realizations, acquires a constructivist flavor.
Early story grammars (Rumelhart, 1975; Mandler and Johnson, 1977) also attempted to isolate the unique internal structures (schemata) of narrative through an analogy with the internalized language rules that Chomsky's generative grammar assumes to characterize the knowledge and use of language. Thus, these story schemata are formalized as a set of generative rules that are used to understand and produce narrative as a specific text-type, in opposition to other types such as description, argumentation or instruction. On their own, however, schemata and story grammars are insufficient to explain narrative understanding. Although they organize aspects of memory and guide the interpretation of new narratives by supplementing missing information, they cannot account for the fact that a good narrative is a distinctive and coherent series of events uniquely informed by a specific point of view. Despite the irreducibility of causality as a mental process, the connectivity and configuration of a good narrative are imposed by a specific narratorial viewpoint, as I will argue below, and are not the result of instantiating a given narrative schema<sup>4</sup>.

Finally, in this group of text-internal approaches I include a number of theories put forward by philosophers and literary critics that have become known as poststructuralist. As an approach to reading fictional and other texts, deconstruction, which is another name for the poststructuralist theories I have in mind, was the dominant paradigm from the 1960s to the 1990s. Derrida's *différance* is understood as a process of dissemination of meaning wherein all communicative constraints on a producer and a receiver of meaning are removed in favor of an agentless and limitless web of signification, which works against any specific authorial intention and any given interpretation. The main thrust of poststructuralist approaches is thus a search for latent contradictions in texts to which the participants in a communicative exchange are believed to be blind, because any intention to communicate meaning is judged to be subsumed by the discourse-driven, disembodied signifying process itself. One reason this understanding of language is ultimately flawed is that it deliberately ignores a significant factor: meaning is born in the interaction of the meaning-constitutive practices of human agents.

<sup>4</sup>The configurational aspect of narrative, seen not as text-internal but as stemming from the act of "grasping together," was proposed by Mink (1978) and later extensively developed by Ricoeur (1985). Mink, in particular, speaks of narrative events being properly described not just as events, but as events "under a description" (Mink, 1978, p. 145). This will be discussed further below.

# **NARRATIVE AS COMMUNICATION: TEXT-EXTERNAL APPROACHES**

Approaches that reject the self-sufficiency of the text and describe meaning as the product of the reader's reception far outnumber text-internal approaches. The main dividing line concerning reception lies between more theoretical, phenomenological models of idealized, hypothetical, or universal authors and readers, and more psychologically grounded models that have sought to explicate, in a more empirically sound way, some of the responses of real readers to literary texts.

A communicative understanding of literature provides the starting point for many text-external approaches to meaning construal in narratives. Narrative need not always be verbally instantiated, as a silent film, a dance, or a mime performance shows, but it must be externally presented in some form to be communicated and understood. Verbal communication has been examined in terms of a speaker's communicative intention and the subsequent interpretation of that intention, but also in terms of existing conventions (normativity) and context. Unless the original communicative intention is explicitly verified in some way, what gets transmitted in an act of verbal communication is a series of cues that a listener must reconstruct. Any communicative exchange is thus merely an attempt at meaning making, which may or may not succeed. Earlier models of communication in language relied heavily on the six elements of verbal communication proposed by Jakobson (1960) and their corresponding linguistic functions. The elements and their respective functions are: the addresser ("expressive function"); the addressee ("conative function"); the context ("referential function"); the code ("metalinguistic function"); the channel ("phatic function"); and the message ("poetic function"). Jakobson believed that all these functions are involved in every act of verbal communication, but that only one is dominant in any particular verbal exchange. Somewhat self-evidently, the poetic function was seen as specific to forms of verbal art, particularly poetry. What is important to note, even in this early model, is the recognition that the message alone does not and cannot supply all the meaning of the exchange. A speech act is a process in which much of what gets communicated derives from an interaction between a speaker and a listener, but also, importantly, from context, code and intention.
In literary-theoretical approaches, the shift toward understanding narrative as a form of communication has led to an increased preoccupation with the reception process itself (albeit in a non-empirical way) and to a move beyond the formalism of early narratological models. In more linguistic approaches, the shift is evident in the increased interest in the pragmatics, rather than the semantics, of texts.

# **PRAGMATICS, SPEECH-ACT THEORY AND RELEVANCE THEORY**

Pragmatics, despite its close connection with linguistics, was originally developed by philosophers such as Austin (1962) and Searle (1969), a fact that explains its preoccupation with what are taken to be the real acts and dynamic contexts of language exchanges between people. Pragmatics studies the uses of language in human communication, which have variously been termed "parole" (Saussure, 1974), "performance" (Chomsky, 1965) or aspects of "language behavior" (Lyons, 1977), and which have been excluded from strict grammatical descriptions. The assumption in philosophical pragmatics is that in using language we perform various actions, or speech acts, that go beyond the merely verbal exchange of words. Extending this understanding to a whole narrative, treated as a speech act, is a clear precursor to more sociological views of narrative and to related concepts such as Labov's (2003) influential notion of the "tellability" or "reportability" of a story: the reason for telling a story to somebody. For our purposes here, the most important aspect of linguistic pragmatics is its open acknowledgement of some degree of cooperation and reciprocity in language understanding: meaning and understanding are always correlative. On the face of it, this view appears consistent with the one developed below, of narrative understanding as participatory sense-making. The key difference lies in how cooperation and participation are understood: in the former case, as a passive unpacking of an intention; in the latter, as an emergent interaction.

One important contribution of pragmatics to narrative understanding is Grice's (1975) notion of "conversational implicature" and the related "cooperative principle," which is nothing more than a normative assumption of cooperation between language producers and receivers in any act of verbal communication, including narrative understanding. Language rarely conveys meaning fully explicitly, so through words and sentences people say things that prompt others to make inferences and understand the implied meanings. According to Grice, four maxims, of *quantity* (is the information sufficient), *quality* (is it true), *relation* (is it relevant), and *manner* (is it orderly), underlie the cooperative principle and give rise to different non-explicit meanings (implicatures). The successful recovery of an implicature by a recipient therefore depends on recognizing the sender's communicative intention. When a maxim is broken or "flouted," the recipient understands this to be deliberate and interprets it accordingly. One of the early attempts to situate narrative understanding within a Gricean framework is Pratt (1977), where both naturally occurring and fictional narratives are seen as consistent with the maxims of quantity, relation and manner. What is specific to fictional narrative, however, is its lack of "truthfulness," its inherent, because intended yet nondeceptive, "untruth." This means that in telling a fictional story, its author deliberately flouts the maxim of quality (truthfulness) and thereby marks the text as a distinct form of communication. What is problematic in this description is its failure to acknowledge the relative unimportance of the reader's recognition or interpretation of this assumed illocutionary act of pretense. Does truthfulness matter for the reader's interpretation? Does the fact that fiction is in some sense not real detract from its communicative purpose or intent?
Does it therefore evoke or necessitate some additional way of understanding, such as pretense or "make-believe"? This has been the position of some philosophers in the analytic tradition, such as Currie (1995) and Walton (1990). In other work, Adams defines fiction as an act in which an author transfers origin to another speaker of his own creation (Adams, 1985, p. 10). In my view, emphasizing truthfulness at the expense of relevance is precisely one of the reasons why a communicative understanding of fictional narratives runs into difficulties. The lack of truth in fictional narratives is not a real problem if the principle of relevance is given the priority it deserves, a view treated extensively in Walsh (2007)<sup>5</sup>. In other words, for narrative understanding it matters very little whether the story relates real facts, but it matters a great deal how it is told and how we make sense of that telling.

If Grice's four maxims are examined in detail, it becomes clear that the notion of relevance is central to all of them. The flouting of the maxims produces implicatures precisely because some utterances appear irrelevant in a given context. Some linguists have therefore argued that the *maxim of relation* (be relevant) overrides Grice's other maxims. Sperber and Wilson's (1995) *relevance theory* replaces Grice's cooperative principle with the *principle of relevance*<sup>6</sup>. The degree of relevance of a communicated sentence or text depends on two factors: context and processing effort. The optimally relevant interpretation, as defined by Sperber and Wilson, will be the one that is least costly in terms of processing effort and most extensive in the range of its cognitive and contextual effects (Sperber and Wilson, 1995, p. 125). Relevance theory rightly claims to account more satisfactorily for a wider range of communication than much other modern pragmatics, because it offers a psychologically valid account of the mechanisms involved in language understanding. What makes this account psychologically realistic is its acceptance that the two critical notions for relevance, context and processing effort, are psychologically motivated: they reflect each participant's individual and subjective assumptions about the world and the given context, not some objective, represented and pre-given version of it. Relevance theory also emphasizes the importance of motivation, that is, of identifying the communicator's intention, for meaning construal. At the same time, a fundamental problem for relevance theory with respect to narrative understanding is, again, its failure to consider the relational nature of that process, or, in other words, its omission of the interactional aspect.
In assuming a single, optimally relevant and complete interpretation for all readers and all readings, relevance theory thus fails to account for the interactive, dynamic, and changeable processes of meaning construal in which different readers, or even the same reader at different times and in different contexts, engage<sup>7</sup>.

Although pragmatic theory can account for aspects of narrative understanding along the lines described above, it has not been widely applied to narratives for that specific purpose. Where it has been so applied, it has mainly been under the rubric of rhetoric. One of the best examples is the very influential *Rhetoric of Fiction* (Booth, 1961), in which the novel, and by extension any literary narrative, is conceived as a rhetorical act of "telling." Booth's undeniable contribution to narrative understanding consists in elaborating on the relations that exist in the narrative communicative act, and specifically among its participants, which will be discussed in detail below. Booth's own later work (1988) develops a more interactive understanding of how readers communicate with books through his metaphor of books as friends, who can either help or harm us, thus introducing an ethical dimension to the act of communication. Other attempts include the rhetorically oriented work of Phelan (1996) and Rabinowitz (1977), both of whom also emphasize not just a communicative but an ethical dimension in the rhetorical act that each narrative telling and reception constitutes. A further step in literary pragmatics is to understand fictionality itself as a specific rhetorical stance, as Walsh (2007) does. His position is that fictionality should be seen as a problem not of truthfulness but of relevance (Walsh, 2007, p. 30), and that each narrative interpretation is ultimately a matter of how we resolve the question of relevance: why a certain text is worthy of attention, interpretation or evaluation for any given reader.

# **NARRATIVE COMMUNICATION: THE PARTICIPANTS**

It is reasonably clear why a conversational narrative can be seen as a communicative act similar to other verbal exchanges, such as an ordinary conversation, a public speech or a letter. For that reason, text-external approaches to narrative understanding have assumed that the standard for all narratives is a naturally occurring conversational narrative. Yet it is also clear that the communicative context of a fictional narrative can be very different. For a start, any novel is a much more complex and deliberately crafted linguistic artifact than a story told at the dinner table. Secondly, the presumed intention of a writer is not available or knowable in the same way as that of a conversational participant. In early forms of practical literary criticism, the interpretation of texts was sought with the help of biographical or historical data on the author's life, an approach later deemed flawed and exposed as "the Intentional Fallacy" (Wimsatt and Beardsley, 1946). What followed was a more sophisticated view of what constitutes authorial intention in narrative, acknowledging that readers rely not on any actual or explicit statements of intention but rather on the indubitable assumption of intention contained in every text, a view that underlies, as I have suggested earlier, how we understand any human action.

The role of the agent(s) in any form of literary communication has been controversial and has not been resolved definitively. The main disagreements concern the levels of communication in a narrative, of which there are thought to be two, although a hybrid third cross-category has been a main concern for all kinds of theoretical and practical approaches to narrative understanding. As Genette has put it, "a narrative of fiction is produced fictively by its narrator and actually by its real author" (Genette, 1988, p. 139). Yet, in the absence of a real person talking, another agent has been proposed: a textually implied narrator or author, who leaves a mark of his or her presence on the text in the shape of its specific norms and choices<sup>8</sup>. The concept of the *implied author*, introduced by Booth (1961), can thus be seen to describe a text's assumed intention: an assumed agency necessarily employed when interpreting a text. The concept is therefore seen not as a simple prop in the reading process but as an indispensable function of the interpretative process itself, an analytical position that every reader anticipates and fills. The controversy concerns whether the concept stands for some form of imagined, anthropomorphized entity or for a textual process, with most opinion favoring the position that the implied author is not a presence but a textual projection of the reader's own interpretative strategies. Finally, the intra-narrative level of a novel is where communication takes place between a narrator, who tells the story, and a narratee, who may or may not be explicitly mentioned. The main point I would like to make here is that, whatever we call it, the reader constructs some kind of conversational participant in the process of reading, a mediating consciousness between herself and the reported events. That participant is, as Bortolussi and Dixon suggest, not an abstract or logical characteristic of the text, but a mental representation in the mind of each reader (Bortolussi and Dixon, 2003, p. 72).
The narrator is a fictional, yet psychologically real, enunciating instance of an act of telling, and telling is, in my view, a form of interaction. The model I propose below treats the textual presence and the anthropomorphic presence of a teller not as mutually exclusive aspects of the reading process, but as constituents of the reader's co-construction of meaning in a text.

<sup>5</sup>In much of the psychological work on discourse processing, the understanding of texts is also seen as a form of communication. This work has sought to establish how the reader is able to build and maintain a mental representation of the text world and all the actions and characters it contains (see Van Dijk and Kintsch, 1983). What these models assume, however, is a unique and unambiguous message that is encoded in the text and then decoded by any competent reader in essentially the same way. This is a very problematic assumption, for reasons that will be discussed below.

<sup>6</sup>The definition of the principle states that "[e]very act of ostensive (i.e., mutually manifestly intentional) communication communicates the presumption of its own optimal relevance" (Sperber and Wilson, 1995, p. 158).

<sup>7</sup>As a general criticism of speech act theory and other pragmatic theories of interpretation, it can be said that they are, in the words of Linell (2005), "monologic" approaches to language use. This means that they fully embrace the information-processing model of cognition, the simple transfer model of communication, and the code model of language, proposed as far back as Jakobson (1960).

I adopt the narrator in a literary act of communication as the main participant interacting with a reader for a number of reasons. First, in naturally occurring conversational narratives, there is always a speaker. Second, literary narratives from the Homeric epic to the realist novel and beyond have a more or less explicit and sustained enunciating instance that manipulates what we get to know and how we get to know it. Indeed, for many theorists the presence of a narrator constitutes a defining feature of verbal narrative, much as a film is assumed to be shot through a camera held and manipulated by a real person. In natural narratives or nonfictional discourse, the author of the discourse speaks in his or her own voice, while in fictional narratives what is said is attributed to the speaking "voice" of the text itself and originates with the narrator, an entity separate from the actual author (Bortolussi and Dixon, 2003; Mellmann, 2010). This is because both the implied author and the narrator are identified in relation to individual texts rather than as a compiled entity based on many texts, which distinguishes them from the real author. Similar descriptions include Abbott's (2002, p. 77) and Chatman's (1990, p. 77) "inferred author," and Eco's (1990) "model author." I therefore hypothesize that a narrator, assumed to have agency, intentionality and a physical perspective, is a participant in any narrative interaction with a reader<sup>9</sup>. If readers assume the existence of a conversational participant who is the agent responsible for the text, then literary interpretation is an intersubjective process of sense-making and will reflect each individual reader's distinct construction of that agent's stance.
In some forms of fictional narrative, such as first-person autobiographical fiction, there may be significant overlap between the historical author and the narrator, a fact that nevertheless does not detract from the importance of the distinction itself. What is being emphasized here is that, rather than being an "anthropomorphic fallacy," as suggested by Bortolussi and Dixon (2003, p. 174), that participant is a real psychological effect of interactive language processing, a symptom of the eminently social character of human interaction<sup>10</sup>. A recent neuroimaging study lends support to this human tendency, showing that silent reading of direct speech, compared with indirect speech, elicits greater activity in voice-selective areas of the auditory cortex (Yao et al., 2011). The ubiquity of narrators in verbal narratives should be understood not simply as a linguistic convention or a mere form of linguistic construction (for this view see Dancygier, 2012), but as a natural disposition grounded in the inherent intersubjectivity of human minds.

Because the narrator is ultimately a mental construction, theorists have treated the concept in varied ways. It has been called a voice (Bal, 1985), a narrating agent (Rimmon-Kenan, 1983), a narrative position (Toolan, 1988), or some other form of inferential construction on the part of the reader (Fludernik, 1993). I suggest that the presence of a narrator underlies a specific functional feature of narrative already mentioned, namely, that the goal of narrative is not primarily informative but interactive. Narratives do not just recount general experience; they make it specific, thereby evaluating it (Polanyi, 1981), and showing that it has a point worth sharing (Labov, 2003)<sup>11</sup>. If we accept that every text has a speaker and that in understanding we interact with that speaker, the problem is resolved, because the interactive process is situated not textually but contextually. A potential problem for establishing the narrator as the main participant in the interactive process is that some narrators are seen as "unreliable," that is, as narrators whose rendering of the story the reader has reason to suspect (Rimmon-Kenan, 1983, p. 100). From my perspective, it is important to understand that readers will employ whatever knowledge they have, or may gain from the narrative, in order to make sense of it, even if they suspect inconsistencies in the narrator's version of events. This is because the inconsistencies are there to be discovered, played with, and perhaps ultimately resolved (or not), all of which happens in the process of reading and sense-making.

<sup>8</sup>For a detailed examination of the history of the concept and its critical reception, see Kindt and Müller (2006).

<sup>9</sup>In their comments, two anonymous reviewers raised the objection that ultimately the only minded participant in an intersubjective encounter with the reader is the real author. As I will argue below, narrative enaction is likely to depend on types of narrator as well as many other linguistic factors. Whether and how readers respond to these types of narrators is, however, a largely unexplored empirical question, although some initial results will be discussed in the section on empirical data. My point is that the presence of a narrator unifies and shapes the reader's response in specific ways, depending on how this imaginary participant is construed. Readers may respond differently to narrators who are named or are part of the story in some explicit way (e.g., homodiegetic narrators in Genette's (1980) typology), as opposed to third-person heterodiegetic ones.

<sup>10</sup>For a similar view on the need for the narrator, see Mellmann (2010). For the opposite view, see Walsh (2007). For the view that certain types of narrative with no explicit linguistic traces of a narrator, such as third-person narration or narration in free indirect discourse, have no speakers, see Hamburger (1973) and Banfield (1982).

## **ENACTIVE SOCIAL COGNITIVE SCIENCE**

Enactive approaches to human cognition foreground the social and intersubjective nature of human understanding. The term "enactive approach" to mind and life is used here in the sense initially proposed by Varela et al. (1991) and subsequently developed in Thompson (2007), Stewart et al. (2010) and Di Paolo and De Jaegher (2012). The most important contribution of this approach to research on social cognition, where I situate narrative understanding, is the notion of *participatory sense-making* (De Jaegher and Di Paolo, 2007). This notion breaks with long-standing assumptions about hidden intentions in individual minds, as well as with dominant mentalistic views of how we understand others, such as "theory of mind" (Baron-Cohen, 1995). The notion of participatory sense-making captures the idea that social interactions are dynamic, unexpected, and to some extent unpredictable, hence emergent. As I have tried to demonstrate, accounts of the cognitive processes involved in literary reception have closely followed what has been assumed to constitute social cognition (albeit only in relation to language processing), as, for example, in linguistic pragmatics or discourse studies. Recently, there have been explicit attempts to describe literary interpretation as mind-reading, in which reading and making sense of fiction is seen as a pleasurable exercise of our theory of mind (Zunshine, 2006). The problem with these approaches, as I see it, lies precisely in the mentalistic slant they promote. While Palmer (2004) makes a more decisive turn toward exploring the socially situated nature of characters' minds, the social and public nature of mind is still used there in an observer-like way, to make sense of characters' actions and emotions, and not as framing an interactive engagement with a reader.
As Di Paolo and De Jaegher put it, mentalizing, or reasoning about the supposed mental states of others, is a legitimate cognitive process, but not one that is always or generally at play (Di Paolo and De Jaegher, 2012, p. 2). Moreover, the view that the "shared mind" is primary has a long history, evidenced in the work of thinkers from distinct traditions such as phenomenology (Merleau-Ponty, 1945), sociocultural psychology (Vygotsky, 1978), analytic philosophy (Hutto, 2004), developmental psychology (Trevarthen, 1979; Hobson, 2004), and, more recently, linguistics and cognitive semiotics (Zlatev, 2005; Zlatev et al., 2008). The enactive view of human cognition, broadly comparable to what some theorists call "intersubjectivity" (Zlatev et al., 2008), offers a markedly different account of how we understand other people from that of theory of mind positions. It denies that human mental states are simply private or solipsistic in the first instance, and only subsequently, through inference or simulation, projected onto others so that we can know what they are thinking. The claim is that, in some basic sense, the mental states through which we engage with others (beliefs, intentions, attentional states, and even emotions) are fundamentally intersubjective.

For theory of mind approaches, there are two ways in which these processes of understanding others are assumed to work: either through some form of information processing reliant on innate computational modules of "intention detection," "shared attention mechanism," etc. (Baron-Cohen, 1995), or through unconscious simulation of the intentions or feelings of another (Goldman, 2006). The implausibility and shortcomings of the former have been duly criticized by Gallagher (2008) in favor of "direct perception," in which the developing human subject engages with others without any need for complex mentalizing. With respect to the latter, it is instructive to consider Di Paolo and De Jaegher's (2012) assessment of the sub-personal neural mechanisms (such as mirror neurons) that simulation theorists promote as the substrate underlying social cognition. Rather than seeing mirror mechanisms as causally responsible for social cognition (the dominant view), Di Paolo and De Jaegher very plausibly suggest that it may in fact be interactive social experience that produces the mirror functions and imitative actions observed in human subjects. This reversal importantly draws attention to the fact that sub-personal neural mechanisms may be necessary but not sufficient for social understanding, thus marking a crucial distinction between the two. The inherent plasticity and malleability of the mirror neuron system in humans also suggests that social interactions play at least an enabling role in the development of these mechanisms (Di Paolo and De Jaegher, 2012).

## **NARRATIVE ENACTION: CHANGING THE ASSUMPTIONS OF NARRATIVE UNDERSTANDING**

It is important to consider the implications of enactive cognitive science for social cognition against the background of embodied cognitive science as a whole. Much recent work in cognitive linguistics (Johnson, 1987; Lakoff and Johnson, 1999; Hampe, 2005) has assumed that meaning is grounded in sensorimotor experience, but this experience is commonly framed as unconscious cognitive processing (as in Lakoff and Johnson's "cognitive unconscious"), basic motor schemas (Mandler, 2004; Hampe, 2005) or neural activations (Gallese and Lakoff, 2005). This framing deliberately blurs the distinction between conscious experience and the sub-personal neural processes that may ultimately ground embodied experience but are not equivalent to it. Barsalou's (1999) work on perceptual symbol systems, innovative as it was in rejecting a separate abstract level of conceptual representation, also carries the mentalistic torch in equating concepts with modality-specific neural activations, thus bypassing the issue of conscious conceptual knowledge and the social nature of its linguistic realization. Despite claims to the contrary, a description of language as an essentially private, intramental phenomenon shared between people solely on the basis of their common embodiment, as currently promoted in nearly all research in cognitive linguistics, is the old mentalistic view in new dress. Linguistic knowledge can never be private, as Wittgenstein (1953) noted long ago, and cannot be reduced to what goes on in individual minds or brains. The interactive nature of linguistic encounters is not addressed satisfactorily in the theory of "conceptual blending" (Fauconnier and Turner, 2002), which notes the dynamic aspect of meaning construal but again describes human cognitive processes as subconscious acts of "blending" together various elements (concepts, frames, whole scenarios) to produce new and emergent linguistic meanings. Needless to say, none of these developments in the cognitive science of language attends to the intentional, relational, and participatory emergence of meaning among conscious subjects who share a language.

<sup>11</sup>It is of interest to note that the concept of the narrator has been largely ignored in studies of discourse processing. In more recent cognitive narratology, the issue of intention has resurfaced with the notion of "the intentional stance," used by Herman (2008) to account not only for what he calls "an innate tendency to read for intentions" (p. 240) in narrative practice, but also to argue that it is narrative practice itself that gives rise to this human tendency to ascribe intentionality. Herman proposes that the problem of whose intention is communicated in a narrative can be resolved by treating it as a "structure of know-how" in a more general process of folk-psychological reasoning, a point to which I will return below when discussing his views on how narratives mean.

My situating of the study of narrative understanding within an enactive view of human cognition grows out of a deep dissatisfaction with various models of literary cognition, discussed above, that have treated narratives as texts to be interpreted, without broader consideration of how cognition is enacted. Although there are many books on cognition and narrative (Turner, 1996; Herman, 2002; Dancygier, 2012), my proposal here aims at a more radical turn in the cognitive study of literature by firmly situating narrative study as a form of enactive cognition<sup>12</sup>. One of the main points that I am making throughout this paper is that stories are not static or inert cultural artifacts; they are expressions of intersubjective meaningful action and participatory sense-making between tellers (narrators) and readers. In other words, they are interactive processes in their own right, as opposed to formal structures (as assumed in structuralist narratology) or individualistic (monologic) processes of reader interpretation (as taken up in discourse studies or pragmatic theories of communication).

To bring the discussion back to narrative understanding, and specifically narrative understanding achieved through the medium of language, we need to address again the nature of linguistic meaning, this time taking into account the enactive view, as introduced above, and exploring its implications for language. In particular, it is important to look at how the inevitability of co-evolving meaning change in any linguistic encounter can modify long-entrenched ideas about language and its nature. As shown above, traditional forms of linguistics adopt the same ontological assumption about meaning as traditional computational approaches to thought processes, namely that it is possible to analyze the world in terms of context-free data. In relation to language, this view is summed up in semantic descriptions of linguistic units as sets of fixed and independent elements, termed concepts or symbols. Pragmatics, as I have shown, attempts to overcome the shortcomings of this description by postulating various contextually implied meanings, but it still suffers from the assumption of a transfer model of communication between individual minds, and the accompanying assumption of fixed, predetermined meanings that require decoding. For that reason, in some accounts written and spoken language have been treated as two distinct modes of language behavior (Chafe, 1994), the former characterized as a formal system of symbols and rules, the latter as the pragmatic use of these forms and rules in everyday speech.

This polarized view of essentially two kinds of language has been shown to be a misrepresentation and a simplification of how language works, termed "the written language bias in linguistics" (Linell, 2005). Similar views with respect to the language sciences and linguistics in general have been voiced before by Harris (1981, 1996), who suggested that linguists do not describe "real language" but fabricated, "mythical" forms of it that do not match the reality of language use. More recently, Linell (2009) has argued strongly that the dominant view in linguistics of language as a system of abstract symbols and rules that somehow get transmitted and decoded between individual minds in communication is insufficient to account for the dialogic nature of actual linguistic exchanges. He has proposed instead a view whereby the action-oriented aspects of language are given priority, and he has named this process "languaging," as opposed to the original pragmatic term "language use" (Linell, 2009, p. 274). The latter, according to him, still promotes the abstract mental nature of language, which is then seen as secondarily, and perhaps only peripherally, being put to use in a given context. The process of "languaging," on the other hand, highlights the active, spatially and temporally situated, and interactive nature of how we speak to each other. It draws attention to the fact that meanings in language are made and not simply retrieved. It connects with the enactive view of human cognition in its recognition of the fundamentally social and co-authored nature of human meaning-making, and gives it a description unavailable in more traditional linguistic theories. A basic question concerns whether speech and writing are ultimately different, in that writing is assumed to be more complete, rigid and final, thereby restricting any potential interactive dynamics present in talk-in-interaction.
The point I am making here is that when we read written narratives we enact them; we invest them with a speaker whom we treat as a conversational participant, we become willing partakers in their worlds, but they also become part of ours. Narratives constitute both interventions in our sense-making powers as readers and, reciprocally, the dynamic constructs of the intervention itself. It is simply not true to say that narrative enaction happens in one direction only, from a text to a reader. Yes, we have all felt the unmistakable pull of a book or a film, when hours, even days and months, after reading a story, a given character, a scene, or a moment stays with us to the extent that we cannot push it away. We have all experienced the inability to put a book down despite various urgent demands on our time. How does a story achieve this high level of communion with a reader? How is this possible and, more importantly, why are these processes so specific to our individual sensibilities, if we take stories to be autonomous and self-contained worlds? I argue that they are not. When we read, we re-create a situation, a moment, an act in order to understand it. This understanding is shared, yet also personal and dependent on many factors such as gender, knowledge, verbal expertise, and experience, among others. Borrowing the words of the poet Antonio Machado, Varela described enaction as the laying down of a path in walking: "Wanderer the road is your footsteps, nothing else; you lay down a path in walking" (Varela, 1987, quoted in Thompson, 2007, p. 13). I would like to use the same metaphor to describe the process of literary reading: each one of us lays down a path when we experience a meaningful encounter with a story. That path is and stays our own, although it may change on subsequent encounters with the same text. This uniquely subjective and experiential process that literary fiction engenders goes some way toward explaining the overwhelming multiplicity of interpretations that people come up with, and the consequent disagreements over literary meanings that have troubled the study of literature. This need not be considered the disciplinary disadvantage that it has been taken to be, as I will argue below.

<sup>12</sup>I acknowledge that there have recently been attempts to develop models of literary narrative understanding that also use some form of enactive cognitive science to substantiate their claims, such as Herman (2008) and Caracciolo (2012a,b, 2013). How these valuable hypotheses relate to the one proposed here will be taken up in the next section.

Participatory sense-making, as proposed by De Jaegher and Di Paolo (2007), attends to two factors, individual cognition and interaction, neither of which, on its own, is sufficient to account for the relational dynamics of social cognition. In the context of literary narratives this means that as readers we share in the narrating, moment by moment, of the unfolding events. Maintaining patterns of coordination, but also breakdowns of coordination and recovery, are all part of participatory sense-making. I see literary narrative understanding as such a process of participation. Conflicts are possible and in fact often necessary when a particular prediction we make as readers turns out to be wrong. Narrative emotions such as curiosity, surprise, and suspense are indeed the result of such continual conflict between a reader's causal construal, through trial and error, and the unfolding narrative dynamics<sup>13</sup>. The main avenue for coordination between reader and teller in a narrative is thus temporal dynamics: flash-forwards and flashbacks in the sequence of events, the rapid tempo of a summary vs. the slowness of a scene, and techniques like showing and telling are all temporal displacements, epistemological consequences of the proximal or distal self-positioning of a narrator. A literary story, much more than the stories we tell daily, relies on how the telling selects and arranges what is told, which the reader enacts in sense-making. This is rarely a linear process; it leaves gaps, ambiguities, rival perspectives, and often unresolved open-endedness. Examining the interactive possibilities of telling, of mediacy in literary narratives, most commonly studied in terms of temporal/perspectival dynamics and grammatically realized through the categories of *tense, aspect*, and *aktionsart*, thus provides a way to put linguistic function and the sense-making processes of the reader side by side.
Textual features and aspects of narration, which can be studied systematically, can then be correlated with observed responses.

What I argue further is that the interactive potential of written narratives is not diminished by the nature of our encounter with them, i.e., as written texts. Linguistic choices do channel this encounter and guide the interactive process through various means, as suggested. But these are not grammatical choices only. When we enact a narratorial viewpoint, it is not because the narrator is a mere linguistic construction or a discourse feature that we decode, but because we experience it as a meaningful participatory act between ourselves and the teller. The main underlying assumption behind my claims is that the language of fiction neither simply reflects nor describes an objective reality for the reader to recreate, but is very much an instrument in the co-creation, or, to put it in enactive terms, the bringing forth, of that reality. If we accept, as I do, that narrative presupposes intentional directedness, a "grasping together," which involves causality, as phenomenological narrative theorists like Mink (1978) or Ricoeur (1985) suggest, then we can say that the sense-making processes we engage in will result in a relational reshaping of that causally shaped grasping for each reader, a sense of change, of an alteration of experience<sup>14</sup>. This happens because so much of the experiential world of the story becomes the reader's own world.

## **NARRATIVE ENACTION AND PARTICIPATORY SENSE-MAKING**

The enactive approach to social cognition has not been applied to literary reading in the form suggested here, although there exist a number of previous considerations which, despite using different terminology and having very different ends in mind, can be evaluated for the relational aspect of literary reading that they highlight. I examine some of these suggestions here and evaluate them in relation to the enactive view I propose, beginning with older theories and finishing with some recent ones that have relied on enactivism for their models. A theoretical focus on the reader is historically associated with the Constance School in Germany, where hermeneutics (in the case of Jauss, 1982) and phenomenology (in the case of Iser, 1978) were used to produce largely theoretical accounts of readers' contribution to textual meaning. Reception theory, as these models are known, produced some valuable contributions that can be seen as relational in the sense of enactive cognitive science. Participation is definitional to the notion of "textual gaps," or points of indeterminacy in any text (Iser, 1978), which the reader needs to fill. Literary texts have more gaps than other forms of communication and hence require more active participation. For Iser, literature is markedly different from other forms of language encounters because literary texts do not represent the real and known world but generate fictive worlds, which are completed in distinct ways by the reader (Iser, 1978, pp. 23–27). His main point is that textual structures, or what he calls "textual perspectives" embedded in the text, in some sense control reader response, so that there are always certain limits imposed on reception processes. One significant problem for this theory is that Iser never attempted to connect his view of the reception process with empirical work on real readers. At the same time, it is clear that for Iser textual meanings are understood as potentially "given" in the text and then jointly realized through the reader's involvement.

<sup>13</sup>See Sternberg's (1978) account of narrative dynamics, based specifically on these three narrative emotions.

<sup>14</sup>In relation to everyday storytelling a similar claim has been developed under the name of "the narrative practice hypothesis" (Hutto, 2007). The proposal is that folk-psychological understanding of other people normally arises as an effect of storytelling practices, with the support of others. Reasons for acting thus become familiar to children through explanation, linking beliefs, desires and outcomes in social scenarios. The problem with this is that beliefs and attitudes are, more commonly, aspects of the way agents reflect, *post hoc*, on their own or others' activity. While these are verifiable in everyday contexts, explanation becomes problematic in the context of fiction.

Understanding the text as unfinished, as a potentiality, as a "virtual reality" has been part of other treatments of literature that can be seen as a starting point for an enactive study of narrative texts. Ryan (2001) speaks of *immersion* in narrative worlds, Gerrig (1993) uses the metaphor of *transportation* to describe what takes place in the mind of the reader, and Nell (1988) speaks of *entrancement* or *being lost in a book*. While these theories capture some of the reader's involvement, they still present the reader's encounter with a book as largely passive. In the analytic tradition Walton (1990) has proposed a representational theory of art, where books (and other art forms) are understood as props that prescribe and guide specific imaginings, similarly to the way children use toys to participate in games of pretense. I think that the notion of participation is already contained in Walton's view of texts as props. Given the inherently ecological meaning of props, an interesting question would be to explore types of text in relation to the "ease of use" of those props. In comparison with the enactive view, Walton's is still a mentalistic view, in which imagination is understood as an intramental imaginary experience rather than an interactive one. More recent views from the philosophy of aesthetics and cognitive science speak more openly of mental simulation as an important part of the reading process (Currie, 1995; Currie and Ravenscroft, 2002). Simulation is understood here as the automatic mental mimicry of a specific experience attributed to another (Goldman, 2006), hence as resulting from the sub-personal mirroring processes that simulation theories rest on. It was argued above that simulation theories of understanding other people have serious problems, which an enactive view of social cognition tries to address.
On that basis, applying simulation theories to understanding fictional minds is also problematic. Perhaps closest to the view I am proposing is Ryan's (2001) discussion of "spatio-temporal immersion" in narrative and its connection to specific linguistic forms. Ryan rightly assumes that the reader's participation somehow relates to degrees of self-involvement (Ryan, 2001, p. 98), but these are not systematically correlated with specific textual features, and the possible dependencies remain unexplored. Ryan adopts an (unacknowledged) embodied and enactive view of making sense of a narrative when she speaks of the reader's "virtual body" inhabiting the narrative world and adopting certain perspectives; prospective vs. retrospective narration, the use of the present tense, etc., are all taken to be specific narrative strategies for reader immersion (Ryan, 2001, pp. 133–134). It is relevant to point out here that postulating interaction, as in my proposal, instead of mental simulation avoids some of the difficulties faced by immersion/simulation theories.

In more recent work a prominent narratologist (Herman, 2008) has proposed an understanding of texts as a form of joint attentional engagement with artifacts. This proposal is enactive to the extent that it assumes some form of narrative intentionality which is realized not internally, as a hidden mental object to be communicated, but in the form of practical know-how whereby textual cues, for example deictic shifts, are seen as prompts (affordances) for construing meaning. While very much in agreement with the general enactive standpoint that Herman takes, I have two main reservations about this formulation. First, the accepted view in ecological psychology is that affordances are dispositional properties of physical objects<sup>15</sup>. Describing texts as providing affordances for interaction with an interpreter is therefore a form of sensorimotor enactivism (Hutto and Myin, 2013), more suited to explanations of practical knowledge than of social interaction. I am not sure to what extent Herman takes texts to provide affordances metaphorically (at one point he compares textual designs with a coffee machine's built-in activity structure for making coffee; p. 256). If taken literally, the proposal raises a second objection, in that affordances are understood here as inherent properties of texts which somehow tell us directly what to do with them, leaving the laborious and temporal process of sense-making unattended to. Yet, as I have argued before, textual understanding is a dynamic process unfolding in time, going through rhythms of coordination, breakdown and recovery, which often does not end with a story's conclusion. The key to literary understanding, I argue, is a deliberate process of sense-making, reliant on conscious modification and regulation between intentional agents (real or imaginary), and hence necessitating prolonged attention and also something akin to what Tomasello (2014) very recently described as "shared intentionality."
In other words, it is not the structure of narratives, or language, or culture *per se* that generates intersubjective understanding, but the inherently socially recursive and "shared" mind that sets this process in motion (see also Di Paolo and De Jaegher, 2012). Agency is prior to action, and literary interpretation is continually created by readers not in the form of reproduced textual patterns (plot or structure), nor of passive automatic dispositions and affordances, but as shared agency, as a constant attunement to the assumed agency of another.

Another recent view, proposed by Caracciolo (2012a), already moves beyond Herman's view of textual cues as affordances, toward something closer to what I am proposing here. While elsewhere the author has maintained that in understanding fiction the reader simulates a fictional consciousness, most commonly the one(s) to which the text gives direct access (Caracciolo, 2013), here he sees narrative understanding as a dialog between author and reader, a form of shared experientiality. Despite relying on the notion of joint attention and Dennett's intentional stance (as does Herman), Caracciolo takes an implicit step toward interaction when he claims that authors and readers experience a story in essentially similar ways (p. 198)<sup>16</sup>. Where he differs from my proposal is in his separation between experientiality (what he calls "the intentional level"), mainly seen as embodied, nonconceptual knowledge constituting the common ground between agents in a narrative situation, and higher-order, narratively constituted interpretations, which he sees as essentially distinct from the former. The shared reality of a created storyworld is thus taken to be based solely on the shared embodiment and shared cultural practices of the participants, and not, as I am proposing, on the shared intention of a participatory process of sense-making between individual agencies. As I argued above, joint attention is born in collaborative activity, that is, in shared intentionality, not just in sub-personal, shared embodiment.

<sup>15</sup>The Gibsonian sense of *affordances* (Gibson, 1979) describes an organism's perception/action in terms of the opportunities arising from its interaction with an environment. *Affordances* are bundles not of qualitative data, but of immediately given motor information which facilitates perception and practical action (p. 134).

## **NARRATIVE ENACTION: CURRENT EMPIRICAL DATA AND FUTURE POSSIBILITIES**

It is part of my proposal to emphasize that work done in the field of empirical studies of literature bears directly on the enactive view, as developed here. In this section I discuss the empirical possibilities of that approach, both with respect to current findings and future research. The empirical study of literature, the examination of real, as opposed to hypothetical, acts of reading, is where much of what has been discussed above can be validated. As an experimental activity, the empirical study of literature relies on the methods and assumptions used in psychology and discourse studies. Historically, it has been a willfully neglected field, especially given the large theoretical body of work dealing with literary meaning, as shown in the previous discussion. It is of great interest to my current proposal that some form of participatory understanding of the processes of literary reception can be found precisely among practitioners of the empirical study of literature (Bortolussi and Dixon, 2003; Miall, 2006). Bortolussi and Dixon propose an approach that they term "psychonarratology," where textual features are examined in close correlation with readers' interpretive constructions in the context of a specific reading (Bortolussi and Dixon, 2003). Miall and Kuiken (1994) and Miall (2006) investigate how specific features of the language of texts (imagery, alliteration, meter, syntactic inversion, etc.) influence meaning creation by readers.

The first main issue in empirical studies is a question of research design: how best to study a given text. Discourse studies have traditionally examined questions of inference in a text: from causal connections between narrative events, to processing of anaphoric expressions, to textual cohesion, and other text properties. This type of research uses short, simplified narratives, which greatly limits the scope and usefulness of its findings, since it assumes that all texts, regardless of complexity, make the same demands on a reader. When real texts are the subject of experimental research, researchers have a number of options. The most promising one for participatory sense-making is to manipulate particular aspects of a literary text, thus isolating a specific effect, and then to compare the reception of the altered text with that of the original. If we accept the hypothesis that a reader enacts a particular narratorial consciousness, there are aspects of how the narrator is presented in a text that are immediate candidates for such empirical work. For example, first-person, third-person, omniscient, and figural narration need to be examined with respect to ease of comprehension and/or aesthetic judgment (value). Another outstanding empirical question is whether readers consciously differentiate between such types of narrators and, if they do, how this influences the sense-making process. Consciousness in a novel is displaced from the situation of telling in either time (reporting the past or the future) or person (type of narrator), and these displacements correlate with specific sense-making strategies. In conversational narratives, for instance, story peaks occur in the present tense, and the use of the present in a literary narrative becomes a linguistic signal of immediacy vs. displacement (Chafe, 1994).
Second, the long-standing discussion in narratology of the two main narrative rendering techniques, *showing* and *telling* (Genette, 1980), needs to be evaluated for the same effects. Manipulating texts with these specific features will provide ways to understand how the positioning of the narrator relative to the narrated events (proximal, in showing; distal, in telling) affects sense-making. Again, I emphasize that in narrative, grammatical features such as tenses are not just forms that correspond to divisions into past, present and future, but also signals that control how some information is to be enacted. Narrating from a particular spatio-temporal or personal/vicarious viewpoint creates for the reader an experiential stance for participation in the storyworld. Third, the main narrative situations pertaining to any narrative sense-making consist of the narratological categories of person (does the narrator belong or not to the narrative world); distance (does the narrator adopt a retrospective or synchronous temporal position); and perspective (does the narrator present an inside view of events and characters, an external one, or both) (Genette, 1980; Stanzel, 1984). The variations that these combinations provide work toward establishing degrees of availability of the narrative worlds that we inhabit as readers: as a reader I cannot conceive of an imaginary world in which I am not present. But they also serve the purpose of a reader's intersubjective alignment with the narrating consciousness of the story.

Various aspects of reader involvement have made it into the experimental designs of empirical studies. For example, Bortolussi and Dixon have studied the degree to which readers identify with a narrator depending on implicitly and explicitly given knowledge about the narrator's actions. They manipulated a text excerpt to make it more explicit about the narrator's purpose, creating two conditions: the original text and an altered text. They predicted that when the reader has to work more, as in reading the original passage, there will be more identification, and more opportunity to attribute their own experience to the narrator. The results confirmed that even though the explicit altered versions provided more information, the readers saw the narrator as easier to understand in the original version. Miall's (2006) approach is also strongly consistent with the proposal of participatory sense-making. By studying "literariness" or "foregrounding," which originates with formalist views and is traditionally associated with text-specific formal qualities such as metaphor or alliteration, Miall shows it to be a manifestation of the special, enhanced nature of the interaction processes between reader and text. Literary narratives have a "dehabituating" role to play in human cognition, which means they invite us to consider frames for thought and feeling that are novel or unfamiliar (Miall, 2006, p. 3), and hence more demanding. Importantly for the discussion here, dehabituation is an interactive process initiated by language forms in literary reading, but experientially correlated with heightened attentional or aesthetic states in readers that can be experimentally verified. Finally, Miall's approach points to a need to study not just how readers interpret texts but how they experience literary works, a requirement which, importantly, includes considerations of feeling. While most theoretical and empirical work on narrative engages the issue of interpretation, an important question that remains largely unaddressed is what kind of experience reading brings, and the answer is emotive experience. Empirical findings about self-implication during reading (Larsen and Seilman, 1989) show that readers of literary texts draw more on active personal experience. Such results may not only be a validation of the enactive view but also a way to define what is distinctive about literature as a sense-making process.

<sup>16</sup>In another paper (Caracciolo, 2012b), the author also suggests looking at narrative interpretation as a "joint process of sense-making."

## **CONCLUSION**

The theoretical and practical study of literary narratives has produced multiple and often contradictory ways of explaining their structure, function, and meaning. Despite this prolonged scrutiny, there is currently no consensus as to what narratives are and why people find them both engaging and uniquely suited for expressing aspects of human experience. I have argued that stories do not happen in individual minds, either those of tellers or readers, but in the dynamic interaction between them. Traditional narratology, as well as cognitivist story grammars, has relied on static, abstract structures of text which are assumed to determine readers' understanding through detached mental representations of a story world. A pragmatic communicative understanding of stories, on the other hand, has assumed that both language and the verbal stories that we tell in it are explicable through an information processing model of cognition and a transfer model of communication, both of which have proved insufficient. I have argued that stories are best understood as processes of patterned interaction, prospectively anticipated and retrospectively reflected upon in a participatory sense-making between essentially two participants: a reader and a teller. The teller, to some extent an imaginary participant, is not just a linguistic effect but a manifestation of the irreducibly intersubjective nature of human minds. Literary reading is thus a shared act of participation, moment by moment, in the unfolding action; a process of leading and being led in order to enact an experience. I, as a reader, supply the memories, the imaginings, and the feelings in order to inhabit a world that is not my own until I enact it. A meaningful encounter with a story is thus a participatory act of performance where meaning lies not in words, concepts or events but in the intersubjective spaces they create between the participants.

# **REFERENCES**


Booth, W. (1961). *The Rhetoric of Fiction.* Chicago, IL: University of Chicago Press.


Miall, D. S. (2006). *Literary Reading: Empirical and Theoretical Studies*. New York, NY: Peter Lang.


Wittgenstein, L. (1953). *Philosophical Investigations*. Oxford: Basil Blackwell.


Zunshine, L. (2006). *Why We Read Fiction: Theory of Mind and the Novel.* Columbus, OH: Ohio State University Press.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2014; accepted: 28 July 2014; published online: 22 August 2014.*

*Citation: Popova YB (2014) Narrativity and enaction: the social nature of literary narrative understanding. Front. Psychol. 5:895. doi: 10.3389/fpsyg.2014.00895*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Popova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Voice, (inter-)subjectivity, and real time recurrent interaction

# *Fred Cummins\**

*UCD School of Computer Science and Informatics, University College Dublin, Dublin, Ireland*

## *Edited by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

#### *Reviewed by:*

*Tom Froese, Universidad Nacional Autónoma de México, Mexico; Kristian Tylén, Aarhus University, Denmark; Joanna Rączaszek-Leonardi, University of Warsaw, Poland*

#### *\*Correspondence:*

*Fred Cummins, UCD School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland e-mail: fred.cummins@ucd.ie*

Received approaches to a unified phenomenon called "language" are firmly committed to a Cartesian view of distinct unobservable minds. Questioning this commitment leads us to recognize that the boundaries conventionally separating the linguistic from the non-linguistic can appear arbitrary, omitting much that is regularly present during vocal communication. The thesis is put forward that uttering, or voicing, is a much older phenomenon than the formal structures studied by the linguist, and that the voice has found elaborations and codifications in other domains too, such as in systems of ritual and rite. Voice, it is suggested, necessarily gives rise to a temporally bound subjectivity, whether it is in inner speech (Descartes' "cogito"), in conversation, or in the synchronized utterances of collective speech found in prayer, protest, and sports arenas worldwide. The notion of a fleeting subjective pole tied to dynamically entwined participants who exert reciprocal influence upon each other in real time provides an insightful way to understand notions of common ground, or socially shared cognition. It suggests that the remarkable capacity to construct a shared world that is so characteristic of *Homo sapiens* may be grounded in this ability to become dynamically entangled as seen, e.g., in the centrality of joint attention in human interaction. Empirical evidence of dynamic entanglement in joint speaking is found in behavioral and neuroimaging studies. A convergent theoretical vocabulary is now available in the concept of participatory sense-making, leading to the development of a rich scientific agenda liberated from a stifling metaphysics that obscures, rather than illuminates, the means by which we come to inhabit a shared world.

**Keywords: joint speech, participatory sense-making, intersubjectivity, dynamic entwining, chant**

# **1. INTRODUCTION**

We speak with confidence of something called "language," as if this term referred to a single system, capable of multiple forms of manifestation (writing, speech, signing), but unified by organized structures and processes in the formal domains of phonology, morphology, syntax, and semantics. This emphasis on systematicity and symbolic encoding has utterly dominated the scientific view of "language" at least since the structuralist innovations of Saussure (1959/1916), and has been greatly reinforced by the pivotal role of generative linguistics in the birth of the cognitivist account of mind as a form of symbol-based information processing (Fodor, 1975). In the context of inter-personal communication, language, on this view, serves as a form of message passing, whereby ideas conceived in the mind of one person are encoded, first into words, and then into movements of mouth or hand, at which point they become transmittable to another, who sets about decoding them, thereby gaining access to the ideas of the sender. The message passing perspective on language is compelling, powerful, and supported by a host of technologies, from the very first forms of writing to the most sophisticated of digital platforms.

The emphasis on symbols and systematicity allows the identification of a tentative boundary between the linguistic and the non-linguistic. For example, a conventional distinction is drawn between phonological and non-phonological characteristics of the sounds of speech. Roughly, those features that support the identification of discrete categories such as phonemes are taken as indices of linguistic structure, while non-categorical and continuously varying features, such as the loudness of a voice, would lie beyond the notional bounds of language proper. Once discrete entities belonging to non-overlapping categories are available, they can be combined into larger symbolic structures, from syllables to novels.

Language thus appears to be a clearly delineated and unified phenomenon, of which one can meaningfully construct theories. This leads to a compelling observation that there seems to be a yawning chasm between the many kinds of communication systems found in animals and the generative, creative richness found in every human language. And so the foundations are laid for the perplexing observation that language seems to have appeared not so very long ago in an evolutionary timescale, and to have immediately enabled the development of the whole of human culture, technology, and all the institutions of all societies.

Two related observations will serve to provide us here with a slightly different view of "language." The first is that *the above story is fundamentally committed to an ontological split between mind and world*. If we accept such a split, then meanings or ideas belong firmly in the realm of the mental, and they find expression indifferently in writing or speech, each of which provides a kind of physical container for the passing of ideas from one mind to the next. The second observation is that *the traditional story enforces a somewhat arbitrary divide between the linguistic and non-linguistic*, motivated by the desire to ensure that language is systematic and supports the kind of symbolic operations familiar from syntax and related disciplines. If we observe communication among people, we see many aspects of that behavior that never feature in linguistic theory, and that nevertheless seem to be reliably and essentially associated with inter-personal communication. These two observations are related, because if we consider alternatives to the Cartesian mind/world split that divides ideas and meanings from sounds and movements, the apparent significance of many of the behaviors and features reliably and regularly attending communication may change, and with that, the boundaries of "language" may shift, or, indeed, fragment, to reveal a variety of phenomena that do not admit of a single systematic description.

I will argue that the way in which we conventionally treat of the phenomenon called "language" is overly restrictive, and seems more appropriate to the characterization of writing than speaking/listening (Linell, 2005). Older than writing by far is the voice, and the voice has remarkable properties all of its own. Chief among these is the obligatory association between the voice and a transient subject-pole that grounds intentionality. This, it seems to me, may be part of the reason the inner voice seems to be inextricably associated with the Cartesian subject. To develop this notion, I will turn to the substantive domain of joint, or collective, speaking, showing how collective speech engenders a different kind of subject, displaying collective intentionality. Furthermore, just as the voice of the individual admitted of development and codification in writing, so collective speaking admitted of development and codification in practices of liturgy and ritual. Written language, which is the more accurate target of modern linguistics, is thus not the only descendant of voice. The empirical study of collective speaking is in its infancy, but it reveals emergent phenomena that arise only in the real time reciprocal interaction of speakers speaking in unison. These emergent phenomena add substance to the argument that the traditional depiction of language as message passing mischaracterizes, or omits, much of what is going on in vocal communication (Cowley and Love, 2006). It neglects the fluid intertwining of subjectivities that arises in real time reciprocal interaction, and that appears clearly in joint speaking. This only becomes apparent if we approach languaging (rather than language) as a set of multi-faceted behaviors that defy characterization from a single metaphysical viewpoint<sup>1</sup>.

## **2. REVISITING DESCARTES**

Let us fancifully drop in on Descartes as he deduces his own existence. The statement "Cogito, ergo sum" is without doubt the most famous line in Western Philosophy, and the basic outline of the argument underlying it is overly familiar<sup>2</sup>. A skeptical philosopher, wishing to establish a foundation for true and certain knowledge, recognizes that the world of appearances, mediated by the senses, may be illusory. He considers what remains after denying the testimony of the senses, and reasons thus:

So after considering everything very thoroughly, I must finally conclude that this proposition, *I am, I exist*, is necessarily true whenever it is put forward by me or conceived in my mind (Meditation 2, AT 7:25).

The "I" that is invoked here is explicitly and emphatically not a body, but a mind (7:27). The split between mind and world is absolute. Irrespective of how the consequences are played out, Descartes' certainty has become the split we have failed to distance ourselves from. Substance dualism narrowly conceived is, of course, not a respectable metaphysical position any more, but the split that is effected here between mind and world, and at the same time, between metaphysics and epistemology, far from being overcome, has become the foundational assumption upon which the whole of psychology (and more) has been built. As Sheets-Johnstone put it, it has become "a lexical band-aid covering a 350-year-old wound generated and kept suppurating by a schizoid metaphysics" (Sheets-Johnstone, 1999, p. 275).

But what is going on for Descartes? There is a voice. Whether it is a voice speaking in Latin "Cogito, ergo sum!," or a voice speaking in French "Je suis, j'existe!," it is a (silent) utterance—a thought in the form of words. Without language (better: languaging), there is no such thought. Without a culturally specific history of vocal interaction among people during which meanings and uses of language emerge, there is no such voice. The solipsistic prison of Descartes' fancy is not so devoid of other people as he seems to believe, for in harboring the voice that can utter the "Cogito!," it is populated by the practice of Latin, or the practice of French. Closing the eyes does not keep out the world, and it does not keep out other people.

The inner voice of linguistic thought that speaks here "to" Descartes is not different in kind from the outer voice of overt speech. Indeed, the whole metaphorical quagmire associated with the use of the terms *inner* and *outer* stems from the very confusion I wish here to circumvent. Vygotsky has presented a thorough argument that the overt but self-directed speech of young children is, firstly, a specialization of intersubjective social speech, and secondly, is the precursor to inner speech, or linguistic thought (Vygotsky, 1986). This insight provides us with an understanding of continuity between overt speech and silent speech, or linguistic thought.

What if we choose to interpret Descartes' predicament somewhat differently? Instead of considering the voice as evidence of a pre-existing subject, we might consider it to *give rise to* a transient subjecthood. We cannot understand the occurrent thought as an utterance in the message-passing sense, as there are not two distinct domains, a speaker and a listener, for any message to be passed among. But we are now entertaining the tentative notion that there is no Cartesian subject before the occurrence of the thought, and so any subjecthood associated with this utterance arises with the utterance and fades thereafter. This is not a fully fledged psychological subject, equipped with the mechanisms of "cognitive systems"; it is a subject-pole that allows a distinction between subject and world, or self and other, to be discerned, and that supports or invites the ascription of intentionality. It is a transient orientation, tied to the real time unfolding of the linguistic thought itself ("*... whenever it is put forward by me or conceived in my mind.*"). Later in the Second Meditation, Descartes himself seems to concur with this association of the Subject with the transient inner voice when he says "I am, I exist—that is certain. But for how long? For as long as I am thinking. For it could be that were I totally to cease from thinking, I should totally cease to exist" (Meditation 2, AT 7:27). Now the nature of "thinking" has not been generally agreed upon, but the form of thinking Descartes here alludes to is clearly the utterance of an inner voice, in specific words, words which he is capable of repeating to us, words which we can characterize as Latin or French. I wish to pursue this idea, that *voice* gives rise to the complementarity between the poles of subject and world, and it does so in real time.

<sup>1</sup>A complementary account of languaging from an enactive perspective is provided in Bottineau (2010). This account adheres to a more conventional view of what the domain of language is than adopted here, but many of the fundamental concerns raised therein resonate with the themes of this article.

<sup>2</sup>The famous Latin phrase does not appear in the Second Meditation, where the original argument is most clearly made.

# **3. VOICES AND SUBJECTS**

[V]oice is a kind of sound of an ensouled thing. For none of the things without soul gives voice, though some are said by analogy to give voice, such as the flute and the lyre and whatever other of the things without soul have the production of sustained, varied and articulate sound. For voice also has these features and so there is a likeness (Aristotle, 1986, 420b, p. 178).

The association between the animate (even ensouled) subject and the voice is ancient. In Connor (2000) the long history of the subjects perceived as being behind voices emanating from unlikely places is recounted in detail. From the Delphic oracle through the medieval fascination with demonic possession, prophecy, and divine inspiration, voices perceived as coming from the stomach, the genitals, or even a crack in the rock have been enthusiastically attributed to invisible subjects, rather than to sound-producing properties of either inanimate objects or of atypical parts of the body itself. Much of the ghoulish fascination that the ventriloquist's dummy attracts lies in the obligatory projection of a subject behind the grotesque appearance. Connor writes:

For I *produce* my voice in a way that I do not produce these other attributes [eyes, hair, gait, fingerprints, etc]. *...* giving voice is the process which simultaneously produces articulate sound, and produces myself, as a self-producing being (Connor, 2000, p. 3).

It is telling that the words uttered in one of the very earliest sound recordings, made by Alexander Graham Bell in 1881, are "T-r-r— T-r-r—There are more things in heaven and earth Horatio, than are dreamed of in our philosophy—T-r-r—*I am a Graphophone* and my mother was a Phonograph" (Volta Laboratory, 2013, emphasis added), thus instinctively investing one of the very first disembodied voices born of technology with subjecthood of its own. Remarkably, the telephone and the phonograph came into being almost simultaneously, in 1876 and 1877 respectively. Add to these the advent of radio transmission of the human voice, first done in 1900 in Brazil, and it is clear that we have been awash in disembodied voices for over 100 years and counting. The irritating proliferation of pseudo-personalities such as the iPhone's Siri seems likely to continue.

If the voice Descartes conjures up alone generates a subjectivity that is aligned with the classic subject-object distinction at the level of the single individual, then we might give consideration to the possibility that voice employed in different circumstances might generate other forms of subjectivity, without commitment to individual Cartesian minds.

## **3.1. SHARED SUBJECTIVITY AND COMMON GROUND**

When an utterance is made in a specific context with speaker and listener both present, it is interpreted in the light of the shared understanding of all parties. This has found expression in theoretical notions of common ground (Clark and Brennan, 1991), or socially shared cognition (Schegloff, 1991). Most developments of the idea of common ground are couched within the information processing/message passing framework, and therefore make use of some version of aligned or shared representational content. However, it is not necessary to appeal to such unobservable constructs from a hidden Cartesian world (Hutto and Myin, 2013). There is ample evidence that participants in a conversational exchange become mutually linked in many subtle but observable ways. Eye movements (Richardson et al., 2007), postural sway (Shockley et al., 2009), and even blinking (Cummins, 2012) have all been found to become subtly intertwined in conversation, leading to a dynamic entanglement of the participants. Speakers and listeners are further linked through the provision by the latter of signals of ongoing engagement through postural, gestural, and vocal indices or backchannels (Wagner et al., 2014).

The yoking together of two or more people engaging in language behavior establishes a common basis from which the participants confront the world. It makes available a shared framework within which statements can be interpreted. It thus provides a scaffold for shared intentionality (Carr, 1987). The ability to share an intentional perspective seems to be at the very heart of human language use, but it is not an all or nothing affair. Two protesters with common purpose who chant the same slogan demonstrate an extreme alignment with respect to the world. But two people engaged in heated disagreement must still achieve a great deal of alignment in order to disagree felicitously. The topic of disagreement must be foregrounded, at the expense of everything else. In disputing causal chains, in laying out competing sequences of events, and in presenting different interpretations of the significance of actions and events, two disputants are necessarily sharing a great deal of background framing, picking out these events rather than those, identifying the same actors, while quarreling over their respective roles. Even in the absence of conversational exchange, people observing the same scene exert reciprocal influence on one another, such that their gaze behavior, and by inference, the details they pay attention to, become inter-dependent. In a series of experiments summarized in Dale et al. (2013), the gaze behavior of subjects is shown to depend sensitively on the presence of others, and on whether one subject knows or believes that the others are seeing and hearing the same things as they are.

If joint languaging provides a very powerful example of intentional alignment, then it might be that the ability to coordinate the manner in which we jointly pay attention to the world is an important skill that facilitated the emergence of such behavior, as argued in Fusaroli and Tylén (2012). Sometime after the last speciation event, some 5 or 6 million years ago, that gave rise to chimpanzees and bonobos on the one hand and the hominid line on the other, something happened that had profound consequences for our ability to share perspectives and to coordinate with one another. There is one small biological change that we know occurred in that time that might play a significant role here. That change gave rise to the white sclera of the human eye that contrasts vividly with the darker iris, thus providing a very clean signal of the direction of gaze of a partner (Tomasello et al., 2007). The other great apes do not have such a contrast, and their ability to align their gaze is severely limited and based on head direction rather than the eyes, although chimpanzees and bonobos in particular do display some evidence of understanding the visual perspective of another (Okamoto-Barth et al., 2007). The ability to follow each other's gaze thus facilitates the sharing of attention, and has been demonstrated to structure mother-child interactions, while inducing the ability to take part in languaging (Tomasello and Farrar, 1986).

As common ground is established, the subjective point from which utterances are spoken also shifts. Vygotsky has pointed out how the (linguistic) subject becomes an implied, rather than an overt, element in speech once common understanding has been established (Vygotsky, 1986, p. 236). For example, it would be odd to respond to the question "Would you like a cup of tea?" with the answer "No, I don't want a cup of tea," instead of simply "No." Similarly, a group of people waiting for a bus establishes sufficient shared context that no one is likely to point out the obvious and say "The bus for which we are waiting is coming," but simply "coming" or some such expression. The dropping of the linguistic subject is more extensive yet in inner speech, of which Vygotsky says "it is as much a law of inner speech to omit subjects as it is a law of written speech to contain both subjects and predicates" (Vygotsky, 1986, p. 243). Many languages allow any explicit mention of the subject to be dropped once it can be inferred on pragmatic grounds. This is not merely a syntactic quirk of one group of languages, as it is found in such typologically distant languages as Japanese, Chinese, Turkish, and Spanish (Huang, 1984).

It would be a mistake to simply equate the subject pole of a subject-world complementary pair with the syntactic subject, but it would be inexcusable too to ignore the deep link between the fundamental linguistic structure of subject and predicate on the one hand and the subjective pole from which utterances are brought forth on the other. The subject pole that arises in the unfolding of the voice grounds intentionality, and provides an anchoring point for reference. This is, perhaps, most explicit in the manner in which deixis functions, allowing use of terms such as "there," "here," "then," "now," whose meaning is anchored in the joint situation created by conversational participants; it is also explicit in the manner in which the first-person pronouns, both singular and plural, find flexible, and context-specific use. It is implicit too in establishing a shared register and perspective within which meaning is negotiated. The differentiation of subject and world, and the ability to establish a shared perspective within which utterances function, precedes any overt syntactic knowledge or awareness by millennia (Olson, 1996).

## **3.2. ALIGNMENT vs. SYNERGY**

The dynamic intertwining of conversational participants interacting in real time has not gone unnoticed. An influential approach to account for the many overt and subtle ways in which two interlocutors become linked is found in the Interactive Alignment model of Pickering and Garrod (2004, 2014). This model seeks to describe the tendency for conversational partners to imitate one another at a variety of levels, from syntactic biasing, through lexical selection, down to the level of phonetic and gestural imitation. The idea that similarity in one domain can unconsciously bleed through representational levels to generate similarity in other domains provides some explanatory purchase on a great deal of corpus-based data. As a general account of the dynamic coupling and mutual accommodation found among speaker/listeners, however, it is somewhat limited. It leaves language resolutely within the heads of individual conversing partners, and so does not move beyond the Cartesian, representationalist framework. It is "representation-hungry," demanding computational representations at many levels, and indeed, in its most recent form, it conjures up a baroque series of simulations inside the heads of individuals who must not only act, but also predict the actions of others (Pickering and Garrod, 2014). This approach does not generalize in any obvious way to multiparty conversations. Nor does it account for coupling among interactants that is not strictly imitative in nature, as with the mutual influence exerted on blinks (Cummins, 2012). The emphasis on alignment suggests that felicitous conversation would result in mere mimicry, which is again not what we observe, and it privileges similarity at the expense of complementarity, thereby missing the fundamental role-based nature of conversation in which the positions of speaker and listener alternate.

A competing account has recently been proposed that regards inter-personal coordination in dialog as a form of synergy or dynamical coupling (Fusaroli et al., 2014). This approach is rooted in dynamical approaches to coordination that are level-agnostic, seeking to understand emergent phenomena at one level (e.g., the dyad) as arising through processes of self-organization from the constrained interaction of autonomous components at a lower level (the speaker/listeners) (Kelso, 1995; Latash, 2008). This approach highlights the sensitivity of participants to real time recurrent interaction, as is evident even in the early interactions of infants and mothers (Murray and Trevarthen, 1986). It emphasizes the intertwining of the movements of participants, leading to dimensional reduction, so that two interacting persons become, temporarily, a simpler collective entity than the two persons considered as a mere conjunction of individuals. It acknowledges both synchronized and complementary actions as they contribute to this simplification, and it emphasizes the manner in which shared understanding of task constraints leads to stability of patterning in time. Although still somewhat speculative, this level-independent approach seems commensurate with the approach to be developed here that treats groups of people as synergetically organized domains in their own right, with respect to which subjectivities of a collective nature can be identified.

Synergistic approaches to human communication have been argued for by others. Thibault (2011) adopts a position not unlike the present one in which a fundamental distinction is drawn between what he calls talk and text. The role of voice described both here and in his work emphasizes the bodily entrainment that arises at a very fine scale among interactants, while the properties that linguists conventionally consider, and that admit of a computational description, constitute a distinct, second-order set of phenomena. Although not focussed on languaging, Riley et al. (2011) argue that interpersonal movement coordination is the result of establishing interpersonal synergies of the sort described here, and they distinguish component-dominant dynamics, as portrayed within a cognitivist framework, from interaction-dominant dynamics, in which the autonomy of the level of interaction is more thoroughly acknowledged. Finally, the perceptual crossing paradigm introduced by Auvray et al. (2009) provides a minimalist experimental setup in which two people interact in real time in a minimal virtual space. While not communicative in any conventional sense, the nature of the emergent behavior observed serves to illustrate the principal point being made that the interaction itself constitutes a level of relative autonomy that is not reducible to the conjunction of properties of its components (Froese et al., 2014). These latter two examples illustrate that social interaction and languaging are not separate phenomena. Languaging is a constitutive part of the manner in which interpersonal entrainment or coupling arises in the moment by moment real time reciprocal interaction among people.

## **3.3. VOICE vs. WRITING**

Before giving further consideration to the relationship between subjecthood and voice, it is appropriate to recall the vast chasm that separates speech from writing, not least as the claim is made here that most of the phenomena described by modern linguistics relate, in fact, to the structure of written communication, and are only indirectly relevant to the act of speaking, which is the central form in which languaging is manifested (Linell, 2005). Since the advent of alphabetic writing in Greek society, a naive view has been available that writing is simply a device for transcribing speech. Olson (1996, p. 66) identifies overt statements that express this view from Aristotle, Saussure, Bloomfield, and more. This is why theories of syntax, morphology, and semantics, that together delimit much of that which we call "language," allow themselves to study and model the formal characteristics of symbol strings, without consideration of the medium of expression. This insensitivity to the enormous differences between writing and speech underlies the focus by Saussure on *langue* rather than *parole*, and by Chomsky on competence, rather than performance. With that, modern linguistic theory has turned its attention away from the most common form of languaging, indeed the only one that existed from the fuzzy origins of speech until the relatively recent development of writing and the even more novel phenomenon of mass textual proliferation. It has ignored the real time reciprocal interaction among people giving voice from context-specific situations of concern.

We have now a wealth of research that documents very substantial changes that arise with the advent of writing, and especially with the spread of literacy consequent to the development of printing. These changes affect not only the way language is used, but the very structure of the consciousness of language users (Stewart, 2010). Ong (1982) provides an authoritative and comprehensive catalog of differences between the way knowledge is managed, shared, and verbalized in primary oral cultures, and in highly literate ones. Olson (1996) further documents the profound conceptual and cognitive implications of the spread of literacy. Much of this work focusses on the novelties that accompany writing and literacy. McLuhan claimed that "writing was an embalming process that froze language" (McLuhan, 1964), and he provides an anecdote from Prince Modupe, who speaks of his encounter with the written word in his West African days:

The one crowded space in Father Perry's house was his bookshelves. I gradually came to understand that the marks on the pages were *trapped words*. Anyone could learn to decipher the symbols and turn the trapped words loose again into speech. The ink of the print trapped the thoughts; they could no more get away than a *doomboo* could get out of a pit*...* (McLuhan, 1964, p. 84).

With writing, texts achieve an independence from their sources. A spoken utterance is necessarily vouched for by the speaker, while a written sentence asserts, without the contingency and commitment of a speaker. I have mentioned that voice gives rise to a subjective pole. Here we can see that the complement is also true: writing gives rise to a particular kind of *objectivity*, one in which for the first time it is possible to have "facts that speak for themselves" (Latour, 2013). (For an insightful account of several ways in which objectivities are constructed, see Daston and Galison, 2007). Written sentences remain immutable and thus support dissection and analysis in a way that spoken utterances, which must be articulated each time they come into being, do not. The further development of speech and language technologies in the service of message passing has given rise to forms of spoken language, e.g., in news broadcasts or public service announcements, that bear greater similarity to written texts than to spoken utterances, while recent increases in the possibility of text-based reciprocal exchanges, e.g., in SMS messaging, further serve to complicate the relation between voices, texts, messages, and intentions<sup>3</sup>.

It is interesting in this regard to consider the constraint that Everett observed in the language of the Pirahã, an Amazonian tribe whose language is remarkable for its simplicity and omissions: it has no counting system, very restricted tenses, arguably no syntactic recursion, etc. The Pirahã also have no mythology or stock of fiction. Everett attributes many of these constraints to what he calls the Immediacy of Experience Principle, according to which statements by the Pirahã "contain only assertions related directly to the moment of speech, either experienced by the speaker or witnessed by someone alive during the lifetime of the speaker" (Everett, 2009a, p. 132). Here, the strong tie between the speaker and the words spoken appears to have become sedimented into the very structure of the language and culture, leaving no room for the disembodied words found in writing. It is perhaps no coincidence that Everett's observations have become controversial precisely among those linguists who hold syntax, and syntactic recursion in particular, to be central to the very nature of language (Hauser et al., 2002; Everett, 2009b).

<sup>3</sup>My thanks to the anonymous reviewer who pointed out that the stark dichotomy between spoken and written texts has become considerably more complex.

# **4. SPEAKING IN UNISON**

The act of speaking in unison is a common form of vocal behavior that is accorded no particular theoretical significance in a message passing view of language. On the received view, minds and subjects are closed and singular; thus many people saying the same thing at the same time appears merely as a multiplication of the individual speaker. The behavior does seem somewhat perplexing though, for what message is being passed if we all know the words? It is worthwhile to consider both the occasions in which people often speak in unison, and the form of the speech so produced.

"Joint speaking" is an umbrella term I have coined to cover all occasions in which the same words are uttered by multiple people in unison (Cummins, 2013a). This includes many practices of collective prayer, the chants of both protest demonstrators and sports fans, the recitations of young school children, performances of choral speech, and the swearing of collective oaths in secular contexts. To all these naturally occurring variants we can also add the simultaneous reading of novel texts by pairs (or larger groups) of speakers in the laboratory, in a paradigm known as Synchronous Speech (Cummins, 2003, 2009).

This brief survey of situations in which people speak in unison makes it clear that this behavior is very widespread, and is found in virtually every culture. It is thus a central, and not a peripheral, example of languaging. With the exception of joint speaking in classrooms, which serves a multitude of purposes imposed by educational authorities rather than expressing any sentiment of the speakers, all of the naturally occurring forms of joint speech are found in situations in which the attribution of collective, shared intentionality seems to straightforwardly capture the significance of the practice for participants. In prayer contexts, collective speaking testifies to shared beliefs. In protest, the shared purposes of the crowd are made manifest through chanting. Among sports fans, chants are a means by which collective identity is sustained and asserted. None of this is at all surprising, nor in need of precise definition, at least no more precise than seems warranted for the attribution of beliefs, desires, and intentions to individuals. While we may not all be enthusiastic chanters, even our reluctance to join in such behavior testifies to the obligatory association of such voicings with the underlying sentiments.

But if message passing does not illuminate such behavior, it seems fair to ask how we might better characterize it: why are people engaging in such vocal activity, if not to pass ideas around? While there is probably no single answer to this question, a useful conceptual approach suggests itself from the theory of speech acts (Austin, 1975). Austin noted that many utterances achieve something simply by virtue of being spoken. Examples include "I pronounce you man and wife," or "I apologize for my behavior." Such utterances he called "performatives." In Austin's treatment, they are frequently signaled by such verbs as "pronounce," "decree," "promise," etc. The set of performatives Austin alludes to, and the associated set of acts performed, is very restricted. If there is merit to the idea that uttering gives rise to the complementary poles of subject and world, then *all* utterances might properly be considered performatives, and the establishment of a transient subject pole with an implicit intentional structure would then be an achievement of the act of uttering. This approach to understanding joint speech helps to make sense of some of its most reliable features. In what follows I will consider mainly the three most common forms of joint speech<sup>4</sup>: collective prayer, protest chanting, and sports chanting.

All three forms of joint speech are frequently, almost inevitably, characterized by repetition: the same phrase or short verse is repeated tens, or even hundreds of times over. Repetition makes sense if the temporally bound act of utterance is required to establish and maintain a transient subject pole with respect to which we can identify beliefs or intentions. Repetition is undergirded by physical actions such as fist pumping, bead twiddling, or arm waving. While bead manipulation is relatively private, the more macroscopic actions further serve to facilitate synchronization among participants.

Repetition also serves to accentuate and exaggerate the rhythmic properties of utterances, and repetition of a short phrase can even induce a change in perception from speech to song (Deutsch et al., 2011). In repeated spoken chants, the resulting form of speech thus blends seamlessly into the musical domain, establishing a continuity between speech and music. The close relation between spoken and sung chant is signaled by the ambiguity of the English word "chant," which applies with equal facility to either domain. It is interesting that a focus on collective speech makes a continuum between speech and music appear natural, even obligatory, while the message passing perspective, as articulated most clearly by Pinker (1999), insists on an absolute divide between the two domains. On the message passing view, speech is an expression of the highly valued notional faculty of language, and thus central to our human minds, while music is denigrated as "auditory cheesecake," having, from Pinker's perspective, no apparent functional significance, and thus meriting being grouped together with artistic expression, cheesecake, and pornography (Pinker, 1999). If anything illustrates the limited capacity to describe, or even see, that the message passing perspective induces, surely it is this failure to appreciate the familiar continuum that extends from instrumental music through song, rap, poetry, rhymes, and rhetoric to chant (Cummins, 2013a). We might note in passing that the strong contrast drawn here between the real time participatory nature of the voice and the frozen nature of writing finds a close parallel in contemporary discussion of the relationship between live musical performance and recording (Chanan, 1995).

We like to speak of the "wisdom of crowds," but the rather more familiar notion of the ignorance of the mob, whose powers of reason are not to be trusted, is perhaps more apt for many of the situations under consideration. While groups have frequently been found to outperform individuals in tasks of judgment and estimation (Koriat, 2012), groups engaged in the joint speech of protest often find themselves in volatile situations where collective actions are rudimentary and aggressive. It is worth noting, though, that the formal scaffold of call and response lends some degree of sophistication to the beliefs that are jointly articulated. The device of having a single leader call a series of questions to which the crowd provides a series of responses is found in both prayer and protest, though perhaps less so in sports chants. In prayer, this sequence of leading call and collective response is often formalized into liturgical rites, allowing for a great deal of complexity in the beliefs that are thereby expressed. In protest, it is far more common to see only a single call and a single response, and the very nature of protest militates against the kind of codification found in ritual liturgical practices. Sports chanting seems to be more concerned with the demonstration of collective identity than with the formulation of explicit statements of belief or intention, and call-and-response chants are less common.

<sup>4</sup>I hypothesize; I am not sure how one might measure relative frequency here.

If we view writing as an elaboration of some aspects of speaking, i.e., a technological extrapolation that gives rise to a formal system of the kind studied under the somewhat misleading label of "language," then we might observe that vocal behavior, or languaging, appears to have other extrapolations, other forms of extension, and other forms of codification, so that the formal constructs of the linguists are not the only descendants of the voice. Collective speech has been integrated into rituals in a great diversity of traditions. The Abrahamic religions all formalize collective speaking within their respective services, and in each of them the rituals integrate joint speaking into a carefully orchestrated sequence of complementary acts by service leaders and participants that include highly stylized sequences of movements such as bowing, kneeling, marching, etc. Other religious traditions have engaged in similar forms of codification (Bell, 1988). Parallels between linguistic grammar and ritual structure have previously been noted (Michaels et al., 2010), but the principal point argued here is that voice has given rise to more than one species of formalization. Liturgy and ritual do not admit of the same generative mutability as freely spoken or written text, but by codifying such utterances in collective speech and ritual, the implicit intentional structure that arises in speaking and performing, together with the associated belief structure, is stabilized. With such observations, the boundaries of "language" become somewhat less determinate, and the subjects that find voice become both more numerous and more varied.

## **4.1. DYNAMIC ENTANGLEMENT IN SYNCHRONOUS SPEAKING**

If the relation between voice and subjectivity put forward here has merit, joint speaking appears as an extreme example that can serve to hone our considerations of the form and nature of collective intentionality. In monolog, I alone dictate the intentional ground of my utterances; in conversation, the shared ground is fluid and negotiated; in chanting it is immovable. Are there then any signatures of joint intentionality that we can observe? In the spirit of the dynamical coupling hypothesis of Fusaroli et al. (2014), we might look for evidence that joint speakers are strongly coupled, giving rise to emergent phenomena at the supra-individual level.

In a series of behavioral studies in which speakers are asked to read novel texts in unison, no major differences have been observed that would serve to identify speech as collective on the basis of its acoustic characteristics alone (Cummins, 2014). Speech produced in these constrained laboratory settings is remarkably unremarkable, and the technique of having subjects speak in synchrony has been used as a device for obtaining unmarked speech in several phonetic studies (Krivokapić, 2007; Kim and Nam, 2008; O'Dell et al., 2010; Dellwo and Friedrichs, 2012). The unmarked phonetic structure of speech elicited in the synchronous speaking situation contrasts strongly with the observation that texts recited in ritual and rite are frequently, if not inevitably, highly stylized in prosodic form. For example, consider the typical pattern with which the Hail Mary is said when reciting the rosary, or, in a secular context, the characteristic form of the Pledge of Allegiance as recited by American schoolchildren. Prosodic stylization thus appears as a reliable, but not necessary, characteristic of joint speech.

There is one form of speech error found in a synchronous speech task that seems to be unique to that situation, and that illustrates a strong dynamic coupling between speakers. When one speaker makes a speech error, it is frequently, though by no means always, observed that both speakers stop speaking simultaneously. Sometimes this abrupt cessation can even occur in mid-syllable. Abrupt and simultaneous cessation of speech seems to be unique to this situation, and I have previously compared it to the collective tumbling that happens so readily in a three-legged race if either participant makes a misstep (Cummins et al., 2013). This suggests that the task of synchronizing leads to a close intertwining of the speech production processes of the two speakers, leaving each vulnerable to mistakes by the other. This observation might be tempered, however, by noting that the degree of synchronization found in the laboratory is typically much greater than that found in the wild, where relatively loose temporal alignment is common and tolerated.

A second source of empirical phenomena associated with joint speaking comes from an fMRI study by Jasmin et al. (in preparation), in which subjects spoke prepared sentences in a variety of conditions, including speaking alone, listening, speaking in synchrony with the experimenter, and speaking in synchrony with a recording of the experimenter. Importantly, subjects were not informed of the difference between the latter two conditions, and on debriefing, they were never aware that recordings had been used at all. In contrasting the regional blood flow subsequent to speaking in the latter two synchronization conditions, a marked difference was found in macroscopic patterns of cortical activity, despite the obliviousness of subjects to the contrast. In particular, synchronization with a live person was characterized by an increase in activity in right hemisphere locations, including the temporal pole, supramarginal gyrus, superior temporal gyrus, and the right hemisphere homolog of Broca's area; the latter three are areas that, in the left hemisphere, are reliably implicated in speech production. There is thus a large scale alteration to the well-known hemispheric asymmetry that attends speech production, but only when the speaker is coupled in real time to another speaker, and not when the non-self voice has the inflexibility of a recording.

# **5. VOICE, (INTER-)SUBJECTIVITY AND REAL TIME RECURRENT INTERACTION**

As scientists, we need to acknowledge that the metaphysical background within which we work makes some inquiries possible, and others impossible. For all the acknowledged successes of the message passing view of language rooted in a Cartesian framework, there are very many familiar phenomena that have been passed over, or, at best, relegated to the outer wastelands of the non-cognitive and non-linguistic. I have here sought to work with a notion of the subject that is an emergent property of specific kinds of interpersonal interaction rooted in real time reciprocal exchange. This unconventional view of the subject brings with it a very different view of what language is, to the point where the formal system described by modern linguistics no longer appears to be describing the human capacity to create shared perspective, to generate a shared common ground, and to bring forth a common world. Where received approaches to "language" treat regularities found in sequences of symbols, I have focussed on the voice, uttered from a specific concerned perspective, and necessarily tied to the real time negotiation of a subjective pole. In the voice, we find a strong index of intentionality, but an intentionality that shifts, that arises fluidly, that is sometimes grounded in an individual, sometimes in a negotiated context, and that sometimes seems to emerge at the collective level in a manner no longer reducible to the thoughts, beliefs, and perspectives of the contributing individuals (Carr, 1987). This dissociation of the voiced subject from the solipsistic individual is seen perhaps most clearly in the case of joint speech. The emphasis on voice and intentionality serves to position the symbolic domain of structural and generative linguistics as a specific, limited extrapolation and codification of an older practice of *uttering* that has given rise to several distinct extensions and codifications in such domains as ritual and rite.

The loosening of metaphysical commitments that results when we abandon the Cartesian subject offers the opportunity to reconsider many phenomena, and joint speech provides an important and familiar case in point. The practice of joint speech is not restricted to any particular culture. As well as being ubiquitous, it is immediately apparent that the situations in which people speak collectively do not form an arbitrary or incoherent set. All such situations seem to provide strong evidence of collectively held beliefs, and it is through the collective voicing that this attribution becomes warranted. It might help here to note that the subjectivity being treated so rudely is not coextensive with the mind of an individual, nor with the idea of a cognitive system, conceived of as a set of sub-personal information processing mechanisms that some hypothesize to underlie observed behavior. The subject pole referred to here is an aggregate to which it makes sense to attribute a limited range of intentions, and in particular, beliefs. I am thus wielding the term "belief" here in a sense rather like the dispositional account provided by Ryle (1949). This flexible notion of the subject seems to work when applied to an individual, a conversing dyad, or a lynch mob, each of which can be said to speak from a distinct position, with a specific perspective. In strenuously avoiding the Cartesian split between mind and world, we would do well to avoid adopting an overly rigid metaphysical position. Rather, if subjects admit of the kind of treatment proposed here, then an ontological lightness of touch that can encompass many kinds of intentional subjects seems warranted.

The empirical phenomena described above strongly highlight the importance of real time dynamic interaction among people in generating the subject pole to which beliefs can sensibly be attributed. The neural signature of collective speaking is found when speaking with a live speaker, but not with a recording (Jasmin et al., in preparation). Live conversational partners become entangled not only in ways that fit a linguistic description (lexical priming, syntactic biasing, and phonological and phonetic imitation; Pickering and Garrod, 2004), but in a host of subtle ways that have hitherto been treated as non-linguistic. These include gaze, posture, gestures, and blinks, but this set might conceivably be considerably extended as researchers turn their attention more and more to physiological markers of interaction (Campbell, 2007; Richardson et al., 2007; Shockley et al., 2009; Cummins, 2012; Wagner et al., 2014). The voice is an important part of the means by which a collective perspective is established and maintained, but it is one among many. The interaction of voice and gaze may play a particularly strong role in allowing conditions of joint attention to be sustained over protracted periods, and joint attention appears as a possible foundation for the shared intentionality required to ground a human cultural world (Tomasello et al., 2005)<sup>5</sup>.

The dynamic entanglement seen in conversation, and in joint speech, can be empirically described as a form of mutual coordination, whereby two or more participants display a transient inter-dependence on many levels (Shockley et al., 2009; Fusaroli et al., 2014). This third-person account lends itself well to ethological and experimental observation and modeling. A well-worked mathematical framework for describing how autonomous systems that interact in real time can give rise to emergent phenomena at the collective level is available, e.g., as illustrated by the field of coordination dynamics (Kelso, 1995; Oullier and Kelso, 2009). Social cognitive neuroscience has recently begun to recognize that nervous systems of interacting individuals behave quite differently from those of solitary subjects, and often become inter-dependent (Hari and Kujala, 2009; Babiloni and Astolfi, 2012; Schilbach et al., 2013). This opens up a vast empirical research agenda for the future.

But the shifting ground of subjectivity espoused here poses challenges for description from a phenomenological or experiential point of view. Here, the recent concept of participatory sense-making may be of assistance (De Jaegher and Di Paolo, 2007; Fuchs and De Jaegher, 2009). Participatory sense-making extrapolates from the basic enactive account that grounds sense-making (perception/action in the service of the generation of meaning) in the adaptive interaction of an autonomous agent with its environment (Froese and Di Paolo, 2011). Building on this perspective, participatory sense-making describes how the moment-to-moment interaction of two subjects gives rise to a mutuality in their joint sense-making, allowing for the joint creation of meaning. On this account, the emergent domain constituted by the inter-dependent activities of two or more subjects warrants treatment as a phenomenological domain in its own right (Cummins, 2013b). Intersubjectivity, then, is the enactment of a novel phenomenological domain in the sustained, real time coordinated activities of two or more people. There appears to be a convergence of the theoretical vocabulary and the demands raised by empirical studies that bodes well for further scientific work.

<sup>5</sup>Small wonder, then, that the emergence of "language" appears utterly mysterious from the vantage point of modern linguistics (Hauser et al., 2014). The discipline has defined its own subject almost out of existence.

A host of open questions relate to the role of clock time and synchronized behavior. In collective speaking, we observe highly coordinated action that relies not on a common external beat or timekeeper, but on shared knowledge among interactants. Highly synchronized behavior that is scaffolded by an external beat is also very common, as in music making, marching, or dancing, but this kind of collective entrainment does not seem to bring with it an automatic sense of commitment to underlying beliefs or intentions. We are all familiar with Western schoolchildren dancing happily to the religiously tinged beats of Bob Marley, without worrying about whether they really subscribe to the tenets of Rastafarianism. Much work remains to be done in gaining a better understanding of how collective coordinated behavior gives rise to collective intentionality, and of what its necessary preconditions in the contributing individuals are.

A willingness to countenance subjective poles that are not coextensive with the individual person, and that rise and fade in a dynamic fashion, is incompatible with the grounding assumptions of much of conventional psychology. Of course, psychology itself has grappled since its inception with the boundaries of the subject (Dewey, 1896). One way of describing the subject matter of psychology is with reference to the twin poles of experience and behavior, for which a causal account is sought. This approach looks out at the world from a subject whose existence, persistence, and integrity are taken for granted. The approach taken here, and enabled by the enactive framework more generally, is to reverse the direction of inquiry, from a view toward experience (whose?) and behavior (by whom?), and to look instead at the shifting referents of the personal pronouns "I," "we," "you," etc. It is here that it becomes apparent that the received view of language will not serve, any more than the notion of a solipsistic mind. Of course, the contemporary scientific view of language is deeply rooted in a specific set of psychological commitments, and in a view of mind as information processing, that together gave birth to the cognitivist worldview. Adopting a different stance with respect to the ground of experience must, it seems, go hand in hand with a willingness to question the boundaries that have traditionally served to demarcate the linguistic domain. This opens up the enticing prospect that we might begin to question, negotiate, and re-evaluate just what, and who, "we" think "we" are.

In Seeger (2004), an account is provided of the way music and song are integrated into the lives of the Suyá people of the Amazon basin. Some songs, the shout songs, are sung from what we might consider a conventional egocentric perspective. Others are sung in unison. Of these Seeger notes:

The Suyá men said they sang shout songs for their sisters... When I asked them for whom they sang unison songs, they responded that they simply sang them. They weren't for anyone. A man did not sing a unison song as a brother, lover, or individual. He sang it as a member of a group, whose identity was partly established through the song. Thus, they sang for a general audience: the act of singing was the statement. In some sense, invocations had no audience at all... (Seeger, 2004, p. 83)

## **ACKNOWLEDGMENT**

I am indebted to three anonymous reviewers who provided very thoughtful feedback which improved the present contribution.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 April 2014; accepted: 27 June 2014; published online: 18 July 2014. Citation: Cummins F (2014) Voice, (inter-)subjectivity, and real time recurrent interaction. Front. Psychol. 5:760. doi: 10.3389/fpsyg.2014.00760*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Cummins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# From "cracking the orthographic code" to "playing with language": toward a usage-based foundation of the reading process

# *Sebastian Wallot\**

Department of Culture and Society and Interacting Minds Centre, Aarhus University, Aarhus, Denmark

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

#### *Reviewed by:*

Maarten Wijnants, Radboud University Nijmegen, Netherlands

Anna M. T. Bosman, Radboud University Nijmegen, Netherlands

#### *\*Correspondence:*

Sebastian Wallot, Department of Culture and Society and Interacting Minds Centre, Aarhus University, Jens Christian Skous Vej 4, 8000 Aarhus, Denmark e-mail: sewa@cas.au.dk

The empirical study of reading dates back more than 125 years. But despite this long tradition, the scientific understanding of reading has made rather heterogeneous progress: many factors that influence the process of text reading have been uncovered, but theoretical explanations remain fragmented; no general theory pulls together the diverse findings. A handful of scholars have noted that properties thought to be at the core of the reading process do not actually generalize across different languages, or from single-word reading situations to connected text reading. Such observations cast doubt on many of the traditional conceptions about reading. In this article, I suggest that the observed heterogeneity in the research is due to misguided conceptions about the reading process. Particularly problematic are the unrefined notions about meaning that undergird many reading theories: most psychological theories of reading implicitly assume a kind of elemental token semantics, in which words serve as stable units of meaning in a text. This conception of meaning creates major conceptual problems. As an alternative, I argue that reading should rather be understood as a form of language use, which circumvents many of the conceptual problems and connects reading to a wider range of linguistic communication. Finally, drawing on Wittgenstein, the concept of "language games" is outlined as an approach to language use that can be operationalized scientifically to provide a new foundation for reading research.

**Keywords: reading research, natural reading, meaning, language use, language games**

## **LANGUAGE USE AND READING – WHY BOTHER?**

Reading is a "culturally cognitive" phenomenon that sets humans apart from other intelligent creatures. Theoretically, reading is interesting because it is a learned practice that incorporates many human capacities, from basic processes of visual perception to abstract cognitive skills such as reasoning, imagination, and creativity. The ability to read and comprehend texts has become a key necessity for participation in contemporary society: it is a prerequisite for all forms of higher education (Rindermann and Ceci, 2009), and has direct consequences for health and life expectancy (Pignone and DeWalt, 2006). Accordingly, the empirical investigation of the reading process has one of the longest traditions in experimental psychology, dating back more than 125 years.

However, despite this long tradition, the scientific understanding of reading has made rather heterogeneous progress. Many facts about reading have been uncovered, highlighting how linguistic, individual, and situational factors influence the process of reading under certain circumstances. Yet this progress in gathering facts about reading and its constituent factors has not been matched by comparable theoretical progress that pulls the observed facts together. This is reflected in the complex patterns of contextual effects on reading behavior that pervade the scientific literature (Van Orden et al., 2001) and in a rather fragmented theoretical landscape (Rayner and Reichle, 2010).

Moreover, and in addition to the problems inherited from strict experimental investigations, recent findings on connected text reading, literary reading, and cross-language investigations of reading have started to gnaw at the edges of accepted assumptions, as they seem to indicate that some of the major theoretical commitments in most of reading research – such as the primacy of the word level, the importance of lexical features, or the assumption that words have a definite meaning – do not apply to naturalistic, or at least more complex reading situations.

In this essay, I explore the possibility that the observed heterogeneity and non-convergence in reading research is due to a somewhat misguided conception of the reading process: up to now, reading research has been very much concerned with the front-end of the reading process (i.e., how visual features can be correctly identified as words) and with the search for general mechanisms that are invariant across contexts (Pollatsek et al., 2003). Conceptually, reading is seen as a rather passive process. Other fields that concern themselves with how language works, such as the philosophy of language or interaction and communication research, have largely moved on from a strict mechanistic view to a usage-based view of language. In a nutshell, this means that linguistic tokens have no life of their own: their function is first and foremost subject to temporal and contextual factors, which turns the theoretical priorities of contemporary reading research on their head.

In what follows, I will discuss the possibility that the theoretical priorities of psychological reading research need to be reconsidered, and in particular that some of its core theoretical commitments might be incommensurable with necessary assumptions about reading as a language phenomenon. To explain my reasoning, I will first briefly describe what seem to be indispensable ingredients of reading as a phenomenon, ingredients that the vast majority of psychologists, philosophers, and scholars of literature would probably agree upon. Then, I will summarize the core commitments of contemporary research on reading and evaluate how they relate to the minimal core assumptions one has to make about reading as a language phenomenon. Based on this review, I will conclude that the empirical results and fundamental concerns about meaning are somewhat at odds with the current core assumptions of reading research. I will further argue that the conceptual problems arising in this context might be solved by adopting a more usage-based view of reading, in which reading is conceptualized less as a translational process that "maps printed words into the mind" and more as a "one-and-a-half-person dialog" of sorts. I will finish by taking up Wittgenstein's concept of "language games" and describing how it can serve as a linchpin for a general understanding of communicative processes that links reading with online communication. This, in turn, allows the derivation of the concept of "reading games," which could potentially serve as a new foundation for the empirical investigation of the reading process.

## **WHAT IS READING FOR?**

Reading and writing are relatively recent cultural developments: the earliest records of the precursors of writing date back to cave paintings roughly 20,000 years old, and the first traces of proto-writing, Proto-Cuneiform, appeared in the Middle East around 3,400 BC. This pictographic form seems to have been invented first for bookkeeping purposes. Interestingly, much like modern emailing and texting, these communications were rather short-lived: as Nissen (1993) notes, "After authorized individuals have broken sealed stoppers of collars in order to gain access (...), the fragmented sealings may have been kept somewhere for control purposes but then lost their purpose and were consequently disposed of. Written documents were unquestionably treated in the same way. They served to carry out future check (...). After a certain time had lapsed, this information was no longer useful. Consequently, the tables were probably thrown away in regular intervals (...)" (p. 6). Hence, early instances of reading and writing were much more closely connected to their environment, serving rather direct, signpost-like functions.

It took roughly another 600 years for the first coherent texts that would qualify as literature to appear (Grimbly, 2013), and roughly another 1000 years until the first alphabetic writing systems appeared (Sampson, 1985). These developments in the conventionalization of writing systems finally established writing as a more permanent medium for communication on broader scales, through which authors could present their thoughts to an ever larger audience. Finally, printing techniques allowed for increasingly efficient reproduction and simultaneous distribution to several places at once, broadening the space for communication.

Through writing, authors could preserve their thoughts, allowing communication on new temporal scales, even beyond the author's lifespan. Besides this one-author-one-reader relationship, reading also tied in with online communication among groups of people, as many individuals could now read the same text and discuss, interpret, and act upon its content (for example, the newspaper, or the air-dropping of pamphlets in wartime). Of course, modern information technology has also created a space in between the two (the classical reading situation and the classical form of online communication), where emails, short messages, and chats allow a more fast-paced, tightly coupled exchange that uses reading and writing to transport content within the setting of online communication. Hence, it seems generally acceptable to say: "Reading (and writing) is a form of communication, and it evolved to serve a communicative function."

We can refine this statement by specifying how, at a basic level, reading (and writing) serves that communicative function: by providing a specific medium, a visual symbol system (a writing system) that transports content. This is a general aspect that reading shares with all other forms of communication, which transport content by means of some medium, for example the sound of the voice during reading aloud (but also visual aspects, such as gestures, facial expressions, etc.). Hence, we can say: "Reading is an activity necessary for 'accessing' content from the communicative medium of the writing system, that is, necessary in order to 'use' a writing system as a communicative medium."

In the end, one could also say that for something to be labeled a successful communication – or indeed communication at all as opposed to mere activity (and similar to the distinction between action and behavior) – "meaning" needs to be present, or that an activity needs to be meaningful.

What has been said so far might seem trivial, or commonly agreed upon, but we will see that the particular details of how these aspects of reading are understood make quite a difference. We will need some space to unpack this second statement, which relates the process of reading to meaning, because it will turn out to be more complicated, and in the end perhaps more contentious, than it seems at first glance. It will also mark the first point at which the way core assumptions are understood or implemented in psychological reading research departs from other fields, such as interaction and communication research. The highlighted terms "access" and "use" stand for different ideas of how texts work and will ultimately relate to some notion of "meaning". In the next section, I will review how the ideas of "access", "content", and "meaning" are related in contemporary reading research, and how they seem to be understood and practically implemented in that field.

# **PSYCHOLOGICAL RESEARCH ON READING – "CRACKING THE ORTHOGRAPHIC CODE"**

How has reading been conceived in psychology? The earliest systematic investigations of reading probably come from the work of Cattell (1886a,b), who investigated reading on the letter, word, and sentence level using tachistoscopic methods. His research revealed some basic facts about reading that have stood the test of time fairly well, for example that readers can read longer letter strings when these are grouped into real words, as opposed to being random concatenations of letters, and that the latency in sounding out a monosyllabic word is shorter compared to sounding out a single letter. Based on these findings, his conclusion was that reading was a synthetic process, in which a word was read and recognized by a reader as a whole.

These findings, together with their own, prompted Erdmann and Dodge (1898) to formulate a "total shape" theory of reading, describing skilled reading as the holistic recognition of words. In particular, they presented evidence that skilled readers who are familiar with a specific vocabulary can reliably identify words as long as 22 letters within very short exposure times of 100 ms (the experiments were conducted in German, where several words, especially nouns, can be compounded into a single word). As Scheerer (1981) points out, this prompted one of the first great theoretical debates about the reading process, because Wundt, who had not until then been particularly interested in empirical reading research, doubted that this was possible. In particular, Wundt (1900) thought that the effective presentation time of words in the tachistoscope was prolonged by after-image effects, and that multiple shifts of attention to substrings of these extremely long words must have occurred in order to read them successfully.

The crux of this debate was whether reading is basically an analytic process (where local details of a word need to be visually analyzed first in order to read it successfully) or a synthetic process (where the word is read as a whole), and this debate came to dominate the theoretical discourse of reading researchers well into the 1960s (Gibson and Levin, 1975).

Another line of reading research that started at the turn of the century was that of eye-movements during reading (Huey, 1908). Early investigations of eye-movements did not directly address the issue of analytic versus synthetic reading, as it was clear that the visual span around foveal vision during a fixation could easily provide information about longer words, even if they were only fixated once. Hence, the shifts of attention that Wundt proposed were unlikely to reveal themselves as a pattern of multiple fixations within a single word. In any case, this research seemed to corroborate the notion of reading as a discrete, word-by-word identification process, whereby individual words are fixated in sequence, and the duration of a fixation was indicative of the skill of a reader (see Quantz, 1897, for the first investigations of skilled reading using the eye-voice span).

Eventually, dual-route models (Coltheart, 1978) attempted to settle the debate about word reading as an analytic versus synthetic process by incorporating both processes into a single theory of word reading. The basic idea was that reading a word could proceed either through a direct route, where the "total shape" of the word is mapped directly onto its representation in the mental lexicon (synthetic reading), or through an indirect route, where the individual letters of the word need to be recognized, and the phonology of the word is reconstructed from its spelling and then used to map the word onto its representation in the mental lexicon (analytic reading). Furthermore, dual-route models also incorporated reading speed as a fundamental variable, since it was hypothesized that the direct route would permit faster word reading than the indirect route.

Direct access was assumed to be faster, but contingent on the reader's familiarity with the word (Doctor and Coltheart, 1980). This familiarity effect could be captured by the frequency with which a word appears in a language, as a stand-in for the memory strength the word evokes in the average reader. Accordingly, word frequency became a central variable in all well-developed reading models, either as an explanatory principle or as a fact to be explained, irrespective of their specific architecture (Grainger and Jacobs, 1996; Coltheart et al., 2001; Pollatsek et al., 2003; Engbert et al., 2005). Many more lexical variables describing word properties have subsequently been proposed, in an attempt to find the set of relevant lexical properties that would allow a reader to "crack the orthographic code" and map the visual features of a word onto its internal representation.
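To make the dual-route logic and the role of word frequency concrete, a minimal Python sketch may help. It is a deliberately simplified "horse race" between two routes, assumed here for illustration only; it is not an implementation of the dual-route cascaded model of Coltheart et al. (2001). The lexicon, its frequency values, and all timing parameters (`DIRECT_BASE`, `FREQ_SLOPE`, `INDIRECT_BASE`, `PER_LETTER`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> frequency per million words (made-up values)
LEXICON = {"salt": 80.0, "pepper": 20.0, "brake": 5.0}

# Hypothetical timing parameters in ms (not fitted to any data)
DIRECT_BASE = 500.0    # direct-route latency for a word with frequency 1 per million
FREQ_SLOPE = 40.0      # ms saved per log10 unit of frequency
INDIRECT_BASE = 300.0  # fixed cost of the letter-by-letter route
PER_LETTER = 60.0      # extra cost per letter converted to sound


@dataclass
class Reading:
    route: str
    latency_ms: float


def direct_route(word):
    """Whole-word ("total shape") lookup: available only for words stored in
    the lexicon, and faster the more frequent (familiar) the word is."""
    freq = LEXICON.get(word)
    if freq is None:
        return None
    return DIRECT_BASE - FREQ_SLOPE * math.log10(freq)


def indirect_route(word):
    """Letter-by-letter assembly of phonology: always available, but its
    cost grows with the length of the letter string."""
    return INDIRECT_BASE + PER_LETTER * len(word)


def read_word(word):
    """Horse race: whichever available route finishes first determines
    the reading latency."""
    candidates = [("indirect", indirect_route(word))]
    direct = direct_route(word)
    if direct is not None:
        candidates.append(("direct", direct))
    route, latency = min(candidates, key=lambda c: c[1])
    return Reading(route, latency)


for w in ["salt", "brake", "blick"]:
    print(w, read_word(w))
```

With these made-up parameters, "salt" is read via the direct route and faster than the less frequent "brake", while an unknown letter string such as "blick" can only be read via the indirect route. Frequency enters only the direct route, mirroring the assumption that whole-word access depends on familiarity.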

In general, theories of reading in psychology have been concerned with this "front end" of the reading process, trying to describe invariant relationships that would permit a reader to map a word onto the mental lexicon. Implicitly or explicitly, comprehension of a word seems to have been loosely equated with the successful mapping of the word onto the representation where its meaning is stored. In this way, meaning is explicitly tied to the level of words: the (semantic) content of a word equals its meaning. Of course, there are many definitions of word meaning, but in the end, meaning is treated as a stable word property, at best developing locally and incrementally and subject only to peripheral context effects. Once meaning is taken as given, it makes sense to focus on the processes that lead up to it, hence the focus on "access" (in a broad sense) to the content of a word, or, in other words, on how readers crack the orthographic code of a piece of written language. Underlying this focus is a perhaps implicit view of meaning in reading as word semantics, where the stable, well-definable building blocks of meaning reside on the word level, and any higher forms of meaning can, one way or the other, be reduced to that level. Meaning of any complexity can be decomposed into its elements: its words.

This is not only reflected in the architecture of theories and models of reading, but also in the experimental procedures that are utilized in reading research: here, the focus on isolated words and sentences was deemed sufficient, as they seemed to encapsulate the essentials of written language comprehension: words contain the basic meanings and lexical constituents of written language, while the syntactical features of a language can be sufficiently tabulated within a single sentence (Wallot and Van Orden, 2011). Following this logic, the investigation of isolated words and sentences was thought to contain the potential to uncover the general rules of written language perception.

Just as a brief hint at this point: the idea of "language use" as it is conceptualized in contemporary philosophy and interaction studies does not easily fit into this picture. If anything, the most plausible interpretation of the term "language use" within the contemporary view of reading research would be that it captures the dynamics of the reading process, which in turn would be (entirely) defined by the sequence of the properties of its word constituents.

However, thinking about the concept of "language use" in reading at this point would be premature, as we have not yet answered the question of why it should be interesting or even relevant. To do so, we will address two questions in the next two sections: (1) How has the outlined research program fared so far with its focus on the word level? (2) Given that meaning is a central and ultimately necessary ingredient of reading as a phenomenon, are the assumptions of psychological models and theories in line with plausible definitions of meaning, which have been considered in much greater depth in the philosophy of language?

# **STATE-OF-THE-ART: DO WORD PROPERTIES REVEAL A FUNDAMENTAL LEVEL OF READING?**

The experimental investigation of reading has focused heavily on stimulus sets of no more than a few words, and even studies explicitly aimed at more naturalistic text reading encompass a handful of sentences at most (Clifton and Duffy, 2001). Indeed, some basic features of the reading process (such as the fixation-saccade sequence) suggested that reading is inherently word reading. Text reading appears very complex, and many reading researchers feel that a considerable reduction of complexity is necessary before reading can be investigated systematically. Interestingly, Wundt (1911) also argued that an experimental analysis of complex intellectual functions such as reading demanded such a reduction, but he also argued that the linguistic processes involved in reading might escape experimental analysis altogether and instead require non-experimental methodologies (Scheerer, 1981).

Contemporary reading researchers are also aware of this tension. For example, Rayner and Pollatsek (1994) state that "Critics of the information-processing approach often argue that attempts to isolate component processes of reading result in tasks very much unlike reading. (...) Admittedly, [many of] these tasks are unlike reading" (p. 8). However, whether due to a lack of attractive alternatives or as the expression of an optimistic attitude toward scientific progress, the current sentiment still seems to be: "Suppose we are interested in studying walking. If we study the motor responses that people make when they take two steps, critics may say, 'But that's not walking. When you walk you go a long way.' True, but are the motor responses any different when you take two steps? Undoubtedly not" (again Rayner and Pollatsek, 1994, p. 8).

Whether one agrees or not, there has certainly been good reason for this linguistic sparsity in reading research, because even the basic lexical and syntactic features already show complicated interaction effects with each other. For example, the reaction time (recorded by a key press) to read the word "pepper" is on average shorter if "pepper" is preceded by a semantically related word, such as "salt," than in a control condition where "pepper" is preceded by an unrelated word (Neely et al., 1998). However, if "salt" is presented twice in succession just before "pepper" appears, this facilitative effect vanishes.

All simple reading tasks reveal such complicated patterns of interactions among the factors that are studied in reading research (Van Orden et al., 2001; Pickering et al., 2013), and these interactions are not just limited to the scale of word reading: while, for example, Kintsch and Keenan (1973) found that reading times increase linearly with the syntactic complexity of a sentence, more recent research by Keller et al. (2001) found that this effect is actually dependent on the lexical features of the constituent words of a sentence (i.e., word frequency).

These cursory examples might not appear troublesome on their own. However, they are not exceptions to the rule but symptomatic of the current state of affairs in reading research: Van Orden and Kloos (2005) provide an in-depth discussion of the complicated interaction effects observed in single-word reading research on the role of phonology (see also Van Orden et al., 2001). Intuitively, phonology seems a potentially important aspect of reading, because the development of reading usually follows the development of speech, and because the majority of writing systems incorporate some form of phonological coding into their orthography. Accordingly, as described above, phonology is assumed to be an important mediator in the reading process, for example in the indirect route of dual-route theories (Coltheart, 1978).

In their review, Van Orden and Kloos (2005) argue that effects of phonology in the reading process are fundamentally contingent on the demands of the specific task employed to investigate reading, and that this "task contingent evidence, instead of settling the debate [about the role of phonology in reading] simply fuels it" (p. 63).

One example area that Van Orden and Kloos (2005) discuss, and that illustrates the pervasiveness of complex interaction effects in reading, is the case of homophone errors. Homophones are words that sound like another word when spoken but differ in spelling, which creates conflicting responses from readers, for example in a categorization task where the words "break" versus "brake" have to be categorized as "part of a car." Skilled readers make homophone errors irrespective of word familiarity. This seemed to falsify the dual-route theory, according to which highly familiar words should be recognized via the direct route that does not involve mediating phonology, and it suggested that the role of phonology does not differ between the two routes. However, when the breadth of the category in a categorization task is changed, readers make more homophone errors for unfamiliar than for familiar words (Jared and Seidenberg, 1991), indicating that phonology does matter differently for the two routes. Whether homophone effects appear in a reading task also depends on other aspects of the task, such as its general difficulty (Lukatela and Turvey, 1994), or on readers' skill (Unsworth and Pexman, 2003).

As Van Orden and Kloos (2005) conclude, the problem is that reading reveals itself as an ultra-sensitive phenomenon in which different aspects of tasks and readers depend on each other. Adding further factors only complicates matters: as the number of factors incorporated into the design of reading tasks increases, so does the number of interactions among them. Hence, the exquisitely complicated relations between the different factors in reading may never be stably pinned down (at least, so far they have not been). Paradoxically, the attempt to identify and isolate the mechanisms that serve as building blocks of the reading process instead suggests that, given the right circumstances, everything matters. Without being able to pinpoint reliably what matters when, one effectively confronts a situation where "everything is dependent on everything else."
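The growth in interactions can be made concrete. Assuming a fully crossed factorial design with k factors, every subset of two or more factors defines a potential interaction term, so their number is $2^k - k - 1$. The short Python sketch below enumerates these terms; the factor labels are hypothetical, loosely echoing the factors mentioned above.

```python
from itertools import combinations


def interaction_terms(factors):
    """Return every interaction term (two-way and higher) among the given
    factors, as in a fully crossed factorial design."""
    terms = []
    for order in range(2, len(factors) + 1):
        terms.extend(combinations(factors, order))
    return terms


# Hypothetical factor labels, for illustration only
factors = ["word frequency", "task", "task difficulty", "reader skill",
           "category breadth", "prime repetition"]

for k in range(2, len(factors) + 1):
    n = len(interaction_terms(factors[:k]))
    print(f"{k} factors -> {n} interaction terms")  # equals 2**k - k - 1
```

Going from two to six factors raises the number of potential interactions from 1 to 4, 11, 26, and finally 57, which is one way of seeing why adding factors tends to multiply, rather than resolve, the contingencies described above.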

Regarding the case of phonology in reading, this means that the question cannot be settled. Ideally, laboratory tasks would reveal a robust role for phonology throughout; since they do not, the question has found no satisfactory answer, creating circumstances in which scientists are not so much informed by the evidence as forced to choose which evidence should be given priority (Van Orden et al., 2001). Similar problems have appeared in the study of reading disorders, where dyslexic readers were found to deviate from normal readers in innumerable cognitive measures, such as basic perceptual ability, working memory, attention, or temporal processing, yet none of these measures on its own provides a sufficient criterion for the diagnosis of dyslexia (Wijnants et al., 2012). This persistent non-convergence of findings from laboratory research on reading was noticed long ago (Van Orden et al., 2003), and it is reflected in critical assessments that a general theory of reading is nowhere in sight despite more than 100 years of research effort (Rayner, 1998; Rayner and Reichle, 2010).

However, it is worth emphasizing once more that this state of affairs is already observed in carefully controlled experimental laboratory tasks of reading, and no research program has ever systematically investigated whether these tasks capture a process anywhere akin to naturalistic reading. One of the challenges for reading research in the near future will be to weed out which of the laboratory tasks used to study reading actually generalize to more naturalistic reading situations, and which are confined to a laboratory life of their own (Hunt and Vipond, 1991; McNerney et al., 2011; Wallot et al., 2013). Three aspects of reading have come to light that call into question whether reading research has placed its bets on good assumptions: (1) the role of idiosyncrasies in reading, (2) the generalization of research findings across different languages, and (3) the generalization of laboratory results to more naturalistic text reading situations.

Idiosyncrasies in reading behavior have long been known in psychology (Rayner, 1998). What is meant by an idiosyncratic process here does not even lie in the realm of individual interpretations of a text or the like. Rather, what is meant are systematic quantitative or qualitative differences between individuals in measures of the reading process, such as which particular passages of a text evoke emotional responses during reading, how readers move their eyes across a text (for example, with few long fixations versus many short ones), how often they re-read passages, or simply how they differ in reading speed (Miall and Kuiken, 1994; Rayner and Pollatsek, 1994). Even the latter, simple measure can reveal astonishing differences. For example, in one of my own studies of text reading (Wallot et al., 2013), participants read simple fictional prose. None of the participants knew the text beforehand, and all were college students and literate native readers of the language in which the text was presented. All participants had to answer a comprehension questionnaire after reading. On average, reading the text took about one hour. However, the slowest reader took almost 2.5 h to read the text, while the fastest went through it in a little more than 17 min. Yet both of them were perfectly able to answer the comprehension questionnaire (i.e., they answered all questions correctly). Current theories of reading, however, are inherently theories of the average reader. Neither quantitative (such as reading speed) nor qualitative (such as eye-movement patterns) idiosyncratic differences between readers find any deep consideration in the well-developed theories and models of reading (Grainger and Jacobs, 1996; Coltheart et al., 2001; Pollatsek et al., 2003; Engbert et al., 2005). Moreover, these idiosyncrasies can usually only be observed in somewhat naturalistic, or at least complex, reading tasks that afford the reader at least some degrees of freedom, such as connected text with an overarching or emergent meaning (Hunt and Vipond, 1991; Sikora et al., 2011). However, reading tasks, like most tasks used in experimental psychology, are explicitly designed to minimize idiosyncratic behavior as much as possible, and insofar as idiosyncrasies make for differences in the reading process or in reading outcomes, they show up as an error term in the experimental study of reading. Current research does not seem to have the conceptual tools to deal with strong idiosyncratic processes in reading, because it assumes that the reading process works fundamentally the same way across people in its details, and individual differences are often seen as obstacles to this kind of understanding (Lupker et al., 1997).

While the case of idiosyncratic reading behavior hints at the limits of the current framework with its search for context-free mechanisms, this framework has been put more directly to the test in cross-linguistic investigations of reading: a recent debate has arisen around reading universals, that is, around those aspects of the reading process that are invariant across languages and reading situations (Frost, 2012). After all, if a well-definable, context-independent cognitive architecture supports reading as a cognitive activity, then its building blocks should be the same no matter the language. This also follows from our general consideration that reading has evolved as a means of human communication via written texts, and that, whatever the specific details of a writing system, all these systems are powerful enough to express the same ideas, and ideas of the same degree of complexity. It has turned out, however, that what were thought to be basic building blocks of the reading process, such as letter-position invariance within a word, do not occur the same way across languages. For example, research showed that the reading process is relatively robust to the scrambling of letter positions within a word, which seemed to point to a fundamental property of the mental (and neurophysiological) processes involved in reading (Frost, 2012). However, it was subsequently shown that this is mostly a phenomenon of European languages and does not pertain to other languages such as Hebrew, where letter position is of great importance (Velan and Frost, 2007; Velan et al., 2013). We will set aside, for now, the question of what a concept such as letter position would even mean for a logosyllabic writing system such as that of Mandarin Chinese.

Another strong motivation for reconsidering the current framework in reading research comes from a few recent studies that have investigated how far the putatively basic constituents of the reading process identified in laboratory research actually apply to more naturalistic reading, that is, reading of connected texts. To that end, the effects of lexical variables such as word frequency were investigated during the reading of connected texts several hundred or several thousand words long. As discussed above, the word frequency effect is quite central in contemporary reading research; it states that words that occur more often in a language (i.e., have a higher word frequency) are read faster. Word frequency is thought to capture a mapping between the visual appearance of a word and the associated memory strength for that word, again resting on the assumption of a word-level semantics in which the meaning of a word can be defined as a stable and elementary property of written language (Coltheart et al., 2001).

In two studies using different texts and reader populations (Wallot et al., 2013; Wallot et al., accepted), it turned out that lexical variables such as word frequency explain only between about 0.001% and 1.0% of the observed variance in reading times (compared to the 10–25% commonly observed in experimental studies of reading). So more is different, and taking two steps in a row might not be so similar to a day's march after all, at least in the case of reading words and texts.
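Because the argument turns on these percentages, it may help to recall what "proportion of variance explained" means for a single predictor. The minimal Python sketch below computes it as the R² of a simple least-squares regression, which equals the squared Pearson correlation between predictor and outcome. This generic textbook formula is offered as an assumption about the kind of quantity being reported; it is not the analysis used in the cited studies, which may have relied on other models.

```python
from math import fsum


def variance_explained(predictor, outcome):
    """Proportion of variance in `outcome` explained by a single linear
    predictor: the R^2 of a simple least-squares regression, which equals
    the squared Pearson correlation."""
    n = len(predictor)
    if n != len(outcome) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = fsum(predictor) / n
    my = fsum(outcome) / n
    sxy = fsum((x - mx) * (y - my) for x, y in zip(predictor, outcome))
    sxx = fsum((x - mx) ** 2 for x in predictor)
    syy = fsum((y - my) ** 2 for y in outcome)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and outcome must both vary")
    return sxy * sxy / (sxx * syy)


# Hypothetical toy values, for illustration only
print(variance_explained([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(variance_explained([1, 2, 3], [1, 3, 2]))        # 0.25
```

Multiplying the result by 100 gives a percentage; a predictor explaining 1% of the variance in reading times would return 0.01, leaving 99% of the variability to other sources.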

Other variables from the principal theories of reading in psychology have not fared better when applied to text reading (such as situation-model variables that capture central aspects of sentence-level reading; McNerney et al., 2011). This strongly suggests that if one wants to build a theory of reading, other avenues may have to be pursued, or, as the authors of one study that investigated the transfer of psychological theories to connected text reading put it: "(...) we suggest, perhaps not surprisingly, that there is continued room for theoretical development to better capture the qualities of language that influence the ease with which it is understood" (McNerney et al., 2011).

Of course, the question is where to look for such room. Perhaps the right components of the reading process have simply not been found yet, and another lexical or sub-lexical feature will eventually solve the current problems. Alternatively, the very kind of stability that reading research expects of language and texts, and the very basic assumptions about what reading really is, might have to be reconsidered. In the next section, we will pursue the second route and evaluate the plausibility of some of the principal assumptions about reading made in psychological reading research.

## **DO WORDS HAVE MEANING?**

As laid out before, contemporary psychology of reading views text as a form of communication that is decomposable at the word level. That is, it assumes that stable word properties exist and that understanding written text is first and foremost a decoding problem. This view of texts is complemented by component processes in perception (e.g., word reading times, fixations, pronunciation times) and in cognitive architecture (e.g., the mental lexicon) that mirror and match the supposed word-level structure of the text, and vice versa (Wallot and Kelty-Stephen, 2014). As noted above, in contemporary reading research the matching of a word's visual input to its representation seems to amount to comprehending that word (a view that even seems to be shared by critics of the contemporary account; cf. Van Orden and Kloos, 2005). However, the lexical features that provide the informational basis for this mapping of visual input to representation are fairly static (within the human lifespan), and for them to be of any value in this decoding process, the other end, that is, the meaning of a word, needs to be similarly stable and static. The question is whether such a view of meaning in written language is plausible, or, asked differently: in what sense can we say that words have meaning?

Before discussing the problem, a note is in order: clearly, if meaning is central to reading, then a definition of meaning is necessary in order to make headway toward a thorough theory of reading. My intuition is that a viable definition of meaning would need to go beyond language and have a wider basis in the interactions between organisms and their environment (e.g., Turvey and Carello, 1981), not least because such interactions were once at the origin of reading and writing (Nissen, 1993). I will not address the problem of defining meaning in the current article, however, and doing so is not necessary for the argument I want to make. What I want to examine is merely the option space available for a workable definition of meaning (for written language), and whether this option space includes the possibility of defining meaning in terms of elemental features of words. Put differently: irrespective of the current state of reading research, and even if one could simply wish for the findings that reading experiments would produce, would it make sense to define meaning as a stable property of words, as is required if one wants to explicate a theory of reading that is driven by objective features of written language?

As we have laid out, reading serves a communicative process, and hence reading ultimately needs to be about meaning. Furthermore, as I have tried to show, the psychology of reading conceives of the basic constituents of a text, and of the perceptual and mental processes that act upon them, mainly on the word level; since these constituents need to be stable and context-independent, this needs to be the level where meaning resides, encapsulated in words. There are several instantiations of how word meanings can be conceptualized in the different reading models, but the different versions are equivalent in that they seem to assume a locally definable meaning that will at best change incrementally in iterative learning processes<sup>1</sup>. The dominant idea is that, in a first step, an associative process relates visual word properties to some inner memory trace, such as those provided by conceptual or neural networks. In this way, the visual features that make up a word are connected to the word's content (= meaning).

Several organizations of word meaning have been proposed: either in the form of a mental lexicon that possesses entries that are elemental meanings (i.e., the word "w1" has meaning "m1") or definitions (i.e., the word "w1" possesses the definition "[w2, w3, w4]"). The definition can be a well-defined set of words (such as in strict views of the mental lexicon) or, again, some form of associative network or matrix (where weights potentially connect one word to all other words, as reflected for example in high-dimensional theories of language such as HAL or LSA; Lund and Burgess, 1996; Landauer et al., 1998). Either way, the end product needs to yield a stable word content in order to lawfully connect the lexical word features to a word's meaning.

<sup>1</sup>As a side note, psychological investigations have also found evidence that word meaning is more flexible than that, including changes within a single "learning instance" or less; however, these findings have persistently remained outside the scope of the dominant and well-developed theories of reading in psychology.
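To make concrete what a "stable word content" looks like in a high-dimensional associative model, the following Python sketch builds LSA-style word vectors from a term-by-document count matrix via truncated singular value decomposition. It is a minimal illustration under loudly stated assumptions, not the HAL or LSA implementations cited above: the four-sentence corpus is invented, no term weighting is applied, and keeping k = 2 latent dimensions is an arbitrary choice. What it shows is structural: once the matrix is decomposed, each word type is looked up as one fixed vector, whatever sentence it later appears in.

```python
import numpy as np

# Hypothetical toy corpus, invented for illustration only.
documents = [
    "the restroom is at the back of the restaurant",
    "the restaurant serves dinner at the back",
    "the advertisement shows a new restroom fixture",
    "the advertisement shows a new dinner menu",
]

# Vocabulary and term-by-document count matrix (rows = words, columns = documents).
vocab = sorted({w for doc in documents for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(documents)))
for j, doc in enumerate(documents):
    for w in doc.split():
        counts[index[w], j] += 1

# Truncated SVD: keep k latent dimensions (k = 2 is an arbitrary assumption here).
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * s[:k]  # one row per word type

def word_vector(word):
    """Return the single, context-independent vector stored for `word`."""
    return word_vectors[index[word]]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"restroom ~ restaurant:    {cosine(word_vector('restroom'), word_vector('restaurant')):.3f}")
print(f"restroom ~ advertisement: {cosine(word_vector('restroom'), word_vector('advertisement')):.3f}")
```

Note that `word_vector` takes no sentence argument at all: the lookup is the same for every occurrence of a word, which is exactly the kind of word-level stability the argument here calls into question.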

Hence, the first question we need to ask is whether written language is decomposable into elemental meanings on the word level. The question of meaning has not been discussed extensively among psychological researchers (Schvaneveldt, 2004). However, one occasionally finds a reference to Frege's work, citing his axiom of composability (Bußmann, 1996), which states that the (literal) meaning of a sentence is composed of the meanings of its constituent words and their syntactic arrangement in the sentence. On closer inspection, however, Frege's thought on language is somewhat more complex than his axiom of composability suggests, for Frege complemented it with the axiom of context dependence, according to which a word cannot be defined or understood without knowing the sentence in which it is embedded. At first glance, these two axioms seem contradictory. However, this contradiction can be resolved if one does not insist on the priority of the axiom of composability: if one knows the sentence in which words are embedded, one can provide the meaning or definition of those words as they are used in that sentence. This seems an acceptable statement, and one that could in principle be formalized, for example in impredicative logic (Aczel, 1988); hence, this resolution might even lend itself to a formal description of meaning in text<sup>2</sup>. The problem, however, is that this solution also does away with the word level as the basic level of the text and of the reading process, and with words as carriers of a stable, well-definable, elemental meaning.

Another philosopher of language who was concerned with an elemental, stable level of meaning was Wittgenstein in his early work (Wittgenstein, 1922/1983). He proposed the idea of elemental sentences that form stable units of meaning (i.e., basic facts). However, as the term "elemental sentence" already conveys, the elements of meaning here are situated on the level of statements, incorporating relations between words in a sentence: not elemental meanings in the sense of a word-token semantics, but meanings that arise from a second step that relates the words to each other by virtue of their membership in that sentence. A word can be considered an "elemental sentence" only under certain circumstances, and, as Wittgenstein reasoned, elemental sentences cannot be further reduced to their constituents (= words) and still be meaningful. This is due to the symbolic quality of words, which makes them arbitrary units (a view of language with which many reading researchers would agree). They are mere placeholders, variables in a logical relationship, and as such cannot be meaningful, because an arbitrary symbol by definition does not have a specific meaning. Accordingly, arbitrary symbols that stand in some relationship to each other will not be meaningful either.

This insight poses a major problem for the conception of meaning in many reading theories: what all their conceptions of meaning have in common is not only that they are elemental, usually on the word level, but also that they are situated within a closed symbolic system, in which meaning is defined either as an elemental relation (the meaning of word "w1" = "m1") or as a composed definition (the meaning of word "w1" = "w2, w3, w4"). And as Wittgenstein also points out, defining a word by mere means of other words ultimately completes a perfectly tautological cycle devoid of meaning. Substituting associations for explicit definitions will not solve the problem either, for, as Høffding already pointed out for the case of perception, associations alone cannot do any work and cannot create intelligence (Calkins, 1896), or meaning. How, then, can the problem be solved?

## **USING LANGUAGE**

We left off with Wittgenstein's description of the problem that meaning cannot be obtained within an encapsulated system of logical relations among symbolic constituents. How did Wittgenstein solve the problem? In his early work, Wittgenstein postulated a so-called picture theory of meaning (Wittgenstein, 1922/1983), stating that language is meaningful only insofar as it refers to facts, that is, to states of affairs in the world. Compared with the word-semantics view of meaning that seems to characterize contemporary reading research, the picture theory of meaning provides a twofold extension. First, it scales the level of stability up from word meaning to the level of statements. This is necessary because single words will usually not suffice to refer to states of affairs in the world; a set of words and their relations to each other are needed to provide sufficient reference. Second, it brings in meaning as a property that is co-determined by an environment outside of language, outside of a text, not within it. Interaction with the world is now a necessary precondition for meaning in language, a view that is also held by, for example, (Gibsonian) ecological psychology (Turvey and Carello, 1985). Still, there are also similarities: after all, Wittgenstein's early thinking revolved around elemental sentences that describe states of affairs in the world as facts, that is, basic facts that are stable. And as soon as a proper reference (i.e., a proper sentence) has been constructed to specifically refer to a fact, it provides a meaningful building block. Hence, one could now try to operationalize elemental sentences (instead of elemental word properties) for a theory of reading. However, the idea of a basic meaning on the level of elemental sentences has received two blows. The first came with the general demise of logical positivism, more specifically the problems of verification (Popper, 2005), which complicated the idea of a fact as a basic and stably describable property of the outside world. The second came with the development of Wittgenstein's own thoughts about language in his late work.

In the Philosophical Investigations (Wittgenstein, 1953/2010), Wittgenstein expands and in part revises his earlier positions. In further exploring the role of linguistic and non-linguistic context in language understanding, he arrives at a more complicated picture of meaning, one that is not even stable on the level of elemental sentences but is inherently context-dependent. Wittgenstein shows that there are a great number of contextual layers above the word or sentence level that need to be taken into account in order to know what a word or a sentence means (such as the larger set of utterances or paragraphs in which a sentence is embedded, the shared individual and cultural history between interlocutors or between readers, authors, and texts, or the affordances of the actual and remote environment in which communication takes place). This takes us very far from ideas of a general kind of meaning that could be encapsulated in small bits of written language, and moves us into the realm of communication science and social interaction research, which have taken up language use as a fundamental idea in communication. Put in somewhat negative terms, one cannot get the meaning of a word or statement without knowing how it is used in a particular instance.

<sup>2</sup>For the sake of clarity, it should be noted here that such a formalization would not lead to a simple primacy of one axiom over the other, or of the sentence level over the word level, but would rather yield a mutual relationship between the two, a relationship in which neither level can be reduced to the other.

In communication research, the idea of language use refers to the many different ways in which contextual constraints are effective or are utilized by interlocutors to arrive at a meaningful exchange. Some examples are the establishment of common ground, patterns of turn-taking during conversation, non-verbal cues and gestures, and linguistic alignment during conversation (Clark, 1996; Pickering and Garrod, 2004). These concepts are inherently relational; they are not so much based on meaningful elements in language, but rather establish meaning by virtue of arranging and re-arranging the elements. Their relational quality is absolute in the sense that, if one removes or exchanges an interlocutor or some relevant environmental feature of a conversation, they can no longer be defined in the same way, or they change their meaning.

Arriving in this way at language use as a fundamental aspect of sense making raises two problems for the current discussion of meaning. First, all these aspects of language use have been described for communication during online interaction, which on the surface is not quite what reading looks like; at the least, one needs to motivate an analogous way of conceiving the reading process. Second, among the many aspects of language use in social interaction, how can we find one that is readily applicable to the reading situation and can be thought of as a fundamental aspect of language use in reading? Regarding the first question, i.e., how to motivate the analogy to conversation, we can ask whether there are certain similarities between the process of online communication (such as dialog) and the process of reading that put the two in the same ballpark.

Even though these proposals have largely been put forward with more interactive situations in mind, there seem to be some basic aspects that the two share: just as in a conversation where what is said and understood depends on the intentions of the interlocutors, reading depends on the intentions of the reader. This has consequences for how reading unfolds over time, behaviorally and emotionally, and what is remembered afterward from a text (Hunt and Vipond, 1991). Furthermore, reading depends on the assumed intentions of the author. The intentions of authors have also been carefully studied to make sense of written language across centuries of exegesis and are necessary in order to understand the meaning of ironic, satirical or metaphoric statements, which cannot be understood from their linguistic surface structure alone (Gibbs, 2002).

Similarly, Davidson has pointed out that "The possibility of language, thought, and interpretation depends on the triangular situation which relates speaker and listener, and both to a shared object in the public world which they can observe together, and to which they can observe each other's responses. Such a triangular situation exists in literature. Interpretations of a text will vary from person to person, culture to culture, and century to century. However, it does not follow that a text means whatever its readers take it to mean, since disagreements about the meaning of a text are only possible against a shared basis of agreement" (Davidson, 1993). This highlights how reading, as a communicative process, is related to other forms of communication, such as conversation, and how intersubjective contexts necessarily factor into the reading situation.

Turning back to contemporary reading research, these assertions raise the interesting question of how one should judge, from this perspective, the experimental situations that are predominantly used to study reading. As most reading tasks feature reading of isolated, random letter strings, or a few sentences at most, many of the aspects that are considered necessary preconditions for the reading process in literary studies and the philosophy of language are virtually absent in empirical investigations of reading.

Setting these doubts aside, reading and conversation not only have interesting conceptual commonalities as communicative processes, but also share similarities in their dynamic structure, that is, in basic patterns of behavior that can be observed in both. Both exhibit feedback loops in behavior: in conversation, interlocutors go back and forth to clarify terminology and topic until common ground is established (Clark, 1996). In reading, similar feedback loops are evident to secure proper understanding of a text, such as rereading previous text passages (Rayner et al., 2006), together with reflective thought processes that now substitute for communicative exchange. Similarly, when understanding is jeopardized or a new topic is introduced, one observes disruptions of otherwise rather "smoothly" proceeding processes in conversation as well as in reading. As we will see in the next two sections, such dynamic aspects of the reading process will be important for providing a measure of basic aspects of language use that can be employed in reading research. But first we need to find a core concept of language use that motivates a particular operational definition for reading research and provides an interpretational dimension for its measures. In the next section, I will briefly introduce Wittgenstein's concept of "language games" and show how it captures a basic aspect of language use that can be applied in the context of reading.

## **LANGUAGE GAMES: A FUNDAMENTAL ASPECT OF LANGUAGE USE**

Wittgenstein introduces "language games" in his Philosophical Investigations as a concept that holds together the great diversity of language activities that can be observed. After describing the sensitivity of meaning in language to the various contextual constraints that one can identify, Wittgenstein reasoned that natural language use is not governed by a general process that underlies all of those instances, but that it is inherently dependent on historical, social, and other contextual factors (Wittgenstein, 1953/2010). Hence, natural language is not a homogeneous category that can be defined by a small set of general language rules or language elements that hold across all contexts; rather, it is composed of different classes of language use, for which Wittgenstein coined the term "language game." In analogy to real games, Wittgenstein pointed out that language games possess rules according to which language is used within each game, but that different games differ in terms of the rules that govern use. Furthermore, the rules observed in each game are emergent. That is, even though they seem to have real causal power within the same family of language games, they do not point to any fundamental principles of how language generally works. The rules do not transcend the boundaries of the particular language game within which they are observed.

Again, when reading the Philosophical Investigations, it seems clear that Wittgenstein had social, dialogical situations in mind rather than a person reading a book. However, as we have briefly discussed in the previous section, reading and conversation share enough conceptual and behavioral similarity to warrant an analogy between language games and "reading games." If we try to apply the idea of language games to reading, then we are interested in how this concept relates to the current state of reading research and its challenges. Furthermore, we are of course also interested in how it can be utilized for the empirical investigation of reading.

Regarding the question of how reading games tie in with the state of affairs in reading research summarized at the beginning of this manuscript, the crucial point to take from the application of language games to reading is that the differences between two reading games (that is, reading situations such as reading silently or aloud, reading in English or in Hebrew, or reading prose or poetry) can be both quantitative and qualitative: reading games that belong to the same family exhibit similar rules, while reading games that belong to different families abide by potentially completely different sets of rules (Wittgenstein, 1953/2010). This makes it immediately understandable why the effect landscape observed in current reading research is so heterogeneous: contextual variations and experimental manipulations sometimes do not just constitute mere quantitative changes in the manipulated factor, but can effectively constitute a change from one type of reading game into another, changing reading qualitatively. However, in the absence of a definition of the boundary conditions within which a particular reading game is stable and only quantitative variations occur, an experimental variation that looks rather moderate from the perspective of the researcher (such as reading a word silently or aloud – Forster and Chambers, 1973) can tip a reading game, not just changing a particular aspect of that game, but turning it into a new game that works according to entirely different rules altogether. If one buys into reading games as a fundamental concept, this explains why reading research is so diverse, and why scientists keep being surprised by entirely unexpected context effects, for instance the relative insensitivity of the reading process to letter position in some languages (Frost, 2012), or the continued interaction between task aspects and reading performance that stress or suppress the role of phonology in reading (Van Orden and Kloos, 2005).

Also, I have stressed the tension between single-word and connected text reading. However, there are of course instances where naturalistic reading is the reading of only one word, and the concept of reading games gives a proper role to single-word reading in naturalistic settings: even though text reading seems to be the standard that reading research aims to explain, everyday life is of course full of examples where only one or two words convey information. For example, imagine somebody sitting in a restaurant who suddenly feels the urge to visit the restroom. This person will get up, look for a sign saying "restroom" (or "toilet," or "WC"), and let that sign guide their search. The sign then means "toilet," or perhaps "place where one can relieve oneself in private." However, the understanding of the word depends on the kind of triadic relation that Davidson (1993) described: the reader needs to have a certain intention with regard to the word "restroom," the word needs to be presented in a particular context, and the (proximate) author, in our case perhaps the restaurant owner, needs to have the intention of providing guidance for her customers. Now imagine that the person looking for a restroom sees the word "restroom" as part of an advertisement for American Standard water closets. This will surely evoke a different behavior and understanding of that word.

Furthermore, one must not forget that, for scientific psychology, investigating how a person understands a word means observing behavior in response to the word, and different contexts ("restroom" in the context of an actual restroom and "restroom" in the context of an advertisement) will elicit very different behaviors. While this is intuitive for understanding in these everyday situations, one has to wonder what participants in psychological laboratory tasks understand when they read random word lists on a computer screen: what are the participants' intentions regarding the text stimuli, and what intentions of the author factor into reading here?

Regarding the question of how the concept of reading games can be used for the empirical investigation of reading, the crucial point to take away is that what unifies reading games is not the presence of a particular set of rules that applies across all contexts (such as that high-frequency words, which occur often in a language, are read faster than low-frequency words), but that reading games are always rule-abiding, exerting a structuring effect on reading behavior (such as a set of locomotion patterns for somebody looking for a restroom in response to the sign "restroom"). This rule-abiding aspect of reading games can serve as a new foundation for reading research and solve the outlined problems, i.e., what the common core is across systematic idiosyncrasies in reading behavior between different readers, what is common across reading in different languages and situations, and how one can define the (text-)reading process in the absence of a strong and stable relationship between surface properties of the text (such as lexical word features) and reading behavior.

Regarding idiosyncratic differences in reading behavior between readers, the reading game conception would allow us to relax the degree of detail that we need to address explicitly, for example when investigating whether two readers who read the same text in very different ways (i.e., many short saccades and fixations with frequent regressions compared to few long fixations and saccades with few regressive eye movements) possess a similar or different degree of aptitude in reading or comprehending a text. From the reading game perspective, we could hypothesize that the better a reader abides by the rules of a reading game, the better she is able to utilize (con-)textual information, and thus the better a reader she is. That is, no matter what form the specific rules of a reading game take, the more one abides by the rules, the better the game is played: no matter what specific behavioral pattern a reader exhibits during reading, the key question is to what extent this pattern reflects a rule-abiding reading process.

We can make a similar argument for the other two cases, reading in different languages and text reading: reading in two languages might exhibit differences in how readers utilize certain word features (e.g., English vs. Hebrew), and some writing systems might even exhibit word features that others do not possess at all (e.g., English vs. Mandarin), but proficient reading should always be a structured, rule-abiding activity, as there will be some systematic aspects in the relation between reader and text within each writing system. Similarly, for text reading, we would require that a reader who reads and understands a text exhibits some form of structured behavior during reading, even if this structure cannot be pinned down to specific features of the text in a general manner.

Furthermore, the concept of rule-abidingness in reading games might be used to empirically sort out the boundary conditions up to which the same reading rules apply (e.g., that high-frequency words are read faster), or at which they change (e.g., that high-frequency words are read substantially faster in isolated word reading tasks, but not during connected text reading – Wallot et al., 2013). This can be done because reading games provide us with a bottom-up definition of the boundary conditions between any two qualitatively different reading contexts: when a reader moves from one reading game to another that differs in rules, this will lead to a disruption of rule-abiding behavior at the transition point between the two games, as the established rules of the first game are broken while the new rules of the second game are still being established. In contrast, if a reader moves between two reading games that share the same set of rules, the degree of rule-abidingness will remain stable at the transition point between games. However, in order to test such a hypothesis, one needs an operational definition of a reading game to measure such effects in empirical data. In the next and last section of this manuscript, I consider possible statistical operationalizations of the reading game concept and review some preliminary evidence of the utility of the concept.
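The hypothesized signature (stable structure within each game, a dip at the transition between games) can be illustrated with a windowed recurrence measure. The Python sketch below is a minimal, hypothetical example rather than an analysis of reading data: the signal is synthetic (two sine regimes with an unstructured noise segment between them, standing in for a reading measure such as word reading times), and the embedding dimension, delay, radius, minimum line length, window, and step are illustrative assumptions, not recommended settings. Structure is quantified as %Determinism from recurrence quantification analysis (Webber and Zbilut, 2005), i.e., the share of recurrence points that fall on diagonal lines.

```python
import numpy as np

def embed(x, dim=2, delay=1):
    """Time-delay embedding: row t is (x[t], x[t + delay], ..., x[t + (dim - 1) * delay])."""
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[i * delay : i * delay + n] for i in range(dim)])

def recurrence_matrix(x, radius=0.2, dim=2, delay=1):
    """R[i, j] is True when embedded states i and j lie within `radius` (Euclidean)."""
    pts = embed(np.asarray(x, dtype=float), dim, delay)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return dists <= radius

def percent_determinism(R, lmin=2):
    """%DET: percentage of recurrence points lying on diagonal lines of length >= lmin.

    R is symmetric, so only diagonals above the main one are scanned; the main
    diagonal (every state trivially recurs with itself) is excluded.
    """
    total = on_lines = 0
    for k in range(1, R.shape[0]):
        run = 0
        # Appending False acts as a sentinel that closes a run at the end of the diagonal.
        for value in np.append(np.diagonal(R, offset=k), False):
            if value:
                run += 1
            else:
                total += run
                if run >= lmin:
                    on_lines += run
                run = 0
    return 100.0 * on_lines / total if total else 0.0

# Hypothetical stand-in for a reading measure: a period-20 regime, an unstructured
# transition segment, and a period-7 regime.
rng = np.random.default_rng(0)
regime_a = np.sin(2 * np.pi * np.arange(300) / 20)
transition = rng.uniform(-1.0, 1.0, 80)
regime_b = np.sin(2 * np.pi * np.arange(300) / 7)
signal = np.concatenate([regime_a, transition, regime_b])

# Slide a window along the series and compute %DET in each window.
window, step = 60, 10
starts = list(range(0, len(signal) - window + 1, step))
dets = [percent_determinism(recurrence_matrix(signal[s : s + window])) for s in starts]

lowest = int(np.argmin(dets))
print(f"lowest %DET = {dets[lowest]:.1f} in the window starting at sample {starts[lowest]}")
```

Under these settings, windows lying entirely within either periodic regime should show %DET close to 100, whereas windows covering the noise segment should drop markedly. The measure is computed without being told which rule generates each regime, which is the property that makes this family of measures attractive for the argument above.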

## **READING GAMES: POSSIBLE OPERATIONALIZATIONS**

Following the conceptual clarification, what is needed is an operationalization of the concept of reading games or, more precisely, of the degree of rule-abidingness in measures of reading. Here, an important note is in order: speaking of rule-abiding reading behavior does not imply that the rules are consciously understood or explicitly followed by the reader. Rather, it means that measures of reading behavior in a particular context are not random, but follow systematic patterns that can be formulated as a rule by an observer, such as "the higher the frequency of a word, the faster that word is read."

The conception of rule-abidingness in a reading game presented here is in some respects very similar to the standard assumptions that go into current theories of reading: if a reader aptly reads a text (or word), comprehends it (sufficiently), and acts in accordance with it (for example, by moving their gaze further along a text, or by opening the door that leads to a restroom as opposed to the one that leads into the kitchen), this implies that the text constrains the reader's (reading) behavior, and that the reading behavior that can be measured is somehow coupled to the text. However, if the reading game analogy holds, then it will in many cases be problematic or impossible to formulate the other side of the equation in a general manner, that is, to which aspects of the text the reader's behavior is coupled and in what way.

As I have discussed, simple reading tasks that allow next to zero variation on the side of the reader and carefully try to investigate only one factor of reading at a time already fail to yield a stable pattern of general mechanisms that guide the reading process. One should not expect to fare any better as the complexity of the stimulus material and the degrees of freedom on the side of the reader are scaled up. Hence, quantifying the degree of order in reading behavior is an attempt to define the degree of coupling between text and reader when access to only one of the two is possible.

Hence, we seek a measure of rule-abidingness that tells us how structured a particular measure of reading (e.g., eye movements or the prosody of a voice recording during reading) is, without having to specify in detail where the structure comes from. Such measures can be taken from the toolbox of statistical physics (such as permutation entropy analysis, Bandt and Pompe, 2002; convergent cross mapping, Sugihara et al., 2012; recurrence quantification analysis, Webber and Zbilut, 2005; or fractal characteristics, Wallot et al., 2012). These methods provide measures of the degree of temporal structure and predictability in time-series, and could lend themselves to an operational definition of the rule-abidingness of language games, because they extract and quantify the degree of temporal structure without the need to define the rules *a priori*. Furthermore, they are non-linear methods that are able to detect rules in time-series that do not follow any simple, obvious patterns, such as chaotic time-series that appear random but are in fact deterministic (Zbilut et al., 1998). This is important because the crux with strong unexpected context effects is that, when they occur, the rules of the new context are not yet well understood and thus cannot easily be formulated on the grounds of the rules observed in previous contexts. However, as described above, by simply quantifying the degree of temporal structure, we are able to distinguish between different reading games, even in the absence of more detailed knowledge.
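As one example from this toolbox, permutation entropy (Bandt and Pompe, 2002) replaces each run of `order` consecutive values with its ordinal pattern (which value is smallest, next, and so on) and takes the Shannon entropy of the pattern frequencies. Normalized by log(order!), values near 0 indicate highly regular ordering and values near 1 indicate that all orderings are about equally likely. The Python sketch below is a minimal implementation under stated assumptions, not a reference implementation: the function name, the default `order` and `delay`, breaking ties by position, and the three test series are all illustrative choices.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D series (after Bandt and Pompe, 2002).

    Each group of `order` values spaced `delay` apart is mapped to its ordinal
    pattern (the argsort of the values); the Shannon entropy of the pattern
    frequencies is returned, optionally divided by log(order!).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    # A stable argsort breaks ties by position (an assumption; ties are rare in continuous data).
    patterns = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay], kind="stable"))
        for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    h = float(np.sum(probs * np.log(1.0 / probs)))
    return h / math.log(math.factorial(order)) if normalize else h

# Usage on hypothetical inputs: a monotone ramp, a sine wave, and white noise.
rng = np.random.default_rng(1)
ramp = np.arange(500.0)
sine = np.sin(2 * np.pi * np.arange(500) / 20)
noise = rng.normal(size=500)
for name, series in [("ramp", ramp), ("sine", sine), ("noise", noise)]:
    print(name, round(permutation_entropy(series), 3))
```

A monotone ramp produces a single ordinal pattern and hence an entropy of 0, a sine wave sits in between, and white noise comes close to 1. Like the recurrence measures, this quantifies how ordered a series is without specifying which rule produces the order.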

The example in **Figure 1** illustrates this point by showing how such a measure of temporal structure can be used to capture transitions between two qualitatively different behaviors. The data were generated by the Lorenz system, an equation system that consists of three coupled differential equations (**Figure 1A**). Depending on its parameterization, the system can exhibit different types of dynamics. When the parameters are changed accordingly, the system transitions from one type of behavior (a stable fixed point) to another (oscillating behavior). This is also evident in a one-dimensional "measurement" of the system (**Figure 1B**), which could be thought of as analogous to a time-series of word reading times during text reading, for example. To quantify the degree of temporal structure, the one-dimensional time-series can be represented as a recurrence plot that shows the degree of structure within that time-series (**Figure 1C**). From the recurrence plot, one can then derive statistics of temporal structure (Webber and Zbilut, 2005) in the original time-series (**Figure 1D**). As can be seen, both types of behavior exhibit a high degree of temporal structure, but the transition point between them is marked by a brief loss of that structure, indicating a change from one behavior to the other.

This example not only illustrates how a change in temporal structure can be detected; it also highlights another important point about the reading game concept. As mentioned earlier, Wittgenstein held that the rules observed in language use do not point to the foundations of language, but are themselves emergent features of language-use-in-context. Similarly, the Lorenz system can exhibit oscillatory behavior, but the fact that oscillations are observed is not straightforwardly informative about its architecture; for example, the system does not include a sinusoidal function. When the behavioral rules are emergent, such as the oscillations in the Lorenz system, and the system moves from one type of behavior to another, then the transition is not smooth, creating an abrupt drop in the structuredness of behavior (Kelso, 1995). Hence, if the rules that govern reading behavior are similarly emergent, as the concept of reading games holds, then such transitions will necessarily occur when shifting between two qualitatively different reading contexts, predicting the formation of a new reading game.

**FIGURE 1 | (A)** 3D illustration of switching between two attractors (i.e., two qualitatively different types of behavior) in the Lorenz system, a coupled differential equation system consisting of three equations. When going through a phase-transition (i.e., moving from one attractor to another), the system does not show a smooth or instantaneous transition between the two attractor-states, but produces a transition period with major displacement. **(B)** Time-series of a single dimension of the Lorenz system that shows the behavior in the first attractor, the behavior in the second attractor, and the transition phase. The behavior within each attractor looks very different, and the transition phase is marked by a period of increased fluctuation. **(C)** Recurrence plot (RP) of the time-series in **(B)**. Recurrence plots are 2-dimensional representations of a time-series in which time moves from the lower-left part of the plot to the upper-right part along the diagonal of the matrix. Dark areas in the plot indicate a high degree of temporal structure in the behavior of the time-series. White areas represent the absence of temporal structure in the time-series. The RP is similar to an autocorrelation plot, where time at lag 0 runs along the diagonal. As one moves away from the diagonal toward the upper-left or lower-right part of the plot, one sees time-lagged behavior. Hence, the plot shows that the initial behavior (i.e., behavior in attractor 1) is highly structured, indicated by the dark area in the lower-left. Similarly, behavior in attractor 2 is highly structured, indicated by the striped area in the upper-right. However, the transition period between the two attractors is marked by a brief absence of structure. **(D)** Illustration of an RP-based measure of structuredness (%Determinism) of the time-series in **(B)**. For both attractors, 1 and 2, the time-series possesses a high degree of temporal structure, but the transition between both attractors is marked by a loss of structure, indicated by the dip in %Determinism.
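The kind of pipeline behind **Figure 1** can be approximated in a few lines. This is an assumption-laden reconstruction, not the code used for the figure. It integrates the Lorenz equations (dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz) with a fourth-order Runge-Kutta scheme. It then switches ρ from 10, where trajectories settle on a stable fixed point, to 28, where the system oscillates irregularly (chaotically). Next, it builds a recurrence matrix from the x-coordinate alone. Finally, it computes %Determinism (the share of recurrent points lying on diagonal lines) in sliding windows. Several choices are purely illustrative: the parameter values, the recurrence radius, the window sizes, and the omission of the time-delay embedding that RQA usually applies.

```python
def lorenz_x(rho, steps, dt=0.01, sigma=10.0, beta=8.0 / 3.0,
             state=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz equations with fourth-order Runge-Kutta.

    Returns the x-coordinate at every step (the one-dimensional
    "measurement") and the final (x, y, z) state, so that a second run
    with a different rho can continue where the first one stopped.
    """
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    xs, s = [], tuple(state)
    for _ in range(steps):
        k1 = deriv(s)
        k2 = deriv(shifted(s, k1, 0.5 * dt))
        k3 = deriv(shifted(s, k2, 0.5 * dt))
        k4 = deriv(shifted(s, k3, dt))
        s = tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
        xs.append(s[0])
    return xs, s


def recurrence_matrix(series, radius):
    """R[i][j] = 1 if |x_i - x_j| <= radius (no time-delay embedding)."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= radius else 0 for j in range(n)]
            for i in range(n)]


def percent_determinism(rm, lmin=2):
    """%DET: percentage of recurrent points (main diagonal excluded) that lie
    on diagonal lines of at least `lmin` points. Only the upper triangle is
    scanned; the matrix is symmetric, so the ratio is the same."""
    n = len(rm)
    total = on_lines = 0
    for k in range(1, n):                 # offset of the diagonal
        run = 0
        for i in range(n - k + 1):        # the extra step flushes the last run
            if i < n - k and rm[i][i + k]:
                run += 1
                continue
            total += run
            if run >= lmin:
                on_lines += run
            run = 0
    return 100.0 * on_lines / total if total else 0.0


def windowed_det(series, window, step, radius, lmin=2):
    """%DET in sliding windows, to track changes in structuredness over time."""
    return [percent_determinism(recurrence_matrix(series[i:i + window], radius), lmin)
            for i in range(0, len(series) - window + 1, step)]


# Illustrative parameter values (assumptions, not those behind Figure 1)
before, state = lorenz_x(rho=10.0, steps=1500)          # settles on a fixed point
after, _ = lorenz_x(rho=28.0, steps=1500, state=state)  # irregular oscillation
series = before[::5] + after[::5]                       # 600 samples
dets = windowed_det(series, window=100, step=20, radius=1.0)
```

Plotting `dets` against window position shows how structuredness changes as the system leaves one regime and enters the other. Whether the dip appears, and how sharp it is, depends on the radius, the window size, and whether an embedding is used.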

To illustrate the effect for reading, I collected a set of pilot data as a proof-of-concept of the reading game proposal, namely that the degree of temporal structure in reading can provide a bottom-up definition of the boundaries between two reading games. In a self-paced reading task, participants read a text of 1099 words. The first half of the text was randomized, effectively providing the kind of individual word reading task used in most reading studies, while the second half of the text was left intact. The time-series of reading times was represented as a recurrence plot (as in **Figure 1**), from which the temporal structure within the reading time-series was computed. As can be seen in **Figure 2**, individual word reading and text reading appear as two qualitatively different tasks: while each of them shows a specific global reading pattern, there is basically no overlap between those patterns, as indicated by the white areas to the upper-left and lower-right of the main diagonal. Furthermore, this distinction also predicts a "change in rules" between the two tasks: while word frequency plays a reliable role in individual word reading (*R*<sup>2</sup> = 0.046; *p* < 0.001), its effect decreases to a marginal one in text reading (*R*<sup>2</sup> = 0.006; *p* = 0.061).
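The *R*<sup>2</sup> values above come from relating reading time to word frequency separately in the scrambled and the intact half of the text. Below is a minimal sketch of that comparison for a single predictor, assuming a plain ordinary least-squares fit. The original analysis may have differed, for example by using log-transformed frequencies or a mixed model. The function names are hypothetical, and p-values are not computed here.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y = a + b * x with one predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance to explain or no variance in the predictor
    return sxy * sxy / (sxx * syy)  # equals the squared Pearson correlation


def r_squared_by_half(frequency, reading_time):
    """Fit the first (scrambled) and second (intact) half of the text separately."""
    mid = len(reading_time) // 2
    return (r_squared(frequency[:mid], reading_time[:mid]),
            r_squared(frequency[mid:], reading_time[mid:]))
```

Applied to per-word frequency and reading-time lists, `r_squared_by_half` returns one *R*<sup>2</sup> per reading condition. A drop from the first value to the second is the "change in rules" described above.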

Furthermore, in a recent set of studies, we used recurrence quantification analysis on text reading data to assess the reading performance of children and adults and to predict text comprehension. In one study (O'Brien et al., 2014), children (2nd, 4th, and 6th graders) and adults read a simple children's story silently or aloud in a self-paced manner. That is, participants pressed a button to reveal each new word of the story, read the word, and pressed the button again to reveal the next word of the text. Hence, the interval between two consecutive button presses estimated the reading time of that word (Just et al., 1982). Recurrence measures that quantify the temporal structure in reading times were found to increase with age and to distinguish between readers of different ages better than reading rate did. In an investigation of reading process predictors of text comprehension (Wallot et al., accepted), we found that the degree of temporal structure of reading times was a good predictor of text comprehension in both silent and oral reading, and again a better one than reading speed. A third study found that repeated text reading, which is thought to increase reading fluency for that text, led to increases in the temporal structure of reading times for less skilled readers, even though the pattern of effects was not as clear as in O'Brien et al. (2014) or Wallot et al. (accepted).

Generally, these results fit with the reading game conception, in which rule-abidingness (as measured by the degree of temporal structure in reading times) indicates mastery of a reading game and should thus relate positively to reading skill and text comprehension. Moreover, these findings tie in with new research on conversation during dyadic interaction, where the degree of shared temporal structure in utterances between interlocutors correlated positively with the success of the interaction in a shared decision-making task (Fusaroli and Tylén, under review). This suggests that temporal structure lends itself as a measure of rule-abidingness, which can serve as a general indicator of skilled language use, be it reading or conversation, in the language game conception.
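Shared temporal structure between two interlocutors can be quantified with cross-recurrence, the two-series counterpart of the recurrence plot. The sketch below assumes categorical data (words), so a "recurrence" is simply an exact token match between speaker A at position *i* and speaker B at position *j*. It is a generic illustration, not the specific measure used by Fusaroli and Tylén. The function names and the example utterances are made up.

```python
def cross_recurrence(a, b):
    """Cross-recurrence matrix: CR[i][j] = 1 if token a[i] equals token b[j]."""
    return [[1 if x == y else 0 for y in b] for x in a]


def cross_recurrence_rate(a, b):
    """Proportion of all (i, j) pairs that recur."""
    cr = cross_recurrence(a, b)
    return sum(map(sum, cr)) / (len(a) * len(b))


def diagonal_profile(a, b, max_lag):
    """Recurrence rate along each diagonal; lag k pairs a[i] with b[i + k]."""
    profile = {}
    for k in range(-max_lag, max_lag + 1):
        pairs = [(a[i], b[i + k]) for i in range(len(a)) if 0 <= i + k < len(b)]
        profile[k] = sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
    return profile


# Made-up utterances, tokenized by whitespace
speaker_a = "the red block left of the blue".split()
speaker_b = "the red one left of it".split()
rate = cross_recurrence_rate(speaker_a, speaker_b)
profile = diagonal_profile(speaker_a, speaker_b, max_lag=2)
```

A peak in the diagonal profile at lag 0 indicates alignment at matching positions. Peaks at other lags indicate that one speaker tends to reuse the other's words with a delay.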

Of course, the evidence presented comes from a single case of reading or is based on retrospective interpretations of already published work, and proper prospective data to test some of the basic predictions of the language game conception still need to be collected. Nevertheless, these findings provide some motivation to regard reading games as a fruitful concept that captures a fundamental property of reading and circumvents some of the conceptual problems of contemporary theories of reading, especially their treatment of meaning. To explore the value of this conception, initial investigations are needed to address basic measurement issues, such as which measures of temporal structure (e.g., RQA, CCM, entropy measures, correlation dimensions) provide sensitive and reliable operationalizations of rule-abidingness, and whether and how they converge. Subsequent investigations might then shed light on more specific hypotheses, such as whether rule-abidingness can be used to predict differences between qualitatively different reading tasks, or serve as a general metric of skilled language use across readers, texts, and languages, and connect reading back to the broader field of human communication.

## **ACKNOWLEDGMENTS**

This work was supported by the Marie-Curie Initial Training Network, "TESIS: toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828). I thank John McGraw and Anna Haussmann for helpful suggestions and commentaries on this manuscript.

## **REFERENCES**


and silent reading. *Sci. Stud. Read.* 18, 235–254. doi: 10.1080/10888438.2013.862248


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 27 July 2014; published online: 22 August 2014. Citation: Wallot S (2014) From "cracking the orthographic code" to "playing with language": toward a usage-based foundation of the reading process. Front. Psychol. 5:891. doi: 10.3389/fpsyg.2014.00891*

*This article was submitted to Cognitive Science, a section of the journal of Frontiers in Psychology.*

*Copyright © 2014 Wallot. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Pooling the ground: understanding and coordination in collective sense making

# *Joanna Rączaszek-Leonardi <sup>1</sup>\*, Agnieszka Dębska <sup>2</sup> and Adam Sochanowicz <sup>2</sup>*

<sup>1</sup> Psycholinguistics and Cognitive Psychology Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland

<sup>2</sup> Psycholinguistics Lab, Faculty of Psychology, University of Warsaw, Warsaw, Poland

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

## *Reviewed by:*

Theo Rhodes, University of California Merced, USA; Gregory James Mills, University of Groningen, Netherlands

## *\*Correspondence:*

Joanna Rączaszek-Leonardi, Psycholinguistics and Cognitive Psychology Lab, Institute of Psychology, Polish Academy of Sciences, ul. Jaracza 1, 01-378 Warsaw, Poland; e-mail: raczasze@psych.uw.edu.pl

Common ground is most often understood as the sum of mutually known beliefs, knowledge, and suppositions among the participants in a conversation. It explains why participants do not mention things that should be obvious to both. In some accounts of communication, reaching a mutual understanding, i.e., broadening the common ground, is posed as the ultimate goal of linguistic interactions. Yet, congruent with the more pragmatic views of linguistic behavior, in which language is treated as social coordination, understanding each other is not the purpose (or not the sole purpose) of linguistic interactions. This purpose is seen as at least twofold (e.g., Fusaroli et al., 2014): to maintain the systemic character of a conversing dyad and to organize it into a functional synergy in the face of tasks posed for a dyadic system as a whole. It seems that the notion of common ground is not sufficient to address the latter aspect of interaction. In situated communication, in which meaning is created in a distributed way in the very process of interaction, both common (sameness) and privileged (diversity) information must be pooled task-dependently across participants. In this paper, we analyze the definitions of common and privileged ground and propose a conceptual extension that may facilitate a theoretical account of agents that coordinate via linguistic communication. To illustrate the usefulness of this augmented framework, we apply it to one of the recurrent issues in psycholinguistic research, namely the problem of perspective-taking in dialog, and draw conclusions for the broader problem of audience design.

**Keywords: dialog, common ground, perspective-taking, coordination dynamics, language**

# **INTRODUCTION**

In most traditional approaches to language, sense-making happens at the individual level. Language itself is seen as an information carrier, in which vessels for meaning arrive from the speaker and are unpacked by the addressee by means of complex computational processes over pre-existing representations (e.g., Katz, 1966; Frazier and Clifton, 1996). Even when the dialogical nature of linguistic communication in guiding production and comprehension is acknowledged, as in recent mechanistic models (Pickering and Garrod, 2004, 2013), the process of communication has a similar, information-transmitting character: its goal is most often for participants to understand each other, which consists of making their situation models as similar as possible.

Recently, however, an increasing number of approaches have investigated language in a more ecological setting of situated social coordination. The approaches vary from more pragmatic ones, which regard language as a tool for social coordination, to more radical ones, in which it is the linguistic interaction itself that temporarily transforms individual cognition and constitutes social coordination. Neither of these approaches considers understanding each other to be the ultimate goal of an interaction. Rather, the aim is to form (or to become) a temporary functional system, jointly structured by environmental requirements<sup>1</sup>.

In a recently proposed model of dialog as interpersonal synergy (Fusaroli et al., 2014), this systemic and functional character of linguistic interaction is given a more systematic form. This model is based on the assumption that language, instead of being a system of meaning carriers, is rather a system of constraints on an ongoing, situated interaction. Due to its history within a culture and within development in this culture, language has the power to functionally control<sup>2</sup> the interaction as a whole (Rączaszek-Leonardi and Scott Kelso, 2008; Rączaszek-Leonardi and Cowley, 2012). Such a perspective on language, in which interaction in a concrete situation is constitutive of the meaning of utterances, brings several major changes to the way explanations of linguistic behaviors are constructed:

<sup>1</sup>Such a view of the role of language in situated cognition is congruent with Hutchins's (1995) distributed cognition approach, where the focus is on the ability of individuals to form collective functional organizations. In this approach, the collective, global level assumes a systemic property. Both local and global factors in cognition and action are investigated. When we refer to a 'system' in this text, we mean such an organization of individuals. Obviously, every such organization is situated in a particular environment that shapes it on different time scales. It is thus possible to conceive of the organism-environment organization as a system as well. This is a matter of focus. In this paper, we choose to focus on human interaction (mostly dyadic but scalable to more participants) and treat environmental factors as constraints on this system.

<sup>2</sup>"Functional control" is a term in motor control theory (from which the notion of 'synergy' has been adopted). Functional control is exerted through reducing the degrees of freedom of the parts of the system in a specific way, enabling a system to perform a coordinated movement, adequate to an ongoing activity (e.g., Bernstein, 1967).


Recent research has uncovered a variety of mechanisms for maintaining physical coherence in interacting individuals, such as similarity in time (synchrony) and space (imitation). Some studies have also investigated mechanisms for physical complementarity, involving reciprocity of movement (van Schie et al., 2008; Sartori et al., 2011) or turn-taking structure (Wilson and Wilson, 2006). Yet the search for mechanisms that provide coherence and complementarity at the content level has thus far been limited to just one part of the story, namely the similarity aspect, which is achieved, for example, through priming (as in Pickering and Garrod, 2004) or, less mechanistically, through the process of grounding dialog in dynamically developed common ground (Clark, 1996). What seems to be much less developed is a conceptual apparatus that could account for semantic complementarity, i.e., for the meaningful differences that make people interact in the first place and that are integrated in a dialog, resulting in a more capable collective structure. A step in this direction is research on the emergence of dialogical scripts, in which complementary roles develop in the course of task-oriented interaction (Mills and Gregoromichelaki, 2010; Mills, 2014); however, that research pertains more to the general moves in conversation (functionally understood), whereas here we would like to focus on the semantic resources available to an interacting dyad.

The aims of this paper are to advocate the need for a conceptual apparatus that can encompass such semantic complementarity, to trace established concepts and approaches that can support its theoretical foundations, and to begin its construction. Realization of these aims will require the integration of the synergetic approach to dialog with more traditional dialog research, which is the main source of the key concepts. To situate language in action, we first briefly survey the ways in which the relationship between linguistic communication and coordination has been conceptualized, emphasizing pragmatic approaches that represent an 'understanding-for-coordination' perspective. Then, we determine which conceptual tools are already available to talk about language in coordination; namely, we analyze the notions of 'common' and 'privileged' ground and their respective role in the explanations of task-oriented linguistic encounters. Next, we propose that although the notion of dynamically accumulating, situationally relevant common ground has been indeed a step toward understanding the coordinative role of language in research on dialog, it is not sufficient to account for the distributed nature of a conversing system. For this, the notion of 'pooled ground' will be advanced to describe resources on which the emerging, qualitatively new, functional dialogical structure is based. Finally, we apply this augmented framework to the recurrent problems in psycholinguistics and cognitive psychology. The case we will analyze is the debate on perspective-taking in dialog. We show that what might seem like an automatic egocentric perspective (e.g., Keysar et al., 2000, 2003) may stem from the functionality of such behavior for the dyadic system as a whole. We also reflect on the applicability of the proposed notion to the broader phenomenon of audience design. 
The view from the level of interaction prompts us to interpret audience design not only as adapting one's speech to the listener so that she understands it better, but also as designing one's speech to seek out what is missing from the speaker's knowledge but is crucial for the joint project. Both examples will demonstrate the explanatory value of the collective level and raise questions about the proper level of analysis for linguistic structures and behaviors.

## **COMMUNICATION: UNDERSTANDING AND COORDINATION**

'Understanding' is one of the most broadly discussed concepts in both the philosophy and the psychology of language; thus, reviewing its many facets, even superficially, exceeds the scope of this paper. Leaving aside the problem of understanding as grasping the meaning of a proposition in its relation to the external world, we will focus only on understanding in interaction and briefly survey the ways in which understanding is seen to relate to interpersonal coordination.

In many traditional approaches to language, understanding has been treated as the sole goal of linguistic communication. As Wittgenstein (1967, p. 114) complained: "*(...) we are so much accustomed to communication through language, in conversation, that it looks to us as if the whole point of communication lay in this: someone else grasps the sense of my words—which is something mental: he as it were takes it into his own mind. If he then does something further with it as well, that is no part of the immediate purpose of language*." The realization of the goal of understanding each other was often described as coding and decoding a message (e.g., Katz, 1966). This led Dummett (1996, p. 97) to characterize the traditional view as assuming that "*communication is (...) essentially like the use of a telephone: the speaker codes his thought in a transmissible medium, which is then decoded by the hearer (...) Concepts are coded into words and thoughts, which are compounded out of concepts, into sentences, whose structure mirrors, by and large, the complexity of the thoughts*."

<sup>3</sup>This distinction might not be easy to make in some embodied accounts of cognition, where physical systems, due to their structure, shaped by natural selection, can also be seen as meaningful and intentionally committed to projects in the world (e.g., Merleau-Ponty, 1963). In such an embodied view, physical interaction between the living system and the world, and among systems, can thus also be meaningful. Being aware of this, we preserve the distinction for clarity of discourse and for possible connection to research performed in more traditional approaches. See also the comments in Conclusions.

The 'code' conception of language, or the conduit metaphor of communication (Reddy, 1979), has recently been increasingly criticized in both philosophy and psychology. It seems to fail in many ways; one of the most important is its inability to adequately address contextual flexibility (the same message can be understood to mean different things in different contexts). Without making the context (more precisely, the relevant features of the context) part of the 'code,' a communication model that consists simply of encoding and decoding has difficulty explaining how the same encoding can at different times yield different decodings (Barwise and Perry, 1983; Krauss and Chiu, 1997). No less important is the fact that, as noted in Wittgenstein's quote, the code conception of language ignores the pragmatic and performative aspects of linguistic behavior.

Accounts that attempt to embed the goals of human communication in a wider social context, rather than restricting them to mutual understanding, have been present in the philosophical literature for quite a long time. This pragmatic aspect of linguistic communication has been emphasized, for example, in the work of Hilary Putnam, who indicated that language and linguistic behavior play a subservient role in the global activity of their users. As he put it:

"*What succeeds or fails is not, in general, linguistic behavior by itself but total behavior. E.g. we say certain things, conduct certain reasonings with each other, manipulate materials in a certain way and finally we have a bridge that enables us to cross a river that we couldn't cross before. And our reasoning and discussion is as much a part of the total organized behavior complex as it is our lifting of steel girders with a crane. So what I should really speak of is not the success or failure of our linguistic behavior, but rather the contribution of our linguistic behavior to the success of our total behavior* (Putnam, 1978, p. 100)."

In pragmatic approaches, personal and contextual factors are openly admitted into the process of understanding an utterance. According to Dascal and Berenstein (2003, p. 83), in keeping with the Gricean tradition, *understanding is always pragmatic understanding.* "*It is not a matter only of understanding speaker's words (determining the 'sentence meaning'), but always a matter of getting to the speaker's intention in uttering those words in that context (determining the 'speaker's meaning').*"

Similar debates have been present in analytical philosophy, where, according to Michael Dummett, utterance comprehension should result in the recognition of the interplay between the conventional meaning attributed to words and sentences and the contextual determinants. The degree to which the former factor (conventionalized meaning) actually plays a role in the process of communication also varied across philosophical theories, from practically determining this process to being always modified by and dependent on context. As Davidson (1986, p. 174) put it: "*We must give up the idea of a clearly defined shared structure which language-users acquire and then apply to cases.*" According to Davidson, what people converge on are only passing theories, and such convergence results from applying all possible resources at hand, both linguistic and extra-linguistic.

The tension between understanding as a goal in itself and understanding as a means to coordination is correlated with the tension between the representative and the performative functions of language. If the goal is just to understand each other, the representative function is emphasized and the process of communication becomes one of making these representations similar [as in Pickering and Garrod's (2004) model]. However, if the function of language is sought instead in effectuating coordination, its creative and performative powers come to the forefront. We find a similar distinction in Dummett (1996, pp. 185, 187), who stated that "*the true opposition is between language as representation and language as activity (...); the significance of an utterance lies in the difference that it potentially makes to what subsequently happens.*"

The debate sketched above seems to reflect, from a philosophical point of view, the controversies entailed by the relationship between understanding and overall practical coordination as a goal of communication. Both understanding and coordination rely on the similarity of knowledge between the interaction participants. However, while understanding each other seems to refer to and rely on overlapping knowledge, in practical coordination the knowledge implicated in the partners' deeds need not be entirely common, as long as their actions are appropriate. Only if linguistic interaction is considered to be 'for understanding' can its goal be described as broadening the scope of mutually shared knowledge; when language use is seen as a control process in an ongoing interaction, leading to practical coordination, what is mutually shared is but a foundation on which something new is created in interaction. The core of the discussion can thus be seen as the question of to what extent successful communication consists in broadening and strengthening a pre-existing harmony, and to what extent it consists of efforts to coordinate and overlap separate idiolects with the goal of creating a new quality under external constraints.

In philosophical inquiries, it has also been underlined that one's comprehension of a given utterance is accessible only through the manifestation of the state of understanding. Such an approach allows for a departure from considering understanding only as a private, covert, and individual process that is purely a mental phenomenon. As noted by Quine (1990, p. 58): "*In practice, we credit someone with understanding a sentence if we are not surprised by the circumstances of his uttering it or by his reaction to hearing it – provided further that his reaction is not one of visible bewilderment.*" Thus, the 'operationalization' of understanding (success in communication), as in conversation-analytic approaches, is through what happens next in the overall interaction.

We now move to the characterization of these issues from the psycholinguistic perspective. Counterparts of the above-mentioned problems in psychological and linguistic research on language involve many issues that appear when the pragmatics and jointness (dialogicity) of language are addressed. In what follows, we focus on a subset of those issues, surveying the toolbox of available concepts. We begin with an overview of the notion of common ground, which is a pivotal concept for addressing the above questions at both the theoretical and the empirical level, and then continue to the notion of privileged ground. We then evaluate whether these concepts suffice to account for task-oriented dialog.

## **COMMON AND PRIVILEGED GROUND**

The notion of common ground has been most extensively used and explored in psycholinguistic theories by Clark and Marshall (1978), Clark et al. (1983), and Clark (1996). Common ground, defined as a "sum of mutual knowledge, beliefs and suppositions" (Clark, 1996, p. 93), enables agents to recognize and represent the general information about the world, as well as about previous states and current situations, that is shared among them. This is the basis for mutual expectations of each other's behavior at a given stage of the task (Clark, 1996, pp. 43–49). The most important feature of common ground is thus the assumption of mutuality. It is not enough that two people have the same knowledge; they must realize that this knowledge is mutually shared.

In most psycholinguistic research, common ground has been treated as a relatively simple characterization of mutually available information, which would be a prerequisite for communication. In many experimental settings, it is operationalized as those elements of a visual field that are accessible to both participants. Yet it is important to appreciate the complexity of this concept: its joint, dynamic, and contextualized nature. This has been most fully exposed in research on dialog, and especially in Clark's (1996) approach, where it serves to ground conversation and enables the principle of least collaborative effort to explain many aspects of linguistic interactions.
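The experimental operationalization mentioned above can be made concrete with set operations. In this view, common ground is the set of items visible to both participants, and privileged ground is the set of items visible to only one of them. The sketch below assumes such a display. It deliberately shows what this operationalization leaves out: a set intersection records only that both participants *can* see an item, not the mutuality (each knowing that the other knows) that Clark's definition requires. The function name and the display contents are hypothetical.

```python
def partition_ground(visible_to_a, visible_to_b):
    """Split a shared display into common and privileged items.

    Mirrors the common experimental operationalization (items visible to
    both participants). It does NOT model mutuality, i.e., each participant
    knowing that the other can see the item.
    """
    a, b = set(visible_to_a), set(visible_to_b)
    return {
        "common": a & b,        # visible to both
        "privileged_a": a - b,  # only A can see these
        "privileged_b": b - a,  # only B can see these
    }


# Hypothetical display: one slot is occluded from the director's side
director = {"large candle", "medium candle", "toy truck"}
matcher = {"large candle", "medium candle", "toy truck", "small candle"}
ground = partition_ground(director, matcher)
```

Here the matcher holds privileged information about the small candle. When the director refers to "the small candle", the referent is ambiguous only from the matcher's side. Such mismatches are what the common/privileged distinction is meant to capture.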

Clark sees a conversation as a type of rational joint action, with different levels of joint projects, hierarchically and sequentially organized (Clark, 1996; Bangerter and Clark, 2003, p. 150). A minimal joint project is understood as an adjacency pair: a proposal from Agent A to undertake a joint project and Agent B's response taking it up, as in a typical question–answer pair (Clark, 1996, pp. 191–220). Linguistic communication is a tool for the coordination of more basic actions immersed in a physical world. Some joint actions obviously do not require coordination via language, such as dancing or playing a piano duet, but in most co-actions language is necessary for success. For example, when Ann and Bob are moving a table, they might use language to navigate the different joint projects that constitute the joint action, such as selecting a place to put it, lifting the table, lowering it together, etc. Bangerter and Clark (2003) noticed that sounds or words produced during conversation, which were traditionally considered to be turn-taking signals or emotional acts, may in fact reflect the structure of the joint task; for example, they may serve as markers that indicate the stage of the project. They analyzed corpora from experimental communication tasks and spontaneous telephone conversations (over 3.5 million words in English and German) to show that words like "okay" and "all right" served as horizontal markers (indicating the beginning and end of a particular joint project), whereas "m-hm," "uh-huh," and "yeah" were vertical markers (signaling the expectation of continuation).

Although communication serves as a coordination device for joint actions, it itself needs to be coordinated. Interlocutors participate in a collaborative process (grounding) in which they constantly signal to each other their engagement in a course of events. As Clark (1996, p. 246) noted, in the grounding process, new information is prominent when it concerns the basic level of action (in the physical world or in a speech act), but signals pertaining to the level of understanding "should be backgrounded." Usually, agents involved in a joint action need to signal when they finish a project or start a new one, to maintain continuity and compatibility in a track of joint projects, but they tacitly assume that they have achieved mutual understanding. These assumptions might easily be violated if the other party's behavior is not in line with the expectations that emerged in the communication process (congruent with Quine's (1990) characterization above, behavior in co-action is thus the final criterion for ascribing understanding). These expectations, though not explicitly expressed, are part of the common ground. They are built on three types of information: initial common ground (mutual knowledge that participants bring into a conversation), the current state of the joint activity, and public events that have happened since the beginning of the joint action (Clark, 1996, p. 43).

The cumulative history of a dialog with another person forms background information that dynamically creates a shared context. It may, for example, cause utterances to shift from long and informative to short and elliptical (Mills and Gregoromichelaki, 2010; Mills, 2011), or even result in less careful pronunciation of words that have already been mutually used in a given conversation (Fowler, 1988). The tendency to make a conversation shorter and more succinct in a shared context is consistent with the least collaborative effort principle (Clark and Wilkes-Gibbs, 1986). This principle has been invoked mostly in situations in which participants must agree on a reference, to explain how redundancy is kept minimal. For example, in a communication game in which a director has to provide a matcher with information about the shape of tangrams (highly ambiguous geometric figures), the director's first descriptions are relatively long and detailed, but in subsequent rounds they become shorter, until they almost become proper names. The speaker no longer has to use long utterances because the dyad has developed ways to conceptualize and refer to the tangrams (Wilkes-Gibbs and Clark, 1992).

Accumulated common ground at the level of conceptualization is also responsible for the phenomenon called lexical entrainment, in which a speaker refers to the same object in the same way when interacting with the same interlocutor but might change the term in a conversation with another interlocutor (Brennan and Clark, 1996). This is also an application of the least collaborative effort principle: changing a referring term within the same pair without good reason conflicts with conversational economy (Metzing and Brennan, 2003). Similarly, perspective-taking in conversation may be useful in minimizing the cost of possible future misunderstandings. If interlocutors are aware that their visual perspectives differ, they will try to use neutral spatial descriptions (Schober, 1998).

Other aspects of common ground, understood as a shared physical, social, and linguistic environment in a current state of activity, might explain how interlocutors are able to properly interpret strongly context-dependent utterances, such as definite references. When Ann says to Tom, "Give me the bottle," she most likely means the one that they have both seen or talked about previously, so Tom can safely reject interpretations that refer to his private bottle of water hidden in his bag.<sup>4</sup> Depending on the recognition of what is and what is not in common ground, the addressee may narrow the interpretations to those related to the speaker's knowledge and their shared history of communication.

Thus, 'common ground' is a very broad construct: it encompasses everything that is recognizably shared in a conversation. Especially as construed in Clark's theory, its incremental, dynamic, dialogical, and situated character makes it a very useful notion for explaining how people zoom in on common references or resolve ambiguities with least collective effort (e.g., Wilkes-Gibbs and Clark, 1992; Clark, 1996; Clark and Krych, 2004). Equally important, it helps determine what information would be new, i.e., what is worth volunteering in the next conversational move and entering into the accumulating common ground. According to Grice's cooperative principle, volunteered information must be based on common ground to be relevant to the course of the discussion, but it must be novel enough to be a real contribution to the conversation. Saying something that is already part of mutual knowledge violates the maxim of quantity, and in ordinary conversation it may give rise to an implicature, as when Bob flirts with a woman and Ann says to him: "I think you have a wife."

However, even this dynamic, task-oriented, and joint conception of common ground does not allow us to address the complementary parts of knowledge that remain private but nevertheless influence how a dyad coordinates on the task. By focusing on what is common and mutually shared, the notion of common ground emphasizes the similarity (or coherence) aspect of the formed synergy. This perhaps stems from its historical provenance: work on common ground focused mainly on how people establish common reference to external objects, and much less on the distributed aspects of joint actions. The least collaborative effort principle does point to the fact that one interlocutor counts on the knowledge of the other, but mainly on shared knowledge. The principle was designed to explain how redundancy in speech acts is curbed rather than how resources are distributed, although distribution, after all, also (if not primarily) leads to performing tasks with least collective effort.

The synergetic approach may help augment task-immersed dialog theory with this distributive aspect by relating the 'linguistic' and 'action' projects in Clark's approach more clearly. It proposes a specific relationship between joint projects at the level of action and at the level of conversation. Building on the notion of language as a constraint (Rączaszek-Leonardi and Scott Kelso, 2008; Pattee and Rączaszek-Leonardi, 2012), this model views the moves in a dialog not as containers for transmitted content but as constraints on a collective project. Given the joint nature of linguistic interaction, these constraints are jointly constructed. Thus the dynamics of both individual and joint action is regulated and guided by language rather than being expressed or described in it.

The controlling role of language in collective projects thus requires the joint establishment of task-relevant constraints using linguistic structures. This means that the two projects cannot really be understood separately: being 'just' a constraint, an utterance can be understood only in the context of the ongoing project, as it relies for its meaning on the action it constrains (Pattee and Rączaszek-Leonardi, 2012; Rączaszek-Leonardi, 2014). The two sides of a linguistic interaction, the joint pragmatic project and the joint construction of constraints, might nevertheless rely on different mechanisms and provide different sources of structure for a conversation. Thus the appearance of a given utterance at a given moment of interaction may reflect both the structure of the task and the conventionalized ways of structuring linguistic interactions so that they become effective controls in interaction.

The proposed constraining relationship between language and coordination in situated coaction leads in many cases to predictions similar to those of Clark's grounding theory. For example, the above-mentioned shortening of expressions and increasing use of ellipsis over the course of a conversation stem from the fact that less control is needed when more coordination is already in place. Beyond that, however, accepting the constraining (rather than content-conveying) role of language also makes it easier to see linguistic behaviors as serving a larger, distributed system. Conceptual pacts are good examples of a dyad zooming in on effective controls in a given situation; the emergence of dialogical scripts can be seen in light of their function in stabilizing participants' roles in frequently recurring joint projects. The latter can take place both on the timescale of a particular interaction in a particular task (as, e.g., in Mills, 2011) and on a slower timescale, when culturally specific dialogical scripts emerge, revealing frequent structures of joint projects encountered in the social life of a particular culture (for example, question–answer adjacency pairs, or "greeting chats," which may have particular structures, cadence, and even limited contents, e.g., "weather chats" in England, or asking about the health of relatives in Poland).

Another good example of the joint establishment of task-dependent, effective linguistic controls comes from research on language in joint decision making: Fusaroli et al. (2012) showed that performance on a joint decision task depended not on non-specific lexical alignment between the participants, but rather on the dyad's selecting-by-alignment of the specific dimensions that were crucial for the task. Repeated expressions of those dimensions, in turn, kept the participants' actions organized around them. Importantly for the arguments presented in this paper, they would do so even if the actual actions and knowledge on which the use of those expressions was based were idiosyncratic to each participant.

<sup>4</sup>In light of what follows, however, it is possible that if Tom knows that the hidden bottle better suits Ann's purpose, he could reach for it (or direct his gaze toward it), even while knowing that Ann refers to a commonly known bottle.

It is important to reiterate that the principle of least collaborative effort pertains to both levels of coordination: the coordination of controls (where only the minimally needed constraints for the ongoing interaction are jointly provided, and where partners count on each other to make the constraints more precise) and the coordination of the joint project itself, in which participants rely both on their similarity and on each other's idiosyncratic capabilities in the division of labor. These capabilities (skills and knowledge) might be complementary and remain unshared, as long as they do the job the project requires.

Idiosyncratic knowledge in linguistic interaction is referred to as privileged ground. While the concept of common ground has a long tradition in philosophy and psycholinguistics, the concept of privileged ground is relatively new and has been used in more limited contexts. It was defined in opposition to common ground, as knowledge that a single interlocutor attributes only to herself (for example, because she has privileged perceptual access to it; see, e.g., Keysar et al., 1998). In many linguistic analyses of communicative interactions and in psycholinguistic research, privileged information is usually seen as a distractor, drawing attention away from the common ground on which the interaction should stay, as dictated by the experimental tasks. If an interlocutor cannot effectively ignore distractors present in privileged ground, it is usually concluded that she shows an egocentric tendency (Keysar et al., 1998, 2000, 2003; Wardlow Lane et al., 2006; Lin et al., 2010).

By constructing tasks in this way, however, researchers limit the applicability of their results to only a subset of everyday communication situations: a subset, let us add, that is compatible with the view of 'understanding' as 'equalizing world models.' In a way, privileged information is made irrelevant by design. Yet if the distributed character of interactions is taken seriously, the importance of role division and idiosyncratic contributions to the task becomes evident, and with it the importance of privileged ground and of the ways in which its relevant elements are brought to bear on the task. An individual's focus on privileged information in interaction thus becomes a necessity, something desirable, not an imperfection of the participant. The question, in such a distributed framework, is thus not how common ground is broadened for understanding but how both common and relevant privileged information can be used in collaboration on a realized project.<sup>5</sup>

It seems that neither the notion of common ground nor that of privileged ground is sufficient to account for this kind of diverse but complementary influence that participants can exert on joint projects within a distributed system. If linguistic interaction is to coordinate a dyad toward various projects according to the least collaborative effort principle, it has to realize a division of labor, i.e., make optimal use of both parties' resources without making them common.

A similar aspect of collectivity in meaning creation through constraint construction is also visible on slower timescales and in larger systems. One can recall here Hilary Putnam's view of how the meaning of words is distributed in populations. He introduced the notion of the division of linguistic labor, relating it to the performance and coordination of real-life tasks via linguistic means: *"(...) it is certainly not necessary or efficient that everyone who has occasion to buy or wear gold be able to tell with any reliability whether something is really gold. The foregoing facts are just examples of mundane division of labor (...). But they engender a division of linguistic labor: everyone to whom gold is important for any reason has to acquire the word 'gold'; but he does not have to acquire the method of recognizing if something is or is not gold. He can rely on a subclass of speakers. (...) that collective body divides the 'labor' of knowing and employing these various parts of the 'meaning' of 'gold'"* (Putnam, 1975, p. 141).

Returning to dyadic interactions and faster timescales: a concept is thus needed that can account for the dyad's ability to rely on the knowledge of both participants without requiring that this knowledge be mutual. Such knowledge can be a basis for complementary behaviors in a task situation (as agents act on the basis of both common and private knowledge) and for linguistic acts that need not reveal or convey information but may also signal responsibility for privileged knowledge and scout for possibly relevant information. An expression dictated by an individual's privileged ground may thus become an active control of the dyad's behavior, which means that information that does not enter common ground might nevertheless be decisive for the interaction. The proposed concept should therefore pertain to the dyad as a whole and should help us understand the resource in which the dyad's behavior is grounded.

## **POOLING THE GROUND – A VIEW FROM INTERACTION**

The view of language as a constraint on social coordination takes the creation of functional synergy, rather than understanding itself, as the main *explanans*. The main questions thus concern how language facilitates the coordination of cognition and action in concrete situations, and how it controls and disambiguates possible ways of knowing and acting. In a sense, then, it is not the context that disambiguates word senses, as in traditional information-processing approaches, but rather utterances in a situation that actualize certain possibilities for interpretation and action and thus 'disambiguate' the context (Rączaszek-Leonardi and Scott Kelso, 2008; Collier, 2014). Expressions do not convey meanings but, once used, operate reflexively, contributing to the common context and organizing experience.

This aspect of human interaction parallels the notion of reflexivity applied by Garfinkel (1967) in his ethnomethodological studies of practical everyday activities. Garfinkel (1967, p. 8) emphasized that it is commonly "*treated as the most passing matter of fact that members' accounts, of every sort, in all their logical modes, with all of their uses, and for every method for their assembly are constituent features of the settings they make observable.*" On this view, reflexivity means that members shape action in relation to context, while the context itself is constantly redefined through action.<sup>6</sup>

<sup>5</sup>In this paper we make the strong assumption that collaboration is our species' most prevalent mode of interaction. This does not preclude local competition and diversity, because on slower timescales they lead to more flexibility and better exploration of possibilities. It has recently become increasingly common to accept that the collective-collaborative level can be selected for as well (Christakis and Fowler, 2009; Smaldino, 2014).

Thus, in such a 'view from interaction,' linguistic expressions, always immersed in co-action, effect dynamic changes both in individual participants (according to their history in a given culture) and at the level of the dyad, where they control the interactants' behavior in a dialogical process. Congruent with the third claim of the synergetic approach mentioned in the introduction, the formation of such a functional distributed system requires both coherence of the dyad (to be a system at all) and complementarity, i.e., a division of labor that allows optimal use of each participant's resources.

The key issue for understanding language use in dialog is to identify the mechanisms, i.e., processes at both the individual and the interaction level, through which coherence and complementarity are realized.<sup>7</sup> For the physical aspects of human interaction, a growing body of evidence for mechanisms that maintain coherence comes from developmental contexts, where infants focus on, imitate, and synchronize with adults (Meltzoff and Moore, 1977; Murray and Trevarthen, 1985; Johnson et al., 1991), and from studies of adults (Schmidt et al., 1990; Shockley et al., 2003). Since they do not require a division of labor, these mechanisms function alike in different contexts, perhaps differing in strength, e.g., when the need for social coherence is greater (for a discussion of this point, see also Fusaroli et al., 2014). In linguistic interactions, one mechanism proposed for achieving similarity is priming, in its various types (semantic, syntactic, etc.).

However, mechanisms that realize the coordination of diversity in interaction, i.e., those that bring about division of labor, complementarity, flexibility, and compensation, are not as self-sufficient: they cannot be described without taking into account the specific situation of interaction. Complementarity is a complex relational concept that involves not only the cognitions and actions of the participants but also relates them to the situation in which the interacting participants are immersed. On the physical level, mechanisms for achieving complementarity in human coaction are being uncovered. An early education of attention for co-action is visible in development (Rączaszek-Leonardi et al., 2013), as are early signs of complementary action anticipating the caretaker's movements (Reddy et al., 2013), while in adults the activation of neural structures responsible for complementary and compensatory (and not only imitative) movements has been demonstrated (van Schie et al., 2008; Sartori et al., 2011). In the language domain, models of antiphase entrainment at the syllable rate have been proposed to account for turn-taking (Wilson and Wilson, 2006). Yet at the content level of linguistic interaction, the mechanisms for achieving complementarity still seem not to have been worked out. As noted earlier, priming, and even more elaborate mechanisms for the construction of common ground, will not explain the complementary aspect of this level of communication, because of their focus on mutuality.

We propose that in forming task-dependent dyadic systems, the informational resource can be characterized as 'pooled ground.' This refers to the aggregate of the common ground and the relevant privileged ground that may never enter common ground (become mutual) yet is a basis for individual behavior that influences the dyad. To pool knowledge in coordinative situations, language is thus used not only to confirm a shared vision of a situation but also to 'scout' for and signal mutually *un*available resources (information or skills) that would enable the efficient functioning of the global system. The need for the concept arises from shifting perspective from the individual to the dyadic level and acknowledging the distributed nature of the latter. It does not matter whether resources are shared, as long as one of the participants makes them effective in the dyad's dialog and, eventually, its behavior.

Here we use the first two tenets of the synergetic approach mentioned in the introduction. By ascribing functionality to the entire system, we analyze individual processes as parts of this system. New variables, such as the effectiveness or stability of the system as a whole, become explanatory for the behaviors of the individual participants as well. The dyad, acting on the basis of unshared information, is a qualitatively new system, dependent on the interaction of the individual resources. Meaning is made in interaction through individually produced constraints whose bases might not be shared (i.e., the private knowledge that motivates their production is never expressed) but which nevertheless bear on the behavior of the system.

From this perspective, and unlike in traditional psycholinguistic experiments, the communicative situation can be viewed not as relying on common ground with elements of privileged ground distracting from perfect mutuality, but as relying on common ground together with elements of privileged ground that enable moves (actions and utterances) beneficial to the overall behavior of the system, even though those elements never enter the common ground. Language thus acts, on the one hand, as a constraint on individual and dyadic dynamics and, on the other, is an outcome of dynamic processes within individuals and dyads (Rączaszek-Leonardi and Scott Kelso, 2008; Pattee and Rączaszek-Leonardi, 2012).

Polanyi (1966, p. 6), in *The Tacit Dimension*, similarly describes the process of apprehending knowledge:

"*Our message had left something behind that we could not tell, and its reception must rely on it that the person addressed will discover that which we have not been able to communicate.*"

<sup>6</sup>The authors are grateful to the anonymous reviewer for pointing out this affinity. Indeed, there are more parallels between the view of language as social coordination advocated here and Garfinkel's ethnomethodology. Perhaps most importantly, Garfinkel treats all utterances as indexical, therefore under-defined and always relying on the context of co-action. This under-definition is a key element in the framework that treats language as a system of replicable constraints on interactive events.

<sup>7</sup>Here, by 'mechanisms' we mean processes that are sources of the forces that make coherence and complementarity possible. The trouble, however, is that in the case of such a multisystem and multi-timescale phenomenon as language, those forces may be difficult to localize. On the one hand, it is certainly not enough to search for them only at the level of the individual mind/brain; on the other hand, taking all the relevant systems and timescales into account might not be feasible in the process of theory construction. Here, we limit ourselves to processes that produce structuring forces at the level of the individual and at the level of interaction, and limit the timescales to those of ongoing interaction and cultural evolution, bracketing out processes on other timescales while remaining aware of their presence. For a more detailed discussion of the multisystemic and multi-timescale nature of language, see Rączaszek-Leonardi (2003, 2010, 2014) or Enfield (2013). For how this influences the form of linguistic theory, see Rączaszek-Leonardi (2012).

Or we may risk an even stronger claim: sometimes it is not necessary that the person make the discovery; she might rely on her partner having made it in order to take the next step in joint reasoning. Communicative acts effect idiosyncratic changes in interlocutors, which will never be mutually available but which, in cases of good communication, may lead to desirable collaborative outcomes.

The problem with defining pooled ground lies in specifying how much one must know about the other's knowledge in order to rely on it for the task: what matters is not the proposition, or any other form of a piece of factual knowledge itself, but rather the consequences of acting upon it for the joint task. While common ground requires that A know x, B know x, and both know that they know x (Clark and Marshall, 1978; Clark, 1992, 1996), and while privileged ground means that A does not know x, B knows x, and B knows that A does not know x, task-dependent pooled ground could be described as A not knowing x, B knowing x, and A knowing that B knows x,<sup>8</sup> which seems paradoxical if A does not know the content of x.
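The three configurations contrasted above can be glossed in standard epistemic-logic notation. This is only an illustrative rendering, not a formalism proposed by the authors: it assumes a knowledge operator $K_A\varphi$, read 'A knows that $\varphi$', and, for common ground, writes out only the second-order conjuncts of 'both know that they know x' (a strict reading would iterate further).

$$
\begin{aligned}
\text{common ground:}\quad & K_A x \wedge K_B x \wedge K_A K_B x \wedge K_B K_A x \\
\text{privileged ground:}\quad & \neg K_A x \wedge K_B x \wedge K_B \neg K_A x \\
\text{pooled ground:}\quad & \neg K_A x \wedge K_B x \wedge K_A K_B x
\end{aligned}
$$

Read with x as a proposition, the pooled-ground line is exactly where the paradox arises: if knowledge is factive ($K_B x \rightarrow x$) and closed under logical consequence, as in the standard modal logics of knowledge, then $K_A K_B x$ entails $K_A x$, contradicting $\neg K_A x$. One consistent way to express 'A knows that B knows, without A knowing the content' is a knowing-whether reading, assumed here purely for illustration:

$$
\neg K_A x \wedge \neg K_A \neg x \wedge K_B x \wedge K_A\left(K_B x \vee K_B \neg x\right)
$$

That is, A knows that B knows whether x holds, while A does not. This formula is satisfiable, for instance in a two-world model in which A cannot distinguish a world where x is true from one where it is false, but B can.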

However, the paradox dissolves if, as in the presented approach, language acts as a constraint rather than as a content carrier. The same expression may, to some extent, act differently on each interlocutor. For A, receiving a signal that B knows the information needed for the task might be enough to rely on it. This is different from actually receiving the information; the content of B's knowledge does not enter the common ground. Knowing the task constraints should help predict the use of common (mutual) ground and of privileged (private) ground, which could change dynamically during task-dependent interaction.

The notion of pooled ground thus goes beyond common ground. It also goes beyond the "implicit common ground" proposed by Pickering and Garrod (2004). Their conception is very helpful in finding mechanisms that establish common ground between interlocutors: it points to the possibility of common ground arising without inferences about, or modeling of, the interlocutors' states of knowledge. Instead, they claim, implicit common ground arises automatically from the interlocutors' being in the same culture, situation, or task and taking part in the same conversation (with the same words activating relevant information in each partner). It is therefore a much more automatic and resource-cheap process than actually drawing inferences about the other's knowledge. This mechanism takes into account the fact that interlocutors are co-present on many different timescales (in a culture, in multiple social projects, in a particular task, in a particular project within a task). The world, being its own best model (Brooks, 1990), acts on both interlocutors alike.

What is still missing, again, is the distributed nature of the dialogical system: a mechanism for specializing in a task and for bringing pooled, not only shared, resources to bear on the dyad's effectiveness. The pooled ground concept is thus based on the fact that the different cultural and experiential histories of the participants in an interaction will make the activated knowledge that guides behavior differ for each interlocutor. This has often been viewed as a problem and a possible cause of misunderstanding. However, this very same fact is the dyad's strength, allowing optimal use of its potential resources. Thus, diversity works for good and for ill: for good because the idiosyncrasy of knowledge makes the knowledge base upon which a dyad acts much broader; for ill because it inevitably leads to misunderstandings and cases of miscommunication. Much research seems to have been devoted to the causes of misunderstandings treated as failures of communication, while in this light they can be seen as inevitable consequences of scouting for a broader ground on which the interaction may build. Without misunderstandings, the discovery of relevant diversities would not be possible.

The concept of pooled ground perhaps has a stronger affinity with what Brown-Schmidt, one of the very few researchers who strive to go "beyond common and privileged ground" and toward task-immersed interactions, has called "potential" common ground: *"(...) that interlocutors would treat the common ground status of potential discourse referents as a gradient phenomenon sensitive to various sources of information in the discourse context"* (Brown-Schmidt, 2012, p. 65). The trick is to let this potentiality exert its influence without ever becoming common, leading to a truly distributed, and thus economical, system functioning according to the least collective effort principle.

## **APPLICATION: PERSPECTIVE-TAKING IN DIALOG**

To summarize, the synergetic notion of dialog, which views language as a system of constraints functionally controlling interactions, has the potential to clarify the relationship between the two kinds of coordination previously recognized in dialog (Clark, 1996). Coordination at the linguistic level means establishing controls that are appropriate for coordination on joint projects. The principle of least collaborative effort pertains to both levels: it (i) enforces a sparse (thrifty) use of constraints on the ongoing dynamics and makes both partners contribute to their construction, and (ii) distributes roles so that the dyad is less redundant and uses its resources more effectively, pooling them as the situation requires.

Adding the collective level of situated interaction to the explanatory apparatus, with its qualitatively new resource in the form of pooled ground, allows us to see some recurrent problems in psycholinguistics in a different light. For the purpose of this paper, we have chosen to focus on perspective-taking in dialog. Perspective-taking is a case of a broader phenomenon in dialog research, namely audience design; after a detailed analysis of the former, we also draw implications for this more general notion.

Factors that determine which perspective (allo- or egocentric) is taken at a given moment of an interaction have been the subject of intense debate over the last 15 years. According to Clark et al.'s (1983) theoretical proposal mentioned above, interlocutors should immediately restrict their interpretations according to the interlocutor's perspective, narrowing them to the common ground. However, in the work of Keysar et al. (1998, 2000, 2003), this principle has been questioned by results showing that addressees consider particular objects to be potential referents even if these objects are not in common ground with the speaker (i.e., not visible to the speaker). When the speaker's commands (e.g., "take a small candle") were ambiguous and referred both to a mutually visible object and to an object hidden from the speaker but visible to the addressee, participants in the addressee role often fixated on the hidden object first, indicating that they perceived it as a possible referent. The presence of a hidden semantic competitor made interpretation take longer than in a situation without such a competitor. Sometimes participants made even more 'grave' egocentric mistakes, reaching for, or even grasping, the object in privileged ground (Keysar et al., 2000).

<sup>8</sup>In contrast to common ground, where a strict interpretation assumes that "B knows that A knows that B knows x," in pooled ground this does not seem necessary. B does not have to know that A knows that she knows x. For example, B may speak/behave like an expert to A without knowing that she is one; however, she might also know this and therefore design her expressions accordingly.

These results were interpreted as evidence for a default egocentric perspective in communication. Keysar et al. (2000) proposed a model of perspective-taking in dialog, the perspective-adjustment model, in which interpretation is an egocentric process, with mechanisms of late adjustment to the speaker's perspective activated only in cases of misunderstanding. Additional evidence for egocentric strategies in communication has been provided by research on the cognitive costs of perspective-taking. Lin et al. (2010) showed that the ability to ignore the privately accessible part of a visual display in conversation correlates with executive resources such as working memory and inhibitory control. Other studies have indicated that reasoning about others' perspectives might indeed not be automatic, even for adults (Apperly et al., 2007), which seemed to further support the egocentric model.

Despite their influence on theories of interpretation, Keysar et al.'s (2000, 2003) studies were criticized on methodological grounds. As Hanna et al. (2003) noted, the objects in privileged ground that participants had to ignore were chosen in Keysar's setups in such a way that they were the best perceptual or semantic match for the descriptions (for example, they were the most typical referents). Consequently, participants had to resolve two conflicts: a perspective discrepancy and a lexical conflict. The lexical description pointed to the most typical referent, while the shared perspective pointed to the less typical object visible to both participants. In Hanna et al. (2003), where lexical competition was controlled, results showed that participants focused mostly on the shared objects from the very beginning of the interpretation process. Nonetheless, they did look at the semantic competitors in privileged ground longer than at other irrelevant objects, so perspective information was not the only type of information that determined behavior.

Accounting for these and other similar results that could not be readily encompassed within Keysar's model, Hanna et al. (2003), Hanna and Tanenhaus (2004), and Brown-Schmidt and Hanna (2011) proposed and refined a different model of perspective-taking in dialog, namely the constraint-based model. The model emphasizes the probabilistic and incremental nature of the interpretation process, in which different constraints (prosodic, syntactic, semantic, pragmatic, etc.) influence interpretation from the very beginning. The final interpretation depends on the strength of each constraint and on the competition among them. It may happen that, despite an active perspective-taking constraint (being in common ground), a stronger saliency constraint wins at the beginning of the interpretation process, focusing attention on a privileged but very salient object. Importantly, the constraint-based model allows relevant influences from different sources to be embraced in the course of communication, perhaps including influences that have traditionally been neglected and that stem from the joint nature of conversation.

This was shown in Duran and Dale's (2014) recent study (see also Duran et al., 2011), whose goal was to show that egocentric and other-centric biases are activated simultaneously and compete for expression. The likelihood of eventually choosing one over the other depended, among other factors, on information about the speaker's capabilities. In their task, participants had to interpret verbal instructions from a partner, located at a specific spatial position relative to the participant, directing them to select an object on a computer screen. Although the interacting participants were not physically co-present, the spatial referent was ostensibly visible to both the speaker and the addressee, albeit from different angles. Occasionally, instructions were ambiguous as to which object (e.g., the one on the left or the one on the right, depending on whose perspective was taken) should be selected.

Depending on the additional information available about their partner (they were informed that the partner was either real or simulated), participants grounded their interpretation either in their own visual perspective (i.e., an egocentric stance) or in their partner's visual perspective (i.e., an other-centric stance). They did the latter more often if the speaker was known to be simulated, evidently preferring the egocentric stance when they knew they were interacting with a live interlocutor who was able to (1) take their point of view if necessary and (2) ask a clarifying question in case of equivocation. Thus, the participants' behavior was congruent with the principle of least collaborative effort: investing less effort (taking the egocentric perspective) when some of the effort was expected to be shared by the partner. With a simulated partner, incapable of collaboration, other-centric responding was not only more frequent but also faster. Additionally, by measuring the shape of response trajectories, the authors demonstrated that competition from the egocentric tendency was weaker in this condition.

Duran and Dale (2014) also showed, consistent with the constraint-based model, that their data were well accounted for by a dynamical model in which the two perspectives are defined as attractors of individual dynamics. The attractors coexist, and which one is chosen depends on their relative strength, which is influenced by beliefs about the partner in the interaction. Crucially, though, the speed of the participants' reactions and the form of their behavior (the shape of the trajectories for reaching the goal) were influenced by the mere presence of the non-chosen attractor. This illustrates what was mentioned earlier: information that is only potentially relevant to the task, and that only potentially can enter common ground, nevertheless exerts its influence on the ongoing interaction.
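The attractor account can be illustrated with a generic one-dimensional gradient system. The sketch below is a minimal illustration under stated assumptions and is not Duran and Dale's (2014) fitted model: it assumes a tilted double-well potential V(x) = x^4/4 - x^2/2 - bx, in which the well at x < 0 stands for the egocentric attractor, the well at x > 0 for the other-centric one, and the hypothetical parameter `bias` (b) for the relative strength conferred by beliefs about the partner. The function name `simulate` and all numerical settings are illustrative choices. Starting at x = 0, the state settles in the favored attractor, but the weaker the bias, the longer it lingers near the unstable point separating the two basins, which is one simple way to see how a non-chosen attractor can slow a response.

```python
def simulate(bias, x0=0.0, dt=0.01, tol=0.01, max_steps=100_000):
    """Euler-integrate the gradient flow dx/dt = x - x**3 + bias.

    This is steepest descent on the (hypothetical) tilted double-well
    potential V(x) = x**4 / 4 - x**2 / 2 - bias * x. Negative x stands in
    for the egocentric attractor, positive x for the other-centric one.
    Returns (final_x, steps) once the state has moved away from its start
    and its velocity has dropped below tol, i.e., it has settled in a well.
    """
    x = x0
    for step in range(max_steps):
        velocity = x - x**3 + bias
        # Stop only after leaving the starting region, so a slow start
        # near the unstable point is not mistaken for having settled.
        if abs(x - x0) > 0.5 and abs(velocity) < tol:
            return x, step
        x += dt * velocity
    return x, max_steps


if __name__ == "__main__":
    # Strong bias toward the other-centric well, weak bias, and reversed bias.
    for bias in (0.3, 0.05, -0.3):
        final_x, steps = simulate(bias)
        print(f"bias={bias:+.2f}: settled at x={final_x:+.3f} after {steps} steps")
```

Running the script prints, for each bias, where the state settled and how many integration steps it took; the run with bias 0.05 needs more steps than the run with bias 0.3 because it starts much closer to the unstable point between the basins, even though it ends in the same well.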

The above shows how the dialogical, joint nature of conversation brings in valid and important constraints that, together with knowledge of common ground, co-determine the perspective taken on a concrete scene. However, most of the experimental work of Hanna and Tanenhaus (2004) and Brown-Schmidt and Hanna (2011), as well as that of Duran and Dale (2014), pertains to rather limited situations, congruent with those traditionally studied in communication research: agreeing on reference and resolving ambiguities, where the task-relevant objects are, by design of the experimental situation, presented in common ground. It is worth noting, as others already have (Brown-Schmidt, 2012), that the situations chosen for study mostly involve the interpretation of descriptions or imperatives, which require focusing on common ground, and only rarely, for example, informational questions, which would require focusing on privileged ground. Thus, once again, the tasks were chosen to study how people understand each other and not how they are able to form distributed functional systems. Yet it seems logical that if linguistic interactions are to broaden the capabilities of a dyad, it is precisely the private, or privileged, information that should be in focus. In fact, in one study by Brown-Schmidt (2009), participants were asked an informational question about an object in their private ground. In this situation, they clearly focused more on the privileged-ground target than on the common-ground competitor, showing sensitivity to the speaker's informational demands.

The power of an interacting system comes from its distributed nature. The concept of pooled ground helps us understand how perspective-taking may serve the global organization of a dyad. If we look at most real-life situations from the global perspective of an interacting dyad, pooling the ground for a dyadic system immersed in a task requires first scouting for information that might be relevant to the task and could be volunteered or signaled in a collaborative interaction, and then zooming in on the appropriate linguistic controls that coordinate this information. Keysar et al.'s (1998, 2000, 2003) results, as well as the slight initial egocentric bias found in almost all the studies above, might thus be taken not as evidence of egocentricity but as preparation for being a valid partner in an interaction, able to contribute, or signal, idiosyncratic information or competence. Experimental setups in which the participant sees that some information is blocked from the partner's view increase this participant's responsibility for that very information (he is the only one who has access to it) and thus increase the tendency to focus on it.

We thus see increasing flexibility in models of perspective-taking: from Clark's automatic initial adjustment to common ground (e.g., Clark and Carlson, 1981) or Keysar's automatic initial egocentricity (e.g., Keysar et al., 1998) to the constraint-based model of Hanna and Tanenhaus (2004) and Brown-Schmidt (2009), in which perspective-taking depends on the interaction of various factors (lexical factors, the partner's perspective, the partner's capabilities). In the next step (Duran and Dale, 2014), the constraints are shown to be co-present and to dynamically influence perspective-taking decisions. This emphasizes the joint, dialogical nature of communication and the principle of least collaborative effort. The synergetic model underscores both the jointness and the distributed nature of the conversing system, which requires pooling the participants' resources. It thus makes it possible to generalize the constraint-based model to situations other than ambiguity resolution or agreeing on reference: by letting the various structures of the task determine the shape of the linguistic exchange, it can better predict conversational moves and the focus of attention. This, however, is possible only if we let the global level (the functional synergy) exert its influence, determining the distribution of complementary roles.

Moving up to the level of interaction in search of explanatory variables has consequences for the phenomenon of audience design in general, of which perspective-taking is an example. Studies usually focus on the speaker's ability to adjust an utterance according to her beliefs about the knowledge or social status of the listeners (Clark and Krych, 2004; Horton and Gerrig, 2005). Addressees' reactions are rich in cues about their conversational needs, as elegantly demonstrated by Kuhlen and Brennan (2013), whose work called into question the validity of using confederates in some studies of interactive dialog. For example, in Brown and Dell's (1987) experiment, participants told a story to an allegedly naïve partner who was in fact a confederate. In that case, participants were not eager to take the alleged addressee's needs into account, which was interpreted as proof of egocentrism. However, in Lockridge and Brennan's (2002) replication, when the confederate was replaced by a genuinely naïve partner who heard the story for the first time and was allowed to give feedback to the speaker, participants showed sensitivity to the addressee's lack of knowledge already in the early stages of utterance production. This strong effect of the interlocutor's presence suggests that parties in a dialog are actually very skillful at estimating a partner's knowledge and conversational needs during dialog.

A focus on 'doing together,' however, raises the question of whether participants are perhaps equally skillful at recognizing the potentialities, and not only the needs, of others. Isaacs and Clark (1987), in their study of audience design, showed that recognition of a partner's level of expertise with respect to the task material is almost immediate and determines how both experts and novices refer to objects. The principle of least collaborative effort and the distributed nature of joint project realization, together with the notion of pooled ground, may thus also be useful for generalizing the principles of audience design: from offering information to be understood to designing contributions that get what is needed for the interaction to go further. Such a framework can help broaden the investigation of interaction to contexts beyond tasks that require zooming in on the same referent in common ground. In other contexts, audience design serves not only to supply information but also to seek information from a more knowledgeable partner: expressions are designed to reach into the partner's privileged ground, but only as far as is needed to make our own next move.

# **CONCLUSION**

Pragmatic approaches see language as immersed in a variety of social projects. This perspective, taken together with a dialogical and collective view of meaning-making, points to the fact that realizing a project often requires the coordination of distributed resources.

The notion that a global level of interaction may possess causal and functional properties is advocated by enactive approaches to cognition (Di Paolo et al., 2008) and, in the domain of linguistic functioning, by the model of dialog as interpersonal synergy (Fusaroli et al., 2014). It is at this level, and with respect to collective goals, that complementary roles for participants in a synergy are defined. Within such a framework, the use of language in interaction is responsible not only for creating and maintaining coherence and mutual understanding but also for distributing roles in a task-dependent and complementary fashion. To describe the resource available to a dyad in this process, the notion of 'pooled ground' was proposed. It pertains to the level of interaction as a whole and comprises both the mutually known common ground and those elements of the participants' privileged grounds that may or may not ever enter common ground but nevertheless play a causal role in constraining the dialogical system's behavior.

Just as the alternative attractor in the Duran and Dale (2014) study influenced the shape of reaching trajectories, privileged knowledge will influence a speaker's utterances (both their content and the way they are produced), making them act as slightly different constraints on the listener simply by virtue of being different physical controlling signals. This brings us back to the distinction between the physical and the semantic, made in note 3 at the beginning of this paper: in a framework in which language is understood as a constraint on an ongoing interaction, it is easier to see how the physicality of an utterance may become meaningful in a given situation.

The synergetic model leads to a reinterpretation of seemingly egocentric behaviors in perspective-taking as dyad-oriented; namely, they may stem from 'scouting' for useful task-relevant information. Similarly, the audience design of utterances should be understood with respect to the joint project being realized, and not as motivated solely by the goal of mutual understanding. The emphasis on pooling, rather than equalizing, the ground may also cast the problem of misunderstandings in a different light. Misunderstandings are a natural consequence of scouting for broader resources; their appearance is not only a signal that something should be repaired but, equally valuably, a signal of a potentially usable difference. They stem from constantly testing privileged information that can be volunteered or signaled in an interaction. Collective, distributed sensemaking would thus not be possible without misunderstandings.

Balancing the synchrony/complementarity factors in a synergy leads to novel predictions about communicative behavior. It may, for example, be useful in determining the 'degree of novelty' that will be accepted in a conversation. In a situation with a strong need for group coherence, one might predict greater redundancy, i.e., staying within common ground (an emphasis on communion and the phatic aspect of an encounter), rather than risking miscommunication while scouting for maximal gain.

The theoretical and empirical focus of psycholinguistic studies exclusively on language, on linguistic exchanges and their 'understanding,' leads to an underappreciation of the richly structured interaction, constrained by many factors, that is already in place. Viewing linguistic interactions first as interactions on joint projects, with language as a source of constraints that structure them and divide labor, lifts the explanatory burden of meaning-making and understanding from language alone and places it on the study of interaction in its context. With the ground pooled over both participants as a resource, these interactions, as distributed collective structures, can be truly richer and more capable than either participant alone.

## **ACKNOWLEDGMENTS**

The authors wish to thank Gregory J. Mills for very helpful comments that made this paper much better. The work on this paper was supported by the EuroCORE (EuroUnderstanding) grant "Digging for the Roots of Understanding" (Funding Decision 888/N-EuroUnder/2011/0) to the first author and DSM 109031/2014 to the second author.

## **REFERENCES**


Clark, H. H. (1992). *Arenas of Language Use*. Chicago: University of Chicago Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2014; accepted: 10 October 2014; published online: 07 November 2014.*

*Citation: Rączaszek-Leonardi J, Dębska A and Sochanowicz A (2014) Pooling the ground: understanding and coordination in collective sense making. Front. Psychol. 5:1233. doi: 10.3389/fpsyg.2014.01233*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Rączaszek-Leonardi, Dębska and Sochanowicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Culture's building blocks: investigating cultural evolution in a LEGO construction task

# *John J. McGraw<sup>1</sup>\*, Sebastian Wallot<sup>1</sup>†, Panagiotis Mitkidis<sup>1,2,3</sup>† and Andreas Roepstorff<sup>1</sup>*

<sup>1</sup> Interacting Minds Centre, Department of Culture and Society, Aarhus University, Aarhus, Denmark

<sup>2</sup> Center for Advanced Hindsight, Social Science Research Institute, Duke University, Durham, NC, USA

<sup>3</sup> Interdisciplinary Centre for Organizational Architecture, School of Business and Social Science, Aarhus University, Aarhus, Denmark

## *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

#### *Reviewed by:*

Chris Sinha, Lund University, Sweden

Mirko Farina, Macquarie University, Australia

Richard Heersmink, Macquarie University, Australia

#### *\*Correspondence:*

John J. McGraw, Interacting Minds Centre, Department of Culture and Society, Aarhus University, Jens Chr. Skous Vej 4, Building 1.483, 3rd floor, 8000 Aarhus C, Denmark e-mail: quickdraw74@hotmail.com

†Sebastian Wallot and Panagiotis Mitkidis have contributed equally to this work.

One of the most essential but theoretically vexing issues regarding the notion of culture is that of cultural evolution and transmission: how a group's accumulated solutions to invariant challenges develop and persist over time. But at the moment, the notion of applying evolutionary theory to culture remains little more than a suggestive trope. Whereas the modern synthesis of evolutionary theory has provided an encompassing scientific framework for the selection and transmission of biological adaptations, a convincing theory of cultural evolution has yet to emerge. One of the greatest challenges for theorists is identifying the appropriate time scales and units of analysis to reduce the intractably large and complex phenomenon of "culture" into its component "building blocks." In this paper, we present a model for scientifically investigating cultural processes by analyzing the ways people develop conventions in a series of LEGO construction tasks. The data revealed a surprising pattern in the selection of building bricks as well as features of car design across consecutive building sessions. Our findings support a novel methodology for studying the development and transmission of culture through the microcosm of interactive LEGO design and assembly.

**Keywords: cultural evolution, cultural transmission, joint action, joint attention, shared intentionality, materiality, path dependence, schema theory**

# **INTRODUCTION**

Natural selection has proven to be a uniquely successful scientific paradigm. By identifying the basic processes through which organisms change, Darwin (1859) established a research program that has not only revolutionized the study of life but also provided a template for what a comprehensive model of transformation over time ought to look like. And with the additional refinements and achievements of the modern evolutionary synthesis, many of the subtler mechanisms, including the way that biological information is genetically transmitted, have yielded to scientific inquiry and experimentation (Dobzhansky, 1937; Huxley, 1942; Mayr, 1942). Moreover, because evolution, for the most part, subsumes all aspects of biological life, it has been used to explain not only changes in biological *form* but also the *behavior* of organisms (Darwin, 1890; Lorenz, 1937; Tinbergen, 1951). Indeed, some theorists believe that the mysteries of human behavior, and the achievements of human societies, may ultimately find their explanations in rigorous applications of evolutionary theory to patterns of human interaction, potentially explaining culture itself (Wilson, 1975; Dawkins, 1976; Sperber, 1996). But in spite of many attempts to adapt ideas from biological evolution to the study of culture, beginning soon after the publication of Darwin's magnum opus (Spencer, 1864; Galton, 1869; Haeckel, 1900), the preliminary approaches have so far failed. This includes even the impressively nuanced models of such 20th-century scholars as Steward (1955) and Parsons (1966). But if theorists of society have had the archetype of biological evolution to inspire them for so long, why have they come up short in their attempts to achieve something similar for culture? Is culture qualitatively different from biology, so that attempting to create an "evolutionary theory" of culture is nonsensical, or a mere metaphor?

In agreement with a growing number of scholars (Boyd and Richerson, 2005; Mesoudi, 2011; Sterelny, 2012), we hold that many of the basic processes which undergird the evolutionary theory of life apply to culture as well. Inspired by Darwin's meticulous study of details, we hypothesize that an evolutionary theory of culture will develop through careful observations of the smallest phenomena that can still be called "cultural." And just as Darwin gradually came to an understanding of natural selection by noting tiny differences among barnacles, finches, and other creatures, an understanding of the evolutionary processes of culture will likely derive from particularistic studies of culture's "building blocks."

We identify these building blocks as skills that human beings are uniquely predisposed to develop during infancy and childhood, but *only* through engaging others in richly scaffolded<sup>1</sup> cultural contexts (Luria, 1976; Vygotsky, 1978; Hobson, 2002; Rogoff, 2003; Reddy, 2008). We use the term "skill" to indicate a theoretical framework of human behavior as constituted by capacities of relationality to people and things in pre-existing, culturally engineered environments. Additionally, the term skill suggests a capacity that (1) is developed; (2) never achieves a final state of enskilment, but is essentially determined by the continued exercise of the skill; and (3) depends upon all earlier uses of the skill, that is to say, a skill always has a "history." It is upon the foundation of our "skills for intersubjectivity" and our "skills for interobjectivity" that culture is built. And to crib from Darwin (1859, 490): "...from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved." But just as natural selection was not apparent before Darwin's studies, so cultural evolution remains little more than a tantalizing mirage until its processes are rendered visible. The theory of natural selection was developed, and continues to be refined, by studying physical organisms as well as their ancestors' fossilized remains. To make culture visible, its *processes* must be operationalized in particular forms of interaction and materialized in *products* of those interactions; the object of study must first be an "object" before an empirical science can truly begin. We attempted to accomplish this very thing in our quasi-naturalistic joint action experiment.

<sup>1</sup>Scaffolding, in this case, refers to the alteration of the environment, and of the instruction itself, to meet the learner's needs. As with scaffolding in construction, the notion is that a set of temporary supportive practices and artifacts is put in place to facilitate the construction process until the project is self-supporting or completed. Wood et al. (1976) were among the first to use the term in relation to teaching and learning.

In the study, pairs of participants were required to construct four model cars using LEGO® building bricks (see **Figure 1**). The pairs constructed their models in consecutive 10 min building sessions and employed distinctive "modes of interaction" during each of these sessions: egalitarian cooperation (EC), turn-taking (TT), and hierarchical cooperation (HC). At the beginning of each of the four sessions, participants were given written instructions for one of these modes of interaction. For EC, participants were directed to go about building their car however they saw fit. For TT, participants took turns in designing the car: one person offered a design suggestion while the other aided in constructing that feature, and then they reversed roles. For HC, one participant served as the "director" in charge of design decisions throughout the entire session. After the first HC session, participants reversed roles in the very next session. After each 10 min building session, we collected the car and remaining LEGO bricks and supplied the pairs with an identical set of building bricks at the beginning of the following session. The cars themselves served as our primary source of data, as described in more detail below.

**FIGURE 1 | Photo of participants building a car model.**

## **THEORETICAL FRAMEWORK**

Dynamics representative of larger cultural processes are at work in this experimental task. The design and assembly of a model car is a type of joint action, in which "two or more individuals coordinate their actions in space and time to bring about a change in the environment" (Knoblich et al., 2011). Much of what goes by the name "culture" could be defined in the same way (see Risjord, 2012). Additionally, these actions are "embodied practices of mind" (Gallagher, 2005, 206–236) whose proper unit of analysis is the coordinated dyad engaged in "participatory sensemaking" (De Jaegher and Di Paolo, 2007). As Risjord (2007, 414) observes: "Culture is neither a psychological phenomenon nor some kind of abstraction from individuals. It is the social interactions themselves, perfectly public and observable, yet distinct from any individual participant." Culture may be thought of as some arrangement of interlocking joint actions that build up from two people to larger and larger groups. Joint action relies on joint attention, the mutual attendance to an object indexed by such things as gaze following (Tomasello, 2008), and on shared intentionality, the capacity to develop shared goals and coordinate actions toward the achievement of those goals (Tomasello and Carpenter, 2007; Gallotti and Frith, 2013). Additionally, orchestrating these skills in model car construction is an example of collaborative engagement in which "agents share goals and action plans manifested in a joint intention" (Tomasello et al., 2005). Numerous scholars have suggested that this capacity for shared intentionality and cooperation is essential for culture (Tomasello, 1999b; Schönpflug, 2008; Boyd and Richerson, 2009).

How could simple tasks performed on such short time scales reveal anything substantial about cultural evolution? Joint actions are forms of "microgenesis" (Werner, 1957; Rosenthal, 2004; Sinha, 2005, 1553), that is, developmental processes unfolding in "real time." "Triadic interactions" between infants and caregivers (Trevarthen and Hubley, 1978), the assembly of LEGO models by adults (Clark and Krych, 2004; Bjørndahl et al., 2014), and troops coordinating their movements on a battlefield all exemplify microgenesis. Longer-term processes of cultural evolution, although undoubtedly more complex, are built from microgenetic actions. A biologist might focus on macroevolution or microevolution (Filipchenko, 1927; Dobzhansky, 1937), yet both exhibit the same processes of natural selection; the model of selection behind the rise of the dinosaurs and the model of selection behind the antibiotic resistance of a given species of *Staphylococcus* are one and the same. So while studying long-term changes in social organization is a viable means of investigating the topic, we must turn to real-time interactions if we are to build an experimental science of cultural evolution.

## *Skills for intersubjectivity*

Intersubjectivity has been defined as "the sharing of experiential content (e.g., feelings, perceptions, thoughts, and linguistic meanings) among a plurality of subjects" (Zlatev et al., 2008). It is marked by such things as shared emotions (Michael, 2011), empathy (Scheler, 1954; Zahavi, 2008), and "resonance systems" which lead us to experience, in some partial way, the "feeling" of an action when watching someone else perform the action (Gallese, 2001; Gallagher, 2008). Many of the processes that get bundled together in the term "intersubjectivity" begin to develop during infancy as a set of skills: "...capabilities of action and perception of the whole organic being (indissolubly mind and body) situated in a richly structured environment. As properties of human organisms, skills are thus as much biological as cultural" (Ingold, 2000, 5). These skills develop through interaction with older, more competent humans engaged in *task-oriented patterns of practice*, even if these practices are "mere play" (Hobson, 2002, 42; Di Paolo et al., 2010, 72–78).

Trevarthen and Hubley (1978) discussed the easy imitation of smiles, cooing, and interpersonal gaze by young infants as "primary intersubjectivity," which begins to give way, around 9 months of age, to "secondary intersubjectivity." Discussing this more recursive form of intersubjectivity, Hobson (2002, 61) notes: "Clearly, personal relations are not just about exchanging smiles and coos and other endearing or not-so-endearing gestures with someone else. They are also about sharing experiences of things. Personal relations are about connecting with someone else and making reciprocal emotional contact, but also about exchanging points of view, or agreeing and disagreeing about this or that, or sharing jokes. If we can clarify how infants engage with someone else so that communication is *about* a third object or outside event, then we may draw closer to seeing how they come to think about things." Infants begin to demonstrate secondary intersubjectivity by participating in "triadic interactions" which involve "a referential triangle of child, adult, and the object or event to which they share attention" (Tomasello, 1999a, 62). A classic task of this kind is the rolling of a ball back and forth between child and caregiver. At this young age, then, humans begin to fluidly engage in practices that introduce them to the conventionalized uses and meanings of objects. Enculturation, the long-term process by which a person acquires the requisite languages, skills, and sensibilities of the groups to which she belongs (Wexler, 2006; Kiverstein and Farina, 2011; Lende and Downey, 2012), depends on forms of social cognition that develop from engaging in playful activities of this sort. It may be significant to note that human infants *enjoy* the processes and practices of becoming a cultural being. Productive research programs investigating cultural evolution ought to look for these sorts of enjoyable activities as indications that they are on the right path.

But the games, and later the stories, that so mark early stages of development are also forms of "serious play." Through these activities, norms, rules, and values are introduced to the child, who quickly begins to embody his particular culture's mores (Sinha, 2009, 174–176). With these normative engagements with others and with objects, the child also begins to intelligently observe and act on regularities in the environment, a process called *schematization* (see Piaget, 1952). Schemas are memories that an individual develops for recurrent features of the world which are sufficiently open and flexible to apply to "sets" or "categories" rather than to idiosyncratic items (Bartlett, 1932; Rumelhart, 1980; McGraw, 2007). For an English-speaking person in the contemporary globalized world, schemas would be typical for such things as "trees," "buildings," and "flags," but probably not for "cyclotrons," "halberds," or "transepts." A schema for a car, for instance, would include basic characteristics like "four-wheeled vehicle," "possesses an enclosed space for driver and passengers," and a host of typical components (e.g., steering wheel, windshield, headlights). The fact that people use a modifier before the term to identify items uncommon for the set (e.g., three-wheeled car, flying car, solar car) suggests the importance of common features in the development and consolidation of schemas.

While schemas are routinely demonstrated in our daily interactions, trying to find them in language presents many challenges that researchers have been wrestling with for decades (D'Andrade, 1995; Shore, 1996; Quinn, 2005). Sinha (2005, 1538) suggests an alternative approach very much in line with our study: "...cognitive and cultural schemas find material realization—are embodied—in the artifacts of material culture; and the way in which such artifacts are themselves embedded in culturally appropriate, normative structures of action and interaction. In this perspective, mind is socially distributed between people, and mental processes are supported by objects which embody and represent them. Cognition extends beyond the individual; embodiment goes beyond the skin." Searching for schemas in the physical world seems eminently preferable to inferring them from language since investigating the materialization of schemas affords a more quantitative and empirical approach, as we demonstrate in the analysis below.

## *Skills for interobjectivity*

Unfortunately, the shadow of Descartes still looms large; just as understanding mind apart from body is now perceived to be a philosophical blunder, so trying to understand the social and the cultural without considering their material basis revisits a distressingly common "category mistake" (Ryle, 1949). Though culture is made up of bodies, places, and things, many discussions about culture would lead one to think it was composed of abstract forces alone (see Latour, 2005). However, reflecting on people and social forces without consideration of their material aspects and accompaniments reveals itself to be an impoverished substitute: try to imagine Roman Catholicism without Bibles, churches, communion wafers, monasteries, crucifixes, tombs, or Rome. Culture is a particular coordination among "things" in the world, including but not limited to bodies, places, structures, and technologies. Coordination among these various things, through languages, customs, and rituals, does not exist apart from them. Even the notion of a culture apart from the things in the world that make it up turns out to be empty of content. One of the goals of this article is to highlight the ineliminable materiality that goes along with culture and, consequently, with cultural transmission (Sinha and Rodríguez, 2008; Sinha, 2009). Discussions of cultural transmission must take into account the fact that the social and the material are necessarily linked, even if many scholars have seen the latter as mere effect or consequence of the former. In fact, the social and the material co-constitute one another, so that one cannot reasonably discuss one without the other (Latour, 1996; Miller, 1998, 2005; Malafouris, 2013). Additionally, because of the differences, particularly in time scale, between behaviors, bodies, and artifacts, each of these employs distinctive processes of cultural transmission.
Nevertheless, robust forms of cultural transmission are demonstrable in each of these activities and structures, and across their varying time scales, from the immediate effects of imitative learning between children and caregivers to the potentially long-lasting effects of writing manuscripts (Garrod et al., 2007; Levy, 2012).

Philosophical discussions about intersubjectivity routinely fail to mention the importance of objects and other features of the physical world. But as Tomasello noted above, a triadic interaction typically features an object as the vertex in a referential triangle involving infant and caregiver. It is the object or event which "joins" the individuals' attention (Sinha and Rodríguez, 2008, 357) and it is "through participation in joint actions normatively structured around the use of artifactual objects...that the child finds an entry into the intersubjective realm of reasons for actions" (Sinha, 2009, 182). Peculiar objects, such as rolling balls or spinning tops, afford joint attention processes in non-trivial ways. But beyond such attention-grabbing toys, the material world is more than a canvas or blank slate for the play of human intersubjectivity. Enfield (2000, 42) observes that "representations are distributed across the 'community of minds' via coordinated focus on this mediating semiotic material—this may include gestures, proxemics, haircuts, people's faces, melodies, cultural artifacts, odors, plants, animals, clothing, meteorological phenomena, among just about anything else that two people can coordinate attention on." Intersubjectivity is thus dependent on "mediating structures" which include artifacts and the cultural practices that make sense of them (Hutchins, 1995). For instance, a person's subjective perception of time is based on intersubjective notions of what time is and how it is measured, neither of which means much without the calendars, clocks, and watches that people routinely put to use for purposes of interpersonal coordination (Williams, 2004).

Joint attention and shared intentionality, typically considered to be intersubjective phenomena, are also "interobjective" in their enaction. The term interobjectivity comes from Latour (1996, 240), who observed that "if you set yourself the task of following practices, objects and instruments, you never again cross that abrupt threshold that should appear, according to earlier theory, between the level of 'face-to-face' interaction and that of the social structure; between the 'micro' and the 'macro'." Latour suggests that a careful description of all the mediators involved (people, artifacts, and other structures) offers a window into processes operating across multiple time scales. Artifacts may serve as powerful repositories of symbolic meaning, but more than that, their built-in design and engineered affordances permit later generations and even historically discontinuous peoples to learn from and use these structures, often without explicit training (Malafouris and Renfrew, 2010; Hodder, 2012). These things are sometimes discussed as forms of "external memory" through which a society, wittingly or not, records its achievements for posterity (Donald, 1991; Meskell, 2005). And as important sources of information for social scientists, human artifacts are catalysts and precipitates of human interaction; artifacts are for the social sciences what fossils are for biology (Ehrlich and Ehrlich, 2008, 66). In artifacts we can trace the transformation over time of human interaction. Evolutionary theory has demonstrated the importance of studying fossils for working out the details of life's history. Similarly, the study of artifacts may provide the sort of objective data necessary to unravel many of the key mysteries of cultural evolution (Basalla, 1988; Kirsh, 2010; Johannsen et al., 2014).

Our study differs from many earlier investigations of cooperative joint action by not only studying the intersubjective skills necessary for such interaction, but also "seeing through things," in this case model cars, to derive conclusions about cultural processes as a whole. Too many cultural theorists forgo the archeologists' emphasis on material culture, but it is precisely in material culture that many of the conventions, representations, and "ideas" that others consider to be essentially private and abstract are to be found in concrete form (Sinha and Rodríguez, 2008, 364). By focusing on the dyad as our basic unit of analysis and by looking at LEGOs as mediators of cooperation, we have tried to overcome these limiting biases. In doing so, we foreground aspects of culture that have been previously understudied, namely its basis in skills for intersubjectivity *and* interobjectivity.

## **GOALS OF THE STUDY**

We consider our study and its results to be a "proof of concept." This study presents methods for discerning, and quantifying, schema-like intersubjective understandings in material form. By designing experimental tasks that require pairs or groups to act together toward achieving—via LEGOs—a materialization of shared features of the environment (like CARS), concepts are transformed into percepts. This approach was inspired by recent work in cognitive science that looks to action and interaction for insights about human cognition (Rogoff, 1990; Varela et al., 1992; Hutchins, 1995; Goodwin, 2000; Noë, 2004; Stewart et al., 2010; Knoblich et al., 2011; Dale et al., 2014). In fact, the notion of the schema ought to be conceived as one feature in a much larger picture of human interaction. For schemas—just like words, phrases, and behaviors—only come about through the developmental processes that underlie human capacities in general, which derive from the fusion of our skills for intersubjectivity with our skills for interobjectivity.

# **MATERIALS AND METHODS**

The data below derive from a larger project investigating human interaction. Here, we present an analysis of the products of those interactions (i.e., the model cars built by the pairs of participants). The analysis of interaction measures is presented in additional publications based on the study (Mitkidis et al., in review; Wallot et al., in review).

# **PARTICIPANTS**

A total of 74 participants from Aarhus University participated in the experiment (average age = 23.5 years, SD = 3.5 years) and were randomly assigned to pairs. Using standardized forms in the subjects' native language, the pairs were instructed to cooperate in the construction of model cars using LEGO building bricks. The experiment lasted 75 min. At the end of the experiment participants were compensated with 350 DKK (≈47 EUR). The protocol was reviewed and approved by the Ethics Committee for Region Midtjylland, Denmark. All participants signed a written informed consent form.

# **PROCEDURE**

The 37 pairs of participants used LEGO building bricks to construct model cars during four consecutive 10 min sessions. At the beginning of each building session, subjects were given a new box of LEGOs which contained the same building bricks present in every other session. Also, participants were given different instructions on how to go about building a car together during each session. The order in which the instructions were given was randomized for each pair of participants. Subsequent data analysis revealed that neither the modes of interaction (EC, TT, and HC) nor their order ended up being salient since the results described below demonstrate very strong carry-over effects from earlier to later sessions; if anything, the modes of interaction might have worked against this effect. Additional publications based on this study utilize results based on these modes of interaction and discuss their significance (Mitkidis et al., in review; Wallot et al., in review). At the end of each session, the model car and remaining LEGOs were removed from the room and the car was later photographed, both as an assembled model as well as a set of disassembled building bricks (see **Figure 2**). Each car's pieces were counted and categorized using a unique identifier for the type and color of each LEGO brick.

## **RESULTS**

To evaluate similarity between any two cars, we calculated the number of different pieces they shared in common. This number of common pieces was then divided by the overall number of different pieces used in both cars, to account for the fact that bigger cars will tend to show greater overlap of component pieces by chance alone.
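A minimal Python sketch of this similarity measure follows. It assumes that "different pieces" means distinct piece types, each identified by a type-and-color label as described in the Procedure, so the measure becomes the number of shared types divided by the number of types used in either car (the Jaccard index). The function name `car_similarity` and the piece labels are hypothetical.

```python
from typing import Iterable


def car_similarity(car_a: Iterable[str], car_b: Iterable[str]) -> float:
    """Shared piece types divided by all piece types used in either car.

    Duplicates are ignored, so only *kinds* of pieces count:
    len(A & B) / len(A | B), i.e. the Jaccard index of the two type sets.
    """
    types_a, types_b = set(car_a), set(car_b)
    all_types = types_a | types_b
    if not all_types:
        return 0.0  # two empty cars: similarity defined as 0 by convention
    return len(types_a & types_b) / len(all_types)


# Hypothetical type-and-color labels for two small cars.
car1 = ["wheel-black", "wheel-black", "plate-red", "light-clear"]
car2 = ["wheel-black", "plate-red", "brick-blue"]
print(car_similarity(car1, car2))  # 2 shared types / 4 types overall -> 0.5
```

Dividing by the union rather than by either car's size keeps the score between 0 and 1 and stops a large car from looking similar to everything merely because it contains many kinds of pieces.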

To evaluate whether there was an overarching pattern across all pairs that reflected participants' understanding of the concept of a car, we constructed a rank-order distribution using all pieces from all cars. As can be seen in **Figure 3**, the number of pieces that were used to build cars scaled logarithmically to the rank order of pieces (*R*² = 0.992), with exceptions at the front and back ends of the distribution: very frequently and very infrequently occurring pieces deviated from this relationship. An inspection of these deviations revealed that the very frequently occurring pieces were wheels, hubs, and axles, arguably indispensable components of a car. The very infrequently used pieces seemed to be largely non-functional pieces that were neither necessary nor typical of cars and possessed little in the way of aesthetic or ornamental quality (see examples in **Figure 3**).

The pieces in the broad, logarithmic middle of the distribution seem to fall on a continuum running from highly functional pieces (such as larger plates used to construct the chassis of a car) and highly stereotypical pieces (such as round, transparent pieces that typically served as car lights) at the high-frequency end, to increasingly non-functional pieces at the low-frequency end.
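The rank-order analysis can be sketched as follows. This assumes the relationship was assessed as an ordinary least-squares line fitted to the logarithm of each piece type's total count against its rank. Whether the deviating head and tail of the distribution were included in the reported fit is not stated here, so this sketch simply fits all ranks. The function name `rank_order_fit` and the toy data are hypothetical.

```python
import math
from collections import Counter


def rank_order_fit(cars):
    """Pool all pieces from all cars, rank piece types by how often they
    were used (rank 1 = most frequent), and fit log(count) = a + b * rank
    by ordinary least squares. Returns (ranked_counts, slope, intercept, r2)."""
    counts = Counter(piece for car in cars for piece in car)
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two piece types to fit a line")
    n = len(ranked)
    xs = range(1, n + 1)
    ys = [math.log(c) for c in ranked]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return ranked, slope, intercept, r2


# Toy data: counts halve with each rank, so log(count) is exactly linear in rank.
cars = [["wheel"] * 64 + ["plate"] * 32, ["light"] * 16 + ["door"] * 8 + ["flag"] * 4]
ranked, slope, intercept, r2 = rank_order_fit(cars)
print(ranked, round(slope, 3), round(r2, 3))
```

Deviations like those reported for the most and least frequent pieces would show up as large residuals at the first and last ranks of such a fit.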

To investigate how cars developed across sessions, we measured the average carry-over of pieces from one car to the next. The similarity between consecutively built cars increased from session to session (see **Figure 4A**); consecutive cars shared a growing percentage of the same kinds of pieces [*F*(2,104) = 6.84, *p* = 0.002, η² = 0.116]. Interestingly, the first model also exerted an increasing influence on subsequently built models: models constructed in later sessions shared an increasing number of pieces with the *very first* model car built [*F*(2,104) = 6.31, *p* = 0.003, η² = 0.108], as shown in **Figure 4B**.
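For a single pair, the two carry-over measures can be sketched as below, assuming the type-based similarity described above and a list of the pair's cars in session order. Each returned list has three entries (sessions 2 to 4), which is consistent with the two numerator degrees of freedom in the reported F tests; averaging these lists across pairs would give session-wise curves of the kind plotted in Figure 4. The function names and the example pair are hypothetical.

```python
def type_similarity(car_a, car_b):
    """Shared piece types / all piece types used in either car."""
    a, b = set(car_a), set(car_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def carry_over_profile(cars):
    """cars: one pair's cars in session order (lists of piece labels).

    Returns two lists covering sessions 2..N:
    - similarity of each car to the car built just before it
    - similarity of each car to the pair's very first car
    """
    to_previous = [type_similarity(prev, cur) for prev, cur in zip(cars, cars[1:])]
    to_first = [type_similarity(cars[0], cur) for cur in cars[1:]]
    return to_previous, to_first


# Hypothetical pair: cars from sessions 1 through 4.
pair_cars = [
    ["wheel", "plate", "light", "spoiler"],
    ["wheel", "plate", "light", "door"],
    ["wheel", "plate", "light", "door", "spoiler"],
    ["wheel", "plate", "light", "door", "spoiler"],
]
print(carry_over_profile(pair_cars))
# -> ([0.6, 0.8, 1.0], [0.6, 0.8, 0.8])
```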

To investigate the rates of productivity across different building sessions, we calculated the size of each car (i.e., the number of its component pieces) and subjected the measure of car size to a repeated measures ANOVA with the factor session number (1, 2, 3, 4). As can be seen in **Figure 5**, cars grew bigger across the building sessions [*F*(3,156) = 18.80, *p* < 0.001, η² = 0.266].
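A repeated measures ANOVA with one within-subject factor (session) can be sketched as below. This is a textbook one-way decomposition for a complete design, with no sphericity correction and no p-value (which would additionally need the F distribution, for instance from SciPy); it is not necessarily the exact procedure or software the authors used. The second helper recovers partial eta squared from a reported F and its degrees of freedom, F·df1 / (F·df1 + df2); the effect sizes reported alongside the F tests in this section agree with that formula to three decimals. All names and the toy data are hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA for a complete design.

    data[i][j] is the score (e.g. car size) of unit i in session j.
    Returns (F, df_effect, df_error, partial_eta_squared).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    session_means = [sum(row[j] for row in data) / n for j in range(k)]
    unit_means = [sum(row) / k for row in data]
    ss_session = n * sum((m - grand) ** 2 for m in session_means)
    ss_units = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual variation once session effects and stable unit differences are removed.
    ss_error = ss_total - ss_session - ss_units
    if ss_error <= 0:
        raise ValueError("no residual variance; F is undefined")
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_session / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error, ss_session / (ss_session + ss_error)


def partial_eta_squared_from_f(f, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Toy data: three units measured in three sessions.
toy = [[1, 2, 6], [2, 4, 6], [3, 6, 6]]
print(rm_anova_oneway(toy))  # F = 12.0 on (2, 4) df, partial eta^2 = 24/28

# F tests reported in this section.
for f, d1, d2 in [(6.84, 2, 104), (6.31, 2, 104), (18.80, 3, 156), (16.27, 3, 156)]:
    print(f, d1, d2, round(partial_eta_squared_from_f(f, d1, d2), 3))
```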

To investigate how design features changed across building sessions, we determined the dominant color used by each pair of participants across the four building sessions. This was done by calculating the percentage of LEGOs within each color category for each car, and then summing that percentage across all four cars built by each pair; the color with the highest sum was taken as the pair's dominant color. We then investigated how the percentage of the dominant color changed across the four building sessions, subjecting the percentages to a repeated measures ANOVA with the factor session number (1, 2, 3, 4). As shown in **Figure 6**, the proportion of the dominant color was highest in the first model and dropped from session 1 to 2, but then increased steadily from sessions 2 to 4 [*F*(3,156) = 16.27, *p* < 0.001, η² = 0.238].
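The dominant-color procedure for one pair can be sketched as follows. The sketch assumes each car is a non-empty list of hypothetical `(piece_type, color)` tuples and that the dominant color is the one with the largest summed percentage; the function names are hypothetical.

```python
from collections import Counter


def color_percentages(car):
    """Percentage of a car's pieces in each color category.
    A car is a non-empty list of (piece_type, color) tuples."""
    counts = Counter(color for _, color in car)
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


def dominant_color_profile(cars):
    """cars: one pair's cars in session order.

    Sums each color's percentage over all of the pair's cars, takes the
    color with the largest sum as the pair's dominant color, and returns
    that color together with its percentage in each session."""
    per_car = [color_percentages(car) for car in cars]
    summed = Counter()
    for pct in per_car:
        summed.update(pct)  # Counter.update adds values, floats included
    dominant, _ = summed.most_common(1)[0]
    return dominant, [pct.get(dominant, 0.0) for pct in per_car]


colored_cars = [
    [("brick", "red")] * 3 + [("wheel", "black")],
    [("brick", "red"), ("wheel", "black")] + [("plate", "blue")] * 2,
    [("brick", "red")] * 2 + [("wheel", "black")] * 2,
    [("brick", "red")] * 4,
]
print(dominant_color_profile(colored_cars))
# -> ('red', [75.0, 25.0, 50.0, 100.0])
```

Summing percentages rather than raw counts gives each car equal weight in choosing the dominant color, so the larger cars built in later sessions do not dominate the choice.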

**FIGURE 2 | A disassembled car model.**

**FIGURE 3 | Plot of the logarithm of number of pieces vs. the rank-order of pieces.** The middle part of the distribution is characterized by a strong relationship between functional/stereotypical and ornamental pieces. The front end of the distribution marks fundamental, indispensable pieces (wheels and axles), while the back end features increasingly non-functional, non-ornamental pieces.

**session compared to cars built in all subsequent sessions (B).** Cars constructed in later sessions showed an increasingly greater overlap with their predecessors, and with the very first models.

While most of the aforementioned measures refer to patterns derived from within-pair comparisons across the four sessions, we also performed a between-pairs comparison of the overlap in LEGOs for sessions 1, 2, 3, and 4. To investigate the diversity of cars across sessions, we calculated the average overlap of pieces between all the cars constructed in each session. As shown in **Figure 7**, the similarity of cars within each session, quantified as the average number of pieces shared, did not differ as a function of building session [*F*(3,204) = 0.79, *p* = 0.502].
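The between-pairs comparison can be sketched as below: for each car built in a given session, its mean similarity to the cars that every other pair built in that same session. This assumes the normalized type-based similarity from the within-pair analysis; since the text also describes the measure as the average number of pieces shared, using raw shared counts is an equally plausible reading. The unit of analysis behind the reported F test is likewise an assumption here. Names and data are hypothetical.

```python
def type_similarity(car_a, car_b):
    """Shared piece types / all piece types used in either car."""
    a, b = set(car_a), set(car_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def per_car_overlap(session_cars):
    """session_cars: cars built by *different* pairs in one session (at least two).
    Returns, for each car, its mean similarity to every other car in that session."""
    means = []
    for i, car in enumerate(session_cars):
        others = [type_similarity(car, other)
                  for j, other in enumerate(session_cars) if j != i]
        means.append(sum(others) / len(others))
    return means


# Hypothetical session with cars from three different pairs.
session = [
    ["wheel", "plate", "light"],
    ["wheel", "plate", "door"],
    ["wheel", "spoiler"],
]
overlaps = per_car_overlap(session)
print(overlaps, sum(overlaps) / len(overlaps))  # per-car means, then the session mean
```

Repeating this for sessions 1 to 4 yields one set of values per session; a flat profile across sessions corresponds to the absence of between-pair convergence reported above.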

# **DISCUSSION**

As demonstrated in **Figure 3**, all car models featured many things in common. This almost certainly derives from the culturally mediated schemas participants share. Coming into the experimental setup with similar schemas exerts non-trivial influences on behavior since it greatly accelerates coordination among participants (e.g., they do not have to puzzle over what the word for car means or what a car should basically look like). Moreover, these shared schemas immediately narrow the possibilities offered by the large set of LEGOs: because individual building bricks differ in how important they are for constructing a model car, the full set of LEGOs and their combinations far exceeds the set usable for accomplishing the task.

**FIGURE 6 | Proportion of the dominant color in a car model as a function of building session.** The prominence of the dominant color in a car was strongest in the first building session and dropped from the first to the second session, only to steadily increase across the remaining three sessions.

**pairs, but in the same session.** The similarity of cars within each session, quantified as the average number of pieces shared, did not differ as a function of building session.

Translating the basics of car design into a LEGO model posed no real challenge for the pairs. They are heirs of a technological culture that worked out the basics of wheeled transport over many centuries. For example, cars cannot be built in such a way that two of their wheels roll in one direction while the other two roll perpendicular to that direction. Participants, because of the schemas they shared, did not need to engage in fruitless experiments regarding the alignment of wheels or hundreds of other possibilities that run counter to the basic template of a car; history had accomplished this work already. Perhaps they did not realize it, but all participants came into the experimental setting with all the know-how they required to build model cars from the very first building session. This is a significant point since people in other times and places would have no such knowledge, individually or collectively. It is because of this simple fact that an experiment like this can capture something meaningful about culture.

In constructing their first model, pairs negotiated significant coordination costs (they needed to learn how to successfully work with each other in achieving the task) that, once paid, could be reliably recaptured in each successive building session by working together in similar ways and producing a model that basically conformed to the prior models they had already produced. Successful coordination became increasingly predictable by adhering to designs that reified their prior coordination patterns. Car designs became more and more standardized across sessions, but they also *grew* from one session to the next. This increase in the number of pieces used for each car demonstrates something like a "ratchet effect" (Tomasello, 1999a, 37–41) in that the efficiencies of adopting conventions established in earlier sessions freed up resources (particularly time) for additional modifications in later sessions. Tomasello describes the ratchet effect as the ability, peculiar to humans, to faithfully learn and preserve innovations over time, and generations, which permits additional modifications to accumulate. This human capacity is ratchet-like not only because it slowly cranks things upward in complexity, but also because it prevents slippage that might cause innovations to be dropped (i.e., lost or forgotten; Tan and Fay, 2011). The kind of imitative learning that already begins to show up in triadic interactions leads to the "cumulative cultural complexity" that defines human culture; it is a form of inheritance that ties human bodies and minds to their artifacts, all of which have "cultural histories" (Tomasello, 2006, 206). As Tomasello (2006, 205) notes: "...none of the most complex human artifacts or social practices—including tool industries, symbolic artifacts, and social institutions—were invented once and for all at a single moment by any one individual or group of individuals.
Rather, what happened was that some individual or group of individuals first invented a primitive version of the artifact or practice, and then some later user or users made a modification, an improvement, that others then adopted perhaps without change for many generations, at which point some other individual or group of individuals made another modification, which was then learned and used by others, and so on over historical time." Tomasello concludes that just a few basic, though momentous, abilities which distinguish us from our nearest kin, the chimpanzees, were required for the development of human culture, which he sees as our ability to create "history." And history is not simply *having* a past, but the intentional *preservation* of the past (through memories, actions, and objects) so that it may have relevance for the present and future.

Given that participants began each session with the identical set of building bricks, it might be expected that they would produce four unique models. After all, the number of LEGO bricks used for an average car model produces immense combinatorial possibilities. However, as the results show, this was not the case. Others might expect that, because participants' schemas about cars are so similar, the pairs would find it most efficient to follow a "status quo" bias (Kahneman et al., 1991), essentially producing the same model again and again. That expectation, too, is far from what the actual comparisons show: they exhibit stepwise progressions rather than simple repetition. The data revealed a surprising pattern in the selection of building bricks as well as features of car design across consecutive building sessions. The model in each later session demonstrated an increasing reliance on the model which immediately preceded it. Additionally, the very first model served an increasingly important role as a design template in each later session. As expressed by the cars themselves, each pair of participants seems to have consolidated their schematic representations of LEGO model cars, so that they became increasingly convinced of what a LEGO car "ought" to look like as they proceeded from one session to the next.

## **THE PERSISTENCE OF MEMORY**

When looking over the results, a set of stepwise progressions shows up across numerous measures. We identify these patterns as "path dependence" (David, 1985; Liebowitz and Margolis, 1995; Garud and Karnøe, 2001b) demonstrative of rapid conventionalization. Path dependence refers to the ability of influences from the past, usually near the beginning of a phenomenon, to strongly constrain aspects of its future. This often occurs even when the early conditions have little functional relevance for later conditions. Garud and Karnøe (2001a, 4) describe how "phenomena are sensitive to small differences in the underlying sequence of events" such that "a steady accumulation of small differences can result in the technological field locking onto a trajectory." In broad terms, path dependence exhibits the persistence of past states in future states and has often been discussed using the truism "history matters" (North, 1990, 100; David, 2001).

The notion of path dependence has been influential in economic theory, where scholars have often invoked it to explain inefficiencies that endure in spite of seemingly superior alternatives (Liebowitz and Margolis, 1995). Examples from technological history have played an important role in demonstrating the power of path dependence. David (1985) described how early models of typewriters required organizing the keyboard using the QWERTY layout that has dominated ever since. However, the mechanical reasons for implementing this format ceased to be relevant a short time later, as new mechanisms were introduced. And, of course, these mechanical constraints have no relevance for computer keyboards which use an entirely different implementation to link keystroke inputs to graphic outputs. The QWERTY layout has endured in spite of reasoned alternatives at the time and greatly superior alternatives at later times. In the 1930s, a pair of education professors by the name of Dvorak and Dealey developed a keyboard configuration that permits users to type much faster while also reducing errors and strain (see Noyes, 1983). Nevertheless, the ready availability of QWERTY typewriters ensured that the majority of typists would learn using this layout and the fact that the majority of typists continued to learn the QWERTY format ensured that manufacturers would continue producing such machines in greater and greater quantities over time.

This example highlights how path dependence relies on "positive feedback," the amplification of an effect by its influence on the processes which give rise to it. Many economic models would predict that, given a superior alternative to the QWERTY layout, rational consumers would select the superior format over the inferior one; this is not, in fact, what occurred. Mechanical constraints at an early stage of development necessitated a particular layout which has dominated ever since, in spite of better alternatives. According to adherents of the path dependence model, this suggests that history can trump powerful competing principles: "History then is the tool to understand what rationality and efficiency do not explain, that is, the random sequence of insignificant events that are not addressable by economic theory" (Liebowitz and Margolis, 1995, 17–18). As is evident in this case, as well as in many other instances of history (from the demise of the dinosaurs due to a stray meteor to the discovery of the American continents by sailors searching for a quicker route to Asia), *contingent* events, that is, events which might have transpired in some other way, often change things in ways that cannot be foreseen, even using the best scientific models at our disposal. In similar fashion, participants had tremendous freedom in developing their first car models, but the relatively arbitrary forms they settled upon exerted downstream influences on all their later models, an effect very much like path dependence (see **Figure 8**).
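The positive-feedback mechanism described above is often illustrated with a Pólya urn, a classic toy model of lock-in that was not part of this study. In the purely illustrative sketch below, two competing designs start with equal weight; at each step one design is adopted with probability equal to its current share, and one more unit of it is added, so early chance draws are amplified. Runs with different random seeds typically settle on quite different long-run shares, echoing the idea that contingent early events constrain later outcomes. The function name and parameters are hypothetical.

```python
import random


def polya_urn(steps, initial=(1, 1), seed=None):
    """Two-design Pólya urn: at each step, pick design 0 with probability
    equal to its current share, then add one unit of the picked design.
    Returns the final counts and design 0's share after every step."""
    rng = random.Random(seed)
    counts = list(initial)
    shares = []
    for _ in range(steps):
        pick = 0 if rng.random() < counts[0] / sum(counts) else 1
        counts[pick] += 1  # positive feedback: adoption makes further adoption likelier
        shares.append(counts[0] / sum(counts))
    return counts, shares


# Same rules, different histories: each seed tends to lock in to its own share.
for seed in range(3):
    counts, shares = polya_urn(1000, seed=seed)
    print(f"seed {seed}: share after 10 steps {shares[9]:.2f}, after 1000 {shares[-1]:.2f}")
```

With one unit added per step, design 0's share converges to a limit that is itself random, so nothing about the rules favors either design; only the early history decides.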

Relating these findings to evolutionary theory, Stephen Jay Gould's book, *Wonderful Life: The Burgess Shale and the Nature of History*, offers a provocative interpretive framework. In the book, Gould investigates the significance of the "Cambrian explosion," a geological period that began around 540 million years ago, for the theory of evolution. In just 60 million years, life went from a small variety of relatively simple organisms to a huge diversity of complex organisms; almost all animal phyla ("the fundamental ground plans of anatomy") developed in this period and very few new ones have come into being in the 500 million years afterward. The most remarkable finding, according to Gould, is not that this proliferation occurred, but that animals on Earth today evolved from only a fraction of those which existed during this prolific era. Instead of a continued diversification of organisms, as exhibited during the Cambrian explosion, a small sample from that time served as ancestors for all later life. Gould (1989, 47) notes that "the later history of life proceeded by elimination, not expansion. The current earth may hold more species than ever before, but most are iterations upon a few basic anatomical designs." Much impressed with the odd, often fantastic, anatomical varieties present in the Cambrian period (and preserved in the Burgess Shale), Gould (1989, 47) observes that "later history is a tale of restriction, as most of these early experiments succumb and life settles down to generating endless variants upon a few surviving models." Gould argues that once a basic form proves to be successful it begins to reproduce rapidly and its variations become increasingly subtle over time. There are fewer and fewer grand design changes of the sort that would revoke the "ground plans of its anatomy" and potentially lead to a new phylum. What is common to both the path dependence literature, particularly in relation to technology, and to evolutionary theory is that big innovations early on establish a path which all later members of the type follow. Whether it be the dominance of the QWERTY layout over and against novel keyboard layouts a short time later or the hegemony of a subset of phyla for more than 500 million years, a principle of evolution seems to be that basic forms established early on consolidate their hold and prevent interloper designs from entering their niche. This process reduces diversity of form but accelerates increasingly specific processes of optimization.

Just as Gould might have predicted, our results demonstrate the inordinate importance of the first car model for shaping later models. Although the experimental setup gave participants the conditions to produce four novel designs, the opposite in fact occurred. Reflecting on the evolutionary process, Gould (1989, 321) notes how "little quirks at the outset, occurring for no particular reason, unleash cascades of consequences that make a particular future seem inevitable in retrospect. But the slightest early nudge contacts a different groove, and history veers into another plausible channel, diverging continually from its original pathway. The end results are so different, the initial perturbation so apparently trivial." Rather than treating evolutionary processes as completely determining the nature and scope of life, he identifies "history as the chief determinant of life's directions" (1989, 288). Similarly, each pair's first car model, that first concatenation of arbitrary design decisions and brick selections, served as a design template for all later building sessions, which ended up as variations upon a theme. And just as with the distinctive phyla established during the Cambrian, car designs made by *different* pairs showed no convergence (see **Figure 7**). This seems to indicate that there were no constraints or attractors based on function or optimality that would cause all pairs to converge toward an "ideal" design. Instead, it is as if those arbitrary first designs established distinctive channels which, while running concurrently and in parallel, did not have any particular aim toward which they might evolve.

Stuart Kauffman (1995, 195), a theoretical biologist and complexity theorist, and Gould are in agreement regarding the general pattern of life since the Cambrian explosion, namely that once "species with a number of major body plans sprang into existence, this radical creativity slowed and then dwindled to slight tinkering. Evolution concentrated its sights closer to home, tinkering and adding filigree to its inventions." This reduction in basic diversity relates to the amplification of "conflicting constraints" as organisms become increasingly "locked in" to their fundamental anatomy (1995, 199–201) and as all evolving life becomes more and more competent for its niche so that interlopers face greater competition.

Kauffman (1995, 202) takes this "Cambrian pattern of diversification" even further, believing it to be exhibited in a wide range of complex phenomena, including technological evolution: "...given a fundamental innovation—gun, bicycle, car, airplane—it appears to be common to find a wide range of dramatic early experimentation with radically different forms, which branch further and then settle down to a few dominant lineages." To be clear, neither Gould nor Kauffman argues against the increase of *overall* diversity through evolutionary processes; both posit a reduction in the diversity of *basic* forms, which corresponds to the level of phyla in biological taxonomy. Subsequently, increased diversification happens at lower taxonomic ranks, particularly through speciation. Reviewing his juxtaposition of the Cambrian explosion with technological evolution, Kauffman (1995, 205) concludes: "the parallels are striking, and it seems worthwhile to consider seriously the possibility that the patterns of branching radiation in biological and technological evolution are governed by similar general laws...tissues and terra-cotta may indeed evolve in similar ways. General laws may govern the evolution of complex entities, whether they are works of nature or works of man." Kauffman's assertion that a Cambrian pattern of diversification may be applicable to technological evolution would seem to be exhibited in the results of this joint action study. The first car established something like a "phylum" which consolidated in each successive session. This pattern seemed to apply both to the LEGO bricks selected as well as the dominant color participants settled upon. The results seen in this study may exhibit larger dynamics of cultural evolution, a set of dynamics that fall in line with the phenomenon called path dependence. And while the warrant is tentative, similar dynamics may also shape complex phenomena as diverse as anatomical structures and the evolution of technology.

# **CONCLUSION**

Few would argue against Tomasello's description of the ratchet effect leading to "cumulative cultural complexity," but most would assume this to mean increasing diversification as time goes forward. The argument here is that the cumulative complexity of culture occurs in a subtle fashion: for any cultural innovation, experiments in basic form lead soon thereafter to processes of reduction and elimination as a dominant path is established. From that moment onward, increasingly small and gradual modifications reiterate the basics of the original form.

Given the results of this "proof of concept" study, it would seem that applying evolutionary theory to the study of culture is a generative exercise. This would seem to hold even though the phenomena in question, biological transformation over time and cultural transformation over time, operate on qualitatively different "kinds." Biology and culture are continuous, but they are clearly not the same thing; transformation over time, however, refers to a set of processes that may well apply to a wide range of phenomena. In pursuit of this idea, we have drawn on the notion of path dependence in our analysis of the products of joint action. A prominent pattern across many phenomena is a reduction in the diversity of basic forms over time. Based on these findings, it is reasonable to conclude that solutions to invariant tasks and challenges need not be endlessly novel, since endless novelty would drain energy and resources from other tasks and challenges; an earlier solution that has already proven satisfactory is the foundation upon which subtler optimizing processes can set to work. An additional reduction in variability derives from shared schemas that facilitate intersubjective as well as interobjective coordination. The possession and use of schemas means that we approach a task with many ideas about the world shared in common. Even though these ideas greatly constrain potential variability, their usefulness in promoting coordination enhances overall efficiency. As present and subsequent experience can be made to "more or less" accommodate prior expectations, and update those expectations, the adjustments necessary to succeed in the present are greatly reduced in both time and complexity.

## **ACKNOWLEDGMENTS**

We thank L. Seitzberg, C. Larsen, and J. Aaris for their assistance with the study and R. Fusaroli for reviewing an early draft of the paper. Additionally, our study would not have been possible without B. Thomsen and T. Sørensen of the LEGO Foundation. This work was supported by the Marie Curie Initial Training Network, "TESIS: Towards an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 26 August 2014; published online: 12 September 2014.*

*Citation: McGraw JJ, Wallot S, Mitkidis P and Roepstorff A (2014) Culture's building blocks: investigating cultural evolution in a LEGO construction task. Front. Psychol. 5:1017. doi: 10.3389/fpsyg.2014.01017*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 McGraw, Wallot, Mitkidis and Roepstorff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Instituting interaction: normative transformations in human communicative practices

# *John Z. Elias<sup>1</sup>\* and Kristian Tylén<sup>2,3</sup>*

<sup>1</sup> Department of Philosophy, School of Humanities, University of Hertfordshire, Hatfield, UK

<sup>2</sup> Department of Aesthetics and Communication, Center for Semiotics, Aarhus University, Aarhus, Denmark

<sup>3</sup> Interacting Minds Centre, Aarhus University, Aarhus, Denmark

## *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Charles Lenay, Université de Technologie de Compiègne, France

Carol A. Fowler, Yale University, USA (retired)

## *\*Correspondence:*

John Z. Elias, Department of Philosophy, School of Humanities, University of Hertfordshire, Hatfield AL10 9AB, UK e-mail: john.z.elias@gmail.com

Recent experiments in semiotics and linguistics demonstrate that groups tend to converge on a common set of signs or terms in response to presented problems, experiments which potentially bear on the emergence and establishment of institutional interactions. Taken together, these studies indicate a spectrum, ranging from the spontaneous convergence of communicative practices to their eventual conventionalization, a process which might be described as an implicit institutionalization of those practices. However, the emergence of such convergence and conventionalization does not in itself constitute an institution, in the strict sense of a social organization partly created and governed by explicit rules. A further step toward institutions proper may occur when others are instructed about a task. That is, given task situations which select for successful practices, instructions about such situations make explicit what was tacit practice, instructions which can then be followed correctly or incorrectly. This transition gives rise to the normative distinction between conditions of success versus conditions of correctness, a distinction which will be explored and complicated in the course of this paper. Using these experiments as a basis, then, the emergence of institutions will be characterized in evolutionary and normative terms, beginning with our adaptive responses to the selective pressures of certain situational environments, and continuing with our capacity to then shape, constrain, and institute those environments to further refine and streamline our problem-solving activity.

**Keywords: experimental semiotics, normativity, conventionalization, communicative practice, institutionalization**

# **INTRODUCTION**

Institutions, understood as societal structures constituted and governed, at least in part, by explicit rules, presuppose a language in which such rules can be formulated and expressed (Searle, 2005, 2010). This point alone indicates an intimate interrelation between our institutional and linguistic activities. Yet this dependency on language might tempt us to picture institutions as somehow magically created by declarative speech acts, conjured, as it were, through the incantations of performative utterances. Such a picture obscures the fact that, prior to the formal declaration of an institution's existence and the explicit articulation of its structures and functions, various practices, customs, conventions, traditions, etc., comprise the relevant activity that undergoes institutionalization. This development is not a matter of mere historical accrual, but a dynamic process necessary for the evolution of viable institutions. Understanding the emergence of institutions from tacit and fluid practices and processes entails disentangling the interplay between the informal and the formal, the implicit and the explicit, an interplay centrally involving the use of language in different roles and forms.

Recent studies in semiotics and linguistics offer pertinent insights into the coordinative and organizing power of language (e.g., Garrod et al., 2007; Mills, 2013). A broadly evolutionary framework guides much of this work, with semiotic and linguistic communication conceived as adapting to environmental conditions. These experiments demonstrate that interacting participants, jointly solving a problem, often in the guise of a game, are acutely sensitive to the selective pressures of the situation at hand, converging on common communicative practices and vocabularies without explicit deliberation or decision concerning these practices (Garrod and Doherty, 1994; Fay et al., 2008; Mills, 2011). Participants produce manifold communicative forms in response to the demands of the task, with the task in turn exerting selective pressure on those forms, leading, if successful, to the survival of those most functionally suited to the problem situation. Thus a particular situational environment, defined by a particular problem or set of problems, calls forth and selects for communicative practices fit for that situation. This basic dynamic of fecund generation of communicative forms and their functional selection may be viewed as an engine of specialization, spurring and honing the specialized vocabularies characteristic of specific disciplines.

These themes will be expanded in what follows. We will begin with a review of relevant experimental work on the evolution of communicative systems and signs, with special focus on the *optimally interacting minds* experiments (Bahrami et al., 2010; Fusaroli et al., 2012), which provide an especially promising experimental paradigm for exploring the role of language in the formation of institutions. Situating ourselves within this field of research, we treat these experiments as a kind of laboratory for larger considerations concerning communication and coordinative activity. Specifically, we claim that linguistic interaction within these situations is more continuous with technique and action than with propositional representation. This in turn will require clarifying the notion of *situation* as it operates in these experimental settings, which, as a corollary, will involve placing critical pressure on the idea of *situation models* and asking whether the ecological concept of *affordances* might better explicate the dynamics of joint action within the constraints of situations (Knoblich and Sebanz, 2008).

The roles of *convention* and *instruction* in processes of institutionalization will then be taken up, as fulcrums enabling the transition from conditions of *success* to conditions of *correctness*, thereby tracing the emergence of institutions in terms of the transformation of our *normative* engagements. We start with the poles of spontaneous coordination and explicit instruction, which provide a stark way of sketching these normative distinctions. Between these poles, however, lies a continuum involving *convention* and *conventionalization* of communicative practices. Indeed, the implicit conventionalization of technique, of ways of going about and accomplishing tasks, points to the establishment of standards of correctness independent of explicit declaration and decree (Mills, 2011). A spectrum is thus charted, stretching from the convergence of communicative practices, driven and determined by conditions of success, to the development of convention, involving emerging norms of implicit correctness, to the articulation of instructions, which, for the purposes of this paper, defines a kind of endpoint of explicitly stated standards of correctness. These normative considerations, we emphasize throughout, are inextricably bound up with differences in linguistic interaction.

Undergirding the discussion, running through it as a theme, is a *functionalist* conception of language as acutely adaptive communicative activity (Tylén et al., 2010). More generally, this experimental work exemplifies the dynamics of natural languages as living, evolving systems, teeming in their multifaceted applications, their various uses and forms, with certain terms and turns of phrase in turn selected for use in specific situations, leading to the development of adaptive vocabularies fit for particular purposes, and, eventually, to the specialized discourses of distinct disciplines and institutions. What arises, then, is a view of communicative activity as environmentally and normatively sensitive, with selective pressures comprising situations within which communication may be taken as functional or successful. And with the gradual development of convention, and the eventual introduction of instruction, situations become structured according to standards of *correctness.* Institutionalization, then, is defined by the normative move from *selection* to *sanction* of actions within increasingly intentionally informed environments. This approach to institutions is consonant with recent turns in the cognitive sciences (Clark, 2006; Rowlands, 2010; Hutto and Myin, 2012) in which cognitive capacities are conceived as fundamentally environment-involving, as copings and engagements within the constraints of various environments; as such, this paper is an attempt to apply these concepts to the processes and dynamics of institutions (Gallagher and Crisafi, 2009; Gallagher, 2013). From this perspective, much of our large-scale social cognitive activity may be viewed as the deliberate shaping of situations and environments, with the aim of guiding and cultivating the activities occurring within them. In shaping and constraining our environments, we shape and constrain our activities and ourselves.

# **SETTING THE SCENE: EXAMPLES AND ELUCIDATIONS FROM EXPERIMENTAL SEMIOTICS**

Recent studies in experimental semiotics have investigated the evolutionary aspects of semiotic and linguistic communication (Galantucci, 2009; Galantucci and Garrod, 2010). However, empirical investigation of the evolution of natural languages is inherently problematic, as their evolutionary origins are either difficult to ascertain or completely inaccessible, and certainly not available for experimental manipulation. One way experimental semioticians circumvent this problem is by having participants communicate in graphical media without recourse to conventional linguistic symbols (Galantucci, 2005; Healey et al., 2007; Dale et al., 2011), for instance in scenarios similar to the game Pictionary (Garrod et al., 2007; Fay et al., 2008, 2010). These constraints compel participants to create symbols from scratch, thereby setting up conditions in which the evolution of sign and symbol systems can be observed and analyzed.

In a representative experiment, Garrod et al. (2007) had subjects play a game in which they constructed graphical signs for a pre-established set of items; the game proceeded through several rounds in which players played in pairs, in alternating roles of drawer and identifier. In conditions allowing for interactional feedback, participants produced articulated signs based mainly on iconic resemblance to the referred items. However, through an evolutionary process the signs tended to become simplified and streamlined, reflecting a reduction in their iconic or pictorial character. For instance, in a case from a similar experiment (Fay et al., 2008), the graphical representation for "parliament," which began as a drawing of a chamber with circular benches and stick figures facing one another, ended as an abstraction of two partial curves with a single small circle in between (Fay et al., 2008, p. 3554). While a residuum of iconicity remained, the representation was no longer identifiable by its iconic resemblance to its referent, and would strike a naive newcomer as completely arbitrary. What appears to have happened is that the reference of each use of the sign became its prior use or tokening: the gradually streamlined sign no longer referred to the concept "parliament" directly through resemblance, but rather to previous episodes of successful communication in the history of the sign's use; i.e., the abstracted partial curves referred to, and reminded recipients of, the more complex representations that had occurred before. In other words, a stepwise process of incremental simplification occurred through repeated use, with each increasingly reduced instance linked to its predecessor, resulting in the distillation of an optimally efficient form.

Congruent observations have been made concerning natural languages (Millikan, 2005), supporting the relevance of these experiments to the workings of language at large. And while iconicity in verbal language may be less obvious, recent studies have made the case for its prevalence (Perniss et al., 2010), suggesting a similar tradeoff between complexly iconic and more simplified forms dependent on tacit social coordination and negotiation. Again, this speaks to the living and evolving nature of languages, which change as they unfold and adapt in space and time. Indeed, these experiments offer something of an artificial window onto possible mechanisms underlying the origins of language, the conditions under which words are forged and formed, and in which they must *succeed* if they are to *survive*. Furthermore, as inherently historical phenomena, words do not simply "pick out" their referents in abstract and static one-to-one referential relations, but rather mean what they do through a temporal process of reliable reproduction and use, grounded in the common knowledge that others in the community are participants in that history. In the experiment above, for example, the simplification of signifiers ensues precisely because participants trust that interlocutors will have encountered something sufficiently like the sign in the past, such that they will recognize the shorthand version on offer. Of course, community members do not need to be familiar with all the historical details of a sign's use: what matters is that the signs in currency are recognizably rooted in the history of the community in question. That said, the historical and communal determination of a word's or sign's meaning does not restrict its use to a predefined community, for the community in question extends to anyone who encounters and learns its use through interactions with other members. The historical trajectory of a community's interactions, and the conventions, expectations and potential fixities that inhere therein, will be considered further in the course of the paper.

In light of this applicability to language more broadly, relevant experiments are not limited to graphical signs and symbols, but also demonstrate the adaptation of natural language under controlled conditions. The aforementioned *Optimally Interacting Minds* experimental paradigm explores the evolution of ordinary verbal language within the constraints of a task situation. Through a series of trials, two people perform a visual discrimination task individually; they do so in the same room, each at their own separate computer. As long as they offer congruent answers (whether right or wrong), they simply proceed to the subsequent trial. However, if they give divergent answers, they are prompted to verbally negotiate their joint decision; their linguistic interactions are subsequently analyzed in relation to their performance on the task (see **Figure 1** for a schematic of the *Optimally Interacting Minds* experimental setup). The task thus requires dyad members to determine, on a trial-by-trial basis, who had the more vivid experience of the visual stimulus contrast and to submit that person's decision as their joint answer. Results show that well-performing dyads converged on a common, stable set of terms to communicate confidence, a kind of scale of verbal expressions allowing dyad members to compare their individual levels of confidence. Importantly, general linguistic alignment (that is, the indiscriminate repetition and reinforcement of linguistic forms) failed to correlate positively with performance. Rather, it was the alignment of terms functionally relevant to the task at hand (in this case, terms conducive to the communication of confidence in discussions of incongruent answers) that was predictive of performance, pointing to the strong context-dependence of linguistic coordination (Fusaroli et al., 2012).
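The trial logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure or analysis used by Bahrami et al. (2010) or Fusaroli et al. (2012): the `Report` type, the integer encoding of decisions, the numeric `confidence` value, and the tie-breaking rule are all hypothetical. In the actual paradigm confidence was conveyed in words, and it is precisely the emergence of a shared verbal scale that the studies examine; the sketch shows only the decision rule that such a scale makes possible.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """One participant's individual response on a single trial (hypothetical encoding)."""
    decision: int      # e.g. 1 or 2: which stimulus the participant judged to contain the target
    confidence: float  # hypothetical numeric stand-in for verbally expressed confidence


def joint_decision(a: Report, b: Report) -> int:
    """Return the dyad's joint answer for one trial.

    Congruent answers (right or wrong) pass straight through to the next trial.
    Divergent answers are resolved by deferring to the more confident member,
    a simplified stand-in for the verbal negotiation in the paradigm.
    """
    if a.decision == b.decision:
        return a.decision
    # Hypothetical tie-break: on equal confidence, member a's answer is kept.
    return a.decision if a.confidence >= b.confidence else b.decision


if __name__ == "__main__":
    print(joint_decision(Report(1, 0.4), Report(2, 0.9)))  # divergent: defers to the more confident member
    print(joint_decision(Report(1, 0.2), Report(1, 0.9)))  # congruent: passes through
```

The rule only works as well as the two confidence values are comparable; in this sketch comparability is assumed by fiat, whereas the dyads in the experiments had to build it through converging vocabulary.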

In this process of converging on a common communicative practice, the seeds of an implicit *institutionalization* of a particular approach to solving the presented problem may be seen. That is, dyads tacitly *instituted* linguistic practices enabling them to function better as a problem-solving system. However, the emergence of such convergence does not in itself constitute an *institution*, in the stricter sense of a social entity in part created and governed by explicit rules. While language in this experiment plays a crucial role in the problem-solving activity, it does not function in the capacity required for the establishment of institutions proper, i.e., by explicit representation or declaration of rules which can be either obeyed or broken. Rather, the use of language here is more akin to actions taken *in* the course of a situation, as opposed to representations *of* a situation.

This distinction may be elaborated by the following contrast. Again, participants in the above experiment communicate their confidence in their answers in a simple perceptual discrimination task; such communication drives their decision-making in direct response to the situation itself, and hence inextricably occurs within the immediate context of that situation. Successful communicative practice (here, convergence on a consistent set of terms to convey confidence) is forged under the selective pressure of the task at hand: well-performing pairs arrive at a means of communicating that *works*, that meets the demands of the situation and affords successful coping within that situation. Yet one might imagine successful pairs informing prospective participants *about* the task they faced, the problems they had to address and solve, and the ways they went about doing so. And perhaps they might proceed to *instruct* future subjects in how to go about responding to this situation, or to situations very much like it. This kind of communication would occur outside the immediate pressures of the task itself; the task is no longer directly responded to but *represented*, described, to others. With representations of the situation, and representations of how to act in the situation, potential participants would now have something to *conform to*, namely depictions of how to complete the task in a particular way, and something to *comply with*, namely the instructors' intention that they complete the task in accordance with those depictions. Yet the distinction between *descriptions of* and *prescriptions for* actions must be kept in mind; while the exact contours of the move from the one to the other may vary from case to case, some general considerations will be sketched in later sections, including the human propensity toward imitation (Horner and Whiten, 2005).

This transition from transient, emergent coordinative activity to instructions about that activity can serve as an entryway into instituted practices proper. Given situations which select for successful communicative practices, instructions about such situations make explicit what was tacit practice, instructions which can then be followed *correctly* or *incorrectly*. Whereas practices that evolve in response to the selective pressures of a task may *fail* or *succeed* in relation to the task, instructions create conditions in which correct and incorrect actions are possible. The communicative practices of well-performing dyads in the *optimally interacting minds* paradigm can be deemed relatively successful or efficacious, but strictly speaking cannot be considered correct or incorrect, for no standards of correctness yet exist concerning those practices; they are not right or wrong *per se* but more or less functional with regard to solving the problem at hand. Instructions, however, introduce standards of correct practice and action by explicitly representing those practices and actions, and thus give rise to a distinction between conditions of *success* and conditions of *correctness*. However, this stark contrast between implicit practice and explicit instruction, while illustrative here at the outset, belies a more continuous picture involving the gradual conventionalization of communicative practice, in which conditions of correctness come into play prior to, and independently of, the introduction of instruction (e.g., Healey, 2008).

These distinctions, of course, remain coarsely sketched at the moment; indeed, the road from tacit habit and practice to explicit institution is a crooked and complicated one (e.g., Fleetwood, 2008), and will be treated more thoroughly in what follows. Moreover, there are certainly cases in which institutional contexts themselves provide the conditions for the emergence of spontaneously responsive practices and actions, and so it would be a mistake to suggest that the trajectory is necessarily unidirectional. However, the focus here is not on how that path happens to proceed in particular cases, but rather on the basic conditions required for the emergence of institutional structures. Suffice it to say at this stage that if one were to take these experimental paradigms as representing some recurring and prevalent set of circumstances, a task or problem situation that people encounter with sufficient regularity and urgency, then it may be fruitfully treated as a kind of microcosm of specialization and, if extrapolated further, of institutionalization (Healey, 1997).

Having reviewed some representative examples from this realm of research, we will now proceed to unpack these preliminary observations, and take a closer look at particular implications. In the next section we explicate the notion of *situation* as it applies to these experiments, and examine the role of communication and language therein, a role grounded in the coordination of joint action as opposed to propositional representation. These considerations concerning language will serve to set up what follows, as we address the development of coordination, convention, and, eventually, instruction, in the emergence of institutional interactions, further elucidating the normative distinction between conditions of *success* versus those of *correctness*.

# **COMMUNICATION UNDER SELECTIVE PRESSURES: SITUATIONS AND AFFORDANCES**

Since the concept of *situation* plays a number of different roles in a variety of domains, we should take stock of the term as it has operated in the discussion thus far. As a start, a situation may be described as a set of circumstances, driven and informed by specific human demands and goals, which in turn exerts pressure on actions performed in accord with those demands and goals. So a situation, in this sense, is at once constituted by human actions and feeds back onto them, is both determined by and determining of those actions. A situation, then, may be provisionally defined as a humanly comprised selecting environment, within which actions may succeed or fail to meet the needs or demands fueling the unfolding of the situation; actions that succeed are selected for, while those that fail are selected out. Of course this is something of an idealization: failed and failing actions often persist despite their repeated failure, for various reasons. For current purposes, however, the idea of a situation as, in principle, determining conditions of success or failure will serve to set the stage for what follows.

In the experiments presented above, communication occurs precisely under such pressing and pressured conditions. In these contexts, the situation is constituted by particular tasks or problems, which participants attempt to address or solve in the course of their activity. Communication here serves to coordinate the joint decision-making of the participants, driving and shaping their actions as the situation unfolds in time. Put more strongly, the communication might be said to *comprise* the situation as a kind of cohering glue, coordinating the participants and partly constituting their joint activity (Demichelis and Weibull, 2008). That is, communication may be conceived as continuous with actions taken within the currently occurring situation, as actions subject to conditions of success or failure, as opposed to propositions characterized by conditions of truth or falsity. While we will not attempt to conclusively argue this point here, in this section we suggest ways in which language may be operating in these settings, as opposed to simply assuming and imposing a reflective, representational conceptualization.

Returning to the *optimally interacting minds* paradigm in particular will help ground some of these thoughts. Recall that participants adjust and attune their confidence by means of linguistic interaction in order to arrive at a shared decision. Here linguistic communication may be understood in terms of the sharing of information, affording access and coupling to the perspective, or experience, of one another (Fusaroli et al., 2014a,b). Thus communication may be viewed as a function of the flow of information through the decision-making system, as an aspect of dynamic informational attunement to the situation. Again, this would be opposed to a view of language as composed of propositional statements analyzable in terms of direction of fit to the world (Price, 2013). Rather, the use of language in this scenario is more aptly conceived in terms of *coping*, employed in direct engagement with a situation, in contrast to a conception of language as somehow standing outside the pressures of a situation, where participants have the space to model or represent the situation independently and to manipulate and control that model.

From this point of view, the status of *situation models*, defined as multidimensional representations of currently unfolding situations (Zwaan and Radvansky, 1998), may be called into question. Such models are often conceived as internal cognitive representations belonging to individuals, and therefore the job of communication is to coordinate and align the distinct situation models of the individuals involved. Indeed, Pickering and Garrod (2004) in their influential account state that linguistic alignment on multiple levels of representation leads to the alignment of situation models. So it seems as if these models are first private and must come to be shared, rather than public and shared from the start. However, Fusaroli et al. (2014a) proceed to question the notion of situation models, understood as internal representations which are aligned by linguistic interaction. Furthermore, communication often is not a matter of simple alignment or matching but rather the achievement of complementary roles and contributions in the course of interaction.

Here we suggest two critical replies to the notion of situation models. On the one hand, language itself may constitute the situation model: rather than merely facilitating the sharing and alignment of internal representations or models, the linguistic interaction, the engagement with the public symbols and artifacts of language, may itself count as the construction and manipulation of a model of the situation (Clark, 1997). On this account, the model, or modeling process, is shared from the start, jointly attended to and co-constructed in the course of communication within the situation. On the other hand, the situation itself, to paraphrase roboticist Rodney Brooks, can simply serve as its own best model (Brooks, 1990). While parts or aspects of the present situation may be modeled or represented, the situation as a whole need not be: the situation is simply *there*, to be attended to and engaged with. From this perspective, linguistic activity serves to guide and direct attention and action in the course of unfolding situations (Richardson and Dale, 2005).

If the situation is directly engaged with during joint activity, without the mediation of situation models or representations guiding that activity, then the situation itself must in some sense be able to direct and constrain that activity. The ecological notion of *affordances* seems a good candidate to account for this, though the term is often subject to loose and various applications. Affordances in the original Gibsonian sense (Gibson, 1977) are functional relations between an organism and the environment it encounters, and hence do not need to be represented and imposed upon the environment. Objects in the environment are perceived in terms of the abilities of an organism to interact with those objects (Greeno, 1994). Thus a pen is perceived by a grown adult in terms of fine motor control by the fingertips; an infant who has not yet acquired such fine motor control will not perceive the pen in those terms, but would perhaps instead perceive it as something to grab with the fist and place in its mouth. Affordances, then, depend on the abilities of the perceiver, and those abilities may be in various states of development and transition, with blurry boundaries in between. Consequently, the line between what can or cannot be done with an object may be vague and subject to change; affordances are dynamic in relation to a perceiver's abilities.

Furthermore, while affordances may be understood in this fairly restricted sense of direct bodily engagement with objects (e.g., an object as being graspable in a certain way), the concept is also often applied to possibilities for action more broadly. Here again the line may not be absolutely clear: one perceives a cup as affording *drinking from* because one perceives it as affording *being grasped* in a certain way, a grasp which itself only takes shape in the course of a goal-oriented action such as drinking (Garbarini and Adenzato, 2004). Moreover, there is the question of extending the concept to situations more broadly (Chemero, 2003). That is, given the multiple constraints of a particular situation, can the situation be said to *afford* certain possibilities for action? If so, then normative considerations will have to enter in, as the range of possible actions within a situation depends not only on the abilities of the actors and the physical features of the objects at hand, but also on a sense of what actions *ought* to be taken given the situation, as well as which courses of action are better than others. The issue remains, however, whether the possibilities for action themselves need to be represented or modeled in some sense, and if so, how such models might be conceived. So it may be said that the notion of affordances affords a range of applications; yet determining when a concept is being extended, or an ambiguity exploited, can be a difficult matter. A more extended exploration of affordances, however, is beyond the scope of this paper; for the moment we may say that the concept offers a possible alternative to the prevalent notion of situation models, and furthermore may motivate a non-representational conception of communication, i.e., a conception of communication as a form of joint action facilitated by the affordances of unfolding situations, rather than necessarily dependent upon or bound up with representations of those situations (Hodges and Fowler, 2010).

As the concept of affordances predominantly applies to interactions with physical objects and artifacts such as tools, it is worth exploring its potential applications to social and symbolic artifacts such as language. Firstly, with a physical tool, an individual in isolation can, in principle, learn and effectively use the tool; the presence of other people is not, logically speaking, required (an individual may of course encounter physical limitations in attempting to perform a task alone, but that is a separate matter). A single person can rely on and respond to the affordances of the tool (the fact that it is graspable and manipulable in this or that way) and exploit its physical features in interacting with the environment, as when a sharp stick affords throwing while hunting (Heft, 1989). And the standard by which the use of the tool may be deemed successful or not is the intention of the tool user herself: what the user intends to do or accomplish with the tool. A person might intend to use a flathead screwdriver to pry open a jar, and may succeed or fail in the act depending on whether the tool is fit for the task. But while a particular individual may set her own standards in the use of a tool, the same may not be said for the use of language, for the success of communication depends on whether or not one is understood by others (Davidson, 1992). The standards for successful communication are set not individually but communally (Wittgenstein, 1953/2001). So insofar as it makes sense to speak of *affordances* with regard to language, of the possibilities for action that certain words in certain situations *afford*, a social dimension must necessarily be included. The communication occurring in the above experiments, for example, takes place within the context of joint activity coordinated by common goals, so actors may fulfill or fail the intentions of others as well as their own.
Yet there is a subtle but significant distinction between *failing with others*, that is, failing jointly with another, and *failing others*. The latter perhaps implies a power relation of some sort, or at least a distinct stance toward the activity in question, in which the intention of the other must be *complied* with.

In the following section these normative considerations will be elaborated, in terms of the distinction between *conditions of success* and *conditions of correctness*, with uses of language serving as a shifting hinge from one to the other. Our aim in this section, meanwhile, has been to suggest a view of language use under pressured situational constraints, as an alternative to a thoroughly propositional conception. While we don't pretend to have presented a full account, we have offered possibilities in terms of affordances and joint action, with linguistic communication affording informational coupling and coordination within a dynamically interacting system. A declarative, propositional picture of language may appropriately apply, however, in cases of explicit instruction and compliance therewith. These different uses of language reflect different ways of relating and interacting within and between situations, differences we explore in the remainder of the paper.

## **NORMATIVE DISTINCTIONS AND DISCRIMINATIONS: CONVENTION, INSTRUCTION AND INSTITUTION**

In previous sections we have introduced a normative dichotomy between *conditions of success* and *conditions of correctness*. In this section we further specify these distinctions, and complicate the discussion with consideration of processes of convention and conventionalization. We set the stage with a brief recap of the *optimally interacting minds* paradigm, to help ground what follows in a specific concrete case. Recall that pairs that performed better on the visual discrimination task tended to converge on a common vocabulary to communicate relative confidence in their answers: some pairs converged on a confidence scale composed of visual terms, as in "I think I saw" and "I did not see anything," while others converged on terms of sureness, as in "I'm almost sure" and "I'm absolutely sure" (Fusaroli et al., 2012, p. 4). Ultimately the type of scale used did not matter, as long as the partners came to tacitly share a consistent practice of communicating their levels of confidence in negotiating a joint response. Thus the demands of the task exerted pressure on participants to communicate in a way that enabled them to fulfill those demands. That is, the experimental setup constituted a selecting environment, comprising a situation defined by conditions of relative success and failure, driving the evolution of actions and practices in accordance with those conditions. Given this situation, participants came to develop a viable vocabulary, specifically honed to cope with the task at hand. Again, here as well as in the other experiments mentioned, feedback and interaction were crucial to the development of these convergent patterns, which arose from the dynamics of the interaction over time, rather than from the explicit intentions of the individuals involved (Fusaroli et al., 2014b).

It should be made clear, however, that what is in question is the normative character of the practices themselves, the normativity *internal* to the practices. This point is important since participants in a paradigm like *optimally interacting minds* receive feedback from the experimental setup as to whether their replies are correct or incorrect. Yet this is a matter of the reinforcement provided by the environment, and hence is *external* to the participants' practices under those conditions, however much those practices develop in response to that environment. That is, regardless of how the environment feeds back onto and constrains their actions, the actions themselves cannot, under the circumstances, be deemed correct or incorrect, but only more or less successful in adjusting to that feedback and meeting the demands of the task; there are as yet no standards of correctness in place, and so no way to say that *this* and not *that* particular practice is correct. In other words, while correctness of *outcome* may be said to be in place, in the sense of the right aim or end to be achieved, correctness of *practice* is not yet in question. At this point, the means to be taken remain open, as long as the end is achieved: given a goal, whatever methods or tools may bring about that goal are acceptable. In this sense the confidence scales arrived at in the *optimally interacting minds* paradigm are tool-like, in that it does not matter which type of scale (e.g., whether in the vocabulary of "vision" or "sureness") is used, as long as it *works* to meet the same end; thus the scales demonstrate the detachability of means from ends characteristic of purely tool-like or instrumental relations.

However, the normative status of this coordinative practice, its basic instrumental character in terms of pure conditions of success or efficiency, may quickly become transformed in the course of development. We've noted that arriving at a consistent communicative practice is crucial for successful performance. It is then perhaps a short step from this convergence of coordinative practice to the eventual *routinization* and *conventionalization* of such practice. Though a specific procedure may not be explicitly established, procedural routines may emerge and establish themselves in the course of repeated and continual interaction, procedures that may be diverged from or *violated*, and recognized as having been so violated. So it may be that the idea of a starting stage in which there is, strictly speaking, no right or wrong way of going about, where it is purely a matter of whatever works in the context of the task, may be something of an idealization, or at the very least a highly transient phase which undergoes rapid transformation. In other words, though these normative distinctions are conceptually extricable, in the course of actual practice they may well blur together from the very beginning.

A relevant example is found in an experiment by Garrod and Doherty (1994). Here participants jointly navigated a set of mazes either in "isolated pairs" playing together through repeated trials or in "speech communities" where participants would change partner from trial to trial within a closed community. The task was constructed in such a way that participants had to give each other directions and indicate positions in the mazes. They thus had to converge on ways of linguistically referring to positions and routes. Initially participants would generally rely on quite concrete ways of talking about positions in the maze, for instance by reference to the mazes' figurative properties or by describing the route one would need to take to reach a critical position. However, in the course of the experiment some participant pairs would evolve more abstract coordinate systems (e.g., the chesslike matrix system of specifying a column and row index such as A1 or 3.4) that, once established, proved very effective and transferred well between differently shaped mazes. Again, not unlike the reduction of iconicity in previous examples, this development seems to proceed from reliance on the concrete instantiation of the single maze toward a more abstract scheme applicable to all mazes despite their individual shapes and differences. Interestingly, speech communities were more inclined than isolated pairs to converge on this more effective strategy. Furthermore, and of particular relevance here, in community groups the matrix scheme tended to become conventionalized: it was applied even in cases that lent themselves to a more figurative strategy (see also Tylén et al., 2013). That is, even in situations in which a figurative approach would have provided an easier means of reference and direction, the more abstract matrix scheme was nevertheless adhered to.
While, in early trials, adaptation to the concrete perceptual stimulus is driving joint linguistic behavior, in later trials the gradual establishment of shared "procedures" comes to override local stimulus affordances: a practice, arising from and rooted in a history of communal interaction, comes to be entrenched and imposed upon the current situation. Still, at this stage, this gradual process of conventionalization proceeds implicitly and only becomes apparent to participants if violated.

This latter point is especially evident in an experiment investigating the development of procedural conventions in a coordination task involving the arrangement of actions and utterances in a certain order (Mills, 2011). Again, participants were organized into small communities, though in this case they communicated by means of a text chat tool. While the referential aspects of the task were made trivial, the experimental situation afforded the evolution of procedures for how and when to share information and coordinate actions. Each participant started with their own list of words, which was not viewable by others. The task was then to submit words from their own list in the formation of one shared alphabetically ordered list. However, participants also could not view their partner's submissions. Thus participants had both to communicate their words and to converge on procedures for informing each other which words had already been submitted, and when it was the other's turn to submit a word. In the course of the experiment, communication within groups grew increasingly arbitrary and rarefied, with progressively abbreviated utterances positioned at highly specific points in the interactions, their meanings, again, determined by the particular histories of those particular communities. These conventional patterns were built up without explicit agreement, emerging from what allowed for successful completion of the task, solutions which were then repeatedly taken up, refined, and rendered more efficient.

However, these patterns became explicitly apparent in a critical last trial, in which, unbeknownst to the participants, the chat tool paired members from half of the communities with partners from different communities: suddenly participants found that all the subtle, tacit routines that had evolved with their partners through the course of the preceding trials were violated, bringing those routines into explicit attention. The manipulation yielded a dramatic drop in performance, and participants performed significantly more self-corrections. These observations point to a kind of intermediate stage on the path toward fully instituted practices: despite the highly implicit nature of the interactive procedures, the reactions to violation indicated an emerging normative dimension. For example, one pair of participants may have established a routine in which they would trade turns by indicating the next item for their partner to submit. Meanwhile, another pair may have evolved a tacit routine in which participants would inform their partner which item they had just submitted themselves. When, in that critical last trial, participants relying on such different procedures are unknowingly brought together, their procedures break down, revealing their emergent normative character (see **Table 1** for a transcript example from Mills, 2011). For instance, in that transcript example, Participant 2 is expecting to be told which item Participant 3 has just submitted; instead Participant 3 is naming an item that is on Participant 2's list, thereby following a very different routine. Participant 2's reaction in line 5 indicates that the breakdown of collective routine is experienced not primarily as *unsuccessful* in meeting task demands, but as *wrong* in a socially normative sense. That is, the exchanges spoke to the violation of norms of interaction, and not merely a struggle with an unfamiliar vocabulary.

Cases in which convention overrides local considerations of efficiency and functionality demonstrate the dissociation of conditions of success and functionality from those of correctness. In other words, they indicate the implicit establishment of a certain way of doing things that is not treated merely as an instrumental means to some end. This conventionalization of practice, therefore, opens space for a tacit, emergent sense of *correctness* independent of explicit instruction. It appears that the historical momentum of social interaction to some extent takes precedence over immediate considerations of efficiency and functionality. Of course, in these situations social interaction is itself a significant factor, specifically in terms of the mutual expectations of community members. So while this fixity or conformity of practice may seem in certain specific cases to be inefficient or even detrimental, it may prove functional overall to the extent that sociality itself becomes a major factor in the problem-solving system. That is, there may exist a trade-off between immediate instrumental efficiency and the historical entrenchment of social expectations. Thus the satisfaction of expectations in the course of an activity may override considerations of local affordances.

A revealing ambiguity in the term *expectation* is perhaps worth noting here. On the one hand, it may reflect a neutral attitude toward likelihood or probability, as in *I expect it will rain this afternoon*. This sense is evident in the contrast between *desiring* and *expecting* something to be the case. On the other hand, it can be used to express an evaluative attitude, as with the *expectations* one may hold for oneself. The question, then, is whether the relevant social expectations are to be understood in terms of adjustments to statistical regularities, to what is likely to happen given particular conditions, or in the normative sense of what *should* happen, of how others *ought* to act in particular situations. Though these senses of expectation may be conceptually discriminated, they may effectively become blurred in the course of actual interaction. We raise this point not to resolve it here, but to suggest that human dispositions toward social adherence and cohesion may play a role in infusing expectations with normative force (Miceli and Castelfranchi, 2002).

For example, the human proclivity to imitate may be a factor in the establishment and normalization of means and procedures. It is well established that human infants, in contrast to chimpanzees, faithfully imitate the observed actions of another even if some of those actions are manifestly not required for the completion of the demonstrated task (Horner and Whiten, 2005). Whereas chimpanzees disregard irrelevant actions for the sake of efficiency, human children imitate in full despite the cost in efficiency. This tendency to over-imitate, which may have evolved as a channel for the transmission of cultural knowledge (McGuigan et al., 2011), may drive the move from a merely instrumental relation of means to ends to a more conventionalized determination of means and methods. This is one of many cognitive biases that imbue observed behavior with a normative status (Csibra and Gergely, 2009), skewing human development toward conventionality and heightened sensitivity to norms.

In contrast to tacit convention, however, standards of correctness may be explicitly created when others are *instructed* about the task. With the introduction of instructions, actors are expected to comply with the intention of the instructors to have the task accomplished after a certain fashion, in a certain way. The question, then, becomes explicitly one of the right way of going about, of the correct means and method of performing particular tasks. Under these conditions, the actions to be taken are, in significant respects, *represented* by the instructors, representations which serve as the content of imperatives or commands, i.e., *this is how things are done*, *this is how you shall proceed*. Thus the standard to be met is no longer just the successful completion of the task, but performance according to an explicitly specified protocol. Experimental studies explicitly investigating these aspects of normativization and institutionalization are still quite sparse. However, there are some studies concerning the passing of instructions about procedures among participants. For instance, in recent studies on cumulative cultural transmission, participants acquire a procedure and then have to instruct new participants, who in turn instruct new participants in a "diffusion-chain"-like design (see Mesoudi and Whiten, 2008 for a review). In a representative study, participants had to work together in groups to make paper planes that would fly as long as possible or to build the tallest tower of spaghetti (Caldwell and Millen, 2008). Intergenerational exchange was simulated by gradually replacing group members with new ones, each of whom was invited in turn to contribute to the refinement of current practices.
While the focus in these studies so far has been on the accumulation of cultural skill, knowledge, and innovation, such experimental designs can potentially inform discussions on the transition from conventional to fully institutionalized practices.

It is with the introduction of instruction, perhaps, that instituted practices proper come into existence, constituting a kind of endpoint of the continuum we are considering. In this regard the dependence of institutions on the declarative force of language is markedly evident, both in the articulation of representations of actions to be taken and in the articulation of the imperative to perform them in that way (Gelati et al., 2002). This, again, is in contrast with the tacit use of language explored in previous sections, where linguistic interaction serves to guide and coordinate joint action in the course of a situation. By comparison, with the *representing* power of language, explicit rules may be formulated that can be either obeyed or broken, correctly or incorrectly followed. Thus tracing the normative transition from implicit practices to explicit instructions is a matter of discriminating the different ways language operates in relation to situations, including how communication both indicates and determines relations of power between people. Consider, for instance, the source of the power to enforce instructions: what enables a person to be in a position to communicate instructions about some course of action? Such authority may be derived, at its origins, directly from knowledge and experience (e.g., Kruglanski et al., 2005): the instructors, presumably, know how to go about addressing the problem at hand, having accomplished the task themselves, which justifies their formulation and delivery of instructions. Thus knowledge here is the primary authority, whether practical (knowing how) or propositional (knowing that), which gives instructors the right to speak on the matter, and not only to describe but to prescribe actions. An alternative developmental story can also be told, in which community members describe and discuss ways of going about a task, arriving at a consensus rather than a hierarchical execution of orders. In this context the role of written language can be seen as especially relevant and stabilizing, investing instructions and declarations with a seemingly permanent, impersonal authority, in contrast to oral commands, conveyed by the impermanent speech of particular persons (Tylén and McGraw, 2014).

## **INSTRUMENT AND INSTITUTION**

A key theme threading through the discussion above, which we address directly in this section, is the *instituting* of specific means to achieve an end, whereby those means become a *way* of going about. This notion is similar to the Searlian *by-way-of* relation (Searle, 2010): while one may, say, fire a pistol *by means of* pulling the trigger, one votes in an election not merely by means of but *by way of* the ballot box. The pulling of the trigger contributes causally to the pistol's firing; by contrast, submitting a ballot itself *counts as* the very act of voting, and is not to be understood as an instrument toward some separate end. But whereas for Searle such institutional facts necessarily depend on collective beliefs, we treat the selection and institution of practices in terms of their gradual and tacit establishment, a process driven by human disposition and action as opposed to propositional attitudes.

Central to this process, we claim, is the emergence of an implicit sense of *correctness* above and beyond instrumental *success*, which, as discussed earlier, may arise through *conventionalization*. With convention, a certain pattern of practice is established, by virtue of which deviation is possible; establishing a pattern enables the possibility of breaking it. Again, however, in the case of human interaction such a pattern is not a matter of simple statistical regularity, of the assessment of, and adjustment to, probability: rather there is a normative character to the persistence of the pattern and the expectation inhering therein (Miceli and Castelfranchi, 2002). This is particularly the case with communicative practices, given the necessary involvement of, and negotiation with, others. And as we hope our review above has demonstrated, conventions need be neither explicitly established nor explicitly acknowledged: neither the origin of nor the adherence to convention requires explicit deliberation. Rather, responsiveness to convention may be seen as akin to a kind of perception, as a sensitivity to temporally extended patterns of interaction, a sensitivity cultivated by participation in those patterns. Through this recognition of patterns composed of communal histories of interaction, a tacit sense of correctness is instituted, a sense of a more or less right *way* of doing something, relative to the community one engages in.

Establishing conditions of correctness has a number of significant implications. Firstly, when a particular means of achieving an end has been established as a *way* of accomplishment, as a *style* of doing characteristic of a community, such instituted activity can become an "object," so to speak, of joint attention, a temporal structure around which to coordinate. Conventions, as reliable patterns of interaction, can serve as coordinative structures in the course of an activity, facilitating its flow (Alterman and Garland, 2000). One effect, then, of conventionalized practice is to simply make interactions more streamlined and efficient in this manner.

Secondly, and perhaps more profoundly, conditions in which it makes sense to say that this or that act is *correct*, beyond how successful it might be, enable the conveyance and detection of significance in a way that mere instrumentality would not allow. For if conditions and considerations of success, strictly speaking, were all that were in play, then in principle any change of procedure, any alternate act taken in pursuit of some goal, would be treated instrumentally as merely another means toward that goal. Whether these changes would prove fit for purpose is another question: some may be tossed aside as inefficient or unfeasible, which speaks to their instrumental dispensability. Since nothing would fix the means themselves, they would be in principle interchangeable, assessable only in terms of their instrumental success, their status as mere means to an end. There would be no sense of different means *meaning* different things, as they would all be defined by the goal-driven constraints of the given situation.

However, if correctness of practice is established, if the means are in some sense fixed, and become a *way*, then variation may be treated as violation, divergence deemed deviation. Under such conditions, difference in action may take on special significance, beyond being another means to be dispensed with or disposed of. This, again, is bound up with the transformation from *instrumental* to *instituted* practice. For, with instituted practices, conditions in addition to those of success are introduced, enabling a further sensitivity to differential activity. The question becomes one not only of outcome or goal, but of the character of the practice itself. And when actions are no longer only means in relation to an end, no longer determined solely by their aim, variation can become meaningful within and against such instituted patterns of interaction. This is not to say that change of practice, under these conditions, isn't possible, but that such change would be treated, at least initially, as a violation of current norms of interaction. Change would thus undergo some normative process of negotiation, whether explicit or implicit, and not just practical adjustment, which speaks to an important dynamic between fixity and flexibility of practice. This is also not to say that conditions of success and correctness are somehow opposed or separate: they are very much interrelated, and our concern here has been their conceptual disentanglement, however much they may be a tangle in fact.

These themes are especially pertinent to linguistic interaction. Many have noted the necessity of conditions of correctness for linguistic meaning. Consider, for instance, the centrality of normativity in the work of Wilfrid Sellars (e.g., Sellars, 1956/1997), for whom the establishment of norms of correctness is crucial to our capacity for conceptual thought and our operation in the "space of reasons." Consider as well the work of Donald Davidson, for whom conditions of correctness and truth are central to the very possibility of thought and content (Davidson, 2005). In the essay "Truth Rehabilitated," Davidson speculates on an infant's growing entry into language. In the early learning stages, the child, he says, "is still a pragmatist" (Davidson, 2005, p. 15), concerned with the consequences of its vocal behavior, whether in the form of reinforcement from others, or in the attempt to attain something through others. From the perspective of the teacher, already a master of the language, the child is being taught the meanings of words and phrases; from the point of view of the child, linguistic engagement is purely a matter of result and outcome. It is with the dawning awareness of the possibility of being mistaken, that this or that word may be applied correctly or incorrectly, that the child starts to have a sense of the meanings of the words being used. For this possibility of error is not merely a matter of failure: a word is wrong not because it somehow fails to work on some occasion. Rather a word is right or wrong because its use has been established or *instituted* as such. There is much more to be said on this subject, of course. Suffice it to say that, with the introduction and *institution* of correctness, the instrument of language is no longer merely instrumental but intrinsically meaningful, in its sensitivity to correctness and the violation thereof.

Here we should acknowledge the use of natural language in many of the experiments reviewed, and hence the prior presence of conditions of correctness. Thus the normative transition described above, from conditions of success to correctness, occurs within a frame in which a basic sense of correctness is already in place. However, a distinction may be made between the material of language itself, already governed by conditions of correctness, and the particular linguistic practices that develop from that material. The latter may be more or less successful depending on the situation, and may themselves come to be instituted as correct communicative practice. In this light the emergence of communicative practices may be viewed as recapitulating the transition from conditions of success to those of correctness characteristic of the institution of language itself.

## **CONCLUSION**

In this paper we've traced the emergence of coordinative, conventional, and institutional interactions in terms of the transformation of our *normative* engagements, a process inextricably involving variations in linguistic and communicative practice. This instituting of communicative practice provides a conspicuous opportunity to investigate the variety and interdependence of our normative relations. A normative context must exist for this process to get a grip: a setting in which success or failure is possible, selecting for certain words and communicative forms that *work*, which are functionally suited to a problem situation. The tools of ordinary language are brought to bear to address a problem, and refined and retooled in the process, forging a vocabulary transitioning from the everyday to the specialized, from the common to the honed for the task at hand. And with the emergence of convention and the introduction of instruction comes an instituted environment that not only selects but *sanctions* certain actions, constituting a significant normative shift in social organization.

In charting this course from conditions of *success* to those of *correctness*, we started with the stark contrast between implicit coordination and explicit instruction, in order to clearly introduce and elucidate the normative distinction. We then explored the continuum between these two poles, in the form of the emergence of convention and the establishment of tacit standards of correctness. We also touched on potential dissociations between the two: conditions of success may exist prior to and independently of correctness, and conditions of correctness may come apart from those of success. The latter is evident, and perhaps familiar, in the case of practices and procedures, deemed or instituted as correct, that no longer work efficiently; though officially considered correct, they may well have become dysfunctional and unsuccessful.

This normative perspective provides a way of characterizing processes of institutionalization. From this stance, practices become *instituted* when they are established as correct above and beyond their instrumental success. So while certain practices are *selected* under conditions of success, they become *instituted* under conditions of correctness, whereby mere *means* become *ways* of doing. And as a terminological aside, perhaps the verb form *institute* (as in *instituted* practices or *instituting* activities) is more aptly applied to cases of implicit correctness, in which the processes retain a degree of fluidity and informality, whereas the nominalized *institution* may be best reserved for social structures constituted by the articulation and declaration of formal and explicit rules.

Again, we've been keen to proffer a conception of linguistic interaction as basically coordinative rather than representational. Such a view points to a role for language in the instituting of interaction that does not depend on the idea of declaring institutional facts into existence, of creating institutional reality by performatively representing it as such. Rather, communicative and linguistic practices enable and facilitate coordinative activities (Maturana, 1978). Furthermore, being essentially social, linguistic practices may be especially prone to normativization, and hence serve to consolidate coordination. Indeed, these normative transformations are themselves inherent to language: language by its nature is a dynamically *instituted* and *instituting* phenomenon.

The experimental semiotic frame here surveyed is especially applicable to these processes, as it treats signs and words as living forms within local environments, adapting to the selective pressures of specific scenarios and problems, with success and failure a matter, as it were, of life and death. Hence the emblematic nature of something like the *optimally interacting minds* experiment, which serves as a microcosm of the selective environments that foster adaptive communicative activity. Indeed there appears to be a kind of double adaptation at work: not only do communicative and linguistic actions adapt to the task environment, but people come to adapt to the developing linguistic environment as well, by aligning with and adopting the communicative forms employed. Thus a vocabulary develops to adapt to a problem, creating a linguistic environment which in turn is adapted to. In this light institutions can be seen as informed and controlled communicative environments designed for the consideration and solution of specific societal problems.

Crucially, the aims and ends of an institution are themselves articulated in terms of the language of the institution itself. The language of an institution to a certain extent constitutes the possibility of its aims and goals. For example, the possibility of convicting someone of a crime is constituted by legal institutions: the legal system, in its institutional articulation, is not merely an instrumental means of achieving the goal of finding someone guilty, but rather constitutes the very possibility of that goal. Language in this sense may be viewed as a kind of cultural technology, enabling the opening up of conceptual possibilities (Clark, 1996). And while focusing on the tool-like aspects of language may offer insights (e.g., Tylén et al., 2010), emphasizing the efficiency and instrumentality of linguistic interaction, a focus on the instituting and institutional aspects of language needs to enter in as well; for there is a difference between making things easier and making things possible to begin with.

Finally, the experimental work reviewed exemplifies the ways in which the broader resources of natural language are brought to bear on certain situations. Indeed, we always find ourselves in specific situations (Gallagher, 2012), which are always informed to some degree by direction, purpose or functionality, whether in the form of an explicit aim or goal, or more implicitly and indefinitely. Our ordinary language, in its varied and variegated vocabulary, has evolved, and continues to evolve, in response to fluid, multifarious circumstances. Just as these experiments illustrate the shaping of communicative activity under the selective pressures of contrived and controlled experimental conditions, so too has natural language been forged under pressures to cope with a vast and various range of situations, selected under shifting conditions of success and failure, with the survival of the fittest forms for those situations. To articulate the evolutionary perspective explicitly: words that work *live*, continuing in circulation and continually reproduced, while those that do not work, that fail to serve, *die*, falling out of use, and no longer reproduced. And while language adapts to human environments, to situations constituted by human needs, we, of course, adapt to our environment by way of language, in turn further informing our environments in the creation and differentiation of our diverse social milieux.

## **ACKNOWLEDGMENTS**

The authors wish to thank Riccardo Fusaroli and Shaun Gallagher for their feedback and support, as well as the reviewers for their constructive criticism. This work was supported by *TESIS: Towards an Embodied Science of InterSubjectivity*, a Marie Curie Initial Training Network (FP7-PEOPLE-2010-ITN, 264828) and The Danish Council for Independent Research project *Joint Diagrammatical Reasoning in Language*.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 May 2014; accepted: 03 September 2014; published online: 23 September 2014.*

*Citation: Elias JZ and Tylén K (2014) Instituting interaction: normative transformations in human communicative practices. Front. Psychol. 5:1057. doi: 10.3389/fpsyg.2014.01057*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Elias and Tylén. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# An enquiry concerning the nature of conceptual categories: a case-study on the social dimension of human cognition

## *John Stewart\**

Département Technologie et Sciences de l'Homme, Université de Technologie de Compiègne – Cognitive Research and Enaction Design, Compiègne, France

#### *Edited by:*

Ezequiel Alejandro Di Paolo, Ikerbasque – Basque Foundation for Science, Spain

#### *Reviewed by:*

Serge Thill, University of Skövde, Sweden; Shaun Gallagher, University of Memphis, USA; Chris Knight, University of Comenius, Slovakia

#### *\*Correspondence:*

John Stewart, Département Technologie et Sciences de l'Homme, Université de Technologie de Compiègne – Cognitive Research and Enaction Design, Compiègne, France; e-mail: js4a271@gmail.com

Cognitive Science, in all its guises, has not yet accorded any fundamental importance to the social dimension of human cognition. In order to illustrate the possibilities that have not so far been developed, this article seeks to pursue the idea, first put forward by Durkheim, that the major categories which render conceptual thought possible may actually have a social origin. Durkheim illustrated his thesis, convincingly enough, by examining the societies of Australian aborigines. The aim here is to extend this idea to cover the case of the conceptual categories underpinning modern Western science, as they developed historically, first in Ancient Greece and then in the Renaissance. These major non-empirical concepts include those of abstract Space (Euclidean space, perfectly homogeneous in all its dimensions); abstract Time (conceived as spatially linearized, with the possibility of imaginatively going back and forth); and a number of canonical logical categories (equality, abstract quantity, essential versus accidental properties, the continuous and the discontinuous, the transcendental...). Sohn-Rethel (1978) has proposed that the heart of the conceptual categories in question is to be found in an analysis of the exchange abstraction. This hypothesis will be fleshed out by examining the co-emergence of new social structures and new forms of conceptual thought in the course of historical evolution, including the Renaissance, which saw the emergence of both Capitalism and Modern Science, and the contemporary situation, where the form of social life is dominated by financial speculation, which goes together with the advent of automation in the processes of production. It is concluded that Cognitive Science, and in particular the nascent paradigm of Enaction, would do well to broaden its transdisciplinary scope to include the dimensions of sociology and anthropology.

**Keywords: concept formation, social structures, renaissance philosophy, capitalism, automation**

# **INTRODUCTION**

One of the major merits of Cognitive Science is that it provides a *trans*-disciplinary approach to phenomena that are only too often fragmented into separate disciplines that only communicate on the fringes. Right from the start, the "Computational Theory of Mind" (CTM), whatever its defects and limitations, provided a principled connection between the fields of psychology, neuroscience, and linguistics. However, there is one major discipline in the human sciences that is remarkably absent from the synthesis achieved by cognitive science: and that is sociology. To be sure, there is a whole field which goes by the promising name of "social cognition." But when one looks closer, it turns out that what is involved is the way that "social factors" can influence or "color" cognition *after the event*; or alternatively, that when both cognition and human society *are already in place*, some cognitive resources can be allocated to thinking about social forms (for example kinship relations, or even explicitly political matters). What is missing is any inkling of the idea that the social dimension may be actually *constitutive* of humanity itself; that a population of individuals who were not *already* profoundly socialized would not be properly human. If this is correct, then the relative weakness of sociology in cognitive science is a fundamental flaw. This critical remark holds for all the currents in contemporary cognitive science. It applies not just to the classical CTM, but also to all the connectionist and neo-connectionist variants, as well as to the nascent alternative of Enaction (Varela et al., 1991; Stewart et al., 2010).

This is clearly a major issue; and it would require at least a whole book to do it anything like justice. In the space of a single article, all I can do is to indicate schematically the existence of the problem; and then to illustrate what may be involved by a single case-study which will be inevitably very limited and partial with respect to the problem as a whole. The specific area I have chosen, in order to attempt a constructive proposal, is that of the genesis of conceptual categories.

# **THE NATURE OF CONCEPTUAL CATEGORIES**

## **A PHILOSOPHICAL PROBLEM: THE GENESIS OF CONCEPTUAL CATEGORIES**

As a point of entry into the question I wish to examine, I will base myself primarily on a little-known book by Durkheim (1915). Although this book was published a century ago, it has been virtually ignored; consequently, the ideas it presents are as new and original as when they first appeared, and I make no apology for taking it as a basic reference. Today, Durkheim is mainly known as one of the founders of modern sociology; but it is worth noting that he also had a genuine grounding in philosophy. The question of the nature and origin of conceptual categories does indeed have a long history in the philosophical tradition; to introduce the question, I will quote directly from Durkheim:

"At the root of all our judgments there are a certain number of essential ideas which dominate all our intellectual life; they are what philosophers since Aristotle have called the categories of the understanding: ideas of time, space, class, number, cause, substance, and so on. These conceptual categories correspond to the most universal properties of things. Thought seems unable to liberate itself from them without destroying itself, for it would appear that we cannot think of objects that are not in time and space, which have no number, and so on. Other ideas are contingent and unsteady; we can conceive of their being unknown to a certain man, a society, or an epoch; but these basic concepts appear to be practically inseparable from the normal working of the human mind. They are like the solid framework which encloses all possible thought."

(Durkheim, 1915, pp. 21–22)

The next question, then, is this: where do these categories come from? In the philosophical tradition, there are two main answers: *Empiricism* and *Apriorism.* Empiricism is the doctrine according to which the categories are built bottom-up, bit by bit, on the basis of regularities in perceptual experience. This viewpoint was developed historically by the British Empiricists: Locke, Berkeley, and Hume. It culminated in Hume's famous conclusion that the notion of "causality" could only be an illusion (Hume, 1748). The reason is that, however often we observe that event B follows event A, this can never be a sufficient reason to arrive at the idea that A is a genuine *cause* of B: nothing guarantees that next time A will be followed by B, or that B will *necessarily* be preceded by A. It was this scandalous conclusion, that the concept of "causality" is only an illusion, that provoked Kant to "awake from his dogmatic slumbers," and led him to propose his "Copernican revolution" in epistemology (Kant, 1781). Far from experience leading to the categories, it was the other way around: if there were no categories in the first place, no real experience would exist at all. Kant expressed this by saying that the categories exist *a priori*.

Now the problem here is that these twin doctrines, empiricism and apriorism, are both severely defective. Empiricism is decisively refuted by Kant's critique; it just does not hold up. On the other hand, if we are looking for a scientific answer to the question of where the categories come from, apriorism is totally inadequate: to say that they exist "a priori" is just putting a name on our ignorance and begging the question. The empiricist answer, saying that the categories derive gradually over time on the basis of empirical experience, is not valid; but empiricism, for all its faults, does at least attempt to give an answer, whereas apriorism just eludes the question altogether. It is arguably because each of these twin doctrines is about equally defective that the philosophical debate between them has been going on for centuries, and would seem to be interminable.

It is important to recognize here that relatively recent developments in cognitive science, in particular the currents of embodied cognition, extended cognition and distributed cognition, represent a significant advance with respect to the "stand-off" between empiricism and apriorism as diagnosed by Durkheim. "Extended cognition" involves recognizing the role of technical artifacts and technological systems in establishing specifically human cognition (Stiegler, 1998; Havelange et al., 2003), and this opens up one route to recognizing the importance of the social domain. The current of "distributed cognition" attributes an important role to interactions between individuals. The weakness of such approaches, in the present perspective, is that they focus on *interactions*, which presupposes that the individuals between whom such interactions can occur are already fully constituted. They thus fall into the trap of "methodological individualism" which has been roundly criticized by Giddens (1977). In the same vein, Steiner and Stewart (2009) have argued that the term "social" is misused when it is used to refer to a situation where there are merely inter-individual interactions (such as the phrase "social insects" to denote ant colonies). What is missing is a proper focus on the social structures which implement the "social synthesis," a theme we shall return to below. To sum up, none of these recent developments, in spite of their undoubted interest for cognitive science, has yet attributed a fundamental role to the social domain as such. The nascent paradigm of Enaction, which has already been mentioned, would provide a suitable framework for developing a fuller appreciation of the social dimension of human cognition; this has not yet been done, but this article is meant as a step in this direction.

## **A SOCIAL ORIGIN FOR THE CATEGORIES?**

It was in this situation, that of an awkward stalemate, that Durkheim (1915) proposed an audacious and radically original hypothesis. In order to introduce his hypothesis that the categories have a *social* origin, Durkheim notes that there are actually two *sorts* of knowledge: on the one hand *empirical knowledge,* which relates directly to the interactions between an individual and his environment<sup>1</sup>; and on the other hand knowledge which is framed in terms of the categories, that are essentially social in nature. "Between these two sorts of knowledge there is all the difference which exists between the individual and the social, and one can no more derive the second from the first than one can deduce society from the individual" (Durkheim, 1915, p. 28). Durkheim concludes his Introduction as follows:

"Thus renovated, the theory of knowledge seems destined to unite the opposing advantages of the two rival theories. It keeps all the essential principles of the apriorists; but at the same time it is inspired by that same positive spirit which the empiricists have striven to satisfy. It leaves the faculty of reason its specific power, but it accounts for it and does so without leaving the world of observable phenomena. It affirms the duality of our intellectual life, but it explains it, and with natural causes. The categories... appear as priceless instruments of thought which the human groups have laboriously forged through the centuries and where they have accumulated the best of their intellectual capital. A complete section of the history of humanity is resumed therein. ... This is how it is legitimate to compare the categories with tools<sup>2</sup>; for on its side, a tool is material accumulated capital. There is a close relationship between the three ideas of tool, category and institution."

(Durkheim, 1915, p. 32)

<sup>1</sup>Durkheim remarks elsewhere that if a man were reduced to having only empirical knowledge based on individual perceptions of this sort, "he would be indistinguishable from the beasts" (Durkheim, 1915, p. 487).

<sup>2</sup>In view of the social importance of tools, and indeed the thesis that "Technology is Anthropologically Constitutive" (Stiegler, 1998; Havelange et al., 2003; Steiner, 2010) it is fascinating to see here that Durkheim himself spontaneously makes the association between conceptual categories, tools and social institutions.

On the face of it, this would appear to be an attractive proposition. It must be admitted, however, that a century later, Durkheim's proposal has received very little attention from the academic community. The brute fact is that it has not even been criticized; essentially, it has just been ignored. A possible reason for this, or at least a contributing factor, is that the bulk of Durkheim's long book is devoted to an analysis of the society of the Australian aborigines. It is therefore important to emphasize that Durkheim's choice of a terrain to gather empirical evidence in support of his hypothesis was in no way guided by a preference for the bizarre or the exotic, but by clear methodological reasons: "in the study of any natural phenomenon which undergoes evolution, there is an immense advantage in starting with the most primitive<sup>3</sup> form known." Durkheim illustrates this precept quite explicitly with the case of living organisms: "Biological evolution has been conceived quite differently ever since it has been known that mono-cellular beings exist.... The discovery of unicellular beings has transformed the current idea of life. Since in these very simple beings, life is reduced to its essential traits, these are less easily misunderstood." (Durkheim, 1915, pp. 18–19). Similarly: "Primitive civilisations offer privileged cases because they are simple cases. That which is accessory or secondary has not yet come to hide the principal elements. All is reduced to that which is indispensable, to that without which there could be no society. But that which is indispensable is also that which is essential, that is to say, that which we must know before all else... But primitive societies do not merely aid us in disengaging the constituent elements of society; they also have the great advantage that they facilitate the explanation of it. Since the facts there are simpler, the relations between them are more apparent. The reasons with which men account for their acts have not yet been elaborated and denatured by studied reflection; they are nearer and more closely related to the motives which have really determined these acts" (Durkheim, 1915, pp. 18–19). Thus, the reason why Durkheim drew mainly on ethnographic studies of Australian aborigines, with supplementary material from studies of Native Americans, was not "simply for the pleasure of telling the particularities and singularities of a very archaic (society)," but because he hoped thereby to approach the essential constituent elements of human *society*, and to explain them.

Now Durkheim's adherence to this methodological principle did indeed bear fruit in the clarity and relative simplicity of his conclusions. It rapidly became apparent that in all these "primitive" societies there seems to be an anthropological invariant: the very nexus of their social life is provided by *religion*, but a religion which is in large part foreign to any idea of divinity or gods. What is at the root of these religious practices is a distinction between the *profane* and the *sacred.* Durkheim therefore goes on to ask what could be at the root of this distinction. An Empiricist might suggest that the notion of "sacred" could derive from extraordinary and possibly "supernatural" events, such as cosmological rarities, showers of falling stars and the like (the theory called "naturism"); or maybe it derives from the phenomena of dreams (the theory of "animism"). But Durkheim very properly dismisses both of these suggestions: since all these phenomena, naturist or animist, do actually occur in the realm of "natural events," they *cannot* for the life of them suggest the notion of the "sacred" as *different in kind* from the profane. But, Durkheim continues, there are indeed two different *sorts* of reality with which human beings are confronted. On the one hand, there is the ordinary everyday reality of perceived objects and processes (which corresponds non-problematically to the class of the profane); but on the other, there is indeed a quite different *sort* of reality, which is equally non-negotiable by an individual, and that is... social reality! So Durkheim arrives at the conclusion that the "sacred" is neither more nor less than the form in which "the social" presents itself to the consciousness of individuals in these "primitive" societies. His task then becomes to show that the conceptual categories of time, space, and so on have their natural origins in the *religious* categories by which social life is ordered. He was able to muster an immense amount of empirical data to support this hypothesis.

By means of a very thorough and critical appraisal of the ethnographic literature, Durkheim came to the conclusion that, quite generally, the "elementary form of the religious life" was that known as "totemism." Each tribe is divided into a certain number of *clans* (usually a dozen or so). Each clan is identified by its emblematic *totem*, which is often but not necessarily a particular species of animal or plant (an additional indication of the sacred nature of animals is given by the cave-paintings at Lascaux and elsewhere; Curtis, 2006). The totem is sacred for members of the clan; it is forbidden for consumption (except possibly under special ritual circumstances). This system is inseparably religious and social, confirming Durkheim's theory concerning the intimate connection between the two. We now come to a crucial point: for the Australian, everything which is in the universe is considered to be a part of the tribe; consequently, just like men, all things known are distributed between the clans (Durkheim, 1915, pp. 166–168, where Durkheim cites some examples). Naturally enough, things which are attributed to the same clan tend to have some similarities; this is particularly clear in the case of the phratries<sup>4</sup>, where there are just two classes. Thus, if the white cockatoo is in one phratry, the black cockatoo will be in the other; and the moon is grouped with the black cockatoo whereas the sun is with the white cockatoo. However, as Durkheim insistently notes, "the feeling of resemblances is one thing and the idea of class is another.... The contents cannot furnish the frame into which they fit... This is why the idea of class must not be confused with that of a generic image... The best proof of the distance separating these two notions is that an animal is able to form generic images though ignorant of the art of thinking in classes and species." (Durkheim, 1915, pp. 171–172). Thus, the very notion of "class", and of systematically and logically classifying entities into a system of classes, is a clear example of a non-empirical, *a priori* conceptual category.

<sup>3</sup>This reference, here and later, to "primitive" societies is of course politically incorrect. I nevertheless employ this term (in "scare quotes") in the same sense of "primordial" that Durkheim uses when referring to single-cell organisms. It should go without saying, but is perhaps better said explicitly, that when I use this term in this text, there is no negative connotation (on the contrary, these societies were arguably far less alienated than our own contemporary society).

<sup>4</sup>Although this is far from always the case, certain tribes are organized in just two phratries, with half the clans belonging to each phratry.

What we see here is that the very first systematic classifications that we meet with in history are "modeled upon the social organization, or rather that they have taken the forms of society as their framework. It is the phratries which have served as classes, and the clans as species." (Durkheim, 1915, p. 169). One could scarcely ask for a clearer or more direct vindication of Durkheim's hypothesis that the *a priori* categories have a social origin. Durkheim provides analogous demonstrations for other major categories. We cannot go into the details here, but will have to content ourselves with the barest summary: "it is the rhythm of social life which is at the basis of the category of time; the territory occupied by the society furnished the material for the category of space; it is the collective force which was the prototype of the concept of efficient force, an essential element in the category of causality." (Durkheim, 1915, p. 488).

Durkheim's methodological choice of starting with "primitive" societies thus paid clear dividends. It does, however, have one disadvantage: it can leave the impression that for "primitive savages" there may well be a relation between forms of thought and forms of social life; but that when it comes to civilized societies, especially in the modern Western world, this "primitive" stage has been surpassed and there is no longer any such relation. This was not at all Durkheim's own view; he thought that he was not at the end of the story, but just at the beginning. He remarked: "Attributing social origins to logical thought is not debasing it or diminishing its value or reducing it to nothing more than a system of artificial combinations; on the contrary, it is relating it to a cause which implies it naturally. But this is not saying that the ideas elaborated in this way are at once adequate for their object." (Durkheim, 1915, p. 493). Thus, Durkheim considered that he had laid the foundations for a whole new research program, consisting of following through the whole evolution of human thought, and of relating this to concomitant changes in the forms of social organization. In the next section, we will attempt to respond to this challenge.

# **ABSTRACT THOUGHT AND THE EXCHANGE ABSTRACTION**

## **THE EXCHANGE ABSTRACTION**

The aim of this section is to examine whether there is a plausible social basis for the major categories of modern Western thought: more specifically, for the categories which Kant himself identified as being *a priori*, i.e., not derived from empirical experience.

A necessary prerequisite for this task is to characterize the forms of social life in an appropriate way. To this end, I will introduce here the concept of "social synthesis." Every human society in which there is some degree of division of labor must necessarily have a mechanism which provides functional answers to the following three questions: (i) What are the productive activities which will be performed? (ii) How is the sum of all the work to be performed to be distributed between the members of society: *who* will do what? (iii) How are the fruits of this labor to be divided up amongst the members of society: who will receive what? It is a condition of the viability of any form of social life that there should be a mechanism which provides effective answers to these questions (not necessarily explicitly, but in terms of practical results); in the absence of adequate answers, there will be anarchy and the dissolution of the society. It is worth emphasizing that this question of the "social synthesis" is not merely ancillary; it is absolutely fundamental to the very constitution of human society as such.

Now in very broad terms, one can make a distinction between two major types of mechanism for ensuring the social synthesis, which thereby condition two very different sorts of human society; I will call them "traditional" societies and "market" societies. In the great majority of human societies in the past, the mechanism of social synthesis can be designated by the term "traditional": there is a definite sort of social order, with a specification of the roles of the various members of society, which is reproduced from generation to generation in an essentially unchanged form. In many cases, this social order comprises institutions of discussion and negotiation: the African palaver can serve as a metonymical example. One can also speak of a "communal" mode of production, where the nature of the productive activities themselves integrates in large measure the distribution of their fruits and, upstream, the corresponding division of labor. We may remark that there is some proximity here with animal communities, where in some cases the differentiation of activities necessary for collective survival can be quite sophisticated. However, since no animals have the capacity for language, there is no animal equivalent to the institution of the palaver type.

By contrast with these "traditional" forms of social organization, there are societies (including our own contemporary society) that we can designate by the term "market" societies. Here, a large part of the social synthesis is neither traditional, nor the object of relatively direct discussions, nor integrated with the productive activities themselves; it is delegated to the mechanisms of a *market economy.* In this case, the social synthesis is achieved by the famous "invisible hand" of Adam Smith, according to the laws of supply and demand which are balanced by the mechanism of *prices.* In other words, the social synthesis is not directly achieved as such; it is, rather, the "emergent" result of a whole series of purely local economic decisions, without there being anywhere a coherent vision or conscious will at the level of the whole. It is important to emphasize that in market societies, characterized by a division of labor, economic exchanges play a fundamental role because they determine the *form* of the social synthesis. The life of each individual depends on the activities of production and consumption; but without the intervention of market exchanges none of these activities would occur. Each economic crisis is an object lesson in the fact that the activities of production and consumption are perturbed precisely to the degree that the functioning of economic exchanges is compromised.

The cornerstone of a fully developed market economy is the social institution of *money.* It is important and interesting to note that the invention of money as such was not immediate. The successive steps in what was a long historical process, involving a considerable investment of collective intelligence, have been carefully documented in the work of Simmel (1900). Briefly, some of the major steps were: gift and counter-gift; direct barter; the appearance of certain commodities which were not quite like the others because they were systematically used as intermediaries in market exchanges (grain is a good example); the use of precious metals; and the first instances of coined money, which mark a first culmination of the process and will serve as the basis for our analysis of the *exchange abstraction.* In what follows, I will rely essentially on the work of Sohn-Rethel (1978).

Market exchange involves an *abstraction* because it requires a rigorous relation of *mutual exclusion* between *use* and *exchange.* The activities of use on one hand, and activities of exchange on the other, are not simply different; they *must* take place separately, during different and mutually exclusive temporal periods. The reason is that the exchange activity serves the sole purpose of a change in owner, in other words a change in the *purely social* status of the commodities as elements of private property. In order for such a change to take place on the basis of a negotiated agreement, the *material* status of the commodities, their physical condition, must remain unchanged during the whole period of the negotiation – or rather, which is even more relevant here, their material status must be *presumed* to be unchanged. Sohn-Rethel (1978) provides a graphic presentation of this key point:

"There, in the market-place and in shop windows, things stand still. They are under the spell of one activity only; to change owners. They stand there waiting to be sold. While they are there for exchange they are not there for use. A commodity marked out at a definite price, for instance, is looked upon as being frozen to absolute immutability throughout the time during which its price remains unaltered. And the spell does not only bind the doings of man. Even nature herself is supposed to abstain from any ravages in the body of this commodity and to hold her breath, as it were, for the sake of this social business of man. Evidently, even the aspect of non-human nature is affected by the banishment of use from the sphere of exchange."

(Sohn-Rethel, 1978, p. 25)

The practical activity of exchange does not in itself have any meaning in terms of nature; it is purely social by its constitution and scope. Nevertheless, the transfer of ownership that is negotiated under property laws in no way lacks physical reality itself. Exchange involves the movement of the commodities in time and space from one owner to another, and constitutes events of no less physical reality than the use-activities which it rules out. It is indeed precisely because their physical reality is on a par that these two kinds of activity, exchange and use, are so mutually exclusive. Thus, exchange is an *abstraction* because, while remaining inseparable from use (otherwise no-one would bother to exchange the commodities in question), it quite rigorously excludes it. At the same time, it is a *real* abstraction, because it is a perfectly real event in time and space.

To sum up the argument so far: in market societies, there are two registers of spatio-temporal reality which exist side by side, but which mutually exclude each other. This point will be so important for what follows that it will be useful to employ specific terms. In German, the register of "use" is designated by the term "first nature" (*erste Natur*); this register is entirely and substantially material. The register of "exchange" is designated by the term "second nature" (*zweite Natur*); this register is entirely social and, by its constitution, perfectly abstract. The same term "nature" is employed to indicate that these two worlds are endowed with an equal degree of spatio-temporal reality, and that they are inextricably combined in the fabrication of our daily life in a market society.

The strange relation between "first nature" and "second nature" is brought to its peak by the social institution of coined money. Money is an abstract, paradoxical entity: it performs a decisive function in the social synthesis, but *unbeknown* to the actors concerned (we will come back to this point). But even if the "exchange abstraction" is practically never thought of *as such* by economic agents, no animal can begin to understand what money is: it is a register that is solely accessible to human beings<sup>5</sup>. Sohn-Rethel (1978) makes this point in striking fashion, and I will cite him again:

"Take your dog with you to the butcher and watch how much he understands of the goings-on when you purchase your meat. It is a great deal and even includes a keen sense of property which will make him snap at a stranger's hand daring to come near the meat his master has obtained and which he will be allowed to carry home in his mouth. But when you have to tell him 'Wait, doggy, I haven't paid yet!' his understanding is at an end. The pieces of metal or paper which he watches you hand over, and which carry your scent, he knows, of course; he has seen them before. But their function as money lies outside the animal range. It is not related to our natural or physical being, but comprehensible only in our social interrelations as human beings. It has reality in time and space, has the quality of a real occurrence taking place between me and the butcher and requiring a means of payment of material reality. The meaning of this action registers exclusively in our human minds and yet has definite reality outside it – a social reality, though, sharply contrasting with the material realities accessible to my dog. Here we have the spheres of the "first" and "second nature" which we distinguished earlier side by side, and unmistakably divided."

(Sohn-Rethel, 1978, p. 45)

Marx says quite explicitly that the exchange abstraction never receives a mental representation as such, since its sole expression resides in the *act* of considering that the value of one commodity is equal to the value of another (Marx, 1867, p. 162). Gold, or silver, or any other material entity which lends to money its instantiation as a visible, palpable body is only a *metaphor* for the exchange value, it is not the abstraction as such. In fact, the material instantiations of money do more to mask than to reveal its veritable "second nature."

Historically, when commodity exchanges spread, becoming multilateral and involving a wide range of commodities, there was an overwhelming practical need to employ one of these commodities as a *general means* for the exchanges of the others. This new role did not in itself, immediately, confer on the commodity in question an appearance different from before; but as a means of exchange, it is invested with the *postulate* that it should undergo no material change as long as it continues to exert that function. It is therefore easy to understand that the choice of a "standard-commodity" will fall on an entity whose physical durability, divisibility and mobility conform relatively well to the required properties. In this way, the *postulate* of immutability, which has its true source in the abstraction of exchange, quite rapidly acquires the appearance of a *consequence* of its particular properties. The fact that a special "aura" attends this commodity "not like the others" does more to confirm than to refute this misleading appearance.

<sup>5</sup>Chen et al. (2006) have suggested that capuchin monkeys may have some capacity to engage in exchanges with conspecifics. It is actually reassuring that some animals may have an opening in this direction, since this provides a basis for possible evolution towards distinctively human forms of understanding. However, no behavior of this sort is found spontaneously in the wild. It must be re-emphasized that a number of so-called "primitive" human societies (Aborigines and Native Americans in their natural state before meeting with Europeans) make no use of money; and, as Simmel (1900) has pointed out so clearly, even in humans the development of a monetary system was a long and very gradual process, spanning centuries. The basic point thus remains valid.

This confusion reaches a summit when the choice of a "standard-commodity" falls on one of the precious metals. On the occasion of each market transaction, it was necessary not only to weigh the metal, but also to melt it and test it for purity; in short, it was necessary to relapse into treating the metals according to their first nature. And precisely for this reason, they failed in the end to perform their function as a universal means of exchange. This deficiency only found its solution with the invention of *coined money*: this step, which was to have such weighty consequences, was first taken in Ionia around 680 BC. With coined money, the preceding relation, in which the metal's status as exchange-value was subordinated to and masked by its material, first-nature status, was overturned. A piece of coined money is stamped in order to signify that it is to serve as a means of exchange and not as a use-object. Its weight and metallic purity are guaranteed by the issuing authority; thus, if a coin has lost weight through wear and tear, the authority in question will replace it free of charge. Its physical matter has become merely the bearer of a social function<sup>6</sup>. A piece of coined money is an entity which conforms to the *postulates* of the exchange-abstraction; it is *presumed* to be composed of a substance which is absolutely unchanging, on which time has no effect, and which is thus unlike any material substance which actually exists in nature.

## **ABSTRACT THOUGHT**

We come now to a crucial point. We have characterized the form of social life in societies governed by a market economy in terms of the *exchange abstraction*; is it now possible to identify a corresponding form of thought, and more precisely a corresponding set of "*a priori*" conceptual categories?

Sohn-Rethel (1978) introduces his response to this question with a pleasant thought-experiment. The leading role is played by a philosophically minded Athenian from Classical Greece, who asks himself searching questions about the coins in his pocket: "What sort of substance *should* these coins be made of?" As none other than the great Plato clearly emphasized, all material objects existing in the world are perishable, corruptible, and unable to resist the ravages of time; but it seems clear that precisely because of this, ordinary material objects are not properly suited to the function of *money.* Now Plato also speaks of entities of another sort, which are spotless, eternal, perfectly pure, and always strictly identical to themselves: he denotes them by the honorary title of "Ideas." So, our Athenian asks himself, "are coins of money actually pure Ideas?" Worried, he takes hold of the coins in his pocket, and thinks hard: "These coins *are* real things; and they are real not just for me, but for all my fellow citizens who accept them in payment for wares. Might money be immaterial? What an absurd idea; no coin could properly be money if it did not have material reality." So he comes to the reassuring conclusion that the substance that his coins are made of is a *real* substance, as real as any other substance existing in time and space. And yet, this substance is quite different from all these other ordinary substances, because this one is just as immutable as the entities that Plato speaks of. But how can a substance which is immune to the ravages of time exist in time? Nowhere in the whole of nature, and nowhere within the limits of sensory perception, can any such substance be found. But then, how can our Athenian *know* about this extraordinary sort of substance if he cannot see it or hear it or touch it? He knows about it *by thought* and only by thought. Never in all his life has he come across this sort of entity, something which is obstinately and uncompromisingly real and yet which is detached from any of the sensory qualities by which things are usually real for us.

This reflection can introduce us to more detailed examination of the formal analogies which exist between the conceptual categories of philosophical thought on one hand, and the distinctive features of the exchange abstraction on the other. It is important to emphasize here that what characterizes each and every one of these conceptual categories is their "canonically apodictic" nature: quite generically, each of them has the remarkable property that once they are identified, in their ideality, it appears intrinsically manifest that they could not be other than they are. At the same time, they are radically non-empirical: there is nothing in our daily experience of nature which is sufficiently similar for it to be at the origin of the concept. What Sohn-Rethel (1978) is suggesting is that actually, there is something in our daily experience that does fit the bill: however, this is not any sort of material reality, but *social* reality.

We therefore hold the germ of an understanding as to how it can be that certain particularities of the exchange abstraction – which is a social form *par excellence* – can be at the root of conceptual categories which are both radically non-empirical, and which can yet be applied to think about material, physical reality. This may be a good place to remark that the relation between social forms and forms of thought, as it manifests itself here, is not simple; it is not a question of direct linear causation in one direction or the other. The social forms and the thought forms come about *together*; while there is a sense in which it is the social forms which provide the ground for the conceptual forms (Sohn-Rethel's presentation can be read in this way), it is surely at least as much the other way round: the cognitive capacity to think in a certain way is a condition for the corresponding form of social life to arise. It is salutary to recall here, as Simmel (1900) has so clearly shown, that the emergence of societies based on a market economy occurred only very gradually, over the course of many centuries. One of the reasons for this is surely that human mentalities had to change in order for this evolution in social forms to be possible.

<sup>6</sup>This also explains how it is that the same function can be performed by simple pieces of paper... as long as they bear inscriptions which cannot easily be forged, so that they carry the same guarantee. An anecdote may provide a pleasant illustration of the striking contrast between first and second nature which comes into play with bank-notes. When I was seven years old, I inadvertently left a bank-note in my trouser pocket when the trousers went to be laundered. I was amazed to see my mother recover some damp fragments of the note, which still bore, in barely legible form, the number of the note; she took them to the bank and obtained in exchange... a brand-new bank-note!

With this, we have set up the case for supposing that the exchange abstraction may indeed be at the root of the basic conceptual categories of Western thought. It now remains to flesh out this account by developing it in more detail. In the next section, we shall do this in two ways; firstly, by looking at a set of fine-grained "homologies"; and secondly, by looking at the correspondences between social and conceptual forms as they have co-evolved in the course of human history.

# **FLESHING OUT THE RELATION BETWEEN SOCIAL AND CONCEPTUAL FORMS**

## **CHARACTERISTICS OF THE EXCHANGE ABSTRACTION IN ANCIENT GREECE: THE HOMOLOGIES**

The account we have given of the correspondence between the exchange abstraction and the abstract categories of Platonic thought has so far been expressed in rather general terms. If this relation is real, it should be possible to spell it out in more detail. Sohn-Rethel (1978) has risen to this challenge, and we shall now present the set of homologies he has found between seven categories from the canonical set of basic categories and corresponding aspects of the exchange abstraction.

## *Solipsism*

The doctrine according to which "I alone exist" (*solus ipse*) is a leitmotiv of Western philosophy. This doctrine reached its summit with Descartes and Berkeley. In Descartes' famous "*Cogito ergo sum,*" the "self" in question guarantees its *own* existence – the very idea would collapse if the "existence" in question extended to anything other than the subject of the *cogito*. Berkeley deliberately pushes this solipsism to a provocative limit with his "*Esse est percipi*": "to be" is neither more nor less than being perceived. In other words, it is not only other subjects but the whole world which only exists to the extent that *I* perceive it. With his usual clarity, Kant summarizes the apodictic character of solipsism: "there is no foundation in theoretical reason which makes it possible to infer the existence of another subject." This is of course an affront to common sense: in ordinary everyday life, no-one seriously doubts for a moment the existence of other subjects, nor the real existence of the external world. So where could this preposterous idea, which is clearly non-empirical, have come from? There is undeniably a certain irony in looking for an origin in the social domain, because solipsism would seem to be the very antithesis of sociability. But Sohn-Rethel (1978) rises to the occasion.

Since solipsism is a private thought *par excellence*, the first idea that comes to mind concerning the social sphere is that of *private property*. This is all the more plausible in that at first sight it would seem that the institutional principle of private property is logically prior to market exchanges. But Sohn-Rethel (1978) argues that the relation is in fact the other way around: the principle of "private property" is only a retrospective conceptualization of necessities that are already inherent in the social *act* of exchange. Let us look at this more closely.

During the whole duration of an exchange transaction, the commodity in question must imperatively be withdrawn from the sphere of use. This is what we have already analyzed above, where we noted that market exchanges induce a rigorous mutual exclusion between *use* and *exchange.* We now have to pursue this analysis by examining the consequences of this separation for the consciousness of the agents. To do this, we will successively examine the two aspects: first that of use, then that of exchange.



To sum up: what the owners of commodities *do* in the context of a commodity exchange is effectively equivalent to practical solipsism; and this is the case, quite independently of what the agents concerned may or may not actually think or say about it. There is thus indeed a telling correspondence between "solipsism" as a philosophical category, and certain aspects of the exchange abstraction.

## *The unicity of that which is*

The first thinker in human history who attained the sphere of "pure thought," a style of thought quite different from anything that exists in traditional communal societies, was Parmenides (Cornford, 1939). His central concept is designated, in Greek, by the words τὸ ἐόν, which are generally translated as "the One; that which is." This entity is intrinsically and perpetually unchanging; it occupies the whole of space; it lacks all the attributes of sensory perception; it is strictly homogeneous and uniform; it is indivisible; it is incapable of any sort of becoming or decaying; and it is forever immobile. Parmenides emphasizes that the reality and the being of this entity are such that it is intrinsically and literally inconceivable to think that it does not exist. This reasoning is central to his whole doctrine; and it marks the first time in human history that a conclusion is based on purely logical arguments. Thus, the τὸ ἐόν is the starting-point for a thought-process which proceeds by pure reasoning. In other words, what characterizes this style of thought, quite unprecedented at that time, is the fact that this purely conceptual thought grasps the dialectics of truth and non-truth according to the canons of logical *necessity*, which is absolutely binding. Parmenides writes: "The fact of thinking, and the thought 'it is,' are one and the same thing. For you will never find any thought divorced from that which is, from what the thought is about. For there is not, and there never will be, any thing other than that which is." Hegel (1833) was later to recognize himself perfectly in this stance, and comments: "This is indeed the fundamental idea. Parmenides marks the beginning of philosophy."

We may note that the concept of τὸ ἐόν is a *premise* for the logical arguments of Parmenides; but the origin of the concept itself is enigmatic. One thing is clear at any rate: it is a radically *non-empirical* concept. It is indeed totally evident that no-one has ever seen (or heard, or touched, or tasted, or smelt) anything at all which bears the least resemblance to this τὸ ἐόν. In this respect, it is worth noting that neither Parmenides, nor any of the other founders of Greek philosophy, claim to have personally *invented* their key concepts themselves. Parmenides never suggests, for example, that he arrived at this concept by generalizing from multiple cases up to the level of a universal concept. The abstractions which underlie these concepts are of a quite different sort: one *finds* them already there, complete in themselves, totally without any process by which they could be derived. They come from elsewhere, outside and independently of any human thought.

It is in this difficult situation that Sohn-Rethel (1978) proposes his audacious solution to the problem. According to him, the concept of Parmenides corresponds in quite exemplary fashion to a description of the abstract substance from which, ideally, *money* should be made. A market commodity can be exchanged between two private owners precisely to the extent that it has the capacity to be constituted as the object of a mutual exclusion of ownership. It is this capacity which makes it impossible for such a commodity to belong simultaneously to two different owners: a commodity is essentially *one* in the context of a rivalry between *two* owners.

What, precisely, does this "unicity" consist of? It has nothing to do with the indivisibility of the commodity considered as a material entity; it has nothing to do with its actual natural properties. In fact, what is brought into play is not the unicity of the commodities themselves, but the unicity of their *existence.* The ways in which a commodity can be perceived – as an object and in terms of its possible use-value – are as diverse as the persons who perceive it; but it *exists* in a single world which is common to all the private individuals, and this is the world of market exchanges.

The unicity of the exchange abstraction is thus absolutely fundamental, because it is this unicity which constitutes it as an instrument capable of realizing the social synthesis; in other words, of conferring on the society in question its coherence and its unity. There is thus an astounding formal concordance between this unicity of the exchange abstraction, and the ontological unicity of the τὸ ἐόν of Parmenides which is the founding abstraction of philosophical thought.

## *Abstract quantity*

The work of the formalist school of mathematics (Weir, 2011), notably following Hilbert, has made quite explicit something which was until then merely implicit in the whole of "pure mathematics": the perfectly *abstract* character of "natural numbers." The mathematical definition of these numbers involves a notion of "abstract quantity" defined by nothing other than the relations "larger than" (>), "less than" (<), and "equal to" (=)<sup>7</sup>. The fact that the very considerable work of the formalist school was necessary to make these concepts explicit is an eloquent indication of their abstract, non-empirical nature. "Numbers" as we experience them empirically are not at all built in this way (which explains the abstruse, non-intuitive nature of "formal mathematics" which has given such headaches to pupils and teachers alike in schools where a well-intentioned but possibly quite misguided attempt has been made to introduce this new program of "modern maths"). Numbers as we come across them in daily life are never separated from the objects that are to be counted; what we can actually experience empirically are twenty sea-shells, or twenty cows. But then, if the concept of "pure quantity" cannot be derived from empirical experience, where on earth could it have come from?

Sohn-Rethel (1978), continuing his analysis of the exchange abstraction, sees an answer to this enigma in the following way. The act of exchange contains within itself the *postulate* that the two sets of commodities to be exchanged are *equal*. But how are we to define and to characterize this "equality"? It does not reside in the identity of the commodities, because if they were completely identical there would be no point in exchanging them; only *different* commodities are exchanged. Neither are the commodities considered to be equal in the minds of the agents, because their action would become absurd if they did not see any advantage in realizing the exchange. What is more, this sort of evaluation only exists in the solipsistic register of each individual consciousness; from one person to another, such evaluations are not comparable. Nevertheless, it is of the very essence of the postulate of equality that it transcends the gulf of experience between the agents. The postulate of equality does not derive from their experience; the only thing they have to agree on is that the two sets of commodities can be exchanged. The two sets of commodities are *rendered equal* by the very act of exchange; they are not exchanged in virtue of any sort of "equality" that they possess in themselves.

An act of exchange of this sort, which ends up *postulating* the equality of the sets of commodities, may well be preceded by a negotiation, by a sort of petty bargaining where what is at stake for each agent is "take more" and "give less." Now it is true that many commodities can be measured in dimensional units (tons, gallons, square metres, and so on). But the comparative terms "more" and "less" employed during the bargaining do not involve a quantitative comparison between, for example, tons of coal, gallons of petrol, or square yards of fabric. The relational equation postulated by an act of exchange leaves behind it all such dimensional measure, and *establishes* a level of pure non-dimensional quantity. At the end of all this we find, very precisely, the level of pure numbers defined by nothing other than ">," "<," and "=."

<sup>7</sup>I thank one of the reviewers for pointing out that technically, the definition of "natural numbers" also requires the core concept of a "successor function." Sohn-Rethel addresses neither the notion of a "successor function" nor the passage from "abstract quantity" to "natural number."

## *Abstract time and space*

In the list of categories of synthetic *a priori* judgement, as Kant set them out, an important place is occupied by the concepts of time and space. This space is that of Euclidean geometry: it is notably characterized by the fact of being rigorously homogeneous and isotropic. As Jaynes (1976) has pointed out with great perspicacity, time is only accessible to reflexive consciousness, and indeed to scientific thought, if it is metaphorically transposed to this conceptual framework of an ideal space: in this context, "time" is nothing other than a Euclidean point which advances uniformly along a straight line which is also Euclidean. It may not be necessary to dwell at length on the totally non-empirical nature of these concepts, since this thematic leitmotiv is becoming familiar. The space in which we move in the course of our daily life is anything but homogeneous and isotropic (Merleau-Ponty, 1945). As embodied beings, we are constantly subject to the anisotropic influence of gravity (in fact even this characterization is already idealized with respect to our phenomenologically immediate lived experience). And even the space of our movements in the two horizontal dimensions is not homogeneous, being encumbered in all sorts of ways. We have no perception of spatiality outside our actions (this is particularly clear in the "enactive" approach to cognition and perception). Now these actions are constitutively dependent on the particularities of our embodiment and of our natural *Umwelt*; and both of these are anything but homogeneous and isotropic. And as for time, considered as we have immediate lived experience of it, its "framing" by the metaphor of spatiality is in no way empirically given; moreover, it is characterized by biological and psychological rhythms, such as day and night, which once again are anything but homogeneous and linear. So where could the rigorous *ideality* of the Euclidean conceptions come from?

As we may expect, Sohn-Rethel (1978) locates the source of this ideal abstraction in the switch that occurs when the categories of time and space are applied not at the level of use, but at the level of market exchange. At the level of use, which we interpret here as covering the totality of human activities in relation to nature, space and time are inextricably linked to natural events and human activities: for example, the ripening of harvests, the seasons of the year, the hunting of animals, the birth and death of human beings, and generally everything that happens in the course of life. Now every act of exchange requires abstracting away from all this, because the commodities are supposed to remain immutable for the whole duration of the exchange. The transaction does take a certain lapse of time, because it must include the delivery of the commodities and the payment that concludes the exchange. But the totality of this time is emptied of all the material realities that make up its content at the level of use.

Very similar considerations apply to space, for example the distance that the commodities must cover when they change owners. While the commodities are in transit from the old to the new owner, the equality between the two sets of commodities holds at each position and at each instant *in exactly the same manner* as at any other position and time. It is for this reason that time and space, when they are applied to the exchange, *must* be perfectly homogeneous. They are also continuous, in the sense that they allow for an interruption at any moment during the transit. In other words, the exchange abstraction excludes everything which makes up history, whether it be human history or natural history. The empirical reality of facts and events, and their descriptions which make it possible to differentiate one local time and position with respect to another, is entirely obliterated. This is how time and space acquire that character of universality and atemporality which *must* mark the exchange abstraction in each of its traits.

## *Substance and accidents*

It is well known that Aristotelian logic draws a fundamental distinction between the "essential" properties of an object – in brief, the properties necessary and sufficient for an object to belong to a certain class of objects (for example, being "a tree," "a cat," and so on) – and the "accidental" or contingent properties, which an object can have (or not) without affecting its membership of a class (for example, the fact that a cat is gray or ginger). In its more highly developed form, this distinction becomes that between "primary" properties – in physics, these reduce essentially to the mass, position, and state of motion of a particle – and "secondary" properties such as color, sound, smell, and so on. It is fairly evident that this conceptual scheme – which, needless to say, gives pride of place to the "essential" or "primary" properties – is the exact opposite of the empirical situation, for everything that can actually be perceived is relegated to the status of "accidental" or "secondary" properties. But if the "essential" or "primary" properties are non-empirical, where do they come from?

Sohn-Rethel (1978) once again finds an answer in the exchange abstraction. In fact, we have already largely presented what is at stake: the "ideal" substance of which money should be made is precisely devoid of all sensory qualities; all that remains are the properties necessary for it to transit through abstract space and time. Let us recall, once again, that we are dealing with an "abstraction" because the *use*-value of a commodity (without which it would not actually have any exchange-value either) is constituted precisely by its empirical qualities.

## *The continuous and the discontinuous*

One of the grand themes that characterize the whole tradition of Western mathematics is the tense opposition between the continuous and the discrete (Salanskis, 1992). Already in ancient Greece, this gave rise to the paradoxes of Zeno, such as that of Achilles, who supposedly can never quite catch up with the tortoise. Another key moment was the invention of differential calculus by Leibniz and Newton. Once again, this is a concept that does not arise in the empirical sphere of daily practice; and once again, Sohn-Rethel (1978) finds its roots in the exchange abstraction. On the basis of what we have already said, and summing up, it is clear that an act of exchange *must*, intrinsically, be described as *the abstract movement, in abstract space and time (i.e., homogeneous, continuous and empty), of abstract substances (materially real but devoid of any sensory qualities) which do not undergo any material change and which can only be differentiated in a quantitative and non-dimensional manner.* Now, on the one hand, the constancy of the exchange value confers continuity on the whole process of exchange; but on the other, it must be possible to *interrupt* the movement of the commodities at any place and time in order to verify the constancy of their value, and this cuts their movement up into a number of discrete packets. This contradictory nature, both continuous and discrete, stems from the social origin of their abstract nature.

## *The transcendental*

A final element in this list resides in the fact that, above and beyond the relatively fine and specific details of the homologies we have examined in a–f, there is an overriding, generic characteristic of the conceptual categories. Although philosophers are generally silent (not to say evasive) concerning the genetic origin of the Kantian categories, they all agree that these categories are both "given" *a priori* in a non-empirical fashion and, at the same time, absolutely compelling in their apodictic normativity. "Logic" in this sense has the property that it could not be other than what it is. This is the meaning of the philosophical term "transcendental." But where could this remarkable property come from? Once again, we find a corresponding characteristic on the side of the exchange abstraction. This abstraction is indeed founded not on empirical facts, but on *social postulates*; and there is a sense in which these postulates could not be other than they are, on pain of the entire edifice collapsing (in the framework of a market society, all activities of production and consumption would then cease, and the whole society would materially collapse). We can draw up an impressive list of these postulates, all of which share the feature of being *pure postulates* while at the same time being endowed with a sort of intrinsic necessity.
Thus: it is a *postulate* that the use of commodities should be suspended until the action of exchange is completed; that no modification should occur in the physical state of the commodities, and that this postulate must be maintained even if empirical facts would seem to run counter to it; that the commodities which are exchanged should count as equivalent in spite of all their manifest empirical differences; that the fact of acquiring and giving up commodities is bound to *a priori* conditions concerning their exchangeability; that commodities change owners by transiting from one place to another without being materially affected, and that this movement occurs in an "empty" space. None of these formal concepts invokes any sort of empirical, factual observation; they are all *norms* that the exchange of market commodities *must* satisfy in order to implement the social synthesis.

## *Conclusion*

Having examined in some detail several of the "homologies" identified by Sohn-Rethel, this may be the place to pause and to pose anew the question of the *status* of these homologies; in other words, the *nature* of the putative relation between forms of social life and forms of thought. Quite generally, if there is a correlation between two entities X and Y, and this correlation is not merely an illusion due to pure chance, there can be three reasons for it. It may be that variation in X is a cause of variation in Y; or that variation in Y is a cause of variation in X; or yet again that there is a common cause of variation in both X and Y (of course, these three possibilities are not necessarily mutually exclusive). In the present case, Durkheim's initial formulation of the question tended to suggest a causal relation running from social forms to conceptual forms. But we have already suggested, at the end of section 3, that the relation almost certainly functions in the other direction as well: the cognitive capacity to think in a certain way is a condition for the corresponding form of social life to arise. Finally, reflection on the nature of the "homologies" proposed by Sohn-Rethel (1978) raises a third possibility. We have emphasized, concerning the τὸ ἐόν of Parmenides, the apodictic nature of the concept; it seems to have a sort of inner necessity, such that it could not be other than it is. And concerning the category of the "transcendental," we have again noted this apodictic quality that has so impressed philosophers over the centuries; but we suggested there that this striking quality resonates with the fact that the various aspects of the exchange abstraction are also *pure postulates* with a sort of intrinsic necessity. In other words, the social forms and the conceptual forms we have been examining have a fundamental feature in common.
To sum up, it would seem that all three of the possible reasons for a correlation between social forms and conceptual forms make a significant contribution.

## **HISTORICAL EVOLUTION**

The "homologies" that we have examined in the previous section are based on the state of affairs at a particular point in time and place. This situation – that of Ancient Greece – is indeed a key moment; but it is nevertheless only one moment in a continuous and ongoing process. The relationship between forms of thought and forms of social life comes into fresh light if we look at their *co-evolution* in the course of human history.

If we take our pre-hominid ancestors as a starting point, the totemic systems of the Australian Aboriginal societies that we examined in section 2 already represent a first appearance of a form that is at once social and conceptual.

The second major step is one that we have also seen already: the identification of the major conceptual categories of Western thought by the Ancient Greeks. What is nevertheless worth remarking here is the striking historical coincidence of time and place: Athens, around 400 BC, saw both the work of Plato and Aristotle and the invention of coined money. An additional factor, of both social and cognitive importance, is that this was also the epoch of the invention of alphabetic writing (see Goody, 1977 for a fuller exposition of the significance of this momentous event, which marked the entry into history in the modern sense of the term).

The third step is that of the European Renaissance, in the 16th and 17th centuries. As its name implies, this was a period of return to the high intellectual ideals of Ancient Greece, after the decline of the Roman Empire and the interlude of the "Dark Ages." It was more than just a return, however, since this was also the period of the birth of "Modern Science." What is to be noted here is that the Greeks invented almost all the concepts under the sun, but they did not really put them to use to discover new, fundamental knowledge. In a sense, the Greek concepts were strangely static; it is almost as though their canonical nature, their apodictic character, which the Greeks themselves thematized explicitly, left them bereft of the possibility of development. The spirit of modern science contrasts with this: scientific concepts are eminently put to use to create unprecedented knowledge; and because they are used in this way, they themselves evolve.

Is there a corresponding difference in the character of the exchange abstraction? If we look for it, the answer is yes. For the Greeks, money was used essentially as a means of external commercial exchange (Scheidel et al., 2007). Domestic production was not involved; to put it bluntly, this is because it was performed by slaves who were not paid money for their work. This is precisely what changed at the Renaissance, which was also the period of the invention of Capitalism. The key point is that money was now invested in the production process itself, with the invention of salaried labor. And need it be pointed out that Capitalism is intrinsically *dynamic*: the capital invested is returned with a profit, which can then be reinvested, and so on, leading to potentially exponential growth. This fits remarkably with the fact that at the heart of modern science, most notably with Newton, there is the concept of a *dynamic system*; or more precisely, a State-Determined Dynamic System (SDDS; Aubin and Dalmedico, 2002). There is probably no better way of illustrating the fecundity of our hypothesis of a profound link between conceptual forms and social forms than to use it in a back-and-forth fashion to sharpen our identification of both sides of the relation. What, then, can the concept of an SDDS point to in the functioning of Capitalism?

An important feature is that an SDDS is perfectly "autonomous": once it is set up, and the dynamic law governing the temporal evolution of its "state" is specified, everything thenceforth occurs without the least "external" intervention. Laplace, who emphasized the radical determinism of such a system, is reputed to have replied to Napoleon when the latter questioned him about the place of God in his system: "Sire, I have no need of that hypothesis" (Rouse Ball, 1908). This can point us to the fact that in a truly capitalist system, the process of production is *theoretically* automatic. It is true that it is common to speak of a capitalist of this sort as a "manufacturer" – as though Mr Ford, for example, had really made thousands and thousands of cars with his own hands; but this is misleading. How does the capitalist fulfill his role as a "producer"? Not by his own work: he achieves it neither with his hands, nor with tools and machines that he would operate himself. He achieves it by means of the money he has invested as capital, *and with nothing else*. "The process of work is a process between entities that the capitalist has bought," says Marx (1867), "entities that belong to him." In fact, if a capitalist ever did come to lend a hand himself, that would only show that he had partially failed in his role as a capitalist entrepreneur, and strictly speaking he should pay himself a salary for that manual work. In other words, the role of "producer" falls on an entity which does not perform a single productive function in the work-process. To sum up the essential point: the key characteristic of the production process, from the point of view of the capitalist entrepreneur who invests in it, is that this process should function *all by itself*. The power of the capitalist system resides in this *postulate* of the self-acting or "automatic" nature of the production process.
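For readers less familiar with the notion, the "autonomy" of an SDDS and the reinvestment cycle described above can be given a minimal formal sketch. It assumes, purely for illustration, discrete time steps, a fixed law $F$ mapping each state to the next, a constant rate of profit $r$, and full reinvestment of profits; none of these simplifications is part of the argument above.

$$
x_{t+1} = F(x_t) \quad\Longrightarrow\quad x_t = F^{t}(x_0)
$$

$$
K_{t+1} = (1 + r)\,K_t \quad\Longrightarrow\quad K_t = (1 + r)^{t}\,K_0
$$

Here $F^{t}$ denotes $F$ applied $t$ times. Once the initial state $x_0$ (or initial capital $K_0$) and the law are fixed, the whole trajectory follows with no term for outside intervention, and for any $r > 0$ the capital trajectory grows exponentially.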

It is important to note that a postulate of this sort does not necessarily correspond to a historical reality; in fact, as we shall see below, it took centuries before the social reality of production relations began, very gradually, to assume the ideal form that we have just described; and even then they did so imperfectly. This only makes it clearer than ever that the postulate of the automaticity of production processes does not come from any empirical source in the actual technology of production; it is rather the other way round: the fact that, in the course of its historical evolution, technology progressively comes to conform to the "ideal" in question is a *consequence* rather than a cause of this postulate. The postulate itself is in no way empirical; it is clearly in the realm of the non-empirical *a priori*; and what we have seen is that it is *formally intrinsic* to the social relations of production in a capitalist society. The formal homology between this *postulate*, which is social through and through, and the Newtonian concept of an SDDS, which is also based on a *postulate*, is quite impressive.

Finally, we come to the contemporary period. The lead here is given by the idea that we have just expressed: theoretically, from the point of view of a capitalist, profits should ensue *automatically*. Now it may be thought that this idea is a little far-fetched; in a small start-up enterprise, the budding capitalist is likely to do quite a bit of the work himself; and even later, when the enterprise has grown, he still has to do a lot of real work – buying the raw materials, setting up the factory, and equipping it with all the necessary tools, hiring the salaried workers and putting them to work, ensuring that their salary demands will not become excessive – and even then he has not finished, because he must take care of marketing the products once they have been made. But the fact is that history has taken care of bringing to light the kernel of truth in this theoretical idea: over the last century, the core of capitalism has shifted away from entrepreneurial capitalism to the financial sector. Today shareholders, and at a larger scale the great financial corporations, indeed do little else than accumulate the profits and reinvest them on the financial markets; thus coming remarkably close to the theoretical ideal.

So much for the "social relations" side of the picture. What about the "cognitive" side? Here again, it is the theme of "automaticity" that provides the insight. The hallmark of the contemporary scene is the digital computer, which is playing an ever-increasing role. It may not be necessary to labor the point: the essential feature of a computer is that operations on formal symbols are carried out *automatically* and, thanks to electronic technology, with ever greater speed and capacity. The full significance of this comes from the fact that this automaticity is not restricted to merely abstract operations, but is being linked to real material production. A clear indication of this is the ever-increasing importance of robots on the production line; at a deeper level, it is important and fascinating to realize that what has made this link-up possible and effective is the wealth of *scientific knowledge* accumulated since the Renaissance. It is indeed modern science that provides the link between conceptual forms on the one hand, and the possibility of effective material action on the other.

An important consequence of this automation of the production process is that the social institution of salaried employment is coming under pressure. Mass unemployment is of course a social scourge, and is morally unacceptable. However, it is equally clear that as production processes tend towards complete automation, there will be less and less materially productive work to go round. The question is whether it is appropriate to resist this trend, or whether any such attempt is akin to King Canute ordering the tide not to advance, and thus doomed from the outset. This is a political question, to which we will return in conclusion.

## **GENERAL CONCLUSION**

A century after Durkheim made the audacious suggestion that the fundamental conceptual categories may have an origin in the forms of social life, this hypothesis has still not received serious attention from the academic community. Concomitantly, and in all probability not by accident, the question of the origin of the conceptual categories (when it is not simply evaded) still wavers between the twin alternatives of apriorism and empiricism, which remain equally unsatisfactory for the very reasons so clearly exposed by Durkheim. It is difficult to avoid the haunting impression that all is not well in the house of Reason.

To sum up the arguments presented in this paper, our final conclusion is that there is indeed a strong relationship between social forms and conceptual forms, including in societies governed by a market economy. Indeed, we may go so far as to suggest that social forms and conceptual forms may actually be inseparable, for three convergent reasons. Firstly, specific cognitive forms are necessary for the social form to function (more specifically, only humans are able to understand what "money" is; cf. the anecdote of the dog at the butcher). Secondly, the evolution of social forms (more specifically, the successive forms of capitalism) *drives* a corresponding evolution in mental forms. Finally, the cognitive forms and social forms in question share a fundamental, constitutive characteristic, that of *abstraction*.

Now if there really is such a strong relationship between the forms of social life and the prevalent forms of thought, a question may well be asked: why is this relationship not more immediately apparent, both to analysts and to members of the societies in question? An attempt to answer this searching question brings us to the domain of ideology, the particular hallmark of successful ideology being that it does not appear as such. Two illustrations may help make this point. Contemporary capitalism has reached a near-perfect stage<sup>8</sup>, where immense profits accrue to the financial sector for doing practically nothing of any social utility; yet these profits seem virtually invisible, both to the tax-collector (the rate of taxation on capital gains is lower than that on the salaried earnings of the common worker) and to public consciousness, which seems to find this situation quite normal; all this at a time when much is made of the "economic crisis," and national governments are heavily in debt and held to severe budgetary restrictions. Another illustration of this striking blindness concerns the situation we have already described, created by the fact that the processes of production are in large part automated. One might have thought that this would open up near-utopian perspectives: if these gains in productivity could be shared in a socially equitable fashion, all members of society would be able to devote the main part of their waking hours to activities that they considered intrinsically rewarding. But instead, the very same situation is widely interpreted as a demoralizing threat to salaried employment.

To link these considerations with our previous discussion, we may recall the remark of Marx when he says that the exchange abstraction never receives a mental representation as such, since its sole expression resides in the *act* of considering that the value of one commodity is equal to the value of another. Putting this together with the illustrations of our social ignorance, there need be little surprise that we are largely unaware of the implications of our social situation. The fact that we have no clear consciousness of the effects of our social life on the way we function cognitively cannot be taken as evidence against the hypothesis put forward in this paper. The illustrations also show, however, that this social ignorance has deleterious consequences. Consequently, if the arguments presented here can contribute to even a modest increase in our social awareness, this paper will have been well worthwhile.

Finally, I return to the more general issue addressed in the introduction to this paper: the importance accorded to the social dimension by Cognitive Science. In the space of a single article, it has obviously been out of the question to treat this issue exhaustively. The methodological choice has been made to proceed by metonymy, by concentrating on a case study. The specific domain that has been selected, the origin of conceptual categories, reveals that the social dimension has indeed been systematically ignored. Two major contributions in this area, those of Durkheim and Sohn-Rethel, have received virtually no serious attention from the academic community. Our re-examination of this work has shown that these studies are certainly incomplete, but basically sound. The conclusion is that this question calls for in-depth follow-up. And, enlarging beyond this metonymical example, the social dimension of cognition deserves substantial development.

<sup>8</sup>"Near-perfect" in its own twisted terms – of course this is anything but "perfect" from a humanistic point of view.

## **REFERENCES**

Marx, K. (1867). *Capital.* Harmondsworth: Penguin Books, 1976.

Merleau-Ponty, M. (1945). *Phénoménologie de la perception*. Paris: Gallimard.


Steiner, P. (2010). Philosophie, technique et cognition. *Intellectica* 53–54, 7–40.

Stewart, J., Gapenne, O., and Di Paolo, E. (eds). (2010). *Enaction: Toward a New Paradigm for Cognitive Science.* Cambridge, MA: MIT Press.

Stiegler, B. (1998). *Technics and Time, 1: The Fault of Epimetheus*. Stanford: Stanford University Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 March 2014; accepted: 07 June 2014; published online: 25 June 2014. Citation: Stewart J (2014) An enquiry concerning the nature of conceptual categories: a case-study on the social dimension of human cognition. Front. Psychol. 5:654. doi: 10.3389/fpsyg.2014.00654*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Stewart. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Commodities and cognition

# *Paul Loader\**

*Informatics, University of Sussex, Brighton, UK \*Correspondence: P.Loader@sussex.ac.uk*

## *Edited by:*

*Hanne De Jaegher, University of the Basque Country, Spain*

*Reviewed by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque - Basque Foundation for Science, Spain*

**Keywords: Sohn-Rethel, cognition, commodities, apriorism, empiricism**

## **A commentary on**

**An enquiry concerning the nature of conceptual categories: a case-study on the social dimension of human cognition** *by Stewart, J. (2014). Front. Psychol. 5:654. doi: 10.3389/fpsyg.2014.00654*

John Stewart's paper examines arguments for a social explanation of the conceptual categories which Kant and others have posited as the pre-requisite for abstract cognition. Such an explanation would have the advantage of providing a satisfactory alternative to both empiricism, which cannot tell us how we get from the particular to the universal, and Kantian apriorism which does not really offer an explanation (p. 2).

Stewart looks first at Durkheim's (1915) account which, focusing on ethnographic studies of Australian aborigines, argues that abstract concepts used in these societies had their origins in (essentially social) religious categories. Stewart spends the bulk of the paper, however, examining Sohn-Rethel's (1978) account. Combining Marx's twin insights that ideas are "sublimates" of the "material life process" (Marx, 1844, p. 104) and that the "mystical character" of the commodity is such that "not an atom of matter" (Marx, 1867, p. 138) enters into its existence as an exchange value, Sohn-Rethel argues that the "enigmatic cognitive faculties of civilized man" (1978, p. 34) have their roots in commodity exchange. In particular, abstract non-empirical concepts are derived ultimately from the "real abstraction" of commodity exchange. Stewart selects some of Sohn-Rethel's own "homologies" to demonstrate this relation: solipsism corresponds to the situation of the agent in exchange wherein "the action is social, the mind is private" (p. 43); Parmenidean "oneness" derives from the universal equivalence between commodities expressed through exchange value; "abstract quantity" from the "non-dimensional quantity" which the act of exchange attributes to objects as commodities; "abstract time" from the immutability of the object considered as commodity, and so on.

Stewart makes a good case for Sohn-Rethel as presenting an account of cognition which is "social" in a more thoroughgoing way than is typically countenanced by embodied, extended, and distributed approaches. Moreover, in highlighting Sohn-Rethel's work within this context, Stewart's paper may serve to generate discussion in other, equally illuminating, directions. In particular we might note Sohn-Rethel's rejection of the traditional view that "abstraction is the inherent activity and the exclusive privilege of thought" (p. 19), in favor of a conception whereby abstraction initially manifests itself through action. With "real abstraction," says Sohn-Rethel, "only the action is abstract, the consciousness of the actors is not" (p. 30). Thus, we would seem to have an instance of a higher form of cognition constituting itself enactively (and collectively). This perhaps provides the germs of an interesting challenge to accounts which, whilst embracing various elements of an embodied, embedded outlook, still reserve a non-enactive space for "offline cognition," with the latter characterized solely in terms of the inner representational states of individual agents, decoupled from real-time interaction with the world (e.g., Clark and Grush, 1999; Wheeler, 2005).

We might add that Stewart is to be commended for introducing an unambiguously Marxian theorist into the arena of cognitive science. Marx has had some significant influence on recent currents in cognitive science, via intermediaries such as Vygotsky, Merleau-Ponty, and Levins and Lewontin, but the Marxian character of this influence is rarely acknowledged. Here the relevance of at least one strand of Marxian theory is made explicit.

Insofar as there may be difficulties with aspects of Stewart's account, these perhaps have less to do with Stewart's own admirably lucid summary and analysis of Sohn-Rethel's book, and more to do with details of the latter's own argument. It might be suggested, for example, that "solipsism" is not really a conceptual category on a par with "time," "space," "oneness," etc., but is rather a kind of philosophical aberration (albeit one that may well have its roots in the alienation of commodity exchange). Connectedly, some of Sohn-Rethel's arguments for a connection between particular conceptual categories and the exchange nexus seem more convincing than others, a fact which perhaps leaves the reader wishing for an independent criterion by means of which the correctness or otherwise of these correlations can be assessed. There is also, for this reader at least, an apparent equivocation (in Sohn-Rethel's account) between the idea that commodity exchange is the source of conceptual abstraction *per se* and the idea that commodity exchange is the source of particular abstract concepts. None of these points, however, should be seen as detracting from the originality and overall plausibility of Sohn-Rethel's position.

If there is one area of tension within Stewart's own analysis, it is perhaps to be found in his concern with the precise causal relation between social forms and conceptual forms. He argues that "the social forms and the thought forms come about together; while there is a sense in which it is the social forms which provide the ground for the conceptual forms... it is surely at least as much the other way round: the cognitive capacity to think in a certain way is a condition for the corresponding form of social life to arise." (p. 6) Stewart's predicament here is, in part, a familiar one, recognizable both to dynamically oriented cognitive scientists and to dialectically oriented Marxists alike. It is the problem of how to elucidate reciprocal causal relations in any particular instance without appearing to give ground to either side of the causal equation in isolation. In this case, conceding that "a cognitive capacity to think in a certain way" is a prerequisite for the forms of social life under discussion might be giving adherents of apriorism too much to play with.

Sidestepping the causal logistical aspects of this quandary, the problem might be at least partly ameliorated through recognition that the "cognitive capacity" in question could be an embodied one, and so not "a priori" in any traditionally cognitivist sense. Here it is worth remembering that, like Sohn-Rethel's "real abstraction," Lakoff and Johnson's "embodied concepts" were likewise an attempt to surmount the apriorism/empiricism divide by grounding elements of conceptual thought in material being:

Reason is not, in any way, a transcendent feature of the universe or of disembodied mind. Instead it is shaped crucially by the peculiarities of our human bodies.

(Lakoff and Johnson, 1999, p. 4)

Perhaps a synthesis of Sohn-Rethel's and Lakoff and Johnson's insights might provide fruitful terrain for future research.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 September 2014; accepted: 29 September 2014; published online: 04 November 2014. Citation: Loader P (2014) Commodities and cognition. Front. Psychol. 5:1181. doi: 10.3389/fpsyg.2014.01181 This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Loader. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Extending the DSC paradigm: some areas for future research

## *John L. Protevi\**

*French Studies, Louisiana State University, Baton Rouge, LA, USA \*Correspondence: johnprotevi@gmail.com*

#### *Edited and reviewed by:*

*Ezequiel Alejandro Di Paolo, Ikerbasque–Basque Foundation for Science, Spain*

**Keywords: Durkheim, Tarde, deep social cognition, quorum sensing, enaction, mind-in-life continuity**

## **A commentary on**

# **An enquiry concerning the nature of conceptual categories: a case-study on the social dimension of human cognition** *by Stewart, J. (2014). Front. Psychol. 5:654.*

*doi: 10.3389/fpsyg.2014.00654*

John Stewart proposes that we study what we can call "deep social cognition" (DSC), as opposed to what Stewart characterizes as the tradition of "social cognition" studies to date: the mere embedding, extending, or modifying of cognition by social factors.

DSC claims that, for humans, our basic or non-empirical categories (space, time, identity, equality, and so on) are relative to social practices. One could say that DSC takes up the mind-in-life continuity thesis (Thompson, 2007) and explores it relative to human cognition. To fight the representationalists, early enactivists insisted that whatever the content of the cognitive processes enacted in the co-constitution of organismic value and environmental affordance, those contents were in fact enacted, and were neither objective reflections (realism) nor subjective creations (idealism).

A common enactivist strategy here was to study single-celled organisms (e.g., *E. coli*). If they displayed cognition qua sense-making, then the ground floor of the mind-in-life continuity thesis would be established, and it would then be a matter of studying qualitative shifts in the continuum of organismically rooted cognition: consciousness vs. sentience, self-consciousness vs. "mere" consciousness, etc. Once this baseline is established, however, Stewart implies that there must be a follow-up investigation of the correlation, in human beings, between historical/social forms of life and basic categories.

After this mise-en-scène, in the remainder of this commentary I will raise some points, not so much in criticism as in the hope of suggesting further avenues of research.


a recent enactivist piece thematizing top-down/bottom-up complementarity in social life. Tarde criticizes Durkheim for taking his "social facts" as already established: in this case, the categories of time, space, subject, object, etc., as reflecting social forms. Tarde insists instead on an account of the genesis of such categories from a molecular field of differences. Tarde is not really an individualist, however, as the basic social units are not really units at all, but "monads" in a constant state of variation and imitation of others. For Tarde, then, the big universals (social forms, basic categories) are formed and held together by minute "repetitions with a difference" (to adopt the terms of Gilles Deleuze). So Tarde insists that students of society need a bottom-up methodology, though of course once the categories are in place they guide the socialization of thought in succeeding generations, so there is room for top-down effects as well.

Tarde insists, however, that the social facts are fragile and in need of constant reinforcement. Just how much innovation is allowed before top-down enforcement squelches it, or indeed before it takes hold and changes the top-level structures? Adding a bottom-up Tardean perspective thus allows us to account for different rhythms of change in categories in a way that Durkheim's progressive model does not (as I understand it, Durkheim has an account of modernity as increasing specialization in the division of labor). Hence Tarde's critique of Durkheim:

Mr. Durkheim spares us such terrible tableaux. With him, no wars, no massacres, no brutal invasions. Reading him, it seems that the river of progress has flowed smoothly over a mossy bed undisturbed by froth or somersaults. [*...*] Evidently, he inclines towards a Neptunian, rather than a Vulcanian, view of history: everywhere he sees sedimentary formations, nowhere igneous upheavals. He leaves no place for the accidental, the irrational, this grimacing face at the heart of things, not even for the accident of genius. (Latour et al., 2008)


that Read also looks to the Italian Autonomia thinkers, in particular Virno's reading of Marx on the "general intellect" as it relates to the post-industrial economy. This field of thought bears on Stewart's discussion of financial capitalism, whose importance as a philosophical/political topic can hardly be overstated; it belongs at the forefront of thought today, just as nuclear power and global climate change rose to the forefront in the eras in which they assumed dangerous potentials.

## **ACKNOWLEDGMENTS**

My thanks to Ezequiel Di Paolo, John Stewart, and the anonymous reviewer for Frontiers.

## **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 July 2014; paper pending published: 26 July 2014; accepted: 26 July 2014; published online: 26 August 2014.*

*Citation: Protevi JL (2014) Extending the DSC paradigm: some areas for future research. Front. Psychol. 5:889. doi: 10.3389/fpsyg.2014.00889*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Protevi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The cruel and unusual phenomenology of solitary confinement

## *Shaun Gallagher<sup>1,2,3</sup>\**

<sup>1</sup> Department of Philosophy, University of Memphis, Memphis, TN, USA

<sup>2</sup> School of Humanities, University of Hertfordshire, Hatfield, Hertfordshire, UK

<sup>3</sup> Faculty of Law, Humanities and the Arts, University of Wollongong, Wollongong, NSW, Australia

#### *Edited by:*

Hanne De Jaegher, University of the Basque Country, Spain

#### *Reviewed by:*

Matthew James Ratcliffe, Durham University, UK

Lisa Guenther, Vanderbilt University, USA

#### *\*Correspondence:*

Shaun Gallagher, Department of Philosophy, University of Memphis, 331 Clement Hall, Memphis, TN 38152, USA e-mail: s.gallagher@memphis.edu

What happens when subjects are deprived of intersubjective contact? This paper looks closely at the phenomenology and psychology of one example of that deprivation: solitary confinement. It also puts the phenomenology and psychology of solitary confinement to use in the legal context. Not only is there no consensus on whether solitary confinement is a "cruel and unusual punishment," there is no consensus on the definition of the term "cruel" in the use of that legal phrase. I argue that we can find a moral consensus on the meaning of "cruelty" by looking specifically at the phenomenology and psychology of solitary confinement.

**Keywords: solitary confinement, cruelty, intersubjectivity, induced autism, self**

A number of legal declarations prohibit "cruel" punishments. The Eighth Amendment to the United States Constitution (1791), for example, declares: "cruel and unusual punishments [shall not be] inflicted." From the beginning, however, the wording was thought "too indefinite," or "to have no meaning in it"<sup>1</sup>. It is still difficult to find a clear definition of "cruel" in the legal domain. The intent of the present paper is in line with a recommendation made by Radin (1978, p. 992), that the courts "must search for a deeper moral consensus on the meaning of cruelty in order to determine whether a specific punishment comports with current standards of decency." Rather than looking to legal history, "legislative enactments, referenda or opinion polls" (ibid.), however, I propose that we look to a combination of philosophical and scientific methods, including phenomenology, psychology, psychiatry, developmental psychology, and neuroscience, to explicate the specific experiences of those who undergo punishments, with a view to formulating a deeper moral consensus<sup>2</sup>.

My focus in this paper is limited to the practice of solitary confinement. Solitary confinement may differ from one prison to the next in the precise details of how it is carried out. I assume, however, that the common element is some high degree of isolation: the reduction or complete elimination of intersubjective contact between the prisoner and others for a significant amount of time. Accordingly, I'll begin with an outline of some classic phenomenological concepts related directly to the notion of intersubjectivity. I'll then show how these concepts are reinforced by developmental studies. The question then becomes: what happens when subjects are deprived of intersubjective contact? Here I'll appeal to the notion of induced autism, and then look closely at the effects of solitary confinement. In the final section I return to the question of what constitutes cruel punishment.

# **BASIC CONCEPTS IN THE PHENOMENOLOGY OF INTERSUBJECTIVITY**

Phenomenological philosophy, which can be traced to the work of Edmund Husserl at the beginning of the twentieth century, has recently been incorporated into scientific studies of cognition, including embodied and enactive approaches to social cognition (e.g., Gallagher, 2001, 2005, 2012; Ratcliffe, 2006; De Jaegher et al., 2010). Phenomenology, even in its classical form, emphasizes the constitutive nature of intersubjectivity. I'll briefly discuss three concepts from classical phenomenology directly relevant to this idea: being-with, transcendental intersubjectivity, and intercorporeity.

Heidegger (1962) provides an analysis of human existence in which being-with (*Mitsein*), or being-with-others, is part of the very structure of human existence, shaping the way that we are in the world. According to this notion, the social dimension is not an external add-on or supplement to our existence. Being-with does not signify that we are in-the-world first, and then because of that come to be with others. In other words, our social nature does not depend on empirically encountering others; it is rather an *a priori* structure: the fact that others are in the world only has significance because our existence is structured as being-with. If one happens to be alone, one still has the structure of being-with, and "only as being-with can [one] be alone" (1985, p. 238). Heidegger goes on to emphasize that this particular way of being-with co-determines other aspects of our existence, including our relations with the world around us: "By reason of this *with-like* [*mithaften*] being-in-the-world, the world is always the one that I share with Others. The world of [human existence] is a *with-world* [*Mitwelt*]" (1962, p. 155/118). One encounters others primarily through one's various projects, and even in terms of what one perceives. One's projects equally implicate other people, as co-workers, intended recipients, and so on. So the world in which we find ourselves cannot be extricated from our relations with other people; it is permeated with social relations. Heidegger also makes the point that being-with shapes our own self-experience. One's *own* existence is something that one experiences in the kinds of pragmatic projects that one shares with others.

<sup>1</sup>Granucci (1969, p. 842), citing representatives to the First Congress, Smith and Livermore, respectively. Granucci provides a fine-grained history of the phrase.

<sup>2</sup>This is clearly a different hermeneutical procedure from that found in most legal considerations, where appeal is often made to historical meaning or to evolving moral standards on such questions.

In effect, one doesn't come to have a social constitution by way of interacting with others; one is "hard-wired" to be other-oriented, and this is an existential characteristic that makes human existence what it is. "Hard-wired" is not a term that Heidegger would have used. I introduce it, however, to indicate the possibility of something going wrong in regard to being-with (see below). To the extent that the *Mitsein* structure is damaged, the very core of the individual's human existence is damaged.

The notion of an existential sociality is not only relevant to questions about social cognition and our relations to others. According to Husserl, the very *objectivity* of the world *as experienced* depends on others. This is what he refers to as transcendental intersubjectivity (e.g., Husserl, 1959, p. 449; Husserl, 1968, p. 295). Intersubjectivity is transcendental, in the sense that it is a condition of possibility for us to experience anything like a coherent and meaningful world, and specifically to experience it as real and objective. This latter point is what Husserl's concept of transcendental intersubjectivity adds to Heidegger's notion of *Mitsein*. The analysis Husserl gives is based on the perception of our immediate environment. We see things, not as mere surfaces, but as multi-sided objects based on an implicit reference to the (real or potential) perceptual perspectives that others can take on the same objects. Our basic experience of the world as having reality or objectivity depends on a kind of tacit confirmation by others.

Thus everything objective that stands before me in experience and primarily in perception has an apperceptive horizon of possible experience, including my own and that of others. Ontologically speaking, my perception of the world is, from the very beginning, part of an open but not explicit totality of possible perceptions [that others may also have]. The subjectivity belonging to this experience of the world is open intersubjectivity. (Husserl, 1973, p. 289; translated in Gallagher and Zahavi, 2012)

The idea that something may go wrong with the basic structures of being-with or transcendental intersubjectivity was followed up in the tradition of phenomenological psychiatry, as found in the classic works of Jaspers (1997), Minkowski (1970), Blankenburg (1971), and others. A certain form of derealization, for example, can be analyzed as a disruption of transcendental intersubjectivity, to the point that real things may no longer feel real or familiar, or as fully objective as they should. A significant privation of intersubjectivity, accordingly, may lead to an erosion of the sense of reality (although, to be sure, not all forms of derealization are due to such privation). Such experiences, found in instances of schizophrenia, may also be closely tied to the phenomenon of depersonalization. Phenomenologists have analyzed some of the symptoms of schizophrenia (including autistic aspects of schizophrenia) as involving very basic disruptions in self-experience or ipseity (e.g., Sass and Parnas, 2003). "Such experiences may also involve dissociative features, in which one experiences a pathological, subjective detachment from the external world, an estrangement from one's body and even from mental processes" (Varga, 2012, p. 103). On this view, the loss of a basic intersubjective dimension of existence can lead to the loss of the sense of realness, as well as disturbances in what some have called the minimal self (Gallagher, 2000; Zahavi, 2007).

In terms of our actual engagement with others, being-with and transcendental intersubjectivity are cashed out in very basic sensory-motor processes involved in our bodily interactions with others. Merleau-Ponty calls this "intercorporeity": "between this phenomenal body of mine and that of another ... there exists an internal relation which causes the other to appear as the completion of the system" (Merleau-Ponty, 1962, p. 352). For Merleau-Ponty, our perception of others is interactional rather than observational, and the actions of others elicit the activation of our own motor systems. These processes involve the kind of motor resonance often described in the mirror neuron literature; but Merleau-Ponty emphasizes the dynamic interchange that one finds in the affective attunement that occurs between interacting agents. Merleau-Ponty's notion of intercorporeity has been a special motivation for the more recently developed embodied and enactive approaches to perception and intersubjectivity found in the cognitive sciences (Varela et al., 1991; Noë, 2004). In this regard, Merleau-Ponty suggests that the borders of the transcendental and the empirical become indistinct: we should not think of the facticity of embodiment as external to subjective experience or cognition, but as the place where the mind happens, or, as he dramatically puts it, where "the transcendental descends into history" (Merleau-Ponty, 1967, p. 107). The best way to see the details of this kind of embodied intersubjectivity is by looking at developmental studies.

# **DEVELOPMENTAL STUDIES**

The *interaction theory* of social cognition draws on the work of the phenomenologists, but also the developmental studies of primary and secondary intersubjectivity (Trevarthen, 1979; also see Rochat, 2001; Hobson, 2004; Reddy, 2008). Primary intersubjectivity involves the sensory–motor capacities that shape our interactions with others from the very beginning. Just after birth, for example, infants are capable of interacting with others, as evidenced in experiments on neonatal imitation (Meltzoff and Moore, 1977).

Throughout the first year of life, infants develop an enactive perceptual access to the emotional and intentional states of others. At 2 months, for example, infants' second-person *interaction* with others is evidenced by the timing of their movements and emotional responses. Infants "vocalize and gesture in a way that seems [affectively and temporally] 'tuned' to the vocalizations and gestures of the other person" (Gopnik and Meltzoff, 1997, p. 131). Further evidence for this is provided by still face experiments (Tronick et al., 1978) and contingency studies (Murray and Trevarthen, 1985), where infants become significantly upset when faced with unresponsive behavior or mis-timed responses from the mother.

The concept of secondary intersubjectivity (Trevarthen and Hubley, 1978) is associated with the advent of joint attention during the first year. Infants start to notice how others pragmatically engage with the world and they begin to co-constitute the meaning of the world through interactions with others in joint actions. Pragmatic and social contexts start to matter and they enter into situations of participatory sense-making (De Jaegher and Di Paolo, 2007).

The important point made by interaction theory is that primary and secondary intersubjectivity are not only early developing, characterizing our existence from infancy, but also remain essential aspects of our continued adult existence with others. Moreover, in processes of primary intersubjectivity we develop and continue to sustain a relational sense of self, that is, a sense of self that is intricately coupled to others. Neisser (1988) called this the interpersonal aspect of the self. If one's primary, most basic, minimal sense of self is tied to one's embodied, sensory-motor, proprioceptive processes, these processes are fully involved in intersubjective interactions from the start. We are, as Guenther (2013) puts it, *relationally constituted*.

All of these intersubjective dimensions are reflected later in the way we start to form our self-narratives, and our own narrative self. In contexts where infants are already interacting with their caregivers in personal and pragmatic relations, beginning narratives are elicited from 2-year-olds by questions and prompts (Howe, 2000), and "the child's own experience ... is forecast and rehearsed with him or her by parents .... [C]hildren of 2–4 years often "appropriate" someone else's story as their own" (Nelson, 2003, p. 31). These developmental facts suggest the importance of the role played by narratives in our understanding of self and others – and they continue to be important throughout our adult life.

Again, it's important to note that the capacities of primary and secondary intersubjectivity are not precursors; they are not left behind, but continue to characterize our mature adult behavior – supplemented and transformed via communicative and narrative practices. Behavioral analyses of social interactions in joint actions and shared activities, in working together, in communicative practices, and so on, show that adult agents unconsciously coordinate their movements, gestures, and speech acts (Kendon, 1990; Issartel et al., 2007; Lindblom and Ziemke, 2008). In communicative practices we coordinate our perception–action sequences; our movements and gestures are coupled with changes in velocity, direction and intonation in the movements, gestures and utterances of the other speaker.

Furthermore, the social interaction which characterizes primary and secondary intersubjectivity goes beyond each participant; it results in something (the creation of meaning) that goes beyond what each individual qua individual can bring to the process (De Jaegher et al., 2010). One can think of dance, or the tango, as a metaphor for the kind of dynamic production of meaning involved in interaction. In the tango something dynamic is created that neither individual could create alone. These interactive practices shape who we are: our identities, our meaningful experiences of the world, and what we take to be valuable or not so valuable.

# **WHAT HAPPENS WHEN SUBJECTS ARE DEPRIVED OF INTERSUBJECTIVE INTERACTIONS? INDUCED AUTISM**

During the Ceaușescu regime in Romania, young children were left in orphanages, often because of the extreme poverty of their parents. Hobson (2004) summarizes the conditions in these orphanages:


Children from these orphanages tended to lack the reciprocal to and fro of social exchange, they showed limited social awareness and empathy, they found it difficult to maintain social interaction, and they would rarely turn to their adoptive parents for security and comfort (Hobson, 2004).

Studies by Rutter et al. (1999) showed that a small but much higher than expected proportion of these children developed an atypical (or quasi) form of early childhood autism. A variety of studies found severe problems with social relationships and communication involving


Additional studies show


Rutter et al. (1999) drew the tentative conclusion that prolonged experience of such terrible social and non-social privation was responsible for these quasi-autistic symptoms. Hobson is less tentative: the circumstances of these institutions led to a form of induced autism. Autism (naturally occurring or induced) involves "a disorder of the system of child-in-relation-to other" (Hobson, 2004, p. 203).

By looking at studies of naturally occurring autism we can be more specific about the embodied aspects involved in generating social deficits. There is extensive evidence to suggest that autism involves problems with basic sensory–motor processes that support primary intersubjectivity. Long-standing research based on the analysis of videos of infants younger than 1 year and later diagnosed with autism shows asymmetries or unusual sequencing in crawling and walking, as well as problems with, and delayed development in, lying, righting, and sitting (Teitelbaum et al., 1998). Recent studies by Elizabeth Torres et al. (2013) show in great detail disrupted patterns in re-entrant (afferent, proprioceptive) sensory feedback that usually contributes to the autonomous regulation and coordination of motor output. From an early age, this feedback supports volitional control and fluid, flexible transitions between intentional and spontaneous behaviors.

Torres shows that across the entire autistic spectrum there is a disruption in the maturation of this form of proprioception, accompanied by behavioral variability in motor control. In clear contrast to typically developing individuals, the normalized peak (micro-movement) velocity and noise-to-signal ratios of all participants with ASD, including adolescents (14–16 years old) and young adults (18–25 years old), across different ages and across verbal or non-verbal status, remained in the region corresponding to younger typically developing children. In the motor system, noise overpowers signal in ASD. Proprioceptive input was random (unpredictable), noisy (unreliable), and non-diversified, and autistic subjects had difficulty distinguishing goal-directed from goal-less motions in most tasks (Torres et al., 2013, p. 16).

Accordingly, because proprioception is random, noisy, and restricted, it's unlikely that individuals with ASD can anticipate the consequences of their own impending movements in a timely fashion. It's also unlikely they could apply fine-tuned discriminations to the actions and emotional facial micro-expressions of others during real time social interactions – entailing a disruption of intercorporeity.

To be clear, my appeal to the data on motor problems in ASD is not meant to suggest an equivalency between individuals with ASD and those who have a form of induced or quasi-autism. Rather, the point is simply that some of the same motor difficulties that correlate with problems in social or intersubjective experience can be found in both groups. Furthermore, both of these groups are, or come to be, embedded in socially rich environments, and this clearly differentiates them from prisoners in solitary confinement who may develop similar motor problems (see below). Indeed, we'll see that prisoners in solitary confinement are moving on the opposite trajectory: deprived of intersubjective contact, they sometimes develop very basic motor problems. In contrast, children who show signs of quasi-autism often improve once they are introduced into social and caring environments; likewise, some individuals with ASD who engage in social interactions improve their social performance and achieve a high level of intersubjective activity. The important point, in the context of this paper, is that motor problems that can undermine social interaction can be induced by social and physical privation.

## **SOLITARY CONFINEMENT**

Prisoners who are subjected to solitary confinement show symptoms and describe a phenomenology that is not equivalent to either autism or induced autism, but that reflects similar motor problems and, oftentimes, more extensive and serious disruptions of experience.

Guenther (2013), looking at the phenomenology associated with solitary confinement, describes it as becoming "unhinged": "[Prisoners subjected to solitary confinement] see things that do not exist, and they fail to see things that do. Their sense of their own bodies – even the fundamental capacity to feel pain and to distinguish their own pain from that of others – erodes to the point where they are no longer sure if they are being harmed or are harming themselves" (2013, p. xi). There is a long list of experiences associated with solitary confinement: anxiety, fatigue, confusion, paranoia, depression, hallucinations, headaches, insomnia, trembling, apathy, stomach and muscle pains, oversensitivity to stimuli, feelings of inadequacy, inferiority, withdrawal, isolation, rage, anger, and aggression, difficulty in concentrating, dizziness, distortion of the sense of time, severe boredom, and impaired memory (Smith, 2006).

Peter Smith notes: "whether and how isolation damages people depends on duration and circumstances and is mediated by prisoners' individual characteristics; but for many prisoners, the adverse effects are substantial" (2006, p. 441). He documents high rates of mental illness resulting from solitary confinement, starting in the nineteenth century. Hearing about new practices of solitary confinement in American prisons, delegates from Europe came to learn about them. One visitor, the author Charles Dickens, refers to solitary confinement as "slow and daily tampering with the mysteries of the brain ... immeasurably worse than any torture of the body" (1957, p. 99; cited in Guenther, 2013, p. 18).

A study of 100 inmates in California's Pelican Bay Supermax prison (Haney, 2003) found 91% of the prisoners suffering from anxiety and nervousness; 70% "felt themselves on the verge of an emotional breakdown" (p. 133); and 77% experienced chronic depression. One prisoner reported the following experience:

I went to a standstill psychologically once – lapse of memory. I didn't talk for 15 days. I couldn't hear clearly. You can't see – you're blind – block out everything – disoriented, awareness is very bad (cited in Grassian, 1983, p. 1453).

Another confirmed sensory disturbances:

Melting: Everything in the cell starts moving; everything gets darker, you feel you are losing your vision (Grassian, 1983, p. 1452).

And another confirmed memory problems:

Memory is going. You feel you are losing something you might not get back (Grassian, 1983, p. 1453).

A systematic review of the phenomenology of solitary confinement reveals symptoms that involve serious bodily and motor problems, derealization, and self-dissolution (or depersonalization).

## *Bodily and motor problems*

Dickens, when visiting an American prison, was curious about the trembling of the prisoners in solitary confinement: "their nervous tics, their difficulty in meeting his eye or sustaining conversation, their cringing posture and nervousness ..." (Guenther, 2013, p. 19). To Dickens' observation a prison guard replied:

Well it's not so much a trembling, although they do quiver – as a complete derangement of the nervous system. They can't sign their names to the book; sometimes they can't even hold the pen ... sometimes they get up and down again twenty times in a minute.... Sometimes they stagger as if they were drunk, and sometimes are forced to lean against the fence, they're so bad (Dickens, 1957, pp. 105–106).

Dickens adds that the prisoners' sensory awareness, their capacities to see and hear clearly and to make sense of their perceptions, were diminished: "That it makes the senses dull, and by degrees impairs the bodily faculties, I am quite sure" (Dickens, 1957, pp. 108–109). Guenther is right that "it is precisely at the level of bodily perception, sensibility and affectivity that prisoners find their relation to the world undermined" (2013, p. 154).

It is an open question for future empirical investigation whether this kind of undermining of embodiment is similar to the sensory-motor problems described by Torres (2013) in terms of disrupted patterns in the peripheral nervous system – disruptions of the re-entrant (afferent, proprioceptive) sensory feedback that usually contributes to the autonomous regulation and coordination of motor output, as well as to primary intersubjectivity. The observed symptoms do seem similar: poverty of eye-to-eye gaze and gestures in social exchanges; limited language and to-and-fro conversation; and a variety of sensory-motor problems.

## *Derealization*

One also finds, correlatively, reports from prisoners in solitary confinement that reflect derealization, an undermining of their relation to the world. Thus, the experience of object boundaries becomes uncertain:

It becomes difficult to tell what is real and what is only my imagination playing tricks on me.... the wire mesh on [the] door begins to vibrate or the surface of the wall seems to bulge. (Guenther, 2013, p. 35; citing Grassian, 1983; Shalev, 2009).

As Guenther suggests, in solitary confinement the transcendental intersubjective basis of the experience of the world as real and objective is structurally undermined (2013, p. 35). It completely closes down the possibility of secondary intersubjectivity and therefore of participatory sense making, undermining the capacity to sustain meaning. These problems with derealization, and with sensory-motor processes, correlate with depersonalization and the dissolution of the self.

## *Self-dissolution*

Christensen, in a study of a woman who experienced solitary confinement in Denmark, writes: "The person subjected to solitary confinement risks losing her self and disappearing into a non-existence" (Christensen, 1999, p. 45; cited and trans. by Smith, 2006, p. 497). It is important, however, to specify precisely what aspects of self are at stake in such a statement. Guenther (2013, p. xiii) gives a better indication when she asks: "How could I lose myself by being confined to myself? For this to be possible, there must be more to selfhood than individuality .... Solitary confinement works by turning prisoners' constitutive relationality against themselves." That is, solitary confinement disrupts the *relational self* by disrupting primary and secondary intersubjectivity, and the intercorporeity essential to social interaction.

The practice of solitary confinement is not, as some of the original prison administrators thought, a way for the prisoner to return into self – "The inmate was expected to turn his thoughts inward ..."– a rehabilitation through isolation with oneself (Smith, 2006, p. 456; see Guenther, 2013, p. xvi). Such a proposal reflects a traditional concept of self as an isolated individual substance or soul that benefits from introspection. If, in contrast, the self is relational, then solitary confinement, by undermining intersubjective relationality, leads to a destruction of the self. Stripping away the possibility of primary intersubjectivity – leading to the experience of depersonalization – goes to the very basic level of the *minimal embodied self*.

It also affects the *narrative self*. Self-narrative depends on having something to narrate and having someone to whom to narrate. In addition, self-narrative practices require four distinct capacities (Gallagher, 2007):

1. a capacity for temporal ordering;
2. a capacity to differentiate between self and non-self;
3. memory; and
4. metacognition, a capacity to reflect on the narrative itself.

As Donald (2006) puts it, metacognition provides the "cognitive governance" that allows for disambiguating and differentiating events within the narrative.

As it turns out, all four capacities are under threat in the context of solitary confinement. Among the commonly reported symptoms that result from solitary confinement are distortions in the sense of time, which can clearly affect the capacity for temporal ordering; basic disruptions in bodily integrity, so that differentiation between self and non-self is compromised (Guenther, 2013, p. xi); impaired memory; and cognitive difficulties (concentration, confusion) that will clearly affect metacognition.

One can understand the self as a pattern of various aspects (Gallagher, 2013), some of which we have already named: minimal embodied, relational, and narrative aspects. On the pattern theory of self, what we call a self consists of a complex pattern of a sufficient number of contributories, none of which on its own is necessary or essential to any particular self. Taken together, a certain pattern of characteristic features constitutes an individual self. Such patterns may change over time, taking on different weights and values for the individual they define and for others, who normally have an influence on how the pattern unfolds. The pattern includes minimal embodied and experiential aspects, affective aspects, intersubjective or relational aspects, psychological/cognitive aspects, narrative aspects, extended aspects, and situational aspects (Gallagher, 2013). Extended aspects include those things that an individual has invested in or considers his own; as James (1890, p. 279) suggested, "*a man's Self is the sum total of all that he* CAN *call his*, not only his body and his psychic powers, but his clothes and his house, his wife and children, his ancestors and friends, his reputation and works [etc.]". Situational aspects are those that play some (major or minor) role in shaping who we are: the family structure and environment in which we grew up, the cultural and normative practices that define our way of living, and even the physical surroundings that offer affordances or disaffordances for action.

The evidence reviewed above suggests that solitary confinement negatively affects all of these aspects. Reports from prisoners, medical personnel, psychologists, and psychiatrists suggest serious problems with minimal embodied aspects (e.g., physical health and motor problems), experiential aspects (e.g., sensory problems, derealization), affective aspects (e.g., depression, anxiety), intersubjective or relational aspects (e.g., isolation), psychological/cognitive aspects (e.g., lack of concentration, confusion), narrative aspects (e.g., memory problems, distortions in time sense), extended aspects (e.g., lack of control over personal property), and situational aspects (e.g., relatively dire circumstances in prison cells). A breakdown in some significant number of these aspects would be sufficient to alter, or even eradicate, the pattern that constitutes self in any particular case.

## **CLARIFYING THE NOTIONS OF CRUEL AND UNUSUAL**

The words "cruel and unusual punishment" first appeared in the English Bill of Rights in 1689. As initially noted, they also appear in the Eighth Amendment to the United States Constitution (1791): "cruel and unusual punishments [shall not be] inflicted." On the British side, the term "cruel" was synonymous with "severe," and generally signified punishments that were disproportionate to the crime (Granucci, 1969, p. 860). The American interpretation, in contrast, focused on identifying cruel methods (and specifically torturous methods) of punishment (see e.g., Berkson, 1975)<sup>3</sup>. Unfortunately, some cruel and unusual punishment is not so unusual, so we may prefer the wording of the Universal Declaration of Human Rights adopted by the UN General Assembly (A/RES/217, 1948): "No one shall be subjected to torture or to cruel, inhuman or degrading treatment or punishment." This still leaves us with the question of what constitutes cruel, inhuman, or degrading punishment, and, as noted at the start, it is still difficult to find a clear definition of these terms in the legal domain.

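In 1972, United States Supreme Court Justice William Brennan, in *Furman v. Georgia* (408 U.S. 238; 1972), a case involving the death penalty, defined four principles that determine when a punishment is cruel and unusual:

1. A punishment must not be so severe as to be degrading to the dignity of human beings.
2. The State must not arbitrarily inflict a severe punishment.
3. A severe punishment must not be unacceptable to contemporary society.
4. A severe punishment must not be excessive, that is, unnecessary.
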
Unlike the first principle, principles 2, 3, and 4 are relatively easy to measure or define. It is not clear, however, that on their own, arbitrariness, social rejection, and lack of necessity define the concept of cruelty. Justice Brennan therefore suggests that these principles be applied in a convergent fashion. That the concept of "cruelty" (or "degrading to human dignity"<sup>4</sup>) remains obscure can be seen in how it is glossed in the following explanation:

[These criteria are] interrelated, and, in most cases, it will be their convergence that will justify the conclusion that a punishment is "cruel and unusual." The test, then, will ordinarily be a cumulative one: if a punishment is *unusually severe*, if there is a strong probability that it is inflicted arbitrarily, if it is substantially rejected by contemporary society, and if there is no reason to believe that it serves any penal purpose more effectively than some less severe punishment, then the continued infliction of that punishment violates the command of the Clause that the State may not inflict inhuman and uncivilized punishments upon those convicted of crimes. (Brennan, 1972, p. 239; emphasis added)

<sup>3</sup>In *Coker v. Georgia* (433 U.S. 584 [1977]), however, the US Supreme Court interpreted the phrase in terms of disproportionality (Radin, 1978). Most legal interpretations of this phrase are tied to the death penalty. The range of interpretation of what constitutes "cruel and unusual" is wide, however. Thus, MacReady (2009, p. 708) reports in *The Lancet*: "Substandard prison health care is deemed a violation of the Eighth Amendment to the Constitution that prohibits cruel and unusual punishment, making prisoners the only group of Americans who are guaranteed medical care." For more on the history and background of the legal issues concerning cruel and unusual punishment and solitary confinement, see Dayan and Dayan (2007); *Madrid v. Gomez* (889 F. Supp. 1146 – Dist. Court, ND California, 1995), a case in which a district court judge came close to condemning solitary confinement as cruel and unusual; see also Wedekind (2011) and Solitary Watch (n.d.).

<sup>4</sup>The concept of dignity is not well defined in the law either (see McDougal et al., 1980). Pellegrino (2008, p. xi) states: "... there is no universal agreement on the meaning of the term, human dignity." The term is used in a variety of ways, but it is often associated with the concept of respect for the human person.

Here, the severity that makes a punishment degrading to human dignity is explicated only as its being "unusually severe," a gloss that leaves the concept of cruelty itself unexplained.

Without dismissing the other three principles, I want to suggest that the phenomenology of solitary confinement provides a clearer interpretation of the concept of *cruelty* or *degrading of human dignity,* one that on its own should be sufficient for disqualifying solitary confinement as an acceptable punishment<sup>5</sup>.

The concept of self or person that the liberal tradition sets up as having dignity and demanding respect is a standard that treats the self as a stand-alone individual capable of autonomous deliberation and decision (see e.g., Code, 2011). Both phenomenology and science show this to be an abstraction that fails to recognize the *relational* nature of the self, with its embodied, experiential, and affective dimensions, complicated by narrative, extended, and situated aspects of human existence. Solitary confinement morally degrades human dignity by literally degrading (if not destroying) the human self in all of these aspects, starting with the deeply relational dimension. Ethically and practically speaking, this multi-dimensional, relational self is the only viable concept of self that the liberal tradition should use to measure its own practices pertaining to dignity, respect, and justice. If we destroy the self in its full pattern, or in a sufficient number of its aspects, it would be difficult to argue that we are respecting the person in any moral sense and not degrading the dignity of the human being.

## **ACKNOWLEDGMENTS**

The author acknowledges support received from the Marie-Curie Initial Training Network, "TESIS: toward an Embodied Science of InterSubjectivity" (FP7-PEOPLE-2010-ITN, 264828), European Commission Research, and the Humboldt Foundation's Anneliese Maier Research Award. This paper was presented at the *Workshop on Torture and Solitary Confinement: Phenomenology and Ethics*, University of Memphis (April 2014). I thank the participants in that workshop, especially Joshua Dohmen, Lisa Guenther, Bruce Janz, Matthew Ratcliffe, Zuzanna Rucinska, and Shokoufeh Sakhi, for their helpful comments.

<sup>5</sup>There is no universal agreement that solitary confinement should be considered cruel and unusual punishment. Thus, Bonta and Gendreau (1990), who discount phenomenological and qualitative studies in favor of more objective and experimental ones, conclude: "solitary confinement may not be cruel and unusual punishment under the humane and time-limited conditions investigated in experimental studies or in correctional jurisdictions that have well-defined and effectively administered ethical guidelines for its use" (p. 361). Bonta and Gendreau's study, however, has been subject to widespread criticism [see, for example, the critique by Jackson (2002)]. Furthermore, most psychiatric studies of solitary confinement have condemned the practice (Grassian, 1983; Haney, 2003). Note that objective research on prison conditions, including solitary confinement, is very difficult to undertake, given security constraints. Most studies of solitary confinement are commissioned as part of lawsuits (e.g., Grassian and Haney have both testified as expert witnesses).

## **REFERENCES**

Texts from the estate. Part 1. 1905-1920]. ed. I. Kern (The Hague: Martinus Nijhoff).


Merleau-Ponty, M. (1967). *The Structure of Behavior*. Trans. Fisher. Boston: Beacon.


Noë, A. (2004). *Action in Perception*. Cambridge, MA: MIT Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 April 2014; paper pending published: 16 May 2014; accepted: 26 May 2014; published online: 12 June 2014.*

*Citation: Gallagher S (2014) The cruel and unusual phenomenology of solitary confinement. Front. Psychol. 5:585. doi: 10.3389/fpsyg.2014.00585*

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Gallagher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
