# COMPONENTS OF THE LANGUAGE-READY BRAIN

EDITED BY: Cedric Boeckx and Antonio Benítez-Burraco PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-893-1 DOI 10.3389/978-2-88919-893-1

Topic Editors: **Cedric Boeckx,** University of Barcelona & Catalan Institute for Research and Advanced Studies (ICREA), Spain **Antonio Benítez-Burraco**, University of Huelva, Spain

This volume highlights new avenues of research in the language sciences, and particularly, in the neurobiology of language. The term "language-ready brain" stresses, on the one hand, the importance of a brain-based description of our species' linguistic capacity, and, on the other, the need to appreciate the crucial role culture plays in shaping the linguistic systems children acquire and adults use. For this reason, the focus is not put on language per se, but on our learning biases and cognitive pre-dispositions toward language. Both brain and culture are considered at two crucial levels of inquiry: phylogeny and ontogeny. In a fast-growing field like the language sciences and specifically, language evolution studies, this book has tried to capture several of the most exciting topics explored currently, sowing seeds for future investigations.

**Citation:** Boeckx, C., Benítez-Burraco, A., eds. (2016). Components of the Language-Ready Brain. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-893-1

# Table of Contents

*Editorial: Components of the Language-Ready Brain* Cedric Boeckx and Antonio Benítez-Burraco

### **Section 1: Genetics**

*Retinoic Acid Signaling: A New Piece in the Spoken Language Puzzle* Jon-Ruben van Rhijn and Sonja C. Vernes

### **Section 2: The brain**

*Merge in the Human Brain: A Sub-Region Based Functional Investigation in the Left Pars Opercularis* Emiliano Zaccarella and Angela D. Friederici

*'Syntactic Perturbation' During Production Activates the Right IFG, but not Broca's Area or the ATL* William Matchin and Gregory Hickok

### **Section 3: Development**

*The "Globularization Hypothesis" of the Language-ready Brain as a Developmental Frame for Prosodic Bootstrapping Theories of Language Acquisition* Aritz Irurtzun

*Temporal Attention as a Scaffold for Language Development* Ruth de Diego-Balaguer, Anna Martinez-Alvarez and Ferran Pons

### **Section 4: Evolution**


# Editorial: Components of the Language-Ready Brain

Cedric Boeckx<sup>1</sup>\* and Antonio Benítez-Burraco<sup>2</sup>

<sup>1</sup> Catalan Institute for Research and Advanced Studies (ICREA)/Department of General Linguistics, Universitat de Barcelona, Barcelona, Spain, <sup>2</sup> Department of Philology and its Didactics, University of Huelva, Huelva, Spain

Keywords: language, language development, evolution, neurolinguistics, evolutionary biology

**The Editorial on the Research Topic**

**Components of the Language-Ready Brain**

Our intention in putting together this volume was to exemplify and highlight new avenues of research in the language sciences concerning the neurobiology of language. We chose the term "language-ready brain" for our Research Topic, as we did in Boeckx and Benítez-Burraco, because we think it is high time to stress, on the one hand, the importance of a brain-based description of our species' linguistic capacity, and, on the other, the need to appreciate the crucial role culture plays in shaping the linguistic systems children acquire and adults use. In this sense, the focus of neurobiological investigations should not be "language," but our learning biases and cognitive pre-dispositions toward language (i.e., "language-readiness"). Considerations of both brain and culture ought to shape research at both levels of inquiry: phylogeny and ontogeny.

The contributions to this Research Topic break new ground, either by revisiting long-standing issues (such as the role of Broca's region, the relevance of lateralization, the evolutionary origins of phonology, the role of basic cognitive and perceptual abilities in language acquisition, or the functions performed by language), or by examining closely issues that we are sure will rise to prominence in the near future (like the translation of models of language processing into specific patterns of brain oscillations, or the nature of the gene networks in which known "language genes" are integrated). Taken together, the papers collected here shed light on language at the levels of genetics (van Rhijn and Vernes), brain connectivity (Murphy; Theofanopoulou), physiology (Matchin and Hickok; Zaccarella and Friederici), cognition (de Boer; de Diego-Balaguer et al.), and behavior (Bouchard; Irurtzun; Reboul; Samuels).

In a fast-growing field like the language sciences, Research Topics cannot hope to capture all relevant aspects of the field, but we hope that the present volume offers a snapshot of some of the most exciting research taking place today, sowing seeds for future investigations.

# AUTHOR CONTRIBUTIONS

Both authors wrote the editorial.

# FUNDING

Preparation of this work was supported by funds from the Spanish Ministry of Economy and Competitiveness (grants FFI2013-43823-P and FFI2014-61888-EXP), as well as funds from the Generalitat de Catalunya (2014-SGR-200).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Boeckx and Benítez-Burraco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

#### Edited and reviewed by:

Manuel Carreiras, Basque Center on Cognition, Brain and Language, Spain

> \*Correspondence: Cedric Boeckx cedric.boeckx@ub.edu

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 25 April 2016 Accepted: 06 May 2016 Published: 23 May 2016

#### Citation:

Boeckx C and Benítez-Burraco A (2016) Editorial: Components of the Language-Ready Brain. Front. Psychol. 7:762. doi: 10.3389/fpsyg.2016.00762

# Retinoic Acid Signaling: A New Piece in the Spoken Language Puzzle

*Jon-Ruben van Rhijn<sup>1,2</sup> and Sonja C. Vernes<sup>1,3</sup>\**

*<sup>1</sup> Department of Language and Genetics, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands, <sup>2</sup> Molecular Neurophysiology Group, Department of Cognitive Neuroscience, Radboud University Medical Center, Nijmegen, Netherlands, <sup>3</sup> Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands*

Speech requires precise motor control and rapid sequencing of highly complex vocal musculature. Despite its complexity, most people produce spoken language effortlessly. This is due to activity in distributed neuronal circuitry including cortico-striato-thalamic loops that control speech–motor output. Understanding the neuro-genetic mechanisms involved in the correct development and function of these pathways will shed light on how humans can effortlessly and innately use spoken language and help to elucidate what goes wrong in speech-language disorders. *FOXP2* was the first single gene identified to cause speech and language disorder. Individuals with *FOXP2* mutations display a severe speech deficit that includes receptive and expressive language impairments. The neuro-molecular mechanisms controlled by *FOXP2* will give insight into our capacity for speech–motor control, but are only beginning to be unraveled. Recently FOXP2 was found to regulate genes involved in retinoic acid (RA) signaling and to modify the cellular response to RA, a key regulator of brain development. Here we explore evidence that FOXP2 and RA function in overlapping pathways. We summarize evidence at molecular, cellular, and behavioral levels that suggests an interplay between FOXP2 and RA that may be important for fine motor control and speech–motor output. We propose that RA signaling is an exciting new angle from which to investigate how neuro-genetic mechanisms can contribute to the (spoken) language-ready brain.

#### *Edited by:*

*Antonio Benítez-Burraco, University of Huelva, Spain*

#### *Reviewed by:*

*Constance Scharff, Freie Universität Berlin, Germany Tomokazu Tomo Fukuda, Tohoku University, Japan*

*Christina Roeske contributed to the review of Constance Scharff*

> *\*Correspondence: Sonja C. Vernes sonja.vernes@mpi.nl*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 01 September 2015 Accepted: 10 November 2015 Published: 26 November 2015*

#### *Citation:*

*van Rhijn JR and Vernes SC (2015) Retinoic Acid Signaling: A New Piece in the Spoken Language Puzzle. Front. Psychol. 6:1816. doi: 10.3389/fpsyg.2015.01816*

Keywords: retinoic acid, FoxP2, synaptic plasticity, development, motor skills, striatum, dopamine receptor

# SPEECH AND SPOKEN LANGUAGE

Speech is the primary modality by which humans use language, and human orofacial morphology is uniquely suited to the production of intricate vocalizations needed for spoken language (Lieberman, 2007). The orofacial musculature is one of the most complex muscle systems in the body and in order to successfully produce meaningful speech these muscles must be controlled and coordinated in rapid sequences involving distributed neuronal circuitry. This motor activity is generated in several neural loops that select appropriate actions and generate the necessary motor patterns. One crucial circuit, the cortico-basal ganglia loop, sends activity from the motor cortex to the striatum (a component of the basal ganglia) where activity is integrated. Subsequently, outputs from here modulate activity in several thalamic nuclei. Activity from the thalamus is then sent back to the motor cortex, where a specialized population of output neurons organizes the complex thalamocortical inputs (Kravitz and Kreitzer, 2012; Calabresi et al., 2014). These cortical output neurons send the information, via the pyramidal tract, to motor neurons directly controlling muscle tissue. These neurons are either located in the spinal cord (controlling limb and body movements), or in the brainstem's cranial nerve nuclei (controlling facial and vocal tract movements). An illustration of the cortico-basal ganglia loop (in the rodent brain) is given in **Figure 1A**. Proper connectivity within this pathway is necessary to enable the precise outputs needed for orofacial muscle control.

The striatum can be seen as a central hub within the motor pathway, making it one of the most intriguing regions in which to investigate properties of motor circuitry and orofacial control. Striatal activity is especially important for fine motor behavior and motor skill learning (Doyon et al., 2003) and cortical and subcortical circuitry, including the striatum, has been established as highly important for speech–motor control (Lieberman, 2002). Furthermore, increased activation of the basal ganglia (which incorporates the striatum) has been shown via functional brain imaging (fMRI) in specific speech–motor language tasks (Wildgruber et al., 2001; Booth et al., 2007). Lastly, morphological changes in the striatum have been described in individuals with speech problems such as stuttering (Craig-McQuaide et al., 2014) and non-fluent aphasia (Ogar et al., 2007).

The principal cell type in the striatum is the medium spiny neuron (MSN), which makes up approximately 98% of all striatal cells (Kemp and Powell, 1971; Huang et al., 1992; for review, see Kreitzer and Malenka, 2008). MSNs can be further divided into two categories of neurons that have different connectivity and opposing functions: dopamine receptor type 1 (D1R) and dopamine receptor type 2 (D2R) expressing cells (**Figure 1A**). D1R expressing MSNs connect to thalamic nuclei via the "direct pathway" which results in excitation of the motor cortex. D2R expressing MSNs form an "indirect pathway" that connects to the thalamus via multiple subcortical regions leading to inhibition of the thalamus and thus reduced cortical input (**Figure 1A**; Albin et al., 1989; Kravitz and Kreitzer, 2012; Calabresi et al., 2014). This balance between excitation (resulting in more movement) and inhibition (less movement) is crucial for coordinated motor function (Calabresi et al., 2014) including fine orofacial motor control.

In order to unravel the fundamental components that enable humans to effortlessly use spoken language, we will need to understand the neuro-genetic mechanisms involved in establishment, function, and maintenance of speech–motor pathways.

# SPOKEN LANGUAGE AND FOXP2

A breakthrough in speech and language genetics came with the identification of the first gene to cause a speech/language disorder: *FOXP2* (Lai et al., 2001). Mutations in *FOXP2* were found in a large pedigree known as the KE family (Hurst et al., 1990; Fisher et al., 1998; Lai et al., 2001). Affected family members were diagnosed with a severe speech impairment known as developmental verbal dyspraxia (also known as childhood apraxia of speech; OMIM: 602081) and carried a mutation in one copy of their *FOXP2* gene. In addition to speech impairments, affected family members demonstrated receptive and expressive language problems (Watkins et al., 2002a). Although rare, *FOXP2* mutations have been found in a number of unrelated families and individuals with similar speech/language phenotypes (MacDermot et al., 2005; Feuk et al., 2006; Shriberg et al., 2006; Lennon et al., 2007; Palka et al., 2012; Rice et al., 2012; Zilina et al., 2012; for review, see Bacon and Rappold, 2012). In-depth investigations of the KE family phenotype indicated a severe impairment in orofacial praxis tasks (Vargha-Khadem et al., 1995; Lai et al., 2001; Watkins et al., 2002a). In addition, affected individuals were impaired relative to controls on language production tasks (e.g., phoneme addition, word repetition; Vargha-Khadem et al., 1995). Different aspects of speech are thus impaired in KE family members (Watkins et al., 2002a). Orofacial praxis deficits underlie impaired lexicon building and subvocal (internal) speech representations, which can affect irregular verb grammar (Doyon et al., 2003) and rule-based grammar learning (Ullman, 2001). Thus, some of the language impairments in the KE family could be related to the core speech production deficits observed.

FOXP2, and its murine homolog Foxp2, are found across many regions of the developing and postnatal brain (FoxP2 will be used when referring to both species). Intriguing is the high expression of FoxP2 throughout the mouse and human cortico-striato-thalamic motor circuitry (Lai et al., 2003). During early development FoxP2 is broadly expressed in these regions, but in later developmental and postnatal stages expression becomes more restricted (**Figure 1B** depicts Foxp2 expression in the postnatal mouse brain). In adults, Foxp2 is limited to deep layer cortical neurons (layer 5 motor cortex and layer 6 throughout; Ferland et al., 2003; Morikawa et al., 2009; Hisaoka et al., 2010; Tomassy et al., 2010; Reimers-Kipping et al., 2011; Tsui et al., 2013). Within the striatum, Foxp2 is highly expressed in both types of MSN, though more commonly in D1R MSNs compared to D2R neurons (Vernes et al., 2011). Corresponding with its expression pattern, imaging studies have shown humans with *FOXP2* mutations display structural and functional differences in motor areas. Affected members of the KE family showed structural gray matter volume differences in the motor cortex and striatum (Watkins et al., 2002b). Furthermore, functional imaging studies showed an underactivation of the striatum and altered cortical activation (including speech/motor areas such as the left anterior insular cortex) during word generation and word repetition tasks (Liegeois et al., 2003).

Converging evidence from FoxP2 expression pattern studies and phenotypic characterization of human mutations suggests that FOXP2 may play an important role in the development of the speech–motor pathway. The high expression of Foxp2 in a specific subset of neurons (D1R MSNs) in the striatum indicates a functional specificity related to motor tasks requiring the striatothalamic connections of the direct pathway. Malfunctions within this pathway could ultimately affect aspects of the motor circuitry related to fine motor control and contribute to the observed speech–motor deficit in humans.

# FOXP2 AS A MOLECULAR ENTRY POINT INTO SPEECH–MOTOR PATHWAYS

FoxP2 is a transcription factor; its molecular function is to regulate the expression of other genes, switching them on or off in a temporally and spatially controlled manner. FoxP2 has been shown to regulate hundreds of different genes involved in processes crucial to brain development and function, ranging from neurogenesis and migration, to neurite outgrowth and synaptic activity (Spiteri et al., 2007; Vernes et al., 2007, 2011; Konopka et al., 2009; Devanna et al., 2014). Recently, evidence has suggested that FOXP2 regulates a number of genes involved in the retinoic acid (RA) signaling pathway (Devanna et al., 2014). RA is a vitamin-A derivative essential to mammalian development. Disruption of the RA signaling pathway (caused by genetic disruptions or dietary deficiencies) can have severe consequences during development and adulthood (Holson et al., 1997; Krezel et al., 1998).

Retinoic acid induces genetic and morphological changes in cells. When neuronal precursors (cells that generate neurons during development) differentiate into neurons they switch on genes normally found in mature neurons, stop dividing and grow long processes known as neurites (Siegenthaler et al., 2009; Korecka et al., 2013). We previously compared how neuron-like cells with or without FOXP2 responded to RA and found that cells showed stronger genetic and morphological changes in response to RA if FOXP2 was present (Devanna et al., 2014). In addition, we discovered that FOXP2 changed the expression of RA receptors, proteins that directly control the cellular response to RA (Devanna et al., 2014). Of particular interest, FOXP2 upregulated retinoic acid receptor β (RARβ), and a number of other genes involved in transport or modification of RA were also transcriptionally regulated (e.g., RORβ, CRABPII, and ASCL1). These experiments suggest an intriguing link between FOXP2 and the RA pathway, in which FOXP2 seems to contribute to or modify the cellular response to RA.

Given the importance of the RA pathway for development, this raises new questions about how FOXP2 might mediate its effects on brain and neural circuit development. Could the relationship between FOXP2 and the RA pathway be relevant for (1) normal motor circuitry development and function, and/or (2) effects of FOXP2 dysfunction in patients? To address these questions, we need to understand how FoxP2 and the RA pathway might interact, and in what way FoxP2 mutations might affect the RA pathway on a cellular, functional and behavioral level.

# RA, FOXP2, AND MOTOR BEHAVIOR

Retinoic acid is a key compound during embryogenesis, affecting a multitude of critical developmental pathways. Precise control of RA levels is essential for normal brain development as either an excess or a deficiency of RA results in widespread adverse effects on the brain.

Gestational treatment of rats with excess RA results in behavioral deficits in learning, memory and motor function (Holson et al., 1997). Rats treated with excess RA displayed poor generalized motor control including impairments in the 'righting reflex' (the ability to return to upright position), and the ability to sit only on the back paws. In addition, gestationally treated adult rats showed problems with learning and memory, such as decreased learning rates in a water filled T maze (Butcher et al., 1972; Holson et al., 1997). Rats lacking dietary vitamin A (of which RA is a metabolite) also perform poorly on motor learning and motor performance tasks (Carta et al., 2006). Furthermore, mice engineered to lack a key facilitator of RA signaling (RARβ) develop severe locomotion deficits and are highly impaired on motor learning tasks (Krezel et al., 1998).

The displayed motor deficits are similar to phenotypes observed in mouse models of Foxp2 dysfunction. Mouse models of two well characterized patient mutations of FOXP2 have been created that have comparable phenotypes. One mouse model reflects the R553H missense mutation found in the KE family (Lai et al., 2001). The second mouse model mirrors an early stop codon in exon 7 introduced by a non-sense mutation that leads to a loss of FOXP2 protein in an independent family with speech/language disorder (MacDermot et al., 2005; Groszer et al., 2008). Mice that have a homozygous Foxp2 mutation show severe general motor impairments, reminiscent of animals treated with excess RA. However these Foxp2 homozygous mutants do not survive beyond 3–4 weeks after birth, possibly due to a requirement for Foxp2 in other organs such as the lungs or heart (Groszer et al., 2008). In mice where a single copy of Foxp2 is affected (as per the heterozygous state of the mutations observed in patients) general motor control is normal but motor learning is impaired (Groszer et al., 2008; French et al., 2012). This more subtle phenotype closely resembles the motor learning phenotype observed in RA deprived rats (Carta et al., 2006). For an overview of the different phenotypes exhibited by Foxp2 mutation, RAR mutation, and RA treatment, see **Table 1**.

# FOXP2 AND RA SIGNALING AFFECT NEURONAL FUNCTION

In addition to the behavioral deficits, vitamin A deprivation/supplementation adversely affects striatal development and function. Cells in the developing lateral ganglionic eminence (the precursor region of the striatum) do not differentiate into the appropriate neuronal subtypes when RA signaling is blocked (Toresson et al., 1999; Chatzi et al., 2011). However restoring RA levels rescued this phenotype and resulted in normal differentiation into appropriate neuronal cell types (Chatzi et al., 2011). Separately, mice engineered to knockout the *RAR*β gene display gross morphological striatal defects including impaired neurogenesis and deficits in acquiring proper neuronal identities (Liao et al., 2008). Lastly, chronic postnatal vitamin A supplementation has been linked to oxidative cell toxicity in the striatum (de Oliveira et al., 2007).

Foxp2 also contributes to striatal cell morphology and function. Foxp2 mutant neurons exhibit reduced neurite growth and branching in primary striatal cultures (Vernes et al., 2011) and the *in vivo* striatum displays aberrant neuronal activity. Mice with a heterozygous Foxp2 mutation showed unusually high activity in the dorsomedial striatum during active motor behavior (French et al., 2012). This suggests striatal cells can no longer properly modulate their activity following input from motor areas when lacking Foxp2. Moreover, the increased striatal activity normally seen when animals perform motor learning tasks was absent in mutant mice. Instead, a decrease in firing rate was seen, again suggesting aberrant modulation of responses to cortical and/or thalamic input (French et al., 2012). Additionally, extracellular measurements on striatal brain slices from heterozygous Foxp2 mutant animals show these cells fail to respond to induction of long term depression (LTD; Groszer et al., 2008). An inability to induce long term plasticity [either LTD or long term potentiation (LTP)] has debilitating consequences as scaled activity (plasticity) is necessary for circuits to properly regulate their input and output. Synaptic long term plasticity changes underlie information storage and are necessary for learning and memory (Novkovic et al., 2015; Zhu et al., 2015). Interestingly, in the striatum, synaptic plasticity has been strongly linked to motor learning (Dang et al., 2006; Kreitzer and Malenka, 2007). Defects specifically related to striatal LTD and LTP are known to affect procedural motor learning and the acquisition of new motor paradigms (Gubellini et al., 2004).

Aberrant induction of synaptic scaling has also been found in mice following acute RA depletion, which results in a complete lack of hippocampal LTP or LTD (Misner et al., 2001). This phenotype was specific to RA depletion and was reversible, as vitamin A supplementation rapidly restored normal synaptic plasticity (Misner et al., 2001). At a molecular level, RA signaling is mediated by the action of RA receptors (RARs; RARα, RARβ, and RARγ) and similar plasticity defects have been shown for mice lacking RARα (Sarti et al., 2012) or RARβ (Chiang et al., 1998). Hippocampal cells from these mice fail to establish LTD when subjected to low frequency stimulation – the paradigm necessary to induce LTD in the hippocampus. By contrast, excess RA induced the reverse effect in cultured hippocampal slices, where increased excitatory activity was observed (Aoto et al., 2008). It is not yet known if RA signaling affects synaptic plasticity in the striatum. However, the similarity in synaptic activity phenotypes between Foxp2-, RARα-, and RARβ-deficient animals (albeit focusing on different brain regions) does indicate these transcription factors may play a role in similar intracellular pathways regulating neuronal activity and synaptic plasticity.

The aforementioned plasticity (LTD and/or LTP) deficits in Foxp2, RARα, and RARβ mutant animals suggest an improper reaction of neuronal circuits to changes in external input. Induction of LTD or LTP leads to a decrease or an increase, respectively, in the number of glutamate receptors (of the AMPA receptor class) at the synaptic membrane (Seidenman et al., 2003; Briand et al., 2014; for review, see Luscher and Huber, 2010). This change in AMPA receptor abundance modifies the response strength of a cell when it is excited. The change in stimulus–response strength is transient, and in time the normal AMPA receptor distribution will be restored, returning synaptic responses to normal levels. RA treatment of hippocampal cultures has been shown to increase AMPA receptors on the cell surface (Aoto et al., 2008), but no data on the striatum are currently available. The shared synaptic plasticity defect following disruption of RA signaling pathways or Foxp2 mutation does suggest that they both may influence receptor abundance or localization at the synapse in the striatum, an intriguing area for further study.



*Key to Table 1: –, no effect; +, mild effect; ++, strong effect; N/A, not applicable; NT, not tested.*

A thorough investigation of the mechanisms leading to LTD and LTP deficits resulting from RA/RAR and Foxp2 malfunction will be necessary to understand if they function in the same pathways. Understanding the molecular mechanisms underlying striatal function, especially related to complex motor circuitry function, will lead to a better understanding of striatal speech–motor control.

# MOLECULAR LINKS BETWEEN RARs AND FOXP2

Retinoic acid receptors canonically function as transcription factors, regulating genes responsible for directing normal embryogenesis and brain development. Interestingly, FoxP2 and RARs share some of the same target genes (Balmer and Blomhoff, 2002; Delacroix et al., 2010; Devanna et al., 2014). RARs are highly expressed in the brain (Krezel et al., 1999) and are present throughout embryonal development (Mollard et al., 2000), postnatal development (Wei et al., 2011), and in adults (Krezel et al., 1999; Zetterstrom et al., 1999). Notably high expression of RARs can be found throughout the motor circuitry, including cortical, striatal, and multiple thalamic regions (Krezel et al., 1999; **Figure 1B**). We focus on two key receptors found in the motor circuitry: RARα and RARβ. RARα is found in layer 5 of the cortex and in the thalamus, both regions that overlap with murine Foxp2 expression (Krezel et al., 1999; Zetterstrom et al., 1999; Ferland et al., 2003; Lai et al., 2003; Hisaoka et al., 2010). Interestingly, Foxp2 only overlaps with RARα in the motor cortex layer 5, because Foxp2 expression is largely restricted to layer 6 of other mature cortical areas. RARβ is strongly expressed only in the striatum, another site where Foxp2 expression is highest (**Figure 1B**). Notably, FOXP2 has been shown to directly drive RARβ expression in human cells (Vernes et al., 2007; Devanna et al., 2014), although this is yet to be shown in the striatum. This high level of overlap, combined with shared target genes and molecular interactions, strongly supports interplay between FoxP2 and RARs in motor pathways.

# CONCLUDING REMARKS

In addition to its canonical role during embryogenesis, studies described here suggest RA signaling plays a specific role in the development and function of striatal motor circuitry and may link to FoxP2 function. Disruption of the RA pathway results in strikingly similar phenotypes to FoxP2 mutation on multiple levels, which suggests a potential mechanistic interaction. FoxP2 and RARs can regulate some common target genes, affect similar cellular phenotypes and show highly overlapping expression patterns in the cortico-striato-thalamic motor circuitry. In the striatum, aberrant function of Foxp2 and RA signaling contributes to altered development and, in the case of mutations of mouse Foxp2, altered synaptic plasticity similar to that seen in the hippocampus of RARα mutant animals. Given that RARβ is predominantly expressed in the postnatal striatum, it seems likely that its disruption will also affect striatal plasticity; however, this is yet to be experimentally determined. Lastly, animals with mutated Foxp2 or RA signaling defects show comparable motor control/learning impairments. Thus, at multiple levels (molecular, cellular, circuit, and behavioral) there is evidence that interplay between FoxP2 and RA signaling may facilitate proper development and function of motor circuitry. This evidence from mice is strengthened by findings in songbirds, which show that both FoxP2 and RA influence song learning by acting in circuits that have parallels with human vocal-motor pathways (Haesler et al., 2007; Wood et al., 2008). In the future it will be of great value to understand whether these signaling cascades interact to influence neuronal mechanisms related to song learning or speech–motor control, and whether RA signaling deficits are involved in aberrant speech–motor development in humans. The capacity for human speech and spoken language is dependent on multiple molecular and neural building blocks. With the link between FoxP2 and RA signaling, a new block has been suggested, giving us new opportunities to investigate the evolution and development of the (spoken) language-ready brain.

## REFERENCES


#### ACKNOWLEDGMENTS

This work was supported by a grant from the Donders Institute, Radboud University Nijmegen, a Marie Curie career development grant awarded to SCV and by the Max Planck Society. We would like to thank Moritz Negwer for valuable comments on the manuscript.


of retinoid binding proteins and receptors and evidence for presence of retinoic acid. *Euro. J. Neurosci.* 11, 407–416. doi: 10.1046/j.1460-9568.1999.00444.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 van Rhijn and Vernes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Merge in the Human Brain: A Sub-Region Based Functional Investigation in the Left Pars Opercularis

#### Emiliano Zaccarella<sup>1,2</sup>\* and Angela D. Friederici<sup>1,2</sup>

<sup>1</sup> Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, <sup>2</sup> Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany

Language is thought to represent one of the most complex cognitive functions in humans. Here we break down complexity of language to its most basic syntactic computation which hierarchically binds single words together to form larger phrases and sentences. So far, the neural implementation of this basic operation has only been inferred indirectly from studies investigating more complex linguistic phenomena. In the present sub-region based functional magnetic resonance imaging (fMRI) study we directly assessed the neuroanatomical nature of this process. Our results showed that syntactic phrases—compared to word-list sequences—corresponded to increased neural activity in the ventral-anterior portion of the left pars opercularis [Brodmann Area (BA) 44], whereas the adjacently located deep frontal operculum/anterior insula (FOP/aINS), a phylogenetically older and less specialized region, was found to be equally active for both conditions. Crucially, the functional activity of syntactic binding was confined to one out of five clusters proposed by a recent fine-grained sub-anatomical parcellation for BA 44, with consistency across individuals. Neuroanatomically, the present results call for a redefinition of BA 44 as a region with internal functional specializations. Neurocomputationally, they support the idea of invariance within BA 44 in the location of activation across participants for basic syntactic building processing.

Keywords: pars opercularis, clusters, syntax, merge, fMRI

# INTRODUCTION

Traditionally language is thought of as one of the most complex cognitive functions. Recently, it has been claimed, however, that the human capacity to process complex syntactic structures is based on a very basic binary process which syntactically binds words together hierarchically to form larger structures. Because of the fundamental nature of this computation, called merge in theoretical linguistics (Chomsky, 1999; Adger, 2003), the determination of its neural implementation would constitute the neurobiological basis of a process which is at the root of any complex syntactic structure (Berwick et al., 2013). Up to now, the operation has almost never been directly studied in isolation, as syntax usually has been studied in more complex sentential contexts (Just et al., 1996; Stromswold et al., 1996; Moro et al., 2001; Cooke et al., 2002; Röder et al., 2002; Ben-Shachar et al., 2003, 2004; Constable et al., 2004; Bornkessel et al., 2005; Fiebach et al., 2005; Grewe et al., 2005; Friederici et al., 2006b; Santi and Grodzinsky, 2007, 2010; Caplan et al., 2008; Kinno et al., 2008; Newman et al., 2010). These and other studies across different languages indicate that the larger region in and around Broca's area in the inferior frontal cortex (IFG) supports syntactic processes (for reviews see Vigneau et al., 2006; Friederici, 2011).

#### Edited by:

Cedric Boeckx, Catalan Institute for Research and Advanced Studies (ICREA) and Universitat de Barcelona, Spain

#### Reviewed by:

Narly Golestani, Université de Genève, Switzerland Evie Malaia, University of Texas at Arlington, USA

> \*Correspondence: Emiliano Zaccarella zaccarella@cbs.mpg.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 15 September 2015 Accepted: 10 November 2015 Published: 27 November 2015

#### Citation:

Zaccarella E and Friederici AD (2015) Merge in the Human Brain: A Sub-Region Based Functional Investigation in the Left Pars Opercularis. Front. Psychol. 6:1818. doi: 10.3389/fpsyg.2015.01818

A second region to be considered in the frontal cortex is the frontal operculum (FOP), which is phylogenetically older than Brodmann Area (BA) 44 (Sanides, 1962; Friederici, 2006) and has been shown to be involved in syntactic classification as well as word-based processing (Grasby et al., 1994; Stowe et al., 1998; Friederici et al., 2000b, 2006a). The adjacent anterior insula (aINS) is involved in the processing of short two-word sequences independent of whether they constitute a phrase or not (Zaccarella and Friederici, 2015). Other studies trying to localize syntactic processes reported the involvement of temporal regions, i.e., the left posterior superior and anterior temporal lobe (ATL) regions, rather than inferior frontal regions (Bottini et al., 1994; Stowe et al., 1999; Vandenberghe et al., 2002; Humphries et al., 2005, 2006). These studies, however, used long sentences and compared these to word-lists often allowing minimal syntactic processes. The lack of any ATL activity in studies comparing sentences that only differed in syntactic complexity (Friederici et al., 2006b) speaks in favor of a compositional semantic role rather than a syntactic role of this area (Barsalou, 1982; Humphries et al., 2007). Indeed, a series of recent magnetoencephalography (MEG) experiments looking at conceptual compositionality effects at the phrasal level found the ATL to be active during the construction of complex semantic representations, when color concepts had to be combined together with real objects, or when the same colors had to be combined with nouns (object labels) carrying semantic information (Bemis and Pylkkänen, 2011, 2013; Del Prato and Pylkkänen, 2014). 
Semantic sensitivity for the area is strongly confirmed by a recent fMRI study showing that activity change in the ATL varied as a function of the presence of lexicosemantic information, but not of syntactic variables, during the construction of progressively increasing linguistic structures (Pallier et al., 2011).

The goal of the present study was to identify the neural basis of the most basic syntactic computation, upon which any more complex hierarchical structure can be derived. By this computation two words, e.g., this and ship, are bound together into a hierarchical phrase containing both words, i.e., this ship, with this dominating ship. By applying the same mechanism again, we then recombine this phrase with the closest element occurring in the sentence to form increasing syntactic hierarchies, e.g., this ship sinks. Phrases of two-word length (like this ship) are the ideal level to investigate this most basic process of syntactic binding, as the amount of cognitive load required to process such small constructions is very limited. This means that, after classifying this as a determiner and ship as a noun, only the operation of merge is necessary to make it a phrase. Crucially, at the very same two-word level, it is also possible to create contexts consisting of simple lists of words (like stone, ship) in which no phrase can be created, as no syntactic relationship holds. Because of this minimal opposition, two-word manipulations are an ideal level to identify the most basic and essential syntactic computation of merge. We hypothesize that this computation should be located in Broca's area, as this region has been found to support syntactic processes (Vigneau et al., 2006; Friederici, 2011; Hagoort and Indefrey, 2014), and further that, if its assumed fundamental nature holds, it should be localizable (a) in a very confined subregion within this area, and (b) with little variance across individuals.

Neuroanatomically, Broca's area can be subdivided into BA 45 and BA 44, whose borders were first mapped based on the cellular organization of their regional tissues (Brodmann, 1909), and then redefined using observer-independent cell density profiles over histological slices of postmortem brains (Amunts et al., 1999). More recently, a multireceptor-based analysis separated BA 44 into an anterior dorsal part and a posterior ventral part (Zilles and Amunts, 2009; Amunts et al., 2010). A very recent meta-analytic functional connectivity-based parcellation (CBP) approach even proposed a decomposition of BA 44 into five separate subregions, called clusters (Cs), two of which are located in its more posterior part (C1 and C4), another two in the more anterior part (C2 and C3), and a fifth one in the inferior frontal junction (C5; Clos et al., 2013). The CBP approach first identifies the whole-brain co-activation pattern for each voxel contained in BA 44 across several thousand fMRI studies, and then groups those voxels into distinct clusters according to the similarity of their co-activation patterns across the brain. While the functional specificity of these five clusters as indicated by the meta-analysis is low, with each cluster's functional domain ranging from action and working memory to switching and other cognitive tasks, the fine-grained subdivision of BA 44 may allow a precise localization of the most fundamental syntactic process assumed for any natural language. Here we hypothesize that, if the high functional specificity of merge as a fundamental syntactic computation holds, this mechanism should be localizable in one of the sub-clusters within BA 44, with little inter-individual variance.

This hypothesis was investigated in an fMRI study using two-word sequences either allowing hierarchical syntactic binding to apply (phrase trials) or not (list trials), which were presented visually. Phrase trials and list trials were constructed to be as parallel as possible, only varying in the possible application of syntactic binding. They differed in their first element, which was either a determiner (e.g., this) or a noun (apple), while the second element was always a phonotactically legal pseudoword (flirk) to drastically reduce conceptual-semantic processing in both conditions (Bemis and Pylkkänen, 2011). Two corresponding one-word conditions were also included, in which the pseudowords were substituted with a series of X's (e.g., XXXXX) to explore the effect of number of words. Thus, the experiment included two factors: type of STRUCTURE ["phrase" (PH), allowing syntactic binding, vs. "list" (LS), not allowing syntactic binding] and number of WORDS ("2-words" vs. "1 word plus Xs"). We employed three progressively region-stringent levels of data analysis to localize syntactic binding: (i) a whole-brain analysis to determine whether BA 44 and FOP and/or the aINS show activity during two-word processing; (ii) a more restricted volume-of-interest analysis for BA 44 to directly contrast phrase processing vs. word-list processing; and (iii) a cluster-of-interest analysis within BA 44 to localize syntactic binding at the individual subject level. These three analyses allow us to test whether: (i) BA 44 is highly sensitive to structure formation even at the lowest level of phrase structure building; (ii) the fundamental nature of syntactic binding is expressed by a stringent localization in a subregion within BA 44, using a cluster-based approach in which the five clusters by Clos et al. (2013) are used to test sub-regional sensitivity for merge within BA 44; (iii) the invariant character of syntactic binding is reflected in little variance of the localization within this subcluster of BA 44 across individuals.

# MATERIALS AND METHODS

#### Participants

We tested 27 right-handed subjects (Oldfield, 1971), but only 22 subjects (11 female; mean age 28.5 years, standard deviation (SD) 3.62 years; all native German speakers) were included in the analysis. Four subjects were excluded because of poor behavioral performance. One additional subject was excluded because the trial list file was corrupted. The local ethics committee of the University of Leipzig approved all procedures used during the experiment. Written informed consent was obtained from all subjects.

# Stimuli Construction

At the 2-words level, phrasal syntactic contexts (2-PH) comprised eight adjectival determiners, two syllables in length, appearing as the first word: jede/jedes (each), eure/euer (your), jene/jenes (that), diese/dieses (this), followed by 48 different pseudowords (e.g., DIESE FLIRK). List contexts (2-LS) comprised eight nouns selected from the CELEX corpus for German (Baayen et al., 1995), which were matched to the determiners for syllabic length, letter length, and syllabic stress: Apfel (apple), Käse (cheese), Ofen (oven), Efeu (ivy), Motor (motor), Kiwi (kiwi), Haken (hook), Koffer (suitcase). A corresponding example of list context was APFEL, FLIRK. These words and the pseudowords were constructed and controlled according to all relevant psycholinguistic parameters, and for the pseudowords we strongly avoided associative effects with real words by using an automatized screening procedure based on the same CELEX corpus, followed by a final filtering selection done by three native German speakers (see Appendix A–Stimuli construction for detailed information). Importantly, the use of pseudowords in either phrasal or list contexts at the two-word level was directly intended to reduce potential interactions due to semantic activity, given that we were explicitly interested in finding neural correlates of syntactic processing in the brain. In doing so, we took advantage of the intrinsic linguistic distinction between the syntactically prominent functional lexicon (e.g., determiners, prepositions, conjunctions) and the semantically prominent contentive lexicon (e.g., nouns, verbs, adjectives), and coupled it with the semantic-free nature of the pseudowords themselves. 
In this respect, since determiners have less semantic content than do nouns, the choice to use pseudowords instead of real words helped us to: (1) keep syntactic activity at work in determiner-pseudoword contexts, while removing semantic information; (2) remove syntactic and semantic information in the noun-pseudoword contexts, by reducing compounding effects via head de-lexicalization (see Supplementary Material); (3) shed further light on the role of BA 44 as a syntax-sensitive area. Finally, at the 1-word level, the pseudowords were substituted with a series of X's (e.g., XXXXX) to obtain 1-word phrasal contexts (1-PH: DIESE XXXXX) and 1-word list contexts (1-LS: APFEL XXXXX).
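The 2 × 2 design described above (STRUCTURE × WORDS) can be sketched in a few lines of Python. This is only an illustration: the helper name `build_trial` and the dictionary layout are assumptions made for the example, and only one determiner, one noun, and one pseudoword from the text are used.

```python
# Minimal sketch of the 2 x 2 stimulus design (STRUCTURE x WORDS).
# build_trial is a hypothetical helper, not part of the original study materials.

X_STRING = "XXXXX"  # placeholder shown instead of a pseudoword in 1-word conditions


def build_trial(first_word: str, pseudoword: str, two_words: bool) -> str:
    """Return the two-item sequence shown on a trial, in capital letters."""
    second = pseudoword if two_words else X_STRING
    return f"{first_word} {second}".upper()


determiner, noun, pseudoword = "diese", "Apfel", "flirk"

conditions = {
    "2-PH": build_trial(determiner, pseudoword, two_words=True),   # phrase, 2 words
    "2-LS": build_trial(noun, pseudoword, two_words=True),         # list, 2 words
    "1-PH": build_trial(determiner, pseudoword, two_words=False),  # phrase, 1 word + Xs
    "1-LS": build_trial(noun, pseudoword, two_words=False),        # list, 1 word + Xs
}

for label, sequence in conditions.items():
    print(label, sequence)
```

The only difference between 2-PH and 2-LS is the category of the first word, which is the minimal opposition the design relies on.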

# Procedure

Before entering the scanning room, participants performed a short practice session of the actual experiment on a desktop computer located just outside the MR unit area. None of the stimuli used in the instruction session were used during the experimental session. Once in the scanner, stimuli were presented visually using the software package Presentation® (Neurobehavioral Systems, Inc., Albany, CA, USA) with a Sanyo PLC-XP50L LCD XGA (Sanyo Electric Co., Ltd., Moriguchi, Japan; pixels = 1024 × 768; refresh rate = 100 Hz) back-mirror system mounted on the head-coil. Although the projector was already adjusted for minimal luminance, a white font on a gray background was used, and was preferred by all subjects. A monospaced font (Courier) was used (capitalized letters; 45 pt.). Each trial began with a white fixation cross, which remained at the center of the screen until visual stimulation started after a random jitter of either 0 or 1000 ms relative to volume acquisition. Stimulus-onset asynchrony was 8.6 s on average. All trials had a total duration of 900 ms and the items were presented sequentially on the screen one after the other (Supplementary Figure 1). Given that our stimulus construction was syllable-constrained, the first bi-syllabic word remained on the screen for 600 ms, while the second monosyllabic word/X string lasted 300 ms. As soon as the fixation cross reappeared, immediately after the second item within the trial had been shown, subjects were requested to perform a simple sequence judgment task similar to the one used in Friederici et al. (2000a), as quickly as possible, by indicating via triple-choice button-pressing whether the two words together formed a phrase (e.g., DIESE FLIRK = yes), whether they did not form a phrase, but just a list of two nouns (APFEL FLIRK = no), or whether it was a trash trial with X strings (DIESE XXXXX/APFEL XXXXX = trash). 
We used a fully counter-balanced stimulus exposure across conditions, such that in half of the cases the determiner (or the noun) was followed by a pseudoword, and in the other half of the cases by a sequence of Xs. Therefore, subjects could not discriminate between conditions on the basis of the first word of the trial; rather, they were forced to attend to the second word to solve the task. Subjects were requested to use the right index finger, the right middle finger, and the right ring finger to accomplish the task. The order of both buttons and trials was fully randomized across subjects. Each experimental dataset collection lasted ∼42 min.
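The judgment task and the item timing can be summarized in a short sketch. The determiner set is taken from the Stimuli Construction section; the function name `expected_response` and the constant names are assumptions introduced only for this example, not the presentation script used in the study.

```python
# Sketch of the three-way sequence judgment and the trial timing.
# expected_response is a hypothetical helper name.

DETERMINERS = {"jede", "jedes", "eure", "euer", "jene", "jenes", "diese", "dieses"}

FIRST_ITEM_MS = 600   # bi-syllabic first word
SECOND_ITEM_MS = 300  # monosyllabic pseudoword or X string


def expected_response(first: str, second: str) -> str:
    """Map a trial onto the correct button category: 'yes', 'no', or 'trash'."""
    if set(second.upper()) == {"X"}:
        return "trash"  # X-string trials, regardless of the first word
    return "yes" if first.lower() in DETERMINERS else "no"


trial_duration_ms = FIRST_ITEM_MS + SECOND_ITEM_MS  # 900 ms in total
```

Because the "trash" category depends only on the second item, the first word alone never determines the correct button, which mirrors the counterbalancing rationale given above.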

## Behavioral Data Analysis

Mean reaction times for correct responses (RTs) and accuracy rates were calculated for each condition of each participant and were analyzed using a two-way within-subject analysis of variance (ANOVA), with factors STRUCTURE (phrase: PH vs. list: LS) and number of WORDS (2- vs. 1-word). Missing responses were counted as incorrect responses.
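For a 2 × 2 design in which both factors are within-subject, each effect has one numerator degree of freedom, so its F(1, n − 1) value equals the squared one-sample t statistic computed on the per-subject contrast scores. The sketch below uses this equivalence; it is not the authors' analysis script, the function names are hypothetical, and the reaction times are invented purely for illustration.

```python
import math
from statistics import mean, stdev


def contrast_f(scores):
    """F(1, n-1) for a 1-df within-subject effect: squared one-sample t on contrast scores."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t


def two_by_two_rm_anova(data):
    """data: list of per-subject dicts with keys '2-PH', '2-LS', '1-PH', '1-LS'."""
    structure = [(s["2-PH"] + s["1-PH"]) - (s["2-LS"] + s["1-LS"]) for s in data]
    words = [(s["2-PH"] + s["2-LS"]) - (s["1-PH"] + s["1-LS"]) for s in data]
    interaction = [(s["2-PH"] - s["1-PH"]) - (s["2-LS"] - s["1-LS"]) for s in data]
    return {
        "STRUCTURE": contrast_f(structure),
        "WORDS": contrast_f(words),
        "WORDS x STRUCTURE": contrast_f(interaction),
    }


# Invented mean RTs (ms) for four hypothetical subjects; not data from the study.
toy_rts = [
    {"2-PH": 600, "2-LS": 650, "1-PH": 580, "1-LS": 600},
    {"2-PH": 620, "2-LS": 700, "1-PH": 610, "1-LS": 640},
    {"2-PH": 590, "2-LS": 630, "1-PH": 600, "1-LS": 610},
    {"2-PH": 640, "2-LS": 690, "1-PH": 600, "1-LS": 650},
]

f_values = two_by_two_rm_anova(toy_rts)
```

With 22 participants, each of these F values would be evaluated on (1, 21) degrees of freedom.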

# fMRI Data Acquisition

Functional images were acquired with a 3T whole-body Bruker Medspec 3000 Scanner. The functional data were acquired using a T2<sup>∗</sup>-weighted gradient-echo echo-planar-imaging (EPI) sequence, with the following parameters: TR = 2.0 s; TE = 30 ms; flip angle = 90°; FOV = 19.2 × 19.2 cm<sup>2</sup>; in-plane resolution = 3 × 3 mm<sup>2</sup>; data matrix = 64 × 64; slice thickness = 3 mm; interslice gap = 1 mm; number of slices = 30 (axial slices, parallel to the AC-PC line, whole-brain coverage, ascending direction); number of volumes = 1270. T1-weighted 3D MP-RAGE (magnetization-prepared rapid gradient echo) images (Mugler and Brookeman, 1990; TI = 650 ms; TR = 1300 ms; alpha = 10°; FOV = 256 × 240 mm) had previously been acquired with a non-selective inversion pulse to be used for preprocessing of the functional data.
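The reported acquisition parameters are mutually consistent, as the following arithmetic sketch shows. The variable names are chosen for this example only; the slab extent assumes that the interslice gap applies between adjacent slices.

```python
# Consistency check on the functional acquisition parameters reported above.
fov_mm = 192.0            # FOV = 19.2 cm per side
matrix = 64               # data matrix = 64 x 64
slice_thickness_mm = 3.0
gap_mm = 1.0              # assumed to apply between adjacent slices only
n_slices = 30
tr_s = 2.0
n_volumes = 1270

in_plane_mm = fov_mm / matrix                                       # voxel edge in plane
slab_mm = n_slices * slice_thickness_mm + (n_slices - 1) * gap_mm   # axial extent covered
run_minutes = n_volumes * tr_s / 60                                 # total scan time

print(in_plane_mm, slab_mm, round(run_minutes, 1))
```

The in-plane voxel size of 3 mm and a run length of roughly 42 min agree with the values stated in the Procedure and acquisition descriptions.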

# Functional Imaging Data Analysis

Functional data were analyzed using the SPM8 software package (http://www.fil.ion.ucl.ac.uk/spm/). In the pre-processing session, subject-specific functional volumes were co-registered with corresponding structural T1-weighted images. Functional time series were further realigned to the first image to correct for motion artifacts, and resliced for timing correction. A gray-matter segmentation-based procedure was used for normalization to the standard MR template included in the SPM software package. A Gaussian filter of 8 mm<sup>3</sup> FWHM was used to smooth the data. A high pass filter of 128 s was used to attenuate slow global signal changes. These data entered into the analyses described below; they were also used for an additional analysis focusing on the insula and its subregions, presented in a separate article (Zaccarella and Friederici, 2015).

#### fMRI Whole-brain Data Analysis

The SPM8 software package was then used to perform a two-stage random-effects analysis to ensure result generalizability over the population level (Penny and Holmes, 2004). The first five volumes from each dataset were excluded to allow for magnetic saturation effects. Subject-specific general linear models were assessed using the hemodynamic response function from the SPM software (Friston et al., 1995). Single stimulus functions were modeled according to their timing onsets. Error trials and filler trials were modeled as distinct conditions, and movement parameters were treated as regressors of no interest. Contrast estimates for the four experimental conditions (compared against the global mean) were obtained using first-level statistics. The contrast estimates were then used in a second-level within-subjects ANOVA to assess group contrasts. Statistical inferences were drawn at P < 0.05, with a Family-Wise Error (FWE) correction.

#### Volume-of-Interest Analysis

Following our initial hypothesis that BA 44 is responsible for merge processing, we focused on the left infero-frontal region alone, by performing a finer-grained analysis in BA 44 to assess the specific effect of phrases compared to lists, directly (2-PH > 2-LS). The cytoarchitectonically-defined BA 44 from the maximum probability map (MPM; Supplementary Figure 2) of the SPM Anatomy Toolbox (Eickhoff et al., 2005) served as an independent search space to avoid selection bias (Kriegeskorte et al., 2009; Vul and Kanwisher, 2010). A small-volume correction (SVC) was used to threshold the results, at P < 0.05, FWE-corrected.

#### Cluster-of-Interest Analysis

The multi-modal CBP map for BA 44 proposed in Clos et al. (2013) served as mask for the Cluster-of-Interest (COI) analysis. Here we first wanted to simply localize the cortical distribution of the active voxels to the contrast 2-PH > 2-LS, which we found at P < 0.05, FWE-corrected in the SVC analysis discussed above. This map, which is bounded by the same cytoarchitectonic region (MPM) that was used in the above SVC analysis, consists of five sub-regional BA 44 clusters comprising a posterior-dorsal cluster (C1), an anterior-dorsal cluster (C2), an anterior-ventral cluster (C3), a posterior-ventral cluster (C4), and an inferior frontal junction cluster (C5; see also Supplementary Figure 3). For our purposes, the activation mass obtained from the SVC analysis was first transformed into a binary image of zeroes (not-active voxels) and ones (active voxels), and then dot-multiplied with each cluster volume from the BA 44 parcellation map described above. Following this procedure we then counted the total number of ones (active-voxels) falling within each cluster to determine the overlapping region. Additionally, the five CBP clusters were further used as seed sub-regions to extract signal intensity, to evaluate the mean activity distribution of the syntactic binding effect across the different clusters. Mean signal extraction from the five clusters was done using Marsbar 0.41 for SPM (available at http://marsbar.sourceforge.net).
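The overlap count described above (binarize the activation, multiply it element-wise with each cluster mask, count the remaining ones) can be sketched with toy data. Volumes are flattened to short lists here, all values are made up, and `count_overlap` is a hypothetical helper name; the actual analysis operated on 3D images.

```python
# Toy sketch of the cluster-of-interest overlap count.
# Volumes are flattened to 1D lists of 0/1 values; all data here are invented.

def count_overlap(active_mask, cluster_mask):
    """Element-wise product of two binary masks, then count the surviving ones."""
    if len(active_mask) != len(cluster_mask):
        raise ValueError("masks must cover the same voxels")
    return sum(a * c for a, c in zip(active_mask, cluster_mask))


active = [0, 1, 1, 1, 0, 0, 0, 0]  # thresholded SVC result, binarized
clusters = {
    "C1": [1, 0, 0, 0, 0, 0, 0, 0],
    "C2": [0, 0, 0, 0, 1, 1, 0, 0],
    "C3": [0, 1, 1, 1, 0, 0, 0, 0],
    "C4": [0, 0, 0, 0, 0, 0, 1, 0],
    "C5": [0, 0, 0, 0, 0, 0, 0, 1],
}

overlap = {name: count_overlap(active, mask) for name, mask in clusters.items()}
n_active = sum(active)
percent = {name: 100.0 * n / n_active for name, n in overlap.items()}
```

Since the clusters do not overlap one another, the per-cluster counts sum to at most the number of active voxels, and the percentages indicate how the activation is distributed over the parcellation.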

#### Individual Peak Activity Distribution Analysis

Finally, we were interested in assessing whether, at the individual level, the peak distribution of neural activity across individual subjects was homogeneously spread within BA 44, or rather gathered around one of the CBP clusters described above, therefore showing little variance over space. To evaluate cluster sensitivity at the subject level, we again used the map for BA 44 as a binary searchable space, and dot-multiplied it with the subject-specific contrast we obtained as T maps from first-level statistics, for the contrast (2-PH > 2-LS). For each subject, we then extracted a unique 3D coordinate maximum corresponding to one voxel. Each 3D coordinate was in turn localized as belonging to a particular cluster, using each BA 44 sub-regional cluster as an independent mask, following a counting procedure analogous to the one described in the COI analysis above. On the resulting distribution, we first performed a standard chi-square test. We then employed a randomization test of goodness-of-fit to strengthen, or possibly weaken, the significance of our cluster-sensitivity. We therefore drew 10,000 random samples from a population with the known proportions obtained from the data, re-calculated the chi-square for each replicate sample, and then counted how many times a larger chi-square value was obtained during randomization (McDonald, 2009). The proportion of replicates with chi-square values equal to or greater than the first observed value was then taken as the final p-value. A threshold of p = 0.05 (5% of times) was chosen.
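A randomization test of goodness-of-fit of this kind can be sketched as follows. Several points are assumptions made for the example rather than details given in the text: replicate samples are drawn under a null hypothesis in which a peak is equally likely to fall in each of the five clusters, the peak counts are hypothetical, and the function names are invented.

```python
import random


def chi_square(observed, expected_props):
    """Pearson chi-square statistic for observed counts against expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


def randomization_p(observed, expected_props, n_rep=10_000, seed=0):
    """Share of random samples whose chi-square is >= the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    categories = range(len(observed))
    observed_stat = chi_square(observed, expected_props)
    hits = 0
    for _ in range(n_rep):
        # Draw n peaks at random according to the expected (null) proportions.
        draws = rng.choices(categories, weights=expected_props, k=n)
        counts = [draws.count(c) for c in categories]
        if chi_square(counts, expected_props) >= observed_stat:
            hits += 1
    return hits / n_rep


# Hypothetical peak counts for clusters C1..C5 (22 subjects); not the study's data.
peaks = [2, 3, 12, 3, 2]
null_props = [0.2] * 5  # assumption: peaks equally likely in each cluster

p_randomization = randomization_p(peaks, null_props)
```

The returned proportion plays the role of the p-value described above and is compared against the 0.05 threshold.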

# RESULTS

# Behavioral Results

No significant effect for accuracy was found. A significant effect for STRUCTURE [F(1, 21) = 25.003; p < 0.0001] and an interaction between WORDS and STRUCTURE [F(1, 21) = 5.896; p < 0.05] were found for the reaction time data. A series of paired t-tests revealed that subjects were slower for 2-LS compared to 2-PH (t = 3.93; p < 0.001). An almost significant difference for LS: 2 > 1 was found (p = 0.059), while there was no significant difference for the contrast PH: 2 > 1 (p > 0.1; Supplementary Figure 4).

#### Whole-brain Analysis

We found a main effect of WORDS in the left and right FOP/adINS at x = −33; y = 23; z = −2 and x = 36; y = 23; z = −2, respectively, and in the left BA 44 at x = −48; y = 11; z = 7. The main effect of STRUCTURE, as well as the WORDS × STRUCTURE interaction, did not yield any significant clusters that survived standard statistical thresholds (Supplementary Table 1).

#### Volume-of-Interest Analysis

The hypothesis-driven analysis in BA 44 revealed a significant cluster for phrase compared to list at the two-word level (2-PH > 2-LS) in the ventral anterior part of BA 44 at x = −48; y = 17; z = 16, using a small volume correction analysis (SVC) in the area. No significant voxel was found active for the opposite contrast (2-PH < 2-LS), even at more liberal thresholds. Of note, prior to the direct comparison between phrase and list at the two-word level, we ran an ANOVA with factors WORDS and STRUCTURE within BA 44 to gain information about a possible interaction within the region. We identified the voxel showing the highest peak within the region from the unthresholded WORDS × STRUCTURE activation map obtained from the SPM group-averaged output (x = −51; y = 20; z = 13). From these 3D coordinates we then extracted signal intensity for all four conditions to verify whether an interaction between WORDS and STRUCTURE would survive statistical control. We found a significant interaction between the two factors at the p = 0.039 level [F(1, 21) = 4.83]. Direct comparison between 2-PH and 1-PH was significant at the p < 0.001 level (t = 3.87), as was the comparison between 2-PH and 2-LS at the p = 0.007 level (t = 2.94). Direct comparison between 2-LS and 1-LS was not significant (t = 1.67; p = 0.11; see **Figure 1A**; see also Supplementary Table 1 and Supplementary Figure 5 for the interaction effect). We further performed additional independent analyses in the other regions that were found to be active for the main effect of WORDS, to evaluate whether syntactic binding was specifically performed in BA 44 alone, or whether additional portions of the cortex were also involved. There was no difference between phrases and lists in the other regions under analysis (see also Appendix B–Volume-of-interest Analysis for more information). 
To gain further exploratory indication of the relative functional contribution of phrases and lists to the main effect of WORDS in the inferior frontal regions, and to verify the results we obtained from the SVC analysis, we then went back to our full brain datasets and performed two distinct planned contrasts at the two levels of the STRUCTURE factor. For the contrast lists vs. one-word list condition (LS: 2 > 1) we found activity in the opercula only, at x = −33; y = 23; z = −2 and x = 36; y = 23; z = −2. For the contrast phrases vs. one-word phrase condition (PH: 2 > 1) we found additional recruitment of the ventral-anterior portion of left BA 44 at x = −51; y = 11; z = 7, which, together with the LS: 2 > 1 results, suggests a stronger involvement of BA 44 for phrases than for lists. At the whole-brain level, however, there was no significant difference in activity for the contrast between phrases and lists (2-PH > 2-LS) using standard threshold methods.

#### Cluster-of-Interest Analysis within BA 44

All voxels active for the contrast phrase vs. list at the two-word level (2-PH > 2-LS) in the SVC in BA 44 fell within the anterior-ventral cluster C3 (100% overlap; 12/12 voxels; see **Figure 1B**). Remarkably, paired t-tests for signal intensity revealed a significant difference in activity between phrases and lists (2-PH > 2-LS) in the C3 cluster only [t(21) = 2.97, p = 0.007, surviving Bonferroni correction for the number of tests (p = 0.05/5 tests = 0.01)], and in no other sub-region (see **Figure 2** and Supplementary Figure 6).
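The Bonferroni criterion applied to the five cluster-wise tests amounts to the following arithmetic. The function name is hypothetical; the alpha level, the number of tests, and the C3 p-value are those reported above.

```python
# Bonferroni threshold for the five cluster-wise paired t-tests.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 5)  # one test per CBP cluster (C1..C5)
p_c3 = 0.007                               # reported p-value for cluster C3
c3_survives = p_c3 < threshold
```

Only a cluster whose p-value falls below the per-test threshold is reported as showing a reliable phrase vs. list difference.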

#### Individual Peak Activity Distribution Analysis

A strong cluster-sensitive distribution of the individual peak activity for the contrast (2-PH > 2-LS) was found in C3, as compared to the other BA 44 sub-regions [χ<sup>2</sup>(4) = 13.45, p = 0.009, confirmed by a randomization test of goodness-of-fit with 10,000 replicates; see **Figure 3**].
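The reported p-value can be checked against the chi-square statistic without a statistics library, because the upper-tail probability of a chi-square distribution has a closed form when the degrees of freedom are even. The sketch below uses that closed form; the function name is an assumption for the example.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    # Q(x; 2m) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_sf_even_df(13.45, 4)
print(round(p_value, 3))  # 0.009
```

With four degrees of freedom (five clusters minus one), the statistic of 13.45 corresponds to a p-value of about 0.009, in agreement with the value reported above.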

# DISCUSSION

The goal of the present fMRI study was to identify the neuroanatomical basis of the most fundamental syntactic computation, which is at the root of all natural languages (Chomsky, 1995; Berwick et al., 2013). This basic computation, called merge, which binds two words together syntactically, allows syntactic structures of increasing hierarchy to be built. Here we found that this most basic process of syntactic binding corresponded to increased activity in a highly confined brain region, i.e., the anterior section of the ventral left pars opercularis, BA 44, at the posterior part of Broca's area. Conversely, a phylogenetically older area, the FOP/adINS, was found to be equally active for both phrasal structures and unstructured word lists, not discriminating between these.

With respect to the FOP/aINS, the present analysis indicates that this area's function, previously identified by a region-of-interest analysis of the insula, is also identifiable at the whole-brain level, thereby extending results from the same study (Zaccarella and Friederici, 2015). Its involvement, revealed by the contrast between two- and one-word stimuli, may reflect word-accumulation processes during which the categorical information and the grammatical status of the word are first accessed (Friederici et al., 2000b) and then briefly maintained on hold (Grasby et al., 1994) before further processing takes place. A similar activity pattern for both phrases and lists is not surprising, given the low degree of functional specialization that the FOP/adINS has inside the language processing system (Saygin et al., 2004; Mutschler et al., 2009). This lower specialization of the left FOP/adINS compared to BA 44 also finds support in another fMRI study, which showed that this area, in contrast to BA 44, was not able to distinguish between grammar types, but only able to detect an error in the order of syllables in sequences (Friederici et al., 2006a).

In BA 44 we found that the basic operation of syntactic merge was sensitive to a specific cluster within the region, such that only the anterior-ventral cluster, one among five sub-regions, discriminated between phrases and lists. Crucially, we discovered that the localization of this activity within the same anterior-ventral cluster was highly consistent across participants. While a previous functional connectivity-based meta-analysis described this subregion as being associated with all kinds of language processes (Clos et al., 2013), the present study is the first to delineate this subregion from other regions within BA 44 in its function in language. This particular subregion appears to be responsible for the most basic syntactic computation at the root of all syntactic hierarchies. Remarkably, within the neurolinguistic literature, the contribution of parts of Broca's area, in particular BA 44, to syntactic processing has mostly been discussed in terms of syntactic complexity at the sentential level, since the area was found to be crucial for the processing of syntactically more complex sentential hierarchies, compared to simpler ones (Röder et al., 2002; Friederici et al., 2006a,b; Bahlmann et al., 2008). The present data are in line with the view of BA 44 being activated as a function of structural hierarchy, but they clearly go beyond this view by demonstrating that the most basic syntactic computation, upon which more complex hierarchies are built, can be neuroanatomically located in a sub-region of BA 44. This means that because both complex and simpler linguistic hierarchies necessarily share the same computational merging algorithm, BA 44 activates as a function of structural hierarchy regardless of the linguistic complexity itself.

The sustained cluster-sensitivity of the merge computation in the anterior-ventral cluster of BA 44, both at the group and the individual level, points toward a fundamental and constrained nature of merge at the neural level. While we acknowledge that the present work only tested one single language, the low inter-individual spatial variability we found across our representative set of subjects might be taken as a first approximate indication in favor of the fundamental character of the computation itself. The present finding of a sub-regional specificity and invariability of the most basic process closely resembles the neural organization of other basic sensory processes, as for example in the visual system (Downing et al., 2006), while the low inter-individual variability of the basic processes gives rise to the assumption that their function-to-structure relation is predetermined. In this respect, because of the essential nature of merge as the shared computation across all human languages, future studies should systematically focus their attention on the neural implementation of very basic linguistic processes in multiple languages. This would ultimately show whether the universality of merge at the theoretical level (Chomsky, 1995) adequately corresponds to some neuroanatomical generalizability at the neural level, as suggested here.

From an evolutionary perspective, no clear evidence for a human-like language syntax in nonhuman species has been presented so far (Fitch and Hauser, 2004; Bolhuis et al., 2014). Structural studies have shown that the FOP is a phylogenetically older cortex, which is fully represented in monkeys (Sanides, 1962), while BA 44 seems to be more expressed in humans than in monkeys, in which it plays a role in orofacial somatomotor processes (Petrides et al., 2005). The activation pattern we report in our study closely resembles the one proposed for human and non-human artificial grammar processing in the adult brain, in which violations to transition probabilities are found to activate FOP, while violations to more rules activate BA 44 (Friederici et al., 2006a). This functional split between the labor of FOP/aINS and that of BA 44 is particularly intriguing if put into relation with recent theoretical linguistic models, which propose that the specificity of merge in language should reside in the property that words have to create constituents where the lexical label of the single dominant word (e.g., determiner) is reflected as a hierarchical influence onto the newly created syntactic constituent (e.g., determiner phrase; Boeckx, 2010; Chomsky, 2013; Murphy, 2015). At the interface between linguistic theory and neurolinguistics, the merge mechanism would then consist of two phases: one in which linguistic elements are strung together without any hierarchical dimension in the FOP/aINS, and a labeling phase in which the dominant lexical element transforms the string into a hierarchically labeled syntactic structure in the anterior-ventral BA 44. We believe that this speculation can constitute a testable model for the evolution of the language faculty, in which behavioral, functional and anatomical data can be put together in a comparative perspective within and across human and animal species (Murphy, 2015).

# CONCLUSION

The process of syntactic binding, called merge, has a strict and sub-anatomically specific neural basis in the anterior-ventral cluster of BA 44. The profoundly constrained regional localization of this syntactic operation converges on the conclusion that the computation at the root of our syntactic knowledge has a strict neural basis. The constrained localization of the activity and its consistency across participants point toward the fundamental neurobiological nature of the operation of merge itself, thereby providing a novel view on the relation between linguistic theory and neurobiology.

# ACKNOWLEDGMENTS

We wish to thank all participants for their involvement, Mandy Jochemko, Anke Kummer, and Simone Wipper for MRI data acquisition, Lars Meyer for his valuable support during the analysis of the functional data, Michiru Makuuchi, Katharina von Kriegstein, and Isabell Wartenburger for their comments on earlier versions of the manuscript, Francie Manhardt for evaluating the stimulus material, and Kerstin Flake for helping us create the figures. This work was supported by a grant of the European Research Council (ERC-2010-AdG 20100407 awarded to AF) and by the Berlin School of Mind and Brain (EZ).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01818

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Zaccarella and Friederici. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# 'Syntactic Perturbation' During Production Activates the Right IFG, but not Broca's Area or the ATL

#### William Matchin<sup>1</sup> \* and Gregory Hickok<sup>2</sup>

<sup>1</sup> Cognitive Neuroscience of Language Laboratory, Department of Linguistics, University of Maryland, College Park, MD, USA, <sup>2</sup> Auditory and Language Neuroscience Laboratory, Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, USA

Research on the neural organization of syntax – the core structure-building component of language – has focused on Broca's area and the anterior temporal lobe (ATL) as the chief candidates for syntactic processing. However, these proposals have received considerable challenges. In order to better understand the neural basis of syntactic processing, we performed a functional magnetic resonance imaging experiment using a constrained sentence production task. We examined the BOLD response to sentence production for active and passive sentences, unstructured word lists, and syntactic perturbation. Perturbation involved cued restructuring of the planned syntax of a sentence mid-utterance. Perturbation was designed to capture the effects of syntactic violations previously studied in sentence comprehension. Our experiment showed that Broca's area and the ATL did not exhibit response profiles consistent with syntactic operations – we found no increase of activation in these areas for sentences > lists or for perturbation. Syntactic perturbation activated a cortical-subcortical network including robust activation of the right inferior frontal gyrus (RIFG). This network is similar to one previously shown to be involved in motor response inhibition. We hypothesize that RIFG activation in our study and in previous studies of sentence comprehension is due to an inhibition mechanism that may facilitate efficient syntactic restructuring.

#### Edited by:

Cedric Boeckx, Catalan Institution for Research and Advanced Studies/Universitat de Barcelona, Spain

#### Reviewed by:

Olaf Hauk, MRC Cognition and Brain Sciences Unit, UK

Ina Bornkessel-Schlesewsky, University of South Australia, Australia

> \*Correspondence: William Matchin wmatchin@umd.edu

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 12 December 2015 Accepted: 05 February 2016 Published: 23 February 2016

#### Citation:

Matchin W and Hickok G (2016) 'Syntactic Perturbation' During Production Activates the Right IFG, but not Broca's Area or the ATL. Front. Psychol. 7:241. doi: 10.3389/fpsyg.2016.00241

Keywords: syntax, sentence processing, language, fMRI, inferior frontal gyrus, Broca's area, ATL, production

# INTRODUCTION

Language can be analyzed as a cognitive faculty consisting of several components, including a core structure-building system – syntax – that operates over stored lexical atoms (Chomsky, 1982, 1995; Hauser et al., 2002). Much work attempting to localize syntactic operations has focused on Broca's area (Stromswold et al., 1996; Hagoort, 2005; Grodzinsky and Santi, 2008; Friederici, 2011) in the left inferior frontal gyrus (LIFG; Brodmann areas 44 and 45) and the anterior temporal lobe (ATL; Rogalsky and Hickok, 2009; Bemis and Pylkkänen, 2011; Brennan et al., 2012). However, the response profile of Broca's area during sentence comprehension appears to be more compatible with a domain-general function such as working memory or cognitive control than with syntax (Kaan and Swaab, 2002; Novick et al., 2005; Rogalsky and Hickok, 2011; Bornkessel-Schlesewsky and Schlesewsky, 2013), although this is still a hotly debated issue (Hickok and Rogalsky, 2011; Fedorenko et al., 2012b). Similarly, recent neuroimaging and neuropsychological studies have implicated the ATL in semantic rather than syntactic processes (Rogalsky and Hickok, 2009; Pallier et al., 2011; Del Prato and Pylkkanen, 2014; Wilson et al., 2014).

We chose to perform a functional magnetic resonance imaging (fMRI) experiment during sentence production to contribute to this debate. Sentence production studies in fMRI and magnetoencephalography (MEG) have revealed large overlap with the activation patterns found in comprehension, suggesting that similar neural networks underlie sentence processing in both modalities (Braun et al., 2001; Blank et al., 2002; Haller et al., 2005; Golestani et al., 2006; Menenti et al., 2011; Segaert et al., 2012; Del Prato and Pylkkanen, 2014; Pylkkänen et al., 2014). These studies are informative with respect to the neurobiology of sentence production; however, few production studies have manipulated syntactic variables compared to the vast literature on syntactic processing in comprehension. Syntax production studies will be important to provide complementary evidence to the comprehension literature to better understand syntactic processing in the brain.

We attempted to parallel the effects of syntactic violations that several researchers have used to study syntax in comprehension (e.g., Embick et al., 2000; Moro et al., 2001; Friederici et al., 2003). However, there are significant obstacles in extending the violation approach to production. Instructing subjects to produce artificial syntactic errors means that subjects will expect the upcoming violation. This may eliminate the effect of interest because expectation lessens the strength of the neural response to syntactic violations (Lau et al., 2006). The short time constraints of fMRI make it difficult to capture infrequent natural errors or to use a paradigm that induces subjects to produce them (Ferreira and Swets, 2005). For these reasons, we forced subjects to intermittently and unexpectedly switch their planned syntactic structure mid-utterance. The logic is that switching structures increases demands on the neural resources involved in syntactic processing (as well as on other mechanisms). We expected to capture this effect in the blood-oxygen level-dependent (BOLD) response during scanning.

In the present study, we controlled the syntactic structure of the subjects' utterances with a constrained sentence elicitation task, similar to Caplan and Hanna (1998). To induce syntactic restructuring, we borrowed the target perturbation paradigm from motor control research (Paulignan et al., 1991; Elliott et al., 1995; Izawa et al., 2008). In this paradigm subjects attempt to hit a target, e.g., reaching from one point to another on a screen. On most trials the subject's target and/or sensory feedback remain constant throughout the trial. On a smaller proportion of random trials, the subject's target or sensory feedback is altered mid-movement. For example, the target location changes, or a force is applied to the subject's arm. On such trials, the subject must adapt and correct the movement trajectory online to reach the goal. We adapted this approach to syntax, dubbing our paradigm "syntactic perturbation."

We trained subjects to produce either active sentences (e.g., Susan is following Charlie) or passive sentences (e.g., Charlie is being followed by Susan). On most trials (standard trials, 80%) subjects did not switch their planned structure. On a smaller proportion of random trials (switch trials, 20%) a cue prompted subjects to switch structures mid-utterance. In other words, on switch trials the cue prompted the subject to switch from active to passive or from passive to active. This task was artificial, raising questions about the ecological validity of our experiment – such considerations should be kept in mind when evaluating the results. However, we assumed subjects would update their syntactic structure regardless of this artificial nature.

The key assumptions of our experiment are the following: (1) the planned syntactic structure of an utterance is built in advance of speech production (at least for mono-clausal active and passive sentences), and (2) this plan can be dynamically updated during speech production. The first assumption is supported by the fact that juxtaposition errors often occur for words or phrases of the same syntactic category and from the same syntactic position (Fromkin, 1971) – to account for this regularity, speakers must have built the syntactic structure in advance of articulation. The second assumption is supported by an experiment showing that speakers decrease their rate of speech predictively if the structure they ultimately utter contains a syntactic violation (Ferreira and Swets, 2005).

Our design consisted of two main contrasts: STRUCTURE (sentences > word lists) and PERTURBATION (switch trials > standard trials), and one secondary contrast: COMPLEXITY (passive sentences > active sentences). Using our novel paradigm in production rather than comprehension, we examined the response profile of these contrasts in areas traditionally associated with syntax. Our main goal was to further inform the debate on the role of these regions in syntactic processing. Also, we believe that the discussion of the neurobiology of syntax has focused overwhelmingly on the ATL and Broca's area because of the repeated use of similar experimental manipulations in comprehension. We sought to determine whether our experiment, differing in modality and task, found activation in areas outside of these regions, potentially indicating a role for them in syntactic processing. We discuss our predictions for each of these contrasts in turn.

The contrast of sentence > word lists in comprehension has frequently revealed activation in the ATL, often bilaterally but also left lateralized (e.g., Mazoyer et al., 1993; Humphries et al., 2005; Rogalsky and Hickok, 2009). Two recent MEG studies found increased activation for preparation to produce two-word phrases compared to production of single words (Pylkkänen et al., 2014) and two-word lists (Del Prato and Pylkkanen, 2014). Current research supports a semantic interpretation of ATL function that drives these effects. We expected that the contrast of STRUCTURE in our study would also activate the ATL because the production of sentences presumably requires semantic processing that the production of lists does not.

The sentence > list contrast in comprehension occasionally activates Broca's area (Bedny et al., 2011; Fedorenko et al., 2011; Pallier et al., 2011), but these effects are much less consistent than for the ATL (Rogalsky and Hickok, 2011). This suggests that this activation reflects the contribution of working memory resources or cognitive control mechanisms needed to parse difficult input rather than fundamental syntactic operations (Novick et al., 2005; Rogalsky et al., 2008). Our sentences are short, simple in structure, and guided by a strict template; we believe this minimizes demands on these mechanisms. Therefore we did not expect STRUCTURE to activate Broca's area.

For similar reasons, we also expected that Broca's area would not show a significant effect of PERTURBATION. During switch trials, subjects had an unambiguous selection of the alternative sentence construction, which should minimize selection demands (Miller and Cohen, 2001). With respect to working memory, these resources are taxed in conditions that place heavy demands on maintenance of information or retrieval across intervening material (Baddeley, 1992; Gibson, 2000; Lewis et al., 2006). During switch trials, subjects had to quickly restructure their utterance but did not have to maintain additional material or retrieve information across long distances.

We did not have equally strong predictions for the PERTURBATION contrast in the ATL. Although the existing evidence does not support a role for the ATL in syntax, changes in syntactic structure lead to changes in semantic interpretation (Chomsky, 2014). If the ATL plays a role in combinatory semantics, syntactic restructuring might induce activation in this region for semantic processes. Therefore we expected to potentially see increased activation for PERTURBATION in the ATL.

Significant effects of PERTURBATION outside these areas could reflect syntactic operations in areas not traditionally associated with syntax, such as subcortical areas (see Lieberman, 2001; Ullman, 2004; Boeckx and Benítez-Burraco, 2014 for these non-standard proposals). Activation for this contrast could also reflect non-syntactic mechanisms. These could be linguistic (e.g., reanalysis of thematic role assignment), or non-linguistic (e.g., error detection, attention). In the discussion section we discuss robust activation of the right IFG for PERTURBATION in the context of the literature on action inhibition and the role it may play in syntactic restructuring.

The secondary COMPLEXITY contrast (passive/complex > active/simple) is an extension of previous work that has shown increased activation in Broca's area for passive sentences (e.g., Ye and Zhou, 2009; Mack et al., 2013). The standard interpretation of this finding is that increased syntactic processing resources are used to process passive sentences. However, while earlier approaches in generative grammar (Chomsky, 1982, 2002, 2014) posited a syntactic complexity difference between passives and actives (application of a movement operation in passives), modern syntactic theory does not (largely due to the VP-internal subject hypothesis, under which active sentences also involve movement of the subject; Kitagawa, 1994). Any complexity difference between passives and actives therefore likely lies in non-syntactic factors, such as the mapping of arguments to thematic roles (we thank an anonymous reviewer for pointing this out). The fact that Broca's area does show increased activation for passives during comprehension supports a non-syntactic interpretation of the function of this region. Whether this region shows increased activation for passive compared to active sentences during production is an open question and should inform hypotheses of this region's function. We did not have strong predictions for this contrast, but included it because it allowed comparison with previous research in comprehension.

# MATERIALS AND METHODS

# Subjects

Twenty-one right-handed native speakers of English (age 19–33, 10 female) volunteered for participation. Subjects had normal or corrected-to-normal vision, no hearing impairment, and reported no history of neurological disorder. Subjects were paid \$10 for participation in a 1-hour behavioral training session. One subject was excluded from the fMRI portion of the experiment due to difficulty with the task during the behavioral session, resulting in 20 remaining subjects in the fMRI experiment. Subjects were paid \$30 an hour for participation in the fMRI session. Consent was acquired from each subject before participation and all procedures were approved by the Institutional Review Board of UC Irvine.

# Stimuli

The stimulus for each trial consisted of a cue that progressed through three stages: PREP, GO, and FINISH. Every stimulus had the same basic appearance: simple line drawings of the people engaged in the target sentence, the names of the people in large font next to the drawings, the verb to be used in the sentence in the middle of the screen, and an arrow underneath the verb pointing to the right or the left (**Figure 1**). Identical stimulus presentation was used for both sentences and lists – only the subject's task changed. Twenty different transitive verbs were used. Verb length varied from one to three syllables. Verbs were selected for a mix of articulatory complexity. Here is the complete list of verbs: admire, deceive, examine, follow, frighten, greet, harass, help, hug, kick, kiss, pinch, poke, protect, punch, push, rob, scare, tease, tickle.

Verbs were randomly distributed throughout the experimental runs. Four people were used with these names: Mary, Susan, Charlie, and Kevin. The first person was always a different gender than the final person, and people were randomly distributed in different positions throughout the experimental runs. Three people appeared on each cue: one person on the left (START) and two on the right (END). The END people were displayed vertically, one above the other. During the first stage, the PREP stage, a rectangular box surrounded the START person. The arrow, in black color, pointed from the START person horizontally toward the middle of the END people, and not directly toward either of them. During the second stage, the GO stage, the rectangular box disappeared, serving as a "go" signal for the subject to begin articulating. During the third stage, the FINISH stage, the arrow turned blue and tilted up or down to point to the target END individual for that trial. This design forced the subject to begin articulating without knowing which person to end the sentence with and to use the information provided at the FINISH stage to complete the sentence with the correct person. The PREP stage lasted for 500 ms. We chose this time to give subjects enough time to process the information and plan their utterances. The GO stage began immediately after the 500 ms, and subjects began articulating in synchrony with the disappearance of the box. The interval between the GO stage and the FINISH stage was 300 ms, and the FINISH stage remained on the screen for 1000 ms, followed by fixation until the next trial. During the behavioral training session, the subject would initiate the next trial whenever ready. During the fMRI session, the inter-trial interval was fixed at 4200 ms, for a total trial duration/inter-trial interval of 6 s.
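The stage durations given above add up to the stated 6 s per fMRI trial. The following sketch lays out the onsets of each stage within a trial; the constant and function names are illustrative and do not come from the authors' presentation script.

```python
# Durations in milliseconds, as stated in the text.
PREP_MS = 500            # box around the START person; utterance planning
GO_TO_FINISH_MS = 300    # box disappears; subject starts speaking
FINISH_MS = 1000         # arrow turns blue and points to the target END person
ITI_MS = 4200            # fixation until the next trial (fMRI session only)


def trial_timeline():
    """Return ([(stage, onset_ms), ...], total_duration_ms) for one fMRI trial."""
    stages = [
        ("PREP", PREP_MS),
        ("GO", GO_TO_FINISH_MS),
        ("FINISH", FINISH_MS),
        ("FIXATION", ITI_MS),
    ]
    timeline, onset = [], 0
    for name, duration in stages:
        timeline.append((name, onset))
        onset += duration
    return timeline, onset


timeline, total_ms = trial_timeline()
for name, onset in timeline:
    print(f"{name:<8} starts at {onset} ms")
print(f"total trial duration: {total_ms / 1000} s")
```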

#### Task

The task was to produce either sentences or lists and to restructure the utterance appropriately in response to the switch cue. This resulted in a 2 × 2 design: **STRUCTURE** (sentence, list) and **PERTURBATION** (standard, switch). In the sentence condition, subjects produced sentences with either active or passive construction using the template detailed below. These two constructions comprised an additional sub-factor within the sentence condition, **COMPLEXITY** (active/simple, passive/complex). Active sentences were cued with an arrow pointing away from the first person, and passive sentences were cued with an arrow pointing toward the first person (**Figure 1**). Active sentences were produced with this template: (person 1) **is** (verb)**ing** (person 2), e.g., Mary is following Charlie. Passive sentences were produced with this template: (person 1) **is being** (verb)**ed by** (person 2), e.g., Mary is being followed by Charlie. We instructed subjects to use the progressive aspect on every trial and not to deviate from the template. In the list condition, subjects produced a list of words based on the information from the cue. Subjects ignored the identity of the particular verb on the cue and did not use it in their lists. When the arrow pointed to the right (as in active sentences), subjects produced a list with this template: (person 1) "word **right** arrow" (person 2), e.g., "Mary word right arrow Charlie." When the arrow pointed to the left (as in passive sentences), subjects produced a list with this template: (person 1) "word **left** arrow" (person 2), e.g., "Mary word left arrow Charlie." We chose the word *word* to approximately control for the duration of planning and articulation that would take place for the word *is* in the sentence condition. This timing was relevant to when the subjects were cued to restructure during their utterance (we discuss this in more detail below). Subjects made their utterances at a natural speaking rate.
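The three utterance templates can be written out as simple string builders. This is only an illustrative sketch: the function names are hypothetical, and the inflected verb forms are passed in directly rather than derived, since the text does not describe any automatic morphology.

```python
def active_sentence(person1: str, verb_ing: str, person2: str) -> str:
    """(person 1) is (verb)ing (person 2)"""
    return f"{person1} is {verb_ing} {person2}"


def passive_sentence(person1: str, verb_ed: str, person2: str) -> str:
    """(person 1) is being (verb)ed by (person 2)"""
    return f"{person1} is being {verb_ed} by {person2}"


def word_list(person1: str, direction: str, person2: str) -> str:
    """(person 1) word right|left arrow (person 2); the verb on the cue is ignored."""
    if direction not in ("right", "left"):
        raise ValueError("direction must be 'right' or 'left'")
    return f"{person1} word {direction} arrow {person2}"


print(active_sentence("Mary", "following", "Charlie"))
print(passive_sentence("Mary", "followed", "Charlie"))
print(word_list("Mary", "right", "Charlie"))
```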

Subjects did not know how to complete the sentence/list at the beginning of each trial. The FINISH stage indicated which person (top or bottom) would be the second person in the sentence. Subjects were instructed to begin their utterances at the GO stage and use the information on the FINISH stage to determine which name to produce. We set the ISI between the GO stage and the FINISH stage to be 300 ms to allow subjects enough time while speaking naturally to update their utterance without making mistakes on the switch trials. As an example, if the target sentence were "Mary is following Charlie," at the GO stage subjects started speaking "Mary is following...", then 300 ms later at the FINISH stage they updated their plan to include "Charlie" and finished. Similarly for the list condition, if the target list were "Mary word left arrow Charlie," they would start speaking "Mary word left arrow..." at the GO stage, and update at the FINISH stage to include "Charlie."

Standard trials occurred as described above; switch trials involved not only updating person 2, but also switching the orientation of the arrow mid-production (**Figure 2**). On sentence switch trials, subjects switched their target sentence from active to passive or vice versa, e.g., Mary is following (person 2) → Mary is **being followed by Charlie**. During list switch trials, subjects needed to switch whether they said right arrow or left arrow, e.g., Mary word left arrow (person 2) → Mary word **right arrow Charlie**. Standard and switch trials were presented at a 4/1 ratio and in random order within each run, such that subjects could not predict what the next trial would be. We used this ratio because this approximate ratio was used in previous studies of target perturbation and fMRI (3/1 ratio used by Tourville et al., 2008), and a smaller ratio of standard to switch trials might have resulted in anticipation of switch trials. We did not want subjects to use a strategy of not committing to a syntactic plan on every trial in order to avoid errors.

The sentence and list conditions were presented in separate runs to avoid confusion and task-switching effects. To balance the spatial orientation of the cues, we counterbalanced across sides by presenting subjects with cues that flowed from left to right (depicted in **Figures 1** and **2**) and cues that flowed from right to left (active sentences correspondingly began with a left arrow instead of a right arrow). Subjects always received two runs from either the sentence or list condition in a row, one each of left and right cue orientation (order counterbalanced across subjects), and we collapsed all analyses across the two orientations.

# Behavioral Training Session

Before running the experiment in the fMRI scanner, we familiarized subjects with the task in a behavioral training session. We wanted subjects to be well prepared for the task in the fMRI scanner to limit variance in performance as well as minimize effects of exposure. In the training session we explained the task to the subjects, including a demonstration by the experimenter on several trials. Then, subjects were asked to perform the task themselves. In the first several trials, the experimenter remained in the testing room to give feedback and instruction. When the subject grasped the task, the experimenter left the room and the subject proceeded self-paced. Subjects performed both tasks with both orientations for a total of four experimental runs, consisting of 50 trials apiece, for a total of 100 trials in the sentence condition and 100 trials in the list condition. The subjects' utterances were recorded and their performance was analyzed. A subject's response was considered an error if they produced the incorrect sentence construction (e.g., active instead of passive), produced the word right instead of the word left (or vice versa), or if they made a speech error during the trial (e.g., produced the wrong speech sound, extensive delays, etc.). Substituting the names of people (e.g., Mary instead of Susan) or substituting one verb for another (e.g., push instead of punch) were not counted as errors, unless the subject also made an additional error as described above. We were only able to collect and analyze behavioral data from 14 out of 20 subjects due to equipment issues. To assess the effect of perturbation on behavior, we averaged across constructions in the sentence conditions and direction in the list conditions. We then performed a 2 × 2 ANOVA (STRUCTURE × PERTURBATION). Subjects underwent the fMRI portion of the experiment after completing the behavioral session, either the same day or on a subsequent day, within a week after the behavioral session.

construction (in red) when that cue was presented. Only the image within the large black rectangles was part of the stimulus. LEFT: PREP stage, during which the subject prepared to begin producing the sentence with either active or passive construction. MIDDLE: GO stage, during which subject was cued to begin producing the incomplete sentence. RIGHT: FINISH stage, during which the completing information was presented. On switch trials, the subject would have to change from one structure to another.

# fMRI Experiment

Before scanning, subjects were briefly re-familiarized with the task by performing a few trials in each condition outside the scanner. Subjects were instructed to produce their utterances out loud in the scanner, but quietly and with minimal articulation. Subjects received 12 total experimental runs during the experiment (six sentence, six list, counterbalanced by orientation). During the experiment, a fixation cross was displayed on a screen in-between trials. Stimuli were delivered with Matlab software (Mathworks, Inc, USA) utilizing Psychtoolbox (Brainard, 1997; Kleiner et al., 2007). Subjects were given ear covers and foam earplugs to attenuate scanner noise. Each run contained 40 standard trials and 10 switch trials in random order with no explicit rest trials. Presentation order of sentence and list runs was counterbalanced along with cue orientation across subjects. Active/passive constructions and left/right arrow lists were presented at equal frequency. The high-resolution anatomical image was collected following the experimental runs. The scanning session lasted about 1 h and 15 min in total.

# fMRI Data Collection and Analysis

MR images were obtained in a Philips Achieva 3T (Philips Medical Systems, Andover, MA, USA) fitted with an eight-channel RF receiver head coil at the high field scanning facility at UC Irvine. We first collected a total of 1896 T2\*-weighted EPI volumes over 12 runs using Fast Echo EPI in ascending order (TR = 2 s, TE = 25 ms, flip angle = 90°, in-plane resolution = 1.95 mm × 1.95 mm, slice thickness = 3 mm with 0.5 mm gap). The first four volumes of each run were collected before stimulus presentation and discarded to control for T1 saturation effects. The high-resolution T1-weighted anatomical image was acquired in the axial plane (TR = 8 ms, TE = 3.7 ms, flip angle = 8°, size = 1 mm isotropic).
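As a quick consistency check on these acquisition figures, the arithmetic below uses only the numbers reported above. It assumes that the 1896 volumes were divided equally across the 12 runs, which the text does not state explicitly.

$$
\begin{aligned}
\text{volumes per run} &= 1896 / 12 = 158,\\
\text{volumes retained per run} &= 158 - 4 = 154,\\
\text{duration per run} &= 158 \times 2\ \text{s} = 316\ \text{s},\\
\text{total functional time} &= 1896 \times 2\ \text{s} = 3792\ \text{s} \approx 63\ \text{min}.
\end{aligned}
$$

Under that assumption, roughly an hour of functional scanning plus the anatomical acquisition is in line with the reported session length of about 1 h and 15 min.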

Slice-timing correction, motion correction, and spatial smoothing were performed using AFNI software (http://afni.nimh.nih.gov/afni). Motion correction was achieved by using a 6-parameter rigid-body transformation, with each functional volume in a run first aligned to a single volume in that run. Functional volumes were aligned to the anatomical image, and subsequently aligned to Talairach space (Talairach and Tournoux, 1988). Functional images were resampled to 2.5 mm isotropic voxels and spatially smoothed using a Gaussian kernel of 6 mm FWHM. Finally, functional images were rescaled to reflect percent signal change from the mean signal during each run.
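The final rescaling step can be illustrated with a minimal sketch. This is not the AFNI command used in the study (which the text does not specify); it is one plausible reading of "percent signal change from the mean signal during each run", written in NumPy. The function name and the toy values are hypothetical.

```python
import numpy as np

def percent_signal_change(run_data):
    """Rescale one run, shaped (n_voxels, n_timepoints), to percent signal
    change relative to each voxel's own mean over that run."""
    run_data = np.asarray(run_data, dtype=float)
    voxel_mean = run_data.mean(axis=1, keepdims=True)  # one mean per voxel
    return 100.0 * (run_data - voxel_mean) / voxel_mean

# Toy example (hypothetical values): a single voxel whose run mean is 1000.
toy_run = np.array([[990.0, 1000.0, 1010.0]])
toy_psc = percent_signal_change(toy_run)
```

Expressing each run relative to its own mean puts the regression parameter estimates in percent-signal-change units, which makes them comparable across runs and subjects.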

First-level analyses were performed on each individual subject's data using AFNI's 3dDeconvolve function. The regression analysis was performed to find parameter estimates that best explained variability in the data. Each predictor variable representing the time course of activity associated with the task was entered into a deconvolution analysis that estimated parameters best representing the timecourse of the hemodynamic response function in percent signal change values. Timecourse estimates were modeled beginning with the onset of the PREP stage, i.e., when the subject began planning the sentence. The following eight regressors of interest were used in the experimental analysis: sentence active, sentence passive, list left, list right, sentence switch: active to passive, sentence switch: passive to active, list switch: left to right, and list switch: right to left. The six motion parameters were included as regressors of no interest. Second-level group analyses were then performed. The values from the experimental contrasts from each subject and condition were entered into a mixed-effects analysis with subjects as random variables using AFNI's 3dMEMA function. We tested the following contrasts: sentence vs. list (**STRUCTURE**), active vs. passive (**COMPLEXITY**), and switch vs. standard (**PERTURBATION**). Because we were particularly interested in switch effects for the sentence condition, we examined the effects of PERTURBATION for sentences and lists separately in addition to the interaction of STRUCTURE and PERTURBATION. We corrected for multiple comparisons through Monte Carlo simulation using AFNI's 3dClustSim function to hold the family-wise error (FWE) rate to less than 0.05. We estimated smoothness in the data from the residual error time series for each subject's first-level analysis using AFNI's 3dFWHMx function. These estimates were averaged across participants for input to 3dClustSim (simulations were restricted to in-brain voxels).
Activations were considered significant with a per-voxel threshold of p < 0.001 (one-tailed) and a cluster size threshold of 610 mm<sup>3</sup> (39 voxels).
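The two ways of stating the cluster threshold can be related by simple arithmetic. The sketch below assumes the threshold is expressed in the resampled 2.5 mm isotropic voxels described above; the variable names are illustrative.

```python
# Relating the cluster-size threshold in voxels to its volume in mm^3,
# assuming the 2.5 mm isotropic resampled voxel grid.
voxel_edge_mm = 2.5
cluster_voxels = 39

voxel_volume_mm3 = voxel_edge_mm ** 3                    # 15.625 mm^3 per voxel
cluster_volume_mm3 = cluster_voxels * voxel_volume_mm3   # 609.375 mm^3
```

Under this assumption, 39 voxels correspond to about 609 mm<sup>3</sup>, which matches the reported threshold of 610 mm<sup>3</sup> after rounding.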

# ROI Analyses

Given the extensive literature documenting a relationship between Broca's area, the ATL, and sentence processing, we performed ROI analyses on these regions. We extracted percent signal change values within structural ROIs for the left and right ATL, Broca's area, and the right hemisphere homolog of Broca's area, the right inferior frontal gyrus (RIFG) and ran statistical analyses. For Broca's area and the RIFG, we used templates in Talairach space for BA44 and BA45 provided by AFNI based on the cytoarchitectonic probability maps of Amunts et al. (1999). We included every voxel in each map and combined both maps together to form a single mask for Broca's area and a single mask for the RIFG. The relevant functional regions of interest for the ATL do not align well to probability maps based on cytoarchitectonics; we constructed left and right ATL ROIs based on coordinates reported in the neuroimaging literature. We obtained the center of mass coordinates reported by Rogalsky and Hickok (2009) for the sentence > list contrast in the left and right ATL, and created spheres with radius 10 mm around the coordinates. We averaged across all voxels within each ROI and analyzed the average percent signal change values across the entire estimated timecourse. We first analyzed the effect of COMPLEXITY (passive > active) within each ROI with paired t-tests. We then collapsed our analyses across constructions in the sentence conditions and direction in the list conditions, resulting in 2 × 2 ANOVAs for each ROI (STRUCTURE × PERTURBATION).
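The spherical ROI construction and voxel averaging described above can be sketched as follows. This is an illustrative NumPy version and not the AFNI procedure used in the study. The grid size, center voxel, and random data are hypothetical placeholders; the actual centers were the Rogalsky and Hickok (2009) coordinates, which are not reproduced here.

```python
import numpy as np

def sphere_mask(shape, center_vox, radius_mm, voxel_size_mm):
    """Boolean mask of voxels whose centers lie within radius_mm of center_vox."""
    grid = np.indices(shape).astype(float)  # voxel indices, shape (3, X, Y, Z)
    center = np.asarray(center_vox, dtype=float).reshape(3, 1, 1, 1)
    dist_mm = np.sqrt(((grid - center) ** 2).sum(axis=0)) * voxel_size_mm
    return dist_mm <= radius_mm

def roi_mean_timecourse(data, mask):
    """Average an (X, Y, Z, T) array over the masked voxels, giving a (T,) timecourse."""
    return data[mask].mean(axis=0)

# Hypothetical 21 x 21 x 21 grid of 2.5 mm voxels with the sphere at the grid center.
mask = sphere_mask((21, 21, 21), (10, 10, 10), radius_mm=10.0, voxel_size_mm=2.5)
rng = np.random.default_rng(0)
data = rng.normal(size=(21, 21, 21, 8))  # placeholder for estimated timecourses
roi_timecourse = roi_mean_timecourse(data, mask)
```

Averaging within a fixed-radius sphere gives each ROI the same nominal volume in both hemispheres, so left and right ATL values are extracted in the same way.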

# RESULTS

## Behavioral Performance

To reiterate, we only collected behavioral data during the behavioral training session before the fMRI session. **Figure 3** shows the behavioral performance of the 14 subjects for whom we collected data. For non-switch standard trials, subjects performed near ceiling for the sentence and list conditions. The clear outlier is the sentence switch condition. Even though subjects' performance dropped during switch sentence trials, their performance was still above 80%, indicating that they could successfully perform the task. A 2 × 2 ANOVA revealed a significant main effect of STRUCTURE, F(1,13) = 5.282, p = 0.039, η<sup>2</sup> = 0.289, no significant main effect of PERTURBATION, F(1,13) = 3.232, p = 0.095, η<sup>2</sup> = 0.199, and a significant interaction, F(1,13) = 5.353, p = 0.038, η<sup>2</sup> = 0.292. Follow-up two-tailed t-tests (α = 0.025) revealed a marginally significant effect of PERTURBATION for sentences, t(13) = 2.077, p = 0.058, Cohen's d = 0.555, and no effect of PERTURBATION for lists, t(13) = 0.668, p = 0.516, Cohen's d = 0.169. These results confirm that performance was only impaired during the sentence switch condition.
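The reported effect sizes can be related to the F statistics with a short calculation. The sketch below assumes that the η<sup>2</sup> values are partial eta squared, computed as F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The text does not state this, but the formula reproduces the three reported ANOVA values. The function and dictionary names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values reported for the behavioral 2 x 2 ANOVA, each with df = (1, 13).
reported_f = {"STRUCTURE": 5.282, "PERTURBATION": 3.232, "INTERACTION": 5.353}

effect_sizes = {
    name: round(partial_eta_squared(f, 1, 13), 3)
    for name, f in reported_f.items()
}
# effect_sizes -> STRUCTURE 0.289, PERTURBATION 0.199, INTERACTION 0.292
```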

collapsed across constructions in the sentence conditions and right/left arrow in the list conditions. Analysis revealed a significant main effect of STRUCTURE and a significant interaction between STRUCTURE and PERTURBATION. ∼: marginally significant simple effect of PERTURBATION for sentences (p = 0.058) at p < 0.025. Error bars indicate standard error of the mean. See text for details of statistical analyses.

# Whole-Brain fMRI Analyses

The whole-brain contrasts of STRUCTURE and COMPLEXITY did not reveal activation in the ATL or Broca's area. The effect of STRUCTURE (sentences > lists) revealed increased activation for sentences in left visual cortex, right precentral gyrus, right postcentral gyrus, and bilateral middle frontal gyrus (**Figure 4**). The effect of COMPLEXITY (passive > active sentences) revealed one cluster in the left postcentral gyrus (**Figure 4**). See **Table 1** for Talairach coordinates for each significant cluster of activation for these contrasts.

The effect of PERTURBATION in the sentence condition (sentence switch > sentence control) revealed increased activation during the switch condition in a network including areas typically found for experiments of response selection/inhibition, as in the Go/No-Go task (Simmonds et al., 2008; Swann et al., 2009), as well as areas found for perturbation in low-level motor control (Diedrichsen et al., 2005; Tourville et al., 2008; **Figure 5**). The Go/No-Go task requires subjects to inhibit a planned motor response when a "stop" signal appears. Particularly strong activation was observed in the right IFG and anterior insula, regions that have been shown to be involved in "stopping," or the cancelation of a planned response (Aron et al., 2003, 2014). Activations for this contrast also included the supplementary motor area (SMA), pre-SMA, basal ganglia (right caudate nucleus), left inferior parietal cortex, right STS, and right IFG/MFG (**Figure 5**). The effect of PERTURBATION in the list condition (switch lists > standard lists) revealed one cluster in the left cerebellum (**Figure 5**, bottom). See **Table 2** for Talairach coordinates for each significant cluster of activation for these contrasts.

The interaction contrast of PERTURBATION with STRUCTURE did not reveal any significant clusters when cluster-corrected for multiple comparisons, suggesting that there was a similar switch effect across the sentence and list conditions in the brain, although the separate contrasts for these conditions activated different sets of areas.

# ROI Analyses

For the ROI analyses, based on our expectations from the literature, we separately examined the effect of COMPLEXITY (passive > active sentences) using a one-tailed t-test and performed a 2 × 2 ANOVA of PERTURBATION (switch vs. control) and STRUCTURE (sentences vs. lists).

There was no effect of COMPLEXITY (passive > active) for any of the ROIs (all reported tests are one-tailed t-tests). Broca's area: t(19) = −0.059, p = 0.477; RIFG: t(19) = 0.069, p = 0.473; left ATL: t(19) = 1.746, p = 0.952; right ATL: t(19) = 0.799, p = 0.783. The high t-value of the left ATL indicates that there was a possibility of higher activation for active – less complex – sentences.

In Broca's area, there was no significant effect of STRUCTURE, F(1,19) = 1.443, p = 0.244, or PERTURBATION, F(1,19) = 0.714, p = 0.408, and no significant interaction, F(1,19) = 0.164, p = 0.408. In the RIFG, there was no significant effect of STRUCTURE, F(1,19) = 0.005, p = 0.946, a significant effect of PERTURBATION, F(1,19) = 13.541, p = 0.002, and no significant effect of the interaction, F(1,19) = 0.663, p = 0.426. Activations for each of these conditions in Broca's area and the RIFG are displayed in **Figure 6**.

N = 20. FWE cluster-corrected p < 0.05; individual voxel threshold p < 0.001, cluster size threshold 610 mm<sup>3</sup>. Coordinates reflect the center of mass of each significant cluster. Coordinates are reported in Talairach space.

Both ATL regions showed a significant main effect of PERTURBATION (reduced activity for perturbation), no main effect of STRUCTURE, and no interaction. Left ATL: STRUCTURE, F(1,19) = 0.597, p = 0.449; PERTURBATION, F(1,19) = 6.963, p = 0.016; interaction, F(1,19) = 2.820, p = 0.110. Right ATL: STRUCTURE, F(1,19) = 0.123, p = 0.729; PERTURBATION, F(1,19) = 13.161, p = 0.002; interaction, F(1,19) = 0.396, p = 0.537. Activation for each of these conditions in left and right ATL ROIs is displayed in **Figure 7**. While the test of the interaction between STRUCTURE and PERTURBATION in the left ATL was not significant, it should be noted that this effect trended toward significance.

## DISCUSSION

We performed a novel investigation in the effort to understand the neural bases of syntax: a constrained speech production task, including two different sentence constructions (active and passive), unstructured lists, and a "syntactic perturbation" paradigm. One goal was to probe the response profile of the traditional candidates for syntactic processing and their right hemisphere homologs with this novel PERTURBATION paradigm and a contrast of STRUCTURE (sentences > word lists). We also included a secondary contrast of COMPLEXITY (passive > active sentences) to determine if this effect previously found for comprehension in Broca's area extended to production. Finally, another goal of the experiment was exploratory – to determine whether networks outside of the traditional candidate regions for syntax would activate to syntactic perturbation. We will first focus our discussion on the activation profiles of Broca's area and the ATL. Following this we discuss the effects we obtained for the whole-brain contrasts, particularly the activation we obtained for PERTURBATION in the right IFG and the potential role this region plays in sentence processing.

# The Activation Profile of Broca's Area

The domain-general hypotheses of Broca's area suggest that this region underlies a non-syntactic mechanism during sentence processing, either resolving representational conflict through cognitive control (Novick et al., 2005) or providing working memory resources (Rogalsky and Hickok, 2011). The lack of effects for our PERTURBATION contrast in this region is consistent with these accounts, and contrary to the expectations of a region involved in syntax, as perturbation was expected to tax syntactic processing.

We also did not observe a significant effect of STRUCTURE in Broca's area. Previous work has shown that this contrast is observable in small subregions directly adjacent to regions that do not show this contrast (Hickok and Rogalsky, 2011; Fedorenko et al., 2012a). Our structural ROIs may have contained sentence-selective and non-selective subregions, thus weakening our power to detect effects of STRUCTURE. Regardless, the contrast was clearly not robust, and this, combined with the fact that the PERTURBATION contrast did not approach significance in this region, speaks against a syntactic function.

We did not replicate previous findings for passive > active sentences in Broca's area in comprehension (Ye and Zhou, 2009; Mack et al., 2013). Hypotheses of Broca's area function in sentence processing should take this disparity into account, while noting that the task constraints of our study may have substantially reduced our ability to detect activation differences between these constructions.

# The Activation Profile of the Anterior Temporal Lobe (ATL)

Our whole-brain analysis did not reveal any effects of COMPLEXITY and STRUCTURE in the left or right ATL. However, the ROI analysis did reveal a PERTURBATION effect for the ATL bilaterally – decreased activity for perturbation. We attribute the null effect of STRUCTURE and the decreased activity for PERTURBATION to a semantic rather than syntactic function of the ATL and decreased attention to semantic content in our study.

Our ROI plots showed less activity for switch sentences than for natural sentences, which reduced the sensitivity of our analyses to detect a main effect of STRUCTURE. This reduction can be explained by decreased attention to the semantic content of the stimulus for switch trials. Rogalsky and Hickok (2009) showed that attention substantially affects activation to semantic content in the ATL. The demanding nature of our task may have distracted subjects away from the semantic content of the sentences, reducing the difference in semantic processing between lists and sentences. Our stimuli also had limited semantic content generally. We used proper nouns instead of common nouns (e.g., Mary instead of the acrobat), and simplistic line drawings devoid of detail rather than pictures of actual people engaging in action as used in other studies (e.g., Menenti et al., 2011; Segaert et al., 2012). Future studies seeking to obtain effects of structure in the ATL during speech production should enrich the semantic nature of the materials and choose a task that does not require heavy attentional demands.


The decreased activity for PERTURBATION is contrary to the expectations of a region involved in syntax, but compatible with a role for semantics. Any effect of PERTURBATION would presumably increase demands on syntactic structure building, rather than decrease them. The increased attentional demands of switching syntactic structures, however, likely reduced attention to the semantic content of the sentences, accounting for a reduction of activity in the ATL as discussed above.

#### TABLE 2 | Effects of PERTURBATION.

N = 20. FWE cluster-corrected p < 0.05; individual voxel threshold p < 0.001, cluster size threshold 610 mm<sup>3</sup>. Coordinates reflect the center of mass of each significant cluster. Coordinates are reported in Talairach space.

The major piece of data in support of a basic syntactic function of the ATL is the observation that the structural effect in the ATL can be found for sentences with the content words replaced by non-words, retaining the structural "feel" but with greatly impoverished semantic content (i.e., jabberwocky stimuli; Mazoyer et al., 1993; Humphries et al., 2006; Rogalsky et al., 2011). However, this effect is much less robust than for full sentences, with some studies failing to observe it at all (Pallier et al., 2011; Fedorenko et al., 2012c). Future research could determine the source of these discrepancies, including testing the notion that there may be a functional-anatomical subdivision within the ATL between syntactic and semantic processing (Rogalsky and Hickok, 2009).

# Whole-Brain Contrasts of Complexity and Structure

We first discuss the whole brain contrasts of COMPLEXITY and STRUCTURE. The whole-brain contrast of COMPLEXITY revealed one significant cluster in the left post-central gyrus. Since passive sentences are longer than active sentences, requiring additional articulation, this cluster likely reflects the increased motor speech output and corresponding somatosensory input rather than any core linguistic function. The whole-brain contrast of STRUCTURE only revealed activity in visual cortex and bilateral superior frontal areas. These regions have been previously associated with visual attention (Kastner and Ungerleider, 2000; Corbetta and Shulman, 2002). This suggests that demands on visual attention were stronger during the sentence condition than during the list condition, which is supported by the behavioral data.

The lack of additional effects in language-related regions for these contrasts deserves explanation. We have already discussed Broca's area and the ATL; other language-related areas that are typically activated by this contrast include the left posterior temporal lobe and the angular gyrus (Bedny et al., 2011; Pallier et al., 2011; Fedorenko et al., 2012c). The difference between our results and previous studies cannot be attributed solely to the differences between production and comprehension; several production studies have revealed effects in these areas (Menenti et al., 2011, 2012; Segaert et al., 2012, 2013). As discussed in the introduction, the structurally simple and short sentences that we used minimized demands on working memory and cognitive control, and our stimuli did not encourage rich semantic processing. It may be the case that effects in these language-related regions are due to these processes. Previous research points to a role for the posterior temporal lobe in working memory and cognitive control (Hickok et al., 2003; Glaser et al., 2013) and the angular gyrus in semantic processing (Binder et al., 2009; Price et al., 2015), consistent with this speculation.

# Syntactic Perturbation Reveals a Network for Response Selection, Action Inhibition, and Motor Control

While syntactic PERTURBATION did not activate traditional language areas of the left hemisphere, it did activate other brain regions, including medial frontal areas (SMA, pre-SMA), the right caudate nucleus, the right posterior STS, the right IFG, and the right anterior insula. These are regions that have been reported in studies of perturbation and motor control in other domains (Diedrichsen et al., 2005; Suminski et al., 2007; Tourville et al., 2008) and studies of response selection/action inhibition implementing go/no-go designs (Simmonds et al., 2008). The list PERTURBATION contrast activated only the cerebellum. This disparity of results between the sentence and list conditions must be treated carefully, as the interaction contrast did not reveal a significant statistical interaction between STRUCTURE and PERTURBATION in any regions. This suggests that there were similar activation patterns for both conditions, but that the effect was somewhat stronger in the sentence condition.

The activation of the right caudate nucleus is consistent with the suggestion that the basal ganglia are involved in syntactic operations (Lieberman, 2001; Ullman, 2004). However, we do not believe that this activation in our study reflects syntax. This is because the right basal ganglia are part of a larger network that is strongly implicated in stopping, discussed below.

While RIFG activation is sometimes reported for syntactic manipulations (Embick et al., 2000; Friederici et al., 2000; Meyer et al., 2000; Fiebach et al., 2005; Tyler et al., 2010), it is not common for experiments of basic sentence processing, and the aphasia literature does not support a strong association between deficits in sentence processing and the RIFG (Damasio, 1992; but see Caplan et al., 1996). The effect of PERTURBATION in this region therefore likely reflects non-syntactic mechanisms. The operative mechanism may be action inhibition, or "stopping," which has been attributed specifically to the RIFG in conjunction with the other areas activated by the PERTURBATION contrast (Aron et al., 2003, 2014). Under this hypothesis, the RIFG operates as a "brake." We can apply this braking hypothesis to the current study through reverse inference. After subjects planned to produce a sentence with a given sentence construction, on switch trials they utilized the brake to inhibit this plan. When subjects planned to produce a list of words, they also relied on the brake, but less so.

Our study provides insight into a surprisingly large number of previous studies of syntax and sentence comprehension that report activation of the RIFG. Such studies can be divided into two groups: studies of complex/non-canonical sentence constructions and garden-path sentences (Meyer et al., 2000; Fiebach et al., 2005; Grewe et al., 2006; Bornkessel-Schlesewsky et al., 2012; Chan et al., 2012), and studies involving syntactic violations (Embick et al., 2000; Moro et al., 2001; Ben-Shachar et al., 2003; Friederici et al., 2003, 2006; Bahlmann et al., 2008). The fact that our task explicitly involved stopping suggests that this mechanism may account for RIFG activations in these previous studies. When subjects process a sentence with non-canonical sentence structure or syntactic violations, they must revise their initial parse to arrive at the correct interpretation. This revision may rely on an inhibition function to quickly reject the current parse in favor of a new one. Supporting this hypothesis, Caplan et al. (1996) found that patients with right hemisphere lesions had significantly worse sentence comprehension than control subjects, particularly for complex sentence constructions (although these effects were not as strong as in patients with left hemisphere lesions). Future research could further investigate the hypothesis of a "braking" function during sentence comprehension.

# CONCLUSION

The present study sought to implement a novel paradigm in the study of syntax and the brain: a constrained sentence production task with a perturbation paradigm applied to syntactic structure. While our activations point to a possibility of a stopping mechanism in the RIFG that facilitates structural revision, it is difficult to make any firm conclusions based on this study alone. The lack of effects for syntactic PERTURBATION and STRUCTURE in Broca's area suggests that this region performs a non-syntactic function during sentence processing. This supports the previous body of evidence against a role for syntax in Broca's area (Rogalsky and Hickok, 2011). Finally, we did not extend previous effects of sentences > word lists in the ATL to production, although the lack of an effect may have been due to reduced activity in this region during perturbation. This is consistent with a role for the ATL in combinatorial semantics.

# AUTHOR CONTRIBUTIONS

WM and GH conceptualized and designed the experiment. WM created the stimuli and collected and analyzed the data. WM and GH wrote and revised the manuscript.

## FUNDING

This research was supported by NIH Grant # DC03681 awarded to GH.

### ACKNOWLEDGMENTS

We would like to thank Jonathan Venezia for useful suggestions, Maral Aghvinian for assistance with data analysis, and two anonymous reviewers for their comments.

# REFERENCES

magnetic resonance imaging. Psychol. Sci. 14, 433–440. doi: 10.1111/1467-9280.01459




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Matchin and Hickok. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Brain asymmetry in the white matter making and globularity

#### *Constantina Theofanopoulou\**

*Department of General Linguistics, Universitat de Barcelona, Barcelona, Spain*

Recent studies from the field of language genetics and evolutionary anthropology have put forward the hypothesis that the emergence of our species-specific brain is to be understood not in terms of size, but in light of developmental changes that gave rise to a more globular braincase configuration after the split from Neanderthals-Denisovans. On the grounds that (i) white matter myelination is delayed relative to other brain structures and, in humans, is protracted compared with other primates and that (ii) neural connectivity is linked genetically to our brain/skull morphology and language-ready brain, I argue that one significant evolutionary change in *Homo sapiens'* lineage is the interhemispheric connectivity mediated by the Corpus Callosum. The size, myelination and fiber caliber of the Corpus Callosum present an anterior-to-posterior increase, in a way that inter-hemispheric connectivity is more prominent in the sensory motor areas, whereas "high-order" areas are more intra-hemispherically connected. Building on evidence from language-processing studies that account for this asymmetry ('lateralization') in terms of brain rhythms, I present an evo-devo hypothesis according to which the myelination of the Corpus Callosum, Brain Asymmetry, and Globularity are conjectured to make up the angles of a co-evolutionary triangle that gave rise to our language-ready brain.

#### *Edited by:*

*Antonio Benítez-Burraco, University of Huelva, Spain*

#### *Reviewed by:*

*Pedro Tiago Martins, Pompeu Fabra University, Spain Bridget Samuels, Pomona College, USA*

#### *\*Correspondence:*

*Constantina Theofanopoulou, Department of General Linguistics, Universitat de Barcelona, Edifici Josep Carner, 5a Planta, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain constantinaki@hotmail.com*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 27 July 2015 Accepted: 24 August 2015 Published: 10 September 2015*

#### *Citation:*

*Theofanopoulou C (2015) Brain asymmetry in the white matter making and globularity. Front. Psychol. 6:1355. doi: 10.3389/fpsyg.2015.01355*

Keywords: brain asymmetry, lateralization, skull, globularity, corpus callosum, white matter, brain rhythms, language

# Introduction

The general aim of this paper is to support the idea that the key underlying our human-specific cognitive profile is to be found in the changes that brought about a more globular brain shape. As far as I can tell, two scientific hypotheses have already been put forward claiming that it was essentially this globular shape that determined the brain profile of *Homo sapiens*: one, by Boeckx and Benítez-Burraco (2014a,b), called it "*globularity*" and hypothesized that modifications in the fronto-parieto-thalamic network ought to be taken into account; the other, by Hublin et al. (2015), called it the "*globularization phase,*" focusing, thus, on the developmental phase of the shaping, which, according to their findings, is due to bulging parietal and occipital bones (see **Figure 1**).

These two hypotheses can be understood as viewing the same changes from different perspectives. While Hublin et al. (2015) concentrate mostly on the evolutionary (occipito-parietal protrusion) and developmental (cerebellum) facet of the problem, Boeckx and Benítez-Burraco (2014a,b; Benítez-Burraco and Boeckx, 2015) mainly focus on the link between globularity and language on a genetic level: their positing that the fronto-parieto-thalamic network might be of relevance seems to reflect the great majority of the findings that implicate the fronto (BA44/45)-parietal (BA22) lobes in linguistic processing, broadening this cortical network to its subcortical "afferent and efferent expansion," namely the thalamus (Buzsáki, 2006; Theofanopoulou and Boeckx, 2015). The integration of the thalamus should not strike us as irrelevant at all, as it essentially relays the cerebellar input to the frontal lobe (BA44/45 included; for cerebello-thalamic connectivity: Leiner et al., 1989; Schmahmann, 1997; Engelborghs et al., 1998; for thalamico-BA44/45 connectivity: Ford et al., 2013; Bohsali et al., 2015) and its growth is correlated to associated expanded areas, like the parietal lobe (and more specifically the novel precuneus, Cavanna and Trimble, 2006; Bruner, 2014; Bruner et al., 2014). By the logic of co-evolution (namely that two tightly connected brain parts exert pressure on each other, affecting each other's evolution), it can be deduced that both the cerebellum and the thalamus are crucially involved in the cognitive mechanisms that result in language.

In this paper, however, I will take the expansion of the cerebellum to be the guiding line for the following reasons: firstly, the cerebellum is directly connected to the evolutionarily significantly splayed parieto-occipital bone; secondly, it is particularly in these posterior sensorimotor regions of the cortex where the Corpus Callosum permits interhemispheric connectivity thanks to its anterior-to-posterior increase in size, myelination and fiber caliber (Aboitiz et al., 1992; Doron and Gazzaniga, 2008); thirdly, this connectivity is taken to be crucial for permitting the rhythmic interhemispheric interplay observed in language processing sensorimotor networks (Morillon et al., 2010).

The frontal cortex will not figure much in my core hypothesis. According to Barton and Venditti (2014), even though absolute and proportional frontal region size increased rapidly in humans, this change was tightly correlated with corresponding size increases in other areas and in overall brain size; besides, research has demonstrated that the parieto-occipital fossa's protrusion was the most decisive; lastly, as regards evolutionary changes in frontal connectivity, Neubert et al. (2014) have shown that what actually differentiates humans' and macaques' frontal cortex is its coupling to posterior auditory areas (which is much stronger in humans), something that seems to justify my attention toward the posterior sensorimotor areas.

Ultimately, my objective is to bring out how, under this novel perspective, we could also make sense of the intricate idea of Brain Asymmetry ('Lateralization') and, thus, elucidate the hitherto unexplored relation of Brain Asymmetry-Corpus Callosum-Globularity. More specifically, I will argue that the architecture of the Corpus Callosum, allowing for interhemispheric connectivity in the posterior cortex (posterior temporal-posterior parietal-occipital cortex) and for intrahemispheric connectivity in the anterior-medial cortex (frontal-anterior parietal cortex-anterior/medial temporal cortex), suggests a novel way of capturing how brain asymmetry (a phylogenetically common trait) made it possible for our language-ready brain to arise (see **Figure 2**).

The structure of the paper will be the following: first, I will briefly introduce the idea of Brain Asymmetry-Lateralization (see Brain Asymmetry – Lateralization) and explain how it fits in the framework I wish to put forward; then I will look into how Asymmetry can be captured in terms of brain rhythms and analyze how the morphology of the Corpus Callosum renders this asymmetry possible (1.1). Next, I will draw attention to alpha and beta rhythms and propose a way in which they could constitute an overlooked window into both Ontogeny and Phylogeny (1.2), adducing supporting data from deficits that could be seen as speech-related oscillopathies (1.3). In Section "White Matter-Globular Brain Pattern," I will delve into the idea of the White Matter begetting our Globular Brain Pattern during development. Finally, I will discuss how the posterior brain and skull coevolved in *H. sapiens* (see Posterior Brain and Skull Enlargement in *Homo sapiens*).

#### Brain Asymmetry – Lateralization

Brain Asymmetry, long thought to be human-specific, has been shown to lie along an evolutionary continuum (Fitch and Braccini, 2013), so that even the first biological pillar of the uniqueness of human language, namely its strong left-lateralization (Lenneberg, 1966), has fallen. Comparative studies have suggested a left-hemispheric dominance for conspecific communication in a wide variety of species (Ocklenburg and Güntürkün, 2012), such as chimpanzees (Taglialatela et al., 2008), rhesus monkeys (Hauser and Andersson, 1994), dogs (Siniscalchi et al., 2008), mice (Ehret, 1987), sea lions (Böye et al., 2005), and frogs (Bauer, 1993). More tellingly, a left-dominance has been reported in canaries as regards hypoglossal functions (Nottebohm, 1971), in zebra finches concerning vocal learning (Voss et al., 2007; Moorman et al., 2012) and in Bengalese finches for song discrimination (Okanoya et al., 2001). Another shared dominance worth mentioning is that of emotional processing in the right hemisphere (e.g., Önal-Hartmann et al., 2011). There is evidence that it obtains also in gelada baboons (Casperd and Dunbar, 1996), mangabeys (Baraud et al., 2009), rhesus macaques (Vermeire and Hamilton, 1998), chimpanzees (Parr and Hopkins, 2000), marmosets (Hook-Costigan and Rogers, 1998), and dogs (Siniscalchi et al., 2008).

Last but not least, asymmetry in motor behavior and, more concretely, in handedness has been erroneously thought to be human-specific and furthermore to imply, along with language, a general left hemispheric dominance common to humans (Harris, 1991). On the one hand, left-motor-lateralization is not unique to humans: as Smaers et al. (2013) review, lateralization in motor behavior has been found in primates (Nudo et al., 1992; Bogart et al., 2012; Sun et al., 2012), non-primate mammals (Ehret, 1987; Rogers et al., 1994), birds (Vallortigara and Andrew, 1994), fish (Cantalupo et al., 1995; Bisazza et al., 1997), reptiles (Engbretson et al., 1981; Hoso et al., 2007) and amphibians (Bauer, 1993); examples include pawedness in toads (Bisazza et al., 1996), footedness in birds (Rogers and Workman, 1993) and finnedness in fish (Hori, 1993). Handedness, in particular, was recently shown to be present also in non-primate mammals (bipedal marsupials), something that challenges the notion that 'true' handedness is unique to primates (Giljov et al., 2015). On the other hand, the right-handedness rule would imply "*that most left handed people display right hemispheric dominance for language, an assertion not validated by rigorous empirical studies* (Knecht et al., 2000)," as Washington and Tillinghast (2015) observe. Ocklenburg and Güntürkün (2012) suggest there are both genetic and epigenetic factors we should take into account in the context of brain asymmetry. They provide the example of pigeons: their determined embryonic egg-position (genetic factor) permits only their right eye to be stimulated by light (epigenetic factor), resulting in left hemisphere superiority for visual object discrimination. (For a good experiment on how early navigational experience in pigeons affects lateralization, see Mehlhorn et al., 2010.)
The authors finally suggest that similar "*early spinal asymmetries could act as lateralized "precursors" of asymmetrical cortical motor functions*" in humans (such as a prenatal bias toward turning the head to the right); epigenetic factors, though, should not be overlooked, as their relevance in human handedness is much greater (unlike birds in the case of vision, humans are not genetically confined to using only one hand!). The lower incidence of left-handedness in countries where the left hand is associated with uncleanliness is a good example of how much epigenetic factors affect handedness (Zverev, 2006). Siding with Benítez-Burraco and Longa (2012), I conclude that "*the relationships between right-handedness (structural and functional) brain lateralization, and language are perhaps not significant enough, or illuminating from an evolutionary perspective.*"

With the aforementioned I wish to underline that human language lateralization is not due to a dominance of the left hemisphere for language as such: none of the hitherto known cognitive functions emerged during hominin evolution. Rather, they are phylogenetically shared, as one should expect given the conservation of brain rhythms across a wide range of species (Buzsáki et al., 2013; Boeckx and Theofanopoulou, 2015). This should lead us to consider the following: given that gray matter subcortical parts of the brain are associated with sensorimotor and cognitive functions, and white matter modulates the distribution of action potentials among them and the neocortex (Fields, 2005, 2008), it is probably this modulatory function that gives the core cognitive functions (gray matter) the level of complexity detected only in our species and required for language.

I agree with Ocklenburg and Güntürkün (2012) that white matter might be an overlooked window onto brain asymmetry issues: "*A common conception is that functional asymmetries are a consequence of structural asymmetries in the brain ... research ... has focused on macroscopic gray matter asymmetries ... evidence from recent studies in animal models suggests that structural asymmetries in connectivity patterns of homologous regions in the two hemispheres may be of greater functional relevance.*" Another reason to believe so is the developmental nature of white matter's myelination; occurring relatively slowly over the lifespan (Hynd et al., 1995), white matter constitutes a perfect mirror candidate of the developmental nature of human language acquisition. Besides, as I noted above, the chemical mechanisms of myelination are decisive for axons' generating action potentials, something that can be directly associated with the oscillatory basis of language I will shortly highlight.

In what follows, I will try to illustrate the relevance of the brain's largest white tissue structure, i.e., the Corpus Callosum, in Brain Asymmetry, and afterward provide a link to Globularity. Let me clarify that the Corpus Callosum was not chosen merely because of its size, but because of its decisive position in the brain and its human-specific structure. The size, myelination and fiber caliber of the Corpus Callosum present an anterior-to-posterior increase (Doron and Gazzaniga, 2008), resulting in interhemispheric connectivity being more prominent in the sensorimotor areas, whereas "high-order" areas are more intrahemispherically connected (**Figure 3**). Studies comparing humans' and monkeys' corpora callosa tellingly revealed that in humans the proportion of large diameter fibers in callosal regions that interconnect primary sensory areas is higher than in macaques (Aboitiz et al., 1992) and that the fiber organization has nothing in common with the callosal organization reported in monkeys (Jones et al., 1978; Killackey et al., 1983).

The hypothesis I wish to put forward is the following: it is the structure of the Corpus Callosum that makes humans' brains display a sophisticated, selective asymmetry. In the anterior/medial cortex, where callosal fibers are narrow and intrahemispheric connectivity is enhanced, asymmetry is expressed at the level of small-world networks, i.e., cognitive functions appear to be lateralized as modules (e.g., the default network in the left and the attentional in the right hemisphere: Wang et al., 2008; De Schotten et al., 2011); in the posterior cortex, there is no such asymmetry, as visual, auditory, and motor functions appear in both hemispheres. What makes the posterior cortex asymmetrical is to be found in the hemispheres' refinement toward processing input of specific 'sampling rate' (temporal-faster rate sampling executed in the left hemisphere and spectral-slower rate in the right). In the coming section, data fostering my hypothesis will be adduced (see **Figure 3**).

#### Asymmetry in the Dynome and the Corpus Callosum

Contemporary neural models of auditory language processing have proposed that the two hemispheres are differently specialized in either temporal (left hemisphere) or spectral (right hemisphere) resolution (Zatorre et al., 2002), or, in other terms, that they differ in their preferred "sampling rate," with the left hemisphere being well suited for faster-rate sampling and the right for slower-rate sampling (Poeppel, 2003; Hickok and Poeppel, 2007, 2015; Morillon et al., 2010). According to Celesia and Hickok (2015), "*These two proposals are not incompatible as there is a relation between sampling rate and spectral vs. temporal resolution: rapid sampling allows the system to detect changes that occur over short timescales, but sacrifices spectral resolution, and vice versa.*"
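The trade-off described in this quotation can be made concrete with a minimal Python sketch. It is an illustration only, not a model taken from the cited studies: it assumes that each hemisphere chunks the input into one analysis window per cycle of its preferred rate, and it uses the standard signal-processing rule of thumb that a window of duration T resolves frequency differences of roughly 1/T. The two rates are the ∼4 Hz and ∼40 Hz values discussed in this section; the function names are hypothetical.

```python
def window_length_s(sampling_rate_hz: float) -> float:
    """Duration of one analysis window if input is chunked once per cycle (assumption)."""
    return 1.0 / sampling_rate_hz


def frequency_resolution_hz(window_s: float) -> float:
    """Approximate smallest frequency difference resolvable in a window of length T (1/T)."""
    return 1.0 / window_s


# Illustrative rates: ~4 Hz (slow sampling) and ~40 Hz (fast sampling).
RATES_HZ = {"slow sampling (~4 Hz)": 4.0, "fast sampling (~40 Hz)": 40.0}

for label, rate in RATES_HZ.items():
    window = window_length_s(rate)
    resolution = frequency_resolution_hz(window)
    print(f"{label}: window = {window * 1000:.0f} ms, "
          f"frequency resolution = about {resolution:.0f} Hz")
```

Under these assumptions the slow sampler integrates over a long window and separates nearby frequencies well, but cannot localize rapid changes; the fast sampler does the reverse.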

More concretely, Morillon et al. (2010) found that there are two auditory speech sampling mechanisms working in parallel: while syllabic parsing of the input (slow-rate delta-theta oscillations, ∼4 Hz) is predominantly assigned to the right hemisphere, the left hemisphere has been shown to have a primacy for the processing of phonemic input (fast-rate gamma oscillations, ∼40 Hz; see **Figure 4**). A dynamic interplay is assumed to allow for the timely coordination of both information types, namely for fast phonemic gamma being modulated by syllabic theta oscillations. The hypothesis of functional asymmetry concerning hemispheres' preferential cues is known as the AST (asymmetric sampling in time) hypothesis (Poeppel, 2003; Ghazanfar and Poeppel, 2014).
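What it means for fast gamma to be "modulated by" slow theta can be sketched with a toy simulation. The Python example below is a hedged illustration, not the analysis used by Morillon et al. (2010): it builds a synthetic signal in which the amplitude of a 40 Hz rhythm waxes and wanes with the phase of a 4 Hz rhythm, and then quantifies the dependency with a normalized mean vector length (a common phase-amplitude coupling measure). The sampling rate, duration, modulation depth and function names are all assumptions, and the theta phase and gamma envelope are known by construction rather than estimated from recorded data.

```python
import cmath
import math

THETA_HZ = 4.0         # slow, syllabic-rate rhythm (value from the text)
GAMMA_HZ = 40.0        # fast, phonemic-rate rhythm (value from the text)
SAMPLE_RATE_HZ = 1000  # assumption: simulation resolution
DURATION_S = 2.0       # assumption: a whole number of theta cycles


def simulate_nested_rhythms(depth):
    """Return (theta_phase, gamma_envelope, signal) for a toy nested oscillation.

    depth (0..1) sets how strongly theta phase scales gamma amplitude.
    """
    n_samples = int(SAMPLE_RATE_HZ * DURATION_S)
    theta_phase, gamma_envelope, signal = [], [], []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        phase = 2.0 * math.pi * THETA_HZ * t
        envelope = 1.0 + depth * math.cos(phase)  # gamma is largest at the theta peak
        theta_phase.append(phase)
        gamma_envelope.append(envelope)
        signal.append(math.cos(phase)
                      + envelope * math.cos(2.0 * math.pi * GAMMA_HZ * t))
    return theta_phase, gamma_envelope, signal


def coupling_strength(theta_phase, gamma_envelope):
    """Normalized mean vector length: near 0 when gamma amplitude ignores theta phase."""
    vector_sum = sum(a * cmath.exp(1j * p)
                     for a, p in zip(gamma_envelope, theta_phase))
    return abs(vector_sum) / sum(gamma_envelope)


for depth in (0.0, 0.8):
    phase, envelope, _ = simulate_nested_rhythms(depth)
    print(f"modulation depth {depth}: coupling = {coupling_strength(phase, envelope):.3f}")
```

In this toy setting the coupling measure grows with the modulation depth and vanishes when the gamma amplitude is constant, which is the sense in which the two sampling rates are "coordinated" in the text.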

Tellingly, even though this asymmetry was most pronounced in the auditory cortex, motor areas also express natural oscillatory activity that corresponds to the same rates: intrinsic jaw movements oscillate at delta/theta rates, thus presenting an overlapping parsing with the syllabic network, while the phonemic fast gamma oscillations underlie tongue and formant transition movements (e.g., trill at 35–40 Hz; Morillon et al., 2010).

Another significant finding of the experiment conducted by Morillon et al. (2010) was a strong intrinsic asymmetry (also manifest at rest) between the articulatory (left hemisphere) and the hand motor cortex (right hemisphere). This asymmetry is suggested to be phylogenetically "inherited," probably because of the long shared sinistral pharyngeal muscle control on the one hand and the dextral hand gestures control on the other. (Let me point out in passing that right-handedness has erroneously attracted all the attention: what is significant for language is the hand gestures accompanying it and not handedness *per se.*)

In addition, there are cases of other, non-human, even non-vocal-learning species whose lip-smacking is tuned to the same slow oscillatory cycles present in human syllabic sampling (such as gelada baboons, *Theropithecus gelada*; Bergman, 2013; see also Ghazanfar et al., 2012). It would then be critical to find out whether gelada baboons also display a human-like right dominance of these lip movements and whether they are coupled with other, faster oscillatory cycles subserving communicatory processes. With the latter, I do not mean to imply that our linguistic profile is due to our capacity for housing spectral and temporal information within a narrow time-window, since this competence is again found in other species: mustached bats exhibit the same oscillatory asymmetry in echolocation processing (Washington and Tillinghast, 2015). Rather, what all these phylogenetic observations are meant to highlight is that we indeed share both generic and elemental mechanisms with other species: the key to our questions is to be found in how *H. sapiens* 'coupled' the modalities inherited. Adopting a Darwinian way of thinking, it seems indeed plausible that the connectivity across and within modalities afforded by the peculiar structure of the Corpus Callosum is evolutionarily significant. In the case of auditory processing, it has been experimentally shown that it is the posterior Corpus Callosum that gives rise to this linguistically crucial theta-gamma coupling (Rumsey et al., 1996; Pollmann et al., 2002; Nosarti et al., 2004). Even early dichotic listening experiments on patients with Corpus Callosum abnormalities had specified that agenesis in the splenium is pertinent to aberrant auditory interplay (Sugishita et al., 1995; Pollmann et al., 2002). Friederici et al. (2007) tellingly observe that "*an intact posterior third of the C[orpus] C[allosum] connecting temporal regions is a necessary precondition for a prosody-induced N400 mismatch effect. Lesions in the anterior two-thirds of the CC that connect frontal regions, in contrast, can cause a modulation of the prosody-induced mismatch effect but cannot eliminate the effect.*" Sammler et al. (2010) conducted an experiment with two groups of patients with lesions either in the anterior or the posterior Corpus Callosum: the latter did not exhibit the expected mismatch between segmental (temporal) and suprasegmental (spectral) features of language.

Apart from the role of the Corpus Callosum in auditory processing, all these studies also hint at the involvement of the right hemisphere in language processing (for a review see Lindell, 2006). Fully consistent with my hypothesis, Overath et al. (2015), after finding that the effect of speech segment length was robust in both hemispheres, inferred the following: "*It is therefore possible that laterality effects are driven more by higher order linguistic processing demands than by speech analysis per se.*"

From this perspective, we can also explain the initial results from split-brain patients (Corballis, 1998) that apparently confirmed the linguistic incompetence of the right hemisphere: callosotomy patients could not verbally answer language questions presented to isolated right hemispheres, because the articulatory ability (and only that) is left-dominant. However, when asked for non-verbal responses, the patients demonstrated speech auditory comprehension, by picking with their left hand (right motor control) the object (uttered by the experimenter) among an array of objects (see **Figure 5**).

FIGURE 5 | A split-brain patient demonstrates speech auditory comprehension ('Ball'), by picking the ball with his left hand (right motor control; Adapted from: http://thebrain.mcgill.ca/flash/capsules/experiencebleu06.html).

Lastly, the idea of lateralization in terms of synchronic activity in the posterior cortex, mediated by large-diameter, fast-conducting callosal regions, was first formulated by Aboitiz et al. (2003). However, back then there was no evidence reporting a right-dominance for spectral and a left-dominance for temporal processing, so that the authors drew the following conclusion: "*There is not yet evidence for the existence of synchronic ensembles during performance in working memory tasks, but 40 Hz synchronic oscillations have been reported during linguistic performance... The role of synchronic activity in working memory processes, be they linguistic, auditory or visual, urgently needs to be investigated.*" The proposal of dynamic asymmetry presented here should be really close to what Aboitiz et al. (2003) had in mind.

#### Alpha and Beta Rhythms – a Window to Ontogeny and Phylogeny of Mirror Neurons

Although research has mostly focused on the rhythms that have a discernible effect in auditory processing (delta/theta and gamma oscillations), alpha and beta rhythms are suggested to be equally implicated but in a reverse mode: by being suppressed.

This shift of interest toward the significance of these two rhythms is apparent if one pays attention to the way alpha/beta suppression is treated by Poeppel and colleagues: Doelling et al. (2014) investigated to what extent the oscillation-based envelope tracking (discussed above) also mediates the relationship between sharpness and intelligibility; the results supported this correlation but the authors also noted that another rhythm (alpha) was suppressed in their experiment. They limited their observation to mentioning Obleser and Weisz (2012), who have shown that alpha power suppression is related to intelligibility, but 1 year after they sought to put this to the test (Luc and Arnal, 2014). Interestingly, they found that the typical increase in the theta band was always followed by a broadband suppression of the alpha (9–14 Hz) and beta (15–25 Hz) bands and suggested that "*the brain exploits the level of post-stimulus alpha suppression as internal evidence to determine how well the stimulus matched the prediction.*"

If we now turn back to the experiments made by Obleser and Weisz (2012), alpha rhythm seems to have far-reaching implications in auditory processing: on the one hand, it has been shown to be suppressed to reinforce acoustic intelligibility, but on the other hand, alpha rhythm is enhanced during auditory memory retention (Knecht et al., 2000; Jensen et al., 2002; Obleser et al., 2012; Theofanopoulou and Boeckx, 2015). These findings suggest that alpha rhythm moderates the interplay between working memory/attention and intelligibility. Tellingly, when the auditory memory is overloaded (hence alpha rhythm enhancement is further employed), acoustic degradation affects processing, because the canonical alpha suppression fails to occur (Obleser et al., 2012). This is also consistent with studies reporting an increased activation of the cerebellum in high-load tasks, suggesting a prominent role of the cerebellum in working memory processing (Kirschen et al., 2010; Stoodley et al., 2012; Luis et al., 2015).

On the grounds that Morillon et al. (2010) regard the synchronization of the motor (jaw movements/trill or formant transitions) and auditory (syllabic/phonemic) modalities as significant, I take it that there must be such a correlation also between the motor and auditory alpha and beta suppression. This conclusion can be reached thanks to studies focusing on the Mirror Neuron System: a system of neurons which were first thought to be activated only during action execution and action observation (Rizzolatti and Craighero, 2004). However, recent experiments leave no doubt that mirror neurons (MN) are also implicated in auditory processing (Cuellar et al., 2012) and sensorimotor learning (Catmur et al., 2007; Hickok and Hauser, 2010). At present, it is acknowledged that MN integrate cross-modal information and, crucially, all the information that has been said to be involved in language processing (Senkfor, 2002; Molnar-Szakacs and Overy, 2006).

What is even more relevant to the present paper is that EEG and MEG studies report a suppression of alpha/mu and beta-band activity in the sensorimotor area, among other areas (Rizzolatti and Luppino, 2001; Muthukumaraswamy and Johnson, 2004; Ulloa and Pineda, 2007; Pineda, 2008; Perry and Bentin, 2009; Perry et al., 2010; Frenkel-Toledo et al., 2013; Lange et al., 2015; from now on, I will focus on the mu-alpha rhythm suppression, as beta-band suppression has only very recently been shown to be involved in the context of MN; see Lange et al., 2015).

There is corroborating evidence that alpha/mu rhythm desynchronization in the sensorimotor system appears early in infancy and its functional properties are so strongly modulated by maturation that the sensorimotor system evolves from a random (in infants) to a "small-world" organization (in children and adults; Ferrari et al., 2009; Marshall and Meltzoff, 2011; Cuevas et al., 2014; Berchicci et al., 2015). This "small-world" networking is achieved by a wiring pattern in the brain that is thickly intraconnected locally and sparsely interconnected globally (Changizi, 2001, p. 571; Karbowski, 2003; Sporns and Kötter, 2004; Sporns and Zwi, 2004). Pineda (2005) proposed that "*mu rhythms represent an important information processing function that links perception and action-specifically, the transformation of 'seeing' and 'hearing' into 'doing'.*"

In addition, if we go back to Morillon et al. (2010), they suggest that "*inherent auditory-motor tuning at the syllabic rate and acquired tuning at the phonemic rate are also compatible with two recognized stages of language development in infants; an early stage with production of syllables that does not depend on hearing (also observed in deaf babies), followed by a later stage in which infants match their phonemic production to what they hear in caregiver speech.*" It seems to me that alpha/mu rhythm's maturation in development could indeed be the key for the interplay of the two sampling rates discussed, given that it is directly connected to the maturation and myelination of the white matter: Jann et al. (2012) found positive correlations of Fractional Anisotropy with alpha frequency within the splenium of the corpus callosum. Let me reiterate that the splenium, being at the posterior-myelinated part of the Corpus Callosum, presents an overlap of myelin water fraction with Fractional Anisotropy values within its thick axons that permits fast signal conductance. It is, furthermore, noteworthy that Miller (1994) sees such a connection between myelination, alpha frequency and intelligence that he put forward a brain myelination hypothesis of intelligence.

More importantly, these ontogenetic observations can be linked also to phylogenetic issues, which could result in interesting future experiments. It is known that MN were originally detected in monkeys' area F5. What is not yet appreciated is that MN respond to the observation of lip-smacking and hand actions (Ferrari et al., 2003) by inhibiting mu rhythm (Vanderwert et al., 2013; Cook et al., 2014). Given that there are hypotheses, in the context of human ontogeny, proposing that the MNS in humans passes from being purely visual to multimodal (Iacoboni and Dapretto, 2006; Ferrari et al., 2009), it could be conjectured that this developmental shift, mediated by the myelinated posterior Corpus Callosum, was crucial for our linguistic cognition. (For data fostering this idea, see Autism Spectrum Disorder.)

#### Deficits as Speech-Related Oscillopathies

### *Autism spectrum disorder*

It should not strike us as strange that a plethora of evidence in line with the above comes from autism, where the developmental process is most obviously affected. Murphy et al. (2014) found that alpha-band deployment was severely impaired, giving rise to increased distraction, and Jochaut et al. (2015) found that ASD patients, instead of down-regulating gamma activity by theta, presented an opposite dependency such that gamma and theta coupling jointly increased out of physiological ranges.

In light of what has been said about the implication of these rhythms in speech processing, it is clear that dysfunctional theta/gamma coordination and alpha suppression would disrupt the alignment of neuronal excitability with syllabic onset, compromising speech decoding.

The data also seem in consonance with what has been conjectured before about an earlier visual system which later becomes multimodal: Damarla et al. (2010) show that ASD patients display more activation in visuospatial (bilateral superior parietal extending to inferior parietal and right occipital) areas, something that possibly indicates a compensatory role of visual processing during speech perception. The latter is supported by experiments according to which ASD subjects extensively explore the mouth region in face-to-face situations (Klin et al., 2002), and use specific attention modes to enhance local visual processing (Schwarzkopf et al., 2014).

Regarding the Corpus Callosum in ASD, there is an abundance of studies showing that its size is reduced in the posterior areas (Alexander et al., 2007; Just et al., 2007, among others). Moreover, the fact that abnormalities in the parietal lobes (Courchesne et al., 1993) and the posterior fossa (Courchesne et al., 1994) have been detected in infantile autism lends credence to the contention that the enlarged brain and skull areas co-evolved in *H. sapiens*.

### *Schizophrenia*

In schizophrenia, too, abnormal neural oscillations and synchrony have been associated with less organization in subdivisions of the corpus callosum than in controls (Uhlhaas and Singer, 2010a and references therein). Most studies have focused on deficits in the generation and maintenance of coherent gamma-range oscillations (Light et al., 2006; Minzenberg et al., 2010; Kirihara et al., 2012). However, Moran and Hong (2011), after reviewing EEG studies on schizophrenia and describing some of the key functional roles exerted by gamma, low frequencies, and their cross-frequency coupling, conclude that even isolated alterations in gamma or low frequency oscillations may impact the interactions of high and low frequency bands.

Turning now to the Corpus Callosum: Leroux et al. (2015) revealed that reduced leftward functional lateralization for language in patients with schizophrenia was correlated with altered callosal integrity, reflecting decreased and/or slower interhemispheric communication. In addition, Peters and Karlsgodt (2015) concluded that aberrant interhemispheric communication in schizophrenia is due to disrupted maturation at adolescence, with later changes likely due to disease neurotoxicity or to abnormal or excessive aging effects. In agreement with my hypothesis, neuroimaging studies showed lower callosal integrity (through FA or RD) in either the whole Corpus Callosum (Miyata et al., 2010; Knöchel et al., 2012; Freitag et al., 2013) or, more specifically, in the splenium region (Kyriakopoulos et al., 2008; Holleran et al., 2014; Balevich et al., 2015).

#### White Matter-Globular Brain Pattern

In this section I will try to make clear how the development of the white matter underlies, or rather, co-evolves with the growth of our brain. More specifically, I will try to show how the processes of myelination and energy allocation (thermodynamics) of the white matter can shed light on our pursuit of what determines the shape of our brain. [The reasons why I am using the terms "shape" or "pattern" instead of "size" are well explained in Boeckx and Benítez-Burraco (2014a) and Hublin et al. (2015). Suffice it to mention here one of their arguments: over the course of the past 30 000 years brain size declined slightly in recent *H. sapiens*, hence it strikes me as quite biased to keep focusing on brain size alone.]

The idea is based on a statement in Boeckx and Benítez-Burraco (2014a): "*if the brain grows differently, it wires differently.*" I'd prefer to think of this in terms of allometric evolution, and say that "differences in brain growth and wiring co-evolve." In line with Buckner and Krienen (2013), I deem that the most telling wiring "element" can be found in the context of myelination and synaptic plasticity. As they note: "*Myelination of the cerebrum is delayed relative to other brain structures and in humans is globally protracted compared with other primates, including chimpanzees ... these collective observations suggest that the expanded cortical mantle of the human brain comprises networks that widely span the cortex without consistent feedforward/feedback connectivity and, further, that these circuits mature late into development.*"

In a similar vein, Hublin et al. (2015) take the extended period of growth during ontogeny and the delayed maturation of brain structure to contribute to our brain shape and its cognitive complexity. Prolonged human development is consequently thought to be key for the globularization developmental phase, present only in *H. sapiens* (Gunz et al., 2010, 2012). Furthermore, the fact that in humans myelination of the cortical axons is slow during childhood and extends beyond late adolescence allows their brain to "wire" while interacting with an enriched physical and cultural environment, viz. while being exposed to a vast variety of stimuli.

Even though studies showing that white matter volume increased in *H. sapiens* (Schoenemann et al., 2005; Sakai et al., 2011) are of relevance, Ventura-Antunes et al.'s (2013) observations deserve our attention: building on Mota and Herculano-Houzel (2012), they drew the conclusion that cortical size is proportional not only to white matter volume, but also to the average caliber and longitudinal tension along the axons of the white matter. This is how they explain that cortical size in rodents and primates scales differently: while rodents' brains wire with constant connectivity fraction, as a uniform network with the addition of isometrically longer fibers, primates' brains scale as a small-world network, growing through the addition of nodes that are densely intraconnected locally but only sparsely interconnected globally (Changizi, 2001; Karbowski, 2003; Sporns and Kötter, 2004; Sporns and Zwi, 2004).
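The notion of a small-world network invoked here can be illustrated with a toy graph. The Python sketch below is an assumption-laden illustration, not a model of cortical wiring and not the method of Ventura-Antunes et al. (2013): it builds a ring lattice in which every node is linked to its two nearest neighbors on each side (dense local connectivity), adds a handful of long-range shortcuts (sparse global connectivity), and compares two standard graph measures, the mean clustering coefficient and the mean shortest path length. The network size, the number of shortcuts and all function names are hypothetical choices.

```python
from collections import deque


def ring_lattice(n_nodes, neighbors_each_side):
    """Ring in which each node links to its nearest neighbors on both sides."""
    adjacency = {node: set() for node in range(n_nodes)}
    for node in range(n_nodes):
        for step in range(1, neighbors_each_side + 1):
            other = (node + step) % n_nodes
            adjacency[node].add(other)
            adjacency[other].add(node)
    return adjacency


def add_shortcuts(adjacency, every):
    """Copy the graph and link every `every`-th node to the node opposite it."""
    n_nodes = len(adjacency)
    rewired = {node: set(nbrs) for node, nbrs in adjacency.items()}
    for node in range(0, n_nodes // 2, every):
        opposite = node + n_nodes // 2
        rewired[node].add(opposite)
        rewired[opposite].add(node)
    return rewired


def mean_clustering(adjacency):
    """Average fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adjacency.values():
        nbr_list = list(nbrs)
        k = len(nbr_list)
        if k < 2:
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbr_list[b] in adjacency[nbr_list[a]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


def mean_path_length(adjacency):
    """Average shortest path (in hops) over reachable node pairs, via BFS."""
    total_hops, n_pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in adjacency[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
        total_hops += sum(distance.values())
        n_pairs += len(distance) - 1
    return total_hops / n_pairs


lattice = ring_lattice(60, 2)             # local connections only
small_world = add_shortcuts(lattice, 10)  # plus three long-range links

for name, graph in (("lattice", lattice), ("with shortcuts", small_world)):
    print(f"{name}: clustering = {mean_clustering(graph):.2f}, "
          f"mean path length = {mean_path_length(graph):.2f}")
```

In this toy graph, a few long-range links shorten the average path between nodes while local clustering stays close to its lattice value, which is the combination of dense local and sparse global connectivity that the small-world description refers to.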

Crucially, my hypothesis seems to fit very well in this picture. But the way I construe the role of the Corpus Callosum has little to do with the traditional view represented by Ringo et al. (1994). These authors set forth the idea that the strategy large brains use to compensate for their conduction delays in transcallosal information transfer is not to increase the inter-hemispheric processing that depends solely on the Corpus Callosum, but to increase the intra-hemispheric amount of fibers that connect local lateralized networks. Ringo et al. (1994) and Hänggi et al. (2014) made a direct correlation between this observation and brain size; however, neither this small-world strategy nor large brain size is a specific trait of *H. sapiens*. In my opinion, humans' identifying features should be sought in the changes our neural wiring manifests. According to the hypothesis put forward in this paper, the Corpus Callosum is of great relevance, given that it displays a unique structure, which is related intrinsically to the language processing coupling I described above. Let me just remind the reader that in humans the proportion of large diameter fibers in callosal regions that interconnect primary sensory areas is higher than in macaques (Aboitiz et al., 1992) and that the fiber organization of the Corpus Callosum has nothing in common with the callosal organization reported in monkeys, where the density of callosal connections varies according to body part within sensory representations (Jones et al., 1978; Killackey et al., 1983).

In order to conceive well of what connectivity means in brain terms, we should pay attention not only to the 'wiring' but also to the 're-wiring' of the brain. With the latter I am referring to the crucial phase of synaptic pruning in development, which proves to be very pertinent to my hypothesis. Moreover, viewed through the prism of energy allocation and thermodynamics in the brain, its relevance to brain asymmetry becomes conspicuous.

Indeed, both Ventura-Antunes et al. (2013) and Hublin et al. (2015) hinted at the bearing of energy in the context of brain development. Human brains appear to use their prolonged development as a strategy to counterbalance large brains' energetic costs. The brain is extremely thermoregulated and vulnerable to energy shortages during development, as it requires circa 66% of the basal metabolic rate for functioning and maintenance by 4.2–4.4 years, when the brain approaches its adult size and synaptic densities are maximal (Holliday, 1986; Kuzawa, 1998). This exuberance of synapses is said to be needed to allow the synapse removal required for neural network refinement (Innocenti, 1995; Innocenti and Price, 2005). From an evolutionary standpoint, thermoregulation merits additional attention, considering that Neanderthals had a different endocranial heat dissipation pattern from that of modern humans but a comparable amount of heat production, something that, according to Bruner (2014), could possibly be associated with the extinction of the former.

Myelination and synaptic pruning co-operate to adjust the energy consumption of the brain. Skoyles (2012) puts it boldly: "*This relative delay of myelination maturation is also consistent with small world connectivity refinement occurring particularly in the later stages of neuromaturation during adolescence, in which distant connections are pruned to create a more hub based connectivity... such network refinement depends upon the synaptic pruning that "rewires" the local area neural networks formed between neighboring area neurons.*" More importantly, Skoyles also provides a link between myelination and energy efficiency of axon transmission that can be directly linked to my hypothesis: "*for the passage of each spike, a 0.5 μm unmyelinated axon costs about 12-fold more in energy than when that spike is passed through a myelinated one.*" Finally, if myelination reduces energy costs for interhemispheric communication, as follows from the myelination and fiber structure of the Corpus Callosum, synaptic pruning allows for neuron rewiring changes that refine anterior and medial cortex to establish and refine its intrahemispheric networks (Chklovskii et al., 2004; Skoyles, 2012).
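The per-spike saving implied by the figure quoted from Skoyles (2012) can be made explicit. This is only a rough reading, assuming that "about 12-fold" means an energy ratio of roughly 12 between the unmyelinated and the myelinated case; the symbols $E_u$ and $E_m$ (energy per spike without and with myelin) are introduced here for illustration and do not appear in the source.

$$
\frac{E_u}{E_m} \approx 12
\quad\Longrightarrow\quad
1 - \frac{E_m}{E_u} \approx 1 - \frac{1}{12} \approx 0.92
$$

On this reading, myelinating such an axon would save roughly nine tenths of the energy spent per spike. The figure applies to the 0.5 μm axon in the quotation and should not be read as a whole-brain estimate.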

These data can also be related to how the rhythms of the brain are tuned. Feinberg and Campbell (2010) found that the increase of myelination of long axonal fibers during adolescence results in long-range connectivity through reduced slow-wave activity (delta, theta) and decreased energy consumption. According to them and Uhlhaas and Singer (2010b), developmental changes correlate with the precision of rhythmic synchrony. More concretely, Zaehle and Herrmann (2011) observed a positive correlation between posterior callosal white matter density and inter-hemispheric frequency of visually evoked gamma oscillations, indicating a clear nexus between the connectome and the dynome (Boeckx and Theofanopoulou, 2014). Finally, on the grounds that Barbato and Kinouchi (2000) find a strong relationship between optimal pruning and the first learning experiences, it can be deduced that the linguistic input actually pilots the brain's development.

#### Posterior Brain and Skull Enlargement in *Homo sapiens*

According to Hublin et al. (2015) "*modern humans developed a more globular shape of the brain primarily resulting from a bulging in the parietal areas and a ventral flexion. In addition, modern humans display a proportionally larger cerebellum, larger olfactory bulbs and temporal lobe poles, and a wider orbitofrontal cortex*" (see **Figure 6**).

For reasons I mentioned in the Introduction, I take the posterior cortex to be of more evolutionary significance than the frontal cortex; with this I don't mean to downgrade the acknowledged importance of the latter in higher-order language processing. It is noteworthy though that recent experiments show that the most salient difference between humans' and monkeys' frontal cortex lies in its stronger connectivity with the sensorimotor cortex and not in connectivity within frontal areas (Neubert et al., 2014). If we interpret these data in terms of co-evolution, we can say again that the anterior cortex co-evolved under the pressure of the posterior cortex's enlargement.

Turning now to the posterior cortex, my hypothesis is fostered by the findings of Bruner (2010): "*as brain size increases, the parietal lobes undergo relative flattening in nonmodern humans. This pattern is stressed in Neanderthals, which show, however, a certain widening of the parietal volumes. Only H. sapiens shows a generalized enlargement of the entire parietal surface*." Furthermore, the bulging parietals of modern humans have been linked to evolutionary reorganization of deep parietal brain areas that gave rise to the novel precuneus (see Cavanna and Trimble, 2006 for the role of precuneus in cognition).

Buckner and Krienen (2013) review an array of studies concluding that the most telling change in the evolution of our lineage (that can be connected to cortical expansion) is the enlargement of the cerebellum (expressed mostly in the dentate nucleus) and its extensive projections to association cortex. In their final assessments they note: "*The cerebellar association zones are disproportionately expanded in humans, but the functional origins and importance of cerebellar expansion remain unresolved. Adaptionist ideas... seek explanations for cerebellar enlargement as a specific, selected feature of evolution.*" Far from suggesting that the hypothesis presented here resolves the issue, I take the correlation between the myelination of the posterior Corpus Callosum and the enlargement of the cerebellum to be an insightful window into the evolution of our brain.

I side with Barton and Venditti (2014), when they propose the following: "*cerebellar specialization was a far more important component of human brain evolution than hitherto recognized and that technical intelligence was likely to have been at least as important as social intelligence in human cognitive evolution. Given the role of the cerebellum in sensory-motor control and in learning complex action sequences, cerebellar specialization is likely to have underpinned the evolution of humans' advanced technological capacities, which in turn may have been a preadaptation for language*."

Essentially, as brains grow during ontogeny, the bones of the skull accommodate the expanding brain. The protrusion of the posterior cranial fossa in modern humans presents a good correlation with the cerebellar lobes, and the bulging parietal bones with the parietal lobe and specifically the precuneus.

It is also remarkable that some cranial changes have been associated with the Corpus Callosum: "*In terms of evolution, shape and position of the corpus callosum are influenced by the general endocranial architecture, mainly by the flexion of the cranial base*" (Bruner et al., 2012). Furthermore, considering that the tentorium cerebelli rotates inferoposteriorly in human fetuses (Jeffery, 2002) and that the antero-posterior stretching of the Corpus Callosum length was shown to vary in humans *"due to the association between splenium and the anterior insertion of the tentorium cerebelli, caused by spatial proximity and consequent biomechanical relationships"* (Bruner et al., 2012), there may be an overlooked relationship between some cranial fossa and the Corpus Callosum.

To conclude, I have argued in this paper that the special morphology of our Corpus Callosum provides an explanatory link between the (selective) Asymmetry long thought to be the key to understanding the evolution of our specific mode of cognition and the growth pattern that results in a globular brain(case), which sets us apart from other primates. On the basis of the evidence reviewed here, it can be said that accounts that ignore the critical role and anatomical position of the Corpus Callosum fall short of capturing what makes our brain language-ready.

# Acknowledgments

I am indebted to Cedric Boeckx for his constant encouragement and help. Many thanks also to Antonio Benítez-Burraco for helpful discussion. Preparation of this work was supported by funds from the Spanish Ministry of Economy and Competitiveness (grants FFI2013-43823-P and FFI2014-61888-EXP).


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Theofanopoulou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Corrigendum: Brain asymmetry in the white matter making and globularity

#### Constantina Theofanopoulou\*

Department of General Linguistics, Universitat de Barcelona, Barcelona, Spain

Keywords: brain asymmetry, white matter, lateralization, brain rhythms, language

#### **A corrigendum on**

**Brain asymmetry in the white matter making and globularity** by Theofanopoulou, C. (2015). Front. Psychol. 6:1355. doi: 10.3389/fpsyg.2015.01355

#### Edited and reviewed by:

Antonio Benítez-Burraco, University of Huelva, Spain

#### \*Correspondence:

Constantina Theofanopoulou constantinaki@hotmail.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 13 November 2015 Accepted: 16 November 2015 Published: 14 December 2015

#### Citation:

Theofanopoulou C (2015) Corrigendum: Brain asymmetry in the white matter making and globularity. Front. Psychol. 6:1857. doi: 10.3389/fpsyg.2015.01857

In the Acknowledgments Section of the original Hypothesis & Theory article, the last sentence erroneously reported grant number FFI2014-61888-EXP as part of funds supporting the work. The only grant was #FFI2013-43823-P, provided by the Spanish Ministry of Economy and Competitiveness.

This error does not change the scientific conclusions of the article in any way.

# ACKNOWLEDGMENTS

I am indebted to Cedric Boeckx for his constant encouragement and help. Many thanks also to Antonio Benítez-Burraco for helpful discussion.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Theofanopoulou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The brain dynamics of linguistic computation

#### Elliot Murphy \*

Division of Psychology and Language Sciences, University College London, London, UK

Neural oscillations at distinct frequencies are increasingly being related to a number of basic and higher cognitive faculties. Oscillations enable the construction of coherently organized neuronal assemblies through establishing transitory temporal correlations. By exploring the elementary operations of the language faculty—labeling, concatenation, cyclic transfer—alongside neural dynamics, a new model of linguistic computation is proposed. It is argued that the universality of language, and the true biological source of Universal Grammar, is not to be found purely in the genome as has long been suggested, but more specifically within the extraordinarily preserved nature of mammalian brain rhythms employed in the computation of linguistic structures. Computational-representational theories are used as a guide in investigating the neurobiological foundations of the human "cognome"—the set of computations performed by the nervous system—and new directions are suggested for how the dynamics of the brain (the "dynome") operate and execute linguistic operations. The extent to which brain rhythms are the suitable neuronal processes which can capture the computational properties of the human language faculty is considered against a backdrop of existing cartographic research into the localization of linguistic interpretation. Particular focus is placed on labeling, the operation elsewhere argued to be species-specific. A Basic Label model of the human cognome-dynome is proposed, leading to clear, causally-addressable empirical predictions, to be investigated by a suggested research program, Dynamic Cognomics. In addition, a distinction between minimal and maximal degrees of explanation is introduced to differentiate between the depth of analysis provided by cartographic, rhythmic, neurochemical, and other approaches to computation.

#### Edited by:

Antonio Benítez-Burraco, University of Huelva, Spain

#### Reviewed by:

Aritz Irurtzun, Centre de recherche sur la langue et les textes basques, France Timothy Michael Ellmore, The City College of New York, USA Constantina Theofanopoulou, Universitat de Barcelona, Spain

#### \*Correspondence:

Elliot Murphy, Division of Psychology and Language Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, UK elliotmurphy91@gmail.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 11 July 2015 Accepted: 18 September 2015 Published: 13 October 2015

#### Citation:

Murphy E (2015) The brain dynamics of linguistic computation. Front. Psychol. 6:1515. doi: 10.3389/fpsyg.2015.01515

Keywords: neural oscillations, biolinguistics, syntax, dynome, theta, alpha, beta, gamma

The argument for placing language at the center of investigations into human cognition has by now been pushed on a number of fronts, from palaeoanthropology to philosophy (McGilvray, 2013; Hauser et al., 2014). In contrast, attempts to place the brain at the center of the language sciences have been met with suspicion and even ridicule, typically due to the observation that higher cognitive constructs like verb and phrase cannot presently be made commensurable with lower-level neurophysiological structures like dendrite and cortical column. Substantial engagement with the biology literature is a feature still lacking in departments of linguistics, despite the Minimalist Program's narrowing of the boundaries between the computational and conceptual capacities of humans and non-humans (Chomsky, 1995, 2012, 2015b).

One of the core motivations linguists have for leaving aside biology and keeping to computational investigations arises from Poeppel (2012) and Chomsky's (2000) insightful discussions concerning philosophy of science, theoretical reduction, and unification. These authors point out that, as with the reduction of physics to an unaltered chemistry in the early years of the twentieth century, it may well be that a new neurobiology yielded by a "Galilean" revolution is required for commensurability with the computational theories of syntacticians to be achieved, rather than a revolutionized theory of language. But the common claim that linguistics is biology at a suitable level of abstraction (Berwick, 2011) is also used to effectively get linguists "off the hook" of directly exploring the biology of language, satisfied as many are with concluding that this is purely the job of neuroscience. Yet if neuroscientists are not guided by the concerns of computationalists across the cognitive sciences, and not just linguistics, then there is little reason to believe that this goal will ever be achieved. As Lenneberg (1964, p. 76) noted, "[n]othing is gained by labeling the propensity for language as biological unless we can use this insight for new research directions—unless more specific correlates can be uncovered."

# Dynamic Cognomics: Preliminary Remarks

The central argument of this paper will be that recent developments in brain dynamics and neurochemistry can provide the type of framework needed to meet Poeppel and Embick's (2005) challenge of "granularity" mismatch, or the problem of reconciling the primitives of neuroscience with the primitives of linguistics (see also Fitch, 2009; Poeppel, 2011). The brain simply does not know what syntax or phonology are, and these concepts are much too coarse to be implemented neurally. In 1996, Poeppel noted of cell assemblies and oscillations that "it is unclear whether these are the right biological categories to account for cognition" (1996, p. 643), but by now the oscillation literature has sufficiently expanded to incorporate numerous cognitive processes.

Linguistics can direct the brain sciences insofar as its insights into the universality of operations like concatenation (set-formation) inform the goals of neurobiology, while the brain sciences can direct linguistics insofar as they place constraints on what possible operations neuronal assemblies and their oscillations can perform. While linguists should focus on making their claims about language biologically feasible, neuroscientists should conversely ensure they do not sideline the notion of computation, as stressed by Gallistel and King (2009).

In order to explore these manifold agendas, I will adopt the multidisciplinary approach promoted by Boeckx and Theofanopoulou (2014), which endorses an interweaving of the sciences concerned with the following topics: the computations performed by the human nervous system (the "cognome"; Poeppel, 2012), brain dynamics (the "dynome"; Kopell et al., 2014), neural wiring (the "connectome"; Seung, 2012) and genomics. This framework exposes the misleading nature of common questions surrounding whether the brain's wiring "makes us who we are," which have been given an impetus by calls from Seung (2012) and others for a map of the connectome. The connectome constrains the kinds of operations performed by the nervous system, but it cannot reveal what operations in particular are performed. What is needed, as Seung himself has explained, is not just a comprehensive model of neural wiring, but also neural computation, which is what a theory of the cognome can contribute (see Reimann et al., 2015 for a proposed algorithm to predict the connectome of neural microcircuits).

Bridging the two domains, I will argue, is the dynome; or what physicists would term the mesoscale, and not the microscale. The dynome is the level of brain dynamics, encompassing electrophysiology, and neural oscillations. It explores "not only what is connected, but how and in what directions regions of the brain are connected" (Kopell et al., 2014, p. 1319). The cartographic literature (e.g., fMRI and DTI studies) typically displays theoretical and empirical satisfaction with discussions of neural "activation," "firing," and "pathways," keeping at a connectomic level of spatiotemporal brain nodes and edges (Bressler and Menon, 2010). The dynome adds to such a "functional connectome" an understanding of the regions involved in producing and processing brain signals. Although I will focus on brain rhythms, it should be noted that the dynome extends beyond neural oscillations and includes other temporal structures (Larson-Prior et al., 2013).

I would also like to propose that the universality of language, and the true biological source of Universal Grammar, is not to be found purely in the genome as has long been suggested (where there are surprising layers of variation; Benítez-Burraco and Boeckx, 2014a,b), but more specifically within the extraordinarily preserved nature of mammalian brain rhythms (the oscillations of mice and rats have the same pharmacological profiles as humans) likely arising from the deployment of long-diameter axons of long-range neurons (Buzsáki et al., 2013, see also Calabrese and Woolley, 2015). Such cortical and subcortical structures are "among the most sophisticated scalable architectures in nature" (Buzsáki et al., 2013, p. 751), with scalability referring to the ability to perform the same operations with increasing efficiency despite escalating organizational complexity. Brain rhythms, yielded in part by such structures, would therefore be expected to be capable of complex forms of information-transmission and integration.

A central question posed by this paper, then, is "Why claim that neuroscience requires a Galilean revolution in order for it to be made commensurable with linguistics when the properties of syntax may be able to be translated into rhythmic brain processes?" The current paper will suggest a new research program, Dynamic Cognomics, to explore the neurobiology of language in a deeper and more electrophysiologically explicit fashion than many existing cartographic neuroimaging studies, but some important background is needed before any concrete research goals can be drawn up.

# Cartographic Directions

In Murphy (2015a) it was claimed that the ability to label linguistic structures with a categorical identity (e.g., determiner, verb, and adjective), having concatenated two elements into an unordered set, and transfer them in a cyclic fashion to the conceptual-intentional (CI) interface is the defining property of the human computational system. This perspective will be maintained here. It will be argued that modifications in oscillatory couplings and the cell assemblies targeted by such dynomic operations are a viable candidate for what brought about what could be regarded as a phase transition from single-instance set-formation (of the kind seen in birdsong) to unbounded set-formation. For instance, the phase/nonphase rhythm of syntactic computation ([C/T[v/V[D/N]]]), emphasized by Richards (2011), Uriagereka (2012) and Boeckx (2013), may translate well into the rhythmic processes of neural oscillations.

Since the origins of modern cognitive neuroscience, linguistic processes have been claimed to elicit numerous event-related potentials (ERPs) by psycholinguists using magnetoencephalography (MEG) and electroencephalography (EEG) (see Swaab et al., 2012 for a review). As time-frequency analysis and its Fourier transforms developed into a mainstay of "ERPology" (Luck, 2014) in the 1990s and 2000s, it became possible to test the involvement of distinct brain regions and the concomitant electrical activity for various linguistic processes, given the standard assumption that language is a cognitive system. The ERP community has spent a great deal of time decomposing the major components, such as the P600 and N400. It is taken for granted that the level of analysis provided by these "large" components does not suffice at the electrophysiological level to describe generic linguistic sub-operations. The urge to seek a finer level of granularity, then, is clearly manifested in the ERP community through EEG and MEG investigations (Lau et al., 2008), but this objective is not found in the vast majority of cartographic neuroimaging research.

In recent decades, neuroanatomical inquiry into the structures responsible for syntactic processing has led to a number of revelations concerning the biology of language. Petersson et al. (2012) reveal the inadequacy of the classical Broca-Wernicke-Lichtheim language model of the brain by noting how the language network extends to substantial parts of superior and middle temporal cortex, inferior parietal cortex, along with subcortical areas such as the basal ganglia (Balari and Lorenzo, 2013), the hippocampus and the thalamus (Theofanopoulou and Boeckx, Forthcoming a). The network is also implicated in more general cognitive systems like the default-mode network and the multiple demand system.

Brodmann area 44 and the posterior superior temporal cortex appear to be involved in a pathway which supports core syntactic computations (Friederici et al., 2006, see also Tettamanti and Weniger, 2006; Santi and Grodzinsky, 2010), with the combinatorial network being identified by Poeppel (2014) as the anterior medial temporal gyrus and anterior inferior temporal sulcus. Lieberman's (2006) "Basal Ganglia Grammar" model proposes the existence of a pattern generator whose excitation/inhibition mechanism is located in the basal ganglia. This interfaces with working memory space located in Broca's area (Santi et al., 2015). Lieberman estimates that the dorsolateral prefrontal circuit is involved in sentence comprehension, projecting from the prefrontal cortex toward the lateral dorso-medial region of the globus pallidus, and the thalamus, which projects back to the prefrontal cortex. Balari and Lorenzo (2013, pp. 100–102) have suggested that this may be the circuit used as language's computational system operating within a structure of working memory networks (Balari et al., 2012).

# Evo-devo Directions

As the theory of evolution expands beyond the Modern Synthesis and into areas such as evolutionary-developmental (evo-devo) biology (Carroll, 2006; Bolker, 2008), there is in turn more room for linguists to find their place within biology. In the evo-devo program, following the lead of traditional formalists such as Vicq-D'Azyr, Goethe and Owen (Amundson, 1998, 2006), natural selection is "a constantly operating background condition, but the specificity of its phenotypic outcome is provided by the developmental systems" (Pigliucci and Müller, 2010, p. 13). Evo-devo departs from Neo-Darwinian adaptationism (NDA), or "phylogenetic empiricism" (Chomsky, 1968), in that it takes the saltationist view that species are the result of punctuated genetic changes. The functionalism of NDA should also be rejected, since functions do not typically pre-exist organic form (Müller, 2008), which is determined by morphogenetic parameters such as the viscoelastic properties of cellular matrices and the kinetic activity of cellular diffusion (what Alberch termed "morphological evolution"), and which at best have what Balari and Lorenzo call a "functional potential" (2013, p. 37). Contrary to ideas in Dawkins (2006, p. 202) and Lieberman (2015), laws governing the conservation of developmental pathways should be "acknowledged with a creative character similar—if not superior—to that of natural selection" (Balari and Lorenzo, 2013, p. 115). Form often precedes function, then, and natural selection acts as a "filtering condition on pre-existent variants"; thus "arrival of the fittest, instead of survival of the fittest, is the core issue in any evolutionary study" (Narita and Fujita, 2010, p. 364, see also Bertossa, 2011).

In this connection, Rakic and Kornack (2001) observe that the phase of asymmetric cell division yielding neuronal cells differs in timing between humans and monkeys to the extent that human neuronal populations are thought to be between 8 and 16 times larger than those of monkeys. Human-specific neuronal traits include the protein ApoE4, providing stronger synaptic connections (Bufill and Carbonell, 2004). Parker and McKinney (1999) detail how the myelinisation of the neocortex occurs in humans until the age of 12, but lasts only 3.5 years in rhesus monkeys. Zhang et al. (2011) also propose the existence of 1241 primate-specific genes, 280 of which are human-specific. 54% of these human-specific genes are upregulated in a brain area implicated in higher cognition, the prefrontal cortex. These new genes are "much more likely to be involved in gene regulation" (Diller and Cann, 2013, p. 256), a major topic in evo-devo.

Recent research in avian genomics suggests that the evolution of externalization may also not be as difficult as typically considered by generative grammarians. Pfenning et al. (2014, p. 1333) demonstrated that the profiles of transcription genes in vocal learners can be aligned, with 50 genes being shared between humans and birds which are "enriched in motor control and neural connectivity functions." Both humans and birds appear to have converged on identical solutions to vocal learning, a remarkable finding considering the 310-million-year gap separating birds from humans. In summary, a slight epigenetic change, termed the "Small Bang" in Murphy (2015a), could have produced an alteration in the human computational system. The next section will consider how these operations could be implemented in the brain.

# Rhythmic Directions

How much physiological detail is required to capture the operations of the language faculty? Theofanopoulou and Boeckx (Forthcoming b) claim that studying neural dynamics only at the level of brain waves is sufficient, but as demonstrated below, a more refined biophysical picture is not only possible but in fact necessary to adequately explain the origins of linguistic computations like concatenation, cyclic transfer and labeling. What is needed is not just a neuroscience of language, but a neurophysiology of language. For instance, at the most general mesoscopic physiological level of local neuronal groups, synchronized firing patterns result in coordinated input into other cortical areas, which gives rise to the large-amplitude oscillations of the local field potential. Inhibitory interneurons play an important role in producing neural ensemble synchrony by generating a narrow window for effective excitation and rhythmically modulating the firing rate of excitatory neurons. Interneurons place constraints on the oscillations responsible, as argued here, for computation. Subthreshold membrane potential resonance may also contribute to oscillatory activity by facilitating synchronous activity of neighboring neurons. As Cannon et al. (2014, p. 705) note, "the physiology underlying brain rhythms plays an essential role in how these rhythms facilitate some cognitive operations."

Shifting focus from neuroimaging to more recent investigations of brain oscillations may provide a welcome (but as yet tenuous) way of reconstructing in neural terms the operations of theoretical linguistics. Brain rhythms "have come of age," as Buzsáki and Freeman (2015, p. v) put it. They reflect synchronized fluctuations in neuronal excitability and are grouped by frequency, with the most common rhythms being delta (δ: ∼0.5–4 Hz), theta (θ: ∼4–10 Hz), alpha (α: ∼8–12 Hz), beta (β: ∼10–30 Hz), and gamma (γ: ∼30–100 Hz). These are generated by various cortical and subcortical structures, and form a hierarchical structure since slow rhythms phase-modulate the power of faster rhythms.
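As a minimal sketch, the approximate band edges listed above can be written as a lookup table. The edges are copied from the text; the function name `bands_for` and the choice of inclusive boundaries are assumptions made for illustration. Because the quoted ranges overlap (θ with α, and α with β), a single frequency can fall into more than one band.

```python
from typing import Dict, List, Tuple

# Approximate band edges in Hz, as quoted in the text. Neighbouring bands overlap.
BANDS: Dict[str, Tuple[float, float]] = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 10.0),
    "alpha": (8.0, 12.0),
    "beta": (10.0, 30.0),
    "gamma": (30.0, 100.0),
}


def bands_for(freq_hz: float) -> List[str]:
    """Return every band whose range (inclusive, by assumption) contains freq_hz."""
    return [name for name, (lo, hi) in BANDS.items() if lo <= freq_hz <= hi]


print(bands_for(9.0))   # ['theta', 'alpha']
print(bands_for(40.0))  # ['gamma']
```

The overlap is a reminder that band labels are conventions rather than sharp physiological boundaries.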

It is by now well established that neural oscillations are related to a number of basic and higher cognitive functions, for example speech perception (Giraud and Poeppel, 2012; Kayser et al., 2014). According to Giraud and Poeppel's temporal linking hypothesis, oscillation-based decoding segments information into "units of the appropriate temporal granularity" (2012, p. 511). Oscillations may consequently explain how the brain decodes continuous speech. However, Giraud and Poeppel's form of dynomic research crucially centers on the segregation of phonological units, and not semantic or syntactic units, which may implicate different brain areas and rhythms. The γ, θ, and δ rhythms respectively correspond closely to (sub)phonemic, syllabic and phrasal processing, as Giraud and Poeppel note, restricting their experimental inquiry to the γ and θ bands. In addition, the neural dynamics responsible for syntactic operations may be obscured by the processing of external sensory events like speech, and so different experimental designs may be required to control for this.

Oscillations have also been linked to the timing of cortical information processing (Klimesch et al., 2007). As Vaas notes, "Intrinsic oscillatory electrical activities, resonance and coherence are at the root of cognition" (2001, p. 86), with the condensing and dissolving of oscillatory bursts possibly explaining the "cinematic" nature of subjective experience (Freeman, 2015). As Poeppel has put it, the brain essentially "breathes" through oscillations. If such generic neural operations are also shown to be responsible for syntactic computations, and not just linguistic perception, this would lend weight to Hagoort's (2014) interpretation of the cartographic literature, which holds that the establishment of an axis of language production and comprehension is not justifiable. Expanding on Giraud and Poeppel's (2012, p. 511) goal of establishing a "principled relation between the time scales present in speech and the time constants underlying neuronal cortical oscillations," one of the central challenges will be to draw up relations between oscillatory time constants and the time scales of syntactic computation. This latter topic has yet to be explored in any serious detail, possibly due to a widespread prejudice that neurolinguistic investigations of syntax must analyse phrasal units, such as noun and verb phrases, rather than the underlying operations which construct them, such as set-formation and labeling (although see Ohta et al., 2013 for an innovative approach to localizing Merge and Search operations).

#### Oscillations as Functional Units

Recent debates about the origins of ERP component generation have led some (Tass, 2000; Makeig et al., 2002) to propose that components do not arise purely from latency-fixed polarity responses which are additive to continuing EEG responses, but rather arise through a superposition of oscillations which reset their phases in reaction to sensory input (although see Sauseng et al., 2007 for the methodological limitations of particular phase resetting claims). For our purposes, it is worth noting that this phase reset model was the first to propose a strong dependency between components and oscillations, introducing to brain dynamics a functional and not purely electrophysiological role. This immediately granted researchers the ability to transfer understanding of components (which are in turn linked to cognitive faculties) to brain rhythms whilst correspondingly inferring the nature of components from an emerging understanding of oscillations. While cognitive electrophysiologists have embraced this integrally reciprocal perspective (Klimesch et al., 2004), linguists generally remain hostile to the claim that the nature of mental computations—like components—could be explored explicitly through biophysics.

While the cognome resides at the Marrian computational level (Marr, 1982), I would like to suggest that there is in fact no algorithmic level at syntax. At most there are algorithms at the interfaces. Psycholinguistic theories can algorithmically model language processing, as Neeleman (2013) discusses, but syntax itself (being composed of operations like Concatenate, Label, and Transfer) has no need for this. Nevertheless, the dynome, with its operations of information segregation and spike timing organization, can in some sense be seen as an algorithmic level, implemented by the cellular structures of the connectome. These Marrian concerns become more vivid when we consider with Martins and Boeckx (2014) that syllables, which are unique to humans, evolved from primate lip-smacking. In terms of brain rhythms, the two are identical, yet one is human-specific and the other is not. The implications for the study of labeling, not acknowledged in Murphy (2015a), are clear: only comparative investigations of domain-general neurophysiological mechanisms, and the context in which they operate, will lead to enhanced understanding of human-specific computations. There are two central approaches to the cognome-dynome one could adopt: re-construct the cognome from the bottom up, or import linguistic constructs into a model of the dynome. I will be primarily concerned with the latter methodology, though the material reviewed and the model outlined open up the possibilities of using neurophysiology to guide linguistic investigations.

#### The Basic Label Model of the Cognome-dynome

At the most general level of analysis, neural oscillations emerge from the tension between the brain's two most central principles: segregation of function and dynamic integration (de Pasquale et al., 2012). Human brains are highly complex dynamical systems with principles of cellular and electrochemical organization which range across a hierarchy of scales. The brain cannot function purely through anatomical connections (the locus classicus of standard neuroimaging studies) but additionally requires dynamic functional connectivity, achieved through oscillatory synchronization. Frequency bands alone are not sufficient for computation; rather, it is their interactions which are significant. Intuitive prejudices against studying complex systems in these dynamical terms abound: for instance, chemical dynamics are typically thought about in terms of reaction kinetics, being stipulated as pre-formed stable variables, ignoring the molecular composition/decomposition process.

A core feature of the brain's functional complexity is created by rhythms generated in different cortical and subcortical tissue. Oscillations denote distinct states of brain activity, while oscillatory activity reflects a dynamic interplay between the dissimilar cell types of discrete circuits (Buzsáki, 2006). Brain rhythms, with their inter-wave hierarchies, provide "a syntactical structure for the spike traffic within and across circuits at multiple time scales" (Buzsáki and Freeman, 2015, p. viii). "Phase synchronization" will additionally be a central notion to the present discussion, referring to a consistent phase coupling between two neuronal signals oscillating at a given frequency. γ band synchronization (GBS) in particular has been intensively studied due to its apparent role in phase coding and perceptual integration (Fries, 2009), and is thought to be a major process subserving a fundamental operation of cortical computation implicated in various cognitive functions. Which functions are involved depends ultimately on what neural circuits GBS operates on. The following sub-sections will present a way of exploring the operations of the cognome in terms of these dynomic operations, leading to a form of what I will call Dynamic Cognomics.
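One common way to quantify a "consistent phase coupling between two neuronal signals" is a phase-locking value, the length of the average unit vector of the phase differences. The sketch below is illustrative only: it assumes the instantaneous phases are already available (here they are synthesized directly rather than estimated from recordings), and the sampling rate, frequencies, and function name are hypothetical choices.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """|mean(exp(i * (phi_a - phi_b)))|: 1 for a constant lag, near 0 for a drifting lag."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty phase series of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Hypothetical data: one second sampled at 1 kHz.
fs = 1000
t = [n / fs for n in range(fs)]

# Two 40 Hz (gamma-range) signals separated by a fixed quarter-pi lag.
gamma_a = [2 * math.pi * 40.0 * x for x in t]
gamma_b = [p - math.pi / 4 for p in gamma_a]

# A 47 Hz signal whose phase drifts relative to the 40 Hz signal.
drifting = [2 * math.pi * 47.0 * x for x in t]

plv_locked = phase_locking_value(gamma_a, gamma_b)     # close to 1
plv_drifting = phase_locking_value(gamma_a, drifting)  # close to 0
```

With real recordings the phases would first have to be estimated, for instance by band-pass filtering, which this sketch deliberately leaves out.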

#### Concatenation

The central proposal of the model pursued here is that the interaction of brain rhythms yields linguistic computation. Lower frequencies such as the α range are known to synchronize distant cortical regions; procedures which may represent the substrates of linguistic cross-modular transactions (Kinzler and Spelke, 2007). More precisely, I will assume that the α band embeds γ rhythms generated cross-cortically, yielding a form of inter-modular conceptual combination, the electrophysiological equivalent of concatenation. The assemblies implicated by the γ range may have been influenced by the extended neocortical myelinisation discussed above, with direct effects on the network of information stored across such regions. This is consistent with recent claims that α is responsible for the binding of visuo-spatial features (Roux and Uhlhaas, 2014) and is deployed in the service of determining successful lexical decisions (Strauss et al., 2015). I will further assume that the items concatenated are also initially "lexicalized" by α-embedded cell assemblies oscillating at the γ range within supragranular layers of the default-mode network (Raichle et al., 2001).

#### Transfer

Linguists take concatenation to occur cyclically (Chomsky, 2008), and so I will additionally assume that this Spell-Out/Transfer process is realized through embedding the above γ rhythms inside the θ band, which finds its source in the hippocampus. I will adopt the claim of Theofanopoulou and Boeckx (Forthcoming b) that γ must be decoupled from the α band through the activity of the thalamic reticular nucleus for γ-θ embedding to take place. Both types of Transfer operations (Spell-Out to the sensorimotor interface, SM, and Interpret to the conceptual-intentional interface, CI) will be subsumed under this approach, which at a minimum involves this desynchronization of α-generated structures and consequent θ-synchronization. Though the thalamic reticular nucleus is here identified as a core component of desynchronization, other regions may also be involved. Due to its role in γ-θ embedding in auditory processing (Nosarti et al., 2004), the posterior corpus callosum is also likely to be heavily involved in Transfer operations.

#### Labeling

Along with concatenation and transfer, there is also labeling. Two major observations have been made about this operation: (i) It is unique to humans (Murphy, 2015a); (ii) It is based on principles of minimal computation (Chomsky, 2015a). Labeling is also monotonic in that once a set has been labeled (as a verb or determiner phrase, for instance) its identity is sustained when embedded inside another set. Since labeling must take place at the point of transfer to the interfaces (to prevent a structure being a Verb Phrase at CI but a different phrase at SM), labeling must be seen as a core syntactic operation (Murphy, 2015b; Piattelli-Palmarini and Vitiello, 2015), and not emerging epiphenomenally at the interfaces, despite it having a less central role than unconstrained "Merge" (concatenation) which operates independently from either CI or SM.

As Boeckx and Theofanopoulou (2015) note, labeling was not formulated at a fine enough level in Murphy (2015a) to avoid the granularity mismatch problem. In order to correct for this, I will define labeling as the attribution to a concatenated set some categorical specification created from the Labeling Assembly, which is composed of aspects of (i) general cognitive constraints, (ii) the CI system, (iii) the cognome and (iv) the precursor lexicon (pLEX). The final of these four constituents is taken to be the set of flat and atomic "root" structures (Boeckx, 2014a), from which morphology constructs internally hierarchical words (Nóbrega and Miyagawa, 2015). When John is concatenated with ran, the labeling algorithm produces a Verb Phrase, not a Noun Phrase (see Adger, 2013; Narita, 2014a and Murphy, 2015a for further algorithmic details). This covers the basic outline of labeling, but in order to achieve a finer level of granularity it will be necessary to descend to the dynomic level, and ultimately (in the final section) the cellular level.

In dynomic terms, I will take labeling to be the slowing down of γ to β followed by β-α coupling, involving a basal ganglia-thalamic-cortical loop (see Cannon et al., 2014 for the rhythmogenesis of β in the basal ganglia). This would disinhibit the thalamic medio-dorsal nucleus via the β band. This frequency coupling arises from a relationship between oscillations which form a hierarchy such that the speed of the slower rhythm controls the power of the faster rhythm. Due to its involvement in phrasal processing, I will assume that the δ band may be involved in the later stages of this process. The role of the thalamocortical network as a slow rhythm generator, and hence a single dynamic and functional unit of brain oscillations, has been recently supported by Crunelli et al.'s (2015) review of the EEG literature. Accumulating evidence suggests that β holds objects, whereas γ merely generates them (Martin and Ravel, 2014). Dean et al. (2012) also show how β is an excellent candidate for comparing old and new information from distinct modalities due to its wider temporal windows; that is, it would compare phase heads (old information) with late-merged non-phasal elements like complements (new information), likely drawing on different conceptual representations and hence different "core knowledge systems" and brain regions (Spelke, 2010). Related both to Balari and Lorenzo's (2013) claim that the basal ganglia is the center of their "Central Computational Complex" and Jouen et al.'s (2013) findings that this structure is implicated in acquiring the serial response order of a sequence, Theofanopoulou and Boeckx (Forthcoming b) propose that this region holds one of the γ -supported items before slowing it down to the β frequency as a consequence of the conduction delays resulting from the surrounding neural regions. Thus the β band accomplishes the role of labels, a claim supported by findings that β activity maintains existing cognitive states (Engel and Fries, 2010). 
More broadly, the basal ganglia and the striatum are implicated in sequencing and chunking, with striatal structures operating at the β range (Leventhal et al., 2012). The core position occupied by the basal ganglia in this labeling model also fits well with imaging studies which have revealed the region's involvement in "syntactic complexity," specifically the processing of type-identity intervention of matching labels, being activated in a recent fMRI study when a noun phrase similar to the dependency head in a long-distance dependency intervenes in the dependency (Santi et al., 2015). Basal ganglia nuclei in humans are also around twice as large as would be predicted for a primate of our size (Schoenemann, 2012), and since humans do not appear to have substantially more sophisticated movements than apes, this increase may well have supported higher cognitive capacities like labeling.
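The kind of frequency coupling invoked here, in which a slower rhythm governs the power of a faster one, can be illustrated with a toy measure. The sketch below uses a normalized mean-vector-length index as one possible coupling measure. It is not a claim about how the studies cited above measured coupling. The slow-rhythm phase and the fast-rhythm amplitude envelope are synthesized directly, and the 10 Hz frequency, the modulation depth, and the function name are assumptions.

```python
import cmath
import math


def coupling_strength(slow_phase, fast_amplitude):
    """Normalized mean vector length: |mean(A * exp(i * phi))| / mean(A)."""
    n = len(slow_phase)
    vector = sum(a * cmath.exp(1j * p) for a, p in zip(fast_amplitude, slow_phase)) / n
    return abs(vector) / (sum(fast_amplitude) / n)


# Hypothetical slow rhythm: 10 Hz phase over one second at 1 kHz (ten full cycles).
fs = 1000
phase = [2 * math.pi * 10.0 * i / fs for i in range(fs)]

# Fast-rhythm envelope whose size depends on the slow phase (coupled case) ...
depth = 0.8
coupled_envelope = [1.0 + depth * math.cos(p) for p in phase]
# ... and an envelope that ignores the slow phase (uncoupled case).
flat_envelope = [1.0] * fs

coupled = coupling_strength(phase, coupled_envelope)   # depth / 2 for this envelope
uncoupled = coupling_strength(phase, flat_envelope)    # close to 0
```

For this particular cosine-modulated envelope the index equals half the modulation depth, so it grows as the slow rhythm exerts more control over the fast rhythm's amplitude.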

#### Formal Considerations

Introducing new formalisms will permit a clearer explication of dynamic cognomics. Although they appear similar, what follows will have no direct bearing on, and should not be considered an extension of, standard set-theoretic notational conventions relating to such things as functional application.

First, we can notate γ-θ embedding as {θ(γ)}, with γ being embedded inside θ rhythms. If it is known how many γ cycles are to be embedded (for instance, 7), this can be notated as {θ(γ 7)}. We can notate the decoupling process required to transfer concatenated structures as γ(•)α, where γ is decoupled from the α band. Frequency coupling can correspondingly be notated as γ•α. The decreasing of γ to β can be represented as γ<→β, where "→" refers to a state change. Post-phrasal syntactic reanalysis and wrap-up effects can be represented with ψ. Finally, the (hypothetical) cell assemblies responsible for particular lexical features, such as the [+singular] feature of man, can be represented as ζ[man(+singular)]. If it is known in which regions (cytoarchitectonic or otherwise) such assemblies are located, this can be represented as, for instance, ζ:BA44[man(+singular)], while the rhythm band can be additionally represented as ζ[man]:γ.

We are now in a position to write a simple derivation. Take the sentence The man is called John. This can be represented in familiar syntactic terms as a Tense Phrase, ignoring superfluous details (e.g., morphological operations): [TP [DP The man] [T [T is] [VP called John]]]. In the interests of clarity, I will put aside precise categorical concerns and denote labeled phrases with "L," with multi-phrasal labels being italicized. Even though sentences are parsed in a left-right fashion, generative linguistics holds that syntactic derivations proceed right-left. In order to deal with this perennial psycholinguistic problem, I suggest that structures are concatenated, labeled and transferred as and when they are heard, read or otherwise perceived, and after every lexical unit a "look back" procedure is triggered to reanalyse the labels and features of each structure, denoted here by ψ (see Chesi, 2015 for a comprehensive left-right derivational proposal). In psycholinguistic terms, this may account for certain wrap-up effects which occur when subjects reach the final word of a sentence during online processing (Field, 2004). This approach is also consistent with the "one-system" contention of Lewis and Phillips (2015) that grammatical theories and language processing models describe the same cognitive system, as evidenced by the fact that grammar-parser misalignments only seem to occur as a consequence of limitations in domain-general systems such as memory access and control mechanisms. It follows that "online and offline representations are the product of a single structure-building system (the grammar) that is embedded in a general cognitive architecture, and misalignments between online ("fast") and offline ("slow") responses reflect the ways in which linguistic computations can fail to reflect the ideal performance of that system" (Lewis and Phillips, 2015, p. 39).
This one-system hypothesis also proposes that the grammar goes through a series of structure destruction and rebuilding operations as new words are encountered; a process which aligns well with the rhythmicity of the present model and the effects of ψ.

The derivation will proceed as follows. The is generated by distributed γ activity in the supragranular cell assemblies responsible for its long-term storage, ζ[the]. This rhythm would be embedded within α activity before being transferred to the interfaces through being decoupled from α and newly embedded within hippocampal θ activity. ζ[man], operating at the γ range, would then be embedded within α before being transferred. The two representations would then be labeled a Determiner Phrase at the Labeling Assembly, which I will identify as the circuits connecting the thalamus, basal ganglia, prefrontal cortex, and anterior temporal regions. To achieve labeling, the embedded cycles would be slowed to the β range (γ<→β) before being coupled to α (β•α). The labeled phrase [the man] would be maintained in memory via the β rhythm. The subsequent material [is called John] would then be added in a similar fashion:

ζ[the]:γ → {α(ζ[the]:γ)} → α(•)ζ[the]:γ → {θ(ζ[the]:γ)}

ζ[man]:γ → {α(ζ[man]:γ)} → α(•)ζ[man]:γ → {θ(ζ[man]:γ)}

{θ(ζ[the]:γ)(ζ[man]:γ)}

ψ

γ<→β

α•((ζ[the]:β)(ζ[man]:β)) → α•(ζ[Lthe man]:β)

ζ[is]:γ → {α(ζ[is]:γ)} → α(•)ζ[is]:γ → {θ(ζ[is]:γ)}

γ<→β

α•((ζ[Lthe man]:β)(ζ[is]:β)) → α•(ζ[L[Lthe man][Lis]]:β)

ψ

ζ[called]:γ → {α(ζ[called]:γ)} → α(•)ζ[called]:γ → {θ(ζ[called]:γ)}

γ<→β

α•((ζ[L[Lthe man][Lis]]:β)(ζ[called]:γ)) → α•(ζ[L[Lthe man][L[Lis][Lcalled]]]:β)

ψ

ζ[john]:γ → {α(ζ[john]:γ)} → α(•)ζ[john]:γ → {θ(ζ[john]:γ)}

γ<→β

α•((ζ[L[Lthe man][L[Lis][Lcalled]]]:β)(ζ[john]:γ)) → α•(ζ[L[Lthe man][L[Lis][Lcalled john]]]:β)

ψ
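To make the bookkeeping of this notation concrete, the per-word steps of the derivation can be generated mechanically. The following sketch only renders strings in the notation introduced above (with spacing normalized). It makes no claim about neural implementation, and the helper names and the nested-tuple encoding of labeled phrases are assumptions made for illustration.

```python
def zeta(item, band):
    """Render ζ[item]:band, a cell assembly for `item` oscillating in `band`."""
    return f"ζ[{item}]:{band}"


def embed(slow, inner):
    """Render {slow(inner)}: the inner rhythm embedded inside the slower rhythm."""
    return "{" + slow + "(" + inner + ")}"


def decouple(slow, inner):
    """Render slow(•)inner: the inner rhythm decoupled from the slower rhythm."""
    return slow + "(•)" + inner


def lexical_chain(word):
    """Lexicalize at γ, embed in α, decouple from α, then embed in θ (Transfer)."""
    g = zeta(word, "γ")
    return " → ".join([g, embed("α", g), decouple("α", g), embed("θ", g)])


def render_label(node):
    """Render a nested tuple of words as bracketed L-labeled phrases."""
    if isinstance(node, str):
        return node
    parts = [render_label(child) for child in node]
    # Words inside a single phrase are space-separated; sub-phrases are juxtaposed.
    separator = " " if all(isinstance(child, str) for child in node) else ""
    return "[L" + separator.join(parts) + "]"


def labeled_assembly(tree):
    """Render the α-coupled β assembly that holds a labeled structure."""
    return "α•(ζ" + render_label(tree) + ":β)"


for word in ["the", "man", "is", "called", "john"]:
    print(lexical_chain(word))

# Hypothetical encoding of the final labeled structure of the derivation.
final_tree = (("the", "man"), (("is",), ("called", "john")))
print(labeled_assembly(final_tree))
```

The last line reproduces the final step of the derivation, which shows that the notation is regular enough to be generated from a bracketed structure.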

Notice that, as with Computational Ethology (Murphy, 2015a) and recent syntactic proposals (Hornstein, 2009; Adger, 2013), labeling is here placed at the center of the dynome's linguistic operations. As a result, call the above cognome-dynome hypothesis the Basic Label model. What remains to be added to the derivation by empirical investigation are the factors of time-frequency domain and the anatomical regions of cellular assemblies (e.g., "embed γ of region r within α of region s for time t"). All elements in the derivation, then, are created as simple γ assemblies, and only some (namely, labeled phase heads) become more complex β assemblies; consider the difference between adverbs like nearly and verbs like ran. The rhythmic division of complexity which follows from this is supported by Honkanen et al. (2014), who demonstrated that simple objects represented in visual working memory employ the γ band, while more complex objects are represented by the β band. This oscillatory procedure also matches the generative view that phase heads have a longer derivational life than non-phase heads (Boeckx, 2014a; Narita, 2014a). The role attributed here to γ assemblies additionally finds some support in Bastiaansen and Hagoort's (2015) EEG study of semantic unification, which detected larger γ-band power for semantically coherent than semantically incongruent sentences. Larger β-band power was also found for syntactically correct sentences relative to ungrammatical sentences, lending support to the hypothesized labeling power assigned to β in the present model.

#### Some Empirical Consequences of Dynamic Cognomics

Among many other forms of imaging and behavioral data, neuroimaging studies should be used as a guide for dynamic cognomic investigations. With respect to linguistic computation, the left anterior temporal lobe has been implicated in basic combinatorics (concatenation) and phrasal construction (labeling) (Bemis and Pylkkänen, 2013; Westerlund and Pylkkänen, 2014), while the posterior middle temporal gyrus is involved in lexical access (lexicalization) and ambiguity resolution (Turken and Dronkers, 2011). Given the present rhythmic perspective on linguistic computation, the much-discussed fronto-temporal language network would consequently be purely an output system of the above operations, not a core syntax region. Friederici (2012) holds that distinct regions of the left inferior frontal gyrus are responsible for "different" types of syntax, arguing, for instance, that the dorsal stream is only implicated in embedded structures or structures deviating from normal ordering. Yet, as the above model makes clear, the basic combinatorics are universal across syntactic structures, whether simple or complex; set-formation is still set-formation whether it is found in a small clause or a Shakespearean sonnet.

While relatively little is known about how oscillations relate to cognitive operations, significant advances could come from direct empirical investigations teasing apart γ and β from other rhythms, demonstrating a correlation with a syntactic manipulation (and perhaps a dissociation with another operation which could be linked to slower rhythms and working memory or attention processes; see Lakatos et al., 2008 for the role of oscillations in attention). Due to its high temporal and spatial resolution and signal-to-noise ratio, electrocorticography is also highly applicable to testing the Basic Label model, having been used to investigate speech production (Bouchard and Chang, 2014), language comprehension (Cervenka et al., 2011), and having been flexibly deployed both in humans and animals. In addition to the cartographic studies above, paradigms such as that in Ohta et al. (2013), which differentiate the neural correlates of concatenation and search/agreement operations, could be employed. Despite having noted the limitations of cartographic studies, an area of ongoing neurolinguistic research is the spatial scales of brain rhythms. It could be explored, for instance, whether ongoing oscillations and generic computations share the same neuronal generators. Emerging technologies to experimentally test and refine the Basic Label model include high-density electrode recordings and optogenetic tools (Chow et al., 2010; Viventi et al., 2011), along with the more traditional EEG and MEG devices. Bemis and Pylkkänen (2013) showed that between 200 and 300 ms after the presentation of a word which can be combined with a previous item, the left anterior temporal lobe is activated, implicating this region in semantic composition. This would consequently be a good estimate of when oscillation studies might detect labeling effects to arise, given the role of labels in semantic composition (Hornstein and Pietroski, 2009; Murphy, 2015b). 
At the most general level of lexical comprehension, EEG and MEG studies would also predictably find coherent oscillatory activation of large neuronal assemblies when processing words relative to processing pseudowords, as Pulvermüller et al. (1994) found. A level of cortical entrainment would also be predicted for non-syllabic, phrasal, and sentential structures during the auditory presentation of simple stimuli; structures which are not part of any speech stream but are rather internally constructed by the comprehender, and whose rhythmic generators would likely align closely with the regions implicated in the Basic Label model.

Neural potentials have typically been analyzed through frequency, time-frequency, and wavelet representations (Kaiser, 2010). Independent component analysis (ICA) has also been used successfully in estimating the sources of neural systems given multiple recording locations, since these systems generate independent and continuous activity and combine linearly and instantaneously (Hyvärinen and Oja, 2000). However, spatial ICA does not allow the interpretation of time-varying patterns, and in the case of EEG it also does not produce a model of "phasic events" of rhythmic activity.

Given these shortcomings, I would like to introduce the possibility of analysing a continuous signal as a linear combination of reoccurring waveforms. This is achieved by combining overcomplete representations with adaptive signal models. If the goal is to extract waveforms from a single continuous channel, then it follows that we should adopt a generative model which summates impulse responses, being a multiple input, single output (MISO) model. Principe and Brockmeier (2015, p. 15) term this a phasic event model. This proceeds in two steps: learning a set of waveforms occurring repeatedly throughout a signal, and estimating an atomic decomposition of a signal in terms of timing, amplitude, and waveform index (see **Figure 1** for an example). The major advantage of this over other models is that the phasic event analysis learns the reoccurring waveform shape and allows the pinpointing of the amplitude and timing of phasic events. The model consequently captures the transitory nature of neural events.
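The second of these steps, the atomic decomposition, can be illustrated with a small sketch. It assumes the waveforms are already known, so the learning step is skipped. It also uses greedy matching pursuit as a stand-in estimator, since the estimation algorithm is not specified here. The waveform shapes, the atom values, and the function names are all hypothetical.

```python
def synthesize(length, waveforms, atoms):
    """MISO-style generative model: sum scaled copies of waveforms at given onsets."""
    signal = [0.0] * length
    for onset, amplitude, index in atoms:
        for k, w in enumerate(waveforms[index]):
            if onset + k < length:
                signal[onset + k] += amplitude * w
    return signal


def matching_pursuit(signal, waveforms, n_atoms):
    """Greedily pick the best-matching (onset, amplitude, waveform index) and subtract it."""
    residual = list(signal)
    energies = [sum(w * w for w in wf) for wf in waveforms]
    atoms = []
    for _ in range(n_atoms):
        best = None  # (normalized score, onset, amplitude, waveform index)
        for index, wf in enumerate(waveforms):
            for onset in range(len(residual) - len(wf) + 1):
                dot = sum(residual[onset + k] * w for k, w in enumerate(wf))
                score = abs(dot) / energies[index] ** 0.5
                if best is None or score > best[0]:
                    best = (score, onset, dot / energies[index], index)
        _, onset, amplitude, index = best
        for k, w in enumerate(waveforms[index]):
            residual[onset + k] -= amplitude * w
        atoms.append((onset, amplitude, index))
    return sorted(atoms), residual


# Two hypothetical waveform shapes: a smooth bump and a sharp biphasic event.
waveforms = [[1.0, 2.0, 1.0], [1.0, -1.0]]
true_atoms = [(2, 3.0, 0), (10, -2.0, 1), (15, 1.5, 0)]

signal = synthesize(20, waveforms, true_atoms)
recovered, residual = matching_pursuit(signal, waveforms, n_atoms=3)
print(recovered)
```

In this noise-free toy case with non-overlapping events, the recovered atoms coincide with the ones used to build the signal. Real recordings would require a learned dictionary and a stopping criterion rather than a fixed number of atoms.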

**Figure 1** | Example decomposition under the phasic event model. Significant amplitude atoms (timing, amplitude, and waveform index) appear at the bottom as colored bars. Color intensity corresponds to amplitude (from Principe and Brockmeier, 2015, p. 15).

Given the structure of the Basic Label model and the division of EEG patterns and local field potentials into rhythms (α, β, etc.) and phasic events (sharp waves, β and γ ripples, etc.), I think an approximate correlational (in the sense of Embick and Poeppel, 2015) division between computations and representations can be established between, respectively, phasic events (carried out in and between the cell assemblies of particular regions) and rhythms (necessarily localized at such regions).

Although oscillations are likely not all that is needed to provide a solution to the problem of linguistic computation, they nevertheless appear to be a vital part of the answer. Aside from language-centered obstacles, comparative dynamic cognomics will also face the notable challenge of the variation in oscillation presence across species, with the reasons for much rhythmic variation still unknown. For now, the Basic Label model satisfies the cognome-dynome operational level, but we would ultimately want to satisfy the connectome and other lower levels. As a result, the next section will expand on the bare electrophysiological details outlined above leading to the broadening of multidisciplinary concerns and perspectives.

# Biophysical Directions

To adequately explore the neurochemical and biophysical details of the Basic Label model, it is useful to introduce a distinction between minimal and maximal degrees of explanation:

(1) a. Minimal degree of explanation (MinDE): The use of brain dynamics to explain why the cognome performs the operations it does, and not some other imaginable operations.

b. Maximal degree of explanation (MaxDE): The use of brain dynamics in addition to causally relatable accounts of neurochemistry and its underlying biophysics to explain why the cognome performs the operations it does, and not some other imaginable operations.

Note that MinDE has minimal requirements, whereas MaxDE has no stipulated limits, embracing the full range and plurality of the natural sciences. Neuroimaging studies, for instance, do not even reach the level of MinDE, whereas a purely rhythmic approach to the dynome of the kind found in Theofanopoulou and Boeckx (Forthcoming b) satisfies MinDE without reaching the neurochemical and biophysical precision of MaxDE. Kopell et al. (2014, p. 1319) stress that connectome-dynome linking hypotheses need to be supplemented with "the biological details that relate this connectivity more directly to function." This is where I will attempt to depart from analyses which remain at the levels of the dynome and cognome (e.g., Sporns, 2013). For instance, Theofanopoulou and Boeckx (Forthcoming b) only refer in passing to basic interneuron classes, and their model lacks any serious neurobiological details. As Allen and Monyer (2015, p. 85) comment, "when considering interneurons, it would be important to investigate the role they play in the reactivation of cell ensembles occurring during sharp wave/ripples."

Mechanistic ventures beyond the dynome are, I think, in the proper spirit of Turing's (other) thesis regarding morphogenesis, which was concerned not just with a description of an organism's forms (similar to the computational level of modern linguistics) but also with a proto-evo-devo theory of the cellular mechanisms which give rise to such forms (Turing, 1952, see also Maini, 2004). As Kopell et al. (2014, p. 1324) note, "an immersion in the physiology supporting temporal dynamics suggests mechanisms that would not be obvious if one were thinking abstractly about computation and rhythms"; a statement which carries urgent lessons for theoretical linguistics and neuroimaging.

Contrary to much of Koch's (1999) ambitious work, the following section will argue that the divide between biophysics and computation is in fact incommensurable, and that a different biolinguistic strategy will be required to resolve the granularity mismatch problem. This approach will use the Basic Label model alongside neurochemistry as tools to construct a neurobiologically feasible cognome, free of the technical baggage—though not the methodological naturalism (Chomsky, 2000; Collins, 2015)—of minimalist syntax and its lexico-centrism and "featuritis" (Boeckx, 2014a).

# Feeble Currents and Cognomic Substrates

Though much interdisciplinary work remains to be carried out, dynamic cognomics has the potential to progress neurolinguistics beyond the situation described by Szathmáry in 1996: "Linguistics is at the stage at which genetics found itself immediately after Mendel. There are rules (of sentence production), but we do not yet know what mechanisms neural networks are responsible for each rule" (1996, p. 764). So far, I have only presented a model of how to embed the cognome within the dynome, but it is also vital to ground the dynome within the connectome and microlevel analyses, in turn addressing Szathmáry's concern.

It has been shown that neuronal populations can synchronously discharge due to an internal or external event, and additionally as a result of dynamic interactions between reciprocally coupled networks, which serve to "tag the responses of neurones that need to be related to one another," as König (1994, p. 31) put it in his seminal assessment of neural oscillations. This synchronous activity further tends to be oscillatory in nature (Liu et al., 2010). Oscillations have also been linked to neurochemistry (Muthukumaraswamy et al., 2009), as discussed below. While oscillatory electrical activity in cell assemblies has been observed since the 1920s beginning with Berger's (1929) ground-breaking work, inspired by the Liverpool surgeon Caton's (1875) studies of the "feeble currents" generated by rabbit and monkey brains, its role in cognitive capacities has been intensively explored only since the new millennium (Jensen et al., 2002; Ossandón et al., 2011), largely down to theoretical, technological, and optogenetic advances. Updating Caton's imagery, McCormick et al. (2015, p. 133) summarize that brain rhythms are generated through "the interaction of stereotyped patterns of connectivity together with intrinsic membrane and synaptic properties."

At the most common level of investigation, time-locked frequency analysis can decompose an EEG signal and identify changes in oscillations. But the widespread use of non-invasive and high-temporal-resolution MEG, and recent advances in its source localization power (Wipf et al., 2010), have led to enhanced understanding of the spatiotemporal dynamics of oscillations and how they operate within neural networks. Recent work has begun to deliver an increasingly precise account of how, for instance, different classes of GABAergic interneurons in the hippocampus coordinate activity giving rise to network oscillations (Allen and Monyer, 2015), strengthening dynome-connectome correspondences. GABA<sub>B</sub> receptors also perform time integration of cell assemblies (classically defined as a set of neurons exhibiting stronger within-group connectivity than with other connected neurons; Hebb, 1949) from the subsecond to second scale (Deisz and Prince, 1989), a vital function in computing conceptual and linguistic information representations.

Going beyond this level of analysis will require mapping rhythms to the numerous interneuron classes, which are defined based on cell body location, expression of marker proteins, axonal arborization, and other properties (Whittington and Traub, 2003; Klausberger et al., 2005; Somogyi and Klausberger, 2005). Korotkova et al. (2010) attempted to reach such a goal by showing how the removal of NMDA receptors in parvalbumin-expressing (PV) interneurons reduced the power of θ oscillations in the CA1 hippocampal region, while also reducing the modulation of γ power by θ oscillations. PV interneurons preferentially synapse onto the cell bodies and proximal dendrites of pyramidal cells, whereas somatostatin-expressing (SOM) interneurons preferentially synapse onto their distal dendrites (Royer et al., 2012). The silencing of PV interneurons, but not SOM interneurons, altered the θ phase precession in the brains of mice running on a treadmill belt in the experiments conducted by Royer and colleagues, suggesting that PV interneurons are well suited to control the firing phase of principal neurons during θ oscillations, permitting the extension of a causal chain from cognome to dynome to a specific part of the connectome.

It should be noted, however, that PV and SOM expression is common to numerous hippocampal interneuron classes, and so further optogenetic work is needed in order to establish the role of individual interneuron classes in oscillation generation. Fruitful prospects for such work can be found in recent advances in juxtacellular recordings, permitting the monitoring of a single interneuron in vivo. To take a relevant case, Lapray et al. (2012) discovered that PV basket cells—providing inhibition to the pyramidal cell body and proximal dendrite—fire preferentially at the descending θ phase (findings reproduced by Varga et al., 2012), while ivy cells—providing inhibitory currents onto pyramidal cell dendrites—fire preferentially during ascension and at the trough. These studies reveal that during a single θ cycle the inhibitory power onto distinct pyramidal cell sectors varies systematically (see also Brandon et al., 2014).

Viewing cell assemblies as the fundamental unit of computation rather than single neurons can by now be justified in that assemblies can tolerate noise by not being redirected in their trajectory, unlike single or small clusters of neurons (which would also be affected by spike transmission failures), intensifying the justification for placing such assemblies at the center of the Basic Label model. Given the information chunking and feature merging roles attributed to γ cycles, Buzsáki suggests that episodes of γ oscillations, which contain strings of cell assemblies, "may be regarded as a neural word" (2010, p. 365); that is, a discrete unit of information. If induced γ is also responsible for constructing coherent conceptual objects by synchronizing neural discharges binding together distant brain regions, as proposed by Tallon-Baudry and Bertrand (1999), then oscillations may also be responsible for complex semantic phenomena like copredication, through which a single object or event can be conceptualized via simultaneously concatenated yet contradictory properties, e.g., *The newspaper I held this morning has gone bust* or *Lunch was delicious but took forever* (see Murphy, Forthcoming). Brain rhythms would consequently play a crucial role in constructing what Aristotle termed the "place of forms."

Topics in electrophysiology should also direct the concerns of those investigating the brain dynamics of linguistic computation. Certain areas of recent research appear to be more commensurable with elementary computational operations than others. For instance, transfer of charges across membranes of all brain structures leads to a current giving rise to an extracellular field, which in turn influences the membranes. The transmembrane voltage (V<sub>m</sub>) is defined as the difference between the intracellular (V<sub>i</sub>) and extracellular voltage (V<sub>e</sub>) at a time t and location x: V<sub>m</sub>(x,t) = V<sub>i</sub>(x,t) − V<sub>e</sub>(x,t). A topic of contemporary debate is whether this endogenous field with its spatiotemporal V<sub>e</sub>-fluctuations changes neuronal functions through ephaptic coupling (see Jefferys, 1995 for an overview). This process amounts to a feedback mechanism through which the neural structures producing a given field are in turn affected by it, yielding a self-generated cyclic loop. In terms of range, ephaptic coupling influences structures ranging from synapses to discrete neurons to neural networks.

At the microscale, a linear relationship is seen between a chemical synaptic current I<sub>syn</sub> and V<sub>m</sub>, which can be described as I<sub>syn</sub>(t) = g<sub>syn</sub>(t)(V<sub>m</sub>(t) − E<sub>rev</sub>), where g<sub>syn</sub> is the synaptic conductance and E<sub>rev</sub> is the reversal potential. Following the above self-generated model, V<sub>e</sub> changes alter synaptic currents. In addition, ephaptic coupling of V<sub>m</sub> to electric fields influences spiking due to its effect on active cell conductances (Anastassiou et al., 2011). The explanatory force of ephaptic coupling becomes clearer with parallel plate whole-slice stimulation, which has shown that emergent properties of networks are more sensitive to electric fields than discrete neurons (Deans et al., 2007). As noted by Anastassiou and Koch (2015), the entrainment of spiking to field strengths as minimal as 0.5 mV/mm suggests that ephaptic entrainment to endogenous fields contributes to brain rhythms. Stronger ephaptic feedback also occurs after slower (<8 Hz) waves such as θ and δ compared to faster γ waves, suggesting that the nonsynaptic electrical signals seen in ephaptic coupling contribute to neural computation.
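The two relations just given can be made concrete with a minimal numerical sketch. The function names and all numerical values below are illustrative assumptions rather than figures from the studies cited; the point is only that, with the intracellular potential held fixed, a shift in the extracellular potential changes the transmembrane voltage and hence the synaptic current.

```python
def transmembrane_voltage(v_i, v_e):
    """V_m = V_i - V_e at one time and location (all in mV)."""
    return v_i - v_e


def synaptic_current(g_syn, v_m, e_rev):
    """I_syn = g_syn * (V_m - E_rev); nS times mV gives pA."""
    return g_syn * (v_m - e_rev)


# Illustrative values only (assumptions, not taken from the cited studies).
v_i = -65.0   # intracellular potential, mV
e_rev = 0.0   # reversal potential of an excitatory synapse, mV
g_syn = 1.0   # synaptic conductance, nS

for v_e in (0.0, 0.5):  # extracellular potential, mV
    v_m = transmembrane_voltage(v_i, v_e)
    i_syn = synaptic_current(g_syn, v_m, e_rev)
    print(f"V_e = {v_e:+.1f} mV -> V_m = {v_m:.1f} mV, I_syn = {i_syn:.1f} pA")
```

The sketch treats the conductance as constant and ignores active conductances, so it captures only the linear synaptic term, not the spiking effects discussed above.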

As with ephaptic coupling, I would additionally like to propose cross-frequency coupling (CFC) as a core component of computation, as discussed above. It has been suggested that this generic operation coordinates spatiotemporal neural dynamics (Canolty and Knight, 2010; Lisman and Jensen, 2013), resolving a long-standing problem over how neural activity is synchronized. With larger neuronal populations oscillating at lower frequencies and smaller populations doing so at higher frequencies, CFC would enable their synchronization. In particular, it has been shown that via "phase-amplitude" CFC the phase of the lower frequency modulates the amplitude of the higher frequency component, a process claimed to be involved in information transfer for faculties such as memory (Tort et al., 2009, though see Aru et al., 2015 for current limitations of phase-amplitude modeling).
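To illustrate what phase-amplitude coupling means quantitatively, the following sketch computes a mean-vector-length style index, the magnitude of the average of A·exp(iφ), where φ is the phase of the slow rhythm and A the amplitude envelope of the fast rhythm. The signals are synthetic and every parameter is an assumption; with recorded data, φ and A would first have to be estimated (typically by band-pass filtering and a Hilbert transform), a step omitted here.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """|mean(A * exp(i*phi))|: large when amplitude depends on phase."""
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)


# Synthetic example: 1 s sampled at 1 kHz with a 6 Hz slow rhythm (assumed values).
fs, f_slow = 1000, 6.0
phases = [2 * math.pi * f_slow * n / fs for n in range(fs)]

# Fast-rhythm envelope that peaks at slow phase 0 (coupled) versus a flat envelope.
coupled = [1.0 + 0.8 * math.cos(p) for p in phases]
uncoupled = [1.0 for _ in phases]

mvl_coupled = mean_vector_length(phases, coupled)
mvl_uncoupled = mean_vector_length(phases, uncoupled)
print(mvl_coupled, mvl_uncoupled)
```

The index is close to zero when the fast amplitude is independent of the slow phase and grows with the depth of modulation. Indices of this kind are sensitive to the analysis choices flagged in the limitations noted above.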

But while much is known about the biophysical substrates of individual frequency components, the cellular mechanisms behind frequency interactions (the origin of linguistic computation in the Basic Label model) remain opaque. Initial research leading to such an account has already been mentioned: Recall Korotkova et al. (2010) and their findings regarding hippocampal θ•γ coupling and its reliance on NMDA receptor-mediated PV interneuron excitation (see also Bi and Poo, 1998; Tort et al., 2008). Using laminar electrodes to measure activity in monkey visual cortex, Spaak et al. (2012) found that α phase in infragranular layers modulates γ amplitude in supragranular layers (see also Friston, 2008), similar to how thalamic nuclei oscillating at the α band synchronize distant cortical regions oscillating at higher frequencies. As Aru et al. (2015) note, the most elegant theory to account for these findings is that periodic membrane potential fluctuations generate low frequency oscillations which subsequently gate the incidence of higher frequency activity in a phase-specific fashion. From a functional perspective, the above nested γ cycles could act as multiplexing mechanisms (Buzsáki, 2006, p. 356) for sustaining working memory representations by sending multiple representations as a single complex message to be recovered and "unpacked" downstream (see Hyafil et al., 2015 for empirical support, and Baddeley et al., 2014 for a review of working memory mechanisms), precisely as is seen in labeling and phasal transfer.

At a more general level, the cognome must operate within certain fundamental constraints on neuronal dynamics, such as the free-energy principle (following seminal insights from Friston, 2010) through which the homeostatic brain minimizes the dispersion (entropy) of interoceptive and exteroceptive states. If entropy is the average of "surprise" over time, then the brain will choose appropriate sensations to minimize surprise, and in so doing "the brain is implicitly maximizing the evidence for its own existence" (Bastos et al., 2012, p. 702); a notion not too far removed from Vaas's assessment that the brain is "a self-referential, closed system, a functional reality emulator that constructs the world, rather than reconstruct it" (Vaas, 2001, p. 88). This form of "predictive coding" conforms to the free-energy principle and the image of the brain as a constructive organ, assembling and inferring linguistic representations. Studies of chaotic itinerancy (Tsuda, 2013, 2015), many-body physics and thermodynamics (Vitiello, 2015) may also prove indispensable in describing the high-dimensional state space of cortical activity implicated in computation (see the essays collected in Ohira and Uzawa, 2015 for discussion).
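The relation between surprise and entropy invoked above can be illustrated with a small sketch. It assumes a discrete set of sensory states whose long-run frequencies match their probabilities, so that the average of surprise over time equals its expected value; the two distributions are hypothetical.

```python
import math


def surprise(p):
    """Surprise (self-information) of an outcome with probability p, in nats."""
    return -math.log(p)


def entropy(probabilities):
    """Expected surprise of a discrete distribution, in nats."""
    return sum(p * surprise(p) for p in probabilities if p > 0)


# Hypothetical distributions over four sensory states.
dispersed = [0.25, 0.25, 0.25, 0.25]
concentrated = [0.97, 0.01, 0.01, 0.01]

print(entropy(dispersed))     # equals ln 4
print(entropy(concentrated))  # much smaller than ln 4
```

A system that keeps itself within a narrow range of states (the concentrated case) has low entropy in this sense. The sketch shows only this definitional link and is not an implementation of free-energy minimization.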

An emerging consensus regarding the validity of the communication-through-coherence (CTC) hypothesis lends further impetus to the claim that rhythms bring about linguistic computation (Bastos et al., 2015). CTC claims that rhythmic synchronization, especially in the β and γ bands, modulates the efficacy of anatomical connections, and that oscillations are necessary for long-distance assembly formation (König et al., 1995; Fries et al., 2008). CTC can be complemented with recent developments in the understanding of the functional role brain rhythms play, with assembly formation being the core operation at the connectome level necessary to establish the kinds of cross-modular representational structures seen in natural language (Lopes-dos-Santos et al., 2011). γ band activity, for instance, has been associated with numerous cognitive functions such as memory and selective attention (see **Figure 2** for examples of connectome-cognition links). With γ bands arising from an interplay of inhibition (produced by GABAergic neurons) and excitation (produced by glutamatergic neurons), Bosman et al. (2014) propose that these bands have their origin in basic functional motifs conferring an advantage for low-level system processing and multiple cognitive functions (see also Bartos et al., 2007; Buzsáki and Wang, 2012).

The broad functionality of γ makes it an ideal candidate, along with the thalamus (discussed below), for being the conductor of language's cross-modularity. The role of GBS in visual feature integration (Bosman et al., 2009), for instance, makes it a prime candidate for carrying out the forms of conceptual assimilation seen in any number of semantic phenomena. If linguistic computations are in fact responsible for this cross-modularity, then language can perhaps be more closely aligned to dominant descriptions of consciousness and working memory (Dehaene et al., 2014), even if we are forced to remain "virtually mute" (Chomsky, 1998, p. 440) about the nature of experiential content (Strawson, 2008, 2010).

In addition, GBS has been shown to support certain low-level functions in the hippocampus which may be vital to particular cognitive functions attributed to this region, such as memory encoding and retrieval (Bosman et al., 2014). As mentioned, the hippocampus is the site of γ•θ coupling in that multiple γ waves are typically embedded within a single θ cycle (Bragin et al., 1995). Along with the standard phase locking operation through which higher waves occur at stable phases in cycles of lower waves (Belluscio et al., 2012), this allows spike coordination and may consequently be partly responsible for low-level dynome operations like phase coding (see **Figure 3**). As Lisman and Jensen (2013) review, the dual γ and θ oscillations form a code for representing multiple items in an ordered way. Since each θ cycle contains four to eight nested γ cycles, different forms of spatial information (such as a series of events from short-term memory, constituting an "episode") can be represented and sequentially coordinated within a given cycle. This may in turn constrain the number of lexical items or features able to be transferred in a given phase. Through the coding scheme discussed by Lisman and Jensen, the cell assembly that fires during a given γ cycle forms a topographic pattern representing a particular item from memory. If this oscillatory mechanism is also responsible for syntactic computation, this would lend weight to the strong connection drawn in Murphy (2015b) between syntactic phases and episodic memory. The number of γ cycles able to be embedded within a θ cycle may also be the reason why working memory is limited to its classic constraint of 7 ± 2 (Kamiński et al., 2011). Roux and Uhlhaas (2014) make the related claim that oscillatory activity assures the maintenance of working memory information. This explanation has precisely the kind of granularity linguists should seek in order to capture syntactic operations like labeling, which involves storing conceptual roots in memory.

In brief, and returning to issues outlined above, if intrinsic coupling across cortical oscillations is responsible for the hierarchical combination of computations at the syllabic and phonemic levels, "restoring the natural arrangement of phonemes within syllables" (Hyafil et al., 2015), then this leads to the possibility that hierarchical syntactic computations result from similar mechanisms.
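The capacity argument above rests on simple arithmetic: the number of complete γ cycles that fit inside one θ cycle is the ratio of the two frequencies, rounded down. The sketch below makes this explicit. The function name and the particular frequencies are illustrative assumptions, since band boundaries differ across studies.

```python
def nested_cycles(f_slow_hz, f_fast_hz):
    """Number of complete fast cycles that fit inside one slow cycle."""
    return int(f_fast_hz // f_slow_hz)


# Assumed, illustrative frequencies (Hz); band boundaries vary across studies.
for f_theta, f_gamma in [(8.0, 40.0), (6.0, 40.0), (5.0, 40.0)]:
    n = nested_cycles(f_theta, f_gamma)
    print(f"theta {f_theta} Hz, gamma {f_gamma} Hz -> {n} gamma cycles per theta cycle")
```

With a 40 Hz γ rhythm, θ frequencies between 5 and 8 Hz give between eight and five nested cycles, which falls within the four-to-eight range cited above. Other frequency pairings would shift these counts.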

These operations are all conserved from early in mammalian evolution, with the above interplay between excitation and inhibition being found in crustaceans (Nusbaum and Beenhakker, 2002) and major phyla dating back 350 million years (Katz and Harris-Warrick, 1999). Bosman et al. (2014) draw on such considerations in claiming that the evolutionary acquisition of this excitation-inhibition interplay led to the selection of these γ waves as a principal element of computation. If this GBS mechanism was a "direct, inevitable consequence of early circuitry organization" (Bosman et al., 2014, p. 1994), then it may be that it is an exaptation (being co-opted) in that it was later afforded a functional role in systems of memory and learning (see also Gould and Vrba, 1982). Further, top-down neocortical processes implicated in particular higher cognitive faculties like working memory (Buschman and Miller, 2007) and free-choice reach (Pesaran et al., 2008) also appear to be carried by interareal synchrony in the β rhythm (Bressler and Richter, 2015), increasing the electrophysiological validity of the functional roles attributed to this wave above.

#### Cognomic Constraints and their Neurobiological Realizability

In the same way that γ oscillations "arise simultaneously and inevitably with inhibitory-excitatory interplay, and are neither an epiphenomenon nor a separate cause of the functionality beyond the underlying circuits" (Bosman et al., 2014, p. 1995), I would like to suggest that "linguistic" "computations" (which, as discussed, are neither purely linguistic nor thoroughly computational) are to be seen as identical to the operations of the connectome, which can be described in electrophysiological terms at the dynome level and in still more abstract terms at the cognome level, much as heat and energy can be reduced to thermodynamics. While I hope to have shown that distinct oscillatory phases segregate discrete units of information (visual, olfactory, semantic, etc.), there remains the possibility that they also serve computations spanning multiple oscillatory cycles. Oscillatory phases may be the means through which different lexical features (e.g., ϕ, tense) are processed or time-locked with other features, leading to agreement relations, the resolution of filler-gap dependencies, feature inheritance/copying, and other familiar syntactic operations. Multiple β or θ cycles could, for instance, employ dynomic operations like "cycle skipping" (Brandon et al., 2013) to control which cell assemblies are activated upon subsequent cycles to trigger different aspects of lexical and conceptual representations.

These remarks cover some basic computations, but what of their constraints? Consider Wurmbrand's (2014) Merge Condition, stated below:

(2) Merge Condition:

Merge α and β if α can value a feature of β.

This condition ensures that set-formation via concatenation is licensed only under Agree, requiring also feature valuation. Leaving aside further details and the possibility that Merge applies freely, the scientist concerned with establishing linking hypotheses between linguistics and neuroscience is faced here with a number of challenges but also some surprising possibilities. For example, the cell assemblies implicated via cycle skipping in the features of α and β may undergo phase-locking, leading to oscillatory synchronization of two discrete units of information. When this occurs, feature valuation takes place and the derivation can converge. If this process is barred in virtue of rhythmic coupling restrictions and the limits of assembly synchronization, feature valuation, and hence concatenation, does not take place. If the distribution of unvalued features, [uF], also contributes to the demarcation of phases (Narita, 2014a), then the dynamics of feature valuation would likely align closely with the present Basic Label account of Transfer, since valuation, Agree and other copy-forming operations such as Internal Merge apply as a fundamental part of Transfer. Notice that this model at once implies specific neurobiological limitations, in that the hypothetical coupling responsible for feature valuation should occur after the cross-cortical {α(γ)} embedding proposed to be the substrate of set-formation. This leads to clear, causally addressable empirical predictions, to be investigated in future research.
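As a purely formal illustration of the condition in (2), the sketch below treats a syntactic object as a bundle of valued and unvalued features and licenses set-formation only when α carries a valued counterpart of some unvalued feature of β. The class, the function names, and the toy lexical items are hypothetical simplifications introduced here. They model only the logical shape of the condition, not Wurmbrand's full system and not the oscillatory mechanism proposed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SyntacticObject:
    name: str
    valued: frozenset = field(default_factory=frozenset)    # features carried with a value
    unvalued: frozenset = field(default_factory=frozenset)  # [uF] features awaiting a value


def can_value(alpha, beta):
    """True if alpha carries a valued counterpart of some unvalued feature of beta."""
    return bool(alpha.valued & beta.unvalued)


def merge(alpha, beta):
    """Return the set {alpha, beta} if the Merge Condition is met, else None."""
    if can_value(alpha, beta):
        return frozenset({alpha, beta})
    return None


# Hypothetical toy lexicon.
dp = SyntacticObject("DP", valued=frozenset({"phi"}))
t = SyntacticObject("T", valued=frozenset({"tense"}), unvalued=frozenset({"phi"}))
adv = SyntacticObject("Adv")

print(merge(dp, t) is not None)   # True: DP can value the unvalued phi of T
print(merge(adv, t) is not None)  # False: Adv values nothing on T
```

On the account sketched in the text, the call to `can_value` would correspond to successful phase-locking between the assemblies encoding the relevant features, and a failed check to a coupling restriction.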

As a secondary concern, I will assume that feature valuation, feature inheritance, and Agree are all cases of a more generalized Search operation, which forms relations between identical feature complexes (Ohta et al., 2013; Kato et al., 2014). Kato et al. (2014) even go as far as claiming that Search is in turn just an instance of Merge, and that the human language faculty may reduce to pure Merge. The Basic Label model and Kato et al. consequently yield different predictions about the dynome. From here, the matter is purely empirical, but these subtleties in distinct cognome-dynome hypotheses are yet to be investigated and are potentially of substantial interest to dynamic cognomics.

We are now in a position to outline a concrete research program. The first phase of dynamic cognomics will involve the above ongoing research into translating or reconstructing the operations of syntax into oscillation terms. The second phase should center on translating the constraints of syntax, such as those concerning agreement, movement, and anti-locality. For instance, Richards's (2010) Distinctness Condition, prohibiting the presence of multiple lexical units of the same label within a single phase complement, may be the consequence of how many distinct rhythms it is possible to couple in specific actions (Boeckx, 2013). These <sup>∗</sup>XX-like structures (e.g., structures containing multiple phase-internal nouns such as <sup>∗</sup>John Mary ate apples) may be ungrammatical because of the oscillatory patterns local language regions can sustain. These constraints may form the backdrop of what Narita (2014a, p. 26) identifies as a core aspect of minimal computation, the "Minimal Workspace" through which the construction and transferring of syntactic structures takes place. To put it more concretely, language-external systems (interfaces) may only be able to sustain a single rhythm from the γ and β bands due to the small size of localized regions, and would hence be incapable of interpreting multiple category-identical elements in a single cycle. The phase/nonphase rhythm of syntactic computation would thus arise from the limits of oscillatory sustainability, and the connection between syntactic phases and oscillatory phases becomes more than purely orthographic: [C [T v[V D/n [N]]]] emerges from [β [γ β[γ β [γ]]]] given the labeling role attributed to β above, which in turn explains <sup>∗</sup>XX violations. Narita's (2014b) <sup>∗</sup>{t,t} constraint, which prohibits the transfer of syntactic objects whose two members are both traces/copies of movement, also strikes me as amenable to a similar, if not identical explanation. Objects of the {t,t} kind cannot be labeled, as in (3), and are hence illicit (Moro, 2006, p. 15):

(3) <sup>∗</sup>[which picture of the wall]<sub>i</sub> do you think that [the cause of the riot]<sub>j</sub> was {t<sub>i</sub>, t<sub>j</sub>}?

What is needed is consequently a re-conceptualization of language as not only a system of thought, planning and interpretation, but also a system of oscillatory and electrophysiological information synchronization. The computational constraints explored by Wurmbrand and others can direct inquiry into the possibilities of dynomic operations, although this process may require further elaboration of the nature of the role of oscillations in cognition.

#### Globularity and Cortico-centrism

Recent developments in systems neuroscience have identified large scale distributed brain networks, typically explored through fMRI and MEG (Brookes et al., 2012). Data from fMRI suggests that the implication of a functionally specific set of neurons in any given computation is assisted by a backdrop of large-scale neural assembly inter-communication. These networks are composed of sub-networks with correlating and anti-correlating patterns, leading to a situation in which a single large-scale network may operate through overlapping but distinct neural sub-networks. **Figure 4** highlights the major operations at the level of the cognome, dynome, and connectome, along with general laws influencing such operations.

As the cognome-dynome-connectome linking hypotheses expand, it is important not to ignore the fundamental role of the genome. Consider briefly the genes RUNX2, the DLX suite and the BMP family, involved in skull and brain development (Perdomo-Sabogal et al., 2014). In ongoing research, Boeckx and Benítez-Burraco (2014a,b; Benítez-Burraco and Boeckx, 2015) hypothesize that a modification in this gene network gave rise to a more "globular" head shape (relative to Neanderthals/Denisovans; Bruner, 2004; Gunz et al., 2012; Theofanopoulou, 2015), approaching a level of sphericity unseen in our closest ancestors, and to the consequent re-wiring of cortical and sub-cortical structures, permitting the construction of the forms of cross-modular representations well established in psychological, philosophical, and semantic theories of concepts (Spelke, 2010; Pietroski, Forthcoming). Globularity may also have contributed, as some have suggested, to an increase in wiring efficiency across the brain (Chklovskii et al., 2002). It is of outstanding interest for biolinguistics and dynamic cognomics that functional links of this kind are beginning to be drawn between genes and their cellular consequences for the human cognitive phenotype.

An evaluation of these observations can also be made alongside a consideration of what Piattelli-Palmarini and Uriagereka (2008) see as the optimizing role language has in building syntactic and phonological structures, which proceeds via minimal search and related principles of computational efficiency (Larson, 2015). This minimalist perspective leads to a separation of optimality from language's proposed "function" of mapping structures to the interfaces, since similar optimizing principles are found elsewhere in the natural world, leading Piattelli-Palmarini and Uriagereka (2008, p. 209) to "suspect that the process behind the abstract form follow[s] from physicochemical invariants." But lacking a theory of brain dynamics, the authors are unable to ground these general proposals within any neurobiological framework. I suggest that the microcellular level and the dynome, operating within some general physical laws of neural organization such as free-energy, can provide a potential substrate of such "physico-chemical invariants." The only human-unique aspect of the model pursued here, then, is the context in which the conserved and universal rhythms discussed above perform their operations of coupling and decoupling; namely, a globular brain case, which would have led to a decrease in the types of "spatial inequalities" (Salami et al., 2003) between cortical and subcortical regions which would prohibit long-distance coupling. This would imply that the numerous centuries-long approaches to human-uniqueness, ranging from philosophy to medicine, have approached the matter from the wrong perspective. Instead of asking what it is about humans which allows us to form complex systems of symbolic interpretation, we should instead ask what it is about other animals which prohibits them from doing so.

Globularity may also have led to the expansion of the neo-cortex and the pulvinar, spurred on by the reduction of the large Neanderthal visual system (Pearce et al., 2013). As Benítez-Burraco and Boeckx (2015) point out, cross-modular concepts likely employ thalamic nuclei such as the pulvinar and the medio-dorsal nucleus, not least because of the thalamus's role in modulating fronto-parietal activity, regulating cortical oscillations (Saalmann et al., 2012) and enhancing the rhythmic range of different frequency bands (Singer, 2013). Controlling rhythmic behavior is also a function attributed to RUNX2 (Reale et al., 2013, see also van der Lely and Pinker, 2014 for genetic discussion relating to phonological computations). A literature review leads Theofanopoulou and Boeckx (Forthcoming a,b) to claim that the thalamus is the brain region which tunes the oscillations of other subcortical structures (see also Boeckx, 2014b). The importance of the thalamus for higher cognition was also speculated in work by Campion and Elliot-Smith (1934), rejecting the dominant cortico-centrism and suggesting that cortico-thalamic impulse circulation was responsible for "thought."

Relatedly, due to the few protein differences between humans and chimpanzees, the individuating computational factors may be attributed to cis- and trans-regulatory genes (Somel et al., 2013). Hominid-unique features which may have led to the higher mental faculties of humans include novel neuronal cell types and the duplication of developmental proteins such as SRGAP2, leading to unique dendritic spine density and form (Geschwind and Rakic, 2013). Synaptic and dendritic maturation also occurs in humans for a considerably longer time than in non-humans (Bianchi et al., 2013). If we also consider the conclusions of Harris's review of cortical computation in mammals and birds, that the "human cortex appears to contain the same cell types, and their patterns of wiring and gene expression appear basically similar to well-studied model systems" (2015, p. 3184), the importance of subcortical investigations into linguistic computation becomes even clearer. While subcortical structures have often been derided as the "reptilian brain," responsible for only primitive drives, far removed from the neocortex's higher echelons of thought, the perspective of dynamic cognomics re-situates subcortical regions like the thalamus and the basal ganglia into the core areas responsible for linguistic phrase structure building (see also Johnson and Knight, 2015 for evidence that the thalamus plays a key role in neocortical oscillations involved in memory processes).

Summarizing these findings, it appears that the developed interneurons and dendritic spine density and form discussed by Geschwind and Rakic (2013) fortified long-distance assembly connections and, in turn, the mechanisms of ephaptic coupling, CFC and other neuronal processes (operating within the confines of the CTC hypothesis) necessary for the rhythmic interactions claimed above to be the source of computations like labeling and cyclic transfer. The targeting of the perisomatic region of pyramidal neurons by inhibitory interneurons in particular leads to the formation of γ rhythms and their concomitant properties of conceptual assimilation. Though many intervening neurochemical processes need to be accounted for and explained, it seems that such processes, along with novel V<sub>e</sub>-fluctuations, are the reason why we find the cyclic short-term memory storage capacities seen in labeling. Updating Darwin's claim of "He who understands baboon would do more toward metaphysics than Locke," we can conclude that he who understands brain rhythms would do more toward biolinguistics than Lenneberg.

#### References


## Acknowledgments

My thanks go primarily to Cedric Boeckx for his persistent encouragement, and to Karl Friston, Alec Marantz, Jyrki Tuomainen, and Wing-Yee Chow for helpful discussion. This work was supported by an Economic and Social Research Council scholarship (1474910).


in electrophysiological brain networks. Neuroimage 63, 1918–1930. doi: 10.1016/j.neuroimage.2012.08.012


Crunelli, V., David, F., Lõrincz, M. L., and Hughes, S. W. (2015). The thalamocortical network as a single slow wave-generating unit. Curr. Opin. Neurobiol. 31, 72–80. doi: 10.1016/j.conb.2014.09.001

Dawkins, R. (2006). Climbing Mount Improbable. Oxford: Oxford University Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Murphy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The "Globularization Hypothesis" of the Language-ready Brain as a Developmental Frame for Prosodic Bootstrapping Theories of Language Acquisition

#### Aritz Irurtzun\*

CNRS, IKER (UMR 5478), Bayonne, France

In recent research, Boeckx and Benítez-Burraco (2014a,b) have advanced the hypothesis that our species-specific language-ready brain should be understood as the outcome of developmental changes that occurred in our species after the split from Neanderthals-Denisovans, which resulted in a more globular braincase configuration in comparison to our closest relatives, who had elongated endocasts. According to these authors, the development of a globular brain is an essential ingredient for the language faculty and in particular, it is the centrality occupied by the thalamus in a globular brain that allows its modulatory or regulatory role, essential for syntactico-semantic computations. Their hypothesis is that the syntactico-semantic capacities arise in humans as a consequence of a process of globularization, which significantly takes place postnatally (cf. Neubauer et al., 2010). In this paper, I show that Boeckx and Benítez-Burraco's hypothesis makes an interesting developmental prediction regarding the path of language acquisition: it teases apart the onset of phonological acquisition and the onset of syntactic acquisition (the latter starting significantly later, after globularization). I argue that this hypothesis provides a developmental rationale for the prosodic bootstrapping hypothesis of language acquisition (cf. i.a. Gleitman and Wanner, 1982; Mehler et al., 1988, et seq.; Gervain and Werker, 2013), which claims that prosodic cues are employed for syntactic parsing. The literature converges in the observation that a large number of such prosodic cues (in particular, rhythmic cues) are already acquired before the completion of the globularization phase, which paves the way for the premises of the prosodic bootstrapping hypothesis, allowing babies to have a rich knowledge of the prosody of their target language before they can start parsing the primary linguistic data syntactically.

Keywords: globularization, prosodic bootstrapping, language development, language acquisition, postnatal development

# 1. INTRODUCTION: THE GLOBULARIZATION HYPOTHESIS

According to a recent article in this journal by Boeckx and Benítez-Burraco (2014a), "much work in neurolinguistics has unintentionally emphasized the externalization component of language, since morpho-phonology is perhaps the easiest aspect to single out linguistic tasks, even if the word "syntax" was said to be the target of the relevant works. In so doing, work on neuroimaging biased the results toward the Broca-Wernicke model, and all too quickly attributed "syntax" to Broca's area." In contrast, Boeckx and Benítez-Burraco (2014a,b) have advanced the hypothesis that our species-specific language-ready brain (a brain which is suited for acquiring natural languages) should be understood as the outcome of developmental changes that occurred in our species after the split from Neanderthals-Denisovans, and which resulted in a more globular braincase configuration in comparison to our closest relatives, who had elongated endocasts. They propose that even if factors like brain lateralization are important, the development of a globular brain is at the outset of our language faculty, and in particular, it is the centrality of the thalamus in a globular brain that allows its modulatory or regulatory role, essential for syntactico-semantic computations (cf. i.a. Wahl et al., 2008).

#### Edited by:

Cedric Boeckx, Catalan Institute for Research and Advanced Studies (ICREA) and Universitat de Barcelona, Spain

#### Reviewed by:

Silvia Martínez Ferreiro, University of Barcelona and University of Groningen, Spain

Monika T. Molnar, Basque Center on Cognition, Brain and Language, Spain

> \*Correspondence: Aritz Irurtzun aritz.irurtzun@iker.cnrs.fr

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 30 July 2015 Accepted: 10 November 2015 Published: 09 December 2015

#### Citation:

Irurtzun A (2015) The "Globularization Hypothesis" of the Language-ready Brain as a Developmental Frame for Prosodic Bootstrapping Theories of Language Acquisition. Front. Psychol. 6:1817. doi: 10.3389/fpsyg.2015.01817

Importantly, globularization takes place postnatally (cf. Lieberman et al., 2002; Neubauer et al., 2010; Gunz et al., 2012); therefore, according to Boeckx and Benítez-Burraco's hypothesis, even if innately specified, the combinatorial syntactic ability of humans is not innate stricto sensu, but the outcome of a postnatal developmental phase. After globularization a new brain configuration is obtained whereby the thalamus occupies a central position (and a central role). As Boeckx and Benítez-Burraco (2014a) put it, "a proper characterization of the language-ready brain that does not recognize a central role to the thalamus is unlikely to be correct, for it would miss the critical engagement of the thalamus in regulating cortical activity. By providing low-frequency oscillations capable of embedding higher-frequency oscillations across distant brain regions, the thalamus provides the crucial regulation needed to form the sort of meaningful cross-modular conceptual structures that are characteristic of language."

In this paper I discuss the developmental dimension of Boeckx and Benítez-Burraco's hypothesis and relate it to one of the most prominent hypotheses in early language acquisition studies: the "prosodic bootstrapping hypothesis" (cf. i.a. Mehler et al., 1988; Christophe et al., 2003; Bernard and Gervain, 2012; Gervain and Werker, 2013; Langus and Nespor, 2013).

The argument is presented as follows: Section 2 gives a brief overview of the development of the ability for phonological discrimination in human infants (an essential prerequisite for the identification and acquisition of the prosodic patterns of the target language). Section 3 presents the basic tenets of the prosodic bootstrapping hypothesis (a hypothesis that claims that language-acquiring children use prosody as a guide for inferring the basic syntactic pattern of their target language). Last, Section 4 argues for a natural combination of the globularization hypothesis and the prosodic bootstrapping hypothesis. In a nutshell, the globularization hypothesis proposes that the ability for syntactic computations is not innate, but that it rather develops after the postnatal globularization phase. In contrast, as the studies of early phonological development show, babies a few months old already have a rich knowledge of the prosodic patterns of their target language. Therefore, and in line with the prosodic bootstrapping hypothesis, language-acquiring babies will be able to use their early-acquired prosodic knowledge as a guiding principle for inferring the syntax of their target language the moment the syntactic ability develops.

# 2. EARLY PHONOLOGICAL ABILITIES IN HUMAN INFANTS

Some essential ingredients for language acquisition are already present at birth. Since the seventies, a wide range of studies have shown infants' capacity for very early phonological parsing and discrimination (for an overview, see Panneton and Newman, 2012; Vihman, 2014). For instance, Eimas et al. (1971) found that infants as young as 1 month of age are able to discriminate the voice onset time (VOT) of synthetic stop consonants like /p/-/b/ in a manner approximating adult categorical perception. Similar results were obtained by Moffitt (1971) with 20- to 24-week-old infants in a study attesting the discrimination in place of articulation of different consonants. Given the limited exposure of newborn infants to speech, these results suggest that this categorical perception in a linguistic mode may be innate, and in the general debate on language nature vs. nurture, scholars such as J. Mehler have built upon these early capacities to argue for innatist "selectionist" theories of language learning whereby the baby "learns" her target language by "forgetting" others (cf. i.a. Mehler, 1974; Mehler and Dupoux, 1990).

What is more, the earliest fetal responses to auditory stimuli are reported at 19 weeks of gestation, long before the development of the fetal ear is complete (cf. Hepper and Shahidullah, 1994; Abdala and Keefe, 2012), and effects of very early auditory categorization have also been found in utero. A number of experiments have shown that third-trimester fetuses' auditory experience can influence their postnatal auditory preferences: newborns tend to quiet in response to their mother's voice (and touch; Marx and Nagy, 2015), and they also tend to prefer their mother's voice over other female voices (cf. Mehler et al., 1978; DeCasper and Fifer, 1980; Fifer, 1981; Querleu et al., 1984; Spence and DeCasper, 1987; Ockleford et al., 1988; Hepper et al., 1993)<sup>1</sup>. Besides, as reported by DeCasper and Spence (1986), newborns also tend to be more reinforced by the audition of speech passages they heard in utero over passages they were not exposed to (and they can remember them for over a month; Granier-Deferre et al., 2011). Finally, Mampe et al. (2009) provide evidence that even the cry melodies of newborns of around 3 days of age are shaped in accordance with the intonational contours of the language they were exposed to prenatally (German vs. French). All this constitutes evidence of a very early ability for the discrimination and memorization of complex sounds in newborn infants.

<sup>1</sup>However, infants do not seem to show preferences for their father's voice (cf. Ward and Cooper, 1999).

Regarding prosody and rhythm, there is ample evidence that newborns also have the ability for discrimination between inputs varying in different suprasegmental properties (see i.a. Morse, 1972; Olsho et al., 1982; Mehler et al., 1988; Karzon and Nicholas, 1989; Shahidullah and Hepper, 1994; Sansavini et al., 1997; Nazzi et al., 1998a,b; Carral et al., 2005). In particular, studies like Nazzi et al. (1998a) show that babies can discriminate between languages pertaining to different rhythmic classes [such as Japanese (mora-timed) or British English (stress-timed)] when exposed to low-pass filtered speech signals. This type of experiment shows that babies discriminate between rhythmic classes because low-pass filtering the speech signal (e.g., under 400 Hz) dramatically degrades its phonemic content (i.e., the vast majority of its formant structure is removed) while retaining its rhythmic structure. Other studies employing this type of low-pass filtered stimuli (like Byers-Heinlein et al., 2010) provide evidence that language discrimination in neonates who were surrounded by a bilingual environment prenatally is robust, and that this language preference reflects previous listening experience (see also Gervain and Werker, 2013; Molnar et al., 2014a,b). Besides, other types of studies show that at 4.5 months babies tend to listen longer to speech samples that include prosodic pauses corresponding to syntactic units, as opposed to speech samples with pauses that break syntactic units (cf. Jusczyk and Nelson, 1996 and references therein).
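The logic of the low-pass filtering manipulation can be illustrated with a toy signal. The following is a minimal sketch, not the procedure of any of the cited studies: the sampling rate, the two component frequencies (150 Hz as a voice-like component, 2500 Hz as a formant-like component), the 4 Hz rhythm and the simple two-stage first-order filter are all assumptions chosen for illustration.

```python
import math

FS = 8000        # sampling rate in Hz (assumed for this sketch)
CUTOFF = 400.0   # low-pass cutoff in Hz, the example value given in the text

def one_pole_lowpass(x, cutoff, fs):
    """First-order IIR low-pass filter (a deliberately simple stand-in)."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * cutoff)
    alpha = dt / (rc + dt)
    y, prev = [], 0.0
    for sample in x:
        prev = prev + alpha * (sample - prev)
        y.append(prev)
    return y

def power_at(x, freq, fs):
    """Signal power at one frequency, by projection onto cosine and sine."""
    n = len(x)
    c = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    s = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return (c * c + s * s) / (n * n)

# Toy "speech": a 150 Hz and a 2500 Hz component (hypothetical values),
# both carrying the same 4 Hz syllable-like amplitude rhythm.
n = FS  # one second of signal
envelope = [0.5 * (1 + math.sin(2 * math.pi * 4 * i / FS)) for i in range(n)]
signal = [envelope[i] * (math.sin(2 * math.pi * 150 * i / FS)
                         + math.sin(2 * math.pi * 2500 * i / FS))
          for i in range(n)]

# Two filter stages in a row give a somewhat steeper roll-off.
filtered = one_pole_lowpass(one_pole_lowpass(signal, CUTOFF, FS), CUTOFF, FS)

low_kept = power_at(filtered, 150, FS) / power_at(signal, 150, FS)
high_kept = power_at(filtered, 2500, FS) / power_at(signal, 2500, FS)
print(f"fraction of power kept at 150 Hz:  {low_kept:.2f}")
print(f"fraction of power kept at 2500 Hz: {high_kept:.4f}")
```

In this sketch most of the power of the low component, and with it the slow amplitude rhythm it carries, survives filtering, whereas the high component is almost entirely removed. Experimental stimuli are prepared with much sharper filters; the example only shows why such filtering can separate rhythmic from segmental information.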

All these results are to be framed in the fast (pre- and post-natal) development of the basic structures for sound discrimination in humans, whereby infants already possess an adult-like dedicated neuronal network for phonological processing at 3 months of age (cf. Dehaene-Lambertz and Baillet, 1998 as well as Peña et al., 2003; Dehaene-Lambertz et al., 2006 or Dubois et al., 2015)<sup>2</sup>.

Interestingly, however, early acoustic discrimination is not a human-specific ability, for it is also observed in a wide variety of other animals like guinea pigs (Vince, 1979), sheep (Vince et al., 1982) or chinchillas (Kuhl and Miller, 1975), and discrimination of languages of different prosodic types is also mastered by different species like cotton-top tamarins (cf. Ramus et al., 2000), or rats (cf. Toro et al., 2003).

Nonetheless, there is a growing amount of literature arguing that human infants go well beyond mere acoustic pattern recognition and learning; evidence suggests that babies use the prosodic patterns of their target language in order to infer the syntactic structure underneath them in a sort of "reverse engineering." That is, part of the knowledge obtained by babies from categorical perception is restricted to a specific area (say, learning of the vowel space or the consonantal inventory of the target language), but a subpart of the learning obtained with this innate capacity is more consequential: learning the tunes of the surrounding language helps the child make informed guesses about the syntactic structure of the language [this is so because the prosodic pattern of a language partially reflects the syntactic structure underneath (cf. Gussenhoven, 2004; Truckenbrodt, 2007; Selkirk, 2011)]. This is in a nutshell the proposal of the "prosodic bootstrapping hypothesis."

# 3. THE "PROSODIC BOOTSTRAPPING" HYPOTHESIS

Prosody and rhythm are essential ingredients of natural language (cf. i.a. Brentari, 1999; Gussenhoven, 2004; Pfau and Quer, 2010) and a growing number of scholars argue that they have a close connection with other aspects of human cognition like musical aesthetics and computation, or our mathematical abilities (cf. i.a. Rebuschat et al., 2011; Arbib, 2013; Asano and Boeckx, 2015)<sup>3</sup>. Current literature converges in the idea that beyond the early ability for prosodic discrimination, "prosodic segmentation abilities emerge crosslinguistically some time around 8 months" (Nazzi et al., 2006, p. 296).

The rationale under the rapid acquisition of prosody could be seen as emerging from the combination of the following two factors:


<sup>2</sup> See Telkemeyer et al. (2009) for a near-infrared spectroscopy and EEG study showing that a right hemispheric lateralization for slow acoustic modulations (characteristic of prosodic features) is present at birth (see also Telkemeyer et al., 2011).

<sup>3</sup> See also Wang et al. (2015) for a proposal about the brain areas in charge of the human-specific ability for the integration of multiple features in abstract pattern learning.

<sup>4</sup>A reviewer rightly notes that these studies are performed with adult subjects, not with babies. Indeed, we need infant studies to assess the development of rhythmic brain oscillations (see Musacchia et al., 2015, for some of the first reported evoked oscillations analyses in infants).

Actually, a recent study of the spectral amplitude modulation in the speech rhythm shows that (Australian English) infant-directed speech "exaggerates" the synchronization between syllable-rate modulations and stress-rate modulations, whereas adult-directed speech is dominated by syllable-time modulations. This is taken as evidence showing that infant-directed speech "is primarily stress-dominant, which could "tune" the infant brain toward stress-based speech segmentation—an adaptive strategy for boot-strapping early language learning" (Leong et al., 2014). Such infant-directed speech hyperarticulations are taken to help the child acquire the relevant phonological distinctions in her language (Kuhl et al., 1997; Cristia, 2013), a knowledge that is mostly acquired during the first year of life (cf. i.a. Kuhl et al., 1992; Werker and Tees, 2002)<sup>5</sup>. Incidentally, recent studies have shown that the characteristic "hyperarticulation" of infant-directed speech may be restricted to the suprasegmental levels of prosody: mothers hyperarticulate their infant-directed speech in prosodic aspects, but phonemic contrasts can be hypoarticulated, i.e., in segmental aspects mothers may "speak less clearly to infants than to adults" (cf. Martin et al., 2014).

Now, several authors have proposed that the early acquired rhythmic properties of languages are not idiosyncratic and isolated properties, but rather that they are strongly correlated with the particular syntactic properties of the particular languages (i.e., that there are correlations between rhythmic patterns and syntactic patterns in that languages tend to cluster with the same rhythmic and syntactic properties, forming linguistic typologies). Furthermore, the explanation of this typological clustering is proposed to derive from the fact that rhythmic patterns serve to bootstrap or catalyze the acquisition of the specific syntactic patterns of each language (cf. i.a. Mehler et al., 1988; Christophe et al., 2003; Bernard and Gervain, 2012; Gervain and Werker, 2013; Langus and Nespor, 2013)<sup>6</sup>. In particular, a number of authors have proposed that the relative order between heads and their complements strongly correlates with the rhythmic type of the language. A number of experiments have shown that languages whose correlates of phrasal accent are increases in duration and intensity tend to be head-initial (with a Verb-Object word order) whereas languages that realize stress through a combination of higher pitch and intensity (and possibly also duration) tend to be head-final (with an Object-Verb word order)<sup>7</sup>. This generalization is known as the "iambic-trochaic law" (cf. i.a. Hayes, 1995; Nespor et al., 2008; Shukla and Nespor, 2010), which is taken to be a basic law of grouping based on general auditory perception (i.e., not specific to language) that states that units (language or music) that differ in intensity tend to be grouped as constituents in which the most prominent element comes first, and units that differ in duration are grouped as constituents in which the most prominent element comes last<sup>8</sup>. As Nespor et al. (2008) put it, "if [their] proposal is on the right track, one of the basic properties of syntax can be learned through a general mechanism of perception."
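The grouping principle stated by the iambic-trochaic law can be made concrete with a small sketch. The function below is a hypothetical illustration, not a model taken from the cited literature: it assumes an alternating sequence of units, each with a single numeric cue value, and treats a unit as prominent when its value exceeds that of its neighbor.

```python
def is_prominent(values, i):
    """A unit counts as prominent if it exceeds its neighbor (assumed criterion)."""
    neighbor = values[i - 1] if i > 0 else values[i + 1]
    return values[i] > neighbor

def group_units(labels, values, cue):
    """Group an alternating sequence following the iambic-trochaic law.

    cue == "intensity": the prominent unit opens a group (trochaic grouping).
    cue == "duration":  the prominent unit closes a group (iambic grouping).
    """
    if cue not in ("intensity", "duration"):
        raise ValueError("cue must be 'intensity' or 'duration'")
    if len(labels) != len(values) or len(values) < 2:
        raise ValueError("need at least two units with one value each")
    groups, current = [], []
    for i, label in enumerate(labels):
        strong = is_prominent(values, i)
        if cue == "intensity" and strong and current:
            groups.append(current)   # a loud unit starts a new group
            current = []
        current.append(label)
        if cue == "duration" and strong:
            groups.append(current)   # a long unit ends the current group
            current = []
    if current:
        groups.append(current)
    return groups

units = ["a", "b", "c", "d", "e", "f"]
loud_soft = [2, 1, 2, 1, 2, 1]     # alternation in intensity (arbitrary units)
short_long = [1, 2, 1, 2, 1, 2]    # alternation in duration (arbitrary units)

trochees = group_units(units, loud_soft, "intensity")
iambs = group_units(units, short_long, "duration")
print(trochees)
print(iambs)
```

Both calls yield the pairs a-b, c-d and e-f, but for different reasons: in the intensity case the prominent unit is group-initial, while in the duration case it is group-final.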

This line of reasoning is reinforced by recent studies such as Gordon et al. (2015) suggesting that there is a correlation between rhythm perception skills and morpho-syntactic production in children with typical language development (and note also that a strong association between reading skills and meter perception and rhythm processing has been found; Flaugnacco et al., 2014; Leong and Goswami, 2014). Likewise, studies like Zumbansen et al. (2014), Leong and Goswami (2014) report the beneficial effects of both pitch and rhythm in the clinical therapy for patients with Broca's aphasia.

In the next section, I argue for the natural combination of the "globularization" and "prosodic bootstrapping" hypotheses.

# 4. SYNTHESIS: THE "GLOBULARIZATION HYPOTHESIS" AS A DEVELOPMENTAL FRAME FOR THE "PROSODIC BOOTSTRAPPING" HYPOTHESIS

Let us focus on the two main ideas that we have seen so far, which are that (i) according to the "globularization hypothesis" of Boeckx and Benítez-Burraco (2014a,b), the postnatal globularization of the brain is an essential ingredient for the development of our syntactic capacities, and that (ii) according to the "prosodic bootstrapping hypothesis" of Mehler et al. (1988), Christophe et al. (2003), Bernard and Gervain (2012), Gervain and Werker (2013), Langus and Nespor (2013) and others, children use prosody in order to infer the syntactic pattern of the language they are acquiring.

<sup>5</sup> In fact, lack of adequate acquisition of the phonology of the target language can generate disorders such as dyslexia (Paulescu et al., 2001; Goswami, 2011; Saralegui et al., 2014; see also Benítez-Burraco, 2013).

<sup>6</sup>Donegan and Stampe (1983, 2004) have even proposed a "holistic typology" based on rhythmic grounds in order to account for the polarized structural divergence of languages like Munda and Mon Khmer.

<sup>7</sup> In turn, speakers of languages with different rhythmic patterns like English vs. Japanese tend to behave differently in the way in which they group nonlinguistic stimuli (Iversen et al., 2008).

<sup>8</sup> In a recent study de la Mora et al. (2013) observed that rats group sequences based on pitch variations as trochees, but that they do not group sequences varying in duration as iambs.

The combination of these two hypotheses brings about an interesting picture regarding language acquisition: it leaves room for a delay in the acquisition of syntax with respect to prosody. If the "globularization hypothesis" is correct, syntactic capacities develop some months after birth and if the "prosodic bootstrapping hypothesis" is correct, children use prosody as a guiding principle for acquiring syntax. That is, babies may have a rich knowledge of prosody (as pure melodic patterns, unrelated to any syntactic structure) by the time they develop the capacity to start parsing syntax. Crucially, all the data discussed in Sections 2 and 3 point in that direction: after some months of pre- and post-natal experience with linguistic input, babies have a fairly good knowledge of the prosodic properties of the language(s) spoken around them, this knowledge being arguably well established by the time they develop the structures necessary for parsing syntax. Therefore, babies will be able to use all this phonological knowledge as a guiding principle to discover the syntax behind the acoustic signals. As a matter of fact, the hypothesis by Boeckx and Benítez-Burraco (2014a,b) can provide a developmental rationale for the prosodic bootstrapping hypothesis of early language acquisition. Given Boeckx and Benítez-Burraco's hypothesis, it is natural for a rich phonological knowledge to be established before the syntactic ability develops, for the necessary mechanisms for phonological acquisition are present at birth. Then, endowed with a rich prosodic knowledge, language-acquiring children will be able to use it as a bias for hypothesizing the syntactic pattern of the target language (which in a Bayesian model could take the form of an informed prior). In a nutshell, the prosodic bootstrapping hypothesis claims that beyond the observed typological correlation between prosodic and syntactic patterns, there is a causal developmental connection between them: babies use prosody to guess the syntactic pattern of their target language. My proposal is that the globularization hypothesis provides a natural developmental frame for the prosodic bootstrapping hypothesis, for it presents a relatively late syntactic development vis à vis the prosodic development.
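The remark that prosodic knowledge could act as an informed prior in a Bayesian model can be illustrated with a toy calculation. The sketch below is purely illustrative and is not a model proposed in the literature discussed here: the two-hypothesis setup (head-initial vs. head-final), the prior values, the reliability of each observation and the confidence threshold are all assumed numbers.

```python
def update(p_head_initial, observation, reliability=0.7):
    """One Bayesian update of P(head-initial) from a noisy word-order cue.

    observation: "VO" or "OV".
    reliability: assumed P("VO" | head-initial) = P("OV" | head-final).
    """
    like_initial = reliability if observation == "VO" else 1 - reliability
    like_final = 1 - reliability if observation == "VO" else reliability
    numerator = p_head_initial * like_initial
    return numerator / (numerator + (1 - p_head_initial) * like_final)

def observations_needed(prior, threshold=0.95, reliability=0.7, limit=100):
    """Number of consistent "VO" observations needed to reach the threshold."""
    p, count = prior, 0
    while p < threshold and count < limit:
        p = update(p, "VO", reliability)
        count += 1
    return count

# A learner with no prosodic bias vs. one whose prosodic knowledge already
# favors head-initial order (both prior values are hypothetical).
flat = observations_needed(prior=0.5)
informed = observations_needed(prior=0.8)
print(f"flat prior:     {flat} observations")
print(f"informed prior: {informed} observations")
```

Under these assumptions the learner with the prosodically informed prior reaches the confidence threshold with fewer syntactic observations than the learner with a flat prior, which is the sense in which early prosodic knowledge could speed up the later acquisition of syntax.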

As a last remark, it should be noted that the globularization hypothesis—besides capturing the fact that prosodic knowledge precedes syntactic knowledge—also leaves room for explaining why first language acquisition is fast, but not immediate, for not all the necessary neurocognitive machinery would be established from birth (cf. Boeckx and Benítez-Burraco, 2014a,b). Even if innately specified, some maturation is in order for a fully language-ready brain.

# FUNDING

This work was partially funded by the following agencies: the European Union's Seventh Framework Program (AThEME 613465), the Basque Government (IT769-13), and the Spanish MINECO (FFI2012-38064-C02-01, FFI2014-53675-P, FFI2013-41509-P).


specialization for speech perception and production. Neuron 56, 1127–1134. doi: 10.1016/j.neuron.2007.09.038


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, Silvia Martínez Ferreiro, and handling editor, Cedric Boeckx, declared their shared affiliation, and the handling editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2015 Irurtzun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Temporal Attention as a Scaffold for Language Development

#### *Ruth de Diego-Balaguer1,2,3\*, Anna Martinez-Alvarez2,3 and Ferran Pons3,4*

*<sup>1</sup> Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain, <sup>2</sup> Cognition and Brain Plasticity Unit, Institut d'Investigació Biomèdica de Bellvitge, Barcelona, Spain, <sup>3</sup> Department of Basic Psychology, University of Barcelona, Barcelona, Spain, <sup>4</sup> Department of Basic Psychology, Institute for Brain, Cognition and Behavior (IR3C), University of Barcelona, Barcelona, Spain*

Language is one of the most fascinating abilities that humans possess. Infants demonstrate an amazing repertoire of linguistic abilities from very early on and reach an adult-like form incredibly fast. However, language is not acquired all at once but in an incremental fashion. In this article we propose that the attentional system may be one of the sources for this developmental trajectory in language acquisition. At birth, infants are endowed with an attentional system fully driven by salient stimuli in their environment, such as prosodic information (e.g., rhythm or pitch). Early stages of language acquisition could benefit from this readily available, stimulus-driven attention to simplify the complex speech input and allow word segmentation. At later stages of development, infants are progressively able to selectively attend to specific elements while disregarding others. This attentional ability could allow them to learn distant non-adjacent rules needed for morphosyntactic acquisition. Because non-adjacent dependencies occur at distant moments in time, learning these dependencies may require correctly orienting attention in the temporal domain. Here, we gather evidence uncovering the intimate relationship between the development of attention and language. We aim to provide a novel approach to human development, bridging together temporal attention and language acquisition.

Keywords: language development, infancy, attention, temporal orienting, statistical learning, rule learning, morphosyntactic development, word segmentation

# INTRODUCTION

Speech is a complex auditory stimulation. A single word in speech can be perceived as a sequence of phonemes, as a whole word, as a stem and a suffix or as having a specific meaning, depending on the level of processing. In order to face this complexity infants do not learn all of this information at once but rather in an incremental fashion. In particular, two main linguistic milestones (word segmentation and non-adjacent rule acquisition) appear in a sequential fashion. During the first months, infants are able to segment speech into words and recognize them; however, it is not until after the first year that they are able to understand and detect the subtle changes carried by different rule transformations (Gómez and Maye, 2005; Christophe et al., 2008). As a matter of fact, brain development in general is not uniformly distributed through infancy. Different brain structures and their white matter connections do not develop homogeneously, nor do all cognitive functions develop at the same speed (Gogtay et al., 2004; Diamond, 2007). In particular, attention also shows a developmental progression with different mechanisms arising at different moments of development. *Exogenous attention*, captured by salient events in the environment, is functional much earlier than *endogenous attention*, which allows for selecting which information to process and which to ignore (Posner and Cohen, 1984). The progressive general cognitive development may be seen as affecting all functions independently. However, cognitive functions do not work in isolation. In particular, the attentional system acts as a filter to any incoming stimulation, influencing perception, and therefore may affect learning, which suggests that the development of the attention system is likely to shape the way language is processed and how it develops along with the available attention resources.

#### *Edited by:*

*Antonio Benítez-Burraco, University of Huelva, Spain*

#### *Reviewed by:*

*LouAnn Gerken, University of Arizona, USA*

*Andriy Myachykov, Northumbria University, UK*

#### *\*Correspondence:*

*Ruth de Diego-Balaguer ruth.dediego@ub.edu*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 15 September 2015 Accepted: 11 January 2016 Published: 02 February 2016*

#### *Citation:*

*de Diego-Balaguer R, Martinez-Alvarez A and Pons F (2016) Temporal Attention as a Scaffold for Language Development. Front. Psychol. 7:44. doi: 10.3389/fpsyg.2016.00044*

Extensive literature has previously reported a link between attention and different aspects of language processing. Adult speakers of different languages tend to adapt the syntactic structure of their productions as a function of their focus of attention on the visual information they describe (Myachykov et al., 2005; Ibbotson et al., 2013; Tomblin and Myachykov, 2015). Similarly, focus and topicalization are naturally used to draw attention to relevant elements in the sentence (Jackendoff, 2002). Indeed, using focus to comprehend sentences activates the left inferior parietal lobe (IPL), overlapping with the attention-orienting network used in visual attention (Kristensen et al., 2012). Even babies produce isolated words in an attempt to attract the listener's attention to their own focus of interest (Jackendoff, 2002). This behavior is closely linked to the development of joint attention present at the end of the first year of life (Bruner, 1983; Carpenter et al., 1998; Brooks and Meltzoff, 2002). From joint attention interaction infants begin to link words with objects and events (Baldwin, 1991, 1993).

This previous literature converges with our proposal on the close link between attention and language. In these studies, however, the source of the link is closely related to the communicative roots of language, given that attention is used as a tool to drive the listener's attention to the focus of attention of the speaker (Smith et al., 2011). Our proposal, in contrast, is intimately related to processing. The development of attention affects how the input is processed because it filters the input received, independently of the presence of an interlocutor and in the absence of a message to be transmitted. This is what shapes language learning from this point of view, in the same way as it shapes learning of other sources of information.

The present proposal offers an integrative approach to language acquisition, in which the powerful and dynamic interplay of exogenous and endogenous attention mechanisms allows infants to focus on different aspects of speech at different moments in development. In particular, during the initial stages of language acquisition, attention is captured by salient elements of speech, such as prosodic cues (e.g., pitch, rhythm, or pauses), because the infant perceptual system is guided by stimulus-driven attention. As months pass and endogenous attention progressively develops, this more flexible mechanism can be used to learn non-adjacent linguistic dependencies. This allows filtering out irrelevant information and selectively focusing on relevant elements that reliably predict forthcoming information. As we will argue and support with evidence from infant development, changes in these two different aspects of cognition are not independent. In other words, the attention mechanisms available early on limit the type of linguistic information that infants can extract from speech. The delay in the development of more controlled mechanisms of endogenous attention may not indicate a disadvantage in language acquisition but rather an advantage at early linguistic stages. That is, in agreement with Newport's "Less is More" hypothesis (Newport, 1990), this delay in the development of endogenous attention and the initial use of more automatic exogenous attention mechanisms may allow young infants to face a perceptual simplification of the complex speech stream early in the learning process. Therefore, relying on prosodic cues may be crucial and beneficial during the first months of life, when exogenous attention is the main mechanism available. Such a pattern would lead to the observed early segmentation and acquisition of words during the first months, followed later in infancy by a shift toward focusing on the upcoming information indicated by relevant cues (as in non-adjacent dependency learning), when the infant is able to select which information to attend to and which to ignore.

Crucially, the other important difference from the previous theoretical approaches is that our proposal is based on the allocation of attention in the *temporal domain*. Whereas previous proposals have focused purely on visual attention, on how it influences the conceptualization of the message to be conveyed (Levelt, 1989), and on how this is reflected in the linguistic output, we are interested in how attention affects the processing of the ongoing auditory stimulation, that is, the speech flow. Given that speech is a sequence of sounds that unfolds in time, attention to speech is necessarily oriented in the temporal domain. Because temporal selective attention directs resources to certain moments in time, enhancing perception (Correa et al., 2006; Nobre et al., 2007), it can allow for the extraction of different events in speech (e.g., consonants, vowels, words, and phrases) that have different durations and appear in a certain order and at certain moments in ongoing speech. Recently, more general proposals have also underscored that cerebral mechanisms for timing and ordinal knowledge are in charge of the neural representation of sequences in different domains, including language (Dehaene et al., 2015). More precisely, speech has temporally rhythmic and salient prosodic cues that capture attention automatically when they appear, helping, for example, to locate boundaries to segment words. On the other hand, segments carrying cues for rule dependencies (such as suffixes and pronouns) may have different durations and different onset times. Attention can be progressively tuned to focus on these cues as they are gradually noticed to predict upcoming dependencies in a sequence of words. This tuning requires the engagement of endogenous attention in the temporal domain. Therefore, acquiring words and rules may require the engagement of different attentional systems. A dynamic shift between systems should develop in the course of learning. Indeed, recent data in adults show that the same prosodic cues can lead to exogenous effects related to segmentation, even in the absence of any possible learning, and to endogenous effects when the prosodic information can be used as a cue to extract non-adjacent rules (de Diego-Balaguer et al., 2015).

By studying the attention mechanisms involved in language learning, we also pave the way to understanding some of the sources of language learning disabilities. In the following sections, we first describe the typical developmental trajectory of attention and language functions and their underlying brain development. We then review evidence supporting the hypothesis that maturation of the attention mechanisms may serve as a scaffold for language development, as well as evidence indicating the close relationship between attention deficits and impairments in language acquisition.

# STAGES OF DEVELOPMENT OF THE ATTENTION SYSTEM

Before entering into the details of attention development, some conceptual clarifications can be helpful concerning the terms that are used throughout this paper. In the attention literature, a distinction between exogenous (bottom–up) and endogenous (top–down) attention has been classically proposed, and a plethora of studies have dissected the effects that characterize each of these systems and their interactions (Chica et al., 2013). In brief, both types of attention have been shown to facilitate processing. However, exogenous orienting appears even when a secondary task is performed, and it can be voluntarily attenuated but not completely suppressed. Endogenous attention is often voluntary, but it can also appear with no effort and even when participants are not aware of the relationship between the cue and the target. Some models of attention (Corbetta and Shulman, 2002; Corbetta et al., 2008) propose a distinction in terms of *stimulus-driven* vs. *goal-directed* attention, which partially overlaps with *exogenous* vs. *endogenous* attention but has important discrepancies that are worth mentioning here. Within the frontal and parietal brain regions involved in attention, stimulus-driven attention involves a more ventral fronto-parietal network (Corbetta et al., 2008), including the inferior parietal cortex, the ventral and inferior prefrontal cortex (PFC), and the insula, as well as, subcortically, the superior colliculus. Goal-directed attention, in contrast, involves a more dorsal fronto-parietal network, including the middle PFC and the superior parietal lobe (Corbetta et al., 2008), and, subcortically, the pulvinar of the thalamus.

An important distinction within this framework that might help to understand the attentional systems in a less dichotomous way is the distinction between saliency when (i) no task or goal is present (i.e., exogenous saliency) and saliency when (ii) elements are salient because they share some feature that is relevant for the task or goal of the subject despite not being the target of the task (i.e., task-relevant saliency). In other words, a red circle surrounded by green squares will attract attention in the absence of any task due to its exogenous saliency; however, a green circle will drive our attention if our task is to detect a green square because the circle shares a feature (i.e., green) that is relevant to our task. This distinction is important not only because neuroimaging data show that when performing a task, sensory-salient and task-relevant stimuli induce the activation of different brain networks but also because in the absence of a task, these different types of stimulation do engage the ventral network (Corbetta and Shulman, 2002; Chica et al., 2013). Because very young infants do not have a goal-directed system available, salient stimuli in the environment may trigger the ventral network and subcortical areas. With incremental learning and the progressive availability of the goal-directed system, the relevant elements in the environment attract the ventral attention system in a more *task-relevant* manner. In terms of what this might mean for language, infants may first be attracted by any change in pitch, pauses, or voice onset time in speech sounds due to their intrinsic saliency, whereas later in development, once the prosodic characteristics and the representations of the phonemes and words of their native language are learned, only those prosodic variations and speech sounds that correspond to the contrasts of their language will attract their attention. To avoid misunderstandings in the course of the paper, we will use *saliency* to designate only those stimuli that attract attention due to their sensory characteristics irrespective of their relevance.

Turning back to the development of the attentional system, three main attentional mechanisms have been described in the literature on attention development: alertness, orienting, and endogenous attention (Colombo, 2001). The first two characterize exogenous attention. Rudimentary forms of each of the functions of attention are already present to some degree at birth, but each exhibits progressive maturity during the first years of life.

The arousal system is already present at early stages of development. This attention system is associated with an infant's level of alertness and readiness to process stimuli from the environment. From birth to 2 months of age, alertness is commonly initiated by exogenous stimulation (Wolff, 1965). Very young infants show "obligatory attention" (Stechler and Latz, 1966) or "sticky fixation" (Hood, 1995), a difficulty in interrupting gaze from a given stimulus they are fixating in order to shift attention to a different one. The efficiency of disengaging from and shifting gaze to a stimulus increases during the first months after birth (Hunnius and Geuze, 2004). This phenomenon is tied to the neurological maturation of the visual pathway and associated with subcortical structures (Richards et al., 2010) that, although present at birth, are still developing in terms of their connectivity to most cortical areas (Casey et al., 2004; Uylings, 2006). Between 2 and 3 months of age, maturation moderates the inhibition mechanisms that limit eye-movements, which start to gain cortical control. Effective visual exploration requires disengaging and shifting gaze across different locations, and the perseveration of this sticky phenomenon at 7 months of age is an early feature of later emerging autism (Elison et al., 2013) that persists with disengagement difficulties in childhood (Landry and Bryson, 2004). Therefore, this attentional mechanism might be more related to communicative and social aspects of language development.

At later stages, infants start developing the ability to orient attention toward a particular stimulus in space (Courage and Richards, 2008). The infant's visual behavior in the first year of life is dominated by an orienting system of attention with two main components (Ruff and Rothbart, 1996). On the one hand, the spatial-orienting network, which includes the posterior parietal cortex and several subcortical systems, mediates attentional functions, such as engagement, disengagement, shifting, and inhibition of return. On the other hand, the object recognition network, which includes pathways from the primary visual cortex to the parietal cortex and the inferior temporal cortex, orients attention to object features. A remarkable developmental progression of this orienting system occurs between 3 and 9 months of age (Ruff and Rothbart, 1996). During this period, infants flexibly and quickly orient attention to stimuli in the environment in terms of experiential factors (e.g., novelty or complexity) rather than their exogenous salience (Courage et al., 2006).

Finally, the endogenous orienting of attention shows a slower and later developmental time course than other attention systems, with a remarkable change during the later parts of the first year and beyond (Colombo, 2001). It is not until the end of the first year that more complex features arise and endogenous control of attention starts acquiring an executive component to a greater extent (Courage and Richards, 2008), which is closely related to the initial maturation of the dorsal prefrontal and the anterior cingulate cortices (Posner and Petersen, 1990). Age-related attentional improvements are related to changes in structural and functional connectivity (Rueda et al., 2015). Data reveal that increased attentional performance is related to greater information transfer in the brain, which involves distributed brain nodes and paths that connect these nodes. Crucially, the anterior cingulate cortex does not begin to develop long-range connectivity with other brain areas until after the first year of life, developing progressively during childhood (Fair et al., 2009; Gao et al., 2009). However, although the structural connectivity pattern in children resembles that of adults, the functional connectivity of attentional networks shows different patterns. While adults' orienting and executive attentional systems exhibit separate functional networks, these systems are more unified in children (Fair et al., 2007). In summary, whereas exogenous attention matures earlier, endogenous attention develops later and more slowly, continuing its development through childhood and until adolescence (Colombo and Cheatham, 2006; Johnson et al., 2015).

Despite the apparent dichotomy between exogenous and endogenous attention, the appearance of the latter does not imply an inhibition of the exogenous mechanism but a better interaction between the two. As has been highlighted in the adult literature (Corbetta et al., 2008), stimulus-driven attention is able to break the engagement of goal-directed attention, underscoring the close interaction between systems. However, goal-directed attention can attenuate the interference from distractors by decreasing the activation of stimulus-driven attention. The ability to ignore salient distractors to support learning is observed at 8 and 12 months (Althaus and Mareschal, 2012; Tummeltshammer et al., 2014) but is not present earlier, indicating a stronger influence of exogenous factors on young infants' attention. From this point of view, it is clear that the development of the exogenous and endogenous mechanisms of attention does not follow a strict sequential order but rather shows a smooth overlap, with subtle signs of endogenous attention appearing before 8 months of age, albeit with poor command, and progressing to the control reached at the end of the first year (see **Figure 1**).

Most of these developmental descriptions of infants have been based on studies of visuospatial attention. Although this description is useful in understanding the developmental progression of attention in general, tracking auditory information in time is critical when we consider language learning due to the intrinsic temporal characteristics of speech. Generally speaking, to orient attention in time, different capacities need to be in place. Infants need to be able to perceive the difference between distinct temporal lags and be sensitive to the order of elements in a sequence. Once these perceptual capacities are available, the attentional system should be able to orient to these different elements in the auditory domain to extract information from speech. Temporal processing is supported by a cortico-subcortical network, including the premotor cortex, basal ganglia, and cerebellum, which has also been proposed to be involved in speech processing (Kotz and Schwartze, 2010). As previously mentioned, subcortical structures are functional and are used by infants from birth (Casey et al., 2004; Uylings, 2006), and sensorimotor cortices are the first to develop (Dehaene-Lambertz and Spelke, 2015). The early availability of these structures may allow infants to use temporal information in an exogenous manner in the early stages of development.

Studies exploring infants' ability to perceive time mostly focus on the perception of regular temporal structure (e.g., rhythm and regular isochronous sequences), i.e., on attention mechanisms that are mainly exogenous and stimulus-driven (Demany et al., 1977; Haith et al., 1988; Adler et al., 2008). These studies show that in the first months of life, infants are sensitive to regular patterns and can orient their attention in time following them. vanMarle and Wynn (2006) reported that 6-month-old infants can discriminate event durations between 2 and 4 s. Brannon et al. (2004, 2008) additionally showed that 10-month-old infants can detect changes in temporal rhythm by detecting a temporal deviation in a stream of tones formed by a regular inter-stimulus interval. In terms of infants' ability to benefit from rhythmic and regular patterns, their behavior is similar to that observed in adult research (Large and Jones, 1999; Barnes and Jones, 2000; Sanabria et al., 2011).

In contrast, the ability to orient attention in time endogenously has not been reported in infancy. Recent data (Martinez-Alvarez et al., under review) indicate that the ability to endogenously orient attention in time appears after the ability to orient attention in space. More precisely, whereas 12-month-olds show only spatial orienting abilities, 15-month-olds are able to adapt their anticipatory behavior according to both spatial and temporal predictive cues. A recent study with children also revealed that the developmental trajectory of the voluntary use of temporal cues is delayed relative to the use of spatial cues. However, this study showed that 11-year-olds were able to orient attention in time only implicitly, not voluntarily (Johnson et al., 2015), which appears to contradict the results with infants (Martinez-Alvarez et al., under review). Different explanations of these results with children are possible. As Johnson and colleagues explain, one possibility is that the temporal cues were conceptually more demanding than the spatial cues. Another possibility is that the spatial uncertainty of target appearance in their paradigm diminished the utility of the temporal cue. Electrophysiological and behavioral investigations have shown that temporal predictability is most successful when joined with spatial predictability (Doherty et al., 2005; Rohenkohl et al., 2014). Indeed, preliminary evidence from Coull's lab shows that children can use temporal cues when the spatial location of the target is known in advance (Johnson et al., 2015). Further investigations are needed for a better understanding of the development of temporal attention at the functional and anatomical levels.

In sum, the development of attention is characterized by a shift from exogenous, stimulus-driven orienting of attention, particularly during the first 3 months of age, through a smooth progression to greater endogenous control, with the first hints observable before 8 months and a marked dominance after the first year (Johnson, 1990; Ruff and Rothbart, 1996). Although little evidence is available on attention orienting in the temporal domain, clear effects of sensitivity to temporal differences and rhythmic cues are present early on, whereas endogenous orienting of attention in time appears later.

# LANGUAGE DEVELOPMENT IN INFANCY IN RELATION TO ATTENTION

In the current section, we review the developmental trajectory of linguistic abilities in infants, focusing on studies on word segmentation and non-adjacent rule learning, which are the two main milestones of interest for our hypothesis. Throughout the review, we point out the role of the attention mechanisms related to the language data available at each stage.

# Early Stages of Language Learning and Exogenous Mechanisms of Attention

The early capacities of infants to acquire their native language have been extensively reported. Even before they begin to produce their first words, infants have already acquired an important amount of linguistic knowledge. One very early ability is their sensitivity to the rhythmic characteristics of language at birth, shown in their discrimination of the stress patterns of different languages and an early preference for the stress pattern of their native language (Nazzi et al., 1998). Importantly, prosodic characteristics, such as intonation, stress, and pitch variations, are salient perceptual cues that can easily attract infants' exogenous attention. These prosodic cues play a key role in word segmentation because infants exploit them even before they use other cues to locate word boundaries (Mattys et al., 1999).

During the first months of life, infants are capable of extracting words in spoken language by detecting and exploiting other perceptual cues. For example, neonates (Teinonen et al., 2009) and 8-month-old infants can make use of statistical regularities between adjacent syllables (also known as transitional probabilities, TP) to locate word boundaries and extract words from both artificial (Saffran et al., 1996) and natural languages (Pelucchi et al., 2009). On the other hand, the combination of both prosodic and statistical cues shows how predominant the acoustic features of infant-directed (ID) speech (e.g., exaggerated pitch contours) are, compared to adult-directed (AD) speech, in attracting infants' attention (Fernald, 1985; Cooper and Aslin, 1990) and in facilitating infants' word segmentation (Trainor and Desjardins, 2002; Thiessen et al., 2005).
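
The notion of transitional probability can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the three "words", the stream order, and the 0.9 threshold are hypothetical choices, and the thresholding rule is a simplification that is not claimed to be the computation infants perform or the analysis used in the cited studies. It estimates the forward TP, P(y | x), for each pair of adjacent syllables and places a boundary wherever the TP dips.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return {(x, y): P(y | x)} estimated from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor in the stream.
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(syllables, threshold):
    """Place a word boundary wherever the forward TP falls below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


# Hypothetical trisyllabic "words", concatenated without pauses.
WORDS = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["da", "ko", "ti"]}
ORDER = "ABCACBABCBAC"  # arbitrary word order chosen for this illustration
stream = [syllable for word in ORDER for syllable in WORDS[word]]

tps = transitional_probabilities(stream)
segmented = segment(stream, threshold=0.9)
```

In this toy stream, syllable pairs inside a word always co-occur, whereas pairs that straddle a word boundary are less predictable, so the dips in TP coincide with the word edges.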

Statistical learning is a remarkable ability, and numerous studies have been conducted to understand the mechanisms underlying this type of learning and the factors affecting it. Indeed, several features of the growing literature using these paradigms are important to mention in relation to the hypothesis outlined here. One critical factor is that statistical learning is a simple adaptive capacity that can also be found in other animals that have much less developed prefrontal cortices. For example, rodents exposed to the same type of linguistic speech streams are able to correctly segment them, albeit with a somewhat different computation (Toro and Trobalón, 2005). Another important feature is that the presence of statistical regularities in the input captures attention (Turk-Browne et al., 2005). Thus, even when no effort to learn is made, regularities can be extracted from the input (Saffran et al., 1997), capturing our attention in an automatic manner, which is consistent with the fact that even newborns are able to detect these statistical regularities (Teinonen et al., 2009). Therefore, the development of endogenous attention is not necessary for this learning to occur. In the same vein, electrophysiological evidence indicates that once words are segmented, the recognition/detection of a known word within the speech stream also captures attention (Sanders et al., 2002; Parise et al., 2010; de Diego-Balaguer et al., 2015), enhancing the long-term memorization of the segmented word forms.

Another important fact underscores the importance of exogenous attention in these early learning stages and highlights the adaptive function of the unavailability of the endogenous system in young infants. In adults, diverting attention, that is, orienting endogenous attention outside the speech stream, can interfere with learning (Toro et al., 2005). Adults and older infants can orient their attention endogenously, and although this can be helpful when it converges on the critical information for learning, it can also interfere with learning when it diverts from the correct focus of attention. For example, when novel words that need to be ignored attract attention away from the dependency, non-adjacent learning is prevented. Infants' ability to generalize the detection of a non-adjacent dependency to nonsense stems (e.g., *These meeps*) occurs only if they are first presented with familiar stems (e.g., *These chairs*). When attention is captured by a *novel* intervening element, learning of non-adjacent dependencies is altered (Soderstrom et al., 2002). In contrast, if salient information automatically captures infants' attention and this information is helpful for learning, the absence of endogenous attention prevents infants from disengaging and reorienting their attention to a different focus that may interfere with the correct computation. In this way, the early dominance of this automatic exogenous mechanism can make learning more likely to occur. Other salient features, such as adjacent repetitions, can also act as important attentional attractors that improve learning. Infants possess, already at birth, an automatic perceptual mechanism to detect repetitions in the auditory domain (Endress et al., 2009). This is reflected in greater activation in the temporal and left frontal brain areas when tested for recognition after exposure to simple repetition-based structures (ABB; e.g., "mubaba," "penana") than to random sequences (ABC; e.g., "mubage," "penaku").

Overall, the evidence indicates that the characteristics of speech with their statistical regularities and the salient prosodic cues are perfectly adapted to make the most of the early availability of exogenous attention. By engaging exogenous stimulus-driven attention, available since birth, learning can be achieved. The absence of control of voluntary attention at these early stages of development does not limit infants' ability to acquire language but rather helps them by allowing infants to follow their stimulus-driven mechanism to capture the relevant information for learning automatically.

# Early Signs of Non-Adjacent Dependency Learning in the Rise of Endogenous Attention

Although the ability to segment and extract words from speech is a critical milestone of language acquisition, to acquire the grammar of their language infants must also track non-adjacent relationships. Importantly, the extraction of hierarchical structures relies on temporally distant relationships and is fundamental to capture the properties of language (Chomsky, 1957). Non-adjacent dependencies are cases in which two elements co-occur over one or more intervening elements. In English, for example, there is an association between auxiliaries and inflectional morphemes, irrespective of the intervening verb stem (e.g., **is** walk**ing**; **is** runn**ing**; **is** eat**ing**). Infants must dismiss the variable irrelevant information and focus instead on the invariant relevant cues that predict the non-adjacent dependency (Gómez and Maye, 2005).

Because the endogenous system appears progressively in the course of development, its initial use in its earliest stages depends on the convergent presence of exogenous cues. The first signs of non-adjacent tracking in language are observed in the phonological domain, where the presence of exogenous cues helps infants to track the dependencies grouped by their high similarity (for a review, see Sandoval and Gómez, 2013). For example, infants as young as 7 months can use vowel harmony as a cue to find word boundaries, vowels being more salient than consonants and linked to prosodic variations (Mintz and Walker, 2006; Kanpem et al., 2008), but they cannot use consonantal harmony (Nazzi et al., 2009; Gonzalez-Gomez and Nazzi, 2012). They need to reach 10 months of age before they can, an age at which endogenous attention starts to be more prominent. In a similar vein, newborns can discriminate adjacent rules based on the repetition of the same syllable but not when rules are non-adjacent (ABA; e.g., "bamuba," "napena") (Gervain et al., 2008). In contrast, 7-month-old and older infants track non-adjacent dependencies but only under some circumstances: when the non-adjacent syllables are identical and the interleaved syllables are different (e.g., *le di le*, *ga po ga*) (Marcus et al., 1999; Gerken, 2006). Unexpectedly, a more recent study demonstrated that German infants as young as 4 months of age could discriminate between grammatical and ungrammatical non-adjacent dependencies in Italian (Friederici et al., 2011). As the authors indicate, Italian morphosyntactic dependencies also contain phonological dependencies. Given that phonological non-adjacent dependencies are tracked from very early stages in development, it has been proposed that 4-month-olds may be tracking the phonological aspects to discriminate these non-adjacent dependencies. In other words, the exogenous attentional resources already available to 4-month-olds could have driven the success of such young infants in this task.
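
The difference between the repetition structures discussed in this section (adjacent ABB, non-adjacent ABA, and unstructured ABC) can be stated as a simple identity check over syllable positions. The sketch below is purely illustrative: the function names are hypothetical, and the helper assumes that every syllable is written with exactly two letters, which holds for the example items quoted in the text but is not a general property of speech.

```python
def split_cv(item):
    """Split a string into two-letter syllables (assumes CV syllables only)."""
    return [item[i:i + 2] for i in range(0, len(item), 2)]


def repetition_pattern(item):
    """Label a three-syllable item by where, if anywhere, a syllable repeats."""
    a, b, c = split_cv(item)
    if a == b == c:
        return "AAA"
    if b == c:
        return "ABB"  # repetition between adjacent syllables
    if a == c:
        return "ABA"  # repetition across an intervening syllable
    if a == b:
        return "AAB"
    return "ABC"      # no repetition


# Example items quoted in the text.
labels = {item: repetition_pattern(item)
          for item in ["mubaba", "penana", "bamuba", "napena", "mubage", "penaku"]}
```

Detecting ABB requires comparing only neighboring syllables, whereas detecting ABA requires holding the first syllable across an intervening one, which is the sense in which the latter is a non-adjacent relation.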

# Learning More Challenging Non-Adjacent Dependencies with Greater Maturation of Endogenous Attention

Although young infants can track non-adjacent linguistic rules under certain learning conditions (e.g., when the dependent units are similar and the intermediate elements are dissimilar), learning of morphosyntactic dependencies appears several months after phonological dependency learning has occurred. Non-adjacent dependency learning in these more challenging perceptual and linguistic arrangements requires greater involvement of endogenous mechanisms. Simply tracking non-adjacent dependencies can be used to locate word boundaries but is not enough to extract and generalize the underlying rule, which entails the creation of abstract categories for generalization (Peña et al., 2002).

In this context, prosodic information in natural languages provides reliable cues not only for word segmentation but also for rule learning (Jusczyk, 2002) because prosodic pauses tend to co-occur with syntactic boundaries. Nevertheless, although prosodic cues play a role in word segmentation from birth, it is not until the first year that infants start exploiting these cues for rule extraction (Johnson, 2008; Seidl and Johnson, 2008). The presence of this prosodic information in an artificial language enhances the extraction of non-adjacent dependencies compared to continuous speech streams without pauses (Peña et al., 2002). An important point to highlight is that the presence of pauses *per se* does not improve learning of non-adjacent dependencies. Those pauses need to occur at the boundaries of the position of the dependencies to be useful (Endress et al., 2005; Mueller et al., 2010). The use of these cues (stress pattern or prosodic pauses) for word segmentation requires only orienting attention to the position of the prosodic information that captured attention, that is, in an exogenous fashion. However, the use of prosodic pauses for rule extraction additionally requires the use of this cue to selectively focus attention on concurrent phonological information at this specific position. This cue then has to be used as a relevant predictor of forthcoming information to extract the rule dependency, which implies focusing attention on this cue and the predicted element while disregarding the intervening irrelevant information.

Within the morphosyntactic domain, non-adjacent relationships are often found between subject and verb agreement (**he** walk**s**) or between auxiliary and verb agreement (he **is** walk**ing**). Learning of morphosyntactic non-adjacent dependencies emerges after the first year of life (Gómez and Maye, 2005). This developmental course is reasonable when considering the challenge of the task: in order to track the dependency among non-adjacent elements, infants must first identify the morphemes without the help of any similarity between them and then track the dependency between them across intervening elements irrelevant to the rule dependency (**he** walk**s**; **he** run**s**; **he** eat**s**). In one of the first studies exploring infants' ability to learn verb–tense agreement (Santelmann and Jusczyk, 1998), researchers reported that 18-month-olds accepted grammatical phrases in English, such as "is running," and rejected ungrammatical phrases, such as "can running," whereas 15-month-olds were not able to differentiate between the phrases. Moreover, learning was possible only under certain conditions, with infants succeeding when the intervening element extended over three syllables or fewer (e.g., *Grandma is always singing*, but not *Grandma is almost always singing*).

In addition, in order to learn a non-adjacent relation of the form "**these** cat**s,**" infants must track a dependency between two elements that occur over an intervening element and create different categories (e.g., determiner, noun, verb). Several lines of research have explored the mechanisms underlying the ability to group elements into categories. For example, it has been proposed that *frequent frames* (e.g., "these × are") yield category formation by their frequent co-occurrence with intervening content words and constitute the basis for the creation of grammatical categories (Mintz, 2003). Gómez and Maye (2005) showed that 15- and 18-month-olds succeeded when frames had high variability in the intervening word inside the frame but failed with low variability, which is in agreement with the frame-based categorization proposed by Mintz (2003). Similarly, increasing the variability of the irrelevant intervening information makes adjacent relations less statistically informative and the non-adjacent dependency more prominent, allowing learners to focus on the relevant and reliable relationship among non-adjacent elements (Gómez, 2002). Interestingly, these different studies converge on a similar age, between 15 and 18 months, as that reported by Gerken et al. (2011), who found that infants use selective attention to focus on languages having learnable grammatical patterns. Together, these studies point to a parallelism between the development of non-adjacent dependency learning, category formation and endogenous control of attention in the second year of life. The importance of correct tuning of attention for the acquisition of non-adjacent rules is also seen in the Lany and Gómez (2008) study, where infants younger than those of the previous studies were able to track non-adjacent dependencies if the correct attention focus was guided by training them first on the dependencies between categories. Infants later discriminated grammatical and ungrammatical items involving non-adjacent dependencies with the same category words.
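The variability argument above can be made concrete with a small computation. The following Python sketch is only an illustration under stated assumptions: it builds a toy a-X-b language (the tokens `a1`, `b1`, `x0`, and the helper names are hypothetical, not stimuli from the cited studies) and estimates transitional probabilities between adjacent elements (`gap=1`) and between the first and last elements (`gap=2`). With three frames and n equally frequent middle elements, the adjacent probability of a given middle element after `a1` is 1/n, whereas the non-adjacent probability of `b1` after `a1` stays at 1.

```python
from collections import Counter
from itertools import product


def transitional_probabilities(strings, gap):
    """Estimate P(second | first) for elements `gap` positions apart."""
    pair_counts = Counter()
    first_counts = Counter()
    for s in strings:
        for i in range(len(s) - gap):
            pair_counts[(s[i], s[i + gap])] += 1
            first_counts[s[i]] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def make_language(frames, n_middles):
    """Build every a-X-b string for the given frames and middle-set size."""
    middles = [f"x{k}" for k in range(n_middles)]
    return [(a, x, b) for (a, b), x in product(frames, middles)]


# Hypothetical tokens: each a_i predicts its own b_i across one middle element.
FRAMES = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]


def summarize(n_middles):
    """Return (adjacent TP of x0 after a1, non-adjacent TP of b1 after a1)."""
    strings = make_language(FRAMES, n_middles)
    adjacent = transitional_probabilities(strings, gap=1)
    nonadjacent = transitional_probabilities(strings, gap=2)
    return adjacent[("a1", "x0")], nonadjacent[("a1", "b1")]


for n in (2, 12, 24):
    adj, nonadj = summarize(n)
    print(f"middle set size {n:2d}: P(x0|a1) = {adj:.3f}, P(b1|a1) = {nonadj:.3f}")
```

Under these assumptions, enlarging the middle set dilutes the adjacent statistics while leaving the non-adjacent dependency perfectly predictive, which is one way to read the variability effect described above.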

Thus, the overall pattern is in agreement with the progressive ability to orient attention endogenously and the close collaboration between exogenous and endogenous attention. Early on, infants need more concurrent exogenous cues, such as a high degree of similarity or identity between the dependent pairs (Creel et al., 2004; Onnis et al., 2005) or prior exposure to them (Lany and Gómez, 2008; Lai and Poletiek, 2011; for a review, see Perruchet et al., 2012), to help them to orient their attention to the relevant information (Pacton and Perruchet, 2008; Pacton et al., 2015), allowing a greater interaction between exogenous and endogenous attention. At later stages of development, after the first year of age, the improved endogenous system allows infants to rely less on the availability of these salient features to orient their attention to the relevant information.

An important point to consider is that signs of discrimination of more complex non-adjacent dependencies at a very early age have been observed only in electrophysiological studies (Mueller et al., 2012). Online EEG measures may not reflect the same knowledge as more overt behavioral responses that require greater explicit knowledge. In that sense, it is worth considering that indicators of prediction present from birth are reflected in mismatch responses in the EEG at the presentation of unexpected events, and these early online effects reflect these more automatic prediction mechanisms. However, recent research has shown that electrophysiological indexes of conscious access, equivalent to the P300 in adults, that show a nonlinear pattern, can only be tracked clearly at the end of the first year. This response associated with consciousness was visible and sustained from 12 to 15 months of age (750 ms) and may serve to amplify the sensory input through selective attention (Kouider et al., 2013). Conscious access before the first year of age may not be possible because even if the structural architecture is in place, its immaturity may not allow an adequate flux of information for conscious availability (Dehaene-Lambertz and Spelke, 2015). From the perspective presented here, this conscious access may be required for these predictions to reach a long-lasting representation that may allow the infant to show behavioral effects. More studies are needed to examine early computation of different types of non-adjacent dependencies in infancy. New research should take into account the role that variables attracting attention may have in their acquisition in order to understand when the capacity actually arises in development.

# BRAIN DEVELOPMENT OF THE ATTENTION AND LANGUAGE NETWORKS

In terms of brain development, the parallel maturation of the attention and language networks is also evident. This is in part unavoidable given the partial overlap between those two networks. As we have previously mentioned, a fronto-parietal network with either more ventral or dorsal distribution is related to stimulus-driven and goal-directed attention mechanisms, respectively. These areas are connected through the superior longitudinal fascicle (SLF), and the ventral and dorsal connectivity is ensured through the SLF III and SLF I branches of this fascicle, respectively; these two connections have been proposed to interact through the SLF II, which connects the dorsal regions of the PFC to the ventral regions of the parietal lobe (de Schotten et al., 2011; **Figure 2**, left).

For language, a division in ventral and dorsal pathways ensures audio-motor integration and language production dorsally and language comprehension and semantic processing ventrally (Hickok and Poeppel, 2007). Direct connections between the language-related areas in the left frontal and temporal cortices are sustained dorsally through the arcuate fasciculus (AF) and ventrally by paths running through the extreme capsule (Saur et al., 2008; Brauer et al., 2011, 2013). The dorsal connection is also assured indirectly through the parietal lobe with shorter segments (Catani et al., 2005): an anterior segment connecting the premotor and inferior frontal regions with the IPL and a posterior segment connecting inferior parietal and temporal cortices (**Figure 2**, right). There is some controversy concerning the terminations of the AF (Dick and Tremblay, 2012, for a review). Interestingly, this bundle overlaps with the SLF III, previously mentioned in relation to the ventral attention network, the anterior segment of the AF and the SLF III having a greater right lateralization (Catani et al., 2005; López-Barroso et al., 2013). Although the same nomenclature is used for attention and language in terms of ventral and dorsal streams, only the ventral attention and dorsal language stream overlap (see **Figure 2**). Although we have based this section on models of attention based on visual attention, Corbetta et al. (2008) did mention that the ventral attention network responds to different modalities. The overlap between the ventral attention and dorsal language networks is even greater if we consider that the temporal attention network shows a greater left functional lateralization, pointing again to the importance of temporal attention in speech processing (Coull et al., 2011).

During development, studies of whole ventral language connections demonstrated that newborns exhibit an adult-like ventral connection between the frontal and temporal lobes, and even children at 7 years of age have a preferential use of this pathway for sentence comprehension, in contrast to adults, who preferentially use the dorsal pathway (Brauer et al., 2011). In contrast, the dorsal pathway follows different developmental courses, with two subparts maturing at different rates. Whereas the dorsal connections reaching the premotor cortex are functional at birth, the terminations of the dorsal pathway reaching the posterior portion of Broca's area (BA 44) are still underdeveloped (Perani et al., 2011) and are not fully myelinated at the age of seven (Brauer et al., 2011, 2013). Adult studies on the learning of new languages indicate that whereas the audio-motor subpart is related to word learning (López-Barroso et al., 2013), the processing of non-adjacent elements relies on the latter subpart running from frontal BA 44 (Friederici, 2011). This pathway, which may support hierarchical (non-local) dependencies (Boeckx et al., 2014), follows a later and slower rate of development, similarly to non-adjacent dependency rule learning.

Rapid changes are observed during the first year of life in terms of maturation. Sensory and motor systems myelinate earlier than brain systems serving higher level functions (Flechsig, 1920). Myelination starts at different times and occurs at different rates in different areas. At birth, there is little differentiation between gray and white matter in cortical areas. The primary visual cortex rapidly matures during the first 3 months with parallel myelination of optical radiations, whereas the primary auditory cortex and acoustic radiations extend over the first 3 years of life. The frontal areas and cortico-cortical connections continue to mature until puberty, but myelination is already observed during the first year in all associative regions. Diffusion measures increase with the compactness/myelination of the tracts in the left lower part of the cortico-spinal tract and in the parietal part of the AF relative to the right during the first 3 months of life. During this period the maturation of the right hemisphere is generally faster than the left (i.e., superior temporal sulcus, STS), but the inferior frontal gyrus (IFG) shows earlier left than right development. The left AF matures faster than the right and correlates with the maturation of BA 44 and the posterior part of the STS (Dehaene-Lambertz and Spelke, 2015 for a review).

Postnatal maturation shows subcortical white matter expansion in the connections to the frontal, anterior temporal, and parietal cortices, as measured by diffusion imaging and volume expansion (Hill et al., 2010). Surface expansion reflects an underlying change in synaptogenesis, dendritic arborization, gliogenesis, and intracortical myelination. The lateral temporal and parietal lobes and the dorsal and medial prefrontal regions are functionally and structurally not mature at birth. They show high expansion in cortical folding in both hemispheres in infants compared with adults. The latest maturation in synaptic density, peak cortical thickness, and mature values of gray matter density are reached in the dorsolateral prefrontal cortex. The comparison between human and macaque monkey cortices reveals that these dorsal, medial frontal and lateral parietal cortices show correlated high postnatal and evolutionary expansion (Hill et al., 2010). This pattern suggests similar patterns of cortical expansion in the development and evolution of these areas, which points to the importance of these areas for human specific functions.

These changes in connectivity at the structural level are also reflected in functional connectivity. Graph-theoretic measures of infants' brains (Power et al., 2010) indicate that the developing functional networks are in some respects similar to adult networks. The necessary connections are present; however, compared to adults, the developing brain tends to show strong resting state functional connectivity MRI (rs-fcMRI) signal correlations with nearby regions, even during childhood. Progressively, the local correlations tend to weaken, whereas correlations with more distant regions, such as those between the frontal and parietal cortices, tend to increase. This trend stems from synaptic pruning that contributes to reduced local rs-fcMRI correlation, and myelination that could facilitate increased long-range connectivity.

Considering the overall attention and language networks, brain regions and connections of overlap are observed between the ventral attention network and the dorsal language network (**Figure 2**). Ventral prefrontal and insular regions integrating the ventral attention network and the anterior segment of the AF show an early availability, whereas the IPL and its connections show a later and more progressive development. This delayed development also affects the dorsal attention network, with the dorsal prefrontal regions having a slow maturation extending to childhood and with delayed maturation of the parietal lobe (Casey et al., 2000; Fuster, 2001). There is evidence showing that the left IFG (Karuza et al., 2013) as well as the premotor cortex (PMC; Cunillera et al., 2009) are engaged in the extraction of TPs when no other cue is available to segment speech (McNealy et al., 2006; Scott-Van Zeeland et al., 2010). The early functionality of the left IFG and the PMC allows early use of TPs and stimulus-driven attention to orient to salient prosodic information and to segment speech. When the dorsal prefrontal cortex starts to become functionally mature during the second year of life (Colombo and Cheatham, 2006), the dorsal fronto-parietal network allows for more proficient control of attention. The later maturation of the dorsolateral prefrontal cortex (DLPFC) and part of the ventral attention network (i.e., IPL), including the temporo-parietal junction (TPJ) (see **Figure 2**), progressively allows (i) orienting of the ventral attention network to task-relevant representations (e.g., phonemes of the native language and segmented words) created in the earlier stages of development, (ii) recruitment of goal-directed attention, and (iii) optimal functioning of the attention system, which requires the effective interaction between the two networks through the TPJ and the DLPFC (Corbetta and Shulman, 2002; Corbetta et al., 2008) and is necessary to accurately and selectively attend to specific stimuli and shift the focus of attention when relevant stimulation appears.

# ATTENTION DEFICITS AND LANGUAGE DEVELOPMENT DISORDERS

The proposal delineated here makes a straightforward prediction in relation to the effects of attention deficits in language development. If control of attention is a function used for the optimal acquisition of non-adjacent rule dependencies, then impairments in the development of this function should interfere with the acquisition of these rules. In contrast, early language development relying on more automatic attention mechanisms should not be affected.

Commonly, children acquire language rapidly and effortlessly. However, some children show problems acquiring language. Specific language impairment (SLI) is classically defined as a developmental disorder of language characterized by difficulty in acquiring language in the absence of neurological damage, hearing deficits, or intellectual disabilities (Bishop, 1992; Leonard, 1998). The prevalence of SLI in pre-school children is approximately 7% (Tomblin et al., 1997; Law et al., 2000). Longitudinal studies reveal that more than 70% of diagnosed cases of SLI in kindergarten persist into adulthood (Johnson et al., 1999). SLI children have been shown to have difficulties in the acquisition of non-adjacent dependencies (Hsu et al., 2014) and in the use of prosodic information for syntactic processing (Sabisch et al., 2009). In a longitudinal study, impaired prosodic processing of word stress during early development was shown to be an early marker of risk for SLI (Weber et al., 2005).

Linguistic impairments often co-occur with non-linguistic deficits, including attention-deficit/hyperactivity disorder (ADHD). SLI and ADHD frequently overlap within the same children, that is, comorbidity between the two disorders is commonly found (Baker and Cantwell, 1992; Benasich et al., 1993; Coster et al., 1999; Noterdaeme and Amorosa, 1999; Tomblin et al., 2000; Lindsay et al., 2007). ADHD is the most frequent diagnosis among children with language impairments (Cohen et al., 2000). Longitudinal studies suggest that SLI children have a high risk for ADHD (Baker and Cantwell, 1987; Cantwell and Baker, 1987; Beitchman et al., 1989; Benasich et al., 1993; Redmond and Rice, 1998, 2002). More precisely, deficits in selective attention (Stevens et al., 2008) and sustained attention (Spaulding et al., 2008; Finneran et al., 2009) have been found in children with SLI. ADHD is a common childhood disorder characterized by a persistent pattern of inattention and/or developmentally inappropriate levels of hyperactivity/impulsivity (American Psychiatric Association, 2000). ADHD prevalence is approximately 10% in children (Faraone et al., 2003; Pastor and Reuben, 2008). As with SLI, children with ADHD are a highly heterogeneous group. ADHD is commonly divided into three subtypes: ADHD-Inattentive (ADHD-I), ADHD-Hyperactive-Impulsive (ADHD-H/I), and ADHD-Combined type (ADHD-C). Whereas children in the ADHD-I subgroup usually show difficulties with attention control and sustained attention and are often inattentive, ADHD-H/I children exhibit high levels of activity and poor impulse control. ADHD-I children do poorly in tasks requiring sustained attention, covert shifting of attention and selective attention. Thus, individual differences in the control of selective attention in infancy may be related to ADHD-I outcomes. Children in the ADHD-I group are more likely to meet criteria for learning disability than ADHD-H/I children (Willcutt and Pennington, 2000).

Mirroring the attention deficits found in SLI children, language difficulties are common in ADHD children. Between 50 and 90% of children with ADHD have co-occurring language difficulties (Gualtieri et al., 1983; Camarata et al., 1988; Love and Thompson, 1988; Tirosh and Cohen, 1998). However, the overlap between these disorders shows an asymmetrical pattern, that is, more ADHD children have co-occurring SLI than SLI children have co-occurring ADHD (Tannock and Schachar, 1996). Higher order cognitive functions (e.g., executive functions, working memory, and attention) have been explored as possible causal deficits for SLI and ADHD (e.g., Cardy et al., 2010; Hutchinson et al., 2012).

In SLI, abnormal diffusion measures are observed systematically in the SLF and AF (Verhoeven et al., 2012; Roberts et al., 2014). A more recent study also showed differences in the ventral language network (i.e., the inferior fronto-occipital fasciculus, IFOF) (Vydrova et al., 2015). The discrepancies between studies may stem from the heterogeneity of the disorder: children with more semantico-pragmatic profiles are more likely to show differences in IFOF function, and samples may include children with and without associated ADHD, which may cause an associated SLF abnormality in addition to the AF. The brain structures supporting cognitive functions commonly associated with ADHD have also been investigated. Gross anatomical changes in brain dimensions are often associated with ADHD; specifically, reduced dimensions of the caudate nucleus, the prefrontal cortex, the corpus callosum, and the cerebellar vermis (see Bush et al., 2005 for a review), as well as changes in the parietal lobes (Sowell et al., 2003), have been reported. Evidence from pathophysiology research has shown that ADHD physiology involves dopaminergic and noradrenergic pathway dysfunction in the prefrontal cortex and subcortical regions of the brain (Barkley et al., 1992; Castellanos et al., 1996; Faraone and Biederman, 1998; Konrad et al., 2006). This network partially overlaps with both goal-directed attention and temporal processing. Dopamine (DA) dysfunction affects mainly the dorsal regions of the PFC, which are those required for goal-directed attention. The subcortical regions affected (i.e., striatum) and the cerebellum are important structures for temporal processing (Coull et al., 2011).

Recent studies have provided the first evidence that temporal selective attention during speech perception predicts language outcome in preschool children. Children who selectively allocate attention to informative moments during speech, such as word onsets, demonstrate better metalinguistic capacity (Astheimer et al., 2014).

# CONCLUSION

Infants acquire language exceptionally fast and without any given instruction. But, how can infants so easily achieve such a remarkable landmark, whereas adults struggle to do so? Following Kuhl's view (Kuhl, 2004), understanding how the early brain is committed to the statistical and prosodic patterns experienced early in life helps to explain the longstanding puzzle of why infants are better language learners than adults. One of the possible answers is the way their cognitive development is structured, with functions, such as attention, appearing in an incremental fashion and assisting language learning.

Based on the characteristics of the developmental trajectory of the attention and language systems, we have outlined the hypothesis that attention development, characterized by an initial phase when attention is stimulus-driven, followed by a progressive ability to endogenously control the focus of attention, shapes the developmental trajectory of language. In the evidence reviewed here, we have seen that the learning trajectory of two types of linguistic learning (words and rules) shows a different profile in infant language development. Whereas words in fluent speech are already segmented and extracted at early stages, non-adjacent dependencies occurring over temporally distant elements are learned many months later (see **Figure 1**).

More precisely, the early segmentation and word learning abilities are profoundly influenced by the salient characteristics of the speech signal, with an important role of prosodic information. Later acquisition of more complex information associated with the extraction of more distant dependencies is influenced by variables that help infants to focus attention on the relevant elements carrying the dependency and to disregard the information that is not relevant for the acquisition of the dependencies. This trajectory goes hand in hand with the development of the ability to progressively orient attention endogenously. Early in this phase, infants require more concurrent salient cues, such as phonological similarity or identity repetition, to help them to orient their attention to the relevant information. A greater development of goal-directed attention allows infants to learn less salient, non-adjacent dependencies by relying more on endogenous cues. In terms of brain development, whereas the initial stages of development rely on the availability of some areas of the ventral attention network, including the ventral prefrontal regions and the premotor cortex, the later stages require the maturation of more dorsal prefrontal and parietal regions (see **Figure 2**).

We consider that this development of attention in different stages allows for an earlier simplification of learning. This early learning is driven by the automatic capture of attention, creating the first building blocks that learning can lean on when control of attention allows for the extraction of more complex relations between non-adjacent elements in speech. Data from adults show that they can track both adjacent and non-adjacent information at the same time, and one type of information can interfere with the other (Romberg and Saffran, 2013). Thus, the inability to reorient attention away from the automatic attractors of attention is valuable in the early stages of acquisition, allowing for incremental learning.

Moreover, the same exogenous system that allows young infants to extract words using salient cues may also help them to extract complex rules. Young infants are able to succeed in non-adjacent learning that otherwise would not be available until after the first year of life. In these early stages, this success of non-adjacent dependency tracking occurs only under certain conditions. Applying our present proposal to this developmental scenario, two main conditions should be fulfilled to extract non-adjacent dependencies in the early stages of development: (1) a rudimentary mechanism of endogenous attention should be available to select certain predictive elements and to disregard irrelevant information, and (2) stimulus-driven factors should be present in the linguistic input (e.g., a certain degree of similarity or saliency) to automatically capture the exogenous attention system.

The implications of our hypothesis are clear in terms of the parallelism between the development of the endogenous attention system and the rule learning abilities in healthy infants. This relation is seen not only in healthy development but also in the effects of attention deficits in relation to impairments in language development. The ability to exploit the available information given by exogenous cues, such as prosodic information, to orient attention endogenously is crucial not only in healthy infant development but also in studies with different pathologies.

Comprehending the cognitive processes involved in language development is of critical importance for our understanding of why, under certain conditions, language development impairment occurs. However, research in the field of language development often offers limited explanations bounded within the language domain, ignoring the importance of other cognitive functions. The present proposal overcomes these limits and presents an integrative approach to understanding the role of attentional tuning during language acquisition. By reviewing the main stages of attention and language development and possible impairments, we have underscored the importance of taking an interdisciplinary approach to the study of human development. We believe that this integrative approach exploring the role of temporal attention as a scaffold for language development can lead to a wider scope than previous proposals, allowing the development of a precise model of language and cognitive function interaction during learning that has important clinical and developmental consequences, hence providing an important contribution to the language learning and language rehabilitation fields.

# AUTHOR CONTRIBUTIONS

RD-B and AM-A provided the ideas and wrote the article. FP contributed to discussions and writing and checked that the infant research review was accurate.

# ACKNOWLEDGMENTS

The authors have been funded by an ERC StG (TuningLang 313841) to RD-B. They are thankful to Pablo Ripollés for his help with the figures and to Joan Orpella for helpful discussions during manuscript preparation.

# REFERENCES


attention-deficit/hyperactivity disorder. *J. Commun. Disord.* 43, 77–91. doi: 10.1016/j.jcomdis.2009.09.003


segmentation mechanisms. *Front. Psychol.* 6:1478. doi: 10.3389/fpsyg.2015.01478


Jackendoff, R. (2002). *Foundations of Language*. Oxford: Oxford University Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 de Diego-Balaguer, Martinez-Alvarez and Pons. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Can a bird brain do phonology?

Bridget D. Samuels<sup>1,2</sup>\*

*<sup>1</sup> Department of Linguistics and Cognitive Science, Pomona College, Claremont, CA, USA, <sup>2</sup> Center for Craniofacial Molecular Biology, University of Southern California, Los Angeles, CA, USA*

A number of recent studies have revealed correspondences between song- and language-related neural structures, pathways, and gene expression in humans and songbirds. Analyses of vocal learning, song structure, and the distribution of song elements have similarly revealed a remarkable number of shared characteristics with human speech. This article reviews recent developments in the understanding of these issues with reference to the phonological phenomena observed in human language. This investigation suggests that birds possess a host of abilities necessary for human phonological computation, as evidenced by behavioral, neuroanatomical, and molecular genetic studies. Vocal-learning birds therefore present an excellent model for studying some areas of human phonology, though differences in the primitives of song and language as well as the absence of a human-like morphosyntax make human phonology differ from birdsong phonology in crucial ways.

#### Edited by:

*Cedric Boeckx, Catalan Institute for Research and Advanced Studies (ICREA)/Universitat de Barcelona, Spain*

#### Reviewed by:

*Pedro Tiago Martins, Pompeu Fabra University, Spain Eric Raimy, University of Wisconsin-Madison, USA*

#### \*Correspondence:

*Bridget D. Samuels, Department of Linguistics and Cognitive Science, Pomona College, 185 E. Sixth St., Claremont, CA 91711, USA bridget.samuels@gmail.com*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *19 June 2015* Accepted: *13 July 2015* Published: *28 July 2015*

#### Citation:

*Samuels BD (2015) Can a bird brain do phonology? Front. Psychol. 6:1082. doi: 10.3389/fpsyg.2015.01082*

Keywords: birdsong, phonology, language-ready brain, cognitive biology, comparative neuroscience, evolution of language, biolinguistics

## 1. Introduction

The striking similarities between how some birds learn to sing and how human infants learn to talk have been a source of fascination for researchers for generations, dating back to Darwin's (1871) *Descent of Man*. Darwin already understood that the capacity for vocal learning is a rare ability in the animal kingdom but constitutes an important component of birdsong and human language learning. For this and other reasons, Darwin called birdsong the "nearest analogy to language" and looked to birds for insight into how human language may have evolved.

Modern research has confirmed that vocal learning is indeed a rare ability, particularly among mammals. Another key component of how we process speech, namely categorical perception, was once thought to be quite rare as well, giving rise to the notion that "speech is special" because it uniquely makes use of this ability. However, an explosion of work beginning with Kuhl and Miller (1975) established that categorical perception is ubiquitous in species ranging from macaques (May et al., 1989) to crickets (Wyttenbach et al., 1996). Other animals can perceive human speech categorically and can perceive their own vocalizations categorically; moreover, humans perceive non-speech stimuli such as colors categorically.

The availability of new genetic and neuroimaging techniques has complemented these behavioral studies so that we may begin to understand birdsong and human language on the level of neural connectivity and gene expression. Interestingly, these approaches underscore the similarities between perception and production in humans and birds that are vocal learners. Here, I review some recent literature on this topic, focusing on two main areas: vocal learning and vocalization structure (phonological syntax). In each of these areas, what is used to learn, perceive, and produce birdsong appears to be highly similar to what is employed in human speech. However, human phonology is crucially different from birdsong phonology because of its connection to human morphosyntax, which is a semantically compositional or "lexical" syntax in the sense of Marler (1998). Thus, a bird brain may not be truly language-ready, but may still provide an excellent model for understanding components of human speech and the constraints that shaped the evolution of the human language faculty.

# 2. Vocal Learning

Vertebrates all seem to have the ability for auditory learning, or committing a novel sound to memory. Vocal learners have the additional ability to imitate or mimic a learned sound. Human language relies heavily on vocal learning, since all vocabulary items and a variety of other linguistic structures must be learned in order to achieve linguistic competence. Yet, it is a well-known curiosity that our species is alone among primates in having a well-developed capacity for vocal learning, though Seyfarth and Cheney (1986) suggest that vervet monkey calls may be learned. Of the myriad species that have been studied, the only mammals that are relatively strong vocal learners are humans, cetaceans, pinnipeds, elephants, and some bats; oscine songbirds (passerines), parrots, and hummingbirds are among the best vocal learners in the animal kingdom (see references in Schachner et al., 2009 and Petkov and Jarvis, 2012).

Comparisons between strongly vocal-learning birds and those with a poor capacity for vocal learning can be used to shed light on how the neural plasticity and other capacities needed to support the vocal learning mechanism may have evolved. Moreover, comparing learned and innate birdsongs can provide the opportunity to probe whether or to what extent vocal learning allows more structurally complex song. Note that the capacity for complex vocal learning emerged independently in three clades of birds, which are separated by 68 million years from a common ancestor (see references in Pfenning et al., 2014). Alternatively, this capacity may only have arisen twice in birds: once in hummingbirds and once in the common ancestor of parrots and songbirds, which are closely related, with a loss of the ability in the suboscine songbirds (Suh et al., 2011; Petkov and Jarvis, 2012) and perhaps a gain in at least one suboscine species (Saranathan et al., 2007). Currently, most research on vocal learning in birds has focused on the passerines, but an intriguing recent study suggests that one portion of the song system is similar in songbirds, hummingbirds, and parrots, while another portion evolved uniquely in parrots over 29 million years ago (Chakraborty et al., 2015). The similarity between the vocal learning systems in these avian clades is remarkable for the same reason that the similarities between the avian and human ones are: evolution has come up with nearly the same means of developing this ability time and time again. For researchers studying human language, this is fortunate since it means that birds can model the object of our study to a surprising extent.

Doupe and Kuhl (1999) provide an overview of the evidence for vocal learning in a particular species, which involves the following properties: (i) initially immature vocalizations ("babbling") that eventually become adultlike; (ii) a relatively fixed individual-level repertoire that varies across individuals/groups; (iii) individual-level differences that depend on experience/exposure; and (iv) the necessity of auditory feedback to maintain normal vocalizations. The behavioral evidence for vocal learning in songbirds and parallels to human first language acquisition have been reviewed widely in the literature (see e.g., Doupe and Kuhl, 1999; Bolhuis et al., 2010; Berwick et al., 2011), and I will not recap those arguments here. Schachner et al. (2009) discuss a relatively new line of research investigating the connection between vocal learning and spontaneous rhythmic motor entrainment, or the ability to align movement with auditory input (i.e., move to a beat or dance). They found support for the hypothesis that entrainment is a by-product of selection for vocal mimicry that arises from a specialized connection between the auditory and motor systems (Patel, 2008): upon analyzing videos of a wide variety of animals purportedly dancing, they found that only vocal mimicking species showed any evidence of entrainment. These included the Asian elephant and 14 species of parrot. It has also been widely noted that both humans and songbirds exhibit critical or sensitive periods for native-like song/language acquisition (see e.g., Lenneberg, 1967). However, not all vocal learning species have this property; starlings, canaries, and pied flycatchers are "open-ended" learners (Brainard and Doupe, 2002; Eriksen and Lampe, 2011), and Prat et al. (2015) argue against a short critical period in Egyptian fruit bats, which are vocal learners and initially exhibit immature vocalizations akin to babbling. I therefore set this issue aside.

#### 2.1. Neural and Molecular Evidence

A number of recent studies investigating the neural and molecular underpinnings of vocal learning focus on songbirds. Vocal learning is served by regions in the motor cortex and striatum in both songbirds and humans, and these regions appear to have a uniquely direct connection in both humans and vocal-learning birds, as opposed to non-vocal-learning birds and primates (Pfenning et al., 2014). The anterior forebrain pathway involved in song learning and plasticity in the adult song of vocal-learning birds links the HVC (a region formerly known as the hyperstriatum ventrale, pars caudalis) to Area X of the basal ganglia, the thalamic nucleus dorsolateralis anterior pars medialis (DLM), the lateral magnocellular nucleus of the anterior nidopallium (LMAN), and the robust nucleus of the arcopallium (RA), where it connects with the posterior motor pathway, which is also involved in song production and learning (Bolhuis et al., 2010). Pfenning et al. (2014) took a computational approach, screening gene expression databases from humans and all three clades of vocal-learning birds as well as the non-vocal-learning dove, quail, and macaque. The results of these gene expression studies confirmed that not only have human and vocal-learning bird brains evolved convergently from an anatomical perspective in ways that are not true of non-vocal-learning species, but this convergence has also occurred on a molecular level. For birds and humans to arrive at the ability of vocal learning involved the convergent evolution of expression patterns of hundreds of genes in the regions of the brain that subserve this behavior. Many of these genes affect neural connectivity or function in fine motor control. Area X and VS in the songbird (finch) striatum show specialized gene expression similar to that of the putamen and body of the caudate in the human basal ganglia. 
The songbird RA is the most similar in specialized gene expression to somatosensory cortex in humans, specifically the primary motor cortex and adjacent somatosensory portion of the central sulcus, as well as the ventral portion of the laryngeal motor cortex. In these areas, the number of genes with significantly shared specialized expression between finches and humans ranges from the tens to the hundreds. The expression levels of Foxp2 in Area X have been studied extensively; see Bolhuis et al. (2010) for a recent overview of the literature on this gene in humans and other species. Levels of FoxP2 are higher in Area X in juvenile zebra finches during the sensitive period for song learning (Haesler et al., 2004). In canaries that add new song elements to their repertoire at the end of breeding season, the level of Foxp2 expression is higher during this period (Haesler et al., 2004). Singing downregulates Foxp2 mRNA in Area X in both juvenile zebra finches and adult males during "undirected" singing in the absence of a female (Teramitsu and White, 2006; Teramitsu et al., 2010).

It has been suggested that the avian pallium—which contains several areas discussed above, including the HVC, RA, and LMAN—is homologous with the mammalian neocortex. Homology between these structures would be significant because computation in the laminated cortex is considered to be responsible for complex behavior. Although only mammalian brains have a cortex, birds are also capable of sophisticated behaviors including tool use, basic arithmetic, causal reasoning, and recognizing themselves in mirrors (see references in Calabrese and Woolley, 2015). Like the mammalian primary auditory cortex, the avian auditory pallium (Field L) consists of three regions that receive auditory input from the thalamus (Bolhuis et al., 2010). The auditory pallium and neocortex display highly similar patterns of connectivity (Wang et al., 2010), and gene expression analyses also highlight similarities between these two tissues (Dugas-Ford et al., 2012). Calabrese and Woolley (2015) recorded neuronal populations in different portions of Field L in zebra finches and showed that the auditory pallium exhibits the same hierarchical information-processing principles as the canonical cortical microcircuit in mammals. Their conclusion is that this microcircuit evolved in a common ancestor of birds and mammals, 300+ million years ago. As Harris (2015) notes, it may be even older; the fish brain also has a pallium, and invertebrates such as cephalopods also display striking intelligence. Harris therefore suggests that the canonical cortical microcircuit may be evolutionarily quite old, but only repurposed for intelligence in species where the benefits of doing so outweighed the costs of increased brain size, energy expenditure, and development time.

The overall picture that emerges from these studies is that the neural and molecular bases of vocal learning in humans and songbirds have strong similarities, owing in part to convergent evolution (analogy) and in part to homology. It should be noted that both analogy and homology are of potential interest to the study of language evolution. Homologies highlight our ancient heritage, the biological substrate that was adapted and/or exapted for the externalization of language. Analogies show that similar solutions may arise to similar problems (Gould, 1976). For example, the last common ancestor of the octopus and vertebrates was ca. 750 million years ago; the octopus eye emerged ca. 480 million years ago and the vertebrate eye emerged completely independently 640–490 million years ago, yet human and octopus eyes have 70% of their expressed genes in common (Ogura et al., 2004; Fernald, 2006). Of the 1052 genes expressed in the octopus eye, 1019 (97%) are evolutionarily quite old, dating back to the common ancestor of bilateria (Ogura et al., 2004). Convergent identical amino acid substitutions have been discovered in a number of areas, including the gene encoding the motor protein Prestin, which is crucial for echolocation, in bats and cetaceans (Liu et al., 2010; see Pfenning et al., 2014, for further examples). This is in part because the vertebrate brain provides a highly genetically constrained substrate upon which to build (Jarvis, 2004). Noting analogies like these helps to shed light on the physical and developmental constraints on solving the problem in question, which "may essentially force natural selection to come up with the same solution repeatedly when confronted with similar problems" (Hauser et al., 2002, p. 1572). 
In the context of describing the growth of language in a human child, Chomsky (2005, 2007) has dubbed properties that arise from such constraints "third factor" principles, which interact in a dynamic fashion with the genetic endowment (first factor) and experience (second factor). Studies like the ones described here highlight the fact that none of these factors can be viewed in isolation, and that in particular the third factor shapes the first in a powerful fashion that we are only beginning to uncover.

# 3. Phonological Syntax

One of the properties that distinguishes vocalizations like human language and the songs of birds and whales from the calls of non-human primates is the rich structure of the former. On the other hand, primates are capable of producing distinct calls with distinguishable referents (Arnold and Zuberbühler, 2006a,b, 2008; Ouattara et al., 2009; Cäsar et al., 2013), whereas the same song serves a number of expressive functions in birds. The idea that human language integrates a song-like expressive system with a lexical system like that of other primates has been recently explored by Miyagawa et al. (2013, 2014). In the sections that follow, I will review evidence suggesting that the structure of birdsong is like that of human phonology in important ways, that the elements within songs are context-sensitive like the elements of human speech, and that birds may be capable of computations as complex as those demanded by human phonology.

#### 3.1. Hierarchical Structure

The structure of birdsongs can be modeled as exhibiting hierarchy with limited depth. Each individual has a repertoire of notes, akin to phonemes in human speech, often shared with other individuals of the species. A sparrow or Bengalese finch has a repertoire of fewer than 8 note types, such as whistles, trills, and buzzes in the case of the sparrow, each exhibiting within-category variation (Marler, 2000). Multiple notes are produced sequentially to produce a syllable. A syllable is defined as a group of notes bordered by silence, unlike syllables in human speech, which readily follow each other without any interruption.
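The definition of a syllable as a group of notes bordered by silence can be illustrated with a minimal sketch. The note timings, the labels, and the 5 ms silence threshold are all hypothetical values chosen for illustration; they are not taken from any of the studies cited here.

```python
def segment_into_syllables(notes, min_gap_ms=5.0):
    """Group notes into syllables wherever a silent gap separates them.

    Each note is a tuple (onset_ms, offset_ms, label). A new syllable starts
    when the silence since the previous note's offset is at least min_gap_ms
    (a hypothetical threshold).
    """
    syllables = []
    current = []
    for note in sorted(notes):  # sort by onset time
        if current and note[0] - current[-1][1] >= min_gap_ms:
            syllables.append(current)
            current = []
        current.append(note)
    if current:
        syllables.append(current)
    return syllables


# Hypothetical recording: five notes with two silent gaps.
notes = [(0, 20, "a"), (20, 45, "b"), (60, 80, "c"), (80, 110, "a"), (130, 150, "d")]
syllables = segment_into_syllables(notes)
syllable_labels = ["".join(n[2] for n in s) for s in syllables]
```

In this toy input the gaps at 45–60 ms and 110–130 ms split the note stream into three syllables, while notes that abut one another stay within the same syllable.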

A typical zebra finch syllable might range from 60–180 ms in duration (Fehér et al., 2009). When interrupted by a strobe flash in the midst of a syllable, a zebra finch will complete the syllable, which suggests that these chunks are units of motor planning (Cynx, 1990). A sequence of several syllables that repeats during the course of a song is called a motif (Slater, 2000). Doupe and Kuhl (1999) liken motifs to phrases in human language, though Yip (2006) is tempted to equate them with prosodic words. An entire song bout consists of several motifs. The number of songs created by an individual bird varies greatly according to species. A winter wren may know 5–10 distinct songs, each lasting 10 s, whereas each starling may know up to 100 motifs and combine some of them in a song bout that is 30 s to a minute long (Yip, 2006). Nightingales and mockingbirds may have larger repertoires of hundreds of songs (Marler, 2000; Berwick et al., 2011), organized into fewer than a dozen "packages" of bouts that are typically produced together (Todt and Hultsch, 1996). It is important to note that notes and syllables do not have any meaning. This is what Marler (1998, 2000) calls "phonological syntax" or "phonocoding"; the elements of songs can be combined in different sequences, but this does not change their meaning. Similarly, human vocalizations consist of combinations of sounds (phones) into morphemes, but the phones themselves are not meaningful. Of course, this differs from human language on a word-level or sentence-level scale, which is said to have "lexical syntax" or "lexicoding"; the meaning of a word results from the meanings of its morphemes, and the meaning of a sentence arises from the meanings of its words. It is also important to consider that human speech does not bottom out at the segmental (phone) level. In all modern phonological theories, phonological processes operate over smaller units: distinctive features, elements, or articulatory gestures. 
There is no evidence for manipulation of any sub-note-level features in birdsong.

Analogies between birdsong syllables and human syllables, and between birdsong motifs and human prosodic words or phrases, are of limited utility. Conservatively, one can say that language and song are alike in having structure on different timescales: notes/phonemes in the tens of milliseconds, syllables around 100–200 ms, and longer timescales for larger units (Doupe and Kuhl, 1999; Yip, 2006). These elements are arranged in non-random order, as will be discussed in a later section. It has been suggested that chunking songs into motifs and syllables may serve purposes for both memorization and production, similar to breaking a ten-digit telephone number into chunks of three or four digits (Williams and Staples, 1992). I have noted in previous work (Samuels, 2011) that the maximal number of segments in a human syllable is around 5 (depending on theory-internal considerations), which is at the upper limit of the number of elements we can simultaneously hold in short-term memory (Miller, 1956; Cowan, 1998, 2001). It is also interesting to note that humpback whale songs follow the same general pattern discussed here: they typically consist of up to ten ordered elements, which are then repeated a few times as a unit (Payne, 2000). Reduplication, which is a common way of expressing pluralization, durativity, and other grammatical functions in human language and also plays a role in many language games, resembles this order-preserving repetition (Samuels, 2011; Miyagawa et al., 2014). However, reduplication only creates a single extra copy of the elements over which it operates.

There is some experimental evidence concerning what areas of the brain control birdsong structure. Kao and Brainard (2006) found that inducing lesions in the LMAN of zebra finches reduces variability in syllable structure, which is normally greater in male birds' undirected singing than it is in their singing to females. However, damage to the LMAN does not affect the number of motif repetitions or the sequencing of syllables. In adult finches, auditory units in the LMAN and in the HVC respond more strongly to a bird's own song than to the songs of other conspecifics (Lewicki and Arthur, 1996; Doupe, 1997). Some neurons in the zebra finch HVC appear to integrate auditory information over a window of several hundred ms, so they are sensitive to certain sequences or combinations of syllables (Lewicki and Arthur, 1996). It has been suggested that such sequences are represented in the HVC via population coding (Nishikawa et al., 2008). Like humans, zebra finches show left-hemisphere dominance of the HVC and in the caudomedial nidopallium, which have been compared to the human Broca's and Wernicke's areas, respectively (Moorman et al., 2012; Pfenning et al., 2014). There is also evidence to suggest that more complex song syntax is associated with changes in gene expression and neural organization (Boeckx and Benítez-Burraco, 2014). The Bengalese finch, which is a domesticated type of white-backed munia, has a more complex song structure than its wild counterpart (Okanoya, 2004). This difference appears to be reflected in differential androgen receptor expression in the GABAergic neurons in Area X and in differential epigenetic regulation (methylation) of regions upstream of the start codon for this receptor (Wada et al., 2013). 
A recent vein of research into the mechanisms of human speech perception is exploring coupled theta-gamma oscillations in the auditory cortex as a means through which the different time scales of the speech stream may be integrated, perhaps via a more general mechanism of attention (Martins and Boeckx, 2014). The coupling of theta waves, which track syllabic rhythm, with gamma waves that track a shorter interval corresponding to the segment or phoneme, could enable "de-multiplexing" of the speech stream to facilitate parsing and encoding (Hyafil et al., 2015). There is evidence suggesting that coupling may be disrupted in some individuals with autism (Jochaut et al., 2015).

#### 3.2. Contextual Alternations

Human speech is composed of sounds or phones that can be categorized in terms of their membership in abstract categories known as phonemes. A phoneme may have multiple realizations, known as allophones, that are distributed in a context-sensitive manner. For example, the voiceless stop consonants /p, t, k/ in English are aspirated when they appear word-initially, unaspirated after /s/, and unreleased or glottalized word-finally. Membership in a particular phonemic category varies from language to language: the alveolar flap [ɾ] is an allophone of /t/ and /d/ that appears intervocalically or trochaic foot-medially in English, as in the words putty and ladder, whereas [ɾ] is considered by some phonologists to be an allophone of /r/ in Spanish (Harris, 1969). The realization of a phoneme can also be affected by its neighbors in a phenomenon known as coarticulation, as it is attributed to anticipatory or lagging movement of the vocal apparatus. The context-dependent, rule- or constraint-governed realization of phonemes/allophones is a defining characteristic of human phonological systems.

Wohlgemuth et al. (2010) showed that the realization of a Bengalese finch syllable is significantly affected by the preceding and following syllables. A syllable is called "convergent" if it can be preceded by at least two different syllables, and is called "divergent" if it can be followed by at least two different syllables. The identity of the following syllable affected realization of its divergent predecessor 92% of the time, and the identity of the preceding syllable affected the realization of the following convergent syllable 92% of the time. These effects extended even beyond the immediately preceding/following syllable and could be detected at least two syllables away. Measurements of RA activity suggested that this region plays a role in this context-sensitive phonology, as it responds differentially to the same syllable when produced in different contexts, though RA activity is still more strongly correlated across realizations of the same syllable than across different syllables. The magnitude of differences in response to the same syllable in different contexts correlated with the magnitude of the phonological variation across those contexts.
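The definitions of convergent and divergent syllables can be made concrete with a short sketch. The songs below are hypothetical strings in which each letter stands for one syllable type; the function only applies the two definitions given above and does not reproduce the acoustic analysis of Wohlgemuth et al. (2010).

```python
from collections import defaultdict


def classify_syllables(songs):
    """Label each syllable type as convergent and/or divergent.

    convergent: preceded by at least two different syllable types
    divergent:  followed by at least two different syllable types
    """
    predecessors = defaultdict(set)
    successors = defaultdict(set)
    for song in songs:
        for prev, nxt in zip(song, song[1:]):
            successors[prev].add(nxt)
            predecessors[nxt].add(prev)
    syllable_types = {s for song in songs for s in song}
    return {
        s: {
            "convergent": len(predecessors[s]) >= 2,
            "divergent": len(successors[s]) >= 2,
        }
        for s in sorted(syllable_types)
    }


# Hypothetical toy corpus of three songs.
songs = ["abcd", "abed", "fbcd"]
labels = classify_syllables(songs)
```

In this toy corpus, syllable `b` is both convergent (it follows `a` or `f`) and divergent (it precedes `c` or `e`), whereas `d` is convergent only. These are the positions at which context-dependent variation in realization can be measured.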

Allophonic-style variation has also been found at the level of notes in swamp sparrows. Lachlan and Nowicki (2015) performed careful habituation/dishabituation studies showing that sparrows categorize notes differently according to their length and their position within a syllable. Among types of notes that descend rapidly in frequency, there is a clear trimodal distribution in length in the songs of male sparrows from Pennsylvania. Short notes (clustered around 8 ms in duration) typically occur syllable-initially, while long notes (clustered around 32 ms in duration) typically occur syllable-finally. Notes of intermediate length (clustered around 16 ms in duration) can occur both syllable-initially and -finally. Interestingly, these categories are learned, and male swamp sparrows from New York have a bimodal distribution of note types that is missing the cluster of intermediate-length notes. The Pennsylvania birds in Lachlan and Nowicki's study categorized the intermediate-length notes with the short notes in syllable-initial position, but with the long notes in syllable-final position. While it is possible that the birds construct completely different categories for syllable-initial and syllable-final word types, there remains the intriguing possibility that intermediate notes serve as an "allophone" of a phoneme-like short-note category in one position but are allophones of the long-note category in another position.

#### 3.3. Computational Complexity

The formal complexity of grammars can be categorized according to the type of rules sufficient to generate them (Chomsky, 1956). The following broad categories, known as the Chomsky Hierarchy, can be defined as follows (Wall, 1972):

	- b. Context-free: A → ω, where ω ≠ the null string
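The context-free clause sits within a four-way classification. For orientation, one standard textbook statement of the rule types is sketched here. It is an assumption drawn from the general formal-language literature, not a reproduction of Wall's (1972) wording or lettering. In it, A and B are nonterminal symbols, a is a terminal symbol, α, β, and ω are strings of terminals and nonterminals, and ε is the null string.

$$
\begin{aligned}
\text{Regular (finite-state):} \quad & A \to aB \ \text{ or } \ A \to a \\
\text{Context-free:} \quad & A \to \omega, \quad \omega \neq \varepsilon \\
\text{Context-sensitive:} \quad & \alpha A \beta \to \alpha \omega \beta, \quad \omega \neq \varepsilon \\
\text{Unrestricted:} \quad & \alpha \to \beta, \quad \alpha \neq \varepsilon
\end{aligned}
$$

Each class of languages properly contains the class generated by the rule type listed before it, so regular patterns are the least computationally demanding of the four.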

All known phonological alternations and phonotactics, which govern the sequential distribution of phonemes, fall into the class of regular languages and can thus be modeled with finite-state machines (Johnson, 1970; Kaplan and Kay, 1994; Karttunen, 1998). This contrasts with the domain of sentence-level syntax, which has been known since Chomsky (1956) to exhibit context-free patterns. It is now recognized that cross-serial dependencies in syntax fall outside the class of context-free languages, requiring mildly context-sensitive computations (Shieber, 1985). On the basis of this difference, Heinz and Idsardi (2011, 2013) have argued that there are likely to be multiple, distinct language learning modules that deal separately with these disparate patterns. Even within phonology, there may be more than one. Phonological patterns sometimes involve restrictions on adjacent sounds, but can also involve long-distance computations. For example, some languages including Navajo prohibit the alveolar sibilant [s] and the post-alveolar sibilant [ʃ] from co-occurring within a word, regardless of the distance between them (McDonough, 2003). Heinz and Idsardi (2013) (see also references therein) pursue the hypothesis that phonotactic constraints fall into a few distinct sub-regular classes, specifically the strictly local class when only a contiguous string of adjacent segments is involved and the strictly piecewise class for long-distance patterns like the Navajo case. Stress patterns may be of either of these types, though a few may require counting, which is measurably more complex but still falls within the class of regular languages. An intriguing question, then, is whether the phonological alternations seen in birdsongs are of these types, and/or whether birds are capable of these kinds of computations.
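The difference between the strictly local and strictly piecewise classes can be sketched as two checks over a string of segments. The strings used here are made-up sequences, not attested Navajo words, and the function names are hypothetical. The sketch shows only that a window over adjacent segments cannot detect a long-distance co-occurrence restriction, whereas a check over subsequences can.

```python
from itertools import combinations


def violates_strictly_local(word, forbidden_factors, k=2):
    """True if any contiguous window of k segments is a forbidden factor."""
    windows = (tuple(word[i:i + k]) for i in range(len(word) - k + 1))
    return any(w in forbidden_factors for w in windows)


def violates_strictly_piecewise(word, forbidden_subsequences, k=2):
    """True if any order-preserving, possibly non-adjacent, k-segment
    subsequence of the word is forbidden."""
    return any(sub in forbidden_subsequences for sub in combinations(word, k))


# Sibilant co-occurrence restriction of the kind described for Navajo:
# [s] and [ʃ] may not appear in the same word, in either order.
SIBILANT_HARMONY = {("s", "ʃ"), ("ʃ", "s")}

long_distance = "sataʃ"   # made-up string: sibilants disagree at a distance
harmonic = "satas"        # made-up string: sibilants agree

piecewise_catches = violates_strictly_piecewise(long_distance, SIBILANT_HARMONY)
local_catches = violates_strictly_local(long_distance, SIBILANT_HARMONY)
```

With a two-segment window, the strictly local check flags the disagreeing sibilants only when they are adjacent, so it misses `sataʃ`. The strictly piecewise check flags the string regardless of how many segments intervene.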

In nature, no known types of birdsong require more computational power than human phonological patterns: both fall within the class of regular languages. This has been shown for Bengalese finch song, which is among the more complex and variable song systems (Berwick et al., 2011). A state transition diagram of a typical Bengalese finch song (abstracting away from the probabilities of state transitions) is shown in **Figure 1** alongside a reduplication pattern found in English (see Raimy, 2000 and Samuels, 2010b, 2011 for more details on the loop formalism used to represent reduplication). Bengalese finch songs belong to the simplest type recognizable by a finite-state automaton, the strictly locally 2-testable languages, meaning it is possible to determine whether a sequence is licit by looking at a moving window of two-note sequences. A further interesting property of Bengalese finch songs is that they are easily learnable in a technical sense (Kakishita et al., 2009), which is not true of regular languages more broadly. As noted above, some phonotactic constraints in human languages fall into the strictly local class, though the window of observed segments must be larger than two (perhaps maximally around five segments). Other types of birdsongs, such as those of starlings and American thrushes, are even less complex, requiring only low-order Markov models to describe the sequence of motifs (Dobson and Lemon, 1979; Gentner and Hulse, 1998). I do not know of any patterns that require strictly piecewise computation in birdsong. Attempts to determine whether starlings and finches can learn or spontaneously extract context-free patterns have generated controversy and are widely considered inconclusive at this time (Gentner et al., 2006, 2010; van Heijningen et al., 2009; ten Cate et al., 2010; Abe and Watanabe, 2011; Beckers et al., 2012; Everaert and Huybregts, 2013).
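The moving-window idea can be illustrated with a minimal sketch of a strictly 2-local grammar: a set of permitted two-element windows, including song boundaries, collected from sample songs and then used to test new sequences. The same counts also yield a first-order Markov model. The training songs, boundary markers, and function names are hypothetical; this is not the learning procedure of Kakishita et al. (2009) or the analysis of Berwick et al. (2011).

```python
from collections import Counter, defaultdict

START, END = "<", ">"  # hypothetical boundary markers


def bigrams(song):
    """All adjacent pairs in the song, padded with boundary markers."""
    padded = [START, *song, END]
    return list(zip(padded, padded[1:]))


def learn_permitted_bigrams(songs):
    """Collect every two-element window observed in the sample."""
    return {bg for song in songs for bg in bigrams(song)}


def is_licit(song, permitted):
    """A song is licit if every two-element window was observed."""
    return all(bg in permitted for bg in bigrams(song))


def transition_probabilities(songs):
    """First-order Markov estimate: relative frequency of each transition."""
    counts = defaultdict(Counter)
    for song in songs:
        for prev, nxt in bigrams(song):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }


# Hypothetical training songs; each letter stands for one note type.
training = ["abcd", "abcbcd", "abd"]
permitted = learn_permitted_bigrams(training)
probs = transition_probabilities(training)
```

Because only adjacent pairs are stored, the grammar generalizes to unseen songs such as `abcbcbcd`, which reuses observed windows, while rejecting `acd`, whose window `ac` never occurred. The transition table adds the probabilities that the state diagram in Figure 1 abstracts away from.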

# 4. Conclusions

Although significant gaps in our knowledge remain, recent genetic, neuroanatomical, and behavioral studies have served to underscore the parallels between human language phonology and birdsong. These similarities are due in large part to convergent evolution, but some have their roots in homologies of neural structures, such as between the mammalian auditory cortex and the avian pallium. There is strong evidence that a bird brain can do some types of phonological computations, as evidenced by the patterns and relationships among elements in birdsong, which closely resemble the relationships between elements in human phonology by every measure on which they have been compared. Still, important differences remain.

One of the main differences between human and avian phonology has already been briefly mentioned in the discussion of hierarchical structure above: the primitives of birdsong are unlike those of human language. Notes seem to act in a more atomic fashion than phones, which can be, and indeed must be to provide an adequate and insightful account of human phonological systems (Jakobson et al., 1952; Halle, 2002), decomposed into smaller phonological features (or equivalently for the present purposes, elements or gestures). It may be the case that human languages can exist without this featural level, as has been argued for Al-Sayyid Bedouin Sign Language (Aronoff et al., 2008; Sandler et al., 2011), which lacks featural minimal pairs that are ubiquitous in all other known spoken and signed languages (cf. bin vs. pin in English, which differ in the presence or absence of a voicing feature on the first segment).

This discussion of a signed language raises another disparity between human and avian communication: unlike birdsong, human languages can be externalized in more than one modality. It is commonly held that signed and spoken language phonology are in fact identical, differing only in the (learned) content of their features (Brentari, 1998; Hale and Reiss, 2000; Mielke, 2008). Taken together, these data suggest that avian and human phonology are more comparable on a computational level than a representational one. I have argued that the underpinnings of phonological features are not unique to humans, however (Samuels, 2010a). The origins of phonological features may be attributed in part to perceptual biases known as auditory discontinuities that we inherited from the basic mammalian auditory system (see e.g., Brown and Sinnott, 2006; Kluender et al., 2006; Mesgarani et al., 2008). Some of these perceptual biases are also shared with birds such as budgerigars (Brown and Sinnott, 2006). Some birds and mammals, including non-human primates, have additionally been shown to attend spontaneously to formants (energy peaks in the acoustic signal), which are crucial correspondents of sub-segmental features in human speech (Fitch, 1994). The presence of a kinesthetic mode of language in humans suggests that studying movement systems could also be informative. Alongside the attempts to teach primates to sign (e.g., Nim Chimpsky, Washoe the chimpanzee, and Koko the gorilla), which were relatively successful compared with the prior failed attempts to teach primates to speak, some researchers have looked to "action grammars" as precursors of linguistic syntax (Greenfield et al., 1972; Greenfield, 1991, 1998; Johnson-Pynn et al., 1999; Fujita, 2007, 2009). Interestingly for the present purposes, Greenfield (1991) has suggested a parallel between action grammars and the combination of phonemes into words. 
Such studies suggest that moving beyond birdsong and investigating other behaviors, such as mating dances, could potentially be illuminating in this regard as well.

Birdsong also appears to lack non-local dependencies, which are attested in patterns such as vowel and consonant harmony in human language. Interestingly, harmony patterns provide some of the best evidence for underspecification, or the initial absence of a particular phonological feature on a certain class of segments in lexically stored morpheme forms. I have suggested elsewhere that underspecification may be a unique feature of human language, which follows if the basic elements of other vocalization systems are not composed of features like ours are (Samuels, 2015).

Another major difference is that birdsong is not fed by a recursive morphosyntactic cycle. A large number of phonological phenomena in humans are bounded by morphological or syntactic domains. For example, they may occur within words but not across them. Others are re-computed each time a new morpheme is added to the derivation, such as stress: witness the differences between govern with stress on the first syllable, governmental with stress on the penultimate syllable, and governmentalese with stress on the final syllable. All this is to say that birdsong and human phonology differ substantially in the nature and structure of the input they receive. It is therefore worthwhile to consider the question of how potentially pre-existing phonological capabilities could have come to fit together with a more complex "upstream" system like that of human morphosyntax. Taken together, the evidence presented here suggests that further investigations of birds can help us to pinpoint interesting questions to ask about the cognitive abilities, neural circuitry, genetics, and epigenetics that are involved in human language, and about the nature of language evolution itself.

Of course, such studies are only one piece of the puzzle. For example, birds are not currently as amenable to genetic engineering as common laboratory species such as mice and zebrafish, which limits the availability of certain experimental approaches, but a better understanding of birds can provide the rationale for studies that may be possible in other species. Studies of Foxp2 provide an excellent example of this kind of cross-species synergy. Initially, a heterozygous point mutation in FOXP2 was famously identified as being associated with a language disorder, developmental verbal dyspraxia, in a British family (Lai et al., 2001). It was then established that this gene is highly conserved from reptiles to humans, but especially among mammals, with strong evidence for recent selection in the human lineage (Enard et al., 2002; Scharff and Haesler, 2005). Due to current technological limitations, RNAi-mediated knockdown using a lentiviral vector has been used to study the effect of reduced Foxp2 expression in Area X of the zebra finch brain, rather than a transgenic approach (Haesler et al., 2007; Schulz et al., 2010). In mice, heterozygous and homozygous Foxp2 knockouts as well as humanized knockins have been studied, and a mouse model has been developed with a conditional null (floxed) allele, allowing crosses to transgenic lines expressing Cre drivers for tissue- and time-specific conditional knockouts (French et al., 2007). Knockdown (in finches) or haploinsufficiency (in mice) of Foxp2 leads to altered or inaccurate vocalizations (Shu et al., 2005; Haesler et al., 2007), and in the finch this is associated with altered density of spiny neurons in Area X (Schulz et al., 2010). Interestingly, the human version of Foxp2 has strong effects on the plasticity of the striatum and accelerates learning when introduced into mice (Schreiweis et al., 2014). Mice with certain point mutations in one copy of Foxp2, including those that cause developmental verbal dyspraxia in humans, are developmentally delayed, somatically weak, and have impaired auditory-motor association learning owing to strongly altered activity in the striatal circuits, but they make the expected range of acoustically normal vocalizations (Gaub et al., 2010; French et al., 2012; Kurt et al., 2012). These studies collectively give a more robust view of this gene's role in vocalization than would be possible using a single species. In sum, looking at the communication systems of other animals, as well as their cognitive abilities more generally, is necessary to achieve a better perspective on what abilities underlie human language, what species share them, and how they may have evolved.

#### Acknowledgments

I am grateful to the guest editors of this Special Topic, Antonio Benítez-Burraco and Cedric Boeckx, for the opportunity to contribute to this volume. I am also indebted to the two anonymous reviewers and to Hiroki Narita for their very insightful comments and suggestions, which improved the manuscript significantly.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Samuels. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Evolution of speech-specific cognitive adaptations**

#### *Bart de Boer\**

*Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Brussels, Belgium*

This paper argues that an evolutionary perspective is natural when investigating cognitive adaptations related to language. This is because there appears to be correspondence between traits that linguists consider interesting and traits that have undergone selective pressure related to language. The paper briefly reviews theoretical results that shed light on what kind of adaptations we can expect to have evolved and then reviews concrete work related to the evolution of adaptations for combinatorial speech. It turns out that there is as yet no strong direct evidence for cognitive traits that have undergone selection related to speech, but there is indirect evidence that indicates selection. However, the traits that may have undergone selection are expected to be continuously variable ones, rather than the discrete ones that linguists have focused on traditionally.

#### *Edited by:*

*Cedric Boeckx, Catalan Institute for Research and Advanced Studies – Universitat de Barcelona, Spain*

#### *Reviewed by:*

*Monica Tamariz, The University of Edinburgh, UK Gareth Roberts, University of Pennsylvania, USA*

#### *\*Correspondence:*

*Bart de Boer, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium bart@ai.vub.ac.be*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 29 July 2015 Accepted: 17 September 2015 Published: 29 September 2015*

#### *Citation:*

*de Boer B (2015) Evolution of speech-specific cognitive adaptations. Front. Psychol. 6:1505. doi: 10.3389/fpsyg.2015.01505*

**Keywords: evolution of speech, combinatorial structure, language evolution, biology-culture co-evolution, language-specific selection**

# **Introduction**

What properties of the brain make it language-ready? Many properties of the brain are needed, including "obvious" ones such as a supply of oxygen and nutrients. However, when cognitive scientists and linguists consider this question, they focus on properties that are to at least some extent unique to language and/or unique to humans (Hockett, 1960; Hauser et al., 2002). This is implicitly an evolutionary point of view, because what is investigated is defined in relation to what is found in related species. Here it is argued that even though the language-ready brain can be studied fruitfully without reference to its evolutionary history or without reference to comparable abilities in other species, keeping in mind the evolutionary perspective is important. After all, the behaviors and brain regions that are involved must either be similar to those of other apes or, if they are more different than would be expected from random drift, have an evolutionary explanation, whether related to language or not.

An evolutionary perspective may also help to resolve the debate about whether behaviors or mechanisms related to language are "language-specific" or "domain-general." The problem is that one researcher's "language-specific" is another researcher's "domain-general," as it is essentially arbitrary where one draws the line. From the evolutionary perspective this is even clearer, as any cognitive mechanism involved in language must be based on an earlier one that was not. However, the evolutionary perspective may provide a way out, as the question of whether a trait has undergone selective pressure related to language is in principle amenable to empirical investigation (even though this may be very hard). Hence the question of whether a trait is domain-general or language-specific can be operationalized by asking whether it has undergone selective pressure related to language. In this paper, certain aspects of the language-ready brain related to speech will be considered from an evolutionary perspective. Speech is here defined as the physical signal that is used to convey language, and although this paper will focus on signals in the acoustic modality, most of what is said is true for sign language as well. Researchers with a naïve view of biology sometimes consider speech as a somewhat uninteresting process of externalization unrelated to the core properties of language (e.g., Bolhuis et al., 2014). However, from an evolutionary perspective it is one of the most interesting aspects of language. There are three reasons for this. Firstly, speech is the aspect of language that is closest to the physical world and therefore the most likely to leave traces in the fossil record (de Boer, 2012; reviewed in, e.g., Fitch, 2010, section 2). Secondly, and related to this, speech has close analogies in other animals' behaviors.
Thirdly, speech has very interesting cognitive properties (defined more precisely below) that have been proposed by some researchers as direct precursors to syntax (Carstairs-McCarthy, 1999; Studdert-Kennedy, 2005).

Two cognitive properties that allow speech but that are not found in closely related primates are precise voluntary control over the larynx and extensive vocal imitation (Ackermann et al., 2014). This paper will focus on a third aspect: *combinatorial speech*, the ability to use a small set of learned building blocks that can be recombined into an unlimited number of utterances using learned rules. This ability to deal with combinatorial structure is the basis of the phonology and phonotactics of modern human languages. Before looking at evidence for language-specific selective pressure in cognitive traits for dealing with combinatorial structure, a brief theoretical discussion is necessary about what kinds of traits can evolve, and what can therefore be expected.

# **Constraints on Evolution**

An important constraint on evolution is that it needs to work with what is already there: selection works on variation in the population, and this variation is caused by randomness in transmission. However, transmission in complex organisms must be relatively high-fidelity and variation must therefore be small. Evolution will consequently be gradual. This appears to pose no important constraint on language evolution, though. Precursors of many of the prerequisites for language have been inferred for the latest common ancestor with the other apes (Fitch, 2010, chapter 6). In addition, processes of analogous evolution observed in other groups of species show that traits required for language that are missing in the latest common ancestor can evolve relatively quickly, for instance vocal mimicry<sup>1</sup> or (song) structure (Honda and Okanoya, 1999).

A more subtle constraint arises because language itself evolves culturally while humans evolve biologically. It has been argued that because culture changes much more quickly than biology, language provides an insufficiently stable target, and therefore arbitrary adaptations to it cannot evolve (Chater et al., 2009). Mathematical analysis shows that only the smallest stable learning biases need to evolve (Kirby et al., 2007; Smith, 2011; Thompson et al., 2012) because once a learning bias is in place cultural evolution will tend to amplify the effect of the bias, therefore masking the distinction between strong and weak biases, and thus eliminating any selective advantage of a stronger bias. If only small learning biases can evolve, it may be that these are too small to detect experimentally.
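The amplification argument can be made concrete with a toy calculation. The sketch below is not the model of any of the cited papers; it is a minimal illustration that assumes two hypothetical languages (A and B), utterances that match the teacher's language except with a small error probability `eps`, and a learner who hears `n` utterances and either adopts the most probable language (`"map"`) or samples a language from its posterior (`"sample"`). All names and parameter values are assumptions chosen for illustration. The function returns the long-run share of language A in a chain of such learners.

```python
from math import comb


def transition(prior_a, eps, n, learner):
    """Return (P(A -> B), P(B -> A)) for one generation of learning."""

    def p_data(k, teacher_speaks_a):
        # Probability that k of the n utterances are of type "a".
        p_a_utt = 1 - eps if teacher_speaks_a else eps
        return comb(n, k) * p_a_utt**k * (1 - p_a_utt) ** (n - k)

    def p_choose_a(k):
        # Posterior probability of language A after hearing k "a" utterances.
        like_a = (1 - eps) ** k * eps ** (n - k)
        like_b = eps**k * (1 - eps) ** (n - k)
        post_a = prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)
        if learner == "map":
            # Deterministic learner: adopt the more probable language.
            return 1.0 if post_a > 0.5 else 0.0
        # Sampling learner: adopt A with probability equal to its posterior.
        return post_a

    a_to_b = sum(p_data(k, True) * (1 - p_choose_a(k)) for k in range(n + 1))
    b_to_a = sum(p_data(k, False) * p_choose_a(k) for k in range(n + 1))
    return a_to_b, b_to_a


def stationary_share_of_a(prior_a, eps=0.1, n=2, learner="map"):
    """Long-run proportion of generations speaking A (two-state Markov chain)."""
    a_to_b, b_to_a = transition(prior_a, eps, n, learner)
    return b_to_a / (a_to_b + b_to_a)


if __name__ == "__main__":
    for prior in (0.6, 0.9):
        print(prior,
              round(stationary_share_of_a(prior, learner="map"), 3),
              round(stationary_share_of_a(prior, learner="sample"), 3))
```

Under these assumptions, a weak bias (prior 0.6) and a strong bias (prior 0.9) lead to the same long-run outcome for the deterministic learner, which is the sense in which cultural transmission can mask the difference between weak and strong biases. For the sampling learner the long-run share simply mirrors the prior, so the result depends on how learners are assumed to choose among hypotheses.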

Nevertheless, larger adaptations to culturally changing language can evolve through co-evolution between language and cognition (e.g., Deacon, 1997). This can happen when cultural evolution pushes the language to become more challenging for the learners (through expanding vocabulary, or through expanding the sound system, for instance). Biological evolution can then make a small adaptation (in the sense mentioned above). This will allow cultural evolution to make the language even more complex than before, and through continuous co-evolution a large adaptation to language can eventually evolve. Candidates for such adaptations include the ability to produce and perceive a large range of signals (de Boer, 2015) or the ability to learn large lexicons (de Boer, 2014). Such traits are by necessity continuously variable, whereas in general traits that are considered by linguists are discrete in nature, e.g., the ability to use recursion (Bolhuis et al., 2014), or the universals considered by Evans and Levinson (2009).

# **Experimental Investigation**

What evidence exists for adaptations dealing with combinatorial structure? The fact that languages can be analyzed as having combinatorial structure does not necessarily mean that this structure is also represented in the brain (Zuidema and de Boer, 2009). However, evidence from, for instance, speech errors (Meyer, 1992), treatment of loanwords (e.g., Vendelin and Peperkamp, 2006) or poetry (Maddieson, 2008) indicates that speakers are aware of the building blocks, even if these building blocks do not necessarily correspond to phonemes. Moreover, evidence from acquisition indicates that infants learn the building blocks and the structure of their language from a very young age, both in production of intonation (Mampe et al., 2009) and phonemes (e.g., Kuhl and Meltzoff, 1996) and in perception of phonemes (Maye et al., 2002; Kuhl, 2004). This indicates that there must be cognitive mechanisms that help in learning building blocks of speech, whereas there is no evidence that these mechanisms are present in other apes. On the other hand, evidence from the emerging sign languages ABSL (Sandler et al., 2011) and CTSL (Caselli et al., 2014) indicates that combinatorial structure emerges gradually in new human languages, and that full languages can exist without much combinatorial structure.

One way to operationalize the search for traits that have undergone selection related to language is to look for brain regions that react preferentially to language. There is good evidence that there are regions specialized for processing speech and phonetic cues (e.g., Leaver and Rauschecker, 2010) and that there are even regions specialized for phonotactics (Raettig and Kotz, 2008; Rossi et al., 2011). However, there is also evidence that the precise processing of phonotactic structure is influenced by literacy (Castro-Caldas et al., 1998). Incidentally, Vendelin and Peperkamp (2006) also found that orthography influences how loanwords are treated. This raises the question of how much of the observed specialization and behavior is due to acquisition, and how much of it is indicative of evolutionary selection due to speech. DNA studies may provide insight, but although our knowledge is expanding rapidly (Dediu, 2015), we are still far from being able to relate genetic evidence with speech, the vocal tract or the brain.

<sup>1</sup>For example, whereas the Black-browed Reed Warbler (*Acrocephalus bistrigiceps*) mimics 2–5 species, the closely related Marsh Warbler (*Acrocephalus palustris*) mimics more than 100 (Hamao and Eda-Fujiwara, 2004). Nevertheless their cytochrome b (mtDNA) distance is only 10–11% (Leisler et al., 1997) whereas that between humans and chimpanzees is 15–16% (Castresana, 2001).

Another way to operationalize the search for language-related selection is to look for behaviors that differ between linguistic and non-linguistic signals. For this one needs to conduct experiments using artificial signals or to have participants devise their own signals. This makes it possible to include the degree of resemblance to language as a condition in the experiments and therefore to detect specialization for language. In the context of language evolution, the first such experiments were done by Galantucci (2005), but these were mostly meant to investigate emergence of signals and their structure. Since then many experiments have been done to investigate language evolution in a laboratory setting (for reviews: Galantucci, 2009; Scott-Phillips and Kirby, 2010; Kirby et al., 2014). However, few of these experiments look at speech and signals, and those that do mainly focus on cultural processes of emergence of structure (e.g., Garrod et al., 2010; Roberts et al., 2015).

Verhoef et al. (2014), however, have compared two different accounts of the emergence of combinatorial structure, one based on the communication-relevant needs for distinct signals, the other on cognitive principles of processing efficiency, and found that the way human participants create structure can best be explained by the latter account. Nonetheless, this study could not determine whether these cognitive processes were language-specific or not.

A study by van der Ham and de Boer (2015a) has looked at behavior of human participants in a distributional learning task of language-like stimuli and has explicitly tested whether reproduction behavior was as predicted by a domain-general learning mechanism or by a learning mechanism specialized for language. It was found that in this case, behavior could be explained by the domain-general mechanism. Another way to detect cognitive mechanisms that have undergone selective pressure related to speech is to look for mechanisms that behave differently for speech-like stimuli than for less speech-like stimuli. An experiment along these lines has compared category learning and reproduction in the acoustic, visual and tactile modalities (van der Ham and de Boer, 2015b) and found that humans are somewhat better in the tactile and acoustic modalities, but that there is no indication of strong specialization. Results so far therefore do not show unambiguous evidence that points to selective pressure related to language.

# **Discussion**

Although so far no cognitive traits that have undergone selective pressure related to speech have been identified, and although identifying the selective pressures that have shaped any trait is very difficult, the evolutionary perspective can help structure research into the cognition of speech and language. After all, the intuitive notion of what cognitive traits are linguistically interesting corresponds to what traits have evolved under selective pressure for language. In addition, the evolutionary perspective may help determine what kind of traits can have evolved, and those may be rather different from the kind of traits linguists have traditionally focused on: less discrete and formal, more continuous and related to the function of language. Finally, the interdisciplinary approach that the evolutionary perspective entails has led to a number of promising new tools to investigate cognitive adaptations related to language.

# **Acknowledgment**

The research was funded by ERC starting grant 283435, ABACUS.




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 de Boer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Why language really is not a communication system: a cognitive view of language evolution

#### *Anne C. Reboul\**

*CNRS UMR 5304, Laboratory on Language, Brain and Cognition (L2C2), Institute for Cognitive Sciences-Marc Jeannerod, Bron, France*

While most evolutionary scenarios for language see it as a communication system with consequences on the language-ready brain, there are major difficulties for such a view. First, language has a core combination of features (semanticity, discrete infinity, and decoupling) that makes it unique among communication systems and that raises deep problems for the view that it evolved for communication. Second, extant models of communication systems, namely the code model of communication (Millikan, 2005) and the ostensive model of communication (Scott-Phillips, 2015), cannot account for language evolution. I propose an alternative view, according to which language first evolved as a cognitive tool, following Fodor's (1975, 2008) Language of Thought Hypothesis, and was then exapted (externalized) for communication. On this view, a language-ready brain is a brain profoundly reorganized in terms of connectivity, allowing the human conceptual system to emerge, triggering the emergence of syntax. Language as used in communication inherited its core combination of features from the Language of Thought.

Keywords: language evolution, language-ready brain, communication, code model, ostensive model, Language of Thought, globularity

#### Introduction

Language evolution has been mainly approached through the evolutionary notion of *function*. As language is routinely used in human communication, the natural assumption is that the function of language is communication. As a consequence, theories of language evolution have centered on scenarios that try to explain the kinds of selection pressures that could have triggered the emergence of this rather remarkable communication system. Inevitably, given that communication is the epitome of a social phenomenon, these scenarios have been "social"<sup>1</sup>. However, seeing language as a system of communication and proposing that it has evolved *as* a system of communication (i.e., seeing language as being a system of communication in the *strong* sense) rather than *being merely used in communication* (i.e., seeing it as being a system of communication in the *weak* sense) raises a host of difficult issues which have to do with the very nature of language. The question of whether language is or is not a communication system in the strong sense that it evolved *for* communication is far from anecdotal as its answer strongly constrains what a language-ready brain would comprise in terms of necessary preliminary cognitive abilities.

#### *Edited by:*

*Cedric Boeckx, Catalan Institute for Research and Advanced Studies/Universitat de Barcelona, Spain*

#### *Reviewed by:*

*Norbert Hornstein, University of Maryland, USA David Adger, Queen Mary University of London, UK*

#### *\*Correspondence:*

*Anne C. Reboul, CNRS UMR 5304, Laboratory on Language, Brain and Cognition (L2C2), Institute for Cognitive Sciences-Marc Jeannerod, 67 boulevard Pinel, 69675 Bron cedex, France reboul@isc.cnrs.fr*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 28 July 2015 Accepted: 08 September 2015 Published: 24 September 2015*

#### *Citation:*

*Reboul AC (2015) Why language really is not a communication system: a cognitive view of language evolution. Front. Psychol. 6:1434. doi: 10.3389/fpsyg.2015.01434*

<sup>1</sup>Számado and Szathmáry (2006) list eleven different scenarios (gossip, grooming, group bonding/ritual, hunting, language as a mental tool, pair bonding, motherese, sexual selection, song, status for information, and tool making), only one of which—language as a mental tool—is clearly and unquestionably non-social.

That language is eccentric among animal communication systems cannot be seriously disputed. It has a core combination of features—semanticity, discrete infinity, and decoupling—that is found nowhere else in nature to our present knowledge (Chomsky, 1966/2009). Relative to the evolution of language as a system of communication, this core combination of features raises two major difficulties:


As Számado and Szathmáry (2006) have noted, none of the extant scenarios can satisfactorily answer these two questions<sup>2</sup>. Given these possibly intractable difficulties, it makes sense to reexamine the evidence in favor of the conclusion that language is a communication system and that it has evolved *as* a communication system.

Thus, the main goal of this paper is to assess the notion that language is a communication system in the strong sense. Here, a few words (for a complete presentation, see Animal Communication Systems) about what a communication system is are in order. The traditional view of communication systems is the code model<sup>3</sup>: the communicator encodes the message she wants to communicate, this encoded message is relayed along a channel to the receiver who decodes it and recovers the intended message. Though it is generally considered that this applies fairly well to animal communication systems<sup>4</sup> (see Animal Communication Systems), there are serious doubts that it can apply to the use of language in human communication. This is because, as has been abundantly argued (Sperber and Wilson, 1995; Carston, 2002; Recanati, 2004, 2010, following in the steps of Grice, 1989), on the whole, the semantic meaning of an utterance (the *sentence meaning*) fails to correspond exactly to what the speaker intended to communicate (the *speaker's meaning*). In other words, encoding–decoding processes are not sufficient to recover the message. While this *contextualist* position is by now largely acknowledged in both philosophy of language and linguistics, it did not penetrate the field of language evolution until very recently, when Scott-Phillips (2015) proposed a new view of language evolution. According to him, there are two main roads to the evolution of a system of communication:


In other words, while, on the code model view of language, it is *continuous with all other animal communication systems*, on the ostensive view of language, it is *discontinuous with all other animal communication systems*.

Obviously, arguments against the code model view of language as a communication system may well be inoperative against the ostensive view of language as a communication system. Thus, both theoretical frameworks will have to be examined, and we will begin with the most popular one, i.e., the code model.

# Language as a Communication System under the Code Model

As we have just seen, under the code model of communication, language is continuous with animal communication systems, and here it is useful to make a brief incursion into animal communication systems.

#### Animal Communication Systems

Though whole books have been written on the subject of the evolution of communication in animals (e.g., Hauser, 1996; Oller and Griebel, 2004), their authors have often been content to use the word without giving it a precise definition. They rely on its vernacular meaning and on a rather vague notion of *information transfer*<sup>6</sup>, waving at Shannon and Weaver's (1949) quantitative definition of *information*. As pointed out by Owren et al. (2010), this is usually accompanied by the idea that this transfer of information is based on an *encoding* (on the signaler's side) and a *decoding* (on the receiver's side) process<sup>7</sup>. It is this view of communication as information transfer that makes honesty central to the evolution of communication systems.

<sup>2</sup>Basically, the two questions above subsume the four questions proposed in Számado and Szathmáry (2006).

<sup>3</sup>Generally considered to have its origins in Shannon and Weaver's (1949) theory of information.

<sup>4</sup>Though for a dissenting view, see Owren et al. (2010) and Section "Animal Communication Systems".

<sup>5</sup>I leave a more complete presentation of an ostensive system of communication to Section "Language as a Communication System under the Ostensive Model" below, where Scott-Phillips' proposal will be discussed.

<sup>6</sup>Cheney and Seyfarth's (1990) classical analysis of vervet monkeys' alarm calls is an example of that strategy.

<sup>7</sup>This is where the investigation into animal communication systems meets with the code model of communication.

Another line of thought was opened by Krebs and Dawkins (1984), who claim that the root of the evolution of animal communication lies in manipulation, linking the sending of a signal (the unit of animal communication systems) to a response (by the recipient) advantageous to the signaler. This view of communication was clearly influential, as shown by Maynard Smith and Harper's (2003, p. 3) definition of a signal:

"We define a 'signal' as any act or structure which alters the behaviour of other organisms, which evolved because of that effect, and which is effective because the receiver's response has also evolved."

In other words, the evolution of communication is not the evolution of the signal in isolation, but rather of signal–response pairs. This might be thought to go all the way toward a manipulation account of communication, but this is not the case. In the comments that follow, Maynard Smith and Harper outline some consequences of their definition that put them squarely on the information transfer side. First, if the signal affects the receiver's behavior, it must do so in a way that is not, on the whole, detrimental to the receiver (otherwise selection would rapidly eliminate receptivity to it). Second, this means, on Maynard Smith and Harper's view, that the signal must *reliably and honestly* (truthfully) convey information about the environment or about the signaler's present state and/or future behavior. In other words, the signal evolved for its behavior-altering effects, but that does not mean that it does not carry information.

A stronger challenge to the information-based studies of communication has developed, however, through a series of papers by Owren and colleagues (see Owren et al., 2010, for a synthetic presentation), initially inspired by Krebs and Dawkins (1984), but presenting an alternative, rather than a mere addition, to the information-based view. Owren et al.'s (2010) most convincing examples are mating signals. While mating signals have generally been analyzed in the information-based literature as transmitting information to females about males' genetic worth<sup>8</sup>, Owren et al. (2010) propose an alternative view. Mating signals, whether visual, auditory, etc., are (in general) not informing the receiver of the signaler's genetic quality, nor is it their function to do so. Rather, mating signals exploit pre-existing sensory preferences of females. These preferences usually have evolved in entirely different contexts (e.g., foraging for food), but once evolved they are ripe for exploitation. Thus, mating signals directly impinge on females' sensory systems, and did not evolve for the purpose of transmitting (reliable) information about the signaler's genetic value. It is important to note that Owren et al. (2010) do not exclude the possibility that mating signals may occasionally carry (reliable) information about the signaler's genetic worth. Rather, if they do so, this is incidental. Their main function, which explains why they evolved, is not to signal fitness, but to attract females. This, basically, is Owren et al.'s (2010) alternative view of animal communication: its main function is not to transfer information between organisms, but to induce behaviors in the receiver that are advantageous to the communicator. Eschewing the negatively loaded term *manipulation*, they propose an *influence*-based view of animal communication.

This is clearly not the place to settle that debate (the interested reader is directed to the papers in Stegmann, 2013), but there is one thing worth pointing out. While Owren et al. (2010) rightly deplore the detrimental effect on the animal communication literature of the (language-inspired) information-based approach, one may equally deplore the effects on the language evolution literature of an approach based on animal communication<sup>9</sup>, however tainted by (mis-)conceptions of human language.

One of the best examples of a view of language evolution that sees language as continuous with animal communication systems, in keeping with the code model, is Millikan's account of language and its evolution. I will mainly discuss her most recent book centering on language (Millikan, 2005).

#### Millikan's Account

Millikan's approach to language belongs to the presently influential philosophical program aiming at "naturalizing" the mental<sup>10</sup>, concerning both mental representations and their communicative counterparts. In a move that has become classical in such programs, she aims at establishing a continuity between *natural* signs or meaning and *non-natural* signs or meaning.

The distinction between some form of natural signification (based on correlations that are, more often than not, grounded in causality) and linguistic signification is far from new, but it was given a paramount importance in Grice's (1989) classical analysis of meaning, which is also Millikan's main target. Grice's strategy was to look at two uses of the verb *to mean*. Thus, he began by comparing the following examples:


While in the first example, the verb *to mean* is used in its *natural* sense, in the second, it is used in its *non-natural* sense. Grice noted that these two uses of the verb are distinguished by the implications that one is entitled to draw from each of them. While natural meaning is *factive*, in the sense that *x means (meant) p* entails *p*, non-natural meaning (henceforth *meaning<sub>nn</sub>*) is *non-factive* in the sense that *x means<sub>nn</sub> p* does not entail *p*. On the other hand, meaning<sub>nn</sub> is *under voluntary control* in the sense that from *x means<sub>nn</sub> p* one can deduce that *Someone meant<sub>nn</sub> p by x*. However, natural meaning is *not under voluntary control* (it does not license the corresponding inference). So, in short, natural meaning is factive and not under voluntary control while meaning<sub>nn</sub> is non-factive and under voluntary control.

Grice (1989, p. 219) went further, however, and added the following definition of meaning<sub>nn</sub>:

"A meant<sub>nn</sub> something by *x*" is roughly equivalent to "A intended the utterance of *x* to produce some effect in an audience by means of the recognition of that intention."

<sup>8</sup>Given their greater biological investment in reproduction in the vertebrate and even more in the mammal species, females are generally the "choosy" sex.

<sup>9</sup>Interestingly, in their paper, Owren et al. (2010) strongly suggest that language and animal communication are entirely disjoint phenomena, a view with which I concur.

<sup>10</sup>Initiated by Dretske (1981).

In other words, meaning<sub>nn</sub> is not only under voluntary control: additionally, the speaker has a double intention:


Grice was at pains to emphasize that the primary intention is crucial to the definition: cases where the audience recognizes the meaning without recognizing the primary intention are not cases of meaning<sub>nn</sub>. Additionally, Grice insisted that, though meaning<sub>nn</sub> could be conventional, it did not have to be conventional. In other words, on Grice's view, normal linguistic communication is not a matter of encoding and decoding as such, but rather of recognizing the speaker's primary intention.

Grice's account of meaning<sub>nn</sub> has been Millikan's target throughout her philosophical career (the first instance was Millikan, 1984). Her goal has been to show that the psychological side of Gricean meaning<sub>nn</sub> is not necessary, that linguistic communication is indeed a matter of encoding–decoding and that meaning is conventional in an utterly non-psychological sense<sup>11</sup>. In other words, what distinguishes natural signs or meaning from non-natural signs or meaning is only factivity, not volition: natural signs are factive, non-natural signs are not (the signaler may be mistaken or deceptive). Thus, Millikan's distinction between natural and non-natural meaning is wholly non-psychological.

Millikan's account of meaning centrally uses the notion of *function*, explicitly borrowed from evolutionary biology. Millikan (1984) introduces the notion of *proper function*, which is fundamentally *historical* in the following sense: it does not refer to what an entity (be it an organ or a behavior) actually does, but rather to why that entity not only exists now, but has persisted (possibly with modifications) since its emergence, in other words, why it has been selected for. So whatever the state of your heart, and regardless of whether it actually reliably pumps blood throughout your body, its proper function is to pump blood, because this is the reason why hearts have evolved and been preserved (and improved) throughout vertebrate history. Note that proper functions are not limited to biological organisms: they can also characterize artifacts of all kinds, from institutions to tools. In other words, they can be the product of either biological or cultural evolution. The essential thing is that the entity considered has a history which explains why it persisted throughout time by the function it normally performs.

On Millikan's view, language is a communication system, on a par with the other animal communication systems, as far as its evolution is concerned. She shares with Maynard Smith and Harper's (2003) definition of a signal the idea that signals evolve in tandem with responses (indeed, she views language as the solution to coordination problems in humans<sup>12</sup>). Her idea is that the proper function of a signal is to evoke a specific response in the receiver, and that it does so through information transfer<sup>13</sup>. While clearly the notion of information transfer involved applies to natural language as well as to animal communication systems, Millikan acknowledges that linguistic signals and animal signals are different up to a point. This can be seen through her analysis (Millikan, 2004) of vervet alarm calls. In linguistic terms, such calls (e.g., the leopard call) have a double direction of fit: both world-to-signal (i.e., the call reflects the current state of the environment, e.g., the presence of a leopard in it) and signal-to-world (i.e., the signal simultaneously enjoins the recipient to give a specific response, e.g., fleeing to the top of the canopy). Millikan proposes to call such double-directed signals *pushmipullyu representations*. As Millikan (2013) herself concedes, it does not make any sense to "translate" animal signals into language. For instance, the vervet leopard alarm call is in no way equivalent to the complex sentence "There is a leopard here and you must climb to the top of the nearest tree". Though this might reflect fairly faithfully the meaning of the call, it is not a translation, because animal signals are, on the whole, holistic<sup>14</sup>: *the signal means something as a whole, not as a combination of its parts*. Indeed, as Millikan acknowledges, it is only with language that the two directions of fit (*indication* = world-to-signal and *direction* = signal-to-world) become differentiated.

Thus, Millikan acknowledges that animal signals are bidirectional, but linguistic utterances are not. I will now turn to a criticism of Millikan's position, using two kinds of arguments: general arguments regarding the very notion of a linguistic signal in signal-information/response pairings, and pragmatic arguments regarding signal-information/response pairings.

#### Some Difficulties with Millikan's Position

The very structure of Millikan's theory raises major difficulties and those difficulties are all linked, in one way or another, to the essential historicity of Millikan's notion of signal, inherited from her notion of proper function. Basically, for pairings such as those that Millikan proposes as the origin of signals to occur, the signal-type, the information-type, and the response-type each have to be perennial and the repeated couplings between signals of that type, information of that type and responses of that type also have to be perennial.

<sup>11</sup>Which is where she parts ways with Lewis's (1969/2002) account of convention.

<sup>12</sup>Again, though she borrows the term *coordination problem* from Lewis (1969/2002), there is very little left of Lewis' account of convention in Millikan's theory. I will not discuss this here, as it is hardly central to my main purpose.

<sup>13</sup>Subject to the same strictures as mentioned by Maynard Smith and Harper (2003): while the response must be advantageous for the sender, it must not be generally detrimental to the receiver, otherwise selection would eliminate receptivity to it. This led Maynard Smith and Harper to the (correct) conclusion that, on such an account, the evolution of signals is bound by honesty constraints.

<sup>14</sup>There is evidence that some monkey species occasionally combine two calls to produce a modified meaning (Zuberbühler, 2002). These fairly limited phenomena are still poorly understood (for an intriguing pragmatic account, see Schlenker et al., 2014), but they hardly challenge the huge difference in compositionality between animal communication systems and language (the only example of fairly sophisticated combinatoriality is birdsong, which, however, is not semantically compositional).

This raises difficulties for the three main components of the pairings:


I will examine them one after the other.

#### Signals

A first and major question is what a linguistic signal should be. Under Millikan's broad definition of a signal, something is a signal if its proper function is to trigger a specific response in an audience, through information transfer.

Signals have to be units of communication, i.e., they have to transfer the information/produce the response in their own right. Basically, this means that they have not only to be semantic units, but also have to be communicative units (though the two normally coincide in holistic animal signals, as we shall see, they do not in language). Traditionally, it has been considered that language is *doubly articulated*<sup>15</sup>: on a rough and ready description, at the phonological level, phonemes are combined into meaningful words; at the syntactic level, words are combined into meaningful sentences. Clearly, phonemes, being semantically vacant, are not semantic units, and hence not signals. So, the first candidates for signals are words. On the face of it, they seem to be good candidates: they are perennial enough both in their forms and in their meanings<sup>16</sup>. The main problem with words is that, while they are semantic units, they are not communicative units. Though shouting "Fire!" may be a perfectly well-formed communicative act in some circumstances, most linguistic communicative acts do not correspond to isolated words. This leaves us with the sentence, understood as an utterance-type.

There is, however, a major problem with the notion that sentences are linguistic signals in the required sense. Couched as an argument:

**Lack of History Argument (Syntactic)**: Given linguistic creativity, sentences are fairly often one-off, that is, they lack the history necessary to the establishment (through signal-information pairing due to repeated correlations of signal and information) of a proper function.

To show why this is the case, I will now examine (and reject) an objection to the notion that language is characterized by linguistic creativity. This objection targets one of the core properties of language, i.e., discrete infinity.

It is to the effect that humans, being finite, cannot be said to produce an infinity of different sentences. This *Finitude* Argument has been formulated as follows by Li and Hombert (2002, p. 196): "Theoretically the number of possible sentences in English is indefinitely large because theoretically 'the longest English sentence' does not exist. If one chooses to describe English syntax or certain aspect of English syntax in terms of rewriting rules, one can claim that a recursive function is needed. However, one never conjoins or embeds an indefinitely large number of sentences in either spoken or written language. 'Indefinitely large number of sentences' or 'infinitely long sentences' are theoretical properties." This seems to rest on a profound misunderstanding of both discrete infinity and recursion. To see it, an analogy with another system providing discrete infinity, i.e., mathematics, is useful. Saying that, because we do not (and could not, as finite beings) produce infinitely long sentences, discrete infinity and recursion are not relevant features of language is on a par with saying that, because we do not (and could not) count to infinity, discrete infinity and recursion are not relevant features of mathematics. The argument is, to say the least, mystifying. Arguably, recursion is needed to count up to any number greater than one, just as it is needed to produce any sentence with an embedding. Once you have the relevant recursive ability, you have the theoretical possibility of counting to infinity or of producing infinitely long sentences, and whether you do it or not is utterly irrelevant. Discrete infinity is a structural, not a behavioral property. Thus, human finitude is no argument against linguistic creativity.
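The structural point can be illustrated with a toy sketch. The function name `embed`, the base clause and the embedding frame below are hypothetical choices made for illustration only; they are not drawn from any of the works cited. The sketch assumes nothing more than a single recursive rule, and shows that such a rule already determines a distinct sentence for every depth of embedding, whether or not anyone ever produces the deeper ones.

```python
def embed(depth: int) -> str:
    """Return a sentence with `depth` levels of clausal embedding.

    A single recursive rule (S -> 'Mary thinks that' S) plus one base
    clause is enough to define an unbounded set of distinct sentences.
    """
    if depth == 0:
        return "it is raining"  # base clause (illustrative assumption)
    return "Mary thinks that " + embed(depth - 1)


# Any finite speaker only ever produces a finite sample of the set...
sample = [embed(d) for d in range(4)]
for sentence in sample:
    print(sentence)

# ...but the rule itself places no upper bound on depth: every depth
# yields a sentence different from all shallower ones.
assert len({embed(d) for d in range(50)}) == 50
```

The finite sample printed here is a fact about behavior; the absence of an upper bound is a fact about the rule, which is the sense in which discrete infinity is structural.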

More crucially, the argument is no answer to our worry regarding the absence of history for sentences. Even though each human, being a finite organism, cannot produce an infinity of different sentences with different contents, linguistic creativity *as a structural property of language* allows each human to produce sentences different from all those produced before, with contents different from all of those produced before. This being so, the fact that sentences may not have the necessary history to function as signals in pairs of signal-information/response remains a central problem. In sum, human finiteness is not an argument against linguistic creativity and is no answer to the absence of history for sentences.

This, then, is the first major problem for Millikan's theory and it is, obviously, a syntactic argument. There are, however, further objections to her proposal and we will now turn to information.

#### Information

<sup>15</sup>Anderson (2013) rightly points out that this is not, strictly speaking, correct, given that there is a third articulation at the morphological level. I will ignore this complication here.

<sup>16</sup>Obviously, words change both in acoustic form and meaning with time. But while this may be a relatively quick process (taking at most decades rather than centuries or millennia), words still are stable enough to qualify as signals.

<sup>17</sup>Setting aside both the evolutionary side of Millikan's proposal and the pairings between signal and response, it is clear that Millikan's view of the pairings between signals and information has much in common with contemporary constructivist approaches to language acquisition in linguistics (Goldberg, 2006). I will not discuss constructivism as such here for reasons of space. Note however that, *mutatis mutandis*, the pragmatic arguments against Millikan's account also apply to constructivism.

Regarding information, Millikan has concentrated on two main pragmatic phenomena, illocutionary force (Millikan, 1984, 2004, 2005) and implicatures of the scalar variety (Millikan, 2005). Beginning with the former, from 1984 on, her argument has been mainly based on the pairing between sentence forms (affirmative, interrogative, imperative, etc.) and the corresponding speech acts, covering both information and response. Leaving responses aside for further discussion later on, let us concentrate on information<sup>17</sup>. The "information" pairing is between sentence form and illocutionary act (or illocutionary force) and, as Millikan herself acknowledges (following Strawson, 1964), fairly often, an utterance can be linked to widely different illocutionary forces. Consider (3):

(3) Peter will come tomorrow.

Depending on the circumstances, this can indeed be interpreted as a promise, a menace, a warning or a prediction. Millikan proposes to get around this problem through a multiplicity of (proper) functions. As said above, the proper function of an entity is not what it actually does but why it has persisted through time. And even if it is not *always* reliably associated with that function, it is sufficient that it is associated with it *often enough*. Thus, the existence of occasional functions different from the proper function of a sentence is not a problem. Here, it is interesting to look at Millikan's view of language change (which concerns the emergence of implicature readings). According to Millikan, if a linguistic form with a given proper function becomes associated often enough with another different function, this second function will become its new or additional proper function. In other words, the proper function of a linguistic item depends on the frequency with which this item is associated with this function and a linguistic signal can have several functions, proper or otherwise.

Let us look at an example:

(4) The pianist played *some* Mozart sonatas.

Notoriously, this utterance can be given two interpretations:


According to Millikan, the initial proper function of (4) is to communicate (5). However, (4) is sometimes used to communicate (6) and, in time, this gives rise to a new function for (4). In addition to (5), (4) also has the function of communicating (6).

There is something mysterious about the process, however. How is it, if the proper function of (4) is to communicate (5), that, on the first occasion of its being used to communicate (6), the hearer will recognize that this is the case? Here, we turn to a first pragmatic argument:

**First Occasion Argument**: If meaning is established through repeated pairings, for such a pairing to take off, the meaning of a linguistic signal (or construction) has to be established on the occasion of its first production. A pragmatic inference will more often than not be necessary.

Note that the same argument applies to (3) above. Suppose that the initial function of (3) is to convey the illocutionary force of prediction. How does (3) acquire the additional functions of conveying the illocutionary forces of warning, menace or promise?

A final problem to do with first occasion arises for those signals that are associated with a given speaker meaning on a single occasion (one-off), as is clearly the case for some creative metaphors, such as<sup>18</sup>:


In such cases, there is no way to recover the intended meaning through semantic compositionality, and pragmatic inferences to the speaker's intentions are obviously necessary.

This is not the only difficulty, however. If a single linguistic signal can have several (proper) functions, this approach leads to widespread ambiguity in linguistic signals. And this suggests a second pragmatic argument:

**Ambiguity Argument**<sup>19</sup>: This approach supposes widespread ambiguity in linguistic signals. The resolution of that ambiguity will have to be done through pragmatic inferences.

Note, however, that what is central to Millikan's view is not the absence of context-based pragmatic inference *per se*, but rather the absence of the *Gricean* kind of pragmatic inferences. Specifically what this means is that Millikan does not reject contextualism as such but that she rejects any brand of contextualism in which either the context includes psychological representations (e.g., speaker's intentions or beliefs) or the interpretation process leads to psychological representations (e.g., *By X, the speaker meant Y*).

<sup>18</sup>Strangely, these are given by Goldberg (2006, p. 6) as examples of constructions, that is, as examples of repeated pairings between forms and functions.

<sup>19</sup>This argument was first opposed to Millikan's view (though not under that name) by Origgi and Sperber (2000).

Here, it is interesting to go back to Millikan's analysis of natural signs. As she notes, while natural signs do not have proper functions, they are nevertheless paired with types of information: smoke and the presence of a fire, clouds and future rain, etc. However, while natural signs are factive, they are not necessarily paired bi-univocally with the information they convey. Sometimes, two different natural signs with identical forms will be associated with two different pieces of information depending on which environment each of them occurs in. Let us take an example. It so happens that identical tracks can be left by, e.g., a small bird and a small rodent. However, in wood A, there are only birds and no rodents, while in wood B, there are only rodents and no birds. Thus, natural signs with the same form will be read (factively) as corresponding to birds in wood A and to rodents in wood B. In other words, even natural signs can be context-dependent relative to the information they convey. If this is the case, why not apply the same solution (context-dependency) to sentences? Sentences would always be associated with context types, and utterance types would correspond not to sentences, but to couples of sentences and context-types. It is these composite utterance types that would be paired with proper functions, rather than sentences in isolation. And, obviously, such composite utterance types would make perfect sense as signals in signal-information/response pairs. Note that on such a view (which reflects Millikan's; see Millikan, 2004, Chap. 10), nothing like a Gricean "psychological" account is needed. The types of context concerned do not include any representation of the speaker's intentions or beliefs, or indeed, of anyone's mental states.

Let us now come back to example (3) above. As said before, a sentence such as *Peter will come tomorrow* may be understood as a promise, a menace, a warning or a prediction. Can we make sense of this in terms of utterance type, i.e., in terms of couples of sentences and (non-psychological) context types? In this specific case, it seems rather difficult to distinguish between these different illocutionary forces without appealing to mental states in both the speaker and the hearer. Presumably, leaving aside the fairly neutral speech act of prediction, what illocutionary force such an utterance will have will very much depend, not only on the speaker's intention but also on what she knows, or believes she knows, about her hearer's mental attitudes to Peter's coming. The same reasoning applies to (4): whether it will be interpreted as (5) or (6) will depend at least in part on the intention the hearer attributes to the speaker.

In other words, the requirement that the context be non-psychological seems a gratuitous complication as far as linguistic communication is concerned, as distinguishing between different illocutionary forces will, more often than not, depend on the representation of the relevant attitudes in the speaker, the hearer or both. There is yet another worry, which, again, goes back to the first occasion argument. Given that utterance types are themselves composite, being couples of sentences and context types, one can also ask how such couples come into existence, leading to a higher order first occasion problem. This problem is especially acute for linguistic communication, given decoupling, which allows speakers to speak of absent or non-existent objects, introducing a further difficulty as both the signal and its referent have to be present for any association process to operate.

Hence, neither the assumption of widespread ambiguity for sentences, nor the assumption of composite utterance types, leading to semantic inflation, can work given psychological parsimony. Basically, exchanging *semantic parsimony* + *psychological inflation*, as proposed by Grice, for *semantic inflation* + *psychological parsimony*, as proposed by Millikan, is not tenable. Whether one goes for semantic parsimony or for semantic inflation, one cannot escape psychological inflation. Thus, it does not seem that composite utterance types can play the role of signals in signal-information/response pairs either.

#### Responses

Let me now come to my third objection to Millikan, relative to the response type associated with the signal. Going back to Millikan's central example, speech acts, the "information" pairing is between sentence form and illocutionary act, but the "response" pairing is between sentence form and perlocutionary act. Here, it is important to see why Millikan shares with Maynard Smith and Harper the view that it is not signals that have evolved, but rather signal–response pairs. This makes sense on an evolutionary view (be it biological or cultural) because, while conveying information does not as such make sense in evolutionary terms (information is a precious commodity, so why share it?), triggering responses in others, as long as these responses are advantageous to the signaler, makes perfect sense. So, on a view such as Millikan's, according to which language is a communication system, it seems reasonable to see linguistic signals (whatever they are) as paired with responses rather than only with information.

Millikan's main example is assertion, which, on the response side, is, according to her, paired with receiver's belief. Obviously, not all assertions lead to receiver's belief, but, as indicated above, for the pairing between assertion and receiver's belief to be established (or, in other words, for receiver's belief to be the proper function of assertion), it is sufficient that assertion be paired with belief often enough. Here, I want to discuss the appropriateness of belief as a receiver's response in an evolutionary perspective.

On the face of it, it would seem that any receiver's response in signal–response pairs should be detectable if the pairing is to have evolved<sup>20</sup>:

**Detectability of Response Argument**: for signal-response pairings to get off the ground, both the signal and the response must be detectable (respectively, by the receiver and by the signaler).

The problem with belief is not only that it is a mental state (and as such less easy to detect than a behavior or an action); it is in addition especially difficult to detect among mental states. While intentions are fairly often obvious from bodily preparation for action<sup>21</sup>, and emotions or feelings are detectable through facial expressions, belief seems to be wholly internal and not linked to any specific exteriorization<sup>22</sup>. One could argue of course that, given a belief with a certain content in her hearer, the speaker can detect its presence through his behavior interpreted *via* Theory of Mind, i.e., *via* the attribution of mental states. Not only does this seem uncertain (see below), but it is also unclear whether Millikan would agree with such a development, which is tantamount to re-introducing a rather Gricean (psychological) factor in the evolution of communication. Thus, belief appears to be a fairly strange candidate for a response in signal–response pairings.

<sup>20</sup>No association is possible otherwise.

<sup>21</sup>It seems indeed to be the mental state that most animals or young children detect fairly easily, though perhaps in less mental terms such as *goal* or *purpose*.

<sup>22</sup>This may be because belief is phenomenologically vacant: there is nothing it is like to believe something outside of religious or quasi-religious (e.g., political, esthetic, and ethical) belief.

This, however, is only a first objection. A second, and potentially more decisive objection is that responses, on such a view, have to be advantageous to the signaler (or, in the case of language, to the speaker). But belief as such is not advantageous to the speaker. Rather it is the behavioral consequences of the receiver's belief (his deciding "to act on his belief", so to speak) that may be advantageous to her. But how exactly a hearer will act on his belief will depend on a host of other things, including his other beliefs and his desires, which strongly underdetermines the behavioral consequences of his (speaker induced) belief. Let us suppose, for instance, that John wants to go, while Mary wants him to stay. Mary could say:

(9) It is raining.

While the belief that it is raining might indeed induce John to stay, it might equally well make him take his umbrella, phone for a taxi or do a number of other things, none of which is staying, and none of which is what Mary wishes him to do. In other words, even in such simple cases, hearer's behavioral responses are far from being obvious and there is certainly no way to predict them with any degree of certainty. And linguistic communication is of course far from being limited to such simple circumstances. In other words:

**Underdetermination of Behavioral Response Argument**: In humans at least, the automaticity or even the frequency of a given response to a given linguistic signal is largely underdetermined, undermining the pairing of signals and responses.

So Millikan's choice of example, associating a linguistic signal (assertion) with a response that is a mental state (belief), can be explained through the fact that human action is not so automatic that it can be reliably associated with signals, barring imperatives in such strongly authoritative circumstances that the hearer has no choice but to comply. This, however, has two fairly negative consequences for her view of the evolution of linguistic communication: first, mental states are not the most detectable of responses, which raises a major difficulty for a signal-response pairing account such as hers (Detectability of Response Argument); second, mental states are additionally only indirectly advantageous to the speaker: they can only be advantageous to her if they lead her hearer to a behavior that she wants him to perform, but this is uncertain in most cases (Underdetermination of Behavioral Response Argument).

Thus, Millikan's endeavor to "de-psychologize" language and to place it among all other animal communication systems fails. We will now turn to Scott-Phillips's (2015) very different view of language as a communication system.

# Language as a Communication System under the Ostensive Model

*Ostensive communication*, a notion that Scott-Phillips borrows from Sperber and Wilson (1995), corresponds to the view that human communication is intimately linked to the crucial notion of *relevance*. Relevance is a minimax notion and the communicative version of relevance goes as follows:

**Relevance**: An utterance is relevant to the extent that:


The cognitive effects produced by the interpretation of an utterance can be of three sorts: strengthening or weakening the conviction with which previous assumptions are entertained; deleting a previous assumption that is contradicted by the new information obtained (depending on the confidence the hearer places in the speaker); producing new assumptions. The *Communicative Principle of Relevance*<sup>23</sup> says:

Every utterance carries the guarantee of its own optimal relevance.

*Optimal relevance* is achieved when the cognitive effects of an utterance balance its interpretive costs. The reason why utterances carry the guarantee of their own optimal relevance is that any utterance is an instance of *ostensive-inferential communication*. A behavior is an act of ostensive-inferential communication inasmuch as it makes it obvious to the receiver that the signaler has produced it with a communicative intention (this is the *ostension* part) and it is produced as evidence to be used in the inferential process through which the receiver will recover the signaler's informative intention, i.e., the content she intended to communicate (this is the *inference* part). Thus, an act of ostensive communication guarantees that it is worthwhile for the hearer to pay attention to it. Hence, by putting ostensive-inferential communication at the heart, not only of linguistic communication, but, as we shall now see, of language evolution, Scott-Phillips is taking a position which is the opposite of Millikan's relative to language. Millikan's rejection of inferential pragmatics and insistence on signal–response pairings makes her analysis unable to deal with the semantic underdetermination that is characteristic of linguistic communication. Scott-Phillips' proposal can deal with it. But, as we shall see, it does more than that: his proposal basically reverses the problem.

At the center of Scott-Phillips' view is a distinction between *natural codes* (which correspond to what Millikan describes) and *conventional codes* (which do not). The originality of Scott-Phillips' proposal is to see ostensive communication (a shorthand for ostensive-inferential communication) not as a way of solving the problem of the semantic underdetermination of the conventional linguistic code (which would thus still be the basic root of linguistic communication), but as itself the root of human, including linguistic, communication, the conventional codes constituting language as a system being added to give human communication more expressive power. In other words (Scott-Phillips, 2015, p. 577), "there is a qualitative difference between the codes used in the code model, and the linguistic code. Put simply, one makes a type of communication possible, the other makes a different type of communication expressively powerful." Conventional codes are ubiquitous in language, being found at the phonological, lexical, syntactic and even pragmatic (e.g., politeness conventions) levels. Scott-Phillips (2015, pp. 628–629) concludes: "This view of a language as a set of conventional codes that augments ostensive communication recognizes both the pragmatic foundations of linguistic behavior, and the importance and nature of the conventions that make languages different to other, simpler cases of ostensive-inferential communication, such as points, non-linguistic vocalizations, nods of the head, and so on."

So, to sum up, on Scott-Phillips' view, language is indeed a communication system, but it is a communication system entirely

<sup>23</sup>There is also a Cognitive Principle of Relevance, which we will ignore here.

discontinuous with most if not all animal communication systems as it has evolved in the wake of abilities for ostensive communication that themselves depend on the previous evolution of a sophisticated Theory of Mind, developed on the basis of pre-existing primate abilities in social cognition, but outstripping them by far. Language itself is a collection of conventional codes, which greatly enhance the expressive power of ostensive communication, but which, nevertheless, are still in need of pragmatic inferencing, as they are, more often than not, semantically underdetermined relative to speaker's meaning.

There is no doubt that Scott-Phillips' proposal differs in many ways from Millikan's. There is, however, one point on which they seem to meet. It is highly difficult, from Scott-Phillips' presentation, to see where exactly his conventional codes would differ from constructions, and, as we have seen, Millikan is also something of a constructivist. What is more, Scott-Phillips adopts a few other constructivist tenets. For instance, in his fifth chapter, he rejects the Chomskyan notion of Universal Grammar<sup>24</sup>, which he sees as unnecessary. He also rejects the idea that recursion is a central factor in syntax and in linguistic creativity, though he seems to accept linguistic creativity inasmuch as he claims that linguistic communication is unlimited in the number of different contents language may be used to communicate.

This is not the only aspect in which Scott-Phillips' theory meets Millikan's. Another important meeting point between the two accounts is the notion of a signal-response pair as the basic communicative unit. Basically, Scott-Phillips distinguishes between *signals, cues, coercion,* and *accidents* by whether or not the behavior is designed to give rise to (designed) responses. In the case of a signal (the only communicative unit), the signal is designed to trigger the designed response (very much in keeping with Maynard Smith and Harper's definition, see Language as a Communication System under the Code Model). The cue is not designed to trigger the response, though the response is designed as a response to that type of cue. In coercion, the action is designed to trigger the response, but the response is not designed as a response to that type of action. And finally, in an accident, neither the action nor the response is designed relative to the other.
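The four-way distinction just described amounts to a two-by-two table, which can be sketched as follows. This is only an illustrative reading of the paragraph above, not Scott-Phillips' own formalization; the names `Behavior` and `classify` and the two boolean parameters are assumptions introduced here.

```python
from enum import Enum


class Behavior(Enum):
    """The four categories named in the text (labels are illustrative)."""
    SIGNAL = "signal"
    CUE = "cue"
    COERCION = "coercion"
    ACCIDENT = "accident"


def classify(action_designed: bool, response_designed: bool) -> Behavior:
    """Classify a behavior/response pair, relative to one given observer.

    action_designed:   the behavior is designed to trigger the response.
    response_designed: the response is designed as a response to that
                       type of behavior.
    Both parameters are hypothetical simplifications of the prose.
    """
    if action_designed and response_designed:
        return Behavior.SIGNAL      # the only communicative case
    if response_designed:
        return Behavior.CUE         # response designed, behavior not
    if action_designed:
        return Behavior.COERCION    # behavior designed, response not
    return Behavior.ACCIDENT        # neither is designed for the other
```

Because the classification is relative to a particular observer, one and the same behavioral token can fall into different cells for different observers, as the author's footnote on the pushing example points out.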

Given these two important points of agreement between Millikan's and Scott-Phillips' views, it makes sense to ask whether Scott-Phillips' proposal falls foul of the objections raised above against Millikan's. Obviously, the pragmatic objections (First Occasion Argument and Ambiguity Argument) do not apply. But, as we shall see, both the Lack of History Argument and the Underdetermination of Behavioral Response Argument do apply to Scott-Phillips' theory.

As discussed above, any theory that defines communicative units as the result of pairings between signals and information/responses *ipso facto* supposes perenniality in signal types, in information types, in response types and in the pairings that link them. Scott-Phillips differs from Millikan in acknowledging from the start that the information communicated by different utterances of a given sentence will differ from occasion to occasion, and he does not explain this through widespread ambiguity. He explains it through the deep semantic underdetermination of linguistic (conventional) codes. This deep underdetermination affects speaker's meaning, and makes it necessary for the conventional codes to be supplemented by pragmatic inference. While on Scott-Phillips' model, pragmatic inference is available, this nevertheless means that different utterances of the same sentence will not be repeatedly paired with the same information. This leads us to a *pragmatic* version of the Lack of History Argument:

**Lack of History Argument (Pragmatic)**: Given semantic underdetermination, the speaker meaning attributed to one utterance of a given sentence will often be one-off, that is, it will not necessarily be attributed to any other utterance of the same sentence. In other words, utterances lack the semantic stability necessary to the establishment of a conventional code.

Let us now turn to responses. The example Scott-Phillips gives of a signal is of a man pushing a woman down under the eyes of another colleague, who laughs in response<sup>25</sup>. The pushing was intended to be seen by the laughing colleague and thus it is a communicative signal designed to trigger as its designed response the laughter. While this example is certainly not susceptible to the Detectability of Response Argument (laughter being detectable), it nevertheless is susceptible to the Underdetermination of Behavioral Responses Argument. Rather obviously, the intended receiver might have remonstrated instead of laughing.

Thus, while Scott-Phillips offers an original and attractive theory, it falls foul of some of the same difficulties that plague Millikan's. My diagnosis is that these difficulties stem from the basic proposition shared by the two views: that language is a communication system.

# The Language-ready Brain

The proposition that language is a communication system imposes obvious constraints on the abilities that have to preexist for language to get off the ground. Unsurprisingly, given that communication is the epitome of a social phenomenon, these abilities are social. On the code model, the main constraint is *honesty* (see Animal Communication Systems)—and this is all the more important in language, given the opportunities for cheating that decoupling offers. This has led to the view that *altruism*, as a phylogenetic pro-social tendency, is a prerequisite for human linguistic communication and for language evolution. On the ostensive model, linguistic communication and language

<sup>24</sup>Though Scott-Phillips acknowledges that there may well be linguistic universals, he proposes to explain them through Cultural Attraction Theory, not Universal Grammar.

<sup>25</sup>He rightly notes that under his view one and the same behavioral token could be at once a signal, a cue, coercion and an accident depending on who observes it.

evolution basically depend on the preexistence of a Theory of Mind of some kind. However, as we have just seen, the notion that language is a communication system in the strong sense that it evolved *for* communication is implausible in view of the difficulties it meets with. One fairly obvious suggestion to account for its use in human communication is that it originally evolved for entirely different purposes and was then exapted (Gould and Vrba, 1982) for communication. Determining what those purposes were is a prerequisite for determining which pre-existing abilities should comprise the language-ready brain.

Here, recall the two questions listed in Section "Introduction", and more specifically the question of why humans, and only humans, need a system of communication that allows them to communicate a potential infinity of different contents. Communication is rife in nature, but language is unique. This immediately raises a further question: where does this infinity of different contents come from? As Millikan (2013) rightly notes, human cognitive sophistication is also unique. Thus, one potential answer to the question above is that a cognitively sophisticated species needs an appropriately sophisticated system of communication. What this means basically is that human intelligence, rather than human sociability, is the key to language. We can go one step further, however, following Fodor and Pylyshyn (2015), and note that thoughts and sentences share the same structural organization: just as sentences structurally compose words in a creative way, thoughts structurally compose concepts in a creative way. Language is creative, because thought is creative. Or, in Fodor and Pylyshyn's (2015, p. 89) words, "That thoughts and sentences match up so nicely is part of why you can sometimes say what you think and vice versa."

Hinzen (2013) goes further and proposes that language is primarily an internal tool for thought and that syntax is the root of the semantic and propositional organization of thought in humans, and hence of the specificity of human thought, compared with non-human animal thought. This, it should be clear, also answers the second question raised in Section "Introduction", i.e., why does linguistic communication allow decoupling, which clearly facilitates cheating and deceiving? On a view in which language evolved for thought, discrete infinity, semanticity and decoupling are not structural features specific to linguistic communication; they are structural features specific to thought *and in no way dependent on whether language is externalized for communication* or not. Note that discrete infinity and decoupling, which are obvious embarrassments for a theory of language as a communicative system in the strong sense, raise no problem for a theory of thought: obviously, discrete infinity and decoupling are ways of exponentially increasing thought production, while the question of honesty does not arise for thought. So, basically, all of this comes to the suggestion that language did not evolve *for* communication, it evolved *for* thought (as advocated by Chomsky: see Chomsky, 2014). It allows us to construct what medieval philosophers (Panaccio, 1999) called *complex concepts*, propositions, judgments, etc. This is essentially Fodor's *Language of Thought Hypothesis* (Fodor, 1975, 2008). Language was then externalized for communication, and its externalized version inherited its core combination of properties.

While this explains why language is such an exotic communication system<sup>26</sup>, it does not, in and of itself, explain why such a sophisticated system of thought is unique to humans: why did humans, and only humans, need such a sophisticated system of thought? Another human specificity is the richness of the human conceptual system. While some core conceptual mechanisms (Carey, 2009) may be shared with other species (notably with great apes, see Gómez, 2004), the extent of the human conceptual system is unique. This difference is obvious from the very limited size of the vocabulary acquired by animals engaged in language research programs (≈300 words) as compared to the size of human vocabularies (300 words at 3 years old, 6,000 at 6 years old, and around 200,000 at 18 years old; for animals' lexicon, see Anderson, 2004; for human lexicons at different ages, Bloom, 2000). While some of the difference may be due to externalized language itself, it is highly unlikely that this is the only explanation. Indeed, other considerations militate against an identity between the human conceptual system and non-human animal conceptual systems, including those of other primates. While monkeys can learn to visually categorize images (Fabre-Thorpe, 2003), they usually do so after intensive training (involving thousands of trials), by contrast with young children who learn new concepts (and the corresponding words) instantaneously (Bloom, 2000; Waxman, 2004). Apart from any reservation one might have about considering visual categorization as proof of concept possession, this hints at highly different mechanisms of conceptualization.
Finally, though orangutans may be an exception (Vonk and MacDonald, 2004), other great ape species, though capable of categorical discrimination at different levels of abstraction, present a highly different profile from what is found in humans: the intermediate or basic level (roughly corresponding to the level of the species), which is by far the most easily accessed in humans, is the most difficult for them (it is the level at which they fail to transfer learned categories: see Vonk and MacDonald, 2002). Thus, all in all, there are good reasons to doubt that conceptualization follows the same path in humans and in non-human animals.

Here, the hypothesis is that different mechanisms operate in human conceptualization, explaining why humans have conceptual repertoires so much wider than other species do. Having a huge conceptual system, however, can only be useful if the concepts can be assembled into complex concepts or propositions (thoughts). While association can bind concepts together (and does in both human and non-human animals), its power is limited: at most, it could lead to sequences of concepts. On the other hand, syntax allows structured and compositional mental representations to emerge (see Hinzen, 2013; Fodor and Pylyshyn, 2015 for more detailed arguments). The suggestion is thus that syntax emerged at the mental level to organize concepts into (propositional) thoughts.

<sup>26</sup>Because it is discontinuous from all other animal communication systems not merely by being the only ostensive communication system (Scott-Phillips, 2015, and see Language as a Communication System under the Ostensive Model), but also by *not* having evolved *for* communication in the first place.

The obvious question is what led to the emergence of such different mechanisms of conceptualization in humans. Here, it is hard to avoid speculation, but, as argued by Boeckx and Benítez-Burraco (2014a,b) and Benítez-Burraco and Boeckx (2015), there are important differences between modern humans and the Neanderthal/Denisovan branch, the main one being the globularity of the modern human skull compared to the elongated shape of the Neanderthal/Denisovan skull. Boeckx and Benítez-Burraco hypothesize that this change in shape corresponds to major brain reorganization leading to greater cerebral connectivity. Additionally, they point out that this change is not so much due to the enlargement of the frontal lobes as to the expansion and reorganization of parietal areas. Now, one of the peculiarities of the Neanderthal/Denisovan skull is the so-called Neanderthal bun, a bump on the occipital part of the skull, corresponding to the primary visual cortex (V1, Brodmann area 17). While there is clearly more to conceptualization than perception, it is hard not to link the change in human conceptualization to the specificity of human perceptual preference for global processing of visual scenes (Navon, 1977; Kimchi et al., 2005) as opposed to non-human primate perceptual preference for local processing (Fagot and Deruelle, 1997; Fagot and Tomonaga, 1999; Fagot et al., 1999, 2001). It is not impossible that the reorganization assumed by Boeckx and Benítez-Burraco also concerned the occipital area, with capital consequences for visual preferences in modern humans, leading to improved conceptualization.

#### Conclusion

While most of this paper has been dedicated to showing that language is not a communication system in the strong sense (i.e., it did not evolve *for* communication) and to outlining an alternative cognitive account and its consequences for the language-ready brain, I do not want to close it without saying a word about the externalization of a pre-existing language for communication. While I criticized Scott-Phillips' (2015) account above, I nonetheless think that it makes a lot of sense as an account of the *externalization* of language. As noted above (see Language as a Communication System under the Ostensive Model), Scott-Phillips, following Sperber and Wilson (1995), sees language as a sophisticated brand of ostensive communication and proposes that a less sophisticated and wholly unconventionalized brand of ostensive communication preceded the formation of linguistic conventions. As he notes, all ostensive communication rests on mind-reading abilities. While he is content to suppose that such abilities somehow derived from previous primate social abilities, this is unlikely for a number of reasons, the main one being that primates seem pretty restricted in that area. At best, chimpanzees may be capable of recognizing intentions (and even that is in dispute: for a general presentation, see Lurz, 2011). Scott-Phillips gives no reason why mind-reading abilities would make such a jump in humans. The hallmark of human mind reading is that, in Dennett's (1987) words, it involves higher-order intentions (e.g., *Peter believes that Mary believes that p*), in other words, metarepresentations. Now metarepresentation crucially depends on recursion, as the representations involved are structurally recursive. Under the scenario I propose, the development of recursive syntax in the Language of Thought allowed humans to develop mind-reading abilities far in excess of anything to be found in non-human species. This allowed humans to indulge in ostensive communication, leading to linguistic conventions, roughly along the lines indicated by Scott-Phillips. Note, in addition, that under this revised scenario, acquiring words means matching words to pre-existing concepts (as largely recognized in the lexical acquisition literature, Bloom, 2000). This largely dispels the problem described in the semantic version of the Lack of History Argument I opposed to Scott-Phillips' view (see Language as a Communication System under the Ostensive Model). While speaker's meaning has to be stable on the view that language evolved as a communication system (and clearly is not), sentence meaning stability is quite enough to ensure the establishment and learning of lexical conventions on the view that language is a communication system only in a weak sense. This is because language as a communication system in the weak sense can piggyback on the pre-existing conceptual system and Language of Thought, which, as argued by Hinzen (2013), fixes referential and propositional meaning.

## References

Carey, S. (2009). *The Origin of Concepts*. New York: Oxford University Press.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Reboul. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Brain readiness and the nature of language

#### Denis Bouchard\*

Département de linguistique, Université du Québec à Montréal, Montreal, QC, Canada

To identify the neural components that make a brain ready for language, it is important to have well-defined linguistic phenotypes, to know precisely what language is. There are two central features to language: the capacity to form signs (words), and the capacity to combine them into complex structures. We must determine how the human brain enables these capacities. A sign is a link between a perceptual form and a conceptual meaning. Acoustic elements and content elements are already brain-internal in non-human animals, but as categorical systems linked with brain-external elements. Being indexically tied to objects of the world, they cannot freely link to form signs. A crucial property of a language-ready brain is the capacity to process perceptual forms and contents offline, detached from any brain-external phenomena, so their "representations" may be linked into signs. These brain systems appear to have pleiotropic effects on a variety of phenotypic traits and not to be specifically designed for language. Syntax combines signs, so the combination of two signs operates simultaneously on their meaning and form. The operation combining the meanings long antedates its function in language: the primitive mode of predication operative in representing some information about an object. The combination of the forms is enabled by the capacity of the brain to segment vocal and visual information into discrete elements. Discrete temporal units have order and juxtaposition, and vocal units have intonation, length, and stress. These are primitive combinatorial processes. So the prior properties of the physical and conceptual elements of the sign introduce combinatoriality into the linguistic system, and from these primitive combinatorial systems derive concatenation in phonology and combination in morphosyntax.
Given the nature of language, a key feature to our understanding of the language-ready brain is to be found in the mechanisms in human brains that enable the unique means of representation that allow perceptual forms and contents to be linked into signs.

Keywords: language evolution, evolvability, linguistic signs, brain readiness, self-organization

# Introduction

The main point of this paper is that the central trait of human language is the capacity to form signs by linking perceptual forms and meanings<sup>1</sup>. This predicts that the core mechanisms that make a brain ready for language are those that enable this capacity. Moreover, switching from a computational view of language to a sign-based theory provides a unified approach to the

#### Edited by:

Cedric Boeckx, Universitat de Barcelona, Spain

#### Reviewed by:

Christopher I. Petkov, Newcastle University, UK Wendy Sandler, University of Haifa, Israel

#### \*Correspondence:

Denis Bouchard, Département de linguistique, Université du Québec à Montréal, C.P. 8888, Succursale Centre-Ville, Montreal, QC H3C 3P8, Canada bouchard.denis@uqam.ca

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 26 June 2015 Accepted: 26 August 2015 Published: 09 September 2015

#### Citation:

Bouchard D (2015) Brain readiness and the nature of language. Front. Psychol. 6:1376. doi: 10.3389/fpsyg.2015.01376

<sup>1</sup>This paper presents the main hypotheses exposed in Bouchard (2013). I have therefore borrowed substantially from that text without indicating it by quotation marks or references in order not to overburden the readers.

functioning of the main subsystems of language. The perceptual and conceptual substances of signs create a system that reaches a level of such complexity that it triggers self-organization, deriving specific properties of signs, as well as the basic structuring of language in its phonology, semantics, and syntax.

# The Core Competence for Language

A language-ready brain raises two evolutionary puzzles: a puzzle of emergence and a puzzle of design (Hoefler, 2009, p. 1). The puzzle of emergence addresses the problem of bridging the gap from a stage where our ancestors had no language to a stage where they had language as we know it today. How and why did language emerge in humans and not in other species?

Lewontin (1998) raises strong doubts about the possibility of reconstructing the evolutionary history and the causal mechanisms of the acquisition of linguistic competence (and cognition in general). He emphasizes the near impossibility of coming up with evidence "that there was heritable variation for, say, linguistic ability, in our remote ancestors when the human species was still evolving into its present form and that those who possessed this ability, in the remote past, left more offspring by virtue of that ability" (p. 111). So it is extremely difficult for the standard theory of evolution by natural selection to inform us on how language, and more generally cognition, arose and spread and changed. As he points out, humans had an ancestor in common with the chimpanzee and the gorilla about 10 million years ago. So 20 million years of evolution separate us from our closest relatives. During that period, "a major difference in the consequences of cognitive power has taken place during human evolution that makes the cognitive difference between gorillas and chimpanzees trivial compared to our cognitive distance from them" (p. 116). Evolved forms may diverge very dramatically in a relatively short period of time. Lewontin gives the example of cows, goats, and deer that differentiated 10 million years ago. Therefore, it is unlikely that we can determine, even approximately, when our linguistic capacity emerged in our ancestry. In addition, a trait may derive from analogy just as well as from homology. Moreover, we cannot measure the actual reproductive advantages of cognition or language. Fossils, furthermore, are of very little help concerning cognition, and often we cannot even be sure whether a fossil is from an ancestor or some relative on another branch of the bush-like relations between species. So we cannot tell what our immediate nonlinguistic ancestors were like cognitively.
Almost two decades after the publication of his paper, the problem still appears to be substantial, though advances in our knowledge of genes open some research avenues concerning heritable variation, even for remote ancestors.

Nevertheless, there is room for testable theories about what language is, what brain mechanisms this requires, and whether some of these brain mechanisms are unique to humans at least compared to other current species. As we progress in our understanding of the human brain, we can compare it with the neuro-anatomy of related species and see how they differ in form and function. We can pinpoint some current neurological distinctive trait(s) that enable(s) language, and hence determine WHAT made language emerge. Regarding WHEN and HOW the organism evolved to get that change, we can only speculate. But at least we can elaborate a theory that passes the test of evolvability: if a theory can show how some actual neuro-anatomical element enables language as we know it, then that theory is in accord with the fact that an organism with a language-ready brain is an evolvable organism, because this neuro-anatomical element can indeed develop according to the laws and principles of biological evolution, since it exists in human brains. Moreover, the nature of the neuro-anatomical trait can give us an indication of what it could have come from. This is particularly the case if language is a side effect of the neuro-anatomical trait, as I argue below: the other functional effects of the trait can further restrict the possible scenarios.

This brings us to the second evolutionary puzzle, the question of design: how and why did language evolve with the properties that we observe rather than some other set? To identify the components that make a brain ready for language, neuroscientists must know precisely what such a brain must do, hence ultimately, what language is. Not that the brain mechanisms will somehow be analogical to the functional aspects of language: examples abound where it has been shown that the neural substrates or the mechanisms supporting behavior are not predicted by psychological models. However, we must understand precisely what language is and have well-defined linguistic phenotypes to search for the neural substrates that enable these phenotypes.

There are numerous properties that have been attributed to language. Many have been recently proposed and many are not widely accepted because they depend on narrow theoretical assumptions. It would be a formidable task to look at hundreds of properties in exploring the language-readiness of the brain, and probably futile in many instances since the properties are probably ephemeral. It is more productive to investigate two properties of language for which there is a long-standing and broad consensus among scholars—the capacity to form signs (words, morphemes), and the capacity to combine them into complex structures:

"at least two basic problems arise when we consider the origins of the faculty of language [...]: first, the core semantics of minimal meaning-bearing elements, including the simplest of them; and second, the principles that allow infinite combinations of symbols, hierarchically organized, which provide the means for use of language in its many aspects" (Chomsky, 2005, p. 4).

If we can explain how the brain is ready for these two basic properties, how it enables them, we are heading in the right direction. However, if we consider what the founder of the most prominent theoretical model in linguistics says about the evolution of these two properties, the prospects look rather dim. Concerning the capacity to form signs, Chomsky (2010) says that it is "of totally mysterious origin." Moreover, though he has contributed to a very influential paper on the origin of linguistic combinatoriality (Hauser et al., 2002), Chomsky and some of his colleagues now believe that the origin of combinatoriality is also a mystery, as indicated in the very title of their paper: "The mystery of language evolution" (Hauser et al., 2014).

The problem is further amplified by the fact that, despite recent attempts to limit it, the current model still relies on a large set of innate, language-specific conditions, Universal Grammar (UG), which is a repertory of unexplained properties (Chomsky, 2007, p. 19)<sup>2</sup>. UG is therefore a highly problematic component from an explanatory point of view, since the richer the set of language-specific brain features, the harder it will be to account for it: "Aspects of the computational system that do not yield to principled explanation fall under UG, to be explained **somehow in other terms** [my emphasis, DB], questions that may lie beyond the reach of contemporary inquiry, Lewontin (1998) has argued" (Chomsky, 2007, p. 24). This is as close as one can get to saying that UG is also an unsolved mystery, maybe even an unsolvable one<sup>3</sup>.

The three mysteries are not simply subcases of the difficulty of reconstructing evolutionary history and the causal mechanisms of the acquisition of linguistic competence: they are also problems of evolvability. The UG model appears incapable of providing a principled explanation based on some neuro-anatomical elements that would account for the numerous language-specific components it postulates. Brain readiness and evolvability are closely linked, so evolvability is an important test for linguistic theories: the traits that a linguistic theory requires of the human brain must be highly plausible according to the known laws and principles of biological evolution. We may not be able to trace the evolutionary path of how language emerged, but we can evaluate the degree of evolvability of a linguistic model, its plausibility given known laws of evolution.


<sup>3</sup> Moreover, the UG problem has actually increased, since analyses in that model have drifted toward a constant increase in functional categories (to wit, the cartography approach: Cinque, 1994, 1999, 2002; Belletti, 2004; Rizzi, 2004; and nano-syntax: Kayne, 2010). Most of these functional categories are redundant system-internal correlates (there are functional categories of SIZE, COLOR, ORIGIN, etc. because there are adjectives of those categories): they add nothing to our understanding of the facts. They are not even discovered correlations but invented correlations, elements added to the theory solely to correspond to some phenomenon (much in the behaviorist way so fiercely criticized in Chomsky's (1959) review of Skinner; see the discussion in Bouchard, 2001).

In the face of the triple mystery assessment, we might judge that the evolvability of the language-ready brain is too hard a problem and decide to simply drop it. But scientists don't like to give up. If the problem appears insurmountable from the perspective of a theory, however widely scholars adhere to it, its apparent incapacity to deal with such core issues as signs, combinatoriality and language-specific conditions in general can be a motive to scrutinize that theory to figure out why it fails in this respect, and to use this assessment to elaborate an alternative model that can adequately address the core issues. Proponents of UG, and those who share the mystery assessment about language such as Lewontin (1998), all put a high emphasis on the property of discrete infinity found in language, which is assumed to be the core property of the language phenotype: "the core competence for language is a biological capacity shared by all humans and distinguished by the central feature of discrete infinity—the capacity for unbounded composition of various linguistic objects into complex structures" (Hauser et al., 2014, p. 2). This is understandable given the historical background. Generative grammar was born in the context of emerging tools in mathematical logic. For the first time, these tools provided the means to formalize recursion, which had been informally recognized as a property of language for some time (cf. Humboldt's infinite use of finite means). In this context, the most striking characteristic of human language is its discrete infinity. It is tempting to see discrete infinity as an essential property of language, and to put the corresponding technical tools of recursion at the heart of the model. It is then natural to assume that recursion is the crucial distinctive property of human language.

But this core assumption leads to a triple mystery. We should therefore question that assumption. The language phenotype, like all "facts," is a set of observational propositions which are part of the theory: they are not external to the theory and independent (Lakatos, 1970), and their status can be questioned like any other proposition, particularly in the face of an overwhelming problem such as when a theory leads to a shroud of mysteries. It turns out that the assumption of the centrality of recursion and discrete infinity, though shared by many language scientists, is incorrect. Although it is an observable trait of language, it is not the core phenotype it is assumed to be, but a side effect. The core competence for language is the capacity to take elements from two substances with no logical or natural connection between their elements (perceptual forms and meanings) and to link them into signs (words, morphemes). This capacity to form Saussurean signs is the sole distinctive trait of human language. The fact that only human language has discrete infinity does not imply that recursion is a distinguishing mechanism. This mechanism is uniquely human; however, it is not original: it actually arises from prior elements of the two substances of signs that contain primitive combinatorial processes and produce the effects of recursion<sup>4</sup>.

<sup>2</sup>For instance, here is an illustrative sample of UG elements taken mostly from Hornstein and Boeckx (2009) and Narita and Fujita (2010):

- endocentricity labeling;

<sup>4</sup> In this paper, I compare my view with that of Chomsky, since it is the most influential one. There are many other theories of language and its origin, some of which relate to the brain and machinery prior to language. Because of space limitations, I cannot do them justice here, so I refer the interested readers to the extended discussion of other approaches in Bouchard (2013).

To see this, let us now turn to the detailed properties of linguistic signs.

# The Sign Theory of Language

A linguistic sign is generally presented as involving two elements, a meaning and a form, and a link between the two. Saussure (1916) introduced the terms signified and signifier to emphasize that this linking is purely mental, established by speakers. I use the terms "concept" and "percept" in this spirit: they are dynamical mental creations, cognitive structures (see Jackendoff, 2002, ch. 10). This is an oversimplification, however. A linguistic form (signifier/percept) is a mental state linked to an acoustic/visual material element: this element is not linguistic but in the domain of the sciences that deal with the physical and mental properties of acoustic perception and production. (Henceforth, I will only discuss acoustic material of the oral modality, but the ideas carry over to the gestural modality.) Similarly, a linguistic meaning (signified/concept) is a mental state linked to a psychological element, a chunk of cognition that the mental state evokes: this element also is not linguistic but in the domain of the sciences that deal with psychological phenomena related to thought. It is only when a language establishes Link 1 between a representation of a perceptual element and a representation of a conceptual element that these are linguistically relevant and become a signifier and a signified.

#### **(1) Figure 1**

FIGURE 1 | The structure of a linguistic sign. (A) shows the structure of the word "little." Its linguistic elements are its meaning (here simply represented as LITTLE), which is related to the combination of phonemes that are its form. These linguistic elements are each related to elements outside the realm of language: a certain chunk of cognition for the meaning LITTLE, and physical sound waves for its form. (B) shows the structure corresponding to the word "star."

The linguistically crucial part of a sign is a reciprocal predication: it is the systematic attribution of a vocal form and a meaning to each other. The link between signifier and signified is not determined by logic or by intrinsic properties in the nature of the phonic-acoustic or conceptual substances: it is purely linguistic. The properties of the substances to which the signifiers and signifieds are linked cannot explain why a particular phonetic entity is tagged as the signifier of a certain meaning or why a particular conceptual entity is tagged as the signified of a certain form. These links are not due to natural causes, but rather are arbitrary because the nature of the sounds that our phonatory articulators produce and the nature of the concepts that our conceptual system constructs are so different that they cannot entertain a meaningful, logical, or iconic relation (Saussure, 1916, pp. 155–156).

Now consider syntax. If we look at it in terms as neutral as possible, syntax is minimally defined as the processes by which signs are combined. Consider a simple example of the syntactic combination of the two signs little and star. Each sign is complex by definition—a form resulting from the union of a signified and a signifier. Syntax does not combine just signifiers or just signifieds, it combines relations between signifiers and signifieds, i.e., signs. Since signified and signifier are irreducibly united, any operation applying to one is reflected on the other. So when two signs are combined by a relation R, R operates simultaneously on both their signifieds and their signifiers, as shown in the combination of little and star in (2).

**(2) Figure 2**

FIGURE 2 | The combination of the signs "little" and "star" by a relation R, which operates simultaneously on their meanings, creating a relation R(CI) at the conceptual-intentional level, and on their forms, creating a relation R(SM) at the sensory-motor level.

Since R operates simultaneously on both the signifieds and the signifiers of the signs in (2), it is itself a sign. I will refer to this set of signs that combine syntactic elements as combinatorial signs (C-signs), to distinguish them from the more familiar unit signs (U-signs), namely words/morphemes. This immediately raises two questions: What is the signifier of a C-sign? What is the signified of a C-sign? As already indicated in Bouchard (1996, 2002), the signifier of a C-sign will take whatever form a language arbitrarily selects from among those that our physiology provides as a combinatorial percept in the modality of that language. These forms are drawn from physical traits of the forms of words. For instance, a first trait in an oral language is that vocal units appear linearly ordered. So signifiers made up of these vocal units can share a temporal edge—they can be temporally juxtaposed: two signifiers can be ordered next to one another, and this can be grammatically significant in the system of a language. For instance, in (3), it is grammatically significant that saw and John are juxtaposed, but not that John and yesterday are juxtaposed: the juxtaposition of yesterday is grammatically relevant only with respect to the phrase saw John (or Mary saw John under different assumptions).

(3) Mary saw John yesterday.

The order of juxtaposition is also frequently significant, as in the pairs in (4):

(4) a. John saw Mary—Mary saw John b. John is sick—Is John sick?

A second trait is that the two signifiers can share a temporal space, as when a modulation is superimposed on the phonemes of a constituent: one signifier is the intonation placed on the other signifier, such as an intonation expressing a question (4b). Other possible superimposed elements are stress and length<sup>5</sup>. All these combinatorial percepts depend on the physiological traits of the modality, so they vary across modalities. For instance, the visual–gestural channel of sign languages has more types of combinatorial percepts because it uses more articulators and more dimensions than the auditory–oral channel (Bouchard, 1996).

The set of possible signifiers for a C-sign is extremely restricted because the set of physiological relational vocal percepts is small. So arbitrariness is limited by what are ultimately principles of physical science, as Thompson (1917) anticipated for biological systems in general. Languages vary in their choices of signifiers among these combinatorial percepts, as expected in the light of arbitrariness. For instance, the syntactic relation "direct object" can be expressed by any of these combinatorial signifiers: juxtaposition in the order V-NP or NP-V, a Case affix or a Case tone on the complement, an object affix or an object tone on the verb. This follows from Saussure's general principle of arbitrariness. There is no "reason of nature" for a language, let alone all languages, to choose any particular combinatorial signifier among those enabled by our physiology: any signifier is a possible candidate, because each one can optimally satisfy the requirement to encode meanings in a form. Indeed, each possibility is instantiated in some language or other. Languages choose from among the various possibilities of combinatorial signs, just as they arbitrarily choose from among the various possibilities of unit signs. Which combinatorial percepts are possible signifiers is not stipulated in some universal list, but is determined by prior properties of the perceptual substance of the modality of the particular language. Under this view, if there were no variation in the way languages express a relation such as "direct object," if they all had the same signifier for it, this would be a most improbable accident, just as it would be if the signifier of a unit sign happened to be the same in all languages. Since Saussurean arbitrariness extends to C-signs, variation in syntax is a virtual necessity. Consequently, which particular combinatorial signifier is used in any specific case in a language must be learned just as much as any signifier at the word level.
The numerous instances in which each language must choose a C-sign create the impression that languages can be amazingly different. But this is just an impression due to the cumulative effect of the choices; in fact, each choice of C-sign involves only one of the very few percepts that human physiology allows as the signifier of a C-sign. Though each combination is very simple, these combinatorial means cumulatively allow syntax to create organized groups of signs which can attain a very high degree of complexity overall.

Consider now the nature of the meaning of a C-sign, that is, the relation R at the conceptual-intentional level. The signified of R is a relation of predication. Predication, namely the capacity to attribute properties/information to objects, is a universal trait of human cognition. As Hurford (2007a, p. 527) indicates, "In the very earliest mental processes, long antedating language, binary structure can be found, with components that one can associate with the functions of identifying or locating an object and representing some information about it."

In a combination of signs as in (2), the semantic part of the C-sign links two elements so that one adds its denotation as a restriction on the other, either in the usual sense for subject–predicate and topic–comment relations, as in (5), or in the sense of saturation, as in (6).

(5) b. that book, I really liked (the property of the comment is attributed to the topic).

(6) b. in the kitchen (the property of the Noun Phrase is attributed to the locative preposition).

In summary, syntax is a set of combinatorial signs that allow the formation of complex signs. The perceptual form of a C-sign can only be either a juxtaposition or a superimposition of a vocal (or gestural) percept; this limitation on the combinatorial signifiers is due to properties of the human sensorimotor systems. The signified of a C-sign is predication, which was exapted from the pre-linguistic cognitive system of humans. Like other signs, combinatorial signs are subject to arbitrariness due to the nature of the two substances that they link. Therefore, which combinatorial signifier a language chooses for any particular predicative relation (i.e., "construction") is arbitrarily selected from among those permitted by its modality. These are the main tenets of the Sign Theory of Language.
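The way a C-sign acts on both sides of the signs it combines can be pictured with a small data-structure sketch. This is only an illustration under strong simplifying assumptions, not part of the theory's formal apparatus: the names `Sign`, `CSign`, `combine`, and `predicate_first` are hypothetical, strings stand in for mental representations of forms and meanings, and only juxtaposition (not superimposition) is modeled as a combinatorial signifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """A sign: a signifier and a signified held together as one unit."""
    signifier: str  # stand-in for the mental representation of a vocal form
    signified: str  # stand-in for the mental representation of a meaning


@dataclass(frozen=True)
class CSign:
    """A combinatorial sign (hypothetical encoding, for illustration only)."""
    label: str
    predicate_first: bool  # language-particular, arbitrary choice of order


def combine(r: CSign, predicate: Sign, argument: Sign) -> Sign:
    """Apply R to two signs, on their forms and their meanings at once."""
    # R(SM): juxtapose the two signifiers in the order the language selects.
    ordered = (predicate, argument) if r.predicate_first else (argument, predicate)
    form = " ".join(s.signifier for s in ordered)
    # R(CI): predication; the predicate restricts the argument.
    meaning = f"{predicate.signified}({argument.signified})"
    return Sign(form, meaning)


little = Sign("little", "LITTLE")
star = Sign("star", "STAR")

english_like = CSign("modification", predicate_first=True)
mirror_language = CSign("modification", predicate_first=False)  # invented contrast

print(combine(english_like, little, star))
print(combine(mirror_language, little, star))
```

In this sketch the two invented "languages" differ only in the signifier of the C-sign (the order of juxtaposition), while the signified (the predication) is the same. Because `combine` returns a `Sign`, its result can be fed back into `combine`, which gives a rough picture of how complex signs can be built from very simple combinatorial means.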

# The Kind of Brain Mechanisms Required for the Formation of Signs

A sign is a link between elements from domains of very different natures: a physical/perceptual form and a psychological/conceptual meaning. The core problem is to identify the brain mechanisms that enable links between these two kinds of elements. This neurological property (or set of properties) must be unique to the human brain since only humans have words: no other animal comes close to having equivalent signs detached from the immediate environment and as productively created. In Bouchard (2013), I suggest that these neuronal systems must have properties similar to the uniquely human systems of neurons discussed by Hurley (2008). These systems have the capacity to operate offline for input as well as output: they can be triggered not only by external events stimulating our perceptual systems but also by brain-internal events (including counterfactuals); they can also be activated while inhibiting output to any external (motoric) system. These Offline Brain Systems (OBS) are not specifically designed for language but they provide the crucial trait.

<sup>5</sup> In addition to these very direct ways of indicating that there is a relation between two signs, we can also indicate that a relation is being established between two percepts by physically shaping one in a conventionalized way, in a paradigm that indicates what relation is being established with the other, as with Case marking or agreement.

As early as 1891, Saussure understood that the fundamental duality of language is not in the linking of sound and meaning, but "resides in the duality of the vocal phenomenon AS SUCH, and of the vocal phenomenon AS A SIGN—of the physical fact (objective) and of the physical-mental fact (subjective)" (quoted in Bouquet and Engler, 2002, p. 20). The question is in what way, exclusive to humans, the vocal phenomenon enters into the mental domain, into the brain. Non-human animals can correctly classify and appropriately respond to stimuli, so acoustic elements, as well as informative content elements, are already brain-internal, but as categorical systems linked with brain-external elements. Being indexically tied to objects of the world, they are restricted in their mental activations and they cannot freely undergo linkings, they cannot form signs. Something different must be present in human brains. The brain mechanisms we are looking for must enable a vocal sound to be represented in the brain in a way detached from any brain-external phenomenon, as a purely brain-dependent entity, an activation of an OBS or something similar. Consequently, these representations of percepts can be arbitrarily linked to concepts: they can function as signifiers. I refer to these neural systems as Detached Representation systems (DR systems).

In addition to the physical element of a sign becoming a purely mental representation, the informational content of a sign is also different from that of an animal communication system unit. The content of an ACS unit is a category, i.e., a neural linking of similar results from sensory input, a class of input stimuli. This level is still linked to perceptual input—to the outside world. Even the signals that apes learn through intensive training remain at the level of action observation and embodied simulation of action triggered by external events. The content of a linguistic sign is at a more abstract level. It comes from human-specific cognemes that are abstracted from any sensory input or immediacy. This is the level at which detachment is attained. The concepts/meanings of signs do not represent or stand in for outer objects, but are brain activations that take internal events as inputs. This notion of "concept" is similar to the "amodal symbols" of Barsalou (1999) and the "types" of Penn et al. (2008):

"[...] only humans form general categories based on structural rather than perceptual criteria, find analogies between perceptually disparate relations, draw inferences based on the hierarchical or logical relation between relations, cognize the abstract functional role played by constituents in a relation as distinct from the constituents' perceptual characteristics, or postulate relations involving unobservable causes such as mental states and hypothetical physical forces. There is not simply a consistent absence of evidence for any of these higher-order relational operations in nonhuman animals; there is compelling evidence of an absence" (Penn et al., 2008, p. 110).

In order to be able to form linguistic signs, humans had to evolve brain systems that enable a more abstract representational level, so that concepts and percepts can be linked. It is not a percept per se that is linked with a concept per se in a linguistic sign, but a representation of the percept and a representation of the concept, i.e., a mental state corresponding to each of them, as we saw in figure (1). The crucial innovation is in the way some human neuronal systems function. Language did not emerge because there was environmental pressure for better communication or thought organization (though it brought leverage for both). It is not a system with a function of communication that emerged, nor with the function of organizing thought. It is a system of signs that emerged because elements from two very different substances met in the brain via their representations by new neuronal systems.

If the known laws of biology are extrapolated, we expect these brain systems to be in continuity with neuronal systems that are part of the machinery of the pre-linguistic brain, i.e., the brain of a prehuman species that has not yet achieved the capacity for detachment of the sort discussed above. Given biological continuity, it is likely that these are not radically different systems, but rather that they are offline activations of systems involving neurons in essentially the same parts of the brain.

In Bouchard (2013), I conjecture that these systems developed this novel kind of activation due to an increase in synaptic interactions that was triggered by several compounding factors. A large brain with a huge cortex offers a greatly increased potential for synaptic interactions. In addition, the more globular shape of the brain, with the thalamus in the middle, affords more cross-modular interactions (Boeckx, 2012). Moreover, alleles such as ApoE4 significantly improve synaptic repair; hence, they dramatically increase synaptic interactions. In addition, the long dependency during infancy feeds more cultural material into these additional brain capacities. With such a massive increase in synaptic interactions and complexity of circuitry due to biological changes and extensive cultural stimulation, a critical level was reached in hominid brains; some neuronal systems started being triggered by strictly internal brain events, introducing a new form of offline activation with no link to external events related to sensory inputs or motoric outputs. These strictly internal (offline) activations of some micro-anatomical structures represent a small evolutionary step: like the latching discussed by Russo and Treves (2011), they occur without altering the make-up of the neuronal network or any of its constituent properties. But DR systems have gigantic consequences: they enable brain activity of a novel kind and complexity, a unique representational capability that leads to higher level mentalizing.

The dramatic increases in both the number of neurons and the number of connections between neuronal networks are instances where quantity produces quality, the brain activity becoming less input-driven and less rigid. It is not obvious that there is an immediate functional behavioral advantage for an individual to have this kind of detached brain activity. It can slow down reactions to the immediate environment, creating a sort of framing problem. From our current perspective, we see a quality in the innovation; but it may have come only in the long run, as part of the pleiotropy of the innovation in brain activation that occurs due to material design, with no teleological push for an improvement of the individual's immediate well-being. Enhancements in the number of neurons and of connections lead to an increase in computational abilities and internal activity, but have little effect on the link between the brain and the perceptual systems interacting with the outer world. This kind of system does not evolve due to functional pressures: it takes on functions after its emergence. As Gould and Lewontin (1979) remark, a trait is not necessarily for something: it can just be a consequence.

This considerable upgrade in the quantity and quality of brain activity is like duplication in genes: other areas/systems can take over (Deacon, 2006), particularly given that the novel functional property of these micro-anatomical structures is less specialized, not tied to particular systems related to perception, but has a general representational capacity. Consequently, the various brain operations related to these systems are expected to exhibit great plasticity, with their anatomical location being diffuse. This is another feature that neuroscientists should be looking for.

Though it may not be possible to reconstruct the evolutionary history of the causal factors for the brain systems that enable the formation of linguistic signs, we can nevertheless test whether such systems actually exist and whether they exhibit some of the predicted properties, such as plasticity and pleiotropy. There is already evidence in support of the hypothesis.

Concerning the existence of these offline systems, we can see them at work in language once we isolate their effects from those of other activities concurrent with language at the motoric and conceptual levels. For instance, Meister and Iacoboni (2007) report on an experiment in which they compare the processing of visual stimuli while performing an action perception task and two linguistic tasks. They did not find any area specifically activated or with higher activity during the two linguistic tasks: "when visual stimuli concerning object-oriented actions are processed perceptually, they activate a large bilateral fronto-parietal network. When the same stimuli are processed linguistically, they activate only a subset of this network and no additional areas" (p. 6). They argue that these results support "the evolutionary hypothesis that neural mechanisms for language in humans co-opted phylogenetically older fronto-parietal neurons concerned with action perception" (p. 6). The identification of neural systems involved in language, and their role, is extremely difficult. As Dehaene and Cohen (2007) point out, module sharing may involve all levels of brain hierarchic organizations: micro-maps (millimeter-size columns), meso-maps (centimeter-size circuits), and macro-maps (larger-size networks). But with the rapid progress in technology to probe the brain, scientists can refine the testing of linguistic properties relating to neural systems, and eventually put the hypothesis to a test.

Regarding plasticity, Hein and Knight (2008) provide evidence that the same brain region can support different cognitive operations (theory of mind, audiovisual integration, motion processing, speech processing, and face processing) depending on task-dependent network connections (see also Bookheimer, 2002, p. 153). There is no fixed macro-anatomical structure that is exclusively dedicated to language: linguistic processing is a widespread property of the neural networks (Fedor et al., 2009). Language exhibits extensive plasticity for the localization of its components between and within individuals (Neville and Bavelier, 1998), during its development (Karmiloff-Smith, 2006), in its repair (Hagoort, 2009), and depending on its modality (Neville, 1993; Mayberry, 2002). The often-noted association between human praxis and language also points in the same direction. There is a genetic linkage between handedness and language dominance, and clinical correlations between aphasia and apraxia (Donald, 1998).

Regarding pleiotropy, if human brains have systems of neurons that are functionally less specialized, systems that can be activated in absentia, triggered by representations of events instead of the events themselves, and produce representations of events with no brain-external realization, then we should find evidence for this capacity in other functional traits unique to humans. There is compelling evidence that several interrelated traits are uniquely human, and absent or in very rudimentary forms in other animals (e.g., Premack, 2004; Penn et al., 2008; Fedor et al., 2009). This Human-specific Adaptive Suite extends across many domains and involves qualitatively huge differences from species that are closely related to us. Here is an indicative list, with a few of the relevant references.

#### Human-specific cognitive traits


#### Human-specific neurological traits


The human-specific cognitive and neurological traits are so closely linked that several scholars assume that at least a good part of them coevolved synergistically from a common factor underlying these various cognitive modules (Szathmáry, 2008; Fedor et al., 2009). Some assume that the underlying supermodule is one of the functional modules, the two most popular being Theory of Mind and language. However, Penn et al. (2008) argue compellingly that the suite of discontinuities between human and non-human minds cannot be explained by relating an explanans directly to the functioning of these cognitive domains. (See Bouchard, 2013, pp. 113–114, for arguments against the language-first and ToM-first hypotheses.) The Human-specific Adaptive Suite provides initial evidence for a neurobiological innovation with general representational potential. Given the limitations in the current techniques available, the specifics of many of these traits are still unclear, but they may ultimately help us resolve the problem of the neural basis of sign formation.

Though this is difficult, the hypothesis can nevertheless be tested. One useful line of inquiry can be found in recent experiments by Stanislas Dehaene and Laurent Cohen (Dehaene, 2005; Dehaene and Cohen, 2007). They show that some adaptations can occur much faster than is expected on a genetic scale, due to a process that they call "neuronal recycling" that operates during cultural acquisitions such as reading and arithmetic. They observe that part of the human cortex is specialized for these two cultural domains. Since the invention of these cultural activities is too recent to have influenced the evolution of our species, they hypothesize that this specialization results from neuronal recycling: reading and arithmetic are not genetically encoded, but they nevertheless find their niche in a well-suited set of neural circuits.

Note that the hypothesis that the novel brain systems coincidentally allowed mental states corresponding to elements of the perceptual and conceptual substances to meet in our brains to form linguistic signs does not raise what Chomsky (2005, 2007, 2010, 2011) refers to as the Luria/Jacob problem. Though he accepts that pressures to communicate may have played a role in the gradual fine-tuning of language, Chomsky has repeatedly claimed that, at its origin, language could not have evolved due to communicative pressures because this raises a problem:

(7) Luria/Jacob problem: How can a mutation that brings about a better communication system provide any survival advantage to the first single individual who gets it?

A mutation occurs in a single individual, whereas communication takes place between individuals<sup>6</sup>. Under my hypothesis, the offline systems with general representational capacity took on this other function of linking percepts and concepts after they were in place due to a suite of evolutionary pressures. The Luria/Jacob problem does not arise in this approach because the change was not for language or any of its functions like communication or organizing thought. The change produced offline systems. Linguistic signs are a side effect of this neurobiological property. Even if it depended on a mutation (but I doubt this to be the case as indicated above), the new trait could spread in a population because it has evolvability of its own, and all the members of that group are then brain-ready for the innovative side effect when it occurs: by the time words come around, they can be understood by conspecifics.

The advent of some kind of DR system is the crucial small change that made a big difference. This provided the core biological mechanism of the language phenotype—the capacity to link percepts and concepts into signs. Given this capacity and the prior properties of the two substances of the elements linked by signs, the rest of the linguistic properties follow without the need of any additional language-specific rules or conditions. In the next sections, I show how this happens in the three core components of grammar: phonology, semantics, and syntax.

Before turning to these issues, an important question remains to be addressed. Once humans had the capacity to form a limitless number of signs, they developed a capacity to learn and remember a vast set of such signs. How exactly this additional capacity depends on the mechanisms of the first capacity is a question that can now be asked, given my hypothesis. If I am correct in supposing that the DR system is likely to be (part of) what provides humans with a more advanced Theory of Mind (such as a shared attention mechanism and a meta-representation of others' mental states, Baron-Cohen, 1995), and if it also turns out to be correct that word learning strongly depends on an advanced ToM (Bloom, 2000), then the DR system would be crucial for both the capacity to form signs and the capacity to learn and remember them. See the discussion in section 9.5 of Bouchard (2013) and references therein.

# Contrastive Dispersion of Percepts and Combinatorial Phonology

As is the case in other biological systems, DR systems are complemented by epigenetic self-organizing constraints that emerge from interactions among properties of building materials that limit adaptive scope and channel evolutionary patterns (Jacob, 1982; Erwin, 2003). Since the linguistic linking between a percept and a concept is arbitrary (that is, it is not hardwired but made possible by their representations in DR systems), the representation of any percept can potentially be linked to the representation of any concept, and the links can change very rapidly. So there are innumerable possible links. This is compounded by the fact that there are infinitely many incrementally different vocal forms that we can produce and perceive, and an untold number of possible concepts/signifieds because DR systems introduce a detachment from the immediate situation that opens the door to any imaginable situation, presented from a multitude of perspectives. Moreover, there is the logical possibility that individuals will choose different linkings: in the extreme case, each individual would have its own system. Therefore, DR systems introduce an unprecedented sort of chaotic system in the brain. This creates randomness that is confronted with material constraints. As in other situations far from equilibrium, small chance disturbances are progressively amplified by material properties and result in clusterings, in order out of chaos (Prigogine and Stengers, 1984).

In this kind of self-organization, local interactions of components of a system generate complex organized structures on a global scale. In language, the potential chaotic dispersions of arbitrary signs are constrained by the physical and cognitive properties with which the signs are confronted. These constraints restrict the linguistic sign system in a way that maximizes contrastive dispersion and creates clusterings that result in the various properties of language that we observe in phonology, semantics, and morphosyntax.

<sup>6</sup>Pinker (1994: 365), foreseeing the objection, counters that the initial grammar mutant could talk to the 50% of brothers and sisters, and sons and daughters, who shared the new gene.

#### Phonological Segments

The signifier/percept of a sign is the part most noticeably influenced by material properties. Though the representation of any percept at all could in theory become a signifier, the possibilities of the chaotic system are considerably narrowed by material properties of our production systems and perception systems.

A salient property of human vocalizations is that they are perceived as segments: discrete elements. This is a general design feature of human neurophysiology: information that unfolds over time is chunked in the acoustic domain, as well as in other domains such as vision. This is a bilateral stimulus-neutral system of temporal segmentation that operates before feeding specialized lateralized systems such as the processing of speech or music (Poeppel, 2001). Sensory input is analyzed on different timescales by the two hemispheres. High-pass (global) information from short 20- to 50-ms temporal integration windows is passed to left hemisphere areas, whereas low-pass (local) information from long 150- to 250-ms integration windows is passed to the right hemisphere (Poeppel, 2001). However, the issue is still unclear, and recent work shows that lateralization in this case may be weak (Giraud and Poeppel, 2012). These oscillations arise naturally in our perception of vocalizations (Poeppel, 2003; Sanders and Poeppel, 2007), and the temporal integration of vocalizations is reflected as oscillatory neuronal activity. The timings correspond to typical segments and syllables.
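As a rough worked conversion (an illustration added here under the simplifying assumption that one integration window corresponds to one cycle, not a computation taken from the cited studies), a window of duration $T$ corresponds to a rate $f = 1/T$:

$$
f = \frac{1}{T}: \qquad T = 20 \text{ to } 50\ \text{ms} \;\Rightarrow\; f \approx 20 \text{ to } 50\ \text{Hz}, \qquad T = 150 \text{ to } 250\ \text{ms} \;\Rightarrow\; f \approx 4 \text{ to } 6.7\ \text{Hz}
$$

On this assumption, the short windows sample the signal tens of times per second, a grain suited to segment-sized units, whereas the long windows sample it a few times per second, a grain close to that of syllables.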

Similar bilateral segmentation systems appear to be shared by other species; they are the basis of the auditory processing of species-specific vocalizations in macaque monkeys, and the ability of squirrel monkeys to discriminate between conspecific and non-conspecific vocalizations (according to studies reported in Poeppel, 2001). This timing ability is the basis of a system with an important adaptive benefit: a strong change in rhythm signals danger. In sum, we perceive sound as segments, in a digital, not analog way. Segments are perceived as being produced in concatenation.

An important question is what determines the particular repertoire of possible phonemes. Why do digitized vocal percepts cluster in a few particular hot spots among the innumerable, chaotic possibilities we can produce and perceive? As in other chaotic systems, the clusterings depend on frequency and accumulation: chance vocalizations are progressively amplified by material properties pertaining to ease of production and distinctness of perception. On the production side, vocalizations involve the displacement of organs, hence muscular energy. Certain vocalizations are easier to pronounce and require less energy; this is likely to favor their use and increase their frequency (Lindblom, 1992).

The human perceptual systems also set upper bounds on the distinctions that we can perceive or produce as signifiers. Distinctness of expression is particularly important in the case of acoustic information since it is only physically available for a very short length of time and cannot be recovered in the case of an erroneous perception. Nowak et al. (2002) found that the demands of discriminability (as well as memory and time to learn) constrain the system to a fairly small set of signals, an observation already made by Wang (1976, p. 61). The actual repertoire is very small: a few dozen discrete perceptual elements. This observation extends to sign languages that use the gestural modality: there are very few gestural minimal elements, and like phonemes, they are made up of articulatory features (see, for instance, Brentari, 2002). This small set of percepts is a result of self-organization. Vocalizations that are easier to produce and can be more distinctly perceived have a higher frequency of use. As frequencies increase, accumulations occur at certain points in the articulatory–acoustic continuum. Percepts cluster in particular hot spots as a result of this contrastive dispersion. As Lindblom (1992) (following Liljencrants and Lindblom, 1972) indicates, a compromise between perceptual distinctiveness and articulatory cost brings about quasi-optimal perceptual distinctiveness. But this is not sufficient, because the search space is too large for convergence on a structure as complex as the human phonological system. However, if we take into account the properties of building materials, self-organization derives the phonemic clusters. Thus, Carré and Mrayati (1990) and Oudeyer (2005, 2006, 2007) show that canalization by the vocal tract and general acoustic theory define "eight discrete regions of such a tube where deformations, or constrictions, afford greatest acoustic contrast for least articulator effort" (Studdert-Kennedy, 2005, p. 64), and these correspond to places of articulation in natural languages. Thus, vocalic systems most frequently have peripheral vowels, which are the most contrasted (Ménard, 2013).
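The idea of contrastive dispersion can be illustrated with a deliberately minimal sketch. It is not the model of Liljencrants and Lindblom, Carré and Mrayati, or Oudeyer: it assumes a one-dimensional perceptual continuum cut into a small grid, it ignores articulatory cost entirely, and the function names `dispersion_energy` and `most_dispersed` are hypothetical. It simply picks the inventory whose members are least confusable, where confusability is assumed to grow as the inverse square of the distance between two percepts.

```python
from itertools import combinations


def dispersion_energy(points):
    """Assumed confusability score: sum of inverse squared distances over all pairs."""
    return sum(1.0 / (a - b) ** 2 for a, b in combinations(points, 2))


def most_dispersed(grid_size, inventory_size):
    """Exhaustively search positions 0..grid_size-1 for the least confusable inventory."""
    grid = range(grid_size)
    return min(combinations(grid, inventory_size), key=dispersion_energy)


# Three percepts on an 11-point continuum settle on the two edges and the midpoint.
print(most_dispersed(11, 3))  # (0, 5, 10)
```

Even this toy version selects the peripheral positions first, which is in line with the observation that vowel systems favor peripheral, maximally contrasted vowels. Exhaustive search is only feasible because the grid is tiny; the text's point is precisely that realistic search spaces are too large without the canalizing effect of the building materials.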

#### Phonological Combinations

This severe limitation on the number of usable percepts is the source of the clash between the possibilities of the perceptual and conceptual substances. There are innumerable meanings and ways to partition meaning (more on this below), but discriminable speech sounds are limited by the material properties of sound production and perception. The combinatorial formation of signifiers is usually attributed to this clash between the possibilities of the two systems. "If the symbols were holistic vocalizations like primate calls, even a thousand symbols would be impossible to keep distinct in perception and memory" (Jackendoff, 2002, p. 242). In simulations like Oudeyer's, the small number of clusters "automatically brings it about that targets are systematically re-used to build the complex sounds that agents produce: their vocalizations are now compositional" (Oudeyer, 2005, p. 444). How could that be? Where do the compositional processes come from? The answer is again found in the material properties already present in the forms. Vocal units have the following universal material properties:


These acoustic and auditory properties are also distinguishing elements in the signals of other mammals (Lieberman, 1968).

Vocalizations occur in time, and the material properties of vocal articulators are such that we cannot produce more than one vocal unit at a time. This is a contingent property of language production. Since vocal units are aligned in time, our perceptual system captures the linear properties of vocalizations when they are produced, in particular the linear relationship between two vocal units, the most salient one being linear adjacency. The linear adjacency of two vocal percepts is itself a percept and can be represented by a DR system, like any other percept. The relational percept of juxtaposition is already in the stock of our perceptual system; hence, it is available for DR systems that link concepts and percepts. Another material property of vocalizations is intonation; therefore, another perceptual element represented by DR systems is the tone superimposed on a vocal unit, of which there are a few distinctive values due to contrastive dispersion. Similarly, the length and stress of a segment are percepts that can be represented by DR systems, within the limits of distinctive values. Crucially, in an arbitrary system, the percept represented by a DR system and linked to a concept can be any element among those recognized by the perceptual system: a vocal unit, a juxtaposition of units, an intonation, a length, or a stress. Because the system is arbitrary, it makes no difference whether the represented element is simple or complex. The acoustic image can be a single phoneme or the relational percept of juxtaposition applying any number of times to phonemes, as well as any of the available distinctive intonations, lengths, and stresses on these elements. These complex elements remain within the limits of what humans can distinctively perceive or produce because their parts have the appropriate qualities. 
Phonological combinatoriality comes from a material property of the articulatory and perceptual systems, namely the fact that vocalizations are temporally linearized, which entails the percept of juxtaposition. The phonetic data provide information on the source of more abstract principles: segmenting into phonemes, as well as into words/morphemes, already contains computational properties (see DeWitt and Rauschecker, 2011 for combinatorial properties in basic perception). This simple concatenation-recursion of phonemes allows an unlimited derivation of signifiers: any combination of distinguishable percepts can be a signifier. This system is subject to a general law of nature whereby the frequency of an element is inversely correlated with its complexity: the simpler an element is, the more likely it is to appear in nature (cf. Zipf, 1965/1949). Though concatenation-recursion of phonemes can derive infinitely complex signifiers, the simpler ones are much more likely to be formed, produced, or heard. This higher frequency creates accumulations that make the system relatively conservative in terms of the number and complexity of elements that form its signifiers. In addition, production ease and auditory salience influence not only the dispersion of vowels and consonants, but also syllabic templates, or sequences of segments: the combinations of phonemes are subject to phonotactic constraints, such as the energy expended for the transition, which also constrain the nature and number of potential signifiers. The constraints that arise from properties of the articulators and ease of articulation influence what phonemes occur in adjacent positions as early as babbling (MacNeilage and Davis, 2000). The overall complexity of a signifier is also likely to be limited by memory and retrieval capacities.
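The arithmetic behind concatenation-recursion can be sketched as follows. This is an illustrative upper bound only: it assumes that every sequence of phonemes is admissible, so it ignores the phonotactic, memory, and frequency constraints just discussed, and the names `count_signifiers` and `signifiers` as well as the inventory size of 30 are hypothetical choices.

```python
from itertools import product


def count_signifiers(inventory_size, max_length):
    """Number of distinct sequences of length 1..max_length over the inventory."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))


def signifiers(phonemes, max_length):
    """Enumerate signifiers by repeatedly applying juxtaposition to the phonemes."""
    for n in range(1, max_length + 1):
        for combo in product(phonemes, repeat=n):
            yield "".join(combo)


print(list(signifiers("ab", 2)))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
print(count_signifiers(30, 5))    # 25137930
```

With a few dozen discriminable segments, juxtaposition alone already yields millions of short candidate signifiers, which contrasts with the difficulty of keeping even a thousand holistic vocalizations distinct.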

Discrete speech sounds and their combinations emerged because they are consequences of material laws that apply to a certain kind of organism hosting DR systems that can represent elements of their perceptual and conceptual systems and links between them. The chaotic system deriving from these brain systems must have the properties that we observe because the building materials channel the way the system becomes structured into specific self-organizations.

# Contrastive Dispersion of Meanings and Combinatorial Semantics

Segmentation is also a design feature of the human cognitive make-up. We digitize the world and events into discrete chunks, action packages varying from 0.3 to 12 s, mostly 1 to 4 s long (Schleidt and Kien, 1997). As for the ontology of the cognitive units, our perceptual attention systems treat the world as containing two basic kinds of entities (Hurford, 2007a, p. 527):


Another aspect of cognitive segmentation is found in the two types of attention discussed by Humphreys (1998). Global attention captures the gist of the whole scene. In language, this corresponds to something like the main predicate and its arguments. Local attention is subsequent focal attention on local features of individual objects. In language, this corresponds to secondary predicates such as nouns, adjectives, etc.

By allowing detachment, DR systems introduce a chaotic expansion on the meaning side of language: there is an extremely large if not infinite number of potential (offline) concepts. First, the vast number of objects and situations we perceive can all be represented offline as concepts, as well as their properties. This is compounded by the various perspectives we can have on them (Quine, 1960). Moreover, the potential for concept formation is multiplied by the affordance of intra-brain interactions where some neuronal systems are triggered by other brain events. In addition, a particular language can partition the conceptual substance in countless possible ways to delimit its lexical meanings (Saussure's radical arbitrariness). In a system of arbitrary signs, any of these elements treated by the cognitive system could be a meaning represented by DR systems and could be linked to a vocal form.

But this unbridled expansion in meanings is constrained by design features of our cognition. For instance, our global attention process is constrained as to the number of participants that it can take in at a glance: we can subitize at most four salient objects at a time (cf. the "magical number 4" in Hurford, 2007b). Though actual events can involve any number of actants, the chaos of what we observe is organized by subitizing and chunks of four or fewer actants. The recurrence of the perception of these chunks in the environment creates accumulations, and language has settled on predicates with at most four arguments. The chaotic expansion that could potentially arise from linguistic arbitrariness in the meanings of words is also limited in a more general way by material properties and self-organization. Here, too, order arises out of chaos and clusters are formed in the mass of the conceptual substance as a result of frequency and accumulation. In this case, accumulation depends on the material conditions that make the situations denoted by the concept relevant for the organisms. The more a situation has some importance and/or is encountered frequently by the organism, the more frequently concepts associated with it will be activated. The accumulations self-organize around the concepts most used by the organism. It is this usefulness that makes the meanings tend to correspond to fairly broad and/or usual categories of things, actions, qualities, etc. (an observation already found in Locke, 1690/1964, p. 15). Similarly, Nowak et al. (2002, p. 2131) note that "[t]he evolutionary optimum is achieved by using only a small number of signals to describe a few valuable concepts."

Usefulness is also the motivation for the important role played by basic level concepts (Rosch and Mervis, 1975; Rosch et al., 1976). Murphy and Lassaline (1997) argue that the basic level is an optimal compromise between informativeness and distinctiveness: this level is informative, because we can infer many unobserved properties once we know which basic category something belongs to, and distinctive because it is a relatively easy categorization to make. Thus, if you ask someone What are you sitting on?, you are more likely to get the answer chair rather than a subordinate such as kitchen chair or a superordinate such as furniture. Names for basic-level concepts are among the first common nouns learned by children (Brown, 1958).

In fact, we can construct so many particular objects and events and their properties out of reality, potentially an infinite number, that it would not be useful (in a general as well as in an evolutionary sense) to have a distinct meaning for each of them, since most of them recur only very rarely, if at all. This is likely why meanings tend to converge on these hot spots of accumulation.

Even with a large number of U-signs and the possibility of combining them by means of C-signs, the resulting meanings are nevertheless generally quite broad and may correspond to several different situations in the world, including the meanings of sentences. Trying to remedy this underdetermination would force language into ever more complex constructions, to a point where it would be extremely unwieldy. Humans have another prior mental attribute that avoids this problem and favors the cumulative use of broad concepts: a system of pragmatic inferences that derives from a full Theory of Mind (ToM). Given the pragmatic inferences that derive from ToM and the context of utterances, expressions need not have fully determined meanings in order to convey information that is sufficiently precise to be of current use. When two human beings interact, they each have a full ToM, similar cognitive and perceptual systems, and similar contextual information. Therefore, they both know that they have an enormous amount of information in common, and their language faculty does not operate in a vacuum. Using and understanding language involves intensive reliance by speakers on their shared conceptual and contextual knowledge. Pragmatic theories from authors as diverse as Ducrot (1984), Grice (1975), Levinson (2000), and Sperber and Wilson (1986) all share the observation that comprehension is inferential and draws on both sentence meaning and context (in a very broad sense). Since the inferential system is independently grounded, languages do not drift into an unbridled multiplication of meanings redundant with contextual information, but converge on broad, sufficiently informative meanings (Bouchard, 1995; Hoefler, 2009). A similar argument can be made from the perspective of language's other main function, i.e., thought organization.

To sum up, discrete meanings are clusters formed in the mass of the conceptual substance as a result of maximizing contrastive dispersion across the space for signifieds under the effects of frequency and accumulation due to relevance/usefulness. These clusters are relatively few in number and signs tend to have fairly broad meanings. This does not adversely affect the communicative or thinking functions of language because linguistic signs reside in organisms that independently have an inferential system that supplies the required complementary information.

# Syntax

#### The Source of Syntax

Syntactic combination of words and phrases raises the same question as phonological concatenation. Where do the combinatorial tools come from?

If we try to determine what brain systems enable the formal properties of syntactic combinations and the plausibility of these systems given known laws of evolution, it is likely that we will not get very far, because formal systems are only very remotely related to factors involved in evolutionary changes. The system that forms signs (lexicon) and the system that combines signs (syntax) have properties that are so different in current models that they seem quite disconnected. For instance, Chomsky (1995, p. 8) says that matters concerning "the sound–meaning pairing for the substantive part of the lexicon [...] appear to be of limited relevance to the computational properties of language." But that is not so in the approach I adopt. If we look at the physiological and cognitive properties of the elements being combined, a hypothesis emerges with means and a method of confirmation that are clear enough to be verifiable. Since I argue that the syntax of a language is a set of particular combinatorial signs, each with its signified and signifier, I change the ontology of syntax from a formal computational system to a set of neurophysiological elements.

Syntactic compositional processes, i.e., C-signs, are simply functional uses of universal pre-existing properties of vocal sounds and universal pre-existing properties of our cognitive system. Combinatorial syntax is due to the self-organization of these prior vocal and cognitive elements. On the conceptual side, the most frequently represented element is the relation of predication, since it is common to all the attributions of properties. This is compounded by the fact that human brains with DR systems have extended this cognitive process: DR systems can not only attribute properties from sensory inputs to perceived objects but, by operating offline, they can also attribute abstract conceptual properties, not linked to immediate sensory inputs. Predication is the broad meaning par excellence. It is a relation that is broad enough to apply to almost all possible meanings and it is omnipresent in our cognitive system. So it is the meaning that creates by far the strongest concentration point in the chaos of semantic DR systems. The fact that our linguistic system has integrated the predicative function at its core simply reflects the place of this readily exaptable concept in our cognitive system, its high rate of frequency and accumulation.

On the perceptual-physical side, words being made of concatenated phonemes, i.e., of elements with properties of vocal sounds, the most frequent elements are temporal sequencing, and superimposition such as intonation, stress, and length. These traits are always present, so they are by far the most frequent elements in the vocal perceptual system. Thus, the hottest accumulation point in the mass of the conceptual substance is the relational concept of predication, and the hottest accumulation points in the mass of the perceptual substance are the two relational percepts of juxtaposition and superimposition. These accumulation points are so overwhelmingly dominant in their respective domains that they increase the frequency of links involving them to the point where these links inevitably accumulate and crystallize. It follows that when human organisms develop signs due to properties of their prior DR systems, they inescapably develop combinatorial signs involving predication as a meaning and juxtaposition and/or one of the forms of superimposition as a signifier. In short, syntax is a consequence of self-organization arising out of the chaos created by DR systems, as is the linguistic sign.

Syntactic combination arises from prior properties of the conceptual and perceptual substances involved, given general laws of nature concerning highly complex systems, à la Prigogine and Zipf. These cognitive and material design properties have a very strong canalizing effect. In particular, they are all primitive combinatorial processes: predication combines an object and its property; order and juxtaposition hold of two segments; intonation, length, and stress apply to segments. As a result, the sign itself introduces combinatorial systems into the linguistic system, and from these primitive combinatorial systems derive concatenation in phonology and combination in syntax. The logically prior properties of the physical and conceptual components of signs are the source of key design features of language, including the particular type of combinatorial system that it has. Syntax happens to have functional effects that are useful for communication and thought, but they are not the factors that triggered its emergence; they are just fortunate consequences.

#### Type-recursion

In addition to concatenation-recursion, as found in phonology, the syntax of human language exhibits a particular kind of recursion, where an element of type X can be embedded within other X elements indefinitely. I refer to this as type-recursion. We want to know not only why language has recursion, but also why it has type-recursion.

Type-recursion involves more than recognizing nested attributes of objects (an ability that some animals have; Penn et al., 2008, p. 117). To have type-recursion, you need an additional property: the complex signs must have a label; they must belong to a category. If a phrase did not have a labeled category, it could not contain another phrase of the same category.

Since properties of signifiers are essentially those of phonological elements, the types cannot come from these. The source of the typological distinction must be in the signified/meaning. Whether these categories are determined ontologically or functionally is an important question that has been debated for centuries. I will not address it here since it is tangential to the issue. I will simply assume the broad hypothesis that lexical items have categories when they interact in syntax. It is also broadly assumed that the phrasal categories are identical to the lexical categories (Noun, Verb, Adjective, Preposition, Tense, etc.). This is due to the fact that syntactic phrases are endocentric: the category of a phrase always comes from one distinctive component, which we refer to as the head—ultimately a lexical head, a U-sign.

The syntactic properties of headedness and endocentricity derive from prior properties. First, asymmetries in syntactic relations, such as the asymmetry between heads and dependents, come from the fact that predication, the meaning of C-signs, is asymmetrical (Venneman, 1974; Keenan, 1978; Bouchard, 2002): the property expressed by the dependent is attributed to the head. Second, endocentricity derives from the way we cognitively process property-attribution (predication): in our cognitive perception of the world, an object to which we attribute a property remains an element of the same type; in a way, it remains the same object. In language, this means that a noun to which we add an adjective remains a nominal thing; a verb to which we add an argument remains a verbal thing, etc. There is a kind of hyperonymic relation between the head and the phrase (cf. Bauer, 1990; Croft, 1996). Assuming the parsimonious hypothesis that the only syntactic primitives are lexical and combinatorial signs, we derive the Endocentricity Theorem:

(8) Endocentricity Theorem

The category of a constituent X is the category of the element that receives a property by the predication of the C-sign that formed X.

In other words, if X is formed by a C-sign that assigns the property of A to B, then X's category is the category of B (B being the "object" that receives the property of A).

We now see why language has type-recursion: type-recursion occurs whenever a restraining sign or one of its elements happens to be of the same type as the restrained sign whose category projects and determines the category of the complex sign. Type-recursion is a side effect of the combinatorial properties of the substances of signs, interacting with a general cognitive principle of property attribution.
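The Endocentricity Theorem in (8) and the resulting type-recursion can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the theory's formal apparatus: the `Sign` class, the `combine` and `is_type_recursive` functions, the category labels, the receiver-first word order, and the toy analysis of "house near lake" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Sign:
    form: str
    category: str
    receiver: Optional["Sign"] = None  # B: the element that receives the property
    giver: Optional["Sign"] = None     # A: the element whose property is attributed


def combine(giver: Sign, receiver: Sign) -> Sign:
    """Apply a C-sign: the result takes the category of the receiver, as in (8)."""
    return Sign(form=f"{receiver.form} {giver.form}",
                category=receiver.category,
                receiver=receiver,
                giver=giver)


def categories(sign: Sign) -> Iterator[str]:
    """Yield the category of a sign and of every sign it contains."""
    yield sign.category
    for part in (sign.receiver, sign.giver):
        if part is not None:
            yield from categories(part)


def is_type_recursive(sign: Sign) -> bool:
    """True if the restraining part contains an element of the projected category."""
    return sign.giver is not None and sign.category in categories(sign.giver)


house, near, lake = Sign("house", "N"), Sign("near", "P"), Sign("lake", "N")
pp = combine(giver=lake, receiver=near)  # category P projects from "near"
np = combine(giver=pp, receiver=house)   # category N projects from "house"
print(np.form, np.category, is_type_recursive(np))  # house near lake N True
```

No dedicated recursion rule appears anywhere in the sketch: a nominal constituent ends up containing a nominal simply because the category always projects from the receiver and nothing prevents the restraining sign from containing an element of that same category.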

Combinatorial syntax is not a hard-wired property that has evolved at some time. This ability ultimately derives from the particular representational capacity of DR systems that allows the formation of signs. Discrete infinity is a side effect of limitations on chaotic systems like arbitrary language, in interaction with material properties of the sensory-motor and conceptual substances. Both concatenation-recursion and type-recursion derive from the resulting self-organization that takes place. The reason that other animals do not have anything like combinatorial syntax in their communication systems is that they do not have DR systems, which also explains why they do not have unit-signs/words. The crucial leap for a language-ready brain was the development of DR systems that enable the linking of elements of two substances with no logical or natural connection between their elements, so that the linking is purely symbolic. The sole distinctive trait of human language is the capacity to form Saussurean signs. Recursion and discrete infinity are just side effects of this trait.

The ontology of syntax is not a formal computational system, but a set of neurophysiological elements. These elements have high evolvability, in contrast with formal systems. Interestingly, it is by attributing a non-central place to recursion that we can explain how language became type-recursive. My account is in the spirit of Evo-Devo proposals: type-recursion is not due to a specific genetic change but to logically prior properties of the building materials of language.

The Sign Theory of Language has high evolvability with respect to signs and combinatoriality. But of course, a linguistic theory must also pass the test of accounting for the collection of properties that linguists have uncovered about language. I am fully aware that if I am to make the radical claim that syntax is just a small set of C-signs determined by the nature of the sensory-motor and conceptual substances, I must show how that proposal can account for the numerous claims made about the syntax of human languages over the years. Space limitations prevent me from doing that here. But the linguistically inclined reader will find a long discussion in Part IV of Bouchard (2013) that tackles a representative sample of some of the constructions that have been most influential for theoretical argumentation over several decades:


In addition, Bouchard (2002) analyses the distribution and interpretation of adjectives in French and English in exquisite detail, as well as bare noun phrases and bare determiners (clitics).

In all these cases, the unification proposed in STL leads to new insights that allow us to progress in our understanding of language. Many properties that we know about make sense in this model, whereas in classic models they were simply described and left unaccounted for.

# Conclusion

We must understand precisely what language is and have well-defined linguistic phenotypes to search for the neural substrates that enable these phenotypes.

From a linguistic perspective, there are strong reasons to assume that the central trait of human language is the capacity to form signs by linking perceptual forms and meanings, rather than the currently prominent view that puts computational tools with discrete infinity at the core of language. Recursive syntax turns out to be a side effect of sign formation: due to general principles, an organism that develops signs inevitably develops combinatorial signs. The Sign Theory of Language offers a comprehensive and unifying approach to the functioning of the main subsystems of language: by calling on the perceptual and conceptual substances of signs and the self-organization that it triggers, the theory explains specific properties of signs, as well as the basic structuring of language in its phonology, semantics, and syntax.

This change of perspective regarding linguistic phenotypes suggests directing research toward the neural substrates that enable metarepresentational functionalities, detached from sensory input and motoric output. Such Detached Representational (DR) systems need not be language specific. Given biological continuity, it is likely that the neural mechanisms will have broad effects at the functional level. While looking for the neural substrates that enable the formation of linguistic signs, it may therefore be useful to consider possible effects of these substrates on non-linguistic traits that may also depend on DR systems, such as the traits discussed under the Human-specific Adaptive Suite.

Another important question in probing the language-ready brain in adults is how many of the mechanisms are in place at birth and how much of the language system takes form during infant development.

My hope is that the change in perspective to which purely linguistic considerations lead us will illuminate the search for the neural substrates of language, so that we will eventually have a better understanding of what makes, and made, the human brain language-ready.

The general outline of the model is as follows:

(9) **Figure 3**

the most frequent ORC element is predication. These accumulation points are so overwhelmingly dominant in their respective domains that they increase the frequency of links involving them to the point where these links inevitably develop into combinatorial signs (syntax). Given categorization and endocentricity (due to object permanence), the syntax of languages has the formal property of type-recursion.

## References


Ducrot, O. (1984). Le dire et le dit. Paris: Éditions de Minuit.


Pinker, S. (1994). The Language Instinct. New York, NY: William Morrow.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bouchard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.