Edited by: Aaron P. Blaisdell, University of California Los Angeles, USA
Reviewed by: Leslie Phillmore, Dalhousie University, Canada; Shogo Sakata, Hiroshima University, Japan
*Correspondence: K. Gillespie-Lynch, Department of Psychology, 4S-234, College of Staten Island, CUNY, 2800 Victory Boulevard, Staten Island, NY 10314, USA. e-mail:
This article was submitted to Frontiers in Comparative Psychology, a specialty of Frontiers in Psychology.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
Using a naturalistic video database, we examined whether gestures scaffold the symbolic development of a language-enculturated chimpanzee, a language-enculturated bonobo, and a human child during the second year of life. These three species constitute a complete clade: species possessing a common immediate ancestor. A basic finding was the functional and formal similarity of many gestures between chimpanzee, bonobo, and human child. The child’s symbols were spoken words; the apes’ symbols were lexigrams – non-iconic visual signifiers. A developmental pattern in which gestural representation of a referent preceded symbolic representation of the same referent appeared in all three species (but was statistically significant only for the child). Nonetheless, across species, the ratio of symbol to gesture increased significantly with age. But even though their symbol production increased, the apes continued to communicate more frequently by gesture than by symbol. In contrast, by 15–18 months of age, the child used symbols more frequently than gestures. This ontogenetic sequence from gesture to symbol, present across the clade but more pronounced in child than ape, provides support for the role of gesture in language evolution. In all three species, the overwhelming majority of gestures were communicative (i.e., paired with eye contact, vocalization, and/or persistence). However, vocalization was rare for the apes, but accompanied the majority of the child’s communicative gestures. This species difference suggests the co-evolution of speech and gesture after the evolutionary divergence of the hominid line. Multimodal expressions of communicative intent (e.g., vocalization plus persistence) were normative for the child, but less common for the apes. This species difference suggests that multimodal expression of communicative intent was also strengthened after hominids diverged from apes.
The idea that language evolved from a primarily gestural mode of communication is centuries old (Condillac,
Because behaviors such as language and gesture do not fossilize, evolutionary links between gesture and language are impossible to prove. However, there is strong evidence in favor of an ontogenetic relationship between gesture and language. Gestures may allow infants to refer to objects before mastering their names and to gain input about relations between words and objects (see Iverson and Goldin-Meadow,
Children typically begin using gestures several months before they begin using words (Goldin-Meadow and Morford,
Focusing on objects that were referred to in one modality before appearing in another modality at a later observation, Iverson and Goldin-Meadow (
More generally, we hypothesized that all three species would exhibit a shift from greater reliance on gestures to greater reliance on symbolic communication with development. Such evidence would support the gestural theory of the evolution of language. Because evolution is a series of ontogenetic sequences, with earlier stages more preserved in evolutionary history than later ones, cross-species similarities in early developmental sequences provide relevant evidence for reconstructing phylogenetic history (Parker and McKinney,
The logic of cladistic analysis is such that traits found across an entire clade (defined as species with a common immediate ancestor) are likely to have existed in the common ancestor (Parker and McKinney,
One of the primary challenges in comparing gestures across species is that definitions of gesture vary across studies. Gesture has been defined as specifically as communicative movements of the hands and as broadly as any visible bodily action (Kendon,
Varying definitions of communicative intention, or evidence that a gesture was emitted in order to influence another, also complicate cross-species comparisons of gestures. Communicative intention is often indexed by the presence of attention-getting behaviors (such as vocalization), monitoring the attentional state of the addressee (e.g., gaze alternation between addressee and referent), or persistence in maintaining a gesture until a response is elicited (Bates et al.,
Captive bonobos, chimpanzees, gorillas, and orangutans display clear evidence of communicative intention, or monitoring the attentional states of others, while gesturing. For example, they more frequently use purely visual gestures when their audience is facing them and communicate with vocalizations more when their audience is facing away (Tomasello et al.,
Although many studies of human development use eye contact to infer that a gesture is communicative, the majority of gestures produced by humans between 12 and 21 months of age may not co-occur with eye contact (Blake et al.,
Previous research comparing human infants to captive apes (at a mean age of 18 years) and language-enculturated adult apes revealed that apes exhibit more eye contact when gesturing than human infants do (see Leavens and Hopkins,
Unlike pre-linguistic children, apes who are not language-enculturated produce primarily dyadic (referring to the recipient of the gesture) rather than triadic (indicating a third entity) gestures (Camaioni,
Language-enculturated adult apes may exhibit more pointing with the index finger relative to reaching gestures than both captive apes (at a mean age of 18 years) and human infants younger than 19 months of age (Leavens and Hopkins,
However, pointing increases with age for human infants (Locke et al.,
There is also evidence that young language-enculturated apes use their lexigrams (non-iconic visual signifiers) to request more often than to indicate (Greenfield and Savage-Rumbaugh,
Tomasello (
A chimpanzee, a bonobo, and a human child participated in the current study. The ape participants were Panpanzee, a female chimpanzee (
Inter-individual routines consisted of play and exploration both within the apes’ living space and while foraging through the forest outside their home. Although the same lexigrams were available both indoors and outdoors, the lexigram boards used while exploring the woods were plastic-covered printed sheets, whereas the keyboards inside were electronic. When a lexigram on one of the indoor keyboards was pressed, an electronic voice spoke the word that the lexigram represented. Lexigram boards used during exploration were designed to be highly portable and did not emit words when pressed. In order to capture all possible communication on video, caregivers spoke the word for each lexigram touched on the more portable lexigram boards. The apes understand human speech and often respond to a caregiver’s speech through lexigrams and/or gestures (Savage-Rumbaugh et al.,
The human participant in this study was a typically developing girl, GN, who was reared by her middle-class parents in a typical European-American linguistic environment. The observations were done at home in naturally occurring situations, usually, but not always, indoors.
Video data of the bonobo, Panbanisha, and the chimpanzee, Panpanzee, were recorded from soon after birth until Panpanzee was moved to a new location when she was 3 years and 11 months of age. Biweekly or monthly recordings of varying length were conducted until the apes were 26 months of age; subsequent recordings occurred every few months. Monthly hour-long videos of the child, GN, were recorded from 8.5 months of age until almost 2 years of age. In a few instances, it was necessary to return on a second day to complete the hour for a particular month. In each case, the video was naturalistic; there was no attempt to make the settings across species more similar than they actually were.
Following the methods of Iverson and Goldin-Meadow (
While the bonobo, Panbanisha, first used lexigrams communicatively at 11 months of age, the chimpanzee, Panpanzee, began lexigram use at 13 months of age (Brakke and Savage-Rumbaugh,
As in Iverson and Goldin-Meadow’s study, the offset of data analyses for the child coincided with clear evidence of multiword speech, operationalized as five occurrences of different word combinations (18 months). Because the apes combined lexigrams less frequently than the child combined words and continued to use mainly single words throughout the study, the offset of ape data analysis was determined by the availability of usable data. After 26 months of age, no videos were available of the chimpanzee, Panpanzee, until she was 30 months of age. Thus, data analysis was terminated at 26 months of age for both apes. Generally, videos focused on only one ape at a time. When videos included both apes engaging in activities with one another, the video could be coded for either ape as long as the ape was visible for the majority of the sampled video. GN’s data captured daily interactions at home and in her backyard in various contexts (such as eating breakfast, celebrating a birthday, playing with dolls, etc.) to give a reflection of normal daily interactions with her parents and other people. The environment in which the child was filmed more closely approximated the environment of participants in the Iverson and Goldin-Meadow (
Coding schemes for both apes and the child were developed based on methods developed by Iverson and Goldin-Meadow (
Lexigram use (for the apes) was defined as touching a lexigram while the referent was glossed by caregiver or electronic voice on the lexigram board (see Figure
Gestures were coded according to their form into one of the following categories:
Other gestures were exhibited by the child, but not the apes; these included
Still others were exhibited only by the chimpanzee. Only the chimpanzee was observed to once exhibit a
When gestures were deictic, they were also assigned a likely referent. Two clues to reference were used: the caregiver’s behavioral or verbal response to the gesture and the object or person which the gesture pointed toward. Gestures that involved reaching or pointing into the distance with no visible referent with the likely intention of causing motion in the indicated direction were interpreted as meaning
Each gesture and lexigram was also coded as either communicative or non-communicative. Communicative gestures or lexigrams possessed at least one of the following properties: persistence, eye contact, or vocalization (note: vocalization is different from speech). Persistence involved repeating a gesture or lexigram use at least two times in a row, going out of one’s way to communicate, or maintaining a communication until responded to. Eye contact involved turning the head toward or looking at a caregiver’s face immediately before, after, or during the gesture. Vocalization involved vocalizing at the same point in time as a communication or immediately prior to or after it.
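The coding rule described above can be illustrated with a minimal sketch. It assumes each coded act is stored as three yes/no marker judgments; the class and function names (`CodedAct`, `is_communicative`, `is_multimodal`) are hypothetical and are not taken from the study's coding materials. An act counts as communicative when at least one marker is present and as multimodal when more than one is present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedAct:
    """One coded gesture or lexigram use (field names are illustrative)."""
    persistence: bool
    eye_contact: bool
    vocalization: bool


def marker_count(act: CodedAct) -> int:
    """Number of communicative-intent markers present on the act."""
    return sum((act.persistence, act.eye_contact, act.vocalization))


def is_communicative(act: CodedAct) -> bool:
    """Communicative: at least one of persistence, eye contact, vocalization."""
    return marker_count(act) >= 1


def is_multimodal(act: CodedAct) -> bool:
    """Multimodal: more than one marker of communicative intent."""
    return marker_count(act) >= 2


# Hypothetical examples, not observations from the video database.
reach_with_gaze = CodedAct(persistence=False, eye_contact=True, vocalization=False)
point_with_gaze_and_voice = CodedAct(persistence=False, eye_contact=True, vocalization=True)
unmarked_movement = CodedAct(persistence=False, eye_contact=False, vocalization=False)
```

In this sketch a reach accompanied only by eye contact is communicative but not multimodal, whereas a point accompanied by both eye contact and vocalization is both.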
We also recorded for each gesture and lexigram whether the behavior was an imitation of an immediately preceding behavior by the caregiver.
Inter-rater reliability was established by calculating the percentage agreement, or the frequency with which both coders made the same decision divided by the sum of agreements and disagreements, between two independent coders for the existence, type, and quality of gestures. This was a conservative measurement of reliability because agreement on all of the behaviors that were
Coding decision (% agreement) | Panpanzee | Panbanisha | Human child |
---|---|---|---|
Gesture/symbol existence | 83 | 81 | 80 |
Gesture/symbol type | 80 | 82 | 98 |
Referent | 94 | 87 | 95 |
Eye contact | 89 | 90 | 83 |
Vocalization | 100 | 97 | 93 |
Persistence | 87 | 84 | 90 |
Communicativeness | 81 | 81 | 97 |
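As a rough illustration of the reliability measure, percentage agreement is the number of agreements divided by the sum of agreements and disagreements, expressed as a percentage. The sketch below assumes that the two coders' decisions are stored as parallel lists, one entry per coded event; the function name and the example codes are hypothetical and do not reproduce the study's data.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of coding decisions on which two coders agree.

    Computed as agreements / (agreements + disagreements) * 100.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must have coded the same events.")
    if not coder_a:
        raise ValueError("At least one coded event is required.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    disagreements = len(coder_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical gesture-type decisions for five events.
coder_a = ["point", "reach", "reach", "point", "show"]
coder_b = ["point", "reach", "point", "point", "show"]
agreement = percent_agreement(coder_a, coder_b)  # 4 of 5 decisions match
```

Percentage agreement does not correct for chance agreement, so it is best read alongside the number of categories available for each coding decision.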
The most basic finding, and one that is central to the gestural theory of language evolution, is the similarity of gestures among bonobo, chimpanzee, and human child at comparable periods of development (see examples in the video frames presented in Figures
Following Iverson and Goldin-Meadow (
Figure
Three findings concerning the expression of communicative intent are of particular relevance to the evolution of language: one is that all three species use the complete array of behaviors that signal communicative intent: eye contact, vocalization, and persistence (Figure
The third important finding concerning the expression of communicative intent was not foreseen: the child used a much higher proportion of multimodal expressions of communicative intent than the apes did. 84% of the child’s communicative gestures utilized more than one means of signaling communicative intent; in contrast, only 23% of the chimpanzee’s communicative gestures and 22% of the bonobo’s communicative gestures utilized more than one means of signaling communicative intent [human vs. chimpanzee: χ2(1) = 314.901,
We expected to find that gestures preceded symbol use more often than the reverse for the human child and the language-enculturated apes. Following Iverson and Goldin-Meadow (
We also hypothesized that all three species would exhibit a shift from greater reliance on gestures to greater reliance on symbols (words for the child, lexigrams for the apes) with increasing age. In order to ensure that a varied range of contexts was represented when assessing patterns of communicative development, we compared the frequency of gesture and symbol use during the first half of the study to the frequency of gesture and symbol use during the second half of the study. Thus, we compared observations from the first 7 months of the study to observations from the last 7 months of the study for the apes and observations from the first 4 months of the study to observations from the last 4 months for the child. Because there was an odd number of data points for the apes, data from their 19th month of age, the middle data point, were excluded from analysis. Analyses focus on frequency of use rather than the number of referents referred to within a given modality.
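The half-split described above can be sketched as follows. The function name is hypothetical. The child's months (11 through 18) follow the age ranges reported in the text; the ape months (12 through 26) are an assumption chosen so that the 19th month falls in the middle, since the exact list of analyzed months is not given here.

```python
def split_halves(months):
    """Split observation months into equal first and second halves.

    With an odd number of data points, the middle month is excluded,
    so both halves contain the same number of observations.
    """
    ordered = sorted(months)
    half = len(ordered) // 2
    first = ordered[:half]
    second = ordered[len(ordered) - half:]
    excluded = ordered[half:len(ordered) - half]
    return first, second, excluded


# Assumed ape observation months: 15 data points, so the middle one is dropped.
ape_first, ape_second, ape_excluded = split_halves(range(12, 27))

# Child observation months: 8 data points, so nothing is dropped.
child_first, child_second, child_excluded = split_halves(range(11, 19))
```

Under these assumptions the ape halves cover months 12 to 18 and 20 to 26, with month 19 excluded, and the child halves cover months 11 to 14 and 15 to 18.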
Between 11 and 14 months of age, GN, the child, produced 211 communicative gestures and 23 words during observation sessions. Between 15 and 18 months of age, she produced an average of 219 communicative gestures and 513 words during observation sessions (Figure
Given that children indicate more, whereas language-enculturated apes request more on the symbolic level, we expected language-enculturated apes to exhibit a greater proportion of reaches relative to points when compared to a human toddler of a similar age. We focused upon canonical examples of communicative reaching and pointing and excluded from analysis head-points, point-touches, and reach-touches.
Visual inspection of the frequency of different communication types across the clade (depicted in Figure
The child produced 138 points and 151 reaches over the course of the study. The bonobo produced 11 points and 271 reaches. The chimpanzee produced 17 points and 358 reaches. Fisher’s tests revealed that the child produced a higher proportion of points relative to reaches than the bonobo (
Exploring the idea that only human children are motivated to share experience for its own sake, we also compared the frequency of showing gestures in child, chimpanzee, and bonobo. In support of the idea that showing something to another may be uniquely human, the child was the only one to use a showing gesture – although the showing gesture was still not very frequent.
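The point and reach counts reported above can be checked with a small, self-contained sketch of a two-sided Fisher's exact test. The original analysis software and the sidedness of the tests are not stated here, so the two-sided form and the function names are assumptions made for illustration only.

```python
from math import comb


def point_proportion(points, reaches):
    """Share of canonical points among points plus reaches."""
    return points / (points + reaches)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table. Exact
    integer arithmetic is used until the final division.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalized probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return total / denominator


# Counts of points and reaches as reported in the text.
child = (138, 151)
bonobo = (11, 271)
chimpanzee = (17, 358)

p_child_vs_bonobo = fisher_exact_two_sided(*child, *bonobo)
p_child_vs_chimpanzee = fisher_exact_two_sided(*child, *chimpanzee)
```

With these counts, points make up roughly 48% of the child's points and reaches, compared with about 4% for the bonobo and 5% for the chimpanzee, and both comparisons yield p-values far below .001 under the two-sided assumption.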
The current study, with its unique naturalistic video database for a young chimpanzee, bonobo, and human child, provides support for the role of gesture in language evolution. At the most basic level, we see a functional and formal similarity in gesture in all three species, with a young bonobo, chimpanzee, and child all at comparable periods of communicative development. Gestures served a communicative function across species in that they were usually paired with evidence of communicative intent. Similar types of gestures were also observed across species. According to the logic of cladistic analysis, these shared gestural capacities were likely present before the divergence of the three species five or six million years ago. Acknowledging the likelihood of gesture as a biological capacity in the clade’s common ancestor, it is nonetheless impossible to know to what extent and how it was actualized in behavior at that time. Still, given that human language, as we now know it, had not yet evolved at that time, this is one line of evidence for the gestural foundation of human language evolution.
The ontogenetic precedence of gesture before symbol across the clade provides another line of evidence for the gestural theory of language evolution. The frequency of symbol relative to gesture use increased with development across the clade. While phylogeny does not repeat ontogeny, it is the case that later stages of development cannot evolve without the ontogenetic foundation of earlier stages already being present (Parker and McKinney,
However, there was also evidence of the phylogenetic divergence of humans and apes in the domain of communication. Symbols (in the form of words) became more frequent than gestures in the child’s later observations, whereas gestures remained more frequent than symbols (in the form of lexigrams) for the chimpanzee and bonobo throughout the study period. While the same qualitative pattern of reference being first achieved through gesture and only later through symbols was observed across the clade, this pattern was statistically significant only for the human participant. These developmental patterns are in line with the subsequent evolution of complex language in
Like atypically developing humans communicating with typically developing humans, apes may face communicative barriers when trying to communicate with humans that they would not face when communicating with other apes. Findings from atypical human developmental trajectories suggest that developmental changes in gesture relative to symbol use may depend upon the match between individuals and communicative modalities. While most human children move from more gestures to more words with development, blind children do not exhibit this pattern (Iverson and Goldin-Meadow,
Given limitations of the lexigram system devised to help apes communicate with us, compared with the flexibility of human speech, gestures may be a better match than symbols for apes but not humans. Unlike human speech, lexigram boards are not always available and have a constrained number of possible referents (a maximum of 256). Thus, a combination of gestures and symbols may confer some of the referential flexibility to language-enculturated apes that speech comes to provide to humans with development.
However, differences between the human and the apes in the observed frequency with which items transitioned from gesture to speech may also be attributable to the greater variety of contexts in which ape communication was observed relative to the human child; this greater variety of contexts greatly reduced the occurrence of the same referent across time, making the sample size too small to attain statistical significance. The human participant was assessed in a constant home environment, similar to that used by Iverson and Goldin-Meadow (
It is also important to note that the distinction between gestures and symbols made for the purposes of the current study is somewhat arbitrary (see Kendon,
It is important to note that all three species of the clade used gesture communicatively and that all three species exhibited the same set of markers of communicative intent: eye gaze, vocalization, and persistence. Cladistic analysis suggests that these markers of communicative intent in the gestural modality were present in our common ancestor five to six million years ago. The combination of gesture and vocalization may have particular importance in language origins (Cartmill and Maestripieri,
In line with our hypothesis, the human child more frequently paired gestures with vocalization than the apes. This association of gesture and vocalization, as well as the existence of gestures unique to the child (e.g., nodding, waving), constitute additional evidence suggesting the co-evolution of gesture and speech after the evolutionary divergence of the hominid line five to six million years ago.
Contrary to our hypothesis, and to previous comparisons of older apes to human children (Leavens and Hopkins,
The human child produced a far greater number of pointing gestures than did the apes; in contrast, the apes produced a greater number of reaching gestures than did the child. Only the human child produced showing gestures. Together these findings provide gestural evidence that ape communication is more instrumental than that of human children; in contrast, children gear their communication more to the sharing of experience with another (Tomasello,
However, we must not forget that both pointing and reaching were present in all species, in the same way that both declarative and imperative symbol productions are present later in development of the same apes, as well as two human children (Lyn et al.,
The finding that the human child pointed more relative to reaching than the apes is again contrary to previous comparisons of older language-enculturated apes and human infants (Leavens and Hopkins,
When comparing development across three species, it is difficult to equate the species in terms of developmental level as skills are likely to develop at variable rates across species. Having more representatives of each species could increase our understanding of normative measures of development in each species and allow us to compare developmental stages across species more effectively. Additionally, a greater number of representatives of each species would allow us to disentangle species and individual differences.
Reliability between coders was substantially higher for some of the coding decisions (particularly for type of communication and for whether or not it was communicative) for the human child than it was for the apes. Nonetheless, inter-rater reliability reached acceptable standards for every species. These differences in coding reliability between the human child and the apes could be due to poorer video quality for the ape data and to difficulty on the part of human coders in coding ape gestures.
While lexigrams share a number of important similarities with words, such as an arbitrary correspondence between symbol and referent, they also have key differences. In particular, ideas expressed in lexigrams do not always have a one-to-one correspondence to ideas expressed with words. For example, the lexigram “Sue’s-gate” is a single lexigram that could mean either a landmark or a more complex relation between a gate and a person. However, given that the semantic complexity of symbols was not our object of study, this difference should not have affected our results.
A more important difference between lexigrams and words is that lexigrams could be coded only when interpreted by a human caregiver or glossed by a machine, whereas human speech needed only to be responded to in order to be coded. We did not count the lexigrams that were neither interpreted nor glossed and were thus excluded from analyses. However, given that each ape was typically paired with a single caregiver who was intent on encouraging and capturing all of the ape’s communicative attempts, it is likely that only a small proportion of the apes’ lexigram use went unrecorded in the current analyses. In any case, it is likely that a small proportion of the child’s verbal communications were not responded to, so it is possible that there was no difference in selectivity between ape and child communication.
Future research should also examine the emergence of imperative (requests for something to be granted) or declarative (attempts to cause another to see what one sees: Bates et al.,
In order to better investigate ontogenetic and phylogenetic relations between gesture and speech, future cross-species comparisons should distinguish between dyadic and triadic gestures as well as between deictic, iconic (or picture-like), and representational gestures. Dyadic gestures, referring to another, may be more developed in apes than triadic gestures, referring to objects. Even within triadic gestures, it is possible that apes use them to refer to other living beings more, while children in industrial societies use them to refer more to inanimate objects.
With respect to action gestures, Tanner and Byrne (
What does this study tell us about the relationship between symbols and gestures? It provides evidence of a phylogenetic and ontogenetic transition from gesture to symbol. At the same time, it provides new evidence for the co-evolution of gesture and speech. The study documents clear similarities and differences in the ontogeny of communication of a chimpanzee, a bonobo, and a human child. The similarities provide insights into shared potential that could have helped our ancestors develop language from gesture. The differences suggest ways in which humans may have diverged from other members of the clade in their communicative development.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We dedicate this manuscript to the memory of Panbanisha, who was taken from us too soon. We would like to thank Jana Iverson for generous feedback on study design, Cristina Khou for help developing the coding scheme, and Goldie Salimkhan for organizing the video data. Data preparation was supported by a grant from the Leakey Foundation to Patricia Greenfield and by funding from the FPR-UCLA Center for Culture, Brain and Development to Kristen Gillespie-Lynch. Ape data collection took place at the Language Research Center, Georgia State University, and was supported by grants from NICHD to Duane Rumbaugh and Sue Savage-Rumbaugh. We gratefully thank the parents of GN for the opportunity to video record their family for this study. Author Gillespie-Lynch is currently in the Department of Psychology, College of Staten Island, City University of New York. Author Feng is currently in the Department of International Education, Columbia University. All federal and local regulations were followed regarding the use of animals in research. All research activities were approved and overseen by the Institutional Animal Care and Use Committees and the Institutional Review Boards at the respective institutions.