# ORIGIN OF TROPICAL DIVERSITY: FROM CLADES TO COMMUNITIES

EDITED BY: James Edward Richardson and R. Toby Pennington

PUBLISHED IN: Frontiers in Genetics and Frontiers in Ecology and Evolution

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

> *The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-050-3 DOI 10.3389/978-2-88945-050-3


# **ORIGIN OF TROPICAL DIVERSITY: FROM CLADES TO COMMUNITIES**

Topic Editors:

**James Edward Richardson,** Universidad del Rosario, Colombia & Royal Botanic Garden Edinburgh, UK

**R. Toby Pennington,** Royal Botanic Garden Edinburgh, UK

Savannah, River Tarangire, Tanzania. Photo Flávia Pezzini.

In this volume we aimed to assess progress in determining the processes by which current patterns of tropical biodiversity were established and are maintained. Tropical regions are highly species-rich and we present studies that have improved our understanding of the generation of that diversity at local, regional and global scales. We demonstrate how diverse fields from molecular phylogenetics, phylogeography, palaeontology and palaeoecology continue to improve our understanding of the natural history of the tropics.

**Citation:** Richardson, J. E., Pennington, R. T., eds. (2016). Origin of Tropical Diversity: From Clades to Communities. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-050-3

# Table of Contents

## **General Conceptual or Methodological Articles**

*Ecological speciation in the tropics: insights from comparative genetic studies in Amazonia*

Luciano B. Beheregaray, Georgina M. Cooke, Ning L. Chao and Erin L. Landguth

## **Taxon Specific Articles**

Thomas L. P. Couvreur, W. Daniel Kissling, Fabien L. Condamine, Jens-Christian Svenning, Nick P. Rowe and William J. Baker

*Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae*

Kate E. Armstrong, Graham N. Stone, James A. Nicholls, Eugenio Valderrama, Arne A. Anderberg, Jenny Smedmark, Laurent Gautier, Yamama Naciri, Richard Milne and James E. Richardson

*Corrigendum: Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae*

Kate E. Armstrong, Graham N. Stone, James A. Nicholls, Eugenio Valderrama, Arne A. Anderberg, Jenny Smedmark, Laurent Gautier, Yamama Naciri, Richard Milne and James E. Richardson

## **Area Specific Articles**

*An engine for global plant diversity: highest evolutionary turnover and emigration in the American tropics*

Alexandre Antonelli, Alexander Zizka, Daniele Silvestro, Ruud Scharn, Borja Cascales-Miñana and Christine D. Bacon

Lisa Pokorny, Ricarda Riina, Mario Mairal, Andrea S. Meseguer, Victoria Culshaw, Jon Cendoya, Miguel Serrano, Rodrigo Carbajal, Santiago Ortiz, Myriam Heuertz and Isabel Sanmartín

*Continental scale patterns and predictors of fern richness and phylogenetic diversity*

Nathalie S. Nagalingum, Nunzio Knerr, Shawn W. Laffan, Carlos E. González-Orozco, Andrew H. Thornhill, Joseph T. Miller and Brent D. Mishler

Rosane G. Collevatti, Levi C. Terribile, José A. F. Diniz-Filho and Matheus S. Lima-Ribeiro

*Evidence for an intrinsic factor promoting landscape genetic divergence in Madagascan leaf-litter frogs*

Katharina C. Wollenberg Valero

*Comparative phylogeography of eight herbs and lianas (Marantaceae) in central African rainforests*

Alexandra C. Ley, Gilles Dauby, Julia Köhler, Catherina Wypior, Martin Röser and Olivier J. Hardy

*Microrefugia and species persistence in the Galápagos highlands: a 26,000-year paleoecological perspective*

Aaron F. Collins, Mark B. Bush and Julian P. Sachs

# Editorial: Origin of Tropical Diversity: From Clades to Communities

James E. Richardson<sup>1,2</sup>\*† and R. Toby Pennington<sup>2</sup>†

<sup>1</sup> Programa de Biología, Universidad del Rosario, Bogotá, Colombia, <sup>2</sup> Tropical Diversity Section, Royal Botanic Garden Edinburgh, Edinburgh, Scotland

Keywords: tropical, diversity, phylogeny, phylogeography, ecology, paleontology, community

**The Editorial on the Research Topic**

#### **Origin of Tropical Diversity: From Clades to Communities**

For centuries, one of the main questions about the Earth's biological diversity has been what causes the comparatively high diversity in tropical zones, particularly the Neotropics. This volume aimed to demonstrate how molecular phylogenetics, phylogeography, paleontology, and paleoecology contribute to our understanding of the processes that gave rise to diversity in the tropics.

The response of authors to our initial invitation to participate was overwhelming, and the resulting manuscripts are brought together in this eBook. We have attempted to order the contributions, placing first those that deal with general conceptual or methodological issues and that are relevant to all parts of the tropics. These included studies on how tropical diversity was generated and/or is maintained at local scales (Cannon and Lerdau; Beheregaray et al.; Collevatti et al.). The next group of articles focuses on specific taxa, though in most cases these are distributed throughout the tropics, and so address global questions. The final group of articles addresses studies confined to single major tropical regions, including Africa and Australia.

#### Edited by:

Federico Luebert, University of Bonn, Germany

#### Reviewed by:

Loïc Pellissier, ETH Zurich and WSL Birmensdorf, Switzerland

\*Correspondence:

James E. Richardson jamese.richardson@urosario.edu.co

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 08 June 2016 Accepted: 03 October 2016 Published: 20 October 2016

#### Citation:

Richardson JE and Pennington RT (2016) Editorial: Origin of Tropical Diversity: From Clades to Communities. Front. Genet. 7:186. doi: 10.3389/fgene.2016.00186

## GENERAL AND CONCEPTUAL CONTRIBUTIONS

The first group of general and conceptual papers leads with Cannon and Lerdau, who presented a study suggesting that inter-specific gene flow may play an important role in preventing extinction, particularly of rare species, and thus in maintaining species diversity. Although the number of well-corroborated examples of hybridization amongst tropical plant species is increasing (examples include Duminil et al., 2012), assessing the generality of Cannon and Lerdau's suggestion will require many more studies of the frequency of interspecific crosses in tropical floras. Beheregaray et al. reviewed studies that have demonstrated ecological speciation in the tropics and encouraged an approach using a conceptual and analytical framework based on genetic data and simulation modeling. Collevatti et al. presented statistical and macroecological approaches in a multi-model inference framework that can be applied to comparative phylogeography. The method employs direct reconstruction of lineage divergence in time and space together with ecological niche modeling and coalescent simulation. This multi-model approach is suggested to be more likely to determine correctly the micro-evolutionary processes and spatial context of lineage divergence.

General explanations for the latitudinal gradient in species richness were postulated by Hurlbert and Stegen and through empirical studies (e.g., Richardson et al.; Antonelli et al.). Hurlbert and Stegen highlighted the deficiencies in evaluating the predictions of a single diversity hypothesis that does not rule out alternatives. Richardson et al. and Antonelli et al. both indicated that tropical lineages of plants had not evolved at a faster rate than temperate ones, and supported the hypothesis that the tropics are more species rich because tropical conditions occupied greater areas and have existed for longer periods than temperate ones. Antonelli et al. analyzed c. 22,600 species and c. 20 million geo-referenced records of angiosperms and suggested that differential rates of diversification were not responsible for the latitudinal gradient, as there were no significant differences in speciation and extinction rates between tropical and non-tropical lineages. However, Pyron and Wiens (2013) showed that in amphibians the latitudinal gradient could be explained by a combination of higher speciation in the tropics, greater extinction in temperate zones, and low dispersal out of the tropics, compared with an alternative explanation of colonization of the tropics from temperate regions.

## TAXON BASED CONTRIBUTIONS

Although at global scales diversification rates of plants between tropical and temperate regions do not appear to differ in the studies of Antonelli et al. and Richardson et al., care must be taken in accounting for regional differences in diversification rates. Richardson et al. indicated a more rapid diversification for the neotropical genus Theobroma and allies than in related lineages from other tropical and temperate regions. The diversification of Theobroma coincided with, and may have been driven by, Andean uplift, supporting the ideas of Gentry (1982) that the greater diversification in the Neotropics might have resulted from the greater and more recent orogenic events there in comparison with Africa or Asia.

In this research topic, we aimed to emphasize the whole tropics because a global synthesis is necessary to place varying local patterns and processes in context. This global perspective was covered in a review of patterns in diversity in tropical reef fishes (Cowman et al.), diversification processes in climbing palms (Couvreur et al.), and in tropical trees (Armstrong et al.; Weeks et al.). Armstrong et al. indicated that Neotropical lineages of the predominantly lowland tropical forest genus Manilkara (Sapotaceae) had a faster diversification rate than African or Asian ones. However, comparisons of the capacity of closely related families to evolve into distinct biomes are also necessary. Weeks et al. compared climatic niche evolution in two broadly distributed sister families. In Anacardiaceae, adaptation to different climates seems to have played a prominent role in diversification, while in Burseraceae much less so. Couvreur et al. demonstrated how the morphological changes necessary to evolve from non-climbing to climbing habit may have played an important role in palm diversification leading to the recent speciation of one-fifth of extant palm species.

## REGIONAL CONTRIBUTIONS

The Neotropics perhaps remain the major tropical region that is best studied in the field of evolutionary diversification. In this volume, Antonelli et al. demonstrated that the American tropics are the area with the highest evolutionary turnover and rate of emigration in comparison to other major tropical regions. Carrillo et al. highlighted that, for mammals, the Great American Biotic Interchange between the Laurasian and Gondwanan land masses of the Americas began c. 7–10 mya and that the first migrations involved temperate taxa, although this result may have been skewed by greater sampling at higher latitudes. Patterns of migration between Laurasian Central America and Gondwanan South America were also assessed in Malpighiaceae (Willis et al.), indicating that a remarkable amount of dispersal from South to Central America contributed more to increasing phylogenetic diversity there than in situ diversification.

Based on paleoecological evidence, it was thought that Quaternary glacial cycles fueled neotropical diversification through the alternation of glacial aridity and warmer, wetter inter-glacials that led to contraction and expansion of species ranges (e.g., Haffer, 1969). The universality of "refuge theory" was subsequently challenged by data that suggested that many extant species were in fact of pre-Pleistocene origin (e.g., reviewed by Moritz et al., 2000). This debate has primarily been argued in studies of Neotropical lineages but processes may have been different in other tropical regions. In this collection of papers, Damasceno et al. assessed the neglected vanishing refuge model (VRM), first described by Vanzolini and Williams (1981), which describes a process of diversification that implicates both vicariance and divergent selection. For the neotropical lizard species, Coleodactylus meridionalis, Damasceno et al. found that environmental and genetic analyses fitted the predictions of the VRM, but physiological data did not. Based on fossil data, Collins et al. demonstrated species persistence in the Galápagos highlands in mesic micro-refugia during the last glacial maximum. It was suggested that these micro-refugia may be areas of preservation and resilience to increasing aridity as global temperatures increase.

To make better comparisons amongst tropical regions, more studies are needed that focus on Africa, the Indian sub-continent and Australasia, all of which appear to be more neglected in the literature of tropical diversification and biogeography. New studies may provide insights into the effects of differing regional climatic and geological histories. For example, the scale of Pleistocene aridification in Africa was much greater than in the Neotropics or Southeast Asia. Diversification in Southeast Asia may have proceeded differently as a result of the mostly archipelagic nature of the region, its complex tectonic history, and Quaternary sea-level changes (Richardson et al., 2014).

This ebook includes extensive reviews and novel studies on Afro-Madagascan taxa, including a review of the evolution of African plant diversity (Linder) and the Rand Flora (Pokorny et al.), which do much to explain patterns of diversity within Africa and the comparative paucity of species on that continent compared to other parts of the tropics. Linder implicated a role for phylogenetic niche conservatism in the construction of different elements of the African flora. The austro-temperate flora has recruited lineages from similar southern hemisphere temperate floras and the lower latitude floras from those of the tropical regions of other continents. The poverty of the African tropical flora may have been related to the spread of C4 grasslands and the change to fire-regulated ecologies. Pokorny et al. revealed how different elements of the Rand Flora, which occupies the continental margins of Africa and adjacent islands, have adapted to differing levels of intensity of aridification at different spatial and temporal scales. The ages of disjunctions coincide with the climatic affinities of each Rand Flora lineage. Wollenberg Valero emphasized the importance of both topographic and riverine barriers in generating genetic divergence, along with body size changes, in lineage diversification in microendemic species of Madagascan leaf litter frogs. Ley et al. studied the possible impacts of Pleistocene aridification on phylogeographic patterns in eight species of herbs in the family Marantaceae. They suggested that areas of high diversity might have been Quaternary refuges and/or secondary contact zones. These patterns in herbaceous plants were similar to those found among tropical tree species (Dauby et al., 2014), especially in the study area of southern Lower Guinea.

In a volume that we edited on the historical assembly of major biomes in 2004 (Pennington et al., 2004), which had a major focus on the tropics, the topics of phylogenetic diversity and especially community phylogenetics were hardly mentioned because they were in their infancy. In the subsequent decade, they have become influential, and two contributions to this volume reflect this. The value of mega-phylogenies generated from species found in multiple forest plots was assessed by Erickson et al. Phylogenetic resolution and estimates of phylogenetic diversity were better and more consistent in the mega-phylogeny than in phylogenies of individual plots. Based on specimen records and a generic phylogeny of ferns, Nagalingum et al. identified a hotspot of both taxic and phylogenetic diversity in the wet tropics of northeastern Australia. The variable that best explained phylogenetic diversity was annual precipitation, but the species-energy hypothesis was supported with the correlation of phylogenetic diversity to precipitation plus radiation. Northeastern Australia has also been shown to be a zone in which elements from Laurasia have recently mixed with Australian (or Gondwanan) ones that could potentially result in high phylogenetic diversity (Richardson et al., 2012).

It is increasingly clear from studies presented in this volume and previous ones (e.g., McKenna and Farrell, 2006; Moreau and Bell, 2013) that the timing and drivers of tropical diversification are a result of multiple factors acting at different spatial and temporal scales and in different ways in different taxonomic groups. This volume contributes a substantial number of studies from an empirical perspective and also presents novel ways of using new and traditional forms of data. Over the past quarter of a century huge progress has been made in developing our understanding of the events that gave rise to the diversity of tropical regions. The potential for furthering our understanding of these processes has been boosted by new DNA sequencing technologies that permit the accumulation of billions of data points that will allow us to reconstruct more accurately the evolutionary history of tropical organisms. A continuing challenge will be the gathering of more samples, particularly from lesser studied tropical regions, which will allow regional biota to be placed in a global context. Our aim was to include studies from some of those more neglected areas, but more are still needed, particularly in Africa and Asia. We hope that this volume will serve to encourage further studies.

## AUTHOR CONTRIBUTIONS

All authors listed, have made substantial, direct and intellectual contribution to the work, and approved it for publication.

## ACKNOWLEDGMENTS

This project was the brain-child of Valentí Rull, who we thank sincerely.

## REFERENCES

Haffer, J. (1969). Speciation in Amazonian forest birds. Science 165, 131–137.

Pennington, R. T., Cronk, Q. C. B., and Richardson, J. E. (2004). Introduction and synthesis: plant phylogeny and the origin of major biomes. Philos. Trans. R. Soc. Lond. B Biol. Sci. 359, 1455–1465. doi: 10.1098/rstb.2004.1539


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Richardson and Pennington. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

**REVIEW ARTICLE** published: 21 January 2015 doi: 10.3389/fgene.2014.00477

## Ecological speciation in the tropics: insights from comparative genetic studies in Amazonia

### *Luciano B. Beheregaray<sup>1</sup>\*, Georgina M. Cooke<sup>2</sup>, Ning L. Chao<sup>3,4</sup> and Erin L. Landguth<sup>5</sup>*

<sup>1</sup> Molecular Ecology Lab, School of Biological Sciences, Flinders University, Adelaide, SA, Australia

<sup>2</sup> The Australian Museum, The Australian Museum Research Institute, Sydney, NSW, Australia

<sup>3</sup> Departamento de Ciências Pesqueiras, Universidade Federal do Amazonas, Manaus, Brazil

<sup>4</sup> National Museum of Marine Biology and Aquarium, Pintung, Taiwan

<sup>5</sup> Division of Biological Sciences, University of Montana, Missoula, MT, USA

#### *Edited by:*

Toby Pennington, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

Octavio Salgueiro Paulo, Universidade de Lisboa, Portugal

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### *\*Correspondence:*

Luciano B. Beheregaray, Molecular Ecology Lab, School of Biological Sciences, Flinders University, Adelaide, SA 5001, Australia e-mail: luciano.beheregaray@flinders.edu.au

Evolution creates and sustains biodiversity via adaptive changes in ecologically relevant traits. Ecologically mediated selection contributes to genetic divergence both in the presence and in the absence of geographic isolation between populations, and is considered an important driver of speciation. Indeed, the genetics of ecological speciation is becoming increasingly studied across a variety of taxa and environments. In this paper we review the literature of ecological speciation in the tropics. We report on low research productivity in tropical ecosystems and discuss reasons accounting for the rarity of studies. We argue for research programs that simultaneously address biogeographical and taxonomic questions in the tropics, while effectively assessing relationships between reproductive isolation and ecological divergence. To contribute toward this goal, we propose a new framework for ecological speciation that integrates information from phylogenetics, phylogeography, population genomics, and simulations in evolutionary landscape genetics (ELG). We introduce components of the framework, describe ELG simulations (a largely unexplored approach in ecological speciation), and discuss design and experimental feasibility within the context of tropical research. We then use published genetic datasets from populations of five codistributed Amazonian fish species to assess the performance of the framework in studies of tropical speciation. We suggest that these approaches can assist in distinguishing the relative contribution of natural selection from biogeographic history in the origin of biodiversity, even in complex ecosystems such as Amazonia. We also discuss how to assess ecological speciation using ELG simulations that include selection. These integrative frameworks have considerable potential to enhance conservation management in biodiversity rich ecosystems and to complement historical biogeographic and evolutionary studies of tropical biotas.

**Keywords: biogeography, adaptive divergence, evolutionary landscape genetics, phylogenetics, phylogeography, ecological genomics, tropical diversification, biodiversity conservation**

#### **INTRODUCTION**

*"Natural selection, as we shall hereafter see, is a power incessantly ready for action* ...*" (Darwin, 1859)*

Evolution creates and sustains biodiversity via adaptive changes in ecologically relevant traits. Understanding how organisms adapt and diversify has been a topic of fundamental importance in biology for over 150 years (Losos et al., 2013). It is now realized that evolution reflects both historical and contemporary contingencies and is mediated by bidirectional interactions between ecological and evolutionary changes (Schoener, 2011; Losos et al., 2013). Thus, to understand adaptive changes and patterns of biological diversification it is often necessary to integrate information from both past (e.g., geomorphologic and paleoclimatic variation) and recent (e.g., natural selection) processes (Avise, 2000; Hoffmann and Sgro, 2011; Losos et al., 2013). Bridging macro and microevolutionary studies provides ways to clarify regional patterns of biological diversification and their underlying mechanisms of speciation. Here we focus on the contribution of studies that simultaneously assess historical and ecologically driven signatures of population divergence to explain patterns of biodiversity in tropical ecosystems.

Speciation, the continuous process that gives rise to biological diversity, is often intimately associated with changes in phenotypes and with adaptation to the ecological environment (Darwin, 1859; Mayr, 1963; Dobzhansky, 1970). Natural selection contributes to ecological adaptation both in the presence and in the absence of geographic isolation between populations, and as a consequence, drives phenotypic diversification, population divergence, the evolution of reproductive isolation and the formation of new species (Endler, 1977, 1986; Schluter and Conte, 2009; Nosil, 2012). Ecologically mediated selection is considered an important driver of speciation (Schluter and Nagel, 1995; Schluter, 2001; Shafer and Wolf, 2013). Yet, ecological mechanisms driving reproductive isolation vary in space and time and can have different phenotypic and genetic signatures (Nosil, 2012), making it difficult to assess the role and the proportional contribution of natural selection in the origin of species. On the other hand, non-mechanistic spatial frameworks are generally easier to establish and to test, and as such have historically dominated the study of speciation. Such frameworks tend to focus on the geographic arrangement of populations undergoing reproductive isolation (e.g., allopatric, parapatric, or sympatric) instead of the processes driving the evolution of reproductive isolation. These studies have shown that allopatric speciation is common across taxonomic groups and biomes (Mayr, 1942; Coyne and Orr, 2004). They have also promoted the view that speciation results primarily from genetic drift due to geographic isolation rather than from natural selection (Schluter, 2001; Via, 2001; Rundle and Nosil, 2005).

The increasing integration of molecular genetic approaches with theoretical and empirical ecological studies (both in the field and experimentally; Nosil, 2012; Shafer and Wolf, 2013) seen during the last 20 years has restored excitement about the role of ecology in speciation, returning to Darwin's (1859) original theory that divergence may be driven by adaptation rather than merely being a by-product of geographic isolation. The term 'ecological speciation' was coined during this period (Schluter, 1996) and can be defined as "the process by which barriers to gene flow evolve between populations as a result of ecologically based divergent selection between environments" (Nosil, 2012). During ecological speciation, populations occupying different biotic or abiotic environments or using different resources experience unique selection pressures that directly or indirectly result in reproductive isolation, even if the original populations remain in contact (Schluter, 2001). The historical foundations of ecological speciation are inextricably linked to the works of Darwin, Mayr, and Dobzhansky, which provide the reference with which 21st century views of speciation should be contrasted (Harrison, 2012). As is often the case in evolutionary biology research, ecological speciation has not been free of criticism. The criticisms include discussions about the value of this terminology, its equivalency to parapatric and sympatric speciation, and the role of divergent natural selection in the speciation process (readers interested in contrasting views to those presented here should consult Sobel et al., 2010 and Harrison, 2012).

Ecological speciation can occur under several spatial scenarios, from allopatry through to sympatry (Schluter, 2001; Coyne and Orr, 2004; Rundle and Nosil, 2005), with classical examples being parapatric (Endler, 1977; Sobel et al., 2010). Ecological divergence and reproductive isolation can evolve under a single or under multiple geographic scenarios (e.g., it could begin in allopatric and be completed in sympatric conditions, Rundle and Schluter, 2004). The geographic arrangement of populations is important because it affects the source of selection and rates of gene flow, but what is essential in ecological speciation is that the divergence be primarily driven by divergent selection (Nosil, 2012). Indeed, substantial population divergence and speciation are known to occur between ecologically dissimilar populations even in the face of ongoing gene flow between them (i.e., divergence-with-gene-flow; Smith et al., 1997; Beheregaray and Sunnucks, 2001; Schluter, 2001; Coyne and Orr, 2004; Schluter and Conte, 2009; Bernatchez et al., 2010; Cooke et al., 2012a; Sexton et al., 2014). Identifying these incipient ecological species represents an opportunity to investigate ongoing evolutionary processes in situations where adaptive divergence and reproductive isolation are associated (for a review see Orr and Smith, 1998). Allopatric ecological speciation might also be common in situations where ecological divergence is a stronger driver of reproductive isolation than genetic drift (Funk, 1998; Vines and Schluter, 2006).

Studying the role of ecology in the speciation process is currently a popular research area. Our Web of Science® search conducted in May 2014 indicated that 996 articles were published under the topic of "ecological speciation," with 862 of these being empirical studies. Interest in this research topic has grown exponentially in recent years, and so has the corresponding citation rate (**Figure 1**). For instance, 2013 saw almost 200 publications and over 5,600 citations, compared to 60 articles and 1,600 citations in 2008. Interestingly, our evaluation shows that the majority of empirical studies (674 out of 862) used information from genetic datasets or genetic knowledge to improve pattern interpretation. Indeed, the genetics of ecological speciation is becoming increasingly studied across a variety of taxa and environments (Wolf et al., 2010; Nosil, 2012; Shafer and Wolf, 2013). As a result, speciation research has expanded beyond traditional boundaries, since genetic and genomic techniques now offer the means to disentangle the manifold (albeit complex) processes of drift from selection, across various evolutionary stages, environments and taxa (Moritz et al., 2000; Shafer and Wolf, 2013; Seehausen et al., 2014).

The initial aim of this paper is to examine the literature to assess the research interest in ecological speciation in the tropics, the region that encompasses the most species-rich ecosystems on Earth. Tropical rainforests and coral reefs house a disproportionally high number of species and, as a consequence, have become the focus of great attention of scientists and the general public (Primack, 2014). For instance, even though tropical rainforests occupy only 7% of the Earth's land area, they are estimated to contain most of the Earth's species (Corlett and Primack, 2010). Although several opinions, reviews, and a few meta-analyses have been published about speciation in the tropics (e.g., Haffer, 1997; Moritz et al., 2000; Wiens and Donoghue, 2004; Rull, 2008; Hoorn et al., 2010b; Salisbury et al., 2012; Bowen et al., 2013; Smith et al., 2014), to the best of our knowledge none have focused on ecological speciation. To address this shortcoming, we examine the empirical literature to identify analytical approaches that have been used to study ecological speciation in the tropics.

Motivated by a review about diversification of rainforest faunas (Moritz et al., 2000) and by our initial assessment of the literature, we further argue for analytical approaches that explore genomic (or genetic) datasets at the population level and thus bridge historical and ecological considerations in tropical speciation research. We perceive the need for a research program that simultaneously addresses large-scale biogeographical and taxonomic questions in the tropics, while effectively assessing finer-scale relationships between reproductive isolation and ecological divergence.

To contribute toward this goal, we propose a framework for ecological speciation research that integrates information from evolutionary landscape genetics (ELG), genome scans, population genetics, phylogeography, and phylogenetics. We then use published population-level genetic datasets from five codistributed and taxonomically diverse Amazonian fish species (Cooke et al., 2012a,b,c,d, 2014) to illustrate the performance of the framework in studies of tropical speciation. Specifically, these studies examine the accumulation of population and lineage diversity in these fish groups within the context of geomorphological history, tributary arrangement, and divergent natural selection putatively associated with hydrochemical ecotones (i.e., at the interface of major rivers with different 'water colors'). Although these lineages show different biogeographic and evolutionary histories, striking commonalities exist in the way that population genetic divergence and incipient speciation appear to be ecologically induced and spatially influenced by ecotones.

We suggest that this framework can assist in distinguishing the relative contribution of natural selection from biogeographic history in the origin and maintenance of biodiversity, even in a complex and understudied tropical ecosystem such as Amazonia. In addition, we show that it is feasible to frame ecological speciation in a spatially explicit ELG context that includes selection. We also discuss recent developments in environmental mapping, simulations and population genomics that allow clarification of population divergence across selection gradients.

### **WHY IS THE RESEARCH EFFORT IN ECOLOGICAL SPECIATION SO LOW IN TROPICAL ECOSYSTEMS?**

Our assessment of empirical research productivity in ecological speciation shows that the tropics have been largely left behind compared to other ecosystems. Only 51 publications (5.2%) out of the 996 previously identified articles (**Figure 1**) dealt with tropical species, and only 40 of those could be classified as empirical studies. Although our searches might have missed relevant studies that did not use the words "tropical" or "tropic(s)" in the abstracts, keywords, or titles, our inspection of the literature indicates that research effort in ecological speciation is indeed largely biased toward temperate regions. This conclusion is consistent with the empirical literature reviewed in a recently published book on ecological speciation (Nosil, 2012), with a review of the topic (Rundle and Nosil, 2005) and with several syntheses about tropical diversification (Moritz et al., 2000; Rull, 2008; Bowen et al., 2013).

Research trends in tropical ecological speciation were collected by inspecting the 40 recovered empirical articles for key selected categories. In terms of biological realm, most publications focused on terrestrial (65%), rather than marine (20%), or freshwater (15%) organisms. Terrestrial plants and terrestrial invertebrates were the best represented taxonomic groups (each with 27% of total articles), followed by fishes (17%), aquatic invertebrates (15%), birds (12%), herpetofauna (10%), and mammals (3%). Comparisons involving two or more species (65%) were more common than intraspecific surveys (35%). Interestingly, 45% of studies included comparisons across habitat gradients, and 35% of the 40 articles assessed patterns of gene flow. In terms of the main approaches used to generate and analyze data (**Figure 2**), a large proportion of studies used information from molecular phylogenetics (63%), phylogeography (50%) and morphology (42%). To our surprise, only a few surveys of population genetic structure were conducted in the tropics (27% of the total), with a handful of those also incorporating information from genome scans of selection (10%) or from landscape genetics (7.5%). As expected for any integrative field of research, ecological speciation was usually assessed by combining two or more approaches (80% of articles). Most combinations (83%) included phylogenetic or phylogeographic datasets, exemplifying the reliance of the field on molecular genealogical information.

What are the reasons for the rarity of ecological speciation research in tropical regions? First of all, basic taxonomic, ecological and distributional data are usually either inadequate or simply non-existent for most tropical biotas. Such information is needed to identify ecological 'opportunities' for speciation but is currently limited and hard to collect in the tropics compared to temperate regions because of the tropics' inherently high levels of biodiversity, endemism and remoteness. In addition, tropical ecosystems are typically found in developing countries, where essential resources (e.g., research funding, *in situ* capacity and accessible cataloged bioinventories) are the scarcest. The latter is consistent with the positive correlation found in another review (Beheregaray, 2008) between the global research productivity of population-level genealogical surveys and a country's wealth.

The low number of studies on ecological speciation might also be related to a historic research bias toward clarifying speciation timing and rates of diversification in the tropics. Many studies of speciation have used molecular phylogenies (e.g., **Figure 2**) to clarify temporal and spatial patterns of diversification in tropical ecosystems. These surveys and meta-analyses have invariably compared lineages at or above the species level and framed pattern interpretation within historical biogeographic and macroevolutionary contexts. They have contributed substantially to our understanding of tropical diversity – such as improving the quality of the contentious debate about Tertiary versus Quaternary speciation timing (Pennington et al., 2004; Rull, 2008; Hoorn et al., 2010b, 2011; Antonelli and Sanmartín, 2011) and clarifying general patterns of diversity and diversification (e.g., for birds, Salisbury et al., 2012; plants, Richardson et al., 2001; Hughes et al., 2013; and mammals, Rolland et al., 2014). However, we believe they have also created a paradigm about tropical speciation that often ignores population-level information and the roles of ecology and natural selection in the origin of species. Any biogeographic scenario, recent or historical, begins with population differentiation, a process underpinned by changes in the physical and biotic environments of populations (Darwin, 1859; Avise, 2000). Ecological considerations at the scale of demes are therefore relevant to understanding not only underpinnings of population divergence and incipient speciation but also to inform large-scale biogeographical patterns (Wiens and Donoghue, 2004). Indeed, there is mounting evidence from multiple taxa and ecosystems suggesting that divergent natural selection might be an important driver of biodiversity (Schluter, 2000; Coyne and Orr, 2004; Rundle and Nosil, 2005; Schluter and Conte, 2009), including the tropics (Smith et al., 1997, 2001; Schneider et al., 1999; Garcia-Paris et al., 2000; Ogden and Thorpe, 2002; Lopez-Fernandez et al., 2010; Cooke et al., 2012a,b,c,d, 2014).

### **AN INTEGRATED GENETIC-BASED FRAMEWORK FOR STUDYING ECOLOGICAL SPECIATION IN THE TROPICS**

We now describe a framework to study ecological speciation in the tropics that integrates genomic (or genetic) data from populations with simulation modeling specific for ELG (summarized in **Box I**; **Figure 3**). Although the primary impetus is to assess the role of ecology as a driver of biodiversity, this type of research also generates valuable information for several data-deficient areas in tropical research. These include improvements of biological inventory databases and taxonomy, the discovery and delineation of cryptic species, selection gradients, ecotones and regional hotspots for conservation management, and the clarification of speciation timing, patterns of connectivity and metapopulation structure.

Below we introduce the main components of the framework and discuss design and experimental feasibility within the context of tropical research. We also provide a more detailed description of simulations and ELG (two largely unexplored approaches in ecological speciation) and their expected contributions. This is followed by a performance assessment of the proposed research framework based on published studies of population diversification and ecological speciation of Amazonian fishes.

#### **SAMPLING STRATEGIES AND PHENOTYPIC INFORMATION**

Beginning with sampling a particular study system, the framework allows for both single-taxon studies and for comparative analyses that use population samples from codistributed species. Although population-level sampling and data collection for codistributed species can be logistically intricate and expensive, comparative studies are often more powerful and rewarding than single taxon surveys. Such studies allow for the identification of biogeographic histories that are shared among biotas (Avise, 1992; Bernatchez and Wilson, 1998) and regions that promote rapid adaptive evolution, have high concentrations of historically isolated populations, or both (Davis et al., 2008; Carnaval et al., 2009). In this case, sampling species across a range of mobility and life histories provides a powerful strategy to make generalizations about the effects of landscape history and environmental structure on gene flow (e.g., Galarza et al., 2009). Our proposal to integrate diverse empirical and simulated approaches (**Figure 3**) is valuable for comparative surveys because the applicability of individual approaches for assessing ecological speciation scenarios is expected to vary among groups of organisms (Nosil, 2012).

#### **BOX I | Rationale.**

The emphasis of the framework is on assessing gene flow among populations (or preferably, among adaptive phenotypes) distributed across heterogeneous environments or ecotones while controlling for spatial genetic autocorrelation and vicariant history. In this scenario, evidence for ecological speciation is obtained if a positive correlation is found between genetic differentiation and environmental (or phenotypic) divergence, after the effects of geographic distance (Shafer and Wolf, 2013; Sexton et al., 2014) and historical biogeography between populations are accounted for. This is considered evidence because ecologically divergent selection across environments can reduce gene flow between populations and in this way drive local adaptation and population divergence (Schluter, 2000; Räsänen and Hendry, 2008; Nosil, 2012). Notwithstanding the focus on population genetics, the framework also integrates genealogical information from phylogenetics and phylogeography to disentangle ecological divergence from vicariant biogeographic history. This distinction is needed because ecological speciation usually requires a study system in which the existence of an allopatric phase is very unlikely in the context of evolutionary history (Endler, 1982; Coyne, 2007; Nosil, 2008). In an additional conceptual and analytical step, we synergistically combine the outcomes of the above-mentioned empirical analyses of gene flow and population diversification with spatially explicit simulations in evolutionary landscape genetics, ELG (Landguth et al., 2012, 2014). These ELG simulations can be used to statistically assess inferred scenarios of environmentally induced population divergence. They can also shed light on the role of landscape structure and individual-based organism response to landscape structure in the evolution of reproductive isolation (**Box II**).

Although this framework was conceived to assess ecological speciation in the tropics, it can be applied to other ecosystems and to most sexually reproducing species. This is particularly true for biotas in temperate regions for which a priori information about traits subject to divergent selection is more abundant and for which better logistics allow greater experimental tractability (Nosil, 2012) compared to biotas from tropical regions.

Natural selection works on phenotype, and the connections between selection on ordinary phenotypic traits and reproductive isolation are often strong and straightforward (Schluter, 2001). On the other hand, the complexity of both the environmental variation and the genetic basis of adaptive phenotypes hampers our ability to use genetic data to map adaptive variation over the landscape (Lowry, 2010; Schoville et al., 2012). The spatial delineation of ecologically relevant phenotypes (i.e., adaptive phenotypes; **Figure 3**) or of traits related to reproductive isolation can be fundamentally important to generate hypotheses about divergence across environmental gradients and to inform sampling strategies. In cases where *a priori* information is available about the distribution of phenotypically divergent ecotypes within species or between closely related lineages (e.g., Bernatchez et al., 2010) it might be prudent to invest in replicated population sampling across geographically separate but similar environments (Smith et al., 1997). Here, independently repeated genetic divergence (neutral or adaptive) in divergent phenotypes correlated with similar environmental gradients is strong evidence for selection (Bernatchez et al., 2010; Rosenblum and Harmon, 2011).

Regardless of whether a single or multiple species are being targeted, intensive sampling (both in number of individuals per deme and in number of demes) is required to ensure rigorous statistical analyses. Within lineages, sampling should be hierarchical to enable analyses of spatial dependence: the design should be spatially nested regarding dispersal potential to capture the degree of autocorrelated genetic data (due to daily dispersal) and the environmental-species relationships (due to beyond natal dispersal; Manel et al., 2010).

The rigorous sampling regime proposed here contrasts remarkably with the sparse sampling strategies generally used in phylogeographic and phylogenetic studies of tropical biotas (Beheregaray, 2008; Rull, 2008; Antonelli and Sanmartín, 2011). Although intensive population sampling might be seen as prohibitive for some projects, efforts to document biodiversity in the tropics have increased and become more integrated in recent years (Janzen et al., 2009). Consistent with this view, responsible collection of specimens and associated data and open sharing of this knowledge have been advocated (Rocha et al., 2014). Similarly, the field of biogeography seems to be experiencing a renaissance for inter-disciplinary research associated in part with the realized and potential impact of rapid data accumulation for poorly studied regions (Dawson et al., 2013). Altogether, these and other developments (e.g., the recently created open access journal *Scientific Data*) should contribute to improving and accelerating initiatives supporting bioinventory collections and population sampling in the tropics. Population genetic surveys should tag along to ensure that tissue samples for DNA analysis (and in some cases for transcriptomics, **Figure 3**) are collected together with voucher specimens and other key associated data such as phenotypic information.

#### **LANDSCAPE MAPPING**

Landscape (or riverscape or seascape) mapping is an important step that should be carried out in parallel with sampling design. A landscape is an area that is heterogeneous with regard to at least one variable of interest (Turner et al., 2001). A landscape with low permeability can decrease gene flow, while increasing genetic drift and population structure. The purpose of incorporating such information is to identify environmental and climatic gradients that might be impacting on both the structural and functional habitats of study species. This should improve the delineation of ecotones or habitat transition areas and thus inform on the most appropriate sampling design. It also creates scenarios to anchor empirical analyses of gene flow and simulations in ELG (discussed below). Mapping can be done based on data collected in the field or using existing GIS datasets that contain summary statistics for variables of interest, such as climate, landcover, structural habitat, topography, disturbance, and ecosystem productivity. For example, climate data can be gathered from WorldClim (Schoville et al., 2012) and Microclim (Kearney et al., 2014). Looking ahead, the increased availability of rich environmental and climatic data is expected to enable sophisticated functional analyses that can potentially be used in ecological speciation research. For instance, mechanistic niche models or probability of occurrence maps that identify functional traits that limit distributions (e.g., physiological responses and constraints) provide a view of the fundamental niche that can then be mapped to the landscape (Kearney and Porter, 2009). These traits can be allowed to evolve (e.g., by incorporating selection and heritability) to explore the potential impact of evolution and to predict adaptive dynamics and distribution shifts (Kearney and Porter, 2009; Hoffmann and Sgro, 2011). Rich ecological datasets that include measures of resource availability and biotic interactions are also expected to positively impact ELG by providing an opportunity to test whether specific ecological factors drive adaptive genetic variation (Schoville et al., 2012).

#### **PHYLOGENETICS AND PHYLOGEOGRAPHY**

The integration of molecular genealogical information from phylogenetic and phylogeographic analyses into the framework is a prerequisite for subsequent analyses of intraspecific (i.e., within lineages) gene flow. Two main reasons account for this requirement. First, these analyses enable the discovery and spatial delineation of historically isolated lineages and hidden biodiversity, such as evolutionarily significant units (ESUs) and cryptic species. Levels of species richness and morphologically cryptic biodiversity have been grossly underdocumented in assessments of tropical biodiversity (e.g., Parra-Olea and Wake, 2001; Cooke et al., 2012d; Funk et al., 2012) because these rarely employ thorough morphology-based analyses, intensive population sampling and DNA-based methods (Beheregaray and Caccone, 2007; Beheregaray, 2008). For instance, our comparative research program on the evolution of Amazonian fishes focused on species suspected to be represented by a single evolutionary lineage across the range of each taxon. We sampled populations from four nominal taxa (i.e., species) from a project on the upper reaches of the Negro River and from five nominal taxa for the water color project described below. After performing genealogical and population genetic analyses it became evident that we were actually working with, respectively, a minimum of eight (e.g., Cooke et al., 2009; Sistrom et al., 2009; Piggott et al., 2011) and eleven (e.g., Cooke et al., 2012a,b,c,d, 2014) ancient cryptic species in each project. Notwithstanding the value of molecular data, we recognize that morphology-based taxonomy has a central and unrivaled position in biodiversity research (Schlick-Steiner et al., 2007) and advocate for resource investment toward traditional taxonomy and museum collections to ensure that newly reported cryptic species will be properly cataloged and described following discovery.

The second reason for integrating molecular genealogical studies is that they add an essential component to the understanding of patterns of population structure and levels of reproductive isolation: time. Temporal changes in the physical and biotic environment of a population lead to demographic variations that are correlated with the structure of population genealogies (Avise, 2000). Phylogeographic studies, when integrated with information from historical disciplines of Earth sciences, can potentially describe the chronology of demographic variation and reproductive isolation of population units (Avise et al., 1998; Hewitt, 2001). As such, they can disentangle scenarios of relatively recent divergence due to ecology (the setting targeted by studies of ecological speciation) from patterns of allopatric divergence due to vicariant biogeographic history. Phylogeographic analyses can also address questions about the species divergence process that cannot be addressed without genealogical data. These include clarifying how evolutionary divergence proceeds in the context of colonization and ecological opportunity (Beheregaray et al., 2004), and how repeated distributional shifts may have constrained diversification to taxa in which reproductive isolation apparently evolves very quickly (Carstens and Knowles, 2006).

#### **POPULATION STRUCTURE, GENE FLOW AND GENOME SCANS OF SELECTION**

In this step, DNA markers are used to genotype individuals across heterogeneous environments with the aim of clarifying patterns of gene flow and identifying the spatial locations of genetic discontinuities (i.e., population boundaries). This is a key step in the framework that should be done at the metapopulation level after historical diversification has been accounted for (as above). The expectation is that during ecological speciation, ecological or environmental distance (analogous to geographic distance) reduces homogenizing gene flow and correlates with genetic population differentiation. These patterns are now considered common in nature and usually referred to as isolation-by-ecology (Shafer and Wolf, 2013), or isolation-by-environment (Sexton et al., 2014). Evidence indicates that adaptive divergence between selective environments constrains gene flow through selection against either immigrants or hybrids (Schluter, 2000; Räsänen and Hendry, 2008). Similar to what happens in allopatric and parapatric speciation, ecologically induced population differentiation might not necessarily produce complete reproductive isolation or new species (Hendry, 2009; Nosil, 2012).

Although isolation-by-ecology can be detected with relatively small neutral molecular datasets such as microsatellites and AFLPs (Shafer and Wolf, 2013; see below for examples), next-generation sequencing (NGS) methods now offer the possibility of identifying and typing thousands of genetic markers (i.e., SNPs) for population genomic analysis. These include genotyping-by-sequencing methods that enable screening SNPs throughout the genome of non-model species in a relatively inexpensive manner (Narum et al., 2013). Using a large number of genetic markers is important in ecological speciation research. These markers can clarify evolutionary relationships between populations, ecotypes and lineages, while improving delineation of barriers to gene flow, selection gradients and patterns of isolation-by-ecology caused by demographic factors. Simultaneously, these large datasets can be used to identify genomic regions (or specific loci) that show evidence of divergent selection (Luikart et al., 2003; Seehausen et al., 2014). In population genomics, the latter step is often done using genome scans that compare allelic variation at markers spread throughout the genome in many individuals from ecologically different populations or species. During genome scans of selection, markers that potentially carry the signature of natural selection (i.e., 'outlier' loci used to characterize population adaptations) will be distinguished from neutral markers (i.e., markers used to infer population parameters and phylogeography) because they exhibit exceptionally high levels of differentiation (Luikart et al., 2003; Beaumont, 2005). The premise of the method is that drift, inbreeding and gene flow usually have genome-wide effects, whereas selection leaves signatures only at those loci that are adaptive to a particular scenario (Luikart et al., 2003; Seehausen et al., 2014). As such, differentiation will accumulate in regions under selection, whereas in other regions, differentiation due to genetic drift will require longer periods of time to accumulate (Wu, 2001; Nosil et al., 2009). By extension, sympatric or parapatric populations evolving in the presence of gene flow are predicted to show greater heterogeneity across the genome than spatially isolated populations (Wu, 2001; Nosil et al., 2009; but see Renaut et al., 2013).
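The logic of an outlier scan can be sketched in a few lines: compute a per-locus differentiation statistic between two ecologically different populations and flag loci in the extreme upper tail. The example below is a deliberately simplified illustration that assumes biallelic, codominant loci with known allele frequencies and uses an empirical quantile as the cut-off; published outlier methods rely on model-based null distributions instead, and dominant markers such as AFLPs need different estimators. The function names and the 0.95 quantile are assumptions.

```python
import numpy as np


def per_locus_fst(p1, p2):
    """Per-locus F_ST between two populations from allele frequencies.

    Uses F_ST = (H_T - H_S) / H_T with equal population weights, where
    H_T is expected heterozygosity of the pooled population and H_S is
    the mean within-population expected heterozygosity.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    with np.errstate(divide="ignore", invalid="ignore"):
        # Loci that are monomorphic overall carry no information; set F_ST to 0.
        fst = np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)
    return fst


def flag_outliers(fst, quantile=0.95):
    """Indices of loci whose F_ST exceeds the chosen empirical quantile."""
    threshold = np.quantile(fst, quantile)
    return np.flatnonzero(fst > threshold)


if __name__ == "__main__":
    # Hypothetical frequencies: 20 weakly differentiated loci and one divergent locus.
    p1 = np.full(21, 0.50)
    p2 = np.full(21, 0.55)
    p1[5], p2[5] = 0.10, 0.90
    fst = per_locus_fst(p1, p2)
    print("candidate outlier loci:", flag_outliers(fst))
```

An empirical quantile always flags some loci, so in practice candidates would be validated against a neutral expectation before being interpreted as targets of divergent selection.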

#### **SIMULATION MODELING FOR EVOLUTIONARY LANDSCAPE GENETICS**

In this section, we provide an introduction to simulations and landscape genetics (for a review, see Landguth et al., 2014) and list general questions that can be addressed with this type of approach (**Box II**). Simulations have provided many important findings in various disciplines (e.g., Grimm and Railsback, 2005), and are increasingly accepted by empiricists (Jeltsch et al., 2013). The quasi-experimental framework of simulation models offers several important benefits for scientific research. For example, the ability to repeat simulations of a system with varying parameters (sensitivity analysis) allows researchers to assess how much confidence we can put in the conclusions derived from simulations and to predict how a system or its behavior will change if certain processes are altered. With empirical data, the range of parameter values and assumptions that can be tested is usually much more limited, potentially leading to weaker or more uncertain inferences. Simulations can also mimic perfect sampling conditions, which can lead to stronger inferences (see review in Balkenhol and Fortin, 2014). In sum, simulation modeling can be used to predict and explain, guide sampling and data collection, illuminate core dynamics of a system, discover new questions, bound outcomes to plausible ranges, quantify uncertainties, and train students and practitioners (Epstein, 2007).

Simulations have been used in population and landscape genetics for many years, and the availability of software for simulating genetic data is increasing steadily (reviewed in Hoban et al., 2012; Hoban, 2014). However, there are important differences between ELG simulations and 'classic' population genetic simulations. First, many genetic simulation approaches generate genetic data (i.e., summary statistics) only at the population-level. In contrast, many ELG approaches produce genetic data for every individual (even if these are grouped into populations), and thus rely on individual-based models (IBMs). IBMs are classes of computational models for simulating the actions and interactions of autonomous individuals. Individuals can differ in their attributes (e.g., males vs. females) and these attributes influence their actions (e.g., different dispersal for males vs. females), as well as their reactions toward each other (e.g., mating strategies) or to other simulation settings (e.g., varying propensity to cross simulated barriers for males vs. females).

A second major characteristic of ELG simulations is the fact they are always based on spatially explicit models. These models are defined by placing individuals or groups of individuals (i.e., populations) on 1- or 2-dimensional regular lattices, or in irregular (x-, y-) coordinate space. Specific rules in the model then define how individuals move and interact across space, for example by defining the distances they can move away from their birth location, or the distance within which they can find a mating partner. Population genetic simulations are often also spatially explicit (e.g., Balloux, 2001), but while simulating population genetic data without space is possible, this is not the case for ELG simulations. In addition to space, another vital feature of ELG simulations is the direct incorporation of environmental heterogeneity into the underlying model. This usually requires a spatial representation of the environment that individuals are placed in. Importantly, this environment is variable (i.e., non-homogeneous) in space, and potentially also in time, and it directly affects some or all of the essential processes included in the model. Thus, the rules that govern the actions and reactions of simulated individuals not only depend on pure space, but also on the user-defined environmental heterogeneity included in the model. This is the key distinguishing feature between population genetic and landscape genetic simulation modeling.

The ELG simulation modeling implemented in our ecological speciation framework includes an additional component: it specifically integrates landscape genetics and evolutionary genetics, focusing on how space and selection impact the evolution of reproductive isolation (speciation). For example, selection can be controlled via fitness landscape surfaces (Wright, 1932; Gavrilets, 2004) that determine the genotype-dependent viability of offspring in a spatially explicit setting. Let us consider a single bi-allelic locus model in which three relative fitness surfaces are specified for the three genotypes (AA, Aa, and aa). Selection is then implemented through differential survival of offspring as a function of the relative fitness of their genotype [e.g., determined by 'water color' in our Amazonian study (Cooke et al., 2014); see below] at the location on that surface where the dispersing individual settles. Then, through ELG simulations (e.g., Landguth et al., 2012), one can explore modeling of natural selection in landscape genetics with individual organism dynamics. Through sensitivity and uncertainty analysis, a factorial study design can be used with a range of species-specific dispersal strategies (controlling for gene flow) across landscape resistance scenarios. In addition, a range of genetic parameters and exogenous selection (arising from the environment) as well as endogenous selection (arising from genetic incompatibilities) can be explored while tracking the evolution of reproductive isolation (e.g., Cooke et al., 2014). Note that, when attempting to understand endogenous selection in spatial settings, fitness is determined by epistatic interactions, in the form of the well-known Dobzhansky–Muller model (Dobzhansky, 1937; Muller, 1942; e.g., Eppstein et al., 2009). Thus, extensions to multilocus selection models must be considered for addressing these questions.
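To make the idea of endogenous selection through epistasis concrete, the following is a minimal Python sketch of a two-locus Dobzhansky–Muller incompatibility. It is an illustration only, not the implementation used in the cited studies: the function name `dm_fitness`, the multiplicative penalty, and the default selection coefficient `s` are all assumptions introduced here.

```python
def dm_fitness(genotype_a: str, genotype_b: str, s: float = 0.5) -> float:
    """Relative fitness under a simplified two-locus Dobzhansky-Muller model.

    genotype_a is one of 'AA', 'Aa', 'aa'; genotype_b is one of 'BB', 'Bb', 'bb'.
    Assumption: the derived alleles 'A' and 'B' arose in different lineages and
    are incompatible with each other. Each A-B pairing carried by an individual
    multiplies its fitness by (1 - s). Parental genotypes (AAbb, aaBB) carry no
    such pairing and keep a fitness of 1.
    """
    n_a = genotype_a.count("A")  # number of derived alleles at locus A
    n_b = genotype_b.count("B")  # number of derived alleles at locus B
    return (1.0 - s) ** (n_a * n_b)


# Parental lineages are unaffected; hybrids carrying both A and B pay a cost.
for ga, gb in [("AA", "bb"), ("aa", "BB"), ("Aa", "Bb"), ("AA", "BB")]:
    print(ga, gb, dm_fitness(ga, gb))
```

Because fitness here depends on the combination of alleles at two loci rather than on either locus alone, a single-locus selection model cannot represent it, which is why multilocus extensions are needed.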

### **FRAMEWORK TESTING: COMPARATIVE ANALYSIS OF PHYLOGEOGRAPHY AND ECOLOGICAL SPECIATION IN AMAZONIAN FISHES**

Here, we provide a number of examples from our published work (Cooke et al., 2012a,b,c,d, 2014) that illustrate how integrating population genetics, genome scans of selection, ELG and sequence-based phylogeographic and phylogenetic methods can be used to assess the relative influence of environmental gradients and biogeographic history in shaping Amazonian fish biodiversity. The studies used a thorough sampling design and amassed 905 individuals from 48 putative populations of five taxonomically diverse and ecologically distinct fish species endemic to Amazonia (**Figure 4**). These included freshwater representatives of marine-derived lineages from two orders: a tetraodontiform (the puffer *Colomesus asellus*) and a perciform (the croaker *Plagioscion squamosissimus*); and ancient freshwater lineages representing three Gondwana-relict orders: a characiform (the characin *Triportheus albus*), a siluriform (the catfish *Centromochlus existimatus*) and a gymnotiform (the electric fish *Steatogenys elegans*). Nuclear and mitochondrial DNA (mtDNA) sequences were used for genealogical analyses and AFLP datasets were used for analyses of gene flow and genome scans. Readers interested in species-specific datasets, analytical methods, and detailed biogeographic reconstructions should refer to our five publications (Cooke et al., 2012a,b,c,d, 2014).

The five species are largely codistributed, yet results indicate they have very different evolutionary histories. Phylogenetic analyses unexpectedly revealed five cryptic species within the catfish *C. existimatus* and two fully sympatric cryptic species within the electric fish *S. elegans* (**Figure 5**). Molecular dating indicates that these lineages all diverged during middle to late Miocene (**Figure 6**). In contrast, there was no evidence for cryptic species within the marine-derived taxa or within *T. albus* (**Figures 5** and **6**). Despite these differences, however, several congruous patterns that may actually reflect general forces shaping freshwater biodiversity in Amazonia were detected. Below we introduce the study region and summarize key comparative findings about inferred spatial arrangements of neutral and putatively adaptive genetic diversity. These are discussed within the contexts of geomorphological history of Amazonia and divergent selection.

#### **STUDY SYSTEM**

The Amazonian aquatic environment sustains dramatic hydrochemical and ecological gradients that impose physiological constraints upon its aquatic communities (Junk et al., 1983; Henderson and Crampton, 1997; Rodriguez and Lewis, 1997; Saint-Paul et al., 2000; Petry et al., 2003). These aquatic conditions have been grouped into three water types or 'colors' and are differentiated largely by sediment composition, geochemistry and optical characteristics (Sioli, 1984): (i) white water, which has an Andean origin, is turbid in nature and is characterized by large amounts of dissolved solids and a neutral pH; (ii) clear water, which is comparatively transparent and contains low content of dissolved solids and a neutral pH; and (iii) black water, which is transparent yet stained by tannins and humic acids leached from vegetation and has low pH (pH ∼5 or lower). Both clear and black water differ from white water in that they are craton born, draining the Brazilian and Guyana shields, respectively (Hoorn et al., 2010a). As a result, their sediment composition and channel formation are different from the fast-turbid Andean white waters. Major ecological gradients have been shown to generate biodiversity via divergent natural selection elsewhere (Endler, 1973; Smith et al., 1997, 2001). This led to the prediction that major differences in water color between rivers of the Amazon Basin provide ecological opportunities for natural selection to drive genetic divergence between populations of aquatic organisms.

**boundaries identified for each of the five study species (excluding** *C. existimatus***).** Analyses used either a combination, or all of the following analytical methods based on nDNA and mtDNA sequences and on AFLP

STRUCTURE (Falush et al., 2003), mtDNA ΦST (Excoffier et al., 1992), and AFLP *F*ST (Lynch and Milligan, 1994). Site abbreviations are: N, Negro River; B, Branco River; A, Amazon River; M, Madeira River; T, Tapajós River.

The study system within the Amazon Basin consists of five major rivers representing white, black and clear waters: the Amazon (white), Madeira (white), Branco (seasonally black), Negro (black), and Tapajós (clear; **Figure 4**). The study area encompasses two putatively strong selection gradients or ecotones: where the black waters of the Negro River meet the white waters of the Amazon River, and where the clear waters of the Tapajós River meet the white waters of the Amazon River. Additionally, the transect allows testing for genetic structure geographically associated with river confluence, because two controls in which rivers of the same water color meet were included: the confluence of the black waters of the Branco and Negro Rivers and the confluence of the white waters of the Madeira and Amazon Rivers (**Figure 4**).

#### **COMPARATIVE PATTERNS**

Despite differences in ecology and phylogenetic history, strong evidence indicates that the biogeographic and ecological contexts of the Amazon Basin have promoted largely congruent fine-scale phylogeographic and population-level structuring. From a genealogical perspective, all species showed strong signals of demographic expansion, and in every case these expansions occurred well within the Quaternary period, with the exception of *C. asellus*, whose expansion began ∼0.5 Ma earlier (**Table 1**).

*Figure caption fragments (haplotype networks):* […] statistical parsimony. […] of the circle is proportional to its frequency. Lines joining […]

*Figure caption fragments (timeline of geological events and speciation):* […] dates of marine incursions, the breach of the Purus Arch and subsequent formation of the Proto Amazon River are after Lundberg et al. (1998). The date estimate for the final formation of the modern Amazon River is after Campbell […] phylogenetic analysis using BEAST 1.6.1 (Drummond and Rambaut, 2007). Where phylogenetic methods were not employed, the timing of speciation has been stated as unknown.

From a population genetic perspective, all species and species complexes showed a predominant barrier to gene flow at the confluence of the Negro and the Amazon Rivers, and/or again at the confluence of the Tapajós and Amazon Rivers (**Figures 4** and **5**). No significant barrier to gene flow was identified at the confluence of the Madeira and Amazon Rivers in any species. Indeed, based on both mitochondrial and nuclear datasets, population structure was strongly associated with 'water color' but not with river system. For instance, statistically significant differentiation was consistently found between populations from rivers of different colors, but not between those from rivers of the same color (**Tables 2** and **3**; **Figure 4**). For *C. existimatus*, the large number of inferred cryptic species resulted in insufficient data for intraspecific analysis. However, the *C. existimatus* cryptic species 'D', endemic to black waters of the Negro River, appeared recently derived and genetically distinct from white water lineages (**Figure 5v**).

**Table 1 | Historical demographic analyses based on DNA sequences for all species and species complexes.**

Summaries include the sum of squared deviations (SSD), Raggedness Index (R), and Fu's test of neutrality (FS). Estimates of time since expansion are shown for analyses exhibiting evidence of demographic expansion. Phylogroups correspond with haplotype networks in *Figure 2*.

**Table 2 | Analysis of molecular variance (AMOVA) based on mtDNA sequences for three study species.**

Comparisons 'between water colors' are based on populations grouped according to river color. FI stands for Fixation Index; significant results are indicated by \*.

**Table 3 | Analysis of molecular variance (AMOVA) based on AFLP data for three species and species complex.**

Comparisons 'between water colors' are based on populations grouped according to river color. FI stands for Fixation Index; significant results are indicated by \*.

Remarkably, ecologically driven divergence was also depicted in replicate over the riverscape in the two sympatric cryptic species discovered within the electric fish *S. elegans*. Here, intraspecific divergence within both sp. 1 and sp. 2 is due to a barrier to gene flow between black and white water, whereas no barrier to gene flow was identified between white water rivers (**Figure 7**; **Table 2**). These lineages likely exemplify "natural replicates of the ecological speciation in progress" (*sensu* Rosenblum and Harmon, 2011) and as such should assist with the discovery of general rules about divergent natural selection that may result in ecological speciation (Cooke et al., 2014).

Using outlier loci approaches based on genome scans, divergent selection was quantified between populations of *C. asellus*, *T. albus* and *S. elegans*. Heightened selection was detected at the interface of different water colors in the three species (**Table 4**), irrespective of river system and tributary arrangement.

#### **SIMULATING ELG ALONG THE AMAZON RIVERSCAPE**

Here we describe an ELG approach that incorporates selection to simulate genetic exchange along the Amazonian riverscape. Although this was used to explore the reasons behind the black water/white water population boundary detected in the electric fish *S. elegans* (**Figure 8**; Cooke et al., 2014), such an approach can be readily extended to a wide range of studies of ecological speciation.

Simulations were conducted in the individual-based landscape genetics program CDPOP v1.2 (Landguth and Cushman, 2010; Landguth et al., 2012). Individual genetic exchange was simulated for over 100 non-overlapping generations as a function of individual-based movement, mating, dispersal, and selection, using 100 individuals spatially located at each of the nine populations in the study system. One locus was set to be under spatial selection and tied to 'water color,' and 19 loci were selectively neutral across a riverine distance surface that controlled for individual movement (mating and dispersal). Following similar studies (e.g., Landguth and Balkenhol, 2012) and expanding to a spatially explicit environmental gradient in a riverscape setting (see **Figure 8**), selection pressures were altered due to 'water color' between populations by considering three spatially explicit relative fitness surface scenarios. (i) No spatial selection gradient ('uniform'): in this scenario, the three genotypes (AA, Aa, and aa) were being selected against, but uniformly across the 'water color' riverscape scenario, thus having no spatial dependency. (ii) Gentle spatial selection gradient ('gentle'): here, a 'gentle' spatial selection gradient corresponding to the three river color locations and three genotypes was used. (iii) Steep spatial selection gradient ('steep'): for this scenario, stronger spatial selection gradients were assigned to each genotype for black, mixed, and white waters. For further details, see Cooke et al. (2014).
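The three scenarios can be illustrated with a small Python sketch of genotype-by-environment relative fitness tables and viability selection. This is not CDPOP code and does not reproduce the published parameterization: every fitness value below, as well as the names `FITNESS`, `survives` and `expected_freq_A`, is hypothetical and chosen only to show the contrast between 'uniform', 'gentle' and 'steep' gradients.

```python
import random

# Hypothetical relative fitness of each genotype in each water color.
# These numbers are illustrative assumptions, not those of Cooke et al. (2014).
FITNESS = {
    "uniform": {w: {"AA": 0.8, "Aa": 0.8, "aa": 0.8}
                for w in ("black", "mixed", "white")},
    "gentle": {
        "black": {"AA": 1.0, "Aa": 0.9, "aa": 0.8},
        "mixed": {"AA": 0.9, "Aa": 0.9, "aa": 0.9},
        "white": {"AA": 0.8, "Aa": 0.9, "aa": 1.0},
    },
    "steep": {
        "black": {"AA": 1.0, "Aa": 0.6, "aa": 0.2},
        "mixed": {"AA": 0.6, "Aa": 0.6, "aa": 0.6},
        "white": {"AA": 0.2, "Aa": 0.6, "aa": 1.0},
    },
}


def survives(genotype: str, water: str, scenario: str, rng: random.Random) -> bool:
    """Viability selection: an offspring settling in `water` survives with a
    probability equal to the relative fitness of its genotype there."""
    return rng.random() < FITNESS[scenario][water][genotype]


def expected_freq_A(p: float, water: str, scenario: str) -> float:
    """Expected frequency of allele A among survivors, assuming offspring are
    in Hardy-Weinberg proportions before selection."""
    w = FITNESS[scenario][water]
    q = 1.0 - p
    mean_w = p * p * w["AA"] + 2 * p * q * w["Aa"] + q * q * w["aa"]
    return (p * p * w["AA"] + p * q * w["Aa"]) / mean_w


for scenario in ("uniform", "gentle", "steep"):
    print(scenario, round(expected_freq_A(0.5, "black", scenario), 3))
```

Under the 'uniform' table allele frequencies are not expected to shift in any water color, whereas the 'gentle' and 'steep' tables push allele A up in black water and down in white water, with a larger shift per generation in the 'steep' case.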

In the *S. elegans* replicated system, population genetic analyses and *F*ST-based genome scans showed that recent divergence appeared linked to a major hydrochemical gradient within each cryptic species (**Tables 3** and **4**; **Figure 8**). Results from ELG simulations further corroborate these findings. These ELG simulations show that neutral data can give a low population differentiation signal (similar to the empirical neutral data findings based on AFLPs). They also show that selection-driven loci can respond with high population differentiation to the water color ecotone (similar to the empirical outlier loci findings).

Repeat outliers are those detected only in multiple comparisons, which corrects for type I errors. The AFLP data are based on 460 polymorphic loci for *C. asellus*, 360 polymorphic loci for *T. albus* and 310 polymorphic loci for *S. elegans*. NA, samples not available.

The results link selection across an ecological gradient with reproductive isolation and it was speculated that assortative mating based on chemically different water types may be driving the divergence (Cooke et al., 2014). The rationale is that the major differences in pH and conductivity between waters of the Negro and Amazonas influence the transmission of electric signals used for courtship signaling and for precise synchronization of external fertilization – a hypothesis consistent with the idea that electric discharges in African electric fish are drivers of sympatric speciation (Feulner et al., 2006).

#### **EVOLUTIONARY PROCESSES SHAPING AMAZONIAN FISH DIVERSITY: GEOMORPHOLOGICAL HISTORY AND DIVERGENT NATURAL SELECTION**

The palaeogeographic and palaeoenvironmental changes in South America during the Miocene are known to have profoundly affected the evolution of the Amazonian fish fauna. These changes include the uplift of the Andes and associated neotectonic events, the incursion of marine waters into previously freshwater systems and the dramatic reorientation of major river drainages (e.g., the Amazon River) (Lundberg et al., 1998; Lovejoy and Collette, 2001; Montoya-Burgos, 2003; Hubert and Renno, 2006; Lovejoy et al., 2006; Ruzzante et al., 2006; Hubert et al., 2007; Sistrom et al., 2009; Hoorn et al., 2010a,b; Cooke et al., 2012b,c,d). Indeed, phylogenetic analyses combined with fossil data indicate that the early Miocene marine incursions were likely responsible for the colonization and adaptation to freshwaters of the two marine-derived lineages we studied (**Figure 6**). Moreover, the dating of speciation events within *Plagioscion*, *S. elegans* and *C. existimatus* is consistent with the Miocene diversification of fishes observed in the fossil record, as well as in other molecular studies (Rull, 2008; Sistrom et al., 2009; Piggott et al., 2011). Moving forward in time, our analyses detected phylogeographic signals supporting easterly trajectories of colonization down the Amazon River and consistently strong demographic expansions for all species dated to the Quaternary (**Figures 5** and **6**; **Table 1**). These studies also suggest that the ancestral ecotype for each species was in Andean-derived white water (Cooke et al., 2012b,c,d, 2014). These events of population history are likely correlated with the final establishment of the modern Amazon River during the late Pliocene-early Pleistocene (Rossetti et al., 2005; Campbell et al., 2006). This association was attributed to the availability of novel aquatic habitats along the vast Amazon and its catchment, which enabled range expansions and left concordant records in the population genealogies of codistributed species (**Figure 6**).

Although palaeogeographic events have clearly contributed to Amazonian fish diversity, the comparative analysis indicates that the force of divergent natural selection, in this case between water colors, deserves equal consideration. One of the primary tenets of ecological speciation theory is that high ecological opportunity, such as the colonization of new environments in the absence of predation and/or competition, will promote rapid population divergence resulting in speciation (Mayr, 1963; Schluter, 2000). Indeed, adaptive radiations following the colonization of new habitats are well documented in fish (Bernatchez and Wilson, 1998; Beheregaray et al., 2002). The simultaneous Quaternary expansion events of the five species into the Amazon River system may have presented the high ecological opportunity necessary for divergent natural selection and adaptation between Andean (white) and Craton-derived (black and clear) waters.

Classic signatures of vicariant biogeographic history such as genetic drift, inbreeding and migration have genome-wide effects, while selection usually leaves signatures only at those loci that are adaptive or tightly linked to adaptive loci via hitchhiking (Luikart et al., 2003; Jensen et al., 2007). For this reason, *F*ST-based genome scans were employed to assess the role of selection in the origin and maintenance of population divergence between water colors. A key expectation of ecologically based divergent natural selection, however, is that divergence should be greater between selective environments than within them (Schluter and Conte, 2009). Indeed, in each species, more outlier loci on average were found in pairwise comparisons between different water colors than in comparisons between the same colors (**Table 4**).
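The logic of an *F*ST-based outlier scan can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the procedure used in the studies summarized here: it assumes known allele frequencies at co-dominant bi-allelic loci in two equally sized populations (AFLPs are dominant markers and require different estimators), and it flags loci by a plain empirical quantile rather than a model-based significance test. The function names and the 0.95 default are assumptions.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Per-locus FST = (HT - HS) / HT for two equally weighted populations,
    given the frequency of one allele in each population."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # within-pop mean
    return (h_t - h_s) / h_t


def flag_outliers(fst_values, quantile=0.95):
    """Return indices of loci whose FST exceeds the chosen empirical quantile."""
    ranked = sorted(fst_values)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return [i for i, f in enumerate(fst_values) if f > cutoff]


# 19 weakly differentiated loci plus one strongly differentiated locus.
freqs = [(0.50, 0.55)] * 19 + [(0.10, 0.90)]
fst = [fst_two_pops(p1, p2) for p1, p2 in freqs]
print(flag_outliers(fst))
```

Counting how many loci are flagged in between-color comparisons versus same-color comparisons mirrors the contrast summarized in the paragraph above, although a real analysis would also need to account for demographic history and multiple testing.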

For *C. asellus*, divergent selection detected between the Tapajós and Amazon Rivers was a powerful yet recent phenomenon, such that genetic drift had not yet accumulated in selectively neutral genomic regions (Cooke et al., 2012b). In contrast, isolation by environment was suggested for *T. albus* between black, white and clear waters (Cooke et al., 2012a). For that species, a population boundary between water colors has also arisen within the selectively neutral AFLP loci and is particularly marked in the mitochondrial data (**Figure 5iii**). For *S. elegans*, both the empirical landscape genetics and simulated ELG explicitly linked selection-driven population genetic structure to the water color ecotone, validating results of *F*ST-based genome scans. By incorporating information about population history and by independently validating outlier results with simulations, we improved our ability to exclude type I errors normally associated with genome scans of selection (see Cooke et al., 2014 for details).

In the context of comparative biology, the use of genome scans revealed variable, yet valuable perspectives on the process of adaptive divergence across water colors, both within and between species. Although there is no certainty that ecologically diverged lineages will eventuate as reproductively isolated species (Futuyma, 1987; Via, 2009), the observation of isolation by environment within *T. albus* and speciation in the *S. elegans* complex across the same environmental gradient added credence to the assertion that water color is a general force shaping Amazonian freshwater biodiversity. Considering this, it may also be likely that divergent natural selection is the mechanism maintaining population boundaries observed between water colors for *P. squamosissimus* and *C. existimatus* (**Figures 5ii, v**).

The evidence for the role of divergent natural selection in these studies occupies a relatively recent time frame and appears associated with the formation of the modern Amazon River and its aquatic ecotones. Considering that time erases the signature of divergent selection within the genome, resulting in a more diffuse pattern of genetic divergence (Via, 2009), ecological speciation may not be just a recent phenomenon within the Amazon Basin. Here, a broad taxonomic range of species with different life histories were sampled and in every case there was evidence that water color influenced population divergence. Thus, the process of divergent natural selection is also likely to be widespread in the accumulation of contemporary diversity, which may shift the paradigm for future evolutionary research in Amazonia.

## **FUTURE DIRECTIONS**

Clarifying the role of ecology in population adaptation and divergence is crucial for understanding the origin of species and the evolutionary potential of biodiversity to respond to ongoing global change (Hoffmann and Sgro, 2011; Seehausen et al., 2014). Although the theory of adaptation is well developed, measuring the strength and characteristics of selection in nature remains a daunting task (Endler, 1986; Schluter, 1988; Jones et al., 2012). We generally lack knowledge about the genomic basis of population adaptation and divergence, especially for tropical organisms. Despite these gaps in knowledge, recent years have seen exciting developments due to the explosion of information for non-model organisms created by NGS technologies. While modest genetic datasets can certainly inform on ecological divergence that may result in ecological speciation (e.g., Cooke et al., 2012a,b, 2014), it is now feasible to 'genomicize' ecologically and evolutionarily important species at relatively low costs (Travers et al., 2007; Ellegren, 2008; Bernatchez et al., 2010; Seehausen et al., 2014). NGS resources relevant for ecological speciation research include genome-wide marker panels for population genomics, whole genome sequences and transcriptomes. These can be used to identify gene regions and traits involved in adaptation, to understand selective factors influencing adaptive variation and to assess evolutionary resilience.

As illustrated in this review, studies of ecological speciation in the tropics should merge analyses of genealogical history and gene flow with recent developments in spatial modeling and ELG simulations. This integration provides valuable biogeographic and environmental contexts to disclose associations between landscape features and evolutionary processes, including divergent natural selection. Such approaches, when combined with NGS datasets and with information about divergent phenotypes, can be used to identify adaptive loci across the landscape (Lowry, 2010; Schoville et al., 2012), and to better assess the relative sensitivity of adaptive and neutral loci across environmental barriers and under a range of gene flow (Balkenhol and Landguth, 2011). However, researchers must still determine unambiguously whether markers are under selection, and adaptation is undoubtedly a multigenic process that encompasses additive, epistatic, and pleiotropic effects. While several analytical challenges remain, ELG simulations in combination with powerful genomic datasets can provide a snapshot of the potential of evolutionary processes. Future research should address how landscape heterogeneity affects the generation of clusters of reproductively isolated genotypes through landscape-restricted migration. Furthermore, how do spatial selection gradients influence the emergence of reproductively isolating clusters? These questions have direct relevance for mosaic hybrid zones (e.g., Ross and Harrison, 2002) and how they are shaped by individual-based movement strategies, heterogeneous landscapes, spatial selection gradients, and their interactions. Forthcoming ELG simulations should combine a range of landscape complexities (affecting dispersal) with spatial selection gradients and complex endogenous selection (i.e., multilocus interactions, recombination, and mutational models) within the framework outlined here to understand processes controlling the emergence of reproductive isolation.

Ecological adaptation and divergence in the face of homogenizing gene flow is still a controversial topic. Many studies of ecological speciation in nature are correlative, are based on results from lab experiments (Nosil, 2012), and show only weak associations between divergent selection and levels of reproductive isolation (Hendry, 2009). The latter is not unexpected because divergent selection acts on adaptive traits responsible for post- and prezygotic reproductive isolation along an evolutionary continuum that ranges from adaptive variation within panmictic populations to complete reproductive isolation between species (Schluter, 2001; Hendry, 2009). In fact, a recent meta-analysis suggests that genetic divergence induced by ecologically based divergent selection is pervasive across time-scales and taxa (Shafer and Wolf, 2013). Marine ecosystems are no exception, with two decades of phylogeographic research indicating that the combination of ecological divergence and partial isolation (parapatry) probably offers the richest opportunities for diversification in the sea (Bowen et al., 2013).

The time is right for implementing studies that synergistically generate and explore information from adaptive phenotypes, phylogeography, population genomics and simulations in ELG to understand ecological divergence and speciation in the tropics. Such endeavors are expected to challenge results from current surveys that assess tropical diversity based on sparse population sampling and on geographic models that do not incorporate selection. In doing so, they will complement historical biogeographic and evolutionary studies of diversification of tropical biotas. Integrative frameworks such as the one illustrated here have considerable potential to enhance conservation management in biodiversity-rich ecosystems and to contribute toward a better understanding of how ecology, space and time interact with the genome.

#### **ACKNOWLEDGMENTS**

We thank Peter Teske and two reviewers for comments on the manuscript and Minami Sasaki and Luisa Beheregaray for assistance with the references and data entry, respectively. The Amazonian research outlined here was funded by the Discovery Program of the Australian Research Council (ARC DP0556496 to LBB) and by Macquarie University through postgraduate travel and student grants to GMC. Local arrangements were supported in part by the Brazilian Council of Research and Technology (CNPq-SEAP No. 408782/2006–4 to NLC). Collection permit is under IBAMA#1920550, and ethical approval under Macquarie University #2007/033. LBB also acknowledges support from the ARC Future Fellowship program (FT130101068).

#### **REFERENCES**


*argentinensis*. *Mol. Ecol.* 10, 2849–2866. doi: 10.1046/j.1365-294X.2001.t01-1-01406.x


Saint-Paul, U., Zuanon, J., Correa, M. A. V., Garcia, M., Fabre, N. N., Berger, U., et al. (2000). Fish communities in central Amazonian white- and blackwater floodplains. *Environ. Biol. Fishes* 57, 235–250. doi: 10.1023/A:1007699130333


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 July 2014; accepted: 29 December 2014; published online: 21 January 2015.*

*Citation: Beheregaray LB, Cooke GM, Chao NL and Landguth EL (2015) Ecological speciation in the tropics: insights from comparative genetic studies in Amazonia. Front. Genet. 5:477. doi: 10.3389/fgene.2014.00477*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Beheregaray, Cooke, Chao and Landguth. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Variable mating behaviors and the maintenance of tropical biodiversity**

*Charles H. Cannon 1,2 \* and Manuel Lerdau 1,3*

*<sup>1</sup> Key Lab in Tropical Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, China, <sup>2</sup> Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA, <sup>3</sup> Departments of Environmental Sciences and Biology, University of Virginia, Charlottesville, VA, USA*

Current theoretical studies on mechanisms promoting species co-existence in diverse communities assume that species are fixed in their mating behavior. Each species is a discrete evolutionary unit, even though most empirical evidence indicates that inter-specific gene flow occurs in plant and animal groups. Here, in a data-driven metacommunity model of species co-existence, we allow mating behavior to respond to local species composition and abundance. While individuals primarily out-cross, species maintain a diminished capacity for selfing and hybridization. Mate choice is treated as a variable behavior, which responds to intrinsic traits determining mate choice and the density and availability of sympatric inter-fertile individuals. When mate choice is strongly limited, even low survivorship of selfed offspring can prevent extinction of rare species. With increasing mate choice, low hybridization success rates maintain community level diversity for extended periods of time. In high diversity tropical tree communities, competition among sympatric congeneric species is negligible, because direct spatial proximity with close relatives is infrequent. Therefore, the genomic donorship presents little cost. By incorporating variable mating behavior into evolutionary models of diversification, we also discuss how participation in a syngameon may be selectively advantageous. We view this behavior as a genomic mutualism, where maintenance of genomic structure and diminished inter-fertility, allows each species in the syngameon to benefit from a greater effective population size during episodes of selective disadvantage. Rare species would play a particularly important role in these syngameons as they are more likely to produce heterospecific crosses and transgressive phenotypes. We propose that inter-specific gene flow can play a critical role by allowing genomic mutualists to avoid extinction and gain local adaptations.

**Keywords: syngameon, genomic mutualists, tropical trees, selfing, inter-specific hybridization, density-dependence, maintenance of diversity**

## **Introduction**

Numerous ecological mechanisms, both stochastic and deterministic, affect species co-existence in communities with high levels of diversity (Connell, 1961; Janzen, 1970; Hubbell, 1997; Webb and Peart, 1999; Wright, 2002; Volkov et al., 2005). While most attention has focused on how such diversity arises and how interactions among species promote and constrain diversity, fewer studies have examined the implications of demographic and genetic factors in the maintenance of species-level diversity. The unified neutral theory of biodiversity and biogeography (Hubbell, 2001; Rosindell et al., 2011) has provided an effective set of ecological null models to test patterns of species composition and abundance in these communities and to discern the roles of different ecological mechanisms in promoting/constraining species-diversity.

#### *Edited by:*

*James E. Richardson, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Paul Fine, University of California, Berkeley, USA Mike Arnold, University of Georgia, USA*

#### *\*Correspondence:*

*Charles H. Cannon, Department of Biological Sciences, Texas Tech University, Flint and Main Street, Box 43131, Lubbock, TX 79409, USA chuck.cannon@ttu.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 30 May 2014 Accepted: 30 April 2015 Published: 19 May 2015*

#### *Citation:*

*Cannon CH and Lerdau M (2015) Variable mating behaviors and the maintenance of tropical biodiversity. Front. Genet. 6:183. doi: 10.3389/fgene.2015.00183*

In these ecological models, species act as discrete evolutionary units: conspecific individuals are inter-changeable while heterospecific individuals are completely distinct. Speciation, in other words, is assumed to be complete and reproductive isolation fixed. Although only two studies have investigated tropical trees, studies on diverse groups of animals and plants have shown that closely related and sympatric species often maintain some level of inter-specific fertility, particularly among plants (Anderson, 1953; Morjan and Rieseberg, 2004; Seehausen, 2004; Arnold et al., 2008b; Givnish, 2010). Numerous studies of secondary contact zones have demonstrated that hybrid offspring can invade new habitats (Donovan et al., 2010), that inter-specific gene flow can allow the introgression of advantageous alleles from one species to another (Arnold et al., 2008a,b; Fitzpatrick et al., 2010), and that genetic rescue through hybridization can occur during population crashes (Baskett and Gomulkiewicz, 2011). Increasing genome-scale evidence has also demonstrated that reticulation among sympatric close relatives has played a major role in the few well-studied groups (Dasmahapatra et al., 2012; Twyford and Ennos, 2012; Yoder et al., 2013; Thomson et al., 2015). Self-fertilization, primarily in plants, can also play a significant role in reproduction (Porcher and Lande, 2005).

In this study, using a numerical simulation model, we attempt to incorporate an ecologically responsive mating behavior by allowing both inter-specific crosses and self-fertilization to play a role in neutral meta-community ecological dynamics. We combine both intrinsic levels of inter-fertility and self-compatibility with extrinsic community-level density-dependent factors, e.g., species composition and abundance. While ecologists have achieved a relatively good understanding of ecological mechanisms for species co-existence, we have little understanding of how species diversify in tropical communities. This analysis allows us to bring traditional perspectives in plant genetics and evolution, like the syngameon (Lotsy, 1925; Grant, 1971), to bear on the traditionally ecological issue of biodiversity in hyperdiverse communities.

## **Variable Mating Behaviors**

We model individual plant reproductive behaviors, namely selfing, conspecific out-crossing, and inter-specific gene flow, in a metacommunity of varying species diversity. We treat these as responsive and adaptive reproductive behaviors by allowing them to respond to an environmental signal: the composition and abundance of pollen reaching the stigma. These three reproductive behaviors are common in wild populations and individuals commonly retain the capacity for each behavior. Although most tropical trees are highly out-crossed (Bawa et al., 1985a,b), selfing may be more frequent than expected (Kondo et al., 2011), particularly for endemic species (Alonso et al., 2010). Selfing rates have been shown to be inversely related to population densities (Ward et al., 2005). Moreover, sympatric and closely-related species can and do hybridize in natural communities (Kamiya et al., 2011; Duminil et al., 2012; Thomson et al., 2015).

Importantly, genome size appears to be largely stable in tropical trees (Chen et al., 2014). Polyploidization is generally considered rare among woody plants (Petit and Hampe, 2006). Most genera that have been examined have consistent ploidy levels, and the impact of homoploid hybridization on diversification in plants has been reviewed (Rieseberg and Carney, 1998). Recent studies indicate that the genomes of eucalypt species are largely colinear (Hudson et al., 2012). Conserved genomic structure facilitates variable mating behaviors while unstable genomic structure should promote rapid speciation. In the following discussion, we refer to heterospecifics that retain partial inter-fertility as "genomic mutualists." This idea corresponds to the genic model of speciation (Wu, 2001), specifically to a stable and intermediate inter-fertility (stages II or III) where speciation never reaches completion. This model is based upon the basic structure proposed for a syngameon (Lotsy, 1925; Grant, 1971), where multiple species remain partially inter-fertile but primarily experience divergent selection on portions of the genome, with low levels of neutral or adaptive gene flow occurring in other parts of the genome.

Highly diverse communities would therefore be composed of numerous syngameons or suites of genomic mutualists, where individuals predominantly out-cross but retain some diminished capacity for self-fertilization or inter-specific hybridization against a conserved genomic background. Variable mating behavior in these individuals, instead of being determined solely by genotype, is determined by the quantity and quality of pollen rain. The quantity of a particular species of pollen, either from conspecifics or genomic mutualists, could act as a proxy measure for "success" in the community of that particular genotype, either because of its capture of pollination services or its ecophysiological success in that particular location. Pollen precedence would control this behavior, where conspecific pollen is more vigorous (Williams et al., 1999) and, if present, will fertilize most ovules (Howard et al., 1998). On the other hand, if an individual primarily receives heterospecific pollen, despite reduced vigor and lower fertilization rates, the probability of hybridization correspondingly increases. This type of density-dependent, unidirectional gene flow between sympatric genomic mutualists has been reported in oaks (Lepais et al., 2009). While some plants exhibit some level of self-incompatibility, the selective advantage of self-fertilization typically remains high, despite potential costs of inbreeding depression (Steinbachs and Holsinger, 1999). In situations where conspecific pollen is rare, possible mentor pollen effects could also facilitate hybridization (Knox et al., 1972), as mixtures of self and hetero-specific pollen on the stigma have been found to make the pistil more receptive to hybridization.

Our model incorporates an ecologically dynamic framework of conditions that determine species behavior based on the interaction of several probabilistic realities. This variable behavior is akin to fuzzy reasoning (Zadeh, 2008), also known as conditional logic, which remains somewhat controversial although it has also been successfully applied in the ecological modeling of animal behavior (Inglis and Langton, 2006). These principles are particularly advantageous when uncertainty and contingent processes play a significant role in the outcome. Given the general complexity of highly diverse tropical tree communities, particularly given their slow macro-evolutionary processes (Petit and Hampe, 2006) and dynamic biogeographic histories (Cannon et al., 2009), uncertainty should be expected to play a central role in the evolutionary and ecological behavior of tropical organisms. In these evolutionary and ecological conditions, variable mating behavior, where mate choice is contingent upon the environmental signal in the pollen rain, should be an appropriate model.

## **Resisting the Extinction Vortex**

Because a large proportion of species in highly diverse communities are rare (He et al., 1997), most are chronically vulnerable to stochastic local extinction (Ovaskainen and Meerson, 2010). This vulnerability implies either that extinction (and the concomitant immigration and/or speciation) rates are high in such systems or that rare species have coping mechanisms to avoid local extinction. Variable mating behaviors would be a potential coping mechanism because isolated individuals, suffering from an absence of conspecifics and poor pollination service in the local neighborhood, could expand their effective population sizes. Unidirectional gene flow into the rare species could be disadvantageous to individuals of the dominant species if the two species were in direct ecological competition. In highly diverse communities, however, most species form only a small fraction of the entire community. This general feature of highly diverse communities could ameliorate the potential cost of being a genetic donor to a rare species, because direct competition with hybrid offspring clustered around the rare genetic recipient would be correspondingly rare. Previous analysis of density-dependent competition at the seedling stage in lowland tropical forest on Barro Colorado Island demonstrated that individuals were affected by conspecific but not heterospecific competitors (Comita et al., 2010), but because the majority of species are sympatric with closely-related species, these results could underestimate the potential competition among genomic mutualists.

To explore the potential costs to common species of unidirectional hybridization with rare species through direct competition with their hybrid offspring, we examine conspecific and congeneric encounter rates within local neighborhoods in the Pasoh Long-Term Dynamics plot (Kochummen and LaFrankie, 1990), a lowland rainforest in peninsular Malaysia. At Pasoh, only 157 out of 811 species (19%) were the sole representative of their genus in the local community, while another 106 species were sympatric with one other congeneric species. The remaining 548 species (68%) exist sympatrically with at least two other congeners. This pattern is evident in numerous other locations across Southeast Asia (**Figure 1**). The opportunity for variable mating behavior is therefore abundant in these communities.
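The congener tallies above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the class labels and the derived "at least one congener" share are illustrative additions, not figures from the study.

```python
# Species counts for the Pasoh plot, as reported in the text.
congener_classes = {
    "sole representative of genus": 157,
    "one sympatric congener": 106,
    "two or more sympatric congeners": 548,
}
total_species = sum(congener_classes.values())

# Percentage of the flora in each class, rounded to a whole percent.
percentages = {
    label: round(100 * count / total_species)
    for label, count in congener_classes.items()
}

# Derived figure (not stated in the text): species with at least one congener.
with_congener = total_species - congener_classes["sole representative of genus"]
share_with_congener = 100 * with_congener / total_species

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"at least one congener: {share_with_congener:.1f}%")
```

On these counts, roughly four out of five species share the plot with at least one congener.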

Here, we incorporate variable mating behavior into a metacommunity model of species co-existence using a data-based simulation with a simple decision tree based on the quantity and species composition of pollen received by an individual. We assume that among genomic mutualists, pollination is a stochastic process and the composition of the pollen rain is determined by the relative population densities of inter-fertile individuals in the meta-community. Additionally, we assume that conspecific out-crossing is substantially more likely to produce viable offspring than selfing or hybridization but, if a cross is successful, the offspring are ecologically equivalent. Here, the assumption of ecological equivalence among a suite of closely-related species is probably more acceptable than the assumption of community-wide ecological equivalence across all of the species in the community made in the neutral theory of biodiversity and biogeography (Hubbell, 2001). Even if hybrids and selfed offspring have lowered fitness, the alternative is extinction, which is always more costly. We believe that modeling reproductive effort as a variable behavior, controlled by the density-dependent local community composition of inter-fertile species and the probability of successful outcomes from various types of crosses, can provide further insight into species extinction in highly diverse communities, can generate numerous testable hypotheses about the evolution of rare species, and has major implications for species management.

## **Materials and Methods**

## **Model Parameters**

We included six variable parameters in our model of variable mating behavior and species co-existence. We performed three replicates of all possible combinations of the variable parameters. The variable parameters were:

## Species Diversity

The initial diversity of genomic mutualists that occur within the larger matrix of the community. We assume that all species are in the same genus and experience the same level of self-compatibility and inter-fertility. The values included 3, 5, and 10. These values capture the majority of the observed range of local diversity of congeneric taxa in the Pasoh 50 ha plot (see below).

## Community Size

The number of individuals in the community of genomic mutualists at the three diversity levels. The relative abundances of each species are based on the patterns observed in Pasoh forest plot data for groups of congeners. 3 spp. = {195, 75, 30}; 5 spp. = {300, 200, 150, 100, 50}; 10 spp. = {220, 200, 125, 110, 100, 90, 75, 50, 20, 10}.
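As a minimal sketch, the three starting communities can be written down directly from the abundance vectors above. The record layout (`id`, `species`, `age`), the function name `build_community`, and the starting age of 0 are illustrative assumptions, not details taken from the original Mathematica code.

```python
# Initial abundances per species, copied from the text.
initial_abundances = {
    3: [195, 75, 30],
    5: [300, 200, 150, 100, 50],
    10: [220, 200, 125, 110, 100, 90, 75, 50, 20, 10],
}

def build_community(n_species):
    """Return one record per individual (hypothetical layout)."""
    community = []
    for species_id, abundance in enumerate(initial_abundances[n_species]):
        for _ in range(abundance):
            # ASSUMPTION: every individual starts at age 0.
            community.append({"id": len(community), "species": species_id, "age": 0})
    return community

# Total community size and the share held by the rarest species.
community_sizes = {k: sum(v) for k, v in initial_abundances.items()}
rarest_share = {k: min(v) / sum(v) for k, v in initial_abundances.items()}
print(community_sizes)
print(rarest_share)
```

The totals come to 300, 800, and 1000 individuals, and the rarest species holds a progressively smaller share of the community as diversity increases.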

## Pollen Limitation Coefficient

Proportion of the community of genomic mutualists that act as pollen donors in any one reproductive event. For example, in the three species community simulation where the total population is 300, with a PLC = 0.1, a single individual receives pollen from thirty other individuals (0.1 *×* 300); in the five species community simulation where the total population is 800, with a PLC = 0.01, a single individual receives pollen from eight other individuals (0.01 *×* 800). This parameter had a large impact on reproductive success and was examined across coefficient values ranging from 0.01 to 0.5 {0.01, 0.015, 0.02, 0.025, 0.1, 0.15, 0.25, 0.5}.
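The donor counts implied by each coefficient can be tabulated as below. Some combinations (for example 0.015 *×* 300) give a fractional number of donors, and the text does not say how these were handled. Rounding down with a minimum of one donor is an assumption of this sketch, and `n_pollen_donors` is a hypothetical helper name.

```python
from fractions import Fraction
from math import floor

community_sizes = {3: 300, 5: 800, 10: 1000}
# Coefficients kept as strings so they convert to exact fractions.
plc_values = ["0.01", "0.015", "0.02", "0.025", "0.1", "0.15", "0.25", "0.5"]

def n_pollen_donors(plc, community_size):
    """Number of pollen donors per reproductive event for one individual."""
    exact = Fraction(plc) * community_size
    # ASSUMPTION: fractional donor counts are rounded down, never below one.
    return max(1, floor(exact))

for n_species, size in community_sizes.items():
    donors = [n_pollen_donors(plc, size) for plc in plc_values]
    print(f"{n_species} spp. (N = {size}): {donors}")
```

Exact fractions are used instead of floats so that products such as 0.1 *×* 300 are not affected by binary rounding.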

## Self-fertilization Success

The probability of producing a viable offspring from a self-fertilization event. Conspecific crosses were always successful. A value of 0 signifies complete self-incompatibility. These values are derived from empirical studies across tropical tree taxa {0, 0.1, 0.25, 0.4}.

**FIGURE 1 | Percentage of species existing in sympatry with congenerics and species richness of most diverse tree genera in four locations across tropical Asia.** "1 sp." illustrates the number of species that are the only species from their genus present in the community; "2 spp." illustrates the number of species with at least two sympatric congenerics; and "*>*2 spp." illustrates the number of species with at least three sympatric congenerics. The five most diverse genera in each location are listed below the solid line, with the total number of species observed indicated.

## Hybridization Success

The probability of producing a viable offspring from an inter-specific pollination event. A value of 0 signifies complete hybrid incompatibility. These values of inter-fertility among closely-related species are derived from previously cited reviews of plant species {0, 0.1, 0.25, 0.4}.

## Individual Fecundity

The number of potential progeny that could be produced by an individual in each reproductive event {10, 50, 250}, less the unsuccessful self-fertilization and inter-specific crosses. It should be noted that this level of fecundity greatly underestimates the actual fecundity of almost any tropical tree species, which can easily produce orders of magnitude more ovules. This underestimate of fecundity simplified the simulation and allowed rapid computation. This very limited potential fecundity should impose a very strong burden on the fraction of successful hybrids or selfed offspring.

Several parameters were not allowed to vary in our simulations. We assumed that all crosses between conspecific individuals were successful and that all viable progeny were equally fit and ecologically equivalent, whether they were of self, hybrid, or out-crossed origin. All individuals were bisexual and species identity of the progeny was determined by the mother. The size of the modeled community was fixed through the simulation (see parameter 2). Stochastic mortality was fixed at 1.5%. Note that individual reproductive success and the type of offspring produced are determined only by the interaction of several probabilistic parameters and community composition and are not inherent to the individual. No spatial parameters were imposed on the model, therefore any individual could mate with any other individual in the community with equal probability.
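The full factorial design described above can be enumerated as a short sketch. It assumes, as the abundance vectors suggest, that community size is tied to the diversity level and is therefore not crossed as an independent factor. The dictionary keys are illustrative names, and the 500 reproductive events per run are taken from the simulation procedure.

```python
from itertools import product

# Values for each varied parameter, copied from the text.
parameter_values = {
    "n_species": [3, 5, 10],
    "plc": [0.01, 0.015, 0.02, 0.025, 0.1, 0.15, 0.25, 0.5],
    "selfing_success": [0, 0.1, 0.25, 0.4],
    "hybridization_success": [0, 0.1, 0.25, 0.4],
    "fecundity": [10, 50, 250],
}
n_replicates = 3
# Parameters held constant in every run.
fixed = {"mortality": 0.015, "n_events": 500}

# Every unique combination of the varied parameters.
combinations = [
    dict(zip(parameter_values, values))
    for values in product(*parameter_values.values())
]
# One run per combination and replicate, with the fixed parameters attached.
runs = [
    dict(combo, replicate=r, **fixed)
    for combo in combinations
    for r in range(n_replicates)
]
print(len(combinations), len(runs))
```

Under this reading the design has 1152 unique combinations and 3456 runs in total.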

## **Simulation Procedure**

We established the initial community for each of three replicates for all unique combinations of variable parameters (see above). At the outset of the model, a community of a fixed population size was established (3 spp. = 300 individuals; 5 spp. = 800 individuals; and 10 spp. = 1000 individuals), composed of "common" and "rare" species (see parameter 2). For every individual in the initial community, three values were recorded and tracked through each reproductive event for the entire simulation. These values were: (1) individual identity, (2) species identity, and (3) age.

Each simulation was conducted in an iterative process of three steps (see below). Reproduction was simultaneous for all individuals over 500 events and no spatial effects were incorporated into the model. During each reproductive event, each individual was crossed with a random selection of individuals in the community, irrespective of species, given the values of the pollination limitation coefficient and fecundity parameters. Recruitment was random from the entire pool of viable progeny and equal to the number of stochastic deaths. We did not attempt to model the entire community of nested sets of inter-fertile species but each small subset of sympatric and inter-fertile species was modeled separately. The mean number of species remaining in the community at the end of the simulation from the three replicates for all possible combinations of parameter values was used to generate a response surface to examine the sensitivity of the community at each parameter value across its range of variation.

## **Step One**

Generate the progeny for each individual, based upon the particular set of parameter values used in each run of the simulation. Progeny for each individual were generated during each reproductive event in the following way:

A random subset of individuals in the community was chosen as pollen donors, based upon the pollination limitation coefficient (parameter 3). No spatial parameters were imposed on mate selection.
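One possible reading of this step is sketched below. The section does not spell out the authors' decision tree, so the fertilization rule here is an assumption: each ovule draws one pollen source uniformly from the sampled donors plus the mother's own pollen, and the cross succeeds with a probability that depends on the cross type. The function name, its arguments, and the rounding of the donor count are all hypothetical.

```python
import random

def generate_progeny(mother, community, plc, fecundity, p_self, p_hybrid, rng):
    """Return (species, origin) pairs for the viable progeny of one mother.

    community is a list of species labels indexed by individual identity.
    """
    # Donor pool size from the pollen limitation coefficient (rounded down).
    n_donors = max(1, int(plc * len(community)))
    others = [i for i in range(len(community)) if i != mother]
    donors = rng.sample(others, min(n_donors, len(others)))
    # ASSUMPTION: the mother's own pollen competes as one extra source.
    pollen_sources = donors + [mother]

    progeny = []
    for _ in range(fecundity):
        source = rng.choice(pollen_sources)
        if source == mother:
            origin, p_success = "self", p_self
        elif community[source] == community[mother]:
            origin, p_success = "outcross", 1.0  # conspecific crosses always succeed
        else:
            origin, p_success = "hybrid", p_hybrid
        if rng.random() < p_success:
            # Species identity of the progeny follows the mother.
            progeny.append((community[mother], origin))
    return progeny

community = [0] * 195 + [1] * 75 + [2] * 30
kids = generate_progeny(299, community, 0.1, 50, 0.1, 0.1, random.Random(42))
print(len(kids), {origin for _, origin in kids})
```

A rule based on strict pollen precedence, where selfing or hybridization occurs only when conspecific pollen is absent, would be an equally plausible variant and would change the mix of offspring types.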


## **Step Two**

A randomly chosen proportion of the existing individuals in the model dies, given our rate of stochastic mortality (fixed at 1.5%). All viable progeny produced by all individuals in step one are pooled into a "seed bank."

## **Step Three**

Individuals are chosen from the seed bank as recruits to replace the dead individuals. The seed bank was substantially larger than the number required for replacement. These progeny are immediately able to reproduce. One "year" is added to each living individual's age.
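Steps two and three can be sketched as a single function that removes a fixed proportion of individuals and refills the community from the seed bank. The record layout, the rounding of the number of deaths, and the choice to let recruits enter at age 0 without aging in the same event are assumptions of this sketch, not details given in the text.

```python
import random

def mortality_and_recruitment(community, seed_bank, mortality=0.015, rng=None):
    """Apply stochastic mortality, then recruit replacements from the seed bank.

    community: list of dicts with keys "id", "species", "age".
    seed_bank: list of species labels, one per viable progeny.
    """
    rng = rng or random.Random()
    # ASSUMPTION: the number of deaths is the rounded proportion, at least one.
    n_deaths = max(1, round(mortality * len(community)))
    if len(seed_bank) < n_deaths:
        raise ValueError("seed bank too small to replace the dead individuals")

    dead = set(rng.sample(range(len(community)), n_deaths))
    # Survivors age by one "year"; new dicts are built so the input is untouched.
    survivors = [
        {"id": ind["id"], "species": ind["species"], "age": ind["age"] + 1}
        for i, ind in enumerate(community)
        if i not in dead
    ]
    # Recruits are drawn at random from the pooled progeny.
    next_id = max(ind["id"] for ind in community) + 1
    recruits = [
        {"id": next_id + k, "species": sp, "age": 0}
        for k, sp in enumerate(rng.sample(seed_bank, n_deaths))
    ]
    return survivors + recruits

start = [{"id": i, "species": i % 10, "age": 0} for i in range(1000)]
after = mortality_and_recruitment(start, [0] * 5000, rng=random.Random(7))
print(len(after), sum(ind["age"] == 0 for ind in after))
```

Because each death is matched by exactly one recruit, community size stays fixed, as the model requires.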

## **Community Composition of Closely-related Species**

To estimate the potential impact of inter-specific hybridization at the community level, we calculated the number of tree species that live sympatrically in the Pasoh Long-Term Dynamics Plot with congeneric species, assuming that congeneric species are at least partially inter-fertile and represent "genomic mutualists." To estimate the potential cost of unidirectional gene flow to dominant species, we counted the number of conspecific and congeneric individuals present in the 25 m radius neighborhood surrounding 601 focal trees, representing 15 individuals from the 10 most common species from four categories of congeneric diversity: 2 species/genus, 3 species/genus, 5–6 species/genus, 10–14 species/genus, plus the most common species in the most diverse genus (45 *Eugenia* spp.). We tested empirical patterns against a null spatial model for local neighborhoods, using 1000 replicates. Simulations and analyses were written in Mathematica 7 (Wolfram Research, 2008).
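The text does not describe how the null spatial model was constructed. One common choice, used here purely as an assumption, is a label-permutation null: stem positions are kept fixed, species labels are shuffled among stems, and neighborhood richness is recomputed. All function names are hypothetical, and the coordinates in the usage example are toy values rather than plot data.

```python
import math
import random

def neighbors_within(focal, coords, radius=25.0):
    """Indices of stems within the radius of the focal stem (focal excluded)."""
    fx, fy = coords[focal]
    return [
        i for i, (x, y) in enumerate(coords)
        if i != focal and math.hypot(x - fx, y - fy) <= radius
    ]

def observed_richness(focal, coords, species, radius=25.0):
    """Number of distinct species in the focal stem's neighborhood."""
    return len({species[i] for i in neighbors_within(focal, coords, radius)})

def null_richness(focal, coords, species, n_reps=1000, radius=25.0, rng=None):
    """Richness expected when species labels are shuffled among all other stems.

    Shuffling labels and reading those at the k neighbor positions is the same
    as drawing k labels without replacement from the non-focal stems.
    """
    rng = rng or random.Random()
    k = len(neighbors_within(focal, coords, radius))
    pool = [sp for i, sp in enumerate(species) if i != focal]
    return [len(set(rng.sample(pool, k))) for _ in range(n_reps)]

def lower_tail_p(observed, null_values):
    """Fraction of null replicates with richness at or below the observed value."""
    return sum(v <= observed for v in null_values) / len(null_values)

# Toy example: three stems close together and one far away.
coords = [(0, 0), (1, 0), (2, 0), (100, 100)]
species = ["a", "b", "b", "c"]
obs = observed_richness(0, coords, species, radius=5)
null = null_richness(0, coords, species, n_reps=1000, radius=5, rng=random.Random(0))
print(obs, lower_tail_p(obs, null))
```

A low lower-tail proportion indicates fewer species in the neighborhood than expected under random labeling, which is the signature of clumped species distributions.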

## **Results**

**FIGURE 2 | Species richness after 500 reproductive events, given different levels of sympatry (3, 5, and 10 congeneric species), fecundity (10, 50, and 250 offspring/generation), pollination success (0.01, 0.015, 0.02, 0.025, 0.1, 0.15, 0.25, and 0.50), selfing success (0, 0.1, 0.25, and 0.4), and hybridization success (0, 0.1, 0.25, and 0.4).** Each contour line represents 0.5 species and darker shades indicate lower species diversity. Contour lines in each graph are interpolated from selfing and hybridization success rates. See Methods and Supplemental Information for more details.

Given stochastic mortality among individuals, we find that hybridization success, self-fertilization success, degree of pollination limitation, and level of fecundity all have significant effects on species co-existence (**Figure 2**). Inter-specific hybridization had the most pervasive effect, even at relatively low probabilities of success, maintaining the original community diversity over a substantial number (500) of reproductive events, regardless of species diversity, pollination limitation, or fecundity. The effect becomes slightly stronger as diversity increases among the suite of genomic mutualists. The rate of hybridization success at which this effect becomes obvious (*∼*20%) is probably a substantial over-estimate, given the substantial under-estimate of individual fecundity in the model compared to the actual fecundity of trees. Tropical trees can produce massive numbers of ovules during each reproductive event. We would argue that the important factor is the absolute number of viable offspring produced and not the proportion. Given realistic levels of fecundity, we would suggest that vanishingly small levels of inter-fertility, probably below those measurable in a typical field study, would have a similar impact to those observed in this model.
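The argument about absolute numbers can be made concrete with a simple expectation. Suppose an individual produces $F$ ovules per reproductive event, a fraction $q$ of them receive heterospecific pollen, and each such cross yields a viable offspring with probability $h$. The symbol $q$ and the large-fecundity figures below are illustrative assumptions, not values from the study.

$$
\mathbb{E}[V_{\text{hybrid}}] = F \, q \, h
$$

With $q = 1$, a modeled fecundity of $F = 10$ and $h = 0.2$ gives $\mathbb{E}[V_{\text{hybrid}}] = 2$. A hypothetical tree with $F = 10^{5}$ ovules reaches the same expectation at $h = 2 \times 10^{-5}$, a success rate far too small to detect in a typical crossing experiment.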

When pollination limitation is strong (≤10%) and fecundity is moderate to high (*>*10 offspring), given our model, self-fertilization strongly promotes species co-existence, even at low levels of success (10%). Again, the same statement could be made here about the importance of the absolute number of viable offspring and not the proportion. A bisexual individual only needs to produce a single offspring to be "fit." Once pollination success improves, with more than 10% of the community acting as potential mates, the patterns become consistent and largely unresponsive to fecundity, selfing and hybridization success. On the other hand, when self-fertilization and hybridization are not allowed (x- and y-origin on all graphs), the communities in our model invariably lose all of their diversity through stochastic extinction of rare species and eventually consist of a single species.

Self-fertilization became the dominant mode of reproduction when pollination limitation is strong (red lines in leftmost graphs in each panel of **Figure 3**), as almost all of the remaining individuals are a product of selfing at the end of the simulation, even when the success of selfed crosses is only 10%. On the other hand, when pollen limitation is not a factor and individuals can cross with half of the meta-population, selfed individuals never occur in the community (absence of red lines in rightmost graphs in each panel of **Figure 3**). The rate of inter-specific hybridization does not respond strongly to pollination limitation when selfing is not allowed (blue lines in the top row of graphs in each panel of **Figure 3**), although it does seem to interact with selfing, as increased pollinator success leads to greater proportions of inter-specific hybridization relative to selfing (**Figure 3**). Finally, the proportion of outcrossed individuals gradually increases with decreasing pollination limitation through the simulation, with the proportion of hybrid offspring reaching a peak and slowly tapering off (**Figure 3**). This pattern indicates that while variable mating behaviors delay extinction of rare species, they do not prevent extinction.

In terms of possible costs to the common species acting as pollen donors, arising from direct competition with hybrid offspring, we found that direct competition among closely-related species rarely occurred within local neighborhoods in highly diverse forests simply because of the generally low densities of species. In the Pasoh forest, the most common tree species accounts for less than 3% of the stems, while species with median levels of abundance contribute less than 0.04% to the entire community. Within a local neighborhood of 25 m radius, a focal individual encounters an average of 197 individuals, composed of an average of 104 species. The observed local neighborhood species diversity is significantly lower than the mean species diversity predicted by a null spatial model (134 spp., *p <* 0.05), agreeing with previous reports of clumped species distribution patterns (Condit et al., 2000). Overall, the number of stems *>*1 cm DBH of both conspecifics and congeners represents a very small fraction of the local neighborhood, and focal individuals are more likely to encounter conspecific individuals than congeners (**Figure 4**), although as congeneric diversity increases, the encounter rate becomes roughly equivalent for conspecifics and congenerics. Many individuals in genera with 2–3 sympatric species almost never encounter congeneric individuals. Given that individuals almost never directly compete with congenerics, unidirectional inter-specific hybridization, from dominant to rare species, cannot cause significant genetic costs to the pollen donor.

## **Discussion**

When variable mating behaviors are allowed in a stochastic metacommunity model, species co-existence in diverse communities is greatly enhanced, even when fertility among species is low compared to out-crossing with conspecifics. In this neutral model, this result occurs primarily by delaying the local stochastic extinction of rare species. As their population size declines, the reproductive effort of rare species will be pushed toward hybridization and selfing because they no longer receive sufficient conspecific pollen. This shift in mating behavior not only effectively expands the local effective population size but it greatly increases their potential genetic diversification and allows them to capture adaptive genes present in locally dominant species. As the local population approaches extinction, variable mating behaviors become increasingly beneficial. Because of the limited direct interaction with congeneric individuals in the local neighborhood, the dominant species also incurs little cost to being a pollen donor to the rare species.

Seed dispersal limitation and clumped species distributions are a general characteristic of tropical tree communities (Ashton, 1969; Condit et al., 2000), which limits the recolonization of a community by rare species and creates spatially clumped "families". First of all, these characteristics should further increase the likelihood of stochastic local extinction in rare species. Furthermore, the offspring of a rare species, including hybrids, will form a spatially clumped grove. The proximity of these mixed offspring could effectively allow them to back-cross, increasing mate choice with family members and potentially re-establishing the local population (Baskett and Gomulkiewicz, 2011), returning the species to being predominantly out-crossed. It is possible that if the rare species did not recover locally, it would be swamped out by the common species. Given this model, rare species could also become a nexus of diversification as they begin to produce hybrid phenotypes, possibly transgressive or exadapted in a biotically and abiotically complex and dynamic environment (Givnish, 2010).

This behavior fits well with observations of inter-specific gene flow and the interaction of species in syngameons. We propose that a persistent but reduced capacity for variable mating behaviors among sympatric closely-related species would create a stable genomic mutualism (**Figure 5**). Adapted from Wu's (2001) view of genic speciation, these genomic mutualisms represent a balance between purifying selection within a species for a particular phenotype and diversifying selection among species for novel phenotypes. Gene flow among these genomic mutualists would be relatively infrequent, highly variable in spatial scale, primarily unidirectional, and episodic, caused by either stochastic or deterministic population decline of one species in the local community. During these episodes, individuals of an increasingly rare species will primarily receive either hetero-specific or self pollen, greatly increasing the probability and advantage of producing offspring through these alternative reproductive pathways. As would be expected, selfing primarily benefits individuals when pollination limitation is extreme. Even if the average viability of selfed or hybrid offspring is low, a small proportion of individual offspring might have equal or even greater fitness than the mother tree, particularly if environmental factors are changing and creating novel habitats (Donovan et al., 2010). During these periods of local population decline, inter-specific hybridization can also provide selective advantages to rare species by allowing them to capture advantageous alleles and traits from successful species (Fitzpatrick et al., 2010). If a population is crashing due to deterministic processes, instead of stochastic reasons, such as susceptibility to fungal infection, then capturing genetic variation and alleles from a dominant and resistant genomic mutualist will be advantageous (Barton, 2001).

Diverse tree communities, exemplified by those in tropical Southeast Asia, exist in highly dynamic biogeographic, climatic, and ecological landscapes (Cannon et al., 2009; Woodruff, 2010; Slik et al., 2011; Raes et al., 2014), punctuated by brief periods of rapid change and strong selection that may dominate evolution (Gutschick and BassiriRad, 2003). High biological diversity itself lends a significant element of ecological complexity to the community, as the local composition of predators, herbivores, pollinators, and other interacting species at different trophic levels is variable spatially and temporally. Some studies indicate that known hybrid zones promote biodiversity in other parts of the community (Whitham et al., 1994; Adams et al., 2011). Ultimately, given the complexity of these communities, both in their current setting and in their historical dynamics, a high element of uncertainty exists in the reproductive success of an individual genotype, particularly given the general pollination limitation in these highly diverse systems (Alonso et al., 2010), which limits mate choice. In this initial formulation, our model focuses on tropical tree communities. Many of the properties of the model may also apply to other organisms living in species-rich communities such as coral reefs, where most species are sympatric with closely related and partially inter-fertile species.

## **Chronically Rare**

In highly diverse communities, local rarity of the majority of species is a pervasive feature (Connell, 1978; Kochummen and LaFrankie, 1990; Pitman et al., 2001; Cannon and Leighton, 2004). One simple result of the model is clear: without variable mating behaviors, meta-communities always become dominated by a single species through stochastic population dynamics. Mitigating processes *must* exist that allow rare species to escape extinction. As mentioned above, previous ecological models have identified deterministic, largely density-dependent, factors that help maintain community diversity. Our results indicate that additional evolutionary factors can play a role as well.

Little is known about the stability of population size of species in highly diverse communities through long periods of time, but given millennial dynamics, common species in today's forests have probably been rare in the past, if not across their entire distribution. Local rarity, where an individual of a species finds itself in a community devoid of conspecifics, must occur in the history of almost all species. Most tree species can be frequently found growing outside their "preferred" habitat. By hindcasting current species distribution models of *>*300 Dipterocarpaceae species on the geographic and climatic conditions at the last glacial maximum, Raes et al. (2014) found that while most species could persist during the glacial period, their predicted historical abundances were substantially different from current abundances for many species. These millennial dynamics of climate and community change obviously have played a major role in the evolutionary history of these forests. The benefits of maintaining some level of inter-specific fertility have probably affected most species in these communities, as no one species truly dominates or remains common over millennial periods of time and across its entire distribution.

Our results potentially have implications for the general understanding of species co-existence and the management of rare species in highly diverse communities. Previous work indicates that the processes of competitive exclusion in diverse communities are inefficient and protracted (Hubbell, 2006). The assumption that reproductive isolation between different species ultimately provides a long-term selective advantage to individuals, the basic premise of the Biological Species concept, has been demonstrated in simple scenarios that assume consistent selection pressures in relation to life history strategy, but this basic assumption has *never* been proven for long-lived species in highly diverse communities dominated by high degrees of uncertainty in the selective environment. We argue that while a unique and complex suite of phenotypic traits may provide an "instantaneous" competitive advantage in a particular ecological community (Davies et al., 1998), the advantage gained by a particular phenotype over long periods of time is unpredictable, and what was advantageous can frequently become disadvantageous. Instead, species identity, as determined by its genealogy, may be more fluid and dynamic through time and space, particularly as the rate of change and spatial heterogeneity in the environment increase in relation to the demographic turn-over in the community, and especially if the suite of phenotypic traits is not tightly linked genetically with traits that also promote assortative mating among phenotypes. Our model applies primarily to species in highly diverse communities, where numerous suites of genomic mutualists are embedded within a much larger and diverse community and each population contributes a relatively small proportion to the entire community. Below a certain level of diversity in the community, these mechanisms likely play a minor role, affecting only a small number of species.

A striking aspect of tropical forest diversity is our limited taxonomic knowledge of many major families (see http://floramalesiana.org/ for a list of families which have never been treated systematically in the Flora Malesiana project, which was initiated over 60 years ago). Many groups have defied strict taxonomic treatment, like the Myrtaceae family, despite the best efforts of dedicated taxonomists. Additionally, diverse groups display a wide range of ecological and evolutionary characteristics. Examples of major tree genera in Southeast Asian forests include: (1) *Litsea*, in the laurel family, with small, frequently unisexual, and primitive flowers producing largely bird-dispersed fruits; (2) *Aglaia* or *Dysoxylum*, both in the Meliaceae family, which produce profuse displays of minute flowers and produce a wide variety of fruit types; (3) *Ficus*, in the Moraceae, with its highly specialized inflorescence and obligate symbiosis with its pollinating wasps; and (4) *Diospyros*, in the Ebenaceae, in which individuals are mostly unisexual (species are dioecious), flowers are large, open, and showy, and most trees exist in the understory. Few traits seem to link these diverse plant groups except substantial potential for hybridization in mixed communities, where a large fraction of the species are rare (Ashton, 1988a), live sympatrically with congeneric species (LaFrankie et al., 1995; Cannon and Leighton, 2004), and mate choice is frequently limited (Ashman et al., 2004; Vamosi et al., 2006). We would suggest that reticulate evolution is a viable explanation for the challenges of species description and identification.

Most detailed pollination studies in these forests also indicate that generalist pollinators are dominant (Momose et al., 1998; Sharma and Shivanna, 2011). Even within genera that show clearly divergent floral morphologies, hybridization still occurs (Zjhra, 2008). Flowering time has been shown to be slightly staggered among sympatric *Shorea* species (Ashton, 1988b), but considerable overlap in floral receptivity exists (Kettle et al., 2011). In less diverse communities, such as temperate forests, successful species truly dominate the community and can gain an advantage through reproductive isolation to prevent the donation of pollen or advantageous alleles to rare species. Examples of persistent inter-specific hybridization in such communities, e.g., oaks, may reflect the influences of other, possibly phylogenetic or ecological factors (Muller, 1952; McCauley et al., 2012), and may play a smaller role in the maintenance of community diversity (Howard et al., 1998; de Casas et al., 2007).

The results of our analysis strongly suggest that variable mating behavior may be an important force in maintaining community level species diversity, primarily by lowering the vulnerability of rare species to stochastic extinction. This behavior also has major implications for the management of rare and endangered species. Given our model, gene flow among genomic mutualists, to both expand population size and to increase genetic variance in the offspring, may be the most effective mechanism for preventing their local extinction and for enhancing their ability to adapt to novel environmental conditions. While little is known about the possible impact of such gene flow, future environmental conditions in the tropics will be largely unprecedented in the Quaternary Period (Corlett, 2012), with global warming pushing climates into the warmest conditions seen since the Eocene. Given global climate and land use change patterns, current habitats are unlikely to persist even in the near future, making hybrid and novel phenotypes important to the persistence of local endemics and rare endangered species (Guo, 2014).

Ultimately, above and beyond the model presented here, we feel that variable mating behavior may play a major role in tropical diversification as well. To satisfy the standard allopatric or parapatric model of speciation, an elaborate series of biogeographic events must be devised to explain how all of these species would have become subdivided and spatially isolated, or at least parapatric, for evolutionarily significant periods of time. While sympatric speciation seems possible under certain circumstances, the theoretical requirements are stringent (Gavrilets and Vose, 2007; Fitzpatrick et al., 2009) and probably not generally found in highly diverse communities, particularly for long-lived organisms with multiple overlapping generations such as rainforest trees and coral reefs. In these organisms, the demographic integration of ecological and evolutionary dynamics can extend over centuries (Laurance et al., 2004; Petit and Hampe, 2006; Issartel and Coiffard, 2011). For the ecological speciation model to achieve complete reproductive isolation in the tropical setting, the selective forces would have to be quite strong, because the probability of inter-specific gene flow is quite high and the barriers to gene flow would have to evolve rapidly, during the unusual phases when a species is found in isolation. We do not refute that strong selection pressures can cause divergence among sympatric and closely-related taxa (Fine et al., 2013; Misiewicz and Fine, 2014) but rather question whether this process will frequently lead to complete reproductive isolation and whether this fixed endpoint is actually advantageous. A non-zero level of inter-fertility may play a substantial role over evolutionarily significant periods of time.

Our results suggest that inter-specific introgression may play a novel role, by linking two or more lineages through a genomic mutualism during periods of community change or fluctuating selection pressures. When conditions are stable and the forces driving ecological speciation are consistent, hybrid offspring would be disadvantageous and reproductive isolation mechanisms would strengthen, but when conditions change, even on the scale of hundreds of years, these hybrids can play a critical role in allowing rare or less fit species to capture advantageous alleles through backcrossing and introgression. Additionally, critically rare species, on the verge of local extinction, may play a considerable role in diversification by generating transgressive phenotypes through the increased rate of hybrid crosses attempted (Givnish, 2010). Finally, no evolutionary theory can adequately explain the general process of diversification in tropical tree communities, given their evolutionary and ecological setting, where sympatric speciation appears to be the rule, not the exception. Variable mating strategies and genomic mutualisms may provide an evolutionary mechanism for tropical diversification, not only for its maintenance.

## **Acknowledgments**

CC received funding from the Yunnan Province Science and Technology Talent Project (#O9SK051B01) and Xishuangbanna Tropical Botanical Garden of the Chinese Academy of Sciences. MTL received funding from a Senior Visiting Professor Fellowship from the Chinese Academy of Sciences (#2060299). We thank S. Levin and B. Blackman for comments on earlier versions. We also thank C. O. Webb, J. W. F. Slik, S. Rice, and R. Dorit for helpful conversations.

## **References**

Anderson, E. (1953). Introgressive hybridization. *Biol. Rev.* 28, 280–307.


evolutionary causes and consequences. *Ecology* 85, 2408–2421. doi: 10.1890/03-8024


the four corners region. *West. N. Am. Nat.* 72, 296–310. doi: 10.3398/064.072.0304


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Cannon and Lerdau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## On the processes generating latitudinal richness gradients: identifying diagnostic patterns and predictions

## *Allen H. Hurlbert<sup>1,2</sup>\* and James C. Stegen<sup>3</sup>*

<sup>1</sup> Department of Biology, University of North Carolina, Chapel Hill, NC, USA

<sup>2</sup> Curriculum for the Environment and Ecology, University of North Carolina, Chapel Hill, NC, USA

<sup>3</sup> Pacific Northwest National Laboratory, Richland, WA, USA

#### *Edited by:*

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

David Vieites, Spanish Research Council, Spain

Matthew R. E. Symonds, Deakin University, Australia

#### *\*Correspondence:*

Allen H. Hurlbert, Department of Biology, University of North Carolina, Chapel Hill, NC 27599-3280, USA

e-mail: hurlbert@bio.unc.edu

We use a simulation model to examine four of the most common hypotheses for the latitudinal richness gradient and identify patterns that might be diagnostic of those four hypotheses. The hypotheses examined are as follows. (1) The tropical niche conservatism hypothesis is the idea that the tropics are more diverse because a tropical clade origin has allowed more time for diversification in the tropics and has resulted in few species adapted to extratropical climates. (2) The ecological limits hypothesis suggests that species richness is limited by the amount of biologically available energy in a region. (3) The speciation rates hypothesis suggests that the latitudinal gradient arises from a gradient in speciation rates. (4) Finally, the tropical stability hypothesis argues that climatic fluctuations and glacial cycles in extratropical regions have led to greater extinction rates and less opportunity for specialization relative to the tropics. We found that tropical niche conservatism can be distinguished from the other three scenarios by phylogenies which are more balanced than expected, no relationship between mean root distance (MRD) and richness across regions, and a homogeneous rate of speciation across clades and through time. The energy gradient, speciation gradient, and disturbance gradient scenarios all produced phylogenies which were more imbalanced than expected, showed a negative relationship between MRD and richness, and diversity-dependence of speciation rate estimates through time. We found that the relationship between speciation rates and latitude could distinguish among these three scenarios, with no relation expected under the ecological limits hypothesis, a negative relationship expected under the speciation rates hypothesis, and a positive relationship expected under the tropical stability hypothesis. We emphasize the importance of considering multiple hypotheses and focusing on diagnostic predictions instead of predictions that are consistent with multiple hypotheses.

**Keywords: biodiversity, disturbance, diversification, latitudinal gradient, simulation, speciation rate, species richness, zero sum**

"The reason every one of you is telling it differently is because each one of you touched a different part of the elephant. So, actually the elephant has all the features you mentioned."

—"Elephant and the blind men," Jain Stories, JainWorld.com

### **INTRODUCTION**

While biologists have generated a variety of hypotheses to explain the teeming diversity of life in the tropics relative to more temperate regions, there has been less progress in ruling out or agreeing on the primary processes responsible for biodiversity patterns. Because such patterns result from a mixture of ecological and evolutionary processes playing out over space and time, and because these patterns may be assessed at multiple spatial and temporal scales and for radically different taxonomic groups, it is unsurprising that different investigators have emphasized the importance of different processes. In some respects, biodiversity patterns and their study resemble the allegorical elephant that is examined by multiple blind men, each coming to radically different conclusions about the nature of their study subject. Independent investigations into different aspects of biodiversity patterns have resulted in conflicting conclusions about underlying processes.

There are two primary ways in which biodiversity research resembles efforts of the apocryphal blind men. First, the narrow focus of most studies on a single process, pattern, region, or taxonomic group undermines our ability to properly evaluate and distinguish among hypotheses and precludes understanding of how processes change across systems—the blind man who attempts to characterize an elephant based entirely on feeling the tail is doomed to fail. Although there have been several promising efforts toward broader integration (e.g., Hillebrand, 2004; McGill, 2010; Jansson et al., 2013), ecologists and evolutionary biologists would more rapidly advance biodiversity science by simultaneously evaluating multiple hypotheses that make predictions about multiple patterns associated with biodiversity gradients. After all, if the elephant represents the truth about how biodiversity gradients originate and are maintained, then an appreciation for that truth can only be obtained by recognizing that it must simultaneously explain the tail, ear, legs, body, trunk, and tusks.

Second, many studies make inferences based on observations that might be consistent with a particular hypothesis, but that are neither unique to nor sufficient for that hypothesis. One blind man based his conclusion that the elephant was a rope on the fact that he felt something long, narrow, and frayed at the end. Had he started with a set of *a priori* hypotheses and their associated predictions, it would have been clear that multiple hypotheses are consistent with such a narrow set of observations. Informed by *a priori* predictions, the blind man would have characterized a different set of features that would collectively distinguish among his hypotheses. In a biodiversity context, the observation that species richness is positively correlated with net primary productivity is consistent with an argument based on energetic or ecological limits (e.g., Wright, 1983), but is also consistent with explanations based on other causal variables that might correlate with productivity (e.g., rates of speciation, time in a region, degree of similarity to ancestral environments). To distinguish among a set of hypotheses one must test predictions that are diagnostic; testing many necessary but non-diagnostic predictions is necessary but not sufficient.

Our broad goal here is to continue developing an approach that facilitates the integration of simulation models with empirical data to enable the advancement of biodiversity science into the paradigm of multi-hypothesis/multi-prediction evaluation (e.g., Stegen and Hurlbert, 2011; Stegen et al., 2012; Hurlbert and Stegen, 2014). We specifically develop multi-pattern predictions from four hypotheses that have been proposed to explain patterns of species richness. The tropical niche conservatism hypothesis suggests that tropical regions have more species because descendants of a tropical ancestor will all tend to have tropical environmental tolerances, and diversity outside the tropics is hence constrained by both successful colonization and limited time for diversification (Wiens et al., 2010; Romdal et al., 2013). The ecological limits hypothesis ("energy gradient") suggests that tropical regions are most diverse because of a greater energetic capacity to support viable species populations (Wright, 1983; Hurlbert and Stegen, 2014). The evolutionary rates hypothesis ("speciation gradient") suggests that evolutionary rates are faster in the tropics whether because of the kinetic effects of temperature (Rohde, 1992; Allen and Gillooly, 2006), the increased importance of biotic interactions (Schemske, 2009), or increased area (Rosenzweig, 1995). Finally, the tropical stability hypothesis ("disturbance gradient") suggests that tropical environments have been less susceptible to major disturbances such as repeated glaciations that have led to higher extinction rates in temperate to polar latitudes (Brown and Lomolino, 1998; Weir and Schluter, 2007).

To generate multi-pattern predictions associated with the above hypotheses, we build on a previous spatial simulation model of diversification and dispersal across a broad environmental gradient (Hurlbert and Stegen, 2014). By incorporating into the model different assumptions that correspond to the above hypotheses we evaluate the ability of multiple predicted patterns to diagnose underlying process.

While the simulated scenarios differ in critical ways, they all share several key components that we view as likely features of any realistic diversification process across a broad gradient. These include (1) niche conservatism, where descendants have traits that are similar to their ancestors with respect to environmental tolerances (Wiens and Graham, 2005); (2) environmental filtering (or "selection," Vellend, 2010), where species with traits poorly suited to their environment will have reduced performance and lower population sizes compared to species with traits well-suited to their environment; and (3) stochastic extinction that occurs with a probability that declines exponentially with increasing population size (Lande et al., 2003). While we examine one scenario in which there are no energetic limits and individuals and species may accumulate exponentially and indefinitely through time (the "pure niche conservatism" scenario), the rest of the simulations impose a zero sum energetic constraint such that increases in abundance or biomass by one species must be offset by a collective decrease across all other species (Hurlbert and Stegen, 2014).

As a starting point for diagnosing the diversification scenario underlying a given dataset, we examined secondary biodiversity patterns and metrics highlighted in Hurlbert and Stegen (2014). However, the consideration of additional diversification scenarios here requires additional patterns to differentiate them. As such, we used a recently developed analysis framework (Rabosky, 2014) for characterizing diversification rates through time and across the simulated phylogenies emerging from our four scenarios. We present model outputs as a collection of patterns that appear to differentiate among diversification scenarios associated with the tropical niche conservatism, ecological limits, evolutionary rates, and tropical stability hypotheses.

### **MATERIALS AND METHODS**

We utilize a simulation model of diversification and dispersal along a one-dimensional spatial gradient spanning 10 adjacent regions from the warm tropics to the cooler temperate zone as described in Hurlbert and Stegen (2014). Briefly, a simulation begins with a single species originating within a region at either the tropical or temperate end of the gradient, which achieves a regional population size that is determined in part by the match between the regional environment and the species' intrinsic environmental optimum. Each species has a fixed per-individual probability of spawning a daughter species which inherits the environmental optimum of its parent with some small amount of variation, reflecting strong niche conservatism. Each species also has a fixed per-individual probability of dispersing to adjacent regions. Members of the diversifying clade are envisioned to use a common pool of limiting resources such that a zero sum energy constraint is a reasonable assumption (Hurlbert and Stegen, 2014). Each region can therefore support some maximum number of individuals summed across species. As the number of species increases in a region, the average population size decreases. Finally, extinction occurs stochastically with a per species probability that is a negative exponential function of population size.
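The regional dynamics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published implementation: the Gaussian form of the environmental match, the parameters `sigma` and `gamma`, the proportional allocation rule, and all numeric values are hypothetical choices made for the example.

```python
import math

def environmental_match(env, optimum, sigma=1.0):
    """Assumed Gaussian performance curve: 1.0 at the optimum, lower with mismatch."""
    return math.exp(-((env - optimum) ** 2) / (2.0 * sigma ** 2))

def zero_sum_abundances(env, optima, capacity, sigma=1.0):
    """Divide a region's fixed capacity among species in proportion to their match."""
    matches = [environmental_match(env, opt, sigma) for opt in optima]
    total = sum(matches)
    return [capacity * m / total for m in matches]

def extinction_probability(pop_size, gamma=0.1):
    """Negative exponential function of population size; gamma is hypothetical."""
    return math.exp(-gamma * pop_size)

# Hypothetical region with environment 25.0 and capacity 4,000 individuals.
few = zero_sum_abundances(25.0, [25.0, 24.0], capacity=4000)
many = zero_sum_abundances(25.0, [25.0, 24.0, 23.0, 26.0], capacity=4000)
```

Because the regional total is fixed, adding species to the region (`many` versus `few`) lowers the average population size, which in turn raises each species' extinction probability.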

**Table 1 | Parameter values used in the four simulation scenarios presented in this analysis.**

Values that are listed as a range refer to the values at the temperate and tropical ends of the spatial gradient, respectively, and all gradients were linear.

We examined three different zero-sum scenarios that have the potential to influence latitudinal gradients in species richness (**Table 1**). In each scenario, one key parameter varied across the spatial gradient, and all three scenarios were able to support the same global number of individuals. Under the "energy gradient" scenario, the total number of individuals that could be supported in a region increased linearly from 4,000 in the temperate zone to 40,000 in the tropics. The 10-fold increase in carrying capacity across the gradient is similar to the roughly 10-fold increase in net primary productivity across the latitudinal gradient at coarse resolutions (Kucharik et al., 2000). Under the "speciation gradient" scenario, the per-capita probability of speciation increased linearly from 3 × 10<sup>−7</sup> in the temperate zone to 3 × 10<sup>−6</sup> in the tropics. This 10-fold gradient in speciation rate is at the upper end of estimates for how speciation or net diversification rates might vary across latitudes (Allen et al., 2006; Rolland et al., 2014). Under the "disturbance gradient" scenario, disturbance events occurred with a regular frequency leading to extinctions, but the magnitude of disturbance events increased linearly from 75% of all individuals being killed per event in the tropics to 99% in the temperate zone. We chose a relatively high value of disturbance for tropical regions because it is increasingly recognized that tropical regions have been impacted by global climate fluctuations (Colinvaux et al., 1997). Nevertheless, in our disturbance gradient scenario, such impacts are still less than would be expected by physical displacement of species by ice sheets at high latitudes. All zero-sum simulations were run for 100,000 time steps, or approximately five times longer than it took for equilibrial richness gradients to emerge, and 10 replicate simulations were run for each scenario.
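The three linear gradients can be written out explicitly for the 10 regions. The sketch below uses the endpoint values quoted in the text; that those endpoints apply exactly to the first and last regions, and the variable and function names, are assumptions made for illustration.

```python
N_REGIONS = 10  # regions ordered from temperate (index 0) to tropical (index 9)

def linear_gradient(temperate_value, tropical_value, n_regions=N_REGIONS):
    """Return evenly spaced values from the temperate end to the tropical end."""
    step = (tropical_value - temperate_value) / (n_regions - 1)
    return [temperate_value + i * step for i in range(n_regions)]

# Energy gradient: regional carrying capacity (individuals)
carrying_capacity = linear_gradient(4_000, 40_000)

# Speciation gradient: per-capita probability of speciation
speciation_prob = linear_gradient(3e-7, 3e-6)

# Disturbance gradient: fraction of individuals killed per disturbance event
disturbance_mortality = linear_gradient(0.99, 0.75)
```

Only one of these gradients is active in a given scenario; the other parameters are held constant across regions in that scenario.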

For comparison, we also studied a scenario in which the zero-sum constraint was removed. In this case there are no energetic limits and as a consequence there is exponential and indefinite accumulation of individuals and species (Hurlbert and Stegen, 2014). We refer to this as the "pure niche conservatism" scenario, and associated simulations were stopped once total extant richness exceeded 10,000 species (typically a few hundred time steps) for reasons of computational efficiency. Code for running simulations is provided in our online GitHub repository (http://github.com/ahhurlbert/species-energy-simulation).

#### **SIMULATION METRICS AND ANALYSIS**

We examined several simulation metrics that were previously found (Hurlbert and Stegen, 2014) to be helpful in diagnosing processes underlying species richness gradients. These include the correlation between latitude and regional species richness, the correlation between the length of time a clade has been in a region (estimated from its extant members) and regional richness, a measure of phylogenetic tree imbalance or asymmetry (β, Blum and François, 2006), and the slope of the relationship between the scaled mean root distance (MRD) of species in a region and regional richness (Hurlbert and Stegen, 2014).
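As an illustration of the MRD metric, the sketch below computes root distance as the number of nodes on the path from a tip back to the root and averages it over the species in a region. The toy tree, the parent-pointer representation, and the choice to scale by the tree-wide maximum root distance are assumptions for this example; the scaling used in the original analysis may differ.

```python
def root_distance(tip, parent):
    """Count the nodes (including the root) on the path from a tip to the root."""
    distance = 0
    node = tip
    while parent[node] is not None:
        node = parent[node]
        distance += 1
    return distance

def scaled_mrd(tips_in_region, parent, all_tips):
    """Mean root distance of a region's species, scaled by the tree-wide maximum."""
    max_rd = max(root_distance(t, parent) for t in all_tips)
    mean_rd = sum(root_distance(t, parent) for t in tips_in_region) / len(tips_in_region)
    return mean_rd / max_rd

# Hypothetical toy tree: root -> (A, n1); n1 -> (B, n2); n2 -> (C, D)
parent = {"root": None, "A": "root", "n1": "root",
          "B": "n1", "n2": "n1", "C": "n2", "D": "n2"}
tips = ["A", "B", "C", "D"]

derived_region = scaled_mrd(["C", "D"], parent, tips)  # species far from the root
basal_region = scaled_mrd(["A", "B"], parent, tips)    # species close to the root
```

A region dominated by recently derived species (here C and D) has a higher scaled MRD than one dominated by basal species (A and B).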

We also used Bayesian analysis of macroevolutionary mixtures (BAMM, Rabosky, 2014) version 1.0 to analyze the tempo of speciation dynamics across the phylogeny of extant species generated under each scenario. We focused on simulations with an ancestral species in the tropics to be consistent with a tropical region of origin that appears to be common for most large clades (Jablonski et al., 2006; Jansson et al., 2013). For zero-sum simulations, we analyzed the extant phylogeny after 30,000 time steps, after equilibrial richness patterns had been achieved. An advantage of the BAMM approach is that it allows for the estimation of heterogeneous rates across the tree, as might be expected if subclades develop key innovations or colonize previously unoccupied regions. In addition, the method allows for a mixture of diversity-dependent and diversity-independent dynamics across the tree (Rabosky, 2014).

We ran BAMM analyses using phylogeny-specific priors suggested by the setBAMMpriors function in the R package BAMMtools (Rabosky et al., 2014), running 2–10 million generations of reversible jump Markov Chain Monte Carlo sampling depending on the scenario, and discarding the first 20% of generations as a burn-in period. Convergence was assessed by comparing 3–5 BAMM runs for each simulated scenario, and re-running BAMM for longer if runs appeared not to have converged. These analyses result in the estimation of marginal probabilities of speciation for each branch in the phylogenetic tree, including an estimate of instantaneous speciation rate at the tips (Rabosky et al., 2014).
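Discarding a burn-in fraction amounts to dropping the leading portion of the sampled chain. The snippet below is generic Python and not part of BAMM or BAMMtools; the function name and the toy chain are hypothetical.

```python
def discard_burn_in(samples, burn_in_fraction=0.20):
    """Drop the first fraction of an MCMC chain and keep the remaining samples."""
    n_discard = int(len(samples) * burn_in_fraction)
    return samples[n_discard:]

# Hypothetical chain of 10 sampled generations, labeled 0..9
chain = list(range(10))
kept = discard_burn_in(chain)
```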

### **RESULTS**

The absence of a zero sum constraint as implemented in the pure niche conservatism scenario left a clear signature in the dynamics of diversification (**Figure 1A**). Clades originating in the tropics developed a strong classical latitudinal gradient (*r* = −1.0), while clades originating in the temperate-most region developed a reverse gradient (*r* = +1.0; **Figure 1A**). As predicted by the niche conservatism hypothesis, independent of region of origin, a strong time-for-speciation effect emerged with the most species in regions that had been occupied the longest (**Figure 1A**). In contrast to all three zero sum scenarios, the MRD-richness slope for the pure niche conservatism scenario was always close to 0 (**Figure 1A**), while β was typically positive, indicating a slightly more balanced phylogeny than expected from random (**Figure 1A**).

#### **FIGURE 1 |** [Caption incomplete in source.] ... coefficient between latitude and richness, Pearson's correlation coefficient ... imbalance. Simulations of tropical origin in red, and simulations of temperate origin in blue.

The three zero sum scenarios examined (energy gradient, speciation gradient, and disturbance gradient) all shared several features in their simulation output. First, all three scenarios resulted in classical latitudinal gradients regardless of the ancestral region of origin (**Figures 1B–D**). Reverse gradients in richness existed briefly under a temperate ancestral origin, but flipped to traditional gradients within 15–20 thousand time steps. All three scenarios exhibited non-zero MRD-richness slopes, with simulations of a temperate origin yielding positive values and simulations of a tropical origin yielding negative values (**Figures 1B–D**). In addition, all three scenarios resulted in imbalanced phylogenetic trees with negative β, although β appeared to become less negative through time (**Figures 1B–D**).

The time-richness relationship and its dependence on ancestral region of origin was one metric that differed among the three zero sum scenarios. Under the disturbance scenario, a strong, positive time-richness relationship emerged regardless of region of origin (**Figure 1D**). A strong, positive time-richness relationship emerged under the energy gradient scenario, but only for simulations of tropical origin. Under a temperate origin, an initial positive time-richness relationship quickly flipped to be negative as species accumulated in the high energy regions that were colonized last (**Figure 1B**). Later in the simulation as old species in the ancestral region went extinct and were replaced by younger taxa, the strength of the time-richness correlation became weaker. Finally, under a speciation gradient, initially positive time-richness relationships for both regions of origin shifted through time to become negative (**Figure 1C**). This occurred because one result of high speciation rates under a zero sum constraint is high species turnover, resulting eventually in the loss of old basal species and the accumulation of relatively young species in the tropics.

BAMM analyses revealed very distinct patterns of diversification across the four scenarios (**Figure 2**). Under pure niche conservatism, the best fit model to the phylogeny was one involving a homogeneous process of near-constant per-lineage speciation rates (**Figure 2A**). In contrast, the three zero sum scenarios led to slowdowns in the estimated speciation rate from root to tips of 60–90%, and could be differentiated from each other due to varying degrees of rate heterogeneity within their respective phylogenies (**Figures 2B–D**).

For all three zero sum scenarios, the most basal split led to two lineages which continued to diversify in the tropical region of origin, but only one of which eventually colonized the temperate end of the gradient. Rate shifts identified in BAMM analyses coincided with the colonization of novel, more temperate parts of the gradient. In the speciation gradient scenario, subclades colonizing the temperate-most regions with the lowest per-individual probabilities of speciation predictably resulted in depressed per-lineage speciation rates (**Figure 2C**). In contrast, the disturbance gradient scenario resulted in higher per-lineage speciation rates in temperate regions as disturbance-caused extinctions provided continued opportunities for diversification relative to the more stable tropics (**Figure 2D**).

#### **FIGURE 2 | Output from Bayesian analysis of macroevolutionary mixtures (BAMM) analysis of four distinct diversification scenarios. (A)** Pure niche conservatism, **(B)** energy gradient, **(C)** speciation gradient, and **(D)** disturbance gradient. Top row, extant phylogenies at t = 200 time steps **(A)**, or t = 30,000 time steps **(B–D)**. Branches are color coded by instantaneous estimates of speciation rate, from low (blue) to high (red). Color labels at the tips reflect the geographic region of maximum abundance for each species, from temperate regions (purple) to tropical regions (green). Bottom row, estimated median speciation rate over the time course of diversification under each scenario.

Averaging tip-specific BAMM estimates of speciation rate across species within regions more directly illustrated these findings. Under the speciation gradient scenario, these rates decreased with latitude, under the disturbance scenario they increased with latitude, and for both the energy gradient and pure niche conservatism scenarios, speciation rate appeared to be independent of latitude (**Figure 3**). This figure also highlights one limitation with our simulations, namely that under the pure niche conservatism scenario, the number of species quickly increased to levels that became computationally intractable to deal with, and hence we were forced to end the simulation prior to colonization of the temperate-most regions. Nevertheless, the trajectory of our metrics over the course of the simulation suggests that basic patterns of diversification and phylogenetic shape would remain unchanged for longer runs that spanned more of the gradient (**Figure 1A**).
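The regional averaging of tip rates and its relationship with latitude can be illustrated as follows. This is a sketch under simplifying assumptions: each species is assigned to a single region, region labels stand in for latitude (1 = most tropical), and the rate values are made up for the example rather than taken from any BAMM output.

```python
from collections import defaultdict
from statistics import mean

def mean_rate_by_region(tip_rates, tip_regions):
    """Average tip speciation rates over the species assigned to each region."""
    by_region = defaultdict(list)
    for species, rate in tip_rates.items():
        by_region[tip_regions[species]].append(rate)
    return {region: mean(rates) for region, rates in sorted(by_region.items())}

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical tip rates and region assignments (1 = most tropical, 3 = most temperate)
tip_rates = {"sp1": 0.9, "sp2": 1.1, "sp3": 0.6, "sp4": 0.2, "sp5": 0.4}
tip_regions = {"sp1": 1, "sp2": 1, "sp3": 2, "sp4": 3, "sp5": 3}

regional = mean_rate_by_region(tip_rates, tip_regions)
r = pearson_r(list(regional.keys()), list(regional.values()))
```

With these toy values the regional mean rate declines toward the temperate end, so `r` is negative, the direction reported for the speciation gradient scenario; a positive `r` would match the disturbance gradient scenario, and a value near zero the energy gradient and pure niche conservatism scenarios.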

### **DISCUSSION**

Disentangling the relative support for multiple biodiversity hypotheses is challenging because similar patterns may emerge from very different underlying processes. Here, we add to the findings of Hurlbert and Stegen (2014) by identifying several aspects of phylogenetic structure that distinguish a non-zero sum diversification scenario from a set of zero sum scenarios, and further identify speciation rate patterns that distinguish among those zero sum scenarios.

#### **DIAGNOSTIC VERSUS NON-DIAGNOSTIC BIODIVERSITY PATTERNS**

Diversification in the absence of a zero sum constraint—as in the pure niche conservatism scenario—leads to balanced phylogenetic trees with no strong relationship between the MRD in a region and the number of species in that region. In addition, under niche conservatism alone, BAMM analyses show that speciation rate is consistent through time and across clades. This is in contrast to the zero sum scenarios we investigated, which all resulted in imbalanced trees (see also Davies et al., 2011), clear relationships between MRD and richness, decelerating rates of speciation through time, and substantial rate heterogeneity.

Many of the patterns that have been suggested to provide evidence for a non-zero sum, pure niche conservatism explanation of diversity gradients also emerged from the zero sum scenarios. For example, the "time-for-speciation effect" (Stephens and Wiens, 2003; Wiens, 2011), or the relationship between time in a region and number of species in that region, is held as a hallmark for purely historical processes. However, we found a positive time-richness relationship under the disturbance gradient scenario and the energy gradient scenario with a tropical origin. This makes the time-richness relationship a necessary but insufficient test of the tropical conservatism hypothesis: it is unable to rule out alternative hypotheses.

Two additional patterns, beyond time-richness relationships, that have been suggested from verbal arguments to reflect pure niche conservatism did not emerge under our pure niche conservatism scenario. First is the prediction that climate-richness (and hence latitude-richness) relationships should be more variable among subclades under pure niche conservatism, and more similar under ecological constraints (Buckley et al., 2010). In earlier work comparing the same pure niche conservatism and energy gradient scenarios considered here, we showed that in fact the opposite should be expected (Figure 3 in Hurlbert and Stegen, 2014). For small subclades the assumption of a zero sum constraint is more likely to be violated, and hence diversity patterns will not necessarily track aggregate estimates of environmental variation such as net primary productivity (Hurlbert and Stegen, 2014). Second, Hawkins et al. (2006) and Hawkins (2010) have suggested that a higher MRD in temperate regions of low richness (and thus a negative correlation between MRD and richness) was predicted by a pure niche conservatism scenario. Here and in Hurlbert and Stegen (2014), we found the opposite to be true; under pure niche conservatism the MRD-richness slope is expected to be close to 0, while non-zero slopes are expected only if niche conservatism operates alongside a zero sum constraint. Significant correlations between MRD and richness do not, therefore, support the hypothesis that niche conservatism alone is responsible for richness gradients (Algar et al., 2009), and in fact appear to reject that hypothesis.
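As an illustration of how an MRD-richness relationship could be quantified, the following minimal Python sketch computes the mean root distance of the species in each region and correlates it with regional richness. It assumes that root distance is the number of nodes separating a tip from the root and that MRD is the mean of those distances across the species present in a region; the function names, species labels, and toy values are hypothetical and are not taken from the simulations.

```python
from statistics import mean


def mrd_by_region(root_distance, occurrences):
    """Mean root distance (MRD) of the species present in each region."""
    return {
        region: mean(root_distance[sp] for sp in species)
        for region, species in occurrences.items()
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: nodes between each tip and the root,
# and the species recorded in each region.
root_distance = {"a": 2, "b": 3, "c": 5, "d": 6, "e": 6}
occurrences = {
    "temperate": ["a", "b"],
    "subtropical": ["a", "b", "c"],
    "tropical": ["b", "c", "d", "e"],
}

mrd = mrd_by_region(root_distance, occurrences)
regions = sorted(occurrences)
richness = [len(occurrences[r]) for r in regions]
r_value = pearson_r(richness, [mrd[r] for r in regions])
```

A correlation near zero would be the expectation under pure niche conservatism in the simulations discussed above, whereas a clearly non-zero value would point toward a zero sum constraint. The toy values here are arbitrary and carry no biological meaning.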

Our simulations show that the presence of a zero sum constraint results in surprisingly consistent phylogenetic patterns regardless of whether the primary biodiversity driver was limiting resources, speciation rate, or disturbance (**Figure 1**). This provides a broader base of support—expanding results in Hurlbert and Stegen (2014)—for using a combination of the MRD-richness relationship and the value of β to distinguish between zero sum and non-zero sum scenarios within empirical systems. Our simulations further suggest that one can differentiate among zero sum scenarios using BAMM (Rabosky, 2014) to make higher resolution inferences within and across phylogenies. We specifically found that the relationship between mean speciation rate and latitude differs diagnostically among the three zero sum scenarios.

BAMM analyses revealed that for empirical systems in which there is an overarching zero sum constraint, the observation that speciation rates are highest in temperate regions is sufficient to identify a diversification scenario involving greater magnitudes of disturbance in the temperate zone relative to tropical regions. While by definition disturbance is a non-equilibrial process, the zero sum constraint leads to an equilibrium between speciation and extinction reflecting high taxonomic turnover. Consistent with the disturbance scenario, Weir and Schluter (2007) found speciation and extinction rates of birds and mammals to be higher in northern latitudes relative to rates within the tropics. If it were shown that a zero sum constraint existed for the groups studied by Weir and Schluter (2007), using β and the MRD-richness slope, and if the latitudinal pattern in speciation rates they observed was also found using BAMM, one could cleanly reject all hypotheses studied here except the tropical stability hypothesis.

The BAMM analyses further revealed that empirical observations consistent with a zero sum constraint and with speciation rates increasing toward the tropics are sufficient to identify a scenario in which a latitudinal richness gradient is driven by increased per-capita speciation probabilities in the tropics. From a theoretical perspective this aligns with the hypothesized kinetic effects of temperature on per-capita evolutionary rates under the metabolic theory of ecology (Allen et al., 2006; Stegen et al., 2009, 2012). Empirically, Rolland et al. (2014) found mammalian speciation rates to be higher in the tropics relative to the temperate zone, and highlighted a variety of processes that may underlie increased speciation rates in the tropics. We suggest that pursuing specific, complementary analyses found here to be diagnostic (MRD-richness, β, and BAMM) would allow evaluation of these underlying processes.

Finally, BAMM analyses showed that empirical evidence for a zero sum constraint but no clear relationship between speciation rate and latitude is consistent with a diversity gradient resulting from a geographic gradient in available energy. Jetz et al. (2012) compiled and analyzed a phylogeny of all 9,993 extant birds and found no relationship between latitude and net diversification rate, although they did not attempt to estimate speciation and extinction rates separately. The lack of a relationship between latitude and speciation rates is the weakest diagnostic in that it may be observed due to lack of power, or to the interaction of conflicting processes. However, an examination of **Figure 2B** indicates that another potentially useful diagnostic is that the variation in estimated speciation rates from root to tips is far greater than the variation among tips. In the other two zero sum scenarios, rate variation from root to tips appears to be of similar magnitude as rate variation among tips. This is a pattern that deserves additional scrutiny in future studies.

#### **TOWARD A MULTI-HYPOTHESIS, MULTI-PATTERN PARADIGM**

Our aim is to emphasize and enable an approach to biodiversity science that has the potential to accelerate progress by using multi-pattern 'fingerprints'—generated through *a priori* simulation modeling—that can differentiate among alternative hypotheses attempting to explain species richness patterns. We argue that such a shift toward a multi-hypothesis, multi-pattern paradigm is needed as most biodiversity studies evaluate the predictions of a single diversity hypothesis that are often insufficient for ruling out alternative hypotheses. This may be especially important when the interaction between multiple hypotheses is important in driving observed patterns. Unlike the blind men who each focused on the single pattern that was most obvious to them, biodiversity scientists must expand their awareness to consider the breadth and complexity of empirical patterns that only when considered together will reveal the true elephant.

We urge a focus on the comparison of multiple hypotheses using predictions that are diagnostic, rather than predictions that are merely consistent with one particular hypothesis under consideration. The best way to identify diagnostic predictions is to compare secondary biodiversity patterns expected under the modeling of different macroecological and macroevolutionary scenarios (Grimm et al., 2005; Gotelli et al., 2009). We further propose an integrated approach that extends analyses of our simulation model to a larger suite of methods used across empirical studies (e.g., GeoSSE, Goldberg et al., 2011) while simultaneously characterizing empirical systems using the metrics and methods shown here to provide diagnostic signatures. While the comparison of empirical patterns to patterns simulated under different processes has been widely used in macroevolution and phylogenetics (Morlon, 2014), until now no models have included an explicit spatial context (beyond binary tropical versus temperate bins) while simultaneously modeling trait evolution, environmental filtering, and an individual-based energetic constraint (see Appendix 1 in Hurlbert and Stegen, 2014). Our simulation framework thus has the flexibility to model a range of diversification scenarios over spatial and environmental gradients.

Any identification of diagnostic predictions using our approach is provisional. Future work modeling diversification scenarios not examined here could result in duplicate patterns. We view scenarios that invoke the evolution of key innovations, that incorporate additional traits related to trophic niche, or that incorporate temporal variation in climate across the different regions as particularly important to consider in future expansions of our model. On the other hand, the identification of patterns as being non-diagnostic is inherently useful and reduces the likelihood that evidence is over-interpreted in favor of one hypothesis over another, regardless of the number of unmodeled scenarios.

In addition, alternative choices in the construction of a simulation model may result in different patterns that can be considered diagnostic. While we have done basic sensitivity analyses to confirm that our conclusions are not dependent on the specific parameter values chosen (see Hurlbert and Stegen, 2014), we welcome the development of alternative simulation models. Testing the robustness of our identified diagnostic patterns to a variety of simulation implementations will provide the strongest support for inferences made from empirical data.

Here we have argued for an approach that generates diagnostic *a priori* predictions across a suite of hypotheses, each associated with a feature of the environment that varies spatially. Our implementation of this approach has focused on generating a set of predictions from each hypothesis and then comparing predictions across hypotheses to identify a diagnostic set of patterns. We recognize, however, that multiple processes contribute to empirical richness gradients. Thus, an important next step is to determine how best to identify the operation and relative importance of multiple processes acting simultaneously to influence richness gradients. As one option for taking this next step, our simulation model could be easily modified such that multiple factors co-vary across the spatial gradient. We encourage such inquiry – our simulation code is publicly available – and further encourage the analysis of alternative simulation models; by integrating multiple models we can triangulate upon patterns that provide the most rigorous hypothesis tests and thereby enable the greatest conceptual advances.

#### **AUTHOR CONTRIBUTIONS**

Allen H. Hurlbert and James C. Stegen designed simulations, Allen H. Hurlbert conducted analyses, and Allen H. Hurlbert and James C. Stegen wrote the paper.

#### **ACKNOWLEDGMENTS**

The authors are grateful to Drs. Rull, Pennington, and Richardson for the invitation to submit an article to this special issue. Allen H. Hurlbert was supported by institutional funds from the University of North Carolina, as well as NSF grant DEB-1354563. James C. Stegen was supported by a Linus Pauling Distinguished Postdoctoral Fellowship at Pacific Northwest National Laboratory, which is operated for DOE by Battelle under contract DE-AC06-76RLO 1830.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 June 2014; accepted: 16 November 2014; published online: 02 December 2014.*

*Citation: Hurlbert AH and Stegen JC (2014) On the processes generating latitudinal richness gradients: identifying diagnostic patterns and predictions. Front. Genet. 5:420. doi: 10.3389/fgene.2014.00420*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Hurlbert and Stegen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Comparative evolutionary diversity and phylogenetic structure across multiple forest dynamics plots: a mega-phylogeny approach

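*David L. Erickson<sup>1</sup>\*, Frank A. Jones<sup>2,3</sup>, Nathan G. Swenson<sup>4</sup>, Nancai Pei<sup>5</sup>, Norman A. Bourg<sup>6</sup>, Wenna Chen<sup>7</sup>, Stuart J. Davies<sup>8</sup>, Xue-jun Ge<sup>7</sup>, Zhanqing Hao<sup>9</sup>, Robert W. Howe<sup>10</sup>, Chun-Lin Huang<sup>11</sup>, Andrew J. Larson<sup>12</sup>, Shawn K. Y. Lum<sup>13</sup>, James A. Lutz<sup>14</sup>, Keping Ma<sup>15</sup>, Madhava Meegaskumbura<sup>16</sup>, Xiangcheng Mi<sup>15</sup>, John D. Parker<sup>17</sup>, I. Fang-Sun<sup>18</sup>, S. Joseph Wright<sup>3</sup>, Amy T. Wolf<sup>10</sup>, W. Ye<sup>7</sup>, Dingliang Xing<sup>9</sup>, Jess K. Zimmerman<sup>19</sup> and W. John Kress<sup>1</sup>*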


#### *Edited by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Natalia Martinkova, Academy of Sciences of the Czech Republic, Czech Republic*

*Marcial Escudero, Doñana Biological Station - Spanish National Research Council, Spain*

*Tim Baker, University of Leeds, UK*

#### *\*Correspondence:*

*David L. Erickson, Department of Botany, Museum Routing Code-166, National Museum of Natural History, Washington, DC 20013-7012, USA e-mail: ericksond@si.edu*

Forest dynamics plots, which now span longitudes, latitudes, and habitat types across the globe, offer unparalleled insights into the ecological and evolutionary processes that determine how species are assembled into communities. Understanding phylogenetic relationships among species in a community has become an important component of assessing assembly processes. However, the application of evolutionary information to questions in community ecology has been limited in large part by the lack of accurate estimates of phylogenetic relationships among individual species found within communities, and is particularly limiting in comparisons between communities. Therefore, streamlining and maximizing the information content of these community phylogenies is a priority. To test the viability and advantage of a multi-community phylogeny, we constructed a multi-plot mega-phylogeny of 1347 species of trees across 15 forest dynamics plots in the ForestGEO network using DNA barcode sequence data (*rbc*L, *mat*K, and *psb*A-*trn*H) and compared community phylogenies for each individual plot with respect to support for topology and branch lengths, which affect evolutionary inference of community processes. The levels of taxonomic differentiation across the phylogeny were examined by quantifying the frequency of resolved nodes throughout. In addition, three phylogenetic distance (PD) metrics that are commonly used to infer assembly processes were estimated for each plot [PD, Mean Phylogenetic Distance (MPD), and Mean Nearest Taxon Distance (MNTD)]. Lastly, we examine the partitioning of phylogenetic diversity among community plots through quantification of inter-community MPD and MNTD. Overall, evolutionary relationships were highly resolved across the DNA barcode-based mega-phylogeny, and phylogenetic resolution for each community plot was improved when estimated within the context of the mega-phylogeny. 
Likewise, when compared with phylogenies for individual plots, estimates of phylogenetic diversity in the mega-phylogeny were more consistent, thereby removing a potential source of bias at the plot-level, and demonstrating the value of assessing phylogenetic relationships simultaneously within a mega-phylogeny. An unexpected result of the comparisons among plots based on the mega-phylogeny was that the communities in the ForestGEO plots in general appear to be assemblages of more closely related species than expected by chance, and that differentiation among communities is very low, suggesting deep floristic connections among communities and new avenues for future analyses in community ecology.

#### **Keywords: ForestGEO, barcode, phylogeny, community assembly, phylogenetic diversity, ecology**

#### **INTRODUCTION**

Phylogenetic hypotheses have played an increasingly important role in ecology over the last decade and their use in understanding community processes has been well reviewed (Webb et al., 2002; Cavender-Bares et al., 2009; Swenson, 2013). Knowledge of phylogenetic relationships among species has been used to quantify various aspects of ecology, including competition (Webb, 2000; Kembel and Hubbell, 2006; Webb et al., 2008; Cavender-Bares et al., 2009; Lebrija-Trejos et al., 2013), environmental filtering (Cavender-Bares et al., 2004; Uriarte et al., 2010; Liu et al., 2013; Pearse et al., 2013), pathogen and herbivore selection (Gilbert and Webb, 2007; Whitfeld et al., 2012), succession (Whitfeld et al., 2012) and the spatial differentiation of phylogenetic diversity (Weiblen et al., 2006; Graham and Fine, 2008; Fine and Kembel, 2011). In the context of conservation biology, phylogenetic information has also been used to quantify diversity within and among communities (Faith, 1992; Hardy and Senterre, 2007). The best measure of diversity that is most relevant for conservation assessment remains an important question. For example, does species diversity or phylogenetic diversity best capture the full spectrum of organismal diversity and traits in a community or habitat to be conserved (e.g., Swenson, 2013)? Nonetheless, the ability of phylogenetic data to precisely quantify evolutionary history within and among communities provides a framework for addressing how best to quantify, manage and conserve biodiversity and communities.

The application of evolutionary information to questions in community ecology has been limited in large part by the lack of accurate estimates of phylogenetic relationships among individual species found within communities. This dearth of information has been particularly true for the most species- and ecologically-diverse communities in the tropics where existing phylogenetic data are most limiting (Webb and Donoghue, 2005; Kress et al., 2009). Traditionally, phylogenetic systematists have focused on taxonomic groups and lineages, not communities, on the assumption that phylogenetic treatments are most robust when all members of a clade are included in the analysis. In communities where diverse sets of species are present, the very large evolutionary divergences among co-occurring taxa and more sparse taxonomic sampling have been thought to hinder accurate reconstructions of phylogenetic relationships (Poe and Swofford, 1999).

Newly emerging tools for constructing community phylogenies have largely ameliorated these concerns. Supertree methods, which prune and graft taxa from existing phylogenetic trees, can be used to construct phylogenetic relationships among species in a community (Bininda-Emonds and Sanderson, 2001; Webb and Donoghue, 2005). However, these methods have two drawbacks. Firstly, a phylogeny assembled from separate phylogenetic trees carries topological information, but contains no information on the evolutionary distances connecting species (i.e., branch lengths). Because the use of phylogenies in community ecology is specifically dependent upon evolutionary distances, branch lengths must be inferred. Assigning branch lengths to a topology with no intrinsic branch length information requires assumptions (e.g., bladj; Webb et al., 2008) where the branch lengths between any two dated nodes are evenly divided among the nodes separating the dates, which is unrealistic. Secondly, unless the reference trees from which the super-phylogeny is constructed contain all members of the community, which is extremely unlikely particularly for diverse tropical communities, the relationships of many species will be inferred only at higher taxonomic levels where relationships are completely resolved (Kress et al., 2009) and information about the tips of the phylogeny will be lost. Despite these limitations, supertree-based community phylogenies have in many ways revolutionized community ecology. The availability of supertree tools, such as phylomatic (Webb and Donoghue, 2005), has resulted in an explosion of interest in the merging of community ecology and phylogenetic systematics (Swenson, 2013).

A relatively new source of phylogenetic character information available to complement supertree methods in community ecology is DNA barcode sequence data. Multi-locus DNA barcodes for plants are composed of genes or parts of genes that have traditionally been used in molecular systematics (Soltis et al., 2011). The community phylogenies that have been estimated from DNA barcode sequence data are robust and congruent with overall phylogenetic expectations for vascular plants (Kress et al., 2009; Pei et al., 2011; Whitfeld et al., 2012; Yessoufou et al., 2013). The advantage of these DNA barcode phylogenies is their ability to (1) better resolve relationships at the species-level in clades where supertree methods are less robust and (2) provide direct estimates of evolutionary distances (e.g., branch lengths) that connect clades within the phylogeny (Kress et al., 2009).

Recently supertree methods have been combined with DNA barcode sequence data to enhance resolution in community phylogenies (e.g., Kress et al., 2010). In these cases the phylogenetic relationships generated through supertree algorithms are a combination of broadly accepted patterns of taxonomic relationships at the deepest phylogenetic nodes provided by a guide or constraint tree while phylogenetic resolution among genera and species at the tips of the branches is provided by the rapidly evolving DNA barcode markers. Equally important is that branch lengths may be estimated with the DNA barcode sequence data throughout the tree, including the parts of the tree that are constrained. This merging of the two methods has been particularly fruitful in a number of community studies (e.g., Kress et al., 2010; Uriarte et al., 2010; Lebrija-Trejos et al., 2013).

The next step in community analyses is to build multiple local phylogenies simultaneously that can be quantitatively compared. Currently most community phylogenies are constructed for one community at a time using different genes and different algorithms for estimating the phylogeny, as well as employing different dating methods, all of which will likely limit the ability to compare results among the communities. A few studies have applied molecular phylogenies to multiple communities (Swenson et al., 2012), but most comparisons among communities have relied upon either species taxonomic lists (Ricklefs et al., 2012) or taxonomic supertree methods (e.g., phylomatic). If we are to use phylogenetics to compare the structure, diversity, and ecological determinants of diversity among communities, then we must develop robust methods to build and employ multi-community phylogenies. Furthermore, an area in which the application of phylogenetic hypotheses to understanding ecological processes remains relatively less well explored is the geographic distribution of phylogenetic diversity and structure (Hardy and Jost, 2008). The power of sequence-based phylogenies to resolve evolutionary relationships and calculate evolutionary distances within communities can now be applied to determining genetic differentiation and phylogenetic diversity among sites and communities by combining DNA barcode sequence data from multiple communities into a mega-phylogeny across these communities. The value of using these measures of phylogenetic diversity to assess the conservation status of communities representing various habitat types and regions across the globe should not be underestimated (e.g., Faith, 1992).

In this study the ForestGEO (http://www.forestgeo.si.edu) global network of forest dynamics plots was used as the focus for developing a single large phylogeny for comparing measures of phylogenetic structure within and among plots. These plots have been developed over the last three decades to monitor forest change in different forest types around the world. Recently an effort has been initiated to generate DNA barcodes for tree species in each plot as a new tool for forensic ecology and community phylogenetics (e.g., Kress et al., 2009, 2010; Jones et al., 2011; Pei et al., 2011; Swenson et al., 2012). Here a method is developed for reconstructing species relationships based on the DNA barcode sequence data in fifteen different ForestGEO plots simultaneously by constructing a single mega-phylogeny. The benefits of a simultaneous phylogenetic reconstruction are addressed by estimating branch lengths and evolutionary divergence within and among the individual plots. Finally, analyses of the geographic distribution of community structure, measures of phylogenetic diversity across these plots (e.g., Phylogenetic Diversity, Mean Phylogenetic Distance, and Mean Nearest Taxon Distance), and inferences into the mechanisms that produce these observed patterns are provided.

## **MATERIALS AND METHODS**

### **COMMUNITY SAMPLING AND GENOTYPING**

The samples for our analyses were obtained from 15 forest dynamics plots, which are part of the ForestGEO network organized by the Smithsonian Institution (http://www.forestgeo.si.edu; **Figure 1**). Some of these sites have been the focus of investigations into the application of DNA barcodes in understanding the processes of community ecology (e.g., Kress et al., 2009, 2010; Uriarte et al., 2010; Pei et al., 2011; Swenson et al., 2012). We used samples from four plots in tropical Asia, two from subtropical Asia, one from temperate Asia, two from the neotropics, five from temperate North America, and one from temperate Europe (**Table 1**). A total of 1347 species were included in the final dataset, encompassing 553 genera in 125 families and 43 orders.

Three samples per species were directly sequenced at three separate loci corresponding to the commonly used DNA barcode markers: (1) 552 bp of the ribulose-bisphosphate carboxylase large-subunit gene (*rbc*L), (2) approximately 760 bp of the maturase-K gene (*mat*K), and (3) the *psb*A-*trn*H intergenic spacer (median 450 bp). All three markers are derived from the chloroplast genome. Methods for DNA extraction, PCR, and sequencing follow Kress et al. (2009) and Pei et al. (2011). Sequences for some taxa were retrieved from GenBank (trees in Yosemite, Wind-River, and Wytham plots); for an individual species we used only our original sequence data or GenBank data and never combined original DNA barcode sequence data with GenBank data for the same species. All DNA barcode data generated for the study have been submitted to GenBank (see Supplemental Table S1 for accession numbers for our original sequences and those retrieved from GenBank).

#### **SEQUENCE ALIGNMENT**

DNA barcode sequence data for trees collected from the 15 forest dynamics plots at each of the three separate markers were aligned across all species then concatenated together in an alignment supermatrix for estimation of phylogenetic relationships. The *rbc*L gene data were aligned through back-translation, using transAlign (Bininda-Emonds, 2005). The *mat*K gene was also initially aligned using transAlign, and then adjusted manually to remove gaps corresponding to frame-shift mutations. Following manual adjustment of the alignment to remove gaps, the matrix was aligned a second time using MAFFT (Katoh and Standley, 2013), implementing the FFT-NS-2 option for larger datasets. The *psb*A-*trn*H marker was aligned using SATe (Liu et al., 2012), implementing the PRANK aligner (Löytynoja and Goldman, 2005) for sub-groupings and the MUSCLE aligner (Edgar, 2004) for merging sub-alignments. SATe is a "divide and conquer" style algorithm where an initial set of sequences is subdivided into smaller sets which are aligned and then joined back into a single alignment using a consensus alignment algorithm. SATe is iterative and goes through many cycles of generating sub-alignments and merging to consensus alignment using the likelihood score of a phylogenetic tree to determine an optimal alignment state. To improve the estimate of alignment in SATe, a guide tree derived from the Phylomatic portal (Webb and Donoghue, 2005) was used as a starting tree in the alignment. The guide tree used in SATe was not a constraint tree, and thus the tree inferred from a final alignment in SATe may differ from the phylomatic input tree. SATe allowed us to generate a single alignment block for the hyper-variable *psb*A-*trn*H marker for all species, in contrast to sets of nested alignments as used previously (Kress et al., 2009).

and tropical habitats and are distributed globally.

**Table 1 | Descriptions of the ForestGEO plots examined in this study are given.**

*For each plot, the number of species, genera, and families is shown, as are general classification of the Geography, habitat type, and GPS coordinates. The number of species in the Mega-phylogeny is given, and is smaller than the sum among all communities due to shared species in some communities.*

### **PHYLOGENETIC RECONSTRUCTION**

The aligned 3-gene matrix was fully analyzed in the phylogenetic tree-building algorithm GARLI (Zwickl, 2006) via the CIPRES portal (Miller et al., 2010) to produce the 1347 taxon phylogeny that we call the "mega-phylogeny." The configuration file used with GARLI is given in Supplemental Table S2. In addition to the aligned 3-gene matrix we utilized a phylogenetic constraint tree (described below). The aligned data-file was also partitioned by locus for use in GARLI, so that each of the three genes had separate model parameters estimated using the program MODELTEST 3.7 (Posada and Crandall, 1998). The use of SATe greatly assisted model estimation at this stage because only a single model was required for the *psb*A-*trn*H marker, whereas with nested alignments either a single model would need to be chosen for all discrete alignment blocks (which would be artificial since the same model would not readily be chosen for all alignment partitions), or a very large number of models would be estimated separately for the same genetic locus. For a best tree search, 100 search replicates were initiated, each starting from a random tree, to search for a best, most likely phylogeny. Further, we implemented a separate set of 100 bootstrap runs under the CAT-GAMMA model in GARLI, while still using the ordinal level constraint tree, to quantify support for the topology used in subsequent analyses.
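The concatenation of per-locus alignments into a supermatrix, together with the per-locus partition boundaries needed for separate model parameters, can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study: the function name, the toy sequences, and the choice to pad taxa lacking a locus with `?` are all assumptions made here for the example.

```python
def concatenate_alignments(alignments, missing="?"):
    """Concatenate per-locus alignments into one supermatrix.

    `alignments` maps locus name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where partitions maps each locus
    to its 1-based, inclusive column range in the supermatrix.
    Padding absent taxa with `missing` is an assumption of this sketch.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments.values())))
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * width)
        partitions[locus] = (start, start + width - 1)
        start += width
    return supermatrix, partitions


# Hypothetical toy alignments for the three barcode markers.
alignments = {
    "rbcL": {"sp1": "ATGC", "sp2": "ATGA"},
    "matK": {"sp1": "GG-T", "sp2": "GGAT", "sp3": "GGAT"},
    "psbA-trnH": {"sp1": "AC", "sp3": "A-"},
}
supermatrix, partitions = concatenate_alignments(alignments)
```

The returned column ranges correspond to the kind of per-locus blocks that a partitioned likelihood analysis needs; how those blocks are declared to a particular program depends on that program's input format and is not shown here.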

Because of the relatively rapidly evolving sequence data provided by the DNA barcode markers and the inclusion of a large number of species spanning broad evolutionary distances, we employed a constraint tree to fix the deep phylogenetic relationships (Kress et al., 2010). The search for the best tree was performed with a constraint tree derived from Phylomatic using the R20120829 phylogenetic tree for plants, derived from the Angiosperm Phylogeny Group III reconstruction (APGIII, 2009). The constraint was modified in Mesquite (Maddison and Maddison, 2014) so that each taxonomic order was reduced to a polytomy. This in effect enforced phylogenetic relationships at the level of order and above. The molecular data were then responsible for reconstructing family, generic, and species relationships within orders. The quality of the phylogenetic reconstructions was evaluated by quantifying the fraction of resolved nodes, and the level of monophyly at the taxonomic family and genus levels. Although the constraint tree fixed relationships among orders according to APGIII, the branch lengths for all groups of taxa, including those fixed by the constraint tree, were calculated from the aligned DNA barcode sequence alignment. As such, the combination of the constraint and sequences enabled phylogeny reconstruction by limiting the searched tree space and estimation of branch lengths across the depth of the tree.
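The idea of an order-level constraint, in which relationships among orders are fixed and each order is collapsed to a polytomy of its sampled species, can be illustrated with a short Python sketch. This is a stand-in for the Phylomatic and Mesquite steps described above, not a reproduction of them: the function name, the three-order backbone, and the species assignments are hypothetical toy inputs.

```python
import re


def order_constraint(backbone_newick, species_to_order):
    """Replace each order tip in a backbone Newick string with a polytomy
    of the species assigned to that order (toy illustration only)."""
    members = {}
    for species, order in sorted(species_to_order.items()):
        members.setdefault(order, []).append(species)

    def expand(match):
        order = match.group(0)
        if order not in members:
            raise ValueError(f"no sampled species for order {order}")
        species = members[order]
        if len(species) == 1:
            return species[0]  # a single species needs no parentheses
        return "(" + ",".join(species) + ")"  # unresolved within the order

    # Assumes the backbone has order names as tips and no branch lengths.
    return re.sub(r"[A-Za-z_]+", expand, backbone_newick)


# Hypothetical backbone of three orders and four sampled species.
backbone = "((Fagales,Rosales),Laurales);"
species_to_order = {
    "Quercus_alba": "Fagales",
    "Fagus_grandifolia": "Fagales",
    "Ficus_insipida": "Rosales",
    "Ocotea_whitei": "Laurales",
}
constraint = order_constraint(backbone, species_to_order)
```

In the resulting tree the grouping of orders is fixed, while the relationships among species within each order are left unresolved for the sequence data to determine.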

In addition to constructing a single phylogeny for 15 ForestGEO community plots, phylogenetic relationships were estimated in each of the 15 plots separately. Taxa corresponding to each plot were pruned out from the aligned 3-marker matrix produced for the full 1347-taxon set and a phylogeny was constructed using the alignment for the taxa present in each plot as described above. Any benefits of high taxon density to sequence alignment in the larger dataset were accordingly propagated to the alignment for each individual plot. For each of the 15 community plots, a best-tree search with 100 independent search replicates was conducted in GARLI via the CIPRES portal using the same configuration parameters as for the mega-phylogeny. The best-scoring ML tree was used in subsequent comparisons between individually constructed community phylogenies and those estimated within the context of the mega-phylogeny.

To evaluate how well taxa were resolved in the mega-phylogeny and in individually constructed plot phylogenies, the fraction of non-zero length branches (that is, the fraction of resolved branches) was calculated for the entire mega-phylogeny, for individual plots that were pruned out of the mega-phylogeny, and for each individually constructed plot phylogeny. To assess how changes in taxonomic composition were associated with the degree of phylogenetic resolution, Spearman rank correlations were computed between the resolution of each phylogeny and species richness, Mean Phylogenetic Distance (MPD), and Mean Nearest Taxon Distance (MNTD), the latter two described below. Similarly, we used Spearman correlation to examine how rates of resolution changed as a function of latitude, from tropical to temperate environments.
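As a minimal sketch of these two calculations, the snippet below computes the fraction of non-zero length branches and a Spearman rank correlation in plain Python. The zero-length tolerance `tol` and all example values are assumptions made for illustration; they are not values taken from this study.

```python
def resolution_fraction(branch_lengths, tol=1e-8):
    """Fraction of branches longer than `tol` (tol is an assumed threshold)."""
    if not branch_lengths:
        raise ValueError("need at least one branch length")
    resolved = sum(1 for b in branch_lengths if b > tol)
    return resolved / len(branch_lengths)


def _ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical plots: richer plots are less fully resolved.
richness = [7, 20, 50, 300]
resolution = [1.0, 0.95, 0.90, 0.80]
rho = spearman(richness, resolution)
```

In this toy input the ranks are perfectly reversed, so `rho` is at the negative extreme of the coefficient's range.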

### **MEAN PATH LENGTH (MPL) CALIBRATION OF PHYLOGENY**

Mean Path Length (MPL) calibration (Britton et al., 2002) was used to transform all molecular phylogenies into ultrametric chronograms. MPL estimates the age of each node as the mean of the path lengths from that node to all tips descending from it, and thus approximates a molecular clock calibration. The algorithm was run using APE (Paradis et al., 2004), accessed through the Picante package (Kembel et al., 2010) of the R programming language (R Core Team, 2012), with the "chronoMPL" command, setting the root age to 1 rather than attempting to assign any dates. This method was selected because (1) it most directly reflects inferred evolutionary distances (i.e., branch lengths) with the minimum alteration of branch lengths relative to other methods of generating an ultrametric tree (Britton et al., 2002), and (2) attempts to use Bayesian methods for branch length calibration (e.g., BEAST; Drummond and Rambaut, 2007) failed to converge for the larger phylogenies. Thus, each of the 15 separately generated community phylogenies and the mega-phylogeny were transformed with MPL, and these transformed phylogenies were used in the analysis of phylogenetic distance (PD) and diversity (Sections Phylogenetic Diversity Metrics and Comparative Community Phylogenetic Diversity and Structure).
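The mean path length idea can be sketched on a toy tree. The code below is a simplified illustration in Python, not the APE implementation: the tree encoding, the function names, and the branch lengths are all assumptions, and the sketch omits the clock tests that accompany the published method.

```python
def mpl_node_ages(children, root):
    """Age of each node = mean path length from the node to its descendant tips.

    children: dict mapping a node to a list of (child, branch_length) pairs;
              tips are simply absent from the dict.
    """
    ages = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            ages[node] = 0.0
            return [0.0]
        paths = []
        for child, length in kids:
            paths.extend(length + p for p in visit(child))
        ages[node] = sum(paths) / len(paths)
        return paths

    visit(root)
    return ages


def mpl_chronogram(children, root, root_age=1.0):
    """Rescale ages so the root has age `root_age`; return ages and new branches."""
    ages = mpl_node_ages(children, root)
    scale = root_age / ages[root]
    scaled = {node: age * scale for node, age in ages.items()}
    # New branch length = age of parent minus age of child.
    new_children = {
        parent: [(child, scaled[parent] - scaled[child]) for child, _ in kids]
        for parent, kids in children.items()
    }
    return scaled, new_children


# Toy non-ultrametric tree with made-up branch lengths.
toy = {"root": [("A", 0.3), ("X", 0.1)], "X": [("B", 0.1), ("C", 0.3)]}
ages, chrono = mpl_chronogram(toy, "root", root_age=1.0)
```

After the transformation every tip is the same distance from the root. In this simplified form a branch could come out negative if a child's mean path length exceeded its parent's, which is one reason to inspect the output before using it.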

### **PHYLOGENETIC DIVERSITY METRICS**

Three common metrics of phylogenetic diversity were utilized to quantify differences among the 15 ForestGEO plot-based community phylogenies. All of these metrics were estimated within the Picante package (Kembel et al., 2010) of the R programming language. For each plot community, the phylogenetic diversity was calculated and then the values observed were compared for individually constructed phylogenies and for those estimated within the mega-phylogeny. The PD metric (Faith, 1992), which sums the branch lengths for any defined set of taxa in a phylogeny, is correlated with species richness, but greatly refines estimates of diversity by incorporating a quantitative measure of evolutionary divergence (Faith, 1992; Forest et al., 2007; Morlon et al., 2011). For individually constructed community phylogenies, PD was simply the sum of all branch lengths in the phylogeny. For community phylogenies within the mega-phylogeny, PD was the sum of all branch lengths within the mega-phylogeny connecting the species belonging to that community.
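The two ways of computing PD described above can be sketched as follows. This is a minimal Python illustration with a made-up four-tip tree; the `include_root_path` switch is an assumption added here because implementations differ on whether the path to the root is counted, and it is not a statement about the settings used in this study.

```python
def faith_pd(parent, taxa, include_root_path=False):
    """Sum of branch lengths connecting `taxa` in a rooted tree.

    parent: dict mapping each non-root node to (parent_node, branch_length).
    By default only the subtree spanning the taxa (up to their most recent
    common ancestor) is summed, so a single taxon gives 0.
    """
    taxa = set(taxa)
    counts = {}  # number of focal taxa whose path to the root uses each branch
    for tip in taxa:
        node = tip
        while node in parent:
            counts[node] = counts.get(node, 0) + 1
            node = parent[node][0]
    total = 0.0
    for node, used_by in counts.items():
        # A branch used by every focal taxon lies above their common ancestor.
        if include_root_path or used_by < len(taxa):
            total += parent[node][1]
    return total


# Made-up ultrametric tree: ((A,(B,C)),D)
tree = {
    "N1": ("root", 0.2), "D": ("root", 1.0),
    "A": ("N1", 0.8), "X": ("N1", 0.2),
    "B": ("X", 0.6), "C": ("X", 0.6),
}
pd_whole_tree = faith_pd(tree, ["A", "B", "C", "D"])  # all branches
pd_subset = faith_pd(tree, ["B", "C"])                # community within larger tree
```

For a tree built from one community alone, PD is the sum over every branch; for a community embedded in a larger tree, only the branches linking its members contribute.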

The second metric utilized was MPD (Webb et al., 2002), which obtains an average for the pair-wise PD across all pairs of taxa in a community. As such, MPD is not directly correlated with species number by default, and is strongly influenced by branch lengths at the deepest nodes of the phylogeny (Swenson, 2013). This metric gives an estimate of the overall divergence of taxonomic clades present in a community and is sensitive to replacement of taxa that differ in broad taxonomic placement.

The third metric employed was MNTD (Webb et al., 2002), which provides an average of the distances between each species and its nearest phylogenetic neighbor in the community. MNTD quantifies the degree to which a community is a set of closely related species vs. a heterogeneous set of taxa from disparate taxonomic clades. MNTD is necessarily sensitive to replacement of closely related taxa and is much less sensitive to changes at the basal (or oldest) nodes of the phylogeny. For each of these metrics, the phylogenetic diversity is inferred through the summed branch length distances connecting species in the phylogeny; thus distance is treated as equivalent to diversity.
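To make the contrast between MPD and MNTD concrete, the following Python sketch computes both from tip-to-tip (patristic) distances on a made-up tree. The tree, its branch lengths, and the function names are illustrative assumptions and are independent of the Picante routines used for the actual analyses.

```python
from itertools import combinations


def path_to_root(parent, tip):
    """Map each ancestor of `tip` (and the tip itself) to its distance from the tip."""
    dist, node, d = {tip: 0.0}, tip, 0.0
    while node in parent:
        up, length = parent[node]
        d += length
        dist[up] = d
        node = up
    return dist


def patristic(parent, a, b):
    """Tip-to-tip distance through the most recent common ancestor."""
    da, db = path_to_root(parent, a), path_to_root(parent, b)
    return min(da[n] + db[n] for n in da if n in db)


def mpd(parent, taxa):
    """Mean of all pairwise distances among the taxa of one community."""
    pairs = list(combinations(sorted(taxa), 2))
    return sum(patristic(parent, a, b) for a, b in pairs) / len(pairs)


def mntd(parent, taxa):
    """Mean distance from each taxon to its nearest relative in the community."""
    taxa = sorted(taxa)
    nearest = [min(patristic(parent, a, b) for b in taxa if b != a) for a in taxa]
    return sum(nearest) / len(nearest)


# Made-up ultrametric tree: ((A,(B,C)),D)
tree = {
    "N1": ("root", 0.2), "D": ("root", 1.0),
    "A": ("N1", 0.8), "X": ("N1", 0.2),
    "B": ("X", 0.6), "C": ("X", 0.6),
}
mpd_abc, mntd_abc = mpd(tree, ["A", "B", "C"]), mntd(tree, ["A", "B", "C"])
mpd_bcd, mntd_bcd = mpd(tree, ["B", "C", "D"]), mntd(tree, ["B", "C", "D"])
```

Swapping `A` for the more distantly related `D` raises both metrics in this toy case, because the replacement adds a deep split to the community.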

The absolute values of PD, MPD, and MNTD are not relevant here; rather, the differences between these metrics as estimated from independently derived phylogenies and as estimated from the mega-phylogeny are most important. To compare how estimates of phylogenetic diversity vary, the proportional difference in the values for each community was measured, and these differences were plotted for all 15 plot communities. For each metric, 15 values were calculated representing the difference between the value from the individually constructed plot phylogeny and the value inferred from the mega-phylogeny. The percentage difference was calculated as $[(M_i - M_j)/M_j] \times 100$, where $M$ is the metric under evaluation (PD, MPD, or MNTD), $M_i$ is the value estimated from the individually constructed community phylogeny, and $M_j$ is the value estimated from the mega-phylogeny. A value of zero corresponds to no difference between the estimate inferred in the mega-phylogeny and that from the individually constructed phylogeny. We further examined whether there was a significant correlation between latitude and phylogenetic diversity using the Spearman correlation coefficient with decimal values of latitude for each community plot. Whereas species richness is known to exhibit a strong latitudinal gradient, we used this correlation to evaluate whether phylogenetic diversity metrics exhibit similar patterns.
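The percentage difference is simple enough to sketch directly. In the Python snippet below the function name and the input values are hypothetical and serve only to show the sign convention: a negative result means the individually constructed phylogeny gave the lower estimate.

```python
def percent_difference(individual, mega):
    """[(M_i - M_j) / M_j] * 100, with M_i from the individual-plot phylogeny
    and M_j from the mega-phylogeny."""
    if mega == 0:
        raise ValueError("mega-phylogeny estimate must be non-zero")
    return (individual - mega) / mega * 100.0


# Hypothetical metric values for one plot.
lower = percent_difference(0.8, 1.0)      # individual estimate is 20% lower
identical = percent_difference(1.0, 1.0)  # no difference
```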

#### **COMPARATIVE COMMUNITY PHYLOGENETIC DIVERSITY AND STRUCTURE**

To compare the phylogenetic diversity and structure among ForestGEO plots, two methods were used, both estimated within the Picante package of the R programming language, and using the MPL-transformed mega-phylogeny. The first metric was the Inter-community Mean Pairwise Distance, which is a measure of phylogenetic beta diversity (Webb et al., 2002) and is calculated as the mean for all pair-wise comparisons of PD between the taxa of two different communities (the "mpd.comdist" routine within Picante). The second metric is the MNTD among nearest-neighbor pairs of species in different communities (the "comdistnt" routine within Picante) and is sensitive to higher-level taxonomic substitutions (i.e., changes in representation of taxonomic family or order) among communities. For mpd.comdist and comdistnt, both the mean and variance of the inter-community PDs were plotted.
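A minimal sketch of the two inter-community measures is given below in Python. It is an illustration under stated assumptions rather than a reproduction of the Picante routines: distances are supplied as a hypothetical lookup table, a species shared by both communities is given distance zero, and the nearest-taxon version averages over the taxa of both communities.

```python
def dist(D, a, b):
    """Pairwise distance lookup; a taxon is at distance 0 from itself."""
    return 0.0 if a == b else D[frozenset((a, b))]


def inter_mpd(D, comm1, comm2):
    """Mean distance over all pairs with one taxon drawn from each community."""
    values = [dist(D, a, b) for a in comm1 for b in comm2]
    return sum(values) / len(values)


def inter_mntd(D, comm1, comm2):
    """Mean distance from each taxon to its nearest relative in the other community."""
    near1 = [min(dist(D, a, b) for b in comm2) for a in comm1]
    near2 = [min(dist(D, a, b) for a in comm1) for b in comm2]
    return (sum(near1) + sum(near2)) / (len(near1) + len(near2))


# Hypothetical patristic distances among four taxa.
D = {
    frozenset(("A", "B")): 1.6, frozenset(("A", "C")): 1.6,
    frozenset(("B", "C")): 1.2, frozenset(("A", "D")): 2.0,
    frozenset(("B", "D")): 2.0, frozenset(("C", "D")): 2.0,
}
between_mpd = inter_mpd(D, ["A", "B"], ["C", "D"])
between_mntd = inter_mntd(D, ["A", "B"], ["C", "D"])
```

Because the nearest-taxon version only looks at the closest match in the other community, it is never larger than the all-pairs version for the same input.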

To further test if each of the 15 ForestGEO plots was a random sample of the larger community of species represented by the mega-phylogeny, a randomization test implemented in Picante was used to estimate the standardized effect size of each of the three PD metrics. This test was run for the three phylogenetic diversity metrics PD, MPD, and MNTD using the MPL-transformed mega-phylogeny. For each of the three metrics, the algorithm in Picante was run using 999 randomizations of the community within the mega-phylogeny applying the "taxa.labels" null model. The "taxa.labels" model maintains the species richness of each community as well as the number of forest plots a particular species may be assigned to (i.e., a species observed in one forest can only be found in one forest in the randomized data), but alters the evolutionary relationships (i.e., branch lengths connecting species) in that community by randomizing the names of the species at the tips of the phylogenetic tree (Webb et al., 2002). The model generates a distribution from the 999 independent randomizations, against which the observed value of phylogenetic diversity (PD, MPD, or MNTD) may then be compared and a *p*-value assigned to it. Communities with a *p*-value < 0.05 were judged to be significantly different from random within the context of the 15-plot mega-phylogeny. *Z*-values, observed and expected values of diversity, and *p*-values are given as supplemental data (Supplemental Tables S3–S5, respectively, for PD, MPD, and MNTD). Departures from random have been interpreted as a signal for local-level processes within communities, such that species with observed PDs significantly less than the randomized mean are more closely related than expected (i.e., phylogenetically clustered) and hence the result of environmental filtering on phylogenetically structured traits (Webb, 2000). Alternatively, species with evolutionary distances significantly greater than the randomized mean are more distantly related than expected (i.e., phylogenetically overdispersed), which is consistent with the role of competition in structuring species composition (Webb et al., 2002). The entire ForestGEO mega-phylogeny was treated in essence as a global "meta-community" and as such these metrics provide evidence for similar ecological processes among communities that are linked to the environment or taxonomic structure.
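The logic of a tip-label randomization can be sketched as follows. This Python example is an illustration only: the eight-taxon distance table, the seed, the one-tailed rank-based *p*-value, and the function names are assumptions, and the sketch handles a single community and a single metric (MPD) rather than the full community matrix.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mpd(D, taxa):
    """Mean pairwise distance among taxa, using a distance lookup table."""
    pairs = list(combinations(sorted(taxa), 2))
    return sum(D[frozenset(p)] for p in pairs) / len(pairs)


def ses_taxa_labels(D, all_taxa, community, metric, runs=999, seed=0):
    """Shuffle tip labels, recompute the metric, and compare with the observed value.

    Returns (observed, z, p), where z = (observed - null mean) / null sd and
    p is the rank of the observed value among the null values (low tail).
    """
    rng = random.Random(seed)
    observed = metric(D, community)
    null = []
    for _ in range(runs):
        shuffled = list(all_taxa)
        rng.shuffle(shuffled)
        relabel = dict(zip(all_taxa, shuffled))  # richness is unchanged
        null.append(metric(D, [relabel[t] for t in community]))
    z = (observed - mean(null)) / stdev(null)
    rank = 1 + sum(1 for value in null if value <= observed)
    return observed, z, rank / (runs + 1)


# Hypothetical pool: two clades of four taxa, close within and distant between.
clade1 = ["t1", "t2", "t3", "t4"]
clade2 = ["t5", "t6", "t7", "t8"]
pool = clade1 + clade2
D = {}
for a, b in combinations(pool, 2):
    same_clade = (a in clade1) == (b in clade1)
    D[frozenset((a, b))] = 0.2 if same_clade else 2.0

observed, z, p = ses_taxa_labels(D, pool, clade1, mpd)
```

A community drawn entirely from one clade has a much lower MPD than most random draws from the pool, so `z` is strongly negative and `p` is small, which is the signature read as phylogenetic clustering.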

## **RESULTS**

## **PHYLOGENETIC RECONSTRUCTION**

Phylogenetic resolution, which is the fraction of non-zero length branches in a phylogeny, varied among the 15 single-plot phylogenies and the 15-plot mega-phylogeny. The 15-plot mega-phylogeny with molecular branch lengths selected from the most likely of 100 independent maximum-likelihood tree searches is shown in **Figure 2**. The distribution of the orders throughout the 15-plot mega-phylogeny is presented in **Figure 3A**, with the diversity of orders within each plot shown in **Figure 3B**. The fraction of resolved species for the mega-phylogeny was over 78% using the phylogeny with the best likelihood score derived from 100 independent search replicates. A consensus tree from rapid bootstrapping of the mega-phylogeny found that 70.2% of all nodes were supported using a 50% majority-rule criterion, which closely mirrored the 78% resolution in the highest scoring ML tree. The rates of resolution for the independently derived community phylogenies (**Table 2**) ranged from 81% (Dinghushan) to 100% (Wytham and Yosemite). A significant relationship was found between phylogenetic resolution and species richness (*r* = −0.799, *p* < 0.001), as smaller community phylogenies (and those at higher latitudes) were more likely to be fully resolved. Importantly, however, phylogenetic resolution for a plot was consistently higher when estimated within the context of the mega-phylogeny (**Table 2**). On average a 3.5% increase in resolution was found, ranging from an 8% increase for Bukit-Timah and Changbaishan to no increase for Wind-River and Yosemite (**Table 2**).

A significant relationship was found between MNTD for a plot and its phylogenetic resolution (*r* = 0.874; *p* < 0.001), with higher MNTD equating to improved resolution. A similar effect was seen with MPD (*r* = 0.658; *p* = 0.008). The relationship of MNTD with phylogenetic resolution paralleled the relationship between species richness and phylogenetic resolution, and was similar in direction to the correlation with latitude (*r* = 0.397, *p* = 0.142), such that as communities were composed of fewer species, it was easier to distinguish among them topologically.

#### **COMMUNITY PHYLOGENETIC DIVERSITY AND STRUCTURE**

The three diversity metrics (PD, MPD, and MNTD) calculated for each plot varied between those derived from the mega-phylogeny and those from the individually constructed plot phylogenies (**Figure 4**). Species richness was only weakly related to the proportional difference for PD (*r* = 0.393, *p* = 0.083), but showed a significant positive relationship with the proportional difference for MPD (*r* = 0.741, *p* = 0.002) and MNTD (*r* = 0.525, *p* = 0.028), as larger plots exhibited less differentiation in the estimated metrics (**Figure 4**). Averaged over all communities, the percent difference in the estimated metrics was PD = 14.38%, MPD = 2.297%, and MNTD = 38.76%. The percent difference for MNTD was striking, and is most evident in the smallest plots, with a range from 60% divergence for Changbaishan to 15% divergence for BCI (**Figure 4**), which reflects the difficulty that phylogenetic reconstruction methods may have in inferring evolutionary distances when the mean of those distances is very large. The improvements in estimates of PD within the mega-phylogeny are most dramatic for the smallest plots, where the higher taxon density of the mega-phylogeny greatly improves estimates of branch lengths among all species found in those communities. The inter-plot Mean Phylogenetic Distance (inter-MPD) was broadly similar for 13 of the 15 plots (**Figure 5**), with only the most species-poor plots (e.g., Wind-River and Yosemite) differing significantly from the other 13 plots. This reflects the wide taxonomic composition of many of the plots, where high variation within plots obscures differentiation among the plots, as seen through taxonomic representation of different orders within each plot (**Figure 3B**). Similarly, the inter-plot Mean Nearest Taxon Distance (inter-MNTD) exhibited no differentiation among any of the ForestGEO plots, regardless of geographic location or species richness (**Figure 4**).

In contrast to the inter-community diversity metrics, randomization tests, which evaluate if communities are a random subsample of the larger phylogeny, found that the communities were not a random set of species (**Table 3**). All three PD metrics exhibited significant differences from random in the most speciose plots, with a consistent trend toward significant clustering (**Table 3**, and Supplemental Tables S3–S5 for PD, MPD, and MNTD, respectively). For PD, the five temperate sites exhibited no departure from random, whereas each of the plots with more than 62 species (excepting Luquillo) was significantly clustered. For MNTD the result was even more skewed, with 12 of the 15 plots exhibiting significant clustering. For MPD, significant clustering was found for the four most species-rich tropical plots (BCI, Bukit-Timah, Dinghushan, and Gutianshan), whereas the most species-poor community plots were inferred to be overdispersed (Wabikon Lake, Wind River, Wytham, and Yosemite). Overall the eight tropical or sub-tropical plots, when considered over all three PD metrics, were significantly clustered in 15 out of 24 cases. In the remaining nine cases they were not different from random, and none were inferred to be overdispersed. Alternatively, for the seven species-poor temperate plots, four were overdispersed (only with MPD), eight were significantly clustered (seven for MNTD and one for PD with Changbaishan), and the remaining 12 showed no departure from random (**Table 3**). Two plots, Luquillo and Nanjenshan, were consistent in exhibiting no significant departures from random for any of the phylogenetic diversity metrics, whereas all of the other plot phylogenies exhibited some significant departure from random for at least one of the metrics.

individual plot are mapped on the mega-phylogeny in red to show the evolutionary and taxonomic diversity present in each plot.

*The fraction of non-zero length nodes in the phylogeny was used to determine the percent resolution for the best-supported ML phylogeny.*

community phylogeny vs. that observed for the same community in the mega-phylogeny. Values are plotted as a function of Species Richness of the ForestGEO community.

**FIGURE 5 | Two methods to infer differentiation among communities are shown, with the inter-community MNTD (top) and inter-community MPD (bottom).** Boxplots for each community show the mean (dark bar within box), interquartile range (box), and 95% confidence interval (whisker bars), computed from all pairwise contrasts between plots.

## **DISCUSSION**

In the field of ecology, phylogenetic data have been used to understand ecological processes (Webb et al., 2002; Cavender-Bares et al., 2009), the roles of trait conservatism and dispersal limitation in structuring communities (Fine and Kembel, 2011; Liu et al., 2013), and the regulation of beta diversity (Swenson et al., 2012). In addition, phylogenetic information has been applied to the identification of specific environments critical for conservation (Faith, 1992; Forest et al., 2007; Morlon et al., 2011). Accordingly, the ability to generate and use phylogenetic data to address core questions in ecology and to assess conservation priorities is of increasing importance.

The results shown here demonstrate that constructing a single mega-phylogeny inclusive of many individual community plots improves the estimation of the evolutionary relationships and distances among species in each separate plot. The mega-phylogeny is also helpful in examining the patterns of phylogenetic diversity within and among plots to explore broad-scale patterns that may reflect processes regulating community assembly and the maintenance of diversity. Long-term biodiversity monitoring plots, such as the ForestGEO network, provide an ideal context for investigating phylogenetic diversity and geographic structuring among plots to address questions regarding community assembly at very broad scales.

**Table 3 | Values for species richness (SR) and three Phylogenetic Diversity metrics, Phylogenetic Distance (PD), Mean Phylogenetic Distance (MPD), and Mean Nearest Taxon Distance (MNTD), are given for each plot.**

*For each metric 999 randomizations were used to assess departure from random community structure. Significant differences from random are in bold, with pattern denoted by superscript. Standard effect sizes, Z and p-values are reported in Supplemental Tables S3–S5.*

*, Significant Overdispersion; , Significant Clustering.*

#### **GENERATING PHYLOGENIES**

The use of a constraint tree to construct the mega-phylogeny was adopted in this study and is recommended for large community phylogenies, particularly those built with rapidly evolving sequence data as found in DNA barcodes (Kress et al., 2010). For example, the non-protein coding marker *psb*A-*trn*H has been used phylogenetically only at very low taxonomic scales (e.g., within genera or families) because of the difficulty in aligning sequences among distantly related taxa. This limitation has slowed its adoption as an official DNA barcode marker (Hollingsworth et al., 2011). However, in this study we were able to use the SATe algorithm to align *psb*A-*trn*H across all species in the analysis, including distantly related ones, rather than, as in prior studies, aligning the marker in a nested format within a supermatrix, where it did not contribute to the inferred relationships at deeper taxonomic scales (Kress et al., 2009; Pei et al., 2011). This marker evolves very rapidly, and global alignment may have contributed to the non-constrained mega-phylogeny departing from expectations in APGIII. However, the use of *psb*A-*trn*H in a global alignment produced a higher fraction of resolved nodes than the use of only *rbc*L+*mat*K, and did not negatively affect rates of family and generic monophyly (**Table 1**). Also, a nested approach to alignment of *psb*A-*trn*H requires some subjective decisions with regard to the scale at which to group sequences, which may result in the exclusion of sequences from taxa that are not readily included in groupings. This effect in turn will result in a greater asymmetry in the aligned sequence matrix, and, therefore, will complicate model selection for different data partitions in phylogenetic inference. For these reasons we recommend a global alignment of *psb*A-*trn*H in plant DNA barcode phylogenies using SATe in conjunction with a constraint tree that will enforce higher-level taxonomic relationships.

Even the relatively limited sequence content from DNA barcode markers, as demonstrated here, can be successfully used to construct a highly robust phylogeny across multiple plots with high rates of resolution and monophyly. When compared with other studies of very large phylogenies, the mega-phylogeny had comparable rates of resolution among species (Smith et al., 2009, 2011), and an overall remarkably high rate of 78% taxonomic resolution. The 15-plot mega-phylogeny with 1347 species in 43 orders and 125 families (**Table 1**, **Figure 2**) was significantly larger than the individual plots, in which the average was 12 orders and 38 families (**Table 1**). The mega-phylogeny improved resolution among species in most communities relative to constructing phylogenies for individual plots (**Table 2**). The construction of a community phylogeny is greatly improved in the context of resolving difficult taxonomic relationships when taxon density is high (Smith et al., 2011), and the lower level of taxonomic resolution in the mega-phylogeny as a whole does not affect the inferred rates of resolution for the included plots. The increased taxon density of the mega-phylogeny, represented by a lower estimate of the MNTD, was a central driver in improving rates of phylogenetic resolution (see Supplemental Table S4). As the genetic distances among species become more continuous and evenly distributed, the ability to infer phylogenetic relationships increases, which is reflected in the strong correlation between decreasing MPD and increasing phylogenetic resolution (0.73). Therefore, as ever-larger mega-phylogenies are generated to include an expanded scope of land plant diversity, more fully resolved and well-supported community phylogenies can be pruned from them.

#### **IMPROVING PHYLOGENETIC RESOLUTION**

Improving the accuracy of relationships among species in a community phylogeny is not just a methodological detail. Poorly resolved phylogenies can result in biased estimates of the diversity metrics used to infer ecological process (Davies et al., 2012) or may lead to very different conclusions about ecological process in a particular community (Kress et al., 2009). The low rates of taxonomic resolution in supertrees relative to molecular-derived community phylogenies may adversely affect ecological inference (Kress et al., 2009); yet with supertrees, at least all samples in a study are assembled and dated similarly, and thus results observed among communities are consistent and comparable (Fine and Kembel, 2011). The challenge of collecting genetic data for all the members of a community has limited the use of molecular phylogenies in studies of community ecology, particularly in studies comparing across multiple communities (Swenson et al., 2012). With the widespread generation of DNA barcode data across tropical plots, such as the ForestGEO network of forest dynamics plots, information on phylogenetic relationships can now be applied to many communities simultaneously. The benefits of constructing phylogenies for multiple communities concurrently, as well as the advantages of increased taxonomic resolution and more accurate evolutionary distances among species and clades, are many. Because evolutionary distances, or branch lengths, are necessary to infer processes of community assembly, one of our goals was to quantify the improvement in estimating evolutionary distances through the use of a mega-phylogeny of many plots to construct phylogenies of individual plots.

Nearly all studies of community phylogenetics have examined one community at a time. In most cases the community phylogenies were constructed using supertree methods, including Phylomatic (Webb, 2000; Cavender-Bares et al., 2004; Fine and Kembel, 2011), or direct sequence data (Kress et al., 2009; Uriarte et al., 2010; Pei et al., 2011), but it is difficult to know if differences in the results are attributable to differences in the phylogeny employed or in the ecological processes themselves. We have shown here that constructing a molecular phylogeny for all communities together improves estimates of phylogenetic diversity and structure compared to estimating individual phylogenies for each community.

#### **PHYLOGENETIC DIVERSITY**

A mega-phylogeny may also improve estimates of community phylogenetic diversity through the conversion of all phylogenies into molecular-clock-based ultrametric trees using the MPL adjustment (Britton et al., 2002) and then directly estimating three commonly employed diversity metrics (**Table 3**). Communities with the lowest species diversity showed the greatest contrast in diversity measures when estimated in the mega-phylogeny vs. the individual-plot phylogenies (**Figure 4**). For example, in the Yosemite and Wind-River plots (where species richness = 7), diversity estimates from individually derived phylogenies were less than half those observed in the mega-phylogeny, whereas for the larger plots the differences were much smaller. For all communities, the values of PD were lower in individually constructed community phylogenies (**Figure 4**). We note that this result considers only trees, and that work comparing canopy and understory diversity suggests that temperate forests may contain comparable phylogenetic diversity when all plants are considered (Halpern and Lutz, 2013). However, for our observations, divergence between estimates was correlated with species richness of the plot (Species Richness vs. % difference in MPD = 0.68), with smaller plots showing the greatest differentiation. This suggests that the mega-phylogeny should greatly improve comparisons among plots, particularly when those communities differ in species richness.

#### **PHYLOGENETIC STRUCTURE AMONG COMMUNITIES**

A growing, but still small, number of studies have compared phylogenetic structure across communities (Hardy et al., 2012; Swenson et al., 2012; Oliveira-Filho et al., 2013a,b). However, as shown here the evolutionary structure among plots, via the inter-community measures of MPD and MNTD (**Figure 5**), complements similar patterns of phylogenetic structure within communities. The lack of differentiation among plots (**Figure 5**), with the exception of the extremely taxon-poor Yosemite and Wind-River plots in the Cascade and Sierra Nevada Mountains, is striking. The prevalence of trees in the families Fabaceae, Euphorbiaceae, and Myrtaceae in the tropical plots and their relative paucity in the plots located in temperate environments was not sufficient to differentiate these communities in most cases. The effect of latitude on measures of phylogenetic diversity was highly significant (with PD, MPD, and MNTD showing Spearman correlation coefficients of −0.905, 0.684, and 0.521, respectively) and followed changes in species richness along the tropical to temperate transition. The correlation for PD was negative with latitude, whereas those for MPD and MNTD were positive, reflecting how the two latter metrics remove the effect of species richness on phylogenetic diversity. The reliance of MPD on the genetic distances of the most basal nodes of the phylogeny and the emphasis on the presence or absence of basal lineages suggest that substitutions of one family (or order) in communities that differ in species number are equivalent. It is even more striking that the inter-community estimates of MNTD should show similarly low rates of differentiation among sites. While the differentiation in MPD can be more readily explained by the role of deeper nodes in determining differentiation, the MNTD would be inflated when comparing environments from the tropics with those of the temperate zones.
The lack of differentiation among plots corresponds well to the observation that trees in these plots are in general phylogenetically clustered, and that environmental filtering is driving assembly processes. The main caveat is that we can infer a role of environmental filtering from phylogenetic clustering only when the traits that drive fitness are evolutionarily conserved.

#### **PHYLOGENETIC DISTANCE AND ECOLOGICAL PROCESSES**

A central benefit of constructing a mega-phylogeny containing many communities is our ability to more accurately contrast ecological processes operating in different communities. Therefore, phylogenetic patterns that are observed (e.g., clustering, overdispersion) are not attributable to differences in how community phylogenies are assembled, but are more directly linked to different ecological processes in those communities. We note that disentangling these processes within a community phylogenetic context remains a challenge, as we are just beginning to apply phylogenetic information to multiple communities and appropriate null models of phylogenetic pattern that incorporate explicit geographic differentiation are still being developed. The roles of dispersal limitation and biogeographic vicariance in generating differences in species composition among communities affect our results, as would community assembly processes within sites. Yet the patterns derived with existing models can at least be viewed as having an ecological or evolutionary basis rather than being a simple product of phylogeny construction.

In our study, for each of the different metrics of PD the most diverse tropical communities were composed of a set of more closely related species than expected at random in the context of the null model used (**Table 3**). The pattern of increased relatedness was most evident for the nearest-taxon metric MNTD, which exhibited significant clustering for all but two plots, but was also true for MPD and PD for the tropical communities. This clustering of related species could be attributable to several factors. From the perspective of community ecology, these observations are consistent with local scale environmental filtering for phylogenetically conserved traits and niche conservatism. We note that with such geographically widespread communities other factors, including dispersal limitation linked with regional vicariance speciation, will play important roles and will require further investigation. Null models of no-dispersal limitation among communities will need to be explicitly re-examined in future work as we continue to construct phylogenies that encompass an increased number of communities.

With respect to environmental filtering and niche conservatism, these two processes are not mutually exclusive, although they make different assumptions regarding the role of phylogenetic conservatism and the role of dispersal. Much work has been done on the degree to which trait conservatism occurs in tropical forests (reviewed in Cavender-Bares et al., 2009) and on the role of trait conservatism in shaping phylogenetic pattern (Kraft et al., 2007; Crisp et al., 2009). Kraft et al. (2011) demonstrated that increasing phylogenetic trait conservation will amplify phylogenetic structure, which results in communities composed of more closely related sets of species. Crisp et al. (2009) examined phylogenetic distribution across major South American biomes and found a high degree of constraint on the ability of related groups to invade novel biomes. These results are concordant with our observations of the tropical communities studied here, in which species in each community tended to be phylogenetically clustered. A growing number of studies (e.g., Hardy et al., 2012; Ricklefs et al., 2012) have found evidence for global-scale processes regulating species diversity in the tropics. For example, in the neotropics the number of individuals and the number of species in certain families are strongly conserved across five replicated forest plots (Ricklefs et al., 2012). While the main objective of that particular study was an evaluation of the theory of ecological neutrality in community assembly (Hubbell, 2001), the results are concordant with high levels of phylogenetic trait conservatism and environmental filtering (Kraft et al., 2011). In some cases, field-based studies have shown mixed results in linking phylogenetic signal to trait dispersion in tropical forests (Liu et al., 2013). Therefore, even though the current results are consistent with a global pattern of environmental filtering and niche conservatism as a driving force in community assembly, more work needs to be done to clarify the role of phylogenetic trait conservatism in large-scale community processes.

#### **ACKNOWLEDGMENTS**

This study was made possible by the CTFS-ForestGEO network through the support of the Smithsonian Institution, the Smithsonian Tropical Research Institute, the Chinese National Science Foundation, the Ministry of Science and Technology, Taiwan, the Arnold Arboretum of Harvard University, and the Frank Levinson Family Foundation. The paper resulted from a CTFS-CForBio workshop in Changbaishan, China, supported by NSF grant DEB-1046113 to Stuart J. Davies, National Natural Science Foundation of China grant 31200471 to Nancai Pei, and National Natural Science Foundation of China grants 31011120470 and 312111072 to Zhanqing Hao. Support was also provided by the Ministry of Science and Technology, Taiwan, grants 101-2313-B-178-001-MY2 and 102-2313-B-178-002-MY3 to Chun-Lin Huang. Most importantly, we wish to note and thank the outstanding efforts of individuals in the field, who have made the voucher and tissue collections/identifications, as well as those in the lab, who have generated the molecular genetic data.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00358/abstract

#### **REFERENCES**


environmental filtering, biogeography and mesoclimatic niche conservatism. *Glob. Ecol. Biogeogr*. 21, 1007–1016. doi: 10.1111/j.1466-8238.2011.00742.x


diversity more than seasonality: insights into the ecology of high legume-succulent-plant biodiversity. *S. Afr. J. Bot*. 89, 42–57. doi: 10.1016/j.sajb.2013.06.010


evidence from New Guinea. *Ecography (Cop.)* 35, 821–830. doi: 10.1111/j.1600-0587.2011.07181.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 26 September 2014; published online: 05 November 2014.*

*Citation: Erickson DL, Jones FA, Swenson NG, Pei N, Bourg NA, Chen W, Davies SJ, Ge X-j, Hao Z, Howe RW, Huang C-L, Larson AJ, Lum SKY, Lutz JA, Ma K, Meegaskumbura M, Mi X, Parker JD, Fang-Sun I, Wright SJ, Wolf AT, Ye W, Xing D, Zimmerman JK and Kress WJ (2014) Comparative evolutionary diversity and phylogenetic structure across multiple forest dynamics plots: a mega-phylogeny approach. Front. Genet. 5:358. doi: 10.3389/fgene.2014.00358*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Erickson, Jones, Swenson, Pei, Bourg, Chen, Davies, Ge, Hao, Howe, Huang, Larson, Lum, Lutz, Ma, Meegaskumbura, Mi, Parker, Fang-Sun, Wright, Wolf, Ye, Xing, Zimmerman and Kress. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## To move or to evolve: contrasting patterns of intercontinental connectivity and climatic niche evolution in "Terebinthaceae" (Anacardiaceae and Burseraceae)

#### *Andrea Weeks<sup>1</sup>\*, Felipe Zapata<sup>2</sup>, Susan K. Pell<sup>3</sup>, Douglas C. Daly<sup>4</sup>, John D. Mitchell<sup>4</sup> and Paul V. A. Fine<sup>5</sup>*

*<sup>1</sup> Department of Biology and Ted R. Bradley Herbarium, George Mason University, Fairfax, VA, USA*

*<sup>2</sup> Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA*

*<sup>3</sup> United States Botanical Garden, Washington, DC, USA*

*<sup>4</sup> Institute of Systematic Botany, The New York Botanical Garden, Bronx, NY, USA*

*<sup>5</sup> Department of Integrative Biology and Jepson and University Herbaria, University of California, Berkeley, CA, USA*

#### *Edited by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Matthew T. Lavin, Montana State University, USA; Wolf L. Eiserhardt, Royal Botanic Gardens, Kew, UK*

#### *\*Correspondence:*

*Andrea Weeks, Department of Biology, George Mason University, 4400 University Drive, 3E1, Fairfax, VA 22030, USA e-mail: aweeks3@gmu.edu*

Many angiosperm families are distributed pantropically, yet for any given continent little is known about which lineages are ancient residents or recent arrivals. Here we use a comprehensive sampling of the pantropical sister pair Anacardiaceae and Burseraceae to assess the relative importance of continental vicariance, long-distance dispersal and niche-conservatism in generating its distinctive pattern of diversity over time. Each family has approximately the same number of species and identical stem age, yet Anacardiaceae display a broader range of fruit morphologies and dispersal strategies and include species that can withstand freezing temperatures, whereas Burseraceae do not. We found that nuclear and chloroplast data yielded a highly supported phylogenetic reconstruction that supports current taxonomic concepts and time-calibrated biogeographic reconstructions that are broadly congruent with the fossil record. We conclude that the most recent common ancestor of these families was widespread and likely distributed in the Northern Hemisphere during the Cretaceous and that vicariance between Eastern and Western Hemispheres coincided with the initial divergence of the families. The tempo of diversification of the families is strikingly different. Anacardiaceae steadily accumulated lineages starting in the Late Cretaceous–Paleocene while the majority of Burseraceae diversification occurred in the Miocene. Multiple dispersal- and vicariance-based intercontinental colonization events are inferred for both families throughout the past 100 million years. However, Anacardiaceae have shifted climatic niches frequently during this time, while Burseraceae have experienced very few shifts between dry and wet climates and only in the tropics. Thus, we conclude that both Anacardiaceae and Burseraceae move easily but that Anacardiaceae have adapted more often, either due to more varied selective pressures or greater intrinsic lability.

**Keywords: biogeography, biome shifts, continental vicariance, diversification, long-distance dispersal, phylogenetic niche conservatism**

#### **INTRODUCTION**

A richer understanding of factors underlying the origin of the diverse tropical flora depends on aggregating global biogeographic, taxonomic and phylogenetic information from multiple plant clades. Although much progress has been made reconstructing the tempo and genealogical patterns of angiosperm evolution in the past two decades, most of the major plant lineages lack the comprehensive global sampling and taxonomic studies necessary to address basic questions regarding the timing of radiations, geographic origins, and dispersal histories. Many angiosperm families are common to tropical forests around the world, yet very little is known about which lineages are ancient and have experienced multiple vicariance events, and which lineages have experienced more recent long-distance dispersal. Moreover, the timing and directionality of such dispersal events is often unknown, although recent studies have found that many tropical radiations appear to have coincided with climatic and geographic events, such as Oligocene–Miocene cooling and drying in both the Americas and Africa, the connection of North and South America, and the Andean uplift (Bouchenak-Khelladi et al., 2010; Hoorn et al., 2010; De-Nova et al., 2012; Hughes et al., 2013).

Recent studies have highlighted the importance of phylogenetic niche conservatism in understanding large-scale patterns of plant biogeography and evolution (Wiens and Donoghue, 2004; Donoghue, 2008). The hypothesis is that because adaptation to new climatic zones requires a complex array of morphological and physiological innovations, as long as dispersal is possible, *in situ* adaptation by the resident flora during periods of global cooling or drying may often be less common than immigration of other lineages that are pre-adapted to freezing or dry climates. For example, many Northern American temperate plant lineages have colonized the high-elevation South American Andes since overland connections were established between the two continents (Bell and Donoghue, 2005; Hughes and Eastwood, 2006). However, South American lineages have also colonized these freezing habitats, even from warm tropical lowlands (Donoghue, 2008), demonstrating that adaptation to new biomes, as well as immigration from other areas, can result in diversification within regions (Donoghue, 2008; Donoghue and Edwards, 2014).

The angiosperm lineages Anacardiaceae and Burseraceae (Sapindales) represent an excellent study system for investigating the biogeographic history of tropical diversification and the relative importance of movement and climatic adaptation in angiosperm evolution. Once recognized as the single taxon "Terebinthaceae" (Marchand, 1869) and subsequently shown to be sister taxa (Fernando et al., 1995; Gadek et al., 1996; Bremer et al., 1999; Savolainen et al., 2000a,b; Pell, 2004), these families collectively comprise ca. 1500 lianas, shrubs and trees distributed on every continent except Antarctica and are major elements of the structure and diversity of temperate, seasonally dry tropical forest and tropical wet forest floras (Gentry, 1988; Pennington et al., 2010) (**Figure 1**). Anacardiaceae species display a remarkable range of fruit morphologies and seed dispersal syndromes not present in Burseraceae (see Daly et al., 2011; Pell et al., 2011). This disparity is primarily responsible for the recognition of more genera in Anacardiaceae (82) as compared to Burseraceae (19), although each family has approximately the same number of species. However, the geographic range as well as the morphological and ecological diversity of Anacardiaceae considerably eclipses that of Burseraceae, which makes the Terebinthaceae lineage a valuable comparative model system for testing the relative contributions to diversification of climate adaptation and intercontinental movement.

The differences between the families raise the question of how sister lineages having identical ages and nearly equivalent numbers of species could have taken such different evolutionary trajectories whilst becoming so widespread. Anacardiaceae comprise a cosmopolitan group and are found across a greater range of latitudes and elevations, and in a broader range of habitats, than Burseraceae. All Burseraceae, by contrast, are intolerant of frost and are thus limited to lower elevation zones in the African, American, Asian and Pacific (sub-)tropics.

Testing which events may have been correlated with cladogenesis in Terebinthaceae is contingent on reconstructing a densely sampled, time-calibrated phylogeny for the group. Divergence time estimates are a critical component of testing historical biogeographic hypotheses for Terebinthaceae because routes of overland range expansion available to Anacardiaceae and Burseraceae may have been different as global climate fluctuated over geological time. For example, ancestral lineages of frost tolerant Anacardiaceae may have been able to disperse through regions located above the frost line during the Miocene (e.g., *Rhus*, Yi et al., 2004) whereas the entirely frost-intolerant Burseraceae may have been excluded from them. Historically, angiosperm biogeographers have hypothesized that cosmopolitan and pantropical groups experienced vicariance as a consequence of the break-up of Gondwana (Raven and Axelrod, 1974). Both Anacardiaceae and Burseraceae had been considered Gondwanan families in the past (Raven and Axelrod, 1974; Gentry, 1982). More recently, divergence time estimates for a number of angiosperm lineages having predominantly frost-free distributions, including Burseraceae, have indicated that the northern hemisphere land corridors have had a more influential role in establishing species' ranges than previously hypothesized (Chanderbali et al., 2001; Renner et al., 2001; Davis et al., 2004; Richardson et al., 2004; Weeks et al., 2005; Zerega et al., 2005). In the case of Burseraceae, Weeks et al. (2005) implicated a North American origin of the family followed by Paleocene migration of lineages eastward over the warm North Atlantic land bridge and along the Tethys Seaway to Southeast Asia, as well as early trans-oceanic dispersals to Africa and South America. However, this study did not optimize the ancestral distributions using a geographically diverse group of Anacardiaceae outgroup taxa nor did it incorporate known Anacardiaceae fossils as calibration points.

**FIGURE 1 | (A)** Global distribution of Anacardiaceae and Burseraceae (red area is Anacardiaceae only, blue is Burseraceae only, and gray is where the two families' distributions overlap), and fossils used to calibrate the phylogenies in this study (1, Oligocene/Late Eocene *Cotinus* leaf; 2, Middle Miocene *Loxopterygium* fruit; 3, Middle Eocene *Anacardium* fruit; 4, Middle Eocene *Bursericarpum aldwickense* and *Protocommiphora europea* fruits; 5, Middle Eocene *Bursera* subgenus *Elaphrium* leaf). **(B)** Fruit diversity in Anacardiaceae and Burseraceae (1, *Triomma* pseudocapsule; 2, *Protium, Bursera* and *Boswellia* nuculaniums; 3, *Garuga* drupe; 4, *Solenocarpus* drupe; 5, *Spondias*, internally operculate drupe; 6, *Dracontomelon*, externally operculate drupe; 7, *Loxostylis*, drupe subtended by wing-like calyx; 8, *Anacardium*, drupe subtended by fleshy hypocarp; 9, *Campylopetalum*, drupe subtended by an accrescent bract; 10, *Swintonia*, drupe subtended by wing-like corolla; 11, *Schinopsis*, samara) (illustrations 1–7 and 10–11 from Engler, 1883; 8 from Faguet, 1874/1875; 9 copyright Bobbi Angell).

Contemporary studies have not clarified the relative contribution of vicariance and dispersal in generating the current distribution of both Anacardiaceae and Burseraceae and whether shifts in climatic niches may have promoted their diversification. Here, we address three sets of questions regarding the evolution of Terebinthaceae:


#### **MATERIALS AND METHODS**

#### **TAXON SAMPLING**

Ingroup species from Anacardiaceae and Burseraceae were chosen on the basis of assembling the most complete biogeographic coverage possible from all major lineages. Species were selected with reference to preexisting phylogenies (Pell, 2004; Weeks et al., 2005; Fine et al., 2014) as well as to recent taxonomic literature that recognizes the new segregate genus *Poupartiopsis* (Mitchell et al., 2006) and the newly circumscribed genera *Searsia* and *Protorhus* (Moffett, 2007; Pell et al., 2008; **Table 1**). From Anacardiaceae we obtained 67 of 82 genera (169 species) and sampling was spread across the higher ranks of recently published classifications: (1) the five tribes sensu Mitchell and Mori (1987) as updated by Pell (2004), including Anacardieae (7/8 genera sampled/genera total), Dobineae (2/2), Rhoeae (39/47), Semecarpeae (3/5), Spondiadeae (16/20); and (2) Pell's and Mitchell's (Mitchell et al., 2006) modifications to Takhtajan's (1987) subfamilial system, including Spondioideae (16/20) and Anacardioideae (50/60). From Burseraceae, we obtained 16 of 19 genera (136 spp.) from the five taxonomic alliances sensu Daly et al. (2011): the *Beiselia* alliance (1/1 genus), the *Protium* alliance (3/3 genera), the *Boswellia* alliance (2/2 genera), the *Bursera* alliance (3/3 genera) and the *Canarium* alliance (7/10 genera). Burseraceae includes several disjunct genera (*Canarium*, *Commiphora*, *Dacryodes*, *Protium*) whose species are distributed in African, American, Asian, and Pacific regions, and care was taken to sample representatives from each. Outgroup taxa from Sapindales were chosen with reference to Gadek et al. (1996) and comprise species of Meliaceae (1 sp.; Muellner et al., 2006), Rutaceae (5 spp.; Groppo et al., 2008), and Sapindaceae incl. Aceraceae (15 spp.; Harrington et al., 2005).
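The sampled/total genus counts reported above can be tallied as a quick consistency check. The sketch below simply re-enters the per-tribe and per-alliance figures from the text; the variable and function names are illustrative and are not part of the study's workflow.

```python
# (sampled, total) genera per group, copied from the sampling description.
anacardiaceae_tribes = {
    "Anacardieae": (7, 8),
    "Dobineae": (2, 2),
    "Rhoeae": (39, 47),
    "Semecarpeae": (3, 5),
    "Spondiadeae": (16, 20),
}
burseraceae_alliances = {
    "Beiselia": (1, 1),
    "Protium": (3, 3),
    "Boswellia": (2, 2),
    "Bursera": (3, 3),
    "Canarium": (7, 10),
}


def coverage(groups):
    """Return (genera sampled, genera total, fraction sampled) across groups."""
    sampled = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return sampled, total, sampled / total


anacardiaceae = coverage(anacardiaceae_tribes)
burseraceae = coverage(burseraceae_alliances)

for family, (sampled, total, fraction) in [
    ("Anacardiaceae", anacardiaceae),
    ("Burseraceae", burseraceae),
]:
    print(f"{family}: {sampled}/{total} genera sampled ({fraction:.1%})")
```

The tribe-level and alliance-level tallies reproduce the family totals stated in the text: 67 of 82 genera for Anacardiaceae and 16 of 19 for Burseraceae.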

#### **MARKER SELECTION AND SEQUENCE ALIGNMENT**

Sequence data for assessing the individual phylogenies of Anacardiaceae and Burseraceae have been generated by current authors using multiple phylogenetic markers (Weeks, 2003; Pell, 2004; Fine et al., 2005, 2014; Weeks et al., 2005; Pell et al., 2008). The published datasets overlapped for three DNA sequence regions: the nuclear ribosomal external transcribed spacer (*ETS*), the chloroplast *trnL* intron and *trnL-F* intergenic spacer (*trnL-F* region), and the chloroplast *rps16* intron. All of these regions have proven alignable across the targeted taxa and useful for investigating phylogeny at the familial and generic levels. These three datasets were expanded with additional taxa for the current study using amplification and sequencing protocols as outlined in publications referenced above. Multiple sequence alignment for each locus was carried out in MAFFT v7.0 (Katoh and Standley, 2013) with the E-INS-i algorithm. To improve alignment quality, we ran GBlocks V0.91b (Castresana, 2000) with parameters −b3 = 4, −b4 = 10, −b5 = h to clean the alignments as this has been shown to improve subsequent phylogenetic analyses (Talavera and Castresana, 2007). Before phylogenetic inference, we evaluated whether the final concatenated matrix should be partitioned by marker or by any combination of markers, and which nucleotide substitution model should be employed for the final partition scheme. For this analysis, we used the Bayesian Information Criterion as implemented in PartitionFinder (Lanfear et al., 2012) using the greedy algorithm, and we unlinked branch length estimates for each of the substitution models in each partition. Results of this analysis showed that the matrix should be treated as a single partition evolving under the GTR+I+Gamma model of nucleotide substitution.
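To make the partition-selection criterion concrete, the following sketch scores two candidate schemes by the Bayesian Information Criterion, BIC = −2 ln L + k ln n, where k is the number of free parameters and n the number of alignment sites. The log-likelihoods, taxon count, and alignment length are hypothetical placeholders rather than values from this study, and the parameter counting (10 substitution parameters for GTR+I+Gamma plus 2N − 3 branch lengths per partition when branch lengths are unlinked) is an assumed simplification of what PartitionFinder does internally.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def free_params(n_partitions, n_taxa, model_params=10):
    """Free parameters with branch lengths unlinked across partitions.

    model_params=10 corresponds to GTR+I+Gamma (5 exchangeabilities,
    3 base frequencies, gamma shape, proportion of invariant sites).
    An unrooted binary tree of n_taxa tips has 2 * n_taxa - 3 branches.
    """
    return n_partitions * (model_params + 2 * n_taxa - 3)


# Hypothetical inputs, for illustration only.
N_TAXA = 100
N_SITES = 2000
schemes = {
    "single partition": {"n_partitions": 1, "lnL": -50000.0},
    "partitioned by marker": {"n_partitions": 3, "lnL": -49000.0},
}

scores = {
    name: bic(s["lnL"], free_params(s["n_partitions"], N_TAXA), N_SITES)
    for name, s in schemes.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print(f"preferred scheme: {best}")
```

With these made-up numbers the single-partition scheme is preferred even though the partitioned scheme has the higher likelihood, because the penalty for the extra parameters outweighs the likelihood gain. This is the kind of trade-off that can lead to a single partition being selected.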

#### **SOURCES OF FOSSIL CALIBRATION POINTS**

Both families have rich micro- and macro-fossil records with which to calibrate their phylogeny and test hypotheses of historical biogeographical evolution in Terebinthaceae. The three Anacardiaceae fossils chosen for calibration are classified within extant lineages: (1) an early Oligocene/Late Eocene *Cotinus* leaf fossil from the Florissant flora, Colorado, United States (34 Ma; MacGinitie, 1953); (2) a Middle Miocene *Loxopterygium* fruit fossil from the Ecuadorian Andes (10 Ma; Burnham and Carranco, 2004); and (3) a Middle Eocene *Anacardium* fruit fossil

#### **Table 1 | Accession information for sampled taxa.**

#### **ANACARDIACEAE**

*Abrahamia littoralis* ined., Pell 609 (NY), Madagascar, AY594403, KP055360, AY594434; *Allospondias lakonensis* (Pierre) Stapf, Pell 1035 (NY), Vietnam, KP055186, KP055361, KP055483; *Amphipterygium adstringens* (Schltdl.) Schiede ex Standl., Pendry 845 (E), Mexico, KP055187, AY594583, AY594496; *Anacardium excelsum* (Bertero and Balb. ex Kunth) Skeels, Daly 13970 (NY), Colombia, KP055188, KP055362, KP055484; *Anacardium occidentale* L., Mori 24142 (NY), French Guiana, KP055189, KP055363, AY594497; *Anacardium parvifolium* Ducke, Reserva Ducke (INPA), Brazil, KP055190, KP055364, KP055485; *Anacardium spruceanum* Benth. ex Engl., Esteril INPA2527 (INPA), Brazil, KP055191, KP055365, KP055486; *Antrocaryon amazonicum* (Ducke) B.L.Burtt and A.W.Hill, Mitchell 663 (NY), Brazil, AY594410, AY594584, AY594441; *Apterokarpos gardneri* (Engl.) C.T.Rizzini, Pirani 2586 (NY), Brazil, KP055192, AY594585, AY594498; *Astronium fraxinifolium* Schott, Pendry 505 (E), Bolivia, KP055193, AY594586, AY594542; *Astronium lecointei* Ducke, Reserva Ducke (INPA), Brazil, KP055194, KP055366, KP055487; *Baronia taratana* Baker, Pell 625 (NY), Madagascar, KP055195, AY594627, AY594568; *Blepharocarya depauperata* Specht, Craven et al. 6762 (MO), Australia, KP055196, KP055367, KP055488; *Blepharocarya involucrigera* F. Muell., R. Jensen 00826 (A), Australia, KP055197, KP055368, KP055489; *Bonetiella anomala* (I. M. Johnst.) Rzed., Johnston Wendt and Chiang 11488 (F), Mexico, KP055198, AY594587, AY594543; *Bouea macrophylla* Griff., Gentry and Frankie 66957 (NY), Peninsular Malaysia, KP055199, AY594589, AY594500; *Bouea oppositifolia* (Roxb.) Meisn., Ambri and Arifin W746 (A), Papua New Guinea, KP055200, –, KP055490; *Buchanania glabra* Wall. 
ex Engl., Pell 1062 (NY), Vietnam, KP055201, –, KP055491; *Buchanania reticulata* Hance, Pell 1057 (NY), Vietnam, KP055202, KP055369, KP055492; *Buchanania siamensis* Miq., Pell 1054 (NY), Vietnam, KP055203, KP055370, KP055493; *Campnosperma gummiferum* (Benth.) Marchand, Reserva Ducke (INPA), Brazil, KP055204, KP055371, KP055494; *Campnosperma micranteium* Marchand, Randrianaivo 691 (MO), Madagascar, KP055205, KP055372, KP055495; *Campnosperma schatzii* Randrian. and J.S. Mill., Randrianasolo et al. 602 (MO), Madagascar, KP055206, KP055373, –; *Campylopetalum siamense* Forman, Garrett 1398 (NY), Vietnam, KP055207, KP055374, KP055496; *Cardenasiodendron brachypterum* (Loes.) F.A.Barkley, Pendry 691 (E), Bolivia, KP055208, KP055375, AY594503; *Choerospondias axillaris* (Roxb.) B.L. Burtt and A.W. Hill, S. K. Pell 1108 (NY), Vietnam, KP055209, KP055376, KP055497; *Comocladia dodonaea* (L.) Urban, Specht 10 (NY), Puerto Rico, KP055210, AY594592, KP055498; *Comocladia engleriana* Loes., Garcia Castaneda 1472 (LL), Mexico, KP055211, KP055377, AY594506; *Comocladia mayana* Atha J.D. Mitch. and Pell, Atha 5604 (NY), Belize, KP055212, KP055378, KP055499; *Comocladia mollissima* Kunth, Gillis I0317 (TEX), Mexico, KP055213, KP055379, KP055500; *Cotinus coggygria* Scop., Bamps 8753 (LSU), France, KP055214, –, AY594545; *Cotinus obovata* Raf., Reichard 386 (MOR), USA, KP055215, AY594593, AY594546; *Cyrtocarpa edulis* (Brandegee) Standl., Elias 10714 (F), Mexico, KP055216, KP055380, AY594547; *Cyrtocarpa procera* Kunth, Torres 1240 (NY), Mexico, KP055217, AY594596, AY594548; *Dobinea vulgaris* Buch.-Ham., Delendick 76.1570 (NY), Nepal, KP055218, –, AY594512; *Dracontomelon dao* (Blanco) Merr. 
and Rolfe, Pell 807 (BKL), USA (cultivated in Hawaii), KP055219, KP055381, KP055501; *Dracontomelon duperreanum* Pierre, Pell 1034 (NY), Vietnam, KP055220, KP055382, KP055502; *Dracontomelon vitiense* Engl., Regaldo and Vodonaivalu 905 (F), Fiji, KP055221, KP055383, AY594550; *Drimycarpus racemosus* (Roxb.) Hook. f. 1, Grierson/Long 4261 (A), Bhutan, KP055222, –, KP055503; *Drimycarpus racemosus* (Roxb.) Hook. f. 2, Pell 1118 (NY), Vietnam, KP055223, KP055384, KP055504; *Euroschinus aoupiniensis* Hoff., Pell 1134 (BKL), New Caledonia, KP055224, KP055385, KP055505; *Euroschinus elegans* Engl., J. Munzinger 6642 (BKL), New Caledonia, KP055225, KP055386, KP055506; *Euroschinus falcata* Hook.f., Herscovitch s.n. (NY), Australia, KP055226, KP055387, KP055507; *Euroschinus jaffrei* M. Hoff, McPherson 18174 (MO), New Caledonia, KP055227, KP055388, KP055508; *Euroschinus papuana* Merr. and L.M.Perry, Takeuchi et al. 16409 (A), Papua New Guinea, KP055228, KP055389, KP055509; *Euroschinus verrucosus* Engl., Guillaumin et al. 12227 (NY), New Caledonia, KP055229, KP055390, KP055510; *Euroschinus vieillardii* Engl. var. *glabra*, Pell 1140 (NY), New Caledonia, KP055230, KP055391, KP055511; *Faguetia falcata* Marchand, Pell 600 (NY), Madagascar, KP055231, AY594598, KP055512; *Fegimanra africana* (Oliv.) Pierre, Reitsma and Reitsma 1257 (MO), Gabon, KP055232, AY594599, AY594515; *Fegimanra afzelii* Engl., G. Walters 647 (MO), Gabon, KP055233, KP055392, KP055513; *Gluta renghas* L., Pell 806 (BKL), Malaysia, KP055234, KP055393, KP055514; *Gluta tavoyana* Hook. f., Pell 1075 (NY), Vietnam, KP055235, KP055394, –; *Gluta tourtour* Marchand, Randrianasolo 770 (MO), Madagascar, KP055236, KP055395, KP055515; *Gluta wallichii* (Hook. f.) Ding Hou, Beaman 7065 (NY), Borneo, KP055237, AY594600, AY594516; *Haplorhus peruviana* Engl., O. Zöllner 4030 (L), Chile, KP055238, KP055396, KP055516; *Harpephyllum caffrum* Bernh. 
ex Krauss, Lau 1588 (NY), USA (cultivated in Hawaii), KP055239, AY594601, AY594518; *Heeria argentea* Meisn., Goldblatt s.n. (MO), South Africa, KP055240, AY594602, KP055517; *Lannea coromandelica* (Houtt.) Merr., Pell 1041 (NY), Vietnam, KP055241, KP055397, KP055518; *Lannea rivae* (Chiov.) Sacleux, Randrianasolo 662 (MO), Tanzania, KP055242, KP055398, AY594520; *Lannea schweinfurthii* Engl., Randrianasolo 661 (MO), Tanzania, KP055243, AY594605, AY594552; *Lannea welwitschii* (Hiern) Engl., Nemba and Thomas 532 (NY), Cameroon, KP055244, KP055399, AY594553; *Laurophyllus capensis* Thunb., Brand 207 (NY), South Africa, KP055245, KP055400, KP055519; *Lithrea molleoides* (Vell.) Engl., Pendry 711 (E), Bolivia, KP055246, KP055401, AY594554; *Loxopterygium grisebachii* Hieron., Pendry 678 (E), Bolivia, KP055247, KP055402, KP055520; *Loxopterygium sagotii* Hook.f., Polak 309 (E), Guyana, KP055248, AY594606, KP055521; *Loxostylis alata* Spreng. ex Rchb., Mitchell 652 (NY), South Africa, KP055249, AY594607, AY594522; *Mangifera foetida* Lour., Pell 1097 (NY), Vietnam, KP055250, KP055403, KP055522; *Mangifera minor* Blume., Pell 982 (NY), Papua New Guinea, KP055251, KP055404, –; *Mauria heterophylla* Kunth, Woytkowski 7788 (G), Peru, KP055252, KP055405, KP055523; *Mauria simplicifolia* Kunth, Leiva et al. 1552 (F), Peru, KP055253, KP055406, AY594556; *Mauria thaumatophylla* Loes., Nee and Wee 53816 (NY), Bolivia, KP055254, KP055407, KP055524; *Melanochyla angustifolia* Hook. f., AC Church 312 (A), Indonesia, KP055255, KP055408, KP055525; *Melanochyla bracteata* King, Niyomdham 1174 (A), Thailand, KP055256, KP055409, KP055526; *Melanochyla castaneifolia* Ding Hou, Ambriansyah and Arifin 903 (L), Indonesia, KP055257, KP055410, KP055527; *Metopium brownei* Urb., Brokaw 295 (NY), Belize, KP055258, AY594609, AY594557; *Metopium toxiferum* (L.) Krug and Urb., P. Fine s.n. (UC), Cuba, –, –, KP055528; *Micronychia bemangidiensis* Randrian. 
and Lowry, Birkinshaw 1622 (MO), Madagascar, KP055259, KP055411, KP055529; *Micronychia macrophylla* H. Perrier, Pell 643 (NY), Madagascar, AY594414, AY594610, AY594443; *Micronychia tsiramiramy* H. Perrier, Pell 634 (NY), Madagascar, –, AY594611, AY594524; *Myracrodruon balansae* (Engl.) Santin, Schinini 24043 (F), Paraguay, KP055260, KP055412, AY594559; *Myracrodruon urundeuva* Allem., Pendry 724 (E), Bolivia, KP055261, AY594613, AY594560; *Ochoterenaea colombiana* F.A. Barkley, Sánchez 2598 (F), Colombia, KP055262, KP055413, AY594561; *Operculicarya decaryi* H. Perrier, Randrianasolo 627 (MO), Madagascar, KP055263, AY594614, AY594525; *Operculicarya pachypus* Eggli, Pell 664 (NY), Madagascar, KP055264, KP055414, KP055530; *Orthopterygium huaucui* (A. Gray) Hemsl., Smith 5726 (NY), Peru, KP055265, AY594615, AY594526; *Ozoroa dispar* (C. Presl)

R. Fern. and A. Fern., R. Brand 33 (NY), South Africa, KP055266, KP055415, KP055531; *Ozoroa insignis* Delile, Randrianasolo 680 (MO), Tanzania, AY594415, KP055416, AY594444; *Ozoroa mucronata* (Bernh.) R. Fern. and A. Fern., R. Brand 1078 (NY), South Africa, KP055267, KP055417, KP055532; *Ozoroa obovata* (Oliv.) R. Fern. and A. Fern., Randrianasolo 707 (MO), Tanzania, AY594416, –, AY594445; *Ozoroa pulcherrima* (Schweinf.) R.Fern. and A. Fern., Luwiika et al. 305 (BRIT), Zambia, KP055268, KP055418, KP055533; *Pegia nitida* Colebr., Zhanhuo 92-254 (MO), China, KP055269, KP055419, AY594563; *Pegia sarmentosa* (Lecomte) Hand.-Mazz., Pell 1096 (NY), Vietnam, KP055270, KP055420, KP055534; *Pentaspadon annamense* (Evrard and Tardieu) P.H. Hô, S. K. Pell 1042 (NY), Vietnam, KP055271, KP055421, KP055535; *Pentaspadon poilanei* (Evrard and Tardieu) P.H. Hô, S. K. Pell 1036 (NY), Vietnam, KP055272, KP055422, KP055536; *Pistacia atlantica* Desf., Frantz s.n. (BRIT), USA (cultivated), KP055273, KP055423, KP055537; *Pistacia chinensis* Bunge, Heng 11622 (CAS), China, KP055274, KP055424, KP055538; *Pistacia mexicana* Kunth, Calzada 20869 (BRIT), Mexico, KP055275, KP055425, KP055539; *Pistacia vera* L., Pell 304 (LSU), USA (cultivated), KP055276, KP055426, KP055540; *Pistacia weinmannifolia* J. Poiss. ex Franch., Pell 1098 (NY), Vietnam, KP055277, KP055427, KP055541; *Pleiogynium hapalum* A. C. Sm., Smith 1940 (G), Fiji, KP055278, KP055428, KP055542; *Pleiogynium timoriense* (A. DC.) Leenh., PIF28193 (A), Queensland, KP055279, KP055429, KP055543; *Poupartia minor* Marchand, Pell 657 (NY), Madagascar, KP055280, KP055430, AY594530; *Poupartiopsis spondiocarpus* Capuron ex J.D. Mitch. 
and Daly, Randrianasolo 592 (MO), Madagascar, KP055281, KP055431, AY594446; *Protorhus grandidieri* Engl., Randrianasolo 1230 (MO), Madagascar, KP055282, KP055432, KP055544; *Protorhus longifolia* Engl., Brand 322 (NY), South Africa, KP055283, KP055433, KP055545; *Protorhus sericea* Engl., Randrianasolo 783 (MO), Madagascar, AY594406, KP055434, AY594437; *Protorhus viguieri* H. Perrier, Randrianasolo 776 (MO), Madagascar, KP055284, KP055435, AY594440; *Pseudosmodingium andrieuxii* Engl. 2, Tenorio 17041 (F), Mexico, KP055286, –, AY594565; *Pseudosmodingium andrieuxii* Engl. 1, Tenorio 17041 (F), Mexico, KP055285, KP055436, AY594566; *Pseudospondias microcarpa* Engl., Randrianasolo 809 (MO), Gabon, KP055287, KP055437, KP055546; *Rhus aromatica* Aiton, Mayfield 2881 (LSU), USA, AY594418, AY594621, KP055547; *Rhus chinensis* Mill. 1, S. K. Pell 1063 (NY), Vietnam, KP055288, KP055438, KP055548; *Rhus chinensis* Mill. 2, Altvatter and Hammond 7132 V95 (MOR), USA (cultivated from Japan), KP055289, AY594622, KP055549; *Rhus ciliolata* Turcz., G. Hall 0777 (NY), Mexico, KP055290, KP055439, KP055550; *Rhus copallina* L., Mitchell 666 (NY), USA, AY594419, AY594623, KP055551; *Rhus coriaria* L., E. Vitek 2000-301 (W), Portugal (naturalized), KP055291, KP055440, KP055552; *Rhus glabra* L. 'Laciniata', S.K. Pell 750 (BKL), USA (cultivated), KP055292, KP055441, KP055553; *Rhus lanceolata* (A. Gray) Britton, Campbell 39 (NY), USA (cultivated), KP055293, AY594625, AY594449; *Rhus michauxii* Sarg., living collection accession 080590 (ODU), USA, KP055294, KP055442, KP055554; *Rhus ovata* S. Watson, J. D. Mitchell 1503 (NY), USA, KP055295, KP055443, KP055555; *Rhus perrieri* (Courchet) H.Perrier, Randrianasolo 629 (MO), Madagascar, AY594421, AY594626, KP055556; *Rhus sandwichii* A. Gray, Pell 831 (NY), USA (Hawaii), KP055296, KP055444, KP055557; *Rhus thouarsii* (Engl.) H. 
Perrier, Pell 638 (NY), Madagascar, KP055297, AY594628, AY594452; *Rhus typhina* L., Mitchell 672 (NY), USA, KP055298, AY594629, AY594453; *Rhus virens* Lindh. ex A. Gray, Mitchell 667 (NY), USA, KP055299, AY594631, KP055558; *Schinopsis brasiliensis* Engl., Bridgewater 1012 (E), UK (cultivated), KP055300, AY594632, KP055559; *Schinopsis marginata* Engl., Nee and Wee 53889 (NY), Bolivia, KP055301, KP055445, –; *Schinus areira* L., Pendry 737 (E), Bolivia, KP055302, AY594633, AY594572; *Schinus fasciculata* (Griseb.) I.M. Johnst., Mendoza 2013 (NY), Bolivia, KP055303, KP055446, KP055560; *Schinus gracilipes* I.M. Johnst., Pell 1008 (BKL), USA (cultivated), KP055304, KP055447, KP055561; *Schinus myrtifolia* (Griseb.) Cabrera, Moraes 1809, Bolivia, KP055305, KP055448, KP055562; *Schinus terebinthifolia* Raddi, Prinzie 111 (MO), USA, KP055306, KP055449, KP055563; *Sclerocarya birrea* Hochst. subsp. *caffra* ( Sond. ) Kokwaro, SKP 695 (NY), USA (cultivated from South Africa), KP055307, AY594634, AY594574; *Searsia erosa* (Thunb.) Moffett, Stevenson 1395170 (NY), South Africa, AY594420, AY594624, AY594448; *Searsia lancea* (L.f.) F.A.Barkley, Pell 693 (BKL), USA (cultivated from South Africa), KP055308, KP055450, KP055564; *Searsia longipes* (Engl.) Moffett, A. Randrianasolo et al. 675 (MO), Tanzania, KP055309, KP055451, KP055565; *Searsia lucida* (L.) F.A.Barkley, Pell 691 (BKL), USA (cultivated from South Africa), KP055310, KP055452, KP055566; *Searsia pendulina* (Jacq.) Moffett, Pell 694 (BKL), USA (cultivated from South Africa), KP055311, KP055453, AY594450; *Searsia undulata* (Jacq.) T.S.Yi A.J.Mill. and J.Wen, Pell 692 (BKL), USA (cultivated from South Africa), AY594423, AY594630, AY594454; *Semecarpus anacardium* L. f., Codon and Codon 13 (NY), Nepal, KP055312, AY594635, AY594575; *Semecarpus forstenii* Blume, Regalado and Sirikolo 812 (F), Solomon Islands, KP055313, KP055454, AY594535; *Semecarpus magnificus* K. 
Schum., Tree OE4C0215 (MIN), Papua New Guinea, KP055314, KP055455, KP055567; *Semecarpus neocaledonicus* Engl., Pell 1128 (NY), New Caledonia, KP055315, KP055456, KP055568; *Semecarpus obscurus* Thwaites, Motley 2914 (NY), Mauritius, KP055316, KP055457, KP055569; *Semecarpus reticulatus* Lecomte, Pell 1084 (NY), Vietnam, KP055317, KP055458, KP055570; *Semecarpus schlechteri* Lauterb., Tree WP3B0619 (MIN), Papua New Guinea, KP055318, KP055459, KP055571; *Semecarpus tonkinensis* Lecomte, S. K. Pell 1094 (NY), Vietnam, KP055319, KP055460, KP055572; *Smodingium argutum* E. Mey., Winter 88 (MOR), South Africa (cultivated), KP055320, AY594636, AY594576; *Sorindeia juglandifolia* (A. Rich.) Planch. ex Oliv., G. Walters 875 (MO), Gabon, KP055321, KP055461, KP055573; *Spondias malayana* Kosterm., Pell 775 (BKL), USA (cultivated in Hawaii), KP055322, KP055462, KP055574; *Spondias mombin* L., Mitchell s.n. (NY), USA (cultivated), –, –, KP055575; *Spondias pinnata* (Linn. f.) Kurz, Pell 1060 (NY), Vietnam, KP055323, KP055463, KP055576; *Spondias tuberosa* Arruda, W. Thomas s.n. (NY), Brazil, KP055324, KP055464, KP055577; *Swintonia schwenckii* Teijsm. and Binn. ex Hook. f., Herscovitch s.n. (NY), Australia (cultivated), KP055325, KP055465, KP055578; *Tapirira bethanniana* J.D. Mitch., Mori 24337 (NY), French Guiana, KP055326, AY594638, AY594578; *Tapirira guianensis* Aubl. 2, Cornejo and Canga 8194 (NY), Ecuador, KP055328, KP055467, KP055580; *Tapirira guianensis* Aubl. 1, Daly 13984 (NY), Colombia, KP055327, KP055466, KP055579; *Tapirira obtusa* (Benth.) J.D. Mitch., Mori 24744 (NY), French Guiana, KP055329, AY594639, AY594579; *Thyrsodium spruceanum* Benth., Mori 24215 (NY), French Guiana, KP055330, AY594641, –; *Toxicodendron borneense* (Stapf) Gillis, Sidiyasa and Arifin 1481 (L), Indonesia, KP055331, –, –; *Toxicodendron griffithii* (Hook. f.) 
Kuntze, Koelz 30428 (L), India, KP055332, –, –; *Toxicodendron pubescens* Mill., Mitchell 1501 (NY), USA, KP055333, KP055468, KP055582; *Toxicodendron radicans* (L.) Kuntze, Pell 545 (LSU), USA, KP055334, AY594642, AY594540; *Toxicodendron rhetsoides* (Craib) Tardieu, Maxwell 90-101 (L), Thailand, KP055335, –, –; *Toxicodendron succedaneum* (L.) Kuntze, Pell 1092 (NY), Vietnam, KP055336, KP055469, KP055583; *Toxicodendron vernicifluum* (Stokes) F.A. Barkley, Mitchell 660 (NY), USA (cultivated from South Korea), KP055337, AY594643, AY594580; *Toxicodendron vernix* (L.) Kuntze, Mitchell 673 (NY), USA, KP055338, KP055470, AY594581; *Trichoscypha acuminata* Engl., Walters et al. 539 (MO), Gabon, AY594425, KP055471, AY594456; *Trichoscypha ulugurensis* Mildbr., Randrianasolo 726 (MO), Tanzania, AY594426, –, AY594457.


#### **BURSERACEAE**

*Ambilobea madagascariensis* (Capuron) Thulin, Beier and Razafim., Nusbaumer LN905 (MO), Madagascar, KF034990, KM516857, –; *Aucoumea klaineana* Pierre, Walters et al. 466 (MO), Gabon, FJ233911, KM516858, GU246086; *Beiselia mexicana* Forman, Pell s.n. (NY), Mexico, AY315111-2, AY314997, GU246085; *Boswellia frereana* Birdw., Thulin and Warfa 5599 (UPS), Somalia, AY315084-6, AY314998, KM516800; *Boswellia neglecta* S. Moore, Weeks 00-VIII-29-1 (TEX), Ethiopia, AY315087-9, AY314999, GU246087; *Boswellia sacra* Birdw. (syn = *B. carteri*), Weeks 01-X-08-3 (TEX), North East Africa, AY315090-2, AY315000, GU246088; *Bursera biflora* Standl., Weeks 99-VII-17-7 (TEX), Mexico, AY315039-41, AY315001, GU246089; *Bursera copallifera* (Sessé and Moc. ex DC.) Engl., Weeks 00-X-24-1 (TEX), Mexico, AY315042-4, AY315002, KM516801; *Bursera coyucensis* Bullock, Weeks 98-VII-15-3 (TEX), Mexico, KM516830, KM516859, –; *Bursera cuneata* (Schltdl.) Engl., Weeks 99-VII-17-1 (TEX), Mexico, AY315045-7, AY315003, GU246090; *Bursera discolor* Rzed., Weeks 98-VII-15-1 (TEX), Mexico, AY309305-7, AY309282, KM516802; *Bursera fagaroides* (H.B.K.) Engl., Weeks 01-X-08-1 (TEX), Mexico, AY309308-10, AY309283, KM516803; *Bursera hindsiana* Engl., Weeks 00-VI-14-1 (TEX), Mexico, AY315048-50, AY315004, GU246091; *Bursera infernidialis* F.Guevara-Fefer and Rzed., Weeks 99-X-11-1 (TEX), Mexico, KM516831, KM516860, KM516804; *Bursera lancifolia* (Schltdl.) Engl., Weeks 98-VII-14-5 (TEX), Mexico, AY309317-20, AY309286, GU246092; *Bursera longipes* Standl., Weeks 98-VII-14-6 (TEX), Mexico, AY309320-2, AY309287, –; *Bursera microphylla* A. Gray, Weeks 01-X-08-2 (TEX), USA, AY309326-8, AY309289, GU246093; *Bursera penicillata* Engl., Weeks 99-X-13-2 (TEX), Mexico, KM516832, KM516861, KM516805; *Bursera sarukhanii* Guevara and Rzed., Weeks 00-VIII-18-6 (TEX), Mexico, AY315051-3, AY315005, KM516806; *Bursera simaruba* (L.) Sarg., Goldman s.n. 
(BH), USA, AY309341-3, AY309293, GU246094; *Bursera spinescens* Urb. and Ekman, Weeks 01-VIII-23-1 (TEX), Dominican Republic, AY309356-8, AY309294, KM516807; *Bursera steyermarkii* Standl., Weeks 99-VI-13-5 (TEX), Guatemala, KM516833, KM516862, –; *Bursera tecomaca* (DC.) Standl., Weeks 02-IV-23-1 (TEX), Mexico, AY309359-61, AY309280, FJ466463; *Canarium album* (Lour.) Raeusch., HCAN 24/N98-18 at NGR, China, AY635362, AY635355, FJ466464; *Canarium balansae* Engl., Munzinger 2965 (NOU), New Caledonia, FJ466459, FJ466493, FJ466465; *Canarium bengalense* Roxb., HCAN 25/N98-19 at NGR, China, AY635363, AY635356, FJ466466; *Canarium decumanum* Gaertn., HCAN 6/N90-155 at NGR, Malaysia, AY635364, AY635357, FJ466467; *Canarium harveyi* Seem., HCAN 16/N92-30 at NGR, unknown, AY635365, AY635358, FJ466468; *Canarium indicum* L., Lai s.n. (BH), Malaysia, AY315113-5, AY315006, FJ466469; *Canarium madagascariense* Engl., Randrianaivo et al. 746 (MO), Madagascar, FJ466462, FJ466496, FJ466471; *Canarium madagascariense* subsp. *bullatum* Leenh., Daly 12952 (NY), Madagascar, KM516834, KM516863, KM516808; *Canarium muelleri* F.M.Bailey, Fine 1400 (NY), Australia, KM516836, KM516865, GU246095; *Canarium obtusifolium* Scott-Elliot, Daly 12953 (NY), Madagascar, KM516837, KM516866, KM516810; *Canarium oleiferum* Baill., Munzinger GD 1373 (NOU), New Caledonia, FJ466460, FJ466494, FJ466472; *Canarium ovatum* Engl., HCAN 7/N91-26 at NGR, Philippines, AY635366-8, AY635359, FJ466473; *Canarium pilosum* A.W. Benn., Bogler s.n. (TEX), Malaysia, AY315119-20, AY315008, FJ466474; *Canarium* sp. nov. 1, Daly 12967 (NY), Madagascar, KM516835, KM516864, KM516809; *Canarium* sp. nov. 2, Daly 12963 (NY), Madagascar, KM516838, KM516867, KM516811; *Canarium strictum* Roxb., HCAN 22/97-02 at NGR, China, AY635369, AY635360, FJ466475; *Canarium tramdenum* C.D. Dai and Yakovlev, HCAN 23/N97-04 at NGR, China, AY635370, AY635361, FJ466476; *Canarium vulgare* Leenh., Lai s.n.
(BH), Malaysia, AY315121-3, AY315009, FJ466477; *Canarium whitei* Guillaumin, Munzinger LB600 (NOU), New Caledonia, FJ466461, FJ466495, FJ466478; *Canarium zeylanicum* Blume, Lai s.n. (BH), Malaysia, AY315124-6, AY315010, FJ466479; *Commiphora angolensis* Engl., Raal and Raal 801 (TEX), South Africa, AY315054-6, AY315011, KM516812; *Commiphora aprevalii* Guillaumin, Phillipson 2563 (MO), Madagascar, AY831870, AY831942, –; *Commiphora capensis* Engl., Weeks 06-XII-23-1 (GMUF), Namibia, KM516839, KM516868, KM516813; *Commiphora edulis* (Klotzsch) Engl., Weeks 00-VI-14-3 (TEX), Zimbabwe, AY315057-9, AY315012, FJ466480; *Commiphora eminii* subsp. *zimmermannii* (Engl.) J.B. Gillett, Mwandoka and Shangai 595 (MO), Tanzania, AY315060-2, AY315013, KM516814; *Commiphora falcata* Capuron, Phillipson et al. 3744 (MO), Madagascar, AY831875, AY831947, GU246097; *Commiphora franciscana* Capuron, Labat 2082 (MO), Madagascar, AY315063-5, AY315014, KM516815; *Commiphora kua* (R.Br. ex Royle) K. Vollesen, Gilbert et al. 7629 (MO), Ethiopia, AY315066-8, AY315015, –; *Commiphora leptophloeos* (Mart.) J.B. Gillett, Abbott 16295 (TEX), Bolivia, AY315069-71, AY315016, KM516816; *Commiphora monstrosa* (H. Perrier) Capuron, Phillipson 2354 (MO), Madagascar, AY831884, AY831956, –; *Commiphora rostrata* Engl., Gilbert et al. 7472 (MO), Ethiopia, AY315072-4, AY315017, KM516817; *Commiphora saxicola* Engl., Weeks 06-XII-30-1 (GMUF), Namibia, KM516840, KM516869, KM516818; *Commiphora schimperi* Engl., Weeks 00-VIII-18-8 (TEX), South Africa, AY315075-7, AY315018, GU246098; *Commiphora ugogensis* Engl., Lovett 1626 (MO), Tanzania, AY315078-80, AY315019, KM516819; *Commiphora wightii* (Arn.) Bhandari, Weeks 00-VIII-18-3 (TEX), India, AY315081-3, AY315020, KM516820; *Commiphora wildii* Merxm., Weeks 06-XII-30-5 (GMUF), Namibia, KM516841, KM516870, KM516821; *Crepidospermum atlanticum* Daly, Stefano 204 (NY), Brazil, KJ503399, KJ503682, KJ503776; *Crepidospermum rhoifolium* (Benth.) 
Engl., Daly et al. 13817 (NY), Colombia, KJ503429, KJ503707, KJ503804; *Dacryodes buettneri* H. J. Lam, Carvalho 5748 (TEX), Equatorial Guinea, AY315139-40, AY315024, GU246100; *Dacryodes* cf. *peruviana* (Loes.) H.J.Lam, GV984 (NY), Ecuador, KM516848, KM516871, KM516822; *Dacryodes chimantensis* Steyerm. and Maguire, Fine s.n. (NY), Peru, KM516842, KM516872, KM516823; *Dacryodes cuspidata* (Cuatrec.) Daly, Fine 259 (NY), Peru, KM516843, KM516873, GU246101; *Dacryodes edulis* (G. Don) H.J. Lam, Wilks 2552 (NY), Gabon, KM516844, AY315025, GU246102; *Dacryodes excelsa* Vahl, Struwe and Specht 1085 (NY), Puerto Rico, KM516845, KM516874, AY594509; *Dacryodes hopkinsii* Daly, Fine 137 (NY), Peru, KM516846, KM516875, KM516824; *Dacryodes klaineana* (Pierre) H.J. Lam, Merello et al. 1615 (MO), Equatorial Guinea, AY315141-3, AY315026, KM516825; *Dacryodes nitens* Cuatrec., Fine 1376 (NY), French Guiana, KM516847, KM516876, KM516826; *Garuga floribunda* Decne., McPherson 19447 (NOU), Malaysia, KM516849, KM516877, GU246105; *Garuga pinnata* Roxb., Maxwell 89-515 (MO), Thailand, KM516850, KM516878, KM516827; *Protium aidanianum* Daly, GV 53 (QCNE), Ecuador, KJ503367, KJ503654, KJ503744; *Protium altsonii* Sandwith, Fine 1298 (UC), Guyana, KJ503346, KJ503635, KJ503720; *Protium amazonicum* (Cuatrec.) Daly, Sara Smith s.n. (UC), Peru, KJ503422, KJ503700, KJ503799; *Protium aracouchini* (Aubl.) Marchand, Fine 1385 (UC), French Guiana, KJ503359, KJ503647, KJ503736; *Protium attenuatum* Urb., Howard 1983 (NY), St. Lucia (Lesser Antilles), KJ503381, KJ503667, KJ503758; *Protium brasiliense* (Spreng.) Engl., MS 227 (NY), Brazil, KJ503404, KJ503687, KJ503781; *Protium calanense* Cuatrec., ND864 (UC), Peru, KJ503379, KJ503665, KJ503756; *Protium calendulinum* Daly, AmaLin tree 19-111-5 (UC), Peru, KJ503435, KJ503712, KJ503810; *Protium colombianum* Cuatrec., Daly et al.
13819 (NY), Colombia, KJ503430, KJ503708, KJ503805; *Protium confusum* Pittier, Perez 2126 (SCZ), Panama, KJ503342, KJ503632, KJ503716; *Protium copal* (Schltdl. and Cham.) Engl., Daly s.n., Belize, KJ503368, KJ503655, KJ503745; *Protium costaricense* (Rose) Engl., Perez 1984 (SCZ), Costa Rica, KJ503341, KJ503631, KJ503715; *Protium cranipyrenum* Cuatrec., Daly et al. 13831 (NY), Colombia, KJ503433, KJ503711, KJ503808; *Protium crassipetalum* Cuatrec., Fine 1304 (UC), Peru, KJ503356, KJ503643, KJ503730;


*Protium cubense* Urb., Fine 2016 (UC), Cuba, KJ503420, KJ503699, KJ503797; *Protium decandrum* (Aubl.) Marchand, Fine 1371 (UC), French Guiana, KJ503376, KJ503663, KJ503753; *Protium demerarense* Swart, Fine 1426 (UC), French Guiana, KJ503353, KJ503641, KJ503727; *Protium divaricatum* Engl. var. *divaricatum*, Fine 292 (UC), Peru, KJ503362, KJ503649, KJ503739; *Protium fragrans* Urb., Fine 2013 (NY), Cuba, KJ503418, KJ503697, KJ503795; *Protium gallosum* Daly, Fine 297 (UC), Peru, KJ503360, KJ503648, KJ503737; *Protium giganteum* Engl., Fine 1372 (UC), French Guiana, KJ503377, KJ503664, KJ503754; *Protium glabrescens* Swart, Fine 215 (UC), Peru, KJ503348, KJ503637, KJ503722; *Protium guianense* (Aubl.) Marchand, Fine 1369 (UC), French Guiana, KJ503374, KJ503661, KJ503751; *Protium heptaphyllum* (Aubl.) Marchand subsp. *heptaphyllum*, Stefano 223 (NY), Brazil, KJ503402, KJ503685, KJ503779; *Protium icicariba* (DC.) Marchand, Stefano 222 (NY), Brazil, KJ503401, KJ503684, KJ503778; *Protium javanicum* Burm.f., Chase 2089 (K), Indonesia, KJ503339, KJ503629, KJ503713; *Protium klugii* J.F.Macbr., Fine 955 (UC), Peru, KJ503366, KJ503653, KJ503743; *Protium laxiflorum* Engl., Fine 311 (UC), Peru, KJ503361, –, KJ503738; *Protium madagascariense* Engl., Daly et al. 13092 (NY), Madagascar, KJ503382, KJ503668, KJ503759; *Protium nervosum* Cuatrec., Daly et al. 13815 (NY), Colombia, KJ503428, KJ503706, –; *Protium nodulosum* Swart, Fine 956 (UC), Peru, KJ503363, KJ503650, KJ503740; *Protium opacum* Swart subsp. *opacum*, Fine 957 (UC), Peru, KJ503365, KJ503652, KJ503742; *Protium ovatum* Engl., Fonseca 169 (NY), Brazil, KJ503408, KJ503689, KJ503785; *Protium pallidum* Cuatrec., Fine 958 (UC), French Guiana, KJ503357, KJ503645, KJ503732; *Protium panamense* I.M.Johnst., Perez 1838 (SCZ), Panama, KJ503344, –, KJ503718; *Protium paniculatum* Engl. var. 
*paniculatum*, Fine 153 (UC), Peru, KJ503364, KJ503651, KJ503741; *Protium pecuniosum* Daly, Aguila 12937 (NY), Costa Rica, KJ503340, KJ503630, KJ503714; *Protium pilosum* (Cuatrec.) Daly, Fine 1452 (UC), French Guiana, KJ503434, –, KJ503809; *Protium pittieri* (Rose) Engl., Garcia 47 (LSCR), Costa Rica, KJ503398, KJ503681, KJ503775; *Protium plagiocarpium* Benoist, Fine 1363 (UC), French Guiana, KJ503370, KJ503657, KJ503747; *Protium polybotryum* (Turcz.) Engl., PACL Assunção 803 (INPA), Brazil, KJ503396, KJ503679, KJ503773; *Protium puncticulatum* J.F.Macbr., Daly et al. 13776 (NY), Brazil, KJ503388, –, KJ503765; *Protium rhyncophyllum* (Rusby) ined., Daly 12163 (NY), Brazil, KJ503383, KJ503669, KJ503760; *Protium sagotianum* Marchand, Fine 1451 (UC), French Guiana, KJ503351, KJ503639, KJ503725; *Protium serratum* (Wall. ex Colebr.) Engl., Daly et al. 13880 (NY), Vietnam, KJ503410, KJ503691, KJ503787; *Protium sessiliflorum* (Rose) Standl., Perez 1910 (SCZ), Panama, KJ503343, KJ503633, KJ503717; *Protium spruceanum* (Benth.) Engl., ND 1181 (UC), Peru, KJ503355, –, KJ503729; *Protium subacuminatum* Swart, Fine 2001 (UC), Cuba, KJ503416, KJ503695, KJ503793; *Protium unifoliolatum* Engl., ND 1202 (NY), Cuba, KJ503378, –, KJ503755; *Protium warmingianum* Marchand, Stefano 232 (NY), Brazil, KJ503405, –, KJ503782; *Santiria apiculata* A.W. Benn., Lai s.n. (BH), Malaysia, AY315127-9, AY315030, FJ466482; *Santiria griffithii* Engl., Lai s.n. (BH), Malaysia, AY315130-2, AY315031, FJ466483; *Santiria trimera* (Oliver) Aubrév., Bradley et al. 1026 (MO), Gabon, KM516851, KM516879, GU246109; *Scutinanthe brunea* Thw., Mohtar 53964 (MO), Sarawak, KM516852, KM516880, –; *Tetragastris balsamifera* (Sw.) Kuntze, Torrens s.n. (UC), Dominican Republic, KJ503409, KJ503690, KJ503786; *Tetragastris catuaba* Soares da Cunha, Piotto 3850 (NY), Brazil, KJ503413, KJ503694, KJ503790; *Tetragastris hostmannii* (Engl.) 
Kuntze, Cabral 63 (NY), Brazil, KJ503412, KJ503693, KJ503789; *Tetragastris varians* Little, Daly et al. 13822 (NY), Colombia, KJ503432, KJ503710, KJ503807; *Trattinnickia burserifolia* Mart., Daly et al. 9061 (NY), Brazil, KM516853, KM516881, KM516828; *Trattinnickia* cf. *lancifolia* (Cuatrec.) Daly, GV11958 (NY), Ecuador, KM516855, KM516882, KM516829; *Trattinnickia demerarae* Swart, SM 25262 (NY), French Guiana, KM516854, KM516883, GU246111; *Trattinnickia glaziovii* Swart, Gentry and Revilla 69141 (MO), Brazil, AY315136-8, FJ466498, FJ466485; *Triomma malaccensis* Hook.f., Gentry and Tagi 34056 (MO), Malaysia, KM516856, KM516884, GU246112.

#### **MELIACEAE**

*Trichilia elegans* A. Juss., Nee and Wee 53785 (NY), Bolivia, KP055339, KP055472, KP055584.

#### **RUTACEAE**

*Boronia denticulata* Sm., 8116105 (BKL), USA (cultivated in NY), KP055340, –, –; *Dictyoloma peruviana* Planch., L. Valenzuela et al. 3260 (BRIT), Peru, KP055341, KP055473, KP055585; *Poncirus trifoliata* (L.) Raf., SKP 697 (BKL), USA (cultivated in NY), KP055342, KP055474, KP055586; *Spathelia bahamensis* Vict., D. S. Correll 46048 (BRIT), Bahamas, KP055343, KP055475, KP055587; *Zanthoxylum* sp., Acevedo 11126 (US), French Guiana, KP055344, –, AY594541.

#### **SAPINDACEAE**

*Acer cissifolium* K. Koch, SKP 698 (BKL), USA (cultivated in NY), KP055345, KP055476, KP055588; *Acer griseum* (Franch.) Pax, SKP 700 (BKL), USA (cultivated in NY), KP055346, KP055477, KP055589; *Acer mandshuricum* Maxim., SKP 699 (BKL), USA (cultivated in NY), KP055347, –, –; *Acer pensylvanicum* L., SKP 696 (BKL), USA (cultivated in NY), KP055348, –, KP055590; *Cupania scrobiculata* Rich., Acevedo 11119 (US), French Guiana, KP055349, AY594595, AY594508; *Dilodendron bipinnatum* Radlk., Acevedo 11129 (US), Bolivia, KP055350, KP055478, AY594510; *Diplokeleba floribunda* N.E. Br., Acevedo 11130 (US), Bolivia, KP055351, –, AY594511; *Dodonaea viscosa* Jacq., Acevedo 11144 (US), Bolivia, KP055352, AY594597, AY594513; *Guioa koelreuteria* (Blanco) Merr., Takeuchi 7123 (NY), Papua New Guinea, KP055353, KP055479, AY594517; *Hypelate trifoliata* Sw., Acevedo 11425 (US), Puerto Rico, KP055354, AY594604, AY594519; *Plagioscyphus* sp., Pell 602 (NY), Madagascar, KP055355, –, AY594528; *Sapindus saponaria* L., Zanoni 15476 (NY), Dominican Republic, KP055356, KP055480, AY594534; *Serjania glabrata* Kunth, Acevedo 6553 (US), Bolivia, KP055357, KP055481, AY594536; *Serjania polyphylla* (L.) Radlk., Acevedo s.n. (US), Puerto Rico (cultivated), KP055358, KP055482, AY594537; *Thouinia portoricensis* Radlk., Acevedo 11435 (US), Puerto Rico, KP055359, –, AY594539.

*Each species name is followed by its herbarium voucher, country of origin and GenBank accession numbers for nrDNA ETS, cpDNA rps16 intron, and trnL-F intron-spacer region. –, DNA sequence missing.*

from Germany (47 Ma; Manchester et al., 2007). Ages of all Anacardiaceae fossils included herein are associated with sediments that have been dated radiometrically. Three Burseraceae fossils were selected. Two fossils are from the London Clay and are Early Eocene fruit casts of *Bursericarpum aldwickense* Chandler, which is assignable to extant Protieae on the basis of the number of pyrenes per fruit (Chandler, 1961; Harley and Daly, 1995), and *Protocommiphora europea* Reid and Chandler, which is similar to extant *Commiphora* (Reid and Chandler, 1933; Collinson, 1983). The age of these fossils is estimated as 48.6 Ma, the age of the lowest stratum of the Middle Eocene (Lutetian) and the uppermost bound of the Early Eocene (Ypresian). The remaining Burseraceae fossil is a leaf impression ascribed to *Bursera* subgenus *Elaphrium* from the Green River Flora of Colorado and Utah (MacGinitie, 1969; Plate 30, **Figure 2**), whose base has a radiometrically determined age of 49.7–50.7 Ma (Clyde et al., 1995). Dates for all geological periods and epochs follow those of the International Commission on Stratigraphy.

### **PHYLOGENETIC DATING AND DIVERSIFICATION ANALYSES**

The chronogram and divergence times were co-estimated using Markov Chain Monte Carlo (MC2) sampling in BEAST v1.8 (Drummond et al., 2012). A birth-death speciation process (Gernhard, 2008) was specified as a tree prior with a death rate parameter sampled from a U(0,1) prior distribution, and a growth rate parameter sampled from a U(0,inf) prior distribution. Rate heterogeneity among lineages was modeled using an uncorrelated lognormal relaxed molecular clock (Drummond et al., 2006) with a mean sampled from an Exp(10) prior distribution. We used a secondary calibration to set the prior on the age of the root using a N(85,8) prior distribution; this parameterization accounts for the uncertainty surrounding the age of the Sapindales (Muellner et al., 2006; Magallón and Castillo, 2009). We used the six Terebinthaceae fossils (see above) to set priors on six nodes: the most recent common ancestor (MRCA) of *Cotinus*, the MRCA of *Loxopterygium*, the MRCA of *Anacardium*, the MRCA of the Protieae, the MRCA of *Commiphora*, and the MRCA of *Bursera* subgenus *Elaphrium*. Because all of these fossils are fragmentary, it is not possible to be certain that any of those fossils possess features that would place them in the crown groups. Therefore, we took a conservative approach and used them as minimum calibrations of the stem groups (Forest, 2009). All these nodes were parameterized with Exponential distributions in which the offset matched the minimum bound set by the fossil age, and the mean was set to be 10% older than this value. Because random starting trees did not satisfy the temporal and topological constraints associated with some fossil calibrations, we used ExaML v1.0.12 (Stamatakis and Aberer, 2013) to estimate a maximum likelihood tree, transformed it into a chronogram using penalized likelihood (Sanderson, 2002; Paradis, 2013), and used it as starting topology in BEAST. 
The MC2 was run for 6 × 10<sup>7</sup> generations, sampling every 4 × 10<sup>3</sup> generations, with the first 20% of the samples discarded as burn-in. Convergence to stationarity of the MC2 sampling was assessed with time-series plots of the likelihood scores and cumulative split frequencies, and by confirming that estimated effective sample sizes for the chronograms and model parameters were at least 100. Post burn-in chronograms were summarized with a maximum clade credibility tree (MCCT) using median branch lengths.
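The fossil calibration priors and chain settings described above can be sketched numerically. The Python below is only an illustration and not the BEAST configuration itself. It assumes that "10% older" means the prior mean lies 10% above the fossil age, so the exponential component has a mean of 0.1 times the offset. All function names are hypothetical.

```python
import math


def offset_exponential(fossil_age_ma, mean_excess_fraction=0.10):
    """Return (offset, exponential mean, rate) for a minimum-age calibration.

    Assumption: the exponential component has mean
    mean_excess_fraction * fossil_age_ma, so the prior mean age is
    offset + mean, i.e. 10% older than the fossil by default.
    """
    offset = fossil_age_ma
    mean = mean_excess_fraction * fossil_age_ma
    return offset, mean, 1.0 / mean


def prior_quantile(offset, mean, q):
    """Age (Ma) below which a fraction q of the prior mass lies."""
    return offset - mean * math.log(1.0 - q)


def retained_samples(generations, sample_every, burnin_fraction):
    """Approximate number of posterior samples kept after burn-in."""
    total = generations // sample_every
    return total - round(total * burnin_fraction)


# London Clay fossils: minimum age 48.6 Ma (value taken from the text).
offset, mean, rate = offset_exponential(48.6)
print(f"offset = {offset} Ma, prior mean age = {offset + mean:.2f} Ma")
print(f"97.5% prior quantile = {prior_quantile(offset, mean, 0.975):.2f} Ma")

# Chain settings from the text: 6e7 generations, sampled every 4e3, 20% burn-in.
print(retained_samples(60_000_000, 4_000, 0.20))
```

Under these assumptions the prior places no mass on ages younger than the fossil, and most of its mass falls within a few million years of the minimum bound, which is what makes the calibration a conservative one.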

We carried out diversification analyses in two ways. First, we used BayesRate (Silvestro et al., 2011) to evaluate whether a single birth-death diversification process for the whole Terebinthaceae, or two birth-death diversification processes, one for Anacardiaceae and one for Burseraceae, better explains the accumulation of lineages through time. For this analysis, we used flat priors and clade-specific taxon sampling proportions (*P*<sub>Anacardiaceae</sub> = 0.21, *P*<sub>Burseraceae</sub> = 0.19), unlinked rates between clades, and ran the MC2 for 1 × 10<sup>5</sup> generations, sampling every 1 × 10<sup>2</sup> generations and discarding the first 10% as burn-in. For model selection, we used Bayes Factors based on marginal likelihoods calculated by thermodynamic integration. Second, we used BAMM (Rabosky, 2014) to automatically detect shifts in diversification process through time without defining tree partitions *a priori*. For this analysis, we used 1.0 for the Poisson rate prior, the lambda initial prior, and the extinction rate prior. We included a global taxon sampling proportion *P* = 0.20. We ran 1 × 10<sup>7</sup> generations of MC2, sampling every 1 × 10<sup>3</sup> generations and discarding the first 10% as burn-in, with two independent runs to assess convergence.
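The model comparison step can be illustrated with a short Python sketch. It assumes that the Bayes Factor is expressed on the 2 ln(BF) scale and interpreted with the Kass and Raftery (1995) categories. The text does not state which scale was used, so this is an assumption. The log marginal likelihood of the two-process model is the value reported in the Results; the value for the single-process model is hypothetical and used purely for illustration.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """2 * ln(BF) in favor of model A over model B, from log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery_label(two_ln_bf):
    """Verbal evidence category following Kass and Raftery (1995)."""
    if two_ln_bf < 0:
        return "favors the competing model"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


log_ml_two_processes = -1188.75   # reported in the Results
log_ml_one_process = -1194.75     # HYPOTHETICAL value, for illustration only

bf = two_ln_bayes_factor(log_ml_two_processes, log_ml_one_process)
print(bf, kass_raftery_label(bf))
```

Working on the log scale avoids numerical underflow, since marginal likelihoods of this magnitude cannot be represented directly as floating-point probabilities.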

#### **GEOGRAPHIC RANGE AND CLIMATIC NICHE EVOLUTION ANALYSES**

To study geographic range evolution through time, we employed maximum-likelihood inference of geographic range evolution using the dispersal, extinction, and cladogenesis (DEC) model (Ree et al., 2005; Ree and Smith, 2008) implemented in Lagrange version 0.1β, and estimated split and ancestral states concurrently. We described the geographic distributions of each Anacardiaceae and Burseraceae taxon following the biogeographic divisions of Good (1974) and Olson and Carlquist (2001) with some modifications. We assigned each species to one or more of the following seven areas: NA: North America (including Central America and the Caribbean); SA: South America; EA: Eurasia (including North Africa/Mediterranean/Arabian Peninsula); SSA: sub-Saharan Africa; MAD: Madagascar; SeA: Southeast Asia (including India); and OC: Oceania (including Papua New Guinea/Tropical Australia/New Caledonia, and Tropical Pacific Islands). We ran a single unconstrained DEC model assuming rates of dispersal/expansion and extinction were uniform across the areas in the model and across the phylogeny. We estimated D, the dispersal/expansion rate across the phylogeny (Ree and Smith, 2008), for each family by running two more DEC analyses, one for each family.
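To make the state space of the unconstrained model concrete, the following Python sketch enumerates every non-empty combination of the seven areas and encodes a species range as a presence/absence string. The helper names and the string encoding are illustrative assumptions and do not reproduce the Lagrange input format.

```python
from itertools import combinations

# Area codes as defined in the text.
AREAS = ("NA", "SA", "EA", "SSA", "MAD", "SeA", "OC")


def all_ranges(areas, max_size=None):
    """List every non-empty set of areas, optionally capped at max_size areas."""
    limit = len(areas) if max_size is None else max_size
    return [
        frozenset(combo)
        for size in range(1, limit + 1)
        for combo in combinations(areas, size)
    ]


def encode(species_range, areas=AREAS):
    """Presence/absence string for one range, in the order given by AREAS."""
    return "".join("1" if area in species_range else "0" for area in areas)


ranges = all_ranges(AREAS)
print(len(ranges))                 # 2**7 - 1 = 127 candidate ranges
print(encode({"NA", "SA"}))        # a species in North and South America
```

With no constraints on range size or adjacency, every one of these combinations is a candidate ancestral range, which is one reason reconstructions at deep nodes can be uncertain.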

Anacardiaceae and Burseraceae indicated to the right of the clades. Posterior probabilities for branches are shown on descendant nodes.

To study climatic niche evolution through time, we carried out a second DEC analysis. Although DEC was initially designed for modeling geographic range evolution, it provides a sound framework for modeling the evolution of other types of characters (Ree and Smith, 2008), in particular climatic niche. We found two significant benefits of DEC over alternative approaches for modeling the evolution of climatic niche. First, broad climatic niches encompassing two or more unique climatic niches are valid states for single species; many species in nature display broad climatic tolerances. Second, we reasoned that, analogous to geographic range, for an ancestor with a broad climatic niche, climatic adaptation and lineage divergence can result in daughter species inheriting mutually exclusive climatic niches, or in one daughter species inheriting one climatic niche while the other (the remainder of the ancestral lineage) inherits the ancestral climatic niche (for details and further discussion, see Ree et al., 2005; Ree and Smith, 2008). For this analysis we assigned each species to one or more of the following climatic categories: Temperate (frost-tolerant), Tropical Seasonal Dry Forest/Savannah/Scrubland, and Tropical Moist/Wet Forest. Taxa were assigned to these categories on the basis of the authors' knowledge of the taxa and published sources (Daly et al., 2011; Pell et al., 2011). We ran a single unconstrained DEC model assuming rates of dispersal/expansion and extinction were uniform across the climatic niches in the model and across the phylogeny. We estimated D, the dispersal/expansion rate across the phylogeny (Ree and Smith, 2008), for each family by running two more DEC analyses, one for each family.
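The two inheritance scenarios described for a broad-niched ancestor can be enumerated with a small Python sketch. It is a simplified reading of the DEC cladogenesis rules and rests on the assumption that one daughter always inherits a single climatic category while the other inherits either the remaining categories or the full ancestral niche. The category labels are shortened and the function name is hypothetical.

```python
# Shortened labels for the three climatic categories used in the text.
NICHES = ("Temperate", "TropicalDry", "TropicalWet")


def cladogenesis_scenarios(ancestor):
    """Enumerate (daughter_1, daughter_2) niche pairs for one ancestral niche.

    Assumed rules (a simplified reading of DEC):
      * a single-category ancestor passes its niche to both daughters;
      * for a broad ancestor, daughter_1 inherits one category and
        daughter_2 inherits either the remainder (mutually exclusive niches)
        or the entire ancestral niche.
    """
    ancestor = frozenset(ancestor)
    if len(ancestor) == 1:
        return [(ancestor, ancestor)]
    scenarios = []
    for category in sorted(ancestor):
        single = frozenset([category])
        scenarios.append((single, ancestor - single))  # mutually exclusive
        scenarios.append((single, ancestor))           # ancestral niche retained
    return scenarios


for d1, d2 in cladogenesis_scenarios({"TropicalDry", "TropicalWet"}):
    print(sorted(d1), sorted(d2))
```

Pairs are ordered, so a split into dry and wet daughters appears twice, once for each assignment of niches to the two descendant branches.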

### **RESULTS**

GenBank accession information for all taxa is listed in **Table 1**. Alignments, analyses and trees generated by the study are posted to TreeBASE, study number 16073 (www.treebase.org).

#### **PHYLOGENETIC RELATIONSHIPS**

Results support the monophyly of Anacardiaceae and Burseraceae individually and their relationship as sister clades (**Figure 2**). Within Anacardiaceae, phylogenetic results pose challenges to all published subfamilial classifications of the family (**Figure 3**). The most widely used classification of the family includes five tribes, Anacardieae, Dobineae, Rhoeae, Semecarpeae, and Spondiadeae (Engler, 1881, 1883, 1892; modified by Mitchell and Mori, 1987; and again by Pell, 2004), only two of which are monophyletic as circumscribed: Dobineae (*Campylopetalum* and *Dobinea*) and Semecarpeae (represented here by *Drimycarpus*, *Melanochyla* and *Semecarpus*, but also including *Holigarna* and *Nothopegia*). Pell and Mitchell (Mitchell et al., 2006) proposed the most recent classification, which includes two subfamilies, Anacardioideae and Spondioideae, both of which are shown here to be polyphyletic. Our results do provide some clarity for the position of three genera that have been difficult to place within the evolutionary context of the family: *Buchanania*, *Campnosperma*, and *Pentaspadon*. Although it is most often treated as a member of tribe/subfamily Anacardieae/Anacardioideae, *Buchanania* is here resolved as sister to a clade of taxa traditionally recognized within tribe/subfamily Spondiadeae/Spondioideae. *Campnosperma* and *Pentaspadon* remain challenging but are resolved as sister lineages to much larger clades: *Pentaspadon* is sister to the rest of Anacardiaceae, and *Campnosperma* is sister to a clade that includes all of subfamily Anacardioideae included in our sampling (excluding *Pentaspadon* from the Mitchell et al. (2006) circumscription; *Buchanania* + clade of *Dobinea* to *Rhus chinensis* Mill.) and most of Spondioideae (clade containing *Choerospondias* to *Operculicarya*).

Within Burseraceae, phylogenetic relationships show the monotypic Mexican species *Beiselia mexicana* sister to a clade containing three well-supported lineages that correspond to the *Protium* alliance or Protieae (sensu Thulin et al., 2008), the *Bursera* alliance or Bursereae (sensu Thulin et al., 2008), and the *Boswellia* + *Canarium* alliance, hereafter referred to as Garugeae (sensu Thulin et al., 2008) (**Figure 4**). Within Protieae, a clade of Southeast Asian and Madagascan species is sister to the species-rich American clade containing the remaining *Protium* species and the genera *Tetragastris* and *Crepidospermum* (see also Fine et al., 2014). Within Bursereae, the monotypic West African *Aucoumea klaineana* is sister to a well-supported clade containing American *Bursera* and predominantly African *Commiphora*. Our study finds marginal support for a paraphyletic *Bursera*, with the lineage corresponding to *Bursera* subg. *Bursera* sister to *Commiphora* and *Bursera* subg. *Elaphrium*. More robustly sampled phylogenies of Anacardiaceae and Burseraceae (Pell et al. in prep.; Weeks et al. in prep.) will expand on the brief taxonomic results presented in this publication.

#### **TIMING OF DIVERSIFICATION AND DIVERSIFICATION PATTERNS**

Early diversification events in Terebinthaceae, including stem and crown divergences of both families, occurred in the Early to Late Cretaceous (**Figure 2**, Supplementary Material Figures S1A,B). The divergence of Terebinthaceae from the other Sapindales lineages (mean age 116 Ma, 95% HPD 105–127 Ma) and its split into Anacardiaceae and Burseraceae (108 Ma, 95–121 Ma) spanned the Albian–Aptian of the Early Cretaceous. The crown radiations of Anacardiaceae (97 Ma, 83–128 Ma) and Burseraceae (91 Ma, 78–106 Ma) were centered on the Cenomanian of the Late Cretaceous. Lineage through time (LTT) plots show that even though the radiation of Anacardiaceae is slightly older than that of Burseraceae, accumulation of lineages through time has been roughly equivalent in both families (**Figure 5**). Nevertheless, there is evidence that two diversification processes, rather than one, have governed the evolution of these families. The highest marginal likelihood was assigned to the model with different Birth-Death diversification processes for each family (*LM* = −1188.75). A Bayes Factor analysis shows very strong evidence in favor of this model against a model with a single Birth-Death diversification process for the whole tree (*BF* = 12). However, the mean net diversification rate for both families is approximately the same (2.4 vs. 2.5). BAMM analysis found strong support for a model with one rate shift, with a posterior probability *p* = 0.74. Bayes Factors show strong evidence in favor of this model vs. a null model with no shifts (*BF* = 574.73). These results suggest a substantial increase in speciation rate in the ancestral lineage leading to the Protieae within Burseraceae (Supplementary Material Figure S2). The posterior probability of a rate shift occurring in the three deepest branches of the Protieae is *p* = 0.91. Bayes Factors indicate overwhelming evidence in favor of a rate shift in the branch leading to the Neotropical Protieae (*BF* = 2004.07).

#### **GEOGRAPHIC RANGE AND CLIMATIC NICHE EVOLUTION**

Lagrange analyses show there is uncertainty in the geographic ranges and climatic niches for several ancestors (Supplementary Material Figure S3, Tables S1, S2). We consider a reconstruction uncertain when multiple ancestral states fall within two log-likelihood units of one another. With this in mind, we restrict our description of results and discussion to the most likely reconstructions (i.e., those with the highest relative probability).
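The two-unit criterion and the notion of relative probability can be illustrated with a short Python sketch. The log-likelihood values below are hypothetical and are not taken from the supplementary tables. Treating the threshold as inclusive and computing relative probabilities by normalizing the likelihoods are also assumptions made for the example.

```python
import math


def relative_probabilities(log_likelihoods):
    """Normalize the likelihoods of alternative reconstructions to sum to 1."""
    best = max(log_likelihoods.values())
    # Subtract the best score before exponentiating to avoid underflow.
    weights = {state: math.exp(lnl - best) for state, lnl in log_likelihoods.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def is_uncertain(log_likelihoods, threshold=2.0):
    """True if the runner-up is within `threshold` log-likelihood units of the best."""
    ranked = sorted(log_likelihoods.values(), reverse=True)
    return len(ranked) > 1 and (ranked[0] - ranked[1]) <= threshold


def most_likely(log_likelihoods):
    """State with the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


# HYPOTHETICAL scores for three alternative ancestral ranges at one node.
node = {"SeA": -250.1, "SeA+SSA": -251.3, "SSA": -255.0}

print(most_likely(node), is_uncertain(node))
for state, prob in relative_probabilities(node).items():
    print(f"{state}: {prob:.3f}")
```

In this example the node is flagged as uncertain because the second-best range lies 1.2 units below the best, yet the best range is still the one that would be reported and discussed.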

Lagrange analysis indicates that the most recent common ancestor of Terebinthaceae had a widespread geographic range and occurred in wet and dry tropical climatic niches (**Figures 6**, **7**). Speciation within this broad geographic range led to a lineage restricted to tropical wet climates in Southeast Asia (for the most recent common ancestor of Anacardiaceae), and to a lineage inheriting the widespread ancestral geographic range and ancestral climatic niche (for the most recent common ancestor of Burseraceae). Within Anacardiaceae, subsequent speciation events during the Cretaceous occurred within Southeast Asia, with a geographic range expansion into the tropical wet forests of sub-Saharan Africa around the Cretaceous–Paleocene boundary.

geographic ranges are shown at nodes. For each species, the extant geographic range is represented by colored boxes.

geographic range evolution was *D* = 0*.*22. In contrast, few ancestors occurred across wet and dry tropical climates, and these quickly specialized to either tropical wet or tropical dry climates. The overall rate of dispersal/expansion for climatic niche evolution was *D* = 0*.*05. In total, there were only 11 cladogenic events that coincided with the transition from broad tropical climates to tropical dry or tropical wet climates. The only exception to this pattern is the Garugeae clade in which widespread geographic ranges among ancestors were maintained across multiple cladogenic events. Nevertheless, this clade remained highly specialized to tropical wet climates.

#### **DISCUSSION**

#### **ORIGINS OF TEREBINTHACEAE AND PATTERNS OF DIVERSIFICATION IN ANACARDIACEAE AND BURSERACEAE**

The Cretaceous age of the stem and crown lineages of Terebinthaceae (95–127 Ma) coincides with several episodes of continental vicariance that may have contributed to their early widespread distribution and spurred their diversification through allopatric speciation. Biogeographic reconstructions place the common ancestor of extant Anacardiaceae in Southeast Asia and extant Burseraceae in virtually all pantropical locations soon after their divergence. This reconstruction, combined with the Northern Hemisphere distribution of pre-Eocene Terebinthaceae fossils and the earliest diverging extant taxa, points most strongly to an origin of Terebinthaceae in Laurasia rather than Gondwana. In Burseraceae, the oldest fossils attributable to the clade derive from the Paleocene and Eocene of North America and Britain (Daly et al., 2011). Definitively Anacardiaceae fossils appear in the Eocene floras of North America and Europe (e.g., Taylor, 1990; Manchester, 1994; Ramírez and Cevallos-Ferriz, 2002; Meyer, 2003). An intriguing possibility is that the northward incursion of the Atlantic Ocean between North America and Eurasia/proto Southeast Asia during the Upper Cretaceous (Seton et al., 2012) contributed to these early diversification events by interrupting gene flow among the increasingly isolated regions of Laurasia.

After the divergence of the families, the Upper Cretaceous brought crown radiations and the establishment of at least four lineages in Burseraceae and six in Anacardiaceae (**Figures 2**–**4** and Supplementary Material). Timing of the older diversification events within Burseraceae is broadly comparable to that found by previous studies of the family. Similar studies in Anacardiaceae are lacking, with the biogeography of only some of the more recently diversified clades having been evaluated. Our study places the crown radiation of Burseraceae firmly in the Upper Cretaceous, notably older than the Paleocene age estimated by Weeks et al. (2005) and Fine et al. (2014). This discrepancy may be caused by the inclusion of the species sister to all other members of the family, *Beiselia mexicana*, a monotypic genus distributed in western Mexico. It has highly divergent DNA sequences and may inflate older ages as a consequence of its effect on the DNA alignment across Sapindalean taxa.

During the Cretaceous, Anacardiaceae accumulated more lineages than Burseraceae, either through increased speciation or decreased extinction (**Figures 2**–**5**). Diversifications of Burseraceae also span the K-T boundary. The stem and crown ages of Burseraceae's Protieae, Bursereae and Garugeae clades

During the Paleocene, geographic range expansion into sub-Saharan Africa led to allopatric divergence, with a daughter species inheriting the Southeast Asian geographic range and its sister species inheriting the sub-Saharan African range and further expanding into South America. The Southeast Asian ancestor later diverged into descendant species that expanded their geographic ranges to sub-Saharan Africa and colonizing South America. Although these geographic range expansions did not involve changes in the ancestral Tropical Wet climatic niche, the most recent common ancestor of *Buchanania* and *Operculicarya* expanded into tropical dry climates in Southeast Asia. During the Eocene, Anacardiaceae dispersed into North America, Oceania, and Madagascar, with some ancestors from tropical wet climatic niches expanding into tropical dry as well as temperate climatic niches. From the Miocene forward, there were multiple ancestral geographic range and climatic niche expansions, including the colonization of Eurasia and diversification in temperate climates. Overall, the predominant pattern in the geographic range and climatic niche evolution of Anacardiaceae is that ancestors with widespread geographic ranges and broad climatic tolerances were common and persisted through multiple speciation events, suggesting that dispersal and evolution of climatic tolerances are common in this family. The overall rate of dispersal/expansion for geographic range evolution was *D* = 0*.*15, and it was *D* = 0*.*14 for climatic niche evolution. Transitions between climatic niches were extremely common and included 74 instances of expansions or specializations to new niches (**Figure 6**).

Unlike Anacardiaceae, Burseraceae did not experience as many geographic range expansions and clearly fewer climatic niche expansions (**Figure 7**). In general, several ancestors with widespread geographic ranges spanned multiple speciation events. However, multiple ancestors diversified within distinct geographic regions. The overall rate of dispersal/expansion for

closely match the Paleocene ages estimated by studies that sample fewer Sapindalean outgroups (Weeks et al., 2005; De-Nova et al., 2012; Fine et al., 2014). Phylogenetic relationships resolved within these large clades reveal the ancient but relatively rapid establishment of pantropical distributions.

Our analyses of diversification through time indicate that the evolutionary history of Terebinthaceae has been shaped by a mixture of heterogeneous processes. We found strong evidence for an explosive burst in speciation associated with the origin of the Neotropical Protieae. Fine et al. (2014) also found support for a rate shift for this clade using a different modeling approach. Taken together, these results suggest that the acceleration in rates during this time interval likely reflects the occurrence of a key innovation or the colonization of a new geographic region and open ecological opportunities. It is noteworthy that similar geographic range expansions did not lead to rate shifts elsewhere in the history of Terebinthaceae [e.g., the clade *Bursera* (Burseraceae) or the clade Anacardioideae 2 (Anacardiaceae)]. Although sampling bias may influence the inference of diversification dynamics, the fast and recent radiation of the Neotropical Protieae (Fine et al., 2014) deeply altered the steady accumulation of lineages in the history of Terebinthaceae from the Cretaceous to the present. Increased sampling in other species-rich clades such as *Bursera* could inform whether more rate shifts have shaped the evolutionary history of Terebinthaceae.

#### **GEOGRAPHIC RANGE EVOLUTION: HAVE LINEAGES PERSISTED IN UNIQUE GEOGRAPHIC REGIONS OR HAVE THEY DISPERSED TO NEW GEOGRAPHIC REGIONS (I.E., HOW COMMON IS DISPERSAL)?**

Long-distance dispersal features prominently in the biogeographic history of Terebinthaceae. Although dispersal rates estimated with Lagrange may not be precise (Ree and Smith, 2008), our results suggest that dispersal rates for both Anacardiaceae and Burseraceae are relatively high and similar (*D*<sub>Anacardiaceae</sub> = 0.15, *D*<sub>Burseraceae</sub> = 0.22). While frequency of long-distance dispersal in plants is not necessarily correlated to dispersal syndrome (Higgins et al., 2003), seeds of the majority of Burseraceae and Anacardiaceae taxa are dispersed by animals (esp. birds, bats, terrestrial mammals; Daly et al., 2011; Pell et al., 2011). Some members of both families are wind-dispersed and a few Anacardiaceae species are water-dispersed. A closer examination of Terebinthaceae evolution reveals cases in which repeated short-distance dispersals or extreme long-distance dispersal must be invoked. In Burseraceae, within Garugeae, separation of Southeast Asian and African taxa occurs (*Boswellia*, *Garuga*; 52 Ma, 33–68 Ma) along with a separation of a New Caledonian endemic taxon, *Canarium oleiferum*, from the remaining pantropical Garugeae clade (46 Ma, 36–59 Ma). In Anacardiaceae, sister lineages from South America and sub-Saharan Africa diverge (*Anacardium, Fegimanra;* 48 Ma, 47–51 Ma), African and Oceanian taxa split (*Blepharocarya*, rest of Anacardioideae '2 Ma, 30–56 Ma), Madagascan and African taxa diverge (*Faguetia, Trichoscypha*; 41 Ma, 25–58 Ma), and South and North American lineages diverge (Anacardioideae 1, 2; 47 Ma, 39–57 Ma). The similar timing of these biogeographic expansions suggests relatively rapid dispersal among continents followed by radiation within continental regions but does not indicate shared routes. For instance, Fine et al. (2014) posit that the predominantly South American Protieae derived from an ancestor that dispersed across the Atlantic Ocean from North America or Africa, whereas Weeks and Simpson (2007) suggest the ancestor of Paleotropical *Commiphora* migrated from the Americas to the Old World across the North Atlantic boreotropical land-bridge.

Closer examination of the biogeographical reconstructions of these two families reveals repeated instances of lineage divergences associated with two different regions and/or continents. These splits occurred throughout the past 100 million years, including some very recent events. While some of these divergences may have been due to vicariance, we believe the great majority of them have been due to long-distance dispersal events. Recent reviews of the biogeographic history of tropical woody plant lineages have emphasized the importance of long-distance dispersal (Lavin et al., 2004; Pennington and Dick, 2004; Renner, 2004). Our results support this view, and we conclude that both Anacardiaceae and Burseraceae have moved easily across oceans.

#### **CLIMATIC NICHE EVOLUTION: HAVE LINEAGES RETAINED DISTINCT CLIMATIC NICHES OR HAVE THEY EVOLVED CLIMATIC TOLERANCES (I.E., HOW COMMON IS "NICHE EXPANSION")?**

Burseraceae are characterized by a low degree of climatic niche evolution. There are no frost-tolerant species, i.e., all Burseraceae are restricted to tropical and subtropical latitudes. Although Burseraceae are common and even dominant elements in seasonally-dry tropical forests and xeric scrublands as well as moist/wet rain forests across the tropics, switches between wet and dry climates are relatively rare in the family (**Figure 7**). For example, *Commiphora* and *Bursera*, which dominate some seasonally-dry regions of sub-Saharan Africa and Mesoamerica respectively, share a common ancestor that almost certainly was a dry forest specialist. Both dry-forest and wet-forest lineages are ancient in Burseraceae, and both climatic niches have included Burseraceae taxa since before the Paleocene.

Unlike Burseraceae, transitions among climatic niches are extremely common in Anacardiaceae. Our estimate of the overall rate of dispersal/expansion for Anacardiaceae is at least twice the rate in Burseraceae (*D*<sub>Anacardiaceae</sub> = 0.14, *D*<sub>Burseraceae</sub> = 0.05). In the Lagrange analysis (**Figure 6**), the most recent common ancestor of all Anacardiaceae and all deep nodes within the family are hypothesized to be wet forest taxa until the first expansion into dry climates during the Paleocene and then many more during the Eocene. Colonization of the temperate zone occurred early in Anacardiaceae evolution, likely during the Eocene. There have been at least 74 climate transitions (expansions and specializations) across all climatic niches examined here. Interestingly, although transitions between wet and dry climates are very common in Anacardiaceae, expansion into the temperate climate appears to have occurred in only one clade (although it may have been lost and re-evolved in the same clade several times). This clade includes a broad selection of genera primarily in the Americas with a scattering of taxa from Europe, Asia, and the Pacific. Clades within this lineage in which frost tolerance has evolved are *Cotinus*, *Pistacia*, *Rhus* s.s., *Schinus,* and *Toxicodendron*. Within this clade, there have been multiple transitions among temperate and wet and dry tropical climates.

It is clear that climatic niche evolution is not integral to explaining the high species diversity of Burseraceae, as close relatives almost always occur in the same climatic niche. However, for Anacardiaceae, it is tempting to make a connection between climatic niche evolution and diversification. For several clades, sister species (or sister clades) share the same geographic region but inhabit different climatic niches, suggesting that specialization to dry or wet (or temperate vs. tropical) could arise after a widespread taxon lived in more than one climate. Tradeoffs involved in drought or frost tolerance are likely involved in such specialization. For example, Pittermann et al. (2012) showed that adaptation to dry biomes by Cupressaceae trees involves cavitation-resistant xylem, which results in reduced photosynthetic rates and hence low growth rates, which presumably prevent dry-adapted lineages from competing successfully in wet biomes.

Wind dispersal occurs in both families but with greater frequency and complexity in Anacardiaceae. Burseraceae have five wind-dispersed genera (*Aucoumea*, *Ambilobea*, *Beiselia, Boswellia* and *Triomma*; ca. 24 species) in all of which wings are obtained via complanation of the pyrene, whereas Anacardiaceae have 23 genera, including ca. 75 species, in which a wide variety of mechanisms have evolved to facilitate wind dispersal. These include, for example, wings developed from petals, sepals, pericarp, subtending bracts, and whole inflorescences. Wind dispersal is more common in tropical dry forests than in tropical wet forests (Gentry, 1982, 1991), and thus perhaps affords an evolutionary advantage in this habitat. Fruit structure in Anacardiaceae may be more plastic than in Burseraceae, as suggested by the fruit diversity in the family and the multiple times wind dispersal has evolved through the modification of different structures. Anacardiaceae, after being dispersed to a dry habitat, may have evolved more advantageous wind-dispersed fruits quickly *in situ*, or they may have evolved wind-dispersed fruits in wet habitats and then dispersed to dry habitats. Physiological responses to environment may also play a role in the ability of Anacardiaceae to change habitats. In some wet-to-dry switching lineages, like *Astronium,* the moist forest species occur in areas that often have a briefly drier period during which the species may lose their leaves. Ancestors of these lineages may have also had periodic deciduousness, possibly pre-adapting them to more seasonal forests with longer, more extreme dry periods. Other Anacardiaceae lineages include examples of morphological plasticity with respect to leaf morphology. For example, in the genera *Loxopterygium* and *Astronium*, leaves of most wet-habitat taxa are mostly entire-margined, while leaves of most dry-habitat taxa have toothed margins.

In contrast to Anacardiaceae, it appears that the great majority of diversification occurs *within* climatic niches for Burseraceae. There are many mechanisms that could yield this pattern. First, if a lineage has frequent dispersal to the same climatic niche in different geographic regions, allopatric speciation could occur, and then re-dispersal back to the original region could inflate sympatric species totals. Second, habitat specialization to other habitats within climatic niches or niche partitioning along other niche axes within habitats can increase the numbers of species of a lineage within a climatic niche. For example, edaphic specialization to different soil types has been implicated in the diversification of the Protieae (Fine et al., 2005, 2014). Finally, escape from natural enemies through effective chemical defenses may promote species radiations within biomes (Ehrlich and Raven, 1964). Becerra (2007) showed that the terpene defenses of *Bursera* were more divergent than expected among co-occurring species within regions, and she suggested that coevolutionary interactions between Burseraceae and the beetles that feed on them promoted chemical divergence and speciation in this group. Other Burseraceae lineages such as the Protieae have also been shown to express a wide diversity of terpenes and other nonvolatile antiherbivore defenses (Fine et al., 2006, 2014; Zapata and Fine, 2013).

## **CONCLUSION**

We found that a densely sampled, comprehensive geographic sample of Anacardiaceae and Burseraceae taxa has yielded a highly supported phylogenetic reconstruction that supports current taxonomic concepts of both families. Moreover, our fossil-calibrated chronogram and biogeographic analyses give results that are broadly congruent with the fossil record. We conclude that the most recent common ancestor of these families was widespread and likely originated in the Northern Hemisphere during the Cretaceous. Continental vicariance between hemispheres may have spurred initial divergence into Burseraceae and Anacardiaceae, and indeed the two families have followed different evolutionary trajectories since their split, with Anacardiaceae steadily accumulating lineages since the late Cretaceous–Paleocene while the majority of Burseraceae's diversification has occurred much more recently, with Miocene radiations of the Protieae and Bursereae. Both families have relied on effective wind and animal dispersal to achieve pantropical distributions, with multiple intercontinental colonization events inferred for both families throughout the past 100 million years. Anacardiaceae have shifted climatic niches frequently during this time, including colonization of the temperate biomes, while Burseraceae have experienced very few shifts between tropical dry and tropical wet climates, with no temperate zone adaptation. Thus, in the context of the question of whether it is easier for these plant lineages to move or to evolve, we conclude that both Anacardiaceae and Burseraceae move easily, but Anacardiaceae have a much greater capacity to adapt to new climate regimes than Burseraceae, and this is one of the most striking features of their evolutionary history.

#### **ACKNOWLEDGMENTS**

Research presented was funded by: NSF DEB awards 0919179 to Weeks, 0919567 to Fine, 0918600 to Daly and Mitchell, and 0919485 to Pell; the Thomas and Kate Jeffress Memorial Foundation (Weeks), Conservation International (Pell), and the Beneficia Foundation (Pell). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or our other sponsors. Analyses were conducted with computational resources and services at the Center for Computation and Visualization at Brown University, supported in part by NSF EPSCoR EPS-1004057 and the State of Rhode Island. We would like to thank the following herbaria for allowing us to destructively sample specimens for this project: BKL, GMUF, H, JEPS, L, MO, NY, UC, US.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00409/abstract

#### **REFERENCES**


Collinson, M. E. (1983). *Fossil Plants of the London Clay.* Oxford: University Press.


und ausgestorbenen Anacardiaceae. *Bot. Jahrb. Syst. Pflanzengeschichte Pflanzengeographie* 1, 365–426.


Good, R. (1974). *The Geography of Flowering Plants*. London: Longman Group Ltd.


tropical trees. *Mol. Phylogenet. Evol*. 68, 432–442. doi: 10.1016/j.ympev.2013.04.024

Zerega, N. J. C., Clement, W. L., Datwyler, S. L., and Weiblen, G. D. (2005). Biogeography and divergence times in the mulberry family (Moraceae). *Mol. Phylogenet. Evol.* 37, 402–416. doi: 10.1016/j.ympev.2005.07.004

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 July 2014; accepted: 04 November 2014; published online: 28 November 2014.*

*Citation: Weeks A, Zapata F, Pell SK, Daly DC, Mitchell JD and Fine PVA (2014) To move or to evolve: contrasting patterns of intercontinental connectivity and climatic niche evolution in "Terebinthaceae" (Anacardiaceae and Burseraceae). Front. Genet. 5:409. doi: 10.3389/fgene.2014.00409*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Weeks, Zapata, Pell, Daly, Mitchell and Fine. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Corrigendum to: The establishment of Central American migratory corridors and the biogeographic origins of seasonally dry tropical forests in Mexico

## *Charles G. Willis<sup>1,2</sup>\*, Brian F. Franzone<sup>2</sup>, Zhenxiang Xi<sup>2</sup> and Charles C. Davis<sup>2</sup>*

*<sup>1</sup> Harvard University Center for the Environment, Cambridge, MA, USA*

*<sup>2</sup> Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA, USA*

*\*Correspondence: charleswillis@fas.harvard.edu*

*Edited and reviewed by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

**Keywords: adaptive lag time, diversification, land bridge, long-distance dispersal, pre-adaptation, tropical biogeography, South America, species pool**

#### **A corrigendum on**

#### **The establishment of Central American migratory corridors and the biogeographic origins of seasonally dry tropical forests in Mexico**

*by Willis, C. G., Franzone, B. F., Xi, Z., and Davis, C. C. (2014). Front. Genet. 5:433. doi: 10.3389/fgene.2014.00433*

In the original article, the title of the article in the Supplementary Material was wrong. The correct Supplementary Material appears below. This mistake does not change the scientific conclusions of the article in any way.

The authors apologize for this error.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00064/abstract

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 21 January 2015; accepted: 09 February 2015; published online: 24 February 2015.*

*Citation: Willis CG, Franzone BF, Xi Z and Davis CC (2015) Corrigendum to: The establishment of Central American migratory corridors and the biogeographic origins of seasonally dry tropical forests in Mexico. Front. Genet. 6:64. doi: 10.3389/fgene.2015.00064*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Willis, Franzone, Xi and Davis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Historical factors that have shaped the evolution of tropical reef fishes: a review of phylogenies, biogeography, and remaining questions

## *Peter F. Cowman\**

Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

#### *Edited by:*

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

Giacomo Bernardi, University of California, Santa Cruz, USA Lukas Rüber, Naturhistorisches Museum der Burgergemeinde Bern, Switzerland

#### *\*Correspondence:*

Peter F. Cowman, Department of Ecology and Evolutionary Biology, Yale University, 21 Sachem Street, New Haven, CT 06511, USA e-mail: peter.cowman@yale.edu

Biodiversity patterns across the marine tropics have intrigued evolutionary biologists and ecologists alike. Tropical coral reefs host 1/3 of all marine species of fish on 0.1% of the ocean's surface. Yet our understanding of how mechanistic processes have underpinned the generation of this diversity is limited. However, it has become clear that the biogeographic history of the marine tropics has played an important role in shaping the diversity of tropical reef fishes we see today. In the last decade, molecular phylogenies and age estimation techniques have provided a temporal framework in which the ancestral biogeographic origins of reef fish lineages have been inferred, but few have included fully sampled phylogenies or made inferences at a global scale. We are currently at a point where new sequencing technologies are accelerating the reconstruction and the resolution of the Fish Tree of Life. How will a complete phylogeny of fishes benefit the study of biodiversity in the tropics? Here, I review the literature concerning the evolutionary history of reef-associated fishes from a biogeographic perspective. I summarize the major biogeographic and climatic events over the last 65 million years that have regionalized the tropical marine belt and what effect they have had on the molecular record of fishes and global biodiversity patterns. By examining recent phylogenetic trees of major reef-associated groups, I identify gaps to be filled in order to obtain a clearer picture of the origins of coral reef fish assemblages. Finally, I discuss questions that remain to be answered and new approaches to uncover the mechanistic processes that underpin the evolution of biodiversity on coral reefs.

**Keywords: coral reef fishes, ancestral biogeography, marine tropics, phylogeny, diversification**

#### **INTRODUCTION**

A latitudinal gradient in species diversity is a common feature of many taxonomic groups, both terrestrial and marine (Willig et al., 2003; Hillebrand, 2004). However, a longitudinal gradient in species diversity is also apparent across the marine tropics. Fishes exemplify this diversity gradient (Hughes et al., 2002; Tittensor et al., 2010), driven largely by patterns of species richness associated with tropical coral reef habitats. Species richness of reef-associated fishes forms an enigmatic "bullseye" pattern centered on the Indo-Australian Archipelago (IAA; **Figure 1A**). This region has also been known by several other names (reviewed by Hoeksema, 2007), but its position at the center of this species richness gradient has given it status as the largest marine biodiversity hotspot, covering two thirds of the global equatorial tropics (Bellwood et al., 2012). Unlike terrestrial biodiversity hotspots (Myers, 1988; Myers et al., 2000), centers of endemism are not concordant with the center of highest species diversity, whether endemic species are defined by regional checklists (**Figure 1B**), or the extent of their geographic range (Hughes et al., 2002; but see Mora et al., 2003). Traditional hotspot analysis of the marine environment has identified endemic centers under high levels of threat (Roberts et al., 2008); however, these 10 defined areas of endemism exclude some areas that have a high diversity of overlapping, wide-ranging species. In addition to the distinctive biodiversity gradient, the tropics have been divided into a number of realms, regions, provinces and ecoregions based on shared environmental characteristics (Spalding et al., 2007), composition of endemic taxa (Briggs and Bowen, 2012), or measures of species dissimilarity (Kulbicki et al., 2013). Although these differing regional schemes are based on present-day patterns, it appears that the division of regional assemblages across the tropics is linked to its biogeographic history and the formation of several historical barriers to dispersal (Cowman and Bellwood, 2013a,b). While environmental clines in sea surface temperature are linked to latitudinal variation in diversity (Tittensor et al., 2010), the extensive tectonic, eustatic, climatic, oceanographic and geomorphological (TECOG; Bellwood et al., 2012) processes have played an important role in the origin and maintenance of the tropical biodiversity gradient spanning both deep and shallow time scales (Renema et al., 2008; Pellissier et al., 2014).

The mechanistic processes that underpinned this history and how they have generated such biodiversity have inspired much debate over the last 40 years (Potts, 1985; Briggs, 1999; Bellwood and Meyer, 2009a; Cowman and Bellwood, 2013a) with numerous hypotheses being proposed (Bellwood et al., 2012), but little consensus. The answers to key questions regarding where species have originated and the processes that have promoted speciation and extinction in tropical clades remain unclear. The popularity of phylogenetics, fossil-calibrated age estimation techniques, and the availability of geographic information have allowed biologists to examine the history of the taxa that form the marine biodiversity hotspot. In the case of coral reef fishes and the IAA biodiversity hotspot, what have we learned in the last decade? While new genomic sequencing methods are becoming available (Faircloth et al., 2013) and larger datasets are increasing the resolution of deeper nodes in the Fish Tree of Life (Near et al., 2012), what gaps remain in the evolutionary record of reef-associated fishes? How complete is our understanding of the evolution of biodiversity on tropical reefs and which questions will benefit from more sampling, more data, and new analytical approaches?

**reef fishes. (A)** Map of species biodiversity by tropical ecoregion (Spalding et al., 2007) with color gradient denoting areas of high species richness (dark red) to areas of low species richness (light red). **(B)** Map of endemic species by ecoregion. Under this scheme a species is endemic if it is only found in a single ecoregion, i.e., a regional assessment of endemism rather than designated by percent of area comparison (Hughes

species counts from the "checklist" × "all species" dataset of Kulbicki et al. (2013). **(C)** Biogeographic delineation of tropical Realms, Regions, and Provinces based on species dissimilarity analysis of Kulbicki et al. (2013). This biogeographic scheme is based on checklists as base units (see Kulbicki et al., 2013), however here the scheme is imposed onto the tropical ecoregions of Spalding et al. (2007).

In this review, I examine the phylogenetic and biogeographic completeness of key families found in reef habitats globally. By exploring the current state of the biogeographic history of tropical reef fishes I highlight where further analysis and discussion is needed, and what new questions require answers.

#### **EVOLUTION OF FISHES ON CORAL REEFS – FILLING IN THE GAPS**

Bellwood and Wainwright (2002) discussed the biogeographic history of fishes on coral reefs. They stated that from the integration of systematics, biogeography, ecology, and paleontology a new understanding of the nature of reef fishes would arise. Twelve years on, the integration of methods and multiple datasets has cast a wide net across the fields of reef fish ecology, evolution and biogeography to give vast insight into the important phases of evolution of coral reef fishes over the past 400 million years (Bellwood et al., in press). A major part of this insight has come from the combination of molecular phylogenetics and the fossil record to form a temporal framework in which to ask questions regarding the origin and tempo of reef fish diversification. Although only a handful of new fossils have been described with reef affinities in the last decade (Carnevale, 2006; Micklich et al., 2009; Bannikov, 2010), the fossil record of early reef fish forms continues to provide a wealth of information on the early origins of reef association in teleost fishes. New analytical techniques have revealed the origins and diversification of anatomical features (Friedman, 2010), important morphological transitions (Goatley et al., 2010), and the emergence of essential functional roles on coral reefs (Bellwood et al., 2014a,b). However, it is the utility of fossils as calibration points on molecular phylogenies that has allowed the evolution of reef-associated lineages to be studied on an absolute timescale. In particular, while the origins of several reef fish groups can be found in the fossil deposits of the Monte Bolca *Lagerstätten* (50 mya; Bellwood, 1996), the crown ages and the diversification of major lineages that lack a fossil record have only been examined with the aid of calibrated chronograms.

There has been debate over what characterizes a 'coral reef' fish (Bellwood, 1998; Robertson, 1998), but a general list of 'reef' fish families (**Table 1**) identifies those groups that are characteristic of a modern reef assemblage (both coral and rocky reefs), regardless of geographic location (Bellwood and Wainwright, 2002). Indeed, species counts of these families on coral reefs around the world are found in relatively similar proportions (Bellwood and Hughes, 2001). Although these nine fish families found on coral reefs are often used as model groups to address questions regarding diversification on coral reefs, there are at least 35 families of acanthomorph fishes that can be considered 'reef associated' (Price et al., 2014). Some of these families are monotypic (e.g., Zanclidae) while others can be entirely reef dwelling but not globally distributed (e.g., Siganidae). Interestingly, the most diverse fish family found on reefs, the Gobiidae, containing over 2000 species, has several lineages confined to coral reefs (Herler et al., 2011), yet remains off the list of traditional reef fish families. Its exclusion from this list may be related to its consistent undersampling in geographic surveys (Ackerman and Bellwood, 2000), made ever more difficult by the cryptic nature of gobies and their many undescribed species. Nonetheless, this non-traditional reef fish family may provide a good model to study speciation and biodiversity on coral reefs (Rüber et al., 2003; Taylor and Hellberg, 2005). While this review focuses on those nine families classically recognized as reef fish families, other lineages found on (and off) reefs might provide further insight into the evolution of tropical biodiversity. The utility of these families and lineages should be determined by several factors, most importantly, the level at which they have been sampled for phylogenetic reconstruction.
As the nine reef fish families are prominent on coral reefs around the globe, they have been examined with phylogenetic methods and increasing levels of genetic data over the past two decades.

Species richness and percentage of reef associated members are taken from http://www.fishbase.org. % F is the percent sampling of species in published family level study; % EToL is the percent sampling of species in the Euteleost Tree of Life (Betancur-R et al., 2013); % R is the percent sampling of species in the published phylogeny of Rabosky et al. (2013); % N is the percent sampling of species in the published phylogeny of Near et al. (2013); % GASPAR is the percent of family richness that is present in the checklists of the GASPAR database (Parravicini et al., 2013). Superscript denotes source of family level phylogeny: 1-Cowman and Bellwood (2011); 2-Choat et al. (2012); 3-Hundt et al. (2014); 4-Dornburg et al. (in press); 5-Frédérich et al. (2013); 6-Sorenson et al. (2013); 7-Reed et al. (2002). \*No phylogeny from a family level study was accessible for the Mullidae.

Early molecular phylogenetic studies within reef fish groups contained a small number of taxa in select genera (Lacson and Nelson, 1993; McMillan et al., 1999; Bernardi et al., 2001; Reed et al., 2002). Later, the combination of generic level phylogenies with relaxed clock methods (Sanderson, 2003) allowed the estimation of ages of divergence within several reef-associated lineages (Bellwood et al., 2004; Bernardi et al., 2004; Klanten et al., 2004; Barber and Bellwood, 2005; Read et al., 2006). With improved sequencing efforts at the family level (Westneat and Alfaro, 2005; Cooper et al., 2009; Thacker and Roje, 2009), molecular datasets have given insight into the crown origins of reef fish groups and the tempo at which they have diversified (Westneat and Alfaro, 2005; Alfaro et al., 2007; Fessler and Westneat, 2007; Cowman et al., 2009; Frédérich et al., 2013). Even though the characteristic nine families have been the focus of many phylogenetic studies (albeit some more than others), as of yet, not one of these families is represented by a fully sampled, species level phylogeny (**Table 1**). While the majority of major lineages and genera are sampled within these phylogenetic studies, the level at which species within these lineages are sampled varies dramatically (**Figure 2**). Those families that have more completely sampled phylogenies have achieved it through the combination of multiple sequence datasets and the use of supermatrix phylogenetic methods. The combination of datasets for the butterflyfish family Chaetodontidae (Fessler and Westneat, 2007; Bellwood et al., 2010) has resulted in a phylogeny that is over 70% complete (**Table 1**; Cowman and Bellwood, 2011). Similarly, the family Acanthuridae is nearly complete (76%) through the combination of previously published and new sequence data (Sorenson et al., 2013).
Other families have been the focus of several phylogenetic studies, incrementally increasing taxon sampling as more data or specimens become available, e.g., the wrasses, family Labridae (now inclusive of odacids and parrotfishes; Westneat and Alfaro, 2005; Alfaro et al., 2009; Cowman et al., 2009; Kazancioglu et al., 2009; Cowman and Bellwood, 2011); and the damselfishes, family Pomacentridae (Cooper et al., 2009; Cowman and Bellwood, 2011; Frédérich et al., 2013). Within the Labridae and Pomacentridae, shallower lineages have also been examined with increased sampling to explore a variety of evolutionary and ecological questions (Smith et al., 2008; Choat et al., 2012; Hodge et al., 2012; Litsios et al., 2012). Other families, such as the Blenniidae and the Apogonidae, have been plagued by taxonomic issues that are only beginning to be addressed with more taxa and multi-locus datasets (Thacker and Roje, 2009; Hundt et al., 2014). The incomplete phylogenetic sampling for reef fishes is exacerbated by the rate of new species descriptions and identification of cryptic species (Zapata and Robertson, 2006; Mora et al., 2008; Bowen et al., 2013). Recently, Allen (2014) reviewed the systematics of Indo-Pacific coral reef fishes over the past three decades to reveal that over 1,400 new species have been described, with an average of 51.3 new species descriptions per year since 2010.

The incomplete sampling observed in these reef fish families appears to have been a general symptom seen across all fishes when compared to other vertebrate branches of the Tree of Life (Thomson and Shaffer, 2010a). However, there have been three recent efforts in reconstructing the Fish Tree of Life (Betancur-R et al., 2013; Near et al., 2013; Rabosky et al., 2013) with new sequencing methods (Faircloth et al., 2013) providing an exciting avenue for future phylogenomic research in fishes. These 'top down' approaches to reconstructing the Fish Tree of Life have greatly improved the resolution of deep nodes and divergences in the major fish groups, including those with reef affinities. These datasets have included varying degrees of taxon sampling of reef associated lineages (**Table 1**), depending on the core aim of the study. The chronogram of Near et al. (2013) concentrated on sampling all families of acanthomorphs with as complete a molecular matrix as possible. While it does not have high species level sampling of reef fish lineages, it has allowed the exploration of rates of transition of fish lineages (at the family level) on and off of reefs over the past 100 million years (Price et al., 2014). The chronogram of Rabosky et al. (2013) closely matches the sampling effort of family level studies of the nine characteristic groups, achieved by mining the published sequence data available on GenBank. In the cases of the families Carangidae and Mullidae, this concatenated super-matrix approach included more species than any other published phylogeny for each family (**Table 1**). These large-scale phylogenetic studies employing supermatrix methods have also allowed the identification of the closest sister families to prominent reef fish families. However, disagreement among these large phylogenies still exists for some families. 
For example, the closest sister group to the Labridae changes from being the family Centrogenyidae (Betancur-R et al., 2013), to the family Ammodytidae (Rabosky et al., 2013), to the family Gerreidae (Near et al., 2013). This highlights the utility of supermatrix approaches, but caution is still needed in their implementation (Thomson and Shaffer, 2010b). In the case of fishes, more work remains to resolve some of these early diverging lineages at the top of the percomorph "bush" (Nelson, 1989), where the origins of several reef associated lineages are found. While these 'top down' approaches continue to reveal the early evolution of reef fishes, 'bottom up' studies concentrating on the origins of extant lineages have provided a framework to examine the diversification of reef fishes over the past 65 million years.

#### **DIVERSIFICATION OF FISHES ON TROPICAL REEFS**

Phylogenetic sampling and the resolution of reef fish lineages remain key issues for future research. However, for those groups that have been the focus of age estimation studies, some general, concordant patterns have emerged. The stem lineages of many reef lineages extend back into the Cretaceous (Near et al., 2013) while the crown origins are strongly associated with the aftermath of the K-Pg boundary mass extinction event (∼65 mya; Bellwood et al., in press). A recent study of family level transitions into reef habitat and associated morphological divergence has outlined two waves of colonization before and after the K-Pg boundary (Price et al., 2014). Initial colonization of lineages before the K-Pg boundary (90–72 mya) was accompanied by morphological divergence of clades, while the subsequent wave of reef colonization (65–56 mya) appears to saturate with increasing convergence in morphospace (Price et al., 2014). Patterns of reef invasions within families are likely to be more dynamic (Price et al., 2014) with trophic evolution showing increasing association between fishes and the reef benthos (Cowman et al., 2009; Bellwood et al., 2010). From the appearance of more generalist trophic modes in the Eocene/Oligocene, new and novel trophic modes began to appear on reefs in the Miocene with the trophic system in place by 7 mya (Cowman et al., 2009; Bellwood et al., 2010). Some reef fish lineages have diversified ecologically by expanding into novel areas of morphospace (Friedman, 2010; Price et al., 2011) while others have exhibited convergent radiations across similar trophic strategies (Frédérich et al., 2013). An association with coral reef habitat appears to promote both clade diversity, with higher reef occupancy linked to faster rates of diversification (Alfaro et al., 2007; Cowman and Bellwood, 2011), and increased rates of morphological diversification within lineages (Price et al., 2011, 2013). Lineage diversity and morphological divergence do not appear to be related in these groups (Cowman et al., 2009; Price et al., 2011); however, key innovations have been linked to increased diversity in some clades (Kazancioglu et al., 2009; Litsios et al., 2012; Wainwright et al., 2012). In addition, an overarching link between rate of body size evolution and rate of diversification appears to be a general trend across the fish tree of life (Rabosky et al., 2013), but its effect on the evolution of reef clades has not been examined.

**FIGURE 2 | Phylogenetic sampling of characteristic reef fish families.** Published chronologies of the nine characteristic reef fish families found globally on coral reefs (Bellwood and Wainwright, 2002). Sources of these trees can be found in **Table 1**. Level of taxon sampling per lineage is denoted by color, with black branches completely sampled. Percent sampling was calculated on a per genus basis with species counts taken from Fishbase (http://www.fishbase.org). In the cases of the families Labridae, Chaetodontidae, Pomacentridae and Apogonidae, lineage richness estimates were taken from Cowman and Bellwood (2011). Asterisk indicates node where the parrotfish phylogeny of Choat et al. (2012) was grafted to the Labridae tree of Cowman and Bellwood (2011).

By the end of the Eocene, major lineages leading to present day genera and tribes were in place for many reef fish families (Cowman and Bellwood, 2011). After what may have been a cryptic extinction event near the Eocene/Oligocene boundary, coinciding with the origin of the butterflyfishes (∼33 mya), a rebound in cladogenesis within reef associated lineages during the Oligocene/Miocene underpinned much of the extant diversity seen on today's reefs (Cowman and Bellwood, 2011). Several lineages within the Labridae, Pomacentridae, Apogonidae and Chaetodontidae display significantly more diversity than expected, with the most reef-associated lineages appearing more resistant to higher extinction rates than their non-reef counterparts (Cowman and Bellwood, 2011). Elevated cladogenesis has previously been identified in several marine fish lineages (Rüber and Zardoya, 2005), with reef association or habitat shifts suggested to be the underlying mechanism. Later, the relationship between reef association and elevated rates of diversification was demonstrated in Tetraodontiformes (Alfaro et al., 2007), marine gastropods (Williams and Duda, 2008), and more recently in sharks (Sorenson et al., 2014). Whether reef habitats promote this diversity through elevated speciation or relaxed extinction remains to be seen. As extinction rates are notoriously difficult to estimate from molecular phylogenies in the absence of a paleontological record (Quental and Marshall, 2009, 2010; Rabosky, 2009b), the vital evidence in the form of Miocene fossils for many reef fish lineages, at least, remains out of reach. The expansion of coral reef habitat in the Miocene may have promoted cladogenesis, and provided a refuge from extinction, two processes that may vary on both temporal and geographic scales (Cowman and Bellwood, 2013a).

The majority of extant coral reef fishes examined by Cowman and Bellwood (2011, 2013a) are of Miocene age (23–5 mya; **Figure 3**) with some possibly being older than the IAA hotspot (Renema et al., 2008). While some geographic variation in the reconstructed ages of lineages exists (**Figure 3**), the older ages of extant species challenged the early suggestion that sea level fluctuations during the Pleistocene were a major factor in the origin of modern coral reef assemblages (Potts, 1985). For reef fishes, the majority of cladogenetic events occurred in the Miocene but speciation still continues in several groups from the Pleistocene onward (Rocha and Bowen, 2008). Pleistocene speciation in these groups may be linked to patterns of barrier vicariance (Bowen et al., 2013; Cowman and Bellwood, 2013b), peripheral budding (Hodge et al., 2012), and more recent fluctuations in coral reef stability (Pellissier et al., 2014). Pleistocene processes may very well have played an active role in the evolution of butterflyfishes (McMillan and Palumbi, 1995), which display younger global ages of extant lineages (∼2.6 mya) than labrids (∼6.7 mya) and pomacentrids (∼6.7 mya), particularly in the Indian Ocean and IAA hotspot (**Figure 3**). It is likely that when the gaps in taxonomic sampling of these reef fish families are filled, and cryptic species are identified, the inclusion of unsampled lineages closer to the present may enhance the role played by speciation in the Pleistocene and the importance of peripheral locations in promoting biodiversity. Processes at work in the Miocene appear to be the main source of origination of modern reef fish biodiversity patterns, while processes maintaining this pattern are prominent from the Pliocene/Pleistocene. However, to gain a clearer picture of the magnitude of these processes across the marine tropics, studies with a biogeographic focus have been important.

#### **BIOGEOGRAPHY AND BIODIVERSITY**

As with the phylogenetic history of reef fishes, the biogeographic history is reliant on sampling, specifically, knowledge of the current extent of reef fish species ranges. In this regard, we are fortunate to have had many skilled ichthyologists throughout the decades collecting geographic information on reef fish distributions (Allen, 2014). Several initiatives have been actively cataloging the diversity found on and off coral reefs, e.g., Atlas of Living Australia<sup>1</sup>; IUCN Red List<sup>2</sup>; the Global Biodiversity Information Facility<sup>3</sup>; Map of Life<sup>4</sup>; and Ocean Biogeographic Information System<sup>5</sup>. A recent effort to construct a global database for tropical reef fishes has resulted in over 6300 records for reef fishes across 169 locations (GASPAR database; Kulbicki et al., 2013; Parravicini et al., 2013). This database has been used to explore global predictors of reef fish species richness (Parravicini et al., 2013); global biogeography of reef fishes (Kulbicki et al., 2013); human mediated losses of phylogenetic and functional diversity (D'agata et al., 2014); and the role of stable reef habitat in preserving reef fish diversity (Pellissier et al., 2014).

Of the valid nominal species in the nine reef fish families, these geographic checklists (Kulbicki et al., 2013; Parravicini et al., 2013) include the vast majority, ranging from 63% of carangid species to 100% of acanthurid species (**Table 1**). These data are likely to include the majority, if not all, reef associated members of these families. In combination with a fully sampled phylogeny of reef fishes, these geographic data would allow us to tease apart some of the questions regarding the origins of tropical biodiversity that have been only partially answered so far. Unfortunately, fully sampled phylogenies for important groups are still out of reach. The incomplete and clade biased phylogenetic sampling also translates into a bias in geographic sampling (**Figure 4**). Those charismatic families such as the Labridae and Chaetodontidae that have been the focus of several papers from a variety of research groups have more even phylogenetic sampling across biogeographic ecoregions, with over 50% of taxa present in each region represented in a published phylogeny (**Figures 4A,B**). A sampling bias can be observed among ocean basins, and among families, where some families (Apogonidae, Blenniidae; **Figures 4E,F**) have higher phylogenetic sampling in Indo-Pacific locations, whereas others (Pomacentridae, Mullidae, Carangidae; **Figures 4C,H,I**) show higher phylogenetic sampling in the Atlantic locations. Overall, the families Apogonidae, Blenniidae, Mullidae and Carangidae show concerningly low levels of phylogenetic sampling across tropical reef habitats (**Figures 4E,F,H,I**) with many regions showing below 10% phylogenetic sampling of ecoregion assemblages. Nonetheless, the geographic data available, regardless of their sampling in phylogenetic trees, have been fruitful in delineating biogeographic regions across the marine tropics.
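The per-ecoregion sampling percentages discussed above can be illustrated with a minimal sketch. It assumes a checklist supplied as (ecoregion, species) pairs and a set of species names present in a published phylogeny; the function names, the data layout, and the example names are hypothetical and are not taken from the GASPAR database or from any of the cited studies. Only the 10% cut-off follows the text.

```python
from collections import defaultdict


def ecoregion_sampling(checklist, sampled_species):
    """Percent of each ecoregion's species pool found in a phylogeny.

    checklist: iterable of (ecoregion, species) pairs (assumed layout).
    sampled_species: set of species names that are tips in the phylogeny.
    Returns a dict mapping ecoregion -> percent of its species sampled.
    """
    present = defaultdict(set)
    for ecoregion, species in checklist:
        present[ecoregion].add(species)
    return {
        region: 100.0 * len(pool & sampled_species) / len(pool)
        for region, pool in present.items()
    }


def poorly_sampled(percentages, threshold=10.0):
    """Ecoregions whose phylogenetic sampling falls below the threshold."""
    return sorted(r for r, pct in percentages.items() if pct < threshold)


if __name__ == "__main__":
    # Hypothetical placeholder data, for illustration only.
    checklist = [
        ("Region A", "species 1"),
        ("Region A", "species 2"),
        ("Region B", "species 2"),
        ("Region B", "species 3"),
    ]
    in_phylogeny = {"species 1", "species 2"}
    percentages = ecoregion_sampling(checklist, in_phylogeny)
    for region in sorted(percentages):
        print(region, round(percentages[region], 1))
    print("Below 10%:", poorly_sampled(percentages))
```

Using sets for each species pool means duplicate checklist records do not inflate richness, which matters when checklists from several sources are combined.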

**FIGURE 4 | Phylogenetic sampling of nine reef fish families across the marine tropics. (A–I)** Global maps of tropical ecoregions displaying phylogenetic sampling of species assemblages for each of the nine characteristic reef fish families. Species richness for each family within ecoregions is based on species counts from the "checklist" × "all species" dataset of Kulbicki et al. (2013) and phylogenetic sampling is based on taxon sampling of each published family phylogeny (**Figure 2**; **Table 1**). Ecoregions that have <10% of the family species pool represented are outlined in black.

<sup>1</sup>http://www.ala.org.au

<sup>2</sup>http://www.iucnredlist.org

<sup>3</sup>http://www.gbif.org

<sup>4</sup>http://www.mol.org

<sup>5</sup>http://www.iobis.org

Biogeographic science has an important role in the guidance of biodiversity conservation (Whittaker et al., 2005). Dividing the tropics into discrete regions has proven to be a difficult process (Mouillot et al., 2013), but it is a necessary step toward critically evaluating and implementing conservation priorities (Whiting et al., 2000; Olson et al., 2001). With the predictions of a grim future ahead for coral reef systems under a changing climate (Hughes et al., 2003), the biogeographic delineation of the marine tropics and how regional assemblages have formed through time is paramount to our understanding of biodiversity maintenance. In the past decade, several studies have provided a schematic breakdown of regions across the tropical belt based on differing criteria (Spalding et al., 2007; Briggs and Bowen, 2012; Kulbicki et al., 2013). Most recently, Kulbicki et al. (2013) used a hierarchical approach to delineating tropical reef regions based on species dissimilarity. Using species checklists across 169 locations (Parravicini et al., 2013), their results identify three realms (Atlantic, Central Indo-Pacific, Tropical East Pacific; **Figure 1C**), each with varying degrees of structure within those delineated regions and provinces (**Figure 1C**; Kulbicki et al., 2013). The Central Indo-Pacific region, within the Indo-Pacific realm, was characterized by lower within region dissimilarity, while neighboring regions (Western Indian and Central Pacific) could be broken down further into provinces (although some internal structure is seen when analyses were based on ecoregions as base units; see Kulbicki et al., 2013). While the IAA (or Coral Triangle) may be delineated as the area containing the highest proportion of reef fish species (Briggs and Bowen, 2012), in terms of species composition there is no strong evidence delineating it as a separate entity in the Central Indo-Pacific (Kulbicki et al., 2013). The IAA biodiversity hotspot may not be a defined region based on species dissimilarity. But an area the extent of the IAA, at the center of the highest number of overlapping species ranges, must have played a significant role on an evolutionary scale in the generation of current day biodiversity patterns. On a shallow timescale, the complex role of the IAA and Coral Triangle region has been illustrated through numerous population level and phylogeographic studies (reviewed by Carpenter et al., 2011).
While the extant ranges of reef associated fishes can statistically delineate regions of dissimilar assemblages, the lines of division are highly dependent on the method used (Leprieur et al., 2012; Mouillot et al., 2013), and there remains no consensus on which method or regional scheme is best. It is likely that the appropriate biogeographic scheme will depend on the question being addressed. From a macroevolutionary perspective, whether any present day scheme for biogeographic delineation has a meaning for past diversification and biogeographic evolution has yet to be addressed.

The evolution of reef fish biodiversity patterns is likely to be concordant with the evolution of coral reef habitats. Higher diversification rates of reef associated fish lineages have been demonstrated (Cowman and Bellwood, 2011) and transitions onto coral reefs appear important for accelerated morphological evolution (Price et al., 2014). However, a direct (or indirect) link between the diversification of corals and the diversification of their associated fish lineages has yet to be recognized (Duchene et al., 2013). From a biogeographic perspective, the spatial and temporal distribution of coral taxa and the platforms they construct may provide insight into the evolution of reef fishes that inhabit them. Extent of coral reef area (Bellwood and Hughes, 2001) and its stability through time (Pellissier et al., 2014) have been highlighted as significant predictors of extant reef fish biodiversity. The fossil record of reef building corals highlights differences among ocean basins (Budd, 2000; Wallace and Rosen, 2006). The Atlantic and Caribbean fossil reef biota display high turnover of coral species and extinction of reef habitat (Budd, 2000), while the Indo-Pacific fossil biota displays a history of eastward movement linked to tectonic activity (Wilson and Rosen, 1998; Renema et al., 2008) with modern coral taxa in the Central Indo-Pacific consisting of Tethyan relicts and recent speciation events (Wallace and Rosen, 2006). Such data could be used to model the spatial and temporal dynamics of coral reef habitat, allowing us to test more explicit biogeographic scenarios and hypotheses related to tropical biodiversity (e.g., Ree and Sanmartín, 2009).

#### **THE IAA BIODIVERSITY HOTSPOT – A CENTER OF CONFUSION**

Although compositionally the IAA hotspot may not currently present a geographic entity within the Central Indo-Pacific Realm (Kulbicki et al., 2013), the area has historically been recognized as a center of biodiversity in the Indo-Pacific (Ekman, 1953). In an effort to understand the processes that have been important in producing the diversity pattern across the Indo-Pacific and the associated center of high diversity, three hypotheses became popular in the early 1980s, originally formulated to explain the biodiversity of reef building corals (summarized by Potts, 1985). These 'center of' hypotheses have been co-opted in the context of reef fish biodiversity. They have been expanded and modified to explain the extensive and overlapping widespread ranges seen in several reef fish groups (Hughes et al., 2002; Connolly et al., 2003). The details of each of these, and other hypotheses, have been reviewed by Bellwood et al. (2012). Both phylogenetic and population level studies of reef fish taxa have highlighted evidence describing the IAA (or the Coral Triangle) as a center of origin (Briggs, 2003; Timm and Kochzius, 2008), a center of overlap (Hubert et al., 2012; Gaither and Rocha, 2013), or a center of accumulation/survival (Barber and Bellwood, 2005; Kool et al., 2011).

Each hypothesis has made predictions about the location of origin of species, their age, and their trajectory of range expansion or change (see Bellwood et al., 2012). Primarily, species with restricted endemic ranges have been important in the assessment of these hypotheses, but even the study of endemic taxa has been fraught with debate (Bellwood and Meyer, 2009a,b; Briggs, 2009). Even how an endemic range is defined can lead to conflicting patterns of endemism (Hughes et al., 2002; Mora et al., 2003). Bellwood and Meyer (2009a,b) highlighted the diffuse ages of endemic taxa, whether they are found inside or outside the IAA. Endemic taxa can be young (neo-endemics) or old (palaeo-endemics) and as such they should be used with caution to delineate areas of species geographic origin. Indeed the ages of endemic coral reef fishes in several families do not differ significantly from those of more widespread species (Hodge et al., 2014). Instead of using endemic species as a tool in pinpointing locations of species origin, it is becoming clear that understanding how processes of isolation and extinction have led to current patterns of endemism alongside widespread species is an important step in the study of reef fish biodiversity.

Recent studies have begun to highlight that the processes that promote, maintain and diffuse biodiversity in the marine tropics are more dynamic in nature with multiple drivers acting both in concert, and decoupled across temporal and geographic scales (Bowen et al., 2013; Cowman and Bellwood, 2013a). The question has changed from which hypothesis is most accurate, to when and where the processes they invoke have been most prevalent and how they have interacted to produce the biodiversity we see today. To this end, it may be time to mute the discussion about 'centers of' in the field of reef fish biodiversity in favor of directly examining and modeling rates of speciation, extinction and dispersal in a temporal and geographic framework. Such methods have been advantageous in investigating terrestrial diversity patterns on global scales (Jetz et al., 2012; Rolland et al., 2014). If these different processes have played an active role in the development of tropical biodiversity but on different temporal and spatial scales, then several biogeographic areas may have historically acted as sources or sinks (or both) for biodiversity at different periods in time. For example, the Atlantic realm, like the IAA hotspot, can be considered a center for species origination (Cowman and Bellwood, 2013a), but its history of isolation from the Indo-Pacific (Floeter et al., 2001; Joyeux et al., 2001) and extinction (O'Dea et al., 2007) has contributed to its lower diversity when compared to the Indo-Pacific. Both the Indian Ocean and the Central Pacific regions have higher standing diversity of reef fishes than the Atlantic. However, most of their diversity has been derived through expansions of lineages from the IAA. But peripheral locations in both these regions have also been sites of species origination (Hodge et al., 2013).
In addition, within the Indo-Pacific realm, it remains unclear if the IAA hotspot actually has experienced higher rates of speciation than adjacent regions (Bellwood and Meyer, 2009b; Litsios et al., 2014). While speciation has certainly occurred within the IAA hotspot, peripheral locations are also important sources of new species (Bowen et al., 2013; Hodge et al., 2014). None of these hypotheses can be disregarded, but nor can any one of them solely explain the IAA biodiversity pattern (Rosen, 1984; Palumbi, 1997; Halas and Winterbottom, 2009; Hoeksema, 2009). Halas and Winterbottom (2009), comparing reconstructed area relationships of cladograms of fishes, corals and molluscs, found little congruence among these taxa and little evidence for any of the core models examined, despite these groups displaying very similar patterns of diversity across the tropics (Roberts et al., 2008). Several studies have asked what present day geographic or environmental factors explain the variation in the IAA diversity pattern (Mora et al., 2003; Tittensor et al., 2010; Parravicini et al., 2013), but it appears that historical factors may have more explanatory power when examining the origin and maintenance of biodiversity in the marine tropics (Renema et al., 2008; Pellissier et al., 2014). While the history of tropical biodiversity may remain clouded until complete phylogenies are available, concordant patterns in currently published data for tropical reef fishes have allowed key events in the history of the tropics to be recognized.

Though the IAA hotspot is enigmatic, it has not been a unique pattern through time. It represents the modern manifestation of a pattern that has existed for at least the past 50 million years (Renema et al., 2008). The center of biodiversity has 'hopped' from a Tethyan location (Paleocene), to an Arabian/IAA hotspot (Eocene/Oligocene), to its current location in the IAA (Miocene; Renema et al., 2008). This biogeographic re-centering of biodiversity was associated with a sequence of TECOG events (Bellwood et al., 2012), dynamic processes controlling the origin and survival of species (Cowman and Bellwood, 2011, 2013a) resulting in the establishment of a trophic system characteristic of modern coral reefs. These processes resulted in the contraction and expansion of carbonate platforms, the evolution of the coral species that built them, and their associated fish lineages.

The earliest fossil records of lineages leading to modern coral reef fishes and the coral genus *Acropora* are found in close proximity in the Late Paleocene/Early Eocene deposits of Europe and the Western Indian Ocean (Bellwood, 1996; Wallace and Rosen, 2006). These deposits can be realistically extrapolated to be associated with the ancestral hotspot centered in the Western Tethys seaway (Renema et al., 2008). No fossil *Acropora* are currently know from the Eocene of the Indo-West Pacific. While this could be an observational artifact, this gap in the coral record corresponds to a geographic gap with fewer shallow water habitat for coral growth in the Indo-Pacific at that time (Wilson and Rosen, 1998). It is not until the Late Oligocene/Early Miocene (∼26 mya) where we see the first fossil evidence of coral species of the genus *Acropora* occurring in the IAA (Wallace and Rosen, 2006). From this time, the tectonic collision of Australian and South East Asian plate fragments favored localized isolation and origination of new coral taxa and the expansion of carbonate platforms in the IAA (Wilson and Rosen, 1998). It is during this time we also see the demise of carbonate platforms in Europe and the Mediterranean deposits (Wallace and Rosen, 2006) and the collapse of the ancestral Tethyan and Arabian biodiversity hotspots (Renema et al., 2008). This collapse of the ancestral hotspots is associated with an eastward shift in fossil deposits of reef associated organisms (Renema et al., 2008) and the expansion of carbonate platforms in the IAA (Wallace and Rosen, 2006). A period of high extinction may be visible in the molecular record of some reef fish groups coinciding with a decrease in fossil numbers of all marine taxa (Cowman and Bellwood, 2011). 
More fossil data for focal fish groups are required to confirm this pattern; however, a total evidence approach including fossil taxa as dated tips in an ancestral biogeographic framework for the family Holocentridae offers promising insight (Dornburg et al., in press).

The Miocene epoch represents an important phase in the evolution of the IAA biodiversity hotspot (Bellwood et al., in press), with the expansion of both coral reef platforms (Wallace and Rosen, 2006) and associated fish lineages (Cowman and Bellwood, 2013a). As a result of tectonic activity we see the final closure of the Tethys seaway, known as the Terminal Tethyan Event (TTE, 18–12 mya; Steininger and Rögl, 1979) and the development of the Isthmus of Panama (IOP; Coates and Obando, 1996) isolating the Atlantic and Caribbean from the Indo-Pacific. The development and closure of these 'hard' land barriers would have been associated with climatic upheaval (Hallam, 1994; Montes et al., 2012) and extinction in reef locations (McCoy and Heck, 1976; Budd, 2000; O'Dea et al., 2007). This has led to a diffuse pattern of vicariance in the molecular record of some reef fish families (Cowman and Bellwood, 2013b). The TTE and the IOP barriers in conjunction with the expanse of ocean known as the East Pacific Barrier (EPB; Bellwood and Wainwright, 2002) have left a lasting mark on modern tropical reef fish assemblages (Kulbicki et al., 2013). However, some recent dispersal from the Indian Ocean into the Atlantic has been detected (Rocha et al., 2005a; Bowen et al., 2006), with several lineages maintaining gene flow across the EPB (Lessios and Robertson, 2006).

In terms of coral reef ecology, the Miocene holds the origins of many novel feeding modes (Cowman et al., 2009; Bellwood et al., 2010), and an escalation in herbivory and detritivory that have become essential services performed by fishes on healthy coral reefs (Hughes et al., 2011). In the Labridae, coral reef associated lineages show significantly higher rates of trophic ecomorphological evolution with over a third of that diversity seen within trophic modes only found on coral reefs (Price et al., 2011). A switch to consuming low quality food items has been linked to higher rates of diversification in several coral reef lineages with origins in the Oligo-Miocene (Lobato et al., 2014). This reflects fossil evidence showing the transition of reef fish forms to exploiting the epilithic algal matrix, an underutilized resource on coral reef flats (Bellwood et al., 2014a).

By the end of the Miocene, the center of fish diversity has taken shape in the IAA (Cowman and Bellwood, 2013a) and important trophic components are in place on coral reefs (Cowman et al., 2009; Price et al., 2011; Bellwood et al., 2014a). Speciation continues in several lineages from the Pliocene to Recent time periods. Expansion of lineage ranges from the IAA to adjacent regions is common (Cowman and Bellwood, 2013a) with vicariance and speciation in peripheral locations (Hodge et al., 2014). In the Atlantic realm, Pliocene speciation has been described in several reef associated genera (Floeter et al., 2008), with evidence of ecological speciation (Rocha et al., 2005b). While ecological speciation is likely to be ongoing in the Indo-Pacific, recent studies reflect a complex history of sympatric, allopatric and parapatric speciation (Rocha and Bowen, 2008; Choat et al., 2012; Hodge et al., 2012, 2013) with rapid dispersal potential (Quenouille et al., 2011) blurring the geographic history of speciation.

### **MACROEVOLUTION AND MACROECOLOGY ON TROPICAL REEFS**

Albeit incomplete, dated phylogenies combined with biogeographic distributions can detect the initial origins of ancestral reef fish lineages, their extinction and survival with shifting centers of biodiversity, and proliferation within expanding habitat. From these patterns it is possible to identify temporal and spatial variation in rates of speciation, extinction and dispersal and how this variation has resulted in the current biodiversity gradient. Measuring the net rate of diversification and how it varies through time has become an important metric in the integrated study of macroevolution and macroecology (Rabosky, 2009a). Methods to model variation in diversification rates in light of ecological processes have seen dramatic advancement in the last decade (reviewed by Morlon, 2014). In particular, recent interest and debate has grown around whether diversity in clades or assemblages can increase unbounded or if it can be limited by ecological or other factors (Rabosky, 2009a; Morlon et al., 2010). Only a handful of studies have explicitly examined rate variation in reef fish lineages with the comparison of constant and rate-variable models of diversification (Rüber and Zardoya, 2005; Alfaro et al., 2007, 2009; Cowman and Bellwood, 2013a; Litsios et al., 2014; Lobato et al., 2014), but none have considered the effects of ecological or other factors in limiting biodiversity among tropical regions. If limiting factors do govern the capacity for biodiversity in clades and communities, variation in tropical reef fish biodiversity may have little to do with rates of speciation or extinction and more to do with the capacity of regions to support biodiversity. Rather, clades or communities may have experienced different phases in their rate of diversification, in which they initially radiate and then slow down as a limit is approached (Rabosky, 2009a). Variation in where and when clades have radiated would lead to the observed patterns in tropical biodiversity.
If clades have varied in the timing of their radiating phase among geographic regions, this might manifest itself as differences in the ages of lineage origination among regions. This may be the case for some families where data are available (**Figure 3**); however, these data are still from incompletely sampled phylogenies.

This radiation and subsequent slowdown is also termed "density-dependent" diversification and can resemble a "niche-filling" process. Such a process was recently uncovered in the trophic diversification of several tropical reef fish families (Lobato et al., 2014), where a switch to low quality food items by several lineages resulted in significant diversification. This highlights the potential for ecological opportunity on reefs to shape lineage diversification. However, it is unclear if this potential has manifested as an actual limit on diversification, as many reef fish lineages do not display a slowdown in diversification rate toward the present (Cowman and Bellwood, 2011). While there is evidence of speciation rates decaying over time, it appears that limits of diversity in several groups have yet to be realized (Morlon et al., 2010).
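As a minimal sketch of what a density-dependent slowdown implies, the following Python code assumes that speciation rate declines linearly with standing diversity, λ(N) = λ0(1 − N/K), while extinction rate μ stays constant. The functional form, the names `lam0`, `mu` and `K`, and the numerical values are illustrative assumptions, not estimates from any reef fish phylogeny, and the deterministic integration is a mean-field approximation rather than a full stochastic birth-death simulation.

```python
def speciation_rate(n, lam0, K):
    """Linear diversity-dependent speciation rate, floored at zero.

    lam0 is the initial per-lineage rate; K is the diversity at which
    speciation ceases (both are hypothetical parameters).
    """
    return max(0.0, lam0 * (1.0 - n / K))


def expected_richness(lam0, mu, K, t_max, n0=1.0, dt=0.01):
    """Euler integration of dN/dt = (lambda(N) - mu) * N.

    Returns the richness trajectory sampled every dt time units.
    """
    n = n0
    trajectory = [n]
    for _ in range(int(round(t_max / dt))):
        n += (speciation_rate(n, lam0, K) - mu) * n * dt
        trajectory.append(n)
    return trajectory


if __name__ == "__main__":
    traj = expected_richness(lam0=0.5, mu=0.1, K=100, t_max=100)
    # Early growth is close to exponential; later growth stalls.
    print(round(traj[1000], 2), round(traj[-1], 2))
```

Under these assumptions richness plateaus at K(1 − μ/λ0), which lies below K whenever extinction is non-zero. A clade that has not yet approached this value would show little or no slowdown toward the present.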

We have yet to definitively identify within a complete phylogenetic framework how rates of net diversification on tropical reefs have been altered by ecological or biogeographical processes. If such processes have underpinned the radiation of fishes on coral reefs it may change our understanding of the origins of biodiversity and what factors are important in maintaining diversity in the present.

## **TROPICAL BIODIVERSITY AND RATES OF MOLECULAR EVOLUTION**

Across several taxonomic groups there is consistent evidence of a link between the rate of molecular evolution and the observed biodiversity of clades (Fontanillas et al., 2007; Lanfear et al., 2010b; Duchene and Bromham, 2013). This pattern is not universal (Goldie et al., 2011) and has yet to be critically evaluated across the Fish Tree of Life. However, a recent study of genomic variation in African cichlids highlights several molecular mechanisms that may be linked to the enigmatic and rapid diversification of the group (Brawand et al., 2014). In the context of biodiversity patterns there is a tangled web among ecological traits, diversification and molecular rate (Dowle et al., 2013). A large number of characteristics, ecological and environmental, can potentially shape the rate at which genes evolve, and numerous hypotheses have been put forward (Bromham, 2011).

When exploring the link between molecular rate and diversity there are three main explanations that have been discussed (Barraclough and Savolainen, 2001). First, there is something about the process of speciation itself that increases the rate of molecular evolution (Venditti and Pagel, 2010). If the rate of speciation is associated with the rate at which populations divide or become isolated, then a reduction in the effective population size could increase the rate of substitution of nearly neutral mutations (Bromham, 2011). Second, the direction of causation could be the opposite where changes at the genomic level drive rates of speciation, and as such directly influence macroevolutionary patterns (Bromham, 2011). Higher mutation rates would result in faster accumulation of incompatibilities among hybrids and hasten the reproductive isolation among populations. Lastly, the association between diversity and molecular rate could be indirect, where a third factor promotes an increase in the rate of molecular evolution and the diversification rate. Methods are available for testing these scenarios (reviewed by Lanfear et al., 2010b) and results tend to show evidence for the rate of mutation influencing diversification (Lancaster, 2010; Lanfear et al., 2010a; Duchene and Bromham, 2013). These hypotheses have yet to be examined in fishes and may provide insight into the underlying mechanics of speciation on coral reefs.

If speciation drives the rate of molecular evolution through population subdivision, higher diversity and molecular rates in reef-associated fish lineages could be driven by the fragmentation of habitat and peripheral isolation, both processes that have been reported in evolutionary studies of tropical reef fishes (Hodge et al., 2012; Pellissier et al., 2014). If correct, endemic range species should show faster rates of molecular evolution when compared with a widespread sister lineage. Endemic range species and isolated populations within widespread species have displayed greater genetic structure and haplotype diversity than their widespread counterparts (Hobbs et al., 2013). Whether this is a reflection of a fast molecular rate remains to be seen.
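One simple way to frame this prediction is a sister-pair sign test. Because sister lineages share the same age, the longer molecular branch of a pair indicates the faster rate. The sketch below is an illustration only: the function names and the input layout are hypothetical, and a one-sided binomial sign test is an assumed choice of statistic, not a method prescribed by the studies cited above.

```python
from math import comb


def count_faster(pairs):
    """Count sister pairs in which the endemic lineage has the longer branch.

    pairs: list of (endemic_branch_length, widespread_branch_length),
    e.g. in substitutions per site. Tied pairs are uninformative and dropped.
    Returns (number of pairs with faster endemic, number of informative pairs).
    """
    informative = [(e, w) for e, w in pairs if e != w]
    faster = sum(1 for e, w in informative if e > w)
    return faster, len(informative)


def sign_test_p(n_faster, n_pairs):
    """One-sided binomial sign test: P(X >= n_faster) when p = 0.5."""
    tail = sum(comb(n_pairs, k) for k in range(n_faster, n_pairs + 1))
    return tail / 2 ** n_pairs


if __name__ == "__main__":
    # Hypothetical outcome: the endemic is faster in 8 of 10 informative pairs.
    print(sign_test_p(8, 10))
```

A sign test ignores the size of the rate difference, so it is conservative; it also assumes that the sister pairs are phylogenetically independent of one another.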

If mutation rate drives speciation rate, would this mean that coral reefs provide the molecular fuel for speciation? It has already been demonstrated that coral reefs promote both the diversification of lineages (Alfaro et al., 2007; Cowman and Bellwood, 2011; Sorenson et al., 2014) and morphological diversity (Price et al., 2011, 2013), so it is not unrealistic that they would also speed molecular evolution. But not all lineages found on coral reefs are morphologically diverse, nor are they all biodiverse. If a similar pattern is found in the molecular rates of coral reef fish clades, where only some lineages show faster rates, their proliferation may be due to intrinsically higher mutation rates. This higher rate would allow populations that are briefly or partially separated by any number of mechanisms to become reproductively isolated faster. The IAA hotspot may be exceptionally diverse because its complex series of archipelagos and shallow basins provide more opportunity for population separation than elsewhere. A higher mutation rate could also provide more genomic variation for selection to act upon (Bromham, 2011) for adaptation and separation along ecological axes (Schluter and Conte, 2009). Ecological speciation has been documented on coral reefs (Rocha et al., 2005b), as has a link between adaptations to new niches and high diversity (Lobato et al., 2014). For this scenario, there would not be anything particularly special about coral reef association other than it enabling those lineages with higher mutation rates to promote lineage diversification. The IAA being at the center of the diversity gradient would be a consequence of more reef habitat, which has previously been shown to be a significant predictor of variation in reef fish diversity (Bellwood and Hughes, 2001). A situation where coral reefs have acted as a medium for the direct influence of molecular rate on diversification is very different from a third scenario where an indirect factor associated with coral reef habitats promotes faster molecular rates and, independently, higher diversification. In comparing reef and non-reef habitats, or tropical versus temperate latitudes, there are a number of indirect factors that could promote both molecular rate and diversification (Dowle et al., 2013). However, across the tropical belt it may be difficult to deduce what particular factors mediate the link on coral reefs in the IAA and not on reefs in other regions. As with models of diversification, it is likely that when these hypotheses are examined in depth, the processes at play will be more dynamic and possibly include more than one explanation.

## **CONCLUSION**

In reviewing the current state of phylogenies and historical biogeography of tropical reef fishes I have summarized a series of historical events that have underpinned the origins and proliferation of reef fish biodiversity in the tropics. This review also highlights several groups that require increased sampling and further analysis. While some focal groups are almost completely sampled, an additional push is needed to obtain complete species-level sampling. Although the traditional nine coral reef fish families have been important models in the exploration of marine speciation and evolution on coral reefs, there are other fish families and lineages that may provide as much, if not more, insight into the origins of tropical biodiversity. It is in this respect that a robust and well-resolved Fish Tree of Life will be beneficial to both the examination and comparison of evolutionary rates among discrete tropical clades found on and off reefs, and the investigation of overarching patterns of tropical diversification. I suggest that future research concerning the macroevolutionary patterns of fishes found on coral reefs examine the historical variation in rates of speciation, extinction, and dispersal among biogeographic regions and across multiple lineages. Further discussion is needed to evaluate how hypotheses concerning the origin and maintenance of biodiversity are modeled to account for the interaction between macroecology, macroevolution and molecular processes.

There are several questions that offer exciting pathways for future research:


• Do tropical clades experience increased rates of molecular change on coral reefs, and how does this link to patterns of biodiversity across the tropical belt?

With increasing access to genomic methods, there is a unique opportunity to reconstruct the evolutionary history of all fishes to the level of resolution that is available in other vertebrate clades. Within this framework we can move beyond categorizing patterns and predictors of extant biodiversity, and statistically examine the evolutionary history under hypothesis-driven models of diversification.

#### **ACKNOWLEDGMENTS**

I would like to thank J. Tanner, D. Bellwood, J. Hodge and members of the Macroevolution and Macroecology group at the Australian National University for helpful discussion, and G. Bernardi and L. Rüber for valuable comments on earlier drafts. I would like to thank F. Santini for access to the Acanthuridae phylogeny; B. Frédérich for access to the Pomacentridae phylogeny; P. Hundt for access to the Blenniidae phylogeny; A. Dornburg for access to the Holocentridae phylogeny; and S. Klanten for access to the parrotfish phylogeny. I thank M. Kulbicki and V. Parravicini for access to data on species distributions and endemism obtained from the GASPAR program. The GASPAR program is part of the CESAB initiative financed by the Fondation pour la Recherche sur la Biodiversité (FRB). This review was funded from the Gaylord Donnelley Postdoctoral Environment Fellowship administered by the Yale Institute for Biospheric Studies (YIBS).

#### **REFERENCES**


history of parrotfishes (Family Labridae). *Biol. J. Linn. Soc.* 107, 529–557. doi: 10.1111/j.1095-8312.2012.01959.x


nestedness matters for Indo-Pacific coral reef fishes. *J. Biogeogr*. 40, 2228–2237. doi: 10.1111/jbi.12194


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 August 2014; accepted: 27 October 2014; published online: 13 November 2014.*

*Citation: Cowman PF (2014) Historical factors that have shaped the evolution of tropical reef fishes: a review of phylogenies, biogeography, and remaining questions. Front. Genet. 5:394. doi: 10.3389/fgene.2014.00394*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Cowman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Global diversification of a tropical plant growth form: environmental correlates and historical contingencies in climbing palms

## *Thomas L. P. Couvreur<sup>1,2</sup>\*†, W. Daniel Kissling<sup>3</sup>\*†, Fabien L. Condamine<sup>4</sup>, Jens-Christian Svenning<sup>5</sup>, Nick P. Rowe<sup>6,7</sup> and William J. Baker<sup>8</sup>*

<sup>1</sup> Institut de Recherche pour le Développement, UMR-DIADE, Montpellier, France


<sup>7</sup> CNRS, UMR AMAP, Montpellier, France

<sup>8</sup> Royal Botanic Gardens, Surrey, UK

#### *Edited by:*

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

Colin Hughes, University of Zurich, Switzerland; Isabel Sanmartin, Consejo Superior de Investigaciones Científicas, Spain

#### *\*Correspondence:*

Thomas L. P. Couvreur, Institut de Recherche pour le Développement, UMR-DIADE, DYNADIV team, 911, Avenue Agropolis, F-34394 Montpellier, Cedex 5, France e-mail: thomas.couvreur@ird.fr; W. Daniel Kissling, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, Netherlands e-mail: wdkissling@gmail.com

†These authors have joint first authorship.

Tropical rain forests (TRF) are the most diverse terrestrial biome on Earth, but the diversification dynamics of their constituent growth forms remain largely unexplored. Climbing plants contribute significantly to species diversity and ecosystem processes in TRF. We investigate the broad-scale patterns and drivers of species richness as well as the diversification history of climbing and non-climbing palms (Arecaceae). We quantify to what extent macroecological diversity patterns are related to contemporary climate, forest canopy height, and paleoclimatic changes. We test whether diversification rates are higher for climbing than non-climbing palms and estimate the origin of the climbing habit. Climbers account for 22% of global palm species diversity, mostly concentrated in Southeast Asia. Global variation in climbing palm species richness can be partly explained by past and present-day climate and rain forest canopy height, but regional differences in residual species richness after accounting for current and past differences in environment suggest a strong role of historical contingencies in climbing palm diversification. Climbing palms show a higher net diversification rate than non-climbers. Diversification analyses of palms detected a diversification rate increase along the branches leading to the most species-rich clade of climbers. Ancestral character reconstructions revealed that the climbing habit originated between early Eocene and Miocene. These results imply that changes from non-climbing to climbing habits may have played an important role in palm diversification, resulting in the origin of one fifth of all palm species. We suggest that, in addition to current climate and paleoclimatic changes after the late Neogene, present-day diversity of climbing palms can be explained by morpho-anatomical innovations, the biogeographic history of Southeast Asia, and/or ecological opportunities due to the diversification of high-stature dipterocarps in Asian TRFs.

**Keywords: ClaSSE, BAMM, growth form, lianas, plant traits, rattans, tropical rain forest evolution, Dipterocarpaceae**

#### **INTRODUCTION**

"The object of all climbing plants is to reach the light and free air with as little expenditure of organic matter as possible."

Darwin (1865)

Besides being the most diverse terrestrial ecosystem on Earth, tropical rain forests (TRF) contain a wide array of growth forms such as large emergent to small understory trees, shrubs, epiphytes, lianas, and vines, as well as parasitic plants (Richards, 1996). Understanding how these different growth forms have originated and contributed to the diversification of TRFs through time provides important insights into the evolution of this biome (e.g., Givnish et al., 2014). Anthropogenic disturbances have caused significant structural changes in TRFs and recent studies indicate that understanding changes in the abundance and biomass of specific growth forms has important implications for future community and ecosystem dynamics in TRFs (Phillips et al., 2002; Schnitzer and Bongers, 2011).

The climbing growth form (lianas and vines) constitutes a key component of tropical forests worldwide, contributing considerably to species diversity (10–50%), stem density (∼25%), and ecosystem processes such as forest transpiration and carbon sequestration (Gentry, 1991; Schnitzer and Bongers, 2002). The climbing habit is present in more than 130 plant families and has evolved independently numerous times within angiosperms (Gentry, 1991). Across species from different woody plant families, abundance of climbers in tropical forests is negatively correlated with mean annual precipitation and positively with seasonality, peaking in tropical dry forests (Schnitzer, 2005). However, regional studies of climber abundance do not necessarily support such results and suggest that structural characteristics of the forests can be more important than the physical environment (van der Heijden and Phillips, 2008). There is also large variation in stem structural properties and hydraulic architectures among climbing plants (Tomlinson and Fisher, 2000; Rowe et al., 2004) that could potentially mask environmental controls of specific lineages in cross-taxon analyses.

Despite their ecological importance, few studies have investigated the role of the climbing habit in the evolution and diversification of TRF. Based on sister group comparisons of species richness between climbing and non-climbing clades within 48 angiosperm families, Gianoli (2004) inferred that the climbing habit is a key innovation within flowering plants, leading to higher species richness than non-climbing sister groups. Wang et al. (2012) used a generic level dated phylogeny of the largely climbing TRF family Menispermaceae and found evidence for a burst of diversification shortly after the Cretaceous–Paleogene boundary, suggesting an important role of the climbing habit in the diversification of TRFs throughout the Cenozoic (this is also found in ferns; Schneider et al., 2004). Moreover, biogeographic analyses of the Neotropical tribe Bignonieae (Bignoniaceae) further showed that drivers of climber diversification are possibly related to climate drying and Andean orogeny (Lohmann et al., 2013).

With about 2,500 species, palms (Arecaceae) are a species-rich, monocotyledonous plant family characteristic of tropical and subtropical ecosystems (Dransfield et al., 2008; Couvreur and Baker, 2013). Palms have limited ability to survive in areas with cold and arid climates due to structural constraints (Tomlinson, 2006). As a consequence, palm species richness generally peaks in warm and humid areas with low seasonality (Bjorholm et al., 2005; Kissling et al., 2012a). However, historical legacies related to evolutionary history of specific lineages (Baker and Couvreur, 2013a,b), dispersal limitation (Bjorholm et al., 2006; Kissling et al., 2012a; Eiserhardt et al., 2013), and the unique history of biogeographic regions (Bjorholm et al., 2006; Kissling et al., 2012a; Blach-Overgaard et al., 2013; Rakotoarinivo et al., 2013) also play an important role in shaping global patterns of palm species richness and distribution.

A large diversity of growth forms has evolved within palms, including tree palms, palms with clustered stems, acaulescent palms, and climbing palms (Dransfield, 1978; Dransfield et al., 2008; Balslev et al., 2011). The climbing habit has evolved independently several times (Baker et al., 2000a), most notably within subfamily Calamoideae (**Table 1**), but also in relatively small sets of species within subfamily Arecoideae in the Neotropical genera *Desmoncus* and *Chamaedorea*, and in the Madagascan genus *Dypsis* (**Table 1**). The mainly Southeast Asian *Calamus* is the most species-rich genus of palms and is one of the most diverse genera of climbing plants (Gentry, 1991). Climbing in palms is typically facilitated by traits such as elongated stems that are stiffened by cylindrical leaf sheaths distally, the presence of spines on almost all organs, and wider vessels compared to their self-supporting counterparts (Tomlinson and Fisher, 2000). In addition, climbing palms have evolved two unique climbing organs: the cirrus, an extension of the leaf rachis usually equipped with recurved grapnel-like spines and, in some taxa, hook-like reflexed leaflets; and the flagellum, a modified, sterile inflorescence also armed with recurved grapnel-like spines, which is only found in *Calamus*. Both are highly efficient attachment structures for climbing (Dransfield et al., 2008; Isnard and Rowe, 2008). However, palms do not actively twine to gain support, but rather become anchored passively on adjacent vegetation via these climbing organs (Putz, 1990).

As in most other monocotyledons, palms lack a vascular cambium for secondary growth. Instead, they retain the primary anatomical architecture of the stem for their entire life. Nevertheless, they can still achieve remarkable heights of up to 60 m (Sanin and Galeano, 2011). While some tall tree palms have developed mechanical properties to minimize elastic buckling when achieving large heights (Rich, 1986), there seems to be a critical threshold that limits palm height growth (Niklas, 1993, 1994). Thus, the tree growth form in palms could be a competitive disadvantage in tall TRF environments where structural and functional limitations prohibit them from reaching the canopy. In contrast, climbing palm stems can attain astonishing lengths, reaching 170 m or more in *Calamus*, which represents the longest unrooted aerial plant stem on record (Burkill, 1966). Given the potential importance of forest structure for abundance and distribution of climbers (van der Heijden and Phillips, 2008), it can be hypothesized that on an evolutionary timescale the global biogeographic differences in tropical rain forest canopy heights (Lefsky, 2010) might have influenced the diversification of climbing palms.

Here, we combine macroecological and macroevolutionary analyses to investigate the role of climbing in the evolutionary history of palms. We quantify global patterns and drivers of climbing palm diversity and ask what role the climbing habit has played in the diversification of palms through time. Specifically, we test the following hypotheses and corresponding predictions:

	- (a) Due to physiological and functional adaptations of palms, species richness of climbing palms is positively correlated with temperature and precipitation and negatively with temperature seasonality.
	- (b) Differences in canopy height among TRFs explain global variation in species richness of climbing palms, with tall forests having more climbing species than short-stature forests.
	- (a) Diversification rates are higher for climbing than non-climbing palm lineages.
	- (b) The origin of the climbing habit correlates with increases in diversification rates.



Data are based on the World Checklist of Palms (Govaerts et al., 2014). Data for the genus Dypsis have been additionally updated (now two climbing species).

### **MATERIALS AND METHODS**

#### **PALM SPECIES DISTRIBUTION DATA**

We used data on global palm species distributions from an exhaustive, authoritative checklist of the World's palm species (Govaerts et al., 2014, data used here accessed March 2009). Though coarse in resolution, this dataset currently represents the most complete and reliable source on species distributions of all palms worldwide. The dataset records palm species presences and absences across the world within the level 3 geographic units as defined by the International Working Group on Taxonomic Databases (TDWG; Brummitt, 2001). These TDWG level 3 units mostly correspond to countries and/or major islands, such as Borneo, Madagascar, and New Guinea, but very large countries such as USA, Brazil, and China are subdivided into smaller units. We included only native palm occurrences, excluding introduced occurrences, and doubtful as well as erroneous records. To derive estimates of species richness, we summed all presences of palm species within each TDWG level 3 unit, and did this separately for climbing and non-climbing palms (see below).

The dataset contained a total of 2,445 accepted palm species names and 5,027 native occurrence records within 194 TDWG level 3 units (Kissling et al., 2012a). It does not include a recently described climbing species of *Dypsis* (Rakotoarinivo and Dransfield, 2010) which would raise the total number of liana species in this genus to two. The addition of this extra species would not impact the results presented here, but this new information was taken into account throughout the discussion and in **Table 1**.

#### **CATEGORIZATION OF GROWTH FORMS**

We classified all palm species into two growth forms: (1) climbers, and (2) non-climbers (including all other growth forms such as stemmed and acaulescent palms). Palm species that show leaning growth forms such as some *Bactris* species (not really climbers or trees) were considered as non-climbers. For climbers, we also included palm species that can, within the same species, show both climbing and non-climbing habits (only 16 species). The remaining species (*n* = 2431) were exclusively climbers or non-climbers. Information on climbing habit was derived from the literature (Russell, 1968; Dransfield, 1979, 1986; Dransfield and Beentje, 1995; Henderson et al., 1995; Henderson, 2002, 2009; Dransfield et al., 2008; Dowe, 2010) and for a few species supplemented with expert knowledge.
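The richness tally described above, split by the two growth forms, can be sketched as follows. This is a minimal illustration that assumes a flat record layout of (species, unit, status); the function name, the status label `"native"`, and the placeholder species and unit names are hypothetical and do not reflect the actual structure of the checklist database.

```python
from collections import defaultdict


def richness_by_unit(records, climbers):
    """Species richness per geographic unit for climbers and non-climbers.

    records:  iterable of (species, unit, status) tuples.
    climbers: set of species treated as climbers (including species
              that show both climbing and non-climbing habits).
    Only records with status "native" are counted, and each species is
    counted at most once per unit.
    """
    seen = set()
    richness = defaultdict(lambda: {"climber": 0, "non_climber": 0})
    for species, unit, status in records:
        if status != "native" or (species, unit) in seen:
            continue
        seen.add((species, unit))
        form = "climber" if species in climbers else "non_climber"
        richness[unit][form] += 1
    return dict(richness)


if __name__ == "__main__":
    # Placeholder records, not real occurrences.
    records = [
        ("species_A", "unit_1", "native"),
        ("species_B", "unit_1", "native"),
        ("species_B", "unit_1", "native"),      # duplicate, ignored
        ("species_C", "unit_1", "introduced"),  # excluded
        ("species_A", "unit_2", "native"),
    ]
    print(richness_by_unit(records, climbers={"species_A"}))
```

Units with no native records are simply absent from the result; a complete analysis would add them with zero richness for both growth forms.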

#### **ENVIRONMENTAL DETERMINANTS OF CLIMBER AND NON-CLIMBER SPECIES RICHNESS**

We tested 14 predictor variables as potential determinants of species richness in climbing and non-climbing palms. These variables reflected contemporary climate (six variables), paleoclimate (six variables), canopy height (one variable), and biogeographic region (one variable). Present and past climates (Kissling et al., 2012a; Blach-Overgaard et al., 2013; Rakotoarinivo et al., 2013) as well as biogeographic history (Baker and Couvreur, 2013a,b) are important drivers of broad-scale species distributions and diversity patterns in palms. Extracted at a coarse resolution (averaged within TDWG level 3 units), these predictor variables allow assessing how broad-scale trends in environmental conditions are related to geographic differences in species numbers of palms worldwide. We acknowledge that fine-scale heterogeneity (e.g., in climates and canopy heights) within TDWG level 3 units might not be well captured by such coarse-grained data, but analyses with new species distribution datasets at high resolution could incorporate some of this heterogeneity in the future.

#### *Contemporary climate*

To represent contemporary climate, we chose six climatic predictor variables from the Worldclim dataset (version 1.4; www.worldclim.org), a set of global climate layers with a spatial resolution of ca. 1 km<sup>2</sup> (Hijmans et al., 2005). We used (1) annual precipitation (PREC, in mm year<sup>−1</sup>), (2) annual mean temperature (TEMP, in °C × 10), (3) precipitation seasonality measured as variation of monthly values (PREC SEAS, in mm), (4) temperature seasonality measured as standard deviation of monthly means (TEMP SEAS, in °C × 10), (5) extremes of drought measured as precipitation of the driest quarter (PREC DRY, in mm), and (6) extremes of cold measured as mean temperature of the coldest quarter (TEMP COLD, in °C × 10). Data extraction and geoprocessing of climate data are described in more detail in Kissling et al. (2012a). Several of these variables, including PREC and various temperature measures (TEMP, TEMP SEAS, TEMP COLD), have been identified as important drivers of the global range and species richness of palms (Kissling et al., 2012a). Other climate variables (e.g., minimum or maximum monthly precipitation and temperature) are highly correlated with these climatic predictors and were hence not included here.

#### *Paleoclimate*

To represent paleoclimatic changes over the Neogene and Quaternary epochs, we compiled both temperature and precipitation data from paleoclimatic reconstructions representing the Last Glacial Maximum (LGM, ca. 21,000 years ago), the late Pliocene (∼3 mya), and the late Miocene (∼10 mya). Data for the LGM were compiled from two climate simulations representing the Community Climate System Model version 3 (CCSM3) and the Model for Interdisciplinary Research on Climate version 3.2 (MIROC3.2), both of which were part of the second phase of the Paleoclimate Modeling Intercomparison Project<sup>1</sup> (PMIP2; Braconnot et al., 2007). Paleoclimate data for deeper time periods were derived from coupled ocean–atmosphere general circulation models representing the late Pliocene (3.29–2.97 mya; Haywood et al., 2009) and the late Miocene (11.61–7.25 mya; Pound et al., 2011). All paleoclimate data were resampled in ArcGIS with a bilinear interpolation from the original resolution (2.5, 1°, or 2.5°) to the resolution of the contemporary climate data (see above). We then calculated anomalies (i.e., the difference between the current climate and the past) for all three time periods as well as for both precipitation and temperature data, resulting in six paleoclimatic predictor variables reflecting the change in climate since the LGM (LGMPREC, LGMTEMP), the Pliocene (PLIOPREC, PLIOTEMP), and the Miocene (MIOPREC, MIOTEMP). Anomalies were measured by subtracting the paleoclimate value in each TDWG level 3 unit from its present-day climate (i.e., contemporary climate minus paleoclimate). Large positive anomaly values indicate a higher precipitation and temperature in the present than in the past whereas small or negative anomaly values indicate the opposite, i.e., higher precipitation and temperature in the past than in the present.
Hence, a negative relationship between species richness and climate anomalies indicates that species richness is higher in areas that were relatively wetter or warmer in the past than today. TDWG level 3 unit values for LGM precipitation and temperature were calculated as mean values across two paleoclimatic simulations (CCSM3, MIROC3.2). Note that temperature anomalies since the LGM to the present (LGMTEMP) can be considered roughly representative for climatic oscillations of the whole Quaternary (the last 2.6 million years; Jansson, 2003; Kissling et al., 2012a).
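The anomaly calculation described above can be illustrated with a minimal sketch. The function names and the numerical values are hypothetical and serve only to show the sign convention (contemporary climate minus paleoclimate) and the averaging of the two LGM simulations; they are not taken from the original data.

```python
from statistics import mean


def lgm_value(ccsm3: float, miroc32: float) -> float:
    """Mean of the two LGM simulations for one TDWG level 3 unit."""
    return mean([ccsm3, miroc32])


def anomaly(present: float, past: float) -> float:
    """Anomaly defined as contemporary climate minus paleoclimate."""
    return present - past


# Hypothetical annual precipitation values (mm) for a single unit
present_prec = 2000.0
lgm_prec = lgm_value(ccsm3=1700.0, miroc32=1900.0)
lgm_prec_anomaly = anomaly(present_prec, lgm_prec)

# A positive anomaly means the unit is wetter today than at the LGM;
# a negative anomaly means it was wetter in the past.
print(lgm_prec_anomaly)
```

Under this convention, a unit that was wetter in the past receives a negative anomaly, which is why a negative regression coefficient points to higher richness in formerly wetter areas.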

#### *Canopy height*

We included canopy height (CANOPY) of tropical rain forests as a predictor variable to test whether species richness of climbing palms increases with the tallness of forests. Forest canopy height data were taken from a recent global map of forest heights (Lefsky, 2010), which was derived from LiDAR and multispectral remote sensing data for forest patches across the world (average patch size of approximately 25 ± 50 km<sup>2</sup>). The product provides the 90th percentile canopy height as an index that captures the tallest heights in a given stand (Lefsky, 2010). The original dataset with a 500 m resolution was geoprocessed in ArcGIS 10 and mean values of canopy height were calculated at the resolution of the TDWG level 3 units for which the palm distribution data are available. Note that the canopy height data only represent natural forests with >70% cover and therefore leave out savannas and woodlands (Lefsky, 2010). This is unproblematic for palms, which predominantly occur in tropical and subtropical forests.

#### *Biogeographic history*

We derived a biogeographic variable (REGION) to capture potential effects related to the long-term history of biogeographic regions. This categorical variable distinguished seven major regions: Afrotropics, Australasia, Indomalaya, Nearctic, Neotropics, Oceania, and Palaearctic (Kissling et al., 2012a). It can capture major differences in species richness and clade distributions between regions (Kissling et al., 2012a,b; Baker and Couvreur, 2013a,b) and permits examination of how species richness varies among realms once present-day environment and paleoclimate have been statistically accounted for (Kissling et al., 2012a).

#### *Statistical analysis of determinants of species richness*

We assessed the environmental determinants of species richness of both climbers and non-climbers with separate multi-predictor regression models. We only included TDWG level 3 units where species richness >0 and for which environmental data were available. A number of smaller islands had to be excluded for non-climbing palms because canopy height data were not available for them. Hence, final sample sizes for the statistical analysis were 82 and 164 TDWG level 3 units for climbers and non-climbers, respectively.

<sup>1</sup>http://pmip2.lsce.ipsl.fr/

We used generalized linear models (GLM) with a Gaussian error distribution, with all 14 predictor variables as candidate predictors and species richness of either climbers or non-climbers as the response variable. We then applied model selection based on the Akaike Information Criterion (AIC) to derive a minimum adequate model, i.e., the model with the lowest AIC value, which balances goodness of fit against the number of predictor variables (Burnham and Anderson, 2002). We used variance inflation factors (VIF) to test for multicollinearity among the predictors included in the regression models, excluding variables with VIF >10 before AIC model selection (TEMP SEAS and TEMP COLD for climber richness, TEMP COLD and PREC DRY for non-climber richness). All continuous predictor variables were scaled before the analysis (standardized to mean = 0 and SD = 1). We compared the relative importance of predictor variables to explain species richness of climbing and non-climbing palms with semi-standardized coefficients (for continuous variables only). Both response variables as well as several predictor variables (TEMP, PREC SEAS, TEMP SEAS, PREC DRY, LGMTEMP) were log-transformed to improve normality. We included second-order polynomials to account for non-linear relationships if single-predictor models with polynomials showed statistically significant improvements over models without polynomials (using an ANOVA model comparison at *P* < 0.05; Crawley, 2007). The reference level of the categorical variable REGION was set to the Indomalayan region because this allowed us to compare all other regions with the region that has the highest species richness of climbing palms. For climbing palms, TDWG level 3 sample sizes for the Nearctic, Oceania, and the Palaearctic were small (*n* ≤ 1), and statistical analyses of climbing palm species richness were therefore restricted to the Afrotropics, Australasia, Indomalaya, and the Neotropics.
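The scaling and multicollinearity screening steps can be sketched as follows. This is an illustrative pure-Python version rather than the R code used for the analyses: it relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix. All predictor values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Scale a variable to mean = 0 and SD = 1 (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of predictor columns."""
    z = [standardize(col) for col in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors: the first two are nearly collinear, the third is not
names = ["TEMP", "TEMP COLD", "PREC SEAS"]
temp = [1.0, 2.0, 3.0, 4.0]
temp_cold = [1.1, 1.9, 3.2, 3.9]
prec_seas = [1.0, -1.0, -1.0, 1.0]

vifs = vif([temp, temp_cold, prec_seas])
flagged = [name for name, v in zip(names, vifs) if v > 10]  # threshold from the text
print(flagged)
```

In this toy example the two collinear temperature variables exceed the VIF threshold of 10, so one of them would be dropped before AIC-based model selection.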

To test for a potential influence of spatial autocorrelation (Kissling and Carl, 2008), we further calculated Moran's *I* values on the residuals of our minimum adequate GLMs. The geographic distance for calculating Moran's *I* values was based on the closest neighbor of each TDWG level 3 unit, and significance of Moran's *I* was determined by permutation tests (*n* = 10000 permutations). Moran's *I* values of the residuals of both minimum adequate GLMs were not statistically significant (see Results) and residual spatial autocorrelation was therefore considered to be unimportant in our analyses. Hence, there was no need to additionally implement spatial regression models (Kissling and Carl, 2008). All statistical analyses on determinants of species richness were done with R version 3.0.2 (R Development Core Team, 2013). Moran's *I* analyses were conducted using the R library 'spdep'.
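As a rough illustration of the residual autocorrelation test, the sketch below computes Moran's *I* with closest-neighbor weights and a one-sided permutation test. It is not the 'spdep' implementation: Euclidean distance on hypothetical coordinates stands in for geographic distance, the residual values are invented, and the default number of permutations is lower than the 10000 used in the analyses.

```python
import math
import random


def nearest_neighbour(coords):
    """Index of the closest other unit for each unit (Euclidean distance)."""
    return [
        min((k for k in range(len(coords)) if k != i),
            key=lambda k: math.dist(p, coords[k]))
        for i, p in enumerate(coords)
    ]


def morans_i(values, nn):
    """Moran's I with one binary weight per unit (its closest neighbour)."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    num = sum(dev[i] * dev[nn[i]] for i in range(n))
    den = sum(d * d for d in dev)
    # With exactly one neighbour per unit, the weights sum to n, so n / S0 = 1
    return num / den


def permutation_test(values, nn, n_perm=999, seed=1):
    """Observed I and one-sided P value from random reshuffling of values."""
    rng = random.Random(seed)
    observed = morans_i(values, nn)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, nn) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical units in two spatial clusters with clustered residuals
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
residuals = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

nn = nearest_neighbour(coords)
observed_i, p_value = permutation_test(residuals, nn)
print(observed_i, p_value)
```

A non-significant *P* value from such a test is what justifies retaining the non-spatial GLMs.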

#### **DIVERSIFICATION ANALYSES**

Beyond examining the determinants of species richness of climbers and non-climbers, we performed diversification analyses to test whether the evolution of climbers has an impact on diversification rates as a whole and for specific clades.

#### *Overall diversification analysis of climbing palms*

We used the Cladogenetic State Speciation and Extinction model (ClaSSE, Goldberg and Igić, 2012) to test if climbing species have overall higher diversification rates than non-climbing palms. We generated a species-level phylogeny of palms (i.e., a dated phylogeny with 2,445 tips), using as an initial backbone the genus-level, fossil-calibrated tree of Couvreur et al. (2011a), derived from the global palm supertree of Baker et al. (2009). This phylogenetic tree provides the most robust hypothesis of relationships between palm genera to date. We then simulated random species phylogenies at the tips (i.e., the genera) of this backbone tree using known species diversity for each genus (see above). This was done 100 times to obtain 100 species-level trees. We used the script "gratfMissingTaxa" written by François Michonneau<sup>2</sup> and modified it to simulate phylogenies under a pure birth model (sim.bdtree function in APE, Paradis et al., 2004). For each genus, the crown node of the simulated species-level phylogeny was inserted either (i) randomly along the whole length of the subtending branch of the genus (referred to as the 'non-constrained analysis'), or (ii) following a uniform distribution between 0.2 and 0.8 (i.e., between 20 and 80% of the whole length of the subtending branch, referred to as 'constrained analysis'). This latter approach was used to test the effect of extremely young or old inferred crown node ages on the results.
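The difference between the two crown node placements can be sketched as below. This is only an illustration of the sampling step, not the modified R script itself; the function name, the branch length, and the choice to express the position as a fraction of the subtending branch are assumptions made for the example.

```python
import random


def draw_crown_position(branch_length, constrained, rng):
    """Draw the position of a genus crown node along its subtending branch.

    Non-constrained: uniform over the whole branch (0-100% of its length).
    Constrained: uniform between 20 and 80% of its length.
    """
    low, high = (0.2, 0.8) if constrained else (0.0, 1.0)
    return rng.uniform(low, high) * branch_length


rng = random.Random(42)
branch_length = 10.0  # hypothetical subtending branch length (million years)

non_constrained = [draw_crown_position(branch_length, False, rng) for _ in range(100)]
constrained = [draw_crown_position(branch_length, True, rng) for _ in range(100)]

# One draw per genus and per replicate tree; 100 replicates mirror the 100 trees
print(min(constrained), max(constrained))
```

The constrained draws never fall in the outer 20% of the branch at either end, which excludes the extremely young or old crown ages that the non-constrained draws can produce.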

Different proportions of climbers and non-climbers within genera (see **Table 1**) can create a problem when simulating random species relationships at the tips of the phylogeny because species-level relationships are unknown. A preliminary analysis suggested that coding all species within climbing genera as climbers (generic level coding) or coding the observed proportions within each genus (i.e., coding each species as is) had little effect on the results. We therefore used the generic level coding approach for subsequent analyses.

The 100 trees generated above were combined with the trait dataset of growth forms. The ClaSSE model is an extension of the binary state speciation and extinction model (BiSSE; Maddison et al., 2007). The ClaSSE model has ten parameters (**Figure 1**): two cladogenetic speciation rates without character change associated with the non-climbing habit (0) [λ000; the abbreviation means that one lineage with state 0 gives two lineages with state 0 and 0 (000)] and the climbing habit (1) (λ111), four cladogenetic speciation rates with character change (λ110, λ001, λ100, and λ011), two extinction rates associated with non-climbing (μ0) and climbing (μ1), and two anagenetic state change rates, one from climbing to non-climbing (q10) and one from non-climbing to climbing (q01). For clarity, we call the speciation rate that produces two daughter lineages from one parent without character changes (λ000 and λ111) the "symmetrical" speciation rate, and the speciation rate that produces two daughter lineages from one parent with character changes (λ110, λ001, λ100, and λ011) the "asymmetrical" speciation rate. Overall, speciation rates for a given growth form are the sum of symmetrical and asymmetrical speciation rates estimated by the ClaSSE models.
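Written out under the notation above, where the first index of each λ gives the state of the parent lineage (0 = non-climbing, 1 = climbing), the overall speciation rates are the following sums. The symbols λ<sub>0</sub>, λ<sub>1</sub>, and *r<sub>i</sub>* are introduced here for illustration only; the net diversification rate follows the usual definition of speciation minus extinction.

```latex
\begin{aligned}
\lambda_0 &= \lambda_{000} + \lambda_{001} + \lambda_{011}
  && \text{(overall speciation rate of non-climbers)} \\
\lambda_1 &= \lambda_{111} + \lambda_{110} + \lambda_{100}
  && \text{(overall speciation rate of climbers)} \\
r_i &= \lambda_i - \mu_i, \quad i \in \{0, 1\}
  && \text{(net diversification rate of state } i\text{)}
\end{aligned}
```

In each sum the first term is the symmetrical rate and the remaining two terms are the asymmetrical rates for that parent state.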

To identify the best model given our data we considered ten ClaSSE diversification scenarios (**Figure 1**):


<sup>2</sup>https://stat.ethz.ch/pipermail/r-sig-phylo/2012-January/001826.html


All analyses on both datasets were performed using the R package *diversitree* 0.7–6 (FitzJohn, 2012). For each tree, we computed the AICc (AIC corrected for finite sample sizes) corresponding to each ClaSSE model. In addition, we evaluated the support for the selected model against all models nested within it using the likelihood ratio test (LRT, significant at *P* < 0.05). The scenario supported by the LRT and with the lowest AICc value was considered the best given the data over all 100 trees. We did not undertake Markov Chain Monte Carlo (MCMC) analyses to estimate the confidence intervals of the parameters because these are highly dependent on each individual tree shape and thus have no real meaning in our context (randomly generated species-level trees). In contrast, the maximum likelihood (ML) analyses are averaged over all trees and here provide a better way to take the phylogenetic uncertainty into account.
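The model comparison across trees can be sketched as follows, using the standard small-sample correction AICc = −2 ln *L* + 2*k* + 2*k*(*k* + 1)/(*n* − *k* − 1). The log-likelihoods, model names, and parameter counts are hypothetical, and taking the number of tips as the sample size *n* is an assumption of this sketch.

```python
from statistics import mean


def aicc(log_lik, k, n):
    """AIC corrected for finite sample size (k parameters, n observations)."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def mean_aicc_per_model(loglik_by_model, n_params, n):
    """Average the AICc of each model over all replicate trees."""
    return {
        model: mean(aicc(ll, n_params[model], n) for ll in logliks)
        for model, logliks in loglik_by_model.items()
    }


# Hypothetical log-likelihoods for two models fitted to two replicate trees
loglik_by_model = {"full": [-100.0, -102.0], "reduced": [-101.0, -103.0]}
n_params = {"full": 10, "reduced": 9}
n_tips = 2445  # assumption: number of tips used as sample size

scores = mean_aicc_per_model(loglik_by_model, n_params, n_tips)
best = min(scores, key=scores.get)
print(best, scores)
```

Averaging over replicate trees, rather than selecting a model on any single tree, is what allows the comparison to reflect the uncertainty in the simulated species-level relationships.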

#### *Clade specific diversification analyses*

In order to test for possible shifts in diversification rates (speciation minus extinction) associated with the evolution of the climbing habit we used the Bayesian Analysis of Macroevolutionary Mixtures approach implemented in BAMM version 2.2.0 (Rabosky, 2014). This analysis is complementary to ClaSSE in that BAMM tests for rate shifts across lineages whereas ClaSSE investigates the significance of the climbing habit for the diversification of the family as a whole.

Previous methods for identifying diversification rate shifts in phylogenies assumed constant rates between shifts (e.g., MEDUSA, Alfaro et al., 2009). This assumption is generally violated, especially in large trees (Morlon et al., 2011; Morlon, 2014). BAMM explicitly accounts for rate variation through time and uses a reversible jump MCMC algorithm to quickly explore numerous candidate models of lineage diversification (Rabosky, 2014). BAMM has been shown to identify increases and decreases in diversification rates better than other methods such as MEDUSA (Rabosky, 2014).

BAMM accommodates incomplete taxon sampling under a phylogenetically structured sampling. In our phylogeny, we have included most palm genera, each one being represented by a single species. We thus provided BAMM with the proportion of species sampled per genus (i.e., 1/number of species in genus). We used the chronogram of palm genera as the input tree (Couvreur et al., 2011a). Priors were estimated with BAMMTools (Rabosky et al., 2014b) using the function "setBAMMpriors". A compound Poisson process is implemented in BAMM for the prior probability of a rate shift along any branch. A prior value of 1.0 implies a strong prior assumption of no rate shifts across the phylogeny. However, prior studies using MEDUSA strongly suggest rate heterogeneity in palms (Baker and Couvreur, 2013b). We therefore ran our analyses under two different priors: (1) a value of 1.0, mainly to re-test the hypothesis that palms underwent a rate shift at least once in their evolutionary history, and (2) a value of 0.1 (which generates a more flattened distribution around six rate shifts) reflecting our prior knowledge and from which we derived our conclusions. In each case, we ran three independent MCMC runs for 1.5 million generations, sampling event data every 1000 steps. After checking for convergence of parameter estimates using the effective sample size (ESS), we re-ran the analysis for 5 million generations, sampling every 5000 steps.

Post run analyses were undertaken in BAMMTools following Rabosky et al. (2014a). To visualize where in the tree the shifts occurred, we generated the mean phylorate plot which represents the mean diversification rate (option spex = "se" in plot.bammdata) sampled from the posterior at any point in time along any branch of the phylogenetic tree (Rabosky et al., 2014a).

In contrast to methods that identify a single best rate shift configuration across a tree (e.g., MEDUSA), BAMM identifies a set of most credible rate shift sets (CSS), ordering them by posterior probability (Rabosky et al., 2014a). Here, we selected the CSS based on a Bayes factor (BF) of 50 or more. Even though the BAMM website<sup>3</sup> suggests that a BF of five provides substantial evidence against the null hypothesis (no rate shift along a branch), we follow the widely cited table of Kass and Raftery (1995) where strong support should be concluded from BF between 20 and 150.
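To make the BF threshold concrete, the sketch below treats the branch-specific Bayes factor as the ratio of posterior odds to prior odds of a rate shift on a branch. This is a generic formulation offered as an assumption about how such evidence can be summarized, not a reproduction of the BAMMTools code; the probabilities are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds of a shift on a branch divided by its prior odds."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


BF_THRESHOLD = 50.0  # threshold used in the text

# Hypothetical branches: (posterior probability, prior probability) of a shift
branches = {"branch A": (0.50, 0.02), "branch B": (0.60, 0.02)}

retained = {
    name: bayes_factor(post, prior)
    for name, (post, prior) in branches.items()
    if bayes_factor(post, prior) >= BF_THRESHOLD
}
print(retained)
```

Because the BF is scaled by the prior odds, a branch with a moderate posterior probability can still carry strong evidence for a shift when its prior probability is very low.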

Finally, we scaled the phylogenetic tree to be proportional to the BF and marginal probabilities of a rate shift along a branch. This helps to visualize the topological location of diversification rate shifts.

#### **ANCESTRAL CHARACTER RECONSTRUCTION**

To identify how many times and when the climbing habit arose in palms above the genus level, we conducted an ancestral character reconstruction using a stochastic character (posterior) mapping approach, as implemented in the program SIMMAP (Huelsenbeck et al., 2003; Bollback, 2006). We used the generic level chronogram of Couvreur et al. (2011a; 183 tips) and the genus-level coded dataset (derived from the monomorphic dataset used for the diversification analyses above). This approach does not take into account the evolution of climbers within mainly non-climbing genera such as *Dypsis* and *Chamaedorea* (three species only). We also coded the genera *Calamus* and *Daemonorops* (subfamily Calamoideae; subtribe Calaminae) as ancestrally climbing, even though a few species within each genus are non-climbers (**Table 1**). Ancestral states were estimated using the *make.simmap* function in the R package phytools (Revell, 2012). Because specifying incorrect prior values can influence posterior mapping results (Couvreur et al., 2010), we used the empirical approach of *make.simmap*, in which the priors (α and β) of the transformation rate from one character to another (γ) were specified as follows: β = 5 and α = β × ML(Q), with Q being the transition matrix between both states. This was achieved using the *use.empirical* = *TRUE* option in *make.simmap*. We undertook 10,000 generations, sampling every 100 steps. We used the function 'densityMap' in phytools to depict the changes in posterior probabilities (PP) along branches of the phylogeny.

#### **RESULTS**

#### **GLOBAL SPECIES RICHNESS OF CLIMBING AND NON-CLIMBING PALMS**

Out of 2,446 palm species in our dataset, a total of 535 species (22%) were classified as climbers. The majority of climbing palm species are found within the genera *Calamus* (348 species) and *Daemonorops* (92 species). In most cases, genera with climbing habits have a high (≥90%) proportion of climbing species (**Table 1**). Geographically, climbing palm species occur in all four major tropical regions (Afrotropics, Australasia, Indomalaya, Neotropics; **Table 1**; **Figure 2**) and two species in *Calamus* even reach Oceania. Climber species richness peaked in Indomalaya, closely followed by Australasia, with the Neotropics showing the lowest climber richness (**Figure 2A**). This contrasted with the species richness of non-climbing palms, which, when compared to other tropical regions, was highest in the Neotropics and lowest in the Afrotropics (**Figure 2B**).

#### **DETERMINANTS OF SPECIES RICHNESS IN CLIMBING PALMS**

Among contemporary climatic variables, TEMP (positive effect) and PREC SEAS (negative effect) were the most important variables to explain climber species richness in the minimum adequate models (**Figures 3B,C**; **Table 2**). Interestingly, PREC SEAS showed contrasting effects for climbers vs. non-climbers (negative vs. positive sign). Two additional contemporary climatic variables (PREC, TEMP SEAS) were of further importance to explain species richness of non-climbing palms, but not climbers (**Table 2**). Paleoclimatic changes (anomalies) since the Pliocene (PLIOPREC) and Miocene (MIOPREC) showed negative effects on species richness of both climbers (**Figures 3D,E**) and non-climbers (**Table 2**). This indicates that areas that were relatively wetter during the late Pliocene or late Miocene tend to have more palm species today than areas that were relatively drier in the past. In contrast to Pliocene and Miocene climate variables, precipitation and temperature anomalies since the LGM were not important to explain species richness of climbing palms, but the LGMTEMP effect was important for non-climbing palms (**Table 2**).

<sup>3</sup>http://bamm-project.org/postprocess.html

The minimum adequate models included CANOPY as an important predictor variable for the species richness of both non-climbing and climbing palms, but the effect was stronger for climbers than for non-climbing palms (**Table 2**). The CANOPY effect on climbing palm species richness was positive (**Figure 3A**), supporting the hypothesis that richness increases with canopy height of forests. In addition to CANOPY and contemporary climate as well as paleo-climatic variables, biogeographic region (REGION) was also selected in the minimum adequate models (**Table 2**). Climbers showed a significantly higher species richness in Indomalaya relative to other tropical regions (**Figure 3F**) whereas non-climbers showed a significantly lower species richness in the Afrotropics relative to Indomalaya and the Neotropics (**Table 2**). These results were similar to the trends in raw species richness among realms (**Figure 2**), but they additionally accounted for major differences in contemporary climate, canopy height, and paleoclimate. This suggests that deep-time historical effects beyond those driven by Quaternary and late Neogene climate contribute substantially to the major differences in species richness of climbers and non-climbers among regions.

#### **DIVERSIFICATION ANALYSES**

Comparing the ten ClaSSE diversification models revealed that model 7, the cladogenetic speciation and extinction model, was the best fitting model based on average AICc values from the 100 trees (**Table 3**). This was the case for both the non-constrained and the constrained analysis. This model suggested that the anagenetic state change rates between climbing and non-climbing palms are equal (q10 = q01, **Figure 1**), while all other parameters are significantly different. Climbers had on average significantly higher speciation and extinction rates when compared to non-climbers (**Table 3**).

The BAMM analyses reached a stationary state well before 100,000 generations in all independent runs. The ESS values for the number of shifts and the log-likelihood were always above 200, indicating appropriate sampling of parameters from the posterior. Under the Poisson prior of 1.0, the zero rate shift model was rarely sampled from the posterior, strongly supporting the hypothesis of diversification rate heterogeneity across palms. Based on a Poisson prior of 0.1, the most probable number of rate shifts was 9 (PP = 0.135), closely followed by 8 (PP = 0.132), and 10 (PP = 0.131). This suggests that diversification rates most likely changed between 8 and 10 times across the history of palms, although no single number of shifts is strongly favored.

The phylorate plot shows an increase in mean diversification rates at the crown node of subtribe Calaminae (depicted by an arrow in **Figure 4A**). Calaminae is the largest clade of climbing palms, containing around 20% of all palm species (Dransfield et al., 2008). We also note an increase in diversification rates for subtribe Bactridinae, which contains one genus of climbing palms (*Desmoncus*). Other climber-dominated clades do not show such increases.

The six most probable CSSs (with a cumulative PP of 0.614, **Figure 5**) show that significant rate increases (red circles) are mainly located on the branches leading to part of the tribe Areceae, most of tribe Trachycarpeae and most of subtribe Bactridinae. No significant rate shifts within these sets were identified in relation to climber-dominated clades, which are mainly found within Calamoideae (**Figure 5**). **Figure 6** represents the phylogenetic tree of palms where branch lengths were scaled to their BF (a) or marginal probabilities (b) of containing a rate shift. The shift in probability for subtribe Calaminae is visible in both cases but not significant when compared to other branches across palms.

#### **ANCESTRAL CHARACTER RECONSTRUCTION**

Posterior mapping identified an average of 4.9 transformations from the non-climbing to the climbing state across the 100 trees, and 0.61 transformations in the opposite direction (reversals). These results do not include the three climbing species within the genera *Dypsis* and *Chamaedorea*, but in terms of timing of the origin of climbers in palms this omission has no effect on the results. The ancestral states of the crown nodes of the subtribes Calaminae, Plectocomiinae, and Ancistrophyllinae were strongly supported as climbing [PP(1) = 1.00; 0.99; 0.98, respectively]. **Figure 4B** shows that the climbing habit first evolved in palms along the branch leading to the crown node of subtribe Ancistrophyllinae during the early Eocene (55–47 mya). A second and third origin of the climbing habit were dated to the late Eocene and early Oligocene (44–33 mya) along the branches leading to the crown nodes of the subtribes Plectocomiinae and Calaminae. The timing of the last two origins could not be estimated because the habit evolved along the stem lineages of the South American genus *Desmoncus* and the Southeast Asian genus *Korthalsia* (**Figure 4B**).

#### **DISCUSSION**

#### **ENVIRONMENTAL AND GEOGRAPHIC CORRELATES OF CLIMBING PALM SPECIES RICHNESS**

Our results show that climbing palm species richness is associated with present-day climate (temperature, precipitation seasonality) and paleoclimatic changes since the late Neogene (Miocene and Pliocene; **Table 2**; **Figure 3**). An increase in species richness with higher temperatures was observed for both climbers and non-climbers, which is in line with our hypothesis (prediction 1a) and consistent with findings on global palm diversity patterns (Kissling et al., 2012a). This relationship reflects the limited physiological and functional adaptations of palms, which reduce their survival in areas with cold climates (Tomlinson, 2006). However, the missing effect of temperature seasonality and the observed negative effect of precipitation seasonality on species richness of climbers contrast with the patterns found for non-climbers (**Table 2**). The negative correlation with precipitation seasonality also contrasts with woody climbing plants, which increase (rather than decrease) in diversity with precipitation seasonality (Schnitzer, 2005). These trends could be explained by climbing palms being mostly found within Calamoideae, a subfamily that is constrained to warm and humid environments (Baker et al., 2000c; Couvreur et al., 2011a). The effect of precipitation changes (rather than absolute levels of contemporary precipitation) is also supported by the paleoclimatic effects in our regression models, which showed that Miocene and Pliocene precipitation anomalies had negative effects on climbing palm species richness. Hence, areas that were relatively wetter during the late Miocene or late Pliocene today have more climbing palm species than areas that were drier in the past. This suggests that multimillion-year non-equilibrium dynamics in diversity–climate relationships play a role in explaining present-day diversity of palms (Blach-Overgaard et al., 2013).


**Table 2 | Multiple-predictor regression models to explain global species richness of climbing (***n* **= 534) and non-climbing (***n* **= 1911) palms.**

Based on the AIC, a minimum adequate model was selected from the full set of predictor variables after removing those variables that showed high multicollinearity. Standardized coefficients are given for continuous predictor variables (scaled before the analysis to mean = 0 and SD = 1). The effect of the categorical variable REGION is relative to Indomalaya. Statistically significant effects of continuous and categorical variables are highlighted in bold. Sampling units are International Working Group on Taxonomic Databases (TDWG) level 3 units (n = 82 for climbers; n = 164 for non-climbers). '–' indicates variables that were not selected; 'NA' indicates unavailable variables or categorical levels. Residual spatial autocorrelation was tested using Moran's I values based on the closest neighbor of each TDWG level 3 unit. Species richness and several continuous predictor variables (TEMP, PREC SEAS, TEMP SEAS, PREC DRY, LGMTEMP) were log<sub>10</sub>-transformed. Abbreviations of predictor variables: CANOPY, canopy height; PREC, annual precipitation; TEMP, annual mean temperature; PREC SEAS, precipitation seasonality; TEMP SEAS, temperature seasonality; PREC DRY, precipitation of driest quarter; TEMP COLD, mean temperature of coldest quarter; LGMPREC, Last Glacial Maximum precipitation anomaly; LGMTEMP, Last Glacial Maximum temperature anomaly; PLIOPREC, Pliocene precipitation anomaly; PLIOTEMP, Pliocene temperature anomaly; MIOPREC, Miocene precipitation anomaly; MIOTEMP, Miocene temperature anomaly; REGION, biogeographic region (categorical variable, effects are relative to the Indomalayan region). Significance levels: \*\*\*P < 0.001; \*\*P < 0.01; \*P < 0.05; n.s., not significant. R<sup>2</sup>, explained variance.


**Table 3 | Inferred rates of speciation and extinction for climbing and non-climbing palms using the Cladogenetic State Speciation and Extinction model (ClaSSE) for the best fitting model (model 7) out of ten (see Figure 1).**

In model 7 (non-constrained analysis), simulated crown nodes were allowed to vary between 0 and 100% of the whole length of the subtending branch of the genus. In model 7 (constrained analysis), the crown nodes of genera were restricted to be between 20 and 100% of the whole length of the subtending branch of the genus (see Materials and Methods). df, degrees of freedom; LogL, log-likelihood of the model; AICc, Akaike Information Criterion (AIC) corrected for finite sample sizes.

Our results further support the hypothesis that climbing palms are more diverse in tall-stature forests than in lower canopy ones (**Figure 3A**; hypothesis and prediction 1b). Forest canopy is not uniform across continents, being highest in Southeast Asia and Africa when compared to South America and Australia (Banin et al., 2012). Indeed, higher canopies and larger trees could arguably provide a three-dimensional space that is more effectively exploited by certain kinds of climbing plants, particularly climbing palms. First, higher forests have a more patchy canopy because they have larger gaps in between tree trunks and fewer large trees reaching maximum heights (e.g., dipterocarp forests, see below, Richards, 1996). Second, larger trees can lead to larger tree-fall gaps, which have been shown to maintain a higher diversity of climber species in general (Putz, 1984; Schnitzer and Carson, 2001). However, in contrast to non-monocot lianas (Schnitzer, 2005), the majority of climbing palms might be less tolerant of strong disturbances. This is probably also reflected in the negative relationship of palm species richness with PREC SEAS (**Table 2**), which is opposite to dicotyledonous lianas (Schnitzer, 2005). Third, climbing palms have developed particularly long and light stiff "searcher-stems" compared with many other woody climbers, and they also produce long and sharp-toothed cirri and flagella. These climbing traits are particularly effective for spanning large gaps between large and tall canopy trees.

As indicated by Isnard and Rowe (2008), there is relatively little developmental and anatomical difference between climbing and non-climbing palms compared with clades of woody species in which climbers can develop highly derived and complex stem structures via anomalous secondary growth. Since the arguably more "simple" palm organization can develop effective "liana"-sized climbers, it is intriguing that palms have not evolved the climbing habit more often and more widely, especially in the Neotropics. With just seven species, the Neotropics contain few representatives of the subfamily Calamoideae (Dransfield et al., 2008) and the observed diversity anomaly among regions could thus be related to phylogenetic constraints within palms (Bjorholm et al., 2006). Compared with the Calamoideae, few climbers have evolved within the Arecoideae (the largest palm subfamily in general and also within the Neotropics, Pintaud et al., 2008; **Figure 4**). This is reflected in genera such as *Chamaedorea* and *Dypsis* (Madagascar), which contain one and two climbing species out of 110 and 140 species, respectively (Dransfield et al., 2008). The failure to diversify as climbers might be related to the fact that they are non-spiny, in contrast to the majority of other climbers in the Calamoideae. The only exception in Arecoideae is the Neotropical spiny genus *Desmoncus* (**Figure 4**), which has a currently reported diversity between 12 and 24 species (Isnard et al., 2005; Henderson, 2011). Other tropical plant families show a similar pattern of low climber diversity in the Neotropics compared to the Paleotropics. For example, Annonaceae, an important TRF plant family (Couvreur et al., 2011b), have a single climbing species in the Neotropics compared to ∼500 species in the Paleotropics. This might point toward potential phylogenetic constraints on certain plant families to successfully evolve and diversify as climbers in a given region. Some families might be particularly successful in their evolution toward a climbing form in one region, for example, the Neotropics (e.g., Bignoniaceae; Lohmann et al., 2013) whereas others might not (e.g., palms and Annonaceae).

#### **THE EVOLUTIONARY DIVERSIFICATION OF CLIMBING PALMS**

Evidence suggests that the evolution of climbers promoted diversification within angiosperms (Gianoli, 2004). The diversification history of palms has not been homogeneous across time (Baker and Couvreur, 2013b), and our results based on an improved diversification analysis approach (Rabosky, 2014) confirm this hypothesis. Indeed, a total of nine rate shifts across palms were detected with the highest posterior probability.

#### **FIGURE 4 | Evolutionary history of the climbing habit in palms.**

**(A)** Phylorate plot of the mean diversification rates sampled from the posterior (red = high diversification rates; blue = low diversification rates) resulting from the BAMM output using the dated generic-level chronogram (Couvreur et al., 2011a) in millions of years ago (mya). Each terminal branch represents a genus. The figure also shows the phylogenetic distribution of genera that include climbing species either containing entirely or mainly climbing species (red circles) or mainly non-climbing species (one or two species of climbers, blue circles). All other non-highlighted genera are strictly non-climbers. The arrow indicates the crown node of subtribe Calaminae. Numbers refer to subtribes or genera (1: Ancistrophyllinae; 2: Korthalsiinae (Korthalsia); 3: Plectocomiinae; 4: Calaminae; 5: Desmoncus; 6: Chamaedorea; 7: Dypsis). **(B)** DensityMap plot of the posterior probabilities (PP) of state 1 (climbing, blue) vs. state 0 (non-climbing, red) along branches based on 10,000 generations using SIMMAP. The arrow indicates the crown node of subtribe Calaminae. The five state changes from non-climbing to climbing are represented by a change in color from blue to red. Photos illustrate typical climbing palms: **(C)** Korthalsia zippelii from Papua New Guinea (Photo: William J. Baker, www.palmweb.org), and **(D)** Laccosperma robustum, a frequent species throughout Central Africa (Photo: Thomas L. P. Couvreur, www.palms.myspecies.info).

**FIGURE 5 | Most probable combination of diversification rate shifts across palms.** The six most credible rate shift sets (CSS) with the highest posterior probability using a Bayes factor (BF) threshold of 50 based on the dated generic-level chronogram (Couvreur et al., 2011a). For each distinct shift configuration, the locations of diversification rate shifts are shown with filled circles (red = rate acceleration). Circle size is proportional to the marginal probability of that shift. Letters highlight clades that are associated with significant rate shifts (Ba, Bactridinae; Tr, Trachycarpeae; Ar, Areceae). The increase in diversification rates leading to the most species-rich clade of climbing palms, subtribe Calaminae (Ca), was not significant within the first six CSS.

The analyses presented here suggest an important evolutionary role of climbers in explaining present-day palm diversity (hypothesis 2). Our ClaSSE analyses indicated that across palms, species with a climbing habit diversified on average 1.3 times faster when compared to species with a non-climbing habit (λ<sub>111</sub> > λ<sub>000</sub>; see **Table 3**, prediction 2a). Elevated diversification rates for particular clades are highly dependent on young estimated crown node ages (Linder, 2008). In our case, young crown node ages of large climbing genera such as *Calamus* and *Daemonorops* will strongly pull toward higher speciation rates for climbers compared to non-climbers. An inverse result would occur if crown node ages were inferred to be very old for both these genera. Unfortunately, to date no valid estimations of crown nodes exist for genera within Calaminae. By constraining the crown node ages of all genera in our simulations to be between 20 and 80% of the stem node age ('constrained analysis'), we avoided bias due to very old or very young crown node ages. Under these conditions, the climbing habit was also found to have significantly higher speciation rates, albeit slightly smaller in magnitude (**Table 3**).
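The sensitivity of rate estimates to crown node age can be illustrated with a deliberately simple calculation. The sketch below uses a pure-birth crown-group estimator, r = (ln n − ln 2)/t, which is not the ClaSSE model used in this study. The function names, the clade size of 400 species, and the stem age of 30 Myr are hypothetical and chosen for illustration only; the 20–80% bounds follow the constrained analysis described in the text.

```python
import math
from typing import Tuple


def crown_rate_pure_birth(n_species: int, crown_age_myr: float) -> float:
    """Net diversification rate of a crown group under pure birth (no extinction)."""
    if n_species < 2 or crown_age_myr <= 0:
        raise ValueError("need at least two species and a positive crown age")
    return (math.log(n_species) - math.log(2)) / crown_age_myr


def crown_age_bounds(stem_age_myr: float, lo: float = 0.2, hi: float = 0.8) -> Tuple[float, float]:
    """Youngest and oldest crown ages allowed as fractions of the stem age."""
    return lo * stem_age_myr, hi * stem_age_myr


# Hypothetical clade (illustrative numbers, not taken from the study).
N_SPECIES = 400
STEM_AGE_MYR = 30.0

youngest, oldest = crown_age_bounds(STEM_AGE_MYR)
for age in (youngest, oldest):
    rate = crown_rate_pure_birth(N_SPECIES, age)
    print(f"crown age {age:5.1f} Myr -> r = {rate:.3f} species per Myr")
```

For a fixed species count the estimated rate scales inversely with crown age, so a fourfold difference in assumed crown age produces a fourfold difference in the rate. This is the reason for bounding simulated crown ages away from both extremes.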

Despite the overall higher diversification rate associated with climbers, the impact of the climbing trait on the diversification of specific clades is unclear (prediction 2a). Based on the analysis of the BAMM output, an increase in diversification rates (**Figure 4A**) is detected around the crown node of subtribe Calaminae (subfamily Calamoideae), the most species-rich clade of climbing palms with around 500 Southeast Asian species (Dransfield et al., 2008). Surprisingly, however, this increase is not significant when compared to other diversification rate shifts found across the family (**Figures 5** and **6**). Nevertheless, a previous diversification study using the same phylogenetic tree and data but based on a ML stepwise AIC approach implemented in MEDUSA identified two significant rate increases at the stem nodes of *Calamus* and *Daemonorops*, both genera belonging to Calaminae (Baker and Couvreur, 2013b). In addition, *Calamus* was found to have significantly more species than expected under a constant birth–death rate model as well as to have significantly higher diversification rates when compared to other genera under a high extinction rate assumption (Baker and Couvreur, 2013b). Even though these results should be interpreted with caution, taken together they support the idea that the evolution of the climbing habit in Calaminae positively impacted diversification rates in palms, resulting in the speciation of a fifth of all palm species.

The increase in diversification rates in Bactridinae does not appear to be related to the climbing habit that evolved in the genus *Desmoncus*. Indeed, this rate increase, which was also detected with the MEDUSA analysis (rate shift 8 of Baker and Couvreur, 2013b that included *Desmoncus*, *Bactris*, and *Astrocaryum*), concerns most of this subtribe (excluding *Acrocomia*, **Figure 5**) and was suggested to be related to the evolution of epidermal spines, a common trait of all Bactridinae members which functions as a protection against herbivory in the Neotropics (Baker and Couvreur, 2013b).

The ancestral state reconstructions indicated that the climbing habit evolved a minimum of five independent times in palms: four times in Calamoideae and once in Arecoideae (**Figure 4B**). These results are consistent with the inferences for Calamoideae of Baker et al. (2000a). However, the climbing habit arose at least seven independent times because of the three climbing species within *Dypsis* and *Chamaedorea* that were not taken into account in the analysis presented here (both genera coded as non-climbing). At least in *Chamaedorea*, the single climbing species (*C. elatior*) is nested within the genus validating this assumption (Cuenca and Asmussen-Lange, 2007). In addition, these independent evolutions will be valid if our coding assumptions of *Calamus* and *Daemonorops* as ancestrally "climber" are also correct. Indeed, both these genera have a small proportion of non-climbing species (**Table 1**), and in this study we did not take them into account (both genera coded as climbers). The phylogenetic relations within both genera and for the subtribe Calaminae in general remain insufficiently understood and the exact placement of the non-climbing species is unresolved (Baker et al., 2000b).

Our results reveal an interesting pattern: the climbing habit appears to have had an impact on diversification rates in Calaminae, but not in other clades/genera where it evolved (Ancistrophyllinae, Plectocomiinae, *Korthalsia*, *Desmoncus*). For example, the climbing subtribes Ancistrophyllinae and Plectocomiinae (**Figure 4A**) have relatively old stem node ages associated with few extant species (Dransfield et al., 2008; Sunderland, 2012; Baker and Couvreur, 2013a; Faye et al., 2014). Why did the climbing habit have such an important effect on the diversification in a particular clade and not in other clades? One reason might be that different morphological adaptations are underlying the "same" trait (Donoghue, 2005), i.e., that convergent evolution (homoplasy) is constructed quite differently in different lineages and therefore has different impacts on diversification rates. For instance, in palms the climbing habit in different clades is associated with different combinations of morphological characters (Baker et al., 2000a; Isnard, 2006). Comparative studies of anatomical characters between Calaminae (*Calamus* and *Daemonorops*) and *Desmoncus* and subtribe Plectocomiinae (although based on the study of a few species) suggest that the evolution of a more flexible stem within subtribe Calaminae could explain the outstanding diversification of this group in terms of climbing mechanics (Rowe et al., 2004; Isnard, 2006). In addition, the evolution of a unique climbing organ (the flagellum, a modified inflorescence) has taken place within *Calamus* (Dransfield et al., 2008) and might have provided an additional advantage over other climbing structures in palms. Moreover, the presence of a "knee," a swelling at the junction of the leaf sheath and petiole, in most species in subtribe Calaminae and, to some extent, in the African genus *Eremospatha* has been suggested as a potentially important trait for enhancing leaf strength at this important stress point (Isnard and Rowe, 2008). Overall, the repeated homoplasious occurrence of the climbing habit in the Calamoideae ('clustered homoplasy') could also indicate a more cryptic evolutionary innovation related to genetic or developmental precursors (Marazzi et al., 2012). In subfamily Calamoideae, this could be related to the tendency of organizing epidermal emergences as whorls, which manifest themselves, for example, both in the characteristic scales on the fruits as well as in organized grapnel spines on climbing organs. Finally, some life history traits might also act against increasing diversification rates. Indeed, Plectocomiinae and the genera *Korthalsia* (Korthalsiinae, **Figure 4C**) and *Laccosperma* (Ancistrophyllinae, **Figure 4D**) are hapaxanthic (individual stems die after a single flowering event), a condition which is generally not associated with high species richness across palms (Dransfield et al., 2008).
Thus, several morphological novelties of subtribe Calaminae absent from other climbing genera (more flexible stems, the evolution of a flagellum in *Calamus*, the better mechanical role of the leaf sheath under stress, and the presence of a knee) possibly played decisive functional roles in explaining the diversification of this group when compared to other climbing palms.

Explaining geographic differences in diversification rates (e.g., numerous climbing species in Southeast Asia vs. few in the Neotropics) might be related to the fact that a particular trait only increases diversification rates under certain environmental conditions (de Queiroz, 2002). For instance, a highly dynamic geographic setting linked to a complex biogeographic history of Southeast Asia (Hall, 2009) might be important in explaining the extraordinary diversification of rattans in this region (Baker and Couvreur, 2012). The estimated increase in diversification rates around the crown node of subtribe Calaminae (34 Ma, **Figure 4A**) coincides with a period of important geological activity (Oligocene, early Miocene) induced by the collision of Sundaland and Australia (Hall, 2009). The Oligocene–Miocene boundary was also an important climatic transition going from a seasonally dry to an everwet aseasonal climate (Morley, 2007). In line with our results, numerous other studies have underlined the importance of the Miocene in the diversification of the Southeast Asian flora (Morley, 2007; Su and Saunders, 2009; Lohman et al., 2011; Nauheimer et al., 2012; Thomas et al., 2012; Bacon et al., 2013; Buerki et al., 2013; Richardson et al., 2014).

One important and characteristic plant family in Southeast Asia is Dipterocarpaceae (Appanah and Turnbull, 1998), with major peaks in diversity in Malaysia and mainland Southeast Asia. Its species are generally large emergent trees, which often dominate the upper canopy of the region's forests, contributing to the much taller stature of the forests in this region relative to, e.g., the Neotropics (Richards, 1996). Interestingly, the first record of the family Dipterocarpaceae in Southeast Asia (Borneo) after its dispersal from India (Dutta et al., 2011) as well as the start of its estimated radiation in the region (Morley, 2000) correspond to the timing of increased diversification rates within the Calamoideae (Oligocene to early Miocene). We therefore hypothesize that the diversification of dipterocarps in combination with the evolution of several morpho-anatomical traits in Calaminae species could have triggered the radiation of climbing palms in this region. Ecological opportunity responses of one clade based on the success of another have been suggested in other cases such as ferns (Schneider et al., 2004) or ants (Moreau et al., 2006) during the radiation of angiosperms.

#### **CONCLUSION**

Global diversity patterns of climbing palms show a diversity anomaly relative to other palms, with a strong peak of species richness in Southeast Asian rain forests and low species richness in other regions (e.g., the Neotropics). Present-day climate, forest canopy heights, and paleoclimatic changes in the Neogene and Quaternary can partly explain this pattern, but they do not provide a sufficient explanation for the extraordinary diversification of climbing palms in Southeast Asia. An increase in diversification rates in Calaminae, even though not significant based on our data, relative to other climbers and non-climbers might instead be the outcome of anatomical and morphological innovations, the complex biogeographic history of Southeast Asia, and/or ecological opportunity responses to the regional presence and diversification of tall canopy trees such as dipterocarps. We suggest that, in addition to climatic and paleoclimatic factors, such historical and evolutionary contingencies play an important role in explaining present-day biodiversity across TRFs. New datasets (e.g., global high-quality species distribution data at fine resolutions, well resolved species-level phylogenies, and additional morphological trait data) as well as novel analytical tools will likely increase our knowledge of palm diversification and our understanding of tropical rain forest evolution in the future.

#### **ACKNOWLEDGMENTS**

We thank John Dransfield for species-level information about climbing palms and Yanis Bouchenak-Khelladi for discussions and help with R scripts and diversification analyses. We are also grateful to two anonymous reviewers whose comments improved this manuscript. We thank James Richardson, Valenti Rull, and Toby Pennington for inviting us to submit our work to this special issue. W. Daniel Kissling acknowledges a University of Amsterdam (UvA) starting grant. Jens-Christian Svenning was supported by the European Research Council (ERC-2012-StG-310886-HISTFUNC) and the Danish Council for Independent Research | Natural Sciences (12-125079).

#### **REFERENCES**


Goldberg, E. E., and Igić, B. (2012). Tempo and mode in plant breeding system evolution. *Evolution* 66, 3701–3709. doi: 10.1111/j.1558-5646.2012.01730.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 10 December 2014; published online: 08 January 2015.*

*Citation: Couvreur TLP, Kissling WD, Condamine FL, Svenning J-C, Rowe NP and Baker WJ (2015) Global diversification of a tropical plant growth form: environmental correlates and historical contingencies in climbing palms. Front. Genet. 5:452. doi: 10.3389/fgene.2014.00452*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Couvreur, Kissling, Condamine, Svenning, Rowe and Baker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae

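### *Kate E. Armstrong<sup>1,2,3</sup>\*, Graham N. Stone<sup>2</sup>, James A. Nicholls<sup>2</sup>, Eugenio Valderrama<sup>2,3</sup>, Arne A. Anderberg<sup>4</sup>, Jenny Smedmark<sup>5</sup>, Laurent Gautier<sup>6</sup>, Yamama Naciri<sup>6</sup>, Richard Milne<sup>7</sup> and James E. Richardson<sup>3,8</sup>*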


#### *Edited by:*

*Marshall Abrams, University of Alabama at Birmingham, USA*

#### *Reviewed by:*

*Marcial Escudero, Doñana Biological Station - Consejo Superior de Investigaciones Científicas, Spain*

*Ze-Long Nie, Chinese Academy of Sciences, China*

#### *\*Correspondence:*

*Kate E. Armstrong, The New York Botanical Garden, 2900 Southern Boulevard, Bronx, NY 10458, USA e-mail: karmstrong@nybg.org*

Species diversity is unequally distributed across the globe, with the greatest concentration occurring in the tropics. Even within the tropics, there are significant differences in the numbers of taxa found in each continental region. *Manilkara* is a pantropical genus of trees in the Sapotaceae comprising *c.* 78 species. Its distribution allows for biogeographic investigation and testing of whether rates of diversification differ amongst tropical regions. The age and geographical origin of *Manilkara* are inferred to determine whether Gondwanan break-up, boreotropical migration or long-distance dispersal have shaped its current disjunct distribution. Diversification rates through time are also analyzed to determine whether the timing and tempo of speciation on each continent coincide with geoclimatic events. Bayesian analyses of nuclear (ITS) and plastid (*rpl32-trnL*, *rps16-trnK*, and *trnS-trnFM*) sequences were used to reconstruct a species-level phylogeny of *Manilkara* and related genera in the tribe Mimusopeae. Analyses of the nuclear data using a fossil-calibrated relaxed molecular clock indicate that *Manilkara* evolved 32–29 million years ago (Mya) in Africa. Lineages within the genus dispersed to the Neotropics 26–18 Mya and to Asia 28–15 Mya. Higher speciation rates are found in the Neotropical *Manilkara* clade than in either African or Asian clades. Dating of regional diversification correlates with known palaeoclimatic events. In South America, the divergence between Atlantic coastal forest and Amazonian clades coincides with the formation of drier Cerrado and Caatinga habitats between them. In Africa diversification coincides with Tertiary cycles of aridification and uplift of the east African plateaux. In Southeast Asia dispersal may have been limited by the relatively recent emergence of land in New Guinea and islands further east *c.* 10 Mya.

**Keywords: Sapotaceae,** *Manilkara***, pantropical, biogeography, diversification rates**

#### **INTRODUCTION**

Biodiversity is unevenly distributed across the globe and is most intensely concentrated in the tropics, particularly in wet tropical forests, which are the most species-rich biomes on the planet. Even within the tropics, there are significant differences in the floristic composition and the numbers of taxa found in each of the continental regions. It is estimated that there are *c*. 27,000 species of flowering plants in tropical Africa (Lebrun, 2001; Lebrun and Stork, 2003), compared with *c*. 90,000 for South America (Thomas, 1999) and *c*. 50,000 for Southeast Asia (Whitmore, 1998). This uneven species diversity raises the fundamental question of how variation in the pattern and tempo of speciation and extinction among continents might have driven observed patterns. Differences in diversity have been attributed to higher extinction rates in Africa (Richards, 1973) and faster diversification in the Neotropics (Gentry, 1982). Dated molecular phylogenies suggest speciation in response to recent climatic changes (such as aridification, e.g., Couvreur et al., 2008; Simon et al., 2009) or geological phenomena (such as mountain uplift in the Neotropics, e.g., Richardson et al., 2001; Hughes and Eastwood, 2006).

Intercontinental disjunctions in distribution between tropical regions of Africa, Asia and South America have been attributed to Gondwanan break-up (Raven and Axelrod, 1974), and/or the degradation of the boreotropical flora (e.g., Malpighiaceae, Davis et al., 2002b; Meliaceae, Muellner et al., 2006; Moraceae, Zerega et al., 2005). However, current studies have shown that many tropical groups are of more recent origin (e.g., *Begonia*, Thomas et al., 2012), and that long-distance dispersal has been an important factor in determining the composition of modern tropical floras (Pennington et al., 2006; Christenhusz and Chase, 2013). While long-distance dispersal could have occurred at any time, it was generally believed to be the only viable explanation for tropical intercontinental disjunctions younger than *c.* 33 Mya (although see Zhou et al., 2012).

Pantropically distributed taxa are excellent models for studying the evolution of tropical forests and regional variation in diversification rates between continents. *Manilkara* is a genus of trees in the Sapotaceae comprising *c.* 78 species distributed throughout the tropics (30 in South and Central America, 35 in Africa and 13 in Southeast Asia). This even spread and relatively low number of species across major tropical regions makes *Manilkara* an excellent candidate for comparison of regional diversification patterns and testing of hypotheses for the genesis of pantropical distributions. Here a near species-level dated phylogeny of *Manilkara* is presented. If the distribution of the genus can be explained by Gondwanan break-up, the timing of phylogenetic splits would be expected to reflect that break-up 165–70 Mya (McLoughlin, 2001). Similarly, if splits resulted from the degradation of the boreotropical flora, they would be expected to occur as temperatures cooled following the Early Eocene Climatic Optimum/Paleocene–Eocene Thermal Maximum (EECO/PETM), 50–55 Mya (Zachos et al., 2001). Additionally, a boreotropical origin should leave a phylogeographic signature in the form of southern lineages being nested within more northern ones. Therefore, lineages in South America or to the east of Wallace's Line would be nested within Laurasian lineages, resulting in the pattern one would expect from a retreat of the boreotropical flora from the Northern Hemisphere. The onset of glaciation from 33 Mya induced further global cooling (Zachos et al., 2001) and the disintegration of the boreotropical flora. Therefore, ages of splits younger than *c*. 33 Mya would most likely be explained by long-distance dispersal. The prediction advanced by Gentry (1982) that diversification rates in the Neotropics have been higher than in other tropical regions is also tested.
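The age-based expectations set out above can be summarized as a small decision rule. The following is a minimal sketch under simplifying assumptions: the function name and the hard cut-offs are illustrative, the boreotropical window is taken as 55–33 Mya, and ages falling between the windows are returned as undetermined. Age alone does not settle the question, since the text also requires the nesting pattern of lineages to be considered.

```python
def candidate_explanation(split_age_mya: float) -> str:
    """Map the age of an intercontinental split to the hypothesis it is compatible with.

    Windows follow the Introduction: Gondwanan break-up 165-70 Mya,
    boreotropical degradation after the EECO/PETM (taken here as 55-33 Mya),
    and long-distance dispersal for splits younger than c. 33 Mya.
    """
    if split_age_mya < 0:
        raise ValueError("age must be non-negative")
    if 70 <= split_age_mya <= 165:
        return "Gondwanan break-up"
    if 33 <= split_age_mya <= 55:
        return "boreotropical retreat"
    if split_age_mya < 33:
        return "long-distance dispersal"
    return "undetermined"


# Upper bounds of the dispersal ages reported in the abstract.
for label, age in (("to the Neotropics", 26.0), ("to Asia", 28.0)):
    print(f"Dispersal {label} at {age} Mya: {candidate_explanation(age)}")
```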

#### **MATERIALS AND METHODS**

#### **DNA EXTRACTION, PCR, SEQUENCING, AND ALIGNMENT**

Evolutionary relationships were reconstructed using nuclear (ITS) and plastid (*rpl32-trnL, rps16-trnK*, and *trnS-trnFM*) sequences. Divergence times were calculated using an ITS dataset with 171 accessions of Sapotaceae. In total 53 of the global total of 79 *Manilkara* species (67%) were included in the analysis. The dataset includes representatives of the tribe Mimusopeae as well as multiple representatives of the tribes Isonandreae and Sideroxyleae, which also belong to the subfamily Sapotoideae, in order to accommodate calibration of fossils related to those groups. The tree was rooted using *Sarcosperma*, shown in previous studies to be sister to the rest of the family (Anderberg and Swenson, 2003). The plastid dataset comprised 95 accessions of subtribe Manilkarinae, as well as outgroups in subtribe Mimusopinae, plus *Northia*, *Inhambanella*, *Eberhardtia*, and *Sarcosperma*, which provided the root for the tree. See **Supplementary Table 1** for the list of taxa with voucher specimen information and GenBank accession numbers.

Total DNA was extracted from herbarium specimens and silica gel-dried leaf samples using the Qiagen Plant DNeasy Mini Kit following the manufacturer's instructions. Amplifications of the ITS region were performed using the ITS5p/ITS8p/ITS2g/ITS3p (Möller and Cronk, 1997) and ITS1/ITS4 (White et al., 1990) primer pairs. Polymerase chain reaction (PCR) was carried out in 25 µL volume reactions containing 1 µL of genomic DNA, 5.75 µL sterile distilled water, 2.5 µL 2 mM dNTPs, 2.5 µL 10x NH4 reaction buffer, 1.25 µL 25 mM MgCl2, 0.75 µL of each 10 µM primer, 10 µL 5 M betaine, 0.25 µL BSA and 0.25 µL of 5 u/µL Biotaq DNA polymerase buffer. The thermal cycling profile consisted of 5 min denaturation at 95 °C, followed by 35 cycles of 30 s at 95 °C for denaturation, 50 °C for 30 s for annealing and 72 °C for 1 min and 30 s for extension with a final extension period of 8 min at 72 °C on a Tetrad2 BioRad DNA Engine. Extraction from herbarium specimens often yielded low amounts of degraded DNA and required nested PCR to amplify quantities sufficient for sequencing. In nested PCR the ITS5/ITS8 primer pair was used in the first reaction. 1 µL of this PCR product was then used in a second PCR with the ITS1/ITS4 primer pair and the same thermocycling profile. Further internal primers, ITS2g and ITS3p, were used in place of ITS1 and ITS4 when amplification using the latter primers was unsuccessful. Plastid markers were amplified using *rpl32-trnL* (Shaw et al., 2007), *rps16-trnK* (Shaw et al., 2007), and *trnS-trnFM* (Demesure et al., 1995) primer pairs as well as *Manilkara*-specific internal primers designed for this study (**Supplementary Table 2**). PCR was carried out in 25 µL volume reactions containing 1 µL of genomic DNA, 15.25 µL sterile distilled water, 2.5 µL 2 mM dNTPs, 2.5 µL 10x NH4 reaction buffer, 1.25 µL 25 mM MgCl2, 0.75 µL of each 10 µM primer, 0.8 µL BSA and 0.2 µL of 5 u/µL Biotaq DNA polymerase buffer. All plastid regions were amplified using the *rpl16* program of Shaw et al. (2005). Nested PCR was also performed on selected accessions using self-designed internal primers (**Supplementary Table 2**). PCR products were purified using Exo-SAP (GE Healthcare) according to the manufacturer's instructions.
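As a quick consistency check on the two reaction recipes, the component volumes can be summed and compared with the stated 25 µL reaction volume. This is a minimal sketch: the dictionary and function names are illustrative, and it assumes that "each 10 µM primer" refers to two primers per reaction (one forward, one reverse).

```python
import math

# Component volumes in microlitres, as listed in the text.
# Assumption: two primers per reaction (forward and reverse).
ITS_MIX_UL = {
    "genomic DNA": 1.0,
    "sterile distilled water": 5.75,
    "2 mM dNTPs": 2.5,
    "10x NH4 reaction buffer": 2.5,
    "25 mM MgCl2": 1.25,
    "forward primer (10 uM)": 0.75,
    "reverse primer (10 uM)": 0.75,
    "5 M betaine": 10.0,
    "BSA": 0.25,
    "Biotaq DNA polymerase (5 u/uL)": 0.25,
}

PLASTID_MIX_UL = {
    "genomic DNA": 1.0,
    "sterile distilled water": 15.25,
    "2 mM dNTPs": 2.5,
    "10x NH4 reaction buffer": 2.5,
    "25 mM MgCl2": 1.25,
    "forward primer (10 uM)": 0.75,
    "reverse primer (10 uM)": 0.75,
    "BSA": 0.8,
    "Biotaq DNA polymerase (5 u/uL)": 0.2,
}


def total_volume_ul(mix: dict) -> float:
    """Sum of all component volumes in a reaction mix."""
    return sum(mix.values())


for name, mix in (("ITS", ITS_MIX_UL), ("plastid", PLASTID_MIX_UL)):
    total = total_volume_ul(mix)
    ok = math.isclose(total, 25.0)
    print(f"{name} mix: {total:.2f} uL (matches 25 uL: {ok})")
```

Under that assumption both recipes add up to the stated reaction volume, with the plastid mix replacing the betaine of the ITS mix largely with additional water.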

Sequencing PCRs were carried out using the BigDye Terminator v. 3.1 Cycle Sequencing Kit (Applied Biosystems) and were purified and sequenced on an ABI 3730 sequencer at the University of Edinburgh's GenePool facility. Forward and reverse sequences were assembled into contiguous sequences (contigs) and edited using the alignment software Sequencher ver. 4.7. Edited contigs were assembled and aligned by eye in MacClade ver. 4.08 (Maddison and Maddison, 2008) and later in BioEdit ver. 7.0.5 (Hall, 2005).

Potentially informative indels in the plastid dataset were coded according to the simple indel coding method of Simmons and Ochoterena (2000). Ambiguous alignment regions 113–118 and 380–459 in *rps16-trnK* were excluded. Indel events in ITS were so frequent that their coding as additional characters was deemed to be too ambiguous. Gaps were treated as missing data and all characters were equally weighted.

The ITS dataset was partitioned into three segments: ITS1 (372 bp), 5.8S (167 bp), and ITS2 (339 bp). Plastid regions and their indels were retained as separate partitions: *rpl32-trnL* (1130 bp + 26 indels), *rps16-trnK* (1134 bp + 21 indels), and *trnS-trnFM* (999 bp + 13 indels).
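Partition lengths of this kind translate directly into character ranges for a partitioned analysis. The sketch below assumes the three ITS segments are contiguous in the alignment and appear in the order listed; the function name is illustrative and the study's actual alignment coordinates are not given in the text.

```python
from typing import Dict, List, Tuple


def charset_ranges(partitions: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Convert ordered (name, length) pairs into 1-based inclusive column ranges."""
    ranges = {}
    start = 1
    for name, length in partitions:
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


# Segment lengths in bp, as stated in the text.
ITS_PARTITIONS = [("ITS1", 372), ("5.8S", 167), ("ITS2", 339)]

for name, (start, end) in charset_ranges(ITS_PARTITIONS).items():
    print(f"{name}: columns {start}-{end}")

print("total aligned length:", sum(length for _, length in ITS_PARTITIONS), "bp")
```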

#### **PHYLOGENETIC ANALYSIS**

Bayesian analyses were carried out using MrBayes 3.1 (Huelsenbeck and Ronquist, 2001). Two independent runs of four Metropolis Coupled Monte Carlo Markov Chains (MCMCMC) each (three heated and one cold) were run with a temperature setting of 0.10 for 8,000,000 generations, which was found to provide sufficient mixing between chains and convergence between runs. Trees were sampled every 8000 generations and a 10% burn-in was removed from the sampled set of trees leaving a final sample of 900 trees, which were used to produce a majority rule consensus tree. Convergence of models was determined to have occurred when the standard deviation of split frequencies for two runs reached 0.01 (Ronquist et al., 2005). Appropriate burn-in and model convergence were checked by visual confirmation of parameter convergence of traces in Tracer v.1.5 (Rambaut and Drummond, 2009). Clade support values are posterior probabilities (pp); pp values of 100–95% are taken to indicate strong support, values of 94–90% moderate support, and values between 89 and 55% weak support for nodes, respectively. The output tree files were visualized in FigTree v.1.3.1. The majority rule consensus tree was used to determine the monophyly of key clades used to define calibration points in the dating analysis.
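The sampling arithmetic and the support categories described above can be written out explicitly. This is a minimal sketch with illustrative function names. It treats the sample count as generations divided by sampling interval, and it leaves posterior probabilities below 55% unclassified because the text does not assign them a category.

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Number of sampled trees left after discarding a proportional burn-in."""
    sampled = generations // sample_every
    discarded = round(sampled * burnin_fraction)
    return sampled - discarded


def support_class(pp_percent: float) -> str:
    """Support category for a posterior probability expressed in percent."""
    if not 0 <= pp_percent <= 100:
        raise ValueError("posterior probability must be between 0 and 100%")
    if pp_percent >= 95:
        return "strong"
    if pp_percent >= 90:
        return "moderate"
    if pp_percent >= 55:
        return "weak"
    return "unclassified"


# Settings stated in the text: 8,000,000 generations, sampled every 8000, 10% burn-in.
print(retained_samples(8_000_000, 8000, 0.10))  # 900, as reported

for pp in (100, 92, 60):
    print(pp, support_class(pp))
```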

Plastid data were not included in the subsequent BEAST analysis because they were not informative enough to discern between alternative hypotheses and because fewer taxa were sampled. Additionally, hard incongruence was demonstrated between the topologies reconstructed in MrBayes from the nuclear and plastid datasets (see Supplementary Material Section on chloroplast capture, and **Supplementary Figure 1**). Therefore, the two datasets were not combined and only nuclear data were used for divergence time analysis.

#### **FOSSIL CALIBRATION**

Sideroxyleae pollen from the Ypresian (47.8–56 Mya) of England (Gruas-Cavagnetto, 1976) was used to constrain the minimum age of the Sideroxyleae stem node (node B in **Figure 1**). A log normal prior was used to constrain the age of this node (offset: 52.2 Ma, mean: 0.001). A mean of 0.001 was chosen so that 95% of the probability is contained in an interval between the midpoint and the upper boundary of the Ypresian (52.2–55.6 Mya). A Mid-Eocene (37.2–48.6 Mya) *Tetracolporpollenites* pollen grain from the Isle of Wight was used to constrain the minimum age of the node for the tribe Mimusopeae. This pollen grain was described by Harley (1991) and determined to closely resemble *Tieghemella heckelii* (a monotypic genus in the Mimusopeae). Harley suggested (pers. comm. 2010) that it would be appropriate to err on the side of caution with the identification and use the fossil to constrain the age of the tribe Mimusopeae rather than the genus itself. This fossil was, therefore, used to constrain the age of the crown node of Mimusopeae (node D in **Figure 1**: offset: 42.9 Mya, mean: 0.095). A mean of 0.095 was chosen so that 95% of the probability was contained in an interval between the midpoint (42.9) and the upper boundary of the mid Eocene (42.9–48.6 Mya). The final calibration point is based on a series of Oligocene (23–33.9 Mya) fossil leaves from Ethiopia (Jacobs et al., 2005). Pan described these specimens as *Sapoteae* sp. and suggested possible placement in either *Manilkara* or *Tieghemella* (pers. comm. 2010) based on the occurrence of stoma surrounded by fimbricate periclinal rings, a character present in these genera, but absent from the related genera *Autranella* and *Mimusops*. 
Although they are both members of the Tribe Mimusopeae, *Manilkara* and *Tieghemella* are not sister taxa, and placing the fossil at the node of the most recent common ancestor (the entire Tribe Mimusopeae) seemed illogical for such a young date, when a 45 Mya fossil pollen grain of cf. *Tieghemella* was a better fit for the same node. Instead, the fossil was alternatively placed at the *Manilkara* crown node (node Q in **Figure 1**) and on the node of the split between *Tieghemella* and *Autranella* (node I in **Figure 1**), in order to determine whether placement on either genus made a significant difference to age estimates. In both placements the prior age estimate used an offset of 28 Mya and a mean of 0.1. A mean of 0.1 was chosen so that 95% of the probability was contained in an interval between the midpoint and the upper boundary of the Oligocene (28–33.9 Mya).
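The relationship between the offset, the mean, and the 95% interval of such a prior can be illustrated with a short calculation. The sketch below is not taken from the BEAST input files. It assumes that the quoted mean is on the log scale and that the log-scale standard deviation is 1.0; the text does not state either, and the function name `offset_lognormal_quantile` is hypothetical.

```python
from math import exp
from statistics import NormalDist


def offset_lognormal_quantile(offset, log_mean, log_sd=1.0, p=0.95):
    """Age (Mya) below which a fraction p of an offset lognormal prior lies."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile for probability p
    return offset + exp(log_mean + z * log_sd)


# (offset, log-scale mean) pairs quoted in the text.
# ASSUMPTION: log_sd = 1.0; the standard deviation is not reported in the text.
calibrations = {
    "Mimusopeae crown (node D)": (42.9, 0.095),
    "Oligocene leaves (node Q or I)": (28.0, 0.1),
}
upper_bounds = {
    name: offset_lognormal_quantile(offset, log_mean)
    for name, (offset, log_mean) in calibrations.items()
}
for name, upper in upper_bounds.items():
    print(f"{name}: 95% of prior mass is younger than {upper:.1f} Mya")
```

Under these assumptions the node D prior places 95% of its mass below roughly 48.6 Mya, in line with the upper boundary of the mid Eocene given above. A different standard deviation would shift this bound.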

#### **DATING ANALYSIS**

The software package BEAST v.1.7.5 (Drummond and Rambaut, 2007) was used to analyze divergence times in the ITS dataset. An xml input file was created in BEAUti v.1.7.5. Substitution models were unlinked across partitions, but clock models and tree topologies were kept on the linked default setting. Four taxon sets per analysis were generated in order to define nodes for placement of fossil calibration points. They were based on known monophyletic clades from previous analyses and were constrained to be monophyletic.

#### **FIGURE 1 | Maximum clade credibility chronogram of the ITS dataset.** Dashed lines indicate branches which lead to nodes with a posterior probability of <0.95. Mean ages are given for profiled nodes. Node bars indicate 95% HPD age ranges. Lettered nodes are discussed in the text. Stars indicate the placement of fossils. Lineages are colored according to their distribution: Yellow, Africa; Green, Madagascar; Blue, Asia; Pink, South America; Orange, Central America and the Caribbean. Geological epochs are indicated in a scale at the bottom of the chronogram. Outgroups have been reduced to gray bars at the base of the chronogram. Ten regions were coded in the ancestral area reconstruction as illustrated in the map and legend. Pie charts represent the percentage likelihood of the ancestral state at the selected node. Map inset depicts the timing and direction of long-distance dispersal events reflected in the chronogram.

The GTR + I + G model was applied to each partition. The mean substitution rate was not fixed and base frequencies were estimated. Following support for a molecular clock in these data using MrBayes, an uncorrelated log-normal model was selected to allow for relaxed clock rates and rate heterogeneity between lineages. A speciation: birth-death process tree prior was used with a randomly generated starting tree. The most recent common ancestor (MRCA) node age priors were set to define calibration points using taxon sets. All other priors were left at default settings that were either uniform or gamma-distributed. Posterior distributions for each parameter were estimated using MCMC run for 40,000,000 generations, with parameters logged every 5000 generations, giving 8000 samples per run. The BEAUti xml file was executed in BEAST v.1.7.5. Two separate analyses were run and the output log files were reviewed in Tracer v.1.5 (Rambaut and Drummond, 2009) to check for convergence between runs and adequate effective sampling sizes (ESS) of >200 (Drummond et al., 2007). The tree files from the two runs were combined in LogCombiner v.1.7.5 (Drummond and Rambaut, 2007) with a conservative burn-in of 4000 generations. The combined tree files were input into TreeAnnotator v.1.5.3 (Drummond and Rambaut, 2007). The Maximum Clade Credibility (MCC) tree was selected with mean node heights; this option summarizes the tree node height statistics from the posterior sample with the maximum sum of posterior probabilities. The output file was visualized in FigTree v.1.3.1.
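The ESS threshold used above can be made concrete with a rough autocorrelation-based estimate. The sketch below is illustrative only and does not reproduce Tracer's estimator. The function name `effective_sample_size`, the truncation rule (stop at the first non-positive autocorrelation), and the simulated traces are all assumptions introduced for this example.

```python
import random


def effective_sample_size(trace):
    """Rough ESS: N / (1 + 2 * sum of the leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)
    acf_sum = 0.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:  # ASSUMPTION: truncate at the first non-positive autocorrelation
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)


# Number of logged states per run, from the chain length and logging interval in the text
n_samples = 40_000_000 // 5_000

# Simulated traces (not data from this study): independent draws vs. a slowly mixing chain
rng = random.Random(1)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(n_samples, round(ess_independent), round(ess_sticky))
```

The strongly autocorrelated trace yields a much smaller ESS than the independent one despite having the same length, which is why the number of logged states alone does not indicate adequate sampling.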

#### **ANCESTRAL AREA RECONSTRUCTION IN RASP**

Ancestral area states were reconstructed in RASP (Reconstruct Ancestral State in Phylogenies; http://mnh.scu.edu.cn/soft/blog/RASP) software that implements Bayesian Binary MCMC (BBM) time-events curve analysis (Yu et al., 2011) and allows multiple states to be assigned to terminals. BBM suggests possible ancestral ranges at each node and also calculates probabilities of each ancestral range at nodes. The analysis was performed using the MCC tree generated in BEAST as an input file, with 5,000,000 cycles, 10 chains, sampling every 100 cycles, with a temperature setting of 0.1 and with the maximum number of areas set to four for all nodes. The root node was defined *a priori* as Asian; because the Asian taxa *Sarcosperma* and *Eberhardtia* form a grade within which the rest of the family is nested, this is the most likely state for the crown node of the family.

Areas are coded according to continent, based predominantly on tectonic plate margins and then on floristic regions (**Figure 1**). In Southeast Asia, the Sahul and Sunda Shelves (which mark the boundary between continental Asia and Australia-New Guinea) were coded as separate states within the Malesia floristic region, which stretches from the Isthmus of Kra on the Malay Peninsula to Fiji (Takhtajan, 1986; Van Welzen et al., 2005). East Asia is defined as being east of the Himalayas and south as far as the Malay Peninsula, with a predominantly Indo-Chinese flora. South Asia is delineated by the margin of the Indian subcontinent. The countries of Iran, Turkey and the Arabian Peninsula support a drier Irano-Turanian flora (Takhtajan, 1986) and were, therefore, designated as being part of the Middle-Eastern region. The remaining regions (the Seychelles, Madagascar, Africa and North and South America) are all on separate continental tectonic plates and are floristically unique from one another (see **Supplementary Table 1** for species-specific area codes).

#### **DIVERSIFICATION RATE METHODS**

A separate ITS lineage through time (LTT) plot dataset (hereafter referred to as ITS LTT) was used to compare diversification rates within *Manilkara*. Because the genus was found to be paraphyletic, with the Southeast Asian *M. fasciculata* clade (P in **Figure 1**) being more closely related to *Labourdonnaisia* and *Faucherea*, this small clade was excluded, leaving only the monophyletic lineage of *Manilkara s.s.* (clade Q in **Figure 1**) for analysis. Additionally, only one individual per species was included. The simple diversification rate estimators of Kendall (1949) and Moran (1951) were calculated for the African, Neotropical and Asian clades, where the speciation rate lnSR = [ln(N)−ln(N0)]/T (N = standing diversity, N0 = initial diversity, here taken as = 1, and T = inferred clade age). This is a pure-birth model of diversification with a constant rate and no extinction (Magallón and Sanderson, 2001). Another model that does not assume constant rates of speciation and extinction through time within lineages was applied using BAMM (Bayesian Analysis of Macroevolutionary Mixtures; Rabosky, 2014). BAMM uses a reversible-jump Markov Chain Monte Carlo to explore shifts between macroevolutionary regimes, assuming they occur across the branches of a phylogenetic tree under a compound Poisson process. Each regime consists of a time-varying speciation rate (modeled with an exponential change function) and a constant rate of extinction. The BAMM analysis used the BEAST MCC tree, but because not all species were sampled, it was necessary to specify to which lineage each of the missing taxa belonged (i.e., to which species it was most closely related based on morphological similarity). The results of the analysis with adjustments to account for missing taxa were not different from those assuming complete taxon sampling. Two MCMC simulations were run with 5,000,000 generations, sampling every 1000, and discarding the first 10% as burn-in. 
Appropriate priors for the ITS LTT phylogeny, convergence of the runs and effective sampling size were each estimated using the BAMMtools (Rabosky, 2014) package in R (R development team).
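The simple estimator described above, lnSR = [ln(N)−ln(N0)]/T, can be written in a few lines. This is a minimal sketch; the function name `kendall_moran_rate` and the example clade (20 species, 10 My old) are hypothetical and are not values from this study.

```python
from math import log


def kendall_moran_rate(n_species, clade_age, n_initial=1):
    """Pure-birth net diversification rate: [ln(N) - ln(N0)] / T."""
    if n_species < n_initial or clade_age <= 0:
        raise ValueError("need N >= N0 and a positive clade age")
    return (log(n_species) - log(n_initial)) / clade_age


# Hypothetical clade: 20 extant species, 10 My old (illustrative values only)
example_rate = kendall_moran_rate(20, 10.0)
print(f"{example_rate:.3f} per lineage per My")
```

Because the estimator assumes a constant rate and no extinction, it depends only on standing diversity and clade age, so uncertainty in the dated node (the HPD interval) translates directly into a range of rates.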

LTT plots were generated using phytools (Revell, 2012) in R for 1000 trees sampled through the post-burn-in (20%) posterior distribution generated by BEAST (see above for details). The median and 95% highest posterior density (HPD) were estimated for the ages of each number of lineages in each plot. To compare the observed LTT plots with the predictions of a model with constant diversification rates, 1000 trees were simulated using the mean speciation and extinction rates estimated by BAMM in TreeSim (Stadler, 2011). Simulations used the age of the most recent common ancestor of each of the 1000 observed trees and the current number of species per plot. LTT plots were drawn for the trees including all species of *Manilkara s.s.* and to examine region-specific patterns for pruned lineages that included only those species from each of Africa, the Neotropics and Asia.
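The bookkeeping behind an LTT summary across posterior trees can be sketched as follows. This is not the phytools code used in the study; it is a simplified Python illustration that assumes each tree is ultrametric, fully bifurcating, and represented only by the ages of its internal nodes. The function names and the three example trees are hypothetical.

```python
from statistics import median


def lineages_through_time(node_ages):
    """Return (age, n_lineages) steps for an ultrametric tree, oldest first.

    node_ages: ages (Mya) of the internal nodes of a fully bifurcating tree.
    """
    steps = []
    for k, age in enumerate(sorted(node_ages, reverse=True), start=2):
        steps.append((age, k))  # the root split gives 2 lineages; each later split adds one
    return steps


def median_age_per_lineage_count(trees):
    """Median age at which each lineage count is first reached across trees."""
    ages_by_count = {}
    for node_ages in trees:
        for age, k in lineages_through_time(node_ages):
            ages_by_count.setdefault(k, []).append(age)
    return {k: median(ages) for k, ages in sorted(ages_by_count.items())}


# Three hypothetical posterior trees with four tips each (made-up node ages, Mya)
posterior_trees = [[29.0, 26.0, 18.0], [30.0, 24.0, 17.0], [28.0, 27.0, 19.0]]
summary = median_age_per_lineage_count(posterior_trees)
print(summary)
```

The same per-count summary applied to trees simulated under constant rates would give the expected curve against which the observed one is compared.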

## **RESULTS**

#### **NODE AGES**

Mean ages with 95% HPD intervals for key nodes are reported in **Table 1**. The MCC tree from the BEAST analysis (**Figure 1**) resolves the mean crown age of the tribe Mimusopeae as 43 Mya (HPD 44–42 Mya; node D), in the Mid Eocene. The mean age of subtribe Manilkarinae is estimated to be 32 Mya (HPD 36–29 Mya; node K) and the genus *Manilkara* is resolved as 29 Mya (HPD 32–28 Mya; node Q), both having originated during the Oligocene. Results also reveal that cladogenesis and inter-continental dispersal (see below and **Figures 1**, **3**) within *Manilkara* occurred from the Oligocene through the Miocene and most intensively from the mid-late Miocene.

#### **ANCESTRAL AREA RECONSTRUCTION AND INTERCONTINENTAL DISPERSAL EVENTS**

Ancestral area inferences and likelihood support are given in **Table 1** and **Figure 1**, which also indicates the age and direction of inferred dispersal events. The tribe Mimusopeae, subtribe Manilkarinae and the genera *Manilkara*, *Labramia*, and *Faucherea/Labourdonnaisia* are all inferred to have African ancestry (**Figure 1**).

Following its origin in Africa during the Oligocene 32 Mya (HPD 36–29; node K) and subsequent diversification 29 Mya (HPD 32–28 Mya; node Q), *Manilkara s.s*. spread via long distance dispersal to Madagascar twice, Asia once and the Neotropics once during the Oligocene–Miocene. Both the *Faucherea/Labourdonnaisia/Manilkara* clade (N) (28 Mya; HPD 33–23 Mya) and the genus *Mimusops* (clade J) (22 Mya; HPD 28–17 Mya) also exhibit a similar pattern, having originated in Africa and later dispersed to both Madagascar and Asia during the Miocene.

Long-distance dispersal from Africa to Madagascar and the surrounding islands has occurred on multiple occasions in the tribe Mimusopeae: twice in *Manilkara s.s.* (X3 and X4, 8–4 Mya); at least once for the clade comprising *Labramia*, *Faucherea*, and *Labourdonnaisia* between 32 Mya (HPD 36–29; node K) and 30 Mya (HPD 35–26 Mya; node L); and twice in *Mimusops* between 22 Mya (HPD 28–17 Mya; node J) and 9 Mya (HPD 13–5 Mya; node J1), as well as 5 Mya (HPD 6–2 Mya; node J3).

The Neotropical *Manilkara* clade (S) is also derived from an African ancestor, which dispersed to South America during the Oligocene–Miocene between 26 Mya (HPD 30–22 Mya; node R) and 18 Mya (HPD 22–14 Mya; node S). From South America, further dispersal occurred to Central America 16–15 Mya and throughout the Caribbean islands starting from 15 to 10 Mya.

Asia was reached by three independent dispersal events within the tribe Mimusopeae. *Manilkara s.s.* reached Asia from Africa between 27 Mya (HPD 30–23 Mya; node W) and 23 Mya (HPD 27–19 Mya; node Y), while *Mimusops* did the same 8–6 Mya (node J2). The *Manilkara fasciculata* clade reached Asia from Madagascar between 28 (HPD 33–23 Mya; node N) and 15 Mya (HPD 20–10 Mya; node P).

#### **DIVERSIFICATION RATES**

Net diversification rates (lnSR) differed somewhat between regions, ranging from a lowest mean value of 0.06 (0.05–0.07) for the Asian lineage, through 0.10 (0.09–0.10) for the African lineage to a maximum of 0.15 (0.12–0.19) for the Neotropical lineage. Despite sampling models with up to five different macroevolutionary regimes, BAMM analysis selected models without shifts between macroevolutionary regimes along the *Manilkara s.s.* phylogeny, with the highest posterior probability obtained for zero shifts models, i.e., a single, constantly varying net diversification rate throughout the history of the genus (**Figure 2**). Bayes Factor comparison, following the criteria of Kass and Raftery (1995) provided unsubstantial support (1.68) for the zero shifts models over the models including shifts between macroevolutionary regimes.
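The interpretation of the Bayes factor can be made explicit with the verbal categories of Kass and Raftery (1995). The sketch below assumes that the reported value (1.68) is the Bayes factor itself rather than twice its natural logarithm; the function name `kass_raftery_category` is hypothetical.

```python
def kass_raftery_category(bayes_factor):
    """Verbal evidence category of Kass and Raftery (1995) for a Bayes factor >= 1."""
    if bayes_factor < 1:
        raise ValueError("invert the comparison so that the Bayes factor is >= 1")
    if bayes_factor <= 3:
        return "not worth more than a bare mention"
    if bayes_factor <= 20:
        return "positive"
    if bayes_factor <= 150:
        return "strong"
    return "very strong"


# ASSUMPTION: 1.68 is the Bayes factor itself, not 2 * ln(Bayes factor)
category = kass_raftery_category(1.68)
print(category)
```

On this scale a value of 1.68 falls in the lowest category, so the preference for zero-shift models over models with shifts is weak.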

LTT plots are presented in **Figure 3**, for all regions (**Figure 3D**) and for the pruned African, Asian and Neotropical lineages (**Figures 3A–C** respectively). The figure shows both observed rates, and rates predicted for the same numbers of lineages evolving under a constant net diversification rate process (i.e., constant speciation and extinction rates, estimated using BAMM for the genus *s.s.*). None of the observed LTT patterns diverge significantly from those predicted assuming a constant diversification rate. The analyses including all *Manilkara s.s.* lineages (**Figure 3D**) and only the Neotropical lineage (**Figure 3C**) both show a good fit between observed patterns and those predicted under a constant diversification rate. In contrast, African lineages (**Figure 3A**) show a trend toward reduced diversification rates from 25 to 12 Mya, followed by an increase in diversification rates to levels matching those in the Neotropics from 12 Mya to the present. The Asian lineage shows low and decreasing diversification rates toward the present. While the Asian pattern is derived from just eight species, and thus any observed pattern must be interpreted with caution, it is striking that Asia produced no new lineages during the last 7 Mya, at a time when Africa and the Neotropics were both showing rapid diversification.

## **DISCUSSION**

#### **ORIGIN OF** *MANILKARA*

The tribe Mimusopeae evolved ∼52 Mya (HPD 58–48 Mya; node C) and began to diversify 43 Mya (HPD 44–42 Mya; node D) during the Eocene when global climates were warmer and wetter and a megathermal flora occupied the northern hemisphere. This age estimate also coincides with the first occurrence of putative Mimusopeae fossils recorded from North America and Europe, e.g., *Tetracolporpollenites brevis* (Taylor, 1989), and *Manilkara* pollen (Frederiksen, 1980) in addition to the *Tetracolporpollenites* sp. pollen grain (Harley, 1991) used in this study, which give further weight to the hypothesis that the tribe Mimusopeae was present in the boreotropics and may have originated there. Previous studies (Smedmark and Anderberg, 2007) implicate the break-up of the boreotropics in creating intercontinental disjunctions in the tribe Sideroxyleae and data from the present study are consistent with this hypothesis. Smedmark and Anderberg's (2007) estimate for the age of Sideroxyleae was 68 Mya and in this study the crown node age is reconstructed as being 62 Mya (HPD 73–52 Mya; node B).

The subtribe Manilkarinae evolved 39 Mya (HPD 43–35 Mya; node G), consistent with the hypothesis that it arose late during the existence of the boreotropics. Diversification began 32 Mya (HPD 36–29 Mya; node K), around the time that global cooling and the widening Atlantic were breaking up the boreotropics. Hence migration toward the equator as the climate in the northern hemisphere cooled might have caused or promoted diversification. This transition from the northern hemisphere to equatorial latitudes is also reflected in the putative Manilkarinae fossil record, where during the Oligocene, there is still a strong representation of northern fossils [e.g., Isle of Wight, UK (Machin, 1971), Vermont, USA (Traverse and Barghoorn, 1953; Traverse, 1955) and Czechoslovakia (Prakash et al., 1974)], but fossils also begin to appear in Africa (e.g., *Sapoteae* sp. leaves in Ethiopia, Jacobs et al., 2005). Further cooling and aridification during the Oligocene coincides with diversification of Manilkarinae into genera and may have been a causal factor in this diversification. Alternatively, Manilkarinae may have originated in Africa, as suggested by the ancestral area analysis. However, the analysis cannot account for southward climate shifts and the modern absence of the group from higher latitudes.

*Manilkara* is nested within a grade of other representatives of the tribe Mimusopeae, which is predominantly composed of African taxa (*Mimusops*, *Tieghemella*, *Autranella*, *Baillonella*, *Vitellaria*, and *Vitellariopsis*) and this suggests that the genus may have had its origin there. In the ancestral area reconstruction both *Manilkara* and the subtribe Manilkarinae are resolved as having a 96% likelihood of an African origin, and the tribe Mimusopeae is reconstructed as having a 99% likelihood of originating in Africa. As such, there is very strong support for an African ancestry for the genus *Manilkara*, the subtribe Manilkarinae and the tribe Mimusopeae.

#### **THE ORIGIN OF** *MANILKARA***'S PANTROPICAL DISTRIBUTION**

Intercontinental disjunctions in *Manilkara* are too young (27–4 Mya) to have been caused by Gondwanan break-up, which would have had to occur before 70 Mya. *Manilkara* is also too young for its pantropical distribution to be the result of migration through the boreotropics, which would have had to occur between 65 and 45 Mya, after which the climate would have been too cool for tropical taxa to cross the North Atlantic Land Bridge, even though this might have persisted until ∼33 Mya (Milne and Abbott, 2002). The most likely period for migration of tropical taxa by this route was during the PETM/EECO, 55–50 Mya (Zachos et al., 2001). Furthermore, a boreotropical origin should leave a phylogeographic signature in the form of southern lineages being nested within more northern ones. However, South American lineages are not nested within Central American lineages, and neither are those southeast of Wallace's line nested within those to the northwest. With these vicariance-based explanations not supported, *Manilkara*'s disjunct pantropical distribution could only have resulted from long-distance dispersal from Africa to Madagascar, Asia and the Neotropics. This has been demonstrated for numerous other groups distributed across the tropics, e.g., *Begonia* (Thomas et al., 2012) and *Renealmia* (Särkinen et al., 2007).

*Manilkara* has fleshy, sweet fruit ranging in size from 1.5 to 10 cm, which are consumed by a wide variety of animals. With seeds that are too bulky for wind dispersal, it is more likely that long distance dispersal could have been achieved through transport in the gut-contents of birds or by transoceanic rafting in large mats of vegetation. Houle's (1998) study demonstrated that during the Miocene, intercontinental rafting could have occurred in less than 2 weeks on the North and South Equatorial currents.


## **REGIONAL DIVERSIFICATION IN** *MANILKARA*

Within the Neotropics, *Manilkara* first colonized South America, as indicated in the reconstruction of the ancestral distribution of clade S. The South American clade (U) is divided into two subclades, which correspond to contrasting regional ecologies, with one clade (U1) comprised of Amazonian species and the other (U2) of Atlantic coastal forest species. The only inconsistency in this geographic pattern is the second accession of *Manilkara cavalcantei* (b), an Amazonian species that the analysis places in the Atlantic coastal forest clade. However, in the plastid tree (**Supplementary Figure 1**) this accession is resolved in a strongly supported (0.99 pp) Amazonian clade with *M. bidentata*, *M. huberi*, and *M. paraensis*. The phylogenetic split between these two regions occurred during the Mid-Miocene (12–10 Mya), when the Andes were being elevated (Gregory-Wodzicki, 2000; Graham, 2009) and drainage systems in the Amazon basin began to shift eastwards.

Atlantic coastal species in clade U2 and Amazonian species in clade U1 are geographically separated by the dry biomes of the Cerrado and the Caatinga, as well as the higher relief of the Brazilian shield. Simon et al. (2009) and Fritsch et al. (2004) found that the origins of dry-adapted Cerrado Leguminosae and Melastomataceae lineages span the Late Miocene to the Pliocene (from 9.8 to 0.4 Mya), broadly coinciding with the expansion of C4 grass-dominated savanna biomes. However, it is likely that a dry environment would have been present just prior to this time to allow for adaptation of these groups to the new biome. Such timing is exhibited by the Microlicieae (Melastomataceae), where the crown node is 9.8 Mya, and the stem node is 17 Mya (Fritsch et al., 2004). *Manihot* (Euphorbiaceae) species of this biome began to diversify from 6.6 Mya (Chacón et al., 2008). Likewise, a phylogenetic study of *Coursetia* (Leguminosae) (Lavin, 2006) reveals that species which inhabit the dry forest of the Brazilian Caatinga are 5–10 My old. This suggests that the Cerrado and Caatinga could have been in existence, at least in part, by the time the South American *Manilkara* subclades U1 and U2 diverged ca. 12 Mya, and their development may have driven the geographical split in this South American lineage of *Manilkara*.


The Central American/Caribbean clade (T) originated following dispersal from South America 16–15 Ma, and then split geographically into a Central American subclade (T1, 6 Ma), and a Caribbean subclade (T2, 11 Ma). The only exception to this geographical structure is the single Central American species, *M. chicle* (T3), which is nested in the Caribbean clade, suggesting a Pliocene dispersal (2 Ma) back to the continent. These age estimates place the New World spread of *Manilkara* prior to the estimated age of the closing of the Isthmus of Panama ∼3.5 Ma (Coates and Obando, 1996), although recent studies (Farris et al., 2011) indicate that the Isthmus may have closed much earlier, in which case *Manilkara* may have taken an overland route. Overwater dispersal between Central and South America has been demonstrated in numerous other plant taxa (Cody et al., 2010).

African *Manilkara* species are resolved in two clades, both of which are Oligo-Miocene in age. The main African/Madagascan clade (X) is estimated to be 15 My old (HPD 18–11 Mya), and the smaller clade (V) is 21 My old (HPD 27–15 Mya). Africa has been affected by widespread aridification during the Tertiary (Coetzee, 1993; Morley, 2000). The response by *Manilkara* to this changing climate could have been migration, adaptation or extinction. A study of the rain forest genera *Isolona* and *Monodora* (Annonaceae) found that throughout climatic cycles, taxa remained in remnant pockets of wet forest (Couvreur et al., 2008). They are, therefore, an example of a group that migrated or changed its distribution to track wetter climates. Another study of the genus *Acridocarpus* (Malpighiaceae) (Davis et al., 2002a) indicated an east African dry forest adapted lineage nested within a wet forest lineage. The dry adapted lineage was dated to periods of Oligo-Miocene aridification, and is, therefore, an example of a wet forest lineage, which has adapted to changing environmental conditions rather than becoming restricted to areas of favorable climate. The timing of diversification and evolution of dry-adapted species vs. wet-restricted species in the three African *Manilkara* clades suggests a combination of both scenarios. The split between the African clades occurred between 29 Mya (HPD 32–28 Mya; node Q) and 26 Mya (HPD 30–22 Mya; node R), during a period of dramatic continent-wide cooling, which fragmented the Eocene coast to coast rain forest, potentially isolating the three lineages. A second wave of diversification within the main African/Madagascan clade (X) coincides with the Mid-Miocene climatic optimum 17–15 Mya, when global temperatures warmed (Zachos et al., 2001). During the same period the collision of the African and Eurasian plates closed the Tethys Sea, instigating further aridification. 
The resulting drier and warmer climates caused the spread of savannas and the retraction of rain forest, as evidenced by an increase in grass pollen during this period (Morley, 2000; Jacobs, 2004). Nonetheless, cladogenesis in the main African/Madagascan clade (X) gained pace from the Mid-Miocene onwards. In particular, a third wave of diversification from rain forest into drier shrubland environments in eastern and southern Africa occurred subsequent to the main uplift of the Tanganyikan plateau in the East African Rift System ca. 10 Mya, which had a significant impact on further regional aridification (Lovett and Wasser, 1993; Sepulchre et al., 2006) (**Table 1**).

Clade X is predominantly composed of Guineo-Congolian rain forest species. This is almost exclusively the case in subclade X1, aside from the Madagascan taxa, which are also rain forest species. However, within subclade X2, there is a transition from wet to dry environments. The sole Madagascan taxon in this lineage (*M. sahafarensis*) is a dry, deciduous forest species. The four dry, eastern-southern African taxa in subclade X2 (*M. discolor*, *M. sansibarensis*, *M. butugi*, *M. cuneifolia*) all evolved between 8 and 5 Mya subsequent to the main uplift of the East African Rift System. The ancestor of the smaller African clade composed of *M. mochisia* and *M. concolor* also diversified into these two dry-adapted eastern/southern species at the same time 6 Mya (HPD 10–2 Mya). Hence, some African *Manilkara* lineages adapted to a drying climate, while others remained in their ancestral rain forest habitat.

Within the main Asian clade of the plastid phylogeny (Yc1, **Supplementary Figure 1**), the Indian species *Manilkara roxburghiana* is sister to the other species and the two Fijian species are among the most derived, consistent with the hypothesis that the founding dispersal event was from Africa to India with subsequent spread eastward into Malesia. However, ancestral area reconstruction of the ITS data (node Y, **Figure 1**) suggests that migration within Asia was from east to west (Sahul Shelf to Sunda Shelf) 23 Mya (HPD 27–19 Mya). Dated phylogenies also indicate that many other angiosperm groups have crossed Wallace's Line from the late Miocene onwards: *Pseuduvaria* (Annonaceae) (Su and Saunders, 2009), Aglaieae (Meliaceae) (Muellner et al., 2008), at least four separate lineages of *Begonia* (Begoniaceae) (Thomas et al., 2012) and *Cyrtandra* (Gesneriaceae) (Cronk et al., 2005). In Sapotaceae four lineages of Isonandreae have migrated from west to east across Wallace's Line (Richardson et al., 2014), whereas evidence from the subfamily Chrysophylloideae suggests recent movement in the opposite direction, from Sahul to Sunda Shelf (Swenson et al., 2008). The two youngest (9 Mya) Asian species (*M. vitiensis* and *M. smithiana*) are both Fijian. The oldest land available for colonization in Fiji dates to between 14 and 5 Mya (Johnson, 1991; Heads, 2006); hence, the age of these two Fijian taxa coincides with the first emergence of land in the archipelago.

### **DIVERSIFICATION RATES OF** *MANILKARA* **IN DIFFERENT PARTS OF THE TROPICS**

The BAMM analysis did not support significant rate variation among lineages or regions in *Manilkara s.s.* Despite apparent variation in regional patterns revealed by LTT plots (**Figure 3**), the data most strongly support a model with a single net diversification rate throughout the genus. Trends within the data for specific regions only suggest departure from a constant rate model in Asia and Africa. Given that observed patterns do not exceed the 95% confidence intervals for the constant rate model for either region, these trends must be considered with caution. This is particularly true for Asia, for which the pattern was derived from only eight species. Because sensitivity and statistical power of methods for detection of shifts in diversification rates may correlate positively with the number of species in the clade (Silvestro et al., 2011), rate shifts in clades with a small number of species (as in Asia for *Manilkara s.s.*) may not have been detected by the methods used here (a potential type two error). A simulation study would be required to examine the impact of taxon number on type two error rates in these analyses. Similarly, small numbers of taxa may be more likely to generate apparent trends through stochastic effects, and these could also generate the apparent two-phase pattern of low, and then rapid, diversification in African lineages.

Taken at face value, net diversification rates and LTT plots both suggest a trend for more rapid diversification in Neotropical and African lineages than in Asian ones. The timing of rapid Neotropical diversification falls within the time frame of Andean uplift (i.e., from the late Miocene onwards), proposed as a diversification engine in many taxa (e.g., Richardson et al., 2001). However, because many South American *Manilkara* species are native to the Atlantic Forest, on the opposite side of the continent from the Andes, Andean uplift may be considered unlikely to directly explain high diversification rates region-wide. Interestingly, the rapid diversification of the African lineage coincided with periods of regional aridification. The slowest diversification rate, in the Southeast Asian lineage, includes species that are mostly to the east of Wallace's Line. This may be explained by the fact that the mountainous topography of much of this region (dominated by New Guinea) limits the habitat available for lineages such as *Manilkara* that are largely restricted to lowland rain forest that covers a greater area of Africa or the Neotropics. Although there is no statistical support for significant diversification rate variation in *Manilkara s.s.*, the causes highlighted here should have similar impacts on other lowland rainforest taxa—a prediction that can be tested in future studies utilizing phylogenies of more species rich taxa and meta-analyses of multiple unrelated lineages.

### **AUTHOR CONTRIBUTIONS**

This paper is a result of Kate E. Armstrong's Ph.D. thesis research at the Royal Botanic Garden Edinburgh and University of Edinburgh. Kate E. Armstrong and James E. Richardson conceived the study and Kate E. Armstrong carried out the research and wrote the manuscript apart from the diversification rate analysis, which was conducted and written by Eugenio Valderrama. James E. Richardson, Graham N. Stone, and Richard Milne supervised the Ph.D. project. Graham N. Stone and James E. Richardson edited the manuscript. James A. Nicholls assisted with phylogenetic analyses. Arne A. Anderberg, Jenny Smedmark, Laurent Gautier, and Yamama Naciri contributed DNA sequence data to the study. All authors have reviewed the manuscript.

### **ACKNOWLEDGMENTS**

This doctoral research was made possible through a scholarship from the Torrance Bequest at the University of Edinburgh. Grants for fieldwork from the Royal Geographical Society, the Carnegie Trust, the Systematics Association, and the Davis Expedition Fund are also gratefully acknowledged. D. Ndiade Bourobou (CENAREST, IRAF) is thanked for a DNA aliquot of *Baillonella toxisperma*, and Jerome Chave (CNRS) is thanked for ITS sequences of *Manilkara bidentata* and *M. huberi*. Thanks to members of the Stone Lab at the University of Edinburgh for comments on an earlier draft of the manuscript.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00362/abstract

#### **INCONGRUENCE BETWEEN NUCLEAR AND PLASTID TREES**

Phylogenies generated with nuclear (**Figure 1**) and plastid data (**Supplementary Figure 1**) showed high topological congruence. However, there are a couple of examples of hard incongruence (strongly supported clades which conflict in their placement between the two datasets), both of which have biogeographic implications. The first is in the placement of the two Asian species *Manilkara hexandra* and *M. littoralis*, and the two African species *M. mochisia* and *M. concolor*. In the ITS phylogeny *M. hexandra* and *M. littoralis* are resolved in the Asian clade Y, while *M. mochisia* and *M. concolor* are resolved in the small African clade V. In contrast, in the plastid phylogeny, these four species form a strongly supported clade (posterior probability 1), marked in **Supplementary Figure 1**.

A second hard incongruence is apparent in the placement of the three taxa *Manilkara yangambiensis*, *M. triflora*, and *M. suarezensis*. In the plastid phylogeny these form a monophyletic clade Z (**Supplementary Figure 1**). In contrast, in the ITS analysis, the Brazilian *M. triflora* was poorly resolved at the base of clade T, whereas the Madagascan *M. suarezensis* was resolved within the main African clade (X). The Congolese species *M. yangambiensis* was not included in the ITS analysis due to difficulties in amplifying its DNA from herbarium specimens.

These discrepancies between the nuclear and plastid trees may be the result of either ancestral polymorphism with incomplete lineage sorting or chloroplast capture (introgression) following dispersal.

#### **EVIDENCE FOR CHLOROPLAST CAPTURE?**

In the dated nuclear phylogeny, the Asian species *M. hexandra* (Sri Lanka) and *M. littoralis* (Myanmar) (clade Y1) are placed with other Asian species (clade Y2), whereas in the plastid phylogeny, they are resolved in a clade with two African species *M. mochisia* (Zambia) and *M. concolor* (South Africa) (from clade V in ITS). This suggests hybridization of taxa across the Indian Ocean, possibly resulting in chloroplast capture. Intercontinental chloroplast capture may also be implicated in the case of clade Z, which is resolved in the plastid analyses but not in the ITS analyses and is composed of *M. suarezensis* (Madagascar), *M. triflora* (Brazil), and *M. yangambiensis* (Congo). The ITS analysis did not include *M. yangambiensis*, but placed *M. triflora* with other Neotropical species in clade S, and *M. suarezensis* with other Madagascan species within a larger clade of African species (clade X). Therefore, ITS resolved at least two of the clade Z species with species from the same landmass, but cpDNA did not, and resolved them together instead. Clade Z is strongly supported (pp 0.99) in the plastid analysis. Assuming that the correct species level relationships are resolved, clade Z presents a case of long distance dispersal and chloroplast capture more remarkable than the clade V/Y1 scenario, because it involves species from three landmasses, and hence two dispersal events.

Hybridization and chloroplast capture across long distances such as ocean barriers have been indicated previously in Sapotaceae. The species *Chrysophyllum cuneifolium* is inferred to have originated from an intercontinental hybridization event where the chloroplast is South American and the nuclear genome is African (Särkinen et al., 2007). Likewise, the Pacific genus *Nesoluma* is hypothesized to have arisen as a result of intercontinental hybridization in the boreotropical region during the Eocene (Smedmark and Anderberg, 2007). *Nesoluma* presents the opposite pattern to *Chrysophyllum*, where the chloroplast is African and the nuclear genome is Neotropical. Hybridization between New and Old World lineages has also been demonstrated in the pantropical genus *Gossypium* (Malvaceae; Wendel et al., 1995) and intercontinental chloroplast capture is hypothesized to have also occurred in *Thuja* (Cupressaceae; Peng and Wang, 2008). Additionally, both hybridization and introgression events are inferred to have occurred between distantly related species in *Ilex* (Aquifoliaceae; Manen et al., 2010). What is abundantly clear is that long distance dispersal has played a crucial role in the establishment of the modern distribution of *Manilkara*.

**Supplementary Figure 1 | Bayesian majority rule consensus tree of the chloroplast dataset.** Posterior probability values are indicated above branches. Nodes with letters/symbols are discussed in the text.

**Supplementary Figure 2 | Phylogenetic tree used in the BAMM analysis showing the nodes for which a proportion of sampled species was calculated, as shown in Supplementary Table 3.**

**Supplementary Table 1 | Herbarium specimen data, GenBank accession number and ancestral area coding for taxa included in the analyses.** Accessions of newly generated sequences are emboldened.

**Supplementary Table 2 | Chloroplast primers designed for this study.**

**Supplementary Table 3 | Lineage specific correction used to take into account incomplete taxon sampling in the BAMM analysis.** The unsampled species were assigned to the more recent node including the species with the most similar morphology. The proportion of sampled over total taxa was calculated for the nodes shown in **Supplementary Figure 2**.

#### **REFERENCES**


Lebrun, J. P. (2001). *Introduction à la Flore d'Afrique*. Paris: Cirad, Ibis Press.


Machin, J. (1971). Plant microfossils from Tertiary deposits of the Isle of Wight. *New Phytol.* 70, 851–872. doi: 10.1111/j.1469-8137.1971.tb02586.x


in the amphi-Atlantic rain forest genus *Renealmia* L.f. (Zingiberaceae). *Mol. Phylogenet. Evol.* 44, 968–980. doi: 10.1016/j.ympev.2007.06.007


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 July 2014; accepted: 29 September 2014; published online: 03 December 2014.*

*Citation: Armstrong KE, Stone GN, Nicholls JA, Valderrama E, Anderberg AA, Smedmark J, Gautier L, Naciri Y, Milne R and Richardson JE (2014) Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae. Front. Genet. 5:362. doi: 10.3389/fgene.2014.00362*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Armstrong, Stone, Nicholls, Valderrama, Anderberg, Smedmark, Gautier, Naciri, Milne and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Corrigendum: Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae

Kate E. Armstrong<sup>1,2,3</sup>\*, Graham N. Stone<sup>2</sup>, James A. Nicholls<sup>2</sup>, Eugenio Valderrama<sup>2,3</sup>, Arne A. Anderberg<sup>4</sup>, Jenny Smedmark<sup>5</sup>, Laurent Gautier<sup>6</sup>, Yamama Naciri<sup>6</sup>, Richard Milne<sup>7</sup> and James E. Richardson<sup>3,8</sup>

<sup>1</sup> The New York Botanical Garden, New York, NY, USA, <sup>2</sup> Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, Scotland, <sup>3</sup> Royal Botanic Garden Edinburgh, Edinburgh, Scotland, <sup>4</sup> Naturhistoriska Riksmuseet, Stockholm, Sweden, <sup>5</sup> University Museum of Bergen, Bergen, Norway, <sup>6</sup> Conservatoire et Jardin Botaniques, Genève, Switzerland, <sup>7</sup> Institute of Molecular Plant Sciences, University of Edinburgh, Edinburgh, Scotland, <sup>8</sup> Laboratorio de Botánica y Sistemática, Universidad de los Andes, Bogotá, Colombia

Keywords: Sapotaceae, Manilkara, pantropical, biogeography, diversification rates

#### Edited and reviewed by:

Toby Pennington, Royal Botanic Garden Edinburgh, UK

> \*Correspondence: Kate E. Armstrong, karmstrong@nybg.org

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 23 January 2015 Accepted: 18 February 2015 Published: 05 March 2015

#### Citation:

Armstrong KE, Stone GN, Nicholls JA, Valderrama E, Anderberg AA, Smedmark J, Gautier L, Naciri Y, Milne R and Richardson JE (2015) Corrigendum: Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae. Front. Genet. 6:86. doi: 10.3389/fgene.2015.00086

## **A Corrigendum on**

**Patterns of diversification amongst tropical regions compared: a case study in Sapotaceae** by Armstrong, K. E., Stone, G. N., Nicholls, J. A., Valderrama, E., Anderberg, A. A., Smedmark, J., et al. (2014). Front. Genet. 5:362. doi: 10.3389/fgene.2014.00362

In the sixth paragraph of the Discussion section entitled "Regional Diversification in Manilkara," in the sentence beginning "In Sapotaceae four lineages of Isonandreae have migrated. . . " the citation of Swenson et al., 2008 should instead be: Swenson, U., Nylinder, S., and Munzinger, J. (2014). Sapotaceae biogeography supports New Caledonia being an old Darwinian island. J. Biogeogr. 41, 797–809. doi: 10.1111/jbi.12246

Additionally, in the second paragraph of the Supplementary Material section entitled "Evidence for Chloroplast Capture?" in the sentence beginning "The species Chrysophyllum cuneifolium is inferred to have originated. . . " the citation of Särkinen et al., 2007, should instead be: Swenson, U., Richardson, J. E., and Bartish, I. V. (2008). Multi-gene phylogeny of the pantropical subfamily Chrysophylloideae (Sapotaceae): evidence of generic polyphyly and extensive morphological homoplasy. Cladistics 24, 1006–1031. doi: 10.1111/j.1096-0031.2008.00235.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Armstrong, Stone, Nicholls, Valderrama, Anderberg, Smedmark, Gautier, Naciri, Milne and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# An engine for global plant diversity: highest evolutionary turnover and emigration in the American tropics

Alexandre Antonelli<sup>1,2</sup>\*†, Alexander Zizka<sup>1</sup>†, Daniele Silvestro<sup>1,3</sup>, Ruud Scharn<sup>1</sup>, Borja Cascales-Miñana<sup>4</sup> and Christine D. Bacon<sup>1,5</sup>

<sup>1</sup> Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden, <sup>2</sup> Gothenburg Botanical Garden, Göteborg, Sweden, <sup>3</sup> Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland, <sup>4</sup> Laboratoire de Paléobiogéologie, Paléobotanique, Paléopalynologie, Département de Géologie, Université de Liège, Liège, Belgium, <sup>5</sup> Laboratório de Biología Molecular (CINBIN), Department of Biology, Universidad Industrial de Santander, Bucaramanga, Colombia

#### Edited by:

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### Reviewed by:

Berit Gehrke, Johannes Gutenberg Universität, Germany

Tiina Elina Sarkinen, Royal Botanic Garden Edinburgh, UK

#### \*Correspondence:

Alexandre Antonelli, Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg Botanical Garden, Carl Skottsbergs gata 22B, SE-41319 Göteborg, Sweden alexandre.antonelli@bioenv.gu.se

† These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 06 October 2014 Accepted: 18 March 2015 Published: 08 April 2015

#### Citation:

Antonelli A, Zizka A, Silvestro D, Scharn R, Cascales-Miñana B and Bacon CD (2015) An engine for global plant diversity: highest evolutionary turnover and emigration in the American tropics. Front. Genet. 6:130. doi: 10.3389/fgene.2015.00130

Understanding the processes that have generated the latitudinal biodiversity gradient and the continental differences in tropical biodiversity remains a major goal of evolutionary biology. Here we estimate the timing and direction of range shifts of extant flowering plants (angiosperms) between tropical and non-tropical zones, and into and out of the major tropical regions of the world. We then calculate rates of speciation and extinction taking into account incomplete taxonomic sampling. We use a recently published fossil calibrated phylogeny and apply novel bioinformatic tools to code species into user-defined polygons. We reconstruct biogeographic history using stochastic character mapping to compute relative numbers of range shifts in proportion to the number of available lineages through time. Our results, based on the analysis of c. 22,600 species and c. 20 million geo-referenced occurrence records, show no significant differences between the speciation and extinction of tropical and non-tropical angiosperms. This suggests that at least in plants, the latitudinal biodiversity gradient primarily derives from other factors than differential rates of diversification. In contrast, the outstanding species richness found today in the American tropics (the Neotropics), as compared to tropical Africa and tropical Asia, is associated with significantly higher speciation and extinction rates. This suggests an exceedingly rapid evolutionary turnover, i.e., Neotropical species being formed and replaced by one another at unparalleled rates. In addition, tropical America stands out from other continents by having "pumped out" more species than it received through most of the last 66 million years. These results imply that the Neotropics have acted as an engine for global plant diversity.

Keywords: angiosperms, biogeography, diversification rates, latitudinal diversity gradient, out-of-the-tropics, phylogenetics, tropical biodiversity

## Introduction

The world's biodiversity is unevenly distributed, and most species are found in the tropical regions of Asia (including Australasia), Africa, and the Americas. Understanding the underlying causes of the latitudinal biodiversity gradient (the decrease of taxonomic diversity away from the equator) has fostered extensive and integrative research, and its formation still constitutes a matter of debate in evolutionary biology and biogeography (see e.g., Pianka, 1966; Hillebrand, 2004; Jablonski et al., 2006; Wiens et al., 2006; Brown, 2014; Huang et al., 2014; Kerkhoff et al., 2014; Mannion et al., 2014; Rolland et al., 2014).

There are three primary explanations for the latitudinal biodiversity gradient, which are not mutually exclusive. Often referred to as the museum hypothesis (Stebbins, 1974), one view is that there has been a longer period of time for the accumulation of diversity in the tropics because most of the Earth was essentially tropical until the Eocene–Oligocene boundary c. 34 million years ago (Ma; Zachos et al., 2008). In contrast to the focus on geological and evolutionary time, it has also been proposed that higher tropical biodiversity could be caused by higher net diversification rates in tropical vs. temperate zones (Mittelbach et al., 2007), i.e., either due to high speciation, low extinction, or some combination of both. Why such rates would be different is in itself a matter of further debate, with a key role being attributed to kinetics (Brown, 2014). More recently it has been suggested that it is the inability of tropical lineages to disperse, survive, and diversify out of the tropics that drives the latitudinal biodiversity gradient, due to intrinsic eco-physiological constraints (niche conservatism; Kerkhoff et al., 2014).

A second striking feature of tropical biodiversity, besides being consistently higher than in non-tropical regions, is its uneven distribution among the three tropical regions of the world. For instance, it has been suggested that the American tropics (the Neotropics) comprise more species of seed plants than tropical Africa and tropical Asia together, with similar patterns for other organismal groups such as amphibians, mammals, birds, nymphalid butterflies, and reptiles (Govaerts, 2001; Antonelli, 2008; Antonelli and Sanmartín, 2011 and references therein). The underlying causes for these inter-continental differences are poorly understood, and could be analogous to those determining the latitudinal biodiversity gradient. In addition, differences in area and biome sizes, environmental and soil heterogeneity, climatic history, biological exploration, and digitalization of natural history collections (amongst others) could also play important roles.

Evaluating the validity and relative roles of the factors driving these fundamental biodiversity differences requires combining evidence from several sources and disciplines, such as palaeontology, ecology and molecular phylogenetics. Among these, two main components stand out as essential in this pursuit: understanding species diversification (i.e., the interplay between speciation and extinction) and the geographic history of lineages. In this study we explore these two components at a global and continental scale. We focus on the Cenozoic history (i.e., the last 66 Ma) of flowering plants (angiosperms), which form the dominant structure of tropical and temperate ecosystems. We ask two overarching questions:

(1) Have the tropics as a whole, and each tropical region separately, been mainly a sink or a source of angiosperm diversity? More specifically, did range shifts (including trans-oceanic dispersals) between tropical and non-tropical zones, and into and out of each tropical region, occur in both directions at a roughly constant pace throughout the Cenozoic, or were there phases of markedly different range shift rates and directionality?

(2) Is high diversity correlated with high speciation and/or low extinction? More specifically, were there significant differences in speciation and extinction rates between tropical and non-tropical zones, and among tropical regions? In such case, are the most species rich regions also those with highest speciation and/or lowest extinction?

To address these questions, we calculate and compare rates of speciation and extinction between tropical and non-tropical zones and among the world's three tropical regions (in Africa, Asia, and the Americas), and we infer the timing and direction of range shifts into and out of each tropical region.

## Material and Methods

## Data Compilation

Fossils, molecular phylogenies, and species occurrences constitute diverse data sources that, taken together, can be used to infer diversity trends through time and space. Here we explore the feasibility of using both neontological and palaeontological data for addressing the questions outlined in this study.

## Fossils

We explored whether fossils could be used to infer diversity trends through time, as has been recently demonstrated for fossil rich clades such as mammals (Silvestro et al., 2014). For this we assessed a global data set of angiosperm macrofossil occurrences originally downloaded from the Paleobiology database (https://www.paleobiodb.org) as described by Silvestro et al. (2015). The data set included 9,665 records, representing a total of 297 fossil taxa identified to the genus level; identifications below the generic level were grouped by genus. To investigate potential biases in the data, all records were subdivided by country and time period (from the Lower Cretaceous to today), according to the Geological Time Scale of Gradstein et al. (2012). Unfortunately, a visual inspection of the data (**Figure 1**) showed severe spatial and temporal biases. These biases precluded any sensible analyses of diversity changes in tropical regions, and we were therefore forced to rely on species distribution and molecular data alone.

## Species Occurrences

We downloaded all geo-referenced (i.e., provided with a longitude and latitude) species occurrences of angiosperms available at the Global Biodiversity Information Facility (GBIF, http://www.gbif.org; downloaded in June 2014). Records flagged as containing "known coordinate issues" were excluded prior to the download. One record per location per species was retained. We then applied basic data cleaning steps to the full data set (c. 40 gigabytes) to identify and exclude obviously erroneous data points, such as records with non-numeric coordinates or missing species names, records with identical latitude and longitude, and latitudes or longitudes equal to zero (which we considered to have been left blank during data entry). For these steps we used a modified version of the scripts by Zanne et al. (2014) implemented in R (R Core Team, 2014).
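The cleaning rules listed above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' modified Zanne et al. (2014) script; the record layout (dicts with `species`, `lat` and `lon` keys) and the function name are assumptions made for illustration only.

```python
import math


def clean_occurrences(records):
    """Keep one record per species per location, dropping obviously bad rows.

    `records` is an iterable of dicts with the hypothetical keys
    'species', 'lat' and 'lon' (values may be strings or numbers).
    """
    seen = set()
    kept = []
    for rec in records:
        species = str(rec.get("species") or "").strip()
        if not species:
            continue  # missing species name
        try:
            lat = float(rec.get("lat"))
            lon = float(rec.get("lon"))
        except (TypeError, ValueError):
            continue  # non-numeric or absent coordinates
        if math.isnan(lat) or math.isnan(lon):
            continue  # "nan" parses as a float but is not a usable coordinate
        if lat == 0.0 or lon == 0.0:
            continue  # zero treated as a coordinate left blank at data entry
        if lat == lon:
            continue  # identical latitude and longitude
        key = (species, lat, lon)
        if key in seen:
            continue  # retain one record per location per species
        seen.add(key)
        kept.append({"species": species, "lat": lat, "lon": lon})
    return kept
```

A data set of the size reported here (c. 40 gigabytes) would need to be streamed from disk in chunks rather than held in a list, which this sketch does not attempt.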

#### Geographic Assignments

We coded each species for its presence and absence in four large regions or operational units (**Figure 2**): tropical America (the Neotropics), tropical Africa (the Afrotropics), tropical Asia (including Australasia), and all other (non-tropical) regions combined. We delimited those regions by following the same boundaries for biomes and ecoregions as adopted by the World Wide Fund for Nature (WWF), as described in Olson et al. (2001). We considered the following ecoregions as forming together the tropical region: "Tropical and Subtropical Moist Broadleaf Forests," "Tropical and Subtropical Dry Broadleaf Forests," "Tropical and Subtropical Coniferous Forests" and "Tropical and Subtropical Grasslands, Savannas, and Shrublands." All other ecoregions were merged to form our "non-tropical" region. We classified "Flooded Grasslands and Shrublands" as tropical or non-tropical depending on the surrounding biome and geographic position. We acknowledge that the WWF biome and ecoregion classification is to some extent arbitrary and based on expert opinion, rather than directly data derived (Vilhena and Antonelli, 2014). However, we consider that the level of accuracy of this classification is adequate for the purposes of this study, and superior to a classification based solely on latitudinal limits or a purely climatic classification without proper consideration of biotic components (Kottek et al., 2006).

For each continent, all polygons for biomes classified as "tropical" were merged into a single polygon, and the same was done for all "non-tropical" biomes, which were merged into a single multi-polygon comprising areas in both the southern and the northern hemisphere. This means that each tropical region comprised e.g., both rainforests and savannas, but excluded very dry areas (such as the Sahara in Africa, the Caatinga in South America and parts of the Deccan plateau in India) as well as the coldest habitats (e.g., high altitude areas in the South American Andes and along the African Great Rift Valley) located within the tropical belt (between c. 23° north and c. 23° south). Although smaller operational units would have been interesting from a biological perspective, e.g., separating rain forests and savannas, it would inevitably incur a considerable loss of data and statistical power for the subsequent analyses. We utilized the software package SpeciesGeoCoder v.1.0 (Töpel et al., 2014) to code species into operational units. The resulting polygons can be retrieved from the authors upon request.
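Coding occurrence points into polygons amounts to a point-in-polygon test followed by a tally per species and region. The sketch below uses the standard ray-casting test on simple polygons given as lists of (longitude, latitude) vertices. It is an illustration only and does not reproduce the SpeciesGeoCoder implementation; the function names and the input layout are assumptions, and multi-polygons or polygons with holes are not handled.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if the point lies inside a simple polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly joined to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_records_per_region(occurrences, regions):
    """Tally records per species and region.

    `occurrences` is an iterable of (species, lon, lat) tuples and
    `regions` maps a region name to its polygon. Points that fall in
    no region are ignored.
    """
    counts = {}
    for species, lon, lat in occurrences:
        for name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                per_species = counts.setdefault(species, {})
                per_species[name] = per_species.get(name, 0) + 1
                break  # operational units are assumed not to overlap
    return counts
```

The per-species tallies returned by `count_records_per_region` are the kind of input on which presence thresholds can then be applied.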

To further identify potential biases caused by erroneous georeferences (e.g., due to wrong coordinates or species identifications), we applied a set of arbitrary thresholds in order for a species to be coded as "present" in a certain operational unit. Three filters were defined, with increasingly more strict criteria, as outlined in **Table 1**. We implemented functions and scripts to carry out this data filtering in R (scripts available from the authors).

There was no major loss of occurrence records by going from Filter 1 to the more conservative Filter 2 (see Results below). We therefore chose to perform our analyses on range transitions on the data set generated under Filter 2, and the diversification rate analyses using the Filter 3 data set, due to the fact that the method we employed cannot handle widespread taxa (see below).
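The three nested filters can be sketched as follows. The thresholds are taken from the Figure 3 caption (at least 3 occurrences in a region for Filter 1; additionally 10% of all occurrences of the species for Filter 2; widespread species restricted to the region with the most records for Filter 3). Reading "10%" as "at least 10%", and breaking ties alphabetically, are assumptions of this sketch, as are the function and argument names; the authors' R scripts may differ.

```python
def code_presence(region_counts, level):
    """Return the set of regions in which one species is coded as present.

    `region_counts` maps region name -> number of cleaned records of the
    species in that region; `level` is 1, 2 or 3 for the filter applied.
    """
    total = sum(region_counts.values())
    # Filter 1: a minimum of 3 occurrences in the region.
    present = {r for r, n in region_counts.items() if n >= 3}
    if level >= 2:
        # Filter 2: the region must also hold at least 10% of all records
        # of the species (assumed reading of the threshold).
        present = {r for r in present if region_counts[r] >= 0.10 * total}
    if level >= 3 and len(present) > 1:
        # Filter 3: widespread species are restricted to the region with
        # the highest number of records (ties broken alphabetically here).
        best = max(sorted(present), key=lambda r: region_counts[r])
        present = {best}
    return present
```

For example, a species with 90 Neotropical, 8 Afrotropical and 2 non-tropical records is coded in two regions under Filter 1 but only in the Neotropics under Filter 2.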

#### Molecular Phylogeny

We chose to work with a single dated tree rather than performing a meta-analysis of individual trees (e.g., Jansson et al., 2013), so that divergence times among clades would be more directly comparable with each other. We therefore used the recent fossil-calibrated molecular phylogeny of angiosperms from Zanne et al. (2014), with 30,535 species. The phylogeny was based on data from seven gene regions, and families and orders were constrained to the APG III classification system (Bremer et al., 2009). To evaluate whether the level of taxonomic representation was consistent among regions, which could otherwise bias our subsequent analyses, we calculated the ratio between the number of species sampled in the phylogeny and the total number of species recorded in each of the four regions in the GBIF database.

TABLE 1 | Automated criteria for recording presence of each species in each operational unit defined in Figure 2, departing from raw GBIF species occurrence data.

\*Under this filter, presence was only coded in the region with the highest number of records.

#### Tropical Conservatism

We tested whether species in each of the regions defined (**Figure 2**) were clustered in the angiosperm phylogeny (i.e., showed strong phylogenetic signal) using Bayesian Tip-Significance testing implemented in the software BaTS v. 1.0 (Parker et al., 2008). We compared the observed distribution of states in the reference phylogeny against 100 randomized replicates, which were used to compute 95% credible intervals of trait distributions.
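The logic of comparing an observed clustering statistic against randomized replicates can be sketched with a tip-label permutation test on the parsimony score, one of the statistics BaTS reports. This is a simplified stand-in and not BaTS itself: it uses a single fixed tree encoded as nested 2-tuples of tip names, computes the Fitch parsimony score, and shuffles region labels among tips. All names are hypothetical.

```python
import random


def fitch_score(tree, states):
    """Return (possible states at this node, number of state changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tip_names(tree):
    """List tip names of a nested-tuple binary tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def permutation_test(tree, states, replicates=100, seed=0):
    """Observed parsimony score and a one-sided permutation p-value.

    A low score means few region changes on the tree, i.e. strong
    phylogenetic clustering of regions.
    """
    rng = random.Random(seed)
    names = tip_names(tree)
    labels = [states[name] for name in names]
    observed = fitch_score(tree, states)[1]
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(labels)
        shuffled = dict(zip(names, labels))
        if fitch_score(tree, shuffled)[1] <= observed:
            as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_extreme + 1) / (replicates + 1)
```

On an eight-tip tree in which one subclade is entirely tropical and the other entirely non-tropical, the observed score is 1 (a single change at the root), the minimum possible for two states.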

#### Range Shifts through Time

We used the region-coded, dated phylogeny of angiosperms to estimate the timing and directionality of range shifts between tropical and non-tropical lineages, and among the three tropical regions of the world. Since our analyses focused on the Cenozoic, when the three tropical continents were already widely separated by oceans (Mcloughlin, 2001), these events should include both trans-oceanic dispersals as well as range expansions over continuous land between the tropical and non-tropical zone.

TABLE 2 | Proportion of species included in the phylogeny for each plant order analyzed.

Species numbers follow those in the GBIF Backbone Taxonomy. (Sampling in tropics = number of species in the diversification analyses classified as tropical divided by the total number of species classified as tropical; Sampling outside the tropics = number of species in the diversification analyses classified as outside the tropics divided by the total number of species classified as outside the tropics).

We used stochastic character mapping (Huelsenbeck et al., 2003) to reconstruct histories of shifts across biogeographic regions (e.g., Clark et al., 2008). We calculated the relative number of transitions through time (Silvestro, 2012; Fernández-Mendoza and Printzen, 2013) as the absolute number of transitions divided by the number of nodes in 5 million year time bins. We did this to account for the fact that even under a simple birth model of speciation the number of lineages in a phylogeny tends to increase exponentially, therefore increasing the possibility of range shifts to occur toward the present (Silvestro, 2012). Credible intervals around the relative number of transitions through time were obtained by simulating 100 stochastic histories of geographic range evolution. We optimized the original scripts for this method and implemented them in R, using phytools (Revell, 2012) to perform stochastic mapping (new scripts available from the authors).
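The normalization described in this section (transitions divided by nodes within 5 million year bins) can be sketched for a single stochastic history as follows. The inputs are assumed to be plain lists of ages in Ma, one for inferred transitions and one for internal nodes; extracting those ages from a stochastic map is not shown, and the function name and defaults are hypothetical rather than taken from the authors' R scripts.

```python
import math


def relative_transitions(transition_ages, node_ages, bin_width=5.0, max_age=66.0):
    """Transitions per node in consecutive time bins.

    Returns a list of (bin start in Ma, ratio) pairs, youngest bin first.
    The ratio is None for bins that contain no nodes.
    """
    n_bins = int(math.ceil(max_age / bin_width))
    transitions = [0] * n_bins
    nodes = [0] * n_bins
    for age in transition_ages:
        index = int(age // bin_width)
        if 0 <= index < n_bins:
            transitions[index] += 1
    for age in node_ages:
        index = int(age // bin_width)
        if 0 <= index < n_bins:
            nodes[index] += 1
    result = []
    for i in range(n_bins):
        ratio = transitions[i] / nodes[i] if nodes[i] else None
        result.append((i * bin_width, ratio))
    return result
```

Repeating the calculation over each of the simulated histories and taking quantiles per bin would give intervals of the kind described in the text.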

#### Diversification Rates

We calculated rates of speciation (λ) and extinction (µ) for each tropical region separately, as well as for tropical and non-tropical species. For these analyses we used the Multiple State Speciation and Extinction method (MuSSE) as implemented in diversitree (Fitzjohn, 2012). We analyzed 17 subclades separately (**Table 2**), which we chose to correspond to plant orders. This division was necessary due to computational limitations in analysing the full tree under this method, but also carried the advantage of creating a sample of rate estimates across different angiosperm clades. We did not explore the effect of splitting the angiosperm tree into different numbers of subclades or along different branches, since there would be an almost endless number of possible combinations. We accounted for varying levels of taxonomic sampling in the phylogeny by calculating the sampling fraction of each order.

We compared the significance of results from the diversification analyses using Analysis of Variance (ANOVA), and then applied Tukey's honest significant difference (HSD) test in order to identify outstanding values. To account for intrinsic differences among plant orders, we normalized the rates of speciation and extinction for each order over all regions. This was done by dividing each rate by the sum of the rates in all regions analyzed. In all analyses, we used mean values of rates.
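The normalization step can be written out explicitly: for one order, each regional rate is divided by the sum of that order's rates across regions, so the normalized values sum to one. A minimal sketch with hypothetical names and made-up illustrative numbers (not estimates from this study):

```python
def normalize_rates(rates_by_region):
    """Divide each regional rate by the sum of rates over all regions."""
    total = sum(rates_by_region.values())
    if total == 0:
        raise ValueError("rates sum to zero; nothing to normalize")
    return {region: rate / total for region, rate in rates_by_region.items()}


# Illustrative values only, not results from the paper.
example = normalize_rates(
    {"Neotropics": 2.0, "Afrotropics": 1.0, "Tropical Asia": 1.0}
)
```

Because the result is a proportion within each order, orders with intrinsically fast or slow diversification contribute on the same scale to the among-region comparison.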

## Results

## Data Compilation

**Figure 3** shows the number of species and occurrences coded into each of the regions defined, the number of those that were also present in the phylogeny, and the influence of each filter applied. The raw data set of species occurrence points (after applying the basic cleaning steps described above) comprised a total of 24,908,478 records pertaining to 188,655 species (purple bars, **Figure 3**). Many species could not be matched between the species occurrence data set and the molecular phylogeny used, due to taxonomic issues that could not be easily solved (e.g., synonymisation and different taxonomic circumscriptions), and the fact that numerous species did not occur in both data sets. Despite these issues, a total of 27,585 species could be fully matched between the molecular phylogeny and the occurrence data set, representing 14.6% of the total number of currently accepted species of angiosperms (273,174 species, according to http://www.theplantlist.org; accessed September 2014). The data set generated under Filter 2, used for all analyses except MuSSE and BaTS, comprised a total of c. 20 million occurrence points and between c. 500 and 6,600 species per region (**Figure 3**).

FIGURE 3 | Number of angiosperm species and occurrences in the four regions defined in this study. The bars show the influence of different cleaning steps on the data set (see also Table 1). (A) Number of species per dataset and geographic region, (B) number of occurrence points per dataset and geographic region, (C) number of species per dataset and geographic region (Tropical vs. Non-Tropical), (D) number of occurrence records per dataset and geographic region (Tropical vs. Non-Tropical). Purple: GBIF download; blue: species that are included (and could be matched) in the phylogeny; dark green: Filter 1 (minimum 3 occurrences to be coded as present in a given region); light green: Filter 2 (additionally 10% of all occurrences per species needed to be coded as present); orange: Filter 3 (additionally widespread species restricted to one region). The Filter 2 data set was used for all analyses except for MuSSE and BaTS.

The proportion between species with geo-references and species in the phylogeny ranged from c. 8 to 15% among regions (**Table 3**). All tropical regions were similarly represented in the phylogeny, with only 2% difference between the best sampled tropical region (tropical Asia) and the least sampled one (tropical America). Non-tropical regions were better sampled phylogenetically than tropical ones (15% vs. 9%, respectively).
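The region coding under Filters 1 and 2 (as listed in the Figure 3 caption) and the sampling fraction reported in Table 3 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy records, and the reading of Filter 2 as "at least 10% of a species' records must fall in the region" are assumptions made for illustration only.

```python
from collections import Counter

MIN_RECORDS = 3    # Filter 1: minimum occurrences in a region
MIN_SHARE = 0.10   # Filter 2: minimum share of the species' records (assumed reading)


def code_regions(records, min_records=MIN_RECORDS, min_share=MIN_SHARE):
    """Return {species: set of regions} after applying both filters.

    `records` is an iterable of (species, region) pairs, one per occurrence.
    """
    per_species = {}
    for species, region in records:
        per_species.setdefault(species, Counter())[region] += 1

    coded = {}
    for species, counts in per_species.items():
        total = sum(counts.values())
        regions = {
            region
            for region, n in counts.items()
            if n >= min_records and n / total >= min_share
        }
        if regions:  # species failing the filters everywhere are dropped
            coded[species] = regions
    return coded


def sampling_fraction(n_in_phylogeny, n_with_records):
    """Share of a region's georeferenced species that are also in the phylogeny."""
    return n_in_phylogeny / n_with_records


# Hypothetical occurrence records, not taken from GBIF.
records = (
    [("Species a", "tropical America")] * 40
    + [("Species a", "non-tropical")] * 3      # 3 of 43 records: fails the 10% share
    + [("Species b", "tropical Africa")] * 2   # fewer than 3 records: fails Filter 1
    + [("Species c", "tropical Asia")] * 5
    + [("Species c", "non-tropical")] * 5
)
coded = code_regions(records)
```

With these toy records, `Species a` is coded only for tropical America, `Species b` is dropped, and `Species c` is coded for both of its regions. Whether the thresholds are inclusive is not stated in the text, so the `>=` comparisons are a guess.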

#### Phylogeny-based Analyses

**Figure 4A** shows the angiosperm phylogeny and the coding of each species as occurring in each of the four regions defined, whereas **Figure 4B** shows the coding in tropical and non-tropical regions. The Bayesian Tip-Significance testing indicated that species in all regions (**Figure 2**) are highly clustered phylogenetically (p < 0.001 for all three statistical tests implemented in BaTS: parsimony score, association index and maximum exclusive single-state clade).

The results from the range shift analyses are summarized in **Figure 5**. Confidence intervals of range shift rates were generally large and mostly overlapping, but their width decreased toward the present. During most of the Cenozoic, mean emigration rates (out of the tropics) were slightly higher than or very similar to immigration rates into the tropics (**Figure 5A**). From c. 58 to c. 44 Ma, immigration into the tropics showed a small decrease. Both tropical Africa (**Figure 5B**) and tropical Asia (**Figure 5C**) showed similar mean rates of immigration and emigration through time, except for some fluctuations (especially in Asia, prior to c. 25 Ma). In contrast, there was a consistently higher rate of emigration from tropical America (**Figure 5D**). These rates only reached equilibrium c. 14 Ma.

The region-specific rates of speciation and extinction inferred using the MuSSE model are shown in **Figure 6**, calculated under the sampling fractions for each order indicated in **Table 2**. Individual estimates are reported in Supplementary Table S1, and significance values in each set of comparisons are summarized in **Table 4**.

The median values of both speciation and extinction rates were higher in non-tropical than in tropical zones, but these estimates showed large overlap in their confidence intervals and are not statistically different (**Figures 6A,B**). In contrast, both the speciation and the extinction rates estimated for tropical America were significantly higher than those estimated for tropical Africa and tropical Asia (**Figures 6C,D**, p < 0.05 for speciation, and p < 0.001 for extinction).

TABLE 3 | Number of species recorded in each of the regions defined for the analyses (for which georeferenced data were available from GBIF), number of those species that could be included in the range shift analysis (after applying Data Filter 2; see Table 1 and Figure 3), and their sampling fraction.


## Discussion

### The Geographic History of Tropical Angiosperms

Our analyses of historical range shift events (**Figure 5**) reveal some interesting patterns. During the first half of the Cenozoic (from 66 until c. 30 Ma), our results indicate that most range shifts took place out of the tropics. This result corroborates a recent meta-analysis of 111 dated phylogenies, including seven clades of angiosperms (Jansson et al., 2013), and also reflects the directionality observed from the fossil record of marine bivalves for the last 11 Ma (Jablonski et al., 2006, 2013). Overall, range shifts appear poorly associated in time with climate, approximated through a mean global temperature curve (**Figure 5A**). Some correspondence may nevertheless exist, including a c. 30% decrease in range shifts into the world's tropics during the highest temperature levels of the Cenozoic, around the Early Eocene Climatic Optimum c. 52 Ma (Zachos et al., 2008). An additional overall decrease coincides with the Mid-Miocene Climatic Optimum c. 15 Ma. Why global warming would have influenced range shifts among tropical and non-tropical regions as observed here is puzzling, and may reflect large-scale but poorly understood vegetational changes. We also note that range shifts into and out of the tropics reached an equilibrium only a few million years after the Eocene-Oligocene transition, a global cooling event associated with the gradual glaciation of Antarctica (Zachos et al., 2008).

FIGURE 4 | Angiosperm phylogeny used for the range shift and diversification analyses, pruned from Zanne et al. (2014). The tree contains c. 22,600 terminal species and shows (A) the codification into each one of the continental-level regions defined in Figure 2, and (B) the codification of all species as tropical or non-tropical. Species in each of the regions defined are highly clustered phylogenetically according to Bayesian Tip-Significance testing (p < 0.001).

Range shifts into and out of tropical Africa (**Figure 5B**) occurred in both directions at about the same rate, and showed the least fluctuations among the three tropical regions analyzed. The initial formation of the Sahara c. 7 Ma (Zhang et al., 2014) did not seem to leave a considerable footprint on these rates.

Range shifts into and out of tropical Asia (**Figure 5C**) were fairly similar and exhibited most fluctuations prior to c. 23 Ma. Major events in that period include the collision of India with Asia c. 55–45 Ma, the uplift of the Qinghai-Tibetan Plateau c. 45–20 Ma, and the establishment of the monsoon system in Southeast Asia c. 35–20 Ma (Favre et al., 2014). The "out-of-India" hypothesis postulates that a number of African-derived organisms, including both animals (Bossuyt and Milinkovitch, 2001) and plants (Conti et al., 2002), rafted on the Indian subcontinent and dispersed into Asia after the collision of these landmasses. This dispersal route has received support from the molecular analyses of several taxa (Karanth, 2006). We note a temporal correlation between the initial collision (c. 55 Ma) and the shift from tropical Asia being mainly a sink of lineages to it becoming a net source of angiosperm diversity. Another major event in the Cenozoic is the geological rejuxtaposition of Southeast Asia, which created a stepping-stone route between Oceania and Asia from c. 40 Ma (Hall, 2009). This event might be reflected in our results by the increase of lineages entering tropical Asia around that time, leading again to a net input of non-tropical lineages into tropical Asia.

Range shifts out of tropical America were consistently more frequent than those entering it throughout most of the Cenozoic (from c. 65 to 15 Ma; **Figure 5D**). A remarkable peak in emigration shifts was estimated at c. 57 Ma, which was simultaneously associated with a modest decrease in immigration events. These results imply a c. 3 times higher rate of lineages leaving the Neotropics than shifts in the opposite direction. We note that this peak corresponds closely in time (allowing for the uncertainties in molecular dating) to the Paleocene-Eocene Thermal Maximum (PETM; **Figure 5A**). This was a short-lived (c. 10,000 years) event which took place c. 56.3 Ma and was characterized by mean global temperatures reaching more than 12 °C above today's level (Zachos et al., 2008). Evidence from the fossil record shows that considerable changes occurred at the PETM in Neotropical rainforests, with rapid origination of new taxa and changes in vegetation composition due to range shifts and local extirpations (Jaramillo et al., 2010). It seems therefore reasonable to suggest that newly speciated taxa might, at least in part, account for the inferred peak.

FIGURE 6 | […] non-tropical); (B) Extinction rates per geographic region (tropical vs. non-tropical); (C) Speciation rates for the three tropic regions; (D) Extinction rates for the three tropic regions. All results are normalized against each […] shown as a horizontal line and the whiskers indicating data range outside the quantiles. \*\* and \*\*\* denote significant differences (p < 0.05 and p < 0.001, respectively; ANOVA). See Methods for details.

#### TABLE 4 | Variables and statistical tests based on the MuSSE analyses of the molecular phylogeny of angiosperms.

Significant values at 95% confidence levels are underscored.

The high rate of range shifts out of the Neotropics is particularly noteworthy in comparison to the other tropical regions, where we did not find this difference between immigration and emigration. Thus, our results suggest that the Neotropics have functioned as a "species pump" for the rest of the world during the first 50 million years of the Cenozoic, but in particular during the Paleocene and early Eocene. The reasons for this require further investigation, but reflect the patterns observed in marine bivalves in which clades with higher diversification were the most likely to expand out of the tropics (Jablonski et al., 2013).

A second event of potential significance for range shifts in the Neotropics was the establishment of a stepping-stone land bridge reducing the gap between North and South America, known as the Greater Antilles and Aves Ridge or GAARlandia (Iturralde-Vinent and Macphee, 1999; Pennington and Dick, 2004). The existence of GAARlandia and its role in facilitating dispersals remain controversial (Ali, 2012), but the hypothesis has gained recent support in phylogeographic analyses of several animal taxa, including spiders (Crews and Gillespie, 2010), amphibians (Alonso et al., 2012) and cichlids (Říčan et al., 2013). We did not detect any definite signal of GAARlandia in our estimation of range shifts for angiosperms, except perhaps for a slow decrease in shifts entering the Neotropics (which, if confirmed, could also be linked to the global temperature decline at the Eocene/Oligocene transition).

### Building up Tropical Biodiversity

Our phylogeny-based estimates of speciation and extinction rates (**Figure 6**) showed that angiosperms in tropical regions both speciated and went extinct at lower rates than in temperate regions, although this difference was not significant (p > 0.05; **Table 4**). This result reflects the lack of conclusive evidence on this issue. Several studies have suggested higher rates of diversification (defined as speciation minus extinction) in the tropics (Mittelbach et al., 2007), including amphibians (Pyron and Wiens, 2013), mammals (Rolland et al., 2014), and squamate reptiles (Pyron, 2014). Others have found temperate regions to have higher diversification rates, based on the analysis of birds and mammals (e.g., Weir and Schluter, 2007). An analysis of bird diversification showed yet a third pattern, where the major differences in diversification rates were between the western and eastern hemispheres, rather than between tropical and temperate zones (Jetz et al., 2012). Our results are similar to those obtained by Jansson et al. (2013), who found no significant differences in the net diversification between tropical and temperate sister lineages. Overall, our results suggest that the higher diversity of angiosperms in tropical compared to non-tropical regions is not primarily dependent on higher speciation and/or lower extinction in the tropics.

In contrast, our results show significantly different rates of speciation and extinction among the tropical regions of the world (**Figures 6C,D**). Neotropical angiosperms speciated on average about 2–2.5 times faster than angiosperms in tropical Asia and tropical Africa. However, they also went extinct about 2–2.5 times faster than in tropical Asia. These high rates of speciation and extinction in the Neotropics indicate a rapid evolutionary turnover, i.e., species being formed and replacing each other at an unparalleled rate. This result is also in accordance with the observation that South American plant diversity is characterized by a relatively large number of recent, species-rich radiations, for instance in the tropical Andes (Hughes and Eastwood, 2006; Drummond et al., 2012; Madriñán et al., 2013) and Amazonia (Richardson et al., 2001; Erkens et al., 2007). Diversification in the region has been linked to the substantial changes in the landscape in the Neogene (Hoorn et al., 2010; Wesselingh et al., 2010), but several taxa may have an even younger origin in the Quaternary (Rull, 2011; Smith et al., 2014).
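To make the link between rates and turnover concrete, the sketch below derives three summary quantities from a pair of speciation and extinction rates: net diversification (speciation minus extinction, as defined in this section), relative extinction (extinction divided by speciation), and the expected species duration under a constant-rate birth-death model (the reciprocal of the extinction rate). The rate values are hypothetical and chosen only so that one region is 2.5 times faster than the other. They are not the MuSSE estimates, and the function name is an assumption.

```python
def summarize_rates(speciation, extinction):
    """Summarize a constant-rate birth-death process (rates per lineage per Myr)."""
    if speciation <= 0 or extinction <= 0:
        raise ValueError("rates must be positive")
    return {
        "net_diversification": speciation - extinction,
        "relative_extinction": extinction / speciation,
        # Mean waiting time to extinction for an exponential process.
        "mean_species_duration": 1.0 / extinction,
    }


# Hypothetical rates: the "fast" region has 2.5 times the speciation and
# extinction rates of the "slow" region.
fast = summarize_rates(speciation=0.5, extinction=0.25)
slow = summarize_rates(speciation=0.2, extinction=0.1)

for name, summary in (("fast", fast), ("slow", slow)):
    print(name, summary)
```

In this toy case both regions share the same relative extinction, yet species in the fast region last on average 2.5 times less long. That is the sense in which jointly elevated speciation and extinction imply higher turnover and shorter species longevity.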

### Reliability of Results: Pushing the Limits of Biological Data

Evolutionary biology and biogeography are now experiencing a tremendous accumulation of data, including molecular sequences, fossils, and species occurrences, with a hitherto unrealized scientific potential. An emerging question, however, is to what extent available data and methods are sufficient to provide us with reliable answers to some of the most fundamental questions in biology. A critical evaluation of the data, methods and assumptions is therefore crucial but often underestimated in evolutionary studies.

Whenever possible, palaeontological data should be studied in conjunction with molecular-based evolutionary analyses (Quental and Marshall, 2010; Fritz et al., 2013; Silvestro et al., 2014). However, our assessment of angiosperm fossils currently available (**Figure 1**) suggests that data unavailability is a serious issue for angiosperms. The number of angiosperm fossil occurrences publicly available varied considerably among countries and geological periods, with some countries (e.g., USA, Russia) and periods (e.g., the Miocene) being considerably better represented than others. On a continental scale, lack of data is particularly critical for Africa, Southeast Asia and Australasia; but even within relatively well-sampled continents (such as Europe and South America) there are strong regional biases among countries.

Similar to the case of fossil data, there is general skepticism concerning the use of publicly available species occurrences for understanding species distributions, especially from non-verified databases such as GBIF. Distribution data have been shown to contain important taxonomic, temporal and spatial biases (Boakes et al., 2010). The question of whether bioinformatic tools may correctly infer biodiversity patterns despite those biases remains largely unanswered, and will also depend on the scale and taxa in focus, with higher accuracy expected for well-studied taxa and large spatial units. Recent studies suggest that automated data handling procedures are able to yield biologically realistic results, if enough care and appropriate techniques are employed (Zanne et al., 2014; Engemann et al., 2015; Maldonado et al., accepted). In other cases, the manual validation by taxonomists appears crucial, e.g., for the assessment of species' conservation status for the IUCN Red List of Threatened Species (Hjarding et al., 2014).

Our approach of automatically coding species into regions and calculating sampling fractions using GBIF data and polygons is not intended to replace the time-consuming work by taxonomists. However, it constitutes an additional, data-derived and spatially explicit approach that deserves further exploration and validation. Estimating global and regional patterns of species richness and biodiversity remains a notoriously difficult and contentious topic, with no consensus reached (Govaerts, 2001; Crane, 2004; Ungricht, 2004; Wortley and Scotland, 2004; Chapman, 2009; Mora et al., 2011). In addition, there is no general agreement on how to best define, delimit and name biogeographical regions (Kreft and Jetz, 2010; Holt et al., 2013; Vilhena and Antonelli, 2014), with the implication for this study that the world's three tropical regions are differently circumscribed in the literature. Our study suggests that a relatively stable assignment of species to large regions (as in **Figure 2**) may be attained through simple, automated filtering steps, in which the addition of increasingly restrictive criteria for coding species results in relatively small differences (**Figure 3**).

The reconstruction of ancestral character states (such as morphology and geographic distribution) along phylogenies is now common practice in evolutionary studies, but only makes sense when the traits analyzed are phylogenetically structured, i.e., they are not randomly distributed across the tree. Since we found highly significant clustering of species pertaining to the same geographic assignment in each of the regions defined (**Figure 4**), we consider that the geographic coding and reconstruction analyses using stochastic mapping are suitable for the goals of this study.

The low taxonomic sampling in the phylogeny (**Tables 2, 3**) may influence the calculation of range shifts. However, two considerations suggest that this influence is unlikely to significantly affect the general patterns obtained. First, taxonomic sampling varied by only 2% or less among the tropical continents. Second, even at low sampling it should be possible to recover a relatively large proportion of range shifts among the regions outlined. This is because biological sampling is far from being random, with an over-representation of deep nodes that reflect morphological and geographical variations in taxa (Höhna et al., 2011; ter Steege et al., 2011; Cusimano et al., 2012). In other words, even if only a couple of species were sampled from a species-rich but strictly African clade, our analyses should be able to detect when that clade arrived in Africa. Further simulations would be helpful to assess at which sampling levels the calculation of continental-level range shifts stabilizes and becomes fully reliable.

Diversification rates of angiosperms have varied widely among clades (Magallón and Sanderson, 2001) and through time (Silvestro et al., 2015). Inferring the dynamics between speciation and extinction through the Cenozoic for each continent should therefore provide important insights into the evolution of their floras. However, the taxonomic sampling in the angiosperm phylogeny was at or below 10% for all tropical regions (**Figure 3**, **Table 3**). Sampling levels already below c. 80% are bound to flaw diversification rate estimates under current methods, often showing slowdowns in net diversification that represent methodological artifacts (Cusimano and Renner, 2010). Expectations on how the missing species are distributed in a phylogeny depending on the sampling scheme may increase the accuracy of diversification analyses (Stadler and Bokma, 2013). However, no method has been developed so far that is capable of confidently dealing with the level of taxonomic sampling observed in the angiosperm phylogeny we used. The MuSSE analyses carried out here can only provide point estimates for the orders surveyed, but should constitute a more powerful approach given the relatively large size of the phylogeny utilized.

### Future Prospects: More Data, Improved Methods

The inevitable incompleteness of the fossil record represents a limit to macro-evolutionary analyses that can be carried out using currently available data. However, the development of new methods has shown that even incomplete fossil data can provide essential information in estimating trends of phenotypic evolution (Slater and Harmon, 2013) and species diversification dynamics (Silvestro et al., 2014). Such models should be ideally extended to historical biogeography and might shed new light on the dynamics of migration of lineages through time and among regions. In particular, fossils provide an important resource for improving biogeographic reconstructions, as they provide information on past species ranges and may therefore further refine or validate ancestral range analyses as performed here (Ronquist et al., 2012; Wood et al., 2013; Lawing and Matzke, 2014). Although correct fossil placement on phylogenies can be problematic, their potential in this area is still insufficiently explored (Wood et al., 2013).

Phylogeny-based diversification analyses are powerful complements to palaeontological inferences. However, they still require further development to be confidently used with poorly sampled phylogenies—as is often the case in plants, regardless of geographic region (**Figure 3** and **Table 3**). Until sampling improves to a much higher level (both taxonomically and genetically), or methods currently used successfully with e.g., mammals (Morlon et al., 2011; Stadler, 2011) are adapted and validated for plants, we remain with limited power to assess the dynamics of diversification rates through time and across clades.

## Conclusions

Here we have shown that currently available biological data including species occurrences and dated phylogenetic trees hold the potential of providing novel and important insights into large-scale patterns of species diversification and biogeography.

The geographic history of angiosperms involved a large number of range transitions between tropical and non-tropical zones, as well as into and out of the world's three tropical regions. Global climatic changes and major geological events are likely to have influenced some of the observed changes in range shifts, such as the early Eocene climatic conditions and the large geographic reconfigurations in tropical Asia (outlined in **Figures 5A,C**). However, these are temporal correlations that require further validation. We cannot rule out that some of the fluctuations we observed in the mean rates of range shifts instead reflect the stochastic nature of dispersals and biome shifts, and/or result from a lack of phylogenetic signal for events that happened tens of millions of years ago.

No significant differences could be found between the speciation and extinction of tropical and non-tropical angiosperms. This result reflects the lack of conclusive evidence on global diversification patterns for different organism groups. Although diversification estimates need to be continuously revalidated with the addition of more genetic and taxonomic data and increasingly robust methods, our results suggest that the latitudinal diversity gradient in angiosperms is not primarily caused by differences in speciation or extinction rates. Longer time for speciation and tropical niche conservatism might therefore constitute better models for explaining tropical angiosperm diversity.

Continental differences in tropical angiosperm diversity show clearer patterns, adding to our knowledge on the global patterns of plant diversity (Kier et al., 2005; Barthlott et al., 2007; Kreft and Jetz, 2007; Kreft et al., 2010; Mutke et al., 2011). The outstanding species richness of angiosperms found today in the Neotropics as compared to tropical Africa and tropical Asia is associated with significantly higher speciation and extinction rates in the Neotropics (**Figures 6C,D**)—and thereby higher species turnover and shorter average longevity of species. The causes underlying these differences remain elusive, but might be associated with the substantial landscape dynamics that have affected northern South America since the Miocene, among other continent-specific differences such as biome sizes, niche space, and climatic history. Our results also show that Neotropical diversity, once generated in situ, was to a large extent "pumped out" of the Neotropics (**Figure 5D**).

## Data Availability

All scripts used in data compilation and cleaning are available upon request.

## Author Contributions

AA and AZ conceived this study. AZ, DS, and RS compiled and analyzed the molecular data. BC-M and DS compiled and analyzed the fossil data. All authors interpreted the results and provided input on the manuscript. AA and CDB led the writing with contribution from all authors.

## Acknowledgments

We thank three reviewers and the Associate Editor for constructive feedback on this manuscript. This research was supported by funding from the Swedish Research Council (B0569601) and the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013, ERC Grant Agreement n. 331024) to AA and CDB; from Carl Tryggers and Wennergren stiftelse to DS; and from a Marie Curie COFUND Postdoctoral Fellowship (University of Liege—grant number: 600405) to BC-M. The code used here was developed, tested and benchmarked on the bioinformatics computer cluster Albiorix at the Department of Biological and Environmental Sciences, University of Gothenburg, and further analyses were run at the high-performance computing center Vital-IT of the Swiss Institute of Bioinformatics (Lausanne, Switzerland).

## Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00130/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Antonelli, Zizka, Silvestro, Scharn, Cascales-Miñana and Bacon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Neotropical mammal diversity and the Great American Biotic Interchange: spatial and temporal variation in South America's fossil record

## *Juan D. Carrillo<sup>1,2</sup>\*, Analía Forasiepi<sup>3</sup>, Carlos Jaramillo<sup>2</sup> and Marcelo R. Sánchez-Villagra<sup>1</sup>*

*<sup>1</sup> Paläontologisches Institut und Museum, University of Zurich, Zurich, Switzerland*

*<sup>2</sup> Smithsonian Tropical Research Institute, Panama City, Panama*

*<sup>3</sup> Instituto Argentino de Nivología, Glaciología y Ciencias Ambientales (IANIGLA), CCT-CONICET Mendoza, Mendoza, Argentina*

#### *Edited by:*

*James Edward Richardson, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*William Daniel Gosling, University of Amsterdam, Netherlands*

*Bruce D Patterson, Field Museum of Natural History, USA*

#### *\*Correspondence:*

*Juan D. Carrillo, Paläontologisches Institut und Museum, University of Zurich, Karl-Schmid-Strasse 4, 8006 Zürich, Switzerland e-mail: juan.carrillo@pim.uzh.ch*

The vast mammal diversity of the Neotropics is the result of a long evolutionary history. During most of the Cenozoic, South America was an island continent with an endemic mammalian fauna. This isolation ceased during the late Neogene after the formation of the Isthmus of Panama, resulting in an event known as the Great American Biotic Interchange (GABI). In this study, we investigate biogeographic patterns in South America, just before or when the first immigrants are recorded and we review the temporal and geographical distribution of fossil mammals during the GABI. We performed a dissimilarity analysis which grouped the faunal assemblages according to their age and their geographic distribution. Our data support the differentiation between tropical and temperate assemblages in South America during the middle and late Miocene. The GABI begins during the late Miocene (∼10–7 Ma) and the putative oldest migrations are recorded in the temperate region, where the number of GABI participants rapidly increases after ∼5 Ma and this trend continues during the Pleistocene. A sampling bias toward higher latitudes and younger records challenges the study of the temporal and geographic patterns of the GABI.

**Keywords: Miocene, Pliocene, biogeography, mammalia, South America**

## **INTRODUCTION**

The Neotropics [Neotropical region *sensu lato* of Morrone (2014)] support an extremely large diversity of living mammals. Currently there are around 1500 recognized species, which represent on the order of 30% of the total world mammal diversity. Included are endemic groups such as marsupials (opossums), xenarthrans (sloths, armadillos, and anteaters), caviomorph rodents (capybaras, spiny rats, chinchillas), platyrrhine monkeys, and phyllostomid bats (Patterson and Costa, 2012). The variety of biomes found in the Neotropics (lowland rainforest, savannas, mountain forest, scrublands, and deserts) could provide a partitioned environment enhancing species richness (Tews et al., 2004).

The current Neotropical mammal fauna is the result of a long evolutionary history. The Cenozoic (66–0 Ma) in South America was characterized by long-term geographical isolation with the evolution of an endemic fauna (Simpson, 1980). Sporadic dispersal events from other geographic areas interrupted this isolation, introducing novel clades into South America including caviomorph rodents during the middle Eocene (∼41 Ma) and platyrrhine monkeys during the late Oligocene (∼26 Ma) (Pascual, 2006; Antoine et al., 2012; Croft, 2012; Goin et al., 2012). The isolation of South America's mammal fauna ceased by ∼10–7 Ma, when proximity to, and then a permanent connection with, Central America was established. This connection initiated a massive faunal exchange between North America (NA) and South America (SA). This event is known as the Great American Biotic Interchange (GABI) (Simpson, 1980; Webb, 1985). The classic interpretation places the onset of the GABI by ∼3.0 Ma, with some early migrations during the late Miocene from SA to NA by ∼9 Ma and from NA to SA by ∼7 Ma. Other studies using dated molecular phylogenies across a wide range of taxa indicate an important part of the interchange may have predated the permanent land connection by ∼3 Ma (Koepfli et al., 2007; Cody et al., 2010; Eizirik et al., 2010; Eizirik, 2012). The core of the GABI is composed of a series of major migration "waves" during the Pliocene–Pleistocene (2.5–0.012 Ma) (Webb, 2006; Woodburne, 2010). Recently, several NA mammals have been reported from late Miocene deposits, ∼10 Ma, within the Amazon basin. These include a dromomerycine artiodactyl, gomphotheres, peccaries, and tapirs, suggesting a more intense earlier connection (Campbell et al., 2000, 2010; Frailey and Campbell, 2012; Prothero et al., 2014). However, the taxonomy and age of some of these fossils have been questioned (Alberdi et al., 2004; Lucas and Alvarado, 2010; Lucas, 2013). In Amazonia, Pleistocene terraces are built from older Cenozoic deposits (Latrubesse et al., 1997), resulting in non-contemporaneous associations (Cozzuol, 2006). Even with these concerns in mind, over the last decades the presence of northern forms in South America has become better understood.

During the late Miocene (11.6–5.3 Ma) and early Pliocene (5.3–3.6 Ma), the GABI was taxonomically balanced, as predicted by the MacArthur–Wilson species equilibrium hypothesis, with a similar number of NA and SA families participating in the interchange (Webb, 1976; Marshall et al., 1982). During the Pleistocene, NA mammals appear to have diversified exponentially in SA, resulting in an overall prevalence of NA- over SA-derived mammals. This could be the result of competitive displacement (Webb, 1976, 1991; Marshall et al., 1982), but this has not been subjected to rigorous analyses. In contrast, ecological replacement has been demonstrated for extinct metatherians and placental carnivores (Prevosti et al., 2013).

Vrba (1992) analyzed the GABI in the context of the "habitat theory" (i.e., physical environmental changes are the main drivers of "distribution drift") and highlighted the importance of environmental changes over biotic interactions as the major cause of the biotic turnover. Webb (1991) proposed that the Pleistocene glaciations and the widespread development of savannas in the Neotropics facilitated dispersals during the GABI of savanna-adapted mammals. Woodburne (2010) agreed with Webb's model and related the pulses of faunistic movements to the glaciations and sea level changes of the Pliocene and Pleistocene. However, most recent evidence does not support the widespread expansion of savannas in the tropics during glacial times (Behling et al., 2010). The GABI was dynamic with bidirectional migrations (Carlini et al., 2008b; Castro et al., 2014) and with reciprocal exchanges within a single lineage (e.g., procyonids; Baskin, 1989; Forasiepi et al., 2014; and felids; Prevosti, 2006).

Potential biogeographic barriers or corridors along with environmental changes controlled patterns of movements (Webb, 1991; Woodburne, 2010). The Andes are currently an important biogeographic feature in South America extending for about 8000 km from Venezuela to Argentina, reaching average heights of about 4000 masl and maximum elevations up to 7000 masl (Ramos, 1999). The present-day elevations of the northern and the north central Andes (north of 20°S) were reached during or soon after the late Miocene (Mora et al., 2009) and may have constituted a colonization corridor during the GABI (Patterson et al., 2012 and references therein).

A full understanding of the GABI is difficult because of the difference in fossil sampling between low and high latitudes (**Figure 1**). Even with the major recent advances in Neotropical paleontology (Kay et al., 1997; Campbell, 2004; MacFadden, 2006; Sánchez-Villagra et al., 2010; Antoine et al., 2012), our knowledge of the large territory that comprises the Neotropics, twice the size of Europe and almost as large as North America, is scarce (Croft, 2012).

In this contribution, we investigate biogeographic patterns for the middle and late Miocene (15.9–5.3 Ma) in SA at the initiation of the GABI. We review the temporal and geographical distribution of fossil mammals during the GABI and discuss the special significance of the fossil record from northern SA to understand the patterns and dynamics of the interchange.

#### **MATERIALS AND METHODS**

**FIGURE 1 | Number of collections in the Paleobiology Database (PBDB) across latitude for land mammals in North America (gray boxes) and South America (white boxes) for each 1 Ma period in the last 12 Ma.** The boxplot shows the mean and standard deviation of the latitude of the PBDB collections for each time interval.

Species lists from several middle and late Miocene–Pliocene mammal associations (La Venta, Fitzcarrald, Quebrada Honda, Collón Curá, Urumaco, Acre, Mesopotamian, Cerro Azul, Chiquimil, Andalhuala, Monte Hermoso, Inchasi and Uquía) were compiled from several sources (Goin et al., 2000; Cozzuol, 2006; Reguero and Candela, 2011; Brandoni, 2013; Tomassini et al., 2013; Tejada-Lara et al., in press) and other references available in the Paleobiology Database (PBDB) (Alroy, 2013), to which we added 450 references with records of Neogene fossil mammals from the Americas (**Figures 2**, **3**; Supplementary Material 1–2). We obtained latitude and paleolatitude for each locality from the PBDB (**Table 1**) and estimated the distance in km among localities using Google Earth. Localities were coded for presence/absence at the generic level (Supplementary Table 1). The biochronology refers to the South American Land Mammal Ages (SALMA) and the calibration of the boundaries of Tomassini et al. (2013, modified from Cione et al., 2007) and Cione and Tonni (1999, 2001). Genera were used as the taxonomic unit (including taxonomic identifications with *cf.* and *aff.* qualifiers). Lower taxonomic levels are still unresolved for several localities, so data at those levels are not comparable.
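The distances among localities were measured in Google Earth. As a rough stand-in for readers who want to reproduce pairwise distances from coordinates, the sketch below uses the haversine great-circle formula on a spherical Earth. The function name, the mean radius of 6371 km, and the demonstration coordinates are assumptions made for illustration and are not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical coordinates: two points on the equator, 90 degrees of longitude apart.
quarter_circumference = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

A spherical model ignores the flattening of the Earth, so values can differ slightly from distances measured in Google Earth.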

We analyzed closely contemporaneous fossil mammal associations from SA using the Bray-Curtis binary dissimilarity index. This index reaches a maximum value of 1 when there are no shared taxa between the two compared communities. The Vegan package (Oksanen et al., 2013) was used to perform a cluster analysis with the average grouping method and a Nonmetric Multidimensional Scaling (NMDS) set to two dimensions (axes) and 1000 runs. We compared tropical and temperate Miocene localities. In order to account for differences in sample size, we set the number of taxa equal to that of the assemblage with the lowest richness within the subgroup and calculated the Bray-Curtis dissimilarity by resampling all the localities with replacement 1000 times. The Vegan package was used to obtain genera accumulation curves for tropical assemblages, using the random method. All analyses were performed in R (R Core Team, 2013).
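The analyses were run in R with the Vegan package. Purely as an illustration of the index and the resampling step, here is a minimal Python sketch under stated assumptions: assemblages are represented as sets of genus names, the binary Bray-Curtis index is computed as 1 − 2c/(a + b) (c shared genera, a and b the richness of each assemblage), and each resampling iteration draws `n_taxa` genera with replacement from each assemblage. The function names and the placeholder genus labels are hypothetical, and the exact resampling scheme used in the study may differ.

```python
import random


def binary_bray_curtis(a, b):
    """Binary Bray-Curtis dissimilarity between two collections of genus names."""
    a, b = set(a), set(b)
    if not a and not b:
        raise ValueError("both assemblages are empty")
    shared = len(a & b)
    return 1.0 - 2.0 * shared / (len(a) + len(b))


def resampled_dissimilarity(a, b, n_taxa, n_iter=1000, seed=0):
    """Dissimilarity values after drawing n_taxa genera (with replacement) per assemblage."""
    rng = random.Random(seed)
    a, b = sorted(set(a)), sorted(set(b))
    values = []
    for _ in range(n_iter):
        sub_a = rng.choices(a, k=n_taxa)  # sampling with replacement
        sub_b = rng.choices(b, k=n_taxa)
        values.append(binary_bray_curtis(sub_a, sub_b))
    return values


# Placeholder genus labels, not real occurrence data.
site_1 = {"genus_a", "genus_b", "genus_c", "genus_d"}
site_2 = {"genus_c", "genus_d", "genus_e"}
n_min = min(len(site_1), len(site_2))  # richness of the poorest assemblage
values = resampled_dissimilarity(site_1, site_2, n_taxa=n_min)
```

The index equals 1 when the two sets share no genera and 0 when they are identical, which matches the behavior described above.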

We obtained records for late Miocene to late Pliocene land mammals for NA and SA from the PBDB. We classified each genus as North or South American according to whether the taxon or its ancestor was present in NA or SA before 10 Ma. We compared the geographic distribution (tropical vs. temperate) and time of first appearance datum (FAD) of GABI migrants in the continent (Supplementary Material 3 and Supplementary Table 2). In order to account for the age uncertainty of each FAD, we generated 1000 random values between the maximal and minimal age estimates and calculated the mean and standard deviation of the age estimate for each record.
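The treatment of FAD age uncertainty can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes the random values are drawn from a uniform distribution between the minimal and maximal age estimates (the text does not name the distribution), and the function name and the example age range are hypothetical.

```python
import random
import statistics


def fad_age_estimate(min_age, max_age, n_draws=1000, seed=0):
    """Mean and standard deviation of n_draws ages drawn between min_age and max_age (Ma)."""
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    rng = random.Random(seed)
    draws = [rng.uniform(min_age, max_age) for _ in range(n_draws)]  # uniform draw is an assumption
    return statistics.mean(draws), statistics.stdev(draws)


# Hypothetical record whose FAD is bracketed between 4.0 and 5.0 Ma.
mean_age, sd_age = fad_age_estimate(4.0, 5.0)
```

Under the uniform assumption the mean converges on the midpoint of the age range and the standard deviation on the range width divided by the square root of 12.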

#### **STUDY SITES**

We selected faunal associations from the tropical and temperate regions of South America that together span from the middle Miocene (∼15 Ma) to the late Pliocene (∼2 Ma), a critical time period for the GABI. The study sites cover a wide latitudinal gradient across the continent (**Table 1**).

#### *La Venta*

La Venta is one of the best-studied fossil assemblages from the Neotropics and among vertebrates includes freshwater fishes, crocodiles, turtles and different mammal clades (Kay et al., 1997). These come from the Honda Group in the central Magdalena valley, Colombia (**Figure 2**). Its age is constrained by radiometric and paleomagnetic data. The assemblage of La Venta served as the basis for defining the Laventan SALMA (middle Miocene, 13.5–11.8 Ma) (Madden et al., 1997).

**FIGURE 3 | Chronostratigraphy, South American Land Mammal Ages (SALMAs) and temporal distribution of the faunal assemblages discussed in the text.** Colloncuran = 15.7–14 Ma (Madden et al., 1997); Laventan = 13.5–11.8 Ma (Madden et al., 1997); Mayoan = 11.8–10 Ma (Flynn and Swisher, 1995); Chasicoan = 10–∼8.5 Ma (Flynn and Swisher, 1995); Huayquerian = ∼8.5–5.28 Ma, lower age following (Cione and Tonni, 2001; Reguero and Candela, 2011) and upper age following (Tomassini et al., 2013); Montehermosan = 5.28–4.5/5.0 Ma (Tomassini et al., 2013); Chapadmalalan = 4.5/5.0–3.3 Ma (Tomassini et al., 2013); Marplatan = 3.3–∼2.0 Ma, lower age following (Tomassini et al., 2013) and upper age following (Cione and Tonni, 1999; Cione et al., 2007); Ensenadan = ∼2.0–<0.78(0.5?) Ma (Cione and Tonni, 1999; Cione et al., 2007); Bonaerian = <0.78(0.5?)–0.13 Ma (Cione and Tonni, 1999); Lujanian = 0.13–0.08 Ma (Cione and Tonni, 1999).

#### *Fitzcarrald*

The localities of the Fitzcarrald assemblage are found along the Inuya and Mapuya rivers in the Amazon of Peru (**Figure 2**) from the Ipururo Formation, interpreted as middle Miocene (Laventan Age) (Antoine et al., 2007; Tejada-Lara et al., in press). The vertebrate assemblage includes fishes, turtles, crocodiles, snakes and 24 mammalian taxa (Negri et al., 2010; Tejada-Lara et al., in press).

#### *Quebrada Honda*

Quebrada Honda is located in southern Bolivia at ∼21°S latitude, 20 km north of the Argentine frontier and at an elevation of about 3500 m (**Figure 2**). The fossil-bearing deposits crop out in the valley of the Honda River and its tributaries. Paleomagnetic and radioisotopic data provide an extrapolated age of 13–12.7 Ma for the fossil-bearing beds (MacFadden et al., 1990). Multiple proxies used to estimate the paleoelevation of the Central Andean Altiplano have yielded values between 1000 and 2000 m for the middle Miocene (Garzione et al., 2008); however, a more recent study using clumped isotope thermometry on paleosol carbonates inferred an earlier uplift for the Altiplano, with Quebrada Honda at about 2600 ± 600 m and a mean annual temperature of ∼9 ± 5 °C (Garzione et al., 2014). The assemblage includes about 30 mammals representing metatherians, xenarthrans, rodents, astrapotheres, litopterns and notoungulates, and corresponds to the Laventan SALMA (Croft, 2007).

**Table 1 | Modern and ancient latitude and elevation of the faunal assemblages used in this study.**

#### *Collón Curá*

The Collón Curá Formation is widely exposed west of the North Patagonian Massif (Neuquén and Río Negro provinces, and northwestern Chubut Province). The rich vertebrate association is represented by reptiles, birds, and principally mammals: metatherians, xenarthrans, rodents, notoungulates, litopterns, and astrapotheres (Kramarz et al., 2011). The fossil mammals collected in the vicinity of the Collón Curá river by Santiago Roth in the late 19th century are the basis for the definition of the Colloncuran SALMA, although a critical review of most of the findings is still pending. Several radiometric dates for the Collón Curá Formation indicate ages between 15.5 and 10 Ma for the vertebrate association (e.g., Rabassa, 1974, 1978; Marshall et al., 1977; Bondesio et al., 1980; Mazzoni and Benvenuto, 1990; Madden et al., 1997).

#### *Urumaco*

The Urumaco sequence is found in Falcón State in northwestern Venezuela (**Figure 2**). It includes the Querales, Socorro, Urumaco, Codore and San Gregorio formations, which together span from the middle Miocene to late Pliocene (Quiroz and Jaramillo, 2010). The Urumaco sequence shows a high diversity of crocodilians (Scheyer et al., 2013) and xenarthrans (Carlini et al., 2006a,b, 2008a,c). We focus our analysis on the Urumaco Formation. Linares (2004), on the basis of a mammal list of undescribed material, suggested a middle to late Miocene age. Until a detailed taxonomic revision is conducted, the biostratigraphic correlation of the Urumaco association remains tentative.

#### *Acre*

The Acre region in southwestern Amazonia includes several fossiliferous localities which would represent different time intervals considering the geological and palynological evidence (Cozzuol, 2006). Fossil vertebrates come from the Solimões Formation of the state of Acre, Brazil, and from Peruvian and Bolivian localities of the Madre de Dios Formation (Negri et al., 2010) (**Figure 2**). The vertebrate assemblage is very diverse and includes fishes, snakes, lizards, birds, turtles, crocodiles, and mammals including whales, dolphins, manatees and a diverse assemblage of terrestrial forms. The Acre mammal assemblage has been referred to the late Miocene, Huayquerian SALMA (Cozzuol, 2006; Ribeiro et al., 2013) or extended also into the Pliocene, Montehermosan SALMA (Cozzuol, 2006). Campbell et al. (2001) reported 40Ar/39Ar dates of 9.01 ± 0.28 Ma for the base of the Madre de Dios Formation and 3.12 ± 0.02 Ma near the top.

#### *Mesopotamian*

The continental mammals of the Mesopotamian assemblage come from the lower levels of the Ituzaingó Formation, which crops out along the cliffs of the Paraná River in Corrientes and Entre Ríos provinces, north-east Argentina (**Figure 2**). The vertebrate assemblage is rich and includes fishes, crocodiles, birds and mammals (Cione et al., 2000; Brandoni and Noriega, 2013). It differs taxonomically from other associations in Argentina at the same latitudes, which has been explained by a southern extension of the northern realm (Cozzuol, 2006). The age of the Mesopotamian assemblage has been much debated (Cione et al., 2000 and references therein); it is currently assigned to the late Miocene, Huayquerian SALMA (Cione et al., 2000) or also extended into the Chasicoan SALMA (Brandoni, 2013; Brunetto et al., 2013). The date of 9.47 Ma for the upper levels of the underlying Paraná Formation (Pérez, 2013) represents a maximum age limit for the Mesopotamian assemblage.

#### *Cerro Azul*

Several localities in east-central Argentina (La Pampa and Buenos Aires provinces) have provided abundant fossil vertebrates from the Cerro Azul and Epecuén formations, which are considered geologically correlated (Goin et al., 2000). This assemblage includes reptiles, birds and a rich mammal association. These units are assigned to the late Miocene, Huayquerian SALMA (Goin et al., 2000; Montalvo et al., 2008; Verzi and Montalvo, 2008; Verzi et al., 2011) on the basis of mammal biostratigraphy. This association is currently the most complete list for this age (Goin et al., 2000). An extension into the late Pliocene cannot be ruled out for some localities assigned to the Cerro Azul Formation (Prevosti and Pardiñas, 2009).

#### *Chiquimil*

The Chiquimil Formation is exposed in north-west Argentina (Catamarca Province) and is divided into three members. The Chiquimil A (Riggs and Patterson, 1939; Marshall and Patterson, 1981) or El Jarillal Member (Herbst et al., 2000; Reguero and Candela, 2011) has provided a rich fossil record. The mammalian association has been assigned to the late Miocene, Huayquerian SALMA (Reguero and Candela, 2011). A date from the middle section of the Chiquimil Formation indicated ∼6.68 Ma (Marshall and Patterson, 1981).

#### *Andalhuala*

The Andalhuala Formation is exposed in the Santa María Valley in north-west Argentina (Catamarca Province). This is a classical fossiliferous unit of the South American Neogene with abundant and diverse fossil remains, including plants, invertebrates, and vertebrates (Riggs and Patterson, 1939; Marshall and Patterson, 1981). Basal levels of the Andalhuala Formation have been dated to ∼7.14 Ma (Latorre et al., 1997) and ∼6.02 Ma (Marshall and Patterson, 1981), while a tuff sample close to the upper part of the sequence was dated to ∼3.53 Ma (Bossi et al., 1993). The mammal association has been referred to the Montehermosan–Chapadmalalan SALMAs (Reguero and Candela, 2011).

#### *Monte Hermoso*

The Monte Hermoso Formation is exposed on the Atlantic coast in the southwest of Buenos Aires Province, Argentina. This unit has provided fishes, anurans, reptiles, birds, and a diverse mammal association. Recent biostratigraphic and biochronological analyses (Tomassini and Montalvo, 2013; Tomassini et al., 2013) have recognized a single biozone (the *Eumysops laeviplicatus* Range Zone) in the Monte Hermoso Formation, which is the basis for the Montehermosan SALMA. The Montehermosan was restricted to the early Pliocene, between <5.28 and 4.5/5.0 Ma, by considering the date of 5.28 Ma in levels with Huayquerian mammals and paleomagnetic correlations in the upper Chapadmalal Formation (Tomassini et al., 2013).

#### *Inchasi*

The locality of Inchasi is found in the eastern cordillera in the department of Potosí, Bolivia, at an elevation of about 3220 masl and ∼19°S latitude (**Figure 2**). The mammal assemblage includes 10 mammals, representing Xenarthra, Rodentia, and native ungulates (Litopterna and Notoungulata) (Anaya and MacFadden, 1995). Paleomagnetic analysis indicates an age of about 4–3.3 Ma. The analysis of the mammal association first suggested Montehermosan and/or Chapadmalalan ages (MacFadden et al., 1993). A later revision (Cione and Tonni, 1996) correlated Inchasi with the Chapadmalalan, although it is probably older than the classical Chapadmalalan sections on the Atlantic coast.

#### *Uquía*

The Uquía Formation crops out in the Quebrada de Humahuaca, Jujuy Province, northwestern Argentina, at an elevation of ∼2800 masl and ∼23°S latitude (**Figure 2**). The Uquía Formation is divided into three units: the Lower Unit was assigned to the late Chapadmalalan, the Middle Unit to the Marplatan (Vorhuean, Sanandresian), and the Upper Unit to the Ensenadan (Reguero et al., 2007; Reguero and Candela, 2011). 40K–40Ar data from a volcanic tuff ("Dacitic tuff") in the Lower Unit provided an age of ∼3.0 Ma. Another tuff (U1), dated to 2.5 Ma, marks the boundary between the Middle and Upper Units. The geological and paleontological evidence suggests that during the late Pliocene the area was a wide intermountain valley at about 1700–1400 masl (Reguero et al., 2007).

#### **RESULTS**

#### **MIDDLE AND LATE MIOCENE–PLIOCENE MAMMAL FAUNAS FROM SA**

In the NMDS analysis (stress value = 0.083), the analyzed South American localities are primarily grouped by age and secondarily by geographic position (**Figure 4A**). NMDS1 clearly separates middle Miocene, late Miocene and Pliocene localities, and for the middle and late Miocene assemblages, NMDS2 separates tropical from temperate localities. For the middle Miocene (Colloncuran, Laventan), the cluster analysis separates the tropical assemblages of La Venta (∼2.6°N paleolatitude) and Fitzcarrald (∼12.5°S paleolatitude) from the southern Collón Curá (∼41.3°S paleolatitude) and Quebrada Honda (∼22.3°S paleolatitude). For the late Miocene (Huayquerian–Montehermosan), Urumaco (∼10.9°N paleolatitude) appears outside the groups formed by Acre (∼10.5°S paleolatitude) and Mesopotamian (∼32.5°S paleolatitude); another cluster includes the Argentinean assemblages of Andalhuala (∼26.8°S paleolatitude), Chiquimil (∼27.0°S paleolatitude), Cerro Azul (∼37.0°S paleolatitude), and Monte Hermoso (∼38.9°S paleolatitude). Finally, the early Pliocene (Chapadmalalan–Marplatan) temperate associations from Inchasi (∼19.9°S paleolatitude) and Uquía (∼23.4°S paleolatitude) cluster together, although there are no tropical assemblages to compare with. If we compare only faunal assemblages from the same time period (middle Miocene, late Miocene and Pliocene), there is a positive relationship between the Bray-Curtis dissimilarity and the distance between each pair of assemblages studied (**Figure 4B**).

Bray-Curtis dissimilarity; triangles, middle Miocene; circles, late Miocene; squares, Pliocene. The gray lines show the clustering result. **(B)** Relationship between Bray-Curtis dissimilarity and distance in km for each locality pair. Only locality pairs within the same time interval (middle Miocene, late Miocene, Pliocene) are included; red, tropical–tropical pair; blue, temperate–temperate pair; black, tropical–temperate pair. **(C)** Density histograms of the Bray-Curtis dissimilarity values among the different faunal associations analyzed for the middle and late Miocene; red, only tropical faunas; blue, only temperate faunas; black, tropical vs. temperate faunas.

The Bray-Curtis dissimilarity values with resampling, calculated for the tropical, temperate and tropical vs. temperate assemblages for the middle and late Miocene, show that all the assemblages are very different (**Figure 4C**). For the middle Miocene, the differences between the dissimilarity within tropical (La Venta and Fitzcarrald) or temperate (Quebrada Honda and Collón Curá) assemblages and the dissimilarity between tropical vs. temperate assemblages are statistically significant. Dissimilarity values of middle Miocene tropical assemblages (mean = 0.830) are lower than middle Miocene tropical vs. temperate values (mean = 0.956) (Mann-Whitney U, *p* < 2.2e-16), whereas middle Miocene temperate dissimilarity (mean = 0.964) is higher than middle Miocene tropical vs. temperate dissimilarity (Mann-Whitney U, *p* ≤ 2.87e-15). For the late Miocene, dissimilarity of tropical assemblages (Acre and Urumaco) is lower (mean = 0.873) than tropical vs. temperate (mean = 0.969) (Mann-Whitney U, *p* < 2.2e-16). We also found a difference between the dissimilarity of temperate assemblages (Mesopotamian, Chiquimil, Andalhuala, Cerro Azul, and Monte Hermoso; mean = 0.899) and tropical vs. temperate dissimilarity (Mann-Whitney U, *p* < 2.2e-16).
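For readers unfamiliar with the test used to compare the resampled dissimilarity distributions, the following is a minimal Python sketch of a two-sided Mann-Whitney U test. It is an illustration under stated assumptions, not the routine used in the study: the *p* value comes from the large-sample normal approximation without tie or continuity correction, and the function name and the toy input values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p value from a normal approximation)."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the pairs in which the x value exceeds the y value; ties count one half.
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)  # no tie correction (assumption)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided


# Toy values only; these are not dissimilarities from the study.
u_stat, p_value = mann_whitney_u([0.81, 0.83, 0.85], [0.94, 0.96, 0.97])
```

With 1000 resampled values per group, as described in the methods, the normal approximation is generally adequate, although heavily tied data would call for a tie correction.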

The number of PBDB collections was used to generate accumulation curves for the tropical assemblages (**Figure 5**). Each collection represents a geographic and stratigraphic point where fossils have been found and provides a good proxy for sampling effort. We excluded from the analysis the Acre collection with unknown stratigraphic provenance. The accumulation curves show that generic richness for tropical assemblages is underestimated, even for the better-known assemblage of La Venta.
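The idea behind a random-method accumulation curve can be sketched as below. This is a simplified Python illustration, not the Vegan routine used in the study: collections are added in random order, the cumulative number of genera is recorded after each addition, and the counts are averaged over many permutations. The function name, the number of permutations and the placeholder collections are assumptions.

```python
import random


def accumulation_curve(collections, n_perm=100, seed=0):
    """Mean cumulative number of genera after adding 1..N collections in random order."""
    rng = random.Random(seed)
    n = len(collections)
    totals = [0] * n
    for _ in range(n_perm):
        order = list(collections)
        rng.shuffle(order)  # random order of collection addition
        seen = set()
        for i, genera in enumerate(order):
            seen |= set(genera)
            totals[i] += len(seen)
    return [t / n_perm for t in totals]


# Placeholder collections (sets of genus labels), not real PBDB data.
toy_collections = [{"g1", "g2"}, {"g2", "g3"}, {"g4"}]
curve = accumulation_curve(toy_collections)
```

A curve that is still rising at the last collection, rather than flattening, is the pattern read above as underestimated richness.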

#### **TEMPORAL AND SPATIAL DISTRIBUTION PATTERNS OF GABI**

**FIGURE 5 | Accumulation curves estimated with the random method for the tropical faunal associations.** Shaded areas represent the 95% confidence interval.

**(B)** Number of collections with records of land mammals in the Paleobiology Database (PBDB) for each million years since 12 Ma; red, collections in the tropics; blue, collections in the temperate region.

The cumulative first appearance datum (FAD) of non-native taxa for both the NA and SA continents (**Figure 6A**, Supplementary Table 2) shows that the first migrations are recorded in the temperate region (cumulative FAD mean = 2 by 10 Ma), represented by the ground sloths *Thinobadistes* (Mylodontidae) and *Pliometanastes* (Megalonychidae) recorded at McGehee Farm, Florida (Hirschfeld and Webb, 1968; Webb, 1989). During the late Miocene (12–5 Ma), the number of FADs is similar between the tropics (cumulative FAD mean = 6 by 5 Ma) and the temperate region (cumulative FAD mean = 7 by 5 Ma). In the tropics, the oldest records of migrants are those from the Acre region in Peru (Campbell et al., 2010; Prothero et al., 2014), which are of disputed age (Alberdi et al., 2004; Lucas and Alvarado, 2010; Lucas, 2013). During the Pliocene (between 3 and 4 Ma) there is an increase in the number of FADs at higher latitudes (cumulative FAD mean = 21), but this is not recorded in the tropics (cumulative FAD mean = 9). Finally, during the Pleistocene (2–1 Ma) a higher number of FADs is recorded in both tropical and temperate regions. Most of the collections in the PBDB with records of land mammals in the Americas are in the temperate region and are younger than 4 Ma (**Figure 6B**).
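One way to obtain a mean cumulative FAD through time from records with uncertain ages is sketched below. This is an assumption about how such a curve could be built, not a description of the authors' script: each record is given by a (minimal, maximal) age in Ma, an age is drawn uniformly within that range in every iteration, and for each time point the number of records at least that old is averaged over the iterations. The function name and the age ranges are hypothetical.

```python
import random


def mean_cumulative_fad(records, time_points, n_draws=1000, seed=0):
    """Mean number of FADs that had occurred by each time point (ages in Ma)."""
    rng = random.Random(seed)
    totals = {t: 0 for t in time_points}
    for _ in range(n_draws):
        ages = [rng.uniform(lo, hi) for lo, hi in records]  # one simulated age per record
        for t in time_points:
            # A record counts toward time t if its simulated age is t Ma or older.
            totals[t] += sum(1 for age in ages if age >= t)
    return {t: totals[t] / n_draws for t in time_points}


# Hypothetical (min_age, max_age) ranges, not records from the study.
toy_records = [(9.0, 10.0), (4.0, 5.0), (2.0, 3.0)]
cumulative = mean_cumulative_fad(toy_records, time_points=[8.0, 4.5, 3.5, 1.0])
```

Because the count is averaged over simulated ages, the resulting values need not be integers when a time point falls inside a record's age range.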

#### **DISCUSSION**

#### **MIDDLE AND LATE MIOCENE–PLIOCENE MAMMAL FAUNAS FROM SA**

NMDS1 shows that a strong temporal component establishes the dissimilarity relationships among the faunas. In addition, an important influence of geographic position is reflected in the distribution of the faunas along the NMDS2 axis. There is a positive relationship between the Bray-Curtis dissimilarity values and the distance between faunas (**Figures 4A,B**).

For the middle Miocene (Colloncuran–Laventan) faunal associations, a differentiation between the tropical assemblages of La Venta and Fitzcarrald and the southern Quebrada Honda and Collón Curá was observed (**Figure 4A**). The mid-latitude fauna of Quebrada Honda appears unique, although it is closer to the slightly older and temperate Collón Curá than to the contemporaneous tropical faunas of La Venta and Fitzcarrald (Croft, 2007; Tejada-Lara et al., in press). The paleoenvironment reconstructed for the middle Miocene Monkey Beds assemblage at La Venta had an estimated annual rainfall between 1500 and 2000 mm, based on diet, locomotion and body size indices of the mammal community (Kay and Madden, 1997a,b).

For the late Miocene assemblages, the NMDS indicates a high dissimilarity between the tropical faunas of Urumaco and Acre. For the Urumaco mammal assemblage, xenarthrans and rodents are the most conspicuous elements, but further studies on other clades promise to document a higher diversity than currently recognized. The temperate assemblages of Chiquimil, Andalhuala, Cerro Azul, and Monte Hermoso cluster together and the Mesopotamian is between this group and Acre (**Figure 4A**).

After taking into account the differences in sample size, we found that the dissimilarity values of tropical assemblages (mean = 0.830 for middle Miocene, and mean = 0.879 for late Miocene) and of late Miocene temperate assemblages (mean = 0.899) are lower than the values for tropical vs. temperate assemblages (mean = 0.956 for middle Miocene and mean = 0.969 for late Miocene) (**Figure 4C**). Consequently, the Bray-Curtis dissimilarity between faunas of the same age and biome is lower than between faunas of different biomes (tropical vs. temperate), although the mean dissimilarity values are high in all cases (>0.8).

As shown by the accumulation curves (**Figure 5**), the generic richness of the tropical assemblages studied is underestimated. A more comprehensive knowledge of tropical faunas is needed to better understand the paleodiversity patterns and paleobiogeography in the New World.

#### **TEMPORAL AND SPATIAL DISTRIBUTION PATTERNS OF GABI**

The cumulative FAD across time of GABI participants in each continent shows that the GABI was a gradual process that began in the late Miocene (∼10 Ma) (**Figure 6A**). The early phase of the GABI (pre-GABI *sensu* Woodburne, 2010) is characterized by a small number of migrants, with a mean cumulative FAD = 6 between 4 and 5 Ma in the tropics and a cumulative FAD = 7 in the temperate region. The land connection between the two continents occurred at the Isthmus of Panama, located within the tropical zone. Therefore, it would be expected that the Neotropics record the earliest GABI immigrants, but older immigrants have been found at higher latitudes.

The findings reported by Campbell and colleagues (Campbell et al., 2010; Frailey and Campbell, 2012; Prothero et al., 2014) in the Acre region of the Amazon basin, assigned to late Miocene (∼9 Ma) sediments, would represent the oldest NA immigrants. However, the dromomerycine artiodactyl, peccaries, tapirs, and gomphotheres have not been found in other late Miocene localities in SA, and these findings await further clarification. In SA, the most frequent pre-GABI elements are procyonids, recorded in several late Miocene–Pliocene (Huayquerian–Chapadmalalan) localities since ∼7.3 Ma (Cione et al., 2007; Reguero and Candela, 2011; Forasiepi et al., 2014). The evidence of the fossil record combined with the distribution of living species suggests that much of the evolutionary history of procyonids occurred in the Neotropics, possibly in SA (Eizirik, 2012). Molecular studies have predicted that the diversification of the group occurred in the early Miocene (∼20 Ma), with most of the major genus-level lineages originating in the Miocene (Koepfli et al., 2007; Eizirik et al., 2010; Eizirik, 2012). This scenario requires a bias in the fossil record, implies an evolutionary history for procyonids in SA that largely precedes the GABI, and suggests an arrival into SA long before previously thought, as for several other mammalian clades (Almendra and Rogers, 2012; and references therein).

Since 4 Ma, the number of FADs at higher latitudes rapidly increases and this trend continues during the Pleistocene. In contrast, the number of FADs in the tropics remains low during the Pliocene (cumulative FAD mean = 9 by 2–3 Ma), but rapidly increases during the Pleistocene. A large difference in the number of PBDB collections across time and latitude is observed for land mammals for the last 12 Ma (**Figure 6B**). Most records come from higher latitudes and are younger than 4 Ma, the time at which the number of FADs increases; this suggests that the temporal and geographic patterns of the GABI are influenced by the sampling bias toward high latitudes and the higher number of Pliocene–Pleistocene records.

The migration of northern taxa into SA after the completion of the land bridge by ∼3 Ma was correlated with the supposed expansion of savannas and grasslands in the Neotropics during glacial periods (Webb, 1991, 2006; Leigh et al., 2014). However, the expansion of savannas during glacial times has been questioned (Behling et al., 2010). If savannas did not expand, the Andes could have served as a route of migration of northern taxa toward temperate environments in SA (Webb, 1991), as NA taxa seem to have been more successful in temperate biomes whereas SA taxa dominate in the tropics (Webb, 1991, 2006; Leigh et al., 2014).

#### **CONCLUSIONS**

The dissimilarity analysis primarily grouped the faunal assemblages by age and secondarily by geographic distribution. The dissimilarity values among the fossil faunal assemblages analyzed support the differentiation between tropical and temperate assemblages in SA during the middle Miocene (Colloncuran–Laventan) and late Miocene (Huayquerian–Montehermosan). The mid-latitude, middle Miocene assemblage of Quebrada Honda has higher affinities with the slightly older and temperate Collón Curá than with the tropical assemblages of La Venta and Fitzcarrald. For the late Miocene, the temperate assemblages of Chiquimil, Andalhuala, Cerro Azul, and Monte Hermoso cluster together, while the Mesopotamian is between this group and the tropical assemblages of Acre and Urumaco.

The cumulative FAD across time and latitude shows that faunistic movements related to the GABI began during the late Miocene (∼10 Ma), with the oldest records found at higher latitudes. The number of FADs remained relatively low until 4–5 Ma, when it started to increase, peaking during the Pleistocene.

The study of paleodiversity patterns and paleobiogeography in the Americas is challenged by the sampling bias toward higher latitudes and the still scarce data from tropical faunas. The interpretation of the temporal and geographic patterns of GABI is likely influenced by these sampling issues.

#### **AUTHOR CONTRIBUTIONS**

Conceived and designed: Juan D. Carrillo, Analía Forasiepi, Carlos Jaramillo, Marcelo R. Sánchez-Villagra. Compiled bibliographic data: Juan D. Carrillo, Analía Forasiepi, Carlos Jaramillo. Analyzed data: Juan D. Carrillo, Carlos Jaramillo. Wrote the paper: Juan D. Carrillo, Analía Forasiepi. All authors contributed to the final interpretation and editing of the manuscript.

#### **ACKNOWLEDGMENTS**

We are grateful to V. Rull and the topic editors T. Pennington and J. E. Richardson for the invitation to contribute to this volume, and to two anonymous reviewers for their valuable comments. We thank A. A. Carlini, M. Bond, F. Prevosti, and the group of Evolutionary Morphology and Paleobiology of Vertebrates (Zurich), in particular G. Aguirre-Fernández and M. Stange, for valuable comments. A. Cardenas and J. Alroy contributed to the data available at the PBDB. Thanks go to the Smithsonian Institution, The Anders Foundation, Gregory D. and Jennifer Walston Johnson, NSF OISE-EAR-DRL 0966884 and NSF EAR 0957679, and to A. A. Carlini and J. D. Carrillo-Briceño for their support during our work in Urumaco. We thank the authorities at the Instituto del Patrimonio Cultural of the República Bolivariana de Venezuela and the Alcaldía Municipio de Urumaco for their generous support. Juan Carrillo was supported by Swiss National Fund SNF 31003A-149605 to M. R. Sánchez-Villagra.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00451/abstract



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 August 2014; accepted: 10 December 2014; published online: 05 January 2015.*

*Citation: Carrillo JD, Forasiepi A, Jaramillo C and Sánchez-Villagra MR (2015) Neotropical mammal diversity and the Great American Biotic Interchange: spatial and temporal variation in South America's fossil record. Front. Genet. 5:451. doi: 10.3389/fgene.2014.00451*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Carrillo, Forasiepi, Jaramillo and Sánchez-Villagra. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The age of chocolate: a diversification history of Theobroma and Malvaceae

James E. Richardson<sup>1,2</sup>\*, Barbara A. Whitlock<sup>3</sup>, Alan W. Meerow<sup>4</sup> and Santiago Madriñán<sup>5</sup>

<sup>1</sup> Programa de Biología, Universidad del Rosario, Bogotá, Colombia, <sup>2</sup> Tropical Diversity Section, Royal Botanic Garden Edinburgh, Edinburgh, UK, <sup>3</sup> Department of Biology, University of Miami, Coral Gables, FL, USA, <sup>4</sup> United States Department of Agriculture—ARS—SHRS, National Clonal Germplasm Repository, Miami, FL, USA, <sup>5</sup> Laboratorio de Botánica y Sistemática, Departamento de Ciencias Biológicas, Universidad de los Andes, Bogotá, Colombia

Dated molecular phylogenies of broadly distributed lineages can help to compare patterns of diversification in different parts of the world. An explanation for greater Neotropical diversity compared to other parts of the tropics is that it was an accident of the Andean orogeny. Using dated phylogenies of chloroplast ndhF and nuclear DNA WRKY sequence datasets, generated using BEAST, we demonstrate that the diversification of the genera Theobroma and Herrania occurred from 12.7 (11.6–14.9 [95% HPD]) million years ago (Ma) and thus coincided with Andean uplift from the mid-Miocene, and that this lineage had a faster diversification rate than other major clades in Malvaceae. We also demonstrate that Theobroma cacao, the source of chocolate, diverged from its most recent common ancestor 9.9 (7.7–12.9 [95% HPD]) Ma, in the mid- to late Miocene, suggesting that this economically important species has had ample time to generate significant within-species genetic diversity, which is useful information for a developing chocolate industry. In addition, we address questions related to the latitudinal gradient in species diversity within Malvaceae. A faster diversification rate is one explanation for the greater species diversity at lower latitudes. Alternatively, tropical conditions may have existed for longer and occupied greater areas than temperate ones, meaning that tropical lineages have had more time and space in which to diversify. Our dated molecular phylogeny of Malvaceae demonstrated that at least one temperate lineage within the family diverged from tropical ancestors and then diversified at a rate comparable with many tropical lineages in the family. These results are consistent with the hypothesis that Malvaceae are more species rich in the tropics because tropical lineages within the family have existed for longer and occupied more space than temperate ones, and not because of differences in diversification rate.

Keywords: Andes, chocolate, latitudinal gradient, Malvaceae, phylogenetic niche conservatism, Theobroma

#### Edited by:

Federico Luebert, Universität Bonn, Germany

#### Reviewed by:

Alexandre Antonelli, University of Gothenburg, Sweden

Nyree Zerega, Northwestern University, USA

\*Correspondence: James E. Richardson jamese.richardson@urosario.edu.co

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Ecology and Evolution

> Received: 20 February 2015 Accepted: 08 October 2015 Published: 10 November 2015

#### Citation:

Richardson JE, Whitlock BA, Meerow AW and Madriñán S (2015) The age of chocolate: a diversification history of Theobroma and Malvaceae. Front. Ecol. Evol. 3:120. doi: 10.3389/fevo.2015.00120

## INTRODUCTION

In his revision of Theobroma, Cuatrecasas (1964) recognized 22 species of understory trees all found in Neotropical lowland rainforests from the Amazon basin to Southern Mexico. Previous phylogenetic analyses (Whitlock and Baum, 1999; Silva and Figueira, 2005) have indicated that Theobroma is sister to Herrania Goudot, a genus of about 20 species monographed by Schultes (1958), and they are both representatives of the tribe Theobromeae (Whitlock et al., 2001) along with two other genera, Glossostemon Desf., with one species from Arabia, and the Neotropical Guazuma Mill., with 2–5 species. **Figures 1A–C** indicate the distributions of species of Theobroma and Herrania with geo-referenced data taken from the Global Biodiversity Information Facility (GBIF). The distribution of Theobroma cacao in the Neotropics is plotted separately, together with another widespread species, and this may contain some cultivated individuals. These figures show that Northwestern South America, specifically Colombia, is the most species-rich region for Theobroma, Herrania, and in fact all Theobromeae based on herbarium collections (26 species of Theobromeae can be found in Colombia). Theobroma is of great interest from a biogeographic viewpoint as it is distributed in an area that has been subject to much relatively recent geological activity. It is also of interest because it includes the economically important species Theobroma cacao, the source of chocolate, predicted to be a \$100 billion industry by 2016 with worldwide demand increasing by 2.5% a year, largely driven by newly emerging markets (Markets and Markets, 2011). However, current cultivation practices face major challenges associated with the advanced age of plantations, the lack of variety of cultivated material, the low density of trees per hectare, low production, a poor comprehensive crop management strategy, and fungal and viral diseases (Schnell et al., 2007; Motamayor et al., 2008). There is also a need to ensure the long-term sustainability of this industry by protecting it from the risks posed by climate change. Information on the origin and evolutionary history of Theobroma cacao and its relatives will assist with planning crop improvement strategies.

The uplift of the Andes and the bridging of the Panamanian Isthmus are two geological events that have been suggested to have had a profound impact on patterns of Neotropical plant diversification (Gentry, 1982; Burnham and Graham, 1999; Richardson et al., 2001; Knapp and Mallet, 2003; Antonelli et al., 2009; Roncal et al., 2013; Meerow et al., 2015), and may, in part, explain the greater diversity of this region in comparison with the palaeotropics. Hoorn et al. (2010) provided maps at various stages in the development of these geological systems. These events may have produced barriers to the dispersal of lineages restricted to lowland tropical forests and changed the substrate composition and fluvial systems in lowland areas (Roncal et al., 2013), facilitating diversification. The Andean Cordillera extends for 5000 km along the western coast of South America (Gregory-Wodzicki, 2000). The timing of uplift of the Andes varied from north to south and from east to west (Gregory-Wodzicki, 2000; Mora et al., 2010). The Altiplano-Puna of the Central Andes reached no more than a third of its modern elevation of 3700 m by 20 Ma and no more than half its modern elevation by 10 Ma (Gregory-Wodzicki, 2000). From the middle Miocene through to the early Pliocene, elevations in the northern Eastern Cordillera of the Andes were no more than 40% of their modern values, but between 5 and 2 Ma uplift occurred at a more rapid rate, reaching modern elevations by around 2.7 Ma (Gregory-Wodzicki, 2000). In Colombia the Andes divide into the Eastern, Central, and Western Cordilleras. The Western and Central Cordilleras do not reach the northern coast of South America and therefore may not constitute barriers to dispersal for lowland-restricted organisms. The formation of the Eastern Cordillera could therefore have been crucial in erecting a montane barrier to dispersal for lowland-restricted plants. The time at which that barrier became effective in restricting migration will depend on the adaptive or dispersal capacity of individual lineages. Western Amazonia also experienced a period of submergence from the Early Miocene that resulted in the formation of an extensive wetland called the Pebas System that existed from 17 to 11 Ma (Wesselingh et al., 2002; Wesselingh, 2006; Wesselingh and Salo, 2006; Wesselingh and Ramos, 2010). This may also have acted as a barrier to dispersal of lineages restricted to lowland wet forest during the period of its existence. Other potential barriers may have been the Llanos grassland ecosystem that spreads from the foothills of the Andes to the coast of Eastern Venezuela, or areas of dry forest adjacent to the Andes Mountains in Colombia, e.g., to the north of Los Llanos in Arauca and Casanare or in the Inter-Andean valleys of the Magdalena and Cauca Rivers.

In addition to directly causing diversification by splitting lowland populations as mountains rose, diversification may also have resulted indirectly from changes to lowland sediments and river systems that flank the mountains. The joining of Gondwanan and Laurasian landmasses through the formation of the Isthmus of Panama was also thought to be a key event for Neotropical biotic evolution because it allowed the interchange of terrestrial species between North and South America (Simpson, 1980). According to Coates and Obando (1996) the formation of the Isthmus of Panama did not occur in one single event, but was reportedly completed in the Middle Pliocene at around 3.4–3.1 Ma. However, recent studies indicate that the land bridge may actually have begun to form from the early Miocene (Farris et al., 2011; Montes et al., 2015). The migration history of plants and animals across the Isthmus of Panama region has been reviewed by Cody et al. (2010) who concluded that plants had a greater capacity for traversing between North and South America prior to the formation of the land bridge and more recently by Bacon et al. (2015) who re-assessed biological migrations in the light of an older isthmus closure. The role of the rise of the Andes separating the Chocó and Mesoamerican regions of the Neotropics from the Amazonian and eastern regions of South America in promoting diversification has been demonstrated in various groups of organisms including birds (Gonzalez et al., 2003; Brumfield and Edwards, 2007), primates (Cortés-Ortiz et al., 2003), insects (Arrivillaga et al., 2002), rodents (Patterson and Velazco, 2008), mammals (Patterson et al., 2012), and fish (Albert et al., 2006) but few studies have focused on lowland plants (e.g., Pirie et al., 2006; Winterton et al., 2014). 
The distributions of both Theobroma and Herrania make them an excellent model group to study the effects of montane uplift, the closure of the Isthmus of Panama and other geological events in the region on diversification patterns in the Neotropics.

In order to fully understand the diversification of Theobroma and its allies, it is necessary to place it into spatial and temporal context within the family to which it belongs. The circumscription of Malvales has changed markedly in recent years in the light of molecular phylogenetic studies (e.g., Alverson et al., 1999). Previously recognized families have now been sunk into a broader Malvaceae. One of these, Sterculiaceae, the former home of Theobroma, is polyphyletic. Theobroma is now placed in the tribe Theobromeae within subfamily Byttnerioideae Burnett (Whitlock et al., 2001), one of nine subfamilies currently recognized within Malvaceae. Byttnerioideae includes 27 genera and 650 species (Stevens, 2001) and also includes the tribes Byttnerieae, Hermannieae, and Lasiopetaleae. Most of the nine subfamilies of Malvaceae have a predominantly tropical distribution although some have strong representation at higher latitudes. The genus Tilia in Tilioideae is restricted to temperate areas and Malvoideae are well-represented in both tropical and temperate zones.

FIGURE 1 | …descriptions of species that are not found on GBIF. Theobroma cacao may contain some cultivated individuals.

The comparative evolution of temperate and tropical lineages is of great interest as it may allow us to answer questions related to the latitudinal gradient in species diversity (described in e.g., Hillebrand, 2004; Jablonski et al., 2006; Brown, 2014), that is, the greater species richness at lower latitudes. Temperate lineages are those found at high latitudes (or altitudes), of which there are few in Malvaceae, and do not include those found in mid-latitudinal deserts, Mediterranean or warm regions. One explanation for this latitudinal gradient is that tropical lineages have been around for longer (Stebbins, 1974) and have occupied more space. Throughout much of the history of angiosperms global temperatures have been much warmer than modern ones. A decline in temperature was experienced throughout the course of the Tertiary, creating temperate conditions at higher latitudes, and biome areas would have changed in response to those changes. The fossil record has copious evidence of tropical elements at higher latitudes (e.g., London Clay Flora, Reid and Chandler, 1933) during the warmer periods of the Tertiary. As outlined by Fine and Ree (2006), tropical lineages thus occupied greater areas for longer periods of time than temperate ones, and tropical groups therefore had more time and space within which to diversify. An alternative hypothesis to explain the latitudinal gradient was outlined by Mittelbach et al. (2007), who suggested that greater diversity in the tropics is due to faster diversification rates (see also, Rolland et al., 2014). Lineages that have both tropical and temperate clades may be used to compare their age and diversification rates, allowing us to determine whether either of these two hypotheses is correct. The tropical/temperate distribution of Malvaceae also permits addressing questions related to phylogenetic niche conservatism (Kerkhoff et al., 2014). Were temperate lineages derived from tropical ones? If so, how often and when did those lineages arise, and did their evolution coincide with climatic changes such as Tertiary cooling?

The primary aim of the present study is to use a dated molecular phylogeny to determine the effects of the Andean uplift and the formation of the Isthmus of Panama on the temporal and spatial diversification of Theobroma and Herrania. Inability to disperse across water would result in Central/South American disjunctions being dated to after the closure of the Isthmus of Panama. Similarly, if they could not disperse over mountains with an altitude of 2000 m then west/east Andean disjunctions would be dated to c. 5 Ma. Diversification may also have increased in lowland areas during periods of Andean uplift that altered the landscapes of the Amazon Basin and Chocó. We also aimed to determine the age of Theobroma cacao and discuss the implications for the chocolate industry. Additionally we aimed to assess the diversification history of Malvaceae throughout its range. If the latitudinal gradient is to be explained by faster diversification rates in the tropics we would expect to see higher rates in tropical lineages compared with temperate ones. Alternatively, temperate lineages may have been around for less time and occupied less space than tropical ones in which case we might expect to see temperate lineages nested within tropical ones and for both to have similar diversification rates. Few temperate lineages nested within tropical ones would be consistent with phylogenetic niche conservatism in terms of cold tolerance traits.

## METHODS

## Map Generation

Maps were generated that included all accessions recorded in GBIF for the species within their native range as taken from monographic treatments (we excluded accessions georeferenced outside their native range). Additionally, all specimens that mentioned "cultivated" in the specimen description were eliminated. The map may include cultivated specimens within their native range that could not be identified through the information provided in GBIF, but these should represent only a very small percentage of the total.
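The two filtering rules described above can be sketched in a few lines. This is an illustrative sketch only, not the procedure actually used: the record fields (`country`, `description`), the use of country names as a stand-in for the native range, and the three toy records are all hypothetical.

```python
def filter_records(records, native_countries):
    """Keep records that fall inside the native range and whose
    description does not mention cultivation (case-insensitive)."""
    kept = []
    for rec in records:
        in_native_range = rec["country"] in native_countries
        cultivated = "cultivated" in rec.get("description", "").lower()
        if in_native_range and not cultivated:
            kept.append(rec)
    return kept


# Hypothetical toy records, invented purely for illustration.
records = [
    {"species": "Theobroma cacao", "country": "Colombia", "description": "forest understory"},
    {"species": "Theobroma cacao", "country": "Ghana", "description": "plantation"},
    {"species": "Theobroma cacao", "country": "Ecuador", "description": "Cultivated near house"},
]
# Assumed stand-in for a native range taken from a monographic treatment.
native = {"Colombia", "Ecuador", "Peru", "Brazil"}

kept = filter_records(records, native)
print([rec["country"] for rec in kept])
```

A text match on the description can only remove specimens that are explicitly labeled, which is why some cultivated individuals may remain on the maps.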

## Sampling

We utilized two datasets in this study. We downloaded 157 plastid ndhF sequences from GenBank that were derived from publications by Alverson et al. (1999), Whitlock et al. (2001), Nyffeler et al. (2005), and Wilkie et al. (2006) and aligned them automatically using ClustalW in BioEdit (Hall, 1999) and then manually by eye using Mesquite (Maddison and Maddison, 2015). Of these, 137 were of Malvaceae, representing each of the currently recognized subfamilies, and 20 were outgroups from other families in Malvales and Brassicales. We also used a previously published matrix of 23 WRKY sequences of five different orthologs from the tribe Theobromeae (Borrone et al., 2007) that included 15 individuals (**Table 1**) representing 11 species of Theobroma, seven species of Herrania, and an outgroup, Guazuma ulmifolia, that is sister to these genera in the ndhF analysis.

## Phylogenetic Analysis and Molecular Dating

The ndhF dataset was analyzed using BEAST (Drummond and Rambaut, 2007). A fossil-based calibration point was used along with a secondary calibration point that was used to constrain the age of the stem node of Malvales. Fossil leaves of Malvoideae from the middle-late Paleocene Cerrejón Formation in Colombia (58–60 mya) were described as a new species, Malvaciphyllum macondicus (Carvalho et al., 2011), which can be assigned to the clade Eumalvoideae because of distal and proximal bifurcations of the costal secondary and agrophic veins, a synapomorphy for this clade. Eumalvoid leaves and bombacoid pollen found in formations of the mid to late Paleocene of Colombia indicate that representatives of Malvoideae and Bombacoideae were present in Neotropical forests at that time. The Malvaciphyllum macondicus fossil was used to constrain the age of the node representing the stem of Eumalvoideae (indicated in **Figure 2**) using a lognormal distribution, as recommended for fossil calibrations by Ho (2007) and Ho and Phillips (2009), with an offset of 60 mya (based on the older age estimate for the fossil according to Carvalho et al., 2011) and a mean of 1. This approach biases in favor of an older age estimate for this node. For the secondary calibration a normal prior distribution, as recommended for secondary calibrations by Ho (2007) and Ho and Phillips (2009), with a mean of 91.85 mya and standard deviation of 0.1 mya was assigned to the stem node of Malvales. This was based on the age with 95% confidence interval of this node derived from a dated phylogeny of all angiosperms that utilized numerous fossils (Magallón and Castillo, 2009).

TABLE 1 | Voucher specimens or USDA-ARS MIA DNA sample numbers, and GenBank accession numbers of the WRKY sequences used in the phylogenetic analyses.

Abbreviations: CATIE, Centro Agronómico Tropical de Investigación y Enseñanza, Costa Rica; CEPLAC, CEPLAC/CEPLAC, Itanubo, Brazil. Only the last three digits are shown. All accession numbers begin with EF640. For example, the Guazuma ulmifolia WRKY03 GenBank Accession No. is EF640168.

For the ndhF analysis an XML (eXtensible Mark-up Language) input file was generated in the Bayesian Evolutionary Analysis Utility software (BEAUti) v.1.6.2 (part of the BEAST package). The best performing evolutionary model was identified under two different model selection criteria, the hierarchical likelihood ratio test (hLRT) and the Akaike information criterion (Akaike, 1974), as implemented in MrModelTest (Nylander, 2004). Both selection criteria indicated that a General Time Reversible (GTR) model with gamma-distributed site heterogeneity and invariant sites was optimal. An uncorrelated lognormal relaxed clock was chosen based on the assumption of the absence of a strict molecular clock. To specify informative priors for all the parameters in the model, the Yule tree prior was used since it is recommended as being appropriate for species-level phylogenies (Ho and Phillips, 2009). The XML file was run in BEAST v.1.4.8 (Drummond and Rambaut, 2007). Five runs were performed with the MCMC chain length set to 10,000,000, logging to screen every 10,000 generations and sampling trees every 10,000 generations. The resulting log file was imported into Tracer (Rambaut and Drummond, 2007) to check whether effective sample size (ESS) values were adequate for each parameter. LogCombiner and TreeAnnotator v.1.6.2 (also part of the BEAST package) were used to remove burn-ins, combine tree files, and produce the maximum clade credibility (MCC) tree, which has the maximum sum of posterior probabilities on its internal nodes and summarizes the node height statistics in the posterior sample. MCC files were visualized using FigTree v.1.3.1 (Rambaut, 2009).

The BEAST package was also used to analyze the WRKY dataset employing a secondary calibration using a normal distribution based on the age of the split between Theobroma and Herrania derived from the ndhF analysis [11.56 (4.0–20.9 [95% HPD]) Ma] with a standard deviation of 4.5 that was chosen so that 95% of the distribution fell within the 95% confidence intervals of the age based on the ndhF analysis. Conditions of the analysis were identical to those for the ndhF analysis except that only three runs of 10 million generations were run as this was sufficient to achieve adequate ESS values for this dataset.
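The relationship between the chosen standard deviation and the ndhF interval can be checked numerically. The sketch below is not part of the original analysis; it simply computes the central 95% interval of a normal distribution with the stated mean (11.56 Ma) and standard deviation (4.5) using Python's standard library.

```python
from statistics import NormalDist

# Secondary calibration prior as stated in the text:
# mean 11.56 Ma, standard deviation 4.5.
prior = NormalDist(mu=11.56, sigma=4.5)

# Central 95% interval: the 2.5% and 97.5% quantiles.
lower = prior.inv_cdf(0.025)
upper = prior.inv_cdf(0.975)

print(f"Central 95% of the prior: {lower:.2f} to {upper:.2f} Ma")
```

The resulting bounds can be set beside the ndhF 95% HPD (4.0–20.9 Ma) to see how closely the prior tracks it.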

We calculated diversification rates using the simple estimator of Kendall (1949) and Moran (1951), where SRln = [ln(N) − ln(N<sub>0</sub>)]/T (where N = standing diversity, N<sub>0</sub> = initial diversity, here taken as 1, and T = inferred clade age). We applied this estimator, following Magallón and Sanderson (2001), both in the absence of extinction (ε = 0.0) and under a high relative extinction rate (ε = 0.9). Standing diversity was based on information from the Angiosperm Phylogeny Website (Stevens, 2001; http://www.mobot.org/MOBOT/research/APweb/) and on our current understanding of the phylogeny and numbers of species in each clade, although we acknowledge that these numbers may change in the light of new sequence data and taxonomic studies.
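As a worked illustration of the estimator, the sketch below computes rates for the Theobroma/Herrania clade. Two points are assumptions rather than statements from the text: the species count of 42 (22 Theobroma plus about 20 Herrania, as given in the Introduction) and the form used when ε > 0, which is the stem-age estimator of Magallón and Sanderson (2001), r = ln[N(1 − ε) + ε]/T, as we understand it. With ε = 0 it reduces to ln(N)/T, i.e., the Kendall-Moran estimator with N<sub>0</sub> = 1.

```python
import math


def diversification_rate(n_species, clade_age, epsilon=0.0):
    """Net diversification rate r = ln[N(1 - eps) + eps] / T.

    With eps = 0 this is ln(N)/T, the simple estimator with N0 = 1.
    The eps > 0 form is assumed to be the stem-age estimator of
    Magallon and Sanderson (2001).
    """
    if n_species < 1 or clade_age <= 0 or not 0.0 <= epsilon < 1.0:
        raise ValueError("need N >= 1, T > 0 and 0 <= epsilon < 1")
    return math.log(n_species * (1.0 - epsilon) + epsilon) / clade_age


N_SPECIES = 42  # assumption: 22 Theobroma + about 20 Herrania
AGE_MA = 11.6   # ndhF age of the Theobroma/Herrania split, in Ma

r_no_extinction = diversification_rate(N_SPECIES, AGE_MA, epsilon=0.0)
r_high_extinction = diversification_rate(N_SPECIES, AGE_MA, epsilon=0.9)

print(f"epsilon = 0.0: {r_no_extinction:.2f} per Myr")
print(f"epsilon = 0.9: {r_high_extinction:.2f} per Myr")
```

Under these assumptions the two rates round to 0.32 and 0.14, which agree with the values reported for Theobroma/Herrania in the Discussion. Replacing `AGE_MA` with the bounds of the HPD interval gives the corresponding range of rates.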

## RESULTS

## Phylogenetic Analysis and Molecular Dating

**ndhF**–The dataset consisted of 2219 characters and 157 taxa representing 100 of the over 200 genera of the family. The ESS values for all parameters exceeded 200. The maximum clade credibility (MCC) tree from the BEAST analysis is indicated in **Figure 2** with outgroups excluded. Dates with 95% confidence intervals and posterior probabilities for all nodes are indicated in **Supplementary Figures 1**, **2** respectively, along with GenBank accession numbers. Malvaceae were strongly supported as monophyletic [posterior probability (pp) = 0.96] with stem and crown node ages of 78.2 (70.1–87.2 [95% HPD]) million years old (Ma) and 70.7 (63.4–78.6 [95% HPD]) Ma, respectively. Each of the subfamilies indicated in **Figure 2** was monophyletic and received pp-values >0.95, except Malvoideae (pp = 0.8) and Bombacoideae (pp = 0.45). The tribe to which Theobroma belongs, Theobromeae, had pp = 0.99 with stem and crown node ages of 53.4 (36.7–70.2 [95% HPD]) and 26.6 (9.6–46.0 [95% HPD]) Ma, respectively. Theobroma was monophyletic but with poor support, and relationships within the genus were also poorly supported. The stem node of the genus was 11.6 (4.0–20.9 [95% HPD]) Ma and the crown node was 8.4 (2.4–15.0 [95% HPD]) Ma. The two individuals of Theobroma cacao formed a monophyletic group that had a pp of 1.0 and stem and crown node ages of 6.5 (1.0–12.5 [95% HPD]) and 1.2 (0.01–3.5 [95% HPD]) Ma, respectively. Number of species, crown ages with confidence intervals, and diversification rates of subfamilies and other selected clades of Malvaceae are indicated in **Table 2**.

**WRKY**–The dataset consisted of 23 taxa and 3987 characters. The ESS values for all parameters exceeded 200. The MCC tree from the BEAST analysis is shown in **Figure 3** and this has a topology identical to that of Borrone et al. (2007). Confidence intervals on age estimates and posterior probabilities are shown in **Supplementary Figures 3**, **4**, respectively. The genera Theobroma and Herrania were both strongly supported as monophyletic with pp-values of 1. Theobroma had stem and crown node ages of 12.7 (11.6–14.9 [95% HPD]) and 11.0 (8.6–14.3 [95% HPD]) Ma, respectively. Relationships within Theobroma were not strongly supported, but the three individuals of Theobroma cacao formed a strongly supported group, with stem and crown node ages for the species of 9.9 (7.7–12.9 [95% HPD]) and 0.5 (0.1–0.95 [95% HPD]) Ma, respectively. There are multiple possible trans-Andean splits in the phylogeny indicated in **Figure 3** at nodes X [3.1 (1.7–4.9 [95% HPD]) Ma], with an eastern lineage splitting from a predominantly western species (T. gileri), Y [8.3 (5.9–11.1 [95% HPD]) Ma], and Z [3.9 (2.4–5.9 [95% HPD]) Ma]. Some species have distributions on either side of the Andes. Trans-isthmian splits include that between the T. mammosum and T. angustifolium clade and its sister that occurred 0.7 (0.2–1.3 [95% HPD]) Ma.
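One crude way to relate these node ages to the geological timeline is to ask whether each 95% HPD interval overlaps a given time window. The sketch below is an illustration only and is not a method used in this study. The window of 5–2 Ma is the period of more rapid uplift of the northern Eastern Cordillera cited in the Introduction (Gregory-Wodzicki, 2000), and interval overlap is a simple consistency check rather than a statistical test.

```python
def overlaps(interval, window):
    """Return True if two closed age intervals (in Ma) share any time."""
    lo, hi = sorted(interval)
    w_lo, w_hi = sorted(window)
    return lo <= w_hi and w_lo <= hi


# 95% HPD intervals (Ma) for the candidate trans-Andean splits in Figure 3.
NODE_HPD = {
    "X": (1.7, 4.9),
    "Y": (5.9, 11.1),
    "Z": (2.4, 5.9),
}

# Assumed comparison window: rapid uplift of the northern Eastern
# Cordillera between 5 and 2 Ma, as summarized in the Introduction.
RAPID_UPLIFT = (2.0, 5.0)

results = {node: overlaps(hpd, RAPID_UPLIFT) for node, hpd in NODE_HPD.items()}
for node, hit in results.items():
    print(f"Node {node}: HPD overlaps 5-2 Ma window = {hit}")
```

The same function can be reused with other windows mentioned in the text, such as the Pebas System (17–11 Ma) or the proposed dates for closure of the Isthmus of Panama.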

## DISCUSSION

## Theobromeae Biogeography

Theobroma began to diversify 11.0 (8.6–14.3 [95% HPD]) Ma, coincident with the uplift of the Northwestern Andes and resultant changes to lowland areas. **Figure 1** indicates the distributions of each of the species of Theobroma and Herrania included in the analysis. Twenty-six species of Theobromeae can be found in Colombia in the lowlands and foothills surrounding the Andes, including species endemic to the Pacific coastal Chocó such as T. chocoense or T. gileri. These distributions and timings are consistent with diversification of the genus being affected by Andean uplift. This could be due to phylogenetic niche conservatism in the tribe, with populations not being able to survive the cooler temperatures at higher altitudes, resulting in allopatric speciation and phylogenetic splits as the mountains rose. The three trans-Andean splits in the phylogeny indicated in **Figure 3** at nodes X, Y, and Z occurred at 3.1 (1.7–4.9 [95% HPD]) Ma, 8.3 (5.9–11.1 [95% HPD]) Ma, and 3.9 (2.4–5.9 [95% HPD]) Ma, respectively. Some species have distributions on either side of the Andes. It could be that splits between groups within these species that are found on either side of the Andes have similarly old ages, e.g., T. bicolor or T. cacao, and those splits may have been caused by Andean uplift, but more samples from these species must be included to test this. The Pebas System that existed from 17 to 11 Ma does not appear to have caused any vicariance events in Theobroma or Herrania because it predates diversification within each of these genera. Possible trans-isthmian splits include that between the T. mammosum and T. angustifolium clade and its sister that occurred 0.7 (0.2–1.3 [95% HPD]) Ma. This is well after the formation of the Isthmus and therefore could have resulted from overland migration rather than long distance trans-oceanic dispersal.



Some trans-Andean species evolved after the Andes had reached a significant height, therefore ancestral species must have been able to disperse over high mountain passes, e.g., T. gileri or H. cuatrecasasana (**Figures 1**, **3**). The mode of dispersal is unknown for Theobroma and Herrania, as it is for many Neotropical trees with large indehiscent fruit. Direct dispersal by vertebrates is a possibility (Cuatrecasas, 1964), including by extinct megafauna (Janzen and Martin, 1982) or early human populations. Dispersal by water is also a possibility and pods have been observed floating in rivers (Whitlock, pers. obs.), although this is not a plausible process to account for dispersal over high passes. There are areas of lower elevation along the Eastern Cordillera that may have allowed migration of otherwise lowland-restricted lineages. However, these areas are flanked by the desert of Tatacoa and the dry forests mentioned in the Introduction that lie to the west of the Eastern Cordillera, and these could have acted as barriers for dispersal of wet forest restricted taxa. In fact, the age of splits could also be used to estimate ages for the development of these dry biomes, which could have arisen as a result of rain shadow effects resulting from montane uplift. Splits between wet forest restricted lineages of Manilkara (Sapotaceae) on either side of dry cerrado and caatinga vegetation in Brazil more or less coincided with diversification of cerrado-restricted lineages (Armstrong et al., 2014). Timing of splits within other lineages that share similar distributions needs to be used along with geological and paleontological data to reconstruct biotic and abiotic history.

We demonstrate that diversification in Theobromeae coincided with major periods of uplift of the Andean mountains and that diversity in the tribe is greatest in areas flanking the Andes. Diversification could therefore have been a direct result of allopatric speciation resulting from the rise of the mountains, as mentioned above, and/or as a result of changes in substrates (as shown for example by Savolainen et al., 2006) and fluvial patterns that occurred in lowland areas around them or other modes of speciation as reviewed by Haffer (1997). Although we have only focused on diversification rates of selected clades (**Table 2**), within which the sampling is often low and for which there is overlap in confidence intervals for age estimates, the mean rate we report for the diversification of the Theobroma/Herrania clade is much greater than any of the others in Malvaceae that we highlight. This is consistent with montane uplift resulting in elevated diversification rates in comparison with those not found in montane regions.

## The Age of Chocolate

According to the WRKY BEAST analysis T. cacao diverged from its most recent common ancestor 9.9 (7.7–12.9 [95% HPD]) Ma. We prefer to accept this date rather than that from the ndhF analysis because the WRKY data sampled more species and gave a better resolved tree. The possibility of retrieving a younger age for the species when all taxa in the genus are added cannot of course be ignored, but the present data indicate that T. cacao diverged early from the remaining lineages within the genus. The phylogenetically isolated position of T. cacao in Theobroma was also recovered by Whitlock and Baum (1999) and is further supported by its placement in its own monotypic section by Cuatrecasas (1964). Its early divergence time indicates that it may have had ample opportunity to achieve a broad natural distribution with high levels of genetic diversity, although human effects on its distribution and diversity cannot be discounted. The timing and extent of diversification within the species will require greater sampling of more individuals throughout its geographic range. Studies using microsatellite data of T. cacao have indicated substantial genetic diversity within wild and cultivated representatives of the species (e.g., Motamayor et al., 2008; Thomas et al., 2012; Motilal et al., 2013). The timing of diversification and extent of variability has implications for the chocolate industry as basing plantations on only a percentage of this genetic diversity means that it may be at unnecessary risk from disease and other threats such as climate change (Motamayor et al., 2008). Under-utilized wild varieties may be brought into cultivation to introduce greater genetic diversity that might protect against these risks and also introduce a wider range of flavors to the industry.

The low support and the lack of complete species-level sampling in the WRKY phylogeny mean that we still cannot be sure what the closest ancestor of T. cacao is (see also Whitlock and Baum, 1999). Sampling of more individuals and more genes will be necessary to determine its closest relatives and when the species itself began to diversify. The current sample of three cultivated individuals needs to be expanded to use a phylogeographic approach to determine more precisely where and when the species originated and diversified and to complement the many studies on the population genetics of the species (e.g., Motamayor et al., 2008).

## Malvaceae and the Latitudinal Gradient in Species Diversity

Most species of Malvaceae are tropical and the family thus conforms to a latitudinal gradient in species diversity. Temperate lineages of Malvaceae (or at least those groups that contain species with a temperate distribution) are nested within tropical ones, a pattern also demonstrated by Judd et al. (1994) and Baum et al. (2004), and have stem nodes dated to the mid-Eocene (Tilioideae) or from the late Oligocene (**Figure 2**, e.g., Malva, Hibiscus). This is consistent with them having evolved from tropical progenitors as temperate climates spread as a result of climatic cooling. The fact that groups from temperate regions are found in few lineages indicates a high degree of phylogenetic niche conservatism with respect to the ability to adapt to cooler temperatures. Interestingly, diversification rates based on crown node ages of the temperate lineage Tilioideae [50 species, crown node age of 17.1 (2.2–33.2 [95% HPD]) Ma and diversification rate of 0.13 [0.07–0.57] with zero extinction and 0.1 [0.05–0.81] with a high relative extinction rate] compare favorably with many subfamilies that are restricted to tropical regions (**Table 2**), e.g., Brownlowioideae [68 species, crown node age of 20.5 (5.0–37.0 [95% HPD]) Ma with a diversification rate of 0.12 [0.07–0.32] with a zero extinction rate and 0.1 [0.06–0.41] with a high relative extinction rate]. The fact that this temperate lineage has a similar diversification rate to many tropical ones (**Table 2**) is consistent with the age and area of occupancy of a lineage having been a more important factor in determining the latitudinal diversity gradient than differences in diversification rates between temperate and tropical regions. This comparison is of course limited and better sampling of Malvaceae will permit comparison with other temperate lineages in the family. 
Studies on other groups of organisms have yielded contrasting results, with higher diversification rates in the tropics for some primates (Böhm and Mayhew, 2005), birds (Cardillo, 1999; Cardillo et al., 2005; Ricklefs, 2006; Martin and Tewksbury, 2008), amphibians (Wiens, 2007), and plants (Jansson and Davies, 2008), but no significant differences for mammals and birds (Weir and Schluter, 2007) or amphibians (Wiens et al., 2006, 2009). Our results contrast with the study of Jansson and Davies (2008), which indicated a latitudinal gradient in diversification rates in flowering plants. Diversification patterns through time will be better determined by dating complete species-level phylogenies of lineages that have both temperate and tropical elements. Care must also be taken to decouple latitudinal effects from local or regional geological processes that might have had an enormous impact on diversification rates at regional scales. It could be argued that tropical regions have been more geologically active, e.g., Andean uplift in the Neotropics and complex tectonic activity and orogenic events in Southeast Asia. The diversification rate of Theobroma/Herrania that we focus on here is 0.32 (0.18–0.93) with zero extinction and 0.14 (0.08–0.41) with a high relative extinction rate, based on a crown node age for this group of 11.6 (4.0–20.9 [95% HPD]) Ma, and is considerably faster than that of any of the subfamilies alone (**Table 2**). The diversification of these genera coincides with Andean uplift, which is consistent with this event having caused the faster speciation rate. Antonelli et al. (2015) have demonstrated that the Neotropics have a higher rate of evolutionary turnover and emigration than other parts of the tropics, which helps to explain the longitudinal gradient in diversity that exists in addition to the latitudinal one.

## CONCLUSIONS

The diversification of Theobroma and Herrania coincides with periods of uplift in the Northwestern Andes. The few temperate lineages of Malvaceae are nested within tropical ones, having evolved as temperatures cooled during the Tertiary. Tropical lineages do not generally have faster diversification rates than temperate ones but have been around for longer and likely occupied more space, consistent with age and area being more important than differences in diversification rate in explaining the latitudinal gradient in species diversity. Finally, T. cacao diverged from its MRCA 9.9 (7.7–12.9 [95% HPD]) Ma and has had ample time to diversify, although determining the timing of the onset of this diversification and the extent of within-species variability requires denser sampling.

## REFERENCES


## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fevo.2015.00120

Supplementary Figure 1 | MCC tree from the BEAST analysis of the ndhF data that includes GenBank numbers, error bars on node age estimates.

Supplementary Figure 2 | MCC tree from the BEAST analysis of the ndhF data that includes GenBank numbers and posterior probabilities of nodes.

Supplementary Figure 3 | MCC tree from the BEAST analysis of the WRKY data that includes error bars on node age estimates and posterior probabilities of nodes.

Supplementary Figure 4 | MCC tree from the BEAST analysis of the WRKY data that includes posterior probabilities of nodes.


Amazonian chocolate tree (Theobroma cacao L). PLoS ONE 3:e3311. doi: 10.1371/journal.pone.0003311


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Richardson, Whitlock, Meerow and Madriñán. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## The establishment of Central American migratory corridors and the biogeographic origins of seasonally dry tropical forests in Mexico

#### *Charles G. Willis<sup>1,2</sup>\*, Brian F. Franzone<sup>2</sup>, Zhenxiang Xi<sup>2</sup> and Charles C. Davis<sup>2</sup>\**

*<sup>1</sup> Center for the Environment, Harvard University, Cambridge, MA, USA*

*<sup>2</sup> Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA, USA*

#### *Edited by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Marcial Escudero, Doñana Biological Station—Consejo Superior de Investigaciones Científicas, Spain Matthew T. Lavin, Montana State University, USA*

#### *\*Correspondence:*

*Charles G. Willis and Charles C. Davis, Department of Organismic and Evolutionary Biology, Harvard University Herbaria, 24 Oxford St., Cambridge, MA 02138, USA e-mail: charleswillis@fas.harvard.edu; cdavis@oeb.harvard.edu*

Biogeography and community ecology can mutually illuminate the formation of a regional species pool or biome. Here, we apply phylogenetic methods to a large and diverse plant clade, Malpighiaceae, to characterize the formation of its species pool in Mexico, and its occupancy of the seasonally dry tropical forest (SDTF) biome that occurs there. We find that the ∼162 species of Mexican Malpighiaceae represent ∼33 dispersals from South America beginning in the Eocene and continuing until the Pliocene (∼46.4–3.8 Myr). Furthermore, dispersal rates between South America and Mexico show a significant six-fold increase during the mid-Miocene (∼23.9 Myr). We hypothesize that this increase marked the availability of Central America as an important corridor for Neotropical plant migration. We additionally demonstrate that this high rate of dispersal contributed substantially more to the phylogenetic diversity of Malpighiaceae in Mexico than *in situ* diversification. Finally, we show that most lineages arrived in Mexico pre-adapted with regard to one key SDTF trait, total annual precipitation. In contrast, these lineages adapted to a second key trait, precipitation seasonality, *in situ* as mountain building in the region gave rise to the abiotic parameters of extant SDTF. The timing of this *in situ* adaptation to seasonal precipitation suggests that SDTF likely acquired its modern characteristics by the late Oligocene, but was geographically more restricted until its expansion in the mid-Miocene. These results highlight the complex interplay of dispersal, adaptation, and *in situ* diversification in the formation of tropical biomes. Our results additionally demonstrate that these processes are not static, and their relevance can change markedly over evolutionary time. This has important implications for understanding the origin of SDTF in Mexico, but also for understanding the temporal and spatial origin of biomes and regional species pools more broadly.

**Keywords: adaptive lag time, diversification, land bridge, long-distance dispersal, pre-adaptation, tropical biogeography, South America, species pool**

## **INTRODUCTION**

The application of phylogenetics has stimulated the field of community ecology (Webb et al., 2002; Cavender-Bares et al., 2009). Beyond informing us on the nature of community assembly in the present, however, phylogenetic community ecology also holds the promise of integrating deep evolutionary history to understand the origin of communities in relation to geographic and climatological changes across tens of millions of years (Emerson and Gillespie, 2008). In this spirit, we envision the eventual merger of the fields of "community ecology" and "biogeography." More specifically, we imagine a time in the near future when every species within a community can be placed into its broader phylogenetic context, allowing us to pinpoint each species' time and place of origin, and its broader pattern of trait evolution and diversification. From an ecological perspective, biogeographic approaches can provide insight into the origin of the larger species pool for a given region. And from a biogeographic perspective, community ecology can provide insight into the ecological processes that structure and maintain diversity within the same region.

In biogeographic studies, a common way to delineate the formation of a species pool is to focus on the larger geographic region or a particular biome within that region. A major topic along these lines, but one that has not been sufficiently treated for most biomes and taxa, involves the role of migration (originating *ex situ*) vs. diversification (originating *in situ*) in shaping a regional biota (Emerson and Gillespie, 2008). These processes are not mutually exclusive, of course, but represent ends of a spectrum. Yet, the extent to which either of these processes dominates remains poorly understood. Classic island biogeography predicts that migration will dominate in regions that are either new or near a source species pool, while diversification will dominate in regions that are old or isolated (MacArthur and Wilson, 1967; Emerson and Gillespie, 2008; Losos and Ricklefs, 2009). An example of the former is the origination of *Oxalis* diversity in the Atacama Desert, where the modern *Oxalis* species pool was formed through multiple dispersals into the Atacama Desert region from geographically adjacent, likely pre-adapted lineages (Heibl and Renner, 2012). In contrast, the remarkable species diversity in the Andes is often attributed to prolific *in situ* diversification (Bell and Donoghue, 2005; Hughes and Eastwood, 2006; Antonelli et al., 2009). The Andean uplift and the development of more temperate environments effectively isolated this region from its surrounding tropical species pool (i.e., "continental island" effect) and permitted the colonization of temperate lineages via a small number of dispersal events (often only one from within a single larger clade), which subsequently radiated. How the balance between dispersal and diversification for a given biome changes with time, however, remains poorly understood.

A major component necessary for understanding the balance between migration and diversification is the degree to which lineages are pre-adapted to newly inhabited regions (Donoghue, 2008). Along these lines, it has been hypothesized that given the prevalence of phylogenetic niche conservatism in plants, species may more frequently migrate into regions to which they are pre-adapted, i.e., "it is easier to move than to evolve" (Donoghue, 2008). For instance, lineages from a clade that share a pre-adaptation to a new biome are more likely to establish there, in which case the biome's species pool is likely to be assembled through multiple dispersal events. In contrast, lineages from a clade that are not pre-adapted will be required to adapt subsequent to their arrival to successfully establish. To the extent that *in situ* adaptation is difficult, lineages that disperse into the biome will fail to establish, thus favoring *in situ* diversification from a smaller number of dispersals. Perhaps the most striking evidence of the former scenario was presented by Crisp et al. (2009), who identified a broad pattern of phylogenetic bias in the tendency for lineages to disperse and establish across ecologically similar biomes.

Short of a complete phylogeny for both the biome and its resident species, it is most effective at this stage to focus on a representative clade that forms an important component of a biome's diversity. Here, we investigate a biome of high diversity, seasonally dry tropical forest (SDTF) in Mexico, and a clade of flowering plants, Malpighiaceae, that represents an important component of this diversity. Mexico is one of the top ten megadiverse countries in the world and contains a variety of tropical biomes (Williams et al., 2001). Among its most prominent and distinct biomes is the widespread SDTF, which blankets the Pacific slopes of Mexico, ranging from central Sonora and southeastern Chihuahua to the southern state of Chiapas and southward into Central America (**Figure 1**). SDTF is determined by precipitation patterns characterized by total annual precipitation of less than 1800 mm yr<sup>−1</sup> and a distinct seasonally dry period with less than 100 mm over 5–6 months (Murphy and Lugo, 1986; Gentry, 1995). At 1800 mm yr<sup>−1</sup>, SDTF falls below the threshold that defines a tropical rainforest (2000 mm yr<sup>−1</sup>). What further distinguishes SDTF is its striking seasonality in precipitation, which results in a distinctly green period for half of the year, followed by an equally long dry period during which many species lose their leaves. In terms of its physiognomy, Mexican SDTF is characterized by low to medium trees in which grasses are a minor component (Pennington et al., 2000).

Details on the origin of the Mexican SDTF flora, and the timing of its expansion, have received recent attention (reviewed in De-Nova et al., 2012). The challenge of understanding the origin and expansion of this biome is attributed to scant fossil evidence, as well as to the complex geological changes that created the conditions necessary for the biome's characteristic climate (Graham, 2010). Geologically, the establishment of the SDTF in Mexico likely began with the advent of mountain building in southern Mexico during the late Eocene. Today, the north-south Sierra Madre Occidental and the east-west Neovolcanic mountain chains largely maintain the climatic conditions of this biome, especially by blocking cold fronts from the north. The last uplift of the Sierra Madre Occidental was between 34 and 15 Myr, while the Neovolcanic mountain chain is thought to have been established more recently and in several stages, from west to east, beginning ∼23 Myr and ending only recently, 2.5 Myr. Thus, the formation of SDTF in Mexico likely occurred slowly and steadily over a period spanning at least 34 Myr, coincident with these major mountain building events. Becerra (2005) used a time-calibrated phylogeny of the prominent Mexican dry forest clade *Bursera* to assess the origin and expansion of the SDTF flora. Her findings indicated that the oldest Mexican *Bursera* began to diversify between 30 and 20 Myr in western Mexico, in concert with the origination of the Sierra Madre Occidental. In contrast, younger lineages (<17 Myr) diversified in south-central Mexico, with the expansion of the east-west Neovolcanic mountain chain. These results suggest that SDTF formation was greatly facilitated by mountain building in this region and that this biome first established in western Mexico during the Oligocene and subsequently expanded south and east in Mexico, and eventually to Central America (Graham and Dilcher, 1995; Becerra, 2005).

Malpighiaceae are a pantropical clade and include ∼1300 species, 90% of which are found in the New World (Davis et al., 2001; Anderson et al., 2006 onwards; Davis and Anderson, 2010). The family is thought to have originated in South America (Anderson, 1990, 2013; Davis et al., 2002b, 2004), a hypothesis that has been corroborated more recently with greatly expanded phylogenetic sampling (Cai et al., unpublished results). The clade has received broad phylogenetic and biogeographic attention (Cameron et al., 2001; Davis et al., 2001, 2002a,b, 2004; Davis and Anderson, 2010), including more focused investigations in the Old World, especially in Africa and Madagascar (Davis et al., 2002a). Efforts to determine finer scale patterns of Malpighiaceae biogeography in the New World, however, have been hampered by taxonomic sampling deficiencies. Along these lines, one area that is ripe for exploration is the Mexican Malpighiaceae flora, which includes ∼162 species (Anderson et al., 2006 onwards; Anderson, 2013). Malpighiaceae are especially abundant in Mexico's SDTF (Gentry, 1995), where they are represented by ∼60 species, putting them in the top five most diverse families in SDTF (Lott and Atkinson, 2006). On taxonomic and phylogenetic grounds, Mexican Malpighiaceae have been hypothesized by Anderson (2013) to represent as many as 42 independent origins from outside of Mexico.

The first goal of our study is to test Anderson's hypothesis on the origin of Mexican Malpighiaceae by greatly expanding current taxon sampling for the family (Davis and Anderson, 2010). This will not only facilitate a rigorous assessment of the number of introductions to Mexico (both into SDTF and into this geographical region more broadly), but will also establish the timing and ancestral origins of the diverse Mexican Malpighiaceae flora. This will enable us to distinguish between more ancient long-distance dispersal events directly from South America and more recent shorter-distance, "stepping stone" dispersal via Central America. This is especially relevant to understanding when Central America became a major corridor for plant migration (Cody et al., 2010; Gutiérrez-García and Vázquez-Domínguez, 2013; Leigh et al., 2014). A related second goal is to understand how rates of dispersal compare to rates of *in situ* diversification, and how these processes shaped the species richness vs. the phylogenetic diversity of Malpighiaceae in Mexico. Finally, the third goal is to investigate whether lineages were pre-adapted to SDTF, or whether they adapted to this novel biome *in situ*, subsequent to their arrival in Mexico. Relatively few studies have investigated the timing of biome formation using phylogenetic proxies (Becerra, 2005; Davis et al., 2005; Arakaki et al., 2011; Couvreur et al., 2011; De-Nova et al., 2012), yet these approaches hold tremendous promise. It is important to keep in mind, however, that using present-day categorizations of biomes as static ecological characters to infer their origin does not fully capture their more dynamic formation over geological time. In particular, different aspects of the abiotic parameters that characterize extant biomes may originate at different times and change at different rates.
Here, we seek to elucidate the origination of SDTF in Mexico using two key climate parameters that define this biome today, i.e., overall precipitation (total annual precipitation) and precipitation seasonality (precipitation in the driest quarter). Understanding the dynamics of these parameters is likely to yield key insights into the temporal and spatial origin of this biome as it is defined today.

## **MATERIALS AND METHODS**

#### **TAXON SAMPLING**

Our backbone four gene data set (i.e., plastid [pt] *matK*, *ndhF*, *rbcL*, and nuclear [nu] *PHYC*) was published by Davis and Anderson (2010). It includes 338 ingroup accessions of Malpighiaceae representing all 77 currently recognized genera in the family (Anderson et al., 2006 onwards; Davis and Anderson, 2010). Here, we greatly expanded on this taxon sampling, focusing on Neotropical Malpighiaceae, especially from the Caribbean, Central America, and Mexico (see Table S1 in the Supplementary Material). These additional species are represented in numerous diverse genera, including *Bunchosia, Gaudichaudia*, *Mascagnia*, *Stigmaphyllon*, and *Tetrapterys*. Our sampling was guided by W. Anderson (pers. comm.) and is largely summarized in Anderson (2013). Members of Centroplacaceae and Elatinaceae have previously been identified as well supported sister clades to Malpighiaceae (Davis and Chase, 2004; Wurdack and Davis, 2009; Xi et al., 2012), and were included in our analyses as outgroups. *Peridiscus lucidus* Benth. (Peridiscaceae) was used for rooting purposes.

### **MOLECULAR METHODS**

Total cellular DNAs were prepared following Davis et al. (2002a) or were obtained from other sources (see Acknowledgments). Voucher information is listed in Table S1.

Amplification and sequencing protocols for obtaining *matK* followed Cameron et al. (2001), using their primers 400F, *trnK*-2R, and 842F; *ndhF* followed Davis et al. (2001); *rbcL* followed Cameron et al. (2001); and *PHYC* followed Davis et al. (2002b) with the addition of forward primer int-1F, which produced an ∼800 base-pair (bp) amplicon when paired with reverse primer 623r/cdo (Davis and Anderson, 2010).

Double-stranded polymerase chain reaction (PCR) products were primarily gel extracted and purified using the QIAquick Gel Extraction Kit (Qiagen, Valencia, California, USA). PCR products were sequenced in both directions using dye terminators and sequencing protocols at the University of Michigan DNA facility (Ann Arbor, Michigan, USA), MWG Biotechnology (High Point, North Carolina, USA), and GENEWIZ, Inc. (Cambridge, Massachusetts, USA). Chromatograms were assembled into contiguous sequences and checked for accuracy using the software program Sequencher v4.7 (Gene Codes Corporation, Ann Arbor, Michigan, USA). All newly generated sequences were submitted to GenBank (see Table S1).

### **PHYLOGENETIC ANALYSES**

Nucleotide sequences were aligned by eye using MacClade v4.0 (Maddison and Maddison, 2000); the ends of sequences, as well as ambiguous internal regions, were trimmed from each data set to maintain complementary data between accessions.

Maximum likelihood (ML) bootstrap percentage (BP) consensus trees and Bayesian posterior probabilities (PP) from all individual analyses of the four gene partitions revealed no strongly supported incongruent clades (i.e., >80 ML BP/1.0 PP); the four partitions were thus analyzed simultaneously using the search strategies described below.

The optimal model of molecular evolution for the individual and combined analyses was determined by the Akaike Information Criterion (AIC) using ModelTest v3.7 (Posada, 2008). The optimal model was the General Time Reversible model, with rate heterogeneity modeled by assuming that some sites are invariable and that the rate of evolution at other sites follows a discrete approximation to a gamma distribution (GTR+I+Γ). ML analyses of the individual and combined matrices were implemented in the parallelized version of RAxML v7.2.8 (Stamatakis, 2006) using the default parameters. ML BPs were estimated from 100 bootstrap replicates. The Bayesian analyses were implemented with the parallel version of BayesPhylogenies v2.0 (Pagel and Meade, 2004) using a reversible-jump implementation of the mixture model as described by Venditti and Pagel (2008). This approach allows the fitting of multiple models of sequence evolution to each character in an alignment without *a priori* partitioning. Two independent Markov chain Monte Carlo (MCMC) analyses were performed, and the consistency of stationary-phase likelihood values and estimated parameter values was determined using Tracer v1.5. We ran each MCMC analysis for 10 million generations, sampling trees and parameters every 1000 generations. Bayesian PPs were determined by building a 50% majority-rule consensus tree from the two MCMC analyses after discarding the first 20% of generations as burn-in.

#### **PHYLOGENETIC AND DIVERGENCE TIME ESTIMATION**

We used Bayesian methods as implemented in BEAST v1.6.1 (Drummond et al., 2006) to simultaneously estimate the phylogeny and divergence times of Malpighiaceae. A likelihood ratio test rejected a strict clock for the entire dataset (*P*-value < 0.001) and we therefore chose the uncorrelated-rates relaxed clock model, which allows for clade-specific rate heterogeneity.

Our four gene regions were analyzed simultaneously as a single partition using the GTR+I+Γ model as determined using the model selection method described above. Three fossil calibration points served as minimum age constraints for Malpighiaceae and were fit to a lognormal distribution in our analyses. The phylogenetic placements of these calibration points are described in more detail elsewhere (Davis et al., 2002a,b, 2004, 2014). A fossil species of *Tetrapterys* from the early Oligocene (33 Myr, Hably and Manchester, 2000) of Hungary and Slovenia provides a reliable age estimate for the stem node of the two *Tetrapterys* clades. *Eoglandulosa warmanensis* from the Eocene Upper Claiborne formation of northwestern Tennessee (43 Myr, Potter and Dilcher, 1980; Taylor and Crepet, 1987) provides a reliable stem node age for *Byrsonima*. Finally, *Perisyncolporites pokornyi* is found pantropically and provides a reliable stem node age for the stigmaphylloid clade (49 Myr, Germeraad et al., 1968; Lowrie, 1982; Berggren et al., 1995; Davis et al., 2001; Jaramillo and Dilcher, 2001; Jaramillo, 2002). The root node, which we set to a normal distribution of 125 ± 10 Myr, corresponds to the approximate age of the eudicot clade. This represents the earliest known occurrence of tricolpate pollen, a synapomorphy that marks the eudicot clade, of which Malpighiales are a member (Magallón et al., 1999; Stevens, 2001 onwards).

MCMC chains were run for 10 million generations, sampling every 1000 generations. Of the 10,001 posterior trees, we excluded the first 2000 as burn-in. Convergence was assessed using Tracer v1.5 (Rambaut and Drummond, 2007).
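The sample counts quoted above follow from the chain length and the sampling interval. A minimal sketch of that arithmetic is given below; the function and argument names are illustrative and are not part of BEAST.

```python
def posterior_sample_counts(generations, sample_every, burn_in_samples):
    """Return (total, retained) sample counts for one MCMC run.

    The initial state (generation 0) is logged as well, hence the +1.
    """
    total = generations // sample_every + 1
    retained = total - burn_in_samples
    return total, retained


# Values taken from the text: 10 million generations, sampled every 1000,
# with the first 2000 sampled trees discarded as burn-in.
total, retained = posterior_sample_counts(10_000_000, 1_000, 2_000)
print(total, retained)
```

Under these settings the burn-in corresponds to roughly the first 20% of the sampled trees.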

#### **DISTRIBUTION AND CLIMATE DATA**

Malpighiaceae are distributed widely throughout the Old and New World tropics. We characterized the geographic ranges for each species into seven regions abbreviated as follows (Table S2, Figure S1): SA, South America; CA, Central America; Me, Mexico; Ca, Caribbean; As, Asia; Af, Africa; and M, Madagascar. Widespread species were assigned to more than one area. The maximum observed geographic range included four regions (SA, CA, Me, and Ca). The species' occurrence within these regions was based on Anderson et al. (2006 onwards; pers. comm.).

In addition to this broad geographic classification, we also obtained geo-referenced occurrence data from the Global Biodiversity Information Facility database (www.gbif.org) on May 07, 2012. These data, which included 28,254 initial records, were subsequently filtered based on several criteria. They were scrubbed to exclude species that were not included in our phylogeny, as well as occurrences that fell outside of a species' geographic distribution according to Anderson et al. (2006 onwards). This helped to eliminate misidentified specimens or specimens with incorrect locality information. After filtering, our database included 14,743 geo-referenced data points representing 357 species (median number of records for each species = 8). Plant synonymy was checked against the University of Michigan Herbarium Malpighiaceae website (http://herbarium.lsa.umich.edu/malpigh/). We developed a Python web scraping script to automate this procedure (https://github.com/Bouteloua/MalpighiaceaeTaxaScraper).
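The two filtering criteria described above can be sketched as follows. This is only an illustration under assumed inputs: the record schema (`species`, `region` keys), the placeholder species names, and the helper name `filter_occurrences` are hypothetical and do not come from the authors' scripts.

```python
from collections import Counter
from statistics import median


def filter_occurrences(records, sampled_species, known_regions):
    """Keep records of sampled species that fall inside the species' known range.

    records: iterable of dicts with 'species' and 'region' keys (assumed schema).
    sampled_species: set of species names present in the phylogeny.
    known_regions: dict mapping species name -> set of accepted region codes.
    """
    kept = []
    for rec in records:
        sp = rec["species"]
        if sp not in sampled_species:
            continue  # species not represented in the phylogeny
        if rec["region"] not in known_regions.get(sp, set()):
            continue  # likely misidentified or wrongly georeferenced specimen
        kept.append(rec)
    return kept


# Toy records with placeholder names (not real data).
records = [
    {"species": "Species A", "region": "Me"},
    {"species": "Species A", "region": "As"},  # outside known range
    {"species": "Species B", "region": "SA"},
    {"species": "Species C", "region": "SA"},  # not in the phylogeny
]
kept = filter_occurrences(
    records,
    sampled_species={"Species A", "Species B"},
    known_regions={"Species A": {"Me", "CA"}, "Species B": {"SA"}},
)
per_species = Counter(rec["species"] for rec in kept)
print(len(kept), median(per_species.values()))
```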

Climate data for total annual precipitation (mm, bio12) and precipitation of the driest quarter (mm, bio17) were extracted from the World Bioclim dataset at 30 arc-second resolution (www.worldclim.org) based on geo-referenced data and averaged for each species.
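The per-species averaging step can be illustrated with a short sketch. It assumes the climate values have already been extracted at each occurrence point; the tuple layout and the function name are hypothetical, and the numbers are placeholders rather than real data.

```python
from collections import defaultdict
from statistics import mean


def species_climate_means(points):
    """Average bio12 and bio17 over all occurrence points of each species.

    points: iterable of (species, bio12_mm, bio17_mm) tuples (assumed layout).
    Returns a dict mapping species -> (mean bio12, mean bio17).
    """
    by_species = defaultdict(list)
    for species, bio12, bio17 in points:
        by_species[species].append((bio12, bio17))
    return {
        sp: (mean(v[0] for v in vals), mean(v[1] for v in vals))
        for sp, vals in by_species.items()
    }


# Placeholder values, in mm.
points = [
    ("Species A", 1500.0, 40.0),
    ("Species A", 1700.0, 60.0),
    ("Species B", 2400.0, 300.0),
]
print(species_climate_means(points))
```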

#### **ANCESTRAL CLIMATE AND BIOME ESTIMATION**

To estimate ancestral climate states for total annual precipitation and precipitation during the driest quarter, we used a standard Brownian motion model with parameters fitted with ML and Bayesian MCMC. These methods were implemented in the R package *phytools* v0.3-93 (Revell, 2012) using the functions "anc.ML" and "anc.Bayes," respectively. Bayesian models were run for 10,000 generations. We estimated ancestral climate states for each of the 100 ML bootstrap trees.
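To make the Brownian motion idea concrete, the sketch below computes the ML estimate of the root state of a small tree using Felsenstein's pruning recursion, in which each ancestor is a weighted mean of its descendants with weights inversely proportional to branch length. It is not the *phytools* implementation (which estimates all internal nodes), the nested-tuple tree encoding is an assumption made for brevity, and branch lengths are assumed to be strictly positive.

```python
def bm_root_estimate(node):
    """Return (estimate, extra_variance) for a subtree under Brownian motion.

    A tip is a number (the trait value). An internal node is a pair
    ((left_subtree, left_branch_length), (right_subtree, right_branch_length)).
    The estimate returned for the outermost call is the ML root state.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0
    (left, bl), (right, br) = node
    xl, vl = bm_root_estimate(left)
    xr, vr = bm_root_estimate(right)
    # Effective branch lengths include the uncertainty of each child estimate.
    tl, tr = bl + vl, br + vr
    estimate = (xl / tl + xr / tr) / (1.0 / tl + 1.0 / tr)
    extra_variance = tl * tr / (tl + tr)
    return estimate, extra_variance


# Toy tree with placeholder precipitation values (mm) and branch lengths (Myr).
tree = (((1000.0, 1.0), (2000.0, 1.0)), 1.0), (3000.0, 2.0)
root_value, _ = bm_root_estimate(tree)
print(round(root_value, 1))
```

Estimates for internal nodes other than the root would require repeating the calculation with the tree re-rooted at each node, or a joint likelihood optimization as in "anc.ML".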

#### **EVOLUTIONARY LAG TIME**

Adaptation to a new habitat can occur after, before, or during a geographic migration event. When adaptation occurs after a lineage has migrated to a new region, the delay is known as an "evolutionary lag." To investigate whether there was an evolutionary lag in the adaptation to dry forest habitats after lineages became restricted to Mexico, we calculated the difference between the age of each geographic restriction to Mexico and the age at which a lineage first evolved a given minimum climatic threshold.

Minimum climatic values were based on current definitions of SDTF (Murphy and Lugo, 1986; Pennington et al., 2009). SDTF is defined by total annual precipitation of ≤ 1800 mm and a seasonally dry period of 6 months with ≤ 100 mm of precipitation. For total annual precipitation, we estimated the lag time for two minimum thresholds: 1800 mm yr<sup>−1</sup> (in line with the traditional definition of the biome) and 1600 mm yr<sup>−1</sup> (in line with the observed value for endemic Mexican Malpighiaceae, see **Figure 2**, Table S8). For the seasonally dry period, we estimated precipitation during the driest quarter (3 months) for two minimum thresholds: 50 mm qtr<sup>−1</sup> (in line with the traditional definition of the biome) and 100 mm qtr<sup>−1</sup> (in line with a less severe seasonality).
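The lag calculation described in this section can be sketched as below. The sign convention (positive values meaning that adaptation followed restriction to Mexico) and the representation of a lineage as a list of (node age, reconstructed value) pairs are assumptions made for illustration, and all numbers are placeholders.

```python
def first_age_at_or_below(path, threshold):
    """Age (Myr) of the oldest node on a lineage whose value is <= threshold.

    path: list of (age_myr, reconstructed_value) ordered from oldest to youngest.
    Returns None if the threshold is never reached.
    """
    for age, value in path:
        if value <= threshold:
            return age
    return None


def evolutionary_lag(restriction_age, path, threshold):
    """Restriction age minus the age at which the threshold was first met.

    Positive: the threshold was reached after restriction to Mexico (a lag).
    Negative: the lineage already met the threshold on arrival (pre-adaptation).
    """
    adapted_age = first_age_at_or_below(path, threshold)
    if adapted_age is None:
        return None
    return restriction_age - adapted_age


# Placeholder lineage: total annual precipitation (mm) at successive nodes.
path = [(30.0, 2200.0), (20.0, 1900.0), (12.0, 1700.0), (5.0, 1500.0)]
for threshold in (1800.0, 1600.0):
    print(threshold, evolutionary_lag(18.0, path, threshold))
```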

#### **CLIMATIC PROFILE OF MEXICAN MALPIGHIACEAE**

To characterize the climate profiles of Malpighiaceae, we used phylogenetic generalized linear models (PGLM; Revell, 2010) to test how Mexican Malpighiaceae differed from other Malpighiaceae with regard to total annual precipitation and precipitation during the driest quarter. Models were analyzed in R v3.0.2 (R Core Team, 2013) using the "pgls" function implemented in *caper* v0.5 (Orme et al., 2013). We compared two groups of Mexican Malpighiaceae with non-Mexican Malpighiaceae at large. The first group of Mexican Malpighiaceae included species that were geographically restricted (i.e., endemic) to Mexico. Species in the second group were more widespread, occurring in Mexico as well as in regions outside of Mexico. These analyses were run across all 100 ML bootstrap trees.

#### **FIGURE 2 | Precipitation profiles of Mexican and non-Mexican Malpighiaceae.**

Mexican Malpighiaceae are subdivided into two groups: endemic lineages that are geographically restricted to Mexico, and wide-ranging lineages that also occur outside of Mexico. The two climate variables are: total annual precipitation (mm yr<sup>−1</sup>) and seasonal precipitation (total precipitation during the driest quarter, mm yr<sup>−1</sup>). Error bars indicate standard errors. Significant (*P* < 0.05) differences in mean values between groups based on a comparison with phylogenetic generalized linear models are indicated by a (Non-Mexican), b (Mexican, Endemic), and c (Mexican, Widespread).

#### **BIOGEOGRAPHIC RANGE RECONSTRUCTION**

The biogeographic history of Malpighiaceae was reconstructed using the dispersal-extinction-cladogenesis model (DEC; Ree and Smith, 2008) modified to incorporate founder-event speciation (DEC+J; Matzke, 2013). The DEC+J model assumes dispersal-mediated range expansion, extinction-mediated range contraction, and founder-event speciation, with the probability of each event occurring along a particular branch being proportional to the length of that branch and the instantaneous transition rates between geographic areas (Ree and Smith, 2008; Matzke, 2013). Of the 480 species included in our original phylogenetic inference and divergence time estimation, 395 had enough available distribution data to be included in the biogeographic analysis.

We initially compared the DEC+J model with the standard DEC model, as well as the DIVA and BAYESAREA models to determine their fit to our data. Model-fit was assessed by comparing weighted AIC scores (Matzke, 2013). The DEC+J model was the best fit to our data, and was subsequently used for all following analyses (Table S7).
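For readers unfamiliar with AIC-based model comparison, the sketch below shows how Akaike weights are derived from log-likelihoods and parameter counts. The log-likelihood values are invented placeholders (the real values are in Table S7); the parameter counts reflect the usual parameterization, in which DEC has two free parameters (dispersal and extinction rates) and DEC+J adds a third for founder-event speciation.

```python
import math


def akaike_weights(models):
    """Compute Akaike weights.

    models: dict mapping model name -> (log_likelihood, n_free_parameters).
    Returns a dict mapping model name -> weight; the weights sum to 1.
    """
    aic = {name: 2 * k - 2 * lnl for name, (lnl, k) in models.items()}
    best = min(aic.values())
    relative = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Placeholder log-likelihoods, for illustration only.
weights = akaike_weights({"DEC": (-500.0, 2), "DEC+J": (-480.0, 3)})
print(max(weights, key=weights.get))
```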

We used the R package *BioGeoBEARS* v0.2.1 (Matzke, 2013) to obtain the most likely dispersal scenarios at all internal nodes of 100 ML bootstrap trees (with outgroups removed) under the DEC+J model.

In our biogeographic model, we restricted the number of regions a lineage can inhabit to the maximum number of regions observed among extant taxa (four of seven). We altered the migration probabilities among geographic areas to reflect changes in connections over geological time (Mao et al., 2012). These migration probabilities range from 0.1 for well-separated areas, to 1.0 for contiguous landmasses. We devised separate migration matrices for four discrete time intervals: 70–45 Myr, 45–30 Myr, 30–5 Myr, and 5–0 Myr. The use of non-zero migration probabilities allowed for the possibility that lineages could have a range that includes regions that are separated today but were once connected. They also allowed for changes in the probability of dispersal between regions that were once separated by large distances before becoming nearly contiguous.
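A time-stratified set of migration probabilities amounts to a lookup from node age to the matrix that applies in that interval. The sketch below shows the lookup for a single pair of areas. The interval bounds come from the text, whereas the multiplier values are purely hypothetical placeholders chosen within the stated 0.1 to 1.0 range; the actual matrices are not reproduced here.

```python
# Time slices as (older bound, younger bound) in Myr, as listed in the text.
TIME_SLICES = [(70, 45), (45, 30), (30, 5), (5, 0)]

# HYPOTHETICAL multipliers for one pair of areas (South America to Mexico),
# one value per time slice; these are not the values used in the study.
SA_TO_ME = [0.1, 0.1, 0.5, 1.0]


def slice_index(age_myr):
    """Return the index of the time slice containing an age (Myr before present).

    An age on a boundary is assigned to the older slice (e.g., 45 -> slice 0).
    """
    for i, (older, younger) in enumerate(TIME_SLICES):
        if younger <= age_myr < older:
            return i
    raise ValueError(f"age {age_myr} Myr lies outside the modeled window")


def dispersal_multiplier(age_myr, multipliers=SA_TO_ME):
    """Look up the migration probability that applies at a given age."""
    return multipliers[slice_index(age_myr)]


print(dispersal_multiplier(50), dispersal_multiplier(10), dispersal_multiplier(2))
```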

To calculate "dispersal rate" into Mexico, we calculated the average number of expansions into Mexico per million years. Furthermore, we tested for changes in dispersal rate, by estimating an inflection point across the age of expansions into Mexico. To test for an inflection point, we used the function "findiplist" in the R package *inflection* v1.1 (Christopoulos, 2012). We subsequently re-calculated the migration rates for expansions on either side of the estimated inflection point.
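The rate calculation can be sketched as a count of events divided by the time span they cover, computed overall and then on either side of a breakpoint. The event ages and the breakpoint below are hypothetical, and the choice of interval bounds (oldest to youngest event) is an assumption on our part. The inflection point itself was estimated with the R package *inflection* and is not re-implemented here.

```python
def events_per_myr(event_ages, start, end):
    """Events per Myr among events with end <= age <= start.

    Ages are in Myr before present, so `start` is the older bound.
    """
    if start <= end:
        raise ValueError("start must be older (larger) than end")
    n = sum(1 for a in event_ages if end <= a <= start)
    return n / (start - end)

# Hypothetical ages (Myr) of range expansions; illustrative only.
ages = [46.0, 40.0, 33.0, 28.0, 22.0, 19.0, 16.0, 12.0, 9.0, 6.0, 4.0, 2.0]
breakpoint_age = 24.0  # hypothetical inflection point

oldest, youngest = max(ages), min(ages)
overall_rate = events_per_myr(ages, oldest, youngest)
rate_before = events_per_myr(ages, oldest, breakpoint_age)
rate_after = events_per_myr(ages, breakpoint_age, youngest)
fold_change = rate_after / rate_before
```

An event falling exactly on the breakpoint would be counted on both sides under this inclusive convention; none does in this toy example.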

#### **TAXON SAMPLING BIASES**

Biases in taxon sampling can introduce errors in estimates of geographic range and ancestral biome. To address potential taxon sampling biases across geography and biomes we utilized our GBIF records as a reference. This dataset includes 828 species of Malpighiaceae (357 of which we sampled in our phylogeny). While this GBIF dataset does not include all of the species in the family, it provides a very broad representation across all sampled biomes and geographic regions relevant to our analyses. Thus, it can be used to assess biome and geographical biases in the taxa included in our phylogeny.

For each genus, we compared expected species counts per biome/geographic region based on GBIF against species counts in our phylogeny using a standard χ2-test. The biome affinity of each species was scored based on the majority occurrence of the species in the World Wildlife Fund terrestrial biome map (http://www.worldwildlife.org/science/wildfinder/). Of the 72 genera sampled in our study, eight (*Acridocarpus*, *Banisteriopsis*, *Bunchosia*, *Byrsonima*, *Diplopterys*, *Heteropterys*, *Hiraea*, and *Tetrapterys*) exhibited biases in biome sampling (Table S3), while only four (*Byrsonima*, *Heteropterys*, *Hiraea*, and *Stigmaphyllon*) exhibited biases in geographical sampling across the New World (Table S4). The biases were primarily limited to our sampling of Central American SDTF taxa (Table S5). The removal of these genera, however, did not affect our general conclusions regarding the lag time for the adaptation of lineages to seasonally dry tropical forests (Table S6).
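The per-genus comparison can be illustrated with a small goodness-of-fit calculation. In this sketch the species counts are hypothetical, the three biome categories are placeholders, and significance is judged against the standard critical value for two degrees of freedom at α = 0.05 rather than an exact p-value.

```python
def chi_square_statistic(observed, expected_proportions):
    """Pearson chi-square statistic for observed counts vs. expected proportions."""
    total = sum(observed)
    expected = [p * total for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical genus: species per biome in the reference (GBIF-like) set
# and in the phylogeny. Counts are illustrative assumptions only.
reference_counts = [60, 30, 10]  # e.g., three biome categories
sampled_counts = [18, 20, 2]

ref_total = sum(reference_counts)
expected_props = [c / ref_total for c in reference_counts]
stat = chi_square_statistic(sampled_counts, expected_props)

# Critical value of the chi-square distribution, alpha = 0.05, df = 2.
CRITICAL_DF2 = 5.991
biased = stat > CRITICAL_DF2
```

Here the second category is over-represented in the sample relative to the reference, so the statistic exceeds the critical value and the genus would be flagged as biased.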

#### **DIVERSIFICATION RATES**

We tested for shifts in net species diversification (speciation − extinction) rate through time and among lineages using BAMM v1.0 (Rabosky, 2014). BAMM allows for simultaneous estimates of rate shifts across a phylogeny using a Bayesian framework. BAMM can also account for incomplete taxon sampling by lineage. We included estimates of lineage completeness by taking the proportion of species within a given genus present in our phylogeny relative to the total number of species reported to be in the genus (Table S9). Additional priors were set using the "setBAMMpriors" function in the R package *BAMMtools* v1.0.1 (Rabosky et al., 2014). For each of 100 ML bootstrap trees, we conducted one run of 10 million MCMC generations, sampling parameters every 10,000 generations. We discarded the first 10% of each run as burn-in. We computed effective sample size (ESS) for log-likelihood and rate parameters to assess the convergence of each run. All parameters had effective sample sizes >400. We calculated the mean of the marginal posterior density of the net diversification rate for small segments (τ = 0.5) along each branch for every tree using the *dtRates* function. This allowed us to assess the variation in diversification rates across every tree.
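Two of the quantities above are simple bookkeeping and can be sketched directly: the per-genus sampling fraction and the number of posterior samples retained after burn-in. The genus names and species counts are hypothetical; the MCMC settings are those stated in the text, and the sample count is approximate because it ignores whether the initial state is also written out.

```python
def sampling_fraction(n_sampled, n_total):
    """Proportion of a genus's described species present in the phylogeny."""
    if not 0 < n_sampled <= n_total:
        raise ValueError("need 0 < n_sampled <= n_total")
    return n_sampled / n_total

# Hypothetical genera: (species in phylogeny, species described).
genera = {"GenusA": (12, 40), "GenusB": (5, 5)}
fractions = {g: sampling_fraction(s, t) for g, (s, t) in genera.items()}

# MCMC settings taken from the text.
generations = 10_000_000
sample_every = 10_000
burnin_percent = 10

n_samples = generations // sample_every
n_burnin = n_samples * burnin_percent // 100
n_retained = n_samples - n_burnin
```

Under these settings each run yields roughly 1,000 samples, of which about 900 remain after burn-in.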

#### **RESULTS**

#### **PHYLOGENETIC INFERENCE AND DIVERGENCE TIME ESTIMATION OF MALPIGHIACEAE**

Our taxon sampling includes 461 accessions (413 taxa identified to species and 28 taxa identified to genus) representing ∼35% of the total species diversity of Malpighiaceae. In addition, we included 19 taxa from across the Malpighiales as outgroups. The aligned pt *matK*, *ndhF*, *rbcL*, and nu *PHYC* data sets included 1194, 867, 1414, and 1180 base pairs (bp), respectively. The data matrix presented in this study is available in TreeBase (ID# 16087).

Our phylogeny is congruent with previous results (Cameron et al., 2001; Davis et al., 2001; Davis, 2002; Davis and Anderson, 2010), but represents an improvement over previous studies. Our increased sampling here is particularly relevant because it greatly enhances our ability to investigate fine-scale diversification patterns in the Neotropics, especially those related to Mexican Malpighiaceae. Our sampling included 95 species from Mexico, 47 of which are endemic to the region. In addition, several new taxa from adjacent areas of Central America, the Caribbean, and South America were also sampled. Well-supported relationships were congruent between analyses of individual data sets, and the data were thus analyzed in combination. For the sake of space, we present the ML results of the combined four-gene analysis here (see Figure S2). Similarly, our divergence time estimates are congruent with previous estimates (Davis et al., 2002b, 2004, 2014).

#### **CLIMATIC PROFILE OF SAMPLED MEXICAN MALPIGHIACEAE**

Our sampled Malpighiaceae that are geographically restricted to Mexico tended to have lower total annual precipitation, when compared to Malpighiaceae that are not endemic to Mexico, but this difference was not significant (Table S7, **Figure 2**). However, Malpighiaceae restricted to Mexico did differ significantly with regard to precipitation seasonality (Table S8, **Figure 2**). Namely, Malpighiaceae restricted to Mexico had, on average, a significantly distinct dry season with an average precipitation of ∼49.2 mm during the driest quarter. This is compared to an average precipitation during the driest quarter of 195.8 mm for non-Mexican Malpighiaceae, and 195.0 mm for Malpighiaceae that occur in, but are not restricted to Mexico.

#### **BIOGEOGRAPHIC RANGE RECONSTRUCTION**

We identified an average of ∼33 (Q95%: 29–38) independent range expansions into Mexico, i.e., a transition where the ancestral range of a lineage expands to include Mexico, but is not necessarily restricted to Mexico (**Figure 3**). These expansions were inferred to have occurred at the stem group node, and originate predominantly from South America. Furthermore, they often include the colonization of additional regions, most commonly Central America, and to a lesser extent the Caribbean. We also identified an average of ∼22 (Q95%: 19–26) range restrictions to Mexico, i.e., a transition where the ancestral range of a lineage becomes restricted to Mexico (**Figure 4**). The majority of these restrictions involved lineages where the ancestral range included both Mexico and additional regions, with the restriction to Mexico occurring after the lineage went extinct in the ancestral non-Mexican regions (Figure S3).

The mean migration rate (based on expansion events, as defined above) of lineages into Mexico was 0.8 lineages Myr<sup>−1</sup> (Q95%: 0.6–1.2 lineages Myr<sup>−1</sup>). The migration rate was not uniform through time, however. We identified an inflection point in the migration rate at 23.9 Myr (Q95%: 19.5–28.3 Myr; **Figure 5**). The migration rate prior to the inflection point was 0.3 lineages Myr<sup>−1</sup> (Q95%: 0.1–0.9 lineages Myr<sup>−1</sup>), while the rate after the inflection point increased significantly to 1.7 lineages Myr<sup>−1</sup> (Q95%: 1.3–2.5 lineages Myr<sup>−1</sup>).

expansion event estimated from Lagrange for both stem, parent nodes (upper row) and crown, daughter nodes (lower row). Error bars represent 95% confidence intervals estimated across mean ages from 100 ML trees. Plot ordered by age of stem node from oldest to youngest. Pie-charts represent the estimated proportion of the ancestral range at each respective node. Parent node range represents the original range, prior to dispersal into

#### **PRE-ADAPTION vs.** *IN SITU* **ADAPTATION POST COLONIZATION**

The evolution of total annual precipitation characteristic of SDTF occurred well before the geographic restrictions to Mexico (**Figures 4**, S4). For total annual precipitation ≤ 1800 mm yr<sup>−1</sup>, the mean lag time was 22.0 Myr before the geographic restriction to Mexico, with values ranging from 39.4 Myr before to 30.0 Myr after the restriction to Mexico (Q95%: 36.0–21.6 Myr) (Figure S4). At an even stricter threshold consistent with modern SDTF, total annual precipitation ≤ 1600 mm yr<sup>−1</sup>, the lag time was 15.8 Myr before the geographic restriction to Mexico, with values ranging from 43.8 Myr before to 2.2 Myr after a restriction to Mexico (Q95%: 23.3–11.6 Myr) (**Figures 4**, S4).

The evolutionary lag of precipitation seasonality characteristic of SDTF was dependent on the age of the geographic restriction to Mexico (**Figures 4**, S5). For precipitation seasonality ≤ 50 mm qtr<sup>−1</sup>, there were, on average, four lineages that adapted to seasonality before a restriction to Mexico, 10 lineages that adapted to seasonality after a restriction to Mexico, and six lineages that adapted to seasonality concurrent with a restriction to Mexico (**Figures 4**, S5). Lineages that adapted to seasonality after becoming restricted to Mexico tended to be older, while younger lineages tended to be pre-adapted to seasonally dry periods (**Figures 4**, S5).

which includes Mexico. Key indicates ancestral range reconstructions for stem, parent nodes and daughter, crown nodes to left and right of backslash, respectively. Horizontal bold, blue line indicates the inferred inflection point (solid = mean, dashed = 95% quantile range) when the rate of dispersal between North and South America increases significantly (see **Figure 5**). P, Pliocene; Q, Quaternary (Pleistocene/Holocene).
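The before/after/concurrent classification of lineages can be expressed as a comparison of two node ages. The sketch below is one plausible reading of that comparison: the lineage names and ages are hypothetical, and the `tolerance` used to call two events concurrent is an assumption, since the criterion for concurrency is not spelled out in the text.

```python
def classify_lag(trait_age, restriction_age, tolerance=0.0):
    """Classify a lineage by when a climatic trait evolved relative to restriction.

    Ages are in Myr before present. A positive lag means the trait evolved
    before (i.e., is older than) the geographic restriction.
    """
    lag = trait_age - restriction_age
    if abs(lag) <= tolerance:
        return "concurrent", lag
    if lag > 0:
        return "pre-adapted", lag
    return "in situ", lag

# Hypothetical lineages: (age trait evolved, age of restriction to Mexico).
lineages = {
    "lineage_A": (30.0, 12.0),
    "lineage_B": (8.0, 20.0),
    "lineage_C": (10.0, 10.0),
}

results = {name: classify_lag(t, r) for name, (t, r) in lineages.items()}
counts = {}
for label, _ in results.values():
    counts[label] = counts.get(label, 0) + 1
```

Averaging such counts and lags across a set of bootstrap trees would give summaries of the kind reported above.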

#### **DIVERSIFICATION RATES**

There were, on average, four major shifts in diversification rate across the Malpighiaceae. These shifts were primarily driven by increases in speciation rate (Figure S6). Two of these shifts were associated with lineages that include high species diversity in Mexico, *Galphimia* (22 sp.) and *Bunchosia* (20 sp.). The additional shifts in diversification rate were at the base of the *Byrsonima* clade, and earlier in the Malpighiaceae (at the most recent common ancestor of the hiraeoid and tetrapteroid clades sensu Davis and Anderson, 2010). The mean diversification rate for Malpighiaceae was 0.08 lineages Myr<sup>−1</sup> (Q95%: 0.04–0.27 lineages Myr<sup>−1</sup>).

**seasonal dry tropical forest climate in Mexico.** Mean age of geographic restrictions to Mexico shown with black circles. Mean age of the evolution of total annual precipitation (≤1800 mm yr−1; blue circles) and seasonal precipitation (≤50 mm qtr−1; red circles) in Mexican endemics as defined for modern seasonally dry tropical forest shown with blue and red circles, respectively. Also included is (i) the mean age of the evolution of total annual

Malpighiaceae in Mexico (≤1600 mm yr−1; light blue) and (ii) seasonal precipitation under more moderate historic conditions (≤100 mm qtr−1; yellow). Error bars represent 95% confidence intervals estimated across mean ages from 100 ML bootstrap trees. Plot ordered by age of restriction event from oldest to youngest. Cret, Cretaceous; Paleo, Paleocene; P, Pliocene; Q, Quaternary (Pleistocene/Holocene).

## **DISCUSSION**

We have structured our discussion below to focus first on the arrival of Malpighiaceae into Mexico as a geographic entity. Here, we detail the broader biogeographic context of these immigrants, including their ancestral areas of origin and timing of arrival. For the second part of our discussion we explore the relative contribution of *in situ* lineage diversification vs. dispersal-mediated processes in forming the Malpighiaceae species pool in Mexico, which gave rise to SDTF there. And third, we focus on the numerous immigrants to Mexico that have become geographically restricted (i.e., endemic) to this region. Mexican endemic Malpighiaceae are overwhelmingly represented in the SDTF (Figure S7), and thus represent a key to understanding the aridification of Mexico and the formation of this important biome. Here, we focus on characterizing the temporal and spatial nature of the origin of SDTF by investigating if these adaptations arose before or after these lineages became endemic to Mexico.

*Mexican Malpighiaceae represent numerous dispersal events from South America.* Our analyses indicate that the ∼162 species that constitute the Mexican Malpighiaceae flora represent ∼33 independent introductions from outside Mexico (**Figure 3**). These estimates corroborate the large number of introductions to Mexico hypothesized by Anderson (2013) based on his knowledge of the phylogeny and taxonomy of the family. Our ancestral area reconstructions further identify South America as the ultimate source of these many Mexican clades, which is supported by earlier evidence that the origin and early diversification of the family occurred in South America (Anderson, 1990; Davis et al., 2002b, 2004). These findings more generally corroborate the striking floristic similarities between dry-forested regions in northern South America and Mexico (Linares-Palomino et al., 2011).

The migration of South American Malpighiaceae into Mexico falls into two categories. The first involves more ancient northward migrations of South American ancestors directly into Mexico (**Figure 3**). These six migrations began during the Eocene and continued into the early Miocene (46–19 Myr): the first three occurred between the Eocene and mid-Oligocene (46–33 Myr) and involve disjunct distributions between South America and Mexico; the remaining three occurred between the mid-Oligocene and early Miocene (28–19 Myr) and involve disjunct distributions between South America and Mexico or occupancy of Mexico alone. Based on our ancestral range reconstructions, these migrations do not appear to have been facilitated by intervening connections involving Central America or the Caribbean. Taxon sampling bias, while a concern for some lineages, is not a likely explanation for the general pattern we observe (see Materials and Methods). Given the physical distance between these two regions for at least the earliest three dispersal events, long-distance dispersal is likely the predominant mode of migration from South America to Mexico. The three more recent dispersals occurred during a time period when we believe Central American corridors were becoming at least partially available for migration, which perhaps explains the increased ambiguity in these ancestral range reconstructions (**Figure 3**).

in orange, respectively. Diversification rates are based on estimates of speciation and extinction across all Malpighiaceae to illustrate the full range of diversification rates across the entire family (blue). Vertical lines indicate median rate values; horizontal lines indicate the 95% quantile range for each rate. Both dispersal and diversification rates were estimated across 100 ML bootstrap trees.

The second category of migration from South America to Mexico includes the majority of events, and implicates a "stepping-stone" dispersal scenario, with Central America, and to a lesser extent the Caribbean, serving as a bridge between South America and Mexico (**Figure 3**). Here, a remarkable 17 of the 33 Mexican Malpighiaceae clades resulting from northward migrations originating in South America had ancestral ranges that expanded to also include Central America. These migratory events began in the early Miocene, ∼16 Myr, and continued until as recently as the early Pleistocene, ∼2 Myr. Collectively, these findings corroborate a wide range of studies indicating that dispersal plays a major role in tropical forest assembly (Dick et al., 2003; Lavin et al., 2004; Clayton et al., 2009).

A broader geographical framework that more precisely incorporates the timing of these dispersal events into Mexico provides a more nuanced context for interpreting the biogeographic origins of the Mexican Malpighiaceae flora. South American migrations to Mexico occur with striking regularity over an extended period beginning in the Eocene and continuing to the Pliocene (46.4–3.8 Myr). During this period we observe an average rate of 0.8 migrations Myr−1. This average, however, obscures a far more dynamic pattern in the change in the rate of migration. At ∼23.9 Myr (Q95%: 19.5–28.3 Myr) we infer a significant, six-fold increase in the rate of migration between North and South America.

This early Miocene shift in the pattern and rate of migration likely offers key insights into the geological processes that shaped Neotropical biogeography. The Cenozoic paleoland and paleoclimatic reconstructions for Central America, especially involving the Central American Seaway and the rise of the Isthmus of Panama, are contentious (Klocker, 2005; Molnar, 2008). Although numerous lines of geological and biological evidence support a more recent Pliocene shoaling of the Isthmus (Molnar, 2008; Leigh et al., 2014), which established the first direct connection between North and South America, evidence from geology and molecular divergence time estimates in various clades indicates that this barrier was partly permeable to plant migration beginning at least by the Oligocene (Montes et al., 2012; Gutiérrez-García and Vázquez-Domínguez, 2013; Leigh et al., 2014). Our results demonstrate a combination of early long-distance dispersal between South America and Mexico in the Eocene–Oligocene, followed by a shift in the early Miocene, when we see a dramatic increase in the rate of migration, likely due to an increased potential for shorter-distance dispersal. These results suggest that sufficient Central American land corridors were available for plant migration between North and South America, well in advance of when the geological connection between these continents appears to have been fully established ∼3.0 Myr (Leigh et al., 2014). Additionally, our analyses of xeric adaptation discussed below indicate that these Central American corridors would likely have been characterized by drier forest biomes, in particular, forests with relatively low annual precipitation.

Finally, fruit dispersal syndromes summarized by Anderson (2013) provide ancillary support for these two hypothesized dispersal routes to Mexico, i.e., direct long-distance dispersal to Mexico from South America vs. "stepping-stone" dispersal via Central America. The two clades implicated in older long-distance dispersal events to Mexico from South America, *Bunchosia* and *Byrsonima*, are characterized by fleshy bird-dispersed fruits. While the influence of dispersal morphology on the potential for long-distance dispersal remains controversial (Higgins et al., 2003; Nathan, 2006), our results provide ancillary evidence indicating that bird-dispersed lineages were more likely to disperse long distances without an intervening land connection such as Central America. This is corroborated by other empirical studies that have found an increased propensity for long-distance dispersal among bird-dispersed plants in the tropics (Hardesty et al., 2006; Jordano et al., 2007). In contrast, the vast majority of Malpighiaceae that successfully made the more recent migrations possess wind-dispersed fruits, including species with winged samaras (*Adelphia*, *Bronwenia*, *Banisteriopsis*, *Calcicola*, *Callaeum*, *Carolus*, *Christianella*, *Cottsia*, *Diplopterys*, *Gaudichaudia*, *Heteropterys*, *Hiraea*, *Mascagnia*, *Psychopterys*, *Stigmaphyllon*, *Tetrapterys*) and bristly fruits (*Echinopterys*, *Lasiocarpus*). These fruits, while not ideal for traversing large bodies of open ocean, were likely sufficient for making shorter step-wise dispersals to Mexico when facilitated by intervening land masses with the rise of Central America. *Malpighia*, which possesses mostly bird-dispersed fleshy fruits, is an exception in this regard. It is one of the more recent inhabitants of Mexico (expanding into Mexico ∼5 Myr) that migrated to that region via Central America.
On the basis of fruit morphology, however, Anderson (2013) hypothesized that *Malpighia* may have colonized Mexico via winged, wind-dispersed ancestors, which is consistent with our broader interpretation of dispersal patterns. A final conundrum is *Galphimia,* which represents the oldest introduction to Mexico. Although the petals of *Galphimia* species in Mexico are persistent and thus may facilitate wind-dispersal, their fruits otherwise break apart into dry cocci that exhibit no obvious adaptation for long-distance dispersal (Anderson, 2013).

*Dispersal-mediated processes dominate initial species pool formation and contribute disproportionately to the phylogenetic diversity of Mexican Malpighiaceae.* Our analyses of species diversification rates illuminate the formation of the initial species pool of Malpighiaceae that gave rise to SDTF in Mexico. The vast majority of the ∼33 South American lineages that dispersed into Mexico exhibit no evidence of accelerated net diversification rates (Figure S6). Beginning in the Miocene, when rates of migration increased significantly in conjunction with the development of land corridors through Central America, the rate of Malpighiaceae dispersal between South America and Mexico was 1.7 lineages Myr<sup>−1</sup>, outpacing even the highest rate of net species diversification, 0.3 lineages Myr<sup>−1</sup> (**Figures 3**, **5**). This nearly six-fold difference in the rate of dispersal vs. diversification establishes the former as a key factor in the initial establishment of the Mexican Malpighiaceae species pool. The lone exceptions to this pattern of relatively low net diversification are the two oldest introductions to Mexico in the Eocene and Oligocene, *Galphimia* and *Bunchosia*. These two clades exhibit significantly increased net diversification rates. Although we cannot pinpoint the precise geographic location where these two clades began to diversify, their average diversification rate is on par with the rate of dispersal we observe into Mexico following the development of a Central American migratory corridor.

Our results collectively highlight the balance between *in situ* diversification vs. dispersal in forming a regional species pool, and how this balance can change with time. Here, the two lineages that demonstrate prolific diversification in the family, *Galphimia* (22 sp. in Mexico; Anderson, 2013) and *Bunchosia* (20 sp. in Mexico), constitute roughly 24–28% of the ∼162 extant Mexican Malpighiaceae. Beyond these exceptional clades, however, it remains clear that even the large number of independent dispersal events into Mexico (∼33) that formed the initial species pool cannot account entirely for the extant diversity of Mexican Malpighiaceae (∼162 sp.). Thus, *in situ* diversification is clearly an important contributor to the extant species richness of this region. At the same time, however, our results indicate that the increased phylogenetic diversity (PD) in Mexican Malpighiaceae is attributable largely to dispersal. Using the stem group age of each introduction to Mexico as a conservative assessment of PD generated via *in situ* diversification (as measured by mean phylogenetic distance; Webb et al., 2002), we found that the mean phylogenetic distance (i.e., the mean branch length between any pair of taxa) between relatives that arose via *in situ* diversification was 12.5 Myr (Q95%: 10.4–14.9 Myr). In contrast, mean phylogenetic distance between the Mexican subclades that arose via dispersal into the region was substantially larger, 100.5 Myr (Q95%: 96.4–104.6 Myr). It is clear from these results that dispersal has contributed substantially to PD in Mexico by effectively "sampling" broadly from the entire Malpighiaceae clade. The contribution of dispersal to PD we identify may be a general pattern facilitating the formation of SDTF in Mexico, and the formation of regional species pools more broadly.
The *Oxalis* of the Atacama Desert, for instance, represent a broad diversity of *Oxalis* clades that independently dispersed into this region (Heibl and Renner, 2012).
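Mean phylogenetic distance, as used above, is the average of all pairwise distances among a set of taxa. The sketch below computes it from a lookup of pairwise distances; the taxon labels and distances are hypothetical and chosen only to contrast a young within-clade comparison with an old between-clade comparison.

```python
from itertools import combinations

def mean_pairwise_distance(taxa, dist):
    """Mean of pairwise distances among taxa.

    `dist` maps frozenset({a, b}) to the phylogenetic distance between a and b.
    """
    pairs = list(combinations(taxa, 2))
    if not pairs:
        raise ValueError("need at least two taxa")
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

# Hypothetical distances (Myr): a1 and a2 arose in situ from a single
# introduction; b1 descends from a separate introduction.
dist = {
    frozenset({"a1", "a2"}): 10.0,
    frozenset({"a1", "b1"}): 100.0,
    frozenset({"a2", "b1"}): 100.0,
}

mpd_within = mean_pairwise_distance(["a1", "a2"], dist)
mpd_between = mean_pairwise_distance(["a1", "b1"], dist)
mpd_all = mean_pairwise_distance(["a1", "a2", "b1"], dist)
```

In this toy case the between-introduction distance dominates the overall mean, mirroring the qualitative contrast between the two values reported above.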

What limited the majority of introduced lineages from radiating in Mexico? One hypothesis is that newly established lineages did not exhibit a competitive advantage over more recently introduced lineages. Thus, newly arriving lineages were able to occupy SDTF in Mexico with a frequency commensurate with the apparent richness of their source pool outside of Mexico (especially from South America). Support for this hypothesis comes from the fact that we find the vast majority of immigrant lineages arrived pre-adapted to at least one of the two traits that characterize SDTF, annual precipitation. However, most lineages were not adapted to seasonally dry periods. A second hypothesis is that SDTF is more generally dispersal-limited, thus restricting the ability of lineages to radiate subsequent to their arrival into expanding habitat. Such a scenario might offer newly available niche space for lineages to occupy via dispersal from outside the regional source pool. Previous studies have found a significant degree of genetic differentiation among geographically isolated populations across dry forest habitats, supporting the idea that these dry forests are dispersal-limited (Pennington et al., 2006a, 2009). Furthermore, at least one of the two lineages (*Bunchosia* [20 sp.], and less significantly so *Malpighia* [19 sp.]) that diversified substantially in Mexico is bird-dispersed. Such lineages are likely to more easily migrate and radiate across a landscape via allopatric speciation. This is compounded by the fact that the formation of SDTF was not likely contiguous, both spatially and temporally (Graham and Dilcher, 1995). New fragments of SDTF likely arose at different times and in different regions of Mexico, which might have greatly facilitated the continuous establishment of new Malpighiaceae lineages via dispersal from external source pools.
In combination, these processes might act to limit the overall diversification of single Mexican introductions, and help explain the pattern of frequent and continued dispersal of new lineages into this biome.

The predominance of dispersal in the formation of a regional species pool has been shown elsewhere, including in the California chaparral (Ackerly, 2004), the Atacama Desert (Heibl and Renner, 2012), and in temperate, post-glacial habitats (Williams et al., 2004). However, these and related studies are commonly restricted to a narrow window of time or to a single, relatively young clade. In our case, the overwhelming pattern of dispersal we identify spans a wide period of time as well as a large, diverse plant clade. This raises the possibility that the characterization of species pools being dominated by single lineages that moved into a region and subsequently radiated, such as in the Andes (Hughes and Eastwood, 2006) or on remote islands (Baldwin and Sanderson, 1998; Emerson, 2002), may be exceptions rather than the rule of species pool assembly. A more general pattern, especially with regard to the phylogenetic diversity of a species pool, might instead be the steady recruitment of independent lineages via dispersal over tens of millions of years, most of which exhibit modest rates of *in situ* diversification.

*Endemic Mexican Malpighiaceae exhibit both* ex situ *pre-adaptation and* in situ *adaptation to xeric environments in Mexican SDTF.* Our analyses of trait adaptation to precipitation provide direct insights for more clearly interpreting the spatial and temporal origins of xeric adaptations by Mexican Malpighiaceae. An emerging question in the field of biogeography is when traits relevant to community assembly originated (Ackerly, 2004; Cavender-Bares et al., 2009; Simon et al., 2009; Edwards and Donoghue, 2013). Specifically, did Mexican Malpighiaceae evolve adaptations to xeric habitats elsewhere (*ex situ*, i.e., were they pre-adapted?) followed by movement into Mexico, or did they evolve these adaptations *in situ* in Mexico? This question has been more succinctly phrased as "is it easier to move, than to evolve?" (Donoghue, 2008). To explore this question in Mexican Malpighiaceae, we focused on two key traits central to the definition of contemporary Mexican SDTF: lower total annual precipitation (≤1800 mm) and extreme seasonality in precipitation (≤50 mm over the driest 3 months). Here, we specifically examined the ∼22 Mexican lineages that have become endemic to Mexico (**Figure 5**), since these lineages are overwhelmingly represented in SDTF of this region (Figure S4).

We found that the timing of adaptation to these two key climate traits contrasts sharply in relation to when lineages became restricted to Mexico (**Figures 4**, S4, S5). Adaptation to drier annual precipitation evolved well before Malpighiaceae became endemic to Mexico, by an average of ∼15 Myr (Q95%: −1.3 to 64.8 Myr) (**Figures 4**, S4). Thus, lineages that subsequently became geographically restricted to Mexico were likely pre-adapted for living in relatively drier environments. This climatic adaptation appears to have arisen largely in South America (Figure S4). In contrast, adaptation to extreme precipitation seasonality evolved largely concurrent with or after Malpighiaceae became restricted to Mexico (**Figures 4**, S5). We interpret these results to indicate that adaptations to precipitation seasonality largely arose *in situ* in Mexico as the abiotic conditions of SDTF were developing in this region (Figure S5). On average, the lag time for this *in situ* adaptation to precipitation seasonality was ∼0.7 Myr. This pattern is supported by recent findings that identified a lag time in the adaptation of lineages to dry environments following their arrival into the Atacama Desert in South America (Guerrero et al., 2013).

Our results, however, provide a more nuanced perspective of *in situ* adaptation to dry forest environments and illuminate the timing of the origin and expansion of modern SDTF in Mexico. The first lineage to inhabit Mexico evolved dry forest adaptations ∼26.0 Myr. For the remaining lineages, however, adaptation to precipitation seasonality did not arise until much later, ∼13.7 Myr on average. This suggests that although SDTF appears to have arisen in Mexico during the late Oligocene, it was likely geographically restricted at the time. It did not expand greatly until the mid-Miocene. This pattern is consistent with slow and steady north-south mountain building coinciding with the final uplift of the Sierra Madre Occidental (34–15 Myr), followed by east-west orogeny of the Neovolcanic mountain chain (∼23–2.5 Myr), which is hypothesized to have initiated the geological and abiotic conditions that gave rise to SDTF in Mexico (Moran-Zenteno, 1994; Becerra, 2005). Furthermore, this pattern more broadly coincides with global aridification that began during the Miocene, which marked the worldwide expansion of grasslands and succulent biomes (Cerling et al., 1993, 1997; Arakaki et al., 2011). Our results for Malpighiaceae further indicate that the expansion and widespread establishment of SDTF in the mid-Miocene marks a transition in the temporal pattern of adaptation to precipitation seasonality. While older lineages that became restricted to Mexico adapted to precipitation seasonality *in situ*, younger lineages that became restricted to Mexico (≤13.7 Myr) tended to be pre-adapted to precipitation seasonality (**Figures 4**, S5). These younger lineages appear to have arisen from ancestors that adapted to precipitation seasonality in Mexico and Central America, but subsequently became geographically restricted to Mexico.
These patterns collectively illuminate the timing of the origin and expansion of modern SDTF in Mexico, and indicate that this biome more recently served as an important species pool for other SDTFs across the Neotropics.

It is important to keep in mind that the formation of the SDTF biome in Mexico occurred over millions of years, which appears to have afforded the earlier inhabitants of this region the time necessary to adapt to the novel abiotic conditions that define this biome today. If our hypothesis is correct, we should observe a pattern of more gradual adaptation to precipitation seasonality for dry periods that are consistent with intermediate historic levels (i.e., twice the level of current precipitation seasonality, 100 mm). When we perform this exercise, we observe a clear pattern of earlier adaptation to these historic levels as expected under our hypothesis (**Figures 4**, S5). The extent to which this *in situ* adaptation occurred within the geographic footprint of the biome itself, vs. geographically adjacent regions in Mexico, cannot be resolved with our data. However, we would expect *in situ* adaptation within the biome for many of these lineages. Further investigations using additional taxon sampling and far better geographic reconstructions of this biome throughout the Cenozoic would be required to determine this more confidently. Additionally, studies of recent range expansions and adaptations to novel environments could shed light on this scenario. Nevertheless, the lag times we identify raise concerns about using modern biome categorizations as static characters for phylogenetic character state reconstruction (Davis et al., 2005; Crisp et al., 2009). In particular, our study indicates that ancestral reconstructions of biomes as static characters are problematic if the abiotic parameters that define these biomes today are much more recently evolved than the clades that inhabit them. Instead, our results suggest that inferring the evolution of key parameters that characterize these biomes (e.g., annual precipitation and precipitation seasonality) provides far more insight into the evolution of the lineages that inhabit the biome, as well as the biome itself.

### **CONCLUSION**

The assembly of a biome's flora depends on the complex interplay of dispersal, adaptation, and *in situ* diversification (Emerson and Gillespie, 2008). These processes in turn depend on both the geographic proximity and ecological similarity between the source pool and the biome in question (Emerson and Gillespie, 2008; Edwards and Donoghue, 2013). Our results additionally demonstrate that these processes are not static and change over evolutionary time. This has important implications, from the composition of SDTF in Mexico to the formation of regional species pools and biomes more broadly. In the case of the formation of the Mexican Malpighiaceae flora, we found that changes in the geographic connectivity between South America and Mexico in the Miocene led to dramatic changes in dispersal patterns from South America. Initially, lineages were limited to less frequent, longer-distance dispersal from South America. But increasingly the availability of Central America permitted more frequent, shorter-distance dispersal, facilitating vastly greater migration to Mexico starting in the mid-Miocene. The increased rate of dispersal, in turn, significantly influenced the composition of Mexican Malpighiaceae. While much of the extant species diversity in Mexico derives from *in situ* diversification, the overwhelming majority of phylogenetic diversity derives from multiple, independent introductions from South American ancestors spanning the phylogenetic breadth of the Malpighiaceae clade. Furthermore, climatic conditions that define the extant SDTF biome appear to have changed through time. While most lineages arrived in Mexico pre-adapted to one key factor of SDTF, total annual precipitation, these lineages subsequently adapted *in situ* to another major axis, seasonal dry periods. In this case, *in situ* adaptation appears to have occurred gradually, over many millions of years, coincident with mountain building that established the geological conditions that maintain the biome today. Moreover, we demonstrate that once SDTF became widely established in Mexico, it increasingly became a source pool for other more recently developed SDTF, especially in Central America.

#### **ACKNOWLEDGMENTS**

We thank C. Anderson, W. Anderson, J. Beaulieu, S. Cappellari, M. Gomez, D. Rabovsky, C. Webb, W. Zhang, and members of the Davis laboratory for technical assistance and valuable discussions. Funding for this study came from US National Science Foundation Assembling the Tree of Life grant DEB-0622764, DEB-1120243, and DEB-1355064 (to C.C.D.), and from the Michigan Society of Fellows. This paper is dedicated to William R. Anderson–friend, and lifelong student of Malpighiaceae.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00433/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 08 July 2014; accepted: 22 November 2014; published online: 19 December 2014.*

*Citation: Willis CG, Franzone BF, Xi Z and Davis CC (2014) The establishment of Central American migratory corridors and the biogeographic origins of seasonally dry tropical forests in Mexico. Front. Genet. 5:433. doi: 10.3389/fgene.2014.00433*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Willis, Franzone, Xi and Davis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The evolution of African plant diversity

## *H. Peter Linder\**

*Institute of Systematic Botany, University of Zurich, Zurich, Switzerland*

#### *Edited by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Harald Schneider, Natural History Museum, UK; Thomas Couvreur, Institut de Recherche pour le Développement, Cameroon; David J. Harris, Royal Botanic Garden Edinburgh, UK*

#### *\*Correspondence:*

*H. Peter Linder, Institute of Systematic Botany, University of Zurich, Zollikerstrasse 107, CH-8008 Zurich, Switzerland e-mail: peter.linder@systbot.uzh.ch*

Sub-Saharan Africa includes some 45,000 plant species. The spatial patterns of this diversity have been well explored. We can group the species into a set of biogeographical regions (largely co-incident with regions defined for terrestrial vertebrate groups). Furthermore, we know that the diversity is unevenly distributed, with southern Africa (especially the south-western tip) disproportionally species rich, while the West African interior is disproportionally species poor. However, the origins of this diversity have only been explored for two anomalous African Floras (the Tropic-alpine Flora and the Cape Flora), whereas the origins of the diversity of the other floras are still unknown. Here I argue that six floras, with distinct geographical centers, different extra-African affinities, ages of radiation and radiation rates, can be delimited: the Austro-temperate, Tropic-alpine, Lowland forest, Tropic-montane, Savanna and Arid Floras. The oldest flora may be the Lowland forest Flora, and the most recent is the Tropic-alpine, which probably evolved during the Plio-Pleistocene on the summits of the East African volcanoes. My results suggest that the most rapidly radiating flora is the Austro-temperate Flora, while the other floras are all diversifying at more or less the same rate; this is consistent with the current massive species richness in this flora (about half of the African species richness). The Austro-temperate Flora appears to be related to the floras of the other southern continents, the Tropic-alpine Flora to that of the Northern Hemisphere, and the four tropical floras to the tropical regions of the other continents, consistent with the theory of phylogenetic niche conservatism. Current African diversity may be the result of the sequential adding of new floras to the continent. The species poverty, especially of the Lowland forest Flora, may be the result of the spread of C4 grasslands and associated regular fires.

**Keywords: Africa, biogeography, Cenozoic, diversification rates, extinction, fire, floras, savanna**

## **INTRODUCTION**

A central theme in evolutionary biology is the evolution of diversity. This can be addressed at many scales. At the most detailed level it deals with the evolution of genetic diversity within populations, and at the broadest level it addresses the evolution of differences in species richness and composition among continents. The time-scales also vary from hours in the case of bacterial diversification to geological time for continental differentiation. Africa, symmetrically astride the equator, is rich in biodiversity paradoxes. The Sahara, the world's largest desert, is arguably the largest extremely species-poor area on the planet. At the other end of the continent is the Cape flora, arguably the most species-rich temperate flora globally. Similarly, the extensive species-poor Sahelian semi-desert of West Africa is counterpoised to the world's richest semi-desert flora, that of the succulent karoo. The very extensive species-poor savannas are host to one of the richest mammal faunas. The ice-desert of the world's highest free-standing mountain (Mt Kilimanjaro) contrasts with the salt-desert of the Afar depression, the second deepest dry place below ocean level (after the Dead Sea). We do not yet have a narrative explaining the origins of this African diversity.

Africa includes approximately 45,000 plant species in 29 million km<sup>2</sup> (Klopper et al., 2007), compared to c. 90,000 in 17.84 million km<sup>2</sup> in South America and c. 42,000 species in 3 million km<sup>2</sup> in Malesia (Davis, 1995). Furthermore, this diversity is distributed very unevenly across the continent (**Figure 1**). Whereas the central Sahara is probably one of the largest, maximally species-poor non-glaciated regions of the planet, the south-western tip of the continent at Cape Town is one of the most species rich (Manning and Goldblatt, 2013; Snijman, 2013), and houses some 25% of all African species. There is a remarkable gradient in diversity from the Sahara southwards to the West African coast, peaking in the Cameroon-Gabon area. The escarpments of central and southern Africa are also much more species rich than the central plateau, and the eastern escarpment is richer than the western escarpment. These patterns are based on analysis of 5881 species analyzed at a 1° level (Linder, 2001), but the broad pattern matches the diversity estimates based on whole floras (Küper et al., 2004; Mutke and Barthlott, 2005).
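The continental comparison above can be made concrete as species per million km<sup>2</sup>. The following is a minimal arithmetic sketch using only the figures quoted in the text; the variable names are illustrative, and because species richness does not scale linearly with area, the densities are only a rough guide to relative richness.

```python
# Species counts and areas (million km^2) as quoted in the text.
regions = {
    "Africa": (45_000, 29.0),
    "South America": (90_000, 17.84),
    "Malesia": (42_000, 3.0),
}

# Species per million km^2 for each region.
density = {name: species / area for name, (species, area) in regions.items()}

for name, value in density.items():
    print(f"{name}: {value:.0f} species per million km^2")
```

On these figures Africa has roughly a third of the species density of South America and about a ninth of that of Malesia.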

Endemism richness (Kier et al., 2009), defined as the proportion of range restricted species, shows somewhat different patterns: the northern half of the continent has isolated patches of endemism richness, usually associated with mountains (**Figure 2**). The areas between these patches are very low in endemism richness. Africa south of 10° has a much higher background level of endemism richness (Linder, 2001), and the Eastern Arc of East Africa is generally high in endemism richness. This reaches an extreme in the Cape flora, with some 70% endemism to the winter rainfall region (Snijman, 2013). Areas of high endemism richness are usually also areas with a high turnover in species composition, as many species have narrow distribution ranges. The center of the continent is species poor, with a low level of endemism richness, and so with a low level of species composition turnover.

The spatial changes in species composition have been used to regionalize the African flora. The "phytochorological" classification of White (1983) was based on the comparison of lists of species from across Africa, but without a numerical analysis. White's highly influential system recognized regional centers of endemism, separated by transitional zones, but did not arrange these hierarchically. There have been several attempts to regionalize the flora objectively (Denys, 1980; Linder et al., 2005, 2012b). The last compared the plant patterns with those of several groups of vertebrates (mammals, birds, snakes and frogs) and found a surprisingly high degree of congruence. Based on this a new regionalization of the African biota was proposed (**Figure 3**).

Three disjunct distribution tracks have been described for Africa. The first described is the "arid track," which links the semi-desert of the Kalahari with NE Africa, and which is manifested both by shared, disjunct species and by sister species (De Winter, 1966, 1971; Verdcourt, 1969; Thiv et al., 2011; Bellstedt et al., 2012). This can be extended to the African West Coast at Mauritania and to the semi-arid Sahelian regions, encompassing the arid margins of the continent, the so-called "Rand Flora" (Jürgens, 1997; Sanmartin et al., 2010). The second track links the mesic evergreen forests and grasslands of the tropic-montane regions across Africa. This "Afromontane" track was regarded by White (1978) as a floristic, chorological region, but subsequent numerical analyses failed to retrieve it (Linder, 1998; Linder et al., 2005). The track is linked by several widespread species, and also by numerous genera largely restricted to this region, and has received much attention to date (Wickens, 1976; Linder, 1990; Griswold, 1991; Grimshaw, 2001; Galley et al., 2007). The third disjunction is between the rainforests of West and Central Africa and of the East African coastline, often with closely related species (Faden, 1974; Couvreur et al., 2008). These forests are thought to be vicariant, with the disjunction pre-dating the African Miocene doming that resulted in the Rift Valleys and the arid East African interior.

**FIGURE 2 | Endemism richness in Africa, based on the distributions of 5881 species.** This is the sum of species per grid, but each occurrence is inversely weighted by its distribution range. Widespread species contribute little to the endemism richness of a gridcell, whereas species with small ranges contribute much. Yellow indicates grids with few, and mostly widespread species, and red and purple indicate grids with numerous, narrowly distributed species. This picks out the centers of endemism and richness more effectively than simple richness.
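The endemism richness measure described in the Figure 2 caption (each occurrence weighted inversely by the species' range) can be sketched as follows. This is a minimal illustration, assuming range size is counted as the number of occupied grid cells; the function name and the toy occurrence data are hypothetical and not taken from the study.

```python
from collections import defaultdict


def endemism_richness(occurrences):
    """Sum, per grid cell, of 1 / (number of cells occupied) over the species present.

    occurrences: dict mapping species name -> set of grid-cell identifiers.
    """
    score = defaultdict(float)
    for cells in occurrences.values():
        if not cells:
            continue  # a species without records contributes nothing
        weight = 1.0 / len(cells)  # narrow ranges give large weights
        for cell in cells:
            score[cell] += weight
    return dict(score)


# Hypothetical toy data: one widespread and two range-restricted species.
occ = {
    "widespread": {"c1", "c2", "c3", "c4"},
    "narrow_a": {"c1"},
    "narrow_b": {"c1", "c2"},
}
scores = endemism_richness(occ)
```

With this weighting each species contributes a total of 1 across all cells, so the cell scores sum to the number of species, and a cell holding the two narrow-ranged species scores far higher than one holding only the widespread species.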

I ask how the current diversity (patterns of richness, regionalization and tracks) of the sub-Saharan African flora (south of about 22° N) evolved, and address this by analysing the evolutionary history of a selection of African clades. In order to be able to synthesize the individual histories of numerous clades, I group the clades into floras. I define a flora as a group of clades which (a) are largely found in the same area, (b) have largely the same extra-African geographical affinities, (c) share a diversification history, and (d) have a common maximum age. This implies that in any given area, several floras can be present, and that any given flora can be geographically disjunct or even fragmented. Floras can also be linked to a limited number of vegetation types. I use these histories to determine when each flora, and so its associated vegetation types, evolved in Africa. I explore the diversification rates in each flora. I ask where, outside Africa, the most closely related clades are found. Finally, I synthesize from these a scenario postulating the evolutionary history of the African flora.

## **MATERIALS AND METHODS**

#### **GEOGRAPHICAL PATTERNS**

One of the attributes of a flora is that it occupies a discrete geographical area or areas; consequently, the members of a flora can be identified by a shared geographical range. These taxa could be species. However, if a flora also contains evolutionary history, then we should cluster groups of related species or clades (e.g., "Cape clades," Linder, 2003). This could be especially important in floras which are disjunct, and/or which have at least some allopatric species replacement [e.g., in the Tropic-montane Flora (Linder et al., 2012b), the Lowland forest Flora (Faden, 1974), or the Cape flora (Weimarck, 1941)]. This could be incorporated by weighting shared presence by phylogenetic relatedness, but a much simpler albeit cruder method is to use taxonomy as an indication of relatedness, for example by using genera or families. Here I clustered the genera on their shared geographical regions, based on the assumption that each genus is restricted to one flora. This assumption is, however, often violated, especially in large genera, which are often found in several quite distinct floras. This can lead to some strange assignments of genera to floras.

I used the species distribution dataset which was compiled at the Nees Institute, University of Bonn, in the context of the BIOTA Africa project together with several external partners. For more details about the dataset and a full acknowledgement of all contributors please refer to Linder et al. (2005) and Küper et al. (2006). This data set includes the distribution data, scored as presence to a 1° grid, for 5881 species in 1125 genera. These taxa were selected as they had been recently revised, and the distribution data had consequently been verified. These data were simplified to genera, with the number of species per genus in each grid cell summed. I used the checklist compiled and maintained by the African Plants Database (http://www.ville-ge.ch/musinfo/bd/cjb/africa/recherche.php, accessed June 2013) (Klopper et al., 2006) to obtain an estimate of the number of species in each genus, and used this list to filter out all genera with less than 50% of the species present in the Bonn list. This left 331 genera and 3660 species. From this dataset all genera found in fewer than four grid cells, and all grid cells with fewer than five genera, were filtered out, as small numbers in cells can lead to aberrant results. The final database included 309 genera (of the 3802 sub-Saharan genera; Klopper et al., 2007) and 1378 grid cells.
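The filtering steps described above can be sketched roughly as follows. This is an illustration only: the function and argument names are hypothetical, and it assumes the filters were applied once, in the order given in the text (the text does not say whether the genus and grid-cell filters were iterated until stable).

```python
from collections import Counter


def filter_genera_and_cells(genus_cells, sampled_species, checklist_species,
                            min_coverage=0.5, min_cells=4, min_genera=5):
    """Apply the three filters in a single pass.

    genus_cells: dict genus -> set of occupied grid cells
    sampled_species: dict genus -> number of species in the distribution dataset
    checklist_species: dict genus -> number of species in the checklist
    """
    # 1. Keep genera with at least min_coverage of their species sampled.
    kept = {
        g: set(cells) for g, cells in genus_cells.items()
        if sampled_species[g] / checklist_species[g] >= min_coverage
    }
    # 2. Drop genera recorded in fewer than min_cells grid cells.
    kept = {g: cells for g, cells in kept.items() if len(cells) >= min_cells}
    # 3. Drop grid cells containing fewer than min_genera of the kept genera.
    genera_per_cell = Counter(cell for cells in kept.values() for cell in cells)
    good_cells = {c for c, n in genera_per_cell.items() if n >= min_genera}
    return {g: cells & good_cells for g, cells in kept.items()}, good_cells


# Hypothetical toy data, with lowered thresholds so the effect is visible.
toy_cells = {"G1": {1, 2, 3}, "G2": {1, 2}, "G3": {1}, "G4": {1, 2, 3}}
toy_sampled = {"G1": 5, "G2": 3, "G3": 4, "G4": 1}
toy_checklist = {"G1": 8, "G2": 6, "G3": 4, "G4": 10}
filtered, cells = filter_genera_and_cells(
    toy_cells, toy_sampled, toy_checklist, min_cells=2, min_genera=2)
```

In the toy run, G4 is dropped for poor species coverage, G3 for occupying a single cell, and cell 3 for holding only one remaining genus. Note that a single pass can leave a genus with fewer cells than `min_cells` after step 3; an iterated version would repeat steps 2 and 3 until nothing changes.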

Genera were clustered on the basis of shared distribution ranges. The distances between the genera were determined with βsim, which does not take shared absences into account. Because it expresses the shared species as a proportion of the species in the less species-rich of the two entities being compared, it is also not distorted by a large difference in species richness among the entities (Kreft and Jetz, 2010). The resultant distance matrix was clustered using UPGMA. All analyses were done in R (R Development Core Team, 2012), using the packages "vegan" (Oksanen et al., 2012), "MASS" (Venables and Ripley, 2002) and "cluster" (Maechler et al., 2011). Generic presence was treated as presence/absence, because no suitable approach is available that combines the attributes listed above with a weighting by the proportion of the genus present in the region. The distance matrix obtained with the βsim algorithm was also ordinated using principal coordinates analysis (PCoA).
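To make the clustering step concrete, the sketch below computes the βsim dissimilarity, min(b, c) / (a + min(b, c)), where a is the number of shared grid cells and b and c are the cells unique to each genus, and then applies a naive UPGMA (average linkage). The analyses in the study were run in R; this plain-Python version, its function names and its toy ranges are assumptions made purely for illustration.

```python
from itertools import combinations


def beta_sim(cells_a, cells_b):
    """Simpson-based dissimilarity; shared absences play no role."""
    a = len(cells_a & cells_b)   # cells shared by both genera
    b = len(cells_a - cells_b)   # cells unique to the first genus
    c = len(cells_b - cells_a)   # cells unique to the second genus
    denom = a + min(b, c)
    return min(b, c) / denom if denom else 0.0


def upgma(ranges):
    """Naive UPGMA on beta_sim distances; returns merges as (left, right, height)."""
    size = {name: 1 for name in ranges}
    dist = {frozenset(p): beta_sim(ranges[p[0]], ranges[p[1]])
            for p in combinations(ranges, 2)}
    merges = []
    while len(size) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        x, y = sorted(pair)
        height = dist[pair]
        new = x + "+" + y
        # Distance to the merged cluster is the size-weighted mean.
        updated = {}
        for z in size:
            if z not in pair:
                dxz = dist[frozenset((x, z))]
                dyz = dist[frozenset((y, z))]
                updated[frozenset((new, z))] = (
                    (size[x] * dxz + size[y] * dyz) / (size[x] + size[y]))
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        dist.update(updated)
        size[new] = size.pop(x) + size.pop(y)
        merges.append((x, y, height))
    return merges


# Hypothetical ranges: B is nested within A, C shares no cells with either.
toy_ranges = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {10, 11}}
merges = upgma(toy_ranges)
```

Because the range of B is nested within that of A, their βsim distance is zero despite the difference in range size, which is the insensitivity to richness differences noted above; C shares no cells with either and joins last at the maximum distance.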

#### **AGE AND EXTRA-AFRICAN AFFINITIES OF THE FLORA**

In order to establish the age of each flora, where its closest sister taxa are found, and what its diversification rate was, I searched the literature for all phylogenies for which the following information was available: the crown age of the clade; the number of species in the flora in which it occurs; and the geographical area of the sister clade. In total 57 clades were used (Supplementary materials). Each clade, or part of a clade, was assigned to one or more African floras. Each crown age was critically compared to available information to check that the indicated ages are consistent with the rest of the literature.

The possible floras had to be postulated prior to the analysis. I used the four floras detected in the geographical analysis, as well as two additional possible floras (the Arid Flora, and the Tropic-alpine Flora) which the geographical analysis could not detect.

#### **SPECIES ACCUMULATION PATTERNS**

Ideally, diversification rates are inferred from species-level dated phylogenies. However, very few such phylogenies are available for the African flora; consequently I used simpler and much cruder methods to determine whether the diversification patterns in the six floras are the same. I built a general linear model in SPSS v. 19, with species richness of each clade as response variable, and using the age of the clade, flora membership, source area, and growth form (herbaceous = 1, woody = 2, succulent = 3, softly woody = 4, and geophytes = 5) as predictor variables. To obtain homogeneous variances the response variable was log<sub>10</sub>-transformed. I used the same 57 clades that were used in the age and source analysis.
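The structure of such a model can be sketched as an ordinary least-squares fit of log<sub>10</sub> species richness on crown age plus a dummy-coded factor. The study used SPSS; the code below is only a plain-Python illustration with hypothetical clade data, it treats flora as a categorical factor (an assumption), and it omits source area and growth form, which would enter the design matrix in the same way.

```python
import math


def dummy_code(values):
    """One-hot encode a factor, using the first sorted level as the reference."""
    levels = sorted(set(values))[1:]
    return [[1.0 if v == lev else 0.0 for lev in levels] for v in values], levels


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical clades: (crown age in Ma, flora, number of species).
clades = [
    (10.0, "Savanna", 10), (20.0, "Savanna", 100),
    (10.0, "Austro-temperate", 100), (20.0, "Austro-temperate", 1000),
]
dummies, levels = dummy_code([flora for _, flora, _ in clades])
X = [[1.0, age] + d for (age, _, _), d in zip(clades, dummies)]  # intercept, age, flora
y = [math.log10(n) for _, _, n in clades]                        # log10 response
intercept, age_slope, savanna_effect = fit_ols(X, y)
```

In this constructed example the flora coefficient is the difference in log<sub>10</sub> richness between floras at equal crown age, which is the sense in which a flora effect in the model points to a difference in diversification.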

## **RESULTS**

#### **GEOGRAPHICAL PATTERNS**

The geographical analyses suggest the recognition of four African floras: Savanna, Tropic-montane, Lowland forest and Austro-temperate. However, these are not clearly separated. The heathland genera of the Austro-temperate Flora are most distinct from the lowland forest genera, with tropic-montane and savanna taxa forming intermediate clusters (**Figures 4**, **5**). Thirty-seven genera were not assigned, as they linked into the phenogram below the cut-off for the four groups (unshaded genera in **Figure 4**), and in the PCoA they are centered, more or less co-incident with the tropic-montane genera.

#### **EXTRA-AFRICAN AFFINITIES OF THE FLORA**

For all floras except the Tropic-alpine, the most common regional source is Africa (**Table 1**), indicating substantial within-African diversification. This indicates geographical rather than ecological control over the source areas of clades in a flora. However, the extra-African affinities show a strong ecological influence. Lowland forest and savanna clades have their strongest affinities to South America and Asia, which also host tropical forests and savannas. The Austro-temperate Flora is strongly recruited from Australia and only to a minor extent from Eurasia. The tropic-alpine clades show a strong connection to Eurasia and to a lesser extent to South America, but none to Australia. This suggests that the Austro-temperate Flora can be regarded as a "southern temperate flora" with a close relationship to the flora of Australia, and the Tropic-alpine Flora as a "northern temperate flora" with a close connection to the floras of Europe and the Himalaya. Lowland forest, Arid and Savanna Floras show a linkage to the tropical floras of South America and Asia.

#### **AGES OF FLORAS**

The oldest African clades are found in the Lowland forest Flora (Annonaceae, Moraceae and Sapotaceae); these date to the Cretaceous (**Figure 6**). The Austro-temperate and Arid Floras date from the Paleogene, and especially in the Austro-temperate Flora many clades originated in the Eocene. The oldest savanna and tropic-montane clades date from the earliest Miocene, while the oldest tropic-alpine clades are from the Late Miocene. In three cases the first clades to be assigned to a flora are substantially older than the next oldest (Zygophylloideae in the Arid Flora, Restionaceae in the Austro-temperate Flora, and Moraceae in the Lowland forest Flora). The temporal co-occurrence of many clades after that does give a sense that there is a maximum age to a flora, and that there is continuous recruitment into a flora.

#### **DIVERSIFICATION RATE**

The optimal model which predicts the number of species per clade includes the median crown age, the growth form, the source area and the flora to which it belongs, and explains 47% of the variation (**Table 2**). Evaluation of the parameter estimates shows that whereas all growth forms are significantly different (*p* < 0.05), only the Austro-temperate Flora is significantly different from the other four floras. Disentangling the floras based on the mean crown ages of the clades (**Figure 7**) indicates that overall the Austro-temperate Flora has a higher diversification rate than the other African floras.

ordination in **Figure 5**. The genera are listed assigned to their groups in the Supplementary Materials.

## **DISCUSSION**

#### **CAVEATS**

This first attempt to infer the evolutionary history of the African flora is based on a number of simplifying assumptions which might affect the results. The search for geographical patterns based on genera assumes that each genus is restricted to a single flora. This might apply to most genera, but there are a number of cases of ecologically (and so by implication also floristically) widespread genera. For example, both *Acridocarpus* (Malpighiaceae) and *Coccinia* (Cucurbitaceae) are found in forests, savanna and seasonally arid vegetation (Davis et al., 2002; Holstein and Renner, 2011). Similar shifts have also been documented in *Euphorbia*, albeit without the forest component (Bruyns et al., 2011; Peirson et al., 2013). In *Euphorbia* many of the shifts occurred in parallel between the subgenera, often also associated with a parallel evolution of succulence. *Dorstenia* (Moraceae) is an example of a typical forest understory herb which has also spread into arid habitats, with *D. gigas* a succulent-stemmed shrub in the seasonally arid vegetation of Socotra (Misiewicz and Zerega, 2012). Such genera could obscure the differences among the floras.

**Table 1 | Geographical areas (simplified to continents) of the sister clades of African clades typical of Lowland forest, Arid, Austro-temperate, Savanna, Tropic-montane, and Tropic-alpine Floras.**

*The values in the cells refer to the number of clades (Supplementary Materials). Highest extra-African ranking areas are shaded red, next come green shaded areas, and least common sister-areas are shaded yellow.*

**Table 2 | Results of a general linear model with the log species richness of each clade as response variable, and the mean crown age, the flora, growth form and source region as predictor variables.**

*<sup>a</sup>R<sup>2</sup> = 0.470 (Adjusted R<sup>2</sup> = 0.310).*

The small number of tropical African clades available for analysis means that the results are perforce tentative, and this applies to both the age of the floras as well as to the assessment of their sister areas. This applies particularly to the Savanna and the Arid Floras. Similarly, estimating the diversification rate from only the crown age and the number of extant species is a very crude approach, which assumes that the diversification rates have been more or less constant. Mixed rates cannot be detected with these methods, and this can lead to the underestimating of diversification rates in nested radiations. Nonetheless, the results obtained are consistent with other information, suggesting that the overall patterns might be correct.
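One widely used way of turning a crown age and an extant species count into a rate is the pure-birth estimator r = ln(N/2)/t, which assumes a constant rate and no extinction. The text does not state which estimator, if any, underlies the comparisons here, so the sketch below is an illustration of the constant-rate assumption rather than a reproduction of the analysis; the function name and the input values are hypothetical.

```python
import math


def crown_diversification_rate(n_species, crown_age_ma):
    """Net diversification rate (per Myr) under pure birth from a crown group.

    A crown group starts with two lineages, hence ln(N / 2) rather than ln(N).
    """
    if n_species < 2 or crown_age_ma <= 0:
        raise ValueError("need at least two species and a positive crown age")
    return math.log(n_species / 2.0) / crown_age_ma


# Hypothetical clades with equal richness but different crown ages.
rate_young = crown_diversification_rate(200, 10.0)
rate_old = crown_diversification_rate(200, 40.0)
```

A single average rate of this kind cannot distinguish a steady radiation from an old clade containing a recent, rapid nested radiation, which is why nested radiations tend to have their rates underestimated.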

#### **HOW MANY AFRICAN FLORAS?**

It is evident that the floristic diversity of Africa is the product of a complex amalgam of several floras. Geographical analysis suggests four floras: an Austro-temperate Flora and three tropical floras: Lowland forest, Savanna and Tropic-montane Floras. Although some regions are strongly dominated by specific floras, other regions contain elements of several floras, leading to geographical mixtures. Similarly, in many cases the assumption of a genus being restricted to one flora is violated, leading to genera that cannot be confidently placed in any flora. In addition, there are probably two more floras, not detected by the geographical analysis. There is most likely an "Arid Flora," centered in the Horn of Africa, with a secondary center in south-western Africa (Jürgens, 1997). This may have remained undetected because it was not adequately sampled due to the deletion of species-poor grids. Filtering out cells with few species may have removed many cells from Somalia, north-east Kenya and the Kalahari in which the Arid Flora is well represented. There is also a Tropic-alpine Flora (Hedberg, 1957, 1965); this could not be detected because it is completely geographically embedded in the Tropic-montane Flora.

Investigation of extra-African distribution ranges of closely related clades indicates three groups of floras: a north temperate group (the Tropic-alpine Flora), a south temperate group (the Austro-temperate Flora), and a tropical group (Lowland forest, Savanna and Arid Floras). The Tropic-montane Flora seems to have mixed affinities. The largely north temperate elements in the Tropic-alpine Flora have long been known, and were first systematically documented by Hedberg (1965). Subsequent studies have revealed a somewhat more complex picture, but confirmed frequent immigration from north-temperate areas (Gehrke and Linder, 2009), although showing that other temperate areas could also act as source. The recent and repeated immigration of species from the north is well illustrated in *Arabis alpina* (Koch et al., 2006; Assefa et al., 2007).

Similarly, the austral elements in the Cape flora (interpreted here to be part of the Austro-temperate Flora) have long attracted attention (Levyns, 1962, 1964), and recent analysis has shown that these are a substantial part of the flora (Galley and Linder, 2006), although the flora contains elements from most continents (Goldblatt, 1978). Remarkably few lineages have been recruited from South America into the Cape flora [but see *Prionium* (African) and *Thurnia* (South America) (Thurniaceae)], while the links to the more distant Australia are strong. The floristic exchange with Australia has been continuous throughout the Cenozoic. There is some indication that Paleogene exchanges were largely from Australia to the Cape flora [e.g., Proteaceae (Sauquet et al., 2009), Restionaceae (Galley and Linder, 2006) and Schoeneae (Verboom, 2006)], whereas exchange from the Cape to Australia seems to be more common in the Neogene (Bergh and Linder, 2009), involving, *inter alia*, genera like the Geraniaceae genus *Pelargonium* (Bakker et al., 1998) and the Campanulaceae genus *Wahlenbergia* (Prebble et al., 2011). Exchanges with tropical and north temperate areas are less common. They do, however, involve several noteworthy instances, such as *Erica* (McGuire and Kron, 2005) and *Lobostemon*, sister to the Mediterranean-Macaronesia *Echium* (Boraginaceae) (Hilger and Böhle, 2000).

The linkages among the tropical floras have also long been known, despite their assignment to different floristic kingdoms (Takhtajan, 1986; Cox, 2001). In the Late Cretaceous and Paleogene, India was adjacent to East Africa and may have formed a bridge to south-east Asia (Conti et al., 2002; Rutschmann et al., 2004). Connections across the Atlantic Ocean have generally been regarded as the result of long distance dispersal by wind or ocean currents, and here, too, many disjunctions have been documented (Thorne, 1973; Renner, 2004). Several large South American clades have a small number of African species, for example *Maschalocephalus dinklagei* Gilg and K.Schum. (Rapateaceae), *Rhipsalis cassytha* Gaertn. (Cactaceae) and *Pitcairnia feliciana* (A.Chev.) Harms and Mildbr. (Bromeliaceae). These intercontinental exchanges are especially evident in the Lowland forest and Arid Floras. The complexity of these patterns is well illustrated by the remarkable disjunctions and complex biogeography of the mega-genus *Euphorbia*, with some 2000 species, many of which are succulent. *Euphorbia* probably originated in the Oligocene, but the bulk of its diversification was in the Neogene (Bruyns et al., 2011), and most subgenera and sections have species on most continents (Bruyns et al., 2011; Horn et al., 2012; Peirson et al., 2013; Riina et al., 2013), indicating much recent dispersal. Even though not included in any of our analyzed taxa, there are also tropical connections to Australia, best known of which is *Adansonia* (Malvaceae) (Baum et al., 1998).

These patterns are consistent with intercontinental dispersal being largely within biomes (Crisp et al., 2009). The habitat might also play a role, as is shown by the sister-species relationship between the tropical American *Thurnia* and the Cape *Prionium* (Thurniaceae): although the climatic regimes are quite different, both grow in permanent streams on highly oligotrophic soils. The large number of African flora elements which have their closest relatives in other floras in Africa (**Table 1**) show that much evolution in Africa is by adaptation rather than migration.

Our results suggest that diversification rates in the Austro-temperate Flora are higher than those in the tropical floras, and that clades in the Austro-temperate Flora have more species than predicted by their age, growth form and source areas when compared with the tropical clades. The extreme case is in the core Ruschieae of the Aizoaceae, which has one of the highest diversification rates globally (Klak et al., 2004; Valente et al., 2014). However, generally the Cape flora (the most species rich part of the Austro-temperate Flora) does not have an exceptionally high diversification rate, compared to other areas where there have been floristic radiations (Linder, 2008). The only published comparison of diversification rates of groups inside and outside the Cape Floristic Region (Marloth, 1908; Goldblatt, 1978) has been for *Protea*, and here no higher diversification rate could be demonstrated in the Cape Floristic Region than in the Drakensberg and regions to the north (Valente et al., 2010). However, there is some indication that diversification rates may be higher in the Cape flora than in the Mediterranean (Valente et al., 2011). The latter compares two floras, while the former compared two regions within one flora, which might explain the result.

There are probably six African floras (**Table 3**), based on evidence from the distributions (also taxonomic composition), the affinities outside Africa, the differences in the diversification rates and the age of the floras. These six floras are more heuristic devices with which to evaluate the history of the African flora, rather than six distinct entities.

#### **AUSTRO-TEMPERATE FLORA**

The Austro-temperate Flora is centered in the south-western tip of Africa (**Figure 8A**), but is widespread in southern Africa with outliers in East Africa. It includes several elements: the "Cape flora" which radiated on the Cape mountains and intermontane valleys (Marloth, 1908; Goldblatt, 1978; Linder, 2003); the Drakensberg Alpine Center in the Drakensberg (Hilliard and Burtt, 1987; Carbutt and Edwards, 2006); the Namaqualand semidesert flora (Cowling et al., 1999; Snijman, 2013); and outliers further to the north in the Zimbabwean Chimanimani (Phipps and Goodier, 1962). Typical elements are Restionaceae (e.g., *Elegia*), Proteaceae (e.g., *Protea, Mimetes*), as well as taxa typical of the Drakensberg grasslands (e.g., *Brownleea, Kniphofia*).

The Cape flora has long been treated as highly distinct (Bolus, 1886; Marloth, 1908; Goldblatt, 1978; Linder, 2003), and it has even been recognized as one of the six floristic kingdoms (Good, 1974; Takhtajan, 1986). Indeed, there are a large number of clades centered in this region (the "Cape clades"), and endemism at both species and generic level to this region is extraordinarily high (Manning and Goldblatt, 2013). More striking is how few clades contributed substantially to this enormous species richness, and about half of the species can be attributed to just over 30 Cape clades (Linder, 2003). These clades have been well studied, indicating that most of the radiations of these clades date from the Miocene (Linder, 2005; Verboom et al., 2009), although in many cases the stem ages are much older. Many of these radiating clades are centered on, or even restricted to, the highly oligotrophic sandy soils derived from the sandstone bedrock of the Cape fold mountains, suggesting that they might be edaphic specialists predating the probably late Miocene origin of the unusual winter-rainfall climate (Dupont et al., 2011). However, the inception of the winter-rainfall climate may well have been the trigger for extensive speciation (Goldblatt, 1997).

There is much evidence that the Cape flora is not geographically sharply defined. There are outliers well beyond its main range in the south-western Cape (Weimarck, 1933, 1936, 1941); many of these are on quartzitic bedrock, for example the sandstones of Pondoland (Van Wyk, 1989, 1990) and the quartzites of the Chimanimani mountains of Zimbabwe (Phipps and Goodier, 1962). Others are on well-leached soils at high altitudes in the Drakensberg (Killick, 1978; Carbutt et al., 2013). More striking, though, are the numerous genera that are as important a component in the tropic-montane grasslands of tropical Africa as in the Cape, albeit with a much lower species richness. These genera include archetypal Cape clades such as *Erica*, *Protea*, *Helichrysum*, the orchid genera *Disa* and *Satyrium* and the grass genus *Pentameris*. In these clades the distribution can be described as widespread in the cooler, more temperate environments in Africa, but with a concentration of species richness in the south-western Cape. The opposite situation, where the greatest species richness is in the tropical African mountains and only a few species represent the clade in the south-western Cape, is much rarer. An example is the genus *Kniphofia* (Ramdhani et al., 2008). The floristic delimitation to the semi-arid south-western coast is equally poor. The vegetation is totally different: the Cape flora is a shrubland of highly branched, small-leaved, evergreen, sclerophyllous shrubs, while the dominant vegetation of Namaqualand is succulent. Floristically the two regions intergrade and intermix, and this led Jürgens (1997) to suggest that they should be combined. This was critically tested by Born et al. (2007) and implemented by Snijman (2013).

Possibly several cool and/or seasonal drought adapted clades differentiated during the middle Cenozoic in southern Africa. The Lesotho uplands were at a relatively high altitude throughout the Cenozoic (Partridge and Maud, 2000), providing a suitably moist and temperate habitat. Furthermore, the current rainfall gradient from east to west, with a relatively arid western side of the subcontinent, appears to have existed during the Eocene (Partridge and Maud, 2000) and probably persisted since then, and might have provided a suitable rainfall gradient over which clades could differentiate. These southern African temperate clades may have included the families Scrophulariaceae, Crassulaceae, Aizoaceae, large clades in Asteraceae, Fabaceae, and Iridaceae, and Poaceae subfamily Danthonioideae. Specialization to the oligotrophic soils of the Cape fold mountains, as these were exposed by erosion (Tinker et al., 2008a,b; Scharf et al., 2013), probably laid the foundation for the distinctive Cape floral elements such as Restionaceae and Proteaceae. Specialization to seasonal aridity on the West Coast subsequent to the climate changes induced by the increased activity and lower temperatures of the Benguela upwelling (Dupont et al., 2011), led to the Late Miocene radiation of clades in the succulent karoo (Klak et al., 2004; Verboom et al., 2009). The most recent specialization might be to the subalpine conditions along the summits of the Drakensberg, subsequent to the Pliocene uplift of ca. 900 m (Partridge and Maud, 2000), with erosional fragmentation leading to allopatric differentiation, for example in the daisy genus *Macowania* (Bentley et al., 2014). This scenario envisages a series of local radiations of a widespread southern African flora in response to new, extreme habitats, as these become available, and sees the iconic Cape flora as a part of this Austro-temperate Flora.

#### **LOWLAND FOREST FLORA**

The lowland forest genera form a geographically highly distinctive group, centered along the Atlantic coast from the Gambia to northern Angola, with a low density in the Congo basin, and with outliers in the Mombasa—Tanga coastal region of East Africa (**Figure 8B**). This includes genera such as *Allanblackia*, *Garcinia*, *Begonia*, *Tricalysia*, and *Voacanga*.

Lineages which belong to the Lowland forest Flora are the oldest lineages in our analysis, and date back into the Cretaceous, suggesting that African lowland forests are ancient. This corroborates other, global, analyses based on palms (Couvreur et al., 2011) and Malpighiales (Wang et al., 2009), which indicate that angiosperm forests are ancient. The current indications are not only that this is the oldest African flora, but also that it has persisted over the past 80 My.

Surprisingly little is known about the spatial evolution of this Flora in Africa. It might have been widespread in the African Paleogene (Plana, 2004), but there appears to be no direct evidence, except from around Mt Cameroun, where forest was reconstructed for the Eocene from pollen data (Utescher and Mosbrugger, 2007). These forests may have reached across Africa to the Indian Ocean (Morley, 2000). This may well have been part of a pan-equatorial Paleogene hothouse forest, which became increasingly fragmented as the northern Indian Ocean was formed by the northward drift of India (Rutschmann et al., 2004). This continuous forest was subsequently fragmented in Africa by the Oligocene initiation of doming (Chorowicz, 2005), leading to the evolution of the East African Rift system with its complex system of cool uplands, dry rainshadows, and areas of high orographic rainfall (Sepulchre et al., 2006). Phylogeographical studies indicate that the eastern and western forests may have been fragmented and reconnected several times during the Miocene (Couvreur et al., 2008). During this period the East African forest patches may have been reduced in extent and replaced by grassland and woodland (Morley, 2000). The contrasting patterns of high forest diversity in the Cameroun-Gabon areas, and low diversity across the Congo basin (Parmentier et al., 2007), have led to suggestions that in the Neogene the forest might at times have been reduced to a few refugia (Hamilton, 1982). Much of the herbaceous understory diversity in these forests is also concentrated in these putative refugia (Sosef, 1996; Küper et al., 2004). Off-shore sediments indeed indicate major fluctuations in rainforest along the west African coast during the last 150 kyr, but also show that rainforests were always present (Dupont et al., 2000).

The African lowland forests are, compared to those in South America and south-east Asia, species poor (Corlett and Primack, 2011). Parmentier et al. (2007) suggested that the low local diversity in the high-rainfall African forests may be due to a small African lowland, high rainfall forest species pool, which reflects a series of Neogene bottlenecks.

#### **TROPIC-MONTANE FLORA**

The Tropic-montane Flora is closely related to the Lowland forest Flora, but has its geographical center in the East African Arc, from Kitulo Plateau in the Southern Highlands of Tanzania, to Mt Elgon on the Uganda-Kenya border (Lovett, 1988, 1989, 1993a,b,c). The flora is widespread on all higher-lying land in Africa, from the Gambia to the Yemen, and south to Port Elizabeth (**Figure 8C**). Typical genera include upper forest margin genera like *Hypericum*, forest taxa such as *Cussonia, Podocarpus* and *Impatiens*, as well as grassland taxa such as *Colpodium*. The vegetation is a mosaic of evergreen, laurophyllous forests and grassland. Nearer the equator forest dominates, with few grassland patches; at higher latitudes grassland dominates, with a few isolated forest patches. The relative importance of these two vegetation types may have fluctuated extensively during the Pleistocene climate fluctuations (Meadows and Linder, 1993). At higher altitudes a scrubland with dense *Erica*-dominated vegetation can also occur (White, 1983).

Although this Tropic-montane Flora was recognized as a distinct unit by White (as "Afromontane," Chapman and White, 1970; White, 1978, 1981), regionalization studies based on species have failed to retrieve it (Linder et al., 2005, 2012b), possibly due to the high turnover in species composition between mountain blocks. The results here indicate that the flora can be recognized based on a distinct set of tropic-montane genera.

The few available dated clades indicate a Miocene age. This is consistent with the East African doming, but the number of clades investigated is too small to be confident of these results. Some African mountains predate the Miocene, suggesting that this flora could contain older elements. At the southern end it intergrades into the Austro-temperate Flora: although the Drakensberg system has generally been assigned to the "Afromontane" (White, 1983), the analysis here suggests that it is part of the Austro-temperate Flora. Many genera (e.g., *Podocarpus*) are widespread between the two systems, and consequently they have been combined as the "Afro-temperate flora" (Linder, 1990).

#### **TROPIC-ALPINE FLORA**

The Tropic-alpine Flora ["Afro-alpine" of Hedberg (1957) and White (1983)] is the youngest African flora. It forms a typical tropic-alpine vegetation, dominated by giant rosette herbs, and adapted to an aseasonal, diurnal freeze-thaw cycle (Smith and Cleef, 1988). The earliest clades are contemporaneous with the Late Miocene age of the formation of these volcanic peaks (Wichura et al., 2010). No doubt there were earlier mountains in Africa, and particularly in Ethiopia extensive trap lava flows in the Oligocene must have built substantial mountains (Chorowicz, 2005), but there is no evidence of alpine peaks before the Rift Valley volcanoes.

Consistent with such a young flora, the immigration rate seems much higher than the local diversification rate (e.g., *Carex*, Gehrke and Linder, 2009), and the largest clades contain only a few species—possibly the largest local radiation is in *Alchemilla* (Gehrke et al., 2008).

#### **SAVANNA FLORA**

The Savanna Flora is rather poorly sampled in this study. The center of species richness is along the high ground forming the watershed between the Congo, Zambezi and Ruaha River systems, but it is widespread in the seasonally arid parts of the continent (**Figure 8D**). It includes both the Sudanian and Zambezian savannas recognized by White (1965). Included here are widespread savanna genera like *Crotalaria* and *Kirkia*. Also included here are some typical arid adapted taxa, such as *Namibia*. The most common vegetation is woodland, often with some grass in the understory and regular fire (White, 1983).

The Savanna Flora, with grasses as its key element, evolved in the Miocene; in the Eocene no grass pollen was found in Africa (Utescher and Mosbrugger, 2007). From the Early Miocene onwards there are increasing proportions of grass pollen in the fossil record (Morley and Richards, 1993; Jacobs et al., 1999; Dupont et al., 2013; Feakins et al., 2013; Hoetzel et al., 2013), with indications of a dominance of C4 grasses dating only from the Late Miocene/Pliocene (Dupont et al., 2013; Feakins et al., 2013; Hoetzel et al., 2013). Many of the woody genera associated with African savannas also contain rainforest species (e.g., *Brachystegia*, *Isoberlinia*, *Acacia*). Fossil records from West Africa indicate expansion and retraction of savannas after 6.3 Ma (Morley, 2000). However, no critical evaluation of the evolutionary history of any of the typical clades of this flora is as yet available, and such an evaluation will no doubt be immensely valuable in giving an insight into the evolutionary history of not only this flora, but also of the Lowland forest Flora.

#### **ARID FLORA**

There are probably two Arid Floras in Africa. The one is associated with the south-west African coast and the Namib desert, and is at least partially a winter-rainfall arid flora, and is included here in the Austro-temperate Flora. The other is associated with a summer rainfall regime, and is found along the eastern part of the continent. This flora is what is often referred to as succulent thicket, thicket, or seasonally dry vegetation (White, 1983; Schrire et al., 2005a,b). It is not a grassland and, like forest, is not pyrophytic. It reaches into southern Africa, where it has received some attention (Cowling et al., 2005; Ramdhani et al., 2010; Potts et al., 2013). This flora is centered in the Somalia-Masai regional center of endemism (White, 1983).

Seasonally arid habitats probably existed throughout the Cenozoic, even in the Paleogene hot-house earth, and some climatic reconstructions suggest that these were spatially as extensive as now (Huber and Goldner, 2012). This is consistent with the assembly of East African Eocene leaves which match those currently found in seasonally dry savannas (Jacobs and Herendeen, 2004). Our data are consistent with the Arid Flora dating to the Paleocene.

This flora, which is currently quite fragmented, would benefit from detailed exploration (Cowling et al., 2005; Schrire et al., 2005a). It is a relatively old flora, and radiations started in the Oligocene/Early Miocene in *Euphorbia* (Bruyns et al., 2011; Horn et al., 2012; Peirson et al., 2013) and *Commiphora* (Weeks and Simpson, 2007). There appears to be an interplay between dispersal among the arid habitats in Africa and repeated, local, radiations, illustrated well in the stem-succulent stapeliads (Bruyns et al., 2014) and the Rutaceae genus *Thamnosma* (Thiv et al., 2011). The Miocene evolution of savannas, and finally the Pliocene evolution of C4 grasslands and a massively increased fire frequency, may have transformed a large portion of the arid thicket vegetation into grassland.

#### **IMPACT OF C4 GRASSES, OR WHAT HAPPENED TO AFRICA'S BIODIVERSITY?**

The generally low species diversity of many of the African floras, compared to the other two major equator-straddling continents, South America and south-east Asia, is puzzling. There are several islands of high species richness in this generally species-poor continent: the Cape flora, by far the most species-rich mediterranean-type ecosystem; the adjacent succulent karoo, the most species-rich semi-arid flora (both part of the Austro-temperate Flora); and the lowland forests between Gabon and Mt Cameroun, which are on a par with rainforests in South America and south-east Asia. This suggests that some event in the past erased much of the African diversity, and did it in clearly delimited biomes. No doubt climatic deterioration, as suggested by many researchers (e.g., Dransfield, 1988; Morley and Richards, 1993), might have been a factor, but global climatic deterioration should also have played a role in the other continents, and should have impacted all African floras.

The three species-rich African areas share the historical and current absence of C4 grasses. Grasses start spreading in the Miocene, with several reports of increasing importance of grass pollen from East Africa (Jacobs et al., 1999; Feakins et al., 2013) and southern Africa (Dupont et al., 2013; Hoetzel et al., 2013). In both East Africa and southern Africa this early grassland appears to have been C3 dominated: the signature of C4 dominance, as inferred from the stable isotope composition of plant waxes retrieved from a deep sea drilling core off Namibia, dates only from the Late Miocene/Pliocene or even from the Pleistocene (Hoetzel et al., 2013). This leads to the suggestion that C4 grasses may have replaced C3 grasses rather than woody vegetation (Feakins et al., 2013). In southern Africa, this increase in dominance of C4 grasses coincides with a massive increase in charcoal (Hoetzel et al., 2013), suggesting that with C4 grasses there is an increase in fire frequency. The link between the dominance of C4 grasses and fires is frequently made (Bond et al., 2003, 2005; Beerling and Osborne, 2006). A similar coincidence of increased grass pollen and charcoal is also reported from the Middle Miocene of the Gulf of Guinea, but without an indication whether this was C3 or C4 grassland (Morley and Richards, 1993). Such a massive increase in fire frequency in a modern system would be associated with extensive extinction, and there is no reason to believe that it was any different in the Late Miocene. Not only does Africa have more extensive C4 savannas than all other continents, but their extent was also much larger during the glacial maxima (Scott, 2002), except in the winter-rainfall parts of southern Africa. This suggests that C4 grasslands may have driven a massive orgy of extinction during the Late Miocene and Pliocene.
The fire refuge areas (Cape fynbos, succulent karoo and the Gabon-Camerounian rainforests) are relicts of what may once have been a much more species rich flora. All these areas do contain grasses, but these are C3 grasses, and they occur at a much lower density, so that they do not drive annual fires as is the case in savanna systems. This hypothesis predicts that regions that have been C4 grassland since the end Miocene might have accumulated, and those never invaded by C4 grassland retained, rich floras. However, areas that are on the dynamic interface between floras (savanna—rainforest, or savanna—arid) might be impoverished.

Although fire requires a dry season of several months, the dry season itself does not result in fire. This is evident from the semi-arid south-west African coast, with its succulent vegetation and, to the north, its very sparse desert flora. Here an increase in desert indicator pollen in the Plio-Pleistocene record is matched by a decrease in the charcoal in the deposits (Hoetzel et al., 2013), vividly illustrating the requirement of a minimum of rain for fire. Both fynbos in the Cape flora and the seasonally dry thicket in north-east Africa have sufficient growth to support fire, and a dry season, but lack C4 grasses. The result is fire on a decadal scale in the fynbos, or not at all in the seasonally dry thicket. This illustrates the importance of C4 grasses as biotic modifiers (Linder et al., 2012a) in introducing fire at an annual scale and transforming the vegetation to a savanna.

#### **SUMMARY**

The current plant diversity in Africa can be interpreted as the result of the sequential adding of new floras. The continent entered the Cenozoic with the Lowland forest Flora. During the Paleocene the Arid and Austro-temperate Floras were added. The doming and volcanism of the Neogene resulted in the adding of the Tropic-montane and Tropic-alpine Floras. The Savanna Flora may have been the result of a biotic event: the evolution and spread of grassland. However, what is not clear is whether there were other floras in the past, which are now extinct or so reduced as to be difficult to detect. The patterns of species accumulation seem to differ among the floras, and the Austro-temperate Flora may be accumulating species more rapidly than the other floras. There could be a relationship between standing diversity and exposure to invasion by grassland, particularly by C4 grassland with its associated high fire frequency.

We know very little about the processes linked to diversification (speciation and extinction) in these floras. Africa, with its highly diverse landscape and intermingled floras, offers an excellent opportunity to disentangle the effects of environmental variables such as topographical complexity, climatic stability, and climatic extremes, from biotic variables such as fire frequency and herbivory. We are only beginning to disentangle the processes that have led to the diversity of species and biomes.

#### **ACKNOWLEDGMENTS**

The University of Zurich is thanked for funding, Jens Mutke for making the Bonn dataset available, Yanis Bouchenak-Khelladi for advice on R, Tony Verboom for dates of numerous southern African clades, and Yanis Bouchenak-Khelladi, Renske Onstein, and Tony Verboom for insightful comments on an earlier draft of the manuscript. This paper would not have happened if William Bond had not nagged me about fires and grasses.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fevo.2014.00038/abstract

## **REFERENCES**


Morley, R. J., and Richards, K. (1993). Gramineae cuticle: a key indicator of Late Cenozoic climatic change in the Niger Delta. *Rev. Palaeobot. Palynol.* 77, 119–127. doi: 10.1016/0034-6667(93)90060-8

Morley, R. J. (2000). *Origin and Evolution of Tropical Rain Forests.* Chichester: Wiley.

Mutke, J., and Barthlott, W. (2005). Patterns of vascular plant diversity at continental to global scales. *Biol. Skr.* 55, 521–531.


Weimarck, H. (1933). Die Verbreitung einiger Afrikanisch-montanen Pflanzengruppen, I–II. *Sven. Bot. Tidskr.* 27, 400–419.


Wickens, G. E. (1976). *The Flora of Jebel Marra (Sudan Republic) and its Geographical Affinities*. London: Her Majesty's Stationery Office.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 May 2014; accepted: 04 July 2014; published online: 25 July 2014.*

*Citation: Linder HP (2014) The evolution of African plant diversity. Front. Ecol. Evol. 2:38. doi: 10.3389/fevo.2014.00038*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Ecology and Evolution.*

*Copyright © 2014 Linder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Living on the edge: timing of Rand Flora disjunctions congruent with ongoing aridification in Africa

Lisa Pokorny<sup>1</sup>\*, Ricarda Riina<sup>1</sup>, Mario Mairal<sup>1</sup>, Andrea S. Meseguer<sup>2</sup>, Victoria Culshaw<sup>1</sup>, Jon Cendoya<sup>1</sup>, Miguel Serrano<sup>3</sup>, Rodrigo Carbajal<sup>3</sup>, Santiago Ortiz<sup>3</sup>, Myriam Heuertz<sup>4, 5, 6</sup> and Isabel Sanmartín<sup>1</sup>\*

<sup>1</sup> Real Jardín Botánico (RJB-CSIC), Madrid, Spain, <sup>2</sup> INRA, UMR 1062, Centre de Biologie pour la Gestion des Populations (INRA, IRD, CIRAD, Montpellier SupAgro), Montferrier-sur-Lez, France, <sup>3</sup> Department of Botany, Pharmacy School, University of Santiago de Compostela, Santiago de Compostela, Spain, <sup>4</sup> Forest Research Centre (INIA-CIFOR), Madrid, Spain, <sup>5</sup> INRA, BIOGECO, UMR 1202, Cestas, France, <sup>6</sup> University of Bordeaux, BIOGECO, UMR 1202, Talence, France

#### Edited by:

James Edward Richardson, Royal Botanic Garden Edinburgh, UK

#### Reviewed by:

Thomas L. P. Couvreur, Institut de Recherche pour le Développement, Cameroon Lars Chatrou, Wageningen University, Netherlands

#### \*Correspondence:

Lisa Pokorny and Isabel Sanmartín, Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014 Madrid, Spain pokorny@rjb.csic.es; isanmartin@rjb.csic.es

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 06 October 2014 Accepted: 05 April 2015 Published: 01 May 2015

#### Citation:

Pokorny L, Riina R, Mairal M, Meseguer AS, Culshaw V, Cendoya J, Serrano M, Carbajal R, Ortiz S, Heuertz M and Sanmartín I (2015) Living on the edge: timing of Rand Flora disjunctions congruent with ongoing aridification in Africa. Front. Genet. 6:154. doi: 10.3389/fgene.2015.00154

The Rand Flora is a well-known floristic pattern in which unrelated plant lineages show similar disjunct distributions in the continental margins of Africa and adjacent islands—Macaronesia-northwest Africa, Horn of Africa-Southern Arabia, Eastern Africa, and Southern Africa. These lineages are now separated by environmental barriers such as the arid regions of the Sahara and Kalahari Deserts or the tropical lowlands of Central Africa. Alternative explanations for the Rand Flora pattern range from vicariance and climate-driven extinction of a widespread pan-African flora to independent dispersal events and speciation in situ. To provide a temporal framework for this pattern, we used published data from nuclear and chloroplast DNA to estimate the age of disjunction of 17 lineages that span 12 families and nine orders of angiosperms. We further used these estimates to infer diversification rates for Rand Flora disjunct clades in relation to their higher-level encompassing lineages. Our results indicate that most disjunctions fall within the Miocene and Pliocene periods, coinciding with the onset of a major aridification trend, still ongoing, in Africa. Age of disjunctions seemed to be related to the climatic affinities of each Rand Flora lineage, with sub-humid taxa dated earlier (e.g., Sideroxylon) and those with more xeric affinities (e.g., Campylanthus) diverging later. We did not find support for significant decreases in diversification rates in most groups, with the exception of older subtropical lineages (e.g., Sideroxylon, Hypericum, or Canarina), but some lineages (e.g., Cicer, Campylanthus) showed a long temporal gap between stem and crown ages, suggestive of extinction. In all, the Rand Flora pattern seems to fit the definition of biogeographic pseudocongruence, with the pattern arising at different times in response to the increasing aridity of the African continent, with interspersed periods of humidity allowing range expansions.

Keywords: Africa, historical biogeography, climate change, diversification rates, long-distance dispersal, Rand Flora, vicariance

## Introduction

Large-scale biodiversity patterns have intrigued naturalists since the eighteenth century (Forster, 1778; von Humboldt and Bonpland, 1805; Wallace, 1878; Fischer, 1960; Stevens, 1989; Lomolino et al., 2010). Recognizing that spatial variation in environmental variables such as temperature or precipitation is insufficient to explain such patterns, more integrative explanations that emphasize the role of both environmental and evolutionary factors have recently been advanced (Qian and Ricklefs, 2000; Wiens and Donoghue, 2004; Jablonski et al., 2006). As Wiens and Donoghue (2004) state "environmental variables cannot by themselves increase or decrease local or regional species richness"; only evolutionary processes such as dispersal, speciation and extinction can. Therefore, reconstructing rates of dispersal, speciation, and extinction across the component lineages of a biota might help us understand how assembly took place across space and through time (Pennington et al., 2004; Ricklefs, 2007; Wiens, 2011). Moreover, understanding patterns of biotic assembly is a pressing goal in biodiversity research at a time when nearly one tenth of species on Earth are projected to disappear in the next hundred years (Maclean and Wilson, 2011).

Africa is an especially interesting continent in which to study patterns of biotic assembly. On the one hand, African tropical regions are species-poor compared with regions situated at the same equatorial latitudes in the Neotropics and Southeast Asia (Lavin et al., 2001; Couvreur, 2015), which has led to the continent being referred to as the "odd man out" (Richards, 1973). On the other, Africa offers some extraordinary examples of continent-wide disjunctions. For example, tropical rainforests in Africa appear in two main blocks, the West-Central Guineo-Congolian region and the coastal and montane regions of East Africa, now separated by a 1000 km-wide arid corridor (Couvreur et al., 2008). Another prime example is the so-called Rand Flora (RF), a biogeographic pattern in which unrelated plant lineages show comparable disjunct distributions with sister taxa occurring in now distantly located regions in the continental margins of Africa: Macaronesia-northwest Africa, Western African mountains, Horn of Africa-South Arabia (including the Island of Socotra), Eastern Africa (including Madagascar), and Southern Africa (Christ, 1892; Lebrun, 1947, 1961; Quézel, 1978; Andrus et al., 2004; Sanmartín et al., 2010; **Figure 1**). All RF lineages share sub-humid to xerophilic affinities, so that the tropical lowlands of Central Africa and the large Sahara and Arabian deserts in the north or the Namib and Kalahari deserts in the south presumably constitute effective climatic barriers to their dispersal.

Swiss botanist K. H. H. Christ (1892) first referred to "cette flore marginale de l'Afrique," that is "this marginal African flora," in a note addressing the role the so-called ancient African flora played in European floras, with emphasis on the Mediterranean biome. Later, in his "Die Geographie der Farne" (i.e., "The Geography of Ferns"; Christ, 1910), he very aptly named this geographic pattern "Randflora" (see pp. 259–275), where the Germanic word "Rand" stands for rim, edge, border, margin (see **Figure 1** inset), noting its similarities with Engler's "afrikanischmakaronesische Element" (Engler, 1879, 1910; see p. 76 in the former and pp. 983–984 and 1010 in the latter), that is, an "Afro-Macaronesian element" linking disjunct xerophilic taxa found in the continental margins of Africa and its adjacent islands (e.g., Canary Islands, Cape Verde, etc.).

FIGURE 1 | Rand Flora disjunction pattern as evidenced by angiosperm plant lineages analyzed for this study. The inset shows K.H.H. Christ's (1910) depiction of "cette flore marginale de l'Afrique" or "Randflora" (in orange color), note their similar geographic limits. Taxa: Adenocarpus (Fabaceae), Camptoloma (Scrophulariaceae), Campylanthus (Plantaginaceae), Canarina (Platycodoneae, Campanulaceae), Cicer (Fabaceae), Colchicum (Colchicaceae), Euphorbia subgen. Athymalus (sects. Anthacanthae and Balsamis; Euphorbiaceae), Euphorbia subgen. Esula (sect. Aphyllis), Euphorbia subgen. Esula (African clade of sect. Esula), Geranium subgen. Robertium (Geraniaceae), Hypericum (Hypericaceae), Kleinia (Asteraceae), Plocama (Rubiaceae), and Sideroxylon (Sideroxyleae, Sapotaceae).

Historical explanations for this pattern and, in particular, its temporal framework, its exact boundaries, and the ecology of the plants involved have varied through these past two centuries. The early view (Engler, 1879, 1910; Christ, 1892, 1910) was one of a pan-African flora found throughout the continent that became restricted to its margins as a result of major climate changes (i.e., increasing aridification) throughout the Tertiary (i.e., the Cenozoic Period, 66.0–2.58 Ma). Lebrun (1947; see pp. 134–137), and later Monod (1971, p. 377) and Quézel (1978, p. 511), interpreted Christ's ancient African flora as a complex ensemble that had experienced alternating expansions and contractions through time, having had a chance to spread across northern Africa during favorable moments in the Miocene and needing to retract at the end of the Neogene (i.e., Pliocene): a further increase in aridity at the beginning of Pleistocene glaciations would have confined relictual or vicariant taxa to Macaronesia, northwest Africa and Arabia. Axelrod and Raven (1978) explained some of these disjunctions in relation to a more ancient, widespread Paleogene flora of subtropical origin that covered the entire African continent at the beginning of the Cenozoic, and that was decimated by successive events of aridification, of which the relict floras of Macaronesia, the Cape Region, and the Afromontane forests in eastern and western Africa would be remnants. Bramwell (1985) explains this pattern in terms of pan-biogeographic "general tracks" that connect what would be the remains of an ancient flora that extended across the Mediterranean and Northern Africa in the Miocene, and whose vestiges could be found in the Macaronesian laurisilva and a few enclaves in the island of Socotra, the Ethiopian Highlands and southern Yemen.

These authors share a vicariant perspective and presume RF lineages were part of a widespread pan-African Tertiary flora that became fragmented by the appearance of climatic barriers (i.e., aridification), leaving relictual lineages with reduced distributions at "refugia" in the margins of Africa (i.e., "continental" islands). This "refugium" idea rests on the fact that many of these RF regions—Macaronesia, the South African Cape region, and the semi-arid regions of Eastern Africa and Southern Arabia (e.g., Ethiopia, Yemen, Socotra)—harbor a large number of endemic species, when compared to neighboring areas. Moreover, the "fragmentation-refugium" hypothesis implies the disappearance, possibly by extinction, of RF lineages from part of their distributional range (e.g., across the Sahara in central Northern Africa), which is consonant with the "climatic vicariance" concept (Wiens, 2004): an environmental change creates conditions within a species' geographic range that are outside the ancestral climatic tolerances; individuals are unable to persist and the species' geographic range becomes fragmented.

The alternative explanation is one of independent dispersal (immigration) events among geographically isolated regions and subsequent speciation in situ. In this framework, divergence events need not be congruent across lineages, since long-distance dispersal (LDD) events are highly stochastic in nature (Nathan, 2006). Aside from transoceanic dispersal, which has been postulated in the case of Aeonium (Kim et al., 2008), Geranium (Fiz et al., 2008), and other RF lineages (Andrus et al., 2004) based on molecular phylogenetic evidence, cross-continent LDD is also possible: published examples favoring cross-continent LDD include Senecio, with a disjunct distribution between Macaronesia-Northern Africa and South Africa (Coleman et al., 2003; Pelser et al., 2012). Moreover, dispersal does not necessarily imply long-distance migration events. In some cases, dispersal across intermediate areas that act as "stepping stones" or "land bridges" could have been possible. For example, the presence of isolated mountain ranges (offering suitable habitats) throughout the Sahara, such as the Tibesti and Hoggar massifs, could have allowed this short or medium-range dispersal in Campanula (Alarcón et al., pers. comm.). Correspondingly, some RF lineages might have used the Arabian Plate as a land bridge to reach East Africa (Campanula, Roquet et al., 2009; Hypericum, Meseguer et al., 2013), and others may have benefited from the new habitats offered by the Pliocene uplift of the Eastern Arc Mountains to migrate to or from South Africa (Meseguer et al., 2013).

Discriminating between climate-driven vicariance vs. independent dispersal events between geographically isolated regions requires framing the evolution of disjunct lineages on a temporal scale (Sanmartín, 2014). On the other hand, to unravel the origin of a biota or biome, a meta-analysis across dated phylogenies of multiple non-nested clades is needed (Pennington et al., 2010; Wiens, 2011; Couvreur, 2015). Sanmartín et al. (2010) carried out a meta-analysis of 13 lineages to infer relative rates of historical dispersal among RF regions (Macaronesia, Eastern Africa-Southern Arabia, and Southern Africa) and found the highest rate of biotic exchange between east and west Northern Africa, across the Sahara. However, they did not integrate absolute estimates of lineage divergences in their inference, since very few RF lineages (e.g., Roquet et al., 2009) had been dated at the time.

In this study, we estimate time divergences for up to 13 plant lineages (**Table 1**) displaying RF disjunct distributions (**Figure 1**), and use published divergence times for four other lineages (see Materials and Methods), in order to provide a much-needed temporal framework for this pattern. An extensive description of each of these lineages, geographic distributions and phylogenetic relationships is provided in Supplementary Materials. We also frame these disjunctions in the context of major climatic and geological events in the history of Africa (see summary below) and estimate net diversification rates in an attempt to address the role that evolutionary processes, such as climate-driven extinction, may have played in the formation of the African RF pattern.

## Materials and Methods

### Study Area: African Climate through Time

To understand biogeographic patterns in the African flora, it is necessary to briefly review the climatic and geological history that might have influenced the evolution of African plant lineages. Extensive reviews of African climatic and vegetation history can be found in Axelrod and Raven (1978), van Zinderen Bakker (1978), Maley (1996, 2000), Morley (2000), Plana (2004), Jacobs et al. (2010), and Bonnefille (2011), among others.

During the Late Mesozoic, Africa was part of the supercontinent Gondwana, located in the southern hemisphere, and enjoyed a relatively humid and temperate climate (Raven and Axelrod, 1974). After breaking up from South America ca. 95 Ma, Africa started moving northwards toward the equatorial zone (**Figure 2A**). The result was a general trend toward continental aridification in which different regions became arid or wet at different times (**Figure 2B**, Senut et al., 2009). Paleocene Africa (66–56 Ma) was mainly wet and warm, characterized by a major diversification in the West African flora (Plana, 2004). A global increase in temperatures in the Eocene (56–33.9 Ma) led to increased aridity in Central Africa, with a rainforest-savannah mosaic in the Congo region. This was followed by a global cooling event at the Eocene-Oligocene boundary (33.9 Ma), which led again to aridification and major extinction but did not change biome composition (Axelrod and Raven, 1978).

The Early Miocene (23–16 Ma) was warm and humid, with wide extension of rainforests, from the northern Sahara to parts of Southern Africa. The Mid Miocene (16–11.6 Ma) was a period of major changes in climate and topography. A combination of factors, including the gradual uplift of Eastern Africa, the successive closure of the Tethys seaway in the north, and the expansion of the East Antarctic ice sheet in the south (Trauth et al., 2009), led to a general intensification of the aridification process, though it was not homogeneous across the continent. Geological and paleontological evidence suggests that now-arid regions (e.g., northern Africa, Horn of Africa, Namib Desert) were more humid during this period than they are today, whereas other now-humid regions (e.g., Congo Basin) were much drier (**Figure 2B**). Desertification started in the southwest (Namib Desert) around 17–16 Ma ago, and proceeded eastward and northward. In Southern Africa, tropical to subtropical vegetation was replaced by wooded savannah during the lower Mid-Miocene (Senut et al., 2009). In Northern Africa, the earliest evidence of aridity in the Sahara region is from the Late Miocene (11.6–5.3 Ma), ca. 7–6 Ma (Senut et al., 2009; **Figure 2B**). In Central Africa, a semiarid desert ("Miocene Congo Desert," **Figure 2B**) occupied the region until the Mid Miocene, 13–12 Ma ago, when the Eastern African uplift and subsequent subsidence led to the establishment of the Congo River drainage and a general increase in humidity ("tropicalization"). Also in the Late Miocene, ca. 8–7 Ma, a new period of tectonic activity in Eastern Africa led to the uplift of the Eastern Arc Mountains and the uplands of West Central Africa (Cameroon volcanic line), which led to increasing aridity and the expansion of savannahs and grasslands in these regions (Sepulchre et al., 2006). Uplifting reached a maximum during the Plio-Pleistocene and led to the formation of the Ethiopian Highlands and the desertification of low-lying areas in the Horn of Africa (Senut et al., 2009). From the Late Pliocene to the Holocene, the alternation of glacial and interglacial periods seems to have led to repeated contractions and expansions of distributional ranges across both subtropical and tropical taxa (Maley, 2000; Bonnefille, 2011). Some areas like the Saharan massifs of Tibesti and Hoggar or the Ennedi Mountains could have served as refuges during arid periods for subtropical taxa (Osborne et al., 2008), whereas the uplands of Upper and Lower Guinea and the east of the Congo Basin, the Albertine Rift, or the Eastern Arc Mountains could have played the same role for tropical plant taxa (Maley, 1996; **Figure 2B**).

GenBank numbers can be found in the references listed under column "Dataset reference."

### Taxon Sampling

We retrieved sequences from GenBank from existing studies (**Table 1**) for the following 13 lineages exhibiting a distribution congruent with the RF pattern (Andrus et al., 2004; Sanmartín et al., 2010): Adenocarpus, Aeonium, Camptoloma, Campylanthus, Cicer, Colchicum, Euphorbia sects. Anthacanthae, Aphyllis, Balsamis, and Esula, Geranium, Kleinia, and Plocama (**Figure 3**).

We chose these lineages because sampling is nearly complete in most cases with very few to no missing taxa. Most of these RF taxa have been sequenced for several markers from the nuclear and chloroplast DNA regions. For each group we selected the markers with most sequences and tried representing both genomic compartments whenever possible. The sequences were aligned using the Opalescent package (Opal v2.1.0; Wheeler and Kececioglu, 2007) in Mesquite v3.01 (Maddison and Maddison, 2014) and manually adjusted in SE-AL v2.0a11 (Rambaut, 2002) using a similarity criterion, as recommended by Simmons (2004). For four other RF lineages, namely Campanula (Alarcón et al., 2013), Canarina (Mairal et al., 2015), Hypericum (Meseguer et al., 2013), and Sideroxylon (Stride et al., 2014), we used recently published time estimates by our research team (except for Sideroxylon, which nonetheless used a dating approach similar to ours). Approximately 1600 sequences from ca. 675 taxa from 12 families and 9 orders of angiosperms were included in our study (**Table 1**).

### Estimating Absolute Divergence Times

Divergence times were estimated under a Bayesian framework in BEAST v1.8 (Drummond et al., 2012). For each lineage, we constructed a dataset including the markers listed in **Table 1**, which were partitioned by genome (chloroplast vs. nuclear), whenever possible. The best-fitting substitution model for each partition was selected using the Akaike Information Criterion implemented in MrModeltest v2.2 (Nylander, 2004) and run in PAUP<sup>∗</sup> v4.0b (Swofford, 2002). The relaxed uncorrelated lognormal clock model (UCLD, Drummond et al., 2006) and a Yule speciation process as tree model were selected for all datasets based on preliminary explorations. MCMC searches were run for 5 × 10<sup>7</sup> generations and were sampled and logged every 2500th generation. We used Tracer v1.6 (Rambaut et al., 2013) to determine stationarity of the Markov chain and to verify that all parameters had large enough effective sample sizes (ESS > 200). TreeAnnotator v1.8.0 (Drummond et al., 2012) and FigTree v1.4.2 (Rambaut, 2009) were used, respectively, to generate and visualize the resulting maximum clade credibility (MCC) chronograms.
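As a rough illustration of the sampling scheme described above, the following Python sketch computes how many states such a run logs and shows one simple way an effective sample size can be estimated from a parameter trace. It is only a sketch under stated assumptions: the estimator (chain length divided by one plus twice the sum of the leading positive autocorrelations) is a common textbook form and is not claimed to be the algorithm Tracer uses, and the AR(1) traces are hypothetical stand-ins for real BEAST log columns.

```python
import random

GENERATIONS = 50_000_000  # 5 x 10^7 generations (from the text)
SAMPLE_EVERY = 2_500      # logging interval (from the text)
ESS_THRESHOLD = 200       # threshold quoted in the text


def logged_states(generations, sample_every):
    """Number of logged samples, ignoring the initial state."""
    return generations // sample_every


def effective_sample_size(trace):
    """Simple ESS: N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("constant trace has no defined autocorrelation")
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_trace(n, phi, seed):
    """Hypothetical stand-in for a parameter trace: AR(1) noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


n_logged = logged_states(GENERATIONS, SAMPLE_EVERY)
well_mixed = effective_sample_size(ar1_trace(2000, 0.0, seed=1))
sticky = effective_sample_size(ar1_trace(2000, 0.9, seed=1))
print(n_logged, round(well_mixed), round(sticky), sticky > ESS_THRESHOLD)
```

The strongly autocorrelated trace yields a much smaller ESS than the uncorrelated one despite having the same length, which is why a threshold on ESS, rather than on the raw number of logged states, is used to judge whether a chain has been run long enough.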

Calibration points for obtaining absolute divergence times were based on either the fossil record or on published secondary calibration constraints (**Table 2**). The latter were obtained from published dated phylogenies of datasets including our study groups (e.g., the family to which the genus belongs), and were assigned normal distribution priors (Ho and Phillips, 2009) in the BEAST analysis that encompassed the mean and the 95% highest posterior density (HPD) confidence interval (CI) from these studies [except in the case of time constraints from Bell et al. (2010), for which a lognormal distribution was used, since posterior estimates for a normal prior were not available]. For fossil calibration points we used a lognormal prior, since this distribution better represents the stratigraphic uncertainty associated with the fossil record (Ho and Phillips, 2009). The offset of the lognormal distribution was set to the upper bound of the stratigraphic period where the fossil was found, and the standard deviation (SD) and mean were set so that the 95% CI encompassed the lower and upper bound of the period (e.g., for Late Eocene Hypericum antiquum a lognormal distribution offset at 33.9 Myr, with mean = 1.0 and SD = 0.7, was used to cover the length of the period where the fossil was found, that is 33.9–37.2 Ma). A summary of time constraints used for each dataset and their provenance can be found in **Table 2**.
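The Hypericum antiquum example can be checked numerically. The sketch below computes the central 95% interval of an offset lognormal prior in Python. It assumes, because the text does not say so explicitly, that "mean = 1.0" is the mean in real space (in Myr above the offset) and that "SD = 0.7" is the standard deviation in log space; the function name and its arguments are illustrative and are not part of BEAST.

```python
import math
from statistics import NormalDist


def offset_lognormal_interval(offset, mean, sd_log, mass=0.95,
                              mean_in_real_space=True):
    """Central interval (in Ma) of an offset lognormal calibration prior.

    offset  -- minimum age of the node (Ma)
    mean    -- mean of the lognormal, in real or log space (assumption)
    sd_log  -- standard deviation in log space
    """
    if mean_in_real_space:
        # Convert a real-space mean to the log-space location parameter.
        mu = math.log(mean) - 0.5 * sd_log ** 2
    else:
        mu = mean
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    lower = offset + math.exp(mu - z * sd_log)
    upper = offset + math.exp(mu + z * sd_log)
    return lower, upper


# Values quoted in the text for Late Eocene Hypericum antiquum.
lower, upper = offset_lognormal_interval(33.9, 1.0, 0.7)
print(f"95% prior interval: {lower:.2f}-{upper:.2f} Ma")
```

Under this reading the interval stays inside the 33.9–37.2 Ma window quoted for the fossil, whereas treating 1.0 as a log-space mean (`mean_in_real_space=False`) would push the upper bound well beyond the Late Eocene, which is why the real-space interpretation is assumed here.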

### Diversification Analyses

We used divergence times estimated above to calculate absolute diversification rates in the aforementioned lineages. There have been numerous developments in macroevolutionary birth-death models that allow a more accurate estimation of extinction and speciation rates from dated molecular phylogenies, including episodic time-variable models and trait-dependent diversification models (Stadler, 2013; Morlon, 2014; Rabosky et al., 2014).


TABLE 2 | Time constraints and prior probability distributions imposed on constrained nodes to estimate divergence times in RF lineages.

At least one node (preferably toward the root) was constrained in each phylogeny (Figures S1–S16 show resulting chronograms explicitly stating any constrained nodes).

\*Plantaginacearumpollis miocenicus (Late Miocene, 10.3 Ma; Nagy, 1963; Doláková et al., 2011).

§Geranium cf. lucidum (Late Miocene, 7.246 Ma ± 0.005; Van Campo, 1989).

†Raiguenrayun cura (Middle Eocene, 47.5 Ma; Barreda et al., 2012).

However, these methods usually require both very large phylogenies (e.g., ≥100 tips) and a fairly complete sampling. We here chose a simpler approach, the "method-of-moments" estimator (Magallón and Sanderson, 2001), implemented in the R package GEIGER (Harmon et al., 2008). This method uses clade size (extant species number) and clade age (either crown or stem) to estimate net diversification rates (r = speciation minus extinction), under different values of background extinction or turnover rate (ε = extinction/speciation = 0.0, 0.5, and 0.9). Net diversification rates (bd.ms function in GEIGER) were here estimated for all RF disjunctions and for a series of successively encompassing clades (e.g., section, genus, tribe, subfamily, and so on) to detect possible rate shifts. Crown diversification rates could not be estimated for clades containing only two taxa because Magallón and Sanderson's formula (r = [log(n)–log 2]/t in its simplest version, that is, with no extinction; for ε > 0 see formula number 7 in Magallón and Sanderson, 2001) results in zero in this case. In an attempt to counter this problem, clades containing two taxa were assigned a diversity value of 2.01, which permitted the estimation of net diversification rates (r).
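The estimator described above is simple enough to write out directly. The following Python sketch is offered as an illustration rather than as the code used in the study, which relied on the `bd.ms` function in GEIGER; the function names `crown_rate` and `stem_rate` are hypothetical. The crown formula for ε > 0 follows equation 7 of Magallón and Sanderson (2001) as we understand it, and the stem formula is the corresponding stem-age estimator from the same paper.

```python
import math


def crown_rate(n, t, eps=0.0):
    """Net diversification rate r from crown age t and extant diversity n."""
    if eps == 0.0:
        return (math.log(n) - math.log(2.0)) / t
    root = math.sqrt(n * (n * eps ** 2 - 8.0 * eps + 2.0 * n * eps + n))
    inner = 0.5 * n * (1.0 - eps ** 2) + 2.0 * eps + 0.5 * (1.0 - eps) * root
    return (math.log(inner) - math.log(2.0)) / t


def stem_rate(n, t, eps=0.0):
    """Net diversification rate r from stem age t and extant diversity n."""
    return math.log(n * (1.0 - eps) + eps) / t


# Sideroxylon disjunction: two species, crown age 17.4 Ma, stem age 47.3 Ma
# (ages as reported in the Results).
for eps in (0.0, 0.5, 0.9):
    exact_two = crown_rate(2, 17.4, eps)      # zero whatever the value of eps
    nudged = crown_rate(2.01, 17.4, eps)      # the 2.01 workaround from the text
    stem = stem_rate(2, 47.3, eps)
    print(f"eps={eps}: crown(n=2)={exact_two:.6f}, "
          f"crown(n=2.01)={nudged:.6f}, stem={stem:.6f}")
```

For a two-species clade the crown estimator returns zero at every ε, because the expression inside the logarithm reduces to exactly 2; this is the behaviour that motivates the 2.01 substitution.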

Additionally, the probability of obtaining a clade with the same size and age as the RF disjunction, given the background diversification rate of the encompassing clade/s and at increasing extinction fractions (ε = 0, 0.5, and 0.9), was estimated with the crown.p function in GEIGER. We also estimated the 95% confidence interval of expected diversity through time (crown.limits function, GEIGER, ε = 0, 0.5, and 0.9) for a clade that diversifies with a rate equal to that of the family containing a RF disjunction with the highest diversification rate (i.e., Asteraceae); we then mapped RF lineages according to their crown or stem age and standing species diversity to assess which RF disjunct clades are significantly less diverse than expected given their stem and crown age in relation to the highest rate calculated for a RF family (Magallón and Sanderson, 2001; Warren and Hawkins, 2006).
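To make the logic of these expected-diversity limits concrete, the Python sketch below works through the simpler stem-age case, in which the number of extant species of a surviving birth-death lineage follows a geometric distribution. This is an illustrative analogue and not a re-implementation of GEIGER's `crown.p` or `crown.limits`, whose crown-age formulas are more involved; the function names and the example rate are hypothetical and are not taken from the study.

```python
import math


def beta(r, t, eps):
    """Geometric parameter of clade size at time t (requires r > 0)."""
    ert = math.exp(r * t)
    return (ert - 1.0) / (ert - eps)


def prob_at_least(n, r, t, eps):
    """P(N >= n) for a surviving stem lineage of age t."""
    return beta(r, t, eps) ** (n - 1)


def expected_size(r, t, eps):
    """Mean clade size, conditional on survival to the present."""
    return 1.0 / (1.0 - beta(r, t, eps))


def stem_limits(r, t, eps, mass=0.95):
    """Continuous approximation to the central interval of clade size."""
    tail = (1.0 - mass) / 2.0
    log_b = math.log(beta(r, t, eps))
    lower = 1.0 + math.log(1.0 - tail) / log_b
    upper = 1.0 + math.log(tail) / log_b
    return lower, upper


# Hypothetical background rate: a clade of 50 species with a 20 Myr stem age
# at eps = 0.5, converted to r with the stem-age method-of-moments estimator.
eps = 0.5
r = math.log(50 * (1.0 - eps) + eps) / 20.0
low, high = stem_limits(r, 20.0, eps)
print(f"r = {r:.4f}; 95% limits at 20 Myr: {low:.1f}-{high:.1f} species")
print(f"P(N >= 2 at 20 Myr) = {prob_at_least(2, r, 20.0, eps):.3f}")
```

A clade whose standing diversity falls below the lower limit for its age would be flagged as significantly species-poor relative to the chosen background rate, which mirrors the comparison made for RF lineages against the Asteraceae rate.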

## Results

### Divergence Times

Up to 21 disjunctions were identified and divergence times were estimated for 17 lineages exhibiting a geographic distribution consistent with the RF pattern (**Figures 3**, **4** and **Figures S1–S17**). These disjunctions represent two possible geographic splits: I) Eastern Africa (including the Eastern Arc Mountains, the Horn of Africa, and Southern Arabia) vs. Southern Africa (including southern Angola and Namibia and the Cape Flora region up to the Drakensberg Mountains), hereafter E-S, and II) Western Africa (including Macaronesia and NW Africa south to the Cameroon volcanic line) vs. Eastern Africa (with or without S Africa), hereafter W-E(&S).

From youngest to oldest, E-S disjunctions (**Figure 4**) occur in Plocama (ca. 4 Ma between S African Pl. crocyllis on one side and, among other E African-S Arabian species, Pl. yemenensis and Pl. tinctoria on the other; **Figure 3** and **Figure S15**), Camptoloma (ca. 4 Ma between E African Cm. lyperiiflorum and S African Cm. rotundifolium; **Figure 3** and **Figure S4**), Colchicum (ca. 5 Ma between E African Co. schimperianum and S African Co. albanense and Co. longipes, **Figure 3** and **Figure S8**), the African clade of Euphorbia sect. Esula (ca. 7 Ma between S African and E African taxa; **Figure 3** and **Figure S10**), and E. sect. Anthacanthae (ca. 7.5 Ma separate subsects. Platycephalae and Florispinae; **Figure 3** and **Figure S11**).

Also from youngest to oldest, W-E disjunctions (**Figure 4**) can be found in the Azorina clade of Campanula (ca. 1 Ma between Cape Verdean Ca. jacobaea and Socotran Ca. balfouri; **Figure 3** and **Figure S3**), in Hypericum sect. Campylosporus (ca. 1.5 Ma within H. quartinianum; **Figure 3** and **Figure S13**), in Aeonium (1.7 Ma between E African Ae. leucoblepharum and a number of Macaronesian species; **Figure 3** and **Figure S2**), in Cicer (ca. 3.5 Ma between Canarian Ci. canariense and E African Ci. cuneatum; **Figure 3** and **Figure S7**), in Adenocarpus (ca. 4 Ma between E African Ad. mannii and a number of species in the Ad. complicatus complex; **Figure 3** and **Figure S1**), in Euphorbia sect. Balsamis (ca. 4 Ma between W African Eu. balsamifera subsp. balsamifera and E African-S Arabian Eu. balsamifera subsp. adenensis; **Figure 3** and **Figure S11**), in Camptoloma (ca. 5.5 Ma between Canarian Cm. canariense, on one hand, and E African Cm. lyperiiflorum and S African Cm. rotundifolium, on the other; **Figure 3** and **Figure S4**), in Eu. sect. Aphyllis (ca. 5.5 Ma between Cape Verdean Eu. tuckeyana and all E African and S African species in this section; **Figure 3** and **Figure S9**), in Plocama (ca. 6 Ma between Canarian Pl. pendula and S African Pl. crocyllis plus a number of E African/S Arabian Plocama species; **Figure 3** and **Figure S15**), in Canarina (6.5 Ma between Canarian Cn. canariensis and E African Cn. eminii; **Figure 3** and **Figure S6**), in Kleinia (ca. 7 Ma between the Macaronesian species, on one hand, and a clade of several E African species, on the other; **Figure 3** and **Figure S14**), in Campylanthus (ca. 7.5 Ma between the Macaronesian and the E African-S Arabian species in the genus; **Figure 3** and **Figure S5**), in Geranium subgen. Robertium (ca. 11 Ma between all E African species in this subgenus and a clade formed by W African taxa and a number of broadly distributed circum-Mediterranean and E Asian taxa; **Figure 3** and **Figure S12**), in the Androsaemum clade of Hypericum (ca. 17 Ma between Socotran H. scopulorum, H. tortuosum and Turkish H. pamphylicum, on one hand, and a number of Macaronesian and W Mediterranean species, on the other; **Figure 3** and **Figure S13**), and in Sideroxylon (ca. 17 Ma between Moroccan S. spinosus and E African S. mascatense; **Figure 3** and **Figure S16**).

### Absolute Diversification Rates

**Figure 5** and **Table S1** show results from net diversification rate analyses. Most lineages fall within the 95% CI of expected diversity under a no-extinction scenario (ε = 0) in the context of the RF family showing the highest rate of diversification (i.e., Asteraceae). However, some RF disjunct clades were significantly less diverse: W-E disjunctions in Sideroxylon (S. spinosus vs. S. mascatense), Canarina (C. canariensis vs. C. eminii), and Hypericum (H. canariense clade vs. H. scopulorum and H. pamphylicum). Other RF disjunct taxa were above the upper bound of the 95% CI: W-E(&S) disjunction in Euphorbia sect. Aphyllis (S), Adenocarpus, Aeonium, and Campanula; and E-S disjunction in Plocama. Otherwise, all taxa fell within the 95% CI with increasing ε values 0.5 and 0.9, except for Sideroxylon.

Interestingly these trends are generally repeated in the more encompassing lineages of the least diverse RF disjunct clades (e.g., Canarina, Hypericum, Sideroxylon). Notably, though Camptoloma has a low extant diversity given its age (three species diverging in the last 6 Myr), the subfamily it belongs to, that is Buddlejoideae, stands above the 95% CI for ε = 0 (**Figure 5**). Something similar can be observed in the case of Kleinia, which shows lower diversity than its encompassing lineage, tribe Senecioneae. Another example of potential diversification shift, though in the opposite direction, is that of Euphorbia, where the genus is significantly less diverse than expected given its age (for all ε values) but RF disjunct clades are species-richer than expected (i.e., E. sect. Aphyllis), except for those that fall within the 95% CI limits (e.g., E. sect. Balsamis, **Figure 5**).

When comparing crown vs. stem age it is noticeable that in some RF disjunct clades crown and stem ages are far apart: Cicer canariense vs. Ci. cuneatum (crown age = 3.4 Ma, stem age = 12.2 Ma, with the stem age falling below the lower bound of 95% CIs when ε = 0.0 and 0.9; **Figure 5**). Other examples include Camptoloma (crown age = 5.5 Ma, stem age = 10.2 Ma), Campylanthus (crown age = 7.5 Ma, stem age = 20.0 Ma), and most notably Sideroxylon (crown age = 17.4 Ma, stem age = 47.3 Ma, **Figure 5**).

## Discussion

### Rand Flora Disjunctions through Time

Engler's (1910) intuition on the Tertiary origins of the Afro-Macaronesian floristic element, aka Christ's (1910) Rand Flora, very much hit the mark on the timing of its assembly. Our divergence estimates for Rand Flora disjunctions span five successive time frames (**Figure 4**): Burdigalian, Tortonian, and Messinian Stages (within the Miocene), the Pliocene, and the Pleistocene. The two earliest disjunctions occur in the genera Sideroxylon and Hypericum and date back to the Early Miocene (Burdigalian; 17.5 and 17.3 Ma, respectively), coinciding with the longest warming period of the Miocene (the Miocene Climatic Optimum; Zachos et al., 2008) and with the start of desertification in south-central Africa (Senut et al., 2009). Couvreur et al. (2008) also dated divergences in Annonaceae back to this time period and explained them in terms of a once-continuous Early Miocene rainforest that became fragmented by decreasing moisture brought by the closure of the Tethys Sea. The fact that Sideroxylon and Hypericum exhibit less xeric affinities than other RF lineages, and that their crown diversification dates back to the Paleogene (Meseguer et al., 2013; Stride et al., 2014), suggests these taxa could be relicts of an earlier megathermal flora (sensu Morley, 2000, 2003).

FIGURE 5 | [...] diamonds go one level above, indicating genus, tribe, or subfamily. Ninety-five percent confidence intervals show expected diversity through time for an RF lineage that diversifies at the highest rate estimated (i.e., Asteraceae) given three possible scenarios: no extinction (ε = 0), turnover at equilibrium (ε = 0.5), and high extinction (ε = 0.9). See Table S1 for associated net diversification rate estimates.

The next disjunction is that of Geranium subgen. Robertium and it dates back to the Late Miocene (Tortonian, 11.0 Ma). This disjunction follows a drastic decline in global temperatures (Late Miocene cooling, 11.6–5.3 Ma; Beerling et al., 2012) and coincides with the temporary closing of the Panama isthmus in America and a moist "washhouse" climate period in Europe (Böhme et al., 2008). This disjunction marks the separation of Macaronesian (e.g., G. maderense) and circum-Mediterranean taxa (e.g., G. robertianum), on one side, and E African species (e.g., G. mascatense), on the other, leaving open the possibility of a colonization of Macaronesia by a Mediterranean ancestor (**Figure 4** and **Figure S12**). Since the disjunction in Geranium subgen. Robertium is linked to a more humid period, rather than an increase in aridity, and because of the possible Mediterranean origin of its Macaronesian taxa, this lineage does not exactly match the RF pattern.

Most other Neogene disjunctions seem to concentrate around the Miocene-Pliocene border (**Figure 4**). Messinian disjunctions can be observed in Camptoloma, Campylanthus, Canarina, Euphorbia sects. Anthacanthae and Aphyllis, Kleinia, and Plocama. Pliocene disjunctions are found in Adenocarpus, Camptoloma, Cicer, Colchicum, Euphorbia sects. Balsamis and Aphyllis, and Plocama. These disjunctions follow two different geographic splits, W-E(&S) Africa and E-S Africa. W-E(&S) disjunctions present the widest temporal (as well as spatial) range. Besides the lineages dated here, other examples of this W-E(&S) disjunction can be found in the literature; e.g., according to Xie et al. (2014), Pistacia lentiscus and P. aethiopica (Anacardiaceae) diverged 4.55 Ma (see **Figure S17**). E-S disjunctions link South Africa and adjacent areas to the East African Rift Mountains, the Ethiopian Highlands, and the Arabian Peninsula. The timing of these E-S disjunctions (Mio-Pliocene) matches the uplift of the Eastern Arc Mountains (Sepulchre et al., 2006). The absence of W-S disjunctions is notable and probably results from African aridification having started in the early Miocene (some 17–16 Ma) in the region where the current Namib Desert stands. This aridification not only persisted through time in this area but also intensified and resulted in the formation of the Kalahari Desert (Senut et al., 2009), effectively limiting range expansions in this direction (W-S), in the absence of successful colonization following LDD. Even in the case of genus Colchicum (**Figure S8**), where S African species appear closely related to NW African ones, W Mediterranean species are always sister to E Mediterranean ones. This leaves open the possibility of a colonization of NW Africa (from S Africa) via E Africa and W Mediterranean populations with subsequent extinction in E Africa.
An alternative colonization from Central-West Asia into South Africa and NW Africa seems unlikely given the phylogeny of this genus (**Figure S8**), though proper biogeographic inference to test either possibility remains to be done. Indeed, Sanmartín et al. (2010) found a higher frequency of biotic exchange between NW-E African elements than with either E-S African or W-S African ones, where the latter elements were hardly connected, if at all, confirming our observations. We further argue that the magnitude of observed biotic exchange follows the history of desertification in Africa.

In all, the sequential timing of Neogene disjunctions in RF lineages, which is nonetheless concentrated in certain time intervals (e.g., Late Miocene-Pliocene), is in agreement with a scenario of range expansions (dispersal) in favorable times (windows of opportunity) and range contractions (extinction) as aridification flared up. Extinction results in absence (of a population, species, clade, or lineage) and thus leaves hard to track traces in phylogenies in the absence of fossil data (Meseguer et al., 2015). If repeated cycles of speciation, dispersal, and extinction take place in the same area over time, only taxa that optimize any (or a combination) of these processes (e.g., increased speciation, higher dispersal, lower extinction rates) will persist. It is to be expected that more recent populations, species, clades, or lineages show traces of these processes when compared to ancient ones.

On the other hand, our net diversification rate estimates (**Figure 5**) do not fully support an extinction explanation since, in the context of the family with the highest diversification rate among RF lineages, i.e., Asteraceae, most of the taxa fall inside the 95% CI under a no-extinction scenario (ε = 0.0). However, the method chosen to estimate net diversification rates (Magallón and Sanderson, 2001), though more appropriate given phylogeny size and sampling effort, is still limited. Crown diversification rates cannot be estimated for clades with 2 terminal taxa (see Materials and Methods), which is the case for several RF lineages (e.g., Sideroxylon). Additionally, the "method-of-moments" estimator performs well detecting declining diversity for old groups in exceedingly species-poor clades (Magallón and Sanderson, 2001; Warren and Hawkins, 2006) or young groups notably species-rich (recent radiations, Magallón and Sanderson, 2001), but we observed that statistical power is low to detect declines in diversity for young species-poor groups (e.g., Camptoloma). Most RF disjunct clades dated comprise fewer than 10 species (e.g., Aeonium, Campanula, Camptoloma, Cicer, Colchicum, Euphorbia sect. Balsamis, Kleinia, and Plocama), limiting our ability to effectively detect the effects of extinction.

Nonetheless, if we focus on crown ages, disjunct clades in Canarina, Hypericum, and Sideroxylon are less diverse than expected, and given that their encompassing lineages (**Table 1**, **Figure 5**) also follow this trend, it would be safe to assume these lineages have indeed experienced high levels of extinction through time. Likewise, if we were to focus on stem ages, a few other groups fall below the no-extinction scenario (ε = 0.0), notably, Camptoloma, Campylanthus, and Cicer. Moreover, these groups exhibit wide-spanning (often >10 Ma) stem-crown intervals (see Sideroxylon or Cicer in **Figure 5**), an observation that has been tied to historically high extinction rates in recent diversification studies (Antonelli and Sanmartín, 2011; Nagalingum et al., 2011). This would further support the hypothesis that lower diversification rates in RF lineages could be explained in terms of increased extinction rather than a decrease in speciation rates.

Additionally, and given the aforementioned limitations of our diversification method of choice, it would also be safe to conclude that, within Euphorbia, sects. Anthacanthae (sect. Balsamis included), sect. Esula, and sect. Aphyllis present higher diversity than expected (above the CI for ε = 0.0 in all cases, and also above the CI for ε = 0.5 for the former two clades), which is exceptional in the context of the genus, since Euphorbia is significantly poorer than expected for all ε values. Horn et al. (2014) also detected increased diversification rates in these sections of Euphorbia. Desertification-tropicalization cycles in Africa (Senut et al., 2009) suggest repeated reconnections between now disjunct RF regions since the Neogene, which would have permitted biotic exchange in favorable periods, whereas the isolation of these regions at unfavorable times would have induced speciation through vicariance, enhancing endemicity in these sub-humid/sub-xeric lineages. Molecular dating in tropical trees from the genus Acridocarpus (Malpighiaceae; Davis et al., 2002) and the Annonaceae family (Couvreur et al., 2008) shows a similar pattern of connection phases between East African and Guineo-Congolian rainforest regions since the Oligocene following major climate shifts.

The youngest disjunctions, those of Aeonium, Campanula, and Hypericum sect. Campylosporus, are Pleistocene in age (**Figure 4**) and far too recent to result from the Neogene aridification of the African continent. Either rare LDD (i.e., Aeonium; Kim et al., 2008) or stepping-stone dispersal events (i.e., Campanula, Alarcón et al., pers. comm.), perhaps favored by Pleistocene cool and drier glacial cycles, could explain these more recent disjunct geographic patterns, as previously observed in other African taxa, e.g., Convolvulus (Carine, 2005), Moraea (Galley et al., 2007), or the tree heath (Erica arborea). Désamoré et al. (2011) took notice of successive range expansions of Er. arborea from an Eastern African center of diversity toward Northwest Africa, Southwest Europe, and Macaronesia, first during the Late Pliocene (ca. 3 Ma; **Figure 4**) and subsequently in the Pleistocene (ca. 1 Ma).

## Redefining the Rand Flora Pattern

In a recent review, Linder (2014) synthesized the individual histories of numerous African lineages by recognizing five different "floras," which he defined as "groups of clades, which: (a) are largely found in the same area, (b) have largely the same extra-African geographical affinities, (c) share a diversification history, and (d) have a common maximum age." The "Rand Flora" does not fit this definition well. This flora does group a number of lineages that share the same geographic range (even if discontinuous), but they have slightly different climatic tolerances, i.e., sub-humid to sub-xeric or xerophilic, and they do not necessarily share the same extra-African geographical affinities.

Some RF lineages fall within what Linder (2014) terms "tropicmontane flora" (e.g., Hypericum, Canarina), others within the "arid flora" (e.g., Kleinia, Campylanthus). Some RF lineages are better connected with the Mediterranean Region (e.g., Adenocarpus), others with Asia and the Indo-Pacific Region (e.g., Plocama). Moreover, RF taxa on either side of any given disjunction (i.e., W-E or E-S) no longer share a "diversification history," though they do share the same fate as other RF lineages with similar distribution. In fact, the different ages estimated here for the various RF disjunctions agree well with what has been termed biogeographic pseudocongruence (Donoghue and Moore, 2003), a phenomenon whereby two or more lineages display the same biogeographic pattern but with different temporal origins (Sanmartín, 2014). What is shared by all RF lineages is the nature of the climatic (ecological) barriers separating the taxa at either side of any given disjunction: arid regions such as the Sahara, the Kalahari or the Namib deserts, or the tropical lowlands in Central Africa. The congruence between RF disjunction ages and successive major climatic events in Africa during the Neogene (**Figure 4**) suggests that the ongoing aridification of the continent (or the "tropicalization" of Central Africa) affected RF lineages according to their different physiological (climatic) tolerances: more sub-humid lineages diverged first (e.g., Sideroxylon), more xeric later (e.g., Campylanthus).

One point of contention in the literature has been the limits of the Rand Flora with respect to the "Arid Corridor" or "Arid Track" (hereafter AC), a path repeatedly connecting southwest to north-east arid regions in Africa (and henceforth to central and southwest Asia) first proposed by Winterbottom (1967) and later expanded by de Winter (1966, 1971) and Verdcourt (1969). Bellstedt et al. (2012) defined the AC pattern as the disjunction occurring between Southern Africa and Eastern African-Southern Arabian xeric floristic elements. Linder (2014) considered the RF as an expansion of the AC to the west, in agreement with Jürgens' (1997) view. However, we consider that the RF and AC patterns are different. AC elements have more xeric preferences than the sub-humid to sub-xeric ones exhibited by RF elements. AC elements often extend into deserts (e.g., Namib, Kalahari, Sahara; see studies by Beier et al. (2004) on Fagonia (Zygophyllaceae), Bellstedt et al. (2012) on Zygophyllum (also Zygophyllaceae), Carlson et al. (2012) on Scabiosa (Dipsacaceae), or Bruyns et al. (2014) on Ceropegieae) and have broader, more continuous distributions; they also tend to be younger in age (often Pleistocene, coincident with Quaternary glaciation cycles). Our understanding is that these younger xeric AC elements move in parallel to RF taxa, intermingling with them in areas favorable to either, and thus confusing their limits. Something similar could have happened with Afromontane elements migrating south to north as the Eastern African mountains rose through the Mio-Pliocene; these elements are not part of the RF (e.g., Iris, Moraea, Galley et al., 2007).

In this study, we have provided a temporal framework for the Rand Flora pattern and estimated net diversification rates for 17 RF lineages. Our results provide some support to the historical view of an ancient African flora, whose current disjunct distribution was probably modeled by the successive waves of aridification events that have affected the African continent starting in the Miocene, but whose origin predates the latest events of Pleistocene climate change. These patterns were probably formed by a combination of climate-driven extinction and vicariance within a formerly widespread distribution. Whether these lineages all had a continuous, never interrupted, distribution that occupied all the area that now lies in between the extremes of the disjunction, or they had a somewhat narrower distribution in the past and they expanded their range tracking their habitat across the landscape in response to changing climate (e.g., along a corridor), is difficult to say with the current evidence. Discerning between these hypotheses will require the integration of phylogenetic, biogeographic and ecological approaches to reconstruct the ancestral ranges and climatic preferences of ancestral lineages (Mairal et al., 2015; Meseguer et al., 2015). Compared to speciation, extinction has received far less attention in studies focusing on the assembly of tropical biotas. Disentangling extinction from other processes is particularly difficult because the biodiversity we observe today is only a small fraction of that of the past. The Rand Flora pattern might offer a prime study model to understand the effects of climate-driven extinction in the shaping of continent-wide biodiversity patterns.

## Author Contributions

IS and LP conceived and designed the study. LP analyzed the data with help from IS, RR, and MM. LP and IS co-wrote the text, with contributions from MH, RR, MM, and AM. All authors contributed with data compilation, figure preparation, or text comments. MM has copyright of all plant pictures, except for Cicer canariense.

## Acknowledgments

This study was funded by the Spanish Ministry of Economy and Competitiveness (MINECO): Project AFFLORA, CGL2012-40129-C02-01 to IS. LP was funded by CSIC postdoctoral contract within AFFLORA. MH was funded by CGL2012-40129-C02-02, the Research Council of Norway (203822/E40) and a Ramón y Cajal Fellowship (RYC2009-04537). RR was supported by a JAE-DOC postdoctoral fellowship (MINECO) and the European Social Fund. MM and VC were supported by MINECO FPI predoctoral fellowships (BES-2010-037261 and BES-2013-065389 respectively). We thank Virginia Valcárcel (Department of Biology, UAM, Spain) for help with data compilation and literature revision during the earlier stages of the project, Andrea Briega (Department of Ecology, UAH, Spain) for help with data compilation, and Manuel Gil for providing a Cicer canariense picture.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00154/abstract

Supplementary Materials include descriptions of study groups with references, **Table S1**, and **Figures S1–S17**.

Table S1 | Net diversification rates (bd.ms) for all RF disjunct clades and their encompassing lineages (bold = highest crown.p, red when n ≤ 2) under three possible scenarios: no extinction (ε = 0), turnover at equilibrium (ε = 0.5), and high extinction (ε = 0.9). Probability (crown.p) of obtaining a clade with the same size and age as the RF disjunction, given the background diversification rate of the encompassing clade/s and at increasing extinction fractions (bold = highest crown.p, italics p < 0.05). Stem and Crown ages in Myr.

Figures S1–S17 | BEAST MCC chronograms showing mean estimates and 95% high posterior density (HPD) confidence intervals for those nodes receiving 50% support. Branch width is proportional to PP support. Red colored taxa indicate Eastern African provenance; Macaronesia/western African taxa and southern African taxa are colored in blue and green, respectively. Calibration points are indicated with stars; RF disjunctions within each lineage discussed in the text and represented in Figures 3–5 are indicated with arrows.

## References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Pokorny, Riina, Mairal, Meseguer, Culshaw, Cendoya, Serrano, Carbajal, Ortiz, Heuertz and Sanmartín. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Continental scale patterns and predictors of fern richness and phylogenetic diversity

Nathalie S. Nagalingum<sup>1</sup>\*, Nunzio Knerr<sup>2</sup>, Shawn W. Laffan<sup>3</sup>, Carlos E. González-Orozco<sup>2, 4</sup>, Andrew H. Thornhill<sup>2, 5</sup>, Joseph T. Miller<sup>2</sup> and Brent D. Mishler<sup>6</sup>

<sup>1</sup> National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, NSW, Australia, <sup>2</sup> Centre for Australian National Biodiversity Research, Commonwealth Scientific and Industrial Research Organisation Plant Industry, Canberra, ACT, Australia, <sup>3</sup> Centre for Ecosystem Science, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia, <sup>4</sup> Institute for Applied Ecology and Collaborative Research Network for Murray-Darling Basin Futures, University of Canberra, ACT, Australia, <sup>5</sup> Australian Tropical Herbarium, James Cook University, Cairns, QLD, Australia, <sup>6</sup> University and Jepson Herbaria, and Department of Integrative Biology, University of California, Berkeley, CA, USA

#### Edited by:

Toby Pennington, Royal Botanic Garden Edinburgh, UK

#### Reviewed by:

Mary Gibby, Royal Botanic Garden Edinburgh, UK Michael Kessler, University of Zurich, Switzerland

#### \*Correspondence:

Nathalie S. Nagalingum, Royal Botanic Gardens and Domain Trust, Mrs Macquaries Road, Sydney, NSW 2000, Australia nathalie.nagalingum@rbgsyd.nsw.gov.au

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 30 June 2014 Accepted: 19 March 2015 Published: 14 April 2015

#### Citation:

Nagalingum NS, Knerr N, Laffan SW, González-Orozco CE, Thornhill AH, Miller JT and Mishler BD (2015) Continental scale patterns and predictors of fern richness and phylogenetic diversity. Front. Genet. 6:132. doi: 10.3389/fgene.2015.00132

Because ferns have a wide range of habitat preferences and are widely distributed, they are an ideal group for understanding how diversity is distributed. Here we examine fern diversity on a broad-scale using standard and corrected richness measures as well as phylogenetic indices; in addition we determine the environmental predictors of each diversity metric. Using the combined records of Australian herbaria, a dataset of over 60,000 records was obtained for 89 genera to infer richness. A molecular phylogeny of all the genera was constructed and combined with the herbarium records to obtain phylogenetic diversity patterns. A hotspot of both taxic and phylogenetic diversity occurs in the Wet Tropics of northeastern Australia. Although considerable diversity is distributed along the eastern coast, some important regions of diversity are identified only after sample-standardization of richness and through the phylogenetic metric. Of all of the metrics, annual precipitation was identified as the most explanatory variable, in part, in agreement with global and regional fern studies. However, precipitation was combined with a different variable for each different metric. For corrected richness, precipitation was combined with temperature seasonality, while correlation of phylogenetic diversity to precipitation plus radiation indicated support for the species-energy hypothesis. Significantly high and significantly low phylogenetic diversity were found in geographically separate areas. These separate areas correlated with different climatic conditions such as seasonality in precipitation. The phylogenetic metrics identified additional areas of significant diversity, some of which have not been revealed using traditional taxonomic analyses, suggesting that different ecological and evolutionary processes have operated over the continent. Our study demonstrates that it is possible and vital to incorporate evolutionary metrics when inferring biodiversity hotspots from large compilations of data.

Keywords: Polypodiopsida, Filicopsida, ferns, Australia, conservation, evolution, community, spatial phylogenetics

## Introduction

Biodiversity hotspots occur because organisms have non-random patterns of distribution, and identifying and explaining these hotspots have long been of interest to biologists, ecologists, and biogeographers (Magurran and McGill, 2011). In the past, patterns have been identified based on anecdotal observations, but with the availability of large datasets and informatic tools, these patterns can now be inferred using analytical methods. Oftentimes these analytical approaches have confirmed previous observations, however, they have also revealed new patterns, particularly at continental or global scales (Currie and Paquin, 1987; Crisp et al., 2001; Francis and Currie, 2003; Hawkins et al., 2003; Kier et al., 2005, 2009; Kreft and Jetz, 2007; Kreft et al., 2010).

The most widely used measure of biodiversity is taxon diversity, also known as richness, which is a count of the number of terminal taxa (typically species but sometimes genera or other taxonomic levels) that occur in a particular region. Richness is calculated from occurrence data with geographic locations. However, richness estimates are influenced by sampling size and effort. Therefore, a number of metrics have been developed addressing these issues in order to obtain more accurate estimates (Magurran, 2004; Maurer and McGill, 2011).

Alternatively, there are measures of biodiversity that incorporate phylogeny with geography. These phylogenetic measures can be calculated for diversity and endemism, and are advantageous because use of a phylogeny takes into account evolutionary history (Faith, 1992; Moritz and Faith, 2002; Rosauer et al., 2009). Thus, phylogenetic measures provide an estimate of how much of the evolutionary history is represented in a particular region, which is referred to as phylogenetic diversity. Taxon-based measures use counts of species or genera as units, but in phylogenetic measures the units are the branch lengths connecting the terminal taxa in a region. Using a species as a terminal does not add substantially more branch length than using a genus, and thus, phylogenetic measures are not particularly sensitive to the taxonomic level chosen for analysis or to splitting and lumping of taxa (Rosauer et al., 2009). Identifying regions with high diversity is critical to conservation efforts, and the inclusion of phylogenetic indices can identify areas that standard metrics do not (Forest et al., 2007; Hendry et al., 2010; Winter et al., 2013).

Key to understanding distribution patterns is how they correlate with environmental conditions. Environmental variables can be climatic, such as rainfall, radiation, and temperature, or physical, such as soil type and topography. Numerous variables have been investigated in relation to richness, but rarely for phylogenetic indices. In several cases, the variables most strongly correlated with plant richness are mean annual temperature, water availability, and evapotranspiration, as well as topography (Currie and Paquin, 1987; Francis and Currie, 2003; Hawkins et al., 2003; Kreft and Jetz, 2007; Kreft et al., 2010). Although these variables are strong predictors of richness and may be causal, historical factors could also be responsible for current distribution patterns (Qian and Ricklefs, 2004; Wiens and Donoghue, 2004). Here we explore the relationships between richness, phylogenetic diversity, and environmental variables, and address the potential role of evolutionary history through the inclusion of phylogenetic indices.

Because ferns have a broad range of habitat preferences, spanning tropical rainforests to deserts, they are widely distributed and are therefore an ideal group for understanding how diversity is distributed. Most of our understanding of fern distribution derives from transects documenting richness and its relationship to environmental variables (Dzwonko and Kornaś, 1994; Lwanga et al., 1998; Tuomisto and Poulsen, 2000; Kessler, 2002a; Aldasoro et al., 2004). Fern richness is greatest in areas of high topographic relief and complexity, high evapotranspiration, and with many rain days (Kessler, 2010). Special focus has been given to the relationship between elevational gradients and richness, with richness peaking along mid-regions of slopes (suggesting the presence of the mid-domain effect) (Kessler, 2001, 2002b, 2010; Hemp, 2002; Kromer et al., 2005; Kluge and Kessler, 2006, 2011; Watkins, 2006; Kessler et al., 2011). However, a study compiling data from multiple altitudinal transects across the globe indicates that climate (water availability and temperature) rather than the mid-domain effect has better explanatory power for species richness (Kessler et al., 2011). Furthermore, the mid-domain effect may be more relevant to particularly high elevation regions such as in the Andean tropical regions, but may be less relevant in flat and vast continents such as Australia.

Overall, there are few synthetic studies of fern diversity and its relationship to the environment over large regions; there are studies in the Iberian Peninsula, New Zealand, and Australia, but all focus on richness alone (Lehmann et al., 2002; Bickford and Laffan, 2006; Moreno Saiz and Lobo, 2008). At broad scales, these studies found that water availability was correlated with regions of greatest richness, and also variably identified mean annual temperature, radiation, and topography (environmental heterogeneity) as important. At these scales, it is unclear what the corresponding patterns of phylogenetic diversity are, and how they relate to the environment.

Here we examine the patterns of fern diversity using richness and phylogenetic diversity indices. In particular, we test for areas of significant randomized phylogenetic diversity against a null model. We also assess the patterns of diversity observed for correlations with environmental variables. We have chosen Australia as the study area because the continent encompasses a broad range of habitat types, distributional data are uniquely available through the efforts of Australia's Virtual Herbarium, and a completed floristic treatment of the ferns means that the taxonomy has largely been standardized across states (Australian Biological Resources Study, 1998).

## Material and Methods

### Geographic Data

Records of fern collections held in Australian herbaria are available in Australia's Virtual Herbarium (AVH) (http://avh.ala.org.au/) and were downloaded for this study. The download totaled 84,134 records. Using Google Refine version 2.5 (http://code.google.com/p/google-refine/), the dataset was cleaned to remove non-fern records (e.g., algae, lycophytes, and angiosperms), foreign collections (as well as Norfolk and Macquarie Islands), cultivated material, weeds, and garden escapees. The ferns were restricted to the Classes Marattiopsida and Polypodiopsida (Smith et al., 2006), and geographic ranges were examined to determine if there were potential misidentifications for specimens found outside of their known geographic range (Australian Biological Resources Study, 1998). The taxonomy was reconciled against a classification for extant ferns (Smith et al., 2006), for a total of 386 species in 89 genera and 25 families. Misspellings in the taxonomy were also corrected. Hybrids, doubtfully identified taxa (with either cf., aff., or ?), and specimens assigned only to genus-level were discarded. Records lacking geographic coordinates were excluded from the dataset, and latitude and longitude values of the remaining records were transformed into an Albers equal area coordinate system (European Petroleum Survey Group code EPSG:3577). Following all of these spatial verification and cleaning steps, the dataset consisted of 63,230 records with greatest sampling in eastern Australia (**Figure 1A**).
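The name-based exclusions described above (hybrids, doubtful identifications, and genus-only determinations) can be illustrated with a minimal Python sketch. This is not the Google Refine workflow used in the study; the function name, the matching rules, and the placeholder names in the example are all hypothetical and would need adapting to the conventions of a real herbarium export.

```python
import re

# Hypothetical matching rules: "cf." or "aff." qualifiers, or a question mark.
DOUBTFUL = re.compile(r"\b(cf|aff)\.|\?")
# Hypothetical hybrid markers: a multiplication sign or a free-standing "x".
HYBRID = re.compile(r"×|\sx\s")


def keep_name(name):
    """Return True if a determination looks like a confident species-level name."""
    parts = name.split()
    if len(parts) < 2:
        return False  # genus-only determination
    if parts[1].lower() in {"sp.", "sp", "spp."}:
        return False  # genus-only determination written with "sp."
    if DOUBTFUL.search(name) or HYBRID.search(name):
        return False
    return True


# Placeholder names for illustration only.
names = [
    "Genusa epitheta",
    "Genusa cf. epitheta",
    "Genusa aff. epitheta",
    "Genusa epitheta ?",
    "Genusa",
    "Genusa sp.",
    "Genusa × epitheta",
]
kept = [n for n in names if keep_name(n)]
print(kept)
```

Rule-based filters of this kind are easy to audit, but any real dataset would still need manual checking of the discarded records.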

Species and genus richness were analyzed using grid cell sizes of 50 × 50 km and 100 × 100 km, which showed similar patterns (Figures S1A vs. S1B, S1C vs. S1D); thus we use the finer-scale, 50 × 50 km, cells for all of the analyses presented in order to minimize aggregation of the environmental data. Inspection of species vs. genus richness (Figures S1A vs. S1C) shows that the patterns obtained are congruent. Therefore, the genus-level results are presented herein because the phylogenetic diversity analyses were conducted at the genus level as there are currently inadequate molecular data to generate species-level phylogenies. A general relationship between species richness and higher taxon richness is found in many other studies as well (Williams and Gaston, 1994; Williams et al., 1997; Aldasoro et al., 2004; Currie and Francis, 2004).

Using the 50 × 50 km grid cell size resulted in a total of 1986 grid cells across Australia with at least one fern record. When the data were aggregated within the grid cells, the 63,230 records were reduced to 18,050 unique occurrences. As an example, the best-collected genus Cheilanthes comprised 6466 records but after aggregation into the 1986 grid cells there were 1320 unique occurrences. As the calculations only take into account occurrence in a cell, and not abundance values, there are multiple redundant records of the same taxon in each grid cell (**Figure 1B**). Redundancy, a measure of sampling quality, is scaled from zero to one (Garcillán and Ezcurra, 2003; Laffan et al., 2010). Redundancy values close to zero indicate possible undersampling, while those close to one indicate well-sampled cells. Redundancy shows some variability across the continent, most closely linked to regions of population density (**Figure 1B**). There are some regions with values closer to zero, but since ferns are rarely found in these arid regions, the lack of duplicate sampling is likely not influencing the patterns we observe.
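The aggregation step can be illustrated with a minimal Python sketch. It assumes that records are already in projected coordinates (metres), that cells are aligned to the coordinate origin, and that redundancy is computed as 1 − (number of taxa / number of records) per cell; the text above only states that redundancy is scaled from zero to one, so that formula, like the function names and the toy records, is an assumption made for illustration.

```python
from collections import defaultdict

CELL_SIZE_M = 50_000  # 50 x 50 km cells


def cell_of(x, y, size=CELL_SIZE_M):
    """Index of the grid cell containing the projected point (x, y)."""
    return (int(x // size), int(y // size))


def summarize_cells(records):
    """Aggregate (taxon, x, y) records into per-cell counts.

    Returns, for each occupied cell, the number of records, the number of
    distinct taxa (richness), and the assumed redundancy 1 - richness/records.
    """
    n_records = defaultdict(int)
    taxa = defaultdict(set)
    for taxon, x, y in records:
        cell = cell_of(x, y)
        n_records[cell] += 1
        taxa[cell].add(taxon)
    return {
        cell: {
            "records": n_records[cell],
            "richness": len(taxa[cell]),
            "redundancy": 1.0 - len(taxa[cell]) / n_records[cell],
        }
        for cell in n_records
    }


# Toy records (coordinates invented for illustration).
toy = [
    ("Cheilanthes", 10_000, 10_000),
    ("Cheilanthes", 20_000, 30_000),
    ("Pteris", 40_000, 5_000),
    ("Cheilanthes", 60_000, 10_000),
]
summary = summarize_cells(toy)
```

In the toy data, the first cell holds three records of two genera, so the repeated Cheilanthes record collapses into a single occurrence, while the second cell holds a single record and therefore has a redundancy of zero.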

### Molecular Data and Phylogenetic Analyses

All of the available sequences for each genus were downloaded from GenBank, regardless of species or geographic origin. Initially seven chloroplast markers were assessed for potential use in the phylogenetic analyses, but of these only three markers (atpA, atpB, and rbcL) were selected because they had the lowest amount of missing data. There were 420 sequences for atpA, 1117 sequences for atpB, and 2454 sequences for rbcL; the sequences were aligned using MAFFT version 6.860b (Katoh and Toh, 2008). Using each of these markers separately, a maximum likelihood phylogeny was constructed using GARLI (Genetic Algorithm for Rapid Likelihood Inference) version 0.951 (Zwickl, 2006), and set to terminate automatically using default parameters. In order to select one representative sequence for each genus, each of the three phylogenies was examined to determine that the sequences representing a genus were monophyletic. In cases where a genus was paraphyletic, this was typically due to an outlier sequence; thus, the phylogenetic position of the genus was checked against published phylogenies to verify that the sequence was indeed an outlier, and then it was discarded. Subsequently, two additional criteria were applied to select a representative sequence: first, a sequence was obtained from an Australian species, and second, if there were no Australian species, another species was selected if all three markers were available (or otherwise two markers).

New sequences were generated for seven genera not previously represented in GenBank, with GenBank accession numbers KP164480-KP164497 (Table S1). The following primers were used as amplification and sequencing primers for rbcL: ESRBCL1F, ESRBCL628F, ESRBCL654R, ESRBCL1361R; atpB: ESATPB172F, ESATPE45R; and atpA: ESATPF412F, ESATPF412F, ESTRNR46F, ESTRNR46F (the latter two as reverse primers) (Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007). In addition, sequencing-only primers were used for atpB: ATPB1163F, ATPB910R; and atpA: ESATPA856F, ESATPA877R (Schuettpelz et al., 2006; Schuettpelz and Pryer, 2007). PCR amplification was performed at 95°C for 10 min; 30 cycles of 94°C for 30 s, 64°C for 1 min, and 72°C for 45 s; and one cycle at 72°C for 5 min.

One representative sequence per marker for each of the 89 genera was assembled into a matrix for a total of 3893 base pairs. In this matrix, atpA and atpB were each missing sequences for four genera, and rbcL was complete. In this final matrix, the genus Actinostachys was incomplete at two of the three markers, Anogramma, Colysis, Cyclosorus, and Marattia were incomplete at one of three markers, and all other genera were complete for all three markers (Table S1). Molecular phylogenetic analysis of the partitioned, concatenated three markers was conducted using RAxML-HPC 7.3.2 using the CIPRES online portal (Miller et al., 2010). The supermatrix and resultant trees are deposited in TreeBASE as study #15499. Overall, there was strong support for most of the nodes in the phylogeny (Figure S2), and the phylogenetic relationships were verified against a tree derived from the most well sampled and complete molecular dataset of ferns to date (Schuettpelz and Pryer, 2007).

To assess the impact of taxonomic changes on the results, we compared the PD values from this study to the PD values of an updated tree. In this updated tree, multiple recent taxonomic changes were incorporated: Doodia and Pteridoblechnum were omitted because they were synonymized into Blechnum (Perrie et al., 2014); the new genus Telmatoblechnum, which is a segregate of Blechnum, was included (Perrie et al., 2014); Oenotrichia and Coveniella were omitted because they had been synonymized into Lastreopsis (Labiak et al., 2014a,b); the new genus Parapolystichum, which is a segregate of Lastreopsis, was added (Labiak et al., 2014a,b); and Revwattsia was removed because it was synonymized into Dryopteris (McKeown et al., 2012). When we reanalyzed the data with the new tree, the range of PD values did not change significantly. In fact, when we subtracted the original PD value from the new PD value (calculated within the same grid cell) the differences ranged from −0.00000050 to 0.00000050, and the mean difference across all 1986 grid cells was −0.00000006. As a comparison, the PD values in this study were 0.01879–0.9308. It is clear, therefore, that recent taxonomic changes have little impact on the results.

### Analysis of Diversity

The phylogeny derived from the three markers and the geographic data were imported into Biodiverse version 0.17 (Laffan et al., 2010). Several measures of diversity were calculated: species richness (SR), genus richness (GR), Margalef genus richness (MR), phylogenetic diversity (PD), and randomized phylogenetic diversity (PDrand).

The richness measures (SR and GR) are a direct count of the number of taxa occurring in each grid cell. To correct for uneven sampling effort among grid cells, the Margalef diversity metric is used. This metric standardizes richness across grid cells by dividing richness minus one ($R - 1$) by the natural log of the number of samples ($N$) in a grid cell, thus $R_{\mathrm{Margalef}} = (R - 1)/\ln N$ (Magurran, 2004; Maurer and McGill, 2011).
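As a worked illustration of the Margalef correction, the following Python sketch evaluates the formula given above. The function name and the example counts are hypothetical, and the guard against cells with fewer than two samples is an added assumption (the natural log of one is zero, so the index is undefined there).

```python
import math


def margalef_richness(richness, n_samples):
    """Margalef-corrected richness: (R - 1) / ln(N)."""
    if n_samples < 2:
        raise ValueError("Margalef richness is undefined for fewer than 2 samples")
    return (richness - 1) / math.log(n_samples)


# Two hypothetical cells with the same raw richness but different sampling effort.
lightly_sampled = margalef_richness(11, 100)
heavily_sampled = margalef_richness(11, 1000)
print(lightly_sampled, heavily_sampled)
```

For equal raw richness, the more heavily sampled cell receives the lower corrected value, which is the intended effect of standardizing by sampling effort.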

Phylogenetic diversity (PD) measures the amount of the phylogenetic tree that is represented in a grid cell (Faith, 1992), which represents the sum of branch lengths in that grid cell. Specifically, this measure takes all of the terminal taxa that are found in one grid cell and sums the branches connecting them along a path to and including the root node. We depict PD as a proportion of the total tree length. PD is expected to be correlated with richness since with more terminal taxa there are more branch lengths to be summed (Tucker and Cadotte, 2013); thus in rich areas the PD is expected to be larger. Indeed, we found that the GR and PD values within each grid cell are highly correlated; the generalized linear model (GLM) yielded an $r^2$ equivalent of 0.895.
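The PD calculation described above can be sketched in Python as follows. The tree encoding (a dictionary mapping each node to its parent and branch length), the function names, and the toy tree are assumptions made for illustration; they do not reflect how Biodiverse stores trees internally.

```python
def phylogenetic_diversity(tree, taxa):
    """Sum the branch lengths on the paths from each taxon to the root.

    tree maps each node to (parent, branch_length); the root has parent None.
    Branches shared by several taxa are counted only once.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def proportional_pd(tree, taxa):
    """PD expressed as a proportion of the total tree length."""
    tree_length = sum(length for _, length in tree.values())
    return phylogenetic_diversity(tree, taxa) / tree_length


# Toy tree: ((A:2, B:2)X:1, C:3)root, total length 8.
TOY_TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "A": ("X", 2.0),
    "B": ("X", 2.0),
    "C": ("root", 3.0),
}
print(proportional_pd(TOY_TREE, {"A", "B"}))
print(proportional_pd(TOY_TREE, {"A", "C"}))
```

In the toy tree, a cell containing the sister taxa A and B covers 5 of 8 branch-length units, whereas a cell containing the more distantly related A and C covers 6 of 8, even though both cells have the same richness.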

To test for statistical significance of the PD results, we used a randomization test (Mishler et al., 2014). First, all of the 18,050 unique occurrences (see "Geographic Data") are pooled, and records from this pool are randomly assigned (without replacement) to the grid cells based on a constraint that the number of unique occurrences per grid cell is kept constant. This has the effect of keeping the number of terminal taxa per cell constant; for example, if a grid cell had 11 unique occurrences, then 11 unique occurrences from the pool were randomly assigned to that grid cell. In addition, the range size of each terminal taxon was kept constant. Second, PD is recalculated for each grid cell using the randomly assigned unique occurrences and the original phylogeny, yielding a PDrand value for each grid cell. This process is repeated 999 times to give a distribution of PDrand values, and the original PD value is compared with these 999 PDrand values. If the original PD falls in the upper or lower 2.5% of the 1000 values (the 999 randomized values plus the original), the PD of that cell is judged statistically significant. This method is similar to a randomly generated null community, known as null model 1 in the software Phylocom (Webb et al., 2008), or the null constrained model (Kembel and Hubbell, 2006). We recognize that this is one of several possible null models (Gotelli, 2000), each of which tests different questions and is associated with different assumptions, but this model is appropriate for our objective of identifying cells with significantly high or low PD values. Furthermore, we conducted an additional analysis in which we standardized the PD value by the number of taxa (PD/richness), referred to as relative phylogenetic diversity (PDrel) by Davies et al. (2007). The PDrel randomization results are identical to the PD randomizations, indicating the robustness of the metrics; herein we refer to the PD randomizations only.
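The logic of the randomization test can be sketched as follows. This is an illustrative approximation, not the Biodiverse algorithm: it assumes a simple pairwise swap procedure to hold cell richness and taxon range sizes constant, and the toy tree, cell contents, and function names are all hypothetical.

```python
import random

# Hypothetical toy tree: node -> (parent, branch length); "root" is the root.
TREE = {"A": ("n1", 1.0), "B": ("n1", 1.0), "n1": ("n2", 1.0),
        "C": ("n2", 2.0), "n2": ("root", 1.0), "D": ("root", 3.0)}

def pd(taxa):
    """Sum the branches on the paths from each taxon back to the root."""
    used = set()
    for node in taxa:
        while node != "root":
            used.add(node)
            node = TREE[node][0]
    return sum(TREE[n][1] for n in used)

def range_sizes(cells):
    """Number of cells in which each taxon occurs."""
    sizes = {}
    for taxa in cells.values():
        for t in taxa:
            sizes[t] = sizes.get(t, 0) + 1
    return sizes

def swap_randomize(cells, n_swaps, rng):
    """Exchange taxa between pairs of cells; richness and ranges stay fixed."""
    shuffled = {c: set(t) for c, t in cells.items()}
    names = sorted(shuffled)
    for _ in range(n_swaps):
        c1, c2 = rng.sample(names, 2)
        only1 = sorted(shuffled[c1] - shuffled[c2])
        only2 = sorted(shuffled[c2] - shuffled[c1])
        if only1 and only2:
            t1, t2 = rng.choice(only1), rng.choice(only2)
            shuffled[c1].remove(t1)
            shuffled[c1].add(t2)
            shuffled[c2].remove(t2)
            shuffled[c2].add(t1)
    return shuffled

def pd_rand_test(cells, target, n_iter=999, n_swaps=100, seed=0):
    """Two-tailed rank test of observed PD against randomized PD values."""
    rng = random.Random(seed)
    observed = pd(cells[target])
    null = [pd(swap_randomize(cells, n_swaps, rng)[target]) for _ in range(n_iter)]
    total = n_iter + 1  # the observed value counts as one of the values
    p_high = (sum(v >= observed for v in null) + 1) / total
    p_low = (sum(v <= observed for v in null) + 1) / total
    if p_high <= 0.025:
        label = "high"
    elif p_low <= 0.025:
        label = "low"
    else:
        label = "ns"
    return {"observed": observed, "p_high": p_high, "p_low": p_low, "label": label}

CELLS = {"c1": {"A", "D"}, "c2": {"A", "B"}, "c3": {"B", "C"}, "c4": {"C", "D"}}
result = pd_rand_test(CELLS, "c1", n_iter=99)
```

The swap step is one of several ways to satisfy both constraints at once; how ties between observed and randomized values are counted is also an assumption of this sketch.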

#### Environmental Predictors of Diversity

Eleven environmental variables were assessed against the diversity indices. These variables encompass temperature, precipitation, topography and substrates (**Table 1**). The 11 variables were selected because they are "independent" and representative of the predominant conditions. The climatic data were from selected BIOCLIM layers described and developed as part of ANU-CLIM version 5.1 (Hutchinson et al., 2000). The soil layers were obtained from a database that was generated as part of a national survey (National Land and Water Resources Audit, 2002).

The predictors were tested against all of the indices as single variables (e.g., mean annual temperature). The top five single variables (based on the Akaike Information Criterion, AIC, see below) were examined further as interaction variables (e.g., mean annual temperature × topography), and as additive variables (e.g., mean annual temperature + topography). Some of the grid cells did not have environmental data and were excluded from the analyses, leaving 1913 grid cells in this part of the study. Most of these points were on the coast.

#### Generalized Linear Models

Because the indices do not have a normal distribution, standard linear models (LM) were not appropriate for analyzing their relationship to the 11 environmental variables. Based on examination of the data, a Poisson distribution best described the GR data, and a gamma distribution was most appropriate for the PD data. Consequently, generalized linear models (GLMs) were used because they allow for data that are not normally distributed, and a GLM was calculated for each variable and diversity metric with their appropriate distributions. A different approach was required for the PDrand results since they are either significant or not, that is, p ≤ 0.05. The statistically significant values were transformed into 1 and all other values were set to 0. These transformed data were then analyzed using GLMs with binomial distributions; the binomial distribution is appropriate for a binary (0/1) response, whereas the Poisson and gamma distributions used above describe counts and positive continuous values, respectively. The higher and lower than expected PDrand were analyzed separately (high PDrand = 1 or low PDrand = 1). In total there were 40 high PDrand values and 84 low PDrand values. For each GLM, the following were recorded: AIC, percent of deviance explained (equivalent to r²), the statistical significance of the z-score of Wald's test, and whether the slope of the line was positive or negative. AIC values within 3 units of the best model were considered equally informative.
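Two of the bookkeeping steps above, the 0/1 coding of PDrand and the comparison of AIC values, can be sketched in a few lines. The model names and AIC values are invented for illustration, and treating "within 3 units" as a difference of at most 3 is an assumption.

```python
def binary_response(labels, positive):
    """Code cells as 1 if they carry the given significance label, else 0."""
    return [1 if label == positive else 0 for label in labels]

def delta_aic(aic_by_model):
    """Difference between each model's AIC and the lowest AIC."""
    best = min(aic_by_model.values())
    return {model: aic - best for model, aic in aic_by_model.items()}

def equally_informative(aic_by_model, threshold=3.0):
    """Models whose AIC lies within `threshold` units of the best model."""
    deltas = delta_aic(aic_by_model)
    return sorted(m for m, d in deltas.items() if d <= threshold)

# Hypothetical randomization outcomes for five cells
labels = ["high", "ns", "low", "ns", "high"]
high_response = binary_response(labels, "high")   # analyzed separately
low_response = binary_response(labels, "low")     # from the "low" response

# Hypothetical AIC values for three candidate models
aics = {"precip + radiation": 1000.0, "precip x radiation": 1002.0, "precip": 1010.5}
top_models = equally_informative(aics)
```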

#### Detecting Spatial Autocorrelation Using Moran's I

To test whether there are biases due to spatial autocorrelation, Moran's I was calculated on the residuals of each standard linear model (LM). Spatial autocorrelation can yield misleading results because points that are close to each other will tend to share the same taxa, and are not independent (Dormann et al., 2007). Testing for spatial autocorrelation required a series of steps.

Firstly, a matrix was created that establishes which points are neighbors (referred to as a neighbors list), thus identifying the adjacent grid cells that could be affected by spatial autocorrelation. A distance criterion based on grid cell size was used to define the nearest neighbors. Since the grid cells were 50 × 50 km, all eight adjoining grid cells lie within a radius of 75 km (the diagonal neighbors are about 71 km away), and so this latter value was set as the distance value for defining the nearest neighbors. Of the 1913 grid cells, 18 cells did not have neighbors within 75 km, and were subsequently excluded from the analyses (Figure S3). For the PDrand data, neighbors were defined at either 150 or 300 km because the radius of 75 km resulted in too few neighbors to draw any meaningful conclusions. There were 40 PDrand high values, and when using the smaller radius size, 15 of these points were deleted because they had no neighbors. In contrast, using the larger radius size required deletion of only four points (Figures S4A,B). For the PDrand low analyses there were 84 values, and at the 150 km radius nine points were removed, while four were removed at the 300 km level (Figures S4C,D).
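A distance-based neighbors list of this kind can be sketched as below. The study built its list with the spdep package in R; this Python version is only an illustration, assuming cell centroids in projected kilometre coordinates, and the grid layout and names are hypothetical.

```python
import math

def neighbours_within(centroids, radius_km):
    """Map each cell to the other cells whose centroids lie within radius_km."""
    neighbours = {cell: [] for cell in centroids}
    for a, (xa, ya) in centroids.items():
        for b, (xb, yb) in centroids.items():
            if a != b and math.hypot(xa - xb, ya - yb) <= radius_km:
                neighbours[a].append(b)
    return neighbours

# Hypothetical 3 x 3 block of 50 km cells plus one isolated cell (km units)
centroids = {(i, j): (50.0 * i, 50.0 * j) for i in range(3) for j in range(3)}
centroids["isolated"] = (500.0, 500.0)

nb = neighbours_within(centroids, 75.0)
no_neighbours = [cell for cell, found in nb.items() if not found]
```

With a 75 km radius the central cell picks up all eight adjoining cells, while a cell with no neighbors is flagged so that it can be excluded, as was done for the 18 cells in the study.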


Secondly, weights were assigned to the neighbor relationships (via a spatial weights matrix). All of the relationships were given equal values of 1. Finally, the Moran's I global tests were run using the spatial weights matrix and a LM for each of the environmental variables and each of the indices. Moran's I values around zero indicate that there is no spatial autocorrelation in the residuals of the LMs and so neighbors have random values that are not linked. Moran's I values toward 1 indicate a positive correlation where adjoining grid cells are likely to share the same value (high-high; low-low); values toward −1 indicate a negative correlation suggesting that neighbors are more likely to have opposite values (low-high). The p-values of the Moran's I results were also recorded.
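For readers unfamiliar with the statistic, the following sketch computes global Moran's I with every neighbor relationship weighted 1, as described above. It is a plain-Python illustration rather than the spdep routine used in the study, the four-cell chain is hypothetical, and the input values stand in for model residuals.

```python
def morans_i(values, neighbours):
    """Global Moran's I with a weight of 1 for every neighbour relationship."""
    cells = list(values)
    n = len(cells)
    mean = sum(values.values()) / n
    dev = {c: values[c] - mean for c in cells}
    s0 = sum(len(neighbours[c]) for c in cells)            # sum of all weights
    cross = sum(dev[c] * dev[d] for c in cells for d in neighbours[c])
    denom = sum(d * d for d in dev.values())
    return (n / s0) * cross / denom

# Hypothetical chain of four cells: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i({0: 1.0, 1: 1.0, 2: 0.0, 3: 0.0}, chain)    # like values adjoin
alternating = morans_i({0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0}, chain)  # opposite values adjoin
```

The clustered arrangement gives a positive value and the alternating arrangement gives a negative one, matching the interpretation of the sign given in the text.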

#### Accounting for Spatial Autocorrelation Using Spatial Autoregressive Models

To account for spatial autocorrelation, spatial autoregressive (SAR) models were used (Dormann et al., 2007; Kissling and Carl, 2007). There are three types of SARs, each of which accounts for autocorrelation via the addition of an extra term, a covariance matrix, to a standard linear model (LM). (1) SARlag is the lagged response model, which accounts for spatial autocorrelation by adding a term for spatial autocorrelation in the response variable (in this study, the richness indices), (2) SARmix is the lagged mixed model accounting for autocorrelations in both the indices and the environmental variables (the response and predictor variables), and (3) SARerr is a spatial error model that accounts for autocorrelation via an error term (i.e., in neither the indices nor the variables) (Dormann et al., 2007). For our data, the three SAR models (SARerr, SARlag, and SARmix) identified the same variables as the best models; however, SARmix occasionally yielded some differences from the other two models (not shown). Given the high error rate in SARmix identified in earlier studies, this result is not surprising and is likely to be a type I error (Dormann et al., 2007; Kissling and Carl, 2007). Tests of all three models indicate that SARerr has the least bias and error among all the SAR models (Kissling and Carl, 2007). Therefore, we present the results of SARerr here.
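For orientation, the three model types are commonly written in the following general form. The notation is an assumption introduced here and is not defined in this article: Y is the response, X the matrix of predictors with coefficients β, W the spatial weights matrix, ρ and λ the spatial autoregression coefficients, γ the coefficients of the spatially lagged predictors, u the spatially dependent error, and ε the independent error.

```latex
\begin{aligned}
\mathrm{SAR_{lag}}:&\quad Y = \rho W Y + X\beta + \varepsilon \\
\mathrm{SAR_{mix}}:&\quad Y = \rho W Y + X\beta + W X \gamma + \varepsilon \\
\mathrm{SAR_{err}}:&\quad Y = X\beta + \lambda W u + \varepsilon
\end{aligned}
```

Written this way, the difference between the models is where the spatial term enters: in the response, in both the response and the predictors, or only in the error.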

Following the same steps as for the calculation of Moran's I, a neighbor list was constructed, followed by a spatial weights matrix; both were identical to those used for calculating Moran's I. The SARs were then calculated using a LM for each of the environmental variables and each of the indices, together with the spatial weights matrix and the extra autocorrelation term. AICs were calculated for each model to determine which of the environmental variables best fitted each of the indices. The significance value for each SAR model was also recorded. SARs were calculated for all of the indices except for PDrand, which consists of statistical significance values and is therefore not appropriate for analysis using a LM.

The GLMs, Moran's I, and the SARs were calculated using the spdep package version 0.5–56 (Bivand, 2014) in R version 2.15.3 (R Core Team, 2014).

## Results

#### Richness

The majority of fern richness is found along the eastern coast in Australia (**Figure 2A**). The greatest richness occurs in the Wet Tropics in Queensland with 80 genera (red-orange, **Figure 2A**), while the Border Ranges (between Queensland and New South Wales) is the second richest area with approximately 45 genera (yellow, **Figure 2A**). There is a continuous tract of low richness (approximately 30–40 genera, light blue **Figure 2A**) extending from Tasmania along the east coast of the continent. However, the vast majority of Australia has poor richness (less than 10 genera, dark blue, **Figure 2A**). Correcting for sampling using Margalef richness (MR) (**Figure 2B**) intensifies the patterns observed in uncorrected richness (**Figure 2A**). Namely, the east coast tract of low richness increases to medium richness with patches of medium-high richness. These patches correspond to the Border Ranges and Sydney Sandstone regions.

#### Phylogenetic Diversity

Phylogenetic diversity (PD) (**Figure 2C**) shows similar patterns to richness (**Figure 2A**), but most especially to MR (**Figure 2B**). In agreement with the two richness metrics, PD also shows the Wet Tropics as the greatest hotspot with 90% (0.90) of the phylogeny represented. Notable similarities between MR and PD are the medium-high values in the Border Ranges and the Sydney Sandstone regions, with about 50–60% of the tree present for PD (**Figure 2C**). Also, the Northern Territory and Tasmania have corresponding regions of low MR and low PD (**Figures 2B,C** light blue). As for the richness metrics, the overwhelming pattern is that most of the continent has poor PD (dark blue, **Figure 2C**), and that the east coast hosts most of the fern PD (light blue-yellow-orange-red, **Figure 2C**).

#### Statistically Significant Phylogenetic Diversity

The phylogenetic diversity randomizations showed that cells with significantly high PDrand were mostly separate from those with low PDrand (**Figure 2D**). The high PDrand regions are concentrated toward the north of Australia, in the most northern parts of the Northern Territory and Queensland (purples, **Figure 2D**), whereas the low PDrand regions are aggregated along the east coast (teals, **Figure 2D**). We note that PDrand and PD standardized by richness (not shown) yielded identical results, indicating that PDrand is not sensitive to sampling bias.

The significantly high PDrand cells are characterized by the overrepresentation of taxa with long branches representing disparate clades of the phylogeny. Principally these are the combination of a long-branched early diverging lineage (Schizaeales) and a long-branched derived family, Pteridaceae. The genus Lindsaea is typically found in these cells, and in some cells the early diverging lineages Salviniales and Gleicheniales are present too. Alternatively, significantly high PDrand cells can have one taxon representing a particularly long branch, either Acrostichum or Ceratopteris. Conversely, the significantly low PDrand cells can have one taxon with an especially short branch, for example Abrodictyum in the early diverging Hymenophyllaceae. Examination of the taxonomic composition of the cells that have significantly low PDrand reveals three additional ways in which low PDrand can arise. First, there are few taxa in that cell, and they are all restricted to one family, such as Pteridaceae. Second, there is a moderate number of taxa but they are restricted to one clade, the tree fern + polypod clade, with the early diverging lineages not represented. Third, there are again moderate to high numbers of taxa, but here the early diverging lineages are represented, while the eupolypods I (composed of Nephrolepidaceae, Lomariopsidaceae and Polypodiaceae) are missing.

FIGURE 2 | Maps showing grid cell values for (A) richness, (B) Margalef richness, (C) phylogenetic diversity (as a proportion of total phylogenetic diversity), and (D) significant phylogenetic diversity identified via randomization.

#### Predictors

When the indices were correlated with the environmental variables, the GLMs for richness and PD indicated that annual precipitation together with mean annual radiation, as additive or interactive variables, is the best predictor (**Table 2**). In the case of Margalef richness, the GLM with the lowest AIC was annual precipitation together with seasonality in temperature. However, Moran's I values from 0.48 to 0.59 show that these results are biased by spatial autocorrelation (**Table 2**). When spatial autocorrelation is taken into account using SARs, the best predictor for richness changes to annual precipitation by topography (ridge top). For both Margalef richness and PD, the models recovered using GLM and SAR are identical. All of the best performing models are explained by a positive interaction, and all included a term for annual precipitation, indicating the importance of water in determining the diversity of ferns (**Table 2**). Regardless of the metric used, the two-factor models outperformed all of the single-factor models.

TABLE 2 | […] metrics. […] deviance explained; I, Moran's I of residuals of linear model, note that all Moran's I are highly significant ≤ 0.001. Significance measures are reported as z- and p-values. The z-score is given for the simultaneous autoregressive models, and the p-value is given for the generalized linear model. Abbreviations for significance levels: \* ≤ 0.05; \*\* ≤ 0.01; \*\*\* ≤ 0.001; ns, not significant. See Table 1 for environmental variables; "+" indicates additive model of the two variables, and "×" indicates an interactive model for the two variables. Degrees of freedom are provided in Table 3.

TABLE 3 | Statistical relationships between environmental variables and PDrand, randomized phylogenetic diversity, with higher (high PDrand) or lower (low PDrand) than expected PD for statistically significant cells only.

See Table 2 for abbreviations and full explanation.

The randomizations of PD are explained by a different set of environmental factors compared to the other indices (**Table 3**). In addition, high PDrand and low PDrand are each explained by different environmental factors. The high PDrand areas correlate with mean annual radiation, either added to or interacting with temperature seasonality (0 compared to 2 ΔAIC, respectively; **Table 3**), and a close model was temperature seasonality plus precipitation in the coldest quarter (3 ΔAIC; **Table 3**). Overall, the highest performing models for high PDrand all had temperature seasonality in common. The low PDrand areas also correlate with temperature seasonality, but interacting with annual precipitation. At both the 150 and 300 km distances used to define neighbors, Moran's I values were all close to zero, indicating that spatial autocorrelation was not present in these datasets (**Table 3**).

## Discussion

### Comparing Diversity Metrics and Hotspots

Using the largest fern dataset assembled to date, we find that the Wet Tropics is the most diverse region identified using three different diversity metrics (red cells in **Figures 2A–C**). When richness is compared to PD there are regions that have greater phylogenetic diversity than richness. However, when uneven sampling effort is accounted for, PD and richness (measured as Margalef richness) are generally in agreement. Additional regions with significant diversity are the Border Ranges and Sydney Sandstone, both in coastal eastern Australia (light blue in **Figure 2A** vs. orange-yellow in **Figure 2C**), and Kakadu–Alligator Rivers in the Northern Territory, the Kimberley region in northwest Western Australia, southwest Western Australia, and Tasmania (dark blue in **Figure 2A** vs. light blue-yellow in **Figure 2C**).

The disparity between uncorrected richness and phylogenetic diversity indicates that richness is not necessarily predictive of phylogenetic diversity. Studies elsewhere have shown a disparity between richness and phylogenetic diversity, sometimes when sampling has been accounted for (Davies et al., 2007; Forest et al., 2007; Huang et al., 2011). Such disparity emphasizes that caution is needed when using richness alone as a metric of diversity. Thus, we focus on Margalef richness results because the sampling bias has been corrected. From a conservation viewpoint, diversity must be assessed using sample standardization for richness as well as using phylogenetic diversity. Regions with high phylogenetic diversity can harbor unrecognized diversity value (Moritz and Faith, 2002; Rosauer and Mooers, 2013), and may be culturally and medicinally useful too (Forest et al., 2007).

In an earlier study, fern richness across Australia was documented at the species scale and using a dataset approximately half the size of the present study (Bickford and Laffan, 2006). Differences in the datasets are most evident in the increased sampling in inland regions. Regardless of the dataset size (Figure S1A our study, vs. Figure 2A, Bickford and Laffan, 2006), or the taxonomic level (**Figure 1A** our study vs. Figure S1A our study), the greatest richness remains along the east coast of Australia, with a substantial hotspot in the Wet Tropics. This Wet Tropics hotspot was earlier detected for a dataset of vascular plants composed principally of angiosperms (Crisp et al., 2001). However, other significant angiosperm hotspots, including the significant southwest Western Australia hotspot, are not shared with ferns (Crisp et al., 2001; González-Orozco et al., 2011, 2014; Schmidt-Lebuhn et al., 2012; Kooyman et al., 2013); this is not surprising given the more arid conditions in these latter hotspots, as well as the dependence on water for ferns during the reproductive phase of their life cycle, and preference for moist conditions. The inconsistency among hotspots of various floristic groups indicates that hotspots need to be inferred on a case-by-case basis. These inconsistencies may be the result of dissimilar ecological preferences as well as different diversification histories. Interestingly, two angiosperm hotspots, the Border Ranges and Sydney Sandstone, are only observed as fern hotspots when using Margalef richness and phylogenetic diversity. On the other hand, liverworts and mosses show greatest diversity along the east coast (Stevenson et al., 2012), largely matching the pattern seen in the ferns (Nagalingum et al., 2014). These corresponding patterns likely reflect the more critical requirement for water of all of these seed-free plant groups, and its greater availability in these regions.

### Explaining the Distribution of Diversity

Differences in diversity distribution have been attributed to a variety of mechanisms. In ferns, the overwhelming pattern is the near-absence of diversity in the central arid interior of the continent, and a hotspot in the Wet Tropics. The Wet Tropics likely represents the ancestral niche for ferns, whereas survival in arid biomes requires a suite of adaptations that have arisen in only two of the 89 genera examined here. This pattern fits the ancestral niche hypothesis, which predicts that greater diversity will be present in an ancestral niche because more taxa have accumulated there as the group has remained in that niche; at the same time, the group is unable to disperse to other niches without the evolution of suitable adaptations (Crisp et al., 2009; Wiens et al., 2010). In addition, a range of historical factors has been used to explain the previously observed greater diversity in the tropics. These include elevated speciation rates and decreased extinction rates (compared to extra-tropical regions; the cradle-museum hypothesis), a greater geographical extent of tropical rainforests in the past (the area effect), and an older age of the tropics (the age effect) (Qian and Ricklefs, 2004; Wiens and Donoghue, 2004). Although we were not able to conduct a detailed examination of traits and historical factors with the present dataset, it is likely that both have shaped the distribution patterns we observed for the ferns.

Biodiversity hotspots and underlying differences in diversity distribution are commonly linked to current climatic and environmental conditions, although this has led to debate over the relative roles of current and historical factors (Currie and Paquin, 1987; Latham and Ricklefs, 1993; Qian and Ricklefs, 2000; Francis and Currie, 2003; Ricklefs, 2004). Regardless of the scale, from global modeling studies (Kreft and Jetz, 2007; Kreft et al., 2010) to regional meta-analyses (Lehmann et al., 2002; Bickford and Laffan, 2006), plot-based elevational transects (Kessler, 2000, 2001; Bhattarai et al., 2004; Grytnes and Beaman, 2006; Kluge et al., 2006; Kessler et al., 2011) and regional plot surveys (Aldasoro et al., 2004), water is the leading determinant for fern richness. Indeed, all of the best performing models in our analyses show that the two richness metrics as well as phylogenetic diversity are explained by annual precipitation combined with a different variable (**Table 2**). At a global scale, water availability combined with elevation and temperature is most important for richness of pteridophytes (ferns and lycophytes), and also for plants in general (Kreft and Jetz, 2007; Kreft et al., 2010).

A global analysis of pteridophytes indicates that greatest richness occurs in wet tropical regions with increasing elevation, referred to as topographic complexity (Kessler, 2010; Kreft et al., 2010). The role of elevation is confirmed in our analyses, with topography combined with annual precipitation best explaining fern richness (but we note that topography is not included in the best-performing models for corrected richness and phylogenetic diversity; **Table 2**). Other regional analyses have also demonstrated the importance of elevation (Dzwonko and Kornaś, 1994; Ferrer-Castan and Vetaas, 2005). It is thought that topography contributes to greater richness because increasing elevations are associated with variability in substrates (including microhabitats), climate, and environment, all yielding a greater diversity of niches for more taxa (Aldasoro et al., 2004; Moran, 2008; Moreno Saiz and Lobo, 2008; Kessler, 2010; Kessler et al., 2011). However, topography alone does not necessarily predict hotspots. In New Zealand, richness is greatest in the North Island (Lehmann et al., 2002), which is considerably less mountainous than the South Island. Instead, the North Island hotspot is associated with the warmest climate zone. At particularly high elevations, the elevation-richness relationship is not observed, likely because frost limits fern diversity (Kessler, 2000, 2001; Lehmann et al., 2002; Bhattarai et al., 2004; Grytnes and Beaman, 2006; Kluge et al., 2006; Kessler et al., 2011).

Regions with greater Margalef richness are explained by increases in both precipitation and seasonal temperature extremes. This is in contrast to uncorrected richness, which correlates with increases in precipitation and topography. The difference suggests that sampling biases affect the richness results. These biases may be due to greater sampling effort in topographically complex areas, and the removal of the poorly sampled cells (via the Margalef calculation). The relationship between seasonal temperatures and fern diversity is surprising compared to other findings for ferns (Kessler, 2010) and for trees (Currie and Paquin, 1987). However, seasonality has been identified as a mechanism that enables plants requiring different niches to co-occur (Pausas et al., 2003).

The positive relationship between radiation and phylogenetic diversity suggests that available energy is a limiting factor, conforming to the species-energy hypothesis. Several studies have concluded that plant productivity controls richness (Currie, 1991 and references therein), and for ferns, the link between productivity and species richness has been mechanistically attributed to competition and niche availability (Kessler et al., 2014). However, the mechanisms explaining fern phylogenetic diversity and radiation are unresolved. It has been observed that there is a predominance of epiphytes in tropical rainforests (Gentry and Dodson, 1987; Gentry, 1992). Thus, with low light availability in tropical rainforests, perhaps phylogenetic breadth is limited to epiphytes (which occur in only a few fern clades, Schuettpelz and Pryer, 2009), whereas with greater radiation there is more light available to support a broader range from epiphytic to terrestrial forms. Overall, the relationship between plant phylogenetic diversity and environment is poorly understood, with the exception of a few analyses (Williams et al., 2010). It remains to be determined if radiation is important for other plant groups, and if radiation is causally linked to phylogenetic diversity.

Interestingly, we found different environmental factors each explaining richness, corrected richness, and phylogenetic diversity. As noted above, annual precipitation is common to all of the best models: for richness it combines with topography, for corrected richness it interacts with temperature seasonality, while for phylogenetic diversity it combines with mean annual radiation (**Table 2**). In general, assessment of fern diversity with environmental factors is conducted for single variables, which unsurprisingly identifies precipitation/humidity as the most important factor (Aldasoro et al., 2004; Bickford and Laffan, 2006; Kessler, 2010). Using two variables, we find that models with single variables were always outperformed by those with two variables (**Table 2**). We also employed spatial autoregression, which accounted for non-independence of the data by incorporating a spatial term into the models (Dormann et al., 2007; Kissling and Carl, 2007). The models for Margalef richness and phylogenetic diversity were identical using a generalized linear model and the spatial autoregression model; however, the models differed for richness (**Table 2**). This result indicates that multivariate models as well as spatial terms need to be considered when inferring the relationship between diversity and environment.

Many fern species distributions are shaped by rock and soil types (Kessler, 2010 and references therein), however, our analyses indicate that fern diversity hotspots are not related to rock and soil substrate (**Table 2**). In fact, rock grain size, sand, and clay all performed extremely poorly in our models. On the other hand, soil fertility (measured as C/N ratio) best explained fern richness for plots across Uganda (Lwanga et al., 1998). It is possible that fertility may have a strong relationship to richness at the continental-scale, but soil nutrient profiles have been difficult to obtain for larger areas, and future studies are needed to further examine the relationship (Kessler, 2010).

### Differences in Phylogenetic Representation Over the Landscape

As discussed above, richness is not necessarily predictive of phylogenetic diversity, despite their general correlation. In this study, we used a randomized phylogenetic diversity test to identify quantitatively places where the expected correlation does not hold. This method discerns whether a grid cell has a significantly higher or lower representation of the phylogeny compared to a random sampling of the same number of taxa. In the case of significantly high PDrand (randomized phylogenetic diversity) there is more of the phylogeny represented in that grid cell than expected; this is otherwise known as "phylogenetic overdispersion." Possible explanations include ecological competition that excludes close relatives, or biogeographic processes creating refugia (Webb et al., 2002; Hennequin et al., 2014). Alternatively, when there is significantly low PDrand there is less of the phylogeny represented than expected, and this is referred to as "phylogenetic clustering." Possible explanations include ecological filtering where close relatives have the same habitat requirements, or evolutionarily recent radiations (Webb et al., 2002; Hennequin et al., 2014).

It is also important to note that PD is not necessarily predictive of significant PDrand. For example, some regions, such as the Wet Tropics, are high in PD yet significantly low in PDrand (perhaps due to ecological factors as discussed below). Conversely, some areas, such as the Kakadu–Alligator Rivers region in the Northern Territory, have only low to moderate levels of PD, but significantly high PDrand (perhaps due to ecological and biogeographical factors as discussed below). The ability to detect such regions with unusually low or high values of PD shows the value of the PDrand test used here.

We found that there are two strong geographic patterns in the distribution of high and low PDrand cells. The high PDrand cells are clustered in the north of the continent (**Figure 2D**, purples), whereas the low PDrand cells are concentrated along the east coast (**Figure 2D**, teals). Given the strong geographic pattern, it is not surprising that there are differences in the environmental conditions between these two regions (**Table 3**). High PDrand regions are negatively associated with seasonality in temperature, which means that high PDrand is linked to temperature stability. Equable, stable temperatures may be more favorable to a wider range of lineages across the phylogeny because they do not need to evolve adaptations to extremes in temperature.

Significantly low PDrand is positively associated with annual precipitation and temperature seasonality. The latter is indicative of extremes in temperature; however, not all fern clades are able to tolerate cold conditions. Indeed, in low PDrand grid cells, families that prefer warmer temperatures are absent, such as the Polypodiaceae (with the exception of Grammitis, which occurs in less than one-quarter of low PDrand cells), while families that are cold tolerant, such as the Blechnaceae and Dennstaedtiaceae, are present. Alternatively, the absence of Polypodiaceae may be due to the absence of suitable host trees for this largely epiphytic group. It is also possible that such extremes in conditions limit dispersal due to the absence of suitable traits (Zanne et al., 2014), which for ferns include easily dispersed spores and underground rhizomes, and also promote extinction, although extinction may not necessarily be related to temperature.

## Concluding Remarks

Using the largest fern dataset to date, we have identified several areas as hotspots for richness, phylogenetic diversity, and significantly high or low randomized phylogenetic diversity across Australia. Notably, the use of several metrics identifies different or additional areas of importance. The PDrand measure identifies novel areas of diversity significance compared to the two other metrics we used, and these regions have not been revealed in any other analyses. We suggest that these areas are of particular evolutionary and conservation importance and a detailed analysis of them is needed in future studies. Notably, environmental predictors explain the distribution of various hotspots, and in turn, the different metrics are predicted by different environmental variables. However, with the onset of changing climate, the conditions that support greatest diversity today will change in the future, e.g., increased and decreased rainfall across Australia (Bureau of Meteorology and CSIRO, 2014), and will likely impact species distributions and thus, diversity patterns.

We note that these broad scale continental-level patterns were obtained using digitized herbarium records. With increasing digitization efforts across the globe we will be able to conduct even larger scale analyses (without having to rely on modeled distributions, Lehmann et al., 2002; Kreft and Jetz, 2007) as has been performed for global marine distribution records (Tittensor et al., 2010). Furthermore, our study is a proof-of-concept that it is possible and vital to incorporate evolutionary metrics when inferring biodiversity hotspots from large compilations of data.

## Author Contributions

NSN and BDM designed the study. NSN, NK, and AHT assembled the data. NSN, NK, SWL, and CEG-O analyzed the data. NSN prepared the manuscript, and all authors edited the manuscript.

## Acknowledgments

We thank Carolyn Connelly from the Royal Botanic Garden and Domain Trust for assistance with laboratory techniques, Nada Sankowsky for providing fern specimens, and CSIRO (Australia) for a Distinguished Visiting Scientist Award to BDM. Brian Enquist, Sabine Hennequin, Paul Manos, Michelle McMahon, Michael Sanderson and Maurizio Rossetto are thanked for helpful discussions. The reviewers are thanked for their comments on the manuscript.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00132/abstract




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Nagalingum, Knerr, Laffan, González-Orozco, Thornhill, Miller and Mishler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Revisiting the vanishing refuge model of diversification

*Roberta Damasceno<sup>1,2</sup>\*†, Maria L. Strangas<sup>3</sup>†, Ana C. Carnaval<sup>3,4</sup>, Miguel T. Rodrigues<sup>2</sup> and Craig Moritz<sup>1,5</sup>*

<sup>1</sup> Museum of Vertebrate Zoology, Integrative Biology Department, University of California Berkeley, Berkeley, CA, USA

<sup>2</sup> Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil

<sup>3</sup> Biology Department, The Graduate Center, City University of New York, New York, NY, USA

<sup>4</sup> Biology Department, City College, City University of New York, New York, NY, USA

<sup>5</sup> Research School of Biology, The Australian National University, Acton, ACT, Australia

#### *Edited by:*

Toby Pennington, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

Olivier Hardy, Université Libre de Bruxelles, Belgium

Jonathan B. Losos, Harvard University, USA

#### *\*Correspondence:*

Roberta Damasceno, Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, travessa 14, nº 101, São Paulo 05508-090, Brazil; e-mail: rpdama@gmail.br

†Roberta Damasceno and Maria L. Strangas have contributed equally to this work.

Much of the debate around speciation and historical biogeography has focused on the role of stabilizing selection on the physiological (abiotic) niche, emphasizing how isolation and vicariance, when associated with niche conservatism, may drive tropical speciation. Yet, recent re-emphasis on the ecological dimensions of speciation points to a more prominent role of divergent selection in driving genetic, phenotypic, and niche divergence. The vanishing refuge model (VRM), first described by Vanzolini and Williams (1981), describes a process of diversification through climate-driven habitat fragmentation and exposure to new environments, integrating both vicariance and divergent selection. This model suggests that dynamic climates and peripheral isolates can lead to genetic and functional (i.e., ecological and phenotypic) diversity, resulting in sister taxa that occupy contrasting habitats with abutting distributions. Here, we provide predictions for populations undergoing divergence according to the VRM that encompass habitat dynamics, phylogeography, and phenotypic differentiation across populations. Such integrative analyses can, in principle, differentiate the operation of the VRM from other speciation models. We apply these principles to a lizard species, *Coleodactylus meridionalis*, which was used to illustrate the model in the original paper. We incorporate data on inferred historic habitat dynamics, phylogeography and thermal physiology to test for divergence between coastal and inland populations in the Atlantic Forest of Brazil. Environmental and genetic analyses are concordant with divergence through the VRM, yet physiological data are not. We emphasize the importance of multidisciplinary approaches to test this and alternative speciation models while seeking to explain the extraordinarily high genetic and phenotypic diversity of tropical biomes.

**Keywords: vanishing refuge model, speciation, diversification, phenotypic evolution, habitat stability, niche evolution**

#### **INTRODUCTION**

Speciation is a dynamic, multifaceted, and continuous process (Mayr, 1963; de Queiroz, 2007; Mallet, 2007; Berner et al., 2009; Peccoud et al., 2009; Sobel et al., 2010). Much of the debate around speciation and historical biogeography has focused on the role of stabilizing selection on the physiological (abiotic) niche, emphasizing how isolation and vicariance, when associated with niche conservatism, may drive tropical speciation (e.g., Janzen, 1967; Wiens and Graham, 2005). Yet, the recent re-emphasis on the ecological dimensions of speciation points to a more prominent role of divergent selection in driving genetic, phenotypic, and niche divergence (e.g., Endler, 1977; Schluter, 2009; Nosil, 2012). We focus on one opportunity for divergent selection to drive speciation through climate-driven biome shifts. More specifically, we revisit one model of speciation, first referred to by Williams and Vanzolini (1980) and Vanzolini (1981), and formally proposed by Vanzolini and Williams (1981): the vanishing refuge model (VRM). This model predicts vicariance and subsequent divergent selection as habitats change over time. Developed to explain the distribution of sister species in adjacent yet environmentally contrasting biomes, Vanzolini and Williams' (1981) model states that habitat shifts can lead to genetically and phenotypically divergent species, without invoking mechanisms of divergence with gene flow such as parapatric speciation (e.g., Endler, 1977, 1982):

*"Some populations of forest-restricted species may be pre-adapted to life in open formations. If, during the dry part of a climatic cycle, they happen to be confined to a refuge that eventually vanishes, they may, in the process, become completely adapted to open formation conditions and constitute a full ecological variant."*

(Vanzolini and Williams, 1981)

Vanzolini and Williams (1981) proposed the VRM as a variant of allopatric speciation, foreseeing eco-geographic isolation (Sobel et al., 2010), genetic and phenotypic divergence, and, ultimately, speciation, as original forest habitats shrink and disappear. While the VRM builds on the Pleistocene refuge hypothesis (PRH; Haffer, 1969), the two models are clearly distinct. Both the PRH and the VRM stress the geographic setting of allopatric divergence, yet the PRH addresses divergence across isolated patches of similar habitats (e.g., forest refugia), whereas the VRM specifically focuses on divergent evolutionary trajectories across distinct habitat types (continuing forest vs. former forest but now savanna). Speciation and phylogeographic breaks across biomes, as well as eco-phenotypic divergence in response to climate-driven changes in habitat distribution, are uniquely under the domain of the VRM. Yet, the VRM has sometimes been inappropriately linked to speciation across similar (e.g., forest) habitats (e.g., Viljanen, 2009; Wirta, 2009; de Mello Martins, 2011; Prado et al., 2011; de Carvalho et al., 2013).

Revisiting and refining the VRM is relevant for present-day discussions about the drivers of diversification, particularly given the renaissance of ecological and biogeographic thinking in speciation studies, and the new methods used to infer population and biogeographic history. This often neglected model mirrors recent emphasis on the role of ecology and adaptive divergence in speciation processes (Nosil, 2012; Arnegard et al., 2014). Though the VRM has been invoked to explain biogeographical patterns in the tropics (Almeida et al., 2007; Graham et al., 2010; Lim and Sheldon, 2011), it has not, to our knowledge, been tested explicitly. Moreover, the predictions associated with the VRM relative to alternative speciation models, specifically in the context of climate-induced shifts in habitat distributions, have not been clearly defined.

As originally illustrated by Vanzolini and Williams (1981), the VRM describes a process of divergence in which late Pleistocene climatic oscillations led to forest fragmentation, adaptation to new environments, and subsequent speciation. Importantly, however, the model can be used to describe divergence, habitat change, and phenotypic disparity driven by fragmentation of any habitat type, followed by vanishing of the original habitat, and at different time scales.

In this paper, we (1) articulate why and how the VRM contributes to ongoing discussions about the links between diversification and niche evolution; (2) make clear predictions about the expected patterns of genetic structure, phylogenetic relationships, realized niches, and phenotypic disparity resulting from the processes described by the VRM; (3) consider alternative historical processes that can result in diversity patterns similar to those expected under the VRM; (4) discuss multidisciplinary, integrative approaches to test the model; and (5) illustrate an initial test of the predictions of the VRM using new data on the distribution, genetics and physiology of lizards in the Brazilian Atlantic Forest and adjacent dry biomes – the same habitats and taxa used by Vanzolini and Williams (1981) when they first proposed the model.

## **THE VRM AND LINKS BETWEEN DIVERSIFICATION AND NICHE EVOLUTION**

In general, theory on speciation processes considers the interaction of suppression of gene flow with opportunity for divergent selection and/or genetic drift (Gavrilets, 2003). The VRM assumes that populations of species have become genetically isolated following climate-driven habitat fragmentation of their forest habitats – i.e., allopatric divergence, and that one or more such isolates are then subject to loss of the ancestral (forest) habitat. This combination of isolation and habitat change provides the context for strong differential selection on both biotic and abiotic niche axes, which will enhance the probability of speciation due to rapid build up of genetic incompatibilities (Gavrilets, 2003), ecological barriers to genetically effective dispersal, and potential for correlated responses on mate choice (Nosil, 2012). The PRH also assumes climate-driven fragmentation of forest habitats, but the isolated populations remain in a habitat that is largely similar to that of the ancestral population, therefore being subject to similar selection pressures. In this context, for instance, "mutation-order speciation" posits that similar selection processes operating in isolated populations occupying analogous habitats can nonetheless result in genetic incompatibility because different and incompatible mutations are favored by selection (e.g., Nosil and Flaxman, 2011). This process, which requires long-term isolation, might apply to frequently reported cases of eco-morphologically cryptic speciation among long isolated forest refugia (e.g., Singhal and Moritz, 2013).

Whether the opportunity for speciation under the VRM will be realized depends on the remnant, initially forest-associated, populations remaining viable under divergent selection. Given potentially rapid environmental change, persistence of populations will be enhanced by (i) the presence of standing genetic variation on which selection can act, (ii) plasticity in key traits to buffer the demographic costs of selection, or (iii) a high rate of intrinsic growth relative to the rate of environmental change (Gomulkiewicz and Houle, 2009; Chevin et al., 2010); all "preadaptations" to biome shifts. In many cases this will not be possible, resulting in local extinction. Between these two extremes, for instance where forested areas are reduced to small patches in a mosaic of dry habitat types, we might expect selection for broader niches due to spatially varying selection with gene flow, with concomitant change in eco-phenotype, but not a full biome shift. This process might explain phenotypic divergence in peripheral forest refugia in systems that are otherwise phenotypically conservative (Hoskin et al., 2011).

## **PREDICTIONS OF THE VRM**

Here, we extend the original formulation of the VRM to include specific predictions regarding habitat structure, phylogeography and historical demography that are pertinent to any time period and habitat type (**Table 1A**). Yet we exemplify the predictions with the original conditions illustrated by Vanzolini and Williams (1981).

Taxa most amenable to VRM diversification will occur in preferred (ancestral) habitat types (e.g., forest), yet show evidence for "pre-adaptation" (i.e., tolerance) to broader environmental conditions, possibly inferred through natural history observations, for instance through species records in edge or anthropogenic habitats.

Populations of taxa undergoing the initial stages of diversification as described by the VRM will inhabit distinct isolates of suitable habitat. While some populations will remain in more climatically stable (core) areas and under stabilizing selection, other populations will occur in patches (the vanishing refugia) that are being replaced by the surrounding, ecologically distinct matrix. The latter will hence be subject to strong directional selection.

| (A) VRM stage (from Vanzolini and Williams, 1981) | Stage description (as per Vanzolini and Williams, 1981) | Predictions | (B) Suggested tests |
|---|---|---|---|
| Step 1 | "Continuous forest areas (hachured) surrounded by open formations (stippled). "Y" is an ecotone-tolerant population of "A," a forest-restricted species." | If pre-adaptation is manifested in phenotype, expect slight differentiation (eco-morphology, eco-physiology) between A and Y. | Phenotypic comparisons among populations |
| Step 2 | "Desiccation of the climate: beginning of the dissection of the forest by open formations and of the differentiation of A in refuges." | Recent habitat connection between now isolated patches. | Paleo-habitat distribution modeling; fine-scale genetic analyses to test for population differentiation and absence of current gene flow among forest refuges |
| Step 3 | "Further dissection of the forest and differentiation of A; Y becomes definitely ecotone-adapted." | Broader physiological tolerances in Y (on individual or population level); genetic differentiation between Y and other populations. | Phenotypic comparisons among populations; phylogeographical (tree-based) analyses; test for divergence without gene flow |
| Step 4 | "Maximum aridity: full process of differentiation in refuges; Y's refuge has disappeared and the population has become fully adapted to open formations, and spatially isolated from the populations within the surviving refuges." | Distinct lineages in distinct habitat types. | Phylogeographical (tree-based) analyses; test for divergence without gene flow; phenotypic comparisons among populations |
| Step 5 | "Amelioration of the climate: beginning of the recovery of the forest; refuges begin to coalesce; Y is now a fully adapted open formation form." | Population expansion for Y'. | Population genetics approaches to test for population expansion in Y' |
| Step 6 | "Process completed: continuous forest area reconstituted; Y' a widespread, parapatric open formation species." | Independent evolutionary trajectories; phenotypic differentiation (adaptive); reproductive isolation. | Phylogenetic analyses; phenotypic comparisons among populations/species; fitness studies; breeding experiments |

**Table 1 | (A)** Description of the vanishing refuge model and associated predictions. The illustrations on the left were adapted from Vanzolini and Williams (1981); stage descriptions are provided as per the original paper. **(B)** Possible tests of the various stages described by the model.

At this initial stage, we expect to find evidence of:


Population contraction, associated with the reduction in the size of the preferred habitat and the demographic costs of selection, is possible at this stage – but not a necessary correlate of the process. As natural selection acts upon isolated populations in vanishing refugia, perhaps in combination with genetic drift, these populations will be exposed to the broader range of abiotic conditions found in ecotone environments and are expected to evolve:

- (a) Inter-individual trait variation (polymorphism), where some individuals in vanishing refugia are more tolerant of the matrix conditions (**Table 1A**, steps 1 and 2);
- (b) Greater capacity for plasticity, in terms of either developmental plasticity or (individual) reversible acclimation (**Table 1A**, steps 1 and 2; Angilletta, 2009); or
- (c) Adaptation to ecotone conditions, where all individuals are able to tolerate both original habitat and the matrix (**Table 1A**, step 3).

At later stages of the VRM process, we observe that:

(4) Sister lineages occupy distinct *types* of habitat: lineages in core areas occupy ancestral habitat types, whereas lineages evolving in vanishing refugia will have different climatic niches and occupy a distinct (matrix-like) habitat type (**Table 1A**, steps 4–6). In this instance and hereafter, we use the term "lineage" to refer to lineages of individuals (organisms) rather than lineages of genes.

As individuals of the lineage undergoing climatic niche evolution (be it through physiological, morphological, or behavioral evolution) colonize the matrix, we expect to uncover:

(5) Genetic signatures of population expansion in the newly occupied habitat (**Table 1A**, steps 5–6).

Multiple mechanisms may then contribute to:

(6) Pre- or post-mating reproductive isolation between individuals of the ancestral lineages in core habitats relative to those of the newly evolved lineage occupying the matrix, including for instance divergent sexual selection and reinforcement.

In isolation and under the new regime of selective pressures:

(7) The lineage occupying the matrix becomes further differentiated, phenotypically, from those in ancestral habitats. Because the VRM stresses the role of divergent selection, at least some of these phenotypic differences will be adaptive. Whether the physiological tolerances of the diverging lineage will remain broader than that of the ancestral population (or just shift to a different optimum) will depend on the direction of selective pressures in the matrix and hence cannot be predicted.

## **ALTERNATIVE PROCESSES THAT LEAD TO SIMILAR PATTERNS OF BIODIVERSITY**

The diversification process described by the VRM results in the pattern that originally motivated Vanzolini and Williams (1981): that of sister taxa occupying contrasting habitats with abutting distributions. Yet, this pattern may also be generated by alternative scenarios (Endler, 1982; Doebeli and Dieckmann, 2003; Losos and Glor, 2003), and should not be used as conclusive evidence for the VRM on its own (**Table 2**).

The VRM was initially proposed to avoid the assumption of divergence with gene flow, yet a parapatric mode of speciation, in which two populations diverge across an environmental gradient in the presence of continuous gene flow (Mayr, 1963; Endler, 1977; Gavrilets, 2003) could result in the same broad pattern. Under parapatric speciation, the strength of divergent selection must overcome the homogenizing effects of gene flow in order to lead to diversification and speciation (Haldane, 1930; Langerhans et al., 2003; Moore et al., 2007), and thus this process has long been considered biologically difficult. However, several recent studies have found strong evidence of speciation in the presence of gene flow (Niemiller et al., 2008; Pinho and Hey, 2010; Cooke et al., 2012). While parapatric speciation explicitly invokes divergence with gene flow, the VRM suggests divergence in the absence of gene flow.

Peripatric speciation, for instance caused by founder events (Mayr, 1942), may also result in the same geographic patterns of species distribution, gene flow, phylogenetic relationships, genetic isolation, and trait divergence through adaptive speciation as described by the VRM (e.g., Rasner et al., 2004). However, peripatric speciation does not require habitat fragmentation (though it is not incompatible with it), and is expected to result in a severe reduction in population size, which is not necessary under the VRM. The potential for peripatric speciation driven by founder events is highly controversial; though some evidence for founder event speciation has been documented (Templeton, 2008; Balakrishnan and Edwards, 2009; Matute, 2013), many argue that these events are very rare (Coyne and Orr, 2004; Walsh, 2005; Yeung et al., 2011).

## **TESTING THE VRM**

As with any discussion of refugial dynamics (Gavin et al., 2014) or of speciation (de Queiroz, 2007), multiple forms of evidence are necessary to validate this model. Indeed, no single line of evidence can distinguish between the operation of the VRM and these alternatives (**Table 2**). Given a suitable test system, we suggest integrative testing that combines habitat modeling over climatic fluctuations, genetic analyses, and phenotypic comparisons (**Table 1B**).

**Table 2 | Differences between the vanishing refuge, parapatric, and peripatric speciation models.**

#### **SELECTING A SYSTEM**

#### *Relevant geographic areas*

Regions of high climatic heterogeneity and steep environmental gradients are good candidates for operation of the VRM process because climatic changes, such as those of the Quaternary, can readily lead to habitat deterioration and fragmentation and hence to the isolation and diversification of populations. Extreme differences across habitats can also prevent organisms from experiencing similar microhabitats through behavioral thermoregulation that may otherwise shield them from divergent selection pressures (Gunderson and Leal, 2012). Local environmental analyses (e.g., environmental PCA, Robertson et al., 2001) may help to detect regions with such attributes. Correlative habitat models that map stability over time (Graham et al., 2010; Carnaval et al., 2014) should also be used to identify spatial variation in habitat stability and past habitat fragmentation.

#### *Candidate taxa*

As Vanzolini and Williams (1981) emphasize, exemplars of the VRM process are species or lineages that differ in habitat use relative to their sister taxa and the ancestral state. Appropriate candidate taxa are also expected to have sufficient standing genetic variation and a lack of internal trade-offs to evolve rapidly in response to new selection pressures (Angilletta et al., 2006; Kemp, 2007; Labra et al., 2009). Though difficult to assess directly, these characteristics may be inferred for species with high intra-population variation in key traits or observed habitat preferences. Lability in tolerance traits, as observed across the broader phylogeny (e.g., Grizante et al., 2012), may also provide indirect evidence for evolvability or pre-adaptation.

**Figure caption (fragment):** … the genetic analyses and those with stars were included in the physiology dataset. State lines are represented and the State of Bahia is labeled. Only *Coleodactylus natalensis* (morphologically and ecologically distinct clade embedded within *Coleodactylus meridionalis*) is found in locality 12.

**Figure caption (fragment):** … shown. Letters (a-f) indicate clades described in the results. Names and numbers match the map of Northern Atlantic Forest climatic stability **(B)** over the last glacial cycle (∼120 kya to present-day; data from Carnaval et al., …

Generally, we expect that low dispersal organisms will more likely diversify through vanishing refuge processes than will high dispersal species. Because the former may be unable to track habitats through movement or migration (Araújo et al., 2008; Sandel et al., 2011), therefore failing to shift their ranges in response to rapid environmental changes (Walther et al., 2002; Wiens and Graham, 2005; Velo-Anton et al., 2013), these species are more readily exposed to new and harsh selection regimes such as those assumed in the VRM.

#### **TESTING THE HYPOTHESIS**

#### *Environmental analyses*

A key approach to validating the VRM is through paleo-habitat modeling. By using bioclimatic distribution models to infer the distribution of the inferred ancestral habitat rather than that of individual species, this approach can reveal past habitat fragmentation (Graham et al., 2010; Carnaval et al., 2014) and highlight disjunct areas with lower paleostability within which VRM processes are predicted. Modeling of individual species would be inappropriate in this context, given that the VRM predicts niche evolution and species distribution modeling methods assume niche conservatism. Paleo-habitat modeling can help differentiate VRM processes from parapatric or peripatric speciation, as the latter two do not depend on climate and habitat changes (**Table 2**).
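As a rough illustration of how a habitat stability surface might be summarized from a stack of paleo-habitat models, the sketch below averages habitat suitability per grid cell across time slices and flags the least stable cells as candidate vanishing refugia. The choice of the mean as the summary, the 0.5 threshold, the function names, and the toy values are all assumptions made for illustration; they are not the procedure of the studies cited above.

```python
from statistics import mean


def habitat_stability(suitability_by_time):
    """Return per-cell stability as mean habitat suitability across time slices.

    suitability_by_time: list of time slices; each slice is a list of
    suitability scores in [0, 1], one per grid cell, in a fixed cell order.
    """
    n_cells = len(suitability_by_time[0])
    if any(len(time_slice) != n_cells for time_slice in suitability_by_time):
        raise ValueError("every time slice must cover the same grid cells")
    return [
        mean([time_slice[cell] for time_slice in suitability_by_time])
        for cell in range(n_cells)
    ]


def unstable_cells(stability, threshold=0.5):
    """Indices of cells whose stability falls below a hypothetical threshold."""
    return [cell for cell, value in enumerate(stability) if value < threshold]


# Toy input (invented values): 4 time slices x 3 grid cells.
# Cell 0 stays forested throughout; cells 1 and 2 lose forest in dry phases.
toy_slices = [
    [1.0, 1.0, 0.5],
    [1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.5, 0.5],
]
stability = habitat_stability(toy_slices)
candidates = unstable_cells(stability)
```

Other summaries, such as the minimum suitability across slices, would penalize even brief habitat loss more strongly; which is preferable depends on how long a population is assumed to persist without its ancestral habitat.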

Molecular data can help identify demographic signatures of the VRM at different stages of the diversification model. Coalescent-based methods (e.g., BPP, Yang and Rannala, 2010, and others cited in Fujita et al., 2012) can test for the existence of independently evolving lineages across habitat patches (**Table 1**, step 2). Population genetic methods (e.g., IMa, Hey and Nielsen, 2007) can verify the occurrence of divergence without gene flow (Hey, 2010) and test for population expansion after divergence (Hey and Nielsen, 2004) as expected under the model. By combining the tools of habitat paleomodeling, coalescent simulations, and statistical phylogeography (Hickerson et al., 2006; Carnaval et al., 2009; Knowles, 2009), one can assess the concordance between the time of habitat fragmentation and divergence, and test alternative hypotheses of responses to past environmental shifts. It is also becoming increasingly feasible to locate the region of origin of population expansions (Lemey et al., 2009; Peter and Slatkin, 2013) – and to test whether ecologically derived taxa originated in a region formerly occupied by ancestral habitats. In general, these more advanced, coalescent-based analyses require evidence from hundreds to thousands of independent loci, which are now more accessible thanks to new technological advances. These genetic analyses, evidently, will always be limited by the availability of data from each system (Knowles, 2009).
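The coalescent programs named above are beyond a short example, but the logic of testing for population expansion can be illustrated with a much simpler summary statistic, Tajima's D, which is swapped in here purely for illustration and is not one of the methods cited in the text. An excess of rare variants, as expected after expansion, pushes D below zero, although selection can produce the same signal. The function name and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences (no gaps or missing data).

    Returns None when there are no segregating sites, where D is undefined.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (variable) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants from Tajima (1989), functions of sample size only.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignment (invented): every variant is a singleton, the pattern
# expected when a population has recently expanded.
toy_alignment = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
d_value = tajimas_d(toy_alignment)
```

A single-locus statistic like this cannot separate demography from selection or estimate gene flow, which is why the text points to multi-locus coalescent approaches for formal tests.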

#### *Phenotypic analyses*

Phenotypic analyses, in combination with phylogeographic evidence, provide crucial evidence on the process of diversification. Though the VRM may be relevant to a broad range of taxa, we focus on phenotypic traits in terrestrial ectothermic vertebrates due to data availability and relevance to the examples of lizard species originally provided by Vanzolini and Williams (1981). Morphology and thermal physiology in such taxa are central to how organisms respond to changes in habitat, and hence can be particularly informative when testing the VRM. Body shape, for instance, is strongly associated with climate and habitat (Kohlsdorf and Navas, 2012) and affects performance traits (Losos, 2009; da Silva et al., 2014), thus suggesting adaptive significance. Limb length and body size are other labile traits that often evolve when habitats change (Mahler et al., 2010). Because thermal physiology directly influences performance traits for ectotherms, critical temperatures and water loss rates can be particularly informative in tests of the VRM (Angilletta, 2009; Sinervo et al., 2010). Unfortunately, little is known about the heritability and evolvability of these thermal physiological traits outside of model systems such as *Drosophila* (Angilletta, 2009).

**Figure caption (fragment):** … 2014). State lines are represented and the State of Bahia is labeled. Stars mark populations we sampled physiology data for. The insert **(C)** shows in detail climatic stability in the isolates (localities 18, 20, 21, and 24) as well as in one (out of the three) of the coastal localities also examined physiologically (8, 9, and 26).

#### **CASE STUDY: TESTING VRM WITH Coleodactylus meridionalis**

We illustrate an initial test of the VRM within one lizard species (*Coleodactylus meridionalis*, Sphaerodactylidae, Gekkota) mentioned in the original VRM paper. Vanzolini and Williams (1981) were impressed by the record of one population of this species in the Caatinga biome (Exú, state of Pernambuco) that contrasted widely with all the other records known at that point (only in forest habitat). Because of *Coleodactylus meridionalis'* potential exposure to divergent selection, they hypothesized that populations of this species could be in the initial stages of speciation under the VRM.

Vanzolini and Williams (1981) also suggested that three other forest lizard species may be diverging according to the VRM: (1) the sphaerodactylid gecko *Gonatodes humeralis*, (2) the dactyloid *Norops brasiliensis* (at the time *Anolis chrysolepis*), and (3) the tropidurid *Plica plica*. The authors suggested that all of them could possibly be "pre-adapted" to tolerate non-forest habitat. *P. plica* is an arboreal species, typical of primary forest and rarely found at forest edges, which shows occasional basking behavior. *G. humeralis* and *N. brasiliensis* show inter-population variation in habitat use, being found in primary forests as well as highly disturbed habitats. Based on distribution and morphological distinctiveness, Vanzolini and Williams (1981) also hypothesized that the skink *Copeoglossum arajara* (at the time *Mabuya arajara*) has completed the speciation process according to the VRM (Williams and Vanzolini, 1980; Vanzolini and Williams, 1981). This suggestion was made because the authors considered *M. arajara* to be fully adapted to open habitats, whereas *Copeoglossum nigropunctatum* (formerly *M. bistriata*), its putative sister species, was restricted to forested environments (despite being able to actively thermoregulate). Our recent field observations (Rodrigues, personal communication, 2014), however, do not support the presumed differences in habitat use between these two species.

To test whether geographically isolated lineages of *Coleodactylus meridionalis* are diverging according to the VRM, we combined novel occurrence data, preliminary phylogeographic analyses, and physiological assays with existing hypotheses about the historical climatic stability of the Northern Atlantic Forest (AF). Because related species are primarily distributed in forest habitats (Geurgas et al., 2008; Gamble et al., 2011), we suggest this to be the ancestral state for *Coleodactylus meridionalis*, with possible VRM divergence into drier and more open formations. Further increasing its likelihood of exposure to novel selection regimes, *Coleodactylus meridionalis*' small body size (SVL <5 cm) suggests limited dispersal capacity, which may prevent it from tracking shifting forests in periods of rapid climatic and habitat change.

Our distribution data reveal that this species is indeed restricted to the leaf litter and occurs primarily in the Northern AF, yet is also found in several localities within the much drier Caatinga and Cerrado biomes of Northern and Northeastern Brazil, in addition to the site Vanzolini and Williams (1981) noted at Exú (**Figure 1**). To test the VRM predictions with *Coleodactylus meridionalis*, we sampled populations in the climatically stable and currently continuous forested area along the Brazilian coast (mostly in Bahia), as well as inland populations in forested areas that have been climatically unstable over the last 120 ky and are currently isolated from coastal AF habitats by the surrounding Caatinga biome (**Figures 1** and **2**). If VRM mechanisms are in progress, we should find evidence of: (1) recent shifts in the distribution of the AF, (2) recent AF fragmentation, (3) genetic, and (4) phenotypic differentiation between populations in core (stable) forest areas and more unstable, isolated forest patches. We also expect (5) greater acclimation capacity and broader thermal tolerances in lineages occupying historically unstable areas. Although one can expect more within-population variation (polymorphism) in tolerance traits in unstable areas than in stable areas, our limited sample sizes in some key areas prevent us from testing this statistically.

Two lines of environmental evidence are concordant with a VRM of diversification in this system. Based on correlative paleo-modeling of the Northern AF, developed at 4 ky intervals through a full glacial cycle (120 ky), historical climatic stability of forest habitats has varied throughout this species' range (Carnaval et al., 2014); some populations have likely been exposed to divergent selection within the last glacial cycle, as habitats shifted in the more climatically unstable areas. Furthermore, present-day occurrence data show that *Coleodactylus meridionalis* is found in large forested areas along the coast as well as in small, isolated forest patches further inland, with an overall gradient of decreasing historical stability of forest habitat from the coast to the inland regions (**Figures 1** and **2B,C**).

**Table 3 | Results of ANOVA with repeated measures to test evidence of acclimation capacity (comparing data after capture and after acclimation treatments).**

A preliminary phylogeographic analysis based on one mitochondrial locus (16S rDNA), yet covering most of the species distribution (supplementary methods and supplementary Tables 1 and 2), revealed relatively shallow phylogeographic structure within *Coleodactylus meridionalis*. Such low genetic differentiation suggests that, if diverging under the VRM, this system must be in its very early diversification stages (**Figure 2A**). The data nonetheless indicate the existence of a few differentiated clades within this species (bootstrap support >85): (a) a large clade including samples from the climatically stable coastal Bahia, (b) a north coastal clade within a region with low-medium stability, (c) a clade including inland, relatively unstable sites (and *Coleodactylus natalensis* from the coast; see also Geurgas et al., 2008), (d) a distinct lineage comprising individuals from the climatically unstable Morro do Chapéu, and clades including samples from the low-medium stability coastal sites of (e) Murici and (f) Maceió. Of these, the most likely candidate lineage to be diverging under the VRM, given its low climatic stability and geographic location, is that in the high altitude inland site, Morro do Chapéu, which is surrounded by semi-arid Caatinga habitat. In the future, the availability of multi-locus genetic data for *Coleodactylus meridionalis* will enable more rigorous tests of the VRM predictions, improving inferences about the historical demography, gene flow and timing of divergence in this system.

Existing physiological data, however, do not provide evidence that individuals in historically unstable areas show divergent physiology relative to stable sites. We experimentally measured individual upper and lower critical thermal limits and preferred temperatures in seven localities (CTmin, CTmax, and Tpref, see supplementary methods and supplementary Tables 3 and 4), including Morro do Chapéu, other inland sites, and coastal Bahia, areas with varying levels of inferred paleo-stability for forest (**Figure 2C**). We also assessed short-term reversible acclimation capacity in three localities with varying degrees of climatic stability by conducting experiments immediately after capture as well as after two different acclimation treatments (details in supplementary methods). We nonetheless found no evidence of acclimation capacity in CTmax, CTmin, thermal tolerance (CTmin–CTmax), or thermal preferences (**Table 3**, **Figure 3**), except for a significant shift in CTmin in Chapada (locality 21; **Figure 2C**). Because this result probably reflects the small sample size of the 30 °C treatment (two individuals), we avoid over-interpreting it. Nevertheless, if CTmin is actually plastic in Chapada, this would be in opposition to the VRM prediction given the moderate to high stability score of this locality (stability scores presented in supplementary Table 2).

In contrast to predictions of the VRM, we did not detect a correlation between historical habitat stability and thermal tolerance (*R*-squared = 0.015, *p* = 0.554). Tolerance did vary across populations (*F*-statistic = 3.572, *p* = 0.025, corrected for seasonal effects, **Figure 4**), yet the greatest difference (∼5 °C) was observed between individuals collected in Salvador (narrower tolerance) relative to those in Chapada (broader tolerance, primarily due to lower CTmin, **Figure 4**). These two sites have high historical climatic stability, yet Salvador is a coastal, lowland site while the Chapada is an inland and higher elevation region where climate is more seasonal (Supplementary Table 3). Together, these results suggest that thermal tolerance in *Coleodactylus meridionalis* may reflect current climate rather than long-term exposure to different levels of climatic fluctuation. Indeed, we found a positive relationship between thermal tolerance (residuals against seasonal effects) and current annual temperature range (*F*-statistic = 4.872, *p* = 0.038). This relationship suggests that thermal physiology in *Coleodactylus meridionalis* is labile. Whether such correlation represents local adaptation or developmental plasticity is yet to be determined. In contrast to predictions (Khaliq et al., 2014), temperature seasonality does not predict thermal tolerance (residuals; *F*-statistic = 0.4806, *p* = 0.495).

It is possible that other phenotypic traits, such as ecomorphology, may have diverged in *Coleodactylus meridionalis* under the VRM. Alternatively, populations in areas with even lower inferred forest stability may be undergoing VRM processes. *Coleodactylus natalensis* (**Figure 1**, clade c) may be one such candidate. This lineage is nested within *Coleodactylus meridionalis*, and may have diverged quite recently (Geurgas et al., 2008). It is found in an area with low climatic stability over time, suggesting that it may have been exposed to divergent selective pressures relative to many *Coleodactylus meridionalis* lineages. *Coleodactylus natalensis* is known from only one forested area on a coastal dune system (Capistrano and Freire, 2009, **Figure 1**, locality 12), and is morphologically distinct from sister lineages (Freire, 1999). Some data on thermal physiology exist for the *Coleodactylus natalensis* lineage (de Sousa and Freire, 2011), though these are not yet sufficient to test for VRM processes.

#### **CONCLUSION**

Our case study illustrates how multiple lines of evidence can be combined to identify lineages potentially diverging under the VRM. Recent integrative studies support the view that historic climatic stability promotes the accumulation and *maintenance* of diversity in space and over time (Graham et al., 2006; Carnaval et al., 2009). The vanishing refugia model describes a mechanism by which dynamic climates, hence environmental change and instability, play a key role in *generating* adaptive diversity (Vanzolini and Williams, 1981; Moritz and Carnaval, 2010; Hoskin et al., 2011). Importantly, the diversification process described by the VRM generates high functional diversity, and the resulting taxa are morphologically and/or physiologically distinct. The high genetic diversity observed in stable areas is not expected to show such high morphological and physiological disparity. In addition, the VRM highlights the often overlooked evolutionary potential of peripheral isolates (Moritz et al., 2012). We argue that further identification of lineages and regions undergoing diversification under the VRM will be particularly insightful and relevant to conservation in the face of rapid anthropogenic climate change.

#### **DATA ACCESSIBILITY**

DNA sequences: GenBank accessions KM852739-KM852829.

#### **AUTHOR CONTRIBUTIONS**

Roberta Damasceno and Maria L. Strangas contributed equally to this study. Roberta Damasceno, Maria L. Strangas, Ana C. Carnaval, Craig Moritz, and Miguel T. Rodrigues designed this study. Roberta Damasceno and Miguel T. Rodrigues collected data. Roberta Damasceno performed analyses. Roberta Damasceno, Maria L. Strangas, Ana C. Carnaval, Craig Moritz, and Miguel T. Rodrigues interpreted the data. Roberta Damasceno, Maria L. Strangas, Ana C. Carnaval, and Craig Moritz wrote the manuscript. Miguel T. Rodrigues revised it critically for important intellectual content.

#### **ACKNOWLEDGMENTS**

M. Matos, T. Porto, C. Leite, M. Teixeira Jr., R. Recoder, F. Dal Vechio, A. Camacho, J. Cassimiro, M. Sena, J. M. Ghellere, and S. Rocha assisted with the collection of vouchers and tissues. S. Geurgas contributed a few sequences. This work profited from National Science Foundation awards to Ana C. Carnaval and Craig Moritz (DEB-817035, DEB-1035184, DEB-1120487), an NSF Graduate Research Fellowship to Maria L. Strangas, FAPESP grants to Miguel T. Rodrigues (2003/10335-8 and 2011/50146-6) and Australian Research Council support to Craig Moritz. Roberta Damasceno acknowledges support from FAPESP (2013/22477-3) and CAPES-Fulbright (BEX 2740/06-0). This work is partially co-funded by FAPESP (BIOTA, 2013/50297-0), NSF (DOB 1343578), and NASA through the Dimensions of Biodiversity Program. Research procedures using live vertebrate animals in this study were performed under animal use protocol R278-0314, approved by the Animal Care and Use Committee at the University of California, Berkeley, on April 4, 2013.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00353/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 July 2014; accepted: 21 September 2014; published online: 22 October 2014.*

*Citation: Damasceno R, Strangas ML, Carnaval AC, Rodrigues MT and Moritz C (2014) Revisiting the vanishing refuge model of diversification. Front. Genet. 5:353. doi: 10.3389/fgene.2014.00353*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Damasceno, Strangas, Carnaval, Rodrigues and Moritz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Multi-model inference in comparative phylogeography: an integrative approach based on multiple lines of evidence

*Rosane G. Collevatti<sup>1</sup>\*, Levi C. Terribile<sup>2</sup>, José A. F. Diniz-Filho<sup>3</sup> and Matheus S. Lima-Ribeiro<sup>2</sup>*

<sup>1</sup> Laboratório de Genética & Biodiversidade, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil

<sup>2</sup> Laboratório de Macroecologia, Universidade Federal de Goiás, Jataí, Brazil

<sup>3</sup> Departamento de Ecologia, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil

#### *Edited by:*

Toby Pennington, Royal Botanic Garden Edinburgh, UK

#### *Reviewed by:*

David Vieites, Spanish Research Council – Consejo Superior de Investigaciones Científicas, Spain; Ana C. Carnaval, City College of New York, USA

#### *\*Correspondence:*

Rosane G. Collevatti, Laboratório de Genética & Biodiversidade, Instituto de Ciências Biológicas, Universidade Federal de Goiás, C. P. 131, Campus II, Setor Samambaia, 74001-970 Goiânia, GO, Brazil e-mail: rosanegc68@hotmail.com

Comparative phylogeography has its roots in classical biogeography and, historically, relies on a pattern-based approach. Here, we present a model-based framework for comparative phylogeography. Our framework was initially developed for statistical phylogeography based on a multi-model inference approach, by coupling ecological niche modeling, coalescent simulation and direct spatio-temporal reconstruction of lineage diffusion using a relaxed random walk model. This multi-model inference framework is particularly useful to investigate the complex dynamics and current patterns in genetic diversity in response to processes operating on multiple taxonomic levels in comparative phylogeography. In addition, because the fossil record is lacking or incomplete, the role of biogeographical events (vicariance and dispersal routes) in most regions worldwide is barely understood. Thus, we believe that the expansion of that framework to multiple species under a comparative approach may give clues on genetic legacies in response to Quaternary climate changes and other biogeographical processes.

**Keywords: biogeography, coalescence, palaeodistribution modeling, Quaternary climatic changes, vicariance**

#### **INTRODUCTION**

Comparative phylogeography has historically been derived from classical historical biogeography, whereby common patterns in lineage distribution within multiple taxa are often explained by vicariant events shared by all taxa (Bermingham and Moritz, 1998; Avise, 2000). However, because similar population genetic structures may arise under different demographic processes, the conventional method based on narrative descriptions and pattern interpretation derived from historical biogeography often yields dubious inferences, or cannot distinguish among alternative historical demographic or vicariant processes. Thus, recovering the true demographic history of species is critical for understanding microevolutionary processes and the spatial context of lineage divergences (Knowles and Maddison, 2002), but this approach should still be expanded to the context of comparative phylogeography.

In this context, one of the major challenges in the emergent field of statistical phylogeography is to set up demographic scenarios independently of gene trees, which should help to define alternative hypotheses for temporal and geographical aspects of species dynamics and, consequently, establish relevant phylogeographical inferences (Knowles et al., 2007; Knowles, 2009; Carstens and Knowles, 2010). Recently, phylogeographers began to explore multiple lines of evidence obtained from advances in geology, paleontology, palaeopalynology, and ecology to guide choices about species dynamics across time and space, such as: (1) fossil and archeological records, including ancient DNA, as direct empirical evidence (Lagerholm et al., 2014); (2) palaeoclimatological and palaeovegetational reconstructions or general patterns of species distribution based on floristic records, as indirect evidence (e.g., Collevatti et al., 2012a,b); (3) palaeodistribution modeling, the historical extension of a model-based approach increasingly applied to macroecological and palaeobiological questions by coupling the theory of ecological niche with palaeoclimatic simulations (e.g., Carnaval and Moritz, 2008); and (4) a combination of two or more of these approaches (e.g., Metcalf et al., 2014). Due to incompleteness of the fossil record (Paul, 2009), as well as the coarse nature of the temporal and spatial resolution of palaeoecological reconstructions, the use of ecological niche models has been an accessible and efficient tool to incorporate explicit spatio-temporal information into analyses of gene trees. Such links between statistical phylogeography and macroecology are indeed an emergent approach of current phylogeographical analyses (Richards et al., 2007; Collevatti et al., 2013).

More recently, models based on Approximate Bayesian Computation (ABC) implemented in MTML-msBayes (Huang et al., 2011) have been used to test simultaneous divergence in comparative phylogeography, taking into account the stochastic variance in coalescence processes underlying multiple co-distributed lineages, thus providing general biogeographic explanations for phylogeographical patterns (e.g., Chan et al., 2011; Bell et al., 2012). Here we discuss and propose perspectives for expansion of this new framework of comparative analyses by exploring a multi-model inference approach, centered on the recent advances of statistical phylogeography (see Collevatti et al., 2012a, 2013). We first present the multi-model approach and its components and then discuss how it can be expanded to infer processes in a comparative fashion.

### **COUPLING STATISTICAL PHYLOGEOGRAPHY AND MACROECOLOGICAL APPROACHES IN A MULTI-MODEL INFERENCE FRAMEWORK**

The reasoning for inferring processes from a multitude of models as an alternative to null-hypothesis significance testing comes from the common need to search for general explanations for observed patterns from unknown explicit causal mechanisms (Stephens et al., 2006). Multi-model inference assumes that the most effective processes causing a given observed pattern should be inferred from the best-fitted model describing empirical data (Grueber et al., 2011). However, models used in practice and the inferred processes are limited to the available information. Paul Velleman clearly synthesizes this limiting factor:

"A model for data, no matter how elegant, or correctly derived, must be discarded or revised if it does not fit the data or when new or better data are found and it fails to fit them" (Velleman, 2008, p. 4).

Thus, uncertainties from multiple sources of evidence should be exhaustively explored to reach the most realistic and stable inferences possible. Eventually, the level of uncertainty can be so high that many alternative models become available, allowing a multi-model-based approach (Millington and Perry, 2011).

In the multi-model approach for phylogeography, uncertainties can be investigated (i) within the set of alternative demographic scenarios being considered (for instance, exploring the impact of alternative methods and climatic simulations in ecological niche modeling), (ii) during the coalescent simulation steps (when estimating genetic and dispersal parameters), and lastly (iii) at the model selection stage (given multiple available selection criteria, e.g., Akaike's Information Criterion and likelihood estimates; Csilléry et al., 2010). Within the subdiscipline of comparative phylogeography, we further propose to explore uncertainties from multiple phylogeographical hypotheses of the studied species.

## **SPECIES NICHE, PALAEODISTRIBUTION MODELING, AND THE ALTERNATIVE DEMOGRAPHICAL HYPOTHESIS**

Ecological niche modeling (ENM) has allowed exploration of the geographic context of species dynamics through the past by hindcasting suitable climatic conditions (an *n*-dimensional space of climatic variables) using palaeoclimatic scenarios (Martínez-Meyer et al., 2004; Nogués-Bravo, 2009). ENMs deal with the geographic context independent of gene trees, and predict species' potential distribution over different time periods, thus providing additional information about species distribution dynamics that can be further used to set valid demographical hypotheses (Richards et al., 2007; Collevatti et al., 2013; **Figure 1A**). ENMs can actually generate multiple and independent hypotheses of species distribution history that reflect, at the same time, ecological and biogeographical realism (Collevatti et al., 2013). This is because different assumptions about the dynamics of species' ecological niches can be explored using different ENM methods (algorithms) and palaeoclimatic simulations, which are in turn based on different modeling assumptions or types of training data (modeling uncertainty). Fossil records, when available, can be used to improve predictions of past distributions (either by providing additional information about species environmental preferences or by validating ENM predictions; Lorenzen et al., 2011; Varela et al., 2011), or to propose additional demographic hypotheses not discriminated from ENMs (**Figure 1A**).
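The hindcasting idea described above can be illustrated with a deliberately simple rectilinear climate envelope, which is only one of many possible ENM algorithms. The sketch below is a minimal illustration under stated assumptions: the function names, the two climatic variables, the grid cell labels, and all numeric values are hypothetical and are not taken from any of the studies cited here.

```python
from typing import Dict, List, Tuple

Climate = Dict[str, float]                 # one site or grid cell, e.g. {"temp": 24.0}
Envelope = Dict[str, Tuple[float, float]]  # (min, max) per climatic variable


def fit_envelope(occurrences: List[Climate]) -> Envelope:
    """Fit a rectilinear envelope: the range of each variable over occurrence sites."""
    variables = occurrences[0].keys()
    return {
        v: (min(o[v] for o in occurrences), max(o[v] for o in occurrences))
        for v in variables
    }


def is_suitable(envelope: Envelope, cell: Climate) -> bool:
    """A cell is suitable if every variable falls inside the fitted range."""
    return all(lo <= cell[v] <= hi for v, (lo, hi) in envelope.items())


def project(envelope: Envelope, grid: Dict[str, Climate]) -> Dict[str, bool]:
    """Project the envelope onto a climate layer (present-day or palaeoclimatic)."""
    return {cell_id: is_suitable(envelope, c) for cell_id, c in grid.items()}


# Hypothetical present-day occurrences (temperature in degrees C, precipitation in mm).
occurrences = [
    {"temp": 22.0, "precip": 1200.0},
    {"temp": 26.0, "precip": 1800.0},
    {"temp": 24.0, "precip": 1500.0},
]
envelope = fit_envelope(occurrences)

# Hypothetical climate layers for two grid cells, present-day and LGM.
present_grid = {"A": {"temp": 24.0, "precip": 1500.0}, "B": {"temp": 28.0, "precip": 900.0}}
lgm_grid = {"A": {"temp": 19.0, "precip": 1100.0}, "B": {"temp": 23.0, "precip": 1300.0}}

present_map = project(envelope, present_grid)
lgm_map = project(envelope, lgm_grid)
print(present_map, lgm_map)
```

In this toy case the suitable cell switches from A (present-day) to B (LGM), which is the kind of range shift that could then be translated into a demographic hypothesis. Repeating the projection with other algorithms and other palaeoclimatic simulations would be one way to explore the modeling uncertainty discussed in the text.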

Moreover, ENMs also provide the environmental context of population movements across geographical space through time (**Figure 1B**), as proposed by Collevatti et al. (2012b). Dispersal events may be inferred from spatially explicit analyses of population genetic structure in relation to fluctuations of climatic suitability through time (a direct measure from ENMs in areas where populations were sampled), space (habitat tracking or range shifts observed from consensual palaeodistribution as predicted by ENMs), and location of historical refugia (areas climatically suitable for the focal species through time).

However, today's practices of coupling ENMs with phylogeographic data analysis are not free of caveats. Currently, most climatic reconstructions through Atmosphere-Oceanic General Circulation Models (AOGCMs) are mainly available for short time slices, as in the case of the last 21,000 years, i.e., the time interval since the last glacial maximum (see the most recent palaeoclimatic simulations in the PMIP3<sup>1</sup> and CMIP5<sup>2</sup> databases). Additionally, the major lineage divergences and common phylogeographic breaks among species may have occurred earlier than this. In fact, coalescent analyses infer the time to the most recent common ancestor (TMRCA) in a time scale derived from the molecular sequence substitution rate under a particular demographic model (Kuhner, 2008). As a consequence, molecular sequences with lower mutation rates (such as chloroplast DNA, for example) may lead to older divergence dating compared to the time interval of ENMs. Consequently, the predictions from ENMs and coalescent analyses would be temporally discordant. We propose two solutions to this apparent weakness.

First, if palaeodistribution modeling is based only on climatic conditions and, most importantly, if predictions are intended to test the genetic legacy from recurrent glacial cycles, the modeler may set the distribution dynamics across the last glacial cycle (for which palaeoclimatic simulations are commonly available) and assume analogous dynamics through the older cycles. Although separate glacial cycles have provided idiosyncratic dynamics on small temporal and geographical scales, the general pattern of intermittent glacial and interglacial periods was common throughout the Quaternary. Thus, considering broad geographical scales, it seems acceptable to assume that similar distributional dynamics have occurred across different Quaternary glaciations. In contrast, ENMs could be projected for deeper periods (e.g., last interglacial, 125,000 years; mid-Pliocene, ∼3 million years), avoiding such an important assumption, although few AOGCMs are currently available for such periods (see Stone et al., 2013, and the special issue "PlioMIP: experimental design, mid-Pliocene boundary conditions and implementation" at the journal Geoscientific Model Development, available at<sup>3</sup>). Thus, it is important that the assumption of analogous dynamics through time is validated by comparing the potential population movements across geographical space, including the specific time slice from ENM predictions, with patterns of lineage diffusion explicitly simulated by the relaxed random walk (RRW) model, which encompasses deeper time that is proportional to the molecular evolution of the sequence used (Lemey et al., 2009, 2010; see also the section below). If the general patterns of population dispersal during cooling and warming phases are concordant between ENM and RRW, then assuming similar distributional dynamics across Quaternary glaciations (on broad spatio-temporal scales) is not equivocal (but see next proposal).

<sup>1</sup>https://pmip3.lsce.ipsl.fr/

<sup>2</sup>http://cmip-pcmdi.llnl.gov/

<sup>3</sup>http://www.geosci-model-dev.net/special\_issue5.html

**(C)** Coalescent framework used to select the most likely demographic scenario matching the empirical genetic parameters. Demographic hypotheses in a spatially explicit context through time (H1, H2, and H3; colored circles represent the population dynamics through time) are simulated using a coalescent framework to investigate their consequent population genetic structure. Models are selected from multiple criteria, such as likelihood based on posterior predictive distribution and Akaike information criterion (AIC). **(D)** Relaxed random walk (RRW) modeling used to predict historical dispersal routes at the time scale of molecular sequencing data. Saving the time scaling from each model,

shifts (see **B**). Considering the distribution map from the figure, the RRW predicts intermittent dispersal routes through time in a direction similar to the range shift predicted from ENMs. The evidence from multiple models indicates that the dispersal routes predicted by RRW could be the result of climatic forcing across sequential glaciations. Moreover, RRW provides support to explore other features of population dynamics (e.g., source and sink of migrants) and biogeographic processes (e.g., dispersal barriers). Representation of Markov Chain was adapted from Professor Peter Beerli Lecture Notes (http://evolution.gs.washington.edu).

Second, due to recent advances in Pliocene–Pleistocene stacked estimates of globally distributed oxygen isotope records (e.g., Lisiecki and Raymo, 2005), climatic conditions may be extended backward by using the climate change between LGM and present-day to interpolate climate trends to older glacial cycles following the proportional oscillation across the deeper oxygen curve. At the same time, if predictions are at species level and comparative across multiple species, the ancestral state of the species niche may still be simulated across a phylogenetic hypothesis, so that palaeodistributions are automatically obtained at the deeper time (see example in Lawing and Polly, 2011; Rödder et al., 2013). When possible, these solutions should be preferred instead of assuming similar distribution dynamics across different Quaternary glaciations. Palaeoclimatic simulations for deeper times may improve our approach by projecting ENM predictions directly or using the temporal interpolation.
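One possible reading of the proportional interpolation along the oxygen isotope curve is a linear rescaling between the two time slices for which climate layers exist (present-day and LGM). The sketch below assumes exactly that; the function name, the linearity assumption, and the round illustrative values (which are not taken from the Lisiecki and Raymo stack or from any AOGCM) are ours.

```python
def scale_climate(
    present_value: float,
    lgm_value: float,
    d18o_present: float,
    d18o_lgm: float,
    d18o_target: float,
) -> float:
    """Estimate a climatic variable at an older time slice.

    The target value is placed between (or beyond) the present-day and LGM
    values in proportion to where the target isotope value sits between the
    present-day and LGM isotope values. Linearity is an assumption of this sketch.
    """
    fraction = (d18o_target - d18o_present) / (d18o_lgm - d18o_present)
    return present_value + fraction * (lgm_value - present_value)


# Illustrative round numbers only (hypothetical cell and isotope values).
present_temp = 24.0   # degrees C, present-day layer
lgm_temp = 19.0       # degrees C, LGM layer
d18o_present = 3.0    # per mil, present-day
d18o_lgm = 5.0        # per mil, LGM
d18o_older = 4.0      # per mil, some older time slice halfway along the scale

older_temp = scale_climate(present_temp, lgm_temp, d18o_present, d18o_lgm, d18o_older)
print(older_temp)
```

Applied cell by cell and variable by variable, such a rescaling would yield approximate climate layers for older time slices onto which an ENM could be projected. Whether a single linear scaling is adequate for variables other than temperature is an open question that this sketch does not address.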

## **DEMOGRAPHIC HISTORY SIMULATION AND MODEL SELECTION**

To trace demographic history, demographic scenarios can be modeled, and simulated under a coalescent framework (**Figure 1C**; Kingman, 1982; see Csilléry et al., 2010 for a review of available software). Briefly, the available software runs independent simulations for each sequence region based on demographic parameters such as migration and effective population size, and under a given evolutionary model, sequence length, and mutation rate. Usually, simulation output includes genetic diversity estimates such as haplotype and nucleotide diversities or expected heterozygosity under Hardy–Weinberg equilibrium for genotypic data, number of haplotypes or alleles, parameters for neutrality, and demographic expansion tests, sequences, and genotypes.
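To make the coalescent simulation step more concrete, the sketch below simulates the time to the most recent common ancestor under the simplest possible scenario: a single haploid population of constant effective size, with no migration, no mutation model, and no spatial structure. It is not the procedure of any of the programs reviewed by Csilléry et al. (2010); the function name and parameter values are hypothetical.

```python
import random


def simulate_tmrca(n_samples: int, n_e: float, rng: random.Random) -> float:
    """Simulate the TMRCA (in generations) of n_samples lineages.

    Assumes a standard Kingman coalescent in a haploid population of constant
    effective size n_e: while k lineages remain, the waiting time to the next
    coalescence is exponential with rate k(k-1)/2 divided by n_e.
    """
    time = 0.0
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2.0 / n_e
        time += rng.expovariate(rate)
        k -= 1  # two lineages merge into one
    return time


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_samples, n_e = 10, 1000.0
replicates = [simulate_tmrca(n_samples, n_e, rng) for _ in range(5000)]
mean_tmrca = sum(replicates) / len(replicates)

# Theoretical expectation under the same assumptions: 2 * n_e * (1 - 1/n).
expected_tmrca = 2.0 * n_e * (1.0 - 1.0 / n_samples)
print(round(mean_tmrca, 1), expected_tmrca)
```

Alternative demographic hypotheses (for example, range retraction versus stability) would change how the effective size and migration vary through time, and therefore the distribution of simulated coalescence times and the summary statistics derived from them.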

The alternative models may be compared using several criteria (**Figure 1C**). For instance, the posterior estimates of genetic parameters for the alternative demographic scenarios can be compared with the empirical haplotype and nucleotide diversity (Csilléry et al., 2010). The likelihood of each model can be obtained from the posterior predictive distribution and the alternative models can be compared using the Akaike Information Criterion (AIC; see Burnham and Anderson, 2002). Model fitting may also be performed generating coalescent trees under each simulated demographic scenario and compared with observed coalescence time using ABC implemented in MTML-msBayes (Huang et al., 2011).
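The AIC comparison mentioned above can be sketched as follows, using the standard definitions AIC = 2k − 2 ln L and Akaike weights (Burnham and Anderson, 2002). The log-likelihoods and parameter counts assigned to H1, H2, and H3 are hypothetical placeholders, not results from any analysis.

```python
import math
from typing import Dict


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_values: Dict[str, float]) -> Dict[str, float]:
    """Relative support for each model, normalized to sum to one."""
    best = min(aic_values.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_values.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical (log-likelihood, number of parameters) for three demographic scenarios.
models = {"H1": (-100.0, 3), "H2": (-102.0, 3), "H3": (-99.5, 5)}

aic_values = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
weights = akaike_weights(aic_values)
best_model = min(aic_values, key=aic_values.get)

for name in models:
    print(name, aic_values[name], round(weights[name], 3))
print("best:", best_model)
```

In this toy case H3 has the highest likelihood but is penalized for its two extra parameters, so H1 receives the largest weight. Weights that are close to each other would indicate that the data cannot clearly discriminate among the scenarios.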

However, despite multiple lines of evidence to design alternative demographic hypotheses, spatially explicit modeling is yet to be fully developed. Although some advances have been made with software like SPLATCHE2 (Ray et al., 2010) and PHYLOGEOSIM 1.0<sup>4</sup> (Dellicour et al., 2014), more complex palaeodistribution dynamics that can differentiate among some predictions are still unavailable. For instance, modeling demographic scenarios for range shift and range expansion is still a challenge because both scenarios may result from similar demographical dynamics (smaller effective population size in the past than in the present-day), but generate different genetic signatures due to spatial context (see Excoffier et al., 2009). Also, different range shift scenarios may generate distinct genetic signatures depending on the spatial direction of colonization of founding lineages (Excoffier et al., 2009; Waters et al., 2013).

#### **DIFFUSION MODEL AND CLUES TO DISPERSAL ROUTES**

The lack of fossil records for most species makes understanding historical dispersal routes difficult, especially in the Neotropics. Thus, integrating direct spatio-temporal reconstruction of lineage dispersal may give new insights into the pathways of lineage dispersal and help to better understand phylogeographic patterns.

Lemey et al. (2009, 2010) proposed a Bayesian statistical approach to infer continuous phylogeographic diffusion using a relaxed random walk (RRW) model, while simultaneously reconstructing the evolutionary history in time from molecular sequence data (**Figure 1D**). More specifically, the approach of Lemey et al. (2009) describes the phylogeographical diffusion process by stochastically selecting a diffusion rate scalar on each branch of the rooted phylogeny from an underlying discretized rate distribution during a Bayesian Markov chain Monte Carlo analysis. Two important advantages arise from this approach: (1) it relaxes the most restrictive assumption of the standard Brownian diffusion model, and (2) it infers the migration process on natural time scales (i.e., the time scale of the molecular sequence substitution process). Moreover, because this framework is based on stochastic models, it naturally assesses the uncertainties in the ancestral state reconstructions and in the underlying phylogeographic process (Lemey et al., 2009, 2010), an essential component of any multi-model inference approach (Millington and Perry, 2011). We understand that this approach may be a fine complement to the static ENM predictions of population dispersal, because it uses explicitly simulated dispersal routes on the evolutionary time scale of molecular sequences. Thus, it will become an indispensable component of a multi-model framework for phylogeographical inference. The RRW model is implemented in the software BEAST 1.8.0 (Drummond and Rambaut, 2007), which analyses sequence evolution, demographic model, and lineage diffusion in space and time simultaneously, and the spatio-temporal reconstruction can be performed using SPREAD 1.0.6 (Bielejec et al., 2011). Although this is a promising approach, we understand that it is still necessary to find explicit methods to couple predictions from ENM and RRW.
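To make the idea of branch-specific diffusion rates concrete, the following minimal sketch simulates positions along a single root-to-tip path. It is an illustration only and not BEAST code: the function name `simulate_rrw_path`, the three rate categories, and the branch lengths are hypothetical, and the variance of each displacement is simply assumed to scale with the product of the rate scalar and the branch duration.

```python
import math
import random

def simulate_rrw_path(branch_lengths, rate_categories, sigma=1.0,
                      start=(0.0, 0.0), seed=None):
    """Simulate 2-D node positions along one root-to-tip path.

    Each branch draws its own rate scalar from `rate_categories`
    (a stand-in for a discretized rate distribution). With a single
    category equal to 1.0 this reduces to standard Brownian diffusion.
    """
    rng = random.Random(seed)
    x, y = start
    positions = [(x, y)]
    scalars = []
    for t in branch_lengths:
        r = rng.choice(rate_categories)   # branch-specific rate scalar
        sd = sigma * math.sqrt(r * t)     # s.d. grows with sqrt(rate * time)
        x += rng.gauss(0.0, sd)
        y += rng.gauss(0.0, sd)
        positions.append((x, y))
        scalars.append(r)
    return positions, scalars

# Hypothetical path with three branches and three rate categories.
positions, scalars = simulate_rrw_path([0.5, 1.2, 0.3], [0.25, 1.0, 4.0], seed=1)
```

In a real analysis the rate scalars are not fixed in advance but are sampled jointly with the tree and the other model parameters.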

### **EXPANDING THE MULTI-MODEL FRAMEWORK FOR COMPARATIVE PHYLOGEOGRAPHY**

Following this reasoning of model-based inference, we propose extending the framework that couples ENM, coalescent simulation, and the RRW model to comparative phylogeography (**Figure 2**). In a nutshell, for such an upgrade, alternative demographic hypotheses should first be set considering the complete range of distribution dynamics across all analyzed species. In a similar manner to statistical phylogeography, all uncertainties from ENMs and from other sources of palaeodistribution scenarios, such as palaeovegetation reconstructions and fossil records, should be explored to set the dynamics of individual species.

However, species may or may not share common palaeodistribution dynamics under ENM predictions. Because of differences in life-history or functional traits, species may have responded differently to climate changes or other biogeographical processes (Colinvaux et al., 2000). For instance, *Tabebuia impetiginosa* (Collevatti et al., 2012a) and *Astronium urundeuva* (Caetano et al., 2008), both from seasonally dry forests in Brazil, expanded their ranges in response to the drier and cooler periods of the glacial cycles in the Neotropics, whereas in Brazilian savannas, *Caryocar brasiliense* (Collevatti et al., 2012b) and *T. aurea* (Collevatti et al., 2014) showed population retraction in multiple refugia as a response to the same climatic events. The lack of common palaeodistribution dynamics, however, does not mean that different historical biogeographical processes affected each species if evolutionary timing matches. Divergence timing and demographic response may be compared to better understand how Quaternary glaciations affected multiple species from distinct regions and with different traits. Concerning the comparative analyses, demographic hypotheses should be set *a priori*, and therefore biogeographical hypotheses would be particularly investigated for species with unique characters; i.e., hypotheses are usually proposed for entire biomes or biotas (e.g., Pleistocene Arc hypothesis for Neotropical seasonally dry forests, see Prado and Gibbs, 1993) and thus may be simulated for all species from the same biome in a comparative phylogeography framework (**Figure 2**).

<sup>4</sup>http://ebe.ulb.ac.be/ebe/Software.html

#### **FIGURE 2 | Continued**

**Schematic view of the generalized framework for coupling ecological niche modeling, coalescent simulation, and diffusion model in a statistical comparative phylogeography approach.** Palaeodistribution maps resulting from ENMs are used to generate demographic hypotheses (see **Figure 1**) in a spatially explicit context through time (hypotheses H1 and H2; colored circles represent the population dynamics), according to the range shifts in species distributions in the past (e.g., range retraction or range stability). Other hypotheses may also be set based on fossil records (hypothesis H3) or an a priori biogeographic scenario. Uncertainties in setting alternative hypotheses can be incorporated into the framework by using several ENM methods and projecting distributions through different AOGCMs. In a next step, simulated coalescence structures are compared with observed data in a model selection approach, which allows selecting, among the demographic hypotheses (H1, H2, or H3), the one most likely to have generated the current phylogeographic structure derived from molecular data. At the same time, phylogeographic diffusion models allow the reconstruction of colonization routes, which are compared with palaeodistribution maps. Finally, this framework can be expanded into a multi-species comparative approach to infer how whole assemblages responded to the interplay between climate changes, geographic barriers, and demographic processes that shaped the current patterns of species distribution and biodiversity. Coalescent time for each species can be compared using, for instance, Approximate Bayesian Computation implemented in MTML-msBayes. The representation of the Markov chain was adapted from Professor Peter Beerli's lecture notes (http://evolution.gs.washington.edu).

Consequently, the coalescent simulations based on the palaeodistribution scenarios may also be performed for all species and compared among species from the same functional group, similar ecosystems, or with similar life-histories (e.g., similar pollination and dispersal syndromes). The role of life-history or quantitative traits in shaping general phylogeographic patterns may ultimately be investigated using random or mixed effects models in meta-regression, weighting evidence by its level of uncertainty (Stanley and Jarrell, 1989). Whatever the source, the higher the uncertainty for a species (e.g., from ENM predictions), the lower its influence on the general phylogeographical inference drawn under this comparative framework.
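The principle of weighting evidence by its uncertainty can be sketched with a simple inverse-variance pooled estimate. This is a deliberately reduced, fixed-effect illustration rather than the random or mixed effects meta-regression mentioned above; the function name `inverse_variance_pool` and the per-species effect sizes and variances are hypothetical.

```python
def inverse_variance_pool(effects, variances):
    """Pool per-species effect sizes, weighting each by 1 / variance.

    Returns the pooled estimate, its variance, and the relative weight
    of each species (weights sum to 1).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_variance = 1.0 / total
    relative = [w / total for w in weights]
    return pooled, pooled_variance, relative

# Hypothetical effect sizes for three species; the third is the most uncertain.
effects = [0.2, 0.5, 0.8]
variances = [0.01, 0.04, 0.25]
pooled, pooled_variance, relative = inverse_variance_pool(effects, variances)
```

With these made-up numbers the third species, which has the largest variance, receives the smallest relative weight and therefore moves the pooled estimate the least.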

Moreover, the reconstruction of colonization routes would complement the understanding of how a unique historical process affected multiple species under a broad biogeographical hypothesis (e.g., Taberlet et al., 1998). Even when a large number of species is studied, understanding the role of vicariance and dispersal routes is compromised in most regions worldwide by the lack of direct empirical evidence from fossil records at the community level. Thus, integrating direct spatio-temporal reconstruction of lineage diffusion with ecological niche modeling and coalescent simulation may indicate the pathways along which multiple lineages have dispersed, and their genetic legacies, as a response to Quaternary climate changes and other biogeographic processes. For *T. aurea*, for instance, reconstruction of colonization routes revealed that populations with higher genetic diversity at the edge of the historical climatic refugium acted as a source of migrants, whereas populations at the center of climatically stable areas usually acted as a sink of migrants (Collevatti et al., submitted).

In addition, integrating direct spatio-temporal reconstruction of lineage diffusion with dispersal routes predicted by the fossil record may allow validation and improvement of the lineage diffusion model. For instance, Lima et al. (2014) used the pollen fossil record of *Mauritia flexuosa,* a Neotropical swamp palm, to validate the predictions of ENM on population range shifts. The comparison of dispersal routes based on RRW models with pollen fossil records and ENM predictions can be applied to predict and validate dispersal routes during spatial population displacements.

In conclusion, along with the flexible and integrative nature of our multi-model framework in the context of the statistical phylogeography, its expansion in a comparative direction also makes it comprehensive. This aspect of our multi-model inference framework is particularly useful to investigate the complex dynamics and current patterns of genetic diversity in response to processes operating on multiple taxonomic levels as approached in comparative phylogeography.

#### **ACKNOWLEDGMENTS**

Our research program integrating macroecology and molecular ecology has been continuously supported by grants to the research network GENPAC (Geographical Genetics and Regional Planning for natural resources in Brazilian Cerrado) supported by CNPq/MCT/CAPES/FAPEG (projects no. 564717/2010-0, 563727/2010-1 and 563624/2010-8), and the network Rede Cerrado CNPq/PPBio (project no. 457406/2012-7) that we gratefully acknowledge.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### *Received: 29 June 2014; accepted: 22 January 2015; published online: 17 February 2015.*

*Citation: Collevatti RG, Terribile LC, Diniz-Filho JAF and Lima-Ribeiro MS (2015) Multi-model inference in comparative phylogeography: an integrative approach based on multiple lines of evidence. Front. Genet. 6:31. doi: 10.3389/fgene.2015.00031*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Collevatti, Terribile, Diniz-Filho and Lima-Ribeiro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Evidence for an intrinsic factor promoting landscape genetic divergence in Madagascan leaf-litter frogs

## *Katharina C. Wollenberg Valero\**

*Department of Natural Sciences, College of Science, Engineering and Mathematics, Bethune-Cookman University, Daytona Beach, FL, USA*

#### *Edited by:*

*James Edward Richardson, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Andrew J. Crawford, Universidad de Los Andes, Colombia; Miguel Vences, Technische Universität Braunschweig, Germany*

#### *\*Correspondence:*

*Katharina C. Wollenberg Valero, Department of Natural Sciences, College of Science, Engineering and Mathematics, Bethune-Cookman University, 640 Drive Mary McLeod Bethune Boulevard, Daytona Beach, FL 32114, USA valerok@cookman.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 01 July 2014 Accepted: 05 April 2015 Published: 15 May 2015*

#### *Citation:*

*Wollenberg Valero KC (2015) Evidence for an intrinsic factor promoting landscape genetic divergence in Madagascan leaf-litter frogs. Front. Genet. 6:155. doi: 10.3389/fgene.2015.00155*

Frontiers in Genetics | www.frontiersin.org May 2015 | Volume 6 | Article 155 |

The endemic Malagasy frog radiations are an ideal model system to study patterns and processes of speciation in amphibians. Large-scale diversity patterns of these frogs, together with those of other endemic animal radiations, have led to the postulation of new hypotheses, and the application of known ones, on how species diversification causes diversity patterns in this biodiversity hotspot. Both extrinsic and intrinsic factors have been studied in a comparative framework, with extrinsic factors usually being related to the physical environment (landscape, climate, river catchments, mountain chains), and intrinsic factors being clade-specific traits or constraints (reproduction, ecology, morphology, physiology). Despite some general patterns emerging from such large-scale comparative analyses, it became clear that the mechanism of diversification in Madagascar may vary among clades, and may be a multifactorial process. In this contribution, I test for intrinsic factors promoting population-level divergence within a clade of terrestrial, diurnal leaf-litter frogs (genus *Gephyromantis*) that has previously been shown to diversify according to extrinsic factors. Landscape genetic analyses of the microendemic species *Gephyromantis enki* and its widely distributed, larger sister species *Gephyromantis boulengeri* over a rugged landscape in the Ranomafana area show that the genetic variance of the smaller species cannot be explained by landscape resistance alone. Both topographic and riverine barriers are found to be important in generating this divergence. This case study yields additional evidence for the probable importance of body size in lineage diversification.

Keywords: landscape divergence, speciation, riverine barriers, topographical complexity, Madagascar

## Introduction

Mechanisms of lineage diversification are still poorly understood biological phenomena. Large animal radiations are thought to be the result of a complex interaction between parameters of past and present physical environments (extrinsic factors) and factors intrinsic to the organisms (e.g., aspects of the phenotype and its evolutionary history). In order to obtain a quantitative understanding of the process of speciation, the relative importance of both types of factors needs to be assessed. For example, in African cichlid fishes, if optimal values for both extrinsic and intrinsic factors are in concordance (e.g., solar radiation and lake depth as extrinsic and sexual dichromatism as intrinsic), the likelihood of lineage diversification can be partially predicted (Wagner et al., 2012). The endemic Malagasy frog radiations have been extensively studied for their phylogenetic relationships (e.g., Wollenberg et al., 2007, 2008, 2011; Vieites et al., 2009) and biogeography, while less is known about their ecology (except for general ecological modes like habitat and breeding biology, Glaw and Vences, 2007). These frogs share the island with other endemic radiations (lemurs, tenrecs, vanga birds), and patterns of diversification are shared among radiations, which makes Madagascar a good model region to infer the processes causing species diversity, species richness and endemism (Wollenberg et al., 2008; Vences et al., 2009). Regarding the Malagasy frogs, both extrinsic and intrinsic factors have been studied, with extrinsic factors usually being related to the physical environment (landscape, climate, river catchments, mountain chains), and intrinsic factors being clade-specific traits or constraints (reproduction, ecology, morphology, physiology). Additionally, geographic range size can be the result of, and act as, an extrinsic as well as an intrinsic factor (reviewed in Cooper and Purvis, 2009).

Most research on Madagascan frogs has been conducted on extrinsic factors, as the available genetic and distribution data facilitate this type of study. Following general practice in biogeographic inference, observed patterns (e.g., two phylogenetic clades being situated in two different climatic regimes) are related to the diversification process (the difference in climate led to the evolution of the two groups). However, although extrinsic factors often correspond to phylogeographic splits of more basal clades (e.g., Kaffenberger et al., 2012), these large-scale extrinsic factors fail to explain the majority of the more recent speciation events. In the case of the Madagascan frog genus *Gephyromantis*, three basal splits in the phylogeny mirror distribution areas in three different areas of faunal endemism, but most species diversification events (46 in total, not considering taxonomic uncertainty or possible extinctions) could not be explained by such barriers (Kaffenberger et al., 2012). Therefore, more fine-scale factors impeding gene flow over the landscape need to be studied. Given that most Madagascan amphibian species diversity is located in the Eastern Rainforest biome, which corresponds to an underlying escarpment, topographic heterogeneity might be an important factor contributing to this diversity (Wollenberg et al., 2008). For example, Guarnizo and Cannatella (2013) found that elevational bands are the most important predictor of diversification between recent sister species of Andean *Dendropsophus* frogs. Other studies have emphasized the importance of smaller rivers or montane ridges as barriers to frog dispersal (e.g., Zhan et al., 2009; Gehring et al., 2012). As for intrinsic factors, recent studies in frogs have emphasized the importance of body size for clade diversity (Van Bocxlaer et al., 2010; Zimkus et al., 2012).
Testing this hypothesis for the largest Malagasy frog radiation (Mantellidae, with 242 species) revealed that smaller species indeed have higher clade diversity, smaller distribution areas, and higher mitochondrial substitution rates (Wollenberg et al., 2011). However, this trend was not statistically significant within the mantellid frog radiation (Wollenberg et al., 2011), potentially due to the small proportion of large frogs with large range sizes within mantellids available for comparative testing. Pabijan et al. (2012) found nucleotide divergence between spatially separated populations in a subset of mantellid frogs to be inversely correlated with body size, which supports the hypothesis that body size, as an intrinsic factor, plays a role in generating genetic diversity.

Without doubt, both intrinsic and extrinsic factors contribute to generating Madagascan amphibian species diversity (Vences et al., 2009; Brown et al., 2014). The question is, what is their relative contribution? Under the assumption that similar processes of selection will produce similar outcomes, one way to test such interactions is to compare patterns across sister species that differ only in intrinsic factors. The subgenus *Gephyromantis* (Mantellidae/*Gephyromantis*) is a group of diurnal, inconspicuous leaf-litter frogs endemic to Madagascar. From what is known, many of the up to 18 species of the subgenus deposit eggs on land and have pseudo-direct development, with varying degrees of reduction of a free-swimming tadpole stage (Randrianiaina et al., 2011). Within the subgenus *Gephyromantis*, one monophyletic lineage comprises small, microendemic frogs (the species *Gephyromantis enki*, *G. blanci*, and *G. runewsweeki*) and another comprises larger frogs with a wider distribution (populations of the species *Gephyromantis boulengeri*). While *G. runewsweeki* and *G. blanci* are elusive and probably occur only in single, small patches of habitat, *G. enki* is widely distributed in Ranomafana National Park (RNP), where it inhabits mid to high elevations. *G. boulengeri* occurs from RNP to Nosy Mangabe in the north-east, and is also widely distributed in lowlands. These two lineages of small and medium-sized frogs are sister to each other, and they are similar in four respects: (1) they are of the same evolutionary age (Wollenberg et al., 2011); (2) they share the same general mode of reproduction and are thus similar in breeding biology; (3) both are diurnal and occupy similar calling positions, and are thus ecologically similar; and (4) where they inhabit the same area (Ranomafana), they face the same obstacles to dispersal. The main differences observed between *G. enki* and *G. boulengeri* are (1) body size and (2) range size. Because of their similarities in most other life-history traits relevant for amphibians, these two species constitute an ideal system to test whether differences in body and range size cause different patterns of genetic divergence over a landscape. In this paper, I test whether the population genetic structure of these two species is affected by landscape resistance and geographical barriers in the same way or differently. Within RNP, both species occur on both sides of a large river (the Namorona River). Further, elevation increases steadily within a short distance. Wollenberg et al. (2011) proposed that a microendemic phenotype (small frogs with small range sizes) would diversify faster than frogs with a combination of larger range and body size. This leads to two expectations: (1) an increased level of genetic differentiation in *G. enki* compared to *G. boulengeri*, and (2) topographic structures such as elevational bands or the Namorona River constituting strong barriers to diversification for *G. enki*.

## Materials and Methods

To test these hypotheses, I analyzed sequences of the mitochondrial cytochrome b (*cytb*) gene and the nuclear recombination-activating gene 1 (*RAG1*) of populations of both clades and other members of the subgenus *Gephyromantis*. For *cytb*, 106 sequences of *G. enki* and 58 sequences of *G. boulengeri* were analyzed. For *RAG1*, 30 sequences of *G. enki* and 33 sequences of *G. boulengeri* were analyzed. Amplification and sequencing protocols for newly determined sequences follow Kaffenberger et al. (2012). Specimen and locality information and GenBank accession numbers are listed in Supplementary Table S1. For visualization of genealogical relationships, reticulate evolutionary networks were constructed from each alignment (phased for *RAG1*) with the software NETWORK V.4.611 (Fluxus Technology Ltd, 1999–2012). The median-joining algorithm was applied (Bandelt et al., 1999). The resulting networks were edited in NETWORK and Corel Draw (V.X6). Extensive networks were constructed for all haplotypes without removal of single-sequence haplotypes.

Locality datasets were constructed for both species (for coordinates, see Supplementary Table S1) as input files for the spatial analyses. First, I constructed environmental niche models for both *G. enki* and *G. boulengeri* in the software Maxent 3.3.3k under standard settings (Phillips et al., 2006; Phillips and Dudík, 2008). The models (random seed) were created per species for Madagascar as background with 10,000 background points. A resistance map was then calculated for each species by applying the circuit theory to the Maxent models (software Circuitscape V. 4.0, McRae, 2006; McRae and Shah, 2009). In this approach, landscapes are represented as conductive surfaces, with high resistances assigned to barriers for movement and dispersal (McRae and Shah, 2009). Output was set to resistances. These resistance maps are commonly used to predict patterns of gene flow. Values for landscape resistance and for elevation were extracted from the resistance map and a digital elevation model for each sampling locality per species in DIVA GIS (V.7.5.0, Hijmans et al., 2001).
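The circuit-theoretic notion of resistance distance used above can be illustrated on a toy graph. The sketch below is not Circuitscape; it assumes a small, connected graph described by a symmetric conductance matrix (zero diagonal) and computes pairwise effective resistances from the pseudo-inverse of the graph Laplacian. The function name `effective_resistance` and the example graphs are hypothetical.

```python
import numpy as np

def effective_resistance(conductance):
    """Pairwise effective resistance for a connected graph.

    `conductance` is a symmetric matrix with zero diagonal; entry (i, j)
    is the conductance (1 / resistance) of the edge between nodes i and j,
    or 0 if the nodes are not directly connected.
    """
    g = np.asarray(conductance, dtype=float)
    laplacian = np.diag(g.sum(axis=1)) - g
    l_pinv = np.linalg.pinv(laplacian)       # Moore-Penrose pseudo-inverse
    d = np.diag(l_pinv)
    return d[:, None] + d[None, :] - 2.0 * l_pinv

# Three cells in a row, unit resistance between neighbours.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
# Same cells with an additional direct connection between the two ends.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
r_chain = effective_resistance(chain)
r_triangle = effective_resistance(triangle)
```

In the chain, the two end cells are separated by two unit resistors in series. Adding the direct connection in the triangle provides a parallel pathway and lowers the effective resistance between the same two cells, which is why circuit-based distances reflect all available dispersal pathways rather than a single least-cost route.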

Genetic distance matrices of *G. enki* and *G. boulengeri* were constructed in MEGA (V.6, Tamura et al., 2004, 2013) using the Maximum Composite Likelihood model. All codon positions were included. The genetic distance matrices were spatially decomposed using the PCNM function (Principal Coordinates of Neighbour Matrices, Borcard and Legendre, 2002; Borcard et al., 2004) in R (package vegan, Oksanen et al., 2011). PCNMs with negative eigenvalues or very small values were then discarded prior to analysis.
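As an illustration of what a PCNM decomposition does, the sketch below follows the general recipe of Borcard and Legendre (2002): truncate the distance matrix at a threshold, replace larger distances by four times the threshold, run a principal coordinate analysis, and keep only axes with positive eigenvalues. It is written in Python for illustration and is not the vegan implementation; the function names, the tolerance, and the default threshold (longest edge of the minimum spanning tree) are assumptions of this sketch.

```python
import numpy as np

def mst_longest_edge(dist):
    """Longest edge of the minimum spanning tree (Prim's algorithm)."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()                # cheapest link from the tree to each node
    longest = 0.0
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        longest = max(longest, float(candidates[j]))
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return longest

def pcnm(dist, threshold=None, tol=1e-8):
    """Return PCNM axes (columns) and their positive eigenvalues."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    if threshold is None:
        threshold = mst_longest_edge(d)
    truncated = np.where(d > threshold, 4.0 * threshold, d)
    # Principal coordinate analysis: double-centre -0.5 * D^2.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * truncated ** 2) @ centring
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol * np.abs(eigval).max()   # drop zero and negative axes
    return eigvec[:, keep] * np.sqrt(eigval[keep]), eigval[keep]

# Five hypothetical samples evenly spaced along a transect.
coords = np.arange(5, dtype=float)
dist = np.abs(coords[:, None] - coords[None, :])
vectors, values = pcnm(dist)
```

The retained columns are mutually orthogonal and centred, so they can be used as separate response or explanatory variables in a regression.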

**FIGURE 1 | (caption, partially recovered)** ... convex polygon of the sampling localities within Ranomafana National Park (RNP). (C) Shows a significant but weak correlation between spatially ... differences computed with Kruskal–Wallis test. On average, *G. enki* has a higher residual variance in genetic distance over its distribution area than *G. boulengeri*.

**FIGURE 3 | (caption, partially recovered)** ... barriers as yellow bars. (B) Based on *RAG1* Maximum Composite Likelihood distances; k-s: *G. enki* dispersal barriers as yellow bars. Map data: Google, 2015 Digital Globe.

One dataset per species, containing the genetic distance PCNMs and the extracted values for elevation and landscape resistance, was assembled for statistical analysis in STATISTICA (StatSoft, Tulsa, OK, USA). A regression analysis was conducted with landscape resistance and elevation as independent variables and the genetic PCNMs as dependent variables, in order to compute residuals. These residuals represent the remainder of the genetic variance of each species and marker after removing the effect of isolation by resistance and topography. Two data points of *G. boulengeri* were removed from the *cytb* dataset because their residuals exceeded twice the standard deviation and thus represented outliers. The regression was then repeated without these two data points. Localities included were (1) within RNP: Ranomafana, Station Valbio, Valbio: Campsite, Ambatolahy, Sahamalaotra, Kidonavo, Ranomafanakely, Sakaroa, Talatakely I, Talatakely II, Talatakely III, and Station Thermale, and (2) outside RNP: Ifanadiana, Ambohitsara, Andasibe (Supplementary Table S1). To determine whether the remainder of genetic variance differs between the smaller species and the larger one, a Kruskal–Wallis test was then performed in STATISTICA.
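The residual-based comparison described above can be sketched as follows. The original analysis was run in STATISTICA; this Python version is only an assumed equivalent, with hypothetical function names, an ordinary least-squares fit with intercept, the two-standard-deviation outlier rule, and a tie-corrected Kruskal–Wallis test for two groups whose p-value is taken from a chi-square distribution with one degree of freedom.

```python
import math
import numpy as np

def regression_residuals(y, predictors):
    """Residuals of an ordinary least-squares fit of y on the predictors."""
    y = np.asarray(y, dtype=float)
    x = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ beta

def flag_outliers(residuals, k=2.0):
    """True where a residual exceeds k sample standard deviations."""
    residuals = np.asarray(residuals, dtype=float)
    return np.abs(residuals) > k * np.std(residuals, ddof=1)

def chi2_sf_1df(h):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2.0))

def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and p-value for two samples."""
    values = np.concatenate([a, b]).astype(float)
    n = len(values)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(n)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0   # average rank of the tie block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r_a, r_b = ranks[:len(a)].sum(), ranks[len(a):].sum()
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3.0 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)
    return h, chi2_sf_1df(h)

# Hypothetical residuals for two species.
h_stat, p_value = kruskal_wallis_two_groups([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

As a plausibility check, `chi2_sf_1df(7.1322)` is approximately 0.0076 and `chi2_sf_1df(0.5336)` is approximately 0.465, which agrees with the p-values reported for the *cytb* and RAG1 tests in the Results.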

To analyze the genetic divergence of the smaller species *G. enki* within RNP, which represents the extent of its spatial distribution, a spatial representation of barriers to dispersal was computed using the methods of Manni et al. (2004) within the software Barrier 2.2. Barrier identifies spatial boundaries corresponding to areas of high genetic distance using Monmonier's maximum difference algorithm (Manni et al., 2004). These barriers were computed for *cytb* and *RAG1* separately. Arlequin (V. 3.5, Excoffier and Lischer, 2010) was used to assess haplotype diversity of sampling localities, and to test hypotheses of diversification across barriers for each genetic marker separately. For this purpose, analysis of molecular variance (AMOVA) was run on two groupings: (1) populations on either side of the Namorona River, and (2) populations separated by elevational bands. For the first grouping, populations were assigned to a northern group containing Ranomafana, Valbio, Valbio: Campsite, Ambatolahy, Sahamalaotra, Kidonavo, and Ranomafanakely, and a southern group containing the Sakaroa, Station Thermale, Talatakely I, Talatakely II, and Talatakely III sampling localities. For the second grouping, the three elevational bands were (1) Ranomafana, Station Thermale (630–640 m a.s.l.), (2) Ambatolahy, Campsite, Station Valbio, Talatakely I, Talatakely II, Talatakely III, Sakaroa (900–1000 m a.s.l.), and (3) Sahamalaotra, Ranomafanakely, Kidonavo (1140–1160 m a.s.l.). AMOVAs were performed using pairwise differences and 10,000 random permutations. Significance of recovered fractions was tested with 10100 random permutations.
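The logic of an AMOVA with a permutation test can be illustrated in its simplest, single-level form (individuals within populations, without the additional grouping level used in the analysis above). The sketch below is not Arlequin; it assumes a matrix of squared distances between individuals (for example, numbers of pairwise differences) and uses hypothetical function names and toy data.

```python
import random

def phi_st(sq_dist, pops):
    """Single-level AMOVA: proportion of variance among populations."""
    n = len(pops)
    labels = sorted(set(pops))
    k = len(labels)
    # Total sum of squared deviations from all pairwise squared distances.
    ssd_total = sum(sq_dist[i][j] for i in range(n) for j in range(i + 1, n)) / n
    ssd_within = 0.0
    sizes = []
    for lab in labels:
        idx = [i for i in range(n) if pops[i] == lab]
        sizes.append(len(idx))
        ssd_within += sum(sq_dist[i][j]
                          for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ms_among = (ssd_total - ssd_within) / (k - 1)
    ms_within = ssd_within / (n - k)
    n0 = (n - sum(s * s for s in sizes) / n) / (k - 1)   # average sample-size coefficient
    var_among = (ms_among - ms_within) / n0
    return var_among / (var_among + ms_within)

def permutation_p(sq_dist, pops, n_perm=999, seed=0):
    """Observed Phi_ST and the share of label permutations at least as large."""
    rng = random.Random(seed)
    observed = phi_st(sq_dist, pops)
    shuffled = list(pops)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(sq_dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: four individuals, two per river bank, differing only between banks.
banks = ["north", "north", "south", "south"]
separated = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [1, 1, 0, 0],
             [1, 1, 0, 0]]
observed, p_value = permutation_p(separated, banks, n_perm=199, seed=1)
```

In this toy case all variation lies between the two banks, so the among-population fraction is 1. With only four individuals the permutation test has very little power, which is one reason real analyses rely on many individuals and thousands of permutations.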

## Results and Discussion

## Genetic Diversity of the Smaller Species *G. enki* within RNP is Greater than in the Larger Species *G. boulengeri*

Maxent returned good AUC values for both *G. enki* (0.99) and *G. boulengeri* (0.99) for the environmental niche model computation. The resistance maps computed on the basis of these environmental niche models showed that landscape resistance for both *G. enki* and *G. boulengeri* is low in the Ranomafana area (**Figure 1**). Regression results reveal that some genetic differentiation of both *G. enki* and *G. boulengeri* can be explained by landscape resistance and the prevalence of different elevational bands in the area (isolation-by-resistance, McRae, 2006). A correlation between landscape resistance and the spatially decomposed genetic distances is shown in **Figure 1**. This analysis included all populations for both species. The regression models were significant for the *G. enki cytb* (*R*<sup>2</sup> = 0.45, *p* < 0.0001) and the *G. boulengeri* RAG1 (*R*<sup>2</sup> = 0.1, *p* < 0.04) datasets, but not for the *G. enki* RAG1 (*R*<sup>2</sup> = 0.03, *p* < 0.4) and the *G. boulengeri cytb* (*R*<sup>2</sup> = 0.03, *p* < 0.5) datasets. Residuals were then computed and used for hypothesis testing. On average, the smaller species *G. enki* showed higher residual genetic variance than the larger species *G. boulengeri* after controlling for landscape resistance and topography. A Kruskal–Wallis test for landscape-independent genetic divergence was significant for *cytb*, but not for RAG1 [KW-H: 11.88, *p* = 0.0006 for both markers combined (not shown); KW-H: 7.1322, *p* = 0.0076 for *cytb* alone; KW-H: 0.5336, *p* = 0.4651 for RAG1 alone; **Figure 1**].

In conclusion, the results confirm the expectation that, of two ecologically similar sister species of frogs, the smaller species shows higher genetic variance over the same geographic area, independently of isolation-by-resistance. These results correspond well to the analysis of Pabijan et al. (2012), who found a similar trend for a set of mantellid frogs over a larger distance (between Andasibe and RNP).

## Landscape Effects on Diversification of the Smaller Species *G. enki*

Haplotype networks generated for *G. enki* showed a separation of haplotypes between localities north and south of the Namorona River (**Figure 2**). While the RAG1 network showed some haplotypes restricted to the northern populations, the southern populations were all allocated to haplotypes that also occurred north of the Namorona River. The faster-evolving *cytb* gene, however, showed a clear distinction between two haplotype groups that differed in one mutated position between the northern and southern banks of the river (**Figure 2**). This distinction was not perfect, but hints at the Namorona River being a barrier for these frogs. The estimated dispersal barriers for *G. enki* are consistent with this riverine barrier, with several lying close to and parallel to the Namorona River (e.g., a, c, e, f, g, **Figure 3**). Additionally, barriers perpendicular to or far away from the Namorona River (e.g., b, d, **Figure 3**) suggest the importance of elevational bands in impeding *G. enki* gene flow. The AMOVA for two classes of barriers (riverine versus elevational) confirmed that *G. enki* showed both significant within-population differentiation and high within-group differentiation (**Table 1**). Furthermore, significant among-group variation was detected for both classes of barriers in *cytb*, but not in *RAG1*.

Sampling localities on opposite sides of the Namorona River explained 32.04% of the molecular variance of *cytb* found in *G. enki*. The location of populations on either side of the Namorona River was thus a significant predictor of genetic divergence. The elevational bands (perpendicular to the Namorona River) also explained a significant portion of the molecular variance in *cytb* of *G. enki*: at 19.22%, this grouping explained 12.82 percentage points less variance than the riverine barrier grouping. No single best predictor for genetic divergence of *G. enki* was found, which indicates that any topographic structure can act as a barrier for a small frog, not only large rivers.

The Namorona River is therefore a stronger barrier to dispersal of *G. enki* than the elevational profile of RNP. This might be explicable by the fact that the subgenus *Gephyromantis* is the only clade of Malagasy frogs that has a terrestrial mode of development. Tadpoles of many Malagasy frog species are adapted to fast-flowing streams and can therefore be expected to cross a riverine barrier, but not *G. enki* (Glaw and Vences, 2007; Randrianiaina et al., 2011). These results conform to the expectation that fine-scale topography, in this case located in the lower montane RNP in Madagascar, contains multiple barriers to diversification for a small species of frog that do not limit gene flow in its larger sister species. Besides the classic question of whether large-scale biogeographic barriers such as the Amazon impede dispersal and gene flow (Lougheed et al., 1999; Gascon et al., 2000, 2006), smaller water bodies have recently also been confirmed as barriers in recent amphibian diversification events (Ratsoavina et al., 2013; Munoz-Ortiz et al., 2014; van de Vliet et al., 2014, but see Dahl et al., 2013).

TABLE 1 | Analysis of molecular variance (AMOVA) for the partitioning of genetic variation of the mitochondrial *cob* gene and the nuclear RAG1 gene within and among populations of *G. enki*.

*Shown is percentage of variance explained for: populations grouped according to position relative to the Namorona River, and populations grouped by elevational bands. Significance of fractions (covariance components) tested with 10100 permutations, indicated with* ∗*p < 0.05 or* ∗∗*p < 0.01. Significant among-group divergence in bold.*

In addition to confirming that a small river and fine-scale topography serve as dispersal barriers for a small rainforest frog, this study also confirms the hypothesis that the smaller and the larger species of an ecologically similar sister species pair of frogs show different patterns of landscape divergence. Adding to recent evidence for an effect of life-history traits on evolutionary processes shaping biodiversity (Fouquet et al., 2012), this case study shows that intrinsic factors such as body size, and the associated distribution area size, might be important for the diversification of Malagasy frogs.

## References


## Acknowledgments

Numerous colleagues helped during the collection of samples, most notably Franco Andreone, Parfait Bora, Rainer Dolch, Olga Jovanovic, Roger-Daniel Randrianiaina, Goran Safarek, David R. Vieites, Theo Raejarison, and Emile Raofjarison. Gabi Keunecke, Meike Kondermann, and Eva Saxinger were of invaluable help in the lab. This study was carried out in the framework of a cooperation accord and corresponding permits between the Département de Biologie Animale of the University of Antananarivo, Madagascar, the Technical University of Braunschweig, and the Zoologische Staatssammlung, München, Germany. Miguel Vences is thanked for providing valuable comments on the manuscript. The RNP and Research Station Valbio are thanked for providing facilities and logistic assistance. This work was funded by a travel grant of the German Academic Exchange Service (DAAD), National Science Foundation grant HBCU-UP 1435186, and by the Systematics Research Fund of the Systematics Association and the Linnean Society of London.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00155/abstract




**Conflict of Interest Statement:** The Reviewer, Miguel Vences, declares that, despite having collaborated with author Katharina C. Wollenberg Valero, the review process was handled objectively. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Wollenberg Valero. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Comparative phylogeography of eight herbs and lianas (Marantaceae) in central African rainforests

## *Alexandra C. Ley<sup>1,2</sup>\*, Gilles Dauby<sup>2</sup>, Julia Köhler<sup>1</sup>, Catherina Wypior<sup>1</sup>, Martin Röser<sup>1</sup> and Olivier J. Hardy<sup>2</sup>*

*<sup>1</sup> Institut für Geobotanik und Botanischer Garten, University Halle-Wittenberg, Halle (Saale), Germany*

*<sup>2</sup> Evolutionary Biology and Ecology, Faculté des Sciences, Université Libre de Bruxelles, Brussels, Belgium*

#### *Edited by:*

*Toby Pennington, Royal Botanic Garden Edinburgh, UK*

#### *Reviewed by:*

*Stephen Cavers, Centre for Ecology and Hydrology, UK*

*Marc Sosef, Botanic Garden Meise, Belgium*

#### *\*Correspondence:*

*Alexandra C. Ley, Institut für Geobotanik und Botanischer Garten, University Halle-Wittenberg, Neuwerk 21, Halle (Saale) 06108, Germany e-mail: alexandra.ley@botanik.uni-halle.de*

Vegetation history in tropical Africa is still poorly known, and the drivers of population differentiation and speciation processes are little documented. It has often been postulated that population fragmentations following climate changes have played a key role in shaping the geographic distribution patterns of genetic diversity and in driving speciation. Here we analyzed phylogeographic patterns (chloroplast-DNA sequences) within and between eight (sister) species of widespread rainforest herbs and lianas from four genera of Marantaceae (*Halopegia*, *Haumania, Marantochloa, Megaphrynium*), searching for concordant patterns across species and concordance with the Pleistocene refuge hypothesis. Using 1146 plastid DNA sequences sampled across African tropical lowland rainforest, particularly in the Lower Guinean (LG) phytogeographic region, we analyzed intra- and interspecific patterns of genetic diversity, endemism and distinctiveness. Intraspecific patterns of haplotype diversity were concordant among most species as well as with the species-level diversity pattern of Marantaceae. Highest values were found in the hilly areas of Cameroon and Gabon. However, the spatial distribution of endemic haplotypes, an indicator for refuge areas in general, was not congruent across species. Each proposed refuge exhibited high values of endemism for one or a few species, indicating their potential role as areas of retraction for the respective species only. Thus, evolutionary histories seem to be diverse across species. In fact, areas of high diversity might have been refuges and/or crossing zones of recolonization routes, i.e., secondary contact zones. We hypothesize that retraction of species into one or the other refuge happened by chance depending on the species' distribution range at the time of climate deterioration. The idiosyncratic patterns found in Marantaceae species are similar to those found among tropical tree species, especially in southern LG.

**Keywords: endemism, distinctiveness, genetic diversity,** *trnC-petN1r***, refugia, Lower Guinea**

## **INTRODUCTION**

Vegetation history in tropical Africa is still poorly known, and the drivers of speciation and population differentiation processes are little documented. Hypotheses on the diversification of the Afrotropical flora include allopatric differentiation/speciation driven by population fragmentation following Pleistocene climate changes (Robbrecht, 1994; Sosef, 1994; Maley, 1996) and parapatric differentiation/speciation across ecological gradients (e.g., temperature and precipitation gradients; Fjeldsa and Lovett, 1997; Vande Weghe, 2004; Heuertz et al., 2013). Phylogeographic studies within and between closely related species might shed new light on this matter, as indicated by similar studies in temperate regions (Taberlet et al., 1998; Schönswetter et al., 2005).

Palynological studies (Maley and Brenac, 1998; Dupont et al., 2000; Bonnefille, 2007; Ngomanda et al., 2009; Dupont, 2011) and palaeo-environmental reconstructions (Anhuf et al., 2006) suggest a repeated fragmentation of the tropical forest in Africa due to (glacial-interglacial) climate oscillations over the last million years. For example, during the African Humid Holocene period (c. 6000–9000 years BP) a single forest block extended from West to Central Africa beyond the current forest cover limit, while the forest was presumably highly fragmented and reduced in size during the last glacial maximum (c. 19000–26000 years BP). This might have led to population fragmentation followed by the independent evolution of the isolated populations through mutation and drift and ultimately the establishment of species. Alternatively, and/or simultaneously, isolated populations might have adapted to different climatic conditions, ultimately forming ecologically different species. Indeed, within Lower Guinea (LG, i.e., the western part of the Central African rainforest block, identified as a phytochorion by White, 1979, **Figure 1**), climatic heterogeneity is characterized by a marked W-E precipitation gradient from the coast inland and a North-South seasonal inversion at a latitude c. 2°N (Leroux, 1983; Vande Weghe, 2004).

**FIGURE 1 | Pattern of species diversity of Marantaceae in tropical Africa based on Schnell (1957), Dhetchuvi (1996), Jongkind (2008) and Ley and Claßen-Bockhoff (2012).** Postulated refugia after Maley (1996); DRCongo, Democratic Republic of Congo; RCongo, Republic of the Congo. Inset shows phytochoria after White (1979): UG, Upper Guinea; LG, Lower Guinea; C, Congolia.

Recently, comparative phylogeographic studies of central African trees revealed a partial congruence of phylogeographic patterns with postulated refugia (Hardy et al., 2013; Heuertz et al., 2013; Dauby et al., 2014a). However, despite the occurrence of some common phylogeographic features, each species displayed an original pattern, especially in Gabon, suggesting idiosyncratic evolutionary histories. It has been hypothesized that this is due to less severe reductions in forest cover in this area during climate oscillations (Dupont et al., 2000; see also Holstein and Renner, 2011).

Here we investigate whether similar genetic patterns as so far detected in tropical African trees might also be found in perennial herbs and lianas from the forest understorey. We might expect that, compared to trees, phylogeographic patterns of herbs and lianas mirror younger historical events due to presumably shorter life cycles (Putz, 1990; Gerwing, 2004; Brandes et al., 2011). Furthermore, general patterns might be more structured in herbs than in trees (see e.g., *G*ST in nuclear markers in Nybom, 2004) due to a more patchy community structure and potentially smaller dispersal distances of pollinators and dispersers in the tropical understorey (for trees: <14 km, Ward et al., 2005; 100 m–100 km, Carbone et al., 1999; for understorey shrub: 10–20 m, Zeng et al., 2012).

More specifically, we perform in Lower Guinea a comparative phylogeographic study of eight perennial herbs and lianas of the family Marantaceae. We search for (1) congruent patterns across species that might have been driven by a common vegetation history, and (2) congruence of these patterns with postulated rainforest refugia that might support the importance of these areas for species survival and population differentiation.

#### **MATERIALS AND METHODS**

#### **SPECIES STUDIED AND SAMPLING**

The Marantaceae (30 genera) are a pantropical family of perennial herbs and lianas of the understorey and gaps of lowland rainforest (0–1500 m) with highest species diversity found in America (∼450 spp.) followed by Asia (∼50 spp.) and Africa (∼40 spp.) (Dhetchuvi, 1996; Andersson, 1998; Kennedy, 2000; Suksathan et al., 2009; Ley and Claßen-Bockhoff, 2011). Each genus of the family is endemic to one continent except *Halopegia* and *Thalia* (Andersson, 1998). Phylogenetic investigations suggest a split of this family from its sister family Cannaceae some 95 ± 5 Ma ago. The family then started to diversify ca. 63 ± 5 Ma ago (Kress and Specht, 2005) in the Late Cretaceous with the establishment of the first tropical everwet habitats in the current tropics (Willis and McElwain, 2002). The Marantaceae are thus probably not a Gondwanan group, i.e., Marantaceae are not distributed pantropically due to vicariant events ca. 110 Ma ago (Kearey and Vine, 1996). Biogeographic analyses suggest instead the occurrence of several independent dispersal events between continents followed each time by intra-continental speciation resulting in several independent species clades per continent (Prince and Kress, 2006). In continental Africa the current distribution of the Marantaceae family ranges from Senegal in the West to Tanzania in the East following today's limits of the tropical rainforest. Highest species numbers are found in Gabon and Cameroon (**Figure 1**). Distribution ranges of individual species vary from widespread (equaling the distribution of the whole Marantaceae family in Africa) to restricted, either to the West or East of the Dahomey gap and/or to Cameroon and/or Gabon (Dhetchuvi, 1996).

The Marantaceae species differ from their sister family Cannaceae by a pulvinus and an explosive pollination mechanism (Claßen-Bockhoff, 1991; Kennedy, 2000). It is a highly diverse family with regard to species number and adaptations to different pollinators and dispersal agents (Kennedy, 2000; Clausager and Borchsenius, 2003; Locatelli et al., 2004; Ley, 2008; Ley and Claßen-Bockhoff, 2009). The species of Marantaceae show typical characteristics of plants from the tropical understorey such as self-compatibility (*Halopegia azurea* even autogamous, Ley and Claßen-Bockhoff, 2013), clonality via rhizomes (*Marantochloa congensis* additionally via vivipary (bulbils), Kennedy, 2000) and animal pollination and dispersal (Ley, 2008; Ley and Claßen-Bockhoff, 2009).

For the current study eight species from four different genera with different growth forms, distribution ranges and pollinators were chosen (**Table 1**). Sampling of leaf material for genetic analyses was envisioned to cover the whole distribution area of each species. However, for all species sampling was better in Cameroon and Gabon and only fragmentary in West Africa (i.e., Upper Guinean phytochorion) and the Congo Basin (i.e., Congolian phytochorion). We thus here used the entire dataset including West Africa and the Congo Basin for the description of the phylogeographic pattern of each species and then limited the dataset to Lower Guinea, when comparing the phylogeographic pattern qualitatively and quantitatively among species.

#### **DNA EXTRACTION AND AMPLIFICATION**

For the eight species we used sequences from the chloroplast (cp) intergenic spacer *trnC-petN1r* using the primers trnC 5′-CCAGTTCAAATCTGGGTGTC-3′ (modified from Demesure et al., 1995) and petN1r 5′-CCCAAGCAAGACTTACTATATCC-3′ (Lee and Wen, 2004). For *Marantochloa congensis* an additional marker (*psbA-trnH*) was amplified to increase resolution, using the primers psbA 5′-GTTATGCATGAACGTAATGCTC-3′ and trnH2 5′-CGCGCATGGTGGATTCACAATCC-3′ (Sang et al., 1997; Tate and Simpson, 2003). The genetic data for the genera *Haumania* and *Marantochloa* were updated from Ley and Hardy (2010, 2014). For the third species of the genus *Haumania* (*H. leonardiana*) only sequences from six individuals from the Democratic Republic of Congo (DRCongo) could so far be obtained; these were added to the haplotype network to show intrageneric relationships but were not analyzed any further due to the scarcity of available sequences. The phylogeographic patterns of three species from the genera *Halopegia* and *Megaphrynium* were characterized here for the first time. The production of sequences for these species followed the protocol of DNA extraction, amplification and sequencing described in Ley and Hardy (2010).

#### **GEOGRAPHIC DISTRIBUTION OF CHLOROPLAST HAPLOTYPES AND PHYLOGENETIC NETWORKS**

For each species chloroplast haplotypes were analyzed in DnaSP Version 5.10 (Librado and Rozas, 2009) and their geographic distribution mapped. DNA haplotypes were submitted to Genbank (for accession numbers see Supplementary Table 1). To obtain the minimum number of mutations between haplotypes, a network was established with the software Network 4.5.1.0 (www.fluxus-engineering.com; Bandelt et al., 1999) using a maximum parsimony method based on a median joining algorithm (MJ). Networks were established per species and for entire genera to identify possible plastid captures between closely related species (Ley and Hardy, 2010, 2014). Nucleotide diversity, which represents the average number of nucleotide differences per site between two sequences, was calculated in Arlequin (Excoffier et al., 2009).
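The definition of nucleotide diversity given above can be illustrated with a minimal sketch. It assumes aligned sequences of equal length and counts every mismatching column (including gap characters) as one difference; the function name and the toy alignment are hypothetical, and the sketch does not reproduce Arlequin's handling of indels or its variance estimate.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) in an alignment."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching columns for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)

# Hypothetical toy alignment: three sequences of eight sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
pi = nucleotide_diversity(toy_alignment)
```

In the toy alignment the three pairs differ at 1, 2 and 1 sites, so the value is 4 differences over 3 pairs and 8 sites.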

#### **GRID-BASED STANDARDIZED MEASURES OF GENETIC DIVERSITY, ENDEMISM AND DISTINCTIVENESS**

For the comparison of geographic patterns of genetic diversity between species at different scales in Lower Guinea we subdivided the region into three different grid systems with cell sizes of 0.75°-, 1.5°- and 3°-sides (Supplementary Figure 1 for 0.75° and 1.5°; 3° not shown). Given that a minimum of three samples was necessary per species and grid cell to compute diversity indices (see below), smaller cells allowed higher spatial resolution but at the cost of lower precision and higher loss of data in areas where sampling was less dense (for numbers of individuals per grid cell 0.75° and 1.5° see Supplementary Tables 2, 3).
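The trade-off between cell size and data retention can be sketched as follows. This is an illustration only: the grid origin at (0°, 0°), the function names and the five sample records are assumptions, not the grid actually used in the study. Gene diversity is computed with the sample-size-corrected formula of Nei (1978), *He* = n/(n − 1) · (1 − Σ p²), one of the indices introduced in the next subsection.

```python
import math
from collections import Counter, defaultdict

def assign_cells(samples, cell_size, origin=(0.0, 0.0)):
    """Group (lon, lat, haplotype) records into square cells of `cell_size` degrees."""
    cells = defaultdict(list)
    for lon, lat, hap in samples:
        col = math.floor((lon - origin[0]) / cell_size)
        row = math.floor((lat - origin[1]) / cell_size)
        cells[(col, row)].append(hap)
    return cells

def usable_cells(cells, min_n=3):
    """Keep only cells holding at least `min_n` individuals."""
    return {cell: haps for cell, haps in cells.items() if len(haps) >= min_n}

def gene_diversity(haps):
    """Gene diversity corrected for sample size (Nei, 1978)."""
    n = len(haps)
    freqs = [count / n for count in Counter(haps).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

# Hypothetical records for one species: (longitude, latitude, haplotype).
samples = [
    (10.6, 0.1, "H1"), (10.7, 0.3, "H1"), (11.0, 0.5, "H2"),
    (11.3, 0.2, "H1"), (12.1, -0.4, "H3"),
]
fine = usable_cells(assign_cells(samples, 0.75))
coarse = usable_cells(assign_cells(samples, 1.5))
he_fine = {cell: gene_diversity(h) for cell, h in fine.items()}
he_coarse = {cell: gene_diversity(h) for cell, h in coarse.items()}
```

With these toy records the 0.75° grid retains three of the five individuals in a single usable cell, whereas the 1.5° grid retains four, which mirrors the loss of data at finer resolution described above.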

*Abbr., Abbreviation; N, number;* #*Schnell, 1957; 1Tutin and Fernandez, 1993;* <sup>+</sup>*Dhetchuvi, 1996;* ◦*Ley, 2008; \*Ley and Cla*ß*en-Bockhoff, 2009. Phytochoria after White, 1979: C, Congolia; GC, Guineo-Congolian; LG, Lower Guinea; UG, Upper Guinea.*

#### *Within cell diversity, endemism and distinctiveness*

We computed several statistics quantifying genetic diversity for each species within each cell: Nielsen's estimator of the effective number of haplotypes *NAe* (Nielsen et al., 2003), the gene diversity corrected for sample size *He* (Nei, 1978) and the mean phylogenetic distance between individuals *v* (gene diversity with ordered alleles, Pons and Petit, 1996). The different statistics were computed with SPAGeDi Version 1.4 (Hardy and Vekemans, 2002). The degree of endemism of each haplotype was assessed by the maximal distance between individuals carrying that haplotype. We quantified the degree of haplotypic endemism, *End*, per cell and species as the proportion of individuals carrying haplotypes with a maximal geographic extension of 200 km. Finally, for each species, the level of phylogenetic distinctiveness of each cell with respect to the other ones was computed following Dauby et al. (2014a). To this end, for each pair of cells (*i* and *j*), the mean phylogenetic distance between individuals drawn from *i* and *j* (*vij*) was computed, as well as the spatial distance between the centroids of individuals belonging to *i* and *j* (*dij*). *S'ij*, the residuals of the regression of *vij* on ln(*dij*), or the centered *vij* values themselves if there was no significant positive correlation between *vij* and ln(*dij*) according to a Mantel test, were then averaged over all pairs involving one particular cell, *S'i*, providing a measure of the phylogenetic distinctiveness of that cell above or below the average across all cells (Petit et al., 2003; Dauby et al., 2014a).
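The endemism and distinctiveness measures described above can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the study: great-circle (haversine) distances with an Earth radius of 6371 km, a strict "< 200 km" threshold, an ordinary least-squares fit, and the case in which the Mantel test found a significant positive correlation (the centered-*vij* fallback is omitted). All function names and toy values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(p, q):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def haplotype_ranges(records):
    """Maximal distance (km) between individuals carrying each haplotype."""
    sites = defaultdict(list)
    for lon, lat, hap in records:
        sites[hap].append((lon, lat))
    return {
        hap: max((haversine_km(a, b) for a, b in combinations(pts, 2)), default=0.0)
        for hap, pts in sites.items()
    }

def endemism(cell_haplotypes, ranges, max_km=200.0):
    """Proportion of individuals in a cell carrying a range-restricted haplotype."""
    return sum(ranges[h] < max_km for h in cell_haplotypes) / len(cell_haplotypes)

def distinctiveness(pairs):
    """Per-cell mean residual of v_ij regressed on ln(d_ij).

    pairs: {(cell_i, cell_j): (v_ij, d_ij in km)}
    """
    xs = [math.log(d) for _, d in pairs.values()]
    ys = [v for v, _ in pairs.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residuals = defaultdict(list)
    for (i, j), (v, d) in pairs.items():
        r = v - (intercept + slope * math.log(d))  # S'_ij
        residuals[i].append(r)
        residuals[j].append(r)
    return {cell: sum(r) / len(r) for cell, r in residuals.items()}  # S'_i

# Hypothetical data: H1 is widespread, H2 is restricted to a small area.
records = [(10.0, 0.0, "H1"), (10.1, 0.1, "H1"), (14.0, 2.0, "H1"),
           (10.0, 0.0, "H2"), (10.2, 0.1, "H2")]
ranges = haplotype_ranges(records)
end_cell = endemism(["H1", "H1", "H2", "H2"], ranges)

toy_pairs = {("A", "B"): (1.0, 100.0), ("A", "C"): (3.0, 200.0), ("B", "C"): (1.5, 150.0)}
s_prime = distinctiveness(toy_pairs)
```

In the toy example half of the individuals in the cell carry the restricted haplotype, and cell A receives the highest *S'i* because both of its pairwise distances lie above the fitted regression line.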

#### *Differentiation statistics*

Global differentiation statistics *G*ST and *N*ST among cells (for cells with at least three individuals) were computed for each species. *G*ST accounts for differences in haplotype frequencies while *N*ST additionally accounts for the phylogenetic distances between haplotypes. To test if there was a phylogeographic signal, characterized by *N*ST *> G*ST, permutation tests were performed in SPAGeDi Version 1.4 (Hardy and Vekemans, 2002).
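The logic of the two statistics and of the permutation test can be sketched under simplifying assumptions. The version below omits the small-sample bias corrections of the published estimators, weights all cells equally, and permutes haplotype labels in the distance matrix to obtain a null distribution for *N*ST (a commonly described procedure, which leaves *G*ST unchanged). It is not SPAGeDi's implementation, and all names and toy data are hypothetical.

```python
import random
from collections import Counter

def diversity(freqs, dist):
    """Sum of dist(a, b) * p_a * p_b over ordered pairs of distinct haplotypes."""
    return sum(dist(a, b) * pa * pb
               for a, pa in freqs.items() for b, pb in freqs.items() if a != b)

def differentiation(pops, dist):
    """(v_T - v_S) / v_T; G_ST-like if dist is constant, N_ST-like otherwise."""
    pop_freqs = [{h: c / len(p) for h, c in Counter(p).items()} for p in pops]
    v_s = sum(diversity(f, dist) for f in pop_freqs) / len(pops)
    mean_freqs = Counter()
    for f in pop_freqs:
        for hap, p in f.items():
            mean_freqs[hap] += p / len(pops)
    v_t = diversity(mean_freqs, dist)
    return (v_t - v_s) / v_t

def phylo_signal_p(pops, steps, n_perm=999, seed=0):
    """One-sided p-value for N_ST > G_ST by permuting haplotype labels."""
    rng = random.Random(seed)
    haps = sorted({h for p in pops for h in p})
    lookup = lambda a, b: steps[frozenset((a, b))]
    observed = differentiation(pops, lookup)
    hits = 0
    for _ in range(n_perm):
        shuffled = haps[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(haps, shuffled))
        permuted = differentiation(pops, lambda a, b: lookup(relabel[a], relabel[b]))
        hits += permuted >= observed - 1e-12
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: two cells, each holding two closely related haplotypes.
pops = [["A", "A", "B", "B"], ["C", "C", "D", "D"]]
steps = {frozenset(pair): d for pair, d in [
    (("A", "B"), 1), (("C", "D"), 1),
    (("A", "C"), 5), (("A", "D"), 5), (("B", "C"), 5), (("B", "D"), 5),
]}
g_st = differentiation(pops, lambda a, b: 1.0)
n_st = differentiation(pops, lambda a, b: steps[frozenset((a, b))])
p_value = phylo_signal_p(pops, steps)
```

In this toy case related haplotypes co-occur in the same cell, so the distance-weighted statistic exceeds the frequency-based one, which is the pattern interpreted as a phylogeographic signal.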

#### **CONGRUENCE OF PHYLOGEOGRAPHIC PATTERNS AMONG SPECIES**

Congruence of phylogeographic patterns for each pair of species was evaluated (i) by comparing within cell diversity and endemism metrics using Pearson correlation tests, and (ii) by comparing matrices of pairwise standardized distinctiveness among grid cells (*S'ij*) using Mantel tests (see Dauby et al., 2014a). To obtain a multispecies test of overall geographic congruence of local diversity, endemism or distinctiveness, these metrics were first centered (i.e., minus their mean value) and reduced (i.e., divided by their standard deviation) within species, and then differences among grid cells were tested using a One-Way ANOVA where grid cells were used as factor (due to missing data, species could not be added as another factor). To represent diversity patterns on a map, centered and reduced diversity and endemism metrics were shown per cell and species, or were averaged over species to represent multi-species trends.
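The multispecies test can be illustrated as follows: each metric is centered and reduced within species, and the resulting scores are compared among grid cells with a one-way ANOVA. The sketch assumes the sample standard deviation for the reduction step and returns only the *F* statistic (a p-value would require the *F* distribution, e.g., from a statistics library); function names, cell labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize(cell_values):
    """Center and reduce one species' per-cell values (z-scores)."""
    values = list(cell_values.values())
    m, s = mean(values), stdev(values)
    return {cell: (v - m) / s for cell, v in cell_values.items()}

def one_way_f(groups):
    """F statistic of a one-way ANOVA; each group holds the scores of one grid cell."""
    groups = [g for g in groups if g]
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical diversity values for two species in three grid cells.
species_metrics = {
    "sp1": {"c1": 1.0, "c2": 2.0, "c3": 3.0},
    "sp2": {"c1": 10.0, "c2": 30.0, "c3": 20.0},
}
by_cell = defaultdict(list)
for cell_values in species_metrics.values():
    for cell, z in standardize(cell_values).items():
        by_cell[cell].append(z)
f_stat = one_way_f(list(by_cell.values()))
```

Standardizing within species puts metrics with very different ranges on a common scale, so that a cell effect reflects consistently high or low values across species rather than the dominance of one species.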

## **RESULTS**

#### **GENETIC POLYMORPHISM**

The numbers of individuals sequenced per species ranged from 75 to 166, totaling 1046 individuals (991 in Lower Guinea, **Table 2**) sequenced for *trnC-petN1r* and 110 for *psbA-trnH* (Lower Guinea only, **Table 2**).

The *trnC-petN1r* region had an average length of about 800 bp (including indels). The number of SNPs (counting indels as single mutations) per species varied between 11 (*Halopegia azurea*) and 27 (*Marantochloa monophylla*) and the number of haplotypes per species varied between seven (*Megaphrynium trichogynum*) and 19 (*Marantochloa monophylla*, see **Table 2**). Overall nucleotide diversity ranged from 0.000846 ± 0.000690 in *Halopegia azurea* to 0.007289 ± 0.003925 in *Marantochloa monophylla* (**Table 2**). Average genetic diversity per grid cell measured as *He* was highest in *Haumania danckelmaniana* (**Table 3**). Genetic diversity measured as *v* taking genetic distance between haplotypes into account was highest in *Marantochloa monophylla* and endemism per grid cell (*End*) was highest in *M. incertifolia*. Measures of genetic diversity were independent of grid cell size (Supplementary Table 4).

For *M. congensis psbA-trnH* sequences reached a length of about 900 bp (including indels and reverse mutations). Reverse mutations were excluded in the following analyses leaving nine mutations, seven haplotypes (**Table 2**) and a network without loops with a maximum of two mutations between adjacent haplotypes (Supplementary Figure 2).


**Table 2 | Sample sizes and genetic diversity at the** *trnC-petN1r* **and** *psbA-trnH2* **region in the eight study species from the Marantaceae.**

*For abbreviations of species names see Table 1. N, number of individuals; SNP, number of single nucleotide polymorphisms. Haplotypes (total/private): "total" includes haplotypes shared between sister species from the same genus. "Private" considers only haplotypes found in the respective species.*


**Table 3 | Within cell diversity pattern at the** *trnC-petN1r* **region in eight Marantaceae species in Lower Guinea for grid cell size 0.75° (average across grid cells, [range]).**

*N, number; NAe, Effective N of alleles (Nielsen et al., 2003); He, gene diversity corrected for sample size (Nei, 1978); v, mean phylogenetic distance between individuals (Pons and Petit, 1996); End, mean proportion of individuals carrying endemic alleles (haplotype range < 200 km). For abbreviations of species names see Table 1.*

Haplotype networks based on *trnC-petN1r* required 11–27(32) mutations (without torso) within species and 11–38(56) mutations within genera. Haplotypes never showed high divergence, either within or between congeneric species (**Figures 2**, **3**). Within species, haplotypes generally differed from the closest other haplotype by one mutation. Only a few exceptions presented a distance of up to three mutations between closest haplotypes within species. All species networks included loops except for *Halopegia azurea, Haumania liebrechtsiana* and *Marantochloa incertifolia*. *Marantochloa monophylla* was the only species that showed two intraspecific divergent lineages. Between species, maximum distances between nearest haplotypes ranged from two to three mutations. In all networks we found a few individuals that belonged to one morphological species but exhibited the same haplotypes as individuals from the other morphological species.

#### **PHYLOGEOGRAPHIC PATTERNS WITHIN EACH GENUS AND SPECIES**

*Halopegia azurea* was the species with the lowest haplotype diversity (11 haplotypes) resulting in a simple network without loops (**Figure 2B**). The only frequent haplotype was distributed over the whole Lower Guinean-Congolian range of the species (**Figure 2A**). Localities with additional one to several rare geographically restricted haplotypes divergent by one mutation from the single widespread haplotype were found around the Cameroonian Volcanic Line and the Chaillu Massif in Gabon. In West Africa three divergent haplotypes were found. They were most closely related (different by three mutations) to the rare haplotype of the Cameroonian Volcanic Line.

The two species from the genus *Megaphrynium* presented very different phylogeographic patterns. In *Mega. trichogynum* (**Figure 2D**) there was one widespread haplotype covering the whole distribution area of the species and another frequent haplotype restricted to Gabon. The diversity center in this species was found in the North of Gabon where the frequent haplotypes overlapped in their distribution and three rare haplotypes also occurred. *Mega. macrostachyum* presented four haplotypes (H1, 2, 9, 16) exclusive to different, large geographic areas (Southwest Cameroon, Southwest Gabon to DRCongo (Bas Congo), North to Northwest Gabon, East Gabon/East Cameroon/Congos, **Figure 2C**). Each widespread haplotype was co-occurring with closely related and geographically restricted haplotypes. This resulted in five areas of increased haplotype diversity: the Cameroonian volcanic line, western DRCongo, northern Gabon, coastal northwestern Gabon (near Libreville) and the Cristal Mountains area in Gabon. Only five *Megaphrynium* individuals out of 282 carried a haplotype typical of the other species.

The spatial genetic structure of species from the genera *Haumania* and *Marantochloa* was already discussed in previous publications (Ley and Hardy, 2010, 2014) but is updated here (**Figure 3**). *Haumania danckelmaniana* (**Figure 3A**) exhibited three haplotypes each covering a different large geographic area (Cameroon + northern Gabon, eastern Gabon, western Gabon). Additionally, there were several geographically very restricted haplotypes in localities found almost all over the species' distribution range. *H. liebrechtsiana* (**Figure 3B**) carried the same haplotypes as *H. danckelmaniana* in Gabon where both species occur in sympatry. In the Congo Basin *H. liebrechtsiana* carried specific haplotypes: one widespread haplotype occurring from the Atlantic coast in DRCongo to the center of the Congo Basin and several rare haplotypes concentrated along the middle course of the Congo River.

**FIGURE 2 | … Marantaceae species.** *Halopegia azurea* **(A,B)**, *Megaphrynium macrostachyum* **(C,E)**, *Mega. trichogynum* **(D,E)**. Gray hatched line: species distribution range. Sizes of circles are proportional to sample sizes at each locality in the geographic maps and proportional to haplotype frequency in the haplotype network. Frequent haplotypes are color-coded, rare haplotypes are number-coded. Stippled lines throughout the networks delineate groups of haplotypes according to the species in which they are usually found, but haplotypes can also be shared among species. Red numbers along branches are IDs of mutations, mv1 to mv2 indicate median vectors (Bandelt et al., 1999).

In the genus *Marantochloa* there were two distinct patterns when comparing species. *M. congensis* (**Figure 3D**), the most widespread species, had two widespread haplotypes found across its entire distribution range and a few rare (locally restricted) haplotypes concentrated along the coast of Ivory Coast, in the Cameroonian Volcanic Line, in East DRCongo and in a corridor from the southern Chaillu Massif in Gabon to eastern Cameroon. *M. monophylla* (**Figure 3E**) in contrast exhibited a strong geographic pattern of two genetically distinct haplogroups, one distributed along the Atlantic coast, the other one east of that toward the Congo Basin. Major diversity centers were found in mountain ranges: Cameroonian Volcanic Line and Ngovayang (Cameroon); Cristal Mountains and the southern Chaillu Massif (Gabon); and in the Albertine Rift Valley (Uganda) (for localities compare **Figure 1**). Whereas *M. congensis* and *M. monophylla* hardly shared any haplotypes (only one at Mount Cameroon), *M. incertifolia* (**Figure 3F**) shared half of its haplotypes with either one or the other sister species, *M. congensis* and *M. monophylla*. In western Cameroon and the Cristal Mountains there was one haplotype (H10) shared between all three *Marantochloa* species occurring there.

The fixation indices (*G*ST: 0.15–0.77 and *N*ST: 0.14–0.83, **Table 4**) were higher in *Haumania liebrechtsiana*, *Marantochloa incertifolia* and *Megaphrynium macrostachyum* and rather low in *Megaphrynium trichogynum*. Fixation indices varied somewhat according to grid cell size but the ranking of species was generally fairly consistent. There was always a marked difference in *G*ST and *N*ST between congeners: *Mega. macrostachyum > Mega. trichogynum; H. liebrechtsiana > H. danckelmaniana; M. incertifolia > M. monophylla > M. congensis.* In most species a marginally significant to very significant phylogeographic signal (*N*ST *> G*ST) could be detected at least at one scale (0.75°, 1.5°, and/or 3°) (**Table 4**). The signal was clearest in *M. congensis* and *M. monophylla*. No phylogeographic signal could be detected in *Halopegia azurea, Haumania liebrechtsiana* and *Megaphrynium trichogynum*.

#### **CONGRUENCE OF GENETIC DIVERSITY PATTERN AMONG SPECIES**

As general geographic patterns of diversity were independent of grid cell size, only results based on 0.75° grid cells are reported here. Standardized effective numbers of alleles, gene diversity and phylogenetic diversity per grid cell and species revealed a significant geographic effect according to the ANOVA analyses (*NAe*: *F* = 1.79, *P* = 0.01; *He*: *F* = 2.59, *P* < 0.001; *v*: *F* = 12.64, *P* < 0.001). By contrast, the ANOVA test was non-significant for the mean frequency of endemic haplotypes (*End*; *F* = 0.99, *P* = 0.49) and the genetic distinctiveness per grid cell (*S'i*; *F* = 0.86, *P* = 0.69; for 1.5°: *F* = 1.47, *P* = 0.12). Averaged standardized effective numbers of haplotypes per cell across species showed that diversity is highest in the Cristal Mountains (Gabon) followed by northern and southern Gabon and the Cameroonian volcanic line (Supplementary Figures 3, 4). By contrast, south-western and eastern Gabon and coastal western and eastern Cameroon displayed below-average diversity values. South-western Cameroon and the DRCongo displayed close to average values. The multi-species pattern for phylogenetic diversity (*v*) was similar (Supplementary Figures 3, 4).

**FIGURE 3 | Geographic distribution of chloroplast haplotypes and generic haplotype networks based on** *trnC-petN1r* **updated from Ley and Hardy (2010, 2014).** *Haumania danckelmaniana* **(A, C)**, *H. liebrechtsiana* **(B, C)**, *Marantochloa congensis* **(D, G)**, *M. monophylla* **(E, G)**, *M. incertifolia* **(F, G)**. Gray hatched line: species distribution range. Sizes of circles are proportional to sample sizes at each locality in the geographic maps and proportional to haplotype frequency in the haplotype network. Frequent haplotypes are color-coded, rare haplotypes are number-coded. Stippled lines throughout the networks delineate groups of haplotypes according to the species in which they are usually found, but haplotypes can also be shared among species. Red numbers along branches are IDs of mutations; mv indicates median vectors (Bandelt et al., 1999).

*Test of significance for NST > GST: (\*), marginally significant (p < 0.1); \*, significant (p < 0.05); \*\*, highly significant (p < 0.01). All the GST values were significantly higher than 0 (p < 0.001). For abbreviations of species names see Table 1.*

**Table 5 | Pearson correlation of the effective number of haplotypes (***NAe***, lower diagonal) and within-cell phylogenetic diversity (***v***, upper diagonal) between species for grid cell size of 0.75°.**

*(\*), marginally significant (p < 0.1); \*, significant (p < 0.05); \*\*, highly significant (p < 0.01). For abbreviations of species names see Table 1.*

**FIGURE 4 | Geographic distribution of standardized (i.e., centered and reduced) genetic diversity and endemism for eight Marantaceae species in Lower Guinea for grid cell size 0.75°.** Effective number of haplotypes (*NAe*) **(A)**; mean phylogenetic distance between individuals (*v*) **(B)**; haplotypic endemism (haplotype range <200 km, *End*) **(C)**; Genetic distinctiveness of each grid cell (*S'i*) **(D)**. Distinctiveness above or below average is based on standardized pairwise genetic distance (*S'kij* computed for each species) among populations where genetic distance is estimated as the number of mutational steps between two individuals drawn from two populations (*vij*). Species along barplots from left to right are: *Halopegia azurea, Haumania danckelmaniana, H. liebrechtsiana, Marantochloa congensis, M. incertifolia, M. monophylla, Megaphrynium macrostachyum* and *Mega. trichogynum*.

Comparing diversity patterns pairwise between species, Pearson correlation tests revealed congruence between *Halopegia azurea, Haumania danckelmaniana, M. incertifolia, M. congensis* and *M. monophylla* (**Table 5**, Supplementary Table 5), with two main common centers of diversity in Gabon: the western Cristal Mountains area close to Libreville and the northern Chaillu Massif (**Figure 4**, Supplementary Tables 6, 7). Diversity centers of *M. congensis* lie beside the Cameroonian volcanic line in Cameroon, in the Cristal Mountains area and in northern Gabon, and, observed only in this species, in the southern Chaillu Massif of Gabon and the northern part of RCongo (see **Figure 4**, Supplementary Tables 6, 7). Furthermore, *Mega. macrostachyum* and *Mega. trichogynum* were inter-correlated for *NAe*. These two species showed many centers of genetic diversity well distributed across Cameroon and Gabon. They shared the center of diversity in the southwest of Gabon with *H. liebrechtsiana* and *M. congensis*. Concerning patterns of endemism, there is congruence between *Marantochloa monophylla* and *Haumania liebrechtsiana* and between *M. monophylla* and *M. incertifolia* (the latter only at 0.75°) as well as between *Halopegia azurea* and *M. congensis* and between *Halopegia azurea* and *H. danckelmaniana* (but the latter only detectable at the 1.5° grid because there were not enough shared cells at 0.75°). Interestingly, there was no congruence in the patterns of haplotypic endemism across species (Supplementary Table 8).
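The pairwise comparison described above can be sketched as a Pearson correlation computed over the grid cells that two species share. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary layout (grid-cell id mapped to a diversity value such as *NAe*) and the `min_cells` threshold are assumptions introduced here.

```python
import numpy as np

def standardize(values):
    """Center and reduce: subtract the mean, divide by the sample standard deviation."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

def pearson_shared_cells(div_a, div_b, min_cells=3):
    """Pearson r between two species' per-cell diversity over shared grid cells.

    div_a, div_b: dicts mapping a (hypothetical) grid-cell id to a diversity value.
    Returns (r, n_shared); r is None when fewer than min_cells cells are shared.
    Assumes each species varies across the shared cells (non-zero variance).
    """
    shared = sorted(set(div_a) & set(div_b))
    if len(shared) < min_cells:
        return None, len(shared)
    a = standardize([div_a[c] for c in shared])
    b = standardize([div_b[c] for c in shared])
    # With both vectors standardized (ddof=1), r is the scaled dot product.
    r = float(np.sum(a * b) / (len(shared) - 1))
    return r, len(shared)
```

Returning `None` when too few cells are shared mirrors the situation noted above, where a species pair could only be compared at the coarser grid because too few cells were shared at 0.75°.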

#### **CONGRUENCE OF GENETIC DISTINCTIVENESS PATTERNS AMONG SPECIES**

The genetic distinctiveness per grid cell for each species is presented in **Figure 4D**. Above-average levels of population distinctiveness for three or more species are reached in the Cameroonian volcanic line and in north-western Gabon (Libreville/coastal Gabon and western Cristal Mountains). In contrast, South and East Gabon, East Cameroon and DRCongo consistently displayed low levels of distinctiveness for most species.

There were only very few species pairs that showed statistically significant congruent patterns of genetic distinctiveness among grid cells (*S'ij*); there were three at grid cell size 0.75° (**Table 6**, *Megaphrynium macrostachyum* with *Marantochloa congensis* and *Marantochloa monophylla*, and *Marantochloa monophylla* with *Marantochloa congensis*) and three at grid cell size 1.5° (Supplementary Table 9).
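A Mantel test of the kind reported in Table 6 compares two pairwise matrices defined over the same set of grid cells. The sketch below is a generic one-sided permutation version, assuming two square symmetric matrices restricted to the cells shared by both species; the function name, the number of permutations and the seed are illustrative choices, not the settings used in the study.

```python
import numpy as np

def mantel(x, y, permutations=999, seed=0):
    """One-sided Mantel test between two square symmetric matrices.

    Returns (r_obs, p_value), where r_obs is the Pearson correlation between
    the upper-triangle entries of x and y, and p_value is the proportion of
    row/column permutations of x giving a correlation >= r_obs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2 or x.shape[0] != x.shape[1]:
        raise ValueError("x and y must be square matrices of the same size")
    n = x.shape[0]
    iu = np.triu_indices(n, k=1)  # off-diagonal upper triangle
    r_obs = np.corrcoef(x[iu], y[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of x together, keeping y fixed.
        r_perm = np.corrcoef(x[np.ix_(p, p)][iu], y[iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    p_value = (count + 1) / (permutations + 1)
    return float(r_obs), p_value
```

Permuting rows and columns jointly preserves the dependence among entries that share a grid cell, which is why a permutation test is used here instead of an ordinary significance test on the correlation.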

## **DISCUSSION**

In this study, phylogeographic patterns of the plastid genome of eight herb and liana species from the family Marantaceae were compared in Lower Guinea. We expected that profound vegetation changes might have left their imprints in the distribution pattern of genetic diversity of species, and that similar species responses would lead to congruent phylogeographic patterns. In our study, however, we did not find overall congruence in the pattern of genetic diversity, endemism and distinctiveness across all study species but rather multiple patterns characteristic for one or a few species. Thus, there was not a uniform congruence of genetic pattern with the putative rainforest refugia proposed by Maley (1996). Our results indicate either idiosyncratic histories of the chosen taxa, or that once congruent genetic patterns resulting from similar species responses to particular climatic changes are already overlain by younger historical events (Alexandre et al., 1998; Maley and Brenac, 1998; Maley, 2002) leaving new individual imprints in the genetic patterns of species. Here, compared to tree species, phylogeographic patterns in herbs might reflect younger evolutionary events due to their shorter life cycles (for life cycles in perennial herbs/lianas see Putz, 1990; Gerwing, 2004; Brandes et al., 2011).

#### **GENETIC DIVERSITY AND DIFFERENTIATION BETWEEN SISTER SPECIES IN LOWER GUINEA**

In the eight Marantaceae species studied, the level of genetic diversity at the plastid gene sequenced (7–19 haplotypes per species, see also nucleotide diversity) was similar to that found for the same plastid marker in tree species from Lower Guinea (6–24 haplotypes per species; Dauby et al., 2014a; for nucleotide diversity see Heuertz et al., 2013). The high molecular diversity found in *Marantochloa monophylla* was congruent with its high morphological diversity – an exceptional morphological and genetic diversity (*NAe, v*) was found in Ngovayang Mountain in Cameroon. By contrast, genetic diversity was especially low in *Halopegia azurea*, a selfing species (Ley and Claßen-Bockhoff, 2013). Although selfing should not *per se* affect the diversity of maternally inherited genomes, it might enhance selective sweeps by generating a global linkage between nuclear and cytoplasmic genomes (Glemin et al., 2006), a possible explanation for the low diversity observed in the plastid genome.

The divergence of haplotypes within and between Marantaceae sister species was rather low (1–2 mutations), indicating a low degree of interspecific molecular divergence, potentially due to relatively recent speciation events. Species seem not to have established strong species boundaries yet, which was probably the reason for the observation of recurrent hybridization events in sympatric regions in almost all sister species pairs considered here (see also Ley and Hardy, 2014).

#### **SPATIAL GENETIC STRUCTURE AND PHYLOGEOGRAPHIC SIGNAL WITHIN SPECIES**

**Table 6 | Results of Mantel test comparing pairwise standardized distinctiveness among grid cells (***S***'***ij***) between species pairs for grid cell size of 0.75°.**

*Upper diagonal: number of grid cells shared between species; lower diagonal: correlation coefficient. Significant values (p < 0.05) and highly significant values (p < 0.01) are written in bold and indicated by \* and \*\*, respectively. For abbreviations of species names see Table 1.*

A spatial genetic structure was found in all species (see significant *G*ST values) indicating intra-specific population differentiation. In addition, a significant phylogeographic pattern (*N*ST > *G*ST) could be detected in five of the eight Marantaceae species. This implies that, for these species at least, some of their populations have evolved in isolation for long enough to generate related haplotypes that tend to co-occur locally. Such a phylogeographic pattern is expected if species survived in multiple isolated refugia. Only *Halopegia azurea*, *Haumania liebrechtsiana* and *Megaphrynium trichogynum* do not show such a signal. In all three species the low number of haplotypes (<10) prevents sufficient testing power.
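To make the differentiation statistic concrete, the following sketch computes a simple Nei-type *G*ST from haplotype labels sampled in several populations. It is an uncorrected textbook form, (H_T − H_S)/H_T, assumed here for illustration only; it is not necessarily the estimator used in the study, and it omits the small-sample corrections that published estimators apply. *N*ST is built on the same principle but additionally weights pairs of haplotypes by their mutational distance, so *N*ST > *G*ST indicates that closely related haplotypes tend to co-occur.

```python
from collections import Counter

def gene_diversity(freqs):
    """Probability that two haplotypes drawn at random (with replacement) differ."""
    return 1.0 - sum(p * p for p in freqs)

def gst(populations):
    """Uncorrected Nei-type G_ST from haplotype labels.

    populations: list of non-empty lists of haplotype labels, one list per population.
    """
    haplotypes = sorted({h for pop in populations for h in pop})
    freq_tables = []
    for pop in populations:
        counts = Counter(pop)
        freq_tables.append([counts[h] / len(pop) for h in haplotypes])
    n_pops = len(freq_tables)
    # H_S: mean within-population diversity.
    h_s = sum(gene_diversity(f) for f in freq_tables) / n_pops
    # H_T: diversity of the unweighted mean haplotype frequencies.
    mean_freqs = [sum(f[i] for f in freq_tables) / n_pops
                  for i in range(len(haplotypes))]
    h_t = gene_diversity(mean_freqs)
    if h_t == 0:
        raise ValueError("G_ST is undefined when total diversity is zero")
    return (h_t - h_s) / h_t
```

Two populations fixed for different haplotypes give a value of 1, and populations with identical haplotype frequencies give 0.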

Genetic differentiation between areas (*G*ST) in the eight Marantaceae species was comparable to values found in maternally inherited markers in many other angiosperm taxa, including tropical African trees (see Duminil et al., 2007; Dauby et al., 2014a). This is in contrast to our expectation of more substructuring in (perennial) herbs/lianas than in trees and may indicate rather similar dispersal and population structure in both growth form groups.

Within the Marantaceae, *G*ST values seem to correlate superficially with dispersal ability (see also Petit et al., 2003): the *G*ST was lowest in *Megaphrynium trichogynum*, whose red fleshy fruits are ape/monkey dispersed (Williamson et al., 1990; Tutin and Fernandez, 1993; White and Abernethy, 1997), in *Halopegia azurea* (dispersal mode still unknown) and in *M. congensis* (see interpretation below), while it was highest in *Haumania liebrechtsiana*, with large, probably gravity-dispersed fruits, and in *Marantochloa incertifolia*, with rather isolated occurrences and an extremely low production of flowers and fruits (5–15 flowers per inflorescence flowering sparsely over a month with a fruit set of 3–6%, see Ley, 2008; Ley and Claßen-Bockhoff, 2013). Fruits are here dispersed by small birds and/or water (Tutin, 1998; Ley, 2008). A rather low *G*ST value in the Marantaceae is presented by *Mega. trichogynum* (*G*ST = 0.33). However, there is no indication that its fruits are better dispersed than those of its sister species *Mega. macrostachyum* (compare fruit morphology in Dhetchuvi, 1996), which presents a much higher *G*ST value. There are other contrasting *G*ST values between sister species pairs (**Table 4**). However, we see a possible explanation in terms of dispersal ability difference only for the lower *G*ST value found in *M. congensis* compared to its sister species. The three investigated *Marantochloa* species produce rather small amounts of fruits (see Ley and Claßen-Bockhoff, 2013). *M. congensis* is the only species which additionally frequently propagates by vegetative means, producing large quantities of bulbils which might be dispersed by water and animals, potentially contributing to an efficient gene flow between populations and a rapid clonal expansion of the species distribution range (Kennedy, 2000; Ley, 2008). *M. congensis* is the species with the largest current distribution range of the three investigated *Marantochloa* species, occurring from West Africa to eastern DRCongo (Dhetchuvi, 1996). The observation of a shared, possibly ancestral haplotype in the Cameroonian Volcanic Line and Cristal Mountains might suggest that the three species originated there. We favor ancestral haplotype over chloroplast capture, as we are dealing here with a haplotype in the center of the haplotype network between the three species. Under this assumption, the much larger distribution range of *M. congensis* might be explained by better dispersal capacities.

#### **CONCORDANCE OF OBSERVED GENETIC PATTERNS ACROSS SPECIES AND WITH POSTULATED PLEISTOCENE REFUGIA IN LOWER GUINEA**

We demonstrated congruent geographic patterns of diversity across species: local genetic diversity is congruently high for six out of eight study species in the Cristal Mountains area and low for seven out of eight study species in eastern Cameroon, eastern and southern coastal Gabon, and Bas Congo (Mayumbe) in DRCongo. By contrast, above-average frequencies of endemic haplotypes can be found in almost every grid cell when taking all species together. However, few species pairs display correlated patterns of endemic haplotype frequencies. Similarly, there is no general correlation in distinctiveness indices among Marantaceae species, while such a correlation was reported among five of eight tree species for Lower Guinea (Dauby et al., 2014a).

To interpret these patterns, it is worth noting that endemic haplotypes are potentially the best indicators for refuge areas in general (stable populations are expected to accumulate endemic haplotypes that have not had the opportunity to emigrate), and distinctiveness indices might be the best indicators of refuge areas that were not a source for adjacent areas. High diversity is expected in refuge areas with historically large population sizes (but not under small but stable population size), but also in areas recolonized from multiple differentiated populations (secondary contact zones), a situation where phylogenetic diversity (*v*) should peak (Petit et al., 2003). As we found interspecific congruence in genetic diversity but not in endemism and distinctiveness, the data do not support a hypothesis whereby the different Marantaceae species would have primarily survived in the same set of refugia during periods of climate deterioration. Nevertheless, the correlation in diversity indices might indicate that there are some shared secondary contact and/or refuge areas. In fact, some high diversity areas, like the Cristal Mountains, might have been both a refuge for some species and a secondary contact zone for other species, or even the two for some species (if we imagine that a refuge area becomes "invaded" by an expanding population from another origin).

Overall, we found marked differences in patterns of haplotype distribution across species: (i) Some species are characterized by mostly parapatric distributions of their frequent haplotypes (*H. danckelmaniana*, *Mega. macrostachyum* and *M. monophylla*; plus *H. liebrechtsiana* though in this case the pattern may be due to plastid capture). Here a common pattern becomes apparent distinguishing Cameroon from south-western Gabon and eastern Gabon. Frequent haplotypes often overlap in their distribution range in the Mount Cristal area. Each of these individual distribution ranges overlaps with a different postulated refugium allowing two different scenarios: either an expansion of each of the frequent haplotypes from a central refugium in the Cristal Mountains area into different directions to Cameroon, southwestern Gabon and eastern Gabon, or the other way round with an expansion from three different refugia with an overlap today in the Cristal Mountains area. A similar pattern of restricted haplotype distribution ranges was also found in some tree species and has here been attributed to the retraction of these species to different refugia. In trees it might additionally have been coupled with an adaptation to different climatic conditions evoked by the East-West rainfall gradient and the North-South seasonal gradient in Lower Guinea (Duminil et al., 2013; Heuertz et al., 2013). (ii) A second haplotype distribution pattern is characterized by a wide distribution of one or a few frequent haplotypes over the entire distribution range of a species (*Halopegia azurea, M. congensis*, *Mega. trichogynum*). These species might have a single locality of origin from where an expansion took place across the current distribution range (though a second refugium but without visible expansion would explain the endemic haplotypes found in the Cameroonian Volcanic Line for *M. congensis*). 
So far the detected diversity and endemism pattern in these three species suggest a refugium within Lower Guinea based on high diversity and endemism e.g., Cristal Mountains with or without Chaillu Massif and/or Cameroonian Volcanic Line and/or eastern Cameroon depending on species. The only study species so far showing evidence of a refugium outside Lower Guinea in DRCongo is *H. liebrechtsiana* (see also Ley and Hardy, 2010, 2014). As already found in trees (see Hardy et al., 2013) major rivers seem not to play an important role as barriers to gene flow in Marantaceae in contrast to evidence found in animals (e.g., Gonder and Disotell, 2006; Anthony et al., 2007; Nicolas et al., 2011).

Genetic evidence from several species has repeatedly pointed to the Cristal Mountains as a refuge area (see Koffi et al., 2011; Dauby et al., 2014b). The fact that in the Marantaceae the high diversity of the Cristal Mountains area is not associated with high endemism for at least half of the species suggests that the high diversity is best explained by a recolonization from several sources, rather than by a refuge effect, at least for these species. Note however that an area might be a refuge and at the same time have been "invaded" by other sources.

The Cameroonian volcanic line is a locality well-known for its high species diversity and endemism level, which has been interpreted as a signature of a past forest refuge (see Sosef, 1994; Maley, 1996). In the Marantaceae, only species with large distribution ranges from Lower Guinea to West Africa present high genetic diversity and/or distinctiveness and/or endemism values here. This is in accordance with patterns found in widespread trees (see also Lowe et al., 2010) and might be due to a refuge effect, i.e., accumulation of mutations in stable populations, and/or to a topographical effect, i.e., differentiation between geographically close populations isolated by mountainous barriers (see also Dauby et al., 2014a).

Assuming that the limited concordance between phylogeographic patterns of Marantaceae species in Lower Guinea reflects their idiosyncratic histories of past population fragmentation, one may question the relative importance of chance (species survived by chance in one or several refugia following forest fragmentation) and ecological adaptations (e.g., species survived only in refugia reflecting the optimum of their climatic tolerance). As all species are currently co-occurring in all potential refugia without a marked adaptation to different habitats and/or climate regimes (see Dhetchuvi, 1996) we favor the hypothesis that demographic stochasticity affecting population survival as well as rare long distance dispersal driving recolonization routes played a major role in the resulting phylogeographic patterns. For tree species, congruence of genetic distinctiveness patterns was observed in northern Lower Guinea but not in southern Lower Guinea (Dauby et al., 2014a). This pattern was tentatively explained by a less drastic forest cover reduction in southern Lower Guinea where multiple micro-refugia (e.g., gallery forests) would have remained (see Kingdon, 1980; Dupont et al., 2000; Leal, 2001). The high ecological drift associated with these microrefugia would imply that each one would have hosted a limited number of typical rainforest species, which might have led to the observed idiosyncratic demographic histories of species. This hypothesis might also hold for our Marantaceae species.

A current limitation for the interpretation of our data is the difficulty of dating population divergence or admixture and of confirming that such events are concomitant with Pleistocene climate changes and not with earlier or later events of climate change. Plastid markers are not ideal for this purpose due to their relatively low mutation rate. Additional studies based on nuclear sequencing should bring new insights.

### **BEYOND LOWER GUINEA—THE ROLE OF UPPER GUINEA AND CONGOLIA**

Assessing the importance of the areas adjacent to the East and the West of Lower Guinea (Congolia and Upper Guinea, respectively) for speciation and population differentiation is still difficult due to a lack of sufficient data. Patterns so far documented indicate that these areas have widespread haplotypes but also endemic haplotypes. In Upper Guinea several refugia were postulated by Maley (1996, see **Figure 1**) and the dry Dahomey gap in Benin might play an important role in isolating Upper and Lower Guinea (see Hardy et al., 2013), although the two forest blocks were probably connected during the Humid Holocene period (c. 6–9 kyr BP). This can explain why several species are still restricted to Western Africa today (White, 1979; for Marantaceae see Schnell, 1957; Jongkind, 2008) and endemic haplotypes are found there (e.g., *Halopegia azurea*, see also Duminil et al., 2013) advocating the uniqueness of this area. Similarly, the Congo Basin and the adjacent eastern mountain range are interesting areas. Preliminary data suggest overall genetic diversity to be low in this area for most species, defining this region rather as an area of expansion. Some authors have suggested that Marantaceae species could have been spread to east Cameroon/RCongo due to human activities (Maley, 2001; Brncic et al., 2009). However, despite a rather fragmentary sampling, endemic haplotypes have also been detected in the Congo Basin and in the Albertine Rift Valley (see *M. monophylla*). This suggests a rather long existence of those species in that area.

## **AUTHOR CONTRIBUTIONS**

Alexandra C. Ley has been conducting research on Marantaceae since her PhD starting in 2004. She did most of the field collections and genetic laboratory manipulations, analyzed the chloroplast haplotype distribution and diversity pattern and took a lead in the editing of the article. Gilles Dauby developed the analyses of divergence of haplotype distribution across species in a previous project and applied his knowledge here on the Marantaceae dataset. Julia Köhler and Catherina Wypior conducted a "Forschungsgruppenpraktikum" on the acquisition and analyses of the genetic data of one species of *Megaphrynium* each. Martin Röser and Olivier J. Hardy are both group leaders. The latter conducted the analyses of spatial genetic pattern and endemism and significantly contributed to the discussion of results. All co-authors gave their final approval of the version to be published.

#### **ACKNOWLEDGMENTS**

For support in collecting specimens in Gabon, we thank the National Herbarium in Libreville (LBV), World Wildlife Fund (WWF), T. Stevart and M. Leal both from Missouri Botanical Garden (MBG), the Institut de Recherche en Ecologie Tropical (IRET), O. Hymas from Wildlife Conservation Society (WCS), the Conservators employed by the National Parks Council (CNPN) and IPHAMETRA and CENAREST for granting mission orders and research permits. For the collections in Cameroon, we are grateful for support by J.M. Onana, O. Sene from the Herbarium Cameroon in Yaounde (Y), by the staff of Limbe Botanical Garden as well as by A. Enang, M. Cheek, M. Etuge, P. Mambo and S. Moses. For the collections in DRCongo, we are thankful for the support of A. Serckx and WWF Congo, as well as for additional collections made in RCongo, DRCongo-Salonga and West Africa by J.-F. Gillet, G. Hohmann and C. Jongkind, respectively. B. Hildebrandt kindly supported our genetic work in the laboratory. The collection missions to Gabon and Cameroon were financed by the German Exchange Service (DAAD). Following this, the first author was granted a 2-year postdoctoral fellowship by the German Research Foundation (DFG). The laboratory work was financed by the Belgian Fund for Scientific Research (F.R.S-FNRS, grants FRFC 2.4.577.10 and MIS 4.519.10), through the project "C3A" financed by the French ANR (Agence Nationale de la Recherche) under the ANR-BIODIV program as well as the University Halle-Wittenberg.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00403/abstract

#### **REFERENCES**


years B.P. *Rev. Palaeobot. Palynol.* 99, 157–187. doi: 10.1016/S0034-6667(97)00047-X



Vande Weghe, J. P. (2004). *Forests of Central Africa*. Tielt: Lannoo Publishers.

Willis, K. J., and McElwain, J. C. (2002). *The Evolution of Plants*. Oxford: Oxford University Press.

Zeng, X., Michalski, S. G., Fischer, M., and Durka, W. (2012). Species diversity and population density affect genetic structure and gene dispersal in a subtropical understory shrub. *J. Plant Ecol.* 5, 270–278. doi: 10.1093/jpe/rtr029

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 June 2014; accepted: 01 November 2014; published online: 19 November 2014.*

*Citation: Ley AC, Dauby G, Köhler J, Wypior C, Röser M and Hardy OJ (2014) Comparative phylogeography of eight herbs and lianas (Marantaceae) in central African rainforests. Front. Genet. 5:403. doi: 10.3389/fgene.2014.00403*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Ley, Dauby, Köhler, Wypior, Röser and Hardy. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Microrefugia and species persistence in the Galápagos highlands: a 26,000-year paleoecological perspective

*Aaron F. Collins<sup>1</sup>\*, Mark B. Bush<sup>1</sup> and Julian P. Sachs<sup>2</sup>*

*<sup>1</sup> Department of Biological Sciences, Florida Institute of Technology, Melbourne, FL, USA*

*<sup>2</sup> School of Oceanography, University of Washington, Seattle, WA, USA*

#### *Edited by:*

*Valentí Rull, Botanic Institute of Barcelona (CSIC), Spain*

#### *Reviewed by:*

*Encarni Montoya, The Open University, UK; Francis E. Mayle, University of Reading, UK; William D. Gosling, The Open University, UK*

#### *\*Correspondence:*

*Aaron F. Collins, Department of Biological Sciences, Florida Institute of Technology, 150 W. University Blvd, Melbourne, FL 32901, USA e-mail: aaron.f.collins@gmail.com*

The Galápagos Islands are known to have experienced significant drought during the Quaternary. The loss of mesophytic upland habitats has been suggested to underlie the relatively lower endemism of upland compared with lowland plant assemblages. A fossil pollen record spanning the last 26,000 years from an upland bog on Santa Cruz Island revealed the persistent presence of highland pollen and spore types during the last glacial maximum and a millennial-scale series of droughts in the mid Holocene. The absence of lowland taxa and presence of mesic taxa led to the conclusion that the highland flora of the Galápagos persisted during both these periods. The resiliency of the highland flora of the Galápagos to long-term drought contradicts an earlier hypothesis that an extinction of highland taxa occurred during the last glacial maximum and that rapid Holocene speciation created the modern plant assemblage within the last 10,000 years. Based on the palynological data, we suggest that, even during the height of glacial and Holocene droughts, cool sea-surface temperatures and strong trade-wind activity would have promoted persistent ground level cloudiness that provided the necessary moisture inputs to maintain microrefugia for mesophytic plants. Although moist conditions were maintained, the lack of precipitation caused the loss of open water habitat during such events, and accounts for the known extinctions of species such as *Azolla* sp. and *Elatine* sp., while other moisture-dependent taxa, e.g., *Cyathea weatherbyana*, persisted.

**Keywords: Galápagos, fossil pollen, drought, last glacial maximum, extinction, microrefugia, garúa, precipitation**

### **INTRODUCTION**

The Galápagos has been an important laboratory for studies in population genetics, extinction, and speciation rates. The majority of these studies focused on adaptive radiation within the islands, the movement of species and incipient species between islands (e.g., Darwin, 1845; Grant and Grant, 1981, 2002; Grant et al., 2000; Caccone et al., 2002; Arbogast et al., 2006). Extinctions are harder to study, but several species of vertebrate have been shown to have gone extinct or, at the least, been extirpated from given islands following human contact (Steadman et al., 1991). Human impacts are also suggested to have caused extirpation (defined as loss from an island) of an upland species of *Acalypha* from San Cristobal Island in the last century (Restrepo et al., 2012). A study of macrofossils in a bog in the uplands of Santa Cruz Island revealed the apparent loss of the waterwort, *Elatine*, within the last millennium (Coffey et al., 2012). This extinction may have been due to hydrologic changes in the wetlands, or ecological cascades related to altered use of the area by giant tortoises or introduced grazers (Coffey et al., 2012). The only example of an extinction unequivocally attributed to climate change is the loss of an aquatic water fern from San Cristobal Island (Schofield and Colinvaux, 1969). Fossil palynological records from El Junco Crater Lake on San Cristobal revealed that during an undated time prior to the last glacial maximum (probably within the last ice age) a species of the water-fern *Azolla* different from any known on the islands today grew in the lake. The disappearance of this population is attributed to ice-age aridity (Schofield and Colinvaux, 1969; Colinvaux and Schofield, 1976a,b).

Under modern conditions, El Niño Southern Oscillation (ENSO) causes the greatest interannual variability in precipitation within the islands. During the El Niño (negative phase of ENSO) rainfall can increase by an order of magnitude, inducing a greening of the landscape, heavy flowering, and increased seed production compared with normal years (Grant and Grant, 1981, 2002; Grant et al., 2000). Contrastingly, during strong La Niña events, rainfall can be close to zero, with corresponding reduction in foliage, flowering, and seedset. Importantly, this pattern is strongest on low-lying islands or on the north slope of islands, but on southerly facing slopes and at elevations above c. 200 m, the lack of rainfall may be partially offset by water intercepted from regular ground-level fog immersion known locally as garúa (Pryet et al., 2012; Trueman et al., 2013).

Populations of animals and plants within the Galápagos have been profoundly influenced by ENSO variability (Grant and Grant, 1981, 2002; Hamann, 1985; Grant et al., 2000). While La Niña events bring annual-scale drought, decadal- or even longer droughts may have shaped floras.

During the last glacial maximum (LGM; defined as 22,000–19,000 year BP), the ITCZ was located south of its modern position (Newell, 1973; Haug et al., 2001; Koutavas and Lynch-Steiglitz, 2004; Leduc et al., 2007; Hodell et al., 2008). This displacement, coupled with weakening of the Atlantic Meridional Overturning Circulation (AMOC), induced oppositely phased changes in precipitation patterns in each hemisphere. In the northern tropics, cooling of the LGM resulted in relatively cool dry conditions, whereas cool wet conditions prevailed in the southern tropics (Bradbury, 1997; Bush et al., 2011).

The desiccation of the freshwater lakes and bogs within the Galápagos highlands led researchers to conclude that southerly migration of the ITCZ resulted in reduced garúa, prolonged drought and increased aridity within the archipelago during the LGM (Newell, 1973; Colinvaux and Schofield, 1976a,b). No other terrestrial record has reached the LGM to support or refute the hypothesis of increased aridity within the Galápagos.

Johnson and Raven (1973) postulated that during glacial-aged aridity the xeric-adapted flora of the lowlands expanded upslope, replacing the mesic upland flora. If this scenario of the complete extinction of the highland flora and the return of moist conditions to the highlands just c. 10,000 year BP were true, it would require a phase of rapid speciation to establish the modern upland endemic flora.

The islands are known to have been drier than modern at the last glacial maximum, as lakes dried out and sediments oxidized (Colinvaux and Schofield, 1976a,b). Similarly, between 9000 and 4400 cal. year BP the Intertropical Convergence Zone (ITCZ) migrated northward, causing widespread drought in northern and southern hemisphere South America known as the Mid Holocene Dry Event (MHDE) (Gonzàlez et al., 2008; Niemann and Behling, 2008). Within the Galápagos, lake level in El Junco Crater Lake fell and erosion decreased as a consequence of increased aridity during the MHDE (Conroy et al., 2008). Coffey et al. (2012) document the transition of upland pools to bogs during this period on Santa Cruz, consistent with drier, but not arid, conditions in the uplands.

Here, we report palynological data from a bog in the Santa Cruz highlands that possesses an intermittent record from 26,200 cal. year BP to 8740 cal. year BP and a complete record from 8740 cal. year BP to modern. We test two competing hypotheses: (1) that the uplands became dominated by xeric vegetation during the last glacial maximum, and (2) that despite overall aridity upland species were able to persist during the glacial maximum.

### **MATERIALS AND METHODS**

#### **SITE DESCRIPTION**

Paul's Bog, named for Paul Colinvaux, who first cored the bog in 1967 (Colinvaux, 1968), is located in an eroded south-facing cinder cone (0°38′42.2″S, 90°20′14.4″W, 800 m elevation) on the island of Santa Cruz with the floor of the bog ∼80 m by 40 m (**Figure 1**). The eroded nature of the cinder cone catchment provides a larger area for pollen rain capture than steep sided basins like El Junco Crater Lake, the only other full Holocene record from the archipelago. The bog lies in the path of prevailing southerly trade winds and is bathed in near constant garúa. The permanent moisture allows modern development of a *Sphagnum* bog, within a Fern/Sedge landscape. The moisture input to the bog provides the possibility of a continuous sediment accumulation during the mid- and late-Holocene.

**FIGURE 1 | (A)** Map of Santa Cruz island with associated vegetation zones and agricultural zone within mesic zone highlighted. Paul's Bog (black ellipse) located within white box. **(B)** Diagram of slope of valley with Paul's Bog at the peak with trade wind direction, garúa location and vegetation communities listed.

A diverse bracken fern community is located on the slopes adjacent to the bog, with the area immediately downslope being invaded by the exotic tree *Cinchona pubescens* (Itow, 2003; Jäger et al., 2009). The *Miconia* community (c. 300–550 m elevation) lies just 0.5 km downslope and makes the bog a sensitive location to observe upslope migration of species (Johnson and Raven, 1973) should the garúa have weakened. The shape of the basin limits the potential surface area for past lake formation to c. 0.5 km<sup>2</sup>.

#### **CORE RETRIEVAL, PROCESSING AND SAMPLING**

The bog was cored using a 5 cm-diameter Colinvaux-Vohnout piston corer in September 2004, and again in December 2005. The longest of the cores raised in 2005 was 627 cm in length. The cores raised in 2004 were split at the University of Washington, while those raised in 2005 were split at the Florida Institute of Technology. Cores were stored at 4 °C and imaged using a GEOTEK core-logger of the LUCIE group at the University of Florida. Comparison of the distinctive sediments forming the records of the cores raised in 2004 (400 cm) and 2005 (627 cm) allowed accurate cross-correlation.

Accelerator Mass Spectrometry (AMS) 14C radiocarbon dating was conducted on 11 samples (one reversal) from the 2005 core to provide a chronology. Bulk sediment (5 cm<sup>3</sup>) was used for all 14C dates because macrofossils were not readily preserved beyond 20 cm depth in the sediment core.

The age model of Paul's Bog was based on ten 14C dates from the 2005 core (after removal of age reversal sample) and four bulk AMS 14C radiocarbon dates from the core raised in 2004. All dates were calibrated using Calib v6.0 (Stuiver and Reimer, 2005).

Sediment samples of the core raised in 2005 were taken every 2 cm between 7 and 219 cm total depth (*n* = 111). The material from deeper in the core was largely rotted tephra and preliminary surveys showed that pollen was not fossilized. Subsamples were treated with standard chemical protocols to concentrate pollen residues (Faegri and Iversen, 1989; Moore et al., 1991). Each sediment sample was spiked with ∼5000 polystyrene microspheres prior to chemical treatment to enable the calculation of pollen concentrations for samples (Battarbee and Kneen, 1982). The top 7 cm of the core was composed of *Sphagnum* moss and was sampled at 1 cm resolution (*n* = 7). Counts were conducted using a Zeiss Axioskop photomicroscope at ×400 and ×1000 magnification, until 250 pollen grains or 2000 microspheres were reached. Pollen identification was based on the modern pollen reference collection of the Florida Institute of Technology. Percentages of taxa and pollen concentrations (grains/cm<sup>3</sup>) within samples were calculated and concentrations were plotted using C2 software (Juggins, 2003).
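The arithmetic behind the microsphere spike can be sketched as follows. This is a minimal illustration of the usual exotic-marker calculation, not the authors' own script. The function names, the sample volume parameter (the text does not state the volume of the pollen subsamples), and the counts in the example are assumptions.

```python
def pollen_concentration(pollen_counted, markers_counted, markers_added, sample_volume_cm3):
    """Estimate pollen concentration (grains/cm^3) from an exotic-marker spike.

    The ratio of pollen to recovered markers on the slide is scaled by the
    number of markers added to the whole sample, then divided by its volume.
    """
    if markers_counted <= 0 or sample_volume_cm3 <= 0:
        raise ValueError("marker count and sample volume must be positive")
    return pollen_counted * markers_added / (markers_counted * sample_volume_cm3)


def percentages(counts):
    """Convert raw counts per taxon into percentages of the pollen sum."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Illustrative numbers only: 250 grains counted alongside 500 microspheres,
# ~5000 microspheres added, and an ASSUMED sample volume of 1 cm^3.
example_concentration = pollen_concentration(250, 500, 5000, 1.0)
example_percentages = percentages({"Poaceae": 100, "Jaegeria": 50, "Polygonum": 100})
```

A sample that reaches the 2000-microsphere stopping rule with few grains counted gives a low concentration, which is how very sparse glacial-aged samples can still be quantified.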

Samples were ordinated using Detrended Correspondence Analysis (DCA) (Gauch, 1982; Birks, 1985, 1998; Bush, 1991) using PC-ORD v4.41 (McCune and Mefford, 1999). Due to poor pollen preservation in the glacial-aged samples, presence/absence data were used when ordinating glacial-aged and Holocene samples with DCA. All pollen and spore taxa that passed a persistence filter of presence in at least five samples (53 taxa) were used in the analysis. Pollen preservation improved markedly above 120 cm (8740 cal. year BP). A Holocene-only dataset of pollen percentages (0–8740 cal. year BP, *n* = 70) was also ordinated using DCA. The distance measure used was the Bray-Curtis dissimilarity coefficient because it reduced the problems of datasets with many zeros (Bray and Curtis, 1957; Minchin, 1987; Pandolfi and Minchin, 1995; Pandolfi and Jackson, 2001). Rare taxa were down-weighted.
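The persistence filter and the Bray-Curtis coefficient described above can be sketched in a few lines. This is an illustrative reimplementation, assuming each sample is a dictionary of taxon abundances. It is not the PC-ORD routine, and the function names are hypothetical.

```python
def persistence_filter(samples, min_samples=5):
    """Keep only taxa present (abundance > 0) in at least `min_samples` samples."""
    taxa = {taxon for sample in samples for taxon in sample}
    keep = {
        taxon
        for taxon in taxa
        if sum(1 for sample in samples if sample.get(taxon, 0) > 0) >= min_samples
    }
    return [{t: v for t, v in sample.items() if t in keep} for sample in samples]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = no shared taxa)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0
```

Taxa absent from both samples add nothing to either sum, so joint absences do not make two samples look more alike. That property is why the coefficient suits sparse pollen matrices.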

An ANOSIM analysis was run with PRIMER v5.2.9 (Clarke and Gorley, 2001) on five major groups of Holocene samples located within the DCA to test for a significant difference between the groups (**Table 1**). The goal of this analysis was to test the DCA outputs for significant changes in community composition before, after, and during the MHDE.

Because pollen data can have many zeros, but can also be dominated by a few taxa in a sample, the data were square root transformed and standardized by the maximum abundance of the dominant taxon in each sample (after Faith et al., 1987). A *post-hoc* permutation test (10,000 replications) was run to detect which pairs of groups significantly differed.
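The transformation and the permutation logic of ANOSIM can be sketched as below. This is a simplified standalone version, not the PRIMER implementation. It assumes Bray-Curtis dissimilarities on the transformed data and average ranks for ties, and the sample values and group labels are invented for illustration.

```python
import math
import random
from itertools import combinations


def transform(sample):
    """Square-root transform, then scale by the sample's largest (dominant-taxon) value."""
    rooted = [math.sqrt(v) for v in sample]
    top = max(rooted)
    return [v / top for v in rooted] if top > 0 else rooted


def bray_curtis(a, b):
    denominator = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator if denominator else 0.0


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, labels):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    n = len(labels)
    return (sum(between) / len(between) - sum(within) / len(within)) / (n * (n - 1) / 4)


def anosim(samples, labels, permutations=999, seed=0):
    """Return the observed R and a permutation p-value from shuffled group labels."""
    data = [transform(s) for s in samples]
    pairs = list(combinations(range(len(data)), 2))
    ranks = average_ranks([bray_curtis(data[i], data[j]) for i, j in pairs])
    observed = anosim_r(ranks, pairs, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented counts for two hypothetical zones with three samples each.
samples = [[10, 1, 0], [9, 2, 0], [11, 1, 0], [0, 1, 10], [0, 2, 9], [0, 1, 12]]
labels = ["A", "A", "A", "B", "B", "B"]
r_stat, p_value = anosim(samples, labels, permutations=999, seed=1)
```

A pairwise *post-hoc* comparison can be approximated by calling `anosim` on the samples of two groups at a time. With only three samples per group, as in this toy example, few distinct relabelings exist, so the attainable p-values are coarse.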

## **RESULTS**

#### **SEDIMENT DESCRIPTION**

The 219 cm portion of the 2005 core that was used for pollen analysis had distinct changes in sediment color and type (**Figure 2**). The stratigraphic column was primarily composed of clays below c. 127 cm, and organic material above.

**Table 1 | Groups of samples for ANOSIM analysis with age range of groups and sample size for each group.**


*Groups were based on clustering by results of a DCA of Holocene-age pollen percentages.*

Orange sandy clay indicative of oxidizing conditions was present from the core base (627 cm) to 204 cm (26,050 cal. year BP). The clays differed in color and texture and appeared particularly strongly oxidized below 204 cm (**Figure 2**). A paleosol may be indicated by a band of weathered material at 138 cm depth and would probably mark a major hiatus in the sedimentation of the site.

After the hardened paleosol, organic-rich clays (c. 139–127 cm) gave way to organic material from 127 cm to c. 20 cm. Between c. 20 cm and 7 cm depth the organic muds were overlain by a mixture of rotted *Sphagnum* and organic sediment, while the section from 7 to 0 cm comprised relatively well-preserved *Sphagnum*. The surface *Sphagnum* layer did not appear to be disturbed, as fresh material overlay progressively decayed layers.

The 2004 core showed sedimentary changes consistent with those in the 2005 core that was subjected to pollen analysis. The 2004 core had a longer *Sphagnum* peat layer (50 cm) than the 2005 core (20 cm), but organic layers were similar in length, 120 cm in the 2004 core and 127 cm in the 2005 core. Beyond the organic layer, the 2004 core had alternating tan and gray clay layers, with heavily oxidized soils beyond 200 cm. The presence and location of these major sedimentary events within both cores made cross-correlation of radiocarbon dates from both cores feasible.

**FIGURE 2 | Core description for 219 cm sampled for pollen analysis (total core length 627 cm).** Samples subjected to 14C analysis have age (left) and depths (right) listed on either side of the core. The 26,500 cal. year BP mark is an interpolated age for the lowest sample taken for fossil pollen analysis, not an actual 14C date. Depths for points of change in sedimentology are only marked at right side of core. Inset: Paul's Bog age model for the last 26,000 cal. year BP based on a 14C chronology. Red rectangles signify dates that were cross-correlated from the 2004 core, blue rectangles signify dates taken from 2005 core. 2 sigma error of 14C dates shown by error bars.

#### **AGE MODEL**

The age model for the 2005 core was robust, with only one reversal at c. 25,500 cal. year BP (230 cm). The date at c. 28,000 cal. year BP (209 cm) from the 2005 core was removed because to accept it would have required a greater number of ages to be rejected (**Table 2, Figure 2**).

The lowest meter of the 2005 core subjected to pollen analysis (200–300 cm; c. 26,800–26,300 cal. year BP) had the fastest sedimentation rate (7.6 years/cm) in the entire record. From 200 to 120 cm (c. 25,310–8100 cal. year BP) the 2005 core had the slowest sedimentation rate in the record (100–400 years/cm). The top 120 cm of the 2005 core (c. 8100–0 cal. year BP) had a moderate sedimentation rate (20–40 years/cm) that was less variable than the previous 80 cm. The sampling resolution was fairly constant (80 years) between 6000 BP and 1400 BP, but the faster sedimentation produced a sampling interval of c. 30–40 years between 6000 and 8000 cal. year BP. However, the last 1400 years are poorly represented, as the sedimentation rate was severely reduced (155 years/cm) due to hydrarch succession, as a more rapidly sedimenting lake developed into a slowly sedimenting bog (**Figure 2**).
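The link between sedimentation rate (expressed here as years per cm) and sampling resolution can be sketched as follows. This is a minimal illustration that assumes linear interpolation between two dated depths, which the text does not confirm as the authors' age-modelling method. The function names and control points are hypothetical. The 2 cm sample spacing comes from the sampling description.

```python
def years_per_cm(depth_top_cm, age_top, depth_bottom_cm, age_bottom):
    """Time represented by each cm of sediment between two dated depths."""
    thickness = depth_bottom_cm - depth_top_cm
    if thickness <= 0:
        raise ValueError("bottom depth must lie below top depth")
    return (age_bottom - age_top) / thickness


def interpolate_age(depth_cm, depth_top_cm, age_top, depth_bottom_cm, age_bottom):
    """Linearly interpolated age (cal. year BP) for a depth between two dated depths."""
    rate = years_per_cm(depth_top_cm, age_top, depth_bottom_cm, age_bottom)
    return age_top + rate * (depth_cm - depth_top_cm)


def sampling_interval(rate_years_per_cm, spacing_cm=2.0):
    """Years between adjacent samples taken at a fixed depth spacing."""
    return rate_years_per_cm * spacing_cm


# Hypothetical control points: 50 cm dated to 2000 and 100 cm to 4000 cal. year BP.
rate = years_per_cm(50, 2000, 100, 4000)          # 40.0 years/cm
age_at_75 = interpolate_age(75, 50, 2000, 100, 4000)  # 3000.0 cal. year BP
interval = sampling_interval(rate)                # 80.0 years between samples
```

Under these assumptions, a rate of 40 years/cm sampled every 2 cm gives the 80-year resolution quoted above, and a 30–40-year interval corresponds to 15–20 years/cm.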

#### **DCA ANALYSES OF POLLEN DATASET**

DCA Axis 1 for the entire data set (presence or absence data) separated samples that were older than 8740 cal. year BP from those that were younger. The older samples were characterized by low species diversity and high proportions of Asteraceae (**Figure 3**).

Pollen taxa that influenced the negative portion of Axis 1 were *Psychotria*, *Myrica*, Solanaceae, and *Acalypha*. Taxa known to be selectively preserved in mildly oxidizing conditions (mixture of Asteraceae types and spores) (Havinga, 1964, 1967, 1984; Moore et al., 1991) influenced the positive extreme of Axis 1.

The negative extreme of Axis 2 was characterized by *Acnistus, Cassia*, and *Ludwigia*. The positive end of the axis had high scores for a mixture of Asteraceae, monolete spores, Myrtaceae, *Podocarpus*, and two unknown types.

The DCA of Holocene pollen samples exhibited robust clustering of similarly aged samples in five zones, with relatively few samples that crossed over into the space dominated by samples from another zone (**Figure 4**). The ordination separated the samples associated with the formation of the lake, PB-2, from those of the MHDE (6600–3660 cal. year BP). Similarly, the late Holocene zones were separated from those of the early Holocene.

**presence/absence fossil pollen data for the Paul's Bog dataset (26,200-0 cal. year BP).** To reduce noise in the dataset, the rarest types were removed. Samples from similar age periods were grouped.

*All samples were bulk organic sediment. Asterisks denote age reversals.*

Marshland components (*Polygonum*, *Ludwigia*, Cyperaceae) characterized the negative extreme of Axis 1, with *Polygonum* most strongly associated with PB-2, whereas *Ludwigia* dominated PB-3. Most dry land taxa had strong positive scores on Axis 1, e.g., *Zanthoxylum*, *Psychotria*, Myrtaceae, Solanaceae type, Cyperaceae and some lower elevation elements (e.g., *Bursera*, *Cordia*).

#### **HIGHLAND COMMUNITY COMPOSITION DURING THE LGM, DEGLACIATION AND EARLY HOLOCENE; LOCAL POLLEN ZONE PB-1 (26,200-9550 CAL. YEAR BP, 128–219 CM)**

Oxidizing conditions cause rapid decomposition of pollen, leaving only palynomorphs with thick exines, e.g., Asteraceae, Poaceae, and spores (Havinga, 1967; Faegri and Iversen, 1989). Despite this selective degradation, enough pollen and spores were preserved to provide qualitative information about conditions around the lake catchment between c. 26,200 and 9550 cal. year BP (**Figure 5**).

Spore concentration peaks occurred at c. 26,100 and 20,230 cal. year BP, with a third peak of well preserved spores, albeit at low concentrations, occurring at c. 12,910 cal. year BP. Pollen was preserved in a band of sediment at 26,100 cal. year BP, but was then absent until occurring again between 24,990 and 20,230 cal. year BP. Asteraceae, including highland pollen types associated with modern garúa zones, dominated the pollen preserved in this zone. Concentrations of spores and pollen even in the best of these samples were very low (spores maximum peak: 2180 grains/cm<sup>3</sup>, with 142 grains/cm<sup>3</sup> sample average; pollen maximum peak: 300 grains/cm<sup>3</sup>, with 67 grains/cm<sup>3</sup> average), suggesting oxidation and selective preservation.

*Cyathea* spores were present throughout the LGM and early Holocene (c. 26,200-9550 cal. year BP) at levels of 5–30 grains per sample; only one major period (25,500-22,500 cal. year BP) lacked this tree fern. There was a very brief but pronounced (899 grains/cm<sup>3</sup>) peak of *Cyathea* at 26,100 cal. year BP, as well as two peaks of 124 and 399 grains/cm<sup>3</sup> at 20,230 and 12,910 cal. year BP, respectively.

Unknown psilate monolete spores co-occurred with *Cyathea*, but the 20,230 cal. year BP peak was less marked than for *Cyathea*. Throughout the record, *Pteridium* was rare (4–180 grains/cm3), but ubiquitous with only a few samples lacking this taxon. The only substantial peak, 439 grains/cm3, occurred at 26,100 cal. year BP.

*Pteridium* spores exhibited peaks at 26,100, 20,230, and 12,910 cal. year BP (385, 381, and 839 grains/cm3, respectively), but were present in low concentrations throughout the entire record with 4–98 grains/cm<sup>3</sup> per sample. *Lycopodium* exhibited a distinct peak of 15 grains at 20,230 cal. year BP, but was absent from most of the record, with concentrations of 5–5.5 grains/cm3 when present.

#### **HIGHLAND COMMUNITY COMPOSITION DURING THE HOLOCENE**

The results of the DCA conducted on the Holocene data set (8740-0 cal. year BP) grouped similarly aged samples into five clusters (**Figure 4**). These clusters were used for statistical analyses and pollen zones within the lake basin history of Paul's Bog.

#### *Local pollen zone PB-2 (8740-7800 cal. year BP, 118–128 cm)*

The beginning of this zone had the highest levels (800–1200 grains/cm<sup>3</sup>, 10–20%) of *Polygonum*, an endemic marsh taxon. After 7800 cal. year BP, *Polygonum* fell below 140 grains/cm<sup>3</sup> (2%) (**Figures 6**, **7**).

Representation of Asteraceae as a family was constant during this zone. Although abundances between the groups varied, the bulk of the Asteraceae came from *Jaegeria* (800–900 grains/cm<sup>3</sup>, 10–18%). *Spilanthes* and *Conyza* were present but remained below 250 grains/cm<sup>3</sup> (5%) except for a brief 800 grains/cm<sup>3</sup> (13%) peak of *Conyza* at 7800 cal. year BP. A mixture of other Asteraceae taxa was abundant as a group (c. 1200 grains/cm<sup>3</sup>, 5–10%), but each taxon was individually very rare (c. 50–100 grains/cm<sup>3</sup>, <1%) at the base of the Holocene record. This group of rare Asteraceae taxa began to drop off at 7800 cal. year BP.

Poaceae appeared at 8000 cal. year BP with abundances of 250–500 grains/cm<sup>3</sup>, and a brief peak (c. 1500 grains/cm<sup>3</sup>, 40%) at the termination of zone PB-2. *Cyathea* was extremely numerous (estimated 1,900,000 to 500,000 grains/cm<sup>3</sup>) before 7800 cal. year BP, while other spores were represented at <100,000 grains/cm<sup>3</sup>.

#### *Local pollen zone PB-3 (7800-6600 cal. year BP, 79–118 cm)*

A strong transition in the pollen flora was evident at 7800 cal. year BP. *Alternanthera* and *Zanthoxylum* occurred after 7800 cal. year BP, but remained below 100 grains/cm<sup>3</sup> (5%). Cyperaceae was present, but did not exceed 5% during this period. *Alternanthera* had one small peak of 450 grains/cm<sup>3</sup> at 7000 cal. year BP. Amaranthaceae also dropped from 320 grains/cm<sup>3</sup> to under 100 grains/cm<sup>3</sup> (10% to 2%) after 7800 cal. year BP.

Pollen concentrations in this zone were the lowest in the record, with the exception of the deglacial (**Figure 5**) and the last 500 years. The low stand in pollen concentration spanned 7800–6600 cal. year BP. Poaceae, *Polygonum*, *Jaegeria*, and a mix of rare Asteraceae were the main constituents of pollen concentration for this episode. All spores remained very rare during this zone.

#### *Local pollen zone PB-4 (6600-3660 cal. year BP, 27–79 cm)*

Poaceae dominated this period (3000–4000 grains/cm3, 20–40%) and had a peak in pollen concentration of 8000 grains/cm<sup>3</sup> at 5800 cal. year BP (**Figures 6**, **7**). After the peak, concentrations continued to fall to the upper boundary of this zone.

At 6000 cal. year BP *Ludwigia*, a marsh herb, became extremely dominant in the flora (2500–4000 grains/cm<sup>3</sup>, 10–20%), forming a peak at 4000 cal. year BP followed by a rapid disappearance from the record. *Polygonum* appeared with *Ludwigia* and followed the same trend of increase early in PB-4 to 4000 cal. year BP and subsequent reduction in abundance, but it remained less numerous (200–400 grains/cm<sup>3</sup>, <5%) than *Ludwigia*.

Percentages of most highland Asteraceae remained at PB-3 levels, with only *Spilanthes* having a small peak at 4250–3750 cal. year BP. *Jaegeria* and *Spilanthes* concentrations paralleled those of *Ludwigia*, with a rapid decrease at 4000 cal. year BP.

*Tournefortia rufo-sericea* did not vary in percentage abundance during this episode, but concentrations rose from 6500-5500 cal. year BP (1000 grains/cm3) and stabilized at these levels after 3660 cal. year BP.

Shrub components of the flora (majority *Miconia*) began to appear in the concentration record (400–500 grains/cm<sup>3</sup>), with a plateau similar to that of *Tournefortia rufo-sericea* (**Figures 6**–**8**). Herbs from the Fern/Sedge community generally increased throughout the zone (chiefly mesic Asteraceae, *Tournefortia rufo-sericea*, *Acalypha*).

#### *Local pollen zone PB-5 (3660-1500 cal. year BP, 7–27 cm)*

This episode was characterized by increased representation of taxa from lower elevation zones (**Figure 8**). *Bursera* percentages increased over the entire period, but concentrations had more variability (1000–4000 grains/cm<sup>3</sup>), with a 1500-year oscillation suggested in PB-5 and PB-6 (**Figure 6**). *Croton* and *Chamaesyce*, common components of arid and transitional communities, peaked in concentration (6400 and 4000 grains/cm<sup>3</sup>, respectively) between 3750 and 2000 cal. year BP. Concentrations of mainland taxa (chiefly *Alnus*, *Podocarpus*, *Myrica*, *Myrsine*), *Croton* and *Chamaesyce* had similar oscillations to *Bursera* (**Figure 6**). *Scalesia* types (dominant transitional community canopy components) dominated the flora, with peaks in concentration (3000–6000 grains/cm<sup>3</sup>) at 3750–1500 cal. year BP.

Poaceae, which had dominated zones PB-3 and PB-4, returned to PB-2 concentrations (c. 2000 grains/cm<sup>3</sup>) during the period of 3660–1500 cal. year BP. *Zanthoxylum* had a concentration high stand (1200–2400 grains/cm<sup>3</sup>) in the core record, while percentages were steady (5–10%) throughout the entire pollen zone.

#### *Local pollen zone PB-6 (1500-0 cal. year BP, 1–7 cm)*

Poaceae and *Bursera* increased in abundance in the last 1500 years (**Figures 8**, **9**). Cyperaceae reached their highest percentages in the entire core record. *Zanthoxylum* percentages remained at PB-5 levels (under 500 grains/cm3) from 1500 to modern. Taxa concentrations in general were lower during this zone compared with PB-4 and PB-5 with the exception of *Bursera* and *Scalesia* spp., which had maxima within the middle of this zone.

#### **ANOSIM ANALYSIS OF HOLOCENE POLLEN ZONES**

While local pollen zones are routinely assigned to help organize analysis of a fossil pollen record, differences between the zones are seldom quantified. Here, ANOSIM was used to test whether samples in the local zones differed from one another. The zones located within the Holocene pollen diagram (**Figure 7**) were tested for significance with an ANOSIM analysis. A significant difference existed between the groups (*R* statistic 0.843, *p* < 0.001, α = 0.05, *n* = 70), and subsequent *post-hoc* analysis (10,000 permutations) found that every pair of groups, with the exception of PB-6 and PB-2, differed significantly (**Table 3**).

Importantly, time slices that lay within the period of the MHDE (PB-2, PB-3, and PB-4) were different from each other and from periods accepted as post-MHDE (PB-5 and PB-6).

## **DISCUSSION**

#### **MECHANISMS OF LONG TERM DROUGHT**

There were two prolonged drought phases in the Paul's Bog record. During the deglaciation and early-Holocene (26,000-9000 cal. year BP), Paul's Bog supported an ephemeral pond that only preserved small amounts of pollen and spores. The second phase of aridity occurred while Paul's Bog was a fluctuating shallow lake between 6500 and 3750 cal. year BP, when concentrations of littoral pollen taxa were at their maxima.

#### *The full glacial period*

During the LGM, sea-surface temperatures in the Pacific Ocean were c. 1–3 °C cooler than modern (Koutavas et al., 2002; Lea et al., 2006; Koutavas and Sachs, 2008). Data for the persistence of ENSO during the LGM (Tudhope et al., 2001) suggest that the cool tongue of water that characterizes the modern eastern equatorial Pacific (EEP) was probably also present during the LGM (Lea et al., 2002; Stott et al., 2002; Chaing, 2009). Consequently, the temperature differential between the upwelling and the adjacent tropical ocean persisted. The presence of ENSO during the LGM would have allowed for wet and dry oscillations within the Galápagos highlands. The exact primary phase of ENSO variability within the LGM is debated. A persistent El Niño phase has been proffered that would have increased rains to the highlands but lifted the garúa, increasing drought stress on the mesophytic flora of the highlands and leading to a possible expansion of arid lowland taxa (Johnson and Raven, 1973; Koutavas et al., 2002; Stott et al., 2002; Rein et al., 2005). An alternate theory is that the EEP cold tongue was accelerated, resulting in a persistent La Niña-like state that would have reduced rainfall to the Galápagos highlands, but left garúa in place to reduce evaporative stress in the highlands (Martinez et al., 2003; Lee and Poulsen, 2006).

Pollen preservation requires organic-bearing sediments to be rapidly covered by subsequent sediment layers to prevent oxidation. The periods of mesic pollen preservation within Paul's Bog would be the result of increased rains quickly burying and preserving pollen during mesic intervals within a generally dry LGM. The wet phases appear to align with periods of above average LGM sea-surface temperatures (Lea et al., 2006). It is noteworthy that at this location Heinrich Event 2 is a time of warm SST compared to the rest of the LGM and would support a wet episode on the islands due to increased convection (Koutavas et al., 2002; Lea et al., 2006). As the other Heinrich events appear to fall on warm episodes between 30,000 and 50,000 cal. year BP, it is our working hypothesis that these would have been wet episodes on the islands. Indeed the glacial maximum, if taken as being 20,000 to 22,000 cal. year BP, was also wet, consistent with tropical records from Yucatan, Ecuador, and Peru, though apparently different from records from Colombia and Venezuela (Bush et al., 1992; Baker et al., 2001; Hillyer et al., 2009; Correa-Metrio et al., 2010, 2012).

However, most of the previously published records deal with moisture derived from the Atlantic rather than the Pacific. In the Pacific the migration of the ITCZ is complicated by the presence of the cold upwelled waters that inhibit a southward migration during non-El Niño years. The ITCZ cannot form over the upwelling, resulting in the ITCZ being constrained to lie in the northern hemisphere (the modern condition). Two possible alternative scenarios under LGM conditions would be: (1) for the ITCZ to form far to the south of its present limit, i.e., south of the upwelling and not migrate past the upwelling (Newell, 1973); or (2) to be split, with a weakened northern range in the boreal summer, an "ITCZ gap" where the upwelling occurs, and a boreal winter location south of the upwelling (sensu Leech et al., 2013) (**Figure 9**). Leech et al. (2013) suggest that the ITCZ could have transitioned between these various states on a decadal to millennial scale. The presence of wet periods within the LGM EEP supports the hypothesis that the ITCZ was shifting on a millennial scale between a northern and southern hemisphere position. The wet phases increased pollen preservation in the Galápagos highlands as the ITCZ came closer to the islands before bypassing the EEP cold tongue.

South American coast is shaded. **(A)** Modern ITCZ location; **(B)** ITCZ constrained to northern hemisphere; **(C)** ITCZ constrained to southern hemisphere; **(D)** ITCZ in southern hemisphere with weak ITCZ manifested in northern hemisphere.

**Table 3 | Pairwise** *post-hoc* **permutation test (10,000 replications) results for ANOSIM analysis of Holocene local pollen groups described in Table 1.**

Our working hypothesis of the ITCZ occupying, on millennial or sub-millennial timescales, more northern or southern positions than in its current range does not necessarily preclude the alternate hypotheses of a double ITCZ on either side of the equator or a more southerly position within the northern hemisphere during H1 and the LGM (Pahnke et al., 2007; Leduc et al., 2009; Leech et al., 2013).

#### *The deglacial period*

During the deglaciation, the same slight increases in sea-surface temperature that caused pollen deposition and preservation in Paul's Bog during the full glacial failed to induce pollen preservation, indicating an overall drier climate after c. 20,000 cal. year BP. The transition toward increasing aridity at this time was also evident in a sudden lowering of lake level at Lake Pacucha in Peru (Hillyer et al., 2009; Valencia et al., 2010).

In general, records from south of the equator register a millennium-long strong dry event c. 20,000 cal. year BP, followed by a wet event that lasted until c. 16,000 cal. year BP (Hillyer et al., 2009; Urrego et al., 2010; Valencia et al., 2010; Mosblech et al., 2012). Records from Panama and the Yucatan, lying at 10° and 17° N, respectively, possess a prolonged dry event until c. 14,000 cal. year BP, an effect of a more southerly mean ITCZ position (Bush and Colinvaux, 1990; Correa-Metrio et al., 2012).

Precessional forcing would have been one of the factors that could have caused the ITCZ to migrate south during the LGM and deglaciation, enhancing rainfall to southern hemisphere lakes and caves (Koutavas et al., 2002; Koutavas and Lynch-Steiglitz, 2004; Cruz et al., 2005; Urrego et al., 2010; Mosblech et al., 2012). Northern Neotropical records generally document the early deglaciation as being dry (Bush and Colinvaux, 1990; Correa-Metrio et al., 2012), with sites in Central America, e.g., Petén Itzá, experiencing a peak of aridity from 17,000-14,000 cal. year BP (Correa-Metrio et al., 2010), while others only started to accumulate sediment after 14,300 cal. year BP, e.g., La Yeguada (Panama) (Bush et al., 1992). Contrastingly, in the southern hemisphere, drying began as early as 16,000 cal. year BP, but was most apparent between 9000 and 4400 cal. year BP (Baker et al., 2001; Hillyer et al., 2009). In the Galápagos the LGM was dry, but had brief mesic periods in phase with the southern hemisphere, c. 26,000 to 20,000 and 13,000 cal. year BP.

Forcing from the Atlantic and Laurentide ice-sheet meltwater impacts, plus the overall height and scale of the ice sheet (CLIMAP, 1981; He et al., 2013), could have accounted for the rapid climatic oscillations seen between 19,000 cal. year BP and c. 9000 cal. year BP. Evidence for the migration of the ITCZ in this period is found in the opposed signature of wet and dry events at c. 13° S and 17° N (Baker et al., 2001; Vélez et al., 2006; Hillyer et al., 2009; Valencia et al., 2010; Correa-Metrio et al., 2012). Throughout this period the ITCZ apparently formed either north or south of the Galápagos, but did not lie over the islands.

A failure of the ITCZ to arrive at the islands and reduced SST would reduce atmospheric moisture and wet season precipitation. Garúa is present in the highlands because of temperature inversion of moist air over the cold ocean. As garúa formation is dependent on saturated air, reduced evaporation from the ocean could have caused the garúa to move upslope or to form less often. Reduced inundation with the ground-level cloud of garúa would have increased desiccation and reduced the hydroperiod of Paul's Bog. A water table that oscillated between saturated surface soils and oxidizing conditions would have caused the loss of all but the thickest-walled pollen and spores.

On the Galápagos the duration of background arid conditions from 26,000 to 9000 cal. year BP is unusually long for a site in either hemisphere. The equatorial position of the islands made them relatively sensitive to migrations of the ITCZ, meaning that the Galápagos may reflect a northern hemispheric pattern of aridity at the LGM because the ITCZ was biased to the southern hemisphere, and a southern pattern during the early Holocene.

When seasonality increased due to precessional forcing around 11,000 cal. year BP, the ITCZ was already in its modern northern position (Haug et al., 2001). This was a time of falling lake levels and aridity in the southern Neotropics. With the ITCZ lying to the north of the Galápagos the background dry conditions that began prior to the LGM would have continued. Marine isotope records from the EEP, particularly the Galápagos region, described reduced SSTs until c. 12,000 cal. year BP (Koutavas and Sachs, 2008). Prolonged cooler SSTs during the deglaciation and into the Holocene could have resulted in the extended background dry conditions observed in Paul's Bog.

#### *The Holocene period*

The pollen samples associated with the MHDE stood out statistically from the other Paul's Bog samples. The ANOSIM data supported the interpretation of El Junco Crater Lake data that the MHDE was manifested on the Galápagos, and that it was not a simple, uniform event, but rather a period encompassing many climatic events against an overall backdrop of drier-than-modern conditions (Conroy et al., 2008).

During the MHDE, the ITCZ probably lay near the northern limit of its range, amplifying the Walker circulation and upwelling in the EEP. Between c. 9000 and 4400 cal. year BP, relatively weak ENSO cycles seemed to prevail (Sandweiss et al., 2001; Moy et al., 2002), as the overall state resembled La Niña-like conditions with unusually low SSTs (Haug et al., 2001; Koutavas et al., 2002, 2006). In the lowlands, aridity was amplified by the cool SSTs reducing evaporative moisture available to the archipelago, but in the uplands the intensified garúa prevented desiccation. Nevertheless, without the deluging rains associated with El Niño, water level fell in Paul's Bog as the pond dried down to a marsh between c. 6500 and 3750 cal. year BP.

Although the overall MHDE was manifested on the islands, the local effects of elevation, orientation to the trade winds and, thereby, the persistence of local garúa provided slightly different outcomes at Paul's Bog and El Junco Crater Lake (**Figure 10**). El Junco Crater Lake begins to show a reduction in lake level c. 6750 cal. year BP, while Paul's Bog remained flooded until 6500 cal. year BP (Conroy et al., 2008). An explanation for the differences in time of lake desiccation could be the effect of garúa migrating upslope during the MHDE. Paul's Bog lies at 800 m, while El Junco Crater Lake is at 675 m. If the SSTs were lower during the early MHDE, garúa formation could have moved upslope from El Junco Crater Lake, but Paul's Bog could have remained bathed in garúa, reducing evaporation and allowing the basin to become a larger lake early in the MHDE. Also, the catchment area for Paul's Bog is larger in proportion to its size than that of El Junco Crater Lake, allowing greater interception of precipitation.

**elements and core stratigraphy for Paul's Bog and** *Azolla* **abundance for El Junco Crater Lake (from Colinvaux, 1972).**

The MHDE is not believed to be a period of constant aridity along the entire South American mainland, but a drought-prone period between c. 9000 and 4400 cal. year BP with brief periods of intense rainfall (Valencia, 2006; Vélez et al., 2006; Hillyer et al., 2009). Although the MHDE is generally defined as finishing between 5000 and 4400 cal. year BP (Sandweiss et al., 1996; Koutavas et al., 2002; Abbott et al., 2003; Lea et al., 2006), lake levels were continuing to rise in mainland South America until c. 3400 cal. year BP (Bush et al., 2011). Hence, it is not surprising to see this trend toward increased lake levels reflected in the hydrologically sensitive Galápagos wetlands and lakes.

MHDE wet episodes, perhaps the result of intensified El Niño activity or a southern displacement of the ITCZ, were evident on the archipelago c. 8500 to 6500 cal. year BP with lake formation at the Paul's Bog catchment. This episode caused desiccation of Colombian swamps and lakes, while along the equator an Ecuadorian bog on the eastern flank of the Andes recorded increased precipitation from 8000 to 6500 cal. year BP (Vélez et al., 2006; Gonzàlez et al., 2008; Niemann and Behling, 2008). The Galápagos Islands were dependent on ITCZ-derived precipitation, so a southern shift in the ITCZ during the MHDE would have provided increased moisture. An alternative possibility is that during the peak warming of the Holocene the nature of El Niño events changed, and became more similar to modern Modoki events that have a strong central Pacific signature, but induce little change in precipitation in the EEP (Ashok and Yamagata, 2009; Yeh et al., 2009). Thus, rather than looking for a strictly north-south movement of the ITCZ during the Holocene, there may also have been strong east-west gradients in moisture within the Pacific Basin that varied through time, as suggested by ocean-atmosphere climate models of the EEP (Leduc et al., 2009).

#### **MEGADROUGHTS AND THE HIGHLAND FLORA**

Despite prolonged droughts during the deglaciation and late- to mid-Holocene, the Galápagos does not appear to have undergone wholesale habitat replacement.

Paul's Bog provides information, albeit discontinuous, about the flora of the island back to 26,200 cal. year BP when soils were too oxidized to retain even strong-walled pollen or spores. The oldest palynomorphs retained within the sediment were highland Asteraceae types and spores, including *Cyathea weatherbyana*. This tree fern is moisture dependent and has been used to identify mesic periods around the MHDE (Colinvaux and Schofield, 1976a). The presence of *Cyathea* in Paul's Bog throughout the late-LGM and deglaciation lends support to the idea that the highlands were not completely desiccated, and may have supported the wider fern/sedge community throughout.

Oxidized soils from El Junco Crater Lake dating to >48,000 14C year BP were taken as an indicator of dry conditions in the highlands during the LGM (Colinvaux and Schofield, 1976a). The same soil types from Paul's Bog contained spores, indicating that this portion of the highlands retained enough moisture to support *Cyathea* and bracken fern (*Pteridium*). The qualitative appearance of highland elements from Asteraceae lent further support to the idea that the windward portions of the Santa Cruz highlands retained moisture and offered at least microrefugial (sensu Rull, 2005) habitats for mesic taxa during the droughts of the LGM.

The qualitative presence of bracken ferns, *Cyathea*, and highland Asteraceae suggests the continued presence of mesic habitats through the late glacial period. That this system maintained its mesic elements argues against the complete loss of garúa. The exposed windswept location of Paul's Bog and the other pocket wetlands would dry out very quickly without cloud cover. Our data do not invalidate Johnson and Raven's (1973) argument that loss of habitat area reduced the capacity of endemic taxa to evolve and persist in the highlands. However, our data point to the long-term persistence of the montane endemics such as *Cyathea weatherbyana*, rather than a Holocene invasion and recent speciation.

Conservatively, we can hypothesize that the Galápagos highlands during the LGM were drier than modern, but maintained some garúa cover. Importantly, our data do not indicate an increase of any of the lowland pollen types during the LGM. Microrefugia might have been larger than previously thought (Colinvaux, 1972), possibly located along high elevation (800 m) windward facing highlands, e.g., Paul's Bog, where updrafted moisture was concentrated.

During weaker cold SST events, like the MHDE, garúa does not appear to have been greatly altered because both El Junco and Paul's Bog remained, but were reduced in lake level. Peaks of the littoral taxa, including *Ludwigia* and *Polygonum*, defined the periods of low lake level in Paul's Bog and the apparent desiccation of some other bogs on Santa Cruz (Coffey et al., 2012), and were broadly coincident with the peak of the water-fern *Azolla* documented by Colinvaux (1972) in El Junco Crater Lake (**Figure 10**).

The highland flora did not exhibit significant reductions in diversity during the MHDE, and highland Asteraceae, *Tournefortia rufo-serica*, and *Alternanthera* maintained populations. During the MHDE, La Niña-like conditions were hypothesized to dominate the EEP (Koutavas et al., 2002; Koutavas and Lynch-Steiglitz, 2004). La Niña conditions reduce rainfall to the islands, but garúa cover is strengthened (Snell and Rea, 1999). Under these conditions, the montane vegetation, which is immersed in cloud for much of the year, does not experience the same level of evaporative stress as the lowlands. However, during the peak of the hot season (DJF) there may have been cloudless skies and no rain, increasing evaporation and drought stress. We infer that open water gave way to a marsh at Paul's Bog due to a reduction of precipitation during the MHDE.

The presence of garúa, which acts as a barrier to low elevation taxa (Itow, 2003), would have prevented the upslope expansion of arid taxa. Paul's Bog did not exhibit significant increases in low elevation taxa, providing further support for the persistence of garúa during the MHDE and stability of the highland flora, even with reduced rainfall.

#### **RESILIENCY, EXTINCTION AND SPECIATION OF THE GALÁPAGOS FLORA**

Johnson and Raven's (1973) theory of extinction and subsequent rapid speciation of highland elements of the Galápagos during the Holocene cannot be supported by Paul's Bog. Highland taxa appear to have been present in high elevation windward facing peaks and valleys during the deglaciation/early-Holocene period. Presence of pollen from mesophilic taxa at the LGM and the limited community compositional change during the MHDE strengthens the argument for resiliency of the mesic portion of the Galápagos flora.

During the LGM, the Galápagos flora had to contend with oscillations in precipitation, present as periods of pollen preservation and complete oxidation. It has been postulated in the South American mainland that these oscillations allowed downslope migrations of alpine plants in glacials and subsequent population fragmentation as interglacials forced these taxa back upslope (Colinvaux, 1998; Rull, 2005; Mosblech et al., 2012). The highland taxa of the Galápagos apparently persevered in refugia within windward slopes (e.g., Paul's Bog), but the downslope migration probably reflected trends in garúa rather than temperature. In the context of the Galápagos, the distances are so small and the probability of new invasion so slight that the explanation of the observed lowland-highland difference in endemism may have more to do with the number of potential islands that could harbor differentiating populations. Grant et al. (2004) have argued that finches did not radiate *in situ* on each island, but have had complex migrations between islands, radiating and then re-uniting with ancestral types in what they have termed a "braided river" of evolution. The same may be true of lowland plant evolution, where the larger area of lowlands and the many more islands lacking highlands offer greater potential for speciation than is available to highland counterparts. Studies of plant gene flow between different island habitats would be particularly revealing.

#### **HIGHLAND FLORA UNDER FUTURE CLIMATE CHANGE PREDICTIONS**

Future climate models for the EEP are not concordant, with some models predicting decreased ENSO activity under increased greenhouse gas concentrations (Collins, 2000; Cane, 2005; Collins and Groups, 2005), while others predict strengthened ENSO activity (Timmermann et al., 1999; IPCC, 2007b). Under modern ENSO, garúa is weakened during El Niño events and strengthened during La Niña events (Snell and Rea, 1999). Increased ENSO activity, particularly more frequent or intense El Niño events, could reduce the density of garúa, stressing the highland flora. Decreased ENSO activity, by contrast, would maintain the non-ENSO event state of garúa presence in the highlands.

Increased global temperatures are postulated to have resulted in the increased EEP SST and precipitation observed in Galápagos lake records, and models predict increased precipitation in the Galápagos region (IPCC, 2007a; Conroy et al., 2009). Increased SST, in conjunction with a predicted weakening of the Walker Circulation, could cause an El Niño-like condition in the EEP that would reduce garúa formation in the highlands (Vecchi et al., 2006; Vecchi and Soden, 2007; Sachs and Ladd, 2010). Future warming could result in garúa no longer buffering the highland flora against arid conditions for the first time since the LGM. Garúa formation and mechanics, however, are difficult to predict (Sachs and Ladd, 2010; Pryet et al., 2012). Because garúa is a major factor in the presence of a mesic plant community in an oceanic desert, further study into garúa dynamics is warranted to understand how modeled future ENSO activity and global temperatures will impact the highland flora.

#### **CONCLUSIONS**

The flora of the Galápagos has withstood long-term drought for much of the last 26,000 years. Overall, glacial conditions were dry, but sea-surface temperature still appears to have been an important determinant of wet and dry episodes, with the times of maximum cooling in the Atlantic perhaps being most likely to be wet in the Galápagos. Throughout the last 26,000 years, the continued presence of garúa allowed microrefugia to support populations of highland species. The species most susceptible to extinction on the Galápagos appear to have been obligate aquatic organisms. While humid conditions in the uplands appear to have been maintained by cloud cover even when rainfall was negligible, open water bodies were vulnerable to desiccation. The composition of fossil pollen types reflected lake level, but there was no evidence of lowland taxa invading the highlands. The resiliency of mesic communities within the archipelago in the LGM and Holocene was far greater than previously postulated (Schofield and Colinvaux, 1969; Johnson and Raven, 1973; Colinvaux and Schofield, 1976a,b). With increasing global temperatures postulated to bring further aridity to the Galápagos and the continued rise in invasive species, the flora of the Galápagos faces new threats, quite different to those of the past. Locating Quaternary refugia may outline potential areas for intensive preservation to allow the highland community to persist through future periods of long-term drought.

## **ACKNOWLEDGMENTS**

We gratefully acknowledge permission from the Galápagos National Park to conduct this research, and logistic support from the Charles Darwin Research Station. Thanks also to members of the 2004 Galápagos expedition, Rienk Smittenberg, Paul Colinvaux, Michael Miller, Miriam Steinitz-Kannan, Jonathan Overpeck, and Jessica Conroy, for their assistance in the field. Aaron F. Collins was partially supported by a Graduate Teaching Fellowship from the National Science Foundation (Florida Institute of Technology InSTEP Program) under grant Nos. DGE 0440529 and 0638702. NSF grants BCS0926973 (to Mark B. Bush) and ESH0639640 (to Julian P. Sachs) provided financial support for field work and laboratory analyses. We appreciate the comments and suggestions on this manuscript by Frank Mayle, William Gosling, and Encarni Montoya.

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 September 2013; paper pending published: 27 September 2013; accepted: 18 November 2013; published online: 03 December 2013.*

*Citation: Collins AF, Bush MB and Sachs JP (2013) Microrefugia and species persistence in the Galápagos highlands: a 26,000-year paleoecological perspective. Front. Genet. 4:269. doi: 10.3389/fgene.2013.00269*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2013 Collins, Bush and Sachs. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*