# EVOLUTION OF ORGANISMAL FORM: FROM REGULATORY INTERACTIONS TO DEVELOPMENTAL PROCESSES AND BIOLOGICAL PATTERNS

EDITED BY: Sylvain Marcellini and Hector Escriva

PUBLISHED IN: Frontiers in Genetics, Frontiers in Ecology and Evolution, and Frontiers in Plant Science

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

> *The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-097-8 DOI 10.3389/978-2-88945-097-8

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **EVOLUTION OF ORGANISMAL FORM: FROM REGULATORY INTERACTIONS TO DEVELOPMENTAL PROCESSES AND BIOLOGICAL PATTERNS**

Topic Editors:

**Sylvain Marcellini,** University of Concepcion, Chile

**Hector Escriva,** CNRS, UPMC Université Paris 06, UMR 7232, BIOM, Observatoire Océanologique de Banyuls sur Mer, France

A live *Xenopus tropicalis* tadpole was exposed to a pulse of Alizarin red at stage NF56 and subsequently to a pulse of Calcein at stage NF60. This approach reveals growth patterns associated with osteogenesis because it allows the simultaneous visualization of "old" and "newly deposited" mineralized bone matrix, in red (bottom picture) and green (middle picture), respectively. The images show a dissected frontoparietal bone covered by some scattered pigment cells (black dots). Orientation of the merged picture: anterior is down and posterior is up.

Authors: Carlos Henríquez and Sylvain Marcellini

Laboratory of Development and Evolution (LADE), Department of Cell Biology, Faculty of Biological Sciences, University of Concepcion, Concepcion, Chile.

Today's biodiversity is the spectacular product of hundreds of millions of years of evolution. Understanding how this diversity of living organisms arose is one of the most intriguing and challenging questions in biology. Because organismal morphology is established during embryonic development, and because morphological traits diversified from ancestral forms during evolution, it can be inferred that changes in the mechanisms controlling embryonic development are instrumental in morphological evolution. This syllogism lies at the very heart of a new discipline called Evo-Devo, which is centered on identifying the cellular and genetic mechanisms that, through modifications of developmental programmes, underlie morphological innovations during evolution.

After the discovery of the broad conservation of gene content and regulatory networks in the animal kingdom, as well as in plants, Evo-Devo is now turning towards the study of differences through experimental and functional approaches. Given the wide range of species, gene families, and developmental processes considered, a concerted effort is still required to shed light on the genetic, cellular and molecular mechanisms involved in phenotypic evolution. It is a particularly exciting time for evolutionary developmental biology, as the advent of novel imaging, genome editing and sequencing technologies allows the study of almost any organism in ways that were unthinkable only a few years ago. The aim of this Frontiers Research Topic is therefore to gather an original collection of experimental approaches, concepts and hypotheses reflecting the current diversity of the Evo-Devo field. We have organized the articles according to the mechanistic depth with which they tackle specific evolutionary issues: comparisons of expression patterns are grouped in Chapter 1, changes in regulatory interactions and gene networks are presented in Chapter 2, and Chapter 3 focuses on the evolution of developmental processes and biological patterns.

**Citation:** Marcellini, S., Escriva, H., eds. (2017). Evolution of Organismal Form: From Regulatory Interactions to Developmental Processes and Biological Patterns. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-097-8

# Table of Contents

*Editorial: Evolution of Organismal Form: From Regulatory Interactions to Developmental Processes and Biological Patterns* Sylvain Marcellini and Hector Escriva

## **CHAPTER 1. Expression patterns**


Sébastien Enault, David N. Muñoz, Willian T. A. F. Silva, Véronique Borday-Birraux, Morgane Bonade, Silvan Oulion, Stéphanie Ventéo, Sylvain Marcellini and Mélanie Debiais-Thibaud

## **CHAPTER 2. Regulatory interactions and gene networks**


Elena M. Kramer

## **CHAPTER 3. Developmental processes and biological patterns**

*Evolution of epithelial morphogenesis: phenotypic integration across multiple levels of biological organization*

Thorsten Horn, Maarten Hilbrant and Kristen A. Panfilio


# Editorial: Evolution of Organismal Form: From Regulatory Interactions to Developmental Processes and Biological Patterns

Sylvain Marcellini <sup>1</sup> \* and Hector Escriva<sup>2</sup>

<sup>1</sup> Laboratory of Development and Evolution, Department of Cell Biology, Faculty of Biological Sciences, Universidad de Concepción, Concepción, Chile, <sup>2</sup> Observatoire Océanologique de Banyuls sur Mer, Centre National De La Recherche Scientifique, UPMC Université Paris 06, UMR 7232, BIOM, Banyuls sur Mer, France

Keywords: Evo-Devo, regulatory interactions, developmental processes, biological patterns, genomes

**The Editorial on the Research Topic**

#### **Evolution of Organismal Form: From Regulatory Interactions to Developmental Processes and Biological Patterns**

Living organisms display an astonishing morphological and behavioral diversity shaped by extrinsic environmental conditions and by intrinsic changes in developmental processes. In turn, such developmental trajectories are contingent on a myriad of regulatory interactions occurring at all possible steps of gene expression and cellular function. We are pleased to present a Frontiers Research Topic composed of 10 original research articles and reviews whose focus, ideas, and hypotheses reflect the current diversity and future directions of the field of Evo-Devo.

The evolution of gene families, gene expression patterns, and alternative splicing is addressed by examining germline determinants in cephalochordates and their implications for our understanding of multipotency and regeneration (Dailey et al.), by an extensive analysis of Fox genes expressed during amphioxus early embryogenesis (Aldea et al.), and by deciphering the origin and expansion of alternative splicing of Pax genes in chordates (Fabian et al.). The evolution of cell types and tissue morphogenesis is addressed by showing, with the example of insect extraembryonic epithelia, how tissue architecture and physical context are crucial for understanding gene function and evolution (Horn et al.), by proposing a model to explain how the transition between immature and mature cartilage might have facilitated the emergence of the osteoblastic regulatory network (Gomez-Picos and Eames), and by examining the major fibrillar collagen genes expressed in catshark and clawed frog skeletons, thereby providing new insights into the origin of cartilage calcification (Enault et al.). Finally, some authors discuss important concepts in the field, such as the interpretation of heterologous assays and their possible pitfalls (Kramer), the convergent evolution of a two-step morphogenetic mechanism controlling organ shape in plants and animals (Mentink and Tsiantis), the evolution of short peptide motifs driving the formation of specific protein complexes involved in key bilaterian innovations (Merabet and Galliot), and the intricate relationships linking the genotypic and phenotypic dimensions (Orgogozo et al.).

So, what is the broad contribution of these 10 manuscripts to Evo-Devo, and how do they relate to the future research directions that this discipline must prioritize in order to remain both successful and attractive to the community? Hints to our first answer are included in the topic title itself, which was inspired by an influential review by François Jacob emphasizing the complex and crucial relationships between different levels of biological organization (Jacob, 1977). Undoubtedly, understanding how specific mutations and environmental factors affect molecular networks, cells, organs, species, and ecosystems represents one of the most stimulating conceptual frameworks in Evo-Devo. While it is a long way from "regulatory interactions to developmental processes and biological patterns," the initial and obligatory step is to report the expression patterns of key molecular actors (Aldea et al.; Fabian et al.; Dailey et al.). One can then move to higher hierarchical levels, for instance by understanding the mechanisms that facilitate the emergence of regulatory interactions involved in body plan patterning (Merabet and Galliot), or by integrating gene activity, cellular behavior and mechanical forces to reach a comprehensive view of embryonic evolution (Horn et al.).

Our second answer relates to our ability to cope with a modern era continuously flooded by ever-improving imaging, genome editing and sequencing technologies. As a consequence, data analysis, rather than data generation, should be our present priority (Moore, 2012). As recently argued, facing the next grand challenge in evolutionary biology will require a strong synergy between three major branches of the field: experimental data, genomics, and modeling (Cushman, 2014). According to this strategy, evolutionary models must be validated by carefully designed and controlled experiments, which, in the case of heterologous functional assays, should follow the guidelines proposed by Kramer. This multidisciplinary approach will be particularly well suited to extracting universal principles underlying the development of multicellular organisms (Mentink and Tsiantis), to deciphering the evolution of the bone and cartilage gene regulatory networks (Enault et al.; Gomez-Picos and Eames), or to understanding, for any species of interest, how the interaction between genotype and environment generates complex phenotypic spaces (Cushman, 2014; Orgogozo et al.). These are exciting times for the Evo-Devo community. We hope that you will enjoy this collection of articles, and we look forward to reading, in the near future, any follow-up work it inspires.

#### Edited and reviewed by:

Samuel A. Cushman, United States Forest Service Rocky Mountain Research Station, USA

> \*Correspondence: Sylvain Marcellini smarcellini@udec.cl

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 31 May 2016 Accepted: 03 August 2016 Published: 17 August 2016

#### Citation:

Marcellini S and Escriva H (2016) Editorial: Evolution of Organismal Form: From Regulatory Interactions to Developmental Processes and Biological Patterns. Front. Genet. 7:148. doi: 10.3389/fgene.2016.00148

### REFERENCES

Cushman, S. A. (2014). Grand challenges in evolutionary and population genetics: the importance of integrating epigenetics, genomics, modeling, and experimentation. Front. Genet. 5:197. doi: 10.3389/fgene.2014.00197

Jacob, F. (1977). Evolution and tinkering. Science 196, 1161–1166. doi: 10.1126/science.860134

Moore, A. (2012). Have we produced enough results yet, sir? Bioessays 34:163. doi: 10.1002/bies.201290005

## AUTHOR CONTRIBUTIONS

SM and HE wrote, edited, and revised the manuscript.

## ACKNOWLEDGMENTS

This collaborative effort has been supported by a FONDECYT research grant 1151196 to SM.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Marcellini and Escriva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Expression of Fox genes in the cephalochordate *Branchiostoma lanceolatum*

#### Daniel Aldea, Anthony Leon, Stephanie Bertrand\* and Hector Escriva\*

Centre National De La Recherche Scientifique, UPMC université Paris 06, UMR 7232, BIOM, Observatoire Océanologique de Banyuls sur Mer, Banyuls sur Mer, France

Forkhead box (Fox) genes code for transcription factors that play important roles in different biological processes. They are found in a wide variety of organisms and first appeared in unicellular eukaryotes. In metazoans, the gene family includes many members that can be subdivided into 24 classes. Cephalochordates are key organisms for understanding the functional evolution of gene families in the chordate lineage because of their phylogenetic position as early-diverging chordates and because of their simple anatomy and genome structure. In the genome of the cephalochordate amphioxus Branchiostoma floridae, 32 Fox genes were identified, with at least one member for each of the classes that were present in the ancestor of bilaterians. In this work we describe the expression patterns of 13 of these genes during the embryonic development of the Mediterranean amphioxus, Branchiostoma lanceolatum. We found that FoxK and FoxM are ubiquitously expressed, while all the others show specific expression patterns restricted to diverse embryonic territories. Many of these expression patterns are shared with vertebrates, suggesting that the main functions of Fox genes in chordates were already present in their common ancestor.

#### Keywords: Fox genes, amphioxus, Evo-Devo, chordates, embryonic development

## Introduction

Forkhead box (Fox) transcription factors originated early in evolution and are specific to opisthokonts. They are present in fungi as well as in metazoans (Mazet et al., 2006; Larroux et al., 2008; Shimeld et al., 2010a), in which they play essential roles during embryonic development (Carlsson and Mahlapuu, 2002; Tuteja and Kaestner, 2007a,b; Benayoun et al., 2011). Fox proteins possess a helix-turn-helix DNA-binding domain called the forkhead domain, a conserved region of approximately 110 amino acids (Weigel and Jackle, 1990; Clark et al., 1993). A molecular phylogeny-based classification of the Fox gene family led to its proposed subdivision into 24 classes, ranging from FoxA to FoxS and including recently subdivided subfamilies: FoxJ (FoxJ1 and FoxJ2), FoxL (FoxL1 and FoxL2), and FoxN (FoxN1/4 and FoxN2/3) (Mazet et al., 2003). Many Fox gene losses and duplications occurred in different bilaterian clades, affecting different Fox classes. For example, FoxAB is found in cephalochordates and in the sea urchin but not in tunicates or vertebrates (Tu et al., 2006; Yu et al., 2008a), and the FoxR and FoxS families are vertebrate-specific (Wotton and Shimeld, 2006; Shimeld et al., 2010b). Based on phylogenetic analyses, it has been proposed that 22 Fox gene families were already present in the bilaterian ancestor (Shimeld et al., 2010b).

#### *Edited by:*

Naoki Osada, Hokkaido University, Japan

#### *Reviewed by:*

Jr-Kai Yu, Academia Sinica, Taiwan

Haruki Ochi, Yamagata University, Japan

#### *\*Correspondence:*

Stephanie Bertrand and Hector Escriva, Laboratoire Arago, Avenue du Fontaulé, 66650 Banyuls-sur-Mer, France stephanie.bertrand@obs-banyuls.fr; hescriva@obs-banyuls.fr

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Ecology and Evolution

> *Received:* 29 May 2015 *Accepted:* 07 July 2015 *Published:* 28 July 2015

#### *Citation:*

Aldea D, Leon A, Bertrand S and Escriva H (2015) Expression of Fox genes in the cephalochordate Branchiostoma lanceolatum. Front. Ecol. Evol. 3:80. doi: 10.3389/fevo.2015.00080

Cephalochordates (i.e., amphioxus) belong to the chordate phylum together with tunicates and their sister group, the vertebrates. They present morphological, developmental, and genomic characteristics that are proposed to be very similar to the ancestral state of the chordate clade, making amphioxus a key model system for understanding chordate evolution (Bertrand and Escriva, 2011, 2014). Interestingly, amphioxus has been shown to be the only living bilaterian possessing at least one member of each of the 22 Fox gene families proposed to have been present in Urbilateria (Yu et al., 2008a). Thus, the study of Fox genes in this cephalochordate may shed light on the functional evolutionary history of this transcription factor gene family. Past studies using genomic data from the Caribbean cephalochordate Branchiostoma floridae described the presence of 32 Fox genes in this species (Yu et al., 2008a), and the expression patterns of 11 of these genes were previously described: FoxAa and FoxAb (formerly named AmHNF3-1 and AmHNF3-2, respectively) (Shimeld, 1997), FoxB (Mazet and Shimeld, 2002), FoxC (Mazet et al., 2006), FoxD (Yu et al., 2002b), FoxE4 (Yu et al., 2002a), FoxF (Mazet et al., 2006; Onimaru et al., 2011), FoxG (Toresson et al., 1998), FoxL1 (Mazet et al., 2006), FoxN1/4a (Bajoghli et al., 2009), FoxQ1 and FoxQ2 (Yu et al., 2003; Mazet et al., 2006). In this work, we searched for Fox sequences in the transcriptome of the Mediterranean amphioxus Branchiostoma lanceolatum. We found 28 Fox sequences, and we describe here the spatiotemporal expression patterns of 13 Fox genes during embryonic development, including seven previously described in B. floridae and six whose expression was not known. We show that in B. lanceolatum some Fox genes, such as FoxK and FoxM, exhibit ubiquitous expression, while the others show specific and dynamic expression patterns restricted to diverse embryonic territories. These expression patterns suggest that Fox genes perform both general and specific functions during amphioxus embryonic development, most of which are probably ancestral in the chordate clade.

## Materials and Methods

#### Phylogenetic Analysis

All reference sequences, except those of B. lanceolatum, were obtained from GenBank or from Fritzenwanker et al. (2014). The multiple alignment was performed only on the conserved forkhead domain amino acid sequences, using the MUSCLE module implemented in MEGA 6, and was manually refined in its interface (Tamura et al., 2013). The best-fit substitution model for phylogenetic reconstruction was estimated using MEGA 6 (Tamura et al., 2011). A Bayesian inference (BI) tree was inferred using MrBayes 3.2 (Ronquist et al., 2012) at the CIPRES Science Gateway V. 3.1 (Miller et al., 2015), with the model recommended by MEGA 6 under the Akaike information criterion (RtRev+Γ). Two independent runs were performed, each with four chains and 1 million generations. A burn-in of 25% was used, and a 50% majority-rule consensus tree was calculated from the remaining trees.
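MrBayes performs the burn-in and consensus steps itself; the sketch below only illustrates the logic of discarding a 25% burn-in and keeping the splits found in more than 50% of the remaining sampled trees. It assumes each sampled tree is already reduced to its internal clades (bipartitions of the taxon set); the function names and the toy trees are hypothetical and not from the study.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

Split = FrozenSet[str]  # one side of a bipartition of the taxon set


def normalize_split(clade: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Key a bipartition by the side that excludes a fixed reference taxon,
    so a clade and its complement (the same unrooted split) share one key."""
    side = frozenset(clade)
    reference = min(taxa)  # arbitrary convention: alphabetically first taxon
    return taxa - side if reference in side else side


def majority_rule_splits(
    trees: List[List[Iterable[str]]],
    taxa: Iterable[str],
    burnin_fraction: float = 0.25,
    threshold: float = 0.5,
) -> Dict[Split, float]:
    """Drop the first `burnin_fraction` of sampled trees, then return every
    split found in more than `threshold` of the rest, with its frequency
    (the split's posterior probability estimate)."""
    taxon_set = frozenset(taxa)
    kept = trees[int(len(trees) * burnin_fraction):]
    counts: Counter = Counter()
    for tree in kept:
        # a set, so each split is counted at most once per sampled tree
        counts.update({normalize_split(clade, taxon_set) for clade in tree})
    return {s: n / len(kept) for s, n in counts.items() if n / len(kept) > threshold}


# Toy sample of four unrooted five-taxon trees, each listed by its two
# internal clades (hypothetical data, not from the study).
sampled = [
    [{"B", "C"}, {"D", "E"}],       # burn-in sample, discarded
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"D", "E"}],
    [{"A", "C"}, {"D", "E"}],
]
support = majority_rule_splits(sampled, "ABCDE")
```

In the toy data, the clade {A, B, C} and the clade {D, E} describe the same unrooted split, so the normalization counts it in all three post-burn-in trees (frequency 1.0); {A, B} (keyed as {C, D, E}) is kept at 2/3, while {A, C} (1/3) is dropped. Splits above 50% are always mutually compatible, so the retained set directly defines the consensus topology, and the frequencies correspond to the posterior probabilities shown at the nodes of such a tree.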

#### Cloning and Expression Study

B. lanceolatum Fox sequences were recovered from its reference transcriptome (Oulion et al., 2012) by TBLASTN, using sequences from B. floridae as queries. Specific primers were then designed for RT-PCR amplification from total RNA. Primer sequences are as follows:

| Primer pair | 5′ primer | 3′ primer |
|---|---|---|
| FoxA\_a | AAGTCGCCGGTGTACGAGATG | GTATTATAGAGACGAAGGTTG |
| FoxA\_b | CATTTCCTCAGAACAGACATG | TCCTAAAGACTCCCAACAACA |
| FoxAB | CAGTGTGAGGTGAACATCATG | CGATTGACAGGTTGATAGAAC |
| FoxB | ACAACAGGACCCTGACTCGT | GCATTCCCTGACGTCTTGA |
| FoxC | AACCGTCCCGTTTTCCTCATG | CAGTTTTGATTCGTAAGGACT |
| FoxD | ACAGCTGTGGAGTGGACACTT | CACGAGACATGTAAGTCTCCG |
| FoxEa | AACCAACCCCGTACCAGCATG | ATATGACACGGACACTGAACT |
| FoxG | ACGCACATTAGCACAGTTCG | ACTTGACCCTGGCTTGACAC |
| FoxJ1 | TACAGACAACTGTAAACCATG | TTGTAATGCAGGGTGGGGCCT |
| FoxK | GGAAGGCGGAGTTGGACAATG | CCGGACACGTCCTGCACCTGT |
| FoxM | AGGAGAGTGTGACAAACCATG | TTCTCAGCTATTCAGTAATAC |
| FoxN1/4a | GCGCACCGAGTATCGTTCTGA | ACATAGGTAGGACTATGTACT |
| FoxN2/3 | CAGTAAACACGAGCAGACATG | AGCTGAAGACAATGATGATCC |

A mix of total mRNA of B. lanceolatum extracted from embryos at different developmental stages was used as a template for reverse transcription. Amplification was performed using the Advantage 2 Polymerase kit (Clontech) and a touchdown PCR program with annealing temperatures ranging from 65 to 40°C. Amplified fragments were cloned using the pGEM-T Easy system (Promega) and subcloned in pBluescript II KS+ for probe synthesis.
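A touchdown program lowers the annealing temperature step by step, so that the first, more stringent cycles favour specific primer-template matches before the temperature reaches the permissive end of the range. The protocol gives only the 65 to 40°C range, so the step size, the number of cycles per step and the number of final cycles in this sketch are hypothetical parameters, and `touchdown_schedule` is an illustrative helper rather than the authors' program.

```python
import math
from typing import List


def touchdown_schedule(
    start_c: float = 65.0,
    end_c: float = 40.0,
    step_c: float = 1.0,
    cycles_per_step: int = 1,
    final_cycles: int = 0,
) -> List[float]:
    """Return the annealing temperature used in each PCR cycle.

    Temperatures fall from start_c towards end_c in decrements of step_c,
    each held for cycles_per_step cycles; end_c is then held for
    cycles_per_step + final_cycles cycles. step_c, cycles_per_step and
    final_cycles are hypothetical: the protocol gives only the range.
    """
    if step_c <= 0 or start_c < end_c or cycles_per_step < 1 or final_cycles < 0:
        raise ValueError("need start_c >= end_c, step_c > 0, cycles_per_step >= 1")
    # number of touchdown temperatures strictly above end_c (tolerant to float noise)
    n_steps = math.ceil((start_c - end_c) / step_c - 1e-9)
    temps: List[float] = []
    for i in range(n_steps):
        temps.extend([round(start_c - i * step_c, 2)] * cycles_per_step)
    temps.extend([end_c] * (cycles_per_step + final_cycles))
    return temps


# Example: 5 °C decrements, two cycles per temperature (hypothetical values).
schedule = touchdown_schedule(step_c=5.0, cycles_per_step=2)
```

With these example values the schedule runs two cycles each at 65, 60, 55, 50, 45 and 40°C (12 cycles). With the default 1°C decrement and ten extra cycles at 40°C, it would span 36 cycles; whether the original program used anything similar is not stated.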

#### Whole Mount *In situ* Hybridization

Probes were synthesized using the DIG labeling system (Roche) after plasmid linearization with the appropriate enzymes. Ripe animals of B. lanceolatum were collected in Argelès-sur-Mer (France), and gametes were obtained by heat stimulation (Fuentes et al., 2004, 2007). In vitro fertilization was undertaken in Petri dishes filled with filtered sea water. Fixation and whole mount in situ hybridization were performed as described in Somorjai et al. (2008).

## Results

#### Molecular Phylogenetic Analysis of *B. lanceolatum* Fox Gene Sequences

We searched for Fox gene sequences in the reference transcriptome of B. lanceolatum (Oulion et al., 2012). The recovered sequences were used for the phylogenetic reconstruction presented in **Figure 1**. We showed that B. lanceolatum possesses at least 28 Fox genes, each orthologous to one of the 32 genes described in B. floridae, and together including at least one member of each of the 22 families present in the bilaterian ancestor (Yu et al., 2008a). Cephalochordate-specific duplications, which occurred at least in the common ancestor of B. floridae and B. lanceolatum, gave rise to three members in the FoxQ2 group (FoxQ2a, FoxQ2b, FoxQ2c), two members in the FoxN1/4 group (FoxN1/4a and FoxN1/4b), and two genes in the FoxE group (FoxEa and FoxEc). We then analyzed the expression patterns, during B. lanceolatum embryonic development, of 13 of these 28 Fox genes, chosen as those showing higher expression levels in the transcriptome (Oulion et al., 2012).
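The TBLASTN search described in Materials and Methods typically returns several hits per B. floridae query. One way to shortlist candidate transcripts, sketched here under the assumption that the search was written as BLAST+ tabular output (`-outfmt 6`) with its default twelve columns, is to keep the top-scoring transcript per query below an E-value cutoff. The cutoff, the helper name and all sequence identifiers are hypothetical; only the column order matters, and the classic `qseqid`/`sseqid` labels are used for readability.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each query to (subject, bitscore) of its highest-scoring hit whose
    E-value is at or below max_evalue (an arbitrary, hypothetical cutoff)."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        query, score = rec["qseqid"], float(rec["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


# Made-up rows: B. floridae protein queries against hypothetical transcript IDs.
example = [
    "Bf_FoxK\tBl_tr1\t80.0\t100\t20\t0\t1\t100\t1\t300\t1e-40\t150.0",
    "Bf_FoxK\tBl_tr2\t40.0\t90\t54\t2\t5\t95\t10\t280\t1e-10\t60.0",
    "Bf_FoxM\tBl_tr3\t70.0\t110\t33\t1\t1\t110\t1\t330\t1e-30\t120.0",
    "Bf_query3\tBl_tr4\t30.0\t50\t35\t3\t1\t50\t1\t150\t0.5\t25.0",
]
hits = best_hits(example)
```

A best hit only nominates candidates. Because closely related paralogs (for instance within the FoxQ2 or FoxN1/4 groups) can hit the same transcripts, orthology in this study is assigned from the phylogeny in Figure 1 rather than from BLAST scores; the lower-scoring `Bl_tr2` row above is the kind of hit that would be worth keeping as a possible paralog.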

#### *FoxAa* and *FoxAb*

FoxAa (formerly named AmHNF3-1) (Shimeld, 1997) was first expressed at the gastrula stage in the anterior ventral endoderm and in the mesendodermal layer of the dorsal blastoporal lip (**Figures 2A,B**). At the late gastrula stage, we detected transcripts in the axial dorsal mesendoderm corresponding to the presumptive notochord territory, as well as in mesendoderm cells of the archenteron floor (**Figures 2C,D**). Expression in the axial mesoderm and endoderm persisted through the mid-late neurula stage (**Figures 2E,F**). Later on, at the late neurula stage before the mouth opens, notochord expression was restricted to the most anterior and posterior tips of the embryo, while endodermal expression was restricted to the middle region of the gut (**Figure 2G**). At the larva stage, expression at the anterior tip of the notochord and in the tailbud was still observed, and we detected diffuse expression in the gut (**Figure 2H**).

FoxAb (formerly named AmHNF3-2) (Shimeld, 1997) expression was first detected at the gastrula stage as a weak signal in the mesendodermal part of the dorsal blastoporal lip (**Figures 2I,J**). At the late gastrula stage, we detected expression in the central paraxial mesoderm on both sides of the notochord anlagen (**Figures 2K,L**). At the mid-late neurula stage, transcripts were detected in the neural tube, including the cerebral vesicle, and in the dorsal part of the endoderm (**Figures 2M,N**). At the late neurula stage, before the mouth opens, FoxAb was expressed in the neural tube and in the most anterior part of the pharynx. In the posterior region, expression was detected in the tailbud and in the dorsal midline of the gut (**Figure 2O**). At the larva stage, we observed expression in the pharynx, in the preoral pit, in the club-shaped gland and in the tailbud. At this stage, expression in the neural tube became restricted to some neurons and to the posterior part of the cerebral vesicle (**Figure 2P** and Figure S1A).

#### *FoxAB*

FoxAB transcripts were detected as a weak, ubiquitous signal from the eight-cell stage to the blastula stage (**Figures 2Q,R**). This ubiquitous expression was confirmed by the presence of reads in transcriptome analyses (data not shown). At the gastrula stage, we observed strong, specific expression in the dorsal blastoporal lip, the putative amphioxus organizer (**Figures 2S,T**). At the late gastrula stage, expression became restricted to the presumptive notochord territory (**Figures 2U,V**). No expression could be detected by in situ hybridization at later stages.

#### *FoxB*

FoxB expression was first detected dorsally, both in the ectoderm and in the mesendoderm, as a weak signal in mid-gastrula stage embryos (**Figures 2W,X**). Later on, in early neurula stage embryos, a signal could be observed in the neural plate on either side of the midline, as well as in two patches in the posterior paraxial mesendoderm (**Figures 2Y,Z**). During the mid-late neurula stage, expression was detected in the most posterior paraxial mesoderm, which gives rise to the newly formed somites, and in the neural tube posterior to the cerebral vesicle (**Figures 2A',B'**). Then, FoxB expression in the mesoderm faded away in late neurulae (**Figure 2C'**), and in larvae it became restricted to the cerebral vesicle and to some neurons along the neural tube (**Figure 2D'** and Figure S1B).

#### *FoxC*

FoxC was expressed at the gastrula stage in the dorsal paraxial mesendoderm (**Figures 3A,B**). Later on, at the late gastrula stage, expression was detected in the region that gives rise to the three most anterior somites (**Figures 3C,D**). In mid-late neurulae, the transcripts remained all along the body in the somites and a new expression domain appeared in the anterior endoderm at the level where the first gill slit opens (**Figure 3E**). At the late neurula stage, the expression persisted in the pharynx and somites and was also detected in the club-shaped gland anlagen (**Figures 3F,G**). At the larva stage a diffuse expression was observed in the somites as well as in the preoral pit, in the club-shaped gland and in the first gill slit (**Figure 3H** and Figure S1C).

#### *FoxD*

FoxD transcripts were first detected at the gastrula stage in the dorsal blastoporal lip (**Figures 3I,J**). Then, at the late gastrula stage, FoxD was expressed in the dorsal axial mesendoderm, in part of the dorsal paraxial mesendoderm as two patches on both sides of the midline, and in the anterior region of the neural plate (**Figures 3K,L**). At the mid-late neurula stage, the notochord and the somites, as well as the cerebral vesicle, were labeled (**Figure 3M**). At the late neurula stage, before the mouth opens, transcripts were detected in the paraxial somitic mesoderm, in the notochord, in the cerebral vesicle and in the posterior endoderm (**Figures 3N,O**). Faint labeling was also detected at this stage in the first gill slit and in the club-shaped gland anlagen. At the larva stage, we observed a low expression level in the cerebral vesicle, in the preoral pit, in the club-shaped gland, in the first gill slit, in the notochord and in the posterior part of the gut. We also observed an anterior-to-posterior gradient of expression in the somites (**Figure 3P** and Figure S1D).

#### *FoxEa*

FoxEa (formerly named FoxE4 in B. floridae) expression was first detected at the early neurula stage in the antero-ventral mesendoderm (**Figures 3Q,R**). Later on, at the mid-late neurula

FIGURE 1 | Phylogenetic analysis of *B. lanceolatum* Fox genes. Unrooted 50% majority-rule consensus Bayesian inference tree based on the amino acid sequences of the forkhead domain. Posterior probabilities are shown at each node. The different paralogy groups are shown in pink or light blue boxes; divergent sequences appear outside these boxes. Only one amphioxus Fox gene, named Fox1 (Yu et al., 2008a), which probably originated by a cephalochordate-specific duplication followed by a fast evolutionary rate, localizes outside these paralogy groups. Abbreviations: Dm, Drosophila melanogaster; Mm, Mus musculus; Dr, Danio rerio; Ci, Ciona intestinalis; Sp, Strongylocentrotus purpuratus; Sk, Saccoglossus kowalevskii; Nv, Nematostella vectensis; Bf, Branchiostoma floridae; Bl, Branchiostoma lanceolatum. Red stars indicate Bl sequences. Scale bar represents 0.4 amino acid substitutions per site.

FIGURE 2 | Expression of *B. lanceolatum FoxAa*, *FoxAb, FoxAB, and FoxB*. In all the panels except (B, J, Q, R, T, X) anterior is to the left. In lateral and blastoporal views dorsal is to the top. FoxAa expression pattern (A–H). Gastrula lateral (A) and blasporal (B) views. Late gastrula lateral (C) and dorsal (D) views. Mid-late neurula lateral (E) and dorsal (F) views. In the late neurula lateral view (G) arrow marks the endodermal expression in the middle region. In the larva stage lateral view (H), the double arrowhead indicates the expression in the anterior tip of the notochord and the arrowhead marks the expression in the tailbud. FoxAb expression pattern (I–P). In the gastrula lateral (I) and blastoporal (J) views the arrow indicates the expression in the mesendodermal part of the dorsal blastoporal lip. Late gastrula lateral (K) and dorsal (L) views. In the mid-late neurula lateral (M) and dorsal (N) views the double arrowhead marks the expression in the

cerebral vesicle. In the late neurula lateral view (O), the double arrow marks the expression in the most anterior part of the pharynx. In larva lateral view (P) the arrowhead indicates the expression in the tailbud. FoxAB expression pattern (Q–V). Eight-cell stage (Q). Blastula stage (R). Gastrula lateral (S) and blasporal (T) views. Late gastrula lateral (U) and dorsal (V) views. FoxB expression pattern (W–D'). Gastrula lateral (W) and blastoporal (X) views. Early neurula lateral view (Y). In the early neurula dorsal (Z) view the arrowhead indicates the two expression patches in the posterior paraxial mesendoderm. Mid-late neurula lateral (A') and dorsal (B') views. The double arrowhead marks the expression in the newly formed somites. Late neurula lateral view (C'). In larva lateral view (D') the arrow indicates the expression in the cerebral vesicle. Scale bar: 10µm (A–F), (I–N), (Q-V), (W-B'), and 50 µm (G,H), (O,P), (C',D').

stage, FoxEa transcripts were detected ventrally in the endoderm with a higher expression level on the right side of the pharynx (**Figures 3S,T**), and a slight expression domain in the posterior gut was also visible. At the late neurula stage, FoxEa transcripts remained ventrally in the pharyngeal endoderm on the right side (**Figure 3U**). Finally, at the larva stage, transcripts were detected in the club-shaped gland (**Figure 3V** and Figure S1E).

#### *FoxG*

FoxG expression was first observed at the early neurula stage in the anterior region of the first somites (**Figures 3W,Y**). At the mid-late neurula stage, FoxG was expressed in the anterior ventral region of the three most anterior somites (**Figures 3X,Z**). Later on, in late neurulae before the mouth opens, neural expression appeared in some individual neurons within the neural tube, while the expression in the first somites disappeared (**Figure 3A'**). This expression persisted in larvae, in which FoxG was also detected in some neurons of the cerebral vesicle (**Figure 3B'** and Figure S1F).

#### *FoxJ1*

FoxJ1 showed a dynamic expression pattern. Expression began during gastrulation and was detected throughout the ectoderm except around the blastopore (**Figures 4A,B**). Later on, at the late gastrula stage, this expression persisted in the ectoderm that gives rise to the epidermis (**Figures 4C,D**). At the mid-late neurula stage, we detected transcripts in the neural tube, whereas expression in the epidermis was completely lost (**Figures 4E,F**). This neural tube expression was no longer observed in late neurulae before the mouth opens (data not shown); however, at the larva stage, we observed expression at the anterior tip of the embryo and in the pharynx at the level of the preoral pit and of the first gill slit (**Figure 4G** and Figure S1G).

FIGURE 3 | Expression of *B. lanceolatum FoxC*, *FoxD*, *FoxEa*, and *FoxG*. In all panels except (B,J), anterior is to the left. In lateral and blastoporal views, dorsal is to the top. FoxC expression pattern (A–H). Gastrula lateral (A) and blastoporal (B) views; the double arrowhead indicates the expression in the paraxial mesoderm. Late gastrula lateral (C) and dorsal (D) views; the arrowheads mark the region that will give rise to the three most anterior somites. In the mid-late neurula lateral view (E), the arrow indicates a new expression domain in the anterior endoderm. Late neurula dorsal (F) and lateral (G) views; the arrow marks the expression domain in the pharynx. Larva lateral view (H). FoxD expression pattern (I–P). Gastrula lateral (I) and blastoporal (J) views. Late gastrula lateral (K) and dorsal (L) views; the arrow indicates the expression in the anterior region of the neural plate and the double arrowhead marks the expression in the dorsal paraxial mesendoderm. Mid-late neurula lateral view (M). Late neurula dorsal (N) and lateral (O) views. Larva lateral view (P). In (M, O, P), the arrows indicate the expression domain in the cerebral vesicle. FoxEa expression pattern (Q–V). Early neurula lateral (Q) and dorsal (R) views. Mid-late neurula lateral (S) and dorsal (T) views. Late neurula lateral view (U). Larva lateral view (V). FoxG expression pattern (W–B'). Early neurula lateral (W) and dorsal (Y) views. Mid-late neurula lateral (X) and dorsal (Z) views; the arrowhead indicates the expression in the three most anterior somites. In the late neurula lateral view (A'), the arrows mark the neurons within the neural tube. Larva lateral view (B'). Scale bar: 10 µm (A–E), (I–L), (Q–T), (W–Z), and 50 µm (F–H), (N–P), (U,V), (A',B').

#### *FoxK*

FoxK was ubiquitously expressed from the eight-cell stage to the blastula stage (Figures S2A,B). At the gastrula stage, expression became restricted to the mesendoderm (Figures S2C,D), and by the late gastrula stage transcripts were detected mostly in the dorsal mesoderm (Figures S2E,F). At the mid-late neurula stage, we detected stronger expression in the most anterior region of the embryo (Figures S2G,H). Transcripts were then detected throughout the embryo at the late neurula stage, with stronger expression at the anterior tip (Figures S2I,J). Finally, at the larva stage, we observed ubiquitous expression with a higher level at the anterior tip and in the pharynx (Figure S2K).

#### *FoxM*

FoxM transcripts were detected ubiquitously, except in the epidermis, throughout embryonic development from the eight-cell stage until the mid-late neurula stage (Figures S2L–S). Later on, at the late neurula stage, FoxM expression could no longer be detected by in situ hybridization (Figure S2T).

#### *FoxN1/4a*

Ubiquitous FoxN1/4a expression was detected from the eight-cell stage until the blastula stage (**Figures 4H,I**). At the gastrula stage, a signal was detected in the anterior ectoderm (**Figures 4J,K**). Later on, at the early neurula stage, we observed transcripts in the anterior endoderm as well as in the axial central mesoderm (**Figures 4L,M**). At the mid-late neurula stage, we detected three major expression domains: one anterior, at the level of the cerebral vesicle, a second one in the anterior ventral endoderm and a third one in the posterior mesoderm (**Figure 4N**). At the late neurula stage before the mouth opens, we observed expression in the anterior and posterior endoderm (**Figure 4O**). Finally, at the larva stage, we detected expression in the posterior region of the gut and in the anus (**Figure 4P**).

#### *FoxN2/3*

Ubiquitous expression of FoxN2/3 was observed from the eight-cell stage (**Figure 4Q**) to the blastula stage (**Figure 4R**). Then, at the gastrula stage, the expression was restricted to the mesendoderm (**Figures 4S,T**). At the late gastrula stage, the expression remained strong in the mesendoderm but began to decrease in the ventral part (**Figures 4U,V**). By the mid-late neurula stage, FoxN2/3 transcripts were detected in the mesoderm and in the neural tube (**Figure 4W**). At the late neurula stage, before the mouth opens, the expression was mainly detected in the paraxial mesoderm (somites) and in the notochord, and a new expression domain appeared in the pharyngeal endoderm (**Figures 4X,Y**). At the larva stage, we did not detect any specific signal by in situ hybridization.

FIGURE 4 | Expression of *B. lanceolatum FoxJ1*, *FoxN1/4a*, and *FoxN2/3*. In all panels except (B, H, I, K, Q, R, T), anterior is to the left. In lateral and blastoporal views, dorsal is to the top. FoxJ1 expression pattern (A–G). Gastrula lateral (A) and blastoporal (B) views. Late gastrula lateral (C) and dorsal (D) views. Mid-late neurula lateral (E) and dorsal (F) views. In the larva lateral view (G), the bracket indicates the pharyngeal region. FoxN1/4a expression pattern (H–P). Eight-cell stage (H). Blastula stage (I). Gastrula lateral (J) and blastoporal (K) views. Early neurula lateral (L) and dorsal (M) views. In the mid-late neurula lateral view (N), the arrowhead, double arrowhead and arrow mark the three main expression domains: at the level of the cerebral vesicle, in the anterior ventral endoderm and in the posterior mesoderm. Late neurula (O). Larva (P). FoxN2/3 expression pattern (Q–Y). Eight-cell stage (Q). Blastula stage (R). Gastrula (S,T). Late gastrula lateral (U) and dorsal (V) views. Mid-late neurula lateral view (W). Late neurula lateral (X) and dorsal (Y) views. The arrow in (X) indicates the expression domain in the pharyngeal endoderm. Scale bar: 10 µm (A–F), (H–N), (Q–W), and 50 µm (G), (O–P), (X,Y).

#### Discussion

#### Fox Gene Expression in Cephalochordate Species

The complete or partial embryonic expression patterns of FoxAa, FoxAb, FoxB, FoxC, FoxD, FoxEa, FoxG, and FoxN1/4a were previously described in B. floridae and/or B. belcheri (Shimeld, 1997; Terazawa and Satoh, 1997; Toresson et al., 1998; Mazet and Shimeld, 2002; Yu et al., 2002a,b; Mazet et al., 2006; Bajoghli et al., 2009). These genes show embryonic expression patterns largely similar to those we observed in B. lanceolatum, as we previously noted for other important developmental genes (Somorjai et al., 2008). However, our work provides some new information.

First, in contrast to what has been described in B. floridae, we showed that FoxAa and FoxAb have different expression patterns. In B. floridae, in situ hybridization data showed that FoxAb has an expression pattern similar to that of FoxAa at early stages, whereas its expression was no longer detected after the eight-somite stage (Shimeld, 1997). Here we showed that, although both genes were expressed in the mesendodermal part of the dorsal blastoporal lip at the gastrula stage, their overall expression patterns are consistently different, and we observed restricted expression of FoxAb from the gastrula to the larva stage. These discrepancies might be explained by the very low expression level of FoxAb: staining of embryos hybridized with the FoxAb probe took a very long time. Thus, the staining time used in B. floridae might have been too short to detect expression in late-stage embryos. Moreover, the expression we observed for FoxAa in B. lanceolatum differs from that observed in B. floridae but is similar to that described in B. belcheri (Terazawa and Satoh, 1997). Indeed, as in B. belcheri, FoxAa was not expressed in the central nervous system of B. lanceolatum. On the other hand, FoxAb showed a very specific expression in the ventral part of the neural tube in neurula stage embryos, a region that has been proposed to be homologous to the vertebrate floor plate. Vertebrates have three FoxA paralogs that are expressed in the organizer, the notochord, the floor plate and the endoderm (Friedman and Kaestner, 2006). In Ciona (Di Gregorio et al., 2001), Ci-fkh is also expressed in the notochord, the floor plate and the endoderm. The data we obtained in B. lanceolatum suggest that the expression of FoxA in the chordate ancestor was similar to that observed in tunicates, and that independent subfunctionalization occurred in cephalochordates after a lineage-specific gene duplication and in vertebrates after the two rounds of whole-genome duplication.

Concerning FoxB, expression in B. floridae was first detected in neurulae with five somites (Mazet and Shimeld, 2002). Here we showed that in B. lanceolatum FoxB expression could already be observed in gastrula embryos, in the dorsal posterior mesendoderm and ectoderm. Then, in neurulae, we detected expression in the neural plate, similar to B. floridae, as well as expression in the most posterior somites that had not been previously described. This expression in the neural plate/neural tube and in the most recently formed somites persisted until the late neurula stage. Interestingly, three different somitic populations have been described in amphioxus (Bertrand et al., 2011). The first, most anterior, population forms under the control of FGF signaling, whereas the two posterior populations form independently of FGF signaling. Several genes show specific expression in these three somitic populations, but only one gene, Mox (Minguillon and Garcia-Fernandez, 2002), is expressed in both the second and third populations. The present data suggest that FoxB may also play a role in the formation of these somitic populations, since it is expressed in the two most posterior ones.

In B. floridae, FoxC has been described as first expressed in the mesoderm of neurulae, but its expression was described at only one developmental stage (Mazet et al., 2006). Here we showed that expression starts much earlier, at the gastrula stage, in the dorsal paraxial mesendoderm, the presumptive somitic mesoderm territory. Expression persisted in the paraxial mesoderm/somites until the larva stage, and at the late neurula stage we started to observe expression in the club-shaped gland anlagen and at the site where the first gill slit opens. These data suggest a major ancestral role of FoxC during somitogenesis, which would have been conserved in vertebrates (Kume et al., 2001; Wilm et al., 2004; Wotton et al., 2008) and lost in tunicates, in which FoxC is expressed in neural and palp cells (Imai et al., 2006).

FoxD and FoxEa expression in B. lanceolatum was very similar to previous descriptions in B. floridae (Yu et al., 2002a,b). However, we noticed FoxD expression in some specific regions of the pharynx in late neurulae and larvae, and transient FoxEa expression in the posterior endoderm of mid-late and late neurulae, neither of which had been described in the Caribbean species.

FoxG, previously known as Brain Factor 1 (BF-1), was described in B. floridae as a gene that is ventrally expressed in the cerebral vesicle and in the anterior-most portion of the first somite pair (Toresson et al., 1998). Our results showed a conserved expression pattern in the cerebral vesicle area in B. lanceolatum. However, mesodermal expression is not limited to the first somite pair: the first three somite pairs exhibit the same pattern at the neurula stage, suggesting that this gene might play a role during anterior somitogenesis. This result is consistent with the functional differences between the formation of the anterior somites, which is under the control of the FGF signaling pathway, and the formation of the most posterior somites, which is not FGF-dependent (Bertrand et al., 2011). Moreover, expression is localized in the ventral part of these three most anterior somites, which will give rise to the perivisceral coelom, suggesting a function of FoxG in the establishment of the somitic compartments.

#### *FoxJ1* and the Formation of Motile Cilia

FoxJ1 orthologs have been identified in many eumetazoans as well as in sponges (Larroux et al., 2006) and choanoflagellates (King et al., 2008). In vertebrates, FoxJ1 plays an essential role in the generation of motile cilia and in mediating left-right asymmetry (Chen et al., 1998; Brody et al., 2000; Yu et al., 2008b). It has also recently been shown that misexpression of FoxJ1 from placozoans, echinoderms and platyhelminthes in zebrafish embryos induces the expression of ciliary genes, whereas inactivation of FoxJ1 in the flatworm Schmidtea mediterranea impairs the normal differentiation of motile cilia, suggesting a conserved function in metazoans (Vij et al., 2012). This conserved function is also supported by the embryonic expression of FoxJ1 in different phyla (Choi et al., 2006; Tu et al., 2006; Fritzenwanker et al., 2014). In B. lanceolatum, we showed that FoxJ1 is first expressed in the ectoderm of gastrulae, excluding the blastoporal region and the presumptive neural plate, at the time when motile cilia start to grow. Then, in neurulae, expression was lost in the epidermis and appeared in the closed neural tube. At the larva stage, expression was restricted to the anterior tip of the animal and to the ciliated preoral pit and first gill slit. This expression pattern suggests that FoxJ1 might also play a role in the formation of motile cilia in amphioxus. However, other cells, such as gut epithelial cells, also bear motile cilia but do not express FoxJ1, suggesting that other genes might be implicated in ciliogenesis in these embryonic structures.

#### *FoxAB*

In B. lanceolatum, FoxAB was transiently expressed in the organizer at the gastrula stage and in the presumptive notochord later on. No expression could be detected in mid-neurulae or larvae. FoxAB family genes have been described in hemichordates (Fritzenwanker et al., 2014), sea urchin (Tu et al., 2006) and cnidarians, and are absent from vertebrates and tunicates, the two other chordate clades (Yu et al., 2008a). In the hemichordate Saccoglossus kowalevskii, FoxAB is expressed in the ectoderm, and the mouth perforates on the ventral side through the ring of cells expressing this gene (Fritzenwanker et al., 2014). In bryozoans, FoxAB also shows ectodermal expression (Fuchs et al., 2011). Therefore, it is still difficult to propose any scenario for the evolution of the function of FoxAB family genes in bilaterians. FoxAB could have been recruited for the patterning of the

#### *FoxK* and *FoxM* Ubiquitous Expression

We detected ubiquitous expression of FoxK from the eight-cell stage until the larva stage. Data from other bilaterians are scarce. In vertebrates, there are two paralogs in the FoxK family, FoxK1 and FoxK2. In mouse, functional analysis of FoxK1 during embryonic development showed that the gene is involved in myogenic differentiation (Bassel-Duby et al., 1994). In Ciona intestinalis (Imai et al., 2004), as in the hemichordate S. kowalevskii (Fritzenwanker et al., 2014), FoxK expression is largely ubiquitous, as observed for B. lanceolatum. Finally, studies in Drosophila have shown that FoxK is involved in the differentiation of the midgut in the fly embryo (Casas-Tinto et al., 2008). Altogether, these data do not allow us to infer any putative ancestral function for FoxK family genes, and further studies are required in different animal phyla.

FoxM expression is also ubiquitous in B. lanceolatum and was first detected as early as the eight-cell stage. The expression level then decreased continuously as development proceeded and became undetectable by in situ hybridization at the late neurula stage. In Xenopus, FoxM1 is maternally expressed and transcripts are thereafter detected in the neuroectoderm (Pohl et al., 2005). Moreover, this gene has been shown to be important for early neuronal differentiation (Ueno et al., 2008). In mouse, FoxM1 is expressed in dividing cells, and knockout animals show an embryonic lethal phenotype due to malformations affecting different organs such as the liver, the heart, the lung, or the vasculature (Kalin et al., 2011). As for FoxK, the data available so far give no indication of the putative ancestral function of genes belonging to the FoxM family.

#### *FoxN1/4a* and *FoxN2/3* Expression

In all vertebrates studied so far, FoxN1 plays an essential role in thymus development (Ma et al., 2012; Neves et al., 2012; Lee et al., 2013; Romano et al., 2013). Moreover, in mammals, FoxN1 is essential for hair formation, and it is also expressed during feather development in chick (Darnell et al., 2014). Although mammalian and fish FoxN1 proteins are able to activate the expression of hair keratin genes, amphioxus FoxN1/4 is not, because the N-terminal region of its forkhead domain differs from that of the vertebrate proteins (Schlake et al., 2000). On the other hand, FoxN4 is expressed in the nervous system, including the retina, during vertebrate development (Danilova et al., 2004; Kelly et al., 2007; Boije et al., 2013). Outside vertebrates, embryonic expression has been described in S. kowalevskii (Fritzenwanker et al., 2014) and, for a single developmental stage, in B. floridae (Bajoghli et al., 2009). In the hemichordate, FoxN1/4 expression is ubiquitous during early development and is thereafter observed in the ectoderm. In B. lanceolatum, FoxN1/4a expression was very dynamic, with maternal ubiquitous expression followed by restricted expression in the ectoderm at the gastrula stage, in the endoderm and axial mesoderm in neurulae, in the cerebral vesicle, the pharynx and the posterior somites later on, and finally in the posterior gut of the larvae. These data suggest that FoxN1 and FoxN4 acquired new functions in vertebrates; analysis of the expression of FoxN1/4 family genes in tunicates will be needed to clarify this point. Interestingly, the gut of the amphioxus larva and adult is considered a major immune organ, and FoxN1/4a might, like vertebrate FoxN1, play a role in controlling immune system function in amphioxus. However, further functional studies are required to test this hypothesis.

In vertebrates, FoxN3 is important for craniofacial and eye development (Schuff et al., 2007; Samaan et al., 2010; Schmidt et al., 2011). In Xenopus, FoxN3 is expressed in the neural crest and eye field, whereas FoxN2 is expressed early in the eye field and then in the branchial arches, retina and vagal ganglion (Schuff et al., 2006). In mouse, FoxN2 is expressed in craniofacial, limb, nervous system and somitic tissues (Tribioli et al., 2002). In Ciona intestinalis, FoxN2/3 expression is largely ubiquitous during early development and becomes more intense in the sensory vesicle, the mesenchyme, the notochord and the palps after gastrulation (Imai et al., 2004). In sea urchin, FoxN2/3 is expressed in the non-skeletogenic mesoderm and later in the endoderm, and its function has been shown to be important for ingression and for the expression of genes coding for skeletal matrix proteins (Rho and McClay, 2011). Here, we showed that amphioxus FoxN2/3 was ubiquitously expressed at early stages. Then, at the gastrula stage, its expression was restricted to the mesendoderm, and later on we observed specific expression in the somites. Altogether, this suggests a conserved role of FoxN2/3 in mesoderm development in deuterostomes, although genes of this family seem to have acquired specific functions in each chordate lineage.



# Conclusions

Our analysis of Fox gene expression in the Mediterranean amphioxus, B. lanceolatum, leads to several conclusions. First, as previously described for other gene families (Somorjai et al., 2008), the expression of orthologous genes in different amphioxus species shows a high degree of stasis. Where differences exist, they can likely be explained by variation in experimental sensitivity. Second, comparison of amphioxus Fox gene expression with that of other metazoans, and particularly chordates, reveals a high degree of conservation for some genes (e.g., FoxC, FoxD) but divergent patterns for others (e.g., FoxM, FoxN1/4a). This indicates that Fox genes have carried out essential functions in metazoans but have also been instrumental in the evolution of new functions. Further studies in amphioxus and other metazoans, particularly functional studies, will be essential to establish a complete picture of Fox gene expression and function and of their role in animal evolution.

#### Acknowledgments

DA holds a fellowship from CONICYT "Becas Chile." Part of this study was supported by the EXOMOD grant from the CNRS.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fevo.2015.00080


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Aldea, Leon, Bertrand and Escriva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Asymmetric Distribution of *pl10* and *bruno2*, New Members of a Conserved Core of Early Germline Determinants in Cephalochordates

Simon C. Dailey <sup>1</sup>†, Roser Febrero Planas <sup>2</sup>†, Ariadna Rossell Espier <sup>2</sup>, Jordi Garcia-Fernàndez <sup>1</sup> and Ildikó M. L. Somorjai <sup>1,2</sup>\*

<sup>1</sup> Gatty Marine Laboratory, Scottish Oceans Institute, University of St Andrews, St Andrews, UK, <sup>2</sup> Department of Genetics, University of Barcelona, Barcelona, Spain

#### *Edited by:*

Sylvain Marcellini, University of Concepcion, Chile

#### *Reviewed by:*

Brian I. Crother, Southeastern Louisiana University, USA

Eduardo E. Zattara, Indiana University, USA

> *\*Correspondence:* Ildikó M. L. Somorjai imls@st-andrews.ac.uk

† These authors have contributed equally to this work.

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Ecology and Evolution

> *Received:* 01 September 2015 *Accepted:* 28 December 2015 *Published:* 22 January 2016

#### *Citation:*

Dailey SC, Febrero Planas R, Rossell Espier A, Garcia-Fernàndez J and Somorjai IML (2016) Asymmetric Distribution of pl10 and bruno2, New Members of a Conserved Core of Early Germline Determinants in Cephalochordates. Front. Ecol. Evol. 3:156. doi: 10.3389/fevo.2015.00156

Molecular fingerprinting of conserved germline and somatic "stemness" markers in different taxa has been key in defining the mechanism of germline specification ("preformation" or "epigenesis"), as well as the expression domains of somatic progenitors. The distribution of molecular markers for primordial germ cells (PGCs), including vasa, nanos, and piwil1, as well as Vasa antibody staining, supports a determinative mechanism of germline specification in the cephalochordate Branchiostoma lanceolatum, similar to other amphioxus species. pl10 and bruno2, but not bruno4/6, are also expressed in a pattern consistent with these other germline genes, adding to our repertoire of PGC markers in lancelets. Expression of nanos, vasa, and the remaining markers (musashi, pufA, pufB, pumilio, and piwil2) may define populations of putative somatic progenitors in the tailbud (the amphioxus posterior growth zone) or in zones of proliferative activity. Finally, we also identify a novel expression domain for musashi, a classic neural stem cell marker, during notochord development in amphioxus. These results are discussed in the context of germline determination in other taxa, stem cell regulation, and regenerative capacity in adult amphioxus.

Keywords: amphioxus, posterior stem cells, evolution, regeneration, PGCs, preformation

# INTRODUCTION

One of the key innovations coupled to the evolution of multicellularity was the ability to segregate the germline from the soma, with transcriptional repression of a somatic programme being essential for maintaining germ cell fate (Hallmann, 2011). Historically, two main mechanisms of germline specification have been defined in animals: preformation and epigenesis (Extavour and Akam, 2003), also called the determinative and inductive modes, respectively. In the determinative mode, cytoplasmic determinants associated with the germ plasm in the egg are inherited by a limited number of daughter cells during cleavage, which are thus specified as primordial germ cells (PGCs) and go on to form the mature adult gametes. In the inductive mode, by contrast, inductive cues cause somatic cells to become specified as germline. Studies in mouse, axolotl, and cricket suggest that BMP signaling may be an ancient mechanism for PGC induction from mesoderm in animals (Chatfield et al., 2014; Donoughe et al., 2014). The phylogenetic distribution of these two mechanisms of germ cell specification suggests that the inductive mode may represent the ancestral state, and that germ plasm has evolved independently multiple times (Blackstone and Jasker, 2003; Extavour and Akam, 2003; Johnson et al., 2003a,b; Crother et al., 2007; Extavour, 2007; Ewen-Campen et al., 2010).

Comparative studies in multiple taxa have revealed that the molecular signature of germ cells may often be shared across species that use either determinative or inductive modes of PGC specification, leading to the proposal of a conserved germline multipotency programme (Extavour, 2007; Juliano et al., 2010). Interestingly, some basal metazoans appear to use a combination of mechanisms to specify germ cells, and many of the classic germline markers are in fact also expressed in adult somatic stem cells in these organisms (Alié et al., 2011; Leclère et al., 2012). Germline-associated genes most often encode RNA-binding proteins, but there is considerable species-specific variation in the suite employed (see Gazave et al., 2013 for a compilation of much of the recent literature). However, a core set of proteins including Vasa/PL10, Tudor and a PIWI domain-containing protein may represent an ancestral "pluripotency module" (Ewen-Campen et al., 2010). Modifying upstream regulators or downstream targets, combined with the addition of new germ cell genes such as nanos or bruno, would have generated the diversity of germline specification mechanisms in early metazoans (Ewen-Campen et al., 2010).

Until recently, little was known about germline specification in cephalochordates (lancelets or amphioxus), the sister group to vertebrates and tunicates and the best living proxy for the ancestral chordate (Bertrand and Escrivà, 2011). Classic studies suggested that lancelets might employ an inductive mode of PGC specification (reviewed in Extavour and Akam, 2003). However, electron microscopy data showed that in Branchiostoma floridae the pole plasm localizes to the vegetal cortex soon after fertilization and segregates into a single blastomere during cleavage, calling this hypothesis into question (Holland and Holland, 1992). Although functional data are still lacking, blastomere separation experiments combined with expression data for molecular markers traditionally associated with the germline, including piwi-like1, nanos, vasa, and Vasa protein, strongly support a determinative mode of PGC specification in cephalochordates (Wu et al., 2011; Zhang et al., 2013 and **Figure 1**). The zygotic expression domains of these genes from gastrulation onwards, as well as those of piwi-like2 and tudor7, also suggest a function in somatic progenitors/stem cells of the posterior growth zone. Together, these data provide a general framework for understanding how markers for PGCs and posterior progenitors may be expressed in cephalochordates during development (**Figure 1**).

Currently, the most convincing evidence for the existence of somatic stem cells in cephalochordates comes from studies of tail regeneration in the European amphioxus, Branchiostoma lanceolatum, whose adult regenerative ability is comparable to that seen in many ambulacrarians (echinoderms and hemichordates; Somorjai et al., 2012a,b). Unfortunately, no germ cell markers have so far been characterized during development in this species, and few putative somatic stem cell markers are known in cephalochordates. The purpose of this study was therefore threefold: first, to characterize the early expression of the candidate amphioxus germline markers nanos, piwil1, and vasa, and the distribution of Vasa protein, in B. lanceolatum for comparison with other cephalochordates; second, to determine whether genes that serve as germline markers in other taxa, including pumilio, pufA, pufB, musashi, pl10, bruno2, and bruno4/6, are associated with PGCs in amphioxus; and third, to analyse the late developmental expression of some of these candidates as a prelude to future regeneration studies. Given the considerable conservation of developmental gene expression in cephalochordates demonstrated thus far (e.g., Somorjai et al., 2008; Wu et al., 2011; Zhang et al., 2013), we hypothesized that markers for PGCs and posterior somatic domains would show expression profiles in B. lanceolatum comparable to those in B. belcheri, B. japonicum, and B. floridae.

Here, we present the first analysis of putative germline and somatic stem cell markers in the European amphioxus, B. lanceolatum. We identify a core set of conserved PGC-associated markers in cephalochordates, including piwil1, nanos, and vasa, characterize Vasa protein distribution, and identify two new candidate germ cell markers in cephalochordates, pl10 and bruno2. We also characterize the amphioxus musashi ortholog, whose expression in the notochord represents a novelty in chordates. The highly conserved expression data in the Branchiostomatidae support the view that cephalochordate evolution is strongly constrained, and show that data are broadly transferable across species, even in the context of germline formation. This study also provides a foundation for future studies of regeneration in the amphioxus B. lanceolatum.

# MATERIALS AND METHODS

## Embryos

Ripe adults were collected in Argelès-sur-Mer (France) and spawned as previously described (Fuentes et al., 2007). Embryos were fixed at the relevant time points in 4% PFA in MOPS salts (0.1 M MOPS, 2 mM MgSO4, 1 mM EGTA, and 0.5 M NaCl) and stored in 70% ethanol at −20°C. For phalloidin staining, embryos were stored in PBS at 4°C. Embryos were staged according to Hirakow and Kajita (1991, 1994), with modifications as per Zhang et al. (2013).

## Phylogenetic Analysis

If not previously published, putative orthologous sequences were identified using BLASTp searches, and reciprocal BLAST was used to confirm their identity (Camacho et al., 2009). Protein sequences were aligned in Jalview version 2.8.2 (Waterhouse et al., 2009) using MAFFT with default settings, and checked manually. All positions with less than 95% site coverage were eliminated directly in MEGA5 prior to analysis. The evolutionary models best describing the substitution pattern were identified as those with the lowest BIC (Bayesian Information Criterion) scores in MEGA5 (Tamura et al., 2011). Both neighbor-joining (NJ) and maximum likelihood (ML) analyses were performed, with 500 and 1000 bootstrap replicates, respectively. The Nearest-Neighbor-Interchange method was used to infer trees in ML. Because the NJ results were concordant with the ML results, only ML trees are shown unless otherwise noted. Model details for each analysis are included in the figure legends for ease of reference. All sequences used for phylogenetic analyses, including associated accession numbers, are included in Supplementary File 1.
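The column-filtering and model-selection steps above can be illustrated with a minimal Python sketch. It is not MEGA5's implementation: the set of gap/ambiguity symbols is an assumption, site coverage is taken as the fraction of sequences with an unambiguous residue at a column, and the BIC is computed as k ln(n) − 2 ln L with n assumed to be the number of alignment sites. The toy alignment, model log-likelihoods, and parameter counts in the usage section are hypothetical.

```python
import math

# Assumption: symbols treated as gaps, missing data, or ambiguous residues.
NON_RESIDUES = set("-?X")


def site_coverage(column: str) -> float:
    """Fraction of sequences carrying an unambiguous residue at one alignment column."""
    present = sum(1 for ch in column if ch.upper() not in NON_RESIDUES)
    return present / len(column)


def filter_by_coverage(alignment: list[str], cutoff: float = 0.95) -> list[str]:
    """Drop every column whose site coverage is below `cutoff` (0.95 = 95%)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = [
        i for i in range(length)
        if site_coverage("".join(seq[i] for seq in alignment)) >= cutoff
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion, k*ln(n) - 2*lnL; lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Name of the model with the lowest BIC, given {name: (lnL, n_params)}."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


if __name__ == "__main__":
    toy = ["MK-LV", "MKALV", "MKAL-", "MKALV"]  # hypothetical 4-sequence alignment
    print(filter_by_coverage(toy))  # ['MKL', 'MKL', 'MKL', 'MKL']

    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    fits = {"JTT+G": (-5000.0, 60), "rtREV+G": (-4990.0, 62)}
    print(best_model(fits, n_sites=173))  # rtREV+G
```

With only four sequences, a 95% cutoff means a column survives only if no sequence has a gap there, which is why the two partially gapped columns are removed. In the model comparison, the extra parameters of the second model cost about 10 BIC units (2 × ln 173), which is outweighed by its 20-unit gain in −2 ln L.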

## Cloning and Probe Synthesis

RNA was extracted from embryos and adult tissues using Trizol and phenol-chloroform extraction; cDNA was generated using the Tetro cDNA Synthesis kit (Bioline). Gene fragments for probe generation were amplified by PCR using gene-specific primers designed against the genome of B. floridae (Supplementary File 2), ligated into pGEM-T Easy (Promega), and transformed into XL10-Gold (Stratagene) or DH5α (Invitrogen) strains of E. coli by heat shock using standard protocols. Selected clones were mini-prepped using peqGOLD or Promega plasmid miniprep kits and sequence-verified. Template was generated by PCR on plasmids using Universal M13F (5′ GTAAACGACGGCCAGT 3′) and M13R (5′ AACAGCTATGACCATG 3′) primers. The band was gel-purified using the QIAquick (Qiagen), GFX (Amersham), or Isolate II (Bioline) gel extraction kits following the manufacturers' guidelines. DIG-labeled (Roche) antisense probes were transcribed in vitro using T7, T3, or SP6 RNA polymerase as appropriate, following standard protocols. Probes were checked by agarose electrophoresis and purified using mini Quick Spin columns (Roche) or by precipitation with sodium acetate (3 M, pH 5.2) and ethanol.

## Whole-Mount *In situ* Hybridization (WMISH)

WMISH was performed as previously described (Somorjai et al., 2008). Briefly, fixed embryos were washed in PBT (0.1% Tween) and permeabilized using proteinase K (7.5 mg/ml) for empirically tested periods depending on embryo stage and enzyme batch. Embryos were postfixed for 40 min in PFA, deacetylated in acetic acid in triethanolamine (0.1 M, pH 8), and prehybridized for at least 2 h in hybridization solution. Embryos were incubated overnight with shaking at 60–65°C, depending on the probe. The first post-hybridization washes were performed at the hybridization temperature, with subsequent washes at room temperature in decreasing concentrations of SSC. An RNase step (37°C) was included. Embryos were incubated overnight with rocking at 4°C in primary antibody (anti-DIG AP, Roche; pre-adsorbed, 1:3000). Copious washing in PBT was performed between each step. For the chromogenic reaction we used either BM Purple (Roche) or NBT/BCIP (Roche); embryos were postfixed in PFA for 20 min when the signal-to-background ratio was deemed appropriate. At least three independent WMISH experiments were performed for each gene, on 10–50 embryos per stage in total. Embryos were mounted in 80% glycerol/20% PBS and photographed under a Leitz DMRB microscope (Leica Microsystems) with Nomarski optics. Photographs were taken with a Retiga 2000R camera and the QCapture software suite (QImaging), and processed in Adobe Photoshop CS3.

## Immunohistochemistry

Immunohistochemistry and staining of F-actin with Alexa Fluor 568-labeled phalloidin (Invitrogen, 1:400) were carried out as per Somorjai et al. (2012a). Briefly, after fixation, embryos were washed in PBT (phosphate-buffered saline plus 0.1% Tween, pH 7.6) and permeabilized in PBS with 0.2% Triton X for 40 min. After copious washing in PBT, embryos were incubated overnight at 4°C in primary antibody. Embryos were again washed in PBT and incubated in secondary antibody or phalloidin for 2 h at room temperature or overnight at 4°C. A specific B. floridae anti-Vasa antibody, generously donated by Dr Jr-Kai Yu, was used at 1:20,000 (Wu et al., 2011). Secondary antibodies were Alexa Fluor 488 and 568, diluted at 1:400 (Molecular Probes). Embryos were mounted in Vectashield (Vector Labs) containing Hoechst 33342 dye to stain nuclei (1:2000 of 10 mg/ml). Confocal images were taken on a Leica TCS SP8 confocal microscope and processed using NIH ImageJ 1.48d and Adobe Photoshop CS3.
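As a worked check of the dilutions quoted above, assuming each ratio is read as a simple dilution factor D (final concentration = stock concentration / D), the nuclear stain works out as follows; the 10 ml working volume in the second line is a hypothetical example, not a value from this study:

$$c_{\text{Hoechst}} = \frac{c_{\text{stock}}}{D} = \frac{10\ \text{mg/ml}}{2000} = 5\ \mu\text{g/ml}$$

$$V_{\text{anti-Vasa}} = \frac{V_{\text{total}}}{20{,}000}, \qquad \text{e.g. } \frac{10\ \text{ml}}{20{,}000} = 0.5\ \mu\text{l}$$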

# RESULTS

## Identification of Candidate Germline and Somatic Stem Cell Markers in *B. lanceolatum*

We selected members of the DEAD-box (Vasa, Pl10), Pumilio domain (Pumilio, PufA, PufB), PIWI domain (Piwil1, Piwil2), RRM (Musashi), CELF (Bruno2 and Bruno4/6), and Nanos families as candidate germline and somatic stem cell markers for analysis in B. lanceolatum. When B. floridae orthologs had not been characterized in the extensive phylogenetic analyses of Kerner et al. (2011), we identified putative stem cell markers using BLASTp searches against the genomes of B. floridae and B. belcheri, and confirmed the identity of our B. lanceolatum proteins by comparison with published sequences in other cephalochordates (Supplementary File 4), including B. lanceolatum transcriptomic data in the NCBI TSA database (Oulion et al., 2012). Where possible, our phylogenies include sequences from more than one amphioxus species to support the identity of these proteins (Supplementary File 5; and see below). We then cloned partial sequences of the B. lanceolatum orthologs using primers designed against the sequences of its sister species B. floridae (Supplementary File 2). Using this approach, we successfully cloned 12 genes (including two piwil1 paralogs; not shown) with known functions in the germline or somatic stem cells (**Table 1**). While previous phylogenies show that the distinction among Piwi clades is unequivocal (Kerner et al., 2011), the evolutionary history of piwi genes in cephalochordates is more complex. We identified a single piwil2 (piwiA in Kerner et al., 2011) and three piwil1 (piwiB in Kerner et al., 2011) genes in the genomes of B. belcheri and B. floridae. The piwil1 genes belong to an apparent tandem duplication cluster (not shown; Yue et al., 2015) that appears to be present in all Branchiostoma, as we successfully cloned two of the three piwil1 paralogs in B. lanceolatum. We also identified an ortholog of piwiX ("piwi-like" in Zhang et al., 2013), but have been unable to clone the gene in B. lanceolatum. While EST data in NCBI and B. floridae EST databases (Yu et al., 2008) support the expression of piwil1 and piwil2 (Supplementary File 3), we have not identified any expression data for piwiX in any database, including our own tail regenerate transcriptome dataset (Dailey and Somorjai, unpublished).
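The reciprocal BLAST check used to confirm orthology can be sketched as a reciprocal-best-hit filter. This is a minimal, hypothetical Python illustration rather than the pipeline used here: it assumes BLASTp results were saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns, with the bit score last), ranks hits by bit score, and uses placeholder sequence identifiers.

```python
from collections.abc import Iterable


def parse_outfmt6(lines: Iterable[str]) -> list[tuple[str, str, float]]:
    """Extract (query, subject, bitscore) from BLAST+ tabular (-outfmt 6) lines.

    Assumes the default 12-column layout, in which the bit score is column 12.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        hits.append((fields[0], fields[1], float(fields[11])))
    return hits


def best_hits(hits: Iterable[tuple[str, str, float]]) -> dict[str, str]:
    """Keep only the top-scoring subject for each query."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is, in turn, b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    # Hypothetical hits: query species -> reference genome, and back again.
    forward = [
        ("Bla_geneA", "Bfl_geneA", 900.0),
        ("Bla_geneA", "Bfl_geneB", 400.0),
        ("Bla_geneB", "Bfl_geneB", 850.0),
        ("Bla_geneC", "Bfl_geneA", 300.0),
    ]
    reverse = [
        ("Bfl_geneA", "Bla_geneA", 890.0),
        ("Bfl_geneB", "Bla_geneB", 840.0),
    ]
    print(reciprocal_best_hits(forward, reverse))
    # [('Bla_geneA', 'Bfl_geneA'), ('Bla_geneB', 'Bfl_geneB')]
```

In this toy example, Bla_geneC is rejected even though its best hit is Bfl_geneA, because Bfl_geneA's own best hit is Bla_geneA; this asymmetry is what a reciprocal search is meant to catch, for instance between closely related paralogs such as vasa and pl10.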

We also cloned partial pl10, vasa, nanos, bruno2 (brunoB or CELF2 in Kerner et al., 2011), bruno4/6 (brunoA or CELF4/5/6 in Kerner et al., 2011), pufA, pufB, and pumilio sequences. The phylogenetic analyses broadly confirm previous studies (Kerner et al., 2011), though we could only confirm the existence of single A-type and B-type Bruno sequences. In most cases we could identify B. belcheri orthologs for the B. floridae proteins, in addition to several B. lanceolatum sequences (Supplementary Files 4, 5). EST data in B. floridae also supported the expression of these putative germline and somatic stem cell markers (Supplementary File 3).

As we were specifically interested in the stem cell-related Musashi, and relationships among Musashi-related protein families are complex (Gasparini et al., 2011), we generated phylogenies using the available full-length B. floridae and B. belcheri sequences, and included putative Saccoglossus kowalevskii orthologs. We clearly identified sequences belonging to the TARDBP43 and hnrpA3/hnrpD clades (**Figure 2**). The close relationship between Musashi-like and DAZAP proteins is also strongly supported by this analysis, although the branching order is unclear, particularly within DAZAP sequences and in basal metazoans. Notably, we were unable to find an amphioxus sequence with convincing affinity to DAZAP/hnrp27 genes in either species. We did, however, identify a Musashi-like sequence in both B. belcheri and B. floridae (**Figure 2**, Supplementary File 4). In spite of the relatively low support for the Musashi clade, most likely due to the inclusion of non-bilaterian metazoan sequences and the divergent insect "Musashi" proteins, the amphioxus sequence groups with vertebrate and hemichordate sequences with strong support (85% bootstrap), together with the recently identified "real" Drosophila Musashi-related protein Rbp6 (Siddall et al., 2012). Insect "Musashi" and Rbp6 may therefore represent clade-specific duplications from a musashi-like ancestor. We thus propose that the Rbp6/Msi sequences be referred to as Musashi-like (blue boxed region in **Figure 2**), and all others outside the clade as DAZAP. Based on this nomenclature and the firm position of amphioxus musashi among deuterostome sequences, we are confident that we identified a musashi gene orthologous to vertebrate musashi1 and musashi2.

FIGURE 2 | Phylogenetic analysis of the RRM domain-containing protein family in animals, including Musashi-like, DAZAP, hnrpD, hnrpA, and TARDBP43 clades. Maximum likelihood analysis was performed in MEGA5 with 1000 bootstrap replicates, indicated as a percentage at each node. The model used was rtREV + G with five rate categories on 173 sites. Branches are colored according to the level of node support; amphioxus species names are highlighted in blue and red, with the clade representing musashi genes boxed in blue. Protein names are taken directly from the literature where available. See text for details. Species name abbreviations are as follows: Aqu, Amphimedon queenslandica; Bfl, Branchiostoma floridae; Bbe, Branchiostoma belcheri; Bsc, Botryllus schlosseri; Cgi, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Efl, Ephydatia fluviatilis; Hsa, Homo sapiens; Nve, Nematostella vectensis; Pdu, Platynereis dumerilii; Sko, Saccoglossus kowalevskii; Tad, Trichoplax adhaerens; Tca, Tribolium castaneum; Tki, Thelohanellus kitauei; Xtr, Xenopus tropicalis.

TABLE 1 | Genes cloned in this study and accession numbers for all identified sequences in *Branchiostoma* species.

Protein sequences predicted in genome or transcriptome assemblies are indicated by italicized accession numbers. Percentage identity of each B. lanceolatum clone is given relative to the most complete available B. floridae protein (Bla/Bfl). Abbreviations: Bla, B. lanceolatum; Bfl, B. floridae; \*, additional sequences; N.D., not determined. The sequence listed as "Unpublished data<sup>1</sup>" is provided in Supplementary File 1 as "Bbe\_Bruno2\_076200F\_001000\_in," and "Unpublished data<sup>2</sup>" as "Bbe\_PL10\_173980F\_003600\_in."

## Candidate Marker Expression in Putative PGCs

Recently, expression patterns for putative germline markers have been described in three other species of amphioxus: B. floridae, B. belcheri, and B. japonicum (Wu et al., 2011; Zhang et al., 2013). We therefore performed WMISH for piwil1, piwil2, vasa, and nanos orthologs in early developmental stages of B. lanceolatum. With the exception of piwil2, all of these genes show the characteristic expression in single "points" from the two-cell stage to the gastrula stage (**Figures 3A–D**). In some cases, morulae or gastrulae contained up to three points (not shown). By the early neurula stages, the punctate distribution may be masked by zygotic tailbud expression (discussed below).

Another DEAD-box gene, pl10, has been implicated in germ cell specification, and in some cases regeneration, from sponges to annelids (Alié et al., 2011; Rebscher et al., 2012; Leininger et al., 2014; Kozin and Kostyuchenko, 2015). PL10 is phylogenetically closely related to the Vasa protein (Kerner et al., 2011), but pl10 expression has not so far been described in any cephalochordate. We therefore cloned a clear pl10 ortholog in B. lanceolatum (Supplementary Files 4, 5) and determined its expression using WMISH. Like vasa, pl10 is expressed in a punctate pattern from fertilization until gastrula stages, consistent with a role in PGC specification or maintenance (**Figure 3E**).

We also determined the early expression of members of three other classes of RNA-binding proteins that, based on reports in other species, might be expected to have a stem cell association: the Pumilio domain-containing genes pumilio, pufA, and pufB; the CELF/Bruno genes bruno2 and bruno4/6; and musashi (Gazave et al., 2013 and references therein). Up to gastrulation, pumilio, pufA, and pufB show no clear localization in the presumptive germline (Supplementary Files 6A–C). Interestingly, pufA ESTs are found in blastula-stage embryos, and in several independent experiments we observed convincing instances in which pufA appeared to be expressed in some cleavage-stage embryos in a punctate distribution reminiscent of our other PGC-associated patterns (Supplementary File 6A). Like the Pumilio domain-containing genes, bruno4/6 showed no convincing expression until gastrulation (**Figure 3G**); it was also absent from B. floridae EST databases. In contrast, bruno2 showed clear and strong localization to nuage or PGCs (**Figure 3F**). No other marker analyzed had specific expression in the presumptive PGCs (Supplementary File 6), including musashi, which had diffuse ubiquitous expression at early stages (Supplementary File 6D; see Supplementary File 7 for sense control).

## Vasa Protein Distribution is Consistent with PGCs and Somatic Progenitor Cell Domains in *B. lanceolatum*

Along with transcript expression, localization of Vasa protein is a hallmark of primordial germ cells (PGCs) in multiple species. To confirm the identity of PGCs in B. lanceolatum, we took advantage of a recently generated antibody against B. floridae Vasa (Wu et al., 2011) to perform immunohistochemistry. Given its clear cross-reactivity in several amphioxus species (Zhang et al., 2013), we reasoned that α-BfVasa should also label PGCs in the European amphioxus, confirming our expression data. The protein distribution resembles that of vasa transcripts (**Figure 3D**), with a pattern reminiscent of germ plasm in fertilized eggs and cleavage stages (**Figures 4A–D**). In the late gastrula/early neurula, the protein is perinuclear in small clusters of cells within the ventral endoderm (**Figures 4E,F**); although the number of labeled cells (or at least their detection) was variable, careful analysis of series of confocal z-sections clearly identified as many as eight cells (**Figure 4F**). Such clusters could be identified even in some mid-neurula stage embryos, either on one side in the ventral mesoderm (**Figure 4G** and inset) or, in most cases, posteriorly, congruent with the zygotic tailbud domain (**Figure 4H** and inset). Vasa expression was, however, most conspicuous in the posterior neural tube throughout neurulation (**Figures 4I,J**). Only in premouth stage and later larvae was it again possible to identify posterior clusters of Vasa-expressing cells as distinct from posterior neural and tailbud expression (**Figures 4K,L**). Vasa also appeared to demarcate the posteriormost somites (not shown), similarly to vasa transcripts (**Figure 5C**, see below).

… oriented downwards. G3-bla, blastopore view; G3-lat, lateral view. Scale bars = 50 microns.

## Candidate Stem Cell Marker Expression in Developing Somatic Tissues

We performed WMISH for selected genes from gastrulation onwards, reasoning that their expression patterns might point to roles in later developmental processes (**Figure 5** and Supplementary File 8). We thus identified two classes: "tailbud-enriched" and, broadly speaking, "anterior endoderm-associated." We found that piwil1, nanos, and vasa have strong tailbud expression throughout development (**Figures 5A–C**). piwil1 and nanos also show clear posterior neural tube expression in N4 neurulae and L1 stage premouth larvae, as well as expression outlining the posterior somites (black arrowheads, **Figure 5A**; Supplementary File 8A). Though weaker, piwil2 and pl10 both show tailbud expression at later stages, and pl10 is clearly expressed in the neural tube (Supplementary File 9).

In contrast, Pumilio domain-containing genes appear enriched in anterior endoderm (**Figures 5D,E** and Supplementary Files 8D,E). During gastrulation, pumilio shows weak expression around the blastopore. In early and mid-neurula stages, stronger expression is evident in the neural plate and anterior ventral endoderm, as well as in anterior mesoderm. As neurulation proceeds, pumilio appears mostly restricted to the anterior endoderm, with much weaker expression in the last third of the embryo (**Figure 5E**). Expression remains strongest in the future pharyngeal domain until the premouth larval stage. Weaker expression is evident in the rest of the endoderm, with some conspicuous staining in the mesoderm and endoderm of the tailbud region. Expression of pufA is broadly mesendodermal until N3 neurula stages, when it becomes stronger in an anterior domain that resolves into the club-shaped gland in premouth L1 larvae, as well as in most of the posterior endoderm (**Figure 5D**). pufB expression was very difficult to evaluate, as long staining exposures were required for post-gastrulation stages, but quite closely matched that of pufA (not shown). In addition to its posterior expression, pl10 shows diffuse but clear staining in anterior endoderm in N4 and L1 stages (Supplementary Files 9G–I), and clearly resolves to a domain encompassing the presumptive first gill slit in 2–3-day-old larvae (not shown).

FIGURE 5 | … (A) piwi-like1; (B) nanos; (C) vasa; (D) pufA; (E) pumilio; (F) musashi. Embryonic stages are indicated along the top of panel (A), from late gastrula/early neurula G7 to the premouth larval stage L1. Purple arrows indicate PGC-like expression that remains detectable for piwi-like1, nanos, and vasa into the early neurula stage N1. Black arrowheads in (A) and (C) indicate expression demarcating somite boundaries. All views are lateral, and all panels are oriented with anterior to the left and dorsal up. Scale bars = 50 microns.

We cloned the amphioxus musashi ortholog with the expectation that it would have neural expression. During early stages of development, musashi is ubiquitously expressed (Supplementary File 6D), paralleling B. floridae EST data (Supplementary File 3). However, in the gastrula stage, musashi resolves to a chordomesodermal domain of expression (Supplementary Files 6D, 7), which broadens in the early neurula N1 (**Figure 5F**). musashi is strongly expressed in the anteriormost endoderm and mesoderm from mid-neurula onwards, with weak expression in the neural floorplate and strong expression throughout the chordal plate. By the late neurula stage (30 h, N4) patches of expression can be seen in the neural tube as well as weakly in the cerebral vesicle. Expression is high and stable throughout the forming notochord as well as in the anterior endoderm. Strong notochordal and weak neural expression domains persist in the premouth L1 larva, with strongest expression in the anterior and posteriormost domains of the notochord. The presumptive pharynx also expresses musashi.

# DISCUSSION

## Germline-Associated Gene Expression Conservation in Cephalochordates

Recent work in B. floridae, B. japonicum, and B. belcheri has suggested that germline specification occurs by the asymmetric segregation of cytoplasmic determinants during cleavage, with expression of key conserved germline markers such as vasa and nanos, as well as piwil1 and tudor-related7, in the germ plasm and PGCs (Wu et al., 2011; Zhang et al., 2013). Here we set out to characterize the expression of germline-associated markers in the European amphioxus, B. lanceolatum, for which no data were previously available. As in other species, our results argue against an inductive mechanism for PGC specification: we demonstrate that B. lanceolatum expresses nanos, piwil1, and vasa in the putative PGCs, as well as Vasa protein, suggesting the presence of a conserved core of germline-associated transcripts in cephalochordates. Stasis in developmental gene expression over millions of years of evolution is considered typical of Branchiostoma (Somorjai et al., 2008), paralleling the genus' relative genomic and morphological conservatism. The apparent conservation of germline-associated gene expression among amphioxus species contrasts starkly with hypotheses derived in vertebrates, which suggest that the evolution of germ plasm is coupled to increased speciation in that lineage (Johnson et al., 2011; Evans et al., 2014). Data from Asymmetron, the earliest diverging and most slowly evolving of the three extant amphioxus lineages (Kon et al., 2007; Yue et al., 2014), will be invaluable in evaluating the degree of conservation of germline specification mechanisms in cephalochordates.

Our research also identifies pl10, a DEAD-box gene related to vasa, and bruno2 as putative PGC markers in amphioxus. Accumulating evidence suggests that pl10 often plays a role in the germline in metazoans. In addition to Drosophila, pl10 orthologs are expressed in the germinal cells or their derivatives in the annelid Platynereis dumerilii (Rebscher et al., 2007; Gazave et al., 2013), the platyhelminth Dugesia japonica (Shibata et al., 1999; reported as vasa-related genes), several hydrozoan cnidarian species (Leclère et al., 2012; Siebert et al., 2015), the ctenophore Mnemiopsis leidyi (Alié et al., 2011), the sponge Sycon ciliatum (Leininger et al., 2014), and the colonial urochordate B. schlosseri (Rosner et al., 2009). In the latter, vasa is coexpressed with pl10, as in our results in B. lanceolatum. In contrast, data are sparse for the second gene identified, bruno2. Homologs of bruno are expressed in PGCs and/or germline derivatives in ctenophores (Alié et al., 2011), but not in Platynereis (Gazave et al., 2013). Interestingly, RNAi showed that the Bruno-like gene bruli is required for maintenance of a subset of neoblasts in the asexual planarian Schmidtea mediterranea (Guo et al., 2006), but this gene is not homologous to canonical bruno genes. Confirmation of pl10 and bruno2 expression in other species, and characterization of the Tudor-related tdrd7 in B. lanceolatum, will further expand this repertoire.

We also identified several markers with weak ubiquitous expression during early development. For instance, musashi, piwil2, and genes of the Pumilio domain family do not appear to be specifically associated with PGCs in B. lanceolatum or B. floridae (this study; Yue et al., 2015). A possible exception is pufA, which we found concentrated in a PGC-like domain in some cleavage-stage embryos (two-cell to morula) in several independent experiments. Given the variability of the expression observed, we hesitate to classify this as bona fide expression in PGCs. However, pufA is expressed in germ cells in other species, including zebrafish (Kuo et al., 2009) and P. dumerilii (Gazave et al., 2013). Interestingly, a global search for germline- and reproduction-associated genes using the transcriptome of Asymmetron lucayanum and the genome of B. floridae identified a pumilio/puf gene with expression in oocytes (Yue et al., 2015). Similar studies of maturing gonads in B. lanceolatum may also reveal functions for some of our candidates during germline maturation.

## Evolution of Musashi-Related RRM-Containing Proteins and Novel Expression of Amphioxus *musashi*

The Musashi-related proteins belong to a larger superfamily of RRM-containing proteins, including the Musashi, DAZAP, hnrp, and TARDBP clades. Although the evolutionary history of RRM domain-containing proteins is complex, orthologs of musashi-related genes have been identified from sponges to humans (Gasparini et al., 2011; Okamoto et al., 2012), including lancelets (Gasparini et al., 2011; this study). One of the principal findings of this study is that cephalochordates appear to have lost the DAZAP ortholog, as we were unable to identify the gene in the genome of either B. floridae or B. belcheri. Considerable confusion exists in the literature regarding nomenclature, owing to the difficulty of distinguishing between musashi- and DAZAP-related genes. This is particularly evident in basal metazoans, where the phylogenetic signal is weak (Okamoto et al., 2012; this study). Gasparini et al. (2011) first suggested that previously identified musashi-like genes in Halocynthia roretzi and Ciona intestinalis (Kawashima et al., 2000) are in fact DAZAP. This gene is expressed in the brain and nerve cord, as might be expected of musashi-like genes (Kawashima et al., 2000). However, the bona fide DAZAP1 in Botryllus schlosseri is expressed during both asexual (blastogenesis) and sexual (embryonic) development in many proliferating cell types, including the new growing vessels of the colonial circulatory system and the embryonic nerve cord, and is not restricted to neural stem cells as in other systems. Likewise, in the planarian D. japonica, the DAZAP/musashi-like gene Djdmlg is expressed in differentiated tissues as well as in X-ray-sensitive neoblasts (Higuchi et al., 2008). In this context, it would be particularly interesting to determine whether cephalochordate musashi has taken on any DAZAP functions, or whether a different functional homolog is involved.

Our observation that neural cells within the developing CNS of amphioxus express musashi is broadly consistent with data in other bilaterians. For instance, in the flatworm Dugesia japonica, three musashi-like genes have been identified with expression in the brain primordia (Higuchi et al., 2008). Similarly, in zebrafish, musashi1 is expressed in neural tissues during early development, and morpholino knockdown results in aberrant CNS formation (Shibata et al., 2012). Surprisingly, however, and in contrast to many other taxa, amphioxus does not express musashi in a pattern consistent with a role in PGC specification or maintenance. In Drosophila, musashi is required to maintain stem cell identity in germline stem cells (GSCs; Siddall et al., 2006), and Rbp6, which is more closely related to vertebrate musashi1/2 (Siddall et al., 2012; this study), may also function in the germline (Siddall et al., 2012). In mice, the msi1 and msi2 orthologs appear to have subfunctionalized, such that msi1 is required to maintain stem cell identity during early spermatogenesis, whereas msi2 plays a role in differentiation (Siddall et al., 2006). The generation of specific antibodies will be critical for understanding the distribution of Musashi protein during amphioxus development and stem cell regulation.

Given its known neural and germline functions, the finding that musashi is predominantly expressed in the developing notochord in amphioxus was unexpected. We are not aware of any data demonstrating a specific function for musashi in the notochord of any chordate. However, the ancestral function of these RRM-containing proteins may simply lie in the switch between undifferentiated/stem cell and differentiated cell states and in the regulation of proliferation (Potten et al., 2003; MacNicol et al., 2011; Hochgreb-Hägele et al., 2014). Consistent with this, the anterior endoderm encompassing the zones that will form the mouth and gill slits in amphioxus larvae, which shows conspicuous musashi expression, is a zone of extensive proliferation and remodeling (Holland and Holland, 2006). The expression in the developing notochord described here, which is unique to amphioxus, might also reflect a role in the differentiation of this structure. Functional studies will help elucidate the role of Musashi in this and other structures.

## Posterior Stem Cells and Implications for Amphioxus Regeneration

The zygotic expression of several markers in the tailbud, including nanos, vasa, piwil1, and piwil2 among others, combined with circumstantial evidence that PGCs may migrate posteriorly at the neurula stage (this study; Wu et al., 2011), suggests that the tailbud may be a source of progenitors or stem cells in larval amphioxus. Posterior elongation in amphioxus involves budding of somites directly from the tailbud, a source of Wnt ligands (Holland et al., 2000; Schubert et al., 2000, 2001; Somorjai et al., 2008). Although architecturally different, the tailbuds of vertebrates such as mouse and chick are also sources of multipotent stem cells for embryonic elongation whose fate is dependent on Wnt signaling (Wilson et al., 2009; Garriock et al., 2015). The posterior growth zone may also act as a niche for progenitor cells even into adulthood, particularly in animals that add segments throughout their lives, such as many arthropods and most annelids (Bely and Wray, 2001; de Rosa et al., 2005; Seaver et al., 2005). The observation that Vasa-positive PGCs lie within a posterior growth zone expressing stem cell markers in amphioxus larvae (Wu et al., 2011; this study), representing a "mosaic" of PGCs and somatic stem cells, is, however, not unique to amphioxus. Gazave et al. (2013) have proposed the existence of an RNA-binding protein signature for a new type of animal stem cell, termed "posterior stem cells," in P. dumerilii. Lineage analysis and EdU labeling have also revealed that the four presumptive PGCs, which appear during gastrulation, are derived from a mesodermal posterior growth zone (MPGZ; Rebscher et al., 2007, 2012). While the mechanisms employed by these annelids and cephalochordates to specify the germline are somewhat different, the use of such techniques in amphioxus will be instrumental in elucidating the origin and fate of different cell types during posterior elongation.

The existence of a posterior stem cell population in the tailbud, or of any other resident stem cell population that could be activated following tail amputation, has clear implications for regeneration in amphioxus. Although it has recently been demonstrated that the European amphioxus has considerable regenerative ability, most notably of the tail (Somorjai et al., 2012a,b), we still know next to nothing about the molecular signature or function of the somatic stem cell/progenitor pools involved in the process. This study represents a first step toward identifying a putative posterior stem cell pool in B. lanceolatum. We predict that somatic stem cell markers normally expressed during tailbud development, such as vasa, nanos, piwil1, or piwil2, will also be expressed during adult tail regeneration. We are currently analysing blastema transcriptomes and proteomes to test this hypothesis (Dailey and Somorjai, unpublished). We might also expect genes traditionally associated with the germline to be expressed during tail regeneration, if common expression of "stemness" markers in PGCs and somatic stem cells reflects broader roles in developmental regulation, as has recently been demonstrated for Vasa in the sea urchin (Yajima and Wessel, 2015). Although functional experiments are lacking, comparative expression data in annelids are beginning to provide compelling evidence for this. In P. dumerilii, a number of RNA-binding protein genes are expressed in PGCs as well as in putative posterior mesodermal and ectodermal stem cells during caudal regeneration, including vasa, pl10, piwi, pufA, pufB, nanos, and several tudor-related genes (Rebscher et al., 2007; Gazave et al., 2013).
Of these, several markers are also differentially expressed in both the germline and the terminal growth zone during normal development and regeneration in the polychaetes Alitta virens and Capitella sp. I (Dill and Seaver, 2008; Giani et al., 2011; Kozin and Kostyuchenko, 2015). However, the most striking example of a germline-independent redeployment of classic PGC markers in somatic tissues has been shown in the freshwater annelid Pristina leidyi, which reproduces exclusively asexually in the laboratory via paratomic fission. As might be expected, nanos, piwi1, and vasa are expressed in the posterior growth zone and in the developing (but unused) gonads. Notably, transcripts are also detected in the anterior blastema following amputation, as well as in the fission zone (Bely and Sikes, 2010; Özpolat and Bely, 2015), highlighting a more general role in tissues undergoing proliferation and remodeling. This phenomenon is not restricted to invertebrates or basal metazoans: piwil1 and piwil2 are expressed in a complex spatiotemporal sequence during axolotl limb regeneration, and knockdown of either gene retards outgrowth of the regenerate (Zhu et al., 2012). Future work in amphioxus will assess the tissue-specific expression patterns of some of the candidates identified here during adult tail regeneration. The development of knockdown tools and lineage analysis will be indispensable for elucidating their functional role during the regeneration process. Moreover, these methodologies will permit comparative analyses of the cellular and molecular processes needed to understand the evolution of regeneration mechanisms in deuterostomes. More broadly, these types of studies should add to the growing body of literature aimed at understanding the link between soma and germline evolution.

# AUTHOR CONTRIBUTIONS

SD, RF, and AR performed experiments. JG discussed experiments and contributed reagents. IS conceived the study, performed experiments, contributed reagents, analyzed the data and wrote the manuscript.

# ACKNOWLEDGMENTS

This work was carried out sporadically over the course of several years in various countries. We would like to thank Jr-Kai Sky Yu (Academia Sinica, Taiwan) for contributing the Vasa antibody, and Irene Garcia for help in the laboratory (Barcelona). SD is funded through a MASTS PhD studentship (St Andrews, UK). IS gratefully acknowledges previous funding from Marie Curie IEF postdoctoral fellowship, FP7 People Programme (Barcelona, Spain); as well as MASTS (Marine Alliance for Science and Technology Scotland) laboratory start-up funds (St Andrews, UK). Embryo collection was made possible in part through the ASSEMBLE access programme (grant agreement no. 227799). We thank the Laboratoire Océanologique de Banyuls-sur-Mer, and most especially Dr. Hector Escrivà and Dr. Stéphanie Bertrand for hosting us. The University of St Andrews Library fund for open access supported the article publishing fee.

## REFERENCES


# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fevo.2015.00156


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Dailey, Febrero Planas, Rossell Espier, Garcia-Fernàndez and Somorjai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pax2/5/8 and Pax6 alternative splicing events in basal chordates and vertebrates: a focus on paired box domain

#### Peter Fabian, Iryna Kozmikova, Zbynek Kozmik and Chrysoula N. Pantzartzi\*

*Department of Transcriptional Regulation, Institute of Molecular Genetics, Prague, Czech Republic*

#### Edited by:

*Hector Escriva, Centre National de la Recherche Scientifique, France*

#### Reviewed by:

*Manuel Irimia, Centre for Genomic Regulation, Spain*

*Simona Candiani, University of Genoa, Italy*

#### \*Correspondence:

*Chrysoula N. Pantzartzi, Department of Transcriptional Regulation, Institute of Molecular Genetics, Videnska 1083, Prague 14220, Czech Republic chrysoula.pantzartzi@img.cas.cz*

#### Specialty section:

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> Received: *28 May 2015* Accepted: *15 June 2015* Published: *02 July 2015*

#### Citation:

*Fabian P, Kozmikova I, Kozmik Z and Pantzartzi CN (2015) Pax2/5/8 and Pax6 alternative splicing events in basal chordates and vertebrates: a focus on paired box domain. Front. Genet. 6:228. doi: 10.3389/fgene.2015.00228*

Paired box transcription factors play an important role in development and tissue morphogenesis. The number of *Pax* homologs varies among the species studied so far, owing to genome and gene duplications that have affected the PAX family to a great extent. Based on sequence similarity and functional domains, four Pax classes have been identified in chordates, namely Pax1/9, Pax2/5/8, Pax3/7, and Pax4/6. Numerous splicing events have been reported, mainly for *Pax2/5/8* and *Pax6* genes. Of particular interest are those events that lead to Pax proteins with presumed novel properties, such as altered DNA-binding or transcriptional activity. In the current study, a thorough analysis of *Pax2/5/8* splicing events from cephalochordates and vertebrates was performed. We focused on *Pax2/5/8* and *Pax6* splicing events that involve the paired domain. Three new splicing events were identified in *Oryzias latipes*, one of which seems to be conserved in Acanthomorphata. Using representatives from deuterostome and protostome phyla, a comparative analysis of the *Pax6* exon-intron structure of the paired domain was performed in an attempt to estimate the time of appearance of the *Pax6(5a)* mRNA isoform. Our analysis shows that this splicing event is characteristic of Gnathostomata and is absent from the other chordate subphyla. Moreover, the expression patterns of alternatively spliced variants were compared between cephalochordates and fish species. In summary, our data indicate an expansion of alternative mRNA variants in the paired box region of *Pax2/5/8* and *Pax6* genes during the course of vertebrate evolution.

#### Keywords: Pax258, Pax6, alternative splicing, paired domain, splice variants

# Introduction

Transcription factors encoded by genes of the paired box (PAX) family are highly conserved throughout metazoan phyla and play a vital role in embryonic development. The association of different PAX subfamilies with organ and tissue morphogenesis, such as the thymus, central nervous system (CNS), enteric nervous system, kidneys, ear, thyroid, neural crest, vertebrae, and midbrain-hindbrain boundary (MHB) formation, has been the object of various studies (summarized in Noll, 1993; Chi and Epstein, 2002; Paixao-Cortes et al., 2013; Blake and Ziman, 2014). Certain members of the PAX family are characterized as master control genes for eye morphogenesis (Gehring, 1996, 2002, 2012; Kozmik, 2008; Klimova and Kozmik, 2014), and genetic defects in them are linked to the onset of eye-related diseases, e.g., small eye in mouse or aniridia in human. Mutations in Pax genes are also associated with diseases of the kidney and the CNS, as well as with various types of cancer (see Chi and Epstein, 2002; Lang et al., 2007; Paixao-Cortes et al., 2013; Blake and Ziman, 2014 and references therein).

Gene and genome duplications, followed by gene losses, helped shape the PAX gene family, leading to a varying number of Pax homologs in the metazoan phyla studied so far (reviewed in Noll, 1993; Breitling and Gerber, 2000; Hoshiyama et al., 2007; Paixao-Cortes et al., 2013). Different subfamilies have been identified and are classified according to sequence similarity, the functional domains they possess, and their expression patterns (see Stuart et al., 1994; Blake and Ziman, 2014 and references therein). PaxB is apparently the oldest member and is present in sponges and cnidarians (Kozmik et al., 2003; Hoshiyama et al., 2007; Hill et al., 2010). Pox neuro and single genes from the Pax1/9, Pax2/5/8, Pax4/6, and Pax3/7 classes are present in cephalochordates (Short and Holland, 2008; Takatori et al., 2008). Pox neuro is present in Drosophila (Bopp et al., 1989) but has been lost in the tunicate and vertebrate lineages. Two Pax258 genes are present in urochordates, owing to a duplication prior to the diversification of ascidians and larvaceans (Wada et al., 2003; Canestro et al., 2005). As a result of two rounds of whole-genome duplication (Escriva et al., 2002; Putnam et al., 2008) and subsequent gene losses, the coelacanth Latimeria chalumnae possesses nine Pax genes, i.e., discrete gene copies from each of the Pax1/9, Pax2/5/8, Pax3/7, and Pax4/6 classes, suggesting that all members of the PAX family were present in the ancestor that gave rise to the tetrapod lineage (Paixao-Cortes et al., 2013). More than nine genes are present in teleost fishes (reviewed in Ravi et al., 2013), owing to the so-called third round of genome duplication (Jaillon et al., 2004; Van De Peer, 2004).

All Pax proteins contain a DNA-binding domain at their N-terminus, known as the paired domain (PD), as well as transactivation and inhibitory domains at their C-terminus. The PD consists of 128 amino acids and is made up of two helix-turn-helix (HTH) subdomains, known as PAI and RED (or N-terminal and C-terminal, respectively), joined by a linker (Czerny et al., 1993; Xu et al., 1999). The Pax4/6 class contains an additional DNA-binding homeodomain, while the Pax2/5/8 class possesses a partial homeodomain and an octapeptide motif, the latter known to interact with members of the Groucho family of co-repressors (Eberhard et al., 2000; Kreslova et al., 2002). Classes Pax1/9 and Pax3/7 both contain the octapeptide motif, yet the former lacks the homeodomain (Chi and Epstein, 2002). Pax loci encoding proteins with a truncated paired domain have been identified in Drosophila and C. elegans, as well as in representatives of Hemichordata and Echinodermata (Chisholm and Horvitz, 1995; Cinar and Chisholm, 2004; Howard-Ashby et al., 2006; Friedrich and Caravas, 2011; Ravi et al., 2013).

Gene/genome duplication is a driving force of evolution (Bergthorsson et al., 2007; Maere and Van De Peer, 2010), and this is nicely exemplified by the well-studied PAX gene family, in which many duplicates were retained in the genome and either acquired new functions and new domains of expression (neofunctionalization) or partitioned the original gene functions between them (subfunctionalization) (Pfeffer et al., 1998; Bassham et al., 2008; Kleinjan et al., 2008, reviewed in Holland and Short, 2010).

At the post-transcriptional level, alternative splicing is also known to promote evolution, protein diversity, and the development of novel functions in eukaryotic genomes (reviewed in Nilsen and Graveley, 2010; Chen et al., 2012; Kelemen et al., 2013). In fact, it has been suggested that, in the case of Pax genes, the impact of alternative splicing on functional motifs is greater than that of gene duplication and subsequent divergence of the duplicated genes (Short and Holland, 2008). Alternative splicing has been shown to take place usually in a tissue- or developmental stage-specific manner (Wang et al., 2008a; Kelemen et al., 2013). Depending on which exonic segments are spliced out and whether intronic regions are retained in the transcripts, splicing events can be grouped into four major types, namely (1) exon skipping, (2) alternative 3′ splice sites, (3) alternative 5′ splice sites, and (4) intron inclusion (Koralewski and Krutovsky, 2011). These four types of events can occur independently or in combination with other events, such as mutually exclusive exons, alternative initiation, and alternative polyadenylation (Wang et al., 2008a; Koralewski and Krutovsky, 2011; Kelemen et al., 2013).
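The four basic event types can be distinguished by comparing the exon coordinates of two isoforms of the same gene. The Python sketch below is illustrative only and is not the procedure used in this study (where events were inferred from ESTs, mRNA sequences, and NNSPLICE predictions). The function `classify_event` and the `(start, end)` exon representation are hypothetical, and the sketch assumes plus-strand genomic coordinates, exons sorted by position, and a single event separating the two isoforms. An included cassette exon, such as exon 5a, is the same event as exon skipping viewed from the other isoform.

```python
from typing import List, Tuple

# An exon as (start, end): 1-based, inclusive genomic coordinates on the plus strand.
Exon = Tuple[int, int]


def classify_event(ref: List[Exon], alt: List[Exon]) -> str:
    """Name the single splicing difference between two isoforms of one gene.

    Toy sketch: assumes plus-strand coordinates, exons sorted by position,
    internal (not first/last) exon boundaries, and exactly one event
    separating `ref` from `alt`.
    """
    ref_set, alt_set = set(ref), set(alt)

    # Intron retention: one alt exon spans two consecutive ref exons plus their intron.
    for left, right in zip(ref, ref[1:]):
        if (left[0], right[1]) in alt_set:
            return "intron retention"

    # Exon skipping / cassette inclusion: one whole exon present in only one isoform.
    if alt_set < ref_set and len(ref_set - alt_set) == 1:
        return "exon skipping"
    if ref_set < alt_set and len(alt_set - ref_set) == 1:
        return "exon inclusion (cassette exon)"

    # Same exon count, exactly one exon with a shifted boundary.
    if len(ref) == len(alt):
        changed = [(r, a) for r, a in zip(ref, alt) if r != a]
        if len(changed) == 1:
            (r_start, r_end), (a_start, a_end) = changed[0]
            if r_start == a_start:
                # The exon's 3' end moved: a different donor (5' splice site) was used.
                return "alternative 5' splice site"
            if r_end == a_end:
                # The exon's 5' end moved: a different acceptor (3' splice site) was used.
                return "alternative 3' splice site"
    return "complex or unclassified"


ref = [(100, 200), (300, 400), (500, 600)]
print(classify_event(ref, [(100, 212), (300, 400), (500, 600)]))
```

In this example the first exon ends 12 bp further downstream in the alternative isoform, so the function reports an alternative 5′ splice site; real annotations would also need strand handling and support for combined events.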

Alternative splicing of Pax genes has been observed in various species, from protostomes (Fu and Noll, 1997; Cinar and Chisholm, 2004) to cephalochordates (Glardon et al., 1998; Kozmik et al., 1999; Short and Holland, 2008; Holland and Short, 2010; Short et al., 2012) and vertebrates (Kozmik et al., 1993, 1997; Poleev et al., 1995; Heller and Brandli, 1997, 1999; Lun and Brand, 1998; Short et al., 2012), with events from all major types of splicing present. In general, splice forms seem to have diverged between lineages, and some are species- or genus-specific (Heller and Brandli, 1999; Short et al., 2012); nevertheless, several splice isoforms seem to be evolutionarily conserved (Kwak et al., 2006; Short and Holland, 2008; Short et al., 2012; Ravi et al., 2013). The majority of reported splice events concern the transactivation and inhibitory domains in the C-terminal part of the Pax proteins (Kozmik et al., 1993; Ward et al., 1994; Nornes et al., 1996; Tavassoli et al., 1997; Kreslova et al., 2002; Robichaud et al., 2004), but an increasing number of events affecting the paired domain, and consequently DNA-binding capacity, have been described (Kozmik et al., 1993, 1997; Zwollo et al., 1997; Short and Holland, 2008; Short et al., 2012). One such example is the Pax6(5a) isoform, in which inclusion of exon 5a (Walther and Gruss, 1991; Glaser et al., 1992; Puschel et al., 1992) leads to a protein with an interrupted paired domain that recognizes an altered DNA-binding sequence (Epstein et al., 1994). This event appears to be well conserved among vertebrate lineages, with differences in the size and peptide sequence of exon 5a between fish and tetrapods (Ravi et al., 2013).
In some cases, alternatively spliced Pax isoforms exhibit temporally and spatially differentiated expression patterns (Kozmik et al., 1993, 1997; Heller and Brandli, 1997; Short and Holland, 2008) and have been associated with cancer and genetic disorders (reviewed in Wang et al., 2008b; Holland and Short, 2010).

Previous studies have shown that any insertion in the conserved paired domain, whether a single-amino-acid extension or a whole exon cassette, modifies DNA-binding capacity and confers distinct functions on the isoforms bearing the insertion (Kozmik et al., 1997; Azuma et al., 2005).

In the present study, we sought to identify splicing events in the Pax2/5/8 and Pax6 classes that affect the paired domain and to study the expression patterns of these alternatively spliced transcripts. We identified three new splicing events in Oryzias latipes Pax2 genes, one of which seems to be highly conserved in Acanthomorphata. We detected a recurring splicing event in O. latipes and Danio rerio Pax6 genes generating the exon 5a insertion. Using our data set, we tried to determine when the exon 5a insertion appeared and the extent of its conservation in various phylogenetic groups.

# Materials and Methods

#### Data Collection and de novo Gene Annotation

Nucleotide and amino acid sequences for annotated Pax2/5/8 and Pax6 genes were obtained through keyword searches of NCBI GenBank (Benson et al., 2013), Ensembl release 78 (Cunningham et al., 2015), the UCSC Genome Browser database (Karolchik et al., 2014), SpBase (Sea Urchin Genome Database, Cameron et al., 2009), and the JGI (Grigoriev et al., 2012). The retrieved Pax genes were cross-checked using GENSCAN (Burge and Karlin, 1997), BLASTx, and version 0.9 of the NNSPLICE splice predictor (Reese et al., 1997).

For various taxonomic groups (e.g., Chondrichthyes: Holocephali), genomes are available, but no Pax genes are annotated in public databases. To include representatives from these groups, we conducted BLAST searches against NCBI GenBank and the WGS subdivision (Trace Archive), using known homologs from deuterostome species. Where required, small contigs or scaffolds were merged using Merger from the EMBOSS software suite (Rice et al., 2000), and gene structure was defined using GENSCAN (Burge and Karlin, 1997), BLASTx, and the splice predictor (Reese et al., 1997). ScanProsite (De Castro et al., 2006) was used to detect conserved functional domains in newly identified genes. In addition, genes adjacent to the de novo predicted Pax genes were also predicted/annotated, and gene order was compared with known Pax syntenic regions through the Genomicus website v78.01 (Louis et al., 2013). In all cases, NNSPLICE was used to predict possible alternative acceptor and donor sites.

PipMaker (Schwartz et al., 2000) was used along with BLAST to locate putative sequence conservation among species. Pax6 paired domains from various species were aligned using the MUSCLE algorithm (Edgar, 2004) included in MEGA version 5.0 (Tamura et al., 2011).

#### Expressed Sequence Tags (ESTs) Retrieval and Analysis

To validate already annotated or newly predicted Pax homologs, BLAST searches were performed against the EST subdivision. Collected ESTs were aligned with predicted coding sequences from genome analyses, and with mRNA sequences if available, to detect putative splicing events not recognized so far.

#### Animal Collection

Specimens of Branchiostoma floridae were collected in Old Tampa Bay, Florida, USA. Gametes were obtained and embryos were raised as previously described (Holland and Yu, 2004). B. lanceolatum adults were collected in Banyuls-sur-Mer, France, prior to the summer breeding season and kept in the laboratory until spawning. Spawning of males and females was induced by a temperature shift (Fuentes et al., 2007). B. lanceolatum and B. floridae embryos were raised at 16°C and 26°C, respectively. Embryos of inbred strains of Oryzias latipes (Cab) and Danio rerio (AB) were used for all experiments. O. latipes and D. rerio embryonic stages were determined according to Iwamatsu (2004) and Kimmel et al. (1995), respectively. Housing of animals and in vivo experiments were performed after approval by the Animal Care Committee of the Institute of Molecular Genetics (study ID#36/2007) and in compliance with national and institutional guidelines (ID#12135/2010-17210).

#### RNA Isolation and Reverse Transcription

Total RNA was isolated from embryos using the Trizol reagent (Ambion). Random-primed cDNA was prepared in a 20-µl reaction from 500 ng of total RNA using the SuperScript VILO cDNA Synthesis kit (Invitrogen).

#### Screen for Alternative Splicing and RT-PCR Analysis

cDNA was subjected to 30 cycles of PCR using DreamTaq polymerase (Thermo Scientific) under the following conditions: 1 min at 98°C, 30 s at 60°C, and 30 s at 72°C. Primers for this analysis are provided in **Table 1**. PCR products were analyzed on a 2.5% agarose gel, and bands of interest were eluted, cloned into pCR-Blunt II (Invitrogen), and sequenced (GATC Biotech sequencing service, Germany).

# Results

#### Pax2/5/8 Splicing Events in Chordates

An exhaustive search of databases and the literature, combined with de novo analysis of available ESTs and mRNA sequences (Table S1), revealed numerous splicing events in chordate members of the Pax2/5/8 class (**Figure 1**). Some of these events seem to characterize specific orthologs and are present in cephalochordates, fish, and mammals (e.g., exon 2 of the Pax5 gene), while others are much less conserved (e.g., exon 3a of mouse Pax5). Branchiostoma floridae appears to have the largest number of splicing events; however, no insertion event in the paired domain has been reported so far, and none could be predicted using splice prediction software or available ESTs/mRNA sequences.

A single Oryzias latipes Pax2 gene, namely OlPax2.2, located on chromosome 19 (NC\_019877), has been used in previous studies (Paixao-Cortes et al., 2013). Our search through NCBI revealed an annotated Pax-2a-like gene on chromosome 15 (NW\_004088010.1). It must be noted that the amino acid sequence encoded by the first half of this gene exhibits no similarity to the paired domain of other Pax genes and is not supported by ESTs, which indicates an erroneous gene prediction caused by a non-sequenced area in this genomic region. We assume that the first coding exon, as well as the two exons coding for the paired domain of OlPax2.1, are located in this non-sequenced region. The record of an unplaced scaffold (NW\_004093539, Table S1) was retrieved through BLAST. It apparently corresponds partly to the non-sequenced region of chromosome 15 and contains the 5′ UTR, the first exon of the OlPax2.1 gene, and part of the first intron.

TABLE 1 | Summary of primers used in RT-PCR reactions in the present study.

*For Branchiostoma species, the same set of primers was used for each gene class amplified. The Oryzias latipes Pax2.1 reverse primer marked with an asterisk was used together with the Pax2.1 forward primer to amplify alternative exon 2a.*

To retrieve more information on OlPax2.1 and to determine how many OlPax2 (either Pax2.1 or Pax2.2) transcripts exist, we searched for ESTs and mRNA sequences using OlPax2.2 and Danio rerio Pax2.1 as queries. There are only two ESTs (AM320053 and AM321390) for OlPax2.1 that contain the first coding exon, as well as the complete exon 2 and part of exon 3, encoding the N- and C-terminal paired subdomains, respectively. Using the genomic scaffolds and available ESTs collectively (Table S1), an almost complete OlPax2.1 gene was reconstructed (**Figure 1**). For the OlPax2.2 gene, one cDNA sequence and four ESTs were retrieved (Table S1); their comparison revealed both 5′ and 3′ alternatively spliced parts of exon 2, which encodes the N-terminal part of the paired domain (see **Figure 1**).

Exon-to-exon comparison of Pax2/5/8 genes between O. latipes and D. rerio shows that, in principle, the sequence, number, size, and borders of exons are conserved, with some indication of alternatively spliced exons in OlPax2 genes (light gray boxes in **Figure 1**) that are not supported by the available ESTs. Retrieved ESTs encoding both OlPax2.1 and OlPax2.2 support an alternative 5′ splice donor site in exon 2 (**Figure 1**, Table S1); the insertion is 12 bp long and results in four additional amino acids, exactly at the beginning of the paired domain. The same insertion has also been reported for D. rerio Pax2.1 (Lun and Brand, 1998).

In the present study, we identified two splicing events that lead to the insertion of extra amino acids in the paired domain of OlPax2 genes. More specifically, an alternatively spliced 21-bp exon was detected between exons 2 and 3 of fish Pax2.1 genes; this exon is annotated in some species (e.g., Poecilia reticulata and Maylandia zebra). It could not be detected in silico in the O. latipes genome because the intron between exons 2 and 3 is not sequenced. Through BLAST searches and de novo analysis of Pax2.1 genes, we found this exon in numerous representatives from different orders of Acanthomorphata (Table S2), and a PipMaker alignment reveals a high degree of sequence conservation among the compared species (**Figure 2**). A putative exon with proper splice sites was identified in the corresponding genomic region of three Cyprinidae species (D. rerio, Pimephales promelas, and Cyprinus carpio). Even though this exon is highly conserved among these species, the encoded amino acids are quite dissimilar from those of the Acanthomorphata 21-bp exon (**Figure 2**). In both cases, inclusion of this exon leads to an alternative transcript that incorporates seven extra amino acids toward the end of the α3 helix of the PAI subdomain (**Figure 2**).

Regarding OlPax2.2, the mRNA sequence available in GenBank and our analysis revealed a 24-bp in-frame extension at the 3′ end of exon 2 (**Figures 1**, **3**, Table S1), which does not alter the downstream translation (**Figure 3**). This isoform, which we refer to as OlPax2.2(ext24+), results from the use of an alternative splice donor downstream of the canonical one (**Figure 3**). In D. rerio, a similar isoform is supported neither by splicing prediction software nor by available mRNA sequences.

It should be noted that splicing prediction analysis of the OlPax2.1 gene revealed an alternative donor site in exon 2, upstream of the canonical one. The resulting deletion of 35 bp at the 3′ end of exon 2 causes a frameshift and consequently leads to a premature stop codon at the beginning of exon 3 (Figure S1). A similar donor site was not identified in silico in D. rerio (data not shown).
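Whether each of these length changes preserves the reading frame follows from simple codon arithmetic: a change that is a multiple of 3 nt adds or removes whole codons, whereas any other length shifts every downstream codon. The Python sketch below applies this rule to the four OlPax2 events described above; the helper `frame_effect` is hypothetical, and it considers length only, so it cannot detect, for example, a stop codon carried within an in-frame insertion.

```python
def frame_effect(length_change_bp: int) -> str:
    """Classify a change in coding-sequence length by its effect on the reading frame.

    Positive values are insertions/extensions, negative values deletions.
    Only the length is considered, not the inserted or deleted sequence.
    """
    if length_change_bp == 0:
        return "no change"
    kind = "insertion" if length_change_bp > 0 else "deletion"
    codons, leftover = divmod(abs(length_change_bp), 3)
    if leftover == 0:
        # Whole codons added or removed: the downstream frame is preserved.
        return f"in-frame {kind} of {codons} aa"
    # Otherwise every downstream codon is read in a shifted frame.
    return f"frameshift {kind} ({abs(length_change_bp)} bp is not a multiple of 3)"


# Length changes described in the text for the medaka Pax2 genes.
events = {
    "OlPax2.1/OlPax2.2 exon 2 alternative donor": 12,
    "OlPax2.1(2a+) cassette exon": 21,
    "OlPax2.2(ext24+) extension": 24,
    "OlPax2.1(del35) upstream donor": -35,
}
for name, delta in events.items():
    print(f"{name}: {frame_effect(delta)}")
```

The 12-, 21-, and 24-bp changes come out as in-frame insertions of 4, 7, and 8 amino acids, matching the text, while the 35-bp deletion is flagged as a frameshift, consistent with the premature stop codon reported for OlPax2.1(del35).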

#### Comparative Analysis of 5a-exon Insertion in Pax6 Paired Domain

In the teleosts and tetrapods studied so far, the major part of the Pax6 paired domain is encoded by two exons, responsible for the N- and C-subdomains, of 131 and 216 bp, respectively. In Tetrapoda and in the Pax6.1 copy of teleosts, a small exon of varying size (36–42 bp), namely 5a, has been shown to be included in alternative transcripts, causing an in-frame insertion in the paired domain (Ravi et al., 2013). We wanted to identify the point in evolutionary history at which this exon appeared and to investigate a putative correlation between the appearance of this exon and the exon/intron organization of Pax6 homologs. For this reason, already annotated Pax6 homologs were collected from public databases, and available genomes and EST sequences from jawless vertebrates, cephalochordates, tunicates, hemichordates, and echinoderms, as well as Drosophila and C. elegans, were analyzed (Table S3).

FIGURE 1 | … of *Pax2/5/8* representatives from vertebrates and cephalochordates. Dark gray and white boxes represent constitutive and alternatively spliced exons, respectively. Yellow boxes denote alternatively spliced parts of exons, due to different 5′ or 3′ splice donors/acceptors. The suffix "a" is used for non-canonical exons characteristic of the different *Pax2/5/8* genes (different outline colors are used for different orthologs). For *O. latipes*, light gray boxes were predicted based on high similarity to the respective exons in *D. rerio*, whereas the black box shows the non-sequenced part of exon 3. Lines represent introns (not drawn to scale). Thick blue lines represent intron retention events. … stand for alternative stop codons. Blue, brown, and purple boxes define the borders of the paired domain, the octapeptide, and the partial homeodomain, respectively. Alternative splicing events detected in the present study are indicated by red and green boxes (alternative 5′ splice donors) and a green striped box (exon cassette). Previously published data are included (Dressler et al., 1990; Krauss et al., 1991; Kozmik et al., 1993, 1997, 1999; Ward et al., 1994; Poleev et al., 1995; Zwollo et al., 1997; Lun and Brand, 1998; Pfeffer et al., 1998; Borson et al., 2002; Robichaud et al., 2004; Kwak et al., 2006; Short and Holland, 2008; Arseneau et al., 2009; Busse et al., 2009).

Analysis of a genomic scaffold from Leucoraja erinacea containing the Pax6 ortholog provides evidence that, besides Holocephali (Ravi et al., 2013), exon 5a is also present in Elasmobranchii, the second subclass of Chondrichthyes. Unfortunately, no hagfish genomes are publicly available, but the two Pax6 mRNA sequences retrieved from Eptatretus burgeri (Table S3) do not provide any indication of an exon 5a-like insertion in the paired domain.

Regarding Hyperoartia, genomic scaffolds containing parts of the Pax6 genes from Petromyzon marinus and Lethenteron japonicum were retrieved and analyzed, along with two Pax6 mRNAs from L. japonicum and Lampetra fluviatilis. There appears to be more than one Pax6 gene in the L. japonicum genome, yet the low genome coverage for both P. marinus and L. japonicum (5× and 20×, respectively) does not allow firmer conclusions. In all cases, there is no evidence for the existence of exon 5a in Hyperoartia.

Existing models for Strongylocentrotus purpuratus predict two truncated neighboring Pax6 proteins that contain either the paired domain or the homeodomain. Taking into account the information provided in SpBase, which notes that the gene models are incomplete and that an intervening sequence appears to be missing in a scaffold gap in the Spur3.1 assembly (Howard-Ashby et al., 2006; Cameron et al., 2009), we re-evaluated the prediction and attempted to reconstruct the SpPax6 homolog.

Our focus was on the exon-intron structure in the region of the paired domain (**Figure 4**). Our analysis shows that the size and borders of the two exons encoding the main part of the paired domain underwent various changes in different taxonomic groups (**Figure 4**). More specifically, in basal deuterostomes, such as Echinodermata and Hemichordata, a single large exon of 347 bp encodes the first 115 amino acids of the paired domain. Therefore, there is no intervening non-coding region at the position corresponding to the vertebrate intron; in other words, there is no "space" for the insertion of an alternatively spliced exon in the paired domain. In cephalochordates and tunicates, this large exon has split into two exons: the first is 166 bp long, still larger than the N-subdomain-encoding exon of vertebrates, and the second is 181 bp long, slightly smaller than the respective exon of vertebrates. The 166-bp exon encodes a peptide including the first four amino acids of the α3 helix of the N-subdomain (Xu et al., 1995) and ends shortly after the position where exon 5a is inserted, i.e., downstream of the α3 helix. A thorough in silico search of the intronic sequence flanked by the paired-encoding exons in tunicates and cephalochordates did not reveal a putative alternatively spliced exon cassette similar to exon 5a (**Figure 4**, Figure S2). It seems that the paired-encoding exons acquired fixed borders and sizes of 131 and 216 bp for PAI and RED, respectively, before the diversification of cyclostomes and preserved them throughout vertebrates, whereas the "birth" of exon 5a probably occurred in Gnathostomata (**Figure 4**, Figure S2). It should be noted that the major re-arrangements concern the genomic region encoding the PAI α3 helix, in contrast to the high conservation observed toward the C-subdomain.

FIGURE 2 | Alternatively spliced isoform OlPax2.1(2a+). Pip diagram of the paired-encoding exons (2 and 3, blue boxes) and the alternatively spliced exon (21 bp, red box) in Acanthomorphata species. The thick blue line represents the alternatively spliced exon predicted in three Cyprinidae species. Dots stand for conserved residues; lowercase letters correspond to intronic nucleotides. Blue arrowheads point to the PAI α3 helix.
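The exon sizes given above can be cross-checked with simple arithmetic. The minimal Python sketch below (the dictionary labels and the function name `summarize` are ours, for illustration only) shows that the single echinoderm/hemichordate exon, the two cephalochordate/tunicate exons, and the vertebrate PAI and RED exons (exon 5a aside) all add up to the same 347-bp stretch, i.e., 115 complete codons plus 2 nt. Reading those 2 nt as part of a codon interrupted by the downstream intron is our interpretation and is not stated in the text.

```python
# Exon sizes (bp) stated in the text for the stretch encoding most of the paired domain.
EXON_LAYOUTS = {
    "Echinodermata/Hemichordata (single exon)": [347],
    "Cephalochordata/Tunicata (split exon)": [166, 181],
    "Vertebrata (PAI and RED exons)": [131, 216],
}


def summarize(layouts):
    """Return (total bp, complete codons, leftover nt) for each exon layout."""
    summary = {}
    for lineage, exons in layouts.items():
        total = sum(exons)
        summary[lineage] = (total, total // 3, total % 3)
    return summary


for lineage, (total, codons, rest) in summarize(EXON_LAYOUTS).items():
    print(f"{lineage}: {total} bp = {codons} complete codons + {rest} nt")
```

The identical totals are consistent with the view that the lineages differ in where introns interrupt this coding stretch rather than in its length.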

#### Developmental Expression of PAI-RED Isoforms

To verify the in silico-predicted alternative splice isoforms (**Figures 2**–**4**, Figure S1) and to compare their expression across developmental stages, we performed RT-PCR on RNA from different embryonic stages of Branchiostoma lanceolatum, B. floridae, Oryzias latipes, and Danio rerio (**Figure 5**).

In agreement with our in silico analysis, RT-PCR (**Figure 5**) and DNA sequencing (data not shown) using primers located in the exons encoding the PAI and RED domains revealed that B. lanceolatum and B. floridae each express a single Pax258 isoform and a single Pax46 isoform.

Three splice isoforms of the OlPax2 genes were predicted in silico. The lack of information on intron 2 of OlPax2.1 prevented the in silico detection of the 21-bp exon characteristic of Acanthomorphata. However, the presence of the 21-bp exon in the O. latipes genome was experimentally verified using an appropriate set of primers (**Table 1**), one of which was designed specifically on this 21-bp exon (**Figure 2**). Expression of the alternatively spliced OlPax2.1(2a+) isoform was detectable from the neurula stage onward (**Figure 5**). In contrast, the truncated isoform OlPax2.1(del35), which results from a deletion in exon 2 and a premature stop codon in the third exon (Figure S1), was present at all developmental stages (**Figure 5**). OlPax2.1(del35) was expressed at a much lower level than the dominant OlPax2.1(2a-) isoform.

Sequencing of OlPax2.1(del35) revealed that it uses the alternative donor site in exon 2, as predicted by in silico analysis (Figure S1). The extended isoform OlPax2.2(ext24+) (**Figure 3**) was detectable throughout the examined stages (**Figure 5**). For the D. rerio Pax2.1 and Pax2.2 genes, no alternative splice variants were detected, in agreement with the splicing prediction analysis.

The alternatively spliced isoform OlPax6.1(5a-) was expressed at approximately the same level as isoform OlPax6.1(5a+) (**Figure 5**). In agreement with previous studies (Ravi et al., 2013), the O. latipes Pax6.3 gene does not possess an equivalent of exon 5a. Variants bearing exon 5a were observed for both the D. rerio Pax6.1a and Pax6.1b genes, and in both cases the expression level of isoform 5a- was higher than that of isoform 5a+.

Collectively, the in silico and experimental data indicate an increased complexity of splicing events affecting the paired domain of vertebrate Pax genes compared with cephalochordates.

#### Discussion

Paired box (Pax) genes encode transcription factors that are considered key players in organogenesis and embryonic development. The presence of Pax genes in a variety of organisms and the evolution of the PAX family have been the object of various studies (Hill et al., 2010; Paixao-Cortes et al., 2013; Ravi et al., 2013). Whole-genome duplications as well as lineage-specific gene duplications provide additional possibilities for diversified evolution and/or speciation (Bergthorsson et al., 2007; Maere and Van De Peer, 2010). These processes are considered to have played an important role in shaping the number of Pax homologs in various taxonomic groups (see Paixao-Cortes et al., 2013; Ravi et al., 2013 and references therein), and the same applies to alternative splicing, a post-transcriptional mechanism that has also promoted the evolution and complexity of Pax proteins (Glardon et al., 1998; Short et al., 2012).

In the present study, we aimed to evaluate the extent of alternative splicing in various lineages and to identify splicing events that are either evolutionarily conserved or characteristic of cephalochordates but not vertebrates, or vice versa. For this purpose, we collected annotated homologs of the Pax2/5/8 and Pax4/6 classes from public databases (NCBI, Ensembl, UCSC, JGI, and SpBase). Furthermore, we analyzed de novo genomes, ESTs, and mRNA sequences from different species in order to enrich our dataset with taxonomic groups not covered by previous studies. Our second focus was on cephalochordate and vertebrate Pax isoforms that differ in the paired domain, and on their expression patterns across developmental stages.

Apart from partially reconstructing Oryzias latipes Pax2.1 using available scaffolds and EST sequences, we identified three new splicing events in the Pax2 genes of O. latipes. The OlPax2.1(2a+) isoform is reminiscent of the 5a isoform found in Pax6 homologs (Ravi et al., 2013; this study), as it incorporates a 21-bp in-frame exon located in the intron between the two exons encoding the paired domain. Our analysis showed that this exon is present in numerous species from various orders of Acanthomorphata and is highly conserved among the compared species (**Figure 2**). We presume that the sequence conservation of this splice form over a wide phylogenetic distance also implies conservation of the isoform's function. An exon similar in location, yet quite divergent in sequence, was predicted in silico only in three Cyprinidae species (D. rerio, Pimephales promelas, and Cyprinus carpio).

The second alternatively spliced isoform, OlPax2.2(ext24+), results from the use of an alternative splice donor downstream of the canonical one at the end of OlPax2.2 exon 2. In this case, extra amino acids are incorporated in the middle of the α3 helix of the PAI subdomain, with no effect on the downstream sequence. A similar isoform could not be detected in D. rerio, either experimentally or in silico. The sequences surrounding the normal exon 2-intron 2 splice junction are highly conserved between the D. rerio and O. latipes Pax2.2 genes, yet there is no proper donor-acceptor site (AG-AG) in the D. rerio region corresponding to the alternative splice site of O. latipes.

Both OlPax2.1(2a+) and OlPax2.2(ext24+) transcripts bear an insertion in the recognition α3 helix of the PAI subdomain. Previous studies on insertions in the paired domain of Pax genes have shown that, regardless of the number of inserted amino acids, disruption of this helix, which is responsible for all major-groove DNA contacts of the N-terminal subdomain (Xu et al., 1995, 1999), is expected to inactivate the DNA-binding function of the N-terminal HTH motif, which in turn leads to a severe restriction of the DNA-binding sequence specificity of the paired domain (Kozmik et al., 1997).

The importance of alternative splicing as a mechanism for divergent evolution is well established. In the case of Pax genes, insertions in the paired domain may preferentially guide Pax proteins, namely Pax6(5a) and Pax8(S), to the control regions of genes containing a modified binding site (5aCON-like sequence; Kozmik et al., 1997). In other words, such insertions add new target genes to the repertoire controlled by Pax genes, which may be one mechanism through which alternative splicing increases complexity at the level of protein function.

The isoform OlPax2.1(del35) uses an alternative 5′ splice donor upstream of the normal splice site in exon 2 (N-terminal part of the paired domain). As mentioned above, the exact junction sequence between exon 2 and intron 2 is not known; nevertheless, the sequence at the normal end of exon 2 (CAG) agrees with the optimal consensus for 5′ splice sites (Stephens and Schneider, 1992), in contrast to the sequence at the alternative upstream 5′ splice donor (CGG/GT, Figure S1). As previously observed for the Pax8 gene (Kozmik et al., 1997), the constitutively spliced Pax2 mRNA is more abundant than the alternative transcript (**Figure 5**), which could be attributed to the different affinities with which the spliceosome recognizes the two 5′ donor sites. A similar truncated isoform could be detected neither in the Pax2.1 transcript analysis of D. rerio nor by in silico analysis.

The alternative isoform OlPax2.1(del35) lacks the greater part of the α3 helix of the PAI domain and ends at a premature stop codon exactly at the beginning of exon 3. Truncated isoforms are not rare: approximately 35% of alternatively spliced human transcripts have been found to contain a premature termination codon, rendering them candidates for nonsense-mediated decay (Green et al., 2003; Lewis et al., 2003). It has been proposed that most low-copy-number alternative isoforms produced in human cells are likely to be non-functional; we therefore assume that this is also the case for OlPax2.1(del35). A deletion of the α3 helix has been observed in one of the Pax6 isoforms of B. floridae (Glardon et al., 1998), yet this deletion does not affect downstream translation, so that isoform presumably remains functional. A 32-bp deletion is responsible for the splotch phenotype in mouse (Epstein et al., 1991). In addition, there are accumulating reports of heterozygous deletions of parts of the PAI subdomain in general, or of the α3 helix in particular, most of which cause a frameshift and a premature stop codon (Schimmenti et al., 1997; Fletcher et al., 2005) and are associated with human diseases (e.g., renal-coloboma syndrome, oligomeganephronia).
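Whether an insertion or deletion in the paired-box coding sequence preserves the downstream reading frame depends only on its length modulo 3. This is why the 21-bp exon of OlPax2.1(2a+) adds residues without altering the downstream sequence, whereas frameshifting deletions typically run into a premature stop codon. The minimal Python sketch below illustrates the rule; the function name `frame_effect` is hypothetical, and treating "del35" as a 35-nt deletion is our assumption from the isoform name, not a length reported above.

```python
from typing import Optional, Tuple


def frame_effect(length_nt: int) -> Tuple[bool, Optional[int]]:
    """Classify an insertion or deletion of length_nt nucleotides in a coding sequence.

    Returns (True, codons) when length_nt is a multiple of 3: the downstream
    reading frame is preserved and `codons` codons are added or removed.
    Otherwise returns (False, None), because the downstream frame is shifted.
    """
    if length_nt <= 0:
        raise ValueError("length_nt must be a positive number of nucleotides")
    if length_nt % 3 == 0:
        return True, length_nt // 3
    return False, None


# Lengths from the text; 35 nt for OlPax2.1(del35) is an assumption based on the isoform name.
EXAMPLES = {
    "21-bp exon of OlPax2.1(2a+)": 21,
    "32-bp deletion (mouse splotch)": 32,
    "OlPax2.1(del35), assumed 35-nt deletion": 35,
}

for label, length in EXAMPLES.items():
    in_frame, codons = frame_effect(length)
    effect = f"in-frame, {codons} codons" if in_frame else "frameshift"
    print(f"{label}: {effect}")
```

A frameshift call alone does not locate the premature stop codon; that requires translating the actual transcript sequence.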

With regard to the Pax6 class and the alternative splice isoform Pax6(5a), our analysis showed that an important re-arrangement of coding and non-coding sequences in the region of the paired domain took place during evolution. Although conservation of intron positions has been noted between highly divergent eukaryotes, the number and placement of most introns fluctuate dynamically during evolution (Hartung et al., 2002; Rogozin et al., 2003). In Hemichordata and Echinodermata, the exon-intron organization does not allow for any insertion in the paired domain. In the other lineages compared, the paired domain is encoded by exons interrupted by one or more introns. Intron gain and loss as well as intron sliding have been reported for various genes (Hartung et al., 2002), whereas intron density, i.e., the average number of introns per gene, does not necessarily correlate with the position of the genome on the evolutionary tree (Jeffares et al., 2006). We assume that the 5a insertion is characteristic of Gnathostomata. Introns are required for alternative splicing, which increases the size of the proteome and thus the level of complexity in higher eukaryotes. Moreover, introns have been found to harbor many conserved non-coding elements necessary for gene regulation (Irvine et al., 2008; Bhatia et al., 2014).

Single isoforms were detected in the expression study of the Branchiostoma Pax258 and Pax46 genes, in agreement with the in silico analysis, which predicted no splicing events involving the paired domain. In contrast, new alternatively spliced variants were identified in fish species. Previous studies have shown that paired-domain alternative splice forms of Pax6 and Pax8 are not developmentally regulated, as opposed to splicing events affecting the C-terminal sequences of the Pax8 protein (Kozmik et al., 1993, 1997). Because the non-constitutive OlPax2 isoforms are expressed at low levels, it is difficult at this stage to draw conclusions about their regulation. Nonetheless, there is an indication of temporal regulation of the OlPax2.1(2a+) isoform, which requires further investigation.

#### References

### Acknowledgments

This study was supported by grant LH12047 from the Ministry of Education, Youth and Sports of the Czech Republic and the project "BIOCEV—Biotechnology and Biomedicine Centre of the Academy of Sciences and Charles University" (CZ.1.05/1.1.00/02.0109). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00228


domain are regulated by alternative splicing. Genes Dev. 8, 2022–2034. doi: 10.1101/gad.8.17.2022


their expression in embryonic development. Dev. Biol. 300, 74–89. doi: 10.1016/j.ydbio.2006.08.039


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Fabian, Kozmikova, Kozmik and Pantzartzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Edited by:* Naiara Rodriguez-Ezpeleta, AZTI-Tecnalia, Spain

*Reviewed by:*

Ingo Braasch, University of Oregon, USA

Brian Frank Eames, University of Saskatchewan, Canada

#### *\*Correspondence:*

Sylvain Marcellini, Department of Cell Biology, Faculty of Biological Sciences, Universidad de Concepción, Barrio Universitario s/n, Casilla 160-C, Concepción, Chile smarcellini@udec.cl; Mélanie Debiais-Thibaud, Institut des Sciences de l'Evolution de Montpellier, UMR5554, Université Montpellier, Centre National de la Recherche Scientifique, IRD, EPHE, Eugène Bataillon, cc064 - 34000 Montpellier, France mdebiais@univ-montp2.fr

#### *†Present Address:*

Willian T. A. F. Silva, Department of Ecology and Genetics/Evolutionary Biology, Evolutionary Biology Center, Uppsala University, Uppsala, Sweden

‡Co-first authors.

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 30 April 2015 *Accepted:* 24 August 2015 *Published:* 15 September 2015

#### *Citation:*

Enault S, Muñoz DN, Silva WTAF, Borday-Birraux V, Bonade M, Oulion S, Ventéo S, Marcellini S and Debiais-Thibaud M (2015) Molecular footprinting of skeletal tissues in the catshark Scyliorhinus canicula and the clawed frog Xenopus tropicalis identifies conserved and derived features of vertebrate calcification. Front. Genet. 6:283. doi: 10.3389/fgene.2015.00283

# Molecular footprinting of skeletal tissues in the catshark *Scyliorhinus canicula* and the clawed frog *Xenopus tropicalis* identifies conserved and derived features of vertebrate calcification

Sébastien Enault<sup>1</sup>‡, David N. Muñoz<sup>2</sup>‡, Willian T. A. F. Silva<sup>1</sup>†, Véronique Borday-Birraux<sup>3, 4</sup>, Morgane Bonade<sup>3</sup>, Silvan Oulion<sup>1</sup>, Stéphanie Ventéo<sup>5</sup>, Sylvain Marcellini<sup>2</sup>\* and Mélanie Debiais-Thibaud<sup>1</sup>\*

<sup>1</sup> Institut des Sciences de l'Evolution de Montpellier, UMR5554, Université Montpellier, Centre National de la Recherche Scientifique, IRD, EPHE, Montpellier, France, <sup>2</sup> Laboratory of Development and Evolution, Department of Cell Biology, Faculty of Biological Sciences, Universidad de Concepción, Concepción, Chile, <sup>3</sup> Laboratoire EGCE UMR Centre National de la Recherche Scientifique 9191, IRD247, Université Paris Sud, Gif-sur-Yvette, France, <sup>4</sup> Université Paris Diderot, Sorbonne Paris Cité, Paris, France, <sup>5</sup> Institute for Neurosciences of Montpellier, Institut National de la Santé et de la Recherche Médicale U1051, Montpellier, France

Understanding the evolutionary emergence and subsequent diversification of the vertebrate skeleton requires a comprehensive view of the diverse skeletal cell types found in distinct developmental contexts, tissues, and species. To date, our knowledge of the molecular nature of the shark calcified extracellular matrix, and of its relationships with osteichthyan skeletal tissues, remains scarce. Here, based on specific combinations of expression patterns of the Col1a1, Col1a2, and Col2a1 fibrillar collagen genes, we compare the molecular footprint of endoskeletal elements from the chondrichthyan Scyliorhinus canicula and the tetrapod Xenopus tropicalis. We find that, depending on the anatomical location, Scyliorhinus skeletal calcification is associated with cell types expressing different subsets of fibrillar collagen genes, such as high levels of Col1a1 and Col1a2 in the neural arches or high levels of Col2a1 in the tesserae, or with a drastic Col2a1 downregulation in the centrum. We detect low Col2a1 levels in Xenopus osteoblasts, thereby revealing that the osteoblastic expression of this gene was significantly reduced in the tetrapod lineage. Finally, we uncover a striking parallel, from a molecular and histological perspective, between the vertebral cartilage calcification of both species and discuss the evolutionary origin of endochondral ossification.

Keywords: fibrillar collagens, vertebrate skeletogenesis, bone, cartilage, *Scyliorhinus canicula*, *Xenopus tropicalis*

# Introduction

The evolutionary origin and diversification of the skeleton remain one of the most intriguing issues in vertebrate biology. Solving this problem requires a comprehensive view of the diversity of skeletal cell types found in distinct developmental contexts, tissues, anatomical locations, and species, as emphasized in a recent synthesis of existing skeletal terminologies (Dahdul et al., 2012). In mammals, chondrocytes produce the extracellular matrix of fibrous, elastic, and hyaline cartilage, while osteoblasts and osteocytes are involved in bone formation (Benjamin and Evans, 1990; Hartmann, 2009; Zhang et al., 2009; Long, 2011). Yet an intermediate type of chondroid bone, exhibiting characteristics of both bone and cartilage, has been described in rodents as well as teleosts, leading some authors to propose that bone and cartilage in fact represent two extreme forms of a skeletal tissue continuum (Huysseune and Verraes, 1986; Huysseune and Sire, 1990; Mizoguchi et al., 1997; Kranenbarg et al., 2005; Estêvão et al., 2011). In addition, chondrichthyans display a series of heavily calcified skeletal tissues such as the cartilaginous tesserae of the jaws (with no obvious homologs in osteichthyans; see Dean et al., 2005; Dean and Summers, 2006; Dean et al., 2009; Omelon et al., 2014), the vertebral body developing around the notochord (Peignoux-Deville et al., 1982; Dean and Summers, 2006; Eames et al., 2007; Porter et al., 2007; Fleming et al., 2015), and the perichondrium of the neural arches lying on each side of the neural tube (Peignoux-Deville et al., 1982; Eames et al., 2007).
In summary, while developmental and paleontological studies have revealed the versatile nature of skeletal tissues characterizing the vertebrate skeleton (Donoghue and Sansom, 2002; Janvier and Arsenault, 2002; Dahdul et al., 2012; Janvier, 2015), the molecular identity and the evolutionary relationships of the distinct vertebrate skeletal cell types remain an open question.

The comparison of expression patterns represents a powerful approach to examine cell type evolution and, for instance, has shed light on the origin of sensory neurons in animals (Arendt, 2008). Here, we have explored the possibility that combinations of expression patterns of genes coding for crucial components of the skeletal matrix can serve as useful molecular footprints to compare the identity of skeletal cell types between chondrichthyan and osteichthyan representatives. We chose to focus on the Col1a1, Col1a2, and Col2a1 genes, belonging to Clade A of the fibrillar collagen family, because they are known to contribute to biomineralization and because they are intimately associated with skeletal development and evolution (Wada et al., 2006; Rychel and Swalla, 2007; Zhang and Cohn, 2008; Landis and Silver, 2009; Eyre and Weis, 2013; Veis and Dorvee, 2013). Col1a1 and Col1a2 (Type I collagen) are robustly expressed in osteichthyan osteoblasts (Kobayashi and Kronenberg, 2005; Li et al., 2009; Albertson et al., 2010; Estêvão et al., 2011; Eames et al., 2012). By contrast, the osteoblastic expression of Col2a1 (Type II collagen) is more variable: it has been reported in developing bones of gar and teleosts (Benjamin and Ralphs, 1991; Albertson et al., 2010; Eames et al., 2012), at low levels in some scattered mouse osteoblasts (Hilton et al., 2007), and transiently in chick preosteoblasts (Abzhanov et al., 2007). In addition, Col2a1 displays a conserved expression pattern in chondrocytes of immature hyaline cartilage whose proliferation drives the growth of endochondral bones (Benjamin and Ralphs, 1991; Nah et al., 2001; Kerney and Hanken, 2008; Hartmann, 2009; Albertson et al., 2010; Estêvão et al., 2011; Eames et al., 2012). Col2a1 expression becomes progressively downregulated as the hyaline cartilage matures and calcifies its extracellular matrix (Eames et al., 2003; Hartmann, 2009).
Of particular relevance for this study, Col2a1-negative mature cartilage calcification usually occurs at levels that are too weak to robustly stain with Alizarin red, a reagent commonly used to specifically detect the calcifying bone matrix of vertebrate embryos (Kirsch et al., 1997; Khanarian et al., 2014), with some exceptions reported in the swell shark vertebrae and the domestic fowl trachea (Hogg, 1982; Eames et al., 2007). Possibly due to lineage-specific duplications, lamprey and hagfish (cyclostomes) exhibit one or two Col2a1 orthologs (and no Col1a1 or Col1a2 genes) expressed in broad regions encompassing mesenchymal cells and some, but not all, cartilaginous elements (Zhang and Cohn, 2006, 2008; Zhang et al., 2006; Ota and Kuratani, 2010; Cattell et al., 2011). In shark, immunohistochemistry allowed the clear detection of Type II collagen fibers in cartilage matrix, while the weaker reactivity of the anti-Type I collagen antibody suggested a perichondral expression, without allowing the discrimination of cells secreting Col1a1 and/or Col1a2 proteins (Eames et al., 2007).

In order to identify skeletal cell types sharing a specific molecular identity between remotely related jawed vertebrates, we compared the endoskeletal expression patterns of the Col1a1, Col1a2, and Col2a1 fibrillar collagen genes in the chondrichthyan Scyliorhinus canicula (S.c.) and the tetrapod Xenopus tropicalis (X.t.). We find that, depending on the anatomical location, skeletal calcification in S.c. occurs in the vicinity of cell types expressing distinct combinations of fibrillar collagen genes. In particular, calcification is associated with perichondral cells expressing high levels of Col1a1 and Col1a2 in the neural arches, and with chondrocytes expressing high levels of Col2a1 in the tesserae or experiencing a drastic Col2a1 downregulation in the centrum. In X.t., the moderate expression of Col2a1 in some osteoblasts differs from the situation described in actinopterygians and amniotes, suggesting that the osteoblastic expression of this gene was significantly reduced in the tetrapod lineage. Finally, we observe a striking parallel between the internal calcification of the vertebral cartilage of X.t. and S.c. and discuss the evolutionary origins of endochondral ossification.

# Materials and Methods

### *Scyliorhinus canicula* Biological Material

Scyliorhinus canicula embryos were obtained at the Station Méditerranéenne de l'Environnement Littoral (SMEL, Sète, France) and raised in the laboratory at 18°C. Embryos were euthanized by an overdose of MS-222 (Sigma) following all animal-care specifications of the European ethics legislation. Whole embryos were fixed for 48 h in 4% PFA in 1× PBS at 4°C and then transferred to ethanol at −20°C for storage. Dissected body parts (jaws or trunk sections) were rehydrated and transferred to a 25% sucrose bath for cryosectioning at 14 µm thickness, and the sections were stored at −20°C on alternate slides to obtain comparable sections on each slide. These sections were used for in situ hybridizations and Alizarin red/Alcian blue histological staining (see following sections). Dissected body parts were decalcified in MORSE solution (10% sodium citrate and 20% formic acid) for 5 days before being transferred to paraplast blocks and sectioned at 10 µm thickness. These sections were used for Hematoxylin-Eosin-Safran (HES) histological staining and anti-Col2 immunofluorescence. For immunofluorescence, dissected trunk slices from 6.7-cm-long embryos and the dissected jaw from a 9-cm-long embryo were demineralized for 3 h in MORSE solution at room temperature prior to dehydration, embedded in paraplast, and cut at 10–12 µm thickness.

#### Histological and Immunological Stainings

The same histological procedures were performed for the catshark and clawed frog samples. For histological Alizarin red/Alcian blue double staining, samples were rehydrated for 1 min in phosphate-buffered saline (PBS) 1X, incubated for 30 s in a 0.005% Alizarin red S solution (in 0.5% KOH), washed once with PBS 1X, incubated for 2 min in a 0.02% Alcian blue 8G solution (in 8:2 ethanol/glacial acetic acid), and washed once in 100% EtOH and once in PBS 1X. The slides were then mounted in Mowiol. Hematoxylin-Eosin-Safran (HES) histological staining was performed following standard protocols. Col2 immunofluorescence was performed using a 1/200 dilution of a primary anti-collagen II antibody (II-II6B3; Developmental Studies Hybridoma Bank, Iowa City, IA, USA) and a 1/1500 dilution of a secondary goat polyclonal anti-mouse IgG antibody conjugated to AlexaFluor 594 (Abcam ab150116). For epitope retrieval, sections were treated with 0.05% trypsin (Sigma) in 0.1% CaCl₂ buffer (pH 7.8) for 10 min at 37°C. Cell nuclei were counterstained with Hoechst.
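The antibody dilutions and percentage solutions in this protocol translate into pipetting amounts by simple arithmetic. The Python sketch below is a hypothetical helper, not part of the original methods: the function names and the working volumes (1 mL, 1.5 mL, 100 mL) are arbitrary illustrations, and reading the Alizarin red percentage as weight/volume is our assumption.

```python
def stock_volume_ul(final_volume_ul: float, dilution_factor: float) -> float:
    """Volume of antibody stock (µL) needed for a 1/dilution_factor working solution."""
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return final_volume_ul / dilution_factor


def grams_for_percent_wv(percent_wv: float, volume_ml: float) -> float:
    """Grams of solute for a % (w/v) solution, i.e. grams per 100 mL (assumed w/v)."""
    return percent_wv * volume_ml / 100.0


# Hypothetical working volumes; the ratios and percentages are those of the protocol above.
primary = stock_volume_ul(1000, 200)            # anti-collagen II at 1/200 in 1 mL
secondary = stock_volume_ul(1500, 1500)         # AlexaFluor 594 secondary at 1/1500 in 1.5 mL
alizarin_g = grams_for_percent_wv(0.005, 100)   # 0.005% Alizarin red S in 100 mL
print(f"primary: {primary} uL, secondary: {secondary} uL, Alizarin red S: {alizarin_g:.4f} g")
```

The same `stock_volume_ul` calculation applies to the 1:2000 anti-DIG-AP dilution used for in situ hybridization.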

#### *Scyliorhinus canicula* Collagen Clones

Plasmids containing partial or complete collagen cDNA sequences were obtained by screening a cDNA library of embryo RNA extracts (Oulion et al., 2010). Specific clones were identified by BLAST as Scyliorhinus canicula (Sc-) Collagen1a1 (Sc-Col1a1, NCBI accession numbers EU241868.1 and KT261785), Collagen1a2 (Sc-Col1a2, NCBI accession numbers EU241869.1 and KT261784), and Collagen2a1 (Sc-Col2a1, NCBI accession number EU241867.1). The sequences and details of the clones are provided in **Data Sheet 1**. The phylogenetic relationships between protein sequences were inferred by the Maximum Likelihood method based on the Le\_Gascuel\_2008 model (Le and Gascuel, 2008). Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using a JTT model. A discrete Gamma distribution was used to model evolutionary rate differences among sites [4 categories (+G, parameter = 0.7935)]. The analysis involved 16 amino acid sequences. All positions containing gaps and missing data were eliminated, leaving a total of 544 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Tamura et al., 2013).
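The complete-deletion step described above (every alignment position containing a gap or missing data is removed before tree inference) can be illustrated with a minimal Python sketch. This is not MEGA6's code; the function name, the toy sequences, and the choice of "-" and "?" as gap and missing-data symbols are hypothetical.

```python
from typing import Dict

# Symbols treated as gaps or missing data in this toy example (an assumption).
MISSING_SYMBOLS = frozenset("-?")


def complete_deletion(alignment: Dict[str, str]) -> Dict[str, str]:
    """Remove every alignment column in which any sequence has a gap or missing datum."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length) if all(seq[i] not in MISSING_SYMBOLS for seq in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy protein alignment with hypothetical sequence names.
toy = {"seqA": "MKT-LAG", "seqB": "MKTQLA?", "seqC": "MRTQLAG"}
trimmed = complete_deletion(toy)
print(trimmed, "positions kept:", len(next(iter(trimmed.values()))))
```

Here column 4 (a gap in seqA) and column 7 (missing data in seqB) are dropped for all sequences, which is why complete deletion can shrink a dataset considerably when many partial sequences are included.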

#### *Scyliorhinus canicula* and *Xenopus tropicalis* Probes

PCR products from specific amplification of the Sc-Col1a2 and Sc-Col2a1 cDNA inserts were ligated into the pGEM-T Easy vector using the TA cloning kit (Promega). Sc-Col1a1 was amplified directly from the original cDNA clone. Xenopus tropicalis (Xt-) Xt-Col1a1 (NM\_001011005.1), Xt-Col1a2 (NM\_001079250.1), and Xt-Col2a1 (NM\_203889) were amplified from stage NF60 hindlimb cDNA containing both bone and cartilage and blunt-cloned into the pBluescript vector. The PCR primers used in this study are given in **Supplementary Table 1**. Antisense DIG riboprobes were synthesized using the DIG RNA labeling mix (Roche) and T3, T7, or SP6 RNA polymerase (Promega) following the manufacturer's instructions. DIG-labeled riboprobes were purified on MicroSpin G50 columns (GE Healthcare).

### *In situ* Hybridization on *Scyliorhinus canicula* Sections

DIG-labeled probes were hybridized overnight at 70°C. Sections were then washed twice for 1 h at 70°C in 50% formamide, 1× SSC, 0.1% Tween-20, and twice for 30 min in MABT buffer, before blocking in blocking buffer (MABT, 2% blocking reagent from Roche, 20% inactivated sheep serum) for 2 h at room temperature. Sections were then incubated overnight at 4°C with a 1:2000 dilution of anti-DIG-AP conjugate antibody (Roche). After washing, slides were incubated with NBT-BCIP (Roche) staining solution according to the manufacturer's instructions, and the reaction was stopped by washing in water. Images of in situ hybridizations and histological stainings were taken with a Hamamatsu NanoZoomer 2.0-HT Slide Scanner (40× objective). Negative in situ hybridization results obtained with sense probes are shown in **Data Sheet 2**.

#### *Xenopus tropicalis* Animal Care and *In situ* Hybridization Procedure

Adult frogs are routinely maintained at the University of Concepcion following standard protocols established for Xenopus tropicalis. Embryos and tadpoles were obtained by natural mating and staged according to the Nieuwkoop and Faber developmental table (Nieuwkoop and Faber, 1967). Tadpoles were anesthetized with a solution of 200 mg/mL of MS-222 (Sigma) and subsequently decapitated, in agreement with international bioethical recommendations (Close et al., 1996; Ramlochansingh et al., 2014). The Ethics Committee of the University of Concepcion (Concepcion, Chile) approved all experimental procedures carried out during this study, which were performed following the guidelines outlined in the Biosafety and Bioethics Manual of the National Commission of Scientific and Technological Research (CONICYT, Chilean Government). Sense probe negative in situ hybridization results are shown in **Data Sheet 2**. In situ hybridizations on paraffin sections were performed as previously described (see **Data Sheet 3** and Espinoza et al., 2010; Aldea et al., 2013).

# Results

#### Skeletal Expression of the Major Fibrillar Collagen Genes in *Scyliorhinus canicula* Fins and Jaws

The Sc-Col1a1, Sc-Col1a2, and Sc-Col2a1 protein sequences were unambiguously assigned to their respective orthology groups by phylogenetic analyses (**Data Sheets 4**, **5**). We examined calcification patterns by Alizarin red, Alcian blue, and HES stainings, as well as the expression of Sc-Col1a1, Sc-Col1a2, and Sc-Col2a1, in developing S.c. fins and jaws (**Figure 1**).

Alizarin red is specific for high levels of calcium ions and therefore stains calcified extracellular matrix, while Alcian blue has a strong affinity for the glycosaminoglycans of the cartilage matrix. HES staining classically reveals nuclei (dark purple), cytoplasm (pink), and densely organized collagen fibers (orange-pink). Both Safran and the acid aniline dye Eosin stain mineralized matrix more intensely than non-mineralized matrix. Transverse sections through the pectoral fins showed that the cartilaginous radials are devoid of calcification both in 7-cm-long embryos (**Figures 1A–C'**) and in 9-cm-long embryos (not shown). By contrast, longitudinal sections of Meckel's cartilage from 9-cm-long embryos allowed the detection of tesserae calcification at the cartilage periphery (**Figures 1H–J**). Tesserae calcification is associated with a darker HES staining of the hyaline matrix surrounding clusters of chondrocytes, and occurs within the cartilaginous scaffold, one or two cell diameters away from the fibrous perichondrium (**Figure 1J'**).

cartilaginous element located at the top. The arrowheads in (I,J'–N') demarcate the fibrous perichondrium from the cartilage. (J) Alizarin red and Alcian blue double staining. (J') Higher magnification of a tessera located in a region similar to the area boxed in (J) and stained with HES. (K–M) Gene expression patterns in the jaw for Sc-Col1a1 [the inset in (K) shows a Sc-Col1a1-positive dermal denticle from the same section], Sc-Col1a2 (L), and Sc-Col2a1 (M). (N) Immunofluorescence using an anti-Type II collagen (Col2) antibody specifically marks the cartilaginous condensations of Meckel's cartilage. Insets in (C–N) are shown at higher magnification in (C'–N'), respectively. CZ, calcification zone of the tesserae; Ch, chondrocytes; Fb, fibroblasts; Pc, perichondrium; Pq, palatoquadrate. Scale bars: (C–G), 250 µm; (J–N), 100 µm.

In the pectoral fin, Sc-Col1a1 and Sc-Col1a2 are expressed in the fibrous perichondrium and the connective tissue surrounding the cartilaginous elements (**Figures 1D–E'**), and Sc-Col2a1 is expressed in the chondrocytes of the cartilaginous matrix of the radials (**Figures 1F,F'**). In the jaw, we failed to detect Sc-Col1a1 at the level of Meckel's cartilage, although intense staining was observed in dermal denticles located on the same section, which served as an internal positive control (**Figures 1K,K'**). Sc-Col1a2 and Sc-Col2a1 transcripts were detected in the fibrous perichondrium of Meckel's cartilage (**Figures 1L,L'**) and in the chondrocytes of the cartilaginous element (**Figures 1M,M'**), respectively. Immunofluorescence experiments performed on developing S.c. fins and jaws further confirmed the cartilage-specific expression of the Sc-Col2a1 protein (**Figures 1G,G',N,N'**). The punctate localization of Sc-Col2a1 around the cell body of fin and jaw chondrocytes might result from low expression levels, and is consistent with the concentration of this protein in the pericellular matrix reported in other species (Benjamin and Ralphs, 1991; Mizoguchi et al., 1997; Nah et al., 2001). Taken together, our results support the idea that S.c. tesserae growth and calcification occur within a Type I-negative and Type II-positive collagenous microenvironment (**Figures 1J–N'**).

#### Skeletal Expression of the Major Fibrillar Collagen Genes in *Scyliorhinus canicula* Vertebrae

The transverse sections of 6 cm-long embryos shown in **Figures 2A–D** reveal that the S.c. vertebrae are cartilaginous, devoid of calcification, and express Sc-Col2a1 (in chondrocytes of the centrum and the neural arches) as well as Sc-Col1a1 and Sc-Col1a2 (in the perichondrium surrounding all vertebral elements). In the vertebral column of 7 cm-long embryos, Alcian blue stains the cartilaginous vertebral body and the neural arches (**Figures 2E–I**'). Alizarin red specifically stains the fibrous perichondrium of the neural arches as well as an internal calcification ring located within the centrum and surrounding the notochord, as reported in other chondrichthyan species (see **Figures 2E–K** and Peignoux-Deville et al., 1982; Eames et al., 2007). Histologically, the calcified ring of the vertebral body exhibits darker HES staining of the matrix surrounding large cells of chondrocytic appearance (**Figures 2G,J**). By contrast, cells located in the calcifying extracellular matrix of the neural arches are thin, with a reduced amount of cytoplasm (**Figures 2H,K**).

The expression of Sc-Col1a1 and Sc-Col1a2 was evident in the fibrous perichondrium and the connective tissue surrounding all vertebral elements (**Figures 2L–M'**), as well as in scattered cells embedded in the calcified layer of the neural arches (arrowheads in **Figures 2L',M'**). Neither Sc-Col1a1 nor Sc-Col1a2 was detected in the calcified layer of the vertebral body (the lighter ring-shaped signal in **Figures 2L–M'** is identical to the background observed in negative controls, see **Data Sheet 2**). While Sc-Col2a1 is expressed in most vertebral chondrocytes, it is significantly downregulated in cells embedded within the calcifying layer of the vertebral body (**Figures 2N,N'**). Likewise, an anti-Type II collagen antibody intensely stained the non-calcified vertebral cartilage of the neural arches and the centrum, as well as a thin layer surrounding the notochord (**Figures 2O,O'**). In agreement with the in situ hybridization results, the calcifying regions of the neural arches and of the vertebral body displayed a much fainter reaction to the Type II collagen antibody (arrowheads in **Figures 2O,O'**). Taken together, these observations reveal a negative correlation between Sc-Col2a1 expression and extracellular matrix calcification. By contrast, Sc-Col1a1 and Sc-Col1a2 are expressed in all perichondral cells of the vertebrae, regardless of the degree of calcification.

#### Skeletal Expression of the Major Fibrillar Collagen Genes in the *Xenopus tropicalis* Limb

We examined the expression of Xt-Col1a1, Xt-Col1a2, and Xt-Col2a1 in the diaphysis and epiphysis of X.t. hindlimbs both before (stage NF54, **Figures 3A–C**) and after (stage NF60, **Figures 3M–O**) ossification. At stage NF54, Xt-Col1a1 and Xt-Col1a2 are most strongly expressed in perichondral cells of developing long bones (**Figures 3D–I**). At stage NF60, Xt-Col1a1 and Xt-Col1a2 transcripts are robustly detected in osteoblasts and, more weakly, in some osteocytes (**Figures 3P–U**). Finally, Xt-Col2a1 is expressed in all chondrocytes of NF54 non-calcified cartilaginous elements (**Figures 3J–L**), and is restricted to the epiphyseal chondrocytes at stage NF60 (**Figures 3V–X**).

#### Histology of the Developing *Xenopus tropicalis* Vertebrae

Because of the complex shape of the X.t. vertebrae, transverse sections run through either the lateral (**Figures 4A,D–F,K–P**) or the dorsal (**Figures 4A,G–I,Q–V**) region of the non-calcified (stage NF54, see **Figures 4B–I**) and calcified (stage NF57, see **Figures 4J–V**) neural arches protecting the neural tube. At stage NF57, the cartilage matrix is abundant (**Figures 4L,R**) and displays pronounced HES and Alizarin red staining, which co-localize in the dorsal region underlying the notochord and within the lateral and dorsal neural arches (see **Figures 4K,M,N,Q,S,T**). In addition, cartilage calcification and periosteal bone develop in contact with each other (**Figures 4O,P,U,V**).

#### Skeletal Expression of the Major Fibrillar Collagen Genes in the *Xenopus tropicalis* Vertebrae

Xt-Col1a1, Xt-Col1a2, and Xt-Col2a1 expression patterns were examined in the lateral and dorsal neural arch regions of the vertebrae (see **Figures 4E,F,H,I,O,P,U,V**). At stage NF54, Xt-Col1a1 and Xt-Col1a2 are expressed in scattered cells of mesenchymal appearance located in the vicinity of the cartilage (**Figures 5A,B**), as well as in a thin layer of perichondrium surrounding the dorsal neural arch (**Figures 5D,E**). At this early stage, Xt-Col2a1 is expressed in all chondrocytes and is also evident in the perichondrium of the dorsal neural arch (**Figures 5C,F**). At stage NF57, Xt-Col1a1 and Xt-Col1a2 are robustly expressed in osteoblasts lying on the calcified bone matrix of the vertebrae (arrows in **Figures 5G,H,J,K**). These osteoblasts also express Xt-Col2a1, albeit more weakly than hypertrophic chondrocytes (**Figures 5I,L**). In chondrocytes, Xt-Col2a1 is excluded from the Alizarin red-positive regions (asterisk in **Figures 5I,L**), forming sharp expression boundaries between calcified and non-calcified cartilage (dotted line in **Figures 4P,V**, **5I,L**). In addition, at stages NF54 and NF57, we detected strong Xt-Col2a1 staining in the epithelial non-vacuolated cells of the notochord (arrowheads in **Figures 5C,I**), a known site of Col2a1 expression in cyclostomes and teleosts (Ota and Kuratani, 2010; Yamamoto et al., 2010).

FIGURE 2 | Cartilage calcification and collagen expression in *Scyliorhinus canicula* vertebrae. (A–D) Transverse sections of the vertebrae of 6 cm-long embryos (black arrowheads show the hyaline cartilage of the neural arches). (A) Alcian blue and Alizarin red double staining revealing the distribution of the hyaline cartilage and the absence of detectable calcification. (B–D) In situ hybridizations showing the expression of Sc-Col2a1, Sc-Col1a1, and Sc-Col1a2, as indicated. (E) Schematic drawing of the vertebral anatomy of 9 cm-long S.c. embryos (lateral view) and of the orientation of the transverse sections (blue dotted line) represented in (F) and shown in (I–O'). (G) General histology of the centrum. (H) General histology of the neural arches. (I) Alizarin red and Alcian blue double staining. (J,K) HES staining of the centrum and of the neural arch. (L–N) In situ hybridizations showing the expression of Sc-Col2a1, Sc-Col1a1, and Sc-Col1a2, as indicated. Arrowheads in (L',M') indicate scattered Sc-Col1a1- and Sc-Col1a2-positive cells embedded in the calcified layer of the neural arches. (O) Immunofluorescence using an anti-Type II collagen (Col2) specific antibody. Higher magnifications of (I,L–O) are shown in (I',L'–O'), respectively. Orange and black arrowheads show the calcifying matrix of the centrum and neural arches, respectively. Cc, chordocytes; Ch, chondrocytes; na, neural arch; nac, neural arch cartilage; ns, notochord sheath; nt, neural tube; ntc, notochord core; Pe, perichondrium; vb, vertebral body; vbc, vertebral body cartilage. Insets in (L–O) are shown at higher magnification in (L'–O'), respectively. Scale bars: (A–D) 250 µm; (I,L–O) 200 µm; (J,K) 50 µm.

FIGURE 3 | Comparison of the *Col1a1*, *Col1a2*, and *Col2a1* expression patterns during *Xenopus tropicalis* hindlimb development. Stage NF54 (top panel) or NF60 (bottom panel) hindlimbs were examined by whole-mount Alizarin red staining (insets), sectioned along the proximo-distal axis, and either stained with HES (A–C,M–O) or processed by in situ hybridization with the Xt-Col1a1, Xt-Col1a2, and Xt-Col2a1 probes (D–L,P–X). Results are shown for the whole skeletal element (left column, scale bar: 500 µm) and at higher magnification for the diaphysis (middle column, scale bar: 50 µm) and epiphysis (right column, scale bar: 50 µm). Arrows and arrowheads show osteoblasts and osteocytes, respectively. The in situ hybridization signal is light to dark blue, and brown endogenous X.t. pigment cells are visible on most sections. Legend: Bo, bone; Ch, chondrocytes; Me, medulla; Pe, perichondrium; Sm, striated muscles.

region of the neural arches. (B) Color code used to represent the distinct skeletal tissues of the X.t. vertebrae in (F,I,N,T,P,V). (C) Whole-mount Alizarin red staining of stage NF54 vertebral column (lateral view, anterior to the left). (D–I) Histology of the stage NF54 vertebrae examined with HES (D,E,G,H). (J) Whole-mount Alizarin red staining of stage NF57 vertebral columns (lateral view, anterior to the left). (K–V) Histology of the stage NF57 vertebrae examined with HES (K,O,Q,U), Alcian blue (L,R) and Alizarin red (M,S). Insets in (D,G,K,Q) are shown in (F,I,P,V), respectively. Panels (E,H,K,O,Q,U) are schematized in (F,I,N,P,T,V), respectively. Abbreviations: nt, neural tube; ntc, notochord. Scale bars: 1 mm in (C,J); 250 µm in (D,G); 50 µm in (E,F) and (H,I); 500 µm in (K–N) and (Q–T); and 50 µm in (O,P,U,V).

# Discussion

#### Conserved Early Molecular Patterning of the Hyaline Cartilage and Non-calcified Perichondrium

In non-calcified S.c. skeletal elements, the expression patterns of the Col1a1/Col1a2 (perichondrium) and Col2a1 (cartilage) genes do not overlap. By contrast, in actinopterygians, Col2a1 orthologs are expressed in the perichondrium, albeit at lower levels than in cartilage (Albertson et al., 2010; Eames et al., 2012). Likewise, our results in X.t. reveal a faint Xt-Col2a1 expression in the non-calcified perichondrium of the dorsal neural arch at stage NF54. It is likely that more sensitive techniques will help assess the expression levels of Xt-Col2a1 in the perichondrium of the X.t. lateral neural arch or hindlimb, two sites where it was not detected by in situ hybridization. Interestingly, Clade A fibrillar collagen members from lamprey and hagfish are expressed both in perichondral cells and in chondrocytes, while the amphioxus ortholog is expressed in chondrocytes and in the mesenchyme located at the tip of regenerating cirri (Zhang and Cohn, 2006, 2008; Zhang et al., 2006; Ota and Kuratani, 2010; Cattell et al., 2011; Kaneto and Wada, 2011). Altogether, these data suggest that the largely complementary expression patterns of Col1a1/Col1a2 (exclusively in the fibrous perichondrium) and Col2a1 (preferentially in the hyaline cartilage) represent a synapomorphy of non-calcified skeletal elements in jawed vertebrates. It is therefore tempting to propose that the Clade A precursor was expressed in chondrocytes and perichondral cells, and that the functional partitioning of ancestral enhancers was involved in this expression divergence (Force et al., 1999; Zhang and Cohn, 2008). According to this scenario, after the genomic duplications that gave rise to the complete set of Clade A members, the Col1a1 and Col1a2 genes would have rapidly lost their cartilage-specific enhancers, while the activity of perichondral Col2a1 enhancers would have been dramatically reduced, or abolished, in distinct jawed vertebrate lineages.
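The ancestral-state argument in this paragraph can be illustrated with a small parsimony reconstruction. The sketch below is not the authors' method (the text does not state which reconstruction criterion underlies their phylogenetic mapping); it assumes Fitch parsimony, a simplified four-lineage topology, and a binary coding of perichondral expression of the single cyclostome Clade A gene or of Col2a1 in jawed vertebrates, taken from the observations summarized above, with the faint X.t. signal in the dorsal neural arch scored as present. The names `fitch`, `TREE`, and `PERICHONDRAL_EXPRESSION` are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree encoding: a tip is a lineage name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, tips: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch bottom-up pass: return the state set at this node and the change count."""
    if isinstance(tree, str):
        return {tips[tree]}, 0
    left_set, left_cost = fitch(tree[0], tips)
    right_set, right_cost = fitch(tree[1], tips)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Assumed simplified topology: cyclostomes sister to jawed vertebrates,
# chondrichthyans sister to osteichthyans (actinopterygians + tetrapods).
TREE: Tree = ("cyclostomes", ("chondrichthyans", ("actinopterygians", "tetrapods")))

# Assumed coding: 1 = transcripts detected in non-calcified perichondrium, 0 = not detected.
# Cyclostomes are scored for their single Clade A gene, jawed vertebrates for Col2a1;
# the faint X.t. perichondral signal is scored as 1 (a simplification).
PERICHONDRAL_EXPRESSION = {
    "cyclostomes": 1,
    "chondrichthyans": 0,
    "actinopterygians": 1,
    "tetrapods": 1,
}

root_states, changes = fitch(TREE, PERICHONDRAL_EXPRESSION)
print(root_states, changes)  # {1} 1
```

Under these assumptions the root state set is {1} at a cost of one change, which falls on the chondrichthyan branch; this matches the proposal that the Clade A precursor was already expressed in perichondral cells. Scoring the faint tetrapod signal as absent instead leaves the root state ambiguous, so the sketch only shows how the reasoning works and does not settle the question.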

#### *Col2a1* Osteoblastic Expression was Significantly Reduced in the Tetrapod Lineage

We detected X.t. Col2a1 transcripts in osteoblasts of the vertebrae, although these displayed a weaker in situ hybridization signal than the hypertrophic chondrocytes present on the same section (**Figures 5I,L**), which is consistent with expression results obtained with primary cultures of X.t. osteoblasts (Bertin et al., 2015). While Col2a1 is traditionally considered to be a chondrocyte-specific marker (Kobayashi and Kronenberg, 2005; Hartmann, 2009), its robust osteoblastic expression has been reported in embryos of several actinopterygian fish species (Benjamin and Ralphs, 1991; Albertson et al., 2010; Eames et al., 2012). The moderate Col2a1 expression levels described in the clawed frog (this study), chick (Abzhanov et al., 2007) and mouse (Hilton et al., 2007) therefore support the idea that osteoblastic transcription of Col2a1 was significantly reduced in the tetrapod lineage, and almost completely abolished in mammals (**Figure 6**).

#### *Scyliorhinus canicula* Neural Arches, Tesserae, and Centrum Calcification Occur in Distinct Molecular Contexts

Our results reveal robust S.c. calcification at no fewer than three skeletal sites, each expressing a different combination of collagen genes: (i) the fibrous perichondrium of the neural arches, (ii) the tesserae developing in Meckel's cartilage, and (iii) the compact cartilage embedded within the vertebral bodies.

In neural arches, the cartilaginous scaffold is surrounded by a fibrous perichondrium whose matrix is highly calcified and devoid of Col2 protein, and whose cells express Sc-Col1a1 and Sc-Col1a2 but no detectable Sc-Col2a1 (**Figures 2**,**5**). The evolutionary relationship between this calcified perichondrium and the osteichthyan bone has remained enigmatic and controversial (Peignoux-Deville et al., 1982; Eames et al., 2007; Zhang et al., 2009; Ryll et al., 2014). In the light of fossil evidence demonstrating that extant chondrichthyans are quite derived, having lost the perichondral bone surrounding the cartilaginous elements (Coates et al., 1998; Donoghue and Sansom, 2002), two hypotheses might account for the unusual calcification pattern observed in neural arches (**Figure 6**). On the one hand, the perichondral bone may have been dramatically reduced to evolutionary remnants of calcified fibrous perichondrium located in the neural arches (hypothesis 1). In this case, the cells involved in matrix calcification would correspond to highly derived osteoblasts that have lost many crucial cellular features typically observed in osteichthyans, such as the ability to organize as a polarized pseudoepithelium (Izu et al., 2011; Liu et al., 2011). On the other hand, the perichondral bone might have been completely lost, and secondarily compensated by an independent ability to calcify the perichondral extracellular matrix (hypothesis 2). Below, we discuss two complementary strategies that might help resolve this issue. First, a broader phylogenetic sampling is required to precisely assess the occurrence of a calcified perichondrium in neural arches, which currently seems to be limited to some chondrichthyan species. For instance, the skeleton of holocephalans displays little or no calcified tissue (a ring-shaped calcification of the centrum is reported in some fossil holocephalans and in the extant genus Chimaera), while batoids (rays and skates) have tesserae-based calcification at the surface of their vertebral units (Reynolds, 1897; Goodrich, 1930; Zangerl, 1981). Second, it will be important to investigate the nature of the Col1a1- and Col1a2-positive cells embedded within the mineralized matrix (**Figures 2L',M'**). Indeed, such cells have been proposed to be osteocytes (Peignoux-Deville et al., 1982), which is consistent with the fact that cellular bone evolved before the origin of jawed vertebrates (Donoghue and Sansom, 2002; Donoghue et al., 2006; Sanchez et al., 2013). Extensive phenotypic and molecular similarities between the scattered cells embedded within the S.c. calcified perichondrium and osteichthyan osteocytes would support their homology and, therefore, hypothesis 1.

expression patterns were mapped onto a simplified vertebrate phylogenetic tree to deduce ancestral states and polarize evolutionary change. We propose that the ancestral Clade A fibrillar collagen gene (i.e., before the duplications that produced the distinct members of this family) was expressed in the non-calcified perichondrium. This expression pattern was inherited by the unique cyclostome fibrillar collagen gene, which is more closely related to the Col2a1 subgroup. In jawed vertebrates, perichondral cells and osteoblasts maintained high levels of Col1a1 and Col1a2, while Col2a1 osteoblastic expression was dramatically reduced in most (but not all) lineages. The presence of bone in placoderms and tetrapods supports the idea that the calcified fibrous perichondrium observed in some chondrichthyan species represents either evolutionary remnants of bone (Hypothesis 1) or a secondary gain of calcification (Hypothesis 2). Osteocytes have been omitted for the sake of simplicity. See text for details.

Another site of calcification in S.c. corresponds to the developing tesserae embedded in Meckel's cartilage, a process classically described as occurring at the surface of the cartilaginous skeletal element (Kemp and Westrin, 1979; Dean et al., 2009). As we show here, the onset of this type of calcification takes place in a Col2-positive context, within the cartilaginous scaffold (**Figures 1**, **6**). We failed to detect Col1a1/Col1a2 expression in the chondrocytes neighboring the mineralized matrix, suggesting that the cellular processes involved in this matrix calcification differ substantially from those described in osteichthyan bone or chondroid bone (Mizoguchi et al., 1997). This type of calcification is well developed in extant batoid and selachimorph species, and is also known in fossil holocephalan species (Grogan and Lund, 2000; Finarelli and Coates, 2014); it is therefore considered to be an early evolutionary innovation of the chondrichthyan lineage (**Figure 7**).

Below, we discuss the third type of calcification mechanism, which occurs in the Col1a1/Col1a2-negative S.c. vertebral cartilage undergoing a drastic Col2a1 downregulation, in light of the striking similarities it shares with the X.t. vertebrae.

#### An Ancient Type of Calcified Vertebral Cartilage Associated with the Down-regulation of *Col2a1*

The tetrapod hyaline cartilage calcifies its extracellular matrix, albeit to a much lesser extent than bone tissue (Claassen et al., 1996; Khanarian et al., 2014), and therefore stains only weakly with Alizarin red (Kirsch et al., 1997). Here, we report an unusual type of calcified cartilage displaying remarkable similarities between X.t. and S.c. at three distinct biological levels: (i) anatomically, this cartilage is located in the vertebrae of both species and, at least at the stages analyzed, in no other skeletal elements; (ii) histologically, its robust calcification is reflected by intense Alizarin red and HES staining; (iii) molecularly, both types of cartilage are Col1a1/Col1a2-negative and probably undergo Col2a1 downregulation, because in both species all cells of the vertebral cartilage express Col2a1 during early, non-calcified developmental stages (see **Figures 2B,N,N'**, **5C,F,I,L**). In this respect, both types of vertebral cartilage seem to recapitulate the initial phase of endochondral bone formation typically seen in tetrapod long bones, during which proliferative chondrocytes progressively downregulate Col2a1 expression, undergo hypertrophy, and calcify their extracellular matrix (**Figure 7**). Our observations, combined with data from mouse (Chandraraj and Briggs, 1988) and lizards (Lozito and Tuan, 2015), suggest that a calcified form of vertebral cartilage was present in the last common ancestor of jawed vertebrates, at least as a transitory developmental process.

As vertebral developmental processes are highly variable, homology between the calcified ring surrounding the S.c. notochord and the calcified cartilage of the X.t. vertebrae cannot be inferred (Fleming et al., 2015). Rather, we propose that the genetic programme involving a downregulation of the Col2a1 gene predates the emergence of the last vertebrate common ancestor, and was subsequently co-opted and modified to produce a variety of novel non-calcified (Zhang and Cohn, 2006; Zhang et al., 2009) and calcified (Hogg, 1982; Claassen et al., 1996; Janvier and Arsenault, 2002; Porter et al., 2007) cartilaginous structures (**Figure 7**). One intriguing possibility is that the ancient, Col2a1-negative, calcified cartilage present in the last common ancestor of jawed vertebrates later came to play a key role in the elimination of cartilaginous matrix and its replacement by bone tissue. In this respect, it might have served as a crucial pre-patterning step contributing to the emergence of the endochondral ossification commonly observed in tetrapods, whose precise origin remains to be determined. In the future, a comprehensive comparison of gene expression signatures between cell types present in diverse skeletal tissues, anatomical locations, developmental stages, and species will provide a solid basis to unravel the complex and fascinating evolutionary history of the vertebrate skeleton.

a simplified vertebrate phylogenetic tree to deduce ancestral states and polarize evolutionary change. We propose that, in the last vertebrate common ancestor, the expression of Col2a1 experienced a strong downregulation in maturing, non-calcified, cartilaginous regions. This downregulation was subsequently inherited by distinct vertebrate lineages, and is associated with hard cartilage in cyclostomes and with calcified cartilage in jawed vertebrates. The chondrichthyan and osteichthyan representatives analyzed in this study display a calcified Col2a1-negative vertebral cartilage, a likely jawed vertebrate synapomorphy. Tesserae calcification, a recent chondrichthyan innovation, occurs in the absence of Col2a1 downregulation. Perichondrium and bone have been omitted for the sake of simplicity. See text for details.

## Acknowledgments

We are grateful to both reviewers for their constructive comments that improved the quality of the present manuscript. We thank Isabelle Germon, Marie-Ka Tilak and Fabienne Justy for help with molecular biology. This research was funded by a FONDECYT grant 1151196 to SM and by PEPS ExoMOD to MDT. This is ISEM contribution # ISEM 2015-146.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00283

Supplementary Table 1 | List of *Scyliorhinus canicula* and *Xenopus tropicalis* specific primers.

Data Sheet 1 | Sequences of the *S.c.* clones.

Data Sheet 2 | Sense probe negative *in situ* hybridization results for *Scyliorhinus canicula* and *Xenopus tropicalis*.

Data Sheet 3 | *X.t. in situ* hybridization protocol.

Data Sheet 4 | Molecular phylogenetic analysis of gnathostome Clade A fibrillar collagen genes. Phylogenetic relationships were inferred and the tree with the highest log likelihood (-7413.8950) is shown. The percentages of trees in which the associated taxa clustered together are indicated. The tree is drawn to scale, and branch lengths correspond to the number of substitutions per site. Orthology groups (OG) are identified as blue circles, and S.c. sequences are shown in red. The tree was rooted according to Zhang and Cohn (2008).

Data Sheet 5 | Fibrillar collagen sequences.

## References


expression patterns with mammals in spite of their highly divergent regulatory regions. Evol. Dev. 12, 541–551. doi: 10.1111/j.1525-142X.2010.00440.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Enault, Muñoz, Silva, Borday-Birraux, Bonade, Oulion, Ventéo, Marcellini and Debiais-Thibaud. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# On the evolutionary relationship between chondrocytes and osteoblasts

#### *Patsy Gómez-Picos and B. Frank Eames\**

*Department of Anatomy and Cell Biology, University of Saskatchewan, Saskatoon, SK, Canada*

Vertebrates are the only animals that produce bone, but the molecular genetic basis for this evolutionary novelty remains obscure. Here, we synthesize information from traditional evolutionary and modern molecular genetic studies in order to generate a working hypothesis on the evolution of the gene regulatory network (GRN) underlying bone formation. Since transcription factors are often core components of GRNs (i.e., kernels), we focus our analyses on Sox9 and Runx2. Our argument centers on three skeletal tissues that comprise the majority of the vertebrate skeleton: immature cartilage, mature cartilage, and bone. Immature cartilage is produced during early stages of cartilage differentiation and can persist into adulthood, whereas mature cartilage undergoes additional stages of differentiation, including hypertrophy and mineralization. Functionally, histologically, and embryologically, these three skeletal tissues are very similar, yet unique, suggesting that one might have evolved from another. Traditional studies of the fossil record, comparative anatomy and embryology demonstrate clearly that immature cartilage evolved before mature cartilage or bone. Modern molecular approaches show that the GRNs regulating differentiation of these three skeletal cell fates are similar, yet unique, just like the functional and histological features of the tissues themselves. Intriguingly, the Sox9 GRN driving cartilage formation appears to be dominant to the Runx2 GRN of bone. Emphasizing an embryological and evolutionary transcriptomic view, we hypothesize that the Runx2 GRN underlying bone formation was co-opted from mature cartilage. We discuss how modern molecular genetic experiments, such as comparative transcriptomics, can test this hypothesis directly, while also permitting levels of constraint and adaptation to be evaluated quantitatively. 
Therefore, comparative transcriptomics may revolutionize understanding of not only the clade-specific evolution of skeletal cells, but also the generation of evolutionary novelties, providing a modern paradigm for the evolutionary process.

Keywords: EvoDevo, comparative transcriptomics, Sox9, Runx2, bone, cartilage, GRN

#### *Edited by:*

*Hector Escriva, Centre National de la Recherche Scientifique, France*

#### *Reviewed by:*

*Sylvain Marcellini, University of Concepcion, Chile*

*Daniel Medeiros, University of Colorado Boulder, USA*

#### *\*Correspondence:*

*B. Frank Eames, Department of Anatomy and Cell Biology, University of Saskatchewan, 3D01-107 Wiggins Road, Saskatoon, SK S7N 5E5, Canada*

*b.frank@usask.ca*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 02 July 2015 Accepted: 07 September 2015 Published: 23 September 2015*

#### *Citation:*

*Gómez-Picos P and Eames BF (2015) On the evolutionary relationship between chondrocytes and osteoblasts. Front. Genet. 6:297. doi: 10.3389/fgene.2015.00297*

# Introduction: Cartilage and Bone might Share an Evolutionary History

Most of evolutionary theory has focused on studies of morphological change (morphogenesis) among taxa, but the formation of tissue types (histogenesis) can also evolve in clade-specific manners. Therefore, we focus our attention on a relatively understudied subject of evolutionary research: the evolution of histogenesis. A classic problem in evolutionary theory is to explain novelties, or traits with no clear ancestral antecedent (Shubin, 2002; Moczek, 2008; Wagner and Lynch, 2010). For example, vertebrates are the only animals that produce bone, but so far, the molecular genetic basis for this evolutionary novelty remains obscure. Here, we synthesize information from traditional evolutionary and modern molecular studies in order to generate a working hypothesis on the evolution of the genetic system underlying bone formation. Many studies argue that bone evolved from dentine (Kawasaki et al., 2004; Wagner and Aspenberg, 2011). However, using molecular genetic and embryological arguments that favor gradualism over saltationism (Gould, 2002), we hypothesize that bone (and perhaps all mineralizing tissues, such as dentine) appeared during evolution by co-opting a gene regulatory network (GRN) that was under prior natural selection to mineralize cartilage. In order to present an argument for skeletal tissue development and evolution over the past 500 million years, we make some generalizations that may trouble some readers, whose indulgence we ask, in the hope that such generalizations help to reveal broader trends during the evolution of skeletal tissues.

An introductory look at the similarities and differences between cartilage and bone suggests that the underlying GRNs may be related. Cartilage and bone are specialized connective tissues that provide form and structural support to the body, protect vital organs, and play a crucial role in locomotion through muscle attachments (Gray and Williams, 1989). Despite these similarities, they also have distinct functions (**Figure 1**). Cartilage typically offers a flexible structure to support soft tissues and serves as a load-bearing surface between bones. On the other hand, bone is a hard, rigid structure that protects vital organs and acts as a storage site for minerals, such as calcium and phosphorus (Smith and Hall, 1990; Volkmann and Baluska, 2006). Also unlike cartilage, which has almost no capacity for regeneration, bone is a highly dynamic structure that undergoes constant remodeling, preserving bone strength and regulating calcium homeostasis (Datta et al., 2008). Perhaps in relation to regenerative capacity, these tissues differ in vascularity: bone is highly vascularized, whereas cartilage is typically avascular. However, there are important exceptions to cartilage avascularity. Mature cartilage in tetrapods is often invaded by vasculature as it degrades, creating the marrow cavity (Johnson, 1980; Roach, 1997; Stricker et al., 2002; Ortega et al., 2004; Moriishi et al., 2005), and even immature cartilage is highly vascularized near articulating surfaces in some avian and mammalian species (Ytrehus et al., 2004; Blumer et al., 2005). When cartilage extracellular matrix (ECM) undergoes mineralization, its functions change. In some vertebrates, such as sharks, mineralized cartilage can serve as the major rigid structural support for the body, while also providing a mineral reservoir (Daniel, 1934; Kemp and Westrin, 1979; Eames et al., 2007). 
In most extant vertebrates, however, mineralized cartilage mainly serves as a scaffold during endochondral ossification, outlined below.

During embryonic development, cartilage and bone formation share many features (**Figure 1**). Both cartilage and bone differentiate from common mesenchymal (osteochondral) progenitor cells (Fang and Hall, 1997; Day et al., 2005; Hill et al., 2005). Both initiate overt differentiation by aggregating mesenchymal cells into condensations, which can go on directly to secrete cartilage- or bone-specific matrix (Hall and Miyake, 1995, 2000; Kronenberg, 2003; Day et al., 2005). A unique feature of bone formation, however, is that in addition to differentiating directly from an osteogenic condensation (intramembranous ossification), bone also forms on a pre-existing cartilage template (endochondral ossification). Endochondral ossification in fact involves the formation of the three skeletal tissues that make up the majority of the extant vertebrate skeleton: immature cartilage, mature cartilage, and bone (Eames et al., 2003, 2004; Eames and Helms, 2004). Some cartilage remains throughout development at the growth plates and throughout life at articular surfaces (we term this immature cartilage). Most of the cartilage produced during endochondral ossification, however, undergoes a series of changes termed maturation (hence the terms immature vs. mature cartilage). In most vertebrates, cartilage maturation involves cell hypertrophy, matrix mineralization, cell death, and matrix degradation (Leboy et al., 1988; Hatori et al., 1995; Takeda et al., 2001; Miura et al., 2008). Although exceptions exist (Thorogood, 1988; Hirasawa and Kuratani, 2015), endochondral ossification typically gives rise to the bones of the endoskeleton, such as the chondrocranium or limb skeleton, whereas intramembranous ossification produces the exoskeleton, such as lateral plates in teleosts or the calvarium (Smith and Hall, 1990).

Histologically, immature cartilage, mature cartilage, and bone are very similar, yet each also has some unique features (**Figure 1**). All three skeletal tissues are composed of cells embedded in an ECM rich in collagens and proteoglycans (Hardingham, 1981; Eames et al., 2003, 2004; Eames and Helms, 2004; Gentili and Cancedda, 2009). Immature cartilage is formed by chondrocytes that deposit a network of loose collagen fibers and a ground substance rich in proteoglycans, whereas chondrocytes of mature cartilage alter the immature cartilage ECM by decreasing its proteoglycan sulfation and mineralizing it (Lohmander and Hjerpe, 1975; Buckwalter et al., 1987; Bayliss et al., 1999). Whether proteoglycan degradation is required for mineralization of the mature cartilage ECM is debated (Hirschman and Dziewiatkowski, 1966; Granda and Posner, 1971; Poole et al., 1982; Campo and Romano, 1986). Bone is formed by osteoblasts that produce an ECM of tightly wound and highly cross-linked collagen fibers, and bone ECM has lower levels of proteoglycans than cartilage (Gentili and Cancedda, 2009). Because of these differences in collagen and proteoglycan content, the three skeletal tissues have overlapping but distinct histological staining patterns. High concentrations of sulfated proteoglycans cause immature cartilage to stain with Alcian blue and Safranin O (by comparison, mature cartilage and then bone bind these dyes with progressively lower intensity). The tightly wound collagen fibers of bone stain with Direct red and Aniline blue (by comparison, the loose collagen fibers of cartilage matrix bind these dyes with lower intensity; Villanueva et al., 1983; Hall, 1986; Eames and Helms, 2004; Eames et al., 2004, 2007). Alizarin red can stain the mineralized matrix of mature cartilage and bone (Hogg, 1982; Kirsch et al., 1997; Eames and Helms, 2004; Eames et al., 2007).

FIGURE 1 | Similarities and differences among immature cartilage, mature cartilage, and bone suggest that these three skeletal tissues share an evolutionary history.

Immature cartilage, mature cartilage, and bone have overlapping, but distinct, gene and protein expression profiles (**Figure 1**). All these skeletal tissues express Collagen 11 and the proteoglycans Biglycan and Decorin (Li et al., 1998; Knudson and Knudson, 2001; Rees et al., 2001; Roughley, 2006). Immature cartilage expresses high levels of Collagens 2 and 9, as well as the proteoglycans Aggrecan, Fibromodulin, and Epiphycan, which distribute growth factors and provide swelling pressure by attracting water (Yanagishita, 1993; Lefebvre et al., 1997; Lefebvre and de Crombrugghe, 1998; Watanabe et al., 1998; Liu et al., 2000). Mature cartilage has reduced expression of these same collagens and proteoglycans, while also expressing high levels of Collagen 10 (Orth et al., 1996; Eames et al., 2004; Talwar et al., 2006). In contrast to both types of cartilage, bone expresses high levels of Collagen 1 (Yasui et al., 1984; Kream et al., 1995). Interestingly (and central to the argument of this review), mature cartilage and bone share expression of genes not expressed in immature cartilage, including *Sp7* (formerly called *Osterix*), *Matrix metallopeptidase 13*, and *Indian hedgehog* (Vortkamp et al., 1996; Inada et al., 1999; Neuhold et al., 2001; Zaragoza et al., 2006; Abzhanov et al., 2007; Mak et al., 2008; Huycke et al., 2012; Nishimura et al., 2012; Weng and Su, 2013). In fact, very few genes expressed in bone are not expressed in mature cartilage, and this list shrinks further when mature cartilage and bone are compared in actinopterygians (Eames et al., 2012). Multiple genes associated with matrix mineralization are also expressed in both mature cartilage and bone, such as *Alkaline phosphatase, liver/bone/kidney* (*Alpl*, formerly called *Tissue-nonspecific alkaline phosphatase*), *Secreted phosphoprotein 1* (*Spp1*, formerly called *Osteopontin* or *Bone sialoprotein*), *Secreted protein, acidic, cysteine-rich* (*Sparc*, formerly called *Osteonectin*), and *Bone gamma-carboxyglutamate protein* (*Bglap*, formerly called *Osteocalcin*; Termine et al., 1981; Pacifici et al., 1990; Chen et al., 1991; Bonucci et al., 1992; McKee et al., 1992; Mundlos et al., 1992; Nakase et al., 1994; Roach, 1999; Sasaki et al., 2000).
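To make the overlap argument concrete, the sketch below reduces each tissue to the genes this section describes as highly expressed in it and computes pairwise Jaccard indices (shared genes divided by all genes in either set). This is an illustration under stated assumptions, not an analysis from the source: the binary present/absent coding, the choice of the Jaccard index, and the abbreviated gene symbols (e.g., `Col2` for Collagen 2) are ours, and genes whose expression is merely "reduced" in mature cartilage are treated as absent.

```python
"""Toy comparison of the marker genes named in this section.

Assumption: each tissue is reduced to the set of genes the text describes
as expressed at high levels; genes described as "reduced" are left out.
This is an illustration, not a transcriptomic measurement.
"""
from itertools import combinations

SHARED = {"Col11", "Bgn", "Dcn"}  # Collagen 11, Biglycan, Decorin
MINERALIZATION = {"Alpl", "Spp1", "Sparc", "Bglap"}
MATURE_AND_BONE = {"Sp7", "Mmp13", "Ihh"} | MINERALIZATION

TISSUES = {
    "immature cartilage": SHARED | {"Col2", "Col9", "Acan", "Fmod", "Epyc"},
    "mature cartilage": SHARED | MATURE_AND_BONE | {"Col10"},
    "bone": SHARED | MATURE_AND_BONE | {"Col1"},
}


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by all genes expressed in either tissue."""
    return len(a & b) / len(a | b)


def pairwise_overlap(tissues: dict) -> dict:
    """Jaccard index for every unordered pair of tissues."""
    return {
        (x, y): jaccard(tissues[x], tissues[y])
        for x, y in combinations(tissues, 2)
    }


if __name__ == "__main__":
    for (x, y), score in pairwise_overlap(TISSUES).items():
        print(f"{x} vs {y}: {score:.2f}")
```

Under these assumptions, mature cartilage and bone share 10 of the 12 genes listed (Jaccard ≈ 0.83), whereas each shares only the three pan-skeletal genes with immature cartilage (3 of 16, ≈ 0.19). A real comparison would use genome-wide expression levels rather than a hand-picked marker list.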

Currently, the evolutionary relationship among skeletal tissues is unclear, but the similarities highlighted above suggest that immature cartilage, mature cartilage, and bone share an evolutionary history. From a molecular genetic perspective, these observations lead to the hypothesis that the GRNs governing the formation of these three skeletal tissues (in particular, the differentiation of the three corresponding skeletal cell types) also share an evolutionary history. Indeed, the many varieties of skeletal tissues intermediate between cartilage and bone observed in extant and fossil vertebrates may owe their existence to this shared history (Benjamin, 1990; Benjamin and Ralphs, 1991; Benjamin et al., 1992; Mizoguchi et al., 1997; Hall, 2005; Witten et al., 2010). In this review, we explore this hypothesis using traditional evolutionary and modern molecular genetic studies. We do not focus on the exact anatomical location of a tissue, given that once the GRN regulating formation of that skeletal tissue is established in the genome, any cell in the body can co-opt its expression. Traditional studies have provided insight into the evolutionary relationship among skeletal tissues, since they demonstrate that immature cartilage originated first during phylogeny (Mallatt and Chen, 2003; Rychel et al., 2006). Modern molecular genetic studies reveal that two GRNs dictate the formation of these three skeletal tissues (Bi et al., 1999; Inada et al., 1999; Eames et al., 2004; Hattori et al., 2010; Leung et al., 2011), and also that the GRN underlying cartilage formation is dominant to that of bone (Eames et al., 2004; Zhou et al., 2006). We expand upon this finding, using an argument based on the relative parsimony of gradualism versus saltationism, to hypothesize that bone evolved from a cartilage maturation program. In closing, we discuss how comparative transcriptomics will dramatically enhance our ability to test hypotheses on the evolution of the GRNs underlying cartilage and bone formation.

# GRN Underlying Immature Cartilage Formation Evolved First

Traditional lines of evidence, namely the fossil record, comparative anatomy, and embryology, demonstrate that the first skeletal tissue to evolve was immature cartilage (**Figure 2**). The fossil record reveals a great diversity of mineralized tissues about 500 million years ago (Mya; Janvier, 1996, 2015; Donoghue and Sansom, 2002; Donoghue et al., 2006), suggesting that GRNs of skeletal histogenesis were undergoing an adaptive radiation. So which skeletal tissue appeared first in the fossil record? This question is complicated by the facts that currently discovered fossils may represent a biased fraction of ancestral tissues and that non-mineralized, lightly mineralized, or transiently mineralized tissues likely are poorly preserved. Despite these limitations, the oldest skeletal tissue in the fossil record is unmineralized cartilage in the chordate fossil *Haikouella* from 530 Mya (**Figure 2A**; Mallatt and Chen, 2003). Many specimens of this important fossil with preserved soft tissues have been found, but they appear to come only from a small region of Yunnan Province in China (Chen et al., 1999), reflecting potential bias in the fossil record.

Bone and mature cartilage appeared much later than immature cartilage in the fossil record (**Figure 2A**). Conodonts, a group of agnathans (jawless vertebrate fish), are the earliest (∼515 Mya) known fossils with a mineralized skeleton, characterized by pharyngeal tooth-like elements composed of tissues described as bone-like, enamel-like, and mineralized cartilage-like (Sansom et al., 1992). However, subsequent analyses of conodont fossils refuted the conclusion that bone or mineralized cartilage was present in these primitive jawless fish, instead attributing the first appearance of bone in the fossil record to the exoskeleton of the Pteraspidomorphi (∼480 Mya), a group of armored agnathans (Janvier, 1996; Donoghue, 1998; Donoghue et al., 2006). Interestingly, some pteraspidomorph species (e.g., eriptychiids and arandaspids) and other primitive fossil fish show traces of both mineralized cartilage and bone in their endoskeleton (Janvier, 1996, 1997; Zhang et al., 2009). Also, fossils of the fossil vertebrate *Palaeospondylus gunni* (∼385 Mya) reveal an entire adult skeleton composed of hypertrophic, mineralized cartilage, with bone completely absent (Johanson et al., 2010). Despite these findings, the current fossil record generally suggests that bone preceded mineralized cartilage (Smith and Hall, 1990; Janvier, 1997; Donoghue et al., 2006), although the molecular genetic and embryological arguments of this review call the accuracy of this conclusion into question. What is clear from the fossil record is that unmineralized cartilage was the first skeletal tissue to appear in the lineage leading to vertebrates (Northcutt and Gans, 1983; Smith and Hall, 1990).

Comparative anatomy also supports the notion that immature cartilage was the first skeletal tissue to evolve, because immature cartilage is distributed across a broader range of taxonomic lineages than mature cartilage or bone (**Figure 2B**). Immature cartilage appears in both vertebrate and non-vertebrate species, whereas mature cartilage and bone are shared, derived traits of vertebrates only (Cole and Hall, 2004, 2009; Rychel et al., 2006). In a seminal study, Cole and Hall (2004) demonstrated cartilage in a variety of taxonomically distinct invertebrates, such as polychaetes, arthropods, and molluscs. Reflecting the different evolutionary histories of immature and mature cartilage, cartilage in all invertebrate lineages, and also in extant agnathans, is unmineralized (Cole and Hall, 2004; Hall, 2005). The finding that lamprey cartilage can mineralize *in vitro* suggests that early agnathans may have possessed mineralized cartilage and that these mineralization programs were later repressed in cyclostomes (Langille and Hall, 1993).

The taxonomic distribution of cartilage suggests that the common ancestor of vertebrates, cephalochordates, and hemichordates could make immature cartilage (**Figure 2B**). Indeed, the deuterostome ancestor has been proposed to be a benthic worm with cartilaginous gill slits (Rychel et al., 2006). Homology between invertebrate and vertebrate cartilages is supported by biochemical and histological analyses, which demonstrate high amounts of fibrous proteins and mucopolysaccharides (Cole and Hall, 2004; Cole, 2011). Moreover, recent studies have shown that the cirri in amphioxus share many histological and molecular features with vertebrate immature cartilage (Kaneto and Wada, 2011; Jandzik et al., 2015). However, homology between deuterostome and protostome cartilage is still uncertain and must be confirmed by modern molecular analyses, including examination of gene expression patterns, GRN architectures, and GRN regulation. The ECM of hemichordate skeletal tissues may show features of both cartilage and bone (Cole and Hall, 2004), supporting the notion that these two tissues share an evolutionary history. Mineralized cartilage and bone, however, are found only in extant gnathostomes (**Figure 2**). These comparative anatomical analyses suggest that immature cartilage evolved before mature cartilage and bone.

Final support for the idea that immature cartilage arose earlier in evolution than mature cartilage and bone comes from comparative embryology. While the Biogenetic Law of Ernst Haeckel clearly has theoretical problems (Haeckel, 1866), a general correlation (recapitulation) between the timing of events during ontogeny and that of events during phylogeny is undeniable. Indeed, many early evolutionary biologists assumed this to be true (Gould, 2002). In this context, it is interesting to note that immature cartilage is the first skeletal tissue to undergo histogenesis during embryonic development, whereas cartilage maturation and bone formation are later events. The timing of cartilage maturation relative to bone formation, on the other hand, appears to vary among vertebrate taxa (Mori-Akiyama et al., 2003; Eames et al., 2004, 2012; Moriishi et al., 2005). While such relationships between the timing of developmental events have been argued simply to reflect the increasing complexity of ontogeny during phylogeny (Wallace, 1997), we believe that this issue, which has been debated for hundreds of years, remains unresolved.

To sum up, traditional studies of the fossil record, comparative anatomy, and embryology indicate that the ability to make immature cartilage predates the ability to make mature cartilage or bone. Therefore, from a molecular genetic perspective, the GRN governing chondrocyte differentiation appeared before that governing osteoblast differentiation. However, traditional approaches have not resolved whether mature cartilage or bone appeared next during evolution. In the hope that modern molecular and embryological analyses can shed light on the evolutionary origins of the vertebrate skeleton, we next discuss how the GRNs underlying the formation of immature cartilage, mature cartilage, and bone are organized.

# Sox9 GRN is Dominant to the Runx2 GRN

Skeletal histogenesis is governed by complex sets of genes, largely controlled by central transcription factors that determine cell fate decisions (Eames et al., 2003, 2004; Kronenberg, 2003; Karsenty et al., 2009). Molecular genetic experiments demonstrate that the transcription factors Sox9 and Runx2 are the "master regulatory genes" of skeletal histogenesis. Sox9 and Runx2 expression patterns during mesenchymal condensation predict whether osteochondroprogenitor cells differentiate into immature cartilage, mature cartilage, or bone (Eames and Helms, 2004; Eames et al., 2004). Loss of Sox9 function abrogated immature and mature cartilage formation (Bi et al., 1999; Mori-Akiyama et al., 2003), whereas Runx2 loss of function blocked mature cartilage and bone formation (Hoshi et al., 1999; Inada et al., 1999; Kim et al., 1999; Enomoto et al., 2000). In gain-of-function experiments, Sox9 misexpression induced ectopic cartilage formation, whereas Runx2 misexpression induced ectopic mature cartilage and bone formation (Eames et al., 2004). These and other experiments clearly show that a Sox9 GRN regulates immature cartilage formation, a Runx2 GRN drives bone formation, and a combination of Sox9 and Runx2 GRNs produces mature cartilage (**Figure 3**). We emphasize the relevance of these transcription factors to the evolution of GRNs underlying skeletal histogenesis, since the conserved, core components of GRNs (i.e., kernels) are often transcription factors (Levine and Davidson, 2005; Davidson and Erwin, 2006).

Expression studies of skeletal tissues in a range of organisms suggest an ancestral interaction between Sox and Runx GRNs. *Runx2*, along with its related family members *Runx1* and *Runx3*, derives from gnathostome duplications of an ancestral *Runx* gene, whereas agnathan *Runx* genes may have undergone an independent duplication (Meulemans and Bronner-Fraser, 2007; Hecht et al., 2008; Cattell et al., 2011; Kaneto and Wada, 2011; Nah et al., 2014). Similarly, *Sox9*, along with its related family members *Sox8* and *Sox10*, derives from duplications of the ancestral *SoxE* gene, whereas agnathan *SoxE* genes may have undergone an independent duplication (Meulemans and Bronner-Fraser, 2007; Ohtani et al., 2008; Yu et al., 2008; Cattell et al., 2011; Uy et al., 2012; Jandzik et al., 2015). *Runx* and *SoxE* orthologs are expressed in cartilage of amphioxus, lamprey, and hagfish, suggesting that the gene ancestral to Runx2 primitively functioned with the gene ancestral to Sox9 in early cartilage formation (Hecht et al., 2008; Wada, 2010; Kaneto and Wada, 2011). Notably, these animals lack bone and do not mineralize their skeletons. Interestingly, the amphioxus cirral skeleton shows features of both cartilage and bone, suggesting that this ancient skeleton might have diverged to form the cellular cartilage and bone of vertebrates (Kaneto and Wada, 2011). We argue that evaluating the interactions between Sox9 and Runx2 GRNs leads to a novel hypothesis for the evolution of bone.

Many studies in mammals and chick demonstrate that the Sox9 GRN is at least partially dominant to the Runx2 GRN. First, co-expression of Sox9 and Runx2 typically causes cartilage formation, not bone (Eames and Helms, 2004; Eames et al., 2004). Second, ectopic expression of Sox9 in Runx2-expressing cells of developing bone (achieved either normally during secondary cartilage formation or experimentally using Sox9 misexpression) diverts the cells to make cartilage, whereas ectopic Runx2 expression in Sox9-expressing cells of developing cartilage does not divert them to make bone (Eames et al., 2004). Third, Sox9 expression needs to be down-regulated for the full Runx2-dependent cartilage maturation program to be expressed (Akiyama et al., 2002; Eames et al., 2004). Fourth, Sox9 overexpression can inhibit Runx2 expression (Eames et al., 2004). Finally, and most conclusively, Sox9 directly binds to Runx2, inhibits its transcriptional activity, and increases ubiquitin-mediated degradation of Runx2 (Zhou et al., 2006; Cheng and Genever, 2010).

FIGURE 3 | During endochondral ossification, immature cartilage, mature cartilage, and bone differentiate under the control of Sox9 and Runx2 GRNs. Chondrocytes of immature cartilage, termed resting and proliferative chondrocytes during endochondral ossification, express high levels of genes in the Sox9 GRN. Genes known to be under direct transcriptional control of Sox9 or Runx2 are highlighted in red or green text, respectively. Chondrocytes of mature cartilage, termed prehypertrophic and hypertrophic chondrocytes during endochondral ossification, express low levels of genes in the Sox9 GRN as well as genes in the Runx2 GRN. Osteoblasts in perichondral and endochondral bone during endochondral ossification express genes in the Runx2 GRN. ∗*Col1* is one of the few genes expressed in osteoblasts but not in mature chondrocytes; Col10 expression in osteoblasts is high only in some vertebrates. Col11, Decorin, and Biglycan are expressed in all three of these skeletal cell types. Similar gene expression patterns are seen in immature cartilage, mature cartilage, and bone developing at the articular surface (not shown).

Given evidence that the Sox9 GRN can dominate the Runx2 GRN, the formation of mature cartilage during endochondral ossification, which requires both Sox9 and Runx2, must be exquisitely regulated (**Figure 3**). During early stages, Sox9 and Runx2 are co-expressed in mesenchymal condensations (Akiyama et al., 2002; Eames and Helms, 2004; Eames et al., 2004; Zhou et al., 2006), so Sox9 must exert a dominant inhibitory effect over Runx2 to produce immature cartilage. Later, Sox9 is down-regulated and Runx2 activity increases, triggering cartilage maturation (Eames et al., 2004; Yoshida et al., 2004; Hattori et al., 2010). In fact, Sox9 down-regulation is a crucial step for mature cartilage formation (Hattori et al., 2010). Despite this down-regulation, a role for Sox9 in very late stages of cartilage maturation also has been revealed (Ikegami et al., 2011; Dy et al., 2012). One study even suggests that Runx2 can inhibit Sox9 activity (Cheng and Genever, 2010), illustrating that complex feedback mechanisms are in place to achieve the appropriate relative levels of Sox9 and Runx2 activity. In summary, the preponderance of the published molecular genetic literature demonstrates that Sox9 has dominant effects over Runx2, and we extend this conclusion to generate a new hypothesis on the evolution of bone.
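The dominance logic summarized above can be caricatured as a small decision rule. The sketch below is a hypothetical toy, not a published model: it assumes expression levels scaled to [0, 1], represents Sox9 inhibition of Runx2 as simple linear damping of Runx2 activity, and uses arbitrary thresholds (`sox9_present`, `runx2_on`). It aims only to reproduce the qualitative outcomes described in the text.

```python
"""Toy decision rule for the Sox9/Runx2 logic summarized in this section.

Assumptions (hypothetical, for illustration only): expression levels are
scaled to [0, 1]; Sox9 damps Runx2 activity linearly; the two thresholds
are arbitrary. Real regulation involves binding, degradation, and feedback.
"""


def runx2_activity(sox9: float, runx2: float) -> float:
    """Runx2 activity after dominant inhibition by Sox9 (linear damping)."""
    for name, level in (("sox9", sox9), ("runx2", runx2)):
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {level}")
    return runx2 * (1.0 - sox9)


def skeletal_fate(sox9: float, runx2: float,
                  sox9_present: float = 0.1, runx2_on: float = 0.3) -> str:
    """Map Sox9/Runx2 levels to one of the three skeletal tissues."""
    activity = runx2_activity(sox9, runx2)
    if activity < runx2_on:
        # Runx2 absent, or held in check by high Sox9
        return "immature cartilage" if sox9 > sox9_present else "undifferentiated"
    # Runx2 GRN active: residual Sox9 -> mature cartilage; no Sox9 -> bone
    return "mature cartilage" if sox9 > sox9_present else "bone"


if __name__ == "__main__":
    for sox9, runx2 in [(0.9, 0.9), (0.3, 0.9), (0.0, 0.9), (0.9, 0.0)]:
        print(sox9, runx2, "->", skeletal_fate(sox9, runx2))
```

With these settings, high co-expression (0.9, 0.9) yields immature cartilage, down-regulated Sox9 with Runx2 (0.3, 0.9) yields mature cartilage, and Runx2 alone (0.0, 0.9) yields bone. The late role of Sox9 in cartilage maturation and the possible feedback of Runx2 onto Sox9 are deliberately left out.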

# Bone Evolved from Mature Cartilage

Combining evidence from traditional and modern studies, we hypothesize that the GRN underlying bone formation evolved from a GRN underlying mature cartilage formation (**Figure 4**). Functional, histological, embryological, and molecular similarities among immature cartilage, mature cartilage, and bone suggest that these tissues may share an evolutionary history (**Figure 1**). The fossil record, comparative anatomy, and embryology demonstrate that immature cartilage evolved first (**Figure 2**). Combined with molecular genetic data (**Figure 3**), this implies that the first skeletal GRN to evolve was dominated by the gene ancestral to Sox9, driving immature cartilage formation. This GRN likely also involved genes ancestral to Runx2 at early phylogenetic (and ontogenetic) stages. In gnathostomes, a Runx2 GRN drives formation of both mature cartilage and bone (**Figure 3**), but how did this novel GRN evolve to produce these novel skeletal tissues?

We propose that immature cartilage provided a structural and molecular "buffer" for the gradual development of this novel Runx2 GRN. The structural buffering effect refers to the fact that immature cartilage already had a functional role as a skeletal tissue, giving the evolving Runx2 GRN more freedom to develop new functions that simply modify a pre-existing skeletal tissue in a gradual, step-wise fashion. The molecular buffering effect refers to the partial dominance of the Sox9 GRN, which might have partially shielded the evolving Runx2 GRN from natural selection. This concept recalls the principle of "weak linkage," which contributes to evolvability by reducing the cost of generating variation (Kirschner and Gerhart, 1998; Gerhart and Kirschner, 2007).

We argue that these putative buffering effects provide a more parsimonious account of the gradual evolution of bone from mature cartilage than the alternative, which depends upon *de novo* establishment of bone in a more saltationist fashion (**Figure 4**). If bone had evolved before mature cartilage, then the Runx2 GRN would have been under much stronger natural selection than if it had been buffered by immature cartilage. Arguments that bone evolved from dentine suffer from the same limitation: how did dentine and its GRN appear? A new GRN appearing simultaneously with a completely new skeletal tissue, while possible, seems a less likely evolutionary scenario than the gradual establishment of the Runx2 GRN during the evolution of mature cartilage. Assembling a GRN driving bone formation *de novo* appears to depend upon saltationist genetic mechanisms, such as large-scale genomic changes or small genetic effects acting early in development. Regarding the latter possibility, chondrocytes and osteoblasts are known to share a relatively late embryonic progenitor (Day et al., 2005), so a small change acting early in development seems unlikely to have separated the osteoblast from the chondrocyte. Therefore, the former, "macromutational" saltationist mechanism, favored by Goldschmidt (Goldschmidt, 1940), would have had to operate in the *de novo* appearance of the osteoblast. Even saltationists granted that gradualism is the more common evolutionary mechanism (Gould, 2002). Therefore, based on the relative parsimony and abundance of gradualism versus saltationism, we favor a model in which the Runx2 GRN evolved within immature cartilage to produce mature cartilage, and then a different mesenchymal (non-chondrogenic) cell population co-opted this GRN, producing the world's first example of bone formation (**Figure 4B**).

FIGURE 4 | Differing models for the appearance of the GRN driving osteoblast formation. (A) In this scenario, the osteoblast (and the Runx2 GRN that drives its formation) appeared *de novo*, independent of the chondrocyte. This model is consistent with saltational evolution, in which large-scale genomic changes may facilitate the evolution of novelty over short periods of geologic time. (B) In an alternative scenario, the osteoblast appeared after a series of step-wise additions to the Runx2 GRN that drives mature chondrocyte formation. After establishment of the Runx2 GRN in mature chondrocytes, the osteoblast appeared when another population of cells co-opted the Runx2 GRN. This model is consistent with gradual evolution, in which a series of small changes over geologic time may facilitate the evolution of novelty. The sizes of the circles and polygons represent relative levels of up- or down-regulation of genes in the respective GRNs (see text for discussion of interactions between Sox9 and Runx2 GRNs).

The hypothesis that bone evolved from mature cartilage is also consistent with a variety of other observations on skeletal tissues (Fisher and Franz-Odendaal, 2012). During evolution, the features of mature cartilage seen in various vertebrate taxa did not appear at the same time (Hall, 1975; Smith and Hall, 1990). Hypertrophy and mineralization appeared first, followed by cartilage matrix degradation, replacement by fat and endochondral bone deposition, and finally invasion by the vasculature (in tetrapods). These findings suggest that cartilage maturation is a highly evolvable process. Also, the evolutionary progression from immature cartilage to mature cartilage to bone is mirrored during endochondral ossification. Recent cell lineage analyses suggest that some cells that express immature cartilage genes go on to express mature cartilage genes and finally bone genes, effectively transitioning from an immature chondrocyte to a mature chondrocyte to an osteoblast (Hammond and Schulte-Merker, 2009; Zhou et al., 2014; Park et al., 2015). Finally, gene expression patterns appear to overlap much more between mature cartilage and bone in actinopterygians, such as teleosts, than in sarcopterygians, such as tetrapods (Eames et al., 2012). This may reflect differential retention of molecular signatures of the shared evolutionary history of mature cartilage and bone in earlier-diverging versus later-diverging vertebrates.

# Comparative Transcriptomics: A Novel Approach to Solve Evo-Devo Issues

Identification of homologous tissue types among different taxonomic lineages using histology and cell morphology has enabled evolutionary studies of histogenesis, but modern molecular techniques will dramatically expand this field. Traditionally, comparative anatomy established homologies at the levels of organs, tissues, and cells. Homology among cartilage-like tissues can be relatively clear for closely related species, but it can prove more difficult to establish when comparing distant clades, where clade-specific differences can obscure it. For example, histological features, such as the cellularity of a tissue, may confuse homology designation; cartilage is cellular in vertebrates but acellular in hemichordates (Smith et al., 2003; Cole and Hall, 2004; Rychel et al., 2006). In addition, three types of agnathan cartilage have been distinguished by histology: hard cartilage, soft cartilage, and mucocartilage (Zhang and Cohn, 2006; Zhang et al., 2009; Cattell et al., 2011). Which of these is homologous to the hyaline cartilage of gnathostomes, or are all of them? Modern evolutionary thinking looks past such superficial histological differences, emphasizing instead the importance of tracking changes to the underlying molecular genetic factors during trait evolution.

Evolutionary studies of skeletal cells will benefit from transcriptomic techniques, such as RNA-seq, that enable characterization of their molecular fingerprints, that is, the sets of genes expressed in a homogeneous population of cells (Arendt, 2003). Comparing the molecular fingerprints of distinct cell types has yielded insight into evolutionary relationships among remote animal clades (Arendt, 2005, 2008; Eames et al., 2012). A few technologies can generate molecular fingerprints, but of these, RNA-seq currently produces the most robust, unbiased results (Necsulea and Kaessmann, 2014). Advantages of RNA-seq include a higher dynamic range, allowing the detection of transcripts expressed at very high or very low levels, and the ability to detect novel genes and alternative splice variants in samples from any animal (Wang et al., 2009). Importantly for evolutionary studies, RNA-seq allows an accurate comparison of molecular fingerprints in both closely and distantly related species (Necsulea and Kaessmann, 2014; Pantalacci and Semon, 2015).

Tracking the gene expression patterns that underlie a homologous trait through phylogeny provides unparalleled insight into the molecular mechanisms of evolution. In fact, comparative transcriptomics might reveal that two tissues are homologous (so-called "deep homology"; Shubin et al., 2009) despite superficial histological or cellular differences. For example, the presence of immature cartilage in a variety of invertebrate taxa raises the possibility that a tissue with deep homology to cartilage was present in the ancestor of all metazoans (**Figure 2B**). Likewise, identifying invertebrate tissues that express "bone genes" may reveal deep homology of these cells to osteoblasts, which might have facilitated the *de novo* appearance of the Runx2 GRN underlying bone formation. Genes in the vertebrate *Sparc* family play a role in skeletal matrix mineralization *in vitro* (Termine et al., 1981; Pataquiva-Mateus et al., 2012). Although similar *in vivo* roles for *Sparc* genes have not been clearly demonstrated (Roach, 1994; Gilmour et al., 1998; Rotllant et al., 2008), comparative genomics reveals a clear correlation between some *Sparc* genes and bone formation (Kawasaki and Weiss, 2006; Martinek et al., 2007; Koehler et al., 2009; Bertrand et al., 2013; Venkatesh et al., 2014). Interestingly, *Sparc* genes are expressed in amphioxus, which has neither bone nor mineralized tissues (Bertrand et al., 2013). If Runx2 co-opted regulation of these genes during the *de novo* appearance of the osteoblast, then *Sparc*-expressing cells in amphioxus may have deep homology to osteoblasts.

Comparative transcriptomics can be used to quantitatively evaluate important features of GRN evolution, including constraint and adaptation. Although Gould recently revived the formalist pleas of Galton, Whitman, and others for constraint to play a positive role during evolution (Gould, 2002), constraint is commonly considered a restriction or limitation on the evolutionary process (Arnold, 1992). Evidence of constraint can be seen when transcriptomes are highly conserved among various tissues or clades, presumably due to genomic, developmental, or structural limitations. In addition to these constraints, a GRN under stabilizing selection would not vary much with respect to the genes expressed and their levels of expression, thus giving a transcriptomic signal of constraint. In fact, the architecture of GRN kernels, which usually consist of transcription factors and other regulatory genes, can remain highly conserved for long periods of time (Levine and Davidson, 2005; Davidson and Erwin, 2006). In contrast, adaptation is commonly considered a positive force for change during evolution (Gould, 2002; Stayton, 2008; Losos, 2011). Evidence of adaptation can be seen when transcriptomes differ widely among various tissues or clades, presumably in response to tissue- or clade-specific selective pressures. A GRN under negative or positive selection would vary considerably in the genes expressed and their levels of expression.

Comparative transcriptomics has unraveled the complexity of several important developmental and evolutionary processes in both invertebrate (Levin et al., 2012; McKenzie et al., 2014) and vertebrate organisms (Chan et al., 2009; Brawand et al., 2011). A major challenge in evolutionary biology is to explain the appearance of novel traits and the GRNs underlying their formation. Two different models have been proposed, only one of which currently receives much experimental support. In the first model, a GRN driving a novel trait also evolved *de novo* (**Figure 4A**). For example, orphan genes, or genes without clear family members, might be important drivers of evolutionary novelty. First described in the yeast genome (Dujon, 1996), they also occur in many other taxa, including rodents, primates, and humans (Heinen et al., 2009; Toll-Riera et al., 2009a,b; Li et al., 2010). Orphan genes might have appeared *de novo* from non-coding sequences rather than from existing genes (Tautz and Domazet-Loso, 2011). Subsequent interactions that these orphan genes establish with other genes would create a novel GRN capable of driving the formation of a novel trait. This "*de novo*" model has received little experimental support in metazoans, but currently serves as the basis for the hypothesis that bone (or dentine, if dentine appeared before bone during evolution) evolved before mature cartilage (**Figure 4A**). In molecular terms, the GRN driving formation of the osteoblast would have appeared *de novo*, presumably in a short evolutionary timeframe.

In the second model for the appearance of evolutionary novelties, which is increasingly supported by the literature, a novel trait appears by co-opting a pre-existing GRN (**Figure 4B**; Fisher and Franz-Odendaal, 2012; Achim and Arendt, 2014). For example, comparative genomic studies on muscle cells, immune cells, and neurons suggested that these cell types evolved by co-opting pre-existing genetic systems (Achim and Arendt, 2014). In addition, the appearance of a novel embryonic cell lineage in vertebrates, the neural crest cell, has been argued to result from the co-option of pre-existing GRNs that were employed by cells in the neural tube, notochord, and pharynx in ancestral chordates (Baker and Bronner-Fraser, 1997; Donoghue and Sansom, 2002; Meulemans and Bronner-Fraser, 2005, 2007; McCauley and Bronner-Fraser, 2006; Zhang and Cohn, 2006). In fact, the neural crest-derived vertebrate cartilaginous head skeleton might have arisen after neural crest cells co-opted an ancestral chordate GRN that was used for cartilage formation in other parts of the body (Jandzik et al., 2015). Here, we use the same argument to support our idea that the osteoblast appeared when a non-chondrogenic mesenchymal cell co-opted expression of the mature cartilage Runx2 GRN.

# Comparative Transcriptomics and Skeletal Tissue Evolution

How extensive is our understanding of the GRNs driving cartilage and bone formation? As outlined above, Sox9 and Runx2 GRNs are critical in a variety of vertebrates, but is this the whole story? Few studies have analyzed the molecular fingerprints of the chondrocyte and osteoblast using unbiased transcriptomics, but such experiments may identify unknown GRNs driving the formation of these cell types. The chondrocyte molecular fingerprint was estimated by compiling data from the literature and summarizing their interactions into a GRN (Cole, 2011). Recently, transcriptomics on Sox9 and Runx2 loss-of-function skeletal cells *in vitro* has shed light on Sox9 and Runx2 GRNs that are relevant to chondrocyte and osteoblast differentiation (Oh et al., 2014; Wu et al., 2014). A promising future direction is to use transcriptomics to define these GRNs *in vivo* using *Sox9* and *Runx2* loss-of-function animals. Comparative transcriptomics between vertebrae and gill arch skeletal elements of a teleost demonstrated a high degree of overlap in gene expression between these two tissues (Vieira et al., 2013), but the presence of multiple cell types, including chondrocytes and osteoblasts, in both samples confounds attribution of these data to a particular cell type. Therefore, more specific techniques should be used to isolate a pure population of cells *in vivo* in order to accurately reveal and compare the molecular fingerprints of different skeletal cell types (**Figure 3**).

Two related, fascinating questions remain for future research: how did the GRNs directing skeletal cell differentiation appear, and how did they evolve afterward? In this review, we argue that gradual establishment of the Runx2 GRN during evolution of the mature chondrocyte (subsequently co-opted by a non-chondrogenic mesenchymal cell to form bone) is more parsimonious than the *de novo* appearance of the Runx2 GRN in osteoblasts (**Figure 4**). If the latter possibility holds, however, the tremendous gene expression similarities between mature cartilage and bone in tetrapods may instead reflect co-option of the Runx2 GRN by the mature chondrocyte after it was established in the osteoblast. These possibilities predict divergent vs. convergent evolution, respectively, of the Runx2 GRN in mature chondrocytes after the appearance of the osteoblast. Therefore, we propose an examination of skeletal cell molecular fingerprints in a variety of vertebrates to resolve this issue. Our divergent model predicts that the overlap between mature chondrocyte and osteoblast molecular fingerprints will decrease in more recently evolved organisms (**Figure 5A**). For example, molecular fingerprints of mature chondrocytes and osteoblasts would overlap more in earlier-diverged lineages of vertebrates, such as teleosts, than in later-evolved lineages, such as amphibians or mammals. The convergent model, on the other hand, predicts the opposite result (**Figure 5B**).
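The divergent and convergent models make opposite predictions about one measurable quantity: the overlap between chondrocyte and osteoblast fingerprints across lineages ordered by divergence time. The sketch below assumes fingerprints are already available as gene sets and scores their overlap with the Jaccard index, one of several possible overlap measures, chosen here only for simplicity. The function names, lineage labels, and gene sets are hypothetical placeholders, and a real test would need replicates and a statistical model rather than the strict monotonic check used here.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared genes divided by all genes in either fingerprint (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_trend(lineages: List[Tuple[str, Set[str], Set[str]]]) -> str:
    """Classify the trend in chondrocyte/osteoblast fingerprint overlap.

    `lineages` holds (name, chondrocyte_genes, osteoblast_genes) tuples and is
    assumed to be ordered from earlier- to later-diverging lineages.
    """
    scores = [jaccard(chondro, osteo) for _, chondro, osteo in lineages]
    if len(scores) < 2:
        return "unresolved"  # a trend needs at least two lineages
    steps = list(zip(scores, scores[1:]))
    if all(later < earlier for earlier, later in steps):
        return "divergent"   # overlap shrinks in later lineages
    if all(later > earlier for earlier, later in steps):
        return "convergent"  # overlap grows in later lineages
    return "unresolved"


# Placeholder fingerprints (not real data), ordered teleost -> amphibian -> mammal.
chondro = {"g1", "g2", "g3", "g4"}
example = [
    ("teleost", chondro, {"g1", "g2", "g3", "g5"}),
    ("amphibian", chondro, {"g1", "g2", "g5", "g6"}),
    ("mammal", chondro, {"g1", "g5", "g6", "g7"}),
]
print(overlap_trend(example))  # divergent
```

Here the overlap falls from 0.6 to about 0.33 to about 0.14, the pattern the divergent model predicts; reversing the lineage order would yield "convergent".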

But do skeletal cell molecular fingerprints evolve in clade-specific manners? The few studies that have addressed this question suggest two competing ideas. On the one hand, molecular fingerprints of the chondrocyte and the osteoblast have been proposed to be highly constrained among various vertebrate clades (**Figure 6A**; Fisher and Franz-Odendaal, 2012; Vieira et al., 2013). On the other hand, gene expression comparisons between gar, zebrafish, chick, and mouse suggest that the chondrocyte molecular fingerprint is constrained among vertebrates, while the osteoblast molecular fingerprint varies, perhaps in response to clade-specific selective pressures (**Figure 6B**; Eames et al., 2012). Interestingly, generalizing these results leads to the hypothesis that earlier-evolved cell types, in this case chondrocytes, might be more constrained in their gene expression than cell types that appeared later, such as osteoblasts, perhaps due to stabilizing selection over geologic timescales. Comparative transcriptomics can quantify constraint and adaptation by measuring how transcript levels vary among samples from different taxonomic lineages.
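One simple way to turn "how transcript levels vary among lineages" into a number is the per-gene coefficient of variation (standard deviation divided by mean) across lineages, averaged over the genes of a cell type's fingerprint; lower values would indicate stronger constraint. The sketch below is hypothetical: it assumes expression values for orthologous genes have already been normalized to be comparable across species, the function names are illustrative, and the values are invented placeholders. More rigorous analyses would model expression evolution along the phylogeny rather than treating lineages as independent samples.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(levels: List[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if the mean is not positive)."""
    mean = statistics.mean(levels)
    return statistics.pstdev(levels) / mean if mean > 0 else 0.0


def mean_cv(expression: Dict[str, List[float]]) -> float:
    """Average per-gene CV across lineages; lower values suggest stronger constraint."""
    return statistics.mean([coefficient_of_variation(v) for v in expression.values()])


# Placeholder expression levels (not real data), one value per lineage,
# in the order gar, zebrafish, chick, mouse.
chondrocyte = {"geneA": [10.0, 11.0, 9.0, 10.0], "geneB": [50.0, 52.0, 48.0, 50.0]}
osteoblast = {"geneA": [10.0, 30.0, 5.0, 20.0], "geneB": [50.0, 10.0, 80.0, 40.0]}

# The osteoblast values vary more among lineages, so its mean CV is higher.
print(f"chondrocyte mean CV: {mean_cv(chondrocyte):.3f}")
print(f"osteoblast mean CV:  {mean_cv(osteoblast):.3f}")
```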

In the future, comparative transcriptomics will elucidate the dynamics of skeletal cell type evolution, identifying lineage-specific changes in gene expression, providing quantitative measures of constraint and adaptation, and potentially establishing deep homology of skeletal cells with previously unappreciated cell types. Indeed, appropriate application of comparative transcriptomics has the potential to revolutionize understanding of the molecular mechanisms of trait evolution.

#### Summary

Given the role that fossilized bones played in devising early evolutionary theory, skeletal tissue evolution has fascinated scientists for centuries. In particular, the appearance of bone as an evolutionary novelty demands explanation, which modern molecular and embryological techniques address in ways never imagined by studies of the fossil record alone. Here, we focus on the three main skeletal tissues present in vertebrates (immature cartilage, mature cartilage, and bone), and use findings from both traditional and modern studies to argue that bone evolved from mature cartilage. Standing in contrast to the available fossil record, which suggests that bone appeared prior to mature cartilage, this hypothesis posits that a GRN driving traits such as matrix mineralization in mature cartilage was co-opted by non-chondrogenic mesenchymal cells to produce bone. Alternatively, the GRN driving bone formation may have evolved first and subsequently been co-opted by mature cartilage, but we argue on the basis of parsimony that this scenario would be more complicated to achieve. Comparing the molecular fingerprints of skeletal tissues in agnathans and sister chordate species with those in vertebrates might resolve among these possibilities. In addition to revealing the origins of evolutionary novelties, tracking the molecular fingerprints of skeletal cells in various vertebrate lineages can provide quantitative measures of constraint and adaptation within the GRNs that govern the formation of skeletal tissues. Therefore, we strongly believe that this novel approach may revolutionize understanding of the evolution of cartilage and bone and, more generally, provide a modern paradigm for molecular genetic changes during the evolutionary process.

#### Acknowledgments

PGP was supported by scholarships from the Consejo Nacional de Ciencia y Tecnologia (CONACYT Mexico), Saskatchewan Innovation and Opportunity Graduate Scholarships (SIOGS), and College of Graduate Studies and Research at the University of Saskatchewan. BFE was supported on this study by a Saskatchewan Health Research Foundation (SHRF) Establishment Grant, a Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, and a Canadian Institutes of Health Research (CIHR) New Investigator Salary Award.

#### References

identifies a novel dimension to control of osteoblastogenesis. *Genome Biol.* 15:R52. doi: 10.1186/gb-2014-15-3-r52


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Gómez-Picos and Eames. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The TALE face of Hox proteins in animal evolution

#### Samir Merabet<sup>1,2</sup>\* and Brigitte Galliot<sup>3</sup>

<sup>1</sup> Centre National de Recherche Scientifique, Institut de Génomique Fonctionnelle de Lyon, Lyon, France, <sup>2</sup> Institut de Génomique Fonctionnelle de Lyon, Ecole Normale Supérieure de Lyon, Lyon, France, <sup>3</sup> Department of Genetics and Evolution, Faculty of Science, Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland

Hox genes are major regulators of embryonic development. One of their most conserved functions is to coordinate the formation of specific body structures along the anterior-posterior (AP) axis in Bilateria. This architectural role underlies several morphological innovations across bilaterian evolution. In this review, we trace the origin of the Hox patterning system by considering its partnership with PBC and Meis proteins. PBC and Meis belong to the TALE class of homeodomain-containing transcription factors and act as generic cofactors of Hox proteins for AP axis patterning in Bilateria. Recent data indicate that Hox proteins acquired the ability to interact with their TALE partners in the last common ancestor of Bilateria and Cnidaria. These interactions initially relied on a short peptide motif called the hexapeptide (HX), which is present in Hox and non-Hox protein families. Remarkably, Hox proteins can also recruit the TALE cofactors by using specific PBC Interaction Motifs (SPIMs). We describe how a functional Hox/TALE patterning system emerged in eumetazoans through the acquisition of SPIMs. We anticipate that such interaction flexibility could also be found in other patterning systems, where it may lie at the heart of the astonishing morphological diversity observed in the animal kingdom.

Keywords: Hox, PBC, Meis, Metazoa, patterning, early-branching phyla, HX, SPIMs

# Introduction

The phenotypic diversity observed in the animal kingdom arose from genetic innovations that modulate developmental processes, a step in evolution that often precedes speciation events (Gould, 1992; Arthur, 2002). A major challenge in biology is to characterize these genetic innovations and to understand how they impact developmental processes. Remarkably, the specification of body plans and body parts in species as different as humans and flies is controlled by a relatively small and highly conserved genetic repertoire called the "genetic toolkit" (True and Carroll, 2002; Erwin, 2009). This genetic toolkit, which acts at restricted stages of embryonic development, encodes molecules involved in cell-cell communication and gene regulation (Mann and Carroll, 2002). In several bilaterian species, components of the genetic toolkit have been described as forming character identity networks (Wagner, 2007), or kernels, which are part of the large developmental networks that underlie body plan development (Davidson and Erwin, 2006). Several members of the genetic toolkit are also expressed in choanoflagellates, indicating that they originated prior to the emergence of the first metazoans (King et al., 2003; King, 2004; Wenger and Galliot, 2013).

**Abbreviations:** ANTP, Antennapedia; AP, anterior posterior; HD, Homeodomain; HX, Hexapeptide; PG, Paralog Group; SPIM, Specific PBC Interaction Motif; TALE, Three Amino acid Loop Extension; TF, Transcription factor.

#### Edited by:

Sylvain Marcellini, University of Concepcion, Chile

#### Reviewed by:

Ingo Braasch, University of Oregon, USA

Pedro Martinez, Universitat de Barcelona, Spain

#### \*Correspondence:

Samir Merabet, Centre National de Recherche Scientifique, Ecole Normale Supérieure de Lyon, UMR5242, 46 Allée d'Italie, Lyon 69007, France samir.merabet@ens-lyon.fr

#### Specialty section:

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> Received: 14 June 2015 Accepted: 31 July 2015 Published: 18 August 2015

#### Citation:

Merabet S and Galliot B (2015) The TALE face of Hox proteins in animal evolution. Front. Genet. 6:267. doi: 10.3389/fgene.2015.00267

The large majority of contemporary animals belong to Bilateria, which are characterized by three embryonic germ layers (ectoderm, mesoderm, endoderm) and a bilateral symmetry that results from the orthogonal intersection of two axes: the anterior-posterior (AP) axis (also referred to as the primary axis) and the dorso-ventral (DV) axis (also referred to as the secondary axis). Bilaterians radiated during the Cambrian period some 500–550 million years ago. Other extant non-bilaterian species belong to Porifera (sponges), Ctenophora, Placozoa (Trichoplax), and Cnidaria, whose ancestors predate the Cambrian explosion; these are therefore often named early-branched phyla (**Figure 1**). With the exception of Placozoa, species from these early-branched phyla display different types of symmetry: radial (as seen in sponge larvae, some adult sponges, and most cnidarians), biradial (as seen in ctenophores), or partly bilateral (as seen in sea anemone species that belong to the anthozoan class of cnidarians). These various symmetries are especially evident during embryogenesis and larval stages and depend on the formation of a primary body axis (Ryan and Baxevanis, 2007).

Cnidaria, a sister group to Bilateria, share with them typical eumetazoan features, i.e., an ectodermal layer that differentiates as an epidermis, an endodermal layer that differentiates as a gut, and a nervous system, which, at the oral pole, allows active feeding behavior. Cnidaria also include a large variety of taxa with a wide spectrum of morphological diversity. Altogether, these characteristics place Cnidaria at a key phylogenetic position for tracing the emergence of molecular innovations that underlie developmental changes and diversification in animal evolution (Steele et al., 2011). Representatives of the main gene families involved in the specification of eumetazoan features are also found in Cnidaria (Martindale, 2005). Their study is, however, more challenging due to the lack of advanced genetic tools for establishing transgenic animals with stable gene expression or silencing in a tissue- and/or stage-specific manner.

Among the conserved developmental gene families are the Hox genes, which are considered the "Rosetta Stone" of the genetic toolkit. Hox genes were initially discovered in Drosophila, then rapidly investigated in vertebrate species, revealing strikingly conserved features throughout bilaterian lineages (Lewis, 1978; McGinnis and Krumlauf, 1992; Kmita and Duboule, 2003). These conserved properties have been discussed in several reviews and relate to their clustered genomic organization, which constrains embryonic expression (Duboule, 2007), but also to the presence of several typical protein signatures (Ogishima and Tanaka, 2007; Merabet et al., 2009). Modifications in Hox gene expression or Hox protein function have been linked to several morphological innovations during the evolution of bilaterians (Pearson et al., 2005; Heffer et al., 2013). The presence of Hox genes in Cnidaria therefore raised the question of their role in the emergence of innovations shared by cnidarians and bilaterians, as well as of innovations responsible for the morphological diversity observed among cnidarian species.

The most spectacular observation came from the embryo of the cnidarian sea anemone Nematostella vectensis, where several Hox-related genes show a staggered-like expression pattern along the oral-aboral (OA) axis. This expression profile led to the proposal that the cnidarian OA axis could be homologous to the bilaterian AP axis (Finnerty et al., 2004; Matus et al., 2006). However, the OA expression profile of Nematostella Hox genes is neither conserved in other cnidarian lineages nor strictly consistent with the collinear rules normally observed in Bilateria (Gauchat et al., 2000; Finnerty et al., 2004; Kamm et al., 2006; Ryan et al., 2007; Chiori et al., 2009). These additional observations led to the opposite conclusion that a Hox patterning system likely does not exist in Cnidaria (Kamm et al., 2006).

Surprisingly, the question of the evolution of Hox patterning mechanisms is rarely approached at the protein level, in particular by considering members of the PBC and Meis families. PBC and Meis are crucial patterning cofactors of Hox proteins along the AP axis, a partnership that is evolutionarily conserved throughout Bilateria (Moens and Selleri, 2006; Mann et al., 2009). PBC and Meis belong to the TALE (Three Amino acid Loop Extension) class of homeodomain (HD)-containing transcription factors (Bürglin, 1997), are widely conserved across metazoans, and can therefore be used as a molecular hallmark of the Hox patterning system. In this review, we report how the intricate interaction properties between Hox and TALE proteins were progressively acquired during pre-bilaterian animal evolution to eventually constitute a major patterning system.

#### Origin and Early Evolution of the Hox/ParaHox and PBC/Meis Gene Families

Hox proteins belong to the ANTP (Antennapedia) class of HD-containing transcription factors. This class contains two large groups of sister gene families: (i) the non-Hox ANTP-class group, which includes the NK and Extended (Ext)-Hox gene families, and (ii) the Hox/ParaHox genes (Garcia-Fernàndez, 2005). The Hox gene family is usually organized in clusters and contains several paralog groups (PGs) that are classified as anterior (PG1-3), central (PG4-8), and posterior (PG9-14) (Duboule, 2007). The ParaHox family contains three clustered genes, Gsx, Pdx/Xlox, and Cdx, initially discovered in the cephalochordate amphioxus (Brooke et al., 1998). ParaHox genes share common ancestors with specific Hox gene families: Gsx and Pdx/Xlox with the anterior PG2/PG3, and Cdx with the posterior PG9 (Quiquand et al., 2009).

Two different scenarios have been proposed to explain the evolutionary history of the Hox/ParaHox gene family with regard to the other ANTP-class members. In the first, the Hox/ParaHox family is specific to eumetazoans (which comprise Bilateria, Cnidaria, and Placozoa) and would have originated from duplications of a ProtoHox gene derived from NK genes and related to Evx/Mox (Ext-Hox family) (Gauchat et al., 2000; Minguillón and Garcia-Fernàndez, 2003; Larroux et al., 2007; Quiquand et al., 2009; Ryan et al., 2010) (**Figure 1A**). This scenario is supported by the presence of a Gsx ParaHox gene in the genome of the placozoan Trichoplax adhaerens (Schierwater and Kuhn, 1998; Schierwater et al., 2008b), and by the presence of several NK representatives and the absence of Hox/ParaHox genes in the genomes of the ctenophores Mnemiopsis leidyi (Ryan et al., 2010) and Pleurobrachia bachei (Moroz et al., 2014) and the demosponge Amphimedon queenslandica (Srivastava et al., 2010). However, this scenario is challenged by the "ghost loci hypothesis", which postulates that Hox/ParaHox genes were already present in the Last Common Ancestor (LCA) of metazoans and were secondarily lost in Porifera over evolutionary time (Mendivil Ramos et al., 2012). In this second scenario, the Hox/ParaHox and NK families emerged independently from a common ProtoANTP ancestor gene (**Figure 1B**). The recent finding of a Cdx-like ParaHox gene in the genomes of two calcareous sponges (Fortunato et al., 2014) now argues in favor of the ghost loci hypothesis.

In addition to ANTP, other classes of HD-containing transcription factors are also present in early-branched phyla. These include the Paired-like, Pax, POU, LIM, Six, and TALE classes (Galliot and de Vargas, 1999; Larroux et al., 2008; Srivastava et al., 2010; Holland, 2013; Fortunato et al., 2014). Members of the TALE class contain an atypical 63-residue-long HD, due to the presence of three extra residues between helices 1 and 2 of the HD (Mukherjee and Bürglin, 2007). TALE-class members are among the most ancient transcription factors in eukaryotes, with several of them present in unicellular organisms, plants, and fungi (Bürglin, 1997, 1998), thereby predating the origin of animals. Interestingly, TALE-class members can interact with different types of HD-containing proteins in plants (Bellaoui et al., 2001; Hackbusch et al., 2005; Kanrar et al., 2006; Hay and Tsiantis, 2010), fungi (Keleher et al., 1989; Stark and Johnson, 1994; Carr et al., 2004), and animals (Bürglin, 1998).

The TALE class comprises five families (PBC, Meis, Iro, TGIF, and MKX), of which two, PBC and Meis, are known to interact with ANTP members (Mukherjee and Bürglin, 2007). PBC and Meis were already present before multicellular organisms appeared (King et al., 2008; Clarke et al., 2013; Suga et al., 2013; **Figure 1**). Animal representatives of PBC and Meis include the Pbx1-4 or Extradenticle (Exd) and Meis1-3 or Homothorax (Hth) proteins, as named in mammals and in the fruit fly Drosophila melanogaster, respectively. The PBC and Meis families originated from the duplication of a common ancestral gene named MEINOX, and this duplication was proposed to coincide with the appearance of the first Hox cluster in metazoans (Bürglin, 1998). Genome comparisons between early-branched metazoan species and unicellular organisms now establish that the PBC/Meis duplication predated the ANTP class and therefore the Hox/ParaHox family. Nevertheless, Thomas Bürglin was the first to consider the partnership between Hox and TALE proteins as an informative molecular hallmark for tracing the origin of the Hox patterning system in Metazoa (Bürglin, 1998).

#### The Ground State of Hox/TALE Interaction Networks in Bilateria: Role of the Hexapeptide (HX) Motif

The formation of Hox/PBC/Meis complexes in Bilateria has been described as relying on Hox-PBC and PBC-Meis interactions (**Figure 2A**). Interaction between PBC and Meis involves the N-terminal PBC-A and Meis-A domains, respectively (Mann and Affolter, 1998). In the absence of Meis, the PBC-A domain masks two nuclear localization signals located in the HD of PBC. Interaction with Meis relieves this masking activity of the PBC-A domain, allowing the nuclear translocation of PBC (Saleh et al., 2000; Stevens and Mann, 2007).

Interactions between Hox and PBC have been extensively studied at the biochemical and structural levels. All these analyses converge to show a preponderant role for a short conserved motif present in Hox proteins, named the hexapeptide (HX) (Mann et al., 2009). The HX motif lies upstream of the HD and contains a core Y/FPWM sequence in all but Abdominal-B group Hox proteins, which have a more divergent sequence (Merabet et al., 2009). More generally, the HX motif is defined as a PBC interaction motif (PIM) that contains an invariant tryptophan residue located in a hydrophobic environment, followed by basic residues from +2 to +5 (In der Rieden et al., 2003). Crystal structures of vertebrate and invertebrate Hox/PBC complexes solved with anterior, central, or posterior Hox proteins point to the critical role of the tryptophan residue in maintaining strong interactions within the hydrophobic pocket formed in part by the three extra residues of the PBC HD (Passner et al., 1999; Piper et al., 1999; LaRonde-LeBlanc and Wolberger, 2003; Joshi et al., 2007). A recent structural analysis of the Hox/PBC complex bound to a physiological DNA-binding site further underlined that Hox paralog-specific residues located in the N-terminal arm of the HD and in the linker region connecting the HX motif to the HD are important for recognizing a specific shape of the DNA minor groove in the presence of PBC (Joshi et al., 2007). SELEX-seq-based approaches confirmed that Drosophila Hox/PBC complexes preferentially recognize different nucleotide sequences characterized by distinct minor groove topographies (Slattery et al., 2011). These results open new avenues for understanding the molecular mechanisms underlying Hox and Hox/PBC DNA-binding specificity (Abe et al., 2015).
Nevertheless, the systematic involvement of a unique Hox protein motif in the interaction with PBC does not easily explain the broad variety of functions that Hox/TALE complexes have in vivo (Hueber and Lohmann, 2008; Mann et al., 2009).
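The PIM definition above is simple enough to express as a sequence scan. The sketch below is a hypothetical helper, not a tool from the cited studies: it flags Trp residues upstream of the homeodomain that are followed by an R or K at offsets +2 to +5, and labels those sitting in a Y/FPWM core as "typical". It does not evaluate the hydrophobic environment of the Trp, the function and parameter names are illustrative, and the 30-residue maximum distance from the HD is an assumed parameter.

```python
import re
from typing import List, Tuple

TYPICAL_CORE = re.compile(r"[YF]PWM")  # typical HX core, with W at the third position


def find_hx_candidates(protein: str, hd_start: int, basic_min: int = 2,
                       basic_max: int = 5, max_distance: int = 30) -> List[Tuple[int, str]]:
    """Return (0-based position, label) for each Trp that could anchor an HX/PIM motif.

    protein: one-letter amino acid sequence; hd_start: 0-based index of the first HD residue.
    A Trp is kept if an R or K occurs at offsets basic_min..basic_max after it and it lies
    at most max_distance residues upstream of the HD. The hydrophobic context is not scored.
    """
    seq = protein.upper()
    hits = []
    for i in range(max(0, hd_start - max_distance), hd_start):
        if seq[i] != "W":
            continue
        window = seq[i + basic_min: i + basic_max + 1]
        if not any(residue in "RK" for residue in window):
            continue
        core = seq[max(0, i - 2): i + 2]  # residues -2..+1 around the Trp
        label = "typical" if TYPICAL_CORE.fullmatch(core) else "atypical"
        hits.append((i, label))
    return hits


# Hypothetical sequences: a short upstream region followed by a placeholder HD ("X" residues).
hd = "X" * 10
print(find_hx_candidates("MAAYPWMKRQGS" + hd, hd_start=12))  # [(5, 'typical')]
print(find_hx_candidates("MAAQSWAAKQGS" + hd, hd_start=12))  # [(5, 'atypical')]
```

Exposing the basic-residue window and the distance cut-off as parameters makes it easy to test how sensitive a motif survey is to these choices.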

#### Specific PBC Interaction Motifs (SPIMs) as Versatile Complements to Diversify Hox/TALE Interaction Properties in Bilateria and Cnidaria

Our knowledge of Hox-TALE interaction properties results mostly from in vitro approaches. In addition, the duplication of Pbx and Meis genes in vertebrates could provide a supplementary layer of complexity. For example, direct Hox-Meis interactions have been described for mouse proteins, but their functional significance remains to be elucidated (Shen et al., 1997; Williams et al., 2005). Evidence for alternative modes of Hox-PBC interaction came from the observation that mutating the HX motif does not necessarily affect PBC-dependent functions of Hox proteins in the Drosophila embryo (Galant et al., 2002; Merabet et al., 2003). Additionally, several central and posterior Hox proteins from vertebrates and invertebrates interact with the TALE cofactors independently of the HX motif in vitro and in vivo (Hudry et al., 2012). Interestingly, HX-independent interactions between Hox and PBC are most often observed in the presence of Meis, and the involvement of Meis in such HX-independent interactions depends on its binding to DNA near the Hox/PBC binding site (Hudry et al., 2012). In other words, by acting at the level of target cis-regulatory sequences, Meis helps diversify the modes of Hox-PBC interaction and thus Hox functions (Merabet and Hudry, 2013).

The flexibility of Hox-TALE interaction properties is predicted to rely on Hox protein motifs that are more gene-specific than the generic HX motif. These motifs are named SPIMs [Specific PBC Interaction Motifs; see also Merabet and Hudry, 2013]. Like the HX motif, SPIMs belong to the so-called short linear motifs, which are typically 5–10 residues long and most often located within intrinsically disordered protein regions (Tompa et al., 2014). Two such motifs have been identified in the Drosophila Ultrabithorax (Ubx) and AbdominalA (AbdA) proteins (Merabet et al., 2007, 2011; Hudry et al., 2012). One of them is conserved in insect AbdA proteins, with a core TDWM sequence reminiscent of the HX motif. The other motif, named UbdA, is conserved between the protostome Ubx and AbdA proteins (Balavoine et al., 2002). Recent structural analyses showed that the UbdA motif constitutes a flexible extension of the HD that can establish direct contacts with the PBC partner (Foos et al., 2015). Altogether, studies with Ubx and AbdA confirm that Hox-TALE interactions and functions can rely on species- and/or paralog-specific motifs.

SPIMs remain to be identified in the majority of Hox proteins that engage in HX-independent interactions with the TALE cofactors. Still, the use of different SPIMs by Hox proteins constitutes an appealing molecular strategy for supporting the specific patterning functions of Hox/TALE complexes during development (**Figure 2B**). Moreover, the conservation of this property in vertebrate and invertebrate species (Hudry et al., 2012) strongly suggests that interaction flexibility is ancient in Bilateria. Consequently, it is of utmost interest to trace its origin beyond Bilateria and assess its role in developmental and/or patterning functions.

#### TABLE 1 | Presence or absence of the HX motif among the Hox/ParaHox and non-Hox/ParaHox families across Metazoa.

The color code indicates the presence or absence of the HX motif, as well as incomplete or non-annotated genes, as indicated. Boxes outlined in yellow in non-Hox/ParaHox proteins highlight a demonstrated role of the HX motif in interaction with TALE partners. See main text for details. Protein sequences were retrieved from UniProt/Swiss-Prot. Stars denote species with a sequenced genome.

Besides Bilateria, Cnidaria is the only other phylum that contains a bona fide Hox repertoire (Chourrout et al., 2006; Kamm et al., 2006). As mentioned previously, the role of cnidarian Hox genes in axis patterning is unclear. Furthermore, not all cnidarian Hox proteins contain an intact HX motif (Hudry et al., 2014; **Table 1**). Nevertheless, as cnidarians express PBC and Meis genes (Matus et al., 2006; Hudry et al., 2014), a Hox/PBC/Meis network could potentially exist. The interaction properties of Hox, PBC and Meis proteins of the sea anemone Nematostella vectensis were recently tested and, as expected, these proteins form dimeric and trimeric complexes in vitro (Hudry et al., 2014). In addition, mutating the HX motif abolishes the cnidarian Hox/PBC complex, but complex formation is rescued in the presence of Meis. Hence, as in bilaterians, the Nematostella Meis protein allows Nematostella Hox proteins to use alternative modes of interaction with PBC. Bilaterian and cnidarian Hox proteins therefore share the property of using different interfaces to recruit the TALE cofactors. We propose that these additional interfaces could correspond to SPIMs that remain to be identified in several instances (**Figure 3**). Moreover, apart from the HX motif, bilaterian and cnidarian Hox proteins do not share strong sequence similarities outside the HD, suggesting that these putative SPIMs could have evolved independently during eumetazoan evolution (see also below).

#### Genesis of ANTP-TALE Networks during Early Metazoan Evolution

Molecular analyses show that the HX motif is a generic interaction platform for recruiting the TALE partners. We therefore analyzed a large number of available protein sequences to assess the presence of a putative HX motif in ANTP-class members. A peptide sequence was considered a putative HX motif when it contained the consensus Y/FPWM (typical HX motif) or a single W residue (atypical/divergent HX motif) followed by a basic residue (R or K) at positions +2 to +6, and was located no more than 30 residues away from the HD (**Table 1**). In Bilateria, the HX motif is found in almost all Hox/ParaHox members, and in several individual representatives of non-Hox/ParaHox protein families, including Engrailed (En), Msx, Hex, Tlx, Not, and Emx proteins (**Table 1**). Among early-branching animal phyla, the HX motif is found in cnidarian Hox/ParaHox members. It is, however, less conserved than in Bilateria, being lost or divergent in several cnidarian lineages (**Table 1**). Atypical HX motifs are also found in Msx and Hex members of Cnidaria, and in Not members of Cnidaria and Placozoa (**Table 1**). Interestingly, the Evx, Mox, and Gsx proteins, which likely represent the most ancestral ProtoHox and Hox/ParaHox family members (Minguillón and Garcia-Fernàndez, 2003; Quiquand et al., 2009), all lack the HX motif (**Table 1**).
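The sequence criterion used for **Table 1** is simple enough to express as a short scan. The Python sketch below is only one possible reading of it, not the pipeline used in this study. It assumes that the start coordinate of the homeodomain is already known (for example from a domain annotation), that the R/K requirement at positions +2 to +6 applies to the W in both typical and atypical motifs, and that "no more than 30 residues away" is measured upstream of the first homeodomain residue. The function `find_putative_hx`, the `HXHit` record and the toy sequence are hypothetical.

```python
from typing import List, NamedTuple

BASIC = frozenset("RK")  # basic residues accepted after the W


class HXHit(NamedTuple):
    position: int  # 0-based index of the W residue in the sequence
    kind: str      # "typical" (Y/F-P-W-M core) or "atypical" (isolated W)
    distance: int  # hd_start - position: how far upstream of the HD the W lies


def find_putative_hx(seq: str, hd_start: int, max_dist: int = 30,
                     basic_offsets: range = range(2, 7)) -> List[HXHit]:
    """Return putative HX motifs upstream of a homeodomain (HD).

    Hypothetical helper; assumptions (one reading of the criterion):
    - hd_start is the 0-based index of the first HD residue, taken from an
      external annotation;
    - the R/K rule (offsets +2..+6) applies to the W of both motif types;
    - the W must lie upstream of the HD and at most max_dist residues away.
    """
    seq = seq.upper()
    hits: List[HXHit] = []
    # Only residues upstream of the homeodomain are scanned.
    for i, aa in enumerate(seq[:hd_start]):
        if aa != "W":
            continue
        distance = hd_start - i
        if distance > max_dist:
            continue
        # Require at least one R or K at offsets +2..+6 from the W.
        if not any(i + k < len(seq) and seq[i + k] in BASIC
                   for k in basic_offsets):
            continue
        # A typical motif has the W embedded in a Y/F-P-W-M core.
        typical = (i >= 2 and i + 1 < len(seq)
                   and seq[i - 2] in "YF" and seq[i - 1] == "P"
                   and seq[i + 1] == "M")
        hits.append(HXHit(i, "typical" if typical else "atypical", distance))
    return hits


if __name__ == "__main__":
    # Toy sequence (not a real protein): a distant W that is filtered out,
    # a YPWMK-type motif 12 residues upstream of the HD, then the HD start.
    toy = "MWKR" + "A" * 40 + "YPWMKT" + "GSGSGSGS" + "RKRGRQTYTRYQ"
    print(find_putative_hx(toy, hd_start=58))
    # [HXHit(position=46, kind='typical', distance=12)]
```

Reporting both categories, rather than only exact Y/FPWM matches, mirrors the distinction between typical and atypical/divergent HX motifs drawn in the text. A survey of this kind would presumably still require curated homeodomain boundaries and manual inspection of each hit.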

PBC-recruiting functions have so far been assigned to only a few non-Hox proteins of the ANTP class. Among them are the mammalian Tlx, Drosophila En and Nematostella Msx proteins, which interact with the TALE cofactors in a fully HX-dependent manner (Rhee et al., 2004; Brendolan et al., 2005; Fujioka et al., 2012; Hudry et al., 2014). Still, these proteins display subtle differences in their TALE interaction properties. For example, the Drosophila En protein interacts with PBC or PBC/Meis in an HX-dependent manner (Hudry et al., 2014). By comparison, the Msx protein from Nematostella interacts with PBC in an HX-dependent manner, but only in the presence of Meis (Hudry et al., 2014). These observations show that the role of the PBC/Meis partnership in HX-dependent interactions can differ depending on the protein family and animal lineage considered.

We propose two different evolutionary scenarios to explain the presence of the HX motif in several ANTP family members among metazoan lineages: (i) the HX motif was already present in the ProtoANTP ancestor, constituting the first molecular interface for recruiting the TALE cofactors (**Figure 4A**), or (ii) it emerged independently multiple times in the different ANTP families during animal evolution (**Figure 4B**). The systematic location of the HX motif in the upstream vicinity of the HD supports the first scenario. As a corollary, the absence of any HX-like motif in all ANTP members of Placozoa, Porifera and Ctenophora except one (Not) would have to be attributed to repeated secondary losses. Although more sequences are needed from these three early-branching animal phyla, this apparently global and systematic loss of HX motif sequences is intriguing. It could instead argue in favor of the second scenario, whereby the HX motif appeared sporadically by convergent evolution in the different protein families. This second scenario does not exclude additional secondary losses, as observed in cnidarian Hox/ParaHox proteins (**Table 1**). Moreover, convergent evolution is not unusual for short motifs in general (Van Roey et al., 2013), and has already been proposed, for example, for another motif widely found in ANTP-class members (including Gsx, En, Emx, and several NK proteins) and in other non-homeoproteins (Williams and Holland, 2000). The HX motif itself seemingly appeared later in evolution in bilaterian Tlx, Emx and En proteins (**Table 1**), which is consistent with convergent evolution. Of note, these bilaterian proteins are known to interact and/or cooperate with TALE cofactors in the context of tissue-specific functions (Brendolan et al., 2005; Capellini et al., 2010). Along the same lines, an HX motif is also present in non-ANTP-class proteins, including LIM and several myogenic bHLH proteins (see In der Rieden et al., 2003, for a more complete list of HX-containing proteins). In bHLH proteins, the HX motif was further shown to be involved in the interaction and functional cooperation with TALE cofactors during skeletal muscle differentiation in vertebrates (Knoepfler et al., 1999; Maves et al., 2007, 2009; Yao et al., 2013). Together, these observations highlight the strong evolutionary plasticity of the HX motif in conferring TALE-recruiting activity on highly divergent protein families.


#### Genesis of the Hox-TALE Patterning System during Metazoan Evolution

The evolutionarily conserved PBC-A and Meis-A domains of PBC and Meis proteins are restricted to Bilateria, Cnidaria and Placozoa, suggesting that a Hox/TALE network exists only in these three phyla (**Figure 1**). Like all cnidarian and bilaterian Gsx proteins, the single ParaHox protein of Trichoplax adhaerens, a Gsx representative, has no HX motif (**Table 1**) and cannot interact with PBC and Meis (Hudry et al., 2014). By contrast, the two other ParaHox proteins (Pdx/Xlox and Cdx) and the Hox-related proteins have retained an HX motif in most cnidarians and bilaterians (**Table 1**). Thus, Cnidaria and Bilateria are the only phyla in which a Hox-TALE interaction network is effective.

Since interaction with TALE proteins is not a specific feature of Hox proteins, the next question is "When did Hox proteins acquire their patterning functions linked to the interaction with the TALE cofactors?" We postulate here that the acquisition of differential patterning functions was tightly linked to the emergence of diversified interaction properties between Hox and TALE proteins. The question can then be reformulated as: "When did alternative TALE interaction motifs appear in addition to the HX motif in the Hox/ParaHox family?"

Recent work with Nematostella Hox and TALE proteins (Hudry et al., 2014) suggests that SPIMs co-evolved with the specification of embryonic axes. As SPIMs are specific to a given Hox family or to a given species, they likely emerged independently several times during evolution (**Figure 4**). We propose that the original HX-dependent interaction mode served as an initial molecular template for exploring these novel HX-independent interaction properties with the TALE partners. It is tempting to speculate that SPIMs were a molecular prerequisite for allowing Hox proteins to acquire patterning functions during early eumetazoan evolution. In this model, the acquisition of SPIMs by Hox proteins likely occurred in parallel with the evolution of mechanisms regulating their expression, which allocated Hox genes to specific spatio-temporal domains along the longitudinal axis (**Figure 5**).

Finally, SPIMs do not necessarily correspond to related peptide sequences, as already noted for the TDWM and UbdA motifs in Drosophila (Merabet and Hudry, 2013), which makes their identification difficult. Additional SPIMs nevertheless need to be identified to validate our model. Several tools are now available for predicting short interaction motifs in protein sequences, based on the analysis of amino acid chemical properties and on the classification of hundreds of characterized short motifs in databases (Tompa et al., 2014). Interestingly, these tools predict a number of short motifs in several regions of bilaterian (Merabet and Dard, 2014) and cnidarian (Baëza et al., 2015) Hox proteins. These regions are often involved in interactions with different TFs (Baëza et al., 2015) and could therefore contain good candidate SPIMs to test in the future.

#### Perspective: the HX Motif and SPIMs as Molecular Markers of Patterning Functions in the ParaHox Family?

ParaHox genes share several common features with the Hox genes. For example, they are organized in clusters and display spatio-temporal constraints on their expression during embryogenesis in several bilaterian species (Garstang and Ferrier, 2013). The expression profile of ParaHox genes in Cnidaria is also suggestive of important functions during embryogenesis, regeneration or budding, as seen in the solitary polyp Hydra (Schummer et al., 1992; Miljkovic-Licina et al., 2007), the coral Acropora (Hayward et al., 2001), the jellyfish Podocoryne (Yanze et al., 2001), the sea anemone Nematostella (Finnerty et al., 2003), or the colonial polyp Hydractinia (Cartwright et al., 2006). Moreover, ParaHox genes, and more particularly Gsx, could be more representative of the ProtoHox ancestor gene than any Hox gene (Quiquand et al., 2009). Although Gsx does not contain any HX motif, it has a conserved role in the specification of neuroblast lineages in bilaterians (Weiss et al., 1998; Waclaw et al., 2009; Winterbottom et al., 2010; López-Juárez et al., 2013) and cnidarians, where it is finely regulated along the body axis (Hayward et al., 2001; Miljkovic-Licina et al., 2007). Along the same lines, Pdx-1 plays a crucial role in pancreatic beta-cell differentiation (Kaneto et al., 2007). These observations suggest that a primordial ParaHox (and Hox) function was dedicated to the emergence of novel cell types along the body axis, possibly in a TALE-independent manner (**Figure 5**) (De Jong et al., 2006; Miljkovic-Licina et al., 2007; Quiquand et al., 2009). This role could then have been deployed in several Hox/ParaHox members and in different tissues, requiring the acquisition of additional molecular features such as the HX motif and SLIMs to diversify the novel patterning functions.
In agreement with this hypothesis, in Bilateria, the Pdx/Xlox and Cdx transcription factors are required for the patterning of endodermal derivatives (Cole et al., 2009; Beck and Stringer, 2010; Annunziata et al., 2013; Ikuta et al., 2013) and act with the Hox genes during axis elongation (Moreno and Morata, 1999; Van den Akker et al., 2002; Shinmyo et al., 2005; Young et al., 2009). The impact of TALE cofactors on these patterning functions remains to be investigated. The role of Pdx/Xlox and Cdx is also largely unknown in cnidarians. Testing their interaction properties with TALE cofactors would undoubtedly provide new insight into the origin and evolvability of the Hox/TALE patterning system in Metazoa. Ultimately, such studies should tell us whether the combination of one HX motif plus several SPIMs in the ParaHox proteins was necessary and sufficient to promote a spatial organization of cell differentiation along the body axis, and thus the emergence of patterning functions in different tissues.

# Conclusion

Hox proteins are TFs displaying highly similar DNA-binding properties in vitro. Yet each Hox protein dictates a specific developmental program with the same set of TALE cofactors. We have proposed here that the emergence of a functional Hox/TALE patterning system during metazoan evolution was tightly linked to the acquisition of different short motifs named SPIMs. The use of different SPIMs in Hox proteins constitutes an appealing molecular strategy for explaining the specific and varied developmental functions of Hox/TALE complexes. Owing to their small size, SPIMs have the advantage of being highly dynamic during evolution, allowing the molecular code between Hox and TALE proteins to diversify. This model assumes that interaction flexibility is an important feature of the Hox/TALE patterning system. Whether this molecular strategy applies more widely to other key patterning networks is a major question for future investigation.

# Acknowledgments

We warmly thank Daniel Chourrout and Maja Adamska for helpful comments and criticisms on early versions of this review. Work in the laboratory of SM is supported by Association Française contre les Myopathies (AFM), Fondation pour la Recherche Médicale (FRM), Association pour la Recherche contre le Cancer (ARC), Ligue Régionale contre le Cancer, Centre National de la Recherche Scientifique (CNRS), and Ecole Normale Supérieure (ENS) de Lyon. Work in the Galliot lab is supported by the State of Geneva and the Swiss National Fonds for Research (SNF-31003A-149630). We apologize to colleagues whose work was not cited due to space constraints.

#### References


subcellular localization. Genetics 175, 1625–1636. doi: 10.1534/genetics.106.066449


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Merabet and Galliot. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **A stranger in a strange land: the utility and interpretation of heterologous expression**

*Elena M. Kramer\**

*Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA*

One of the major goals of the modern study of evo-devo is to understand the evolution of gene function across a range of contexts, including sub/neofunctionalization, co-option of genetic modules, and the evolution of morphological novelty. To these ends, comparative studies of gene expression can be useful for constructing hypotheses, but cannot provide direct evidence of functional evolution. Unfortunately, determining endogenous gene function in non-model species is often not an option. Faced with this dilemma, a common approach is to use heterologous expression (HE) in genetically tractable model species as a proxy for functional analyses. Such experiments have important limitations, however, and require caution in the interpretation of their results. How do we dissociate biochemical function from its original genomic context? In the end, what does HE actually tell us? Here, I argue that HE only sheds light on specific types of biochemical conservation, but can be useful when experiments are carefully interpreted.

#### *Edited by:*

*Sylvain Marcellini, University of Concepcion, Chile*

#### *Reviewed by:*

*Verónica S. Di Stilio, University of Washington, USA*

*Kacy Gordon, Duke University, USA*

#### *\*Correspondence:*

*Elena M. Kramer, Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA ekramer@oeb.harvard.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Plant Science*

> *Received: 30 June 2015 Accepted: 29 August 2015 Published: 15 September 2015*

#### *Citation:*

*Kramer EM (2015) A stranger in a strange land: the utility and interpretation of heterologous expression. Front. Plant Sci. 6:734. doi: 10.3389/fpls.2015.00734*

**Keywords: evo-devo, heterologous expression, functional evolution, biochemical evolution, developmental genetics**

As developmental biologists, we should remember that when we speak of "gene function," we are necessarily conflating a complex array of different factors. At a fundamental level, we can think of gene function as representing two complementary components: the first being biochemical function and the second being developmental role (**Figure 1**). The former is determined by the coding sequence of the gene itself and encompasses everything from secondary/tertiary protein structure, to enzymatic capacity, to co-factor and/or DNA binding site affinity. These aspects of gene function may change as the sequence of your favorite gene (YFG) itself evolves. As if this weren't complicated enough, the actual developmental role played by YFG is the product of all of these primary components interacting with a wide array of *cis*- and *trans*-acting phenomena, including the expression patterns of YFG in relation to its co-factors, the epigenetic state of target loci, the position of binding sites within the genome, post-translational regulation of all interacting proteins, etc. Obviously, these secondary components evolve as well, and to varying degrees in coordination with YFG. So when we talk about the evolution of gene function, we are really considering the evolution of the whole genomic context of YFG: its protein sequence, *cis*- and *trans*-regulation, interacting partners, and target gene repertoire.

**FIGURE 1 | A schematic representation of the dual nature of gene function.** Aspects influencing biochemical function are highlighted in shades of blue while aspects of the genomic context are highlighted in shades of red. Note that here, I am only considering heterologous expression of coding sequences, so upstream regulatory elements are considered to be part of the endogenous genomic context.

Heterologous expression (HE) takes the primary component of gene function, the sequence of the coding region itself, and plugs it into the second component, the genomic context, of a different species. We are essentially performing a site-directed mutagenesis experiment in which we ask whether the sequence differences between YFG and its endogenous homolog disrupt the functional roles normally played by the endogenous locus in its own genomic environment. Of course, HE can be conducted with varying degrees of rigor. The most rigorous approach is to drive expression with the endogenous promoter and ask whether YFG can rescue the phenotype of a null mutation in the endogenous locus. With surprising frequency, however, the heterologous locus is simply over-expressed in a wild-type background (e.g., Lee et al., 2012; Perilleux et al., 2013; Lovisetto et al., 2015), such that the real question being asked is: can this alien protein perturb development in the same manner as the endogenous protein when it is over-expressed? Such an approach creates new problems, not least because protein interactions are subject to reaction equilibria and are therefore sensitive to the concentrations of the interacting factors.
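To make this concentration dependence concrete, consider the simplest possible case, which is an illustrative assumption rather than a model taken from the studies cited above: a protein P binds a single cofactor C with 1:1 stoichiometry and dissociation constant $K_d$, and P is present in excess, so that its free concentration approximates its total concentration. The fraction of cofactor found in the complex, $\theta$, then follows the standard binding isotherm:

$$
\mathrm{P} + \mathrm{C} \rightleftharpoons \mathrm{PC}, \qquad
K_d = \frac{[\mathrm{P}][\mathrm{C}]}{[\mathrm{PC}]}, \qquad
\theta = \frac{[\mathrm{PC}]}{[\mathrm{C}] + [\mathrm{PC}]} = \frac{[\mathrm{P}]}{K_d + [\mathrm{P}]}
$$

With purely hypothetical numbers: if the endogenous protein is present at $[\mathrm{P}] = K_d$, then $\theta = 1/2$. A heterologous protein with tenfold weaker affinity ($K_d' = 10\,K_d$) gives $\theta = 1/11 \approx 0.09$ at the same concentration, but over-expressing it tenfold ($[\mathrm{P}] = 10\,K_d$) restores $\theta = 10/20 = 1/2$. Under these assumptions, over-expression alone can mask a tenfold difference in binding affinity.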

Given this perspective, we should consider the variety of ways in which HE is typically used in the evo-devo field. These include bolstering evidence of genetic orthology (e.g., Serrano et al., 2009), assessing homology of a genetic module or an organ (e.g., Halder et al., 1995; Whipple et al., 2004), and broadly assessing conservation of gene "function" between taxa (e.g., Alvarez-Buylla et al., 2010; Kachroo et al., 2015). The first of these uses should be rejected, since similarity of function is absolutely not a criterion for genetic homology in general or orthology in particular (Theissen, 2002; Gabaldon and Koonin, 2013). Indeed, positive HE results can even be misleading when it comes to assessing orthology. Perhaps the best-understood instance of this phenomenon is the *AGAMOUS* (*AG*) lineage of floral organ identity genes in flowering plants. The functions of *AG* homologs were first described in the core eudicot model systems *Arabidopsis* and *Antirrhinum* (snapdragon; Coen and Meyerowitz, 1991). In *Arabidopsis*, the *ag* mutant phenotype results in homeotic transformation of fertile organs into sterile organs and a loss of determinacy in the floral meristem. The *plena* (*ple*) mutant in *Antirrhinum* has the same phenotype and *PLE* is clearly homologous to *AG*. However, *PLE* and *AG* are not orthologous but, rather, are derived from a whole genome duplication that occurred at the base of the core eudicots (Davies et al., 1999; Kramer et al., 2004). The orthologs of *PLE* in *Arabidopsis* are a pair of recent duplicates called *SHATTERPROOF1/2*, which participate in fruit and ovule development (Liljegren et al., 2000), while the ortholog of *AG* in *Antirrhinum* is called *FARINELLI* (*FAR*), a gene that primarily contributes to stamen development (Davies et al., 1999; Causier et al., 2005).
These distinct functions appear to be due to independent patterns of subfunctionalization that occurred along the lineages leading to the rosid *Arabidopsis* on the one hand and the asterid *Antirrhinum* on the other. Furthermore, while the paralogs AG and PLE are biochemically equivalent in *Arabidopsis*, the orthologs AG and FAR are not (Causier et al., 2005; Airoldi et al., 2010). This is most likely due to changes in selection as *FAR* became specialized to function in stamen identity. Therefore, while it may commonly be true that orthologs are more likely than not to have both functional similarity and biochemical conservation, we cannot take it for granted.

If you will permit me a digression, I would also like to strongly discourage the common use of the term "functional ortholog." It is important to remember that function is generally considered not to be a criterion for homology, even among genes (Theissen, 2002). I actually agree with Mindell and Meyer (2001) on this point, that there should be some leeway for discussing the inheritance of genetic function, but we should recognize that it is widely held that functions of any kind cannot be homologous. What information are we trying to convey when we say "functional ortholog?" We want to say that we have a pair of genes that are genetic orthologs and also appear to play similar functional roles. This is an important piece of information; certainly, we often want to know if function is conserved among orthologs. However, this terminology seems to suggest that "functional" orthologs have an additional quality of greater orthology because they show conserved function. This is simply untrue. Orthology is a feature of genetic relationship, of inheritance and patterns of gene duplication. It does not increase or decrease based on functional similarity. It is much more informative to say that you have performed a rigorous phylogenetic and/or syntenic analysis and have determined that the genes in question are orthologs and, further, appear to share conserved functions. We must recognize that this statement can really only be made if you have conducted endogenous functional studies in the taxa being compared. If you have only performed HE, then the best you can say is that there is some degree of biochemical conservation.

The use of HE to assess homology of a genetic module or an organ is more complex and relates to the need to distinguish between process homology and morphological homology, which has been well covered by many previous authors (Bolker and Raff, 1996; Abouheif, 1997; Abouheif et al., 1997). These authors recognized quite early during the molecular renaissance of our field that shared expression of genetic homologs, and even shared developmental control by homologous genetic modules, should not be used as the basis for assessment of morphological homology. Hodin (2000) succinctly addressed the issue while discussing the limited value of HE with *Pax6* homologs: "A positive result tells you only that the biochemical properties of the protein have been conserved, not necessarily that its function within a certain morphological structure has also been conserved. The commonplace use of the same gene within an organism performing distinct functions in a multitude of tissue reveals why this experiment is generally uninformative with respect to evolutionary history (see also Abouheif et al., 1997)." Here, Hodin seeks to highlight the fact that conservation of biochemical interactions within a particular genetic module says nothing about the myriad ways in which that module can be developmentally deployed. In this regard, I should note that HE can provide some relevant information if you are simply trying to assess homology of a genetic module, but I would argue that phylogeny-based homology assessment of the genes involved and tests of endogenous regulatory interactions are even more useful.

Process homology is especially relevant to cases of co-option of genetic modules to novel developmental functions. For instance, in butterflies, *Distal-less* (*Dll*) orthologs have been recruited to promote the development of wing spots (Brunetti et al., 2001). The wing spot developmental program is very unlikely to be recapitulated by simply expressing the butterfly *Dll* in *Drosophila*, because this program is a product of what I defined above as the second component of gene function, the endogenous genomic architecture of the butterfly. However, reciprocal HE of *Dll* orthologs between *Drosophila* and butterflies would be perfectly useful if your goal were to determine whether the evolution of the wing spot involved biochemical divergence in the butterfly *Dll* sequence. This type of co-option is just one extreme on a spectrum of evolutionary change that could also include morphological remodeling events such as the derivation of halteres from hindwings (Hersh et al., 2007), lodicules from petals (Whipple et al., 2007; Yoshida, 2012) or staminodia from stamens (Sharma and Kramer, 2013). Such evolutionary transitions may involve biochemical changes in upstream transcription factors but clearly also involve changes in target gene repertoires (e.g., Hersh et al., 2007). HE is much more likely to shed light on biochemical changes than on changes in target gene repertoires, which depend primarily on the positions of downstream binding sites dispersed throughout the genome.

The third common use of HE, to investigate conservation of "function," is perfectly legitimate in many cases but less so in others. It is probably useful to start by considering what can go wrong with HE. For instance, a lack of rescue or the failure to produce a phenotype may simply be due to the divergence between your species of interest and the reference model system. Even proteins that are likely to serve conserved functions can experience developmental system drift (True and Haag, 2001) at the level of primary sequence. In other words, this is a site-directed mutagenesis experiment in which the altered protein cannot function in the model system's genomic context but may be perfectly functional in its original environment. On occasion, HE results in novel or dominant negative phenotypes (e.g., Lee et al., 2012; Katahata et al., 2014; Sun et al., 2014). These may be due to the disruptive effects of an alien protein being introduced into a system to which it is not adapted. If the heterologous protein can interact with some co-factors but not others, it may act as a dominant negative allele, especially when over-expressed. Perhaps most surprisingly, though, even positive results can be misleading. Zarrinpar et al. (2003) tested the ability of SH3-domain protein homologs to rescue the function of one specific family member in yeast. They found that while endogenous paralogs were highly functionally specific and could not rescue, diverse metazoan homologs showed a higher frequency of rescue. These results reflect the fact that members of the same genome, especially when co-expressed, tend to co-evolve toward a high degree of functional specificity. Homologs from divergent genomic contexts that have not experienced the same patterns of co-evolution may actually be quite promiscuous in a heterologous genome.
Thus, we see that a range of results from HE can be uninformative or misleading, especially when you do not have functional data from the original organism.

So am I suggesting that HE is never useful for examining the evolution of gene function? Certainly not. In cases where biochemical divergence is specifically being assessed, this approach can be the best experiment to use, albeit with some caveats. Let's consider a classic HE experiment, Ronshaugen et al. (2002), in which the authors tested the ability of *Artemia* Ubx to suppress limb development in *Drosophila* (**Figure 2**). Interestingly, they found that while the full-length *Artemia* Ubx had little limb-suppressing capacity, a relatively minor C-terminal deletion allowed the *Artemia* protein to repress limbs in *Drosophila*. In light of this finding, the authors proposed a model in which the Ubx protein of a crustacean/insect ancestor acquired mutations in its C-terminus that uncovered a limb-repression function. This is certainly a plausible scenario that fits the presented data, but we should also recognize a weakness: the experiment was only performed in the *Drosophila* genomic context, where Ubx has a limb-repressing function. If you could put the mutated *Artemia* Ubx back into *Artemia*, would it have the capacity to repress limbs, or is that capacity primarily a product of the *Drosophila* genome? As it turns out, further studies in *Artemia* have revealed a more complex situation, suggesting that there may be multiple reasons why Ubx does not repress limbs in *Artemia* (Hsia et al., 2010). These findings underscore that accurate interpretation of HE data hinges on having as much information as possible from both taxa, including functional results wherever they can be obtained.

One especially elegant demonstration of how powerful HE can be when paired with functional studies in both the donor and recipient is work done on the control of flowering time in sugar beet (*Beta vulgaris* ssp. *vulgaris*; Pin et al., 2010). In flowering plants, homologs of the PEBP lineage defined by the *Arabidopsis* gene *FLOWERING LOCUS T* (*FT*) are broadly involved in promoting the transition from vegetative to reproductive development (reviewed in Ballerini and Kramer, 2011). The FT protein has been identified as the classic florigen, a factor that moves from leaves, where it is produced, to the apical meristem in order to change meristem identity. Consistent with this role, most *FT* homologs are only expressed at significant levels after the initiation of reproductive development. In cultivated sugar beet, however, a very recent gene duplication has given rise to two copies: *BvFT1*, which is primarily expressed during vegetative development, and *BvFT2*, which is expressed as expected during the reproductive stage (Pin et al., 2010). Using RNAi and overexpression in beet, Pin et al. (2010) clearly established that the *BvFT1* paralog had acquired a dominant negative effect that represses flowering until vernalization (cold treatment) represses *BvFT1* and allows expression of the floral promoting paralog *BvFT2*. This dramatic difference in function between the two paralogs can be recapitulated in *Arabidopsis*, where BvFT2 activates flowering while BvFT1 represses it. This demonstrates that there is a biochemical change in BvFT1 relative to the otherwise highly conserved function of FT proteins. The use of chimeric proteins and site-directed mutagenesis in the more tractable *Arabidopsis* system allowed the authors to identify the specific amino acid changes that are responsible for the neofunctionalization, and further demonstrate that these changes are associated with BvFT1 alleles that were selected during domestication.
This kind of study relies heavily on HE but uses it in exactly the right way—by targeting an otherwise highly conserved genetic module, and in combination with detailed expression and functional studies in the original system, which allows the heterologous results to be accurately interpreted.

Another powerful application of HE is to use homologs from a series of diverging taxa to probe the conservation of specific biochemical properties, such as recognition of DNA binding sites. This is essentially a matter of letting evolution do the site-directed mutagenesis for you: as you move out to more deeply diverging taxa, there are more non-synonymous mutations, allowing you to ask whether the endogenous biochemical function is still retained. The land plant-specific transcription factor *LEAFY* (*LFY*) is ideal for this type of study because, unlike most plant gene lineages, it has very few retained paralogs. Maizel et al. (2005) tested the ability of *LFY* homologs from across the land plants to rescue the *lfy* mutation in *Arabidopsis*, and then complemented the phenotypic analysis with microarray studies of gene expression. They found a gradually decreasing degree of phenotypic rescue as they moved out to more distantly related taxa. When paired with tests of protein/DNA interaction, their results suggest that "the declining ability to replace *Arabidopsis* LFY *. . .* is caused by a progressive failure to interact with the canonical LFY binding sites," which, of course, are defined based on work done in *Arabidopsis*. The microarray analysis of the various transgenic lines demonstrated that in the weakest cases of rescue, one of the last target interactions to be lost was with the floral meristem identity gene *APETALA1* (*AP1*). The authors quite correctly noted that this finding does not tell us anything about what the heterologous LFY homologs activate in their endogenous settings; indeed, *AP1* homologs are not even present outside angiosperms. Rather, this reflects the extraordinarily high affinity of the LFY binding site present in the *AP1* promoter, such that even deeply divergent homologs with many non-synonymous changes are still capable of recognizing it.
This kind of study highlights evolutionary processes affecting both aspects of developmental gene function since it detects biochemical changes that have altered DNA affinity while also underscoring the fact that repertoires of target genes will simultaneously be evolving.

In summary, my argument is that HE can be very useful in specific cases where we want to investigate changes in the primary component of gene function, which is to say biochemical function. This includes enzymatic capacity as well as affinity for a range of interactions such as protein-DNA and protein–protein. It yields the best results when paired with functional studies in both the donor and recipient taxa so that potentially spurious phenotypes can be ruled out. I think it is also true that HE works best when you can target a genetic module that is otherwise very highly conserved, so that you can lessen the impact of drift and divergence in other components of the pathway (although this is hard to ever rule out completely!). HE does not inform upon homology in general or orthology in particular, nor does it give us much information on what developmental roles the gene may play in its original genomic context, so use it with care.

### **Acknowledgments**

I would like to acknowledge several other scientists with whom I have discussed this question over the years, particularly Vivian Irish, who set me straight on the subject to begin with, as well as Amy Litt and Sarah Mathews.

# **References**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Kramer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Evolution of epithelial morphogenesis: phenotypic integration across multiple levels of biological organization**

*Thorsten Horn, Maarten Hilbrant and Kristen A. Panfilio\**

*Institute for Developmental Biology, University of Cologne, Cologne, Germany*

Morphogenesis involves the dynamic reorganization of cell and tissue shapes to create the three-dimensional body. Intriguingly, different species have evolved different morphogenetic processes to achieve the same general outcomes during embryonic development. How are meaningful comparisons between species made, and where do the differences lie? In this Perspective, we argue that examining the evolution of embryonic morphogenesis requires the simultaneous consideration of different levels of biological organization: (1) genes, (2) cells, (3) tissues, and (4) the entire egg, or other gestational context. To illustrate the importance of integrating these levels, we use the extraembryonic epithelia of insects—a lineage-specific innovation and evolutionary hotspot—as an exemplary case study. We discuss how recent functional data, primarily from RNAi experiments targeting the Hox3/Zen and U-shaped group transcription factors, provide insights into developmental processes at all four levels. Comparisons of these data from several species both challenge and inform our understanding of homology, in assessing how the process of epithelial morphogenesis has itself evolved.

**Keywords: epithelial morphogenesis, evolution of development, insects, extraembryonic tissues,** *Hox3/zen***,** *Tribolium castaneum***,** *Megaselia abdita***,** *Oncopeltus fasciatus*

# **Introduction**

In the rapidly developing fruit fly *Drosophila melanogaster*, the predominant insect model for developmental genetics, embryonic morphogenesis occurs largely after cell fates are determined. Indeed, there is extensive literature on *Drosophila* early tissue patterning, including axis specification and segmentation, preceding morphogenesis. Perhaps as a result of our profound knowledge in *Drosophila*, many evolutionary developmental (evo-devo) studies in arthropods take a gene-centered approach and focus on early patterning, as early fate specification is often a powerful signal for comparisons of species that are separated by long periods of evolutionary time (e.g., Peel et al., 2005; Sachs et al., 2015).

In this Perspective article, however, we highlight the importance of studying the morphogenetic movements that occur during animal development and of integrating multiple levels of biological organization when making interspecific comparisons. To do so, we distinguish between four increasingly inclusive levels of biological organization. (1) Genetic regulation of development comprises information about the specific genes and their protein products that are involved in transcriptional control, signaling cascades, and the molecular basis of cytoskeletal structure and remodeling. (2) Individual cells differentiate to acquire a particular identity, including the transcriptional state as well as cell shape and structure. (3) More broadly, cells coordinate with their neighbors within tissues. In epithelial tissues, for example, cells retain contact with their neighbors via adherens junctions, such that cell shape changes affect the entire tissue's geometry. (4) Finally, the egg is a global system, where tissue integrity and inter-tissue adhesion need to be precisely controlled during morphogenesis to achieve the final form.

#### *Edited by:*

*Sylvain Marcellini, University of Concepcion, Chile*

#### *Reviewed by:*

*David Q. Matus, Stony Brook University, USA; Stefanie D. Hueber, University of Konstanz, Germany*

#### *\*Correspondence:*

*Kristen A. Panfilio, Institute for Developmental Biology, University of Cologne, Zülpicher Str. 47b, 50674 Cologne, Germany kristen.panfilio@alum.swarthmore.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 30 June 2015 Accepted: 11 September 2015 Published: 29 September 2015*

#### *Citation:*

*Horn T, Hilbrant M and Panfilio KA (2015) Evolution of epithelial morphogenesis: phenotypic integration across multiple levels of biological organization. Front. Genet. 6:303. doi: 10.3389/fgene.2015.00303*

Overall, integration of different biological levels is as much a conceptual framework for understanding the physical context of a given gene's role in a developmental process as for interpreting how morphogenesis has evolved. To illustrate this, here we discuss recent advances in the study of extraembryonic (EE) development in a range of insect model species. We show that the insect EE epithelia provide a case study with a particularly rich evolutionary history, making them well suited to assessing the evolution of morphogenesis.

# **Development and Evolution of the Insect Extraembryonic Membranes**

Many arthropod eggs include an EE tissue component, but in the insects this feature has become a specific structural innovation (reviewed in Panfilio, 2008). At the base of the winged insect lineage, the EE epithelial membranes evolved to form discrete compartments within the egg. In most winged insects, the serosa lines the eggshell, providing the outermost cellular layer and enclosing all other contents, including the yolk. The amnion, analogous to its namesake in vertebrates, forms a fluid-filled cavity ventral to the embryo, retaining a connection to the embryo along the latter's dorsal margin (**Figure 1**: "most insects" schematic).

The ability of EE membranes to form these compartments early in development has allowed the insects to exploit diverse ecological niches, largely due to the manifold functions of the serosa as a protective outer layer that buffers the embryo against environmental fluctuations and assaults. Recent work has shown that serosal cuticle secretion correlates with the acquisition of desiccation resistance, and that the cuticle itself provides mechanical support to the egg (Rezende et al., 2008; Jacobs et al., 2013; Panfilio et al., 2013). At the same time, recent experimental evidence demonstrates the long-hypothesized ability of the serosa to protect the embryo after wounding and pathogen infection via upregulation of the innate immune system (Chen et al., 2000; Jacobs et al., 2014). Furthermore, the serosa's ultrastructure is consistent with physiological roles in water and solute processing, and it has acquired additional mechanical and physiological functions during hatching and early larval life in species with oviposition sites within plant and animal tissues (citations in Panfilio, 2008).

The functional importance of the amnion remains far more enigmatic, despite early recognition of the potential value of an insect amniotic cavity (Zeh et al., 1989). Indeed, the amniotic cavity has been lost independently during the evolution of apocritan wasps and cyclorrhaphan flies (Fleig and Sander, 1988; Rafiqi et al., 2008), with the amniotic epithelium confined to a dorsal yolk cover (**Figure 1**: schematic for *Megaselia*). More extremely, in *Drosophila melanogaster* the serosa and amnion are conflated into a single, dorsal amnioserosa, dispensing with EE compartments entirely (**Figure 1**: schematic for *Drosophila*), and some *Drosophila* species have decanalized development to the point where amnioserosal formation is variable, but still essential (Gavin-Smyth et al., 2013; Panfilio and Roth, 2013).

Using the evolution and development of insect EE membranes as a case study, in the next sections we discuss how the different levels of biological organization are interconnected. We show that, for example, changes at the gene level can induce dramatic changes in cell and tissue behavior that differ between species, even if the consequences at the whole egg level are similar. On the other hand, similar morphogenetic movements can be achieved by quite different mechanisms on the cellular and tissue organizational levels.

# **Linking** *zen* **and U-shaped Genes to Changes at the Egg Level**

Extraembryonic tissue evolution is tightly linked to evolution of the Hox3/Zen transcription factor (**Figure 1**). The evolutionary origin of strictly EE expression of this gene coincides with the origin of complete EE compartments (Hughes et al., 2004; Panfilio et al., 2006). Hox genes are generally highly conserved in relative genomic position, protein sequence, copy number, and function in anterior-posterior patterning (Krumlauf, 1992; Cook et al., 2001). In contrast, arthropod *Hox3* orthologs are prone to duplication and marked sequence divergence, particularly the insect orthologs (known as "*zen*," after the original *Drosophila* mutants), with independent instances of duplication in beetles, flies, and lepidopterans (Pultz et al., 1988; Brown et al., 2002; Panfilio et al., 2006; Panfilio and Akam, 2007; Chai et al., 2008; Rafiqi, 2008; Ferguson et al., 2014). Interestingly, in the red flour beetle, *Tribolium castaneum*, the two *zen* paralogs have different functions (van der Zee et al., 2005), and these will be discussed in turn.

In all holometabolous insects studied so far, *zen* (or *Tc-zen1* in *Tribolium*) has a conserved function in EE tissue specification (**Figure 1**: orange diamonds). However, while loss-of-function mutation in *Drosophila* is lethal (Wakimoto et al., 1984), the scuttle fly *Megaselia abdita* and *Tribolium* can survive RNA interference (RNAi) knockdown (van der Zee et al., 2005; Rafiqi et al., 2008; Panfilio et al., 2013). Examining why the end-stage phenotypes differ after loss of a conserved gene's function provides a good example of the integration of the different levels of biological organization.

Key to understanding the different phenotypic outcomes of disrupting *zen/zen1* is the evolutionary change in EE membrane complement between these species. The single amnioserosa of *Drosophila* exhibits features of both early serosa and late amnion (reviewed in Schmidt-Ott et al., 2010), and specification of the entire EE domain is under the control of *Dm-zen* (the paralog *Dm-z2* is not essential during embryogenesis; Pultz et al., 1988). In the less derived situation in *Megaselia* and *Tribolium*, *Ma-zen*/*Tc-zen1* specifies only the serosa (van der Zee et al., 2005; Rafiqi et al., 2008). The phenotypic outcome after *zen* knockdown could then be explained by loss of all EE tissue identity in *Drosophila*, while *Megaselia* and *Tribolium* retain an amnion.

**Figure 1** | … the homeodomain transcription factor Zen are available, and which are discussed here. Diamonds represent individual *zen* genes (two each in *Drosophila* and *Tribolium*), with either a late morphogenetic function (green) or an early specification function (orange). Non-insect *Hox3* orthologs also have a specification function, albeit within embryonic rather than extraembryonic tissue. Note that within the fly lineage the highly divergent *bicoid* paralog has been omitted for clarity (for recent work on this, see Klomp et al., 2015). Schematics show evolutionary stages of EEM acquisition and secondary reduction as inferred from extant species (blue text; color coding is indicated in the legend). Here, "complete" refers to the formation of discrete, closed compartments within the egg, namely the outer serosal sac and the inner amniotic cavity. The illustration of EEM organization in primitively wingless insects is modified from Panfilio (2008), with the corresponding author's consent.

However, on closer inspection the similarity in gene function, residual EE tissue complement, and end-stage phenotypic outcome in *Megaselia* and *Tribolium* is rather surprising if we consider the difference in EE membrane configuration (**Figure 1**: schematics). In both species, we observe a respecification from serosal to amniotic fate, and in both species it is important to have a tissue covering the yolk dorsally during the dorsal closure stage. However, the underlying wild-type configurations are different. In *Megaselia*, the amnion provides a persistent dorsal yolk cover, and its overall shape, size, and dorsal position are not changed dramatically by the fate shift after *Ma-zen* RNAi (Rafiqi et al., 2008). In contrast, in *Tribolium* the dorsal side of the egg is first covered by the serosa, and only later in wild-type development is the amnion pulled dorsally when the serosa contracts (Panfilio et al., 2013). How, then, can *Tribolium* survive without a serosa? Here, *Tc-zen1* RNAi not only produces a persistently dorsal amniotic region due to respecification (van der Zee et al., 2005), but also reveals novel cellular and tissue properties of the entire amnion in late development as it takes over the role of the serosa in providing a dorsal cover (Panfilio et al., 2013). Hence, the survival of *Ma-zen* RNAi embryos is due to the dispensability of the serosa for dorsal closure, while in *Tribolium* developmental regulation (that is, compensation via plasticity of the amnion) enables survival after *Tc-zen1* RNAi. Thus, conserved, early gene functions can feed into different developmental routes, depending on tissue configuration and morphogenetic properties.

At the same time, other genes with EE roles have undergone changes in their particular functions and in their interaction partners during insect evolution. One example is the T-box transcription factor Dorsocross (Doc), a member of the U-shaped gene family (Frank and Rushlow, 1996; Reim et al., 2003). In *Drosophila*, *Dm-Doc* is necessary for the maintenance of the amnioserosa toward the end of germband extension, when Zen protein disappears (Reim et al., 2003, and references therein). In contrast, *Tc-Doc* has multiple roles in *Tribolium* EE morphogenesis, but no role in maintaining either EE tissue (TH, KAP unpublished observation). There is some evidence that *Ma-Doc* has a maintenance function in the *Megaselia* serosa (Rafiqi et al., 2008), but the end-stage RNAi phenotype would also be consistent with an early morphogenetic role, as in *Tribolium*.

Consistent with this difference in the EE role of Doc, the molecular context of its function also differs between species. *Drosophila Doc* expression requires simultaneous inputs from Dm-Zen and Dm-Dpp (Reim et al., 2003). In contrast, in *Tribolium* these inputs are temporally and spatially distinct, and subsequent Dpp signaling is itself locally dependent on *Tc-Doc* (TH, KAP unpublished observation), a feature not known from *Drosophila*. Another example is *Doc*'s relation to *hindsight* (*hnt*), another U-shaped gene. In *Drosophila*, *Dm-hnt* is downstream of *Dm-Doc* and therefore shows a similar knockdown phenotype. In *Tribolium*, both genes also show a similar knockdown phenotype to one another, but they seem not to influence each other's expression (TH, KAP unpublished observation).

Finally, *Dm-Doc* performs multiple functions within the body proper (Hamaguchi et al., 2012; Sui et al., 2012), such as in heart development (Reim and Frasch, 2005), that are not observed in *Tribolium* (Nunes da Fonseca et al., 2010). Interestingly, one of these functions, bending of the *Drosophila* wing imaginal disc, directly links the transcription factor Dm-Doc to cellular and epithelial rearrangements (Sui et al., 2012). Here, Dm-Doc promotes intracellular microtubule web redistribution and degradation of the extracellular matrix through matrix metalloproteinase. It remains to be seen whether similar mechanisms are also employed downstream of Doc in EE morphogenesis across species.

In summary, disruption of *zen*, a gene with a conserved function in specification of the serosa, leads to lethality, compensation by the amnion, or simply loss of the serosa with no severe consequences for development, depending on the tissue topography of the species under investigation. Moreover, there are large differences in gene knockdown phenotypes, overall gene functions, and specific interaction partners between orthologous genes in different species, and these differences can only be understood if all other biological levels, from cells to the egg system, are taken into account.

In the next section we shift the focus from the genes themselves to tissue organization and function, again highlighting differences between species at different levels of biological organization.

# **Linking Cellular, Tissue, and Egg System Levels**

In late embryogenesis, it is essential that insect EE tissue actively withdraws in a precise way to mediate dorsal closure, whereby the embryonic epidermis seals at the dorsal midline and EE tissue degenerates within the yolk. Indeed, amnioserosa-epidermal tissue coordination during *Drosophila* dorsal closure has been extensively studied over the last 15 years (e.g., Jacinto et al., 2000; Kiehart et al., 2000; Solon et al., 2009; Lada et al., 2012; Wells et al., 2014). While differences in tissue organization are expected between dorsal closure involving an amnioserosa and dorsal closure involving a serosa and amnion, we also find differences between species with both EE membranes (**Figure 1**: "most insects" schematic). To illustrate this point, here we compare late EE morphogenesis between *Tribolium* and the hemimetabolous milkweed bug, *Oncopeltus fasciatus*, charting a sequence of similarities and differences as morphogenesis proceeds.

Firstly, rupture of the EE tissues over the embryo's head produces an opening through which the embryo passively emerges. In *Oncopeltus*, preparation for EE rupture within this specialized region involves apoptosis of the amniotic cells subjacent to the serosa, thinning the region to a single EE epithelium, while at the border of this region the amnion adheres strongly to the serosa (Panfilio and Roth, 2010). As this epithelial remodeling occurs locally, the entire egg system is subtly reorganized to ensure that the specialized EE region is centered at the egg pole, which appears to mechanically facilitate rupture via global contractile force exerted by the serosa (Panfilio, 2009; Panfilio and Roth, 2010). In contrast, in *Tribolium* the opening for EE rupture is not centered at the egg pole but occurs anteroventrally (Panfilio et al., 2013). Here, precision in determining the site of EE opening involves morphological specialization in a cap of amniotic cells. Furthermore, preparation for rupture in *Tribolium* involves the formation of an amnion-serosa epithelial bilayer over most of the amnion's surface area (Koelzer et al., 2015), not just the narrow ring of amnion-serosa contact seen in *Oncopeltus*. These differences in local behavior of the amnion and in the amnion-serosa connection are all the more striking given that *Of-zen* and *Tc-zen2*, the second *Tribolium* paralog, both act extraembryonically to ensure that EE rupture occurs (**Figure 1**: green diamonds; van der Zee et al., 2005; Panfilio et al., 2006).

In subsequent stages the EE tissues withdraw dorsally, but with the serosa ending up in the tapered dorsal-anterior in *Oncopeltus* compared to the flat dorsal-medial region in *Tribolium* (**Figures 2A,E**). Nonetheless, in both cases the serosa transforms from a squamous to a columnar epithelium and forms a hollow disc known as the dorsal organ (Panfilio, 2009; Panfilio and Roth, 2010; Panfilio et al., 2013; **Figures 2B,C,F,G**). Thus, cell shape and intra-tissue organization are conserved despite the geometrical difference resulting from the tissues' positions within an anisotropic egg system.

However, as a consequence of the manner in which the amnion-serosa connection was prepared for rupture, the intertissue organization remains fundamentally different at the dorsal organ stage. The *Oncopeltus* amnion is only connected to the serosa at its margin, and sits on top of the yolk (**Figure 2D**). While both this attachment point and substrate also apply to the *Tribolium* amnion, the bilayer organization means that additionally a portion of the amnion has the serosa as a substrate (**Figure 2H**). As the serosa degenerates, tissue continuity over the yolk surface is essential for successful dorsal closure (Panfilio et al., 2013). The planar (lateral–lateral) nature of amnion-serosa attachment in *Oncopeltus* allows the serosa to efficiently pinch off and draw the edges of the amnion together above it (**Figure 2B**). In *Tribolium*, inter-tissue shearing is required so that the portion of the amnion over the serosa (apical-basal connection) can detach, enabling final serosal internalization (Koelzer et al., 2015).

Altogether, *zen*-mediated rupture, EE contraction and withdrawal, and the cellular structure of the serosal dorsal organ are shared between *Oncopeltus* and *Tribolium* even though the manner of amniotic regionalization (selective apoptosis or morphological alteration) and therefore the nature of the amnion-serosa inter-tissue connection differ.

# **Conclusions**

In this Perspective, we use morphogenesis of the insect EE epithelia to show how different levels of biological organization can provide apparently contradictory signals as to the degree of evolutionary conservation across species. At first glance, these levels are hierarchically ordered, with increasing complexity toward the whole egg system: genes specify cell types and shapes, cells of similar type form tissues, and different tissues shape the whole egg system morphology. However, any pattern of congruence across levels is possible. For example, *zen* orthologs are necessary to specify the serosa in holometabolous insects, but the loss of *zen* function is lethal in some species, while others survive, owing either to serosal dispensability or to morphogenetic compensation by the amnion. These phenotypic outcomes can be explained by a conserved gene function being embedded in the context of differences in EE tissue complement and topographical configuration across species. In the case of EE epithelial withdrawal in *Oncopeltus* and *Tribolium*, the nature of tissue regionalization and inter-tissue attachment differ dramatically even while gene function, intra-tissue structure, and gross morphogenesis are similar. Only the integration of all biological levels can provide the full picture and give insight into the evolution not just of epithelial morphogenesis but of embryogenesis in general, which ultimately depends on cell shape changes and coordinated tissue reorganization.

In the past, these levels have predominantly been studied separately or in limited combinations. In the watershed Heidelberg screen of *Drosophila* embryonic patterning mutants (Nüsslein-Volhard and Wieschaus, 1980), gene function was linked to final phenotype as determined from larval cuticle preparations, a method still widely employed, especially for large-scale screening (Schmitt-Engel et al., 2015). However, even as the initial link between gene and egg system levels is being established, the aim is to refine this information through more precise phenotypic analysis. At this point, a misexpressed gene itself becomes a tool to further explore cell and tissue properties.

With new techniques available, we are increasingly able to investigate multiple levels of biological organization at the same time. For example, gene silencing via RNAi combined with live imaging of fluorescent constructs that afford cell and tissue resolution allows us to visualize the full developmental phenotype resulting from a given genetic manipulation, with *Tribolium* serving as a particularly amenable comparative model among the insects (Sarrazin et al., 2012; Benton et al., 2013; Panfilio et al., 2013; Koelzer et al., 2014). Also, as pioneered in *Drosophila*, mechanical manipulations provide a means of circumventing genetic manipulation when examining cell, tissue, and egg system levels (e.g., Ma et al., 2009; Monier et al., 2010; Wells et al., 2014), and clonal analysis approaches test cellular behaviors at tissue boundaries (Külshammer and Uhlirova, 2013). From all of these studies, it becomes increasingly clear that the interplay between the levels more closely resembles a regulatory network (as known from gene interactions), with various interactions and feedback loops, than a hierarchical structure based on increasing complexity.

Having understood the interplay of the developmental levels within a species, we can now start comparing different species and additional levels. For example, a key aspect of epithelial morphogenesis is the structure of boundaries between different tissues, where mechanical forces are transmitted and inter-tissue attachments are made. To what extent are mechanical, geometric properties of tissues and the egg system a better predictor than phylogenetic relatedness of how similar two species' morphogenetic processes will be? Moving beyond the confines of the egg system, an even more integrated view of the phenotype can be extended to the influence of the external environment, as addressed in the growing field of eco-evo-devo (Gilbert and Epel, 2008; Abouheif et al., 2014). Ultimately, as the number of comparative animal models and the accessibility of experimental tools increase, so too should the sophistication of our phenotypic understanding of how development has evolved.

# **References**


# **Author Contributions**

KAP conceived the idea for the manuscript. KAP, TH, and MH jointly wrote the manuscript.

# **Acknowledgments**

We thank the German Research Foundation (Deutsche Forschungsgemeinschaft) for financial support for conducting some of the experimental investigations described here and during the preparation of this manuscript (grant PA 2044/1-1 to KAP).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Horn, Hilbrant and Panfilio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **From limbs to leaves: common themes in evolutionary diversification of organ form**

#### *Remco A. Mentink and Miltos Tsiantis\**

*Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany*

An open problem in biology is to derive general principles that capture how morphogenesis evolved to generate diverse forms in different organisms. Here we discuss recent work investigating the morphogenetic basis for digit loss in vertebrate limbs and variation in form of marginal outgrowths of angiosperm (flowering plant) leaves. Two pathways underlie digit loss in vertebrate limbs. First, alterations to digit patterning arise through modification of expression of the *Patched 1* receptor, which senses the *Sonic Hedgehog* morphogen and limits its mobility in the limb bud. Second, evolutionary changes to the degree of programmed cell death between digits influence their development after their initiation. Similarly, evolutionary modification of leaf margin outgrowths occurs via two broad pathways. First, species-specific transcription factor expression modulates outgrowth patterning dependent on regulated transport of the hormone auxin. Second, species-specific expression of the newly discovered REDUCED COMPLEXITY homeodomain transcription factor influences growth between individual outgrowths after their initiation. These findings demonstrate that in both plants and animals tinkering with either patterning or post-patterning processes can cause morphological change. They also highlight the considerable flexibility of morphological evolution and indicate that it may be possible to derive broad principles that capture how morphogenesis evolved across complex eukaryotes.

#### **Keywords: evolution and development, leaflet formation, digit formation, patterning versus post-patterning, morphological diversity**

A key question in biology is how morphological diversity is generated. Although plants and animals evolved multicellularity independently, within each kingdom conserved gene regulatory networks (hereafter termed networks) control the development of one or more body parts. In this context, evolution operates as a "tinkerer," strongly influenced by the materials currently at hand as well as by prior history (Jacob, 1977; Davidson and Erwin, 2006; Pajoro et al., 2014; Sorrells et al., 2015). Consequently, considerable constraints exist on the evolution of new traits (Pires-daSilva and Sommer, 2003; Davidson and Erwin, 2006; Carroll, 2008; Pires and Dolan, 2012), raising the question of how evolutionary changes to networks that control development may circumvent these constraints.

Both theoretical arguments and empirical evidence suggest that regulatory sequence variation has greater potential for generating morphological change than coding sequence variation (Stern, 2000; Carroll, 2008). This is likely because regulatory sequences tend to be organized into highly modular *cis* elements, so that mutations in them are less likely to generate pleiotropic effects that would compromise development (Stern, 2000; Carroll, 2008; Rebeiz et al., 2015). However, the degree to which this broad principle manifests itself in different evolutionary lineages, and how precisely the balance of conservation versus divergence of different networks creates morphological diversity, remain open questions (Stern and Orgogozo, 2008, 2009; Rebeiz et al., 2015).

#### *Edited by:*

*Sylvain Marcellini, University of Concepcion, Chile*

#### *Reviewed by:*

*Barbara Ambrose, The New York Botanical Garden, USA; Cédric Finet, University of Cambridge, UK*

#### *\*Correspondence:*

*Miltos Tsiantis, Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany tsiantis@mpipz.mpg.de*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 16 June 2015 Accepted: 27 August 2015 Published: 08 September 2015*

#### *Citation:*

*Mentink RA and Tsiantis M (2015) From limbs to leaves: common themes in evolutionary diversification of organ form. Front. Genet. 6:284. doi: 10.3389/fgene.2015.00284*

Beyond knowing the types of genetic changes that underlie morphological diversity, understanding evolution requires determining how, when and where those genetic changes influence morphogenesis. For example, it remains largely unclear whether particular stages or aspects of development are preferentially amenable to evolutionary tinkering. Does evolution primarily target developmental processes that are active during early stages of organ development, or are such early stages disfavored because changes there risk pleiotropic effects on later stages of development? Does diversity largely arise through tinkering with later-acting developmental programs that fine-tune organ form after the more fundamental patterns have been laid down? Studies on the emergence of novel insect pigmentation patterns in closely related species suggest that later developmental stages (e.g., the insect pupal stage) might be more readily available for evolutionary tinkering (Wittkopp and Beldade, 2009). However, definitive answers to these questions have yet to emerge and will likely depend on the particularities of the system under investigation, including its evolutionary history, its modularity, the type of trait being studied and its degree of integration with other traits. Nevertheless, one way to approach these problems in a unified fashion when comparing diverse organisms is to ask whether and how evolution influences patterning and post-patterning modes of development. Patterning processes impart positional information, for example through morphogen concentration gradients, and ensure the correct distribution of cellular identities within tissues (Kondo and Miura, 2010; Rogers and Schier, 2011). Post-patterning processes, on the other hand, sculpt emerging tissues and organs, typically after their identity has been determined.
For example, post-patterning processes may operate by removing superfluous cells through apoptosis or by adjusting the growth rates of specific cell populations within the organ (Coen et al., 2004; Suzanne and Steller, 2013). Notably, this distinction between patterning and "post-patterning" does not exclude the possibility that patterning genes have persistent effects in developmental time, including at post-patterning stages (Salazar-Ciudad et al., 2003; McGregor et al., 2007; Werner et al., 2010).

Two recent papers, studying digit loss in mammals and leaf shape formation in angiosperms, respectively, have explored the significance of patterning versus post-patterning events for the evolution of form, and revealed a strong link between altered, species-specific gene expression domains and morphological variation. Both studies suggest considerable versatility in how evolutionary tinkering with developmental processes can ultimately arrive at similar phenotypes.

Cooper et al. (2014) studied the evolutionary changes that resulted in convergent digit loss in different mammalian species. A mammalian limb (such as a leg) is attached to the body at one (proximal) end and has 1 to 5 anteroposteriorly distinct digits (e.g., toes) at the other (distal) end. Limbs develop from the limb bud through the sequential action of several distinct signaling centers (**Figure 1A**; Butterfield et al., 2010). Bone morphogenetic proteins (BMPs) specify the formation of the apical ectodermal ridge (AER) at the distal end of the limb bud, from which fibroblast growth factors (FGFs) are secreted to stimulate proximodistal outgrowth (Lewandoski et al., 2000; Pizette et al., 2001; Boulet et al., 2004). The morphogen Sonic hedgehog (SHH) is secreted from the posterior limb bud to direct both digit patterning and expansion of the hand- or footplate to accommodate all digits (Harfe et al., 2004; Towers et al., 2008). Subsequent digit elongation is controlled by FGFs secreted from the AER, and in later stages BMPs sculpt the limb by inducing apoptotic cell death within interdigital tissue in concert with the transcription factor *Msx2* (Marazzi et al., 1997; Ferrari et al., 1998; Sanz-Ezquerro and Tickle, 2003).

Cooper et al. (2014) examined the possible relevance of these pathways to digit loss in 3- and 5-toed jerboas (small, desert-dwelling rodents whose number of hind-limb digits differs between species) and mice, as well as in ungulates (hoofed animals with 1 to 4 toes). In jerboas, they observed no differences in patterning gene expression but instead found expanded domains of apoptotic cells in 3-toed jerboa hind limbs, surrounding tissue otherwise destined to form digits I and V. Specifically, they found that expression of *Msx2* was expanded in the 3-toed jerboa hind limb, likely causing increased cell death (**Figure 1A**). They obtained comparable results with 1-toed horse embryos, where *Msx2* expansion correlated with removal of digits II and IV. These findings point to convergent evolution, in which an apoptotic pathway normally used to remove interdigital tissue was co-opted through regulatory changes to truncate digit outgrowth.

By extending their study to even-toed ungulate species, Cooper et al. (2014) found striking flexibility in modes of digit loss. In pigs, expression of *Patched 1* (*Ptch1*), a *Shh* receptor, is reduced toward the posterior limb bud. *Ptch1* restricts the spread of *Shh* by sequestering it; reduced *Ptch1* expression therefore leads to an expanded region of *Shh* activity and more uniform expression of its target genes, presumably shifting the axis of limb symmetry to the space between digits III and IV (**Figure 1A**; Chen and Struhl, 1996; Butterfield et al., 2009). These findings were corroborated by a second group, which showed a similar reduction of *Ptch1* expression in cow limb buds (Lopez-Rios et al., 2014). These authors also demonstrated that *cis*-regulatory divergence renders *Ptch1* unresponsive to *Shh* signaling, disrupting the negative feedback loop in which *Shh* normally induces *Ptch1*. Remarkably, when Cooper et al. (2014) examined embryos of the camel, a third ungulate, they observed no modification of *Ptch1* expression but instead an expansion of apoptosis and *Msx2* expression, resembling the case in 3-toed jerboas and horses (**Figure 1A**; Cooper et al., 2014). These results indicate that in species of the same taxonomic order, such as camels and pigs (both members of the Artiodactyla, or even-toed ungulates), fundamentally different mechanisms can be modified to achieve similar organ modifications, revealing considerable flexibility in evolutionary pathways. This conclusion is further supported by the fact that Cooper et al. (2014) recovered no evidence for evolutionary tinkering with the *HoxD* regulatory landscape, which had previously been proposed as a good candidate for vertebrate digit diversification owing to its highly modular nature (Montavon et al., 2011). These findings raise the question of how broadly such flexibility in evolutionary tinkering with either growth or patterning occurred during the evolution of complex eukaryotes.
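Why reduced *Ptch1* should widen the zone of *Shh* activity can be illustrated with a textbook toy model. This model is our own assumption and is not the one used by Cooper et al. (2014) or Lopez-Rios et al. (2014). A morphogen produced at a source diffuses along one dimension with coefficient D and is removed at a rate k, part of which is taken to reflect sequestration by Ptch1. At steady state the concentration decays exponentially, c(x) = c0·exp(-x/λ), with decay length λ = √(D/k). The distance at which c falls to a response threshold θ is therefore λ·ln(c0/θ), so lowering the Ptch1-dependent part of k lengthens λ and extends the signaling range. All parameter values below are arbitrary and hypothetical. The sketch also ignores, among other things, the Shh-induced upregulation of Ptch1 and the saturable nature of ligand binding.

```python
import math


def decay_length(diffusion: float, removal_rate: float) -> float:
    """Decay length lambda = sqrt(D / k) of a steady-state exponential gradient."""
    return math.sqrt(diffusion / removal_rate)


def signaling_range(c0: float, threshold: float, diffusion: float, removal_rate: float) -> float:
    """Distance from the source at which c(x) = c0 * exp(-x / lambda) drops to `threshold`."""
    if threshold >= c0:
        return 0.0
    return decay_length(diffusion, removal_rate) * math.log(c0 / threshold)


# Hypothetical parameters in arbitrary units (not measured values).
D = 1.0               # diffusion coefficient
K_BASAL = 0.1         # removal independent of Ptch1 (e.g., degradation)
K_PTCH1 = 0.9         # removal due to Ptch1 sequestration at "normal" Ptch1 levels
C0, THETA = 1.0, 0.1  # source concentration and response threshold

normal = signaling_range(C0, THETA, D, K_BASAL + K_PTCH1)
reduced = signaling_range(C0, THETA, D, K_BASAL + 0.25 * K_PTCH1)  # Ptch1 at 25% of normal
print(f"Shh range, normal Ptch1:  {normal:.2f}")
print(f"Shh range, reduced Ptch1: {reduced:.2f}")
```

With these illustrative numbers the range grows from ln 10 ≈ 2.30 to about 4.04 length units. Only the direction of the change, not its magnitude, carries over to the biological argument.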

A recent paper by Vlad et al. (2014) establishes that a comparable logic helps explain the diversification of leaf shapes in plants of the Brassicaceae family. Brassicaceae, like other flowering plants, form either simple leaves, consisting of an entire blade with smooth, serrated or lobed margins, or dissected leaves, comprising individual leaflets. Both types of leaves develop from leaf primordia that initiate from the pluripotent shoot apical meristem (SAM). *KNOTTED-LIKE HOMEOBOX (KNOX)* transcription factors are expressed in the meristem to maintain its organ-generating potential (Hay and Tsiantis, 2006; Barkoulas et al., 2007). Transport of the plant hormone auxin through the *PIN-FORMED 1* (*PIN1*) efflux transporter, coupled to a self-reinforcing feedback of auxin on PIN1 expression and polarization, likely creates sequential local auxin activity maxima at the flanks of the SAM. This process appears to be self-organizing, and the resulting auxin maxima are required for sequential primordium development (Reinhardt et al., 2003; Heisler et al., 2005; Jonsson et al., 2006; Smith et al., 2006). *CUP-SHAPED COTYLEDON (CUC)* genes mark the leaf primordium boundary and allow its separation by repressing growth at the flanks (Aida et al., 1997; Hibara et al., 2006). *CUC*s and *PIN1* also function together to pattern the leaf margin: *CUC*s likely repress growth at the boundaries of serrations or leaflets, while *PIN1* generates auxin maxima at the sites of their outgrowth. Notably, in this context CUCs likely both repress growth at the flanks of marginal outgrowths and promote their outgrowth, at least in part by promoting the generation of an auxin maximum at their tip (Nikovics et al., 2006; Barkoulas et al., 2008; Blein et al., 2008; Koenig et al., 2009; Kawamura et al., 2010; Ben-Gera et al., 2012). In the *Arabidopsis thaliana* leaf margin, *CUC2* directs PIN1 localization to form local auxin maxima, while auxin feeds back to repress *CUC2*, creating the repeated pattern of serrations along the leaf margin (Bilsborough et al., 2011). *KNOX* genes are additionally expressed in dissected leaves, where they distinguish these from simple leaves by retarding cellular differentiation, thus rendering the leaf competent to form leaflets in response to *PIN1*-dependent auxin maxima (Hay and Tsiantis, 2006; Barkoulas et al., 2008; Kimura et al., 2008; Bar and Ori, 2014). Similarly, *CUC1*, a redundantly acting paralogue of *CUC2*, is expressed in the dissected leaves of *Cardamine hirsuta* but is confined to the leaf-meristem boundary in its simple-leaved relative *A. thaliana*. These observations indicate that evolutionary tinkering with auxin-based patterning mechanisms, through alterations in the expression of upstream transcription factors such as *KNOX* and *CUC*, may be a major route for generating diversity in leaf shapes (**Figure 1B**; Barkoulas et al., 2008; Blein et al., 2008; Piazza et al., 2010; Hasson et al., 2011; Finet and Jaillais, 2012; Bar and Ori, 2014).

Until recently, no genes had been identified that specifically influence leaflet formation without also affecting meristem function or leaf initiation. Such findings suggested that leaflets form through the redeployment of processes that act earlier in development, during leaf initiation (Bar and Ori, 2014; Vlad et al., 2014). To identify novel regulators of leaf complexity, Vlad et al. (2014) conducted a forward genetic screen for genes required for leaflet formation in *C. hirsuta*. They identified the *REDUCED COMPLEXITY (RCO)* homeobox gene, a loss-of-function allele of which simplifies the leaf without causing pleiotropic phenotypes, suggesting a specific requirement for *RCO* in leaflet formation. *RCO* evolved in the Brassicaceae family through duplication of *LATE MERISTEM IDENTITY 1* (*LMI1*), a gene originally identified in *A. thaliana* as a floral regulator (Saddic et al., 2006). Vlad et al. found that *RCO* is specifically expressed at the base of leaflets (**Figure 1B**), while *LMI1* is expressed more distally along the leaf margins, in a complementary pattern. *RCO* does not appear to influence PIN1-mediated auxin patterning but instead represses cellular growth between individual leaflets in *C. hirsuta*, a post-patterning process that allows leaflet separation. *RCO* was lost from *A. thaliana* during evolution, contributing to its leaf simplification; re-introducing *RCO* into *A. thaliana* drives its expression in basal regions of the leaf and increases leaf complexity, partially reversing the consequences of evolution. These results, together with a follow-up study in the sister species *Capsella rubella* and *Capsella grandiflora* by Sicard et al. (2014), suggest that *RCO* is a key regulator of leaf shape and diversity in the Brassicaceae and provide a striking example of organ shape diversification through tinkering with local growth regulation at the flanks of a growing organ primordium (Vlad et al., 2014).
Another notable aspect of the *RCO* study is that the gene was discovered through a forward genetic screen in *C. hirsuta* and could not have been found in *A. thaliana*, where it has been lost. This highlights the importance of unbiased studies in diverse taxa for understanding the genetic basis of the evolution of form.

Taken together, these two studies illustrate how evolution can exploit both patterning and post-patterning processes to create morphological diversity in both plants and animals. It will be interesting to explore whether biases exist toward variation of either kind, or toward particular developmental pathways, across different kingdoms. For example, plants and animals have evolved distinct biophysical properties and morphogenetic strategies that pose different constraints for evolution. Animal morphogenesis involves large-scale apoptosis and cell migration, whereas in plants apoptosis is used to a more limited extent (Gunawardena, 2008; Fendrych et al., 2014) and cell migration not at all. This is because rigid plant cell walls complicate the use of both mechanisms during development: cell walls typically remain after apoptosis, thereby constraining developmental options, and they make cell migration impossible by preventing cells from sliding past each other. These fundamental differences in the cellular underpinnings of development suggest that morphological diversity in plants arises mostly through tinkering with regional growth rates and growth directionality (Coen et al., 2004), consistent with the findings of Vlad et al. (2014). These particularities of plants, however, do not preclude changes to such growth-related processes from also contributing to the evolution of animal form (Abzhanov et al., 2004; Wu et al., 2004). In any event, independent of the organism studied, morphology is determined by processes that take place at different levels of organization and yield the final form through complex feedback loops of genetic regulation, signaling and tissue growth (Salazar-Ciudad and Jernvall, 2010; Kennaway et al., 2011; Prusinkiewicz and Runions, 2012). Consequently, how the activity of gene regulatory networks creates organ shape is not always intuitive to conceptualize.
The interface between computer science and developmental biology offers a promising path for resolving such problems in a predictive fashion (Lewis, 2008; Green et al., 2010; Prusinkiewicz and Runions, 2012; Sheth et al., 2012). Quantitative investigations of morphogenesis, and of the genetic basis of its variation in different organismal lineages, will allow us to build a general picture of how organ diversity is generated and maintained. Such studies should also help us understand the basis for, and limits of, the predictability of morphological evolution.

### **Acknowledgments**

Work in the Tsiantis lab on diversification of leaf shape is supported by a Deutsche Forschungsgemeinschaft (DFG) "Adaptomics" grant TS 229/1-1, a DFG Collaborative Research Center SFB 680 grant on Evolutionary Innovations and a core grant from the Max Planck Society. MT also acknowledges support of the Cluster of Excellence on Plant Sciences. We thank Sheila McCormick for comments on the manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Mentink and Tsiantis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The differential view of genotype–phenotype relationships

#### *Virginie Orgogozo<sup>1</sup>\*, Baptiste Morizot<sup>2</sup> and Arnaud Martin<sup>3</sup>*

*<sup>1</sup> CNRS, UMR 7592, Institut Jacques Monod, Université Paris Diderot, Paris, France, <sup>2</sup> Aix Marseille Université, CNRS, CEPERC UMR 7304, Aix en Provence, France, <sup>3</sup> Department of Molecular Cell Biology, University of California, Berkeley, CA, USA*

An integrative view of diversity and singularity in the living world requires a better understanding of the intricate link between genotypes and phenotypes. Here we re-emphasize the long-standing view that the genotype–phenotype (GP) relationship is best seen as a connection between two differences, one at the genetic level and one at the phenotypic level. Today, the predominant view in biological research is that multiple genes interact with multiple environmental variables (such as abiotic factors, culture, or symbionts) to produce the phenotype. The problem of linking genotypes and phenotypes is often framed in terms of genotype and phenotype maps, and such graphical representations implicitly lead us away from the differential view of GP relationships. Here we show that the differential view of GP relationships remains a useful explanatory framework in the context of pervasive pleiotropy, epistasis, and environmental effects. In such cases, it is relevant to view GP relationships as differences embedded within differences. Thinking in terms of differences clarifies the comparison between environmental and genetic effects on phenotypes and helps to further understand the connection between genotypes and phenotypes.

#### *Edited by:*

*Sylvain Marcellini, University of Concepcion, Chile*

#### *Reviewed by:*

*Jacob A. Tennessen, Oregon State University, USA; Margarida Matos, University of Lisbon, Portugal*

#### *\*Correspondence:*

*Virginie Orgogozo, CNRS, UMR 7592, Institut Jacques Monod, Université Paris Diderot, 15 Rue Hélène Brion, 75013 Paris, France virginie.orgogozo@normalesup.org*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 24 March 2015 Accepted: 28 April 2015 Published: 19 May 2015*

#### *Citation:*

*Orgogozo V, Morizot B and Martin A (2015) The differential view of genotype–phenotype relationships. Front. Genet. 6:179. doi: 10.3389/fgene.2015.00179*

#### **Keywords: genotype, phenotype, genetics, complex trait, GxE, GxG**

# Introduction

*We sometimes seem to have forgotten that the original question in genetics was not what makes a protein but rather 'what makes a dog a dog, a man a man.'*

(Noble, 2006)

One fundamental question in biology is what makes individuals, populations, and species different from each other. The concept of *phenotype*, which corresponds to the observable attributes of an individual, was coined in opposition to the *genotype*, the inherited material transmitted by gametes. Since the early proposal that genotypes and phenotypes form two fundamentally different levels of biological abstraction (Johannsen, 1911), the challenge has been to understand how they articulate with each other, that is, how genotypes map onto phenotypes. In the last 15 years, more than 1,000 examples of DNA sequence changes have been linked to naturally occurring non-deleterious phenotypic differences between individuals or species in eukaryotes (Martin and Orgogozo, 2013b). In humans, the OMIM catalog (Online Mendelian Inheritance in Man, http://omim.org/), which compiles the genetic determinants of disease-related phenotypes, totals more than 4,300 entries, and a total of 2,493 published Genome-Wide Association Studies (GWAS) have uncovered a wealth of sites in the genome that are statistically associated with complex traits (Welter et al., 2014). As the detection of causal links between genetic and phenotypic variation accelerates, a reexamination of our conceptual tools may help us find unifying principles within the swarm of data. Here we reflect on the relationship between genotypes and phenotypes, and we address this essay to biologists who are willing to challenge their current understanding of phenotypes. We single out one useful point of view, the differential view. We then show that this simple framework remains insightful in the context of pervasive pleiotropy, epistasis, and environmental effects.

**Abbreviations:** GP, genotype–phenotype.

## Genes as Difference Makers

Mutations isolated from laboratory strains have been instrumental in understanding the GP map. Under the classical scheme, a mutation is compared to a wild-type reference, and its phenotypic effects are used to infer gene function. This framework often leads to a semantic shortcut: when a genetic change causes a *variation* in phenotype, it is convenient to treat the corresponding gene as a causal determinant of the trait itself (Keller, 2010; **Figure 1A**). Headlines expressing these simplifications are common, trumpeting to wide audiences the discovery of the "longevity" or "well-being" gene and sacrificing scientific accuracy for psychological impact. Along these lines, should a gene whose mutation is lethal be called a "life gene"? What these over-simplified formulations truly mean is that *variation* at a given gene causes *variation* in a given phenotype (Dawkins, 1982; Schwartz, 2000; Waters, 2007). In fact, a gene alone can neither cause an observable phenotypic trait, nor can it be necessary and sufficient for the emergence of observable characteristics. Genes need a cellular environment, the combined action of multiple other genes, and certain physico-chemical conditions to have an observable effect on organisms (**Figure 1B**). For example, brown hair pigmentation in a given person is not just a product of the genes coding for pigment-synthesizing enzymes but also of the presence of pigment-producing cells, of relevant substrate molecules (such as tyrosine for melanin), and of the amount of sunlight received (Liu et al., 2013). Thus, the genetic reductionist approach, which only explores a few genetic parameters among the variety of causal factors, cannot fully address the broad question of what makes hair brown, or of what brings forth a particular biological structure or process in its entirety.
Nevertheless, genetic reductionism can be perfectly appropriate for identifying genetic loci where a change causes a phenotypic difference (**Figure 1C**). A *difference* in hair color between two individuals can, in some cases, be due to a genetic difference between them. We note, however, that not all phenotypic changes can be attributed to genetic changes. A difference in hair color could also be caused by non-genetic factors such as age, intensity of solar radiation or hair dyeing, or by a combination of genetic and non-genetic differences.

While modern genetics was in its infancy, Alfred Sturtevant formulated the question of the GP map in simple terms: "one of the central problems of biology is that of differentiation – how does an egg develop into a complex many-celled organism? That is, of course, the traditional major problem of embryology; but it also appears in genetics in the form of the question, How do genes produce their effects?" (Sturtevant, 1932). For a long time, some geneticists may have thought that they were dissecting the morphogenetic mechanisms underlying the *formation* of phenotypic traits, while their experimental approaches were in fact uncovering genes whose absence or alteration (mutations, deletions, duplications, rearrangements, etc.) leads to phenotypic *differences* (compare **Figure 1A** with **Figure 1C**). Indeed, the sentence "your hair is brown" can be interpreted either as an absolute observation (a description of a particular assemblage of molecules containing defined levels of the dark pigment eumelanin and the pale pigment pheomelanin) or with implicit reference to other possibilities (it is brown and not another color). Misconceptions arise because phenotypes are usually defined relative to possibilities that are not formulated explicitly. Our minds and our language often tend to confuse the objects whose variation is under consideration with the variation itself (Keller, 2010), and it is essential to remember that, in genetics, the objects of interest (e.g., a given genotype, an allele or a phenotype) should be defined *relative* to another reference state.

In summary, the classical genetic reductionist approach is inherently unable to elucidate all the factors responsible for observable characteristics in the living world (Stotz, 2012), but it is a powerful and relevant method for dissecting the genetic levers of heritable phenotypic variation. Focusing on phenotypic variation between individuals rather than on absolute characters present in single organisms is key to better understanding the genetic causes of phenotypic diversity.

# The GP Relationship is between Two Levels of Variation

Thinking in terms of differences reveals an abstract entity that encapsulates both genetic and phenotypic levels. This entity is composed of a variation at a genetic *locus* (two alleles), its associated phenotypic change (two distinct phenotypic states), and the relationship between them (**Figure 1D**). The three of us call this assemblage a "gephe," but here we simply call it a "genotype–phenotype relationship" (GP relationship). We will show that the GP relationship is much more than a simple, loosely defined interaction between two levels of organization: it is a cause-and-effect connection that facilitates our understanding of phenotypic diversity.

#### The Genetic Part of a GP Relationship

In current genome annotation databases, a *gene* is usually defined as a stretch of nucleic acids that is transcribed and codes for an RNA or a polypeptide with a known or presumed function (Gerstein et al., 2007). The genetic locus underlying a phenotypic difference is not necessarily a gene in this strict sense; it can be a single base pair, a coding region, or a *cis*-regulatory region, or it can extend to an entire gene with its *cis*-regulatory regions, or even to a gene cluster (**Table 1**). As previously noted by others (Falk, 1984; Gilbert, 2000; Stern, 2000; Moss, 2003; Griffiths and Stotz, 2013), the concept of gene in developmental biology and in current genome annotation databases is distinct from the concept of gene in evolutionary biology. Here the emphasis is not on the gene itself as defined in genome databases, but rather on a case-by-case functional partitioning of the genome into difference-making loci. The genotypic part of a GP relationship can take the form of various alleles: distinct codons coding for different amino acids, insertions/deletions within a protein-coding sequence, diverging versions of a particular *cis*-regulatory element, presence/absence of transposon insertions, number of gene copies within a gene cluster prone to structural variation, etc. Within a genome, not all nucleotide sites are associated with phenotypic variation. For instance, there are probably stretches of nucleotide sequence, including the so-called junk DNA (Graur et al., 2015), whose presence has no consequence on observable characteristics of the organism, besides being replicated and possibly transcribed. There are also genetic loci that may have been associated with phenotypic variation in the past but are no longer. For example, genetic variation in the coding regions for histone DNA-binding domains may have been important during the early evolution of eukaryotic cells, but these loci no longer harbor phenotypically relevant variation other than lethal mutations. Within a genome, there are thus nucleotide sites that are absolutely required for life but that do not themselves harbor viable, phenotypically relevant variation.

#### The Phenotypic Part of a GP Relationship

The phenotypic counterpart of the GP relationship refers to a kind of variation (hair color, level of toxin resistance, etc.) rather than to a state (blond hair, taster of phenylthiocarbamide, etc.; **Table 1**).

The phenotype associated with a genetic change is not necessarily confined to the organism that harbors the mutation. For example, the difference between left- and right-coiled shells in the snail *Lymnaea peregra* is determined by a single genetic locus with a maternal effect: the genotype of the mother, not that of the individual itself, determines the direction of shell coiling (Boycott et al., 1931). In other cases, the causal genetic change lies within symbiotic bacteria: aphid thermal tolerance can vary between individuals owing to a point mutation in their bacterial symbiont (Dunbar et al., 2007). Certain phenotypic effects can also manifest at a level higher than the organism harboring the genetic change (Dawkins, 1982), one exemplary case being the social organization of an ant colony (Wang et al., 2013).

#### TABLE 1 | A few examples of GP relationships.

*See (Martin and Orgogozo, 2013a) for references and for additional cases.*

#### The Differential Part of a GP Relationship

As defined above, the GP relationship encompasses a genetic difference and a phenotypic difference. The relationship of *difference* at both the genetic and the phenotypic level is quite abstract, and it can correspond to three distinct differences within the living world: (#1) a difference between two reproductively isolated taxa (living or extinct), (#2) a difference segregating within a population, and (#3) the difference that first appeared during evolution, between an organism harboring the ancestral allele/trait and its direct descendant that evolved the new allele/trait. Of note, the variation in phenotype does not always immediately follow the emergence of the new causal mutation; it can appear later from a particular assortment of alleles segregating in the population. For example, a new phenotype of reduced armor plates appeared in a freshwater stickleback population when a recessive *EDA* allele already present at cryptic levels ended up in a homozygous state in one individual (Colosimo et al., 2005; Jones et al., 2012). A major conceptual advance made by Charles Darwin was to relate variation among individuals within an interbreeding group (difference #2) to variation between taxonomic groups in space and time (difference #1; Lewontin, 1974a).

Note also that certain phenotypic changes may appear at the level of the entire organism when the "causative" mutation is accompanied by additional somatic mutations that are highly likely to occur. For example, in women carrying one wild-type and one mutant allele of *BRCA1*, cells can produce wild-type BRCA1 protein because they carry one copy of the wild-type *BRCA1* allele. Nevertheless, these women have up to an 80% risk of developing breast or ovarian cancer by age 70, compared with women carrying two wild-type copies of *BRCA1*, owing to the appearance of additional deleterious mutations within the wild-type *BRCA1* allele in their somatic breast cells (Narod and Foulkes, 2004).
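Why such a second somatic hit is "highly likely" can be seen from a simple calculation. Purely as an illustrative assumption (the symbols are ours, not estimates from the cited study), suppose that each of $N$ at-risk cells independently acquires an inactivating mutation in its remaining wild-type *BRCA1* copy with a small per-cell probability $\mu$. The probability that at least one cell does so is then

$$
P(\text{at least one second hit}) = 1 - (1 - \mu)^{N} \approx 1 - e^{-\mu N},
$$

which approaches 1 once $\mu N$ is large, even when $\mu$ itself is tiny. In a woman with two wild-type copies, the same cell would need two independent hits, with a per-cell probability of order $\mu^{2}$, so the corresponding risk is much lower.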

Importantly, the GP difference is always defined relative to a population, or taxon, of interest (Sober, 1988). In less medically developed countries, humans carrying two defective copies of the phenylalanine hydroxylase gene have serious medical problems, including seizures and intellectual disability. In contrast, in most developed countries, such individuals are diagnosed at birth and, thanks to a phenylalanine-restricted diet, have a normal life span with normal mental development (Armstrong and Tyler, 1955). Therefore, the GP relationship involving the defective phenylalanine hydroxylase mutation is context-dependent: the mutation is associated with health problems in less medically developed countries but not in other countries. This example shows that the causal relationship between a genetic change and its associated phenotypic change can hide multiple embedded parameters (such as medical practices in the phenylalanine hydroxylase case) within the *ceteris paribus* assumption of "all other things being equal."

In summary, the GP relationship is best viewed as a relationship between two variations, one at the genotypic level and one at the phenotypic level. The human mind can elaborate concepts of increasing abstraction: concepts of things (e.g., a cell), concepts of change (e.g., evolution), and concepts of relations (e.g., homology; Cassirer, 1910; Simondon, 1968). Here the concept of GP relationship establishes a relation between two *changes* (genetic and phenotypic). In the next paragraphs we will show that, compared with intuitive concepts of things, this detour through increased abstraction may prove a more efficient way to understand phenotypic diversity.

# Several Current Representations of the Connection between Genotype and Phenotype Implicitly Dismiss the Differential View

We argued above that the differential view should always be kept in mind when thinking about the connection between genotypes and phenotypes. GWAS, the most popular method for detecting genomic loci associated with complex traits in populations, are based on the analysis of differences (Visscher et al., 2012). Nevertheless, in current research the differential view is sometimes implicitly dismissed. When multiple factors are observed to influence phenotypic traits (**Figure 1B**), the differential view is considered too simplistic, and researchers often prefer to focus back on phenotypes of single individuals, without explicitly relating them to a phenotypic reference.

In most current articles, the problem of connecting the genotype to the phenotype is framed in terms of genotype and phenotype maps. The first GP map was introduced by Richard Lewontin in his book "The genetic basis of evolutionary change" (Lewontin, 1974a; **Figure 2A**). He represented the average genotype of a population as a point in the space of all possible genotypes (G space) and the average phenotype of the same population as a corresponding point in the space of all possible phenotypes (P space). The evolutionary process was thus decomposed into four steps: (1) the average phenotype is derived from the development of the distinct genotypes in various environments; (2) migration, mating, and natural selection act in P space to change the average phenotype of the initial population into the average phenotype of the individuals that will have progeny; (3) the identity of successful parents determines which genotypes are preserved; and (4) genetic processes such as mutation and recombination modify position in G space.
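Lewontin's four steps can be made concrete with a toy simulation. The sketch below is ours, not Lewontin's formalism, and every modeling choice in it is an assumption: a haploid, sexually recombining population of 0/1 genomes; a hypothetical additive genotype-to-phenotype map with Gaussian environmental noise (step 1); truncation selection on phenotype (step 2); transmission of the selected parents' genotypes (step 3); and free recombination plus symmetric mutation (step 4). It reports population averages, which correspond to the single points Lewontin drew in G and P space.

```python
import random
from statistics import mean


def develop(genotype, env_deviation):
    """Step 1 (hypothetical additive map): count of '1' alleles plus an environmental deviation."""
    return sum(genotype) + env_deviation


def lewontin_generation(population, rng, keep_fraction=0.5, mutation_rate=0.0, env_sd=0.5):
    """One pass through the four steps for a toy haploid population of 0/1 genotypes."""
    # Step 1: development maps each genotype, in its own environment, to a point in P space.
    phenotypes = [develop(g, rng.gauss(0.0, env_sd)) for g in population]

    # Step 2: selection acts in P space; the highest-scoring individuals become parents.
    ranked = sorted(zip(phenotypes, population), key=lambda pair: pair[0], reverse=True)
    n_parents = max(2, int(len(population) * keep_fraction))
    parents = ranked[:n_parents]

    # Step 3: the identity of the successful parents determines which genotypes are transmitted.
    parent_genotypes = [g for _, g in parents]

    # Step 4: recombination and mutation modify positions in G space.
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.sample(parent_genotypes, 2)
        child = tuple(rng.choice(pair) for pair in zip(mother, father))  # free recombination
        child = tuple(1 - a if rng.random() < mutation_rate else a for a in child)
        offspring.append(child)

    return {
        "mean_P": mean(phenotypes),
        "mean_P_parents": mean(p for p, _ in parents),
        "mean_G": mean(sum(g) for g in population),
        "mean_G_offspring": mean(sum(g) for g in offspring),
        "offspring": offspring,
    }


rng = random.Random(1)
population = [tuple(rng.randint(0, 1) for _ in range(10)) for _ in range(200)]
result = lewontin_generation(population, rng, mutation_rate=0.001)
```

Because selection in this sketch acts only on phenotypes, the parents' mean phenotype is always at least the population mean; whether the offspring's mean genotype shifts depends on how faithfully the hypothetical map transmits that phenotypic advantage back into G space.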

In another common graphical representation (**Figure 2B**), a point in the G space and its corresponding point in the P space correspond to the genotype and the phenotype of a single individual (Fontana, 2002; Landry and Rifkin, 2012). Under such a representation, the abstract object that we defined above as the GP relationship would correspond to a "move" in genotype space associated with a "move" in phenotype space (or, better, a sum of several "moves" in genotype and phenotype spaces, because several distinct genomes can carry the two alternative alleles of a given GP relationship). In a third representation, put forward by Wagner (1996; **Figure 2C**), individual genes are connected to individual traits.
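Under the single-individual representation, a GP relationship can be read off a genotype-to-phenotype function as the collection of phenotypic "moves" produced by the same allelic "move" in different genomes. A minimal sketch, assuming a hypothetical additive map over three biallelic loci ("a" ancestral, "d" derived) and two hypothetical genomes; the function names and effect sizes are illustrative only:

```python
from typing import Callable, Dict, Tuple

Genotype = Tuple[str, ...]


def gp_moves(phenotype_of: Callable[[Genotype], float],
             genomes: Dict[str, Genotype],
             locus: int, ancestral: str, derived: str) -> Dict[str, float]:
    """For each genome, the phenotypic move caused by swapping `ancestral` for `derived` at `locus`."""
    moves = {}
    for name, genome in genomes.items():
        before = genome[:locus] + (ancestral,) + genome[locus + 1:]
        after = genome[:locus] + (derived,) + genome[locus + 1:]
        moves[name] = phenotype_of(after) - phenotype_of(before)
    return moves


def toy_additive_map(genotype: Genotype) -> float:
    """Hypothetical map: each derived allele adds a fixed, locus-specific amount."""
    effects = (2.0, 0.5, 1.0)
    return sum(e for e, allele in zip(effects, genotype) if allele == "d")


genomes = {"genome_1": ("a", "a", "d"), "genome_2": ("a", "d", "a")}
moves = gp_moves(toy_additive_map, genomes, locus=0, ancestral="a", derived="d")
print(moves)  # {'genome_1': 2.0, 'genome_2': 2.0}
```

Because the toy map is additive, every genome yields the same phenotypic move, so the GP relationship reduces to a single difference; with interactions between loci, the moves would differ from one genome to another.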

Although these three graphical representations of GP maps may facilitate our understanding of certain aspects of biology, in all of them the GP relationship and the differential view are not easy to grasp. It is quite perplexing that the first person to draw such a GP map was Richard Lewontin, an eloquent advocate of the differential view (see for example his preface to Oyama, 2000, a masterpiece of persuasion). Because these graphics focus on individual rather than differential objects, we believe that these three representations implicitly incite us to go back to the more intuitive idea of one genotype associated with one phenotype. Losing sight of the differential view might also come from the molecular biology perspective, where proteins are viewed as having causal effects on their own, such as phosphorylation of a substrate or binding to a DNA sequence. Because of the two entangled definitions of the gene, either as encoding a protein, or as causing a phenotypic change (Griffiths and Stotz, 2013), it is easy to move from a differential view to a non-differential view of the GP relationship.

In summary, many current mental representations of the connection between genotype and phenotype implicitly dismiss the differential view. We will now show that the differential view is compatible with the fact that phenotypic traits are influenced by a complex combination of multiple factors and that we can find a relevant schematic representation of GP relationships.

# The Problem of Pleiotropy

Decomposing an organism into elementary units such as anatomical structures has been instrumental in many biological disciplines such as physiology, paleontology, and evolution. However, the issue is to identify the decomposition into characters that is most adequate for the question of interest. For questions related to relationships between organs of various individuals or species (such as homology), it might be appropriate to keep the traditional decomposition into anatomical structures (Wagner, 2014). Richard Lewontin and Günter Wagner defined characters as elements within an organism that answer to adaptive challenges and that represent quasi-independent units of evolutionary change (Lewontin, 1978; Wagner, 2000). Their definition deals with absolute traits observed in single organisms (for example the shape of a wing, or the number of digits in an individual) and is thus far from the differential view. Here, to better understand evolution and the phenotypic diversity of the living world, we propose to decompose the observable attributes of an organism into multiple elementary GP variations that have accumulated through multiple generations, starting from an initial state. We insist that under this perspective, characters are not concrete objects (such as skin) but abstract entities defined by the existence of *differences* between two possible observable states (for example skin color). As an analogy, one can imagine two ways to produce a well-worn leather shoe of a particular shape. One can either assemble the different atoms into the same organization, or one can buy a shoe in a store and then subject it to a series of mechanical forces. We are naturally inclined to compare organisms to machines, and to think in terms of pieces that must be assembled to make a functional whole. However, the rampant metaphor of the designer or maker is inadequate for understanding the origin of present-day organisms (Coen, 2012).
To understand the phenotypic features of a given organism it is more efficient to decompose it into abstract changes that occurred successively across evolutionary time, and not across developmental time. The initial state is a hypothetical ancestor of the organism under study.

Certain mutations (qualified as pleiotropic) are observed to affect several organs at once, while others alter only one at a time (Paaby and Rockman, 2013; Zhang and Wagner, 2013). For pleiotropic mutations, we consider that the GP relationship should include all the phenotypic changes (in diverse organs, at various stages, etc.) associated with the genetic difference. For instance, the V370A mutation of the EDAR receptor is associated not only with hair thickness but also with changes in sweat gland and mammary gland density in Asian populations (Kamberov et al., 2013). The GP relationship is, in such cases, one-to-multi. Considering skin and eye as independent anatomical modules of the human body might seem appropriate for many evolutionary changes, but it is somewhat inadequate in cases where these two organs evolved a new pigmentation trait at once through a single mutation in the *SLC45A2* gene (Liu et al., 2013). Reasoning in terms of GP relationships sidesteps the problem of finding a relevant decomposition into elementary anatomical structures. The elementary GP relationships themselves appear as adequate semi-independent modules, whose combination can account for the observable characteristics of an organism.
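The proposed decomposition (an organism described as a hypothetical ancestral state plus the GP relationships accumulated along its lineage, each listing *all* of its phenotypic changes) can be sketched as a small data structure. All mutation names, traits, and effect sizes below are hypothetical, and the sketch assumes that effects simply add up, which ignores epistasis and GxE interactions.

```python
from collections import Counter

# Hypothetical GP relationships: each genetic change maps to ALL of its phenotypic changes.
# Mutation names, traits and effect sizes are illustrative only.
GP_RELATIONSHIPS = {
    "mut_A": {"skin_pigment": -1.0, "eye_pigment": -1.0},           # pleiotropic: one-to-multi
    "mut_B": {"hair_thickness": 1.0, "sweat_gland_density": 0.5},   # pleiotropic
    "mut_C": {"tail_length": -2.0},                                 # affects a single trait
}


def is_pleiotropic(mutation):
    """A GP relationship is one-to-multi when it lists more than one phenotypic change."""
    return len(GP_RELATIONSHIPS[mutation]) > 1


def phenotype_from_history(ancestor, history):
    """Apply the GP relationships accumulated along a lineage to a hypothetical ancestral state.

    Assumes effects add up independently of order (no epistasis or GxE).
    """
    phenotype = Counter(ancestor)
    for mutation in history:
        phenotype.update(GP_RELATIONSHIPS[mutation])  # Counter.update adds values and keeps negatives
    return dict(phenotype)


ancestor = {"skin_pigment": 2.0, "eye_pigment": 2.0, "hair_thickness": 1.0,
            "sweat_gland_density": 1.0, "tail_length": 10.0}
descendant = phenotype_from_history(ancestor, ["mut_A", "mut_B"])
```

In this representation there is no need to decide beforehand whether skin and eye are separate modules: the single entry for `mut_A` simply records both changes.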

# The Problem of Continuous Complex Traits

Under the differential concept of GP relationships, one crucial point is to decompose observable traits into a series of semi-independent phenotypic variations, that is, to identify the elementary changes that have occurred during evolution. Experimental approaches are available to decompose a given phenotypic difference into appropriate finer sub-variations. For example, crossing plants with different leaf shapes yields progeny that exhibit a composite range of intermediate leaf shapes. Principal component analysis uncovered elementary leaf shape changes that can together account for the difference in shape between parental lines and that appear to be caused by distinct genomic regions (Langlade et al., 2005). This suggests to some extent that "the sum obscures the parts." What we traditionally consider as complex traits can be made of simpler traits, more amenable to genetic analysis. Another illuminating example is the abdominal pigmentation in the *Drosophila dunni* group. Taken as a single variable, the levels of pigmentation show a complex genetic architecture, but decomposing adult patterns into anatomical sub-units reveals discrete genetic control for each sub-trait (Hollocher et al., 2000). A better-known case is the evolution of body color in beach mice. The difference in color between light-colored beach mice and dark mice can be decomposed into distinct phenotypes (dorsal hue, dorsal brightness, width of tail stripe, and dorsoventral boundary), which are all associated with distinct mutations in the *Agouti* gene (Linnen et al., 2013; **Figure 3**). Each *Agouti* genetic locus appears to be dedicated to the specification of pigmentation in a given body part. Together, they form a group of tightly linked loci that are associated with changes in coat pigmentation.
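The logic of the leaf-shape study, in which principal component analysis splits one composite difference into elementary components that map to distinct genomic regions, can be illustrated on simulated data. This is not Langlade et al.'s pipeline (which analyzed leaf outlines); it is a sketch assuming a hypothetical F2 cross in which two unlinked loci shift different subsets of six leaf measurements. Principal components are computed in pure Python by power iteration with deflation, and the script checks how well each of the two leading components tracks one locus.

```python
import math
import random


def simulate_f2_leaves(n, rng, noise_sd=0.05):
    """Hypothetical F2 cross: two unlinked loci (0, 1 or 2 derived alleles) shift different leaf measurements."""
    loadings_1 = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # locus 1: a large change in overall size
    loadings_2 = [0.0, 0.0, 0.0, 0.5, -0.5, 0.5]  # locus 2: a subtler change in shape
    genotypes, rows = [], []
    for _ in range(n):
        g1 = rng.randint(0, 1) + rng.randint(0, 1)
        g2 = rng.randint(0, 1) + rng.randint(0, 1)
        genotypes.append((g1, g2))
        rows.append([g1 * a + g2 * b + rng.gauss(0.0, noise_sd)
                     for a, b in zip(loadings_1, loadings_2)])
    return genotypes, rows


def covariance_matrix(rows):
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[k] for c in centered) / (n - 1) for k in range(p)]
           for i in range(p)]
    return cov, centered


def top_eigenvectors(matrix, k, iterations=500):
    """Leading eigenpairs of a symmetric matrix by power iteration with deflation."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    values, vectors = [], []
    for _ in range(k):
        v = [1.0 + j for j in range(p)]  # arbitrary non-degenerate starting vector
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        values.append(lam)
        vectors.append(v)
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]  # deflate
    return values, vectors


def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((w - my) ** 2 for w in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
genotypes, rows = simulate_f2_leaves(500, rng)
cov, centered = covariance_matrix(rows)
values, vectors = top_eigenvectors(cov, k=2)
explained = sum(values) / sum(cov[j][j] for j in range(len(cov)))
scores = [[sum(c[j] * v[j] for j in range(len(v))) for c in centered] for v in vectors]
r_pc1_locus1 = correlation(scores[0], [g[0] for g in genotypes])
r_pc2_locus2 = correlation(scores[1], [g[1] for g in genotypes])
```

In this simulation the two loci were given orthogonal effects of clearly different magnitude, which is what lets PCA separate them cleanly; with correlated or similar-sized effects, components can mix loci, so a real analysis would still need genetic mapping of each component.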

FIGURE 3 | […] from Linnen et al. (2013).

While complex traits may not always be reducible to a suite of simple GP relationships, it is possible that traits such as adult human height, the most emblematic quantitative trait predicted to consist of many genetic effects of small size (Fisher, 1930), might also be decomposed into elementary variations, each explaining more discrete sub-traits. While some determinants of human height, such as *LIN28B*, have been associated with adult height at different ages, other genes have reached statistical significance only in stage-specific studies focusing on fetal growth and height velocity at puberty (Lettre, 2011). In other words, these data suggest that human height may be a composite trait modulated by several GP relationships, each acting at a different phase of developmental growth.

# The Problem of Epistasis and GxE

Gene-by-Environment (GxE) interaction occurs when the phenotypic effect of a given genetic change depends on environmental parameters. Similarly, epistasis, or GxG interaction, occurs when the phenotypic effect of a given genetic change depends on the allelic state of at least one other locus (Phillips, 2008; Hansen, 2013). There is increasing evidence that GxG and GxE interactions are of fundamental importance for understanding the evolution and inheritance of complex traits (Gilbert and Epel, 2009; Hansen, 2013). We propose that both phenomena can be integrated into the basic GP differential framework, where both GxG and GxE interactions inject a layer of context-dependence and result in differences embedded within differences.

The difference in pigmentation between dark and light-colored beach mice mentioned previously (**Figure 3**) is due not only to mutations in *Agouti* but also to a coding mutation in the *MC1R* gene that decreases pigmentation (Steiner et al., 2007; **Figure 4B**). The effect of the *MC1R* mutation is visible only in the presence of the derived *Agouti* haplotype associated with light coloration. Here the *MC1R* locus is considered to interact epistatically with the *Agouti* locus. In this case, we propose that the GP relationship does not comprise a single phenotypic difference but two *possible* phenotypic differences (a change in coat pigmentation or no change at all). The choice between these two phenotypic differences is determined by the genetic background (here at the *Agouti* locus). The differential view thus remains relatively straightforward for two-locus interactions: the context-dependence of the phenotype translates into a choice between two possible phenotypic differences. We propose that a GP relationship involving a mutation subject to multiple epistatic interactions should comprise all possible phenotypic differences that can result from the mutation in all genetic backgrounds. Among all these possible phenotypic variations, the one that will be observed is determined by other genetic loci. In general, GxG interactions involve multiple sites that are dispersed across the genome (Bloom et al., 2013).
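The proposal that an epistatic GP relationship comprises a set of *possible* phenotypic differences, one per genetic background, can be sketched with a toy two-locus map loosely inspired by the beach mouse case. The lightness values are hypothetical, not measurements from Steiner et al. (2007), and the background labels are simplifications of the actual *Agouti* genotypes.

```python
# Hypothetical two-locus pigmentation map loosely inspired by the beach mouse example.
# Lightness values are illustrative only, not measurements from Steiner et al. (2007).
def coat_lightness(agouti_background, mc1r_allele):
    """agouti_background: 'light' (carries the derived light-associated haplotype)
    or 'dark' (homozygous for the recessive dark allele)."""
    lightness = 0.0
    if agouti_background == "light":
        lightness += 2.0
        if mc1r_allele == "derived":
            lightness += 1.0  # the MC1R effect is expressed only in this background
    return lightness


def possible_differences(phenotype, backgrounds, ancestral, derived):
    """GP relationship of a mutation: the phenotypic difference it causes in each background."""
    return {bg: phenotype(bg, derived) - phenotype(bg, ancestral) for bg in backgrounds}


mc1r_relationship = possible_differences(coat_lightness, ["light", "dark"], "ancestral", "derived")
print(mc1r_relationship)  # {'light': 1.0, 'dark': 0.0}
```

Extending `backgrounds` to every relevant combination of alleles at the interacting loci would enumerate the full set of possible phenotypic differences; which one is observed in a given mouse is fixed by its background.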

FIGURE 4 | […] *Agouti* but not in an *Agouti* homozygous background for the recessive dark allele (Steiner et al., 2007).

An example of GxE interaction (see also **Figure 4A**) is the naturally occurring loss-of-function allele of *brx* in *Arabidopsis* plants, which is associated with accelerated growth and increased fitness in acidic soils, but with severely reduced root growth compared with wild-type plants in normal soils (Gujas et al., 2012). GxE interactions are usually analyzed in the form of a *norm of reaction*, which represents all the observable traits of a *single* genotype across a range of environments (Johannsen, 1911; Sarkar, 1999). In the case of GxE interactions, we propose that the GP relationship should comprise all the possible phenotypic *changes* that can be caused by the associated genetic change across various experimental conditions. The associated phenotypic change is thus a difference between two norms of reaction. A textbook example is the variation in the temperature-size rule in *C. elegans*. Like most other animals, *C. elegans* nematodes grow larger at low temperature, but a wild-type laboratory strain of *C. elegans* originating from Hawaii shows no variation in body size across temperatures. An amino acid change in a calcium-binding protein is responsible for the decreased ability of the Hawaiian strain to grow larger at low temperature (Kammenga et al., 2007). Here the norm of reaction (nematode body size across a range of temperatures) differs between the two strains, and the associated GP relationship encompasses the difference between these two slopes.
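The idea that the phenotypic side of a GxE relationship is "a difference between two norms of reaction" can be made concrete by fitting a least-squares slope of body size on temperature for each strain and subtracting them. The data below are hypothetical (arbitrary size units), not values from Kammenga et al. (2007), and summarizing each norm of reaction by a straight-line slope is our simplifying assumption.

```python
def reaction_norm_slope(temperatures, sizes):
    """Ordinary least-squares slope of body size on rearing temperature (one genotype's norm of reaction)."""
    mean_t = sum(temperatures) / len(temperatures)
    mean_s = sum(sizes) / len(sizes)
    covariance = sum((t - mean_t) * (s - mean_s) for t, s in zip(temperatures, sizes))
    variance = sum((t - mean_t) ** 2 for t in temperatures)
    return covariance / variance


# Hypothetical data (arbitrary size units), not values from Kammenga et al. (2007).
temperatures = [12, 16, 20]
reference_strain = [130, 120, 110]  # follows the temperature-size rule: larger when colder
variant_strain = [115, 115, 115]    # flat norm of reaction

gp_difference = (reaction_norm_slope(temperatures, variant_strain)
                 - reaction_norm_slope(temperatures, reference_strain))
print(gp_difference)  # 2.5
```

The phenotypic difference attached to the genetic change is thus a single number describing how the *shape* of the response differs, rather than a difference in size at any one temperature.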

The range of phenotypic variations embodied within GP relationships subject to GxG and GxE interactions can be quite overwhelming, especially when several tissues are affected by the same mutation and when the phenotypic variation of each tissue is influenced by other genomic loci and by environmental conditions. In fact, the phenotypic effects of a mutation always rely on other pieces of DNA from the same genome, so that any GP relationship can be considered to experience epistasis. In other words, a genetic locus affecting a phenotype never acts independently of other DNA sequences. For instance, a given opsin allele will only lead to particular color vision properties if an eye is formed and if this eye receives light during its development, allowing effective neural circuits for vision to form. For the differential view to be tractable, we advise not considering all possible genetic backgrounds and environmental conditions, but restricting possibilities to the environments and segregating alleles that are relevant to the population of interest (Sober, 1988).

In summary, in the presence of epistasis or GxE interactions, a genetic change is not associated with a single phenotypic difference but with multiple possible phenotypic differences, one of which will be realized depending on the environment and the genetic background. The context-dependence can be represented schematically as GP differences embedded within other genotype and environment differences.

#### The Differential View of Genetic and Environmental Effects on Phenotypes

As underlined by multiple authors (most notably Waddington, 1957; Oyama, 2000; Keller, 2010), genes and environment act jointly on the phenotype, and in most cases it is impossible to disentangle the effect of one from the other. Here we show that reasoning in terms of differences helps to clarify the comparison between genetic and environmental effects on phenotypes. However, we identify certain cases where the comparison remains difficult.

By analogy with the GP relationship, we can define the environment-phenotype relationship as an environmental variation (two environments), its associated phenotypic change (distinct phenotypic states), and their relationship. For example, in many turtle species, a change in temperature during egg development is associated with the male/female sex difference (**Figure 5A**), and at least six transitions from environmental to genetic sex determination (**Figure 5B**) occurred across the turtle phylogeny (Pokorná and Kratochvíl, 2009). In this case, environmental and genetic effects can be compared: sex chromosomes and temperature have the same phenotypic effect on turtles. Such observations led West-Eberhard (2003, 2005) to propose the "genes as followers" hypothesis, which suggests that novel phenotypic states are more likely to arise first from a change in the environment than from a genetic mutation, and that mutations occur only later, modifying the threshold for expression of the novel trait. West-Eberhard (2003, 2005) extrapolated from differences segregating within populations (difference #2) to differences that arose temporally during the evolution of a population (difference #3).

The independent evolution of directional left–right asymmetry from symmetrical ancestors in multiple lineages has provided a major argument supporting the "genes as followers" hypothesis (Palmer, 2004). Under this framework, directional asymmetry, where all individuals are same-sided, is thought to have often evolved from a "random asymmetry" state, where directionality depends on environmental factors and thus varies between genetically identical individuals. For instance, the stronger claw of a lobster develops according to usage and a priori has equal probabilities of developing on the left or on the right side. We can see how the "genes as followers" formula applies here: the environment triggers an asymmetry, and later in evolution some genetic effects can bias its directionality toward one side or the other. But while asymmetry "occurs before genetic variation exists to control it," the differential view makes it clear that the genetic effect on directionality is not comparable to the environmental effect that triggers the asymmetry. The genetic change makes a switch between the final 100% same-sided condition and an initial condition where 50% of the cases are dextral and 50% sinistral. In contrast, the two alternative phenotypic states resulting from variation in the environment are considered to be 100% dextral and 100% sinistral. This example shows that, for the sake of accuracy, it is important to explicitly state the differences that are being considered within a GP relationship.
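Stating the differences explicitly can be done by representing each phenotypic state as a probability distribution over sides. The sketch below compares the genetic difference (50/50 versus 100% same-sided) with the environmental one (100% right versus 100% left). Using total variation distance to summarize each difference is our illustrative choice, not something the text prescribes.

```python
from fractions import Fraction

# Phenotypes as probability distributions over the side that carries the stronger claw.
RANDOM_ASYMMETRY = {"right": Fraction(1, 2), "left": Fraction(1, 2)}  # side set by usage
ALL_RIGHT = {"right": Fraction(1), "left": Fraction(0)}
ALL_LEFT = {"right": Fraction(0), "left": Fraction(1)}


def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    return sum(abs(p[side] - q[side]) for side in p) / 2


# Genetic change: random (50/50) asymmetry versus a 100% same-sided population.
genetic_difference = total_variation(RANDOM_ASYMMETRY, ALL_RIGHT)
# Environmental change: the two outcomes of usage, 100% right versus 100% left.
environmental_difference = total_variation(ALL_RIGHT, ALL_LEFT)
print(genetic_difference, environmental_difference)  # 1/2 1
```

The two numbers differ because the two comparisons involve different pairs of phenotypic states, which is exactly why the genetic and environmental effects are not interchangeable here.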

The differential view provides a theoretical framework that can help in designing experiments to investigate the proper variables: one can compare different genotypes in a fixed environment (classic GP relationship), compare the response of a fixed genotype to two different environments (phenotypic plasticity), or compare the sensitivity of two different genotypes to two different environments (wherein the phenotypic variation becomes a *difference in a difference*; see for example Engelman et al., 2009; Thomas, 2010).
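The three designs can be written as three contrasts on a two-by-two table of mean phenotypes. The values below are hypothetical; the sketch only shows how a classic genetic effect, a plastic response, and a GxE "difference in a difference" are computed from the same table.

```python
# Hypothetical mean phenotypes for genotypes g1, g2 raised in environments e1, e2.
means = {("g1", "e1"): 10.0, ("g2", "e1"): 12.0,
         ("g1", "e2"): 15.0, ("g2", "e2"): 13.0}


def genetic_effect(env):
    """Classic GP relationship: two genotypes compared in one fixed environment."""
    return means[("g2", env)] - means[("g1", env)]


def plastic_response(genotype):
    """Phenotypic plasticity: one genotype compared across two environments."""
    return means[(genotype, "e2")] - means[(genotype, "e1")]


def gxe_contrast():
    """Difference in a difference: change in the genetic effect between environments."""
    return genetic_effect("e2") - genetic_effect("e1")


print(genetic_effect("e1"), genetic_effect("e2"))      # 2.0 -2.0
print(plastic_response("g1"), plastic_response("g2"))  # 5.0 1.0
print(gxe_contrast())                                  # -4.0
```

The same contrast is obtained whichever way the subtraction is taken (the change in the genetic effect across environments equals the difference in plastic responses between genotypes), which is why a GxE interaction can be read either as genotype-dependent plasticity or as an environment-dependent genetic effect.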

Various quantitative methods have been developed to disentangle genetic from environmental effects and to quantify GxE interactions (Lynch and Walsh, 1998). Yet in certain situations it can be impossible to separate genetic from environmental effects in a biologically meaningful way, even when reasoning in terms of differences (Lewontin, 1974b). Populations of the beetle *Calathus melanocephalus* comprise two morphs, long-winged and short-winged (Schwander and Leimar, 2011). The long-winged morph develops only in individuals homozygous for a recessive allele segregating in the population, and only when food conditions are good. In this case, the genetic and environmental effects are intermingled (**Figures 6A,B**). In the theoretical case of a population comprising only short-winged heterozygous animals raised in starving conditions and long-winged animals, both genes and environment are responsible for the wing difference between individuals, and it is impossible to estimate the proportions of environmental and genetic effects because genes and environment act on distinct levels along the complex causal link between genotypes and phenotypes.

Another case that questions the classical environment/genetic distinction is when the addition of certain symbiotic bacteria modifies the host phenotype. Mice fed with a *Lactobacillus* strain of bacteria show reduced anxiety-related behaviors compared to control mice fed with broth without bacteria (Bravo et al., 2011). Here the behavioral difference is caused by a switch between presence or absence of a particular gut symbiont. The cause of the phenotypic difference is not a simple change in a DNA sequence, nor a simple environmental change disconnected from genetic changes, but a switch between presence and absence of a factor that can be considered as an environmental factor – the bacteria – which contains DNA whose mutations may also change the host phenotype.

In conclusion, reasoning in terms of differences can help to clarify the comparison between genetic and environmental effects on phenotypes. However, the issues are far from simple. Since genes and environment act on distinct levels along the complex causal link between genotypes and phenotypes, in certain cases it is impossible to disentangle the two causes.

# A Clarification on the Terminology Gain/Loss and Permissive/Instructive

Phenotypic differences appear to fall under two major categories: either the presence/absence of something (for example body hair or the ability to digest milk), or the shift between two alternatives that are both present (for example two hair colors). Similarly, on the genotype side, a mutation can correspond to the presence/absence of a relevant DNA sequence, or to a nucleotide polymorphism. The differential perspective makes it evident that a loss of phenotype is not necessarily associated with a loss of genetic material, and vice versa. For example, the evolutionary gain of dark pigments covering the entire coat of animals has often been associated with a loss of the *Mc1R* gene (Gompel and Prud'homme, 2009). Furthermore, as one of us noted previously (appendix of Stern and Orgogozo, 2008), gain or loss for a phenotype is subjective. For example, loss of hair might also be considered as gain of naked epidermis. Most insect epidermal cells differentiate into one of these two alternative states, and both states involve large gene regulatory networks. It is not clear which phenotypic state represents a gain or loss relative to the other. Even on the genotypic side, defining losses and gains can be difficult. The insertion of a transposable element can knock down a gene, whereas a deletion can sometimes create a new binding site for a transcriptional activator. As a matter of fact, the evolutionary gain of *desatF* expression in *D. melanogaster* occurred through a series of three deletions, each creating a hexamer motif that is required for *desatF* expression (Shirangi et al., 2009).

Similarly, the differential perspective on environmental effects highlights the fallacy of the distinction between permissive and instructive signals. A permissive signal is associated with the presence/absence of a phenotype and an instructive signal with the shift between two alternatives that are both present. As argued above, these distinctions at the phenotypic level are not clear-cut.

In conclusion, we suggest that the gain/loss and instructive/permissive terminology should be used with caution.

# Taxonomically Robust GP Relationships

A mutation is expected to produce a somewhat reproducible phenotypic variation within a population. Such reproducibility in phenotypic outcome is required to allow genetic evolution and adaptation by natural selection (Lewontin, 1974a; Kirschner and Gerhart, 1998). Indeed, a newly formed allele that would generate yet another phenotype each time it ends up in a different organism would not be subject to natural selection. Reasoning in terms of variation, rather than considering alleles as isolated entities, makes it clear that competition occurs between alleles at the same genetic locus. Natural selection acts directly on the allelic variation that is consistently associated with a given phenotypic variation, which is the GP relationship itself. The GP relationship is thus a basic unit of evolutionary change, on which natural selection acts.
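Why reproducibility matters can be illustrated with the standard deterministic haploid selection recursion, $p' = p\,w_d / (p\,w_d + (1-p)\,w_a)$, applied to a derived allele competing with the ancestral allele at the same locus. This textbook model is our choice for illustration; it ignores drift, dominance, and the background effects discussed above, and the selection coefficient is hypothetical.

```python
def next_frequency(p, w_derived, w_ancestral=1.0):
    """Deterministic haploid selection: derived-allele frequency after one generation."""
    mean_fitness = p * w_derived + (1 - p) * w_ancestral
    return p * w_derived / mean_fitness


def frequency_after(p0, w_derived, generations):
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w_derived)
    return p


s = 0.05  # hypothetical selection coefficient
# Reproducible GP relationship: carriers consistently show the favored phenotype.
consistent = frequency_after(0.01, 1 + s, 500)
# No reproducible phenotype: carriers are favored or disfavored at random (equal odds),
# so their expected fitness equals that of the ancestral allele.
unpredictable = frequency_after(0.01, 0.5 * (1 + s) + 0.5 * (1 - s), 500)
```

In this sketch the allele with a consistent phenotypic effect approaches fixation, whereas the allele with an unpredictable effect stays at its starting frequency: selection has nothing consistent to act on.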

A major discovery of the past 20 years is that variation at certain genetic loci produces comparable phenotypic variation not only in various individuals of one population, but also in extremely diverse taxa (Martin and Orgogozo, 2013b). In other words, certain GP relationships are taxonomically robust and present across a large range of species. This implies that the genetic and environmental backgrounds have remained relatively constant, or have appeared repeatedly throughout evolution, allowing genetic loci to generate similar phenotypic changes in various taxonomic groups. This important finding was quite unsuspected some 50 years ago. For a long time the singularity observed in the living world was expected to reflect a comparable singularity at the genetic level, implicating disparate and nonconserved genes specific to each lineage (Mayr, 1963). As Mayr put it in 1963, "Much that has been learned about gene physiology makes it evident that the search for homologous genes is quite futile except in very close relatives [*...*]. The saying "Many roads lead to Rome" is as true in evolution as in daily affairs" (Mayr, 1963). In other words, the genetic loci that make a man a man were expected to be different from the ones that make a dog or a fish. Later, in the 80–90s, a few researchers suggested, quite to the contrary, that evolution proceeds through mutations in conserved protein-coding genes (Romero-Herrera et al., 1978; Perutz, 1983; Stewart et al., 1987; Carroll et al., 2005) – but they had little experimental data at hand to support their view (Tautz and Schmid, 1998). As of today, the accumulating data on the mutations responsible for natural variation make it clear that the diversity of living organisms shares a common genetic basis in at least three respects. First, comparative developmental biology revealed that animals share common sets of key regulatory genes with conserved functions (Wilkins, 2002, 2014; Carroll et al., 2005).
Second, most interspecific differences in animals and plants for which the underlying genetic basis has been at least partly identified (154 cases out of 160) are due to mutations at homologous genes, and very few (6/160) are due to new genes, which nevertheless represent duplicates of existing genes (Martin and Orgogozo, 2013b). Third, multiple cases of similar phenotypic changes have been shown to involve mutations of the same homologous genes in independent lineages (**Table 1**), sometimes across large phylogenetic distances. For instance, the difference in pigmentation between white and orange Bengal tigers has recently been mapped to a single mutation in the transporter protein gene *SLC45A2* (Xu et al., 2013), and this gene has also been associated with hypopigmented eyes, skin, hair, and feathers in humans and chickens (Xu et al., 2013; **Figure 1E**). A more dramatic example is the recent evolution of toxin resistance in three species that diverged more than 500 million years ago (a clam, a snake, and a pufferfish) via the same amino acid substitution in a conserved gene (Bricelj et al., 2005; Geffeney et al., 2005; Venkatesh et al., 2005; Feldman et al., 2012). Such striking patterns of genetic repetition have now been found for more than 100 genes in animals and plants (Martin and Orgogozo, 2013b). Despite existing methodological biases favoring conserved genes in the search for quantitative trait loci (Rockman, 2012; Martin and Orgogozo, 2013b), the level of genetic repetition remains astounding and suggests that for the evolution of at least certain phenotypic differences, relatively few genetic roads lead to Rome (Stern, 2013). Nowadays, one should not be surprised that a piece of DNA associated with a complex wing color pattern in one *Heliconius* butterfly species provides similar wings and collective protection from the same predators when introduced into the genome of other butterflies (Supple et al., 2014).
What makes a dog a dog or a man a man is now partly explained by singular assortments of taxonomically robust GP relationships, which are found in multiple lineage branches.

Certain environment–phenotype relationships are also taxonomically robust. For example, across most taxa body size is affected by nutrition; iron deficiency can cause anemia and certain toxic compounds can be lethal. In ectotherms the temperature of the organism depends on the environmental temperature. Given the daunting number of environmental conditions that can be conceived, it is probably impossible to determine whether taxonomically robust GP relationships or taxonomically robust environment– phenotype relationships are more prevalent. Furthermore, whether taxonomically robust GP relationships represent an exceptional and small fraction, or a significant proportion, of all GP relationships is a matter of debate. In any case, the existence of taxonomically robust GP relationships is now clear and should be broadly accepted by the research community.

Among the most striking lessons of modern biology is the discovery that living beings share the same genetic material (DNA or RNA), the same genetic code (with few exceptions), and the same basic cellular machinery. It is thus far from paradoxical that individual differences are built upon similarities, and the finding that certain GP relationships persist over long evolutionary times completes the picture.

The precise predictive power that follows from the existence of taxonomically robust GP relationships is rare in biology, and is only starting to be exploited to its full potential. Long-range conservation of GP links now fully justifies the use of comparative genetics approaches to tackle practical problems. For instance, crop domestication imposed similar selective pressures on many species, and we now have experimental evidence that this process has repeatedly involved mutations in the same set of conserved genes (Paterson et al., 1995; Martin and Orgogozo, 2013b). This observation opens up interesting applications: this emerging body of genetic expertise can be used to assist the domestication of future crops, or to design marker-assisted strategies that produce and maintain crop biodiversity (Lenser and Theißen, 2013). GP predictability is already used to identify strains that have evolved resistance to various control strategies, with prominent examples including anti-malarial drug tolerance in *Plasmodium* parasites (Manske et al., 2012), antimicrobial resistance in bacteria and yeasts (Fischbach, 2009; MacCallum et al., 2010), and, even more dramatically, the anthropogenic evolution of insecticide resistance in diverse cohorts of insects, regardless of their pest status (Ffrench-Constant et al., 2004; Martin and Orgogozo, 2013b).

Furthermore, the repeatability of the genetic basis of phenotypic variation suggests that clinical research is also likely to benefit from genetic studies of a wide range of model species (Robinson and Webber, 2014). For instance, natural variation in tolerance to methotrexate, a chemotherapy drug, was mapped in *Drosophila* fruit flies to genes whose human orthologs are also associated with patients' responses to this drug (Kislukhin et al., 2013), thus extending the use of model organisms as disease models.

## Toward a Gene-Based Classification of Phenotypes

One original aspect of framing the GP connection in terms of individual GP relationships is that it allows phenotypes to be classified according to their underlying genetic basis. At a first level, GP relationships that implicate different regions within the same gene and produce comparable phenotypic outcomes can be grouped together. Simple cases of GxG interactions have been found between tightly linked mutations, generally within a coding sequence or within a *cis*-regulatory element, when they have a non-additive effect on the phenotype. For example, a particular mutation in an enhancer was observed to produce different shifts in the expression pattern of the downstream coding gene, depending on the neighboring DNA sequence (Frankel et al., 2011; Rogers et al., 2013). Similarly, amino acid mutations in a hemoglobin gene were found to increase or decrease oxygen affinity, depending on the allelic state of other sites (Natarajan et al., 2013). In such cases, it is intuitive to group these genetically linked sites together, as they all affect the same kind of phenotypic trait.
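The notion of a non-additive effect between two linked sites can be made explicit with the standard textbook definition of pairwise epistasis (a general illustration, not a formalism proposed in this article). Assume two sites, each carrying an ancestral (0) or derived (1) allele, and let $P_{ij}$ denote the mean phenotypic value of genotype $ij$. The GxG component is the deviation of the double mutant from the sum of the two single-mutant effects:

$$
\varepsilon = (P_{11} - P_{00}) - \left[(P_{10} - P_{00}) + (P_{01} - P_{00})\right] = P_{11} - P_{10} - P_{01} + P_{00}
$$

A purely additive pair gives $\varepsilon = 0$. A mutation that raises or lowers a trait depending on the state of another site, as in the hemoglobin example, corresponds to the stronger case of sign epistasis, in which $P_{10} - P_{00}$ and $P_{11} - P_{01}$ have opposite signs.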

Absence of melanin pigments in animals has been associated with mutations in several genes, including *OCA2*, *kit ligand*, and *Mc1R* (reviewed in Gompel and Prud'homme, 2009; Liu et al., 2013). Whereas the absence of melanin is traditionally considered a single character state, albinism, irrespective of the underlying genetic basis, we propose here to distinguish *OCA2*-associated albinism from *Mc1R*-associated albinism, and from albinism associated with any other gene. One benefit of decomposing the variation within the living world into these multiple elementary GP relationships is that these elements can then be grouped into successively larger groups. Elementary phenotypic changes involving different genes that are part of the same genetic pathway could also be grouped together as concomitant components of the same phenotype-modulating mechanism. This is clearly the case for the TGF-β superfamily ligands BMP15 and GDF9 and the TGF-β superfamily receptor BMPR1B, which have all been repeatedly associated with variation in ovarian function in humans and in domestic sheep breeds (reviewed in Luong et al., 2011).
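The successive grouping of elementary GP relationships, first by gene and then by pathway, can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a method from the original work: the `GPRelationship` record, the `PATHWAY` lookup, and the helper functions are hypothetical names, and the lineage labels are placeholders rather than data from the cited studies.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GPRelationship:
    """One elementary genotype-phenotype (GP) relationship (hypothetical record type)."""
    lineage: str  # placeholder label for the lineage where the change was observed
    gene: str     # gene carrying the causal mutation(s)
    trait: str    # phenotypic trait affected


# Hypothetical gene-to-pathway lookup; only the TGF-beta grouping comes from the text.
PATHWAY = {
    "BMP15": "TGF-beta signaling",
    "GDF9": "TGF-beta signaling",
    "BMPR1B": "TGF-beta signaling",
}


def group(relationships, key):
    """Group GP relationships under the value returned by `key`."""
    groups = defaultdict(list)
    for r in relationships:
        groups[key(r)].append(r)
    return dict(groups)


def by_gene(r):
    # First level: OCA2-associated and Mc1R-associated albinism stay distinct.
    return (r.trait, r.gene)


def by_pathway(r):
    # Second level: genes of the same pathway merge; unmapped genes keep their own class.
    return (r.trait, PATHWAY.get(r.gene, r.gene))


# Illustrative records with placeholder lineage labels (not real study data).
records = [
    GPRelationship("lineage_1", "OCA2", "albinism"),
    GPRelationship("lineage_2", "OCA2", "albinism"),
    GPRelationship("lineage_3", "Mc1R", "albinism"),
    GPRelationship("lineage_4", "BMP15", "ovarian function"),
    GPRelationship("lineage_5", "GDF9", "ovarian function"),
    GPRelationship("lineage_6", "BMPR1B", "ovarian function"),
]

for level, key in [("gene", by_gene), ("pathway", by_pathway)]:
    for label, members in group(records, key).items():
        print(level, label, len(members))
```

With these placeholder records, the gene level yields five classes, while at the pathway level the three ovarian-function records collapse into a single TGF-beta class, leaving three. Keying on a function rather than a fixed field keeps the grouping level interchangeable.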

Another important consequence of the GP relationship perspective is that apparently distinct phenotypic changes caused by similar genetic loci in various organisms can be examined further to uncover what might be a common basic phenotypic change (Deans et al., 2015). For example, fly larvae and nematode worms have distinctive food search behaviors, but mutations in the same orthologous gene (*for*/*egl-4*) have been shown to alter the intensity of food search behavior in both organisms (Osborne et al., 1997; Mery et al., 2007; Hong et al., 2008). It is thus plausible that a basic behavioral change, underlying the seemingly distinct changes in fly and nematode food search, represents a GP relationship conserved across nematodes and flies. This somewhat borderline example illustrates the challenge of incorporating widespread comparative thinking into our global understanding of biology. Is a mutation in a mouse model relevant to human disease? Can we consider a mouse phenotype similar to a human condition if its genetic basis is different? We and others predict that the search for orthologous phenotypes, or "phenologs" (McGary et al., 2010), will become a major task for modern genetics and will require a fruitful alliance between applied and evolutionary biology.

## Conclusion

In this paper, we bring the differential concept of the gene (Schwartz, 2000) back into our framework for understanding the GP map. The differential view of the GP relationship helps clarify genetic and environmental effects on phenotypes and how they relate to each other. It also opens up new avenues of thinking, in particular regarding the decomposition of observable features within an organism and the representation of GP maps. Furthermore, the existence of taxonomically robust GP relationships encourages an unabashed use of comparative genetics to predict the genetic basis of phenotypic variation in diverse groups of organisms, and this predictive power has important potential for translational research in agronomy and medicine.

## Acknowledgments

We are deeply grateful to Marie-Anne Félix for enlightening discussions and to Thomas Pradeu for bringing several relevant papers to our attention. We also thank Giuseppe Baldacci, Marie-Anne Félix, Pierre-Henri Gouyon, Alexandre Peluffo, Mark Siegal, David Stern, and the reviewers for their insightful comments on the manuscript. The research leading to this paper has received funding from the European Research Council under the European Community's Seventh Framework Program (FP7/2007-2013 Grant Agreement no. 337579) and from the John Templeton Foundation (Grant no. 43903).

## References


trap? *Zool. J. Linn. Soc.* 156, 168–183. doi: 10.1111/j.1096-3642.2008.00481.x




Waters, C. K. (2007). Causes that make a difference. *J. Philos.* 104, 551–579.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Orgogozo, Morizot and Martin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*